Connotate Web Extraction Technology: A Review by Dan Woods

A Core Component of a Robust Data Supply Chain

A video blog with Dan Woods, CTO and Editor of CITO Research

Dan Woods is CTO and Editor of CITO Research, the author of more than 20 books and over 1,000 white papers on technology, and a columnist for Forbes.com. He also works with many technology executives to help them understand the changes that Big Data requires of their companies’ data architecture, technology stack, and data supply chain. In this context, we asked him to assess the Connotate platform and discuss its place in a Big Data technology strategy.

We’re in the thick of the “Data Gold Rush”…

Dan notes that there’s a data “gold rush” going on in the information industry now – particularly centered on harvesting information housed on the world’s 1 billion+ websites. Google may have started the rush, he says, but – because Google indexes less than 1% of the total content on the Web – there’s lots of white space for other companies to stake their claim, finding and monetizing their own lodes of data.

Worldwide Web: Modern Goldmine

… but you need the right technology to get the “Web gold”

Highly valuable content is definitely available throughout the Web, but it’s not all easy pickings – particularly if you’re an information aggregator looking to continually harvest content at scale from many different sites. Dan says that many companies start with the narrow question, “How can I efficiently extract and acquire this data?” Defining the problem this way drives them toward acquiring simple web-scraping technology or developing do-it-yourself scripts and processes.

Obtaining data from websites is not easy. Website content is a visual medium meant for human consumption, not high-scale automated content harvesting. Machines must be taught to “see” the site like a human being. In addition, today’s dynamic sites, filled with JavaScript and Ajax, add a whole layer of complexity to the machine-site interactions. So it’s small wonder that most companies look only for technology that solves this thorny “extraction” problem.

But, Dan argues, companies that truly want to build a business on monetizing website content need to consider the problem much more broadly, and choose technology that does far more than just enable data extraction. The best technology, he says, will also allow companies to understand much more about the data they’re bringing in, as well as to monitor sites for changes. And it will also drive efficiencies and new capabilities throughout a company’s data operations by pre-processing and structuring Web data before it ever enters the production process:

What Web Harvesting Technology Must Deliver
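
To make two of those capabilities concrete, extracting what a human reader would see and monitoring a page for changes, here is a minimal Python sketch. It is an illustration only and not a description of how the Connotate platform works: the names VisibleTextParser, visible_text, fingerprint, and has_changed are hypothetical, the sketch uses only the Python standard library, and it assumes the page is static HTML (it does not execute JavaScript or handle Ajax-loaded content).

```python
import hashlib
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text a reader would see, skipping <script> and <style> bodies.

    Hypothetical helper for illustration; not part of any vendor product.
    """

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace so layout-only edits do not count as changes.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the human-visible text of an HTML string as one line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fingerprint(html):
    """Hash the visible text so two snapshots can be compared cheaply."""
    return hashlib.sha256(visible_text(html).encode("utf-8")).hexdigest()


def has_changed(old_html, new_html):
    """True when the visible content differs between two snapshots."""
    return fingerprint(old_html) != fingerprint(new_html)


if __name__ == "__main__":
    before = "<html><body><h1>Price</h1><p>10 USD</p><script>var t=1;</script></body></html>"
    after = "<html><body><h1>Price</h1><p>12 USD</p><script>var t=1;</script></body></html>"
    print(visible_text(before))
    print(has_changed(before, after))
```

Comparing a hash of the visible text rather than the raw HTML means that changes confined to scripts or whitespace are ignored, while a changed price or headline is flagged. A production system would also need to render dynamic pages and map the text into structured fields, which this sketch leaves out.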

Dan believes that the Connotate platform is uniquely suited to provide the technology foundation for high-scale harvesting of website content. His assessment: not only has Connotate “used hundreds of tricks to cleverly solve” the extraction problem, but our technology also drives the production efficiencies and data insights that are critical to high-scale monetization of website content.

The Right Technology: Critical to High-Performing Data Supply Chains

We really couldn’t have said it better ourselves, and we’re very pleased that someone as knowledgeable and strategic as Dan Woods understands not only the immense potential in monetizing website content, but also that Connotate uniquely provides the technological foundation for doing so efficiently, effectively, and at high scale.

If you’d like to learn more about what we can do for your data sourcing and data operations, please call us at 732-296-8844, email us at info@connotate.com, or subscribe to our blog with the form above to receive more posts about data extraction technology and the data supply chain. We also hope you will join us for our data supply chain webinar on Thursday, March 26.

About Dan Woods: Dan is a seasoned CTO, author, speaker, and entrepreneur with experience in business, computer science, journalism, and publishing. He is CTO and Editor of CITO Research, a website dedicated to creating content to improve the performance of CIOs and CTOs. As an author, Dan has written or coauthored more than 20 books about business and technology, ranging from books about service-oriented architecture, open source, manufacturing, RFID, and wikis to the ideas driving the latest generation of enterprise applications, particularly in the face of Web 2.0’s impact on the enterprise. Dan has also written hundreds of white papers and conducted more than 1,000 interviews with experts in a variety of fields. He is an invited speaker and moderator at international conferences. Dan holds an M.S. from Columbia University’s Graduate School of Journalism and a B.A. in Computer Science from the University of Michigan. Dan writes a column on Forbes.com.


The post Connotate Web Extraction Technology: A Review by Dan Woods appeared first on Connotate.

