
subject: Difference between Data Discovery And Data Extraction



Simplified, screen scraping involves two main steps: data discovery and data extraction. Data discovery means navigating a website to reach the pages that contain the information you want, and data extraction means pulling the data out of those pages. When people think of screen scraping they generally focus on the extraction part of the process, but in my experience data discovery is often the more difficult of the two.

Data discovery in screen scraping can be as simple as requesting a single URL. For example, you might just need to go to the home page of a site and grab the latest news. At the other end of the spectrum, data discovery may involve logging in to a website, navigating through a series of pages that require cookies, submitting a POST request to a search form, paging through the search results, and finally following all of the "details" links in the results pages to reach the data you're actually after. In the former case a simple Perl script would often work just fine. For anything much more complex than that, though, a commercial screen-scraping tool can be an incredible time-saver. Especially for sites that require logging in, writing code to handle screen scraping can be a nightmare when it comes to dealing with cookies and such.
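To give a rough idea of what the more involved kind of discovery looks like in code, here is a minimal sketch using only the Python standard library. The site address, the "/login" and "/search" paths, and the form field names are all made up for illustration; a real site will have its own, and many will need extra steps (hidden form tokens, redirects, paging) that this sketch leaves out.

```python
import http.cookiejar
import urllib.parse
import urllib.request

BASE_URL = "https://example.com"  # hypothetical site, not from the article


def make_session():
    """Return an opener that stores cookies and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_post(path, fields):
    """Build a form POST; urllib issues a POST whenever a request carries data."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(urllib.parse.urljoin(BASE_URL, path), data=body)


def discover(opener, username, password, query):
    """Log in, submit the search form, and return the first results page as text."""
    # Hypothetical paths and field names; a real site will use its own.
    opener.open(build_post("/login", {"user": username, "pass": password})).close()
    with opener.open(build_post("/search", {"q": query})) as response:
        return response.read().decode("utf-8", errors="replace")
```

You would call make_session() once and pass the same opener to every request, so the login cookie travels along to the search form and to any "details" pages you fetch afterwards.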

In the data extraction phase you've already arrived at the page containing the data you're interested in, and you now need to pull it out of the HTML. Traditionally this has typically involved creating a series of regular expressions that match the pieces of the page you want (e.g., URLs and link titles). Regular expressions can be a bit complex to deal with, so most screen-scraping applications will hide these details from you, even though they may use regular expressions behind the scenes.
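As an illustration of the regular-expression approach, the sketch below pulls URLs and link titles out of a small piece of HTML. The sample markup and the function name are invented for the example, and the pattern assumes double-quoted href attributes, so treat it as a starting point rather than something that will cope with every page.

```python
import re

# Matches <a ... href="URL" ...>TITLE</a>; assumes double-quoted href values.
LINK_RE = re.compile(
    r'<a\s[^>]*?href="([^"]+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")

# Invented sample standing in for a search results page.
SAMPLE = (
    '<ul>'
    '<li><a href="/item/1">First <b>item</b></a></li>'
    '<li><a class="details" href="/item/2">Second item</a></li>'
    '</ul>'
)


def extract_links(html):
    """Return (url, title) pairs, with any tags inside the title stripped out."""
    return [
        (url, TAG_RE.sub("", title).strip())
        for url, title in LINK_RE.findall(html)
    ]


links = extract_links(SAMPLE)
```

Here links ends up as a list of (url, title) tuples, one per anchor tag found in the sample.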

As an addendum, I should probably mention a third phase that is often ignored: what do you do with the data once you've extracted it? Common examples include writing the data to a CSV or XML file, or saving it to a database. In the case of a live website, you might even scrape the information and display it in the user's browser in real time. When shopping around for a screen-scraping tool, make sure it gives you the flexibility you need to work with the data once it's been extracted.
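For the CSV case, a minimal sketch might look like the following. The column names and sample rows are assumptions made for the example; the function writes to any file-like object, so the same code works with an in-memory buffer or a real file.

```python
import csv
import io


def write_csv(rows, out):
    """Write (url, title) rows to a file-like object, preceded by a header row."""
    writer = csv.writer(out)
    writer.writerow(["url", "title"])  # hypothetical column names
    writer.writerows(rows)


# Invented rows standing in for extracted data.
rows = [("/item/1", "First item"), ("/item/2", "Second, item")]

buffer = io.StringIO()
write_csv(rows, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, open the file with newline="" (as the csv module documentation recommends) and pass the file object in place of the buffer. Note that the csv module quotes the second title automatically because it contains a comma.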



