subject: Difference between Data Mining And Screen-Scraping
Data mining is not screen scraping. I know some people in the audience may not agree with this statement, but they are actually two almost entirely different concepts.
Put simply: screen scraping lets you get information, whereas data mining lets you analyze it. That is a major simplification, so I'll expand on it a little.
The term 'screen scraping' comes from the old days of central terminals, when people worked on computers with green-on-black, text-only screens. Screen scraping was used to extract the characters on those screens so they could be analyzed. Fast forward to today's Web, and screen scraping now usually refers to extracting information from websites. That is, computer programs can "crawl" or "spider" websites, pulling out data. People often use this to build things such as price comparison engines or web page archives, or simply to download text into a spreadsheet so it can be filtered and analyzed.
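To make the "getting information" side concrete, here is a minimal screen-scraping sketch in Python using only the standard library's html.parser and csv modules. It assumes hypothetical markup in which each product name and price sit in <span class="name"> and <span class="price"> elements; a real site's HTML would differ, and the page is hard-coded instead of downloaded. Note that the scraper only moves data out of the page and into spreadsheet-friendly CSV; it does no analysis at all.

```python
import csv
import io
from html.parser import HTMLParser


class PriceScraper(HTMLParser):
    """Collect (product name, price) pairs from hypothetical product-listing markup.

    Assumed (hypothetical) structure for each product:
      <span class="name">Widget</span> <span class="price">$9.99</span>
    """

    def __init__(self):
        super().__init__()
        self.current_field = None  # "name" or "price" while inside a matching <span>
        self.current_name = None
        self.rows = []

    def handle_starttag(self, tag, attrs):
        # Only react to <span> tags whose class is one of the fields we want.
        if tag == "span":
            cls = dict(attrs).get("class")
            if cls in ("name", "price"):
                self.current_field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.current_field is None:
            return  # skip whitespace and text outside the fields we track
        if self.current_field == "name":
            self.current_name = text
        elif self.current_field == "price" and self.current_name is not None:
            # Strip the currency symbol and store the price as a number.
            self.rows.append((self.current_name, float(text.lstrip("$"))))
            self.current_name = None


def scrape_to_csv(html):
    """Extract products from an HTML string and return them as CSV text."""
    parser = PriceScraper()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["product", "price"])
    writer.writerows(parser.rows)
    return out.getvalue()


# A hard-coded page stands in for one a crawler would fetch over HTTP.
SAMPLE_PAGE = """
<ul>
  <li><span class="name">Widget</span> <span class="price">$9.99</span></li>
  <li><span class="name">Gadget</span> <span class="price">$24.50</span></li>
</ul>
"""

if __name__ == "__main__":
    print(scrape_to_csv(SAMPLE_PAGE))
```

The resulting CSV can be opened in a spreadsheet for filtering; any analysis happens afterwards, in a separate step.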
Data mining, on the other hand, is defined by Wikipedia as "the practice of automatically searching large stores of data for patterns." In other words, you already have the data, and you are now analyzing it to learn useful things from it. Data mining often involves many complex algorithms based on statistical methods. It has nothing to do with how the data was obtained in the first place. With data mining, you don't worry about collecting data; you only care about analyzing what already exists.
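For contrast, here is a minimal data-mining sketch. It assumes the data is already in hand (a hypothetical list of shopping baskets standing in for stored purchase records) and searches it for one simple kind of pattern: items that are frequently bought together. Counting how often each pair co-occurs (its "support") is the first step of market-basket analysis, as in the Apriori algorithm; real data-mining tools use far more sophisticated statistical methods. Notice that there is no fetching or extraction step anywhere in the code.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least `min_support` transactions.

    This is support counting for 2-itemsets, the core step of
    market-basket analysis.
    """
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so ("a", "b") and ("b", "a") count as one pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical purchase records that already exist in a database:
# data mining starts from stored data, not from collecting it.
TRANSACTIONS = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread", "butter"],
    ["bread", "butter", "milk"],
]

if __name__ == "__main__":
    for pair, n in sorted(frequent_pairs(TRANSACTIONS, min_support=3).items()):
        print(pair, n)
```

With a minimum support of 3, only ("bread", "butter") and ("bread", "milk") qualify; lowering the threshold surfaces weaker patterns such as ("butter", "milk").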
The difficulty is that people who don't know the term 'screen scraping' will try Googling for anything that seems to describe it. We use some of these terms on our website to help such people find us: for example, we have pages titled text data mining, automated data collection, website data extraction, and even website ripper (I suppose "scraping" is a kind of "ripping"). So we have a bit of a dilemma: we don't necessarily want to perpetuate a misconception (i.e., screen scraping = data mining), but we also want to use the terminology people actually use.