13 April 2007

Web Scraping

I attended a presentation of a new consolidation web-site tool. The presenter used the term "scraping" a number of times, as in: "we don't control the information, we just go out to the site and scrape the data." Sounded painful. But lo and behold, Wikipedia knows the term:

Screen scraping
Screen scraping is a technique in which a computer program extracts data from the display output of another program. The program doing the scraping is called a screen scraper. The key element that distinguishes screen scraping from regular parsing is that the output being scraped was intended for final display to a human user, rather than as input to another program, and is therefore usually neither documented nor structured for convenient parsing. Screen scraping often involves ignoring binary data (usually images or multimedia data) and formatting elements that obscure the essential, desired text data.
Optical character recognition software is a kind of visual scraper.
There are a number of synonyms for screen scraping, including: Data scraping, data extraction, web scraping, page scraping, web page wrapping and HTML scraping (the last four being specific to scraping web pages).
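
For the curious, here is roughly what that definition might look like in practice. This is only a minimal sketch using Python's standard html.parser module, not the tool from the presentation. The page text, the TextScraper class, and the scrape_text function are all made up for illustration. It keeps the text a human would read and ignores the markup, images, scripts, and styling around it.

```python
from html.parser import HTMLParser


class TextScraper(HTMLParser):
    """Hypothetical scraper: collects readable text, skips markup and styling."""

    # Elements whose contents are never shown to a human reader.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tags such as <img> or <b> are simply ignored; only the
        # script/style containers change what we collect.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def scrape_text(html):
    """Return the readable text fragments found in an HTML string."""
    parser = TextScraper()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up page standing in for a site we don't control.
SAMPLE_PAGE = """
<html><head><title>Listings</title><style>b { color: red; }</style></head>
<body><h1>Today's Prices</h1><img src="logo.png">
<p>Widget: <b>$4.00</b></p></body></html>
"""

print(scrape_text(SAMPLE_PAGE))
```

A real scraper would first have to fetch the page from the other site, and it would break whenever that site changed its layout, which is the "neither documented nor structured" part of the definition.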

'Course all you techies knew that. Me? I'm out of the loop. But then I'm an old timer. I can remember life before Wikipedia. It was quiet in the cave, but it was a good life.


