Automatic Detection and Banning of Content Stealing Bots for E-commerce
Nicolas Poggi, Josep Lluis Berral, Toni Moreno, Ricard Gavaldà and Jordi Torres
In: NIPS 2007 Workshop on Machine Learning in Adversarial Environments for Computer Security, 7-8 December 2007, Whistler, British Columbia, Canada.
Content stealing on the web is becoming a serious concern for information and e-commerce websites. In the practice known as web fetching or web scraping, a stealer bot simulates a human web user to extract desired content from the victim's website; the content is then stripped of copyright information and displayed as original on the scraper's website.
In this work, we report initial results on applying machine learning techniques to detect and ban stealer bots from a website, extending our AUGURES system, which was previously used to separate buying from non-buying sessions on an e-commerce site.