Hacker News with Generative AI: Scraping

Wikipedia offers AI developers its article data on Kaggle to stop scraping (siliconangle.com)
The Wikimedia Foundation, the organization behind the internet’s largest free encyclopedia Wikipedia, is offering an artificial intelligence-ready dataset on Kaggle that’s aimed at dissuading AI companies and large language model trainers from scraping the website.
Go-away (another HTTP proxy for LLM scraper defence) (gammaspectra.live)
Self-hosted abuse detection and rule enforcement against low-effort mass AI scraping and bots.
Amazon is reviewing whether Perplexity AI improperly scraped online content (apnews.com)
Open Source LinkedIn Scraper (ycombinator.com)