Hacker News with Generative AI: Data Scraping

ByteDance is abusing the free video downloading service Cobalt for mass scraping (twitter.com)
Cloudflare's new marketplace lets websites charge AI bots for scraping (techcrunch.com)
Cloudflare announced plans on Monday to launch a marketplace in the next year where website owners can sell AI model providers access to scrape their site’s content. The marketplace is the final step of Cloudflare CEO Matthew Prince’s larger plan to give publishers greater control over how and when AI bots scrape their websites.
Some Suggestions to Improve Robots.txt (ietf.org)
The BBC does not believe the current scraping of its content and data without permission in order to train generative AI models is in the public interest, and wants to agree a more structured and sustainable approach with technology companies.
LinkedIn silently opts users into generative AI data scraping by default (bsky.app)
LinkedIn scraped user data for training before updating its terms of service (techcrunch.com)
LinkedIn may have trained AI models on user data without updating its terms.
Game UI Database slowdown caused by relentless OpenAI scraping (gamedeveloper.com)
Is it legal and possible to scrape the social media platforms? (ycombinator.com)
Given links to posts, is it legal and possible to scrape social media platforms such as YouTube, Facebook, Instagram, TikTok, and Snapchat?
Leaked Docs Show Nvidia Scraping a Human Lifetime of Videos per Day to Train AI (404media.co)
AI startup Anthropic accused of 'egregious' data scraping (ft.com)
Show HN: I scraped 3.2B TikTok profiles and 9B posts to build this search engine (seeksocial.io)
Storing Scraped Data in an SQLite Database on GitHub (jerrynsh.com)
How I scraped 6 years of Reddit posts in JSON (afficone.com)
Cloudflare adds option to block AI scrapers and crawlers (front-end.social)
Amazon has a way to scrape GitHub and feed its AI model (dataconomy.com)
Stop Scraping My Git Forge (gabrielsimmer.com)
Elon Musk's X loses lawsuit against Bright Data over data scraping (cnbc.com)