Hacker News with Generative AI: Data Scraping

Show HN: I Built a FAANG Job Board – Only Jobs Scraped in the Last 24h (topjobstoday.com)
🌟 The #1 platform for tech job seekers - join our growing community today
League of Legends data scraping the hard and tedious way for fun (maknee.github.io)
League of Legends is one of the world’s most popular competitive games, with millions of players generating vast amounts of gameplay data daily. Basic match statistics are available, but accessing moment-by-moment gameplay data is nearly impossible. This article demonstrates how to create a high-fidelity dataset by reverse engineering the game engine, capturing information ranging from precise player positions to ability usage timings and damage calculations.
Microsoft Word and Excel AI data scraping switched to opt-in by default (tomshardware.com)
ByteDance is abusing the free video downloading service Cobalt for mass scraping (twitter.com)
Cloudflare's new marketplace lets websites charge AI bots for scraping (techcrunch.com)
Cloudflare announced plans on Monday to launch a marketplace in the next year where website owners can sell AI model providers access to scrape their site’s content. The marketplace is the final step of Cloudflare CEO Matthew Prince’s larger plan to give publishers greater control over how and when AI bots scrape their websites.
Some Suggestions to Improve Robots.txt (ietf.org)
The BBC does not believe the current scraping of its content and data without permission in order to train generative AI models is in the public interest, and wants to agree a more structured and sustainable approach with technology companies.
LinkedIn silently opts users into generative AI data scraping by default (bsky.app)
LinkedIn scraped user data for training before updating its terms of service (techcrunch.com)
LinkedIn may have trained AI models on user data without updating its terms.
Game UI Database slowdown caused by relentless OpenAI scraping (gamedeveloper.com)
Is it legal and possible to scrape the social media platforms? (ycombinator.com)
Given links to posts, is it legal & possible to scrape from social media such as YT, FB, Insta, TikTok & Snap?
Leaked Docs Show Nvidia Scraping a Human Lifetime of Videos per Day to Train AI (404media.co)
AI startup Anthropic accused of 'egregious' data scraping (ft.com)
Show HN: I scraped 3.2B TikTok profiles and 9B posts to build this search engine (seeksocial.io)
Storing Scraped Data in an SQLite Database on GitHub (jerrynsh.com)
How I scraped 6 years of Reddit posts in JSON (afficone.com)
Cloudflare adds option to block AI scrapers and crawlers (front-end.social)
Amazon has a way to scrape GitHub and feed its AI model (dataconomy.com)
Stop Scraping My Git Forge (gabrielsimmer.com)
Elon Musk's X loses lawsuit against Bright Data over data scraping (cnbc.com)