Hacker News with Generative AI: Web Content

The Great Scrape (bearblog.dev)
LLMs feed on data. Vast quantities of text are needed to train these models, which are in turn receiving valuations in the billions. This data is scraped from the broader internet, from blogs, websites, and forums, without the authors' permission, with all content treated as opt-in by default.
How Bad Is Link Rot? (brainbaking.com)
There’s no denying that online content disappears. Depending on the type and thoroughness of the study, reports claim that 38% to 66.5% of webpages that existed a decade ago are dead. Sometimes we’re treated to a 3xx redirect code, but more often than not the pages would simply be gone forever if it weren’t for the Internet Archive.
Microsoft says that it's okay to steal web content because it's 'freeware.' (windowscentral.com)