Hacker News with Generative AI: Web Content

The Great Scrape (bearblog.dev)
LLMs feed on data. Vast quantities of text are needed to train these models, which are in turn receiving valuations in the billions. This data is scraped from the broader internet, from blogs, websites, and forums, without the authors' permission, with all content treated as opt-in by default.

Data Scraping, Web Content, Ethics, AI

13 points by Tomte 489 days ago | 1 comment

How Bad Is Link Rot? (brainbaking.com)
There’s no denying that online content disappears. Depending on the type and thoroughness of the study, reports claim that 38% to 66.5% of webpages that existed a decade ago are dead. Sometimes we’re treated to a 3xx redirect code, but more often than not the pages would simply be gone forever if it weren’t for the Internet Archive.

Internet Archive, Web Content, Data Loss, Online Content

17 points by ZacnyLos 583 days ago | 6 comments

Microsoft says that it's okay to steal web content because it's 'freeware.' (windowscentral.com)

Microsoft, Web Content, Intellectual Property, Copyright Law, Software

37 points by blinding-streak 760 days ago | 28 comments