Hacker News with Generative AI: Web Crawlers

How crawlers impact the operations of the Wikimedia projects (wikimedia.org)
Since the beginning of 2024, the demand for the content created by the Wikimedia volunteer community – especially for the 144 million images, videos, and other files on Wikimedia Commons – has grown significantly. In this post, we’ll discuss the reasons for this trend and its impact.

Wikimedia, Web Crawlers, Data Usage, Open Source

7 points by panic 479 days ago | 1 comment

Nepenthes is a tarpit to catch AI web crawlers (zadzmo.org)
This is a tarpit intended to catch web crawlers. Specifically, it's targeting crawlers that scrape data for LLMs, but really, like the plants it is named after, it'll eat just about anything that finds its way inside.

Artificial Intelligence, Web Crawlers

714 points by blendergeek 555 days ago | 265 comments
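To make the tarpit idea concrete, here is a minimal sketch of the general technique. It is not Nepenthes's actual code; the function names `tarpit_page` and `drip` and their parameters are hypothetical. Each URL deterministically renders a page whose links lead only to more generated pages, so a crawler that follows links never leaves, and the response is trickled out slowly to tie up the crawler's time.

```python
import hashlib
import random
import time
from typing import Iterator


def tarpit_page(path: str, n_links: int = 5) -> str:
    """Return an HTML page whose links point only to more tarpit pages.

    The RNG is seeded from a hash of the path, so the same URL always
    renders the same page and the maze looks like a stable site.
    """
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    base = path.rstrip("/")
    links = []
    for _ in range(n_links):
        # Hypothetical 8-letter slug for each child page in the maze.
        slug = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))
        links.append(f'<a href="{base}/{slug}">{slug}</a>')
    return "<html><body>\n" + "\n".join(links) + "\n</body></html>"


def drip(page: str, chunk_size: int = 16, delay: float = 1.0) -> Iterator[str]:
    """Yield the page in small chunks, sleeping after each to slow the crawler."""
    for i in range(0, len(page), chunk_size):
        yield page[i:i + chunk_size]
        time.sleep(delay)


if __name__ == "__main__":
    # Stream one page with no delay so the demo finishes instantly.
    for chunk in drip(tarpit_page("/maze"), delay=0):
        print(chunk, end="")
```

In a real deployment the chunks from `drip` would be written to an HTTP response; the deterministic seeding is a design choice so that revisiting a URL returns identical content rather than an obvious random page.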

Reddit's robots.txt disallows all web crawlers (reddit.com)

Web Crawlers, Reddit, Content Moderation

8 points by rawfael 751 days ago | 1 comment
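For context on what "disallows all web crawlers" means in practice, the sketch below uses Python's standard `urllib.robotparser` to evaluate a robots.txt. The file contents are an assumption: the generic blanket form `User-agent: *` / `Disallow: /`, not a verbatim copy of Reddit's file. The user-agent strings and URLs in the example are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: the generic "block every crawler from every path" form.
DISALLOW_ALL = """\
User-agent: *
Disallow: /
"""

# For contrast: an empty Disallow value means nothing is blocked.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str, robots_txt: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.reddit.com/r/programming/"
    print(allowed("ExampleBot", url, DISALLOW_ALL))  # blocked
    print(allowed("ExampleBot", url, ALLOW_ALL))     # permitted
```

Note that robots.txt is advisory: it only stops crawlers that choose to read and honor it, which is why tools like the tarpit above exist.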