An excellent overview of the steps we can take to keep unscrupulous bots from scraping our content, including robots.txt directives and blocking specific User-Agents at the firewall or CDN level.
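For reference, the robots.txt approach looks something like this minimal sketch. GPTBot and CCBot are the user-agent tokens that OpenAI and Common Crawl document for their crawlers, and Google-Extended is the token Google uses for AI-training opt-out; the list of bots to name will keep changing:

```text
# robots.txt — opt out of AI training crawlers that honor these directives
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Since robots.txt only works if a crawler chooses to respect it, the firewall/CDN route actually enforces the block. Here's a rough sketch for nginx (inside a `server` block; the pattern and response code are just examples to adapt for your own server or CDN). Note that Google-Extended is a robots.txt-only token and never appears in request headers, so it's omitted here:

```nginx
# Return 403 to requests whose User-Agent matches known AI crawlers
if ($http_user_agent ~* (GPTBot|CCBot)) {
    return 403;
}
```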
As Neil writes, it's a real problem that this is an opt-out situation, meaning that we need to explicitly deny access to our copyrighted works, rather than an opt-in model in which authors, artists, and creators choose to provide their work for training purposes.
Like many web users, I've played with tools like ChatGPT for writing blog posts, DALL-E and Midjourney for hallucinating artwork, Copilot for helping write code, and probably some other things I can't recall right now. I also harbor significant concerns about the ethics of how these tools were trained and the consequences of using them.
I suppose we can at least give these companies credit for being "well-behaved" when they honor these directives or identify themselves with a distinct User-Agent.