An excellent overview of the steps we can take to keep unscrupulous bots from scraping our content, including robots.txt directives and blocking specific User-Agents at the firewall or CDN level.
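For reference, the robots.txt approach looks something like this minimal sketch. GPTBot and CCBot are the user-agent tokens that OpenAI and Common Crawl document for their crawlers, and Google-Extended is the token Google uses for AI-training opt-out; the list of bots to name will keep changing:

```text
# robots.txt — opt out of AI training crawlers that honor these directives
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Since robots.txt only works if a crawler chooses to respect it, the firewall/CDN route actually enforces the block. Here's a rough sketch for nginx (inside a `server` block; the pattern and response code are just examples to adapt for your own server or CDN). Note that Google-Extended is a robots.txt-only token and never appears in request headers, so it's omitted here:

```nginx
# Return 403 to requests whose User-Agent matches known AI crawlers
if ($http_user_agent ~* (GPTBot|CCBot)) {
    return 403;
}
```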
As Neil writes, it's a real problem that this is an opt-out situation, meaning that we need to explicitly deny access to our copyrighted works, rather than an opt-in model in which authors, artists, and creators choose to provide their work for training purposes.
Like many web users, I've played with tools like ChatGPT for writing blog posts, DALL-E and Midjourney for hallucinating artwork, Copilot for helping write code, and probably some other things I can't recall right now. I also harbor significant concerns about the ethics of how these tools were trained and the consequences of using them.
I suppose we can at least give these companies credit for being "well-behaved" when they honor these directives or identify themselves with a distinct User-Agent.