Crawlers, search engines and the sleaze of generative AI companies

Search Engine Land: “…LLMs are not search engines. It should now be very clear that an LLM is a different beast from a search engine. A language model’s response does not directly point back to the website(s) whose content was used to train the model. There is no economic exchange like we see with search engines, and this is why many publishers (and authors) are upset. The lack of direct source citations is the fundamental difference between a search engine and an LLM, and it is the answer to the very common question of “why should Google and Bing be allowed to scrape content but not OpenAI?” (I’m using a more polite phrasing of this question.) Google and Bing are trying to show source links in their generative AI responses, but these sources, if shown at all, are not the complete set. This opens up a related question: Why should a website allow its content to be used to train a language model if it doesn’t get anything in return? That’s a very good question – and probably the most important one we should answer as a society. LLMs do have benefits despite the major shortcomings with the current generation of LLMs (such as hallucinations, lying to the human operators, and biases, to name a few), and these benefits will only increase over time while the shortcomings get worked out. But for this discussion, the important point is to realize that a fundamental pillar of how the open web functions right now is not suited for LLMs…”


Posted in: AI, Internet, Knowledge Management, Legal Research

