AI hate speech detectors show major inconsistencies, new study reveals

PsyPost: “A new large-scale analysis has found that the artificial intelligence systems used by technology companies to filter online hate speech are profoundly inconsistent. The research demonstrates that the same piece of content can be flagged as hateful by one system while being considered acceptable by another, with these disagreements being particularly pronounced for speech targeting specific demographic groups. This means a platform’s choice of moderation tool fundamentally shapes what speech is permitted in its digital space. The study was published in Findings of the Association for Computational Linguistics. Researchers from the Annenberg School for Communication at the University of Pennsylvania conducted the study to address a growing concern about online content moderation. As online hate speech has become more common, its negative effects on mental health and political polarization have been well documented. In response, major technology firms have developed and deployed automated systems, often powered by large language models, to filter this content at a massive scale. Yet these private companies have effectively become the arbiters of acceptable speech online without a consistent or transparent standard. The researchers identified a critical gap in knowledge: no one had systematically compared these different AI systems to see if they agreed on what constitutes hate speech. This lack of comparative analysis raises serious questions about fairness and predictability, as inconsistent moderation can appear arbitrary, erode public trust, and provide uneven levels of protection for different communities…”

Source: Neil Fasching and Yphtach Lelkes. 2025. Model-Dependent Moderation: Inconsistencies in Hate Speech Detection Across LLM-based Systems. In Findings of the Association for Computational Linguistics: ACL 2025, pages 22271–22285, Vienna, Austria. Association for Computational Linguistics.
