The Dire Defect of ‘Multilingual’ AI Content Moderation

Social media companies claim new language models can remove harmful content in every language. But those systems’ shortcomings can have vast consequences.
Illustration: James Marshall; Getty Images

Three parts Bosnian text. Thirteen parts Kurdish. Fifty-five parts Swahili. Eleven thousand parts English.

This is part of the data recipe for Facebook’s new large language model, which the company claims is able to detect and rein in harmful content in over 100 languages. Bumble uses similar technology to detect rude and unwanted messages in at least 15 languages. Google uses it for everything from translation to filtering newspaper comment sections. All have comparable recipes and the same dominant ingredient: English-language data.

For years, social media companies have focused their automatic content detection and removal efforts more on content in English than the world’s 7,000 other languages. Facebook left almost 70 percent of Italian- and Spanish-language Covid misinformation unflagged, compared to only 29 percent of similar English-language misinformation. Leaked documents reveal that Arabic-language posts are regularly flagged erroneously as hate speech. Poor local language content moderation has contributed to human rights abuses, including genocide in Myanmar, ethnic violence in Ethiopia, and election disinformation in Brazil. At scale, decisions to host, demote, or take down content directly affect people’s fundamental rights, particularly those of marginalized people with few other avenues to organize or speak freely.

The problem is in part one of political will, but it is also a technical challenge. Building systems that can detect spam, hate speech, and other undesirable content in all of the world’s languages is already difficult. Making it harder is the fact that many languages are "low-resource," meaning they have little digitized text data available to train automated systems. Some of these low-resource languages have few speakers and internet users, but others, like Hindi and Indonesian, are spoken by hundreds of millions of people, multiplying the harms created by errant systems. Even if companies were willing to invest in building individual algorithms for every type of harmful content in every language, they may not have enough data to make those systems work effectively.

A new technology called “multilingual large language models” has fundamentally changed how social media companies approach content moderation. Multilingual language models—as we describe in a new paper—are similar to GPT-4 and other large language models (LLMs), except they learn more general rules of language by training on texts in dozens or hundreds of different languages. They are designed specifically to make connections between languages, allowing them to extrapolate from those languages for which they have a lot of training data, like English, to better handle those for which they have less training data, like Bosnian.
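
To make that mechanism concrete, below is a minimal sketch of zero-shot cross-lingual transfer, the technique these models rely on. It assumes the openly released XLM-R model and two made-up English training examples; it illustrates the general approach, not any company’s production system.

```python
# Minimal sketch: fine-tune a multilingual model on English labels, then apply
# it unchanged to another language. The training data here is hypothetical.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "xlm-roberta-base"  # pretrained on text in roughly 100 languages
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Hypothetical English-only labels (1 = harmful, 0 = benign).
train_texts = ["I will hurt you", "What a lovely day"]
train_labels = torch.tensor([1, 0])

batch = tokenizer(train_texts, return_tensors="pt", padding=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=train_labels).loss  # one toy step stands in for real fine-tuning
loss.backward()
optimizer.step()

# The same weights are then used on a language the model never saw labels for;
# the hope is that what it learned from English transfers.
model.eval()
bosnian = tokenizer(["Kakav divan dan"], return_tensors="pt")  # "What a lovely day"
with torch.no_grad():
    probs = torch.softmax(model(**bosnian).logits, dim=-1)
print(probs)  # not meaningful without real Bosnian evaluation data
```

Whether that transfer actually holds for something as context-dependent as harmful content is exactly the open question.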

These models have proven capable of simple semantic and syntactic tasks in a wide range of languages, like parsing grammar and analyzing sentiment, but it’s not clear how capable they are at the far more language- and context-specific task of content moderation, particularly in languages they are barely trained on. And besides the occasional self-congratulatory blog post, social media companies have revealed little about how well their systems work in the real world.

Why might multilingual models be less able to identify harmful content than social media companies suggest?

One reason is the quality of data they train on, particularly in lower-resourced languages. In the large text data sets often used to train multilingual models, the least-represented languages are also the ones that most often contain text that is offensive, pornographic, poorly machine translated, or just gibberish. Developers sometimes try to make up for poor data by filling the gap with machine-translated text, but even then, the model will still have difficulty understanding language the way people actually speak it. For example, if a language model has only been trained on text machine-translated from English into Cebuano, a language spoken by 20 million people in the Philippines, the model may not have seen the term “kuan,” slang used by native speakers that has no comparable term in other languages.

Another challenge for multilingual models comes from disparities in the amount of data they train on in each language. When analyzing content in languages they have less training data for, the models end up leaning on rules they have inferred about languages they have more data for. This hampers their ability to understand the nuance and contexts unique to lower-resource languages and imports the values and assumptions encoded into English. One of Meta’s multilingual models, for instance, was trained using nearly a thousand times more English text than Burmese, Amharic, or Punjabi text. If its understanding of those languages is refracted through the lens of English, that will certainly affect its ability to detect harmful content related to current events playing out in those languages, like the Rohingya refugee crisis, the Tigray war, and the Indian farmers’ protest.
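
The scale of that imbalance, and the standard workaround, are easy to sketch. The corpus sizes below are illustrative, not Meta’s actual figures; the “temperature” smoothing shown (alpha = 0.3 is the value reported for XLM-R) upsamples low-resource languages, but even then they receive only a sliver of the training mix.

```python
# Hypothetical per-language token counts showing the same skew described above.
corpus_tokens = {
    "English": 55_000_000_000,
    "Hindi":    1_700_000_000,
    "Burmese":     56_000_000,
    "Amharic":     60_000_000,
}

def sampling_probs(sizes, alpha):
    """Exponentially smooth raw proportions: alpha=1 keeps the raw imbalance;
    smaller alpha upsamples low-resource languages at English's expense."""
    total = sum(sizes.values())
    weights = {lang: (n / total) ** alpha for lang, n in sizes.items()}
    z = sum(weights.values())
    return {lang: w / z for lang, w in weights.items()}

for alpha in (1.0, 0.3):
    probs = sampling_probs(corpus_tokens, alpha)
    print(f"alpha={alpha}:", {lang: f"{p:.1%}" for lang, p in probs.items()})
```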

Finally, even if a multilingual language model were trained on equal amounts of high-quality data in every language, it would still face what computer scientists call the “curse of multilinguality”—that is, languages interfere with one another in the ultimate outputs of a model. Different languages compete with each other for space within a multilingual language model’s internal mapping of language. As a result, training a multilingual model on more Hindi data may hurt its performance on tasks in etymologically distinct languages like English or Tagalog, and increasing the total number of languages a model trains on may hurt its performance in all of them.

In the case of content moderation, this raises difficult questions about which languages social media companies should prioritize, and what goals these models should target. Should multilingual language models try to achieve equal performance in all languages? Prioritize ones with the most speakers? The ones facing the most dire content moderation problems? And who decides which crises are the most dire?

Multilingual language models promise to bring the analytical power of LLMs to all the world's languages, but it is still unclear whether their capabilities extend to detecting harmful content. What counts as harmful does not map easily across languages and linguistic contexts. To make sure these models do not lead to disparate impacts on different language communities, social media companies need to offer more insight into how these models work.

At a minimum, companies should share information about which products rely on these models, what kinds of content they're used on, and in what languages they are used. Companies should also share basic metrics on how language models perform in each language, and more information about the training data they use, so researchers can evaluate those data sets for bias and understand the balance the company is striking between different languages. While the biggest companies, like Facebook and Google, do release versions of their language models to the public for researchers and even other companies to use, they are often mum about how those publicly available systems relate to or differ from those used in their own products. These proxies are not enough—companies should share information about the actual language models they use for content moderation as well.
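
To show what such disclosures could look like, here is a small sketch of per-language performance reporting; the labels and predictions are made up, and the point is the shape of the report, not the numbers.

```python
# Hypothetical held-out evaluation sets per language, with per-language metrics.
from sklearn.metrics import precision_recall_fscore_support

evaluations = {
    "English": {"true": [1, 0, 1, 1, 0], "pred": [1, 0, 1, 0, 0]},
    "Swahili": {"true": [1, 0, 1, 1, 0], "pred": [0, 0, 1, 0, 1]},
}

print(f"{'language':<10}{'precision':>10}{'recall':>8}{'f1':>6}")
for lang, data in evaluations.items():
    p, r, f1, _ = precision_recall_fscore_support(
        data["true"], data["pred"], average="binary", zero_division=0
    )
    print(f"{lang:<10}{p:>10.2f}{r:>8.2f}{f1:>6.2f}")
```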

Social media companies should also consider that a better approach may not be using one large multilingual model but multiple, smaller models more tailored to specific languages and language families. Masakhane's AfroLM model, for instance, is trained on 23 different African languages and is able to outperform larger multilingual models in those languages. Research communities all over the world are working hard to figure out what kinds of language models work best for their own languages. Social media companies should draw not only on that technical work but also on these researchers' expertise in local language contexts.

As a solution, multilingual language models run the risk of being a “rest of the world”-sized band-aid to a dynamic problem. By offering more transparency and accountability, prioritizing individual language performance over scalability, and consulting with language communities, companies can start dismantling that approach.

Correction 5/30/23 3:30PT ET: The AfroLM model is from Masakhane. A previous version of the article stated it was from Lelapa.