AWS study: AI translations are usually of poor quality

    According to an AWS study, 57.1% of the text content on the internet has been translated by artificial intelligence, and the quality is poor. Machine translations are particularly poor in less widely used languages, which raises concerns for the future training of large language models.

    The AWS study on AI translations

    The AWS study on AI translations was conducted by researchers at AI Labs. To obtain usable results, they analyzed 6.38 billion sentences from content on the internet.

    The finding: more than half of the text content on the internet consists of poor machine translations. Most content is available in two or more languages, and the quality is modest at best.

    The researchers at AI Labs came up with the idea for the AWS study because colleagues in the industry pointed out to them that many texts on the internet were obviously machine-translated, especially in less widely used languages.

    To find out the extent of machine translation in less widely used languages, they conducted the AWS study and found that 57.1% of textual content on the internet comes from machine translation.

    The 6.38 billion sentences selected for the study were checked for direct translations into more than one language. It was found that less common languages in particular suffer from poor translations.
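
    The kind of check described above can be illustrated with a minimal sketch. It is not the researchers' actual method: the function name, the toy data, and the threshold of three languages are assumptions chosen for illustration. Each group maps a language code to one version of the same sentence, and the function reports what share of all sentences belongs to a group that spans at least a given number of languages.

```python
def multiway_share(groups, min_languages=3):
    """Fraction of sentences that sit in a translation group
    covering at least `min_languages` languages."""
    total = 0
    multiway = 0
    for group in groups:
        n_sentences = len(group)  # one sentence per language in the group
        total += n_sentences
        if n_sentences >= min_languages:
            multiway += n_sentences
    return multiway / total if total else 0.0


# Hypothetical toy data, not taken from the study.
toy_groups = [
    {"en": "Hello", "fr": "Bonjour", "de": "Hallo", "xh": "Molo"},
    {"en": "Thank you", "fr": "Merci"},
    {"en": "Good night"},
    {"en": "Water", "fr": "Eau", "wo": "Ndox"},
]

# Threshold of 3 languages is an assumption for this example.
share = multiway_share(toy_groups, min_languages=3)
print(f"{share:.1%} of the toy sentences exist in 3 or more languages")
```

    In this toy data set, 7 of the 10 sentences belong to a group with three or more languages. A real analysis would first have to find which sentences are translations of each other, which is the hard part and is not shown here.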

    Risks of AI translations according to the AWS study

    According to the AWS study on AI translations, texts are generally translated much more frequently into the world languages. English and French are at the top of the list, and these languages showed the most parallels among the texts reviewed. Many of the sentences were available in four languages: one original and three translations.

    The results are different for languages that are not widely spoken. For example, the African languages Wolof and Xhosa were examined, and it was found that they occur only half as often and that the quality of machine translations into these languages is poor.

    Overall, the AWS study also showed that the quality of translations decreases as the number of languages a text is translated into increases.

    The resulting problems affect the further training of AI models for machine translation. If these models learn from poor translations, they also adopt this poor quality. And since the majority of texts, especially in lesser-known languages, consist exclusively of these poor translations, there is no basis for better machine training in these languages.

    AI models are trained on billions of data points to reach the level of performance we see today. This training data comes from the internet. Developers of AI models would therefore have to carry out better quality control of the training data. But this takes a huge effort, especially if you don't speak the languages yourself.
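
    One simple form such quality control could take is sketched below. It builds on the study's observation that quality drops as a text appears in more languages, and keeps only sentence groups that span a small number of languages. The function name, the threshold, and the sample data are hypothetical, and this heuristic is only an illustration, not a method proposed in the study.

```python
def filter_by_parallelism(groups, max_languages=2):
    """Keep only translation groups that span at most `max_languages`
    languages; heavily multiplied texts are dropped as likely
    low-quality machine translation."""
    return [group for group in groups if len(group) <= max_languages]


# Hypothetical sample: each dict maps a language code to a sentence.
candidate_groups = [
    {"en": "Hello", "fr": "Bonjour", "de": "Hallo", "xh": "Molo"},
    {"en": "Thank you", "fr": "Merci"},
    {"en": "Good night"},
]

kept = filter_by_parallelism(candidate_groups, max_languages=2)
print(f"kept {len(kept)} of {len(candidate_groups)} groups")
```

    A filter like this needs no knowledge of the languages involved, but it is crude: it also discards good human translations that happen to exist in many languages.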

    It remains to be seen whether better training methods can be developed in the future to ensure higher-quality translations.

    Learn AWS Machine Learning & AI with skill it

    If you want to find out more about machine learning and training methods for AI models, we have the right seminars for you.

    In the four-day course "The Machine Learning Pipeline on AWS", you will implement a project yourself with the help of AI to solve a given problem.

    With "Amazon SageMaker Studio for Data Scientists", you can develop machine learning models yourself. In our course, you will learn how to apply this in practice.

    These courses require a good knowledge of English.

    Author
    Kia Figge
    As the founder of Textflamme, Kia has been writing for companies from all industries for over 10 years. She has written texts for countless websites and blogs and feels at home in the field of information technology.