Meta uses Bible translations in effort to preserve over 4,000 languages

AI team says effort supported by Christian ethicists, but 'same is not true' for Quran, other texts

Jun 15, 2023 - 00:38

When it comes to Bible translations, we might be well on our way to the Gospel being preached in “every tongue.”

Meta, the parent company of Facebook and Instagram, has announced plans to use the Bible and other religious texts to process over 4,000 languages with the ultimate goal of preserving those languages.

But why Scripture? 

As part of its Massively Multilingual Speech (MMS) project, the Meta Artificial Intelligence research team faced the daunting prospect of gathering speech data for thousands of languages, since existing speech datasets top out at no more than 100 languages. To fill that gap, the team turned to the Bible and other religious texts.

Because the Bible has been translated throughout the world, and audio recordings of those translations are widely shared, researchers were able to create a dataset of New Testament readings in more than 1,100 languages, with an average of 32 hours of data per language.

In order to cover the 27 books and 260 chapters in the New Testament, Meta used data from Bible.com, GoTo.Bible and FaithComesByHearing.com, including original text as well as audio recordings.

The Meta AI team then built on that work by using “unlabeled recordings of various other Christian religious readings,” and while these are primarily spoken by male speakers, Meta researchers believe the language models “perform equally well for male and female voices.”

Researchers also held “consultations” with Christian ethicists, who, according to Meta, “concluded that most Christians would not regard the New Testament, and translations thereof, as too sacred to be used in machine learning.”

They also said, however, that the “same is not true for all religious texts: for example, the Quran was originally not supposed to be translated.” 

The paper also underscored precedent for using the Bible in such a manner, citing the 2019 CMU Wilderness effort that created speech synthesis models for nearly 700 languages. 

The MMS project, said the team, “follows a long line of research utilizing the New Testament to train and evaluate machine learning models.”

The research paper also noted a risk of “religious training data” influencing and potentially even creating bias for the language models “with respect to a particular worldview,” presumably linked to the Bible and Christianity.

However, the Meta AI team discounted the risk, adding that the models “exhibit only little bias compared to baseline models trained on other domains.”

Ultimately, according to the researchers, the project aims to preserve languages that could go extinct in the coming years.

“Many of the world’s languages are in danger of disappearing, and the limitations of current speech recognition and speech generation technology will only accelerate this trend,” the team said. “We envision a world where technology has the opposite effect, encouraging people to keep their languages alive since they can access information and use technology by speaking in their preferred language.”

Despite the unprecedented scale of Meta's project, it is not the first time AI has been used to amplify the timeless message of the Bible.

In June 2020, the makers of the Christian meditation app Soultime released what was then the world’s first-ever audio version of the Bible read in its entirety by an artificial intelligence voice. The finished product featured 100 hours of audible Scripture.

Developers with the company previously told The Christian Post that they “evaluated a range of text-to-speech platforms but found Google’s Wavenet the most natural sounding.” But “because the Bible text is extremely complex,” the app developers had to “work hard to modify the basic reading to create something that both sounded natural and was truly enjoyable to listen to.”