Back Translation of a sample dataset using googletrans
Here is example Python code for back translation of a sample dataset using the googletrans library:
from googletrans import Translator
import random

# Assumes a googletrans release whose translate() call is synchronous;
# some newer releases expose an async API and would need `await` here.
translator = Translator()

def back_translate(sentence, lang):
    """
    Translates a sentence to the given language and back to English
    """
    try:
        # Translate to the given language
        translation = translator.translate(sentence, dest=lang)
        # Translate back to English
        back_translation = translator.translate(translation.text, dest='en')
        return back_translation.text
    except Exception:
        # Return the original sentence if there's an error with translation
        return sentence

# Example usage
dataset = ["The quick brown fox jumps over the lazy dog",
           "I love to code in Python",
           "Artificial Intelligence is the future"]
back_translated = []
for sentence in dataset:
    # Randomly select a language other than English
    lang = random.choice(['fr', 'es', 'de', 'ru', 'ja', 'ko'])
    # Back translate the sentence to English
    back_translated_sentence = back_translate(sentence, lang)
    back_translated.append(back_translated_sentence)
print("Original Dataset: ", dataset)
print("Back Translated Dataset: ", back_translated)
This code uses the googletrans library to translate each sentence into a randomly chosen language other than English and then back into English, creating a new variation of the sentence. The back_translate() function takes a sentence and a language code as input, translates the sentence into the given language, and then translates it back into English. If a translation call raises an error, the function returns the original sentence unchanged. The random.choice() function selects a language at random from a list of candidate languages. The back-translated sentences are stored in a new list, back_translated.
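Since back translation is usually used to enlarge a dataset, it can help to keep the original sentences and add only the variants that actually differ from them. The following is a minimal sketch of that idea, not part of the original example. The names augment_with_back_translation, translate_fn, pivot_langs and fake_translate are hypothetical, and the translator is passed in as a plain function so the logic can be tried offline with a stand-in.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical signature: translate_fn(text, src, dest) -> translated text.
TranslateFn = Callable[[str, str, str], str]


def augment_with_back_translation(
    sentences: Sequence[str],
    translate_fn: TranslateFn,
    pivot_langs: Sequence[str] = ("fr", "es", "de"),
    source_lang: str = "en",
    seed: int = 0,
) -> List[str]:
    """Return the original sentences plus back-translated variants that differ."""
    rng = random.Random(seed)  # local RNG so pivot choices are reproducible
    augmented = list(sentences)
    # Normalized forms already present, used to skip duplicates.
    seen = {s.strip().lower() for s in sentences}
    for sentence in sentences:
        pivot = rng.choice(list(pivot_langs))
        try:
            pivot_text = translate_fn(sentence, source_lang, pivot)
            variant = translate_fn(pivot_text, pivot, source_lang)
        except Exception:
            continue  # skip sentences the translator could not handle
        key = variant.strip().lower()
        if key and key not in seen:
            seen.add(key)
            augmented.append(variant)
    return augmented


def fake_translate(text: str, src: str, dest: str) -> str:
    """Offline stand-in for a real translator (assumption, for illustration only)."""
    if dest == "en":
        # Pretend the round trip paraphrases one word.
        return text.replace("[pivot] ", "").replace("love", "like")
    return "[pivot] " + text


sample = ["I love to code in Python",
          "Artificial Intelligence is the future"]
print(augment_with_back_translation(sample, fake_translate))
```

With a synchronous googletrans release, translate_fn could plausibly be a small wrapper such as lambda text, src, dest: translator.translate(text, src=src, dest=dest).text, though that wiring is an assumption and is not exercised here.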