Back Translation of a sample dataset using googletrans

Here’s example Python code for back translation of a sample dataset using the googletrans library:

from googletrans import Translator
import random

translator = Translator()

def back_translate(sentence, lang):
    """
    Translates an English sentence to the given language and back to English
    """
    try:
        # Translate to the given language
        translation = translator.translate(sentence, dest=lang)
        # Translate back to English
        back_translation = translator.translate(translation.text, dest='en')
        return back_translation.text
    except Exception:
        # Return the original sentence if there's an error with translation
        # (catching Exception rather than using a bare except lets Ctrl+C still work)
        return sentence

# Example usage
dataset = ["The quick brown fox jumps over the lazy dog",
           "I love to code in Python",
           "Artificial Intelligence is the future"]

back_translated = []

for sentence in dataset:
    # Randomly select a language other than English
    lang = random.choice(['fr', 'es', 'de', 'ru', 'ja', 'ko'])
    # Back translate the sentence to English
    back_translated_sentence = back_translate(sentence, lang)
    back_translated.append(back_translated_sentence)

print("Original Dataset: ", dataset)
print("Back Translated Dataset: ", back_translated)

This code uses the googletrans library to translate each sentence into a randomly chosen language other than English and then back into English, producing a new variation of the sentence. The back_translate() function takes a sentence and a language code, translates the sentence into that language, and translates the result back into English. If either translation call raises an error, the function returns the original sentence unchanged. The random.choice() function picks the intermediate language from a fixed list of language codes. The back-translated sentences are collected in a new list, back_translated.
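Because back_translate() falls back to the original sentence on any error, a failed translation looks the same as a successful one that happened to return identical text. The sketch below is one possible way to handle that, not part of the original example: it keeps only the sentences that actually changed and records which language was used. The names augment_with_back_translation and fake_translate are hypothetical, and fake_translate is an offline stand-in for a real translator so the example runs without network access.

```python
import random


def augment_with_back_translation(dataset, translate_fn,
                                  languages=("fr", "es", "de"), seed=0):
    """Back-translate each sentence and keep only the ones that changed.

    translate_fn is any callable taking (text, src, dest) and returning text.
    Returns a list of (original, back_translated, pivot_language) tuples.
    """
    rng = random.Random(seed)  # seeded so the language choice is reproducible
    augmented = []
    for sentence in dataset:
        lang = rng.choice(list(languages))
        pivot = translate_fn(sentence, "en", lang)
        result = translate_fn(pivot, lang, "en")
        # Skip outputs identical to the input: they add no variation and may
        # be the result of a silent fallback after a failed translation.
        if result.strip().lower() != sentence.strip().lower():
            augmented.append((sentence, result, lang))
    return augmented


def fake_translate(text, src, dest):
    """Hypothetical offline stand-in for a real translation call."""
    if dest != "en":
        # "Translate" by tagging the text with the target language
        return f"<{dest}>{text}"
    # "Translate back": drop the tag and simulate a small paraphrase
    stripped = text.split(">", 1)[1] if text.startswith("<") else text
    return stripped.replace("quick", "fast")


sample = ["The quick fox", "I love Python"]
for original, variant, lang in augment_with_back_translation(sample, fake_translate):
    print(f"{original!r} -> {variant!r} (via {lang})")
```

To use this with googletrans, translate_fn could be a small wrapper around translator.translate() that returns the .text attribute of the result; that wrapper is left out here so the example stays runnable offline.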
