Data Augmentation using Back Translation
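If you have a CSV file with only a few rows and columns and you want to generate a bigger dataset through data augmentation, back translation is a good fit. Back translation translates a piece of text into another language and then back into the original language, which usually yields a paraphrase of the original. Here is the code to keep all columns from a sample of your CSV and create a new CSV file that adds the generated text. It assumes the text to augment is in a column named 'text'.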
import pandas as pd


def back_translation(text):
    # Placeholder: replace the body with a real back-translation step
    # (translate to another language and back). As written, it returns
    # the text unchanged so that the script runs end to end.
    return text


# Load the CSV file into a Pandas DataFrame
df = pd.read_csv('sample.csv')
# Select a random sample of up to 10 rows (sample() fails if n > len(df))
sample_df = df.sample(n=min(10, len(df)))
# Generate new data from the 'text' column
new_text_data = []
for text in sample_df['text']:
    # Apply your data augmentation technique here
    # For example, you can use back-translation
    new_text = back_translation(text)
    new_text_data.append(new_text)
# Create a new DataFrame with the original columns plus the generated data
new_df = sample_df.copy()
new_df['new_text'] = new_text_data
# Write the new DataFrame to a CSV file
new_df.to_csv('new_sample.csv', index=False)
print('New CSV file with generated data saved successfully.')
In this code, we first load the CSV file into a Pandas DataFrame and select a random sample of rows (10 in this example) using the sample() method. Then, we generate new data using your data augmentation technique of choice. In this example, I've used a back_translation() function. It is not part of pandas, so you need to provide its implementation yourself.
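Since back_translation() is left open above, here is a minimal sketch of what such a function could look like. It assumes you have two translation callables, one into a pivot language and one back. The names make_back_translator, toy_en_to_fr and toy_fr_to_en are hypothetical, and the dictionary lookups are toy stand-ins for a real translation model or service, not an actual translator.

```python
from typing import Callable

Translator = Callable[[str], str]


def make_back_translator(to_pivot: Translator, from_pivot: Translator) -> Translator:
    """Build a back_translation(text) function from two translation callables."""

    def back_translation(text: str) -> str:
        # Step 1: translate the source text into the pivot language.
        pivot_text = to_pivot(text)
        # Step 2: translate the pivot text back into the source language.
        return from_pivot(pivot_text)

    return back_translation


# Toy word-by-word dictionaries (hypothetical stand-ins for real translators).
EN_TO_FR = {"the": "la", "quick": "rapide", "fast": "rapide", "car": "voiture"}
FR_TO_EN = {"la": "the", "rapide": "fast", "voiture": "car"}


def toy_en_to_fr(text: str) -> str:
    # Unknown words are passed through unchanged.
    return " ".join(EN_TO_FR.get(word, word) for word in text.split())


def toy_fr_to_en(text: str) -> str:
    return " ".join(FR_TO_EN.get(word, word) for word in text.split())


back_translation = make_back_translator(toy_en_to_fr, toy_fr_to_en)
print(back_translation("the quick car"))  # prints "the fast car"
```

The round trip turns "quick" into "fast" because both words map to the same pivot word. That is the effect back translation relies on: the meaning is kept while the wording changes. To use it for real, pass two functions that call the translation tool of your choice in place of the toy ones.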
Next, we create a new DataFrame with the original columns from the sample DataFrame and a new column containing the generated text data. We then write the new DataFrame to a CSV file using the to_csv() method, and specify index=False to exclude the index column from the CSV file.
Finally, we print a message to confirm that the new CSV file has been saved. You'll need to replace 'sample.csv' and 'new_sample.csv' with your own input and output file names, change 'text' to the name of the column you want to augment, and modify the data augmentation technique as needed.
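The script above writes only the sampled rows with one extra column, so the output has no more rows than the input. If the goal is a bigger dataset, one option is to stack augmented copies of the rows under the originals. The sketch below is one way to do that, not the only one. The helper name augment_rows, the example data and the upper-casing augmenter are all hypothetical; upper-casing is not back translation and is used only to make the effect visible.

```python
import pandas as pd


def augment_rows(df: pd.DataFrame, text_column: str, augment) -> pd.DataFrame:
    """Return df plus one augmented copy of each row (text_column rewritten)."""
    augmented = df.copy()
    augmented[text_column] = augmented[text_column].map(augment)
    # Stack originals and augmented copies, then drop exact duplicate rows
    # (for example when the augmented text came back unchanged).
    combined = pd.concat([df, augmented], ignore_index=True)
    return combined.drop_duplicates(ignore_index=True)


# Hypothetical example data; in practice df would come from pd.read_csv().
df = pd.DataFrame({"text": ["good movie", "bad plot"], "label": [1, 0]})

# Stand-in augmenter: replace str.upper with your back_translation function.
bigger_df = augment_rows(df, "text", str.upper)
print(len(df), len(bigger_df))  # prints "2 4"
```

All other columns, such as labels, are copied as they are, so each augmented row keeps the label of the row it was derived from. You can then write bigger_df with to_csv() exactly as before.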