{"id":144,"date":"2023-03-17T11:47:11","date_gmt":"2023-03-17T11:47:11","guid":{"rendered":"https:\/\/smartsource.com.sg\/blog\/?p=144"},"modified":"2023-03-17T11:47:42","modified_gmt":"2023-03-17T11:47:42","slug":"data-augmentation-using-back-translation","status":"publish","type":"post","link":"https:\/\/smartsource.com.sg\/blog\/index.php\/2023\/03\/17\/data-augmentation-using-back-translation\/","title":{"rendered":"Data Augmentation using Back Translation"},"content":{"rendered":"\n<p>If you have a CSV file with only a few rows and columns and want to generate a larger dataset, back translation is a suitable data augmentation technique: each text is translated into another language and then back into the original language, which yields a paraphrase of it. The code below keeps all columns from a sample of your CSV and writes a new CSV file with the generated text added.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd\r\n\r\n# Load the CSV file into a Pandas DataFrame\r\ndf = pd.read_csv('sample.csv')\r\n\r\n# Select a random sample of up to 10 rows from the DataFrame\r\n# (sample() raises an error if n exceeds the number of rows)\r\nsample_df = df.sample(n=min(10, len(df)))\r\n\r\n# Generate new data\r\nnew_text_data = &#91;]\r\nfor text in sample_df&#91;'text']:\r\n    # Apply your data augmentation technique here.\r\n    # back_translation() is a placeholder that you must define yourself,\r\n    # e.g. translate the text into another language and back again.\r\n    new_text = back_translation(text)\r\n    new_text_data.append(new_text)\r\n\r\n# Create a new DataFrame with the original columns plus the generated text\r\nnew_df = sample_df.copy()\r\nnew_df&#91;'new_text'] = new_text_data\r\n\r\n# Write the new DataFrame to a CSV file\r\nnew_df.to_csv('new_sample.csv', index=False)\r\n\r\nprint('New CSV file with generated data saved successfully.')\r\n<\/code><\/pre>\n\n\n\n<p>In this code, we first load the CSV file into a Pandas DataFrame and select a random sample of up to 10 rows using the <code>sample()<\/code> method. Then, we generate new data using your data augmentation technique of choice. 
In this example, I&#8217;ve used a <code>back_translation()<\/code> function, which is a placeholder: the code does not define it, so you need to supply your own implementation.<\/p>\n\n\n\n<p>Next, we create a new DataFrame with the original columns from the sample DataFrame and a <code>new_text<\/code> column containing the generated text. We then write the new DataFrame to a CSV file using the <code>to_csv()<\/code> method, specifying <code>index=False<\/code> to exclude the index column from the CSV file.<\/p>\n\n\n\n<p>Finally, we print a message to confirm that the new CSV file has been saved successfully. You&#8217;ll need to replace <code>'sample.csv'<\/code> and <code>'new_sample.csv'<\/code> with your own input and output file names, change <code>'text'<\/code> to the name of the column that holds your text, and modify the data augmentation technique as needed.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you have a CSV file with only a few rows and columns and want to generate a larger dataset, back translation is&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19],"tags":[109,72,68,64],"class_list":["post-144","post","type-post","status-publish","format-standard","hentry","category-tutorials","tag-back-translation","tag-data-augmentation","tag-dataset","tag-python"],"_links":{"self":[{"href":"https:\/\/smartsource.com.sg\/blog\/index.php\/wp-json\/wp\/v2\/posts\/144","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/smartsource.com.sg\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/smartsource.com.sg\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/smartsource.com.sg\/blog\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/smartsource.com.sg\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=144"}],"version-history":[{"count":1,"href":"https:\/\/smartsource.com.sg\/blog\/index.php\/wp-json\/wp\/v2\/posts\/144\/revisions"}],"predecessor-ver
sion":[{"id":145,"href":"https:\/\/smartsource.com.sg\/blog\/index.php\/wp-json\/wp\/v2\/posts\/144\/revisions\/145"}],"wp:attachment":[{"href":"https:\/\/smartsource.com.sg\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=144"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/smartsource.com.sg\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=144"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/smartsource.com.sg\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=144"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}