Step I: `python -m venv venv`<br><br>
Step II: `pip install -r requirements.txt`

- Get the `GEMINI_API_KEY` from Google AI Studio.
- Create the `.env` file and add it as `GEMINI_API_KEY`.

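
A minimal sketch of reading the key back in Python, assuming the `.env` file holds plain `KEY=VALUE` lines. The helper names `load_env_file` and `get_gemini_api_key` are hypothetical and not part of this repo; the notebook may load the key differently (for example with a dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_gemini_api_key(path=".env"):
    """Return GEMINI_API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to the .env file.")
    return key
```

Failing early with a clear message when the key is missing is usually easier to debug than a failed API call later in the notebook.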
Keep the data you want to translate in the `data` directory.
The data should be in JSON format, as a list of objects:

    [
      {
        "query": "cruise portland maine",
        "ad_title": "New England Cruises",
        "ad_description": "Your New England Cruise Awaits! Holland America Line Official Site.",
        "relevance_label": 1
      },
      {
        "query": "transportation to cruise port miami",
        "ad_title": "Holland America Line\u00ae",
        "ad_description": "Explore Your World with Four Extraordinary Offers.",
        "relevance_label": 0
      },
      {
        "query": "transportation to cruise port miami",
        "ad_title": "Holland America Line\u00ae",
        "ad_description": "Cruise to Your Own Private Island In the Caribbean. Learn More Now.",
        "relevance_label": 1
      }
    ]

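
Before translating, it can help to check that a file matches the shape shown above. The following is a rough sketch, not code from this repo: `validate_records` is a hypothetical helper, and `REQUIRED_FIELDS` lists the QADSM field names from the sample, so it would need adjusting for other datasets.

```python
import json

# Field names taken from the QADSM sample; other datasets may use different ones.
REQUIRED_FIELDS = ("query", "ad_title", "ad_description", "relevance_label")


def validate_records(records, required_fields=REQUIRED_FIELDS):
    """Return a list of problems found; an empty list means the data looks fine."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"item {index} is not an object")
            continue
        missing = [name for name in required_fields if name not in record]
        if missing:
            problems.append(f"item {index} is missing fields: {', '.join(missing)}")
    return problems


sample = json.loads("""
[
  {"query": "cruise portland maine",
   "ad_title": "New England Cruises",
   "ad_description": "Your New England Cruise Awaits! Holland America Line Official Site.",
   "relevance_label": 1}
]
""")
print(validate_records(sample))  # prints []
```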
***Notebook for QADSM data is available in `data/QADSM.ipynb`***
You can load data in this format from Hugging Face.
Fields might differ depending on the dataset.
If your data points have different fields, make a new branch before working on them.
The JSON file should not contain more than 50*250 = 12500 elements in the list, because the rate limit for Gemini Flash is currently 250 requests per day.
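
The size limit can be checked before starting a run. This is a small sketch using the numbers stated above; `check_size` and `MAX_ELEMENTS` are hypothetical names, not something the repo defines.

```python
# Numbers from the text above: 50 * 250 = 12500 elements per JSON file.
MAX_ELEMENTS = 50 * 250


def check_size(records, max_elements=MAX_ELEMENTS):
    """Return the number of records, or raise ValueError if it is over the limit."""
    count = len(records)
    if count > max_elements:
        raise ValueError(f"{count} elements exceeds the limit of {max_elements}")
    return count


print(check_size(list(range(12500))))  # prints 12500
```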
The main branch is the default branch and is for the QADSM dataset.

- First create a new branch: `git checkout -b <dataset_name>`
- Update and add the new prompt in `system_instruction.yaml`.
  - Name it: `gemini_translation_system_instruction_<dataset_name>`
  - We do not need to update much, maybe just the field names and format.
- Then change the system instruction we are going to use in the `create_gemini_prompt` function in the `util/utils.py` file, line number 19.

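
The naming rule above can be sketched as follows. This is only an illustration: it assumes the YAML file has already been loaded into a flat mapping from prompt names to prompt text, which may not match the real file layout, and both helper names are hypothetical. It is not the actual `create_gemini_prompt` function.

```python
def system_instruction_key(dataset_name):
    """Build the prompt name following the naming rule described above."""
    return f"gemini_translation_system_instruction_{dataset_name}"


def select_system_instruction(instructions, dataset_name):
    """Look up the prompt for a dataset in an already-loaded mapping (assumed flat)."""
    key = system_instruction_key(dataset_name)
    if key not in instructions:
        raise KeyError(f"{key} not found; add it to system_instruction.yaml")
    return instructions[key]


# Placeholder mapping standing in for the loaded YAML content.
loaded = {"gemini_translation_system_instruction_qadsm": "placeholder prompt text"}
print(system_instruction_key("qadsm"))  # prints gemini_translation_system_instruction_qadsm
```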
Run `gemini/gemini_calls.ipynb`.
Before running, update `INPUT_PATH` and `OUTPUT_PATH`:

- `INPUT_PATH` should be in the following format: `data/<dataset_name>/<data_file_name>.json`
- `OUTPUT_PATH` should be in the following format: `translated_output/<dataset_name>/<output_file_name>.json`

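
One way to build both paths consistently is sketched below. `build_paths` is a hypothetical helper, and the values `qadsm`, `train`, and `train_translated` are placeholder names chosen for illustration, not files known to exist in the repo.

```python
from pathlib import Path


def build_paths(dataset_name, data_file_name, output_file_name):
    """Return (input_path, output_path) in the formats described above."""
    input_path = Path("data") / dataset_name / f"{data_file_name}.json"
    output_path = Path("translated_output") / dataset_name / f"{output_file_name}.json"
    return input_path, output_path


# Placeholder names for illustration only.
INPUT_PATH, OUTPUT_PATH = build_paths("qadsm", "train", "train_translated")
print(INPUT_PATH.as_posix())   # prints data/qadsm/train.json
print(OUTPUT_PATH.as_posix())  # prints translated_output/qadsm/train_translated.json
```

If the output directory does not exist yet, `OUTPUT_PATH.parent.mkdir(parents=True, exist_ok=True)` creates it before the notebook writes the file.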
NOTE: DO NOT CHANGE THE BATCH SIZE FROM 500, because it cannot handle more than 600-800; to be safe, we are using 500.
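
For reference, splitting a list into batches of 500 might look like the sketch below. How the notebook actually batches is not shown here, so `iter_batches` is a hypothetical helper that only illustrates the idea.

```python
BATCH_SIZE = 500  # value fixed by the note above


def iter_batches(records, batch_size=BATCH_SIZE):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


batches = list(iter_batches(list(range(1200))))
print([len(batch) for batch in batches])  # prints [500, 500, 200]
```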

- `git push --set-upstream origin <branch_name>` if you are pushing for the first time.
- `git push` if you are pushing after that.

You can push to the branch after the translated JSON is saved in `translated_output/<dataset_name>/<output_file_name>.json`.
Leave each branch as it is, do not merge!