Conversation
rlskoeser
left a comment
I wasn't sure which file in the Google Drive I was supposed to use. I tried running the script like this and got an error about a missing `id` column:
```
python src/muse/annotation/build_notion_concept_tasks.py out.jsonl notion-sent-translations-madlad.jsonl --mt-corpus notion-sent-translations-madlad.jsonl
```
Am I giving it the wrong file?
Co-authored-by: Rebecca Sutton Koeser <rlskoeser@users.noreply.github.com>
Ah, sorry, I forgot about the full input necessary for the script. The parallel sentence corpus …
Oops, I thought I did give it the parallel sentence file first, but based on the command I cut and pasted, I obviously did not 🤦‍♀️ (the filenames are very similar and long). I'm able to run it now. I confirmed the translation language with jq; the output only includes "en":
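For anyone without jq handy, the same language check can be sketched with the Python standard library (a minimal sketch, assuming the output JSONL uses the `tr_lang` field named in the reviewer checklist below):

```python
import json

def translation_langs(path):
    """Collect the distinct tr_lang values found in a JSONL file."""
    langs = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                langs.add(json.loads(line)["tr_lang"])
    return langs

# Roughly equivalent to: jq -r '.tr_lang' out.jsonl | sort -u
```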
rlskoeser
left a comment
I was able to run the script and run Prodigy with the script output and the custom recipe. Looks good to me. The disclosure element works well, and I don't think it takes up too much space.
You might want to add a task to do a quick check of getting the annotation data out of the database just to make sure it's exportable and structured the way you want.
Associated Issue(s): #37, #38
Changes in this PR
Notes
Reviewer Checklist
- [ ] `build_notion_concept_tasks.py` runs successfully locally. For this you can use the mt corpora in the project drive [here]
- [ ] `build_notion_concept_tasks.py` output has the following fields: `tr_id`, `pair_id`, `model`, `scr_lang`, `tr_lang`, `src_text`, `ref_text`, `text`, `term`
- [ ] `build_notion_concept_tasks.py` output only includes English machine translations
- [ ] `build_notion_concept_tasks.py` …
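The field check in the list above can be automated with a short stdlib sketch (field names copied from the checklist; the output path is whatever the script writes, so treat it as a placeholder):

```python
import json

# Field names copied from the reviewer checklist.
EXPECTED_FIELDS = {"tr_id", "pair_id", "model", "scr_lang", "tr_lang",
                   "src_text", "ref_text", "text", "term"}

def missing_fields(path):
    """Return (line_number, missing_keys) pairs for JSONL records
    that lack any of the expected fields."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            missing = EXPECTED_FIELDS - json.loads(line).keys()
            if missing:
                problems.append((n, sorted(missing)))
    return problems
```

An empty result means every record carries all the listed fields.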