Skip to content

feat(nlp): tuning prompt for regional language translation#1156

Open
madsysharma wants to merge 2 commits into
viru0909-dev:mainfrom
madsysharma:feat/hinglish-dialogue
Open

feat(nlp): tuning prompt for regional language translation#1156
madsysharma wants to merge 2 commits into
viru0909-dev:mainfrom
madsysharma:feat/hinglish-dialogue

Conversation

@madsysharma
Copy link
Copy Markdown
Contributor

@madsysharma madsysharma commented Jun 4, 2026

Pull Request: Tune prompt for regional language translation

Summary

The Hinglish dialogue spoken by the Nyay Saarthi avatar sometimes came out in a very formal, Sanskritised "shuddh" Hindi (words like nyayalaya, adhiniyam, praavdhaan, kshatipoorti), which is harder for everyday users to understand than the plain, mixed Hinglish they actually speak. This PR tunes the HINGLISH_CONVERSION_PROMPT system prompt in nlp-orchestrator/avatar_speech.py so the model produces colloquial, easy-to-understand Hinglish, and adds tests to lock the new register in.

Closes #849

What's in this PR

nlp-orchestrator/avatar_speech.py: rewrote the system prompt used by convert_to_hinglish() to steer register without touching any conversion logic, the model, generation params, or the {markdown_answer} placeholder contract:

  • Register guidance up top: instructs the model to talk like an educated bilingual Indian explaining things to a friend, not like a government notice, news anchor, or court order, and to prefer the easy word over the heavy "shuddh"/Sanskritised one.

  • Keep-in-English whitelist: common terms users already know in English (court, police, FIR, bail, case, lawyer, compensation, insurance, etc.) are explicitly kept in English rather than force-translated into formal Hindi.

  • Avoid-list with substitutions: an explicit "use this, not that" mapping (eg: nyayalaya -> court, adhiniyam -> Act/kanoon, kshatipoorti -> compensation/muaavza) to anchor the model away from the specific formal words that triggered the issue.

  • One-shot style example: a short sample line in the target register so the model matches tone and not just rules.

  • Preserved all original constraints: 4 to 6 sentence spoken length, warm tone, accurate section numbers / law names, aap (not tum), and plain text only (no markdown) for text-to-speech.
    nlp-orchestrator/tests/test_avatar_speech.py: new test module (there was none for this file before):

  • Prompt-contract regression guards: the placeholder is intact and .format() still works; the colloquial-register cue is present; the formal words to avoid and the English terms to keep are named; the original constraints (length, aap/tum, plain-text, persona, section accuracy) all survive.

  • convert_to_hinglish() behaviour with a mocked Groq client (no network): output is stripped and built from the tuned prompt; the error path falls back to the first three sentences.

  • Light coverage of the untouched detect_domain / get_interim_messages helpers.

Why prompt-only (no logic change)

The only LLM-driven translation path is this system prompt; report_generator's avatar script uses fixed templates that are already colloquial. Keeping the change to the prompt string makes it low-risk, reviewable, and easy to iterate on.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Testing

PS C:\Users\madhu\Documents\nyay-setu-working\nlp-orchestrator> python -m pytest tests/test_avatar_speech.py -v
================================================= test session starts =================================================
platform win32 -- Python 3.11.7, pytest-9.0.3, pluggy-1.6.0 -- C:\Program Files\Python311\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\madhu\Documents\nyay-setu-working\nlp-orchestrator
plugins: anyio-4.13.0, deepeval-4.0.5, Faker-40.19.1, langsmith-0.8.8, asyncio-1.4.0, cov-4.1.0, mock-3.15.1, repeat-0.9.4, rerunfailures-16.3, xdist-3.8.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 21 items

tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_still_accepts_the_answer_placeholder PASSED [  4%]
tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_formats_without_keyerror PASSED             [  9%]
tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_demands_colloquial_register PASSED          [ 14%]
tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_names_formal_words_to_avoid[nyayalaya] PASSED [ 19%]
tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_names_formal_words_to_avoid[adhiniyam] PASSED [ 23%]
tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_names_formal_words_to_avoid[praavdhaan] PASSED [ 28%]
tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_names_formal_words_to_avoid[kshatipoorti] PASSED [ 33%]
tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_names_formal_words_to_avoid[vidhik] PASSED  [ 38%]
tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_whitelists_common_english_terms[court] PASSED [ 42%]
tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_whitelists_common_english_terms[police] PASSED [ 47%]
tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_whitelists_common_english_terms[fir] PASSED [ 52%]
tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_whitelists_common_english_terms[bail] PASSED [ 57%]
tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_whitelists_common_english_terms[compensation] PASSED [ 61%]
tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_whitelists_common_english_terms[insurance] PASSED [ 66%]
tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_preserves_original_constraints PASSED       [ 71%]
tests/test_avatar_speech.py::TestHinglishPromptRegister::test_prompt_keeps_legal_accuracy_requirement PASSED     [ 76%]
tests/test_avatar_speech.py::TestConvertToHinglish::test_returns_stripped_model_output PASSED                    [ 80%]
tests/test_avatar_speech.py::TestConvertToHinglish::test_falls_back_to_first_sentences_on_error PASSED           [ 85%]
tests/test_avatar_speech.py::TestDomainHelpers::test_detect_domain_matches_keywords PASSED                       [ 90%]
tests/test_avatar_speech.py::TestDomainHelpers::test_detect_domain_defaults_to_general PASSED                    [ 95%]
tests/test_avatar_speech.py::TestDomainHelpers::test_get_interim_messages_respects_count_and_dedupes PASSED      [100%]Running teardown with pytest sessionfinish...


================================================= 21 passed in 1.00s ==================================================

Manual testing

Before-and-after harness:

PS C:\Users\madhu\Documents\nyay-setu-working\nlp-orchestrator> python compare_hinglish_prompts.py --dry-run
Dry run — scanning a sample formal sentence:
  text : Aapke case mein nyayalaya jaana padega. Vidhik prakriya ke anusaar kshatipoorti ke liye aavedan karein.
  hits : ['nyayalaya', 'vidhik', 'prakriya', 'kshatipoorti', 'aavedan']

New prompt avoid-list mentions found in shipped prompt:
  ['nyayalaya', 'vidhik', 'vaidhanik', 'adhiniyam', 'praavdhaan', 'prakriya', 'kshatipoorti', 'abhiyukt', 'yachika', 'aavedan', 'vivaran', 'upalabdh', 'sambandhit', 'tatpashchaat', 'kripya', 'pradaan', 'prapt']
PS C:\Users\madhu\Documents\nyay-setu-working\nlp-orchestrator> python compare_hinglish_prompts.py
========================================================================
DOMAIN: accident
------------------------------------------------------------------------
[OLD]  formal words: none
Aapko pata hona chahiye ki Motor Vehicles Act, 1988 ki Section 166 ke tahat, aap road accident ke baad Motor Accident Claims Tribunal me compensation ke liye claim file kar sakte hain. Aapko FIR ki copy aur offending vehicle ke insurance details lena hoga. Driver ki lado par negligence prove karna zaroori hai, tabhi aapka claim succeed hoga. Aap chinta na karein, hum aapki madad karenge, bas humein apni puri jaankari dein. Aapko nyay milne ki poori ummeed hai, aage badhte hain.
------------------------------------------------------------------------
[NEW]  formal words: none
Dekhiye, aapke case mein Motor Vehicles Act ka Section 166 lagta hai — iska matlab hai ki aap accident ke liye compensation claim kar sakte hain. Pehle ek police complaint aur FIR ki copy le lijiye, aur offending vehicle ki insurance details bhi milayein. Driver ki ladoochhedi ko prove karna hoga, tabhi aapka claim successful hoga. Ghabraaiye mat, main aapko har step samjha dunga, aapko apne haq ka claim milne mein madad karunga.

========================================================================
DOMAIN: criminal
------------------------------------------------------------------------
[OLD]  formal words: none
Aap agar koi cognizable offence report karna chahte hain, to aap Section 173 ke tahat Bharatiya Nagarik Suraksha Sanhita, 2023 ke under FIR darj kara sakte hain. Police ko yeh FIR register karna zaroori hai, lekin agar ve mana karte hain, to aap Superintendent of Police se sampark kar sakte hain ya Magistrate ke saamne complaint file kar sakte hain. Aapko chinta nahin karni chahiye, aapka adhikar hai aur hum aapke saath hain. Aapko nyay milega, aap himmat mat haariye.
------------------------------------------------------------------------
[NEW]  formal words: none
Dekhiye, aap koi cognizable offence report karna chahte hain, to aap Section 173 ke tahat FIR darj kara sakte hain. Police ko yeh FIR register karna zaroori hai, lekin agar ve mana karte hain, to aap Superintendent of Police ke paas ja sakte hain ya Magistrate ke saamne complaint file kar sakte hain. Yeh process thoda lamba hai, lekin aapko apne rights ke liye ladna padta hai. Main aapko madad karunga, ghabraaiye mat, kanoon aapke saath hai. Aap apni complaint lekar aaiye, hum saath mein court mein jaenge.

========================================================================
DOMAIN: consumer
------------------------------------------------------------------------
[OLD]  formal words: none
Aapko Consumer Protection Act, 2019 ke tehet, agar aapko koi defective product ya deficient service mila hai, to aap District Consumer Disputes Redressal Commission mein complaint file kar sakte hain. Section 2 ke hisaab se, aapko do saal ke andar complaint file karni hogi. Aap refund ya compensation bhi maang sakte hain, jo aapke nuksan ke hisaab se decide kiya jayega. Aapko chinta nahi karni chahiye, Nyay Saarthi aapke saath hai, aap apna haq le sakte hain.
------------------------------------------------------------------------
[NEW]  formal words: none
Dekhiye, aapko Consumer Protection Act, 2019 ke tahat complaint file karni hai, agar aapko koi defective product mila hai ya service thik nahin mili. Aapko District Consumer Disputes Redressal Commission mein complaint deni hogi, aur wo bhi do saal ke andar, jab se aapko nuksan hua hai. Aap refund ya compensation manga sakte hain, jo bhi aapko sahi lage. Ghabraaiye mat, main aapko har step samjha dunga, aapko apne haq milenge.

========================================================================
TOTAL formal-word hits  ->  OLD: 0   NEW: 0
Expected: NEW <= OLD (fewer formal words). Also eyeball readability.
========================================================================

Testing the actual function:

PS C:\Users\madhu\Documents\nyay-setu-working\nlp-orchestrator> python -c "import asyncio; from avatar_speech import convert_to_hinglish; print(asyncio.run(convert_to_hinglish('Under Section 138 of the Negotiable Instruments Act, a dishonoured cheque is a criminal offence.')))"
Dekhiye, aapke case mein Negotiable Instruments Act ka Section 138 lagta hai — iska matlab hai ki agar koi cheque dishonour ho jata hai, to yeh ek criminal offence hai. Aapko court mein case file karna padega aur judge ke saamne apni baat rakhni hogi. Police ko bhi FIR darj karwani hogi, aur phir aapko lawyer ki madad leni hogi. Ghabraaiye mat, main aapko har step samjha dunga. Aapke rights ki protection ke liye kanoon aapke saath hai, chinta mat kijiye.

Running the service:
NOTE: this is after running uvicorn main:app --port 8001 --reload on a separate terminal.

PS C:\Users\madhu> $body = @{ query = "My bike was hit by a car, what can I claim?"; language = "hinglish" } | ConvertTo-Json
PS C:\Users\madhu> Invoke-RestMethod -Uri "http://localhost:8001/api/legal/analyze" -Method Post -ContentType "application/json" -Body $body | ConvertTo-Json -Depth 10
{
    "query":  "My bike was hit by a car, what can I claim?",
    "sub_questions":  [
                          "What are the provisions under the Motor Vehicles Act (MVA) for claiming compensation in a road accident?",
                          "What is the process for filing a claim under the MVA?",
                          "What are the documents required to support a claim for damages under the MVA?"
                      ],
    "research":  [
                     {
                         "question":  "What are the provisions under the Motor Vehicles Act (MVA) for claiming compensation in a road accident?",
                         "answer":  "The retrieved context does not cover the provisions under the Motor Vehicles Act (MVA) for claiming compensation in a road accident directly. However, it mentions Section 166 and Section 168 of the Act, which relate to claims for compensation and the inquiry to determine compensation, respectively. \n\nAccording to the context, under Section 166 of the Motor Vehicles Act, a petition can be filed seeking compensation for the death of a person in a road traffic accident. The Claims Tribunal is required to hold an inquiry to determine compensation which must appear to it to be just, as per Section 168 of the Act. \n\nIt also mentions that strict rules of evidence are not applicable in an inquiry conducted by the Claims Tribunal, and the term \"rashness and negligence\" has to be construed lightly while making a decision on a petition for claim, as the chapter in the Motor Vehicle Act dealing with compensation is a benevolent legislation and not a penal one.\n\nHowever, the specific provisions and procedures for claiming compensation under the MVA are not explicitly stated in the provided context. Therefore, I cannot verify the exact provisions and procedures.",
                         "source":  "groq",
                         "grounded":  true,
                         "error":  null
                     },
                     {
                         "question":  "What is the process for filing a claim under the MVA?",
                         "answer":  "The retrieved context does not cover the process for filing a claim under the Motor Vehicles Act (MVA) directly. However, it mentions that a petition can be filed under Section 166 of the Act, seeking compensation for the death of a person in a road traffic accident. \n\nAs per the context, the petitioners filed a petition under Section 166 of the Motor Vehicles Act, 1989, seeking compensation of Rs.50,00,000/- for the death of Sri.Gangadhara A. S/o. Ashwathappa, in a road traffic accident. \n\nTo provide more information, Section 166 of the Motor Vehicles Act, 1988, allows a claimant to file a petition before a Motor Accident Claims Tribunal for compensation in case of an accident. However, the exact process and requirements for filing a claim are not specified in the provided context. \n\nTherefore, I cannot verify the exact process for filing a claim under the MVA based on the given context.",
                         "source":  "groq",
                         "grounded":  true,
                         "error":  null
                     },
                     {
                         "question":  "What are the documents required to support a claim for damages under the MVA?",
                         "answer":  "The retrieved context does not cover this directly, but based on the information provided in the excerpts, it can be inferred that the documents required to support a claim for damages under the Motor Vehicles Act (MVA) may include:\n\n1. A certified copy of the report under Section 173 Cr.P.C. (as mentioned in Smt. Gayathri N vs Reliance General Insurance Co on 19 February, 2021)\n2. Mechanical inspection reports of the vehicles (as mentioned in State vs . : Kallu on 25 November, 2019 and Smt. Gayathri N vs Reliance General Insurance Co on 19 February, 2021)\n3. Photographs of the vehicles (as mentioned in State vs . : Kallu on 25 November, 2019)\n4. Post-mortem examination report (as mentioned in Smt. Gayathri N vs Reliance General Insurance Co on 19 February, 2021)\n5. Proof of expenses incurred for the last rites and obsequies (as mentioned in Smt. Gayathri N vs Reliance General Insurance Co on 19 February, 2021)\n\nHowever, it is essential to note that the specific documents required may vary depending on the circumstances of the case and the jurisdiction. Therefore, it is recommended to consult the relevant provisions of the Motor Vehicles Act and seek advice from a qualified legal professional to determine the exact documents required to support a claim for damages.",
                         "source":  "groq",
                         "grounded":  true,
                         "error":  null
                     }
                 ],
    "final_answer":  {
                         "markdown":  "## Introduction to Claiming Compensation\nIf your bike was hit by a car, you can claim compensation under the Motor Vehicles Act (MVA). The amount of compensation you can claim depends on the severity of the accident, the damage to your bike, and the injuries you sustained. You can file a petition under Section 166 of the MVA to seek compensation.\n\n## Key Legal Provisions\nThe MVA provides provisions for claiming compensation in cases of road accidents. The key sections to note are:\n- Section 166 of the MVA, which allows a claimant to file a petition before a Motor Accident Claims Tribunal for compensation in case of an accident.\n- Section 168 of the MVA, which requires the Claims Tribunal to hold an inquiry to determine compensation that must appear to it to be just.\n\n## Practical Steps to Take\nTo claim compensation, you can take the following steps:\n- File a police report and obtain a copy of the report under Section 173 Cr.P.C.\n- Collect documents such as mechanical inspection reports of the vehicles, photographs of the vehicles, post-mortem examination report (if applicable), and proof of expenses incurred for medical treatment or last rites.\n- Gather evidence of the accident, including witness statements and any available video footage.\n- Consult a lawyer to help you prepare and file a petition under Section 166 of the MVA.\n\n## Important Deadlines and Limitations\nIt is essential to note that there are time limits for filing a claim under the MVA. Although the exact time limit is not specified in the provided context, it is generally recommended to file a claim as soon as possible after the accident. You should also be aware that the Claims Tribunal may have specific requirements and deadlines for filing documents and appearing for hearings.\n\n## Disclaimer\nThis response provides general information and is not a substitute for professional legal advice. The specific documents required and the process for filing a claim may vary depending on the circumstances of the case and the jurisdiction. It is recommended that you consult a qualified lawyer to determine the exact requirements and procedures for claiming compensation under the MVA.",
                         "hinglish":  "Dekhiye, aapke bike ko car ne hit kiya hai, to aap Motor Vehicles Act ke tehet compensation claim kar sakte hain, khaskar Section 166 ke under. Pehle ek police complaint aur FIR ki copy le lijiye, phir insurance company ko notice bhejna padega. Aapko apne case ke details collect karne honge, jaise ki mechanical inspection reports, photographs, aur medical expenses ka proof. Ghabraaiye mat, main aapko har step samjha dunga.",
                         "citation_validation":  [

                                                 ]
                     }
}

@madsysharma madsysharma requested a review from viru0909-dev as a code owner June 4, 2026 16:53
@vercel
Copy link
Copy Markdown

vercel Bot commented Jun 4, 2026

@madsysharma is attempting to deploy a commit to the CodeBlooded's projects Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 4, 2026

Hi @madsysharma, thanks for contributing to Nyay Setu! 🎉

I have automatically:

  • 👤 Assigned this PR to you.
  • 🏷️ Applied the gssoc:approved label.

Our workflows will now analyze your changes to classify:

  • 📈 PR Difficulty: level:*
  • 🧩 PR Type: type:*
  • 🌟 PR Quality: quality:*

Tip

Ensure your PR description references the issue it resolves (e.g. Closes #123). This allows the bot to inherit any additional labels from that issue!

Happy coding! 🚀

@madsysharma
Copy link
Copy Markdown
Contributor Author

madsysharma commented Jun 4, 2026

Hi @viru0909-dev , please review this PR. Also, please add the gssoc:approved label to it if everything is good to go. Thank you.

@github-actions github-actions Bot added level:advanced GSSoC Advance Level and removed level:intermediate labels Jun 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[NLP] Tune prompt for regional language translation

1 participant