UpCode is a context-optimized, AI-powered tool designed to facilitate the transition from legacy codebases to modern languages. Leveraging Streamlit for its user interface and the Groq API for rapid LLM inference, the application processes full Java or COBOL projects, extracts architectural relationships, and applies a feedback-driven modernization loop to generate, validate, and refine the translated code.
Currently supports:
- Java to Python 3
- COBOL to Go
This tool moves beyond simple file-by-file text replacement by understanding your project's architecture, performing topological sorts to translate dependencies first, and injecting context into LLM prompts.
- Repository Ingestion: Upload a `.zip` archive or paste a GitHub URL containing your legacy source code.
- Architectural Analysis: The engine parses `.java`, `.cbl`, or `.cpy` files to map out deep relationships, including inheritance, interfaces, imports, COPY books, and CALLs.
- Topological Sort: Computes a Directed Acyclic Graph (DAG) to determine the exact order in which to translate the files. Leaf nodes (dependencies) are translated first.
- Context-Aware Translation: When translating a core file, the engine injects the already-translated code of its dependencies directly into the LLM prompt, avoiding context dilution and hallucinations. Powered by LangChain and the Groq API (`meta-llama/llama-4-scout-17b-16e-instruct`).
- Self-Healing Validation: Generated code is automatically run through `ast` (Python) or `gofmt` (Go) checkers. If the LLM introduces a syntax error, the engine catches it, builds a refinement prompt with the error trace, and forces the LLM to fix it.
- Instant Packaging: The translated results are instantly compiled back into a `.zip` archive matching the original project structure.
- Modern Web UI: A seamless multipage Streamlit interface using `st.navigation` for an intuitive user experience.
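The dependency-first ordering described above can be sketched with Python's standard-library `graphlib` (a simplified illustration; the file names and graph below are hypothetical, not taken from the codebase):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each file maps to the set of files it
# depends on (via imports, inheritance, COPY books, CALLs, ...).
graph = {
    "Main.java": {"Service.java", "Utils.java"},
    "Service.java": {"Utils.java"},
    "Utils.java": set(),
}

# static_order() yields dependencies before their dependents, so leaf
# nodes are translated first and their translated code can be injected
# into the prompts for the files that depend on them.
order = list(TopologicalSorter(graph).static_order())
print(order)  # Utils.java first, Main.java last
```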
The engine's backend is modularly structured under the `core/` directory:

- Ingestion (`core/ingestion/`): Fetches the code from ZIP or GitHub, identifies file types, and builds the initial dependency graphs.
- Translation (`core/translation/`): Manages the topological ordering, constructs context from previously translated dependencies, and interfaces with the LLM.
- Validation (`core/validation/`): Contains strict syntactic validators (`python_validator.py`, `go_validator.py`) to verify the LLM's output.
- Refinement (`core/refinement/`): Orchestrates the feedback loop; if validation fails, it queries the LLM again with the specific error messages until it succeeds or exhausts its retries.
- Output (`core/output/`): Reconstructs the original folder hierarchy with the newly translated files and packages them into a ZIP archive.
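As a rough illustration of the validation/refinement handoff, a Python-side check might look like the sketch below. This is a minimal illustration, not the project's actual code: `translate_fn` stands in for the real LLM call, and the function names are hypothetical.

```python
import ast

def validate_python(source):
    """Return None if the source parses cleanly, else a short error description."""
    try:
        ast.parse(source)
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"

def refine(source, translate_fn, max_retries=3):
    """Feed validator errors back to the LLM until the code parses
    or the retry budget is exhausted."""
    for _ in range(max_retries):
        error = validate_python(source)
        if error is None:
            return source
        # Build a refinement prompt containing the error trace.
        source = translate_fn(f"Fix this Python syntax error ({error}):\n{source}")
    return source
```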
- Python 3.8 or higher
- Valid Groq API Key
- Local Go installation (`gofmt` must be available on your system PATH if you intend to translate COBOL to Go)
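If you are unsure whether `gofmt` is reachable, a quick check from Python looks like this (a hypothetical pre-flight helper, not part of the app):

```python
import shutil

def go_toolchain_available():
    """True if gofmt is on the system PATH (required for COBOL -> Go translation)."""
    return shutil.which("gofmt") is not None
```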
- Clone the repository:

  ```bash
  git clone https://github.com/AbhisumatK/Legacy-Code-Parser.git
  cd Legacy-Code-Parser
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up your API key. You will need a free API key from Groq to power the AI translation engine:
  - Go to the GroqCloud Console.
  - Log in or create an account.
  - Navigate to the API Keys section and generate a new key.
  - Add your generated API key to `.streamlit/secrets.toml`:

    ```toml
    GROQ_API_KEY = "your_api_key_here"
    ```

- Start the Streamlit application:

  ```bash
  streamlit run app.py
  ```

- Open the provided local URL in your web browser.
- Navigate to either Translate Java or Translate COBOL using the sidebar.
- Fetch Source: Upload your `.zip` or paste a GitHub URL, and click "Start Ingestion Engine".
- Translate: Review the topological translation order and click "Run Translation Engine". The app displays progress as it translates each unit.
- Download: Once finished, download the structured `.zip` archive containing your modernized codebase.
- The effectiveness of the translation is heavily dependent on the chosen LLM model.
- Certain highly specific legacy paradigms (like complex multi-threading models, deep reflection, or obscure COBOL pointer arithmetic) might require manual review post-translation.
- Circular dependencies are detected, but the system falls back to a "best-effort" ordering that may lack complete context injection.
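The cycle fallback can be pictured with `graphlib`'s `CycleError` (a simplified sketch; the fallback policy shown here is illustrative, not necessarily the exact one the engine uses):

```python
from graphlib import TopologicalSorter, CycleError

def translation_order(graph):
    """Topological order when possible; best-effort insertion order on cycles."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        # Best-effort fallback: some prompts will miss the translated
        # code of files that appear later in the cycle.
        return list(graph)

# Two COBOL units that CALL each other form a cycle.
cyclic = {"A.cbl": {"B.cbl"}, "B.cbl": {"A.cbl"}}
print(translation_order(cyclic))
```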
Contributions are welcome. Please ensure that structural changes to the graph generation or translation pipelines are robust, and that new logic is covered under the `core/validation/` module.