A powerful AI-powered tool for exploring Theory of Computation concepts, including Regular Languages, Automata, and Regex. This project bridges the gap between abstract formal language theory and practical visualization using modern web technologies and Generative AI.
- Python 3.8+: The primary programming language.
- Streamlit: A framework for building data-driven web apps in pure Python. Used for the entire frontend and state management.
- Automata-lib: A Python library for simulating and converting finite automata (DFA, NFA) and Regular Expressions.
- Google Gemini API (Generative AI): Used for the "Language Definition" feature, allowing the system to understand natural language descriptions (e.g., "Strings ending in 101") and convert them to formal Regex.
- Graphviz: An open-source graph visualization software. Used here to render state transition diagrams.
- Pandas: Used for handling transition tables (DataFrames) and batch testing results.
- Finite Automata (FA):
  - DFA (Deterministic Finite Automata): Every state has exactly one transition for each symbol.
  - NFA (Non-Deterministic Finite Automata): States can have zero, one, or multiple transitions for a symbol, including $\epsilon$-transitions.
- Automata Conversions:
  - Subset Construction Algorithm (NFA $\to$ DFA):
    - The project implements a manual subset construction algorithm.
    - It computes the epsilon-closure for each state.
    - It tracks new states as sets of NFA states (e.g., `{q0, q1}`) to explicitly show how the DFA is built, rather than renaming them to arbitrary integers.
  - Regex $\to$ DFA:
    - Implements a chained conversion pipeline: Regex $\to$ NFA $\to$ DFA.
    - This combines the parsing power of the library with the custom subset construction logic to ensure educational clarity in the final output.
- Kleene's Theorem: Implementation of conversions between Regular Expressions and Finite Automata.
- DFA Minimization (Moore's Algorithm):
  - The project implements an $O(n^2)$ algorithm to minimize DFAs by finding groups of equivalent states.
  - It iteratively refines partitions (0-equivalence, 1-equivalence, etc.) until no further splitting is possible.
Role: The entry point and main controller of the application.
- Logic:
  - Session State Management: Uses `st.session_state` to persist data (automata objects, analysis results, step logs) across browser reruns. This ensures the user's workflow is preserved during interaction.
  - Tab-Based Layout: Organizes the UI into "Define & Test Language" (AI + Batch Testing) and "Automata Studio" (Structural Operations).
  - Security & API Priority: Implements a secure priority system for loading the Google API Key:
    1. `st.secrets` (Production/Cloud)
    2. `os.environ` (Local Environment)
    3. User Input (Sidebar Fallback)
  - Note: If a system key is found, the manual input field is completely hidden from the UI to prevent accidental exposure.
  - Context-Aware Examples: The "Load Example" button dynamically loads different datasets based on the target operation (e.g., loading an 8-state DFA specifically designed for Moore's Algorithm when "Minimized DFA" is selected).
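The key-loading priority can be expressed as a small pure function. This is a hypothetical sketch: in the real app the first two arguments would be `st.secrets` and `os.environ`; plain dicts stand in here so the logic is self-contained.

```python
# Hypothetical sketch of the API-key priority: secrets > environment > sidebar.

def resolve_api_key(secrets, env, sidebar_input=None):
    """Return (key, source), preferring deployed secrets over local env over UI input."""
    if secrets.get("GOOGLE_API_KEY"):
        return secrets["GOOGLE_API_KEY"], "st.secrets"
    if env.get("GOOGLE_API_KEY"):
        return env["GOOGLE_API_KEY"], "os.environ"
    if sidebar_input:                  # fallback: manual entry in the sidebar
        return sidebar_input, "sidebar"
    return None, None                  # caller should prompt the user for a key

key, source = resolve_api_key({}, {"GOOGLE_API_KEY": "abc"}, "typed-in-key")
```

Because a system key wins even when the user typed one, the sidebar field can be hidden whenever `source` is not `"sidebar"`.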
Role: Handles all deterministic operations and visualization. This file contains the core algorithmic implementations.
- Logic:
  - `nfa_to_dfa` (Manual Subset Construction):
    - Instead of using the library's default conversion, this function manually implements the Powerset Construction algorithm.
    - It calculates the epsilon-closure of states and iteratively finds reachable subsets.
    - Goal: To preserve the educational value by showing states labeled as `{q0, q1}`.
  - `minimize_dfa_with_steps` (Moore's Algorithm):
    - Implements Moore's Algorithm for DFA minimization.
    - Iteratively calculates $k$-equivalence partitions (0-equiv, 1-equiv, ...) and logs each step for the user.
    - Partial DFA Support: Constructs the final DFA with `allow_partial=True` to handle cases where transitions are missing (implicit dead states), avoiding validation errors without adding confusing "Sink" states to the diagram.
  - `regex_to_dfa`:
    - Parses the Regex into an NFA using the library, then passes it through the custom `nfa_to_dfa` logic to ensure the final DFA retains readable subset states.
  - `get_graphviz_source` (Custom Visualization):
    - A custom rendering engine that generates DOT source code directly.
    - Sanitization: Detects complex state labels (like `frozenset({'q0', 'q1'})` or `['q0', 'q1']`) and converts them into clean set notation (`{q0, q1}`).
    - Avoids heavy dependencies like `pygraphviz`, ensuring compatibility with Streamlit Cloud.
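The label sanitization and direct DOT generation might look something like the following. This is a minimal sketch, not the project's actual `get_graphviz_source`; the transition encoding and helper names are assumptions.

```python
import re

# Hypothetical sketch: turn frozenset/list state labels into clean set notation
# and emit DOT source directly, with no pygraphviz dependency.

def clean_label(state):
    """Render frozenset({'q0', 'q1'}) or ['q0', 'q1'] as {q0, q1}."""
    if isinstance(state, (frozenset, set, list, tuple)):
        names = re.findall(r"[A-Za-z_]\w*", str(state).replace("frozenset", ""))
        return "{" + ", ".join(sorted(names)) + "}"
    return str(state)

def to_dot(transitions):
    """Emit a minimal DOT digraph from {(src, symbol): dst} transitions."""
    lines = ["digraph DFA {", "  rankdir=LR;"]
    for (src, sym), dst in transitions.items():
        lines.append(f'  "{clean_label(src)}" -> "{clean_label(dst)}" [label="{sym}"];')
    lines.append("}")
    return "\n".join(lines)

dot = to_dot({(frozenset({'q0'}), '1'): frozenset({'q0', 'q1'})})
```

The resulting string can be handed straight to `st.graphviz_chart`, which only needs DOT source, so the system `graphviz` binary plus this string builder is the whole rendering pipeline.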
-
Role: Manages communication with Google's Gemini API.
- Logic:
  - Dynamic Model Resolution: Implements `_resolve_model_name` to fix common "404 Not Found" errors by checking available models at runtime (`list_models`) and prioritizing `gemini-1.5-flash`.
  - Structured Outputs: Uses `typing.TypedDict` and `response_schema` to force the AI to return strict JSON matching defined schemas (`LanguageAnalysis`, `StringCheck`). This eliminates parsing errors common with free-form LLM responses.
  - Lazy Configuration: The `configure_api` method allows the API key to be set at runtime (via the UI) rather than requiring it at startup.
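The structured-output idea can be sketched as below. The field names inside `LanguageAnalysis` are assumptions (the source only names the schema); the actual Gemini call with `response_schema` is omitted, and only the schema definition and defensive JSON parsing are shown.

```python
import json
from typing import TypedDict

# Hypothetical shape of the LanguageAnalysis schema; field names are assumed.
class LanguageAnalysis(TypedDict):
    regex: str
    explanation: str

def parse_analysis(raw_json: str) -> LanguageAnalysis:
    """Parse a model reply and fail fast if it strays from the schema."""
    data = json.loads(raw_json)
    missing = {"regex", "explanation"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return LanguageAnalysis(regex=data["regex"], explanation=data["explanation"])

# A sample JSON reply of the kind the schema constrains the model to produce.
reply = '{"regex": "(0|1)*101", "explanation": "Strings ending in 101"}'
analysis = parse_analysis(reply)
```

Constraining the response to JSON and then validating it locally means a malformed reply raises immediately instead of silently producing a bogus regex.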
Role: The bridge between the UI and the AI Handler.
- Logic:
  - `LanguageProcessor`: A class that encapsulates the state of the "current language".
  - Orchestration: It takes the raw string from the UI, sends it to `AIHandler`, validates the JSON response, and stores the resulting Regex/Explanation for use in testing.
  - Regex Validation: It uses Python's `re` module to test strings against the AI-generated regex to provide deterministic "Accepted/Rejected" results.
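The deterministic validation step amounts to a `re.fullmatch` over each test string. A minimal sketch (the function name is hypothetical):

```python
import re

def check_strings(pattern: str, samples):
    """Map each sample string to Accepted (True) / Rejected (False).

    fullmatch is used so the ENTIRE string must belong to the language,
    not just a substring of it."""
    compiled = re.compile(pattern)
    return {s: bool(compiled.fullmatch(s)) for s in samples}

# Regex for "strings over {0,1} ending in 101"
results = check_strings(r"(0|1)*101", ["101", "0101", "110"])
```

Using `fullmatch` rather than `search` is the important choice here: `search` would accept `"1100"` for the pattern above because it contains no anchor.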
- `.streamlit/secrets.toml`: Stores sensitive data (API Keys). This file is git-ignored to prevent leaks.
- `.streamlit/config.toml`: Configures Streamlit server settings (e.g., headless mode).
- `secrets.toml.example`: A template file showing users how to configure their own API keys.
- `requirements.txt`: Lists Python packages (`automata-lib`, `streamlit`, `google-generativeai`).
- `packages.txt`: Lists system-level binaries (`graphviz`) required by the deployment environment.
- `main.py`: The legacy CLI version of the tool.
- `test_inputs.csv`: Sample data for the Batch Testing feature.
- `test_model_resolution.py`: Unit test script used to verify the AI model selection logic.
- Prerequisites: Python 3.8+
- Install Python Dependencies: `pip install -r requirements.txt`
- Install System Dependencies (for visualization):
  - This project uses Graphviz. Ensure it is installed on your system (e.g., `apt-get install graphviz` on Linux, or download from the official site for Windows/Mac).
- Run the Application: `streamlit run app.py`
The application requires a Google Gemini API Key for the "Language Definition" features.
- Create a folder `.streamlit` in the root.
- Create `secrets.toml` inside it containing: `GOOGLE_API_KEY = "your_key_here"`
If no system key is found, the app will prompt you to enter one in the sidebar.