A pedagogical mini-compiler demonstrating fundamental compiler design principles
- Overview
- Features
- Architecture
- Project Structure
- Installation
- Quick Start
- Detailed Usage
- Grammar Specification
- Examples
- How It Works
- Concepts Demonstrated
- Troubleshooting
- Contributing
- License
Mini Compiler is an educational compiler implementation built in C that demonstrates core front-end phases of compiler design. It uses Flex for lexical analysis and Bison/YACC for parsing, and covers two stages that build on the parser:
- Abstract Syntax Tree (AST) Construction and Traversal
- Intermediate Code Generation (ICG) using Quadruple Representation
This project is ideal for computer science students and enthusiasts looking to understand how compilers work under the hood.
- ✅ Parses a C-like language with:
  - Variable assignments
  - Arithmetic expressions (`+`, `-`, `*`, `/`)
  - Conditional statements (`if`, `if-else`)
- ✅ Constructs a ternary expression tree (left/middle/right child nodes)
- ✅ Outputs preorder traversal of the AST
- ✅ Comprehensive error reporting with line numbers
- ✅ Validates syntax against defined grammar rules
- ✅ Extended grammar support:
  - `if` and `if-else` conditionals
  - `while` loops
  - `do-while` loops
- ✅ Generates quadruple (quad) representation:
  - Format: `(operator, arg1, arg2, result)`
- ✅ Automatic resource management:
  - Temporary variable generation (`t1`, `t2`, `t3`, ...)
  - Jump label generation (`L1`, `L2`, `L3`, ...)
- ✅ Supports relational operators: `<`, `>`, `<=`, `>=`, `==`, `!=`
- ✅ Implements short-circuit control flow evaluation
- ✅ Produces human-readable quad tables
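The quadruple format and the temporary/label counters listed above can be pictured with a small C sketch. This is a minimal illustration under stated assumptions, not the project's `quad_generation.c`: the names `Quad`, `emit`, `next_temp`, `next_label`, and `print_quads`, and the fixed sizes, are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_QUADS 64 /* assumed capacity, chosen for this sketch */
#define NAME_LEN 16  /* assumed maximum length of a name */

/* One table row: (operator, arg1, arg2, result). */
typedef struct {
    char op[NAME_LEN];
    char arg1[NAME_LEN];
    char arg2[NAME_LEN];
    char result[NAME_LEN];
} Quad;

Quad quads[MAX_QUADS];
int quad_count = 0;
int temp_count = 0;
int label_count = 0;

/* Write the next temporary name (t1, t2, ...) into buf and return it. */
char *next_temp(char buf[NAME_LEN]) {
    snprintf(buf, NAME_LEN, "t%d", ++temp_count);
    return buf;
}

/* Write the next jump label name (L1, L2, ...) into buf and return it. */
char *next_label(char buf[NAME_LEN]) {
    snprintf(buf, NAME_LEN, "L%d", ++label_count);
    return buf;
}

/* Append one quadruple; unused fields are passed as "". */
void emit(const char *op, const char *arg1, const char *arg2, const char *result) {
    assert(quad_count < MAX_QUADS);
    assert(strlen(op) < NAME_LEN && strlen(result) < NAME_LEN);
    Quad *q = &quads[quad_count++];
    snprintf(q->op, NAME_LEN, "%s", op);
    snprintf(q->arg1, NAME_LEN, "%s", arg1);
    snprintf(q->arg2, NAME_LEN, "%s", arg2);
    snprintf(q->result, NAME_LEN, "%s", result);
}

/* Print every stored quadruple as one table row. */
void print_quads(void) {
    for (int i = 0; i < quad_count; i++) {
        const Quad *q = &quads[i];
        printf("| %-5s | %-4s | %-4s | %-6s |\n", q->op, q->arg1, q->arg2, q->result);
    }
}
```

With these helpers, an assignment such as `a = a + 1` would be emitted as `emit("+", "a", "1", next_temp(t))` followed by `emit("=", t, "", "a")`, where `t` is a caller-provided `char[NAME_LEN]` buffer.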
┌─────────────────────────────────────────────────────────────┐
│                         SOURCE CODE                         │
│                      (C-like language)                      │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
             ┌───────────────────────┐
             │   LEXICAL ANALYZER    │
             │        (Flex)         │
             │  • Tokenization       │
             │  • Pattern Matching   │
             └──────────┬────────────┘
                        │ Token Stream
                        ▼
             ┌───────────────────────┐
             │   SYNTAX ANALYZER     │
             │     (Bison/YACC)      │
             │  • Grammar Validation │
             │  • Parse Tree Build   │
             └──────────┬────────────┘
                        │
          ┌─────────────┴─────────────┐
          ▼                           ▼
 ┌─────────────────┐         ┌─────────────────┐
 │   AST MODULE    │         │   ICG MODULE    │
 ├─────────────────┤         ├─────────────────┤
 │ • Tree Building │         │ • Quad Gen      │
 │ • Node Creation │         │ • Temp Vars     │
 │ • Traversal     │         │ • Label Gen     │
 │ • Preorder Out  │         │ • Control Flow  │
 └─────────────────┘         └─────────────────┘
          │                           │
          ▼                           ▼
     AST Output                Quadruple Table
Mini-Compiler/
│
├── AST/ # Abstract Syntax Tree Module
│ ├── lexer.l # Flex lexer specification
│ ├── parser.y # Bison/YACC grammar + AST builder
│ ├── abstract_syntax_tree.c # AST node creation & traversal logic
│ ├── abstract_syntax_tree.h # AST structure & function declarations
│ ├── run.sh # Build automation script
│ ├── test_input1.c # Test: if statement
│ ├── test_input2.c # Test: if-else statement
│ ├── test_input3.c # Test: nested if + sequences
│ ├── commands.png # Screenshot: build commands
│ ├── output 1.png # Screenshot: test output 1
│ ├── output 2.png # Screenshot: test output 2
│ └── output 3.png # Screenshot: test output 3
│
├── ICG/ # Intermediate Code Generation Module
│ ├── lexer.l # Flex lexer specification
│ ├── parser.y # Bison/YACC grammar + quad generator
│ ├── quad_generation.c # Quad table emission & management
│ ├── quad_generation.h # ICG function & variable declarations
│ ├── run.sh # Build automation script
│ ├── test_input1.c # Test: if statement
│ ├── test_input2.c # Test: if-else statement
│ ├── test_input3.c # Test: while loop
│ ├── output1.png # Screenshot: test output 1
│ ├── output2.png # Screenshot: test output 2
│ └── output3.png # Screenshot: test output 3
│
└── README.md # Project documentation
Ensure you have the following tools installed:
| Tool | Purpose | Installation Check |
|---|---|---|
| Flex | Lexical analyzer generator | flex --version |
| Bison/YACC | Parser generator | bison --version |
| GCC | GNU C Compiler | gcc --version |
# Debian/Ubuntu
sudo apt-get update
sudo apt-get install flex bison gcc

# Fedora/RHEL
sudo dnf install flex bison gcc

# macOS (Homebrew)
brew install flex bison gcc

# Verify the installation
flex --version
bison --version
gcc --version

# Navigate to AST directory
cd AST
# Make the build script executable
chmod +x run.sh
# Compile the project
./run.sh
# Run with test input
./a.out < test_input1.c

# Navigate to ICG directory
cd ICG
# Make the build script executable
chmod +x run.sh
# Compile the project
./run.sh
# Run with test input
./a.out < test_input1.c

The run.sh script automates the compilation process:
#!/bin/bash
# Step 1: Generate lexical analyzer
lex lexer.l # Creates lex.yy.c from lexer specification
# Step 2: Generate parser
yacc -d parser.y # Creates y.tab.c and y.tab.h from grammar
# Step 3: Compile everything
gcc -g y.tab.c lex.yy.c # Compiles and links all components
# Step 4: Cleanup intermediate files
rm -f lex.yy.c y.tab.c y.tab.h

If you prefer manual control:
# For AST module
cd AST
lex lexer.l
yacc -d parser.y
gcc -g y.tab.c lex.yy.c abstract_syntax_tree.c -o ast_compiler
./ast_compiler < test_input1.c
# For ICG module (run from the repository root)
cd ICG
lex lexer.l
yacc -d parser.y
gcc -g y.tab.c lex.yy.c quad_generation.c -o icg_compiler
./icg_compiler < test_input1.c

Create your own test file:
# Create a new test file
cat > my_test.c << 'EOF'
if(x > 5) {
y = x * 2;
z = y + 3;
}
EOF
# Run the compiler
./a.out < my_test.c

START → SEQ
SEQ → S SEQ
| S
S → if ( C ) { SEQ }
| if ( C ) { SEQ } else { SEQ }
| while ( C ) { SEQ } ← ICG only
| do { S } while ( C ) ; ← ICG only
| ASSGN
ASSGN → id = E ;
E → E + T
| E - T
| T
T → T * F
| T / F
| F
F → ( E )
| id
| number
C → F relop F
relop → < | > | <= | >= | == | !=
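The layering of `E`, `T`, and `F` is what gives `*` and `/` higher precedence than `+` and `-`, and the left recursion makes all four operators left-associative. The sketch below illustrates both effects with a hand-written recursive-descent evaluator over numbers only. It is not how the project parses (Bison builds an LALR(1) parser from the left-recursive rules directly), identifiers are omitted, and the names `parse_e`, `parse_t`, `parse_f`, and `eval` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

static const char *p; /* cursor into the input string */

static void skip_ws(void) {
    while (isspace((unsigned char)*p)) p++;
}

static int parse_e(void);

/* F → ( E ) | number   (the id alternative is left out of this sketch) */
static int parse_f(void) {
    skip_ws();
    if (*p == '(') {
        p++;
        int inner = parse_e();
        skip_ws();
        assert(*p == ')');
        p++;
        return inner;
    }
    assert(isdigit((unsigned char)*p));
    int v = 0;
    while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
    return v;
}

/* T → T * F | T / F | F, with the left recursion written as a loop */
static int parse_t(void) {
    int v = parse_f();
    for (;;) {
        skip_ws();
        if (*p == '*') {
            p++;
            v *= parse_f();
        } else if (*p == '/') {
            p++;
            int d = parse_f();
            assert(d != 0);
            v /= d;
        } else {
            return v;
        }
    }
}

/* E → E + T | E - T | T, with the left recursion written as a loop */
static int parse_e(void) {
    int v = parse_t();
    for (;;) {
        skip_ws();
        if (*p == '+') {
            p++;
            v += parse_t();
        } else if (*p == '-') {
            p++;
            v -= parse_t();
        } else {
            return v;
        }
    }
}

/* Evaluate an arithmetic expression made of numbers, + - * / and parentheses. */
int eval(const char *s) {
    p = s;
    return parse_e();
}
```

Under these assumptions, `eval("2 + 3 * 4")` returns 14 because the multiplication is grouped inside `T` before `E` adds, and `eval("10 - 4 - 3")` returns 3 because subtraction associates to the left.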
| Token | Pattern | Example |
|---|---|---|
| `T_ID` | `[a-zA-Z][a-zA-Z0-9_]*` | `x`, `var1`, `temp_value` |
| `T_NUM` | `[0-9]+` | `42`, `100`, `0` |
| `IF` | `if` | `if` |
| `ELSE` | `else` | `else` |
| `WHILE` | `while` | `while` (ICG only) |
| `DO` | `do` | `do` (ICG only) |
| Relational | `<`, `>`, `<=`, `>=`, `==`, `!=` | `<`, `>=` |
| Arithmetic | `+`, `-`, `*`, `/` | `+`, `*` |
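To make the `T_ID` and `T_NUM` patterns concrete, the sketch below checks them by hand in C. Flex compiles the patterns into a table-driven automaton instead, so this is only an illustration of what the two regular expressions accept; the function names `is_identifier` and `is_number` are hypothetical and the checks assume the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the whole string matches [a-zA-Z][a-zA-Z0-9_]* */
bool is_identifier(const char *s) {
    assert(s != NULL);
    if (!isalpha((unsigned char)*s)) return false; /* must start with a letter */
    for (s++; *s != '\0'; s++) {
        if (!isalnum((unsigned char)*s) && *s != '_') return false;
    }
    return true;
}

/* True if the whole string matches [0-9]+ */
bool is_number(const char *s) {
    assert(s != NULL);
    if (*s == '\0') return false; /* at least one digit is required */
    for (; *s != '\0'; s++) {
        if (!isdigit((unsigned char)*s)) return false;
    }
    return true;
}
```

For instance, `is_identifier("temp_value")` is true while `is_identifier("_x")` and `is_identifier("1abc")` are false, which mirrors the rule that identifiers must start with a letter.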
Input (test_input1.c)
if(a > b) {
a = a + 1;
b = b - 1;
}
AST Output
Preorder:
if,>,a,b,seq,=,a,+,a,1,=,b,-,b,1
Valid syntax
ICG Quad Table Output
ICG
-----------------------------------------------------
| op | arg1 | arg2 | result |
-----------------------------------------------------
| > | a | b | t1 |
| if | t1 | | L1 |
| goto | | | L2 |
| Label | | | L1 |
| + | a | 1 | t2 |
| = | t2 | | a |
| - | b | 1 | t3 |
| = | t3 | | b |
| Label | | | L2 |
-----------------------------------------------------
Valid syntax
Input (test_input2.c)
if(a > b) {
a = a + 1;
b = b - 1;
} else {
a = a - 1;
b = b - 1;
}
AST Output
Preorder:
if-else,>,a,b,seq,=,a,+,a,1,=,b,-,b,1,seq,=,a,-,a,1,=,b,-,b,1
Valid syntax
ICG Quad Table Output
ICG
-----------------------------------------------------
| op | arg1 | arg2 | result |
-----------------------------------------------------
| > | a | b | t1 |
| if | t1 | | L1 |
| goto | | | L2 |
| Label | | | L1 |
| + | a | 1 | t2 |
| = | t2 | | a |
| - | b | 1 | t3 |
| = | t3 | | b |
| goto | | | L3 |
| Label | | | L2 |
| - | a | 1 | t4 |
| = | t4 | | a |
| - | b | 1 | t5 |
| = | t5 | | b |
| Label | | | L3 |
-----------------------------------------------------
Valid syntax
Input (test_input3.c)
while(x < 10) {
x = x + 1;
y = y * 2;
}
ICG Quad Table Output
ICG
-----------------------------------------------------
| op | arg1 | arg2 | result |
-----------------------------------------------------
| Label | | | L1 |
| < | x | 10 | t1 |
| if | t1 | | L2 |
| goto | | | L3 |
| Label | | | L2 |
| + | x | 1 | t2 |
| = | t2 | | x |
| * | y | 2 | t3 |
| = | t3 | | y |
| goto | | | L1 |
| Label | | | L3 |
-----------------------------------------------------
Valid syntax
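The while-loop table above follows a fixed label layout: one label at the loop head, one at the body, and one at the exit, with a jump back to the head after the body. The sketch below reproduces that layout in C. It is an illustration under assumptions, not the project's parser actions: the helpers `emit`, `fresh`, `gen_update`, `gen_while`, and `loop_body` are hypothetical, and quads are written as comma-separated lines into a buffer rather than into a table.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

char out[1024];            /* emitted quads, one "op,arg1,arg2,result" per line */
int temps = 0, labels = 0; /* counters behind t1, t2, ... and L1, L2, ... */

/* Append one quad to the output buffer. */
void emit(const char *op, const char *a1, const char *a2, const char *res) {
    size_t used = strlen(out);
    int n = snprintf(out + used, sizeof out - used, "%s,%s,%s,%s\n", op, a1, a2, res);
    assert(n > 0 && (size_t)n < sizeof out - used);
}

/* Write prefix + next counter value into name, e.g. t1 or L2. */
void fresh(char *name, size_t size, char prefix, int *counter) {
    snprintf(name, size, "%c%d", prefix, ++*counter);
}

/* var = var op operand, computed through a temporary. */
void gen_update(const char *var, const char *op, const char *operand) {
    char t[16];
    fresh(t, sizeof t, 't', &temps);
    emit(op, var, operand, t);
    emit("=", t, "", var);
}

/* Body of the example loop: x = x + 1; y = y * 2; */
void loop_body(void) {
    gen_update("x", "+", "1");
    gen_update("y", "*", "2");
}

/* while (lhs relop rhs) { body } */
void gen_while(const char *relop, const char *lhs, const char *rhs, void (*body)(void)) {
    char start[16], enter[16], done[16], t[16];
    fresh(start, sizeof start, 'L', &labels);
    emit("Label", "", "", start);  /* loop head: condition is re-tested here */
    fresh(t, sizeof t, 't', &temps);
    emit(relop, lhs, rhs, t);
    fresh(enter, sizeof enter, 'L', &labels);
    emit("if", t, "", enter);      /* condition true: jump into the body */
    fresh(done, sizeof done, 'L', &labels);
    emit("goto", "", "", done);    /* condition false: leave the loop */
    emit("Label", "", "", enter);
    body();
    emit("goto", "", "", start);   /* back edge to the loop head */
    emit("Label", "", "", done);
}
```

Calling `gen_while("<", "x", "10", loop_body)` fills `out` with the same eleven rows as the table, from `Label,,,L1` through `Label,,,L3`.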
┌─────────────────────────────────────────────────────────┐
│ 1. SOURCE CODE                                          │
│    • User writes C-like program                         │
│    • Contains if/else, while, assignments               │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│ 2. LEXICAL ANALYSIS (Flex - lexer.l)                    │
│    • Scans input character by character                 │
│    • Recognizes tokens using regex patterns             │
│    • Outputs: Token stream                              │
│       Example: "if(a > b)" → [IF, (, ID(a), >, ID(b), )]│
└────────────────────┬────────────────────────────────────┘
                     │ Token Stream
                     ▼
┌─────────────────────────────────────────────────────────┐
│ 3. SYNTAX ANALYSIS (Bison - parser.y)                   │
│    • Validates token stream against grammar             │
│    • Builds parse tree bottom-up (LALR parsing)         │
│    • Triggers semantic actions on reduction             │
│    • Reports syntax errors with line numbers            │
└────────────────────┬────────────────────────────────────┘
                     │
          ┌──────────┴──────────────┐
          │                         │
          ▼                         ▼
┌─────────────────────┐  ┌─────────────────────┐
│ 4a. AST MODULE      │  │ 4b. ICG MODULE      │
├─────────────────────┤  ├─────────────────────┤
│ • Create tree nodes │  │ • Generate quads    │
│ • Link parent/child │  │ • Manage temps      │
│ • Traverse preorder │  │ • Create labels     │
│ • Print structure   │  │ • Emit quad table   │
└─────────┬───────────┘  └──────────┬──────────┘
          │                         │
          ▼                         ▼
    AST Traversal           Quadruple IR Code
- Tokenization (Lexer)
  - Input: Raw source code string
  - Process: Pattern matching using regular expressions
  - Output: Stream of categorized tokens
- Parsing (Parser)
  - Input: Token stream from lexer
  - Process: Bottom-up LALR(1) shift-reduce parsing
  - Output: Parse tree + semantic actions triggered
- AST Construction
  - Creates `expression_node` structures
  - Links nodes in ternary tree (left, middle, right)
  - Performs preorder traversal for output
- ICG Quad Generation
  - Walks grammar reduction actions
  - Emits quadruples for each operation
  - Generates temporaries for intermediate results
  - Creates labels for control flow jumps
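The AST construction step can be sketched as a ternary tree plus a preorder walk. The source names only the `expression_node` struct, so the field names, the helpers `make_node`, `leaf`, `preorder`, `free_tree`, and `build_example`, and the choice of which child holds the condition and the body are assumptions picked to match the preorder output shown in the first example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Ternary tree node; field names are assumed for this sketch. */
typedef struct expression_node {
    const char *label;
    struct expression_node *left, *middle, *right;
} expression_node;

expression_node *make_node(const char *label, expression_node *l,
                           expression_node *m, expression_node *r) {
    expression_node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->label = label;
    n->left = l;
    n->middle = m;
    n->right = r;
    return n;
}

expression_node *leaf(const char *label) {
    return make_node(label, NULL, NULL, NULL);
}

/* Preorder: visit the node, then left, middle, right.
   Labels are appended to buf, separated by commas. */
void preorder(const expression_node *n, char *buf, size_t size) {
    if (n == NULL) return;
    size_t used = strlen(buf);
    snprintf(buf + used, size - used, "%s%s", used ? "," : "", n->label);
    preorder(n->left, buf, size);
    preorder(n->middle, buf, size);
    preorder(n->right, buf, size);
}

void free_tree(expression_node *n) {
    if (n == NULL) return;
    free_tree(n->left);
    free_tree(n->middle);
    free_tree(n->right);
    free(n);
}

/* Tree for: if(a > b) { a = a + 1; b = b - 1; }
   Assumed layout: condition in left, body in middle, else branch in right. */
expression_node *build_example(void) {
    expression_node *cond = make_node(">", leaf("a"), NULL, leaf("b"));
    expression_node *inc = make_node("=", leaf("a"), NULL,
                                     make_node("+", leaf("a"), NULL, leaf("1")));
    expression_node *dec = make_node("=", leaf("b"), NULL,
                                     make_node("-", leaf("b"), NULL, leaf("1")));
    expression_node *body = make_node("seq", inc, NULL, dec);
    return make_node("if", cond, body, NULL);
}
```

Walking `build_example()` with `preorder` into an empty buffer produces `if,>,a,b,seq,=,a,+,a,1,=,b,-,b,1`. An `if-else` node would, under the same assumed layout, place the else branch in `right`.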
- ✅ Lexical Analysis
  - Regular expressions for token recognition
  - Finite automata implementation via Flex
  - Token categorization and symbol table integration
- ✅ Syntax Analysis
  - Context-free grammar design
  - LALR(1) bottom-up parsing (Bison/YACC)
  - Shift-reduce conflict resolution
  - Error recovery and reporting
- ✅ Abstract Syntax Trees
  - Tree data structure in C
  - Node allocation and memory management
  - Tree traversal algorithms (preorder)
  - Recursive tree construction
- ✅ Intermediate Code Generation
  - Three-address code (quadruple form)
  - Temporary variable allocation
  - Label generation for control flow
  - Symbol table management
- ✅ Control Flow Translation
  - If-then-else translation
  - Loop translation (while, do-while)
  - Short-circuit boolean evaluation
  - Backpatching for forward references
- 🔧 Modular Design: Separation of lexer, parser, AST, and ICG
- 🔧 Build Automation: Shell scripts for compilation pipeline
- 🔧 Testing: Multiple test cases with expected outputs
- 🔧 Error Handling: Line number tracking with `yylineno`
Solution:
sudo apt-get install flex

Solution:
sudo apt-get install bison

Cause: Input doesn't match the grammar
Solution:
- Check for missing semicolons
- Ensure braces `{}` are balanced
- Verify relational operators are correct
- Check that identifiers start with a letter

Cause: Missing yywrap function
Solution: Already included in parser.y:
int yywrap() {
    return 1;
}

Solution:
gcc -g y.tab.c lex.yy.c -o compiler -lfl

Solution:
chmod +x run.sh

yacc -d -t parser.y # -t enables debug mode
export YYDEBUG=1 # read by Berkeley yacc; with Bison, set yydebug = 1 before calling yyparse()
./a.out < test_input.c

flex -d lexer.l # -d enables debug mode

gcc -g y.tab.c lex.yy.c # -g includes debug symbols
gdb ./a.out
(gdb) break yyparse
(gdb) run < test_input.c
(gdb) step

- Update `lexer.l` to recognize new tokens
- Add token definitions in `parser.y`
- Extend grammar rules
- Implement semantic actions
Example: Adding modulo operator
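// In lexer.l
"%" { return MOD; }
// In parser.y (declarations section)
%token MOD
// In parser.y (grammar rules): use the MOD token returned by the lexer
T : T MOD F {
    $$ = new_temp();
    quad_code_gen($$, $1, "%", $3);
}

Contributions are welcome! Here's how you can help: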
- Check if the issue already exists
- Create a new issue with:
- Descriptive title
- Steps to reproduce
- Expected vs actual behavior
- Input code that causes the issue
- New language features
- Optimization techniques
- Better error messages
- Additional test cases
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new features
- Submit a pull request
- Compilers: Principles, Techniques, and Tools (Dragon Book) - Aho, Lam, Sethi, Ullman
- Modern Compiler Implementation in C - Andrew W. Appel
- Engineering a Compiler - Cooper and Torczon
- Built using GNU Flex and GNU Bison
- Inspired by classic compiler design textbooks
- Educational project for learning compiler construction
For questions, suggestions, or collaboration:
- Open an issue on GitHub
- Contribute via pull requests