Desktop tool to tokenize (lexical analysis) and validate syntactic structure (partial/subset) of C code using PLY (lex/yacc) with a Tkinter GUI.
Note: the syntax analysis targets a subset of the C language (declarations, expressions, control flow, etc.). Real-world C code may contain constructs not yet supported and may produce syntax errors (this is expected).
- Reserved keywords (if/else/switch/for/while, data types, storage classes, etc.)
- Arithmetic, relational, logical, and bitwise operators
- Literals: integers, decimals, char, string
- Preprocessor directives: `#include`, `#define`, `#undef`
- Headers with `<...>` and `"..."` (e.g., `#include <stdio.h>`, `#include "util.h"`)
- Comments: `//` and `/* ... */`
- Basic preprocessor directives (`#include`, `#define`, `#undef`)
- Control structures: `if/else`, `for`, `while`, `switch/case/default`
- Declarations and assignments (`int`/`float`/`double`/`char`)
- Math operations and comparisons
- Arrays (declaration/assignment in subset)
- Functions (declaration/calls in subset)
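As a rough illustration of what the lexical stage does, a keyword-aware tokenizer can be sketched with the standard library alone. The real lexer lives in `analyzers/lexicon.py` and is built with PLY, so the token names and rules below are assumptions, not the project's exact definitions:

```python
import re

# Simplified sketch of the lexical stage; the real PLY lexer in
# analyzers/lexicon.py may use different token names and rules.
RESERVED = {"if", "else", "switch", "for", "while", "int", "float",
            "double", "char", "static", "return"}

TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+\.\d+|\d+)          # integer and decimal literals
  | (?P<ID>[A-Za-z_]\w*)              # identifiers (or keywords)
  | (?P<STRING>"[^"\n]*")             # string literals
  | (?P<OP>==|!=|<=|>=|&&|\|\||[-+*/%<>=&|^!;(),{}])
  | (?P<WS>\s+)                       # whitespace, skipped
""", re.VERBOSE)

def tokenize(code):
    tokens = []
    for m in TOKEN_RE.finditer(code):
        kind = m.lastgroup
        if kind == "WS":
            continue
        value = m.group()
        if kind == "ID" and value in RESERVED:
            kind = value.upper()        # keywords get their own token type
        tokens.append((kind, value))
    return tokens

print(tokenize("int x = 41 + 1;"))
# → [('INT', 'int'), ('ID', 'x'), ('OP', '='), ('NUMBER', '41'), ('OP', '+'), ('NUMBER', '1'), ('OP', ';')]
```

PLY works the same way in spirit: regex rules per token type, with a reserved-word lookup so keywords are not emitted as plain identifiers.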
- Left panel: C code input
- Bottom-left: lexical analyzer output
- Bottom-right: syntax analyzer output
- Buttons: analyze lexicon / syntax / both / clear
- Recent improvements:
  - Strip trailing `//` comments before sending lines to the parser
  - Skip empty lines and some unsupported constructs (e.g., `typedef`, `static`) to reduce false positives
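The comment-stripping step can be sketched like this (a simplified stand-in for the GUI's preprocessing; the real implementation may differ). It avoids cutting a `//` that appears inside a string literal:

```python
def strip_line_comment(line):
    """Remove a trailing // comment, ignoring // inside string literals.

    Simplified sketch of the preprocessing applied before handing each
    line to the parser; the project's actual code may differ.
    """
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == '"' and (i == 0 or line[i - 1] != "\\"):
            in_string = not in_string
        elif ch == "/" and not in_string and line[i:i + 2] == "//":
            return line[:i].rstrip()
        i += 1
    return line

print(strip_line_comment('int x = 1; // counter'))  # → int x = 1;
print(strip_line_comment('char *s = "a//b";'))      # prints the line unchanged
```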
- Python 3.10+ (recommended)
- Main dependency: `ply`
On Linux, if Tkinter is missing:
- Debian/Ubuntu: `sudo apt-get install python3-tk`
- Fedora: `sudo dnf install python3-tkinter`
```bash
git clone https://github.com/dpaulsoria/c-analyzer.git
cd c-analyzer
python -m venv .venv
# Windows:
.venv\Scripts\activate
# Linux/Mac:
source .venv/bin/activate
python -m pip install -U pip
pip install -e .
# Optional: dev extras
pip install -e ".[dev]"
```

Run the GUI:

```bash
python -m ui.app
# or
python ui/app.py
```

- Paste your C code into INPUT
- Click:
  - Analyze Lexicon to see tokens
  - Analyze Syntax to validate structure (subset)
  - Analyze both to run both analyzers
- Review results:
  - Lexicon Analyzer prints `LexToken(...)`
  - Syntax Analyzer prints a tree-like output or `Syntax error...`
```
c-analyzer/
├─ ui/
│  ├─ __init__.py
│  └─ app.py          # (was screen.py) main GUI
├─ analyzers/
│  ├─ __init__.py
│  ├─ lexicon.py      # Lexer (PLY)
│  └─ syntax.py       # Parser (PLY)
├─ assets/            # Resources (if any)
├─ pyproject.toml     # Modern packaging/install config
└─ README.md
```
The lexer treats quoted includes as valid headers (e.g., `"util.h"`).
This avoids tokenizing `#include "util.h"` as a generic STRING in contexts where you want it treated like a header.
Strings like `"%s"` can be recognized as format specifiers (`FS_STRING`, etc.) when applicable.
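A sketch of how such format strings could be detected (the `FS_STRING` token itself is defined in `analyzers/lexicon.py`; this regex is an assumption, not the project's exact rule):

```python
import re

# Hypothetical helper: classify a string literal as a format string when
# it contains printf-style specifiers. The real lexer's rule for the
# FS_STRING token may be defined differently.
FORMAT_SPEC = re.compile(r"%[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|L)?[diouxXeEfgGcs]")

def classify_string(literal):
    body = literal.strip('"')
    return "FS_STRING" if FORMAT_SPEC.search(body) else "STRING"

print(classify_string('"%s"'))           # → FS_STRING
print(classify_string('"hello"'))        # → STRING
print(classify_string('"%-8.2f done"'))  # → FS_STRING
```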
Real C code may include:
- prototypes with `static`, `typedef`, complex pointers
- advanced structs/unions
- complex macros
- multi-line declarations and flexible formatting
If your goal is to validate full C, you'll need to extend the grammar in `analyzers/syntax.py`.
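The real grammar is expressed as PLY yacc rules in `analyzers/syntax.py`. As an illustration only of the kind of rule involved, here is a hand-rolled check for one subset production, declaration → type ID `=` expr `;` (the token shapes are assumptions, not the project's actual grammar functions):

```python
# Illustration only: the real grammar lives in analyzers/syntax.py as
# PLY yacc rules. This mirrors one production, declaration ->
# type ID '=' expr ';', to show what "extending the grammar" means.
TYPES = {"int", "float", "double", "char"}

def is_declaration(tokens):
    """tokens: list of (kind, value) pairs, e.g. from a lexer."""
    if len(tokens) < 3 or tokens[0][1] not in TYPES:
        return False
    if tokens[1][0] != "ID" or tokens[-1][1] != ";":
        return False
    # Either `type ID ;` or `type ID = <at least one token> ;`
    if len(tokens) == 3:
        return True
    return tokens[2][1] == "=" and len(tokens) > 4

toks = [("TYPE", "int"), ("ID", "x"), ("OP", "="), ("NUM", "5"), ("OP", ";")]
print(is_declaration(toks))  # → True
```

In PLY, the equivalent is a `p_declaration` function whose docstring carries the production; supporting `static` or pointers means adding alternatives to that production rather than writing checks by hand.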
The subset grammar doesn't integrate comments everywhere.
Applied fix in the GUI: it strips trailing `// ...` before parsing each line.
Solved by making packaging explicit in pyproject.toml and adding __init__.py so directories are proper Python packages.
- Better handling for prototypes: `static`, `typedef`, pointers, and `struct`
- Block-based parsing `{ ... }` (not only line-by-line)
- Better visual feedback: highlight invalid token / exact line
- Export results to `.txt` or `.json`
- A cleaner tree/AST view
- Paul Soria
- Gabriela Ramos
- Juan Xavier Pita

