C compiler written in Rust based on Sandler Nora's book "Writing a C Compiler". The project is just the the preprocessed C to ASM compiler. Preprocessor and Linker comes from gcc.
After implementing the book's subset of C, the language will be extended with cool stuff like:
- ADTs and pattern matching
- Go-like methods for structs
- A simple rust like module system
And a bunch of extra optimizations.
The final goal is writing and compiling a simple xv6-like OS.
TODO:
- Chapter 10 semantic analysis.
Backlog:
- Proper error reporting system, currently just panics with a message and a backtrace.
program = Program(declaration*)
declaration = FunDecl(function_declaration) | VarDecl(variable_declaration)
variable_declaration = (identifier name, exp? init, storage_class?)
function_declaration = (identifier name, identifier* params,
block? body, storage_class?)
storage_class = Static | Extern
block = Block(block_item*)
block_item = S(statement) | D(declaration)
for_init = InitDecl(variable_declaration) | InitExp(exp?)
statement = Return(exp)
| Expression(exp)
| If(exp condition, statement then, statement? else)
| Compound(block)
| Break(identifier label)
| Continue(identifier label)
| While(exp condition, statement body, identifier label)
| DoWhile(statement body, exp condition, identifier label)
| For(for_init init, exp? condition, exp? post, statement body, identifier label)
| Null
exp = Constant(int)
| Var(identifier)
| Unary(unary_operator, exp)
| Binary(binary_operator, exp, exp)
| Assignment(exp, exp)
| Conditional(exp condition, exp, exp)
| FunctionCall(identifier, exp* args)
unary_operator = Complement | Negate | Not
binary_operator = Add | Subtract | Multiply | Divide | Remainder | And | Or
| Equal | NotEqual | LessThan | LessOrEqual
| GreaterThan | GreaterOrEqual
Loop related statements are annotated in the semantic analysis pass.
<program> ::= { <declaration> }
<declaration> ::= <variable-declaration> | <function-declaration>
<variable-declaration> ::= { <specifier> }+ <identifier> [ "=" <exp> ] ";"
<function-declaration> ::= { <specifier> }+ <identifier> "(" <param-list> ")" ( <block> | ";")
<param-list> ::= "void" | "int" <identifier> { "," "int" <identifier> }
<specifier> ::= "int" | "static" | "extern"
<block> ::= "{" { <block-item> } "}"
<block-item> ::= <statement> | <declaration>
<declaration> ::= "int" <identifier> [ "=" <exp> ] ";"
<for-init> ::= <variable-declaration> | [ <exp> ] ";"
<statement> ::= "return" <exp> ";"
| <exp> ";"
| "if" "(" <exp> ")" <statement> [ "else" <statement> ]
| <block>
| "break" ";"
| "continue" ";"
| "while" "(" <exp> ")" <statement>
| "do" <statement> "while" "(" <exp> ")" ";"
| "for" "(" <for-init> [ <exp> ] ";" [ <exp> ] ")" <statement>
| ";"
<exp> ::= <factor> | <exp> <binop> <exp> | <exp> "?" <exp> ":" <exp>
<factor> ::= <int> | <identifier> | <unop> <factor> | "(" <exp> ")"
| <identifier> "(" [ <argument-list> ] ")"
<argument-list> ::= <exp> { "," <exp> }
<unop> ::= "-" | "~" | "!"
<binop> ::= "-" | "+" | "*" | "/" | "%" | "&&" | "||"
| "==" | "!=" | "<" | "<=" | ">" | ">=" | "="
<identifier> ::= ? An identifier token ?
<int> ::= ? A constant token ?
- Map each variable name to a unique value and adds to symbol table
- Check that variable assignments have valid left expressions (Var(String))
- Check that all variables are defined in their scope
- Check that variable declarations are not repeated in their scope
- Check that all function calls refer to declared functions in their scope
- Check that functions and variables are not declared with SAME name in the SAME scope
- Check that definitions of functions do not live within other functions.
- Annotate loop nodes in the ast with a unique identifier for each corresponding loop
- Check that break and continue statements live within loops
- Check that function declarations are consistent everywhere, and adds name to symbol table
- A function can't be called with the wrong number of arguments
- A function can't be defined more than once (not really type checking but easy to implement here)
program = Program(function_definition*)
function_definition = Function(identifier, identifier* params, instruction* body)
instruction = Return(val)
| Unary(unary_operator, val src, val dst)
| Binary(binary_operator, val src1, val src2, val dst)
| Copy(val src, val dst)
| Jump(identifier target)
| JumpIfZero(val condition, identifier target)
| JumpIfNotZero(val condition, identifier target)
| Label(identifier)
| FunCall(identifier fun_name, val* args, val dst)
val = Constant(int) | Var(identifier)
unary_operator = Complement | Negate | Not
binary_operator = Add | Subtract | Multiply | Divide | Remainder | Equal | NotEqual
| LessThan | LessOrEqual | GreaterThan | GreaterOrEqual
program = Program(function_definition*)
function_definition = Function(identifier name, instruction* instructions)
instruction = Mov(operand src, operand dst)
| Unary(unary_operator, operand)
| Binary(binary_operator, operand, operand)
| Cmp(operand, operand)
| Idiv(operand)
| Cdq
| Jmp(identifier)
| JmpCC(cond_code, identifier)
| SetCC(cond_code, operand)
| Label(identifier)
| AllocateStack(int)
| DeallocateStack(int)
| Push(operand)
| Call(identifier)
| Ret
unary_operator = Neg | Not
binary_operator = Add | Sub | Mult
operand = Imm(int) | Reg(reg) | Pseudo(identifier) | Stack(int)
cond_code = E | NE | G | GE | L | LE
reg = AX | CX | DX | DI | SI | R8 | R9 | R10 | R11
. System V 64 bit calling ABI is implemented in this pass: - input/output regs - callee/caller saved regs - caller handles arg cleanup - 16 byte aligned before call.
- Convert Tacky to ASM, without register allocation (using Pseudo(identifier) for variables).
- Copies arguments to pseudo variables at the start of each function, rather than using the corresponding registers/stack.
- Replace Pseudo(identifier) with Stack(int) for variables, and Reg(reg) for temps.
- Fix up instructions so that src and dst operands are not both memory addresses
- Allocate stack space for each function, rounded up to a multiple of 16 bytes for alignment.