From ed20d939e3e0b08f5a7c6992b767d3223abc53b6 Mon Sep 17 00:00:00 2001 From: rahulc0dy Date: Thu, 4 Jun 2026 11:30:08 +0530 Subject: [PATCH 1/5] Add formal SQL grammar specification in markdown, BNF, and EBNF formats --- docs/grammar.md | 258 ++++++++++++++++++++++++++++++++++ docs/sql-grammar/grammar.bnf | 191 +++++++++++++++++++++++++ docs/sql-grammar/grammar.ebnf | 96 +++++++++++++ 3 files changed, 545 insertions(+) create mode 100644 docs/sql-grammar/grammar.bnf create mode 100644 docs/sql-grammar/grammar.ebnf diff --git a/docs/grammar.md b/docs/grammar.md index e69de29..8391eb7 100644 --- a/docs/grammar.md +++ b/docs/grammar.md @@ -0,0 +1,258 @@ +# SQL Grammar and Syntax Specification + +This document provides a technical explanation of the PenguinDB SQL grammar. The complete grammar definitions can be obtained in the following formats: + +- Backus-Naur Form (BNF): [grammar.bnf](./sql-grammar/grammar.bnf) +- Extended Backus-Naur Form (EBNF): [grammar.ebnf](./sql-grammar/grammar.ebnf) + +--- + +## Case Insensitivity + +All SQL keywords and unquoted identifiers are case-insensitive. For example, keywords such as `SELECT`, `select`, and `SeLeCt` are evaluated identically. Similarly, unquoted table, column, and database names are resolved case-insensitively. String literals enclosed in single quotes preserve their exact character casing. + +--- + +## Statement Entry Points + +### Statement and Manipulation Statement + +```ebnf +Statement ::= ManipulationStatement ';' + +ManipulationStatement ::= DbManipulationStatement + | TableManipulationStatement + | DataManipulationStatement +``` + +- **Statement**: Defines the ultimate entry point of the parser. It requires a manipulation statement followed by a terminating semicolon, representing a single complete command. +- **ManipulationStatement**: Categorizes the types of executable operations into database, table, and data manipulation rules. 
This routing assists the parser in delegating subsequent token streams to specific handlers. + +--- + +## Database Manipulation + +### Database Manipulation Statement + +```ebnf +DbManipulationStatement ::= ( 'CREATE' | 'DROP' ) 'DATABASE' Identifier | 'USE' Identifier +``` + +- **CREATE DATABASE / DROP DATABASE**: Defines syntax for creating new databases or deleting existing ones. The parser uses this to trigger file-system level database directory creation or deletion. +- **USE**: Sets the active database context for the current session. Any subsequent table queries will assume this database scope unless explicitly overridden. + +--- + +## Table Schema Manipulation + +### Create Table Statement + +```ebnf +CreateTableStatement ::= 'CREATE' 'TABLE' Identifier '(' ColumnDefinition ( ',' ColumnDefinition )* ')' +``` + +- **CreateTableStatement**: Governs schema definition for new tables. It requires a table name (Identifier) followed by a comma-separated list of column definitions enclosed in parentheses. The parser uses this to build the catalog schema. + +### Alter Table Statement + +```ebnf +AlterTableStatement ::= 'ALTER' 'TABLE' Identifier AlterAction +AlterAction ::= ( 'ADD' | 'MODIFY' ) 'COLUMN'? ColumnDefinition + | 'RENAME' ( 'TO' Identifier | 'COLUMN' Identifier 'TO' Identifier ) + | 'DROP' 'COLUMN' Identifier +``` + +- **AlterTableStatement**: Governs DDL modifications to existing tables. +- **AlterAction**: Defines sub-commands for table modification: + - `ADD` or `MODIFY`: Adds new columns or alters data types and constraints on existing columns. + - `RENAME`: Renames the table or a specific column. + - `DROP COLUMN`: Drops a column from the schema, signaling the storage layer to purge or ignore the associated data. + +### Drop Table Statement + +```ebnf +DropTableStatement ::= 'DROP' 'TABLE' Identifier +``` + +- **DropTableStatement**: Governs table deletion syntax. 
The execution of this statement instructs the storage engine to drop table files and clean up catalog metadata. + +--- + +## Column Definition and Constraints + +### Column Definition and Constraints + +```ebnf +ColumnDefinition ::= Identifier DataType ColumnConstraints? +ColumnConstraints ::= KeyConstraint + | NullConstraint + | DefaultConstraint + | KeyConstraint NullConstraint DefaultConstraint + | KeyConstraint DefaultConstraint NullConstraint + | NullConstraint KeyConstraint DefaultConstraint + | NullConstraint DefaultConstraint KeyConstraint + | DefaultConstraint KeyConstraint NullConstraint + | DefaultConstraint NullConstraint KeyConstraint + | KeyConstraint NullConstraint + | NullConstraint KeyConstraint + | KeyConstraint DefaultConstraint + | DefaultConstraint KeyConstraint + | NullConstraint DefaultConstraint + | DefaultConstraint NullConstraint + +KeyConstraint ::= 'PRIMARY' 'KEY' | 'UNIQUE' +NullConstraint ::= 'NOT' 'NULL' +DefaultConstraint ::= 'DEFAULT' SignedLiteral +SignedLiteral ::= Literal | '-' NumericLiteral +``` + +- **ColumnDefinition**: Associates a column name (Identifier) with a concrete data type and optional constraints. +- **ColumnConstraints**: Models combinations of column constraints. Enumerating every permutation explicitly in the grammar lets the parser accept the key, null, and default constraints in any order, each at most once, without requiring custom AST post-validation logic. +- **KeyConstraint**: Configures uniqueness checks. `PRIMARY KEY` registers the column as the primary key of the table, while `UNIQUE` requires every value in the column to be distinct. +- **NullConstraint**: Sets nullability rules. `NOT NULL` prevents null values from being inserted. +- **DefaultConstraint**: Assigns a default value for the column when no value is provided during inserts. +- **SignedLiteral**: Allows a default value to be any literal, or a numeric literal preceded by a minus sign. The rule has no unary plus. + +--- + +## Data Manipulation + +### Select Statement + +```ebnf +SelectStatement ::= 'SELECT' SelectList 'FROM' Identifier WhereClause? 
LimitClause? +SelectList ::= '*' | SelectColumn ( ',' SelectColumn )* +SelectColumn ::= Expression ( 'AS' Identifier )? +``` + +- **SelectStatement**: Defines syntax for retrieving records from a target table. It processes projections, source tables, logical filters, and row limits. +- **SelectList**: Specifies target columns or expressions to project. A wildcard (`*`) denotes all columns. +- **SelectColumn**: Resolves to a column name or an expression, optionally bound to a display alias using `AS`. + +### Insert Statement + +```ebnf +InsertStatement ::= 'INSERT' 'INTO' Identifier + ( '(' Identifier ( ',' Identifier )* ')' )? + 'VALUES' ValueRow ( ',' ValueRow )* +ValueRow ::= '(' Expression ( ',' Expression )* ')' +``` + +- **InsertStatement**: Specifies the insert interface. It takes a destination table, an optional list of target columns, and rows of values to append. +- **ValueRow**: A grouped tuple of expressions representing the values for a single record. + +### Update Statement + +```ebnf +UpdateStatement ::= 'UPDATE' Identifier 'SET' SetItem ( ',' SetItem )* WhereClause? +SetItem ::= Identifier '=' Expression +``` + +- **UpdateStatement**: Defines syntax for updating values in existing database rows. +- **SetItem**: A key-value pair associating a target column name with an expression. + +### Delete Statement + +```ebnf +DeleteStatement ::= 'DELETE' 'FROM' Identifier WhereClause? +``` + +- **DeleteStatement**: Outlines row deletion criteria. If a `WhereClause` is absent, it deletes all records from the target table. 
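The optional `WhereClause` in `DeleteStatement` can be illustrated with a minimal Python sketch. It assumes rows are held as in-memory dictionaries and that the parsed condition is available as a callable; `delete_rows` and that row representation are hypothetical names chosen for illustration, not part of PenguinDB.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]  # hypothetical in-memory row: column name -> value


def delete_rows(rows: List[Row],
                where: Optional[Callable[[Row], bool]] = None) -> List[Row]:
    """Return the rows that survive a DELETE.

    `where=None` models a statement with no WhereClause.
    """
    if where is None:
        # No WHERE clause: every record in the table is deleted.
        return []
    # Keep only the rows for which the condition does not hold.
    return [row for row in rows if not where(row)]


users = [{"id": 1, "age": 30}, {"id": 2, "age": 17}]

# DELETE FROM users WHERE age < 18;
remaining = delete_rows(users, lambda row: row["age"] < 18)

# DELETE FROM users;
emptied = delete_rows(users)
```

Returning a new list rather than mutating the input keeps the sketch easy to test; a real storage engine would more likely mark or remove records in place.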
+ +--- + +## Query Clauses and Modifiers + +### Where Clause + +```ebnf +WhereClause ::= 'WHERE' Condition +Condition ::= OrCondition +OrCondition ::= AndCondition ( 'OR' AndCondition )* +AndCondition ::= NotCondition ( 'AND' NotCondition )* +NotCondition ::= ConditionPrimary | 'NOT' NotCondition +ConditionPrimary ::= Predicate | '(' Condition ')' +Predicate ::= Expression ComparisonOperator Expression +ComparisonOperator ::= '=' | '!=' | '<>' | '<' | '>' | '<=' | '>=' +``` + +- **WhereClause**: Restricts the records processed by DML statements based on logical conditions. +- **Condition / OrCondition / AndCondition / NotCondition**: Implements a logical expression parser. Splitting these levels establishes operator precedence for boolean logic, ensuring `AND` binds tighter than `OR` and `NOT` binds tighter than `AND`. +- **ConditionPrimary**: Encapsulates atomic predicates or nested conditional expressions inside parentheses to override precedence. +- **Predicate**: Performs value comparisons. +- **ComparisonOperator**: Matches standard comparison symbols for equality, inequality, and order. + +### Limit Clause + +```ebnf +LimitClause ::= 'LIMIT' IntegerLiteral +``` + +- **LimitClause**: Sets a maximum limit on the number of records returned. + +--- + +## Expressions and Operations + +### Expression, Term, and Factor + +```ebnf +Expression ::= Term ( ( '+' | '-' ) Term )* +Term ::= Factor ( ( '*' | '/' | '%' ) Factor )* +Factor ::= Literal | Identifier | '(' Expression ')' | '-' Factor +``` + +- **Expression / Term / Factor**: Configures mathematical order of operations: + - `Factor` processes base operands, parenthesized expressions, and negative signs. + - `Term` evaluates multiplicative operations (`*`, `/`, `%`) which take precedence over additive operations. + - `Expression` evaluates additive operations (`+`, `-`). 
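The layering of `Expression`, `Term`, and `Factor` maps directly onto a recursive-descent parser, one function per rule. The following Python sketch evaluates numeric expressions only. It is an illustration under stated assumptions, not PenguinDB's parser: identifiers and non-numeric literals are omitted, `/` follows Python's true division, and the names `tokenize`, `ExpressionParser`, and `evaluate` are hypothetical.

```python
import re

# Numeric literals and the operator symbols used by Expression/Term/Factor.
TOKEN = re.compile(r"\s*(\d+\.\d+|\d+|[-+*/%()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class ExpressionParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def expression(self):
        # Expression ::= Term ( ( '+' | '-' ) Term )*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # Term ::= Factor ( ( '*' | '/' | '%' ) Factor )*
        value = self.factor()
        while self.peek() in ("*", "/", "%"):
            op = self.take()
            rhs = self.factor()
            if op == "*":
                value = value * rhs
            elif op == "/":
                value = value / rhs  # assumption: true division
            else:
                value = value % rhs
        return value

    def factor(self):
        # Factor ::= Literal | '(' Expression ')' | '-' Factor
        # (the Identifier alternative is omitted in this sketch)
        token = self.take()
        if token == "-":
            return -self.factor()
        if token == "(":
            value = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token[0].isdigit():
            return float(token) if "." in token else int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    parser = ExpressionParser(tokenize(text))
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Because `term` is called from inside `expression`, multiplicative operators are grouped before additive ones, and the `while` loops make operators at the same level left-associative, so `evaluate("10 - 4 - 3")` is computed as `(10 - 4) - 3`.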
+ +--- + +## Data Types + +### Data Type + +```ebnf +DataType ::= 'INT' + | 'BIGINT' + | 'VARCHAR' '(' IntegerLiteral ')' + | 'BOOLEAN' + | 'TEXT' + | 'TIMESTAMP' +``` + +- **DataType**: Enumerates the supported column data types: a length-bounded string (`VARCHAR` with an explicit integer length), the unparameterized types `INT`, `BIGINT`, `BOOLEAN`, and `TEXT`, and a date/time type (`TIMESTAMP`). + +--- + +## Lexical Rules + +### Identifier and Literals + +```ebnf +Identifier ::= Letter ( Letter | Digit | '_' )* + +Literal ::= NumericLiteral | StringLiteral | BooleanLiteral | NullLiteral +NullLiteral ::= 'NULL' +BooleanLiteral ::= 'TRUE' | 'FALSE' +NumericLiteral ::= IntegerLiteral | FloatLiteral +IntegerLiteral ::= Digit+ +FloatLiteral ::= Digit+ '.' Digit+ +StringLiteral ::= "'" Character+ "'" + +Letter ::= LowercaseLetter | UppercaseLetter +LowercaseLetter ::= 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' + | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' +UppercaseLetter ::= 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' + | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' +Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' +Character ::= Letter | Digit | '_' | ' ' | '-' | '@' | '.' +``` + +- **Identifier**: Governs database, table, and column names. They must begin with a letter and can include letters, digits, and underscores. +- **Literal**: Denotes fixed data values. +- **NullLiteral / BooleanLiteral**: Captures SQL boolean flags (`TRUE`/`FALSE`) and the missing data flag (`NULL`). +- **NumericLiteral / IntegerLiteral / FloatLiteral**: Defines unsigned numeric literals. An integer is one or more digits, and a float requires at least one digit on each side of the decimal point. +- **StringLiteral**: Matches a single-quoted sequence of one or more characters drawn from `Character`. As written, the rule admits neither an empty string nor an embedded single quote. +- **Letter / Digit / Character**: Fundamental character sets allowed within identifiers and string values. 
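The lexical rules above, together with the case-insensitivity rule, can be sketched as a small regex-based tokenizer in Python. This is a hedged illustration rather than the project's lexer: the `KEYWORDS` set is a partial subset chosen for the example, folding identifiers to lower case is an assumption about how case-insensitive resolution might be implemented, and `TOKEN_SPEC` and `tokenize` are hypothetical names.

```python
import re

# Partial keyword set, for illustration only.
KEYWORDS = {"SELECT", "FROM", "WHERE", "LIMIT", "AS",
            "AND", "OR", "NOT", "NULL", "TRUE", "FALSE"}

# Order matters: FLOAT is tried before INTEGER so "3.14" is one token.
TOKEN_SPEC = [
    ("FLOAT", r"[0-9]+\.[0-9]+"),             # Digit+ '.' Digit+
    ("INTEGER", r"[0-9]+"),                   # Digit+
    ("STRING", r"'[A-Za-z0-9_ \-@.]+'"),      # "'" Character+ "'"
    ("IDENT", r"[A-Za-z][A-Za-z0-9_]*"),      # Letter ( Letter | Digit | '_' )*
    ("SYMBOL", r"<=|>=|!=|<>|[=<>+\-*/%(),;]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))


def tokenize(sql):
    tokens, pos = [], 0
    while pos < len(sql):
        match = MASTER.match(sql, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {sql[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "IDENT":
            if text.upper() in KEYWORDS:
                # Keywords are case-insensitive: normalize to upper case.
                kind, text = "KEYWORD", text.upper()
            else:
                # Assumption: unquoted identifiers are folded to lower case.
                text = text.lower()
        # String literals keep their exact casing.
        tokens.append((kind, text))
    return tokens
```

With this sketch, `SeLeCt` and `select` produce the same `KEYWORD` token while `'New York'` keeps its casing, and an empty literal `''` is rejected because `StringLiteral` requires at least one `Character`.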
diff --git a/docs/sql-grammar/grammar.bnf b/docs/sql-grammar/grammar.bnf new file mode 100644 index 0000000..d6596f1 --- /dev/null +++ b/docs/sql-grammar/grammar.bnf @@ -0,0 +1,191 @@ + ::= + ::= | | + + ::= + ::= | | + ::= | | | + + ::= | | + + ::= + ::= | + + ::= + ::= | | | + ::= | + ::= | + ::= + ::= | + ::= + + ::= + + ::= | + + ::= + | + | + | + | + | + | + | + | + | + | + | + | + | + | + + ::= | + ::= + ::= + + ::= | + + ::= + | + | + | + + ::= | + ::= | + ::= | + + ::= + | + + ::= | + ::= | + ::= + ::= | + + ::= + | + + ::= | + ::= + + ::= + | + + ::= + ::= + ::= | + ::= | + ::= | + ::= | + ::= + + ::= + | + | + | + | + | + + ::= + + ::= | | + ::= | | | + ::= | | | + + ::= + | + | + | + | + | + + ::= | + ::= | | + | | | + + ::= | | | + ::= + ::= | + ::= | + ::= + ::= + ::= | + ::= + ::= | + + ::= | + ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" + | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" + ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" + | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z" + ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" + ::= | | "_" | " " | "-" | "@" | "." + + ::= "CREATE" + ::= "DATABASE" + ::= "SCHEMA" + ::= "USE" + ::= "DROP" + + ::= "TABLE" + ::= "ALTER" + ::= "ADD" + ::= "COLUMN" + ::= "MODIFY" + ::= "RENAME" + ::= "TO" + + ::= "SELECT" + ::= "FROM" + ::= "WHERE" + ::= "LIMIT" + ::= "AS" + ::= "INSERT" + ::= "INTO" + ::= "VALUES" + ::= "UPDATE" + ::= "SET" + ::= "DELETE" + + ::= "PRIMARY" + ::= "KEY" + ::= "NOT" + ::= "NULL" + ::= "DEFAULT" + ::= "UNIQUE" + + ::= "AND" + ::= "OR" + ::= "TRUE" + ::= "FALSE" + + ::= "INT" + ::= "BIGINT" + ::= "VARCHAR" + ::= "BOOLEAN" + ::= "TEXT" + ::= "TIMESTAMP" + + ::= "(" + ::= ")" + ::= "," + ::= "." 
+ ::= ";" + ::= "'" + ::= "\"" + + ::= "=" + ::= "!=" | "<>" + ::= "<" + ::= ">" + ::= "<=" + ::= ">=" + + ::= "+" + ::= "-" + ::= "*" + ::= "/" + ::= "%" + + ::= "_" + + ::= ";" \ No newline at end of file diff --git a/docs/sql-grammar/grammar.ebnf b/docs/sql-grammar/grammar.ebnf new file mode 100644 index 0000000..b310d2f --- /dev/null +++ b/docs/sql-grammar/grammar.ebnf @@ -0,0 +1,96 @@ +Statement ::= ManipulationStatement ';' + +ManipulationStatement ::= DbManipulationStatement + | TableManipulationStatement + | DataManipulationStatement + +DbManipulationStatement ::= ( 'CREATE' | 'DROP' ) 'DATABASE' Identifier | 'USE' Identifier +TableManipulationStatement ::= CreateTableStatement | AlterTableStatement | DropTableStatement +DataManipulationStatement ::= InsertStatement | SelectStatement | UpdateStatement | DeleteStatement + +CreateTableStatement ::= 'CREATE' 'TABLE' Identifier '(' ColumnDefinition ( ',' ColumnDefinition )* ')' + +AlterTableStatement ::= 'ALTER' 'TABLE' Identifier AlterAction +AlterAction ::= ( 'ADD' | 'MODIFY' ) 'COLUMN'? ColumnDefinition + | 'RENAME' ( 'TO' Identifier | 'COLUMN' Identifier 'TO' Identifier ) + | 'DROP' 'COLUMN' Identifier + +DropTableStatement ::= 'DROP' 'TABLE' Identifier + +ColumnDefinition ::= Identifier DataType ColumnConstraints? 
+ColumnConstraints ::= KeyConstraint + | NullConstraint + | DefaultConstraint + | KeyConstraint NullConstraint DefaultConstraint + | KeyConstraint DefaultConstraint NullConstraint + | NullConstraint KeyConstraint DefaultConstraint + | NullConstraint DefaultConstraint KeyConstraint + | DefaultConstraint KeyConstraint NullConstraint + | DefaultConstraint NullConstraint KeyConstraint + | KeyConstraint NullConstraint + | NullConstraint KeyConstraint + | KeyConstraint DefaultConstraint + | DefaultConstraint KeyConstraint + | NullConstraint DefaultConstraint + | DefaultConstraint NullConstraint + +KeyConstraint ::= 'PRIMARY' 'KEY' | 'UNIQUE' +NullConstraint ::= 'NOT' 'NULL' +DefaultConstraint ::= 'DEFAULT' SignedLiteral + +SignedLiteral ::= Literal | '-' NumericLiteral + +SelectStatement ::= 'SELECT' SelectList 'FROM' Identifier WhereClause? LimitClause? +SelectList ::= '*' | SelectColumn ( ',' SelectColumn )* +SelectColumn ::= Expression ( 'AS' Identifier )? + +InsertStatement ::= 'INSERT' 'INTO' Identifier + ( '(' Identifier ( ',' Identifier )* ')' )? + 'VALUES' ValueRow ( ',' ValueRow )* + +ValueRow ::= '(' Expression ( ',' Expression )* ')' + +UpdateStatement ::= 'UPDATE' Identifier 'SET' SetItem ( ',' SetItem )* WhereClause? +SetItem ::= Identifier '=' Expression + +DeleteStatement ::= 'DELETE' 'FROM' Identifier WhereClause? 
+ +WhereClause ::= 'WHERE' Condition +Condition ::= OrCondition +OrCondition ::= AndCondition ( 'OR' AndCondition )* +AndCondition ::= NotCondition ( 'AND' NotCondition )* +NotCondition ::= ConditionPrimary | 'NOT' NotCondition +ConditionPrimary ::= Predicate | '(' Condition ')' +Predicate ::= Expression ComparisonOperator Expression +ComparisonOperator ::= '=' | '!=' | '<>' | '<' | '>' | '<=' | '>=' + +LimitClause ::= 'LIMIT' IntegerLiteral + +Expression ::= Term ( ( '+' | '-' ) Term )* +Term ::= Factor ( ( '*' | '/' | '%' ) Factor )* +Factor ::= Literal | Identifier | '(' Expression ')' | '-' Factor + +DataType ::= 'INT' + | 'BIGINT' + | 'VARCHAR' '(' IntegerLiteral ')' + | 'BOOLEAN' + | 'TEXT' + | 'TIMESTAMP' + +Identifier ::= Letter ( Letter | Digit | '_' )* + +Literal ::= NumericLiteral | StringLiteral | BooleanLiteral | NullLiteral +NullLiteral ::= 'NULL' +BooleanLiteral ::= 'TRUE' | 'FALSE' +NumericLiteral ::= IntegerLiteral | FloatLiteral +IntegerLiteral ::= Digit+ +FloatLiteral ::= Digit+ '.' Digit+ +StringLiteral ::= "'" Character+ "'" + +Letter ::= LowercaseLetter | UppercaseLetter +LowercaseLetter ::= 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' + | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' +UppercaseLetter ::= 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' + | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' +Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' +Character ::= Letter | Digit | '_' | ' ' | '-' | '@' | '.' 
\ No newline at end of file From 1633a6336c3a1a4a9934fb4214eca98d924091e1 Mon Sep 17 00:00:00 2001 From: rahulc0dy Date: Thu, 4 Jun 2026 11:55:23 +0530 Subject: [PATCH 2/5] Remove ebnf and bnf files, Shift all grammars to grammar.md --- docs/grammar.md | 353 ++++++++++++++++++++-------------- docs/sql-grammar/grammar.bnf | 191 ------------------ docs/sql-grammar/grammar.ebnf | 96 --------- 3 files changed, 207 insertions(+), 433 deletions(-) delete mode 100644 docs/sql-grammar/grammar.bnf delete mode 100644 docs/sql-grammar/grammar.ebnf diff --git a/docs/grammar.md b/docs/grammar.md index 8391eb7..088b7d5 100644 --- a/docs/grammar.md +++ b/docs/grammar.md @@ -2,20 +2,212 @@ This document provides a technical explanation of the PenguinDB SQL grammar. The complete grammar definitions can be obtained in the following formats: -- Backus-Naur Form (BNF): [grammar.bnf](./sql-grammar/grammar.bnf) -- Extended Backus-Naur Form (EBNF): [grammar.ebnf](./sql-grammar/grammar.ebnf) - ---- - ## Case Insensitivity All SQL keywords and unquoted identifiers are case-insensitive. For example, keywords such as `SELECT`, `select`, and `SeLeCt` are evaluated identically. Similarly, unquoted table, column, and database names are resolved case-insensitively. String literals enclosed in single quotes preserve their exact character casing. 
---- +## BNF Grammar + +```bnf + ::= + ::= | | + + ::= | | + ::= | | + ::= | | | + + ::= + ::= + ::= + + ::= + ::= | + + ::= + ::= | | | + ::= | + ::= | + ::= + ::= | + ::= + + ::= + + ::= | + + ::= + | + | + | + | + | + | + | + | + | + | + | + | + | + | + + ::= | + ::= + ::= + + ::= | + + ::= + | + | + | + + ::= | + ::= | + ::= | + + ::= + | + + ::= | + ::= | + ::= + ::= | + + ::= + | + + ::= | + ::= + + ::= + | + + ::= + ::= + ::= | + ::= | + ::= | + ::= | + ::= + + ::= + | + | + | + | + | + + ::= + + ::= | | + ::= | | | + ::= | | | + + ::= + | + | + | + | + | + + ::= | + ::= | | + | | | + + ::= | | | + ::= + ::= | + ::= | + ::= + ::= + ::= | + ::= + ::= | + + ::= | + ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" + | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" + ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" + | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z" + ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" + ::= | | "_" | " " | "-" | "@" | "." + + ::= "CREATE" + ::= "DATABASE" + ::= "SCHEMA" + ::= "USE" + ::= "DROP" + + ::= "TABLE" + ::= "ALTER" + ::= "ADD" + ::= "COLUMN" + ::= "MODIFY" + ::= "RENAME" + ::= "TO" + + ::= "SELECT" + ::= "FROM" + ::= "WHERE" + ::= "LIMIT" + ::= "AS" + ::= "INSERT" + ::= "INTO" + ::= "VALUES" + ::= "UPDATE" + ::= "SET" + ::= "DELETE" + + ::= "PRIMARY" + ::= "KEY" + ::= "NOT" + ::= "NULL" + ::= "DEFAULT" + ::= "UNIQUE" + + ::= "AND" + ::= "OR" + ::= "TRUE" + ::= "FALSE" + + ::= "INT" + ::= "BIGINT" + ::= "VARCHAR" + ::= "BOOLEAN" + ::= "TEXT" + ::= "TIMESTAMP" + + ::= "(" + ::= ")" + ::= "," + ::= "." 
+ ::= ";" + ::= "'" + ::= "\"" + + ::= "=" + ::= "!=" | "<>" + ::= "<" + ::= ">" + ::= "<=" + ::= ">=" + + ::= "+" + ::= "-" + ::= "*" + ::= "/" + ::= "%" + + ::= "_" + + ::= ";" -## Statement Entry Points +``` -### Statement and Manipulation Statement +## EBNF Form + +A more readable EBNF form of the grammar is given below: ```ebnf Statement ::= ManipulationStatement ';' @@ -23,66 +215,20 @@ Statement ::= ManipulationStatement ';' ManipulationStatement ::= DbManipulationStatement | TableManipulationStatement | DataManipulationStatement -``` -- **Statement**: Defines the ultimate entry point of the parser. It requires a manipulation statement followed by a terminating semicolon, representing a single complete command. -- **ManipulationStatement**: Categorizes the types of executable operations into database, table, and data manipulation rules. This routing assists the parser in delegating subsequent token streams to specific handlers. - ---- - -## Database Manipulation - -### Database Manipulation Statement - -```ebnf DbManipulationStatement ::= ( 'CREATE' | 'DROP' ) 'DATABASE' Identifier | 'USE' Identifier -``` - -- **CREATE DATABASE / DROP DATABASE**: Defines syntax for creating new databases or deleting existing ones. The parser uses this to trigger file-system level database directory creation or deletion. -- **USE**: Sets the active database context for the current session. Any subsequent table queries will assume this database scope unless explicitly overridden. - ---- - -## Table Schema Manipulation - -### Create Table Statement +TableManipulationStatement ::= CreateTableStatement | AlterTableStatement | DropTableStatement +DataManipulationStatement ::= InsertStatement | SelectStatement | UpdateStatement | DeleteStatement -```ebnf CreateTableStatement ::= 'CREATE' 'TABLE' Identifier '(' ColumnDefinition ( ',' ColumnDefinition )* ')' -``` - -- **CreateTableStatement**: Governs schema definition for new tables. 
It requires a table name (Identifier) followed by a comma-separated list of column definitions enclosed in parentheses. The parser uses this to build the catalog schema. -### Alter Table Statement - -```ebnf AlterTableStatement ::= 'ALTER' 'TABLE' Identifier AlterAction AlterAction ::= ( 'ADD' | 'MODIFY' ) 'COLUMN'? ColumnDefinition | 'RENAME' ( 'TO' Identifier | 'COLUMN' Identifier 'TO' Identifier ) | 'DROP' 'COLUMN' Identifier -``` - -- **AlterTableStatement**: Governs DDL modifications to existing tables. -- **AlterAction**: Defines sub-commands for table modification: - - `ADD` or `MODIFY`: Adds new columns or alters data types and constraints on existing columns. - - `RENAME`: Renames the table or a specific column. - - `DROP COLUMN`: Drops a column from the schema, signaling the storage layer to purge or ignore the associated data. - -### Drop Table Statement -```ebnf DropTableStatement ::= 'DROP' 'TABLE' Identifier -``` -- **DropTableStatement**: Governs table deletion syntax. The execution of this statement instructs the storage engine to drop table files and clean up catalog metadata. - ---- - -## Column Definition and Constraints - -### Column Definition and Constraints - -```ebnf ColumnDefinition ::= Identifier DataType ColumnConstraints? ColumnConstraints ::= KeyConstraint | NullConstraint @@ -103,69 +249,24 @@ ColumnConstraints ::= KeyConstraint KeyConstraint ::= 'PRIMARY' 'KEY' | 'UNIQUE' NullConstraint ::= 'NOT' 'NULL' DefaultConstraint ::= 'DEFAULT' SignedLiteral -SignedLiteral ::= Literal | '-' NumericLiteral -``` - -- **ColumnDefinition**: Associates a column name (Identifier) with a concrete data type and optional constraints. -- **ColumnConstraints**: Models combinations of column constraints. Permuting these options explicitly in the grammar allows the parser to validate constraint ordering without requiring custom AST post-validation logic. -- **KeyConstraint**: Configures uniqueness checks. 
`PRIMARY KEY` registers the column as the primary key of the table, while `UNIQUE` enforces unique constraints. -- **NullConstraint**: Sets nullability rules. `NOT NULL` prevents null values from being inserted. -- **DefaultConstraint**: Assigns a default value for the column when no value is provided during inserts. -- **SignedLiteral**: Allows literal numbers and values to carry positive or negative signs. - ---- - -## Data Manipulation -### Select Statement +SignedLiteral ::= Literal | '-' NumericLiteral -```ebnf SelectStatement ::= 'SELECT' SelectList 'FROM' Identifier WhereClause? LimitClause? SelectList ::= '*' | SelectColumn ( ',' SelectColumn )* SelectColumn ::= Expression ( 'AS' Identifier )? -``` - -- **SelectStatement**: Defines syntax for retrieving records from a target table. It processes projections, source tables, logical filters, and row limits. -- **SelectList**: Specifies target columns or expressions to project. A wildcard (`*`) denotes all columns. -- **SelectColumn**: Resolves to a column name or an expression, optionally bound to a display alias using `AS`. - -### Insert Statement -```ebnf InsertStatement ::= 'INSERT' 'INTO' Identifier ( '(' Identifier ( ',' Identifier )* ')' )? 'VALUES' ValueRow ( ',' ValueRow )* -ValueRow ::= '(' Expression ( ',' Expression )* ')' -``` -- **InsertStatement**: Specifies the insert interface. It takes a destination table, an optional list of target columns, and rows of values to append. -- **ValueRow**: A grouped tuple of expressions representing the values for a single record. - -### Update Statement +ValueRow ::= '(' Expression ( ',' Expression )* ')' -```ebnf UpdateStatement ::= 'UPDATE' Identifier 'SET' SetItem ( ',' SetItem )* WhereClause? SetItem ::= Identifier '=' Expression -``` - -- **UpdateStatement**: Defines syntax for updating values in existing database rows. -- **SetItem**: A key-value pair associating a target column name with an expression. 
-### Delete Statement - -```ebnf DeleteStatement ::= 'DELETE' 'FROM' Identifier WhereClause? -``` - -- **DeleteStatement**: Outlines row deletion criteria. If a `WhereClause` is absent, it deletes all records from the target table. - ---- - -## Query Clauses and Modifiers - -### Where Clause -```ebnf WhereClause ::= 'WHERE' Condition Condition ::= OrCondition OrCondition ::= AndCondition ( 'OR' AndCondition )* @@ -174,63 +275,20 @@ NotCondition ::= ConditionPrimary | 'NOT' NotCondition ConditionPrimary ::= Predicate | '(' Condition ')' Predicate ::= Expression ComparisonOperator Expression ComparisonOperator ::= '=' | '!=' | '<>' | '<' | '>' | '<=' | '>=' -``` - -- **WhereClause**: Restricts the records processed by DML statements based on logical conditions. -- **Condition / OrCondition / AndCondition / NotCondition**: Implements a logical expression parser. Splitting these levels establishes operator precedence for boolean logic, ensuring `AND` binds tighter than `OR` and `NOT` binds tighter than `AND`. -- **ConditionPrimary**: Encapsulates atomic predicates or nested conditional expressions inside parentheses to override precedence. -- **Predicate**: Performs value comparisons. -- **ComparisonOperator**: Matches standard comparison symbols for equality, inequality, and order. -### Limit Clause - -```ebnf LimitClause ::= 'LIMIT' IntegerLiteral -``` - -- **LimitClause**: Sets a maximum limit on the number of records returned. - ---- - -## Expressions and Operations - -### Expression, Term, and Factor -```ebnf Expression ::= Term ( ( '+' | '-' ) Term )* Term ::= Factor ( ( '*' | '/' | '%' ) Factor )* Factor ::= Literal | Identifier | '(' Expression ')' | '-' Factor -``` - -- **Expression / Term / Factor**: Configures mathematical order of operations: - - `Factor` processes base operands, parenthesized expressions, and negative signs. - - `Term` evaluates multiplicative operations (`*`, `/`, `%`) which take precedence over additive operations. 
- - `Expression` evaluates additive operations (`+`, `-`). - ---- - -## Data Types -### Data Type - -```ebnf DataType ::= 'INT' | 'BIGINT' | 'VARCHAR' '(' IntegerLiteral ')' | 'BOOLEAN' | 'TEXT' | 'TIMESTAMP' -``` - -- **DataType**: Enforces field validation. It defines supported column datatypes, including variable-length strings (`VARCHAR` with explicit sizing constraint), fixed types (`INT`, `BIGINT`, `BOOLEAN`, `TEXT`), and date/time markers (`TIMESTAMP`). - ---- -## Lexical Rules - -### Identifier and Literals - -```ebnf Identifier ::= Letter ( Letter | Digit | '_' )* Literal ::= NumericLiteral | StringLiteral | BooleanLiteral | NullLiteral @@ -248,8 +306,11 @@ UppercaseLetter ::= 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | ' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' Character ::= Letter | Digit | '_' | ' ' | '-' | '@' | '.' + ``` +### Notes + - **Identifier**: Governs database, table, and column names. They must begin with a letter and can include letters, digits, and underscores. - **Literal**: Denotes fixed data values. - **NullLiteral / BooleanLiteral**: Captures SQL boolean flags (`TRUE`/`FALSE`) and the missing data flag (`NULL`). 
diff --git a/docs/sql-grammar/grammar.bnf b/docs/sql-grammar/grammar.bnf deleted file mode 100644 index d6596f1..0000000 --- a/docs/sql-grammar/grammar.bnf +++ /dev/null @@ -1,191 +0,0 @@ - ::= - ::= | | - - ::= - ::= | | - ::= | | | - - ::= | | - - ::= - ::= | - - ::= - ::= | | | - ::= | - ::= | - ::= - ::= | - ::= - - ::= - - ::= | - - ::= - | - | - | - | - | - | - | - | - | - | - | - | - | - | - - ::= | - ::= - ::= - - ::= | - - ::= - | - | - | - - ::= | - ::= | - ::= | - - ::= - | - - ::= | - ::= | - ::= - ::= | - - ::= - | - - ::= | - ::= - - ::= - | - - ::= - ::= - ::= | - ::= | - ::= | - ::= | - ::= - - ::= - | - | - | - | - | - - ::= - - ::= | | - ::= | | | - ::= | | | - - ::= - | - | - | - | - | - - ::= | - ::= | | - | | | - - ::= | | | - ::= - ::= | - ::= | - ::= - ::= - ::= | - ::= - ::= | - - ::= | - ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" - | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" - ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" - | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z" - ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" - ::= | | "_" | " " | "-" | "@" | "." - - ::= "CREATE" - ::= "DATABASE" - ::= "SCHEMA" - ::= "USE" - ::= "DROP" - - ::= "TABLE" - ::= "ALTER" - ::= "ADD" - ::= "COLUMN" - ::= "MODIFY" - ::= "RENAME" - ::= "TO" - - ::= "SELECT" - ::= "FROM" - ::= "WHERE" - ::= "LIMIT" - ::= "AS" - ::= "INSERT" - ::= "INTO" - ::= "VALUES" - ::= "UPDATE" - ::= "SET" - ::= "DELETE" - - ::= "PRIMARY" - ::= "KEY" - ::= "NOT" - ::= "NULL" - ::= "DEFAULT" - ::= "UNIQUE" - - ::= "AND" - ::= "OR" - ::= "TRUE" - ::= "FALSE" - - ::= "INT" - ::= "BIGINT" - ::= "VARCHAR" - ::= "BOOLEAN" - ::= "TEXT" - ::= "TIMESTAMP" - - ::= "(" - ::= ")" - ::= "," - ::= "." 
- ::= ";" - ::= "'" - ::= "\"" - - ::= "=" - ::= "!=" | "<>" - ::= "<" - ::= ">" - ::= "<=" - ::= ">=" - - ::= "+" - ::= "-" - ::= "*" - ::= "/" - ::= "%" - - ::= "_" - - ::= ";" \ No newline at end of file diff --git a/docs/sql-grammar/grammar.ebnf b/docs/sql-grammar/grammar.ebnf deleted file mode 100644 index b310d2f..0000000 --- a/docs/sql-grammar/grammar.ebnf +++ /dev/null @@ -1,96 +0,0 @@ -Statement ::= ManipulationStatement ';' - -ManipulationStatement ::= DbManipulationStatement - | TableManipulationStatement - | DataManipulationStatement - -DbManipulationStatement ::= ( 'CREATE' | 'DROP' ) 'DATABASE' Identifier | 'USE' Identifier -TableManipulationStatement ::= CreateTableStatement | AlterTableStatement | DropTableStatement -DataManipulationStatement ::= InsertStatement | SelectStatement | UpdateStatement | DeleteStatement - -CreateTableStatement ::= 'CREATE' 'TABLE' Identifier '(' ColumnDefinition ( ',' ColumnDefinition )* ')' - -AlterTableStatement ::= 'ALTER' 'TABLE' Identifier AlterAction -AlterAction ::= ( 'ADD' | 'MODIFY' ) 'COLUMN'? ColumnDefinition - | 'RENAME' ( 'TO' Identifier | 'COLUMN' Identifier 'TO' Identifier ) - | 'DROP' 'COLUMN' Identifier - -DropTableStatement ::= 'DROP' 'TABLE' Identifier - -ColumnDefinition ::= Identifier DataType ColumnConstraints? 
-ColumnConstraints ::= KeyConstraint - | NullConstraint - | DefaultConstraint - | KeyConstraint NullConstraint DefaultConstraint - | KeyConstraint DefaultConstraint NullConstraint - | NullConstraint KeyConstraint DefaultConstraint - | NullConstraint DefaultConstraint KeyConstraint - | DefaultConstraint KeyConstraint NullConstraint - | DefaultConstraint NullConstraint KeyConstraint - | KeyConstraint NullConstraint - | NullConstraint KeyConstraint - | KeyConstraint DefaultConstraint - | DefaultConstraint KeyConstraint - | NullConstraint DefaultConstraint - | DefaultConstraint NullConstraint - -KeyConstraint ::= 'PRIMARY' 'KEY' | 'UNIQUE' -NullConstraint ::= 'NOT' 'NULL' -DefaultConstraint ::= 'DEFAULT' SignedLiteral - -SignedLiteral ::= Literal | '-' NumericLiteral - -SelectStatement ::= 'SELECT' SelectList 'FROM' Identifier WhereClause? LimitClause? -SelectList ::= '*' | SelectColumn ( ',' SelectColumn )* -SelectColumn ::= Expression ( 'AS' Identifier )? - -InsertStatement ::= 'INSERT' 'INTO' Identifier - ( '(' Identifier ( ',' Identifier )* ')' )? - 'VALUES' ValueRow ( ',' ValueRow )* - -ValueRow ::= '(' Expression ( ',' Expression )* ')' - -UpdateStatement ::= 'UPDATE' Identifier 'SET' SetItem ( ',' SetItem )* WhereClause? -SetItem ::= Identifier '=' Expression - -DeleteStatement ::= 'DELETE' 'FROM' Identifier WhereClause? 
- -WhereClause ::= 'WHERE' Condition -Condition ::= OrCondition -OrCondition ::= AndCondition ( 'OR' AndCondition )* -AndCondition ::= NotCondition ( 'AND' NotCondition )* -NotCondition ::= ConditionPrimary | 'NOT' NotCondition -ConditionPrimary ::= Predicate | '(' Condition ')' -Predicate ::= Expression ComparisonOperator Expression -ComparisonOperator ::= '=' | '!=' | '<>' | '<' | '>' | '<=' | '>=' - -LimitClause ::= 'LIMIT' IntegerLiteral - -Expression ::= Term ( ( '+' | '-' ) Term )* -Term ::= Factor ( ( '*' | '/' | '%' ) Factor )* -Factor ::= Literal | Identifier | '(' Expression ')' | '-' Factor - -DataType ::= 'INT' - | 'BIGINT' - | 'VARCHAR' '(' IntegerLiteral ')' - | 'BOOLEAN' - | 'TEXT' - | 'TIMESTAMP' - -Identifier ::= Letter ( Letter | Digit | '_' )* - -Literal ::= NumericLiteral | StringLiteral | BooleanLiteral | NullLiteral -NullLiteral ::= 'NULL' -BooleanLiteral ::= 'TRUE' | 'FALSE' -NumericLiteral ::= IntegerLiteral | FloatLiteral -IntegerLiteral ::= Digit+ -FloatLiteral ::= Digit+ '.' Digit+ -StringLiteral ::= "'" Character+ "'" - -Letter ::= LowercaseLetter | UppercaseLetter -LowercaseLetter ::= 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' - | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' -UppercaseLetter ::= 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' - | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' -Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' -Character ::= Letter | Digit | '_' | ' ' | '-' | '@' | '.' 
\ No newline at end of file From a7dbfb03d2259a082cc4c34d2c17719cf0c6ac54 Mon Sep 17 00:00:00 2001 From: rahulc0dy Date: Fri, 5 Jun 2026 09:56:51 +0530 Subject: [PATCH 3/5] Address review comments --- docs/grammar.md | 220 +++++++++++++++++++++++++----------------------- 1 file changed, 114 insertions(+), 106 deletions(-) diff --git a/docs/grammar.md b/docs/grammar.md index 088b7d5..6838a29 100644 --- a/docs/grammar.md +++ b/docs/grammar.md @@ -9,16 +9,18 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key ## BNF Grammar ```bnf + ::= | ::= + ::= | | - ::= | | + ::= | | ::= | | - ::= | | | + ::= | | | ::= - ::= - ::= + ::= + ::= ::= ::= | @@ -33,27 +35,32 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key ::= - ::= | - - ::= - | - | - | - | - | - | - | - | - | - | - | - | - | - | + ::= | + + ::= + | + | + | + | + | + | + + ::= + | + | + | + | + + ::= + | + | + + ::= ::= | ::= ::= + ::= ::= | @@ -62,9 +69,10 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key | | - ::= | - ::= | - ::= | + ::= | + ::= | + ::= | + ::= | ::= | @@ -120,10 +128,13 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key ::= | ::= | ::= - ::= + ::= | ::= | - ::= - ::= | + ::= + | + ::= | + ::= | + ::= | | "_" | " " | "-" | "@" | "." ::= | ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" @@ -131,52 +142,51 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z" ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" - ::= | | "_" | " " | "-" | "@" | "." 
- - ::= "CREATE" - ::= "DATABASE" - ::= "SCHEMA" - ::= "USE" - ::= "DROP" - - ::= "TABLE" - ::= "ALTER" - ::= "ADD" - ::= "COLUMN" - ::= "MODIFY" - ::= "RENAME" - ::= "TO" - - ::= "SELECT" - ::= "FROM" - ::= "WHERE" - ::= "LIMIT" - ::= "AS" - ::= "INSERT" - ::= "INTO" - ::= "VALUES" - ::= "UPDATE" - ::= "SET" - ::= "DELETE" - - ::= "PRIMARY" - ::= "KEY" - ::= "NOT" - ::= "NULL" - ::= "DEFAULT" - ::= "UNIQUE" - - ::= "AND" - ::= "OR" - ::= "TRUE" - ::= "FALSE" - - ::= "INT" - ::= "BIGINT" - ::= "VARCHAR" - ::= "BOOLEAN" - ::= "TEXT" - ::= "TIMESTAMP" + + ::= "CREATE" + ::= "DATABASE" + ::= "USE" + ::= "DROP" + + ::= "TABLE" + ::= "ALTER" + ::= "ADD" + ::= "COLUMN" + ::= "MODIFY" + ::= "RENAME" + ::= "TO" + + ::= "SELECT" + ::= "FROM" + ::= "WHERE" + ::= "LIMIT" + ::= "AS" + ::= "INSERT" + ::= "INTO" + ::= "VALUES" + ::= "UPDATE" + ::= "SET" + ::= "DELETE" + + ::= "PRIMARY" + ::= "KEY" + ::= "NOT" + ::= "NULL" + ::= "DEFAULT" + ::= "UNIQUE" + ::= "REFERENCES" + + ::= "AND" + ::= "OR" + ::= "TRUE" + ::= "FALSE" + + ::= "INT" + ::= "BIGINT" + ::= "VARCHAR" + ::= "BOOLEAN" + ::= "TEXT" + ::= "TIMESTAMP" ::= "(" ::= ")" @@ -193,13 +203,13 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key ::= "<=" ::= ">=" - ::= "+" - ::= "-" - ::= "*" - ::= "/" - ::= "%" + ::= "+" + ::= "-" + ::= "*" + ::= "/" + ::= "%" - ::= "_" + ::= "_" ::= ";" @@ -210,7 +220,8 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key A more readable EBNF form of the grammar is given below: ```ebnf -Statement ::= ManipulationStatement ';' +Program ::= Statement+ +Statement ::= ManipulationStatement ';' ManipulationStatement ::= DbManipulationStatement | TableManipulationStatement @@ -229,32 +240,24 @@ AlterAction ::= ( 'ADD' | 'MODIFY' ) 'COLUMN'? ColumnDefinition DropTableStatement ::= 'DROP' 'TABLE' Identifier -ColumnDefinition ::= Identifier DataType ColumnConstraints? 
-ColumnConstraints ::= KeyConstraint - | NullConstraint - | DefaultConstraint - | KeyConstraint NullConstraint DefaultConstraint - | KeyConstraint DefaultConstraint NullConstraint - | NullConstraint KeyConstraint DefaultConstraint - | NullConstraint DefaultConstraint KeyConstraint - | DefaultConstraint KeyConstraint NullConstraint - | DefaultConstraint NullConstraint KeyConstraint - | KeyConstraint NullConstraint - | NullConstraint KeyConstraint - | KeyConstraint DefaultConstraint - | DefaultConstraint KeyConstraint - | NullConstraint DefaultConstraint - | DefaultConstraint NullConstraint +ColumnDefinition ::= Identifier DataType ColumnConstraints? + +ColumnConstraints ::= KeyConstraint NullConstraint? DefaultConstraint? ForeignConstraint? + | NullConstraint DefaultConstraint? ForeignConstraint? + | DefaultConstraint ForeignConstraint? + | ForeignConstraint KeyConstraint ::= 'PRIMARY' 'KEY' | 'UNIQUE' NullConstraint ::= 'NOT' 'NULL' DefaultConstraint ::= 'DEFAULT' SignedLiteral +ForeignConstraint ::= 'REFERENCES' Identifier '(' Identifier ')' SignedLiteral ::= Literal | '-' NumericLiteral -SelectStatement ::= 'SELECT' SelectList 'FROM' Identifier WhereClause? LimitClause? -SelectList ::= '*' | SelectColumn ( ',' SelectColumn )* -SelectColumn ::= Expression ( 'AS' Identifier )? +SelectStatement ::= 'SELECT' SelectList 'FROM' Identifier WhereClause? LimitClause? +SelectList ::= '*' | SelectColumn ( ',' SelectColumn )* +SelectColumn ::= SelectExpression ( 'AS' Identifier )? +SelectExpression ::= Expression | Condition InsertStatement ::= 'INSERT' 'INTO' Identifier ( '(' Identifier ( ',' Identifier )* ')' )? @@ -296,8 +299,9 @@ NullLiteral ::= 'NULL' BooleanLiteral ::= 'TRUE' | 'FALSE' NumericLiteral ::= IntegerLiteral | FloatLiteral IntegerLiteral ::= Digit+ -FloatLiteral ::= Digit+ '.' Digit+ -StringLiteral ::= "'" Character+ "'" +FloatLiteral ::= Digit+ '.' Digit+ | '.' 
Digit+ +StringLiteral ::= "'" StringChar* "'" +StringChar ::= Character | "''" Letter ::= LowercaseLetter | UppercaseLetter LowercaseLetter ::= 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' @@ -311,9 +315,13 @@ Character ::= Letter | Digit | '_' | ' ' | '-' | '@' | '.' ### Notes -- **Identifier**: Governs database, table, and column names. They must begin with a letter and can include letters, digits, and underscores. +- **Program**: The top-level rule. A program is one or more semicolon-terminated statements, enabling scripts with multiple SQL statements separated by `;`. +- **Identifier**: Governs database, table, and column names. Must begin with a letter and may include letters, digits, and underscores. - **Literal**: Denotes fixed data values. -- **NullLiteral / BooleanLiteral**: Captures SQL boolean flags (`TRUE`/`FALSE`) and the missing data flag (`NULL`). -- **NumericLiteral / IntegerLiteral / FloatLiteral**: Governs integer and fractional digits. -- **StringLiteral**: Resolves single-quoted character sequences representing raw text values. -- **Letter / Digit / Character**: Fundamental character sets allowed within identifiers and string values. +- **NullLiteral / BooleanLiteral**: Captures SQL boolean flags (`TRUE`/`FALSE`) and the missing-data marker (`NULL`). +- **NumericLiteral / IntegerLiteral / FloatLiteral**: Governs integer and fractional digits. `FloatLiteral` accepts both `3.14` and `.14`; a leading digit is not required. +- **StringLiteral**: Resolves single-quoted text values. An empty string `''` is valid. To embed a literal single quote inside a string, double it: `'it''s'` represents `it's`. In the grammar this is expressed via `StringChar ::= Character | "''"`, where `''` is treated as a single escaped-quote unit by the lexer using a greedy longest-match rule. +- **SelectExpression**: A select item may be either an arithmetic `Expression` or a boolean `Condition` (predicate). 
The two are disjoint at the grammar level — expressions contain no comparison operators, conditions always do — so no ambiguity arises. Conditions used as select items should be enclosed in parentheses for readability and to avoid parser conflicts with the comma separating select columns: `SELECT age, (age < 18) AS is_minor FROM users`. +- **ColumnConstraints**: Supports four constraint types — key, null, default, and foreign — each of which may appear at most once per column. Constraints must be written in canonical order: `KeyConstraint` → `NullConstraint` → `DefaultConstraint` → `ForeignConstraint`. The grammar encodes all 15 valid non-empty subsets of these four types in that fixed order. **Parser note**: the parser must verify at semantic analysis time that no constraint type is duplicated; the grammar structure alone enforces canonical ordering but does not prevent a user from writing the same constraint twice if the grammar were extended permissively. +- **ForeignConstraint**: Column-level referential constraint. Syntax: `REFERENCES table_name (column_name)`, pointing to exactly one column in another table. +- **Letter / Digit / Character**: Fundamental character classes for identifiers and string body characters. From 4d85df6ec8c694dd5140d7647c151021cac8c52f Mon Sep 17 00:00:00 2001 From: rahulc0dy Date: Fri, 5 Jun 2026 18:23:49 +0530 Subject: [PATCH 4/5] Address review comments --- docs/grammar.md | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/docs/grammar.md b/docs/grammar.md index 6838a29..d953c59 100644 --- a/docs/grammar.md +++ b/docs/grammar.md @@ -69,9 +69,8 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key | | - ::= | - ::= | - ::= | + ::= | + ::= | | ::= | ::= @@ -128,7 +127,7 @@ All SQL keywords and unquoted identifiers are case-insensitive. 
For example, key ::= | ::= | ::= - ::= | + ::= | | ::= | ::= | @@ -255,8 +254,8 @@ ForeignConstraint ::= 'REFERENCES' Identifier '(' Identifier ')' SignedLiteral ::= Literal | '-' NumericLiteral SelectStatement ::= 'SELECT' SelectList 'FROM' Identifier WhereClause? LimitClause? -SelectList ::= '*' | SelectColumn ( ',' SelectColumn )* -SelectColumn ::= SelectExpression ( 'AS' Identifier )? +SelectList ::= SelectColumn ( ',' SelectColumn )* +SelectColumn ::= '*' | SelectExpression ( 'AS' Identifier )? SelectExpression ::= Expression | Condition InsertStatement ::= 'INSERT' 'INTO' Identifier @@ -299,7 +298,7 @@ NullLiteral ::= 'NULL' BooleanLiteral ::= 'TRUE' | 'FALSE' NumericLiteral ::= IntegerLiteral | FloatLiteral IntegerLiteral ::= Digit+ -FloatLiteral ::= Digit+ '.' Digit+ | '.' Digit+ +FloatLiteral ::= Digit+ '.' Digit+ | Digit+ '.' | '.' Digit+ StringLiteral ::= "'" StringChar* "'" StringChar ::= Character | "''" @@ -319,9 +318,9 @@ Character ::= Letter | Digit | '_' | ' ' | '-' | '@' | '.' - **Identifier**: Governs database, table, and column names. Must begin with a letter and may include letters, digits, and underscores. - **Literal**: Denotes fixed data values. - **NullLiteral / BooleanLiteral**: Captures SQL boolean flags (`TRUE`/`FALSE`) and the missing-data marker (`NULL`). -- **NumericLiteral / IntegerLiteral / FloatLiteral**: Governs integer and fractional digits. `FloatLiteral` accepts both `3.14` and `.14`; a leading digit is not required. +- **NumericLiteral / IntegerLiteral / FloatLiteral**: Governs integer and fractional digits. `FloatLiteral` accepts all three forms SQL allows: standard (`3.14`), leading-dot (`.14`), and trailing-dot (`10.`). Only `IntegerLiteral` is accepted by `LIMIT` and `VARCHAR`. - **StringLiteral**: Resolves single-quoted text values. An empty string `''` is valid. To embed a literal single quote inside a string, double it: `'it''s'` represents `it's`. 
In the grammar this is expressed via `StringChar ::= Character | "''"`, where `''` is treated as a single escaped-quote unit by the lexer using a greedy longest-match rule. -- **SelectExpression**: A select item may be either an arithmetic `Expression` or a boolean `Condition` (predicate). The two are disjoint at the grammar level — expressions contain no comparison operators, conditions always do — so no ambiguity arises. Conditions used as select items should be enclosed in parentheses for readability and to avoid parser conflicts with the comma separating select columns: `SELECT age, (age < 18) AS is_minor FROM users`. +- **SelectList / SelectColumn / SelectExpression**: Each item in a select list is independently a `SelectColumn`, which can be a bare `*` or any `SelectExpression` with an optional `AS` alias. This means `*` and other expressions are not disjoint — they can be freely mixed: `SELECT *, price, (price * tax) AS total FROM items` is valid. A `SelectExpression` may be an arithmetic `Expression` or a boolean `Condition`. The two are grammatically disjoint (expressions never contain comparison operators; conditions always do), so no ambiguity arises. Conditions used as select items should be parenthesised to avoid confusion with the column-separating comma: `SELECT age, (age < 18) AS is_minor FROM users`. - **ColumnConstraints**: Supports four constraint types — key, null, default, and foreign — each of which may appear at most once per column. Constraints must be written in canonical order: `KeyConstraint` → `NullConstraint` → `DefaultConstraint` → `ForeignConstraint`. The grammar encodes all 15 valid non-empty subsets of these four types in that fixed order. **Parser note**: the parser must verify at semantic analysis time that no constraint type is duplicated; the grammar structure alone enforces canonical ordering but does not prevent a user from writing the same constraint twice if the grammar were extended permissively. 
- **ForeignConstraint**: Column-level referential constraint. Syntax: `REFERENCES table_name (column_name)`, pointing to exactly one column in another table. - **Letter / Digit / Character**: Fundamental character classes for identifiers and string body characters. From 1663c12f2f1a44f93d6e46e35634e2b2740c1961 Mon Sep 17 00:00:00 2001 From: rahulc0dy Date: Sat, 6 Jun 2026 09:19:24 +0530 Subject: [PATCH 5/5] Add the new SQL clauses stated before --- docs/grammar.md | 205 ++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 172 insertions(+), 33 deletions(-) diff --git a/docs/grammar.md b/docs/grammar.md index d953c59..0a51c9c 100644 --- a/docs/grammar.md +++ b/docs/grammar.md @@ -18,11 +18,12 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key ::= | | ::= | | | - ::= + ::= | ::= - ::= + ::= | ::= + | ::= | ::= @@ -33,7 +34,7 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key ::= | ::= - ::= + ::= | ::= | @@ -58,21 +59,59 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key ::= ::= | - ::= + ::= | ::= ::= - ::= | + ::= | | - ::= - | - | - | + ::= + | + | + | + + ::= | + + ::= + | + | + | + | + | + | + + ::= + | + | + | + | + + ::= + | + | + | + | + + ::= + | + | ::= | ::= | | ::= | + ::= | + ::= | + ::= | | + ::= | + ::= + | + | + ::= + | | + | | + | | + ::= | @@ -85,7 +124,7 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key | ::= | - ::= + ::= ::= | @@ -96,7 +135,17 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key ::= | ::= | ::= | - ::= + ::= | | | | + + ::= + ::= + | + ::= + | + ::= + | + ::= + | ::= | @@ -105,11 +154,32 @@ All SQL keywords and unquoted identifiers are case-insensitive. 
For example, key | | + ::= + ::= | + ::= + + ::= + ::= | + ::= | | + ::= + | ::= | | ::= | | | - ::= | | | + ::= | | + | + | | + + ::= + | + ::= + | | + | | + ::= | + + ::= | + ::= | @@ -132,8 +202,8 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key ::= | ::= | - ::= | - ::= | | "_" | " " | "-" | "@" | "." + ::= | + ::= (* any character from the source character set except *) ::= | ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" @@ -146,6 +216,8 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key ::= "DATABASE" ::= "USE" ::= "DROP" + ::= "IF" + ::= "EXISTS" ::= "TABLE" ::= "ALTER" @@ -156,9 +228,10 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key ::= "TO" ::= "SELECT" + ::= "DISTINCT" + ::= "ALL" ::= "FROM" ::= "WHERE" - ::= "LIMIT" ::= "AS" ::= "INSERT" ::= "INTO" @@ -167,6 +240,24 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key ::= "SET" ::= "DELETE" + ::= "JOIN" + ::= "INNER" + ::= "LEFT" + ::= "RIGHT" + ::= "FULL" + ::= "OUTER" + ::= "CROSS" + ::= "ON" + + ::= "GROUP" + ::= "HAVING" + ::= "ORDER" + ::= "BY" + ::= "ASC" + ::= "DESC" + ::= "LIMIT" + ::= "OFFSET" + ::= "PRIMARY" ::= "KEY" ::= "NOT" @@ -179,6 +270,10 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key ::= "OR" ::= "TRUE" ::= "FALSE" + ::= "LIKE" + ::= "IS" + ::= "IN" + ::= "BETWEEN" ::= "INT" ::= "BIGINT" @@ -211,7 +306,6 @@ All SQL keywords and unquoted identifiers are case-insensitive. For example, key ::= "_" ::= ";" - ``` ## EBNF Form @@ -226,18 +320,20 @@ ManipulationStatement ::= DbManipulationStatement | TableManipulationStatement | DataManipulationStatement -DbManipulationStatement ::= ( 'CREATE' | 'DROP' ) 'DATABASE' Identifier | 'USE' Identifier +DbManipulationStatement ::= 'CREATE' 'DATABASE' ( 'IF' 'NOT' 'EXISTS' )? Identifier + | 'DROP' 'DATABASE' ( 'IF' 'EXISTS' )? 
Identifier + | 'USE' Identifier TableManipulationStatement ::= CreateTableStatement | AlterTableStatement | DropTableStatement DataManipulationStatement ::= InsertStatement | SelectStatement | UpdateStatement | DeleteStatement -CreateTableStatement ::= 'CREATE' 'TABLE' Identifier '(' ColumnDefinition ( ',' ColumnDefinition )* ')' +CreateTableStatement ::= 'CREATE' 'TABLE' ( 'IF' 'NOT' 'EXISTS' )? Identifier '(' ColumnDefinition ( ',' ColumnDefinition )* ')' AlterTableStatement ::= 'ALTER' 'TABLE' Identifier AlterAction AlterAction ::= ( 'ADD' | 'MODIFY' ) 'COLUMN'? ColumnDefinition | 'RENAME' ( 'TO' Identifier | 'COLUMN' Identifier 'TO' Identifier ) | 'DROP' 'COLUMN' Identifier -DropTableStatement ::= 'DROP' 'TABLE' Identifier +DropTableStatement ::= 'DROP' 'TABLE' ( 'IF' 'EXISTS' )? Identifier ColumnDefinition ::= Identifier DataType ColumnConstraints? @@ -247,13 +343,19 @@ ColumnConstraints ::= KeyConstraint NullConstraint? DefaultConstraint? Forei | ForeignConstraint KeyConstraint ::= 'PRIMARY' 'KEY' | 'UNIQUE' -NullConstraint ::= 'NOT' 'NULL' +NullConstraint ::= 'NOT' 'NULL' | 'NULL' DefaultConstraint ::= 'DEFAULT' SignedLiteral ForeignConstraint ::= 'REFERENCES' Identifier '(' Identifier ')' -SignedLiteral ::= Literal | '-' NumericLiteral +SignedLiteral ::= Literal | ( '+' | '-' ) NumericLiteral -SelectStatement ::= 'SELECT' SelectList 'FROM' Identifier WhereClause? LimitClause? +SelectStatement ::= 'SELECT' ( 'DISTINCT' | 'ALL' )? SelectList + 'FROM' TableReference ( ',' TableReference )* + WhereClause? + GroupByClause? + HavingClause? + OrderByClause? + LimitClause? SelectList ::= SelectColumn ( ',' SelectColumn )* SelectColumn ::= '*' | SelectExpression ( 'AS' Identifier )? SelectExpression ::= Expression | Condition @@ -265,24 +367,51 @@ InsertStatement ::= 'INSERT' 'INTO' Identifier ValueRow ::= '(' Expression ( ',' Expression )* ')' UpdateStatement ::= 'UPDATE' Identifier 'SET' SetItem ( ',' SetItem )* WhereClause? 
-SetItem ::= Identifier '=' Expression +SetItem ::= QualifiedIdentifier '=' Expression DeleteStatement ::= 'DELETE' 'FROM' Identifier WhereClause? +TableReference ::= TablePrimary ( JoinClause )* +TablePrimary ::= Identifier ( ( 'AS' )? Identifier )? +JoinClause ::= JoinType? 'JOIN' TablePrimary 'ON' Condition +JoinType ::= 'INNER' | 'LEFT' 'OUTER'? | 'RIGHT' 'OUTER'? | 'FULL' 'OUTER'? | 'CROSS' + WhereClause ::= 'WHERE' Condition Condition ::= OrCondition OrCondition ::= AndCondition ( 'OR' AndCondition )* AndCondition ::= NotCondition ( 'AND' NotCondition )* NotCondition ::= ConditionPrimary | 'NOT' NotCondition ConditionPrimary ::= Predicate | '(' Condition ')' -Predicate ::= Expression ComparisonOperator Expression +Predicate ::= ComparisonPredicate + | LikePredicate + | NullPredicate + | InPredicate + | BetweenPredicate +ComparisonPredicate ::= Expression ComparisonOperator Expression +LikePredicate ::= Expression 'NOT'? 'LIKE' Expression +NullPredicate ::= Expression 'IS' 'NOT'? 'NULL' +InPredicate ::= Expression 'NOT'? 'IN' '(' Expression ( ',' Expression )* ')' +BetweenPredicate ::= Expression 'NOT'? 'BETWEEN' Expression 'AND' Expression ComparisonOperator ::= '=' | '!=' | '<>' | '<' | '>' | '<=' | '>=' -LimitClause ::= 'LIMIT' IntegerLiteral +GroupByClause ::= 'GROUP' 'BY' QualifiedIdentifier ( ',' QualifiedIdentifier )* +HavingClause ::= 'HAVING' Condition +OrderByClause ::= 'ORDER' 'BY' OrderByItem ( ',' OrderByItem )* +OrderByItem ::= Expression ( 'ASC' | 'DESC' )? +LimitClause ::= 'LIMIT' IntegerLiteral ( 'OFFSET' IntegerLiteral )? Expression ::= Term ( ( '+' | '-' ) Term )* Term ::= Factor ( ( '*' | '/' | '%' ) Factor )* -Factor ::= Literal | Identifier | '(' Expression ')' | '-' Factor +Factor ::= Literal + | QualifiedIdentifier + | FunctionCall + | '(' Expression ')' + | ( '+' | '-' ) Factor + +FunctionCall ::= Identifier '(' FunctionArgs? ')' +FunctionArgs ::= '*' | ( 'DISTINCT' )? 
Expression ( ',' Expression )* + +QualifiedIdentifier ::= Identifier ( '.' Identifier )? DataType ::= 'INT' | 'BIGINT' @@ -300,7 +429,8 @@ NumericLiteral ::= IntegerLiteral | FloatLiteral IntegerLiteral ::= Digit+ FloatLiteral ::= Digit+ '.' Digit+ | Digit+ '.' | '.' Digit+ StringLiteral ::= "'" StringChar* "'" -StringChar ::= Character | "''" +StringChar ::= NonQuoteCharacter | "''" +NonQuoteCharacter ::= (* any character except single-quote *) Letter ::= LowercaseLetter | UppercaseLetter LowercaseLetter ::= 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' @@ -308,19 +438,28 @@ LowercaseLetter ::= 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | ' UppercaseLetter ::= 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' Digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' -Character ::= Letter | Digit | '_' | ' ' | '-' | '@' | '.' ``` ### Notes - **Program**: The top-level rule. A program is one or more semicolon-terminated statements, enabling scripts with multiple SQL statements separated by `;`. -- **Identifier**: Governs database, table, and column names. Must begin with a letter and may include letters, digits, and underscores. +- **IF EXISTS / IF NOT EXISTS**: `CREATE DATABASE`, `CREATE TABLE` accept an optional `IF NOT EXISTS` clause to suppress errors when the target already exists. `DROP DATABASE`, `DROP TABLE` accept an optional `IF EXISTS` clause to suppress errors when the target does not exist. +- **Identifier / QualifiedIdentifier**: `Identifier` governs database, table, and column names. Must begin with a letter and may include letters, digits, and underscores. `QualifiedIdentifier` extends this to support dot-separated `table.column` references such as `users.id` or `orders.total`. Qualified identifiers are used in `Factor`, `SetItem`, and `GROUP BY`. - **Literal**: Denotes fixed data values. 
- **NullLiteral / BooleanLiteral**: Captures SQL boolean flags (`TRUE`/`FALSE`) and the missing-data marker (`NULL`). -- **NumericLiteral / IntegerLiteral / FloatLiteral**: Governs integer and fractional digits. `FloatLiteral` accepts all three forms SQL allows: standard (`3.14`), leading-dot (`.14`), and trailing-dot (`10.`). Only `IntegerLiteral` is accepted by `LIMIT` and `VARCHAR`. -- **StringLiteral**: Resolves single-quoted text values. An empty string `''` is valid. To embed a literal single quote inside a string, double it: `'it''s'` represents `it's`. In the grammar this is expressed via `StringChar ::= Character | "''"`, where `''` is treated as a single escaped-quote unit by the lexer using a greedy longest-match rule. -- **SelectList / SelectColumn / SelectExpression**: Each item in a select list is independently a `SelectColumn`, which can be a bare `*` or any `SelectExpression` with an optional `AS` alias. This means `*` and other expressions are not disjoint — they can be freely mixed: `SELECT *, price, (price * tax) AS total FROM items` is valid. A `SelectExpression` may be an arithmetic `Expression` or a boolean `Condition`. The two are grammatically disjoint (expressions never contain comparison operators; conditions always do), so no ambiguity arises. Conditions used as select items should be parenthesised to avoid confusion with the column-separating comma: `SELECT age, (age < 18) AS is_minor FROM users`. +- **NumericLiteral / IntegerLiteral / FloatLiteral**: Governs integer and fractional digits. `FloatLiteral` accepts all three forms SQL allows: standard (`3.14`), leading-dot (`.14`), and trailing-dot (`10.`). Only `IntegerLiteral` is accepted by `LIMIT`, `OFFSET`, and `VARCHAR`. +- **StringLiteral / NonQuoteCharacter**: Resolves single-quoted text values. An empty string `''` is valid. To embed a literal single quote inside a string, double it: `'it''s'` represents `it's`. 
Per the SQL standard, `NonQuoteCharacter` is any character from the source character set except the single-quote delimiter. In the grammar this is expressed via `StringChar ::= NonQuoteCharacter | "''"`, where `''` is treated as a single escaped-quote unit by the lexer using a greedy longest-match rule.
+- **SignedLiteral**: Supports both unary `+` and `-` for numeric literals in `DEFAULT` values: `DEFAULT -1`, `DEFAULT +5`.
+- **SelectStatement**: Supports an optional `DISTINCT` or `ALL` quantifier after `SELECT`, absorbed directly into the four `` alternatives rather than via a nullable rule. The optional clause tail is expressed through four non-nullable helper rules (``, ``, ``, and ``), each enumerating only the valid non-empty suffixes that may follow a given clause. Together they cover all 23 valid non-empty clause combinations while enforcing canonical ordering (`WHERE → GROUP BY → HAVING → ORDER BY → LIMIT`). In the BNF, `HAVING` is only reachable through ``, so `GROUP BY` before `HAVING` is structurally guaranteed. The EBNF form lists `GroupByClause?` and `HavingClause?` as independent optional clauses, so it does not express this restriction by itself. No nullable rules are used anywhere in the BNF.
+- **SelectList / SelectColumn / SelectExpression**: Each item in a select list is independently a `SelectColumn`, which can be a bare `*` or any `SelectExpression` with an optional `AS` alias. A `SelectExpression` may be an arithmetic `Expression` or a boolean `Condition`.
+- **TableReference / JoinClause**: A `TableReference` is a `TablePrimary` (an identifier with an optional alias) followed by zero or more `JoinClause`s. Supported join types are `INNER`, `LEFT [OUTER]`, `RIGHT [OUTER]`, `FULL [OUTER]`, and `CROSS`. All non-cross joins require an `ON` condition; note that the EBNF `JoinClause` rule as written also requires `ON` after `CROSS JOIN`.
+- **Predicate**: The grammar supports five predicate types: `ComparisonPredicate` (`=`, `!=`, `<>`, `<`, `>`, `<=`, `>=`), `LikePredicate` (`LIKE` / `NOT LIKE`), `NullPredicate` (`IS NULL` / `IS NOT NULL`), `InPredicate` (`IN` / `NOT IN`), and `BetweenPredicate` (`BETWEEN ... AND ...` / `NOT BETWEEN ... AND ...`).
+- **GROUP BY / HAVING**: `GROUP BY` accepts a comma-separated list of qualified identifiers. `HAVING` filters groups using a condition and may only appear after `GROUP BY`. +- **ORDER BY**: Accepts a comma-separated list of order items. Each item is an expression with an optional `ASC` (ascending, default) or `DESC` (descending) direction. +- **LIMIT / OFFSET**: `LIMIT` restricts the result set size. An optional `OFFSET` clause skips a specified number of rows before returning results. Both accept only `IntegerLiteral` values. +- **FunctionCall**: Supports general function call syntax: `identifier(args)`. Function arguments can be a bare `*` (for `COUNT(*)`), or one or more expressions optionally preceded by `DISTINCT` (for `COUNT(DISTINCT col)`). This covers all standard aggregate functions (`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`) and any future scalar functions. - **ColumnConstraints**: Supports four constraint types — key, null, default, and foreign — each of which may appear at most once per column. Constraints must be written in canonical order: `KeyConstraint` → `NullConstraint` → `DefaultConstraint` → `ForeignConstraint`. The grammar encodes all 15 valid non-empty subsets of these four types in that fixed order. **Parser note**: the parser must verify at semantic analysis time that no constraint type is duplicated; the grammar structure alone enforces canonical ordering but does not prevent a user from writing the same constraint twice if the grammar were extended permissively. +- **NullConstraint**: Accepts both `NOT NULL` and explicit `NULL`. While `NULL` is the default column behavior, explicitly stating it is valid SQL and commonly used in schema definitions. - **ForeignConstraint**: Column-level referential constraint. Syntax: `REFERENCES table_name (column_name)`, pointing to exactly one column in another table. -- **Letter / Digit / Character**: Fundamental character classes for identifiers and string body characters. 
+- **Letter / Digit**: Fundamental character classes for identifiers.
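
The lexical rules described in the notes (`IntegerLiteral`, `FloatLiteral`, `StringLiteral`, `Identifier`) can be sketched as regular expressions. The following is a minimal Python sketch under stated assumptions, not PenguinDB's lexer: the names `classify` and `string_value` are hypothetical, each input is assumed to be a single already-isolated token, and keywords such as `TRUE` or `NULL` are not distinguished from identifiers.

```python
import re

# Hypothetical patterns mirroring the EBNF lexical rules (illustration only).
# FloatLiteral ::= Digit+ '.' Digit+ | Digit+ '.' | '.' Digit+
FLOAT_RE = re.compile(r"[0-9]+\.[0-9]+|[0-9]+\.|\.[0-9]+")
# IntegerLiteral ::= Digit+
INTEGER_RE = re.compile(r"[0-9]+")
# StringLiteral ::= "'" ( NonQuoteCharacter | "''" )* "'"
STRING_RE = re.compile(r"'(?:[^']|'')*'")
# Identifier ::= Letter ( Letter | Digit | '_' )*
IDENT_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

PATTERNS = (
    ("float", FLOAT_RE),
    ("integer", INTEGER_RE),
    ("string", STRING_RE),
    ("identifier", IDENT_RE),
)


def classify(text: str) -> str:
    """Return the kind whose pattern matches the whole of `text`."""
    # fullmatch requires the entire token to match, so the four
    # patterns cannot both accept the same input.
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return kind
    return "invalid"


def string_value(literal: str) -> str:
    """Strip the delimiters and collapse each doubled quote to one quote."""
    if not STRING_RE.fullmatch(literal):
        raise ValueError(f"not a string literal: {literal!r}")
    return literal[1:-1].replace("''", "'")
```

Under these assumptions, `classify("10.")` yields `"float"` while `classify("10")` yields `"integer"`, which is why `LIMIT`, `OFFSET`, and `VARCHAR` can reject the former. `string_value("'it''s'")` yields `it's`, and `classify("'a'b'")` yields `"invalid"` because the lone inner quote is neither doubled nor a closing delimiter.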