The vector type is superseded by the unified N-dimensional array model. Dynamic arrays (integer[*]) replace vector<T> for all use cases. Remove the vector.rst document and its toctree entry.
Structs are not part of the simplified language specification. Remove struct.rst, its toctree entry, and the struct declaration listing from globals.rst. Tuple is the remaining aggregate type for ad-hoc grouping of named/positional fields.
2-D matrices are subsumed by the unified N-dimensional array model. The T[N, M] syntax replaces T[N][M], and the T[*, *] form replaces integer[*][*]. Remove matrix.rst, its toctree entry, and the Matrices cross-reference in declarations.rst.
Remove keywords that belonged exclusively to the removed vector, matrix, and struct features: columns, length, reverse, rows, struct, vector. Add 'shape' as a reserved keyword for the built-in N-D array introspection function.
- Remove the 'length' built-in function (callers should use shape(a)[1])
- Update 'shape' description to clarify it returns integer[*] and show both 1-D and 2-D examples using the comma-separated dimension syntax
- Update 'reverse' to remove references to Vector; it now applies to any array or string
- Update intro paragraph to drop the obsolete Vector compatibility note
Define the rules for constant expressions (constexpr): expressions that the compiler can evaluate at compile time. This formalises the semantics needed for statically-sized array dimensions and global initialisers:
- scalar base-type literals and operators between constexprs
- aggregate (array/tuple) constexprs with compile-time known sizes
- spread operator on constexpr arrays
- distinction between 'const' (immutable) and 'constexpr' (compile-time)
Add a supplementary reference page that classifies every expression in Gazprea as either an lvalue or an rvalue and explains the practical consequences (assignment eligibility, var-argument passing). Includes background on the full C++ taxonomy (glvalue/xvalue/prvalue) so implementors understand why Gazprea intentionally collapses to two categories. The page is placed in a new 'Hints and Reference' toctree section in index.rst.
Replace the narrow 'scalar literal only' restriction with the general constexpr rule: a global's initializer must be a valid constant expression as defined in spec/constexpr.rst. This broadens what is permitted (e.g., constant-folded expressions, constexpr array lookups) while retaining the compile-time guarantee. Explicitly list the three corollaries: no function calls, no dynamic-sized array types, and all globals are implicitly constexpr.
Add dedicated sections for the two sequence-producing operators:
Range (..): produces integer[upper - lower] with inclusive lower and exclusive upper bound. Clarifies compile-time vs runtime sizing and the special slicing meaning when used inside [] index expressions.
Stride (by): strides through an array returning a deep-copied subset. Result size is N / s (integer division); step must be a positive integer with compile-time checking when the step is a constexpr.
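The two size rules in this commit message can be sketched in a few lines of Python. This is only an illustration of the arithmetic as stated above; the function names are hypothetical, and rejecting a range whose upper bound is below its lower bound is an assumption, since the commit message does not say what happens in that case.

```python
def range_size(lower: int, upper: int) -> int:
    """Length of `lower..upper` with inclusive lower and exclusive upper bound."""
    if upper < lower:
        # Assumption: the commit message does not cover this case.
        raise ValueError("upper bound is below lower bound")
    return upper - lower


def range_values(lower: int, upper: int) -> list:
    """Elements of `lower..upper` under the same half-open rule."""
    return [lower + i for i in range(range_size(lower, upper))]


def stride_size(n: int, step: int) -> int:
    """Result size of a stride over n elements, N / s with integer division."""
    if step <= 0:
        raise ValueError("stride step must be a positive integer")
    return n // step
```

For example, `range_values(1, 4)` yields `[1, 2, 3]`, which matches an `integer[3]` result type.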
Update the 2-D generator example to use comma-separated dimension syntax: integer[2, 3] instead of integer[2][3], and M[i, j] instead of M[i][j]. This aligns generators with the unified N-D array model where all dimensions appear inside a single pair of square brackets.
Rewrite spec/types/array.rst to reflect the generalised array system that replaces the separate 1-D array, matrix, and vector types:
Declaration:
- Static dimensions use integer literals or constexprs
- Dynamic dimensions use * (asterisk)
- N-D arrays declared with comma-separated shape: T[N, M], T[*, N], etc.
- Type/size inference via 'var'
Construction:
- Common promotable element type rule
- Empty literal [], including error cases and type-inference restriction
Spread operator (new):
- ... unpacks elements from an existing array into a literal
- Compile-time size verification for static targets
Operations:
- Slices are rvalues (deep copy); cannot appear on LHS or as var args
- shape() semantics for jagged arrays
- Concatenation (||) primarily for dynamic arrays
- Element-wise ops and broadcasting with trailing-dimension rule
- Jagged arrays disallow element-wise ops (compile-time error)
Array taxonomy table:
- Static, Static N-D, Dynamic 1-D, Regular dynamic N-D, Jagged
- Element-wise op eligibility per form
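As a rough illustration of the "compile-time size verification for static targets" item, the Python sketch below sums the sizes of the elements and spreads in a literal and compares the total against a static target size. The `Spread` node, the helper names, and the decision to defer the check when a spread's size is unknown are all assumptions made for this example; `SizeError` is borrowed from the error name mentioned elsewhere in this PR.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


class SizeError(Exception):
    """Stand-in for the compile-time SizeError mentioned in this PR."""


@dataclass(frozen=True)
class Spread:
    """Hypothetical AST node for `...arr`; size is None if unknown at compile time."""
    size: Optional[int]


Element = Union[int, Spread]  # plain ints stand in for ordinary scalar elements


def literal_size(elements: Sequence[Element]) -> Optional[int]:
    """Total element count of a literal, or None if any spread has unknown size."""
    total = 0
    for element in elements:
        if isinstance(element, Spread):
            if element.size is None:
                return None
            total += element.size
        else:
            total += 1
    return total


def check_static_target(target_size: int, elements: Sequence[Element]) -> None:
    """Raise SizeError when a fully known literal size differs from the target."""
    size = literal_size(elements)
    if size is None:
        return  # Assumption: an unknown size cannot be verified here.
    if size != target_size:
        raise SizeError(f"literal has {size} elements, target expects {target_size}")
```

With these definitions, a literal such as `[0, ...a, 9]` where `a` has three elements counts as five elements, so it would pass for a five-element static target and fail for any other static size.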
Replace all remaining T[N][M] nested-bracket notation with the comma-separated T[N, M] form that matches the unified array model:
- statements.rst: integer[*][*] → integer[*, *]; M[1][2] → M[1, 2]
- type_casting.rst: five array cast examples updated
- typedef.rst: typealias integer[2][3] → integer[2, 3]
Also fix two leftover editorial issues from the previous sequence:
- globals.rst: remove stale 'Valid global scope statements' bullet list and 'Variable Declarations' sub-heading (already superseded by the constexpr-based rule added in the previous commit sequence)
- declarations.rst: remove trailing space on line 7
Reframe 'string' as a wrapper around character[*] rather than a subtype of the (now-removed) vector type:
- Describe string as structurally equivalent to character[*] with the type distinction preserved only for output-formatting purposes
- Remove all references to vector methods (concat, push, append)
- Document bi-directional implicit promotion with character[*]
- Show concatenation exclusively via the || operator (consistent with the array model)
- Fix the output example variable name (s, not vec)
- Add scalar character concatenation example
Replace the informal prose-and-table description with a structured specification anchored by a Graphviz type lattice diagram:
Type Lattice (new):
- Graphviz digraph showing every permitted implicit promotion
- Concrete edges: integer → real, string ↔ character[*]
- Parametric dashed edges: scalar T → T[…] (broadcast), anonymous tuple field-wise promotion
- Explicitly states what is NOT permitted (real→integer, boolean/character promotions, array downcasting)
Scalars:
- Tighten intro to one-way integer→real rule; remove verbose table preamble referencing the (removed) as<> semantics section
Scalar to Array:
- Simplify description: scalar broadcasts to match array shape
- Keep the integer[5] example; inline the 'other examples' block
Tuple to Tuple (new anchor ssec:typePromotion_tuple):
- Restrict implicit promotion to anonymous fields only
- Named fields require explicit as<>; cross-reference sssec:tuple_casting
- Demonstrate two-sided promotion in equality comparison
Character Array to/from String (new anchor ssec:typePromotion_string):
- Reframe as 'wrapper' bidirectional promotion (consistent with the rewritten string.rst)
- Update example to use statically-sized character[5]
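The edges listed for the type lattice can be modelled as a small predicate. The Python sketch below is a simplified reading of the commit message, not the lattice from the spec: the type classes and `can_promote` are hypothetical names, array sizes are ignored, tuple promotion is left out, and only the direct edges named above are encoded (no assumption is made about whether edges compose).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scalar:
    name: str  # "integer", "real", "boolean" or "character"


@dataclass(frozen=True)
class Array:
    elem: Scalar
    rank: int = 1  # number of dimensions; sizes are ignored in this sketch


@dataclass(frozen=True)
class String:
    pass


Type = Union[Scalar, Array, String]

INTEGER, REAL = Scalar("integer"), Scalar("real")
BOOLEAN, CHARACTER = Scalar("boolean"), Scalar("character")
CHAR_ARRAY = Array(CHARACTER, 1)


def can_promote(src: Type, dst: Type) -> bool:
    """True if src may be implicitly promoted to dst along one listed edge."""
    if src == dst:
        return True
    if src == INTEGER and dst == REAL:
        return True  # concrete edge: integer -> real (one way only)
    if isinstance(src, String) and dst == CHAR_ARRAY:
        return True  # concrete edge: string -> character[*]
    if src == CHAR_ARRAY and isinstance(dst, String):
        return True  # concrete edge: character[*] -> string
    if isinstance(src, Scalar) and isinstance(dst, Array):
        return src == dst.elem  # parametric edge: scalar T -> T[...] (broadcast)
    return False
```

Anything not reachable through these branches, such as `real` to `integer` or an array to a scalar, is rejected, which mirrors the "NOT permitted" list.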
functions.rst:
- Clarify rule 1: arguments cannot be mutable *or* mutated within the function (not just 'cannot be mutable')
- Rule 3: 'Inferred size arrays' → 'Dynamic sized arrays' (consistent with the array model)
- Slice-passing paragraph: replace 'Like Rust' vague note with precise statement that slices are rvalues and can only go to const parameters; update example to use real[*] variables instead of removed vector<real>
- Fix typo: 'be reference' → 'by reference'
- Remove struct namespace bullet from Function Namespacing
procedures.rst:
- Add paragraph + example showing procedure calls are illegal inside composite literals and as operands of inline casts
- Procedure Declarations: document that var qualifiers are part of the type signature and must match between prototype and definition; add increment example
- Type Promotion of Arguments: fix String → string, len(x) → shape(x)[1], add explanatory comments to each call site
- Aliasing: restore tuple-member wording, remove struct field reference
- Aliasing: add 'Slices are not subject to aliasing analysis' paragraph cross-referencing sssec:array_lrvalue and sec:value_categories
- Fix typo: 'passed toprocedures' → 'passed to procedures'
- Remove struct namespace bullet from Procedure Namespacing
type_inference.rst:
- Add section showing type inference also applies to procedure call initialisers (var n = get_count() infers var integer)
Document that integer arithmetic is checked at runtime: - Operations that exceed the signed 32-bit range raise OverflowError - Overflow does not wrap silently (unlike C/C++ unsigned semantics) This makes the error behaviour explicit and distinguishable from compile-time SizeError and other runtime errors.
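A minimal Python sketch of the checked-arithmetic behaviour described here, assuming the check is applied to the mathematically exact result of each operation. The helper names are hypothetical, and Python's built-in `OverflowError` is used only as a stand-in for the Gazprea runtime error of the same name.

```python
INT_MIN = -2**31      # smallest signed 32-bit value
INT_MAX = 2**31 - 1   # largest signed 32-bit value


def checked(value: int) -> int:
    """Return value unchanged if it fits in signed 32 bits, else raise."""
    if value < INT_MIN or value > INT_MAX:
        raise OverflowError(f"{value} is outside the signed 32-bit range")
    return value


def checked_add(a: int, b: int) -> int:
    return checked(a + b)


def checked_sub(a: int, b: int) -> int:
    return checked(a - b)


def checked_mul(a: int, b: int) -> int:
    return checked(a * b)
```

The point of the sketch is that `checked_add(INT_MAX, 1)` raises instead of wrapping around to `INT_MIN`.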
Expand the tuple specification to cover the full named-field model:
Declaration:
- Fields may be anonymous (index-only access) or named (index or name)
- Formalise Type Identity rules: named fields contribute name+type; unnamed fields contribute type only; named and unnamed at the same position are never compatible
- Show examples of incompatible tuple types (same underlying types, different name configurations)
- Remove the 'at least two elements' constraint and struct cross-ref
Literals:
- Fully named tuples may use named field syntax (x: 10, y: 3.14) in any order
- Anonymous or mixed tuples must be constructed positionally
- Add rationale note explaining why mixed-tuple named literals are disallowed (positional ambiguity)
- Duplicate field names are a compile-time error
Access (renamed from previous section; previously only index access):
- Document dual access: by 1-based index (all fields) and by name (named fields only)
- Show index and name access on the same variable
Operations:
- Rewrite Comparison section: equality requires compatible type AND pairwise field equality; show ILLEGAL comparisons between incompatible tuple types
Type Casting and Promotion (replaces old table-based ops section):
- Implicit promotion: anonymous fields only (field-wise scalar rules)
- Named fields are never implicitly promoted
- Explicit as<> casting between compatible tuple types with examples
- Remove the old Unpacking sub-section (handled in statements.rst)
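The Type Identity rules above can be read as a position-by-position comparison of (type, optional name) pairs. The Python sketch below encodes that reading; `Field` and `same_tuple_type` are hypothetical names, and field types are plain strings for brevity. It covers identity only, not promotion or as<> casting.

```python
from typing import NamedTuple, Optional, Sequence


class Field(NamedTuple):
    type: str                   # e.g. "integer" or "real"
    name: Optional[str] = None  # None marks an anonymous field


def same_tuple_type(a: Sequence[Field], b: Sequence[Field]) -> bool:
    """Compare two tuple types field by field.

    Named fields contribute name and type, anonymous fields contribute
    type only, and a named field never matches an anonymous one at the
    same position (their names differ: one is None, the other is not).
    """
    if len(a) != len(b):
        return False
    return all(fa.type == fb.type and fa.name == fb.name for fa, fb in zip(a, b))
```

For instance, `tuple(integer x, real y)` and `tuple(integer, real)` share the same underlying types but compare as different types under this rule.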
The defining property of a dynamic array is whether its size is known at compile time, not the presence of the * token. integer[x] with a non-constexpr x is dynamic even without *. Conversely, integer[*] a = [1, 2, 3] may be treated as static since the initialiser length (3) is known at compile time. Implementations only require heap allocation for arrays whose size is truly unknown at compile time.
Implementations are not required to make an eager copy when passing a slice to a function or procedure. A Copy-On-Write or other lazy strategy is semantically equivalent because slices are always passed as const parameters. Added impl/slice_passing.rst with rationale and example, and cross-referenced it from the Slicing section of spec/types/array.rst.
…ntation The value categories document is an implementation-oriented reference rather than a language specification page. Moved spec/hints/value_categories.rst to impl/value_categories.rst and folded it into the Implementation toctree. The separate 'Hints and Reference' toctree section is removed. All :ref: cross-references continue to resolve because the anchor label is unchanged.
We need to note that not all constants are constexprs. Global scope enforces constexpr declarations for globally const variables, but the same constraint does not hold for non-global scopes. Students may attempt to abuse this by using block statements in the global scope, but I contend that such an argument is meaningless since there are no possible uses of a global const value within a block statement. We currently do not allow function/procedure declarations anywhere other than global scope, but we may need to make this explicitly clear.
Upon rereading my restrictions on array slices, I found that jagged arrays are ill-formed and cannot be added to after their initial declaration. I conclude that they are incompatible with this version of the specification. I believe the restriction is a good idea:
- Tuples can still function like jagged arrays
- Element-wise operations are already disallowed on tuples, so the semantics are the same
gazprea/spec/types/array.rst
| dimension per n-d array, ii) the last dimension of an n-d array with n > 1
| cannot be dynamic. This prevents the creation of jagged arrays, however
| arrays can hold tuples and vice-versa which provides an avenue for emulating
| jagged arrays.
I think it would make more sense to make jagged array literals (or any operations that yield jagged r-values) illegal.
I was thinking that they would be useful as a data container, so you could have a tuple such as tuple (integer[*], integer[*], integer[*]) a = ([1, 2, 3], [3, 4], [5, 6, 7, 8]), especially for things like lookup tables. Thoughts? Currently I was thinking that true jagged arrays are illegal.
Totally agree that this approach makes sense.
The interpretation I'm advocating is that integer[*][*] be treated as a type-erased wrapper for any integer[n][m] type.
tuple(integer[*], integer[*][*]) container = ([], [[]]); // legal (underlying object type is tuple(integer[0], integer[0][0]))
container = ([1], [[2,3,4]]); // legal (underlying object type is tuple(integer[1], integer[1][3]))
container = ([], [1]); // illegal (wrong dimensionality)
container = ([], [[1],[]]); // illegal (ragged array literal)
The best analog I can think of is std::function in C++:
#include <functional>

struct Adder {
    int operator()(int x, int y) const {
        return x + y;
    }
};
struct Multiplier {
    int operator()(int x, int y) const {
        return x * y;
    }
};
int main() {
    Adder a{};
    Multiplier m{};
    std::function<int(int, int)> f = a; // Underlying type is Adder
    f = m; // Now underlying type is Multiplier
}
I think we are aligned here, is there a way I could specify this better? Currently I just have it removed.
Also, sorry @rcunrau did not see your comment on the other PR:
Originally posted by @rcunrau in #89 (comment) So I looked through my previous proposal and agree. This one, in contrast, slims things down a fair bit and should clarify a few of the larger issues that students faced last term. WRT specific comments:
I should note that the only reason I started this so early is so that I can bring the solution compiler up to speed.
| **Anonymous or mixed tuples must be constructed positionally.** When any field
| in a tuple type is unnamed, the entire literal must list values in declaration
| order with no field name labels:
What is the rationale behind having mixed tuples? They don't seem very useful to me.
If the purpose of mixed tuples is to add a bit of complexity to the implementation, in my opinion this complexity is not particularly "interesting".
To be clear, the mechanics of mixed tuples are well-described here; I'm just wondering why they're allowed.
I'm thinking that they can almost be like private fields/structural fields. I admit that it is a little bit of a contrived example, but if you have two types typedef tuple (integer a, float) A; and typedef tuple(integer a, integer) B and they represent some related quantities but one requires the float and the other doesn't (like an iterator) then you can upcast B -> A implicitly.
I can see how this is perhaps not interesting, but I think it is consistent with the definition of a tuple in the language if we fold tagged fields into tuples.
Would you say that only having either entirely tagged or entirely untagged (anonymous) tuples makes better sense? My counter argument was that to have, forcibly, either one or the other feels arbitrary.
I think that having either entirely tagged or entirely untagged tuples makes more sense, partially because it better matches the languages I know (which may or may not be a compelling reason for you):
- Rust - tuple and tuple struct (untagged) vs. normal struct (tagged):
/* Allowed */
// Instances accessed with .0, .1
struct TupleStruct(i32, i32);
// Also accessed with .0, .1
let tup: (i32, i32) = (1, 2);
// Instances accessed with .a, .b
//
// Indexed access with .0 and .1 isn't allowed in Rust.
// My understanding is that this is because the Rust compiler
// has the freedom to reorder the fields
// and it would be a bit odd to allow .0 to actually access
// field 1 after reordering. This isn't really a problem
// for Gazprea because there is no reason to reorder fields,
// so it's fine to allow indexed access for tagged tuples.
struct NormalStruct { a: i32, b: i32 }
/* Not allowed */
struct MixedStruct { a: i32, i32 }
- C#, C++ - tuple (untagged) vs. struct (tagged)
- Python - tuple (untagged) vs. collections.namedtuple (tagged, also allows access by index)
| const Y = x + 5; // Not a constexpr: depends on a 'var'
|
| function get_val() returns integer { return 100; }
| const Z = get_val(); // Not a constexpr: depends on a function call
It would be great if constexpr function calls (not procedure calls) were allowed, although I admit this might be better as an optional extension.
A function call is constexpr if all arguments are constexpr. For example:
function my_number() returns integer { return 2; }
function times_two(integer val) returns integer { return val * 2; }
procedure two_times(integer val) returns integer { return val * 2; }
//
// Constexpr
//
const m = my_number(); // function call where all arguments are constexpr
const n = times_two(m); // function call where all arguments are constexpr
//
// Not constexpr
//
var integer x;
x <- std_input;
const p = times_two(x); // function call but (at least) one of the arguments is not constexpr
const q = two_times(m); // procedure call is not constexpr even if arguments are constexpr
As with the existing proposal, adventurous teams might try to actually evaluate all constexprs at compile-time (by implementing a partial Gazprea interpreter to execute constexpr functions at compile-time), while I assume most teams will only perform constexpr validation at compile-time and defer the computation to runtime. Admittedly, having constexpr functions makes things a lot harder for the adventurous teams since the constexpr evaluator will have to deal with every control-flow construct that exists in the language.
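The validation-only approach mentioned above (checking that an expression is constexpr without evaluating it) can be sketched as a recursive walk over a tiny expression tree. This Python sketch encodes the rule proposed in this comment, not the rule currently in the spec; the node classes and the `is_constexpr` helper are hypothetical, and the sets of known constexpr names and pure functions are assumed to come from earlier compiler passes.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Name:
    ident: str


@dataclass(frozen=True)
class BinOp:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Call:
    callee: str
    args: Tuple["Expr", ...] = ()


Expr = Union[Lit, Name, BinOp, Call]


def is_constexpr(expr: Expr, constexpr_names: Set[str], functions: Set[str]) -> bool:
    """Validate (without evaluating) that expr is constexpr under the proposal."""
    if isinstance(expr, Lit):
        return True
    if isinstance(expr, Name):
        return expr.ident in constexpr_names
    if isinstance(expr, BinOp):
        return (is_constexpr(expr.left, constexpr_names, functions)
                and is_constexpr(expr.right, constexpr_names, functions))
    if isinstance(expr, Call):
        # Only functions qualify; a procedure call is never constexpr.
        return expr.callee in functions and all(
            is_constexpr(arg, constexpr_names, functions) for arg in expr.args)
    return False
```

Under this sketch, with `functions = {"my_number", "times_two"}` and `m` recorded as a constexpr name, `times_two(m)` validates, while `times_two(x)` and `two_times(m)` do not, matching the four cases in the example above.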
Yeah, this would have to be an extension. We would run into a problem where you end up implementing a full gazprea interpreter and compiler to deal with these constructs. I agree, it would be fun, but I'll add it to the set of extensions I have for the moment.
My current list (I might just open another PR with these and mention this comment so that folks can add more):
- type as a keyword (see the PR desc) and type methods
- extern and multi-file gazprea programs (so invoking the linker and associated machinery)
- More aggressive type inference (read, global type inference with a type constraint system)
- Exposing memory primitives to gazprea users
- Implementing move/reference semantics and implementing a borrow checker to deal with them
Sounds good! I would argue that it isn't quite a full Gazprea interpreter because it doesn't have to deal with procedures at all (since those can't be constexpr), so no I/O or handling of aliasing. But yeah, still better as an optional extension.
| The range operator ``..`` produces an ``integer[upper - lower]`` array
| containing every integer from the lower bound (inclusive) to the upper bound
| (exclusive). Both bounds must be ``integer`` expressions; non-integer bounds
Ranges are currently inclusive on both ends. Is the motivation for making the upper bound exclusive to give parity with slices? If so that's well justified IMO. Just note that dozens of examples likely still use the doubly inclusive construction, so will need to be patched.
oops, tbh I just forgot they were inclusive on both ends.
I'm open to discussion on this, I think that for a programmer who would use gazprea (ex. business people, non-systems programmers) that the most expected behaviour might be inclusive on both ends, but you make a good point that this makes everything consistent in slices.
gazprea/spec/constexpr.rst
| integer[ELEMENT] my_array = 0; // Legal: static array of size 30, zero-filled
|
| const integer[2] BAD_TABLE = [10, get_val()]; // Illegal: initializer is not a constexpr
| // also illegal because function calls are
IIRC, we have allowed function calls in both 1) the size expression of an array type, and 2) the initialization elements of an array value.
// valid as of current Gazprea
// global scope
integer function foo() returns integer = 3;
// local scope
integer[foo()] a = [1, 2, foo()];
I do agree that the BAD_TABLE example shouldn't be constexpr, but I believe the second justification may contradict the current version of Gazprea, unless this is an intended restriction.
I was on the fence about that too. I am willing to make that change to allow function calls in both, I think procedure calls could get a bit gnarly if we allow something like:
procedure foo(integer) returns integer { inner = a - 1; a = a + 1; return inner; }
// local scope
var a = 4;
integer[foo()] = [1, 2, a];
I think the correct behaviour in this case would be that a = 5 in the final array because we need to evaluate foo() for the size before the instantiation of the array.
Or should we still disallow procedures in instantiations?
I see the line you were referencing here, I fixed that since you're right, function calls should be allowed in declarations.
JustinMeimar
left a comment
Overall, I find this to be a productive direction for the language to take. There will be some collateral in the form of examples to redo but I assume that will be a future issue before next fall. Nice work!
Co-authored-by: Max Leontiev <144827728+max-leontiev@users.noreply.github.com>
Function calls should be allowed in declarations, procedure calls are different and special. Constexprs still don't allow function calls, but function calls are allowed in declarations.
| It is also possible to construct a single-element array using this
| method of construction.
| Spread Operator
When I was thinking about tagged/untagged tuples, I looked at the Rust book page on Defining and Instantiating Structs. This made me think that Gazprea could also adopt "partial update syntax" by extending the spread operator .... What I mean is:
typealias tuple(integer, integer) pair;
typealias tuple(integer, integer, integer) triple;
typealias tuple(integer, integer, integer, integer) quad;
typealias tuple(integer a, integer b) tagged_pair;
typealias tuple(integer a, integer b, integer c) tagged_triple;
typealias tuple(integer a, integer b, integer c, integer d) tagged_quad;
// For untagged tuples
pair x = (1, 2);
triple y1 = (...x, 3); // y1 = (1, 2, 3)
triple y2 = (1, ...x); // y2 = (1, 1, 2)
pair y3 = ...x; // error: ...tuple is illegal outside of tuple literal
pair y4 = (...x); // legal, y4 = (1, 2)
quad y5 = (...x, ...x); // legal, y5 = (1, 2, 1, 2)
pair y6 = (...(1, 2)); // legal, y6 = (1, 2)
tuple(real, real) y7 = (...x); // legal thanks to tuple->tuple promotion
// For tagged tuples
tagged_pair t = (a: 1, b: 2);
tagged_triple u1 = (c: 3, ...t); // u1 = (a: 1, b: 2, c: 3)
tagged_triple u2 = (...t, c: 3); // u2 = (a: 1, b: 2, c: 3)
tagged_triple u3 = (a: 1, ...t); // error: duplicate field name a, also c isn't defined
tagged_pair u4 = ...t; // error: ...tuple is illegal outside of tuple literal
tagged_pair u5 = (...t); // legal, u5 = (a: 1, b: 2)
tagged_pair u6 = (...(a: 1, b: 2)); // legal, u6 = (a: 1, b: 2)
// Mixing tagged/untagged using ... is illegal like you'd expect
pair v = (...t); // error: type mismatch, untagged tuple initialized with tagged tuple
tagged_pair w = (...x); // error: type mismatch, tagged tuple initialized with untagged tuple
We don't have to call it partial update syntax, that's just the name for it in the Rust book.
I don't think this introduces any ambiguities and I could imagine plenty of situations where this syntax could come in handy.
Note that I didn't put any thought into how this would be handled for mixed tuples, because I think being able to use this proposed syntax in an easy-to-understand way is another reason to abandon mixed tuples.
This is neat, and I like it. I think it's a natural extension
Gazprea '26 Specification Proposal
This is my working upstream PR of a specification for the 2026 gazprea implementation. Currently I would say that the documentation is ready for revision, and I would like to point the reviewers to a few specific choices I made that may be controversial:
Collapsing Aggregate and Composite types
Rationale: vectors as dynamic arrays felt uncomfortable in the syntax of gazprea, but we can provide the same power by using dynamic arrays. The idea is that, to the programmer, static and dynamic arrays behave the same and can all be concatenated to. So long as the range is known at compile time, we treat the array like a static array, only once we encounter a non-static array length (whether by concatenation or by initialization from a non-constexpr) will the implementation treat the array as a dynamic array. We can test student implementations of static vs dynamic arrays with creative testing (I have some proposals I will outline privately).
Rationale: Since structs essentially took over the tagged tuple in the last version of the spec it created a problem in the specification of our type system. I wanted to resolve this, and you will see in the typing section that I have outlined a formal type lattice for the language. Now tuples use tags as part of their type identity, and this allows us to implement a 'weak normative type' using a tuple.
The future of the gazprea type system could be extended. We could allow for normative type declarations with the addition of the type keyword. This also frees up the future development of impls on custom types.
Rationale: It just didn't make sense to restrict it any more with mlir. If students use memref they may need to define a light type wrapper around their structs to make arrays work and vice versa, but in mlir defining your own type is simple and you don't need to define the rest of the language in your own dialect; you can mix and match. Otherwise students have success with tuple destructuring in the front-end.
Rationale: Due to the restrictions I imposed on lvalues/rvalues in slices, jagged arrays could only be defined at compile time. Further, jagged arrays have little use that tuples of dynamic arrays will not cover, especially tuples of dynamic n-d arrays.
Slices are R-Values and Copy-On-Write objects
Rationale: Slices, as we discussed a few times last semester, are most easily represented as a view into an array. Unfortunately, doing this for real causes many problems in the type system for a constrained language like gazprea. Broadening the type system to account for references causes us to need more value categories, like modern C++ has now, and I am not confident in students being able to execute that value category complexity in a way that makes them better programmers. By contrast, we can still teach a very useful optimization if we tell the students that slices are not eagerly copied from their sources. This way, if a slice is assigned to some lvalue, then we can keep it around as a reference until it is modified.
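To make the copy-on-write idea concrete, here is a small Python sketch of one possible lazy strategy. It is an illustration under stated assumptions, not the approach the spec mandates: the `Array` and `LazySlice` classes are hypothetical, indexing is 0-based as in Python rather than Gazprea's 1-based convention, and slice bounds are assumed to be valid. A slice shares the source storage until either side is written, at which point it takes its private copy, so the observable behaviour matches an eager deep copy.

```python
class Array:
    """Owning array that tracks lazy slices still sharing its storage."""

    def __init__(self, data):
        self._data = list(data)
        self._views = []

    def slice(self, start, stop):
        view = LazySlice(self, start, stop)
        self._views.append(view)
        return view

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value):
        # A write to the source must not show through existing slices,
        # so every still-shared slice takes its copy first.
        for view in self._views:
            view._materialize()
        self._views.clear()
        self._data[index] = value


class LazySlice:
    """Slice that defers copying until the slice or its source is modified."""

    def __init__(self, source, start, stop):
        self._source, self._start, self._stop = source, start, stop
        self._copy = None

    @property
    def is_shared(self):
        return self._copy is None

    def _materialize(self):
        if self._copy is None:
            self._copy = self._source._data[self._start:self._stop]
            self._source = None

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError("slice index out of range")
        if self._copy is not None:
            return self._copy[index]
        return self._source._data[self._start + index]

    def __setitem__(self, index, value):
        if not 0 <= index < len(self):
            raise IndexError("slice index out of range")
        self._materialize()  # first write triggers the deferred copy
        self._copy[index] = value
```

A slice that is only ever read, for example one passed to a const parameter, never pays for the copy in this scheme.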
This aligns with @rcunrau's goal of having reference passing through function arguments, and, now that we can define n-d dynamic arrays, should provide students the opportunity to write some more tricky, low-level code if they want.
Type Lattice Definition
Other Notes