-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathnotes.txt
More file actions
302 lines (253 loc) · 9.98 KB
/
notes.txt
File metadata and controls
302 lines (253 loc) · 9.98 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
====================================================================================================
struct Foo in
t: enum t in
A; B; C; D;
end
u: union in
A: struct in
a: int;
end
B: struct in
x: int;
y: int;
end
C: struct in
s: cstr;
end
D: Foo&;
end
end
This syntax is valid but we don't arbitrarily allow structs/enums/unions as types.
e.g.: var x: struct in x: int; y: int end;
should we?
====================================================================================================
core:
- [x] string
- [x] vector
- [x] list
- [x] arena
- [x] associative list
- [x] set
- [x] map
- [ ] memory allocator
- realloc could be useful but it's a needlessly complex requirement.
- [ ] malloc
- [ ] calloc
- [ ] free
- [ ] printing utils
- use buffered IO
- typechecking the format strings would be nice but is likely complex
- [ ] fprint - equiv to libc fputs
- [ ] print - equiv to libc puts
- [ ] fformat - equiv to libc fprintf
- [ ] format - equiv to libc printf
- [ ] fflush - equiv to libc fflush
- [ ] memcpy
- [ ] memset
- [x] rewrite with template structs
- [x] Vector
- [x] Queue
- [x] Hashmap
- [x] Set
[x] self-host:
- [x] lexing
- [x] parsing
- [x] pretty print
- [x] IR
- [x] codegen
- [x] invoke as & ld from randy compiler
- [x] time each step of the compiler to compare with python
utils:
- [ ] easier asm generator interface
features:
- [x] global/static variables
- [x] variable scoping
- [x] outline enitre type checking and type inference model
- [~] full type checking
- [x] basic type checking
- [ ] remove `ptr`'s ability to auto-cast to reference types
- [x] variadic functions
- we could do this like C# or C++ variadiac templates in the form:
```
def foo a, b, ...c in
end
- OR -
def foo a, b, c... in
end
```
where `c` is an un-typed array of the arguments.
- accessing varargs is like using a regular array with `c.get(i)` and
`c.length`
- varargs might aswell have its own type with with no `.set` function
- we could use the regular calling conventions and just pass this array
as a regular pointer + length pair.
- the array should probably be allocated on the stack by the caller and
then freed when the callee returns.
- passing values larger than ptr_width bytes is a concern, probably just
pass them by reference and be aware of this when using varargs.
- [~] native array support
- [x] template classes make this mostly uneeded
- [x] bitwise ops: ^ | & << >>
- [x] assignment ops: += -= *= etc
- [~] structs
- [x] basic structs
- [x] nested structs
- [x] structs with references
- [x] auto-deref with . operator
- [x] Type::defs(...)
- [x] var.defs(...)
- [ ] meta/reflection generation
- [x] c default align/layout
- [ ] #packed - no alignment or padding
- [ ] #optimized - re-order struct for best memory layout
- [~] enums
- [x] basic sequential enum
- [x] specific enum values
- [ ] meta/reflection generation
- [x] constant expressions for values
- [x] more than just integer enums
- [x] c-style unions
- [ ] tagged unions
- [ ] tail call elimination
- [ ] inlinable functions
- probably implement an `inline` keyword and leave inlining at the discretion of the
programmer. there should also be some metric to determine if a function is even inlinable
and when it's not, just give a warning and emit a regular call.
- [ ] allow inline at call site
- [ ] allow inline at def definition
- [x] closures
- do I really care about this or is it just more uneeded complexity?
-> no closures are not something to care about this early
- [x] decide on memory model, manual? GC? mixed? opt-in GC?
-> full manual with analysis for and warnings about dangling references.
- [ ] basic `match` syntax, similar to switch in other languages with extra features.
no intention to have ML-style full pattern matching.
====================================================================================================
extra ideas:
Once type checking is implemented should I allow "string literals" to be automatically interpreted
as `cstr` or `string` depending on usage? e.g.
var x: cstr = "hello world"; // automatically use the null-terminated char array
var y: string = "hello world"; // automatically compile this into a std/string type
var z = "ambiguous"; // what should the type of this be?
extern printf fmt: cstr, ... -> int;
def greet x: cstr in
printf("Hello, %s\n", x);
end
greet("max"); // this already make sense
I can see problems already arising with deciding conversions before the function's type signature
is known; for example below: `do_action` is defined after its usage. This can be resolved by having
a second pass to resolve symbols before type checking.
do_action("bark"); // is this automatically converted to a std/string?
def do_action action: string in
...
end
====================================================================================================
type model and checking:
The language is strongly- and statically-typed. The type model tries to allow for type inference
everywhere but does not allow polymorphic types when there are usage conflicts. Function parameter
types are not overloadable, meaning you cannot have a `def foo a: int` and a `def foo a: cstr`.
e.g.
def foo a in
return a + 3;
end
// here the type of foo's `a` is `any` and foo's return type is also `any`
foo(123); // foo's `a` parameter now has a deduced type of `int` and the return type
// is also deduced to `int`
foo("asdf"); // illegal even though "asdf" + 3 is a legal expression.
Basic type checking can probably be done in two shallow passes and one or two deep passes.
The first shallow pass will look for specific type definitions: enums, structs, unions and declare
them with empty bodies. The second shallow pass will look at all top-level statements and annotate
them with the known types; particularly this is where `def`s and `extern`s will be declared with
known types. The third, and first deep, pass will visit every expression and assignment and attempt
to resolve a concrete type out of the `any` types. Generics will probably be instantiated into
monomorphs during this pass. The fourth pass may be unecessary but it will do the same as the
previous deep pass and display warnings or errors about types it cannot deduce.
Alternatively, instead of a third pass we can lazily compile and type check functions when they
are called. This also gives us an easier path to monomorphing generics. Check at the call site for
a version of that function with our argument types and either call that instance or instantiate
and compile a new monomorph. There is also the added benefit of not compiling functions that aren't
called which is especially nice for reducing binary size when including library code.
====================================================================================================
enum syntax:
enum ir_kind in
GetLocal; // starts at 0
PushLabel; // 1
PushInt; // 2
AllocTemps; // 3
SetLocal = 23; // jump to 23
NewDef; // 24
SetLocalArg; // 25
CloseDef; // 26...
end
no implicit conversion between int and enum: must use `ir_kind.from(int)`
the `<enum>.from` function doesn't error on out of bounds, it is merely for casting convenience
automatic generation of `ir_kind.to_string(ir_kind)`
automatic generation of `ir_kind.contains(int)`
enums will support bitwise ops to support flags
====================================================================================================
union syntax:
union ir_instr in
foo: string;
alloc_temps: struct in
n: int;
end
call: struct in
label: string;
varargs: bool;
end
inner: union in
foo: int;
bar: string;
end
end
union optional[T] in
some: T;
none;
end
unions will be implemented as a c-style tagged union: a struct with a tag field and a chunk
of memory for the union contents.
the underlying value of a union will only be reachable within `match` blocks
var ir = get_ir_instr(...);
match ir as u in
| foo -> puts(u);
| alloc_temps -> procress_alloc_temps(u);
| call -> printf("calling %s, varargs? %d\n", u.label, u.varargs);
| else -> // else keyword captures the default/unhandled cases
end
compiler will warn or error when a union value goes unused in a match block.
====================================================================================================
generic syntax:
struct list[T] in
next: list[T];
data: T;
end
struct map[Key, Value, KHasher, KComparer] in
struct map_pair in
key: Key;
value: Value;
end
buckets: vector[map_pair];
def set self, key, value in
var hash = KHasher.hash(key);
...
if KComparer.equals(key, self.buckets.get(idx).key) then
...
self.buckets.set(idx, make_map_pair(key, value));
end
end
end
def foo a:[T1], b:[T2] in
return a.get(b);
end
====================================================================================================
struct syntax:
struct point in
x: int;
y: int;
end
automatic generation of `make_point x: int, y: int -> point&`
automatic generation of `free_point self: point`
var pt = make_point(10, 20);
pt.x += 1;
pt.y += 1;
free_point(pt);