Skip to content

Type representation design #2348

@nomeata

Description

@nomeata

In this issue I’d like to design our initial type representation (typerep) design.

What’s this?

The (run-time) type representation is a value at run-time that indicates a type.

What for, and why now?

We need (or can use this) for:

  • Currently, we do type-driven code generation in the IR-to-IR passes (show.ml) and the code generator (for serialization/deseriaizaion). This could be replaced with type-driven generation of a typrep value, and then interpreting that value in (static) rust code.
    The benefits are:
    • Simpler code generation
    • We can write more code in Rust (simpler, easier to maintain, easier to do complex stuff)
    • (Likely) smaller code this way.
  • In particular, I don’t see how we can extend our existing approach to deserialization code generation to the new requirement of dynamic subtyping checks (Spec: Do a subtyping check when decoding dfinity/candid#168); this really seems to require a typerep value (with identity).
  • A bit down the road, this is a requisite for Shared Generics (Shared generics #2096), and even if we don’t do that right away, it would be good to keep that use-case in mind in the design.

Because this seems to be a blocker for correct Candid deserialization, now seems to be the right time to tackele this. Nevertheless, it might be advisable to first migrate debug_show, as it is simpler.

Requirements

  • The typrep values should contain enough information to reconstruct the full type, including field names (if only for error messages), not just field hashes.
  • The typrep values need to be statically decomposable (e.g. the typrep for ?t should contain the typrep for t). Statically meaning the typrep for t exists in memory already. This seems to be necessary for all kind of recursive algorithms (show), and it would be odd if they have to allocate. This likely rules out a format like our “type hash” (which otherwise is, in a way, a very compact representation of the full type).
  • The typrep values must be able to represent coinductive values, i.e. be cyclic. This is a fundamental difference to Motoko value values!
  • The typrep values must support some form of identity, or at least a conservative to way to identify them (conservative in the sense that different typreps of the same type may look different, as long as one doesn’t run into cycles). The pointer to the static memory location of the typrep might work here. This is needed for the caching that one has to do when implementing the Candid subtyping check.
  • (soft goal) The typreps should be compact.
  • (soft goal) The strings the typrep (field names) should be deduplicated.
  • (soft goal) The typreps should be easy to parse/traverse
  • (soft goal) The typreps memory structure should be the representation of a suitable Rust enum/struct, for ease of writing the Rust code.

The following requirements come from the usecase of shared generics.

  • The typrep values need to be composable (e.g. from the typrep for t one can construct the typrep for ?t). This must also work for infinite types (e.g. from t construct let t2 = ?(t, t2) in t2).
  • The above implies that typrep values must be dynamically allocatable.
  • The above implies that they must have a GC-compatible form.

Approach 1: Use Motoko Values

One possible design is to define a Motoko type:

type TypRep = {
  #int;
  #nat;
  …
  #tuple : [TypRep];
  #array : ({#mut; #immut}, TypRep);
  #opt : TypRep;
  …
  #record : [(Text; TypRep)];
  …
}

and use the representation of that.

  • ➕ No need to adjust GC
  • ➕ Possible to write functions over this type in prelude.ml or in IR-to-IR passes
  • ➕ Existing functionality to read and write these heap objects in the RTS can be used
  • ➕ String de-duplication for free (we already de-duplicate static strings)
  • ➗ Motoko values are inductive; this needs to be coinductive. Unclear what breaks if we start to conflate the two
  • ➖ Not very compact. Using a type representation that was designed for uniformity and subtyping when neither is needed here (or is it?)
  • ➖ Maybe less convenient to use in Rust than dedicated structures
  • ➖ Either have to deal with ropes for the strings, or introduce the invariant that strings in a TypRep are always contiguous.

Approach 2: Dedicated heap objects

We could introduce dedicated heap objects for the TypReps.

(No details here yet.)

Pros and Cons mostly the inverse of Approach 1.

And now?

Discuss!

Metadata

Metadata

Assignees

No one assigned

    Labels

    compilerMotoko → WasmtypingInvolves the type system

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions