Unbounded Sequence Types
Introduction
This section introduces relevant terminology
Parameters and arguments
Parameters are the inputs as defined by the function, arguments are the values or expressions effectively passed to the input:
# v parameter
def foo(x: uint256)...
# v argument
foo(4)
Bounded sequence types
We define bounded sequence types to be the types taking as type parameter their maximum length:
Bytes[n]
String[n]
DynArray[T, n]
Notably this list does not include fixed-size lists (T[n] where T is any type except Bytes[n], String[n], flag types), since there the n is not an upper bound, but the actual length of the sequence.
For bounded sequence types, variables with a smaller bound can be assigned to variables with a bigger bound:
def foo() -> None:
s1: String[10] = "Hi" # valid since "Hi" has 2 characters, so fits into "at most 10"
s2: String[20] = s1 # valid since "at most 10" fits into "at most 20"
s3: String[2] = s1 # invalid since "at most 10" does not fit into "at most 2"
return
In other words, for all n1 <= n2, String[n1] is a subtype of String[n2]: String[n1] <: String[n2].
Data locations
The EVM contains essentially 4 locations where data can be located:
- Calldata
- Memory
- Transient storage
- Storage
(There are also others like the stack and the contract bytecode itself, but those are not relevant for this discussion.)
Calldata contains the parameters to an external function, and is immutable.
Memory contains parameters to internal functions, as well as temporary variables (ones defined inside functions).
Transient storage and storage store persistent data, for example contract variables (which can be accessed with self).
Memory, while cheaper than storage, is not free.
The total cost depends on the highest index allocated, and is quadratic.
This means allocating more memory than needed can be extremely expensive.
Motivation
Currently, before a function's arguments are passed, memory is allocated to fit the expected type into memory.
If a function has a String[1000] as parameter, even if the argument is String[10], we allocate the full 1000 chars worth of memory !
And of course you cannot pass a String[11] as an argument for a String[10] parameter, as that could break assumptions on the other end.
This leads to a dilemma: How big do I make that sequence type ?
- If you make it too small, this heavily contrains users in what they can do
- If you make it too big, the cost will get astronomical, even if the user calls the method with a small instance
This is a big limitation that Vyper has that Solidity doesn't, and stops us from implementing a lot of useful patterns, for example:
- Multicall: A contract which calls the contracts as defined in its parameter, and aggregates the results.
- Virtual Machine: Like a multicall, but the output of calling another contract's method can be used to decide which contract to call next, and with what arguments.
- Forwarding parameters from one contract's method to another contract's method.
The goal of this proposal is thus to remove this dilemma, in the simplest and clearest manner.
Overview
The idea is to add new types which represent sequences of any lengths, for example String[INF]:
@external
def size(s: String[INF]) -> uint256:
self.compute_size(s)
def compute_size(s: String[INF]) -> uint256:
return len(s)
As you can see, this avoids the dilemma completely: the size of s will depend on what it is called with.
Spec
Front-end
Unbounded versions of bounded sequence types are added:
String[INF]
Bytes[INF]
DynArray[T, INF]
(optional) Structs can contain fields of unbounded types, this makes them unbounded structs.
(optional) Unbounded structs can be elements of fixed-size lists, the generated type is unbounded.
Unbounded types are unbounded sequence types and unbounded structs.
For the following examples, assume:
struct MyStruct: # Not an unbounded type
i: uint256
Examples of unbounded types:
# Unbounded because no bound
Bytes[INF]
String[INF]
DynArray[MyStruct, INF]
# If unbounded structs are allowed:
struct MyUnboundedStruct:
bytes: Bytes[INF]
# Unbounded because contains unbounded struct
DynArray[MyUnboundedStruct, 4]
DynArray[MyUnboundedStruct, INF]
# If unbounded structs can are valid for fixed-size lists:
MyUnboundedStruct[4]
Unbounded types are only valid as:
- method parameters types
- method return types
- local variable types (for variables inside methods)
- (optional) field types for structs
struct NameBox:
name: String[INF] # valid
name1: String[INF] # invalid, not a local variable
name_box1: NameBox # invalid, not a local variable
@external # following also valid for internal methods
def foo(
name2: String[INF] # valid, parameter
name_box2: NameBox # valid, parameter
) -> (String[INF], NameBox): # valid, return type
name3: String[INF] = name2 # valid, local variable
name_box3: NameBox = name_box2 # valid, local variable
return (name3, name_box2)
Unbounded sequence types are super-types of their bounded counterparts (String[n] <: String[INF]), for example a String[4] can be assigned to a variable of type String[INF].
This is not true in reverse, a String[INF] cannot be assigned to a variable of type String[4], as we cannot be sure it is of length <= 4.
convert-ing from an unbounded type to a bounded counterpart succeeds if the length fits, and reverts otherwise.
convert-ing from an unbounded type to another unbounded type follows the same rules as if they were bounded, for example padding is adjusted accordingly.
convert-ing from an unbounded type to anything else first converts to the most appropriate unbounded type (if required), and then converts again to the destination type. Example: convert(s: String[INF], Bytes[5]) is equivalent to convert(convert(s: String[INF], Bytes[INF]), Bytes[5]).
There exists one additional way to convert some unbounded sequence types to bounded ones: slice, see below.
Built-ins
Changes to built-ins, both to make them work better with the new features, as well as to standardize notation.
Semantically different:
slice
b: Bytes | bytes32 | String to b: Bytes[INF] | bytes32 | String[INF]
- return
Bytes | String to
Bytes[length] | String[length] if length is known at compile time
Bytes[32] if type of b is bytes32
<type of b> otherwise
- Note: this is the current behavior, but is undocumented
raw_call
data: Bytes to data: Bytes[INF]
- (optional)
- deprecate
max_outsize
- make it return
Bytes[INF]
- (optional) or like
slice: if max_outsize is know at compile time, returns Bytes[max_outsize], else returns Bytes[INF]
msg.data, self.code, and <address>.code
- now of type
Bytes[INF]
- Should remove special handling of these types around slicing
Cleanup notation:
raw_create
initcode: Bytes[...] to initcode: Bytes[INF]
raw_log
data: Bytes | bytes32 to data: Bytes[INF] | bytes32
raw_revert
data: Bytes to data: Bytes[INF]
extract32
b: Bytes to b: Bytes[INF]
as_wei_value
unit: str to unit: String[INF]
or to unit: String[n] where n is the greatest number of letters in the valid formats
len
b: Bytes | String | DynArray[_Type, _Integer] to b: Bytes[INF] | String[INF] | DynArray[_Type, INF]
Simplifies the compiler
Things like Bytes.any() can be replaced by the representation for Bytes[INF], since String[n] <: String[INF] for any n.
Back-end
TODO
Backwards compatibility
TODO
Alternatives considered
Optimize calls so that the cost depends on the argument value or type, and not on the parameter type
While beneficial, it could still lead to issues:
- The library or contract designer assumes
Bytes[1000] is plenty, but end-user needs to pass Bytes[1024].
- Many built-ins already use types which are unbounded, this would not allow us to create true forwarders/wrappers for these methods, and makes these built-ins harder to document.
Different syntax for the same idea
String:
Forward compat: If we ever want fixed-size lists of strings this will lead to ambiguity: String[5] vs (String)[5].
Does not generalize to dynamic arrays, since we can't remove their subscript.
String[] (Solidity's solution):
Issues with dynamic arrays: DynArray[uint256, ] or DynArray[uint256].
String[...]:
A bit verbose, but does work well with dynamic arrays: DynArray[uint256, ...]
The syntaxes above are also less clear about sub-typing.
Since String[4] <: String[5], it follows that String[4] <: String[INF] and not String[INF] <: String[4].
That is less clear with the other syntaxes:
String[4] <: String and not String <: String[4]
String[4] <: String[] and not String[] <: String[4]
String[4] <: String[...] and not String[...] <: String[4]
INF could also be INFTY, INFINITY, etc, or an one of their lower-case versions.
All uppercase was chosen because this value is like a constant.
And INF in particular was chosen because it is short to type while remaining intelligible.
Unbounded Sequence Types
Introduction
This section introduces relevant terminology
Parameters and arguments
Parameters are the inputs as defined by the function, arguments are the values or expressions effectively passed to the input:
Bounded sequence types
We define bounded sequence types to be the types taking as type parameter their maximum length:
Bytes[n]String[n]DynArray[T, n]Notably this list does not include fixed-size lists (
T[n]whereTis any type exceptBytes[n],String[n], flag types), since there thenis not an upper bound, but the actual length of the sequence.For bounded sequence types, variables with a smaller bound can be assigned to variables with a bigger bound:
In other words, for all
n1 <= n2,String[n1]is a subtype ofString[n2]:String[n1] <: String[n2].Data locations
The EVM contains essentially 4 locations where data can be located:
(There are also others like the stack and the contract bytecode itself, but those are not relevant for this discussion.)
Calldata contains the parameters to an external function, and is immutable.
Memory contains parameters to internal functions, as well as temporary variables (ones defined inside functions).
Transient storage and storage store persistent data, for example contract variables (which can be accessed with
self).Memory, while cheaper than storage, is not free.
The total cost depends on the highest index allocated, and is quadratic.
This means allocating more memory than needed can be extremely expensive.
Motivation
Currently, before a function's arguments are passed, memory is allocated to fit the expected type into memory.
If a function has a
String[1000]as parameter, even if the argument isString[10], we allocate the full 1000 chars worth of memory !And of course you cannot pass a
String[11]as an argument for aString[10]parameter, as that could break assumptions on the other end.This leads to a dilemma: How big do I make that sequence type ?
This is a big limitation that Vyper has that Solidity doesn't, and stops us from implementing a lot of useful patterns, for example:
The goal of this proposal is thus to remove this dilemma, in the simplest and clearest manner.
Overview
The idea is to add new types which represent sequences of any lengths, for example
String[INF]:As you can see, this avoids the dilemma completely: the size of
swill depend on what it is called with.Spec
Front-end
Unbounded versions of bounded sequence types are added:
String[INF]Bytes[INF]DynArray[T, INF](optional) Structs can contain fields of unbounded types, this makes them unbounded structs.
(optional) Unbounded structs can be elements of fixed-size lists, the generated type is unbounded.
Unbounded types are unbounded sequence types and unbounded structs.
For the following examples, assume:
Examples of unbounded types:
Unbounded types are only valid as:
Unbounded sequence types are super-types of their bounded counterparts (
String[n] <: String[INF]), for example aString[4]can be assigned to a variable of typeString[INF].This is not true in reverse, a
String[INF]cannot be assigned to a variable of typeString[4], as we cannot be sure it is of length<= 4.convert-ing from an unbounded type to a bounded counterpart succeeds if the length fits, and reverts otherwise.convert-ing from an unbounded type to another unbounded type follows the same rules as if they were bounded, for example padding is adjusted accordingly.convert-ing from an unbounded type to anything else firstconverts to the most appropriate unbounded type (if required), and thenconverts again to the destination type. Example:convert(s: String[INF], Bytes[5])is equivalent toconvert(convert(s: String[INF], Bytes[INF]), Bytes[5]).There exists one additional way to convert some unbounded sequence types to bounded ones:
slice, see below.Built-ins
Changes to built-ins, both to make them work better with the new features, as well as to standardize notation.
Semantically different:
sliceb: Bytes | bytes32 | Stringtob: Bytes[INF] | bytes32 | String[INF]Bytes | StringtoBytes[length] | String[length]iflengthis known at compile timeBytes[32]if type ofbisbytes32<type of b>otherwiseraw_calldata: Bytestodata: Bytes[INF]max_outsizeBytes[INF]slice: ifmax_outsizeis know at compile time, returnsBytes[max_outsize], else returnsBytes[INF]msg.data,self.code, and<address>.codeBytes[INF]Cleanup notation:
raw_createinitcode: Bytes[...]toinitcode: Bytes[INF]raw_logdata: Bytes | bytes32todata: Bytes[INF] | bytes32raw_revertdata: Bytestodata: Bytes[INF]extract32b: Bytestob: Bytes[INF]as_wei_valueunit: strtounit: String[INF]or to
unit: String[n]wherenis the greatest number of letters in the valid formatslenb: Bytes | String | DynArray[_Type, _Integer]tob: Bytes[INF] | String[INF] | DynArray[_Type, INF]Simplifies the compiler
Things like
Bytes.any()can be replaced by the representation forBytes[INF], sinceString[n] <: String[INF]for anyn.Back-end
TODO
Backwards compatibility
TODO
Alternatives considered
Optimize calls so that the cost depends on the argument value or type, and not on the parameter type
While beneficial, it could still lead to issues:
Bytes[1000]is plenty, but end-user needs to passBytes[1024].Different syntax for the same idea
String:Forward compat: If we ever want fixed-size lists of strings this will lead to ambiguity:
String[5]vs(String)[5].Does not generalize to dynamic arrays, since we can't remove their subscript.
String[](Solidity's solution):Issues with dynamic arrays:
DynArray[uint256, ]orDynArray[uint256].String[...]:A bit verbose, but does work well with dynamic arrays:
DynArray[uint256, ...]The syntaxes above are also less clear about sub-typing.
Since
String[4] <: String[5], it follows thatString[4] <: String[INF]and notString[INF] <: String[4].That is less clear with the other syntaxes:
String[4] <: Stringand notString <: String[4]String[4] <: String[]and notString[] <: String[4]String[4] <: String[...]and notString[...] <: String[4]INFcould also beINFTY,INFINITY, etc, or an one of their lower-case versions.All uppercase was chosen because this value is like a constant.
And
INFin particular was chosen because it is short to type while remaining intelligible.