Skip to content

feat(nbcr-2023-002): structured token-id (token-component + array form)#20

Open
irfan798 wants to merge 1 commit into
mainfrom
feat/nbcr-2023-002-structured-token-id
Open

feat(nbcr-2023-002): structured token-id (token-component + array form)#20
irfan798 wants to merge 1 commit into
mainfrom
feat/nbcr-2023-002-structured-token-id

Conversation

@irfan798
Copy link
Copy Markdown
Contributor

@irfan798 irfan798 commented Apr 23, 2026

Summary

Stacked on top of #19 (minimal hex-string + UAI dot form). Adds a forward-looking, byte-efficient structured representation of token-id while keeping the minimal PR's tstr form fully valid.

Producers MAY choose either form per identifier; consumers MUST accept both.

What changes

Three new CDDL rules extend the token-ids grammar:

; Extension point for tagged byte-string encodings. Today equals hex-string
; (CBOR tag 263). Future specs MAY add base58, base64, bech32 once registered.
encoded-bytes = hex-string

; A single atomic component of a UAI-style identifier.
token-component = tstr / encoded-bytes / biguint

; Scalar identifier, or ordered array of two to four components for compound
; identifiers (e.g. ERC-721 / ERC-1155). The UAI text form "a.b.c" and the
; array form [a, b, c] are semantically equivalent.
token-id = token-component / [2*4 token-component]

detailed-account gains only one line change: token-ids: [+ tstr / hex-string] becomes token-ids: [+ token-id]. Shape, tag, and keymap are unchanged.

Component arrays are bounded at 2 to 4 entries: realistic UAI identifiers are 2–3 components, and the cap prevents unbounded nesting without constraining any current use case.

The per-chain comment for ERC-721 / ERC-1155 now names the typed array as the preferred wire form and includes a CBOR diagnostic notation example to make the shape concrete.

On-wire impact

For an ERC-1155 identifier 0xfaafdc07907ff5120a76b34b731b278c38d6043c.508851954656174721695133294256171964208:

Wire form Approx size Savings vs tstr
tstr (UAI dotted text, 80 chars) ~82 B baseline
[hex-string, biguint] (typed array) ~44 B ~38 B (~46%)

Savings scale with token-id width. For a portfolio mixing ERC-20 contracts (already covered by #19's hex-string) with a handful of NFT identifiers, the additional savings from the array form easily compound into 1-2 fewer animated-QR frames per sync.

Parser impact

Tag 41402 (detailed-account) and the keymap (account = 1, token-ids = 2) are unchanged. All additions live at the token-ids value level.

  • Scalar tstr / hex-string path stays valid for every chain. Parsers that only handle the minimal PR's grammar continue to decode correctly for scalar identifiers.
  • Parsers SHOULD accept the array form [hex-string, biguint] (or more generally [2*4 token-component]). On decode, an array of 2+ components represents a compound identifier; consumers reconstitute the UAI dotted text by joining the components with . if they need a string view.

Illustrative sketches below target the string-view rendering path (error handling and byte-string size guards elided).

TypeScript (extends DetailedAccount in @ngraveio/ur-sync and HexString in @ngraveio/ur-hex-string)

type TokenComponent = string | HexString | bigint        // tstr / encoded-bytes / biguint
type TokenId        = TokenComponent | TokenComponent[]  // scalar or [2..4] array

// Input side: normalise into the CDDL grammar.
function normalise(id: TokenId): TokenId {
  if (Array.isArray(id)) {
    if (id.length < 2 || id.length > 4) throw new Error('compound id out of range')
    return id.map(promoteComponent)
  }
  return promoteComponent(id)
}

function promoteComponent(v: string | HexString | bigint): TokenComponent {
  if (v instanceof HexString) return v
  if (typeof v === 'bigint')  return v
  try   { return new HexString(v) }   // hex-looking string becomes hex-string
  catch { return v }                  // otherwise keep as tstr
}

// Display side: reconstruct the UAI dotted text.
getTokenIds = () =>
  this.data.tokenIds?.map(id =>
    Array.isArray(id) ? id.map(componentToText).join('.') : componentToText(id))

function componentToText(c: TokenComponent): string {
  if (c instanceof HexString) return '0x' + c.getData()   // round-trip parity
  if (typeof c === 'bigint')  return c.toString(10)
  return c                                                // already a tstr
}

C (tinycbor APIs)

/* Render one token-id entry. Recurses once for the array form. */
CborError render_component(CborValue *it, renderer *out) {
  switch (cbor_value_get_type(it)) {

  case CborTextStringType:                 /* tstr component */
    return copy_text(it, out);

  case CborTagType: {
    CborTag tag;
    cbor_value_get_tag(it, &tag);
    cbor_value_skip_tag(it);
    if (tag == 263) return render_hex_prefixed(it, out);   /* "0x<lowercase hex>" */
    if (tag == 2)   return render_bignum_decimal(it, out); /* tag-2 positive bignum */
    return CborErrorUnknownTag;                            /* future encoded-bytes */
  }

  case CborIntegerType:                    /* biguint fitting in uint64 */
    return render_uint_decimal(it, out);

  case CborArrayType: {                    /* [2*4 token-component] */
    size_t n;
    cbor_value_get_array_length(it, &n);
    if (n < 2 || n > 4) return CborErrorIllegalType;
    CborValue inner;
    cbor_value_enter_container(it, &inner);
    for (size_t i = 0; i < n; i++) {
      if (i > 0) out_putchar(out, '.');
      CborError err = render_component(&inner, out);       /* recurse */
      if (err) return err;
      cbor_value_advance(&inner);
    }
    return cbor_value_leave_container(it, &inner);
  }

  default: return CborErrorUnexpectedType;
  }
}

Both sketches assume the entry iterator already points at the token-id value (the caller has stepped into token-ids). The TypeScript side composes with the existing HexString promotion logic; the C side composes with the existing tinycbor iteration used by the detailed-account reader.

Extensibility

encoded-bytes is the grammar-level extension socket. Adding base58, base64, or bech32 support in a future spec is a one-line CDDL change (expand the encoded-bytes union), with a matching IANA CBOR tag registration. The token-id and token-component rules do not need to change.

Indicative byte savings per identifier once each encoding lands and replaces the current tstr form (all sizes include CBOR headers):

Future tag Representative identifier tstr encoded-bytes Savings
base58-string Solana SPL mint (32 B pubkey, 44-char base58) ~46 B ~37 B ~9 B (~20%)
base58-string Tron TRC-20 address (25 B base58check, 34 chars) ~36 B ~30 B ~6 B (~17%)
bech32-string BTC SegWit v0 P2WPKH (20 B program, 42 chars) ~44 B ~27 B ~17 B (~39%)
bech32-string Cosmos address (20 B + HRP, ~45 chars) ~47 B ~32 B ~15 B (~32%)
base64-string Generic 32 B identifier (44-char base64) ~46 B ~37 B ~9 B (~20%)

For reference, the already-shipped hex-string saves ~20 B (~45%) on an EVM ERC-20 contract address. Savings depend on how compactly the text form encodes the underlying bytes: hex and bech32 expand bytes most, so recovering them wins the most; base58 and base64 are tighter text forms and save proportionally less.

Backward compatibility

  • Producers emitting only scalar identifiers (the full scope of feat(nbcr-2023-002): adopt hex-string + UAI dot form for token-ids #19) remain fully valid under this grammar.
  • Consumers that only implement the minimal PR keep decoding correctly for the scalar subset.
  • Consumers that want to render compound identifiers for NFT portfolios should implement the array path.

Not in this PR

  • A canonical form choice (force array over tstr, or vice versa). Both forms remain equally valid; implementations pick based on payload budget and UX preference.
  • IANA registration of additional encoded-bytes members (base58, bech32, ...). Tracked separately.

References

@irfan798 irfan798 force-pushed the feat/nbcr-2023-002-structured-token-id branch from 10dcf1d to b59d56d Compare April 23, 2026 14:30
Introduce a forward-looking, byte-efficient representation of token-id
on top of the minimal hex-string change. Two interchangeable wire forms
are supported:

- the existing tstr form carrying the UAI dotted text, and
- a new typed array '[+ token-id]' where each component keeps its
  native CBOR type (e.g. '[hex-string, biguint]' for ERC-721 /
  ERC-1155).

CDDL additions:

- 'encoded-bytes' as the extension point for tagged byte-string
  encodings. Today it aliases 'hex-string' (CBOR tag 263). Future
  specs MAY add base58, base64, bech32 and similar once their CBOR
  tags are registered with IANA.
- 'token-component = tstr / encoded-bytes / biguint' as a single
  atomic component of a UAI-style identifier.
- 'token-id = token-component / [2*4 token-component]' as a scalar
  or compound identifier (bounded at four entries to fit realistic
  UAI identifiers without inviting unbounded nesting). The dotted
  tstr form and the array form are semantically equivalent.

The 'detailed-account' shape stays identical; only the grammar of
'token-ids' is lifted from '[+ tstr / hex-string]' to '[+ token-id]'.
The per-chain comment for ERC-721 / ERC-1155 now names the typed
array as the preferred form and includes a CBOR diagnostic notation
example to make the wire shape concrete.

Producers MAY choose either form per identifier and consumers MUST
accept both. Previously-emitted payloads under the minimal grammar
remain valid.

Refs LIQ-843
@irfan798 irfan798 force-pushed the feat/nbcr-2023-002-structured-token-id branch from b59d56d to 0bad87b Compare April 23, 2026 14:35
Base automatically changed from feat/nbcr-2023-002-detailed-account-hex-string to main May 4, 2026 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant