feat(nbcr-2023-002): structured token-id (token-component + array form) by irfan798 · Pull Request #20 · ngraveio/Research

irfan798 · 2026-04-23T13:52:18Z

Summary

Stacked on top of #19 (minimal hex-string + UAI dot form). Adds a forward-looking, byte-efficient structured representation of token-id while keeping the minimal PR's tstr form fully valid.

Producers MAY choose either form per identifier; consumers MUST accept both.

What changes

Three new CDDL rules extend the token-ids grammar:

; Extension point for tagged byte-string encodings. Today equals hex-string
; (CBOR tag 263). Future specs MAY add base58, base64, bech32 once registered.
encoded-bytes = hex-string

; A single atomic component of a UAI-style identifier.
token-component = tstr / encoded-bytes / biguint

; Scalar identifier, or ordered array of two to four components for compound
; identifiers (e.g. ERC-721 / ERC-1155). The UAI text form "a.b.c" and the
; array form [a, b, c] are semantically equivalent.
token-id = token-component / [2*4 token-component]

detailed-account changes in exactly one line: token-ids: [+ tstr / hex-string] becomes token-ids: [+ token-id]. Shape, tag, and keymap are unchanged.

Component arrays are bounded at 2 to 4 entries: realistic UAI identifiers have 2–3 components, and the cap keeps array length bounded without constraining any current use case. Because token-component admits no arrays, compound identifiers cannot nest.

The per-chain comment for ERC-721 / ERC-1155 now names the typed array as the preferred wire form and includes a CBOR diagnostic notation example to make the shape concrete.
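The equivalence between the dotted text form and the array form can be made concrete with a small comparison helper. The sketch below is illustrative only: `HexBytes` is a hypothetical stand-in for `HexString` from `@ngraveio/ur-hex-string`, and `canonicalText` and `sameTokenId` are names invented here. It assumes the text form carries hex components as lowercase `0x`-prefixed strings and integers in decimal.

```typescript
// Hypothetical stand-in for HexString; holds hex digits without the "0x" prefix.
class HexBytes {
  hex: string
  constructor(hex: string) { this.hex = hex }
}

type Component = string | HexBytes | bigint   // tstr / encoded-bytes / biguint
type Id = Component | Component[]             // scalar or [2*4 token-component]

// Render one component the way it would appear inside the dotted text form.
function toText(c: Component): string {
  if (c instanceof HexBytes) return '0x' + c.hex.toLowerCase()
  if (typeof c === 'bigint') return c.toString(10)
  return c
}

// Reduce either wire form to the dotted text, enforcing the 2..4 bound on arrays.
function canonicalText(id: Id): string {
  if (!Array.isArray(id)) return toText(id)
  if (id.length < 2 || id.length > 4) {
    throw new RangeError('compound id must have 2 to 4 components')
  }
  return id.map(toText).join('.')
}

// Two identifiers are treated as equal when their dotted text forms match.
function sameTokenId(a: Id, b: Id): boolean {
  return canonicalText(a) === canonicalText(b)
}
```

Comparing through the text form keeps the check independent of which form a producer chose; a consumer that prefers to compare raw bytes would need the inverse parse instead, which is not shown here.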

On-wire impact

For an ERC-1155 identifier 0xfaafdc07907ff5120a76b34b731b278c38d6043c.508851954656174721695133294256171964208:

| Wire form | Approx. size | Savings vs `tstr` |
| --- | --- | --- |
| `tstr` (UAI dotted text, 82 chars) | ~84 B | baseline |
| `[hex-string, biguint]` (typed array) | ~44 B | ~40 B (~48%) |

Savings scale with token-id width. For a portfolio mixing ERC-20 contracts (already covered by #19's hex-string) with a handful of NFT identifiers, the additional savings from the array form can add up to 1–2 fewer animated-QR frames per sync.
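The sizes for the ERC-1155 example can be checked by counting CBOR heads and payload bytes directly. The following is a rough sketch, not part of the spec or of any ngrave library: all function names are made up for illustration, and it assumes definite-length encoding with the shortest possible heads.

```typescript
// Size of a CBOR head (major type plus argument) for a given argument value.
function headSize(arg: number): number {
  if (arg < 24) return 1
  if (arg < 0x100) return 2
  if (arg < 0x10000) return 3
  if (arg < 0x100000000) return 5
  return 9
}

// Minimal number of bytes needed to hold a non-negative integer.
function byteLength(n: bigint): number {
  let len = 0
  for (let v = n; v > 0n; v >>= 8n) len++
  return len
}

// tstr: head + UTF-8 bytes (the identifiers here are pure ASCII).
function tstrSize(text: string): number {
  return headSize(text.length) + text.length
}

// hex-string: tag 263 head + bstr head + raw bytes.
function hexStringSize(hex: string): number {
  const bytes = hex.replace(/^0x/, '').length / 2
  return headSize(263) + headSize(bytes) + bytes
}

// biguint: plain uint up to 64 bits, otherwise tag 2 + bstr.
function biguintSize(n: bigint): number {
  const bytes = byteLength(n)
  if (bytes <= 8) return headSize(Number(n))
  return headSize(2) + headSize(bytes) + bytes
}

const contract = '0xfaafdc07907ff5120a76b34b731b278c38d6043c'
const tokenId = 508851954656174721695133294256171964208n

const textForm = tstrSize(`${contract}.${tokenId}`)
// headSize(2) is the array head for a two-element array.
const arrayForm = headSize(2) + hexStringSize(contract) + biguintSize(tokenId)

console.log(textForm, arrayForm)   // 84 44
```

The dotted text is 82 ASCII characters plus a 2-byte head; the array form is 1 byte of array head, 24 bytes for the tagged contract address, and 19 bytes for the tag-2 bignum (the token id needs 17 bytes).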

Parser impact

Tag 41402 (detailed-account) and the keymap (account = 1, token-ids = 2) are unchanged. All additions live at the token-ids value level.

Scalar tstr / hex-string path stays valid for every chain. Parsers that only handle the minimal PR's grammar continue to decode correctly for scalar identifiers.
Parsers SHOULD accept the array form [hex-string, biguint] (or, more generally, [2*4 token-component]). On decode, an array of 2 to 4 components represents a compound identifier; consumers that need a string view reconstitute the UAI dotted text by joining the rendered components with ".".

Illustrative sketches below target the string-view rendering path (error handling and byte-string size guards elided).

TypeScript (extends `DetailedAccount` in `@ngraveio/ur-sync` and `HexString` in `@ngraveio/ur-hex-string`)

type TokenComponent = string | HexString | bigint        // tstr / encoded-bytes / biguint
type TokenId        = TokenComponent | TokenComponent[]  // scalar or [2..4] array

// Input side: normalise into the CDDL grammar.
function normalise(id: TokenId): TokenId {
  if (Array.isArray(id)) {
    if (id.length < 2 || id.length > 4) throw new Error('compound id out of range')
    return id.map(promoteComponent)
  }
  return promoteComponent(id)
}

function promoteComponent(v: string | HexString | bigint): TokenComponent {
  if (v instanceof HexString) return v
  if (typeof v === 'bigint')  return v
  try   { return new HexString(v) }   // hex-looking string becomes hex-string
  catch { return v }                  // otherwise keep as tstr
}

// Display side: reconstruct the UAI dotted text.
getTokenIds = () =>
  this.data.tokenIds?.map(id =>
    Array.isArray(id) ? id.map(componentToText).join('.') : componentToText(id))

function componentToText(c: TokenComponent): string {
  if (c instanceof HexString) return '0x' + c.getData()   // round-trip parity
  if (typeof c === 'bigint')  return c.toString(10)
  return c                                                // already a tstr
}

C (tinycbor APIs)

/* Render one token-id entry. Recurses once for the array form. */
CborError render_component(CborValue *it, renderer *out) {
  switch (cbor_value_get_type(it)) {

  case CborTextStringType:                 /* tstr component */
    return copy_text(it, out);

  case CborTagType: {
    CborTag tag;
    cbor_value_get_tag(it, &tag);
    cbor_value_skip_tag(it);
    if (tag == 263) return render_hex_prefixed(it, out);   /* "0x<lowercase hex>" */
    if (tag == 2)   return render_bignum_decimal(it, out); /* tag-2 positive bignum */
    return CborErrorUnknownTag;                            /* future encoded-bytes */
  }

  case CborIntegerType:                    /* biguint fitting in uint64 */
    return render_uint_decimal(it, out);

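  case CborArrayType: {                    /* [2*4 token-component] */
    size_t n;
    CborError err = cbor_value_get_array_length(it, &n);   /* fails on indefinite length */
    if (err) return err;
    if (n < 2 || n > 4) return CborErrorIllegalType;
    CborValue inner;
    err = cbor_value_enter_container(it, &inner);
    if (err) return err;
    for (size_t i = 0; i < n; i++) {
      /* The grammar has no nested arrays; rejecting them bounds the recursion depth. */
      if (cbor_value_is_array(&inner)) return CborErrorIllegalType;
      if (i > 0) out_putchar(out, '.');
      err = render_component(&inner, out);                 /* recurse */
      if (err) return err;
      err = cbor_value_advance(&inner);                    /* helpers leave the iterator in place */
      if (err) return err;
    }
    return cbor_value_leave_container(it, &inner);
  }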

  default: return CborErrorIllegalType;    /* any other CBOR type is outside the grammar */
  }
}

Both sketches assume the entry iterator already points at the token-id value (the caller has stepped into token-ids). The TypeScript side composes with the existing HexString promotion logic; the C side composes with the existing tinycbor iteration used by the detailed-account reader.

Extensibility

encoded-bytes is the grammar-level extension socket. Adding base58, base64, or bech32 support in a future spec is a one-line CDDL change (expand the encoded-bytes union), with a matching IANA CBOR tag registration. The token-id and token-component rules do not need to change.

Indicative byte savings per identifier once each encoding lands and replaces the current tstr form (all sizes include CBOR headers):

| Future tag | Representative identifier | `tstr` | `encoded-bytes` | Savings |
| --- | --- | --- | --- | --- |
| `base58-string` | Solana SPL mint (32 B pubkey, 44-char base58) | ~46 B | ~37 B | ~9 B (~20%) |
| `base58-string` | Tron TRC-20 address (25 B base58check, 34 chars) | ~36 B | ~30 B | ~6 B (~17%) |
| `bech32-string` | BTC SegWit v0 P2WPKH (20 B program, 42 chars) | ~44 B | ~27 B | ~17 B (~39%) |
| `bech32-string` | Cosmos address (20 B + HRP, ~45 chars) | ~47 B | ~32 B | ~15 B (~32%) |
| `base64-string` | Generic 32 B identifier (44-char base64) | ~46 B | ~37 B | ~9 B (~20%) |

For reference, the already-shipped hex-string saves ~20 B (~45%) on an EVM ERC-20 contract address. Savings depend on how compactly the text form encodes the underlying bytes: hex and bech32 expand bytes most, so recovering them wins the most; base58 and base64 are tighter text forms and save proportionally less.
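The ordering of these savings follows from how many text characters each alphabet needs per raw byte, which is 8 divided by the number of bits one character carries. A short illustrative calculation (the helper name `charsPerByte` is invented for this sketch; it ignores fixed overhead such as the bech32 prefix and checksum or the base58check checksum bytes, which add further characters on top):

```typescript
// Text characters needed per raw byte for a base-N alphabet:
// 8 bits per byte divided by log2(N) bits per character.
function charsPerByte(alphabetSize: number): number {
  return 8 / Math.log2(alphabetSize)
}

const encodings: [string, number][] = [
  ['hex', 16],
  ['bech32 (base32 payload)', 32],
  ['base58', 58],
  ['base64', 64],
]

for (const [name, size] of encodings) {
  console.log(name, charsPerByte(size).toFixed(2))
}
// hex 2.00
// bech32 (base32 payload) 1.60
// base58 1.37
// base64 1.33
```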

Backward compatibility

Producers emitting only scalar identifiers (the full scope of #19) remain fully valid under this grammar.
Consumers that only implement the minimal PR keep decoding correctly for the scalar subset.
Consumers that want to render compound identifiers for NFT portfolios should implement the array path.

Not in this PR

A canonical form choice (force array over tstr, or vice versa). Both forms remain equally valid; implementations pick based on payload budget and UX preference.
IANA registration of additional encoded-bytes members (base58, bech32, ...). Tracked separately.

References

Minimal prerequisite: feat(nbcr-2023-002): adopt hex-string + UAI dot form for token-ids #19
hexString CBOR tag 263 spec
IANA CBOR tag registry
NBCR-2024-001 (UAI)
EIP-1155

Introduce a forward-looking, byte-efficient representation of token-id on top of the minimal hex-string change.

Two interchangeable wire forms are supported:
- the existing tstr form carrying the UAI dotted text, and
- a new typed array '[+ token-id]' where each component keeps its native CBOR type (e.g. '[hex-string, biguint]' for ERC-721 / ERC-1155).

CDDL additions:
- 'encoded-bytes' as the extension point for tagged byte-string encodings. Today it aliases 'hex-string' (CBOR tag 263). Future specs MAY add base58, base64, bech32 and similar once their CBOR tags are registered with IANA.
- 'token-component = tstr / encoded-bytes / biguint' as a single atomic component of a UAI-style identifier.
- 'token-id = token-component / [2*4 token-component]' as a scalar or compound identifier (bounded at four entries to fit realistic UAI identifiers without inviting unbounded nesting). The dotted tstr form and the array form are semantically equivalent.

The 'detailed-account' shape stays identical; only the grammar of 'token-ids' is lifted from '[+ tstr / hex-string]' to '[+ token-id]'.

The per-chain comment for ERC-721 / ERC-1155 now names the typed array as the preferred form and includes a CBOR diagnostic notation example to make the wire shape concrete.

Producers MAY choose either form per identifier and consumers MUST accept both. Previously-emitted payloads under the minimal grammar remain valid.

Refs LIQ-843

irfan798 force-pushed the feat/nbcr-2023-002-structured-token-id branch from 10dcf1d to b59d56d on April 23, 2026 14:30

irfan798 force-pushed the feat/nbcr-2023-002-structured-token-id branch from b59d56d to 0bad87b on April 23, 2026 14:35

Base automatically changed from feat/nbcr-2023-002-detailed-account-hex-string to main May 4, 2026 13:54

irfan798 wants to merge 1 commit into main from feat/nbcr-2023-002-structured-token-id
