Spec ID: glyph-loose-1.0.0
Date: 2026-01-13
Status: Stable
Canonical output for any input in the test corpus is frozen and will not change.
GLYPH-Loose is the schema-optional subset of GLYPH. It provides a deterministic canonical representation for JSON-like data, suitable for hashing, caching, and cross-language interoperability.
- Drop-in JSON replacement - Any valid JSON is valid GLYPH-Loose input
- Deterministic canonical form - Same data always produces same output
- Cross-language parity - Go, JS, and Python implementations produce identical output
- Compact - More token-efficient than JSON for LLM contexts
| Type | Canonical Form | Examples |
|---|---|---|
| null | _ |
_ (accepts ∅, null on input) |
| bool | t / f |
t, f |
| int | Decimal, no leading zeros | 0, 42, -100 |
| float | Shortest roundtrip, e (not E) |
3.14, 1e-06, 1e+15 |
| string | Bare if safe, else quoted | hello, "hello world" |
- Zero: Always
0(not-0, not0.0) - Negative zero: Canonicalizes to
0 - Exponent threshold: Use exponential when
exp < -4orexp >= 15 - Exponent format: 2-digit minimum (
1e-06, not1e-6) - NaN/Infinity: Rejected with error (not JSON-compatible)
A string is "bare-safe" (unquoted) if:
- Non-empty
- First character: Unicode letter or
_ - Remaining characters: Unicode letter, digit,
_,-,.,/ - Not a reserved word:
t,f,true,false,null,none,nil
Otherwise, the string is quoted with minimal escapes.
| Type | Canonical Form |
|---|---|
| list | [ + space-separated elements + ] |
| map | { + sorted key=value pairs + } |
Examples:
[]
[1 2 3]
[_ t 42 hello]
{}
{a=1}
{a=1 b=2 c=3}
Map keys are sorted by bytewise UTF-8 comparison of their canonical string form.
Input: {"b":1,"a":2,"aa":3,"A":4,"_":5}
Output: {A=4 _=5 a=2 aa=3 b=1}
UTF-8 byte order: A (0x41) < _ (0x5F) < a (0x61) < ...
Last-wins policy: When a JSON object has duplicate keys, the last value is used.
Input: {"k":1,"k":2,"k":3}
Output: {k=3}
gv, err := glyph.FromJSONLoose(jsonBytes)- Accepts any valid JSON
- Rejects NaN/Infinity (returns error)
- Integers within ±2^53 become
int, others becomefloat
jsonBytes, err := glyph.ToJSONLoose(gv)- Produces valid JSON
- IDs become
"^prefix:value"strings - Times become ISO-8601 strings
- Bytes become base64 strings
With BridgeOpts{Extended: true}:
- Times become
{"$glyph":"time","value":"..."} - IDs become
{"$glyph":"id","value":"^..."} - Bytes become
{"$glyph":"bytes","base64":"..."}
# Format JSON as canonical GLYPH-Loose
echo '{"b":1,"a":2}' | glyph fmt-loose
# Output: {a=2 b=1}
# Convert to pretty JSON
echo '{"b":1,"a":2}' | glyph to-json
# Output:
# {
# "a": 2,
# "b": 1
# }
# File input
glyph fmt-loose data.json
# LLM mode (ASCII-safe nulls)
echo '{"value":null}' | glyph fmt-loose --llm
# Output: {value=_}
# Compact mode with schema header
echo '{"action":"search","query":"test"}' | glyph fmt-loose --compact
# Output:
# @schema#<hash> keys=[action query]
# {#0=search #1=test}The test corpus at testdata/loose_json/ contains 50 cases covering:
- Deep nesting (10-20 levels)
- Unicode (surrogates, CJK, emoji)
- Edge numbers (boundaries, precision)
- Key ordering (stability, unicode)
- Duplicate keys
- Reserved words
- Control characters
Golden files at testdata/loose_json/golden/ anchor expected canonical output.
Cross-implementation tests verify Go, JS, and Python produce byte-identical canonical forms.
GLYPH-Loose is the foundation. Schema features enable additional capabilities.
The @open annotation allows a struct type to accept fields not defined in the schema. This is useful for forward compatibility and dynamic payloads (e.g., Kubernetes-style metadata).
Schema Definition (Go):
schema := NewSchemaBuilder().
AddOpenStruct("Config", "v1",
Field("name", PrimitiveType("str")),
Field("port", PrimitiveType("int")),
).
Build()Schema Definition (TypeScript):
const schema = new SchemaBuilder()
.addOpenStruct('Config', 'v1')
.field('name', t.str())
.field('port', t.int())
.build();Canonical Schema Text:
Config:v1 @open struct{
name: str
port: int
}
Behavior:
| Struct Type | Unknown Field | Validation Result |
|---|---|---|
| Closed (default) | Present | Error: unknown_field |
@open |
Present | Pass (warning logged) |
@open + Strict |
Present | Error: unknown_field |
Strict Validation:
Use ValidateStrict() to reject unknown fields even for @open structs:
// Normal validation - unknown fields accepted with warning
result := ValidateWithSchema(value, schema)
// Strict validation - unknown fields always rejected
result := ValidateStrict(value, schema)Map values are validated against the specified value type:
schema := NewSchemaBuilder().
AddStruct("Config", "v1",
Field("settings", MapType(PrimitiveType("str"), PrimitiveType("int"))),
).
Build()
// This passes - all values are ints
value := Map(
MapEntry{Key: "timeout", Value: Int(30)},
MapEntry{Key: "port", Value: Int(8080)},
)
// This fails - "name" value is string, not int
value := Map(
MapEntry{Key: "timeout", Value: Int(30)},
MapEntry{Key: "name", Value: Str("myapp")}, // type_mismatch error
)Nested type references are also validated:
schema := NewSchemaBuilder().
AddStruct("Address", "v1",
Field("host", PrimitiveType("str")),
Field("port", PrimitiveType("int")),
).
AddStruct("Registry", "v1",
Field("services", MapType(PrimitiveType("str"), RefType("Address"))),
).
Build()Auto-Tabular mode provides compact representation for homogeneous lists of objects (arrays of records). This is common in API responses and database results.
@tab _ [col1 col2 col3]
|val1|val2|val3|
|val4|val5|val6|
@end
- Header:
@tab _followed by sorted column names in brackets - Rows: Pipe-delimited cells, one row per line
- Footer:
@endmarker
Auto-tabular is enabled by default in canonicalization outputs. Disable it if you need v2.2.x-compatible output or strict non-tabular formatting.
Go:
// Default: auto-tabular enabled
canonical := glyph.CanonicalizeLoose(value)
// Disable auto-tabular
canonical := glyph.CanonicalizeLooseNoTabular(value)
// Custom options
opts := glyph.DefaultLooseCanonOpts()
opts.AutoTabular = false
opts.MinRows = 5 // Only tabularize 5+ rows (if enabled)
canonical := glyph.CanonicalizeLooseWithOpts(value, opts)TypeScript:
// Default: auto-tabular enabled
const canonical = canonicalizeLoose(value);
// Disable auto-tabular
const canonical = canonicalizeLooseWithOpts(value, { autoTabular: false });
// Custom options
const canonical = canonicalizeLooseWithOpts(value, {
autoTabular: true,
minRows: 5
});CLI:
echo '[{"id":1,"name":"a"},{"id":2,"name":"b"},{"id":3,"name":"c"}]' | glyph fmt-loose
# Output:
# @tab _ [id name]
# |1|a|
# |2|b|
# |3|c|
# @end
# Disable tabular
echo '[{"id":1},{"id":2},{"id":3}]' | glyph fmt-loose --no-tabularA list qualifies for tabular emission when:
- Contains ≥
MinRowselements (default: 3) - All elements are maps or structs
- No row is an empty object (must have at least one key)
- Total column count ≤
MaxCols(default: 20) - When
AllowMissing=false, all rows must have identical keys - When
AllowMissing=true, the shared key ratio must be ≥ 50%:|intersection(keys)| / |union(keys)| >= 0.5
Columns are sorted by bytewise UTF-8 comparison of their canonical key form (same as map key ordering).
When a row is missing a key present in other rows, the cell contains _:
Input: [{"id":1,"name":"a"},{"id":2},{"id":3,"name":"c"}]
Output:
@tab _ [id name]
|1|a|
|2|_|
|3|c|
@end
Pipes in cell values are escaped as \|:
Input: [{"val":"a|b"},{"val":"c|d"},{"val":"e|f"}]
Output:
@tab _ [val]
|"a\|b"|
|"c\|d"|
|"e\|f"|
@end
Nested maps and lists are emitted inline:
Input: [{"id":1,"meta":{"x":10}},{"id":2,"meta":{"x":20}},{"id":3,"meta":{"x":30}}]
Output:
@tab _ [id meta]
|1|{x=10}|
|2|{x=20}|
|3|{x=30}|
@end
Go:
result, err := glyph.ParseTabularLoose(input)
// result is []map[string]anyTypeScript:
const result = parseTabularLoose(input);
// result.columns: string[]
// result.rows: Array<Record<string, unknown>>Row/column counts can be added to tabular headers for streaming resync:
@tab _ rows=120 cols=6 [id name score status created updated]
|1|Alice|0.95|active|2025-01-01|2025-06-15|
...
@end
This enables:
- Detection of truncated streams
- Verification of complete transmission
- Progress tracking for large tables
| Option | Type | Default | Description |
|---|---|---|---|
AutoTabular |
bool | true | Enable auto-tabular detection |
MinRows |
int | 3 | Minimum rows to trigger tabular |
MaxCols |
int | 20 | Maximum columns allowed |
AllowMissing |
bool | true | Allow rows with missing keys |
NullStyle |
NullStyle | underscore | symbol for ∅, underscore for _ |
SchemaRef |
string | "" | Schema hash/id for @schema header |
KeyDict |
[]string | nil | Key dictionary for compact keys |
UseCompactKeys |
bool | false | Emit #N instead of field names |
Auto-tabular reduces output size by eliminating repeated key names:
| Format | Size (55 test cases) |
|---|---|
| JSON-min | 5,109 bytes |
| GLYPH-Loose | 4,224 bytes |
| GLYPH-Tabular | ~3,900 bytes |
Tabular mode is most effective for wide, shallow tables.
For repeated structured outputs (tool calls, agent traces, topK results), schema contexts enable significant token savings through compact key encoding.
Inline schema (self-contained):
@schema#abc123 @keys=[action query confidence sources]
{#0=search #1="weather NYC" #2=0.95 #3=[web news]}
External schema ref (compact, receiver must have schema cached):
@schema#abc123
{#0=search #1="weather NYC" #2=0.95 #3=[web news]}
Clear active schema:
@schema.clear
{action=search query="weather NYC"}
- Hash-based (default): SHA-256 of keys, first 5 bytes as base32 (8 chars)
- Session-based: Short IDs like
S1,S2for compact streaming
- Keys are positional indices in the
@keys=[...]list - Objects may mix numeric keys (
#N=) and string keys - Known keys → numeric; unknown/rare keys → string literal
- Nested objects/lists also resolve numeric keys via the active schema
// Create schema context
schema := glyph.NewSchemaContext([]string{"role", "content", "tool_calls"})
// Emit with schema
opts := glyph.SchemaLooseCanonOpts(schema)
output := glyph.CanonicalizeLooseWithSchema(value, opts)
// Output: @schema#jka43dvv @keys=[role content tool_calls]
// {#0=user #1="Hello"}
// Parse with registry
registry := glyph.NewSchemaRegistry()
parsed, ctx, err := glyph.ParseLoosePayload(input, registry)from glyph import new_schema_context, SchemaRegistry, parse_loose_payload
# Create schema context
schema = new_schema_context(["role", "content", "tool_calls"])
# Parse with registry
registry = SchemaRegistry()
parsed, ctx = parse_loose_payload(input_str, registry)
role = parsed.get("role").as_str() # "user"const keyDict = buildKeyDictFromValue(sampleValue);
const opts: LooseCanonOpts = {
schemaRef: 'abc123',
keyDict,
useCompactKeys: true,
};
const output = canonicalizeLooseWithSchema(value, opts);Optional base state fingerprinting for patch validation:
@patch @schema#abc123 @keys=wire @target=m:123 @base=1a2b3c4d5e6f7890
= score 5
+ events "Goal!"
@end
The @base= attribute contains the first 16 characters of the SHA-256 hash
of the canonical loose form of the base state. This enables:
- Optimistic concurrency: Reject patches applied to stale state
- Streaming validation: Verify state consistency without full doc transfer
- Debugging: Trace state divergence in distributed systems
Go:
// Create patch with base fingerprint from state
patch := glyph.NewPatchBuilder(target).
WithBaseValue(baseState). // Computes SHA-256, uses first 16 hex chars
Set("score", glyph.Int(5)).
Build()
// Or with explicit fingerprint
patch := glyph.NewPatchBuilder(target).
WithBaseFingerprint("1a2b3c4d5e6f7890").
Set("score", glyph.Int(5)).
Build()TypeScript:
const patch = new PatchBuilder(target)
.withBaseValue(baseState)
.set('score', g.int(5))
.build();Python:
patch = PatchBuilder(target) \
.with_base_value(base_state) \
.set("score", Int(5)) \
.build()GLYPH-Loose is the foundation. When you need schema features:
- Add a schema → enables packed encoding, FID-based parsing
- Use
@openstructs → collect unknown fields safely - Use
map<K,V>→ validate dynamic keys - Use patches → efficient incremental updates
The canonical form remains stable across modes.