Problem
ic-python-db currently uses json.dumps/json.loads for all entity serialization in db_engine.py. While CPython 3.13 on WASI likely has the C-accelerated _json module, there is still overhead from:
JSON text format is larger than binary alternatives (string quoting, key repetition)
Every db.save() and db.load() goes through JSON encode/decode
On IC canisters, larger payloads mean more StableBTreeMap storage and more cycles
The current format is compact JSON (no indentation). The pretty=True option only exists on the read-only dump_json() and raw_dump_json() export/debug utilities.
Proposal
Add msgpack as the internal serialization format in db_engine.py.
Benefits
Smaller payloads — msgpack is typically 30-40% smaller than compact JSON for the same data
Faster encoding/decoding — even pure-Python msgpack is competitive with C-JSON for small dicts; a C-accelerated msgpack would be significantly faster
Binary-native — no string escaping overhead for keys/values
Size comparison for typical entity dicts
| Format | Relative size |
| --- | --- |
| JSON (pretty) | ~2-3x baseline |
| JSON (compact) | 1x (current) |
| msgpack | ~0.6-0.7x |
| CBOR | ~0.6-0.7x |
| Protocol Buffers | ~0.3-0.5x |
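The ratios above can be checked against real entity data with a short measurement script. The following is a minimal sketch, not a benchmark: the entity dict and its field names are made up for illustration, the `separators=(",", ":")` setting is an assumption about what "compact" means here, and the msgpack line only runs if the third-party `msgpack` package is installed.

```python
import json

try:
    import msgpack  # third-party package; optional for this sketch
except ImportError:
    msgpack = None

# Hypothetical entity dict; the field names are illustrative only.
entity = {
    "_type": "User",
    "_id": "42",
    "name": "Alice",
    "email": "alice@example.com",
    "age": 30,
    "active": True,
    "tags": ["admin", "beta"],
}


def encoded_sizes(data: dict) -> dict:
    """Return the encoded size in bytes for each available format."""
    sizes = {
        "json_pretty": len(json.dumps(data, indent=2).encode("utf-8")),
        # Assumption: "compact" means no indentation and no spaces after separators.
        "json_compact": len(json.dumps(data, separators=(",", ":")).encode("utf-8")),
    }
    if msgpack is not None:
        sizes["msgpack"] = len(msgpack.packb(data, use_bin_type=True))
    return sizes


if __name__ == "__main__":
    sizes = encoded_sizes(entity)
    baseline = sizes["json_compact"]
    for name, size in sizes.items():
        print(f"{name:13s} {size:4d} bytes  {size / baseline:.2f}x")
```

Actual ratios depend heavily on the shape of the data (key length, share of numeric versus string values), so measuring on representative entities is more reliable than the rule-of-thumb figures.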
Implementation approach
The change is localized to db_engine.py — swap json.dumps/json.loads with msgpack equivalents:
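A minimal sketch of what the swap could look like, assuming the `msgpack` package from PyPI. The helper names (`encode_json`, `encode_msgpack`, and so on) are hypothetical and not taken from db_engine.py.

```python
import json

try:
    import msgpack  # third-party package (pip install msgpack)
except ImportError:
    msgpack = None  # lets the sketch load where msgpack is not installed


def encode_json(data: dict) -> str:
    """Current behaviour as described above: compact JSON text."""
    return json.dumps(data, separators=(",", ":"))


def decode_json(raw: str) -> dict:
    return json.loads(raw)


def encode_msgpack(data: dict) -> bytes:
    """Proposed behaviour: msgpack bytes.

    use_bin_type=True keeps Python str and bytes distinct on the wire.
    """
    return msgpack.packb(data, use_bin_type=True)


def decode_msgpack(raw: bytes) -> dict:
    # raw=False decodes msgpack strings to Python str instead of bytes.
    return msgpack.unpackb(raw, raw=False)
```

Note that the encoded value changes from `str` to `bytes`, which is what drives the Storage interface change listed under Considerations.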
Additional optimization: strip redundant fields from storage
Since the storage key is {type_name}@{id}, the _type and _id fields inside the stored value are redundant. Stripping them at the save boundary and re-injecting on load saves ~15-20% on top of the format change:
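A minimal sketch of the strip and re-inject boundary, independent of the serialization format. The function names are hypothetical, and the sketch assumes that ids are strings and that type names never contain "@"; neither is confirmed by the library.

```python
REDUNDANT_FIELDS = ("_type", "_id")


def make_key(type_name: str, entity_id: str) -> str:
    """Build the storage key in the {type_name}@{id} form."""
    return f"{type_name}@{entity_id}"


def strip_for_storage(data: dict) -> dict:
    """Return a copy without the fields already encoded in the storage key."""
    return {k: v for k, v in data.items() if k not in REDUNDANT_FIELDS}


def restore_from_storage(key: str, stored: dict) -> dict:
    """Rebuild the full dict by re-injecting _type and _id parsed from the key."""
    # partition splits on the first "@"; assumes type names never contain "@".
    type_name, _, entity_id = key.partition("@")
    return {"_type": type_name, "_id": entity_id, **stored}
```

If ids can be integers, the re-injection step would need to restore the original type rather than the string parsed from the key.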
This is safe because _type and _id are only needed in the public API (serialize()/deserialize()), not in the stored representation. The _serialize_base() method already adds them when building the dict for external use.
Combined savings: ~45-55% reduction vs current compact JSON.
Alternatives considered
CBOR (Concise Binary Object Representation)
Similar size/speed to msgpack
Self-describing, RFC 8949
Used in parts of the IC ecosystem (HTTP interface)
No clear advantage over msgpack for this use case
Protocol Buffers
50-70% smaller than JSON (schema-based, no field names stored)
But requires a .proto schema definition and code generation
Since ic-python-db is schema-flexible (entities defined dynamically at runtime), you'd be forced into a generic map<string, Value> structure, which erases most of protobuf's size advantage
Would only pay off with a per-entity-type generated proto, requiring a code generation pipeline — massive architectural shift
Verdict: overkill for this library
User-configurable format
Considered and rejected. Reasons:
Serialization format is an internal storage detail, not a user-facing concern
Migration complexity explodes (format detection on every read, full data migration on switch)
Testing surface doubles
Users would always want "the fastest and smallest" — the library should own this decision
Better approach: version the storage format internally with a metadata key, keep JSON for public-facing APIs (dump_json, serialize, deserialize)
Design decisions
The library owns the internal format — users don't choose it
JSON stays for public APIs — dump_json(), serialize(), deserialize() remain JSON-based for human readability and interop
Version the storage format — store a metadata key indicating the format version so future changes can auto-migrate transparently
Migration path — on load, try msgpack first; if it fails, fall back to JSON (for pre-migration data). On save, always use the new format.
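The migration path can be sketched as a load function that tries msgpack and falls back to JSON. This is a sketch under assumptions: `save_value` and `load_value` are hypothetical names, legacy values are assumed to be JSON objects stored as text, and `msgpack` is the third-party PyPI package.

```python
import json

try:
    import msgpack  # third-party package (pip install msgpack)
except ImportError:
    msgpack = None  # the sketch still reads legacy JSON without it


def save_value(data: dict) -> bytes:
    """On save, always use the new format."""
    return msgpack.packb(data, use_bin_type=True)


def load_value(raw) -> dict:
    """On load, try msgpack first; fall back to JSON for pre-migration data."""
    if isinstance(raw, str):
        raw = raw.encode("utf-8")  # legacy values were stored as JSON text
    if msgpack is not None:
        try:
            value = msgpack.unpackb(raw, raw=False)
            if isinstance(value, dict):
                return value
        except Exception:
            # JSON object text is not a single valid msgpack object (the
            # leading "{" parses as an integer followed by extra bytes),
            # so fall through to the JSON path.
            pass
    return json.loads(raw.decode("utf-8"))
```

The fallback costs a failed msgpack parse on every legacy read. Once a stored format-version key indicates that all values have been rewritten, the JSON branch could be skipped entirely.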
Considerations
Storage interface change: Storage.insert/get currently use str values. With msgpack the values would be bytes. Need to update Storage ABC and MemoryStorage, or base64-encode (which eats into size savings).
WASI C extension: For maximum benefit, the msgpack C extension would need to be cross-compiled for wasm32-wasip1. Pure-Python msgpack may not outperform C-accelerated JSON.
Audit log: Also uses json.dumps — should be updated consistently.
Issue #1 (Fix redundant save on every Entity.load(), which eliminates an unnecessary json.dumps + storage write) should be fixed first (done).
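For the Storage interface change noted under Considerations, a bytes-valued interface might look like the sketch below. The method signatures are assumptions modelled on the `insert`/`get` names mentioned above, not the library's actual ABC. The last lines illustrate why base64 eats into the savings: it emits 4 output characters for every 3 input bytes.

```python
import base64
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Storage(ABC):
    """Hypothetical bytes-valued storage interface."""

    @abstractmethod
    def insert(self, key: str, value: bytes) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        ...


class MemoryStorage(Storage):
    """In-memory implementation backed by a plain dict."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def insert(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


# Alternative: keep str values and base64-encode the binary payload.
payload = bytes(range(60))                           # 60 bytes of binary data
as_text = base64.b64encode(payload).decode("ascii")  # 80 characters of text
overhead = len(as_text) / len(payload) - 1           # about 0.33
```

With a roughly 33% base64 penalty, a 30-40% msgpack saving would be largely cancelled out, which favours changing the interface to bytes.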