
PhaseBridge — an experimental framework for universal, lossless translation between discrete data and continuous phase space. It ensures strict round-trip guarantees and adds κ-based diagnostics, enabling reliable interchange for time series, images, and future modalities.


synqratech/phasebridge

PhaseBridge — Lossless Discrete ↔ Phase ↔ Discrete

PhaseBridge is a small, modality-agnostic interchange layer that moves data losslessly between a discrete space and a continuous phase representation — and back again. It standardizes a strict, reversible mapping:

  • uintM ↔ θ ∈ [0, 2π) via S¹/Sⁿ phase codecs (1-D and N-D),
  • optional complex carrier ψ (complex64/complex128),
  • a minimal PIF (Phase Interchange Format) schema,
  • clean SDK, CLI, and conformance tests.
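To picture the core S¹ mapping, place each symbol k ∈ {0, …, M−1} at angle θ = 2πk/M. You recover it by rounding θ·M/(2π) and reducing modulo M. The sketch below illustrates that assumption in plain NumPy. `s1_encode` and `s1_decode` are hypothetical helpers, not the PhaseBridge API, and the library's exact formula may differ.

```python
import numpy as np

def s1_encode(x: np.ndarray, M: int) -> np.ndarray:
    """Hypothetical helper: map symbols in [0, M-1] to phases in [0, 2*pi)."""
    x = np.asarray(x)
    if x.size and (x.min() < 0 or x.max() >= M):
        raise ValueError(f"Values out of range [0, {M - 1}]")
    return 2.0 * np.pi * x.astype(np.float64) / M

def s1_decode(theta: np.ndarray, M: int, dtype=np.uint32) -> np.ndarray:
    """Hypothetical helper: round each phase to the nearest lattice point, modulo M."""
    k = np.rint(np.asarray(theta, dtype=np.float64) * M / (2.0 * np.pi))
    # The modulo folds a phase that rounds up to M (i.e. ~2*pi) back onto symbol 0.
    return (k.astype(np.int64) % M).astype(dtype)

x = np.arange(256, dtype=np.uint8)
theta = s1_encode(x, 256)
x_back = s1_decode(theta, 256, dtype=np.uint8)
```

Rounding to the nearest lattice point is what makes the inverse exact. Any perturbation smaller than half the spacing 2π/M still decodes to the same symbol.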

Highlights

  • Lossless by default: when meta.note == "no_processing", decode(encode(x)) == x bit-exact for the declared alphabet(s).

  • Three primary containers: (1) eager θ, (2) encoded_uint + theta_lazy=True (lazy θ), (3) eager ψ.

  • Lazy phases: store encoded_uint and materialize θ on demand.

  • ND support: SNDPhaseCodec for (N, D) with per-dimension M.

  • Frontends: array-first adapters for audio, image, video, timeseries, text tokens, and graphs.

  • Streaming I/O: StreamPIFWriter/Reader (header/chunk/trailer, per-chunk hash, stream_hash, optional Merkle root).

  • Bundles: PIFBundle for scene/session manifests (multi-track, offsets).

  • Symmetric serializers: JSON, MessagePack, CBOR, and NPZ (binary).

  • Strict errors: unified custom exceptions for schema/structure/serialization/stream.

  • Amplitude broadcast: amp may be scalar or an array broadcastable to the primary shape (e.g., (N,) → (N,D)).
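NumPy broadcasting aligns trailing axes. An amplitude of shape (N,) therefore does not broadcast against (N, D) unless N == D, and even then it would vary along the wrong axis. The sketch below shows one way an (N,) → (N, D) expansion could work. It assumes a vector amplitude is aligned with the leading axis. `broadcast_amp` is a hypothetical helper, not PhaseBridge API.

```python
import numpy as np

def broadcast_amp(amp, shape):
    """Hypothetical helper: expand a scalar, (N,), or compatible amplitude to `shape`."""
    a = np.asarray(amp, dtype=np.float64)
    # Assumption: a vector whose length matches the leading axis is per-row, so
    # append unit axes, e.g. (N,) -> (N, 1), which NumPy then broadcasts to (N, D).
    if a.ndim and a.ndim < len(shape) and a.shape[0] == shape[0]:
        a = a.reshape(a.shape + (1,) * (len(shape) - a.ndim))
    return np.broadcast_to(a, shape)  # read-only view; no copy of the data

amp = broadcast_amp(np.array([1.0, 0.5, 2.0]), (3, 2))
amp_scalar = broadcast_amp(1.0, (4, 2))
```

When N == D, the leading-axis rule in this sketch wins over NumPy's default trailing-axis alignment. That ambiguity is exactly the case a real implementation has to pin down.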

MVP supports time series, images (L/RGB), audio, text tokens, and graph node features through frontends. Analytics such as κ live outside the core, in the separate phasebridge_ops package.


Install

Requires Python 3.10+.

# Core (SDK)
pip install -e .

# Optional serializers
pip install msgpack cbor2

# Optional frontends I/O
pip install soundfile pillow opencv-python pandas

# Analytics (κ metrics moved out of core)
pip install phasebridge_ops

Soft imports are used in frontends:

  • soundfile (WAV), Pillow (images), opencv-python (video), pandas (CSV). If missing, you’ll get a friendly ImportError with a pip install ... hint.

Quickstart

Python SDK (S¹)

import numpy as np
from phasebridge import S1PhaseCodec, PIF

x = np.arange(0, 256, dtype=np.uint8)
schema = {"alphabet": {"type": "uint", "M": 256}}

codec = S1PhaseCodec(M=256)
p = codec.encode(
    x, schema,
    lazy_theta=False,     # eager θ (default=False)
    prefer_float32=False, # θ dtype policy; float64 is default
)
xr = codec.decode(p)
assert np.array_equal(x, xr)  # lossless

# JSON round-trip
s = p.to_json(indent=2)
p2 = PIF.from_json(s, validate=True)
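Whether θ can be stored as float32 (prefer_float32=True) without breaking the round trip depends on M. The lattice spacing 2π/M must stay well above float32 rounding error near 2π. The brute-force check below makes that concrete. It assumes the same θ = 2πk/M mapping. `float32_roundtrip_ok` is a hypothetical helper and does not reproduce the library's actual dtype policy.

```python
import numpy as np

def float32_roundtrip_ok(M: int) -> bool:
    """Hypothetical check: does theta = 2*pi*k/M stored as float32 still decode to k?"""
    k = np.arange(M, dtype=np.int64)
    theta32 = (2.0 * np.pi * k / M).astype(np.float32)  # simulate float32 storage
    # Decode in float64 from the float32-stored phases, as a reader might.
    k_back = np.rint(theta32.astype(np.float64) * M / (2.0 * np.pi)).astype(np.int64) % M
    return bool(np.array_equal(k_back, k))

results = {M: float32_roundtrip_ok(M) for M in (256, 65536)}
```

The check is exhaustive over the alphabet, so it is only practical for moderate M. float32 halves eager-θ storage relative to float64, but the spacing shrinks as M grows and eventually falls below float32 resolution.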

ND codec (Sⁿ), lazy θ, and ψ carrier

import numpy as np
from phasebridge import SNDPhaseCodec, ComplexCodec

# ND discrete symbols: shape (N, D), per-dimension Ms
X = np.array([[0,1],[2,3],[4,5]], dtype=np.uint32)
Ms = [8, 16]
schema = {
    "alphabet": {"type":"uint", "M": Ms},
    "structure": {"shape": [3, 2], "axes_roles": ["time","channel"]},
}

# Lazy θ (stores encoded_uint)
snd = SNDPhaseCodec(Ms)
p_lazy = snd.encode(X, schema, lazy_theta=True)
X_back = snd.decode(p_lazy)  # lossless

# Complex carrier ψ (primary); numeric policy via prefer_complex64
cc = ComplexCodec(M=256)
p_psi = cc.encode(
    np.arange(8, dtype=np.uint32),            # S¹
    {"alphabet": {"type":"uint","M":256}},
    store="psi",                              # ψ primary
    prefer_complex64=True,                    # complex64 if safe; else complex128
)
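Two ideas from this example can be sketched in plain NumPy:

- An Sⁿ codec can treat each column d independently with its own M_d.
- A ψ carrier can store ψ = amp·e^{iθ}. θ is then recovered with an angle computation folded into [0, 2π).

All helper names below are hypothetical. The unit-amplitude ψ = exp(iθ) form is an assumption, not taken from the PIF spec.

```python
import numpy as np

def snd_encode(X: np.ndarray, Ms) -> np.ndarray:
    """Hypothetical: per-dimension phases theta[:, d] = 2*pi*X[:, d] / Ms[d]."""
    Ms = np.asarray(Ms, dtype=np.float64)  # shape (D,) broadcasts over the N rows
    return 2.0 * np.pi * X.astype(np.float64) / Ms

def snd_decode(theta: np.ndarray, Ms) -> np.ndarray:
    Ms_i = np.asarray(Ms, dtype=np.int64)
    k = np.rint(theta * Ms_i / (2.0 * np.pi)).astype(np.int64)
    return (k % Ms_i).astype(np.uint32)

def psi_from_theta(theta, amp=1.0, dtype=np.complex64):
    return (amp * np.exp(1j * theta)).astype(dtype)

def theta_from_psi(psi):
    # np.angle returns values in (-pi, pi]; fold them into [0, 2*pi).
    return np.mod(np.angle(psi.astype(np.complex128)), 2.0 * np.pi)

X = np.array([[0, 1], [2, 3], [4, 5]], dtype=np.uint32)
Ms = [8, 16]
theta = snd_encode(X, Ms)
X_back = snd_decode(theta, Ms)

psi = psi_from_theta(np.arange(8) * 2.0 * np.pi / 256)  # S¹ symbols 0..7, M=256
k = np.rint(theta_from_psi(psi) * 256 / (2.0 * np.pi)).astype(np.int64) % 256
```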

Frontends (array-first)

import numpy as np
from phasebridge.frontends import (
    audio_array_to_pif, image_array_to_pif, timeseries_array_to_pif,
    tokens_to_pif, graph_to_pif
)

# Audio: (N,) float in [-1,1] → quantize + PIF (lazy θ)
sr = 16000.0
y = np.sin(2*np.pi*440*np.arange(32000)/sr).astype(np.float32)
p_audio = audio_array_to_pif(y, sr, M=256, timeline_id="TL-A", t0=0.0)

# Image: (H,W,C) uint → PIF
img = np.random.randint(0,256,(64,64,3),dtype=np.uint8)
p_img = image_array_to_pif(img, M_per_channel=(256,256,256))

# Timeseries: (N,C) float → per-channel linear quantization → PIF
X = np.random.randn(1000, 3)
p_ts = timeseries_array_to_pif(X, M_per_channel=[16,32,64], fs=100.0, timeline_id="TL-TS")

# Text tokens: list[int] → PIF
p_txt = tokens_to_pif([1,5,2,9,3], M=64, timeline_id="TL-TEXT", vocab_ref="my_vocab.json")

# Graph: node discrete features + edge list → PIF
node_feat = np.array([[1,10],[2,9],[3,8],[4,7]], dtype=np.uint32)  # (N, D)
edges = [(0,1),(1,2),(2,3)]
p_graph = graph_to_pif(node_feat, edges, directed=False, node_M=[16,32])
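The float frontends must quantize before encoding, and that step is where information is discarded. The lossless contract then applies to the resulting integer array, not to the original floats. The sketch below shows one common choice: per-channel min/max linear quantization onto M_c levels. Whether `timeseries_array_to_pif` uses min/max scaling, and where it records the scale, are assumptions. `quantize_per_channel` and `dequantize_per_channel` are hypothetical.

```python
import numpy as np

def quantize_per_channel(X: np.ndarray, M_per_channel):
    """Hypothetical: linearly map each column of X (N, C) onto integers 0..M_c-1."""
    X = np.asarray(X, dtype=np.float64)
    M = np.asarray(M_per_channel, dtype=np.int64)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on flat channels
    q = np.rint((X - lo) / span * (M - 1)).astype(np.int64)
    q = np.clip(q, 0, M - 1).astype(np.uint32)
    return q, lo, span  # lo/span would have to travel in metadata to dequantize

def dequantize_per_channel(q, lo, span, M_per_channel):
    M = np.asarray(M_per_channel, dtype=np.float64)
    return lo + q.astype(np.float64) / (M - 1) * span

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
q, lo, span = quantize_per_channel(X, [16, 32, 64])
X_hat = dequantize_per_channel(q, lo, span, [16, 32, 64])
```

With this scheme, the reconstruction error per channel is at most half a quantization step, span / (M_c − 1) / 2.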

Streaming I/O

from phasebridge import StreamPIFWriter, StreamPIFReader

w = StreamPIFWriter(schema=p_audio.schema, meta=p_audio.meta, numeric=p_audio.numeric,
                    structure=p_audio.schema.get("structure"), fmt="msgpack")
frames = [w.begin()]
enc = p_audio.encoded_uint  # lazy θ payload
split = len(enc)//2
frames.append(w.add_chunk(enc[:split]))
frames.append(w.add_chunk(enc[split:]))
frames.append(w.end())

r = StreamPIFReader.from_stream(frames, fmt="msgpack")
for _ in r:  # consume all frames before checking verification
    pass
assert r.verified
p_reassembled = r.assemble_pif()
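The writer/reader pair records three kinds of integrity data: a hash per chunk, a stream_hash, and optionally a Merkle root. The sketch below shows one way such values could be computed with SHA-256 from `hashlib`. Everything here is an assumption for illustration, not the PhaseBridge wire format: the hash function, the byte encoding of chunks, and the pairing rule (duplicate the last node on odd levels).

```python
import hashlib

def chunk_hashes(chunks: list[bytes]) -> list[bytes]:
    return [hashlib.sha256(c).digest() for c in chunks]

def stream_hash(hashes: list[bytes]) -> str:
    # Hash the ordered concatenation of chunk hashes, so reordering changes the result.
    return hashlib.sha256(b"".join(hashes)).hexdigest()

def merkle_root(hashes: list[bytes]) -> str:
    if not hashes:
        return hashlib.sha256(b"").hexdigest()
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()

chunks = [bytes(range(0, 128)), bytes(range(128, 256))]
h = chunk_hashes(chunks)
trailer = {"stream_hash": stream_hash(h), "merkle_root": merkle_root(h)}
```

A reader that recomputes these values can detect both reordering and tampering. The Merkle root also allows proving a single chunk's membership without rehashing the whole stream.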

Bundles (scene/session manifest)

from phasebridge import PIFBundle, BundleItem

b = PIFBundle(
    bundle_id="B1",
    session_id="S123",
    timeline_id="TL-UNIVERSAL",
    items=[
      BundleItem(ref={"uri": "memory://audio"}, role="audio", offset_t=0.0),
      BundleItem(ref={"uri": "memory://video"}, role="video", offset_t=0.0),
    ],
    meta={"project": "demo"},
)
b_bytes = b.to_bytes("msgpack")
b2 = PIFBundle.from_bytes(b_bytes, "msgpack")
assert isinstance(b2.items[0], BundleItem)
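A bundle is essentially a manifest that points at tracks and places them on a shared timeline via offsets. The dataclass sketch below mirrors the fields used in the example: bundle_id, session_id, timeline_id, items with ref/role/offset_t, and meta. It round-trips them through JSON to illustrate the symmetric-serializer contract. `Bundle` and `Item` are hypothetical stand-ins, not the real `PIFBundle`/`BundleItem`.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Item:  # hypothetical stand-in for BundleItem
    ref: dict
    role: str
    offset_t: float = 0.0

@dataclass
class Bundle:  # hypothetical stand-in for PIFBundle
    bundle_id: str
    session_id: str
    timeline_id: str
    items: list[Item] = field(default_factory=list)
    meta: dict = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)  # asdict recurses into items

    @classmethod
    def from_json(cls, s: str) -> "Bundle":
        d = json.loads(s)
        # Rebuild nested items so the round trip yields the same types, not plain dicts.
        d["items"] = [Item(**it) for it in d["items"]]
        return cls(**d)

b = Bundle("B1", "S123", "TL-UNIVERSAL",
           items=[Item({"uri": "memory://audio"}, "audio"),
                  Item({"uri": "memory://video"}, "video", offset_t=0.0)],
           meta={"project": "demo"})
b2 = Bundle.from_json(b.to_json())
```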

CLI Tools

Four commands ship with the SDK:

  • pb-encode — raw (bin/csv/npy) → PIF (json/msgpack/cbor/npz)
  • pb-decode — PIF (json/msgpack/cbor/npz) → raw (bin/csv/npy)
  • pb-kappa — compute κ (global/windowed) from PIF (requires phasebridge_ops)
  • pb-validate — schema/runtime validation + decode + hash checks (+ optional raw compare)

Examples:

# Encode raw bytes → PIF JSON
cat data.bin | pb-encode --in - --in-fmt bin --dtype uint8 --M 256 > out.pif.json

# Decode PIF → CSV (S¹-only tool)
pb-decode --in out.pif.cbor --pif-fmt cbor --out recon.csv --out-fmt csv

# Kappa (windowed) – needs `phasebridge_ops`
pb-kappa --in out.pif.json --win 512 --hop 256 --fmt csv > kappa.csv

# Validate (schema+runtime+decode+hash; compare to raw)
pb-validate --in out.pif.npz --pif-fmt npz --raw data.bin --in-fmt bin --dtype uint8 --report text

pb-decode is S¹-only by design (scalar M). For Sⁿ (M=[...]) use the SDK (e.g., SNDPhaseCodec) or pb-validate to inspect. MessagePack/CBOR require msgpack / cbor2. NPZ uses a zip container with JSON + .npy parts. pb-kappa dynamically imports κ from phasebridge_ops and prints a friendly hint if not installed.
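The NPZ container is a zip file holding a JSON manifest plus .npy arrays. The sketch below reproduces that kind of layout using only `zipfile`, `json`, and `np.save`/`np.load`. The member names `manifest.json` and `encoded_uint.npy` and the helper names are assumptions, not the actual PIF NPZ layout.

```python
import io
import json
import zipfile
import numpy as np

def write_container(manifest: dict, arrays: dict[str, np.ndarray]) -> bytes:
    """Hypothetical writer: JSON manifest plus one .npy member per array."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps(manifest, sort_keys=True))
        for name, arr in arrays.items():
            part = io.BytesIO()
            np.save(part, arr, allow_pickle=False)  # .npy preserves dtype and shape exactly
            zf.writestr(f"{name}.npy", part.getvalue())
    return buf.getvalue()

def read_container(data: bytes):
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        arrays = {
            n[:-4]: np.load(io.BytesIO(zf.read(n)), allow_pickle=False)
            for n in zf.namelist() if n.endswith(".npy")
        }
    return manifest, arrays

enc = np.arange(256, dtype=np.uint8)
blob = write_container({"schema": {"alphabet": {"type": "uint", "M": 256}}},
                       {"encoded_uint": enc})
manifest, arrays = read_container(blob)
```

Storing arrays as .npy rather than inside the JSON keeps exact dtypes and avoids text round-tripping of floats. This matters for a bit-exact contract.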


PIF v1 (Core) — What We Guarantee

  • Schema: schema.alphabet.type="uint"; schema.alphabet.M is either a scalar or a vector with one entry per dimension. If structure.axes_roles contains "time", sampling.fs is required (timeline_id and t0 are also supported).

  • Primary container (choose exactly one):

    • eager theta (float32/float64)
    • lazy θ with theta_lazy=True + encoded_uint (unsigned)
    • eager psi (complex64/complex128; {re, im} in JSON)
  • Amplitude: scalar 1.0 or array broadcastable to the primary shape (e.g., (N,) → (N,D)).

  • Lossless round-trip: when meta.note == "no_processing", decoding reproduces the original discrete array (bit-exact for uintM).

  • Serialization: JSON (normative text), MessagePack/CBOR/NPZ (binary). All are symmetric (same manifest in/out).

  • Errors: schema/structure/serialization/stream use dedicated exceptions: SchemaError, StructureError, SerializationError, UnsupportedFormatError, StreamError, PhaseFormatError, etc.
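Several of the guarantees above are structural rules that a validator can check directly on a manifest. The sketch below checks a subset of them on a plain dict. It is not the JSON Schema in schemas/, and two locations are assumptions: the "payload" key that holds the containers, and the placement of sampling inside the schema. `check_pif_core` is hypothetical and raises ValueError rather than the library's SchemaError.

```python
def check_pif_core(pif: dict) -> None:
    """Hypothetical checker for a few PIF v1 core rules on a plain-dict manifest."""
    schema = pif.get("schema", {})
    alphabet = schema.get("alphabet", {})
    if alphabet.get("type") != "uint":
        raise ValueError('schema.alphabet.type must be "uint"')
    M = alphabet.get("M")
    Ms = M if isinstance(M, list) else [M]
    if not Ms or not all(isinstance(m, int) and m >= 1 for m in Ms):
        raise ValueError("schema.alphabet.M must be a positive int or a list of them")
    roles = schema.get("structure", {}).get("axes_roles", [])
    # Assumption: sampling sits next to structure inside the schema.
    if "time" in roles and "fs" not in schema.get("sampling", {}):
        raise ValueError('sampling.fs is required when axes_roles contains "time"')
    # Assumption: primary containers live under a "payload" key.
    payload = pif.get("payload", {})
    containers = [
        "theta" in payload,
        bool(payload.get("theta_lazy")) and "encoded_uint" in payload,
        "psi" in payload,
    ]
    if sum(containers) != 1:
        raise ValueError("exactly one primary container is required: theta, lazy theta, or psi")

example = {
    "schema": {
        "alphabet": {"type": "uint", "M": [8, 16]},
        "structure": {"shape": [3, 2], "axes_roles": ["time", "channel"]},
        "sampling": {"fs": 100.0},
    },
    "payload": {"theta_lazy": True, "encoded_uint": [0, 1, 2, 3, 4, 5]},
}
check_pif_core(example)  # passes silently
```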


Repo Layout

phasebridge/
├─ src/phasebridge/
│  ├─ pif.py                # PIF object, (de)serialization, lazy θ/ψ views
│  ├─ codec_s1.py           # S¹ codec (uintM ↔ θ)
│  ├─ codec_sN.py           # Sⁿ codec (ND)
│  ├─ codec_complex.py      # ψ carrier
│  ├─ stream.py             # streaming writer/reader (hashes, stream_hash, Merkle)
│  ├─ bundle.py             # PIFBundle (scene/session manifest)
│  ├─ frontends/            # audio/image/video/timeseries/text/graph adapters
│  ├─ utils.py              # hashing, dtype policy, ND helpers, packing
│  ├─ errors.py             # custom exception types
│  └─ ...
├─ cli/                     # pb-encode / pb-decode / pb-kappa / pb-validate
├─ schemas/                 # JSON Schema (PIF v1 core) + YAML mirror
├─ tests/                   # unit tests for round-trip, schema, frontends, stream, bundles
└─ docs/                    # overview, spec, CLI usage

Testing

pytest -q

Covers strict round-trip (S¹/Sⁿ), lazy θ, ψ payloads, schema/runtime validation, stream assembly/verification (reorder/tamper checks), bundles, and frontends (with soft deps skipped when missing).


Troubleshooting

  • “Values out of range …” on encode: the input contains values outside [0, M-1]. Fix the input or select the correct M. (The strict codec enforces the range by default.)
  • “schema.structure mismatch …”: ensure structure.shape matches the payload length (or the product of its dimensions) and that axes_roles has the same length as shape. If there is a "time" role, set sampling.fs (and often timeline_id and t0).
  • Missing optional dependency: install the hinted package (e.g., pip install pillow).
  • Unknown serializer: the supported formats are json, msgpack, cbor, and npz.
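The first two messages come from simple pre-encode checks, which you can reproduce before calling the codec to locate the offending element. The sketch below mirrors those checks. `check_payload` is hypothetical, its messages only imitate the ones quoted above, and a list M is assumed to apply along the last axis.

```python
import math
import numpy as np

def check_payload(x, M, shape, axes_roles) -> None:
    """Hypothetical pre-encode checks mirroring the troubleshooting messages above."""
    x = np.asarray(x)
    if len(axes_roles) != len(shape):
        raise ValueError("schema.structure mismatch: axes_roles length != shape length")
    if x.size != math.prod(shape):
        raise ValueError("schema.structure mismatch: payload size != product of shape")
    x = x.reshape(shape)
    # A scalar M applies to every element; a list gives one M per last-axis dimension.
    M_arr = np.asarray(M, dtype=np.int64)
    bad = (x < 0) | (x >= M_arr)
    if np.any(bad):
        idx = tuple(int(i) for i in np.argwhere(bad)[0])
        raise ValueError(f"Values out of range at index {idx}: expected [0, M-1]")

X = np.array([[0, 1], [2, 3], [4, 5]], dtype=np.uint32)
check_payload(X, [8, 16], [3, 2], ["time", "channel"])  # passes silently
```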

Roadmap

  • More frontends/adapters (tables, logs/events, multi-channel A/V).
  • Sidecar service (HTTP/gRPC), streaming connectors.
  • Extended analytics in a separate package (core stays a reversible carrier).
  • Language bindings (C++/Rust/Go) using the same wire format.

Contributing

  • Keep the lossless contract sacred.
  • Prefer explicit schema, clear errors, symmetric serializers.
  • Run tests locally before PRs.

License

See LICENSE in the repository.


Tip: start with docs/overview.md, then docs/pif_format.md (core spec), and the CLI recipes in docs/cli_usage.md.
