All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.1.1 — 2026-05-01
This release rolls up all post-v0.1.0 security, correctness, and CI work:
external pentest remediation (5 findings, 2 HIGH), Audit #8 (21 findings),
Thrift Correctness Phase (47 new tests), fixes for 73 of 92 enterprise
compliance gaps, EventBus and FeatureReader performance improvements, local KMS key
store for on-premise deployments, and Windows MSVC + Ubuntu CI green across
17 jobs. Unit tests grew from 779 to 830, all passing. No public C++ API breaks; the
Rust `ParquetReader::schema()` return type changed from `Schema` to
`SchemaRef<'a>` to fix CWE-416 (see the migration note below).
- Rust: `ParquetReader::schema()` now returns `SchemaRef<'a>` (lifetime bound to the reader). Existing `let s = reader.schema();` continues to compile; storing the schema beyond the reader's lifetime now fails at compile time, as intended.
2026-03-30 — External Pentest Remediation (Strix.ai)
Five external pentest findings fully remediated. 826 → 830 unit tests. Zero open vulnerabilities.
- rust/signet-forge/src/schema.rs + reader.rs + lib.rs: `ParquetReader::schema()` returned a raw FFI pointer as an unconstrained `Schema` value, so safe Rust callers could retain it after the reader was dropped, causing use-after-free (CWE-416, CVSS 8.4). Fixed with a `SchemaRef<'a>` lifetime-bound wrapper type; `PhantomData<&'a ()>` ties the borrow to the reader's lifetime, making misuse a compile-time error at zero runtime cost. `SchemaRef` re-exported at crate root.
- include/signet/reader.hpp: `DELTA_BINARY_PACKED` and `BYTE_STREAM_SPLIT` column decoders silently returned truncated or empty vectors when the decoded value count mismatched the page `num_values` field, causing invisible data corruption with no error (CWE-130, CVSS 6.5). Fixed by checking for `decoded.size() != count` after every decode call and returning a `CORRUPT_PAGE` error on mismatch; applies to the `int32`, `int64`, `float`, and `double` paths.
- include/signet/mmap_reader.hpp: `MmapParquetReader` could receive `SIGBUS` and crash when a file was truncated after the mapping was established, because any subsequent read from the invalidated region raises SIGBUS (CWE-367, CVSS 5.5). Fixed by copying header bytes (4 KB window) and page payload into `std::vector` owned memory before any parsing or CRC operations. Removed volatile pre-fault byte reads in `open()` that intentionally touched mapped pages (a SIGBUS trigger under concurrent truncation).
- include/signet/error.hpp: Evaluation license counters could be reset by rotating the `SIGNET_COMMERCIAL_USAGE_FILE` env var to a fresh writable path across process restarts; a missing file silently initialized fresh counters, bypassing the evaluation limit (CWE-284, CVSS 3.3). Fixed by removing the env-var override path entirely: `usage_state_path()` now unconditionally returns `default_usage_state_path()`. Removed now-unused `<climits>` and `<filesystem>` includes.
| Finding | Severity | CWE | CVSS | File |
|---|---|---|---|---|
| F1/F2: Rust UAF in `schema()` | HIGH | CWE-416 | 8.4 / 7.3 | schema.rs, reader.rs, lib.rs |
| F3: DELTA/BSS silent corruption | MEDIUM | CWE-130 | 6.5 | reader.hpp |
| F4: mmap SIGBUS on truncation | MEDIUM | CWE-367 | 5.5 | mmap_reader.hpp |
| F5: License counter reset | LOW | CWE-284 | 3.3 | error.hpp |
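
The F3 remediation (compare the decoded value count with the page's `num_values` and fail loudly on mismatch) can be sketched as follows. This is an illustrative sketch only: `DecodeStatus`, `decode_values`, and `decode_page_checked` are hypothetical names, not the library's actual decoder API.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical status type standing in for the CORRUPT_PAGE error named above.
enum class DecodeStatus { Ok, CorruptPage };

// Hypothetical stand-in for a page decoder: like the pre-fix decoders, it may
// return fewer values than requested when the payload is short.
inline std::vector<int32_t> decode_values(const std::vector<int32_t>& payload,
                                          std::size_t requested) {
    std::size_t n = requested < payload.size() ? requested : payload.size();
    return std::vector<int32_t>(payload.begin(),
                                payload.begin() + static_cast<std::ptrdiff_t>(n));
}

// Decode, then compare the decoded count with the page header's num_values.
// On mismatch the output is left untouched and an error is returned.
inline DecodeStatus decode_page_checked(const std::vector<int32_t>& payload,
                                        std::size_t num_values,
                                        std::vector<int32_t>& out) {
    std::vector<int32_t> decoded = decode_values(payload, num_values);
    if (decoded.size() != num_values) {
        return DecodeStatus::CorruptPage;  // never hand back a short vector
    }
    out = std::move(decoded);
    return DecodeStatus::Ok;
}
```

The point of the design is that a short decode can no longer be mistaken for a valid, smaller column.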
2026-03-10 — Audit #8 + Performance
- EventBus: Replace mutex-guarded `shared_ptr<StreamingSink>` with `std::atomic_load/store`; the `publish()` hot path is now lock-free (~53 ns, down from ~94 ns)
- FeatureReader: Add single-entry row group cache; consecutive point queries to the same row group reuse decoded columns instead of re-decoding (`get()` ~0.14 μs cached, `as_of_batch(100)` ~19 μs)
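
A minimal sketch of the snapshot pattern behind the EventBus change, assuming a simplified bus and sink. `Bus` and `Sink` are hypothetical stand-ins, not the real `EventBus`/`StreamingSink` interfaces. Note that the `std::atomic_load`/`std::atomic_store` overloads for `std::shared_ptr` are deprecated since C++20, and whether they are lock-free internally is implementation-defined.

```cpp
#include <atomic>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sink; it is not itself thread-safe, which is fine for a
// single-threaded demonstration of the pointer hand-off.
struct Sink {
    std::vector<std::string> records;
    void submit(const std::string& r) { records.push_back(r); }
};

class Bus {
public:
    // Attach a sink, or detach by passing nullptr, without taking a mutex.
    void set_sink(std::shared_ptr<Sink> s) {
        std::atomic_store(&sink_, std::move(s));
    }

    // Hot path: take a snapshot of the pointer, then use the snapshot.
    // The snapshot keeps the sink alive even if it is detached concurrently.
    bool publish(const std::string& record) {
        std::shared_ptr<Sink> snapshot = std::atomic_load(&sink_);
        if (!snapshot) return false;
        snapshot->submit(record);
        return true;
    }

private:
    std::shared_ptr<Sink> sink_;
};
```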
21 findings (3 pre-existing + 18 new), all remediated. Zero known open vulnerabilities.
- examples/ticks_import.cpp, ticks_wal_stream.cpp: Shell metacharacter validation before `popen()`; rejects `' " \ \$ | ; & ( ) < > \n \r \0` in input paths (CWE-78)
- python/_bindings.cpp: Fixed use-after-free in all 4 numeric column readers; `py::array_t` now allocated as Python-owned buffer with `memcpy` from decoded vector (CWE-416)
- column_index.hpp: Type-aware boundary order detection using `PhysicalType` parameter; fixes predicate pushdown for signed INT32/INT64 columns (CWE-843)
- ai/feature_reader.hpp: Added `mutable std::mutex rg_cache_mutex_` protecting row group cache + custom move constructor/assignment locking source mutex (CWE-362)
- thrift/types.hpp: Added `count < 0` guard to 4 list count checks preventing negative-to-massive-size_t allocation (CWE-400)
- thrift/compact.hpp: `write_list_header` throws `std::invalid_argument` on negative size instead of silent return (CWE-754)
- column_index.hpp: `OffsetIndex::deserialize()` sets `valid_ = false` before early return (CWE-754)
- ai/inference_log.hpp: Reordered underflow guard: `offset > size || emb_count > (size - offset) / sizeof(float)` (CWE-191)
- wasm/signet_wasm.cpp: `writeFileToMemfs` returns `bool` (was `void`); JSON `readStr()` handles backslash escapes (CWE-252, CWE-20)
- crypto/cipher_interface.hpp: CRNGT partial-block match now throws instead of silent skip (FIPS 140-3 §4.9.2)
- bloom/split_block.hpp: Block index bounds assertion in `insert()` and `might_contain()` (CWE-125)
- writer.hpp: `build_column_index()` threads `schema_.column(c).physical_type` for type-correct boundary order
- examples/ticks_import.cpp: Replaced `catch(...)` with typed `catch(const std::exception& e)` (CWE-755)
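
The `count < 0` guard for Thrift list counts (CWE-400) addresses a signed-to-unsigned conversion hazard: a negative 32-bit count cast to `size_t` becomes an enormous allocation request. A minimal sketch, assuming a hypothetical helper `checked_list_count` and an illustrative cap `kMaxListCount` (the real limit used in thrift/types.hpp is not stated here):

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Assumed cap, for illustration only.
constexpr int32_t kMaxListCount = 1'000'000;

// Convert a wire-format signed list count to size_t, or reject it.
inline std::optional<std::size_t> checked_list_count(int32_t count) {
    if (count < 0) return std::nullopt;              // -1 would become SIZE_MAX
    if (count > kMaxListCount) return std::nullopt;  // oversize allocation request
    return static_cast<std::size_t>(count);
}
```

A caller would only `resize()` or `reserve()` with the returned value when the optional is engaged.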
- error.hpp: Strengthen `usage_state_path()` with 6-layer validation: absolute-path-only, realpath canonicalization, is_directory parent check, null byte rejection, path traversal rejection, post-canonicalization recheck
- wal.hpp: POSIX `open(0600)` + `fdopen()` for CWE-732 world-writable file prevention (3 locations)
- CodeQL: All 8 code scanning alerts resolved (5 fixed in code, 3 dismissed with documented justification)
- Updated all benchmark figures across README.md, docs/BENCHMARKS.md, COMPARISON.md, PRODUCT_OVERVIEW.md to reflect measured values
- WalMmapWriter: corrected from projected ~38 ns to measured ~223 ns
- Audit #8 documentation: Comprehensive internal audit report (docs/internal/AUDIT_8_DOCUMENTATION.md) with CWE references, CVSS scores, risk analysis, and cross-references to NIST/OWASP/CERT/RFC publications
- Updated 10 documentation files with Audit #8 findings: internal architecture docs (encryption, thread model, AI subsystem, encoding codecs), product-knowledge docs (security hardening, cross-language bindings), and client-facing docs (SECURITY.md, CHANGELOG.md, QUALITY_ASSURANCE.md)
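
The POSIX `open(0600)` + `fdopen()` hardening listed for wal.hpp can be sketched as below. This is a POSIX-only illustration; `open_private_append` is a hypothetical helper name and the flags chosen (append mode) are an assumption, not the actual wal.hpp code.

```cpp
#include <cstdio>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Open (or create) a file for appending with owner-only permissions, then
// wrap the descriptor in a FILE* for buffered writes.
inline std::FILE* open_private_append(const char* path) {
    // Mode 0600: read/write for the owner, nothing for group or others.
    int fd = ::open(path, O_WRONLY | O_CREAT | O_APPEND, S_IRUSR | S_IWUSR);
    if (fd < 0) return nullptr;
    std::FILE* fp = ::fdopen(fd, "ab");
    if (fp == nullptr) ::close(fd);  // fdopen failed: do not leak the descriptor
    return fp;
}
```

Creating the file through `open()` with an explicit mode avoids relying on the process umask, which is what a plain `fopen()` would do.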
2026-03-09 — Enterprise Compliance & Hardening
Nine compliance gap-fix passes resolving 73 of 92 enterprise regulatory gaps. 566 unit tests (100% passing). Covers FIPS 140-3, EU AI Act, MiFID II, GDPR, DORA, and Parquet PME spec.
- CodeQL SAST workflow with `security-extended` query suite (T-1)
- CycloneDX SBOM generation via `anchore/syft` (T-2)
- Full 554-test commercial tier coverage in CI (T-15)
- Sanitizer coverage for crypto code paths (T-15b)
- GCM invocation counter with 2^32 key rotation trigger (C-3)
- UTC `system_clock` traceability for all timestamps (R-5)
- MiFID II RTS 24 mandatory fields: buy_sell_indicator, order_type, time_in_force, ISIN, currency, short_selling_flag (R-4)
- Log retention lifecycle API with configurable policies (R-1)
- EU AI Act Art.13 transparency model card fields (R-2)
- Human oversight API: `HumanOverrideRecord`, override tracking, override rate monitoring (R-3)
- Art.15 accuracy metrics: PSI/KS-test drift detection, bias monitoring (R-3b)
- Full NIST SP 800-38D test vector suite — 18 test cases (C-2/T-4)
- Crypto fuzz harnesses: AES-GCM, PME, key metadata (T-3)
- PME dictionary page + page header encryption (P-1, P-2)
- AAD binary format per PME spec (P-4)
- Thrift-serialized key metadata replacing custom TLV (P-6)
- PME negative security tests: AAD mismatch, key confusion, page reorder (P-9)
- GCM tag truncation rejection (C-6)
- AAD length limit enforcement per SP 800-38D (C-12)
- Column key O(1) cache with `unordered_map` (P-8)
- NIST SP 800-38A CTR test vectors (P-10)
- Power-on crypto self-tests / KATs (C-9)
- LEI, ISIN, MIC code validation per ISO 17442/6166/10383 (R-12/R-12b/R-12c)
- Kill switch / circuit breaker API (R-10)
- HKDF key derivation per RFC 5869 (C-7/C-8)
- Signed plaintext footer with HMAC-SHA256 + HKDF-derived keys (P-3)
- KMS client interface: `IKmsClient` abstract class for DEK/KEK key wrapping (P-5)
- `SecureKeyBuffer` RAII with mlock/munlock + secure zeroization (C-11)
- Crypto-shredding for GDPR Art.17 right-to-erasure (G-1)
- PII data classification: 6-level `DataClassification` enum (G-2)
- Pre-trade risk checks per MiFID II RTS 6 Art. 17 (R-11)
- ICT asset identification/classification per DORA Art. 7-8 (D-6)
- Pseudonymizer utility: HMAC-SHA256 deterministic keyed hashing (G-5)
- GDPR writer policy: enforcement of encryption for PII columns (G-7)
- Records of Processing Activities (ROPA) per GDPR Art. 30 (G-3)
- Data retention / TTL with legal hold support per GDPR Art. 5(1)(e) (G-4)
- Backup policy / RPO tracking per DORA Art. 12 (D-3)
- Key rotation lifecycle management per DORA Art. 9(2) (D-11)
- Continuous RNG test (CRNGT) per FIPS 140-3 §4.9.2 (C-13)
- ICT incident management: `ICTIncidentRecord`, severity scoring per DORA Art. 10/15/19 (D-1)
- Resilience testing: `ResilienceTestRecord` per DORA Art. 24-27 (D-2)
- Third-party risk: `ThirdPartyRiskEntry` per DORA Art. 28-30 (D-4)
- ICT risk management: `ICTRiskEntry` per DORA Art. 5-6 (D-5)
- Anomaly detection: `AnomalyRecord` with 6 categories per DORA Art. 10 (D-7)
- Recovery procedures: `RecoveryProcedure` per DORA Art. 11 (D-8)
- Post-incident review: `PostIncidentReview` per DORA Art. 13 (D-9)
- ICT notification: `ICTNotification` per DORA Art. 14 (D-10)
- DPIA record per GDPR Art. 35 (G-6)
- Subject data query/response for DSAR per GDPR Art. 15 (G-8)
- Performance/drift metrics per EU AI Act Art. 15 (R-6)
- AI risk assessment per EU AI Act Art. 9 (R-7)
- Technical documentation per EU AI Act Art. 11/Annex IV (R-8)
- QMS checkpoints per EU AI Act Art. 17 (R-9)
- Report integrity / signed reports per MiFID II RTS 24 Art. 4 (R-13/R-13b)
- Completeness attestation with gap detection per RTS 24 Art. 9 (R-13c)
- Annual self-assessment per MiFID II Art. 17(2) (R-14)
- Training data metrics per EU AI Act Art. 10 (R-15)
- Lifecycle event logging per EU AI Act Art. 12(2) (R-15b)
- Post-market monitoring per EU AI Act Art. 61 (R-16)
- Order lifecycle linking with `parent_order_id` per RTS 24 Art. 9 (R-17)
- Serious incident reporting per EU AI Act Art. 62 (R-18)
- Source file manifest in reports (R-18b)
- Algorithm deprecation framework per NIST SP 800-131A (C-4)
- `INTERNAL` key mode production gate per FIPS 140-3 §7.7 (C-15)
- Key rotation request/result API per PCI-DSS/HIPAA/SOX (T-7)
- AES-256-only design decision documented per NIST SP 800-131A Rev.2 §4: post-quantum safety (Grover's algorithm), single key size eliminates CWE-326 (C-14)
- NIST SP 800-38D Test Case 15: AES-256-GCM with 64-byte plaintext and independent ciphertext + tag verification (C-16)
- 7 Wycheproof X25519 edge-case tests: valid exchange, RFC 7748 §6.1 vector, RFC 8037 with manual scalar clamping, low-order point rejection, non-canonical u-coordinates, twist points (C-18)
- NIST SP 800-227 updated from draft to Final (Sep 2025) across all source files and documentation (C-19)
- 4 Wycheproof AES-256-GCM edge-case tests: empty plaintext + AAD (tcId 92), ciphertext verification (tcId 97), modified tag rejection (16 flips + boundary values), tampered ciphertext detection (T-5)
- IV uniqueness verification: consecutive CSPRNG-generated nonces proven distinct, same plaintext with different IVs produces different ciphertext — validates GCM IV non-reuse guarantee (T-18)
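
As an illustration of the ISIN validation item (R-12b), the ISO 6166 check digit can be verified with the Luhn algorithm after expanding letters to two-digit numbers. The sketch below is a generic implementation of that public algorithm; `is_valid_isin` is a hypothetical name and this is not necessarily how the library's validator is written.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Validate the structure and check digit of an ISIN (ISO 6166):
// 2 uppercase letters + 9 alphanumerics + 1 numeric check digit.
inline bool is_valid_isin(const std::string& isin) {
    if (isin.size() != 12) return false;
    std::string digits;
    for (std::size_t i = 0; i < isin.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(isin[i]);
        bool upper = (c >= 'A' && c <= 'Z');
        if (i < 2 && !upper) return false;               // country prefix
        if (i == 11 && !std::isdigit(c)) return false;   // check digit
        if (std::isdigit(c)) {
            digits.push_back(static_cast<char>(c));
        } else if (upper) {
            digits += std::to_string(c - 'A' + 10);      // A=10 ... Z=35
        } else {
            return false;  // lowercase, punctuation, whitespace
        }
    }
    // Luhn: from the right, double every second digit and sum the digits.
    int sum = 0;
    bool dbl = false;
    for (auto it = digits.rbegin(); it != digits.rend(); ++it) {
        int d = *it - '0';
        if (dbl) {
            d *= 2;
            if (d > 9) d -= 9;
        }
        sum += d;
        dbl = !dbl;
    }
    return sum % 10 == 0;
}
```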
End-to-end security audit across all 53 header files — crypto, encoding, compression, Thrift, bloom, core reader/writer, interop bridges, AI tier, WAL, streaming, feature store, event bus, and compliance reporters. 91 vulnerabilities identified and fixed across ~45 files. 423/423 tests pass.
- AES S-box cache-timing mitigation: full table prefetch before encrypt/decrypt (NIST/Bernstein 2005)
- Constant-time `gf_mul` in MixColumns: replaced branching with arithmetic masking
- `Aes256` made non-copyable (key material hygiene); move ops securely zero source
- GCM: `encrypt()`/`decrypt()` now call `derive_j0()` for correct 12/16-byte IV handling (NIST SP 800-38D §7.1)
- GCM `gctr()`: block count overflow guard (2^32-2 limit per NIST SP 800-38D)
- MSVC X25519 `fe_sub`: corrected `2p` constants (was `0x3FFFFF0`, now `2^26-19`); all X25519 on Windows was broken
- Hybrid KEM: added domain separation label `"signet-forge-hybrid-kem-v1"` to SHA-256 key combining (NIST SP 800-227, Final Sep 2025)
- `AesGcmCipher`/`AesCtrCipher`: key storage changed from `std::vector` to `std::array<uint8_t, 32>` (prevents reallocation leaks)
- `BCryptGenRandom`: return value now checked; size validated against `ULONG` truncation (Windows)
- `KeyMode::INTERNAL`: runtime warning when raw key stored in Parquet metadata
- TLV `append_tlv_str`/`append_tlv_blob`: overflow check against `MAX_TLV_LENGTH`
- TLV deserialization: `KeyMode` and `EncryptionAlgorithm` enum range validation
- `KeyPair`, `SignKeyPair`, `HybridKeyPair`, `PostQuantumConfig`: zeroing destructors for secret key material
- Audit chain `now_ns()`: changed to `steady_clock` + cross-thread atomic monotonicity (was `high_resolution_clock` + `thread_local`)
- Audit chain serialize: overflow check on entry count before `uint32_t` cast
- Audit chain deserialize: bounds check before `reserve()` to prevent 480 GB allocation on crafted input
- Non-constant-time `ghash()` marked `[[deprecated]]`
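
The "arithmetic masking" idea behind the constant-time `gf_mul` item can be sketched as follows: the conditional reduction and the conditional accumulate are both selected with an all-ones or all-zeros mask rather than an `if`. This is a generic GF(2^8) multiply over the AES polynomial, written as an assumption about the technique and not copied from the library; a compiler is still free to lower it differently, so it illustrates the source-level pattern only.

```cpp
#include <cstdint>

// Multiply by x (that is, {02}) in GF(2^8) modulo the AES polynomial 0x11B.
// The reduction is applied through a mask instead of a data-dependent branch.
inline uint8_t xtime(uint8_t a) {
    uint8_t mask = static_cast<uint8_t>(0u - (a >> 7));  // 0xFF if high bit set, else 0x00
    return static_cast<uint8_t>((a << 1) ^ (0x1Bu & mask));
}

// GF(2^8) multiply with a fixed 8 iterations and no branches on a or b.
inline uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t result = 0;
    for (int i = 0; i < 8; ++i) {
        uint8_t take = static_cast<uint8_t>(0u - (b & 1u));  // 0xFF if low bit of b set
        result ^= static_cast<uint8_t>(a & take);
        a = xtime(a);
        b >>= 1;
    }
    return result;
}
```

The values `{57}·{13} = {fe}` and `{57}·{83} = {c1}` from the FIPS 197 worked examples make convenient spot checks.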
- RLE decoder: truncated value now returns `false` instead of silently zeroing missing bytes
- RLE varint: stream position restored on overflow (was left misaligned)
- RLE `encode_with_length`: payload size overflow check before `uint32_t` cast
- RLE `flush_rle_run`: shift overflow guard (`rle_count_` capped at `SIZE_MAX >> 1`)
- Delta `decode_int32`: range check against `INT32_MIN`/`INT32_MAX` before narrowing cast
- Delta encoder: subtraction uses unsigned arithmetic to avoid signed overflow UB
- Dictionary encoder: `MAX_DICTIONARY_ENTRIES` (1M) limit with `is_full()` API
- BSS encode: overflow check on `count × WIDTH` before allocation (parity with decode)
- Decompression bomb: absolute 256 MB cap + zero-length compressed data rejection + ratio check
- Snappy compress: input >4 GiB rejected (was silently truncated to 32 bits)
- Snappy decompress: 256 MB absolute size cap
- Snappy `match_length`: bounds guard on source pointers
- LZ4: `size_t`→`int` overflow validation before all liblz4 calls
- GZIP: `size_t`→`uInt` overflow validation before all zlib calls
- Thrift: `zigzag_encode_i64` uses unsigned left shift (UB fix for negative values)
- Thrift: `write_list_header` rejects negative size
- Thrift: `end_struct` sets `error_` on stack underflow instead of silent no-op
- Thrift: global `total_fields_read_` counter with 1M cap (prevents per-struct reset bypass)
- Bloom `from_data`: enforces `kMaxBytes` (128 MiB) limit
- Bloom: `reinterpret_cast<uint32_t*>` replaced with `memcpy`-based access (strict aliasing + alignment)
- xxHash: MSVC endianness detection added (was `#error` on MSVC)
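
Two of the encoding fixes above (unsigned left shift in ZigZag encoding, unsigned subtraction in the delta encoder) share one idea: do the arithmetic on the unsigned representation, where wraparound is well defined. A minimal sketch under that assumption; the function bodies are illustrative and `wrapping_delta` is a hypothetical name.

```cpp
#include <cstdint>

// ZigZag-encode a signed 64-bit value. The shift happens on the unsigned
// representation, so negative inputs never reach a signed left shift.
inline uint64_t zigzag_encode_i64(int64_t n) {
    uint64_t u = static_cast<uint64_t>(n);
    uint64_t sign_mask = uint64_t{0} - (u >> 63);  // all ones if n < 0, else 0
    return (u << 1) ^ sign_mask;
}

// Inverse mapping. The final cast assumes two's complement, which C++20
// guarantees and mainstream compilers implemented before that.
inline int64_t zigzag_decode_i64(uint64_t z) {
    uint64_t u = (z >> 1) ^ (uint64_t{0} - (z & 1));
    return static_cast<int64_t>(u);
}

// Delta between consecutive values, computed modulo 2^64 so that
// INT64_MAX - INT64_MIN does not trigger signed overflow.
inline uint64_t wrapping_delta(int64_t cur, int64_t prev) {
    return static_cast<uint64_t>(cur) - static_cast<uint64_t>(prev);
}
```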
- `read_batch_string()`: subtraction-based bounds check prevents integer overflow (was `pos_ + len > size_`)
- `extract_byte_array_strings()`: bounds checks on length prefix and string data reads
- `data_at()` in mmap reader: validates offset against `mapped_size_`
- Mmap reader: `MADV_WILLNEED` + volatile first/last byte read to detect truncated files early
- Mmap reader: 1024:1 decompression ratio check (parity with regular reader)
- Column index: list count capped at 10M before `resize()` (prevents Thrift-based memory bomb)
- Column writer: >4 GiB BYTE_ARRAY now throws `std::length_error` (was silent data loss)
- `SIGNET_FORGE_STATE_DIR`: path traversal rejection (`..` segments)
- Default usage state path: XDG_STATE_HOME/HOME-based (was `/tmp`, a symlink attack risk)
- Usage state file: `lstat` symlink check before write (TOCTOU mitigation)
- DLPack `byte_offset`: range validation
- DLPack `import_tensor_copy`: checked multiplication for `num_elements × elem_size`
- DLPack ndim: range check (max 32)
- Arrow bridge: `memset` zero-initialization of output structs (prevents double-free on partial init)
- ONNX bridge: dimension positivity validation
- Reader: 1024:1 decompression ratio check
- Reader: 256 MB decoded page memory budget
- Statistics: type-safety documentation for `min_as<T>`/`max_as<T>`
- Z-order: alignment validation before pointer casts in `normalize_column()`
- Arena: `allocate_zeroed()` method added
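
The subtraction-based bounds check mentioned for `read_batch_string()` replaces `pos_ + len > size_`, which can wrap for a hostile `len`. A minimal sketch with hypothetical free functions (the real code operates on reader members):

```cpp
#include <cstddef>
#include <cstdint>

// Returns true when [pos, pos + len) lies inside a buffer of `size` bytes.
// Written with a subtraction so a huge `len` cannot wrap `pos + len`.
inline bool range_in_bounds(std::size_t pos, std::size_t len, std::size_t size) {
    if (pos > size) return false;  // guards the subtraction below
    return len <= size - pos;
}

// The addition-based form, kept only for comparison: it wraps for large len
// and then wrongly reports the range as valid.
inline bool range_in_bounds_naive(std::size_t pos, std::size_t len, std::size_t size) {
    return pos + len <= size;
}
```

With `pos = 8`, `len = SIZE_MAX`, and `size = 16`, the naive form accepts the range because `pos + len` wraps to 7, while the subtraction form rejects it.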
- `WalMmapWriter::append()`: assert-only bounds check replaced with runtime `if` check (was compiled away in Release)
- `WalMmapWriter`: `WAL_MAX_RECORD_SIZE` enforcement (was missing; records >64 MB caused silent WalReader data loss)
- `WalMmapWriter`: `active_idx_` changed to `std::atomic<size_t>` (data race between writer and bg thread)
- `WalMmapWriter`: `closed_` changed to `std::atomic<bool>`
- `WalWriter::open()` resume scan: `WAL_MAX_RECORD_SIZE` enforcement (prevents corrupt `data_sz` skip)
- `WalWriter`: Windows `_ftelli64()` for >2 GB WAL files
- WAL CRC-32: documented as crash-recovery-only (not tamper-evident)
- `EventBus`: `sink_` changed to `std::atomic<StreamingSink*>` (use-after-free on concurrent detach)
- `MpmcRing::Slot`: `alignas(64)` to eliminate false sharing (latency impact 2-5x)
- `ColumnBatch::to_stream_record`: `uint32_t` overflow check on row count
- `InferenceRecord::serialize()`: added EU AI Act training provenance fields (`training_dataset_id`, `training_dataset_size`, `training_data_characteristics`); these were previously omitted, allowing metadata tampering without breaking the hash chain
- `json_to_features`/`json_to_embedding`: 1M element cap (prevents memory exhaustion on crafted JSON arrays)
- `FeatureReader`: `failed_file_count_` tracking (was silent skip, a compliance risk for incomplete feature queries)
- `FeatureWriter`: partial file cleanup on roll close failure
- `FeatureWriter`: symlink limitation documented
- `DecisionLogWriter`: symlink limitation documented
- `MiFID2Reporter`: CSPRNG random hex suffix on auto-generated `report_id` (was predictable timestamp-only)
- `EUAIActReporter`: per-record anomaly flag changed to `mean + 3σ` (was inconsistent `3× mean`)
- `StreamingSink`: `std::filesystem::path` iteration for path traversal check (was manual string splitting)
- `SpscRingBuffer`: heap allocation documentation for large capacities
- `RowLineageTracker`: commercial license check discard documented as intentional
- MiFID II RTS 24: Report IDs now include CSPRNG entropy; satisfies uniqueness requirement per Annex I field 1
- EU AI Act Article 12: Per-record anomaly detection now uses 3-sigma statistical threshold consistent with aggregate (was `3× mean`), ensuring coherent monitoring logs per Article 12(1)
- EU AI Act Article 13: `InferenceRecord` serialization now includes training dataset provenance fields, ensuring hash chain integrity covers training data characteristics per Article 13(3)(b)(ii)
- NIST SP 800-38D: GCM IV derivation (`derive_j0`) correctly implements §7.1 for both 96-bit and non-96-bit IVs; counter overflow guard implements §5.2.1
- RFC 7748: X25519 Montgomery ladder field arithmetic corrected for MSVC (10-limb representation)
- FIPS 197: AES S-box cache-timing mitigation via full-table prefetch
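
The difference between the old `3× mean` rule and the `mean + 3σ` threshold can be seen in a small sketch. The names `Threshold`, `three_sigma_threshold`, and `is_anomalous` are hypothetical, and the use of the population standard deviation is an assumption; the changelog does not say which estimator the reporter uses.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Threshold {
    double mean;
    double stddev;
    double limit;  // mean + 3 * stddev
};

// Compute mean + 3 sigma over a sample, using the population standard deviation.
inline Threshold three_sigma_threshold(const std::vector<double>& xs) {
    if (xs.empty()) return {0.0, 0.0, 0.0};
    double sum = 0.0;
    for (double x : xs) sum += x;
    double mean = sum / static_cast<double>(xs.size());
    double sq = 0.0;
    for (double x : xs) sq += (x - mean) * (x - mean);
    double sd = std::sqrt(sq / static_cast<double>(xs.size()));
    return {mean, sd, mean + 3.0 * sd};
}

// A value is flagged only when it lies strictly above the limit.
inline bool is_anomalous(double value, const Threshold& t) {
    return value > t.limit;
}
```

For the sample `{2, 4, 4, 4, 5, 5, 7, 9}` the mean is 5 and σ is 2, so the limit is 11, whereas a `3× mean` rule would put it at 15 regardless of spread.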
- 11 additional fixes from cross-referencing 15 static audit findings against Pass #5
- 6 new fixes: page CRC-32 in writer (Parquet spec compliance), mmap parity gaps (negative page size + num_values cap), reader row_group OOB bounds check, Z-Order column count validation (CWE-787), Float16 shift UB + 6 unaligned cast fixes (CWE-704), feature flush data loss prevention (error path ordering)
- 5 partial completions: getrandom EINTR retry, delta zigzag unsigned shift, statistics typed merge (PhysicalType dispatch), compliance error reporting (5 silent skips → errors), WAL fsync return checks
- 1 deferred: Feature Store composite `(timestamp_ns, version)` tie-breaker; API contract change deferred to Feature Store v2
- 53 security and correctness fixes across crypto, encoding, I/O, AI, and compliance subsystems
- 8 CRITICAL: constant-time GHASH (4-bit table lookup), GCM counter overflow guard, RLE resize formula, dictionary error reporting, BYTE_ARRAY bounds validation (read + write), INT4 sign extension (portable branchless), verify_chain early-return on tamper detection
- 18 HIGH: secure key zeroing (volatile + compiler barrier), move-only ciphers (AesCtr), CSPRNG hardening (BCryptGenRandom for Windows, hard-fail on unsupported), overflow guards (BSS, delta, mmap footer, arena), typed statistics (PhysicalType tracking), NaN exclusion from num_values, xxHash endianness enforcement, configurable compliance (price precision, timestamp granularity), training metadata (EU AI Act Art.13), Art.19 cross-chain verification, monotonic now_ns(), MPMC ring validation
- 18 MEDIUM: constant-time X25519 zero check, TLV metadata caps (1 MB), PME module_type validation, reduced Thrift nesting (128 to 64), Arrow offset/length caps, bloom filter seed enforcement, writer close validation, SHA-256 FIPS test vector, WAL empty record tracking, instrument validator callback, feature computation lineage, inference training metadata
- 9 LOW: 64-bit Snappy hash positions, Thrift structured errors, GCM IV size configuration (12/16 bytes), mmap decompression pre-validation; 4 no-ops (correct per spec)
- ~29 new tests (all tagged `[hardening]`), 2 new ErrorCodes (CORRUPT_DATA, INVALID_ARGUMENT)
- Standards: NIST SP 800-38D, NIST SP 800-38A, FIPS 197, FIPS 180-4, RFC 7748, MiFID II RTS 24, EU AI Act Art.12/13/19
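
The "INT4 sign extension (portable branchless)" item can be illustrated with the classic XOR-and-subtract idiom. This is a sketch of that general technique, not the library's code; the low-nibble-first packing order in `unpack_int4_pair` is an assumption made for the example.

```cpp
#include <cstdint>

// Sign-extend the low 4 bits of `nibble` to a signed 8-bit value without
// branching: the XOR flips the sign bit, the subtraction re-biases the range.
inline int8_t sign_extend_int4(uint8_t nibble) {
    int v = nibble & 0x0F;                          // 0..15
    return static_cast<int8_t>((v ^ 0x08) - 0x08);  // -8..7
}

// Unpack two INT4 values from one byte (low nibble first, by assumption).
inline void unpack_int4_pair(uint8_t byte, int8_t& lo, int8_t& hi) {
    lo = sign_extend_int4(static_cast<uint8_t>(byte & 0x0F));
    hi = sign_extend_int4(static_cast<uint8_t>(byte >> 4));
}
```

Because it relies only on integer arithmetic on non-negative intermediates, the idiom does not depend on the behavior of right-shifting negative values.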
0.1.0 - 2026-03-04
Initial public release of Signet Forge.
- `ParquetWriter`: streaming Parquet file writer with configurable encoding and compression
- `ParquetReader`: random-access Parquet file reader with typed column APIs
- `MmapParquetReader`: memory-mapped reader path for zero-copy access
- `Schema` and `SchemaBuilder`: fluent schema definition with 8 physical types (BOOLEAN, INT32, INT64, INT96, FLOAT, DOUBLE, BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY)
- `ColumnStatistics`: min/max/null_count per column chunk
- `ColumnIndex` and `OffsetIndex`: predicate pushdown support
- CSV-to-Parquet converter with automatic type detection
- PLAIN encoding for all physical types
- RLE/Bit-Packing Hybrid encoding (definition/repetition levels, boolean columns)
- DELTA_BINARY_PACKED encoding (timestamps, monotonic sequences)
- BYTE_STREAM_SPLIT encoding (floating-point columns)
- RLE_DICTIONARY encoding with dictionary pages
- Snappy: bundled header-only implementation (zero external dependencies)
- ZSTD: optional, link `libzstd` (`-DSIGNET_ENABLE_ZSTD=ON`)
- LZ4: optional, link `liblz4` (`-DSIGNET_ENABLE_LZ4=ON`)
- Gzip: optional, link `zlib` (`-DSIGNET_ENABLE_GZIP=ON`)
- AES-256-GCM footer encryption
- AES-256-CTR column data encryption
- Parquet Modular Encryption (PME) full spec implementation
- Key metadata serialization
- Post-quantum encryption: Kyber-768 KEM + Dilithium-3 digital signatures
- X25519 Diffie-Hellman key agreement (RFC 7748 Montgomery ladder)
- Hybrid KEM: X25519 + Kyber-768 combined key encapsulation
- Split-block bloom filter (Parquet spec compliant)
- Bundled xxHash64 implementation (public domain)
- PME-encrypted bloom filter support
- `FLOAT32_VECTOR(dim)` logical type with SIMD-accelerated I/O
- `INT8` and `INT4` quantized vector storage with on-read dequantization
- Arrow C Data Interface bridge (`ArrowArray`/`ArrowSchema`, zero-copy)
- ONNX Runtime bridge (`OrtValue` creation from Parquet columns)
- NumPy / DLPack / buffer protocol bridge for PyTorch integration
- `TensorView` and `OwnedTensor` for zero-copy ML inference
- `WalWriter`: fwrite-based WAL with 339 ns per-append latency (32 B payload)
- `WalReader`: crash-safe WAL reader with CRC-32 integrity verification
- `WalManager`: segment rolling, compaction, and lifecycle management
- `WalMmapWriter`: mmap ring-buffer WAL with ~38 ns per-append latency
- `MappedSegment`: 4-slot ring with background pre-allocation and drain
- `StreamingSink`: lock-free ring buffer to row group flusher
- `FeatureWriter`: point-in-time correct feature materialization to Parquet
- `FeatureReader`: 12 μs per-entity `as_of()` lookup via binary search index
- History and batch APIs for time-travel feature retrieval
- `MpmcRing<T>`: Vyukov MPMC bounded queue (10.4 ns single-threaded push+pop)
- `ColumnBatch`: columnar batch container with zero-copy TensorView access
- `EventBus`: three-tier event router (topic → subscriber → handler)
- `AuditChain`: SHA-256 hash chain across row groups for tamper detection
- `DecisionLogWriter` / `DecisionLogReader`: structured AI decision logging
- `InferenceLogWriter` / `InferenceLogReader`: ML inference audit trail
- `MiFID2Reporter`: MiFID II RTS 24 Annex I report generation (JSON/NDJSON/CSV)
- `EUAIActReporter`: EU AI Act Articles 12, 13, and 19 conformity assessment
- pybind11 bindings with 44/44 C++ API exports
- NumPy array integration for all column types
- Full audit/compliance API bindings (gated by `SIGNET_HAS_AI_AUDIT`)
- 35 pytest test functions
- `signet_cli`: Parquet file inspection (schema, row groups, statistics, metadata)
- 13 CMake presets: dev, dev-tests, release, asan, tsan, ubsan, msan, ci, benchmarks, python, minimal, server, server-pq
- Header-only core with zero mandatory dependencies
- FetchContent integration (Catch2 3.7.1 for tests, pybind11 for Python)
- `SIGNET_MINIMAL_DEPS` one-flag embedded build mode
- GitHub Actions with 7 matrix jobs: build-test (Ubuntu + macOS), ASan, TSan, UBSan, Windows MSVC, server codecs (ZSTD+LZ4+Gzip), post-quantum (liboqs)
- Concurrency control with cancel-in-progress
- `basic_write.cpp`: minimal write example
- `basic_read.cpp`: minimal read example
- `csv_to_parquet.cpp`: generic CSV to Parquet converter
- `ticks_import.cpp`: HFT tick data CSV.gz to Parquet with optimal encodings
- `ticks_query.cpp`: HFT tick data query with predicate pushdown
- `ticks_wal_stream.cpp`: HFT tick data WAL durable ingestion to Parquet compaction
- 37 benchmark cases across 6 files: write throughput, read throughput, WAL latency (fwrite + mmap), encoding speed, feature store latency, MPMC ring throughput
- 9 internal architecture documents (overview, build system, data structures, Parquet format, encryption, encoding codecs, AI subsystem, thread model, PQ-PME integration)
- 7 user-facing documents (quickstart, API reference, applications, compression guide, PQ crypto guide, AI features, production setup)
- Production README with feature matrix, quickstart, benchmark table, architecture diagram
- Hardening Pass 1: Fixed 6 vulnerabilities — BSS decode OOB reads, RLE bit_width boundary values (0 and 65), Thrift nesting depth exhaustion, bad/truncated Parquet magic bytes, Python write_column OOB, path traversal in DecisionLogWriter
- Hardening Pass 2: Fixed 6 additional vulnerabilities — WAL oversize record cap (64 MB), FeatureWriter path traversal, Parquet page size bomb cap (256 MB), Thrift field-count DoS cap (65536), Thrift string bomb cap (64 MB), ArrowArray offset overflow
- Hardening Pass 3: Fixed 23 additional vulnerabilities across 5 batches (~20 files modified):
- Encoding: RLE bit_width clamped to [0,64] (prevents UB on shift > 64), Dictionary decoder OOB index (assert→runtime check), BSS count×WIDTH integer overflow (division-based guard), Delta decoder accumulation overflow (__builtin_add_overflow)
- Thrift: Negative list count rejection, MAP/LIST/SET collection size DoS cap (1M entries)
- Crypto: Replaced std::random_device byte-by-byte IV generation with platform CSPRNG (arc4random_buf on macOS, getrandom on Linux), GCM counter overflow guard per NIST SP 800-38D (~64 GB limit), TLV field size overflow guard (64 MB cap), AES-256 round keys zeroed in destructor via volatile pointer, cipher adapter key vectors zeroed on destruction
- Interop: Arrow bridge offset×elem_size overflow check, raw new→std::make_unique RAII for ArrowSchemaPrivate/ArrowArrayPrivate
- AI: TensorView assert()→throw std::out_of_range in production, string size truncation in decision/inference log append_string, FeatureWriter symlink bypass (weakly_canonical + raw path check)
- Compliance: Field length truncation (MAX_FIELD_LENGTH=4096) in MiFID2 and EU AI Act reporters, 64-bit time_t static_assert for timestamps beyond 2038
- WAL: Empty record rejection, verify_chain() early return on hash deserialization failure
- Post-quantum: Refined #pragma message warning (acknowledges X25519 HybridKem provides real classical ECDH; only Kyber lattice portion is stubbed), added is_real_pq_crypto() runtime query
- New header: `crypto/cipher_interface.hpp`, providing platform-aware CSPRNG + shared cipher adapters
- Hardening Pass 4: Fixed 29 additional vulnerabilities across 7 batches (all language bindings):
- Core C++: Arena allocator overflow guards (count×sizeof, size+alignment), signed index overflow in column/offset index, dictionary page num_values validation, batch read count overflow in column_reader, INT64_MIN negation UB in delta encoder (cast to unsigned), page size >2GiB truncation guard, decompression ratio bomb limit (1024:1), NaN exclusion from statistics, schema column() bounds checking with std::out_of_range
- Crypto: AES-CTR counter overflow guard (64 GiB limit matching AES-GCM)
- AI tier: Feature/embedding count bounds checking in decision_log/inference_log deserialization, stoll try/catch in audit metadata parsing, StreamingSink output_dir path traversal guard, ColumnBatch 100M cell OOM limit
- C FFI: try/catch around 5 extern "C" functions (signet_writer_open, write_column_string, write_row, reader_open, read_column_string) preventing C++ exception UB across FFI boundary, signet_schema_builder_free() for proper cleanup, string column read OOM cleanup on partial failure
- Rust FFI: SchemaBuilder::new() returns Result with null check (was infallible), build() only forgets self on success, Drop uses signet_schema_builder_free, write_column_string panic→Result, WriterOptions null allocation assert
- WASM: writeFileToMemfs 256 MB file size limit, schema column accessor bounds checks, writer column bounds checks, encryption key material zeroing after use (volatile write)
- Python: init.py graceful degradation for AI audit types (try/except ImportError), write_column_bool bounds check parity with other write_column methods
- Keygen: parse_hex_hash bare "0x" rejection, expiry_date overflow clamp [1,36500 days], semicolon injection prevention in custom claims
- 423 total unit tests + 5 Rust integration tests + 5 doc-compile tests, all passing across all 5 hardening passes plus static audit follow-up
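
The C FFI item in Hardening Pass 4 (wrapping `extern "C"` entry points in try/catch so no C++ exception crosses the language boundary) follows a common pattern, sketched below. Everything here is hypothetical for illustration: `example_status`, `parse_positive`, and `example_parse_positive` are not part of the signet C API.

```cpp
#include <cstddef>
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical status codes for an example C API.
enum example_status {
    EXAMPLE_OK = 0,
    EXAMPLE_ERR_INVALID = 1,
    EXAMPLE_ERR_NOMEM = 2,
    EXAMPLE_ERR_UNKNOWN = 3
};

// Hypothetical C++ implementation that reports failure by throwing.
inline unsigned long parse_positive(const std::string& s) {
    std::size_t pos = 0;
    unsigned long v = std::stoul(s, &pos);  // throws on non-numeric or out-of-range input
    if (pos != s.size() || v == 0) {
        throw std::invalid_argument("not a positive integer");
    }
    return v;
}

// C entry point: every exception is translated into a status code, so a C
// or Rust caller never observes C++ stack unwinding.
extern "C" int example_parse_positive(const char* text, unsigned long* out) noexcept {
    if (text == nullptr || out == nullptr) return EXAMPLE_ERR_INVALID;
    try {
        *out = parse_positive(text);
        return EXAMPLE_OK;
    } catch (const std::bad_alloc&) {
        return EXAMPLE_ERR_NOMEM;
    } catch (const std::exception&) {
        return EXAMPLE_ERR_INVALID;
    } catch (...) {
        return EXAMPLE_ERR_UNKNOWN;
    }
}
```

The trailing `catch (...)` is kept as a last-resort barrier at the FFI boundary only; typed handlers come first so that specific failures map to specific codes.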