Merged
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased][]

### Added
- **UDP/datagram transport (handshake)**: a connectionless transport alongside the TCP/stream one, demultiplexed by a random per-session connection index rather than source address (survives NAT rebind and roaming). It carries the full CH-KEM handshake over a lossy link: the large post-quantum Hellos are fragmented across datagrams and reassembled (`pkg/tunnel/reassembly.go`), and a transport-agnostic state machine plus reliability driver (`pkg/tunnel/dgram_handshake_{fsm,driver,wire}.go`) add retransmission with exponential backoff, a retry ceiling, duplicate/replay handling, and a responder linger that recovers a lost final flight. A bad or forged datagram is dropped rather than failing the handshake. The responder runs no decapsulation and sends no ServerHello until a full ClientHello arrives and a per-source half-open slot is granted, so it never sends more than it received from an unvalidated source. `DialDatagram` performs the initiator handshake and returns an established session.
- **Datagram handshake benchmark**: `quantum-vpn bench --datagram-handshakes N` measures the datagram handshake rate over loopback UDP (~1,300/sec, ~760 µs each on an M1 Pro, vs ~1,450/sec for the stream path).

### Not yet implemented (datagram)
- The encrypted data path (epoch-keyed AEAD over DATA frames) and authenticated CLOSE, so there is no datagram throughput number yet.
- The stateless cookie/RETRY anti-amplification exchange and connection roaming.

### Planned: v0.0.11 - Security Hardening (carryover)
- Handshake timeout on server Accept
- Module integrity verification (fix always-true check)
20 changes: 12 additions & 8 deletions README.md
@@ -90,19 +90,23 @@ See [Quick Start Guide](docs/usage/QUICKSTART.md) for detailed examples.
Hardware-accelerated where available (ARMv8 Crypto Extensions on Apple Silicon;
AES-NI / AVX2 / hardware SHA-3 on x86-64). Go 1.26.3 (Green Tea GC).

The transport is **TCP/stream only** (length-prefixed framing); UDP is not currently
supported. Two distinct numbers matter: the raw AEAD cipher rate, and the rate actually
achieved end-to-end through a single tunnel (lower, currently allocation-bound; zero-copy
data-plane work is tracked on the [roadmap](docs/ROADMAP.md)).
The stream transport is **TCP** (length-prefixed framing). A connectionless
**UDP/datagram** transport is in progress: its handshake is implemented (fragmented PQ
Hellos, retransmission with backoff, replay of cached flights), with the encrypted data
path not yet landed, so only the datagram handshake rate appears below. For the stream
path, two distinct numbers matter: the raw AEAD cipher rate, and the rate actually achieved
end-to-end through a single tunnel (lower, currently allocation-bound; zero-copy data-plane
work is tracked on the [roadmap](docs/ROADMAP.md)).

**Measured (Apple M1 Pro, Go 1.26.3, loopback TCP):**
**Measured (Apple M1 Pro, Go 1.26.3, loopback):**

| Metric | Result |
|--------|--------|
| AES-256-GCM cipher (raw AEAD, single core) | ~2.5 GB/s |
| ChaCha20-Poly1305 cipher (raw AEAD) | ~0.7 GB/s |
| Handshakes/sec (full CH-KEM, sequential) | ~1,450 (~670 µs each) |
| Single-tunnel throughput (AES-GCM, end-to-end) | ~690 MB/s (5.5 Gb/s), sustained across rekeys |
| Handshakes/sec (stream/TCP, full CH-KEM, sequential) | ~1,450 (~670 µs each) |
| Handshakes/sec (datagram/UDP, full CH-KEM, sequential) | ~1,300 (~760 µs each) |
| Single-tunnel throughput (stream/TCP, AES-GCM, end-to-end) | ~690 MB/s (5.5 Gb/s), sustained across rekeys |

**Estimated on other hardware** (extrapolated from cipher throughput; not yet
independently measured; run the benchmark to verify):
@@ -113,7 +117,7 @@ independently measured; run the benchmark to verify):
| Mid-range server (Xeon Silver) | 4-7 GB/s |
| Enterprise (Xeon Platinum / EPYC) | 8-12 GB/s |

Run `quantum-vpn bench --handshakes N --throughput` on your target hardware. See [CLI Reference](docs/usage/CLI.md#benchmark-mode).
Run `quantum-vpn bench --handshakes N --datagram-handshakes N --throughput` on your target hardware. See [CLI Reference](docs/usage/CLI.md#benchmark-mode).

## Contributing

82 changes: 75 additions & 7 deletions cmd/quantum-vpn/bench.go
@@ -2,6 +2,7 @@ package main

import (
"fmt"
"net"
"os"
"strings"
"sync"
@@ -11,15 +12,16 @@ import (
"github.com/sara-star-quant/quantum-go/pkg/tunnel"
)

func runBench(handshakes int, throughputTest bool, sizeStr, durationStr, cipherSuite string) {
fmt.Println("╔═══════════════════════════════════════════════════════════╗")
fmt.Println("║ Quantum-Resistant VPN Benchmark ║")
fmt.Println("║ CH-KEM: ML-KEM-1024 + X25519 ║")
fmt.Println("╚═══════════════════════════════════════════════════════════╝")
func runBench(handshakes, datagramHandshakes int, throughputTest bool, sizeStr, durationStr, cipherSuite string) {
const bannerWidth = 59
fmt.Println("╔" + strings.Repeat("═", bannerWidth) + "╗")
fmt.Printf("║%-*s║\n", bannerWidth, " Quantum-Resistant VPN Benchmark")
fmt.Printf("║%-*s║\n", bannerWidth, " CH-KEM: ML-KEM-1024 + X25519")
fmt.Println("╚" + strings.Repeat("═", bannerWidth) + "╝")
fmt.Println()

if handshakes == 0 && !throughputTest {
fmt.Println("No benchmarks specified. Use --handshakes or --throughput")
if handshakes == 0 && datagramHandshakes == 0 && !throughputTest {
fmt.Println("No benchmarks specified. Use --handshakes, --datagram-handshakes, or --throughput")
fmt.Println("Run 'quantum-vpn bench --help' for usage")
os.Exit(1)
}
@@ -29,6 +31,11 @@ func runBench(handshakes int, throughputTest bool, sizeStr, durationStr, cipherS
fmt.Println()
}

if datagramHandshakes > 0 {
benchDatagramHandshakes(datagramHandshakes)
fmt.Println()
}

if throughputTest {
size := parseSize(sizeStr)
duration := parseDuration(durationStr)
@@ -102,6 +109,67 @@ func benchHandshakes(count int) {
printHandshakeResults(count, successCount, errors, totalTime, durations)
}

// benchDatagramHandshakes measures full CH-KEM handshakes over the connectionless
// UDP transport on loopback: it dials sequentially through DialDatagram, which
// fragments the post-quantum Hellos and drives retransmission, so the result is
// directly comparable to the stream handshake number. The encrypted datagram data
// path is not implemented yet, so there is no datagram throughput benchmark.
func benchDatagramHandshakes(count int) {
	fmt.Printf("Benchmarking Datagram Handshakes (%d iterations)\n", count)
	fmt.Println(strings.Repeat("─", 60))

	responderConn, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		fmt.Fprintf(os.Stderr, "Error: Failed to open responder socket: %v\n", err)
		os.Exit(1)
	}
	responder := tunnel.NewDatagramEndpoint(responderConn)
	go responder.Serve()
	defer func() { _ = responder.Close() }()

	initiatorConn, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		fmt.Fprintf(os.Stderr, "Error: Failed to open initiator socket: %v\n", err)
		os.Exit(1)
	}
	initiator := tunnel.NewDatagramEndpoint(initiatorConn)
	go initiator.Serve()
	defer func() { _ = initiator.Close() }()

	dst := responderConn.LocalAddr()
	fmt.Printf("Test setup: %s -> %s\n\n", initiatorConn.LocalAddr(), dst)

	durations := make([]time.Duration, count)
	errors := 0

	startTime := time.Now()
	for i := 0; i < count; i++ {
		handshakeStart := time.Now()

		session, err := tunnel.DialDatagram(initiator, dst)
		if err != nil {
			errors++
			durations[i] = 0
			continue
		}
		durations[i] = time.Since(handshakeStart)
		_ = session

		step := count / 10
		if step == 0 {
			step = 1
		}
		if (i+1)%step == 0 || i == count-1 {
			fmt.Printf("Progress: %d/%d (%.0f%%)\r", i+1, count, float64(i+1)/float64(count)*100)
		}
	}
	fmt.Println()

	totalTime := time.Since(startTime)
	successCount := count - errors
	printHandshakeResults(count, successCount, errors, totalTime, durations)
}

func printHandshakeResults(total, successful, failed int, totalTime time.Duration, durations []time.Duration) {
if failed == total {
fmt.Fprintf(os.Stderr, "All handshakes failed\n")
10 changes: 7 additions & 3 deletions cmd/quantum-vpn/main.go
@@ -130,7 +130,8 @@ EXAMPLES:

func benchCommand() {
fs := flag.NewFlagSet("bench", flag.ExitOnError)
handshakes := fs.Int("handshakes", 0, "Number of handshakes to benchmark (0 = skip)")
handshakes := fs.Int("handshakes", 0, "Number of stream (TCP) handshakes to benchmark (0 = skip)")
datagramHandshakes := fs.Int("datagram-handshakes", 0, "Number of datagram (UDP) handshakes to benchmark (0 = skip)")
throughput := fs.Bool("throughput", false, "Run throughput benchmark")
size := fs.String("size", "100MB", "Data size for throughput test (e.g., 100MB, 1GB)")
duration := fs.String("duration", "10s", "Duration for throughput test (e.g., 10s, 1m)")
@@ -145,9 +146,12 @@ OPTIONS:`)
fs.PrintDefaults()
fmt.Println(`
EXAMPLES:
# Benchmark 100 handshakes
# Benchmark 100 stream (TCP) handshakes
quantum-vpn bench --handshakes 100

# Benchmark 100 datagram (UDP) handshakes
quantum-vpn bench --datagram-handshakes 100

# Benchmark throughput for 30 seconds
quantum-vpn bench --throughput --duration 30s

@@ -160,7 +164,7 @@ EXAMPLES:

_ = fs.Parse(os.Args[2:])

runBench(*handshakes, *throughput, *size, *duration, *cipherSuite)
runBench(*handshakes, *datagramHandshakes, *throughput, *size, *duration, *cipherSuite)
}

func exampleCommand() {
44 changes: 22 additions & 22 deletions docs/datagram-transport.md
@@ -5,22 +5,22 @@ the existing TCP/stream transport. The two share the crypto core (`pkg/chkem`,
`pkg/crypto`, and the `Session` key/rekey secret derivation) but have **separate
wire formats**. There is no TCP↔UDP interop, by design.

The transport is being built as a phased epic. This document tracks the design
and is updated as phases land.
The transport is being built incrementally. This document tracks the design and
is updated as pieces land.

## Status

| Component | File | Status |
|-----------|------|--------|
| Datagram wire codec | `pkg/protocol/datagram_codec.go` | Phase 1a — implemented |
| Multi-word replay window | `pkg/tunnel/replay.go` | Phase 1a — implemented |
| Bounded handshake reassembler | `pkg/tunnel/reassembly.go` | Phase 1a — implemented |
| Datagram constants | `internal/constants/constants.go` | Phase 1a — implemented |
| Endpoint + demux | `pkg/tunnel/datagram.go` | Phase 1a — pending |
| Reliable handshake FSM | `pkg/tunnel/dgram_handshake.go` | Phase 1a — pending |
| Epoch cipher selection (recv) | `pkg/tunnel/session.go` (datagram path) | Phase 1a — pending |
| Zero-alloc / batched I/O | | Phase 1b |
| Stateless cookie / anti-amplification / roaming | | Phase 2 |
| Datagram wire codec | `pkg/protocol/datagram_codec.go` | implemented |
| Multi-word replay window | `pkg/tunnel/replay.go` | implemented |
| Bounded handshake reassembler | `pkg/tunnel/reassembly.go` | implemented |
| Datagram constants | `internal/constants/constants.go` | implemented |
| Endpoint + demux + dial/accept | `pkg/tunnel/datagram.go` | implemented |
| Reliable handshake FSM + driver + wiring | `pkg/tunnel/dgram_handshake_{fsm,driver,wire}.go` | implemented |
| Epoch cipher selection (recv) | `pkg/tunnel/session.go` (datagram path) | pending |
| Zero-alloc / batched I/O | - | future |
| Stateless cookie / anti-amplification / roaming | - | future |

## Wire format

@@ -33,7 +33,7 @@ Common 14-byte header (all frame types):
```
[FrameType:1][Epoch:1][RecvIndex:4 BE][Seq:8 BE]
```

- **FrameType** — DATA, HANDSHAKE, CLOSE, or RETRY (RETRY reserved for Phase 2).
- **FrameType** — DATA, HANDSHAKE, CLOSE, or RETRY (RETRY reserved for a future stateless-retry exchange).
- **Epoch** — selects the receive cipher for DATA frames (see *Rekey*). Carried
in the clear but authenticated: the AEAD AAD is the entire 14-byte header, so a
flipped epoch is rejected, not merely mis-routed.
@@ -51,8 +51,8 @@ HANDSHAKE frame appends an extension before the (possibly fragmented) message:
```
[SenderIndex:4][MsgType:1][FragOffset:2][FragLen:2][TotalLen:2][CookieLen:1][Cookie:CookieLen]
```

`CookieLen` is `0` in Phase 1; the field exists so the Phase 2 stateless-retry
hardening needs no wire change.
`CookieLen` is `0` today; the field exists so a future stateless-retry hardening
needs no wire change.
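
As an illustration of the common header layout described above, here is a minimal Go sketch that encodes and parses the 14 bytes. It is not the project's codec in `pkg/protocol/datagram_codec.go`: the `header` type, the function names, and the numeric frame-type values are assumptions made for this example. Only the field order, widths, and big-endian encoding come from the layout in this document.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Placeholder frame-type values for illustration only; the real constants
// are defined by the project and are not shown in this diff.
const (
	frameData      byte = 1
	frameHandshake byte = 2
)

const headerLen = 14 // 1 (FrameType) + 1 (Epoch) + 4 (RecvIndex) + 8 (Seq)

// header mirrors the documented layout:
// [FrameType:1][Epoch:1][RecvIndex:4 BE][Seq:8 BE]
type header struct {
	FrameType byte
	Epoch     byte
	RecvIndex uint32
	Seq       uint64
}

// appendHeader appends the 14-byte encoding of h to dst and returns the result.
func appendHeader(dst []byte, h header) []byte {
	dst = append(dst, h.FrameType, h.Epoch)
	dst = binary.BigEndian.AppendUint32(dst, h.RecvIndex)
	dst = binary.BigEndian.AppendUint64(dst, h.Seq)
	return dst
}

var errShort = errors.New("datagram shorter than common header")

// parseHeader decodes the header and returns the bytes that follow it.
func parseHeader(b []byte) (header, []byte, error) {
	if len(b) < headerLen {
		return header{}, nil, errShort
	}
	h := header{
		FrameType: b[0],
		Epoch:     b[1],
		RecvIndex: binary.BigEndian.Uint32(b[2:6]),
		Seq:       binary.BigEndian.Uint64(b[6:14]),
	}
	return h, b[headerLen:], nil
}

func main() {
	wire := appendHeader(nil, header{FrameType: frameData, Epoch: 3, RecvIndex: 0xA1B2C3D4, Seq: 42})
	wire = append(wire, "payload"...)

	h, rest, err := parseHeader(wire)
	fmt.Println(len(wire), h.Epoch, h.RecvIndex == 0xA1B2C3D4, h.Seq, string(rest), err)
}
```

Because the document states that the AEAD AAD is the entire 14-byte header, a receiver following this sketch would pass `wire[:headerLen]` as the additional data when opening a DATA frame, so any change to the epoch or sequence bytes fails authentication.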

## Key design decisions (improve, do not inherit)

@@ -65,8 +65,8 @@ hardening needs no wire change.
- **Demux by random connection index, not source address.** A session survives
NAT rebind/roaming, one address can host many sessions, and indices resist
off-path guessing (CSPRNG, regenerated on the rare active-table collision). The
source address is only a hint, updated after an AEAD-valid packet (the
authenticated-rebinding enforcement lands in Phase 2).
source address is only a hint, updated after an AEAD-valid packet
(authenticated-rebinding enforcement is future work).

- **Epoch-based rekey (reorder-safe).** The stream transport promotes the new
receive cipher on the first new-key packet and discards the old one — correct
@@ -93,16 +93,16 @@ hardening needs no wire change.
best-effort datagram, never relied upon. `Send` emits one datagram per call and
rejects payloads larger than `DatagramMaxDataPayload` (no PMTU discovery).

## Phase 1 DoS posture
## DoS posture

Amplification is inherently low: the ~1.7 KB ClientHello must be fully received
and reassembled before the comparable ServerHello is sent (response:request ≈
1:1, not an amplifier). Phase 1 additionally reuses the existing rate limiters
and caps concurrent half-open handshakes per source and overall. The Phase 2
stateless cookie closes the residual spoofed-source state-exhaustion gap and
enforces a strict anti-amplification bound.
1:1, not an amplifier). The current implementation additionally reuses the
existing rate limiters and caps concurrent half-open handshakes per source and
overall. A future stateless cookie will close the residual spoofed-source
state-exhaustion gap and enforce a strict anti-amplification bound.
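
A cap on concurrent half-open handshakes, per source and overall, could be tracked along these lines. The `halfOpenLimiter` type, its method names, and the limits used in `main` are assumptions for illustration; the project's actual limiter and its thresholds are not shown in this diff.

```go
package main

import (
	"fmt"
	"sync"
)

// halfOpenLimiter caps in-progress handshakes per source and overall.
// Names and limits are illustrative, not the project's implementation.
type halfOpenLimiter struct {
	mu        sync.Mutex
	perSource map[string]int
	total     int
	maxSource int
	maxTotal  int
}

func newHalfOpenLimiter(maxSource, maxTotal int) *halfOpenLimiter {
	return &halfOpenLimiter{
		perSource: make(map[string]int),
		maxSource: maxSource,
		maxTotal:  maxTotal,
	}
}

// acquire reserves a half-open slot for src. It returns false when either
// the per-source cap or the overall cap has been reached.
func (l *halfOpenLimiter) acquire(src string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.total >= l.maxTotal || l.perSource[src] >= l.maxSource {
		return false
	}
	l.perSource[src]++
	l.total++
	return true
}

// release frees a slot when a handshake completes, fails, or times out.
func (l *halfOpenLimiter) release(src string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.perSource[src] == 0 {
		return // nothing held for this source
	}
	l.perSource[src]--
	if l.perSource[src] == 0 {
		delete(l.perSource, src)
	}
	l.total--
}

func main() {
	l := newHalfOpenLimiter(2, 3)
	fmt.Println(l.acquire("10.0.0.1"), l.acquire("10.0.0.1"), l.acquire("10.0.0.1")) // true true false
	fmt.Println(l.acquire("10.0.0.2"), l.acquire("10.0.0.3"))                       // true false
	l.release("10.0.0.1")
	fmt.Println(l.acquire("10.0.0.3")) // true
}
```

Keying on the source address bounds state per claimed sender, but a spoofing attacker can still spread requests across many addresses until the overall cap is hit. That is the residual state-exhaustion gap the text assigns to the stateless cookie.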

## Out of scope (future)

GSO/GRO offload, PMTU discovery, multipath, and a parallel per-datagram crypto
pipeline (revisited after the Phase 1 baseline is measured).
pipeline (revisited after the current baseline is measured).