
Implement Write-Ahead Log with recovery and skip list concurrency improvements #22

Open
Souvik606 wants to merge 19 commits into main from feat/storage-engine/wal

Souvik606 (Collaborator) commented Jun 5, 2026

Issue Reference

Summary by CodeRabbit

  • New Features

    • Durable write-ahead log with framed records, CRC validation, append-only writer with background batching, segment rotation/sync, directory replay that recovers valid records and truncates corrupted tails, and rejection of empty keys on append.
  • Tests

    • Extensive WAL test suites for framing, CRC/truncation/error handling, writer behavior, rotation, recovery ordering/truncation, and concurrency.
    • New skip-list concurrency stress tests validating concurrent reads/writes.


coderabbitai Bot commented Jun 5, 2026


Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a framed WAL (CRC32 + size + opcode + key-length + key + value), exported errors and Record marshal/unmarshal, Replay recovery with truncation behavior, a batched rotating LogWriter with fsync and Close semantics, extensive WAL unit tests, and two SkipList concurrency stress tests.

Changes

Write-Ahead Log (WAL) Durability Implementation

Layer / File(s) Summary
WAL error types and record schema
internal/storage/wal/errors.go, internal/storage/wal/record.go
Three exported errors (ErrInvalidCRC, ErrTruncated, ErrEmptyKey), Record type, OpcodePut/OpcodeDelete, Marshal() framing with size/CRC/opcode/key-length/key/value, and UnmarshalRecord() validating truncation and CRC.
Record marshal/unmarshal tests
internal/storage/wal/record_test.go
Unit tests cover frame layout, CRC coverage, delete opcode, zero-length/nil key/value handling, large payloads, binary round-trips, idempotent marshaling, and negative truncation/CRC cases.
Recovery interface and Replay
internal/storage/wal/recovery.go
Adds MemTable interface and Replay(directory, engine) which enumerates/sorts *.wal files, computes next segment ID, and replays segments in order.
Segment replay loop and truncation handling
internal/storage/wal/recovery.go
Per-segment replay reads 8-byte headers and payloads, enforces minimum/maximum frame sizes, distinguishes EOF vs unexpected EOF, validates CRC via UnmarshalRecord, truncates files on corrupted/truncated frames, and applies Put/Delete to the provided MemTable.
Recovery tests: ordering & corruption resilience
internal/storage/wal/recovery_test.go
Tests missing/empty dirs, non-WAL ignore, single/multi-segment order, highest-segment-id return, unknown opcode ignore, CRC/header/payload truncation results, empty segments, memtable error propagation, mixed ops, numeric sorting, subdirectory ignore, and invalid frame-size cases.
LogWriter (batching, rotation, fsync)
internal/storage/wal/writer.go
MaxSegmentSizeBytes constant (32 MiB), LogWriter with NewLogWriter, Append (reject empty key), background batchWorker grouping writes, segment rotation, batched write+fsync, current size tracking, and idempotent Close.
Writer tests: init, append, rotation, concurrency, close
internal/storage/wal/writer_test.go
Tests directory/segment creation, startSegmentID handling, invalid directory error, append semantics (roundtrip via Replay, empty/nil key validation, empty value allowed), rotation behavior and fsync, concurrent append recoverability, Close idempotency/durability, and batch/group-commit concurrency tests.

SkipList Concurrency Testing

Layer / File(s) Summary
SkipList test imports
internal/storage/memtable/skiplist_test.go
Adds fmt and sync imports used by concurrency stress tests.
Strict reader/writer stress test
internal/storage/memtable/skiplist_test.go
TestSkipList_StrictConcurrency: one writer repeatedly Puts to a shared key while multiple readers concurrently call Get, asserting that only ErrKeyNotFound or correctly prefixed values are observed.
General Put/Delete/iterator stress test
internal/storage/memtable/skiplist_test.go
TestSkipList_Concurrency: concurrent Put in key-000..099, Delete in key-100..199, parallel iterators, and final Get validation (present vs ErrKeyNotFound).
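The shape of the strict reader/writer test can be illustrated without the real SkipList, whose API isn't shown on this page. In this sketch, store is a hypothetical RWMutex-guarded map with Put/Get, ErrKeyNotFound is redeclared locally, and strictConcurrency is an illustrative name. Running it with go run -race lets the race detector check the synchronization in addition to the value invariant.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

var ErrKeyNotFound = errors.New("key not found")

// store is a hypothetical stand-in for the SkipList; only the test shape matters here.
type store struct {
	mu sync.RWMutex
	m  map[string][]byte
}

func (s *store) Put(k string, v []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[k] = v
}

func (s *store) Get(k string) ([]byte, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[k]
	if !ok {
		return nil, ErrKeyNotFound
	}
	return v, nil
}

// strictConcurrency runs one writer against `readers` goroutines and returns
// every invariant violation the readers observed.
func strictConcurrency(s *store, writes, readers int) []string {
	var (
		wg         sync.WaitGroup
		mu         sync.Mutex
		violations []string
	)
	done := make(chan struct{})

	wg.Add(1)
	go func() {
		defer wg.Done()
		defer close(done) // tells readers the writer has finished
		for i := 0; i < writes; i++ {
			s.Put("shared", []byte(fmt.Sprintf("value-%d", i)))
		}
	}()

	for r := 0; r < readers; r++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-done:
					return
				default:
				}
				v, err := s.Get("shared")
				if errors.Is(err, ErrKeyNotFound) {
					continue // key not written yet: allowed
				}
				if err != nil || !strings.HasPrefix(string(v), "value-") {
					mu.Lock()
					violations = append(violations, fmt.Sprintf("got %q, %v", v, err))
					mu.Unlock()
				}
			}
		}()
	}
	wg.Wait()
	return violations
}

func main() {
	s := &store{m: make(map[string][]byte)}
	fmt.Println(len(strictConcurrency(s, 1000, 8))) // 0
}
```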

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant LogWriter
  participant BatchWorker
  participant FileSystem
  Client->>LogWriter: Append(record)
  LogWriter->>BatchWorker: enqueue commitTicket
  BatchWorker->>FileSystem: write batch of frames
  FileSystem->>FileSystem: fsync()
  FileSystem-->>BatchWorker: write/fsync result
  BatchWorker-->>LogWriter: complete tickets (errors)
  LogWriter-->>Client: Append result
sequenceDiagram
  participant Startup
  participant Replay
  participant FileSystem
  participant Unmarshal
  participant MemTable
  Startup->>Replay: Replay(directory, memtable)
  Replay->>FileSystem: open & read segment header+payload
  FileSystem->>Unmarshal: pass frame
  Unmarshal->>Replay: validate CRC/unmarshal
  Replay->>MemTable: Put/Delete applied
  Replay-->>Startup: return highest segment id or error

Assessment against linked issues

Objective Addressed Explanation
Define binary format for WAL records with checksum, length, opcode, key, value [#4]
Implement append-only file writer with file.Sync() for durability [#4]
Write crash-recovery routine to read and replay WAL operations into MemTable [#4]
Add concurrency tests for SkipList thread-safety with Put, Get, Delete [#21]

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread internal/storage/wal/recovery.go Fixed
Signed-off-by: Souvik Mukherjee <115608850+Souvik606@users.noreply.github.com>
@rahulc0dy rahulc0dy changed the title from Feat/storage engine/wal to @coderabbitai on Jun 5, 2026
rahulc0dy (Contributor) commented:

@coderabbitai set pr title


coderabbitai Bot commented Jun 5, 2026


Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch current PR details
gh pr view 22 --repo makeshift-engineering/penguin-db --json title,body,files

Length of output: 1921


🏁 Script executed:

#!/bin/bash
gh pr edit 22 --repo makeshift-engineering/penguin-db --title "feat: implement Write-Ahead Log (WAL) with recovery and skip list concurrency improvements"

Length of output: 464



@rahulc0dy Unfortunately, the command to update the PR title is blocked due to workflow restriction settings. You can set it manually to the following suggested title:

feat: implement Write-Ahead Log (WAL) with recovery and skip list concurrency improvements

This title reflects the key changes in the PR:

  • WAL implementation (writer, recovery, record marshaling, CRC validation, error handling)
  • Skip list concurrency improvements

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
internal/storage/wal/writer.go (1)

107-116: ⚡ Quick win

Consider clearing batch slice references to avoid potential memory retention.

Lines 108–109 reuse commitBatch and writeBuffer by truncating to [:0] and appending new data. While the slices are reused efficiently, the underlying array may still hold pointers to old commitTicket objects if a previous batch was larger than the current one. Those tickets will not be garbage-collected until the array is reallocated or overwritten.

For example, if one batch has 10,000 tickets and the next has 10, the array retains 9,990 stale ticket pointers until they are eventually overwritten.

♻️ Proposed fix

Nil out ticket references after signaling to allow garbage collection:

 		for _, ticket := range commitBatch {
 			ticket.resultChan <- err
 		}
+		for i := range commitBatch {
+			commitBatch[i] = nil
+		}
 	}
 }

This ensures that completed tickets can be collected promptly, preventing retention of large batches in memory.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 02fd8400-1f9e-40da-8a94-9c5f32793bd2

📥 Commits

Reviewing files that changed from the base of the PR and between 885d5cc and bfa0f94.

📒 Files selected for processing (9)
  • internal/storage/memtable/skiplist_test.go
  • internal/storage/wal/.gitkeep
  • internal/storage/wal/errors.go
  • internal/storage/wal/record.go
  • internal/storage/wal/record_test.go
  • internal/storage/wal/recovery.go
  • internal/storage/wal/recovery_test.go
  • internal/storage/wal/writer.go
  • internal/storage/wal/writer_test.go

Comment thread internal/storage/memtable/skiplist_test.go
Comment thread internal/storage/memtable/skiplist_test.go
Comment thread internal/storage/wal/recovery.go
Comment thread internal/storage/wal/writer.go Outdated
Comment thread internal/storage/wal/writer.go
@rahulc0dy rahulc0dy changed the title from @coderabbitai to Implement Write-Ahead Log with recovery and skip list concurrency improvements on Jun 5, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/storage/wal/recovery.go (1)

43-48: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Check fmt.Sscanf return value to satisfy errcheck and handle malformed filenames.

The return value is unchecked. If a file named, for example, "abc.wal" exists, segmentID stays 0 while the file is still replayed, silently corrupting segment-ID tracking. Consider skipping files that don't match the expected pattern.

🛡️ Proposed fix
 	for _, fileName := range walFiles {
 		var segmentID int
-		fmt.Sscanf(fileName, "%d.wal", &segmentID)
+		// Check err too: "12abc.wal" yields n == 1 plus a format-mismatch error.
+		if n, err := fmt.Sscanf(fileName, "%d.wal", &segmentID); err != nil || n != 1 {
+			slog.Debug("skipping WAL file with unexpected name format", "file", fileName)
+			continue
+		}
 		if segmentID > highestSegmentID {
 			highestSegmentID = segmentID
 		}
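Note that fmt.Sscanf reports n == 1 for a name like "12abc.wal" (the %d matches before the literal ".wal" fails) and accepts "-3.wal", so a stricter alternative is to parse the stem with strconv and sort numerically. parseSegmentID and sortedSegments are hypothetical helpers; they use uint64 IDs where the PR uses int, and strings.CutSuffix requires Go 1.20.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strconv"
	"strings"
)

// parseSegmentID accepts only "<decimal digits>.wal"; names such as
// "abc.wal", "12abc.wal", or "-3.wal" are rejected outright.
func parseSegmentID(name string) (uint64, bool) {
	stem, ok := strings.CutSuffix(filepath.Base(name), ".wal")
	if !ok || stem == "" {
		return 0, false
	}
	id, err := strconv.ParseUint(stem, 10, 64) // no sign, no trailing junk
	if err != nil {
		return 0, false
	}
	return id, true
}

// sortedSegments orders segment names numerically (so "10.wal" follows
// "2.wal") and reports the highest ID; unparsable names are skipped.
func sortedSegments(names []string) ([]string, uint64) {
	type segment struct {
		id   uint64
		name string
	}
	var segs []segment
	for _, n := range names {
		if id, ok := parseSegmentID(n); ok {
			segs = append(segs, segment{id, n})
		}
	}
	sort.Slice(segs, func(i, j int) bool { return segs[i].id < segs[j].id })

	ordered := make([]string, len(segs))
	for i, s := range segs {
		ordered[i] = s.name
	}
	var highest uint64
	if len(segs) > 0 {
		highest = segs[len(segs)-1].id
	}
	return ordered, highest
}

func main() {
	ordered, highest := sortedSegments([]string{"10.wal", "2.wal", "abc.wal", "1.wal", "12abc.wal"})
	fmt.Println(ordered, highest) // [1.wal 2.wal 10.wal] 10
}
```

Filtering and ordering in one place also guarantees that the files used to compute the highest ID are exactly the files that get replayed.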

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 96173958-2688-4048-946d-a169d99a40ca

📥 Commits

Reviewing files that changed from the base of the PR and between bfa0f94 and 1903a89.

📒 Files selected for processing (4)
  • internal/storage/memtable/skiplist_test.go
  • internal/storage/wal/recovery.go
  • internal/storage/wal/recovery_test.go
  • internal/storage/wal/writer.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • internal/storage/memtable/skiplist_test.go
  • internal/storage/wal/writer.go
  • internal/storage/wal/recovery_test.go

Comment thread internal/storage/wal/recovery.go Outdated
coderabbitai Bot previously approved these changes Jun 5, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
internal/storage/wal/recovery_test.go (2)

228-230: ⚡ Quick win

Handle errors explicitly in test setup.

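These test setup blocks ignore errors when opening files. If os.OpenFile fails, f is nil; the subsequent Write and Close calls then return os.ErrInvalid (methods on a nil *os.File do not panic), and that error is discarded as well. The test then runs against an unmodified segment, so it can fail, or even pass, for the wrong reason, which makes the failure hard to diagnose.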

🛡️ Proposed fix for explicit error handling

For line 228-230:

-	f, _ := os.OpenFile(path, os.O_APPEND|os.O_WRONLY, 0644)
+	f, err := os.OpenFile(path, os.O_APPEND|os.O_WRONLY, 0644)
+	if err != nil {
+		t.Fatal(err)
+	}
 	f.Write(badFrame)
 	f.Close()

Apply similar changes to lines 262-264 and 290-293.

Also applies to: 262-264, 290-293


440-442: ⚡ Quick win

Handle errors explicitly in test setup.

Similar to earlier tests, these blocks ignore errors when opening files. Explicit error handling improves test diagnostics.

🛡️ Proposed fix

For line 440-442:

-	f, _ := os.OpenFile(path, os.O_APPEND|os.O_WRONLY, 0644)
+	f, err := os.OpenFile(path, os.O_APPEND|os.O_WRONLY, 0644)
+	if err != nil {
+		t.Fatal(err)
+	}
 	f.Write(hdr)
 	f.Close()

Apply a similar change to lines 474-476.

Also applies to: 474-476


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 88bf4d59-f5d8-441b-8b2e-562b6f2a4fc9

📥 Commits

Reviewing files that changed from the base of the PR and between 1903a89 and 4e34f85.

📒 Files selected for processing (8)
  • internal/storage/memtable/skiplist_test.go
  • internal/storage/wal/errors.go
  • internal/storage/wal/record.go
  • internal/storage/wal/record_test.go
  • internal/storage/wal/recovery.go
  • internal/storage/wal/recovery_test.go
  • internal/storage/wal/writer.go
  • internal/storage/wal/writer_test.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • internal/storage/wal/errors.go
  • internal/storage/memtable/skiplist_test.go
  • internal/storage/wal/record.go
  • internal/storage/wal/record_test.go
  • internal/storage/wal/writer.go
  • internal/storage/wal/recovery.go
  • internal/storage/wal/writer_test.go

Comment thread internal/storage/wal/recovery_test.go Outdated
coderabbitai Bot previously approved these changes Jun 5, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/storage/wal/record.go (1)

82-90: ⚡ Quick win

Validate the stored frame size during decode.

UnmarshalRecord never checks the encoded Frame Size field against len(frameData), so that header is effectively write-only. Adding that check would catch malformed frames earlier and make the framing contract self-consistent.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 51e5d53d-ef64-444a-a210-d805ac12f993

📥 Commits

Reviewing files that changed from the base of the PR and between 96c242b and 5283cbd.

📒 Files selected for processing (1)
  • internal/storage/wal/record.go

Comment thread internal/storage/wal/record.go

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/storage/wal/recovery.go (1)

129-138: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't truncate the segment on non-EOF payload read failures.

This branch turns any io.ReadFull error into Truncate(validBytes) + Sync(). If the underlying read fails for an actual disk/I/O reason, recovery will mutate the WAL and discard data instead of surfacing the read failure. Only io.EOF / io.ErrUnexpectedEOF should take the truncation path.

Suggested fix
 	payloadBuffer := make([]byte, payloadSizeBytes)
 	_, err = io.ReadFull(file, payloadBuffer)
 	if err != nil {
-		slog.Debug("unexpected EOF in payload, truncating segment",
-			"file", filepath.Base(filePath),
-			"valid_bytes", validBytes)
-
-		if truncErr := file.Truncate(validBytes); truncErr != nil {
-			return truncErr
-		}
-		return file.Sync()
+		if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
+			slog.Debug("unexpected EOF in payload, truncating segment",
+				"file", filepath.Base(filePath),
+				"valid_bytes", validBytes)
+
+			if truncErr := file.Truncate(validBytes); truncErr != nil {
+				return truncErr
+			}
+			return file.Sync()
+		}
+		return fmt.Errorf("unexpected disk error reading frame payload: %w", err)
 	}

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 0adeb482-0dc2-494b-be91-0ea2a0f91f38

📥 Commits

Reviewing files that changed from the base of the PR and between 5283cbd and 26218d7.

📒 Files selected for processing (2)
  • internal/storage/wal/recovery.go
  • internal/storage/wal/writer.go

Comment thread internal/storage/wal/writer.go Outdated
Comment thread internal/storage/wal/writer.go

info, err := file.Stat()
if err != nil {
file.Close()
@theMr17 theMr17 added the storage Core LSM-tree storage engine components, including memory buffers, disk I/O, and compaction. label Jun 10, 2026

Labels

storage Core LSM-tree storage engine components, including memory buffers, disk I/O, and compaction.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add tests to verify concurrency of SkipList
Implement Durability via Write-Ahead Log (WAL)

3 participants