Skip to content

iter_rows_keyset infinite loop when by column contains NULL values #1

Description

@mkoskinen

Summary

db.iter_rows_keyset(...) hangs indefinitely when the by column has NULL values. Reproduces on SQLite (aiosqlite) and Postgres (asyncpg, psycopg).

Repro (SQLite, no setup)

import asyncio
from etchdb import DB, Row


class T(Row):
    __table__ = "t"
    id: int
    label: str | None = None


async def main():
    db = await DB.from_url("sqlite+aiosqlite:///:memory:")
    try:
        await db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, label TEXT)")
        await db.execute("INSERT INTO t VALUES (1, NULL), (2, NULL), (3, 'a'), (4, 'b')")

        rows = []
        async for r in db.iter_rows_keyset(T, by="label", batch_size=2):
            rows.append((r.id, r.label))
            if len(rows) > 10:
                print(f"infinite loop: {len(rows)} yields, expected 4")
                print(f"  first 6: {rows[:6]}")
                return
        print(f"finished with {len(rows)} rows")
    finally:
        await db.close()


asyncio.run(main())

Expected: 4 rows, then terminates.
Actual: yields (1, None), (2, None) forever.

Also reproduces on Postgres

Same shape, slightly different trigger: Postgres sorts NULLs last (SQLite sorts them first), so the bug needs >= batch_size + 1 NULL rows so that one lands at the end of a non-final page.

# postgres: 3 NULL rows + 1 non-null + batch_size=2 reproduces
await db.execute("INSERT INTO t VALUES (1, NULL), (2, NULL), (3, NULL), (4, 'a')")

Root cause

In src/etchdb/db.py:294:

last_seen = rows[-1][by]   # becomes None if last row has NULL in `by`

In src/etchdb/sql/__init__.py:169:

if last_seen is not None:
    where_parts.append(f"{by} > {placeholder(len(params))}")

When last_seen is None, the cursor predicate is skipped entirely, so the next page is identical to the previous one - same rows, same last_seen=None, looping forever.

On the documented contract

The docstring says by must be "monotonic-ordered and unique enough that no two rows tie" - NULL technically violates that. But a silent hang is a much worse failure mode than the docstring's wording prepares the user for: no exception, no log line, just a coroutine that never ends. Easy to ship to production unnoticed.

Fix shape - two options

Option A (small, narrow): runtime guard. Keeps the single-column keyset narrow, converts the hang into a clear error.

last_seen = rows[-1][by]
if last_seen is None:
    raise ValueError(
        f"iter_rows_keyset: {by!r} is NULL at the page boundary; the "
        f"cursor cannot advance. Pick a non-nullable column for `by`, "
        f"or filter NULLs out via raw SQL."
    )

This is the minimum to stop the silent hang. Doesn't address tied non-NULL values, which silently drop rows from the same root cause (WHERE by > last_seen can't advance through equal values either) — separate issue, same family.

Option B (bigger): compound cursor (by, pk) with NULL-aware comparison (IS NOT DISTINCT FROM on Postgres, IS on SQLite). Tie-breaks on PK so the cursor can advance through both NULL rows and tied by values. Fixes the NULL hang and the silent-tie-skip together. Bigger surface - dialect-specific SQL, more involved testing.

Environment

  • etchdb 0.6.0
  • Python 3.14
  • aiosqlite 0.22.1 / asyncpg 0.31.0 / psycopg 3.3.4

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions