Use raw SQLite executemany for bulk database writes #268

Merged
cmutel merged 3 commits into main from feature/executemany-writes on May 13, 2026
@cmutel cmutel commented May 13, 2026

Summary

Extracted from #255 by Raphael Jolivet — isolating just the executemany bulk-insert optimization without the schema normalization, pickle→JSON, or drop_metadata changes from that PR.

  • Adds insert_many_activities() and insert_many_exchanges() to schema.py, each using cursor.executemany() on the raw SQLite connection to bypass Peewee ORM overhead per row
  • _efficient_write_dataset() no longer flushes batches of 125 mid-loop. That limit existed because Peewee's insert_many builds one large INSERT ... VALUES (?, ?...), (?, ?...) statement, which can exceed SQLite's 999-variable limit (the default SQLITE_MAX_VARIABLE_NUMBER before SQLite 3.32.0). executemany runs a single prepared statement once per row, so only one row's placeholders count against the limit and no batching is needed
  • _efficient_write_many_data() collects all datasets in one pass, then calls the two new functions once
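
For readers unfamiliar with the difference, here is a minimal sketch of the executemany pattern using only the standard-library sqlite3 module. The table and column names are hypothetical and do not reflect the actual schema in schema.py; the point is only that the statement has a fixed number of placeholders regardless of how many rows are written.

```python
import sqlite3

# Hypothetical table for illustration; the real schema has different columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exchange (input_code TEXT, output_code TEXT, amount REAL)"
)

rows = [(f"in-{i}", f"out-{i}", float(i)) for i in range(10_000)]

# One prepared statement with three placeholders, executed once per row.
# The number of bound variables per statement stays at 3, however many
# rows are passed, so no batching is needed to stay under a variable limit.
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO exchange (input_code, output_code, amount) VALUES (?, ?, ?)",
        rows,
    )

inserted = conn.execute("SELECT COUNT(*) FROM exchange").fetchone()[0]
```

A multi-row INSERT ... VALUES statement for the same data would need 3 placeholders per row in a single statement, which is why the previous code had to cap the batch size.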

Co-authored-by: Raphael Jolivet <contact@raphael-jolivet.name>

Test plan

  • All 178 existing tests pass
  • Manual benchmark against ecoinvent import to verify speedup

Replace Peewee ORM insert_many() calls in _efficient_write_many_data with
raw cursor.executemany(), bypassing per-row ORM overhead. The old approach
batched in groups of 125 to stay under SQLite's 999-variable limit; executemany
uses a single prepared statement executed repeatedly, so no such limit applies
and the full dataset can be flushed in one call.

New functions insert_many_activities() and insert_many_exchanges() in schema.py
handle the raw insertion. _efficient_write_dataset() no longer returns lists or
flushes mid-loop; _efficient_write_many_data() calls the new functions once after
iterating all datasets.

Co-authored-by: Raphael Jolivet <contact@raphael-jolivet.name>
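
The collect-then-flush flow described in this commit message could look roughly like the sketch below. This is not the code from the PR: insert_many_exchanges_sketch and collect_exchange_rows are hypothetical stand-ins, and the table layout and dataset shape are simplified assumptions chosen to keep the example self-contained.

```python
import sqlite3


def insert_many_exchanges_sketch(conn, rows):
    """Hypothetical stand-in for insert_many_exchanges(): one executemany call."""
    with conn:  # single transaction for the whole batch
        conn.executemany(
            "INSERT INTO exchange (output_code, input_code, amount) VALUES (?, ?, ?)",
            rows,
        )


def collect_exchange_rows(datasets):
    """Walk every dataset once and gather rows, without writing mid-loop."""
    rows = []
    for output_code, dataset in datasets.items():
        for exc in dataset.get("exchanges", []):
            rows.append((output_code, exc["input"], exc["amount"]))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exchange (output_code TEXT, input_code TEXT, amount REAL)"
)

# Simplified, assumed dataset shape for illustration only.
datasets = {
    "steel": {"exchanges": [{"input": "iron", "amount": 1.2},
                            {"input": "coal", "amount": 0.5}]},
    "iron": {"exchanges": [{"input": "ore", "amount": 2.0}]},
    "ore": {},
}

all_rows = collect_exchange_rows(datasets)      # first pass: collect everything
insert_many_exchanges_sketch(conn, all_rows)    # then write once
exchange_count = conn.execute("SELECT COUNT(*) FROM exchange").fetchone()[0]
```

Collecting everything first trades some memory for fewer round trips through the database layer, which is the trade-off this change appears to make.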
cmutel added 2 commits May 13, 2026 20:10
SQLite's raw executemany cannot bind Python tuples to TEXT columns,
but locations like ("foo", "bar") are valid in brightway. Coerce
location to str() before binding, matching Peewee's TextField behavior.
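
The binding problem can be reproduced with the standard-library sqlite3 module alone. The sketch below uses a hypothetical two-column table and a hypothetical helper, coerce_location; leaving None untouched so that it is stored as NULL is an assumption of this sketch, not something the commit message states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (code TEXT, location TEXT)")

# The second row carries a tuple location such as ("foo", "bar").
rows = [("a1", "GLO"), ("a2", ("foo", "bar"))]
sql = "INSERT INTO activity (code, location) VALUES (?, ?)"

# sqlite3 has no default adapter for tuples, so binding raises an error.
# The exact exception class differs between Python versions, so catch the
# common base class sqlite3.Error.
try:
    conn.executemany(sql, rows)
    raw_bind_failed = False
except sqlite3.Error:
    raw_bind_failed = True
conn.rollback()  # discard any row inserted before the failure


def coerce_location(value):
    """Hypothetical helper: stringify non-string locations, keep None as NULL."""
    if value is None or isinstance(value, str):
        return value
    return str(value)


safe_rows = [(code, coerce_location(location)) for code, location in rows]
with conn:
    conn.executemany(sql, safe_rows)

stored = conn.execute(
    "SELECT location FROM activity WHERE code = 'a2'"
).fetchone()[0]
```

After coercion the tuple is stored as its str() representation, so code reading the value back receives a string rather than a tuple.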
@cmutel cmutel merged commit 106d627 into main May 13, 2026
9 checks passed
@cmutel cmutel deleted the feature/executemany-writes branch May 13, 2026 21:10
cmutel added a commit that referenced this pull request May 13, 2026
@cmutel cmutel mentioned this pull request May 13, 2026
2 tasks
cmutel added a commit that referenced this pull request May 14, 2026
cmutel added a commit that referenced this pull request May 14, 2026
* Release 4.7

Add changelog entry for 4.7 release.

* Add #268 to 4.7 changelog

* Add #270 to 4.7 changelog
@cmutel cmutel self-assigned this May 15, 2026