Skip to content

[wip] investigate soak test failures#1791

Draft
alxndrsn wants to merge 107 commits into
getodk:masterfrom
alxndrsn:soak-test-test
Draft

[wip] investigate soak test failures#1791
alxndrsn wants to merge 107 commits into
getodk:masterfrom
alxndrsn:soak-test-test

Conversation

@alxndrsn
Copy link
Copy Markdown
Contributor

@alxndrsn alxndrsn commented Apr 1, 2026

Ref getodk/central#1734

areas to investigate

  • idle queries?
  • dangling queries
    • which?
    • how long do they dangle?
      • run with massive timeout, and confirm if/no failure
      • suite: 9 minutes before cleanup
      • individual query:
    • what is the cause?
  • zip 500 errors
    • recreate?
    • cause: connection pool exhaustion
    • should these cause suite-failure? currently acceptance threshold for the zip test is 0%!
  • if all connections have drained, why are there still open queries?

spin-off PRs

@alxndrsn
Copy link
Copy Markdown
Contributor Author

From e.g. https://github.com/getodk/central-backend/actions/runs/24929884345/job/73005829220?pr=1791, it looks like a request can complete (unclear if 200 or 500), but leave a dangling connection:

2026-04-25T11:36:38.4550676Z }
2026-04-25T11:36:38.4551428Z [2026-04-25T11:36:38.453Z] INFO [test/e2e/soak] iterate(238292720085693) completed; openRequests: -998
2026-04-25T11:36:38.8282486Z [2026-04-25T11:36:38.827Z] INFO [test/e2e/soak] iterate(1543570849309272) returned: 861481 bytes
2026-04-25T11:36:38.8283582Z [2026-04-25T11:36:38.828Z] INFO [test/e2e/soak] iterate(1543570849309272) completed; openRequests: -1000
2026-04-25T11:36:38.8573775Z [2026-04-25T11:36:38.857Z] INFO [test/e2e/soak] All connections have completed.
2026-04-25T11:36:38.8603946Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak] --------------------------
2026-04-25T11:36:38.8605405Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak]               Test: exportZipWithDataAndMedia
2026-04-25T11:36:38.8606473Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak]      Test duration: 300000
2026-04-25T11:36:38.8607458Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak]     Total requests: 1000
2026-04-25T11:36:38.8608480Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak] Success % required: 0%
2026-04-25T11:36:38.8609447Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak]          Successes: 193 (19%)
2026-04-25T11:36:38.8610549Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak]         Throughput: 0.6 reqs/s
2026-04-25T11:36:38.8611378Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak]           Failures: 807
2026-04-25T11:36:38.8612010Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak]     Response times:
2026-04-25T11:36:38.8612861Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak]               mean: 17.2s
2026-04-25T11:36:38.8613500Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak]                min: 4771 ms
2026-04-25T11:36:38.8614123Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak]                max: 26559 ms
2026-04-25T11:36:38.8615263Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak]     Response sizes:
2026-04-25T11:36:38.8616135Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak]                min: 861481 b
2026-04-25T11:36:38.8617156Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak]                max: 861481 b
2026-04-25T11:36:38.8618149Z [2026-04-25T11:36:38.860Z] REPORT [test/e2e/soak]             Errors:
2026-04-25T11:36:38.8622314Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak]               * 500: {"message":"Internal Server Error","stack":"ConnectionError: timeout exceeded when trying to connect\n    at Object.createConnection (/home/runner/work/central-backend/central-backend/node_modules/slonik/dist/src/factories/createConnection.js:55:23)\n    at runNextTicks (node:internal/process/task_queues:64:5)\n    at listOnTimeout (node:internal/timers:567:9)\n    at process.processTimers (node:internal/timers:541:7)\n    at async Promise.all (index 2)"}
2026-04-25T11:36:38.8628649Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak]               * 500: {"message":"Internal Server Error","stack":"ConnectionError: timeout exceeded when trying to connect\n    at Object.createConnection (/home/runner/work/central-backend/central-backend/node_modules/slonik/dist/src/factories/createConnection.js:55:23)\n    at runNextTicks (node:internal/process/task_queues:64:5)\n    at listOnTimeout (node:internal/timers:567:9)\n    at process.processTimers (node:internal/timers:541:7)\n    at async Promise.all (index 0)"}
2026-04-25T11:36:38.8634026Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak]               * 500: {"message":"Internal Server Error","stack":"ConnectionError: timeout exceeded when trying to connect\n    at Object.createConnection (/home/runner/work/central-backend/central-backend/node_modules/slonik/dist/src/factories/createConnection.js:55:23)\n    at async fullzip (/home/runner/work/central-backend/central-backend/lib/resources/submissions.js:346:31)"}
2026-04-25T11:36:38.8640365Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak]               * 500: {"message":"Internal Server Error","stack":"ConnectionError: timeout exceeded when trying to connect\n    at Object.createConnection (/home/runner/work/central-backend/central-backend/node_modules/slonik/dist/src/factories/createConnection.js:55:23)\n    at runNextTicks (node:internal/process/task_queues:64:5)\n    at listOnTimeout (node:internal/timers:567:9)\n    at process.processTimers (node:internal/timers:541:7)\n    at async fullzip (/home/runner/work/central-backend/central-backend/lib/resources/submissions.js:346:31)"}
2026-04-25T11:36:38.8646751Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak]               * 500: {"message":"Internal Server Error","stack":"ConnectionError: timeout exceeded when trying to connect\n    at Object.createConnection (/home/runner/work/central-backend/central-backend/node_modules/slonik/dist/src/factories/createConnection.js:55:23)\n    at runNextTicks (node:internal/process/task_queues:64:5)\n    at listOnTimeout (node:internal/timers:567:9)\n    at process.processTimers (node:internal/timers:541:7)"}
2026-04-25T11:36:38.8652826Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak]               * 500: {"message":"Internal Server Error","stack":"ConnectionError: timeout exceeded when trying to connect\n    at Object.createConnection (/home/runner/work/central-backend/central-backend/node_modules/slonik/dist/src/factories/createConnection.js:55:23)\n    at runNextTicks (node:internal/process/task_queues:64:5)\n    at listOnTimeout (node:internal/timers:567:9)\n    at process.processTimers (node:internal/timers:541:7)\n    at async fullzip (/home/runner/work/central-backend/central-backend/lib/resources/submissions.js:344:20)"}
2026-04-25T11:36:38.8658328Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak]               * 500: {"message":"Internal Server Error","stack":"ConnectionError: timeout exceeded when trying to connect\n    at Object.createConnection (/home/runner/work/central-backend/central-backend/node_modules/slonik/dist/src/factories/createConnection.js:55:23)"}
2026-04-25T11:36:38.8663143Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak]               * 500: {"message":"Internal Server Error","stack":"ConnectionError: timeout exceeded when trying to connect\n    at Object.createConnection (/home/runner/work/central-backend/central-backend/node_modules/slonik/dist/src/factories/createConnection.js:55:23)\n    at async fullzip (/home/runner/work/central-backend/central-backend/lib/resources/submissions.js:344:20)"}
2026-04-25T11:36:38.8669455Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak]               * 500: {"message":"Internal Server Error","stack":"ConnectionError: timeout exceeded when trying to connect\n    at Object.createConnection (/home/runner/work/central-backend/central-backend/node_modules/slonik/dist/src/factories/createConnection.js:55:23)\n    at runNextTicks (node:internal/process/task_queues:64:5)\n    at listOnTimeout (node:internal/timers:567:9)\n    at process.processTimers (node:internal/timers:541:7)\n    at async Promise.all (index 5)"}
2026-04-25T11:36:38.8675144Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak]               * 500: {"message":"Internal Server Error","stack":"ConnectionError: timeout exceeded when trying to connect\n    at Object.createConnection (/home/runner/work/central-backend/central-backend/node_modules/slonik/dist/src/factories/createConnection.js:55:23)\n    at async Promise.all (index 5)"}
2026-04-25T11:36:38.8679696Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak]               * 500: {"message":"Internal Server Error","stack":"ConnectionError: timeout exceeded when trying to connect\n    at Object.createConnection (/home/runner/work/central-backend/central-backend/node_modules/slonik/dist/src/factories/createConnection.js:55:23)\n    at async Promise.all (index 0)"}
2026-04-25T11:36:38.8682253Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak] --------------------------
2026-04-25T11:36:38.8683131Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak] !!!
2026-04-25T11:36:38.8683853Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak] !!!
2026-04-25T11:36:38.8685037Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak] !!! REQUEST FAILURES MAY AFFECT SUBSEQUENT SOAK TESTS!
2026-04-25T11:36:38.8686020Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak] !!!
2026-04-25T11:36:38.8686740Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak] !!!
2026-04-25T11:36:38.8687760Z [2026-04-25T11:36:38.861Z] REPORT [test/e2e/soak] --------------------------
2026-04-25T11:36:38.8688834Z [2026-04-25T11:36:38.861Z] INFO [test/e2e/soak] Check for extra logs at /tmp/soak-tester-logs
2026-04-25T11:36:38.8689803Z [2026-04-25T11:36:38.861Z] INFO [test/e2e/soak] Complete.
2026-04-25T11:36:38.8795446Z [test/e2e/soak/run-tests] Checking open DB query count...
2026-04-25T11:36:39.1093800Z Open queries: [{...

alxndrsn added a commit that referenced this pull request Apr 28, 2026
Convert `doSoakTest()` function arguments to named object properties.  This makes it much easier to see what each number is, which is very helpful when understanding, modifying and investigating soak test failures, changes etc.

Ref #1791
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant