
DSL tracing infrastructure, execution safety fixes, and monitoring support #2

Open
kmdvs wants to merge 6 commits into master from cleanup-history

kmdvs commented Mar 16, 2026

Summary

This PR introduces tracing support for DSL algorithm execution and improves the reliability of DSL evaluation in the Condenser pipeline.

The changes allow developers to inspect each step of DSL algorithm execution, making debugging of scraping pipelines significantly easier.

In addition, several fixes improve execution safety and thread isolation.

Finally, websites gain a monitorable attribute so they can be flagged for monitoring.


Key Changes

DSL tracing infrastructure

  • Introduce a DSL algorithm runner
  • Add trace collection during algorithm execution
  • Expose trace data in the UI when tracing is enabled
  • Improve debugging visibility for DSL algorithm steps
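
As a rough illustration of the runner and trace collection described above, here is a minimal sketch. It is not the code in app/services/dsl/: the class names TraceCollector and AlgorithmRunner, the [label, callable] step format, and the trace: keyword are all assumptions made for this example. It only shows the general shape: each step receives the previous step's output, and an entry is recorded per step when tracing is switched on.

```ruby
# Hypothetical names throughout: TraceCollector, AlgorithmRunner, and the
# [label, callable] step format are illustrative, not taken from this PR.

# Collects one entry per executed step.
class TraceCollector
  Entry = Struct.new(:index, :label, :input, :output, :error, keyword_init: true)

  attr_reader :entries

  def initialize
    @entries = []
  end

  def record(index:, label:, input:, output: nil, error: nil)
    @entries << Entry.new(index: index, label: label, input: input,
                          output: output, error: error)
  end
end

# Runs steps in order, feeding each step's output into the next one.
class AlgorithmRunner
  # steps: array of [label, callable] pairs; tracing is off unless requested.
  def initialize(steps, trace: false)
    @steps = steps
    @trace_enabled = trace
    @collector = nil
  end

  # Entries from the most recent run, or nil when tracing is disabled.
  def trace
    @collector&.entries
  end

  def run(input)
    # A fresh collector per run keeps entries from earlier runs out of the trace.
    @collector = @trace_enabled ? TraceCollector.new : nil
    @steps.each_with_index.reduce(input) do |value, ((label, step), index)|
      run_step(index, label, step, value)
    end
  end

  private

  # Records the outcome of a single step; a failing step is recorded and re-raised.
  def run_step(index, label, step, value)
    output = step.call(value)
    @collector&.record(index: index, label: label, input: value, output: output)
    output
  rescue StandardError => e
    @collector&.record(index: index, label: label, input: value, error: e.message)
    raise
  end
end

steps = [
  ["strip",  ->(s) { s.strip }],
  ["upcase", ->(s) { s.upcase }]
]
runner = AlgorithmRunner.new(steps, trace: true)
runner.run("  hello ")
runner.trace.each { |e| puts "#{e.index} #{e.label}: #{e.input.inspect} -> #{e.output.inspect}" }
```

In this sketch the trace stays readable from the runner after a step raises, so a view could still show the steps that ran before the failure.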

DSL execution fixes

  • Ensure thread-safe execution of DSL steps
  • Support local lambdas across algorithm steps
  • Properly propagate abort conditions from Wringer
  • Reset thread-local DSL state between executions
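
The fixes listed above can be pictured with the following sketch. It rests on assumptions, not on the PR's code: DslContext, AbortExecution, and run_steps are hypothetical names, and AbortExecution merely stands in for an abort condition raised by Wringer. The idea shown is that per-thread state is created fresh for each run, shared between the steps of that run (so a lambda defined in one step can be called in a later one), and cleared in an ensure clause even when a step aborts.

```ruby
# Hypothetical sketch: DslContext, AbortExecution, and run_steps are
# illustrative names, not identifiers from this PR.

# Stand-in for an abort condition signalled by Wringer.
class AbortExecution < StandardError; end

module DslContext
  KEY = :dsl_context_state

  # State for the current execution. Thread.current[] storage is fiber-local
  # in Ruby; Thread#thread_variable_set is the thread-wide alternative.
  def self.state
    Thread.current[KEY] ||= { lambdas: {} }
  end

  def self.reset!
    Thread.current[KEY] = nil
  end

  # Yields fresh state and always clears it afterwards, including when the
  # block raises, so nothing leaks into the next execution on this thread.
  def self.with_fresh_state
    reset!
    yield state
  ensure
    reset!
  end
end

# Each step receives the previous value and the shared per-run state.
# AbortExecution is deliberately not rescued here, so it reaches the caller.
def run_steps(steps, input)
  DslContext.with_fresh_state do |state|
    steps.reduce(input) { |value, step| step.call(value, state) }
  end
end

steps = [
  # Step 1 defines a local lambda and passes the value through unchanged.
  ->(value, state) { state[:lambdas][:double] = ->(x) { x * 2 }; value },
  # Step 2 calls the lambda defined by step 1.
  ->(value, state) { state[:lambdas][:double].call(value) }
]
puts run_steps(steps, 21) # prints 42
```

Because the state lives under Thread.current, two threads running run_steps at the same time never see each other's lambdas or intermediate values.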

Website monitoring support

  • Add monitorable attribute to websites
  • Allow marking websites for monitoring/debugging purposes

Database Changes

A migration is included:

add_monitorable_to_websites

This adds a column:

websites.monitorable :boolean

The column indicates whether a website should be monitored/debugged.


Tests

Added and updated tests:

  • dsl_algorithm_runner_test
  • dsl_content_parser_test
  • statements_helper_test

These tests verify:

  • correct DSL algorithm execution
  • algorithm parsing
  • tracing behavior

Motivation

Debugging DSL scraping pipelines has historically been difficult because intermediate algorithm steps were not visible.

Tracing provides visibility into each algorithm step (XPath extraction, Ruby transformations, etc.), allowing developers to quickly identify failures.

The execution fixes also prevent state leakage between DSL runs and improve stability under concurrent execution.


Deployment Notes

After deploying, run:
rails db:migrate

No breaking schema changes are expected.


Review Notes

The main implementation areas are:

  • app/services/dsl/ (DSL execution and tracing)
  • statements_controller / views (trace UI integration)
  • migration adding websites.monitorable

Tracing is optional and only enabled when explicitly requested.

Introduce DSL algorithm runner and trace collector for
debugging scraping steps. Update statements controller
and views to conditionally render trace data when enabled.
Add options to toggle trace in UI, improve wringer error
handling, and introduce trace partial.

Also update routes and helpers to support the new trace
feature.

Add monitoring capability for websites and improve DSL execution
pipeline.

Includes:
- monitoring flag for websites
- DSL runner improvements
- additional tests for parser and algorithm runner
kmdvs requested a review from saumier March 16, 2026 19:42
kmdvs added the enhancement label (New feature or request) Mar 16, 2026
saumier removed their assignment May 16, 2026
saumier (Member) left a comment

Hi Kim. I am not able to review this. These changes are too big and I don't understand enough to add value here.

