feat(extractors): conditional Extractors with when Blocks #51

@maycon

Feature Request: Conditional Extractors with when Blocks

Summary

Add support for conditional execution of extractors using when blocks, similar to the existing functionality in state transitions. This would allow extractors to run only when specific conditions are met, improving flexibility and reducing unnecessary processing.

Motivation

Currently, all extractors defined in a state are executed unconditionally. This creates several limitations:

Problem 1: Error-prone extraction

When a response doesn't contain the expected data (e.g., 404 error, validation failure), extractors still attempt to run and may:

  • Generate misleading error messages
  • Extract incorrect values from error pages
  • Pollute the context with invalid data

Problem 2: Status-dependent extraction

Different HTTP status codes often require different extraction logic:

  • 200 OK → Extract session token
  • 401 Unauthorized → Extract error message
  • 429 Too Many Requests → Extract rate limit info

Currently, you must either:

  • Use complex regex patterns that handle all cases
  • Accept failed extractions in error scenarios
  • Create separate states for each status code

Problem 3: Content-type dependent extraction

JSON and HTML responses require different extractors, but currently every extractor runs regardless of the response's Content-Type.

Proposed Solution

Extend the extractor syntax to support optional when conditions, using the same syntax as state transitions:

Basic Syntax

states:
  login:
    request: |
      POST /api/login HTTP/1.1
      ...
    
    extract:
      # Conditional extractor - only runs when condition is true
      session_token:
        type: jpath
        pattern: "$.token"
        when:
          on_status: 200
      
      # Another condition example
      error_message:
        type: jpath
        pattern: "$.error"
        when:
          on_status: [400, 401, 403]

Advanced Examples

Example 1: Status-based extraction

extract:
  # Success case
  auth_token:
    type: regex
    pattern: 'token=([a-zA-Z0-9]+)'
    when:
      on_status: 200
  
  # Error cases
  error_code:
    type: jpath
    pattern: "$.error.code"
    when:
      on_status: [400, 401, 403, 500]
  
  rate_limit_reset:
    type: header
    pattern: "X-RateLimit-Reset: (\\d+)"
    when:
      on_status: 429

Example 2: Content-type based extraction

extract:
  # JSON extraction
  user_id:
    type: jpath
    pattern: "$.user.id"
    when:
      on_header:
        name: "Content-Type"
        pattern: "application/json"
  
  # HTML extraction
  csrf_token:
    type: regex
    pattern: 'name="csrf" value="([^"]+)"'
    when:
      on_header:
        name: "Content-Type"
        pattern: "text/html"

Example 3: Body pattern matching

extract:
  # Only extract if response contains success indicator
  transaction_id:
    type: regex
    pattern: 'transaction_id=([0-9]+)'
    when:
      on_body:
        pattern: "status.*success"
  
  # Extract error details only if error present
  error_details:
    type: jpath
    pattern: "$.error"
    when:
      on_body:
        pattern: '"error":'

Example 4: Multiple conditions (AND logic)

extract:
  api_key:
    type: jpath
    pattern: "$.data.api_key"
    when:
      on_status: 200
      on_header:
        name: "Content-Type"
        pattern: "application/json"
      on_body:
        pattern: '"api_key":'

Example 5: Race condition scenarios

states:
  race_attack:
    race:
      threads: 20
    
    request: |
      POST /api/redeem-coupon HTTP/1.1
      ...
    
    extract:
      # Only extract discount if successful
      discount_applied:
        type: jpath
        pattern: "$.discount"
        when:
          on_status: 200
      
      # Track failures separately
      already_used:
        type: jpath
        pattern: "$.error"
        when:
          on_status: 409
          on_body:
            pattern: "already.*used"

Implementation Details

Supported when Conditions

All condition types from state transitions should be supported:

Condition        | Description             | Example
on_status        | HTTP status code(s)     | on_status: 200 or on_status: [200, 201]
on_header        | Response header pattern | on_header: {name: "Content-Type", pattern: "json"}
on_body          | Response body pattern   | on_body: {pattern: "success"}
on_response_time | Response time threshold | on_response_time: {less_than: 100}
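
To make the semantics of the four condition types concrete, here is a minimal Python sketch of how a when block might be evaluated against a response. Everything in it is an assumption for illustration: the Response class, the evaluate_when name, treating patterns as regular expressions, and reading less_than as milliseconds are not taken from the existing codebase.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Response:
    """Hypothetical stand-in for the tool's response object."""
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""
    elapsed_ms: float = 0.0  # assumption: response time measured in milliseconds


def evaluate_when(when, response):
    """Return True only if every condition in the when block matches (AND logic)."""
    if not when:  # no when block: always run (backward compatible)
        return True
    checks = []
    if "on_status" in when:
        expected = when["on_status"]
        allowed = expected if isinstance(expected, list) else [expected]
        checks.append(response.status in allowed)
    if "on_header" in when:
        cond = when["on_header"]
        # Header names are case-insensitive, so normalize before lookup.
        headers = {k.lower(): v for k, v in response.headers.items()}
        value = headers.get(cond["name"].lower())
        checks.append(value is not None and re.search(cond["pattern"], value) is not None)
    if "on_body" in when:
        checks.append(re.search(when["on_body"]["pattern"], response.body) is not None)
    if "on_response_time" in when:
        checks.append(response.elapsed_ms < when["on_response_time"]["less_than"])
    return all(checks)
```

A missing header makes on_header false rather than raising, which keeps a skipped extractor silent. Unknown keys are ignored in this sketch; a real implementation would more likely reject them during schema validation.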

Behavior Specification

When condition is TRUE:

  • Extractor executes normally
  • If extraction fails, behavior depends on the extractor's required setting (see Error Handling)
  • Extracted value is added to context

When condition is FALSE:

  • Extractor is skipped entirely
  • No error is raised
  • No value is added to context
  • Variable remains undefined (or keeps previous value if already set)

When condition is UNDEFINED:

  • If when block is not present → Always execute (backward compatible)
  • If when block is present but cannot be evaluated (for example, it refers to response data that is missing) → Skip extraction
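
The TRUE / FALSE / UNDEFINED rules above can be sketched as a small execution loop. This is only an illustration under stated assumptions: run_extractors is a hypothetical name, only regex extractors and on_status are modeled, and the context is a plain dict.

```python
import re


def run_extractors(extractors, status, body, context):
    """Apply regex extractors to body, honoring an optional on_status condition.

    Returns the names of skipped extractors so a caller could log them.
    """
    skipped = []
    for name, cfg in extractors.items():
        when = cfg.get("when")
        if when is not None:
            expected = when.get("on_status")
            allowed = expected if isinstance(expected, list) else [expected]
            # A when block without on_status never matches here, which models
            # "present but undefined -> skip extraction".
            if status not in allowed:
                skipped.append(name)  # no error raised, context left untouched
                continue
        match = re.search(cfg["pattern"], body)
        if match:
            context[name] = match.group(1)
    return skipped
```

Because a skipped extractor never writes to the context, a variable set by an earlier state keeps its previous value, as the FALSE case above requires.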

Error Handling

extract:
  token:
    type: jpath
    pattern: "$.token"
    when:
      on_status: 200
    required: true  # Fail if condition is true but extraction fails

With required: true, the outcome depends on the condition:

  • required: true + condition FALSE → No error (extractor skipped)
  • required: true + condition TRUE + extraction fails → Error raised
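
A short Python sketch of the required / when interaction, assuming the condition is checked before required is consulted. The ExtractionError class and extract_one function are hypothetical names, and only a scalar on_status is handled to keep the example small.

```python
import re


class ExtractionError(Exception):
    """Hypothetical error type for a required extractor that found nothing."""


def extract_one(name, cfg, status, body):
    """Return the extracted value, or None if skipped or optional-and-missing."""
    when = cfg.get("when")
    if when is not None and status != when.get("on_status"):
        return None  # condition false: skipped, required is never consulted
    match = re.search(cfg["pattern"], body)
    if match:
        return match.group(1)
    if cfg.get("required", False):
        raise ExtractionError(f"required extractor '{name}' matched nothing")
    return None
```

Checking the condition first is what lets a required extractor stay silent on error responses while still failing loudly when a 200 response lacks the expected value.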

Use Cases

1. Multi-status Response Handling

# Current approach (problematic)
extract:
  token:
    type: jpath
    pattern: "$.token"  # Fails on error responses!

# Proposed approach (clean)
extract:
  token:
    type: jpath
    pattern: "$.token"
    when:
      on_status: 200
  
  error:
    type: jpath
    pattern: "$.error"
    when:
      on_status: [400, 401, 403, 500]

2. Rate Limit Handling

extract:
  # Extract data on success
  results:
    type: jpath
    pattern: "$.results"
    when:
      on_status: 200
  
  # Extract rate limit info on 429
  retry_after:
    type: header
    pattern: "Retry-After: (\\d+)"
    when:
      on_status: 429

3. Race Condition Analysis

states:
  race_coupon:
    race:
      threads: 50
    
    extract:
      # Count successes
      coupon_success:
        type: jpath
        pattern: "$.success"
        when:
          on_status: 200
      
      # Count rate limits (shouldn't happen in race!)
      rate_limited:
        type: literal
        value: true
        when:
          on_status: 429
      
      # Count conflicts
      already_used:
        type: literal
        value: true
        when:
          on_status: 409

4. Content-Type Negotiation

extract:
  # Try JSON first
  data_json:
    type: jpath
    pattern: "$.data"
    when:
      on_header:
        name: "Content-Type"
        pattern: "application/json"
  
  # Fallback to XML
  data_xml:
    type: xpath
    pattern: "//data"
    when:
      on_header:
        name: "Content-Type"
        pattern: "application/xml"

Benefits

  1. Cleaner Code: No need for complex regex patterns that handle all cases
  2. Better Error Messages: Extractors only run when expected to succeed
  3. Performance: Skip unnecessary extraction attempts
  4. Flexibility: Handle multiple response types in single state
  5. Debugging: Clear which extractors ran vs were skipped
  6. Race Conditions: Better analysis of different outcomes per thread

Backward Compatibility

Fully backward compatible

  • Existing configurations without when blocks continue to work unchanged
  • when block is optional
  • Default behavior (no when) = always execute (current behavior)

Related Features

This feature would pair well with:

  • State-level when blocks (already implemented)
  • Conditional state transitions (already implemented)
  • Logger conditional output (could be a future enhancement)

Example: Complete Login Flow

states:
  login:
    request: |
      POST /api/login HTTP/1.1
      Content-Type: application/json
      
      {"username": "{{ username }}", "password": "{{ password }}"}
    
    extract:
      # Success extractors
      session_token:
        type: jpath
        pattern: "$.token"
        when:
          on_status: 200
      
      user_id:
        type: jpath
        pattern: "$.user.id"
        when:
          on_status: 200
      
      # Error extractors
      error_message:
        type: jpath
        pattern: "$.error.message"
        when:
          on_status: [400, 401, 403]
      
      # Rate limit extractors
      retry_after:
        type: header
        pattern: "Retry-After: (\\d+)"
        when:
          on_status: 429
      
      rate_limit_remaining:
        type: header
        pattern: "X-RateLimit-Remaining: (\\d+)"
        when:
          on_status: 429
    
    logger:
      on_state_leave: |
        Status: {{ status }}
        {% if session_token %}
        ✅ Login successful! Token: {{ session_token[:16] }}...
        {% elif error_message %}
        ❌ Login failed: {{ error_message }}
        {% elif retry_after %}
        ⚠️  Rate limited. Retry after {{ retry_after }} seconds
        {% endif %}
    
    next:
      - on_status: 200
        goto: authenticated_action
      - on_status: 401
        goto: login_failed
      - on_status: 429
        goto: rate_limited

Implementation Checklist

Phase 1: Core Functionality

  • Add when field to ExtractorConfig model
  • Implement condition evaluation in extractor execution
  • Support on_status conditions
  • Support on_header conditions
  • Support on_body conditions
  • Update extractor execution logic to check conditions
  • Skip extractor if condition is false
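
As a rough idea of the first checklist item, the model change could be as small as one optional field. The sketch below uses a standard-library dataclass; the real ExtractorConfig may be defined differently (different fields, a different validation library), so the field names and from_dict helper here are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExtractorConfig:
    """Hypothetical shape of the model; the real ExtractorConfig may differ."""
    type: str
    pattern: str
    required: bool = False
    when: Optional[dict[str, Any]] = None  # None means "always execute"

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "ExtractorConfig":
        # Missing keys fall back to defaults, so existing configs load unchanged.
        return cls(
            type=raw["type"],
            pattern=raw["pattern"],
            required=raw.get("required", False),
            when=raw.get("when"),
        )
```

Defaulting when to None is what keeps configurations written before this feature valid without modification.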

Phase 2: Advanced Features

  • Support on_response_time conditions
  • Support multiple conditions (AND logic)
  • Add required field interaction with when
  • Logging for skipped extractors (debug mode)

Phase 3: Documentation

  • Update extractors.rst documentation
  • Add examples to README
  • Update schema validation
  • Add tests for conditional extractors
  • Update PortSwigger lab examples

Phase 4: Validation

  • Schema validation for when blocks
  • Error messages for invalid conditions
  • Warning for unreachable extractors (always false condition)
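
The validation items above could start from something like the following Python sketch. It is not the project's schema validator: validate_when, the message wording, and the choice to treat an empty on_status list as the "always false" case are all assumptions made for illustration.

```python
KNOWN_CONDITIONS = {"on_status", "on_header", "on_body", "on_response_time"}


def validate_when(name, when):
    """Return (errors, warnings) for one extractor's when block."""
    errors, warnings = [], []
    if not isinstance(when, dict):
        return [f"{name}: when must be a mapping"], warnings
    for key in when:
        if key not in KNOWN_CONDITIONS:
            errors.append(f"{name}: unknown condition '{key}'")
    status = when.get("on_status")
    if status is not None:
        codes = status if isinstance(status, list) else [status]
        if not codes:
            # An empty list can never match, so the extractor is unreachable.
            warnings.append(f"{name}: empty on_status list never matches")
        for code in codes:
            if not isinstance(code, int) or not 100 <= code <= 599:
                errors.append(f"{name}: invalid status code {code!r}")
    return errors, warnings
```

Returning errors and warnings separately lets the caller abort on a misspelled condition name while only reporting an unreachable extractor.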

Testing Strategy

# Test case: status-dependent extractors
test_conditional_extractors:
  request: GET /test
  
  extract:
    success_token:
      type: regex
      pattern: "token=([a-z]+)"
      when:
        on_status: 200
    
    error_code:
      type: regex
      pattern: "error=([0-9]+)"
      when:
        on_status: 400
  
  assertions:
    # On 200: success_token defined, error_code undefined
    # On 400: success_token undefined, error_code defined
    # On 500: both undefined
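
The three expected outcomes in the comments above could be checked with a unit test along these lines. The extract function is a self-contained stand-in written for this sketch, not the project's extractor engine, and the when block is flattened to a single on_status key for brevity.

```python
import re

EXTRACTORS = {
    "success_token": {"pattern": r"token=([a-z]+)", "on_status": 200},
    "error_code": {"pattern": r"error=([0-9]+)", "on_status": 400},
}


def extract(status, body):
    """Run each extractor only when its on_status matches the response status."""
    context = {}
    for name, cfg in EXTRACTORS.items():
        if status != cfg["on_status"]:
            continue  # skipped: variable stays undefined
        match = re.search(cfg["pattern"], body)
        if match:
            context[name] = match.group(1)
    return context


def test_conditional_extractors():
    body = "token=abc error=42"  # contains both patterns on purpose
    assert extract(200, body) == {"success_token": "abc"}
    assert extract(400, body) == {"error_code": "42"}
    assert extract(500, body) == {}


test_conditional_extractors()
```

Using a body that matches both patterns shows that the status condition, not the pattern, decides which variable gets defined.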

Alternative Designs Considered

Alternative 1: Separate extractor lists per status

extract:
  on_success:  # Only on 2xx
    token: {type: jpath, pattern: "$.token"}
  on_error:    # Only on 4xx/5xx
    error: {type: jpath, pattern: "$.error"}

Rejected because:

  • Less flexible (can't use custom conditions)
  • Harder to read for complex scenarios
  • Doesn't support other condition types

Alternative 2: Post-processing filter

extract:
  token:
    type: jpath
    pattern: "$.token"
  
filter_extractions:
  - if: "{{ status != 200 }}"
    remove: ["token"]

Rejected because:

  • Extraction still happens (performance cost)
  • More complex syntax
  • Errors still occur during extraction

Questions for Discussion

  1. Should we support OR logic for multiple conditions?

    when:
      any_of:
        - on_status: 200
        - on_status: 201

  2. Should we support negation?

    when:
      not:
        on_status: [400, 401, 403]

  3. Should skipped extractors be logged in verbose mode?

  4. Should we allow extractors to reference other extracted values in conditions?

    when:
      on_variable:
        name: "content_type"
        pattern: "json"
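
If questions 1 and 2 are answered with yes, any_of and not could be handled by a small recursive evaluator. The Python sketch below is one possible design rather than a decided one: the matches function is hypothetical and only on_status is modeled as a leaf condition.

```python
def matches(when, status):
    """Evaluate a when block that may nest any_of / not around on_status."""
    results = []
    for key, value in when.items():
        if key == "on_status":
            allowed = value if isinstance(value, list) else [value]
            results.append(status in allowed)
        elif key == "any_of":
            # OR: at least one nested block must match.
            results.append(any(matches(sub, status) for sub in value))
        elif key == "not":
            results.append(not matches(value, status))
        else:
            raise ValueError(f"unsupported condition: {key}")
    return all(results)  # sibling keys still combine with AND
```

Keeping AND as the default for sibling keys means existing when blocks would behave the same if these operators were added later.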

References

  • Current when block implementation in state transitions
  • Extractor documentation: docs/source/extractors.rst
  • Schema: schema/treco-config.schema.json
  • Related: PortSwigger time-sensitive lab (needs conditional extraction)

Priority: Medium-High
Complexity: Medium
Breaking Changes: None (fully backward compatible)
Related Labs: Time-sensitive, Rate-limit-bypass (would benefit from this)
