
Harden WAF ETL pipeline #4598

@btylerburton

Description

User Story

In order to harvest WAF sources effectively and at scale, datagovteam would like to harden the current WAF ETL pipeline.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN [a contextual precondition]
    [AND optionally another precondition]
    WHEN [a triggering event] happens
    THEN [a verifiable outcome]
    [AND optionally another verifiable outcome]

Background

[Any helpful contextual notes or links to artifacts/evidence, if needed]

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

  • add record partition logic to the harvesting logic repo
  • benchmark and report metrics on traversal and download (number of files vs. elapsed time), plus total processing time
  • get the number of WAF harvest sources
  • consider downloading the XML inside traversal instead of in a separate function, depending on whether the performance impact is noticeable
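
The benchmarking item above could be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: `benchmark_waf_harvest`, `StageMetrics`, and the `traverse`/`download` callables are hypothetical names standing in for whatever the harvesting logic repo exposes. It times traversal and download separately so that file count, per-stage time, and total processing time can be reported side by side.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class StageMetrics:
    """Timing and volume for one pipeline stage (hypothetical structure)."""
    name: str
    files: int
    seconds: float

    @property
    def files_per_second(self) -> float:
        # Avoid dividing by zero when a stage finishes instantly.
        return self.files / self.seconds if self.seconds > 0 else float("inf")


def benchmark_waf_harvest(
    traverse: Callable[[str], Iterable[str]],
    download: Callable[[str], bytes],
    root_url: str,
    clock: Callable[[], float] = time.perf_counter,
) -> List[StageMetrics]:
    """Time traversal and download separately, then report the total.

    `traverse` and `download` are assumed stand-ins for the real
    traversal and XML download functions.
    """
    start = clock()
    # Materialize the listing so traversal cost is measured on its own.
    urls = list(traverse(root_url))
    after_traverse = clock()
    for url in urls:
        download(url)
    after_download = clock()
    return [
        StageMetrics("traversal", len(urls), after_traverse - start),
        StageMetrics("download", len(urls), after_download - after_traverse),
        StageMetrics("total", len(urls), after_download - start),
    ]


if __name__ == "__main__":
    # Fake traversal and download functions, for illustration only.
    def fake_traverse(root: str) -> Iterable[str]:
        return [f"{root}record-{i}.xml" for i in range(3)]

    def fake_download(url: str) -> bytes:
        return b"<xml/>"

    for m in benchmark_waf_harvest(fake_traverse, fake_download, "https://example.gov/waf/"):
        print(f"{m.name}: {m.files} files in {m.seconds:.6f}s")
```

Running the same measurement against a variant that downloads inside traversal would give a like-for-like comparison for the last item in the sketch, assuming both variants process the same set of files.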

Metadata

Assignees

No one assigned

Type

No type

Projects

Status

🗄 Closed

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
