
Add lenient parsing mode #96

@rock3r


Right now, the parser refuses to parse any feed that doesn't strictly adhere to the specs. Alas, feeds out there are often incomplete and/or out of spec (e.g., specifying <itunes:explicit>explicit</itunes:explicit>). I think it would be very useful to have a lenient parsing mode, which would create a Podcast element even if the feed is non-compliant.

This is useful both for users of the library who want to deal with the zoo of malformed feeds out there (e.g., a player app or an indexing service), and for our own validation feature (see #46). In all likelihood, we will need to create a set of "raw" models (e.g., a RawPodcast) which contain all the data the "real" models have, but without proper typing and checks. Those raw models are read from the feed itself with minimal transformations, and can be easily validated and, if valid, converted into "real" models.
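To make the "raw" model idea concrete, here is a minimal sketch. It is written in Python purely for illustration and is not the library's API: RawPodcast is the name suggested above, while its fields, the read_raw helper, and the sample feed are assumptions. Every field is an optional string, so missing or out-of-spec values are captured as-is instead of causing a failure.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"

# Hypothetical sample feed: no <link>, and an out-of-spec <itunes:explicit> value.
FEED = f"""<rss version="2.0" xmlns:itunes="{ITUNES_NS}">
  <channel>
    <title>Example show</title>
    <itunes:explicit>explicit</itunes:explicit>
  </channel>
</rss>"""


@dataclass(frozen=True)
class RawPodcast:
    """Close-to-the-feed data: everything is an optional, unvalidated string."""
    title: Optional[str]
    link: Optional[str]
    explicit: Optional[str]


def read_raw(xml_text: str) -> RawPodcast:
    """Read the channel with minimal transformation and no validation."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        return RawPodcast(title=None, link=None, explicit=None)
    return RawPodcast(
        title=channel.findtext("title"),
        link=channel.findtext("link"),
        explicit=channel.findtext(f"{{{ITUNES_NS}}}explicit"),
    )


raw = read_raw(FEED)
```

In this sketch the non-compliant value survives the read step untouched, so a later validation step can report it rather than the parser rejecting the whole feed.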

This would allow us to:

  • Read a close-to-the-ground-truth version of feeds, including incomplete, invalid ones
  • Move the validation logic out of the builders and into validators
  • Transform the parsing logic to a three-step process:
    1. Read raw data
    2. Validate raw data
    3. Marshal valid raw data into final, properly typed models
    The final models could still be invalid (e.g., having no episodes) but they won't contain invalid data. Returning the validation results alongside the parsed model would allow users to decide whether they're ok with the feed being invalid, or attempt to remediate the issue with some custom logic
  • Expose the validation step, which would then be trivial; we could also expose a full list of validation issues, in case there are any (API TBD, but something like a sealed ValidationResult may be what we need)
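
The read/validate/marshal split and the result type could look roughly like the sketch below. It is Python for illustration only, and everything in it is an assumption: the API is explicitly TBD, the Valid/Invalid pair only approximates a sealed ValidationResult, and the set of accepted explicit values is a placeholder rather than a statement of what the spec allows. The raw model is taken as already read (step 1).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class RawPodcast:  # step 1 output: unvalidated strings (hypothetical fields)
    title: Optional[str]
    explicit: Optional[str]


@dataclass(frozen=True)
class Podcast:  # final, properly typed model
    title: str
    explicit: Optional[bool]


@dataclass(frozen=True)
class Valid:
    podcast: Podcast


@dataclass(frozen=True)
class Invalid:
    issues: Tuple[str, ...]
    podcast: Optional[Podcast]  # best-effort model, None if nothing usable


# Stand-in for a sealed type: a closed union of the two result classes.
ValidationResult = Union[Valid, Invalid]

# Assumed set of accepted values, for illustration only.
EXPLICIT_VALUES = {"true": True, "yes": True, "false": False, "no": False}


def validate(raw: RawPodcast) -> List[str]:
    """Step 2: collect every issue instead of failing on the first one."""
    issues = []
    if not raw.title:
        issues.append("missing <title>")
    if raw.explicit is not None and raw.explicit.strip().lower() not in EXPLICIT_VALUES:
        issues.append(f"unsupported <itunes:explicit> value: {raw.explicit!r}")
    return issues


def marshal(raw: RawPodcast) -> Optional[Podcast]:
    """Step 3: keep only data that converts cleanly; drop the rest."""
    if not raw.title:
        return None
    explicit = None
    if raw.explicit is not None:
        explicit = EXPLICIT_VALUES.get(raw.explicit.strip().lower())
    return Podcast(title=raw.title, explicit=explicit)


def parse_leniently(raw: RawPodcast) -> ValidationResult:
    issues = validate(raw)
    podcast = marshal(raw)
    if not issues and podcast is not None:
        return Valid(podcast)
    return Invalid(tuple(issues), podcast)


result = parse_leniently(RawPodcast(title="Example show", explicit="explicit"))
```

Here the out-of-spec value is dropped from the typed model and reported as an issue, so the caller can decide whether to accept the best-effort Podcast or apply custom remediation.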

The changes needed in both the infrastructure and the APIs to achieve this are massive, but I think it's very important for real-life usage. I realised how much I needed this when I first used this library last week for some hacking around feed merging...


Labels: enhancement (New feature or request)
