Skip to content

Xcrap Parser is a declarative, model-driven parser for extracting data from HTML and JSON files, with the ability to interleave both to extract even more information.

Notifications You must be signed in to change notification settings

Xcrap-Cloud/parser-python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Xcrap Parser

Xcrap Parser is a declarative, model-driven parser for extracting data from HTML and JSON files, with the ability to interleave both to extract even more information.

It is inspired by the parser embedded in the Xcrap Framework available for Node.js. It was built using Parsel for HTML parsing and JMESPath for JSON parsing.

Installation

pip install xcrap-parser

Simple Usage

from xcrap_parser import HtmlParsingModel

html = "<html><title>Title</title><body><h1>Heading</h1></body></html>"

root_parsing_model = HtmlParsingModel({
    "title": {
        "query": "title::text"
    },
    "heading": {
        "query": "h1::text"
    }
})

data = root_parsing_model.parse(html)

print(data)

About

Xcrap Parser is a declarative, model-driven parser for extracting data from HTML and JSON files, with the ability to interleave both to extract even more information.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages