Created for @fhooe by Marcel Skumantz
Please go ask @fhooe
- `open(path)`: opens the given string as a DOM node and returns that node
- `open_url(path, encoding='utf-8')`: opens a URL as a DOM node and returns that node
- `node.find_child_tags(tag)`: returns a list of the children in the node's content that have the given tag
- `node.find_child_tags_by_pattern(tag, pattern)`: returns a list of the children in the node's content that have the given tag and whose content matches the given pattern
- `node.attr`: returns a dictionary of the tag's attributes
- `node.content`: returns the content of the tag
- `node.type`: returns the type of the tag (`div`, `p`, etc.)
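To make the lookup semantics above concrete, here is a small self-contained sketch. It does not use RegexWebParser: the `Node` class below is a hypothetical stand-in that mirrors the documented `type`, `attr` and `content` fields, and its `children` list and `_descendants` helper are assumptions for illustration. It also assumes that "children in the content" means every nested tag (not just direct children), and that a pattern "matches" when `re.search` finds it anywhere in the content; the real library may behave differently.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in mirroring the documented node fields."""
    type: str                                      # tag type, e.g. "div" or "p"
    attr: dict = field(default_factory=dict)       # attribute name -> value
    content: str = ""                              # text content of the tag
    children: list = field(default_factory=list)   # assumed: nested child nodes

    def _descendants(self):
        # Depth-first walk over every node nested below this one.
        for child in self.children:
            yield child
            yield from child._descendants()

    def find_child_tags(self, tag):
        # Keep every nested node whose tag type equals `tag`.
        return [node for node in self._descendants() if node.type == tag]

    def find_child_tags_by_pattern(self, tag, pattern):
        # Assumption: a match means re.search finds `pattern` somewhere
        # in the node's content; the library may use a stricter rule.
        regex = re.compile(pattern)
        return [node for node in self.find_child_tags(tag)
                if regex.search(node.content)]


page = Node("body", children=[
    Node("table", children=[Node("caption", content="Version history")]),
    Node("p", content="Python is dynamically typed."),
    Node("table", attr={"class": "infobox"},
         children=[Node("caption", content="Python logo")]),
])

print([c.content for c in page.find_child_tags("caption")])
print([c.content for c in page.find_child_tags_by_pattern("caption", r"^Python")])
```

The first call collects both captions; the second keeps only the one whose content starts with "Python".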
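```python
import RegexWebParser as rwp

# Fetch the Wikipedia article on Python and parse it into a DOM node
root = rwp.open_url('https://en.wikipedia.org/wiki/Python_(programming_language)')
# Collect the caption tags found in the page
captions = root.find_child_tags("caption")
for caption in captions:
    print(caption)
```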