Prioritize the xmlRead* family and parser options so callers can control how documents are parsed.
Wrap xmlReadMemory and xmlReadFile rather than only xmlParseDoc / xmlParseFile, so callers can pass option flags and an explicit encoding.
xmlParserOption flags like:
XML_PARSE_RECOVER (recover from errors),
XML_PARSE_NOENT (substitute entities),
XML_PARSE_NOBLANKS (remove ignorable whitespace), etc.
In Pony, that likely means a new constructor, e.g. Xml2Doc.readFile(path, opts: U32) or an options record so you don’t have to add a new constructor for every combination.
Impact: callers decide how lenient or strict parsing is and whether ignorable whitespace is kept, which affects both error handling and the shape of the resulting tree.
Wrap the XmlTextReader API when you need streaming / low‑memory processing.
Core functions to prioritize:
xmlReaderForFile, xmlReaderForMemory to create a reader.
xmlTextReaderRead to drive the main loop (it returns 1 while nodes remain, 0 at end of input, and -1 on error).
Accessors used in Daniel Veillard’s tutorial: xmlTextReaderName, xmlTextReaderNodeType, xmlTextReaderDepth, xmlTextReaderValue, xmlTextReaderIsEmptyElement, and attribute navigation (xmlTextReaderMoveToNextAttribute, xmlTextReaderAttributeCount, etc.).
xmlFreeTextReader for cleanup.
This enables “SAX-like” streaming while still being libxml2‑idiomatic. It also integrates with your existing XPath support: xmlTextReaderExpand materializes the current node's subtree, which can then be used as an XPath context node when needed.
Provide a way to generate or modify XML and emit it cleanly.
Useful APIs:
xmlNewTextWriterDoc or xmlNewTextWriterMemory for in‑memory output.
xmlTextWriterStartDocument, xmlTextWriterStartElement, xmlTextWriterWriteAttribute, xmlTextWriterWriteString, xmlTextWriterEndElement, xmlTextWriterEndDocument.
xmlFreeTextWriter and access to the result buffer/doc.
You already have nodeDump; adding a structured writer gives callers a way to build or transform XML, not just inspect it.
You already wrap xmlGetProp; some small additions give a much nicer API for metadata‑heavy XML.
Namespace‑aware attribute access: wrap xmlGetNsProp and perhaps expose a small Xml2Attr wrapper over _xmlAttr.
Helpers for listing attributes on a node: iterate node->properties and return Array[(String, String)].
This is especially helpful for things like GObject Introspection XML where attributes and namespaces carry most of the semantics.
Build small, higher‑level functions on top of what you already have, using the existing XPath C APIs.
“Typed” helpers:
doc.xpathNodes("//foo") : Array[Xml2Node] iso^?
doc.xpathString("string(//foo)") : String val?
doc.xpathNumber("count(//foo)") : F64?
doc.xpathBool("boolean(//foo)") : Bool?
These are just thin wrappers around your existing xpathEval and Xml2XPathObject.apply, but greatly clean up call sites.
Pre‑compilation: expose xmlXPathCompile / xmlXPathCompiledEval as a reusable compiled expression type if you find yourself evaluating the same XPath repeatedly.
Not strictly “new functionality” but important for a polished library.
Error collection: wrap libxml2 error callbacks so callers can get structured error info from parsing/XPath instead of just getting None or an exception.
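xmlXPathOrderDocElems if you run many XPath queries against a static document: it stamps element nodes with their document order, which speeds up the comparisons used when sorting node sets. If you need to sort a node set into document order explicitly, xmlXPathNodeSetSort (declared in xpathInternals.h) does that.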
Core XInclude functions to wrap first
xmlXIncludeProcess
Signature: int xmlXIncludeProcess(xmlDocPtr doc);
Processes all xi:include elements in the given document, replacing them with the included content.
Return value is -1 on error, 0 if no substitutions were made, or the number of successful substitutions.
This is the primary “one-shot” API: for your Xml2Doc, a method like fun ref process_xinclude(): I32 calling this on the underlying xmlDoc* is usually enough.
xmlXIncludeProcessFlags
Signature: int xmlXIncludeProcessFlags(xmlDocPtr doc, int flags);
Same as xmlXIncludeProcess, but lets you pass option flags (e.g. XML_PARSE_NOENT, XML_PARSE_NODICT, XML_PARSE_NONET, etc.) that control parsing of included documents.
Useful for hooking into your “parser options” story so XInclude uses the same security/whitespace/entity policies as your normal parsing.
These two calls are enough to support “process all XIncludes in this in-memory document” in a simple API.