Merged
1 change: 1 addition & 0 deletions .coveragerc
@@ -4,6 +4,7 @@ omit =
opensiddur/tests/*
*/__pycache__/*
*/test_*.py
opensiddur/importer/wlc/download_tanach.py

[report]
precision = 2
59 changes: 58 additions & 1 deletion README.md
@@ -47,4 +47,61 @@ cross-reference mappings for changed files, and removes stale entries for
projects or files that no longer exist. It prints a per-project summary on
completion.

You must re-sync before running the compiler on any newly-added project.

## Compilation (JLPTEI → compiled linear XML)

The compiler takes a `project/<name>/` file, resolves transclusions, annotations, and parallel texts,
and outputs a single “compiled” XML file that can be
converted into a final printable format (e.g., PDF).

Example (compile `project/wlc/ruth.xml` to `compiled.xml`):

```bash
uv run python -m opensiddur.exporter.compiler \
--project wlc \
--file_name ruth.xml \
--output_file compiled.xml
```

Example with a settings YAML (controls project priorities, annotations, and optional parallel lookup):

```bash
uv run python -m opensiddur.exporter.compiler \
--project wlc \
--file_name ruth.xml \
--settings doc/exporter-settings.example.yaml \
--output_file compiled.xml
```

## TeX export (compiled XML → LuaLaTeX)

Convert the compiled XML file to LuaLaTeX using the `reledmac`/`reledpar` pipeline:

```bash
uv run python -m opensiddur.exporter.tex.latex \
compiled.xml \
--settings doc/exporter-settings.example.yaml \
--output compiled.tex
```

## PDF export (compiled XML → PDF)

Export directly to PDF (generates TeX internally, then runs LuaLaTeX/latexmk):

```bash
uv run python -m opensiddur.exporter.pdf.pdf \
--settings doc/exporter-settings.example.yaml \
compiled.xml \
output.pdf
```

Keep the intermediate TeX (helpful for debugging LaTeX issues):

```bash
uv run python -m opensiddur.exporter.pdf.pdf \
--settings doc/exporter-settings.example.yaml \
--keep-tex \
compiled.xml \
output.pdf
```
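
The stages above can also be driven from a script. The following is a minimal sketch that only assembles the argument lists, using the module paths and flags shown in the commands above. The helper names `compile_command` and `pdf_command` are hypothetical and are not part of the repository.

```python
from typing import Optional


def compile_command(project: str, file_name: str, output_file: str,
                    settings: Optional[str] = None) -> list[str]:
    """Build the argv for the compiler stage (hypothetical helper)."""
    cmd = ["uv", "run", "python", "-m", "opensiddur.exporter.compiler",
           "--project", project, "--file_name", file_name]
    if settings is not None:
        cmd += ["--settings", settings]
    cmd += ["--output_file", output_file]
    return cmd


def pdf_command(compiled_xml: str, output_pdf: str,
                settings: Optional[str] = None,
                keep_tex: bool = False) -> list[str]:
    """Build the argv for the PDF stage (hypothetical helper)."""
    cmd = ["uv", "run", "python", "-m", "opensiddur.exporter.pdf.pdf"]
    if settings is not None:
        cmd += ["--settings", settings]
    if keep_tex:
        # Keep the intermediate TeX file for debugging.
        cmd.append("--keep-tex")
    cmd += [compiled_xml, output_pdf]
    return cmd
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to run the stage.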
45 changes: 45 additions & 0 deletions doc/exporter-settings.example.yaml
@@ -0,0 +1,45 @@
---
# Example exporter settings file.
# This file includes every currently supported exporter setting.
#
# Notes:
# - Project names must correspond to directories under `project/`.
# - `parallel.column_order` values: `primary_first` | `primary_last`
# - `typography.layout` values: `pages` | `pairs`
# - `typography.paper` values: `a4paper` | `letterpaper` | `legalpaper` | `a5paper` | `b5paper` | `executivepaper`

priority:
  # Project priority for resolving `urn:x-opensiddur:` references to *texts* (transclusions)
  # The first entry will typically be the project you are compiling.
  # The lower priority projects can be used when the first project wants to import from others (eg, a siddur that borrows Tanach text from another source)
  transclusion:
    - wlc
    - jps1917

  # Project priority for resolving `urn:x-opensiddur:` references to *instructions*
  instructions:
    - wlc
    - jps1917

# Projects from which to include annotations
# *All* annotations from the given projects will be included.
annotations:
  - wlc
  - jps1917

# Optional parallel-text configuration. Omit to disable.
parallel:
  # Projects to search for parallel text content (in priority order)
  projects:
    - wlc
    - jps1917
  # Column order for parallel text display in printed output
  column_order: primary_first

# Output-format settings consumed by the TeX/PDF stage only.
typography:
  hebrew_font: "Frank Ruehl CLM"
  latin_font: "Linux Libertine O"
  layout: pairs
  paper: letterpaper
  fontsize: "11pt"
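
The enumerated values listed in the notes at the top of this file lend themselves to a quick sanity check before running the exporter. The sketch below assumes the settings have already been loaded into a plain `dict`; `check_settings` and `ALLOWED` are hypothetical names, and the exporter's own validation may well differ.

```python
# Allowed values, copied from the notes at the top of the example settings file.
ALLOWED = {
    ("parallel", "column_order"): {"primary_first", "primary_last"},
    ("typography", "layout"): {"pages", "pairs"},
    ("typography", "paper"): {"a4paper", "letterpaper", "legalpaper",
                              "a5paper", "b5paper", "executivepaper"},
}


def check_settings(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for (section, key), allowed in ALLOWED.items():
        # A missing section or key is fine: both are optional.
        value = (settings.get(section) or {}).get(key)
        if value is not None and value not in allowed:
            problems.append(
                f"{section}.{key}: {value!r} is not one of {sorted(allowed)}"
            )
    return problems
```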
16 changes: 10 additions & 6 deletions opensiddur/common/xslt.py
@@ -64,15 +64,19 @@ def xslt_transform_string(

 def xslt_transform(
     xslt_file: Path,
-    input_file: Path,
-    output_file: Optional[Path] = None):
-
+    input_file: Path,
+    output_file: Optional[Path] = None,
+    *,
+    xslt_params: Optional[dict[str, Any]] = None,
+):
     try:
         # Read the input XML
         with open(input_file, 'r', encoding='utf-8') as input_fd:
-            input_xml = input_fd.read()
-
-        result = xslt_transform_string(xslt_file, input_xml)
+            input_xml = input_fd.read()
+
+        result = xslt_transform_string(
+            xslt_file, input_xml, xslt_params=xslt_params
+        )
 
         if output_file:
             with open(output_file, 'w', encoding='utf-8') as output_fd:
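
The bare `*` in the new signature makes `xslt_params` keyword-only, so existing positional calls keep working and the new argument cannot be passed by position. A minimal sketch of that behaviour follows; `transform_stub` is a hypothetical stand-in with the same parameter layout (it does not run any XSLT), and the `layout` parameter name is only an illustration.

```python
from pathlib import Path
from typing import Any, Optional


def transform_stub(xslt_file: Path, input_file: Path,
                   output_file: Optional[Path] = None, *,
                   xslt_params: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Stand-in with the same parameter layout; it only echoes the params."""
    return dict(xslt_params or {})


# Keyword form: accepted.
params = transform_stub(Path("a.xsl"), Path("in.xml"),
                        xslt_params={"layout": "pairs"})

# Positional form: rejected, because xslt_params sits after the bare `*`.
try:
    transform_stub(Path("a.xsl"), Path("in.xml"), None, {"layout": "pairs"})
    positional_rejected = False
except TypeError:
    positional_rejected = True
```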
60 changes: 51 additions & 9 deletions opensiddur/exporter/README.md
@@ -1,13 +1,13 @@
# Open Siddur Exporter

The exporter takes data in JLPTEI files and converts it into directly
consumable formats, like PDF and HTML.

The exporter operates in two stages:
1. **Compilation**: Given a starting file and a settings file, generate a compiled pseudo-TEI file that includes all of the data needed to convert into a final format in a linear form. The compilation step is common to all output formats.
2. **Output format**: Given the compiled file, output to the consumable format. The current output formats are:
-1. TeX typesetting system (XeLaTeX)
-2. PDF, via XeLaTeX
+1. TeX typesetting system (LuaLaTeX, via [`reledmac`](https://ctan.org/pkg/reledmac) + [`reledpar`](https://ctan.org/pkg/reledpar) for critical-edition apparatus and parallel-text alignment)
+2. PDF, via the same LuaLaTeX pipeline

## Run the compiler

@@ -16,10 +16,17 @@ See

## Export to PDF

-For TeX and PDF export, you will need an installation of XeLaTeX. See `install-tex.sh`.
+For TeX and PDF export, you'll need a TeX Live install with the LuaLaTeX
+pipeline (`lualatex`, `latexmk`, `biber`, `reledmac`, `reledpar`, `polyglossia`,
+`biblatex`). On Debian/Ubuntu the installer script `install-tex.sh` covers it:

-For round-trip command examples,
-see `tei-to-pdf.sh`.
```bash
sudo bash opensiddur/exporter/tex/install-tex.sh
```

For round-trip command examples, see `scripts/tei-to-pdf.sh` — the same
`-s <settings-file>` flag drives both the compiler and the PDF stage, so any
typography settings in the YAML are forwarded to the LuaLaTeX preamble.

## Settings file

@@ -58,8 +65,43 @@ annotations:
- ...
```

From which projects should notes (such as editorial notes or commentary) be derived?
Unlike instructions and transclusions, annotations are not in prioritized order; the annotations from all listed projects will be included when available.
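
The difference between the two selection modes can be sketched as follows. This is an illustration only, not the compiler's code: `pick_prioritized` and `pick_all` are hypothetical helpers, and `matches` is assumed to map a project name to the content that project offers for a given reference.

```python
from typing import Optional


def pick_prioritized(matches: dict[str, str],
                     priority: list[str]) -> Optional[str]:
    """Transclusion/instruction style: first listed project with a match wins."""
    for project in priority:
        if project in matches:
            return matches[project]
    return None


def pick_all(matches: dict[str, str], projects: list[str]) -> list[str]:
    """Annotation style: every listed project with a match contributes."""
    return [matches[p] for p in projects if p in matches]


matches = {"jps1917": "note from jps1917", "wlc": "note from wlc"}
first = pick_prioritized(matches, ["wlc", "jps1917"])
everything = pick_all(matches, ["wlc", "jps1917"])
```

Here `first` holds only the `wlc` entry, while `everything` holds both entries in the listed order.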

### Parallel texts

```yaml
parallel:
  projects:
    - jps1917
  column_order: primary_first # or primary_last
```

When the compiler builds a document, it also looks up matching content in
each of the listed `parallel` projects (by `corresp` URN) and emits
`p:parallel`/`p:parallelItem` blocks. The PDF stage feeds those blocks into
`reledpar` so the verses on each side stay aligned across page breaks.

`column_order: primary_first` puts the primary stream on the left page (or
left column for a `pairs` layout); `primary_last` swaps them.
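
As a rough illustration of the `column_order` rule described above, the sketch below maps the setting to a (left, right) pair. `column_sides` is a hypothetical helper; how the TeX stage actually assigns sides is not shown in this section.

```python
def column_sides(column_order: str, primary: str,
                 parallel: str) -> tuple[str, str]:
    """Return (left, right) for the given column_order value."""
    if column_order == "primary_first":
        return primary, parallel
    if column_order == "primary_last":
        return parallel, primary
    raise ValueError(f"unsupported column_order: {column_order!r}")
```

For example, `column_sides("primary_last", "wlc", "jps1917")` places `jps1917` on the left and `wlc` on the right.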

### Typography (PDF/TeX stage only)

```yaml
typography:
  hebrew_font: "Frank Ruehl CLM"   # any installed OpenType font with Hebrew coverage
  latin_font: "Linux Libertine O"  # any installed OpenType font for the Latin stream
  layout: pages                    # "pages" → facing pages; "pairs" → two columns/page
  paper: a4paper                   # any \documentclass paper option
  fontsize: 11pt                   # 10pt | 11pt | 12pt
```

The `typography` section is read by the PDF/TeX stage only; the linear-XML
compiler ignores it. Every key is optional — when the section (or any single
key) is omitted, the defaults shown above are used. Fonts that aren't found
on the system fall back to a sensible default automatically (`Ezra SIL` →
`SBL Hebrew` → `FreeSerif` for Hebrew).
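
One way to picture the default and fallback behaviour is the sketch below. It assumes the defaults are the values in the example above and that the set of installed fonts is supplied by the caller; `resolve_typography` is a hypothetical name, and the real font detection is not described in this section.

```python
# Defaults taken from the example typography block above (assumed).
DEFAULT_TYPOGRAPHY = {
    "hebrew_font": "Frank Ruehl CLM",
    "latin_font": "Linux Libertine O",
    "layout": "pages",
    "paper": "a4paper",
    "fontsize": "11pt",
}
# Hebrew fallback chain as listed in the text.
HEBREW_FALLBACKS = ["Ezra SIL", "SBL Hebrew", "FreeSerif"]


def resolve_typography(settings: dict, installed_fonts: set[str]) -> dict:
    """Merge user typography over the defaults, then apply the Hebrew fallback."""
    typo = {**DEFAULT_TYPOGRAPHY, **(settings.get("typography") or {})}
    if typo["hebrew_font"] not in installed_fonts:
        for candidate in HEBREW_FALLBACKS:
            if candidate in installed_fonts:
                typo["hebrew_font"] = candidate
                break
        # If no fallback is installed either, the configured name is left as-is.
    return typo
```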

## Settings file versioning
-Note that this file is likely to change slightly in format when parallel texts are introduced.
+Note that this file is likely to change slightly in format as more output
+formats are introduced.
21 changes: 10 additions & 11 deletions opensiddur/exporter/compiler.py
@@ -323,8 +323,9 @@ def _annotate(self, element: ElementBase, root: Optional[ElementBase] = None) ->
will continue to use this instruction as-is.

2. If the element has standoff annotation (a commentary or editorial note),
-we need to determine which project's commentary set should be used to provide the corresponding commentary.
-All selected commentaries should be loaded and returned.
+we need to determine which project's commentary set should be used to provide
+the corresponding commentary. All selected commentaries should be loaded and
+returned.

Args:
element: The element to annotate
@@ -403,12 +404,13 @@ def _annotate(self, element: ElementBase, root: Optional[ElementBase] = None) ->
# For commentary/editorial notes, select all annotations for corresp or xml_id
# May be standoff annotation, or inline.
references = self._refdb.get_references_to(corresp, xml_id, project, file_name)
note_references = [r for r in references
if r.element_tag =="{http://www.tei-c.org/ns/1.0}note"]
-limited_references = self._urn_resolver.prioritize_range(note_references, self.linear_data.annotation_projects, return_all=True)
-
+limited_references = self._urn_resolver.prioritize_range(
+note_references, self.linear_data.annotation_projects, return_all=True)
+
result_elements = []
-if limited_references: # Handle case where prioritize_range returns None
+if limited_references:
for reference in limited_references:
processor = CompilerProcessor(
reference.project,
@@ -423,22 +425,19 @@ def _annotate(self, element: ElementBase, root: Optional[ElementBase] = None) ->
processed_element = processor.process(reference_element)
if not(reference.project == self.project and reference.file_name == self.file_name):
self._mark_file_source(processed_element, project=reference.project, file_name=reference.file_name)

# Check if language differs and add xml:lang if needed

annotation_lang = processor.root_language
insertion_context_lang = self._get_in_scope_language(element)
if annotation_lang and annotation_lang != insertion_context_lang:
processed_element.set('{http://www.w3.org/XML/1998/namespace}lang', annotation_lang)

result_elements.append(processed_element)
if result_elements:
annotation_command = _AnnotationCommand.INSERT
else:
annotation_command = _AnnotationCommand.NONE
return result_elements, annotation_command
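
The language handling near the end of `_annotate` (set `xml:lang` on an inserted annotation only when its language differs from the language in scope at the insertion point) can be illustrated in isolation. The sketch below uses the standard library's `xml.etree.ElementTree` rather than the project's own element classes, and `mark_language` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def mark_language(element: ET.Element, annotation_lang: Optional[str],
                  context_lang: Optional[str]) -> ET.Element:
    """Add xml:lang only when the annotation's language is known and differs."""
    if annotation_lang and annotation_lang != context_lang:
        element.set(XML_LANG, annotation_lang)
    return element


# An English note inserted into a Hebrew context gets an explicit xml:lang.
english_note = mark_language(ET.Element("note"), "en", "he")
# A Hebrew note in a Hebrew context is left untouched.
hebrew_note = mark_language(ET.Element("note"), "he", "he")
```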



@staticmethod
def _insert_first_element(element: ElementBase, new_child: ElementBase) -> ElementBase:
"""