
Jossifresben/peshitta


Peshitta Triliteral Root Finder

A bilingual (Spanish/English) web application for researching the Syriac Peshitta New Testament through its triliteral root system. Enter a Syriac root in simplified Latin transliteration to find every word form and verse occurrence in the traditional 22-book canon. The app also provides Hebrew and Arabic cognates, semantic outlier detection, interactive cross-root semantic bridges, a passage constellation visualizer, and a methodology page documenting the Semitic exegesis approach.


Features

  • Root search — Enter a triliteral root (e.g., K-T-B) and find all derived word forms across 7,440 verses
  • Semitic sound correspondence — Searching S-L-M automatically finds SH-L-M (Arabic س ↔ Hebrew/Syriac שׁ and other regular correspondences)
  • Dual transliteration — Academic (š, ḥ, ṭ, ṣ, ʾ, ʿ) and simplified Latin shown side by side
  • Verse popup — Click any reference to view the full Syriac verse with word highlighting, transliterations, and English/Spanish translations (WEB + Reina Valera 1909)
  • Hebrew & Arabic cognates — 397 root entries with 3,780 cognate words
  • Semantic outlier detection — AI-powered identification of 651 cognates that have drifted semantically from their root's core meaning
  • Semantic bridges — 363 cross-root connections showing how an outlier word's meaning belongs to another root family (click to expand in visualizer)
  • Root family visualizer — Interactive D3.js force-directed graph showing root families with Hebrew, Arabic, and Syriac cognates; fullscreen mode
  • Ficha de Raíz (Root Card) — Detailed root profile with paradigmatic citation (highlighted root word), semantic field summary (sabor de raíz), sister roots, cognate table, and translation shift analysis
  • Translation Shift — Three-card chain (Aramaic → Greek NT → Modern translation) showing how semantic range narrows through translation layers, with prose analysis of what was lost (generated by Claude Opus 4.6)
  • Greek NT parallels — 397 roots mapped to their Greek New Testament equivalents with full semantic range comparison and philosophical/cultural influence analysis
  • Passage Constellation — D3.js force-directed graph showing all roots in a passage (verse range), their cognates, semantic bridges between co-occurring roots, and sister-root connections
  • Methodology page — Overview of the Semitic exegesis method, explaining the triliteral root approach and translation degradation analysis
  • About page — Author bio, project links, and acknowledgments
  • Autocomplete — Type-ahead suggestions with automatic dash insertion
  • Bilingual UI — Full interface in Spanish (default) and English, with auto-detection from the browser's Accept-Language header
  • Four translation tracks — Verse translations in Spanish (RV1909), English (WEB), Hebrew, and Arabic (SVD)
  • Peshitta Reader — Interlinear chapter reader with clickable words for root lookup and compact degradation chain
  • QR-friendly URLs — Tab state (?tab=ficha) and translation preferences preserved in URL for sharing
  • Collapsible transliteration reference — 22-letter Syriac alphabet table for input guidance
  • Template inheritance — All pages extend base.html (Jinja2) with shared navigation, footer, and global JS
  • Olive & Stone palette — Unified color scheme with Inter font across all pages

Transliteration Input

Dashes are inserted automatically as you type. Special mappings:

Syriac | Input | Letter
ܚ | KH | Heth
ܫ | SH | Shin
ܬ | TH | Taw
ܨ | TS | Sadhe
ܥ | E or O | Ayin
ܐ | ' (apostrophe) or A | Alaph

All other letters use their standard Latin equivalent (B, G, D, H, W, Z, T, Y, K, L, M, N, S, P, Q, R). Alaph is displayed as ' (glottal stop) in output; typing A is accepted as an alias for backward compatibility.
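A minimal sketch of how these mappings could drive tokenizing, auto-dashing, and conversion to Syriac letters. It assumes the two-letter codes (KH, SH, TH, TS) are matched greedily within each dash-separated chunk, so an explicit dash still lets you type K-H instead of KH, and that T stands for Teth since TH is Taw. The names `tokenize`, `insert_dashes`, and `to_syriac` are hypothetical and not taken from characters.py.

```python
# Two-letter input codes, matched before single letters.
DIGRAPHS = {
    "KH": "\u071A",  # Heth
    "SH": "\u072B",  # Shin
    "TH": "\u072C",  # Taw
    "TS": "\u0728",  # Sadhe
}
SINGLE = {
    "'": "\u0710", "A": "\u0710",  # Alaph (A is an accepted alias)
    "B": "\u0712", "G": "\u0713", "D": "\u0715", "H": "\u0717",
    "W": "\u0718", "Z": "\u0719", "T": "\u071B", "Y": "\u071D",
    "K": "\u071F", "L": "\u0720", "M": "\u0721", "N": "\u0722",
    "S": "\u0723", "P": "\u0726", "Q": "\u0729", "R": "\u072A",
    "E": "\u0725", "O": "\u0725",  # both map to Ayin
}
LETTERS = {**SINGLE, **DIGRAPHS}


def tokenize(text: str) -> list[str]:
    """Split Latin input into letter codes; dashes separate chunks explicitly."""
    tokens = []
    for chunk in text.upper().replace(" ", "").split("-"):
        i = 0
        while i < len(chunk):
            if chunk[i:i + 2] in DIGRAPHS:  # greedy: prefer KH/SH/TH/TS
                tokens.append(chunk[i:i + 2])
                i += 2
            elif chunk[i] in SINGLE:
                tokens.append(chunk[i])
                i += 1
            else:
                raise ValueError(f"unrecognized input character {chunk[i]!r}")
    return tokens


def insert_dashes(text: str) -> str:
    """Normalize raw input to a dashed root key; Alaph is shown as '."""
    return "-".join("'" if token == "A" else token for token in tokenize(text))


def to_syriac(text: str) -> str:
    """Convert dashed or undashed input to Syriac letters."""
    return "".join(LETTERS[token] for token in tokenize(text))
```

With this approach, typing `kthb` becomes `K-TH-B`, while `K-H-N` keeps Kaph and He separate because the dashes bound each chunk.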

Semitic Sound Correspondences

The app automatically resolves cross-language consonant equivalences when searching:

You type | Finds | Correspondence
S-L-M | SH-L-M (peace) | Arabic س → Hebrew שׁ / Syriac ܫ
S-M-E | SH-M-E (hear) | Arabic س → Hebrew שׁ / Syriac ܫ
TH-Q-N | T-Q-N | Arabic ث → Syriac ܛ

Supported pairs: S ↔ SH, TH ↔ T, D ↔ TH, TS ↔ S.
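One way to apply these pairs is to expand a query into every candidate root key and keep only the keys that exist in the root index. This sketch assumes each letter is swapped only for its direct partners, without chaining correspondences, and the real lookup in characters.py may work differently; `root_variants` and `resolve` are hypothetical names.

```python
from itertools import product

# Supported correspondence pairs, as listed above.
PAIRS = [("S", "SH"), ("TH", "T"), ("D", "TH"), ("TS", "S")]


def build_alternates(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each letter code to itself plus its direct partners (symmetric, no chaining)."""
    alternates: dict[str, set[str]] = {}
    for a, b in pairs:
        alternates.setdefault(a, {a}).add(b)
        alternates.setdefault(b, {b}).add(a)
    return alternates


ALTERNATES = build_alternates(PAIRS)


def root_variants(root_key: str) -> set[str]:
    """All dashed keys reachable by swapping any letter for a partner."""
    letters = root_key.upper().split("-")
    options = [sorted(ALTERNATES.get(letter, {letter})) for letter in letters]
    return {"-".join(combo) for combo in product(*options)}


def resolve(root_key: str, known_roots: set[str]) -> list[str]:
    """Keep only the variants that exist in the root index."""
    return sorted(v for v in root_variants(root_key) if v in known_roots)
```

Because every variant is checked against the known roots, extra candidates such as TS-L-M cost nothing when no such root exists.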

Quick Start

Requirements

  • Python 3.11+
  • Flask 3.0+

Installation

# Clone the repository
git clone https://github.com/Jossifresben/peshitta.git
cd peshitta

# Create a virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate    # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Running the App

python -m peshitta_roots

The app will start on http://localhost:8080 and open your browser automatically.

Project Structure

peshitta/
├── README.md
├── requirements.txt
├── syriac_nt_traditional22_unicode.csv   # Corpus: 7,440 verses, UTF-8 Syriac
├── data/
│   ├── cognates.json          # 397 roots with cognates, outliers, bridges, Greek parallels, sabor de raíz, translation shifts
│   ├── i18n.json              # Spanish & English UI translations
│   ├── known_roots.json       # Curated root dictionary with glosses
│   ├── stopwords.json         # Function words excluded from indexing
│   └── translations.json      # Verse translations (WEB + RV1909 + Hebrew + Arabic)
├── peshitta_roots/
│   ├── __init__.py
│   ├── __main__.py            # CLI entry point
│   ├── app.py                 # Flask routes and API endpoints
│   ├── affixes.py             # Syriac prefix/suffix stripping
│   ├── characters.py          # Transliteration maps, parsing, sound correspondences
│   ├── cognates.py            # Cognate lookup engine with outlier/bridge support
│   ├── corpus.py              # CSV parser and word index
│   ├── extractor.py           # Root extraction engine
│   ├── glosser.py             # Morphological glossing & stem detection
│   ├── static/
│   │   ├── style.css
│   │   ├── js/
│   │   │   └── global.js      # Shared JS: settings dropdown, language/script/trans switching
│   │   ├── logo.svg
│   │   └── favicon.svg
│   └── templates/
│       ├── base.html          # Base template (Jinja2 inheritance): nav, footer, shared assets
│       ├── index.html         # Main search page
│       ├── browse.html        # Browse all roots
│       ├── read.html          # Interlinear chapter reader
│       ├── visualize.html     # D3.js root family visualizer + Ficha de Raíz
│       ├── constellation.html # Passage constellation: multi-root D3.js graph for verse ranges
│       ├── methodology.html   # Semitic exegesis methodology page
│       ├── about.html         # Author bio, project links, and acknowledgments
│       └── help.html          # Help & documentation page
├── scripts/
│   ├── expand_cognates.py       # Batch expand Hebrew/Arabic cognates via Claude API
│   ├── tag_outliers.py          # AI-powered semantic outlier detection
│   ├── generate_bridges.py      # Generate cross-root semantic bridges via Claude API
│   ├── generate_sabor_raiz.py   # Generate semantic field summaries via Claude Haiku
│   ├── generate_greek_parallels.py  # Generate Greek NT parallels + translation shift analysis via Claude Opus
│   ├── generate_hebrew_parallels.py # Generate Hebrew Bible parallel mappings
│   ├── generate_new_cognates.py # Add new root entries with cognates
│   ├── apply_priority1_fixes.py # Apply priority audit fixes with dictionary citations
│   ├── dedup_cognates.py        # Deduplicate cognate word entries
│   ├── flag_modern_hebrew.py    # Flag modern Hebrew words in cognate data
│   ├── fix_bridge_concepts.py   # Fix mismatched bridge concept text
│   ├── fetch_translations.py    # Utility to download Bible translations
│   ├── fetch_ot_translations.py # Fetch Old Testament translation data
│   └── convert_ot_text.py       # Convert OT text format for processing
└── docs/
    ├── API.md                 # API reference
    ├── ARCHITECTURE.md        # Architecture overview
    ├── DATA.md                # Data files reference
    ├── DEVELOPMENT.md         # Development guide
    ├── FRONTEND.md            # Frontend reference
    └── MODULES.md             # Python modules reference

Pages

Route | Page | Description
/ | Search | Root search with cognates, word forms, and verse references
/browse | Browse | Paginated list of all 2,535 roots sorted by frequency
/read | Reader | Interlinear chapter reader with clickable words
/visualize/<root_key> | Visualize | D3.js root family graph + Ficha de Raíz
/constellation | Constellation | Passage-level D3.js graph for a verse range
/methodology | Methodology | Semitic exegesis method overview
/about | About | Author bio, project links
/help | Help | How-to, settings, FAQ

API Endpoints

GET /

Main search page. Query parameters:

  • q — Root in Latin transliteration (e.g., K-T-B)
  • lang — Language: es (default) or en

GET /visualize/<root_key>

Interactive D3.js root family visualizer with semantic bridges and Ficha de Raíz.

  • tab — View: graph (default) or ficha
  • lang — Language: es (default) or en
  • trans — Translation language: es, en, he, ar
  • script — Transliteration script: latin, syriac, hebrew, arabic

GET /constellation

Passage constellation visualizer — shows all roots in a verse range as a force-directed graph.

  • book — Book name (e.g., Matthew)
  • chapter — Chapter number
  • v_start — Start verse
  • v_end — End verse (defaults to v_start)
  • lang, script, trans — Same as other pages
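Validating these parameters, including the v_end fallback, might look like the sketch below. `PassageQuery`, `parse_passage_query`, and the rule that v_end may not precede v_start are assumptions for illustration; the function accepts any mapping, so Flask's `request.args` could be passed in directly.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class PassageQuery:
    book: str
    chapter: int
    v_start: int
    v_end: int


def _required_int(args: Mapping[str, str], name: str) -> int:
    """Read a positive integer parameter or raise ValueError."""
    text = str(args.get(name, "")).strip()
    if not text.isdigit() or int(text) < 1:
        raise ValueError(f"{name} must be a positive integer")
    return int(text)


def parse_passage_query(args: Mapping[str, str]) -> PassageQuery:
    """Validate /constellation-style query parameters (hypothetical helper)."""
    book = (args.get("book") or "").strip()
    if not book:
        raise ValueError("book is required")
    chapter = _required_int(args, "chapter")
    v_start = _required_int(args, "v_start")
    # v_end is optional and falls back to v_start (a single-verse passage).
    v_end = _required_int(args, "v_end") if args.get("v_end") else v_start
    if v_end < v_start:
        raise ValueError("v_end must not precede v_start")
    return PassageQuery(book, chapter, v_start, v_end)
```

For instance, `/constellation?book=Matthew&chapter=5&v_start=3&v_end=12` would yield a ten-verse range, and omitting `v_end` yields verse 3 alone.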

GET /methodology

Methodology page describing the Semitic exegesis method.

GET /about

About the author page with bio and project links.

GET /api/verse

Returns verse data as JSON.

  • ref — Verse reference (e.g., Matthew 1:1)
  • lang — Language for translations
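A reference such as Matthew 1:1 can be split into book, chapter, and verse with a regular expression. This sketch assumes the `Book chapter:verse` shape shown above and English book names with an optional 1 or 2 prefix, which covers every book in the 22-book canon; `parse_ref` is a hypothetical name.

```python
import re

# Optional "1 " / "2 " prefix, a one-word English book name, then chapter:verse.
REF_PATTERN = re.compile(r"^\s*((?:[12]\s+)?[A-Za-z]+)\s+(\d+):(\d+)\s*$")


def parse_ref(ref: str) -> tuple[str, int, int]:
    """Split 'Book chapter:verse' into its parts (hypothetical helper)."""
    match = REF_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"not a verse reference: {ref!r}")
    book, chapter, verse = match.groups()
    # Collapse whitespace runs such as "1  Corinthians" to a single space.
    return " ".join(book.split()), int(chapter), int(verse)
```

Note that the colon must be URL-encoded when the reference travels in the query string (`ref=Matthew+1%3A1`); most HTTP clients do this automatically.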

GET /api/root-family

Returns full root family data for the visualizer, including cognates, outliers, semantic bridges, sabor de raíz, and Greek NT parallel with translation shift analysis.

  • root — Root key (e.g., K-TH-B or S-L-M)
  • lang — Language: es (default) or en
  • trans — Translation language: es, en, he, ar

GET /api/passage-constellation

Returns constellation data for a passage: roots, cognates, inter-root bridges, and sister-root connections.

  • book — Book name
  • chapter — Chapter number
  • v_start, v_end — Verse range
  • lang, script, trans — Same as other endpoints

GET /api/word-root

Returns root data for a clicked word in the reader, including compact degradation chain.

  • form — Syriac word form
  • lang — Language: es (default) or en

GET /api/suggest

Autocomplete suggestions.

  • prefix — Partial root input (e.g., K-T)
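Prefix suggestions can be served from a sorted list of root keys with a binary search, because all keys that share a prefix sit next to each other in sorted order. This is a sketch under that assumption; the real endpoint may rank results by frequency instead, and `suggest` with its `limit` parameter is hypothetical.

```python
from bisect import bisect_left


def suggest(prefix: str, sorted_keys: list[str], limit: int = 10) -> list[str]:
    """Return up to `limit` keys starting with `prefix` from a sorted key list."""
    p = prefix.strip().upper()
    if not p or limit <= 0:
        return []
    start = bisect_left(sorted_keys, p)  # first position where p could be inserted
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(p):
            break  # past the contiguous block of matching keys
        matches.append(key)
        if len(matches) == limit:
            break
    return matches
```

Because matching is on raw characters, `K-T` also returns `K-TH-…` keys; stopping at a dash boundary would require comparing tokenized letters instead.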

GET /api/roots

Paginated list of all roots sorted by frequency.

  • page — Page number (default: 1)
  • per_page — Results per page (default: 50, max: 100)
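The documented defaults (page 1, 50 per page, at most 100) could be enforced as below. Clamping out-of-range or malformed values rather than rejecting them is an assumption, as are the names `parse_pagination` and `paginate` and the response field names.

```python
from collections.abc import Mapping, Sequence
from math import ceil


def _int_arg(args: Mapping[str, str], name: str, default: int) -> int:
    """Read an integer parameter, falling back to `default` if missing or malformed."""
    try:
        return int(args.get(name, default))
    except (TypeError, ValueError):
        return default


def parse_pagination(args: Mapping[str, str], default_per_page: int = 50,
                     max_per_page: int = 100) -> tuple[int, int]:
    page = max(1, _int_arg(args, "page", 1))
    per_page = min(max(1, _int_arg(args, "per_page", default_per_page)), max_per_page)
    return page, per_page


def paginate(items: Sequence, page: int, per_page: int) -> dict:
    """Slice one page out of an already frequency-sorted sequence."""
    start = (page - 1) * per_page
    return {
        "page": page,
        "per_page": per_page,
        "total": len(items),
        "total_pages": max(1, ceil(len(items) / per_page)),
        "items": list(items[start:start + per_page]),
    }
```

With 2,535 roots and 50 per page, this yields 51 pages, the last holding 35 roots.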

Corpus

The corpus is a UTF-8 Unicode Syriac dataset derived from the ETCBC/syrnt plain-text corpus, restricted to the traditional 22-book Peshitta NT canon:

Matthew, Mark, Luke, John, Acts, Romans, 1–2 Corinthians, Galatians, Ephesians, Philippians, Colossians, 1–2 Thessalonians, 1–2 Timothy, Titus, Philemon, Hebrews, James, 1 Peter, 1 John.

Important: This is NOT a diplomatic transcription of the Khabouris manuscript. It is a Unicode Syriac dataset for search, NLP, and indexing purposes.

Verse Translations

  • English: World English Bible (WEB) — public domain
  • Spanish: Reina Valera 1909 — public domain
  • Hebrew: a modern Hebrew translation
  • Arabic: Smith & Van Dyck (SVD)

Stats

  • 2,535 unique triliteral root patterns extracted
  • 397 roots with Hebrew/Arabic cognate data
  • 3,780 cognate words (1,929 Hebrew + 1,851 Arabic)
  • 397 Greek NT parallel mappings with translation shift analysis
  • 651 semantic outliers detected via AI
  • 363 semantic bridges across 207 root families
  • 15,261 unique surface word forms
  • 7,440 verses across 22 books
  • 4 verse translation tracks (Spanish, English, Hebrew, Arabic)

License

Apache License 2.0 — Copyright (c) 2026 Jose Fresco Benaim