EccLexica: Christian Architectural Discourse Detection in Latin Texts

This repository contains a dictionary-based method for identifying architectural discourse in Late Antique and Medieval Latin texts. It is applied to a test corpus of Latin inscriptions.

The workflow first scores inscriptions with the dictionary, then evaluates GLiNER named-entity recognition models against the dictionary matches.

EccLexica is being developed as part of the ANR E-cclesia project.

Repository Structure

.
├── data/
│   ├── dicts/
│   │   ├── auto_terms.csv
│   │   ├── asso_terms.csv
│   │   └── mat_terms.csv
│   └── sample.csv
│
├── output/
│
├── arch-score.py
├── gliner-eval.py
└── README.md

Data

Main dataset

The main dataset is not stored in this repository due to its size.

Please download EDCS_text_cleaned_2022-09-12.json from Zenodo:

👉 https://zenodo.org/records/7072337

After downloading, place the file in the following location:

data/EDCS_text_cleaned_2022-09-12.json
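
As a quick check that the file is in place, the sketch below loads it with pandas. It assumes the JSON parses as an array of records, one object per inscription; the helper name `load_edcs` and the `EDCS_PATH` constant are illustrative and are not taken from the repository scripts.

```python
from pathlib import Path

import pandas as pd

# Location given above; adjust if the file lives elsewhere.
EDCS_PATH = Path("data/EDCS_text_cleaned_2022-09-12.json")


def load_edcs(path: Path = EDCS_PATH) -> pd.DataFrame:
    """Load the EDCS dump into a DataFrame (hypothetical helper)."""
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found: download it from Zenodo and place it in data/"
        )
    # Assumption: the file is a JSON array of records (one object per inscription).
    return pd.read_json(path)


if __name__ == "__main__":
    df = load_edcs()
    print(df.shape)             # number of inscriptions and columns
    print(df.columns.tolist())  # inspect which fields are available
```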

Sample dataset (data/sample.csv)
- 100 inscriptions.
- Used for testing and GLiNER evaluation.

Scripts

`arch-score.py`

Computes an architectural concentration score (0–100) for each inscription based on the presence and distribution of architectural vocabulary.

Main steps:

Filtering
- Date range: 399–1199 CE.
- Excludes a predefined list of eastern provinces.
Lemmatization
- Uses spaCy’s Latin model (la_core_web_md) on cleaned interpretive text.
Scoring
- Uses three term categories:
  - Autonomous (e.g. basilica, ecclesia)
  - Associative (e.g. murus, porta)
  - Material (e.g. marmor, aurum)
- Score combines:
  - Term count
  - Co-occurrence of term types
  - Proximity of terms in the text
  - Term density
Output generation
- Full scored dataset.
- Score-based subsets (per 10-point bin).
- High-score subset (score > 50).
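
The steps above can be sketched roughly as follows. This is a minimal illustration and not the code in `arch-score.py`: the weights, the proximity window, the count cap, and the helper names (`in_scope`, `arch_score`, `score_bin`) are all assumptions, the lexicon is limited to the example terms listed above, and the input is assumed to be already lemmatized.

```python
from itertools import combinations

# Tiny illustrative lexicon; the real lists live in data/dicts/*.csv.
TERMS = {
    "basilica": "autonomous", "ecclesia": "autonomous",
    "murus": "associative", "porta": "associative",
    "marmor": "material", "aurum": "material",
}

# Assumed weights (they sum to 1.0); not the values used by arch-score.py.
WEIGHTS = {"count": 0.4, "cooccurrence": 0.2, "proximity": 0.2, "density": 0.2}


def in_scope(year, province, excluded_provinces, start=399, end=1199):
    """Filtering: keep inscriptions dated 399-1199 CE outside excluded provinces."""
    return start <= year <= end and province not in excluded_provinces


def arch_score(lemmas, window=10, count_cap=5):
    """Return a heuristic 0-100 score for a list of lemmas."""
    hits = [(i, TERMS[lemma]) for i, lemma in enumerate(lemmas) if lemma in TERMS]
    if not hits:
        return 0.0
    # Term count, saturating at count_cap matches.
    count = min(len(hits), count_cap) / count_cap
    # Co-occurrence: share of the three term categories that are present.
    cooccurrence = len({category for _, category in hits}) / 3
    # Proximity: share of term pairs lying within `window` tokens of each other.
    pairs = list(combinations([i for i, _ in hits], 2))
    proximity = sum(b - a <= window for a, b in pairs) / len(pairs) if pairs else 0.0
    # Density: matched terms per token.
    density = len(hits) / len(lemmas)
    parts = {"count": count, "cooccurrence": cooccurrence,
             "proximity": proximity, "density": density}
    return round(100 * sum(WEIGHTS[key] * value for key, value in parts.items()), 1)


def score_bin(score):
    """Label the 10-point bin a score falls into, e.g. 47.3 -> '40-49'."""
    low = min(int(score // 10) * 10, 90)
    return f"{low}-{low + 9 if low < 90 else 100}"


if __name__ == "__main__":
    lemmas = ["hic", "basilica", "marmor", "et", "aurum", "orno"]
    score = arch_score(lemmas)
    print(score, score_bin(score), score > 50)
```

Each component is normalized to the range 0 to 1 before weighting, so the weighted sum stays within 0 to 100 whatever the text length. The province passed to `in_scope` would come from the predefined exclusion list, which is not reproduced here.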

`gliner-eval.py`

Evaluates GLiNER NER models against the dictionary-based architectural terms.

What it does:

Loads a scored sample dataset from output/sample.csv.
Runs multiple GLiNER models on:
- Lemmatized text
- Cleaned interpretive text
Compares GLiNER-extracted entities with dictionary-based terms.

Evaluated labels:

- building/type
- building/part
- building/material
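
A minimal sketch of how entities for these labels might be extracted with the `gliner` package is given below. It relies on `GLiNER.from_pretrained` and `predict_entities` from that package; the helper names, the 0.5 threshold, and the lowercasing of spans are assumptions and may differ from what `gliner-eval.py` does.

```python
LABELS = ["building/type", "building/part", "building/material"]


def load_model(name):
    """Load a GLiNER checkpoint; `name` is whichever model is being evaluated."""
    # Imported lazily so extract_terms can be used without gliner installed.
    from gliner import GLiNER
    return GLiNER.from_pretrained(name)


def extract_terms(model, text, labels=LABELS, threshold=0.5):
    """Return the set of (lowercased span, label) pairs predicted for one text."""
    entities = model.predict_entities(text, labels, threshold=threshold)
    return {(entity["text"].lower(), entity["label"]) for entity in entities}
```

For example, `extract_terms(load_model("urchade/gliner_multi-v2.1"), text)` would run a publicly available multilingual checkpoint on one inscription; that checkpoint is named only as an illustration and is not necessarily one of the models evaluated here.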

Metrics computed:

- Precision
- Recall
- F1 score
- Percentage and count of overlapping terms
- Number of inscriptions with at least one shared term
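
The sketch below shows one way these metrics could be computed with plain set arithmetic, treating the dictionary terms as the reference. The function names are hypothetical, and the overlap percentage is taken here as shared terms over the union of both sets, which is an assumption; the script may define it differently or rely on scikit-learn.

```python
def term_metrics(dictionary_terms, gliner_terms):
    """Compare two collections of terms, with the dictionary as reference."""
    gold, pred = set(dictionary_terms), set(gliner_terms)
    shared = gold & pred
    union = gold | pred
    precision = len(shared) / len(pred) if pred else 0.0
    recall = len(shared) / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if shared else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "overlap_count": len(shared),
        # Assumed definition: shared terms as a percentage of all distinct terms.
        "overlap_pct": 100 * len(shared) / len(union) if union else 0.0,
    }


def inscriptions_with_overlap(term_pairs):
    """Count inscriptions whose dictionary and GLiNER term sets intersect."""
    return sum(1 for gold, pred in term_pairs if set(gold) & set(pred))


if __name__ == "__main__":
    metrics = term_metrics({"basilica", "murus", "marmor"},
                           {"basilica", "marmor", "altare"})
    print(metrics)
    print(inscriptions_with_overlap([({"basilica"}, {"basilica"}),
                                     ({"murus"}, {"porta"})]))
```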

Requirements

Python 3.9+
Core libraries:
- pandas, numpy, scikit-learn
- spacy (+ la_core_web_md)
- gliner

Typical Workflow

Run arch-score.py to filter inscriptions and compute architectural scores.
Inspect or subset results in the output/ directory.
Run gliner-eval.py on a scored sample to compare dictionary-based detection with GLiNER models.

Notes

Scores are heuristic and intended for comparative and exploratory analysis, not absolute classification.

Citations

Heřmánková, Petra. “EDCS”. Zenodo, September 12, 2022. https://doi.org/10.5281/zenodo.7072337.
