Cultural Personality Analyzer

Live Demo

Frontend: https://culture-identifier.vercel.app
API: https://culture-identifier-backend.onrender.com/api/health

Match your writing style and intellectual voice to iconic French philosophers, writers, and artists using sentence embeddings and cosine similarity. No API key is required.

1. Problem

Most personality tools map you to archetypes or trait dimensions. This system asks a different question: whose intellectual lineage does your writing reflect?

Given a text sample, it identifies which French cultural figures — Sartre, Camus, Flaubert, Baudelaire, and others — your writing most resembles. The match is driven not by topic overlap but by stylistic and semantic fingerprints: sentence rhythm, argumentative structure, lexical density, rhetorical register, and thematic preoccupations. A short paragraph about everyday frustration can reveal whether you write like an existentialist or a naturalist.

2. Architecture

User text input
      |
      v
[Sentence Embedder]
  paraphrase-multilingual-mpnet-base-v2
  Encodes input into a 768-dim semantic vector
      |
      v
[FAISS Vector Store]
  Pre-indexed embeddings of cultural figures'
  representative texts (essays, letters, excerpts)
      |
      v
[Cosine Similarity Search]
  Ranks all indexed figures by distance to user vector
      |
      v
[CulturalAnalyzer]
  Extracts thematic keywords, generates 2D PCA projection
  for visualization, builds match explanation
      |
      v
[FastAPI Response]
  Ranked matches + similarity scores + style summary

The backend is fully stateless per request. The reference embeddings are computed once at startup and held in memory in the FAISS index; once the model has loaded, a query takes under 100 ms.
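The ranking and projection steps in the diagram can be sketched with plain NumPy. This is a minimal illustration, not the project's code: it uses brute-force dot products in place of FAISS, PCA via SVD in place of whatever PCA routine the CulturalAnalyzer uses, and toy 4-dimensional vectors in place of real 768-dimensional embeddings. The function names and the "Figure A/B/C" labels are hypothetical.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def rank_figures(query, corpus, names, top_k=3):
    """Return the top_k (name, cosine similarity) pairs, best match first."""
    q = l2_normalize(query.reshape(1, -1))
    c = l2_normalize(corpus)
    scores = (c @ q.T).ravel()           # one cosine score per indexed figure
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return [(names[i], float(scores[i])) for i in order]


def project_2d(vectors: np.ndarray) -> np.ndarray:
    """Project rows onto their first two principal components (PCA via SVD)."""
    centered = vectors - vectors.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Toy vectors standing in for precomputed reference embeddings (assumption).
names = ["Figure A", "Figure B", "Figure C"]
corpus = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0, 0.1],
])
query = np.array([0.9, 0.1, 0.0, 0.0])

matches = rank_figures(query, corpus, names, top_k=2)
coords = project_2d(np.vstack([corpus, query]))  # one 2D point per vector
print(matches)
print(coords.shape)
```

Normalizing both sides first is what lets an inner-product search behave as a cosine-similarity search; the same idea applies if an inner-product FAISS index is used instead of the matrix product here.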

3. Design Decisions

Why sentence-transformers? Sentence-level embeddings capture semantic and stylistic intent better than token-level models for this task. paraphrase-multilingual-mpnet-base-v2 handles both English and French source texts, which matters because the reference corpus includes original French works.

Why these cultural figures? The reference set covers distinct intellectual styles: Sartre's dense phenomenology, Camus's lucid prose, Proust's exhaustive introspection, Baudelaire's lyrical intensity, Flaubert's clinical realism. They were chosen to maximize differentiation in embedding space, not for encyclopedic coverage.

FastAPI over Flask: The initial prototype was Flask (reflected in legacy README badges). The production backend uses FastAPI for async request handling, automatic OpenAPI docs at /docs, and Pydantic model validation. The CPU-bound embedding inference runs synchronously inside async endpoints, so each inference blocks the event loop while it runs. This is treated as acceptable because the model is warmed up at startup and individual queries are short.
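If blocking the event loop ever became a problem under concurrent load, one common option is to hand the synchronous call to a worker thread. The sketch below uses only the standard library and is not taken from the project: `embed_sync` is a hypothetical stand-in for the model call, and `analyze` is a hypothetical handler. Whether this helps in practice depends on the inference library releasing the GIL during its native computation.

```python
import asyncio
import time


def embed_sync(text: str) -> list[float]:
    """Hypothetical stand-in for the CPU-bound embedding call."""
    time.sleep(0.05)  # simulate inference latency
    return [float(len(text))]


async def analyze(text: str) -> list[float]:
    # Run the blocking call in a worker thread so the event loop
    # can keep serving other requests in the meantime.
    return await asyncio.to_thread(embed_sync, text)


async def main() -> list[list[float]]:
    # Two requests handled concurrently; results keep the call order.
    return await asyncio.gather(analyze("first"), analyze("second text"))


results = asyncio.run(main())
print(results)
```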

FAISS over brute-force NumPy: With a small reference corpus (~20 figures), brute-force NumPy would be fast enough, but FAISS keeps the search interface unchanged if the corpus grows. Index build time at startup is negligible.

4. Tech Stack

Component	Technology
Backend	Python 3.12, FastAPI, Uvicorn
NLP Model	`sentence-transformers` (multilingual-mpnet)
Vector Search	FAISS
Similarity	Cosine similarity + PCA projection
Validation	Pydantic v2
Logging	Loguru
Frontend	Vanilla JS, Nginx (served separately)
Container	Docker multi-stage, non-root user

5. Running Locally

Without Docker

git clone https://github.com/Aliipou/culture-identifier.git
cd culture-identifier
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
uvicorn backend.app.main:app --host 0.0.0.0 --port 8000 --reload

API available at http://localhost:8000 — interactive docs at http://localhost:8000/docs.

With Docker Compose

docker compose up --build

Backend on port 8000, frontend (Nginx) on port 3000.

6. API Example

Analyze a text sample:

curl -X POST http://localhost:8000/api/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The absurdity of existence does not negate our freedom to choose. On the contrary, it is precisely because nothing is predetermined that every choice carries its full weight.",
    "top_k": 3,
    "mode": "standard"
  }'

Example response:

{
  "matches": [
    {
      "name": "Albert Camus",
      "similarity": 0.91,
      "themes": ["existential", "rational"],
      "style_summary": "Existential framing, short declarative sentences, use of paradox"
    },
    {
      "name": "Jean-Paul Sartre",
      "similarity": 0.84,
      "themes": ["existential", "political"]
    }
  ],
  "projection": [...],
  "processing_time_ms": 42.3
}
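A client might build the request body and read the response along these lines. This is a sketch based only on the example above, not a documented schema: the `Match` class and both helper functions are hypothetical, `style_summary` is treated as optional because the second match in the example omits it, and the sample body leaves out the `projection` field, whose shape the example does not show.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Match:
    name: str
    similarity: float
    themes: list[str] = field(default_factory=list)
    style_summary: Optional[str] = None  # absent on some matches in the example


def build_payload(text: str, top_k: int = 3, mode: str = "standard") -> bytes:
    """Serialize the request body used in the curl example."""
    return json.dumps({"text": text, "top_k": top_k, "mode": mode}).encode("utf-8")


def parse_matches(body: str) -> list[Match]:
    """Turn the JSON response into Match objects, tolerating missing fields."""
    data = json.loads(body)
    return [
        Match(
            name=m["name"],
            similarity=m["similarity"],
            themes=m.get("themes", []),
            style_summary=m.get("style_summary"),
        )
        for m in data.get("matches", [])
    ]


# Sample body copied from the example response, without "projection".
sample = """
{
  "matches": [
    {"name": "Albert Camus", "similarity": 0.91,
     "themes": ["existential", "rational"],
     "style_summary": "Existential framing, short declarative sentences, use of paradox"},
    {"name": "Jean-Paul Sartre", "similarity": 0.84,
     "themes": ["existential", "political"]}
  ],
  "processing_time_ms": 42.3
}
"""

matches = parse_matches(sample)
best = max(matches, key=lambda m: m.similarity)
print(best.name, best.similarity)
```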

Health check:

curl http://localhost:8000/api/health
# {"status": "ok"}

7. Limitations

Language bias: The embedding model handles multilingual input but was trained primarily on English and major European languages. Short or highly idiomatic texts in less-represented languages will produce less reliable matches.
Small reference corpus: ~20 cultural figures is enough to show the concept but not enough for confident disambiguation. Figures with similar styles (e.g., Sartre and de Beauvoir) will produce close similarity scores.
No stylometric features: Matches are purely embedding-based. Classic stylometric signals — sentence length distribution, function word frequencies, punctuation patterns — are not explicitly modeled; they influence the embedding only implicitly.
French-intellectual scope: The corpus covers 19th–20th century French philosophy and literature. Writers from other traditions will find their closest match within this set, which may not be meaningful.
Cold start: The model (~400MB) and FAISS index load at startup. First boot takes 30–60 seconds depending on hardware.
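To make the "No stylometric features" point concrete, the signals named there could be computed roughly as follows. This is an illustrative sketch using only the standard library and is not part of the project: the function name, the returned keys, and the tiny English function-word set are assumptions chosen for the example.

```python
import re
from collections import Counter
from statistics import mean

# Tiny illustrative set; a real stylometric study would use a much longer list.
FUNCTION_WORDS = {"the", "of", "and", "to", "a", "in", "that", "it", "is", "not"}
WORD_RE = re.compile(r"[A-Za-zÀ-ÿ']+")


def stylometric_features(text: str) -> dict[str, float]:
    """Compute three simple style signals from raw text."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = WORD_RE.findall(text.lower())
    sentence_lengths = [len(WORD_RE.findall(s)) for s in sentences]
    counts = Counter(words)
    function_total = sum(counts[w] for w in FUNCTION_WORDS)
    commas = text.count(",")
    return {
        "mean_sentence_length": mean(sentence_lengths) if sentence_lengths else 0.0,
        "function_word_ratio": function_total / max(len(words), 1),
        "commas_per_sentence": commas / max(len(sentences), 1),
    }


features = stylometric_features("It is absurd. Yet we choose, and we act.")
print(features)
```

Features like these could be concatenated with, or compared alongside, the embedding similarity, which is the kind of explicit modeling the current embedding-only approach does not do.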

License

MIT
