A multi-feature NLP application built with Streamlit and powered by modern Transformer models. Supports both English and Portuguese (PT-BR).
| Feature | Description | Models / Libraries |
|---|---|---|
| 📝 Text Summarization | Abstractive summarization via Hugging Face Inference API | BART-large-CNN (EN), PTT5-base-summ-xlsum (PT-BR) |
| 🏷️ Named Entity Recognition | Entity extraction with displaCy visualization and CSV export | spaCy en_core_web_sm (18 entity types), pt_core_news_sm (4 types) |
| ✂️ Text Chunking | Fixed-size and semantic chunking with side-by-side comparison | LangChain RecursiveCharacterTextSplitter, HF embeddings |
| 🔗 Semantic Similarity | Cosine similarity between texts with language-aware model selection | BERTimbau (PT-BR), MiniLM / MPNet (multilingual) |
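The similarity feature compares texts by the cosine of the angle between their embedding vectors. Below is a minimal pure-Python sketch of that metric. It assumes the embeddings have already been produced by one of the sentence-transformer models listed above. The toy vectors are hypothetical stand-ins for real embeddings, and the app itself may compute this with NumPy instead.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical low-dimensional "embeddings"; real sentence-transformer
# vectors have hundreds of dimensions.
emb_cat = [0.2, 0.8, 0.1]
emb_kitten = [0.25, 0.75, 0.05]
emb_invoice = [0.9, -0.1, 0.3]

print(f"{cosine_similarity(emb_cat, emb_kitten):.3f}")   # close to 1: similar texts
print(f"{cosine_similarity(emb_cat, emb_invoice):.3f}")  # much lower: unrelated texts
```

Because cosine similarity ignores vector length, two texts score as similar when their embeddings point in the same direction, regardless of text length.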
```
app.py                  # Entry point: st.navigation multi-page routing
pages/
  summarization.py      # HF Inference API summarization
  ner.py                # spaCy NER + displaCy rendering
  chunking.py           # Fixed-size & semantic chunking
  similarity.py         # Sentence-transformer similarity
utils/
  config.py             # Centralized models, colors, thresholds, example texts
  hf_client.py          # Shared InferenceClient with retry logic
  spacy_models.py       # Cached spaCy model loading
```
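`hf_client.py` wraps a shared `InferenceClient` with retry logic, but the exact policy is not described here. The following is a minimal sketch under that assumption. It applies exponential backoff with jitter around any zero-argument call, for example a lambda that invokes a client method. `call_with_retry` and its parameters are hypothetical names, not part of `huggingface_hub`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, 2*base_delay, 4*base_delay, ... plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


# Usage with a stand-in for a flaky API call (e.g. a model that is still loading).
attempts = {"n": 0}


def flaky_summarize() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("model is still loading")
    return "short summary"


result = call_with_retry(flaky_summarize, sleep=lambda seconds: None)  # skip real waits
print(result, "after", attempts["n"], "attempts")
```

Injecting `sleep` keeps the helper easy to test without real delays. Narrowing `retry_on` to transient errors (timeouts, connection failures) avoids retrying requests that can never succeed, such as an invalid token.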
- Streamlit: multi-page UI with `st.navigation`
- Hugging Face Inference API: serverless model inference (free tier)
- spaCy 3.7: NER pipelines + displaCy visualization
- LangChain: text splitters for chunking strategies
- NumPy / Pandas: similarity metrics and data export
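LangChain's `RecursiveCharacterTextSplitter` is more elaborate than a plain window. By default it tries paragraph breaks, then line breaks, then spaces before it falls back to single characters. The simplified character-window splitter below is a stand-in, not LangChain's algorithm. It only illustrates what fixed-size chunking with `chunk_size` and `chunk_overlap` means. `fixed_size_chunks` is a hypothetical helper.

```python
def fixed_size_chunks(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so context that
    straddles a boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(fixed_size_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Character windows can cut words and sentences in half, which is why separator-aware and semantic strategies tend to give more coherent chunks.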
- Python 3.11+
- A Hugging Face API token (free)
```bash
git clone https://github.com/marinaramalhete/NLP-Text-Summary-App.git
cd NLP-Text-Summary-App
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Create a `.streamlit/secrets.toml` file with your HF token:

```toml
HF_TOKEN = "hf_..."
```

Run the app:

```bash
streamlit run app.py
```

This app is designed for Streamlit Community Cloud. To deploy your own instance:
- Push the repo to GitHub
- Go to share.streamlit.io and connect your repo
- Add `HF_TOKEN` in the app's Secrets settings
- Deploy