NER & POS Tagging 📝✨

spaCy named-entity recogniser + PyTorch GRU part-of-speech tagger for automatic résumé parsing

Overview

Recruiters spend too much time pulling structured data out of free-form CVs.
This repo tackles that in two steps:

| Step | Model | Dataset | Goal |
|------|-------|---------|------|
| 1. NER | spaCy custom NER | Entities (DataTurks) | Find names, skills, colleges, emails, etc. |
| 2. POS | GRU sequence tagger in PyTorch | BatteryData POS (Hugging Face Datasets) | Provide syntactic features for downstream parsers. |
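Before the spaCy NER can be trained, the DataTurks résumé annotations have to become spaCy-style `(text, {"entities": [(start, end, label)]})` pairs. The sketch below assumes the common DataTurks JSON-lines layout (`content`, plus `annotation[].label` and `annotation[].points[]` with an inclusive `end` offset); the function names are hypothetical, not taken from the notebook. It also trims whitespace at span edges and drops overlapping spans, since spaCy cannot store overlapping entities on a `Doc` and whitespace-padded spans tend not to align with tokens.

```python
import json

# Assumption: DataTurks JSON-lines export, one record per line, e.g.
# {"content": "...", "annotation": [{"label": ["Name"],
#   "points": [{"start": 0, "end": 7, "text": "..."}]}]}
# where "end" is an INCLUSIVE character offset.
Entity = tuple[int, int, str]  # (start_char, end_char_exclusive, label)


def drop_overlaps(spans: list[Entity]) -> list[Entity]:
    """Keep longer spans first and discard any span that overlaps a kept one."""
    kept: list[Entity] = []
    # Longest first; among equal lengths, the earlier start wins.
    for span in sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True):
        if all(span[1] <= k[0] or span[0] >= k[1] for k in kept):
            kept.append(span)
    return sorted(kept)


def dataturks_line_to_spacy(line: str) -> tuple[str, dict[str, list[Entity]]]:
    """Convert one (assumed) DataTurks record into a spaCy-style training pair."""
    record = json.loads(line)
    text = record["content"]
    spans: list[Entity] = []
    for ann in record.get("annotation") or []:  # "annotation" may be null
        labels = ann.get("label") or []
        if isinstance(labels, str):
            labels = [labels]
        if not labels:
            continue
        for point in ann.get("points") or []:
            start, end = point["start"], point["end"] + 1  # inclusive -> exclusive
            # Trim whitespace at the span edges so spans line up with tokens.
            while start < end and text[start].isspace():
                start += 1
            while end > start and text[end - 1].isspace():
                end -= 1
            if start < end:
                spans.append((start, end, labels[0]))
    return text, {"entities": drop_overlaps(spans)}
```

Each converted pair could then be turned into a training example with spaCy's `Example.from_dict(nlp.make_doc(text), annotations)`.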

The notebook (src/pos_ner.ipynb) walks through data preparation, training, evaluation, and saving of the trained artefacts.
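For the GRU tagger, data preparation usually means mapping tokens and tags to integer ids and padding each batch to a common length. A stdlib-only sketch under that assumption (helper names are hypothetical; the notebook may tokenise, lowercase, or batch differently). Padded tag positions use -100, the default `ignore_index` of PyTorch's `nn.CrossEntropyLoss`, so padding does not contribute to the loss.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_vocab(sentences: list[list[str]], min_freq: int = 1) -> dict[str, int]:
    """Map each token seen at least `min_freq` times to an id; 0 = PAD, 1 = UNK."""
    counts = Counter(tok for sent in sentences for tok in sent)
    itos = [PAD, UNK] + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: i for i, tok in enumerate(itos)}


def build_tagset(tag_seqs: list[list[str]]) -> dict[str, int]:
    """Map each POS tag to a contiguous id (padding uses IGNORE_INDEX instead)."""
    return {t: i for i, t in enumerate(sorted({t for seq in tag_seqs for t in seq}))}


def encode_batch(
    batch: list[tuple[list[str], list[str]]],
    vocab: dict[str, int],
    tagset: dict[str, int],
) -> tuple[list[list[int]], list[list[int]], list[int]]:
    """Turn [(tokens, tags), ...] into right-padded id lists plus true lengths."""
    max_len = max(len(tokens) for tokens, _ in batch)
    token_ids, tag_ids, lengths = [], [], []
    for tokens, tags in batch:
        pad = max_len - len(tokens)
        # Unknown words fall back to the UNK id.
        token_ids.append([vocab.get(t, vocab[UNK]) for t in tokens] + [vocab[PAD]] * pad)
        tag_ids.append([tagset[t] for t in tags] + [IGNORE_INDEX] * pad)
        lengths.append(len(tokens))
    return token_ids, tag_ids, lengths
```

The `lengths` list is what `torch.nn.utils.rnn.pack_padded_sequence` expects if the GRU is run over packed sequences.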

Project Structure

.
├── README.md
├── requirements.txt
├── run_notebook.py      # script that executes the main notebook
└── src/
    ├── pos_ner.ipynb
    ├── dataset
    └── trained_models

Setup

# 1. clone
git clone https://github.com/YOUR-USER/ner-pos-tagger.git
cd ner-pos-tagger
# 2. create a virtual environment and install dependencies
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# 3. execute the notebook end-to-end
python run_notebook.py
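`run_notebook.py` is described only as a script that runs the main notebook; its contents are not shown here. One plausible minimal version, assuming it shells out to the `jupyter nbconvert` CLI (which requires `nbconvert` to be installed), is sketched below; the constant and function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical reimplementation: the real run_notebook.py may differ.
NOTEBOOK = Path("src") / "pos_ner.ipynb"


def build_command(notebook: Path, timeout: int = -1) -> list[str]:
    """nbconvert CLI call that executes the notebook and saves outputs in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        f"--ExecutePreprocessor.timeout={timeout}",  # -1 disables the per-cell timeout
        str(notebook),
    ]


def main() -> int:
    """Run the notebook; return nbconvert's exit code (1 if the file is missing)."""
    if not NOTEBOOK.exists():
        print(f"notebook not found: {NOTEBOOK}", file=sys.stderr)
        return 1
    return subprocess.run(build_command(NOTEBOOK), check=False).returncode
```

Ending the file with `raise SystemExit(main())` makes it runnable as `python run_notebook.py`. `--inplace` writes cell outputs back into the notebook, so training logs and metrics are saved alongside the code.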

Results

1. Résumé NER (entity-level)

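| Entity | Precision | Recall | F1 |
|--------|-----------|--------|----|
| Name | 0.973 | 0.766 | 0.857 |
| Email Address | 0.800 | 0.778 | 0.789 |
| College Name | 0.429 | 0.350 | 0.385 |
| Skills | 0.261 | 0.273 | 0.267 |
| Designation | 0.613 | 0.355 | 0.450 |
| Location | 0.622 | 0.299 | 0.404 |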
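The NER scores are entity-level: a predicted span counts as a true positive only if its start offset, end offset, and label all match a gold span. A minimal stdlib sketch of that convention follows (the helper is hypothetical; if the notebook used spaCy's `nlp.evaluate()`, the per-label figures come from its `ents_per_type` output, which also uses exact span matching).

```python
from collections import defaultdict

Span = tuple[int, int, str]  # (start_char, end_char, label)


def entity_prf(gold: list[set[Span]], pred: list[set[Span]]) -> dict[str, tuple[float, float, float]]:
    """Per-label precision, recall and F1 with exact (start, end, label) matching."""
    tp, n_gold, n_pred = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):  # one (gold, predicted) pair per document
        for _, _, label in g:
            n_gold[label] += 1
        for _, _, label in p:
            n_pred[label] += 1
        for _, _, label in g & p:  # exact matches only; partial overlaps score 0
            tp[label] += 1
    scores = {}
    for label in sorted(set(n_gold) | set(n_pred)):
        prec = tp[label] / n_pred[label] if n_pred[label] else 0.0
        rec = tp[label] / n_gold[label] if n_gold[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = (prec, rec, f1)
    return scores
```

Under this convention a Skills prediction that covers only part of the gold span counts as both a false positive and a false negative, which is one reason long, loosely bounded entity types score lower than short, well-delimited ones such as names and emails.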

2. POS Tagging

| Model | Accuracy | Precision | Recall | F1 |
|-------|----------|-----------|--------|----|
| GRU (ours) | 89.41% | 0.898 | 0.894 | 0.893 |
| Most-frequent-tag baseline | 13.95% | n/a | n/a | n/a |
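The notebook's exact baseline and averaging choices are not shown here, but two readings fit the numbers. A 13.95% accuracy is consistent with always predicting the single most frequent training tag, and recall (0.894) matching accuracy (89.41%) is consistent with support-weighted averaging (as in scikit-learn's `average="weighted"`), because weighted recall reduces to accuracy. A stdlib sketch under both assumptions, with hypothetical helper names:

```python
from collections import Counter


def majority_tag_baseline(train_tags: list[list[str]], test_tags: list[list[str]]) -> tuple[str, float]:
    """Always predict the single most frequent training tag; return it and test accuracy."""
    majority = Counter(t for seq in train_tags for t in seq).most_common(1)[0][0]
    flat = [t for seq in test_tags for t in seq]
    return majority, sum(t == majority for t in flat) / len(flat)


def weighted_prf(gold: list[str], pred: list[str]) -> tuple[float, float, float]:
    """Support-weighted precision, recall and F1 over flattened token tags."""
    support = Counter(gold)
    predicted = Counter(pred)
    correct = Counter(g for g, p in zip(gold, pred) if g == p)
    p_avg = r_avg = f_avg = 0.0
    for label, n_true in support.items():  # labels absent from gold get zero weight
        prec = correct[label] / predicted[label] if predicted[label] else 0.0
        rec = correct[label] / n_true
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = n_true / len(gold)
        p_avg += weight * prec
        r_avg += weight * rec  # sums to (total correct) / (total tokens) = accuracy
        f_avg += weight * f1
    return p_avg, r_avg, f_avg
```

A per-word most-frequent-tag baseline (each word gets the tag it most often carried in training) is a different, much stronger baseline; the two should not be confused when comparing against other work.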

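Take-away: the neural POS tagger lifts accuracy by about 75 pp over the naive baseline (89.41% vs 13.95%), and the spaCy NER reaches up to 0.86 F1 on person names, while Skills (0.27 F1) and College Name (0.39 F1) remain the weakest entity types.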
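Both headline figures can be checked against the tables: the accuracy gain is a plain difference in percentage points, and each F1 value is the harmonic mean of the precision P and recall R beside it (shown here for Name):

```latex
\begin{aligned}
\Delta_{\text{acc}} &= 89.41\% - 13.95\% = 75.46\ \text{pp} \\
F_1 &= \frac{2PR}{P + R}
     = \frac{2 \times 0.973 \times 0.766}{0.973 + 0.766}
     \approx 0.857
\end{aligned}
```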

Road-map

Hyper-parameter search for NER (dropout, LR scheduler)
CRF layer on top of GRU for POS
Export both models as a single REST/Gradio micro-service
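The CRF item above would replace the GRU's independent per-token argmax with Viterbi decoding over tag-transition scores. A minimal, framework-free sketch of the decoding step only follows; the function is hypothetical, and in a real model `emissions` would be the GRU's per-token tag scores, `transitions` a learned matrix, and training would additionally need the CRF log-likelihood.

```python
def viterbi(emissions: list[list[float]], transitions: list[list[float]]) -> tuple[list[int], float]:
    """Highest-scoring tag path for one sentence.

    emissions[t][j]   : score of tag j at token t (e.g. GRU outputs)
    transitions[i][j] : score of moving from tag i to tag j (learned in a CRF)
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag so far
    backpointers: list[list[int]] = []
    for emit in emissions[1:]:
        prev_best, new_score = [], []
        for j in range(n_tags):
            # Best previous tag to transition from into tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            prev_best.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        backpointers.append(prev_best)
        score = new_score
    # Trace back from the best final tag.
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for prev_best in reversed(backpointers):
        path.append(prev_best[path[-1]])
    path.reverse()
    return path, score[last]
```

Decoding costs O(n · k²) for n tokens and k tags, so for a POS tagset it adds little over the GRU forward pass.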

Contributing

PRs are very welcome! Please open an issue to discuss major changes first.

Fork → Commit → Pull Request
Format with black and lint with ruff
Write or update unit tests where sensible
