
ArminRmt/NER-POS-Tagger


NER & POS Tagging 📝✨

A spaCy named-entity recogniser plus a PyTorch GRU part-of-speech tagger for automatic parsing of free-form CVs.

Overview

Recruiters spend too much time pulling structured data out of free-form CVs.
This repo tackles that in two steps:

| Step | Model | Dataset | Goal |
|------|-------|---------|------|
| 1. NER | spaCy custom NER | Entities (DataTurks ↗︎) | Find names, skills, colleges, emails, etc. |
| 2. POS | GRU sequence tagger in PyTorch | BatteryData POS (HF Datasets ↗︎) | Provide syntactic features for downstream parsers. |
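To make the second row concrete, here is a minimal sketch of what a GRU sequence tagger can look like in PyTorch. The class name `GRUTagger`, the bidirectional GRU, and all sizes (vocabulary, embedding, hidden, number of tags) are illustrative assumptions, not the notebook's actual configuration.

```python
import torch
from torch import nn


class GRUTagger(nn.Module):
    """Embedding -> bidirectional GRU -> per-token linear classifier (hypothetical layout)."""

    def __init__(self, vocab_size, num_tags, emb_dim=64, hidden_dim=128, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=pad_idx)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Forward and backward states are concatenated, hence 2 * hidden_dim.
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) tensor of integer word indices
        embedded = self.embedding(token_ids)
        outputs, _ = self.gru(embedded)
        return self.classifier(outputs)  # (batch, seq_len, num_tags)


# Toy forward/backward pass on random data with made-up sizes.
model = GRUTagger(vocab_size=100, num_tags=17)
tokens = torch.randint(1, 100, (4, 12))
tags = torch.randint(0, 17, (4, 12))
logits = model(tokens)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 17), tags.reshape(-1))
loss.backward()
```

The model emits one score vector per token, so the loss is computed after flattening the batch and sequence dimensions. With padded batches, passing `ignore_index` to `nn.CrossEntropyLoss` keeps padding positions out of the loss.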

The notebook (pos_ner.ipynb) walks through data prep, training, evaluation and saving the trained artefacts.
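As an illustration of the data-prep step, the sketch below converts one annotated record into the `(text, {"entities": [...]})` shape commonly used for spaCy NER training data. It assumes the DataTurks JSON-lines export layout (`content`, `annotation`, `label`, `points` with an inclusive `end` offset). The function name `dataturks_to_spacy` and the sample record are hypothetical, and the notebook may prepare its data differently.

```python
import json


def dataturks_to_spacy(line):
    """Turn one JSON line into (text, {"entities": [(start, end, label), ...]})."""
    record = json.loads(line)
    text = record["content"]
    entities = []
    for annotation in record.get("annotation") or []:
        labels = annotation["label"]
        if isinstance(labels, str):
            labels = [labels]
        for point in annotation["points"]:
            for label in labels:
                # Assumed: the export uses an inclusive end offset,
                # while spaCy expects an exclusive one, hence the + 1.
                entities.append((point["start"], point["end"] + 1, label))
    return text, {"entities": sorted(entities)}


# Made-up record in the assumed export layout.
sample = json.dumps({
    "content": "Jane Doe, Python developer",
    "annotation": [
        {"label": ["Name"], "points": [{"start": 0, "end": 7, "text": "Jane Doe"}]},
        {"label": ["Skills"], "points": [{"start": 10, "end": 15, "text": "Python"}]},
    ],
})
text, annotations = dataturks_to_spacy(sample)
```

Overlapping or whitespace-padded spans usually need cleaning before spaCy will accept them. That step is left out here.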


Project Structure

.
├── README.md
├── requirements.txt
├── run_notebook.py      # script that runs the main .ipynb file
└── src/
    ├── pos_ner.ipynb
    ├── dataset
    └── trained_models


Setup

# 1. clone
git clone https://github.com/YOUR-USER/ner-pos-tagger.git
cd ner-pos-tagger
# 2. python env
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt   
# 3. run script
python3 run_notebook.py
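For readers curious how a notebook can be run from a script, here is a minimal sketch that executes `src/pos_ner.ipynb` in place through the `jupyter nbconvert` command line. It is not the repository's actual `run_notebook.py`. The helper names `build_command` and `run_notebook` are hypothetical, and it assumes `jupyter` and `nbconvert` are installed in the active environment.

```python
import subprocess
from pathlib import Path

# Notebook location taken from the project structure in this README.
NOTEBOOK = Path("src") / "pos_ner.ipynb"


def build_command(notebook):
    """Build the nbconvert command that executes a notebook and saves it in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        str(notebook),
    ]


def run_notebook(notebook=NOTEBOOK):
    """Run every cell; raises CalledProcessError if a cell fails."""
    subprocess.run(build_command(notebook), check=True)
```

Calling `run_notebook()` from the repository root executes all cells and writes their outputs back into the same file.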

Results


1. Résumé NER (entity-level)

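| Entity | Precision | Recall | F1 score |
|--------|-----------|--------|----------|
| Name | 0.973 | 0.766 | 0.857 |
| Email Address | 0.800 | 0.778 | 0.789 |
| College Name | 0.429 | 0.350 | 0.385 |
| Skills | 0.261 | 0.273 | 0.267 |
| Designation | 0.613 | 0.355 | 0.450 |
| Location | 0.622 | 0.299 | 0.404 |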
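Entity-level scores like these are typically computed by comparing predicted spans with gold spans. The sketch below assumes exact-match scoring, where a prediction counts only if start, end, and label all agree. The notebook's scorer may differ, and the function names and sample spans are made up for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def entity_prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Made-up spans: one exact match, one boundary error, one missed entity.
gold = [(0, 8, "Name"), (10, 16, "Skills"), (30, 45, "College Name")]
pred = [(0, 8, "Name"), (10, 14, "Skills")]
precision, recall, f1 = entity_prf(gold, pred)
```

Filtering both lists to a single label before calling `entity_prf` gives per-entity rows like those in the table. As a consistency check, `f1_score(0.973, 0.766)` rounds to the 0.857 reported for Name.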

2. POS Tagging

| Model | Accuracy | Precision | Recall | F1 |
|-------|----------|-----------|--------|----|
| GRU (ours) | 89.41 % | 0.898 | 0.894 | 0.893 |
| Most-frequent-tag baseline | 13.95 % | -- | -- | -- |
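One plausible reading of the most-frequent-tag baseline is to give every test token the single most common tag in the training data. That reading is an assumption: the notebook could instead use a per-word most-frequent tag. The function name and the toy tag lists below are made up.

```python
from collections import Counter


def majority_tag_baseline(train_tags, test_tags):
    """Predict the most common training tag for every test token."""
    majority_tag, _ = Counter(train_tags).most_common(1)[0]
    correct = sum(tag == majority_tag for tag in test_tags)
    return majority_tag, correct / len(test_tags)


# Toy tag sequences, flattened across sentences.
train = ["NOUN", "VERB", "NOUN", "DET", "NOUN", "ADJ"]
test = ["NOUN", "DET", "VERB", "NOUN"]
tag, accuracy = majority_tag_baseline(train, test)
```

Under this reading the baseline's accuracy equals the share of test tokens carrying the majority tag, so it is low whenever tags are spread fairly evenly.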

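Take-away: the GRU POS tagger raises accuracy by about 75 percentage points over the most-frequent-tag baseline (13.95 % → 89.41 %). The spaCy NER model reaches 0.857 F1 on person names, its best entity type; the other entity types range from 0.267 (Skills) to 0.789 (Email Address).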



Road-map


  • Hyper-parameter search for NER (dropout, LR scheduler)

  • CRF layer on top of GRU for POS

  • Export both models as a single REST/Gradio micro-service



Contributing


PRs are very welcome! Please open an issue to discuss major changes first.

  1. Fork → Commit → Pull Request

  2. Follow black & ruff linting

  3. Write / update unit tests where sensible

