This proof of concept explores automated bias detection in textual content, specifically research project documents. The goal is to identify gender bias and discriminatory language so that reviewers and organisations can take corrective action before publication or hiring decisions.
The key research question: can we reliably detect subtle bias in text, and how do lightweight keyword methods compare to ML-based approaches?
The project was built incrementally, with each phase adding a new capability on top of the previous one.
The foundation: a TextPreprocessor class wrapping NLTK and spaCy to handle cleaning (URL/email/HTML removal), tokenisation, lemmatisation, stopword removal, POS tagging, and named entity recognition. This layer feeds clean text to every downstream component.
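A minimal sketch of the cleaning stage is below. The regex patterns and method names are illustrative; in the full implementation, tokenisation, lemmatisation, POS tagging, and NER are delegated to NLTK and spaCy rather than handled with plain regexes.

```python
import re


class TextPreprocessor:
    """Sketch of the cleaning layer. Lemmatisation, stopword removal,
    POS tagging and NER would be delegated to NLTK/spaCy in the real class."""

    URL_RE = re.compile(r"https?://\S+")
    EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
    HTML_RE = re.compile(r"<[^>]+>")

    def clean(self, text: str) -> str:
        # Strip URLs, email addresses and HTML tags, then collapse whitespace.
        for pattern in (self.URL_RE, self.EMAIL_RE, self.HTML_RE):
            text = pattern.sub(" ", text)
        return " ".join(text.split())

    def tokenise(self, text: str) -> list[str]:
        # Naive whitespace tokeniser as a stand-in for spaCy's tokenizer.
        return self.clean(text).lower().split()
```

Every downstream detector consumes the output of `clean()`/`tokenise()`, so noise removal only has to be solved once.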
The first detection approach (BiasDetector) uses curated keyword dictionaries and regex patterns to find:
- Gender bias — imbalances in male/female keyword counts, stereotypical role associations (e.g. leader→male, nurse→female), explicit discriminatory patterns.
- Discriminatory language — categorised by type: age, race, disability, appearance.
Dictionaries were built for both English and Spanish, stored in a shared bias_keywords.py module. A positive-context mechanism was added to discount false positives when phrases like "gender equality" or "diversity and inclusion" appear nearby.
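The mechanics can be sketched as follows. The dictionary fragments, the pattern list, the 0.5/0.5 weighting, and the discount factor are all illustrative stand-ins for the contents of `bias_keywords.py`, not the PoC's actual values:

```python
import re

# Illustrative fragments of the shared bias_keywords dictionaries.
GENDER_KEYWORDS = {"male": ["he", "his", "man", "men"],
                   "female": ["she", "her", "woman", "women"]}
DISCRIMINATORY_PATTERNS = [r"\bonly (men|women) should apply\b"]
POSITIVE_CONTEXT = ["gender equality", "diversity and inclusion"]


def keyword_bias_score(text: str) -> float:
    tokens = re.findall(r"[a-z']+", text.lower())
    male = sum(tokens.count(w) for w in GENDER_KEYWORDS["male"])
    female = sum(tokens.count(w) for w in GENDER_KEYWORDS["female"])
    total = male + female
    # Imbalance in gendered terms contributes to the score.
    imbalance = abs(male - female) / total if total else 0.0
    # Explicit discriminatory patterns contribute a fixed penalty each.
    explicit = sum(bool(re.search(p, text.lower()))
                   for p in DISCRIMINATORY_PATTERNS)
    score = min(1.0, 0.5 * imbalance + 0.5 * explicit)
    # Positive-context discounting: anti-bias phrases nearby reduce the score.
    if any(phrase in text.lower() for phrase in POSITIVE_CONTEXT):
        score *= 0.4  # illustrative discount factor
    return score
```

For example, "Only men should apply" scores 1.0, while "We promote gender equality for men and women" is discounted to 0.0 despite containing gendered terms.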
The second approach (MLBiasDetector) combines three signal sources:
- Fine-tuned classifier — valurank/distilroberta-bias, a DistilRoBERTa model fine-tuned for binary bias detection (English only).
- Zero-shot classification — BART (facebook/bart-large-mnli) for English and XLM-RoBERTa (joeddav/xlm-roberta-large-xnli) for Spanish, categorising text against bias labels.
- Keyword/pattern analysis — the same shared dictionaries from Phase 2, ensuring consistency between approaches.
The final bias score is a weighted combination of all three signals.
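The combination step might look like this sketch; the 0.5/0.3/0.2 weights are hypothetical, not the PoC's tuned values:

```python
def combined_bias_score(classifier: float, zero_shot: float, keyword: float,
                        weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted combination of the three signal sources, each in [0, 1]."""
    w1, w2, w3 = weights
    score = w1 * classifier + w2 * zero_shot + w3 * keyword
    # Normalise in case custom weights don't sum to 1.
    return score / (w1 + w2 + w3)
```

Keeping the weights as a parameter makes it cheap to experiment with how much trust to place in each signal.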
Two CLI scripts were created to demonstrate real-world usage on PDF documents:
- keyword_analysis.py — runs keyword detection per page
- ml_analysis.py — runs ML detection per page
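The per-page loop shared by both scripts can be sketched as below. The function operates on already-extracted page texts and takes the detector as a callable, so the same skeleton serves both scripts; the JSON layout shown is illustrative:

```python
import json
from typing import Callable


def analyse_pages(pages: list[str], detect: Callable[[str], float]) -> str:
    """Run a bias detector over each page and return a JSON report.

    With PyPDF2 the pages list would come from something like:
        reader = PyPDF2.PdfReader("document.pdf")
        pages = [p.extract_text() or "" for p in reader.pages]
    """
    results = [{"page": i + 1, "bias_score": round(detect(text), 3)}
               for i, text in enumerate(pages)]
    return json.dumps({"pages": results}, indent=2)
```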
| Decision | Rationale |
|---|---|
| Dual approach (keyword + ML) | Keywords are fast, transparent, and need no GPU; ML catches subtler patterns. Comparing both reveals detection coverage and agreement. |
| distilroberta-bias as fine-tuned model | Purpose-built for bias detection, lightweight enough for PoC use, good accuracy on English text. |
| BART + XLM-R for zero-shot | BART performs well on English zero-shot; XLM-R extends coverage to Spanish without fine-tuning. |
| Shared bias_keywords module | Single source of truth for keyword dictionaries ensures consistency between BiasDetector and MLBiasDetector. |
| Positive-context discounting | Reduces false positives when bias-related words appear in anti-bias contexts (e.g. "promoting gender equality"). |
| PDF per-page analysis | Many real documents are PDFs; per-page granularity lets reviewers pinpoint where bias occurs. |
The fine-tuned bias classifier (distilroberta-bias) only works for English. For Spanish, we had to rely on zero-shot classification with XLM-RoBERTa, which is less precise for domain-specific bias categories. Building separate keyword dictionaries for each language was also labour-intensive and would need native-speaker validation.
Early versions flagged any text mentioning gender-related terms as biased — including passages actively promoting equality. The positive-context phrase mechanism was introduced to address this, discounting scores by up to 60% when anti-bias language is present. Tuning these discount thresholds required iterative testing on real documents.
The two detection approaches don't always agree. Keywords catch explicit patterns (e.g., "only men should apply") that ML sometimes under-weights, while ML catches implicit bias that keywords miss entirely.
Transformer models (especially BART-large) require significant memory. The pipeline was designed to fall back to CPU gracefully, and the lightweight BiasDetector was maintained as a no-GPU alternative. Batch size configuration allows tuning for available hardware.
PDF text extraction (via PyPDF2) produces inconsistent results depending on document formatting — scanned PDFs, multi-column layouts, and embedded tables all degrade extraction quality. Per-page analysis helps contain these issues, but preprocessing still needs to handle noisy input gracefully.
Mapping continuous bias scores to discrete severity levels (none / low / medium / high / critical) required experimentation. Thresholds that worked well for English texts didn't generalise to Spanish texts, leading to language-aware severity mapping in the shared keywords module.
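The language-aware mapping amounts to per-language threshold tables; the cut-off values below are illustrative, not the tuned ones from the shared module:

```python
# Hypothetical per-language cut-offs for (low, medium, high, critical).
SEVERITY_THRESHOLDS = {
    "en": (0.2, 0.4, 0.6, 0.8),
    "es": (0.25, 0.45, 0.65, 0.85),
}
LEVELS = ("none", "low", "medium", "high", "critical")


def severity(score: float, language: str = "en") -> str:
    cuts = SEVERITY_THRESHOLDS[language]
    # The level index is the number of cut-offs the score meets or exceeds.
    return LEVELS[sum(score >= c for c in cuts)]
```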
The PoC is functional and demonstrates both detection approaches on real PDF documents. The sample scripts produce per-page analysis with JSON exports.
- Additional language support beyond English and Spanish.
- Fine-tuning a bias classifier on Spanish data.
- Interactive web dashboard for non-technical reviewers.
- Benchmark datasets for systematic accuracy evaluation.
- Domain-specific keyword dictionaries (e.g., medical, legal).