# Gendered Abuse Detection in Indic Languages 🌐


Online gender-based violence limits marginalized voices. Detection in Indic languages is hard due to limited data and linguistic complexity. This work builds better classifiers for improved abuse detection in such settings.

## Table of Contents

- [Introduction](#introduction)
- [Problem Statement](#problem-statement)
- [Dataset](#dataset)
- [Models](#models)
- [Installation](#installation)
- [Usage](#usage)
- [Results](#results)
- [Contributing](#contributing)
- [License](#license)
- [Contact](#contact)
- [Releases](#releases)

## Introduction

Gender-based violence is a pervasive problem online, and the voices of those most affected are often silenced. Detecting abuse in Indic languages poses unique challenges: linguistic diversity and a shortage of labeled resources complicate the development of effective detection systems.

This repository tackles these challenges by building stronger classifiers, with a focus on improving the detection of gendered abuse in Indic languages.

## Problem Statement

Existing models for detecting gender-based violence often perform poorly on Indic languages. The primary reasons include:

- **Limited data:** Labeled datasets for training are scarce.
- **Linguistic complexity:** Indic languages have diverse structures and scripts, which standard NLP models handle poorly.

By addressing these issues, we hope to enhance the detection of gendered abuse and provide better support for marginalized voices.

## Dataset

We use various datasets that include text from social media, forums, and other platforms where abuse may occur. The datasets are curated to include instances of gender-based violence.

### Data Sources

- Social media platforms
- Online forums
- Community reports

### Data Preparation

Data preprocessing involves:

- Tokenization
- Normalization
- Removing noise

This step ensures that the models receive clean and relevant data for training.
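A minimal sketch of these steps, assuming regex-based noise removal (URLs, mentions) and simple whitespace tokenization; the actual pipeline may use a subword tokenizer instead:

```python
import re
import unicodedata

def preprocess(text):
    """Normalize, strip noise, and tokenize a raw post."""
    text = unicodedata.normalize("NFC", text)        # normalization (important for Indic scripts)
    text = re.sub(r"https?://\S+|@\w+", " ", text)   # remove noise: URLs and @-mentions
    return text.lower().split()                      # tokenization (whitespace)

print(preprocess("Check this @user https://example.com NOW"))
# → ['check', 'this', 'now']
```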

## Models

We explore several models to improve detection accuracy.

### BERT

BERT (Bidirectional Encoder Representations from Transformers) has shown promise in understanding context in language. We fine-tune BERT for our specific task, allowing it to learn nuances in Indic languages.
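The fine-tuning setup can be sketched with Hugging Face Transformers. The tiny randomly-initialised config below is purely for illustration; in practice one would load pretrained multilingual weights (e.g. `bert-base-multilingual-cased` — a stand-in here, as the repository's actual checkpoint may differ):

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny random-weight BERT for illustration only. For real fine-tuning:
# model = BertForSequenceClassification.from_pretrained(
#     "bert-base-multilingual-cased", num_labels=2)
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = BertForSequenceClassification(config)

input_ids = torch.randint(0, 100, (4, 16))  # batch of 4 token-id sequences
labels = torch.randint(0, 2, (4,))          # binary abuse labels

out = model(input_ids=input_ids, labels=labels)
out.loss.backward()                         # one fine-tuning step (gradients only)
print(out.logits.shape)                     # (4, 2): one score per class
```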

### Convolutional Neural Networks

CNNs are effective in capturing local patterns in text. We adapt CNNs to analyze sequences of words, which helps in identifying abusive language.
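A minimal text-CNN sketch in PyTorch illustrating this idea — parallel 1-D convolutions over word embeddings capture n-gram patterns. Hyperparameters here are hypothetical, not the repository's actual configuration:

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """1-D convolutions over word embeddings capture local n-gram patterns."""
    def __init__(self, vocab_size, embed_dim=50, num_filters=16,
                 kernel_sizes=(2, 3, 4), num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # One conv per kernel size: bigram, trigram, 4-gram detectors
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, x):                      # x: (batch, seq_len) token ids
        e = self.embed(x).transpose(1, 2)      # (batch, embed_dim, seq_len)
        # Max-over-time pooling keeps the strongest match per filter
        pooled = [c(e).relu().max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))

model = TextCNN(vocab_size=1000)
logits = model(torch.randint(0, 1000, (8, 20)))
print(logits.shape)  # (8, 2)
```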

### GRU

Gated Recurrent Units (GRUs) are another option for sequence modeling. They help in understanding context over longer sequences, making them suitable for our needs.
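A minimal bidirectional-GRU classifier sketch in PyTorch (illustrative hyperparameters; the repository's actual architecture may differ). The final hidden states of the two directions summarise the whole sequence:

```python
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    """Bidirectional GRU; final hidden states summarise the sequence."""
    def __init__(self, vocab_size, embed_dim=50, hidden=32, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden, batch_first=True,
                          bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                 # x: (batch, seq_len) token ids
        _, h = self.gru(self.embed(x))    # h: (2, batch, hidden) — fwd + bwd
        h = torch.cat([h[0], h[1]], dim=1)
        return self.fc(h)

model = GRUClassifier(vocab_size=1000)
logits = model(torch.randint(0, 1000, (8, 20)))
print(logits.shape)  # (8, 2)
```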

## Installation

To set up the project, follow these steps:

1. Clone the repository:

   ```bash
   git clone https://github.com/Tanlouie/Gendered_Abuse_Detection_In_Indic-Languages.git
   ```

2. Navigate to the project directory:

   ```bash
   cd Gendered_Abuse_Detection_In_Indic-Languages
   ```

3. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

4. Ensure you have the necessary libraries:

   - PyTorch
   - Transformers
   - scikit-learn

## Usage

After installation, you can start using the models.

1. Load the model:

   ```python
   from model import load_model

   model = load_model('path_to_model')
   ```

2. Make predictions:

   ```python
   predictions = model.predict(input_text)
   ```

3. Evaluate the model:

   ```python
   from evaluator import evaluate

   results = evaluate(model, test_data)
   ```

## Results

We report results on a held-out validation set. The metrics include:

- **Accuracy:** the fraction of correct predictions.
- **Precision:** true positives divided by all positive predictions, TP / (TP + FP).
- **Recall:** true positives divided by all actual positives, TP / (TP + FN).

Our models show promising results, with improved accuracy in detecting gender-based violence in Indic languages.
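These metrics can be computed with scikit-learn. The labels below are made up purely to illustrate the formulas:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels for illustration: 1 = abusive, 0 = not abusive
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
# Here TP=3, TN=3, FP=1, FN=1

print(accuracy_score(y_true, y_pred))   # 0.75 = 6 correct / 8 total
print(precision_score(y_true, y_pred))  # 0.75 = 3 TP / (3 TP + 1 FP)
print(recall_score(y_true, y_pred))     # 0.75 = 3 TP / (3 TP + 1 FN)
```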

## Contributing

We welcome contributions to improve this project. If you have ideas or want to report issues, please follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and commit them.
  4. Push to your fork and submit a pull request.

## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Contact

For questions or feedback, feel free to reach out.

## Releases

For the latest updates and versions, please visit the [Releases](https://github.com/Tanlouie/Gendered_Abuse_Detection_In_Indic-Languages/releases) section, where you can find packaged downloads.

By improving detection methods, we can help amplify marginalized voices and address gender-based violence more effectively. Your support and contributions are invaluable in this mission.
