LLM with Relation Classifier for Document-Level Relation Extraction

Code for the paper "LLM with Relation Classifier for Document-Level Relation Extraction".

Abstract

Large language models (LLMs) have created a new paradigm for natural language processing. Despite their advancement, LLM-based methods still lag behind traditional approaches in document-level relation extraction (DocRE), a critical task for understanding complex entity relations within long contexts. This paper investigates the causes of this performance gap, identifying the dispersion of attention by LLMs due to entity pairs without relations as a key factor. We then introduce a novel classifier-LLM approach to DocRE. Specifically, the proposed approach begins with a classifier designed to select entity pair candidates that exhibit potential relations and then feeds them to an LLM for final relation classification. This method ensures that the LLM's attention is directed at relation-expressing entity pairs instead of those without relations during inference. Experiments on DocRE benchmarks reveal that our method significantly outperforms recent LLM-based DocRE models and narrows the performance gap with state-of-the-art BERT-based models.

Setup

Since the environments for Relation Candidate Proposal (RCP) and Relation Classification (RC) differ significantly, we recommend setting them up separately.

1. Environment for RCP

Please refer to the RCP folder for the specific requirements. You can set up the environment by running:

# Create a new environment
conda create -n rcp python=3.9
conda activate rcp

# Install dependencies
cd RCP
pip install -r requirements.txt

⚠️ Note: Since the required version of apex (apex==0.1) is relatively old, you may encounter compatibility or installation issues during setup. In such cases, you can refer to this link for solutions.

2. Environment for RC

The RC stage relies on LLaMA-Factory. Please enter the RC directory, clone the repository, and install the environment as follows:

# Create a new environment
conda create -n rc python=3.10
conda activate rc

# Install dependencies
cd RC
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
# Quote the extras so shells such as zsh do not expand the brackets
pip install -e ".[metrics]"

Data

All datasets and fine-tuning data required for this project are provided in the data directory. Please refer to the data folder for detailed information regarding the file structure and contents.

Run Experiments

The experiment pipeline consists of three main stages: RCP, RC, and Evaluation.
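The division of labour between the first two stages can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: `propose_candidates`, `classify_candidates`, the `threshold` value, and the toy scoring and labelling functions are all hypothetical stand-ins for the RoBERTa classifier and the fine-tuned LLM.

```python
from itertools import permutations
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def propose_candidates(
    entities: List[str],
    score_pair: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Pair]:
    """Stage 1 (RCP): keep only ordered entity pairs scored as likely related."""
    return [
        (head, tail)
        for head, tail in permutations(entities, 2)
        if score_pair(head, tail) >= threshold
    ]


def classify_candidates(
    candidates: List[Pair],
    classify: Callable[[str, str], str],
) -> Dict[Pair, str]:
    """Stage 2 (RC): request a relation label for each surviving pair."""
    return {pair: classify(*pair) for pair in candidates}


# Toy stand-ins for the real models (hypothetical, for illustration only).
def toy_score(head: str, tail: str) -> float:
    return 0.9 if (head, tail) == ("Paris", "France") else 0.1


def toy_classify(head: str, tail: str) -> str:
    return "capital of"


candidates = propose_candidates(["Paris", "France", "2024"], toy_score)
predictions = classify_candidates(candidates, toy_classify)
print(predictions)
```

Of the six ordered pairs in this toy document, only one reaches the second stage, which is the effect the abstract describes: the LLM never sees pairs that the classifier judges to be unrelated.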

1. Relation Candidate Proposal (RCP)

Navigate to the RCP directory and run the RoBERTa model:

cd RCP
bash run_roberta.sh

Note: The training and inference modes share the same script. You can switch modes by modifying the --load_path argument in run_roberta.sh.

After obtaining the model predictions, convert the results into the format required for the next step (step 2):

python generate_for_step2.py

This script generates a JSON output containing the data and attaches the corresponding title to each entry to facilitate subsequent evaluation.
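To illustrate what attaching a title to each entry might look like, here is a small sketch. The field names (`doc_index`, `head`, `tail`, `title`) and the function `attach_titles` are assumptions made for the example; the actual schema produced by generate_for_step2.py may differ, so check the script and the data folder before relying on any of these keys.

```python
import json
from typing import Dict, List


def attach_titles(documents: List[Dict], predictions: List[Dict]) -> List[Dict]:
    """Copy each document's title onto the predictions that reference it by index."""
    merged = []
    for pred in predictions:
        doc = documents[pred["doc_index"]]  # hypothetical key linking back to the document
        merged.append({**pred, "title": doc["title"]})
    return merged


# Toy inputs (hypothetical): two documents and two candidate pairs.
documents = [{"title": "Doc A"}, {"title": "Doc B"}]
predictions = [
    {"doc_index": 1, "head": "e0", "tail": "e2"},
    {"doc_index": 0, "head": "e1", "tail": "e3"},
]

entries = attach_titles(documents, predictions)
print(json.dumps(entries, indent=2))
```

Carrying the title along with each entry means the evaluation stage can match a prediction to its source document without depending on the order of the entries.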

2. Relation Classification (RC)

Fine-tuning

First, fine-tune your model using the data provided in the data folder (alternatively, you can use construct_prompt.py to construct the data yourself).

We use LLaMA-Factory for fine-tuning. Follow these steps:

Prepare Data: Copy your fine-tuning data to the LLaMA-Factory/data folder.
Register Data: Add your dataset definition to LLaMA-Factory/data/dataset_info.json.
Configure: Create a config file (for example, LLaMA-Factory/examples/train_lora/your_config.yaml) based on the template found in LLaMA-Factory/examples/train_lora/llama3_lora_sft.yaml.
Run Training:

cd LLaMA-Factory
llamafactory-cli train examples/train_lora/your_config.yaml
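The "Register Data" step above can also be scripted. The sketch below adds one entry to a dataset_info.json file using the Alpaca-style column mapping (`instruction`, `input`, `output`). The dataset name `docre_rc_train`, its file name, and the helper `register_dataset` are hypothetical, and the column mapping is an assumption about how the fine-tuning data is laid out, so adjust it to match the files in the data folder. The example runs against a temporary directory and does not touch a real LLaMA-Factory checkout.

```python
import json
import tempfile
from pathlib import Path


def register_dataset(info_path: Path, name: str, file_name: str) -> dict:
    """Add (or overwrite) one dataset entry and return the full registry."""
    if info_path.exists():
        registry = json.loads(info_path.read_text(encoding="utf-8"))
    else:
        registry = {}
    registry[name] = {
        "file_name": file_name,
        # Alpaca-style column mapping (assumed); change it if your data uses other keys.
        "columns": {"prompt": "instruction", "query": "input", "response": "output"},
    }
    info_path.write_text(
        json.dumps(registry, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return registry


# Demonstrated on a temporary directory so nothing real is modified.
with tempfile.TemporaryDirectory() as tmp:
    info_path = Path(tmp) / "dataset_info.json"
    info_path.write_text(
        json.dumps({"existing_dataset": {"file_name": "existing.json"}}),
        encoding="utf-8",
    )
    registry = register_dataset(info_path, "docre_rc_train", "docre_rc_train.json")
    reloaded = json.loads(info_path.read_text(encoding="utf-8"))
    print(sorted(reloaded))
```

The name used as the key in dataset_info.json is the value to put in the `dataset` field of your training config.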

Inference

Once fine-tuning is complete, configure the inference settings by modifying infer/config.yaml. Then, run the inference script:

cd RC
bash infer.sh

3. Evaluation

Finally, map the plain text output from the LLM to a structured format suitable for evaluation, and then run the evaluation script.

# Convert LLM text output to structured format
python map_results.py

# Evaluate the results
python evaluation.py
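For readers who want to see the idea behind these two steps, the sketch below parses free-text output into triples and scores them with micro F1. It is not the logic of map_results.py or evaluation.py: the one-triple-per-line output format `(head, relation, tail)`, the regular expression, and the function names are assumptions made for illustration, and the repository's scripts may report additional metrics.

```python
import re
from typing import Set, Tuple

Triple = Tuple[str, str, str, str]  # (title, head, relation, tail)

# Hypothetical output format: one "(head, relation, tail)" per line.
TRIPLE_PATTERN = re.compile(r"\(\s*(.+?)\s*,\s*(.+?)\s*,\s*(.+?)\s*\)")


def parse_output(title: str, text: str) -> Set[Triple]:
    """Extract well-formed triples and silently skip anything else."""
    return {(title, h, r, t) for h, r, t in TRIPLE_PATTERN.findall(text)}


def micro_f1(predicted: Set[Triple], gold: Set[Triple]) -> float:
    """Micro-averaged F1 over exact-match triples."""
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy LLM output (hypothetical): two triples and one stray sentence.
llm_text = (
    "(Paris, capital of, France)\n"
    "(Paris, located in, Germany)\n"
    "Some stray sentence."
)
predicted = parse_output("Doc A", llm_text)
gold = {
    ("Doc A", "Paris", "capital of", "France"),
    ("Doc A", "France", "contains", "Paris"),
}
score = micro_f1(predicted, gold)
print(score)
```

Including the document title in each triple keeps identical entity pairs from different documents from being counted as the same prediction.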

Citation

If you find this work useful, please cite:

@article{li2024llm,
  title={{LLM} with Relation Classifier for Document-Level Relation Extraction},
  author={Li, Xingzuo and Chen, Kehai and Long, Yunfei and Zhang, Min},
  journal={arXiv preprint arXiv:2408.13889},
  year={2024}
}
