LLM with Relation Classifier for Document-Level Relation Extraction

Code for the paper "LLM with Relation Classifier for Document-Level Relation Extraction".

Set up

Prepare Environment for the RC Stage

You can install the required packages for the RC stage by running the following commands:

conda create -n lmrc python=3.9
conda activate lmrc
pip install -r requirements.txt

Prepare Environment for the RCP Stage

For the RCP stage, the requirements are:

PyTorch
transformers
numpy
apex=0.1
opt-einsum=3.3.0
wandb
ujson
tqdm

Please refer to ATLOP for further information. Note that installing apex with pip reports an error; we recommend cloning the apex repository and installing it manually.
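Because apex is installed by hand, it can help to confirm that every RCP dependency is importable before training. The snippet below is a minimal sketch (not part of the repository) that uses only the standard library; the module names are the import names of the packages listed above (PyTorch is imported as `torch`, opt-einsum as `opt_einsum`). It only checks that each module can be found, not that apex was built with its CUDA extensions.

```python
import importlib.util

# Import names for the RCP requirements listed above
# (PyTorch -> torch, opt-einsum -> opt_einsum).
RCP_MODULES = ["torch", "transformers", "numpy", "apex", "opt_einsum", "wandb", "ujson", "tqdm"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in the current environment."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(RCP_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All RCP requirements are importable.")
```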

Datasets

The DocRED dataset can be downloaded by following the instructions at link. The Re-DocRED dataset can be downloaded from link (replace the original DocRED files with the 'revised' versions from Re-DocRED). The expected file structure is:

LMRC
 |-- dataset
 |    |-- docred
 |    |    |-- train_annotated.json        
 |    |    |-- train_distant.json
 |    |    |-- dev.json
 |    |    |-- test.json
 |    |    |-- rel_info.json
 |    |-- re-docred
 |    |    |-- train_revised.json        
 |    |    |-- dev_revised.json
 |    |    |-- test_revised.json
 |    |    |-- rel_info.json
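To catch a misplaced or missing file before running either stage, the tree above can be checked with a short script. This is a hypothetical helper, not shipped with the repository; it assumes it is run from the LMRC root so that the datasets live under `dataset/`.

```python
from pathlib import Path

# Expected files, mirroring the directory tree above.
EXPECTED_FILES = {
    "docred": ["train_annotated.json", "train_distant.json", "dev.json", "test.json", "rel_info.json"],
    "re-docred": ["train_revised.json", "dev_revised.json", "test_revised.json", "rel_info.json"],
}


def missing_dataset_files(root):
    """Return 'subdir/file' entries of expected dataset files that do not exist under `root`."""
    root = Path(root)
    missing = []
    for subdir, files in EXPECTED_FILES.items():
        for name in files:
            if not (root / subdir / name).is_file():
                missing.append(f"{subdir}/{name}")
    return missing


if __name__ == "__main__":
    # Assumes the current working directory is the LMRC root.
    for rel in missing_dataset_files("dataset"):
        print("missing:", rel)
```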

For your convenience, the two datasets are also available in the data/Relation Candidate Proposal directory.

Run Experiments

Finetune your model for RC

First, run convert_docred_to_llm.py to generate the data for finetuning your model. Alternatively, you can use the latest uploaded data in the data folder.
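To illustrate what this conversion involves, the sketch below turns one DocRED document into instruction/answer pairs. It relies only on the standard DocRED fields (`sents`, `vertexSet`, `labels` with `h`, `t`, `r`) and on `rel_info.json` mapping relation IDs such as `P17` to names. The prompt wording, the `instruction`/`output` keys, and the choice to emit only labeled (positive) entity pairs are all assumptions for illustration; convert_docred_to_llm.py may use a different template and sampling strategy.

```python
import json
from collections import defaultdict

# Hypothetical prompt template; the one in convert_docred_to_llm.py may differ.
TEMPLATE = (
    "Document: {doc}\n"
    "Which relations hold between the head entity \"{head}\" and the tail entity \"{tail}\"?"
)


def doc_to_examples(doc, rel_info):
    """Turn one DocRED document into (prompt, answer) pairs, one per labeled entity pair."""
    text = " ".join(" ".join(sent) for sent in doc["sents"])
    # Use the first mention of each entity as its surface name.
    names = [mentions[0]["name"] for mentions in doc["vertexSet"]]
    # Group gold relation IDs (e.g. "P17") by (head, tail) entity indices.
    pair_to_rels = defaultdict(set)
    for label in doc.get("labels", []):
        pair_to_rels[(label["h"], label["t"])].add(label["r"])
    examples = []
    for (h, t), rels in sorted(pair_to_rels.items()):
        prompt = TEMPLATE.format(doc=text, head=names[h], tail=names[t])
        answer = "; ".join(rel_info[r] for r in sorted(rels))
        examples.append({"instruction": prompt, "output": answer})
    return examples


def convert_file(split_path, rel_info_path, out_path):
    """Convert a DocRED split into a JSON list of instruction examples; returns the count."""
    with open(rel_info_path) as f:
        rel_info = json.load(f)
    with open(split_path) as f:
        docs = json.load(f)
    examples = [ex for doc in docs for ex in doc_to_examples(doc, rel_info)]
    with open(out_path, "w") as f:
        json.dump(examples, f, indent=2)
    return len(examples)
```

For example, `convert_file("dataset/docred/train_annotated.json", "dataset/docred/rel_info.json", "train_llm.json")` would write the examples for the annotated training split (the output file name is hypothetical).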

Finetuning:

cd RC
bash finetune-lora.sh

For RCP

Train the RoBERTa model on DocRED/Re-DocRED with the following commands:

cd RCP
bash run_roberta.sh

By setting '--load_path' to a trained checkpoint, the script switches from training to testing on the dev/test set. Then, run generate_for_step2.py to convert the structured output into natural-language text wrapped in the prompt template. generate_for_step2.py also outputs an 'idx.json' file, which records the document index corresponding to each query.
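The idea behind this step can be sketched as follows: candidate relations proposed by RCP are grouped per entity pair, each pair is rendered as one natural-language query, and the document index of every query is recorded in the same order, which is the role idx.json plays. This sketch assumes ATLOP-style predictions (dicts with `title`, `h_idx`, `t_idx`, `r`) and uses a hypothetical query template; the extra `pairs` list is also an assumption, kept here so answers can later be mapped back to entity pairs. It is not the implementation of generate_for_step2.py.

```python
import json
from collections import defaultdict

# Hypothetical template; generate_for_step2.py defines its own prompt wording.
QUERY_TEMPLATE = (
    "Document: {doc}\n"
    "Head entity: {head}\nTail entity: {tail}\n"
    "Candidate relations: {candidates}\n"
    "Which of the candidate relations hold between the two entities?"
)


def build_queries(preds, docs, rel_info):
    """Render one query per entity pair from RCP candidate triples.

    Returns (queries, idx, pairs): idx[i] is the index in `docs` of the document
    behind queries[i], and pairs[i] is its (head, tail) entity indices.
    """
    title_to_doc = {doc["title"]: i for i, doc in enumerate(docs)}
    candidates = defaultdict(set)
    for p in preds:  # assumed ATLOP-style entries: {"title", "h_idx", "t_idx", "r"}
        candidates[(p["title"], p["h_idx"], p["t_idx"])].add(p["r"])
    queries, idx, pairs = [], [], []
    for (title, h, t), rels in sorted(candidates.items()):
        d = title_to_doc[title]
        doc = docs[d]
        text = " ".join(" ".join(sent) for sent in doc["sents"])
        queries.append(QUERY_TEMPLATE.format(
            doc=text,
            head=doc["vertexSet"][h][0]["name"],
            tail=doc["vertexSet"][t][0]["name"],
            candidates=", ".join(rel_info[r] for r in sorted(rels)),
        ))
        idx.append(d)
        pairs.append((h, t))
    return queries, idx, pairs


def save_idx(idx, path="idx.json"):
    """Write the query-to-document mapping as a JSON list."""
    with open(path, "w") as f:
        json.dump(idx, f)
```

Grouping by entity pair means the LLM sees all RCP candidates for a pair at once, so it only has to choose among a short list rather than the full relation inventory.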

For RC

Use the converted RCP results as input for the RC stage:

cd RC
bash eva-for-lora.sh

Evaluate

Run convert_llm_result_to_docred.py to convert the plain-text output of the RC stage into the structured DocRED format. It uses the 'idx.json' file produced by generate_for_step2.py.
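A minimal sketch of this reverse mapping is shown below. It assumes, hypothetically, that each LLM answer lists relation names separated by ';', that the (head, tail) pair of each query is available alongside idx.json, and that names not found in rel_info.json are discarded. The output entries follow the DocRED result format (`title`, `h_idx`, `t_idx`, `r`, `evidence`); the actual parsing rules in convert_llm_result_to_docred.py may differ.

```python
import json


def load_json(path):
    """Load a JSON file such as idx.json or rel_info.json."""
    with open(path) as f:
        return json.load(f)


def llm_outputs_to_docred(outputs, idx, pairs, docs, rel_info):
    """Map plain-text RC answers back to DocRED-style result entries.

    outputs[i] is the LLM answer for query i, idx[i] its document index
    (from idx.json), and pairs[i] its (head, tail) entity indices.
    """
    name_to_id = {name: rid for rid, name in rel_info.items()}
    results = []
    for answer, d, (h, t) in zip(outputs, idx, pairs):
        # Assumed answer format: relation names separated by ';'.
        for name in answer.split(";"):
            rid = name_to_id.get(name.strip())
            if rid is not None:  # drop anything that is not a known relation name
                results.append({"title": docs[d]["title"], "h_idx": h,
                                "t_idx": t, "r": rid, "evidence": []})
    return results
```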

You can evaluate the structured results with evaluation.py, which is adapted from link.
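For intuition about the main score, the sketch below computes micro precision, recall, and F1 over relational facts identified by (title, head index, tail index, relation). It is a simplified illustration, not evaluation.py: it omits Ign F1 (which excludes facts already seen in the training set) and evidence F1, both of which the official DocRED scorer also reports.

```python
def gold_facts(docs):
    """Collect (title, h, t, r) facts from the `labels` of DocRED documents."""
    return {(doc["title"], lab["h"], lab["t"], lab["r"])
            for doc in docs for lab in doc.get("labels", [])}


def relation_f1(preds, gold):
    """Micro precision/recall/F1 of DocRED-style predictions against a set of gold facts."""
    pred = {(p["title"], p["h_idx"], p["t_idx"], p["r"]) for p in preds}
    correct = len(pred & gold)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```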

Citation

If you find this work useful, please cite:

@article{li2024llm,
  title={Llm with relation classifier for document-level relation extraction},
  author={Li, Xingzuo and Chen, Kehai and Long, Yunfei and Zhang, Min},
  journal={arXiv preprint arXiv:2408.13889},
  year={2024}
}
