Code for the paper "LLM with Relation Classifier for Document-Level Relation Extraction" (LMRC).
You can install the required packages for the RC stage by running the following commands:
conda create -n lmrc python=3.9
conda activate lmrc
pip install -r requirements.txt
For the RCP stage, the requirements are:
PyTorch
transformers
numpy
apex=0.1
opt-einsum=3.3.0
wandb
ujson
tqdm
Please refer to ATLOP for further information. Note that installing apex via pip will fail; it is recommended to clone the repository and install it manually.
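The manual apex installation typically looks like the following (a sketch assuming the standard NVIDIA/apex repository; the Python-only build shown here skips the optional fused CUDA extensions):

```shell
# Clone NVIDIA apex and install from source instead of PyPI,
# since `pip install apex` resolves to an unrelated package.
git clone https://github.com/NVIDIA/apex
cd apex
# Python-only build (no fused CUDA kernels):
pip install -v --disable-pip-version-check --no-cache-dir ./
```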
The DocRED dataset can be downloaded following the instructions at link. The Re-DocRED dataset can be downloaded from link (you need to replace the original DocRED files with the 'revised' versions from Re-DocRED). The expected file structure is:
LMRC
|-- dataset
| |-- docred
| | |-- train_annotated.json
| | |-- train_distant.json
| | |-- dev.json
| | |-- test.json
| | |-- rel_info.json
| |-- re-docred
| | |-- train_revised.json
| | |-- dev_revised.json
| | |-- test_revised.json
| | |-- rel_info.json
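Before training, it can help to confirm the layout above is in place. A minimal sketch (the `missing_files` helper is hypothetical, not part of this repo):

```python
import os

# Hypothetical helper: verify the expected dataset layout before training.
EXPECTED = {
    "docred": ["train_annotated.json", "train_distant.json", "dev.json",
               "test.json", "rel_info.json"],
    "re-docred": ["train_revised.json", "dev_revised.json",
                  "test_revised.json", "rel_info.json"],
}

def missing_files(root="dataset"):
    """Return the list of expected dataset files that are absent under root."""
    missing = []
    for subdir, names in EXPECTED.items():
        for name in names:
            path = os.path.join(root, subdir, name)
            if not os.path.exists(path):
                missing.append(path)
    return missing
```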
For your convenience, the two datasets are also available in the data/Relation Candidate Proposal directory.
Run convert_docred_to_llm.py first to generate data for fine-tuning your model (alternatively, you can use the latest data uploaded in the data folder).
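The conversion turns DocRED's structured annotations into prompt/answer pairs. A sketch of the idea, using the standard DocRED fields (`sents`, `vertexSet`, `labels`); the actual prompt template in convert_docred_to_llm.py may differ:

```python
# Sketch: one DocRED document -> (prompt, answer) fine-tuning examples.
# The prompt wording here is illustrative, not the repo's exact template.
def docred_to_examples(doc, rel_info):
    """doc: one DocRED record; rel_info: relation id -> name mapping."""
    text = " ".join(" ".join(sent) for sent in doc["sents"])
    examples = []
    for label in doc.get("labels", []):
        head = doc["vertexSet"][label["h"]][0]["name"]   # first mention of head entity
        tail = doc["vertexSet"][label["t"]][0]["name"]   # first mention of tail entity
        relation = rel_info[label["r"]]                  # e.g. "P108" -> "employer"
        prompt = (f"Document: {text}\n"
                  f"What is the relation between '{head}' and '{tail}'?")
        examples.append((prompt, relation))
    return examples
```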
Finetuning:
cd RC
bash finetune-lora.sh
Train the RoBERTa model on DocRED/Re-DocRED with the following command:
cd RCP
bash run_roberta.sh
By modifying '--load_path', you can switch to testing on the dev/test set. Then run generate_for_step2.py to convert the structured output into natural-language text wrapped in the prompt template. generate_for_step2.py also outputs an 'idx.json' file, which maps each query to its corresponding document number.
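The idx.json bookkeeping can be pictured as a parallel list: query i in the RC input belongs to document idx[i]. A minimal sketch (function and input shapes are illustrative, not the repo's exact format):

```python
import json

# Sketch: build RC-stage queries plus an idx.json-style index that maps
# each query back to its source document. Shapes are illustrative only.
def build_queries(rcp_results):
    """rcp_results: per-document lists of (head, tail) candidate pairs."""
    queries, idx = [], []
    for doc_id, pairs in enumerate(rcp_results):
        for head, tail in pairs:
            queries.append(f"What is the relation between '{head}' and '{tail}'?")
            idx.append(doc_id)   # query i belongs to document idx[i]
    return queries, idx

# idx can then be saved for the RC stage, e.g.:
# json.dump(idx, open("idx.json", "w"))
```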
Use the converted RCP results as input for the RC stage:
cd RC
bash eva-for-lora.sh
Run convert_llm_result_to_docred to format the plain-text output of the RC stage; the 'idx.json' file comes from generate_for_step2.py.
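Conceptually, this step parses each LLM answer, maps it to a relation id, and uses idx.json to attach it to the right document. A sketch assuming DocRED's standard prediction format (`title`, `h_idx`, `t_idx`, `r`); the real script's parsing is richer:

```python
# Sketch: turn RC-stage answers into DocRED-style predictions.
# All argument shapes here are illustrative assumptions.
def answers_to_docred(answers, pairs, idx, titles, name_to_pid):
    """answers: LLM output per query; pairs: (h_idx, t_idx) per query;
    idx: query -> document number (from idx.json); titles: doc titles;
    name_to_pid: relation name -> Wikidata property id."""
    preds = []
    for query_id, answer in enumerate(answers):
        pid = name_to_pid.get(answer.strip())
        if pid is None:            # skip answers naming no known relation
            continue
        doc_id = idx[query_id]
        h_idx, t_idx = pairs[query_id]
        preds.append({"title": titles[doc_id], "h_idx": h_idx,
                      "t_idx": t_idx, "r": pid})
    return preds
```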
You can evaluate the structured results with evaluation.py, which is adapted from link.
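At its core the evaluation is micro precision/recall/F1 over predicted versus gold relation tuples. A minimal sketch (this omits DocRED's Ign-F1 variant, which excludes facts seen in distant-supervision training data):

```python
# Sketch: micro P/R/F1 over (title, h_idx, t_idx, r) tuples.
def micro_f1(pred, gold):
    """pred, gold: iterables of hashable relation tuples."""
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)                          # correctly predicted facts
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```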
If you find this work useful, please cite:
@article{li2024llm,
title={{LLM} with relation classifier for document-level relation extraction},
author={Li, Xingzuo and Chen, Kehai and Long, Yunfei and Zhang, Min},
journal={arXiv preprint arXiv:2408.13889},
year={2024}
}