Deep Span Representations for Named Entity Recognition

This page describes the materials and code for "Deep Span Representations for Named Entity Recognition".

Setup

Installation

Set up an environment and install the dependencies and eznlp according to the README.

Download and process datasets

Download pretrained language models

Download the pretrained language models used by transformers and save them to assets/transformers.

git clone https://huggingface.co/google-bert/bert-base-uncased  assets/transformers/bert-base-uncased
git clone https://huggingface.co/google-bert/bert-base-cased    assets/transformers/bert-base-cased
git clone https://huggingface.co/google-bert/bert-large-uncased assets/transformers/bert-large-uncased
git clone https://huggingface.co/google-bert/bert-large-cased   assets/transformers/bert-large-cased
git clone https://huggingface.co/FacebookAI/roberta-base        assets/transformers/roberta-base
git clone https://huggingface.co/FacebookAI/roberta-large       assets/transformers/roberta-large
git clone https://huggingface.co/hfl/chinese-bert-wwm-ext       assets/transformers/hfl/chinese-bert-wwm-ext
git clone https://huggingface.co/hfl/chinese-macbert-base       assets/transformers/hfl/chinese-macbert-base
git clone https://huggingface.co/hfl/chinese-macbert-large      assets/transformers/hfl/chinese-macbert-large
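A quick check can confirm that every model landed where the commands above place it. The following is a minimal sketch, not part of the repository: the helper name `missing_models` is hypothetical, and it assumes that a usable checkout contains a `config.json`, as Hugging Face model repositories normally do. Model weights on the Hub are stored with Git LFS, so `git lfs` most likely needs to be installed before cloning; this check does not verify the weight files themselves.

```python
from pathlib import Path

# Target directories taken from the git clone commands above.
MODEL_DIRS = [
    "bert-base-uncased",
    "bert-base-cased",
    "bert-large-uncased",
    "bert-large-cased",
    "roberta-base",
    "roberta-large",
    "hfl/chinese-bert-wwm-ext",
    "hfl/chinese-macbert-base",
    "hfl/chinese-macbert-large",
]


def missing_models(root="assets/transformers", names=MODEL_DIRS):
    """Return the model directories under `root` that lack a config.json.

    Hypothetical helper: config.json is used as a proxy for a complete clone.
    """
    root = Path(root)
    return [name for name in names if not (root / name / "config.json").is_file()]


if __name__ == "__main__":
    for name in missing_models():
        print(f"missing or incomplete: {name}")
```

Run it from the repository root so that the relative path assets/transformers resolves correctly.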

Running the Code

For English datasets:

$ python scripts/entity_recognition.py @scripts/options/with_bert.opt \
    --dataset {ace2004 | ace2005 | genia | kbp2017 | conll2003 | conll2012} \
    --doc_level \
    --pre_subtokenize \
    --num_epochs 50 \
    --lr 2e-3 \
    --finetune_lr 2e-5 \
    --batch_size 48 \
    --num_grad_acc_steps 1 \
    --ck_decoder specific_span \
    --affine_dim 300 \
    --sb_epsilon 0.1 \
    --sse_no_share_weights_ext \
    --sse_no_share_interm2 \
    --sse_max_span_size {10 | 15 | 20 | 25} \
    --bert_arch RoBERTa_base \
    --use_interm2 \
    --hid_dim 400 \
    [options]
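The `@scripts/options/with_bert.opt` argument resembles Python's standard argparse mechanism for reading arguments from a file. The sketch below illustrates that mechanism in isolation. It is an assumption that entity_recognition.py relies on it; the parser here defines only two of the flags from the command above, with illustrative defaults, and `parse_with_options_file` is a hypothetical helper that writes a temporary options file.

```python
import argparse
import tempfile
from pathlib import Path


def build_parser():
    # fromfile_prefix_chars="@" makes argparse expand "@path" into the
    # arguments listed in that file, one argument per line by default.
    parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
    # Defaults are illustrative only, not the script's real defaults.
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--batch_size", type=int, default=32)
    return parser


def parse_with_options_file(file_text, cli_args):
    """Write `file_text` to a temporary options file and parse it with `cli_args`."""
    with tempfile.TemporaryDirectory() as tmp:
        opt_path = Path(tmp) / "example.opt"
        opt_path.write_text(file_text)
        return build_parser().parse_args([f"@{opt_path}", *cli_args])


if __name__ == "__main__":
    # Arguments that appear later win, so command-line flags given after
    # the "@file" entry override values read from the file.
    args = parse_with_options_file(
        "--lr\n2e-3\n--batch_size\n16\n", ["--batch_size", "48"]
    )
    print(args.lr, args.batch_size)
```

If the script does work this way, the flags listed after the options file in the command above would take precedence over any values the file sets.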

For Chinese datasets:

$ python scripts/entity_recognition.py @scripts/options/with_bert.opt \
    --dataset {WeiboNER | ResumeNER} \
    --pre_merge_enchars \
    --num_epochs 50 \
    --lr 2e-3 \
    --finetune_lr 2e-5 \
    --batch_size 48 \
    --num_grad_acc_steps 1 \
    --ck_decoder specific_span \
    --affine_dim 300 \
    --sb_epsilon 0.1 \
    --sse_no_share_weights_ext \
    --sse_no_share_interm2 \
    --sse_max_span_size {10 | 15 | 20 | 25} \
    --bert_arch MacBERT_base \
    --use_interm2 \
    --hid_dim 400 \
    [options]
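Because the English and Chinese commands differ in only a few flags, a sweep over datasets and span sizes can be scripted. The following is a minimal sketch, not part of the repository: `build_command` is a hypothetical helper that assembles the argument list from the flags shown above, in a different order than the commands list them, and the script only prints the commands rather than running them.

```python
import itertools
import shlex

# Flags shared by the English and Chinese commands above.
COMMON_FLAGS = [
    "--num_epochs", "50",
    "--lr", "2e-3",
    "--finetune_lr", "2e-5",
    "--batch_size", "48",
    "--num_grad_acc_steps", "1",
    "--ck_decoder", "specific_span",
    "--affine_dim", "300",
    "--sb_epsilon", "0.1",
    "--sse_no_share_weights_ext",
    "--sse_no_share_interm2",
    "--use_interm2",
    "--hid_dim", "400",
]

ENGLISH = {"ace2004", "ace2005", "genia", "kbp2017", "conll2003", "conll2012"}
CHINESE = {"WeiboNER", "ResumeNER"}


def build_command(dataset, max_span_size):
    """Assemble the argument list for one run (hypothetical helper)."""
    if dataset in ENGLISH:
        specific = ["--doc_level", "--pre_subtokenize", "--bert_arch", "RoBERTa_base"]
    elif dataset in CHINESE:
        specific = ["--pre_merge_enchars", "--bert_arch", "MacBERT_base"]
    else:
        raise ValueError(f"unknown dataset: {dataset}")
    return [
        "python", "scripts/entity_recognition.py", "@scripts/options/with_bert.opt",
        "--dataset", dataset,
        *specific,
        *COMMON_FLAGS,
        "--sse_max_span_size", str(max_span_size),
    ]


if __name__ == "__main__":
    # Print one command per (dataset, span size) pair for inspection.
    for dataset, size in itertools.product(["conll2003", "WeiboNER"], [10, 15]):
        print(shlex.join(build_command(dataset, size)))
```

Each returned list can be passed to `subprocess.run` once the printed commands look right.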

For more details on the available options, run:

$ python scripts/entity_recognition.py --help