Releases: Spico197/CatalogExtraction
Data v1 baseline reproduction patch
- Text concatenation data and preprocessing script for classification pipeline
- Tagging data and preprocessing script for tagging baseline
Model v1
Data v1
- ChCatExt: Contains BidAnn, FinAnn and CreRat, as described in the paper. The DomainMix folder inside it is the concatenation of the three domains (i.e. the whole ChCatExt dataset).
- ChCatExtForPipelinesBaseline: For reproducing the pipeline baseline.
- DataForAnalysisExp: For reproducing the analysis experiments.
- Wiki: Wikipedia data for pretraining WikiBert.
- OriginalRawData: Raw files, including HTMLs and PDFs.