lukas-r/cc-extraction-framework

cc-extraction-framework

Java framework for extracting and processing isA-pairs from CommonCrawl data. It contains classes for distributed extraction, entity disambiguation of tuple entities, building local taxonomies from isA-pairs, and merging these into larger global taxonomies. Functionality for exporting tuples and taxonomy graphs is also included.
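
To make the idea of local and global taxonomies concrete, here is a minimal, self-contained sketch. It is not the framework's API: `TaxonomySketch`, `IsAPair`, `Taxonomy`, `build`, `mergeFrom` and `isA` are hypothetical names chosen for illustration, and merging is assumed to be a plain union of isA edges, which may be simpler than what the framework actually does.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; none of these names come from the framework. */
final class TaxonomySketch {

    /** One isA-pair: "instance isA clazz", e.g. ("poodle", "dog"). */
    static final class IsAPair {
        final String instance;
        final String clazz;

        IsAPair(String instance, String clazz) {
            this.instance = instance;
            this.clazz = clazz;
        }
    }

    /** A taxonomy stored as a map from each term to its direct hypernyms. */
    static final class Taxonomy {
        private final Map<String, Set<String>> hypernyms = new LinkedHashMap<>();

        /** Adds one isA edge and registers both terms as nodes. */
        void add(IsAPair pair) {
            hypernyms.computeIfAbsent(pair.instance, k -> new LinkedHashSet<>()).add(pair.clazz);
            hypernyms.computeIfAbsent(pair.clazz, k -> new LinkedHashSet<>());
        }

        /** Assumed merge strategy: union of all nodes and edges of the other taxonomy. */
        void mergeFrom(Taxonomy other) {
            other.hypernyms.forEach((term, parents) ->
                    hypernyms.computeIfAbsent(term, k -> new LinkedHashSet<>()).addAll(parents));
        }

        /** Returns true if ancestor is reachable from term via one or more isA edges. */
        boolean isA(String term, String ancestor) {
            Deque<String> todo = new ArrayDeque<>();
            Set<String> seen = new HashSet<>(); // guards against cycles in noisy data
            todo.push(term);
            while (!todo.isEmpty()) {
                String current = todo.pop();
                if (!seen.add(current)) {
                    continue;
                }
                for (String parent : hypernyms.getOrDefault(current, Collections.emptySet())) {
                    if (parent.equals(ancestor)) {
                        return true;
                    }
                    todo.push(parent);
                }
            }
            return false;
        }
    }

    /** Builds a local taxonomy from a batch of isA-pairs. */
    static Taxonomy build(List<IsAPair> pairs) {
        Taxonomy taxonomy = new Taxonomy();
        pairs.forEach(taxonomy::add);
        return taxonomy;
    }

    public static void main(String[] args) {
        Taxonomy local1 = build(List.of(new IsAPair("poodle", "dog")));
        Taxonomy local2 = build(List.of(new IsAPair("dog", "animal")));

        Taxonomy global = new Taxonomy();
        global.mergeFrom(local1);
        global.mergeFrom(local2);

        // Neither local taxonomy alone links "poodle" to "animal"; the merged one does.
        System.out.println(global.isA("poodle", "animal")); // prints "true"
    }
}
```

In this sketch, merging is what makes transitive relations visible: each local taxonomy holds only the pairs extracted from its own batch, while the global taxonomy can answer queries that span several batches.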

Dependencies

This framework is compatible with extractor classes from the Web Data Commons (WDC) Extraction Framework. It therefore has dependencies that can be resolved by importing the WDC Extraction Framework.

To store data in SQLite databases, this project relies on sqlite-jdbc, which must be imported before this functionality can be used.
