
Releases: HeardLibrary/linked-data

VanderBot v1.3 release

07 Aug 03:11
b649779


Three major changes were made in this version update.

  • To remove the size limitation on SPARQL queries sent by HTTP GET, queries are now sent by HTTP POST.

  • The metadata mapping file csv-metadata.json was changed so that, when used to generate RDF, the form of IRI objects is correct (a base IRI is no longer appended) and the form of Wikidata statement IRIs is correct (the Q ID is now added before the UUID). This correction required a small change to the Wikidata API-writing script (vb6_upload_wikidata.py).

  • The csv-metadata.json file was modified to use virtual columns to add the links from statement instances to the statement objects (http://www.wikidata.org/prop/statement/ or ps: namespace predicates) for statements that have IRI objects. We have not yet figured out how to create virtual columns for statements with literal objects.
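
A virtual column of the kind described above might look like the following fragment, following the W3C tabular-metadata conventions. This is a hypothetical illustration, not an excerpt from the actual csv-metadata.json; the column name, property P108, and URI templates are invented for the example:

```json
{
  "name": "employer_statement_value",
  "virtual": true,
  "aboutUrl": "http://www.wikidata.org/entity/statement/{qid}-{employer_uuid}",
  "propertyUrl": "http://www.wikidata.org/prop/statement/P108",
  "valueUrl": "http://www.wikidata.org/entity/{employer}"
}
```

Because the column is virtual, it contributes a triple when RDF is generated (linking the statement IRI to its object via the ps: predicate) without requiring a corresponding column in the CSV itself.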
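
The switch from GET to POST for SPARQL queries can be sketched as follows. This is a minimal standard-library illustration, not the code from the VanderBot scripts; the endpoint URL and query are examples:

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_sparql_post(endpoint, query):
    """Prepare a SPARQL query as an HTTP POST request.

    Sending the query in the request body avoids the URL-length
    limits that apply to HTTP GET query strings.
    """
    body = urlencode({'query': query}).encode('utf-8')
    return Request(
        endpoint,
        data=body,
        headers={'Accept': 'application/sparql-results+json'},
        method='POST',
    )

# Example (the query is illustrative):
request = build_sparql_post(
    'https://query.wikidata.org/sparql',
    'SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 1',
)
```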

VanderBot v1.2 release

19 Jul 00:30


This release is primarily focused on the changes needed for the JSON schema to generate RDF from the source CSV that accurately matches the RDF present in the Wikidata triplestore.

Changes:

  • The data type for dates was changed from 'date' to 'dateTime', since all dates in Wikidata are converted to dateTimes. This prevents an error if the schema is used to convert the CSV data directly to RDF.

  • The method of indicating that a value is a URL was changed from giving the column an anyURI datatype in the schema to using a string datatype with a valueUrl in which the entire string is substituted into the curly brackets. This situation is detected when the first character of the valueUrl is '{'. The change was necessary for the csv2rdf schema to generate RDF matching the RDF provided by the SPARQL endpoint: previously, the generated RDF would have a literal value datatyped as 'anyURI', while the SPARQL endpoint would have a non-literal value.

  • The leading + required by the Wikidata API for dateTime values has been removed from the data in the CSV table; the software now adds or strips it as necessary before interacting with the API.

  • The requirement that there be a value for every reference and qualifier property was removed.

  • The handling of the alias column was changed so that the JSON schema will produce valid RDF consistent with the Wikibase model.
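
The leading-`+` round-tripping described above can be sketched as follows. The helper names are hypothetical; VanderBot's actual code may differ:

```python
def to_api_datetime(csv_value):
    """Add the leading '+' the Wikidata API requires for dateTime values."""
    return csv_value if csv_value.startswith('+') else '+' + csv_value

def from_api_datetime(api_value):
    """Strip the leading '+' before storing the value in the CSV table."""
    return api_value.lstrip('+')
```

Keeping the `+` out of the CSV means the table values are plain ISO 8601 dateTimes that other tools (including csv2rdf) can consume directly.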

Not yet fixed:

  • The handling of aliases needs to be moved to a separate CSV file to allow for many aliases per item. The column mappings could be included in the same JSON schema as the main CSV file since the standard allows for multiple CSV tables.

  • Currently, the only dates mapped correctly by the schema are dates specified to the day (precision level 11). Years (level 9) and months (level 10) are handled correctly by the script, but are not mapped correctly in the JSON file, so they would not generate correct RDF. The solution is to always represent dates as dateTimes regardless of precision and to add another column recording the date's precision level. The schema would need to be adapted to model this correctly, and the code would need to be amended to pick up this relationship.
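
The proposed fix (store every date as a full dateTime plus a separate precision column) could be sketched as follows. The function name and the zero-padding convention are assumptions for illustration; the precision codes (9 = year, 10 = month, 11 = day) are the ones Wikidata uses:

```python
def normalize_date(raw):
    """Expand a year, month, or day date string to a full dateTime
    and return it together with its Wikidata precision code.

    '1999'       -> day-padded dateTime, precision 9 (year)
    '1999-04'    -> day-padded dateTime, precision 10 (month)
    '1999-04-23' -> dateTime,            precision 11 (day)
    """
    parts = raw.split('-')
    if len(parts) == 1:
        return raw + '-01-01T00:00:00Z', 9
    if len(parts) == 2:
        return raw + '-01T00:00:00Z', 10
    return raw + 'T00:00:00Z', 11
```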

Actual VanderBot v1.1 release

23 Apr 14:50


Forgot to push the actual changes before creating the release! :-(

VanderBot v1.1 release

23 Apr 14:35
18ea351


This is a minor version upgrade that adds support for name suffixes (Jr., III, etc.).

VanderBot v1.0 release

20 Apr 18:01
18ea351


This is a fully functional and tested version of the collection of scripts that make up VanderBot. It's been used to make over 1000 Wikidata edits since v0.9.

Major changes:

  • Combined scripts and moved most of them from Jupyter notebooks to stand-alone scripts.
  • Cleaned up the script that matches employees with Wikidata to make it more extensible.
  • Added a script to check for label/description collisions with existing items.
  • Moved common code to a module that can be loaded by any of the scripts (most notably the Query() class definition).

VanderBot v0.9 release

12 Apr 15:54


This release contains functional versions of three Python scripts:

  • Researcher/scholar data harvesting and preparation by department script: process_department.ipynb
  • Generic script to write to the Wikidata API using CSV data mapped using a JSON schema based on the W3C Generating RDF from Tabular Data on the Web Recommendation: process_csv_metadata_full.py
  • Generic script to add references to existing statements (also using the W3C-based schema): add_missing_references.py

An example schema is csv-metadata.json. An example department configuration file (needed to drive the harvesting script) is department-configuration.json.

These scripts are undergoing testing in anticipation of releasing v1.0.