Consolidate PBW URIs with hash-based URIs according to URI Minting Policy #82

@lu-pl

Instead of adapting the URI generation logic in the PBW script, a viable option for URI consolidation between the old UUID-based PBW URIs and hash-based URIs according to the URI Minting Policy would be a SPARQL Update request that generates the respective URIs.

The following is a Perplexity-generated draft of such an Update request:

# SPARQL UPDATE: Replace URIs for crm:E21_Person entities
# whose identifier was assigned by the Prosopography of the Byzantine World.
#
# URI minting policy for E21_Person:
#   Hash input: LCASE(STRIP(Identifier)) + " / " + LCASE(STRIP(Service))
#   where Service = "https://pbw2016.kdl.kcl.ac.uk/"
#   New URI = <https://r11.eu/rdf/resource/{first 36 chars of SHA256 hex digest}>
#
# IMPORTANT: This query assumes your triple store supports SHA256() returning
# a hex string (SPARQL 1.1 standard). Some stores (e.g. Jena) return the hash
# with a "^^xsd:hexBinary" datatype and in uppercase — adjust STR()/LCASE()
# wrapping as needed for your store.
#
# Strategy:
#   1. Find each E21_Person that has an E15_Identifier_Assignment
#      carried out by the PBW agent, and retrieve the symbolic content
#      (the identifier string) from the associated E42_Identifier.
#   2. Compute the new URI per the minting policy.
#   3. For every triple where the old person URI appears as subject or object,
#      delete the old triple and insert a replacement with the new URI.
#
# NOTE: Run the SELECT query at the bottom first to preview the mappings
# before executing the DELETE/INSERT.

PREFIX crm:    <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX r11res: <https://r11.eu/rdf/resource/>

### Step 1: Replace triples where the person URI is the SUBJECT

DELETE {
  ?person ?p ?o .
}
INSERT {
  ?newURI ?p ?o .
}
WHERE {
  # Identify persons with a PBW identifier assignment
  ?person a crm:E21_Person .

  # The E15_Identifier_Assignment that targets this person
  ?assignment a crm:E15_Identifier_Assignment ;
              crm:P140_assigned_attribute_to ?person ;
              crm:P14_carried_out_by ?agent ;
              crm:P37_assigned ?identifier .

  # The agent must be the Prosopography of the Byzantine World
  ?agent rdfs:label "Prosopography of the Byzantine World"@en .

  # The E42_Identifier and its symbolic content (the PBW identifier string)
  ?identifier a crm:E42_Identifier ;
              crm:P190_has_symbolic_content ?idValue .

  # All triples with the person as subject
  ?person ?p ?o .

  # Compute new URI per minting policy:
  # hash_input = lcase(strip(idValue)) + " / " + "https://pbw2016.kdl.kcl.ac.uk/"
  BIND(CONCAT(LCASE(xsd:string(?idValue)), " / ", "https://pbw2016.kdl.kcl.ac.uk/") AS ?hashInput)
  BIND(SHA256(?hashInput) AS ?fullHash)
  BIND(SUBSTR(STR(?fullHash), 1, 36) AS ?segment)
  BIND(IRI(CONCAT("https://r11.eu/rdf/resource/", ?segment)) AS ?newURI)

  # Only process persons whose URI actually needs changing
  FILTER(?person != ?newURI)
};

### Step 2: Replace triples where the person URI is the OBJECT

DELETE {
  ?s ?p ?person .
}
INSERT {
  ?s ?p ?newURI .
}
WHERE {
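  # Step 1 has already moved every subject triple of the old person URI,
  # including its rdf:type, to the new URI. The old URI can therefore no
  # longer be matched by type; start from the assignment, which still points
  # at the old URI, and check the type on the new URI further down.
  ?assignment a crm:E15_Identifier_Assignment ;
              crm:P140_assigned_attribute_to ?person ;
              crm:P14_carried_out_by ?agent ;
              crm:P37_assigned ?identifier .

  ?agent rdfs:label "Prosopography of the Byzantine World"@en .

  ?identifier a crm:E42_Identifier ;
              crm:P190_has_symbolic_content ?idValue .

  # All triples with the person as object
  ?s ?p ?person .

  BIND(CONCAT(LCASE(xsd:string(?idValue)), " / ", "https://pbw2016.kdl.kcl.ac.uk/") AS ?hashInput)
  BIND(SHA256(?hashInput) AS ?fullHash)
  BIND(SUBSTR(STR(?fullHash), 1, 36) AS ?segment)
  BIND(IRI(CONCAT("https://r11.eu/rdf/resource/", ?segment)) AS ?newURI)

  # The type triple now lives on the new URI (moved there by Step 1)
  ?newURI a crm:E21_Person .

  FILTER(?person != ?newURI)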
};


### ---------------------------------------------------------------
### PREVIEW QUERY: Run this SELECT first to inspect the mappings
### before executing the UPDATE above.
### ---------------------------------------------------------------

# PREFIX crm:    <http://www.cidoc-crm.org/cidoc-crm/>
# PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
# PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
#

SELECT ?person ?idValue ?hashInput ?newURI
WHERE {
  ?person a crm:E21_Person .

  ?assignment a crm:E15_Identifier_Assignment ;
              crm:P140_assigned_attribute_to ?person ;
              crm:P14_carried_out_by ?agent ;
              crm:P37_assigned ?identifier .

  ?agent rdfs:label "Prosopography of the Byzantine World"@en .

  ?identifier a crm:E42_Identifier ;
              crm:P190_has_symbolic_content ?idValue .

  BIND(CONCAT(LCASE(xsd:string(?idValue)), " / ", "https://pbw2016.kdl.kcl.ac.uk/") AS ?hashInput)
  BIND(SHA256(?hashInput) AS ?fullHash)
  BIND(SUBSTR(STR(?fullHash), 1, 36) AS ?segment)
  BIND(IRI(CONCAT("https://r11.eu/rdf/resource/", ?segment)) AS ?newURI)

  FILTER(?person != ?newURI)
}
ORDER BY ?idValue

I reviewed the query and find the URI generation to be sound. Also spot-checking the generated URIs against URIs produced by the Master Spreadsheet Conversion logic confirms that URI consolidation works as intended.

However, I would suggest INSERTing owl:sameAs statements into a <https://r11.eu/rdf/resource/consolidation> named graph instead of rewriting triples in place; this would avoid destructive modification of the main PBW graph and keep the consolidation data separate.
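
A minimal sketch of what such a non-destructive variant could look like, reusing the matching pattern and URI computation from the draft above. The direction of the owl:sameAs statement (old URI as subject, new URI as object) is an assumption, as is the idea that the PBW data is reachable from the default graph of the update; if it lives in a named graph, the WHERE pattern would need a GRAPH or USING clause.

```sparql
PREFIX crm:  <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

# Nothing is deleted: the old PBW URIs stay untouched and the links to the
# hash-based URIs are written to a separate named graph.
INSERT {
  GRAPH <https://r11.eu/rdf/resource/consolidation> {
    # Assumption: old URI as subject, new hash-based URI as object
    ?person owl:sameAs ?newURI .
  }
}
WHERE {
  ?person a crm:E21_Person .

  ?assignment a crm:E15_Identifier_Assignment ;
              crm:P140_assigned_attribute_to ?person ;
              crm:P14_carried_out_by ?agent ;
              crm:P37_assigned ?identifier .

  ?agent rdfs:label "Prosopography of the Byzantine World"@en .

  ?identifier a crm:E42_Identifier ;
              crm:P190_has_symbolic_content ?idValue .

  # Same URI computation as in the draft above
  BIND(CONCAT(LCASE(xsd:string(?idValue)), " / ", "https://pbw2016.kdl.kcl.ac.uk/") AS ?hashInput)
  BIND(SHA256(?hashInput) AS ?fullHash)
  BIND(SUBSTR(STR(?fullHash), 1, 36) AS ?segment)
  BIND(IRI(CONCAT("https://r11.eu/rdf/resource/", ?segment)) AS ?newURI)

  # Skip persons that already carry the hash-based URI
  FILTER(?person != ?newURI)
}
```

Because the update only adds statements to the consolidation graph, it can be re-run safely, and dropping that graph reverts the consolidation entirely.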
