This issue serves as the ✨ latest Archipelago Roadmap 🗺, shared in tandem with the 🎉 1.5.0 Release in June 2025. This Roadmap list includes completed core features (many carried over from Archipelago's first release!), works-in-progress, and future to-do's. This Roadmap is open for public evaluation and comments. We (@alliomeria and @DiegoPino) will pop in to update this Roadmap with completed and new items from time to time as Archipelago keeps sailing onwards. ⛵🌊
JSON Flatten Keys as a field Property (only with values)
JSONPATH/JMESPATH
JMESPATH supports string, ISO8601 and EDTF date casting with ranges (NEW)
JMESPATH expressions for JSON Key Name providers can be very long, can be multiple, and can include filters
Entity Reference Casting Provider (Using UUID loading and configurable entity type) using JSON based hints to expose any semantic relationship to Search API.
JSON stored Service Endpoints with extended logic (e.g HOCR) - A.k.a Strawberry Flavor Data Source.
Multi Map/ join: many properties to single. e.g All keys - Authorities- referring to creators, contributors etc unified as Agents keys. This leads to Fractal Ontologies and our Buckets approach.
File downloads and streaming
Ranged Request Streamer with back-to-front S3 management and buffer/memory management. For any exposed Binary Endpoint. Also streaming
New!: Additional download endpoint for Files that uses UUID ADO caster and a simpler route structure
Strawberry Flavor Data structures can now hold NLP data and metadata
Strawberry Flavor Data structures and indexed Documents in Solr have cleanup on deletes and caching management
SMART (very) Breadcrumb generation with strategy selection (Longest Path, common repeating Path)
Composter: Special Queue for Garbage (temporary files) cleanup. With expiration time.
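To illustrate the "JSON Flatten Keys (only with values)" item above, here is a minimal sketch of the general idea. It is not Strawberryfield's actual implementation (which is PHP and not described in this Roadmap); the function name `flatten_keys`, the dotted path format, and the sample ADO metadata are all assumptions made for illustration.

```python
from typing import Any


def flatten_keys(data: Any) -> dict[str, list]:
    """Collect leaf values under dotted key paths, skipping empty values.

    Hypothetical helper: list indexes are dropped so repeated objects
    contribute to the same flattened key.
    """
    out: dict[str, list] = {}

    def walk(node: Any, path: str) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else key)
        elif isinstance(node, list):
            for item in node:
                walk(item, path)
        elif node not in (None, ""):
            # Only keys that actually carry a value are kept.
            out.setdefault(path, []).append(node)

    walk(data, "")
    return out


# Made-up sample metadata, not taken from a real ADO.
ado = {
    "label": "Map of Chile",
    "creator": [{"name": "A"}, {"name": ""}],
    "date": None,
    "type": "Map",
}
flat = flatten_keys(ado)
```

With this input, `flat` holds `label`, `creator.name` and `type`, while `date` is left out because it has no value.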
Search API Integration
Automatic re-tracking of Flavors and Source ADO Solr documents on changes/reindex. Any ADO persistence will trigger a reindex of OCR/etc to make sure parent properties are brought into the flavor doc.
Result payload reducer Processor. Drupal brings every field of every Solr document on return by default. On a per-view basis and using the analysis of the rendered output we reduce this to the bare minimum to have faster Solr responses and less memory usage
Advanced Highlighter. Smarter, faster Snippet Highlighter with Strawberry Flavor Linking capabilities and JOIN operations aware. Includes Lazy Loader to avoid Page level cache.
Flavor Aggregator. Harvests Strawberry flavors on index and attaches them back to an ADO to allow complex AND based queries against all ADOs (complements, for some use cases, the new JOIN Views Filter)
Flavor Data Source Deposit/read as JSON File into the backend instead of Database. Will allow also easy edit of NLP data, OCR, etc.
SBF ADO NID configuration (source/join) for Flavor Aggregated Processor Field
Strawberry Flavor (e.g OCR) Join Views Filter (searching OCRs and returning ADOs with Highlighted OCR) allow for deeper nesting and more settings
Drupal Related Upgrades
Upgrade Event Subscriber code for new Symfony/Drupal 10
Upgrade Drush JSON API to Drush 12+/Drupal 10
Ajax/Facet/Views improvements + ML similarity contextual filter and facets and Drupal 10.3+ compatibility
ML Vector Fields
YOLO v8
MobileNet V3
InsightFace
Google ViT
Better Date Range Facets (with new Date Range Keyname provider, Solr Query method, and new type of Facet Widget that allows a Slider, a Histogram (with color settings!), manual input (per year) and precise full date input for start/end for thousands of individual dates in real time)
JSON representation and enrichment
Better File management (Better than Drupal)
File referencing via UUID instead of via Entity ID
Handle temporary files when moving from TEMP storage to PERMANENT
Increment file usage count on new versions
Decrement file usage count on version removal
Change file usage on Delete, EDIT on existing active content and versions
Add Webform based UI management (reorder, replace, delete) for files
File based Post processing
TECHMD (EXIF, MEDIAINFO, PDFINFO, IDENTIFY)
Pronom Service/Preservation
New JSON Service Architecture reference
Deposit/save, on Node save, a whole, self-sustaining Strawberry JSON blob in S3/Minio/FileSystem
Keep track of Service and action on Ingest/edit using Activity Streams
Add more agent information on our activity streams for provenance and tracking. AMI now also adds Set IDs
Add More Event Driven Subscribers. And better
Hook-able and override-able storage Pattern for files.
Selective size of TECHMD generation based on amount of files present on a single ADO
Automatic Cache clear of parent entities of Child entity persistence
Automatic deletion of Strawberry Flavor (OCR, etc) on ADO removal
New ap:tasks flag to control the source (JSON key) to be used as ADO/NODE title
Webforms integration
Webform Driven UI Ingest with custom handler and widget
Handler allows direct CRUD without any node attached and also pre-population of data using an existing node UUID @alliomeria we need docs here too 🥰 🥰
Create a set of Demo Webforms that cover base of our GLAM source data needs
Full Autosaving during Creation (sessions are kept alive for a week). Users can skip Steps, jump back and forth, and Validation will still happen, but at the end. Log out, come back, continue.
Allow Webform Field Widget selection be driven by RDF type and permissions.
Webform Widgets can start Open/Rendered or closed via settings and have "cancel edit" hiding to avoid users leaving the edit realm.
Solr Aware Entity Select Views (with code to handle Solr to Entity)
Complex autocomplete elements (like get me all Digital Objects of Type Book with a green Cover the user can see)
Fine grained Entity (node to node) reference possible through this.
CSV to JSON importer element
XML to JSON importer element
Strawberry transplanter. Any JSON into filled Webform Elements (display) using a twig template.
Special Date element ISO8601, with Ranges, Single Dates and free form representation.
EDTF support for Special Date element ISO8601
Updates for better EDTF element UI
Create new, better, LoD Webform elements
WIKIDATA
LoC (with support for any Suggest endpoint)
LoC with support MADS RDF Types
WIKIDATA Agents with LD Roles
WIKIDATA using custom SPARQL
Viaf
EUROPEANA
SNAC/Orgs/Names/Family Names
MeSH (PubMed)
Multi Source, Multi Agent Element. Agents/Corporate can use now multiple Authority Controls.
Advanced Multi Source, Multi Agent Element --> more subfields for better IR use case compatibility and customizations
ORCID
Getty with exact and fuzzy search (updated to be better!)
Nominatim Geo reconciliation. Normal and Reverse.
Panorama Tour Building App (like 1200 lines of code, gosh!)
Image and EXIF extraction on upload for UI/facing previews.
Custom LoD via CSV ADO Webform Element
GBIF entity/taxonomy autocomplete
Create Stub (temporary) WIKIDATA entities if query shows desired WIKIDATA entity does not exist upstream.
"publish" to wikibase functionality
Replace repo wide stub uri with official one once pushed.
Keep track, on the stub, of who is referencing it (bidirectional reference?)
Deal with as:documents, as:video, as:sound, as:dataset elements
Deal with as:models
Allow anonymous submits to be converted into proper Nodes by Admin (Self deposit, crowd sourced metadata) WOHO! This also allows self standing endpoints and custom mappings.
Make Webform API Interaction work with States (JS) by removing one Form wrapper.
Make Webform API Interaction more versatile for our use. Use as schema validator. WIP. AMI.
Add JS to avoid main node CRUD to submit/validate embedded Webform as widget
Better handling of MultiStep Forms with direct links to others and final/before submit validation
Drupal Upgrades
Upgrade Event Subscriber code for new Symfony/Drupal 10
Data pre-check for compatibility against Webforms
TUS-PHP/JS Uploader for webform files
Media Displays Entities
Display settings, new tab that shows only the active View Mode for an ADO
Admin/contextual block that shows how ADO to Type was chosen by the system (admin hint)
Add expected mime/type output to Media displays. Allows to tag media displays as JSON, XML, CSV, JSON-LD or HTML only.
React to mime type to allow JSON or XML output to be downloaded too.
Native/self rendering and Content-Type tagging with caching.
Automatic extraction from template of required/used variables (context).
Additional Twig Context for LoD reconciled data on AMI Preview. Simplistic one (directly use) + one that adds the original label used to reconcile (for further processing)
JSON preview (when enabling native representation) is pretty formatted now
Webforms are injected as Context. So a Webform Element Title can be used to match its value.
AMI set id and URLs are injected as Context during batch ingest
Add new Data Views Plugin integration to allow Media Displays to preprocess values on views exposed as API endpoints
Version/Revision Media Display Entities (This is config, annotations and Update Hooks)
Inline Preview with ADO selection. Means users can see the data, test the data and see the output with Live Updates even without
Inline Preview with Validation of destination format
Preview more contextual data (e.g Original Data before an AMI update)
Per Metadata Display Extra data injection via any strawberry field that is added. @alliomeria we need docs!
Provide example Twig templates for
MODS
DC
JSON-LD
GEOJSON
IIIF Manifest 2.1
IIIF Manifest 3.0
EAD2002 (With recursive C Element generation from CSV)
EAD3 (With recursive C Element generation from CSV)
IIIF Manifests for Creative WorkSeries and Children based on Views
Carousel
OAI-PMH items, wrapper for Dublin Core
Metadata Display Exposed endpoints (reuse as Standalone API/download/streams)
API builder via UI using Endpoints. Any API, OAI, IIIF, etc. Allows a VIEW to be injected to feed data. Arguments are filtered and fully customizable. This uses OpenAPI and argument parsing too.
Flavor Search endpoint with coordinates using backend parsed Metadata display (JMESPATHS) to match complex front facing structures. Includes coordinates transformation from Source Space to Canvas Space.
IIIF Content Search API 2.0 based on the previous using also a Metadata display on the backend for the end result rendering. Mixing Strawberry flavor results + ADO level annotations with Compound awareness and Per Resource (ADO) or full Manifest Endpoints.
Use as:tasks new flag that allows any ADO's JSON to be transformed (again or for a first time) via a template. This means API level ingests, cleanups, etc.
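The "coordinates transformation from Source Space to Canvas Space" mentioned for the Flavor Search endpoint can be pictured with a small worked example. This is only a sketch under a stated assumption: that the source image and the IIIF canvas differ by a plain per-axis scale. The function names and the sample sizes are hypothetical and not part of Archipelago's API.

```python
def to_canvas_xywh(box, source_size, canvas_size):
    """Scale an (x, y, w, h) box from source pixel space to canvas space.

    Assumes a pure per-axis scale between source image and canvas
    (no rotation, cropping or offset).
    """
    x, y, w, h = box
    scale_x = canvas_size[0] / source_size[0]
    scale_y = canvas_size[1] / source_size[1]
    return (round(x * scale_x), round(y * scale_y),
            round(w * scale_x), round(h * scale_y))


def xywh_fragment(box):
    """Format a box as the xywh media fragment used to target a canvas region."""
    return "#xywh=" + ",".join(str(v) for v in box)


# A word box found by OCR on a 2000x3000 px image, shown on a 1000x1500 canvas.
canvas_box = to_canvas_xywh((100, 200, 300, 50), (2000, 3000), (1000, 1500))
fragment = xywh_fragment(canvas_box)
```

Here both axes scale by 0.5, so the box becomes `(50, 100, 150, 25)` and the fragment is `#xywh=50,100,150,25`.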
Field Formatters
Static IIIF Images
New!: Universal Viewer (UV) Formatter.
Open Seadragon IIIF Images
W3C Web Annotations! Box and Polygon, fully IIIF compliant with CRUD endpoints. Caches until you are ready to save.
Face and polygon/edge detection (mid colors, highlights) via OpenCV and Web Worker
IIIF Manifest Paging Mode (from IIIF manifest) integration with IA Bookreader (left-right, etc).
Panorama via IIIF now with webGL max texture calculator and max Image size/memory preprocessing to avoid breaking Cantaloupe when using 400MP images.
Panorama Tours via other Panorama Objects and IIIF, including Hotspots of many types
Panorama Tour talks to maps sending NODE that is being presently displayed
Metadata up-casters
Metadata up-casters with download endpoint (Metadata Display Exposed endpoints)
Video (HTML5) with Subtitles (with grouping, multi Video, multi Subtitle)
In the case of single media/one VTT or more grouping is no longer needed.
Audio (HTML5) with Subtitles (with grouping, multi Audio, multi Subtitle)
In the case of single media/one VTT or more grouping is no longer needed.
PDF with multi file selection (custom, derived from the base PDF.js library. Not fancy. But Mozilla asks people to NOT use their fancy one directly and we agreed.)
Web annotations (IIIF) with JMESPATH fine grained selector of which Files to attach
Complex nested structures (Whole graphs)
3D! (Three + JSM) with Full Material Support and UV Textures
3D UV Mapping using IIIF Sources and Scene/Light settings
3D Point Clouds from JSON or URLS
Mirador 3.1 (With Resource comparison and multi sourced IIIF manifests, using full release now)
Expose View Mode to JSON Type value mapping that triggers automatic View Mode Selection
Webrecorder.io native player (WARC replay) with WACZ capabilities version 1.3.2
CiteProc (Citation) Formatter with citation mode selection and JS injection
Lazy Image Loading via CSS class. JS driven, only loads (when used) Images when visible by the user (+100 px to give them some time to load while users navigate)
All formatters can handle Embargoes based on Time and IP address/ranges with caching. Includes alternative Source for Media when embargoed. Embargo info is passed to Templates too as an argument. Embargos are self un-caching to trigger regeneration of NODE displays.
All formatters can handle with JMESPATH fine grained selector of which Files to attach
Explicit "hide on embargo" checkboxes to all Field Formatters (means Viewers too) + File Embargo
API Ingest, Migration and backup
Strawberryfield Normalizer: expands JSON string as a JSON when exporting
Strawberryfield denormalizer: string-ify JSON when importing
Wrap JSONAPI on a set of Drush script to (Strawberry Seeds)
Allow Single command line invoke files and node ingest
Create virtual field Entity "bucket" to allow Media to be ingested into those as links and routed to internal Strawberryfield elements (utility methods for ingest)
AMI (Archipelago Multi Import)
API Source (Other repos, ContentDM, generic Solr)
API Source (ISLANDORA Solr)
Google Spreadsheets (same as IMI)
EAD Plugin to ingest new (using nested CSVs, for parent finding aids vs. child containers)
EAD Sync Plugin to update existing
Complete Drush 9 integration
AMI Set Entities
AMI Sets Entity processing via Batch or Enqueuing (for Hydroponics)
Separate processing for remote/single files allowing longer processing
AMI Sets Delete Ingested ADOs by this Set via batch (to clear and reingest)
LoD Reconciliation with complete per Label Processing and multiple Endpoint calls. Can be edited/refined and reused in a Metadata Display. Better and stronger
LoD can be provided/replaced via a Spreadsheet and will update the internal cached version
AMI Update action now can "replace, append, full update" with "keep files safe" addition
Reusable, canned public facing AMI ingest strategies. Users can only add the source data, all the rest is pre-setup.
S3 Sources for AMI
More/better fine grained permissions per AMI set.
Local file (server) Sources for AMI
Remote HTTP sources for AMI
ZIP Source
UI improvements on AMI LoD Field display (thanks @alliomeria!)
Reports tab using Monolog and files. Includes Batch Item level reporting on errors and a final state.
Download + Clear Logs option for Reports tab
Folder as a source (in the works)
Vouchers
Filesystem drop-and-forget ingest. You save a JSON file into S3, Archipelago creates entities and relationships.
Use JSON API to allow seamless moving of dependent assets between repositories and also for backups. Script included
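The Strawberryfield Normalizer/denormalizer pair listed in this section (expand the stored JSON string into real JSON on export, string-ify it again on import) can be sketched as a round trip. This is an illustration only: the real normalizers are Drupal/PHP services, and the function names and sample value below are assumptions.

```python
import json


def normalize(stored_value: str):
    """Export side: expand the stored JSON string into a JSON structure."""
    return json.loads(stored_value)


def denormalize(payload) -> str:
    """Import side: string-ify a JSON structure back into a field value."""
    return json.dumps(payload, ensure_ascii=False)


# Made-up field value, as it might sit in storage (a plain string).
stored = '{"label": "Map of Chile", "type": "Map"}'

exported = normalize(stored)        # nested JSON, not an escaped string
reimported = denormalize(exported)  # back to a string for storage
```

The point of the pairing is that API consumers see `exported` as regular nested JSON instead of an escaped string, while the stored field value stays a single string.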
Service Architecture (Strawberry Runners)
Develop webhook driven notification service for derivatives
Custom, user facing Plugins. Build your own derivative workflows (system calls, JSON processing, etc)
Document/deploy webhook triggers for minio S3 per mimetype
Document/deploy webhook triggers for AWS S3 (via lambda) per mimetype
Develop Shell processing using Custom Plugins (Processors) and user configurable for each case (rule system)
Allow Processor to be chained! And have multiple outputs.
Queue-worker processing
Composter aware temporary files. Any Plugin can inform files it used and those will be composted. Smart "at the end of a chain" composting.
New: Better process handling for Timeout-ed System services.
Generate JSON reference-able Services (plugins) for complex non descriptive metadata and data
HOCR with Language Detection (after), Language selection (via metadata) and better NLP
Full text from PDF to HOCR (miniOCR) via custom PDFAlto with language detection and NLP
HOCR of single images
In the presence of an HOCR XML(HTML really), OCR processor will try to use it instead of HOCRing again. Allows AMI HOCR migration in XML form from other sources
XML/TEXT/VTT/CSV extraction into Strawberry Flavors.
TECHMD
WACZ
New: ZIP streaming (WIP thanks to Mike @digitaldogsbody, I need to write TESTs yet for S3 version. Sorry!)
Web Annotations
Tabular datasets
Transcripts (similar to Web Annotations, mostly dependent)
File Conversions (any that your Shell allows) with re-ingest
Smart checks on existing processed output to avoid double processing.
~~ Build slim Content entity that can be used to index natively that content into Solr via search API ~~ This is now a fully capable Search API Datasource that can hold any output. one (node) to many (files) to even more sequences.
Allow Services to be self explaining of its capabilities. WIP how we expose this to the world. Probably GET will be allowed
Two Hydroponics approaches. Single Thread linear one (default) and Multi Child, with a config for how many children are spawned. All using ReactPHP
Use ap:tasks to selectively skip Strawberry Runners. E.g a Manuscript might not need OCR.
Better/faster Language detection Service (Esmero NLP) built by the awesome @digitaldogsbody.
Language-aware OCR processing (single/multi language) from Metadata hints in ISO 639-3. Both desired and detected languages are persisted as Strawberry Flavor data
NLP Cleanup using regular expressions.
For 1.5.0: Refactor and unify Ingest & Action queues
SEO and API
Allow Media displays output to be embedded in HTML head for SEO
Test/Develop nested DATA VIEWS integration for OAI-ORE and OAI-PMH (See Format Strawberryfield and API builder)
Create (TWIG, metadata displays) and expose as endpoints full set of IIIF API JSON outputs.
Add helper methods and twig extensions to allow Metadata displays to access pre existing views (like object listings for a collection) to help build those lists.
Fragaria Redirect/PURL module. Allows custom/dynamic redirect (permanent and temporary) URLs to be built to redirect to ADOs.
ACL / Permissions
Integrate custom ACL with JSON Paths into per NODE ACL. Allowing this way to apply permissions to individual metadata elements/paths.
Embargoes with JSON key setup for dates/IPs (Individual and ranges). Includes Cron "release" system (deletes caches) and applies to Formatters, Metadata endpoints too
Global Embargo options (conditions order and overrides)
Same but needs better UI for referenced Services and Media
Allow Metadata (rule) to trigger ACL permissions. e.g if embargo_date == bla bla = remove public access
Allow for ACL inheritance (from parent, recursive) without hard copies.
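As a rough picture of the "Embargoes with JSON key setup for dates/IPs (Individual and ranges)" item above, the following sketch evaluates an embargo from two JSON keys. Everything specific here is an assumption: the key names `date_embargo` and `ip_embargo`, and the rule that a listed IP bypasses a date embargo. The Roadmap mentions configurable condition order and overrides, which this sketch does not model.

```python
from datetime import date
from ipaddress import ip_address, ip_network


def is_embargoed(metadata: dict, client_ip: str, today: date) -> bool:
    """Return True when access should be blocked (assumed semantics)."""
    until = metadata.get("date_embargo")  # hypothetical key, ISO date string
    date_blocks = bool(until) and today < date.fromisoformat(until)

    allowed = metadata.get("ip_embargo") or []  # hypothetical key, IPs or CIDR ranges
    client = ip_address(client_ip)
    # A single IP parses as a one-address network, so both forms work.
    ip_bypass = any(client in ip_network(entry, strict=False) for entry in allowed)

    return date_blocks and not ip_bypass


# Made-up ADO metadata for illustration.
meta = {"date_embargo": "2030-01-01", "ip_embargo": ["192.168.1.0/24", "10.0.0.5"]}

outside = is_embargoed(meta, "8.8.8.8", date(2025, 6, 1))        # before date, unknown IP
on_campus = is_embargoed(meta, "192.168.1.20", date(2025, 6, 1)) # before date, listed range
released = is_embargoed(meta, "8.8.8.8", date(2030, 1, 2))       # date has passed
```

In this sketch `outside` is blocked, while `on_campus` and `released` are not.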
Views/Facet and Search API integrations (submodules)
Advanced Search Full text Views Filter. Multiple files, AND/OR/ etc
Date Range Picker Facet Widget/Processor
Case insensitive Facet Item removal Processor (better than the built in)
Facet Summary processor for no results with complex (no results! no facets!) processing and Full text query injection into the Facet Summary
!JOIN (ADOs + Strawberry Flavors) Filters. Allows JOIN queries between ADOs and OCR. Better suited for simpler Phrases and OR operations (given that an AND might exclude ADOs where a full set of many words has no exact match at Metadata or a given page level).
Map View Display that provides JS and Theming (a style plugin) using leaflet. Allows grouping and facets
Optimized JS loading on every page to avoid large/heavy pages
Solarium level hooks overrides to deal with both Solr 8 and 9, extra highlight data and JOIN/ deferred queries.
Convert hooks into events for Drupal 10/Latest Search API Solr.
ML/Similarity Model/processing for Images using Solr 9 Vector capabilities and Strawberry Runners + NLP docker container enhancements
Deployment and DevOps
Sync Configurations and remove non used ones for minio branch / periodic for each Drupal release
Site-build and remove orphan blocks
Add more utility views
Enable JSONAPI by default on minio branch
Create jsonapi user with jsonapi credentials for minio branch
Create basic scripts to automate Docker/Bash operations
Update AWS deployer to match minio including docs and Cloud Services integration
PHP 8.1 Containers, Cookie based, routed by NGINX
Natural Language processing Service via Docker update with new Language Capabilities and multi architecture
Cantaloupe 6.0.0 Pre Release
Redis integration for caching in Archipelago Deployment Live
Catmandu Docker container for large data mangling
Update all Strawberryfield modules script.
Drupal 9.5 and bumps on every module
Solr 8.11, MYSQL 8.
Solr 9.11 and new OCR Plugin - Schemas/field types/code awareness too
Archipelago Live with optimized folder structure and Production ready AWS EC2 Docker deployment
Migrate/import Metadata Displays Script
Drupal 10 upgrade
For 1.5.0: Composer improvements/logging and new Search API Hydroponics indexer
Batch Operations
Bulk Batch Views PURE TEXT plugin to (All this via JSONPATCH so supports any operation)
Replace existing JSON values
Bulk Batch Views JSONPATH plugin to (All this via JSONPATCH so supports any operation)
Replace existing JSON values
Add to existing Values
Respect data type casted values, (entities, file references)
Bulk Batch Views Webform Element based plugin to replace existing JSON values using a given Webform and a UX driven From/To option
Bulk Batch Views MEDIA plugin to
Replace Media
Add Media
Bulk Batch Views ACL plugin to
Replace ACL and inheritance
Replace ACL individual Control List Elements
Add ACL individual Control List Elements
Integrate into Solr Results and Strawberryfield Taxonomy Term pages
CSV based export with selective type and AMI Set generation for future "Update" operation
Full facet (AJAX and non-AJAX) integration with VBO enabled views. Allows for very fine grained Filters before applying a batch OP.
Strawberry Runners (selective) re-trigger via VBO.
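The Bulk Batch plugins above are described as working "via JSONPATCH". For readers unfamiliar with JSON Patch (RFC 6902), here is a deliberately small sketch of how `replace` and `add` operations change a JSON document. It supports only those two operations and is not the library or code Archipelago uses; the helper names and sample metadata are hypothetical.

```python
import copy


def _resolve(doc, pointer: str):
    """Return (parent container, last token) for a non-root JSON Pointer."""
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]]
    parent = doc
    for token in tokens[:-1]:
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    return parent, tokens[-1]


def apply_patch(doc, operations):
    """Apply 'replace' and 'add' operations to a copy of doc (subset of RFC 6902)."""
    result = copy.deepcopy(doc)
    for op in operations:
        parent, key = _resolve(result, op["path"])
        if op["op"] == "replace":
            if isinstance(parent, list):
                parent[int(key)] = op["value"]
            elif key in parent:
                parent[key] = op["value"]
            else:
                raise KeyError(f"cannot replace missing key: {op['path']}")
        elif op["op"] == "add":
            if isinstance(parent, list):
                if key == "-":  # "-" means append to the end of the array
                    parent.append(op["value"])
                else:
                    parent.insert(int(key), op["value"])
            else:
                parent[key] = op["value"]
        else:
            raise ValueError(f"unsupported operation: {op['op']}")
    return result


# Made-up ADO metadata and patch.
ado = {"label": "Old title", "subject": ["maps"]}
patch = [
    {"op": "replace", "path": "/label", "value": "New title"},
    {"op": "add", "path": "/subject/-", "value": "Chile"},
]
patched = apply_patch(ado, patch)
```

Because every change is a list of operations, "Replace existing JSON values" and "Add to existing Values" are the same mechanism with a different `op`.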
Future roadmap
Solr Cloud/ Consortial ensemble
Native Wikibase/Wikidata publishing
ML/AI driven Vector Search using self trained models
Remote (across repositories) AMI ingest via JSON API
See also #5 and #35 and #79 and #80 and #103 and #172 and #190 and now #243 for a more detailed historical overview of Archipelago's roadmap evolution and task completion history.