
Improvements on Content Search API + Search API integrations #580

@DiegoPino

What?

There are edge cases that are hard to track and that need more information for users, more settings, and more math on my part. I will explain:

  • Depending on the Solr highlighting (hl) settings, the snippet for a word that appears many times in a body of text can contain multiple highlighted words. Even if we have multiple "regions", they all share one snippet, which gives users the impression of duplicated results. Solution: when that happens, use better backend logic (at least for the Content Search API, by tapping into the Solr query hl settings) to reduce the size of the highlight. We use the "unified" highlighter, which is faster; re-think whether that should be the case for the Content Search API. Otherwise, do crazy regex/math to get a shorter snippet, so every result looks different in Mirador.
  • Absolute v/s percentage-based canvas regions for highlights. This is complex. We depend on users setting the correct canvas size/image size in the Manifest. In a perfect scenario (looking at you, Mirador) the actual size would be respected, and our #xywh= coordinates, calculated from the percentage ones we have in Solr, would match well. But Mirador, when using an Image service, basically ignores any width/height settings! So here comes the hard part. Normally we use EXIF info for that, so we never try to generate a smaller canvas than the requested image. BUT, for PDFs, a user can request Cantaloupe to render at e.g. 300 dpi or 72 dpi, and we scale accordingly. When that happens, the actual returned image will differ from the desired canvas size/image size and our annotation will look offset. This of course is not breaking: if the user is aware of it or made the change themselves, understands IIIF, and edits the manifests to accommodate, all works. Does anyone do that? Sadly no 👎 So we need docs, settings, and warnings.
  • Content Search API required v/s optional fields. Some Solr fields are used blindly (e.g. ocr_text) and we never tell the user "hey, you don't have it". So we need to add that check and message.
  • Search API v/s Solr: if the normal Highlight processor (instead of our fancy one) is enabled, JOINs fail. I did not know that. So we need to check for that and warn the user.
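
To make the offset problem from the second point concrete, here is a minimal sketch of the arithmetic, not the module's actual code. It assumes the Solr regions are stored as fractions between 0 and 1 and that a PDF page is measured in points (72 per inch); `PercentRegion`, `to_xywh_fragment` and `pdf_page_pixels` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PercentRegion:
    """A highlight box as fractions (0.0 to 1.0) of the page. Assumed storage format."""
    x: float
    y: float
    w: float
    h: float


def to_xywh_fragment(region: PercentRegion, width: int, height: int) -> str:
    """Scale a fractional region to absolute pixels for a given canvas size."""
    x = round(region.x * width)
    y = round(region.y * height)
    w = round(region.w * width)
    h = round(region.h * height)
    return f"#xywh={x},{y},{w},{h}"


def pdf_page_pixels(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel size of a PDF page (in points, 72 per inch) when rendered at `dpi`."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


# Hypothetical example: one highlight on a US Letter page (612 x 792 pt).
region = PercentRegion(x=0.2, y=0.5, w=0.1, h=0.02)

canvas_72 = pdf_page_pixels(612, 792, 72)    # (612, 792)
canvas_300 = pdf_page_pixels(612, 792, 300)  # (2550, 3300)

print(to_xywh_fragment(region, *canvas_72))   # #xywh=122,396,61,16
print(to_xywh_fragment(region, *canvas_300))  # #xywh=510,1650,255,66
```

The same fractional region yields very different absolute coordinates at 72 dpi and 300 dpi. If the Manifest declares one size and the image actually returned has the other, the fragment points at the wrong area, which is the offset annotation described above.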

@alliomeria this is what I learned today.
