liseli edited this page Mar 17, 2026 · 4 revisions

Initial Query Mapping - December 2025

Initial function: build_and_or_onephrase(), together with:

  • tokenizeInput(),
  • validateInput() and
  • exactmatcherify()

Procedure:

  • build_and_or_onephrase() first strips a small set of illegal punctuation, normalizes fancy quotes, and verifies balanced Lucene syntax, then produces a dictionary of clause types: `onephrase`, `and`, `or`, `asis`, `compressed`, `exactmatcher`, and `emstartswith`.
  • Each value is derived from the sanitized string: for example, `onephrase` wraps the token stream in quotes, `and` joins tokens with AND, and `exactmatcher` lowercases the string and removes non-alphanumeric characters while keeping `*` and `?`.
  • The same function also feeds standardSearchComponents(), so every search term flows through this pipeline before anything reaches Solr.
  • The spec-to-query bridge in __buildQueryString() walks the entries defined in conf/searchspecs.yaml, looks up the relevant transformed value (e.g., `exactmatcher` for title_ab, `emstartswith` for titleProper, `and` for title), and emits field:(value) clauses with the configured boost, joining them with OR.
  • standardSearchComponents() then wraps the per-field clauses in parentheses and concatenates them across the requested search types, forming the q parameter that Solr executes.
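
The transformation step can be sketched in Python as follows. This is a simplified illustration, not the PHP implementation: the names tokenize_input and build_transforms, the connector set (AND, OR), and the regular expressions are assumptions inferred from the worked examples on this page, and the punctuation stripping and syntax validation are left out.

```python
import re

CONNECTORS = {"AND", "OR"}  # assumption: only these connectors glue tokens together


def tokenize_input(text):
    """Split on whitespace, keep quoted phrases whole, glue connector runs."""
    # A quoted phrase (plus any suffix such as ~3) is one piece; otherwise split on spaces.
    pieces = re.findall(r'"[^"]*"\S*|\S+', text)
    tokens, glue_next = [], False
    for piece in pieces:
        if piece in CONNECTORS and tokens:
            tokens[-1] += " " + piece   # attach the connector to the previous token
            glue_next = True
        elif glue_next:
            tokens[-1] += " " + piece   # attach the term that follows the connector
            glue_next = False
        else:
            tokens.append(piece)
    return tokens


def exactmatcherify(text):
    """Lowercase and drop everything except a-z, 0-9, * and ?."""
    return re.sub(r"[^a-z0-9*?]", "", text.lower())


def build_transforms(text):
    """Return the dictionary of clause types for one search string."""
    tokens = tokenize_input(text)
    bare = " ".join(tokens).replace('"', "")  # quotes stripped before re-wrapping
    exact = exactmatcherify(text)
    return {
        "onephrase": f'"{bare}"',
        "and": " AND ".join(tokens),
        "or": " OR ".join(tokens),
        "asis": text,
        "compressed": re.sub(r"\s+", "", text),
        "exactmatcher": exact,
        "emstartswith": exact + "*",
    }


if __name__ == "__main__":
    for sample in ("Apple AND Orange", "Star Wars", '"Quantum Leap"'):
        print(sample, "->", build_transforms(sample))
```

Under these assumptions the sketch yields the same transform values as the three worked examples in the Examples section; inputs containing parentheses or other stripped punctuation are not handled.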

Examples

Input: Apple AND Orange

  • tokenizeInput() treats AND as a connector, so the token array is ["Apple AND Orange"].

  • Transform values:

    * `onephrase = "Apple AND Orange"`, 
    * `and = "Apple AND Orange"`, 
    * `or = "Apple AND Orange"`, 
    * `asis = Apple AND Orange`, 
    * `compressed = AppleANDOrange`, 
    * `exactmatcher = appleandorange`, 
    * `emstartswith = appleandorange*`
    

Title-search clauses (subset shown):

  * title_ab:(appleandorange)^25000, 
  * title_a:(appleandorange)^15000, 
  * titleProper:(appleandorange*)^8000, 
  * titleProper:("Apple AND Orange")^1200, 
  * titleProper:(Apple AND Orange)^120, 
  * title:(Apple AND Orange)^30, 
  * title_top:(Apple AND Orange)^20, 
  * title_rest:(Apple AND Orange)^1, 

plus analogous clauses on series, series2, and the other title subfields. All are OR’ed together before applying any additional boolean connectors.
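
The way these clauses are assembled can be sketched in Python. The spec list below is a hypothetical stand-in for a few title entries in conf/searchspecs.yaml (the real file format is not shown on this page), with fields and boosts copied from the clause list above; build_query_string is an illustrative name, not the PHP method.

```python
# Hypothetical stand-in for a subset of the title spec:
# (Solr field, transform key, boost), taken from the clauses listed above.
TITLE_SPEC = [
    ("title_ab", "exactmatcher", 25000),
    ("title_a", "exactmatcher", 15000),
    ("titleProper", "emstartswith", 8000),
    ("titleProper", "onephrase", 1200),
    ("titleProper", "and", 120),
    ("title", "and", 30),
    ("title_top", "and", 20),
    ("title_rest", "and", 1),
]

# Transform values for the input "Apple AND Orange", as listed above.
APPLE = {
    "onephrase": '"Apple AND Orange"',
    "and": "Apple AND Orange",
    "or": "Apple AND Orange",
    "asis": "Apple AND Orange",
    "compressed": "AppleANDOrange",
    "exactmatcher": "appleandorange",
    "emstartswith": "appleandorange*",
}


def build_query_string(spec, transforms):
    """Emit field:(value)^boost for each spec entry and OR the clauses together."""
    clauses = [
        f"{field}:({transforms[key]})^{boost}" for field, key, boost in spec
    ]
    # The per-field clauses are wrapped in one pair of parentheses.
    return "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    print(build_query_string(TITLE_SPEC, APPLE))
```

Keeping the spec as plain data mirrors the idea that the boosts live in configuration rather than in code, so changing a weight does not require touching the query builder.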

Input: Star Wars

  • tokenizeInput() produces the token array ["Star", "Wars"].

  • Transform values:

    * `onephrase = "Star Wars"`, 
    * `and = Star AND Wars`, 
    * `or = Star OR Wars`, 
    * `asis = Star Wars`, 
    * `compressed = StarWars`, 
    * `exactmatcher = starwars`, 
    * `emstartswith = starwars*`. 
    

Title-search clauses (subset shown):

  * title_ab:(starwars)^25000, 
  * titleProper:(starwars*)^8000, 
  * titleProper:("Star Wars")^1200, 
  * title:(Star AND Wars)^30, etc.

Field boosts still apply even though there are no explicit connectors in the input.

Input: "Quantum Leap"

  • The quoted phrase stays intact, so tokenizeInput() produces [""Quantum Leap""]: a single token that still carries its quotes (the quotes are stripped before `onephrase` wraps the phrase again).

  • Transform values:

    * `onephrase = "Quantum Leap"`, 
    * `and = "Quantum Leap"`, 
    * `or = "Quantum Leap"`, 
    * `asis = "Quantum Leap"`, 
    * `compressed = "QuantumLeap"`, 
    * `exactmatcher = quantumleap`, 
    * `emstartswith = quantumleap*`. 

The title clauses mirror the quoted phrase for the phrase-friendly transformations (`onephrase` and `and`) while still pushing the normalized string down into the exactmatcher fields, so you get both phrase matches and raw string matches in the same weighted query.

Input: machine learning~3

  • tokenizeInput() produces the token array ["machine","learning~3"]; for the quoted form "machine learning"~3 it produces [""machine learning"~3"].

  • Transform values:

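Further title-search output from PrintSolrQuery.php:

  • charles dickens OR "weekly" (input inferred from the tokenized output)

----- Tokenized Search ------: ["charles","dickens OR "weekly""]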

----- Raw Search ------: "(title_ab:(charlesdickensorweekly)^25000 OR title_a:(charlesdickensorweekly)^15000 OR titleProper:(charlesdickensorweekly*)^8000 OR titleProper:("charles dickens OR weekly")^1200 OR titleProper:(charles AND dickens OR "weekly")^120 OR title_topProper:("charles dickens OR weekly")^600 OR title_topProper:(charles AND dickens OR "weekly")^60 OR title_restProper:("charles dickens OR weekly")^400 OR title_restProper:(charles AND dickens OR "weekly")^40 OR series:("charles dickens OR weekly")^500 OR series:(charles AND dickens OR "weekly")^50 OR series2:("charles dickens OR weekly")^500 OR series2:(charles AND dickens OR "weekly")^50 OR title:(charles AND dickens OR "weekly")^30 OR title_top:(charles AND dickens OR "weekly")^20 OR title_rest:(charles AND dickens OR "weekly")^1)"

  • docker compose run vufind php PrintSolrQuery.php 'heart AND cardiac' title

----- Tokenized Search ------: ["heart AND cardiac"]

----- Raw Search ------: "(title_ab:(heartandcardiac)^25000 OR title_a:(heartandcardiac)^15000 OR titleProper:(heartandcardiac*)^8000 OR titleProper:("heart AND cardiac")^1200 OR titleProper:(heart AND cardiac)^120 OR title_topProper:("heart AND cardiac")^600 OR title_topProper:(heart AND cardiac)^60 OR title_restProper:("heart AND cardiac")^400 OR title_restProper:(heart AND cardiac)^40 OR series:("heart AND cardiac")^500 OR series:(heart AND cardiac)^50 OR series2:("heart AND cardiac")^500 OR series2:(heart AND cardiac)^50 OR title:(heart AND cardiac)^30 OR title_top:(heart AND cardiac)^20 OR title_rest:(heart AND cardiac)^1)"

  • heart -cardiac

----- Tokenized Search ------: ["heart","-cardiac"]

----- Raw Search ------: "(title_ab:(heartcardiac)^25000 OR title_a:(heartcardiac)^15000 OR titleProper:(heartcardiac*)^8000 OR titleProper:("heart -cardiac")^1200 OR titleProper:(heart AND -cardiac)^120 OR title_topProper:("heart -cardiac")^600 OR title_topProper:(heart AND -cardiac)^60 OR title_restProper:("heart -cardiac")^400 OR title_restProper:(heart AND -cardiac)^40 OR series:("heart -cardiac")^500 OR series:(heart AND -cardiac)^50 OR series2:("heart -cardiac")^500 OR series2:(heart AND -cardiac)^50 OR title:(heart AND -cardiac)^30 OR title_top:(heart AND -cardiac)^20 OR title_rest:(heart AND -cardiac)^1)"

  • (heart OR cardiac) AND surgery

----- Tokenized Search ------: ["(heart OR cardiac) AND surgery"]

----- Raw Search ------: "(title_ab:(heartorcardiacandsurgery)^25000 OR title_a:(heartorcardiacandsurgery)^15000 OR titleProper:(heartorcardiacandsurgery*)^8000 OR titleProper:("heart OR cardiac AND surgery")^1200 OR titleProper:(heart OR cardiac AND surgery)^120 OR title_topProper:("heart OR cardiac AND surgery")^600 OR title_topProper:(heart OR cardiac AND surgery)^60 OR title_restProper:("heart OR cardiac AND surgery")^400 OR title_restProper:(heart OR cardiac AND surgery)^40 OR series:("heart OR cardiac AND surgery")^500 OR series:(heart OR cardiac AND surgery)^50 OR series2:("heart OR cardiac AND surgery")^500 OR series2:(heart OR cardiac AND surgery)^50 OR title:(heart OR cardiac AND surgery)^30 OR title_top:(heart OR cardiac AND surgery)^20 OR title_rest:(heart OR cardiac AND surgery)^1)"

  • optim*

----- Tokenized Search ------: ["optim*"]

----- Raw Search ------: "(title_ab:(optim*)^25000 OR title_a:(optim*)^15000 OR titleProper:(optim**)^8000 OR titleProper:("optim*")^1200 OR titleProper:(optim*)^120 OR title_topProper:("optim*")^600 OR title_topProper:(optim*)^60 OR title_restProper:("optim*")^400 OR title_restProper:(optim*)^40 OR series:("optim*")^500 OR series:(optim*)^50 OR series2:("optim*")^500 OR series2:(optim*)^50 OR title:(optim*)^30 OR title_top:(optim*)^20 OR title_rest:(optim*)^1)"

Next Steps

Capture this mapping (pipeline → sample values → title field clauses) in whichever internal doc you prefer, so that anyone inspecting catalog searches can trace from the user input to the Solr clause. To validate a specific search type, run the same string through build_and_or_onephrase() and then __buildQueryString() with the desired spec (or log the q parameter). No automated tests were run for these notes.
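
When inspecting a logged q parameter, it can help to split the raw string back into its clauses. The following Python sketch is a hypothetical helper, not part of the codebase; it assumes every clause has the field:(value)^boost shape seen in the raw output on this page and that no value contains a closing parenthesis followed by ^ and digits.

```python
import re

# One clause looks like field:(value)^boost; the value is matched lazily
# so that it ends at the first ")^<digits>".
CLAUSE = re.compile(r"(\w+):\((.*?)\)\^(\d+)")


def split_clauses(q):
    """Return (field, value, boost) tuples for each clause in a raw q string."""
    return [(field, value, int(boost)) for field, value, boost in CLAUSE.findall(q)]


if __name__ == "__main__":
    # First four clauses of the recorded raw search for the input optim*.
    q = (
        '(title_ab:(optim*)^25000 OR title_a:(optim*)^15000 OR '
        'titleProper:(optim**)^8000 OR titleProper:("optim*")^1200)'
    )
    for field, value, boost in split_clauses(q):
        print(f"{boost:>6}  {field:<12} {value}")
```

Listing the clauses one per line with their boosts makes it easier to compare a query against the entries in conf/searchspecs.yaml.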
