Ett 1379 solr parser error#131

Open
liseli wants to merge 1 commit into main from ETT-1379_solrParserError

@liseli liseli commented Mar 20, 2026

This PR fixes two issues:

  • Escape the asis field when generating it. The unescaped value caused a Solr parser error in AllFields queries.
  • Reproduce the previous search behavior for query strings that combine terms, operators, and phrase tokens, e.g. charles dickens OR "weekly"
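
To illustrate the first point, here is a minimal sketch of backslash-escaping a value before it is placed into a Solr query. It is not the code from this PR: the function name `escapeSolrSpecialChars` is hypothetical, and the sketch assumes the goal is to escape the characters that the standard Lucene/Solr query parser treats as syntax.

```php
<?php
/**
 * Hypothetical helper (not taken from this PR): backslash-escape the
 * characters that the Lucene/Solr standard query parser treats as syntax.
 */
function escapeSolrSpecialChars(string $input): string
{
    // Character class covers: \ + - & | ! ( ) { } [ ] ^ " ~ * ? : /
    // Each matched character is replaced by a backslash plus itself.
    return preg_replace('/([\\\\+\-&|!(){}\[\]^"~*?:\/])/', '\\\\$1', $input);
}

// Assumed sample input, taken from the "asis" value shown in this PR.
echo escapeSolrSpecialChars('charles (dickens OR "weekly")'), PHP_EOL;
// prints: charles \(dickens OR \"weekly\"\)
```

Escaping every special character individually is the bluntest option. Whether operators such as OR or the surrounding parentheses should stay unescaped depends on how the field is used, which this sketch does not try to decide.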

As part of this PR, I've created a simple PHP script that prints the Solr query generated for a given input query.

Run the script with the command docker compose run vufind php PrintSolrQuery.php 'charles dickens OR "weekly"' title. The output should be:

----- Tokenized Search ----- : ["charles","dickens","OR","\"weekly\""]

----- Classified Tokens ----- : [{"type":"term","value":"charles"},{"type":"term","value":"dickens"},{"type":"operator","value":"OR"},{"type":"phrase","value":{"text":"weekly","slop":null}}]

----- Tokens after collapsing compound phrases ----- : [{"type":"term","value":"charles"},{"type":"compound_phrase","value":{"tokens":[{"type":"term","value":"dickens"},{"type":"operator","value":"OR"},{"type":"phrase","value":{"text":"weekly","slop":null}}]}}]

----- Escaped Parts ----- : ["charles","dickens OR \"weekly\""]

----- Semantic Structure ----- : {"onephrase":"\"charles dickens OR weekly\"","and":"charles AND dickens OR \"weekly\"","or":"charles OR dickens OR \"weekly\"","asis":"charles (dickens OR \"weekly\")","compressed":"charles\\(dickensOR\\\"weekly\\\"\\)","exactmatcher":"charlesdickensorweekly","emstartswith":"charlesdickensorweekly*"}

----- Solr Search ----- : "(title_ab:(charlesdickensorweekly)^25000 OR title_a:(charlesdickensorweekly)^15000 OR titleProper:(charlesdickensorweekly*)^8000 OR titleProper:(\"charles dickens OR weekly\")^1200 OR titleProper:(charles AND dickens OR \"weekly\")^120 OR title_topProper:(\"charles dickens OR weekly\")^600 OR title_topProper:(charles AND dickens OR \"weekly\")^60 OR title_restProper:(\"charles dickens OR weekly\")^400 OR title_restProper:(charles AND dickens OR \"weekly\")^40 OR series:(\"charles dickens OR weekly\")^500 OR series:(charles AND dickens OR \"weekly\")^50 OR series2:(\"charles dickens OR weekly\")^500 OR series2:(charles AND dickens OR \"weekly\")^50 OR title:(charles AND dickens OR \"weekly\")^30 OR title_top:(charles AND dickens OR \"weekly\")^20 OR title_rest:(charles AND dickens OR \"weekly\")^1)"
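
The first two stages of that output (tokenizing, then classifying) can be sketched roughly as follows. This is an illustration, not the implementation in this PR: the function names `tokenizeQuery` and `classifyToken` are hypothetical, the operator list AND/OR/NOT is an assumption, and so is the `"phrase"~N` syntax used to fill the slop value.

```php
<?php
/**
 * Hypothetical sketch (not this PR's code): split a query string into
 * double-quoted phrases and whitespace-delimited words.
 */
function tokenizeQuery(string $query): array
{
    // Try a quoted phrase (optionally followed by ~N) first, else a run of non-space characters.
    preg_match_all('/"[^"]*"(?:~\d+)?|\S+/', $query, $matches);
    return $matches[0];
}

/**
 * Hypothetical sketch: label a token as operator, phrase, or term.
 */
function classifyToken(string $token): array
{
    // Assumed operator set; the real list may differ.
    if (in_array($token, ['AND', 'OR', 'NOT'], true)) {
        return ['type' => 'operator', 'value' => $token];
    }
    if (preg_match('/^"([^"]*)"(?:~(\d+))?$/', $token, $m)) {
        // Slop is null unless a ~N suffix was present (assumed syntax).
        $slop = ($m[2] ?? '') !== '' ? (int) $m[2] : null;
        return ['type' => 'phrase', 'value' => ['text' => $m[1], 'slop' => $slop]];
    }
    return ['type' => 'term', 'value' => $token];
}

$tokens = tokenizeQuery('charles dickens OR "weekly"');
echo json_encode($tokens), PHP_EOL;
echo json_encode(array_map('classifyToken', $tokens)), PHP_EOL;
```

For the sample query, the two printed lines should match the Tokenized Search and Classified Tokens values above. The later stages (collapsing compound phrases, escaping, building the field queries) are not covered by this sketch.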

These changes are running at https://test.catalog.hathitrust.org/Search/Home

@liseli liseli requested review from aelkiss and moseshll March 20, 2026 17:23
@aelkiss aelkiss left a comment

Changes here make sense; new test looks good; I'm assuming other existing tests would catch potential regressions, plus you mentioned manually comparing to a previous version.

@moseshll moseshll left a comment

I abused this branch and could not make it mad. No "problem talking to catalog" errors.

I have two suggestions on the welcome addition of the PrintSolrQuery facility:

  • Add a print("\n"); after everything else to keep terminal happy (I stuck it on line 83) -- zsh on the Mac gives me that funky red % to indicate there was no newline at the end of the output.
  • I would consider making a bin directory and moving this file there. Unfortunately I had to change e.g. require_once __DIR__ . '/sys/Solr.php'; to require_once __DIR__ . '/../sys/Solr.php'; three times, and once more for the parse_ini_file on line 43, and it would impact the README and the "Usage:" string too, so may not be worth it. My rationale is, it's not part of the catalog web service so using bin helps sequester it, and it really doesn't belong in test.
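
One way to limit the path edits described in the second suggestion is to resolve the repository root once and build every path from it. The following is only a sketch under assumptions: the helper name `repoPath` is hypothetical, and it assumes the script sits exactly one directory below the repository root (for example in bin).

```php
<?php
/**
 * Hypothetical helper (not in this PR): build a path relative to the
 * repository root from a script that lives one directory below it.
 */
function repoPath(string $relative, string $scriptDir = __DIR__): string
{
    // dirname() strips the last path component, e.g. /app/bin -> /app
    return dirname($scriptDir) . '/' . ltrim($relative, '/');
}

// Assumed example directory; in the real script the default __DIR__ would be used.
echo repoPath('sys/Solr.php', '/app/bin'), "\n";
// prints: /app/sys/Solr.php
```

With such a helper, each include would read require_once repoPath('sys/Solr.php'); and a later move of the script would only require changing the helper. The trailing "\n" in the echo also covers the first suggestion about ending the output with a newline.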

* Escape asis field before building the Solr query
* Create a PHP script to print the Solr query for a given query string and save it in a bin directory
@liseli liseli force-pushed the ETT-1379_solrParserError branch from 3969c53 to 8a2c29c Compare March 24, 2026 20:54
liseli commented Mar 24, 2026

> I abused this branch and could not make it mad. No "problem talking to catalog" errors.
>
> I have two suggestions on the welcome addition of the PrintSolrQuery facility:
>
>   • Add a print("\n"); after everything else to keep terminal happy (I stuck it on line 83) -- zsh on the Mac gives me that funky red % to indicate there was no newline at the end of the output.
>   • I would consider making a bin directory and moving this file there. Unfortunately I had to change e.g. require_once __DIR__ . '/sys/Solr.php'; to require_once __DIR__ . '/../sys/Solr.php'; three times, and once more for the parse_ini_file on line 43, and it would impact the README and the "Usage:" string too, so may not be worth it. My rationale is, it's not part of the catalog web service so using bin helps sequester it, and it really doesn't belong in test.

I've incorporated both suggestions. I agree that the script for making queries shouldn't be in the tests folder, and your idea of creating a bin directory is great.
