This repository contains the code and data for the paper "A Comparative Analysis of Linguistic and Retrieval Diversity in LLM-Generated Search Queries"
A Comparative Analysis of Linguistic and Retrieval Diversity in LLM-Generated Search Queries
Oleg Zendel, Sara Fahad Dawood Al Lawati, Lida Rashidi, Falk Scholer, and Mark Sanderson. 2025. A Comparative Analysis of Linguistic and Retrieval Diversity in LLM-Generated Search Queries. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM ’25), November 10–14, 2025, Seoul, Republic of Korea. ACM, New York, NY, USA, 10 pages.
This repository contains the code and data used in our paper. Specifically, it provides:
- Code to generate the LLM queries used in our experiments
- Datasets produced and/or used in the study
Our goal is to support transparency, reproducibility, and future work in this area.
This repository provides code for generating search queries using both GPT models and other LLMs:
Code to generate queries using GPT models:
- GPT-4
- GPT-4o-mini
Notebook to generate queries using additional LLMs (via Amazon Bedrock):
- anthropic.claude-3-5-haiku-20241022-v1:0
- us.meta.llama3-3-70b-instruct-v1:0
- us.meta.llama3-2-11b-instruct-v1:0
- mistral.mistral-7b-instruct-v0:2
- mistral.mixtral-8x7b-instruct-v0:1
- mistral.mistral-large-2407-v1:0
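For orientation, the sketch below shows one way such a Bedrock model can be invoked with boto3's Converse API; the actual invocation code lives in `berock_generation.ipynb` and may differ.

```python
import boto3

# Minimal sketch, assuming AWS credentials are configured and the model is
# available in the chosen region; the notebook in this repository may use a
# different invocation path or parameters.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate(prompt, model_id="anthropic.claude-3-5-haiku-20241022-v1:0"):
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024},
    )
    return response["output"]["message"]["content"][0]["text"]
```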
We used five methods for both GPT-Query-Gen and Bedrock generation. Here's how the files are named in this repository:
- Format: `tn_pv_model`, where:
  - `t` = technique number
  - `pv` = prompt version
  - `model` = LLM model used
- Note: Some methods did not have different prompt versions, so those files use only `tn`.
- Bedrock files follow a similar naming structure to the GPT files, but all code is consolidated in a single notebook: `berock_generation.ipynb`
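As an illustration of the convention above, a small helper along these lines can recover the technique, prompt version, and model from a file name; the example file names below are hypothetical.

```python
import re
from pathlib import Path

# Minimal sketch with hypothetical file names such as "t1_p2_gpt-4.jsonl";
# files without a prompt version (e.g. "t4_gpt-4o-mini.jsonl") yield prompt=None.
def parse_filename(path):
    stem = Path(path).stem
    match = re.match(r"t(?P<technique>\d+)(?:_p(?P<prompt>\d+))?_(?P<model>.+)", stem)
    if match is None:
        raise ValueError(f"Unexpected file name: {stem}")
    return match.groupdict()

# parse_filename("t1_p2_gpt-4.jsonl")
# -> {"technique": "1", "prompt": "2", "model": "gpt-4"}
```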
Description: Shuffles the 12 backstories, groups them into batches of 6, and repeats this process 100 times with a different shuffle each time (`range(100)`). Each backstory ends up with 100 generated queries.
Models: GPT-4o-mini and GPT-4
Prompt Versions: 1-4
Description: Inserts all 12 backstories together in the same order and repeats this process 50 times (`range(50)`). Each backstory ends up with 50 generated queries.
Models: GPT-4o-mini and GPT-4
Prompt Versions: 1-4
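The batching logic of the two methods above could look roughly like the sketch below; `backstories` and `generate_queries` are hypothetical stand-ins for the actual data and API wrapper used in the repository.

```python
import random

def run_shuffled_batches(backstories, generate_queries, n_repeats=100, batch_size=6):
    """Method 1: a fresh shuffle per repeat, backstories sent in batches of 6."""
    for _ in range(n_repeats):
        shuffled = random.sample(backstories, len(backstories))
        for i in range(0, len(shuffled), batch_size):
            generate_queries(shuffled[i:i + batch_size])

def run_fixed_order(backstories, generate_queries, n_repeats=50):
    """Method 2: all 12 backstories together, in the same order, 50 times."""
    for _ in range(n_repeats):
        generate_queries(backstories)
```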
Prompt v1:
You are an English-speaking searcher. You will be presented with a list of backstories. For each backstory write the first search query you would use to find relevant information.
Return a query for each backstory in JSONL format `{"Backstory-ID":[BACKSTORY-ID], "Query":[QUERY]}`
Prompt v2:
You are an English-speaking searcher. You will be given a list of backstories. For each backstory, generate an initial search query to find relevant information.
Provide a query for each backstory in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Query": [QUERY]}.
Prompt v3:
You will be provided with a list of backstories. As an English-speaking searcher, for each backstory, generate the initial search query you would use to find relevant information.
Return your response in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Query": [QUERY]}.
Prompt v4:
You will be given a list of backstories. As an English-speaking searcher, for each backstory, generate the first search query you would use to retrieve relevant information.
Provide your response in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Query": [QUERY]}.
Note: Despite the structured design, the outputs showed limited lexical diversity and differed substantially from human-written queries, leading us to discard the above two methods from the analysis in the paper.
Description: Generates synthetic queries by prompting GPT in a manner similar to how crowdsourced workers were instructed. All 12 backstories were inserted together.
Prompt v1:
Please create a list of search queries made by a diverse group of users seeking answers to a situation described in a provided backstory. The queries should reflect the users' diverse backgrounds and word choices. Queries are expressed using natural language or keywords and may use abbreviations. Each backstory must have 100 queries. Queries vary in length but should have 5 words on average.
Provide the queries in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Queries": [[QUERY1], [QUERY2], [QUERYn]]}.
Prompt v2:
Please create a list of search queries made by a diverse group of users seeking answers to a situation described in a provided backstory. The queries should reflect the users' diverse backgrounds and word choices. Queries are expressed using natural language or keywords and may use abbreviations. Each backstory is expected to have 100 queries. Queries vary in length but should have 5 words on average.
Provide the queries in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Queries": [[QUERY1], [QUERY2], [QUERYn]]}.
Prompt v3:
Please create a list of search queries made by a diverse group of users seeking answers to a situation described in a provided backstory. The queries should reflect the users' diverse backgrounds and word choices. Queries are expressed using natural language or keywords and may use abbreviations. Each backstory is expected to have a range between 19 to 101 queries. Queries vary in length but should have 5 words on average.
Provide a range between 19 to 101 queries for each backstory in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Queries": [[QUERY1], [QUERY2], [QUERYn]]}.
Prompt v4:
Please create a list of search queries made by a diverse group of users seeking answers to a situation described in a provided backstory. The queries should reflect the users' diverse backgrounds and word choices. Queries are expressed using natural language or keywords and may use abbreviations. For each backstory, generate a random number of queries between 19 and 101. The number of queries for each backstory should vary. Queries vary in length but should have 5 words on average.
Provide the queries for each backstory in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Queries": [[QUERY1], [QUERY2], [QUERYn]]}.
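A minimal sketch for reading the JSONL responses requested above (one object per backstory with a "Queries" list); this is an illustrative helper, not the notebook code.

```python
import json

def read_backstory_queries(response_text):
    # Flatten the per-backstory "Queries" lists into (backstory_id, query) pairs.
    rows = []
    for line in response_text.strip().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        for query in record["Queries"]:
            rows.append((record["Backstory-ID"], query))
    return rows
```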
Description: Generates synthetic queries by prompting GPT-4o-mini in a manner similar to how crowdsourced workers were instructed. One backstory was inserted at a time.
Prompt:
Please create a list of 500 search queries made by a diverse group of users seeking answers to a situation described in a provided backstory. The queries should reflect the users' diverse backgrounds and word choices. Queries are expressed using natural language or keywords and may use abbreviations. Queries vary in length but should have 5 words on average. Provided backstory: "INSERT BACKSTORY" Provide output in JSONL format: {"Query": [QUERY]}.
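Since this method inserts one backstory at a time, the generation loop amounts to substituting each backstory into the prompt and issuing one request per backstory. The sketch below uses a hypothetical `call_llm` wrapper and an abbreviated prompt.

```python
# Prompt abbreviated with "..." for brevity; the full text is given above.
PROMPT_TEMPLATE = (
    "Please create a list of 500 search queries ... "
    'Provided backstory: "{backstory}" '
    'Provide output in JSONL format: {{"Query": [QUERY]}}.'
)

def generate_per_backstory(backstories, call_llm):
    results = {}
    for backstory_id, text in backstories.items():
        prompt = PROMPT_TEMPLATE.format(backstory=text)  # replaces "INSERT BACKSTORY"
        results[backstory_id] = call_llm(prompt)
    return results
```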
Description: Inspired by Simone Filice, Guy Horowitz, David Carmel, Zohar Karnin, Liane Lewin-Eytan, and Yoelle Maarek. 2025. Generating Diverse Q&A Benchmarks for RAG Evaluation with DataMorgana. arXiv:2501.12789 (Jan. 2025). https://doi.org/10.48550/arXiv.2501.12789
We defined different sets of user search skills, topic knowledge, and query types, which were fed into the prompt:
Prompt:
You are a user simulator that should generate {num_queries} candidate queries for addressing a specified information need.
The {num_queries} queries must be in the form of search queries that a user would type into a search engine.
When generating the queries, assume that whoever reads the queries will read each query independently.
The {num_queries} queries must be diverse and different from each other.
Return only the queries without any preamble.
Write each query in a new line, in the following JSON format:
'{"query": <query>}'
## The generated queries should address the following information need:
{information_need}
## Each of the generated queries must reflect the following characteristics:
1. User Skills: {user_skills}
2. Topic Knowledge: {topic_knowledge}
3. Query Type: {query_types}
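The characteristic sets are combined and substituted into the placeholders of the prompt template; the sketch below shows the general idea with illustrative characteristic values (not the actual sets used in the paper).

```python
import itertools

# Illustrative placeholder values; the actual sets used in the paper differ.
USER_SKILLS = ["novice searcher", "expert searcher"]
TOPIC_KNOWLEDGE = ["no prior knowledge of the topic", "expert on the topic"]
QUERY_TYPES = ["short keyword query", "natural-language question"]

def fill_prompt(template, num_queries, information_need, skills, knowledge, qtype):
    # Plain string replacement avoids clashes with the literal JSON braces
    # in the template (e.g. '{"query": <query>}').
    values = {
        "{num_queries}": str(num_queries),
        "{information_need}": information_need,
        "{user_skills}": skills,
        "{topic_knowledge}": knowledge,
        "{query_types}": qtype,
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template

# One prompt per combination of characteristics for a given backstory, e.g.:
# for s, k, q in itertools.product(USER_SKILLS, TOPIC_KNOWLEDGE, QUERY_TYPES):
#     prompt = fill_prompt(TEMPLATE, 10, backstory_text, s, k, q)
```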
This research was partially supported by the Australian Research Council through the ARC Centre of Excellence for Automated Decision-Making and Society CE200100005 and Discovery Project DP190101113, and was undertaken with the assistance of computing resources from the RMIT Advanced Cloud Ecosystem (RACE) Hub.
If you use this code or dataset, please cite our paper:
@inproceedings{zendel2025comparative,
author = {Oleg Zendel and Sara Fahad Dawood Al Lawati and Lida Rashidi and Falk Scholer and Mark Sanderson},
booktitle = {Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM ’25)},
doi = {10.1145/3746252.3761382},
pages = {},
title = {A Comparative Analysis of Linguistic and Retrieval Diversity in LLM-Generated Search Queries},
year = {2025}
}