This repository contains the code and data for the paper "A Comparative Analysis of Linguistic and Retrieval Diversity in LLM-Generated Search Queries"
A Comparative Analysis of Linguistic and Retrieval Diversity in LLM-Generated Search Queries
Oleg Zendel, Sara Fahad Dawood Al Lawati, Lida Rashidi, Falk Scholer, and Mark Sanderson. 2025. A Comparative Analysis of Linguistic and Retrieval Diversity in LLM-Generated Search Queries. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM ’25), November 10–14, 2025, Seoul, Republic of Korea. ACM, New York, NY, USA, 10 pages.
This repository contains the code and data used in our paper. Specifically, it provides:
- Code to generate the LLM queries used in our experiments
- Datasets produced and/or used in the study
Our goal is to support transparency, reproducibility, and future work in this area.
This repository provides code for generating search queries using both GPT models and other LLMs:
Code to generate queries using GPT models:
- GPT-4
- GPT-4o-mini
Notebook to generate queries using additional LLMs (via Amazon Bedrock):
- anthropic.claude-3-5-haiku-20241022-v1:0
- us.meta.llama3-3-70b-instruct-v1:0
- us.meta.llama3-2-11b-instruct-v1:0
- mistral.mistral-7b-instruct-v0:2
- mistral.mixtral-8x7b-instruct-v0:1
- mistral.mistral-large-2407-v1:0
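For orientation, the sketch below shows one way such a Bedrock model can be invoked with boto3's Converse API; the actual invocation code lives in `berock_generation.ipynb` and may differ.

```python
import boto3

# Minimal sketch, assuming AWS credentials are configured and the model is
# available in the chosen region; the notebook in this repository may use a
# different invocation path or parameters.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate(prompt, model_id="anthropic.claude-3-5-haiku-20241022-v1:0"):
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024},
    )
    return response["output"]["message"]["content"][0]["text"]
```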
We used five methods for both GPT-Query-Gen and Bedrock generation. Here's how the files are named in this repository:
- Format: `tn_pv_model`, where:
  - `t` = technique number
  - `pv` = prompt version
  - `model` = LLM model used
- Note: Some methods did not have different prompt versions, so those files use only `tn`.
- Bedrock files follow a similar naming structure to the GPT files, but all code is consolidated in a single notebook: `berock_generation.ipynb`
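As an illustration of the convention above, a small helper along these lines can recover the technique, prompt version, and model from a file name; the example file names below are hypothetical.

```python
import re
from pathlib import Path

# Minimal sketch with hypothetical file names such as "t1_p2_gpt-4.jsonl";
# files without a prompt version (e.g. "t4_gpt-4o-mini.jsonl") yield prompt=None.
def parse_filename(path):
    stem = Path(path).stem
    match = re.match(r"t(?P<technique>\d+)(?:_p(?P<prompt>\d+))?_(?P<model>.+)", stem)
    if match is None:
        raise ValueError(f"Unexpected file name: {stem}")
    return match.groupdict()

# parse_filename("t1_p2_gpt-4.jsonl")
# -> {"technique": "1", "prompt": "2", "model": "gpt-4"}
```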
Description: Shuffles the 12 backstories, groups them into batches of 6, and repeats this process 100 times with a different shuffle each time (`range(100)`). Each backstory ends up with 100 generated queries.
Models: GPT-4o-mini and GPT-4
Prompt Versions: 1-4
Description: Inserts all 12 backstories together in the same order and repeats this process 50 times (`range(50)`). Each backstory ends up with 50 generated queries.
Models: GPT-4o-mini and GPT-4
Prompt Versions: 1-4
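The batching logic of the two methods above could look roughly like the sketch below; `backstories` and `generate_queries` are hypothetical stand-ins for the actual data and API wrapper used in the repository.

```python
import random

def run_shuffled_batches(backstories, generate_queries, n_repeats=100, batch_size=6):
    """Method 1: a fresh shuffle per repeat, backstories sent in batches of 6."""
    for _ in range(n_repeats):
        shuffled = random.sample(backstories, len(backstories))
        for i in range(0, len(shuffled), batch_size):
            generate_queries(shuffled[i:i + batch_size])

def run_fixed_order(backstories, generate_queries, n_repeats=50):
    """Method 2: all 12 backstories together, in the same order, 50 times."""
    for _ in range(n_repeats):
        generate_queries(backstories)
```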
Prompt v1:
You are an English-speaking searcher. You will be presented with a list of backstories. For each backstory write the first search query you would use to find relevant information.
Return a query for each backstory in JSONL format `{"Backstory-ID":[BACKSTORY-ID], "Query":[QUERY]}`
Prompt v2:
You are an English-speaking searcher. You will be given a list of backstories. For each backstory, generate an initial search query to find relevant information.
Provide a query for each backstory in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Query": [QUERY]}.
Prompt v3:
You will be provided with a list of backstories. As an English-speaking searcher, for each backstory, generate the initial search query you would use to find relevant information.
Return your response in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Query": [QUERY]}.
Prompt v4:
You will be given a list of backstories. As an English-speaking searcher, for each backstory, generate the first search query you would use to retrieve relevant information.
Provide your response in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Query": [QUERY]}.
Note: Despite the structured design, the outputs showed limited lexical diversity and differed substantially from human-written queries, leading us to discard the above two methods from the analysis in the paper.
Description: Generates synthetic queries by prompting GPT in a manner similar to how crowdsourced workers were instructed. All 12 backstories were inserted together.
Prompt v1:
Please create a list of search queries made by a diverse group of users seeking answers to a situation described in a provided backstory. The queries should reflect the users' diverse backgrounds and word choices. Queries are expressed using natural language or keywords and may use abbreviations. Each backstory must have 100 queries. Queries vary in length but should have 5 words on average.
Provide the queries in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Queries": [[QUERY1], [QUERY2], [QUERYn]]}.
Prompt v2:
Please create a list of search queries made by a diverse group of users seeking answers to a situation described in a provided backstory. The queries should reflect the users' diverse backgrounds and word choices. Queries are expressed using natural language or keywords and may use abbreviations. Each backstory is expected to have 100 queries. Queries vary in length but should have 5 words on average.
Provide the queries in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Queries": [[QUERY1], [QUERY2], [QUERYn]]}.
Prompt v3:
Please create a list of search queries made by a diverse group of users seeking answers to a situation described in a provided backstory. The queries should reflect the users' diverse backgrounds and word choices. Queries are expressed using natural language or keywords and may use abbreviations. Each backstory is expected to have a range between 19 to 101 queries. Queries vary in length but should have 5 words on average.
Provide a range between 19 to 101 queries for each backstory in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Queries": [[QUERY1], [QUERY2], [QUERYn]]}.
Prompt v4:
Please create a list of search queries made by a diverse group of users seeking answers to a situation described in a provided backstory. The queries should reflect the users' diverse backgrounds and word choices. Queries are expressed using natural language or keywords and may use abbreviations. For each backstory, generate a random number of queries between 19 and 101. The number of queries for each backstory should vary. Queries vary in length but should have 5 words on average.
Provide the queries for each backstory in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Queries": [[QUERY1], [QUERY2], [QUERYn]]}.
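A minimal sketch for reading the JSONL responses requested above (one object per backstory with a "Queries" list); this is an illustrative helper, not the notebook code.

```python
import json

def read_backstory_queries(response_text):
    # Flatten the per-backstory "Queries" lists into (backstory_id, query) pairs.
    rows = []
    for line in response_text.strip().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        for query in record["Queries"]:
            rows.append((record["Backstory-ID"], query))
    return rows
```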
Description: Generates synthetic queries by prompting GPT-4o-mini in a manner similar to how crowdsourced workers were instructed. One backstory was inserted at a time.
Prompt:
Please create a list of 500 search queries made by a diverse group of users seeking answers to a situation described in a provided backstory. The queries should reflect the users' diverse backgrounds and word choices. Queries are expressed using natural language or keywords and may use abbreviations. Queries vary in length but should have 5 words on average. Provided backstory: "INSERT BACKSTORY" Provide output in JSONL format: {"Query": [QUERY]}.
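Since this method inserts one backstory at a time, the generation loop amounts to substituting each backstory into the prompt and issuing one request per backstory. The sketch below uses a hypothetical `call_llm` wrapper and an abbreviated prompt.

```python
# Prompt abbreviated with "..." for brevity; the full text is given above.
PROMPT_TEMPLATE = (
    "Please create a list of 500 search queries ... "
    'Provided backstory: "{backstory}" '
    'Provide output in JSONL format: {{"Query": [QUERY]}}.'
)

def generate_per_backstory(backstories, call_llm):
    results = {}
    for backstory_id, text in backstories.items():
        prompt = PROMPT_TEMPLATE.format(backstory=text)  # replaces "INSERT BACKSTORY"
        results[backstory_id] = call_llm(prompt)
    return results
```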
Description: Inspired by Simone Filice, Guy Horowitz, David Carmel, Zohar Karnin, Liane Lewin-Eytan, and Yoelle Maarek. 2025. Generating Diverse Q&A Benchmarks for RAG Evaluation with DataMorgana. arXiv:2501.12789 (Jan. 2025). https://doi.org/10.48550/arXiv.2501.12789
We defined different sets of user search skills, topic knowledge, and query types, which were fed into the prompt:
Prompt:
You are a user simulator that should generate {num_queries} candidate queries for addressing a specified information need.
The {num_queries} queries must be in the form of search queries that a user would type into a search engine.
When generating the queries, assume that whoever reads the queries will read each query independently.
The {num_queries} queries must be diverse and different from each other.
Return only the queries without any preamble.
Write each query in a new line, in the following JSON format:
'{"query": <query>}'
## The generated queries should address the following information need:
{information_need}
## Each of the generated queries must reflect the following characteristics:
1. User Skills: {user_skills}
2. Topic Knowledge: {topic_knowledge}
3. Query Type: {query_types}
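The characteristic sets are combined and substituted into the placeholders of the prompt template; the sketch below shows the general idea with illustrative characteristic values (not the actual sets used in the paper).

```python
import itertools

# Illustrative placeholder values; the actual sets used in the paper differ.
USER_SKILLS = ["novice searcher", "expert searcher"]
TOPIC_KNOWLEDGE = ["no prior knowledge of the topic", "expert on the topic"]
QUERY_TYPES = ["short keyword query", "natural-language question"]

def fill_prompt(template, num_queries, information_need, skills, knowledge, qtype):
    # Plain string replacement avoids clashes with the literal JSON braces
    # in the template (e.g. '{"query": <query>}').
    values = {
        "{num_queries}": str(num_queries),
        "{information_need}": information_need,
        "{user_skills}": skills,
        "{topic_knowledge}": knowledge,
        "{query_types}": qtype,
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template

# One prompt per combination of characteristics for a given backstory, e.g.:
# for s, k, q in itertools.product(USER_SKILLS, TOPIC_KNOWLEDGE, QUERY_TYPES):
#     prompt = fill_prompt(TEMPLATE, 10, backstory_text, s, k, q)
```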
This research was partially supported by the Australian Research Council through the ARC Centre of Excellence for Automated Decision-Making and Society CE200100005 and Discovery Project DP190101113, and was undertaken with the assistance of computing resources from the RMIT Advanced Cloud Ecosystem (RACE) Hub.
If you use this code or dataset, please cite our paper:
@inproceedings{zendel2025comparative,
author = {Oleg Zendel and Sara Fahad Dawood Al Lawati and Lida Rashidi and Falk Scholer and Mark Sanderson},
booktitle = {Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM ’25)},
doi = {10.1145/3746252.3761382},
pages = {},
title = {A Comparative Analysis of Linguistic and Retrieval Diversity in LLM-Generated Search Queries},
year = {2025}
}