rmit-ir/Query-Gen-LLM

This repository contains the code and data for the paper "A Comparative Analysis of Linguistic and Retrieval Diversity in LLM-Generated Search Queries"

A Comparative Analysis of Linguistic and Retrieval Diversity in LLM-Generated Search Queries
Oleg Zendel, Sara Fahad Dawood Al Lawati, Lida Rashidi, Falk Scholer, and Mark Sanderson. 2025. A Comparative Analysis of Linguistic and Retrieval Diversity in LLM-Generated Search Queries. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM ’25), November 10–14, 2025, Seoul, Republic of Korea. ACM, New York, NY, USA, 10 pages.


Overview

This repository contains the code and data used in our paper. Specifically, it provides:

  • Code to generate the LLM queries used in our experiments
  • Datasets produced and/or used in the study

Our goal is to support transparency, reproducibility, and future work in this area.


Repository Structure

This repository provides code for generating search queries using both GPT models and other LLMs:

1. GPT-Query-Gen

Code to generate queries using GPT models:

  • GPT-4
  • GPT-4o-mini
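
The sketch below shows how one of the t1/t2 prompts (listed under Methodology) could be sent to these models through the OpenAI Python client. It is a minimal illustration, not the repository's actual script: the model name, backstory placeholder, and lack of batching logic are assumptions.

```python
# Minimal sketch (not the repository's actual code): send a t1/t2-style prompt
# to the OpenAI Chat Completions API. Assumes openai>=1.0 and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_prompt = (
    "You are an English-speaking searcher. You will be presented with a list of "
    "backstories. For each backstory write the first search query you would use "
    "to find relevant information.\n\n"
    "Return a query for each backstory in JSONL format "
    '{"Backstory-ID":[BACKSTORY-ID], "Query":[QUERY]}'
)

# Hypothetical batch of backstories; the real input comes from the dataset files.
backstory_batch = "Backstory-ID: ...\nBackstory: ..."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": backstory_batch},
    ],
)
print(response.choices[0].message.content)
```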

2. berock_generation.ipynb

Notebook to generate queries using additional LLMs:

  • anthropic.claude-3-5-haiku-20241022-v1:0
  • us.meta.llama3-3-70b-instruct-v1:0
  • us.meta.llama3-2-11b-instruct-v1:0
  • mistral.mistral-7b-instruct-v0:2
  • mistral.mixtral-8x7b-instruct-v0:1
  • mistral.mistral-large-2407-v1:0

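The identifiers above are Amazon Bedrock model IDs. A minimal sketch, assuming boto3 with Bedrock access, of how one of them could be invoked through the Bedrock Converse API is shown below; the region, prompt placeholder, and inference settings are illustrative assumptions, not the notebook's exact code.

```python
# Minimal sketch: invoke a Bedrock model ID via the Converse API with boto3.
# Region, prompt text, and inference settings are assumptions for illustration.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-5-haiku-20241022-v1:0",
    messages=[
        {"role": "user", "content": [{"text": "INSERT PROMPT AND BACKSTORIES HERE"}]}
    ],
    inferenceConfig={"maxTokens": 1024, "temperature": 1.0},
)
print(response["output"]["message"]["content"][0]["text"])
```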


Methodology

We used five techniques for both GPT-Query-Gen and Bedrock generation. The files in this repository are named as follows:

File Naming Convention

GPT Query Generation Files:

  • Format: tn_pv_model where:
    • tn = technique number
    • pv = prompt version
    • model = LLM model used
  • Note: Some techniques did not have multiple prompt versions, so only tn is used in those file names

Bedrock Generation Files:

  • Follow a naming structure similar to the GPT files, but all code is consolidated in a single notebook: berock_generation.ipynb
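
Assuming file names of the form described above (the names below are hypothetical examples, not necessarily files in this repository), a small regex can split them into their components:

```python
# Sketch only: the exact file names in this repository may differ slightly.
import re

# Hypothetical examples of the naming convention described above.
names = ["t1_pv2_gpt-4o-mini", "t5_gpt-4"]

pattern = re.compile(r"^t(?P<technique>\d+)(?:_pv(?P<prompt_version>\d+))?_(?P<model>.+)$")
for name in names:
    match = pattern.match(name)
    if match:
        print(match.groupdict())
# {'technique': '1', 'prompt_version': '2', 'model': 'gpt-4o-mini'}
# {'technique': '5', 'prompt_version': None, 'model': 'gpt-4'}
```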

Technique 1 (t1): Giving LLMs Same Prompts

Description: Shuffles the 12 backstories, groups them into batches of 6, and repeats this process 100 times with a different shuffle each time (range(100)). This yields 100 generated queries per backstory.

Models: GPT 4o-mini and GPT 4
Prompt Versions: 1-4

Technique 2 (t2): Giving LLMs Different Prompts

Description: Inserts all 12 backstories together in a fixed order and repeats this process 50 times (range(50)). This yields 50 generated queries per backstory.

Models: GPT 4o-mini and GPT 4
Prompt Versions: 1-4
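
The two batching regimes can be summarised in a short sketch; the backstory list and the generate_queries helper below are hypothetical placeholders, not the repository's code.

```python
import random

# Hypothetical placeholders for the 12 backstories and the LLM call wrapper.
backstories = [f"backstory {i}" for i in range(1, 13)]

def generate_queries(batch):
    """Hypothetical wrapper around one LLM call for a batch of backstories."""
    ...

# Technique 1 (t1): shuffle, split into batches of 6, repeat 100 times.
for _ in range(100):
    shuffled = random.sample(backstories, len(backstories))
    for start in range(0, len(shuffled), 6):
        generate_queries(shuffled[start:start + 6])

# Technique 2 (t2): all 12 backstories in the same fixed order, repeated 50 times.
for _ in range(50):
    generate_queries(backstories)
```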

Prompt Versions 1-4 for t1 and t2:

Prompt v1:

You are an English-speaking searcher. You will be presented with a list of backstories. For each backstory write the first search query you would use to find relevant information.

Return a query for each backstory in JSONL format `{"Backstory-ID":[BACKSTORY-ID], "Query":[QUERY]}`

Prompt v2:

You are an English-speaking searcher. You will be given a list of backstories. For each backstory, generate an initial search query to find relevant information.

Provide a query for each backstory in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Query": [QUERY]}.

Prompt v3:

You will be provided with a list of backstories. As an English-speaking searcher, for each backstory, generate the initial search query you would use to find relevant information.

Return your response in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Query": [QUERY]}.

Prompt v4:

You will be given a list of backstories. As an English-speaking searcher, for each backstory, generate the first search query you would use to retrieve relevant information.

Provide your response in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Query": [QUERY]}.
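
Since all four prompts request JSONL output, each response can be parsed line by line. A minimal sketch, assuming the response text is already in hand (the IDs and queries below are hypothetical):

```python
import json

# Hypothetical model response in the requested JSONL format.
response_text = '''{"Backstory-ID": "101", "Query": "example query one"}
{"Backstory-ID": "102", "Query": "example query two"}'''

queries = []
for line in response_text.splitlines():
    line = line.strip()
    if not line:
        continue
    record = json.loads(line)
    queries.append((record["Backstory-ID"], record["Query"]))

print(queries)  # [('101', 'example query one'), ('102', 'example query two')]
```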

Note: Despite the structured design, the outputs showed limited lexical diversity and differed substantially from human-written queries, so these two techniques (t1 and t2) were excluded from the analysis reported in the paper.

Technique 3: Crowd Workers Prompt Variants (CW PV)

Description: Generates synthetic queries by prompting GPT in a manner similar to how crowdsourced workers were instructed. All 12 backstories are inserted together.

Prompt Versions for Technique 3:

Prompt v1:

Please create a list of search queries made by a diverse group of users seeking answers to a situation described in a provided backstory. The queries should reflect the users' diverse backgrounds and word choices. Queries are expressed using natural language or keywords and may use abbreviations. Each backstory must have 100 queries. Queries vary in length but should have 5 words on average.

Provide the queries in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Queries": [[QUERY1], [QUERY2], [QUERYn]]}.

Prompt v2:

Please create a list of search queries made by a diverse group of users seeking answers to a situation described in a provided backstory. The queries should reflect the users' diverse backgrounds and word choices. Queries are expressed using natural language or keywords and may use abbreviations. Each backstory is expected to have 100 queries. Queries vary in length but should have 5 words on average.

Provide the queries in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Queries": [[QUERY1], [QUERY2], [QUERYn]]}.

Prompt v3:

Please create a list of search queries made by a diverse group of users seeking answers to a situation described in a provided backstory. The queries should reflect the users' diverse backgrounds and word choices. Queries are expressed using natural language or keywords and may use abbreviations. Each backstory is expected to have a range between 19 to 101 queries. Queries vary in length but should have 5 words on average.

Provide a range between 19 to 101 queries for each backstory in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Queries": [[QUERY1], [QUERY2], [QUERYn]]}.

Prompt v4:

Please create a list of search queries made by a diverse group of users seeking answers to a situation described in a provided backstory. The queries should reflect the users' diverse backgrounds and word choices. Queries are expressed using natural language or keywords and may use abbreviations. For each backstory, generate a random number of queries between 19 and 101. The number of queries for each backstory should vary. Queries vary in length but should have 5 words on average.

Provide the queries for each backstory in JSONL format: {"Backstory-ID": [BACKSTORY-ID], "Queries": [[QUERY1], [QUERY2], [QUERYn]]}.

Technique 4: Crowd Workers 500 (CW 500)

Description: Generates synthetic queries by prompting GPT-4o-mini in a manner similar to how crowdsourced workers were instructed. One backstory is inserted at a time.

Prompt:

Please create a list of 500 search queries made by a diverse group of users seeking answers to a situation described in a provided backstory. The queries should reflect the users' diverse backgrounds and word choices. Queries are expressed using natural language or keywords and may use abbreviations. Queries vary in length but should have 5 words on average. Provided backstory: "INSERT BACKSTORY" Provide output in JSONL format: {"Query": [QUERY]}.
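
Because this technique inserts one backstory at a time, the prompt is templated per backstory. A minimal sketch, where the backstories mapping and the call_gpt helper are hypothetical placeholders:

```python
# Sketch of the per-backstory loop for CW 500; `backstories` and `call_gpt`
# are hypothetical placeholders for the actual data and API wrapper.
PROMPT_TEMPLATE = (
    "Please create a list of 500 search queries made by a diverse group of users "
    "seeking answers to a situation described in a provided backstory. The queries "
    "should reflect the users' diverse backgrounds and word choices. Queries are "
    "expressed using natural language or keywords and may use abbreviations. "
    "Queries vary in length but should have 5 words on average. "
    'Provided backstory: "{backstory}" '
    'Provide output in JSONL format: {{"Query": [QUERY]}}.'
)

backstories = {"...": "..."}  # hypothetical mapping of backstory ID to text

def call_gpt(prompt):
    """Hypothetical wrapper around the GPT-4o-mini API call."""
    ...

for backstory_id, backstory_text in backstories.items():
    prompt = PROMPT_TEMPLATE.format(backstory=backstory_text)
    call_gpt(prompt)
```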

Technique 5: Context-based Method (VC - Various Contexts)

Description: Inspired by Simone Filice, Guy Horowitz, David Carmel, Zohar Karnin, Liane Lewin-Eytan, and Yoelle Maarek. 2025. Generating Diverse Q&A Benchmarks for RAG Evaluation with DataMorgana. arXiv:2501.12789 (Jan. 2025). https://doi.org/10.48550/arXiv.2501.12789

We defined different sets of search skills, topic knowledge, and query types, which were fed into the prompt:

Prompt:

You are a user simulator that should generate {num_queries} candidate queries for addressing a specified information need.
The {num_queries} queries must be in the form of search queries that a user would type into a search engine.
When generating the queries, assume that whoever reads the queries will read each query independently.
The {num_queries} queries must be diverse and different from each other.
Return only the queries without any preamble.
Write each query in a new line, in the following JSON format:
'{"query": <query>}'

## The generated queries should address the following information need:
{information_need}

## Each of the generated queries must reflect the following characteristics:
1. User Skills: {user_skills}
2. Topic Knowledge: {topic_knowledge}
3. Query Type: {query_types}
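
One way the characteristic sets could be combined and substituted into the template above is sketched below; the example sets, values, and the use of a full cross-product are illustrative assumptions, not the exact categories or sampling scheme used in the paper.

```python
from itertools import product

# Placeholder characteristic sets; the paper's actual categories may differ.
user_skills = ["novice searcher", "expert searcher"]
topic_knowledge = ["no prior knowledge", "familiar with the topic"]
query_types = ["keyword query", "natural-language question"]
information_need = "..."  # one backstory / information need

# Abbreviated version of the prompt above; the full text would be used in practice.
PROMPT_TEMPLATE = (
    "You are a user simulator that should generate {num_queries} candidate queries "
    "for addressing a specified information need.\n"
    "## The generated queries should address the following information need:\n"
    "{information_need}\n"
    "## Each of the generated queries must reflect the following characteristics:\n"
    "1. User Skills: {user_skills}\n"
    "2. Topic Knowledge: {topic_knowledge}\n"
    "3. Query Type: {query_types}"
)

for skills, knowledge, qtype in product(user_skills, topic_knowledge, query_types):
    prompt = PROMPT_TEMPLATE.format(
        num_queries=5,
        information_need=information_need,
        user_skills=skills,
        topic_knowledge=knowledge,
        query_types=qtype,
    )
    # `prompt` would then be sent to the chosen LLM
```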

Acknowledgment

This research was partially supported by the Australian Research Council through the ARC Centre of Excellence for Automated Decision-Making and Society CE200100005 and Discovery Project DP190101113, and was undertaken with the assistance of computing resources from the RMIT Advanced Cloud Ecosystem (RACE) Hub.

Citation

If you use this code or dataset, please cite our paper:

@inproceedings{zendel2025comparative,
 author = {Oleg Zendel and Sara Fahad Dawood Al Lawati and Lida Rashidi and Falk Scholer and Mark Sanderson},
 booktitle = {Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM ’25)},
 doi = {10.1145/3746252.3761382},
 pages = {},
 title = {A Comparative Analysis of Linguistic and Retrieval Diversity in LLM-Generated Search Queries},
 year = {2025}
}
