Optimizing Query Expansions via LLM Preference Alignment


One of the longstanding challenges in information retrieval is the vocabulary mismatch problem, where the terms used in user queries do not directly align with those in relevant documents. For example, a user might want to find a track on Spotify based on its lyrics, but they may have misheard them and formulate a mismatched query: the issued query, e.g. “Dancing queen, feel the beat from the tangerine”, differs from the actual lyric, “Dancing queen, feel the rhythm from the tambourine”.

Large Language Models (LLMs) offer a solution: modify the input query before it is fed to the search engine, a technique known as query expansion. A key question in this approach is: how do we know that LLM-generated query expansions will lead to high-quality search results? In this blog post, we describe an approach to this problem called Aligned Query Expansion (AQE). By post-training with direct preference optimization on a set of pairs of effective and ineffective query expansions, we teach the model to generate expansions that work well for the target search system. Our experiments show that AQE can reduce processing time by ~70% while improving top-1 retrieval accuracy on public datasets.

Limitations of current query expansion methods

LLMs are prone to generating irrelevant query expansions. Recent approaches to expansion such as Doc2Query-- [1] and Expand and Rerank [2] use a generate-then-filter approach, where multiple expansions are generated and a re-ranking model is applied to select only the best ones. A limitation of generate-then-filter approaches is that they do not teach the LLM to prioritize query expansions that are most beneficial for the retrieval task, leading to inefficiencies and increased latency. This is particularly problematic in real-time IR systems with stringent performance requirements. With the proposed Aligned Query Expansion, we can skip the additional filtering/ranking step and directly train the model to output effective queries for the search engine.
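To make the efficiency argument concrete, here is a rough sketch contrasting the two inference paths. The expansion model, reranker, and candidate count are placeholders for illustration, not the exact systems from [1] and [2].

```python
from typing import Callable

def expand_generate_then_filter(
    query: str,
    llm: Callable[[str], str],              # placeholder expansion generator
    reranker: Callable[[str, str], float],  # placeholder relevance scorer
    n_candidates: int = 10,
) -> str:
    # Generate several candidate expansions, then pay an extra reranking pass
    # to keep only the most promising one.
    candidates = [llm(query) for _ in range(n_candidates)]
    return max(candidates, key=lambda c: reranker(query, c))

def expand_aligned(query: str, aligned_llm: Callable[[str], str]) -> str:
    # An aligned model is trained to emit an effective expansion directly,
    # so a single generation suffices and no filtering stage is needed.
    return aligned_llm(query)
```

The first path costs several LLM generations plus a reranking pass per query; the second costs a single generation, which is where the latency savings reported below come from.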

How Aligned Query Expansion works

We start from a training set that contains pairs of queries and relevant documents, for instance obtained via click-through data or manual annotation. Then, there are three main steps in the AQE pipeline: (1) Zero-shot Query Expansion Generation, (2) Ranking Query Expansions, and (3) Alignment.

  1. First, we rely on a zero-shot LLM to generate multiple query expansion candidates for every query in the training dataset.

  2. For each query expansion candidate, we concatenate the expansion to the original query and issue the expanded query to the downstream search system, recording the position at which the known relevant document is retrieved. The higher the relevant document ranks, the higher the score assigned to that expansion candidate.

  3. Based on the scored candidates from (2), we train the model with Rejection Sampling Fine-Tuning (RSFT), which uses only the highest-scoring query expansion, and with Direct Preference Optimization (DPO), which uses pairs of a high-scoring and a low-scoring query expansion (see the sketch after this list).
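Putting steps (1)-(3) together, here is a minimal sketch of the data-preparation loop. The helpers `generate_expansions` (a zero-shot LLM wrapper) and `search` (the downstream retrieval system) are hypothetical placeholders, and reciprocal rank is used as one reasonable choice of scoring function; none of these names come from the paper.

```python
from typing import Callable

def reciprocal_rank(ranked_doc_ids: list[str], relevant_doc_id: str) -> float:
    """Score an expansion by how high the known relevant document ranks (1/rank, 0 if absent)."""
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id == relevant_doc_id:
            return 1.0 / rank
    return 0.0

def build_alignment_data(
    training_set: list[tuple[str, str]],                   # (query, relevant_doc_id) pairs
    generate_expansions: Callable[[str, int], list[str]],  # zero-shot LLM wrapper (placeholder)
    search: Callable[[str, int], list[str]],               # returns ranked doc ids (placeholder)
    n_candidates: int = 8,
):
    rsft_examples = []  # (query, best expansion): rejection-sampling fine-tuning data
    dpo_pairs = []      # (query, chosen, rejected): preference pairs for DPO

    for query, relevant_doc_id in training_set:
        # (1) Zero-shot generation of several expansion candidates per query.
        candidates = generate_expansions(query, n_candidates)

        # (2) Score each candidate by where the relevant document lands when the
        #     expanded query (original query + expansion) is sent to the search system.
        scored = sorted(
            (
                (reciprocal_rank(search(f"{query} {expansion}", 100), relevant_doc_id), expansion)
                for expansion in candidates
            ),
            reverse=True,
        )

        # (3) The best candidate feeds RSFT; a (best, worst) pair feeds DPO.
        rsft_examples.append((query, scored[0][1]))
        dpo_pairs.append((query, scored[0][1], scored[-1][1]))

    return rsft_examples, dpo_pairs
```

The resulting `rsft_examples` can be used for standard supervised fine-tuning on the best expansions, and `dpo_pairs` maps onto the (prompt, chosen, rejected) format expected by common DPO implementations such as TRL's DPOTrainer.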


Comparison with existing methods

We compare AQE with baselines on the public Natural Questions dataset. The table below shows that the combination of RSFT followed by DPO outperforms the original query and previous generate-then-filter approaches.

| | Model | Top-1 Accuracy |
| --- | --- | --- |
| Baselines | Original Query | 22.1 |
| | Generate-then-filter | 28.5 |
| AQE | RSFT + DPO | 30.8 |

AQE also reduces the computational time to generate queries by ~70% compared to the generate-then-filter approach, since we do not need to generate multiple candidates that are then re-ranked.

We also found that AQE is effective in out-of-domain settings, i.e. trained on one dataset and tested on another. The following table displays the results when training the models on Natural Questions and testing on the EntityQs dataset:

Tested on the EntityQs dataset

| | Model | Top-1 Accuracy |
| --- | --- | --- |
| Baselines | Original Query | 44.2 |
| | Generate-then-filter | 23.7 |
| AQE | RSFT + DPO | 46.7 |

Conclusion

In this work we introduced a novel approach to query expansion, AQE, that uses preference alignment techniques to generate effective query expansions for a search system. By aligning LLMs with RSFT and DPO, AQE eliminates costly filtering steps, improves retrieval performance, and reduces latency and memory usage—crucial for real-time, large-scale applications.

Our results highlight that query expansion techniques can be used in industry settings to improve the effectiveness of search systems, mitigating vocabulary mismatch problems.

For more information, please refer to our paper: “Efficient Query Expansion for Information Retrieval through LLM Alignment” by Adam Yang, Gustavo Penha, Enrico Palumbo, and Hugues Bouchard.

References:

[1] Gospodinov, Mitko, Sean MacAvaney, and Craig Macdonald. "Doc2Query--: When Less Is More." ECIR, 2023.

[2] Chuang, Yung-Sung, et al. "Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering." Findings of ACL, 2023.