Semantic IDs for Generative Search and Recommendation

Large Language Model (LLM)-based generative frameworks are transforming how we approach both search and recommendation. Rather than maintaining separate, task-specific systems, these models provide a unified solution that simplifies design and can improve generalization across tasks. A central challenge, however, lies in how to represent items as discrete tokens that LLMs can process. Traditional approaches, such as using unique IDs, sequential IDs, or item titles, face limitations in scalability and in handling cold-start items.
Recent work has introduced Semantic IDs, where items are tokenized based on their embeddings. This allows similar items to share tokens, promoting better generalization. However, studies show that the effectiveness of Semantic IDs depends strongly on the underlying embedding space: embeddings fine-tuned for search enhance retrieval performance, while those tuned for recommendation yield better recommendation quality.
In this work, we first demonstrate that task-specific Semantic IDs fail to generalize across both search and recommendation when used in a unified generative model. We then introduce a simple strategy for generating Semantic IDs that incorporate information from both tasks, and present results comparing this approach with other methods that also leverage cross-task information.
Task-specific Semantic IDs do not generalize in a joint generative Search & Recommendation (S&R) model
Generative models promise to unify search and recommendation within a single architecture, but most existing methods optimize item IDs separately for each task. To explore what happens when these optimizations interact, we evaluated three approaches to creating Semantic IDs:
Content: content-based embeddings (as in DSI [1] and TIGER [2]).
Search: content-based embeddings fine-tuned for search (as in RIPOR [3]).
Recommendation: embeddings learned from a collaborative-filtering recommendation model (as in TokenRec [4]).
We generate Semantic IDs from these embeddings using RQ-KMeans, and use them to represent items in a joint generative model for both search and recommendation. We then report effectiveness (Recall@30) for search and recommendation on a dataset derived from MovieLens-25M.
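To make the tokenization step concrete, here is a minimal sketch of RQ-KMeans (residual k-means quantization), assuming item embeddings are given as a NumPy array. The hyperparameters (3 codebook levels, 8 clusters per level) and function names are illustrative, not the settings used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def rq_kmeans(embeddings, n_levels=3, n_clusters=8, seed=0):
    """Quantize each embedding into a tuple of cluster codes (its Semantic ID)."""
    residuals = embeddings.astype(np.float64).copy()
    codes = []
    for _ in range(n_levels):
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        assignments = km.fit_predict(residuals)
        codes.append(assignments)
        # Subtract the assigned centroid; the next level quantizes the residual.
        residuals -= km.cluster_centers_[assignments]
    return np.stack(codes, axis=1)  # shape: (n_items, n_levels)

rng = np.random.default_rng(42)
item_embeddings = rng.normal(size=(100, 16))  # stand-in for real item embeddings
semantic_ids = rq_kmeans(item_embeddings)
print(semantic_ids.shape)  # (100, 3)
```

Because each level quantizes the residual left by the previous one, items with similar embeddings share a common code prefix, which is what lets the generative model generalize across related items.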

The results show that Semantic IDs tuned for search significantly improve retrieval but degrade recommendation performance, while recommendation-tuned IDs have the opposite effect. In other words, optimizing for one task comes at the expense of the other, a tension highlighted in recent studies.
This raises our central question: can we build a shared embedding space that balances both tasks? To address this, we introduce a bi-encoder jointly fine-tuned for search and recommendation.
Semantic IDs from a multi-task bi-encoder model
To capture information from both tasks in a single set of Semantic IDs, we propose a simple approach: fine-tuning a bi-encoder model jointly for search and recommendation in a multi-task setup, as illustrated in the diagram below.

More concretely, we train a text-based bi-encoder with contrastive learning, using two types of instances: query–item pairs from the search dataset and co-occurring item–item pairs from the recommendation dataset. The resulting item embeddings are then clustered and discretized with RQ-KMeans to form Semantic IDs. We refer to this approach as Multi-task.
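The multi-task objective above can be sketched as two in-batch InfoNCE losses over a shared encoder: search contributes query–item pairs, recommendation contributes co-occurring item–item pairs. In this sketch the encoder outputs are stand-ins (random vectors); in practice they come from the shared text bi-encoder, and all names and the summed-loss formulation are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.05):
    """In-batch InfoNCE: each anchor's positive is the matching row;
    all other rows in the batch act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (batch, batch) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # cross-entropy on the diagonal

rng = np.random.default_rng(0)
dim = 32
# Search instances: (query embedding, relevant-item embedding)
queries, relevant_items = rng.normal(size=(8, dim)), rng.normal(size=(8, dim))
# Recommendation instances: (item embedding, co-occurring-item embedding)
items_a, items_b = rng.normal(size=(8, dim)), rng.normal(size=(8, dim))

# Multi-task loss: sum of the two per-task contrastive losses.
loss = info_nce(queries, relevant_items) + info_nce(items_a, items_b)
print(float(loss))
```

Training on both pair types pulls the item embeddings toward a space that encodes query relevance and co-occurrence structure at once, which is exactly what the downstream RQ-KMeans step then discretizes.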
For comparison, we also evaluate other methods that combine information from both tasks: Separate, where each task has its own set of Semantic ID tokens; and Fused SVD, where embeddings are reduced to a common dimensionality and concatenated. A full description of these models can be found in our paper.
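As a point of reference for the Fused SVD baseline, here is a minimal sketch: reduce the search-tuned and recommendation-tuned embeddings to a common dimensionality with a truncated SVD, then concatenate them per item. The dimensions and helper names here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def reduce_dim(embeddings, k):
    """Project embeddings onto their top-k right singular directions."""
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T  # (n_items, k)

rng = np.random.default_rng(1)
search_emb = rng.normal(size=(100, 64))  # stand-in: embeddings fine-tuned for search
rec_emb = rng.normal(size=(100, 48))     # stand-in: collaborative-filtering embeddings

k = 16
fused = np.concatenate([reduce_dim(search_emb, k), reduce_dim(rec_emb, k)], axis=1)
print(fused.shape)  # (100, 32)
```

The fused matrix can then be fed to the same RQ-KMeans discretization as any other embedding space, giving each item a single Semantic ID that mixes both signals.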
Results
We evaluate the effectiveness (Recall@30) of search and recommendation in a joint generative S&R model using the proposed Semantic IDs on the MovieLens-25M dataset. The results, summarized in the table below, show that Multi-task Semantic IDs achieve a strong balance between search and recommendation performance compared to other cross-task approaches.
Importantly, the Multi-task approach lies on the Pareto frontier: no other cross-task method improves one task without reducing performance on the other. In this sense, it represents the best achievable trade-off given the current design. Nonetheless, there remains a performance gap compared to task-specific Semantic IDs, which achieve higher effectiveness when considering a single task at a time.

Conclusion
In our previous work [5], we examined why jointly training a generative model for search and recommendation improves effectiveness when using atomic IDs. In this work, we expand our research to explore the use of Semantic IDs within a unified generative model for search and recommendation.
Our findings highlight that the way Semantic IDs are constructed is critical. Task-specific IDs perform well in isolation but fail to generalize, whereas cross-task IDs provide a balanced solution at no additional cost. These results offer early evidence that shared item representations are both feasible and beneficial, paving the way for more generalizable generative recommender architectures.
For more information, please refer to our paper: Semantic IDs for Joint Generative Search and Recommendation. Gustavo Penha, Edoardo D'Amico, Marco De Nadai, Enrico Palumbo, Alexandre Tamborrino, Ali Vardasbi, Max Lefarov, Shawn Lin, Timothy Heath, Francesco Fabbri, Hugues Bouchard. RecSys 2025, Late Breaking Results.
References
[1] Yi Tay, Vinh Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, et al. Transformer Memory as a Differentiable Search Index. NeurIPS 2022.
[2] Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, et al. Recommender Systems with Generative Retrieval. NeurIPS 2023.
[3] Hansi Zeng, Chen Luo, Bowen Jin, Sheikh Muhammad Sarwar, Tianxin Wei, and Hamed Zamani. Scalable and Effective Generative Information Retrieval. The Web Conference 2024.
[4] Haohao Qu, Wenqi Fan, Zihuai Zhao, and Qing Li. TokenRec: Learning to Tokenize ID for LLM-based Generative Recommendation. arXiv:2406.10450 [cs.IR].
[5] Gustavo Penha, Ali Vardasbi, Enrico Palumbo, Marco De Nadai, and Hugues Bouchard. Bridging Search and Recommendation in Generative Retrieval: Does One Task Help the Other? RecSys 2024.