PulseAugur

Airbnb uses LLMs to generate synthetic data for search

Researchers at Airbnb have developed a framework that uses large language models (LLMs) to generate synthetic data for natural language search systems. The approach addresses the cold-start problem, in which a newly deployed system has neither real user queries nor relevance labels, by generating realistic queries and labels for model training and evaluation. The method improves query realism and attribute distribution matching compared with baseline approaches, and it provides useful signals for improving retrieval and ranking models.

IMPACT Provides a scalable method for training and evaluating search systems in data-scarce environments, potentially improving user experience and search relevance.

RANK_REASON Academic paper detailing a novel methodology for synthetic data generation using LLMs.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.IR (Information Retrieval) · TIER_1 · English (EN) · Sanjeev Katariya

    Bridging the Cold-Start Gap: LLM-Powered Synthetic Data Generation for Natural Language Search at Airbnb

    Deploying natural language search systems presents a critical cold-start challenge: no real user queries to learn linguistic patterns, and no relevance labels to train ranking models. We present a framework for generating synthetic queries and labels using large language models (…