FOUNDv2: Learning Unified User Quantized Tokenizers for User Representation
Researchers have introduced FOUNDv2, a framework for user representation learning designed to address limitations of traditional continuous embedding methods. It uses a Unified User Quantized Tokenizer (U2QT) to convert heterogeneous user data into a standardized, discrete token space, which significantly reduces storage and computational costs. FOUNDv2 uses a two-stage architecture for feature extraction and discretization, with multi-scale alignment objectives that capture both fine-grained behaviors and temporal patterns. Large-scale deployment on Alipay has demonstrated its scalability and efficiency in industrial scenarios.
AI IMPACT This research offers a more efficient method for user representation, potentially improving personalization services and reducing infrastructure costs for large-scale platforms.
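
To make the idea of a discrete token space concrete, here is a minimal sketch of how a continuous user embedding can be turned into a few integer tokens by nearest-codebook lookup. This is generic product quantization, not FOUNDv2's U2QT, whose details are not given in this summary. The function names, the toy codebooks, and the example embedding are all hypothetical.

```python
import math


def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector`
    under squared Euclidean distance."""
    best_index, best_dist = 0, math.inf
    for index, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def tokenize(embedding, codebooks):
    """Split `embedding` into equal-sized chunks and quantize each chunk
    against its own codebook, yielding one integer token per chunk."""
    chunk = len(embedding) // len(codebooks)
    return [
        nearest_code(embedding[i * chunk:(i + 1) * chunk], codebook)
        for i, codebook in enumerate(codebooks)
    ]


# Hypothetical toy setup: two codebooks, each with four 2-dimensional entries.
# In a real system the codebooks would be learned, not hand-written.
codebooks = [
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
]

# Hypothetical 4-dimensional continuous user embedding.
user_embedding = [0.9, 0.1, 0.2, 0.8]

tokens = tokenize(user_embedding, codebooks)
print(tokens)  # [2, 1]
```

In this toy case, four floating-point values are replaced by two tokens that each need only 2 bits, since each codebook has four entries. That is the kind of storage saving a discrete representation makes possible, at the cost of some quantization error.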