A new research paper examines how well different categorical encoding methods handle high-cardinality features in fraud detection. The study tested seven encoders on the IEEE-CIS fraud benchmark dataset, comparing their performance with LightGBM and CatBoost learners. Entity embeddings achieved the highest AUC-ROC score, closely followed by CatBoost, and significantly outperformed tier group encoding. On AUC-PR, however, CatBoost led, so no single encoder dominated both metrics. The research suggests that entity embeddings gain their advantage from capturing joint multi-column representations.
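To illustrate how one method can lead on AUC-ROC while another leads on AUC-PR, here is a minimal sketch in plain Python. The labels and scores are invented toy values, not results from the paper, and the names `scores_a` and `scores_b` are hypothetical. Average precision is used here as a common stand-in for AUC-PR; the paper's exact computation may differ.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A tie between a positive and a negative counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy data (assumption): 2 fraud cases among 10 transactions.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical model A: both frauds ranked near the top, but one
# legitimate transaction outranks them.
scores_a = [0.90, 0.80, 0.95, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]

# Hypothetical model B: one fraud ranked first, the other mid-list.
scores_b = [0.99, 0.45, 0.95, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]

roc_a, roc_b = auc_roc(labels, scores_a), auc_roc(labels, scores_b)
ap_a, ap_b = average_precision(labels, scores_a), average_precision(labels, scores_b)

print(f"A: AUC-ROC={roc_a:.3f}  AP={ap_a:.3f}")
print(f"B: AUC-ROC={roc_b:.3f}  AP={ap_b:.3f}")
```

In this toy case model A wins on AUC-ROC while model B wins on average precision. Precision-based metrics weight the very top of the ranking more heavily, which matters when positives are rare, as in fraud data.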
IMPACT This research provides insights into optimizing fraud detection models by comparing different encoding techniques, potentially improving accuracy in financial applications.
RANK_REASON Academic paper detailing a new methodology and benchmark results.
- CatBoost
- entity embeddings
- IEEE-CIS
- LightGBM
- TabNet: Attentive Interpretable Tabular Learning
- target encoding
- tier group encoding
AI-generated summary · Google Gemini · from 2 sources.