Brief

last 24h

[3/3] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

RESEARCH · arXiv cs.LG English(EN) · 3d · [2 sources]

Class-Dependent Hybrid Data Augmentation for Multiclass Migraine Classification under Severe Class Imbalance

Researchers propose a data augmentation framework for migraine classification under severe class imbalance. The approach corrects methodological flaws in prior work and uses a hybrid strategy that selects the generation method for each class according to its sample size. In experiments on a dataset of 400 patients, the framework significantly improved classification performance, with the FT-Transformer model reaching a peak macro-F1 score of 0.914.

IMPACT Introduces a data augmentation technique that could improve the accuracy of AI models in medical diagnosis, particularly for conditions with imbalanced datasets.

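
The migraine summary reports macro-F1 rather than accuracy, which matters under severe class imbalance because macro-F1 weights every class equally. The following is a minimal sketch of how that metric is computed; the function name and the toy labels are illustrative assumptions and do not come from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper, not the paper's code)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (made up): a classifier that always predicts the majority class.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f} macro_f1={score:.3f}")
```

On this toy data, accuracy looks acceptable while macro-F1 is much lower, because the rare class is never predicted correctly.
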
TOOL · arXiv cs.LG English(EN) · 3d

Visibility nowcasting in South Korea: a machine learning approach to class imbalance and distribution shift

Researchers have developed a machine learning framework for predicting atmospheric visibility in six South Korean cities that addresses class imbalance and distribution shift. The study used SMOTENC and CTGAN to handle class imbalance and an ensemble of machine learning and deep learning models for prediction. Performance dropped significantly from cross-validation to the test set, highlighting the impact of temporal distribution shift, which the authors quantified using the Wasserstein distance.

IMPACT Presents a methodology for addressing data imbalance and distribution shifts in time-series forecasting, applicable to various scientific domains.

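
The visibility summary says distribution shift was quantified with the Wasserstein distance. The summary does not say which features or which implementation the authors used, so the following is only a sketch for a single one-dimensional feature; the function name and the sample values are hypothetical. For one-dimensional samples, `scipy.stats.wasserstein_distance` computes the same quantity.

```python
from bisect import bisect_right


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two 1-D empirical samples.

    Computed as the area between the two empirical CDFs.
    Illustrative helper; not taken from the study.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Empirical CDF values are constant on [left, right).
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


# Made-up values standing in for one feature in the training and test periods.
train_feature = [0.0, 1.0, 2.0]
test_feature = [1.0, 2.0, 3.0]
shift = wasserstein_1d(train_feature, test_feature)
print(f"shift={shift:.3f}")
```

A larger value indicates that the test-period distribution of the feature has moved further from the training-period distribution; identical samples give zero.
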
RESEARCH · Hugging Face Daily Papers English(EN) · 1w · [2 sources]

Memisis: Orchestrating and Evaluating Synthetic Data for Tabular Health Datasets

Researchers have developed a method to distill knowledge from large, computationally expensive tabular foundation models (TFMs) into smaller, faster models for structured health data. Tested across 19 healthcare datasets, the distilled models retain over 90% of the original model's predictive accuracy while running significantly faster and preserving calibration and fairness properties. The study also found that averaging predictions from multiple teachers did not consistently outperform the best single teacher, which suggests a simpler route to deploying TFM-quality predictions in resource-constrained health settings. Separately, a new tool called Memisis orchestrates and evaluates synthetic data generation for tabular health datasets, aiming to balance privacy, utility, and fairness.

IMPACT Distillation techniques offer a path to deploy high-performing models in resource-constrained healthcare environments, while synthetic data tools aim to improve data availability and privacy.
