PulseAugur

HPC trace merging framework expands hardware counter coverage for ML

Researchers have developed a heuristic-based method for merging High-Performance Computing (HPC) execution traces, with the aim of expanding the set of hardware counters available for machine learning-based performance prediction. Only a restricted set of hardware counters can be collected simultaneously, so the technique merges traces from multiple runs, each recording different counters. It matches computation bursts across executions using MPI structure, timing, and communication patterns, producing a unified dataset with a richer feature space for training ML models without manual counter selection. Validation on the MareNostrum5 machine showed that the merged counters maintain acceptable accuracy across various applications and kernels.
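As a rough illustration of the matching idea described above, the sketch below pairs computation bursts from two runs and unions their counters. It is not the paper's algorithm: the `Burst` record, the `merge_runs` function, the 10% timing tolerance, and the use of the neighbouring MPI calls as a stand-in for "MPI structure" are all assumptions made for this example, and the counter values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Burst:
    """Hypothetical record for one computation burst between two MPI calls."""
    rank: int            # MPI rank that executed the burst
    prev_call: str       # MPI call that ended just before the burst
    next_call: str       # MPI call that starts right after the burst
    duration_us: float   # burst duration in microseconds
    counters: dict = field(default_factory=dict)


def merge_runs(run_a, run_b, rel_tol=0.10):
    """Pair the i-th burst of each rank across two runs and union the counters.

    A pair is kept only if both bursts sit between the same MPI calls and
    their durations differ by at most rel_tol (an assumed threshold).
    """
    by_rank_b = {}
    for b in run_b:
        by_rank_b.setdefault(b.rank, []).append(b)

    merged = []
    position = {}  # next burst index to compare, per rank
    for a in run_a:
        i = position.get(a.rank, 0)
        position[a.rank] = i + 1
        candidates = by_rank_b.get(a.rank, [])
        if i >= len(candidates):
            continue  # run B has no burst at this position
        b = candidates[i]
        same_structure = (a.prev_call, a.next_call) == (b.prev_call, b.next_call)
        longest = max(a.duration_us, b.duration_us)
        similar_time = longest == 0 or abs(a.duration_us - b.duration_us) <= rel_tol * longest
        if same_structure and similar_time:
            merged.append(Burst(a.rank, a.prev_call, a.next_call,
                                (a.duration_us + b.duration_us) / 2,
                                {**a.counters, **b.counters}))
    return merged


# Two runs of one rank, each collecting a different counter (illustrative values).
run_a = [Burst(0, "MPI_Init", "MPI_Allreduce", 100.0, {"PAPI_TOT_INS": 5000}),
         Burst(0, "MPI_Allreduce", "MPI_Finalize", 50.0, {"PAPI_TOT_INS": 2000})]
run_b = [Burst(0, "MPI_Init", "MPI_Allreduce", 104.0, {"PAPI_L1_DCM": 40}),
         Burst(0, "MPI_Allreduce", "MPI_Finalize", 80.0, {"PAPI_L1_DCM": 10})]
merged = merge_runs(run_a, run_b)
```

In this toy input the first pair of bursts is merged into one record carrying both counters, while the second pair is dropped because the durations differ by more than the assumed tolerance. A real implementation would need a more robust alignment than strict positional pairing.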

IMPACT Enables more comprehensive hardware counter data for ML models, potentially improving the accuracy of HPC performance predictions.

RANK_REASON Publication of an academic paper on a novel methodology for HPC trace analysis.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. arXiv cs.LG TIER_1 English(EN) · Marta Garcia-Gasulla

    Heuristic-Based Merging of HPC Traces to Extend Hardware Counter Coverage

    This work extends a framework for predicting the performance of High-Performance Computing (HPC) workloads using Machine Learning (ML). A common limitation in performance modeling is the restricted number of hardware counters that can be collected simultaneously. To address this,…