PulseAugur
research · [1 source] · 中文(ZH) · Will training with machine-synthesized data yield better results than real-world data?

AI companies explore synthetic data to overcome real-world data limitations in robotics

The scarcity of high-quality data has been a major bottleneck in training embodied AI models, because real-world data collection is costly and time-consuming. Machine-generated synthetic data offers a potential solution, but concerns remain about its lack of real-world fidelity, such as missing friction coefficients, latency, or tactile feedback. Companies in both China and the US commonly adopt a hybrid training method that combines real-world data with synthetically generated data to scale up data volume significantly.

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Hybrid training approaches may accelerate embodied AI development by overcoming real-world data limitations.

RANK_REASON Discusses a technical challenge in AI training (data scarcity for embodied models) and potential solutions (synthetic vs. real data, hybrid approaches).

Read on 36氪 (36Kr) →

COVERAGE [1]

  1. 36氪 (36Kr) TIER_1 中文(ZH)

    Will training with machine-synthesized data yield better results than real-world data?

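    The scarcity of high-quality data has long been the bottleneck holding back the training of embodied models. Real-robot data collection still suffers from high cost, long cycles, and limited scenario coverage. Machine-synthesized data is one solution. Its limitation, however, is that real-world information is missing, such as friction coefficients, latency, and tactile feedback, which has fueled industry concern about the "sim-to-real gap". Hybrid-data training is the mainstream solution currently put forward by embodied-AI companies in China and the US. For example, Gu Shitao (顾诗韬), president of 魔法原子, said the company collects about 16,000 data samples per day and then uses data synthesis to expand that volume 10,000-fold. She noted that because products iterate quickly and 60%-70% of the processes rely on manual labor, new-energy vehicle manufacturing is a rich mine for data collection. Whether to use real data or machine-synthesized data,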
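
    The scale-up figures quoted in the excerpt can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes the 10,000-fold expansion describes the total volume after synthesis (real samples included), which the source does not spell out.

```python
# Figures quoted in the 36Kr excerpt.
REAL_SAMPLES_PER_DAY = 16_000   # "about 16,000" real samples collected per day
EXPANSION_FACTOR = 10_000       # 10,000-fold expansion through data synthesis


def expanded_volume(real_samples: int, factor: int) -> int:
    """Total sample count after expansion.

    Assumption: the factor applies to the total volume, so the real
    samples are counted inside the expanded dataset.
    """
    return real_samples * factor


def real_share(real_samples: int, total_samples: int) -> float:
    """Fraction of the expanded dataset that is real-world data."""
    return real_samples / total_samples


total = expanded_volume(REAL_SAMPLES_PER_DAY, EXPANSION_FACTOR)
share = real_share(REAL_SAMPLES_PER_DAY, total)
print(f"{total:,} samples per day, {share:.2%} real")
```

    Under that assumption, the expanded set is on the order of 160 million samples per day, of which roughly one in ten thousand is real, which is why the fidelity of the synthetic portion matters so much for the sim-to-real gap.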