
Importance-Weighted On-Policy Distillation

PulseAugur coverage of Importance-Weighted On-Policy Distillation: every cluster mentioning it across labs, papers, and developer communities, ranked by signal.


Total · 30d · 1 over 90d
Releases · 30d · 0 over 90d
Papers · 30d · 1 over 90d


TOPICS

paper 1
other 1

SENTIMENT · 30D

1 day with sentiment data

RECENT · 1 TOTAL

TOOL · CL_104687 · Jun 21 · 17:20

New distillation method tackles position bias in reinforcement learning

Researchers have identified a position bias in On-Policy Distillation (OPD), a method used to improve reinforcement learning efficiency. They found that OPD's standard KL objective uniformly weights all tokens, but late…
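To make "uniformly weights all tokens" concrete, here is a minimal Python sketch of a per-token distillation loss. It assumes the common reverse-KL form, KL(student || teacher), computed from full next-token distributions at each position of a student-sampled sequence, and it assumes teacher probabilities are strictly positive (as softmax outputs are). The `weights` argument is a hypothetical hook that only shows where a position-dependent reweighting would enter; it is not the weighting scheme proposed in the paper, which this summary does not describe.

```python
import math
from typing import Optional, Sequence


def token_reverse_kl(p_student: Sequence[float], p_teacher: Sequence[float]) -> float:
    """KL(student || teacher) at a single position over a shared vocabulary.

    Assumes every teacher probability is > 0; zero-probability student
    entries contribute nothing to the sum and are skipped.
    """
    return sum(
        ps * (math.log(ps) - math.log(pt))
        for ps, pt in zip(p_student, p_teacher)
        if ps > 0.0
    )


def opd_loss(
    student_dists: Sequence[Sequence[float]],
    teacher_dists: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Sequence-level distillation loss built from per-token reverse KL.

    weights=None reproduces the uniform objective: every position counts
    equally. Passing weights (hypothetical, for illustration) gives a
    normalized weighted average instead.
    """
    kls = [token_reverse_kl(ps, pt) for ps, pt in zip(student_dists, teacher_dists)]
    if weights is None:
        # Uniform weighting: a plain mean over positions.
        return sum(kls) / len(kls)
    total = sum(weights)
    # Weighted average; positions with larger weights dominate the gradient signal.
    return sum(w * kl for w, kl in zip(weights, kls)) / total


# Three positions, two-token vocabulary; the teacher is uniform everywhere.
student = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
teacher = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
uniform = opd_loss(student, teacher)
weighted = opd_loss(student, teacher, weights=[1.0, 0.0, 0.0])
```

With `weights=None`, a position where the student already matches the teacher (the first one here) still counts as one third of the average, which is exactly the uniform treatment the summary describes. Any reweighting changes how much each position's mismatch contributes to the loss.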
