Thompson sampling
PulseAugur coverage of Thompson sampling: every cluster mentioning Thompson sampling across labs, papers, and developer communities, ranked by signal.
-
New Thompson Sampling methods tackle non-stationary and private contextual bandits
Two new research papers introduce novel approaches to Thompson sampling for contextual bandits. One paper, "Flow-Corrected Thompson Sampling for Non-Stationary Contextual Bandits," proposes a Bayesian method that reuses…
-
LLM framework generates verifiable PCB schematics without unit tests
Researchers have developed PCBSchemaGen, a novel framework designed to enable large language models (LLMs) to generate verifiable code for printed circuit board (PCB) schematic designs. Unlike typical code synthesis ben…
-
Detecting silent LLM degradation: New methods emerge
Developers are exploring methods to detect silent degradation in Large Language Models (LLMs) that can occur even when API calls return successful status codes. This degradation can manifest as a decline in accuracy, ad…
-
Optimism Stabilizes Thompson Sampling for Adaptive Inference
A new arXiv paper introduces a method called "optimism" to stabilize Thompson sampling, a technique widely used in adaptive inference for multi-armed bandit problems. The research, led by Han Zhong, demonstrates that th…
-
New analysis shows linear ensemble sampling matches Thompson sampling
Researchers have published a new analysis of linear ensemble sampling (ES) in stochastic linear bandits, demonstrating its effectiveness with standard Gaussian perturbations. The study shows that ES can achieve a regret…
-
Multi-armed bandits optimize structured pruning in deep neural networks
Researchers have developed a novel structured pruning framework for deep neural networks that utilizes multi-armed bandit (MAB) algorithms to remove entire neurons. This method treats each neuron as an 'arm' in a bandit…
-
Thompson Sampling algorithms advance risk-averse and GP bandits
Two new research papers explore advancements in Thompson Sampling for bandit problems. The first paper introduces an algorithm for risk-averse bandits with sub-Gaussian rewards, achieving asymptotic optimality for vario…
-
New Bayesian Framework MINTS Simplifies Sequential Decision-Making
Researchers have introduced MINTS, a new Bayesian framework for sequential decision-making under uncertainty. This minimalist approach places a prior only on the optimum's location, simplifying complex structural constr…
-
Thompson Sampling viewed as online optimization
A new paper recasts Thompson Sampling, a widely used bandit algorithm, as an online optimization problem. This perspective reveals how posterior sampling balances exploration and exploitation by mimicking a Bellman-opti…
-
New bandit algorithms tackle adversarial attacks and complex applications
Researchers are exploring new frontiers in bandit algorithms, focusing on their application and robustness in complex scenarios. One paper investigates adversarial attacks on high-dimensional offline bandits, revealing …
-
New algorithm balances user reward with statistical accuracy in experiments
Researchers have developed a new algorithm called TS-PostDiff that aims to improve the balance between user benefit and statistical accuracy in online experiments. Traditional methods like uniform random assignment are …
-
New research advances contextual bandit algorithms for dynamic and complex environments
Researchers are exploring advanced techniques for contextual bandit problems, focusing on improving regret bounds and handling dynamic environments. One paper introduces a retry-aware bandit algorithm that aims to optim…
-
New 'Delight-gated exploration' algorithm optimizes vast action spaces
Researchers have introduced Delight-gated exploration (DE), a novel algorithm designed to optimize decision-making in scenarios with vast action spaces. DE prioritizes exploratory actions based on their potential "delig…
-
New algorithm Anchor-TS improves offline-to-online learning
Researchers have developed a new algorithm called Sample-Mean Anchored Thompson Sampling (Anchor-TS) to improve offline-to-online learning. This method addresses the challenge of distribution shift between offline and o…
-
New methods boost LLM code generation efficiency and theory
Researchers have developed new methods for improving Large Language Model (LLM) code generation efficiency. One approach, Planning-after-Trial (PaT), adaptively invokes a planner only when an initial generation attempt …
-
DARTS method optimizes covariate acquisition for budget-constrained sequential experiments
Researchers have developed DARTS (Dynamic Adaptive Rerandomization via Thompson Sampling), a novel method for optimizing covariate acquisition in budget-constrained sequential experiments. This approach treats the proce…
-
New algorithm tackles scalable policy learning under network interference
Researchers have developed a new Thompson sampling algorithm designed to optimize policy impact in dynamic networks where interference occurs. This algorithm addresses the scalability limitations of existing methods, wh…
-
New AI framework 'Bayesian Reflex' unifies online learning with autonomic nervous system analogy
A new paper introduces the "Bayesian reflex" as a framework for online learning in AI, drawing an analogy to the autonomic nervous system. This approach uses probabilistic representations, Bayes' theorem for sequential …
-
Thompson Sampling for Bayesian Optimization with Preferential Feedback Analyzed
Researchers have developed a new Thompson Sampling approach for Bayesian optimization that utilizes preferential feedback, such as pairwise comparisons, instead of scalar scores. This method models comparisons through a…
-
Eugene Yan recaps RecSys conferences, highlighting AI advancements in recommendation systems
Eugene Yan's RecSys 2022 recap highlights a significant increase in industry submissions and a focus on algorithmic advancements and real-world applications. Key papers explored efficient training for sequential recomme…