This repository implements the simulations that showcase the cost of smoothness misspecification in the non-parametric contextual bandits setting. We compare the performance of three policies: the policy that adapts to smoothness, the policy proposed by Perchet and Rigollet (2013) initialized with the correct smoothness parameter, and the same policy initialized with a misspecified smoothness parameter.
Yonatan Gur, Ahmadreza Momeni, and Stefan Wager. Smoothness-Adaptive Contextual Bandits. 2020. [arxiv]