0.5.5

@usaito usaito released this 16 Jun 07:36
· 20 commits to master since this release
456a1ea

Updates

  • Add some advanced off-policy gradient estimators (#167)
  • Automatic candidate hyperparameter sorting for slope (#168)
  • Fixing the error checking for "p_e_a" in obp.ope.OffPolicyEvaluation (#169)
  • Fixing the expected reward factual in the independent reward structure (#170)
  • Allowing slope to use the true marginal importance weight for mips (#172)

References

  • Yuta Saito and Thorsten Joachims. "Off-Policy Evaluation for Large Action Spaces via Embeddings." 2022.
  • Thorsten Joachims, Adith Swaminathan, and Maarten de Rijke. "Deep Learning with Logged Bandit Feedback." 2018.
  • Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, and Miroslav Dudik. "Doubly Robust Off-Policy Evaluation with Shrinkage." 2020.
  • Alberto Maria Metelli, Alessio Russo, and Marcello Restelli. "Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning." 2021.