Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward

[Submitted on 17 Sep 2020 (v1), last revised 25 Oct 2020 (this version, v2)]

Abstract: We considered a novel, practical problem of online learning with
episodically revealed rewards, motivated by several real-world applications, in
which the contexts are nonstationary across episodes and reward feedback is not
always available to the decision-making agent. For this online
semi-supervised learning setting, we introduced Background Episodic Reward
LinUCB (BerlinUCB), a solution that easily incorporates clustering as a
self-supervision module to provide useful side information when rewards are not
observed. Our experiments on a variety of datasets, covering six different
scenarios in both stationary and nonstationary environments, demonstrated clear
advantages of the proposed approach over the standard contextual bandit.
Lastly, we introduced a relevant real-life example where this problem setting
is especially useful.
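
To make the setting concrete, here is a minimal sketch of a LinUCB-style agent that falls back on clustering when a reward is not revealed. It is not the authors' BerlinUCB; the abstract does not give the update rules, so everything here is an assumption chosen for illustration: the class name ClusterAssistedLinUCB, the online nearest-centroid clustering, and the choice to impute a missing reward with the average reward previously seen for that arm in the same cluster.

```python
import numpy as np


class ClusterAssistedLinUCB:
    """Hypothetical LinUCB variant with a clustering fallback (illustration only)."""

    def __init__(self, n_arms, dim, n_clusters=3, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.alpha = alpha
        # Per-arm ridge-regression statistics, as in standard LinUCB.
        self.A = np.stack([np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))
        # Online centroids plus per-cluster, per-arm reward statistics.
        self.centroids = rng.normal(size=(n_clusters, dim))
        self.cluster_counts = np.zeros(n_clusters)
        self.reward_sum = np.zeros((n_clusters, n_arms))
        self.reward_cnt = np.zeros((n_clusters, n_arms))

    def select(self, x):
        """Return the arm with the highest upper confidence bound."""
        scores = np.empty(len(self.A))
        for a in range(len(self.A)):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]
            scores[a] = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return int(np.argmax(scores))

    def _assign(self, x):
        """Move the nearest centroid toward x and return its index."""
        k = int(np.argmin(np.linalg.norm(self.centroids - x, axis=1)))
        self.cluster_counts[k] += 1
        self.centroids[k] += (x - self.centroids[k]) / self.cluster_counts[k]
        return k

    def update(self, arm, x, reward=None):
        """Learn from a revealed reward, or from a cluster-based guess."""
        k = self._assign(x)
        if reward is not None:
            self.reward_sum[k, arm] += reward
            self.reward_cnt[k, arm] += 1
        elif self.reward_cnt[k, arm] > 0:
            # Assumption: impute the missing reward from the cluster's history.
            reward = self.reward_sum[k, arm] / self.reward_cnt[k, arm]
        else:
            return  # no reward and no history for this cluster and arm
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In use, the agent calls select(x) on each context, then update(arm, x, reward) when feedback arrives and update(arm, x) when it does not. With n_clusters=1 the fallback reduces to imputing the arm's running mean reward, which is a handy sanity check.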

Submission history

From: Baihan Lin

[v1] Thu, 17 Sep 2020 20:41:02 UTC (2,150 KB)

[v2] Sun, 25 Oct 2020 03:29:56 UTC (2,149 KB)
