Abstract
This paper studies bandit algorithms under data poisoning attacks in a bounded reward setting. We consider a strong attacker model in which the attacker can observe both the selected actions and their corresponding rewards, and can contaminate the rewards with additive noise. We show that \emph{any} bandit algorithm with regret $O(\log T)$ can be forced to suffer a regret $\Omega(T)$ with an expected amount of contamination $O(\log T)$. {This amount of contamination is also necessary, as we prove that there exists an $O(\log T)$ regret bandit algorithm, specifically the classical UCB, that requires $\Omega(\log T)$ amount of contamination to suffer regret $\Omega(T)$.} To combat such poisoning attacks, our second main contribution is to propose verification based mechanisms, which use limited \emph{verification} to access a limited number of uncontaminated rewards. In particular, for the case of unlimited verifications, we show that with $O(\log T)$ expected number of verifications, a simple modified version of the Explore-then-Commit type bandit algorithm can restore the order optimal $O(\log T)$ regret \emph{irrespective of the amount of contamination} used by the attacker.
We also provide a UCB-like verification scheme, called Secure-UCB, that also enjoys full recovery from any attacks, also with $O(\log T)$ expected number of verifications. To derive a matching lower bound on the number of verifications, we also prove that for any order-optimal bandit algorithm, this number of verifications $O(\log T)$ is necessary to recover the order-optimal regret. On the other hand, when the number of verifications is bounded above by a budget $B$, we propose a novel algorithm, Secure-BARBAR, which provably achieves $\tilde{O}(\min\{C,T/\sqrt{B} \})$ regret with high probability against weak attackers (i.e., attackers who have to place the contamination \emph{before} seeing the actual pulls of the bandit algorithm), where $C$ is the total amount of contamination by the attacker, which breaks the known $\Omega(C)$ lower bound of the non-verified setting if $C$ is large.
Keywords
poster session 5 @ blue 1, poster session 10 @ blue 1, oral session 5 @ blue 1, poster session 5, poster session 10, oral session 5