#### Accelerating Creator Audience Building through Centralized Exploration

Buket Baran, Guilherme Dinis Junior, Antonina Danylenko, Olayinka S. Folorunso, Gösta Forsum, Maksym Lefarov, Lucas Maystre, Yu Zhao

We consider the concurrent reinforcement learning problem, where $n$ agents simultaneously learn to make decisions in the same environment by sharing experience with each other. Existing work in this emerging area has empirically demonstrated that Thompson sampling (TS) based algorithms are a particularly attractive way to induce cooperation: each agent independently samples a belief environment (and computes a corresponding optimal policy) from the joint posterior computed by aggregating all agents' data, which induces diversity in exploration among agents while letting each agent benefit from the shared experience of all agents. However, theoretical guarantees in this area remain under-explored; in particular, no regret bound is known for TS-based concurrent RL algorithms. In this paper, we fill this gap by considering two settings. In the first, we study the simple finite-horizon episodic RL setting, where TS is naturally adapted to the concurrent setup by having each agent sample from the current joint posterior at the beginning of each episode. We establish a per-agent regret bound of $\tilde{O}(HS\sqrt{AT/n})$, where $H$ is the horizon of the episode, $S$ is the number of states, $A$ is the number of actions, $T$ is the number of episodes, and $n$ is the number of agents. In the second setting, we consider the infinite-horizon RL problem, where a policy is measured by its long-run average reward. Here, despite the lack of natural episodic breakpoints, we show that a doubling-horizon schedule lets us adapt TS to the infinite-horizon concurrent learning setting and achieve a regret bound of $\tilde{O}(DS\sqrt{AT/n})$, where $D$ is the standard notion of diameter of the underlying MDP and $T$ is the number of timesteps. Note that in both settings, the per-agent regret decreases at an optimal rate of $\Theta(1/\sqrt{n})$, which demonstrates the power of cooperation in concurrent RL.
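
The episodic scheme described in the abstract (each agent samples a model from the joint posterior at the start of an episode, acts on it, and all data is pooled) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes a tabular MDP with known rewards, a Dirichlet(1, ..., 1) prior over transitions, a fixed start state 0, and the function names `sample_dirichlet`, `plan`, and `concurrent_ts` are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def plan(P, R, H):
    """Backward induction on a model (P, R); returns greedy policy pi[h][s]."""
    S, A = len(R), len(R[0])
    V = [0.0] * S
    pi = [[0] * S for _ in range(H)]
    for h in reversed(range(H)):
        new_V = [0.0] * S
        for s in range(S):
            q = [R[s][a] + sum(P[s][a][t] * V[t] for t in range(S))
                 for a in range(A)]
            best = max(range(A), key=q.__getitem__)
            pi[h][s] = best
            new_V[s] = q[best]
        V = new_V
    return pi


def concurrent_ts(true_P, R, H, n_agents, n_episodes, seed=0):
    """Concurrent Thompson sampling with a shared Dirichlet posterior (sketch)."""
    rng = random.Random(seed)
    S, A = len(R), len(R[0])
    # Shared posterior: Dirichlet pseudo-counts per (state, action). Assumed prior: all ones.
    counts = [[[1.0] * S for _ in range(A)] for _ in range(S)]
    total_reward = 0.0
    for _ in range(n_episodes):
        # Each agent draws its own model from the same joint posterior,
        # which is what diversifies exploration across agents.
        policies = []
        for _ in range(n_agents):
            P_hat = [[sample_dirichlet(counts[s][a], rng) for a in range(A)]
                     for s in range(S)]
            policies.append(plan(P_hat, R, H))
        # All agents act in the true environment during the same episode.
        batch = []
        for pi in policies:
            s = 0  # assumed fixed start state
            for h in range(H):
                a = pi[h][s]
                t = rng.choices(range(S), weights=true_P[s][a])[0]
                total_reward += R[s][a]
                batch.append((s, a, t))
                s = t
        # Aggregate every agent's transitions into the joint posterior.
        for s, a, t in batch:
            counts[s][a][t] += 1.0
    return counts, total_reward
```

Calling `concurrent_ts(true_P, R, H=4, n_agents=3, n_episodes=5)` with nested lists `true_P[s][a][s']` and `R[s][a]` returns the pooled pseudo-counts and the summed reward. The posterior is updated only between episodes, so all agents in an episode sample from the same posterior, matching the "sample at the beginning of each episode" description.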

September 2023 | RecSys
#### Accelerating Creator Audience Building through Centralized Exploration

Buket Baran, Guilherme Dinis Junior, Antonina Danylenko, Olayinka S. Folorunso, Gösta Forsum, Maksym Lefarov, Lucas Maystre, Yu Zhao

August 2023 | Interspeech
#### Lightweight and Efficient Spoken Language Identification of Long-form Audio

Winstead Zhu, Md Iftekhar Tanveer, Yang Janet Liu, Seye Ojumu, Rosie Jones

July 2023 | KDD
#### Impatient Bandits: Optimizing for the Long-Term Without Delay

Thomas McDonald, Lucas Maystre, Mounia Lalmas, Daniel Russo, Kamil Ciosek