Date
Tuesday, 21 May 2024, 12:00 PM to 1:30 PM
Location
GZ 0.05
Co-organizer
Mechanical Engineering
Price
Free
Nonlinear policy optimization in deep reinforcement learning: policy gradients for wide neural networks
Andrea Agazzi, Assistant Professor in the Mathematics Department at the University of Pisa, is a guest of Mauro Salazar, Assistant Professor in the Control Systems Technology group of the Department of Mechanical Engineering, Eindhoven University of Technology.
Title | Nonlinear policy optimization in deep reinforcement learning: policy gradients for wide neural networks
In recent years, we have witnessed multiple groundbreaking results obtained using neural networks as flexible (nonlinear) parameterizations of large policy classes to solve difficult reinforcement learning tasks, e.g., AlphaGo, Dota 2, and self-driving cars. However, despite these successes, there remains a notable gap in the theoretical explanation of why neural networks trained with (deep) reinforcement learning algorithms are so effective. In this presentation, I will first give a brief overview of the policy optimization problem in reinforcement learning, along with an introduction to the policy gradient algorithm, a prototypical solution approach. Then, I will discuss some limitations of this algorithm when paired with general nonlinear policy classes. Finally, I will discuss how these limitations are bypassed by wide neural networks under an appropriate scaling of parameters at initialization, resulting in the convergence of the policy gradient training dynamics towards a so-called “mean-field” limit. In particular, in this setting one can prove global optimality of the dynamics' fixed points despite the nonlinear and nonconvex characteristics of the risk function.
Program
12:00 - 12:45 Lecture in Gemini South 0.05 (doors open at 11:45)
12:45 - 13:00 Q&A
13:00 Pizza lunch
Andrea Agazzi
Andrea Agazzi, Assistant Professor in the Mathematics Department at the University of Pisa, received his PhD in Theoretical Physics from the University of Geneva and was then hired as a Griffith Research Assistant Professor at Duke University. Before that, he obtained his BSc degree in physics at ETH Zurich and his MSc in theoretical physics at Imperial College London. His main research focus is applied probability theory, using techniques from statistical mechanics and stochastic analysis to gain insight into the (stochastic) behavior of complex dynamical models emerging in real-world applications. For example, he has worked on scaling limits of machine learning models seen as interacting particle systems; on the behavior of large networks of chemical reactions, focusing on the relations between their stochastic dynamics and their structure; and on stochastic approximations of complex fluid models.
Eindhoven Artificial Intelligence Systems Institute
Artificial Intelligence