ECE Departmental Seminar

Beyond the Cumulative Return in Reinforcement Learning

Dr. Alec Koppel
U.S. Army Research Laboratory

Friday, 9/17/21, 1:00pm
Online 
To obtain the Zoom link for this seminar, please click here to register.

Abstract: Reinforcement Learning (RL) is a form of stochastic adaptive control in which one seeks to estimate the parameters of a controller from data alone, and it has gained popularity in recent years. However, the technological successes of RL are hindered by the high variance and irreproducibility that its training exhibits in practice. Motivated by this gap, we'll present recent efforts to solidify the theoretical understanding of how risk-sensitivity, incorporating prior information, and prioritizing exploration may be subsumed into a "general utility," defined as any concave function of the long-term state-action occupancy measure of an MDP. We present two methodologies for RL with general utilities. The first, for the tabular setting, extends the classical linear programming formulation of dynamic programming to general utilities. We develop a solution methodology based upon a stochastic variant of the primal-dual method and derive its polynomial rate of convergence to a primal-dual optimal pair. Experiments demonstrate that the proposed approach yields a rigorous way to incorporate risk-sensitivity into RL. Second, we study scalable solutions for general utilities by searching over parameterized families of policies. To do so, we put forth the Variational Policy Gradient Theorem, upon which we build the Variational Policy Gradient (VPG) method. VPG constructs a "shadow reward" that plays the role of the usual reward in policy gradient (PG) methods for constructing search directions in parameter space. We establish the convergence rate of this technique to global optimality, exploiting a bijection between occupancy measures and parameterized policies. Experimentally, we observe that VPG provides an effective framework for solving constrained MDPs and exploration problems on benchmarks in OpenAI Gym.
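To make the occupancy-measure viewpoint concrete, here is a minimal sketch of the linear programming formulation the abstract alludes to, with the usual linear reward objective swapped for a concave "general utility" (entropy, as a stand-in for exploration). The 3-state MDP is randomly generated, and an off-the-shelf convex solver (cvxpy) stands in for the talk's stochastic primal-dual method; both are illustrative assumptions, not the speaker's implementation.

```python
# Sketch: occupancy-measure ("dual LP") view of RL with a general utility.
# Standard RL maximizes the linear objective <lambda, r>; here we maximize a
# concave utility of the occupancy measure instead (entropy, for exploration).
import numpy as np
import cvxpy as cp

n_s, n_a, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a, s'] transition kernel
rho = np.ones(n_s) / n_s                          # initial state distribution

lam = cp.Variable((n_s, n_a), nonneg=True)        # state-action occupancy measure
# Bellman flow constraints: sum_a lam(s,a) = (1-gamma)*rho(s) + gamma*inflow(s)
inflow = cp.hstack([cp.sum(cp.multiply(P[:, :, s], lam)) for s in range(n_s)])
constraints = [cp.sum(lam, axis=1) == (1 - gamma) * rho + gamma * inflow]

# General utility: any concave f(lambda); entropy is one simple choice.
problem = cp.Problem(cp.Maximize(cp.sum(cp.entr(lam))), constraints)
problem.solve()

# Recover a policy from the optimal occupancy measure: pi(a|s) is proportional
# to lam(s, a).
pi = lam.value / lam.value.sum(axis=1, keepdims=True)
print(np.round(pi, 3))
```

Replacing the entropy objective with, say, a variance-penalized reward would give the risk-sensitive variant the abstract mentions, while keeping the same flow constraints.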
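Similarly, a rough sketch of the "shadow reward" idea behind VPG: by the chain rule, the gradient of a general utility f(lambda(theta)) equals the ordinary policy gradient computed with the reward held fixed at the gradient of f evaluated at the current occupancy measure. The softmax-tabular policy, entropy utility, and finite-difference gradient below are simplifying assumptions for illustration, not the method presented in the talk.

```python
# Sketch: "shadow reward" policy search for a general utility (entropy).
import numpy as np

n_s, n_a, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a, s'] transition kernel
rho = np.ones(n_s) / n_s                          # initial state distribution

def occupancy(theta):
    """Exact discounted state-action occupancy lambda(s,a) of a softmax policy."""
    pi = np.exp(theta) / np.exp(theta).sum(axis=1, keepdims=True)
    P_pi = np.einsum('sa,sat->st', pi, P)         # induced state-to-state kernel
    d = (1 - gamma) * np.linalg.solve(np.eye(n_s) - gamma * P_pi.T, rho)
    return d[:, None] * pi

def vpg_step(theta, lr=0.5, eps=1e-4):
    lam = occupancy(theta)
    shadow_r = -np.log(lam) - 1.0                 # grad of the entropy utility
    # With shadow_r held fixed, grad_theta f(lam(theta)) equals the ordinary
    # policy gradient with reward shadow_r; a PG method would estimate it from
    # rollouts, but finite differences keep the sketch short.
    base = (lam * shadow_r).sum()
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        t = theta.copy()
        t[idx] += eps
        grad[idx] = ((occupancy(t) * shadow_r).sum() - base) / eps
    return theta + lr * grad

theta = np.zeros((n_s, n_a))
for _ in range(50):
    theta = vpg_step(theta)
print(np.round(occupancy(theta), 3))  # occupancy spreads toward maximum entropy
```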

Bio: Alec Koppel has been a Research Scientist at the U.S. Army Research Laboratory in the Computational and Information Sciences Directorate since September 2017. He completed his Master's degree in Statistics and his Doctorate in Electrical and Systems Engineering, both at the University of Pennsylvania (Penn), in August 2017. Before coming to Penn, he completed his Master's degree in Systems Science and Mathematics and his Bachelor's degree in Mathematics, both at Washington University in St. Louis (WashU), Missouri. He is a recipient of the 2016 Penn ESE Department Award for Exceptional Service, an awardee of the Science, Mathematics, and Research for Transformation (SMART) Scholarship, a co-author of a Best Paper Finalist at the 2017 IEEE Asilomar Conference on Signals, Systems, and Computers, a finalist for the 2019 ARL Honorable Scientist Award, an awardee of the 2020 ARL Director's Research Award Translational Research Challenge (DIRA-TRC), a recipient of a 2020 Honorable Mention from IEEE Robotics and Automation Letters, and mentor to the 2021 ARL Summer Symposium Best Project Awardee. His research interests are in optimization and machine learning. Currently, he focuses on approximate Bayesian inference, reinforcement learning, and decentralized optimization, with an emphasis on applications in robotics and autonomy.