Kathryn Merrick

Biography

Kathryn Merrick is an associate professor in computer science and deputy head of school (teaching) in the School of Engineering and IT at the University of New South Wales, Canberra. Her research interests lie in the field of autonomous mental development for machines, including machine learning. Her speciality within this domain is the development, use and evaluation of computational models of motivation. She has contributed to the theoretical development of computational motivation in reinforcement learning, particle swarm optimisation and game-theoretic settings. Applications of her research include the control of believable digital characters in online games, intelligent sensed environments, and developmental robots. Her research has attracted $1.5 million in funding from sources such as the Australian Research Council and the Defence Science and Technology Group. She has over 90 publications, including two books.

Abstract

Intrinsically Motivated, Multi-objective Reinforcement Learning

Many real-world decision-making problems involve multiple conflicting objectives that cannot be optimized simultaneously without a compromise. Such problems are known as multi-objective Markov decision processes. Multi-objective reinforcement learning methods address this challenge by finding an optimal coverage set of non-dominated policies that can satisfy any user's preference about the trade-offs required to solve the problem. However, this comes at the cost of computational complexity, long run times and poor adaptability to non-stationary environment dynamics. This talk will consider the interplay between intrinsic motivation and multi-objective reinforcement learning. It will first introduce several architectures that can achieve intrinsically motivated learning in multi-objective settings. It will then introduce a specific, novel developmental method that uses adversarial self-play between an intrinsically motivated preference exploration component and a policy coverage set optimization component that robustly evolves a convex coverage set of policies. Results show the effectiveness of the proposed method in comparison to state-of-the-art multi-objective reinforcement learning methods in both stationary and non-stationary environments.
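
To make the idea of a convex coverage set concrete, the sketch below approximates one for a toy two-objective problem by sweeping linear scalarisation weights over a few hand-picked policy value vectors. This is a minimal illustration under assumed inputs, not the method presented in the talk; in practice the value vectors would be estimated by reinforcement learning rather than hard-coded, and the policy values used here are hypothetical.

```python
import numpy as np

# Hypothetical per-objective expected returns for five candidate policies
# (rows: policies, columns: objectives). Illustrative numbers only.
policy_values = np.array([
    [1.0, 0.0],   # best on objective 0 only
    [0.0, 1.0],   # best on objective 1 only
    [0.7, 0.7],   # a strong compromise
    [0.4, 0.4],   # dominated by the compromise above
    [0.9, 0.2],   # Pareto-optimal, but optimal for no linear weighting
])

def approximate_ccs(values: np.ndarray, n_weights: int = 1001) -> set:
    """Approximate the convex coverage set (CCS) for a two-objective
    problem: sweep linear scalarisation weights w = (w0, 1 - w0) and
    record every policy that maximises the scalarised value w . v
    for at least one weight."""
    ccs = set()
    for w0 in np.linspace(0.0, 1.0, n_weights):
        w = np.array([w0, 1.0 - w0])
        ccs.add(int(np.argmax(values @ w)))
    return ccs

if __name__ == "__main__":
    for i in sorted(approximate_ccs(policy_values)):
        print(f"policy {i} is in the CCS with value vector {policy_values[i]}")
```

Running this selects only policies 0, 1 and 2: the dominated policy never wins, and the last policy, while Pareto-optimal, is never the maximiser for any linear weighting of the objectives. This illustrates why a convex coverage set is generally a subset of the full Pareto front.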
