Composing Efficient, Robust Tests for Policy Selection

Dustin Morrill

Thomas Walsh

Daniel Hernandez

Peter R. Wurman

Peter Stone

UAI 2023



Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, choosing which policy to actually deploy in the real world would require testing the candidates under an intractable number of environmental conditions. We introduce RPOSST, an algorithm that selects a small set of test cases from a larger pool based on a relatively small number of sample evaluations. RPOSST treats the test case selection problem as a two-player game and optimizes a solution with provable k-of-N robustness, bounding the error relative to a test that uses all the test cases in the pool. Empirical results demonstrate that RPOSST finds small sets of test cases that identify high-quality policies in a toy one-shot game, poker datasets, and a high-fidelity racing simulator.
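To make the selection problem concrete, the following is a minimal sketch of choosing a small test from a pool. It is not RPOSST: it leaves out the two-player game formulation and the k-of-N robustness objective, and instead brute-forces the subset whose per-policy average stays closest to the full-pool average. The score matrix, the function names, and the worst-case absolute error measure are all assumptions made for illustration.

```python
from itertools import combinations


def full_pool_scores(scores):
    """Mean score of each policy over every test case in the pool."""
    return [sum(row) / len(row) for row in scores]


def subset_error(scores, subset):
    """Largest absolute gap, over policies, between the mean score on
    `subset` and the mean score on the full pool."""
    full = full_pool_scores(scores)
    worst = 0.0
    for row, target in zip(scores, full):
        approx = sum(row[j] for j in subset) / len(subset)
        worst = max(worst, abs(approx - target))
    return worst


def select_tests(scores, m):
    """Exhaustively pick the m test cases with the smallest subset_error.

    Brute force is only feasible for tiny pools; it stands in here for
    the optimization described in the abstract.
    """
    n_tests = len(scores[0])
    return min(
        combinations(range(n_tests), m),
        key=lambda s: subset_error(scores, s),
    )


# Hypothetical data: rows are policies, columns are test cases.
scores = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
]
chosen = select_tests(scores, 2)
print(chosen, subset_error(scores, chosen))
```

On this toy matrix the sketch picks test cases 0 and 3, and the two-case test misjudges each policy's full-pool mean by at most 0.25. The error measure here is computed on fully observed scores, whereas the abstract describes working from a relatively small number of sample evaluations.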
