
Supplementary Materials

This website hosts the supplementary materials for the following paper:

Ke Li and Han Guo, “Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning”, submitted for peer review, 2024.

It consists of the following parts:

  • The source code of this paper can be found in our GitHub repo.
  • Video clips of the preferred policies obtained by our proposed PBMORL.
Ant-v2 and HalfCheetah-v2: left is \(f_1\) preferred and right is \(f_2\) preferred.

For Ant-v2, \(f_1\) and \(f_2\) are the speeds along the x and y axes, respectively.
For HalfCheetah-v2, \(f_1\) and \(f_2\) are the forward speed and energy consumption, respectively.

Swimmer-v2 and Walker-v2: left is \(f_1\) preferred and right is \(f_2\) preferred.

For Swimmer-v2 and Walker-v2, \(f_1\) and \(f_2\) are the forward speed and energy consumption, respectively.


Humanoid-v2 and Hopper-v2: left is \(f_1\) preferred and right is \(f_2\) preferred.

For Humanoid-v2, \(f_1\) and \(f_2\) are the forward speed and energy consumption, respectively.
For Hopper-v2, \(f_1\) and \(f_2\) are the forward speed and jumping height, respectively.


Hopper-v3: left to right are \(f_1\), \(f_2\), and \(f_3\) preferred, respectively.

For Hopper-v3, \(f_1\) to \(f_3\) are the forward speed, jumping height, and energy consumption, respectively.
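
For reference, the snippet below is a minimal sketch of how such a two-objective reward vector can be exposed through a Gym wrapper for HalfCheetah-v2; it is an illustration only, not the exact reward formulation used in the paper. Here the forward speed is taken from the change in the x position per control step and the energy consumption from the squared control cost, both of which are assumptions made for this example.

    import numpy as np
    import gym


    class TwoObjectiveHalfCheetah(gym.Wrapper):
        """Return a 2-D reward vector [f1, f2] instead of the scalar reward.

        Assumption for illustration: f1 = forward speed, f2 = negated energy
        (squared control) cost, mirroring the objective descriptions above.
        """

        def step(self, action):
            # x position of the torso before and after the step (MuJoCo state)
            x_before = self.unwrapped.sim.data.qpos[0]
            obs, _, done, info = self.env.step(action)
            x_after = self.unwrapped.sim.data.qpos[0]

            f1 = (x_after - x_before) / self.unwrapped.dt  # forward speed
            f2 = -np.square(action).sum()                  # energy consumption (negated cost)
            return obs, np.array([f1, f2]), done, info


    if __name__ == "__main__":
        env = TwoObjectiveHalfCheetah(gym.make("HalfCheetah-v2"))
        obs = env.reset()
        obs, reward_vec, done, info = env.step(env.action_space.sample())
        print(reward_vec)  # two-element vector: [forward speed, -energy]

A policy that maximizes the first component corresponds to the \(f_1\)-preferred clips, and one that maximizes the second corresponds to the \(f_2\)-preferred clips.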

Please cite the paper using the following BibTeX entry.

@article{LiG24,
    author    = {Ke Li and
                 Han Guo},
    title     = {Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning},
    journal   = {ArXiv},
    pages     = {1--15},
    year      = {2024},
    note      = {under review}
}