
Supplementary Materials

This website hosts the supplementary materials for the following paper:

Ke Li and Han Guo, “Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning”, submitted for peer review, 2024.

It consists of the following parts:

  • The source code of this paper can be found in our GitHub repo.
  • Video clips of the preferred policies obtained by our proposed PBMORL.
Ant-v2 and HalfCheetah-v2: left is \(f_1\) preferred and right is \(f_2\) preferred.

For Ant-v2, \(f_1\) and \(f_2\) are the speeds along the x and y axes, respectively.
For HalfCheetah-v2, \(f_1\) and \(f_2\) are the forward speed and energy consumption, respectively.

Swimmer-v2 and Walker-v2: left is \(f_1\) preferred and right is \(f_2\) preferred.

For Swimmer-v2 and Walker-v2, \(f_1\) and \(f_2\) are the forward speed and energy consumption, respectively.


Humanoid-v2 and Hopper-v2: left is \(f_1\) preferred and right is \(f_2\) preferred.

For Humanoid-v2, \(f_1\) and \(f_2\) are the forward speed and energy consumption, respectively.
For Hopper-v2, \(f_1\) and \(f_2\) are the forward speed and jumping height, respectively.


Hopper-v3: left to right are \(f_1\), \(f_2\), and \(f_3\) preferred, respectively.

For Hopper-v3, \(f_1\) to \(f_3\) are the forward speed, jumping height, and energy consumption, respectively.
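
For reference, the snippet below is a minimal sketch of how such a two-objective reward vector can be exposed through a Gym wrapper for HalfCheetah-v2; it is an illustration only, not the exact reward formulation used in the paper. Here the forward speed is taken from the change in the x position per control step and the energy consumption from the squared control cost, both of which are assumptions made for this example.

    import numpy as np
    import gym


    class TwoObjectiveHalfCheetah(gym.Wrapper):
        """Return a 2-D reward vector [f1, f2] instead of the scalar reward.

        Assumption for illustration: f1 = forward speed, f2 = negated energy
        (squared control) cost, mirroring the objective descriptions above.
        """

        def step(self, action):
            # x position of the torso before and after the step (MuJoCo state)
            x_before = self.unwrapped.sim.data.qpos[0]
            obs, _, done, info = self.env.step(action)
            x_after = self.unwrapped.sim.data.qpos[0]

            f1 = (x_after - x_before) / self.unwrapped.dt  # forward speed
            f2 = -np.square(action).sum()                  # energy consumption (negated cost)
            return obs, np.array([f1, f2]), done, info


    if __name__ == "__main__":
        env = TwoObjectiveHalfCheetah(gym.make("HalfCheetah-v2"))
        obs = env.reset()
        obs, reward_vec, done, info = env.step(env.action_space.sample())
        print(reward_vec)  # two-element vector: [forward speed, -energy]

A policy that maximizes the first component corresponds to the \(f_1\)-preferred clips, and one that maximizes the second corresponds to the \(f_2\)-preferred clips.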

Please cite the paper using the following BibTeX entry.

@article{LiG24,
    author    = {Ke Li and
                 Han Guo},
    title     = {Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning},
    journal   = {ArXiv},
    pages     = {1--15},
    year      = {2024},
    note      = {under review}
}