Supplementary Materials

This website hosts the supplementary materials for the following paper:

Ke Li, Han Guo, “Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning”, submitted for peer review, 2024.

It consists of the following parts:

  • The source code of this paper can be found in our GitHub repository.
  • Video clips of the preferred policies obtained by our proposed PBMORL.
  • Supplementary figures, in addition to the experimental results reported in the manuscript, can be found in this PDF.

Ant-v2 and HalfCheetah-v2: left is $f_1$ preferred and right is $f_2$ preferred.

For Ant-v2, $f_1$ and $f_2$ are the speeds along the x and y axes, respectively.
For HalfCheetah-v2, $f_1$ and $f_2$ are the forward speed and energy consumption, respectively.

Swimmer-v2 and Walker-v2: left is $f_1$ preferred and right is $f_2$ preferred.

For Swimmer-v2 and Walker-v2, $f_1$ and $f_2$ are the forward speed and energy consumption, respectively.


Humanoid-v2 and Hopper-v2: left is $f_1$ preferred and right is $f_2$ preferred.

For Humanoid-v2, $f_1$ and $f_2$ are the forward speed and energy consumption, respectively.
For Hopper-v2, $f_1$ and $f_2$ are the forward speed and jumping height, respectively.


Hopper-v3: from left to right, $f_1$, $f_2$, and $f_3$ preferred, respectively.

For Hopper-v3, $f_1$ to $f_3$ are the forward speed, jumping height, and energy consumption, respectively.
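
To make the notion of an "$f_i$ preferred" policy in the captions above concrete, the minimal sketch below ranks candidate policies by a linear scalarization of their per-objective returns. This is purely illustrative and not the algorithm used in the paper; the weights, return values, and the helper `pick_preferred` are hypothetical.

```python
# Illustrative only: ranks candidate policies by a linear scalarization of
# their objective returns. Objective order follows Hopper-v3 above:
# [forward speed, jumping height, energy consumption]. All numbers, weights,
# and names here are hypothetical, not taken from the paper.
import numpy as np

def pick_preferred(returns, weights):
    """Return the index of the policy whose weighted return is largest."""
    scores = np.asarray(returns) @ np.asarray(weights)
    return int(np.argmax(scores))

# Made-up average per-objective returns of three candidate policies
# (energy consumption is written as a negative value, i.e. a cost).
candidate_returns = [
    [3.1, 0.4, -1.0],   # fast runner
    [1.2, 1.8, -1.1],   # high jumper
    [1.0, 0.5, -0.3],   # energy saver
]

# A user who mostly cares about forward speed (an "f_1 preferred" user).
f1_preferred_weights = [0.8, 0.1, 0.1]
print(pick_preferred(candidate_returns, f1_preferred_weights))  # -> 0
```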

Please cite the paper using the following BibTeX entry.

@article{LiG24,
    author    = {Ke Li and
                 Han Guo},
    title     = {Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning},
    journal   = {ArXiv},
    pages     = {1--15},
    year      = {2024},
    note      = {under review}
}