# Supplementary Materials
This website hosts the supplementary materials for the following paper:
Ke Li and Han Guo, “Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning”, submitted for peer review, 2024.
It consists of the following parts:
- The source code of this paper is available in our GitHub repo.
- Video clips of the preferred policies obtained by our proposed PBMORL.
- Supplementary figures, in addition to the experimental results reported in the manuscript, can be found in this PDF.

- For Ant-v2, \(f_1\) and \(f_2\) are the speeds along the x and y axes, respectively.
- For Halfcheetah-v2, \(f_1\) and \(f_2\) are the forward speed and energy consumption, respectively.
- For Swimmer-v2 and Walker-v2, \(f_1\) and \(f_2\) are the forward speed and energy consumption, respectively; the left clip shows the \(f_1\)-preferred policy and the right clip the \(f_2\)-preferred policy.
- For Humanoid-v2, \(f_1\) and \(f_2\) are the forward speed and energy consumption, respectively; for Hopper-v2, they are the forward speed and jumping height. In both cases, the left clip shows the \(f_1\)-preferred policy and the right clip the \(f_2\)-preferred policy.
- For Hopper-v3, \(f_1\) to \(f_3\) are the forward speed, jumping height, and energy consumption, respectively; from left to right, the clips show the \(f_1\)-, \(f_2\)-, and \(f_3\)-preferred policies.
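
As a concrete illustration of the two-objective setup described above (e.g. forward speed versus energy consumption for Halfcheetah-v2), the following is a minimal sketch of how a Gym environment could be wrapped to emit a reward vector \((f_1, f_2)\) instead of a scalar reward. This is not the authors' implementation: the wrapper name, the control-cost weight, and the use of the simulator's `qpos` and `dt` to estimate forward speed are assumptions made for illustration only.

```python
import gym
import numpy as np


class TwoObjectiveWrapper(gym.Wrapper):
    """Return a reward vector (f1, f2) instead of the scalar Gym reward.

    Minimal sketch only: the wrapper name, the control-cost weight, and the
    use of qpos/dt to estimate forward speed are illustrative assumptions,
    not the paper's implementation.
    """

    def __init__(self, env, ctrl_cost_weight=1.0):
        super().__init__(env)
        self._ctrl_cost_weight = ctrl_cost_weight

    def step(self, action):
        # x-position of the torso before and after the control step
        x_before = self.unwrapped.sim.data.qpos[0]
        obs, _, done, info = self.env.step(action)
        x_after = self.unwrapped.sim.data.qpos[0]

        # f1: forward speed, approximated by x-displacement per unit time
        f1 = (x_after - x_before) / self.unwrapped.dt
        # f2: (negated) energy consumption, modelled as the quadratic control cost
        f2 = -self._ctrl_cost_weight * float(np.sum(np.square(action)))

        return obs, np.array([f1, f2]), done, info


# Usage sketch (old Gym API, mujoco-py based environments):
# env = TwoObjectiveWrapper(gym.make("HalfCheetah-v2"))
```

Exposing the objectives as a vector keeps the environment itself agnostic to user preferences; preference elicitation and scalarization are then handled by the multi-objective RL algorithm.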
Please cite the paper using the following BibTeX entry.
@article{LiLY22,
  author  = {Ke Li and Han Guo},
  title   = {Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning},
  journal = {ArXiv},
  pages   = {1--15},
  year    = {2024},
  note    = {under review}
}