Chakraborty, S., Qiu, J., Bedi, A. S., Wang, M., et al. MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences. ICML, 2024.
Singh, S. P., Sutton, R. S. Reinforcement learning with replacing eligibility traces. Machine Learning, 1996.