A Quick Overview of Reinforcement Learning (RL)
Talk, Fudan University (internal seminar), Shanghai, China
Abstract
This seminar provides the theoretical prerequisites for understanding modern reinforcement learning alignment techniques for Large Language Models (LLMs), such as GRPO and DAPO. Rather than focusing on the heavy engineering pipelines of RLHF, this talk constructs a rigorous, uninterrupted mathematical narrative.
