Pivotal role of reinforcement learning in modern large language models

Kempner Institute researchers are working at the cutting edge of reinforcement learning (RL), across the full scientific spectrum from theory to practical applications. RL is a key tool that scientists use to train large language models (LLMs) like those that power ChatGPT.
The explosion of modern AI, exemplified by the unprecedented abilities of LLMs, was enabled by a family of computational techniques known as machine learning (ML). But how exactly does a machine learn anything? In this explainer, we dive into one of the most important tools in the ML toolbox: reinforcement learning.
The prehistory of RL: B.F. Skinner and operant conditioning
In the middle decades of the 20th century, B.F. Skinner was one of the most influential experimental psychologists in the world. A Harvard professor, Skinner is widely considered the father of operant conditioning, a technique for training an animal or a human to produce a specific behavior using rewards or punishments. It’s the same approach a dog owner might use to train a dog by rewarding certain behaviors with treats.
While still a graduate student, Skinner invented the operant conditioning chamber, or “Skinner box,” a highly controllable setting in which a lab animal’s behavior can be observed and manipulated. A rat in a Skinner box might be given a treat whenever it happens to press a lever after a light is turned on, and it gradually learns to press the lever as soon as the light comes on. Alternatively, the rat might be punished for pressing a lever, and it gradually learns to avoid that action. In both cases, the rat learns what to do, and what not to do, by trial and error: its voluntary behavior is shaped by the consequences of its past actions.
A rat running around in a cage tapping on levers might seem a world away from a large language model running on a state-of-the-art supercomputer like the Kempner AI cluster, but there’s an important thread linking Skinner’s work with the ongoing AI renaissance.