Balanced and dominance trainers tend to present reward and punishment as opposite sides of the same coin, as if choosing one over the other were a matter of taste. This simplistic picture ignores the basic fact that reinforcement and punishment are functionally, anatomically and cytologically different. This study takes advantage of the relatively new technology of optogenetics, which permits cell-type-specific targeting and lets researchers control when those cells fire by switching a light on and off.
[N.B. The safe money is that the pioneers of optogenetics will get a Nobel Prize in 10-15 years]
We’ve known for some time that reinforcement learning (here "reinforcement" covers both reward and punishment) involves dopamine and changes in striatal synaptic activity. Striatal output is divided into two pathways that ultimately have opposing effects on the thalamus: the direct pathway increases thalamic activity and the indirect pathway suppresses it. Based on earlier experiments, these pathways have also been described as the Go (direct) and NoGo (indirect) pathways. But are they more than that?
The authors wanted to test the hypothesis that direct pathway medium spiny neurons (dMSNs) mediate reinforcement and indirect pathway medium spiny neurons (iMSNs) mediate punishment.
To test the hypothesis, they created two distinct groups of mice. One group expressed channelrhodopsin (ChR2) only in D1-receptor MSNs and the other only in D2-receptor MSNs, labeled dMSN-ChR2 and iMSN-ChR2 respectively. Because dopamine excites some striatal neurons and inhibits others, optogenetics offers the added benefit of exciting the targeted neurons directly, taking dopamine out of the equation.
The authors began with a familiar procedure: placing the mouse in an operant box containing an active and an inactive plate. Touching the active plate triggers light delivery through the implanted optical fiber, activating the targeted MSNs; in effect, this is optogenetic free-shaping. They found that activation of the direct pathway (dMSN) reinforced contact with the plate, while activation of the indirect pathway (iMSN) reduced the probability of contact.
Because reinforcement increases the probability of repeating a behavior and punishment decreases it, we can say that activation of the dMSN pathway is reinforcing and activation of the iMSN pathway is punishing.
Mice were trained for 3 days in 30-minute sessions. Over the whole session, panel (a) shows that activation of the dMSN pathway (blue) produced a stronger response than activation of the iMSN pathway (red). Panel (b), which covers only the first 2 minutes of each session, suggests that by the second day the dMSN mice had formed a positive association (learned behavior) and that the preference persisted into the third day. The iMSN mice showed weaker responses, minimal discrimination between the plates on day 2, and no discrimination at all during the first 2 minutes of day 3, illustrating the transient nature of behavior acquired through punishment via the indirect pathway.
Both groups were then given an extinction test consisting of 30 minutes of retraining immediately followed by 30 minutes of extinction. During extinction, touching either plate produced no light and therefore no activation.
The dMSN mice showed a persistent bias toward the previously active plate even in the absence of reinforcement, while the iMSN mice quickly lost all behavioral preference.
The use of electronic monitoring allowed the researchers to answer a question every trainer wants answered: if I punish or reward a behavior, how does that change the chance that the behavior will be avoided or repeated?
Within the first 15 seconds after a plate contact, both groups showed distinct biases compared with the control mice (YFP), but over longer intervals this difference rapidly diminished for the iMSN mice and was no longer significant beyond 15 seconds. Punishment fades and reinforcement sticks. The transitory effect of punishment couldn’t be more obvious.
Finally, to make sure the effects of reinforcement and punishment weren’t specific to learning an operant task, the mice were also tested for place preference. In a place preference test, an imaginary line divides the cage in half, and only being on one side of the line triggers the light.
The results were consistent with the earlier operant findings: the reinforced dMSN mice learned faster, gave stronger responses and proved far more resistant to extinction. As with the operant task, the punished iMSN mice lost all preference during the extinction trial.
In short, the reinforced (dMSN) mice learned faster, showed stronger responses and were more resistant to extinction. The punishment-mediating pathway (iMSN) produced weak responses both between and within sessions, and the behavior it shaped was prone to extinction.
If you must choose then choose wisely.
Kravitz, A., Tye, L., & Kreitzer, A. (2012). Distinct roles for direct and indirect pathway striatal neurons in reinforcement. Nature Neuroscience, 15(6), 816-818. DOI: 10.1038/nn.3100