Lighting the Pathways of Reward and Punishment

Photons for Fido

“Positive reinforcement results in lasting behavioral modification, whereas punishment changes behavior only temporarily and presents many detrimental side effects.” B.F. Skinner (1970)

Balanced and dominance trainers tend to present reward and punishment as opposite sides of the same coin, as if choosing one over the other were a matter of taste. This simplistic picture ignores the basic fact that reinforcement and punishment are functionally, anatomically and cytologically different. This study takes advantage of the relatively new technology of optogenetics, which permits cell-type-specific targeting and control of cell activation with light.

[N.B. The safe money is that the pioneers of optogenetics will get a Nobel Prize in 10-15 years]

Scientific American, 2012 Nov.


We’ve known that reinforcement learning (here, reinforcement covers both punishment and reward) involves dopamine and changes in striatal synaptic activity. Striatal connectivity is divided into two pathways that ultimately have opposing actions on the thalamus: the direct pathway increases thalamic activity (by disinhibition) and the indirect pathway inhibits it. Because of previous experiments, these pathways have also been described as the Go/NoGo pathways. But are they more than that?

The authors wanted to test the hypothesis that direct pathway medium spiny neurons (dMSNs) mediate reinforcement and indirect pathway medium spiny neurons (iMSNs) mediate punishment.

To test the hypothesis they created two distinct test populations. One group expressed channelrhodopsin only in D1 MSNs (direct pathway) and the second only in D2 MSNs (indirect pathway), labeled dMSN-ChR2 and iMSN-ChR2 respectively. Since dopamine can be excitatory or inhibitory, optogenetics offers the added benefit of exciting the neurons directly, thereby taking dopamine out of the equation.

Optogenetics - operant box

doi:10.1038/nn.3100

The authors began with a familiar procedure: the mouse is placed in an operant box with an active and an inactive plate, and contact with the active plate triggers the optrode, initiating MSN activity. Think of it as optogenetic freeshaping. They found that activation of the direct pathway (dMSN) reinforces contact, while activation of the indirect pathway (iMSN) reduces the probability of contact.

Because reinforcement increases the probability of repeating a behavior and punishment decreases it, we can say that activation of the dMSN pathway is reinforcing and the iMSN pathway mediates punishment.
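To make that definition concrete, here is a minimal toy sketch in Python. It is not the authors' model or analysis; the function names, the learning rate and the starting probability are all assumptions chosen for illustration. It only shows the operational definition: a reinforcing outcome pushes the probability of touching the active plate up, a punishing outcome pushes it down.

```python
def update_preference(p_active, outcome, learning_rate=0.1):
    """Nudge the probability of touching the active plate.

    outcome > 0 stands in for a reinforcing consequence (dMSN-like),
    anything else for a punishing one (iMSN-like).
    learning_rate is a made-up illustrative value, not from the paper.
    """
    target = 1.0 if outcome > 0 else 0.0
    return p_active + learning_rate * (target - p_active)


def simulate(outcome, n_contacts=20, p_start=0.5):
    """Apply the same outcome after each of n_contacts contacts."""
    p = p_start
    history = [p]
    for _ in range(n_contacts):
        p = update_preference(p, outcome)
        history.append(p)
    return history


if __name__ == "__main__":
    print(f"reinforced: {simulate(outcome=+1)[-1]:.2f}")
    print(f"punished:   {simulate(outcome=-1)[-1]:.2f}")
```

Note that this toy rule is deliberately symmetric, so it says nothing about how long each effect lasts. That asymmetry is exactly what the experiments go on to measure.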


doi:10.1038/nn.3100

Mice were trained for 3 days, with each session lasting 30 minutes. Over the whole session, (a) shows that activation of the dMSN pathway (blue) produces a stronger response than activation of the iMSN pathway (red). Panel (b) shows the first 2 minutes of each session and suggests that by the second day the dMSN mice had formed a positive association (a learned behavior) that persisted into the third day. The iMSN mice responded more weakly, showing minimal discrimination on day 2 and none during the first 2 minutes of day 3, which points to the transient nature of behavior acquired through the punishment-mediating indirect pathway.
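One simple way to put a number on "discrimination" between the two plates is a preference index. The sketch below is an assumption for illustration only; the paper may score preference differently, and the function name is hypothetical.

```python
def preference_index(active_contacts, inactive_contacts):
    """Return a score from -1 (only inactive plate) to +1 (only active plate).

    0 means no discrimination between the plates. This particular
    index is an illustrative choice, not necessarily the paper's measure.
    """
    total = active_contacts + inactive_contacts
    if total == 0:
        return 0.0  # no contacts at all: treat as no preference
    return (active_contacts - inactive_contacts) / total
```

On this scale a reinforced mouse would drift toward +1 across sessions, a punished mouse toward -1, and a mouse that has stopped discriminating would sit near 0.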

doi:10.1038/nn.3100


Both populations were subjected to an extinction test, which consisted of 30 minutes of retraining immediately followed by 30 minutes of extinction. During extinction, neither plate triggers stimulation.

The dMSN mice showed a persistent bias toward the trigger plate even in the absence of reinforcement, while the iMSN mice quickly lost all behavioral preference.


doi:10.1038/nn.3100

The use of electronic monitoring allowed the researchers to answer a question every trainer wants answered: if I punish or reward, how does that affect the chances that the behavior will be avoided or repeated?

For the first 15 seconds both groups show distinct biases compared to the control mice (YFP), but over longer periods the difference rapidly diminished for the iMSN mice and was no longer significant beyond 15 seconds. Punishment fades and reinforcement sticks. The transitory effect of punishment couldn’t be more obvious.


doi:10.1038/nn.3100

Finally, to make sure the effects of reinforcement/punishment weren’t specific to learning an operant task, the mice were also tested for place preference. In a place preference test an imaginary line divides the cage in half and only one side of the cage activates illumination.
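The place preference setup described above can be sketched in a few lines of Python. This is only an illustration under assumed conventions: positions are one-dimensional x coordinates, the names `laser_on` and `fraction_on_active_side` are hypothetical, and nothing here comes from the authors' tracking software.

```python
def laser_on(x, midline, active_side="left"):
    """True when the mouse is on the half of the cage that triggers light."""
    if active_side == "left":
        return x < midline
    return x >= midline


def fraction_on_active_side(xs, midline, active_side="left"):
    """Fraction of tracked position samples spent on the illuminated side."""
    if not xs:
        return 0.0  # no samples: report no time on the active side
    hits = sum(1 for x in xs if laser_on(x, midline, active_side))
    return hits / len(xs)
```

A fraction near 0.5 would mean no place preference, values above 0.5 a preference for the illuminated side, and values below 0.5 avoidance of it.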

The results were consistent with the operant findings: the reinforced dMSN mice learned faster, gave stronger responses, and proved far more resistant during the extinction test. As with the operant task, punished iMSN mice lost all preference during the extinction trial.

In short, reinforced (dMSN) mice learned faster, showed stronger responses, and were more resistant to extinction. The punishment-mediating pathway (iMSN) gave weak responses both between and within trials, and the behavior was prone to extinction.

If you must choose then choose wisely.

REFERENCES

Kravitz, A., Tye, L., & Kreitzer, A. (2012). Distinct roles for direct and indirect pathway striatal neurons in reinforcement. Nature Neuroscience, 15(6), 816-818. DOI: 10.1038/nn.3100


12 thoughts on “Lighting the Pathways of Reward and Punishment”

  1. I guess this means that if you inadvertently reinforce the wrong thing, e.g. with attention, it is pretty hard to get rid of.

  2. I remember learning at Uni that positive reinforcement has a much higher sticking rate; interesting to learn the neurological reasoning behind it! Need to go and make notes in my old ‘Principles of Learning and Behaviour’ book…

  3. Pingback: More science to support positive training | canibringthedog

  4. At first glance it just looks cool that someone found a way to prove what we see. We R+ operant trainers have said for decades that “P+ digs a hole, but you have to fill it quickly with R+ behaviors or the old one comes back stronger.” That might fit here, too. This is the tip of the iceberg; very interesting stuff coming.

    • It works the same with humans: quitting a bad habit (e.g. smoking) is much easier when you fill it with some other habit (exercise). Punishing creates a behavioral vacuum that will be filled by the old habits…. Idea for a blog… “Basal ganglia in habit formation”

