[Image: dolphin at dock. Source: imms.org]

 

Today, I will try to introduce a framework for intelligence (both artificial and biological). This framework, studied in both computer science and behavioral psychology, is known as reinforcement learning. What does it entail?

Let’s think about how we would train a dolphin to collect litter from a pool and bring it back to us, as done at the Institute for Marine Mammal Studies in Mississippi [1]. We’ll call this dolphin an ‘agent’, and everything around it the ‘environment’. The environment includes the pool, the litter, us (the trainers), and basically everything that isn’t the dolphin.

What would be a good way to train these dolphins? Well, what do dolphins like? Fish! That’s right, we can reward the dolphins with fish whenever they bring back litter. Fish, in our case, is what would be called a ‘positive reward signal’ in the field of reinforcement learning. Note that the reward signal can be negative as well. But in our case, simply not giving the dolphins any fish when they don’t bring back litter would suffice, because they would then have nothing to eat. Seems simple enough, right?

At a high level, this is what the reinforcement learning framework entails. There is an agent in an environment, and the agent takes actions (like collecting litter from the pool) that change the state of the environment. The agent then receives a reward (of varying magnitude, positive or negative) and observes the change in the state of the environment. Observing the reward and the new state lets the agent reinforce the actions that led to good outcomes, which eventually leads to learning an optimal behavior, or policy. This feedback loop is shown in the figure below (taken from [2]).

[Figure: the agent-environment feedback loop, taken from [2]]
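To make the loop concrete, here is a minimal Python sketch of it. Everything in it is an assumption made for illustration: the `Pool` class, the two action names, the one-fish reward, and the exploration rate are all invented. To keep it short, the agent ignores the state and simply keeps a running average of the reward each action has earned, which is a much cruder update than TD-learning.

```python
import random

ACTIONS = ["fetch_litter", "swim_around"]  # hypothetical action names


class Pool:
    """Toy environment: the state is how many pieces of litter remain."""

    def __init__(self, litter):
        self.litter = litter

    def step(self, action):
        """Apply the agent's action and return (new_state, reward)."""
        if action == "fetch_litter" and self.litter > 0:
            self.litter -= 1
            return self.litter, 1.0  # one fish for a piece of litter
        return self.litter, 0.0  # no fish otherwise


def train(steps=200, epsilon=0.1, seed=0):
    """Run the agent-environment loop and return the learned action values."""
    rng = random.Random(seed)
    env = Pool(litter=steps)  # enough litter that the pool never runs out
    value = {a: 0.0 for a in ACTIONS}  # average reward seen per action
    count = {a: 0 for a in ACTIONS}
    for _ in range(steps):
        # Mostly pick the best-looking action, but explore now and then.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: value[a])
        _state, reward = env.step(action)
        # Reinforce: move the action's estimate toward the observed reward.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return value


if __name__ == "__main__":
    print(train())
```

After training, the estimate for `fetch_litter` is higher than the one for `swim_around`, so the greedy choice is to fetch litter. That is the toy version of the dolphin learning what earns fish.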

There is more to how the reinforcement is actually done, a concept called TD-learning [3], but we will not cover that in this post. The eventual policy that the agent learns, the mapping from state to action, depends a lot on how we design the reward logic! One dolphin, Kelly, figured out a way to game this system:

“One day, when a gull flew into her pool, she grabbed it, waited for the trainers and then gave it to them. It was a large bird and so the trainers gave her lots of fish. This seemed to give Kelly a new idea. The next time she was fed, instead of eating the last fish, she took it to the bottom of the pool and hid it under the rock where she had been hiding the paper. When no trainers were present, she brought the fish to the surface and used it to lure the gulls, which she would catch to get even more fish. After mastering this lucrative strategy, she taught her calf, who taught other calves, and so gull-baiting has become a hot game among the dolphins.” [1]
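One way to read Kelly's strategy is that she maximized the reward rule the trainers actually used, not the one they had in mind. The sketch below is only an illustration: the two reward functions and the fish counts are invented, and `best_item` is a hypothetical helper that picks whichever item pays the most fish.

```python
# Hypothetical reward rules; the fish counts are made up for illustration.

def fish_by_size(item):
    """Bigger items earn more fish, whatever they are."""
    return {"paper": 1, "gull": 5}.get(item, 0)


def fish_for_litter_only(item):
    """Only actual litter earns fish."""
    return 1 if item == "paper" else 0


def best_item(reward_fn, items=("paper", "gull")):
    """Return the item a reward-maximizing agent would choose to bring back."""
    return max(items, key=reward_fn)


if __name__ == "__main__":
    print(best_item(fish_by_size))
    print(best_item(fish_for_litter_only))
```

Under the size-based rule the best item to bring back is the gull, while under the litter-only rule it is the paper. The agent is the same in both cases; only the reward logic changed.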

— Sims

[1] https://www.theguardian.com/science/2003/jul/03/research.science

[2] http://blogs.cornell.edu/ml4ics/2011/05/09/approach-to-the-problem-irl/

[3] https://en.wikipedia.org/wiki/Temporal_difference_learning
