Reinforcement Learning

Q Learning

  Nowadays one of the hottest topics in both science and engineering is Artificial Intelligence (AI). But the term itself is very broad, and it is easy to get confused by the terminology used in modern technology. When people talk about AI, they are usually thinking of Machine Learning (ML). Machine Learning, however, is just a subset of modern Artificial Intelligence that lets computers learn and act without explicit instructions from humans. Machine Learning itself has further subsets, and each of these families of algorithms gives computers a different way of learning to solve different types of engineering problems. Below, in the figure, you can see a well-known taxonomy of ML:

Machine Learning Taxonomy

  As mentioned before, all of these categories of algorithms are designed to look at problems from different points of view and solve them. In this blog I will talk about Q Learning, which falls under Reinforcement Learning (RL). This post will help beginners build a basic intuition for the algorithm, and I will share my framework written in Java.

  Q Learning was first developed by Chris Watkins in 1989, and its convergence was proved by Watkins and Dayan in 1992. Two common variants of the algorithm are also available: Deep Q Learning and Double Q Learning.

  Q Learning needs three components to work, shown in the figures below:

Q-Learning Reward Table
Q-Learning QValues Table
Q-Learning Bellman Equation

  These three components are called the Reward table, the Q Values table, and the Bellman equation (named after Richard E. Bellman), respectively. The Reward table can be provided by us or generated programmatically, depending on the problem itself. With this matrix we reward or penalize our agent for its actions. The Q Values matrix can be thought of as the agent's experience. In the beginning the agent knows nothing about the environment, so all elements of the Q Values table are zero. As the agent explores the environment and gains experience, it learns and updates the Q Values table. Each update is calculated mathematically with the Bellman equation.
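  To make the update concrete, here is a minimal sketch in Java of a single Q-value update following the standard rule Q(s,a) ← Q(s,a) + α(r + γ·max Q(s',a') − Q(s,a)). It is illustrative only and not the framework's actual code; the array layout (q[state][action], rewards[state][action]) and parameter names are my assumptions:

```java
// Minimal sketch of a single Q-value update (illustrative only, not the
// framework's actual code). alpha is the learning rate, gamma the discount
// factor; rewards[s][a] is the Reward table, q[s][a] the Q Values table.
public class QUpdate {

    /** Applies the Bellman update after taking action a in state s and landing in nextState. */
    static void update(double[][] q, double[][] rewards,
                       int s, int a, int nextState,
                       double alpha, double gamma) {
        // Best Q value reachable from the next state: max_a' Q(s', a')
        double maxNextQ = Double.NEGATIVE_INFINITY;
        for (double value : q[nextState]) {
            maxNextQ = Math.max(maxNextQ, value);
        }
        // Q(s,a) <- Q(s,a) + alpha * (r + gamma * maxNextQ - Q(s,a))
        q[s][a] = q[s][a] + alpha * (rewards[s][a] + gamma * maxNextQ - q[s][a]);
    }
}
```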

  At the end of the learning session (in other words, after the iterations) the agent's experience (the Q Values table) can be saved for later use. As mentioned before, the source code of this algorithm can be found in the Appendices section. It can be used as a framework, and an example demo has been included to show developers how they can apply the algorithm in their own applications.
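  For completeness, here is a hedged sketch of how a training loop and the save step might look end to end. The class name, file name, and the grid-style assumption that choosing action i moves the agent to state i (as in the classic room-navigation demo) are all hypothetical, not taken from the framework in the Appendices:

```java
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.util.Random;

// Hypothetical training loop and save step; names and environment dynamics
// are illustrative only.
public class TrainAndSave {

    public static void main(String[] args) throws Exception {
        int numStates = 6, numActions = 6;
        double alpha = 0.1, gamma = 0.9, epsilon = 0.2;
        double[][] rewards = new double[numStates][numActions]; // filled in per problem
        double[][] q = new double[numStates][numActions];       // all zeros at the start
        Random rng = new Random();

        for (int episode = 0; episode < 1000; episode++) {
            int state = rng.nextInt(numStates);
            for (int step = 0; step < 100; step++) {
                // Epsilon-greedy action selection: explore sometimes, exploit otherwise
                int action = rng.nextDouble() < epsilon
                        ? rng.nextInt(numActions)
                        : argMax(q[state]);
                int nextState = action; // assumption: choosing action i moves the agent to state i

                // Bellman update, same rule as the sketch above
                double maxNextQ = q[nextState][argMax(q[nextState])];
                q[state][action] += alpha * (rewards[state][action] + gamma * maxNextQ - q[state][action]);

                state = nextState;
            }
        }

        // Persist the learned Q Values table (the agent's experience) for later use
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("qtable.ser"))) {
            out.writeObject(q);
        }
    }

    // Index of the largest value in the array
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) best = i;
        }
        return best;
    }
}
```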

Appendices

• Source code
