Hello everyone! This is the second blog post from the Hiperdyne AI team. Recently, some of the members of the Hiperdyne AI Team participated in a competition named The Animal-AI Olympics. Here’s what we did!
Researchers at Leverhulme CFI organized The Animal-AI Olympics as a competition for mimicking the thinking process of intelligent animals with the help of AI.
We decided to participate to gain experience in building reinforcement learning systems and working with the latest AI technology.
About the Competition:
We had to mimic the thinking process of an animal using a program, which we will call the ‘agent’ throughout this article.
We were given the properties of an arena and a list of cognitive abilities that our agent would be tested on. There were 10 categories of tests. Each category checked whether the agent understood certain elements of its environment (food, obstacles, and danger areas) and whether it had developed cognitive or reasoning capabilities (choosing to avoid danger areas, choosing bigger food over smaller food, choosing a shorter route to the food, etc.).
Arena Samples:
The first image below depicts 4 sample arenas.
Each arena contains some obstacles, such as walls in various colors. The agent is shown as a blue ball, and the food target is shown as a green ball. An arena can contain multiple food targets, but because these arenas are simple, only one is shown.
There are more difficult arenas as well. In the following image, there are 9 sample arenas with higher difficulty levels.
Each arena contains multiple types of objects, such as walls, tunnels, and boxes. The red rectangular areas are restricted, and an agent is penalized if it enters them. Agents are shown as dogs and the food targets are shown as bananas.
In addition, an arena may contain predators in the form of red balls. If the agent touches one, the episode terminates. These predators can be either moving or stationary.
The Agent’s Task:
The objective of this competition was to train an agent that could differentiate between the objects in the test arena, go around obstacles, avoid restricted areas, and maximize its reward by gathering all the food targets.
Methods Tried:
We tried the following methods for solving the task.
- The Unity Machine Learning Agents Toolkit (ML-Agents):
ML-Agents is an open-source Unity plugin that enables games and simulations to serve as environments for training intelligent agents. Agents can be trained using reinforcement learning, imitation learning, neuroevolution, or other machine learning methods through a simple-to-use Python API.
The challenge we faced with this approach was finding the optimal hyperparameters for a configured arena. Training took a long time, and it was hard to keep track of the changes we made to the hyperparameters.
- CNN Model and DQN:
At first, we tried some Convolutional Neural Network (CNN) models to solve the task. Considering the complexity of the task, however, we moved on to another method called Deep Q-Network (DQN).
Q-learning is one of the simplest reinforcement learning algorithms, and DQN combines it with deep learning by using a neural network to estimate the Q-values. It is a value-based method: it maps a state to the expected reward of each action instead of mapping a state to an action directly. It is also off-policy, meaning it can learn from experience gathered by a different (for example, more exploratory) policy than the one it is learning.
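For reference, the textbook Q-learning update that DQN builds on can be written as follows. The symbols are standard notation rather than anything taken from our code: $s_t$ and $a_t$ are the state and action at step $t$, $r_{t+1}$ is the reward received, $\alpha$ is the learning rate, and $\gamma$ is the discount factor.

$$
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
$$

In DQN, the table $Q$ is replaced by a neural network $Q(s, a; \theta)$ trained to shrink the same bracketed error, commonly with a squared loss against a periodically updated copy of the network with parameters $\theta^-$:

$$
L(\theta) = \mathbb{E}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \right)^2 \right]
$$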
In our case, the DQN method consisted of the following basic steps:
- Getting information regarding the state of the agent (agent position, target position, velocity, etc.)
- Predicting the best action (move forward, backward, left, right, etc.) based on the state for the agent
- Getting information concerning the reward
- Assigning reward information to the state and action combination
- Repeat
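The steps above can be sketched in a few lines of Python. This is not our competition code: it is a minimal tabular Q-learning loop on a made-up one-dimensional corridor, where the agent starts at position 0 and the food sits at the far end. The environment, the `step` and `train` functions, and all parameter values are assumptions chosen for illustration. A DQN follows the same loop but replaces the table `q` with a neural network.

```python
import random

# Hypothetical toy environment: a 1-D corridor with food at the right end.
N_STATES = 6        # positions 0..5; the food is at position 5
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action_index):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01  # small cost per move, +1 for the food
    return next_state, reward, done


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run tabular Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done, moves = 0, False, 0  # step 1: observe the starting state
        while not done and moves < 100:
            # Step 2: predict the best action (epsilon-greedy exploration).
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
            # Step 3: act and collect the reward.
            next_state, reward, done = step(state, action)
            # Step 4: assign the reward to this (state, action) pair.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            # Step 5: repeat from the new state.
            state = next_state
            moves += 1
    return q


if __name__ == "__main__":
    table = train()
    for position, values in enumerate(table):
        print(position, [round(v, 3) for v in values])
```

After training, the value of moving right should exceed the value of moving left at every position before the food, which is the table-based equivalent of the agent learning to head for the target.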
Here are some screenshots of our agent in action (the agent’s first person view) in Unity:
You can see the agent has to deal with different types of obstacles including immovable objects such as a ramp (which it can climb), walls, a tunnel, etc., and also movable objects such as the cardboard box and predators (red balls) that need to be avoided.
In this screenshot, both the food target (green ball) and the predator (red ball) are visible in front of the agent. The agent needs to avoid the red ball and go for the green ball.
Final Score:
Over the 5 months of the competition, we submitted a total of 20 models, working on the task in our allotted spare time. The competition organizers considered the best-scoring submission for the final evaluation.
The Animal-AI organizers provided a set of 10 evaluation types. The final score is the sum of the scores for the 10 types.
They also published a leaderboard with a total of 61 teams based on the final scores.
Our team achieved the 30th position on the leaderboard!
One very promising result is that our agent placed 3rd in the Object Permanence evaluation! This evaluation tests the agent’s ability to differentiate between static and moving objects and to remember the positions of the static objects in the arena.
Final Thoughts:
By participating in this competition we were hoping to learn and implement some new things which we do not use regularly. As a result, we had the chance to learn about color segmentation of images, Reinforcement Learning, Curriculum Learning, and the Unity ML-Agents Toolkit. From this experience, we are confident that we can use these methods in our future projects.
Special thanks to Koozyt and Hiperdyne for letting us participate in such an interesting and informative competition!
Hiperdyne AI Team