Tool: Hexapawn – a machine-learning teaching tool

This simple game illustrates reinforcement learning and is a valuable teaching tool in any machine-learning course. The tool is freely accessible right here.

1 January 2020

Hexapawn is a simple deterministic two-player game. It has relatively few possible positions, which makes it a convenient model for exploring the reinforcement learning strategies used by artificial intelligences.

The Hexapawn game can be accessed by clicking this link. The game was developed to complement machine-learning courses at Aarhus University, giving master's students an interactive example of how a machine can learn and giving lecturers a teaching tool to demonstrate just that.

About the Hexapawn

The original game was devised by Martin Gardner and published in 1962 in his Mathematical Games column in Scientific American.

The Hexapawn tool is a small simulator based on the mathematical game of the same name. It uses a heuristic learning algorithm (learning by doing) to teach a computer to play the game: the computer plays repeatedly until only the winning lines of play remain.

The goal of each player is to advance one of their pawns to the opposite end of the board or to leave the other player without a legal move. Because a player who cannot move loses, the game never ends in a draw, and since it is a finite two-player perfect-information game, one of the players always has a winning strategy.

Find the instructions on how to play the game by clicking this link.

The tool is based on a stripped-down version of chess using only six pawns (hence the name) on a 3×3 board. One player is controlled by a human, the other by a computer (a series of labelled matchboxes with beads in the original version).
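The board and rules described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the tool's actual code: the type and function names (Cell, Board, Move, legalMoves, applyMove, winnerAfterMove) are hypothetical, and the movement rules are assumed to be those of the standard game, where a pawn steps one square straight ahead onto an empty square or captures one square diagonally ahead.

```typescript
// Hypothetical data model; the article does not describe the tool's real one.
type Cell = "W" | "B" | ".";
type Board = Cell[][]; // row 0 is the top of the board

interface Move {
  fromRow: number;
  fromCol: number;
  toRow: number;
  toCol: number;
}

// Assumption: White starts on the bottom row and moves up,
// Black starts on the top row and moves down.
const START: Board = [
  ["B", "B", "B"],
  [".", ".", "."],
  ["W", "W", "W"],
];

function legalMoves(board: Board, player: "W" | "B"): Move[] {
  const n = board.length;
  const dir = player === "W" ? -1 : 1;
  const enemy: Cell = player === "W" ? "B" : "W";
  const moves: Move[] = [];
  for (let r = 0; r < n; r++) {
    for (let c = 0; c < n; c++) {
      if (board[r][c] !== player) continue;
      const nr = r + dir;
      if (nr < 0 || nr >= n) continue;
      // Straight ahead: only onto an empty square.
      if (board[nr][c] === ".") {
        moves.push({ fromRow: r, fromCol: c, toRow: nr, toCol: c });
      }
      // Diagonally ahead: only to capture an enemy pawn.
      for (const dc of [-1, 1]) {
        const nc = c + dc;
        if (nc >= 0 && nc < n && board[nr][nc] === enemy) {
          moves.push({ fromRow: r, fromCol: c, toRow: nr, toCol: nc });
        }
      }
    }
  }
  return moves;
}

// Returns a new board with the move applied; the input board is not modified.
function applyMove(board: Board, m: Move): Board {
  const next = board.map((row) => row.slice());
  next[m.toRow][m.toCol] = next[m.fromRow][m.fromCol];
  next[m.fromRow][m.fromCol] = ".";
  return next;
}

// Call right after `mover` has moved. Returns the winner, or null if play continues.
function winnerAfterMove(board: Board, mover: "W" | "B"): "W" | "B" | null {
  const goalRow = mover === "W" ? 0 : board.length - 1;
  if (board[goalRow].indexOf(mover) !== -1) return mover; // reached the far side
  const opponent = mover === "W" ? "B" : "W";
  if (legalMoves(board, opponent).length === 0) return mover; // opponent is stuck
  return null;
}
```

Checking for a stuck opponent also covers the case where all of the opponent's pawns have been captured, since a player with no pawns has no legal move.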

The computer can learn in one of two ways: punishment or reward. The punishment approach removes moves that lead to a loss (i.e. when the human wins, it removes the last move made by the computer). The reward approach instead gives more weight to moves that lead to victory (i.e. when the computer wins, it adds an extra “bead” corresponding to the last move it made).
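The two update rules can be sketched as a small "matchbox" learner. This is an illustrative sketch under stated assumptions rather than the tool's implementation: the class MatchboxLearner, its methods, and the string keys for positions and moves are all hypothetical, every move is assumed to start with one bead, and an empty matchbox is treated as the computer having no move it is still willing to play.

```typescript
// Hypothetical names throughout; not taken from the tool's source code.
type Mode = "punish" | "reward";
type Matchbox = { [move: string]: number }; // bead count per move

class MatchboxLearner {
  private boxes: { [position: string]: Matchbox } = {};
  private last: { position: string; move: string } | null = null;

  // `random` is injectable so that the behaviour can be made deterministic.
  constructor(private mode: Mode, private random: () => number = Math.random) {}

  // Draws a move at random, weighted by bead count.
  // Returns null when every move in this position has been removed.
  chooseMove(position: string, legalMoves: string[]): string | null {
    if (this.boxes[position] === undefined) {
      const fresh: Matchbox = {};
      for (const m of legalMoves) fresh[m] = 1; // assumption: one bead per move
      this.boxes[position] = fresh;
    }
    const box = this.boxes[position];
    let total = 0;
    for (const m of legalMoves) total += box[m] || 0;
    if (total === 0) return null;

    let pick = this.random() * total;
    for (const m of legalMoves) {
      pick -= box[m] || 0;
      if (pick < 0) {
        this.last = { position, move: m };
        return m;
      }
    }
    return null; // not reached when random() returns a value in [0, 1)
  }

  // Applies the learning rule to the computer's last move once the game ends.
  learn(computerWon: boolean): void {
    if (this.last === null) return;
    const box = this.boxes[this.last.position];
    if (this.mode === "punish" && !computerWon) {
      box[this.last.move] = 0; // punishment: remove the move that lost
    } else if (this.mode === "reward" && computerWon) {
      box[this.last.move] += 1; // reward: add an extra bead
    }
    this.last = null;
  }

  beads(position: string, move: string): number {
    const box = this.boxes[position];
    return box === undefined ? 0 : box[move] || 0;
  }
}
```

In this sketch, punishment prunes a losing move permanently, whereas reward only shifts the odds towards moves that have won before, so a losing move can still be drawn later.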

The two approaches lead to vastly different results, with the punishment approach allowing the computer to reach a 90% win rate after only a handful of lost games.

How can you use the tool for learning?

The tool offers a modest model for teaching the mechanisms behind reinforcement learning. Illustrating these mechanisms with a game makes teaching and learning interactive and engages the students in hands-on machine learning.

The tool promotes a basic understanding of how a machine can learn and offers insight into reinforcement learning strategies. Hence, the Hexapawn tool makes learning a game, in the literal sense.

How was the tool developed?

The tool was developed as a simple web-app using the TypeScript and JavaScript programming languages. The simulator is written around three basic objects that handle the players, the board and the game itself.

The original algorithm and rules were used as a skeleton for the code. The only additions are toggles for the learning approaches, computer control and learning for both players, and the ability to play on any board of size n×n (with n greater than or equal to 3).
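As a rough illustration of the n×n generalisation, the starting position could be built as follows. This is a sketch under an assumption the article does not confirm: each player starts with n pawns filling their home row, as in the 3×3 game. The names Square and initialBoard are hypothetical.

```typescript
// Hypothetical helper; assumes one full row of pawns per player.
type Square = "W" | "B" | ".";

function initialBoard(n: number): Square[][] {
  if (Math.floor(n) !== n || n < 3) {
    throw new RangeError("board size must be an integer greater than or equal to 3");
  }
  const board: Square[][] = [];
  for (let r = 0; r < n; r++) {
    // Top row holds Black's pawns, bottom row holds White's, the rest is empty.
    const fill: Square = r === 0 ? "B" : r === n - 1 ? "W" : ".";
    const row: Square[] = [];
    for (let c = 0; c < n; c++) row.push(fill);
    board.push(row);
  }
  return board;
}
```

For example, initialBoard(3) yields the classic six-pawn position, while initialBoard(4) gives each side four pawns with two empty rows between them.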

The graph is rendered with the Chart.js library, and the tool is deployed to Heroku via Bitbucket Pipelines.

The Hexapawn tool was initiated and tested by Associate Professor Mirko Presser and developed in collaboration with student assistant Matteo Campinoti from the DBD team and the Department of Business Development and Technology at Aarhus University.

Access the tool by clicking this link.

Access the code behind the tool by clicking this link.