In my last post, I said I was hoping to develop a better way of assessing the relative strength of two (or more) game-playing AIs. Let’s talk more about that.

Currently I have a short script that plays two different versions of Picket against each other and tallies how many wins each gets. During development it has been very useful for checking whether a given change made an engine stronger than the current best-playing version.
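For context, that script boils down to something like the sketch below. It's a minimal Python outline rather than Picket's actual code: the `new_game` factory and the `is_over`, `apply`, `winner`, and `choose_move` names are hypothetical stand-ins for whatever interface the real engine exposes.

```python
def play_match(engine_a, engine_b, new_game, num_games=100):
    """Pit two engines against each other and tally wins.

    Alternates which engine moves first so neither gets a
    systematic first-move advantage across the match.
    """
    wins_a = wins_b = draws = 0
    for i in range(num_games):
        players = [engine_a, engine_b] if i % 2 == 0 else [engine_b, engine_a]
        game = new_game()
        to_move = 0
        while not game.is_over():
            move = players[to_move].choose_move(game)
            game.apply(move)
            to_move = 1 - to_move
        result = game.winner()  # index of the winning player, or None for a draw
        if result is None:
            draws += 1
        elif players[result] is engine_a:
            wins_a += 1
        else:
            wins_b += 1
    return wins_a, wins_b, draws
```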

So why isn’t that enough? If we’re trying to improve playing strength, why do we care about more than knowing we’re better than the current best?

One task a game AI programmer has is choosing values for the algorithm's tunable parameters. For example, how many iterations should the algorithm run before committing to a move? More iterations typically mean greater playing strength, but also take longer. Other parameters, like the UCT exploration constant, have less intuitive effects, but can still dramatically affect playing strength. A more rigorous way of evaluating the effect of changing these values would be useful.
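To make that second example concrete: in a Monte Carlo tree search, the exploration constant controls how strongly the search favors under-visited moves over moves that already look good. The snippet below is the textbook UCT selection formula, not necessarily how Picket spells it; `c` is the parameter being tuned.

```python
import math

def uct_score(child_wins, child_visits, parent_visits, c=1.414):
    """Standard UCT value: an exploitation term plus an exploration
    bonus that shrinks as the child is visited more. A larger c makes
    the search try unproven moves more often."""
    if child_visits == 0:
        return float("inf")  # always expand unvisited children first
    exploitation = child_wins / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration
```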

This is especially important for iterations, since there appears to be a point of diminishing returns (at least with Picket). A move at 100,000 iterations takes roughly ten times as long as one at 10,000, but seems to yield much less than ten times the strength. If I could graph the number of iterations against a numerical estimate of playing strength, it would be easier to find a good strength/time trade-off.
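As a rough illustration of what that numerical estimate might look like: under the standard Elo model, an observed score fraction maps to a rating difference via a logistic formula. This is the generic Elo relationship, nothing specific to Picket; running each iteration count against a fixed baseline engine and plotting the resulting estimates would show where the diminishing returns kick in.

```python
import math

def elo_difference(score_fraction):
    """Convert an observed score fraction ((wins + 0.5 * draws) / games)
    into an Elo-style rating difference under the standard logistic model.
    0.5 maps to 0 points, 0.75 to roughly +191 points."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)
```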

Of course, a rating system would also be useful for gauging the effect of changes to the algorithm itself, beyond simple parameter changes.

The rating system could easily be kept generic, so it could be reused in other projects and for other games.

Ultimately, the goal is to create stronger game AI engines. In some cases these systems can feel very black-box. The more we understand about the algorithms and the resulting playing strength, the better off we are.