Trackmania: Nations Forever is a weird kind of racing game. It’s a time trial, arcade-style game where players compete solely against the clock in a stadium setting. A YouTuber by the name of Yosh has been a longtime fan of the game, honing his skills for years in pursuit of ever faster times. He then set out on a quest to see if he could train an AI capable of beating him at his own favorite game. It wasn’t an easy road, but in the battle between human and machine, Yosh would eventually come off the lesser.
Yosh set out to train an AI to play the game using a neural network. As he explains in his YouTube video, a neural network takes in numerical input from the game in the form of the car’s speed, acceleration, distance to the track walls, and so on. It then passes these numbers through a network of artificial “neurons” which effectively perform a series of calculations to generate acceleration and steering outputs for the AI’s vehicle.
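To make that a little more concrete, here is a minimal sketch of numbers going in and controls coming out. It is not Yosh’s actual network: the layer sizes, the five-value observation, the random weights, and the names `make_layer`, `forward`, and `drive` are all assumptions chosen for illustration.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layer, inputs):
    """Each neuron takes a weighted sum of its inputs plus a bias,
    then squashes the result into the range -1..1 with tanh."""
    weights, biases = layer
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def drive(network, observation):
    """Pass the observation through every layer in turn.
    The final layer has two neurons: steering and throttle."""
    activations = observation
    for layer in network:
        activations = forward(layer, activations)
    steering, throttle = activations
    return steering, throttle

rng = random.Random(0)
# Hypothetical observation: speed, acceleration, and three wall distances,
# assumed to be scaled to roughly -1..1 before reaching the network.
observation = [0.8, 0.1, 0.5, 0.9, 0.4]
# Assumed shape: 5 inputs -> 8 hidden neurons -> 2 outputs.
network = [make_layer(5, 8, rng), make_layer(8, 2, rng)]
steering, throttle = drive(network, observation)
print(f"steering={steering:+.3f} throttle={throttle:+.3f}")
```

With random weights like these, the outputs are meaningless. Training is what turns them into sensible driving.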
It might sound difficult to combine all these inputs from the car and mathematically turn them into steering and acceleration outputs. You’d be correct in that assessment. Working out the maths for all those neurons by hand is far too hard. Here’s the trick: Yosh didn’t have to program the neural network to do the maths directly. Instead, using a technique called reinforcement learning, he was able to train the neural network to figure out the maths for itself. With enough training, it would figure out how to make the right decisions to drive the car well.
Reinforcement learning is a straightforward concept. It’s similar to how you might train a pet to stop peeing in the house, by offering a reward for good behavior. The AI is instructed to maximize “reward,” and is sent out to drive its car on the track with no prior knowledge. When it does good things, like keeping its speed up or completing more of the course, it gets reward points. The mathematical connections between neurons that generated this behavior are then strengthened to encourage it in future. For example, if the AI finds out it gets rewarded for accelerating flat out on straights, it will modify its neural network to reinforce that behavior. Thus, when it senses a straight, it will generate the relevant accelerator output.
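The flat-out-on-the-straights example can be sketched in a few lines. This toy is an assumption-laden illustration, not the training setup from the video: it uses a simple textbook method (a gradient bandit with a softmax policy), just two made-up actions, and invented reward values. The idea is the same, though: actions that earn more reward than average get their preference strengthened.

```python
import math
import random

ACTIONS = ["coast", "accelerate"]

def softmax(prefs):
    """Turn raw preferences into action probabilities."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reward_on_straight(action):
    """Hypothetical reward: going flat out on a straight pays best."""
    return 1.0 if action == "accelerate" else 0.2

def train(episodes=500, learning_rate=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0, 0.0]   # no prior knowledge: both actions equally likely
    baseline = 0.0       # running average of the reward seen so far
    for t in range(1, episodes + 1):
        probs = softmax(prefs)
        choice = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        reward = reward_on_straight(ACTIONS[choice])
        baseline += (reward - baseline) / t
        advantage = reward - baseline
        # Strengthen the chosen action if it beat the average,
        # weaken it if it did worse.
        for i in range(len(prefs)):
            chosen = 1.0 if i == choice else 0.0
            prefs[i] += learning_rate * advantage * (chosen - probs[i])
    return softmax(prefs)

final_probs = train()
print(dict(zip(ACTIONS, final_probs)))
```

After training, the probability of picking "accelerate" ends up above that of "coast", even though nobody told the program which action was correct.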
Yosh was able to get the AI driving the track with this methodology, but eventually hit roadblocks. His AI drivers kept hitting the walls, costing them time, and more training wasn’t weeding out the problem. After a great deal of tinkering with learning algorithms, reward weights, and the like, he was eventually able to help his AI move past this roadblock, and it began getting faster and faster. On the simple, twisting track he’d selected, the AI eventually beat his time. Getting to this point took him three years.
He quickly realized that he would never beat the AI on this track, by virtue of the AI’s uncanny consistency. Human players make mistakes, whereas the AI tends to operate consistently according to the instructions of its neural network. The AI was able to deftly carve the corners with the narrowest margin, something a human player would struggle to accomplish over a whole run, let alone repeat on demand.
However, Yosh hasn’t built an all-dominating AI that can beat every human at Trackmania ever. That’s because of a concept known as generalization. It’s relevant to everything from large language models like ChatGPT, to self-driving cars in development by major automakers. It’s entirely one thing to train an AI to drive on a winding, curvy course. Put that same AI in a completely different environment, though, without any further training, and it may not be able to handle the difference very well at all. This is because, using this project as an example, the lessons learned on one track don’t necessarily transfer to others.
Yosh demonstrates this lesson by explaining his efforts to teach the AI to handle a more challenging course with harsh right-angle turns and drop-offs on the side of the track. He had to give the AI more sensory help so that it had a good idea of the track ahead. Without appropriate input from the environment, it’s impossible for the neural network to make good decisions, after all. He also had to give the AI information about the car’s pitch, roll, and wheel contact. This allowed the AI to understand when the car was tipping over the edge of the track, for example, and work to avoid driving off the edges.
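As a rough sketch of what “more sensory help” might look like in practice, the extra readings can simply be appended to the list of numbers fed into the network. The field names, value ranges, and the number of sensors below are hypothetical and not taken from the video.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CarState:
    """Hypothetical snapshot of what the game reports each frame."""
    speed: float
    pitch: float                   # nose up/down
    roll: float                    # lean left/right
    wheels_on_ground: List[bool]   # one flag per wheel
    wall_distances: List[float]
    lookahead: List[float]         # samples describing the track ahead

def build_observation(state: CarState) -> List[float]:
    """Flatten the state into the plain list of numbers a network expects."""
    obs = [state.speed, state.pitch, state.roll]
    obs += [1.0 if touching else 0.0 for touching in state.wheels_on_ground]
    obs += state.wall_distances
    obs += state.lookahead
    return obs

# Example: the rear-left wheel has just dropped off the edge of the track.
state = CarState(
    speed=0.7,
    pitch=-0.05,
    roll=0.2,
    wheels_on_ground=[True, True, False, True],
    wall_distances=[0.4, 0.9, 0.3],
    lookahead=[0.0, 0.5, 1.0],
)
observation = build_observation(state)
print(len(observation), observation)
```

The network can only react to a wheel hanging in mid-air if a number like that 0.0 shows up in its input.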
After 35 hours of training, the AI was able to beat its human creator in the more complex realm, as well. Yosh had left out brake use from the AI’s abilities during training, and so he was eventually able to set a faster time himself by using the brakes. However, the AI would overcome this after around 100 hours of training, handicap notwithstanding. Eventually, he went on to teach the AI to use the brake, and even drift using a special in-game trick and some heavy reinforcement. It’s hilarious to watch the AI snaking back and forth down the straights as it eagerly aims to maximize reward.
It’s amazing to see the AI racer develop over time. It starts out as a bumbling fool of a thing that can’t get round the first bend. By the end of the training, it’s carving perfect drifts on a tight course at speeds the average human player could never hope to match. Yosh has done a great job, not just in training his AI better, but in explaining these complex machine learning topics to a broader population. That, my friends, is worthy of applause.