Trackmania: Nations Forever is a weird kind of racing game: an arcade-style time trial in which players compete solely against the clock in a stadium setting. A YouTuber by the name of Yosh has been a long-time fan of the game, honing his skills for years in pursuit of ever-faster times. He set out on a quest to see if he could train an AI capable of beating him at his own favorite game. It wasn’t an easy road, but in the battle between human and machine, Yosh would eventually come off second best.
Yosh set out to train an AI to play the game using a neural network. As he explains in his YouTube video, a neural network takes in numerical input from the game in the form of the car’s speed, acceleration, distance to the track walls, and so on. It then passes these numbers through a network of artificial “neurons,” which effectively perform a series of calculations to generate acceleration and steering outputs for the AI’s vehicle.
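To make that concrete, here’s a toy version of such a network in Python. The layer sizes, inputs, and weights below are illustrative guesses, not Yosh’s actual architecture:

```python
import numpy as np

def forward(observation, W1, b1, W2, b2):
    """Map game readings to control outputs with one hidden layer of 'neurons'."""
    hidden = np.tanh(W1 @ observation + b1)  # the artificial "neurons"
    output = np.tanh(W2 @ hidden + b2)
    steering = output[0]                     # -1 = full left, +1 = full right
    throttle = (output[1] + 1) / 2           # 0 = coast, 1 = flat out
    return steering, throttle

# Hypothetical example: 8 inputs (speed, acceleration, six wall-distance
# rays) feeding 16 hidden neurons. Random weights, so random driving.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)
W2, b2 = rng.normal(size=(2, 16)), np.zeros(2)
obs = np.array([120.0, 1.5, 4.2, 3.1, 9.9, 10.0, 2.8, 5.5])
print(forward(obs, W1, b1, W2, b2))
```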
It might sound difficult to combine all these inputs and mathematically turn them into steering and acceleration outputs, and you’d be correct in that assessment. Hand-crafting the maths for all those neurons would be impractically hard. Here’s the trick: Yosh didn’t have to program the neural network to do the maths directly. Instead, using a technique called reinforcement learning, he was able to train the neural network to figure out the maths for itself. With enough training, it would learn to make the right decisions to drive the car well.
Reinforcement learning is a straightforward concept. It’s similar to how you might train a pet to stop peeing in the house, by offering a reward for good behavior. The AI is instructed to maximize “reward,” and is sent out to drive its car on the track with no prior knowledge. When it does good things, like keeping its speed up or completing more of the course, it gets reward points. The mathematical connections between neurons that generated this behavior are then strengthened to encourage it in future. For example, if the AI finds out it gets rewarded for accelerating flat out on straights, it will modify its neural network to reinforce that behavior. Thus, when it senses a straight, it will generate the relevant accelerator output.
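As a rough sketch of the idea, a per-step reward and a crude “strengthen what worked” loop might look like the following. The reward terms, weights, and update scheme here are assumptions for illustration; a real reinforcement learning algorithm is far more sophisticated than this random search, but the feedback loop is the same:

```python
import numpy as np

def step_reward(speed, progress_delta, hit_wall):
    """Toy per-step reward in the spirit described above.
    The terms and weights are illustrative guesses, not Yosh's function."""
    reward = 0.1 * speed              # keeping speed up earns points
    reward += 10.0 * progress_delta   # so does completing more of the course
    if hit_wall:
        reward -= 5.0                 # wall contact costs time, so dock points
    return reward

def train(run_episode, weights, iterations=1000, noise=0.02):
    """Keep the weight tweaks that earned more total reward."""
    best = run_episode(weights)       # total reward over one lap attempt
    for _ in range(iterations):
        candidate = weights + noise * np.random.randn(*weights.shape)
        score = run_episode(candidate)
        if score > best:              # this behavior earned more reward,
            weights, best = candidate, score  # so keep the connections behind it
    return weights
```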
Yosh was able to get the AI driving the track with this methodology, but eventually hit a roadblock. His AI drivers kept hitting the walls, costing them time, and more training wasn’t weeding out the problem. After a great deal of tinkering with learning algorithms, reward weights, and the like, he was eventually able to get his AI past the problem, and it began getting faster and faster. On the simple, twisting track he’d selected, the AI finally beat his time. Getting to this point took him three years.
He quickly realized that he would never beat the AI on this track, by virtue of the AI’s uncanny consistency. Human players make mistakes, whereas the AI tends to operate consistently according to the instructions of its neural network. The AI was able to deftly carve the corners with the narrowest margin, something a human player would struggle to accomplish over a whole run, let alone repeat on demand.
However, Yosh hasn’t built an all-dominating AI that can beat every human at Trackmania. That’s because of a concept known as generalization, which is relevant to everything from large language models like ChatGPT to the self-driving cars in development at major automakers. It’s one thing to train an AI to drive on a winding, curvy course. Put that same AI in a completely different environment without any further training, though, and it may not handle the difference well at all. As this project shows, the lessons learned on one track don’t necessarily transfer to others.
Yosh demonstrates this lesson by explaining his efforts to teach the AI to handle a more challenging course with harsh right-angle turns and drop-offs on the side of the track. He had to give the AI more sensory help so that it had a good idea of the track ahead. Without appropriate input from the environment, it’s impossible for the neural network to make good decisions, after all. He also had to give the AI information about the car’s pitch, roll, and wheel contact. This allowed the AI to understand when the car was tipping over the edge of the track, for example, and work to avoid driving off the edges.
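In code terms, that fix amounts to widening the observation vector the network sees each step. A hypothetical version, with field names invented to match the description rather than taken from Yosh’s code:

```python
import numpy as np

def build_observation(car, ray_distances):
    """Assemble the network's input vector each step.
    `car` is a hypothetical game-state object; `ray_distances` holds
    distances to track edges sampled ahead of the vehicle."""
    return np.concatenate([
        [car.speed],
        [car.pitch, car.roll],                 # is the car tipping over an edge?
        np.asarray(car.wheel_contact, float),  # four flags: which wheels touch track
        ray_distances,                         # look-ahead picture of the course
    ])
```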
After 35 hours of training, the AI was able to beat its human creator on the more complex course as well. Yosh had left brake use out of the AI’s abilities during training, so he was eventually able to set a faster time himself by using the brakes. However, the AI would overcome even this after around 100 hours of training, handicap notwithstanding. Eventually, he went on to teach the AI to use the brake, and even to drift using a special in-game trick and some heavy reinforcement. It’s hilarious to watch the AI snaking back and forth down the straights as it eagerly aims to maximize reward.
It’s amazing to see the AI racer develop over time. It starts out as a bumbling fool of a thing that can’t get around the first bend. By the end of its training, it’s carving perfect drifts on a tight course at speeds the average human player could never hope to match. Yosh has done a great job, not just in training his AI, but in explaining these complex machine learning topics to a broader audience. That, my friends, is worthy of applause.
That’s a very good and approachable explanation of how this kind of ML works!
I think the video was great! Did my best to translate.
It is really hard to come up with a good reward function and to define good constraints. AI and machine learning researchers have been discovering some of the things that optimization researchers have known since the 1980s… your learning (or optimization) algorithm will: (1) exploit quirks and not-well-modelled parts of your mathematical models; (2) exploit your reward function; (3) exploit your constraints; (4) find a local optimum but not a global optimum.
You’ll get solutions that are optimal in the context of your dynamic model but in real life have really strange behaviors (even though they are physically possible). Or solutions that are physically impossible because of unmodelled dynamics. Or solutions that exploit a local behavior to crank up the reward.
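A toy example of point (2): attach a drift bonus that applies everywhere, and the learner will happily snake down the straights to farm it — exactly what shows up later in the video. Numbers here are made up:

```python
def hacked_reward(progress_delta, is_drifting):
    # Intent: encourage drifting through corners.
    # Exploit: the bonus applies anywhere, so zig-zagging down a straight
    # farms drift points even though it's slower overall.
    return progress_delta + (10.0 if is_drifting else 0.0)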
A really good description of this is here: https://www.alexirpan.com/2018/02/14/rl-hard.html
My current ML/RL “favorite” is researchers using machine learning to develop control systems for things that are already really well understood using physics-based methods (meaning, the machine learning is being used to learn Newton’s laws). In most cases the “right” answer will likely be to use physics-based models together with a machine learning model… the physics-based model captures all the stuff that you can describe using differential equations, and the ML part models the things that are really hard to describe that way.
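That hybrid often takes the form of residual modelling: let the physics predict what it can, and train the ML part only on the leftover error. A minimal sketch, with all names hypothetical:

```python
def hybrid_step(state, control, physics_step, residual_net):
    # physics_step: hand-derived dynamics (the differential-equation part).
    # residual_net: learned correction for whatever the equations miss
    # (tire scrub, aero quirks, and other hard-to-write-down effects).
    return physics_step(state, control) + residual_net(state, control)
```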
There’s got to be a way to run a program like this in a real vehicle. Something with full motion sensors and fast response electronic or hydraulic controls. Train it virtually first so it doesn’t smash up the doubtless expensive RC vehicle and then see what it can do in the real world. On a well maintained track it might not have issues but potholes, dips, humps, cracks and loose debris could give it a hard time. Plus rain or changes in light if it uses a camera based system.
What I’d really be interested to see is how they would learn to behave racing other cars. Could it be taught to bluff and intimidate others, to block cars behind it while threading its way through the ones ahead? It would be game theory: a constant evaluation of risk versus reward, with the consequences ranging from first place to DNF. I know that for a lot of people the draw of racing is apparently the driver. I don’t see how you get more personality from a human wrapped in logos speaking pure Corporatese than from a computer, but this would not be for them. More of a BattleBots-style, machine-to-machine competition. I’d like it, and I think there are probably more people who would enjoy AI racing.
This exists!
https://www.donkeycar.com/
I built one, but kept breaking cheap R/C parts, and then the pandemic supply chain priced me out of ever actually getting to compete.
It is also known as Tesla Full Self-Driving
The question I have with any of these machine learning systems is whether they’re actually better than a purpose-built simulation system. I don’t want to see this AI compete against humans, I want to see it compete against the normal AI in a racing game that was written with domain-specific logic. Is this faster or slower? Was it more or less time-consuming to write? These are the questions we should be asking around machine learning.
My guess is that the answer in some cases will be that it’s better and some cases it’s worse. I wouldn’t be shocked if it’s actually worse in a closed, relatively simple system like a racing game, but in a massively complex system with more inputs than you could possibly code for manually it might be better, or indeed the only way to do it.
Yeah, I’d love to see a contest between the trained AI and traditional human-coded AI racers. I suspect the AI would perhaps be able to win on the tracks that it’s trained on, but the problem of weak generalization would mean the human-designed racers would do better on other circuits.
Really, though – maybe it’s time to learn how regular racing game AIs are created?
Heh, those early-in-training cars remind me of when the Lemons iRacing series slaps my name on the bot-cars. There isn’t a skill setting low enough to replicate my performance in a video game.
He should try to get a sponsorship from Falken.
Professor Falken mostly focuses on AI that plays tic-tac-toe, chess, and Global Thermonuclear War.
That’s really cool. Gotta love that the AI started drifting back & forth on the straights when Yosh set a reward of 10 attached to drifting. Way too easy to anthropomorphize it: ‘Yeah! AI likes to hoon, too!’
I just wonder how setting up demerits or ‘punishments’ for unwanted behavior would affect it. If it was punished for driving into walls or off the track, would it learn the better route more quickly?
Now if only someone could train AI to teach my dryer how to fold the laundry.
And put it away.
Turns out robotics researchers are working on this… it’s amazingly hard to fold laundry (really complicated dynamics). https://www.npr.org/2022/10/22/1130552239/robot-folding-laundry
Yeah I’ve seen those reports but I think they’re just not trying hard enough. If half the effort being put into self driving cars by several companies could be applied to laundry I think we’d have it licked by now.
I know that I personally would MUCH rather have a laundry robot than a semi-self-driving car that can’t tell the difference between a stop sign and a full moon.
And if we spent half the money spent on self-driving cars on high speed rail we’d have…
half a mile of track somewhere halfway between LA and San Francisco.
unfortunately. It’d be great to have a high speed rail link (or any kind of reliable rail link) between Pittsburgh and Manhattan.
I might have agreed with you a few years ago but Sammy’s Famous Corned Beef closed down and there’s simply no reason for anyone in Manhattan to visit Pittsburgh anymore.