Last night, Cruise issued a statement on ex-Twitter-now-X saying the company intends to “proactively pause driverless operations across all of our fleets while we take time to examine our processes, systems, and tools and reflect on how we can better operate in a way that will earn public trust.” While apparently not “related to any new on-road incidents,” the decision comes right after Cruise lost its permit to operate driverlessly in California over an incident in which one of its robotaxis dragged an injured woman underneath itself for about 20 feet. Cruise will continue operations and development with a human safety driver behind the wheel, ready to take over.

I think what’s important here is something that’s maybe not so obvious. If you look at Cruise’s most recent incidents, something becomes apparent: one of the biggest hurdles for automated vehicles isn’t the obvious problem of solving the mechanical tasks of driving, but rather how to emulate the blurrier, vaguer general sense of surroundings that humans innately have.
The most important thing for us right now is to take steps to rebuild public trust. Part of this involves taking a hard look inwards and at how we do work at Cruise, even if it means doing things that are uncomfortable or difficult.
In that spirit, we have decided to proactively pause driverless operations across all of our fleets while we take time to examine our processes, systems, and tools and reflect on how we can better operate in a way that will earn public trust.
This isn’t related to any new on-road incidents, and supervised AV operations will continue. We think it’s the right thing to do during a period when we need to be extra vigilant when it comes to risk, relentlessly focused on safety, & taking steps to rebuild public trust.
So, I have to hand it to Cruise for recognizing, on its own, that it’s important to take the time to really solve its problems instead of shoving them aside and just barreling ahead, solving nothing. This feels prudent and careful, which is what you want from a company building 5,000-pound robots and releasing them into cities. What Cruise isn’t telling anyone, at least not yet, is what it plans to do to improve the safety and usefulness of its AVs.
Tellingly, Cruise hasn’t reached out for my opinion, but if they do, I promise to deliver a really satisfying spit-take and then offer them this advice: Focus on improving the broad and general situational awareness of the cars, beyond what’s required for the specific driving task. This won’t be easy.
Here’s what I mean; let’s look at three significant Cruise failures we’ve written about over the past year. Most recently, there’s the tragic incident in which a woman, hit by another car, was run over by a Cruise AV, which then proceeded to pull out of the active traffic lane (a good idea) while the woman was still trapped under the car (a bad idea). From Cruise’s statement on Twitter:
In the incident being reviewed by the DMV, a human hit and run driver tragically struck and propelled the pedestrian into the path of the AV. The AV braked aggressively before impact and because it detected a collision, it attempted to pull over to avoid further safety issues. When the AV tried to pull over, it continued before coming to a final stop, pulling the pedestrian forward.
Then, in June we had an incident where a Cruise EV was in an area where (yet another) mass shooting was taking place, and appeared to be getting in the way, causing frustration for police officers and emergency workers, as can be seen in this video:
Fellow Mission friends. Please stay away from 24th/Folsom. Gunshots fired; reckless Cruise cars. pic.twitter.com/fICRtS6e05
— Paul Valdez (@paulvaldezsf) June 10, 2023
And then in January, we had a strange incident in which six Cruise EVs all converged on one intersection for 20 minutes, for apparently no good reason; the July before that, 20 Cruise AVs had converged on a nearby intersection. Here’s a video from the six-car incident:
A friend of mine took a video in San Francisco tonight of 6 @Cruise self-driving cars stopped at an intersection for 20 min. Traffic came to a standstill + people didn’t know what to do. Two of the cars were on wrong side of the road. Doesn’t seem particularly safe. #Autonomous pic.twitter.com/tKZrHgrdEi
— Jose Fermoso (@fermoso) January 21, 2023
So, what do all of these incidents have in common? They all have a common failure mode, and it has nothing to do with the mechanics of driving or sensory systems or not understanding a traffic sign or anything like that. It has to do with a general sense of knowing what the hell is going on around you, a sense which these machines have absolutely none of.
Of course they don’t know what’s going on around them – they’re machines! This is also why I’m not worried about some grand AI uprising or whatever; artificial intelligence is simply incapable of that sort of awareness. These systems aren’t self-aware, and they don’t have consciousness like we do. They’re capable of incredible acts of categorization, of combing through data, generating text and images, and interpreting camera and sensor data, and while it may seem like these systems have real intent and feel like they’re thinking, the truth is they’re not. They have no idea what the hell they’re doing.
You can ask MidJourney to make you a photorealistic image of what a Volkswagen Beetle would have looked like if instead of metal we built them out of gouda cheese, but it has absolutely no idea what it’s actually doing. No computer does. They execute programs, they can’t come up with original ideas or have any self-awareness at all. And yet, to some degree, that’s what AVs need to be able to do if they’re to successfully operate in the human world.
Think about those three incidents I mentioned: each is about the car not understanding the situation it’s in. Take the convergence of the 20 Cruise AVs: if you or I showed up at a Taco Bell and immediately found ourselves surrounded by 19 other people all dressed exactly like us, we’d know some manner of shit was going down. Something isn’t normal, and our behavior needs to adjust accordingly.
With the Cruise AV in the area with the active shooter, any human would have seen the police lights and caution tape, heard the sounds, and just felt that something wasn’t right. It wouldn’t take much for a human driver to encounter such a situation, and decide that this is an area best avoided (to be fair, emergency vehicles were eventually able to get around the stopped Cruise AV). Even if you kept driving into the situation, at some point a cop or EMT or someone would probably yell at you to get the hell out of there, or perhaps ask what your problem is, and if you’re not leaving at that point, you’re probably part of whatever is going down.
For the most recent incident, barring some unusual circumstances, you would know if you hit somebody and they were trapped under your car. You’d have felt and heard things that should feel very, very wrong. Perhaps you’d panic and drive off, but the point is you wouldn’t just be calmly blasé about it.
Sure, there are mechanical/electronic solutions to this particular problem: sensors and cameras underneath the car that stop it if any object is detected there, for example. But that’s not currently implemented on any deployed AV I can think of.
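To be clear about what that kind of safety interlock would even look like, here’s a minimal sketch. Everything here is invented for illustration – the sensor names, the clearance threshold, the function names – and bears no relation to any real AV stack:

```python
# Hypothetical sketch of an under-chassis safety interlock.
# All names and thresholds are invented for illustration only.

def undercarriage_clear(sensor_readings, min_clearance_m=0.15):
    """Return True only if every under-chassis sensor reports at least
    min_clearance_m of free space below the vehicle."""
    return all(reading >= min_clearance_m for reading in sensor_readings)

def drive_command(sensor_readings, requested_speed_mps):
    # If anything is detected under the car, refuse to move at all.
    # Pulling over is exactly the wrong maneuver in that state.
    if not undercarriage_clear(sensor_readings):
        return 0.0  # full stop: hold position and alert a remote operator
    return requested_speed_mps
```

The point of the design is that the interlock overrides everything else: no “get out of the travel lane” logic should be allowed to run while something is detected under the vehicle.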
Cruise’s AVs may understand the basics of how to drive, but that’s only part of what driving is. Driving is, on some level, a human social activity, because we do it with and around other people doing the same thing, all reacting to our immediate surroundings while communicating with each other and making decisions based as much on the overarching culture we live in and how we interact with other people as on the fundamental rules of driving. Situations that are identical from a driving perspective – the same speed limits, lighting, stretch of road, weather, and visibility – can be wildly different based on what is happening there that has nothing to do with driving.
Imagine the same intersection on a late Sunday afternoon and the same intersection on a Wednesday morning when an active and heated protest is happening. Imagine that same intersection with construction being done, or the road painted for an art installation, or with a street fair happening or any number of other possible things that humans can do outdoors. The basic traffic rules and driving rules don’t change, but the behavior of how to drive does, sometimes dramatically.
Consider this hypothetical: If you’re driving through the desert in 110° heat and see a broken-down car, you might pull over to help. But you’d assess if that person looked like a threat or not: Are they ranting around, waving a machete, or are they just looking glum and defeated? And once you stopped and talked to them, you’d be assessing their behavior too, innately and immediately – are they just talking about the Night Weasels that follow them everywhere and rat on them to the FBI and NFL agents hiding in every cactus, or are they thanking you for stopping and telling you how they need to get to Barstow for their niece’s karate demonstration? You suss out how creepy they are or aren’t pretty quickly.
If you wanted to program this behavior into an AV, think of all the factors you’d need to assess: the location, how to visually identify a vehicle that’s actually broken, the ambient temperature, comparing the look of the person and their stance and actions to vast databases of behavior, identifying any objects they may hold, transcribing their speech and comparing it to databases of stored psychological tests, and on and on and on. Is it even possible? And do people even want their AVs to help people in distress? I mean, I hope they would?
Exactly how Cruise can pull this off is a huge question. I think it may be possible to create some sort of algorithm that takes in a lot of varied factors of the environment – including visual data, auditory information, references to calendar dates, even referring to local news – and build some sort of basic behavior matrix. Perhaps as a number of markers get flagged, a human is contacted to rapidly assess the situation the car may be in and give instructions – something most humans could likely do almost instantly.
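To make that “behavior matrix” idea a little more concrete, here’s a toy sketch of the flag-markers-then-escalate scheme described above. The signal names, weights, and threshold are all made up for illustration; a real system would need far richer inputs than this:

```python
# Toy sketch of a "read the room" escalation heuristic: weight a handful
# of environmental anomaly markers, and past a threshold, hand the
# situation to a remote human. All signals and weights are hypothetical.

ANOMALY_SIGNALS = {
    "emergency_lights_detected": 3,
    "caution_tape_detected": 2,
    "sirens_heard": 2,
    "unusual_crowd_density": 1,
    "many_fleet_vehicles_nearby": 2,  # e.g. 19 identical robotaxis at one corner
    "local_news_incident_nearby": 1,
}

ESCALATION_THRESHOLD = 4

def situation_score(active_signals):
    """Sum the weights of whichever anomaly markers are currently flagged."""
    return sum(ANOMALY_SIGNALS.get(s, 0) for s in active_signals)

def should_call_human(active_signals):
    # Past the threshold, stop trying to reason autonomously and ask a
    # remote operator to assess the scene and give instructions.
    return situation_score(active_signals) >= ESCALATION_THRESHOLD
```

So police lights plus sirens would trip the threshold and summon a human, while a single mildly odd signal would not. The hard part, obviously, is reliably detecting those markers in the first place, not the arithmetic.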
Of course, I’m far too stupid to know for sure how to solve this. What I do know is that this is a non-obvious and still very unsolved factor of automated driving, and it seems to be one that most AV companies (Waymo, Cruise, Tesla, whoever) have not spent much time or effort focusing on.
Of course, I’ve been arguing with David about this for way too long now: David thinks this concept is obvious, and any AV engineer already knows and understands this. And, while I did reach out to Cruise to ask about this and what else they’ll be doing while they pause driverless operations, I don’t know for sure yet what they’re cooking up in their R&D labs. Maybe they have vast Dynamic Cultural Environment models and algorithms being tested as we speak!
But I would like to point out that I had a very similar argument with David about Level 3 autonomy. He felt like of course the engineers had the tricky hand-over problem solved! And then I talked to an actual AV engineer, and here’s what I found:
I asked our automated driving engineer about this, and while he said that “processes are being designed,” he also added that the fundamental parameters of Level 3 are “poorly designed,” and as a result “everyone is just guessing” when it comes to how best to implement whatever L3 actually is.
“There is absolutely a lot of guessing going on,” he added, noting that there simply isn’t any real well of L3 driving data to work with at this time.
So, I’m still going to be skeptical, but I’m also happy to be proven wrong here. Any AV engineers working in this particular field should absolutely reach out to me. That said, I still think it’s time that AVs learn to read the room.