We live in a very strange era of Artificial Intelligence. Tesla CEO Elon Musk did some dire fear-mongering about AI becoming so intelligent it wants to murder us all; I don’t buy this, not because I think an advanced machine intelligence couldn’t find a suitable motive, but because I don’t think we’re anywhere close to real artificial intelligence. Even the name is deceptive, because what we’re calling AI now isn’t really “intelligence,” at least not as we understand it in humans. Maybe it’s just a bunch of if-then-else statements, maybe it’s more, but whatever it is, it sure is weird. Like many people, I’ve been playing with the image-from-text-generating system called Craiyon, formerly known as DALL-E Mini, which was based on the much more advanced DALL-E. Of course, the only good use for a computer is making it do things with cars, so I tried a lot of car-related inputs to see what I’d get. The results are, I think, fascinating.
I only have the most rudimentary understanding of how the system actually works, but from what I’ve read, the full DALL-E system uses a version of something known as a Generative Pre-Trained Transformer, a type of neural network that learns to weigh how relevant each piece of its input is to the other pieces when predicting what should come next. The system is trained with text-image pairs from the internet, meaning that it sort of learns what certain images are of based on their captions, and it has a vast number of these to learn from.
Here’s the simple explanation of how it works, according to the DALL-E Mini site:
The model is trained by looking at millions of images from the internet with their associated captions. Over time, it learns how to draw an image from a text prompt.
Some of the concepts are learnt from memory as it may have seen similar images. However, it can also learn how to create unique images that don’t exist such as “the Eiffel tower is landing on the moon” by combining multiple concepts together.
Several models are combined together to achieve these results:
- an image encoder that turns raw images into a sequence of numbers with its associated decoder
- a model that turns a text prompt into an encoded image
- a model that judges the quality of the images generated for better filtering
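For the code-curious, here is a toy sketch of how those three pieces could fit together. To be clear, this is not Craiyon’s actual code: every function name, the 4x4 grid, and the 16-entry codebook are made up for illustration, and the “models” are random-number stand-ins. The only thing it is meant to show is the flow: a prompt becomes a sequence of numbers, the numbers get decoded into an image, and a judge keeps the best candidates.

```python
import random

CODEBOOK_SIZE = 16  # hypothetical: how many distinct "image tokens" exist
GRID = 4            # hypothetical: each image is a GRID x GRID mosaic of tokens


def text_to_image_tokens(prompt, rng):
    """Stand-in for the model that turns a text prompt into an encoded image.

    A real model would condition on the prompt; this one just picks
    random codebook indices so the sketch can run anywhere.
    """
    return [rng.randrange(CODEBOOK_SIZE) for _ in range(GRID * GRID)]


def decode_tokens(tokens):
    """Stand-in for the decoder: lay the token sequence out as a 2D grid.

    A real decoder would turn each token into a patch of pixels.
    """
    return [tokens[row * GRID:(row + 1) * GRID] for row in range(GRID)]


def score_image(prompt, image):
    """Stand-in for the judge model: a placeholder score between 0 and 1.

    Here it is simply the share of distinct tokens in the image; a real
    judge would compare the image against the prompt.
    """
    flat = [token for row in image for token in row]
    return len(set(flat)) / len(flat)


def generate(prompt, n_candidates=9, keep=3, seed=0):
    """Generate several candidates, then keep only the best-scoring ones."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_candidates):
        tokens = text_to_image_tokens(prompt, rng)
        image = decode_tokens(tokens)
        candidates.append((score_image(prompt, image), image))
    # Sort by score only, highest first, and keep the top few.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return candidates[:keep]


if __name__ == "__main__":
    for score, image in generate("Tatra cars in city"):
        print(round(score, 2), image)
```

Generating more candidates than you keep is the “better filtering” idea from the last bullet: the judge never improves an image, it only decides which ones you get to see.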
I’m not clear on how it specifically “draws” the images, but it seems to be taking elements from images and somehow re-processing them and compositing them together, I think? Whatever it’s doing, the results are simultaneously impressive and a bit disturbing. The images suggest something that has a lot of skill and observational understanding without any overall comprehension. It can draw things that look like cars, even specific cars or cars of a given era, but it’s also very clear it has no idea what a car actually is.
According to the more detailed technical explainer, images are built from what appear to be discrete tokens, mosaic’d together:
The system manages to get many details right but has no idea what they are, what they’re for, or really anything. It’s hard to explain, but I think it is something genuinely new in the timeline of images that humans have had some hand in creating. So, with that in mind, let’s look at some AI-generated cars!
All of these start with text prompts to get the system going. With that in mind, I started with one of the best ways to start any car-related whatever, by asking it to show me some “Tatra cars in city.” Here’s the result:
Well, those do resemble Tatras! Severely mutated Tatras, sure, but there’s definitely some Tatra T87 in there. Though, the rightmost car in the middle row looks kind of like a Jaguar Mk2 or something. I think it’s really interesting how, despite getting so much generally right about the idiosyncratic stylistic design of a Tatra, it somehow doesn’t understand really basic fundamental things about cars, like how they’re almost always bilaterally symmetrical.
In case you forgot what a Tatra looks like, here you go (via Wikimedia Commons):
It’s also interesting to note that most of the streets are cobblestone, because most pictures of Tatras are from picturesque Eastern European old cities with these sorts of streets.
Let’s try making it come up with something that doesn’t really exist, but is a combination of two things that definitely do. Like a Bugatti pickup truck:
Huh! This is interesting, because it seems to have almost exclusively chosen old, 30s-era Bugattis as opposed to modern ones like a Veyron or Chiron. Some of those trucks do have the old school Bugatti horse-collar grille, and look like pretty cool, sometimes hot-rodded, pickup trucks. Again, there’s no understanding of automotive symmetry or even the fundamental layout of a car. That bottom middle one also looks like some kind of cool three-wheel cyclecar experiment. Generally, it looks like the AI found these sorts of examples and merged them together:
What if we try another carmaker-that-doesn’t-make-trucks type of experiment? Like a Lexus dump truck?
Huh. Surprisingly boring! Hardly any Lexus at all in these! Also, look how mind-hurtingly confused that top middle one is.
Okay, let’s try one more incongruous car type with carmaker. This time, how about seeing what a Velorex – a tiny, leather-bodied Czech three-wheeled microcar – limousine might be like. Here’s a real-world Velorex, to give you a reference:
…and here’s what a computer thinks a Velorex limo might be like:
It clearly had a lot of good Velorex sources, because all of these do have a very Velorex look, and I suppose some of them are longer and seem to have four doors, which would qualify as a limo by Velorex standards, so I’ll call this one a victory.
Let’s get a little self-referential and see what happens here. What if I just ask it to draw “The Autopian?”
Strangely, for this input it seems like it took almost all the visual references from one story I wrote a bit ago, about a 1906 magic lantern show that seems to be one of the first car-related movie-like things ever. That bottom middle pic is almost directly lifted from one of those magic lantern slides. But why did it focus on that for “The Autopian?” And who the hell are those old scientist-looking dudes?
Keeping self-referential, I wonder what it thinks the co-founders of this dump, David Tracy and me, look like:
Yikes. These digital ghouls dress better than we actually do, but I think I can safely say we’re better looking. I can’t think of any pictures where David and I would have been posed like these, in environments like these, so I have no idea where it’s pulling its data from for this.
[Editor’s Note: Good Lord I wish I could unsee that. -DT]
Let’s move to more technical stuff; I wonder how the AI will do if I ask it to draw me a diagram of a horizontally-opposed, or flat-four engine?
This is interesting for a few reasons, even if none of them is really a flat-four layout. What’s interesting is that they do sort of have a diagrammatic look to them, so the AI understood that much, and you can see that a lot of these appear to have finned cylinders like an air-cooled engine, which would make sense because the most common flat-fours likely to be found online in diagrams are air-cooled VW engines.
[Editor’s Note: These are amazing! -DT]
Some of these look like air-cooled Fiat 500 motors, some look like large V8s, and almost all of them have floppy-looking pulleys and some Escher-esque belt routing paths. Also interesting are the callouts, which I don’t believe the AI understands are not part of the engine diagram itself, because, again, it has no idea what it’s drawing, really.
Let’s see how it does with just a request for an engine image, not a diagram. Something exciting, like a turbocharged V8:
There’s big turbo pipes and some things that could be superchargers, even, and what could be valve covers – these all definitely suggest V8 engines, even if the specifics are pretty fuzzy and some of the parts look like a fully-popped Jiffy Pop pan.
What if we try to do something really specific, like ask the AI to re-create a scene from a famous 1985 Citroën commercial involving Grace Jones, a Citroën CX2, and a garage shaped like a giant Grace Jones head? This ad:
Just about everything in this commercial feels like it would be at home in an AI-generated space. So let’s see what it does when I ask it to draw “Grace Jones driving Citroën into giant head”:
Okay, so, it’s light on the Citroën and giant head part, but, honestly, these are kind of great interpretations of Grace Jones, ones that I feel like she would appreciate. They might be the most recognizable specific person I’ve ever seen this system generate, if I’m honest.
Let’s try something simple and straightforward and dear to my heart this time: taillights.
Those are all fairly plausible taillights! They all feel quite modern, at least 1990s-era and up, and there seems to be a fondness for the inset round lamp, sometimes an amber indicator, sometimes not. It’s interesting how they all seem to generally have that sort of lights-at-the-corners tombstone shape you see on so, so many cars, like, say, a Kia Spectra or something generic like that.
I got a very unexpected result from something I thought would be more specific, “semaphore turn signal.” You know, those funny-looking little arms that were used as turn indicators on some European cars back in the 30s to the 60s or so:
I do love the look of the result, though, which feels like early 20th century modern art, like a Mondrian or Kandinsky or something:
I should have called them “trafficators.” In fact, let’s try that:
Okay, that’s not any better. Let’s move on to something else, like taking well-known cars and putting them in unexpected situations. Like a Beetle racing at Le Mans:
That worked pretty well! I like how it doesn’t really seem to differentiate between key circular elements like wheels, lights, and the gumball-number on the doors. These are all quite recognizable Beetles, though. Let’s try with some other highly-identifiable cars. Like a Ford Model T, but desert racing in Baja:
Hot damn, look at that! We even have some action shots here! Let’s replace the Model T with a BMW Isetta:
Those are very clearly Isettas, and they’re off-roading!
Okay, one last set of experiments. I’m curious to see how well the AI categorizes different stylistic eras for cars, so let’s come up with a nonsense car – a Saturn Tesla Wagon – and add the year, starting from the 1960s, and see how different the results look. Here’s a 1963 Saturn Tesla Wagon:
These definitely feel like late 1950s/early 1960s cars, with all the expected design language and details. Mostly seem American, too. Let’s see what it thinks the 1970s looked like, automotively:
Yeah, these do feel very 1970s! The right earthtone-heavy color palette, proportions feel right, and this time we have some interior shots as well. These still feel quite American, but with some Euro-looking details. On to the 1980s:
Very 1980s, especially 1980s GM. At this point we’re getting some real-world Saturn design influences here, mostly in the front ends, even though we really didn’t have Saturns on the roads until the 1990s. Again, though, Craiyon/DALL-E Mini does seem very capable of creating likely car images based on an era’s general design traits. On to the 1990s!
Here we’re finally getting a lot of Saturn design influences, which makes sense, as there’s plenty of 1990s Saturn wagon references. To the 2000s!
Again, very Saturn-y, but more importantly, very much early 2000s-era looking design. Let’s keep going!
We’re finally seeing some Tesla influence here (look at the vertical screen right above here) though the top left one reminds me of a Saab 9-3 wagon, a bit, with the clear taillights and overall look.
Let’s jump a tiny bit into the future now, and see what it does when we ask for a 2024 car:
Interestingly, because the year is in the future, we have the blank backgrounds you’d see for concept car renderings, and some very sleek concept car designs.
I think it’s worth playing around with this tool; it’s kind of amazing that we have free and easy access to something like this at all. What I really think is fascinating is how the system tends to fail, possibly more so than how it succeeds. It can do some truly remarkable things, and at the same time I can’t think of a better way to convey the limitations of the state of AI than seeing where it misses the mark.
So many of these sorts of systems are “black boxes,” meaning no one is entirely sure what’s going on inside them, which is all the more reason to try things out and see what you get. With that, I’ll leave you with this important set of images:
[Editor’s Note: This is incredible. -DT]