Artificial Intelligence Art and Cooking
Comparing the process of cooking food with making digital images using AI
Unless you live under a rock, you must have picked up on the fact that we currently live at the beginning of an artificial intelligence revolution. In many ways, this is like the beginning of the internet revolution. It swept across the world in the mid-90s and later accelerated with the smartphone revolution of the 2010s. It didn't necessarily feel dramatic right there and then. Except suddenly you realize that you are ordering your plane tickets online, paying all your bills through a computer and ordering books and other items through Amazon rather than walking into a physical store.
Artificial Intelligence, like the internet, is transforming every possible aspect of our life, but in this story I want to discuss the creation of digital art specifically through the use of artificial intelligence. While you may not care about that subtopic in particular, it does say a lot about both the opportunities and limitations of artificial intelligence in general.
For the last two months, I have gone down the rabbit hole of AI Art. Having spent much time making images and using different tools, I feel I have obtained a good sense of what this process is like. Before going down this path, I had pondered several questions in relation to AI Art:
Will this make artists obsolete?
How difficult are these tools to use?
What kind of skill and talent is involved?
What are the limitations?
What is Talent?
I can say that after countless hours, I am now able to produce significantly better images than I did when I started. But that doesn't necessarily mean that it is a complex process with many moving parts I have to master. A lot of it is about knowing what to do and not do. I am reminded of a story about a man who opened car doors for a living. He would get called upon when people had gotten locked out of their cars.
When he started, he would spend close to an hour of effort and struggle to get the door open. Customers would stand by, and after an hour they would be overjoyed about what he had achieved. They would give him an extra tip for the accomplishment.
Fast-forward 20 years, and he walks up to a car and opens the door in less than a minute. Rather than being thrilled about their problem being quickly solved, customers are more often than not annoyed at how much he charges for a "trivial" job. "That much for a minute of work?" they exclaim. His response is always: "You are paying me for 20 years of experience and 1 minute of work."
The Art of Cooking an Image
Countless things are easy once you know them, but getting there can take time. For some reason, my AI art journey makes me think about cooking food. Perhaps because it is something everyone can relate to, and because my teenage son recently made the Indian dish Biryani from scratch. He did an impressive job, I thought. I am a proud father. It takes me back to when he was a toddler and I made mashed potatoes with him. He would stand on a chair, and we would add garlic, salt, pepper, butter, and lemon juice. I would let him stir, and then I'd have him taste. I would ask him how he thought it tasted and what was missing. I would show him how the taste would change if we added more salt, pepper, or butter.
It goes to the heart of what, I think, is important in cooking or almost any human endeavor: Multiple iterations guided by quality feedback. I have come to realize that being a good cook is a lot about the ability to taste food and know what good food tastes like. Of course, anyone can taste food, but how good are you at tasting something and knowing what is missing? That is a skill acquired through practice.
Just because you can buy ingredients in a store and read a recipe doesn't mean you can make great food. Creating AI art is a bit similar. Your ingredients are words and typically a primitive drawing, providing some guidance about what you want out.
In cooking, you can choose not only whether you should add butter or milk, but also how much. Likewise with AI art creation, you can not only add a word to a description of the image, but you can also decide how strongly weighted that word should be. You may not regard every word in the description as equally important.
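To make the idea of weighted words concrete, here is a minimal sketch. The "(phrase:1.4)" notation is an assumption for illustration: some tools use something similar, but the exact syntax differs from tool to tool, and the function name parse_prompt is made up for this example. It only shows how a description can be split into phrases, each with its own "amount."

```python
import re

# Hypothetical syntax: "(phrase:1.4)" means "weight this phrase 1.4x".
# Anything without an explicit weight counts as 1.0.
WEIGHTED = re.compile(r"\(([^:()]+):([0-9]+(?:\.[0-9]+)?)\)")

def parse_prompt(prompt):
    """Split a comma-separated prompt into (phrase, weight) pairs."""
    parts = []
    for chunk in prompt.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = WEIGHTED.fullmatch(chunk)
        if match:
            parts.append((match.group(1).strip(), float(match.group(2))))
        else:
            parts.append((chunk, 1.0))
    return parts

print(parse_prompt("portrait of a knight, (copper armor:1.4), (fog:0.6)"))
```

Here "copper armor" is added with a heavy hand and "fog" with a light one, much like choosing how much butter goes into the pot.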
Once you have your ingredients, you can start cooking. You can "cook" your text input for a short or long time to get different output images. Just like you taste food along the way, you "taste" the images the AI outputs by considering how close they are to what you want and what you need to adjust to get there.
As with cooking, you need experience to know what is best to tweak to get closer. Maybe you need to add another word into the pot to get the perfect image. Or perhaps you require "less" of one of the words.
In other ways, AI Art is like working in a workshop. Occasionally, the reason something feels so difficult is that you are simply using the wrong tool for the job. My first images were somewhat crappy, in part because I was using the wrong AI models.
What is an AI Model?
What is an AI model, you may ask? Remember trigonometric functions such as sine and cosine from school? You have them on a regular calculator. In middle school, you are introduced to functions such as f(x), expressed as "f of x". Often these functions would be defined as something like f(x) = 4x + 2. It means you put in some value x and get some value out. The numbers 4 and 2 are coefficients. They determine the sensitivity of the output to the input.
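If you prefer code to math notation, the same function can be written as a few lines of Python. This is just the school example restated, nothing specific to any AI tool:

```python
def f(x):
    # 4 and 2 are the coefficients from the example f(x) = 4x + 2
    return 4 * x + 2

for x in (0, 1, 2, 10):
    print(x, f(x))

# Increasing the input by 1 always increases the output by 4.
print(f(3) - f(2))
```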
AI Art works basically the same way. You have a kind of mathematical function. Rather than putting in a single number, you put in numerical representations of text and images. Out comes the generated image. We call such a large, complex function an AI model. Training an AI model means adjusting millions of coefficients until you get the desired output for a given input.
Scientists have already done that for us. They have collected millions of images from different image libraries online together with their captions. The training process works by making each image deliberately crappy by adding noise and artifacts to it. You input a bad image together with its caption. Then the output is compared with the original unaltered image. The difference between these images is used to adjust the coefficients (called weights) in the AI model.
We are back to the cooking analogy. This is like tasting the food you are cooking. Except there is an automated program that does it for the scientists. An automated program makes the comparison between actual output and desired output. The AI model is then automatically adjusted. This keeps getting repeated over and over and over again with countless images until the difference between desired and actual output is minimized.
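The "taste, adjust, repeat" loop can be sketched in a few lines. This is a toy under heavy assumptions: the "model" has only two coefficients instead of millions, the desired outputs come from the f(x) = 4x + 2 example rather than from images, and the function name train and its parameters are made up for illustration. The adjustment rule is plain gradient descent, a standard technique, not necessarily what any particular AI art tool uses internally.

```python
def train(samples, steps=2000, learning_rate=0.05):
    """Nudge two coefficients until actual output matches desired output."""
    w, b = 0.0, 0.0  # start with "wrong" coefficients
    n = len(samples)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, desired in samples:
            actual = w * x + b
            difference = actual - desired  # the automated "tasting"
            grad_w += difference * x / n
            grad_b += difference / n
        # Adjust the coefficients a little in the direction that shrinks the difference.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Desired outputs generated from f(x) = 4x + 2.
samples = [(x, 4 * x + 2) for x in (-2, -1, 0, 1, 2, 3)]
w, b = train(samples)
print(round(w, 3), round(b, 3))
```

After enough repetitions the coefficients end up very close to 4 and 2, even though nobody told the program those numbers directly. It only ever saw inputs, desired outputs, and the difference between desired and actual.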
You might ask how on Earth you get art out of that process. It doesn't sound like a very creative process. All you are doing is training an AI to remove artifacts. It is just taking an image and making it a bit prettier.
But here is the kicker: This process is driven to extremes. As the AI gets better, they make the images more and more noisy until they are almost completely unrecognizable as images. At that point, you can input almost any random noise and get out something that looks like a real picture. Essentially, you have a ghost in the machine. The AI thinks it is reconstructing an actual image covered in noise. It doesn't know that there was never any image there in the first place. The AI, however, just faithfully does what it has been trained to do: Create pretty pictures from messy input.
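Here is a small sketch of what "more and more noisy" means. It assumes a made-up five-pixel grayscale image and a simple linear blend between image and noise. Real systems use their own noise schedules, so treat the function add_noise and its level parameter as illustrative only.

```python
import random

def add_noise(pixels, level, rng):
    """Blend each pixel with random noise.

    level=0.0 keeps the image untouched; level=1.0 leaves only noise.
    """
    return [(1 - level) * p + level * rng.random() for p in pixels]

rng = random.Random(42)
image = [0.0, 0.25, 0.5, 0.75, 1.0]  # a made-up 5-pixel grayscale "image"

for level in (0.0, 0.5, 1.0):
    noisy = add_noise(image, level, rng)
    print(level, [round(p, 2) for p in noisy])
```

At level 1.0 nothing of the original image survives, which is exactly the situation where the AI "reconstructs" a picture that was never there.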
Remember, the AI model is trained with both image captions and images. That means that by providing a caption to an image which is just noise, you bias the AI toward generating an image with similarities to other images having a similar caption.
This may help dispel a common misconception about AI art creation. Having an AI art generator is not like having a talented artist whom you are instructing in detail about what to make. The AI has absolutely no concept of what it is making. It doesn't really know what the words you write mean. It just matches up words in a statistical fashion with imagery it has seen matching similar texts before.
AI Art Is More Like Googling Than Instructing an Artist
You should not think about writing textual descriptions of images you want as giving commands to the AI, but as providing a Google search. Implicitly, the AI model stores millions of pictures. These pictures are all endless combinations and permutations of the images it got trained on, together with related captions. As an AI art creator, you are trying to Google your way to the right image.
Why is that distinction important to keep in mind? Because anyone experienced with searching on Google knows that you cannot give explicit search instructions such as "find a web page with a red banner written in Cyrillic text and with clown pictures placed on the right side." A bizarre example, but it illustrates that you cannot give detailed instructions about what a web page looks like when you are searching.
Likewise, the AI has been trained on images with captions. It has not really learned which part of an image is a denim jacket or copper buttons. All it has seen are images with those captions, where those things happen to be somewhere in the image. Hence, isolating words to particular parts of an image is hard. Say you write that your character should have a "black jacket." That risks also turning their hair black unless you have described the color of the hair using words that tend to be unique to hair. E.g., you could write "blonde hair" and that will not turn the pants or jacket light yellow because we rarely describe clothes using the word "blonde".
Thus, writing a good description of an image often relies on using quite unique words which cannot be mistaken. In various guides, they will refer to those as powerful words. When I first read advice on this, I thought the AI models had been programmed to react specifically to these words. However, that is not the case. It is simply a side effect of the training. Very generic words will match far more images, and thus are poor at narrowing down what you want.
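The effect of generic versus unique words can be illustrated with a toy search over captions. To be clear, an AI model does not literally search a list of captions. This sketch only mirrors the Googling analogy, and the captions and the function name matching are made up for the example.

```python
import re

# Made-up captions standing in for the text attached to training images.
captions = [
    "woman in a dress at a wedding",
    "red dress on a mannequin",
    "girl in a summer dress, beach",
    "Cinderella in a blue ball gown dress",
    "dress shirt and tie on a hanger",
]

def matching(captions, word):
    """Return the captions that contain the given word."""
    word = word.lower()
    return [c for c in captions if word in re.findall(r"[a-z]+", c.lower())]

print(len(matching(captions, "dress")))       # generic word: matches everything
print(matching(captions, "Cinderella"))       # unique word: narrows it down
```

A generic word like "dress" pulls in weddings, mannequins, and even dress shirts, while a unique word points at one kind of image.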
Somebody might write "woman in a dress" and think the AI engine will spit out Cinderella. Instead, they may get a heavy-set older woman. One could write a "pretty young woman in a dress," but many of these words are somewhat subjective. Remember, this is all about how images have been originally captioned. The AI has no concept of what pretty or handsome means.
Are images of pretty women usually captioned as such? Probably not. That is a challenge when describing images. You must do it by proxy. How would somebody caption images matching what you want? Thus, instead of writing "pretty" it may be better to write the name of a well-known photographer known for pretty portraits. Alternatively, you can use names of celebrities or TV series with pretty young people.
Pop culture is a treasure trove in this case. Say you want some kind of spaceship stuff? Better to specify things like Star Wars or Star Trek, for which there are numerous images which often have relevant captions. That helps the AI narrow down the style you want. I have personally found that I have had to familiarize myself with various artists and their styles when making images because describing a style in words is quite hopeless. It is better to describe it by using the name of an artist. You can also use many artist names. That typically works better than one. If you can, list many artists working in a similar style to give the AI more images to work with when producing what you want. Unique names are important. An artist named John Smith would be hard to utilize, as it becomes hard to narrow down images with such a common name associated with them.
Artists, TV shows and celebrities with unique names thus represent more powerful words in terms of image creation than common and ordinary words applied to many things. When creating photorealistic images, for instance, it could help to specify the name of common cameras used by professional photographers, because such info is often displayed in captions to images on various web sites with large collections of images, such as Flickr. An AI model trained on such image databases will thus have learned to associate many high-quality images with names of specific camera models or camera types.
An image caption isn't necessarily going to explicitly state "photorealistic" next to a photorealistic image. Instead, it may state the camera used. Thus, names of cameras can act as proxy words for photorealism, just like the name of a celebrity or TV series may act as a proxy for "pretty".
Pop culture icons and characters also represent powerful words. Write in the names of well-known comic book heroes or villains, for instance, and you notice that images that used to be dull and lacking in detail and crispness suddenly explode with color and clarity. But why is that? Because these characters have been drawn in countless images online, not just by professional illustrators but also in countless pieces of fan art. To extract the ability to draw something well, the AI needs a lot of images. Pop culture icons typically have that.
Of course, not all of that is high quality. That is why you also typically need to write words which can filter out the bad images or narrow down the good ones. You can do that by naming quality artists directly, or by writing things such as "trending on Artstation." The assumption would be that images which are trending are likely to be higher quality than those which are not.
Do you see the pattern? You can rarely specify quality directly as some sort of instruction to the AI. Rather, you need to find ways of telling the AI that you want quality indirectly. That is why writing text prompts becomes something of an art form. Through experimentation, you gradually develop an intuition of how you must describe what you want.
Will Artists Be Replaced?
With ChatGPT, we ask if writers will be replaced, and with software like Stable Diffusion we ask if artists will be replaced. The short answer is no, but that is obviously an unsatisfactory answer lacking in nuance.
AI art has some clear limitations in that it doesn't have a proper 3D understanding of the world. Remember, it is really just working with flat 2D images. That means it does a very good job making images where a person faces a camera with their arms down. In these cases, matching up different parts of a person between different photos is easy. But once you deal with more complex poses and movements with arms and legs, everything gets way more complicated.
Hands are infamous for looking like a horror show in AI Art. A key reason is that we humans often orient our hands in photos in numerous variations, which an AI will struggle to capture and distinguish. That means an AI today can typically outperform the average artist in terms of making beautiful portraits. But as soon as the image is more dramatic with more movements and with more complex interactions with the environment and other characters, then the AI tends to fall flat on its face.
The result is similar to what I see with ChatGPT: People can get a grossly inflated idea of how capable the system is because it can perform so well within the narrowly defined subset of images it is good at making. It is a bit like how a calculator is really good at multiplying huge numbers. Much better than a human. But does that mean a calculator can do complex mathematics? No, it can't. AI today is a bit like some kind of excessively talented person with a very bad case of autism. It is easy to think somebody with autism is really sharp because of a very special and narrowly defined ability which is way beyond what a normal person can do. We forget that that person still has a very severe handicap and cannot deal with situations we deem easy and trivial. AI is the same. It has extreme talent in some areas while being far behind an average person in what we deem trivial tasks. I saw the infamous psychology professor Jordan Peterson make this cognitive mistake when discussing ChatGPT. He got mesmerized by the ability of ChatGPT to do very academic and nerdy stuff. He forgot to ask it to do more mundane tasks because he assumed an AI operates like a normal human.
For instance, making a comic book with AI would be quite difficult because retaining accurate style across multiple images is nearly impossible. Say characters are inside a house and you see the interior from different angles. It will be very hard to make that interior look the same in each image if you make those with an AI art generator. Likewise, characters in different situations will get numerous details changed. Their belt may change style. The color of their jacket and their pants may get swapped. Their faces may change in minor but noticeable ways.
Thus, AI is excellent at making standalone images used in things like posters and book covers. But if the images are part of a related set of images, such as in a comic, step-by-step instructions or similar, then AI tools will not be very helpful.
Thus, certain artist skills will be less valuable in this new era. However, that is not new. All through history, advances in technology have rendered certain skills obsolete. Early craftsmen had to be very good with eye measurement and estimation. The introduction of high-precision measuring instruments and precision tools rendered a lot of that skill obsolete.
The ability to draw on a computer made things like being good at mixing physical colors less valuable. Consider the time of Leonardo da Vinci. Studios kept chickens in cages because they used eggs to make their colors from scratch. How many artists know how to do that today? But that hasn't rendered artists obsolete. Instead, the focus changes.
Existing artists will have to either learn to use AI tools themselves, or simply choose to specialize in a niche where AI art currently cannot compete.
Personal Perspective
I, personally, don't like the kind of art that goes into art galleries. For some reason, that is what we typically label as art today. But most art produced today doesn't go into art galleries. Countless talented artists are illustrating beautiful children's books, graphic novels, posters, book covers, computer games, web pages, news articles, science books and many other things. Many are on the streets in various popular cities selling their renditions of beautiful or interesting parts of the city. I, personally, own some illustrations of Amsterdam that I bought from one of these guys, which I am very much in love with. They have such fresh colors. This is the kind of art I love and appreciate. The internet is full of it.
This is the kind of art I wanted to make. I love science fiction and fantasy and love seeing renditions of these fantastical imaginary worlds. Or the people or creatures within them, whether orcs, dwarves, or robots. A long time ago I gave up drawing these kinds of things. I simply did not have the talent. Or maybe I didn't have the patience. I was always known as one of the best, if not the best, in my class at drawing. But when it comes to art, that is rarely enough. Good artistic skill is quite rare.
With AI art, an old passion and hobby is suddenly made possible again. Amazing and beautiful creations can be brought to life with ease, often only bounded by your own imagination. You don't necessarily get exactly what you want, but one less obvious advantage I see is its use as a tool for writers. I have tried my hand at writing fiction on several occasions.
Frequently I find that how well I can picture my world visually limits my ability to create that world in writing. AI Art can create images with so much atmosphere and detail that it can really unlock your own imagination as a writer. You can see new possibilities and perspectives which had not been clear to you before.
As a writer, I think I actually see this as more important than ChatGPT and its ability to write text. ChatGPT simply doesn't write stuff that feels very imaginative or novel to me. It suffers the same problem as AI Art suffers when trying to create a collection of images which are part of one comprehensive story. ChatGPT cannot create a consistent and coherent story. Not surprising since ChatGPT is essentially a sophisticated bullshit generator. AI Art is much the same: A very sophisticated parrot which can mash together all sorts of artistic styles and elements, but without really knowing what the hell it is doing.
Conclusion and Predictions
This revolution has really just begun. It will change our lives in profound ways, the way the internet and smartphones did. Some things will stay the same. You will still live in a pretty ordinary-looking house and not in a glass bubble floating in the air. You will not drive to work in a rocket-powered flying car. And on the surface, our society will look very familiar. It is not like when cars and tractors replaced horses.
But the nature of very many office jobs will change in profound ways. The media we consume will change. I notice on art sites how AI Art has caused an avalanche in image production. It is not just that quantity has increased; quality has generally increased considerably too. We are not just talking about a few percent, but orders of magnitude. The ability to produce high-quality images in a short time has dramatically changed.
My prediction is that this will significantly impact the production of comic books and graphic novels. It will impact the creation of computer games. For a hobbyist like myself, it is suddenly much easier to simply generate beautiful images to use in games. That will cause a revolution for independent game developers who lack their own art divisions.
I also think that over time this will have a major impact on movie creation, although that may take a longer time. And with images much easier and cheaper to create, they will likely be used a lot more. Articles and books will likely use a lot more illustrations than in the past.
Education is another area that will have to change. The kinds of essays students are typically asked to write are the very kind AI tools are perfect at writing. In my mind, it must be possible to give better assignments, because if AI software were truly good at writing anything, I would have used it to write this article. I didn't, because all my attempts produced really lame articles full of platitudes and superficial characterizations.
Perhaps there is an elementary solution: Teachers should simply ask ChatGPT to write an answer to their assignment. If they get a great reply, then the assignment probably wasn't very good. Rinse, repeat until you start getting answers which suck. When the AI is giving you worthless answers, then you know you most likely have a good assignment.