As I read more about the details of ChatGPT and AI art creation tools such as Stable Diffusion, I am left with the impression that these systems were not really planned. Rather, something quite amazing arose from attempts to solve much smaller, more narrowly defined problems, problems which at first glance have much less to do with intelligence.
This makes what we see as the state of the art in AI today profoundly different from the AI systems of the past. Those earlier systems were built from scratch to attempt to achieve general intelligence, but never got anywhere close. I remember reading years ago about a system built on a long list of hand-crafted rules which could be combined in a logical manner, in an attempt to mimic how a human applies logic.
These old AI systems are the ones that left pop culture with the perception that artificial intelligence must be cold, calculating and excessively logical. The character Data in Star Trek is a classic case of this stereotypical idea of artificial intelligence.
Today's systems such as ChatGPT and Stable Diffusion are much messier and were not really planned explicitly as general intelligence. In so far as they can be said to be intelligent, that intelligence is much more of an emergent phenomenon.
ChatGPT today can solve complex programming problems and write long, complex articles on a variety of topics. Stable Diffusion can create beautiful and original-looking art illustrations.
Based on what they do, it may be tempting to think that this technology came from scientists sitting down and working out how to build a system that understands the process of creating an image, or how to reason about and understand complex essays and text. You might imagine someone has programmed in knowledge of different kinds of brushes, colors, how to draw eyes, and so on.
Except nothing remotely like that is behind this revolution. Instead, ChatGPT is essentially a spell checker on steroids, and Stable Diffusion is really just an evolution of software for removing blur and noise from an image. When you take a photo, you can get noise and artifacts you want to remove. There was thus a market for software that would fix up images to make them look better.
Spell checkers are based on using statistics to guess the correct word in a sentence. Given several preceding words, the spell checker knows the statistical probability of different words following. Some words are simply more probable than others.
In other words, there is no real understanding of the text. Rather, by training on lots of text, the spell checker has built up statistics about which words typically follow which chains of words.
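To make that concrete, here is a toy sketch in Python. The mini corpus and the simple word-pair counting are made up purely for illustration; real spell checkers and language models use vastly more data and far more sophisticated models, but the core idea of counting which words tend to follow which is the same.

```python
# Toy illustration only: count which word most often follows another.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# For each word, count how often every other word follows it (a bigram model).
follow_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    follow_counts[prev][word] += 1

def most_probable_next(prev_word):
    # Return the statistically most likely continuation seen in the corpus.
    counts = follow_counts.get(prev_word)
    return counts.most_common(1)[0][0] if counts else None

print(most_probable_next("the"))  # -> "cat", the most frequent follower of "the"
```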
The AI art creation library Stable Diffusion is similar. It has been trained on real-world images to which noise and artifacts have been added. This lets the trainers show the neural network what the correct, clean output should be in each case.
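As a rough, purely illustrative sketch (treating an image as a plain NumPy array), a training pair could be produced like this. Actual Stable Diffusion training works in a learned latent space with carefully scheduled noise levels, so this only shows the principle of "noisy input, clean target":

```python
# Illustration only: make a (noisy input, clean target) training pair.
import numpy as np

def make_training_pair(clean_image, noise_level=0.5):
    # Add random noise to the clean image; the original stays as the
    # "correct answer" the network should learn to recover.
    noise = np.random.normal(0.0, noise_level, clean_image.shape)
    noisy_image = np.clip(clean_image + noise, 0.0, 1.0)
    return noisy_image, clean_image

clean = np.random.rand(64, 64, 3)           # stand-in for a real photo
noisy, target = make_training_pair(clean)   # network input vs. desired output
```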
What computer scientists have come to realize is that these simple, rather narrowly defined principles can be taken to extremes and when you do that, you get something that starts looking eerily like intelligence.
But how can predicting the next word in a sentence create the ability to write whole essays on a complex topic?
It is because the prediction can be repeated without end. Once the AI adds one probable word, it can repeat the same process to add yet another word. This can go on for as long as you like to produce entire texts.
Answering a question typed by a user can be seen as just another variant of guessing the next word. An answer usually follows a question. What is the most probable next word for a given sentence representing a question? The AI can statistically determine it. Based on that word, it can guess the next word, and so on.
Notice how this process does not rely on understanding that we are even dealing with a question, or what that question is. All we are doing is finding words that are statistically likely to follow a sentence or longer text. Words in an answer are in that regard no different.
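Here is a deliberately tiny sketch of that loop in Python. The lookup table stands in for a real trained model, and all the names are made up; the point is only the repetition: guess one word, append it, and guess again from the longer text.

```python
# Toy stand-in for a language model: a lookup from "text so far" to the
# most probable next word. A real model computes these probabilities.
toy_model = {
    "What is 2 plus 2 ?": "It",
    "What is 2 plus 2 ? It": "is",
    "What is 2 plus 2 ? It is": "4",
    "What is 2 plus 2 ? It is 4": ".",
}

def next_word(context):
    return toy_model.get(context)

def generate_answer(question, max_words=20):
    text = question
    for _ in range(max_words):
        word = next_word(text)
        if word is None:              # no likely continuation: stop
            break
        text = text + " " + word      # append the guess and repeat
    return text

print(generate_answer("What is 2 plus 2 ?"))
# -> "What is 2 plus 2 ? It is 4 ."
```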
This iterative process is also how AI art is created. The AI has been trained to statistically guess what a more probable image should look like given an input image. It has been trained on blurry and noisy images, but it doesn't really know what a blurry or noisy image is. It therefore sees no difference between an actual image which is a little degraded and an image which is nothing but pure noise. That means the AI can turn pure noise into something a bit closer to an image. This new image can be fed in again, and the process repeated over multiple steps, just like the word-guessing game of ChatGPT. This simple approach allows us to turn arbitrary noise into a whole, fully detailed picture.
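The loop itself can be sketched as follows. Note that the `denoise_step` below just averages neighbouring pixels, which is nothing like what Stable Diffusion's trained network actually does; it is only meant to show how the same simple step gets applied over and over, starting from pure noise.

```python
# Illustration only: apply the same "clean up a little" step repeatedly.
import numpy as np

def denoise_step(image):
    # Stand-in for the trained network: average each pixel with its four
    # neighbours. A real model instead predicts a more plausible image.
    padded = np.pad(image, 1, mode="edge")
    return (padded[:-2, 1:-1] + padded[2:, 1:-1] +
            padded[1:-1, :-2] + padded[1:-1, 2:] +
            padded[1:-1, 1:-1]) / 5.0

image = np.random.rand(64, 64)   # start from pure noise
for _ in range(50):              # repeat the step many times
    image = denoise_step(image)
```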
In this case, we are dealing with the victory of quantity over quality. As Stalin supposedly said, "Quantity has a quality of its own." By simply being trained on a huge amount of human text, ChatGPT has managed to devise probabilities around a huge number of questions and their answers.
It is important to understand that none of these systems are reproducing concrete answers stored in the training data. Rather, a statistical understanding of countless related questions and answers has been distilled. The same applies to images: Stable Diffusion has distilled how crude artifacts turn into real-looking images. It doesn't store actual images in a database.
In a way, we may have stumbled upon a viable path towards real artificial intelligence without even trying to build it. Instead, it is an outgrowth of a much simpler idea.
Of course, I am dumbing down this narrative. Naturally, people working on this saw the potential for broader intelligence and have tried to tailor their systems more towards it. What I am talking about are the core ideas and principles, and those were not designed to build general intelligence. We used to joke that Google search might eventually turn into a sentient being. In reality, the thing that eventually evolves into general intelligence may just be your good old Word spell checker on steroids.
Please note that this whole article is a quite subjective observation and reflection. I am personally nowhere near an expert in this field. See this as speculation rather than absolute truth. It may, for instance, very well be that the potential to evolve these systems into a general or broader intelligence was understood by AI researchers from the very beginning. I am very open to criticism and corrections.
My interest is really just in thinking out loud about this topic, getting you to think about it, and maybe prompting you to do your own research.
Great article! There is a lot of emphasis these days on the probabilistic nature of trained systems, but that may be more a means of getting mechanisms to emerge than being fundamental to the mechanisms these systems end up developing.
If you reverse-engineer a trained neural (deep learning) network, you do see patterns that correspond to components carrying out specific operations on data (looking for edges, shapes, etc.). That is to say, even though we don’t plan and build the mechanism, training may be developing mechanisms that we could reverse-engineer and that would help us understand what is going on. Trained algorithms as a tool of science, even if not a product of science.
For a while in my career I worked with people who worked in one branch of symbolic AI called Case-Based Reasoning. The basic idea is that humans store a collection of memories of situations and stories, and then have ways of combining and adapting those stories in new situations.
I suspect that LLMs (ChatGPT etc.) are doing something similar, looking for patterns in past cases and adapting them to new situations. The fact that LLMs can get so far on pattern matching and adaptation supports many insights from the CBR community. Humans seem to do a lot of remembering and adapting of past cases. But it also seems not to account for all of human intelligence.
LLMs also sometimes fail. They are not great at reasoning, and sometimes invent new stuff without testing their suppositions against available evidence and background knowledge. A little like a bright enthusiastic High School student who has new and creative ideas about science but has not yet developed the discipline to take generated ideas as hypotheses rather than as knowledge.
There is a related idea in analyzing human intelligence. Given the long delay that can be demonstrated in conscious reaction, where we react before we are conscious of the input, one hypothesis is that we are running a predictive model which allows us to react in real time. Perhaps humans are a generative AI with after-the-fact corrections. This may explain those videos where we watch a basketball game and don't notice the guy in the gorilla suit in the background: how would we generate that? And if it does not interact with the events we are interested in, we just eliminate it as noise.