
GPT Explained for a 5-Year-Old

Updated · 8 min read

Remember late 2022, when ChatGPT was released? The whole world was amazed, and for most people it seemed like some kind of magic that could answer any question. The technical documentation is hard to understand at first glance. However, if you take a closer look at how this amazing technology works, you'll quickly see the magic behind it.

Transformers

A transformer, by definition, is something that transforms one thing into another. In other words, it converts an input into some other form.

Take a juice maker as an example. We put fruit into the machine, and in return we get juice out of it. Here, the juice maker is acting as a transformer.

So, in simpler words, a Transformer takes an input and returns an output in a desired form. In the image below, an input of Type A gets converted to Type B by the Transformer.


Meaning of GPT

So, now that we know what a Transformer is, how does this concept fit into AI, especially into products like ChatGPT? First, let's understand the term GPT.


In simple terms, GPT stands for Generative Pre-trained Transformer:

  1. Generative → it generates text, numbers, and so on.

  2. Pre-trained → it has already been trained by its creators on a huge library of text: books, internet articles, and more.

  3. Transformer → it takes the previous words as input and outputs the next word.

Now, if we look at a product like ChatGPT, we can see that GPT is the real technology, the model that does all the work, while the prefix "Chat" signals that we can have a conversation with this Transformer.

Now that we know what GPT stands for, it's time to see the mechanism behind this awesome technology.

Fancy Next Word Predictor

Let's take the example of a classroom. We were taught counting as kids in school. So, if I ask you what number comes after 3, you will instantly answer 4. If I ask for the next one, you will answer 5. If I ask what comes after A, you will reply B.

Similarly, in another scenario, if I call you on your mobile, what's the first thing you would say? Most probably "Hello?".

In both of the above examples, our minds are already trained to predict what comes next, which is why we reply so quickly. The same concept drives models like GPT.

Hence, all GPT does is take the previous words, text, or numbers and predict the next one.

The GPT model will not produce the entire output at once. Instead, it outputs one token at a time, and that same token is fed back into the GPT Transformer as input.

A token could be a word, a part of a word, a number, or a letter: anything the Transformer can parse and work with. From here on, I'll use the word "token" instead of words, numbers, and so on.

So, the output of the Transformer is fed back into it again as input. This process keeps going until some ending token like <EOF> is encountered.
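This feedback loop can be sketched in a few lines of Python. Here `predict_next` is a hypothetical stand-in for a real GPT model, with hard-coded replies purely for illustration:

```python
def predict_next(tokens):
    # A made-up "model": maps each sequence seen so far to the next token.
    script = {
        ("Hello",): "Hi,",
        ("Hello", "Hi,"): "How",
        ("Hello", "Hi,", "How"): "are",
        ("Hello", "Hi,", "How", "are"): "you?",
        ("Hello", "Hi,", "How", "are", "you?"): "<EOF>",
    }
    return script[tuple(tokens)]

def generate(prompt_tokens):
    tokens = list(prompt_tokens)
    while True:
        nxt = predict_next(tokens)   # predict one token...
        if nxt == "<EOF>":           # ...until the end marker appears
            break
        tokens.append(nxt)           # feed the output back in as input
    return tokens[len(prompt_tokens):]

print(generate(["Hello"]))  # ['Hi,', 'How', 'are', 'you?']
```

A real model predicts the next token with a neural network instead of a lookup table, but the loop itself works exactly like this.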

In ChatGPT, when you say "Hello", it might print back "Hi, How are you?", one token at a time:

| Input | Output |
| --- | --- |
| Hello | Hi, |
| Hello Hi, | Hello Hi, How |
| Hello Hi, How | Hello Hi, How are |
| Hello Hi, How are | Hello Hi, How are you? <EOF> |
| <EOF> detected | Ends printing |

In simpler terms, this is how Transformers work: they predict the next word, just like your Android/iOS keyboard suggests the next word as you type.

Hence, it's nothing but a fancy next-word predictor.

How GPT Reads and Understands Everything You Say

Now we will go a level lower and learn how a Transformer understands your query. At its core it is an encoder and a decoder working through several phases, and we will walk through each of them now.

Tokenization → Break the words

When we humans try to understand something, we don't take in the whole sentence as one unit. Instead, we understand the individual words in the sentence to grasp the actual meaning of what is being said. Our brain automatically breaks a big sentence into small chunks and analyzes each part so we can understand it better.


Similarly, when the Transformer gets some input from the user, the query is broken into multiple tokens, and each token is assigned a number that the Transformer uses to perform further operations. This process is called tokenization.
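Here is a toy tokenizer to make this concrete. Real tokenizers (such as byte-pair encoding) split text into sub-word pieces rather than whole words, but the idea is the same: text in, list of token IDs out.

```python
def tokenize(text, vocab):
    # Split on spaces and assign each new word the next free ID.
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
        ids.append(vocab[word])
    return ids

vocab = {}
print(tokenize("Hello how are you", vocab))  # [0, 1, 2, 3]
print(tokenize("how are you", vocab))        # [1, 2, 3]
```

Notice that the same word always maps to the same number, so the model downstream only ever deals with numbers, never raw text.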

Vector Embeddings → Group Tokens by Meaning

When I say the word "Paris", the first thing that comes to your mind will probably be "Eiffel Tower". Similarly, when I say "AI", "ChatGPT" might be the first thing you think of. So our minds keep certain words grouped together by meaning, which makes it easier for us to relate things to one another.

Similarly, the Transformer takes tokens and places them in a vector space, which can be 2D, 3D, or have many more dimensions, where words with similar meanings are grouped together. "India" and "France" are countries, so they are placed near each other, while "Eiffel Tower" and "India Gate" are monuments and sit together, which makes mapping between them easier.

In large LLMs, these maps are huge, with lots of dimensions, which allows rich relationships between the tokens.
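We can see this grouping with a tiny made-up example. The 2-D vectors below are invented for illustration (real embeddings have hundreds or thousands of dimensions), and cosine similarity is a standard way to measure how close two vectors point:

```python
import math

# Toy 2-D embeddings: the numbers are made up so that
# related words sit close together.
embeddings = {
    "india":        [0.9, 0.1],
    "france":       [0.8, 0.2],
    "eiffel tower": [0.1, 0.9],
    "india gate":   [0.2, 0.8],
}

def cosine(a, b):
    # Cosine similarity: 1.0 means "same direction", 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# The two countries score higher with each other than with a monument.
print(cosine(embeddings["india"], embeddings["france"]))
print(cosine(embeddings["india"], embeddings["eiffel tower"]))
```

Running this shows "India" is far more similar to "France" than to "Eiffel Tower", which is exactly the kind of relationship the vector map encodes.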

Positional Encoding → Right Token at Right Place

If I say "Hello, how are you?" and "Hello, are you how", which one makes more sense? Obviously the first one. Our brain uses the position of words to make something meaningful out of them.

Similarly, Transformers attach a position signal to each token's representation, so the model knows exactly where each token sits in the sentence. Without it, "Hello, how are you?" and "Hello, are you how" would look identical to the model, since it sees all tokens at once.
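One classic way to build that position signal is the sinusoidal encoding from the original Transformer paper: each position gets a unique pattern of sines and cosines that is added to the token's embedding. A minimal sketch:

```python
import math

def positional_encoding(position, dim):
    # Each dimension uses a different wavelength, so every
    # position produces a unique vector of the given size.
    pe = []
    for i in range(dim):
        angle = position / (10000 ** (2 * (i // 2) / dim))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

# Different positions get different vectors, so the model can
# tell the first "Hello" apart from a "Hello" later in the text.
print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
print(positional_encoding(1, 4))
```

Many modern models use learned or rotary position encodings instead, but the purpose is the same: stamp each token with "where am I in the sequence?".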

Semantic Meaning → Right Meaning

If I say "The cat chases the mouse" and "The mouse chases the cat", you'll notice both sentences have the same words, but the meaning of each is completely different. So the right order is required to arrive at the right meaning.

Similarly, Transformers don't just look at position; they also try to understand the meaning of the tokens based on context. By capturing this semantic meaning, the model can produce the right output for the query it was given.

Self-Attention → Don’t forget the Neighbors

When you hear a statement like "A big bank will not lend money to risky clients", how do you know we are talking about a financial bank and not a river bank? You'll say the words "lend money" and "risky clients" are enough to tell you we're not talking about river banks.

Similarly, the Transformer checks every token against its neighboring tokens to understand how the different tokens relate to one another.
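This "check every token against every other token" step is called scaled dot-product self-attention. Below is a minimal sketch over made-up 2-D vectors: each token's output becomes a weighted mix of all tokens, where the weights measure how well they match.

```python
import math

def softmax(xs):
    # Turn raw scores into probabilities that sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(q, k, v):
    # For each query vector, score it against every key,
    # then mix the value vectors using those scores as weights.
    dim = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dim)
                  for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out

# Three toy tokens; the first two point the same way, so they
# attend mostly to each other, while the third stands apart.
x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(self_attention(x, x, x))
```

In a real model the queries, keys, and values are learned projections of the token embeddings rather than the embeddings themselves, but the mixing step is exactly this.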

SoftMax → What comes next?

I have a small exercise for you. It’s called "complete a sentence." Here’s the sentence: “I’m going to ….” What came to your mind? The market, the gym, or somewhere else? Our brains find many words and choose what feels best.

The Transformer does the same thing. At the current token, it scores every token that could come next, converts those scores into probabilities, picks the token with the highest probability, and then the process repeats.
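The score-to-probability conversion is the softmax function. Here is a sketch with made-up scores ("logits") for three candidate next words; the candidates and numbers are invented for illustration:

```python
import math

def softmax(logits):
    # Subtracting the max keeps exp() numerically stable;
    # the result is a list of probabilities summing to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

candidates = ["market", "gym", "moon"]
logits = [2.0, 1.5, -1.0]   # raw model scores (made up)

probs = softmax(logits)
best = candidates[probs.index(max(probs))]
print(best)  # market
```

Always picking the single highest-probability token is called greedy decoding; real systems often sample from the probabilities instead, which is where temperature (below) comes in.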

Multi-Head Attention → More Perspectives

You might have been told by many people to think about a situation from different perspectives and scenarios. It's usually said so that you can grasp something much better.

Well, the Transformer does much the same thing. It looks at a sentence from multiple angles, i.e., different relationships, syntax, and so on, to get a bigger picture.
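Mechanically, multi-head attention splits each token's vector into several smaller chunks ("heads"), lets each head run attention independently, then glues the results back together. A sketch of just that split-and-rejoin step, on a made-up 4-number vector:

```python
def split_heads(vector, num_heads):
    # Cut one vector into num_heads equal slices.
    size = len(vector) // num_heads
    return [vector[i * size:(i + 1) * size] for i in range(num_heads)]

def merge_heads(heads):
    # Concatenate the per-head results back into one vector.
    return [x for head in heads for x in head]

v = [0.1, 0.2, 0.3, 0.4]
heads = split_heads(v, 2)
print(heads)                     # [[0.1, 0.2], [0.3, 0.4]]
print(merge_heads(heads) == v)   # True
```

Each slice attends on its own, so one head can focus on, say, grammar while another tracks meaning, and the merged result carries all those perspectives at once.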

Temperature → The "Be Creative" Number

If I give you a picture of a school and ask you to write an essay using only what you can see in the picture, you would be following rigid instructions, and your creativity would be highly limited. But what if I told you to just "write an essay on school"? In this case you have no picture, so everything comes from your imagination, and the result is a lot more creative.

Similarly, a Transformer can be told whether to be more creative or more serious when following a prompt. This is done using a setting called temperature. The lower the temperature, the less creative the output; the higher the temperature, the more creative it gets.
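Temperature works by dividing the logits before the softmax step. Low temperature sharpens the distribution so the top token almost always wins; high temperature flattens it so unlikely tokens get a real chance. A sketch, with made-up scores:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Dividing by a small temperature exaggerates score differences;
    # dividing by a large one shrinks them.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.1))  # nearly all mass on the top token
print(softmax_with_temperature(logits, 2.0))  # much flatter spread
```

The model then samples from these probabilities, so a flatter spread means more surprising (creative) word choices.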

Summary

In short, GPT is like a supercharged "next-word guesser" with a brain full of books, articles, and conversations. It chops your message into tiny pieces (tokens), figures out what each one means, keeps them in the right order, looks at them from every angle, and then predicts what should come next. By now you should have a good understanding of how GPT works and of the magic behind all the AI that's in trend these days.
