Sam Altman, CEO of OpenAI, answered attendees’ questions during a Q&A session at the AC10 online meetup and confirmed that a GPT-4 rollout was indeed planned. Here, we take that knowledge, together with recent developments, to make predictions about the model’s size, optimal parameters and compute, multimodality, sparsity, and performance.
- Altman said that GPT-4 won’t be much bigger than GPT-3, so we can guess it will have somewhere between 175B and 280B parameters, similar to DeepMind’s Gopher language model. The Megatron-Turing NLG model, at 530B parameters three times the size of GPT-3, did not outperform it, and smaller models released afterwards did. Simply put, bigger does not mean better. Altman said OpenAI’s main goal is to get more performance out of smaller models. Training a large language model requires a huge dataset, enormous computing power, and complex engineering; even deploying such models is too expensive for many companies.
- Most large models are under-optimised. Training is expensive, so companies have to trade accuracy against cost. GPT-3, for example, was trained only once, mistakes and all; researchers could not run hyperparameter optimization because it would have cost too much. Microsoft and OpenAI have since shown that GPT-3 could be improved by training it with better hyperparameters: tuning a 6.7B GPT-3 model made it perform as well as a 13B GPT-3 model. They also found a new parameterization (μP) under which the best hyperparameters for a small model are also the best for larger models of the same architecture, letting researchers improve big models at a fraction of the cost.
- DeepMind recently found that the number of training tokens affects a model’s performance as much as its size. They demonstrated this by training Chinchilla, a 70B model four times smaller than Gopher, on roughly four times more data than the large language models released since GPT-3. If OpenAI follows this compute-optimal recipe and trains GPT-4 on around 5 trillion tokens, the model would need 10–20 times the FLOPs used for GPT-3 to be trained to minimal loss.
- Altman said during the Q&A that GPT-4 won’t be multimodal like DALL-E; it will be text-only. Why? Good multimodal models are much harder to build than language-only or vision-only ones, because combining textual and visual information is difficult. A multimodal GPT-4 would also have to outperform both GPT-3 and DALL-E 2.
- Sparse models cut computing costs through conditional computation: different parts of the model handle different inputs, so such models can scale past 1 trillion parameters without a proportional increase in compute, letting us train large language models in less time and at lower cost. But GPT-4 won’t be sparse. Why? OpenAI has always used dense language models, and Altman indicated they won’t be making the models much bigger anyway.
- GPT-4 will be better aligned than GPT-3. OpenAI has been working hard on AI alignment: they want language models to follow our intentions and reflect our values. InstructGPT was the first step — a GPT-3 model trained with human feedback to follow instructions. Human judges rated it better than GPT-3, even though it scored lower on some language benchmarks.
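The hyperparameter-transfer result above (the μP parameterization) can be sketched roughly as follows. This is a toy illustration, not OpenAI’s or Microsoft’s actual code: the function name, the widths, and the simple 1/width scaling rule for hidden-layer learning rates are illustrative assumptions.

```python
# Toy sketch of muP-style hyperparameter transfer: tune the learning rate on a
# cheap, narrow "proxy" model, then rescale it for a much wider model instead
# of running an expensive sweep on the large model itself.

def transfer_lr(base_lr: float, base_width: int, target_width: int) -> float:
    """Rescale a hidden-layer learning rate from a proxy model to a wider one.

    Under muP, hidden-layer learning rates shrink in proportion to width, so
    the optimum found on the small model stays (near-)optimal on the big one.
    """
    return base_lr * base_width / target_width

# Tune once on a 256-wide proxy, then reuse on a 4096-wide model:
proxy_lr = 3e-3  # hypothetical value found by sweeping on the small model
big_lr = transfer_lr(proxy_lr, base_width=256, target_width=4096)
print(big_lr)  # → 0.0001875
```

The point is the cost asymmetry: the sweep happens only at the small width, which is why the technique lets researchers tune large models at a fraction of the usual cost.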
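The compute claim in the Chinchilla bullet can be sanity-checked with the common approximation that training FLOPs ≈ 6 · N · D, where N is parameter count and D is training tokens. The GPT-4 figures below (GPT-3-scale parameters, 5T tokens) are this article’s speculation, not confirmed numbers.

```python
# Back-of-the-envelope training-compute estimate using flops ≈ 6 * N * D.

def train_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6 * params * tokens

gpt3 = train_flops(175e9, 300e9)       # GPT-3: 175B params, ~300B tokens
gpt4_guess = train_flops(175e9, 5e12)  # speculative: similar size, 5T tokens

print(f"{gpt3:.2e} FLOPs")                 # ~3.15e+23 FLOPs
print(f"{gpt4_guess / gpt3:.1f}x GPT-3")   # ~16.7x GPT-3's training compute
```

A roughly 17× increase lands inside the 10–20× range quoted above; the exact multiple depends on the parameter count one assumes for GPT-4.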
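The conditional computation behind sparse (mixture-of-experts) models can be shown with a minimal toy: a router picks one expert per input, so only a small fraction of the total parameters does work for any given token. Everything here — the expert and router functions, the expert count — is an illustrative stand-in, not a production MoE implementation.

```python
# Toy top-1 mixture-of-experts: total capacity grows with the number of
# experts, but each input only pays for ONE expert's compute.

NUM_EXPERTS = 8

def expert(i: int, x: float) -> float:
    """Stand-in for expert i's feed-forward network."""
    return (i + 1) * x

def router(x: float) -> int:
    """Stand-in for a learned top-1 gate; here just a deterministic choice."""
    return int(x * 1000) % NUM_EXPERTS

def sparse_forward(x: float) -> float:
    # Only the selected expert runs; the other NUM_EXPERTS - 1 stay idle,
    # which is why parameter count can grow without compute growing with it.
    return expert(router(x), x)

print(sparse_forward(0.5))  # → 2.5 (only expert 4 ran)
```

Dense models, by contrast, activate every parameter for every token — which is why a dense GPT-4 at GPT-3-like scale is the cheaper, simpler bet the article describes.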
To sum up: GPT-4 will be a huge, text-only language model comparable in scale to GPT-3 but better optimised, and it will reflect human norms and values more closely.
There is a lot of confusion surrounding GPT-4, and you may have heard contradictory claims — for example, that it has 100 trillion parameters, or that it exists purely to generate code. For now, these are all hypothetical.
There is little to go on, since OpenAI has not made public any firm details on the release date, model architecture, dataset size, or other relevant factors.
Like GPT-3, GPT-4 will be put to use in a wide range of language-related contexts, such as code creation, text summarization, language translation, classification, chatbots, and grammatical correction.
The updated model should improve on its predecessor in safety, fairness, precision, and consistency, while also being more robust and more cost-effective to run.