🤖🚀 Dive deep into the world of AI as we explore ‘GPTs and LLMs: Pre-Training, Fine-Tuning, Memory, and More!’ Understand the intricacies of how these AI models learn through pre-training and fine-tuning, their operational scope within a context window, and the intriguing aspect of their lack of long-term memory.
🧠 In this article, we demystify:
Pre-Training & Fine-Tuning Methods: Learn how GPTs and LLMs are trained on vast datasets to grasp language patterns and how fine-tuning tailors them for specific tasks.
Context Window in AI: Explore the concept of the context window, which acts as a short-term memory for LLMs, influencing how they process and respond to information.
Lack of Long-Term Memory: Understand the limitations of GPTs and LLMs in retaining information over extended periods and how this impacts their functionality.
Database-Querying Architectures: Discover how some advanced AI models interact with external databases to enhance information retrieval and processing.
PDF Apps & Real-Time Fine-Tuning
Drop your questions and thoughts in the comments below and let’s discuss the future of AI! #GPTsExplained #LLMs #AITraining #MachineLearning #AIContextWindow #AILongTermMemory #AIDatabases #PDFAppsAI”
Welcome to AI Unraveled, the podcast that demystifies frequently asked questions on artificial intelligence and keeps you up to date with the latest AI trends. Join us as we delve into groundbreaking research, innovative applications, and emerging technologies that are pushing the boundaries of AI. From the latest trends in ChatGPT and the recent merger of Google Brain and DeepMind, to the exciting developments in generative AI, we’ve got you covered with a comprehensive update on the ever-evolving AI landscape. In today’s episode, we’ll cover GPTs and LLMs, their pre-training and fine-tuning methods, their context window and lack of long-term memory, architectures that query databases, PDF app’s use of near-realtime fine-tuning, and the book “AI Unraveled” which answers FAQs about AI.
GPTs, or Generative Pre-trained Transformers, work by being trained on a large amount of text data and then using that training to generate output based on input. So, when you give a GPT a specific input, it will produce the best matching output based on its training.
The way GPTs do this is by processing the input token by token, without actually understanding the entire output. It simply recognizes that certain tokens are often followed by certain other tokens based on its training. This knowledge is gained during the training process, where the language model (LLM) is fed a large number of embeddings, which can be thought of as its “knowledge.”
After the training stage, a LLM can be fine-tuned to improve its accuracy for a particular domain. This is done by providing it with domain-specific labeled data and modifying its parameters to match the desired accuracy on that data.
Now, let’s talk about “memory” in these models. LLMs do not have a long-term memory in the same way humans do. If you were to tell an LLM that you have a 6-year-old son, it wouldn’t retain that information like a human would. However, these models can still answer related follow-up questions in a conversation.
For example, if you ask the model to tell you a story and then ask it to make the story shorter, it can generate a shorter version of the story. This is possible because the previous Q&A is passed along in the context window of the conversation. The context window keeps track of the conversation history, allowing the model to maintain some context and generate appropriate responses.
As the conversation continues, the context window and the number of tokens required will keep growing. This can become a challenge, as there are limitations on the maximum length of input that the model can handle. If a conversation becomes too long, the model may start truncating or forgetting earlier parts of the conversation.
Regarding architectures and databases, there are some models that may query a database before providing an answer. For example, a model could be designed to run a database query like “select * from user_history” to retrieve relevant information before generating a response. This is one way vector databases can be used in the context of these models.
There are also architectures where the model undergoes near-realtime fine-tuning when a chat begins. This means that the model is fine-tuned on specific data related to the chat session itself, which helps it generate more context-aware responses. This is similar to how “speak with your PDF” apps work, where the model is trained on specific PDF content to provide relevant responses.
In summary, GPTs and LLMs work by being pre-trained on a large amount of text data and then using that training to generate output based on input. They do this token by token, without truly understanding the complete output. LLMs can be fine-tuned to improve accuracy for specific domains by providing them with domain-specific labeled data. While LLMs don’t have long-term memory like humans, they can still generate responses in a conversation by using the context window to keep track of the conversation history. Some architectures may query databases before generating responses, and others may undergo near-realtime fine-tuning to provide more context-aware answers.
GPTs and Large Language Models (LLMs) are fascinating tools that have revolutionized natural language processing. It seems like you have a good grasp of how these models function, but I’ll take a moment to provide some clarification and expand on a few points for a more comprehensive understanding.
When it comes to GPTs and LLMs, pre-training and token prediction play a crucial role. During the pre-training phase, these models are exposed to massive amounts of text data. This helps them learn to predict the next token (word or part of a word) in a sequence based on the statistical likelihood of that token following the given context. It’s important to note that while the model can recognize patterns in language use, it doesn’t truly “understand” the text in a human sense.
During the training process, the model becomes familiar with these large datasets and learns embeddings. Embeddings are representations of tokens in a high-dimensional space, and they capture relationships and context around each token. These embeddings allow the model to generate coherent and contextually appropriate responses.
However, pre-training is just the beginning. Fine-tuning is a subsequent step that tailors the model to specific domains or tasks. It involves training the model further on a smaller, domain-specific dataset. This process adjusts the model’s parameters, enabling it to generate responses that are more relevant to the specialized domain.
Now, let’s discuss memory and the context window. LLMs like GPT do not possess long-term memory in the same way humans do. Instead, they operate within what we call a context window. The context window determines the amount of text (measured in tokens) that the model can consider when making predictions. It provides the model with a form of “short-term memory.”
For follow-up questions, the model relies on this context window. So, when you ask a follow-up question, the model factors in the previous interaction (the original story and the request to shorten it) within its context window. It then generates a response based on that context. However, it’s crucial to note that the context window has a fixed size, which means it can only hold a certain number of tokens. If the conversation exceeds this limit, the oldest tokens are discarded, and the model loses track of that part of the dialogue.
It’s also worth mentioning that there is no real-time fine-tuning happening with each interaction. The model responds based on its pre-training and any fine-tuning that occurred prior to its deployment. This means that the model does not learn or adapt during real-time conversation but rather relies on the knowledge it has gained from pre-training and fine-tuning.
While standard LLMs like GPT do not typically utilize external memory systems or databases, some advanced models and applications may incorporate these features. External memory systems can store information beyond the limits of the context window. However, it’s important to understand that these features are not inherent to the base LLM architecture like GPT. In some systems, vector databases might be used to enhance the retrieval of relevant information based on queries, but this is separate from the internal processing of the LLM.
In relation to the “speak with your PDF” applications you mentioned, they generally employ a combination of text extraction and LLMs. The purpose is to interpret and respond to queries about the content of a PDF. These applications do not engage in real-time fine-tuning, but instead use the existing capabilities of the model to interpret and interact with the newly extracted text.
To summarize, LLMs like GPT operate within a context window and utilize patterns learned during pre-training and fine-tuning to generate responses. They do not possess long-term memory or real-time learning capabilities during interactions, but they can handle follow-up questions within the confines of their context window. It’s important to remember that while some advanced implementations might leverage external memory or databases, these features are not inherently built into the foundational architecture of the standard LLM.
Are you ready to dive into the fascinating world of artificial intelligence? Well, I’ve got just the thing for you! It’s an incredible book called “AI Unraveled: Demystifying Frequently Asked Questions on Artificial Intelligence.” Trust me, this book is an absolute gem!
Now, you might be wondering where you can get your hands on this treasure trove of knowledge. Look no further, my friend. You can find “AI Unraveled” at popular online platforms like Etsy, Shopify, Apple, Google, and of course, our old faithful, Amazon.
This book is a must-have for anyone eager to expand their understanding of AI. It takes those complicated concepts and breaks them down into easily digestible chunks. No more scratching your head in confusion or getting lost in a sea of technical terms. With “AI Unraveled,” you’ll gain a clear and concise understanding of artificial intelligence.
So, if you’re ready to embark on this incredible journey of unraveling the mysteries of AI, go ahead and grab your copy of “AI Unraveled” today. Trust me, you won’t regret it!
On today’s episode, we explored the power of GPTs and LLMs, discussing their ability to generate outputs, be fine-tuned for specific domains, and utilize a context window for related follow-up questions. We also learned about their limitations in terms of long-term memory and real-time updates. Lastly, we shared information about the book “AI Unraveled,” which provides valuable insights into the world of artificial intelligence. Join us next time on AI Unraveled as we continue to demystify frequently asked questions on artificial intelligence and bring you the latest trends in AI, including ChatGPT advancements and the exciting collaboration between Google Brain and DeepMind. Stay informed, stay curious, and don’t forget to subscribe for more!
So it has begun! Ok, so, yeah! There is not a lot of usage you can get out of this thing so you have to use the prompting very sparingly. It is days rate limiting not hours. 🙁 Let's start off with the media. Just one little dig at them because on CNBC they said, "the model is a smaller model". I think the notion here was that this model is a smaller model from a larger model so they just repeated that. I don't think this is a smaller model. Now, it could be that the heart of the model is smaller but what is going on behind the scenes with the thinking is a lot of throughput to model(s). I think the implication here is important to understand because on one hand there is an insanely low rate limit. when I say low I mean 30 messages per week low. On the other hand, the thinking is clearly firing a lot of tokens to get through a process of coming to a conclusion. The reason why I say it's a concert of models firing towards each other is because something has to be doing the thinking and another call (could be the same model) has to be doing the checking of the steps and other "things". In my mind, you would have a collection of experts doing each thing. Ingenious really. Plausibility model The plausibility model as the prime cerebral model. When humans think the smartest humans understand when they are headed down the right path and what is not the right path. You see this in Einstein's determination to prove the theory of relativity. His clutch of infamy came on the day when in an observatory (I think during an eclipse) he caught the images of light bending around our star proving that the fabric of space was indeed curved. Einstein's intuition here can not be underestimated. From Newton's intuition about gravity and mass to Einstein coming along and challenging that basic notion and to take it further and learn a new understanding of the how and why. It all starts with a plausibility of where one is going in their quest for knowledge. With my thoughts am I headed down the right path. Does the intuition of my thoughts make sense or should I change course to another or should I abandon the thought all together. This is truly what happens in the mind of an intelligent and sentient being on the level of genius. Not only the quest for knowledge but the ability to understand and know correctness wherever the path has led. In this, LLM's were at a distinct disadvantage because they are static capsules of knowledge frozen in time (and a neural network). In many ways they still are. However, OpenAI has done something that is truly ingenious to initially deal with this limitation. First, you have to understand the limitation of why being static and not dynamic is such a bad thing. If I ask you a question and tell you that the only way you can answer is to spit out the first thing that comes to your mind, without thinking, would produce in some probable occasions the wrong answer. With increasing difficulty of the question the more and more likely it would be that one would give the wrong answer. But human beings don't operate with such a constraint. They think through things as the level of difficulty of the perceived question is queried. One initial criticism is that this model over thinks all of the time. Case in point. It took 6 seconds to process hello. https://preview.redd.it/aih5umfz4iod1.png?width=1459&format=png&auto=webp&s=65bef59c6f7cdb52e9bef56c6d65e1a64b32f0d3 Eventually, I am sure OpenAI will figure this out. Perhaps a gate orchestrator model?! Some things don't require much thought; just saying. But back to the plausibility model concept. I don't know from Sunday if this is what is really going on but I surmise. What I imagine here is that smaller models (or the model) are quickly bringing information to a plausibility model. What is a mystery here is how on earth does the plausibility model "know" when it has achieved a qualitative output? Sam said something in an interview that leads me to believe that what's interesting about models as they stood since GPT-4 is that if you run something 10,000 times somewhere in there is correctness. Just getting the model to definitely give you that answer consistently and reliably is the issue. Hence, hallucinations. But what if, you could deliver responses and a model checks that response for viability. It's the classic chicken and egg problem. Does the correct answer come first or the wrong answer. Well, even going further, what if I present to the model many different answers. Choosing between the one that makes the most sense makes the problem solving a little more easier. It all becomes recursively probabilistic at this point. Of all of these incoming results keep checking to see if the path we're heading down is logical. Memory In another methodology, a person would keep track of where they were in the problem solving solution. It is ok to get to a certain point and pause for a moment to plan on where you would then go next. Hmmm. Memory, here is vital. You must keep the proper context of where you are in your train of thought or it is easy to lose track or get confused. Apparently OpenAI has figured out decent ways to do this. Memory, frankly, is horrible in all LLM's including GPT-4. Building up a context window is still such a major issue for me and the way the model refers to it is terrible. In GPT-o1-preview you can tell there are major strides in how memory is used. Not necessarily from the browser but perhaps on their side via backend services we humans would never see. Again, this would stem from the coordinating models firing thoughts in and out. Memory on the backend is probably keeping track of all of that which is probably the main reason why COT won't be spilling out to your browser amongst many other reasons. Such as entities stealing it. I digress. In the case of GPT-o1 memory seems to have a much bigger role and is actually used very well for the purpose of thinking. Clarity I am blown away by the totality of this. The promise is so clear of what this is. Something is new here. The model feels and acts different. It's more confident and clear. In fact, the model will ask you for clarity when you are conversing with it. Amazingly, it feels the need to grasp clarity for an input you are asking it. https://preview.redd.it/dr8zsc235iod1.png?width=1201&format=png&auto=webp&s=9f76caa2efe0251c414162faabc389132f4310e8 Whoa. That's just wild! It's refreshing too. It "knows" it's about to head into a situation and says, wait a minute let me get a better understanding here before we begin. Results and Reasoning The results are spectacular. It's not perfect and for the sake of not posting too many images I had to clean up my prompt so that it would be confused by something it asked me to actual clarify in the first place. So maybe while it isn't perfect, It sure the hell is a major advancement in artificial intelligence. Here is a one shot prompt that GPT-4, 4o continually fail at. The reason why I like this prompt is that it was something I saw in a movie and as soon as I saw the person write down the date from the guy asking him to do it I knew right away what was about to happen. Living in the US and travelling abroad you notice some oddities that are just the way things are outside of one's bubble. The metric system for example. Italy is notorious for giving Americans speeding tickets and to me the reason is because they have no clue how fast they are going with that damn speedometer in KPH. I digress. The point is, you have to "know" certain things about culture and likelihood to get the answer immediately. You have to reason through the information quickly to conclude to the correct answer. There is a degree of obviousness but not just from someone being smart but from someone having experienced things in the world. Here is GPT-o1-preview one shotting the hell out of this story puzzle. https://preview.redd.it/z6vdhal55iod1.png?width=1057&format=png&auto=webp&s=17d6499286d671449ca9a62fe44eba2ed37f9112 https://preview.redd.it/grphx9q65iod1.png?width=616&format=png&auto=webp&s=52457b4bd11c230590c2583aac6660b3d6b65e92 https://preview.redd.it/j0g5wm575iod1.png?width=796&format=png&auto=webp&s=cb258066c771c35ef5826ce7b37287dfc8ac712a As I said, GPT-4 and 4o could not do this in 1 shot no way, no how. I am truly amazed. The Bad Not everything is perfect here. The notion that this model can't not think about certain responses is a fault that OAI needs to address. There is no way that we will want to not being using this model all of the damn time instead of <4o. it not knowing when to think and when to just come out with it will be a peculiar thing. With that said, perhaps they are imagining a time when there are acres and acres of Nvidia Blackwell GPU's that will run this in near real time no matter the thought process. Also, the amount of safety that is embedded into this is remarkable. I would have done a section of a Safety model but that is probably coordinating here too but I think you get the point. Checks upon checks. The model seems a little stiff on the personality and I am unclear about the verbosity of the answers. You wouldn't believe it from my long posts but when I am learning something or interacting I am looking for the shortest and most clearest answer you can give. I can't really tell if that has been achieved here. Conversing and waiting multiple seconds is not something I am going to do to try and figure out. Which brings me to the main complaint as of right now. The rate limit is absurd. lol. I mean 30 per week how can you even imagine using that. For months now people will be screaming because of this and rightly so. Jensen can't get those GPU's to OpenAI fast enough I tell you. Here again, 2 years later and we are going to be capability starved by latency and throughput. I am just being greedy. Final Thoughts In the words of Wes Roth, "I am stunned". When the limitations are removed, throughput and latency are achieved, and this beast is let loose I have a feeling that this will be the dawn of a new era of intelligence. In this way, humanity has truly arrived at the dawn of an man made and plausibly sentient intelligence. There are many engineering feats that will be left to overcome but we are in a place that on this date 9/12/2024 the world will be forever changed. The thing is though this is only showcasing knowledge retrieval and reasoning. It will be interesting to see what can be done with vision, hearing, long term memory, and true learning. The things that will built with this may be truly amazing. The enterprise implications here are going to be profound. Great job OpenAI! submitted by /u/Xtianus21 [link] [comments]
Does anyone feel like agi is a hoax and ai will just end up being some convient reference tool .I just don’t see how people think ai is going to be able to make scientific breakthroughs when it all it does is try to predict the next word on the vast amount of data it’s trained on. It just doesn’t seem fundamentally right to tell a bunch of 0 and 1s to think submitted by /u/Electrical_Prune_932 [link] [comments]
OpenAI, Nvidia Executives Discuss AI Infrastructure Needs With Biden Officials.[1] Google unlists misleading Gemini video.[2] Google’s ALOHA Unleashed AI Robot Arm Can Now Tie Shoes Autonomously.[3] Meta is making its AI info label less visible on content edited or modified by AI tools.[4] Sources: [1] https://www.bloomberg.com/news/articles/2024-09-12/openai-nvidia-executives-discuss-ai-infrastructure-needs-with-biden-officials [2] https://www.theverge.com/2024/9/12/24242897/google-gemini-unlists-misleading-video-ai [3] https://www.techeblog.com/google-aloha-unleashed-robot-arm-tie-shoes/ [4] https://techcrunch.com/2024/09/12/meta-is-making-its-ai-info-label-less-visible-on-content-edited-or-modified-by-ai-tools/ submitted by /u/Excellent-Target-847 [link] [comments]
Today I Learned (TIL) You learn something new every day; what did you learn today? Submit interesting and specific facts about something that you just found out here.
Reddit Science This community is a place to share and discuss new scientific research. Read about the latest advances in astronomy, biology, medicine, physics, social science, and more. Find and submit new publications and popular science coverage of current research.
At the end of August this year, ASBC voted again in favor of IBA by 21–14. Yesterday, the Swiss Federal Tribunal has rejected the International Boxing Association's (IBA) appeal on the Court Arbitration of Sport's (CAS) and the International Olympic Committee's (IOC) decision to strip recognition of IBA as governing body of olympic boxing events. (Still looking for other articles about this...) submitted by /u/gho87 [link] [comments]