|
Over the past month, I have been researching ChatGPT, GPT-3, and GPT-4, examples of a type of AI that emerged in 2018. These models owe their success to a seminal OpenAI paper, "Improving Language Understanding by Generative Pre-Training" (Radford et al., 2018), as well as earlier research in the field. The approach was not entirely new, but as model size increases, it becomes far more useful than originally thought possible.

GPT models do not store text in any form. Instead, they use artificial neural networks, a simplified model of a brain's neural network. What is stored are numerical values whose purpose is to transform input values as they pass through the layers and neural nodes to the final output. This process operates on a very small, sub-word scale.

To some observers, it appears we have opened a door to whole new paths toward AGI, or artificial general intelligence. However, that is a somewhat speculative claim and runs up against current technical understanding. What constitutes AGI is up for debate as well, and different disciplines use different definitions. Nevertheless, it is clear that GPT is not merely munging existing text; it builds its output word by word, line by line, and concept by concept, and it exhibits a deep understanding of how concepts relate to each other.

The core of GPT-3 (and of other variants such as GPT-4), before tuning, reinforcement, and other downstream steps, is a pre-trained autoregressive language model: a type of large language model (LLM) used to predict the continuation of presented text. LLMs use neural networks trained through self-supervised learning (SSL). During training, GPT-3's SSL process consumed an enormous corpus of multilingual text (including source code and conlangs), totaling 45 terabytes.
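The "predict the continuation" loop can be sketched in a few lines. Everything below is a toy: in a real GPT, `next_token_logits` would be a transformer over billions of parameters, and the vocabulary would hold tens of thousands of sub-word tokens; here it is a hand-written stand-in, just to show the autoregressive shape of generation.

```python
import math

# Toy vocabulary; real GPT tokens are sub-word pieces, not whole words.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def next_token_logits(context):
    # Hypothetical stand-in for the neural net: favor a fixed continuation.
    canned = ["the", "cat", "sat", "on", "the", "mat", "."]
    idx = len(context)
    scores = [0.0] * len(VOCAB)
    if idx < len(canned) and canned[idx] in VOCAB:
        scores[VOCAB.index(canned[idx])] = 5.0
    return scores

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def generate(prompt, max_new_tokens=10):
    # Autoregressive decoding: each new token is appended to the context
    # and fed back in to predict the next one.
    context = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(next_token_logits(context))
        token = VOCAB[max(range(len(VOCAB)), key=lambda i: probs[i])]  # greedy
        context.append(token)
        if token == ".":
            break
    return context

print(generate(["the", "cat"]))
```

Real systems sample from the probability distribution instead of always taking the greedy argmax, which is where the "temperature" knob comes in.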
This training process determines the weights and biases (i.e., parameters) assigned to nodes, with each path into a node given its own parameter. The GPT-3 model has approximately 175 billion parameters spread across roughly 295 thousand nodes in its various layers. It is important to note that the model does not store the training text corpus in any way, nor does it reference a large database to produce inference output.

During training, GPT-3 was given pretext problems to solve that involved finding patterns and relationships in the training data (the text corpus). The training process did not start with any prior comprehension of language. Instead, the model was given the capacity to tokenize text, with tokens generated from sequences of roughly four characters (a word may consist of more than one token). Each pretext training task then furnished the model with contextual information and rules to facilitate pattern recognition. By predicting what comes next in a series of tokens, or filling in gaps, given enough tokens and examples of their use, the model built up a vast map of relationships in which correlations, contextual hierarchies, probabilities, domains, concepts, and more are stored. On its own, through training, it developed an understanding of the parts of language, from nouns and sentences all the way up to social constructs and other big-picture classifications.

What we are left with is a huge map of relationships. Emergent abilities unexpectedly arise from these relationships (plus tuning and safety work) as more data and parameters are added.

GPT-3 emergent abilities (list generated by ChatGPT):
GPT-4 emergent abilities (list generated by Bing Chat):
This is what Bing Chat (based on GPT-4 + search results) had to say about whether artificial general intelligence (AGI) has emerged, or can emerge, from a system like GPT:

"According to the search results, some researchers have claimed that GPT-4 is showing “sparks” of artificial general intelligence (AGI), meaning that it has capabilities that are at or above human level across a variety of domains and tasks. However, this claim is not universally accepted or proven, and there is no clear or agreed-upon definition of what constitutes AGI.

"It is possible that AGI could emerge in a GPT-like model at some point in the future, if the model is scaled up sufficiently and trained on diverse and high-quality data. However, it is not guaranteed or inevitable that this will happen, as there may be fundamental limitations or challenges that prevent GPT-like models from achieving true AGI, such as their reliance on next-word prediction, their lack of causal understanding, their potential for harmful outputs, and their alignment with human values and goals. Moreover, there may be alternative or better approaches to achieving AGI that do not rely on GPT-like models, such as hybrid systems that combine symbolic and sub-symbolic methods, or neuromorphic systems that mimic the structure and function of biological brains.

"Therefore, it is not clear or certain that GPT-like models are a valid or optimal approach to achieving AGI. However, they are still valuable and useful tools for advancing AI research and applications, and for exploring the possibilities and implications of AGI. They also pose significant ethical and social challenges that need to be carefully considered and addressed by researchers, developers, users, and policymakers."

---------------

I put the above together from notes I've been taking, partly for work. I thought you guys might find it useful. The progression towards AGI is fascinating and fills me with hope.
|
# ¿ Mar 26, 2023 21:44 |
|
|
|
GPT does not regurgitate text. After training, the neural net's parameters are finalized, and it does not have access to any form of text. Base GPT does not reference a database. However, some products built on GPT give it access to various data sources, such as Bing Chat with live search results or Wolfram Alpha (a GPT-4 plugin). How a data source is used during a session with a GPT product is determined by the natural language instructions given to GPT before the session starts and by the instructions the user gives during the session.

Nearly all the abilities people are excited about are emergent, and were discovered in, or coaxed from, GPT's neural net, which is a store of relationships, both large and small in scope, built on tokens it discovered itself during training. No human ever gave it an explicit description of language. In fact, it trained on numerous languages all at once, consuming a huge collection of free-form, unlabeled text using self-supervised learning. It was not given labeled data sets.

GPT can learn things and can even be taught within a session to take on a specific role, limit its scope to a particular context, or perform other tasks. Using natural language, I was able to teach GPT to process claims at work with the same paragraphs of text I would give a human. It was that simple. However, the problem I ran into, which would keep me from putting it into one of my company's products this morning, is the rare hallucination when GPT finds a connection to something outside the task's domain and becomes convinced it is relevant. But this is GPT-3.5, and I have heard GPT-4 brings significant improvements on that front. There is indeed something different going on as the scale increases. This is not a beefed-up Siri or a preprogrammed expert system.

Carp fucked around with this message at 16:16 on Mar 27, 2023 |
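The "instructions before the session starts" part is worth making concrete. A minimal sketch (the role, claim format, and function names here are all hypothetical, not any vendor's API): a product prepends standing natural-language instructions to every request, so the model is "taught" its role purely through text in its context window.

```python
# Hypothetical standing instructions a claim-auditing product might prepend.
SYSTEM_INSTRUCTIONS = """\
You are a co-op claim co-auditor. For each claim, check that the invoice
total matches the claimed amount and flag any mismatch. Answer only with
APPROVE or FLAG followed by a one-line reason."""

def build_prompt(system_instructions, session_history, new_claim):
    # The full context sent to the model each turn: instructions first,
    # then the running conversation, then the latest claim text.
    parts = [system_instructions]
    for role, text in session_history:
        parts.append(f"{role}: {text}")
    parts.append(f"user: {new_claim}")
    parts.append("assistant:")
    return "\n\n".join(parts)

prompt = build_prompt(SYSTEM_INSTRUCTIONS, [],
                      "Claim #101: invoice $500, claimed $500")
print(prompt.splitlines()[0])
```

Nothing about the model changes between sessions; only this assembled text does, which is why the same base model can play very different roles in different products.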
# ¿ Mar 27, 2023 16:10 |
|
gurragadon posted: Thanks for the post, the emergent ability from these systems is really interesting and, to me, unexpected. Could you say what industry you work in and why you are looking into AI for it? If you can't, I understand, but I would be interested.

I work for a company that processes advertising co-op claims, as a software engineer and developer. AI has been a back-burner interest of mine for decades, but I'm far from an expert and have had little experience coding, or coding for, AI systems.

[edit] Err, meant to add: I'm looking into using GPT as a co-auditor to help human auditors triage claims. None of our customers would be remotely interested in claim processing being completely automated with AI. They hire us for our attention to detail and the support a human can provide.

Carp fucked around with this message at 16:25 on Mar 27, 2023 |
# ¿ Mar 27, 2023 16:19 |
|
BrainDance posted: [...]

There are also now plugins for GPT-4 that use, I believe, natural language to communicate back and forth. This lets GPT defer to more current sources of data, consume that data, and use it in its reasoned response. Plugins can also add features, though I'm not sure how that works. I hear memory could be, or is, a feature plugin. If it is like other implementations I've seen, the memory is probably a condensed natural-language summary of previous sessions.

There was a paper a week or two ago about how GPT-4 can use "tools." It understands instructions that tell it to pass information to an external command and use the value the command returns. That sounds very much like the recently released plugin architecture. The thought is that GPT can grow in data size and parameter count as versions are released, but it can also grow "sideways" with plugins and products, which may bring, in combination with other plugins or GPT's base, their own unexpected emergent abilities.

Carp fucked around with this message at 17:18 on Mar 27, 2023 |
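The pass-info-to-a-command idea can be sketched as a small loop. This is a guess at the general pattern, not OpenAI's actual plugin protocol: the model is instructed to emit a line like `TOOL:name(arg)` when it needs external data; the runtime intercepts it, runs the tool, and appends the result to the context for the model's next turn. The model itself is faked here.

```python
import re

# Hypothetical tool registry; in a real product these would hit live APIs.
TOOLS = {
    "date_today": lambda arg: "2023-03-27",
    "search": lambda arg: f"top result for {arg!r}",
}

def fake_model(context):
    # Stand-in for a real LLM call: ask for the date once, then answer.
    if "RESULT:" not in context:
        return "TOOL:date_today()"
    return "Today is " + context.rsplit("RESULT:", 1)[1].strip()

def run(prompt, max_turns=4):
    context = prompt
    for _ in range(max_turns):
        out = fake_model(context)
        m = re.match(r"TOOL:(\w+)\((.*)\)", out)
        if m:
            # The model asked for a tool: run it and feed the result back.
            name, arg = m.groups()
            context += f"\nRESULT: {TOOLS[name](arg)}"
        else:
            return out
    return out

print(run("What is today's date?"))
```

The key point is that the model never executes anything itself; it only emits text, and the surrounding runtime decides what that text triggers.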
# ¿ Mar 27, 2023 16:50 |
|
gurragadon posted:Your job seems like a perfect fit for collaborative AI program. The AI could perform a first check, then the human checks (I'm assuming it's already redundant on the human side), and then a final AI check for any missed issues. I guess the main thing would be if it actually ended up being more accurate or if the AI program is incomplete enough that it adds more errors than it removes. Exactly. The tests with plain text claims were very promising, but GPT-3.5 encounters serious issues with dates. It hallucinates the current date (even when provided with the correct date) and makes errors in date comparison and calculation. Therefore, I will be unable to proceed until the GPT-4 beta is open to me (at the very least, tool use = external date calculator). Additionally, we require the image parsing capabilities of GPT-4 to authenticate claim attachments/PoPs.
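The "external date calculator" fix above is simple to picture: rather than trusting the model's date arithmetic, the product does the comparison in ordinary code and hands the model only the verdict. A minimal sketch, with a hypothetical 90-day claim window and ISO-formatted dates assumed:

```python
from datetime import date

def claim_within_window(purchase: str, submitted: str,
                        window_days: int = 90) -> bool:
    # Deterministic date math the model cannot hallucinate its way around.
    d_purchase = date.fromisoformat(purchase)
    d_submitted = date.fromisoformat(submitted)
    elapsed = (d_submitted - d_purchase).days
    return 0 <= elapsed <= window_days

print(claim_within_window("2023-01-15", "2023-03-20"))  # 64 days -> True
print(claim_within_window("2022-11-01", "2023-03-20"))  # 139 days -> False
```

With tool use, the model's job shrinks to extracting the two dates from the claim text and calling something like this, which plays to its strengths.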
|
# ¿ Mar 27, 2023 17:32 |
|
Centurium posted: This is, in fact, the output of an AI model. I say that because it is jam packed with kinda true or false claims backed up by explanations that are either unrelated or outside the scope of the claim being made. Pretty funny though how outrageous the claims are. GPT doesn't have access to text! What does the model act on if not text?

Yeah, it probably was a poor overview. There wasn't really a clear audience in mind; it was just a collection of notes I was keeping for work. The follow-ups were a continuation of that. There is plenty missing from the technical description. But I think you misunderstood, Centurium. I was not trying to defend a claim; I was trying to describe something very technical in a simpler way by dispelling what I believe is a misperception of how the technology works. When I say it does not have access to text, I mean that it does not have access to the training data after it is trained, and that it does not use a database to look up answers. GPT infers a continuation of the text prompt.

Carp fucked around with this message at 22:03 on Mar 27, 2023 |
# ¿ Mar 27, 2023 21:53 |
|
Gumball Gumption posted: Personally I compare training data and the things AI then produces to pink slime, the stuff they make chicken McNuggets out of. It turns the training data into a slurry that it can then produce results with that are novel but still dependent on, and similar to, the original training data.

Right! It's kind of like the brain, but much simpler. I wonder how much of the next-token probability learned in one language carries over when the parameters are weighed to produce output in another language. The ability to translate was not expected of GPT.
|
# ¿ Mar 27, 2023 23:58 |
|
BrainDance posted: A thing I've been saying since I sorta stumbled into finetuning an incredibly small model (1.3B) into being roughly as good as GPT-3 on one specific task (but only that specific task) is that I think transformers can potentially be very useful for a huge variety of things we haven't really tried them on yet, if they're purpose-trained for that specific thing. GPT is very general, so it's going to be a limited jack of all trades. But a model that follows the same principles and is only trained to do one specific task, but of the same overall complexity, might be very, very capable and very useful at that one task.

BrainDance, I meant to ask this earlier: which small model were you fine-tuning? GPT-3 is open source and uses TensorFlow and other common libraries, so it could be fun to translate it from Python to C#. However, that's likely laborious. Alternatively, I could try fine-tuning a small model for a specific task, as you suggested, and spend less time coding. By the way, have you come across the paper that discusses fine-tuning (or maybe training with supervised learning) a small model using the GPT-3 API? It claimed the result performed almost as well as GPT-3.
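The "small model, one task" principle scales all the way down. As a toy illustration of the causal-LM objective itself (nothing like a real transformer, and far smaller than even a 1.3B model): a bigram next-token model, one weight matrix trained with plain gradient descent to predict the next token of a tiny corpus.

```python
import numpy as np

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = sorted(set(corpus))
ix = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (V, V))  # logits for next token = W[current token]

# Training pairs: every (token, next token) transition in the corpus.
pairs = [(ix[a], ix[b]) for a, b in zip(corpus, corpus[1:])]

for step in range(500):
    grad = np.zeros_like(W)
    for cur, nxt in pairs:
        logits = W[cur]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        probs[nxt] -= 1.0            # d(cross-entropy)/d(logits)
        grad[cur] += probs
    W -= 0.5 * grad / len(pairs)     # full-batch gradient descent

pred = vocab[int(np.argmax(W[ix["sat"]]))]
print(pred)  # "sat" is always followed by "on" in the corpus
```

Same objective as GPT (predict the next token, minimize cross-entropy), just with a lookup table instead of attention layers; everything interesting about GPT is in how much richer its function from context to logits is.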
|
# ¿ Mar 28, 2023 20:00 |
|
BrainDance posted: I was finetuning GPT-Neo, mostly 1.3B, because anything larger needed a lot of RAM (DeepSpeed offloads some of the VRAM stuff to normal RAM), and if you're using swap instead, the training time jumps from hours to days. LLaMA got a lot of people paying attention to this though, and now we can use LoRA with the language models, so I've been doing that with LLaMA.

Have you come across "GPT in 60 Lines of NumPy" as a way of understanding the basics? There's a post about it on Hacker News that I found informative. Although I may end up using something like LLaMA in the future, right now I feel like I need to learn from scratch and build a foundation; I'm feeling a bit lost with my usual hacking methods of learning. Also, I realized I was mistaken earlier: GPT-3 isn't actually open source. There's a repository out there, but it's just data supporting a paper.

Wow, yeah, LoRA sounds very interesting, and I agree it would make fine-tuning a large model much easier. How has it worked out for you? There is so much new information out there about deep learning and LLMs. If I come across the paper again, I'll be sure to let you know.
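For anyone following along, the core of LoRA is small enough to sketch in NumPy. A minimal sketch of the idea (dimensions and scaling chosen for illustration): instead of updating a large frozen weight matrix W, you train a low-rank pair (A, B) and use W + (alpha/r) * (A @ B) as the effective weight. Only A and B need gradients and optimizer state, which is why it makes finetuning big models cheap.

```python
import numpy as np

d_in, d_out, r, alpha = 512, 512, 8, 16
rng = np.random.default_rng(0)

W = rng.normal(size=(d_in, d_out))        # frozen pretrained weight
A = rng.normal(0, 0.01, size=(d_in, r))   # trainable low-rank factor
B = np.zeros((r, d_out))                  # zero init: no change at the start

def forward(x):
    # Equivalent to x @ (W + (alpha / r) * A @ B), but computed without
    # ever materializing the full d_in x d_out update matrix.
    return x @ W + (alpha / r) * ((x @ A) @ B)

x = rng.normal(size=(4, d_in))
# With B still zero, the adapted layer behaves exactly like the original.
assert np.allclose(forward(x), x @ W)

full = d_in * d_out          # params a full finetune would touch
lora = r * (d_in + d_out)    # params LoRA actually trains
print(f"trainable params: {lora} vs full finetune: {full}")
```

For this 512x512 layer that's 8,192 trainable numbers instead of 262,144, and the ratio only gets better as the layers get bigger.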
|
# ¿ Mar 30, 2023 12:44 |
|
BrainDance posted: [...]

This is likely the closest I'm going to get to finding the paper, and it is more recent, but maybe not new to you. GPT4All is a LLaMA model finetuned on prompt-generation pairs generated with the help of OpenAI's GPT-3.5-Turbo API. https://github.com/nomic-ai/gpt4all 2023_GPT4All_Technical_Report.pdf

Self-Instruct: Aligning Language Models with Self-Generated Instructions

[edit] Found it! https://crfm.stanford.edu/2023/03/13/alpaca.html

cat botherer posted: I think transfer learning tools like LoRA are going to be the main way that stuff like ChatGPT gets used in industry. It's certainly been the main (only) way I've used language models in the past.

What have you used them for in the past, and what do you think of ChatGPT and GPT-4?

Carp fucked around with this message at 13:49 on Mar 31, 2023 |
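The instruction/response pairs behind projects like Alpaca and GPT4All ultimately get flattened into plain training text. A sketch of that step; the template below is an approximation of the Alpaca-style prompt format, not a verbatim copy of either project's.

```python
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{response}"
)

def to_training_text(pairs):
    # Each (instruction, response) pair becomes one training document;
    # the model then learns to continue past "### Response:" on its own.
    return [TEMPLATE.format(instruction=i, response=r) for i, r in pairs]

examples = to_training_text([
    ("List three primary colors.", "Red, blue, and yellow."),
])
print(examples[0].splitlines()[0])
```

The same causal-LM objective does all the work; the template just teaches the model the instruction-following frame, which is why a few tens of thousands of generated pairs go so far.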
# ¿ Mar 31, 2023 00:12 |
|
|
|
cat botherer posted:[...] That's a pretty good summary. Much better than my notes earlier in the thread, which are a little confused.
|
# ¿ Mar 31, 2023 02:00 |