Carp
May 29, 2002

Over the past month, I have been researching ChatGPT, GPT-3, and GPT-4, examples of a family of AI models that emerged in 2018. These models owe their success to a seminal OpenAI paper, "Improving Language Understanding by Generative Pre-Training" (Radford et al., 2018), and to earlier research in the field. The approach was not entirely new, but as model size increases, it becomes far more useful than originally thought possible.

GPT models do not store text in any form. Instead, they use artificial neural networks, a simplified model of the brain's network of neurons. What is stored is numerical values, whose purpose is to transform input values as they pass through the network's layers and nodes to the final output. This process operates on a very small, sub-word scale.
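
To make that concrete, here is a tiny sketch (toy shapes and random numbers, nothing like GPT's real weights) of what "stored numerical values transforming inputs as they pass through layers" means in code:

```python
# A minimal sketch of a feed-forward pass: the weights and biases are the
# only things "stored"; the input is reshaped by them on its way through.
import numpy as np

rng = np.random.default_rng(0)

# Two dense layers' parameters (toy sizes, random values).
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)   # layer 1: linear transform + ReLU
    return h @ W2 + b2               # layer 2: linear transform to output

x = rng.normal(size=(1, 4))          # a small input vector
print(forward(x))                    # output shaped by the parameters alone
```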

To some observers, it appears we have opened a door onto whole new paths toward AGI, or Artificial General Intelligence. That claim is speculative, however, and outruns current technical understanding. What constitutes AGI is itself up for debate, and different disciplines define it differently. Nevertheless, it is clear that GPT is not merely munging existing text; it builds its output word by word, line by line, and concept by concept, and it exhibits a deep grasp of how concepts relate to each other.

The core of GPT-3 (and of variants such as GPT-4), before tuning, reinforcement, and other downstream steps, is a pre-trained autoregressive language model: a type of large language model (LLM) used to predict the continuation of presented text. LLMs are neural networks trained through self-supervised learning (SSL). During training, GPT-3's SSL process consumed an enormous corpus of multilingual text (including source code and conlangs) totaling 45 terabytes. Training determines the weights and biases (i.e., parameters) assigned to nodes, with each connection into a node carrying its own parameter. The GPT-3 model has approximately 175 billion parameters and around 295 thousand nodes across its various layers. Note that the model does not store the training corpus in any way, and it does not consult a large database to produce inference output.

During training, GPT-3 was given pretext problems to solve that involved finding patterns and relationships in the training corpus. The process did not start with any prior comprehension of language. Instead, the model was given the capacity to tokenize text, with tokens generated from sequences of roughly four characters (a word may consist of more than one token). Each pretext training task then furnished the model with contextual information and rules to facilitate pattern recognition. The result is a vast map of relationships, storing correlations, contextual hierarchies, probabilities, domains, concepts, and more. Through training alone, the model developed its own understanding of the parts of language, from nouns and sentences all the way up to social constructs and other big-picture classifications.
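
As an aside on the tokenization piece: OpenAI's tiktoken library exposes the GPT-2/GPT-3-era BPE encoding, so you can see the sub-word tokens and the roughly-four-characters-per-token average for yourself (assumes `pip install tiktoken`):

```python
# Demonstrating sub-word tokenization with OpenAI's tiktoken library.
# "gpt2" is the BPE encoding used by GPT-2/GPT-3-era models.
import tiktoken

enc = tiktoken.get_encoding("gpt2")
text = "Tokenization splits unlabeled text into sub-word pieces."
ids = enc.encode(text)

print(ids)                                   # token IDs the model actually sees
print([enc.decode([t]) for t in ids])        # the sub-word piece each ID maps to
print(len(text) / len(ids), "chars/token")   # roughly four characters per token
```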

What we are left with is a huge map of relationships. During training, GPT-3 iterated over the corpus, learning to predict missing tokens in a variety of contexts. By predicting what comes next in a sequence of tokens, or by filling in gaps, the model accumulates correlations, contextual hierarchies, probabilities, domains, concepts, and more. As data and parameters are added, emergent abilities unexpectedly arise from these relationships (plus tuning and safety work).
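
Here is a toy illustration of that pretext task: the model emits a probability for every token in its vocabulary, and training scores the prediction against the token that actually came next. Everything below is made-up toy data; the real thing is a transformer over billions of tokens, but the objective has this shape:

```python
# Next-token prediction scored with cross-entropy, on a five-word "vocabulary".
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Pretend logits the model produced for the context "the cat sat on the".
logits = np.array([0.1, 0.2, 0.1, 0.0, 2.5])    # model strongly favors "mat"
probs = softmax(logits)

target = vocab.index("mat")                     # the token that actually came next
loss = -np.log(probs[target])                   # cross-entropy; training lowers this
print(dict(zip(vocab, probs.round(3))), "loss:", round(loss, 3))
```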

GPT-3 Emergent Abilities (list generated by ChatGPT):

  • Few-shot learning: GPT-3 is able to learn new tasks quickly with only a few examples, sometimes as few as one or two. This is due to the model's ability to generalize from its pre-training and adapt to new tasks with minimal fine-tuning (see the sketch just after this list).
  • Natural language understanding: GPT-3 has demonstrated a high level of proficiency in understanding and generating natural language text, including the ability to complete text prompts, answer questions, and generate coherent and relevant text in a conversational setting.
  • Common sense reasoning: Although GPT-3 does not have explicit knowledge of the world, it has demonstrated an ability to reason about common sense knowledge and make logical inferences based on context.
  • Language translation: GPT-3 has shown some ability to translate between languages, although its performance is currently not as good as specialized machine translation models.
  • Creative writing: GPT-3 has been used to generate creative writing, including poetry and fiction, and has demonstrated some ability to mimic the writing style of different authors or genres.
  • Code generation: GPT-3 has also been used to generate code, including HTML, CSS, and JavaScript, and has shown some ability to generate code that is functional and syntactically correct.
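
As a concrete example of the few-shot learning item above, here is a minimal sketch using the Completion endpoint as it existed in early 2023. The model name, the placeholder key, and the sentiment task itself are my own assumptions for illustration:

```python
# A few-shot prompt: two worked examples are enough to set the task.
import openai

openai.api_key = "sk-..."  # hypothetical placeholder

prompt = """Classify the sentiment of each review.

Review: The food was cold and the staff were rude.
Sentiment: negative

Review: Absolutely loved it, will come back!
Sentiment: positive

Review: The seats were cramped but the show was fantastic.
Sentiment:"""

resp = openai.Completion.create(
    model="text-davinci-003",   # assumed model; any completion model works
    prompt=prompt,
    max_tokens=3,
    temperature=0,
)
print(resp["choices"][0]["text"].strip())
```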

GPT-4 Emergent Abilities (list generated by Bing Chat):

  • Creativity: The ability to generate, edit, and iterate with users on creative and technical writing tasks, such as composing songs, writing screenplays, or learning a user’s writing style.
  • Visual input: The ability to process images as well as text and use them to generate relevant and coherent responses.
  • Longer context: The ability to handle much more nuanced instructions and maintain longer conversations than previous models.
  • Advanced reasoning: The ability to solve difficult problems with greater accuracy, thanks to its broader general knowledge and problem solving abilities.
  • Humour interpretation: The ability to interpret humour and distill the essence of the joke into a clear and logical explanation.
  • Theory of mind: The ability to impute unobservable mental states to others, such as beliefs, desires, intentions, emotions, etc. It is central to human social interactions, communication, empathy, self-consciousness, and morality. A study by Michal Kosinski tested several language models using 40 classic false-belief tasks widely used to test theory of mind in humans. The study found that GPT-4 solved nearly all the tasks (95%), suggesting that theory of mind-like ability may have spontaneously emerged as a byproduct of language models’ improving language skills. However, this ability may not be equivalent or comparable to human theory of mind, as it may depend on the input and the task, and may not reflect genuine understanding or empathy.

This is what Bing Chat (based on GPT-4 + search results) had to say about whether artificial general intelligence (AGI) has emerged, or can emerge, from a system like GPT:

"According to the search results, some researchers have claimed that GPT-4 is showing “sparks” of artificial general intelligence (AGI), meaning that it has capabilities that are at or above human level across a variety of domains and tasks. However, this claim is not universally accepted or proven, and there is no clear or agreed-upon definition of what constitutes AGI.

"It is possible that AGI could emerge in a GPT-like model at some point in the future, if the model is scaled up sufficiently and trained on diverse and high-quality data. However, it is not guaranteed or inevitable that this will happen, as there may be fundamental limitations or challenges that prevent GPT-like models from achieving true AGI, such as their reliance on next-word prediction, their lack of causal understanding, their potential for harmful outputs, and their alignment with human values and goals. Moreover, there may be alternative or better approaches to achieving AGI that do not rely on GPT-like models, such as hybrid systems that combine symbolic and sub-symbolic methods, or neuromorphic systems that mimic the structure and function of biological brains.

"Therefore, it is not clear or certain that GPT-like models are a valid or optimal approach to achieving AGI. However, they are still valuable and useful tools for advancing AI research and applications, and for exploring the possibilities and implications of AGI. They also pose significant ethical and social challenges that need to be carefully considered and addressed by researchers, developers, users, and policymakers."

---------------

I put the above together from notes I've been taking, partly for work. I thought you guys might find it useful. The progression towards AGI is fascinating and fills me with hope.


Carp
May 29, 2002

GPT does not regurgitate text. After training, the neural net's parameters are finalized, and it does not have access to any form of text. Base GPT does not reference a database. However, some products built on GPT give it access to outside data sources, such as Bing Chat with live search results or Wolfram Alpha (a GPT-4 plugin). How a data source is used during a session with a GPT product is determined by the natural-language instructions given to GPT before the session starts and by the instructions the user gives during the session. Nearly all the abilities people are excited about are emergent: they were discovered in, or coaxed from, GPT's neural net, which is a store of relationships large and small in scope, built on tokens the model itself discovered during training. No human ever gave it an explicit description of language. In fact, it trained on numerous languages all at once, consuming a huge collection of free-form, unlabeled text via self-supervised learning. It was never given labeled data sets.

GPT can learn things and can even be taught, within a session, to take on a specific role, limit its scope to a particular context, or perform other tasks. Using natural language, I was able to teach GPT to process claims at work, with the same paragraphs of text I would give a human. It was that simple. The problem I ran into, which would keep me from putting it into one of my company's products this morning, is the rare hallucination when GPT finds a connection to something outside the task's domain and becomes convinced it is relevant. But this is GPT-3.5, and I have heard there are significant improvements on that front in GPT-4.
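
For a sense of what that looks like in practice, here is a rough sketch of in-session instruction via the early-2023 ChatCompletion API. The claim fields and audit rules are invented for illustration; the real instructions are just paragraphs of text like these:

```python
# Teaching a role and scope via the system message, then handing it a claim.
import openai

openai.api_key = "sk-..."  # hypothetical placeholder

instructions = """You are a co-op advertising claim co-auditor.
For each claim, check: (1) the ad ran inside the program dates,
(2) the claimed amount does not exceed 50% of the invoice total.
Reply with APPROVE, REJECT, or NEEDS-HUMAN plus a one-line reason.
If anything falls outside these rules, reply NEEDS-HUMAN."""

claim = ("Invoice total $400, claimed $180, ad ran 2023-02-10, "
         "program runs Feb 1-28.")

resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": instructions},  # the "teaching"
        {"role": "user", "content": claim},
    ],
    temperature=0,
)
print(resp["choices"][0]["message"]["content"])
```

The final rule in the instructions is there to fence off exactly the out-of-domain hallucinations described above.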

There is indeed something different going on as the scale increases. This is not a beefed-up Siri or something preprogrammed as an expert system.


Carp
May 29, 2002

gurragadon posted:

Thanks for the post; the emergent abilities from these systems are really interesting and, to me, unexpected. Could you say what industry you work in and why you are looking into AI for it? If you can't, I understand, but I would be interested.

I'm a software engineer and developer at a company that processes advertising co-op claims. AI has been a back-burner interest of mine for decades, but I'm far from an expert and have had little experience coding AI systems, or coding with them.

[edit]

Err, meant to add: I'm looking into using GPT as a co-auditor to help human auditors triage claims. None of our customers would be remotely interested in claim processing being completely automated with AI. They hire us for our attention to detail and the support a human can provide.


Carp
May 29, 2002

BrainDance posted:

[...]
A thing I've been saying since I sorta stumbled into finetuning an incredibly small model (1.3B) into being roughly as good as GPT-3 on one specific task (but only that specific task) is that I think transformers can potentially be very useful for a huge variety of things we haven't really tried them on yet, if they're purpose-trained for that specific thing. GPT is very general, so it's going to be a limited jack of all trades. But a model that follows the same principles and is only trained to do one specific task, but of the same overall complexity, might be very, very capable and very useful at that one task.

There are also now plugins for GPT-4 that use, I believe, natural language to communicate back and forth. This lets GPT defer to more current sources of data, consume that data, and use it in its reasoned response. Plugins can also add features, though I'm not sure how that works. I hear memory could be, or already is, a feature plugin; if it is like other implementations I've seen, the memory is probably a condensed natural-language summary of previous sessions. There was a paper a week or two ago about how GPT-4 can use "tools": it understands instructions that tell it to pass information to an external command and use the value the command returns. That sounds very much like the recently released plugin architecture.
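
Here is a generic sketch of that tool-use loop. The TOOL: convention is something I made up for illustration; the real plugin protocol surely differs in detail:

```python
# The model is told it may emit a command; we run it and feed the result back.
import datetime
import openai

openai.api_key = "sk-..."  # hypothetical placeholder

system = ("If you need today's date, reply with exactly TOOL:today "
          "and nothing else. Otherwise answer the user directly.")

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": "How many days until 2023-12-25?"},
]

for _ in range(2):  # at most one tool round-trip in this toy loop
    resp = openai.ChatCompletion.create(model="gpt-3.5-turbo",
                                        messages=messages, temperature=0)
    reply = resp["choices"][0]["message"]["content"]
    if reply.strip() == "TOOL:today":
        # run the "external command" and hand the returned value back
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user",
                         "content": f"TOOL RESULT: {datetime.date.today()}"})
    else:
        print(reply)
        break
```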

The thought is that GPT can grow in data size and parameter count as versions are released, but it can also grow "sideways" with plugins and products, which may bring their own unexpected emergent abilities, in combination with other plugins or with GPT's base.


Carp
May 29, 2002

gurragadon posted:

Your job seems like a perfect fit for a collaborative AI program. The AI could perform a first check, then the human checks (I'm assuming it's already redundant on the human side), and then a final AI check for any missed issues. I guess the main thing would be whether it actually ends up being more accurate, or whether the AI program is incomplete enough that it adds more errors than it removes.

Exactly. The tests with plain-text claims were very promising, but GPT-3.5 has serious issues with dates. It hallucinates the current date (even when provided with the correct one) and makes errors in date comparison and calculation. So I can't proceed until the GPT-4 beta opens up to me (at the very least, tool use = an external date calculator). Additionally, we need GPT-4's image-parsing capabilities to authenticate claim attachments/PoPs.
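
In the spirit of that external date calculator, here is a sketch of doing the date arithmetic outside the model and feeding it only the computed results. The field names are invented for illustration:

```python
# Pre-compute date facts deterministically; the model never does the math.
from datetime import date

def preprocess_claim(claim):
    today = date.today()
    ran = date.fromisoformat(claim["ad_run_date"])
    start = date.fromisoformat(claim["program_start"])
    end = date.fromisoformat(claim["program_end"])
    return {
        "within_program_dates": start <= ran <= end,   # computed, not inferred
        "days_since_ad_ran": (today - ran).days,
    }

claim = {"ad_run_date": "2023-02-10",
         "program_start": "2023-02-01",
         "program_end": "2023-02-28"}
print(preprocess_claim(claim))   # feed these booleans/ints into the prompt
```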

Carp
May 29, 2002

Centurium posted:

This is, in fact, the output of an AI model. I say that because it is jam-packed with kinda-true-or-false claims backed up by explanations that are either unrelated or outside the scope of the claim being made. Pretty funny, though, how outrageous the claims are. GPT doesn't have access to text! What does the model act on, if not text?

Yeah, it probably was a poor overview. There wasn't really a clear audience in mind; it was just a collection of notes I was keeping for work, and the follow-ups were a continuation of that. There is plenty missing from the technical description. But I think you misunderstood, Centurium. I was not trying to defend a claim; I was trying to describe something very technical in a simpler way, by dispelling what I believe is a misperception of how the technology works. When I say it does not have access to text, I mean that it does not have access to the training data after it is trained, and that it does not use a database to look up answers. GPT infers a continuation of the text prompt.


Carp
May 29, 2002

Gumball Gumption posted:

Personally, I compare training data and the things AI then produces to pink slime, the stuff they make chicken McNuggets out of. It turns the training data into a slurry from which it can then produce results that are novel but still dependent on, and similar to, the original training data.

Right! It's kind of like the brain, but much simpler. I wonder how much of the next-token probability learned in one language carries over when the parameters are weighed to produce output in another language? The ability to translate was not expected of GPT.

Carp
May 29, 2002

BrainDance posted:

A thing I've been saying since I sorta stumbled into finetuning an incredibly small model (1.3B) into being roughly as good as GPT-3 on one specific task (but only that specific task) [...]

BrainDance, I meant to ask this earlier: which small model were you fine-tuning? GPT-3 is open source and uses TensorFlow and other common libraries, so it could be fun to translate it from Python to C#. However, that's likely laborious. Alternatively, I could try fine-tuning a small model for a specific task, as you suggested, and spend less time coding.

By the way, have you come across the paper that discusses fine-tuning (or maybe training with supervised learning) a small model using the GPT-3 API? The paper claimed it performed almost as well as GPT-3.

Carp
May 29, 2002

BrainDance posted:

I was finetuning GPT-Neo, mostly 1.3B, because anything larger needed a lot of RAM (DeepSpeed offloads some of the VRAM usage to normal RAM), and if you're using swap instead, the training time jumps from hours to days. LLaMA got a lot of people paying attention to this, though, and now we can use LoRA with the language models, so I've been doing that with LLaMA.

Didn't see that paper but if you find it again let me know.

Have you come across "GPT in 60 Lines of NumPy" as a way of understanding the basics? There's a post about it on Hacker News that I found informative. Although I may end up using something like LLaMA in the future, right now I feel I need to learn from scratch and build a foundation; I'm feeling a bit lost with my usual hacking methods of learning. Also, I realized I was mistaken earlier: GPT-3 isn't actually open source. There's a repository out there, but it is just data supporting a paper.
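
For anyone curious, the heart of that NumPy approach is just a few lines. Here is a toy causal self-attention, the operation at the core of GPT-style models (random toy values, single head, no learned projections):

```python
# Scaled dot-product attention with a causal mask: each token can only
# attend to tokens at or before its own position.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(q, k, v):
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                  # token-to-token attention scores
    mask = np.triu(np.ones((T, T)), k=1) * -1e10   # block attention to the future
    return softmax(scores + mask) @ v              # weighted mix of value vectors

rng = np.random.default_rng(0)
T, d = 5, 8                                        # 5 tokens, 8-dim embeddings
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
print(causal_self_attention(q, k, v).shape)        # (5, 8): one output per token
```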

Wow, yeah, LoRA sounds very interesting, and I agree that it would make fine-tuning a large model much easier. How has it worked out for you? There is so much new information out there about deep learning and LLMs. If I come across the paper again, I'll be sure to let you know.
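
For reference, a minimal LoRA setup with the Hugging Face peft library looks roughly like this (GPT-Neo 1.3B as in BrainDance's case; the hyperparameters are illustrative guesses, not tuned values):

```python
# LoRA freezes the base model and trains small low-rank adapter matrices.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")  # downloads the checkpoint

config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()        # a tiny fraction of the 1.3B is trainable
# ...then train with the usual Trainer/loop; only the adapters update.
```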

Carp
May 29, 2002

BrainDance posted:

[...]
Didn't see that paper but if you find it again let me know.

This is likely the closest I'm going to get to finding the paper, and it is more recent, but maybe not new to you. GPT4All is a LLaMA model finetuned on prompt-response pairs generated with the help of OpenAI's GPT-3.5-Turbo API.

https://github.com/nomic-ai/gpt4all
2023_GPT4All_Technical_Report.pdf
Self-Instruct: Aligning Language Models with Self-Generated Instructions

[edit] Found it! https://crfm.stanford.edu/2023/03/13/alpaca.html
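
The recipe in both links boils down to something like the sketch below: use an OpenAI model as a teacher to generate prompt-response pairs, then fine-tune a small model on them. The seed prompts, file name, and teacher model here are my own stand-ins:

```python
# Generate distillation training pairs with a "teacher" model and save them.
import json
import openai

openai.api_key = "sk-..."  # hypothetical placeholder

seed_prompts = [
    "Explain what a co-op advertising claim is in two sentences.",
    "Write a polite email rejecting an incomplete invoice.",
]

with open("distill_pairs.jsonl", "w") as f:
    for prompt in seed_prompts:
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",          # the teacher model
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
        )
        pair = {"prompt": prompt,
                "response": resp["choices"][0]["message"]["content"]}
        f.write(json.dumps(pair) + "\n")    # fine-tune the small model on these
```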

cat botherer posted:

I think transfer learning tools like LoRA are going to be the main way that stuff like ChatGPT gets used in industry. It's certainly been the main (only) way I've used language models in the past.

What have you used them for in the past, and what do you think of ChatGPT and GPT-4?



Carp
May 29, 2002


That's a pretty good summary. Much better than my notes earlier in the thread, which are a little confused.
