The Artificial Intelligence & Machine Learning Megathread

Macichne Leainig: Jul 26, 2012; by VG

Welcome to the Artificial Intelligence & Machine Learning megathread! :science:

Use this thread to discuss all things AI, ML, deep learning, GANs, CNNs, RNNs, transformers, data science, and all other kinds of gobbledygook!
This OP is a work in progress, and I'll admit upfront that I have a traditional software development background and therefore no significant formal education in data science or statistics.
So help make this OP better by PMing me suggestions or posting them in this thread.

What is Artificial Intelligence?

Broadly, Wikipedia defines artificial intelligence as "intelligence demonstrated by machines, as opposed to natural intelligence displayed by animals including humans." And I think it's fair to call it a broad concept, because there are a ton of definitions for AI. In practice, most of the AI you'll run into takes the form of machine learning, and more specifically, deep learning.

Okay, so what is Machine Learning then?

Again ripping from Wikipedia: "Machine learning (ML) is the study of computer algorithms that can improve automatically through experience and by the use of data."

Simply put: given data, an algorithm will learn, roughly, the relationship between the variables in your data. You might already be familiar with a line of best fit; that's a simplistic example of machine learning, but it should hopefully give you a rough idea of what machine learning aims to accomplish.
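To make the line-of-best-fit idea concrete, here's a minimal sketch of ordinary least squares in plain Python. The data points are made up for illustration, and fit_line is just a name I picked, not something from a library; in practice you'd reach for scikit-learn or NumPy instead.

```python
# Ordinary least squares for a straight line: y = slope * x + intercept.
# The data below is invented purely for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]  # these points lie exactly on y = 2x + 1

slope, intercept = fit_line(xs, ys)

# "Learning" here just means recovering the relationship, then using it
# to predict a value for an input the model has not seen.
prediction = slope * 6.0 + intercept
print(slope, intercept, prediction)
```

Real data is noisy, so the fitted line won't pass through every point, but the idea is the same: learn the relationship from examples, then predict.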

Of course, more excitingly, there's:

Deep Learning :pseudo:

Obligatory Wikipedia definition: "Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised."

That's a lot of stuff! The key word here is "deep", and that's because neural networks often have multiple layers. Each layer typically extracts higher-level features from the raw input. A tangible example would be detecting faces in an image: the early layers might learn edges, the middle layers might learn parts like eyes and noses, and the later layers might learn whole faces.
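If "multiple layers" sounds abstract, here's a toy sketch of data flowing through a small stack of fully connected layers, in plain Python. Everything in it is an assumption for illustration: the layer sizes are arbitrary, the weights are random and untrained, and a real network would be built and trained with a framework rather than by hand.

```python
import random


def relu(values):
    """Zero out negative values; a common non-linearity between layers."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out, rng):
    """Random, untrained weights. Training is what would turn these into useful features."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = relu(dense(activations, weights, biases))
    return activations


rng = random.Random(0)
layer_sizes = [8, 6, 4, 2]  # 8 raw inputs -> 6 -> 4 -> 2 outputs
layers = [random_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]

pixels = [rng.random() for _ in range(8)]  # stand-in for raw image data
output = forward(pixels, layers)
print(len(layers), "layers ->", output)
```

The "deep" part is just that loop in forward: each layer only ever sees what the previous layer produced, which is why later layers can end up representing higher-level things.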

Naturally, the problem with deep learning and having many layers is that it's expensive in both memory and compute. This is why desktop GPUs are often used for training deep learning models, and GPUs with lots of VRAM are preferred. That doesn't mean you need an RTX 3090; I've gotten away with an RTX 3070 with "only" 8GB of VRAM for deep learning, though I also have a machine with an RTX 2080 Ti that I use for more memory-intensive tasks. But a lot of modern deep learning frameworks (e.g., TensorFlow and PyTorch) are accelerated by NVIDIA's CUDA, and that means you might want a new-ish NVIDIA card to train deep neural networks. Some state-of-the-art models are even trained on multiple GPUs, but in practice I've found that only matters if you want to squeeze the highest mean Average Precision (mAP) out of your dataset or whatnot.
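For a rough feel of why VRAM matters, here's a back-of-the-envelope sketch. It assumes 32-bit floats and an Adam-style optimizer that keeps two extra values per parameter, and it deliberately ignores activations, which often dominate and grow with batch size and image size. The function names and the 25 million parameter figure are made up for illustration, so treat the result as a floor and not a real measurement.

```python
def training_memory_bytes(num_params, bytes_per_value=4, optimizer_copies=2):
    """Lower-bound estimate: weights + gradients + optimizer state, nothing else."""
    copies = 1 + 1 + optimizer_copies  # weights, gradients, optimizer state
    return num_params * bytes_per_value * copies


def to_gib(num_bytes):
    """Convert bytes to gibibytes (1 GiB = 1024**3 bytes)."""
    return num_bytes / (1024 ** 3)


example_params = 25_000_000  # hypothetical model size, not a specific network
estimate = training_memory_bytes(example_params)
print(f"{to_gib(estimate):.2f} GiB before activations")
```

The takeaway is that the parameters themselves are often the small part; what actually fills an 8GB card is usually the activations for a batch of large images.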

Okay, so what can you do with machine learning and deep learning?

Regression
Regression is a form of supervised learning - meaning the model requires input data and target output values to train - where you want the model to output predicted continuous values for arbitrary input data. For an example, I'll link to the linear models guide for the scikit-learn library - they can do a better job explaining than I can.

https://scikit-learn.org/stable/modules/linear_model.html

Classification
Classification is another form of supervised learning. Instead of outputting continuous values, a classification model outputs one of a discrete set of values. It can be binary classification - this is the classic Hotdog/Not Hotdog from Silicon Valley - or multi-class classification, for example with the COCO dataset, which has 80 distinct object classes - and anything in between, of course.
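As a toy illustration of binary classification, here's a sketch of a nearest-centroid classifier in plain Python: it averages the training samples for each label and assigns a new sample to whichever average is closest. The two-number "features" and the function names are invented for the example; a real Hotdog/Not Hotdog model would work on images and would almost certainly be a CNN.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def fit(samples, labels):
    """Group samples by label and return one centroid per label."""
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    return {label: centroid(points) for label, points in by_label.items()}


def predict(centroids, sample):
    """Return the label whose centroid is closest to the sample."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))


# Made-up 2D features; three labeled examples per class.
samples = [[1.0, 1.2], [0.8, 1.0], [1.2, 0.9], [4.0, 4.2], [3.8, 4.1], [4.2, 3.9]]
labels = ["hotdog"] * 3 + ["not hotdog"] * 3

centroids = fit(samples, labels)
print(predict(centroids, [1.1, 1.0]))
print(predict(centroids, [4.0, 4.0]))
```

Note that the output is always one of the labels seen during training, never a number in between; that's the difference from regression.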

Example of classification for hand-written digits with scikit-learn:
https://scikit-learn.org/stable/aut...assification-py

Data Generation
I don't know much about GANs personally, but you've probably seen them on sites such as https://thispersondoesnotexist.com which generates images of people that do not exist. On the text side there's OpenAI's GPT-3 at https://github.com/openai/gpt-3 (a transformer language model rather than a GAN), which has been used to generate text-based content. It's pretty neat stuff!
More??? No Terminators or Skynet yet, sorry.

Machine Learning Frameworks

For better or for worse, a lot of the popular frameworks are in Python these days. The two major deep learning frameworks are PyTorch and TensorFlow, while standard machine learning algorithms such as logistic regression and Support Vector Machines are part of scikit-learn. All three are fairly easy to use, and there are often larger libraries built on top of them, such as the OpenMMLab ecosystem and Facebook's own libraries like detectron2.

How to get started with AI/ML
There are tons of resources of varying quality out there. A few notable ones:

FastAI - Practical Deep Learning for Coders
Official TensorFlow tutorials
Official PyTorch tutorials
scikit-learn's official examples
Your suggestion here!

There's a lot to think about in the field of AI, ML, deep learning, or whatever buzzword is hot at the current time. My personal expertise is in computer vision, object detection and those sorts of tasks, so I'm happy to discuss and help out people with that in this thread.

In closing, I give you an image of a robot looking at a wall of science-y greeble, as this definitely represents AI/ML in a picture:

Macichne Leainig fucked around with this message at 18:01 on Feb 9, 2022

Feb 9, 2022 17:38

Macichne Leainig: Jul 26, 2012; by VG

Reserved for more.

Feb 9, 2022 17:38

Macichne Leainig: Jul 26, 2012; by VG

I feel the same. It's good that I have a background in both front-end and back-end development, because most of what I've done for work has been setting up tools to manage our datasets, models, etc., and not, y'know, data science. But hey, someone's gotta do it I guess.

Besides, we hired some people who actually know more of the data science stuff, and I'm absorbing a ton of information from them. It's nice.

Mar 7, 2022 16:18

Macichne Leainig: Jul 26, 2012; by VG

Raenir Salazar posted:

I have a project where I need to train a CNN to classify, let's say, images of cars by make: Ford, Ferrari, Mitsubishi, and Toyota. While I can get datasets with lots of cars, and maybe a dataset that is only Ford, I cannot seem to find a convenient dataset that has all four neatly divided into categories.

Is there a trick I can do on a dataset that I know has all four mixed together to neatly divide them (like I dunno, some sort of unsupervised clustering to try to pre-divide/sort the data to cut down on the work I have to do manually) or do I basically have to divide it all manually?

Or do we have a more dedicated data science thread for this question?

If the dataset isn't labeled in some way and you can't programmatically divide it, then an unsupervised algorithm won't really help you; there's enough natural variance in RGB images that you couldn't really separate the makes without a trained classifier in the first place.

Theoretically if a car manufacturer has a ubiquitous enough design language you wouldn't need that many samples but of course when you add wear and tear, different generations, etc into the mix that muddies the waters. But anyway this sentence is mostly rambling.

Since this is a uni project, a pre-made dataset is probably your best bet. This one seems to be from Stanford, has numerous samples, and has make/model in the labels from what I can tell. I couldn't tell you what the distribution of makes is, but I'd hope a Stanford dataset would have tried to somewhat balance that.

https://www.kaggle.com/datasets/jessicali9530/stanford-cars-dataset
http://ai.stanford.edu/~jkrause/cars/car_dataset.html

Kaggle is always a good resource for things like this in my experience.
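If the labels do include the make, splitting into your four categories is mostly string handling. Here's a rough sketch that assumes each annotation is a (filename, class name) pair and that the class name starts with the make; the file names, the sample class names, and both function names are made up for illustration, so check them against the actual annotation files before relying on this.

```python
from collections import defaultdict

WANTED_MAKES = ["Ford", "Ferrari", "Mitsubishi", "Toyota"]


def make_of(class_name, makes=WANTED_MAKES):
    """Return the wanted make the class name starts with, or None if there isn't one."""
    for make in makes:
        if class_name.startswith(make + " "):
            return make
    return None


def group_by_make(annotations):
    """Map each wanted make to the list of image files labeled with it."""
    grouped = defaultdict(list)
    for filename, class_name in annotations:
        make = make_of(class_name)
        if make is not None:  # silently skip makes we don't care about
            grouped[make].append(filename)
    return dict(grouped)


# Hypothetical annotations, only to show the shape of the data.
annotations = [
    ("00001.jpg", "Ford Mustang Convertible 2007"),
    ("00002.jpg", "Ferrari 458 Italia Coupe 2012"),
    ("00003.jpg", "Toyota Corolla Sedan 2012"),
    ("00004.jpg", "Audi TT Hatchback 2011"),
    ("00005.jpg", "Ford F-150 Regular Cab 2012"),
]

grouped = group_by_make(annotations)
for make, files in grouped.items():
    print(make, len(files), files)
```

Printing the counts per make is also a quick way to see how balanced the classes are before you start training.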

Jun 8, 2022 00:14

Macichne Leainig: Jul 26, 2012; by VG

Hammerite posted:

This is a very pedestrian question, but I wanted to try out the ChatGPT demo and when I try to log in it just says "The email you provided is not supported". This occurs consistently when trying to log in in several ways:

- with my personal microsoft account
- with my work microsoft account
- trying to create an account with my work email address

I hate to ask the obvious, but have you tried a different browser? For whatever reason I can't log in to my work's lovely monorepo app in Firefox, but anything Chromium-based is fine.

Apr 17, 2023 01:51

Macichne Leainig: Jul 26, 2012; by VG

Their servers are also getting slammed pretty hard, so it could just be that you caught it at a particularly busy moment and it barfed strangely. I have to admit I paid for a month of ChatGPT Plus just to try it out and it's not really worth it (still kind of buggy, GPT-4 is limited to 25 messages every three hours, and it still regularly fails to include the whole content on longer output).

Apr 17, 2023 21:35

Macichne Leainig: Jul 26, 2012; by VG

Microsoft has had a 49% stake in OpenAI for a while now, so it is realistically no different than it was before imho (that is to say, Capitalism Sucks)

Nov 20, 2023 15:28

Macichne Leainig: Jul 26, 2012; by VG

Unions kick rear end.

Lol, and even lmao to the board today. They're gonna need it

Nov 20, 2023 18:41

Macichne Leainig: Jul 26, 2012; by VG

The CEO can have those kinds of thoughts, I can't control his brain. But for the love of god you do not need to share every opinion you have online.

Nov 21, 2023 16:04

Macichne Leainig: Jul 26, 2012; by VG

The custom GPT stuff is pretty impressive given that the requirements for training a decent LLM with any accuracy are pretty god damned ridiculous in my experience. Abstracting all that away to yet another GPT prompt web interface is very useful and saves a ton of time. Naturally, though, we even had a discussion today at work about not relying on OpenAI stuff due to recent turbulence, so we are seeing what our company will allow us to afford in AWS.

Nov 21, 2023 20:01
