Discendo Vox
Mar 21, 2013

We don't need to have that dialogue because it's obvious, trivial, and has already been had a thousand times.
Thanks, these responses are very helpful. I need to figure out how to persuade the relevant stakeholders that this would be similarly straightforward.

Jabor posted:

Have you tried just pointing Tesseract at your data and seeing how well it does?

I literally do not know how to do that. I do know that one implementation of the highly similar use case I'd mentioned uses tesseract.

Discendo Vox fucked around with this message at 21:57 on Jul 3, 2022

Analytic Engine
May 18, 2009

not the analytical engine


https://ali-design.github.io/deepcreativity/

w00tmonger
Mar 9, 2011

F-F-FRIDAY NIGHT MOTHERFUCKERS

Sorry if this has been answered somewhere else, but I feel like I'm kind of in the deep end and don't know where to start.

I'm looking to work on a script to create product tags for some Shopify listings, and an image recognition API seems like it would be ideal. I know GPT-4 has some image recognition in it, but access seems a bit weird/inconsistent and might be overkill.

What I want to do is input a product title with an image (it's a 3D-printed miniature, so "vampire lord" and a pic of the sculpt), and have it spit out a list of product tags for my search/product categories. I spend a ton of time indexing search terms, so some API would save me a ton of time.

cinci zoo sniper
Mar 15, 2013




w00tmonger posted:

Sorry if this has been answered somewhere else, but I feel like I'm kind of in the deep end and don't know where to start.

I'm looking to work on a script to create product tags for some Shopify listings, and an image recognition API seems like it would be ideal. I know GPT-4 has some image recognition in it, but access seems a bit weird/inconsistent and might be overkill.

What I want to do is input a product title with an image (it's a 3D-printed miniature, so "vampire lord" and a pic of the sculpt), and have it spit out a list of product tags for my search/product categories. I spend a ton of time indexing search terms, so some API would save me a ton of time.

To clarify, you have a collection of tags, and for each pair of title+photo you want an automated way to select the best matching tags from the collection?

w00tmonger
Mar 9, 2011

F-F-FRIDAY NIGHT MOTHERFUCKERS

cinci zoo sniper posted:

To clarify, you have a collection of tags, and for each pair of title+photo you want an automated way to select the best matching tags from the collection?

Sort of. Shopify works off collections, which for me would be broad categories of sculpts (undead, human, beasts, terrain, etc.), with some sub-categories (undead would have vampires, ghosts, zombies, etc.).

Each product has tags which describe the product, which I can use to pop an item into a category, but which are also used for searchability.

I want it to output any matching tags I've made so a listing can be assigned to any relevant categories, but also potentially add some tags I haven't thought of for searchability. The second part might be overkill; it may just make more sense to generate a huge list of predefined tags to constrain it.

Ex: I have a vampire castle sculpt. I want to run it through a tool and have it output the tags undead, vampire, and terrain. It would then further give me a handful of adjacent tags like Dracula, Transylvania, count, etc. for search.

Edit: I feel like I would need to train my own model, given I want it to know what a mini of a vampire looks like. I have titles tied to 1.5k+ miniatures, but I feel like I might need to do something more broad unless there's an existing solution.

w00tmonger fucked around with this message at 19:58 on Apr 16, 2023

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
This is a very pedestrian question, but I wanted to try out the ChatGPT demo and when I try to log in it just says "The email you provided is not supported". This occurs consistently when trying to log in in several ways:

- with my personal microsoft account
- with my work microsoft account
- trying to create an account with my work email address

For the microsoft account ones, I had to click through a page granting access to my profile to OpenAI, and I got notifications that the app was associated with my account. But even after granting that access, I still got the "email not supported" message when trying to log in.

Has anyone encountered this problem and got past it?

I googled 'chatGPT "The email you provided is not supported"' and got a load of SEO spam. I also googled 'chatGPT "The email you provided is not supported" reddit' and found a couple of Reddit threads where people were complaining about having the same problem, but none of them could provide consensus on what a solution might be. There was one guy who described a crazy procedure involving connecting to VPNs and using Tor which, if that's what I need to do to get to this thing, then gently caress that.

cinci zoo sniper
Mar 15, 2013




w00tmonger posted:

Sort of. Shopify works off collections, which for me would be broad categories of sculpts (undead, human, beasts, terrain, etc.), with some sub-categories (undead would have vampires, ghosts, zombies, etc.).

Each product has tags which describe the product, which I can use to pop an item into a category, but which are also used for searchability.

I want it to output any matching tags I've made so a listing can be assigned to any relevant categories, but also potentially add some tags I haven't thought of for searchability. The second part might be overkill; it may just make more sense to generate a huge list of predefined tags to constrain it.

Ex: I have a vampire castle sculpt. I want to run it through a tool and have it output the tags undead, vampire, and terrain. It would then further give me a handful of adjacent tags like Dracula, Transylvania, count, etc. for search.

Edit: I feel like I would need to train my own model, given I want it to know what a mini of a vampire looks like. I have titles tied to 1.5k+ miniatures, but I feel like I might need to do something more broad unless there's an existing solution.

So, this is 2 separate tasks – 1) get tags for image, 2) generate new tags based on existing tags. The latter is something you can credibly do with any text model starting with GPT-3, after some prodding. The former, in industry terminology, would be “image classification”, assuming you have clean photos where the miniature is the only “feature” of the image. GPT-4 is multimodal and accepts image inputs, but the image ingestion API is not enabled at the moment in the public OpenAI service, and it may ultimately depend on access to the 32k context length model, which is not publicly available either, as yet. So you may have to shop around for some other model, and either consider training your own right away, or investigate fine-tuning an existing model with your data at a later point.
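
A rough sketch of what the output side of task 1 might look like, assuming some image classifier (not shown here) has already produced a score per label. The taxonomy dictionary, the threshold, the scores and the function name are all made up for illustration; only the undead/vampire/terrain example comes from the posts above.

```python
# Hypothetical taxonomy: sub-category tag -> parent collection tag.
PARENT = {
    "vampire": "undead",
    "ghost": "undead",
    "zombie": "undead",
    "castle": "terrain",
}


def tags_from_scores(scores, threshold=0.5):
    """Keep labels scoring at or above the threshold and add their parent collections."""
    tags = set()
    for label, score in scores.items():
        if score >= threshold:
            tags.add(label)
            if label in PARENT:
                tags.add(PARENT[label])
    return sorted(tags)


# Made-up classifier output for a "vampire castle" sculpt.
example_scores = {"vampire": 0.91, "castle": 0.84, "zombie": 0.07}
print(tags_from_scores(example_scores))
```

Constraining the output to a predefined vocabulary like this keeps the collection assignment predictable; the free-form "adjacent tags" from task 2 would be a separate step.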

w00tmonger
Mar 9, 2011

F-F-FRIDAY NIGHT MOTHERFUCKERS

cinci zoo sniper posted:

So, this is 2 separate tasks – 1) get tags for image, 2) generate new tags based on existing tags. The latter is something you can credibly do with any text model starting with GPT-3, after some prodding. The former, in industry terminology, would be “image classification”, assuming you have clean photos where the miniature is the only “feature” of the image. GPT-4 is multimodal and accepts image inputs, but the image ingestion API is not enabled at the moment in the public OpenAI service, and it may ultimately depend on access to the 32k context length model, which is not publicly available either, as yet. So you may have to shop around for some other model, and either consider training your own right away, or investigate fine-tuning an existing model with your data at a later point.

Yeah I did a bit more digging around and 2 definitely seems easier/more-secondary.

If I can manage to build out a dataset for a model, I might be able to use tensorflow. I was hoping something might be a bit more baked re gpt4, but yeah the API isn't publicly available yet and is probably going to cost a pretty penny. If I can make my own model I could in theory run it locally on a timescale that works for my needs though so there is that at least

https://www.tensorflow.org/tutorials/images/classification

Macichne Leainig
Jul 26, 2012

by VG

Hammerite posted:

This is a very pedestrian question, but I wanted to try out the ChatGPT demo and when I try to log in it just says "The email you provided is not supported". This occurs consistently when trying to log in in several ways:

- with my personal microsoft account
- with my work microsoft account
- trying to create an account with my work email address

I hate to ask the obvious but have you tried a different browser? For whatever reason I can't login to my work's lovely monorepo app in Firefox but anything Chromium-based is fine

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Macichne Leainig posted:

I hate to ask the obvious but have you tried a different browser? For whatever reason I can't login to my work's lovely monorepo app in Firefox but anything Chromium-based is fine

That is in no way obvious. It did however work. I can log in just fine in Firefox. I think that the fact they issue an error message saying that there is a problem with my email address, when in fact the problem is something else entirely, makes them appear extremely incompetent. Once I logged in they showed a message inviting me to join their Discord server and provide any feedback I have, so I did.

Macichne Leainig
Jul 26, 2012

by VG
Their servers are also getting slammed pretty hard, so it could just be that you caught it at a particularly busy moment and it barfed strangely. I have to admit I paid for a month of ChatGPT Plus just to try it out and it's not really worth it (still kind of buggy, GPT-4 limited to 25 messages every three hours and still fails to include the whole content regularly on longer output.)

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
(this has no relation to my previous posts in this thread) (also it's me, someone with no actual knowledge of ML speaking my thoughts aloud and rubber-ducking, and you shouldn't read it because it'll be nonsense)

I find the idea of getting a computer to play strategy games using machine learning really interesting, but I haven't actually done anything to get experience in this. I'm a software developer, but I don't do anything to do with ML in my day job. I used to run a lovely website that implemented a niche boardgame so that people could play it online (I've taken it down due to concerns about personal legal exposure and users' data - the site was launched some years before GDPR was a reality, and was poorly coded). I would like to re-create the website, but to do it better with several years of working professionally as a developer under my belt. I'd also really like to implement an AI opponent - something the original implementation never had - and that's the idea that really interests me.

I recall hearing on the news a few years ago about how prominent teams in the field had created AIs that play Chess and Go at high levels, and how they did it without external training data because they would have their AIs play against one another and generate the training data from those games (possible I misunderstood aspects of this). I found this project (and an article by its author promoting it) which is an implementation of this sort of idea. There's a lot of jargon I don't have a handle on. If I understand correctly, the idea is that after a move has been made, the board state is evaluated for the probability that the AI player wins the game. Higher probability of winning => higher reward. And this is the basis for the training process. I could be fundamentally misunderstanding the ideas here, and inventing details to fill the gaps.

The author has an example of training an AI to play "Sushi Go", which is a card game of fairly low complexity. I noticed that the logic for creating input to the AI is basically turning information about the game into a huge 1-dimensional vector of floating point numbers (which are mostly booleans in disguise). But this implies that the AI will be trained to assume a very specific deck composition. As soon as you play with a slightly different deck - or more realistically for what I might use this idea for, a different board with different action spaces - the AI is totally useless. Are there models that allow you to associate different game resources and concepts with each other? Things like "there are n cards that are associated with this space on the board, and this many of them have been seen, and this many are in my hand"? Or "there are connections to this location on the board from such and such a set of other locations" - that can be combined in some way with information on what control I have on each of those locations? I'm sorry, this is probably being hopelessly vague. I guess what I'm trying to say is, are there models for an AI that allow you to pass in a set of inputs of variable size, or even have layers of the network that are instanced with one instance for each member of an index set? (I am aware of the concept of convolutional layers in image classification networks, so really what I'm thinking of is something like that but indexed by a discrete index set rather than by spatial extent.)
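
One possible reading of the "one instance per member of an index set" idea, sketched in plain Python: apply the same small layer to every item, sum the results, then apply an output layer. Designs along these lines are sometimes called "Deep Sets"; whether that fits this particular game is an open question. The weights here are random rather than trained, and all names and sizes are invented for the example.

```python
import math
import random


def make_weights(n_in, n_out, rng):
    """Random weight matrix with n_out rows of n_in values (untrained, illustration only)."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


def dense_tanh(weights, vec):
    """One fully connected layer followed by tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, vec))) for row in weights]


def encode_set(items, elem_weights, out_weights):
    """Shared per-item layer, then sum pooling, then an output layer."""
    pooled = [0.0] * len(elem_weights)
    for item in items:
        hidden = dense_tanh(elem_weights, item)  # same weights for every item
        pooled = [p + h for p, h in zip(pooled, hidden)]
    return dense_tanh(out_weights, pooled)


rng = random.Random(0)
elem_w = make_weights(3, 4, rng)  # each item is described by 3 features
out_w = make_weights(4, 2, rng)   # the final output has 2 values

hand_a = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.2]]
hand_b = hand_a + [[0.3, 0.3, 0.3]]  # a different number of items

print(encode_set(hand_a, elem_w, out_w))
print(encode_set(list(reversed(hand_a)), elem_w, out_w))  # order does not matter
print(encode_set(hand_b, elem_w, out_w))                  # still 2 output values
```

Because the pooling step is a sum, the output size stays fixed no matter how many items go in, and shuffling the items does not change the result.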

Also, I work day to day in C# and I don't like Python as a language to write more than small scripts in. Has anyone used ML.NET? Is it any good?

Hammerite fucked around with this message at 22:02 on Apr 23, 2023

QuarkJets
Sep 8, 2008

You can use an IDE to add static type-checking in a Python project

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
I have edited my post to be less rude about Python, because I don't want all responses to my post to focus on that part of it.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
I went back and tried to read the material explaining that library more carefully. I still don't understand what's going on, but I think I understand a little better what it is I don't understand.

My existing mental model for how all this works is the "supervised learning" model, having read some explanations of it a few years ago and having found that it made sense to me. I read some "neural networks for beginners" free online text, which explained supervised learning and gradient descent through a metaphor of a ball rolling around a multi-dimensional landscape (representing a loss function) descending to a local minimum.
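
For reference, the "rolling ball" picture can be sketched in a few lines. This is a toy example with a made-up one-dimensional loss, not anything taken from the library being discussed:

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def grad(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w, learning_rate=0.1, steps=100):
    """Repeatedly step downhill, against the sign of the gradient."""
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


w_final = gradient_descent(w=0.0)
print(w_final, loss(w_final))
```

Each step moves `w` a little way downhill, so it ends up very close to 3, where the loss is smallest.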

But that library doesn't use supervised learning, or not in the simple way I understand it. It uses something called "proximal policy optimization", which per Wikipedia is a kind of "reinforcement learning", which is a distinct paradigm to supervised learning. What I don't understand is how proximal policy optimization works - how the neural net is trained. It seems like it's more complex than the basic supervised-learning idea I read about a few years ago. I need to find an explanation for a non-specialist of how it works, ideally one that I can understand using some kind of metaphor like the "rolling ball" one.

My assumption - from a position of ignorance - was that an effective way to select actions for an AI would be:
- iterate over all possible actions (or a sample of them, if there are very many)
- for each action, use the state of the board afterwards as the input to a neural network
- the outputs of the neural network represent a prediction of which player will win
- choose the action that gives the greatest likelihood that the AI itself will win

The issue with that of course is that if you are doing "supervised learning" then you need a canonical answer, the "correct answer", as to which player is most likely to win... and how do you get to that with no data? That's probably why it's not done that way, I guess.
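
The selection loop described in the list above might look roughly like this. The game is a toy (a pile of stones, take 1 to 3, whoever takes the last stone wins), and `win_probability` is a hand-written stand-in for the trained network, so every name here is an assumption for illustration:

```python
def legal_moves(stones):
    """A player may take 1, 2 or 3 stones, but no more than remain."""
    return [take for take in (1, 2, 3) if take <= stones]


def win_probability(stones_after_move):
    """Stand-in for a trained network: estimated chance that the player who just moved wins."""
    # In this toy game, leaving a multiple of 4 is a winning position for the mover.
    return 0.9 if stones_after_move % 4 == 0 else 0.1


def choose_action(stones, evaluate=win_probability):
    """Try every legal move and keep the one whose resulting position scores highest."""
    best_move, best_value = None, -1.0
    for take in legal_moves(stones):
        value = evaluate(stones - take)
        if value > best_value:
            best_move, best_value = take, value
    return best_move


print(choose_action(10))
```

The open question from the post remains: this only works as well as the evaluator does, and the sketch says nothing about how to train one.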

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Your naive assumption about an effective way to make an AI turns out to not be very effective in practice. Machine learning models turn out to be poor at consistently picking good moves, and an AI that makes the best move 90% of the time and horrific blunders the other 10% is very weak compared to a competent player.

The winning strategy (at least as used in the current most successful game-playing AIs) is to pair machine learning models with Monte-Carlo Tree Search. MCTS judges the strength of a position by making random moves for each player until the end of the game is reached, repeated a large number of times (e.g. 10,000) for each candidate action. The candidate action which wins the highest proportion of rollouts is the best one to select.
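
A minimal sketch of the rollout idea as described here, on a toy game (a pile of stones, take 1 to 3, whoever takes the last stone wins). This is only the flat "random playouts per candidate action" part; full MCTS also grows a search tree and balances exploring new moves against revisiting promising ones. Function names and the rollout count are arbitrary choices for the example.

```python
import random


def legal_moves(stones):
    """A player may take 1, 2 or 3 stones, but no more than remain."""
    return [take for take in (1, 2, 3) if take <= stones]


def random_playout(stones, player_to_move, rng):
    """Play uniformly random moves to the end and return the winner (0 or 1)."""
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player_to_move  # this player took the last stone
        player_to_move = 1 - player_to_move


def rollout_win_rate(stones, take, n_rollouts, rng):
    """Fraction of random playouts won by player 0 after player 0 takes `take` stones."""
    remaining = stones - take
    if remaining == 0:
        return 1.0  # taking the last stone wins immediately
    wins = sum(random_playout(remaining, 1, rng) == 0 for _ in range(n_rollouts))
    return wins / n_rollouts


def choose_by_rollouts(stones, n_rollouts=2000, seed=0):
    """Pick the candidate move with the highest rollout win rate."""
    rng = random.Random(seed)
    rates = {take: rollout_win_rate(stones, take, n_rollouts, rng)
             for take in legal_moves(stones)}
    return max(rates, key=rates.get), rates


move, rates = choose_by_rollouts(3)
print(move, rates)
```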

duck monster
Dec 15, 2004

Hammerite posted:

This is a very pedestrian question, but I wanted to try out the ChatGPT demo and when I try to log in it just says "The email you provided is not supported". This occurs consistently when trying to log in in several ways:

- with my personal microsoft account
- with my work microsoft account
- trying to create an account with my work email address

For the microsoft account ones, I had to click through a page granting access to my profile to OpenAI, and I got notifications that the app was associated with my account. But even after granting that access, I still got the "email not supported" message when trying to log in.

Has anyone encountered this problem and got past it?

I googled 'chatGPT "The email you provided is not supported"' and got a load of SEO spam. I also googled 'chatGPT "The email you provided is not supported" reddit' and found a couple of Reddit threads where people were complaining about having the same problem, but none of them could provide consensus on what a solution might be. There was one guy who described a crazy procedure involving connecting to VPNs and using Tor which, if that's what I need to do to get to this thing, then gently caress that.

ChatGPT can be weird to sign up for. As an alternative, perhaps sign up for poe.com. It's got an interface to a bunch of these models, in limited capacities. I'm rather fond of "Claude" from Anthropic; it's about on level with GPT-4/ChatGPT Plus, perhaps a little behind, but it's ever so well behaved and seems pretty good at explaining itself. The only problem is one it shares with GPT, which is hallucinating its brain out if you ask for citations. A lot of folk in the AI research community are fond of Claude, as its "Constitutional AI" training (which apparently is slightly different to the usual RLHF method of politeness training) seems to work really well, so it's kind of a helpful friendly little dude without Bing's freakouts.

duck monster fucked around with this message at 06:12 on Apr 26, 2023

Turambar
Feb 20, 2001

A Túrin Turambar turun ambartanen
Grimey Drawer
I'm really starting to fear for my job

https://twitter.com/0xgaut/status/1658175583853977600

Keisari
May 24, 2011


Lmao, that is amazing. Chatgpt isn't usually so catty.

Edit:
But it's probably fake or was asked to provide insulting responses

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


That's entirely plausible. ChatGPT is trying to predict what a human would say in response to the prompt and it probably has a lot of examples of that response in its training data.

poty
Jun 21, 2008

虹はどこで終わるのですか? あなたの魂の中で、または地平線で?
I'm currently working on "babby's first ML project" for a school assignment about setting up an SVM classifier to guess if a stock will go up the next day. It does this from a handful of features I came up with (that obviously can't predict that) using sklearn on Python.

The thing that frustrates me about sklearn is that I don't know how to guess whether a model fitting (as part of a RandomizedSearchCV for example) is going to take 5 seconds, or 5 minutes, or seemingly infinite time. I wish you could set some sort of timeout value like "if this set of parameters hasn't finished the fitting process in 5 seconds just kill it" but it doesn't look like that exists.

poty fucked around with this message at 18:26 on Jun 3, 2023

CarForumPoster
Jun 26, 2013

⚡POWER⚡

poty posted:

I'm currently working on "babby's first ML project" for a school assignment about setting up an SVM classifier to guess if a stock will go up the next day. It does this from a handful of features I came up with (that obviously can't predict that) using sklearn on Python.

The thing that frustrates me about sklearn is that I don't know how to guess whether a model fitting (as part of a RandomizedSearchCV for example) is going to take 5 seconds, or 5 minutes, or seemingly infinite time. I wish you could set some sort of timeout value like "if this set of parameters hasn't finished the fitting process in 5 seconds just kill it" but it doesn't look like that exists.

You can create watchdog timers in Python, then run .fit() inside them.

poty
Jun 21, 2008

虹はどこで終わるのですか? あなたの魂の中で、または地平線で?

CarForumPoster posted:

You can create watchdog timers in Python, then run .fit() inside them.

Thanks for the suggestion, but I don't think it works if I understand things correctly. There is only one call to .fit() for the whole RandomizedSearchCV optimization process, as opposed to one call to .fit() for each of the 100 sets of parameters (5 of which might never complete). I can put the global .fit() call inside a timer but then I don't get results for any set of parameters if one is slow (it's the same result as what I'm doing now, killing the kernel in Jupyter Notebook).

I guess the solution is to evaluate the sets of parameters "manually" without using something like RandomizedSearchCV or GridSearchCV, then I would in fact have access to the individual .fit() calls and could do that.
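
A rough sketch of that manual approach: run each candidate in its own process and kill it if it overruns. It assumes a Unix-like system (it uses the "fork" start method), and `slow_fit` is a hypothetical stand-in for whatever fits and scores one parameter set. With scikit-learn the candidates could come from `ParameterSampler` and the scoring from `cross_val_score`, but that wiring is not shown here.

```python
import multiprocessing as mp
import time


def _worker(conn, func, args):
    """Run func in the child process and send back its result or its error."""
    try:
        conn.send(("ok", func(*args)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_with_timeout(func, args=(), timeout=5.0):
    """Return ("ok", result), ("error", message) or ("timeout", None)."""
    ctx = mp.get_context("fork")  # assumption: Unix-like OS
    recv_end, send_end = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_worker, args=(send_end, func, args))
    proc.start()
    send_end.close()  # the parent only reads from the pipe
    try:
        if recv_end.poll(timeout):
            try:
                outcome = recv_end.recv()
            except EOFError:
                outcome = ("error", "worker exited without a result")
        else:
            outcome = ("timeout", None)
    finally:
        if proc.is_alive():
            proc.terminate()  # kill a fit that is still running
        proc.join()
        recv_end.close()
    return outcome


def slow_fit(params):
    """Hypothetical stand-in for fitting and scoring one parameter set."""
    time.sleep(params["seconds"])
    return 1.0 / (1.0 + params["seconds"])


def search(candidates, evaluate, timeout):
    """Evaluate every candidate, recording (params, status, score) for each."""
    return [(params, *run_with_timeout(evaluate, (params,), timeout))
            for params in candidates]
```

A call such as `search([{"seconds": 0.01}, {"seconds": 30}], slow_fit, timeout=2.0)` would keep the first result and mark the second as timed out. Separately, `SVC` has a `max_iter` parameter that caps solver iterations, which may be a simpler way to bound fit time for this particular model.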

CarForumPoster
Jun 26, 2013

⚡POWER⚡

poty posted:

Thanks for the suggestion, but I don't think it works if I understand things correctly. There is only one call to .fit() for the whole RandomizedSearchCV optimization process, as opposed to one call to .fit() for each of the 100 sets of parameters (5 of which might never complete). I can put the global .fit() call inside a timer but then I don't get results for any set of parameters if one is slow (it's the same result as what I'm doing now, killing the kernel in Jupyter Notebook).

I guess the solution is to evaluate the sets of parameters "manually" without using something like RandomizedSearchCV or GridSearchCV, then I would in fact have access to the individual .fit() calls and could do that.

Ahh, yea, I misunderstood.

Keisari
May 24, 2011

poty posted:

Thanks for the suggestion, but I don't think it works if I understand things correctly. There is only one call to .fit() for the whole RandomizedSearchCV optimization process, as opposed to one call to .fit() for each of the 100 sets of parameters (5 of which might never complete). I can put the global .fit() call inside a timer but then I don't get results for any set of parameters if one is slow (it's the same result as what I'm doing now, killing the kernel in Jupyter Notebook).

I guess the solution is to evaluate the sets of parameters "manually" without using something like RandomizedSearchCV or GridSearchCV, then I would in fact have access to the individual .fit() calls and could do that.

Sadly the only solution I have come up with is to extend the RandomizedSearchCV/GridSearchCV code itself to allow for this. And I can't be arsed so I just dealt with it.

hyphz
Aug 5, 2003

Number 1 Nerd Tear Farmer 2022.

Keep it up, champ.

Also you're a skeleton warrior now. Kree.
Unlockable Ben
Is there a good clear explanation of how backpropagation works and why?

Analytic Engine
May 18, 2009

not the analytical engine

hyphz posted:

Is there a good clear explanation of how backpropagation works and why?

9 years old:
http://karpathy.github.io/neuralnets/

newer:
https://cs231n.github.io/optimization-2/

hyphz
Aug 5, 2003

Number 1 Nerd Tear Farmer 2022.

Keep it up, champ.

Also you're a skeleton warrior now. Kree.
Unlockable Ben

That made several things much clearer - thanks!

Entropist
Dec 1, 2007
I'm very stupid.
Here's a more accessible and high level description of some issues by Karpathy: https://karpathy.medium.com/yes-you-should-understand-backprop-e2f06eab496b

hyphz
Aug 5, 2003

Number 1 Nerd Tear Farmer 2022.

Keep it up, champ.

Also you're a skeleton warrior now. Kree.
Unlockable Ben
Well, that article also confirms my suspicion that there are tensorflow script kiddies now.

Irony.or.Death
Apr 1, 2009


Yeah, that started even before they got bored of crypto - probably at least three years ago, maybe more

Stubear St. Pierre
Feb 22, 2006

hyphz posted:

Is there a good clear explanation of how backpropagation works and why?

Backprop is a computer sciencey version of the chain rule from elementary calculus. If you (or anyone who ever reads this) don't know elementary calculus, here's a crash course:

- The derivative: If you take the slope of a line over an infinitely small interval, it tells you the "instantaneous rate of change" of a function. You can tell whether the function is increasing, decreasing, or staying the same (at a maximum or minimum value) based on whether that value is positive, negative, or 0.
- The derivative of F(X) is written F'(X).
- In machine learning, you're trying to minimize a loss function. So that's why you need a derivative--I know I'm at a minimum if my derivative is negative for a while, then turns to 0.
- If you have multiple variables, F(x, y, z, t, w, ...) then let's just say we call this a "gradient." It tells us how much we're changing in a whole bunch of directions at once.

So here's backpropagation:

1) if I have some poo poo, and multiply that poo poo by a bunch of matrices, that's a composite function--x times a matrix H, times a matrix G, times a matrix F etc. can be written as F(G(H(x))). This is what deep learning does. All deep learning actually is is a graph with tensors as the nodes and ops as the edges connecting those tensors.
1.5) let's assume we can calculate a derivative of a function on a computer pretty easily.
2) the chain rule: the derivative ("gradient", it's actually the Jacobian iirc but who cares) of a well-behaved* composite function, F(G(H(X))), is F'(G(H(X))) * G'(H(X)) * H'(X). There's a very easy proof/derivation of this actually, but you can live a rich and fulfilling life without ever bothering to look it up or verify it.

So, backpropagation:
- In deep learning, I'm doing ((x * H) * G) * F, where let's say our * operator boils down to matrix multiplication or something similar.
- I can rewrite that as F(G(H(x))).
- As I go through each step, I can compute a derivative, F', G', H' and set up placeholders for when I know what G(X) and H(X) are.
- I can then go backwards when I'm done, and start with H'(X), then plug H(X) into my placeholder for G'(H(X)), on and on and on.
- That's backpropagation, and I finally get something that behaves like a derivative from that, and that derivative will tell me if I'm heading in the right direction.

So if I calculate H'(?), G'(?), F'(?) as I go, where "?" is a placeholder, I can just shove the outputs of F(G(H(x))) (because I'm computing them sequentially) into those derivatives and "backpropagate" the gradients.
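
As a concrete toy instance of the above, here is a scalar version with three arbitrarily chosen functions (H multiplies by 3, G squares, F is sine). The forward pass keeps its intermediate values, and the backward pass plugs them into the local derivatives and multiplies. The finite-difference check is only there to confirm the result; the specific functions are assumptions for illustration.

```python
import math


def forward(x):
    """Compute F(G(H(x))) and keep the intermediates for the backward pass."""
    h = 3.0 * x          # H(x)
    g = h ** 2           # G(H(x))
    f = math.sin(g)      # F(G(H(x)))
    return f, (x, h, g)


def backward(cache):
    """Chain rule: F'(G(H(x))) * G'(H(x)) * H'(x), using the stored intermediates."""
    x, h, g = cache
    d_f_d_g = math.cos(g)  # F' evaluated at G(H(x))
    d_g_d_h = 2.0 * h      # G' evaluated at H(x)
    d_h_d_x = 3.0          # H' evaluated at x
    return d_f_d_g * d_g_d_h * d_h_d_x


def numerical_derivative(x, eps=1e-6):
    """Central finite difference, used only as a sanity check."""
    return (forward(x + eps)[0] - forward(x - eps)[0]) / (2 * eps)


value, cache = forward(0.5)
print(backward(cache), numerical_derivative(0.5))
```

The two printed numbers should agree to several decimal places, which is the usual way of checking a hand-written backward pass.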

In TensorFlow this is done by creating a second, backward graph before running your stuff; in PyTorch every tensor you create has a "backward" method that gets called by the autograd engine (remember, everything in TF/PT are tensors, the whole notion of "layers" is syntactic sugar).

*"well-behaved" in this context means continuously differentiable on the interval you're interested in or something

It really is that simple!

street doc
Feb 20, 2019

Real neurons don’t work like this at all though, right? It’s all spiking neurons and summation and excitation etc etc. Plus reinforced connections that are highly dependent on input spike timing…

Stubear St. Pierre
Feb 22, 2006

Yeah the actual human brain works basically nothing like a deep neural network (to the extent we know how it works at all). Neural networks and "activations" get their name from superficial similarities.

The branch of CS devoted to simulating brains is called "neuromorphic" computing, that's basically the extent of my knowledge on it but there's a Wikipedia article about it https://en.wikipedia.org/wiki/Neuromorphic_engineering

Rahu
Feb 14, 2009


let me just check my figures real quick here
Grimey Drawer
I've been trying to learn some ML stuff lately and to that end I've been reading over Andrej Karpathy's nanoGPT.

I think I have a pretty good grasp on how it works but I'm curious about one specific bit. The training script loads a binary file full of 16-bit ints that represent the tokenized input. It has a block of code that looks like this

https://github.com/karpathy/nanoGPT/blob/7fe4a099ad2a4654f96a51c0736ecf347149c34c/train.py#L116

code:
data = np.memmap(os.path.join(data_dir, 'train.bin'), dtype=np.uint16, mode='r')
ix = torch.randint(len(data) - block_size, (batch_size,))
x = torch.stack([torch.from_numpy((data[i:i+block_size]).astype(np.int64)) for i in ix])
What I'm curious about is: what is the purpose of doing `astype(np.int64)` here? The data is written out as 16 bit uints, then loaded as 16 bit uints, then reinterpreted as 64 bit ints when converting from numpy to pytorch and I just don't see what that achieves.

Xun
Apr 25, 2010

Anyone going to ICML? I'm not presenting (rip) but I managed to get a travel grant for it anyway :shrug:

pangstrom
Jan 25, 2003

Wedge Regret
This is a factoid from a previous life but you used to hear that backprop was biologically implausible except for maybe kinda-sorta in the cerebellum.

Stubear St. Pierre
Feb 22, 2006

Rahu posted:

I've been trying to learn some ML stuff lately and to that end I've been reading over Andrej Karpathy's nanoGPT.

I think I have a pretty good grasp on how it works but I'm curious about one specific bit. The training script loads a binary file full of 16-bit ints that represent the tokenized input. It has a block of code that looks like this

https://github.com/karpathy/nanoGPT/blob/7fe4a099ad2a4654f96a51c0736ecf347149c34c/train.py#L116

code:
data = np.memmap(os.path.join(data_dir, 'train.bin'), dtype=np.uint16, mode='r')
ix = torch.randint(len(data) - block_size, (batch_size,))
x = torch.stack([torch.from_numpy((data[i:i+block_size]).astype(np.int64)) for i in ix])
What I'm curious about is: what is the purpose of doing `astype(np.int64)` here? The data is written out as 16 bit uints, then loaded as 16 bit uints, then reinterpreted as 64 bit ints when converting from numpy to pytorch and I just don't see what that achieves.

The forward method of their GPT model feeds that input through an nn.Embedding layer which requires torch.long (int64) input, so they're doing the conversion in the batching code because that will generally run on the CPU, or at the very least can be precomputed/queued, whereas a conversion further down the actual network in the Embedding layer will happen on the GPU.
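
On the NumPy side, it may help to see that `astype` is a value-preserving conversion into a new, wider array rather than a reinterpretation of the same bytes (that would be `view`). A small sketch with made-up token values; it does not touch PyTorch at all:

```python
import numpy as np

tokens = np.array([1, 2, 50256, 65535], dtype=np.uint16)

converted = tokens.astype(np.int64)    # same values, copied into 8-byte slots
reinterpreted = tokens.view(np.int64)  # same 8 bytes read as a single int64

print(converted, converted.dtype, converted.itemsize)
print(tokens.nbytes, converted.nbytes)
print(reinterpreted.shape)
```

So the token ids stay the same; only the storage type changes to the one the later layers expect.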

Xun posted:

Anyone going to ICML? I'm not presenting (rip) but I managed to get a travel grant for it anyway :shrug:

Honestly the major conferences are notorious for being a horseshit arbitrary process to get accepted, you really deserve respect for being able to get a travel grant these days

Analytic Engine
May 18, 2009

not the analytical engine

hyphz posted:

That made several things much clearer - thanks!

Karpathy is da man

Rahu
Feb 14, 2009


let me just check my figures real quick here
Grimey Drawer

Stubear St. Pierre posted:

The forward method of their GPT model feeds that input through an nn.Embedding layer which requires torch.long (int64) input, so they're doing the conversion in the batching code because that will generally run on the CPU, or at the very least can be precomputed/queued, whereas a conversion further down the actual network in the Embedding layer will happen on the GPU.

Ah, didn’t realize that it only took fixed size input like that. Thanks :tipshat:
