|
I've been trying to learn some ML stuff lately, and to that end I've been reading over Andrej Karpathy's nanoGPT. I think I have a pretty good grasp of how it works, but I'm curious about one specific bit. The training script loads a binary file full of 16-bit ints that represent the tokenized input, and it has a block of code that looks like this: https://github.com/karpathy/nanoGPT/blob/7fe4a099ad2a4654f96a51c0736ecf347149c34c/train.py#L116
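For anyone following along, the general shape of that batch-loading code is something like this. This is a hedged sketch, not a copy from the repo: the names (`get_batch`, `block_size`, `batch_size`) and the in-memory array standing in for the memory-mapped `.bin` file are illustrative.

```python
import numpy as np
import torch

block_size = 8   # context length (illustrative value)
batch_size = 4   # sequences per batch (illustrative value)

# Stand-in for the on-disk token file; nanoGPT's prepare scripts write
# the tokenized corpus as uint16 and the train script np.memmap's it.
data = np.arange(1000, dtype=np.uint16)

def get_batch(data):
    # Sample random starting offsets for each sequence in the batch.
    ix = np.random.randint(0, len(data) - block_size, size=batch_size)
    # Widen uint16 -> int64 here, on the CPU, before the tensors ever
    # reach the model.
    x = torch.stack(
        [torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix]
    )
    # Targets are the same sequences shifted right by one token.
    y = torch.stack(
        [torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix]
    )
    return x, y

x, y = get_batch(data)
print(x.dtype, tuple(x.shape))  # torch.int64 (4, 8)
```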
|
# ¿ Jun 14, 2023 10:51 |
|
|
# ¿ May 20, 2024 23:48 |
|
Stubear St. Pierre posted:
The forward method of their GPT model feeds that input through an nn.Embedding layer, which requires torch.long (int64) input, so they're doing the conversion in the batch code because that will generally run on the CPU, or at the very least can be precomputed/queued, whereas a conversion further down in the actual network, at the Embedding layer, would happen on the GPU.

Ah, didn't realize that it only took fixed-size input like that. Thanks
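You can see the dtype requirement directly if you poke at nn.Embedding yourself. A quick sketch (the sizes here are made up; note that recent PyTorch also accepts int32 indices, but narrower integer types like int16 are rejected):

```python
import torch
import torch.nn as nn

# Tiny embedding table: vocab of 100 tokens, 16-dim vectors.
emb = nn.Embedding(100, 16)

# int64 indices work: Embedding looks up one row per index.
idx64 = torch.tensor([[1, 2, 3]], dtype=torch.long)
out = emb(idx64)
print(out.shape)  # torch.Size([1, 3, 16])

# int16 indices, like the raw uint16 tokens on disk would give you,
# are rejected by the lookup.
idx16 = torch.tensor([[1, 2, 3]], dtype=torch.int16)
rejected = False
try:
    emb(idx16)
except (RuntimeError, TypeError):
    rejected = True
print("int16 indices rejected:", rejected)
```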
|
# ¿ Jun 15, 2023 12:24 |