Raenir Salazar
Nov 5, 2010

College Slice
I have a project where I need to train a CNN to detect different classifications of, let's say, images of cars: Ford, Ferrari, Mitsubishi, and Toyota. While I can get datasets with lots of cars, and maybe a dataset that is only Fords, I cannot seem to find a convenient dataset that has all four neatly divided into categories.

Is there a trick I can do on a dataset that I know has all four mixed together to neatly divide them (like, I dunno, some sort of unsupervised clustering to pre-divide/sort the data and cut down on the work I have to do manually), or do I basically have to divide it all manually?

Or do we have a more dedicated data science thread for this question?


bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost
cardinality of data?

if its like 20k thats a nice afternoon for a few peeps w/ real eyeballs

Raenir Salazar
Nov 5, 2010

College Slice
The requirements are for a minimum of 1200 training images and 400 testing images balanced between the four classes. Is it normal to just use the Mark 1 Eyeball for this? The programmer in me definitely feels like if there was a way to conveniently automate this I should, but that also sounds suspiciously exactly like the point of the project :v:

America Inc.
Nov 22, 2013

I plan to live forever, of course, but barring that I'd settle for a couple thousand years. Even 500 would be pretty nice.

bob dobbs is dead posted:

if its like 20k thats a nice afternoon for a few peeps w/ real eyeballs

No joke, measure how long it takes you to label, say, 50 images. Then figure out how many people you could assign to labeling the data and how many hours it would take each of them. If you don't have anyone, like if this is a personal project (with non-sensitive data), then maybe outsource it to Mechanical Turk?
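As a rough back-of-the-envelope version of that calculation (a sketch; every number below is a made-up placeholder you'd swap for your own timing):

code:
# Rough labeling-effort estimate; all numbers are hypothetical placeholders.
sample_batch = 50          # images you actually timed yourself on
minutes_for_batch = 15     # how long that batch took you (assumed)
total_images = 1600        # e.g. 1200 training + 400 testing

minutes_per_image = minutes_for_batch / sample_batch
total_hours = minutes_per_image * total_images / 60
for labelers in (1, 2, 4):
    print(f"{labelers} labeler(s): {total_hours / labelers:.1f} hours each")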

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost

Raenir Salazar posted:

The requirements are for a minimum of 1200 training images and 400 testing images balanced between the four classes. Is it normal to just use the Mark 1 Eyeball for this? The programmer in me definitely feels like if there was a way to conveniently automate this I should, but that also sounds suspiciously exactly like the point of the project :v:

eyeball will be cheaper and faster

hard to get a decent programmer for less than $100/hr. can get a manual peep to classify for min wage

1200, you dont even need to hire peeps. take two hours and whack it out

Raenir Salazar
Nov 5, 2010

College Slice
Just to be clear, this is a university/school project, not something I'd have a budget for. :D The class is an introduction-to-AI sort of course, part of completing my BCompSci.

America Inc.
Nov 22, 2013

I plan to live forever, of course, but barring that I'd settle for a couple thousand years. Even 500 would be pretty nice.
And this is why, to my understanding, people working in AI like to rely on pre-labeled public datasets. If there were an algorithm to cluster the images, why would you need to do classification in the first place?

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost
if they're making new algorithms, sure

if you're classifying per se in production there's a fuckton of images and you're getting a sample to train with

Raenir Salazar
Nov 5, 2010

College Slice
Yeah that's why I was checking to make sure I wasn't overcomplicating things or missing anything that I *should* be doing.

Macichne Leainig
Jul 26, 2012

by VG

Raenir Salazar posted:

I have a project where I need to train a CNN to detect different classifications of, let's say, images of cars: Ford, Ferrari, Mitsubishi, and Toyota. While I can get datasets with lots of cars, and maybe a dataset that is only Fords, I cannot seem to find a convenient dataset that has all four neatly divided into categories.

Is there a trick I can do on a dataset that I know has all four mixed together to neatly divide them (like, I dunno, some sort of unsupervised clustering to pre-divide/sort the data and cut down on the work I have to do manually), or do I basically have to divide it all manually?

Or do we have a more dedicated data science thread for this question?

If the dataset is not labeled in some way and you can't programmatically divide it, then an unsupervised algorithm won't really help you; there's enough natural variance in an RGB image that you couldn't really do that without a trained classifier in the first place.

Theoretically, if a car manufacturer has a ubiquitous enough design language you wouldn't need that many samples, but of course wear and tear, different generations, etc. muddy the waters. Anyway, this sentence is mostly rambling.

Since this is a uni project, a pre-made dataset is probably your best bet. This one seems to be from Stanford, has numerous samples, and has make/model in the labels from what I can tell. I couldn't tell you what the distribution of makes is, but I'd hope a Stanford dataset would have tried to somewhat balance that.

https://www.kaggle.com/datasets/jessicali9530/stanford-cars-dataset
http://ai.stanford.edu/~jkrause/cars/car_dataset.html

Kaggle is always a good resource for things like this in my experience.
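If you do end up sorting images into one folder per make (or the dataset ships that way), torchvision's ImageFolder will derive the labels from the folder names; a minimal sketch, with hypothetical paths:

code:
from torchvision import datasets, transforms

tfm = transforms.Compose([
    transforms.Resize((128, 128)),   # downscale so the CNN input size is fixed
    transforms.ToTensor(),
])
# assumes data/cars/ford, data/cars/ferrari, data/cars/mitsubishi, data/cars/toyota
dataset = datasets.ImageFolder("data/cars", transform=tfm)
print(dataset.classes)               # folder names become the class labels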

Raenir Salazar
Nov 5, 2010

College Slice
I went and separated my images into their folders by hand; took me like 8 hours while watching Let's Plays in the background. :toot:

My next challenge was how to use these datasets with the example code from the various labs we did in class so far.

I managed to stumble on Neural Network Console by Sony, which not only lets me produce an image-classifier dataset, but also works with subfolders and assigns labels based on the subfolders; yay.

Took me a couple of tries: while it will output a training set and a testing set, the code example I had auto-divides the data randomly for me, so I just output the whole dataset as one CSV file, which I load with a custom loader based on some code I found in a YouTube tutorial.

This seems to work, but then it turns out my Python environment is hosed, because I get the error "Initializing libiomp5md.dll, but found libiomp5md.dll already initialized", the problem being that I must have some conflicting Python library installed.

I pushed my code, and now I wait as I try to reinstall Spyder in a new conda Python environment, reinstall my modules, and hope that works.
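(For reference, the stopgap usually suggested for that specific OpenMP error, assuming it's the common Intel OpenMP duplicate-runtime clash, is to set an environment variable before importing torch; the real fix is removing whichever package dragged in the second runtime.)

code:
# Workaround for "OMP: Error #15 ... libiomp5md.dll already initialized".
# Use with caution: it papers over a duplicate OpenMP runtime rather than fixing the environment.
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

import torch  # import torch only after the variable is set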

Anyways here's my code so far:

(my loader)
code:
import os
import pandas as pd
import torch 
from torch.utils.data import Dataset 
from skimage import io

class CoolCarsDatasetLoader(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        # CSV rows: image path relative to root_dir in column 0, integer class label in column 1
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 0])
        image = io.imread(img_path)
        y_label = torch.tensor(int(self.annotations.iloc[index, 1]))

        if self.transform:
            image = self.transform(image)
        return (image, y_label)
(using the loader)
code:
    dataset = CoolCarsDatasetLoader(csv_file = './data/project1dataset/coolcarsdataset.csv', 
                                     root_dir = './data/project1dataset',
                                     transform = transforms.ToTensor())
    # I had 1733 + 400 = 2133 images in total
    train_set, test_set = torch.utils.data.random_split(dataset, [1733, 400])
    
    print("Train_set size: ", len(train_set)) # seems to work?
    print("Test_set size: ", len(test_set)) # seems to work?
    
    train_loader = DataLoader(dataset=train_set, batch_size=32, shuffle=True)
    test_loader = DataLoader(dataset=test_set, batch_size=32, shuffle=True)
Didn't seem to throw any fits until I added my project partners' code to mine (their part is actually constructing the CNN, training it, and evaluating it; we're a team of 3, so the work is divided between us).

Raenir Salazar fucked around with this message at 23:35 on Jun 8, 2022

cinci zoo sniper
Mar 15, 2013




Raenir Salazar posted:

I have a project where I need to train a CNN to detect different classifications of, let's say, images of cars: Ford, Ferrari, Mitsubishi, and Toyota. While I can get datasets with lots of cars, and maybe a dataset that is only Fords, I cannot seem to find a convenient dataset that has all four neatly divided into categories.

Is there a trick I can do on a dataset that I know has all four mixed together to neatly divide them (like, I dunno, some sort of unsupervised clustering to pre-divide/sort the data and cut down on the work I have to do manually), or do I basically have to divide it all manually?

Or do we have a more dedicated data science thread for this question?

That trick is called Mechanical Turk, or a specialised data labelling service. I do NLU, not CV, but unsupervised or semi-supervised clustering sounds like a crap idea here, if that’s what you’re thinking.

We have a dead-ish DS thread, and should probably get around to merging those together. Maybe merge in the scientific computing thread as well, make a boffin central. And revive the LaTeX thread :anime:

Fake edit: 1200 images? Just boil yourself a coffee pot.

Edit: Should’ve probably finished reading before replying. :v:

cinci zoo sniper fucked around with this message at 11:36 on Jun 9, 2022

America Inc.
Nov 22, 2013

I plan to live forever, of course, but barring that I'd settle for a couple thousand years. Even 500 would be pretty nice.

cinci zoo sniper posted:

That trick is called Mechanical Turk, or a specialised data labelling service. I do NLU,

That's Natural Language Understanding, right? Do you know of some good intro books into that field? I have an interest in linguistics and NLP + NLU.

cinci zoo sniper
Mar 15, 2013




quarantinethepast posted:

That's Natural Language Understanding, right? Do you know of some good intro books into that field? I have an interest in linguistics and NLP + NLU.

That’s it indeed. The area you’ve outlined is really broad, and I don’t know what your background is like, so maybe take a look at https://web.stanford.edu/~jurafsky/slp3/ ? I’m not sure I know anything up to date that would be more beginner-friendly than that, as far as books go.

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost
dan jurafsky the years i knew him was kinda 3/4 of the way to being a crying shambling wreck because the neural net peeps were running roughshod over his life's work. he's fundamentally not a neural nets guy and neither is martin iirc. of course they are forced to shove it in any nlu book now

cinci zoo sniper
Mar 15, 2013




bob dobbs is dead posted:

dan jurafsky the years i knew him was kinda 3/4 of the way to being a crying shambling wreck because the neural net peeps were running roughshod over his life's work. he's fundamentally not a neural nets guy and neither is martin iirc. of course they are forced to shove it in any nlu book now

Yeah, it’s not a DL book per se, but jumping directly into DL without understanding what you’re trying to do or why is like trying to do a backflip with a motorcycle when you can’t do one in water, imo.

Raenir Salazar
Nov 5, 2010

College Slice
So after having to completely reinstall Python/Anaconda I managed to get CUDA working for the PyTorch CNN; but the Skorch version just refuses to work on my GPU. I have device='cuda' in the NeuralNetClassifier constructor, and I've confirmed that CUDA/my GPU is available and working correctly, at least with PyTorch, but with Skorch I have no indication that it's working; GPU usage is 2-5% when working with PyTorch but 0% when it gets to the Skorch version of my network.
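One sanity check here (a sketch; it assumes the same CNN class and constructor arguments as above, and relies on skorch's initialize() building the underlying module) is to initialize the classifier and look at where its parameters actually live:

code:
# Does skorch really move the module to the GPU?
check_net = NeuralNetClassifier(CNN, device='cuda', max_epochs=1)
check_net.initialize()                                  # builds check_net.module_ on check_net.device
print(next(check_net.module_.parameters()).device)      # expect "cuda:0"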

America Inc.
Nov 22, 2013

I plan to live forever, of course, but barring that I'd settle for a couple thousand years. Even 500 would be pretty nice.

cinci zoo sniper posted:

Yeah it’s not a DL book per se, but jumping directly into DL without understanding what you’re trying to do or why is like trying to do a backflip with a motorcycle when you can’t do it water, imo.

I've been taking Andrew Ng's deep learning Coursera series so I've got some DL background, albeit the courses could use more practical focus on projects beyond "add these 2 lines which we have spelled out for you to an almost complete function".

cinci zoo sniper
Mar 15, 2013




quarantinethepast posted:

I've been taking Andrew Ng's deep learning Coursera series so I've got some DL background, albeit the courses could use more practical focus on projects beyond "add these 2 lines which we have spelled out for you to an almost complete function".

I reviewed that course recently, and for a practitioner imo it’s only good for tying up disparate knowledge about network-based ML, e.g. if you’re moving from credit risk into computer vision. I really hated the assignments, most of which just had you copy and paste code provided above, verbatim, and I struggle to imagine how anyone could learn from that.

Apparently, though, that was already too much, as the course now seems to be getting redone to be simpler.

Also, the audio quality was really poo poo. I loved random loud pitching noises in half of the videos, because no one on the editorial side had functional hearing.

Not sure what’s a good alternative for it though, for general NN intro. One of the things on my docket is to figure out a replacement curriculum for future hires that may need it.

Keisari
May 24, 2011

cinci zoo sniper posted:

Not sure what’s a good alternative for it though, for general NN intro. One of the things on my docket is to figure out a replacement curriculum for future hires that may need it.

Allow me to (possibly) help you! :eng101:

Personally, I used Kirill Eremenko et al.'s "Deep Learning A-Z™: Hands-On Artificial Neural Networks" to learn the basics. I had no previous understanding of ML at all, and only very basic knowledge of programming, perhaps 80-160 hours total under my belt. The course was pretty good in my opinion, and I learnt the basics; my thesis network probably still uses bits of code from those lectures I did years ago. I don't know how current the course is, and obviously it doesn't teach deep insight into deep learning, but it's a very good hands-on lecture series. Or at least I found it good; I'm not saying it's the best one out there.

Oh, and just in case anyone isn't familiar, Udemy's "sales" are all bullshit; courses always cost 10-30 bucks no matter what the list prices say. Those very "urgent" 95% off deals on several-hundred-dollar/euro courses are essentially fake. There's always a sale going on that ends in a few hours or days.

Link: https://www.udemy.com/course/deeplearning/

EDIT:

I still use the course every now and then; for example, I'm looking into RNNs at the moment and rewatching the LSTM lectures. It's going to be a good stepping stone to time-series analysis or whatever the hell I want to do later.

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost
cs229 for reals has pretty moderate problem sets that they still couldn't deal with when they started coursera with it

https://cs229.stanford.edu/summer2019/ps1.pdf

fast.ai is decent enough for neural nets stuff. although you won't be doing nontrivial original stuff from it, you should not consign anyone to that fate in a corp anyways

bob dobbs is dead fucked around with this message at 09:53 on Jun 13, 2022

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
I want to learn machine learning so that I can work for Google and make headlines about falling in love with a chatbot but I'm not sure where to start. The OP mentions PyTorch, Tensorflow, and scikit-learn. I'm very familiar with numpy/scipy so I was leaning towards scikit-learn but I'd rather pick a package that is most used.

I did skim through this tutorial for PyTorch real quick and got to this point:

quote:

When training neural networks, the most frequently used algorithm is back propagation. In this algorithm, parameters (model weights) are adjusted according to the gradient of the loss function with respect to the given parameter. The loss function calculates the difference between the expected output and the actual output that a neural network produces. The goal is to get the result of the loss function as close to zero as possible. The algorithm traverses backwards through the network to adjust the weights and biases and retrain the model. That's why it's called back propagation. This back-and-forth process of retraining the model over time to reduce the loss to 0 is called gradient descent.

I'm just not really sure how this is any different from newton's method or like basically any numerical method that minimizes an objective/loss function.

e: Maybe I need something more technical instead of a quickstart, as my background is numerical methods.

Boris Galerkin fucked around with this message at 18:23 on Jun 15, 2022

cinci zoo sniper
Mar 15, 2013




Boris Galerkin posted:

I want to learn machine learning so that I can work for Google and make headlines about falling in love with a chatbot but I'm not sure where to start. The OP mentions PyTorch, Tensorflow, and scikit-learn. I'm very familiar with numpy/scipy so I was leaning towards scikit-learn but I'd rather pick a package that is most used.

I did skim through this tutorial for PyTorch real quick and got to this point:

I'm just not really sure how this is any different from newton's method or like basically any numerical method that minimizes an objective/loss function.

e: Maybe I need something more technical instead of a quickstart, as my background is numerical methods.

I would say that you could try going into https://developers.google.com/machine-learning/crash-course raw and seeing how it goes. Depending on the particulars of your background, you could give plenty of ML practitioners a run for their money.

On the 3 libraries, the difference is in focus: TF/PyTorch focus on neural network-based ML, while SKLearn focuses on classical ML methods. You could (kind of) think of the difference as differentiable programming vs probabilistic programming.
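As a tiny illustration of that split (a sketch with synthetic data, not tied to any particular course or dataset):

code:
import numpy as np
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

X = np.random.randn(200, 20)
y = (X[:, 0] > 0).astype(int)

clf = LogisticRegression().fit(X, y)       # classical ML: fit()/predict() hide the optimisation
net = nn.Sequential(                       # NN-based ML: you get a differentiable module...
    nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2)
)                                          # ...and write the training loop yourself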

Where to go after the crash course depends a bit on what your plans are. Do you want to work for a particular company or class of company? They may have a niche to specialise in where learning classical ML wouldn’t make sense, or the other way around. It could also be that your specific kind of numerical methods is actually repackaged as (more expensive) ML somewhere else, if you just want to hop into the job title.

If you’re comfortable with academic literature, in my explicitly subjective opinion you could easily do worse than read https://hastie.su.domains/ElemStatLearn/ regardless of what’s down the line for your ML career, though.

Some other things you’ll need to take care of likely sooner than later:
- SQL
- Data engineering fundamentals
- Not grimacing when colleagues ask about sick AI features

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
Cool, thanks I’ll take a look at the textbook and try to do that Google course in earnest.

I mostly want to learn this stuff because the very first time I heard the term “machine learning” I thought, “how cute, the programmers are learning about linear regression”. And well, here we are in 2022 and everything uses ML. Jokes aside, I’m just trying to fill in some knowledge gaps.

fritz
Jul 26, 2003

Boris Galerkin posted:

I'm just not really sure how this is any different from newton's method or like basically any numerical method that minimizes an objective/loss function.

Newton's method and other second-order techniques aren't really appropriate for NNs for a couple reasons, not least of which being that the Hessian is so enormous: even if you don't store it you still have to compute with it.

Backpropagation is just a way of quickly computing the gradient of the objective function that works with the particular structure of NNs.
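To make that concrete, a toy sketch: reverse-mode autodiff (backprop) only hands you the gradient, and the optimiser then takes a first-order step; no Hessian is ever formed.

code:
import torch

w = torch.randn(3, requires_grad=True)
x, y = torch.randn(10, 3), torch.randn(10)

loss = ((x @ w - y) ** 2).mean()   # the objective
loss.backward()                    # backprop: fills w.grad with dloss/dw

with torch.no_grad():
    w -= 0.1 * w.grad              # one plain gradient-descent step
    w.grad.zero_()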

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


Boris Galerkin posted:

I'm just not really sure how this is any different from newton's method or like basically any numerical method that minimizes an objective/loss function.

The goal in traditional optimization is to fit the points you have. The goal here is to fit the points you don't have. There are a lot of methods in common but the problems are fundamentally different.

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost
you can get pearlmutter's trick done for pseudo second order method and lots of first order methods with vague handwaving towards curvatures but not much else
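For reference, a minimal sketch of that trick in PyTorch: a Hessian-vector product via double backward on a toy loss, without ever materialising the Hessian.

code:
import torch

w = torch.randn(5, requires_grad=True)
loss = (w ** 4).sum()                                    # any twice-differentiable toy loss

g = torch.autograd.grad(loss, w, create_graph=True)[0]   # first backward: gradient, graph kept
v = torch.randn(5)
hv = torch.autograd.grad(g @ v, w)[0]                    # second backward: H @ v
print(hv)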

Raenir Salazar
Nov 5, 2010

College Slice
So here's my current problem for a university project and my attempt to make it go a little faster using my GPU.

code:
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    if torch.cuda.is_available():
        print("CUDA is Available")
        print(torch.cuda.device(0))
        print(torch.cuda.get_device_name(0))
    else:
        print("Get a better computer.")
    
    model = CNN()
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=m_learning_rate)
    
    #training
    total_step = len(train_loader)
    loss_list = []
    acc_list = []
    
    for epoch in range(m_num_epochs):
        print("Epoch: ", epoch)
        for i, (images, labels) in enumerate(train_loader):
            #images = images.view(images.size(0), -1)
            images, labels = images.to(device), labels.to(device)
            #print("i: ", i)
            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss_list.append(loss.item())
    
            # Backprop and optimisation
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # Train accuracy
            total = labels.size(0)
            _, predicted = torch.max(outputs.data, 1)
            correct = (predicted == labels).sum().item()
            acc_list.append(correct / total)
            if (i + 1) % 10 == 0:
                print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%'
                .format(epoch + 1, m_num_epochs, i + 1, total_step, loss.item(),
                (correct / total) * 100))
This seems to work, and is faster, and using the GPU, albeit only 3-5% of the GPU is actually being utilized.

However using skorch it seems to be much slower and not using the GPU at all:

code:
    from torch.utils.data import random_split
    from sklearn.model_selection import cross_val_score
    from skorch.helper import SliceDataset
    
    m = len(train_set)
    train_set_size = int(m - m * 0.2)
    test_set_size = int(m * 0.2)
    rem = m - train_set_size - test_set_size
    train_data, val_data = random_split(train_set, [train_set_size + rem, test_set_size])
    y_train = np.array([y for x, y in iter(train_data)])
    
    classes = ( 'cloth', 'n95', 'none', 'surgical' )
    new_net = NeuralNetClassifier(
        CNN,    # should match the part 1 for the model training
        max_epochs=m_num_epochs,
        iterator_train__num_workers=8,
        iterator_valid__num_workers=8,
        lr=m_learning_rate,  # maybe need to match
        batch_size=m_batch_size,  # maybe need to match
        optimizer=optim.Adam,
        criterion=nn.CrossEntropyLoss,
        device='cuda'
    )
Any idea what's up? I've tried googling but not much comes up that seems relevant to my use case.

Raenir Salazar
Nov 5, 2010

College Slice
Alrighty, I managed to determine that if I go to the Windows Task Manager performance monitor and switch one of the GPU graphs to "CUDA", I can actually confirm that CUDA usage ranges between 41% and 94%; yay. So I guess my GPU IS being used; it's just still going to be drat slow for reasons unknown to me. Perhaps I'm loading/transferring the data in a way that isn't optimal?

Here's my network for reference:

code:
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.flatten = nn.Flatten()
        self.conv_layer = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.LeakyReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2), # kernel size times image dims
        )
        
        self.fc_layer = nn.Sequential(
            nn.Dropout(p=0.1),
            nn.Linear(65536, 1000), # what it wants for a dataset with 128x128 images
            #nn.Linear(8 * 8 * 64, 1000), # original
            nn.ReLU(inplace=True),
            nn.Linear(1000, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.1),
            nn.Linear(512, 4)
        )
        
    def forward(self, x):
        # conv layers
        x = self.conv_layer(x)
        # flatten
        x = x.view(x.size(0), -1) # batch size here?
        # fc layer
        x = self.fc_layer(x)
        return x
Which raises a follow-up question: originally in the lab it was suggested to use 8 * 8 * 64 (64 being the original batch size?) as the input size for the nn.Linear; but when I changed the resolution of my images to 128 by 128 pixels this gave me a "mat1 and mat2 shapes cannot be multiplied" error.

Based on googling, the input for that first nn.Linear should be something like (??? * imageWidth * imageHeight), where ??? I think relates to the output size of the last convolution layer? Is that the MaxPool2d, since it's a 2x2 matrix (4?)? 4*128*128 adds up to 65536, which is what the error is telling me it wants: "(32x65536 and 32768x1000)" is an example of an error I got when I thought it was 32 * 32 * 32.

The values for the Linear and Conv layers are otherwise all basically taken as-is from the class labs and are essentially arbitrary as far as I'm aware; I have no idea if they are good values. I am reading https://towardsdatascience.com/pytorch-layer-dimensions-what-sizes-should-they-be-and-why-4265a41e01fd but their examples are a little different from the code I was given, so I'm not really 100% sure what's going on.

If I'm guessing right here: originally the CIFAR10 dataset was maybe 32 by 32 images; there are two pooling layers (MaxPool2d), which I read somewhere each cut the resolution in half? So 32/2/2 is 8. So 8*8, and then the last Conv layer has 64 as its output, which would correspond to the original 8*8*64? Is that right?

So for 128 by 128, that would imply 32*32*64? That would add up? Is that right?

Ideally I'd like to set up the inputs to be either automatic or easy to calculate based on my dataset's image resolution and settings, and set it up so CUDA will do its drat job so I'm not twiddling my thumbs with my tiny dataset of just 2,000 images.
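For what it's worth, the 32*32*64 reading does add up: two MaxPool2d(2) layers halve 128 twice, and the last conv block outputs 64 channels, so 64 * 32 * 32 = 65536. One way to make it automatic (a sketch using a reduced version of the conv stack above; the padding=1 convs keep the spatial size, so the flattened size comes out the same) is to push a dummy batch through the conv stack once and read off the size:

code:
import torch
import torch.nn as nn

# Reduced conv stack with the same shape-changing layers as the CNN above.
conv_layer = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2, 2),
)

with torch.no_grad():
    n_flat = conv_layer(torch.zeros(1, 3, 128, 128)).flatten(1).shape[1]

print(n_flat)   # 64 * 32 * 32 = 65536 for 128x128 inputs
fc_layer = nn.Sequential(nn.Linear(n_flat, 1000), nn.ReLU(), nn.Linear(1000, 4))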

Keisari
May 24, 2011

Raenir Salazar posted:

So here's my current problem for a university project and my attempt to make it go a little faster using my GPU.

This seems to work, and is faster, and using the GPU, albeit only 3-5% of the GPU is actually being utilized.

However using skorch it seems to be much slower and not using the GPU at all.

Any idea what's up? I've tried googling but not much comes up that seems relevant to my use case.

With the really big caveat that I am a relative novice and probably wrong, it sounds like there is a bottleneck on data transfer to the GPU, causing idle time.

Possible reasons: the batch size is small, or the model is too small to get much advantage from the GPU.
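If it is the input pipeline, the usual knobs are on the DataLoader side; a sketch reusing the names from the earlier posts, with numbers to tune per machine:

code:
from torch.utils.data import DataLoader

train_loader = DataLoader(
    dataset=train_set,
    batch_size=64,          # bigger batches keep the GPU busier, memory permitting
    shuffle=True,
    num_workers=4,          # decode/resize images in parallel on the CPU
    pin_memory=True,        # page-locked host memory speeds up host-to-GPU copies
)

for images, labels in train_loader:
    images = images.to(device, non_blocking=True)   # overlap the copy with compute
    labels = labels.to(device, non_blocking=True)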

Raenir Salazar
Nov 5, 2010

College Slice
I think it might be how the data is loaded, because when I use PyTorch it's blazingly fast.

As an aside, I can't figure this out, but now my new problem is that no matter what settings I adjust (batch size, epochs, test data size, etc.), testing accuracy never gets above 47%, while training accuracy easily reaches 100% within 20ish epochs.

Is this likely a result of bad data? Is 2,000 images total not enough?

former glory
Jul 11, 2011

That seems like overfitting. Not knowing the data here, I'd start by checking that your data is homogeneous: that you don't have a huge set of similar images in training and then a big set of different ones in test. This can happen if you don't randomly separate the two splits and instead flat-copy data into test, where filenames might group classes or similar samples within a class; your test data could just be the tail end of your class listings.

You could also be feeding in very high-resolution data, in which case you would either need a lot more data, or you could just scale the images down at the input if that's not already happening.

Augmentation could improve your accuracy, but we're talking single-digit % typically. I'm mostly familiar with TF, but I imagine PyTorch has that built into the front end as well. Still, this disparity points to some more fundamental issue with the data imo.

e: I just saw that you're trying to ID cars by manufacturer with a CNN. Assuming it's a bunch of random images off the net at all sorts of angles and views, that's going to be hard to get accurate with 2000 samples. But I think you should be able to see better test accuracy than that; lower input resolution and making sure the images are uniformly distributed across splits should really boost it.
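In PyTorch, the equivalent front-end augmentation lives in torchvision's transforms; a sketch of a training-only pipeline (the values are just starting points), layered on top of the resize that's presumably already happening:

code:
from torchvision import transforms

train_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.RandomHorizontalFlip(),                      # cars are mostly left/right symmetric
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
test_tfm = transforms.Compose([                             # no augmentation at test time
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])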

former glory fucked around with this message at 15:56 on Jun 23, 2022

Charles 2 of Spain
Nov 7, 2017

Try a validation set after every epoch to see if you're maybe overfitting there.
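A sketch of what that looks like inside the training loop from the earlier post (val_loader here is a hypothetical loader over a held-out split; model, criterion and device are the ones already defined there):

code:
model.eval()
val_loss, correct, total = 0.0, 0, 0
with torch.no_grad():
    for images, labels in val_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        val_loss += criterion(outputs, labels).item() * labels.size(0)
        correct += (outputs.argmax(1) == labels).sum().item()
        total += labels.size(0)
model.train()
print(f"val loss {val_loss / total:.4f}, val acc {correct / total:.2%}")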

Keisari
May 24, 2011

At a quick glance it seems like you have an enormous fully-connected layer there; it might be the culprit. I would look into scikit-learn's GridSearchCV and random architecture search to tune the network size or dropout values.
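A sketch of that with skorch plus scikit-learn's GridSearchCV, reusing new_net, train_data and y_train from the earlier post (SliceDataset is the skorch helper already imported there; the grid values are placeholders):

code:
from sklearn.model_selection import GridSearchCV
from skorch.helper import SliceDataset

params = {
    'lr': [1e-3, 1e-4],
    'batch_size': [32, 64],
    'max_epochs': [10, 20],
}
gs = GridSearchCV(new_net, params, cv=3, scoring='accuracy', refit=False)
gs.fit(SliceDataset(train_data, idx=0), y_train)   # X slice of the dataset, labels as an array
print(gs.best_params_, gs.best_score_)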

Discendo Vox
Mar 21, 2013

We don't need to have that dialogue because it's obvious, trivial, and has already been had a thousand times.
I'm a complete outsider interested in gaining an overview of OCR methods and applications; there's a specific OCR application that I think would be unusually straightforward and doable, and which I also think has already been completely accomplished with high fidelity in a nearly identical setting. I unfortunately can't be specific, but it's along the lines of "right now there are tools out there for reading product names and prices from grocery receipts, and we want a tool that can do the same for bookstore receipts." The real applications are actually much closer together and more stable than that example. There's also a preexisting, fully labeled dataset that I believe is quite large (six digits) that could be used to train and refine such an application. The endpoint would be a site where users upload an image of the text and the output is generated immediately for their review/error-checking, then stored.

Everything I see and read makes this seem like it would be (relatively) simple, but at the same time I know I'm close to clueless about any of this stuff, and I'm really just trying to get a sense of how costly/feasible actually doing it would be. Is there a good resource for really intro-level stuff that would give me a sense of how difficult building this out would be?

Discendo Vox fucked around with this message at 06:18 on Jul 3, 2022

cinci zoo sniper
Mar 15, 2013




Talking about real life application targets for this:

1) Would the text there be printed by a machine exclusively?

2) Would it all be in English?

3) Could these parcels of text be considered documents (i.e. clean background, standard font, focus on legibility)?

4) Would they be described well by a low number of templates?

Raenir Salazar
Nov 5, 2010

College Slice
I had an idea once, back when I did QA work for (indirectly) WoTC, of scanning MTG cards from the god book we were given for key phrases and using grammar rules to try to output all of our test cases/scripts for testing the cards, in order to automate our work of thinking up and writing those test cases. My lead discouraged me from doing it, so I never went through with it; I wanted to actually be paid for potentially eliminating my own job :v:

Discendo Vox
Mar 21, 2013

We don't need to have that dialogue because it's obvious, trivial, and has already been had a thousand times.

cinci zoo sniper posted:

Talking about real life application targets for this:

1) Would the text there be printed by a machine exclusively?

2) Would it all be in English?

3) Could these parcels of text be considered documents (i.e. clean background, standard font, focus on legibility)?

4) Would they be described well by a low number of templates?
1) Yes

2) No. A very small fraction, probably ~1%, will be duplicated in another language, and in theory there will be a handful not in English at all. It should be possible to exclude these from the intended use, and/or from the training set.

3) Yes.

4) I’m not certain what you mean by “template”, but the images would be in a handful, maybe six, standardized forms where text is positioned in proportionately consistent relative positions and sizes. The theoretical training data is not currently labeled to separate these formats.

Discendo Vox fucked around with this message at 15:16 on Jul 3, 2022

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Have you tried just pointing Tesseract at your data and seeing how well it does?
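A minimal version of "just point Tesseract at it" via the pytesseract wrapper (a sketch; assumes the tesseract binary is installed, and receipt.png stands in for one of your scans):

code:
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("receipt.png"))
print(text)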


cinci zoo sniper
Mar 15, 2013




Discendo Vox posted:

1) Yes

2) No, a very low group, probably ~1%, will be in duplicated form with another language and in theory there will be a handful not in English. It should be possible to exclude these from the intended use, and/or from the training set.

3) Yes.

4) I’m not certain what you mean by “template”, but the images would be in a handful, maybe six, standardized forms where text is positioned in proportionately consistent relative positions and sizes. The theoretical training data is not currently labeled to separate these formats.

“OCR” in current parlance lumps 3 different areas:

1) Optical Character Recognition – read the text [in a traditional document]*
2) Scene Text Detection – identify text areas in a [naturally occurring] scene
3) Scene Text Recognition – read the text in those areas

2 and 3 are difficult problems that challenge state-of-the-art methods. 1 can be challenging for difficult handwriting or small languages, but is basically a solved problem for machine print of major languages (unless you're dealing with 50-year-old photos of 500-year-old parchments or similar).

Your second point is unlikely to be an issue, and your 4th point basically takes care of text detection - you can probably distinguish between forms using rote heuristics, with no fuzzy ML needed. Thus, I'm not sure you even need anything more than regular developers with experience integrating off-the-shelf OCR toolkits like EasyOCR, PaddleOCR, Tesseract, the respective CV APIs of major cloud providers, or whatever else you have access to, to digitize the collection in question.

Caveat - I'm assuming that the text here is text, and not, e.g., physics formulas. Dealing with math symbols or fancy sub/superscripting is not something I've encountered, but should be researchable enough with the aforementioned keywords.

*Traditional document – sanely laid-out text on a high-contrast background, typeset in a generic font with large enough letters (relative to image resolution, for meaningful contrast areas), legible spacing, and no fancy formatting.

cinci zoo sniper fucked around with this message at 15:52 on Jul 3, 2022
