Ahz
Jun 17, 2001
PUT MY CART BACK? I'M BETTER THAN THAT AND YOU! WHERE IS MY BUTLER?!

Business posted:

The subject of pickling came up earlier and I am new to the whole idea of serialization, so I wanted to check if what I'm doing makes sense. I am working with spaCy for NLP and I want to save large Doc objects (a spaCy object that holds lots of NLP data for one or more texts) on the backend of my Django app. Here's what I'm thinking: a user uploads a text, and the backend processes it and saves it as a .pickle so it doesn't have to be re-processed every time they want to use it. When the user wants to do something with the previously uploaded text, the backend opens the saved pickle, searches through it, and serves that data to the user. The pickle file seems gigantic to me (a 600 KB text file becomes a 17 MB .pickle), so I'm worried I'm going about this the wrong way.

Is this an appropriate use case? Eventually I want to add users and allow them to save processed text files. Do I store those pickle objects as fields in a database or is that a Bad Way to do things?

https://intoli.com/blog/dangerous-pickles/

punished milkman
Dec 5, 2018

would have won

This is super interesting but doesn't seem to be relevant to this dude's use case since the end user doesn't get the opportunity to upload their own pickles.

shrike82
Jun 11, 2005

You should be able to export the token data in a spacy Doc directly to numpy arrays and then use the usual numpy serialization methods.
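A minimal sketch of that suggestion, assuming spaCy 2.1+ and NumPy are installed. spacy.blank("en") is used only so no trained model has to be downloaded, and the exported attributes (LOWER, IS_ALPHA) are an illustrative choice; with a trained pipeline you would pick tags, dependencies, entity types and so on.

```python
import io

import numpy as np
import spacy
from spacy.attrs import IS_ALPHA, LOWER

# A blank pipeline only tokenizes, which is enough to show the mechanics.
nlp = spacy.blank("en")
doc = nlp("The quick brown fox jumps.")

# One row per token, one column per requested attribute (dtype uint64).
# String-valued attributes such as LOWER come back as hash values.
arr = doc.to_array([LOWER, IS_ALPHA])

# Plain NumPy serialization; an in-memory buffer stands in for a file.
buf = io.BytesIO()
np.save(buf, arr)
buf.seek(0)
restored = np.load(buf)
```

One caveat: string-valued attributes come out as 64-bit hashes, so turning them back into text still needs the same nlp.vocab.strings, which is part of why the pickles were big in the first place.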

Sad Panda
Sep 22, 2004

I'm a Sad Panda.
Is there a way to find the changelogs for packages that get updated but don't come with any useful release notes? For example, I'm using PyAutoGUI. Three days ago 0.9.43 was released. On GitHub (https://github.com/asweigart/pyautogui/commits/master) the latest is 0.9.42, but PyPI (https://pypi.org/project/PyAutoGUI/0.9.43/#history) shows that 0.9.43 exists, and I have no idea what changed.

It doesn't necessarily have to be a full diff; it's just that I'm reluctant to upgrade without knowing what's changed, given it's going into a semi-important project.

Sad Panda fucked around with this message at 19:34 on May 30, 2019

mr_package
Jun 13, 2000

Business posted:

The pickle file seems gigantic to me (600 kb text file to 17 mb .pickle) so I'm worried I'm going about this the wrong way.

Is this an appropriate use case? Eventually I want to add users and allow them to save processed text files. Do I store those pickle objects as fields in a database or is that a Bad Way

According to the docs, that seems to be expected.

quote:

When pickling spaCy’s objects like the Doc or the EntityRecognizer, keep in mind that they all require the shared Vocab (which includes the string to hash mappings, label schemes and optional vectors). This means that their pickled representations can become very large, especially if you have word vectors loaded, because it won’t only include the object itself, but also the entire shared vocab it depends on.
https://spacy.io/usage/saving-loading

The only thing I can think to check is whether v2 has some new stuff that makes it easier/better (these are v1 docs, I think?). It's not really clear to me whether the built-in to_bytes() / from_bytes() methods apply to Doc objects or only to pipelines or whatever (I have no experience in NLP, so this is all Greek to me): https://spacy.io/usage/v2#migrating-saving-loading
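For what it's worth, Doc does have its own to_bytes() / from_bytes() in spaCy v2 and later. A minimal sketch, again using a blank English pipeline as a stand-in for whatever model the app actually loads; restoring needs a Doc attached to a compatible Vocab, here simply the same nlp.vocab.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")  # stand-in for the app's real pipeline
doc = nlp("Serialize me without pickle.")

# Serialized contents of this Doc only; the shared Vocab (including any
# word vectors) is not bundled in, unlike the pickle.
data = doc.to_bytes()

# Restoring needs an empty Doc attached to a compatible Vocab.
restored = Doc(nlp.vocab).from_bytes(data)
```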

I would not store these in a database generally, but maybe some of the newer object/document-store databases would work? I'm not sure what they buy you over just saving to disk if you can't actually make use of the data (e.g. what does stuffing binary blobs into a DB get you?). I don't know, but check them out; there might be some benefit there.

necrotic
Aug 2, 2005
I owe my brother big time for this!

Sad Panda posted:

Is there a way to find the changelogs for packages that get updated but don't come with any useful release notes? For example, I'm using PyAutoGUI. Three days ago 0.9.43 was released. On GitHub (https://github.com/asweigart/pyautogui/commits/master) the latest is 0.9.42, but PyPI (https://pypi.org/project/PyAutoGUI/0.9.43/#history) shows that 0.9.43 exists, and I have no idea what changed.

It doesn't necessarily have to be a full diff; it's just that I'm reluctant to upgrade without knowing what's changed, given it's going into a semi-important project.

I'd open an issue on the repo and ask why there are releases but no changes to the repo. Their changes document hasn't been updated since 0.9.40 either.

They just released another one today, even. Your only option at this point is to diff the package contents.
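Diffing the package contents can be done with the standard library once both releases are downloaded, for example with pip download PyAutoGUI==0.9.42 --no-deps -d old (and the same for 0.9.43 into new). The sketch below handles either a wheel or an sdist (.tar.gz); read_archive and diff_archives are hypothetical helper names, and stripping the sdist's top-level name-version/ folder is an assumption made so that unchanged files line up between versions.

```python
import difflib
import tarfile
import zipfile


def read_archive(path):
    """Map member name -> bytes for a wheel/zip or an sdist tarball."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return {n: zf.read(n) for n in zf.namelist() if not n.endswith("/")}
    with tarfile.open(path) as tf:  # "r:*" by default: compression auto-detected
        # sdists wrap everything in a "name-version/" folder; drop it so the
        # same file lines up across versions
        return {
            m.name.split("/", 1)[-1]: tf.extractfile(m).read()
            for m in tf.getmembers()
            if m.isfile()
        }


def diff_archives(old_path, new_path):
    """Unified diff of every member that was added, removed or changed."""
    old, new = read_archive(old_path), read_archive(new_path)
    out = []
    for name in sorted(old.keys() | new.keys()):
        # binary members are decoded with replacement characters, so their
        # diffs are noisy but still show that something changed
        a = old.get(name, b"").decode("utf-8", errors="replace").splitlines(keepends=True)
        b = new.get(name, b"").decode("utf-8", errors="replace").splitlines(keepends=True)
        out.extend(difflib.unified_diff(a, b, fromfile=f"old/{name}", tofile=f"new/{name}"))
    return "".join(out)
```

With wheels, the .dist-info folder name contains the version, so its files will always show up as removed and re-added; the interesting part is the package's own modules.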

Business
Feb 6, 2007

Thanks for the responses, all; they are helpful. I'm going to look into storing as NumPy arrays and/or being more selective about what I store. The main issue is that my texts are large enough that I have to stream them as a chopped-up list of Docs, and then find a way to store that plus the associated vocab data without resorting to writing it to bytes. The blobs would give me everything, and they're good for my own offline purposes, but it's helpful to have it confirmed that storing a big binary blob for each file is weird.

General_Failure
Apr 17, 2005
I worked out my pip problem. Sort of. Using --no-use-pep517 gets me past the issue. It's concerning that the same problem exists on multiple devices.

shrike82
Jun 11, 2005

Business posted:

Thanks for the responses, all; they are helpful. I'm going to look into storing as NumPy arrays and/or being more selective about what I store. The main issue is that my texts are large enough that I have to stream them as a chopped-up list of Docs, and then find a way to store that plus the associated vocab data without resorting to writing it to bytes. The blobs would give me everything, and they're good for my own offline purposes, but it's helpful to have it confirmed that storing a big binary blob for each file is weird.

Preprocessing texts in spacy shouldn't be that compute intensive so consider just storing the raw texts and computing on demand especially if users are given the option to upload new texts.

Trade CPU time for storage - even with numpy, I suspect you're going to be dealing with large objects since there's a lot of token level metadata.

Wallet
Jun 19, 2006

shrike82 posted:

Preprocessing texts in spacy shouldn't be that compute intensive so consider just storing the raw texts and computing on demand especially if users are given the option to upload new texts.

Trade CPU time for storage - even with numpy, I suspect you're going to be dealing with large objects since there's a lot of token level metadata.

I imagine this is the way you'd want to go because spaCy is taking text and adding a whole lot of extra data to it by its very nature. Depending on what you're up to you could also split the difference and store the original text and whatever the relevant processed/aggregated results are but not the entire marked up data from spaCy.
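A minimal sketch of that split-the-difference approach, using the standard library's sqlite3 as a stand-in for the Django ORM: keep the raw text plus a small JSON summary, and recompute anything else on demand. summarize() is a hypothetical placeholder for the expensive spaCy step, reduced here to a word count so the example runs without a model; the table and column names are made up too.

```python
import json
import sqlite3
from collections import Counter


def summarize(text):
    """Hypothetical stand-in for the expensive spaCy step.

    Here it just counts the ten most common lowercase words; in the app this
    is where nlp(text) would run, keeping only the results you need.
    """
    return dict(Counter(text.lower().split()).most_common(10))


def save_upload(conn, title, text):
    """Store the raw text plus a small JSON summary; return the new row id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uploads ("
        "id INTEGER PRIMARY KEY, title TEXT, raw_text TEXT, summary_json TEXT)"
    )
    cur = conn.execute(
        "INSERT INTO uploads (title, raw_text, summary_json) VALUES (?, ?, ?)",
        (title, text, json.dumps(summarize(text))),
    )
    conn.commit()
    return cur.lastrowid


def load_summary(conn, upload_id):
    """Return the stored summary dict, or None if there is no such upload."""
    row = conn.execute(
        "SELECT summary_json FROM uploads WHERE id = ?", (upload_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```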

the yeti
Mar 29, 2008

memento disco



Is it awful practice to wrap import errors in a generic ‘Make sure you’re using the right virtual environment’ type reminder?

mbt
Aug 13, 2012

the yeti posted:

Is it awful practice to wrap import errors in a generic ‘Make sure you’re using the right virtual environment’ type reminder?

As someone who is often in the wrong virtual environment: yes, because that's sort of implied by the error. Anyone who uses a virtual environment, i.e. non-new users, would check that first.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
I think if you do something like

code:
try:
    import foo
except ImportError as err:
    raise ImportError("oh god did you activate the environment?") from err
i.e., just giving a more tailored troubleshooting message, that would be fine. But if you mean making your own error class, or doing something like, I dunno, checking the name of the environment and/or whether packages are available, then it's a waste of time.

Also, for anybody experienced with Python this type of message would be completely unneeded, as they'd all know what to do/check, so it'd just be for beginners.

Boris Galerkin fucked around with this message at 16:28 on May 31, 2019

Business
Feb 6, 2007

shrike82 posted:

Preprocessing texts in spacy shouldn't be that compute intensive so consider just storing the raw texts and computing on demand especially if users are given the option to upload new texts.

Trade CPU time for storage - even with numpy, I suspect you're going to be dealing with large objects since there's a lot of token level metadata.

From testing things out on my own, it seems like everything up through part-of-speech tagging is no problem, but dependency parsing and NER take a long time: about 25 seconds on my laptop with 8 GB of RAM for a more or less typical use case I have in mind. That seems prohibitively expensive in terms of cloud CPU time, but I have no clue; I was just working under the assumption that storage is way cheaper than computation. Sorry this is getting kinda far afield, but it's cool to see other people are using spaCy.

the yeti
Mar 29, 2008

memento disco



Boris Galerkin posted:

I think if you do something like

code:
try:
    import foo
except ImportError as err:
    raise ImportError("oh god did you activate the environment?") from err
i.e., giving a more tailored troubleshooting message that would be fine

Yeah that’s what I had in mind, I’m dealing with DBAs who don’t know much/any python but will run the stuff I’m writing now and then to check data going into their db.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

the yeti posted:

Yeah that’s what I had in mind, I’m dealing with DBAs who don’t know much/any python but will run the stuff I’m writing now and then to check data going into their db.

Yeah, if you know your audience you should write for your audience.
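If the audience is people who don't use Python day to day, a slightly more helpful version of the message can say which interpreter actually ran the script. A sketch, where require() is a hypothetical helper name; the venv check (sys.prefix differing from sys.base_prefix) works for environments made with venv, while older virtualenv releases and conda environments won't be detected this way.

```python
import importlib
import sys


def require(module_name):
    """Import module_name, or raise an ImportError with a hint for non-Python users."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # In a venv, sys.prefix points at the env while sys.base_prefix
        # points at the base install; outside one they are equal.
        in_venv = sys.prefix != sys.base_prefix
        hint = (
            f"Could not import {module_name!r}.\n"
            f"This script is running with: {sys.executable}\n"
            f"Virtual environment active: {'yes' if in_venv else 'no'}\n"
            "Make sure you activated the project's virtual environment first."
        )
        raise ImportError(hint) from err
```

In the script you would then write something like driver = require("name_of_db_driver") instead of a bare import; the original exception stays attached as __cause__ for whoever has to debug it later.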

Empress Brosephine
Mar 31, 2012

by Jeffrey of YOSPOS
Trying to learn Flask & Python to do what I need it to do... Right now I'm trying to get an integer from an HTML number field, multiply it by a set amount (in this example 12), and then return the result to display on the HTML page. Basically: ask the user how many tickets they want and multiply that by the ticket cost. I know I'm probably doing this wrong anyway, but I think if I can get this one thing working I can expand out and do what I actually want to accomplish.

The problem I'm having is that when I run the application, I don't get x * 12 as the result; I get the integer x printed 12 times, so I'm not sure what I'm doing wrong. Could one of you take a look and give a hint or something? I tried to turn the input aduTix into an int to multiply it by twelve, but then it returns an internal server error. I don't really understand what I'm doing, but I can kinda understand the code I'm writing. Thank you all for the help!

EDIT: The internal server error I get says "int" or "float" or "str" aren't supported.

EDIT2: I figured it out! Flask doesn't let you return an int; it has to be a string or something else, I believe.


here's my html:
code:
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>Tickets</title>
</head>

<body>
<form method="POST">
    <input type="radio" name="seasons" value="regular">Regular<br>
    <input type="radio" name="seasons" value="peak">Peak<br>

    <br>
    <br>
    # of Adults (13-64):
    <input type="number" name="adultTix" min="0"><br>
    # of Seniors (65+):
    <input type="number" name="seniorTix" min="0"><br>
    # of Children (4-12):
    <input type="number" name="childTix" min="0"><br>
    # of Children Under 4:
    <input type="number" name="free" min="0"><br>
    <br>
    <input type="submit" name="submit_button">
</form>

</body>
</html>
and my app.py:
code:
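from flask import Flask, request, render_template

app = Flask(__name__)

ADULT_PRICE = 12  # ticket price used in this example


@app.route('/', methods=['GET', 'POST'])
def forms():
    # GET: just show the empty form
    if request.method == 'GET':
        return render_template('form.html')

    # POST: read the submitted values. Form values always arrive as
    # strings ('3' * 12 == '333333333333'), so convert them to int.
    # type=int falls back to the default if a field is empty or invalid.
    seaChoice = request.form.get('seasons', '')
    aduTix = request.form.get('adultTix', default=0, type=int)
    senTix = request.form.get('seniorTix', default=0, type=int)
    chiTix = request.form.get('childTix', default=0, type=int)
    freeTix = request.form.get('free', default=0, type=int)

    # do the maths
    aduTotal = 0
    if seaChoice.lower() == 'regular':
        aduTotal = aduTix * ADULT_PRICE

    # a view can't return a bare int; return a string instead
    return str(aduTotal)


if __name__ == '__main__':
    app.run()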

Empress Brosephine fucked around with this message at 20:21 on May 31, 2019

Da Mott Man
Aug 3, 2012


Empress Brosephine posted:

Trying to learn Flask & Python to do what I need it to do... Right now I'm trying to get an integer from an HTML number field, multiply it by a set amount (in this example 12), and then return the result to display on the HTML page. Basically: ask the user how many tickets they want and multiply that by the ticket cost. I know I'm probably doing this wrong anyway, but I think if I can get this one thing working I can expand out and do what I actually want to accomplish.

The problem I'm having is that when I run the application, I don't get x * 12 as the result; I get the integer x printed 12 times, so I'm not sure what I'm doing wrong. Could one of you take a look and give a hint or something? I tried to turn the input aduTix into an int to multiply it by twelve, but then it returns an internal server error. I don't really understand what I'm doing, but I can kinda understand the code I'm writing. Thank you all for the help!

EDIT: The internal server error I get says "int" or "float" or "str" aren't supported.

EDIT2: I figured it out! Flask doesn't let you return an int; it has to be a string or something else, I believe.

This should help explain why it wasn't working.
http://flask.pocoo.org/docs/1.0/quickstart/#about-responses
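The "printed 12 times" symptom comes from the fact that every value in request.form is a string, even for an input of type "number", and multiplying a string by an int repeats it. A tiny illustration, with "3" standing in for whatever the user typed:

```python
adult_tix = "3"  # what request.form['adultTix'] hands you: always a str

print(adult_tix * 12)       # str * int repeats the text: 333333333333
print(int(adult_tix) * 12)  # convert first, then multiply: 36

# A Flask view also can't return a bare int, so turn the result back into a str.
body = str(int(adult_tix) * 12)
```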

Empress Brosephine
Mar 31, 2012

by Jeffrey of YOSPOS
Thanks for the help with Flask. Another quick question: is it possible to take data input from a form, like a date, and then insert it into a link (a href) such as


E: nvm, I keep answering my own questions :D

Empress Brosephine fucked around with this message at 17:44 on Jun 1, 2019

Dominoes
Sep 20, 2007

Hey dudes. Is it possible to set mobile conditions from a Django template? I'm not v good with them and forms, but I have a form for file upload that's in a template since I'm not sure how to do it on frontend. Want to hide it if on mobile. Normally this is done by checking window.innerWidth, but not sure how to do in a template or Django form. Answer might just be to move to frontend.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Dominoes posted:

Hey dudes. Is it possible to set mobile conditions from a Django template? I'm not v good with them and forms, but I have a form for file upload that's in a template since I'm not sure how to do it on frontend. Want to hide it if on mobile. Normally this is done by checking window.innerWidth, but not sure how to do in a template or Django form. Answer might just be to move to frontend.

You can parse request.META['HTTP_USER_AGENT'] and then see if it's a mobile browser. There are various libraries to help with this.

Personally, I avoid doing this kinda stuff. Instead I'd use media queries in my CSS.
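If you do go the user-agent route, the check can be as crude as MDN's suggestion of looking for "Mobi" anywhere in the string. A sketch; looks_mobile is a hypothetical helper, and user-agent sniffing is inherently unreliable, which is one more reason media queries are usually the better tool.

```python
def looks_mobile(user_agent):
    """Crude check following MDN's advice: look for "Mobi" in the User-Agent.

    Matches e.g. "Mobile Safari" on phones; tablets may or may not match.
    """
    return "Mobi" in (user_agent or "")


# Hypothetical Django usage (view and template names are made up):
#
# def upload_page(request):
#     is_mobile = looks_mobile(request.META.get("HTTP_USER_AGENT", ""))
#     return render(request, "upload.html", {"is_mobile": is_mobile})
#
# and in upload.html:  {% if not is_mobile %} ...form... {% endif %}
```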

unpacked robinhood
Feb 18, 2013

by Fluffdaddy
I'd like a method to run once a month, on a machine that's randomly on (I'd prefer to keep it pure Python if I can, so no cron etc.).
It's not important that it runs every 30 days, or on the 1st of the month.

If the main script hasn't been executed for more than a month, it should run the task once and keep track for next time.
I've done a quick test with schedule, but I don't think it's what I need.

What are my options? I can whip up a lovely thing by writing a timestamp to a file, but maybe there's a better way.

Nippashish
Nov 2, 2005

Let me see you dance!

unpacked robinhood posted:

I'd like a method to run once a month, on a machine that's randomly on (I'd prefer to keep it pure Python if I can, so no cron etc.).
It's not important that it runs every 30 days, or every 1st of the month.

If the machine might reboot at unpredictable times I don't see how you can guarantee anything without touching the external environment. Even if you write your own custom logic to keep track of how long to wait between runs you still need to somehow tell the system to turn the thing doing the waiting back on when it reboots, and that is going to necessarily involve configuring something at the system level (i.e. not "full python").

I would just make a cron job that runs once per day, and have the script check if it's been at least a month since it was last run before doing anything.

QuarkJets
Sep 8, 2008

Definitely use cron

unpacked robinhood
Feb 18, 2013

by Fluffdaddy

QuarkJets posted:

Definitely use cron

Nippashish posted:

I would just make a cron job that runs once per day, and have the script check if it's been at least a month since it was last run before doing anything.

The script will run at least once whenever the machine is on, probably at boot time or from user input. I'll take a look at cron then.

Any libraries to deal with the second part? I've written a thing, but I don't really trust myself with reliability and edge cases.

cinci zoo sniper
Mar 15, 2013




unpacked robinhood posted:

The script will run at least once whenever the machine is on, probably at boot time or from user input. I'll take a look at cron then.

Any libraries to deal with the second part? I've written a thing, but I don't really trust myself with reliability and edge cases.

I would just write the month of execution into a text file, and on each run compare the current month against it: run if the value has changed since last time, or if the file does not exist.
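A minimal sketch of that marker-file approach, meant to be started from a daily (or @reboot) cron entry as suggested above. The marker path and run_monthly_task() are hypothetical placeholders; the month is stored as YYYY-MM, so the task runs at most once per calendar month, and the marker is only written after the task finishes.

```python
import datetime
from pathlib import Path

# Hypothetical marker location, relative to the working directory; use an
# absolute path in practice, since cron runs with a different cwd.
MARKER = Path("last_run_month.txt")


def month_key(today=None):
    """Calendar month as 'YYYY-MM', e.g. '2019-06'."""
    today = today or datetime.date.today()
    return today.strftime("%Y-%m")


def should_run(marker=MARKER, today=None):
    """True if the marker is missing or records a different month."""
    try:
        last = marker.read_text().strip()
    except FileNotFoundError:
        return True
    return last != month_key(today)


def mark_done(marker=MARKER, today=None):
    marker.write_text(month_key(today))


def run_monthly_task():
    """Hypothetical placeholder for the real once-a-month job."""
    print("running monthly task")


def main():
    # Only record the month after the task has finished without raising,
    # so a crash means it is retried on the next run.
    if should_run():
        run_monthly_task()
        mark_done()
```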

unpacked robinhood
Feb 18, 2013

by Fluffdaddy

cinci zoo sniper posted:

I would just write the month of execution into a text file, and on each run compare the current month against it: run if the value has changed since last time, or if the file does not exist.

I like this better than dicking around with timestamps, thanks

Empress Brosephine
Mar 31, 2012

by Jeffrey of YOSPOS
Right now I have a variable that gets its value from request.form, via an HTML date picker, in the format XXXX-XX-XX. How would I write an if statement that checks whether the variable is between, let’s say, 2019-01-01 and 2019-05-31? Is that possible? I’m not sure whether Python or I am smart enough to know that the variable is an integer and not just a random string. Right now I have an “or” statement about 50 items long, looking for certain dates to trigger a fail.

Thanks for the help.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
The datetime module has problems I don't totally understand (mostly having to do with time zone shenanigans, I think?), but the way I'd do it is to turn the XXXX-XX-XX string into a date object (using fromisoformat) and do your comparisons between those.
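A minimal sketch of the fromisoformat approach (Python 3.7+). An HTML date input submits its value as YYYY-MM-DD, which is exactly the format date.fromisoformat() accepts, and plain date objects carry no time zone, so the shenanigans mentioned above don't come into it. The example value is made up.

```python
from datetime import date

# e.g. the string from request.form for an <input type="date">
picked = date.fromisoformat("2019-04-20")

start = date(2019, 1, 1)
end = date(2019, 5, 31)
in_range = start <= picked <= end  # chained comparison, both endpoints included
```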

KICK BAMA KICK
Mar 2, 2009

Empress Brosephine posted:

Right now I have a variable that gets its value from request.form, via an HTML date picker, in the format XXXX-XX-XX. How would I write an if statement that checks whether the variable is between, let’s say, 2019-01-01 and 2019-05-31? Is that possible? I’m not sure whether Python or I am smart enough to know that the variable is an integer and not just a random string. Right now I have an “or” statement about 50 items long, looking for certain dates to trigger a fail.

Thanks for the help.

You can use datetime's strptime to turn a string into a datetime you can easily test, by specifying the format the string is in:
Python code:
import datetime

value = '2019-06-02'
format_str = '%Y-%m-%d'
# strptime returns a datetime; call .date() to throw away the time
# information, since we're only concerned with the date
value_as_date = datetime.datetime.strptime(value, format_str).date()
# no leading zeros: 01 is a SyntaxError in Python 3
start_date = datetime.date(2019, 1, 1)
end_date = datetime.date(2019, 5, 31)
in_range = start_date <= value_as_date <= end_date  # False for 2019-06-02

unpacked robinhood
Feb 18, 2013

by Fluffdaddy

Empress Brosephine posted:

Right now I have a variable that gets its value from request.form, via an HTML date picker, in the format XXXX-XX-XX. How would I write an if statement that checks whether the variable is between, let’s say, 2019-01-01 and 2019-05-31? Is that possible? I’m not sure whether Python or I am smart enough to know that the variable is an integer and not just a random string. Right now I have an “or” statement about 50 items long, looking for certain dates to trigger a fail.

Thanks for the help.

Python code:
import pendulum as pdu

d1 = pdu.parse('2019-01-01')
d2 = pdu.parse('2019-05-31')
inside = pdu.parse('2019-04-20')
outside = pdu.parse('2008-04-20')

print(d1<inside<d2) # True
print(d1<outside<d2) # False
Seems ok ?

CarForumPoster
Jun 26, 2013

⚡POWER⚡

unpacked robinhood posted:

Python code:
import pendulum as pdu

d1 = pdu.parse('2019-01-01')
d2 = pdu.parse('2019-05-31')
inside = pdu.parse('2019-04-20')
outside = pdu.parse('2008-04-20')

print(d1<inside<d2) # True
print(d1<outside<d2) # False
Seems ok ?

I had never heard of pendulum until now, and I have definitely been bitten by datetime issues or written overly complicated code for that kind of BS. Absolutely fantastic.

NinpoEspiritoSanto
Oct 22, 2013




Pendulum owns tbh

Empress Brosephine
Mar 31, 2012

by Jeffrey of YOSPOS
Thank you all so much. Will this work with Flask?

Dominoes
Sep 20, 2007

Python code:
import pendulum as pdu
d1 = pdu.parse('2019-01-01')

d1.time()  # Time(0, 0, 0)

pdu.parse('2019-01-01') == pdu.parse('2019-01-01T00:00') # True
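Bad behavior: a date-only string is parsed into a datetime at midnight, so it compares equal to that midnight datetime.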

Thermopyle posted:

You can parse request.META['HTTP_USER_AGENT'] and then see if it's a mobile browser. There are various libraries to help with this.

Personally, I avoid doing this kinda stuff. Instead I'd use media queries in my CSS.
Appreciate it.

General_Failure
Apr 17, 2005
Liking the Jetson Nano, but the software dev team needs to get their act together. Some stuff is preinstalled with their JetPack SDK, some comes via other means. What's bugging me is OpenCV. I don't know what they did, but it's part of the SDK; trouble is, it doesn't appear in any venvs. Can someone point me in the right direction on how to deal with this, if it's at all possible?
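One common approach, assuming JetPack installs cv2 into the system Python's site-packages (not verified here): venvs are isolated from system packages by default, but you can create one that still sees them. A sketch using the standard library's venv module, equivalent to python3 -m venv --system-site-packages on the command line; make_env is a hypothetical helper name, and the venv has to be created with the same system python3 that the SDK's OpenCV was built for.

```python
import venv
from pathlib import Path


def make_env(env_dir, with_pip=True):
    """Create a venv that can also import packages installed system-wide.

    Same effect as: python3 -m venv --system-site-packages <env_dir>
    """
    # system_site_packages=True makes the system site-packages visible inside
    # the venv, after the venv's own packages.
    venv.create(env_dir, system_site_packages=True, with_pip=with_pip)
    return Path(env_dir) / "pyvenv.cfg"
```

After activating such a venv, import cv2 should resolve to the SDK's copy. For an existing venv, changing include-system-site-packages = false to true in its pyvenv.cfg has the same effect.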

the yeti
Mar 29, 2008

memento disco



Is that Poetry package manager the pendulum people make any good?

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
It's like if pipenv wasn't written by a rear end in a top hat, so yes

the yeti
Mar 29, 2008

memento disco



Malcolm XML posted:

It's like if pipenv wasn't written by a rear end in a top hat, so yes

:vince:

I figured at least one response would be along those lines. In a practical sense, pipenv is really pissing me off because packages keep breaking it (pendulum does), and the excuse is always vendored dependencies.

Empress Brosephine
Mar 31, 2012

by Jeffrey of YOSPOS
So I finished Python Crash Course and loved it; what should I read next to improve my skills? Is it worth learning Flask beyond the basic level?

Thanks all.
