Python

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Python

Ahz: Jun 17, 2001; PUT MY CART BACK? I'M BETTER THAN THAT AND YOU! WHERE IS MY BUTLER?!

Business posted:

The subject of pickling came up earlier, and I'm new to the whole idea of serialization, so I wanted to check whether what I'm doing makes sense. I am working with spaCy for NLP, and I want to save large Doc objects (a spaCy object that holds lots of NLP data for a text) on the backend of my Django app. Here's what I'm thinking: a user uploads a text, and the backend processes it and saves it as a .pickle so they don't have to re-process the text every time they want to use it. When the user wants to do something with the previously uploaded text, the backend opens the saved pickle, searches through it, and serves that data to the user. The pickle file seems gigantic to me (a 600 KB text file becomes a 17 MB .pickle), so I'm worried I'm going about this the wrong way.

Is this an appropriate use case? Eventually I want to add users and allow them to save processed text files. Do I store those pickle objects as fields in a database or is that a Bad Way to do things?

https://intoli.com/blog/dangerous-pickles/

May 30, 2019 15:32

punished milkman: Dec 5, 2018; would have won

Ahz posted:

https://intoli.com/blog/dangerous-pickles/

This is super interesting but doesn't seem to be relevant to this dude's use case since the end user doesn't get the opportunity to upload their own pickles.

May 30, 2019 15:54

shrike82: Jun 11, 2005

You should be able to export the token data in a spacy Doc directly to numpy arrays and then use the usual numpy serialization methods.
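
A minimal sketch of that, assuming a recent spaCy (v2 or v3) and a blank English pipeline; a trained pipeline would also let you export attributes like `POS`, `DEP` or `ENT_TYPE`. `Doc.to_array` gives one row per token and one column per requested attribute. The values are hash IDs, so you still need the vocab's string store to turn them back into text.

```python
import io

import numpy as np
import spacy
from spacy.attrs import LOWER, ORTH

# Blank pipeline: tokenizer only, no model download needed
nlp = spacy.blank("en")
doc = nlp("Pickles are big. Arrays are smaller.")

# One row per token, one column per attribute; values are hash IDs
attrs = [ORTH, LOWER]
token_array = doc.to_array(attrs)

# Round-trip through numpy's own binary format (a file path works the same way)
buffer = io.BytesIO()
np.save(buffer, token_array)
buffer.seek(0)
loaded = np.load(buffer)

# Hash IDs map back to strings through the shared vocab's StringStore
words = [nlp.vocab.strings[int(orth_id)] for orth_id in loaded[:, 0]]
print(words)
```

The array only holds the attributes you ask for, which is where the size savings over pickling the whole Doc plus vocab would come from.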

May 30, 2019 16:10

Sad Panda: Sep 22, 2004; I'm a Sad Panda.

Is there a way to find the changelogs for packages that get updated without any useful release notes? For example, I'm using PyAutoGUI, and 3 days ago 0.9.43 was released. On GitHub (https://github.com/asweigart/pyautogui/commits/master) the latest is 0.9.42, but PyPI (https://pypi.org/project/PyAutoGUI/0.9.43/#history) shows that 0.9.43 exists, and I have no idea what changed.

It doesn't necessarily have to be a diff; I'm just reluctant to upgrade without knowing what's changed, given it's going into a semi-important project.

Sad Panda fucked around with this message at 19:34 on May 30, 2019

May 30, 2019 19:32

mr_package: Jun 13, 2000

Business posted:

The pickle file seems gigantic to me (a 600 KB text file becomes a 17 MB .pickle), so I'm worried I'm going about this the wrong way.

Is this an appropriate use case? Eventually I want to add users and allow them to save processed text files. Do I store those pickle objects as fields in a database or is that a Bad Way

According to the docs, that seems to be expected.

quote:

When pickling spaCy's objects like the Doc or the EntityRecognizer, keep in mind that they all require the shared Vocab (which includes the string to hash mappings, label schemes and optional vectors). This means that their pickled representations can become very large, especially if you have word vectors loaded, because it won't only include the object itself, but also the entire shared vocab it depends on.

https://spacy.io/usage/saving-loading

The only thing I can think to check is whether v2 has some new stuff that makes this easier/better (these are v1 docs, I think?). It's not really clear to me whether the built-in to_bytes() / from_bytes() methods apply to Doc objects or only to pipelines or whatever (I have no experience in NLP, so this is all Greek to me): https://spacy.io/usage/v2#migrating-saving-loading
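
For what it's worth, `Doc` objects do have `to_bytes()` / `from_bytes()` in spaCy v2 (and v3). A minimal sketch with a blank English pipeline; as I understand it, the catch is that loading needs a `Vocab` compatible with the one used when saving (e.g. the same `nlp` you load once at startup), since the bytes don't carry the entire shared vocab the way a pickle does.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
doc = nlp("Only the Doc's own data goes into these bytes.")

# Serialize just this Doc; unlike pickle, the entire shared Vocab is not bundled in
data = doc.to_bytes()

# To load: create an empty Doc on a compatible vocab, then fill it from the bytes
restored = Doc(nlp.vocab).from_bytes(data)
print(restored.text == doc.text, len(data))
```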

I would generally not store these in a database, but maybe some of the newer object/document-store databases would work? Not sure what they buy you over just saving to disk if you can't actually make use of the data (e.g. what does stuffing binary blobs into a DB get you? I don't know, but check 'em out; there might be some benefit there).

May 30, 2019 19:58

necrotic: Aug 2, 2005; I owe my brother big time for this!

Sad Panda posted:

Is there a way to find the changelogs for packages that get updated without any useful release notes? For example, I'm using PyAutoGUI, and 3 days ago 0.9.43 was released. On GitHub (https://github.com/asweigart/pyautogui/commits/master) the latest is 0.9.42, but PyPI (https://pypi.org/project/PyAutoGUI/0.9.43/#history) shows that 0.9.43 exists, and I have no idea what changed.

It doesn't necessarily have to be a diff; I'm just reluctant to upgrade without knowing what's changed, given it's going into a semi-important project.

I'd open an issue on the repo and ask why there are releases but no changes to the repo. Their changes document hasn't been updated since 0.9.40 either.

They just released another one today, even. Your only option at this point is to diff the package contents.
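
A sketch of the diffing approach, assuming you've downloaded both releases (for example `pip download PyAutoGUI==0.9.42 --no-deps -d old`, and the same for 0.9.43 into `new`) and unpacked the archives into two folders. `diff_trees` and `print_diff` are just illustrative names, not an existing tool.

```python
import difflib
import filecmp
from pathlib import Path


def diff_trees(old_dir, new_dir):
    """Return (added, removed, changed) relative paths between two unpacked releases."""
    old_root, new_root = Path(old_dir), Path(new_dir)
    old_files = {p.relative_to(old_root) for p in old_root.rglob("*") if p.is_file()}
    new_files = {p.relative_to(new_root) for p in new_root.rglob("*") if p.is_file()}
    added = sorted(new_files - old_files)
    removed = sorted(old_files - new_files)
    # shallow=False compares file contents instead of trusting size/mtime
    changed = sorted(
        rel for rel in old_files & new_files
        if not filecmp.cmp(old_root / rel, new_root / rel, shallow=False)
    )
    return added, removed, changed


def print_diff(old_dir, new_dir):
    """Print added/removed files and a unified diff for every changed file."""
    added, removed, changed = diff_trees(old_dir, new_dir)
    for rel in added:
        print(f"added:   {rel}")
    for rel in removed:
        print(f"removed: {rel}")
    for rel in changed:
        old_lines = (Path(old_dir) / rel).read_text(errors="replace").splitlines(keepends=True)
        new_lines = (Path(new_dir) / rel).read_text(errors="replace").splitlines(keepends=True)
        print("".join(difflib.unified_diff(old_lines, new_lines, f"old/{rel}", f"new/{rel}")))
```

Calling `print_diff("old/pyautogui-0.9.42", "new/pyautogui-0.9.43")` (folder names depend on how you unpacked them) would then show exactly what changed between the two uploads.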

May 30, 2019 20:28

Business: Feb 6, 2007

Thanks for the responses, all; they are helpful. Going to look into storing as numpy arrays and/or being more selective about what I store. The main issue is that my texts are large enough that I have to stream them as a chopped-up list of Docs, and then find a way to store that and the associated vocab data without resorting to writing it to bytes. The blobs would give me everything, and they're good for my own offline purposes, but it's helpful to have it confirmed that it's weird to store a big binary data thing for each file.
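
If writing bytes turns out to be acceptable after all, newer spaCy releases (2.2 onward, if I remember right) include `DocBin`, which is meant for exactly this case: many Docs serialized together, storing only a chosen set of token attributes plus the strings they use, rather than pickling the whole vocab each time. A minimal sketch with a blank English pipeline and made-up chunks:

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
chunks = ["First chunk of a long text.", "Second chunk of the same text."]

# Collect every Doc for one upload into a single container
doc_bin = DocBin()
for doc in nlp.pipe(chunks):
    doc_bin.add(doc)

# One bytes blob per upload holding all chunks (write it to a file or blob storage)
data = doc_bin.to_bytes()

# Later: rebuild the Docs against the same pipeline's vocab
restored = list(DocBin().from_bytes(data).get_docs(nlp.vocab))
print([d.text for d in restored])
```

On load you need the same (or a compatible) pipeline's `vocab`, so in a Django app you'd presumably load `nlp` once at startup.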

May 30, 2019 21:54

General_Failure: Apr 17, 2005

I worked out my pip problem. Sort of. Using --no-use-pep517 gets me past the issue. It's concerning that the same problem exists on multiple devices.

May 30, 2019 21:59

shrike82: Jun 11, 2005

Business posted:

Thanks for the responses, all; they are helpful. Going to look into storing as numpy arrays and/or being more selective about what I store. The main issue is that my texts are large enough that I have to stream them as a chopped-up list of Docs, and then find a way to store that and the associated vocab data without resorting to writing it to bytes. The blobs would give me everything, and they're good for my own offline purposes, but it's helpful to have it confirmed that it's weird to store a big binary data thing for each file.

Preprocessing texts in spaCy shouldn't be that compute-intensive, so consider just storing the raw texts and computing on demand, especially if users are given the option to upload new texts.

Trade CPU time for storage. Even with numpy, I suspect you're going to be dealing with large objects, since there's a lot of token-level metadata.

May 31, 2019 00:50

Wallet: Jun 19, 2006

shrike82 posted:

Preprocessing texts in spaCy shouldn't be that compute-intensive, so consider just storing the raw texts and computing on demand, especially if users are given the option to upload new texts.

Trade CPU time for storage. Even with numpy, I suspect you're going to be dealing with large objects, since there's a lot of token-level metadata.

I imagine this is the way you'd want to go because spaCy is taking text and adding a whole lot of extra data to it by its very nature. Depending on what you're up to you could also split the difference and store the original text and whatever the relevant processed/aggregated results are but not the entire marked up data from spaCy.

May 31, 2019 13:47

the yeti: Mar 29, 2008; memento disco

Is it awful practice to wrap import errors in a generic ‘Make sure you’re using the right virtual environment’ type reminder?

May 31, 2019 14:08

mbt: Aug 13, 2012

the yeti posted:

Is it awful practice to wrap import errors in a generic ‘Make sure you’re using the right virtual environment’ type reminder?

As someone who is often in the wrong virtual environment: yes, because that's sort of implied by the error. Anyone who uses virtual environments (i.e. non-new users) would check that first.

May 31, 2019 15:44

Boris Galerkin: Dec 17, 2011; I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

I think if you do something like

code:

try:
  import foo

except ImportError:
  raise ImportError("oh god did you activate the environment?")

i.e., giving a more tailored troubleshooting message, that would be fine. But if you mean making your own error class, or doing something like, I dunno, checking the name of the environment and/or whether packages are available, then it's a waste of time.

Also, for anybody experienced with using python this type of message would be completely unneeded as they'd all know what to do/check, so it'd just be for beginners.

Boris Galerkin fucked around with this message at 16:28 on May 31, 2019

May 31, 2019 16:24

Business: Feb 6, 2007

shrike82 posted:

Preprocessing texts in spaCy shouldn't be that compute-intensive, so consider just storing the raw texts and computing on demand, especially if users are given the option to upload new texts.

Trade CPU time for storage. Even with numpy, I suspect you're going to be dealing with large objects, since there's a lot of token-level metadata.

From testing things out on my own, it seems like everything up through part-of-speech tagging is no problem, but dependency parsing and NER take a long time: about 25 seconds on my laptop with 8 GB of RAM for a more or less typical use case I have in mind. That seems prohibitively expensive in terms of cloud CPU time, but I have no clue; I was just working under the assumption that storage is way cheaper than computation. Sorry this is getting kinda far afield, but it's cool to see other people are using spaCy.

May 31, 2019 16:46

the yeti: Mar 29, 2008; memento disco

Boris Galerkin posted:

I think if you do something like
code:
try:
  import foo

except ImportError:
  raise ImportError("oh god did you activate the environment?")
i.e., giving a more tailored troubleshooting message that would be fine

Yeah, that's what I had in mind. I'm dealing with DBAs who don't know much/any Python but will run the stuff I'm writing now and then to check data going into their db.
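
Along those lines, a sketch of a reminder aimed at non-Python users: it keeps the original error chained (`raise ... from err`) and prints which interpreter is running, which is usually the fastest clue when someone's in the wrong environment. The `sys.prefix != sys.base_prefix` check assumes environments made with the standard `venv` module; `import_or_explain` is a made-up helper name.

```python
import importlib
import sys


def import_or_explain(module_name):
    """Import a module, or fail with a hint aimed at people who don't use Python much."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix
        in_venv = sys.prefix != sys.base_prefix
        hint = (
            f"Could not import {module_name!r}.\n"
            f"Python being used: {sys.executable}\n"
            f"Virtual environment active: {'yes' if in_venv else 'no'}\n"
            "Make sure you're using the right virtual environment."
        )
        # 'from err' keeps the original traceback attached for whoever debugs it
        raise ImportError(hint) from err


# Usage (hypothetical module name): db_driver = import_or_explain("psycopg2")
```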

May 31, 2019 17:07

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. -Bertrand Russell

the yeti posted:

Yeah, that's what I had in mind. I'm dealing with DBAs who don't know much/any Python but will run the stuff I'm writing now and then to check data going into their db.

Yeah, if you know your audience you should write for your audience.

May 31, 2019 18:57

Empress Brosephine: Mar 31, 2012; by Jeffrey of YOSPOS

Trying to learn Flask & Python to do what I need it to do. Right now I'm trying to get an integer from an HTML number form, multiply it by a set amount (in this example 12), and then return the result to display on the HTML page; basically, ask the user how many tickets they want and multiply by the ticket cost. I know I'm doing this wrong anyway, but I think if I can get this one thing working I can expand out and do what I actually want to accomplish. The problem I'm having is that when I run the application, I don't get x * 12 as a result; I get x printed 12 times, so I'm not sure what I'm doing wrong. Any way one of you could take a look and give a hint or something? I tried to turn the input aduTix into an int to multiply by twelve, but it returns an internal server error. I don't really understand what I'm doing, but I can kinda understand the code I'm writing. Thank you all for the help!

EDIT: The internal server error I get says "int", "float", or "str" aren't supported.

EDIT2: I figured it out! Flask doesn't like returning an int; it has to be a string or something else, I believe.

here's my html:

code:

<!DOCTYPE html>
<html>
<head>
	</head>

<body>
<form method="POST">
<input type="radio" name="seasons" value="regular">Regular<br>
<input type="radio" name="seasons" value="peak">Peak<br>

<br>
<br>
# of Adults (13-64): 
<input type="number" name="adultTix"><br>
# of Seniors (65+):
<input type="number" name="seniorTix"><br>
# of Children (4-12):
<input type="number" name="childTix"><br>
# of Children Under 4:
<input type="number" name="free"><br>
<br> 
<input type="submit" name="submit_button">

</form>


</body>
</html>

and my app.py:

code:

from flask import Flask, request, render_template

app = Flask(__name__)


@app.route('/')
def forms():
    return render_template('form.html')


@app.route('/', methods=['GET', 'POST'])
def forms_post():

    # init variables
    seaChoice = request.form['seasons']
    aduTix = request.form['adultTix']
    senTix = request.form['seniorTix']
    chiTix = request.form['childTix']
    freeTix = request.form['free']

    # do the maths
    if seaChoice.lower() == "regular":
        aduTotal = aduTix * 12

    return aduTotal


app.run()

Empress Brosephine fucked around with this message at 20:21 on May 31, 2019

May 31, 2019 19:39

Da Mott Man: Aug 3, 2012

Empress Brosephine posted:

Trying to learn Flask & Python to do what I need it to do. Right now I'm trying to get an integer from an HTML number form, multiply it by a set amount (in this example 12), and then return the result to display on the HTML page; basically, ask the user how many tickets they want and multiply by the ticket cost. I know I'm doing this wrong anyway, but I think if I can get this one thing working I can expand out and do what I actually want to accomplish. The problem I'm having is that when I run the application, I don't get x * 12 as a result; I get x printed 12 times, so I'm not sure what I'm doing wrong. Any way one of you could take a look and give a hint or something? I tried to turn the input aduTix into an int to multiply by twelve, but it returns an internal server error. I don't really understand what I'm doing, but I can kinda understand the code I'm writing. Thank you all for the help!

EDIT: The internal server error I get says "int", "float", or "str" aren't supported.

EDIT2: I figured it out! Flask doesn't like returning an int; it has to be a string or something else, I believe.

This should help explain why it wasn't working.
http://flask.pocoo.org/docs/1.0/quickstart/#about-responses
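
For anyone landing here later, a corrected sketch of the view: form values always arrive as strings, so `"3" * 12` repeats the text instead of multiplying, and the view has to return a string (or a Response), not an int. To keep it self-contained this uses `render_template_string` with a hypothetical trimmed-down form instead of `templates/form.html`, and, like the original, only prices regular-season adult tickets.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Hypothetical, trimmed-down stand-in for templates/form.html
FORM_HTML = """
<form method="POST">
  <input type="radio" name="seasons" value="regular">Regular<br>
  <input type="radio" name="seasons" value="peak">Peak<br>
  # of Adults (13-64): <input type="number" name="adultTix" min="0"><br>
  <input type="submit" name="submit_button">
</form>
"""

ADULT_REGULAR_PRICE = 12  # the example price from the post


@app.route("/", methods=["GET", "POST"])
def tickets():
    if request.method == "GET":
        return render_template_string(FORM_HTML)

    # Form values always arrive as strings: "3" * 12 repeats the text,
    # so convert to int first (an empty field counts as 0)
    adult_count = int(request.form.get("adultTix") or 0)
    season = request.form.get("seasons", "")

    if season != "regular":
        # Peak pricing isn't given in the post, so this sketch stops here
        return "Only regular-season pricing is implemented in this sketch."

    total = adult_count * ADULT_REGULAR_PRICE
    # A view can't return a bare int; return a string (or a Response) instead
    return f"Adult total: {total}"
```

Run it with `flask run` from the folder containing `app.py` (recent Flask versions pick up `app.py` automatically), or call `app.run()` under an `if __name__ == "__main__":` guard.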

May 31, 2019 20:27

Empress Brosephine: Mar 31, 2012; by Jeffrey of YOSPOS

Thanks for the help with Flask. Another quick question: is it possible to take data input from a form, like a date, and then insert it into an a href, such as

E; nvm keep answering my own questions
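
For the record, one way to do it, assuming the date should end up as a query parameter: `urllib.parse.urlencode` takes care of escaping. `BASE_URL` and `availability_link` are hypothetical names.

```python
from urllib.parse import urlencode

# Hypothetical target page; the date comes from the form as "YYYY-MM-DD"
BASE_URL = "https://example.com/availability"


def availability_link(visit_date):
    """Build an href with the form's date as an escaped query parameter."""
    return f"{BASE_URL}?{urlencode({'date': visit_date})}"
```

In the template that becomes `<a href="{{ link }}">`. If the target is another route in the same Flask app, `url_for` does the same thing: keyword arguments the endpoint doesn't use get appended as query arguments.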

Empress Brosephine fucked around with this message at 17:44 on Jun 1, 2019

Jun 1, 2019 17:36

Dominoes: Sep 20, 2007

Hey dudes. Is it possible to set mobile conditions from a Django template? I'm not v good with templates and forms, but I have a form for file upload that lives in a template since I'm not sure how to do it on the frontend. I want to hide it on mobile. Normally this is done by checking window.innerWidth, but I'm not sure how to do that in a template or Django form. The answer might just be to move it to the frontend.

Jun 1, 2019 17:42

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. -Bertrand Russell

Dominoes posted:

Hey dudes. Is it possible to set mobile conditions from a Django template? I'm not v good with templates and forms, but I have a form for file upload that lives in a template since I'm not sure how to do it on the frontend. I want to hide it on mobile. Normally this is done by checking window.innerWidth, but I'm not sure how to do that in a template or Django form. The answer might just be to move it to the frontend.

You can parse request.META['HTTP_USER_AGENT'] and then see if it's a mobile browser. There are various libraries to help with this.

Personally, I avoid doing this kinda stuff. Instead I'd use media queries in my CSS.
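
If you do go the user-agent route, a rough sketch: MDN's guidance for user-agent sniffing is to look for `Mobi` anywhere in the string, which catches most phones but not necessarily tablets. `looks_mobile` and `mobile_context` are made-up names; the second is shaped like a Django context processor (a function that takes the request and returns a dict), so a template could wrap the upload form in `{% if not is_mobile %}`.

```python
def looks_mobile(user_agent):
    """Rough check: MDN suggests looking for "Mobi" anywhere in the user agent."""
    return "Mobi" in (user_agent or "")


def mobile_context(request):
    """Hypothetical context processor; register its dotted path under
    TEMPLATES[0]["OPTIONS"]["context_processors"] in settings.py."""
    return {"is_mobile": looks_mobile(request.META.get("HTTP_USER_AGENT", ""))}
```

Since the user agent can be anything the client says it is, this is a hint for layout only; the CSS media query approach avoids the whole question.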

Jun 1, 2019 17:48

unpacked robinhood: Feb 18, 2013; by Fluffdaddy

I'd like a method to run once a month on a machine that's on at random times (I'd prefer to keep it pure Python if I can, so no cron etc.)
It's not important that it runs every 30 days, or every 1st of the month.

If the main script hasn't been executed for more than a month, it should run the task once and keep track for next time.
I've done a quick test with schedule but I don't think it's what I need.

What are my options? I can whip up a lovely thing by writing a timestamp to a file, but maybe there's a better way.

Jun 2, 2019 18:18

Nippashish: Nov 2, 2005; Let me see you dance!

unpacked robinhood posted:

I'd like a method to run once a month on a machine that's on at random times (I'd prefer to keep it pure Python if I can, so no cron etc.)
It's not important that it runs every 30 days, or every 1st of the month.

If the machine might reboot at unpredictable times I don't see how you can guarantee anything without touching the external environment. Even if you write your own custom logic to keep track of how long to wait between runs you still need to somehow tell the system to turn the thing doing the waiting back on when it reboots, and that is going to necessarily involve configuring something at the system level (i.e. not "full python").

I would just make a cron job that runs once per day, and have the script check if it's been at least a month since it was last run before doing anything.

Jun 2, 2019 18:37

QuarkJets: Sep 8, 2008

Definitely use cron

Jun 2, 2019 19:03

unpacked robinhood: Feb 18, 2013; by Fluffdaddy

QuarkJets posted:

Definitely use cron

Nippashish posted:

I would just make a cron job that runs once per day, and have the script check if it's been at least a month since it was last run before doing anything.

The script will run at least once when the machine is on, probably at boot time, or from user input. I'll take a look at cron then.

Any libraries to deal with the second part ? I've written a thing but I don't really trust myself with reliability and edge cases

Jun 2, 2019 19:14

cinci zoo sniper: Mar 15, 2013

unpacked robinhood posted:

The script will run at least once when the machine is on, probably at boot time, or from user input. I'll take a look at cron then.

Any libraries to deal with the second part ? I've written a thing but I don't really trust myself with reliability and edge cases

I would just write the month of execution into a text file, and on each run check whether the current month differs from the stored one (or the file doesn't exist yet).
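
A minimal sketch of that idea, assuming the script gets started by cron (daily or at boot) or by hand: it stores the last run's month as `YYYY-MM` text and only runs the task when the current month differs. `run_monthly` and the marker path are hypothetical; the `today` parameter only exists to make it testable.

```python
from datetime import date
from pathlib import Path

# Hypothetical location for the marker file
MARKER = Path.home() / ".monthly_task_last_run"


def run_monthly(task, marker=MARKER, today=None):
    """Run task() at most once per calendar month; return True if it ran."""
    this_month = (today or date.today()).strftime("%Y-%m")
    try:
        last_month = marker.read_text().strip()
    except FileNotFoundError:
        last_month = None  # first run ever
    if last_month == this_month:
        return False
    task()
    # Record the month only after the task finished, so a crash means a retry next run
    marker.write_text(this_month)
    return True
```

If the machine stays off for a whole month, the task simply runs once on the first start in the next month, which matches the "not important that it runs every 30 days" requirement.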

Jun 2, 2019 19:29

unpacked robinhood: Feb 18, 2013; by Fluffdaddy

cinci zoo sniper posted:

I would just write the month of execution into a text file, and on each run check whether the current month differs from the stored one (or the file doesn't exist yet).

I like this better than dicking around with timestamps, thanks

Jun 2, 2019 19:38

Empress Brosephine: Mar 31, 2012; by Jeffrey of YOSPOS

Right now I have a variable, taken via request.form from an HTML date picker, that gets a value like XXXX-XX-XX. How would I write an if statement that checks whether the variable is between, let's say, 2019-01-01 and 2019-05-31? Is that possible? I'm not sure if Python or I am smart enough to know that the variable is an integer and not just a random string. Right now I have an “or” statement about 50 values long looking for certain dates to trigger a fail.

Thanks for the help.

Jun 2, 2019 20:34

Dr Subterfuge: Aug 31, 2005; TIME TO ROC N' ROLL

The datetime module has problems I don't totally understand (mostly having to do with time zone shenanigans, I think?), but the way I'd do it is turn the XXXX-XX-XX string into a date object (using fromisoformat) and do your comparisons between those.
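
A sketch of the `fromisoformat` route (Python 3.7+). An `<input type="date">` submits its value as `YYYY-MM-DD`, and plain `date` objects carry no time or time zone, so those shenanigans don't come into play. The bounds and function name are just examples; the chained comparison is inclusive at both ends and replaces the long chain of `or` checks.

```python
from datetime import date

# Bounds of the blocked period (inclusive); example values from the question
BLOCK_START = date(2019, 1, 1)
BLOCK_END = date(2019, 5, 31)


def in_blocked_range(form_value):
    """form_value is the 'YYYY-MM-DD' string an <input type="date"> submits."""
    chosen = date.fromisoformat(form_value)  # raises ValueError on malformed input
    return BLOCK_START <= chosen <= BLOCK_END


# In a Flask view (hypothetical field name):
# if in_blocked_range(request.form["visit_date"]): ...
```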

Jun 2, 2019 20:51

KICK BAMA KICK: Mar 2, 2009

Empress Brosephine posted:

Right now I have a variable, taken via request.form from an HTML date picker, that gets a value like XXXX-XX-XX. How would I write an if statement that checks whether the variable is between, let's say, 2019-01-01 and 2019-05-31? Is that possible? I'm not sure if Python or I am smart enough to know that the variable is an integer and not just a random string. Right now I have an “or” statement about 50 values long looking for certain dates to trigger a fail.

Thanks for the help.

You can use datetime's strptime to turn a string into a datetime you can easily test by specifying the format the string is in:

Python code:

import datetime

value = '2019-06-02'
format_str = '%Y-%m-%d'
value_as_date = datetime.datetime.strptime(value, format_str).date()  # strptime returns a datetime, call date() to throw away the time information if we're only concerned with the date
start_date = datetime.date(2019, 1, 1)  # no leading zeros: 01 is a SyntaxError in Python 3
end_date = datetime.date(2019, 5, 31)
in_range = start_date <= value_as_date <= end_date

Jun 2, 2019 20:56

unpacked robinhood: Feb 18, 2013; by Fluffdaddy

Empress Brosephine posted:

Right now I have a variable, taken via request.form from an HTML date picker, that gets a value like XXXX-XX-XX. How would I write an if statement that checks whether the variable is between, let's say, 2019-01-01 and 2019-05-31? Is that possible? I'm not sure if Python or I am smart enough to know that the variable is an integer and not just a random string. Right now I have an “or” statement about 50 values long looking for certain dates to trigger a fail.

Thanks for the help.

Python code:

import pendulum as pdu

d1 = pdu.parse('2019-01-01')
d2 = pdu.parse('2019-05-31')
inside = pdu.parse('2019-04-20')
outside = pdu.parse('2008-04-20')

print(d1<inside<d2) # True
print(d1<outside<d2) # False

Seems ok ?

Jun 2, 2019 20:56

CarForumPoster: Jun 26, 2013; ⚡POWER⚡

unpacked robinhood posted:

Python code:

import pendulum as pdu

d1 = pdu.parse('2019-01-01')
d2 = pdu.parse('2019-05-31')
inside = pdu.parse('2019-04-20')
outside = pdu.parse('2008-04-20')

print(d1<inside<d2) # True
print(d1<outside<d2) # False

Seems ok ?

I had never heard of pendulum until now, and I have definitely been bitten by datetime issues or written too-complicated code for that kind of BS. Absolutely fantastic.

Jun 2, 2019 21:30

NinpoEspiritoSanto: Oct 22, 2013

Pendulum owns tbh

Jun 2, 2019 23:12

Empress Brosephine: Mar 31, 2012; by Jeffrey of YOSPOS

Thank you all so much. Will this work with Flask?

Jun 2, 2019 23:14

Dominoes: Sep 20, 2007

Python code:

import pendulum as pdu
d1 = pdu.parse('2019-01-01')

d1.time()  # Time(0, 0, 0)

pdu.parse('2019-01-01') == pdu.parse('2019-01-01T00:00') # True

Bad behavior.

Thermopyle posted:

You can parse request.META['HTTP_USER_AGENT'] and then see if it's a mobile browser. There are various libraries to help with this.

Personally, I avoid doing this kinda stuff. Instead I'd use media queries in my CSS.

Appreciate it.

Jun 3, 2019 00:20

General_Failure: Apr 17, 2005

Liking the Jetson Nano, but the software dev team needs to get their poo poo together. Some stuff is preinstalled with their JetPack SDK, some comes via other means. What is making GBS threads me is OpenCV. I don't know what they did, but it's part of the SDK. Trouble is, it doesn't appear in any venvs. Can someone point me in the right direction on how to deal with this, if it's possible at all?
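
One workaround people commonly use, assuming JetPack installed OpenCV for the system `python3` (I haven't checked where it actually puts `cv2`), is to create the venv with access to the system site-packages. From the shell that's `python3 -m venv --system-site-packages <dir>`; the same thing from Python is below (`make_env` is a made-up helper name). It should be run with the same system interpreter the SDK's `cv2` was installed for.

```python
import venv
from pathlib import Path


def make_env(env_dir, with_pip=True):
    """Create a venv that can also import packages installed for the base interpreter."""
    builder = venv.EnvBuilder(
        system_site_packages=True,  # same as `python3 -m venv --system-site-packages`
        with_pip=with_pip,          # bootstrap pip inside the new environment
    )
    builder.create(env_dir)
    # pyvenv.cfg records the setting (include-system-site-packages = true)
    return Path(env_dir) / "pyvenv.cfg"


# Usage (hypothetical location): make_env(Path.home() / "jetson-env")
```

Anything you `pip install` inside the venv still takes precedence over the system copies, so only packages you don't install locally (like the SDK's OpenCV) fall through to the system ones.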

Jun 3, 2019 09:10

the yeti: Mar 29, 2008; memento disco

Is that Poetry package manager from the pendulum people any good?

Jun 3, 2019 14:28

Malcolm XML: Aug 8, 2009; I always knew it would end like ｔｈｉｓ．

It's like if pipenv wasn't written by an rear end in a top hat, so yes

Jun 3, 2019 16:48

the yeti: Mar 29, 2008; memento disco

Malcolm XML posted:

It's like if pipenv wasn't written by an rear end in a top hat, so yes

I figured at least one response would be along that line. In a practical sense pipenv is really pissing me off because packages keep breaking it (pendulum does) and the excuse is always vendored dependencies.

Jun 3, 2019 17:09

Empress Brosephine: Mar 31, 2012; by Jeffrey of YOSPOS

So I finished Python Crash Course and loved it; what should I read next to improve my skills? Is it worth learning more than the base level of skills with Flask?

Thanks all.

Jun 4, 2019 01:48
