Furism
Feb 21, 2006

Live long and headbang
Automate the Boring Stuff is nice but a little too simple. I'd like to know more about the proper coding conventions in the Python world. What would be the next book?

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost

Furism posted:

Automate the Boring Stuff is nice but a little too simple. I'd like to know more about the proper coding conventions in the Python world. What would be the next book?

"effective python"
"fluent python"

they're ostensibly advanced but you can just rtfm when you get a bit lost

KICK BAMA KICK
Mar 2, 2009

KICK BAMA KICK posted:

e: Oh, query the database in the main thread and submit each row to a concurrent.futures.ProcessPoolExecutor? Then iterate through concurrent.futures.as_completed(those_futures) to get the max. That more on the right track?

ee: Yep, :perfect:, huge thanks
So this didn't actually work -- probably obvious to everyone but me that I'm still loading all the data into memory as I submit those Futures. I got away with it on my first test (my database is like 85% of the size of the RAM on the target machine) but subsequent runs soft-locked the machine, mashing Ctrl-C might get control back a minute or two later. Finally sorted it out though -- realized I was dumb for using a database; each row was just a blob serializing an ndarray and a string identifying the thing that data's about, so I just dumped those into .npy files with the filename identifying them. Now I just pool.map(do_thing, Path('/data/those_files/').glob('*.npy')) and np.load the file there. Super simple code, slightly faster, uses maybe 10% of the memory it did before. This whips rear end!
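
A rough sketch of the one-file-per-array setup described above, for anyone following along. It assumes each .npy file holds a single array and that the per-file work boils down to one small result. The max-per-file reduction and the helper name best_of are placeholders, since the thread never says what do_thing actually computes.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

import numpy as np


def do_thing(path):
    """Load one array inside the worker and reduce it to a small result."""
    arr = np.load(path)
    # Placeholder computation (assumption): score a file by its largest value.
    return path.stem, float(arr.max())


def best_of(directory, max_workers=None):
    """Return the (name, score) pair with the highest score, or None if empty."""
    paths = sorted(Path(directory).glob("*.npy"))
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # Only paths are sent to the workers and only small tuples come back,
        # so the parent process never holds all of the arrays at once.
        return max(pool.map(do_thing, paths), key=lambda item: item[1], default=None)
```

Calling best_of('/data/those_files/') from under an if __name__ == "__main__": guard keeps this safe on platforms that start worker processes by re-importing the script.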

CarForumPoster
Jun 26, 2013

⚡POWER⚡
How do I do something to a range of columns in a df?

I have 10 columns named "DispoClass_#" with the # being 1-10. I want to set the ordinality of the categorical values they contain with .cat.set_categories

How do I select all 10? I need to do this with other things structured as "Name_#" so just writing them out isn't that desirable.

Something like this, but, ya know, works...
code:
df_raw[["DispoClass_1":"DispoClass_10"]].cat.set_categories(['High', 'Medium', 'Low'], ordered=True, inplace=True)

Furism
Feb 21, 2006

Live long and headbang
So apparently I'm a dumbass because I can't seem to instantiate an object for a class I created. I read the documentation and I don't understand what I'm doing wrong :(

This is my class:

code:
class CfClient:
    __bearerToken = ""

    def __init__(self, userName, userPassword, controllerAddress):
        self.userName = userName
        self.userPassword = userPassword
        self.controllerAddress = controllerAddress

    def login(self):
        ## Do Login
        self.__bearerToken = 12345
        return True
And my main file:

code:
import lib.models.CfClient

cfClient = CfClient("Soandso",
                    "somePassword", "https://192.168.1.10")
When I do this, I'm getting an error saying "Undefined variable 'CfClient'"

I can't seem to understand the difference between my code and the documentation sample, except that my file is under two subdirectories (into which I dropped __init__.py files). What am I doing wrong?

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
Either import with an alias:
Python code:
import lib.models.CfClient as CfClient # doesn't work, I'm dumb
Or refer to it with the full path:

Python code:
import lib.models

client = lib.models.CfClient("foo", "bar", "https://192.168.1.10")
edit: no wait I'm dumb, that first one doesn't work. Just do
Python code:
from lib.models import CfClient
instead if that's what you want.

edit edit: The above assumes that "class CfClient: ..." is in a file called models.py in a directory called lib. If you instead have lib/models/CfClient.py which contains the class CfClient, then you need to tell Python about both the file and the class, like so:

Python code:
from lib.models.CfClient import CfClient

client = CfClient("foo", "bar", "https://192.168.1.10")

TheFluff fucked around with this message at 19:10 on Dec 17, 2018
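
The difference between importing the module and importing the class can be checked with a throwaway package. This is only a sketch: it builds the layout in a temporary directory, and the top-level name demo_lib is a stand-in for the thread's lib directory so it cannot clash with anything already installed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk: demo_lib/models/CfClient.py
# ("demo_lib" is a hypothetical stand-in for the thread's "lib").
root = Path(tempfile.mkdtemp())
pkg = root / "demo_lib" / "models"
pkg.mkdir(parents=True)
(root / "demo_lib" / "__init__.py").write_text("")
(pkg / "__init__.py").write_text("")
(pkg / "CfClient.py").write_text(
    "class CfClient:\n"
    "    def __init__(self, userName, userPassword, controllerAddress):\n"
    "        self.userName = userName\n"
    "        self.userPassword = userPassword\n"
    "        self.controllerAddress = controllerAddress\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# A plain "import a.b.c" binds only the top-level name, so the bare
# name CfClient is still undefined after this line.
import demo_lib.models.CfClient

# demo_lib.models.CfClient is the module (the file), not the class.
print(type(demo_lib.models.CfClient).__name__)

# The class lives one level further down, inside that module.
client = demo_lib.models.CfClient.CfClient(
    "Soandso", "somePassword", "https://192.168.1.10"
)

# "from module import name" binds the class itself to a short name.
from demo_lib.models.CfClient import CfClient

same_class = CfClient is demo_lib.models.CfClient.CfClient
print(same_class)
```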

cinci zoo sniper
Mar 15, 2013




CarForumPoster posted:

How do I do something to a range of columns in a df?

I have 10 columns named "DispoClass_#" with the # being 1-10. I want to set the ordinality of the categorical values they contain with .cat.set_categories

How do I select all 10? I need to do this with other things structured as "Name_#" so just writing them out isn't that desirable.

Something like this, but, ya know, works...
code:
df_raw[["DispoClass_1":"DispoClass_10"]].cat.set_categories(['High', 'Medium', 'Low'], ordered=True, inplace=True)

You should use df.filter() for that.

Python code:
import pandas as pd

test = pd.DataFrame()
test["hello"] = [1, 2, 3]
test["a_01"] = ["foo", "bar", "baz"]
test["a_02"] = ["foo", "bar", "baz"]
test["a_03"] = ["foo", "bar", "baz"]

print(test)

target = test.filter(like='a_', axis=1)
test[target.columns] = target.apply(lambda x: x.str.capitalize())

print(test)
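
Applied to the original question, the same filter() idea might look like the sketch below. It swaps .cat.set_categories for astype() with a CategoricalDtype, because .cat is a Series accessor and has to be used one column at a time, while astype() accepts a column-to-dtype mapping. The toy frame and the CaseId column are assumptions. Note that like= matches the substring anywhere in the column name.

```python
import pandas as pd

# Toy stand-in for df_raw; the real frame has DispoClass_1 .. DispoClass_10.
df_raw = pd.DataFrame({
    "CaseId": [101, 102, 103],
    "DispoClass_1": ["High", "Low", "Medium"],
    "DispoClass_2": ["Low", "Low", "High"],
})

# An ordered categorical dtype, listed from first to last in sort order.
dispo_dtype = pd.CategoricalDtype(["High", "Medium", "Low"], ordered=True)

# Pick the columns by name pattern, then convert them all in one call.
dispo_cols = df_raw.filter(like="DispoClass_").columns
df_raw = df_raw.astype({col: dispo_dtype for col in dispo_cols})

print(df_raw.dtypes)
print(df_raw.sort_values("DispoClass_1"))
```

Values that are not in the category list become NaN after the conversion, so it is worth checking for those on real data.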

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
What about df.loc? It’s at least advertised to do slicing with strings.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

cinci zoo sniper posted:

You should use df.filter() for that.

Python code:
import pandas as pd

test = pd.DataFrame()
test["hello"] = [1, 2, 3]
test["a_01"] = ["foo", "bar", "baz"]
test["a_02"] = ["foo", "bar", "baz"]
test["a_03"] = ["foo", "bar", "baz"]

print(test)

target = test.filter(like='a_', axis=1)
test[target.columns] = target.apply(lambda x: x.str.capitalize())

print(test)

Much appreciated. Also appreciate you helping me last week.

Also I just now found out about : https://regex101.com/

Holy crap is that helpful! I have the worst time with regexes and I basically end up finding someone on Stack Overflow who wanted the same thing and copying the answer.

Furism
Feb 21, 2006

Live long and headbang

TheFluff posted:

If you instead have lib/models/CfClient.py which contains the class CfClient, then you need to tell Python about both the file and the class, like so:

Python code:
from lib.models.CfClient import CfClient

client = CfClient("foo", "bar", "https://192.168.1.10")

That was it, thanks!

cinci zoo sniper
Mar 15, 2013




Dr Subterfuge posted:

What about df.loc? It’s at least advertised to do slicing with strings.
You can, but I don't think there are ever really any good reasons to do so.

What you could do instead is something like this:
Python code:
target = test.columns[test.columns.str.contains(pat='a_')]
test[target] = test[target].apply(lambda x: x.str.capitalize())

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
What makes df.loc so much less desirable?

cinci zoo sniper
Mar 15, 2013




Dr Subterfuge posted:

What makes df.loc so much less desirable?

For regex-like subsetting it’s just a question of code clarity. Functionally nothing will change at lower levels, I think. Like, what would be your proposed .loc example here?

Furism
Feb 21, 2006

Live long and headbang
So I'm trying to POST a file along with a Bearer Token against a REST API. I get an error from the API telling me that "Request was not successfully validated against the schema." I'm trying to figure out what I did wrong because I think my request is correctly crafted:

code:
def uploadFile(self, file):
        ofile = open(file, "rb")
        files = {'file' : ofile}
        response = requests.post(
            self.controllerAddress + '/files?type=multipart',
            headers={'Authorization': 'Bearer {0}'.format(self.__bearerToken)},
            files=files,
            verify=False,
        )
        print(response)
Normally I'd fire up Wireshark and look at what's actually sent, but the API is over HTTPS and I don't have the server's private key to decrypt. I enabled the logging module but it doesn't seem to be able to show POST requests:

code:
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): 10.75.231.30:443
DEBUG:urllib3.connectionpool:https://10.75.231.30:443 "POST /api/v2/files?type=multipart HTTP/1.1" 400 71
<Response [400]>
What are my options to find out what's wrong with my request? Note that "file" in the method's signature is just a string that contains the full, absolute path to the actual file.

Sorry for the newbie questions :(

Furism fucked around with this message at 22:10 on Dec 17, 2018

necrotic
Aug 2, 2005
I owe my brother big time for this!
That is the server telling you it will not accept the request for not matching whatever schema it expects. Inspecting the traffic wouldn't help you there. It does look like you're using requests correctly. Does the API also expect a body in the request instead of only the file part?

SurgicalOntologist
Jun 17, 2004

CarForumPoster posted:

How do I do something to a range of columns in a df?

I have 10 columns named "DispoClass_#" with the # being 1-10. I want to set the ordinality of the categorical values they contain with .cat.set_categories

How do I select all 10? I need to do this with other things structured as "Name_#" so just writing them out isn't that desirable.

Something like this, but, ya know, works...
code:
df_raw[["DispoClass_1":"DispoClass_10"]].cat.set_categories(['High', 'Medium', 'Low'], ordered=True, inplace=True)

In general if you find yourself wanting to do computation on the names of columns (e.g. .str.startswith), you are doing something wrong and should reorganize your data. In your case I would create a dataframe with only the "DispoClass" data, with columns 1-10. Then you don't have to do any column subsetting when you want to do something to the DispoClass data only. You can still coordinate the data with columns from another dataframe if they share the same row labels (index).

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

cinci zoo sniper posted:

For regex-like subsetting it’s just a question of code clarity. Functionally nothing will change at lower levels, I think. Like, what would be your proposed .loc example here?

This seems like it would work?

Python code:
test.loc[:, 'a_01':'a_03'].apply(do_stuff)
I'm pretty sure .loc returns a view? If it doesn't I don't understand anything.

SurgicalOntologist
Jun 17, 2004

Indexing without the loc is just a shortcut that sometimes works. cinci zoo sniper's suggestion could easily have been

Python code:
target = test.columns[test.columns.str.contains(pat='a_')]
test.loc[:, target] = test.loc[:, target].apply(lambda x: x.str.capitalize())
The distinction that matters here is boolean indexing (based on a computation on the column labels) vs. slice indexing. I think slice indexing is more clear in this case but also relies on column order to an extent that makes it too fragile IMO.

In any case, if you are naming your columns XX_1, XX_2, ..., XX_n then you have >2D data and should either split it up into multiple dataframes as I suggested or look into other data structures like xarray.
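
A minimal sketch of the multiple-dataframes suggestion, under some assumptions: the toy data, the Name column, and the case1..case3 row labels are all made up for illustration. The numbered columns move into their own frame with plain integer column labels, and the shared row index is what ties the two frames back together.

```python
import pandas as pd

df_raw = pd.DataFrame(
    {
        "Name": ["a", "b", "c"],
        "DispoClass_1": ["high", "low", "medium"],
        "DispoClass_2": ["low", "low", "high"],
    },
    index=["case1", "case2", "case3"],
)

# Split the numbered columns into their own frame, labelled 1..n.
numbered = df_raw.filter(regex=r"^DispoClass_\d+$")
dispo = numbered.rename(columns=lambda name: int(name.rsplit("_", 1)[1]))
other = df_raw.drop(columns=numbered.columns)

# Whole-frame operations now need no column subsetting at all.
dispo = dispo.apply(lambda col: col.str.capitalize())

# The shared row index lines the two frames back up when needed.
combined = other.join(dispo.add_prefix("DispoClass_"))
print(combined)
```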

cinci zoo sniper
Mar 15, 2013




^^ For a single column, a list of columns, or a slice of rows, [] and .loc behave identically. For single rows, a list of rows, a slice of columns, or a combined selection of rows and columns within one operation, .loc is the only appropriate option, and please don’t ask me why.

Dr Subterfuge posted:

This seems like it would work?

Python code:
test.loc[:, 'a_01':'a_03'].apply(do_stuff)
I'm pretty sure .loc returns a view? If it doesn't I don't understand anything.

Right, this would work but as SurgicalOntologist points out, it relies on column order (also see comment above about slicing columns). I don’t think there’s anything seriously wrong with that, but I prefer to defensively avoid doing operations like that.
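
The column-order concern is easy to reproduce. In this sketch the toy frame deliberately puts an unrelated column between the a_ columns, which is an assumption made only to show the effect. It also shows that apply() hands back a new frame rather than changing the original, whatever .loc itself returns.

```python
import pandas as pd

test = pd.DataFrame({
    "a_01": ["foo", "bar"],
    "hello": [1, 2],  # sits between the a_ columns on purpose
    "a_02": ["foo", "bar"],
    "a_03": ["foo", "bar"],
})

# A label slice in .loc covers everything positioned between the two
# labels, with both endpoints included, so "hello" comes along too.
by_slice = list(test.loc[:, "a_01":"a_03"].columns)

# Name-based selection ignores where the columns happen to sit.
by_name = list(test.filter(like="a_").columns)

print(by_slice)
print(by_name)

# apply() builds a new frame; the result has to be assigned back
# for `test` itself to change.
upper = test.loc[:, "a_02":"a_03"].apply(lambda col: col.str.upper())
print(test.loc[0, "a_02"], upper.loc[0, "a_02"])
```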

Dominoes
Sep 20, 2007

Does anyone else dislike the if __name__ == __main__ syntax? I still have to look it up every time.

Dominoes fucked around with this message at 02:46 on Dec 18, 2018

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost

Dominoes posted:

Does anyone else dislike the if __name__ == __main__ syntax? I still have to look it up every time.

you forgot quote marks

at this point it's muscle memory to me

Cute n Popular
Oct 12, 2012
I was in the same position not too long ago and I can't recommend Effective Python enough after a goon recommended it in this thread. I did find that I occasionally needed external resources to supplement the material but that's more on me being a novice than anything else.

I also picked up a lot by doing advent of code and going through various solutions that were posted online.

Cute n Popular fucked around with this message at 08:23 on Dec 18, 2018

cinci zoo sniper
Mar 15, 2013




Dominoes posted:

Does anyone else dislike the if __name__ == __main__ syntax? I still have to look it up every time.

Sign me up on the “this feels awkward” list.

QuarkJets
Sep 8, 2008

I dislike that syntax as well even if I'm used to it now

It feels like it's a hack rather than a feature

necrotic
Aug 2, 2005
I owe my brother big time for this!
What would you propose it look like instead?

Furism
Feb 21, 2006

Live long and headbang

necrotic posted:

That is the server telling you it will not accept the request for not matching whatever schema it expects. Inspecting the traffic wouldn't help you there. It does look like you're using requests correctly. Does the API also expect a body in the request instead of only the file part?

Yes that was my understanding of the problem too. The API documentation only says this:



I wanted to inspect the traffic to make sure my code did what I thought it did (being new at Python).

necrotic
Aug 2, 2005
I owe my brother big time for this!
Looks like it is expecting a specific mime type on the file payload. I don't know how to do that with requests off the top of my head, but look around for customizing the mime type on the file in the request.

Furism
Feb 21, 2006

Live long and headbang
Tried that (sorry, that wasn't in my original code) this way:

code:
headers={'Authorization': 'Bearer {0}'.format(self.__bearerToken),
                     'Content-Type': 'multipart/form-data'},
Still no luck. I'll look around.

necrotic
Aug 2, 2005
I owe my brother big time for this!
No, the file itself is attached with a different content type. It's multipart, like email.

here https://stackoverflow.com/questions/15746558/how-to-send-a-multipart-related-with-requests-in-python

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Can't you just do it like this?
http://docs.python-requests.org/en/master/user/quickstart/#post-a-multipart-encoded-file
(second example, specifying the content-type explicitly)

also are you doing the auth right? It looks like it's complaining that the auth header is bad in some way
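
One way to see exactly what requests would put on the wire, without Wireshark or the server's key, is to prepare the request and inspect it before sending. The sketch below does that with a per-file content type. The URL, token, file name, and the application/octet-stream type are all placeholders, since the thread does not show what the API's schema actually expects.

```python
import io

import requests

# Hypothetical values: the endpoint, token, file name, and MIME type are
# placeholders, not taken from the API's documentation.
url = "https://controller.example/api/v2/files?type=multipart"
token = "12345"
payload = io.BytesIO(b"file contents go here")


def build(headers):
    """Prepare (but do not send) a multipart POST carrying one file part."""
    # A 3-tuple (filename, file object, content type) sets the part's own type.
    files = {"file": ("example.bin", payload, "application/octet-stream")}
    payload.seek(0)
    return requests.Request("POST", url, headers=headers, files=files).prepare()


auth = {"Authorization": f"Bearer {token}"}

# Left alone, requests writes the multipart header, boundary included.
good = build(auth)
print(good.headers["Content-Type"])
print(good.body.decode("latin-1"))

# A hand-written Content-Type is kept as-is, so the boundary goes missing.
manual = build({**auth, "Content-Type": "multipart/form-data"})
print(manual.headers["Content-Type"])
```

The second case suggests why setting 'Content-Type': 'multipart/form-data' by hand tends to make things worse: the server has no boundary to split the body on.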

cinci zoo sniper
Mar 15, 2013




necrotic posted:

What would you propose it look like instead?

Literally anything that isn't 26 "out-of-context" characters?

Dominoes
Sep 20, 2007

necrotic posted:

What would you propose it look like instead?
def main():, where main is a built-in.

Dominoes fucked around with this message at 20:11 on Dec 18, 2018

cinci zoo sniper
Mar 15, 2013




Dominoes posted:

def main():, where main is a built-in.

I was thinking about some ‘on import:’ construct, but this would probably do better. Either way I agree with this approach; my main philosophical objection is to comparing against an ostensibly implicit value.

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost

Dominoes posted:

def main():, where main is a built-in.

save it for python 4, i guess

(lots of peeps have a main() in python scripts already, so if you make

code:
if __name__ == "__main__":
    do_some_shit()
    main()
and then make the main() the entrance bit, shenanigans)

Nippashish
Nov 2, 2005

Let me see you dance!
You could also just commit to not importing your scripts as modules and then you don't need any double underscore shenanigans.

cinci zoo sniper
Mar 15, 2013




bob dobbs is dead posted:

save it for python 4, i guess

(lots of peeps have a main() in python scripts already, so if you make

code:
if __name__ == "__main__":
    do_some_shit()
    main()
and then make the main() the entrance bit, shenanigans)

I guess they could then add a new reserved keyword or the like, e.g. what I thought of, to preserve legacy code. Or probably some actually competent solution; I’m not a compsci person by a large and wide margin.

QuarkJets
Sep 8, 2008

Nippashish posted:

You could also just commit to not importing your scripts as modules and then you don't need any double underscore shenanigans.

It's cool and good to have a file that is both importable and runnable as a script
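
For reference, a minimal sketch of a file that works both ways. The function names are made up for illustration. Importing the file gives access to greet() without any output, while running it directly calls main().

```python
def greet(name):
    """Reusable piece that other modules can import."""
    return f"Hello, {name}!"


def main():
    """Script entry point; only meant to run when the file is executed."""
    print(greet("goons"))
    return 0


# __name__ is the string "__main__" when the file is run directly
# and the module's own name when it is imported, hence the quotes.
if __name__ == "__main__":
    main()
```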

necrotic
Aug 2, 2005
I owe my brother big time for this!

baka kaba posted:

Can't you just do it like this?
http://docs.python-requests.org/en/master/user/quickstart/#post-a-multipart-encoded-file
(second example, specifying the content-type explicitly)


Yeah that looks way better. I just did a phone search :effort:

Nippashish
Nov 2, 2005

Let me see you dance!

QuarkJets posted:

It's cool and good to have a file that is both importable and runnable as a script

I guess what I'm trying to say is that having a slightly weird syntax to do a slightly weird thing is one of the less objectionable features of python imo.

QuarkJets
Sep 8, 2008

Lots of languages treat a function named main() as, well, main. It wouldn't have been unusual for Python to do the same thing. It's not really objectionable, just a weird design choice to break from convention and make everyone check the value of the __name__ variable.
