Python information and short questions megathread.

the: Jul 18, 2004; by Cowcaster

Could you explain what exactly (_.strip() for _ in s.split(',')) is doing?

# ? May 10, 2014 20:44

Space Kablooey: May 6, 2009

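For every item ("_", in this case) in the iterable (s.split(',')), call the .strip() method of that item and yield the result. Strictly speaking, the parentheses make this a generator expression rather than a tuple; it only produces its values once something consumes it, such as tuple() or unpacking into variables.

Ultimately, consuming it gives values like this: ("115 Berry Lane", "Plano", "TX 24134")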

For more explanation: https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions
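
Here's a small step-by-step sketch of the same thing. The sample string and variable names are made up for illustration.

```python
s = "115 Berry Lane , Plano,  TX 24134"  # sample input; extra whitespace added for illustration

# split(',') cuts the string at each comma, leaving stray whitespace on the pieces
pieces = s.split(',')

# the generator expression strips each piece lazily, one at a time
stripped = (_.strip() for _ in pieces)

# consuming the generator (here via tuple()) produces the cleaned values
result = tuple(stripped)
print(result)
```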

# ? May 10, 2014 20:54

sharktamer: Oct 30, 2011; Shark tamer ridiculous

Wouldn't it make sense to use regex in this case? I guess splitting strings would work for such a simple case, but it's always worth considering.

code:

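import re
t = "115 Berry Lane, Plano, TX 24134"  # sample address for illustration
# zip_code rather than zip, to avoid shadowing the builtin
street, city, state, zip_code = re.search(r'(.+), (.+), (\w{2}) (\d+)', t).groups()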

# ? May 10, 2014 20:54

QuarkJets: Sep 8, 2008

I was always told that it's better to use Python methods rather than regex, but I don't really know why

# ? May 11, 2014 01:29

namaste friends: Sep 18, 2004; by Smythe

QuarkJets posted:

I was always told that it's better to use Python methods rather than regex, but I don't really know why

I've been told that regexps can be extremely computationally expensive. I think it just boils down to using the right tool for the right job.

# ? May 11, 2014 01:50

BannedNewbie: Apr 22, 2003; HOW ARE YOU? -> YOSHI?
FINE, THANK YOU. -> YOSHI.

Cultural Imperial posted:

I've been told that regexps can be extremely computationally expensive. I think it just boils down to using the right tool for the right job.

I've also heard people say to avoid them because they can be difficult for other people to read

# ? May 11, 2014 01:54

namaste friends: Sep 18, 2004; by Smythe

BannedNewbie posted:

I've also heard people say to avoid them because they can be difficult for other people to read

I don't find that's much of a problem with all the web based regexp testers out there.

# ? May 11, 2014 02:02

QuarkJets: Sep 8, 2008

Cultural Imperial posted:

I don't find that's much of a problem with all the web based regexp testers out there.

Yeah, but if they're just Python methods then it won't be necessary to use a web-based regexp tester

And that doesn't help those of us who have to develop in an environment without internet

# ? May 12, 2014 06:46

namaste friends: Sep 18, 2004; by Smythe

QuarkJets posted:

Yeah, but if they're just Python methods then it won't be necessary to use a web-based regexp tester

And that doesn't help those of us who have to develop in an environment without internet

Hmmm. Can a python method search for a windows SID? Like a string with the form S-1-.....
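
For reference, a rough sketch of what the regex route could look like here. The pattern assumes the usual S-1-authority-subauthority layout, and the sample text and SID are made up for illustration.

```python
import re

# Assumed shape: "S-1-", an identifier authority, then one or more "-<number>" subauthorities
SID_PATTERN = re.compile(r'S-1-\d+(?:-\d+)+')

# made-up sample log line
text = "user bob has sid S-1-5-21-1004336348-1177238915-682003330-512 on host A"

# with no capturing groups, findall returns the full matched strings
sids = SID_PATTERN.findall(text)
print(sids)
```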

# ? May 12, 2014 07:19

ahmeni: May 1, 2005; It's one continuous form where hardware and software function in perfect unison, creating a new generation of iPhone that's better by any measure.; Grimey Drawer

"Don't use regular expressions because they're hard to read" is such a terrible piece of advice that I can only assume it was cargo-culted from a Slashdot comment from 2003. Regexes are fine, and if they get complicated you can pretty much fix them up with Python's support for verbose regex via re.VERBOSE.

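As a sketch of what that can look like, here is the address regex from earlier in the thread spelled out in verbose mode. The sample address and the names ADDRESS and zip_code are just for illustration.

```python
import re

# In verbose mode, whitespace in the pattern is ignored and # starts a comment,
# so literal spaces are written as "\ " here.
ADDRESS = re.compile(r'''
    (.+),\      # street, up to a comma and a space
    (.+),\      # city, up to a comma and a space
    (\w{2})\    # two-character state code, then a space
    (\d+)       # zip code digits
''', re.VERBOSE)

street, city, state, zip_code = ADDRESS.search("115 Berry Lane, Plano, TX 24134").groups()
print(street, city, state, zip_code)
```
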
# ? May 12, 2014 08:50

BeefofAges: Jun 5, 2004; Cry 'Havoc!', and let slip the cows of war.

QuarkJets posted:

And that doesn't help those of us who have to develop in an environment without internet

I don't think I would accept a job offer from anywhere where this was the case. It just sounds miserable.

# ? May 12, 2014 17:30

Megaman: May 8, 2004; I didn't read the thread BUT...

QuarkJets posted:

And that doesn't help those of us who have to develop in an environment without internet

I'd LOVE to know what this company is, please tell us!!

# ? May 12, 2014 17:33

the: Jul 18, 2004; by Cowcaster

edit Nm answered my own question

the fucked around with this message at 17:57 on May 12, 2014

# ? May 12, 2014 17:55

null gallagher: Jan 1, 2014

e;fb by original poster

# ? May 12, 2014 17:58

good jovi: Dec 11, 2000; 'm pro-dickgirl, and I VOTE!

the posted:

Why isn't this working?

str.strip just strips characters off the beginning and end of the string. See: https://docs.python.org/2/library/stdtypes.html#str.strip

Try str.replace, or using re.sub for more complicated cases.
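
A quick sketch of the difference between the three. The original question was edited away, so the sample string here is made up.

```python
import re

s = "--a-b--c--"  # made-up sample

# strip only trims the given characters from both ends
trimmed = s.strip('-')

# replace removes (or swaps) every occurrence, wherever it appears
removed = s.replace('-', '')

# re.sub handles patterns, e.g. collapsing runs of dashes into one
collapsed = re.sub(r'-+', '-', s)

print(trimmed, removed, collapsed)
```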

# ? May 12, 2014 17:59

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

ahmeni posted:

"Don't use regular expressions because they're hard to read" is such a terrible piece of advice that I can only assume it was cargo-culted from a Slashdot comment from 2003. Regexes are fine, and if they get complicated you can pretty much fix them up with Python's support for verbose regex via re.VERBOSE.

Like many sorts of general advice it breaks down in all sorts of situations.

There are many times where using a string method or two makes your intent more clear than a regex. There are many times where you've got a dozen lines of string methods to do something you could do in a simple regex.

String methods are generally faster than a regex. I think there are times when a regex is as fast (I can't say I've really run into any situations where a regex is faster) but I can't come up with any examples off the top of my head. There are even more times where the speed doesn't even matter and you should go with the method that makes your intent the clearest.

Example of string method being 10x faster (compiling the regex in this case makes hardly any difference):

http://pastebin.com/7Jybgfid

Someone is welcome to come up with an example of the reverse. I'm sure such an example exists.
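
If you want to measure this yourself, something along these lines should do. This is not the pastebin benchmark, just a minimal sketch with made-up sample data and function names, and the numbers will vary by machine and Python version.

```python
import re
import timeit

text = "the quick brown fox jumps over the lazy dog " * 100  # made-up sample data

def with_str_method():
    # plain substring replacement
    return text.replace("lazy", "sleepy")

def with_regex():
    # same replacement through the re module
    return re.sub(r"lazy", "sleepy", text)

# both approaches should give the same output; only the timing differs
same = with_str_method() == with_regex()

str_time = timeit.timeit(with_str_method, number=2000)
re_time = timeit.timeit(with_regex, number=2000)
print("same output: %s, str.replace: %.4fs, re.sub: %.4fs" % (same, str_time, re_time))
```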

Thermopyle fucked around with this message at 19:03 on May 12, 2014

# ? May 12, 2014 19:01

Hammerite: Mar 9, 2007; And you don't remember what I said here, either, but it was pompous and stupid.; Jade Ear Joe

Basically it all adds up to "use regular expressions when they are the most suitable tool for the job, and don't use them when they are not". Which applies equally to any tool at your disposal.

# ? May 12, 2014 19:11

the: Jul 18, 2004; by Cowcaster

I'm trying to use cssselect to strip the state URLs off this page

I tried doing a straight selection of tables, but that didn't work. Then I tried going in via the div tag and then to the table, but that isn't grabbing anything either:

Python code:

import requests
import lxml.html
import cssselect
import csv

req = requests.get('http://www.publiclibraries.com/')
root = lxml.html.fromstring(req.text)

divs = root.cssselect('div')
div = divs[2]
tables = div.cssselect('table')

I said [2] because in the source it looks like the third div tag is where that table starts, but then:

code:

In [107]: tables
Out[107]: []

I've tried other div[] spots but they don't work either. What am I missing?

edit: I also used this page as a test run for the css selector. I pasted in the entire source code from the page I want, and I wrote 'table' in the second box, and it perfectly selected the states. So why the hell isn't *my* code working?

Second edit: Figured out the problem. They have a <p> tag on Line 50 that is supposed to be a </p> tag on their page. This breaks lxml I think?
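
If a stray tag keeps tripping up the selector, one possible fallback is to skip tree-building altogether. The sketch below swaps in the standard library's event-based html.parser instead of lxml (so it is not what the code above does), and the HTML snippet and class name are made up to mimic an unclosed <p> sitting in front of a table of links.

```python
from html.parser import HTMLParser

class TableLinkCollector(HTMLParser):
    """Collect href values from <a> tags that appear inside a <table>."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.table_depth = 0   # how many <table> elements we are currently inside
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.table_depth += 1
        elif tag == 'a' and self.table_depth > 0:
            href = dict(attrs).get('href')
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == 'table' and self.table_depth > 0:
            self.table_depth -= 1

# made-up markup: note the second <p> that should have been </p>
html = """
<div><p>Pick a state<p>
<a href="/about.html">About</a>
<table><tr><td><a href="/alabama.html">Alabama</a></td>
<td><a href="/alaska.html">Alaska</a></td></tr></table></div>
"""

collector = TableLinkCollector()
collector.feed(html)
print(collector.links)
```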

the fucked around with this message at 20:54 on May 12, 2014

# ? May 12, 2014 19:22

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Thermopyle posted:

Example of string method being 10x faster (compiling the regex in this case makes hardly any difference):

For reference, Python always compiles the regex. Around 2.2 it got an internal cache, so there's no reason to use re.compile manually.

# ? May 12, 2014 20:06

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

Suspicious Dish posted:

For reference, Python always compiles the regex. Around 2.2 it got an internal cache, so there's no reason to use re.compile manually.

Oh, sweet.

I think I might have known that at some point because years ago I used to compile them all the time and now I never do.

# ? May 12, 2014 20:21

Plorkyeran: Mar 22, 2007; To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Suspicious Dish posted:

For reference, Python always compiles the regex. Around 2.2 it got an internal cache, so there's no reason to use re.compile manually.

Explicitly compiling long-lived regex objects can still be beneficial and there isn't really any downside to doing so.

# ? May 12, 2014 21:00

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

Plorkyeran posted:

Explicitly compiling long-lived regex objects can still be beneficial

How?

# ? May 12, 2014 21:44

No Safe Word: Feb 26, 2005

Thermopyle posted:

How?

Only if you can do it before you know when it will be used at runtime. Like just compile it on initialization and then you don't have to pay the one-time compile cost on first use.

# ? May 12, 2014 21:54

Plorkyeran: Mar 22, 2007; To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Thermopyle posted:

How?

The cache is not infinite in size (unbounded caches are also known as "memory leaks"), so regex objects that should be long-lived can get bumped from the cache by large numbers of one-shot regexes. Not something that's going to be an issue very often, but more often than never.

Ignoring performance, I also think that precompiling hardcoded regexes results in trivially more readable code, i.e.:

Python code:

FOO_REGEX = re.compile('...')
...
match = FOO_REGEX.match(str)

# vs

FOO_REGEX = '...'
...
match = re.match(FOO_REGEX, str)

# ? May 12, 2014 22:24

more like dICK: Feb 15, 2010; This is inevitable.

You can increase the cache size if you're going over 100 regexes (like a big Django site might).

# ? May 12, 2014 22:25

QuarkJets: Sep 8, 2008

BeefofAges posted:

I don't think I would accept a job offer from anywhere where this was the case. It just sounds miserable.

It's not bad at all, and really not that rare, although I wouldn't say that it's common. Usually these kinds of environments will have Internet-connected computers sitting next to development computers, so it's not like you're screwed if you need to Google something. You won't be able to easily copy example code verbatim, however, which is probably actually a good thing

# ? May 12, 2014 23:54

Plorkyeran: Mar 22, 2007; To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Dominoes posted:

Looking for regex help! As a non-matching group in a larger regex, I'm trying to match any number of spaces, or a dash.

(?:\s*|\-) This matches any number of spaces, but returns None for all subsequent matching groups if it finds a -.

(?:\s+|\-) This matches 1 or more spaces or a -. Ie it does what I think it should do, but I really need to match 0 or more spaces.

(?:\s*|\-) When in doubt, try a question mark. Nope - this cuts the results short no matter what it finds.

Edit: figured out a solution: (?:\-|\s*) Ie do the - first... Magic.*?

I don't believe the hyphen needs escaping in this context. (?:-|\s*)

# ? May 15, 2014 11:57

suffix: Jul 27, 2013; Wheeee!

Dominoes posted:

Looking for regex help! As a non-matching group in a larger regex, I'm trying to match any number of spaces, or a dash.

(?:\s*|\-) This matches any number of spaces, but returns None for all subsequent matching groups if it finds a -.

(?:\s+|\-) This matches 1 or more spaces or a -. Ie it does what I think it should do, but I really need to match 0 or more spaces.

(?:\s*|\-) When in doubt, try a question mark. Nope - this cuts the results short no matter what it finds.

Edit: figured out a solution: (?:\-|\s*) Ie do the - first... Magic.*?

(?:\s*|\-) is ambiguous because even when there's a hyphen there are also zero spaces in front of it.

If you're sure that there should never be both spaces and a hyphen, you can use a negative lookahead assertion to only match spaces not followed by a hyphen,

(?:\s*(?!-)|-)

Or else it may be simpler to just match spaces and then an optional hyphen,

(?:\s*-?)
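
To make the ambiguity concrete, here's a small sketch. It assumes the groups after the separator are optional, which is one plausible way to end up with None; the mini-patterns and names are hypothetical stand-ins for the larger regex.

```python
import re

# hypothetical mini-pattern: a letter, the separator group, then an OPTIONAL number group
spaces_first = re.compile(r'N(?:\s*|-)(\d+)?')
dash_first = re.compile(r'N(?:-|\s*)(\d+)?')
lookahead = re.compile(r'N(?:\s*(?!-)|-)(\d+)?')
optional_dash = re.compile(r'N(?:\s*-?)(\d+)?')

# \s* happily matches zero spaces before the "-", so the dash is never consumed
# and the optional group after it matches nothing
broken = spaces_first.match('N-52').group(1)
print(broken)

# the other three variants consume the dash and still handle a space
fixed = [(p.match('N-52').group(1), p.match('N 52').group(1))
         for p in (dash_first, lookahead, optional_dash)]
print(fixed)
```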

# ? May 15, 2014 12:20

ohgodwhat: Aug 6, 2005

So I went from numba 0.11.1 to 0.12.1, and now inplace subtraction is no longer allowed. I wonder if inplace addition with a negative number is allowed? I haven't tried it.

After fixing that, two functions that are quite similar in how they work have diverged in performance. One that used to be slow has gotten a lot faster (10s -> 2 s), and one that used to be quite fast is now a lot slower (0.25 s -> 15 s). I was looking at numbapro but this kind of stuff :psyduck:

I'll have to play with it though, as I imagine there's some way I'm writing my code that is hurting its performance. The only issue is that some of the users are on 0.11 and some are on 0.12. Ugh.

# ? May 15, 2014 13:36

BigRedDot: Mar 6, 2008

ohgodwhat posted:

So I went from numba 0.11.1 to 0.12.1, and now inplace subtraction is no longer allowed. I wonder if inplace addition with a negative number is allowed? I haven't tried it.

After fixing that, two functions that are quite similar in how they work have diverged in performance. One that used to be slow has gotten a lot faster (10s -> 2 s), and one that used to be quite fast is now a lot slower (0.25 s -> 15 s). I was looking at numbapro but this kind of stuff

I'll have to play with it though, as I imagine there's some way I'm writing my code that is hurting its performance. The only issue is that some of the users are on 0.11 and some are on 0.12. Ugh.

I would definitely encourage you to share some specifics on the GH issue tracker: https://github.com/numba/numba

# ? May 15, 2014 21:19

Dominoes: Sep 20, 2007

sharktamer posted:

I know you said you've solved it, but I'm still curious why you need to match zero or more spaces.

I'm matching lat/lon coordinates, input in various formats. ie: N 52 30.5 / N52-30.5 / n 5230.5 / N52-30 30
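
For anyone following along, here's a rough sketch of the idea, not my actual regex. It assumes a hemisphere letter, two-digit degrees, and minutes that may be decimal; COORD and parse_lat are names made up for this example, and a trailing seconds field like the last format is simply ignored.

```python
import re

# Assumptions: hemisphere letter, two-digit degrees, then minutes (optionally decimal),
# separated by a dash, spaces, or nothing at all.
COORD = re.compile(r'([NS])\s*(\d{2})(?:-|\s*)(\d{2}(?:\.\d+)?)', re.IGNORECASE)

def parse_lat(text):
    """Return latitude in decimal degrees, or None if the text does not match."""
    m = COORD.match(text.strip())
    if m is None:
        return None
    hemisphere, degrees, minutes = m.groups()
    value = int(degrees) + float(minutes) / 60.0
    return value if hemisphere.upper() == 'N' else -value

for sample in ("N 52 30.5", "N52-30.5", "n 5230.5"):
    print(sample, parse_lat(sample))
```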

qntm posted:

I don't believe the hyphen needs escaping in this context. (?:-|\s*)

You're right - removed.

suffix posted:

(?:\s*|\-) is ambiguous because even when there's a hyphen there are also zero spaces in front of it.

If you're sure that there should never be both spaces and a hyphen, you can use a negative lookahead assertion to only match spaces not followed by a hyphen,

(?:\s*(?!-)|-)

Or else it may be simpler to just match spaces and then an optional hyphen,

(?:\s*-?)

Thanks for the explanation!

# ? May 15, 2014 22:34

namaste friends: Sep 18, 2004; by Smythe

What do you guys use for templating? Jinja2?

# ? May 15, 2014 23:07

accipter: Sep 12, 2003

Cultural Imperial posted:

What do you guys use for templating? Jinja2?

I typically use Mako, but I would be interested to hear what other people prefer.

# ? May 16, 2014 00:05

Haystack: Jan 23, 2005

I also prefer mako, although I consider Jinja perfectly acceptable as an alternative.

I haven't had a chance to use it, but Plim looks pretty cool, if you want to go down the compile-to-HTML route.

# ? May 16, 2014 03:18

Space Kablooey: May 6, 2009

Cultural Imperial posted:

What do you guys use for templating? Jinja2?

I use Jinja2, but I'm using flask, so.

# ? May 16, 2014 04:11

diddy kongs feet: Dec 11, 2012; wanna lick the dirt out between ur chimp toes

Total programming rookie, working on an assignment in jython and I'm trying to implement input validation for two things at once, kinda stumped. Basically I'm drawing a 'room' on an image and then requesting input location to place a light inside the room. Between taking inputs and placing the light I want to check that the x/y coords are within the image and that the light is actually placed inside the room I've drawn. I can validate both of these independently just fine but can't work out a good economy for doing both appropriately, since my logic so far lets the user pass one check and then fail the first check when passing the second yet continue. I know how to wing it but I'd be repeating a lot of code and that doesn't seem right to me.
Could anyone hook me up with some reading material or just general advice on input validation economy? Struggling with this got me really interested in best practices for things like this.
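
One common shape for this, sketched under some assumptions: run every check in a single function that returns a list of problems, and loop until that list comes back empty, so passing one check can never mask failing the other. The names, the (width, height) image size, and the (left, top, right, bottom) room layout are all made up for illustration.

```python
def validate_light(x, y, image_size, room):
    """Return a list of problems; an empty list means the position is fine.

    image_size is (width, height); room is (left, top, right, bottom).
    Both layouts are assumptions made for this sketch.
    """
    problems = []
    width, height = image_size
    left, top, right, bottom = room
    if not (0 <= x < width and 0 <= y < height):
        problems.append("position is outside the image")
    if not (left <= x <= right and top <= y <= bottom):
        problems.append("position is outside the room")
    return problems

def ask_for_light(read_coords, image_size, room):
    """Keep asking until every check passes in the same round."""
    while True:
        x, y = read_coords()
        problems = validate_light(x, y, image_size, room)
        if not problems:
            return x, y
        print("Try again: " + "; ".join(problems))

# usage with canned answers standing in for real user input
answers = iter([(500, 10), (5, 5), (60, 70)])
position = ask_for_light(lambda: next(answers), (200, 200), (50, 50, 150, 150))
print(position)
```

Passing the input function in as read_coords keeps the prompting separate from the checks, so the validation can be exercised without a user sitting at the keyboard.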

# ? May 16, 2014 08:59
