  • Locked thread
the
Jul 18, 2004

by Cowcaster
Could you explain what exactly (_.strip() for _ in s.split(',')) is doing?

Space Kablooey
May 6, 2009


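For every item ("_", in this case) in the iterable (s.split(',')), call the .strip() method of that item and yield the result. The parentheses make this a generator expression rather than a tuple, so the values are produced lazily as something consumes them.

Ultimately, unpacking it (or passing it to tuple()) gives you values like this: ("115 Berry Lane", "Plano", "TX 24134")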

For more explanation: https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions
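
A minimal sketch of what that expression does, step by step. The input string `s` is an assumption, reconstructed from the result quoted above with some stray spaces added so that .strip() has something to do:

```python
# Assumed input: the thread only shows the stripped result, not the original string.
s = "115 Berry Lane , Plano,  TX 24134 "

# s.split(',') gives a list of raw pieces, still carrying stray whitespace.
raw_pieces = s.split(',')

# The parenthesised form is a generator expression: nothing is stripped yet.
parts = (_.strip() for _ in s.split(','))
kind = type(parts).__name__

# Unpacking consumes the generator and yields the cleaned-up values.
street, city, state_zip = parts

# If an actual tuple is wanted, wrap the same expression in tuple().
as_tuple = tuple(piece.strip() for piece in s.split(','))

print(raw_pieces)
print(kind)
print(street, city, state_zip)
print(as_tuple)
```

Using `_` as the loop variable works, but a descriptive name such as `piece` is usually easier to read, since `_` conventionally means "a value I don't care about".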

sharktamer
Oct 30, 2011

Shark tamer ridiculous
Wouldn't it make sense to use regex in this case? I guess splitting strings would work for such a simple case, but it's always worth considering.

code:
import re
# t is the address string; zip_code avoids shadowing the built-in zip()
street, city, state, zip_code = re.search(r'(.+), (.+), (\w{2}) (\d+)', t).groups()

QuarkJets
Sep 8, 2008

I was always told that it's better to use Python methods rather than regex, but I don't really know why

namaste friends
Sep 18, 2004

by Smythe

QuarkJets posted:

I was always told that it's better to use Python methods rather than regex, but I don't really know why

I've been told that regexps can be extremely computationally expensive. I think it just boils down to using the right tool for the right job.

BannedNewbie
Apr 22, 2003

HOW ARE YOU? -> YOSHI?
FINE, THANK YOU. -> YOSHI.

Cultural Imperial posted:

I've been told that regexps can be extremely computationally expensive. I think it just boils down to using the right tool for the right job.

I've also heard people say to avoid them because they can be difficult for other people to read

namaste friends
Sep 18, 2004

by Smythe

BannedNewbie posted:

I've also heard people say to avoid them because they can be difficult for other people to read

I don't find that's much of a problem with all the web based regexp testers out there.

QuarkJets
Sep 8, 2008

Cultural Imperial posted:

I don't find that's much of a problem with all the web based regexp testers out there.

Yeah, but if they're just Python methods then it won't be necessary to use a web-based regexp tester

And that doesn't help those of us who have to develop in an environment without internet :(

namaste friends
Sep 18, 2004

by Smythe

QuarkJets posted:

Yeah, but if they're just Python methods then it won't be necessary to use a web-based regexp tester

And that doesn't help those of us who have to develop in an environment without internet :(

Hmmm. Can a python method search for a windows SID? Like a string with the form S-1-.....
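
A rough sketch of both approaches for the SID question. It assumes a SID looks like "S-1-" followed by two or more dash-separated runs of digits; the names `SID_PATTERN` and `looks_like_sid` and the sample log line are made up for illustration. String methods can validate a single token well enough, while the regex is the easier way to find a SID embedded in a longer string:

```python
import re

# Assumed shape: S-1-<digits>-<digits>[-<digits>...]
SID_PATTERN = re.compile(r'\bS-1-\d+(?:-\d+)+\b')


def looks_like_sid(token):
    """Check one candidate token using only string methods."""
    parts = token.split('-')
    return (
        len(parts) >= 4
        and parts[0] == 'S'
        and parts[1] == '1'
        and all(part.isdigit() for part in parts[2:])
    )


# Hypothetical log line, not taken from the thread.
log_line = "user=bob sid=S-1-5-21-1004336348-1177238915-682003330-512 action=login"

# Searching inside free text is where the regex earns its keep.
found = SID_PATTERN.findall(log_line)
print(found)

# Validating an already-isolated token works fine without a regex.
print(looks_like_sid("S-1-5-18"))
print(looks_like_sid("S-1-"))
```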

ahmeni
May 1, 2005

It's one continuous form where hardware and software function in perfect unison, creating a new generation of iPhone that's better by any measure.
Grimey Drawer
"Don't use regular expressions because they're hard to read" is such a terrible piece of advice that I can only assume it was cargo-culted from a Slashdot comment from 2003. Regexes are fine, and if they get complicated you can pretty them up with Python's support for verbose regexes via re.VERBOSE.

BeefofAges
Jun 5, 2004

Cry 'Havoc!', and let slip the cows of war.

QuarkJets posted:

And that doesn't help those of us who have to develop in an environment without internet :(

I don't think I would accept a job offer from anywhere where this was the case. It just sounds miserable.

Megaman
May 8, 2004
I didn't read the thread BUT...

QuarkJets posted:

And that doesn't help those of us who have to develop in an environment without internet :(

I'd LOVE to know what this company is, please tell us!!

the
Jul 18, 2004

by Cowcaster
edit Nm answered my own question

the fucked around with this message at 17:57 on May 12, 2014

null gallagher
Jan 1, 2014
e;fb by original poster

good jovi
Dec 11, 2000

'm pro-dickgirl, and I VOTE!

the posted:

Why isn't this working?

str.strip just strips characters off the beginning and end of the string. See: https://docs.python.org/2/library/stdtypes.html#str.strip

Try str.replace, or using re.sub for more complicated cases.
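
The original question was edited out, so this is only a guess at the kind of confusion involved. A small sketch of the difference between the three calls, using a made-up string `raw`:

```python
import re

# Hypothetical input; the code from the original question is not in the thread.
raw = "--TX-24134--"

# str.strip only removes the given characters from the two ends.
stripped = raw.strip('-')

# str.replace removes (or swaps) every occurrence, including the middle ones.
replaced = raw.replace('-', '')

# re.sub handles patterns, e.g. collapsing any run of dashes into one space.
substituted = re.sub(r'-+', ' ', raw).strip()

print(stripped)
print(replaced)
print(substituted)
```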

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

ahmeni posted:

"Don't use regular expressions because they're hard to read" is such a terrible piece of advice that I can only assume it was cargo-culted from a Slashdot comment from 2003. Regexes are fine, and if they get complicated you can pretty them up with Python's support for verbose regexes via re.VERBOSE.

Like many sorts of general advice it breaks down in all sorts of situations.

There are many times where using a string method or two makes your intent more clear than a regex. There are many times where you've got a dozen lines of string methods to do something you could do in a simple regex.
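
On the readability side, here is a sketch of what re.VERBOSE buys you, applied to the address regex from earlier in the thread. The group names and the sample address are assumptions for illustration. Note that in verbose mode literal whitespace in the pattern is ignored, so the spaces have to be written as \s:

```python
import re

# Same idea as the one-line address regex, spread out with comments.
ADDRESS = re.compile(r"""
    (?P<street>.+),\s      # street: everything up to the first comma that works
    (?P<city>.+),\s        # city: everything up to the next comma
    (?P<state>\w{2})\s     # two-letter state code
    (?P<zip_code>\d+)      # ZIP code digits
""", re.VERBOSE)

# Assumed sample input, matching the result quoted earlier in the thread.
match = ADDRESS.search("115 Berry Lane, Plano, TX 24134")
fields = match.groupdict()
print(fields)
```

Whether this is clearer than two string-method calls depends on the case; for a plain comma-separated string, split and strip is probably still the more obvious choice.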

String methods are faster than a regex. I think there are times when a regex is as fast (I can't say I've really run in to any situations where a regex is faster) but I can't come up with any examples off the top of my head. There are even more times where the speed doesn't even matter and you should go with the method that makes your intent the clearest.

Example of string method being 10x faster (compiling the regex in this case makes hardly any difference):

http://pastebin.com/7Jybgfid

Someone is welcome to come up with an example of the reverse. I'm sure such an example exists.
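
For anyone who wants to measure this themselves, a minimal timeit sketch. This is not the pastebin benchmark, just an assumed comparison on the address string from earlier; the numbers will vary by machine and Python version, so no timings are claimed here:

```python
import re
import timeit

text = "115 Berry Lane, Plano, TX 24134"


def with_string_methods():
    # Split on commas, then strip whitespace from each piece.
    return [part.strip() for part in text.split(',')]


def with_regex():
    # Let the regex swallow the whitespace around each comma.
    return re.split(r'\s*,\s*', text)


# Both approaches should agree before comparing their speed.
assert with_string_methods() == with_regex()

string_time = timeit.timeit(with_string_methods, number=20000)
regex_time = timeit.timeit(with_regex, number=20000)

print("string methods: %.4f s" % string_time)
print("regex:          %.4f s" % regex_time)
```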

Thermopyle fucked around with this message at 19:03 on May 12, 2014

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
Basically it all adds up to "use regular expressions when they are the most suitable tool for the job, and don't use them when they are not". Which applies equally to any tool at your disposal.

the
Jul 18, 2004

by Cowcaster
I'm trying to use cssselect to strip the state URLs off this page

I tried doing a straight selection of tables, but that didn't work. Then I tried going in via the div tag and then to the table, but that isn't grabbing anything either:

Python code:
import requests
import lxml.html
import cssselect
import csv

req = requests.get('http://www.publiclibraries.com/')
root = lxml.html.fromstring(req.text)

divs = root.cssselect('div')
div = divs[2]
tables = div.cssselect('table')
I said [2] because in the source it looks like the third div tag is where that table starts, but then:

code:
In [107]: tables
Out[107]: []
I've tried other div[] spots but they don't work either. What am I missing?

edit: I also used this page as a test run for the css selector. I pasted in the entire source code from the page I want, and I wrote 'table' in the second box, and it perfectly selected the states. So why the hell isn't *my* code working?

Second edit: Figured out the problem. They have a <p> tag on Line 50 that is supposed to be a </p> tag on their page. This breaks lxml I think?

the fucked around with this message at 20:54 on May 12, 2014

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

Thermopyle posted:

Example of string method being 10x faster (compiling the regex in this case makes hardly any difference):

For reference, Python always compiles the regex. Around 2.2 it got an internal cache, so there's no reason to use re.compile manually.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Suspicious Dish posted:

For reference, Python always compiles the regex. Around 2.2 it got an internal cache, so there's no reason to use re.compile manually.

Oh, sweet.

I think I might have known that at some point because years ago I used to compile them all the time and now I never do.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Suspicious Dish posted:

For reference, Python always compiles the regex. Around 2.2 it got an internal cache, so there's no reason to use re.compile manually.

Explicitly compiling long-lived regex objects can still be beneficial and there isn't really any downside to doing so.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Plorkyeran posted:

Explicitly compiling long-lived regex objects can still be beneficial

How?

No Safe Word
Feb 26, 2005


Only if you can do it before you know when it will be used at runtime. Like just compile it on initialization and then you don't have to pay the one-time compile cost on first use.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

The cache is not infinite in size (unbounded caches are also known as "memory leaks"), so regex objects that should be long-lived can get bumped from the cache by large numbers of one-shot regexes. Not something that's going to be an issue very often, but more often than never.

Ignoring performance, I also think that precompiling hardcoded regexes results in trivially more readable code, i.e.:

Python code:
import re

FOO_REGEX = re.compile('...')
...
match = FOO_REGEX.match(text)  # "text" rather than "str", which would shadow the built-in

# vs

FOO_REGEX = '...'
...
match = re.match(FOO_REGEX, text)
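
A small sketch of the cache point above: a pattern object you hold on to yourself does not depend on the module-level cache at all. Here re.purge() stands in for "the cache got flushed by lots of one-shot regexes"; the pattern and variable names are placeholders:

```python
import re

# Compiled once and kept alive by our own reference.
NUMBER_REGEX = re.compile(r'\d+')

# Module-level helpers compile implicitly and store the result in re's cache.
implicit = re.match(r'[a-z]+', 'abc')

# Clear the module cache, as if it had been pushed out by other patterns.
re.purge()

# The explicitly compiled object is unaffected and needs no recompilation.
explicit = NUMBER_REGEX.match('42 apples')

# The implicit form still works too, it just has to compile the pattern again.
implicit_again = re.match(r'[a-z]+', 'abc')

print(explicit.group())
print(implicit_again.group())
```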

more like dICK
Feb 15, 2010

This is inevitable.
You can increase the cache size if you're going over 100 regexes (like a big Django site might).

QuarkJets
Sep 8, 2008

BeefofAges posted:

I don't think I would accept a job offer from anywhere where this was the case. It just sounds miserable.

It's not bad at all, and really not that rare, although I wouldn't say that it's common. Usually these kinds of environments will have Internet-connected computers sitting next to development computers, so it's not like you're screwed if you need to Google something. You won't be able to easily copy example code verbatim, however, which is probably actually a good thing

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

more like dICK posted:

You can increase the cache size if you're going over 100 regexes (like a big Django site might).
That's a viable option only if your application has a fixed number of regexes it uses and doesn't run user-supplied regexes or the like. Of course, that does describe most applications.

Maluco Marinero
Jan 18, 2001

Damn that's a
fine elephant.
So last night I made a thing, potentially dumb, potentially not.

https://bitbucket.org/MalucoMarinero/cellacceptance

Takes a running instance of LibreOffice, loads a spreadsheet and then inputs data and reads results. It's a way you can run generative tests on complex calculations, and test for correctness using a client-vetted spreadsheet. Bit of a Rube Goldberg machine but maybe it'll help someone. Babby's first Python module so potentially badly packaged.

Dominoes
Sep 20, 2007

Looking for regex help! As a non-matching group in a larger regex, I'm trying to match any number of spaces, or a dash.


(?:\s*|\-) This matches any number of spaces, but returns None for all subsequent matching groups if it finds a -.

(?:\s+|\-) This matches 1 or more spaces or a -. Ie it does what I think it should do, but I really need to match 0 or more spaces.

(?:\s*|\-) When in doubt, try a question mark. Nope - this cuts the results short no matter what it finds.

Edit: figured out a solution: (?:\-|\s*) Ie do the - first... Magic.*?

Dominoes fucked around with this message at 10:24 on May 15, 2014

sharktamer
Oct 30, 2011

Shark tamer ridiculous
I know you said you've solved it, but I'm still curious why you need to match zero or more spaces.

qntm
Jun 17, 2009

Dominoes posted:

Looking for regex help! As a non-matching group in a larger regex, I'm trying to match any number of spaces, or a dash.


(?:\s*|\-) This matches any number of spaces, but returns None for all subsequent matching groups if it finds a -.

(?:\s+|\-) This matches 1 or more spaces or a -. Ie it does what I think it should do, but I really need to match 0 or more spaces.

(?:\s*|\-) When in doubt, try a question mark. Nope - this cuts the results short no matter what it finds.

Edit: figured out a solution: (?:\-|\s*) Ie do the - first... Magic.*?

I don't believe the hyphen needs escaping in this context. (?:-|\s*)

suffix
Jul 27, 2013

Wheeee!

Dominoes posted:

Looking for regex help! As a non-matching group in a larger regex, I'm trying to match any number of spaces, or a dash.


(?:\s*|\-) This matches any number of spaces, but returns None for all subsequent matching groups if it finds a -.

(?:\s+|\-) This matches 1 or more spaces or a -. Ie it does what I think it should do, but I really need to match 0 or more spaces.

(?:\s*|\-) When in doubt, try a question mark. Nope - this cuts the results short no matter what it finds.

Edit: figured out a solution: (?:\-|\s*) Ie do the - first... Magic.*?

(?:\s*|\-) is ambiguous because even when there's a hyphen there are also zero spaces in front of it.

If you're sure that there should never be both spaces and a hyphen, you can use a negative lookahead assertion to only match spaces not followed by a hyphen,

(?:\s*(?!-)|-)

Or else it may be simpler to just match spaces and then an optional hyphen,

(?:\s*-?)
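
A sketch that reproduces the behaviour described above. It assumes the group that follows the separator is optional, which is what lets the whole match succeed while that group comes back as None; the surrounding pattern is a simplified stand-in, not the actual lat/lon regex:

```python
import re

samples = ["N 52", "N-52", "N52"]

# Spaces tried first: zero spaces "succeeds" in front of a hyphen.
spaces_first = re.compile(r'([NS])(?:\s*|-)(\d+)?')
# Hyphen tried first: only falls back to spaces when there is no hyphen.
dash_first = re.compile(r'([NS])(?:-|\s*)(\d+)?')
# Spaces, then an optional hyphen.
optional_dash = re.compile(r'([NS])(?:\s*-?)(\d+)?')

results = {}
for name, pattern in [
    ("spaces_first", spaces_first),
    ("dash_first", dash_first),
    ("optional_dash", optional_dash),
]:
    results[name] = [pattern.match(text).groups() for text in samples]
    print(name, results[name])
```

With spaces_first, the "N-52" sample matches zero spaces, the optional digit group then matches nothing, and the regex engine has no reason to backtrack because the overall match already succeeded.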

ohgodwhat
Aug 6, 2005

So I went from numba 0.11.1 to 0.12.1, and now inplace subtraction is no longer allowed. I wonder if inplace addition with a negative number is allowed? I haven't tried it.

After fixing that, two functions that are quite similar in how they work have diverged in performance. One that used to be slow has gotten a lot faster (10s -> 2 s), and one that used to be quite fast is now a lot slower (0.25 s -> 15 s). I was looking at numbapro but this kind of stuff :psyduck:

I'll have to play with it though, as I imagine there's some way I'm writing my code that is hurting its performance. The only issue is that some of the users are on 0.11 and some are on 0.12. Ugh.

BigRedDot
Mar 6, 2008

ohgodwhat posted:

So I went from numba 0.11.1 to 0.12.1, and now inplace subtraction is no longer allowed. I wonder if inplace addition with a negative number is allowed? I haven't tried it.

After fixing that, two functions that are quite similar in how they work have diverged in performance. One that used to be slow has gotten a lot faster (10s -> 2 s), and one that used to be quite fast is now a lot slower (0.25 s -> 15 s). I was looking at numbapro but this kind of stuff :psyduck:

I'll have to play with it though, as I imagine there's some way I'm writing my code that is hurting its performance. The only issue is that some of the users are on 0.11 and some are on 0.12. Ugh.

I would definitely encourage you to share some specifics on the GH issue tracker: https://github.com/numba/numba

Dominoes
Sep 20, 2007

sharktamer posted:

I know you said you've solved it, but I'm still curious why you need to match zero or more spaces.
I'm matching lat/lon coordinates, input in various formats. ie: N 52 30.5 / N52-30.5 / n 5230.5 / N52-30 30

qntm posted:

I don't believe the hyphen needs escaping in this context. (?:-|\s*)
You're right - removed.

suffix posted:

(?:\s*|\-) is ambiguous because even when there's a hyphen there are also zero spaces in front of it.

If you're sure that there should never be both spaces and a hyphen, you can use a negative lookahead assertion to only match spaces not followed by a hyphen,

(?:\s*(?!-)|-)

Or else it may be simpler to just match spaces and then an optional hyphen,

(?:\s*-?)
Thanks for the explanation!

namaste friends
Sep 18, 2004

by Smythe
What do you guys use for templating? Jinja2?

accipter
Sep 12, 2003

Cultural Imperial posted:

What do you guys use for templating? Jinja2?

I typically use Mako, but I would be interested to hear what other people prefer.

Haystack
Jan 23, 2005





I also prefer mako, although I consider Jinja perfectly acceptable as an alternative.

I haven't had a chance to use it, but Plim looks pretty cool, if you want to go down the compile-to-HTML route.

Space Kablooey
May 6, 2009


Cultural Imperial posted:

What do you guys use for templating? Jinja2?

I use Jinja2, but I'm using flask, so.

diddy kongs feet
Dec 11, 2012

wanna lick the dirt out between ur chimp toes
Total programming rookie, working on an assignment in jython and I'm trying to implement input validation for two things at once, kinda stumped. Basically I'm drawing a 'room' on an image and then requesting input location to place a light inside the room. Between taking inputs and placing the light I want to check that the x/y coords are within the image and that the light is actually placed inside the room I've drawn. I can validate both of these independently just fine but can't work out a good economy for doing both appropriately, since my logic so far lets the user pass one check and then fail the first check when passing the second yet continue. I know how to wing it but I'd be repeating a lot of code and that doesn't seem right to me.
Could anyone hook me up with some reading material or just general advice on input validation economy? Struggling with this got me really interested in best practices for things like this.
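
One common way to avoid repeating code here is to put each check in its own small function and run both inside a single loop, so every new input goes through all the checks from the top. This is only a sketch under assumptions: the room is taken to be a rectangle given as (left, top, right, bottom), and all the function names and the `read_point` callback are made up. It sticks to syntax that should also run under Jython:

```python
def in_image(x, y, width, height):
    """True if the point lies inside the image bounds."""
    return 0 <= x < width and 0 <= y < height


def in_room(x, y, room):
    """True if the point lies strictly inside the rectangular room."""
    left, top, right, bottom = room
    return left < x < right and top < y < bottom


def ask_light_position(read_point, width, height, room):
    """Keep asking until one point passes every check."""
    while True:
        x, y = read_point()
        if not in_image(x, y, width, height):
            print("That point is outside the image, try again.")
            continue  # start over, so the next input is re-checked from the top
        if not in_room(x, y, room):
            print("That point is outside the room, try again.")
            continue
        return x, y


# Simulated user input so the sketch runs without prompting anyone.
attempts = iter([(500, 10), (5, 5), (50, 60)])
position = ask_light_position(lambda: next(attempts), 200, 200, (20, 20, 120, 120))
print(position)
```

Because both checks live in the same loop, a point that passes the first check and fails the second can never slip through: the user is asked again and the new point is validated against both.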
