Reformed Pissboy
Nov 6, 2003

Modern Pragmatist posted:

I have 10,000 objects or so and I basically want to group them into categories based on the value of a specific property. The property that I care about is a list of 6 floats. My initial thought on how to approach this was to create a dict() where the keys are the 6 element list, but this isn't possible. What is the best approach for this?

code:
a.prop = [0, 1.1, 2.2, 3.3, 4.4, 5.5]
b.prop = [5.5, 4.4, 3.3, 2.2, 1.1, 0]
c.prop = [0, 1.1, 2.2, 3.3, 4.4, 5.5]

# ideal result:
[[a,c],[b,]]

itertools.groupby can hook you up if you want to get fancy.

code:
import itertools

# need to sort objects by how you want to group them -- input order matters to groupby
keyfunction = lambda item: item.prop
my_objects = sorted( [a,b,c], key=keyfunction )
final_grouping = []
for key, group in itertools.groupby( my_objects, keyfunction ):
    # group is a one-shot iterator: turn it into a list before looping,
    # otherwise it is already exhausted by the time we append it
    members = list(group)
    print "Objects with prop:", key
    for obj in members:
        print "\t", obj
    print
    final_grouping.append( members )
# final_grouping = [[a,c], [b]]
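
For the dict approach the question started with: lists can't be dict keys because they are unhashable, but a tuple holding the same values can. A minimal sketch, written for Python 3 and assuming each object exposes `prop` as in the question; the `Thing` class and its `name` field are made up purely for illustration:

```python
from collections import defaultdict


class Thing(object):
    """Hypothetical stand-in for the poster's objects."""

    def __init__(self, name, prop):
        self.name = name
        self.prop = prop

    def __repr__(self):
        return self.name


def group_by_prop(objects):
    """Group objects whose .prop lists are element-wise equal."""
    groups = defaultdict(list)
    for obj in objects:
        # Lists are unhashable; tuple(obj.prop) is a hashable copy of the values.
        groups[tuple(obj.prop)].append(obj)
    return list(groups.values())


a = Thing("a", [0, 1.1, 2.2, 3.3, 4.4, 5.5])
b = Thing("b", [5.5, 4.4, 3.3, 2.2, 1.1, 0])
c = Thing("c", [0, 1.1, 2.2, 3.3, 4.4, 5.5])

grouped = group_by_prop([a, b, c])
```

This makes one pass and needs no sort. The caveat is that floats are compared exactly, so values that differ only by rounding error land in different groups.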

RobotEmpire
Dec 8, 2007
Over the next week or two I will be going to some technical interviews for a few web startups. I am focusing my preparation a bit on the "gotchas" or unexpected (but documented and proper) behavior that Python can exhibit in some cases, e.g. scope & closure issues.

I've been reviewing some of the links/comments at this StackOverflow question which has been good. I'm looking, however, for more ways to prepare. More articles to read, good StackOverflow questions, and so forth. I do not expect that these interviews will consist of super-trivial questions like FizzBuzz, nor do I expect that they'll consist of in-depth, hardcore CS theory questions. I do expect they'll consist of challenging questions/programming problems involving Python in general, as well as web development in general.

Does anyone have experience interviewing at established/well-funded web-centric startups (especially in the Bay Area)? I'm definitely not looking for "cheats" but any light you might shine on the best way to prepare would be appreciated. What topics I ought to spend the most time studying based on your experience, and so forth. I should say that I am definitely not an expert at anything. I have written some code in my life and do okay with it, but I'd not consider myself an expert in any domain of knowledge. I am pretty proficient with Python and understand in general how the web works; generally proficient with command line Linux; can provision servers; can perform CRUD-related tasks with pg, mysql, etc., etc. All the very basic things you'd expect someone on your team to be able to perform, but an expert at none. Would love to get some insight while prepping for these very important (to me) interviews.
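
One commonly cited instance of the scope and closure gotchas mentioned at the top of this post is late binding: a function created in a loop looks up the loop variable when it is called, not when it is defined. A small sketch with illustrative names:

```python
# Each lambda closes over the variable i itself, not its value at creation time.
late = [lambda: i for i in range(3)]
late_results = [f() for f in late]      # [2, 2, 2]: every call sees the final i

# Binding the current value as a default argument freezes it per function.
early = [lambda i=i: i for i in range(3)]
early_results = [f() for f in early]    # [0, 1, 2]
```

Being able to explain why the first list is all 2s is probably worth more in an interview than memorizing the workaround.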

Lurchington
Jan 2, 2003

Forums Dragoon
I found that it's relevant and important to see if candidates understand the differences between threads and processes, as part of a good grounding in concurrency:
http://jessenoller.com/code/pycon_jnoller_multiprocessing.pdf
It's definitely something that web-centric groups are going to find useful and kind of a gotcha.

I'm on the other end of the country from where you're looking (DC), but we also like seeing an understanding of the core CS concepts that CPython is abstracting:
- What do some of the python primitives look like (e.g. a list is an array of pointers to PyObject)
- How does python do garbage collection
- Sockets/SSL in a general sense, since you're looking at web-centric companies

So it's not hardcore CS, but a level beyond a Python Module of the Week (http://www.doughellmann.com/PyMOTW/ which is excellent) in terms of understanding computing versus Python specifically.

For us it's also really important to get an idea of where you're at on software testing (unittests/systems/selenium if they do that) and sort of understanding the systems engineering side of things, but we're small enough to force everyone to be proficient in a couple of disciplines. If this seems useful, and your company is still using python 2.6 and before (not unheard of) understanding and pimping unittest2 (python2.7/3.2 stdlib unittest module backported) is worthwhile and could make a good impression:
http://www.voidspace.org.uk/python/articles/unittest2.shtml

Johnny Cache Hit
Oct 17, 2011

RobotEmpire posted:

Does anyone have experience interviewing at established/well-funded web-centric startups (especially in the Bay Area)? I'm definitely not looking for "cheats" but any light you might shine on the best way to prepare would be appreciated. What topics I ought to spend the most time studying based on your experience, and so forth. I should say that I am definitely not an expert at anything. I have written some code in my life and do okay with it, but I'd not consider myself an expert in any domain of knowledge. I am pretty proficient with Python and understand in general how the web works; generally proficient with command line Linux; can provision servers; can perform CRUD-related tasks with pg, mysql, etc., etc. All the very basic things you'd expect someone on your team to be able to perform, but an expert at none. Would love to get some insight while prepping for these very important (to me) interviews.

Note that my advice may not extend to Bay Area startups, but it's served me well so far, and it seems to hold true to what I've heard from friends who have interviewed at your type of company.

From what I've seen and heard, the face to face interview is the least likely point at which you will land a job -- the employer has already reviewed your resume, which should have enough information to convince them that you are worth hiring.

Do you have an up to date personal webpage? If not, get one set up. It doesn't have to be a blog or anything; a tasteful presentation of your resume with clean CSS and HTML is more than enough. Even more important: your webpage and your resume must direct people to your Github. If you don't have a Github account, start programming now. Find an open source project that interests you and contribute. I've talked with a handful of people who say that being able to see this body of work is the #1 thing they look for.

That aside, back to your question: the Python wiki has a good listing of documented Python warts (http://wiki.python.org/moin/PythonWarts), but I'd spend my time focusing on algorithms and techniques instead. If you know enough to say "these warts are annoying because I run into them when I do {x, y, z}", you'll probably know more than enough.

Also, I'd recommend reviewing Steve Yegge's article on 5 essential phone screen questions and ensure you aren't missing anything big there.

Beyond that, be sure you know something about the nuts n' bolts of software development -- unit testing, version control systems, how to write a good bug report, etc.

Good luck :cheers:

RobotEmpire
Dec 8, 2007

Lurchington posted:

I found that it's relevant and important to see if candidates understand the differences between threads and processes, as part of a good grounding in concurrency:
http://jessenoller.com/code/pycon_jnoller_multiprocessing.pdf
It's definitely something that web-centric groups are going to find useful and kind of a gotcha.

Excellent, I was looking for some beefing up on this exact topic.

quote:

I'm on the other end of the country from where you're looking (DC), but we also like seeing an understanding of the core CS concepts that CPython is abstracting:
- What do some of the python primitives look like (e.g. a list is an array of pointers to PyObject)
- How does python do garbage collection
- Sockets/SSL in a general sense, since you're looking at web-centric companies

Just out of curiosity, why do you ask about something as low-level as pointers for web dev? Smoke test for candidates?

quote:

So it's not hardcore CS, but a level beyond a Python Module of the Week (http://www.doughellmann.com/PyMOTW/ which is excellent) in terms of understanding computing versus Python specifically.

This is important to me but sadly the area in which I lag the most. I'm addressing it, though. :)

quote:

For us it's also really important to get an idea of where you're at on software testing (unittests/systems/selenium if they do that) and sort of understanding the systems engineering side of things, but we're small enough to force everyone to be proficient in a couple of disciplines. If this seems useful, and your company is still using python 2.6 and before (not unheard of) understanding and pimping unittest2 (python2.7/3.2 stdlib unittest module backported) is worthwhile and could make a good impression:
http://www.voidspace.org.uk/python/articles/unittest2.shtml

Yeah we're still on 2.6 at work but I've been using 2.7 for a long time. I've got bare proficiency with unittest2. That is, I can write tests, and I know when to write them in the most common cases. But I definitely am not at the point where I write tests for every component I write. I'm much more likely to drop into pdb to troubleshoot when there's a problem which is... reactive. Gap in my experience/knowledge when it comes to unit tests, for sure.

--

Kim Jong III posted:

From what I've seen and heard, the face to face interview is the least likely point at which you will land a job -- the employer has already reviewed your resume, which should have enough information to convince them that you are worth hiring.

Do you have an up to date personal webpage? If not, get one set up. It doesn't have to be a blog or anything; a tasteful presentation of your resume with clean CSS and HTML is more than enough. Even more important: your webpage and your resume must direct people to your Github. If you don't have a Github account, start programming now. Find an open source project that interests you and contribute. I've talked with a handful of people who say that being able to see this body of work is the #1 thing they look for.

These are all phone cons with CTOs, senior devs, etc. I'm not in the Bay Area. More to your point, my resume is light; I've gotten these interviews more on the strength of my personal projects (on github) and a friend here and there putting in a word (based on the work they've seen me do).

quote:

That aside, back to your question: the Python wiki has a good listing of documented Python warts (http://wiki.python.org/moin/PythonWarts), but I'd spend my time focusing on algorithms and techniques instead. If you know enough to say "these warts are annoying because I run into them when I do {x, y, z}", you'll probably know more than enough.

Also, I'd recommend reviewing Steve Yegge's article on 5 essential phone screen questions and ensure you aren't missing anything big there.

Beyond that, be sure you know something about the nuts n' bolts of software development -- unit testing, version control systems, how to write a good bug report, etc.

Good luck :cheers:

Thanks. I've read Steve Yegge's article as well as Joel Spolsky's article on phone interviews as well. Frankly I've only been programming seriously for a year or so; professionally less than that. The thing I'm most confident in is analysis & breaking a problem down to addressable components, then explaining that. Less so low-level computing knowledge.

RobotEmpire fucked around with this message at 04:07 on Nov 13, 2011

Johnny Cache Hit
Oct 17, 2011

RobotEmpire posted:

These are all phone cons with CTOs, senior devs, etc. I'm not in the Bay Area. More to your point, my resume is light; I've gotten these interviews more on the strength of my personal projects (on github) and a friend here and there putting in a word (based on the work they've seen me do).

Thanks. I've read Steve Yegge's article as well as Joel Spolsky's article on phone interviews as well. Frankly I've only been programming seriously for a year or so; professionally less than that. The thing I'm most confident in is analysis & breaking a problem down to addressable components, then explaining that. Less so low-level computing knowledge.

I forgot to mention that the #1 thing you need to do to get hired is to get a reference from someone that already works there :c00l:

Did you get a degree in CS?

Based on your experience, reviewers should be interviewing you for an entry level programming position. If you can decompose a problem well, you think like a programmer, and you should do pretty well in an interview. Nobody sane is going to expect you to come in and immediately start coding at a high level.

Your sysadmin skills are helpful icing on the cake.

Lurchington
Jan 2, 2003

Forums Dragoon

RobotEmpire posted:


Just out of curiosity, why do you ask about something as low-level as pointers for web dev? Smoke test for candidates?


Mainly for the reasons outlined in this video:
http://blip.tv/pycon-us-videos-2009-2010-2011/abstraction-as-leverage-1966769

which references this article which I haven't personally read:
http://www.joelonsoftware.com/articles/LeakyAbstractions.html

The idea is that writing effective code requires understanding a couple of levels below an abstraction.

The array example is relevant for web programmers when you consider that you probably don't want to make a list of every record in your database to pass around, since that'd require the objects to be placed in memory and that'd be a big hit. You'd instead do some sort of lazy evaluation.

For someone who understands what's happening underneath python's abstractions, it's intuitive why you wouldn't want to make a list. But for someone who just uses python, they'd just have to remember that in some situations (it's not always clear) you'd want to avoid making a list.

Similarly, understanding how Python handles arrays makes it easier to remember the differences with other technologies. The fact that Python lists are arrays of pointers means that you can have different types of elements in a list, which wouldn't necessarily be true for other tools you may need day to day.
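
Both points fit in a few lines. A rough sketch for Python 3, where `fetch_records` is a made-up stand-in for a database cursor rather than any real API: a list stores references, so its elements can be of mixed types, while a generator hands out records one at a time instead of holding them all in memory.

```python
import itertools


def fetch_records(count):
    """Hypothetical record source: yields one record at a time (lazy)."""
    for record_id in range(count):
        yield {"id": record_id}


# A list stores references to objects, so element types can differ freely.
mixed = [1, "two", 3.0, None]
element_types = [type(item).__name__ for item in mixed]

# Lazy: only the records actually consumed are ever created.
first_three = list(itertools.islice(fetch_records(10 ** 9), 3))

# Eager equivalent (don't run this): list(fetch_records(10 ** 9)) would try to
# build a billion dicts in memory before you could touch the first one.
```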

RobotEmpire
Dec 8, 2007

Kim Jong III posted:

I forgot to mention that the #1 thing you need to do to get hired is to get a reference from someone that already works there :c00l:

Did you get a degree in CS?

Based on your experience, reviewers should be interviewing you for an entry level programming position. If you can decompose a problem well, you think like a programmer, and you should do pretty well in an interview. Nobody sane is going to expect you to come in and immediately start coding at a high level.

Your sysadmin skills are helpful icing on the cake.

Yeah I expect entry-level if I get hired out there. It will be an income hit, since I get paid very well where I am. It would be considered a good starting salary in San Francisco, but here it is definitely baller money. I will take a decrease in purchasing power (I predict a small bump in cash salary which will = big drop in purchasing power due to COL considerations) for the privilege of being surrounded by much smarter people. I've had the misfortune to witness already in my short career the

I don't have a degree in CS. I've been attending classes for about a year and a half on a part-time basis for it. Everything I've learned I've taught myself through some pretty intense self-study. I have worked hard to not identify/pigeonhole/limit myself as "a Python programmer" but as a general-purpose developer. I'm definitely not the typical start-up candidate, a few years older with a much different experience set. That can be a positive and a negative in a few ways :)

Thanks for the words of advice & assurance.

duck monster
Dec 15, 2004

RobotEmpire posted:

Over the next week or two I will be going to some technical interviews for a few web startups. I am focusing my preparation a bit on the "gotchas" or unexpected (but documented and proper) behavior that Python can exhibit in some cases, e.g. scope & closure issues.

I've been reviewing some of the links/comments at this StackOverflow question which has been good. I'm looking, however, for more ways to prepare. More articles to read, good StackOverflow questions, and so forth. I do not expect that these interviews will consist of super-trivial questions like FizzBuzz, nor do I expect that they'll consist of in-depth, hardcore CS theory questions. I do expect they'll consist of challenging questions/programming problems involving Python in general, as well as web development in general.

Does anyone have experience interviewing at established/well-funded web-centric startups (especially in the Bay Area)? I'm definitely not looking for "cheats" but any light you might shine on the best way to prepare would be appreciated. What topics I ought to spend the most time studying based on your experience, and so forth. I should say that I am definitely not an expert at anything. I have written some code in my life and do okay with it, but I'd not consider myself an expert in any domain of knowledge. I am pretty proficient with Python and understand in general how the web works; generally proficient with command line Linux; can provision servers; can perform CRUD-related tasks with pg, mysql, etc., etc. All the very basic things you'd expect someone on your team to be able to perform, but an expert at none. Would love to get some insight while prepping for these very important (to me) interviews.

It depends on what you're interviewing for. If you're going for a straight out code monkey job, then I guess you want to be able to display flexibility and the ability to learn fast on the job. If you're going for something a bit higher up then you need to be displaying people skills and an architectural rather than detail focus on software design.

Like at my stage in life, if I go to a job interview and they want to test me on code semantics, I'll usually tell them they are interviewing the wrong guy (and if they really are looking for someone at my level, I'd possibly be concerned that the company doesn't have its poo poo together or doesn't understand what I do), since I'm usually doing project management and business analysis where an understanding of the "big picture" is more important than the ability to be a kick-rear end regex wrangler. But ten years ago, that would have been pretty important for ascertaining what my skills were.

Nonetheless, it doesn't hurt to have an understanding of broader software life-cycle issues, like the various agile buzz-word concepts, keeping read up on software lifecycle concepts, and unit testing, CI, etc etc etc. And it certainly helps, but it's probably something you can only learn on the job, developing a skill at getting and keeping clients etc. I guess the secret is "be confident".

I don't know if there's much of a short cut for demonstrating versatile coding skills except for writing code, I'm afraid.

duck monster
Dec 15, 2004

Lurchington posted:

Mainly for the reasons outlined in this video:
http://blip.tv/pycon-us-videos-2009-2010-2011/abstraction-as-leverage-1966769

which references this article which I haven't personally read:
http://www.joelonsoftware.com/articles/LeakyAbstractions.html

The idea is that writing effective code requires understanding a couple of levels below an abstraction.

The array example is relevant for web programmers when you consider that you probably don't want to make a list of every record in your database to pass around, since that'd require the objects to be placed in memory and that'd be a big hit. You'd instead do some sort of lazy evaluation.

For someone who understands what's happening underneath python's abstractions, it's intuitive why you wouldn't want to make a list. But for someone who just uses python, they'd just have to remember that in some situations (it's not always clear) you'd want to avoid making a list.

Similarly, understanding how Python handles arrays makes it easier to remember the differences with other technologies. The fact that Python lists are arrays of pointers means that you can have different types of elements in a list, which wouldn't necessarily be true for other tools you may need day to day.

Actually there's an interesting point made here. I used to do a lot of network coding when I was doing embedded stuff. Usually just taking a network library and using it as intended. But sometimes I'd get weird behavior I simply didn't really understand or know how to fix. One day I was talking to our network guy about it (who didn't know a thing about coding) and he suggested we sit down with some packet capture software and see what it was doing. Turns out I was really not using TCP properly in a fairly fundamental way, meaning it was only *sort* of working, and as he explained what all the different fields in the packet meant and how the TCP handshake process worked, it dawned on us the actual library was doing some really weird poo poo at a low level that meant I needed to get in at a lower level and fix some poo poo up to repair its absurd abuse of packet ordering fields (this was on embedded hardware with a proprietary OS). It was probably the most important lesson in network coding I had ever had, and it was from a non-coder, because it taught me what was happening behind the abstraction.

FoiledAgain
May 6, 2007

I have a project too large to do by hand, so I want to do it with Python and I'm looking for some help on where to start. There is a database of languages here, and I want to be able to verify, for each language, whether or not there exists a non-stub Wikipedia article about that language. There are some further complications too (like some languages have many names) but that's the basic need.

I've never used Python for web stuff before, so I'm not sure where to begin. In fact I've never done any web stuff before, but as I say it's too big to do by hand, so I'm willing to put in time to program it.

Johnny Cache Hit
Oct 17, 2011

FoiledAgain posted:

I have a project too large to do by hand, so I want to do it with Python and I'm looking for some help on where to start. There is a database of languages here, and I want to be able to verify, for each language, whether or not there exists a non-stub Wikipedia article about that language. There are some further complications too (like some languages have many names) but that's the basic need.

I've never used Python for web stuff before, so I'm not sure where to begin. In fact I've never done any web stuff before, but as I say it's too big to do by hand, so I'm willing to put in time to program it.

A few thoughts:

If you don't have access to the raw language database, you'll have to do screen scraping on your database of languages. I've used beautifulsoup for this before and it is absolutely awesome.

Alternatively, it might be even faster to just copy the language names over and strip them down in your editor -- this is a 5-10 minute Emacs job at most with the right macros, and you won't have to fool around with screen scraping. In general, if you're going to do this more than once, you might want to code it.

Once you've got a list of actual language names extracted from your database, you will have to fetch the pages from Wikipedia. urllib2 will work well for this.

You've got three cases I can see:

1. The language doesn't have any article on Wikipedia -- you're lucky here, because if you request an article that does not exist on Wikipedia you will actually get a 404 back (just checked with curl). urllib2 raises an HTTPError (a subclass of URLError) for that, so just catch it, be sure its code is 404, and you can mark that language as a no.

2. The language has a stub-only article. You can probably get away with just searching through the returned HTML for "article is a stub. You can help Wikipedia by expanding it." -- I think most Wikipedia stub templates have that verbiage.

3. The language has a full Wikipedia article. If it's not 1 nor 2, it's 3, so mark that language as a yes :)

This is quite literally the quick and dirty way of finishing your project. There might be cleaner ways to do this, but just parsing a bunch of strings is easy :shobon:

Just don't be an rear end -- throttle your bot to be sure it doesn't request all ten thousand pages as fast as it can.
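
The three cases plus the throttle could be sketched roughly as below. This is written for Python 3, where what urllib2 offered lives in `urllib.request` and `urllib.error`. Everything named here is an assumption for illustration: the URL prefix, the placeholder User-Agent string (Wikimedia asks bots to send a descriptive one), the one-second delay, and the crude tag stripping, which is there because the stub sentence in the real markup may have link tags in the middle of it.

```python
import re
import time
import urllib.error
import urllib.parse
import urllib.request

# Phrase quoted in the post; assumed to appear in most stub templates.
STUB_MARKER = "article is a stub. You can help Wikipedia by expanding it."
BASE_URL = "https://en.wikipedia.org/wiki/"               # assumed URL prefix
USER_AGENT = "language-check-sketch/0.1 (you@example.org)"  # placeholder


def classify(status_code, html):
    """Map an HTTP status and page body onto the three cases."""
    if status_code == 404:
        return "missing"
    text = re.sub(r"<[^>]+>", "", html)  # crude tag strip, not a real parser
    if STUB_MARKER in text:
        return "stub"
    return "full"


def check_language(name):
    """Fetch one article and classify it (performs a real network request)."""
    url = BASE_URL + urllib.parse.quote(name.replace(" ", "_"))
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request) as response:
            html = response.read().decode("utf-8", "replace")
            return classify(response.getcode(), html)
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return classify(404, "")
        raise  # anything other than "not found" is a real problem


def check_all(names, delay_seconds=1.0):
    """Check each name in turn, sleeping between requests to throttle the bot."""
    results = {}
    for name in names:
        results[name] = check_language(name)
        time.sleep(delay_seconds)
    return results
```

Keeping `classify` free of network calls means you can try it on saved pages before pointing the bot at the live site.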

FoiledAgain
May 6, 2007

Kim Jong III posted:

A few thoughts:

Thanks! This is very helpful.

quote:

Just don't be an rear end -- throttle your bot to be sure it doesn't request all ten thousand pages as fast as it can.

Good to know! This might not be something I would think of.

maskenfreiheit
Dec 30, 2004
Edit: Double Post

maskenfreiheit fucked around with this message at 20:36 on Mar 13, 2017

NadaTooma
Aug 24, 2004

The good thing is that everyone around you has more critical failures in combat, the bad thing is - so do you!

GregNorc posted:

I'm not really sure how I can fix that, does anyone have any ideas?

It looks like the code wants "f_exp" to be a list of expected frequencies, the same length as the "f_obj" list. Try this for the last three lines of the code:

code:
observed_freqs = range(25)
expected_freqs = [25/2.0 for x in observed_freqs]
print lchisquare(observed_freqs, expected_freqs)
You're passing in "f_exp" as 25/2, which under Python 2.x, rounds down to an integer value of 12, and integers of course can't be indexed, hence the crash on line 15. Unfortunately, the pasted code isn't a complete code sample since "chisqprob" isn't defined, so I can't test this for you.

Are you intending 25/2 to evaluate to a float rather than an int? Are you running this under Python 2 or 3?

A couple of notes on style. This code clobbers the "list" built-in on line 19, and you don't want to do that. Also, the "f_obj" and "f_exp" in the function are horrible choices for variable names. Something like "observed_freqs" and "expected_freqs" would be a lot cleaner.

(Edit: Fixed my answer since I read the code wrong the first time.)
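
To make the integer-division point concrete, here is a small sketch written for Python 3, where `/` is always true division and `//` reproduces what Python 2 does for `25/2` with two ints. The `chi_square_statistic` helper is only an illustration of the textbook formula, not the `lchisquare` function from the pasted code, and the numbers simply mirror the ones in the post.

```python
def chi_square_statistic(observed_freqs, expected_freqs):
    """Pearson's statistic: sum of (observed - expected)^2 / expected."""
    if len(observed_freqs) != len(expected_freqs):
        raise ValueError("observed and expected must be the same length")
    return sum((o - e) ** 2 / float(e)
               for o, e in zip(observed_freqs, expected_freqs))


floor_result = 25 // 2    # 12, what Python 2 gives for 25/2
true_result = 25 / 2.0    # 12.5 in both Python 2 and Python 3

observed_freqs = list(range(25))
expected_freqs = [25 / 2.0 for _ in observed_freqs]
statistic = chi_square_statistic(observed_freqs, expected_freqs)
```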

NadaTooma fucked around with this message at 01:57 on Nov 16, 2011

maskenfreiheit
Dec 30, 2004
Edit: doublepost

maskenfreiheit fucked around with this message at 01:24 on Mar 13, 2017

Johnny Cache Hit
Oct 17, 2011

GregNorc posted:

I'm on 2.6.1

I'm afraid to modify anything too much since it's doing an analysis I don't quite fully understand the inner workings of. What would I wanna change, taking into account 2.6.1?

Nothing makes a mathematician :gonk: more than hearing someone trying to apply statistics while admitting uncertainty. You sure chi-squared is the right test?

I know it's not Python, but if you want to do statistical calculations, look at R. It does chi-squared without you having to muck around with writing the function, and has just about everything baked in that you would need.

NadaTooma
Aug 24, 2004

The good thing is that everyone around you has more critical failures in combat, the bad thing is - so do you!

GregNorc posted:

I'm afraid to modify anything too much since it's doing an analysis I don't quite fully understand the inner workings of. What would I wanna change, taking into account 2.6.1?

I'm not a Statistics guy, so all I can tell you is that the cut-and-pasted code is broken, not what it *should* be doing. Kim Jong Ill is right in that you're better off using something proven, rather than borrowing code where you don't fully understand how it should behave. I wish there was a nicer-sounding way to phrase that, but you get the idea.

fritz
Jul 26, 2003

GregNorc posted:

So I found some code online to do a chi-squared analysis, but I can't get it to run:

Do you understand what a chi-squared test is?

accipter
Sep 12, 2003
I am looking for a simple method to leverage multiple computers around the lab.

I need to perform about 130k calculations each of which takes about 20 minutes. If you do the math on that you end up with just under 5 years. When I first started on this project, I was working with 1/30th of the data, so I just used the multiprocessing module to perform the calculation on 8 cores on my workstation which took just over a week. Since I have two idle workstations, I thought I could send jobs out to these additional machines and speed up the process.
I thought I remembered seeing a nice solution with a live linux CD that could be used to create a simple cluster for problems like this, but google came up with nothing productive. There isn't much that would need to be sent prior to starting a calculation (maybe around 6400 floats).

Does anyone have any suggestions?
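
The estimate in this post can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes the work splits evenly with no overhead, and that the two idle workstations also have 8 cores each, which the post does not say.

```python
MINUTES_PER_DAY = 60 * 24

calculations = 130000    # "about 130k" from the post
minutes_each = 20        # "about 20 minutes" from the post


def wall_clock_days(n_calculations, n_workers):
    """Days needed if the work splits evenly across n_workers, no overhead."""
    return n_calculations * minutes_each / float(n_workers) / MINUTES_PER_DAY


serial_years = wall_clock_days(calculations, 1) / 365.0
earlier_run_days = wall_clock_days(calculations / 30.0, 8)
one_workstation_days = wall_clock_days(calculations, 8)
three_workstation_days = wall_clock_days(calculations, 24)  # assumed 3 x 8 cores
```

That comes to roughly 4.9 years serially, about 7.5 days for the earlier 1/30th run on 8 cores (matching "just over a week"), about 226 days for the full set on one workstation, and about 75 days across three. Two extra machines help, but the full run is still months of compute.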

Johnny Cache Hit
Oct 17, 2011

accipter posted:

I am looking for a simple method to leverage multiple computers around the lab.

I need to perform about 130k calculations each of which takes about 20 minutes. If you do the math on that you end up with just under 5 years. When I first started on this project, I was working with 1/30th of the data, so I just used the multiprocessing module to perform the calculation on 8 cores on my workstation which took just over a week. Since I have two idle workstations, I thought I could send jobs out to these additional machines and speed up the process.
I thought I remembered seeing a nice solution with a live linux CD that could be used to create a simple cluster for problems like this, but google came up with nothing productive. There isn't much that would need to be sent prior to starting a calculation (maybe around 6400 floats).

Does anyone have any suggestions?

I've used Gearman quite successfully for this type of task.

I mean, you could hack your own solution up with pickle/sockets, but it will be terrible and you will spend just as much time securing and debugging it as you'd spend with Gearman.

karl fungus
May 6, 2011

Baeume sind auch Freunde
I've just recently got into Python, and I want to write a random paragraph generator. The idea is that there are pre-defined segments to each sentence, and the segments are chosen randomly.

code:
import random

segment1 = random.choice(['a','b','c'])
segment2 = random.choice(['d','e','f'])
segment3 = random.choice(['g.','h.','i.'])
sentence1 = [segment1, segment2, segment3]
sentence1 = " ".join(sentence1)

segment1 = random.choice(['j','k','l'])
segment2 = random.choice(['m','n','o'])
segment3 = random.choice(['p.','q.','r.'])
sentence2 = [segment1, segment2, segment3]
sentence2 = " ".join(sentence2)

print sentence1, sentence2
That's the gist of it. Of course, where the letters currently are, I'd have phrases or words. I think it looks absolutely terrible. Thus I have some questions:

Is there any way to automatically insert a period at the end of each sentence?
Should I be using different variables for every segment instead of reusing them?
Is there a more condensed way to do this?
Am I doing something horribly wrong?

BeefofAges
Jun 5, 2004

Cry 'Havoc!', and let slip the cows of war.

This would make more sense as a function that takes arbitrary lists, picks words from them, stitches them together into sentences, then appends a period and returns the string. Right now you're just copying and pasting code, which is a sure sign that you should be writing a function.
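
Something along these lines, as a sketch rather than the one true way to do it. The function names are made up, and the single-letter segments are the placeholders from the original post.

```python
import random


def make_sentence(*segment_lists):
    """Pick one entry from each list, join with spaces, and end with a period."""
    words = [random.choice(segments) for segments in segment_lists]
    return " ".join(words) + "."


def make_paragraph(sentence_specs):
    """Build one sentence per spec; each spec is a list of segment lists."""
    return " ".join(make_sentence(*spec) for spec in sentence_specs)


paragraph = make_paragraph([
    [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
    [['j', 'k', 'l'], ['m', 'n', 'o'], ['p', 'q', 'r']],
])
```

The period is added in one place, no variables get reused, and adding a sentence means adding one more row of lists instead of another pasted block.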

ynohtna
Feb 16, 2007

backwoods compatible
Illegal Hen
What you're implementing, karl fungus, is what's known as a context-free grammar. Look around and you'll find plenty of reference material and the following page has some short sample Python code:

http://eli.thegreenplace.net/2010/01/28/generating-random-sentences-from-a-context-free-grammar/
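
For a flavor of the idea without following the link, here is a tiny sketch. The grammar is invented for this example and the code is not taken from the linked page: each symbol maps to a list of alternative expansions, and anything that is not a key is treated as a literal phrase.

```python
import random

# Toy grammar (invented for this sketch).
GRAMMAR = {
    "SENTENCE": [["SUBJECT", "VERB", "OBJECT"]],
    "SUBJECT": [["a cat"], ["the bird"]],
    "VERB": [["chased"], ["watched"]],
    "OBJECT": [["the mouse"], ["SUBJECT"]],
}


def expand(symbol, grammar=GRAMMAR):
    """Recursively expand a symbol into a list of literal phrases."""
    if symbol not in grammar:
        return [symbol]  # terminal: use the text as-is
    production = random.choice(grammar[symbol])
    words = []
    for part in production:
        words.extend(expand(part, grammar))
    return words


sentence = " ".join(expand("SENTENCE")).capitalize() + "."
```

Because OBJECT can expand to SUBJECT, rules can reuse each other, which is what separates this from a fixed list of slots.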

ynohtna fucked around with this message at 02:45 on Nov 21, 2011

karl fungus
May 6, 2011

Baeume sind auch Freunde
Hey, thanks! That looks pretty helpful, I'll look through all of it.

BeefofAges posted:

This would make more sense as a function that takes arbitrary lists, picks words from them, stitches them together into sentences, then appends a period and returns the string. Right now you're just copying and pasting code, which is a sure sign that you should be writing a function.

I've tried reading about functions, but they don't make a lot of sense to me. Could you give me an example of a function that deals with strings of text? So far all the examples of functions I've seen involve manipulation of numbers in some way.

Brecht
Nov 7, 2009

karl fungus posted:

Hey, thanks! That looks pretty helpful, I'll look through all of it.


I've tried reading about functions, but they don't make a lot of sense to me. Could you give me an example of a function that deals with strings of text? So far all the examples of functions I've seen involve manipulation of numbers in some way.
code:
import random

def give_me_the_subject():
	return random.choice(["A cat", "The bird", "A man"])

def give_me_the_verb():
	return random.choice(["pounced on", "flew over", "walked to"])

def give_me_the_predicate():
	return random.choice(["the mouse", "the house", "the ground"])

def give_me_a_sentence():
	subject = give_me_the_subject()
	verb = give_me_the_verb()
	predicate = give_me_the_predicate()
	return " ".join([subject, verb, predicate]) + "."

def give_me_a_paragraph():
	first_sentence = give_me_a_sentence()
	second_sentence = give_me_a_sentence()
	third_sentence = give_me_a_sentence()
	return " ".join([first_sentence, second_sentence, third_sentence])

print give_me_a_paragraph()
Next assignment: make the give_me_a_paragraph() function more compact, so it contains only one line. Then, modify it so it constructs a paragraph with a variable number of sentences, specified when you call the function.

karl fungus
May 6, 2011

Baeume sind auch Freunde

Brecht posted:

Next assignment: make the give_me_a_paragraph() function more compact, so it contains only one line. Then, modify it so it constructs a paragraph with a variable number of sentences, specified when you call the function.

I got the first part:

code:
def give_me_a_paragraph():
	return " ".join([give_me_a_sentence(), give_me_a_sentence(), give_me_a_sentence()])
However, I'm absolutely clueless as to how I'd do the second one. I tried playing around with repeat() and a while loop, but neither of those worked. I think in one of my attempts I tried to set x to be a random number between 1 and 20, and then I tried to repeat give_me_a_sentence() by multiplying it by x, but of course that failed because it just repeated the same generated sentence over and over again.
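
One way to approach the second part, as a sketch rather than the intended answer: multiplying repeats a single string because give_me_a_sentence() has already run once by the time the multiplication happens. Putting the call inside a loop runs it again for every sentence. The parameter name count is made up here, and give_me_a_sentence() is restated as a stand-in so the block runs on its own:

```python
import random

def give_me_a_sentence():
    # Stand-in for the function defined earlier in the thread.
    subject = random.choice(["A cat", "The bird", "A man"])
    verb = random.choice(["pounced on", "flew over", "walked to"])
    obj = random.choice(["the mouse", "the house", "the ground"])
    return " ".join([subject, verb, obj]) + "."

def give_me_a_paragraph(count):
    # The call sits inside the loop, so it runs once per sentence.
    return " ".join(give_me_a_sentence() for _ in range(count))

print(give_me_a_paragraph(random.randint(1, 20)))
```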

Gothmog1065
May 14, 2009
Is there a common separator programmers use? For example, I will have a plaintext file whose lines a script I'm writing will carry out, and each line will describe a different action (e.g. open web, url, new/tab).

So an example file will read something like this:

web,forums.somethingawful.com,2
map,r:,\\network drive\
web,https://www.anotherexample.com

Is there something that programmers generally use, or just whatever tickles my fancy?

BeefofAges
Jun 5, 2004

Cry 'Havoc!', and let slip the cows of war.

Use whatever you want, just be consistent. Commas are fine as long as you don't expect to encounter commas in the strings you're writing to the file.
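
If commas do turn up inside a field, the standard csv module is one option, since it lets a field be wrapped in double quotes. A small sketch, assuming lines shaped like the example file; the fourth line and the file name in the comment are invented for illustration:

```python
import csv

# Sample lines modelled on the example file; in practice these would come
# from something like open("commands.txt") (a hypothetical file name).
lines = [
    "web,forums.somethingawful.com,2",
    "map,r:,\\\\network drive\\",
    "web,https://www.anotherexample.com",
    'web,"example.com/?a=1,2"',  # quotes protect the comma inside this field
]

# Split every line into an action name and its list of arguments.
commands = [(row[0], row[1:]) for row in csv.reader(lines)]

for action, args in commands:
    print("%s -> %s" % (action, args))
```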

vikingstrike
Sep 23, 2007

whats happening, captain
So, I'm working on a project that requires me to fill out a Javascript form over and over and over again (basically, I enter a patent number and it gives me results). I would love to be able to automate this, so I can regex the results and store what I need in an output file. I checked out the mechanize module, but it has no Javascript support. Do any of you have experience doing things like this? Any modules that I should check out? I can post the source of the website, if needed.

Johnny Cache Hit
Oct 17, 2011

vikingstrike posted:

So, I'm working on a project that requires me to fill out a Javascript form over and over and over again (basically, I enter a patent number and it gives me results). I would love to be able to automate this, so I can regex the results and store what I need in an output file. I checked out the mechanize module, but it has no Javascript support. Do any of you have experience doing things like this? Any modules that I should check out? I can post the source of the website, if needed.

The source of the website might be helpful.

I'm guessing the Javascript does an AJAX POST somewhere and puts the data back into the page.

You might be able to POST directly to the AJAX endpoint using urllib2 or restkit or the like.

You've then got some options on the output... basic string parsing, regex, or use BeautifulSoup if it's nasty HTML and you need good screen scraping.

As I mentioned the last time I posted about batch-scraping webpages: don't be an asshole to the site owner. Throttle your bot. I always try to be extra nice and give at least 10-15 seconds between requests.
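
A rough sketch of the throttled-POST idea, under loud assumptions: the URL and the "query" field name are invented, and the real ones have to be read off the site's own request. The import fallback is there because urllib2 is the Python 2 name and urllib.request is its Python 3 counterpart:

```python
import time

try:  # Python 3
    from urllib.request import Request, urlopen
    from urllib.parse import urlencode
except ImportError:  # Python 2, where the same pieces live in urllib2/urllib
    from urllib2 import Request, urlopen
    from urllib import urlencode

# Hypothetical endpoint; replace with whatever the page actually posts to.
SEARCH_URL = "https://example.com/search"

def build_search_request(patent_number):
    """Build (but do not send) a POST carrying one search term."""
    data = urlencode({"query": patent_number}).encode("ascii")
    return Request(SEARCH_URL, data)  # supplying data makes this a POST

def fetch_all(patent_numbers, delay=15, fetch=urlopen):
    """Send one request per number, pausing `delay` seconds between requests."""
    pages = []
    for i, number in enumerate(patent_numbers):
        if i:
            time.sleep(delay)  # throttle: be polite to the site owner
        pages.append(fetch(build_search_request(number)).read())
    return pages
```

The fetch argument exists only so the loop can be tried out with a fake function before pointing it at a real site.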

vikingstrike
Sep 23, 2007

whats happening, captain

Kim Jong III posted:

The source of the website might be helpful.

I'm guessing the Javascript does an AJAX POST somewhere and puts the data back into the page.

You might be able to POST directly to the AJAX endpoint using urllib2 or restkit or the like.

You've then got some options on the output... basic string parsing, regex, or use BeautifulSoup if it's nasty HTML and you need good screen scraping.

As I mentioned the last time I posted about batch-scraping webpages: don't be an asshole to the site owner. Throttle your bot. I always try to be extra nice and give at least 10-15 seconds between requests.

Yeah, I don't want to be an asshole at all. Actually, most of the time I scrape websites, I use 15-20 second buffers.

Here's a paste bin of the source. Sorry for the large spaces, it's how I copied it off the page. I need to be able to include text in the search box, check a box and then change the search date range.

http://pastebin.com/T4c8Ch5P

Johnny Cache Hit
Oct 17, 2011

vikingstrike posted:

Yeah, I don't want to be an asshole at all. Actually, most of the time I scrape websites, I use 15-20 second buffers.

Here's a paste bin of the source. Sorry for the large spaces, it's how I copied it off the page. I need to be able to include text in the search box, check a box and then change the search date range.

http://pastebin.com/T4c8Ch5P

Oof, that's a ton of code.

I see LexisNexis so I'm assuming everything is paywalled, or I'd say link us the webpage itself & let's try it out. I don't see where onSearch() is defined, which looks to be the function that kicks off the search, so that's probably buried in one of the many referenced JS files.

My suggestion here - rather than trying to pore over hundreds of lines of nasty HTML, try to capture the submit at the source. For this, Firebug is your friend. Install it, go to your page, enable the Net panel, and submit the form. I'm betting you'll see a POST in the XHR section. That POST will probably have a ton of data, but will have your searched text, checked box, and date range.

I'd start there. See if you can deconstruct the POST, and reconstruct it without using the webpage. curl is really good at this -- use -X POST to send a POST (if that's the right verb, of course!), -d for data, and if the site authenticates with cookies, -b is your friend.

vikingstrike
Sep 23, 2007

whats happening, captain

Kim Jong III posted:

Oof, that's a ton of code.

I see LexisNexis so I'm assuming everything is paywalled, or I'd say link us the webpage itself & let's try it out. I don't see where onSearch() is defined, which looks to be the function that kicks off the search, so that's probably buried in one of the many referenced JS files.

My suggestion here - rather than trying to pore over hundreds of lines of nasty HTML, try to capture the submit at the source. For this, Firebug is your friend. Install it, go to your page, enable the Net panel, and submit the form. I'm betting you'll see a POST in the XHR section. That POST will probably have a ton of data, but will have your searched text, checked box, and date range.

I'd start there. See if you can deconstruct the POST, and reconstruct it without using the webpage. curl is really good at this -- use -X POST to send a POST (if that's the right verb, of course!), -d for data, and if the site authenticates with cookies, -b is your friend.

Thanks for the information. This is exactly the type of feedback I was looking for! Just hope I can get it working :smith:

Victor Vermis
Dec 21, 2004


WOKE UP IN THE DESERT AGAIN
I have zero programming experience, but I'm trying my darndest to pick it up using How to Think Like a Computer Scientist.

I'm somewhere in the fourth chapter and have found everything very accessible so far, except I think I've missed where (or whether) a few things are explained.

The text uses "from filename import *" a lot. Whenever I try this in the shell, I always get "ImportError: No module named filename", and attempting to use a function or print something that was defined within the script just gives a "not defined" error.

I save the scripts as .py in the folder titled "Scripts". They work as intended when I select "Run Module" from the drop-down menu. I'm baffled; hopefully I'm just sticking them in the wrong spot, or neglecting to add something to the script itself that tells Python where to find it?

Quick example for clarity:
If the contents of my script are:
code:
message = "Hi!"
Which is then saved as hitest.py in the scripts folder, shouldn't I be able to import it using:
code:
from hitest import *
?

ynohtna
Feb 16, 2007

backwoods compatible
Illegal Hen
Do you have a __init__.py file in your scripts folder?

Victor Vermis
Dec 21, 2004


WOKE UP IN THE DESERT AGAIN

ynohtna posted:

Do you have a __init__.py file in your scripts folder?

Nope! Googled that, and slipped an empty file titled __init__.py into the folder my scripts are in but still no luck.

The Google results explaining __init__ all gloss over the process of saving a script.
"You can make a script! Type your code, save it as .py, and then use the import command.."
There's no explanation of where and why you would stash these scripts or what else needs to be done to ensure Step 1 (create) works with Step 2 (import).

As an experiment, I tried importing a script that was part of the install and is also located in the Scripts folder.
code:
from byext import *
And received the same "Error no module named byext".

FredMSloniker
Jan 2, 2008

Why, yes, I do like Kirby games.
I've written a short script to filter a text file (the '+who' command on a MUSH) and return just the player names in it. It works—I just drag a text file containing the output onto it, and it overwrites the file with the results—but I'm sure there's a more elegant way to do it. (That unnecessarily redundant assignment of outputfilename is something I know is bad.) Suggestions?

code:
import fileinput

outputtext = []

for line in fileinput.input():
    outputtext = outputtext + [line[13:31].strip()]
    outputfilename=fileinput.filename()

outputfile = open(outputfilename,'w')
for line in outputtext:
    outputfile.write(line+'\n')

outputfile.close()
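
One possible tidy-up, offered as a sketch: fileinput has an in-place mode that sends anything printed back into the file being read, so the list, the output file name, and the second loop all go away. The function name keep_names is made up for illustration:

```python
import fileinput

def keep_names(path):
    """Rewrite the file at `path`, keeping only line[13:31] of each line, stripped."""
    # With inplace=True, fileinput redirects print() into the file being read.
    for line in fileinput.input(files=[path], inplace=True):
        print(line[13:31].strip())
```

For the drag-and-drop case, leaving out the files argument makes fileinput fall back to the file names in sys.argv, as the original script does.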

vikingstrike
Sep 23, 2007

whats happening, captain

Victor Vermis posted:

Nope! Googled that, and slipped an empty file titled __init__.py into the folder my scripts are in but still no luck.

The Google results explaining __init__ all gloss over the process of saving a script.
"You can make a script! Type your code, save it as .py, and then use the import command.."
There's no explanation of where and why you would stash these scripts or what else needs to be done to ensure Step 1 (create) works with Step 2 (import).

As an experiment, I tried importing a script that was part of the install and is also located in the Scripts folder.
code:
from byext import *
And received the same "Error no module named byext".

You may have touched on this and I missed it, but is byext.py in the same folder as the script you are running? Obviously there are ways around this, but for a beginner it would probably be the easiest solution.

Victor Vermis
Dec 21, 2004


WOKE UP IN THE DESERT AGAIN
Yes, byext.py and whatever script I'm trying to run, say atest.py, are both located in C:\Python25\Tools\Scripts

Am I mistaken in thinking that entering the following code into the shell would then activate all of the variable assignments and whatnot within atest.py?
code:
from atest import *
^ this command being the specific problem at hand. I see it used as an example and am instructed to use it, but I have no idea how or why it works (or in this case doesn't work), unlike Function Definitions and Type and Assignments and Operands and everything else that has a dedicated section within the text.
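
A possible explanation, offered as a guess about this setup rather than a diagnosis: import only looks for modules in the folders listed in sys.path, and a folder full of scripts is not necessarily on that list when the shell starts. The sketch below uses a temporary folder as a stand-in for the Scripts folder and writes the hitest.py from the earlier example into it:

```python
import os
import sys
import tempfile

# A temporary folder stands in for the folder where the scripts are saved.
scripts_dir = tempfile.mkdtemp()
f = open(os.path.join(scripts_dir, "hitest.py"), "w")
f.write('message = "Hi!"\n')
f.close()

# import only searches the folders listed in sys.path, so add the folder first.
sys.path.append(scripts_dir)

# "from hitest import *" typed into the shell would find the module the same way.
from hitest import message
print(message)  # prints Hi!
```

In the shell, the equivalent would be to import sys, run sys.path.append(r"C:\Python25\Tools\Scripts"), and then retry the import.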

Victor Vermis fucked around with this message at 01:59 on Nov 24, 2011
