  • Locked thread
Johnny Cache Hit
Oct 17, 2011

rivals posted:

I apologize if this gets asked a lot but I skimmed a bit and didn't see anything. I'm looking for a simple and preferably cheap way to host Python webapps. I've never done web dev in Python, just PHP, and I'm looking to start playing around with Python on the web and trying flask. I used to have hosting through a friend (using a LAMP stack) but he shut down his servers. I tried using AWS and doing all of the admin stuff myself but after fighting with fastcgi config on lighttpd for a while I gave up. I have a domain, dns hosting, etc already; I just want something I can use to deploy python webapps with as little headache as possible. Any suggestions are appreciated. I've been recommended heroku, dotcloud, and google app engine but all by people who have heard good things about them, not from people who have used those services, so some hands on opinions would be great.

Amazon AWS is excellent. Plus you will probably be able to fit in their free usage tier (http://aws.amazon.com/free/).

I've used Google App Engine, and it's nice, but if you're just getting started with Python development I wouldn't recommend it -- stay vanilla, use AWS.

Johnny Cache Hit
Oct 17, 2011

Sailor_Spoon posted:

EDIT: dang, so slow.

if you actually care about seeing all the different combinations, something like this should get you there:

code:
import itertools
a = [1,2,3,4,5]
b = list(
    itertools.chain.from_iterable(
        itertools.combinations(a, i) for i in range(1, len(a) + 1)
    )
)
print b
Since combinations requires a length parameter, you just need to generate the list of combinations for each length, then mash 'em together.

You don't need to check |a|+1, as C(n,k)=0 for k>n.

Otherwise, this code should be exactly what GregNorc wants. There's no specific phrasing to it -- it's just a plain ol' combination without repetition.

Johnny Cache Hit
Oct 17, 2011

No Safe Word posted:

the +1 is to pull in the original since range is not endpoint-inclusive

:ughh:

Man, the fact that range is closed on the left and open on the right always screws me up... and I've been using Python for years now.
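For anyone else tripped up by the half-open range, here's a small sketch of why the + 1 matters. It is written in Python 3 syntax, and the helper name all_combinations and its include_full flag are made up for illustration. Without the + 1, the full-length combination is silently dropped.

```python
import itertools


def all_combinations(items, include_full=True):
    """Return every non-empty combination of items, shortest first."""
    # range() excludes its stop value, so the stop must be len(items) + 1
    # for the full-length combination to be generated.
    stop = len(items) + 1 if include_full else len(items)
    return list(
        itertools.chain.from_iterable(
            itertools.combinations(items, k) for k in range(1, stop)
        )
    )


a = [1, 2, 3, 4, 5]
print(len(all_combinations(a)))                      # every non-empty subset
print(len(all_combinations(a, include_full=False)))  # one fewer: (1, 2, 3, 4, 5) is missing
```

A list of n items has 2**n - 1 non-empty subsets, which makes for a quick sanity check on the result.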

Johnny Cache Hit
Oct 17, 2011

Jaur posted:

Is it something wrong with IDLE, my drivers, my code?

It's IDLE. It doesn't play well with pygame if you don't call pygame.quit() at the end.

See the pygame wiki entry for more (and ignore the spam).

Johnny Cache Hit
Oct 17, 2011

RobotEmpire posted:

Does anyone have experience interviewing at established/well-funded web-centric startups (especially in the Bay Area)? I'm definitely not looking for "cheats" but any light you might shine on the best way to prepare would be appreciated. What topics I ought to spend the most time studying based on your experience, and so forth. I should say that I am definitely not an expert anything. I have written some code in my life and do okay with it, but I'd not consider myself an expert in any domain of knowledge. I am pretty proficient with Python and understand in general how the web works; generally proficient with command line Linux; can provision servers;can perform CRUD-related tasks with pg, mysql, etc., etc. All the very basic things you'd expect someone on your team to be able to perform, but an expert at none. Would love to get some insight while prepping for these very important (to me) interviews.

Note that my advice may not extend to Bay Area startups, but it's served me well so far, and it seems to hold true to what I've heard from friends who have interviewed at your type of company.

From what I've seen and heard, the face to face interview is the least likely point at which you will land a job -- the employer has already reviewed your resume, which should have enough information to convince them that you are worth hiring.

Do you have an up to date personal webpage? If not, get one set up. It doesn't have to be a blog or anything; a tasteful presentation of your resume with clean CSS and HTML is more than enough. Even more important: your webpage and your resume must direct people to your Github. If you don't have a Github account, start programming now. Find an open source project that interests you and contribute. I've talked with a handful of people who say that being able to see this body of work is the #1 thing they look for.

That aside, back to your question: the Python wiki has a good listing of documented Python warts (http://wiki.python.org/moin/PythonWarts), but I'd spend my time focusing on algorithms and techniques instead. If you know enough to say "these warts are annoying because I run into them when I do {x, y, z}", you'll probably know more than enough.

Also, I'd recommend reviewing Steve Yegge's article on 5 essential phone screen questions and ensure you aren't missing anything big there.

Beyond that, be sure you know something about the nuts n' bolts of software development -- unit testing, version control systems, how to write a good bug report, etc.

Good luck :cheers:

Johnny Cache Hit
Oct 17, 2011

RobotEmpire posted:

These are all phone cons with CTOs, senior devs, etc. I'm not in the Bay Area. More to your point, my resume is light; I've gotten these interviews more on the strength of my personal projects (on github) and a friend here and there putting in a word (based on the work they've seen me do).

Thanks. I've read Steve Yegge's article as well as Joel Spolsky's article on phone interviews. Frankly I've only been programming seriously for a year or so; professionally less than that. The thing I'm most confident in is analysis & breaking a problem down to addressable components, then explaining that. Less so low-level computing knowledge.

I forgot to mention that the #1 thing you need to do to get hired is to get a reference from someone that already works there :c00l:

Did you get a degree in CS?

Based on your experience, reviewers should be interviewing you for an entry level programming position. If you can decompose a problem well, you think like a programmer, and you should do pretty well in an interview. Nobody sane is going to expect you to come in and immediately start coding at a high level.

Your sysadmin skills are helpful icing on the cake.

Johnny Cache Hit
Oct 17, 2011

FoiledAgain posted:

I have a project too large to do by hand, so I want to do it with Python and I'm looking for some help on where to start. There is a database of languages here, and I want to be able to verify, for each language, whether or not there exists a non-stub Wikipedia article about that language. There are some further complications too (like some languages have many names) but that's the basic need.

I've never used Python for web stuff before, so I'm not sure where to begin. In fact I've never done any web stuff before, but as I say it's too big to do by hand, so I'm willing to put in time to program it.

A few thoughts:

If you don't have access to the raw language database, you'll have to do screen scraping on your database of languages. I've used beautifulsoup for this before and it is absolutely awesome.

Alternatively, it might be even faster to just copy the language names over and strip them down in your editor -- this is a 5-10 minute Emacs job at most with the right macros, and you won't have to fool around with screen scraping. In general, if you're going to do this more than once, you might want to code it.

Once you've got a list of actual language names extracted from your database, you will have to fetch the pages from Wikipedia. urllib2 will work well for this.

You've got three cases I can see:

1. The language doesn't have any article on Wikipedia -- you're lucky here, because if you request an article that does not exist on Wikipedia you will actually get a 404 back (just checked with curl). Just catch the HTTPError (a subclass of URLError) that urllib2 raises and be sure its code is 404, and you can mark that language as a no.

2. The language has a stub-only article. You can probably get away with just searching through the returned HTML for "article is a stub. You can help Wikipedia by expanding it." -- I think most Wikipedia stub templates have that verbiage.

3. The language has a full Wikipedia article. If it's neither 1 nor 2, it's 3, so mark that language as a yes :)

This is quite literally the quick and dirty way of finishing your project. There might be cleaner ways to do this, but just parsing a bunch of strings is easy :shobon:

Just don't be an rear end -- throttle your bot to be sure it doesn't request all ten thousand pages as fast as it can.
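Here's a rough sketch of the three cases plus the throttling. It rests on a few assumptions: it uses Python 3's urllib.request (the successor to urllib2), the function names, the User-Agent string, and the 10 second delay are all made up for illustration, and the stub marker is just the phrase quoted above, which may not match every stub template.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

# Phrase quoted in the post; assumed (not verified) to appear in stub templates.
STUB_MARKER = "article is a stub"


def classify(status, html):
    """Map an HTTP status and page body onto the three cases."""
    if status == 404:
        return "missing"   # case 1: no article at all
    if STUB_MARKER in html:
        return "stub"      # case 2: stub-only article
    return "full"          # case 3: everything else


def fetch(title, base="https://en.wikipedia.org/wiki/"):
    """Return (status, html) for one article title; a 404 is not an error here."""
    url = base + urllib.parse.quote(title.replace(" ", "_"))
    # Hypothetical User-Agent; identify your own bot properly.
    request = urllib.request.Request(url, headers={"User-Agent": "language-check-sketch/0.1"})
    try:
        with urllib.request.urlopen(request) as response:
            return response.getcode(), response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return 404, ""
        raise  # anything else is a real problem, so don't hide it


def check_titles(titles, delay=10.0):
    """Classify each title, sleeping between requests to stay polite."""
    results = {}
    for title in titles:
        status, html = fetch(title)
        results[title] = classify(status, html)
        time.sleep(delay)
    return results
```

Something like check_titles(["Basque language"]) would be the entry point. Keeping classify separate from fetch means the string matching can be tried on saved HTML without hitting Wikipedia at all.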

Johnny Cache Hit
Oct 17, 2011

GregNorc posted:

I'm on 2.6.1

I'm afraid to modify anything too much since it's doing an analysis I don't quite fully understand the inner workings of. What would I wanna change, taking into account 2.6.1?

Nothing makes a mathematician :gonk: more than hearing someone trying to apply statistics while admitting uncertainty. You sure chi-squared is the right test?

I know it's not Python, but if you want to do statistical calculations, look at R. It does chi-squared without you having to muck around with writing the function, and has just about everything baked in that you would need.

Johnny Cache Hit
Oct 17, 2011

accipter posted:

I am looking for a simple method to leverage multiple computers around the lab.

I need to perform about 130k calculations each of which takes about 20 minutes. If you do the math on that you end up with just under 5 years. When I first started on this project, I was working with 1/30th of the data, so I just used the multiprocessing module to perform the calculation on 8 cores on my workstation, which took just over a week. Since I have two idle workstations, I thought I could send jobs out to these additional machines and speed up the process.
I thought I remembered seeing a nice solution with a live Linux CD that could be used to create a simple cluster for problems like this, but Google came up with nothing productive. There isn't much that would need to be sent prior to starting a calculation (maybe around 6400 floats).

Does anyone have any suggestions?

I've used Gearman quite successfully for this type of task.

I mean, you could hack your own solution up with pickle/sockets, but it will be terrible and you will spend just as much time securing and debugging it as you'd spend with Gearman.
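To get a feel for what the extra machines buy you, here is the arithmetic from the question as a quick back-of-the-envelope script. The job count, job length, and 8 cores come from the post; three workstations with 8 cores each and perfect scaling are assumptions, so treat the last figure as a best case.

```python
JOBS = 130000          # "about 130k calculations"
MINUTES_PER_JOB = 20   # "about 20 minutes" each
CORES_PER_BOX = 8      # cores on the one workstation mentioned
BOXES = 3              # assumption: the two idle workstations match the first

total_minutes = JOBS * MINUTES_PER_JOB
MINUTES_PER_DAY = 60 * 24

serial_years = total_minutes / MINUTES_PER_DAY / 365
subset_days = (total_minutes / 30) / CORES_PER_BOX / MINUTES_PER_DAY
one_box_days = total_minutes / CORES_PER_BOX / MINUTES_PER_DAY
cluster_days = one_box_days / BOXES  # assumes no scheduling or network overhead

print("serial: %.2f years" % serial_years)
print("1/30th of the data on one box: %.1f days" % subset_days)
print("everything on one box: %.0f days" % one_box_days)
print("everything on %d boxes: %.0f days" % (BOXES, cluster_days))
```

The first two figures line up with "just under 5 years" and "just over a week" in the question.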

Johnny Cache Hit
Oct 17, 2011

vikingstrike posted:

So, I'm working on a project that requires me to fill out a Javascript form over and over and over again (basically, I enter a patent number and it gives me results). I would love to be able to automate this, so I can regex the results and store what I need in an output file. I checked out the mechanize module, but it has no Javascript support. Do any of you have experience doing things like this? Any modules that I should check out? I can post the source of the website, if needed.

The source of the website might be helpful.

I'm guessing the Javascript does an AJAX POST somewhere and puts the data back into the page.

You might be able to POST directly to the AJAX endpoint using urllib2 or restkit or the like.

You've then got some options on the output... basic string parsing, regex, or use BeautifulSoup if it's nasty HTML and you need good screen scraping.

As I mentioned before the last time I posted about batch-scraping webpages: don't be an rear end in a top hat to the site owner. Throttle your bot. I always try to be extra nice and give at least 10-15 seconds between requests.

Johnny Cache Hit
Oct 17, 2011

vikingstrike posted:

Yeah, I don't want to be an rear end in a top hat at all. Actually most of the time I scrape websites, I use 15-20 second buffers.

Here's a paste bin of the source. Sorry for the large spaces, it's how I copied it off the page. I need to be able to include text in the search box, check a box and then change the search date range.

http://pastebin.com/T4c8Ch5P

Oof, that's a ton of code.

I see LexisNexis so I'm assuming everything is paywalled, or I'd say link us the webpage itself & let's try it out. I don't see where onSearch() is defined, which looks to be the function that kicks off the search, so that's probably buried in one of the many referenced JS files.

My suggestion here - rather than trying to pore over hundreds of lines of nasty HTML, try to capture the submit at the source. For this, Firebug is your friend. Install it, go to your page, enable the Net panel, and submit the form. I'm betting you'll see a POST in the XHR section. That POST will probably have a ton of data, but will have your searched text, checked box, and date range.

I'd start there. See if you can deconstruct the POST, and reconstruct it without using the webpage. curl is really good at this -- use -X POST to send a POST (if that's the right verb, of course!), -d for data, and if the site authenticates with cookies, -b is your friend.
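Once curl works, the same POST can be rebuilt in Python. This is only a sketch under assumptions: it uses Python 3's urllib.request (urllib2's successor), and the URL, field names, and cookie value are placeholders, since the real ones have to come from what Firebug shows you.

```python
import urllib.parse
import urllib.request


def build_search_request(url, fields, cookie=None):
    """Build (but do not send) a form-encoded POST, like curl -X POST -d ... -b ..."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    if cookie:
        # Equivalent of curl's -b for a site that authenticates with cookies.
        request.add_header("Cookie", cookie)
    return request


# Placeholder endpoint and field names; substitute whatever the captured POST contains.
request = build_search_request(
    "https://example.invalid/search",
    {"query": "1234567", "fullText": "on", "from": "2000-01-01", "to": "2011-12-31"},
    cookie="session=PLACEHOLDER",
)
print(request.get_method(), request.full_url)
print(request.data)
```

Passing the request to urllib.request.urlopen() would actually send it. Put a time.sleep() between calls when looping over many patent numbers.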

Johnny Cache Hit
Oct 17, 2011

Lister_of_smeg posted:

What I don't understand is why urlencode is happy with the raw dict, but not with the dict that comes out of simplejson.dumps.

code:
>>> type(simplejson.dumps(request_dict))
<type 'str'>
>>> type(request_dict)
<type 'dict'>
urlencode doesn't work on strings.

Why do you want to urlencode the simplejson output rather than just urlencode the request_dict? You're adding another step that doesn't really seem sensible.
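A quick way to see the difference, sketched in Python 3 with the standard json module standing in for simplejson (an assumption on my part; the request_dict contents and the "payload" field name are made up):

```python
import json
from urllib.parse import urlencode

request_dict = {"q": "python", "page": 2}

# A mapping is what urlencode wants.
encoded = urlencode(request_dict)
print(encoded)

# json.dumps() returns a plain string, which urlencode refuses.
try:
    urlencode(json.dumps(request_dict))
    string_rejected = False
except TypeError:
    string_rejected = True
print(string_rejected)

# If the server really expects JSON inside a form field, wrap it in a dict first.
wrapped = urlencode({"payload": json.dumps(request_dict)})
print(wrapped)
```

Whether the last form is what you need depends entirely on what the server expects, which isn't clear from the question.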

Johnny Cache Hit
Oct 17, 2011

oiseaux morts 1994 posted:

Ah thanks, although that's more of an aside. I'm really just interested to see if my implementation of an Interface is useful/correct; it's all in the code.

You're duck typing like a champ :downsrim:

One note: avoid except Exception as it will screw up badly in non-trivial situations. Catch the exceptions that you will meaningfully handle.
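As a small illustration of catching only what you can handle (the function and its default are hypothetical, not taken from the posted code):

```python
def parse_port(text, default=8080):
    """Parse a port number, falling back to default on malformed text."""
    try:
        return int(text)
    except ValueError:
        # Bad user input is the one failure we know how to recover from.
        return default


print(parse_port("8000"))
print(parse_port("not a number"))
```

With except Exception, a call like parse_port(None) would quietly return the default and hide a real bug. Here the TypeError propagates and you find out about it.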

Johnny Cache Hit
Oct 17, 2011

brosmike posted:

Don't do this if you can avoid it - some windows programs (cmd.exe among them) will try to deal with forward slashes sometimes, but the behavior is inconsistent between programs, difficult to predict, and subject to change between versions.

Why would forward slashes in the path specification even be seen by individual programs? I didn't think those even went into argv, just the binary.

Of course if you're trying to give a path to another program this caveat may well come up, but if you're just calling a program...

Johnny Cache Hit
Oct 17, 2011

fletcher posted:

I've decided to re-write a PHP webapp in Python since the PHP code is crap and I've been wanting to learn Python for a while anyways. I was going to give Flask & SQLAlchemy a shot, seems to offer a bit more flexibility over something like Django. Any advice before I dive in?

Flask is pretty cool. But so is Django, and you'll probably get moving faster under Django because of some of the nice built-ins it gives you.

Just be sure you're not falling into the trap of thinking that you'll need more flexibility without really thinking about what you're trying to do, because often times your problem isn't as hard as you'd think.

Johnny Cache Hit
Oct 17, 2011

the posted:

So I can put all my code in a .py text file, open a command window, and just run it.. right?

Yep, call python whatever.py.

Or add the right shebang (#!/path/to/my/python) and make it executable and you can skip the python -- assuming you're on a *NIX.

Johnny Cache Hit
Oct 17, 2011
That code is really loving hard to follow. Here's a smaller testcase that I believe exhibits the behavior you're seeing.

code:
imgx = 1000
imgy = 500
xa = 3.0
xb = 5.0
maxit = 1000

for i in range(imgx):
    x = 0.1
    for j in range(maxit):
        print x
        x = (xa + (xb - xa) * float(i) / (imgx - 1)) * x * (1 - x)
        print (xa, xb, i, imgx, x)    
        int(x)

Your x value dips just a hair over 1.0 (look in the output for 1.0001990246762804). This throws your system into negative numbers on the next run:

x = (3.0 + (5.0 - 3.0) * float(500) / (999)) * 1.0001990246762804 * (1 - 1.0001990246762804)

x = 4.001001001001001 * 1.0001990246762804 * -0.00019902467628041265

x = -0.0007964564119593775

hi there, now you're going to accelerate really quickly to -inf

When xb = 4.0 x doesn't go over 1.0 (I checked) so your system remains stable.

The big problem here: the way you're converting back and forth from int <-> float is really loving scary and you shouldn't do it. It's very possible that what you're doing is algorithmically correct but an accumulation of error bumps x > 1. It's more likely that what you're doing will never give you the result you need.

Read this:
http://floating-point-gui.de/
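Here's a sketch that isolates the escape, in case it helps with debugging. The function name and the max_steps default are made up. One fact worth having in hand: r * x * (1 - x) peaks at r / 4 when x = 0.5, so any coefficient above 4.0 can push x past 1.0, and with xa = 3.0 and xb = 5.0 the coefficient crosses 4.0 right around i = 500.

```python
def first_escape(r, x0=0.1, max_steps=1000):
    """Return (step, x) for the first iterate above 1.0, or None if it never happens."""
    x = x0
    for step in range(1, max_steps + 1):
        x = r * x * (1 - x)
        if x > 1.0:
            return step, x
    return None


xa, xb, imgx = 3.0, 5.0, 1000

# Coefficient for column i, using the same expression as the testcase above.
for i in (499, 500):
    r = xa + (xb - xa) * float(i) / (imgx - 1)
    print(i, r, first_escape(r))
```

Once x is above 1.0, the (1 - x) factor is negative, so the next iterate is negative and the values run away from there.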

Johnny Cache Hit fucked around with this message at 04:19 on Mar 1, 2012

Johnny Cache Hit
Oct 17, 2011

JetsGuy posted:

I keep telling myself that I have to learn cpp so I can get a REAL JOB (TM) outside of scientific research, but I just am exhausted when I go home... :(

In my career I've programmed in PHP, Python, Ruby, and Erlang. I guess what I'm saying is that I hope like hell I never get one of those real jobs, because I'm not a fan of cpp ;)

For your particular problem definitely check out numpy, and don't forget about pypy like tef mentioned. Pypy does seriously amazing things.
