baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

hooah posted:

The problem is that when the last line runs, eighties_dict gets changed. I've verified that master_dict and eighties_dict are different objects by both printing their IDs and printing the result of master_dict is eighties_dict. Why is this happening?

Edit: This only happens in PyCharm; I tried using Spyder (which I hate), and it worked fine. Fantastic.

Edit 2: The problem also happens on the command line. Super weird.

When you run master_dict.update(eighties) it doesn't copy the values as new objects, it adds a reference to the original object

Python code:
>>> master.update(eighties)
>>> master
{'Toyota': [60, 70, 80], 'Ford': [10, 20, 30]}
>>> eighties
{'Toyota': [60, 70, 80], 'Ford': [10, 20, 30]}
>>> eighties is master
False
>>> eighties["Ford"] is master["Ford"]
True
So your actual dictionaries are different objects, but they share the same list objects, which is what you're poking at.

As for why it works in one thing and not another, that would come down to the environment right? Python version and any libraries you're using. Maybe one of them is making copies somewhere along the line
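If you want the master dict to get its own lists, you have to copy them yourself. Here's a rough sketch using the values from the session above (starting from empty dicts is my assumption, I don't know what yours held to begin with):

```python
import copy

eighties = {'Toyota': [60, 70, 80], 'Ford': [10, 20, 30]}

# update() stores references to the same list objects
shared = {}
shared.update(eighties)
print(shared['Ford'] is eighties['Ford'])  # True

# deepcopy() builds new list objects, so changes don't leak back
master = {}
master.update(copy.deepcopy(eighties))
master['Ford'].append(40)
print(master['Ford'] is eighties['Ford'])  # False
print(eighties['Ford'])  # [10, 20, 30]
```

copy.deepcopy is the blunt option. If the values are only ever flat lists, copying each list individually would do the same job.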

baka kaba
Jul 19, 2003

Anyone help with this windows (?) issue? I've made a new Anaconda env and installed Selenium (had to run pip install selenium), and written a script that uses it. If I run it from IDLE, it works completely fine. But if I run it from the command line (with the environment activated) it says
code:
    from selenium import webdriver
ImportError: No module named 'selenium'
I've tried just running python and then doing import sys; print(sys.path), and it looks fine - everything on the path is a folder in the env, including the lib\site-packages folder where selenium actually is. Anyone have any ideas? Is there something different about how IDLE runs scripts?

e- ugh I'm an idiot, I was just running script.py instead of python script.py so it must have been defaulting to the basic env that's in the system path
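For anyone who hits the same thing, a quick way to check which interpreter is actually running a script is to print it from inside the script. This is just a diagnostic sketch, nothing specific to Anaconda:

```python
import sys

# The interpreter running this script; with the env activated and
# "python script.py" this should point into the env's folder
print(sys.executable)
# The root folder of the environment that interpreter belongs to
print(sys.prefix)
```

If those paths point at the system install rather than the env, the script got launched by something other than the env's python.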

baka kaba fucked around with this message at 16:01 on Jul 26, 2016

baka kaba
Jul 19, 2003

flosofl posted:

Technically yes. But in practice, it's not recommended.

Assuming we're talking about the same name in different scopes, is there any reason for this? I've seen... uh, something complaining about using names for method parameters that are already being used in the main block, but there's no actual conflict right? Is it just to avoid weird bugs if you delete the local declaration and end up referring to another variable in an outer scope, or something?

baka kaba
Jul 19, 2003

Well I'm talking about declaring a local variable, that just happens to share a name with a variable in an outer scope. So you are defining it in the function, and you're not referring to a global variable outside of that function's local scope. (Well you could be if you do a typo, but lots of things can go wrong if you type the wrong variable name!)

I mean something like this

Python code:
def fart(butt):
    print("g'day " + butt)


def fart2(rear_end):
    butt = rear_end
    print(butt)


if __name__ == "__main__":
    butt = "butte"
    fart(butt)
in PyCharm gives a lint warning for the method parameter, "shadows name 'butt' from outer scope". Same for calling some rando variable 'butt' inside a method. So to avoid that warning (which I'm assuming exists for a good reason) you need to arbitrarily rename all your local butts even though there's no actual conflict in scope? It turns simple naming schemes into something awkward.

You can't accidentally write to a global variable from a local scope unless you specifically enable it - you can't read the global variable if you shadow it either, but that's true for every other language I've used, and I don't get a 'whoa now you already used this name in an outer scope' warning in say Java. So is it really just a warning that this might not be best practice in case you mess up changing things later? Seems weird
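To spell out what I mean, here's a small sketch (the function names are made up for the example). The shadowing itself is harmless, the risk is the case where the local name goes missing and you quietly read the global instead:

```python
butt = "butte"  # module-level (global) name


def shadow(butt):
    # the parameter is a separate local name; rebinding it never touches the global
    butt = butt + " (local)"
    return butt


def typo_risk():
    # no local 'butt' here, so this silently reads the global one -
    # the kind of accident the shadowing warning is hinting at
    return butt


print(shadow("parameter"))  # parameter (local)
print(typo_risk())          # butte
print(butt)                 # butte
```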

baka kaba fucked around with this message at 01:20 on Aug 13, 2016

baka kaba
Jul 19, 2003

Nippashish posted:

You'll avoid 99% of the cases this shows up if you do
code:
def main():
    # your code here

if __name__ == '__main__':
    main()
instead of writing code in the "main block" directly.

Well I could just disable the inspection too (although that probably is a nicer way to structure things huh), it was more about the 'who cares' aspect. I thought maybe there was an important reason!

baka kaba
Jul 19, 2003

Yeah I tend to pass variables around instead of using global state too, which is why the b-but you already used this name warnings are annoying. Thanks for the tips anyway guys!

baka kaba
Jul 19, 2003

Yeah that's what I meant about the main() method being a nice idea. Generally I'm calling functions from __main__ and getting butt from function_a and passing it to function_b later, so I'm calling it butt everywhere because it makes sense to do so, in method signatures and in local variables.

So because something called butt is also accessible from the global scope (which may or may not be the same variable, I'm never accessing it so it doesn't matter) I'm getting the warnings. Having a separate main method and just keeping constants in the global scope basically clears all that up and makes things nice and safe. Is that a recommended practice? Seems like it should be, I've never run into it though

baka kaba
Jul 19, 2003

os.makedirs will make all the parent folders though!
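In Python I'm assuming the function meant here is os.makedirs, as opposed to os.mkdir which only creates the last folder. A small sketch in a throwaway temp folder (the folder names are made up):

```python
import os
import tempfile

base = tempfile.mkdtemp()  # throwaway folder so the example doesn't litter
nested = os.path.join(base, "a", "b", "c")

try:
    os.mkdir(nested)  # fails: the parents 'a' and 'a/b' don't exist yet
except FileNotFoundError:
    print("mkdir needs the parent folders to exist")

os.makedirs(nested, exist_ok=True)  # creates a, a/b and a/b/c
os.makedirs(nested, exist_ok=True)  # exist_ok stops it complaining the second time
print(os.path.isdir(nested))  # True
```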

baka kaba
Jul 19, 2003

Dominoes posted:

What resources do you recommend to learn debugging? I've been coding for a few years, but have never tried, or understood it. You set break points, and it tells you what values variable has without using temporary print statements?

Depends on the debugger, but yeah - you can set a point in the code, and when that line gets executed, the debugger does something. You can have it stop, so you can look at the stack and all the variables in context. Or you can make it log a message or value. Or you can set conditional breakpoints so it only pauses when some condition is met, so you can inspect the state when some bad value shows up. Or just break on exceptions. Or even modify variables and set the thing running again. It's good!

Definitely worth learning if only to get rid of the print statements, but you can do a lot with it. Just makes your life easier
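If you're not in an IDE, the standard library's pdb gives you the same basic idea. This is only a sketch with a made-up function, and the DEBUG flag is my own addition so the script runs straight through unless you switch it on:

```python
import pdb

DEBUG = False  # flip to True to drop into the debugger on a bad value


def total(values):
    running = 0
    for v in values:
        if DEBUG and v < 0:
            # poor man's conditional breakpoint: pauses here only for negatives,
            # then you can inspect v and running, or type 'c' to continue
            pdb.set_trace()
        running += v
    return running


print(total([1, 2, -3, 4]))  # 4
```

An IDE debugger like PyCharm's lets you do the same thing by clicking in the gutter and typing the condition, without touching the code.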

baka kaba
Jul 19, 2003

^ yeah, it looks like it just loops, waiting for each chunk to come through, then adds it to the list and waits for the next one. That's 30 minutes of chunks to wait for, so it blocks (that's what hanging is) until it's done

JVNO posted:

I'm curious why the blocked form of the code wouldn't be valid? The intention is to record the entire PyAudio stream from the experiment (up to about 30 minutes) and dump it to a single wave file.

It looks like your code is basically the example code from the documentation, so maybe you're just running out of memory? 30 minutes of raw audio is a lot to hold in a list without streaming it to disk. Does it work with 5 or 10 seconds?

What do your crash errors say?

baka kaba fucked around with this message at 00:30 on Sep 13, 2016

baka kaba
Jul 19, 2003

JVNO posted:

I'm not sure why the callback function would be necessary still; I want one single recording. If the wav recording is simply too large for a discontinuous recording, is there a way I could stick a couple lines of code at the bottom of each iteration of the experimental trial 'FOR' loop that says 'Write frames to wav file; clear buffer for next trial'? Each trial individually is less than 5 seconds- just the total time exceeds 30 minutes.

Yeah, it's been talked about but in case you're not clear - code execution by default is synchronous, which is a fancy way of saying that all your code gets executed in order, one line at a time, until you hit some end point. So your code starts up, initialises the audio engine thing, then sits in that for loop waiting for all the chunks to come in, 30 mins' worth. Only after that loop has completed does it move on to the next stuff. That waiting behaviour is called blocking because it blocks execution, and does nothing until it gets what it's waiting for. It can't respond to input or anything, because it can't process it until the waiting is over. That's why programs hang (as opposed to crashing)

I think what you're looking for is asynchronous behaviour, where you start the polling loop that grabs the chunks, but that goes off on its own to work and the rest of your code runs as usual. So you have two different tasks running simultaneously, instead of one after the other. That is the kind of thing you want, but it doesn't just happen, right? To break out of that synchronous, one-line-after-the-other behaviour you need some kind of async structure that will let you run multiple things at once

Luckily it looks like the audio thing has already written it for you - if you pass in a callback, it should run in the background on its own and run whatever code is in the callback when there's an event (I'm assuming whenever it finishes a chunk, I'm phoneposting here so I can't look).

So you can set up the engine, give it a callback handler to do whatever with the stuff it produces, send it on its way and then carry on with the rest of your script doing something else. This is basically how async stuff tends to work, and yeah it's more complicated than a basic script that just runs from beginning to end
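Here's a rough sketch of that shape using only the standard library. To be clear, this is not the PyAudio API: fake_audio_engine is a made-up stand-in that feeds chunks of silence to a callback from a background thread, and the format constants are assumptions. The point is just that the callback writes each chunk to the wav file as it arrives, while the main script carries on:

```python
import os
import tempfile
import threading
import wave

CHUNK_FRAMES = 1024
SAMPLE_WIDTH = 2   # bytes per sample (16-bit)
CHANNELS = 1
RATE = 44100


def fake_audio_engine(callback, n_chunks):
    # stand-in for the audio library's background thread:
    # it produces chunks of silence and hands each one to the callback
    for _ in range(n_chunks):
        callback(b"\x00" * (CHUNK_FRAMES * SAMPLE_WIDTH * CHANNELS))


path = os.path.join(tempfile.mkdtemp(), "recording.wav")
wav = wave.open(path, "wb")
wav.setnchannels(CHANNELS)
wav.setsampwidth(SAMPLE_WIDTH)
wav.setframerate(RATE)


def on_chunk(data):
    # write each chunk straight to disk instead of keeping it in a list
    wav.writeframes(data)


recorder = threading.Thread(target=fake_audio_engine, args=(on_chunk, 10))
recorder.start()  # returns immediately, recording carries on in the background

print("running the experiment while the recorder works")  # rest of the script goes here

recorder.join()  # wait for the recorder to finish before closing the file
wav.close()

with wave.open(path, "rb") as check:
    print(check.getnframes())  # 10240
```

Because nothing accumulates in memory, it doesn't matter whether the session lasts 5 seconds or 30 minutes.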

baka kaba fucked around with this message at 22:06 on Sep 13, 2016

baka kaba
Jul 19, 2003

Yeah, exceptions are there to tell you something went wrong, and give you information about exactly what happened. They should tell you which line in your code caused it, and what type of error it is. Not catching them means your program crashes (bad), because there's a serious problem (bad) which needs to be fixed (good!)

Catching exceptions is what you do when there are specific problems you want to handle. Things like I/O or network operations can be unreliable, there's a chance they could fail (throwing an exception) and you want your program to handle those situations gracefully. Catching those specific exceptions, in the places you expect they could happen, allows you to catch those errors and do whatever you need to do

You just need to be careful of the overlap here, the exceptions you expect and the ones you don't. You don't want to silently catch the errors that point to a bug in your program, you need it to let you know that Bad Thing has gone wrong. That's why it's best to limit your catching to the specific types of exceptions you expect, and only have your try block cover the places where you expect it to occur (like where you make a call to write to disk). That way you reduce the chances of catching and swallowing something you didn't mean to, and losing valuable information

Reraising is basically throwing an exception again - it's like saying "hang on a minute" with the catch, doing something, then saying "ok carry on crashing". Sometimes this is exactly what you want, or maybe something further back needs to see the exception too, and do some handling itself. Once you get the idea of exceptions in general, you should have a better idea of when you might want to do this kind of thing
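A small sketch of both ideas, with made-up function names and file paths. The first function catches only the error it expects and handles it, the second does a bit of work and then re-raises so the caller still finds out:

```python
import os
import tempfile


def save_report(path, text):
    try:
        # keep the try block tight: only the call that can fail
        # for reasons outside our control
        with open(path, "w") as f:
            f.write(text)
    except OSError as err:
        # expected kind of failure (missing folder, no permission, full disk...)
        print("couldn't write report:", err)
        return False
    return True


def save_or_complain(path, text):
    try:
        with open(path, "w") as f:
            f.write(text)
    except OSError:
        print("logging the failure, then passing it on")
        raise  # re-raise the same exception so the caller sees it too


good = os.path.join(tempfile.mkdtemp(), "report.txt")
bad = os.path.join(tempfile.mkdtemp(), "no_such_folder", "report.txt")

print(save_report(good, "all fine"))  # True
print(save_report(bad, "oops"))       # False, after printing the error

try:
    save_or_complain(bad, "oops")
except FileNotFoundError:
    print("caller handled it")
```

Anything that isn't an OSError, like passing in something that isn't a string, still crashes loudly, which is what you want for an actual bug.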

baka kaba
Jul 19, 2003

You could also filter your full data so it only contains records with a transfer out of or into the ICU, and create a dictionary with patients and their last transfer out.

Then with your filtered data, if it's a transfer out you just put it in the dictionary (overwriting any old data for that patient), if it's a transfer in you pull that patient from the dict and compare the times

I can do a code later if you like (phonepostin'), it just might be a lot faster if you're processing a ton of records

baka kaba
Jul 19, 2003

Hughmoris posted:

Thanks for the idea and I'd definitely like to see some code if you don't mind writing it. I'm going to try my hand at implementing a solution in the morning.

Eela6 basically did the same (with nice tuple names) but hey!

Python code:
import csv
from datetime import datetime, timedelta

NAME = 'NAME'
FROM = 'TRANSFER_FROM'
TO = 'TRANSFER_TO'
ICU = 'ICU'
TIME = 'DATE_TIME'
TIMESTAMP_FORMAT = "%m/%d/%y %I:%M %p"


def convert_timestamp(transfer):
    # replaces the TIME value with a datetime type, to avoid copying the dicts
    timestamp = datetime.strptime(transfer[TIME], TIMESTAMP_FORMAT)
    transfer[TIME] = timestamp
    return transfer


def get_icu_returns(transfers):
    left_icu = {}
    # Iterate over all the transfers, maintaining each patient's last transfer out,
    # and yielding any returns as the pair of transfers out and back into the ICU
    # Input transfers need to be sorted by time!
    for transfer in transfers:
        patient = transfer[NAME]
        if transfer[FROM] == ICU:
            left_icu[patient] = transfer
        elif transfer[TO] == ICU and patient in left_icu:
            yield {'left': left_icu[patient], 'returned': transfer}


with open('icu.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    icu_transfers = (t for t in reader if t[FROM] == ICU or t[TO] == ICU)
    timestamped_transfers = (convert_timestamp(t) for t in icu_transfers)
    # sort by time here if you need to, probably better to let the DB query sort the CSV output
    # e.g. sorted(timestamped_transfers, key=lambda x:x[TIME])
    returns = get_icu_returns(timestamped_transfers)
    under_24_hrs = lambda r : r['returned'][TIME] - r['left'][TIME] <= timedelta(hours=24)
    quick_returns = list(filter(under_24_hrs, returns))
It's basically the idea of piping generators together and filtering out anything unnecessary at each stage - so you read the CSV line by line, dropping anything that doesn't mention the ICU, so you have less work to do when it comes to parsing the timestamps or sorting lists or whatever

I overengineered it a bit storing and returning the full records in get_icu_returns but it was more to show you can keep filtering, you could pipe quick_returns into another filter that pulls out patients coming from a certain ward, that kind of thing

baka kaba
Jul 19, 2003

I'm not sure you can really get around it, if everyone's doing things slightly differently then the best you can do is be as smart as possible with your selector/parsing code. You could try and generalise and get some definitions that work on every page, but honestly that just makes things complicated when they inevitably change something and create some new cases you have to catch. Maybe try making a separate dict or whatever for each page, just to handle getting the set of elements, and write a common scraper that just uses the appropriate class to get the data it needs to work on

You might actually be able to do it using CSS selectors as strings. Like, I *think* this is right (haven't tested it)
Python code:
cool_bikes = {
    'brand_selector': "div.brand",
    'type_selector': "div.type",
    'price_selector': "div.vehicleSpecs > span.price",  # combined two steps there, y'see
}

current_page = cool_bikes
# vehicle is one of the parsed elements from your existing loop
vehicle_type = vehicle.select(current_page['type_selector'])
and so on. Or use a class, something like that, so you have a common interface for getting the selectors a specific page uses

You might want to use methods instead though, in case a page has something that needs more logic than a selector can provide (like having to find a certain sibling)

baka kaba
Jul 19, 2003

LochNessMonster posted:

I'm pretty new to python (this is my first project, I'm slowly but steadily expanding my knowledge) but I haven't looked at methods or classes yet.
Could you explain what your code snippet does exactly, I'm a bit at a loss here...

edit: to clarify, I see what you're doing, but I don't really understand how to apply that to my code.

(sorry, to be clear: you pull the selector strings out of the dictionary with dict[key] notation, not with a dot!)

The idea is that instead of having separate chunks of code for each site, basically repeating the same functionality but with slightly different ways of getting the same information, you have a general set of steps that delegates the details to some other object. That way you can repeat the same steps, but with different objects handling those details, each one representing a site and its particular way of getting the data you need. Like a specialist

So the code I posted was basically holding a bunch of CSS selector strings which (should!) correspond to the selectors you posted for each bit of data in your example. If you do vehicle.select("div.brand") for example, it should match the same element as vehicle.find("div", {"class": "brand"}) (assuming this is BeautifulSoup, select gives you a list of matches where find gives you just the first one, and select_one is the single-result version). But because it's a single string, you can stick it in a dictionary with a specific key, and just pull it out and put it straight into the select call: vehicle.select(site_dict['brand_selector'])

The reason that's useful is you can create that dict for one site, with all the selectors you need for each piece of data. Then you can create another dict for another site, with its own special snowflake selectors, but using the same key names. That way your main code (that does the selecting) can use the exact same calls
Python code:
cool_bikes = { 'brand_selector': 'div.brand' }
rad_bikes = { 'brand_selector': 'a.manufacturer' }

for site in (cool_bikes, rad_bikes):
    # get refine_data...
    for vehicle in refine_data:
        brand = vehicle.select(site['brand_selector'])
if you see what I'm doing there. You get to reuse the code because you're changing the site object, which takes a generic call but provides the specific data for that site. You're basically looking up a common key in each dictionary, and getting a value specific to each site. You can add more into the dictionary too, like a 'filename' key so your loop can load up the correct file for each site. You're basically querying the current site's dictionary to get the specifics each time

I don't actually do much python so I'm trying not to push classes too hard, but those are what I'm used to. A class can be like a fancier dictionary that can hold properties like selector strings, it can provide methods that do more involved processing and return a value (like maybe some more involved logic to find an awkwardly defined element on the page), and you can make sure that different classes have the exact same interface so they'll always have a get_brand(html_page) method (or whatever) that you can call every time in your loop. But if you're not familiar with them you'll probably want to learn about them first - but it's the same idea in the end, code reuse and splitting off the specifics into separate components you can just plug in to your main code
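Here's a bare-bones sketch of the class version, in case it helps to see it. Everything here is hypothetical: the class names, the get_brand method, and especially the data, where plain dicts stand in for the parsed HTML elements so the example runs on its own:

```python
class CoolBikes:
    # hypothetical site where the brand is stored as-is
    def get_brand(self, vehicle):
        return vehicle["brand"]


class RadBikes:
    # hypothetical site that needs a bit more logic: the brand is
    # the first word of a longer title
    def get_brand(self, vehicle):
        return vehicle["title"].split()[0]


# stand-in data; in the real scraper these would be parsed HTML elements
pages = [
    (CoolBikes(), [{"brand": "Honda"}, {"brand": "Ducati"}]),
    (RadBikes(), [{"title": "Yamaha MT-07 2016"}]),
]

brands = []
for site, vehicles in pages:
    for vehicle in vehicles:
        # same call every time, the site object handles the specifics
        brands.append(site.get_brand(vehicle))

print(brands)  # ['Honda', 'Ducati', 'Yamaha']
```

The loop never needs to know which site it's dealing with, which is the same trick as the dictionaries, just with room for extra logic.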

Sorry if none of that makes much sense, I didn't get any sleep :v:

baka kaba fucked around with this message at 19:22 on Oct 6, 2016

baka kaba
Jul 19, 2003

Well I don't completely get the deal with the combinations of attributes or whatever, so it's hard to visualise exactly what you're pulling out and how it can vary. cool_bikes is just a bunch of info specific to the Cool Bikes site, with the specific selector queries you need for each bit of data you're scraping. So while you're scraping that site, you refer to that dictionary for all the info you need.

It's just a way of standardising it, like creating a form and filling out the specific lookup info for each site, then referring to that as you pull out the stuff. You'd have one for each site, probably even if two are identical, just to keep things simple and organised and independent

baka kaba fucked around with this message at 20:31 on Oct 6, 2016

baka kaba
Jul 19, 2003

LochNessMonster posted:

I can imagine it's difficult to understand as you can't see the html source I'm trying to parse. I (now) understand the general idea you were showing me and I think I found a way to make that work, so thanks for that!

Aww yea

You should definitely read up on classes and stuff, just so you're aware of the general idea (even if you don't end up using them). It's good to be aware of this general idea of being able to drop in different objects and use them in some consistent way, so you can separate the specifics and reuse the general bit
https://learnpythonthehardway.org/book/ex40.html
that looks decent


Hughmoris posted:

My python skills (and programming in general) are pretty amateur so I'm walking through each solution and trying to understand how it works. Thanks for taking the time to write these up, and I'll let you know how it turns out.

Oh I'd have commented it more if I'd known - if you want to know anything, just ask!
Eela6's solution and mine are basically the same, we're just constructing the data a bit differently. Mine basically goes like this (it's like a pipeline)
Python code:
# these two lines handle opening a file as a CSV, and creating a 'DictReader' object
# that reads each line on demand and turns it into a record dictionary
with open('icu.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    # this gets all the dictionaries, but ditches the ones that don't mention the ICU (no need to process them)
    icu_transfers = (t for t in reader if t[FROM] == ICU or t[TO] == ICU)
    # this just takes each record dict and replaces its DATE_TIME string with an actual datetime object
    timestamped_transfers = (convert_timestamp(t) for t in icu_transfers)
    # now you can sort on the datetimes
    # e.g. sorted(timestamped_transfers, key=lambda x:x[TIME])

    # this gets each return transfer to the ICU (i.e. only where the patient previously transferred OUT of it)
    # it's a pair of both transfers, so you have both bits of info, with keys 'left' and 'returned'
    returns = get_icu_returns(timestamped_transfers)
    # this defines a filter, since you only care about the transfer pairs where the return is within 24 hours
    under_24_hrs = lambda r : r['returned'][TIME] - r['left'][TIME] <= timedelta(hours=24)
    # this runs the filter and makes a list of the results
    quick_returns = list(filter(under_24_hrs, returns))
I'm using generator comprehensions (like (convert_timestamp(t) for t in icu_transfers), in parentheses) which work like list comprehensions, except they don't produce a list - they just spit out each item when something asks for one. So you're not actually building a data structure in memory to hold everything, which can be important if you're reading in gigabytes of data - it's better to work on one thing at a time, or at least filter out all the stuff you don't need before you start building a list or whatever.

The way I wrote mine, it just reads one record at a time, and passes it right through to the end (maybe storing it in the dictionary that remembers the last time a patient left the ICU). It's only the last line that finally builds a list - but you could just write those results to a file instead, one thing at a time. If your CSV data isn't sorted though, you need to collect it into a list so you can sort it all, so that's why the sorted(timestamped_transfers, key=lambda x:x[TIME]) line is there (that spits out a list, so you need to assign the result to a variable and make the next line run on that). It happens after you've filtered out the records you don't care about though, so it at least cuts things down
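If you want to see the one-at-a-time behaviour for yourself, here's a tiny standalone sketch. read_records is a made-up stand-in for the CSV reader that prints whenever it actually hands out a record:

```python
def read_records():
    # stand-in for a file reader: hands out one record at a time
    for n in range(1, 7):
        print("reading", n)
        yield n


evens = (n for n in read_records() if n % 2 == 0)  # nothing is read yet
squares = (n * n for n in evens)                   # still nothing

first = next(squares)  # pulls just enough records through to produce one result
print(first)           # 4
rest = list(squares)   # drains the remainder
print(rest)            # [16, 36]
```

Only records 1 and 2 get read before the first result appears, and the rest are read when list() asks for them.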

You might not care about any of that, and if it's a small dataset then it probably won't matter anyway, but I'm into it! I think it's this guy's fault
http://www.dabeaz.com/generators-uk/
The pdf of the slides is really readable on its own

baka kaba
Jul 19, 2003

Vivian Darkbloom posted:

I've been hacking away at a little game project in Python, and I stumbled across this problem. I know the solution is probably something trivial but I can't track it down and I'm increasingly bugged.

My question: I have a method that takes a class as an argument. How do I tell the method to make a new object of that class? I guess you could use __new__ but that seems odd.

From looking around, this is possible but it's pretty hacky, which makes it seem like you're not really meant to do this kind of thing. What exactly are you doing? If you lay out the general plan then people might have some more, uh, pythonic suggestions

baka kaba
Jul 19, 2003

chutwig posted:

It's not hacky, it's just taking advantage of first-class objects, and you'd treat it as any other callable.

Oh is it that easy? All the SO answers had a billion variations on One Weird Reflection Trick. That's cool then :frogbon:
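So as far as I can tell it really is just this. A minimal sketch with made-up game classes (spawn and the monster names are mine, not from the original question):

```python
class Goblin:
    def __init__(self, name="goblin"):
        self.name = name


class Dragon:
    def __init__(self, name="dragon"):
        self.name = name


def spawn(monster_class, *args, **kwargs):
    # calling the class is how you make an instance, same as Goblin(...)
    return monster_class(*args, **kwargs)


a = spawn(Goblin)
b = spawn(Dragon, name="Smaug")
print(type(a).__name__, a.name)  # Goblin goblin
print(type(b).__name__, b.name)  # Dragon Smaug
```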

baka kaba
Jul 19, 2003

I've never used Pandas, but I'm giving it a go (mainly to spit out some charts) and I'm wondering what the general approach is for this problem

I have some JSON data that's a mix of values and dicts, and I'm using pd.read_json to turn it into a dataframe. I want to filter rows based on a value in the dicts, but it looks like df[df.thing['thing_id']] won't work, because it's using the column data as hashed keys and you can't do that with a dict? I could filter the incoming rows by hand before creating a dataframe, but that seems a bit silly since pandas is entirely about working with datasets to get exactly what you want.

And I could manually expand all the dicts into individual columns, but that seems a bit awkward too. What's the general approach to filtering this kind of thing?

I'd post some of the JSON but the API site is now down :thumbsup: But hopefully you get what I mean

baka kaba
Jul 19, 2003

(Sorry, I meant df[df.thing['thing_id'] == 123], filtering on rows that contain a dict with a specific value on a key)

Yeah each JSON object is like
JSON code:
"3377":{
   "id":3377,
   "date":{
      "date":"2016-10-17 00:00:00.000000",
      "timezone_type":3,
      "timezone":"Europe\/London"
   },
   "company":{
      "code":"IPSOS",
      "name":"Ipsos MORI",
      "canonical":"ipsos-mori"
   },
   "client":"Evening Standard",
   "type":{
      "id":1,
      "code":"WESTVI",
      "name":"Westminster Voting Intention",
      "country":"GBR",
      "canonical":"westminster",
      "menu":"Westminster",
      "short_name":"Westminster VI",
      "category":1,
      "hashtag":"#GE2020",
      "partylist":[
         "CON",
         "LAB",
         "LD",
         "UKIP",
         "GRN"
      ],
      "complist":[

      ]
   },
   "update_time":{
      "date":"2016-10-19 15:26:46.000000",
      "timezone_type":3,
      "timezone":"Europe\/London"
   },
   "tablesurl":"https:\/\/www.ipsos-mori.com\/Assets\/Docs\/Polls\/pm-october-2016-tables.pdf",
   "mode":null,
   "headline":{
      "CON":{
         "pct":47,
         "party":"CON",
         "code":"CON"
      },
      "GRN":{
         "pct":4,
         "party":"GRN",
         "code":"GRN"
      },
      "LAB":{
         "pct":29,
         "party":"LAB",
         "code":"LAB"
      },
      "LD":{
         "pct":7,
         "party":"LD",
         "code":"LD"
      },
      "UKIP":{
         "pct":6,
         "party":"UKIP",
         "code":"UKIP"
      }
   },
   "previous_poll":null
}
so there's a lot of nested data in there, so some columns end up containing dictionaries. So it's really meant to be flattened when you do the read_json command? I know there are parameters for running formatting functions and the like

I mean it makes sense, I just want to do it the 'right' way - pandas looks cool, but most of it is way over my head, it feels like a lot of documentation takes your understanding of data analysis for granted

baka kaba
Jul 19, 2003

SurgicalOntologist posted:

But you're going to have a much nicer time if you flatten. Try this: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.json.json_normalize.html
and if that can't handle JSON in your specific structure you'll have to write a custom flattener. Shouldn't be too difficult and will give you the most control.

Yeah this looks workable once I get the hang of it, thanks!
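For reference, the "custom flattener" route can be pretty small. This is only a sketch in plain Python, assuming records shaped like the poll above: the `flatten` helper, the dotted key names and the trimmed-down `polls` sample are all made up for illustration and aren't part of pandas or the API. Lists (like partylist) are left alone rather than expanded, and empty dicts simply vanish.

```python
def flatten(record, parent_key="", sep="."):
    """Collapse nested dicts into one flat dict with dotted keys.

    Lists and scalar values are kept as-is (hypothetical helper).
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the key prefix along
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Trimmed-down stand-in for the poll record quoted above
polls = {
    "3377": {
        "id": 3377,
        "company": {"code": "IPSOS", "name": "Ipsos MORI"},
        "type": {"id": 1, "code": "WESTVI", "partylist": ["CON", "LAB"]},
        "headline": {"CON": {"pct": 47}, "LAB": {"pct": 29}},
        "mode": None,
    },
}

rows = [flatten(poll) for poll in polls.values()]

# Filtering on what used to be a nested value is now a plain key lookup
ipsos_rows = [row for row in rows if row["company.code"] == "IPSOS"]
print(sorted(rows[0]))
```

A list of flat dicts like `rows` can be handed straight to the DataFrame constructor, at which point the filter becomes an ordinary column comparison.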

baka kaba
Jul 19, 2003

That error message is saying the parameter you're feeding in (dealer) is the wrong type. It's probably expecting a string or another primitive like an int, and dealer is probably some kind of record object instead. Try passing in str(dealer)

e- me read good

baka kaba fucked around with this message at 14:48 on Nov 3, 2016

baka kaba
Jul 19, 2003

Multiprocessing (which joblib looks like it uses) creates separate processes for each worker, and they use separate memory from the main process that's creating them. So your worker processes can't see that list, it's created and manipulated in the main process

You have a few options for giving them access - passing it as an argument to the process function creates a copy for them, you can set up message-passing so processes can talk to each other, and you can mess around with shared memory too. Depending on how much data you're working with the 'pass it as an argument' approach might be fine
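Here's a rough sketch of the separate-memory thing using plain multiprocessing rather than joblib (so the names `seen`, `record_square` and `run_in_workers` are invented for the example, not taken from the original code). Each worker appends to its own copy of the list, so the only data that reaches the parent is whatever the worker function returns.

```python
from multiprocessing import Pool

# Module-level list: every process ends up with its own independent copy
seen = []


def record_square(n):
    """Worker function: appends to this process's copy of `seen`, then returns the value."""
    seen.append(n * n)
    return n * n


def run_in_workers(numbers, processes=2):
    """Send the numbers out to worker processes and collect the returned values."""
    with Pool(processes) as pool:
        return pool.map(record_square, numbers)


if __name__ == "__main__":
    returned = run_in_workers([1, 2, 3])
    print(returned)  # the return values are sent back to the parent
    print(seen)      # the parent's list was never touched by the workers
```

So returning results (or passing data in as arguments) works without any extra setup, while mutating a shared-looking global silently does nothing useful.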

Parallel processing is awkward, basically. There's a lot to trip you up, and tools you need to get around that

(that's my understanding of it anyway, I don't python on this level)

baka kaba fucked around with this message at 07:53 on Nov 21, 2016

baka kaba
Jul 19, 2003

They're on there, that documentation is kinda dense and hard to get a clear overview of what's available, I think

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors

baka kaba
Jul 19, 2003

How many players and entities are we talking here? How often do you update the state, is it a fast action game or something turn-based? If the demands aren't high you can probably just do whatever feels simple and makes the most sense, without worrying about efficiency

What you're saying makes sense, there's actually a bidirectional dict library that uses a pair of plain dicts under the hood, so maintaining two views (one from each side) is definitely an approach people use

Do you actually need to maintain that full state though? It depends on what you need to do with the information, but from what you're saying everything relies on a player being able to see it happening - players need to know which entities they can see, and entities need to interact with each other if a player can see them. So you could just update the players' "what I can see" dict, and then combine their sets to get the set of all visible entities, and tell each of those to do their thing. Also saves you iterating over the larger group of entities to see if each is visible
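Something like this is what I mean, as a minimal sketch. The player names, entity ids and the two helper functions are all hypothetical, and it assumes entities can be identified by something hashable like a string id.

```python
# Hypothetical state: the set of entity ids each player can currently see
visible_to = {
    "alice": {"goblin", "door"},
    "bob": {"door", "chest"},
    "carol": set(),
}


def all_visible(visible_to):
    """Combine every player's set into the set of entities anyone can see."""
    return set().union(*visible_to.values())


def seen_by(visible_to):
    """Build the reverse view on demand: entity id -> set of players who can see it."""
    result = {}
    for player, entities in visible_to.items():
        for entity in entities:
            result.setdefault(entity, set()).add(player)
    return result


# Only the entities somebody can see get told to do their thing
for entity in sorted(all_visible(visible_to)):
    print(entity, "is visible, update it")
```

If the reverse lookup turns out to be needed constantly, that's the point where keeping the second dict up to date (the two-views approach) starts to pay for itself.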

You could also drop the state and just let your visibility check function tell the entity and players to do something if any of them are visible - but it really depends on what your game is going to need and if it's worth storing the visibility information for further use

baka kaba fucked around with this message at 14:12 on Jan 19, 2017

baka kaba
Jul 19, 2003

floppo posted:

I'm looking to parse text data from three kinds of sources: pdf, .doc and .docx. I have around a thousand of each. I'm surprised at how difficult it has been to extract the text of even a docx file into a Python string (which I'd then write to a plain txt).

Have you looked at one in a text editor? They're not exactly trivial to parse

Can't help with suggestions, but make sure your PDFs actually have text content - sometimes they're just page images

baka kaba
Jul 19, 2003

You can use Selenium (in Python) to mess around with a PhantomJS webdriver, and use most of your BS4 parsing code to pull stuff out of the resulting HTML. You're not actually touching any JavaScript (well unless you're messing around with PhantomJS's broken cookies like what happened when I tried it)

You can just let a web browser handle it too, put in whatever delays to let the page load and the JS mess around with the contents, parse the results. Unfortunately that seems like it's less simple than it used to be, GeckoDriver paths and what the hell ever, but it's definitely doable within Python once it's set up

baka kaba
Jul 19, 2003

You used to be able to just pop a Firefox browser out of the box with one call, now you need to mess around configuring it a bit (which sucks) but otherwise it's still pretty straightforward. Start browser, get page, then you can grab the html like you're already doing, or do more complicated stuff like clicking elements or waiting for things to become visible

baka kaba
Jul 19, 2003

You need to set that as global if you're writing to it, right?

I mean technically it's a read then an assignment, but does Python still count it as a local variable from the beginning since it knows it's going to be written to?
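A quick way to check, as a small sketch with made-up names (`counter` and the two functions aren't from the code being discussed): if a name is assigned anywhere in a function body, Python treats it as local for the whole function, so reading it before the assignment raises UnboundLocalError unless it's declared global.

```python
counter = 0
error_message = None


def broken_increment():
    # The assignment below makes `counter` local to this whole function,
    # so the read on the right-hand side fails before anything is assigned
    counter = counter + 1
    return counter


def working_increment():
    global counter  # tell Python to read and write the module-level name
    counter = counter + 1
    return counter


try:
    broken_increment()
except UnboundLocalError as exc:
    error_message = str(exc)

print(error_message)
print(working_increment())
```

Only rebinding the name needs the declaration. Mutating a global list or dict in place (append, item assignment) works without it, since the name itself is never reassigned.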

baka kaba
Jul 19, 2003

socialsecurity posted:

I'll admit I'm new to this and mainly helping out a friend with a little project he's working on, but the HTTP server is Apache hosting wordpress do we need to run a Python server library on top of that perhaps I am misunderingstanding what a server means in the context of Python 2.7

Your Python code is receiving data somehow, right? Then it does something and returns a result, and you want that result to eventually reach whatever made the POST request

It depends how your Python code is receiving the data in the first place. Is the webserver piping it into your Python script, like myscript.py "some data", and looking for a result it can send back? In that case you can just print to stdout and the webserver will get it. In this case your script isn't actually handling HTTP, it's just working on some JSON data or whatever and spitting out the results.
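As a very rough sketch of that first style, assuming a CGI-like setup where the webserver feeds the request body to the script on stdin and relays whatever gets written to stdout. The `build_response` and `main` names and the "address" field are invented for the example, and the real setup may pass data in differently (command-line arguments, for instance).

```python
import io
import json
import sys


def build_response(raw_body):
    """Parse the incoming JSON and build the full text to hand back to the webserver."""
    data = json.loads(raw_body or "{}")
    result = {"address": data.get("address"), "ok": "address" in data}
    # CGI-style output: headers first, then a blank line, then the body
    return "Content-Type: application/json\r\n\r\n" + json.dumps(result)


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    """Read the request body from stdin and write the response to stdout."""
    stream_out.write(build_response(stream_in.read()))


# Stand-in for the webserver: feed in a fake request body and capture the output
fake_in = io.StringIO('{"address": "1 Example Street"}')
fake_out = io.StringIO()
main(fake_in, fake_out)
print(fake_out.getvalue())
```

A real script would just call `main()` at the bottom. If the output looks right when run by hand like this, the script side is probably fine and the thing to inspect is what the webserver does with it.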

Or is your Python running an HTTP server that directly receives the POST request? In that case you need to send back the results in an HTTP response. Usually you'd have a library someone already wrote to do this, so you can handle the incoming request and form a response easily. There'd be some established workflow for this, so you don't have to worry too much about the technical details

It's not really clear what you're doing, but if you're printing your results to stdout then that implies something else is handling all the HTTP, and if you're having problems with responses then it's probably occurring there

(I might have said something stupid here because I don't know much about http :shobon: )

baka kaba
Jul 19, 2003

socialsecurity posted:

We are having someone send their address via a http wordpress form the info gets posted via j query Ajax to the python script, the python then takes the address info and runs it through a series of pre-made apis that it's supposed to send the info from the apis back. We are using python because that's what the api sending code was written in originally. So far the former sends the data to the python script which does process the data right if we pipe it to the console everything looks as it should just can't get it to respond properly to the Ajax post request.

How does it actually work? How does your ajax request end up talking to the python script? How does the data get in?

baka kaba
Jul 19, 2003

If you're using some kind of CGI thing where you're meant to just drop the results to stdout, and it looks fine on the command line, seems like the problem is elsewhere no? Did you debug whatever's making the ajax call and see exactly what the response it's getting is? Maybe adding an error callback would be useful too, instead of just the success one

baka kaba
Jul 19, 2003

Join usssss

baka kaba
Jul 19, 2003

It's not even just a symbol though, it's literally squashing irrelevant stuff out of sight. Like redacting your code so you can focus the reader on the stuff that needs to be there.

It's such a neat solution that it's honestly hard to see why anyone would have a problem with it, unless they're just used to doing things a different way. Technical issues with using that symbol aside of course :frogbon:

baka kaba
Jul 19, 2003

Maybe it's a boolean that contains information about whether something is used? Or a collection of unused things from a pool? Like, is the value you're assigning part of an object's state, or something you're going to be ignoring? You can make an assumption, but how sure are you without further investigation?

That's the nice thing about conventions, it gets everyone on the same page so you can be pretty drat sure what's happening and what you can forget immediately. Thermo's making the point that when someone ignores a convention, you have to start asking why and you can't really rely on your assumptions anymore

baka kaba
Jul 19, 2003

QuarkJets posted:

The "_" character is even less descriptive than "unused", and so it could store literally anything.

That's kinda the point though - it's as close to nothing as you can get without being whitespace. You're not bothering to name it, so there's nothing to read into. Visibly it may as well not be there, like a really low noise floor. It just works better at its intended function, being a hole to throw stuff in so the important info sticks out

That's not to say it's necessarily easier to understand the first time you run into it - pretty sure it confused me coming from Java - but that goes for a lot of things, and really it depends on your background too, what you're used to seeing and doing

I'm not saying you can't use a word if you want, and if that's the pattern already in use then even more reason to stick with it. But _ is great and if it's available to you without issues in your language, I definitely wouldn't recommend you avoid it. It's so clean and simple, the sooner it becomes at least a generally recognised convention the better imo
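For anyone who hasn't run into it, this is the kind of thing I mean in Python. Just a sketch with made-up values: `_` is an ordinary name as far as the language is concerned, the "ignore this" meaning is pure convention.

```python
# Repeat something without caring about the loop index
greetings = ["hi" for _ in range(3)]

# Unpack a tuple but only keep the parts that matter
user, _, domain = "someone@example.com".partition("@")

# Keep the first item and throw the rest into the hole
first, *_ = [1, 2, 3, 4]

# The same junk name can be reused within a single unpacking
points = [(1, 2, 3), (4, 5, 6)]
xs = [x for x, _, _ in points]

print(greetings, user, domain, first, xs)
```

The technical issues are real though: the interactive interpreter uses `_` for the last result, and it's a common alias for gettext's translation function, so it can clash in code that relies on either.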

baka kaba
Jul 19, 2003

Flat is better than nested :v:

Honestly a lot of those rules seem better served by _. Sparseness, readability, one way to do things (in the sense you don't need to think of a context-appropriate name to convey 'unused', that's what _ is actually for).

I guess you can argue it's less explicit than a name, but that goes for a lot of sugar and basic language features. It's a tool to pick up like anything else, once you know it it's about as explicit as you can get

baka kaba
Jul 19, 2003

QuarkJets posted:

That's not what explicit means, there is no way that _ could be considered more explicit than a descriptive variable name. And again any argument you make about conventions (which you're now calling "a tool") applies equally well to using _ as a prefix (which is already a convention/tool in pylint).

I'm not sure why you're against using _ as a prefix for marking unused variables, it's already an in-use convention, it's in accordance with the other ways that underscores are usually used, and it's friendlier to people on the outside who may have never seen the convention before (such as C programmers).

I mean that once you realise _ is specifically used as a junk variable for values that are ignored, your intent becomes explicit when you use it. I'm talking about the wider convention, which in some languages gives _ special privileges (like reuse or exemption from unused checks) because it has that specific function, which is why I'm calling it a 'tool'. It allows you to do a specific thing in a neat, widely* recognised way, in multiple languages

I don't have any problem with using it in prefixes or whatever!

* for some definition of 'wide' anyway
