|
Legit question (that also seems stupid): I have a function that returns ('foo', 'bar'). What's the easiest way to pass that directly to one that takes 2 args? Make a default arg and test for a tuple? Like so: code:
Pudgygiant fucked around with this message at 06:42 on Sep 7, 2014 |
# ? Sep 7, 2014 06:33 |
|
|
|
https://docs.python.org/2/tutorial/controlflow.html#unpacking-argument-lists
Python code:
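A minimal sketch of the unpacking that page describes (the function names here are just placeholders):

```python
def get_pair():
    return ('foo', 'bar')

def takes_two(first, second):
    return first + second

# The * operator unpacks the returned tuple into two positional args.
result = takes_two(*get_pair())
```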
|
# ? Sep 7, 2014 06:49 |
|
Does anybody have experience with the Yelp API? I'm trying to write a command line utility that queries Yelp, and even though I'm forming requests like http://api.yelp.com/v2/search?&term=pizza&radius_filter=3218&cll=30.2747,-97.740344 I'm getting big UNSPECIFIED_ERROR messages returned. I don't see how much more specific I could be than "give me everything with the word pizza in it that is 2 miles from the state capitol," though, and that error is not even listed on Yelp's official list of error messages. EDIT: Apparently I am a dumb and using &cll= only works when specifying a neighborhood (&location=) value as well. Literally Elvis fucked around with this message at 12:31 on Sep 7, 2014 |
# ? Sep 7, 2014 10:58 |
|
BeefofAges posted:You could also try using ThreadPool, since I'm guessing you're at least partially I/O bound, not CPU bound. ThreadPool has the same interface as Pool, except that it creates threads instead of processes. It's undocumented, for whatever reason. Been playing with the ThreadPoolExecutor from concurrent.futures and it does the trick for IO limited work.
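For reference, a minimal ThreadPoolExecutor sketch (the fetch function is a stand-in for a real network call):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for an I/O-bound call; in real use this would hit the network.
def fetch(url):
    return len(url)

urls = ['http://example.com/a', 'http://example.com/bb']

# executor.map() fans the calls out over worker threads and yields the
# results in input order, much like the Pool/ThreadPool interface.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fetch, urls))
```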
|
# ? Sep 7, 2014 17:30 |
|
If you want to be cool and need to speed up I/O-bound work, look at green threads using something like eventlet or gevent.
|
# ? Sep 7, 2014 17:36 |
|
So I'm just starting to use pandas for the first time ever, and I'm trying to decide between different methods of doing things. Right now I have a single database table of "events" where each row represents one event and has a datetime as well as a number of other attributes associated with it. Right now I'm charting this by manipulating the data in SQL and then just dumping that into a pandas DataFrame before plotting it. However I'm barely using the DataFrame and I'm wondering if it would maybe be faster/saner to just dump the raw data into a pandas DataFrame and use it. Primarily my question is: 1) Given a table like that, can/how would I use pandas to do things like "get the number of events per day based on a particular attribute". Like "100 events for Windows on 2014/09/07" assuming one of the attributes is an OS. 2) Is this likely to be faster than the SQL queries? Right now I have a few queries that are each taking 20+ minutes to complete against a table with 500 million events.
|
# ? Sep 8, 2014 04:17 |
|
1) Just off the top of my head. There may be faster ways. If you're able to use the datetime as an index (e.g., they're unique), you can take advantage of the datetimeindex methods and attributes like date:
Python code:
Python code:
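A hedged sketch of counting events per day per attribute; the column names ('when', 'os') are made up for illustration:

```python
import pandas as pd

# Toy events table; the column names ('when', 'os') are assumptions.
df = pd.DataFrame({
    'when': pd.to_datetime(['2014-09-07 01:00', '2014-09-07 02:00',
                            '2014-09-08 03:00']),
    'os': ['Windows', 'Windows', 'Linux'],
}).set_index('when')

# Count events per calendar day per OS via the DatetimeIndex's .date.
counts = df.groupby([df.index.date, 'os']).size()
```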
|
# ? Sep 8, 2014 06:59 |
|
SurgicalOntologist posted:1) Just off the top of my head. There may be faster ways. If you're able to use the datetime as an index (e.g., they're unique), you can take advantage of the datetimeindex methods and attributes like date: They are not guaranteed to be unique. I'll have to try that other thing! SurgicalOntologist posted:2) As to which is faster, IDK. Can you even fit all that data into memory? Hm, well I have boxes available to me with either 128GB of RAM or 512GB of RAM. The events in this case are every download from PyPI since Jan 02, 2014 to now and I'm trying to figure out what sort of interesting data can be extracted from it.
|
# ? Sep 8, 2014 07:35 |
|
SurgicalOntologist posted:1) Just off the top of my head. There may be faster ways. If you're able to use the datetime as an index (e.g., they're unique), you can take advantage of the datetimeindex methods and attributes like date: Actually going back to this, do indexes have to be unique? E.g. if I have a DataFrame where I did df.set_index("the datetime field"), and the datetime fields aren't unique did I lose data there?
|
# ? Sep 8, 2014 07:37 |
|
I'm not sure. Try it out on some test data. Without checking the docs, my hunch would be that it would work but might make some manipulations impossible. Similar to how a lot of things can get screwed up when you have an unsorted index, or how a lot of mathematical techniques give screwy results if the index isn't evenly spaced. My recommendation given that data set is to generate a PeriodIndex (I think it's called), perhaps every hour or even coarser, and convert those individual entries into counts. It will be much easier to deal with in that kind of form, I would guess, and much faster. Plus you could always go back to the raw events if you needed something more. Actually, given your original question, maybe that was your plan all along.
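A small sketch of that collapse-into-counts idea, using resample() rather than building a PeriodIndex by hand (same effect for this purpose):

```python
import pandas as pd

# Toy event timestamps standing in for the raw download log.
times = pd.to_datetime(['2014-09-08 00:10', '2014-09-08 00:40',
                        '2014-09-08 02:05'])
events = pd.Series(1, index=times)

# Collapse individual events into hourly counts; empty hours become 0.
hourly = events.resample('60min').sum()
```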
|
# ? Sep 8, 2014 09:41 |
|
Pretty sure a relational database will always be faster for this kind of work. Selecting and sorting are what they do best.
|
# ? Sep 8, 2014 11:50 |
|
Would anybody care to glance over the code I wrote for this Yelp scraping project? I am dumb so it has no tests (something I will rectify in the next week or so), but other than that, I'm pretty happy with it. I just want to make sure the code doesn't blatantly stink really hard. Also, this is the first time I've ever had to write a README.md so that was something.
|
# ? Sep 8, 2014 16:45 |
|
Literally Elvis posted:Would anybody care to glance over the code I wrote for this Yelp scraping project? I am dumb so it has no tests (something I will rectify in the next week or so), but other than that, I'm pretty happy with it. I just want to make sure the code doesn't blatantly stink really hard. Also, this is the first time I've ever had to write a README.md so that was something. Why are you installing dependencies via a shell script that calls sudo pip? Better to have a setup.py or requirements.txt.
|
# ? Sep 8, 2014 17:02 |
|
Literally Elvis posted:Would anybody care to glance over the code I wrote for this Yelp scraping project? I am dumb so it has no tests (something I will rectify in the next week or so), but other than that, I'm pretty happy with it. I just want to make sure the code doesn't blatantly stink really hard. Also, this is the first time I've ever had to write a README.md so that was something. A few things:
- I'm 95% sure you don't need the # -*- coding: utf-8 -*- line if this is exclusively python3. this item is worthless
- eliminate_duplicate_results(results) can be just set(results) or list(set(results)) if you still really want it to be a list.
- This kind of slicing in credentials.readline().strip()[15:] is just too silly, you should be using configparser or yaml or something.
- This part code:
- That Business object thing sounds like it's begging to be a namedtuple.
|
# ? Sep 8, 2014 17:48 |
|
BeefofAges posted:Why are you installing dependencies via a shell script that calls sudo pip? Better to have a setup.py or requirements.txt. Symbolic Butt posted:A few things: Symbolic Butt posted:- eliminate_duplicate_results(results) can be just set(results) or list(set(results)) if you still really want it to be a list. I knew this, but when I tried it, it would refuse to work, presumably because the list is of an object?
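That is likely exactly it: set() dedupes by __hash__/__eq__, and instances of a plain class hash by identity, so two objects with identical fields are still "different" to a set. A namedtuple (as suggested above) gets value-based equality and hashing for free; the 'Business' name and fields below are illustrative, not the real class:

```python
from collections import namedtuple

# A namedtuple gets __eq__ and __hash__ for free, so set() can dedupe
# instances by value. 'Business' and its fields here are illustrative.
Business = namedtuple('Business', ['name', 'zip_code'])

results = [Business('Pizza Place', '78701'),
           Business('Pizza Place', '78701'),
           Business('Other Spot', '78702')]

unique = list(set(results))
```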
|
# ? Sep 8, 2014 20:08 |
|
More pandas questions for y'all! Right now I have a dataframe that looks like this: code:
I have the first part of that done using: code:
|
# ? Sep 8, 2014 20:18 |
|
I haven't tested this code, but something like this should work: code:
|
# ? Sep 8, 2014 20:23 |
|
Literally Elvis posted:I struggled a lot with this, frankly. I tried making a setup.py, but it never seemed to work. It's something I'm going to work on. Creating a requirements.txt is as simple as pip freeze > requirements.txt, and then an end user just has to do pip install -r requirements.txt.
|
# ? Sep 8, 2014 20:37 |
|
Literally Elvis posted:These are all things I can look into. 40,000 is the largest value the Yelp API will accept for that particular parameter. The only other magic numbers I found were 1609 (the number of meters per mile), and 3959 (the radius of the earth in miles). I will turn those into variables with proper comments when I get out of work. Python code:
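Something along these lines; the constant names are suggestions, and the values are the ones quoted in the post:

```python
# Named constants instead of magic numbers; the values are from the post.
MAX_RADIUS_METERS = 40000   # largest radius_filter the Yelp API accepts
METERS_PER_MILE = 1609      # meters in a mile, for user-supplied radii
EARTH_RADIUS_MILES = 3959   # Earth's radius, for distance calculations

def miles_to_radius_filter(miles):
    """Convert miles to meters, capped at the API's maximum."""
    return min(miles * METERS_PER_MILE, MAX_RADIUS_METERS)
```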
|
# ? Sep 8, 2014 21:24 |
|
Thermopyle posted:Creating a requirements.txt is as simple as pip freeze > requirements.txt, and then an end user just has to do pip install -r requirements.txt. Also, it's bad form to have a user blind execute a sudo command. Instead, use a requirements.txt and allow the user to define the installation context.
|
# ? Sep 8, 2014 23:13 |
|
Blinkz0rz posted:Also, it's bad form to have a user blind execute a sudo command. Agreed, I've edited the README and generated the requirements.txt (thanks, Thermopyle!).
|
# ? Sep 9, 2014 00:19 |
|
In general, don't store something as a number unless you're going to do math with it. In your case, this applies to zip codes, but I've also seen phone numbers stored as numeric values once or twice in other projects. (I haven't actually checked whether you're converting to an int when getting the data from Yelp, but I did notice the initialization of Business.zip as 00000.)
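A quick illustration of why, with a real leading-zero ZIP:

```python
# Storing a ZIP code as an int silently drops its leading zeros.
as_int = int('00501')   # a real ZIP (Holtsville, NY)
as_str = '00501'

lost = str(as_int)      # '501' -- the original width is gone
```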
Lysidas fucked around with this message at 13:53 on Sep 9, 2014 |
# ? Sep 9, 2014 01:04 |
|
Also, your make_api_call method is waaay more complicated than it needs to be. No need to do readline and stripping to write your own credentials parser when configparser is built into the stdlib.
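A minimal configparser sketch; the section and key names here are hypothetical, not taken from the project:

```python
import configparser

# A hypothetical credentials file; section and key names are made up.
CREDENTIALS = """
[yelp]
consumer_key = abc123
consumer_secret = shhh
"""

config = configparser.ConfigParser()
config.read_string(CREDENTIALS)   # use config.read('path.ini') for a file

consumer_key = config['yelp']['consumer_key']
```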
|
# ? Sep 9, 2014 11:37 |
|
Steampunk Hitler posted:Now I want to be able to drop the installer_version however since there are multiple values there, I'd like to take any row where the only difference is the installer_version column, and sum() the count column. Anyone have an inkling how to do this? It should be Python code:
Python code:
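A hedged sketch of that groupby-and-sum, on a toy frame whose column names are guesses at the described table:

```python
import pandas as pd

# Toy frame mirroring the described table; the column names are guesses.
df = pd.DataFrame({
    'date': ['2014-09-07', '2014-09-07', '2014-09-08'],
    'installer': ['pip', 'pip', 'pip'],
    'installer_version': ['1.4', '1.5', '1.5'],
    'count': [10, 5, 7],
})

# Group on every column except installer_version and sum the counts of
# rows that differ only in that column.
collapsed = df.groupby(['date', 'installer'], as_index=False)['count'].sum()
```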
KernelSlanders fucked around with this message at 16:18 on Sep 9, 2014 |
# ? Sep 9, 2014 14:37 |
|
Lysidas posted:In general, don't store something as a number unless you're going to do math with it. In your case, this applies to zip codes, but I've also seen phone numbers stored as numeric values once or twice in other projects. (I haven't actually checked whether you're converting to an int when getting the data from Yelp, but I did notice the initialization of Business.zip as 00000.) Blinkz0rz posted:Also, your make_api_call method is waaay more complicated than it needs to be. No need to do readline and stripping to write your own credentials parser when configparser is built into the stdlib.
|
# ? Sep 9, 2014 16:03 |
|
Hi y'all, so I just "finished" my first programming thing and I want to have it looked over by smart dudes who know what is up so that you can tell me how extremely dumb I am. I have a pretty limited cs background (cf. none), so I wrote all of this with the help of SO and the scattered qt/pyqt docs, which are terrifying and wouldn't stop referencing c++ implementations of things that I can only partially comprehend. I'm really looking for feedback on project structure, tech choice (are my library choices appropriate etc.), anything that I may have obviously implemented wrong/in a strange way, and things that I have totally missed that I should be doing. I went to school for architecture, I can take feedback for what it is, so please fire away.

As a brief introduction, my goal was to create a cross-platform method for sharing my clipboard between machines while also maintaining a full searchable record of my clipboard (well, the bits I choose to store). I used Django with DRF & Elastic Search to make the API/searchable record of the clipboard. Then I used pyqt4 and requests/pyperclip to make a tray app that lets you either send your current clipboard data to the Django app, or set your clipboard equal to the most recent record for your account from the app. Anything you send is private by default, but you can make the data public and get permalinks to any particular record, so you can use it as a pastebin-lite too I suppose.

As far as things that I know I am bad on go, I feel like my understanding of threading (both in general, and in qt) is really not up to snuff, but I've been struggling to find a good resource for learning how to really deal with them. Additionally, I was running into a segfault on app exit that was really bothering me because I don't know how to debug it. I ended up deciding (after a lot of SO reading) that it was probably the qt exit call executing random-order deletion causing the python interpreter to crash. I "solved" this by telling sip (which I additionally do not fully understand) not to destroy things on exit. This seems to have worked, but also feels really hacky to me, though evidently it is the default behavior in pyqt5. What should I be doing here, if anyone knows? Also, is there any way to debug a segfault like that?

https://github.com/LarryBrid/glimmer-client is the qt client, and https://github.com/LarryBrid/glimmer is most of the code for the Django site. The site is operable at https://glimmer.io/. I probably wouldn't store anything really sensitive on it, because (as should be clear) I am far from an expert and I cannot imagine it is airtight. Please excuse the design/copy and that the download button doesn't do anything yet; I'll be adding links to download the client after I have it vetted and make whatever changes are needed. For now you should just be able to run the client.py file from the glimmer-client repo if you want to play around with it. You get your API key from the Profile page right now. e; I just realized I need to add something to alert people if their API calls are failing, as the console will be hidden normally. The March Hare fucked around with this message at 18:04 on Sep 9, 2014 |
# ? Sep 9, 2014 17:58 |
|
Pudgygiant posted:Legit question (that also seems stupid), I have a function that returns ('foo', 'bar'), what's the easiest way to directly pass that to one that takes 2 args? Make a default arg and test for a tuple? Haven't seen anyone reply to this one yet, but for your specific scenario you would do that with the * operator: code:
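A sketch of both sides of the * operator (the same idea the linked article covers):

```python
def returns_pair():
    return ('foo', 'bar')

def takes_two(first, second):
    return (first, second)

# Calling side: * unpacks the tuple into two positional arguments.
pair = takes_two(*returns_pair())

# Defining side: *args collects any positionals back into a tuple.
def collect(*args):
    return args
```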
http://agiliq.com/blog/2012/06/understanding-args-and-kwargs/ Ghaz fucked around with this message at 03:27 on Sep 10, 2014 |
# ? Sep 10, 2014 03:22 |
|
Let me preface this by saying I'm not a programmer, just a guy trying to automate crap at work... I have a folder with lots of text files. In the text files are account numbers and associated information. The text files are sorted and named by a range of dates. For instance: code:
I have my eyes on a larger program but right now I simply want to be able to provide the script a date, and the script knows which text file to open. If the date falls on the edge, for instance 01/08/2014, then it would open both text files with that date. Any advice on how to tackle this?
|
# ? Sep 10, 2014 10:29 |
|
Hughmoris posted:Let me preface this by saying I'm not a programmer, just a guy trying to automate crap at work... Start by reading the entire file list into memory, parse the start and end date parts of the filename into datetime objects, then find the ones that overlap your target date. Hint: ranges that overlap your target date are the ones where the start date is less than or equal to your target and the end date is greater than or equal to your target. That kind of indexing is the sort of thing pandas does really cleanly, but it can certainly be done just using lists as well.
|
# ? Sep 10, 2014 13:38 |
|
Hughmoris posted:Let me preface this by saying I'm not a programmer, just a guy trying to automate crap at work... I think the following should work: Python code:
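Since the original code block didn't survive, here is a hedged sketch of the idea. The filename scheme below (dates embedded as MM-DD-YYYY) is an assumption; adjust the regex and format string to match your actual names:

```python
import datetime
import re

# Hypothetical filename scheme with start and end dates embedded;
# the real format wasn't shown, so adjust the regex to match yours.
FILENAMES = [
    'accounts_01-01-2014_01-08-2014.txt',
    'accounts_01-08-2014_01-15-2014.txt',
]

def files_for_date(target, filenames):
    """Return every file whose date range contains target (inclusive)."""
    matches = []
    for name in filenames:
        start_s, end_s = re.findall(r'\d{2}-\d{2}-\d{4}', name)
        start = datetime.datetime.strptime(start_s, '%m-%d-%Y').date()
        end = datetime.datetime.strptime(end_s, '%m-%d-%Y').date()
        if start <= target <= end:
            matches.append(name)
    return matches
```

Note that a date on the boundary (like 01/08/2014 here) matches both files, which is the behavior asked for.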
There might be a nicer (more Pythonic) way to do this and I am sure other people will say so if there is.
|
# ? Sep 10, 2014 13:44 |
|
KernelSlanders posted:Start by reading the entire file list into memory, parse the start and end date parts of the filename into datetime objects, then find the ones that overlap your target date. Hint: ranges that overlap your target date are the ones where the start date is less than or equal to your target and the end date is greater than or equal to your target. That kind of indexing is the sort of thing pandas does really cleanly, but it can certainly be done just using lists as well. Jose Cuervo posted:I think the following should work: Thanks for the ideas. Jose, I'll try that out tonight when I'm back in the office. I appreciate you taking the time to write that up. I'm 100% a python beginner so it'll take me a little time to parse that and understand what its doing.
|
# ? Sep 10, 2014 14:09 |
|
I had a question about using DictWriter and couldn't find enough information on how to implement it correctly. I have the following code for filling out my class object: code:
code:
|
# ? Sep 10, 2014 16:08 |
|
theguyoncouch posted:I had a questions over using DictWriter and couldn't find enough information on how to implement it correctly. i have the following code for filling out my class object If you are only updating one column, you can probably just use csv.reader/csv.writer as below: Python code:
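A sketch of that read-rewrite loop; an in-memory string stands in for the real file and the header names are made up:

```python
import csv
import io

# In-memory CSV standing in for the real file; the headers are made up.
SRC = "name,status\nalice,old\nbob,old\n"

reader = csv.reader(io.StringIO(SRC))
out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')

header = next(reader)
writer.writerow(header)
col = header.index('status')

# Copy every row through, rewriting just the one column.
for row in reader:
    row[col] = 'new'
    writer.writerow(row)
```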
|
# ? Sep 10, 2014 16:15 |
|
accipter posted:If you are only updating one columns, you probably can just used csv.reader/csv.writer as below: It is one column in one row though. How would I tell my loop which row I want?
|
# ? Sep 10, 2014 16:48 |
|
There's probably a better way, but: Python code:
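One way to target a single row is to match on a key field while rewriting; the field names below are illustrative:

```python
import csv
import io

# In-memory stand-in for the CSV; the field names are illustrative.
SRC = "name,score\nalice,1\nbob,2\n"

rows = list(csv.DictReader(io.StringIO(SRC)))

# Pick the row to change by matching on a key field, then rewrite all.
for row in rows:
    if row['name'] == 'bob':
        row['score'] = '99'

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=['name', 'score'],
                        lineterminator='\n')
writer.writeheader()
writer.writerows(rows)
```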
|
# ? Sep 10, 2014 17:03 |
|
Jose Cuervo posted:I think the following should work: This worked like a champ, thank you.
|
# ? Sep 11, 2014 09:38 |
|
I keep getting pylint errors: redefining name 'DEFAULT_LIST' from outer scope and invalid argument name 'DEFAULT_LIST'. My problem is I turned the arguments into a list because pylint was telling me not to have too many arguments and I wanted to make it more readable. How can I correct this? Python code:
|
# ? Sep 11, 2014 16:37 |
|
That's less readable. Just declare what your constructor takes explicitly. Either disable that rule in pylint or adjust it to be something sensible. Also if you want default arguments you need to name them, and having a mutable object (like a list) as a default argument is a bad idea. David Pratt fucked around with this message at 16:54 on Sep 11, 2014 |
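On the mutable-default point, the standard fix is a None sentinel:

```python
# A mutable default is created once, at definition time, and shared by
# every call; the usual fix is a None sentinel.
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items
```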
# ? Sep 11, 2014 16:50 |
|
David Pratt posted:That's less readable. Just declare what your constructor takes explicitly. Either disable that rule in pylint or adjust it to be something sensible. Alright, I'll mess with pylint instead. Python code:
|
# ? Sep 11, 2014 17:12 |
|
|
|
Hey, time for new Bokeh! Fun stuffs:
- New widgets like a hands-on table and new examples, plus improvements to the cross filter (still a work in progress)
- Additional abstract rendering capabilities for heat maps and contours when you have millions of points
- Some general feature improvements like multiple axes and some design love for the toolbar
- And finally, we just hit 2k stars on GitHub
As always we are looking for new contributors and team members, so if any of this interests you and you want to get involved in an open source project, drop us a line on GH!
|
# ? Sep 11, 2014 17:28 |