|
FoiledAgain posted:From 2.6 onward, you can do from __future__ import print_function, which is suggested in the Python documentation.
|
# ? Jul 30, 2014 13:28 |
|
|
|
Hoping somebody can help me figure this out. I think once I can get a simple example under wraps, I might be able to figure out everything else. I'm trying to get a little more advanced with scraping: rather than using Selenium with ChromeDriver to handle this problem, I would like to use requests to check pricing on items across various websites. Example item: http://www.luxedecor.com/product/tayse-casual-shag-rectangular-rug-ta8510 http://www.wayfair.com/Tayse-Rugs-Casual-Shag-Rug-8510-Multi-TYX1789.html The same rug is on both sites. I would like to check the price of, say, the 7'10" x 9'10" size on each site and save it so I can compare the two later. Currently I am using an... unpleasant method where I load the page with the Chrome webdriver, wait for the page to load, click on the dropdown, wait to make sure it opens, then click the option and grab the price using XPath. The script is ugly and not the way I envision the final program working, but it was just a quick modification of an existing script I was using, and it works unless the page lags or reloads for some reason. code:
Example LuxeDecor: It looks like it's just the product URL; it isn't sending any modified URL to receive the price update. Can anybody help me figure out how to get the price using something like requests, to avoid physically loading and scraping the page? The items I need to scan will always be static, and I will provide the script with the URL. I just need to figure out how to request the price for a specific size without loading the page and working the dropdowns.
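A common way to do this without a browser: open the page with DevTools, change the size dropdown, and watch the Network tab for the XHR call that returns the new price, then replay that call with requests. A minimal sketch follows. Note that the endpoint URL, the "sku" parameter, and the JSON shape are all hypothetical placeholders, not the real LuxeDecor or Wayfair API; only the price-parsing helper is site-independent.

```python
import re

def parse_price(text):
    """Pull the first dollar amount out of a string like '$1,249.99'."""
    match = re.search(r"\$?(\d+(?:\.\d{2})?)", text.replace(",", ""))
    return float(match.group(1)) if match else None

def fetch_price(endpoint_url, sku):
    """Hit a (hypothetical) pricing endpoint directly instead of driving a browser.

    endpoint_url and the 'sku'/'price' names are assumptions: find the real
    request in the browser's DevTools Network tab when the dropdown changes.
    """
    import requests  # third-party: pip install requests
    resp = requests.get(endpoint_url, params={"sku": sku}, timeout=10)
    resp.raise_for_status()
    return parse_price(resp.json()["price"])

print(parse_price("$1,249.99"))  # 1249.99
```

If the site renders prices server-side with no separate price request, requests alone won't help and you're stuck with a browser-based approach.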
|
# ? Jul 30, 2014 15:30 |
|
In my mission to never use R again, I've been learning matplotlib and pandas. Here's something I haven't been able to work out. Say I make a graph like this: Python code:
|
# ? Jul 31, 2014 14:26 |
|
JayTee posted:In my mission to never use R again I've been learning matplotlib and pandas. Here's something i haven't been able to work out. Say I make a graph like this: The first answer in this Stack Overflow post should get you started with the labels and show you how to interface with the axis more generally: http://stackoverflow.com/questions/11244514/matplotlib-modify-tick-label-text And here is the documentation on the different ways you can interact with an axis: http://matplotlib.org/api/axis_api.html
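A small sketch of the general axis-interface idea from that documentation link: rather than editing label text directly, you can attach a formatter that maps each tick position to a label at draw time (the data and label text here are made up for illustration).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [10, 20, 15])

# A formatter sidesteps the "labels are empty until drawn" issue entirely:
# it is called with each tick position whenever the figure renders.
ax.xaxis.set_major_formatter(FuncFormatter(lambda x, pos: "%.1f units" % x))

fig.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
```

The same pattern works on ax.yaxis, and matplotlib.dates ships ready-made formatters for date axes.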
|
# ? Jul 31, 2014 14:43 |
|
I disregarded that first answer when I came across it yesterday, since it said it didn't work on later versions. In case this helps anyone in the future, to get the actual labels for my chart out I had to do this: code:
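The original code block didn't survive the archive, but the approach that works on later matplotlib versions is along these lines: force a draw first, because get_xticklabels() returns empty strings until the figure has actually been rendered (data values here are made up).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])

# On newer matplotlib, tick label text is only populated after a draw
fig.canvas.draw()
labels = [tick.get_text() for tick in ax.get_xticklabels()]

# Now the list can be edited and pushed back
labels[0] = "start"
ax.set_xticklabels(labels)
fig.canvas.draw()
new_labels = [tick.get_text() for tick in ax.get_xticklabels()]
```
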
|
# ? Jul 31, 2014 15:29 |
|
JayTee posted:I disregarded that first answer when I came across it yesterday since it said it didn't work on later versions. In case this helps anyone in the future, to get the actual labels for my chart out I had to do this: Sorry about that. I glanced quickly and it meshed with what I had used in the past. Glad you got it working.
|
# ? Jul 31, 2014 15:49 |
|
vikingstrike posted:Sorry about that. I glanced quickly and it meshed with what I had used in the past. Glad you got it working.
|
# ? Jul 31, 2014 16:20 |
|
I'm building a report builder for my Django app and could use a little advice. My reports are fairly simple: I accumulate scores of data (easy enough), but then I want to slice the report totals by varying dimensions (date ranges, splits by date/week/month, owners, other metadata, etc.). Since I am working with Django querysets, I have some options for how to query the data. I can pull everything into one queryset with joins and traverse the joins for my accumulating data, or I can take multiple querysets and join them manually in my app, which simplifies the queries somewhat (that optimization might come later when I load-test the app). My data looks something like this: Parent (with useful dimensional metadata) -> Child (with useful dimensional metadata) -> Child of child (accumulating data source, i.e. counts to aggregate). I see some stuff about pandas, and also Anaconda. I took a brief look at both and they definitely sound more hardcore than I need, but then I don't feel like rolling my own axis/dimensional modelling logic if I can build a dataset and have a library do it for me. Which package is recommended for babby's first stats package that can meet my requirements? Ideally one that uses few resources, as I plan to scale this app up quite a bit in production. And for whichever package is recommended, where would I find some good basic tutorials on how to build my dataset and reshape it for reporting? I plan to build charts client-side with HTML5/JS/CSS.
|
# ? Jul 31, 2014 19:00 |
|
Yeah, the matplotlib documentation is far from amazing. There are a lot of great capabilities there, but many of them are well hidden.
|
# ? Jul 31, 2014 19:43 |
|
Never mind, I spent the day learning pandas and I'm pretty happy with it. Does anyone have any advice for feeding multiple tables of data into a DataFrame? Right now I am creating separate Series, one for each reporting dimension, and merging them all into a single DataFrame indexed by the lowest-level child ID. This means I have a fair bit of repetition in the data, as a few of the dimensions are repeated for hundreds of entries per Series. I am getting the results I need by grouping/aggregating and filtering my DataFrame, but I am wondering if there is a better way to do it? I also have another question about the most efficient way to get data out of the DataFrame. Is there an efficient way to pack a grouped result into a dictionary, or do I have to manually iterate and build the result dictionaries?
|
# ? Aug 1, 2014 03:57 |
|
Ahz posted:Does anyone have any advice for feeding multiple tables of data into a Dataframe? Right now I am creating separate Series, one for each reporting dimension and merging them all into a single Dataframe, indexed by the lowest level child ID. This means I have a fair bit of repetition in data as a few of the dimensions are repeated for hundreds of entries per Series. I am getting the results I need by grouping/aggregating and filtering my Dataframe, but I am wondering if there is a better way to do it? I can't quite visualize what you're doing, but you probably shouldn't have repeated data. Do you have more than two dimensions? Maybe consider a Panel, which you can put together from a dictionary of DataFrames (among other ways). Ahz posted:I also have another question about the most efficient way to get data out of the Dataframe. Is there an efficient way to pack a grouped result into a dictionary, or do I have to manual iterate and build results dictionaries? What sort of result dictionaries do you mean? One per row? If so, look into DataFrame.iterrows. Or if it's something else, maybe DataFrame.to_dict?
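A quick sketch of the iterrows and to_dict suggestions, with made-up data in the shape described (dimension columns indexed by child ID). Note that Panel was later removed from pandas entirely, but these two methods still apply.

```python
import pandas as pd

# Toy frame: rows indexed by the lowest-level child ID
df = pd.DataFrame(
    {"owner": ["a", "a", "b"], "count": [1, 2, 3]},
    index=[101, 102, 103],
)

# One dict per row, keyed by index, via iterrows
by_row = {idx: row.to_dict() for idx, row in df.iterrows()}

# Or let pandas do it in one call; several orientations are available
records = df.to_dict(orient="index")

# A grouped/aggregated result also converts directly to a dict
totals = df.groupby("owner")["count"].sum().to_dict()
```
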
|
# ? Aug 1, 2014 04:09 |
|
SurgicalOntologist posted:I can't quite visualize what you're doing but you probably shouldn't have repeated data. Do you have more than 2 dimensions? Maybe consider a Panel, which you can put together from a dictionary of DataFrames (among other ways). I'll quote the query and Dataframe load: Python code:
As I said, I can use the DataFrame fine, but is there something I'm missing where I can have anywhere from 5 unique values in one column to 2000 in another while maintaining the mappings for grouping and filtering? Ahz fucked around with this message at 07:32 on Aug 1, 2014 |
# ? Aug 1, 2014 07:26 |
|
Okay, let's see. It looks like most of that preprocessing is unfortunately necessary, because you are dealing with data in object attributes. However, you could streamline it a bit, with a few different options. My suggestion would be to construct the frame from a dict of lists keyed by content.choice. Then when you construct the frame, use the columns kwarg with a list of column names corresponding to the order of items within each list. That should cut down considerably on all that busywork. To actually answer your question: there is nothing wrong with your approach. It can be convenient to have the values of slow-changing columns repeated many times. For example, I do behavioral experiments, where I might have 200 trials per participant. I have a number of columns for IVs and for DVs. The DVs may have unique values every trial, but the IVs will not. In a between-subjects experiment, for example, each participant would have the same value of an IV on every row (e.g. True or False in a 'control' column). Similarly, I usually ask the participants' age and gender and then stick those values on every row for that participant. This is a very common approach and makes it easier to do analyses combining these repeated values and "normal" columns. However, there are alternatives, depending on how you conceptualize your data. Both of these are more natural if you see any of those repeated values as categories of observations rather than attributes of observations. You could include some of those columns in the index, creating a MultiIndex. In my experiments, for example, I often use a (participant number, block number, trial number) index. Then I can get all of the first participant's trials with df.ix[(1, slice(None), slice(None)), :], for example. But given that you have a natural index in content.choice, this might not be a good choice, unless you don't really care about content.choice and there's a more semantic way to label your data.
Using a Panel might be a better option. A Panel is basically a 3D container of stacked, labeled DataFrames. In my work I sometimes use a Panel with a shape of (number of participants, number of trials per participant, number of data variables), so each participant gets its own DataFrame, essentially. This is a natural approach if the constraints on your data guarantee the same number of rows for each value in one or more of those repeated columns. These different organizations of data are ways of answering the question "wide or long format?", so if you search for wide format vs. long format you can find more examples and discussion. Without trying to guess too much based on your column names, you might want to put content_q_id in an index (either the row index as part of a MultiIndex, or the Panel index for the "items" dimension). Your previous post mentioned getting data out of the frame into a dict; what you are trying to accomplish with that is relevant to deciding the best organization. If you organize properly, you probably don't need to go back to a dict (unless you have a different reason for doing so, like interfacing with a different library). SurgicalOntologist fucked around with this message at 20:42 on Aug 1, 2014 |
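A small sketch of the MultiIndex idea with made-up experiment-style data. One caveat for later readers: df.ix was removed from pandas, and .loc/.xs are the modern spellings of the same selections (Panel was removed too, so the MultiIndex route is now the standard one).

```python
import pandas as pd

# Long-format data: participant and trial live in the index, measures in columns
df = pd.DataFrame(
    {
        "participant": [1, 1, 2, 2],
        "trial": [1, 2, 1, 2],
        "rt": [0.41, 0.39, 0.52, 0.48],
    }
).set_index(["participant", "trial"])

# All of participant 1's trials (the post's df.ix example, modern spelling)
p1 = df.loc[1]

# Cross-sections select on any named index level
trial_2 = df.xs(2, level="trial")
```
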
# ? Aug 1, 2014 20:36 |
|
I thought about the multi-indexing option. Granted, I haven't fully wrapped my head around it yet, but it seems like just another way of getting the same result without a ton of benefit for my use case (mapping multiple data sources across a common index). I was just curious whether my method was wildly inefficient before I move on to my next problem. If it's a wash either way and memory usage is roughly equivalent (it seems multi-indexing would still need to track the one-to-many mapping for each child of the lowest level), then I'll just keep this as-is. I asked about going to a dictionary with my grouped/filtered result because ultimately I'll be feeding the data to a Django template for report processing on the front-end, and I figure that's probably one of the easiest data structures to process in a template. I'll play with the data and see. I'm actually considering dropping down to raw SQL instead of preprocessing the data for my dict; do you think it's worthwhile?
|
# ? Aug 1, 2014 22:49 |
|
Are you running out of memory? If not, then memory usage isn't relevant. It's all relative. You're not doing anything particularly wrong, but if you dropped the tabular organization (i.e. stuck with SQL) you would probably use less memory. All the considerations I was discussing were about "what is the most intuitive way to organize my data", not "what is the most efficient way to accomplish the task". What are you using the DataFrame for, anyway? As in, what's your overall goal? If it's just to group and filter, then I wouldn't bother with any non-stdlib data structures. Maybe check out the toolz library and some of its tutorials for ideas on relatively simple non-mathematical data processing with standard-library data structures. If I didn't have to do statistics, I'd probably do all my data analysis in the functional style promoted by toolz.
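For the group-and-filter case, the same style works with nothing but the standard library (toolz just makes it more composable). A sketch with made-up rows shaped like the reporting data in this thread:

```python
from collections import defaultdict

rows = [
    {"owner": "a", "count": 1},
    {"owner": "a", "count": 2},
    {"owner": "b", "count": 3},
]

# group-and-aggregate with plain dicts: no pandas required
totals = defaultdict(int)
for row in rows:
    totals[row["owner"]] += row["count"]

# filtering is just a comprehension
big = {owner: total for owner, total in totals.items() if total >= 3}
```

The result is already a dict, so there's no conversion step before handing it to a Django template.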
|
# ? Aug 1, 2014 23:00 |
|
I think you're right, in a way. I like the flexibility pandas has for shifting and grouping data on the fly, but my app only has about 12 dimensions to track and report on. Given that, I did some testing with raw SQL and shaved the function I posted above from 1.7s to 0.03s in PyCharm. I might just hard-code each dimension into raw queries. Ultimately I'm concerned about memory usage because I'm building a live reporting app, and I know that hitting the operational DB for reporting could have scaling issues when I launch. I'll consider doing some ETL if my app is successful in the market, but until then I just want a moderately efficient way to report on production operational data and do some simple OLAP transformations on the fly.
|
# ? Aug 1, 2014 23:37 |
|
For my current project I'm finding myself writing a series of classes that look like this:Python code:
I noticed the similarity to namedtuples, and I am wondering if this is the sort of situation that calls for metaclasses. On one hand, it's working now, and from what I've read, you shouldn't use metaclasses unless you've exhausted the simpler alternatives. On the other hand, it's tempting to define all these class attributes in a dict, or even a JSON or YAML file, and then loop over it to construct the classes. That would require a metaclass, no? Or at least a "class factory" function. It would also require namespace hacking, I think, which suggests to me that it's a bad idea. Where did I go wrong, then? What's my best approach here?
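For the record, generating classes from a spec dict needs only a class factory built on type(), not a metaclass and no namespace hacking. A sketch under assumptions: the Device base class, make_device_class helper, and attribute names here are invented for illustration (only the vrpn_Tracker_LibertyHS string comes from later in the thread).

```python
class Device:
    """Hypothetical base class holding the shared behavior."""
    vrpn_name = None
    argument_names = ()

    def config_line(self, *args):
        # Illustrative shared behavior: render a config-file line
        return " ".join([self.vrpn_name] + [str(a) for a in args])

def make_device_class(name, vrpn_name, argument_names):
    """type(name, bases, namespace) is the class-factory primitive."""
    return type(name, (Device,), {
        "vrpn_name": vrpn_name,
        "argument_names": tuple(argument_names),
    })

# The spec could just as easily be loaded from JSON or YAML
SPECS = {
    "LibertyLatus": ("vrpn_Tracker_LibertyHS", ["num_markers", "baudrate"]),
}

classes = {n: make_device_class(n, *spec) for n, spec in SPECS.items()}
LibertyLatus = classes["LibertyLatus"]
```

A metaclass only becomes necessary if you need to customize how *every* subclass is constructed implicitly; an explicit factory like this is the simpler alternative the "don't use metaclasses" advice points at.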
|
# ? Aug 3, 2014 00:13 |
|
Why even have subclasses when the important behaviour is in the parent class? Why not just pass the fields you specialise with to the constructor?
|
# ? Aug 3, 2014 00:54 |
|
Right. Because there are certain combinations of inputs that will always be used together, and never varied, and I can say that categorically. These classes implement a protocol to talk to hardware devices. For example, the Polhemus Liberty Latus is a device that is recognized by the string vrpn_Tracker_LibertyHS and takes two arguments: the number of markers and the baud rate. No one will remember what arguments it takes or exactly what string is needed to define it. If people have to look these things up every time or copy-paste someone else's code, I'll be murdered in my sleep. That's why a namedtuple-esque solution would be nice.
|
# ? Aug 3, 2014 02:04 |
|
Why do you have a superclass with behavior that is specific to how a subclass has defined a configuration parameter? Shouldn't that behavior be defined in the subclass instead? It makes no sense to have 'vrpn_Tracker_LibertyHS' behavior in the superclass; put that code in the subclass instead. Your superclass should have code that is general enough to apply to all of its subclasses. It shouldn't have code that is specific to one subclass; that code belongs in the subclass.
|
# ? Aug 3, 2014 05:54 |
|
QuarkJets posted:Why do you have a superclass with behavior that is specific to how a subclass has defined a configuration parameter? Shouldn't that behavior be defined in the subclass instead? It makes no sense to have 'vrpn_Tracker_LibertyHS' behavior in the superclass, put that code in the sublcass instead There is no behavior specific to vrpn_Tracker_LibertyHS in the superclass. The whole thing is an interface to a C library; all the device-type-specific behavior is in there. All my code does with those subclass-specific class attributes is write the configuration to a text file (which is used to launch a server with the C library). Here, instead of explaining it, I'll post the relevant part of the implementation I have now: Python code:
SurgicalOntologist fucked around with this message at 08:22 on Aug 3, 2014 |
# ? Aug 3, 2014 07:51 |
|
So I'm dorking my way through reworking some stuff into Python for learning, and I'm confused by something involving classes. code:
code:
|
# ? Aug 3, 2014 19:10 |
|
Shugojin posted:So I'm dorking my way through reworking some stuff into python for learning and I'm confused by something involving classes. First of all, if nothing changes in the __init__, you don't have to define it at all. If you do want to change some little thing, it is commonly done like this (assuming Python 3). Python code:
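The code block was lost in the archive, but the common pattern being described looks like this (class and attribute names invented for illustration): call the parent's __init__ via super(), then add the subclass's extra setup.

```python
class Base:
    def __init__(self, name):
        self.name = name
        self.tags = []

class Child(Base):
    def __init__(self, name, color):
        # Let the parent do its setup first, then add the extra attribute
        super().__init__(name)
        self.color = color

c = Child("widget", "red")
```

If Child needed no extra attributes at all, you would simply omit its __init__ and inherit Base's unchanged.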
|
# ? Aug 3, 2014 19:20 |
|
Oh okay. That's doing what I need it to, thanks! (I ordinarily do CamelCase my names for things, I just screwed up my fake code )
|
# ? Aug 3, 2014 19:46 |
|
Don't use CamelCase for anything but classes, though.
|
# ? Aug 3, 2014 19:52 |
|
Clarifying on the style rules: just the definition and not the name of whatever instance of the class, right?
|
# ? Aug 3, 2014 20:19 |
|
Shugojin posted:Clarifying on the style rules: just the definition and not the name of whatever instance of the class, right? Yup, CamelCase for class names, snake_case for variable and method names.
|
# ? Aug 3, 2014 20:54 |
|
Incidentally while completely refactoring some really old code I had written I found that converting it from camelCase to snake_case one bit at a time served as a good indicator for telling me which parts of the code I had looked at already.
|
# ? Aug 3, 2014 21:04 |
|
In pandas, what do I do if gcf().autofmt_xdate() doesn't seem to be working properly? I have dates in the format 2014-05-02T17:00:00, and I want them plotted as the x-axis against multiple y values. edit: I think I got this working properly. the fucked around with this message at 16:50 on Aug 4, 2014 |
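For later readers who hit the same thing: autofmt_xdate only helps if the axis actually holds datetimes. Parsing the ISO strings with pd.to_datetime first usually fixes it. A sketch with made-up y values:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

raw = ["2014-05-02T17:00:00", "2014-05-02T18:00:00", "2014-05-02T19:00:00"]

# A real DatetimeIndex, not strings, is what the date formatting machinery needs
df = pd.DataFrame({"y1": [1, 2, 3], "y2": [3, 2, 1]},
                  index=pd.to_datetime(raw))

ax = df.plot()             # both y columns against the datetime x-axis
ax.figure.autofmt_xdate()  # now has date ticks to rotate/format
```
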
# ? Aug 4, 2014 16:15 |
|
Can you help a noob make this work? Don't mind that this is Django; the problem is purely Python-related as far as I can tell. Also, I'm mainly a Java programmer, and it may show. First, I tried to make two of the classes below Python-abstract, but failed, apparently because the Django Model isn't abstract, so I settled for Django-abstract. That unicode stuff in the second and third classes looks rather silly now that I think about it, but super doesn't exactly want to cooperate in a more reasonable implementation (like a regular, non-overridden method). Python code:
|
# ? Aug 4, 2014 21:36 |
|
supermikhail posted:stuff What is the problem you're having? After a quick look, I see that you're calling super() without arguments, which doesn't work on Python 2.
|
# ? Aug 4, 2014 22:31 |
|
It says it wants a parameter for super(). More specifically: "TypeError: super() takes at least 1 argument (0 given)"
|
# ? Aug 4, 2014 22:42 |
|
supermikhail posted:It says it wants a parameter for super(). Did you look at the docs for super? You'll need to provide the class and an instance, namely AdditionProblem and self. Additionally, __unicode__ doesn't take an argument, so you can't do that. Like so: Python code:
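The posted code didn't survive the archive, but the shape of the fix looks like this (the Django models are stripped out; MathProblem and the difficulty attribute are invented stand-ins for the real fields):

```python
class MathProblem(object):
    def __init__(self, difficulty):
        self.difficulty = difficulty

class AdditionProblem(MathProblem):
    def __init__(self):
        # Python 2 spelling: super(ClassName, self).
        # On Python 3, a bare super().__init__(3) does the same thing.
        super(AdditionProblem, self).__init__(3)

    def __unicode__(self):  # no extra parameters, just self
        return u"addition (difficulty %d)" % self.difficulty

p = AdditionProblem()
```
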
|
# ? Aug 4, 2014 23:10 |
|
Sorry, I guess I've tried to get by on examples, plus the official docs have always looked too arcane to me, so I've left them as a last resort (in fact, I didn't have them bookmarked until now). What does "unbound" mean in the case of super with an omitted second argument? It occurred to me that I can do TwoOperandProblem's unicode with a wildcard (I'm not sure how appropriate that term is here), then replace it with the concrete operator in a subclass. Does that sound alright?
|
# ? Aug 5, 2014 09:17 |
|
I have a technical interview for a Python position today at 4:30.
|
# ? Aug 5, 2014 16:25 |
|
supermikhail posted:Sorry, I guess I tried to get by on examples, plus the official docs have always looked too arcane to me, so I've left them as a last resort (in fact I didn't have them bookmarked until now). What does "unbound" mean in the case of super with omitted second argument? The Python docs, while flawed, are among the better language docs I've used. Just try using them a while. Can you return a formattable string from TwoOperandProblem's __unicode__? Something like this stupid and contrived example: Python code:
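The contrived example itself was lost in the archive; here is a sketch of the idea being suggested (the operand attributes and method names are invented): the parent renders a template with a placeholder, and each subclass fills in its concrete operator.

```python
class TwoOperandProblem(object):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def template(self):
        # "{op}" is a placeholder for the subclass's operator
        return u"%d {op} %d" % (self.left, self.right)

class AdditionProblem(TwoOperandProblem):
    def __unicode__(self):
        return self.template().format(op="+")

text = AdditionProblem(2, 3).__unicode__()
```

This avoids the super() dance entirely: the parent's method is just called normally, not overridden and re-invoked.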
|
# ? Aug 5, 2014 16:40 |
|
the posted:I have a technical interview for a Python position today at 4:30. Well, good luck!
|
# ? Aug 5, 2014 17:01 |
|
Thermopyle posted:The Python docs, while flawed, are among the better language docs I've used. Just try using them awhile. Oh, god! I thought format worked like a regular Java printf, and that turned the output into something interesting. Actually, the "{}" are supposed to indicate a range, so that it would print "{1:100} + {1:100}" when it means "add anything from 1 to 100 to anything from 1 to 100". In terms of code: Python code:
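The collision described here is that str.format treats every {...} as a replacement field. To get literal braces in the output, they must be doubled; a minimal sketch of both forms:

```python
# Doubled braces come out as literal braces in str.format output
text = "{{1:100}} + {{1:100}}".format()

# Equivalently, substitute the bounds in while keeping the literal braces
low, high = 1, 100
text2 = "{{{0}:{1}}} + {{{0}:{1}}}".format(low, high)
```

Both produce the string "{1:100} + {1:100}", which is what the range notation above intends.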
|
# ? Aug 5, 2014 17:41 |
|
I'm grabbing some items from a query and importing them into a pandas DataFrame. It looks like this: code:
I've tried:
' CloseDate Id'
'CloseDate Id'
' CloseDate'
'CloseDate'
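When column selection fails like this, the usual culprit is invisible whitespace in the column names. A sketch of the standard diagnosis and fix (the column names here mirror the post; the data is made up):

```python
import pandas as pd

# Simulate a query result whose column names carry stray leading spaces
df = pd.DataFrame([[7, "2014-08-05"]], columns=[" CloseDate Id", " Amount"])

# repr() makes leading/trailing spaces visible in a way plain printing doesn't
exact = [repr(c) for c in df.columns]

# Strip the whitespace once, then use clean names everywhere after
df.columns = df.columns.str.strip()
col = df["CloseDate Id"]
```
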
|
# ? Aug 5, 2014 17:53 |
|
|
|
the posted:I'm grabbing some items from a query and importing them into a pandas dataframe. Have you tried df['CloseDate Id']?
|
# ? Aug 5, 2014 17:58 |