JVNO posted: Wow, great responses and super quick. Unfortunately the responses aren't easily applied to my own program, and I decided instead to rebuild the program in a way that obviated the need for removal. This is an interesting problem that's more difficult than it appears. I spent a little bit of time fiddling with it and wasn't able to find a solution that preserved 'true' randomness (i.e., all valid strings are equally likely given the limits of the PRNG) that wasn't O(n²) or worse. If it's not sensitive, would you mind showing me what you end up with?
# ? Mar 3, 2018 00:39 |
# ? May 27, 2024 19:10 |
I'm trying to pass a JSON file to an Amazon DynamoDB table and it's turning up its nose at it. To wit: code:
code:
code:
code:
code:
# ? Mar 3, 2018 02:13 |
|
json.loads is turning your pandas json file into a nested dict. You're getting yelled at because "collector_key" is in fact a key whose value is a dict, and you can't turn a dict into an int. E: It looks like your code is assuming that you have a collector key for each number, but what you actually have is one collector key with a bunch of numbers. E2: But that still wouldn't work with your code now, because you're accessing "collector_key" from the same outer dict that you are trying to iterate over. Dr Subterfuge fucked around with this message at 02:44 on Mar 3, 2018 |
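To make the nested structure concrete, here is a small sketch using the shape of the `collector_key` data quoted in this thread; the `sales` column and its numbers are hypothetical. `json.loads` gives a dict of dicts keyed first by column and then by index label, so pairing values means walking the shared index labels rather than iterating over the outer dict.

```python
import json

# Shape produced by DataFrame.to_json() with its default orient ("columns"):
# {column_name: {index_label: value}}. The sales values are hypothetical.
raw = (
    '{"collector_key": {"0": -1, "1": -1, "2": 139517343969},'
    ' "sales": {"0": 5, "1": 7, "2": 12}}'
)

data = json.loads(raw)  # outer dict: one entry per column, not per row


def to_rows(columns):
    """Pair each collector_key with the sales value that shares its index label."""
    keys = columns["collector_key"]
    sales = columns["sales"]
    return [(keys[label], sales[label]) for label in keys]


rows = to_rows(data)
print(rows)
```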
# ? Mar 3, 2018 02:34 |
|
Wait, I just realized that the index (or row number, whatever you want to call it) is being included in "collector_key": {"collector_key":{"0":-1,"1":-1,"2":139517343969,"3":-1,"4":-1...} All those index keys ("0", "1", "2", ...) are totally superfluous. I wonder if removing them (if possible) will fix the problem.
|
# ? Mar 3, 2018 03:20 |
|
Seventh Arrow posted: Wait, I just realized that the index - or row number, whatever you want to call it - is being included in "collector_key": No, this is not what you need to do. to_json is not storing "superfluous" data, but it's also not storing the data in the format you need. When you look at a JSON file you can read [] and {} just like you do in Python. When you call json.load on the file, [] becomes a list and {} becomes a dict. What's going on is you have a table of data like: code:
code:
Pandas .to_json stores the data differently though. to_json stores it like this: code:
For what you are trying to do I think the "json" part is leading you down a garden path. You need to do something like this: code:
|
# ? Mar 3, 2018 11:14 |
|
Thanks for your reply. I was just thinking of this, and maybe it would be possible to read each row directly from the dataframe and not even bother with the json file? I'm not sure (yet) how to call a dataframe row by row, but maybe it's better to eliminate the extra step of going to a file in the first place.
|
# ? Mar 3, 2018 14:16 |
|
Seventh Arrow posted: I'm not sure (yet) how to call a dataframe row by row That's what iterrows does.
|
# ? Mar 3, 2018 14:26 |
|
I guess I'll have to look it up, but doesn't pandas use iloc (or something) to call on a given row? Anyways, I'll give your code a try. Many thanks!
|
# ? Mar 3, 2018 14:43 |
|
Check the orient argument of to_json; pandas offers several different ways to organize the JSON ("split", "records", "index", "columns", "values", and in newer versions "table"). https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html
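A quick sketch of what the `orient` argument changes, using a tiny DataFrame with the thread's `collector_key` column plus a hypothetical `sales` column. The default ("columns") gives the nested dict-of-dicts that caused the trouble; `orient="records"` gives one dict per row, which is much closer to the one-item-per-row shape a database insert wants.

```python
import json

import pandas as pd

# Hypothetical two-row table; only the column names come from the thread.
df = pd.DataFrame({"collector_key": [-1, 139517343969], "sales": [10, 20]})

# Default orient ("columns"): {column: {index_label: value}}
by_column = json.loads(df.to_json())

# orient="records": a list with one {column: value} dict per row
records = json.loads(df.to_json(orient="records"))

for record in records:
    print(record["collector_key"], record["sales"])
```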
|
# ? Mar 3, 2018 15:22 |
|
One is for indexing one is for iterating.
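To illustrate the indexing-versus-iterating distinction, a minimal sketch on a hypothetical two-row DataFrame: `iloc` fetches one row by integer position, while `iterrows` walks every row as `(index_label, row)` pairs.

```python
import pandas as pd

# Hypothetical data; column names follow the thread.
df = pd.DataFrame({"collector_key": [-1, 139517343969], "sales": [10, 20]})

# Indexing: iloc picks a single row by integer position and returns a Series.
second_row = df.iloc[1]
print(second_row["collector_key"])

# Iterating: iterrows yields (index_label, row_as_Series) for every row.
pairs = [(row["collector_key"], row["sales"]) for _, row in df.iterrows()]
print(pairs)
```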
|
# ? Mar 3, 2018 15:22 |
|
I have what might be a dumb question which I've tried to explain comprehensibly with questionable success, but hopefully someone can point me in the right direction. I'm dealing with anywhere from a hundred to a few thousand objects in a list (or not in a list, but the order of the objects is relevant), each of which has ~five attributes which I'll call object.a through object.e for simplicity's sake. What I need to do is find certain patterns of objects with certain attributes: for example, I might need any two objects that do not have the same value for object.a, do have the same value for object.b, and are not separated by any objects that have one of a particular set of values for object.d. There are 30 or so patterns I need to match, many of which don't have a fixed upper bound for how many objects can fit in different positions within the pattern, and many of which require matching the values of objects to each other or the number of objects fulfilling one condition to the number fulfilling another condition (ex: any number of consecutive objects with the same .a value followed by the same number of objects with the same .b value). I could write an individual function to handle each pattern individually, but I'd rather not for obvious reasons, and I can imagine performance becoming a nightmare pretty quickly. Basically, I need the functionality of regex but for objects and across multiple attributes. What's the most reasonable way to approach this?
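For a sense of scale, here is a brute-force sketch of just the first example pattern (different `.a`, same `.b`, no blocking `.d` value in between). The `Obj` class and its three fields are hypothetical stand-ins for the real objects. It is exactly the kind of per-pattern function the question hopes to avoid, but it shows the scan structure: for each start position, walk forward and stop as soon as a blocker appears. Worst case is O(n²), which is still only a few million comparisons for a few thousand objects.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    # Hypothetical stand-in: only the attributes this pattern needs.
    a: str
    b: str
    d: str


def find_pairs(objs, blocked_d):
    """Return (i, j) pairs with objs[i].a != objs[j].a and objs[i].b == objs[j].b,
    where no object strictly between i and j has a .d value in blocked_d."""
    matches = []
    for i, first in enumerate(objs):
        for j in range(i + 1, len(objs)):
            second = objs[j]
            if first.a != second.a and first.b == second.b:
                matches.append((i, j))
            if second.d in blocked_d:
                # objs[j] may be an endpoint, but it separates i from anything later.
                break
    return matches


objs = [Obj("x", "p", "ok"), Obj("y", "p", "ok"), Obj("z", "p", "stop"), Obj("w", "p", "ok")]
print(find_pairs(objs, {"stop"}))
```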
|
# ? Mar 3, 2018 21:08 |
|
Wouldn't it be possible to do a regex with an if/elif/else setup?
|
# ? Mar 3, 2018 21:39 |
|
What are those attributes? Strings? Floats? Other classes?
|
# ? Mar 3, 2018 22:02 |
|
Eela6 posted: This is an interesting problem that's more difficult than it appears. I spent a little bit of time fiddling with it and wasn't able to find a solution that preserved 'true' randomness (i.e., all valid strings are equally likely given the limits of the PRNG) that wasn't O(n²) or worse. I'll post the full solution after bugfixing, but I'll be pretty honest here: I'm kind of brute-forcing it with 'if' handlers for every legal sequence order, with enough artificial variability to appear random. So the placement of L2T and L4T is simple enough: code:
|
# ? Mar 3, 2018 22:22 |
|
I've seen this before, why would you do "from random import *" instead of just "import random"?
|
# ? Mar 3, 2018 22:31 |
To make the rest of your code more opaque and confusing, duh.
|
|
# ? Mar 3, 2018 22:59 |
|
It puts everything from the imported module into your global namespace, so you can do things like call shuffle() directly instead of calling random.shuffle(). In practice, its advantage is that it cuts down on typing. Maybe there are other reasons to do it that I am not aware of. It's generally not a good idea, though, because it imports everything implicitly, which makes it harder to understand where something like shuffle is defined, and it can cause hidden conflicts if you have something else with the same name in your global namespace. You can get the same behavior more explicitly with "from random import shuffle", and you only get what you want.
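A short sketch contrasting the qualified and explicit styles, plus one concrete example of the hidden-conflict problem. The module's `__all__` list (the names a wildcard import copies) includes the name `random` itself, so `import random` followed by `from random import *` would rebind `random` from the module to the function `random.random()`, and a later `random.shuffle(...)` would then fail.

```python
import random
from random import shuffle  # explicit: only the name `shuffle` is added here

cards = list(range(10))
random.shuffle(cards)  # qualified call: obviously comes from the random module
shuffle(cards)         # the same function, reached through the explicit import

# `from random import *` would copy every name listed here into this namespace,
# including "random", which would shadow the module of the same name.
print("random" in random.__all__)
```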
|
# ? Mar 3, 2018 23:06 |
|
Seventh Arrow posted: I've seen this before, why would you do "from random import *" instead of just "import random"? Functionally: Dr Subterfuge posted: It puts everything from the imported package into your global namespace, so you can do things like call shuffle() directly instead of calling random.shuffle(). But here's the Overly Honest Methods answer: The 'import' section of the code is part of a multiple-generations-old experiment done by a long-graduated PhD that I have inherited and modified as necessary, and, well, 'if it ain't broke...'
|
# ? Mar 3, 2018 23:09 |
|
Seventh Arrow posted: Wouldn't it be possible to do a regex with an if/elif/else setup? That's the first approach that came to mind, but it seems pretty cumbersome. QuarkJets posted: What are those attributes? Strings? Floats? Other classes? Strings mostly; a few are booleans. JVNO posted: I've completed most of the coding for the L8T trials... But I'm still dealing with a few bugs. Given that it doesn't need to be truly random, it seems like it would be easier to create a function that finds all valid indices where a given form can be inserted and then have it pick a random one, although you would create configurations that would be impossible to complete some percentage of the time. Edit: Like this, but less poo poo/lazy, probably with some if statements for when find_spot finds no valid positions, and maybe even some randomness in the order of insertion (it's been a long day, but this seems to work correctly): Python code:
Wallet fucked around with this message at 01:17 on Mar 4, 2018 |
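Wallet's code isn't preserved here, so this is only a minimal sketch of the idea described (collect every valid insertion index, then pick one at random). The real L2T/L4T/L8T constraints aren't shown, so it assumes a hypothetical rule that no two identical items may end up next to each other; `valid_spots` and `build_sequence` are made-up names. Because greedy insertion can paint itself into a corner, it restarts from scratch on a dead end and gives up after a fixed number of attempts. Like the original suggestion, it does not make every valid sequence equally likely.

```python
import random


def valid_spots(seq, item):
    """Indices where `item` can be inserted without touching an identical item
    (hypothetical rule standing in for the real trial constraints)."""
    spots = []
    for i in range(len(seq) + 1):
        left = seq[i - 1] if i > 0 else None
        right = seq[i] if i < len(seq) else None
        if item != left and item != right:
            spots.append(i)
    return spots


def build_sequence(items, rng=random, max_attempts=100):
    """Insert items one at a time at a random valid index; restart if stuck."""
    for _ in range(max_attempts):
        seq = []
        order = list(items)
        rng.shuffle(order)  # randomise the insertion order as well
        for item in order:
            spots = valid_spots(seq, item)
            if not spots:
                break  # dead end: no legal position left for this item
            seq.insert(rng.choice(spots), item)
        else:
            return seq  # every item was placed
    raise RuntimeError("could not build a valid sequence")


print(build_sequence(["A", "A", "B", "B", "C"], rng=random.Random(0)))
```

Passing a seeded `random.Random` instance makes a given sequence reproducible; running it many times in a loop is a cheap way to check how often the retry path is ever needed.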
# ? Mar 4, 2018 00:18 |
|
Welp, this is the best I could come up with after hours and hours of coding and troubleshooting:Python code:
quote: Given that it doesn't need to be truly random, it seems like it would be easier to create a function that finds all valid indices where a given form can be inserted and then have it pick a random one, although you would create configurations that would be impossible to complete some percentage of the time. I need to test this for my own purposes, but if this works, it's much more elegant. I have a lot yet to learn about Python. Edit: Welp, that's a story as old as time. Spend days working on a piece of code only to have a much simpler solution presented after you finally figure it out. That version works and is a hell of a lot better than my code. Hope you don't mind me yanking that for my experiments? PoizenJam fucked around with this message at 01:33 on Mar 4, 2018 |
# ? Mar 4, 2018 01:21 |
|
JVNO posted: Edit: Welp, that's a story as old as time. Spend days working on a piece of code only to have a much simpler solution presented after you finally figure it out. That version works and is a hell of a lot better than my code. Hope you don't mind me yanking that for my experiments? Go for it, happy to help. Just mind that I think it's theoretically possible for it to get itself into a state where it can't finish, which you might want to account for.
|
# ? Mar 4, 2018 04:08 |
|
Wallet posted: Go for it, happy to help. Just mind that I think it's theoretically possible for it to get itself into a state where it can't finish, which you might want to account for. I ran 100,000 iterations of your list generation with no errors, so I'll take my chances. I think the single items provide enough degrees of freedom for list ordering that it's impossible to generate an invalid list set.
|
# ? Mar 4, 2018 06:07 |
|
JVNO posted: I ran 100,000 iterations of your list generation with no errors, so I'll take my chances. Fair enough; I wasn't sure if the distribution was always the same or not, and I was also too lazy to test it 100,000 times.
|
# ? Mar 4, 2018 13:20 |
|
Anyone here use Pandas to generate reports for end-users? If so, what does your workflow look like? I'm stuck a bit in the middle where my current process is to use Python to do data cleanup but then I load the data in an Excel workbook for charts and pivot tables to share with users.
|
# ? Mar 4, 2018 22:53 |
|
What type of reports are you thinking? Out of what you describe, the logical addition would be matplotlib/seaborn to plot figures in python.
|
# ? Mar 4, 2018 22:57 |
|
There are also ways to automate Excel file creation from python if you haven't already gone that route.
|
# ? Mar 4, 2018 23:01 |
|
vikingstrike posted: What type of reports are you thinking? Out of what you describe, the logical addition would be matplotlib/seaborn to plot figures in python. I work in healthcare, and my current report goes to department managers and shows staff compliance for documentation of a certain procedure. The vast majority of managers are not technical, but they are comfortable enough to open the Excel workbook I email them and at least look at the first chart, which shows how their department is doing against the hospital. If there is a way to paste an image inline in Outlook 2013, I've thought about removing the workbook entirely and generating an email for each department with the charts and table pasted inside the email body. Basically, I'm trying to spoon-feed the end users as much as possible to make their lives easier.
|
# ? Mar 4, 2018 23:04 |
|
You can send email using Python and write it so that the charts show up inline in the message body. It's been a while since I've done this, but it should be easy to automate: data cleaning -> figure generation -> email.
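A minimal sketch of the "email with an inline chart" step using only the standard library's `email` package, following the pattern in the Python docs' email examples: an HTML body references the image by Content-ID, and the PNG is attached as a related part so that clients which render HTML mail generally show it inline. The addresses, subject, body text, and the `build_report_email` name are hypothetical; the PNG bytes would come from whatever makes the chart (for example matplotlib's `fig.savefig(buf, format="png")` into an `io.BytesIO`).

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_report_email(sender, recipient, subject, chart_png):
    """Build a multipart message whose HTML body shows `chart_png` inline."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Plain-text fallback for clients that do not render HTML.
    msg.set_content("This month's compliance chart is included in the HTML version.")

    chart_cid = make_msgid(domain="example.com")  # looks like "<...@example.com>"
    # The HTML refers to the image by Content-ID, without the angle brackets.
    msg.add_alternative(
        "<html><body><p>Compliance this month:</p>"
        f'<img src="cid:{chart_cid[1:-1]}"></body></html>',
        subtype="html",
    )
    # Attach the PNG to the HTML part (turning it into multipart/related).
    msg.get_payload()[1].add_related(chart_png, "image", "png", cid=chart_cid)
    return msg


msg = build_report_email("reports@example.com", "manager@example.com",
                         "Documentation compliance", b"\x89PNG placeholder bytes")
# Sending (hypothetical host):
# import smtplib
# with smtplib.SMTP("smtp.example.com") as server:
#     server.send_message(msg)
```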
|
# ? Mar 4, 2018 23:07 |
|
Python question, how do you deal with doing something twice? I'm making a game of self-playing Blackjack and have the following...Python code:
|
# ? Mar 5, 2018 01:27 |
|
I would use a for loop to do the thing twice.
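A sketch of the for-loop suggestion applied to the opening deal: two passes, each giving one card to the player and then one to the dealer. The original Blackjack code isn't preserved, so the deck representation (rank strings only, suits ignored) and the `new_deck`/`deal_opening_hands` names are hypothetical.

```python
import random

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]


def new_deck(rng=random):
    """52 rank strings (four of each rank); suits are ignored in this sketch."""
    deck = [rank for rank in RANKS for _ in range(4)]
    rng.shuffle(deck)
    return deck


def deal_opening_hands(deck):
    player, dealer = [], []
    for _ in range(2):  # the "do it twice" part
        player.append(deck.pop())
        dealer.append(deck.pop())
    return player, dealer


deck = new_deck(random.Random(1))
player, dealer = deal_opening_hands(deck)
print(player, dealer, len(deck))
```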
|
# ? Mar 5, 2018 01:32 |
|
Nippashish posted: No, this is not what you need to do. to_json is not storing "superfluous" data, but it's also not storing the data in the format you need. When you look at a JSON file you can read [] and {} just like you do in Python. When you call json.load on the file, [] becomes a list and {} becomes a dict. It looks like I'm still having a bit of a rough time with this. I decided to try it with a CSV file instead to see if it's less fussy, and I'm not so sure. I realized that "iterrows" only works on DataFrames (at least, I think so), so I tried to update my code accordingly: code:
code:
code:
code:
edit: looking over the full error message, it looks like boto3 is the one doing the complaining. I want to think that maybe boto3 and pandas don't get along, but as far as I know all boto3 sees is numbers being handed to it.
|
# ? Mar 5, 2018 03:42 |
|
Look at this loop and see if you can spot where you're tripping up:code:
|
# ? Mar 5, 2018 04:19 |
|
I thought it was that last comma, but I removed it and no dice. Does the placement of the last brackets matter?
|
# ? Mar 5, 2018 04:26 |
|
Nope. It has to do with how you are first assigning collector key and sales.
|
# ? Mar 5, 2018 04:33 |
|
The [['collector_key', 'sales']] seemed superfluous, so I removed it:code:
|
# ? Mar 5, 2018 04:48 |
|
You aren't using the row data, you are using the original DataFrame. This code: code:
|
# ? Mar 5, 2018 05:12 |
|
Thank you greatly. I think I was on the right track because I came across this page: https://www.analyticsvidhya.com/blog/2016/01/12-pandas-techniques-python-data-manipulation/ and it showed rows being referenced. I was thrown off by the formatting though. I think it needs to go something like this: code:
It doesn't matter that much, though...the exercise only requires me to dump this stuff into the database so I guess I'll put it through as a string type.
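Two things seem worth checking in the fixed loop: read each value from `row` rather than from `df`, and hand boto3 plain Python types. As far as I know, boto3's DynamoDB serializer rejects NumPy scalars such as `numpy.int64` as well as Python floats (it expects `int` or `decimal.Decimal` for numbers), which may be what the boto3 error in the edit above was about. Also note that `iterrows` returns each row as a single Series, so a row mixing int and float columns is upcast to float. A minimal sketch, assuming the `collector_key`/`sales` columns from the thread with hypothetical values; the table name in the comment is hypothetical too. Storing the numbers as strings, as planned above, sidesteps the type issue in a different way.

```python
from decimal import Decimal

import pandas as pd


def rows_to_items(df):
    """Turn each DataFrame row into a dict of plain Python values."""
    items = []
    for _, row in df.iterrows():  # use `row`, not `df`, inside the loop
        items.append({
            # iterrows may have upcast this to float; int() restores an int.
            "collector_key": int(row["collector_key"]),
            # Go through str() so the Decimal doesn't inherit float noise.
            "sales": Decimal(str(row["sales"])),
        })
    return items


df = pd.DataFrame({"collector_key": [-1, 139517343969], "sales": [12.5, 3.0]})
items = rows_to_items(df)
print(items)

# With boto3 (hypothetical table name):
# import boto3
# table = boto3.resource("dynamodb").Table("your-table-name")
# for item in items:
#     table.put_item(Item=item)
```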
|
# ? Mar 5, 2018 05:46 |
|
Sad Panda posted: Python question, how do you deal with doing something twice? I'm making a game of self-playing Blackjack and have the following... I would probably do this in your case: Python code:
|
# ? Mar 5, 2018 16:10 |
|
The loop protects you from the universe's ironic sense of humour
|
# ? Mar 5, 2018 18:25 |
|
|
|
Next part of my Blackjack program. A lookup table. A short extract of the data would be...code:
My original idea was 2D arrays, but that doesn't seem to support column names, which is what I'd call the 2/3/4/... along the top. I found one solution, and he used a pickled 'av table' (or so the variable name suggests), but that seems a bit beyond me right now.
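One low-tech way to get named columns is a dict of dicts: the outer key is the row label (the player's total) and the inner keys are the column names from the header row (the dealer's upcard). This sketch reads that from CSV text with the standard `csv` module. The table contents are hypothetical placeholders, not real strategy data, and `load_table` is a made-up name.

```python
import csv
import io

# Hypothetical placeholder data (not real strategy): rows are player totals,
# columns are dealer upcards, cells are actions.
TABLE_CSV = """total,2,3,4
12,H,H,S
13,S,S,S
"""


def load_table(text):
    """Return {player_total: {upcard_column_name: action}}."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        total = int(row.pop("total"))
        table[total] = row  # remaining keys are the column names "2", "3", ...
    return table


table = load_table(TABLE_CSV)
print(table[12]["4"])
```

With pandas, `pd.read_csv(..., index_col="total")` followed by `.loc[12, "4"]` gives the same kind of lookup, since the header names are read as strings.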
|
# ? Mar 5, 2018 23:26 |