Portland Sucks
Dec 21, 2004
༼ つ ◕_◕ ༽つ
I'm building a program that needs to be able to take a list of rules from a JSON file and parse them into essentially a sequence of conditional statements in a function. The parameters are defined within the domain of the problem, but the rules need to be generated by an operator of the program on the fly, interacting only with the config file.

For example the config file might be something like

code:
{
    "mode1": [
        {
            "rule": "{current_time} > 0800 AND {current_time} < 2000",
            "priority": "High",
            "enabled": "true"
        },
        {
            "rule": "{current_temp} > 90",
            "priority": "High",
            "enabled": "true"
        },
        {
            "rule": "{vacant} == false",
            "priority": "Medium",
            "enabled": "false"
        }
    ],
    "mode2": [
        ...... another set of rules ......
    ]
}


The result being that, given an input request of mode1, mode2, etc., each having a unique set of user-defined rules, the program can cycle its way through the given states, resolving conflicts along the way. So some sort of finite state machine, more or less. I know how I could go about hard coding it, but I'm having some trouble coming up with a way to build what seems to be a finite state machine generator. Is it possible to dynamically generate code in Python? Or is there a construct that exists for this purpose?
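For the record, the dumb hard-coded direction I had in mind looks roughly like this: a minimal rule evaluator sketch that only handles AND, plain numbers, and true/false, with the field names made up to match the config above.

```python
import operator

# maps the rule format's operators to Python's -- an assumption about
# what operators the config will actually contain
OPS = {'>': operator.gt, '<': operator.lt,
       '==': operator.eq, '!=': operator.ne}

def parse_value(text):
    # treats "true"/"false" as booleans and everything else as a
    # number, so a time like 0800 just becomes 800.0
    if text in ('true', 'false'):
        return text == 'true'
    return float(text)

def check_rule(rule, context):
    # each clause looks like "{name} OP value", joined by " AND "
    for clause in rule.split(' AND '):
        name, op, value = clause.split()
        if not OPS[op](context[name.strip('{}')], parse_value(value)):
            return False
    return True

rules = [
    {"rule": "{current_time} > 0800 AND {current_time} < 2000",
     "priority": "High", "enabled": "true"},
    {"rule": "{vacant} == false",
     "priority": "Medium", "enabled": "false"},
]
context = {'current_time': 1200, 'vacant': False}
active = [r for r in rules
          if r['enabled'] == 'true' and check_rule(r['rule'], context)]
```

No idea if this scales to real conflict resolution between modes, but it avoids generating code at runtime entirely.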


DarthRoblox
Nov 25, 2007
*rolls ankle* *gains 15lbs* *apologizes to TFLC* *rolls ankle*...

GrAviTy84 posted:

I have a data set that has a lot of features. Each row is an event but each event is part of a larger event (it's power outage data, I have bulk outage events, broken down into restoration steps, as well as geographical categories, device categories, etc.) I want to be able to "collapse" different categories so that, say I just have a total time elapsed for one outage event by summing over the time taken in each restoration step, but maintain dataframe structure so that if I wanted to break it down further into GIS or month, or whatever sub categories, it would be easy to do later on. As such, I'm trying to develop a function that will take a df, a column name to categorize rows into, and a column name that is summed in each category. I'd like to keep it as standalone as I can so I can collapse these indices in different ways depending on what features I want to study. right now I have:

code:
def sumByFeature(df,sumIndex,sumValue):
    featureIndex = df.groupby(by=sumIndex)[sumValue].sum()
    df = pd.concat([df,featureIndex], axis=1, join_axes=[featureIndex['sumIndex']])
    return df.drop_duplicates([sumIndex],keep='first')
but it really doesn't like the way I'm passing the column name into the function. Is there a correct way to pass a column name of a dataframe into a function?

semi-related: is there a data science thread? I suppose it could be either in here or in SAL.

edit: there might be a much better way to do this, too. I'm new to doing this stuff with python so I'm still learning a lot of the methods that I can use stock.

Could you post some sample data and what you would want the outcome to look like?

Dominoes
Sep 20, 2007

Munkeymon posted:

Sounds like you want a rules engine.
Thanks, homes. From what I gather so far, sounds like I need to roll my own, STS!!! Django.

Rosalind
Apr 30, 2013

When we hit our lowest point, we are open to the greatest change.

I am about to give up on trying to learn Python.

I have completed every Python tutorial recommended to me. They all speak of this beautiful, elegant programming language where everything is so easy to do. Oh yes I know how to set strings to variables and make them all lower case and count the length of a string and add, divide, and subtract. Oh boy do I know how to do these things. But then I get to actually using Python to do anything useful and it turns into a poo poo show.

I spent 5 hours tonight just trying to do the following and have failed miserably: import a CSV that I exported from R and then run basic descriptive statistics on it. I thought this would be a good first task as I have analyzed these data before in R and know them like the back of my hand.

I googled "Import csv from R and descriptive statistics python" (or some approximation thereof). I found people talking about something called "pandas." I then found people discussing Anaconda and Rodeo as "good environments" for data analysis in Python.

I installed Anaconda. It gave me a choice of several different things which did not seem relevant including RStudio??? I googled again and found that the one called Spyder was the one I needed. Every piece of code I tried to run in this Spyder thing gave me various error messages and no amount of googling got me any useful answers.

I then installed Rodeo. This looked like RStudio so I was happy at first, but again nothing seemed to work right. I found some code that someone posted to import a csv using a package called rpy2. I spent 2 hours getting rpy2 installed but I think I finally did after so much searching and then manually installing it because apparently it doesn't work normally in Windows and monkeying about with Paths and writing some custom pip thing that I don't even really know why it was necessary.

At this point at least I would run "import rpy2" and no error message popped up. Maybe I had broken it so much that it didn't even know to give me an error message because none of the rpy2 functions would actually work.

My questions now are: 1. How the heck does anyone ever learn how to use this? I'm the first to admit that I'm no programming guru, but I have some experience with several different languages and actually getting started with Python makes absolutely no sense. Everything feels like the biggest clusterfuck of dependencies and "install X, Y, and Z to get W to work but to get X to work install A and B which require C to run on Windows." 2. Is there a simple, idiot-proof (because that's what I am apparently) guide to moving from R to Python for data analysis and hopefully eventually machine learning?

huhu
Feb 24, 2006

I'd say first: calm down. You'll have a lot of experiences like this ahead, so you'll need to learn to manage them. Second, you have this forum, so ask specific questions more often. Perhaps start on a project that's not so hard.

Tigren
Oct 3, 2003

Rosalind posted:

I am about to give up on trying to learn Python.

I have completed every Python tutorial recommended to me. They all speak of this beautiful, elegant programming language where everything is so easy to do. Oh yes I know how to set strings to variables and make them all lower case and count the length of a string and add, divide, and subtract. Oh boy do I know how to do these things. But then I get to actually using Python to do anything useful and it turns into a poo poo show.

I spent 5 hours tonight just trying to do the following and have failed miserably: import a CSV that I exported from R and then run basic descriptive statistics on it. I thought this would be a good first task as I have analyzed these data before in R and know them like the back of my hand.

I googled "Import csv from R and descriptive statistics python" (or some approximation thereof). I found people talking about something called "pandas." I then found people discussing Anaconda and Rodeo as "good environments" for data analysis in Python.

I installed Anaconda. It gave me a choice of several different things which did not seem relevant including RStudio??? I googled again and found that the one called Spyder was the one I needed. Every piece of code I tried to run in this Spyder thing gave me various error messages and no amount of googling got me any useful answers.

I then installed Rodeo. This looked like RStudio so I was happy at first, but again nothing seemed to work right. I found some code that someone posted to import a csv using a package called rpy2. I spent 2 hours getting rpy2 installed but I think I finally did after so much searching and then manually installing it because apparently it doesn't work normally in Windows and monkeying about with Paths and writing some custom pip thing that I don't even really know why it was necessary.

At this point at least I would run "import rpy2" and no error message popped up. Maybe I had broken it so much that it didn't even know to give me an error message because none of the rpy2 functions would actually work.

My questions now are: 1. How the heck does anyone ever learn how to use this? I'm the first to admit that I'm no programming guru, but I have some experience with several different languages and actually getting started with Python makes absolutely no sense. Everything feels like the biggest clusterfuck of dependencies and "install X, Y, and Z to get W to work but to get X to work install A and B which require C to run on Windows." 2. Is there a simple, idiot-proof (because that's what I am apparently) guide to moving from R to Python for data analysis and hopefully eventually machine learning?

https://www.dataquest.io/blog/python-vs-r/

That's from the first page of a Google search for "r vs python statistics".

QuarkJets
Sep 8, 2008

Rosalind posted:

My questions now are: 1. How the heck does anyone ever learn how to use this? I'm the first to admit that I'm no programming guru, but I have some experience with several different languages and actually getting started with Python makes absolutely no sense. Everything feels like the biggest clusterfuck of dependencies and "install X, Y, and Z to get W to work but to get X to work install A and B which require C to run on Windows." 2. Is there a simple, idiot-proof (because that's what I am apparently) guide to moving from R to Python for data analysis and hopefully eventually machine learning?

1a. Learning Python is like learning any other language, including R. Use it with whatever project you need to currently do, work through issues, ask questions on the internet, ask about best practices as you become more comfortable with the syntax

1b. Rolling your own Python environment in Windows is kind of finicky, which is why people suggested you download and use Anaconda. Anaconda is basically a stand-alone Python distribution that comes with many other useful packages. This means that you don't have to figure out the dependencies yourself; Anaconda has done that for you, and if you need something else then you can probably add it with conda. If you have Anaconda and another Python installation, you could be accidentally updating one and not the other, which would be confusing and frustrating, and I suspect that's what you've been experiencing. I'd suggest removing everything, installing just Anaconda, and then installing whatever additional packages you need with either conda (in case Anaconda already knows of the package) or pip (in the rare cases where it doesn't, you can probably still get the package with pip, and you still won't need to worry about dependencies yourself; pip and conda both do this for you).

2. Tigren's link is good, and Googling around can also get you more resources. Here's a cheat sheet for converting Matlab commands to Python commands (in case you have Stockholm syndrome or are forced to use MathWorks products):
http://mathesaurus.sourceforge.net/matlab-python-xref.pdf

Here's a cheat sheet for just Machine Learning algorithms:
https://www.analyticsvidhya.com/blog/2015/09/full-cheatsheet-machine-learning-algorithms/

I looked up rpy2, and it sounds like that's a package that lets you use R objects in Python? That sounds neat. It looks like rpy2 is recognized by conda, so you can just install it with the command shown at this page:
https://anaconda.org/r/rpy2

vikingstrike
Sep 23, 2007

whats happening, captain
If you are going to be doing data analysis in python, you'll want to learn and be comfortable with pandas and numpy. Given you are coming from R I would guess pandas is exactly what you are looking for.

import pandas as pd
frame = pd.read_csv('my.csv')

And go to town.
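e.g. for quick descriptive stats, something like this (a toy frame stands in for the csv here so it runs anywhere, and the column names are made up):

```python
import pandas as pd

# stand-in for frame = pd.read_csv('my.csv') -- hypothetical columns
frame = pd.DataFrame({'outage_minutes': [30, 45, 120],
                      'customers': [10, 200, 55]})

print(frame.describe())                # count/mean/std/min/quartiles/max per column
print(frame['outage_minutes'].mean())  # 65.0
```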

Dominoes
Sep 20, 2007

Rosalind - Python's a general-purpose language, so the tools you need for things like statistics will be included in third-party packages. Finding the right package, and info on how to use it can be confusing, as you've found out. Like Quarkjets said, start with Anaconda, since the packages you need will already be included.

The packages you need in this case are pandas (to read the CSV, and do some stats), plus scipy.stats and statsmodels for the analysis. Post what you're specifically looking for, or post equivalent R code, and we'll give you example code. Translating between Python and R is usually easy.

Dominoes fucked around with this message at 11:29 on Feb 11, 2017

GrAviTy84
Nov 25, 2004

Viking_Helmet posted:

Could you post some sample data and what you would want the outcome to look like?

I just started at this company and it's utilities data so I don't know how sensitive it is. here's some fake data showing sort of what's going on.
code:
    Event ID  Event value  location   event cause
1          1            1  LA         weather
2          1            1  LA         weather
3          1            1  LA         weather
4          2            1  Santa Ana  weather
5          3            1  LA         3rd party
6          4            1  Anaheim    maintenance
7          4            1  Anaheim    maintenance
8          4            1  Anaheim    maintenance
9          4            1  Anaheim    maintenance
10         4            1  Anaheim    maintenance
11         5            1  Santa Ana  3rd party
12         5            1  Santa Ana  3rd party
so I want to make a function that takes this dataframe and 'collapses' repeats in certain columns and spits out a df with the same column structure. so for instance function(df, Event_ID, Event_value) would return a dataframe that looked like:

code:
    Event ID  Event value  location   event cause
1          1            3  LA         weather
2          2            1  Santa Ana  weather
3          3            1  LA         3rd party
4          4            5  Anaheim    maintenance
5          5            2  Santa Ana  3rd party

GrAviTy84 fucked around with this message at 22:14 on Feb 11, 2017

GrAviTy84
Nov 25, 2004


Like everyone said, anaconda is the way to go to get started, it has most everything you need all wrapped up in a tidy windows installer. Additionally I would get something like PyCharm, which is an IDE that is a little more RStudio-like in some ways and has some built-in things that will make learning a little easier, like syntax highlighting and stuff. There are a lot of youtube videos that will walk you through step by step how to get anaconda and pycharm set up. Like Dominoes said, python is general purpose so you have all these other "toolboxes" so to speak that are more specialized. For python the big ones for data science are: pandas (handles data frames), numpy/scipy (mathematics/statistics and other science analysis tools), matplotlib (data visualization), and scikit-learn (machine learning stuff). So just as a quick explanation of how it all comes together, look at vikingstrike's code bit: (apologies if you know all this already, but I think reading through code translated into everyday language really helps with understanding what's going on with syntax and will probably make life a lot easier)

vikingstrike posted:

import pandas as pd
frame = pd.read_csv('my.csv')

he wants to use the pandas set of tools so he tells python "hey, go get pandas, but I'm lazy and don't wanna type 'pandas' all the time so go ahead and call that toolkit 'pd'". Now any time he says pd he's referring to something that exists inside the pandas package. So the next line says: "in the pandas package, look for a function called read_csv; it's a function that takes a file and spits out a data frame, and I want you to assign that dataframe output to the variable frame". So now there is a variable called frame which is basically a matrix of data parsed from the csv.

Another useful thing is to learn how to read documentation on these package sites. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.describe.html is a really useful one for instance. Using this as an example, "scipy.stats.describe()" just says: in the scipy package there is a subpackage called stats, and in that subpackage there is a function called describe(). This function can take a bunch of parameters and they're listed on the doc. If you see something in the function definition with an equals sign like "bias=True", that is its default value. So if you were to call describe on frame you could say:

stats.describe(frame)

and it would take all those values that have equals signs as defaults, likewise all you need to do to change them is call them, so like:

stats.describe(frame, bias = False)

the output of this function is a named tuple of results (nobs, minmax, mean, variance, skewness, kurtosis) and you can set a new variable equal to it if you want, like:

frameStats = stats.describe(frame)

and now all of those values are stored in that variable frameStats and you can do whatever you want with them.
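So end to end it'd be something like this (a toy array stands in for a real csv here so it runs anywhere):

```python
import numpy as np
from scipy import stats

# stand-in for the numeric columns you'd get out of a csv
frame = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

frameStats = stats.describe(frame)  # defaults: axis=0 (per column), bias=True
print(frameStats.nobs)     # number of rows: 3
print(frameStats.mean)     # per-column means: [3. 4.]
print(frameStats.minmax)   # (per-column mins, per-column maxes)
```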

Hope this helps. It's a very powerful language, especially for how readable the code is. I'm coming from C++ and am still relatively new to python, but I'm noticing how robust it is while still being pretty quick to develop in. Again, apologies if this is super basic and you knew all this already, but this is something I wish someone would have walked me through when I first set out in python.

damnfan
Jun 1, 2012


Above is a binary search problem I was working on where there were a max of 3 errors to correct. My corrections were:


This would fail for the input shown below. Does anyone know why the binary search failed there? The problem was timed and submitted with a result of 66% correct.

DarthRoblox
Nov 25, 2007
*rolls ankle* *gains 15lbs* *apologizes to TFLC* *rolls ankle*...

GrAviTy84 posted:

I just started at this company and it's utilities data so I don't know how sensitive it is. here's some fake data showing sort of what's going on.
code:
    Event ID  Event value  location   event cause
1          1            1  LA         weather
2          1            1  LA         weather
3          1            1  LA         weather
4          2            1  Santa Ana  weather
5          3            1  LA         3rd party
6          4            1  Anaheim    maintenance
7          4            1  Anaheim    maintenance
8          4            1  Anaheim    maintenance
9          4            1  Anaheim    maintenance
10         4            1  Anaheim    maintenance
11         5            1  Santa Ana  3rd party
12         5            1  Santa Ana  3rd party
so I want to make a function that takes this dataframe and 'collapses' repeats in certain columns and spits out a df with the same column structure. so for instance function(df, Event_ID, Event_value) would return a dataframe that looked like:

code:
    Event ID  Event value  location   event cause
1          1            3  LA         weather
2          2            1  Santa Ana  weather
3          3            1  LA         3rd party
4          4            5  Anaheim    maintenance
5          5            2  Santa Ana  3rd party

Ah, cool - I think you just want the basic groupby functionality. Assuming you have your example loaded into a dataframe, this should do it:
code:
grp = df.groupby(['Event ID','location','event cause']).sum()
grp = grp.reset_index()
print(grp)
code:
 Event ID   location  event cause  Event value
0        1         LA      weather            3
1        2  Santa Ana      weather            1
2        3         LA    3rd party            1
3        4    Anaheim  maintenance            5
4        5  Santa Ana    3rd party            2

DarthRoblox
Nov 25, 2007
*rolls ankle* *gains 15lbs* *apologizes to TFLC* *rolls ankle*...

damnfan posted:



Above is a binary search problem I was working on where there were a max of 3 errors to correct. My corrections were:


This would fail for the input shown below. Does anyone know why the binary search failed there? The problem was timed and submitted with a result of 66% correct.

Walk through what your function is doing, step by step:

code:
A = [1, 2, 5, 9, 9]
X = 2
#first step
m = (0 + 4) // 2 # 2
A[2] #5, greater than X so:
r = 1
#second step
m = (0 + 1) // 2 # 0
A[0] # 1, less than X so:
l = 1
#third step
l == r ##so we fail out here

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

It would also help to do that with the original code - start with an edge case like an empty array, then 1 element, then 2, then 3... see how it behaves and if you notice a pattern emerging

You can work this out on paper, but it might be a good excuse to practice using a debugger to step through the code too!

damnfan
Jun 1, 2012

Viking_Helmet posted:

Walk through what your function is doing, step by step:

code:
A = [1, 2, 5, 9, 9]
X = 2
#first step
m = (0 + 4) // 2 # 2
A[2] #5, greater than X so:
r = 1
#second step
m = (0 + 1) // 2 # 0
A[0] # 1, less than X so:
l = 1
#third step
l == r ##so we fail out here

The problem I am having is the 3-changes constraint. The binary search is fine with an additional 4th change, as seen here. Is there a way to correct this within the given constraint of 3 corrections?

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

That causes issues - the original version moves the left and right extremes towards each other in a way that they should eventually arrive on the same index, at which point the loop immediately ends. That's where the desired element should be, so then it checks if it's there or not

Your version allows another loop when the indices match, so if they're at the end of the array (if X is the largest element) then you move L forward and it's out of bounds. You could get around this by doing your equality check first, but what if X isn't in the array?

Your solution sort of changes how the algorithm works with that early return. "3 changes" is kind of arbitrary but you said "3 errors" first, which implies small corrections - otherwise what counts as a single change, y'know? I think if you look at the original code with the edge cases I mentioned (specifically 2 elements where 1st<X and 2nd=X, eg [2,5] with X=5) you'll see a problem, and there's an easy fix for it

Fusion Restaurant
May 20, 2015
1. Do people use spaces or tabs to indent? I've noticed when copy pasting that sublime text is giving me tabs by default while Spyder is giving me spaces.

2. Also, a pandas Q which was a little too abstract for me to easily answer on stackexchange:
I have a bunch of csvs which I've downloaded from a site. Each is data for one week, all have the same columns, and I'd eventually like to concatenate them all into a big pandas dataframe.

Is the best way to do this to first clean each csv (which will involve dropping some columns, so making it smaller), and then merge?

Or should I just combine all the csvs into one giant csv, then turn it into a dataframe, and then clean that dataframe?

My concern is that the second method might leave me with a dataframe which is too big to really work with in memory (ie RAM) while I'm cleaning it and paring down the # of columns. Would the first approach actually be more memory efficient?

The last thing I guess I could do is to do the data cleaning on the csv's directly by editing the strings, or by reading them into a base python dictionary/list -- but I wasn't sure if that would actually save any memory? It would definitely be much more annoying.

e: Actually maybe what I really should get is a recommendation of a good guide to memory management in pandas. I'm pretty familiar with it in R, and how to efficiently do things w/in that language, but am totally lost in Python/pandas.

Fusion Restaurant fucked around with this message at 15:33 on Feb 13, 2017

shrike82
Jun 11, 2005

the convention is 4 spaces

creatine
Jan 27, 2012




I've been using tkinter to make small GUI programs but I've found it's incredibly frustrating to get to layout the way I want. Does anyone have resources/guides for wxPython or PyQt5? Those are the two I've seen most talked about so I figured I'd start there.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Dominoes posted:

Thanks, homes. From what I gather so far, sounds like I need to roll my own, STS!!! Django.

There aren't some already kicking around out there?

vikingstrike
Sep 23, 2007

whats happening, captain

Fusion Restaurant posted:

1. Do people use spaces or tabs to indent? I've noticed when copy pasting that sublime text is giving me tabs by default while Spyder is giving me spaces.

2. Also, a pandas Q which was a little too abstract for me to easily answer on stackexchange:
I have a bunch of csvs which I've downloaded from a site. Each is data for one week, all have the same columns, and I'd eventually like to concatenate them all into a big pandas dataframe.

Is the best way to do this to first clean each csv (which will involve dropping some columns, so making it smaller), and then merge?

Or should I just combine all the csvs into one giant csv, then turn it into a dataframe, and then clean that dataframe?

My concern is that the second method might leave me with a dataframe which is too big to really work with in memory (ie RAM) while I'm cleaning it and paring down the # of columns. Would the first approach actually be more memory efficient?

The last thing I guess I could do is to do the data cleaning on the csv's directly by editing the strings, or by reading them into a base python dictionary/list -- but I wasn't sure if that would actually save any memory? It would definitely be much more annoying.

e: Actually maybe what I really should get is a recommendation of a good guide to memory management in pandas. I'm pretty familiar with it in R, and how to efficiently do things w/in that language, but am totally lost in Python/pandas.

Why would it be that much different in python? Of course working on smaller chunks of a larger data set and then putting them together will lower the amount of RAM you need at any one time.
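e.g. something like this per-file pattern (toy frames stand in for your weekly csvs here so it runs anywhere; the column names are made up):

```python
import pandas as pd

def clean(df):
    # whatever per-week cleaning you need -- here just dropping a made-up column
    return df.drop(columns=['junk_col'], errors='ignore')

# stand-ins for pd.read_csv('week1.csv'), pd.read_csv('week2.csv'), ...
weekly = [pd.DataFrame({'outage_id': [1, 2], 'junk_col': ['x', 'y']}),
          pd.DataFrame({'outage_id': [3], 'junk_col': ['z']})]

# clean each weekly frame first, so only the pared-down versions are
# ever held in memory together at concat time
combined = pd.concat((clean(df) for df in weekly), ignore_index=True)
```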

Whybird
Aug 2, 2009

Phaiston have long avoided the tightly competetive defence sector, but the IRDA Act 2052 has given us the freedom we need to bring out something really special.

https://team-robostar.itch.io/robostar


Nap Ghost

creatine posted:

I've been using tkinter to make small GUI programs but I've found it's incredibly frustrating to get to layout the way I want. Does anyone have resources/guides for wxPython or PyQt5? Those are the two I've seen most talked about so I figured I'd start there.

I've just started with wxPython's Phoenix fork (I started with tkinter too, and then I stopped when I realised it had no support for screen-readers). I'm currently using this, which is pretty terse but it seems OK so far.

Dominoes
Sep 20, 2007

Munkeymon posted:

There aren't some already kicking around out there?
I dunno; no luck yet.

creatine
Jan 27, 2012




Whybird posted:

I've just started with wxPython's Phoenix fork (I started with tkinter too, and then I stopped when I realised it had no support for screen-readers). I'm currently using this, which is pretty terse but it seems OK so far.

Sweet I'll give this a shot. Thanks!

damnfan
Jun 1, 2012

baka kaba posted:

That causes issues - the original version moves the left and right extremes towards each other in a way that they should eventually arrive on the same index, at which point the loop immediately ends. That's where the desired element should be, so then it checks if it's there or not

Your version allows another loop when the indices match, so if they're at the end of the array (if X is the largest element) then you move L forward and it's out of bounds. You could get around this by doing your equality check first, but what if X isn't in the array?

Your solution sort of changes how the algorithm works with that early return. "3 changes" is kind of arbitrary but you said "3 errors" first, which implies small corrections - otherwise what counts as a single change, y'know? I think if you look at the original code with the edge cases I mentioned (specifically 2 elements where 1st<X and 2nd=X, eg [2,5] with X=5) you'll see a problem, and there's an easy fix for it

Thanks, I was stuck with the binary search implementation that I was familiar with rather than thinking of other implementations. The edit that satisfied the requirements ended up just being:
code:
m = (l + r + 1) // 2

Hedningen
May 4, 2013

Enough sideburns to last a lifetime.
Quick question on conventions: I'm working on a hobby Python project for a game I'm involved in, and while I have it doing what I need to do, I feel like it could be more elegant.

It's a simple GUI for in-game banking software tied to a sqlite db with all internal records. Right now, my account check is a mess of if/elif statements based off of checkbox variables, with different data getting called in depending on the check. Thanks to the old method (doc spreadsheet) being cluttered and annoying to manage and search, I'm adhering pretty strictly to normalization standards, but it means that with my terrible coding, I'm writing full SQL statements for each variable possibility that could be called and need multiple JOINs to show all the data effectively. As this is meant to expand when more bank-tracked items are added, this is pretty poor design and will mean more chances of things breaking/not working.

I'm not entirely sure of the best way to handle this, as I'm still a novice. The general code example (with likely errors due to being away from my actual code, as it's on a system I don't have handy at work):

code:
import sqlite3 as dbc

con = dbc.connect('sample_db')
foo = 1
bar = 0
cid = 12345

def bankSearch():
    cur = con.cursor()
    if foo == 1 and bar == 1:
        cur.execute("SELECT * FROM foo_table, bar_table, user_table "
                    "WHERE foo_table.uid = ? AND foo_table.uid = bar_table.uid "
                    "AND foo_table.uid = user_table.uid", (cid,))
        cur.close()
    elif foo == 1 and bar == 0:
        cur.execute("SELECT * FROM foo_table, user_table "
                    "WHERE foo_table.uid = ? AND foo_table.uid = user_table.uid", (cid,))
        cur.close()
    elif . . . 

Assuming the if/elif statements work correctly, how could I streamline this sort of thing for an arbitrary number of queries based on variable assignment further up in the code?

My current thought is simply running if/else for the individual variables, as they're simple booleans based on checkboxes and that means I'll just need a pair for every table added, but I'm not 100% sure that's the best solution.
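To make the question concrete, is building the query from the ticked boxes the right direction? A completely untested sketch, with table and column names made up to match the fake code above (table names get whitelisted since they can't be passed as ? parameters):

```python
import sqlite3 as dbc

con = dbc.connect(':memory:')
# hypothetical schema matching the made-up tables above
con.execute("CREATE TABLE user_table (uid INTEGER)")
con.execute("CREATE TABLE foo_table (uid INTEGER, foo_val TEXT)")
con.execute("CREATE TABLE bar_table (uid INTEGER, bar_val TEXT)")

ALLOWED = ('foo_table', 'bar_table')  # one entry per checkbox/table

def bank_search(con, cid, checked):
    # build one query from whichever boxes are ticked, instead of
    # an if/elif branch per combination of checkboxes
    joins = [t for t in checked if t in ALLOWED]  # whitelist, never format raw input
    sql = "SELECT * FROM user_table"
    for t in joins:
        sql += " JOIN {0} ON {0}.uid = user_table.uid".format(t)
    sql += " WHERE user_table.uid = ?"
    return con.execute(sql, (cid,)).fetchall()
```

Adding a new bank-tracked table would then just mean one more entry in ALLOWED rather than doubling the branches.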

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

damnfan posted:

Thanks, I was stuck with the binary search implementation that I was familiar with rather than thinking of other implementations. The edit that satisfied the requirements ended up just being:
code:
m = (l + r + 1) // 2

I was thinking of math.ceil, but that works too and it's probably a bit neater!
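Quick sanity check that the two really do match:

```python
import math

# (l + r + 1) // 2 is the round-up midpoint, i.e. math.ceil((l + r) / 2)
for l in range(20):
    for r in range(l, 20):
        assert (l + r + 1) // 2 == math.ceil((l + r) / 2)
print('equivalent')
```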

Azuth0667
Sep 20, 2011

By the word of Zoroaster, no business decision is poor when it involves Ahura Mazda.
Tkinter is a pain in the rear end.

accipter
Sep 12, 2003

Azuth0667 posted:

Tkinter is a pain in the rear end.

GUI programming is so totally foreign. It really takes a while to learn how to go about doing things. (At least it did for me when I learned Qt)

Fusion Restaurant
May 20, 2015

vikingstrike posted:

Why would it be that much different in python? Of course working on smaller chunks of a larger data set and then putting them together will lower the amount of RAM you need at any one time.

I think my confusion was basically how much RAM a merge takes? I.e. should I go out of my way to avoid them because they involve copying each dataframe and doing some other inefficient poo poo, or are they basically not a big deal to run? For R if you were using certain packages/methods of merging a merge could use like 4x the RAM required to just store the dataframe while the merge was going on.

I'm probably worrying about this too much at the moment anyways -- it seems like in general Python just uses less memory so I haven't really been running into any issues even w/ basically no optimization.

vikingstrike
Sep 23, 2007

whats happening, captain
Merge in pandas has a "copy" parameter that controls that exact behavior. You'll want to set it to False.
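e.g. (column names made up; note newer pandas versions with copy-on-write deprecate this parameter, so check your version):

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'a': [10, 20, 30]})
right = pd.DataFrame({'key': [1, 2, 4], 'b': [100, 200, 400]})

# copy=False asks merge to skip defensive copies where it can;
# it's a hint to save memory, not a guarantee of zero copying.
merged = left.merge(right, on='key', how='inner', copy=False)
```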

GrAviTy84
Nov 25, 2004

Viking_Helmet posted:

Ah, cool - I think you just want the basic groupby functionality. Assuming you have your example loaded into a dataframe, this should do it:
code:
grp = df.groupby(['Event ID','location','event cause']).sum()
grp = grp.reset_index()
print(grp)
code:
 Event ID   location  event cause  Event value
0        1         LA      weather            3
1        2  Santa Ana      weather            1
2        3         LA    3rd party            1
3        4    Anaheim  maintenance            5
4        5  Santa Ana    3rd party            2

thanks, this worked!

Azuth0667
Sep 20, 2011

By the word of Zoroaster, no business decision is poor when it involves Ahura Mazda.

accipter posted:

GUI programming is so totally foreign. It really takes a while to learn how to go about doing things. (At least it did for me when I learned Qt)

Yeah, it is. It literally reminds me of learning Spanish. The most frustrating part of tkinter is that the tutorials I find are all 2-3 years out of date or for some other version of Python, so their example code doesn't even work.

creatine
Jan 27, 2012




Azuth0667 posted:

Yeah, it is. It literally reminds me of learning Spanish. The most frustrating part of tkinter is that the tutorials I find are all 2-3 years out of date or for some other version of Python, so their example code doesn't even work.

Yeah this was my biggest problem. The other one was that .grid is so finicky and I can't ever get it to work correctly

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

HardDiskD posted:

Just spitballing here, but try to give the NamedTemporaryFile an unique prefix and/or suffix? Also, you might want to give it delete=False.

I started logging the filenames and they are definitely unique (/tmp/tmp04m80o4b & /tmp/tmp8sljj5jq) but the logs are still getting jumbled.

I'm almost certain it comes down to adding the new handler to the root logger:
code:
logging.getLogger('').addHandler(deploy_log_handler)
This seems to impact the root logger of other threads, I'm not really clear on the how and why though.

huhu
Feb 24, 2006
Uneducated question - why does memory management matter? I understand why it would with something like an Arduino which has almost no memory but why would that matter on something like a computer with so much more memory? For context, I'm reading about arrays vs linked lists.

Related question: "To create an array, we must allocate memory for a certain number of elements"... when is this done? I thought a list was an array and I could just do my_list.append(x). What are they talking about that I'm missing?

huhu fucked around with this message at 23:19 on Feb 15, 2017

ArcticZombie
Sep 15, 2010
Some of the things we do with computers can suck up a lot of memory. It can happen quickly when working with large amounts of data or even not so large amounts if your code is making inefficient use of it. No one wants a web browser that uses up 99% of their memory preventing them from running anything else.

I'm guessing that the material you are reading is not Python-specific. A list in Python is not a simple array like the one they are talking about.
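You can actually watch a Python list over-allocating behind the scenes (exact byte counts vary by interpreter and platform, so don't expect these numbers to match anyone else's):

```python
import sys

sizes = []
lst = []
for i in range(20):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# The reported size stays flat for several appends, then jumps:
# CPython only reallocates a bigger backing array occasionally,
# which is why append is amortized O(1) even though it's "an array".
print(sizes)
```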

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

If you do any programming challenge kind of stuff, you tend to run into problems where it's real easy to use astronomical amounts of memory (with a naive solution at least). You can generate a large graph of states following other states to hunt for optimal solutions, you might need to read in large amounts of data, or you might want to trade off memory for speed by caching lots of results instead of calculating them each time. Linked lists tend to be slower to iterate over than arrays, and if you have enough data the speed could be the difference between an algorithm that finishes in a reasonable time, or one that's unacceptable

Plus sometimes you need to run many instances of your process, so a small inefficiency can multiply into a significant one. And you might have other things to worry about, like garbage collection and how it affects the responsiveness of your system. Depends on the language etc. though
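That memory-for-speed caching trade-off is literally one decorator in Python, e.g. (toy example):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # cache every result: fast, but they all stay in memory
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(200))  # instant; the naive version would take astronomically long
```

Setting `maxsize` to a number instead of `None` caps the memory by evicting the least recently used entries.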


fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

fletcher posted:

I started logging the filenames and they are definitely unique (/tmp/tmp04m80o4b & /tmp/tmp8sljj5jq) but the logs are still getting jumbled.

I'm almost certain it comes down to adding the new handler to the root logger:
code:
logging.getLogger('').addHandler(deploy_log_handler)
This seems to impact the root logger of other threads, I'm not really clear on the how and why though.

I've tried to distill this down into a more concise test case: https://github.com/fletchowns/logging_test

The main log looks fine:
code:
2017-02-16 02:19:12,132 MainThread root INFO     Hello my future girlfriend
2017-02-16 02:19:12,133 Thread-1 worker_logger INFO     Hello from worker 1
2017-02-16 02:19:12,133 Thread-1 other_module.something INFO     Doing something with 1
2017-02-16 02:19:12,134 Thread-2 worker_logger INFO     Hello from worker 2
2017-02-16 02:19:12,134 Thread-2 other_module.something INFO     Doing something with 2
2017-02-16 02:19:12,135 Thread-3 worker_logger INFO     Hello from worker 3
2017-02-16 02:19:12,135 Thread-3 other_module.something INFO     Doing something with 3
2017-02-16 02:19:12,136 Thread-4 worker_logger INFO     Hello from worker 4
2017-02-16 02:19:12,137 Thread-4 other_module.something INFO     Doing something with 4
2017-02-16 02:19:12,137 MainThread root INFO     This is what I sound like
2017-02-16 02:19:14,135 Thread-1 worker_logger INFO     Goodbye from worker 1
2017-02-16 02:19:14,136 Thread-2 worker_logger INFO     Goodbye from worker 2
2017-02-16 02:19:14,138 Thread-3 worker_logger INFO     Goodbye from worker 3
2017-02-16 02:19:14,139 Thread-4 worker_logger INFO     Goodbye from worker 4
The worker logs do not have my desired output though:
code:
2017-02-16 02:19:12,133 INFO     Hello from worker 1
2017-02-16 02:19:12,134 INFO     Hello from worker 2
2017-02-16 02:19:12,135 INFO     Hello from worker 3
2017-02-16 02:19:12,136 INFO     Hello from worker 4
2017-02-16 02:19:14,135 INFO     Goodbye from worker 1
I want it to instead be:
code:
2017-02-16 02:19:12,133 INFO     Hello from worker 1
2017-02-16 02:19:12,133 Thread-1 other_module.something INFO     Doing something with 1
2017-02-16 02:19:14,135 INFO     Goodbye from worker 1
And then each of the other work logs looking exactly like that with just the number changed. What am I doing wrong?
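My current guess at a workaround (untested, thread names assumed from the log output above): handlers added to the root logger are process-wide, not per-thread, so every worker's handler sees every thread's records. A logging.Filter keyed on the record's thread name might keep each worker file separate:

```python
import logging
import threading

class ThreadFilter(logging.Filter):
    """Only pass records that were emitted from the named thread."""
    def __init__(self, thread_name):
        super().__init__()
        self.thread_name = thread_name

    def filter(self, record):
        # LogRecord stamps threadName automatically at creation time.
        return record.threadName == self.thread_name

# In each worker, before adding its handler to the root logger:
def make_worker_handler(path):
    handler = logging.FileHandler(path)
    handler.addFilter(ThreadFilter(threading.current_thread().name))
    return handler
```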
