Python information and short questions megathread.


duck monster: Dec 15, 2004

Karthe posted:

I'm parsing a CSV and have about 24,000 of these queries I need to run:
Python code:
"INSERT INTO words (seq_1, seq_2, fk, good_example) SELECT " + id1 + ", " + id2 + ", fk, " + str(boolCheck) + " FROM table WHERE value = '" + w + "';"
Is there a way to reduce the amount of time it takes to run all of those queries? Right now, based on some rough math, it's going to take about 3.5 hours for the script to finish. Some standard INSERT statements in the same script go blazingly fast thanks to cursor.execute("BEGIN TRANSACTION"), but I've yet to find a way to speed up the above query. I'm pretty sure it's the fact that I'm running a SELECT statement within INSERT, and each SELECT is being run in its own transaction.

I thought running all the SELECT's at once and then using their output as a parameter in executemany() would work, but doing a for loop and execute()'ing each SELECT statement took just as long. I'm stuck as to whether or not I should just resign myself to the fact that it's going to take for-loving-ever to run this script and let it do its thing overnight whenever I need to run it.

Does SQLite support views? Because maybe the solution to this is not to do it at all.

Also if you're hitting SQLite's limits (it's not hard, it's just a babby SQL for quick and nifty tasks), maybe upgrade to something more industrial like MySQL or something?

# ? May 11, 2013 06:19

Malcolm XML: Aug 8, 2009; I always knew it would end like ｔｈｉｓ．

duck monster posted:

Does SQLite support views? Because maybe the solution to this is not to do it at all.

Also if you're hitting SQLite's limits (it's not hard, it's just a babby SQL for quick and nifty tasks), maybe upgrade to something more industrial like MySQL or something?

SQLite is surprisingly powerful if you heed its limits: single user and your data set fits into memory. It can be much faster than MySQL (lol at embedding that in an android app!) for the stuff that Karthe wants.

Also, Karthe: Indices aren't free--they slow down inserting. You might want to prevent an index rebuild on each insert and re-enable it after the bulk inserts. However, I'm 99% sure your 24000 inserts can be replaced by one properly constructed join using a table based on id1,id2, and w, but you'll need to give some more info about the structure of your data.

In addition, you should be aware of sqlite3's parameter handling: "select ? from table" with a query parameter is far more readable than constructing a string for each query.
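As a rough sketch of the parameter handling and single-transaction points above: the snippet below runs the INSERT ... SELECT through executemany() with ? placeholders inside one transaction. The table name lookup, the index, and the sample rows are made up for illustration (Karthe's real schema isn't shown in the thread), so treat this as an assumption-laden example rather than a drop-in fix.

```python
import sqlite3

# In-memory database with a hypothetical schema standing in for Karthe's tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lookup (fk INTEGER, value TEXT)")
conn.execute("CREATE INDEX idx_lookup_value ON lookup (value)")  # speeds up WHERE value = ?
conn.execute(
    "CREATE TABLE words (seq_1 INTEGER, seq_2 INTEGER, fk INTEGER, good_example INTEGER)"
)
conn.executemany(
    "INSERT INTO lookup (fk, value) VALUES (?, ?)",
    [(10, "apple"), (20, "pear")],
)

# Rows as they might come out of the CSV: (id1, id2, boolCheck, w).
rows = [(1, 2, 1, "apple"), (3, 4, 0, "pear"), (5, 6, 1, "missing")]

# "with conn" commits once at the end, so every insert shares one transaction.
with conn:
    conn.executemany(
        "INSERT INTO words (seq_1, seq_2, fk, good_example) "
        "SELECT ?, ?, fk, ? FROM lookup WHERE value = ?",
        rows,
    )

result = conn.execute(
    "SELECT seq_1, seq_2, fk, good_example FROM words ORDER BY seq_1"
).fetchall()
print(result)
```

The row whose word has no match in lookup simply inserts nothing, which mirrors how the original string-built query behaves.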

# ? May 11, 2013 15:58

Dominoes: Sep 20, 2007

I'm trying to delete dictionary entries based on values. I'm receiving "RuntimeError: dictionary changed size during iteration" errors despite making a copy first. Any ideas? Python3.

Python code:

mydict_copy = mydict
for key in mydict_copy:
    if mydict_copy[key].objectvalue == 0:
        del mydict[key]

Dominoes fucked around with this message at 21:36 on May 11, 2013

# ? May 11, 2013 21:25

OnceIWasAnOstrich: Jul 22, 2006

You aren't actually making a copy, just another reference to the same dict.

Do something like using the copy module or a dictionary comprehension to move all the items into a new dict.

# ? May 11, 2013 21:35

breaks: May 12, 2001

You aren't making a copy. mydict_copy = mydict means that you have two names pointing at the same object.

Dictionaries do have a copy method that you can use to create a copy of them. That said, you don't actually want a copy of the dictionary in this case; you want a separate list of its keys, which you can get with list(mydict.keys()) (in Python 2, mydict.keys() is already a list; in Python 3 it is a live view, so the list() call matters). You can then iterate over that list and do whatever horrible things you must to that poor dictionary, you monster.
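To make the difference concrete, here is a minimal sketch. Plain integers stand in for the objects with .objectvalue, and the variable names are only illustrative.

```python
mydict = {"a": 0, "b": 1, "c": 0}

# Plain assignment binds a second name to the same dict object.
alias = mydict
is_same_object = alias is mydict

# .copy() builds a new (shallow) dict with the same entries.
real_copy = mydict.copy()
is_distinct_object = real_copy is not mydict

# Iterate over a snapshot of the keys, then delete from the real dict.
for key in list(mydict):
    if mydict[key] == 0:
        del mydict[key]

print(is_same_object, is_distinct_object, mydict, real_copy)
```

The copy still holds all three entries afterwards, while the alias sees the deletions because it was never a separate dict.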

# ? May 11, 2013 21:39

Dominoes: Sep 20, 2007

breaks posted:

You aren't making a copy. mydict_copy = mydict means that you have two names pointing at the same object.

Dictionaries do have a copy method that you can use to create a copy of them. That said, you don't actually want a copy of the dictionary in this case; you want a separate list of its keys, which you can get with list(mydict.keys()) (in Python 2, mydict.keys() is already a list; in Python 3 it is a live view, so the list() call matters). You can then iterate over that list and do whatever horrible things you must to that poor dictionary, you monster.

OnceIWasAnOstrich posted:

You aren't actually making a copy, just another reference to the same dict.

Do something like using the copy module or a dictionary comprehension to move all the items into a new dict.

Thanks. Replacing the first line with the following worked.

code:

mydict_copy = mydict.copy()

I looked up some info on dictionary comprehensions and found phrases like this

Python code:

d = {key: value for (key, value) in sequence}

that I'm having a hard time deciphering. Why do I want to use .keys() instead of making a copy? I need the dictionary's values to decide which entries to cruelly murder.

Dominoes fucked around with this message at 22:13 on May 11, 2013

# ? May 11, 2013 22:07

bonds0097: Oct 23, 2010; I would cry but I don't think I can spare the moisture.; Pillbug

Dominoes posted:

Thanks. Replacing the first line with the following worked.
code:
mydict_copy = mydict.copy()
I looked up some info on dictionary comprehensions and found phrases like this
Python code:
d = {key: value for (key, value) in sequence}
that I'm having a hard time deciphering. Why do I want to use .keys() instead of making a copy? I need the dictionary's values to decide which entries to cruelly murder.

Because you don't want to be modifying an item as you iterate over it like that, it's called a concurrent modification. Instead, you generate a list of keys which will not be modified and iterate over that then delete any entry whose value for key is whatever you're looking for in the actual dictionary. Hope that makes sense.

# ? May 11, 2013 22:47

Dren: Jan 5, 2001; Pillbug

The problem is that you are modifying the object you are iterating over. To do what you want to do, delete entries from a dictionary while iterating it, you'll need to make use of the .keys(), .values(), or .items() methods. In Python 2 these methods create new lists containing keys, values, and key-value tuples. In Python 3 they return views onto the live dictionary, so wrap them in list() to get an independent snapshot.

So if you do something like:

Python code:

d = {'a': 1, 'b': 2, 'c': 3}
# list() snapshots the pairs, so deleting from d is safe in Python 2 and 3
for k, v in list(d.items()):
    if v == 2:
        del d[k]

Everything will be fine since you're not modifying the object you are iterating over.

I encourage you to carefully read this: http://docs.python.org/2/library/stdtypes.html#typesmapping

Take note of the iterkeys, itervalues and iteritems methods. When might you use them instead of keys, values, and items?

# ? May 11, 2013 22:50

Nippashish: Nov 2, 2005; Let me see you dance!

You might as well just write mydict = {k:v for k,v in mydict.iteritems() if v.objectvalue != 0} and avoid this whole business of copying and deleting.
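For anyone puzzling over the comprehension syntax, a small sketch of both the general form and the filtering version. The Item class is a made-up stand-in for whatever objects carry .objectvalue, and .items() is used so it runs on Python 3.

```python
class Item:
    """Hypothetical stand-in for the objects stored in mydict."""

    def __init__(self, objectvalue):
        self.objectvalue = objectvalue


# General form: build a dict from any iterable of (key, value) pairs.
pairs = [("x", 1), ("y", 2)]
d = {key: value for (key, value) in pairs}

# Filtering form: keep only the entries whose objectvalue is non-zero.
mydict = {"a": Item(0), "b": Item(3), "c": Item(0), "d": Item(7)}
mydict = {k: v for k, v in mydict.items() if v.objectvalue != 0}

kept = sorted(mydict)
print(d, kept)
```

Because the comprehension builds a brand new dict and only then rebinds the name, nothing is deleted during iteration and the RuntimeError never comes up.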

# ? May 11, 2013 22:53

Dren: Jan 5, 2001; Pillbug

Nippashish posted:

You might as well just write mydict = {k:v for k,v in mydict.iteritems() if v.objectvalue != 0} and avoid this whole business of copying and deleting.

Seconded. Deleting stuff from a dictionary in python is sort of weird unless you have memory constraints keeping you from making a copy.

# ? May 11, 2013 22:57

Dominoes: Sep 20, 2007

Dren posted:

I encourage you to carefully read this: http://docs.python.org/2/library/stdtypes.html#typesmapping

Take note of the iterkeys, itervalues and iteritems methods. When might you use them instead of keys, values, and items?

I'll read it.

Nippashish posted:

You might as well just write mydict = {k:v for k,v in mydict.iteritems() if v.objectvalue != 0} and avoid this whole business of copying and deleting.

I like this; using it. Caveat of iteritems() being replaced by items() in python3.

Qt issue: Anyone know how to pull data from input text and combo boxes? I've been trying to solve this one for weeks with no luck, and it's stopped development of the program. I can't find anything in a search that describes the problem, and most of what I find about text and combo boxes implies I'm doing it correctly. For text boxes, I receive a None result, and for combo boxes I receive the default selection, no matter what the current one is. Code:

code:

x = window.ui.tb_x.toPlainText()
y = window.ui.cb_y.itemData(window.ui.cb_y.currentIndex()) #or
y = window.ui.cb_y.currentText()

I've had no issues with buttons, checkboxes, menus, statusbars, tree widgets etc, but can't get this to work.

Dominoes fucked around with this message at 16:20 on May 12, 2013

# ? May 11, 2013 23:14

duck monster: Dec 15, 2004

Malcolm XML posted:

SQLite is surprisingly powerful if you heed its limits: single user and your data set fits into memory. It can be much faster than MySQL (lol at embedding that in an android app!) for the stuff that Karthe wants.

Oh don't get me wrong, I adore SQLite. I use it more often than MySQL or PostgreSQL (which I've made peace with, although I still prefer MySQL's brute force), at least in development. There's a reason it's the most common database on the planet.

But it *has* got limitations that become painfully obvious when you try and force too much data in it (e.g. my attempt at fitting a large astronomy dataset into it, which promptly set the whole drat thing on fire); however, if used within its limits it's continuously surprising just how powerful it is.

# ? May 13, 2013 01:57

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

Hey y'all, I've got a stylistic/speed question regarding importing your own made classes and modules.

I'm working on a project that includes a class (let's call it class A). In order to make the rest of the file more readable and easy to edit, I want to tear this off into another file A.py.

So I get how to do that and import it into main.py. However, I am trying to determine the "best" practice of importing modules (like say numpy) into both files.

Right now, I'm doing the standard import numpy as np in both files. PEP8's guide seems to suggest that this is the way to do things. However, it seems to me that this may cause the program to go slower, and may cause unforeseen issues.

Wouldn't it be better to import numpy once and have a way to tell the imported class A to use the imports? OR is it better in the long run to have class A always have the needed imports in case I need to use class A for anything else?

Isn't this technically keeping 2 numpys open when the program runs? Eventually, this could get into N versions of numpy where N is the number of files I break this up into.

Sorry if this was a little convoluted. I'm trying to determine what's going on. :shobon:

# ? May 13, 2013 16:17

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

From a unit testing perspective, each file should contain all the imports it needs to test the classes defined in that file. Anything else is going to be a trainwreck once you start actually writing test cases.

# ? May 13, 2013 16:19

BeefofAges: Jun 5, 2004; Cry 'Havoc!', and let slip the cows of war.

Even if you import numpy twice, it's still just happening twice while loading your program. This shouldn't have any noticeable speed impact. It's not like you're reimporting numpy over and over in a loop.

# ? May 13, 2013 16:42

Hammerite: Mar 9, 2007; And you don't remember what I said here, either, but it was pompous and stupid.; Jade Ear Joe

BeefofAges posted:

Even if you import numpy twice, it's still just happening twice while loading your program. This shouldn't have any noticeable speed impact. It's not like you're reimporting numpy over and over in a loop.

Correct me if I'm wrong, but the second time aren't you just adding a name to the namespace of the module you're in at the time? That's essentially no work.

# ? May 13, 2013 16:49

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

Hammerite posted:

Correct me if I'm wrong, but the second time aren't you just adding a name to the namespace of the module you're in at the time? That's essentially no work.

That's kinda what I'm wondering. If main.py looks like:

Python code:

import numpy as np
import classA as A

# rest of the program goes here

and classA.py looks like:

Python code:

import numpy as np

class ClassA(object):
    def __init__(self, *args, **kwargs):
        pass  # etc.

What is actually going on?

is it making a numpy isolated into each file (like say a main.np and a classA.np)?

If this program gets big, there could be a lot of Numpys!

# ? May 13, 2013 17:01

Plorkyeran: Mar 22, 2007; To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

There's only ever one instance of each module.

# ? May 13, 2013 17:13

Haystack: Jan 23, 2005

If you're worried about optimization you're infinitely better off spending your time using an actual profiler.

# ? May 13, 2013 17:18

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Modules are cached in sys.modules.
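A quick way to see this for yourself, sketched with the standard library's json module standing in for numpy so it runs anywhere:

```python
import sys

import json
import json as second_name  # a second import of an already-loaded module

# Both names point at the one cached module object.
same_object = json is second_name
cached = sys.modules["json"] is json

print(same_object, cached)
```

The second import statement only looks the module up in sys.modules and binds another name to it; the module's code is not executed again.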

# ? May 13, 2013 17:18

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

OK, great, thanks guys. So it *is* just extending the namespace then, not loading a separate instance.

Haystack posted:

If you're worried about optimization you're infinitely better off spending your time using an actual profiler.

I'm sorry, could you clarify this?

# ? May 13, 2013 17:41

Haystack: Jan 23, 2005

Sorry, I was a little more curt than I should have been.

It's often more trouble than it's worth to try to optimize your code as you develop it. Developers often find that they get a lot more mileage simply developing without worrying about fiddly optimization. Towards the end, they run their code through a profiler and directly see which areas are actually bottlenecks in their code.

As it so happens Python ships with an excellent profiling tool called cProfile. You use it from the command line like this:

code:

python -m cProfile yourscript.py

See this Stack Overflow question and the official documentation for cProfile for more info.
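cProfile can also be driven from inside a script, which is handy when you only care about one function. A minimal sketch, where slow_sum is a made-up workload and not anything from JetsGuy's project:

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Toy workload: sum of squares computed the slow way."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists call counts and time per function, so the slow spots show up by name instead of by guesswork.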

That said, there's nothing wrong with trying to learn the best practices before you get in too deep with your codebase, so your original question is perfectly valid.

# ? May 13, 2013 18:09

Bunny Cuddlin: Dec 12, 2004

Haystack posted:

Optimization

I think one of the reasons new programmers get caught in this trap is that they often see discussions online where people talk about things like whether it's more efficient to use {} or dict() to instantiate an empty dictionary. Or maybe they see a didactic code snippet with a comment that explains, oh, we're doing this in a strange way because it's 20% faster. They start to think that's something every developer thinks about constantly as they write their code. What you don't realize, JetsGuy, is that these little bits and pieces of knowledge are part of a large bank of knowledge you accrete over years and years of practice. So maybe at one point you were debugging a tight inner loop and you found that using dict() was responsible for a slowdown that caused stutter in your UI and {} fixed it up. You might mention that in another code fragment that uses {} in a loop that could represent a slowdown, but it doesn't mean that every time you write a loop you pore over every single instruction you've written in every loop to see if there's a faster alternative.

So yes, people talk about these things, and once you learn about the performance implications of {} and dict() you can choose which one to use every time and it'll be faster, but these kind of optimizations are things you should learn as a result of writing lots of code, not something you should figure out before writing any code at all.

Write code that does what you need it to do in the most straightforward way possible. If it's too slow for your purposes or taste, only then should you go back and intentionally look for optimizations. Over time, after doing this many times over, you'll write more efficient code naturally.
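If you ever do want to settle one of those {} versus dict() questions, the standard library's timeit module measures it directly. A small sketch; the iteration count is arbitrary and the numbers will differ from machine to machine:

```python
import timeit

# Time each expression over the same number of runs.
runs = 200000
literal_time = timeit.timeit("{}", number=runs)
call_time = timeit.timeit("dict()", number=runs)

print("{} literal: %.4f s" % literal_time)
print("dict() call: %.4f s" % call_time)
```

Whatever the gap turns out to be, it is spread over 200,000 constructions, so it only matters if a profiler says that line is actually hot.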

# ? May 13, 2013 18:29

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

Thanks so much guys, you pretty much nailed what my mindset was. I'll try to just write what works *first*.

# ? May 13, 2013 19:52

BeefofAges: Jun 5, 2004; Cry 'Havoc!', and let slip the cows of war.

First make it work, then make it fast (if you need to), then make it pretty (if you need to).

# ? May 14, 2013 03:46

QuarkJets: Sep 8, 2008

BeefofAges posted:

First make it work, then make it fast (if you need to), then make it pretty (if you need to).

If by "pretty" you mean "readable" then shouldn't that be part of "make it work?" Most scientific programming is done in the style of "I'm going to get this to work, I don't care if it's fast or readable", and it's actually a huge problem when a change to the code needs to be made but the entire house of cards falls apart because the code has turned into a black box and no one knows what makes it work

# ? May 14, 2013 08:09

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

QuarkJets posted:

If by "pretty" you mean "readable" then shouldn't that be part of "make it work?" Most scientific programming is done in the style of "I'm going to get this to work, I don't care if it's fast or readable", and it's actually a huge problem when a change to the code needs to be made but the entire house of cards falls apart because the code has turned into a black box and no one knows what makes it work

I agree with you here, but that's just because I do scientific programming (as you apparently remember). I have in the last few years really made an effort to make my code easier to read and use for the next grad/postdoc/prof who may want to use the code. In the past, it was a long line of procedural code which could fall apart easily if someone didn't really get the code.

Now, I try to put all the "guts" into classes and methods at the top, and clearly delineate each piece of the code. It not only helps for the (rare) times I'll need to create a new class and use inheritances, but it helps make editing the code a lot easier. It makes editing the code easier for me, and makes it easier for future users to customize/edit.

Not ideal, I know, but a LOT better than the majority of the scientific code I read, which is largely comment-less code with a ridiculous methodology. I can't tell you how many times I've tried to read a colleague's code and just cried bitter tears. Of course, that was more because it was the she-bitch IDL. :v:

# ? May 14, 2013 14:59

BeefofAges: Jun 5, 2004; Cry 'Havoc!', and let slip the cows of war.

Nah, I always aim for readable. I'll take readability over performance most of the time. I meant more along the lines of code where you read it and say 'wow, that's elegant'

Speaking of scientific programming, most of the scientists I know either have never heard of version control, or have heard of it but never tried it. I think we really need to start teaching scientists software engineering.

We can probably stop this derail and talk about Python though. There's a nice 'common misconceptions' thread going on over on reddit: http://www.reddit.com/r/Python/comments/1e8xw5/common_misconceptions_in_python/

# ? May 14, 2013 16:11

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

BeefofAges posted:

Speaking of scientific programming, most of the scientists I know either have never heard of version control, or have heard of it but never tried it. I think we really need to start teaching scientists software engineering.

Yeah, in the coding horrors thread, I shocked a bunch of people a few weeks ago talking about this. Some of the younger scientists use version control, but yeah, largely it's not really used. The exception is if you're part of a huge collaboration (e.g. LIGO), where a good VCS is needed.

I've been trying to start getting into git. Largely right now I'm just using it for my plotter. I wish setting up the gui was a little more straightforward (it's not working on my system). I also am currently having difficulty figuring out how to properly visualize changes and such. The gui version would (hopefully) be better about that.

Git looks great though, aside from my being a newbie.

# ? May 14, 2013 16:49

Emacs Headroom: Aug 2, 2003

My old labmate would put data and scripts into version control, so he knew exactly what data and what computation was used to make exactly which plots. I never took it that far; I use poor man's version control (dated folders) for data and scripts, and just put the library-ish stuff in git. But yes, unit tests and vc are two things that are sadly not pushed hard enough in scientific computing.

# ? May 14, 2013 16:58

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

Emacs Headroom posted:

I use poor man's version control (dated folders) for data and scripts

This is what I do. I have a "tools" folder that has the "recent" version. Then I have a folder with the previous versions that are dated.

But yeah, I wanna use git, but it seems like git has a bit of a learning curve in visualizing the changes.

# ? May 14, 2013 17:00

Dren: Jan 5, 2001; Pillbug

What do you mean when you say "visualizing the changes"? Do you want diffs between versions?

Does this do what you want?

code:

git log --name-status

# ? May 14, 2013 17:05

JetsGuy: Sep 17, 2003; science + hockey = LASER SKATES

Dren posted:

What do you mean when you say "visualizing the changes"? Do you want diffs between versions?

Does this do what you want?
code:
git log --name-status

Yes and no. Right now, I've only been committing changes to the master, so this is ok. However, when I start determining a branch I can't see this being useful I guess?

I just downloaded SmartGit, and it may be more what I want... I'll update.

# ? May 14, 2013 17:12

Lysidas: Jul 26, 2002; John Diefenbaker is a madman who thinks he's John Diefenbaker.; Pillbug

One more non-Python post can't hurt:

Use gitk. Always. Use the --all argument if you want to see all branches, and it's virtually always a good idea to run it in the background with &. I cannot get any work done without running gitk --all & and periodically refreshing the window after I make/push some commits.

# ? May 14, 2013 17:16

Dren: Jan 5, 2001; Pillbug

JetsGuy posted:

Yes and no. Right now, I've only been committing changes to the master, so this is ok. However, when I start determining a branch I can't see this being useful I guess?

I just downloaded SmartGit, and it may be more what I want... I'll update.

Typically, git log --name-status should be enough. I'm not sure how you envision a branch workflow going but the typical pattern is that you want to go work on some feature so you branch. Then you go work on that feature for a while and all your work happens in the branch. When you're done you merge the changes back into master and delete the branch. Once you merge back, doing git log --name-status on master will give you the full history of everything, including what happened on the branch.

To bring us back to python, I liked the snippets in that reddit thread for iterating over a list using a window:

Python code:

my_list = [1, 2, 3, 4, 5]
for a, b in zip(my_list, my_list[1:]):
    print(a, b)

from http://www.reddit.com/r/Python/comments/1e8xw5/common_misconceptions_in_python/c9y4sr9

and

Python code:

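import collections
import itertools

def sliding_window(size, iterable):
    iterator = iter(iterable)
    # Pre-fill the window with the first size-1 items
    window = collections.deque(itertools.islice(iterator, size-1),
                               maxlen=size)
    for item in iterator:
        # Appending to a full deque drops the oldest item automatically
        window.append(item)
        yield tuple(window)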

code:

>>> data = [1, 2, 3, 4, 5, 6, 7]
>>> list(sliding_window(2, data))
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]

from http://www.reddit.com/r/Python/comments/1e8xw5/common_misconceptions_in_python/c9y4n85

# ? May 14, 2013 17:55

QuarkJets: Sep 8, 2008

JetsGuy posted:

Yes and no. Right now, I've only been committing changes to the master, so this is ok. However, when I start determining a branch I can't see this being useful I guess?

I just downloaded SmartGit, and it may be more what I want... I'll update.

You might not have come upon a case where branching would be terribly useful, especially if you're working by yourself. Even if you're just working in the master branch forever, that's still a big improvement over not using version control at all so long as you make regular commits

# ? May 14, 2013 19:13

Hammerite: Mar 9, 2007; And you don't remember what I said here, either, but it was pompous and stupid.; Jade Ear Joe

Is it considered poor form to use the fact that loop variables are still set after the loop?

I have to repeatedly loop over elements of a dictionary and unset an element at each iteration, but can't unset in the loop because that's not allowed. So I came up with

Python code:

while len(myDict):
    for k in myDict:
        if condition(myDict[k]):
            do_things()
            break
    else:
        raise MyCustomException('badly formed dict!')
    del myDict[k]

Here I use the fact that k is still set after the loop is broken out of.

# ? May 14, 2013 21:25

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Hammerite posted:

Is it considered poor form to use the fact that loop variables are still set after the loop?

I have to repeatedly loop over elements of a dictionary and unset an element at each iteration, but can't unset in the loop because that's not allowed. So I came up with
Python code:
while len(myDict):
    for k in myDict:
        if condition(myDict[k]):
            do_things()
            break
    else:
        raise MyCustomException('badly formed dict!')
    del myDict[k]
Here I use the fact that k is still set after the loop is broken out of.

Why are you using an inner loop when each run through the loop will execute 0 or 1 times? :raise:

Python code:

for k in list(myDict.keys()):  # snapshot the keys; required in Python 3
    if condition(myDict[k]):
        do_things()
    else:
        raise MyCustomException('badly formed dict!')

    del myDict[k]

# ? May 14, 2013 21:42

Hammerite: Mar 9, 2007; And you don't remember what I said here, either, but it was pompous and stupid.; Jade Ear Joe

Misogynist posted:

Why are you using an inner loop when each run through the loop will execute 0 or 1 times?
Python code:
for k in list(myDict.keys()):  # snapshot the keys; required in Python 3
    if condition(myDict[k]):
        do_things()
    else:
        raise MyCustomException('badly formed dict!')

    del myDict[k]

At each step in the process, at least one of the elements of the dictionary should satisfy the if clause (otherwise the dictionary shall be considered badly-formed by definition). However, I don't know which one(s).
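For what it's worth, here is a runnable sketch of that while / for-else pattern. The dependency-style data and the is_ready check are invented for illustration (the thread never says what condition() actually tests), and ValueError stands in for MyCustomException.

```python
def drain(pending, is_ready):
    """Repeatedly remove one 'ready' entry until the dict is empty.

    Returns the keys in removal order; raises ValueError if no entry
    qualifies while the dict is still non-empty.
    """
    order = []
    while pending:
        for key in pending:
            if is_ready(key, pending):
                break  # key stays bound to the entry we found
        else:
            # The for loop finished without a break: nothing qualified.
            raise ValueError("badly formed dict!")
        order.append(key)
        del pending[key]  # safe: the for loop is already finished
    return order


def is_ready(key, pending):
    # Hypothetical rule: ready once none of its dependencies remain.
    return all(dep not in pending for dep in pending[key])


deps = {"c": {"a", "b"}, "b": {"a"}, "a": set()}
order = drain(dict(deps), is_ready)
print(order)
```

The while guard guarantees the dict is non-empty, so the loop variable is always bound by the time the del runs; the deletion happens only after the for loop has ended, which is why no RuntimeError is raised.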

# ? May 14, 2013 21:51

Dren: Jan 5, 2001; Pillbug

This seems like a job for a heapsort.

# ? May 14, 2013 22:22
