Python

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Python

Dominoes: Sep 20, 2007

Sick!

# ¿ Sep 22, 2017 17:58

Dominoes: Sep 20, 2007

Hey dudes, looking for database-design/Django wisdom. I'm storing and retrieving tabular data, i.e. there are ~60 people and ~20 things to track per person. I have a table whose rows foreign-key to a person and an item, plus a few bits of info like dates. This means I'm querying ~1200 rows of this on each page load... it's bogging the server down to the point where the site takes minutes to load or times out. Is there a better way to store this type of data? I imagine I could fix it by serializing a dict of data in a text field or JSON field associated with each person (items as keys, dates etc as values), but I suspect this is frowned upon. What do you suggest?

# ¿ Sep 24, 2017 00:38

Dominoes: Sep 20, 2007

Thermopyle posted:

Yeah, you probably just need to finesse your query. Might have to use raw SQL if the ORM isn't expressive enough.

Fixed it. Apparently there's a huge performance penalty for doing:

Python code:

 [gt for gt in ground_trainings if gt.person == person and gt.item == training_item]

instead of:

Python code:

 [gt for gt in ground_trainings if gt.person_id == person.id and gt.item_id == training_item.id]

Seems like another good rule of thumb: when in doubt, use normal Python code to filter your initial query results; re-filtering or calling get on the queryset can cause extra DB pulls.
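For anyone hitting the same thing: comparing gt.person == person makes the ORM load the related Person row for each gt (unless it was prefetched, e.g. with select_related), while gt.person_id is already sitting on the row. If you're filling a people-by-items table, you can also avoid rescanning the whole list for every cell by grouping the rows once. A minimal sketch, using a namedtuple as a hypothetical stand-in for the model rows (the field names and dates are made up for illustration):

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the model rows; only the *_id fields matter here.
GroundTraining = namedtuple("GroundTraining", "person_id item_id date")

ground_trainings = [
    GroundTraining(1, 10, "2017-09-01"),
    GroundTraining(1, 11, "2017-09-03"),
    GroundTraining(2, 10, "2017-09-05"),
]

# One pass over the query results, grouped by (person, item).
by_cell = defaultdict(list)
for gt in ground_trainings:
    by_cell[(gt.person_id, gt.item_id)].append(gt)

# Each table cell is now a dict lookup instead of a scan of every row.
# .get() avoids adding empty entries for cells with no rows.
cell = by_cell.get((1, 11), [])
missing = by_cell.get((2, 11), [])
```

With ~60 people and ~20 items that's one loop over ~1200 rows rather than ~1200 loops over them.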

Dominoes fucked around with this message at 22:05 on Sep 25, 2017

# ¿ Sep 25, 2017 20:48

Dominoes: Sep 20, 2007

Thermopyle posted:

I can't really do it for you without having access to your codebase and database, but I'm 95% sure you can do that purely in the DB with the ORM or failing that raw sql. Probably be less resource intensive too.

The Django rule of thumb is don't do filtering in python when your highly-optimized database software can do it for you. Databases are built for filtering.

Though, like all rules of thumb, there's quite a bit of leeway there dependent upon your data structure, data size, and machine/network resource constraints.

How can I do this efficiently in Django query language? people is a query result.

Python code:

matching_people = (person for person in people if person.last_name.lower() == last_name and person.first_name[0].lower() == first_initial)
person = next(matching_people)

It's not this:

Python code:

person = people.get(last_name__iexact=last_name, first_name__istartswith=first_initial)

Another example:
Fast:

Python code:

matches = (i for i in items if i.pex_task_id == item['Task ID'])
item_db = next(matches)

Slow:

Python code:

item_db = items.get(pex_task_id=item['Task ID'])

Dominoes fucked around with this message at 01:21 on Sep 26, 2017

# ¿ Sep 25, 2017 22:39

Dominoes: Sep 20, 2007

Does anything in the OP look stale?

# ¿ Oct 8, 2017 11:01

Dominoes: Sep 20, 2007

Looks nice, but can't get it working due to this bug.

# ¿ Oct 18, 2017 09:27

Dominoes: Sep 20, 2007

Scipy 1.0.0 is out. Check out the highlights listed on that page. Two standouts for me:

1: Windows wheel. This means the whole Scipy stack is installable via Pip on Windows.
2: Long-overdue ODE improvement. Previously, there were two APIs: odeint, which was easy to use but limited, and ode, which was powerful but convoluted. The new API, solve_ivp, provides robust options with an odeint-like interface (yet even cleaner).
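A minimal sketch of what calling solve_ivp looks like, assuming a toy problem (exponential decay, dy/dt = -0.5 * y with y(0) = 2) picked only because it has a known exact solution to compare against; the function name and tolerances are my own choices:

```python
import numpy as np
from scipy.integrate import solve_ivp


def decay(t, y):
    """Right-hand side of dy/dt = -0.5 * y."""
    return -0.5 * y


solution = solve_ivp(
    decay,
    t_span=(0.0, 10.0),                  # integrate from t=0 to t=10
    y0=[2.0],                            # initial state, one entry per equation
    t_eval=np.linspace(0.0, 10.0, 11),   # times at which to report the solution
    rtol=1e-8,
    atol=1e-10,
)

# Compare against the exact solution y(t) = 2 * exp(-0.5 * t).
exact = 2.0 * np.exp(-0.5 * solution.t)
max_error = np.max(np.abs(solution.y[0] - exact))
```

The result object carries the times in solution.t and the states in solution.y (one row per equation), plus a solution.success flag.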

Dominoes fucked around with this message at 21:39 on Oct 26, 2017

# ¿ Oct 26, 2017 21:31

Dominoes: Sep 20, 2007

Thermopyle posted:

Kind of surprised they didn't have a wheel out already.

I haven't had to worry about pip installing stuff on Windows in a long time because so much stuff already has a wheel.

Scipy was a glaring exception until now; it had to do with Fortran compilers/licenses. Read here for details.

# ¿ Oct 26, 2017 22:03

Dominoes: Sep 20, 2007

Check out Toolz.

# ¿ Oct 27, 2017 09:10

Dominoes: Sep 20, 2007

Hey dudes: How do you actively test/work on functions in your code? This is a broad-question; I hope this context helps:

My workflow has involved editing files in PyCharm or another editor, and having a separate IPython window open with %autoreload 2 set; I import the module. To test, I save the code in the editor, and work in IPython.

I've recently tried RStudio and Spyder... Love how you can just work entirely in the IDE and run bits of/all your code etc whenever you want. This doesn't appear possible in PyCharm: The run button at the top just runs the whole thing, as if it were a standalone script. The built-in console doesn't use the Spyder/RStudio behavior I described, and doesn't work with autoreload. Is there any way to do this in PyCharm?

Dominoes fucked around with this message at 13:25 on Nov 4, 2017

# ¿ Nov 4, 2017 12:31

Dominoes: Sep 20, 2007

Dominoes posted:

Hey dudes: How do you actively test/work on functions in your code? This is a broad-question; I hope this context helps:

My workflow has involved editing files in PyCharm or another editor, and having a separate IPython window open with %autoreload 2 set; I import the module. To test, I save the code in the editor, and work in IPython.

I've recently tried RStudio and Spyder... Love how you can just work entirely in the IDE and run bits of/all your code etc whenever you want. This doesn't appear possible in PyCharm: The run button at the top just runs the whole thing, as if it were a standalone script. The built-in console doesn't use the Spyder/RStudio behavior I described, and doesn't work with autoreload. Is there any way to do this in PyCharm?

This appears to be fixed in the latest PyCharm release! Can do everything in the integrated IPython terminal.

# ¿ Nov 30, 2017 21:41

Dominoes: Sep 20, 2007

vikingstrike posted:

You've always been able to highlight code and execute it in the built in i python terminal of pycharm.

Thermopyle posted:

FWIW, I was never really clear on what behavior you were looking for.

That might be because I almost always write a unit test for whatever function. Like, I might not write the unit test first like a good TDD disciple, but if I get to the point where I'm wanting to run the function, I write a unit test to run the function and then press Ctrl-Shift-F10 to run it.

This gets me more unit tests and lets me do set up work and whatever else needs done.

I'd always have issues with dependencies when running highlighted code in console; had to select the entire function each time, or reload/reset the console each time code changed. Additionally, things like pressing the up arrow to autocomplete with history didn't work. Difficult to troubleshoot now that it's fixed!

# ¿ Dec 1, 2017 07:30

Dominoes: Sep 20, 2007

Is this summary of Pandas accurate? It's how I look at it/assess when to use it:

Wrapper for 2-D and 1-D arrays that adds labels, non-numerical indexing, different syntax, and many R-style statistical methods and filters. Can be orders of magnitude slower than the wrapped array, but if that's a problem, you can convert to an array, perform the bottlenecked calcs, then convert back to a DF/Series.
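The convert / calculate / convert-back round trip could look something like this. A rough sketch with made-up data; the column names and the normalisation step are just placeholders for whatever the bottlenecked calculation is. to_numpy() is the spelling in newer pandas versions; .values is the older one.

```python
import numpy as np
import pandas as pd

# Hypothetical data, just to have labels on both axes.
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]},
    index=["x", "y", "z"],
)

# Drop down to the wrapped array for the bottlenecked calculation.
arr = df.to_numpy()
scaled = arr / arr.sum(axis=0)   # normalise each column so it sums to 1

# Wrap the result again, reusing the original labels.
result = pd.DataFrame(scaled, index=df.index, columns=df.columns)
```

The labels don't take part in the array math, so this only works if the calculation keeps the row and column order intact.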

Dominoes fucked around with this message at 17:01 on Dec 6, 2017

# ¿ Dec 6, 2017 16:58

Dominoes: Sep 20, 2007

If you haven't tried Pipenv, give it a shot: It elegantly combines virtualenvs with pip, and has simplified my workflow. It's been hard-broken with two distinct bugs until a few days ago, but the latest version is good-to-go.

# ¿ Dec 19, 2017 19:27

Dominoes: Sep 20, 2007

Lysidas posted:

Does pipenv still insist on using the deprecated virtualenv tool on all versions of Python, instead of using the venv standard library module when it's available? That was definitely the case when I checked last, and was a no-go for me to try out pipenv.

virtualenv is listed as a dependency in setup.py, so probably.

Sockser posted:

When building out some code last week, I accidentally used {} to make an array instead of ()
e.g. arr = { 'a', 'b', 'c', }

This was working but the order was getting goofed, which led me to discover my error. Is this just generating a dictionary of only keys with None values?

You created a set, not a dict. It's like a list, but unordered and without duplicates, which is why your order was getting goofed. It's mainly used for membership tests and as an intermediate data structure to remove duplicates.
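A quick sketch of how the different brackets behave (variable names are just for illustration):

```python
as_list = ["a", "b", "c", "a"]    # square brackets: ordered, keeps duplicates
as_tuple = ("a", "b", "c", "a")   # parentheses: ordered, immutable
as_set = {"a", "b", "c", "a"}     # braces with no key: value pairs: a set
empty_dict = {}                   # empty braces are a dict, not a set

# A set has no defined order; sort it if you need a predictable one.
unique_sorted = sorted(as_set)

# To drop duplicates but keep the original order, dict keys work
# (dicts preserve insertion order in Python 3.7+).
unique_in_order = list(dict.fromkeys(as_list))
```

Use set() if you want an empty set.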

Dominoes fucked around with this message at 20:09 on Dec 20, 2017

# ¿ Dec 20, 2017 20:07

Dominoes: Sep 20, 2007

Native Windows install.

Bonus: Try pipenv.

Use powershell as your terminal, vice cmd.

Dominoes fucked around with this message at 15:12 on Jan 2, 2018

# ¿ Jan 2, 2018 10:12

Dominoes: Sep 20, 2007

Updated the OP's package and virtualenv sections to emphasize the built-in venv and pipenv, vice Anaconda and legacy tools like virtualenv and virtualenvwrapper.

# ¿ Jan 2, 2018 15:26

Dominoes: Sep 20, 2007

SurgicalOntologist posted:

That's a good use case for pandas. Something like
Python code:
pd.read_csv(filename, index_col=0).loc[keys_to_retrieve]
Edit: to elaborate, in pandas a Series/DataFrame is basically a dictionary that allows vectorized lookups.

How does this compare to builtins speed-wise? PD seems to be a minefield of slowdowns if used improperly.

# ¿ Jan 8, 2018 18:17

Dominoes: Sep 20, 2007

Indeed; the comparisons I've done have mostly been numpy -> pandas. IIRC loc itself can be an order of magnitude slower than numpy indexing.

# ¿ Jan 8, 2018 18:35

Dominoes: Sep 20, 2007

Jose Cuervo posted:

If your dataframe has 'Lat', 'Long' and 'Description' columns, then I think this is what you might be looking for:
Python code:
for idx, row in df_raw.iterrows():
    folium.Marker([row['Lat'], row['Long']], popup=row['Description']).add_to(map_1)

iterrows is very slow.

# ¿ Jan 10, 2018 16:50

Dominoes: Sep 20, 2007

Jose Cuervo posted:

Fair enough - but how would you vectorize what Seventh Arrow wants to do?

I'm not sure - without an answer to this question, I'd convert to an array, then perform equivalent operations on it. Something like this should be several thousand times faster, depending on the data etc.

He could keep the data in DF form for most uses, and use the array for iterating and/or mass-indexing.

Python code:

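# Plain NumPy array of the DataFrame's values; the index is not included.
data = df_raw.values

# Column positions in df_raw; adjust these to match your data.
lat_index = 1
lon_index = 2
description_index = 3

for row in data:
    folium.Marker([row[lat_index], row[lon_index]], popup=row[description_index]).add_to(map_1)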

Dominoes fucked around with this message at 18:19 on Jan 10, 2018

# ¿ Jan 10, 2018 18:13

Dominoes: Sep 20, 2007

I'm not sure where in the list of dicts your quoted strings are (keys? values?), but you could apply something like this to each string, in a loop or comprehension:

Python code:

import re

for lyrics in lyricses:   # Modify this loop based on where the lyrics are in your data.
    # \ escapes the brackets, which normally have a special meaning in regex. The period
    # means any char, and *? means as few of them as possible, so each bracketed phrase
    # is matched on its own. The parentheses specify what to return, ie the text between
    # brackets. re.findall returns a list of those; it's empty if there's no bracketed text.
    bracketed_phrases = re.findall(r'\[(.*?)\]', lyrics)
    for phrase in bracketed_phrases:
        # Code here to replace the text will vary depending on where these lyrics live in your structure.
        # You could apply x.replace('[', '') etc, or x[1:-1] to the orig phrase,
        # or replace the original phrase with phrase, which is now trimmed.
        print(phrase)  # Placeholder so the loop body is valid.

It uses a mini-language called Regex, which is for pattern-matching in strings. The code above will probably fail if the bracket format is inconsistent, eg mismatched brackets.
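One detail worth checking with this kind of pattern is greedy vs lazy matching when a string has more than one bracketed phrase. A small sketch on a made-up lyrics string:

```python
import re

lyrics = "[Chorus] la la la [Verse 2] do re mi"

# Greedy: .* grabs as much as it can, so it runs from the first [ to the last ].
greedy = re.findall(r"\[(.*)\]", lyrics)

# Lazy: .*? stops at the first ] it can, so each phrase is matched separately.
lazy = re.findall(r"\[(.*?)\]", lyrics)

# re.sub removes every bracketed phrase, brackets included, in one call.
stripped = re.sub(r"\[.*?\]", "", lyrics)

# A bracket that's never closed simply doesn't match.
unclosed = re.findall(r"\[(.*?)\]", "oops [unclosed")
```

Here greedy comes back as a single merged match, while lazy gives the two phrases separately.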

Dominoes fucked around with this message at 18:35 on Jan 25, 2018

# ¿ Jan 25, 2018 18:26

Dominoes: Sep 20, 2007

Python code:

import re

for song in songs:
    # This regex is modified from the example above to include the brackets in the groups.
    # *? keeps each match as short as possible, so separate phrases aren't merged.
    # re.findall returns an empty list if there's no bracketed text.
    bracketed_phrases = re.findall(r'(\[.*?\])', song['lyrics'])
    for phrase in bracketed_phrases:
        # Strings are immutable, so assign the result of replace back.
        song['lyrics'] = song['lyrics'].replace(phrase, '')

# ¿ Jan 25, 2018 18:53

Dominoes: Sep 20, 2007

map is a function that applies another function to all items of an iterable (such as a list).

# ¿ Jan 25, 2018 20:06

Dominoes: Sep 20, 2007

Love it. Mutable too; a reason that sometimes keeps me from using namedtuples.

# ¿ Jan 25, 2018 21:03

Dominoes: Sep 20, 2007

fantastic in plastic posted:

Okay, thanks. Is there anything more to it than meets the eye? It looks like I just do something like
code:
map(lambda x: x**2, [1, 2, 3])
and that's all there is to it.

That's the idea. I find that if you have to define an anonymous function within the map, it's usually cleaner to use a comprehension instead; map is nice when you already have a function, or are using a builtin/something from a library.
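Side by side, as a small sketch (the lists are arbitrary examples):

```python
numbers = [1, 2, 3]

# With a lambda, map works but reads a bit noisy...
squares_map = list(map(lambda x: x**2, numbers))
# ...and the comprehension says the same thing more directly.
squares_comp = [x**2 for x in numbers]

# When the function already exists, map is nice and compact.
words = ["spam", "eggs"]
lengths = list(map(len, words))
shouted = list(map(str.upper, words))
```

In Python 3, map returns a lazy iterator, hence the list() calls.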

# ¿ Jan 25, 2018 21:05

Dominoes: Sep 20, 2007

Python: fly with two words. Rust: Try to manipulate strings and end up with cows borrowed from strange places.

# ¿ Mar 17, 2018 01:28

Dominoes: Sep 20, 2007

Same; Django had a steep learning curve, but for me, at least, it's easier to set up a Django project than try to make Flask plugins play well together. Most of the projects I've worked on involve making a website or REST api, so this may not be applicable to other uses.

# ¿ Mar 26, 2018 18:27

Dominoes: Sep 20, 2007

Thermopyle posted:

FWIW, Django probably had a steep learning curve to you because you were also (implicitly) learning how web application servers worked.

Nailed it. Also trying to get Chef/Vagrant working per the tutorial I used; never sorted that out!

The March Hare posted:

Django really isn't that hard, I've never totally understood people saying that it is. It has some up-front default config that you have to figure out but the documentation is pretty solid and none of the concepts in Django are novel.

It has an extensive API, but the official tutorial's good about pointing to the necessities. Your point about the default config makes sense too; you need to make some (I imagine fairly standard) changes to settings.py to get started. (eg allowed hosts; templates and static files; database; adding your app to INSTALLED_APPS, dealing with CSRF/CORS if you're using DRF)

There's a lot of specific syntax to memorize, but most of it's provided by the tutorial.

Dominoes fucked around with this message at 19:43 on Mar 26, 2018

# ¿ Mar 26, 2018 19:35

Dominoes: Sep 20, 2007

The March Hare posted:

I think models are fairly reasonable (especially now that you don't need an outside library for migrations)

That was another stumbling block from earlier Django versions - "You mean I can't change my database unless I install an addon, and when I do, it's a gamble if my site will work again without resetting the DB?"

# ¿ Mar 26, 2018 21:02

Dominoes: Sep 20, 2007

Use the Windows native installer.

# ¿ Apr 28, 2018 00:38

Dominoes: Sep 20, 2007

That'll work too. The reason I recommended official over Anaconda is I prefer to avoid having multiple package management tools; using pipenv for everything is nice.

# ¿ Apr 28, 2018 13:59

Dominoes: Sep 20, 2007

Thermopyle posted:

Unfortunately, the whole setup.py vs Pipfile is still as murky of a mess as requirements.txt vs setup.py is.

Thankfully, Pipfile vs requirements.txt is clear-cut; the former replaces the latter. Any thoughts on why setup.py's install_requires can't be removed in lieu of Pipfile? I assume that's what you're referring to. Ie, what's the limiting factor?

Npm-inspired tools with lockfiles like yarn, cargo, and pipenv are delightful.

Dominoes fucked around with this message at 20:13 on Apr 28, 2018

# ¿ Apr 28, 2018 20:09

Dominoes: Sep 20, 2007

New Diffeq solver; bindings to a Julia library.

Haven't tried it yet, but this appears to overcome the limitations I've run into when trying to solve non-trivial systems with scipy.integrate. Both the native Julia version and its transparent Python bindings look nicer than any solver suite I've seen.

# ¿ May 2, 2018 03:42

Dominoes: Sep 20, 2007

Is it possible to override Python's built-in error handling with a third-party module import? I'd like to improve them to be more like Rust's, where it makes educated guesses about what you did wrong, and what the fix is. (eg, rather than just raise an AttributeError or NameError, point out a best guess of the attribute or variable you misspelled.)
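One hook that exists for this is sys.excepthook, which Python calls for any uncaught exception. Here's a rough sketch of the idea for NameError only, assuming the standard "name 'x' is not defined" message format; guess_name and hinting_excepthook are names I made up, and difflib does the fuzzy matching. (Python 3.10 and later print "Did you mean" suggestions for NameError and AttributeError on their own.)

```python
import builtins
import difflib
import re
import sys


def guess_name(missing, candidates):
    """Return the closest spelling to `missing` among `candidates`, or None."""
    matches = difflib.get_close_matches(missing, list(candidates), n=1)
    return matches[0] if matches else None


def hinting_excepthook(exc_type, exc, tb):
    """Print the normal traceback, then a spelling hint for NameErrors."""
    sys.__excepthook__(exc_type, exc, tb)
    if issubclass(exc_type, NameError) and tb is not None:
        found = re.search(r"name '(\w+)' is not defined", str(exc))
        if found:
            # Walk to the innermost frame, where the bad name was used.
            while tb.tb_next is not None:
                tb = tb.tb_next
            frame = tb.tb_frame
            known = set(frame.f_locals) | set(frame.f_globals) | set(dir(builtins))
            hint = guess_name(found.group(1), known)
            if hint:
                print(f"hint: did you mean '{hint}'?", file=sys.stderr)


# Installing the hook; a third-party module could do this at import time.
sys.excepthook = hinting_excepthook
```

This only affects how uncaught exceptions are reported; it doesn't change which exceptions get raised, and interactive shells like IPython use their own display hooks.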

Dominoes fucked around with this message at 03:51 on May 7, 2018

# ¿ May 7, 2018 03:47

Dominoes: Sep 20, 2007

Sweet

# ¿ May 8, 2018 19:35

Dominoes: Sep 20, 2007

Here's a related Reddit thread, peppered with aggression.

Dominoes fucked around with this message at 22:53 on May 14, 2018

# ¿ May 14, 2018 22:51

Dominoes: Sep 20, 2007

I'm firefrommoonlight there; some dude got passive-agg with me for no reason, KR took things personally, and the Pendulum/Poetry guy threw in some smugness.

Dominoes fucked around with this message at 01:17 on May 15, 2018

# ¿ May 15, 2018 01:15

Dominoes: Sep 20, 2007

Thermopyle posted:

I really like how pipenv manages your virtualenvs for you.

I really do not like how slow it is at locking dependencies.

Same on both, as I posted in the reddit thread. It's been a gamechanger for me.

# ¿ May 15, 2018 02:15

Dominoes: Sep 20, 2007

SurgicalOntologist posted:

Should I switch?

You should try it.

Bash code:

pip install pipenv
cd ~/myproject
pipenv install

Then you can

Bash code:

pipenv shell

from inside your proj directory to activate the environment

or

Bash code:

pipenv run command

to run a single command.

# ¿ May 15, 2018 03:26
