Python information and short questions megathread.


vikingstrike: Sep 23, 2007; whats happening, captain

What do you mean the way it interprets indentations sucks rear end? As long as you are consistent, it shouldn't matter. Just set it in your text editor and forget about it.

# ¿ Jun 7, 2012 13:41

vikingstrike: Sep 23, 2007; whats happening, captain

Soft tabs forever.

# ¿ Jun 7, 2012 14:56

vikingstrike: Sep 23, 2007; whats happening, captain

There are plenty of people who use Linux for development. In fact, I think it's easier and more straightforward to install modules on Linux than on other OSes. To each their own.

This will probably help you:
https://svn.enthought.com/enthought/wiki/Install/ubuntu

edit: If you're unsure what to do with an enthought.sh file, your frustration with Linux is just a lack of familiarity with the environment it seems. .sh is the file extension of a shell script. All you should have needed to do is run 'sudo sh enthought.sh'. The script itself probably downloaded/installed things, but .sh itself is not a zip file, like you referred to.

Also, your linux distro should come with python out of the box. All you should have had to do is run 'python' from a terminal to go into the interactive shell. Like the last guy said, just use apt-get for most things.

vikingstrike fucked around with this message at 20:53 on Jun 7, 2012

# ¿ Jun 7, 2012 20:49

vikingstrike: Sep 23, 2007; whats happening, captain

If you're coming from Mathematica, I would recommend reading up on both NumPy and SciPy.

# ¿ Jun 12, 2012 13:21

vikingstrike: Sep 23, 2007; whats happening, captain

So, trying out Scientific Linux as a replacement for Fedora. Just got NumPy and SciPy built from source. Running tests, numpy.test() checks out fine, no failures, but scipy.test() returns 64 failures. Most of them look like they are raising AssertionError(), like the one below:

code:

FAIL: test_arpack.test_symmetric_modes(True, <gen-symmetric>, 'd', 2, 'SA', None, 0.5, <function asarray at 0x17b5d70>, None, 'cayley')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/nose-1.1.2-py2.6.egg/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/lib64/python2.6/site-packages/scipy/sparse/linalg/eigen/arpack/tests/test_arpack.py", line 249, in eval_evec
    assert_allclose(LHS, RHS, rtol=rtol, atol=atol, err_msg=err)
  File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line 1213, in assert_allclose
    verbose=verbose, header=header)
  File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line 677, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=4.44089e-13, atol=4.44089e-13
error for eigsh:general, typ=d, which=SA, sigma=0.5, mattype=asarray, OPpart=None, mode=cayley
(mismatch 100.0%)
 x: array([[ -4.37654593e+01,   1.93569050e-02],
       [ -6.76756743e+01,   1.10531582e-01],
       [ -3.92194572e+01,   1.32235717e-01],...
 y: array([[-6.99800566,  0.01935691],
       [-6.57812868,  0.11053158],
       [-3.77954962,  0.13223572],...

Anyone familiar with this? Is this something I should be worried about?

# ¿ Jun 13, 2012 21:10

vikingstrike: Sep 23, 2007; whats happening, captain

Thanks for the suggestions, guys! I am going to look into that repo you linked to, to see if I can get this sorted out.

As for why I switched (I haven't yet, totally): I've been using Fedora for a couple of years as a dev environment/demo lab (I use OS X as my main desktop OS) and recently heard about SL. Since I was using it in a VM, I was interested to see if I saw any performance boost compared to Fedora and to get some Gnome 2 love. I'm still getting a feel for it, but it's been fine so far. The biggest difference is the lack of packages that are easy to find right off the bat for what I usually need (like right now, finding the right repo to look at).

What do you enjoy about Fedora that SL lacks?

edit: that repo worked, thanks for the heads up!

vikingstrike fucked around with this message at 03:21 on Jun 14, 2012

# ¿ Jun 14, 2012 01:37

vikingstrike: Sep 23, 2007; whats happening, captain

I've never solved that method in python, but that seems like a pretty great beginner exercise. Can you post up what you have and we can help from there?

# ¿ Jul 9, 2012 01:42

vikingstrike: Sep 23, 2007; whats happening, captain

On OS X, package management is kind of a bitch, and using the system's default Python installation can cause you trouble. Most people recommend using MacPorts or Homebrew to get a more Unix-like way of managing your Python stuff. That keeps things more portable during upgrades, and you don't have to worry about Apple steamrolling your installations. MacPorts and Homebrew both have their pluses and minuses, but that's a topic for another time.

# ¿ Jul 30, 2012 16:44

vikingstrike: Sep 23, 2007; whats happening, captain

Can you post the error? Have you hosed with your PYTHONPATH recently?

# ¿ Aug 21, 2012 03:19

vikingstrike: Sep 23, 2007; whats happening, captain

Sockser posted:

welp

added
export PYTHONPATH=~/Library/Frameworks:$PYTHONPATH
to my .bash_profile

now it works. Cool.

I really hate OSX changing directory structure every release.

It's annoying when you don't know about it and they change it suddenly, but I think moving that folder into the user's home folder is a smart move.

Although, like Emacs Headroom said, using MacPorts or Homebrew is usually a lot less painful.

# ¿ Aug 21, 2012 03:50

vikingstrike: Sep 23, 2007; whats happening, captain

JetsGuy posted:

I need to get better at remembering PEP8.

That code I posted above is a pretty fair approximation of how I stylistically code, for whatever that's worth.

I code similarly. I think it's just carry over from Matlab warping my brain.

# ¿ Feb 14, 2013 19:25

vikingstrike: Sep 23, 2007; whats happening, captain

Slightly expanding what you've done, you can always use [A-Za-z].

# ¿ Jan 23, 2014 20:11

vikingstrike: Sep 23, 2007; whats happening, captain

Pollyanna posted:

Regular expressions hurt my brain. I want to replace any case instance of the letter T (t or T) with the appropriate case instance of U (e.g. u for t, U for T). I'm trying to make a regex for this:
code:
import re

pattern = re.compile(re.escape('t'), re.IGNORECASE)

s = 'ACGTACGTACGTACGTACGT'

print pattern.sub('u', s)

>> ACGuACGuACGuACGuACGu
...but as you can see, it doesn't replace T with U, instead it's with u. How do I make it do that?

Is this what you're looking for? There may be a way to combine this into one go, but this will work

code:

In [1]: import re

In [2]: s = 'ACGTACGTACGTACGTACGT'

In [3]: re.sub('T', 'U', s)
Out[3]: 'ACGUACGUACGUACGUACGU'

In [5]: s = 'ACGTACGTACGtACGtACGT'

In [6]: re.sub('T', 'U', s)
Out[6]: 'ACGUACGUACGtACGtACGU'

In [7]: re.sub('t', 'u', s)
Out[7]: 'ACGTACGTACGuACGuACGT'
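
For the "one go" version: re.sub also accepts a function as the replacement, so a single pass can look each match up in a small table. This is only a minimal sketch, assuming Python 3; the names SWAP and to_rna are made up for illustration.

```python
import re

# Lookup table mapping each matched character to its replacement.
SWAP = {"T": "U", "t": "u"}

def to_rna(seq):
    """Replace T with U and t with u in one pass, preserving case."""
    return re.sub(r"[Tt]", lambda m: SWAP[m.group(0)], seq)

print(to_rna("ACGTACGtACGT"))
```

For a plain character-for-character swap like this, str.translate with str.maketrans("Tt", "Uu") should do the same job without a regex.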

# ¿ Feb 27, 2014 00:46

vikingstrike: Sep 23, 2007; whats happening, captain

Write a function in whatever language you want to do the "speak" part and then use os.system to call it. With this approach you would use python to generate the arguments to pass to your other function.
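
A rough sketch of that idea, assuming Python 3.7 or newer. It uses subprocess.run rather than os.system so the arguments can be passed as a list instead of one shell string. The external "speak" program is unknown here, so a child Python process that just echoes its arguments stands in for it, and build_args is a hypothetical helper.

```python
import subprocess
import sys

def build_args(words):
    """Generate the argument list in Python (hypothetical helper)."""
    return [w.upper() for w in words]

def speak(words):
    # Stand-in for the real external program: a child Python process
    # that prints its arguments joined by spaces.
    cmd = [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"]
    result = subprocess.run(cmd + build_args(words),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

print(speak(["hello", "world"]))
```

Swapping the cmd list for the path to the real program is the only change that should be needed.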

# ¿ Mar 17, 2014 22:22

vikingstrike: Sep 23, 2007; whats happening, captain

lufty posted:

I'm still getting an "expected indentation block" error

Python code:

#Dice Roller
import random #imports the random feature, allows randomisation of given values

def RollRepeat():
    UserInput = input("Do you want to try again? Type Yes or No.\n")
    if UserInput == "Yes" or "yes":
        return

def RollFunc():
    DiceRoll = input("Which sided dice would you like to roll, a 4, 6 or 12 sided dice?\n") #asks the user for their choice of dice

    if DiceRoll in ("4", "four", "Four"):
    FourVar = random.randint (1, 4)
    print("You have rolled a four sided die, resulting in", FourVar".")

    elif DiceRoll in ("6", "six", "Six"):
    SixVar = random.randint (1, 6)
    print("You have rolled a six sided die, resulting in", SixVar".")

    elif DiceRoll in ("12", "twelve", "Twelve"):
    TwelveVar = random.randint (1, 12)
    print("You have rolled a twelve sided die, resulting in", TwelveVar".")

    else:
    print("Not a valid input.")
    RollFunc()

while True:
    RollFunc()
    RollRepeat()

You need to fix that inner function, at the very least:

code:

# Assumes "import random" at the top of the script, as in your post.
def RollFunc():
    DiceRoll = input("Which sided dice would you like to roll, a 4, 6 or 12 sided dice?\n") #asks the user for their choice of dice

    # Everything under an if/elif/else has to be indented one more level.
    if DiceRoll in ("4", "four", "Four"):
        FourVar = random.randint(1, 4)
        # The original was also missing an operator before the ".".
        print("You have rolled a four sided die, resulting in", str(FourVar) + ".")

    elif DiceRoll in ("6", "six", "Six"):
        SixVar = random.randint(1, 6)
        print("You have rolled a six sided die, resulting in", str(SixVar) + ".")

    elif DiceRoll in ("12", "twelve", "Twelve"):
        TwelveVar = random.randint(1, 12)
        print("You have rolled a twelve sided die, resulting in", str(TwelveVar) + ".")

    else:
        print("Not a valid input.")
        RollFunc()

# ¿ Apr 7, 2014 14:17

vikingstrike: Sep 23, 2007; whats happening, captain

Install pycharm on the remote server and forward it over X? There are also commands like rmate (TextMate) and rsub (Sublime Text) that let you locally edit remote files through the respective applications.

Another option is to set up a Dropbox folder so syncs are handled automatically and then just keep an ssh session open to the remote server for execution.

# ¿ Apr 23, 2014 15:50

vikingstrike: Sep 23, 2007; whats happening, captain

Munkeymon posted:

The only one I can find is for an Index, but I only started using Pandas on Tuesday, so my knowledge and understanding of its capabilities is very much lacking

Here are the docs for a Series sort: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sort.html?highlight=sort#pandas.Series.sort
and the DataFrame sort: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort.html?highlight=sort#pandas.DataFrame.sort

If your DataFrame is df, then just use df.sort(['var1', 'var2']). For Series, it should just be s.sort().
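
Note for anyone reading this on a newer pandas: as far as I know, DataFrame.sort and Series.sort were later removed in favor of sort_values. A small sketch, assuming a current pandas install and made-up data in the var1/var2 columns:

```python
import pandas as pd

df = pd.DataFrame({"var1": [2, 1, 2], "var2": [3, 9, 1]})

# Sort by var1, breaking ties with var2; returns a new DataFrame.
sorted_df = df.sort_values(["var1", "var2"])

s = pd.Series([3, 1, 2])
sorted_s = s.sort_values()  # also returns a new Series

print(sorted_df)
print(sorted_s.tolist())
```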

# ¿ May 2, 2014 16:46

vikingstrike: Sep 23, 2007; whats happening, captain

There are a variety of ways to handle working with multiple files at once. One approach would be to use f.readlines() to store the whole file in memory where each row is an entry in a list. Just kind of depends on how you want to approach things. If you wanted to use the setup you have, you can use the CSV reader to process each row and store the results you need somewhere (a list, e.g.) that can be used with the fuzzy library.
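
As a rough illustration of the CSV reader approach, assuming Python 3: the file contents and column names below are invented, and io.StringIO stands in for a real open file.

```python
import csv
import io

# Stand-in for an open file; with a real file use open(path, newline="").
fake_file = io.StringIO("name,city\nAlice,Boston\nBob,Denver\n")

names = []
reader = csv.reader(fake_file)
header = next(reader)          # skip the header row
for row in reader:
    names.append(row[0])       # keep only the column you need

print(names)
```

The resulting list is then what you would hand to whatever fuzzy matching library you are using.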

# ¿ May 4, 2014 04:03

vikingstrike: Sep 23, 2007; whats happening, captain

the posted:

I have a list of 176,876 Ids (strings), and I am trying to find an easy way to group them by occurrences (so I can plot it later). Is this something that Pandas can do?

I've loaded them into a DataFrame. Running something like df.pandas.groupby('ID').group() seems to sort of do what I want (printing out every string and the list locations where it occurs), but I'm really looking for something like:
code:
       ID : Occurrence
342000XBB : 37
200333XCC : 31
342203CBB : 17
edit: I think df.groupby('Id').size() solved the problem!

Second edit: Importing the list into a dataframe, I ended up having to write the list to a CSV file and then load it into the dataframe. I couldn't find a way to load a list into a dataframe directly even after reading the Pandas documentation. Did I miss something?

For your second edit, what exactly do you mean? Where is the list coming from? What have you tried up to this point?

# ¿ Jul 23, 2014 20:43

vikingstrike: Sep 23, 2007; whats happening, captain

the posted:

I grabbed some information from an SQL query and imported it into a list object. So I have something like:

alist = [['0024242'],['34234234'],['2342341']...] And so on

I tried making it a Series object in Pandas and then "doing stuff" with it, but that didn't work out. My end goal was to do what I did above, which was just to sort the list and find the counts. Once I found out I could do this in a dataframe, I wanted to put this list in a dataframe, but Pandas doesn't appear to support going from a list to a dataframe. The only way I knew how to make a dataframe was with a read_csv command, so I wrote the list to a csv file and then read it back into a dataframe using Pandas.

Make that a plain list, not a list of lists with one element. You could do something like [x[0] for x in alist]. Then just make it a Series and use value_counts() to get what you need. If you don't have another dimension to the data, you probably don't need a DataFrame. Here are the docs for this method:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html
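
A minimal sketch of those steps, assuming pandas is installed. The IDs are sample values in the same shape as the query result, with a few repeats added so the counts are visible.

```python
import pandas as pd

# Shape of the query result as described: one-element lists.
alist = [["0024242"], ["34234234"], ["0024242"], ["2342341"], ["0024242"]]

flat = [x[0] for x in alist]             # plain list of strings
counts = pd.Series(flat).value_counts()  # ID -> occurrences, most common first

# A list can also go straight into a DataFrame, no CSV round trip needed.
df = pd.DataFrame({"Id": flat})

print(counts)
```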

# ¿ Jul 23, 2014 22:30

vikingstrike: Sep 23, 2007; whats happening, captain

This is exactly the stuff that pandas and DataFrames are great for. You don't need to worry about dictionaries or whatever. If, for example, you need all of the "Collies", then you just use logical indexes and grab the information you need.

# ¿ Jul 25, 2014 19:42

vikingstrike: Sep 23, 2007; whats happening, captain

the posted:

The module grabs queries 500 entries at a time. There are 8862. How do I store/append them to the dataframe during this process?

And, if you can answer this, you will make my life:

Is there a way, using Pandas, that I could return (in this example) every Breed that has exactly one Adoption Info attached to it (eliminating Maltese and Boxer from that list)?

Do you even look at the docs? Here's how you can concatenate DataFrames together: http://pandas.pydata.org/pandas-docs/stable/merging.html

For your second issue, you want to think about combining what you asked previously about getting group sizes and then using logical indexing: http://pandas.pydata.org/pandas-docs/stable/indexing.html
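
Putting the two pieces together might look roughly like this. It is a sketch with invented data: the two small frames stand in for successive 500-row query batches, and the column names are taken from your description.

```python
import pandas as pd

# Two made-up batches standing in for successive 500-row query results.
batches = [
    pd.DataFrame({"Breed": ["Collie", "Maltese"], "Adoption Info": ["a", "b"]}),
    pd.DataFrame({"Breed": ["Maltese", "Boxer", "Boxer", "Pug"],
                  "Adoption Info": ["c", "d", "e", "f"]}),
]

# Collect the pieces in a list and concatenate once at the end.
df = pd.concat(batches, ignore_index=True)

# Group sizes, then a boolean (logical) index to keep sizes equal to 1.
sizes = df.groupby("Breed").size()
single = sizes[sizes == 1].index.tolist()

print(single)
```

Concatenating once at the end tends to be cheaper than growing the DataFrame inside the loop.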

# ¿ Jul 25, 2014 19:58

vikingstrike: Sep 23, 2007; whats happening, captain

JayTee posted:

In my mission to never use R again I've been learning matplotlib and pandas. Here's something i haven't been able to work out. Say I make a graph like this:
Python code:
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np

df = pd.DataFrame({'type':['a','b','a'], 'fart':[1, 2, 3], 'ffff':[4,5, 6]})
g = df.groupby(['type'])
ga = g.aggregate(np.sum)
ga['fart'].plot(kind = 'bar')
plt.show()
How do I alter the tick labels on the graph? The actual text and things like font size. I've been looking for where this information is kept so I can pull it out and alter it but with no success. And I'm sure I can pass labels using kwds but that ends up being way more hassle than it sounds with what I want to do.

The first answer in this Stack Overflow post should get you started with the labels and show you how to interface with the axis more generally: http://stackoverflow.com/questions/11244514/matplotlib-modify-tick-label-text

And here is the documentation on the different ways you can interact with an axis: http://matplotlib.org/api/axis_api.html

# ¿ Jul 31, 2014 14:43

vikingstrike: Sep 23, 2007; whats happening, captain

JayTee posted:

I disregarded that first answer when I came across it yesterday since it said it didn't work on later versions. In case this helps anyone in the future, to get the actual labels for my chart out I had to do this:
code:
In[91]: an_axes = ga['fart'].plot(kind = 'bar')
In[92]: labels = [item.get_text() for item in an_axes.get_xticklabels()]
In[93]: labels
Out[90]: ['a', 'b']
Cheers!

Sorry about that. I glanced quickly and it meshed with what I had used in the past. Glad you got it working.

# ¿ Jul 31, 2014 15:49

vikingstrike: Sep 23, 2007; whats happening, captain

the posted:

I'm grabbing some items from a query and importing them into a pandas dataframe.

It looks like this:
code:
      Amount   CloseDate Id         type
0    5079.00  2013-09-30     Opportunity
1   58050.00  2013-11-04     Opportunity
2    6783.00  2014-01-02     Opportunity
3    7280.00  2013-12-04     Opportunity
I'm trying to plot with the x axis as the date, but I can't figure out what that's called.

I've tried:

' CloseDate Id'
'CloseDate Id'
' CloseDate'
'CloseDate'

Have you tried df['CloseDate Id']?

# ¿ Aug 5, 2014 17:58

vikingstrike: Sep 23, 2007; whats happening, captain

Ahz posted:

Dropping in to say that the Pandas documentation sucks for a stats newb. I finally got my groups and filters working for my main reports, but does anyone know a good Pandas for dummies article/site out there? I find the syntax to Panda's filtering and grouping methods highly unintuitive. For example, this is the method I found works on filtering by row value:
Python code:
data_frame = data_frame.loc[data_frame[column_name] >= low_value]

Wes McKinney, the creator of pandas, has a book called Python for Data Analysis that goes through a bunch of stuff with pandas and matplotlib that may be helpful.

# ¿ Aug 5, 2014 20:18

vikingstrike: Sep 23, 2007; whats happening, captain

the posted:

I'm guessing this is possible, but I need to know how.

I want to drop rows in a dataframe where the value of said row for Column Y meets a condition (say if that cell = 0). An example would be dropping all rows from a Census data set where the person's age was below 5.

The Pandas documentation on the drop() command is... sparse. From searching, I've seen ways to drop rows based on naming the specific row, but I need a conditional statement here.

edit: This solved it:
code:
l = l[l.Amount != 0]

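drop() deletes rows or columns by label, and it's mostly used for removing variables (columns) from a data frame. For dropping rows on a condition, boolean indexing like your edit is the way to go.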
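
To make the difference concrete, here is a small sketch with invented data, assuming a reasonably recent pandas:

```python
import pandas as pd

df = pd.DataFrame({"Amount": [0, 5079.0, 0, 6783.0], "type": ["x"] * 4})

# Boolean indexing keeps only the rows where the condition is True.
nonzero = df[df["Amount"] != 0]

# drop() removes things by label, here a column (a "variable").
no_type = df.drop("type", axis=1)

# drop() can also take row labels, e.g. the index of the matching rows.
also_nonzero = df.drop(df[df["Amount"] == 0].index)

print(nonzero)
```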

# ¿ Aug 6, 2014 18:12

vikingstrike: Sep 23, 2007; whats happening, captain

the posted:

My entire list of names is 1214

When I ran my first test with this list, I got 940 men and 732 women, or 1672 names, meaning I'm getting duplicates somewhere (i.e both male and female name).

I tried running a test to print out the duplicates, but I don't think this is working:
code:
for i in match_list:
	m = False
	f = False
	for j in mn:
		if string.lower(i[2]) == string.lower(j[0]):
			m = True
			males.append([i[1],''])
	for k in fn:
		if string.lower(i[2]) == string.lower(k[0]):
			f = True 
			females.append([i[1],''])
	if m == True and f == True:
		print i[2]
It looks like it's just spitting out every single name, and I can't figure out why.

edit: I think I figured out the problem. Turns out that people tend to have a lot of common names in both genders. "William" was showing up in the womens list, albeit way down at the bottom. I decided to only look at the top 500 male/female names instead. Hopefully it helps.

edit: Yep, ended up getting 865 males and 198 females

Python has a built in "set" type that you can use to find intersections, unions, etc. if you ever need to do this again in the future.
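
A quick sketch of what that looks like. The name lists are invented; in your script they would be built from mn and fn.

```python
# Made-up name lists standing in for the male and female name files.
male_names = ["william", "james", "jordan", "taylor"]
female_names = ["mary", "jordan", "taylor", "william"]

males = set(male_names)
females = set(female_names)

both = males & females         # intersection: names on both lists
either = males | females       # union: every distinct name
male_only = males - females    # difference: names only on the male list

print(sorted(both))
```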

# ¿ Aug 7, 2014 18:57

vikingstrike: Sep 23, 2007; whats happening, captain

Try "pip install beatboxxx". Looks like you downloaded the source code, which you don't really need.

# ¿ Aug 10, 2014 03:42

vikingstrike: Sep 23, 2007; whats happening, captain

the posted:

That worked, thanks. Any idea why the conda command didn't? Their official docs say that you can install a local package using that command.

Not too familiar with anaconda but I don't think you can pass archives into it like that. You downloaded the source code when all you need is to tell it the module to install. See if "conda install beatboxxx" works.

Here is the Pypi page: https://pypi.python.org/pypi/beatboxxx/21.5 Pip downloads the file you did as part of the installation process. If you can find the module on Pypi you can install and upgrade through pip.

# ¿ Aug 10, 2014 03:45

vikingstrike: Sep 23, 2007; whats happening, captain

EricBauman posted:

This is all somewhat babby's first python, but can someone tell me why
code:
import time
import urllib
countdown = 200

while countdown > 0 :
    for url in open('listoflinks.txt') :
        
       filename = str(countdown) + '.html'
       urllib.urlretrieve(url, filename)    
       
       time.sleep(3)
       countdown = countdown - 1
    print "This prints once every 3 seconds - ", countdown   
doesn't print that message and the countdown every three seconds?
I'm pretty sure I indented it all correctly.

The print statement is outside of the for loop, so it will only execute after every URL in "listoflinks.txt" has been processed. Indent the print statement one more level if you want it to print after each URL is processed.

# ¿ Aug 27, 2014 18:40

vikingstrike: Sep 23, 2007; whats happening, captain

FoiledAgain posted:

edit: Do you even need a while-loop? How many times are you going to loop through that list of urls anyway? If only once, you don't need the while.

Yeah, I was about to post this. If you just need to run through the list once, then you just need the for loop there.
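
Something like the following, as a sketch in Python 3. The fetch function is a hypothetical stand-in for urllib.urlretrieve so nothing is actually downloaded, the URLs are invented, and the counter counts up instead of down.

```python
import time

def fetch(url, filename):
    """Hypothetical stand-in for urllib.urlretrieve(url, filename)."""
    return (url, filename)

# Lines as they would come out of listoflinks.txt, newline included.
urls = ["http://example.com/a\n", "http://example.com/b\n"]

saved = []
for count, url in enumerate(urls, start=1):
    filename = str(count) + ".html"
    saved.append(fetch(url.strip(), filename))  # strip the trailing newline
    time.sleep(0)  # use time.sleep(3) for a real delay between requests
    print("Fetched", count, "of", len(urls))    # inside the loop: prints per URL
```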

# ¿ Aug 27, 2014 19:39

vikingstrike: Sep 23, 2007; whats happening, captain

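I haven't tested this code, but something like this should work:

code:

# Keep rows installed via pip that have a non-missing installer_version.
df = df[(df['installer_type'] == "pip") & (df['installer_version'].notnull())]
df = df.drop(['installer_type'], axis=1)
# Total the 'count' column for each installer version.
version_counts = df.groupby(['installer_version'])['count'].sum()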

# ¿ Sep 8, 2014 20:23

vikingstrike: Sep 23, 2007; whats happening, captain

Thermopyle posted:

PyCharm 4 is out.

iPython Notebook integration

NumPy array viewer

NumPy code insight

matplotlib support in integrated console

BDD via support for behave and lettuce

debugger can be attached to any running python process

debugger works for Jinja2 templates

blah blah blah

oh also, everything from WebStorm 9

Definitely going to check out this update. The notebook integration and the NumPy array viewer sound great. Curious if the array viewer is able to peek at pandas DataFrames (doubt it, but it would be a nice bonus).

# ¿ Nov 20, 2014 17:21

vikingstrike: Sep 23, 2007; whats happening, captain

Lysidas posted:

https://youtrack.jetbrains.com/issue/PY-14330

Thanks for this! It seems like the Spyder IDE already lets you peek at DataFrames, but I enjoy PyCharm too much to switch. IPython notebooks are sufficient for me.

# ¿ Nov 21, 2014 16:07

vikingstrike: Sep 23, 2007; whats happening, captain

Cingulate posted:

I don't get parallel for loop syntax. At all. Like, I keep starring at the documentation to multiprocessing or whatever, and it's all Greek.

I just want to do something like
parfor x in range(0,10): my_list[x] = (foo(x))

FWIW, this is on Python 2.7. IPython, that is.

This article may be helpful: http://chriskiehl.com/article/parallelism-in-one-line/ In particular, using the map() function on a multiprocessing Pool() object at the very end.
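
The map() pattern looks roughly like this. To keep the sketch runnable anywhere it uses the thread-backed Pool from multiprocessing.dummy, which exposes the same map() interface as multiprocessing.Pool. foo is a placeholder for your real function.

```python
from multiprocessing.dummy import Pool  # thread-backed Pool, same map() interface

def foo(x):
    """Placeholder for the real per-item work."""
    return x * x

pool = Pool(4)                         # 4 workers
my_list = pool.map(foo, range(0, 10))  # results come back in input order
pool.close()
pool.join()

print(my_list)
```

For CPU-bound work, change the import to "from multiprocessing import Pool" and put the pool code under an if __name__ == "__main__": guard so it behaves on platforms that spawn new processes.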

# ¿ Feb 2, 2015 20:55

vikingstrike: Sep 23, 2007; whats happening, captain

Thermopyle posted:

Don't worry, this mythical "good programmer" doesn't exist. You'll just find that you turn from a terrible programmer into a terrible programmer who knows they're terrible.!

This should probably be in the OP.

# ¿ Feb 24, 2015 01:16

vikingstrike: Sep 23, 2007; whats happening, captain

Hughmoris posted:

What is a simple way to parse information out of an HTML document? My work has an application that renders forms using HTML but it's not a website. I want to run through a file and any input element it sees, it should record the type, name and value. I'm assuming I don't want to use regex to parse html...

There are libraries like BeautifulSoup that will help parse HTML for you. I've used BeautifulSoup in the past and it was straightforward to pick up.
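
If installing a third-party library is not an option, the standard library's html.parser can do the same basic job. The sketch below uses that instead of BeautifulSoup, assumes Python 3, and runs on an invented form snippet; InputCollector is a made-up class name.

```python
from html.parser import HTMLParser

class InputCollector(HTMLParser):
    """Collect type, name and value from every <input> element."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            a = dict(attrs)
            self.inputs.append((a.get("type"), a.get("name"), a.get("value")))

html = ('<form><input type="text" name="first" value="Ann">'
        '<input type="checkbox" name="ok"></form>')
parser = InputCollector()
parser.feed(html)
print(parser.inputs)
```

Attributes that are missing from an element come back as None.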

# ¿ Mar 18, 2015 02:42

vikingstrike: Sep 23, 2007; whats happening, captain

You could always use a DataFrame and have name, email, and state columns. You can easily filter based on state or name to slice it however you need. If you start assigning techs to multiple states you can create variables as needed or just add additional rows.
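
A small sketch of that layout with invented rows, assuming pandas is installed. A tech covering two states simply appears in two rows.

```python
import pandas as pd

# Made-up rows; a tech covering two states simply gets two rows.
techs = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann"],
    "email": ["ann@example.com", "bob@example.com", "ann@example.com"],
    "state": ["OH", "PA", "PA"],
})

pa_techs = techs[techs["state"] == "PA"]                      # everyone covering PA
ann_states = techs.loc[techs["name"] == "Ann", "state"].tolist()

print(pa_techs["name"].tolist())
print(ann_states)
```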

# ¿ Mar 20, 2015 22:26

vikingstrike: Sep 23, 2007; whats happening, captain

If all you need is a count within a subset, you can also do the following in pandas:

df[df["Accuracy-T"] == 2]["Trial Type"].value_counts()

# ¿ May 8, 2015 12:45
