Python information and short questions megathread.

vikingstrike: Sep 23, 2007; whats happening, captain

the posted:

I have a list of 176,876 Ids (strings), and I am trying to find an easy way to group them by occurrences (so I can plot it later). Is this something that Pandas can do?

I've loaded them into a DataFrame. Running something like df.groupby('Id').groups seems to sort of do what I want (printing out every string and the list of locations where it occurs), but I'm really looking for something like:
code:
       ID : Occurrence
342000XBB : 37
200333XCC : 31
342203CBB : 17
edit: I think df.groupby('Id').size() solved the problem!

Second edit: Importing the list into a dataframe, I ended up having to write the list to a CSV file and then load it into the dataframe. I couldn't find a way to load a list into a dataframe directly even after reading the Pandas documentation. Did I miss something?

For your second edit, what exactly do you mean? Where is the list coming from? What have you tried up to this point?

# ? Jul 23, 2014 20:43

the: Jul 18, 2004; by Cowcaster

vikingstrike posted:

For your second edit, what exactly do you mean? Where is the list coming from? What have you tried up to this point?

I grabbed some information from an SQL query and imported it into a list object. So I have something like:

alist = [['0024242'],['34234234'],['2342341']...] And so on

I tried making it a Series object in Pandas and then "doing stuff" with it, but that didn't work out. My end goal was to do what I did above, which was just to sort the list and find the counts. Once I found out I could do this in a dataframe, I wanted to put this list in a dataframe, but Pandas doesn't appear to support going from a list to a dataframe. The only way I knew how to make a dataframe was with a read_csv command, so I wrote the list to a csv file and then read it back into a dataframe using Pandas.

# ? Jul 23, 2014 20:59

The March Hare: Oct 15, 2006; Je rêve d'un Wayne's World 3; Buglord

Howdy,

I'm early into implementing search for the first time ever and I've got a really basic setup going w/ Haystack and ElasticSearch. I got my index built, search works, but ES seems to think that a query like:

code:

www.google.com/

is actually a query for

code:

www.google.comBEGIN_BUT_NEVER_FINISH_REGULAR_EXPRESSION_HERE

and it throws me a big fat EOF warning and refuses to execute the search.

Near as I can tell, this is a known thing in ES when using query_string (which haystack does). Is there any way to either totally disable regex in search (I would sort-of rather not do this) or just make it recognize that a url is not a regular expression? Or just idk escape the forward slash or something?

e; A) Woops, meant to post this in the Django thread.
B) Turns out it was a known issue with Haystack not having "/" in the list of reserved chars for escaping that was resolved on master almost a year ago and there just hasn't been an update to the pypi version of Haystack in a really long time -__-.
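For anyone stuck on a release without that fix, the "just escape the forward slash" idea can be sketched roughly like this. This is not Haystack's actual code: escape_query and RESERVED are hypothetical names, the character list is an assumption based on the single-character reserved characters in Elasticsearch's query_string syntax, and the two-character operators (&& and ||) are not handled.

```python
# Assumed set of single-character reserved characters for query_string syntax.
RESERVED = set('+-=><!(){}[]^"~*?:\\/')


def escape_query(text):
    """Backslash-escape every reserved character in a user-supplied query (hypothetical helper)."""
    return ''.join('\\' + ch if ch in RESERVED else ch for ch in text)


# The trailing slash is escaped, so it no longer opens a regular expression.
print(escape_query('www.google.com/'))
```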

The March Hare fucked around with this message at 15:27 on Jul 24, 2014

# ? Jul 23, 2014 21:12

SurgicalOntologist: Jun 17, 2004

the posted:

Second edit: Importing the list into a dataframe, I ended up having to write the list to a CSV file and then load it into the dataframe. I couldn't find a way to load a list into a dataframe directly even after reading the Pandas documentation. Did I miss something?

I assume you have more than one list? Or it wouldn't be a DataFrame but a Series (i.e. a single column). But there's a ton of ways to create a DataFrame, which can actually be sort of annoying. One way is a dictionary of lists. The keys are the column names.

Python code:

df = pd.DataFrame({'ids': ids, 'some_other_column': some_other_list})

Edit: Saw your other post. You have a list of lists.

Python code:

In [4]: data = [[1,2,3], [10, 20, 30], [100, 200, 300]]

In [5]: pd.DataFrame.from_records(data, columns=['foo', 'bar', 'baz'])
Out[5]: 
   foo  bar  baz
0    1    2    3
1   10   20   30
2  100  200  300

Wait, you just have a list of 1-element lists of strings? Why not just make that a flat list? It doesn't seem complex enough to justify a DataFrame. A Series maybe.

Python code:

In [9]: stupid_list =  [['0024242'],['34234234'],['2342341']]

In [10]: flat_list = [i[0] for i in stupid_list]

In [11]: pd.Series(flat_list)
Out[11]: 
0     0024242
1    34234234
2     2342341
dtype: object

Pandas can also create a DataFrame directly from a SQL query. See pandas.read_sql_query (pandas.io.sql.read_frame in older versions) for example.
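A minimal sketch of going from a query straight to a DataFrame with pandas.read_sql_query, skipping the intermediate list entirely. The in-memory SQLite database, the table name records, and the column name id are all made up for illustration; the real query and connection would be whatever produced the list in the first place.

```python
import sqlite3

import pandas as pd

# Throwaway in-memory database standing in for the real one (assumption).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE records (id TEXT)')
conn.executemany('INSERT INTO records VALUES (?)',
                 [('0024242',), ('34234234',), ('0024242',)])

# Each selected column becomes a DataFrame column.
df = pd.read_sql_query('SELECT id FROM records', conn)
conn.close()

# Count occurrences of each id.
print(df['id'].value_counts())
```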

SurgicalOntologist fucked around with this message at 21:46 on Jul 23, 2014

# ? Jul 23, 2014 21:38

KernelSlanders: May 27, 2013; Rogue operating systems on occasion spread lies and rumors about me.

Symbolic Butt posted:

You can use the outer product, that's what I use when I need something like that: np.outer(a, a)

Maybe someone better than me can answer you why working with column vectors isn't that great.

Well, there are other times when column vectors are useful I would think, unless there's some other more pythonic structure I don't know about. Say I have a 3x2 matrix M and want to add a third column that's a linear combination of the first two. In Matlab I can do:

code:

>> M = [1 2; 4 3; 3 1]

M =

     1     2
     4     3
     3     1

>> [M, 2*M(:,1)-M(:,2)]

ans =

     1     2     0
     4     3     5
     3     1     5

Whereas in python the things I've been able to come up with work, but seem really messy:

Python code:

>>> import numpy as np
>>> M = np.array([[1,2],[4,3],[3,1]])
>>> M
array([[1, 2],
       [4, 3],
       [3, 1]])

>>> np.vstack((M.T, 2*M[:,0] - M[:,1])).T  # the double transpose method
array([[1, 2, 0],
       [4, 3, 5],
       [3, 1, 5]])

>>> np.hstack((M, (2*M[:,0] - M[:,1]).reshape(3,1)))  # the explicit reshape method
array([[1, 2, 0],
       [4, 3, 5],
       [3, 1, 5]])

I'm sure there's a "right" way to do this, but it's not obvious to me what it is.

# ? Jul 23, 2014 22:22

vikingstrike: Sep 23, 2007; whats happening, captain

the posted:

I grabbed some information from an SQL query and imported it into a list object. So I have something like:

alist = [['0024242'],['34234234'],['2342341']...] And so on

I tried making it a Series object in Pandas and then "doing stuff" with it, but that didn't work out. My end goal was to do what I did above, which was just to sort the list and find the counts. Once I found out I could do this in a dataframe, I wanted to put this list in a dataframe, but Pandas doesn't appear to support going from a list to a dataframe. The only way I knew how to make a dataframe was with a read_csv command, so I wrote the list to a csv file and then read it back into a dataframe using Pandas.

Make that a plain list, not a list of lists with one element. You could do something like [x[0] for x in alist]. Then just make it a Series and use value_counts() to get what you need. If you don't have another dimension to the data, you probably don't need a DataFrame. Here are the docs for this method:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html
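Here is a minimal sketch of the flatten-then-count approach. The sample IDs are made up and stand in for the SQL result. The second half shows that a plain list can also go straight into a DataFrame with no CSV round trip, assuming a column name of 'Id'.

```python
import pandas as pd

# Made-up sample standing in for the SQL result: a list of 1-element lists.
alist = [['342000XBB'], ['200333XCC'], ['342000XBB'],
         ['342203CBB'], ['342000XBB'], ['200333XCC']]

# Flatten to a plain list of strings.
flat = [row[0] for row in alist]

# Series route: value_counts() returns counts sorted from most to least common.
counts = pd.Series(flat).value_counts()
print(counts)

# DataFrame route: build it directly from the list; 'Id' is an assumed column name.
df = pd.DataFrame({'Id': flat})
sizes = df.groupby('Id').size()
print(sizes)
```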

# ? Jul 23, 2014 22:30

namaste friends: Sep 18, 2004; by Smythe

I'm trying to format two columns of output for data of variable lengths.

Here's what I'd like to see:

code:

Heading1 Heading2
data     data

Or:

code:

Heading1         Heading2
datadatadatadata datadatadatadatadata

Is it possible to do this without using a non-standard library?

# ? Jul 23, 2014 23:26

fletcher: Jun 27, 2003; ken park is my favorite movie; Cybernetic Crumb

Cultural Imperial posted:

I'm trying to format two columns of output for data of variable lengths.

Here's what I'd like to see:
code:
Heading1 Heading2
data     data
Or:
code:
Heading1         Heading2
datadatadatadata datadatadatadatadata
Is it possible to do this without using a non-standard library?

If you can figure out the max length for each column maybe you can just use ljust?
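A quick sketch of the ljust idea using only built-ins. The rows are placeholder data; the width of the first column is simply the length of its longest entry.

```python
# Placeholder rows: (left column, right column).
rows = [
    ('Heading1', 'Heading2'),
    ('datadatadatadata', 'datadatadatadatadata'),
    ('data', 'data'),
]

# Pad the left column to the length of its longest entry.
width = max(len(left) for left, _ in rows)
lines = [left.ljust(width) + ' ' + right for left, right in rows]

for line in lines:
    print(line)
```

Only the left column needs padding here, since nothing follows the right one.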

# ? Jul 23, 2014 23:41

BeefofAges: Jun 5, 2004; Cry 'Havoc!', and let slip the cows of war.

https://pypi.python.org/pypi/PrettyTable ?

# ? Jul 24, 2014 00:19

namaste friends: Sep 18, 2004; by Smythe

fletcher posted:

If you can figure out the max length for each column maybe you can just use ljust?

That did the trick! Thanks!

# ? Jul 24, 2014 00:22

namaste friends: Sep 18, 2004; by Smythe

BeefofAges posted:

https://pypi.python.org/pypi/PrettyTable ?

Has to be part of a default python install I'm afraid. Thanks though.

# ? Jul 24, 2014 00:22

Nippashish: Nov 2, 2005; Let me see you dance!

KernelSlanders posted:

I'm sure there's a "right" way to do this, but it's not obvious to me what it is.

The right way to do this is to use row vectors. Everything goes a little bit smoother in numpy if your data are in rows instead of in columns.

# ? Jul 24, 2014 05:18

SurgicalOntologist: Jun 17, 2004

KernelSlanders posted:

Well, there are other times when column vectors are useful I would think, unless there's some other more pythonic structure I don't know about. Say I have a 3x2 matrix M and want to add a third column that's a linear combination of the first two. In Matlab I can do:
code:
>> M = [1 2; 4 3; 3 1]

M =

     1     2
     4     3
     3     1

>> [M, 2*M(:,1)-M(:,2)]

ans =

     1     2     0
     4     3     5
     3     1     5
Whereas in python the things I've been able to come up with work, but seem really messy:
Python code:
>>> import numpy as np
>>> M = np.array([[1,2],[4,3],[3,1]])
>>> M
array([[1, 2],
       [4, 3],
       [3, 1]])

>>> np.vstack((M.T, 2*M[:,0] - M[:,1])).T  # the double transpose method
array([[1, 2, 0],
       [4, 3, 5],
       [3, 1, 5]])

>>> np.hstack((M, (2*M[:,0] - M[:,1]).reshape(3,1)))  # the explicit reshape method
array([[1, 2, 0],
       [4, 3, 5],
       [3, 1, 5]])
I'm sure there's a "right" way to do this, but it's not obvious to me what it is.

I still think np.newaxis is fairly clean, even if it's not Matlab.

Python code:

m = np.random.sample((3, 2))
np.hstack((m, m[:, 0, np.newaxis] + 2*m[:, 1, np.newaxis])) 
array([[ 0.53861827,  0.25918818,  1.05699462],
       [ 0.8475933 ,  0.92157516,  2.69074362],
       [ 0.42827495,  0.0170382 ,  0.46235135]])

But the real answer to this, that goes along with Nippashish's answer, is to avoid growing arrays altogether. If you didn't need to append that linear combination to the original array you could probably make do with a 1D array. IMO the awkwardness here comes from the stacking not from the 1D vs. 2D issue.
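For comparison, a sketch of the np.newaxis approach applied to the M from the original question. It also shows np.column_stack, which nobody in the thread suggested but which treats a 1-D array as a column, so no reshape is needed.

```python
import numpy as np

M = np.array([[1, 2], [4, 3], [3, 1]])

# The linear combination comes out as a 1-D array of shape (3,).
combo = 2 * M[:, 0] - M[:, 1]

# np.newaxis turns it into a (3, 1) column so hstack accepts it.
with_newaxis = np.hstack((M, combo[:, np.newaxis]))

# column_stack turns 1-D inputs into columns on its own.
with_column_stack = np.column_stack((M, combo))

print(with_column_stack)
```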

SurgicalOntologist fucked around with this message at 07:43 on Jul 24, 2014

# ? Jul 24, 2014 07:41

Nippashish: Nov 2, 2005; Let me see you dance!

SurgicalOntologist posted:

But the real answer to this, that goes along with Nippashish's answer, is to avoid growing arrays altogether.

This is true, it often makes sense to build a list of vectors and then concatenate them into a matrix at the end instead of growing an array like you would in matlab, but that doesn't solve the shape issue. What I was trying to say is that if you work with row vectors instead of column vectors then you don't need to deal with any reshaping/newaxis rejiggering to get them to concatenate nicely into a matrix.

e: For example:

code:

np.vstack((M.T, 2*M[:,0] - M[:,1])).T

becomes

code:

np.vstack((M, 2*M[0] - M[1]))
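A runnable sketch of the row-oriented version, assuming the same numbers as the original question but stored with each variable as a row (that is, the transpose of the earlier M). The name rows is just for illustration.

```python
import numpy as np

M = np.array([[1, 2], [4, 3], [3, 1]])

# Store the data row-wise: each original column is now a row, shape (2, 3).
rows = M.T

# A 1-D result stacks onto a row-oriented matrix with no reshaping.
stacked = np.vstack((rows, 2 * rows[0] - rows[1]))

# Same result built from a list of vectors and concatenated once at the end.
pieces = [rows[0], rows[1], 2 * rows[0] - rows[1]]
built = np.vstack(pieces)

print(stacked)
```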

Nippashish fucked around with this message at 07:55 on Jul 24, 2014

# ? Jul 24, 2014 07:50

SurgicalOntologist: Jun 17, 2004

Yes I understood and agree completely. My additional point was that if you avoid trying to grow arrays then the perceived need for column vectors will probably disappear. One might have to do linear algebra operations in both directions on an array so reorienting to make all operations row-wise isn't always possible. But if you also avoid growing arrays then having the output of those column-wise operations be 1D vectors probably won't be bothersome.

# ? Jul 24, 2014 08:02

the: Jul 18, 2004; by Cowcaster

code:

import string
input = raw_input()
answer = ''
for i in input:
	for j in string.hexdigits:
		if i == j:
			answer += j
print answer

How come an input of 'dog' only spits out 'd' instead of 'dog'?

# ? Jul 24, 2014 18:09

Jose Cuervo: Aug 25, 2004

the posted:

code:
import string
input = raw_input()
answer = ''
for i in input:
	for j in string.hexdigits:
		if i == j:
			answer += j
print answer
How come an input of 'dog' only spits out 'd' instead of 'dog'?

Because string.hexdigits is the string '0123456789abcdefABCDEF' (see https://docs.python.org/2/library/string.html#string.hexdigits).
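A small Python 3 sketch of why 'dog' collapses to 'd'. The helper name keep_only is made up for illustration; it keeps only the characters that appear in the allowed string.

```python
import string


def keep_only(text, allowed):
    """Return text with every character not found in allowed removed."""
    return ''.join(ch for ch in text if ch in allowed)


# 'o' and 'g' are not hex digits, so only 'd' survives.
print(keep_only('dog', string.hexdigits))

# Every character of 'dog' is printable, so nothing is dropped.
print(keep_only('dog', string.printable))
```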

# ? Jul 24, 2014 18:13

the: Jul 18, 2004; by Cowcaster

Jose Cuervo posted:

Because string.hexdigits is the string '0123456789abcdefABCDEF' (see https://docs.python.org/2/library/string.html#string.hexdigits).

Yeah I should have seen that, sorry. I was looking for string.printable.

To save face, can anyone do this in less than 8 lines, or without using a module?

code:

import string
input = raw_input()
answer = ''
for i in str(input):
	for j in string.printable:
		if i == j:
			answer += j
print answer

the fucked around with this message at 18:17 on Jul 24, 2014

# ? Jul 24, 2014 18:15

accipter: Sep 12, 2003

the posted:

Yeah I should have seen that, sorry. I was looking for string.printable.

To save face, can anyone do this in less than 8 lines, or without using a module?
code:
import string
input = raw_input()
answer = ''
for i in str(input):
	for j in string.printable:
		if i == j:
			answer += j
print answer

code:

import string
input = raw_input()
answer = ''.join([letter for letter in str(input) if letter in string.printable])
print answer

# ? Jul 24, 2014 18:23

the: Jul 18, 2004; by Cowcaster

You win this round

# ? Jul 24, 2014 18:27

Reformed Pissboy: Nov 6, 2003

the posted:

code:

import string
input = raw_input()
answer = ''
for i in str(input):
	for j in string.printable:
		if i == j:
			answer += j
print answer

Nothing wrong with this approach, but you don't need to iterate over string.printable -- i in string.printable without the for will evaluate as "does string.printable contain i?"

Python code:

for i in input:
    if i in string.printable:
        answer += i

Also, to hell with list comprehensions

Python code:
import string
input = raw_input()
answer = filter(lambda c: c in string.printable, input)
print answer

# ? Jul 24, 2014 18:50

KernelSlanders: May 27, 2013; Rogue operating systems on occasion spread lies and rumors about me.

Reformed Pissboy posted:

Nothing wrong with this approach, but you don't need to iterate over string.printable -- i in string.printable without the for will evaluate as "does string.printable contain i?"
Python code:
for i in input:
    if i in string.printable:
        answer += i
Also, to hell with list comprehensions

Python code:
import string
input = raw_input()
answer = filter(lambda c: c in string.printable, input)
print answer

What's wrong with list comprehension approach?

Python code:

>>> thedog = 'the dog'
>>> [c for c in thedog if c in string.hexdigits]
['e', 'd']

# ? Jul 24, 2014 19:01

Reformed Pissboy: Nov 6, 2003

KernelSlanders posted:

What's wrong with list comprehension approach?

Nothing really, I was just being a sillybilly and demonstrating another approach. But, filter() does have the benefit (in Python 2) of preserving type if applied to a string or tuple, instead of always returning a list. Saves an obnoxious ''.join().

# ? Jul 24, 2014 19:31

accipter: Sep 12, 2003

Reformed Pissboy posted:

Nothing really, I was just being a sillybilly and demonstrating another approach. But, filter() does have the benefit (in Python 2) of preserving type if applied to a string or tuple, instead of always returning a list. Saves an obnoxious ''.join().

Yeah, filter() is much cleaner in this case. I just rarely use it so I usually forget about it.

# ? Jul 24, 2014 19:32

QuarkJets: Sep 8, 2008

I'm just going to say it: gently caress lambda functions

# ? Jul 24, 2014 20:07

Symbolic Butt: Mar 22, 2009; (_!_); Buglord

You can also just use a generator expression directly instead of a list comprehension:

Python code:

answer = ''.join(letter for letter in str(input) if letter in string.printable)

# ? Jul 24, 2014 20:13

namaste friends: Sep 18, 2004; by Smythe

QuarkJets posted:

I'm just going to say it: gently caress lambda functions

Eh...they're really useful once you figure out how they work.

# ? Jul 24, 2014 20:17

Modern Pragmatist: Aug 20, 2008

In case you love regex

Python code:

import string, re
input = re.sub('[^%s]' % string.printable, '', raw_input())

P.S. Please don't do this.

# ? Jul 24, 2014 20:40

QuarkJets: Sep 8, 2008

Cultural Imperial posted:

Eh...they're really useful once you figure out how they work.

PEP 8 disagrees

Really the best reason to have anonymous functions in other languages is that other languages don't let you define named functions within the scope of other functions. But Python lets you define a function wherever you want, with or without invoking lambda, so lambda is just a superfluous way to define a function

# ? Jul 24, 2014 20:56

namaste friends: Sep 18, 2004; by Smythe

QuarkJets posted:

PEP 8 disagrees

Really the best reason to have anonymous functions in other languages is that other languages don't let you define named functions within the scope of other functions. But Python lets you define a function wherever you want, with or without invoking lambda, so lambda is just a superfluous way to define a function

Cool good to know.

# ? Jul 24, 2014 21:09

KernelSlanders: May 27, 2013; Rogue operating systems on occasion spread lies and rumors about me.

QuarkJets posted:

PEP 8 disagrees

Really the best reason to have anonymous functions in other languages is that other languages don't let you define named functions within the scope of other functions. But Python lets you define a function wherever you want, with or without invoking lambda, so lambda is just a superfluous way to define a function

You can use a lambda function inline without assigning it. PEP8 is fine with that usage.

edit:

By which I mean:

Python code:


# This...
map(lambda x: x*x, my_list)

# Is generally preferable to this...
def sq(x):
    return x*x

map(sq, my_list)

# whereas PEP 8 recommends against this (use a def instead)...
sq = lambda x: x*x
map(sq, my_list)

KernelSlanders fucked around with this message at 21:37 on Jul 24, 2014

# ? Jul 24, 2014 21:30

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

Cultural Imperial posted:

Eh...they're really useful once you figure out how they work.

You know, I've been coding in Python for maybe 6 years and I still always have to stop and think about what the gently caress a lambda is doing when I come across it.

I'm just dumb.

# ? Jul 24, 2014 21:37

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

I don't understand that. It's no different than defining a function with a def.

# ? Jul 24, 2014 21:44

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

Suspicious Dish posted:

I don't understand that. It's no different than defining a function with a def.

Yeah, I don't either.

I guess it's just that I don't use them (unless I'm almost forced to by something like the key kwarg for sort), so when I see them used it always makes me do a context switch and think about what it's doing. Then I don't use or see one for a few months and go through it all again.

# ? Jul 24, 2014 21:49

Crosscontaminant: Jan 18, 2007

KernelSlanders posted:

Python code:

# This...
map(lambda x: x*x, my_list)

# Is generally preferable to this...
def sq(x):
    return x*x

map(sq, my_list)

The standard library (e.g. operator) provides useful functions like itemgetter for use as key functions and with map/reduce, so there should be no need for a lambda for anything this simplistic. If it's more complex, give it a name and documentation so people who come after you know what the hell you're doing and can use it elsewhere without having to refactor.
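A short Python 3 sketch of operator.itemgetter standing in for a lambda, both as a sort key and with map. The (id, count) pairs are sample values borrowed from earlier in the thread.

```python
from operator import itemgetter

# Sample (id, count) pairs.
pairs = [('342203CBB', 17), ('342000XBB', 37), ('200333XCC', 31)]

# Sort by the count, highest first, without writing a lambda.
by_count = sorted(pairs, key=itemgetter(1), reverse=True)

# The equivalent lambda version, for comparison.
same = sorted(pairs, key=lambda pair: pair[1], reverse=True)

# itemgetter also works with map to pull out a single field.
ids = list(map(itemgetter(0), by_count))

print(by_count)
print(ids)
```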

# ? Jul 24, 2014 22:08

suffix: Jul 27, 2013; Wheeee!

Reformed Pissboy posted:

Python code:

import string
input = raw_input()
answer = filter(lambda c: c in string.printable, input)
print answer

For completeness' sake:
You don't have to use lambda here, you can use the __contains__ method. (a in b is the same as b.__contains__(a))

Python code:

import string
input = raw_input()
answer = filter(string.printable.__contains__, input)
print answer

This is just a dumb code golf trick though. I think the version with join is clearer and should be preferred. (Although the ''.join() idiom isn't a great example of readability either.)

I.e., I would normally use this version:

Symbolic Butt posted:

Python code:

answer = ''.join(letter for letter in str(input) if letter in string.printable)

This also works in Python 3, where filter() has been changed to return an iterator.
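To illustrate that Python 3 difference, a small sketch: filter() hands back a lazy iterator rather than a string, so the join is needed either way. The sample text with two non-printable characters is made up.

```python
import string

# Sample input containing two non-printable characters (NUL and DEL).
text = 'dog\x00 cat\x7f'

# In Python 3 this is a lazy filter object, not a str.
lazy = filter(string.printable.__contains__, text)

# Joining consumes the iterator and rebuilds a string.
answer = ''.join(lazy)

print(repr(answer))
```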

# ? Jul 24, 2014 23:00

SurgicalOntologist: Jun 17, 2004

suffix posted:

For completeness' sake:
You don't have to use lambda here, you can use the __contains__ method. (a in b is the same as b.__contains__(a))
Python code:
import string
input = raw_input()
answer = filter(string.printable.__contains__, input)
print answer
This is just a dumb code golf trick though. I think the version with join is clearer and should be preferred. (Although the ''.join() idiom isn't a great example of readability either.)

It's probably better to do from operator import contains, as implied by the post above yours. (Edit: for a different problem)
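If you did want operator.contains for this problem, note that it takes the container first (contains(a, b) means b in a), so it needs functools.partial before it can be handed to filter. A Python 3 sketch, with the partial wrapper and the name is_printable being additions that are not from the thread:

```python
import string
from functools import partial
from operator import contains

# contains(string.printable, c) is the same test as: c in string.printable
is_printable = partial(contains, string.printable)

text = 'dog\x00'
answer = ''.join(filter(is_printable, text))

print(answer)
```

Whether this reads better than the generator expression is debatable.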

SurgicalOntologist fucked around with this message at 23:10 on Jul 24, 2014

# ? Jul 24, 2014 23:07

KernelSlanders: May 27, 2013; Rogue operating systems on occasion spread lies and rumors about me.

Crosscontaminant posted:

The standard library (e.g. operator) provides useful functions like itemgetter for use as key functions and with map/reduce, so there should be no need for a lambda for anything this simplistic. If it's more complex, give it a name and documentation so people who come after you know what the hell you're doing and can use it elsewhere without having to refactor.

How is map(operator.pow, my_list, [2]*len(my_list)) more readable? Or are you suggesting something else?

# ? Jul 24, 2014 23:08

SurgicalOntologist: Jun 17, 2004

nm I'm an idiot.

SurgicalOntologist fucked around with this message at 23:18 on Jul 24, 2014

# ? Jul 24, 2014 23:16

BigRedDot: Mar 6, 2008

Suspicious Dish posted:

I don't understand that. It's no different than defining a function with a def.

Sure it is. Lambdas in python can only contain expressions, not statements. This, and things like map and filter being generally superseded by more efficient list comprehensions and generator expressions, makes lambdas in python almost completely useless. The only place I ever use them is as key functions in some of the sorting and operator functions.
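A side-by-side Python 3 sketch of the map/filter-with-lambda forms and the comprehension forms that usually replace them. my_list is sample data.

```python
my_list = [1, 2, 3, 4]

# map + lambda versus a list comprehension.
squares_map = list(map(lambda x: x * x, my_list))
squares_comp = [x * x for x in my_list]

# filter + lambda versus a comprehension with a condition.
evens_filter = list(filter(lambda x: x % 2 == 0, my_list))
evens_comp = [x for x in my_list if x % 2 == 0]

# A generator expression avoids building the intermediate list at all.
total = sum(x * x for x in my_list)

print(squares_comp, evens_comp, total)
```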

# ? Jul 25, 2014 03:35
