  • Locked thread
vikingstrike
Sep 23, 2007

whats happening, captain

the posted:

I have a list of 176,876 Ids (strings), and I am trying to find an easy way to group them by occurrences (so I can plot it later). Is this something that Pandas can do?

I've loaded them into a DataFrame. Running something like df.groupby('ID').groups seems to sort of do what I want (printing out every string and the list of locations where it occurs), but I'm really looking for something like:

code:
       ID : Occurrence
342000XBB : 37
200333XCC : 31
342203CBB : 17
edit: I think df.groupby('Id').size() solved the problem!

Second edit: Importing the list into a dataframe, I ended up having to write the list to a CSV file and then load it into the dataframe. I couldn't find a way to load a list into a dataframe directly even after reading the Pandas documentation. Did I miss something?

For your second edit, what exactly do you mean? Where is the list coming from? What have you tried up to this point?


the
Jul 18, 2004

by Cowcaster

vikingstrike posted:

For your second edit, what exactly do you mean? Where is the list coming from? What have you tried up to this point?

I grabbed some information from an SQL query and imported it into a list object. So I have something like:

alist = [['0024242'],['34234234'],['2342341']...] And so on

I tried making it a Series object in Pandas and then "doing stuff" with it, but that didn't work out. My end goal was to do what I did above, which was just to sort the list and find the counts. Once I found out I could do this in a dataframe, I wanted to put this list in a dataframe, but Pandas doesn't appear to support going from a list to a dataframe. The only way I knew how to make a dataframe was with a read_csv command, so I wrote the list to a csv file and then read it back into a dataframe using Pandas.

The March Hare
Oct 15, 2006

Je rêve d'un
Wayne's World 3
Buglord
Howdy,

I'm early into implementing search for the first time ever and I've got a really basic setup going w/ Haystack and ElasticSearch. I got my index built, search works, but ES seems to think that a query like:

code:
www.google.com/
is actually a query for

code:
www.google.comBEGIN_BUT_NEVER_FINISH_REGULAR_EXPRESSION_HERE
and it throws me a big fat EOF warning and refuses to execute the search.

Near as I can tell, this is a known thing in ES when using query_string (which haystack does). Is there any way to either totally disable regex in search (I would sort-of rather not do this) or just make it recognize that a url is not a regular expression? Or just idk escape the forward slash or something?


e; A) Woops, meant to post this in the Django thread.
B) Turns out it was a known issue with Haystack not having "/" in the list of reserved chars for escaping that was resolved on master almost a year ago and there just hasn't been an update to the pypi version of Haystack in a really long time -__-.
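
For anyone stuck on an older Haystack release, one workaround along these lines is to escape reserved characters before the string reaches the search backend. This is a minimal sketch, not Haystack's actual fix: the escape_query helper is hypothetical, and the reserved-character set is an assumption modeled on Lucene-style query syntax.

```python
import re

# Assumed set of characters with special meaning in Lucene-style query strings.
_RESERVED = r'\+-!():^[]"{}~*?|&/'
_RESERVED_RE = re.compile("[" + re.escape(_RESERVED) + "]")


def escape_query(text):
    """Prefix every reserved character with a backslash (hypothetical helper)."""
    return _RESERVED_RE.sub(lambda match: "\\" + match.group(0), text)


print(escape_query("www.google.com/"))
```

With the slash escaped, the query parser should no longer read it as the start of a regular expression.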

The March Hare fucked around with this message at 15:27 on Jul 24, 2014

SurgicalOntologist
Jun 17, 2004

the posted:

Second edit: Importing the list into a dataframe, I ended up having to write the list to a CSV file and then load it into the dataframe. I couldn't find a way to load a list into a dataframe directly even after reading the Pandas documentation. Did I miss something?

I assume you have more than one list? Or it wouldn't be a DataFrame but a Series (i.e. a single column). But there's a ton of ways to create a DataFrame, which can actually be sort of annoying. One way is a dictionary of lists. The keys are the column names.

Python code:
df = pd.DataFrame({'ids': ids, 'some_other_column': some_other_list})
Edit: Saw your other post. You have a list of lists.

Python code:
In [4]: data = [[1,2,3], [10, 20, 30], [100, 200, 300]]

In [5]: pd.DataFrame.from_records(data, columns=['foo', 'bar', 'baz'])
Out[5]: 
   foo  bar  baz
0    1    2    3
1   10   20   30
2  100  200  300
Wait, you just have a list of 1-element lists of strings? Why not just make that a flat list? It doesn't seem complex enough to justify a DataFrame. A Series, maybe.

Python code:
In [9]: stupid_list =  [['0024242'],['34234234'],['2342341']]

In [10]: flat_list = [i[0] for i in stupid_list]

In [11]: pd.Series(flat_list)
Out[11]: 
0     0024242
1    34234234
2     2342341
dtype: object
Pandas can also create a DataFrame directly from a SQL query. See pandas.io.sql.read_frame (or pandas.read_sql_query in newer versions), for example.
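
As a rough illustration of loading query results straight into a DataFrame, here is a sketch that uses an in-memory SQLite database and pandas.read_sql_query. The orders table and its contents are made up for the example, and the code is written for Python 3.

```python
import sqlite3

import pandas as pd

# Hypothetical table standing in for the real database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (ID TEXT)")
con.executemany(
    "INSERT INTO orders VALUES (?)",
    [("0024242",), ("34234234",), ("0024242",)],
)

# The query result lands in a DataFrame with no CSV round trip.
df = pd.read_sql_query("SELECT ID FROM orders", con)
counts = df.groupby("ID").size()
con.close()

print(counts)
```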

SurgicalOntologist fucked around with this message at 21:46 on Jul 23, 2014

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

Symbolic Butt posted:

You can use the outer product, that's what I use when I need something like that: np.outer(a, a)

Maybe someone better than me can answer you why working with column vectors isn't that great. :v:

Well, there are other times when column vectors are useful I would think, unless there's some other more pythonic structure I don't know about. Say I have a 3x2 matrix M and want to add a third column that's a linear combination of the first two. In Matlab I can do:

code:
>> M = [1 2; 4 3; 3 1]

M =

     1     2
     4     3
     3     1

>> [M, 2*M(:,1)-M(:,2)]

ans =

     1     2     0
     4     3     5
     3     1     5

Whereas in python the things I've been able to come up with work, but seem really messy:

Python code:
>>> M = np.array([[1,2],[4,3],[3,1]])
>>> M
array([[1, 2],
       [4, 3],
       [3, 1]])

>>> np.vstack((M.T, 2*M[:,0] - M[:,1])).T  # the double transpose method
array([[1, 2, 0],
       [4, 3, 5],
       [3, 1, 5]])

>>> np.hstack((M, (2*M[:,0] - M[:,1]).reshape(3,1)))  # the explicit reshape method
array([[1, 2, 0],
       [4, 3, 5],
       [3, 1, 5]])
I'm sure there's a "right" way to do this, but it's not obvious to me what it is.

vikingstrike
Sep 23, 2007

whats happening, captain

the posted:

I grabbed some information from an SQL query and imported it into a list object. So I have something like:

alist = [['0024242'],['34234234'],['2342341']...] And so on

I tried making it a Series object in Pandas and then "doing stuff" with it, but that didn't work out. My end goal was to do what I did above, which was just to sort the list and find the counts. Once I found out I could do this in a dataframe, I wanted to put this list in a dataframe, but Pandas doesn't appear to support going from a list to a dataframe. The only way I knew how to make a dataframe was with a read_csv command, so I wrote the list to a csv file and then read it back into a dataframe using Pandas.

Make that a plain list, not a list of lists with one element. You could do something like [x[0] for x in alist]. Then just make it a Series and use value_counts() to get what you need. If you don't have another dimension to the data, you probably don't need a DataFrame. Here are the docs for this method:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html
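
A minimal sketch of that suggestion, assuming alist has the shape shown in the quoted post (the repeated IDs are invented so the counts are not all 1):

```python
import pandas as pd

# Sample data in the same shape as the quoted list of one-element lists.
alist = [["0024242"], ["34234234"], ["0024242"], ["2342341"], ["0024242"]]

# Flatten to a plain list of strings, then count occurrences.
flat = [row[0] for row in alist]
counts = pd.Series(flat).value_counts()

print(counts)
```

value_counts() returns the counts sorted from most to least frequent, which is already the layout asked for in the original question.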

namaste friends
Sep 18, 2004

by Smythe
I'm trying to format two columns of output for data of variable lengths.

Here's what I'd like to see:

code:
Heading1 Heading2
data     data
Or:

code:
Heading1         Heading2
datadatadatadata datadatadatadatadata
Is it possible to do this without using a non-standard library?

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

Cultural Imperial posted:

I'm trying to format two columns of output for data of variable lengths.

Here's what I'd like to see:

code:
Heading1 Heading2
data     data
Or:

code:
Heading1         Heading2
datadatadatadata datadatadatadatadata
Is it possible to do this without using a non-standard library?

If you can figure out the max length for each column maybe you can just use ljust?
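
Something like the following would do it with only the standard library. It is a sketch that assumes the rows are already available as a list of two-item tuples; format_columns is a made-up helper name.

```python
def format_columns(rows):
    """Left-justify the first column to the width of its longest entry."""
    width = max(len(first) for first, _ in rows)
    return "\n".join(first.ljust(width) + " " + second for first, second in rows)


rows = [
    ("Heading1", "Heading2"),
    ("datadatadatadata", "datadatadatadatadata"),
]
print(format_columns(rows))
```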

BeefofAges
Jun 5, 2004

Cry 'Havoc!', and let slip the cows of war.

https://pypi.python.org/pypi/PrettyTable ?

namaste friends
Sep 18, 2004

by Smythe

fletcher posted:

If you can figure out the max length for each column maybe you can just use ljust?

That did the trick! Thanks!

namaste friends
Sep 18, 2004

by Smythe

Has to be part of a default python install I'm afraid. Thanks though.

Nippashish
Nov 2, 2005

Let me see you dance!

KernelSlanders posted:

I'm sure there's a "right" way to do this, but it's not obvious to me what it is.

The right way to do this is to use row vectors. Everything goes a little bit smoother in numpy if your data are in rows instead of in columns.

SurgicalOntologist
Jun 17, 2004

KernelSlanders posted:

Well, there are other times when column vectors are useful I would think, unless there's some other more pythonic structure I don't know about. Say I have a 3x2 matrix M and want to add a third column that's a linear combination of the first two. In Matlab I can do:

code:
>> M = [1 2; 4 3; 3 1]

M =

     1     2
     4     3
     3     1

>> [M, 2*M(:,1)-M(:,2)]

ans =

     1     2     0
     4     3     5
     3     1     5

Whereas in python the things I've been able to come up with work, but seem really messy:

Python code:
>>> M = np.array([[1,2],[4,3],[3,1]])
>>> M
array([[1, 2],
       [4, 3],
       [3, 1]])

>>> np.vstack((M.T, 2*M[:,0] - M[:,1])).T  # the double transpose method
array([[1, 2, 0],
       [4, 3, 5],
       [3, 1, 5]])

>>> np.hstack((M, (2*M[:,0] - M[:,1]).reshape(3,1)))  # the explicit reshape method
array([[1, 2, 0],
       [4, 3, 5],
       [3, 1, 5]])
I'm sure there's a "right" way to do this, but it's not obvious to me what it is.

I still think np.newaxis is fairly clean, even if it's not Matlab.

Python code:
m = np.random.sample((3, 2))
np.hstack((m, m[:, 0, np.newaxis] + 2*m[:, 1, np.newaxis])) 
array([[ 0.53861827,  0.25918818,  1.05699462],
       [ 0.8475933 ,  0.92157516,  2.69074362],
       [ 0.42827495,  0.0170382 ,  0.46235135]])

But the real answer to this, that goes along with Nippashish's answer, is to avoid growing arrays altogether. If you didn't need to append that linear combination to the original array you could probably make do with a 1D array. IMO the awkwardness here comes from the stacking not from the 1D vs. 2D issue.
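
If the column really does need to be appended, one option not mentioned in the thread is np.column_stack, which treats 1-D arrays as columns and so needs no reshape or newaxis. A short sketch using the matrix from the quoted post:

```python
import numpy as np

M = np.array([[1, 2], [4, 3], [3, 1]])

# The linear combination is a plain 1-D array of length 3.
new_col = 2 * M[:, 0] - M[:, 1]

# column_stack accepts the 2-D array and the 1-D array side by side.
result = np.column_stack((M, new_col))
print(result)
```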

SurgicalOntologist fucked around with this message at 07:43 on Jul 24, 2014

Nippashish
Nov 2, 2005

Let me see you dance!

SurgicalOntologist posted:

But the real answer to this, that goes along with Nippashish's answer, is to avoid growing arrays altogether.

This is true, it often makes sense to build a list of vectors and then concatenate them into a matrix at the end instead of growing an array like you would in matlab, but that doesn't solve the shape issue. What I was trying to say is that if you work with row vectors instead of column vectors then you don't need to deal with any reshaping/newaxis rejiggering to get them to concatenate nicely into a matrix.

e: For example:

code:
np.vstack((M.T, 2*M[:,0] - M[:,1])).T
becomes

code:
np.vstack((M, 2*M[0] - M[1]))
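
The "build a list, concatenate at the end" pattern might look roughly like this. The values are the columns of the Matlab example stored as rows, and the final transpose is only there to compare against the original layout.

```python
import numpy as np

# Each vector is kept as a row in an ordinary Python list.
rows = [np.array([1, 4, 3]), np.array([2, 3, 1])]

# Appending to a list is cheap and needs no reshaping.
rows.append(2 * rows[0] - rows[1])

# Stack once, after all rows have been collected.
M = np.vstack(rows)
print(M)
print(M.T)  # same layout as the Matlab result
```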

Nippashish fucked around with this message at 07:55 on Jul 24, 2014

SurgicalOntologist
Jun 17, 2004

Yes I understood and agree completely. My additional point was that if you avoid trying to grow arrays then the perceived need for column vectors will probably disappear. One might have to do linear algebra operations in both directions on an array so reorienting to make all operations row-wise isn't always possible. But if you also avoid growing arrays then having the output of those column-wise operations be 1D vectors probably won't be bothersome.

the
Jul 18, 2004

by Cowcaster
code:
import string
input = raw_input()
answer = ''
for i in input:
	for j in string.hexdigits:
		if i == j:
			answer += j
print answer
How come an input of 'dog' only spits out 'd' instead of 'dog'?

Jose Cuervo
Aug 25, 2004

the posted:

code:
import string
input = raw_input()
answer = ''
for i in input:
	for j in string.hexdigits:
		if i == j:
			answer += j
print answer
How come an input of 'dog' only spits out 'd' instead of 'dog'?

Because string.hexdigits is the string '0123456789abcdefABCDEF' (see https://docs.python.org/2/library/string.html#string.hexdigits).
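
A quick way to see the difference, written for Python 3 (keep_only is a made-up helper name):

```python
import string


def keep_only(text, allowed):
    """Return the characters of text that appear in allowed."""
    return "".join(ch for ch in text if ch in allowed)


# 'o' and 'g' are not hexadecimal digits, so only 'd' survives.
print(keep_only("dog", string.hexdigits))
# Every character of 'dog' is printable, so nothing is dropped.
print(keep_only("dog", string.printable))
```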

the
Jul 18, 2004

by Cowcaster

Jose Cuervo posted:

Because string.hexdigits is the string '0123456789abcdefABCDEF' (see https://docs.python.org/2/library/string.html#string.hexdigits).

:doh:

Yeah I should have seen that, sorry. I was looking for string.printable.

To save face, can anyone do this in fewer than 8 lines, or without using a module?

code:
import string
input = raw_input()
answer = ''
for i in str(input):
	for j in string.printable:
		if i == j:
			answer += j
print answer

the fucked around with this message at 18:17 on Jul 24, 2014

accipter
Sep 12, 2003

the posted:

:doh:

Yeah I should have seen that, sorry. I was looking for string.printable.

To save face, can anyone do this in fewer than 8 lines, or without using a module?

code:
import string
input = raw_input()
answer = ''
for i in str(input):
	for j in string.printable:
		if i == j:
			answer += j
print answer

code:
import string
input = raw_input()
answer = ''.join([letter for letter in str(input) if letter in string.printable])
print answer

the
Jul 18, 2004

by Cowcaster
:argh: You win this round

Reformed Pissboy
Nov 6, 2003

the posted:

code:
import string
input = raw_input()
answer = ''
for i in str(input):
	for j in string.printable:
		if i == j:
			answer += j
print answer

Nothing wrong with this approach, but you don't need to iterate over string.printable -- i in string.printable without the for will evaluate as "does string.printable contain i?"

Python code:
for i in input:
    if i in string.printable:
        answer += i

Also, to hell with list comprehensions :hehe:
Python code:
import string
input = raw_input()
answer = filter(lambda c: c in string.printable, input)
print answer

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

Reformed Pissboy posted:

Nothing wrong with this approach, but you don't need to iterate over string.printable -- i in string.printable without the for will evaluate as "does string.printable contain i?"

Python code:
for i in input:
    if i in string.printable:
        answer += i

Also, to hell with list comprehensions :hehe:
Python code:
import string
input = raw_input()
answer = filter(lambda c: c in string.printable, input)
print answer


What's wrong with list comprehension approach?

Python code:
>>> thedog = 'the dog'
>>> [c for c in thedog if c in string.hexdigits]
['e', 'd']

Reformed Pissboy
Nov 6, 2003

KernelSlanders posted:

What's wrong with list comprehension approach?

Nothing really, I was just being a sillybilly and demonstrating another approach. But, filter() does have the benefit of preserving type if applied to a string or tuple, instead of always returning a list. Saves an obnoxious ''.join().

accipter
Sep 12, 2003

Reformed Pissboy posted:

Nothing really, I was just being a sillybilly and demonstrating another approach. But, filter() does have the benefit of preserving type if applied to a string or tuple, instead of always returning a list. Saves an obnoxious ''.join().

Yeah, filter() is much cleaner in this case. I just rarely use it so I usually forget about it.

QuarkJets
Sep 8, 2008

I'm just going to say it: gently caress lambda functions

Symbolic Butt
Mar 22, 2009

(_!_)
Buglord
You can also just use a generator expression directly instead of a list comprehension:

Python code:
answer = ''.join(letter for letter in str(input) if letter in string.printable)

namaste friends
Sep 18, 2004

by Smythe

QuarkJets posted:

I'm just going to say it: gently caress lambda functions

Eh...they're really useful once you figure out how they work.

Modern Pragmatist
Aug 20, 2008
In case you love regex

Python code:
import string, re
input = re.sub('[^%s]' % re.escape(string.printable), '', raw_input())
P.S. Please don't do this.

QuarkJets
Sep 8, 2008

Cultural Imperial posted:

Eh...they're really useful once you figure out how they work.

PEP 8 disagrees

Really the best reason to have anonymous functions in other languages is that other languages don't let you define named functions within the scope of other functions. But Python lets you define a function wherever you want, with or without invoking lambda, so lambda is just a superfluous way to define a function

namaste friends
Sep 18, 2004

by Smythe

QuarkJets posted:

PEP 8 disagrees

Really the best reason to have anonymous functions in other languages is that other languages don't let you define named functions within the scope of other functions. But Python lets you define a function wherever you want, with or without invoking lambda, so lambda is just a superfluous way to define a function

Cool good to know.

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

QuarkJets posted:

PEP 8 disagrees

Really the best reason to have anonymous functions in other languages is that other languages don't let you define named functions within the scope of other functions. But Python lets you define a function wherever you want, with or without invoking lambda, so lambda is just a superfluous way to define a function

You can use a lambda function inline without assigning it. PEP 8 is fine with that usage.

edit:

By which I mean:

Python code:

# This...
map(lambda x: x*x, my_list)

# Is generally preferable to this...
def sq(x):
    return x*x

map(sq, my_list)

# whereas PEP 8 recommends against this...
sq = lambda x: x*x
map(sq, my_list)

KernelSlanders fucked around with this message at 21:37 on Jul 24, 2014

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Cultural Imperial posted:

Eh...they're really useful once you figure out how they work.

You know, I've been coding in Python for maybe 6 years and I still always have to stop and think about what the gently caress a lambda is doing when I come across it.

I'm just dumb.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
I don't understand that. It's no different than defining a function with a def.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Suspicious Dish posted:

I don't understand that. It's no different than defining a function with a def.

Yeah, I don't either.

I guess it's just that I don't use them (unless I'm almost forced to by something like the key kwarg for sort), so when I see them used it always makes me do a context switch and think about what it's doing. Then I don't use or see one for a few months and go through it all again.

Crosscontaminant
Jan 18, 2007

KernelSlanders posted:

Python code:
# This...
map(lambda x: x*x, my_list)

# Is generally preferable to this...
def sq(x):
    return x*x

map(sq, my_list)
The standard library (e.g. operator) provides useful functions like itemgetter for use as key functions and with map/reduce, so there should be no need for a lambda for anything this simplistic. If it's more complex, give it a name and documentation so people who come after you know what the hell you're doing and can use it elsewhere without having to refactor.
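
For instance, operator.itemgetter can stand in for a lambda both as a sort key and with map. The sample pairs below reuse the IDs from earlier in the thread purely for illustration.

```python
from operator import itemgetter

pairs = [("342203CBB", 17), ("342000XBB", 37), ("200333XCC", 31)]

# itemgetter(1) replaces `lambda pair: pair[1]` as the sort key.
by_count = sorted(pairs, key=itemgetter(1), reverse=True)

# itemgetter(0) replaces `lambda pair: pair[0]` inside map.
ids = list(map(itemgetter(0), by_count))
print(ids)
```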

suffix
Jul 27, 2013

Wheeee!

Reformed Pissboy posted:

Python code:
import string
input = raw_input()
answer = filter(lambda c: c in string.printable, input)
print answer

For completeness' sake:
You don't have to use lambda here, you can use the __contains__ method. (a in b is the same as b.__contains__(a))
Python code:
import string
input = raw_input()
answer = filter(string.printable.__contains__, input)
print answer
This is just a dumb code golf trick though. I think the version with join is clearer and should be preferred. (Although the ''.join() idiom isn't a great example of readability either.)

I.e., I would normally use this version:

Symbolic Butt posted:

Python code:
answer = ''.join(letter for letter in str(input) if letter in string.printable)
This also works in Python 3, where filter() has been changed to return an iterator.
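
To make the Python 3 difference concrete, here is a small sketch. The sample text with a trailing NUL byte is invented so that there is something to strip.

```python
import string

text = "the dog\x00"

# In Python 3, filter() returns a lazy iterator, not a str.
filtered = filter(lambda c: c in string.printable, text)
is_str = isinstance(filtered, str)

# Joining is therefore needed to get a string back.
answer = "".join(filtered)
print(is_str, repr(answer))
```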

SurgicalOntologist
Jun 17, 2004

suffix posted:

For completeness' sake:
You don't have to use lambda here, you can use the __contains__ method. (a in b is the same as b.__contains__(a))
Python code:
import string
input = raw_input()
answer = filter(string.printable.__contains__, input)
print answer
This is just a dumb code golf trick though. I think the version with join is clearer and should be preferred. (Although the ''.join() idiom isn't a great example of readability either.)

It's probably better to do from operator import contains, as implied by the post above yours. (Edit: for a different problem)
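
One wrinkle with operator.contains is its argument order: contains(a, b) means "b in a", so the container has to be bound first before it can serve as a filter predicate. A sketch in Python 3, using functools.partial as one way to do that:

```python
import string
from functools import partial
from operator import contains

# Bind the container; is_printable(c) is then `c in string.printable`.
is_printable = partial(contains, string.printable)

answer = "".join(filter(is_printable, "the dog\x00"))
print(repr(answer))
```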

SurgicalOntologist fucked around with this message at 23:10 on Jul 24, 2014

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

Crosscontaminant posted:

The standard library (e.g. operator) provides useful functions like itemgetter for use as key functions and with map/reduce, so there should be no need for a lambda for anything this simplistic. If it's more complex, give it a name and documentation so people who come after you know what the hell you're doing and can use it elsewhere without having to refactor.

How is map(operator.pow, my_list, [2]*len(my_list)) more readable? Or are you suggesting something else?

SurgicalOntologist
Jun 17, 2004

nm I'm an idiot.

SurgicalOntologist fucked around with this message at 23:18 on Jul 24, 2014


BigRedDot
Mar 6, 2008

Suspicious Dish posted:

I don't understand that. It's no different than defining a function with a def.

Sure it is. Lambdas in Python can only contain expressions, not statements. This, and things like map and filter being generally superseded by more efficient list and generator comprehensions, makes lambdas in Python almost completely useless. The only place I ever use them is as key functions in some of the sorting and operator functions.
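
A small side-by-side of the points above, as a sketch with made-up sample data:

```python
my_list = [3, -1, 2]

# map with a lambda, and the equivalent list comprehension.
squares_map = list(map(lambda x: x * x, my_list))
squares_comp = [x * x for x in my_list]

# A lambda as a sort key: shortest word first, ties broken alphabetically.
words = ["pandas", "np", "re"]
by_length = sorted(words, key=lambda w: (len(w), w))

print(squares_comp, by_length)
```

The lambda body has to be a single expression; anything that needs a statement, such as an assignment or a try block, requires a def.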
