  • Locked thread
Cat Plus Plus
Apr 8, 2011

:frogc00l:

Modern Pragmatist posted:

I guess one way around it would be to just leave it as a bytestring until we read in all values, and then convert them using the specific encoding if it's provided and iso8859 otherwise.

Yes, that's better. If you decode a non-ISO8859 bytestring to Unicode with the ISO8859 codec, you've already corrupted the data; re-encoding and then decoding with a different codec will not buy you anything but further corruption.


Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
So, it sounds to me like you decoded a bytestring incorrectly, and now you want to try again using a different encoding specified in the file.

The correct way to fix that is not to re-encode back out to a bytestring, but just keep the bytestring around.
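
A minimal sketch of why keeping the bytestring matters (Python 3 syntax; the sample text and the choice of UTF-8 as the "real" encoding are assumptions for illustration). The raw bytes are never touched, so a wrong first guess costs nothing: you decode the same bytes again once the file tells you the real encoding.

```python
# Raw bytes as read from the file; assume they are really UTF-8.
raw = "café".encode("utf-8")

# First guess, made before the file's declared encoding is known.
guess = raw.decode("iso8859-1")

# Later the file declares UTF-8: decode the original bytes again.
text = raw.decode("utf-8")

print(repr(guess))
print(repr(text))
```

The guess comes out as mojibake, but since `raw` was kept around there is nothing to repair.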

Modern Pragmatist
Aug 20, 2008
Ok great. That's finally starting to click. I'll have to rework some things to get that set up, but I'll probably end up storing all the bytestrings and then decoding them when a specific encoding is provided.

Another related question then. Say I have the Item class:

Python code:
class Item:
    def __init__(self,value,encoding):
        self._bytestring = value
        self.value = self._bytestring.decode(encoding)
        self._encoding = encoding
Now if the user wants to change the value of Item.value, then how do you go about ensuring that the string provided uses the encoding that we need? I'm guessing the answer is that you cannot, in which case would I have to create a method that requires the user to specify which encoding they are using or requires them to provide a bytestring instead?

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
Why are you not using the Unicode sandwich?

Modern Pragmatist
Aug 20, 2008

Suspicious Dish posted:

Why are you not using the Unicode sandwich?

Just watched Ned Batchelder's talk at Pycon. I'll take a stab at it. Thanks.
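
For reference, a rough sketch of what the Unicode sandwich could look like for the Item class above (Python 3 syntax). The `to_bytes` method name and the sample strings are made up for illustration, and the ISO8859 default just mirrors the fallback mentioned earlier in the thread. Bytes are decoded once on the way in, everything in the middle is text, and encoding happens once on the way out.

```python
class Item:
    """Holds text internally; bytes exist only at the edges."""

    def __init__(self, raw, encoding="iso8859-1"):
        self._encoding = encoding
        # Decode once, on the way in.
        self.value = raw.decode(encoding)

    def to_bytes(self):
        # Encode once, on the way out, with the encoding the file declared.
        return self.value.encode(self._encoding)


item = Item("naïve".encode("utf-8"), "utf-8")
item.value = "café"  # callers assign plain text, never bytes
print(item.to_bytes())
```

With this shape the question of "which encoding did the user's string use" goes away: the user hands over text, and only the Item knows about encodings.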

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...
Up to now, I've been using Epydoc for API documentation. Even when Epydoc was abandoned and started bitrotting, I just patched it up and kept going. But finally it's time to get with the new hotness, so I'm moving everything across to Sphinx, which has involved learning a whole new way of doing things. Some odd questions:

* I'm extracting docstrings from my code with apidoc. The big irritation with this is that apidoc looks in a given directory for modules and then parses them ... which means that if you point it at the top level of a typical Python package, it picks up setup.py and the tests dir as well. apidoc says it can exclude modules as well, but this functionality is only partially working. Any better way of doing things?

* Sphinx uses/creates hella files and dirs. I'd prefer to just package (and distribute) the output rather than all the doctrees etc. What are other people doing?

* Swapping doc tools meant swapping the format docstrings are in. The native Sphinx format looks cryptic to me. The Google form looks better, as does the stuff used by the Cartouche extension. Opinions?

* I produce a lot of namespaced packages (e.g. foo.bar), which Sphinx doesn't seem to entirely like. Or rather, it presents the documentation for them in a strange way:

code:
Welcome to foo.bar's documentation!

foo.bar
   foo Package

bar Package
   bar Package
   analysis Module
   nodes Module
  Subpackages

io Package
   io Package
   baseio Module
  dialect Module
which doesn't seem entirely useful to me for a package that's physically structured:

code:
foo
- bar
   - analysis.py
   - nodes.py
   - io (dir)
       - baseio.py
       - dialect.py
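
On the docstring format question, a side-by-side sketch may help compare the two styles. Both functions are hypothetical and do the same thing; the first uses Sphinx's native field lists, the second the Google layout, which Sphinx only understands through an extension (sphinx.ext.napoleon in current Sphinx releases, which is an assumption about your setup).

```python
def scale_native(value, factor=2):
    """Multiply value by factor.

    :param value: number to scale
    :param factor: multiplier, defaults to 2
    :returns: the scaled number
    """
    return value * factor


def scale_google(value, factor=2):
    """Multiply value by factor.

    Args:
        value: number to scale.
        factor: multiplier, defaults to 2.

    Returns:
        The scaled number.
    """
    return value * factor
```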

the
Jul 18, 2004

by Cowcaster
So I have an array of a bunch of floats. I want to find what spot in the array that 4.25 is in there (it's in there). How do I do that?

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

the posted:

So I have an array of a bunch of floats. I want to find what spot in the array that 4.25 is in there (it's in there). How do I do that?

Are we talking precisely 4.25, or just something that is 4.25 to some degree of precision? If it is the former you can just use the index() method.

As in: mylist.index(4.25)

If it needn't be precisely 4.25 but just quite close, I guess you have to do something more involved.
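
One possible version of the "more involved" case, as a sketch (Python 3 syntax; the function name, the sample readings and the tolerance are all made up):

```python
import math


def index_close(values, target, tol=1e-9):
    """Return the index of the first value within tol of target."""
    for i, x in enumerate(values):
        if math.isclose(x, target, abs_tol=tol):
            return i
    raise ValueError("no value close to %r" % target)


readings = [1.0, 4.2500000001, 7.5]
print(index_close(readings, 4.25))
```

Like index(), this raises ValueError when nothing matches, so the two can be swapped without changing the calling code.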

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

the posted:

So I have an array of a bunch of floats. I want to find what spot in the array that 4.25 is in there (it's in there). How do I do that?

I guess I'll take the more involved approach. Why do you think you need the index into the array? You rarely need it in Python.

lunar detritus
May 6, 2009


This is probably stupid but I want to execute a command line program from a webpage using Python. I'm probably going to use Flask and just put a 'Start' and a 'Stop' button that executes and stops the program. I haven't tried yet but I should use subprocess, right?

My main question is, is there a way to capture the shell's output? The program captures audio and it displays in the shell how big the file is and how long it has been recording. I'm guessing I can use AJAX in the client-side to constantly check for updates but I'm not sure how to capture and refresh that info server-side. Any ideas?
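
A rough sketch of the server side with subprocess, under some assumptions: the recorder prints its status as ordinary newline-terminated lines, and a child Python process stands in for it here. The `start` helper and its return values are made up for illustration. A background thread drains the pipe so the web request never blocks on the program.

```python
import subprocess
import sys
import threading


def start(cmd):
    """Start cmd and collect its output lines in the background."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        universal_newlines=True,   # text instead of bytes
    )
    lines = []

    def pump():
        # Runs until the child closes its output.
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()
    return proc, lines, reader


# Stand-in for the recorder: a child process that prints two status lines.
proc, lines, reader = start([sys.executable, "-c", "print('0 KB'); print('4 KB')"])
proc.wait()
reader.join()
print(lines)
```

The view behind the AJAX poll could then just return the last entry of `lines`, and the 'Stop' button could call `proc.terminate()`. If the program redraws one status line with carriage returns, or buffers its output when not attached to a terminal, line-by-line reading like this will not see updates as they happen.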

Emacs Headroom
Aug 2, 2003

Suspicious Dish posted:

I guess I'll take the more involved approach. Why do you think you need the index into the array? You rarely need it in Python.

He's probably doing something with his physics homework where he has two measurements and he's got to get the measurement of the second at the same time that the first one is at a certain value.

The annoying thing though (other than the fact he's coming to this thread constantly for his physics homework) is that in this post addressed to him in this very drat thread I explained in detail how to do this with numpy.

It's like talking to a wall. A wall that wants more help with its homework.

the
Jul 18, 2004

by Cowcaster
Sorry, sometimes I forget if I ask the same question. I really do appreciate the help.

It was what he said. I had two arrays of 50 elements that were related. One had an ith position that had a value of 4.25, and I wanted to find out what i was so I could figure out what the ith element of the second array was.

I ended up doing:

Python code:
for i in [i for i,x in enumerate(testlist) if x == 1]: print i
which worked perfectly.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

the posted:

It was what he said. I had two arrays of 50 elements that were related. One had an ith position that had a value of 4.25, and I wanted to find out what i was so I could figure out what the ith element of the second array was.

code:
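def partner_of(L1, L2, matches):
    # Walk both lists in step; return the L2 item paired with the first match in L1.
    for a, b in zip(L1, L2):
        if matches(a):
            return b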

MC Fruit Stripe
Nov 26, 2002

around and around we go
I thought of a specific task I wanted to do in Python, really more of an excuse to learn more Python than anything else, but I've hit a brick wall. What command or module do I want to be using to pull specific text from a website?

I am trying to extract the styles from an artist's Allmusic.com page. In the source code of the site is this

code:
<dd class="styles">
    <ul>
    <li><a href="/style/club-dance-ma0000002544">Club/Dance</a></li>
    <li><a href="/style/soul-ma0000002865">Soul</a></li>
    <li><a href="/style/urban-ma0000011965">Urban</a></li>
    <li><a href="/style/adult-contemporary-r-b-ma0000012131">Adult Contemporary R&B</a></li>
    <li><a href="/style/contemporary-pop-rock-ma0000004443">Contemporary Pop/Rock</a></li>
    <li><a href="/style/contemporary-r-b-ma0000002969">Contemporary R&B</a></li>
    <li><a href="/style/dance-rock-ma0000012069">Dance-Rock</a></li>
    <li><a href="/style/funk-ma0000002606">Funk</a></li>
    <li><a href="/style/alternative-indie-rock-ma0000012230">Alternative/Indie Rock</a></li>
    <li><a href="/style/dance-pop-ma0000004548">Dance-Pop</a></li>
    <li><a href="/style/neo-psychedelia-ma0000012252">Neo-Psychedelia</a></li>
    </ul>
</dd>
I'd like to tell it to read the file, which I am currently grabbing with wget, and to save the file with only what's between <dd class="styles"> and </dd>. I could then take those results, and tell it to tell me only what's between > and < because the only thing that is ever between those in that list is the plain text name of the style.

The problem that I am having is that even though I can get python to input the file, print it out, print out only X characters, print starting from X character etc, I am not understanding how to extract a section of the file. Admittedly, I know diddly, so a little help would be appreciated.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

MC Fruit Stripe posted:

I'd like to tell it to read the file, which I am currently grabbing with wget, and to save the file with only what's between <dd class="styles"> and </dd>. I could then take those results, and tell it to tell me only what's between > and < because the only thing that is ever between those in that list is the plain text name of the style.

The problem that I am having is that even though I can get python to input the file, print it out, print out only X characters, print starting from X character etc, I am not understanding how to extract a section of the file. Admittedly, I know diddly, so a little help would be appreciated.

First of all, does Allmusic.com have an API? If so, use that instead of page scraping.

If it doesn't, then you can scrape the page. Use lxml.html to parse the HTML file, and extract the contents you want.

OnceIWasAnOstrich
Jul 22, 2006

MC Fruit Stripe posted:

I thought of a specific task I wanted to do in Python, really more of an excuse to learn more Python than anything else, but I've hit a brick wall. What command or module do I want to be using to pull specific text from a website?

I am trying to extract the styles from an artist's Allmusic.com page. In the source code of the site is this

code:
<dd class="styles">
    <ul>
    <li><a href="/style/club-dance-ma0000002544">Club/Dance</a></li>
    <li><a href="/style/soul-ma0000002865">Soul</a></li>
    <li><a href="/style/urban-ma0000011965">Urban</a></li>
    <li><a href="/style/adult-contemporary-r-b-ma0000012131">Adult Contemporary R&B</a></li>
    <li><a href="/style/contemporary-pop-rock-ma0000004443">Contemporary Pop/Rock</a></li>
    <li><a href="/style/contemporary-r-b-ma0000002969">Contemporary R&B</a></li>
    <li><a href="/style/dance-rock-ma0000012069">Dance-Rock</a></li>
    <li><a href="/style/funk-ma0000002606">Funk</a></li>
    <li><a href="/style/alternative-indie-rock-ma0000012230">Alternative/Indie Rock</a></li>
    <li><a href="/style/dance-pop-ma0000004548">Dance-Pop</a></li>
    <li><a href="/style/neo-psychedelia-ma0000012252">Neo-Psychedelia</a></li>
    </ul>
</dd>
I'd like to tell it to read the file, which I am currently grabbing with wget, and to save the file with only what's between <dd class="styles"> and </dd>. I could then take those results, and tell it to tell me only what's between > and < because the only thing that is ever between those in that list is the plain text name of the style.

The problem that I am having is that even though I can get python to input the file, print it out, print out only X characters, print starting from X character etc, I am not understanding how to extract a section of the file. Admittedly, I know diddly, so a little help would be appreciated.

Take a look at the BeautifulSoup library or Scrapy, much easier than trying to extract bits of HTML with string commands.

MC Fruit Stripe
Nov 26, 2002

around and around we go
I appreciate the answers, and will sit down with all of the options tonight and see what I can learn. Thank you.

lunar detritus
May 6, 2009


MC Fruit Stripe posted:

I appreciate the answers, and will sit down with all of the options tonight and see what I can learn. Thank you.

For what it's worth, I used Scrapy for something very similar and it worked perfectly well (at least once I figured out the appropriate XPath selector).

Titan Coeus
Jul 30, 2007

check out my horn

OnceIWasAnOstrich posted:

Take a look at the BeautifulSoup library or Scrapy, much easier than trying to extract bits of HTML with string commands.

Seconding BeautifulSoup. Made my life much easier when parsing poorly formatted HTML.

MC Fruit Stripe
Nov 26, 2002

around and around we go
Sigh. I am using BeautifulSoup.

All I want is to take a page on Allmusic.com, say The Beatles page, located here: http://www.allmusic.com/artist/the-beatles-mn0000754032

And ultimately, after however many steps, have exactly the following result

code:
The Beatles	British Invasion;British Psychedelia;Contemporary Pop/Rock;Early Pop/Rock
(Styles pruned for table breakage)

That just does not feel that difficult. But I can get it to return the page, return the page with formatting stripped, I can get it to tell me where 'styles' is, return every link or bolded word in the page, but I just can not get it to grab a section of text.

Help?

I've been trying to use the findAll() function, but it's as useless as I am. The problem is that there's nothing specific to find. I am not looking for the word styles, I am looking for the word styles then the next X words in the document until the list is over, and that is my total confusion. I don't want to search for text, I want to find a word, then return text until another word.

MC Fruit Stripe fucked around with this message at 00:34 on Aug 5, 2012

Titan Coeus
Jul 30, 2007

check out my horn

MC Fruit Stripe posted:

Either they changed the meaning of parsing (unlikely), what I am trying to do is more complicated than space exploration (less likely), or I am just not loving seeing this (very likely).

All I want is to take a page on Allmusic.com, say The Beatles page, located here: http://www.allmusic.com/artist/the-beatles-mn0000754032

And ultimately, after however many steps, have exactly the following result

code:
The Beatles	British Invasion;British Psychedelia;Contemporary Pop/Rock;Early Pop/Rock
(Styles pruned for table breakage)

That just does not feel that difficult. But I can get it to return the page, return the page with formatting stripped, I can get it to tell me where 'styles' is, return every link or bolded word in the page, but I just can not get it to grab a section of text.

Help?

I have no idea how to suggest improvements as I have no clue what you are doing.

Emacs Headroom
Aug 2, 2003

MC Fruit Stripe posted:

The problem is that there's nothing specific to find. I am not looking for the word styles, I am looking for the word styles then the next X words in the document until the list is over, and that is my total confusion. I don't want to search for text, I want to find a word, then return text until another word.

Couple of hints. First, you can use the Firefox Inspector (Tools -> Web Developer -> Inspect) to see that every style link is under a particular div. In this case, it's under dd.styles (which itself is under dl.details which is under div#sidebar.left). So you can use beautifulsoup to return stuff under that div.

Another hint, if the former seems like something in Martian, is to notice that every style listed there is a link, and the link is of the form http://www.allmusic.com/style/blahblahblah

So you can search for that link format, and return the text associated with that href. It would be better just to grab stuff in the div though.

If you don't know anything about how html is structured, this will be slightly harder for you.
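
To make the first hint concrete, here is a sketch of "grab the links under dd.styles". Note the swap: it uses the standard library's html.parser rather than BeautifulSoup, so it runs without installing anything (Python 3 syntax). The class name and the trimmed sample markup are made up; the idea is the same with any parser.

```python
from html.parser import HTMLParser


class StyleExtractor(HTMLParser):
    """Collect the text of every link inside <dd class="styles">."""

    def __init__(self):
        super().__init__()
        self.in_styles = False
        self.in_link = False
        self.styles = []

    def handle_starttag(self, tag, attrs):
        if tag == "dd" and ("class", "styles") in attrs:
            self.in_styles = True
        elif tag == "a" and self.in_styles:
            self.in_link = True
            self.styles.append("")

    def handle_endtag(self, tag):
        if tag == "dd":
            self.in_styles = False
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Only text inside a style link is kept.
        if self.in_link:
            self.styles[-1] += data


page = """
<a href="/elsewhere">Not a style</a>
<dd class="styles">
    <ul>
    <li><a href="/style/club-dance-ma0000002544">Club/Dance</a></li>
    <li><a href="/style/soul-ma0000002865">Soul</a></li>
    <li><a href="/style/contemporary-r-b-ma0000002969">Contemporary R&B</a></li>
    </ul>
</dd>
"""

parser = StyleExtractor()
parser.feed(page)
print(";".join(parser.styles))
```

No searching for the word "styles" and counting words after it: the structure of the HTML already says where the list starts and ends.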

Emacs Headroom fucked around with this message at 00:45 on Aug 5, 2012

how!!
Nov 19, 2011

by angerbot
I am writing a python library. It is very simple, and laid out almost exactly like the project described here: http://guide.python-distribute.org/quickstart.html#lay-out-your-project

My setup.py looks the same too. When I create a new virtualenv, then run python setup.py install, I get this output:

code:
$ python setup.py install
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'install_requires'
  warnings.warn(msg)
running install
running build
running build_py
running install_lib
creating /Users/chris/Documents/python-etsy/ettest/lib/python2.7/site-packages/etsy
copying build/lib/etsy/__init__.py -> /Users/chris/Documents/python-etsy/ettest/lib/python2.7/site-packages/etsy
byte-compiling /Users/chris/Documents/python-etsy/ettest/lib/python2.7/site-packages/etsy/__init__.py to __init__.pyc
running install_egg_info
Writing /Users/chris/Documents/python-etsy/ettest/lib/python2.7/site-packages/python_etsy-0.1-py2.7.egg-info
Then when I go to the lib/python2.7/site-packages/ folder, I see my module there, but when I try to import it from the shell, it gives me an "ImportError: No module named etsy"

the code for my project is here: https://github.com/priestc/python-etsy

What could be causing this?

Cat Plus Plus
Apr 8, 2011

:frogc00l:

how!! posted:

What could be causing this?

PYTHONPATH might be wrong. Inspect sys.path in that shell. Make sure virtualenv is active before you start the shell.
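
A small sketch of that check, to run inside the shell where the import fails. The function name is made up, and the virtualenv test relies on `sys.real_prefix` (set by older virtualenv) or `sys.base_prefix` (set by newer Pythons), whichever exists.

```python
import sys


def environment_report():
    """Collect the facts needed to debug a failing import."""
    base = getattr(sys, "real_prefix", getattr(sys, "base_prefix", sys.prefix))
    return {
        "executable": sys.executable,
        "in_virtualenv": sys.prefix != base,
        "path": list(sys.path),
    }


report = environment_report()
print(report["executable"])
print(report["in_virtualenv"])
for entry in report["path"]:
    print(entry)
```

If the executable is the system Python, or the site-packages directory that setup.py wrote to is missing from the path, the shell is not running inside the virtualenv.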

Ulio
Feb 17, 2011


Why am I getting a syntax error for the 7 in this "Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32" ???

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug
Because that isn't Python code? :confused:

Ulio
Feb 17, 2011


nvm fixed it, thanks. IDLE was just being weird.

edit: Everything I type on IDLE says invalid syntax. It even errors the text written at the very top which appears each time on its own.

Ulio fucked around with this message at 02:20 on Aug 8, 2012

Titan Coeus
Jul 30, 2007

check out my horn

Ulio posted:

nvm fixed it, thanks. IDLE was just being weird.

edit: Everything I type on IDLE says invalid syntax. It even errors the text written at the very top which appears each time on its own.

Screenshot?

Ulio
Feb 17, 2011


I'm guessing it's something simple I am forgetting but here is a saved file I'm trying to run.

http://imgur.com/jXifV

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
Yes, that's invalid syntax.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Suspicious Dish posted:

Yes, that's invalid syntax.

More specifically...

You need another + between name and "!" on the final line

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
No. Not only that. He's trying to make Python run a REPL session from a file, and it's choking on the ">>>".
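
Both problems can be shown with compile(), which is what Python effectively does to a file before running it. The file contents below are a guess at what was in the screenshot, not a copy of it, and the helper name is made up (Python 3 syntax).

```python
def is_valid_script(source):
    """Return True if source compiles as a Python file."""
    try:
        compile(source, "<script>", "exec")
    except SyntaxError:
        return False
    return True


# A pasted REPL session: the ">>>" prompts are not Python.
pasted_session = '>>> name = "Ulio"\n>>> print("Hello, " + name + "!")\n'

# The same code as it should appear in a saved file.
plain_script = 'name = "Ulio"\nprint("Hello, " + name + "!")\n'

# Prompts removed, but the + between name and "!" is still missing.
missing_plus = 'name = "Ulio"\nprint("Hello, " + name "!")\n'

print(is_valid_script(pasted_session))
print(is_valid_script(plain_script))
print(is_valid_script(missing_plus))
```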

Ulio
Feb 17, 2011


Ok I got it to work now, thanks for the answer. It worked when I opened a new window for it.

Bodhi Tea
Oct 2, 2006

seconds are secular, moments are mine, self is illusion, music's divine.
Say I have a function with an optional argument which I don't know the default value of:

code:
def foo (max_id=<mystery value>)
and another function, that will call foo(). Is there a more succinct way of doing this?
code:
def bar(max_id=None):
    if max_id:
          foo(max_id)
    else:
          foo()

Cat Plus Plus
Apr 8, 2011

:frogc00l:

Bodhi Tea posted:

Is there a more succinct way of doing this?

First you yell at the developer who wrote that other function, and then you do foo(max_id) (because now the function uses None for the default value).

You could do this, but I'm not really sure if it's that much better:

Python code:
def bar(max_id = None):
    foo_args = [max_id] if max_id is not None else []
    foo(*foo_args)
It doesn't buy you much, and you need to think a little longer to see what's happening.
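
A keyword-argument variant of the same idea, as a sketch. The foo here is a hypothetical stand-in with a default of 100 so there is something to run; bar still never needs to know that value.

```python
def foo(max_id=100):
    # Hypothetical stand-in; pretend the default of 100 is unknown to bar.
    return max_id


def bar(max_id=None):
    # Forward max_id only when the caller actually supplied one.
    kwargs = {} if max_id is None else {"max_id": max_id}
    return foo(**kwargs)


print(bar())
print(bar(0))
```

Testing `is None` rather than plain truthiness matters if 0 is a legal max_id: `if max_id:` would silently fall back to foo's default for it.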

tef
May 30, 2004

-> some l-system crap ->

Bodhi Tea posted:

Say I have a function with an optional argument which I don't know the default value of:

code:
def bar(*args, **kwargs):
    foo(*args, **kwargs)
http://docs.python.org/library/functools.html functools.wraps might be useful

Cat Plus Plus
Apr 8, 2011

:frogc00l:
I don't think this is the case where all bar does is call foo, because it would make it pretty much pointless. If you use that argument later in the function then going with *args, **kwargs might actually make it worse.

Captain von Trapp
Jan 23, 2006

I don't like it, and I'm sorry I ever had anything to do with it.
I apologize for the dumb question, but I am by no stretch a real programmer and am just fooling around with Python for fun. Someone posed a challenge in which I'm to find all four-digit numbers n such that n and 4n have the same digits, rearranged. Originally, I checked them all in Mathematica, where I did this (in a for loop):

code:
If[Sort[IntegerDigits[n]] == Sort[IntegerDigits[4n]], Print[n]]
What's the Pythonic way to do this? At first I tried:

code:
print [x for x in range(1000,10000) if list(str(x)).sort() == list(str(4*x)).sort()]
Which of course fails because sort sorts in place rather than returning a list for the == operator to act on. It would be easy to do this without list comprehensions but I'm guessing this ought to be a one-liner for a programmer. So I did:

code:
print [x for x in range(1000,10000) if set(str(x)) == set(str(4*x))]
Which mostly works but includes spurious results because (say) 4160 multiplied by four has the same digits but one of them is duplicated. Sets don't care about duplication. (Of course I can eliminate this problem by using 2500 for the upper limit of the range but I'm interested in the "proper" way to see if two lists are the same up to ordering.)

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

tef posted:

code:
def bar(*args, **kwargs):
    foo(*args, **kwargs)
http://docs.python.org/library/functools.html functools.wraps might be useful

That would make bar(None) behave wrong. I'm not sure what Bodhi wants, though.


Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

Captain von Trapp posted:

Which of course fails because sort sorts in place rather than returning a list for the == operator to act on.

Try the "sorted" function:

code:
print [x for x in range(1000,10000) if sorted(str(x)) == sorted(str(4*x))]
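
As for the general "same items up to ordering" question, another option is collections.Counter, which compares how many times each item occurs, so it fixes the duplicate problem that sets have. A sketch in Python 3 syntax, with a made-up helper name:

```python
from collections import Counter


def same_digits(a, b):
    """True if a and b use the same digits the same number of times."""
    return Counter(str(a)) == Counter(str(b))


matches = [n for n in range(1000, 10000) if same_digits(n, 4 * n)]
print(matches)
```

For four-character strings sorted() is just as good; Counter mostly pays off on long lists, since it avoids sorting.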
