  • Locked thread
onionradish
Jul 6, 2006

That's spicy.
I want to find specific sequence patterns in a list of tuples, based on the second element in the tuple.

For example, given a list like:
code:
[(dog, animal), (cow, animal), (corn, vegetable), (granite, mineral), (carrot, vegetable), (cat, animal), (cow, animal)]
I'd like to find patterns in the sequence, like "animal, animal" and get [[dog, cow],[cat, cow]] or "vegetable, mineral, vegetable" and get [corn, granite, carrot], or "vegetable, vegetable" and get nothing.

I can think of ways to do this by manually iterating through the list and testing the second tuple element against a bunch of IF statements, but suspect there's a smarter, more Python-like way to do it. I've just started with the language, and am already blown away by how much code Python's constructors eliminate.

onionradish
Jul 6, 2006

That's spicy.
Will that give me the sequence, though? The order will matter, and that's the part I'm not sure how to do "elegantly." For example, "animal, vegetable" vs. "vegetable, animal" should return [cow, corn] and [carrot, cat] respectively. Maybe a Regular Expression?

However, the dictionary form would be helpful elsewhere in code as a list of items in a category. Is there an easy way to convert the list example (assumed assigned to a variable) to that dictionary format? I'm doing it currently by iterating the list with separate constructors with custom IF statements. The categories (second tuple parameter) are hard-wired, but the entries (first tuple parameter) will vary.
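
A minimal sketch of that list-to-dictionary conversion, assuming the dictionary format in question maps each category to a list of its entries. The variable names `items` and `by_category` are made up for illustration, and the entries are written as strings:

```python
from collections import defaultdict

# Hypothetical sample data: (entry, category) tuples, as in the example list.
items = [("dog", "animal"), ("cow", "animal"), ("corn", "vegetable"),
         ("granite", "mineral"), ("carrot", "vegetable"),
         ("cat", "animal"), ("cow", "animal")]

# defaultdict(list) creates an empty list the first time a category is seen,
# so no per-category IF statements are needed.
by_category = defaultdict(list)
for name, category in items:
    by_category[category].append(name)
```

Because the categories come from the data itself, nothing has to be hard-wired; entries keep the order they had in the original list.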

onionradish fucked around with this message at 18:35 on May 24, 2012

onionradish
Jul 6, 2006

That's spicy.
Thanks for the ideas on pattern matching! A big help, especially learning about zip(); I'll play around with code this weekend.
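
For reference, a minimal sketch of one zip()-based way to do the order-sensitive matching. This is an assumption about the approach, not necessarily what was suggested; `find_sequences` and `items` are made-up names:

```python
# Hypothetical sample data: (entry, category) tuples.
items = [("dog", "animal"), ("cow", "animal"), ("corn", "vegetable"),
         ("granite", "mineral"), ("carrot", "vegetable"),
         ("cat", "animal"), ("cow", "animal")]


def find_sequences(items, pattern):
    """Return each run of entries whose categories match pattern, in order."""
    n = len(pattern)
    # zip(items, items[1:], items[2:], ...) yields every window of n
    # consecutive tuples; zip stops at the shortest slice.
    windows = zip(*(items[i:] for i in range(n)))
    return [[name for name, _ in window]
            for window in windows
            if tuple(cat for _, cat in window) == tuple(pattern)]


pairs = find_sequences(items, ("animal", "animal"))
triple = find_sequences(items, ("vegetable", "mineral", "vegetable"))
nothing = find_sequences(items, ("vegetable", "vegetable"))
```

Since each window is compared against the pattern as an ordered tuple, "animal, vegetable" and "vegetable, animal" give different results.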

onionradish
Jul 6, 2006

That's spicy.
What is the "best practices" method for working with settings/INI-type files in Python?

For example, I have a main script that downloads using a custom host and password to be specified in an INI-like file, and I want to update that INI with the "last-date-processed" so it doesn't process old files. The scripts are running under my complete control, and I'm not worried about someone adding malicious values or codes.

Should I manually read/write a TXT file and parse it, should I use "import xxx.ini" to load it into my script, should I use something like JSON or pickle to save/load the variables, or is there a preferred library I should use?

onionradish
Jul 6, 2006

That's spicy.

OnceIWasAnOstrich posted:

There is ConfigParser built in if you like INI-style files and want to be able to easily hand-edit them which can be a pain for JSON files for people who don't use JSON or javascript.
Thanks! That looks like exactly what I was after!
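
A minimal sketch of the read-then-update cycle, written for Python 3 where the module is named configparser (ConfigParser on Python 2). The section name, keys, and values are made up for illustration, and a temp file stands in for the real INI:

```python
import configparser  # named ConfigParser on Python 2
import os
import tempfile

# Hypothetical INI file, created here only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "settings.ini")
with open(path, "w") as f:
    f.write("[download]\nhost = example.com\npassword = secret\n")

# Read the settings the script needs.
config = configparser.ConfigParser()
config.read(path)
host = config.get("download", "host")

# Record the last processed date and write the file back out.
config.set("download", "last_date_processed", "2012-06-01")
with open(path, "w") as f:
    config.write(f)

# Reading it again shows the new value was saved.
reloaded = configparser.ConfigParser()
reloaded.read(path)
```

Note that writing the file back drops any hand-written comments in it, which may matter if the INI is also edited by hand.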

onionradish
Jul 6, 2006

That's spicy.
I could use some advice on project structure so I'm not kicking myself later.

I'm about to migrate to a new machine and start a large personal project "for reals" with versioning, proper classes, etc. instead of the folder of hodgepodge scripts I've been using to learn. When I've done development in other languages, all the code and assets were contained within a single \\dev\projectname path. It was easy to manage dependencies because third-party scripts, modules, frameworks, etc. were revisioned along with the rest of the code.

This project will require some third-party modules (numpy, etc.). Since these get installed to Python's directory, how should I manage dependencies on these modules so I can re-create the development environment when I have to move machines, restore from backup, etc.? I can just keep copies of the module installation packages, but I'm guessing there are better ways to go about it.

onionradish
Jul 6, 2006

That's spicy.
Thanks for the advice; and, the sample dirtree is a big help, so thanks for that detail, Haystack! I'll need to do some more reading and testing on virtualenv as soon as I finish uninstalling all the bloatware that came with the new system.

In the example usage given in the docs:

code:
# This creates ENV/lib/pythonX.X/site-packages
$ python virtualenv.py ENV
Is ENV representative of a full path (C:\my_python_files\), a subdirectory of c:\Python27, or some alias that refers to a full path?

Also, when I'm installing modules/packages, do I install to a project's particular virtualenv, to the native Python directory or both? For example, if I want a module like numpy or PIL to be available to every script, I'm assuming that I install it to native Python directory. For my actual project, I'm assuming I'll also need to install it to its particular virtualenv so that version is the one that will run.

onionradish
Jul 6, 2006

That's spicy.
Awesome! Thanks, everybody. Glad I asked!

onionradish
Jul 6, 2006

That's spicy.
Following up on virtualenv, I'm having failures when trying to install packages, and I'm not sure what I'm doing wrong.

I'm on Windows, and using CMD as the shell. I'm able to create a project folder: "virtualenv test". After I do that, I'm navigating to the test directory (D:\test\project) and entering "scripts\activate". Then I'm trying "pip install lxml" as an example.

A bunch of stuff scrolls by and results in "failed with error code 1". I'm not sure what this means:
code:
Command D:\test\project\Scripts\python.exe -c "import setuptools;__file__='
c:\\windows\\temp\\pip-j7v8xg-build\\setup.py';exec(compile(open(__file__).read(
).replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\windows\temp\pi
p-q2edge-record\install-record.txt --single-version-externally-managed --install
-headers D:\test\project\include\site\python2.7 failed with error code 1 in
 c:\windows\temp\pip-j7v8xg-build
For what it's worth, I don't see "D:\test\project\include\site\*" as a folder; the "include" path only has a bunch of *.h files. What am I doing wrong?

onionradish
Jul 6, 2006

That's spicy.
:shepicide: is right -- for f's sake -- I picked lxml thinking it would be a stupidly simple test package! I'm torn between being happy it's not my fault and angry about the hoops I'm going to have to jump through....

I'll give the VS 2008 and "manually-copy-the-binary" methods a try.

onionradish
Jul 6, 2006

That's spicy.

Hard NOP Life posted:

Why is everyones first instinct to try and compile it instead of just installing the binaries?
Wait, *can* you install binaries to virtualenv setups? I'd far rather do that, and it's how I normally install stuff to the root Python folder. Dealing with compilation sucks.

edit: I'll be damned; looks like you *can* install most binaries to virtualenv. Time for some testing... Yeah, no, that doesn't actually work.

onionradish fucked around with this message at 18:51 on Dec 15, 2012

onionradish
Jul 6, 2006

That's spicy.
An update to my earlier post: the linked StackOverflow post was about using "easy_install" in a Windows virtualenv, but testing with binaries for PIL, it only appears to work, meaning it actually doesn't. The package only gets partially installed. The same SO post includes a suggestion to change the registry around, which seems really hacky.

onionradish fucked around with this message at 18:53 on Dec 15, 2012

onionradish
Jul 6, 2006

That's spicy.
JetBrains' sale blew up their whole process. And of course it's kind of a cascading mess with some people not having a record of the purchase from the processor, others having ordered multiple times when it wasn't clear whether the order was going through. So now in addition to whatever keygen queue backlog they had from the sale, they've also got a backlog of inquiries to sales@jetbrains.com to work through.

The conflicting communication is kind of the issue. One source says 48 hours, one says within 5-6 hours. I don't mind a couple days waiting for a key, as long as I know to expect that. I don't like needing to play Internet Detective to find Twitter and blog posts to figure out what I'm supposed to do: just chill for a couple days? contact sales?

http://blog.jetbrains.com/blog/2012/12/21/to-all-who-placed-an-order-during-the-end-of-the-world-sale/

https://twitter.com/jetbrains

In the end, after reading comments on the blog from people claiming they got their keys "no problem," I joined the herd and sent my info to their sales address too. I really like PyCharm and was going to buy it anyway in January, so I'm thrilled to pick it up for $25.

edit: The email response from sales is: "If your license doesn't reach you by Monday, please let us know and we'll make sure to help you out."

onionradish fucked around with this message at 15:40 on Dec 22, 2012

onionradish
Jul 6, 2006

That's spicy.
I'm trying to whip up a script to convert between Unicode and ASCII HTML entities for some website work, but am failing somewhere on the Unicode conversion. The conversion to named entities works fine, but fails when I convert back to accented Unicode, and specifically on the & rsquo ; entity. What am I missing?

Full code at Pastebin

code:
def entity2chr(n):
    """Convert named HTML entity to unicode character (&eacute; -> é)"""
    if n in hd.name2codepoint:
        # Debug to figure out which character fails on next line
        print ''.join(['N:', n, ' ', str(hd.name2codepoint[n])])
        return unichr(hd.name2codepoint[n])  # this should return a unicode character, FAILS HERE
    else:
        return "????"  # this isn't an issue; shouldn't happen

onionradish fucked around with this message at 16:39 on Feb 19, 2013

onionradish
Jul 6, 2006

That's spicy.
UTF/HTML stuff...

The Insect Court posted:

You're over-thinking this.
Thanks. This also revealed that me printing to Windows PowerShell was the cause of the UTF errors, not Python. :arghfist::mad:
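
For reference, a minimal Python 3 sketch of the round trip using the standard library's html module (the code earlier in the thread is Python 2 and uses htmlentitydefs as hd). `chars_to_entities` is a made-up helper name, and this is not necessarily the approach that was suggested:

```python
import html
from html.entities import codepoint2name


def chars_to_entities(text):
    """Replace non-ASCII characters with named entities where one exists."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        # Only touch non-ASCII characters, so &, <, > and " are left alone.
        if name and ord(ch) > 127:
            out.append("&%s;" % name)
        else:
            out.append(ch)
    return "".join(out)


# html.unescape handles the entity -> character direction, rsquo included.
decoded = html.unescape("caf&eacute; &rsquo;")
encoded = chars_to_entities(decoded)
```

Writing `decoded` to a UTF-8 file instead of printing it sidesteps the console problem described above.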

onionradish
Jul 6, 2006

That's spicy.
A stupid post was here.

onionradish fucked around with this message at 20:39 on Mar 5, 2013

onionradish
Jul 6, 2006

That's spicy.
I spent a day and a half fighting a similar "problem" and had myself convinced I didn't understand Unicode either. My code was actually fine -- the problem was the output console (Windows Powershell). Try running the code in IDLE, or writing the output to a file and see if you're getting the value you expect. It may be your output console that doesn't understand Unicode, not you.

From IDLE:
>>> m=u'm\u0325'
>>> print m


edit: ^^ the M-dot doesn't show when it's enclosed in a 'code' block on the forum

onionradish
Jul 6, 2006

That's spicy.
Is it bad practice to put a lot of the "prep work" for a class into its __init__?

I'm writing a throwaway script that parses a recipe from a URL to get more familiar with lxml, writing classes, unit tests and exception-handling. In my first cut at the script, "recipe = Recipe(url)" fetches the HTML from the URL, parses it, then populates a bunch of class attributes.

Should I instead be calling a method to do that on the object after initializing it? Something like "recipe = Recipe()" then "recipe.getfromurl(url)"?

onionradish
Jul 6, 2006

That's spicy.
Thanks for the __init__ feedback.

The googletesting link Chosen posted is great timing because I'll be trying to set up tests next so I can refactor now where needed.

I'd actually done all of the "work" in the __init__ through methods as Ronald Raiden suggested, but the idea of making the parameter optional (even though it'd always be provided in practice) seems like it'd be better for testing, allowing creation of a "plain" instance and then assertions against the methods.
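
A minimal sketch of that optional-parameter shape. Everything here is hypothetical: the `title` attribute, the regex-based `parse` (standing in for the real lxml parsing), and the `fetch` callable passed to `from_url`:

```python
import re


class Recipe(object):
    def __init__(self, html=None):
        # A "plain" instance is valid on its own, which helps in tests.
        self.title = None
        if html is not None:
            self.parse(html)

    def parse(self, html):
        """Populate attributes from an HTML string (stand-in for lxml work)."""
        match = re.search(r"<title>(.*?)</title>", html, re.S)
        self.title = match.group(1).strip() if match else None

    @classmethod
    def from_url(cls, url, fetch):
        """Build a Recipe from a URL; fetch is any callable returning HTML."""
        return cls(fetch(url))


plain = Recipe()
plain.parse("<title> Soup </title>")
```

Passing the fetching step in as a callable means a test can hand over a canned HTML string and never touch the network.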

onionradish
Jul 6, 2006

That's spicy.
Using lxml, foo.text_content() will return plain text contained within a particular level, but it strips out all HTML tags. How can I get the raw HTML contained within a particular level? For example, how do I get all the content between <div class='dummy'> and </div> not just "the quick brown fox"? (This is probably stupidly easy, but I'm not seeing it....)

code:
from lxml.html import document_fromstring

html = """<div class='dummy'><p>the quick brown fox</p><img src='foxy.gif'></div>"""
doc = document_fromstring(html)
foo = doc.cssselect("div.dummy")
foo[0].text_content()  # I want <p>the quick brown fox</p><img src='foxy.gif'>
EDIT: Well, poo poo, I knew it was mostly easy. lxml.html.tostring() was basically what I was looking for....

onionradish fucked around with this message at 21:43 on Jul 25, 2013

onionradish
Jul 6, 2006

That's spicy.
A helpful resource that should probably be added to the OP is the pyvideo.org website, which archives presentations given at various Python conferences.

Some speakers and presentations are better than others, but there are gems of useful, practical information on unit tests and web frameworks (Flask to Django), standard and other modules (Requests, Pygame, etc.), details on core capabilities like iterators/generators, astronomy and other specialized topics, and so on, usually with links to sample code.

onionradish
Jul 6, 2006

That's spicy.
I'm migrating some of the helper scripts I've written over the years in AutoIt to Python to improve my Python coding. Some of these AutoIt scripts have minimalist Windows GUI elements like a system tray icon that shows that the script is running and can display status as a tooltip on that icon. I'd like to replicate that functionality with the least-possible effort and module dependencies.

wxPython seems to be a decent cross-platform library for basic GUI stuff like tray icons, though the documentation seems thin. Can anyone vouch for it and whether or not any of the reference books ("wxPython in Action", "wxPython 2.8 Application Development Cookbook") are worthwhile?

onionradish
Jul 6, 2006

That's spicy.
I use both the Windows console and PyCharm, but when I started learning I was using just Notepad++ and the console. (I later moved up to Spyder before switching to PyCharm.)

One of the early and recurring negative experiences I had with the Windows console is its inability to display any Unicode or Windows characters. I spent far too much time trying to understand what was wrong with some piece of code only to discover that there was nothing wrong with the code at all. The problem was "print"-ing to a crappy console. An IDE (even IDLE) can at least run
code:
print u'\u2019'
without failing.

A full IDE like PyCharm can be overwhelming for sure, but it can be used as just an editor and output console without learning much about the rest of the IDE -- there's still plenty of stuff in it I've never used at all. While learning, I appreciated its auto-inspection to catch stupid typos and its "nagging" about missing docstrings, line length, etc. to remind me about good coding habits. When I ignore the guidelines, it's a conscious choice. It's also much easier to set breakpoints, step through and watch variables in a GUI for code as it starts to get more complex than it is at the console level.

onionradish
Jul 6, 2006

That's spicy.
I built a basic web app using BaseHTTPServer that does simple formatting based on an sqlite database. It works totally fine as is, and importantly automatically launches my browser when I run the script:
Python code:
...
httpd = HTTPServer(('', 8080), RequestHandler)
webbrowser.open('http://localhost:8080')
httpd.serve_forever()
I'd like to move up to a light web framework and am trying to convert the code to bottle (single-file dependency is valued, which is why I'm not jumping to flask). However, since bottle's 'run()' method blocks, I'm not sure how to invoke webbrowser.open() once the server is active, if it's even possible.
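
One possible workaround, sketched under the assumption that a short fixed delay is acceptable: schedule the browser launch on a timer thread just before calling the blocking run(). `open_browser_later` is a made-up helper name, and the bottle call is shown only as a comment:

```python
import threading
import webbrowser


def open_browser_later(url, delay=1.0, opener=webbrowser.open):
    """Call opener(url) on a background thread after delay seconds."""
    timer = threading.Timer(delay, opener, args=[url])
    timer.daemon = True  # don't keep the process alive just for the timer
    timer.start()
    return timer


# Intended usage with bottle (not executed in this sketch):
#   open_browser_later('http://localhost:8080')
#   bottle.run(host='localhost', port=8080)   # blocks from here on
```

The delay is a guess rather than a guarantee that the server is up; if the browser wins the race, a refresh will sort it out.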

onionradish
Jul 6, 2006

That's spicy.

Pollyanna posted:

Additionally, I can see that /AAPL,GOOG,IBM would be functionally identical to /GOOG,AAPL,IBM which sets off my "don't repeat yourself" alarm. I don't know if it's a false alarm, though.
The URLs aren't functionally identical. Changing the order in the URL changes the order of display. So your first example puts AAPL first; the second puts GOOG first. Allowing an easy method to specify that order in the URL is a GOODTHING.

onionradish
Jul 6, 2006

That's spicy.
You can do conditional regex matching with a look-ahead regex. It can make the regex pattern really gnarly, so you'll have to decide whether it's worth it for code readability.
Python code:
'(?=...)' # Positive lookahead assertion
# Matches if '...' matches next.

'(?!...)' # Negative lookahead assertion
# Matches if '...' does not match next.

'(?<=...)' # Positive lookbehind assertion
# Matches if the searched substring is preceded by a match for '...'.
# Note that the contained pattern must only match strings of some fixed length.

'(?<!...)' # Negative lookbehind assertion
# Matches if the searched substring is not preceded by a match for '...'.
# Note that the contained pattern must only match strings of some fixed length.
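
A small sketch showing all four assertions against one made-up sample string; the string and variable names are only for illustration:

```python
import re

text = "cost $40, weight 12 kg, fee $7"

# Positive lookahead: digits that are followed by " kg"
followed_by_kg = re.findall(r"\d+(?= kg)", text)

# Negative lookahead: whole numbers that are NOT followed by " kg"
not_followed_by_kg = re.findall(r"\b\d+\b(?! kg)", text)

# Positive lookbehind: digits preceded by a dollar sign
after_dollar = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: numbers NOT preceded by a dollar sign
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)
```

In each case the asserted text (" kg" or "$") is checked but not included in the match, which is the point of using an assertion rather than a group.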

onionradish
Jul 6, 2006

That's spicy.
I recently split a single script into separate files with grouped related functions so the project would be easier to manage.

After I split the script, there were some global constants defined in the parent script that the imported scripts could no longer see. I was able to move the constants to the appropriate files, or just add them as passed function parameters -- so Python may be helping enforce good coding practices -- but it got me curious to understand the scope of variables across imported files better.

As an example, if I wanted a "parent" script to set global constants or config that imported scripts might use, like the path to a master directory, is there an appropriate way to access those parent variables from an imported script? Is that what __init__.py is for, or is the idea bad practice in general, and an indication that they should be passed parameters?

onionradish
Jul 6, 2006

That's spicy.

QuarkJets posted:

For constant parameters used in more than one place I create a py file that just defines those values, and then whenever I need those constants they're just an import away.
Holy crap... that's such an obvious solution! Thanks!

onionradish
Jul 6, 2006

That's spicy.
I'm writing some unit tests for a script that parses an html page. For testing, I'm using a reference HTML file that's a full WGET of the target page. What's best practice for the assert against a function that returns all tags according to a selector, like lxml's "cssselect()" or bs4's "find_all()"?

Let's say that the reference page is supposed to return 14 <a> items as a list. Is it enough to just verify the "len()" of the results, just check a few of the actual values (maybe first and last), or verify that the full result list matches my list of expected results?

The answer might be "it depends," and that's ok. Mostly I'm wondering if either or both of the first two methods would be considered insufficient or bad practice.

onionradish
Jul 6, 2006

That's spicy.

Dren posted:

Why do you have it in a special function? Isn't returning all the tags matching a selector what find_all() does? Why would you stuff that behavior in a new function then try to test it?

Maybe I oversimplified the example or maybe I'm over-testing. In actual practice, it would only be a function because it's going to be called multiple times and has conditionals. Maybe a better example would be a function that returns the urls of leeched images found on a page. (I have a couple of clients that I've been unable to break of the habit when they make blog posts.) So a "leeched_images(url)" function might return 10 of 14 found <img> on site A, and 1 of 6 <img> on site B.

Super-hacky pseudo-code below. If the right thing to do is test the "img_is_on_host(imgsrc)" function and not test "leeched_images()" at all, that's fine. Just trying to understand where to draw the line on testing.

Python code:
# Very hacky example
import lxml.html

def img_is_on_host(imgsrc):
    # in practice, this would be more sophisticated, not hardcoded string, etc
    return imgsrc.startswith('http://clientsite_1.com')

def leeched_images(url):
    # in practice, would resolve relative urls in img src, etc.
    doc = lxml.html.parse(url).getroot()
    images = doc.cssselect('img')
    for image in images:
        imgsrc = image.get('src')
        if not img_is_on_host(imgsrc):
            yield imgsrc

for imageurl in leeched_images('http://clientsite_1.com/blog'):
    print imageurl
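
One possible place to draw the line, as a sketch rather than a rule: keep the filtering separate from the fetching so it can be tested against a plain list, and compare the full expected list when it is short enough to write out. `leeched` and the sample URLs are made up for illustration:

```python
import unittest


def leeched(image_srcs, host_prefix):
    """Return the srcs that are NOT hosted under host_prefix, in page order."""
    return [src for src in image_srcs if not src.startswith(host_prefix)]


class LeechedTest(unittest.TestCase):
    def test_returns_only_offsite_images(self):
        srcs = ['http://clientsite_1.com/a.png',
                'http://elsewhere.example/b.gif',
                'http://clientsite_1.com/c.jpg',
                'http://other.example/d.png']
        # Asserting the whole list checks count, values and order at once.
        self.assertEqual(leeched(srcs, 'http://clientsite_1.com'),
                         ['http://elsewhere.example/b.gif',
                          'http://other.example/d.png'])
```

A length-only check would still pass if the wrong images came back, so it tells you less than the full comparison does.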

onionradish fucked around with this message at 23:00 on Mar 11, 2014

onionradish
Jul 6, 2006

That's spicy.
Thanks for taking the time to write that out -- it was really helpful.

onionradish
Jul 6, 2006

That's spicy.
I'm frequently using a pattern to read from a text "config" file with values on a single line, and want to ignore blanks and comments (lines that start with #). Examples of the config files are RSS URLs to be scraped, folders to be indexed, etc.

Is there a more compact or better pattern than this generator?
Python code:
# Python 2.7, aiming for compatibility with Python 2 and 3
def get_stuff(filename):
    with open(filename, 'rb') as f:
        for line in f:
            line = line.decode('utf8').strip()
            if line and not line.startswith('#'):
                yield line

for item in get_stuff("foo.txt"):
    print(item)
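
A slightly more compact variant, as a sketch: io.open accepts an encoding on both Python 2 and 3, so the manual decode step can go. The demo file and its contents are made up so the example is self-contained:

```python
import io
import os
import tempfile


def get_stuff(filename):
    """Yield stripped, non-blank, non-comment lines from a UTF-8 text file."""
    with io.open(filename, encoding='utf8') as f:
        for line in (raw.strip() for raw in f):
            if line and not line.startswith('#'):
                yield line


# Hypothetical config file written to a temp directory for the demo.
demo_path = os.path.join(tempfile.mkdtemp(), 'foo.txt')
with io.open(demo_path, 'w', encoding='utf8') as f:
    f.write(u'# feeds\n\nhttp://example.com/rss\n  /some/folder  \n')

items = list(get_stuff(demo_path))
```

Whether this counts as "better" is a matter of taste; the original generator is already fairly idiomatic.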

onionradish
Jul 6, 2006

That's spicy.
What's the right way to gracefully handle missing imports -- and thus unavailable capabilities -- that are not critical to a script?

For example, I have a script that I use on my home and work systems. At home, the script uses gntp/Growl to display a nice pop-up graphic "toaster" notification. On my work system, I don't have Growl installed and just displaying the notification text in the console is good enough.

Is what I've done below 'right' or is there a more proper way?

Python code:
# At top of script where the imports live...

try:
    import gntp.notifier
except ImportError:
    gntp = None

# Later, where notification occurs...
# (arguments and text omitted for simplicity)

if gntp:
    gntp.notifier.mini(...)
else:
    print(...)
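
One alternative shape, sketched as an assumption rather than the recommended answer: decide once at import time which notify() to define, so the rest of the script never checks for gntp. The `notify` name is made up, and only the message text is passed to gntp.notifier.mini here:

```python
# Pick the notification backend once, where the imports live.
try:
    import gntp.notifier

    def notify(message):
        """Show a Growl pop-up when gntp is installed."""
        gntp.notifier.mini(message)
except ImportError:
    def notify(message):
        """Fall back to plain console output."""
        print(message)
```

Call sites then just use notify("Download finished") with no if/else around them.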

onionradish
Jul 6, 2006

That's spicy.
The second approach will be a lot cleaner for this script. Thanks!

onionradish
Jul 6, 2006

That's spicy.

the posted:

I'm using Beatbox to query a database in Salesforce. I'm grabbing a list of Account zip code fields. They get returned as type 'instance' and I convert them to a string before doing anything with them.

the, when you initialize beatbox, are you using beatbox.Client() or beatbox.PythonClient()?

I remember looking at the code for beatbox when you were posting about having to wrap everything in str() and shuddering -- as SurgicalOntologist has suggested, the API is godawful.

However, it looks like it's because the default Client API is just a wrapper around some horrifying XML thing that is just dumping out results in a list that you then have to wrap in str(), which is part of what's causing your encoding issues.

beatbox includes PythonClient which claims to turn "the returned objects into proper Python data types. e.g. integer fields return integers" and appears to return a list of dictionaries (their example):

Python code:
>>> svc = beatbox.PythonClient()
>>> svc.login('username', 'passwordTOKEN')
>>> res = svc.query("SELECT Id, FirstName, LastName FROM Contact WHERE LastName='Doe'")
>>> res[0]
{'LastName': 'Doe', 'type': 'Contact', 'Id': '0037000000eRf6vAAC', 'FirstName': 'John'}

onionradish
Jul 6, 2006

That's spicy.

the posted:

Huh, I've been using Client. Interesting.

Switch and your code should be MUCH easier to work with. You'll have to change some of your code where you were doing list indexing to access a record's field, but result['FirstName'] will be a lot easier to read and work with than remembering that result[3] is supposed to be FirstName.

It looks like PythonClient also supports accessing the dictionary in dot-notation, meaning result.FirstName would also work.

onionradish
Jul 6, 2006

That's spicy.

the posted:

FYI I just tried this and it's popping an error. Ideas?

code:
sf = beatbox._tPartnerNS
svc = beatbox.PythonClient()
svc.login(sf_username,sf_pass)

sales_list = []

squery = "SELECT Price FROM Opportunity WHERE Opportunity.RecordType = \'FD - Buying\'"

query = svc.query(squery)

for i in query[sf.records:]:
	print i
	raw_input()
Pops AttributeError: 'module' object has no attribute 'PythonClient'
I don't have access to a SalesForce account anymore, so can't directly test access or queries.

Are you getting that error on the svc = beatbox.PythonClient() line or somewhere else? If you do import beatbox; dir(beatbox) does PythonClient show up in the list? If not, see if your version is the same as the PyPI version.

I'm assuming you retyped the code; the first colon looks like a typo here: for i in query[sf.records:]:

onionradish
Jul 6, 2006

That's spicy.

the posted:

How would I handle reading times that are like:

23:32.6
1:02:55.7

Where it's usually in MM:SS but occasionally HH:MM:SS

I tried using datetime, but that has issue with the tenths of a second. And I'm also wondering how to handle checking whether or not there's an hour present. Should I search to see if there are two colons present in the string?

I need to plot the data, so I have just been trying to read all of the times and then convert it all to seconds. Unless there's a way to plot datetime data?
If you're on Python 2.6+, datetime has added a '%f' format for microseconds, which will take care of the tenths of seconds. Checking for the number of colons using str.count() and using either '%M:%S.%f' or '%H:%M:%S.%f' seems easy enough.

%f treats the microsecond value as a fraction, so '.6' gets correctly parsed into '600000' microseconds.
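
A minimal sketch of that approach, converting each time to seconds for plotting; `to_seconds` is a made-up helper name:

```python
from datetime import datetime, timedelta


def to_seconds(text):
    """Convert 'MM:SS.f' or 'HH:MM:SS.f' to a float number of seconds."""
    # Two colons means an hour field is present.
    fmt = '%H:%M:%S.%f' if text.count(':') == 2 else '%M:%S.%f'
    t = datetime.strptime(text, fmt)
    delta = timedelta(hours=t.hour, minutes=t.minute,
                      seconds=t.second, microseconds=t.microsecond)
    return delta.total_seconds()


short = to_seconds('23:32.6')    # 23 min 32.6 s
long_ = to_seconds('1:02:55.7')  # 1 h 2 min 55.7 s
```

This relies on every value having a fractional part; a time like '23:32' with no tenths would need a second pair of formats without '.%f'.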
\/ \/ \/

onionradish fucked around with this message at 02:53 on Jul 17, 2014

onionradish
Jul 6, 2006

That's spicy.

thegasman2000 posted:

Sorry I am sure your all fed up of answering this but whats the text editor of choice for a python newbie? Looking for something that will prompt and highlight errors such as indentation before compiling if possible. Trying out textmate at the moment but its not highlighting an error I am getting.
PyCharm is more than a text editor, but I personally found it very helpful when I started learning.

It catches syntax errors before the script is run, warns when code isn't following PEP8 "best practices" in formatting so I learned good habits early, can pop-up documentation on functions and arguments, and includes breakpoint debugging and other tools to inspect variables, step through loops, etc.

I've grown into using some of its advanced features like integration with version control and test automation, but you can ignore all of those kinds of features when starting and just use it to edit scripts.

The Community Edition is free.

onionradish
Jul 6, 2006

That's spicy.

thegasman2000 posted:

Can i access that from pycharm?
Under PyCharm's 'Run' menu, choose 'Edit Configurations...' and enter the arguments in the 'Script Parameters' field. For Crosscontaminant's example, you'd enter 'one two three' (without the quotes). The next time you run the script it will use those parameters.
