Python information and short questions megathread.


Dominoes: Sep 20, 2007

Lumpy posted:

You can write your own ModelManager to handle hooking up the ORM to a remote DB. We are in the process of writing a Django app that uses a custom ModelManager coupled with SQLAlchemy to use a (*shudder*) Azure Cloud MSSQL store for our data.

It looks like Django's model system is working well, and is a straightforward solution.

# ? Jan 13, 2014 00:33

KICK BAMA KICK: Mar 2, 2009

Thanks for the recommendation of Think Python some time ago. Working through one of the exercises (using some syntax not yet covered in the text) I came up with this as part of my solution:

code:

def sorted_anagrams(anagram_dict, min_chars = 0, min_anagrams = 0):
    """Returns a tuple containing lists of anagrams, in descending order of list length.

    anagram_dict: A dictionary of the form generated by make_anagram_dict. Keys are strings of characters,
    alphabetically sorted; values are lists of anagrams made from those characters.

    min_chars, min_anagrams: If provided, excludes anagrams with fewer than the minimum number
    of characters or groups with fewer than the minimum number of anagrams."""

    return tuple(anagrams for characters, anagrams in sorted(anagram_dict.items(), key=lambda t: len(t[1]), reverse=True)
                 if len(characters) >= min_chars and len(anagrams) >= min_anagrams)

My question: is a one-liner comprehension like that "Pythonic" or is that trying too hard to be Pythonic? To me that was more natural than writing it out in the longer fashion but I could imagine someone else reading that and saying "No, dude, just break that down."

# ? Jan 13, 2014 03:08

BeefofAges: Jun 5, 2004; Cry 'Havoc!', and let slip the cows of war.

To me that looks practically unreadable. The good thing is that the docstring explains what's going on. When I find an undocumented comprehension that looks like that buried in someone's code I want to strangle them.

# ? Jan 13, 2014 04:33

QuarkJets: Sep 8, 2008

KICK BAMA KICK posted:

Thanks for the recommendation of Think Python some time ago. Working through one of the exercises (using some syntax not yet covered in the text) I came up with this as part of my solution:
code:
CODE
My question: is a one-liner comprehension like that "Pythonic" or is that trying too hard to be Pythonic? To me that was more natural than writing it out in the longer fashion but I could imagine someone else reading that and saying "No, dude, just break that down."

One-liners are fine if they're easy to comprehend, but this one is pretty complex. One-liners are not good if they look easily breakable, which this one does (even if it isn't).

Even if you wanted to keep this as a one-liner, I'd suggest breaking the logic into several more lines (it's already 2 lines, technically)

# ? Jan 13, 2014 04:44

Plorkyeran: Mar 22, 2007; To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

I'd pull the sorted(...) out of the comprehension, but otherwise think it's fine.

The problem isn't asking for the final result to be a tuple, btw (the example results are lists and returning a tuple makes zero sense there).

# ? Jan 13, 2014 05:31

SirPablo: May 1, 2004; Pillbug

unixbeard posted:

I need to read a bunch of excel files in python, just reading no writing/creation. It seems like there are a few packages, xlrd and openpyxl, before I dive in does anyone have opinions or advice for/against either of them?

I needed to do some quick work and xlrd was slick, easy to learn.

# ? Jan 13, 2014 10:26

Dren: Jan 5, 2001; Pillbug

KICK BAMA KICK posted:

Thanks for the recommendation of Think Python some time ago. Working through one of the exercises (using some syntax not yet covered in the text) I came up with this as part of my solution:
code:
def sorted_anagrams(anagram_dict, min_chars = 0, min_anagrams = 0):
    """Returns a tuple containing lists of anagrams, in descending order of list length.

    anagram_dict: A dictionary of the form generated by make_anagram_dict. Keys are strings of characters,
    alphabetically sorted; values are lists of anagrams made from those characters.

    min_chars, min_anagrams: If provided, excludes anagrams with fewer than the minimum number
    of characters or groups with fewer than the minimum number of anagrams."""

    return tuple(anagrams for characters, anagrams in sorted(anagram_dict.items(), key=lambda t: len(t[1]), reverse=True)
                 if len(characters) >= min_chars and len(anagrams) >= min_anagrams)
My question: is a one-liner comprehension like that "Pythonic" or is that trying too hard to be Pythonic? To me that was more natural than writing it out in the longer fashion but I could imagine someone else reading that and saying "No, dude, just break that down."

It's not that bad but I'd pull the sorted(...) step out of the comprehension and put it on its own line.
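
For anyone skimming, here is a rough sketch of what pulling the sorted(...) step out might look like. The sample dictionary is made up for illustration and is not from the Think Python exercise, and the name by_group_size is just a placeholder:

```python
def sorted_anagrams(anagram_dict, min_chars=0, min_anagrams=0):
    """Return a tuple of anagram lists, longest list first."""
    # Step 1: order the (characters, anagrams) pairs by group size, largest first.
    by_group_size = sorted(
        anagram_dict.items(),
        key=lambda item: len(item[1]),
        reverse=True,
    )
    # Step 2: keep only the groups that pass both minimums.
    return tuple(
        anagrams
        for characters, anagrams in by_group_size
        if len(characters) >= min_chars and len(anagrams) >= min_anagrams
    )


# Hypothetical sample input, invented for this sketch.
sample = {
    "opst": ["post", "pots", "spot", "stop"],
    "act": ["act", "cat"],
    "dgo": ["dog", "god"],
    "eggs": ["eggs"],
}
result = sorted_anagrams(sample, min_chars=4, min_anagrams=1)
print(result)
```

Same behavior as the original, but each step has a name and can be inspected on its own in a debugger.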

# ? Jan 13, 2014 15:12

Computer viking: May 30, 2011; Now with less breakage.

Dren posted:

It's not that bad but I'd pull the sorted(...) step out of the comprehension and put it on its own line.

Even just reformatting would make it easier to read, e.g.

code:

return tuple(
  anagrams 
  for characters, anagrams 
  in sorted(anagram_dict.items(), key=lambda t: len(t[1]), reverse=True)
  if len(characters) >= min_chars and len(anagrams) >= min_anagrams
)

# ? Jan 13, 2014 18:13

Pollyanna: Mar 5, 2005; Milk's on them.

I have an issue with Heroku. However, I can't replicate the error on my end. I suspect that something is erroring out on Heroku's end, but I can't get a stacktrace from them. All that happens is a 500 error.

I heard about the logging module, and was hoping that it could help me by just printing out a trace to the console, as if you set "-v" flag or something. How do I do this?

# ? Jan 13, 2014 23:39

Modern Pragmatist: Aug 20, 2008

Pollyanna posted:

I have an issue with Heroku. However, I can't replicate the error on my end. I suspect that something is erroring out on Heroku's end, but I can't get a stacktrace from them. All that happens is a 500 error.

I heard about the logging module, and was hoping that it could help me by just printing out a trace to the console, as if you set "-v" flag or something. How do I do this?

My guess would be directory permissions since it seems that you're dynamically writing HTML files (?). Design decisions aside, you should be able to use the following to print your own stack trace after the IOError is encountered:

Python code:

import traceback

try:
    pass  # replace with the code that raises the IOError
except IOError:
    traceback.print_exc()  # print the full stack trace to stderr
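
Since the question was about the logging module specifically, here is a minimal sketch of the same idea using logging.exception, which logs at ERROR level and appends the current traceback. The logger name "myapp", the helper write_page, and the path are placeholders invented for the example; whether the host shows stderr in its logs is an assumption worth checking:

```python
import logging
import sys

# Send log records to stderr so anything capturing process output can see them.
logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
log = logging.getLogger("myapp")  # "myapp" is a placeholder name


def write_page(path, html):
    """Hypothetical helper standing in for the code that fails on the server."""
    with open(path, "w") as handle:
        handle.write(html)


def safe_write_page(path, html):
    """Try to write the page; log the full traceback if it fails."""
    try:
        write_page(path, html)
        return True
    except IOError:
        # logging.exception must be called from inside an except block.
        log.exception("could not write %s", path)
        return False


# The directory does not exist, so this call fails and the traceback is logged.
ok = safe_write_page("/nonexistent-dir-for-demo/page.html", "<p>hi</p>")
```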

# ? Jan 14, 2014 01:36

fletcher: Jun 27, 2003; ken park is my favorite movie; Cybernetic Crumb

Little test program:

code:

#!/usr/bin/env python


class CoolObject(object):
    def __init__(self, a=None, b=None):
        self.a = a
        self.b = b

    def __eq__(self, other):
        print('__eq__ called')
        if isinstance(other, self.__class__) and self.a == other.a and self.b == other.b:
            return True
        return False

the_list = []

original = CoolObject(a=1, b=2)
the_list.append(original)

duplicate = CoolObject(a=1, b=2)

print('original in the_list? %s' % (original in the_list))

print('duplicate in the_list? %s' % (duplicate in the_list))

print('duplicate NOT in the_list? %s' % (duplicate not in the_list))

print('original == duplicate? %s' % (original == duplicate))

Results:

code:

original in the_list? True
__eq__ called
duplicate in the_list? True
__eq__ called
duplicate NOT in the_list? False
__eq__ called
original == duplicate? True

Is __eq__ only called 3 times because it's just checking the reference in my first print()?

I have a function that generates a bunch of CoolObjects and returns a list, but as I'm generating them I don't want to add them to the list if a duplicate is already in there. Is overriding __eq__ the correct way to handle 'x in somelist' and 'x not in somelist' behavior?

# ? Jan 14, 2014 02:17

emoji: Jun 4, 2004

C code:

static int
list_contains(PyListObject *a, PyObject *el)
{
    Py_ssize_t i;
    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
        cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
                                           Py_EQ);
    return cmp;
}

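Note (from the C API docs for PyObject_RichCompareBool): if o1 and o2 are the same object, PyObject_RichCompareBool() will always return 1 for Py_EQ and 0 for Py_NE. That's why your first print never reaches __eq__: original is the very object stored in the list, so the identity check short-circuits.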

But you might want to define __hash__ and use sets if your objects are immutable.
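
A minimal sketch of the __hash__ plus set approach, assuming a and b are themselves hashable and are never changed after the object is created (mutating them after the object is in a set would break lookups). The helper name unique is made up for the example:

```python
class CoolObject(object):
    def __init__(self, a=None, b=None):
        self.a = a
        self.b = b

    def __eq__(self, other):
        if not isinstance(other, CoolObject):
            return NotImplemented
        return (self.a, self.b) == (other.a, other.b)

    def __hash__(self):
        # Objects that compare equal must hash equal, so hash the same fields.
        return hash((self.a, self.b))


def unique(objects):
    """Return the objects in first-seen order with duplicates dropped."""
    seen = set()
    kept = []
    for obj in objects:
        if obj not in seen:  # average O(1) lookup instead of scanning a list
            seen.add(obj)
            kept.append(obj)
    return kept


items = [CoolObject(1, 2), CoolObject(1, 2), CoolObject(3, 4)]
deduped = unique(items)
print(len(deduped))
```

With a plain list, every "x in somelist" check compares against each element in turn, which gets slow as the list grows; the set avoids that.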

# ? Jan 14, 2014 04:36

SurgicalOntologist: Jun 17, 2004

Regarding that question I asked last week when everyone was like, "You're looking for asynchronous I/O"- I've been reading a bunch on that, and I think I'm learning something. However, pretty much everything I can find--especially beginner guides--frames the issue as one of dealing with slow I/O processes. I have the opposite problem: an extremely fast input process. The use case is that either (a) something should run every time new data comes in, or (b) something needs to reference only the most recent piece of data (b will be far more common than a). I will never have a situation where I'm waiting for data to come in.

I don't doubt, of course, that asyncio can handle this (whether the proposed 3.4 library or the concept in general), I'm just having trouble thinking about it since everything I've read seems to be "how to elegantly get your program to block until your input comes in". Instead, I want to repeatedly ask for input (which requires calling a specific function over and over), but do so in the background without turning every program using this API into one big loop. There's plenty of jargon I haven't figured out yet, but I don't know where to look--I don't feel like I've found the trail. Any suggestions for further reading?

# ? Jan 14, 2014 05:03

FoiledAgain: May 6, 2007

What is the reason for string.find() to return -1 on failure? That leads to this unexpected output:

code:

>>> x = 'vikings'
>>> if x.find('eggs'): print('found it!')
...
found it!
>>>

# ? Jan 14, 2014 08:13

fletcher: Jun 27, 2003; ken park is my favorite movie; Cybernetic Crumb

In case it's at position 0:

code:

>>> x = 'eggs over easy'
>>> if x.find('eggs'): print('found it!')
...
>>>

# ? Jan 14, 2014 08:29

Opinion Haver: Apr 9, 2007

fletcher posted:

In case it's at position 0:

code:

>>> x = 'eggs over easy'
>>> if x.find('eggs'): print('found it!')
...
>>>

So why doesn't it return None if it's not found?

# ? Jan 14, 2014 08:41

suffix: Jul 27, 2013; Wheeee!

Opinion Haver posted:

So why doesn't it return None if it's not found?

The code would still be wrong, since both 0 and None would be false. Better to find the bug early.

.find() isn't usually what you want. If you want to check if it's there, use 'in'. If you need to know the position, use .index(), which raises an exception if it's not there.

code:

if 'eggs' in x: print('found it!')
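
To put the three options side by side, here is a small sketch. The helper name locate is invented for the example; it just wraps .index() for callers who would prefer None over an exception:

```python
text = "eggs over easy"

# Membership test: the clearest way to ask "is it there?"
found_with_in = "eggs" in text

# find() returns an index, so a match at position 0 is falsy
# and a miss (-1) is truthy. Testing it directly in an if is a bug.
position = text.find("eggs")
missing = text.find("spam")


def locate(haystack, needle):
    """Return the index of needle, or None if it is absent."""
    try:
        return haystack.index(needle)  # raises ValueError on a miss
    except ValueError:
        return None


print(found_with_in, position, missing, locate(text, "over"), locate(text, "spam"))
```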

# ? Jan 14, 2014 10:04

QuarkJets: Sep 8, 2008

FoiledAgain posted:

What is the reason for string.find() to return -1 on failure? That leads to this unexpected output:
code:
>>> x = 'vikings'
>>> if x.find('eggs'): print('found it!')
...
found it!
>>>

You shouldn't be using string.find() for substring checks anyway; use in. If you did want to use string.find(), you could check for a non-negative index instead:

code:

if x.find('eggs') >= 0: print('found it!')

find always returns an integer, which is nice.

# ? Jan 14, 2014 10:59

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

SurgicalOntologist posted:

Regarding that question I asked last week when everyone was like, "You're looking for asynchronous I/O"- I've been reading a bunch on that, and I think I'm learning something. However, pretty much everything I can find--especially beginner guides--frames the issue as one of dealing with slow I/O processes. I have the opposite problem: an extremely fast input process. The use case is that either (a) something should run every time new data comes in, or (b) something needs to reference only the most recent piece of data (b will be far more common than a). I will never have a situation where I'm waiting for data to come in.

I don't doubt, of course, that asyncio can handle this (whether the proposed 3.4 library or the concept in general), I'm just having trouble thinking about it since everything I've read seems to be "how to elegantly get your program to block until your input comes in". Instead, I want to repeatedly ask for input (which requires calling a specific function over and over), but do so in the background without turning every program using this API into one big loop. There's plenty of jargon I haven't figured out yet, but I don't know where to look--I don't feel like I've found the trail. Any suggestions for further reading?

Can you give more details about the exact problem you're trying to solve? What is the network input used for? Is this a command line program or a GUI? What GUI framework?

# ? Jan 14, 2014 13:39

SurgicalOntologist: Jun 17, 2004

Suspicious Dish posted:

Can you give more details about the exact problem you're trying to solve? What is the network input used for? Is this a command line program or a GUI? What GUI framework?

I'm interfacing with a motion capture device, in an application using pyglet (actually I'm writing more of a framework that interfaces this and other equipment together using pyglet, it's the same problem more or less but I'd like the resulting API to be clean if possible). The device supplies position, angle, velocity, etc that will be used to update objects on the screen. Most commonly, the 3D position will be projected onto the plane of the screen. Basically I'm using a motion capture device to make a big touchscreen.

The device has an API with this example code:

code:

import vrpn

def callback(userdata, data):
    print(userdata, " => ", data);

tracker=vrpn.receiver.Tracker("Tracker0@localhost")
tracker.register_change_handler("position", callback, "position")

while 1:
    tracker.mainloop()

I tried having the callback dispatch a pyglet event, and using the pyglet schedule interval function to call tracker.mainloop, but the program blocks as it repeatedly calls the callback. I think it's too fast, and there's always another event waiting so nothing gets to happen. What I'd like for it to do is keep the most recent data in an attribute as well as store a history, so there would be no need for

This is strange, just tested this now: once I register the event with pyglet, even calling tracker.mainloop once causes the program to block. Not sure why that would happen.

Edit: I figured it out. The device fills the buffer if mainloop isn't repeatedly called, and so when I'm testing in the command line as opposed to a script, there could easily be a lot of data waiting. But it wouldn't have actually blocked forever. So I guess I just need to call mainloop faster than the data is coming in, and my program won't block. (Lightbulb) All this callback stuff with pyglet is pretty much asynchronous I/O already, isn't it?
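
For the "keep the most recent data in an attribute as well as store a history" part, a rough sketch of one way to do it. The class name LatestSample, the history size, and the shape of the data dictionaries are all assumptions made for illustration; only the (userdata, data) callback signature comes from the example code above. The callback is fed by hand here so it runs without the device:

```python
from collections import deque


class LatestSample(object):
    """Keeps the newest tracker sample plus a bounded history."""

    def __init__(self, history_size=1000):
        self.latest = None
        # A deque with maxlen drops the oldest entry automatically.
        self.history = deque(maxlen=history_size)

    def callback(self, userdata, data):
        # Same (userdata, data) signature as the callback in the device example.
        self.latest = data
        self.history.append(data)


position = LatestSample(history_size=3)

# With the real device this bound method would be passed where the plain
# function "callback" is passed in the example code. Simulated samples here:
for x in range(5):
    position.callback("position", {"position": (float(x), 0.0, 0.0)})

print(position.latest, len(position.history))
```

Code that only cares about the current position reads position.latest whenever it draws a frame, and never has to care how fast samples arrive.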

SurgicalOntologist fucked around with this message at 17:14 on Jan 14, 2014

# ? Jan 14, 2014 16:34

Opinion Haver: Apr 9, 2007

QuarkJets posted:

find always returns an integer, which is nice.

Yeah, but you could still do this:

code:

if x.nonefind('eggs') is not None: print('found it!')

# ? Jan 14, 2014 20:02

Dickbutt Ouroboros: Nov 13, 2002; handbandit?
Son of a bitch!

I have a short, stupid question I've come across. The exercise solution for this suggests using multiple if statements, but a while loop seemed like it would also sort of work. For some reason this piece of code will only run 3 times. As soon as heads or tails gets to two it stops. Is it possible to use the and inside of the while statement, or am I missing something?

I see why this method is bad in practice, as the for loop will execute the full 10,000 times even after the while conditions are met. I just want to know why the while statement isn't working.

code:

from random import randint
heads = 0
tails = 0
trials = 10000
numFlips = 0
for counter in range(0,trials):
    while (heads <= 1) and (tails <= 1):
        coinflip = randint(0,1)
        if coinflip == 1:
            heads = heads + 1
            numFlips = numFlips + 1
        else:
            tails = tails + 1
            numFlips = numFlips + 1
    
print heads
print tails
print "It takes an average of {} flips to see both heads and tails.".format(numFlips)

# ? Jan 14, 2014 22:32

ManoliIsFat: Oct 4, 2002

It's because you want an OR. If heads is 2 and tails is 0, your while condition will stop being true (2<=1 AND 0<=1 evaluates to FALSE), and thus will break out of the while loop.

# ? Jan 14, 2014 22:39

Computer viking: May 30, 2011; Now with less breakage.

Could you also do something like this?
while any((heads, tails) <= 1):

I can't offhand remember if comparing to a list will do what I hope, or even if "any" is actually a Python function. (I don't have a PC at hand to test right now.)
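
For the record, any is a real built-in, but comparing a whole tuple to a number does not compare element by element. A sketch of a form that does work, using a generator expression (the function name and the threshold parameter are made up for the example):

```python
def needs_more_flips(heads, tails, threshold=1):
    """True while at least one counter is still at or below the threshold."""
    # Equivalent to: heads <= threshold or tails <= threshold
    return any(count <= threshold for count in (heads, tails))


print(needs_more_flips(2, 0))  # tails is still at or below 1
print(needs_more_flips(2, 2))  # both counters have passed 1
```

Swapping any for all gives the "and" version of the same test.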

Computer viking fucked around with this message at 23:04 on Jan 14, 2014

# ? Jan 14, 2014 23:02

Dickbutt Ouroboros: Nov 13, 2002; handbandit?
Son of a bitch!

Okay, I think I see what you're saying. While can take a boolean value as a trigger. I was looking to drop out of the loop when both statements evaluated to FALSE using the and, but it is seeing it as a single statement.

# ? Jan 14, 2014 23:03

ManoliIsFat: Oct 4, 2002

handbandit posted:

Okay, I think I see what you're saying. While can take a boolean value as a trigger. I was looking to drop out of the loop when both statements evaluated to FALSE using the and, but it is seeing it as a single statement.

Ya, that's how while loops work. The condition is checked before every pass through the loop. In English, you're used to casually saying "while heads and tails are less than 1, keep doing this loop", but that's not what your boolean is doing. Your bool is saying "if heads <= 1 AND tails <= 1, this returns true. All other combinations return false", so the loop exits as soon as either counter reaches 2.

pre:

        
         T <= 1     T > 1
H <= 1     T          F
H > 1      F          F

you want your code to look like this:

code:

from random import randint
heads = 0
tails = 0
trials = 10000
numFlips = 0
for counter in range(0,trials):
    while (heads <= 1) or (tails <= 1):
        coinflip = randint(0,1)
        if coinflip == 1:
            heads = heads + 1
            numFlips = numFlips + 1
        else:
            tails = tails + 1
            numFlips = numFlips + 1
    
print heads
print tails
print "It takes an average of {} flips to see both heads and tails.".format(numFlips)
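
Two more things about that loop, beyond the and/or question: heads and tails are never reset between trials, so only the first trial actually flips any coins, and the last line prints a running total, not an average. A rough sketch of one way to restructure it, assuming the goal is "flip until each side has shown up at least once" (the function name is made up):

```python
from random import randint


def flips_until_both_sides():
    """Flip a fair coin until heads and tails have each appeared once."""
    heads = 0
    tails = 0
    flips = 0
    # Keep going while either side is still unseen.
    while heads == 0 or tails == 0:
        if randint(0, 1) == 1:
            heads += 1
        else:
            tails += 1
        flips += 1
    return flips


trials = 10000
# Each call starts with fresh counters, so every trial is independent.
total_flips = sum(flips_until_both_sides() for _ in range(trials))
average = total_flips / float(trials)
print("It takes an average of {:.2f} flips to see both heads and tails.".format(average))
```

The average should hover around 3: one flip to see the first side, then two more on average to see the other.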

# ? Jan 14, 2014 23:13

Dominoes: Sep 20, 2007

Hey dudes, I'm wondering if y'all know a way to serve a generated text-based file directly from a web server for download, without saving it as a file first.

I have this code, where HttpResponse is a Django object, and xml is an xml ElementTree.

Python code:

    xml = lowfly_code.drx_from_db(Notam) # Also saves 'test.xml'
    response = HttpResponse(FileWrapper(open('test.xml')), content_type='application/xml')
    response['Content-Disposition'] = 'attachment; filename=test.xml'
    return response

I've unsuccessfully experimented with using ET's toString function.

# ? Jan 15, 2014 17:44

OnceIWasAnOstrich: Jul 22, 2006

Dominoes posted:

I've unsuccessfully experimented with using ET's toString function.

What has been unsuccessful? Are you unable to get a string from ElementTree, or unable to create a response with it? It is possible you are using ET.tostring() on the ElementTree instead of an Element.

Python code:

response_text = ElementTree.tostring(xml.getroot())
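
A self-contained sketch of the difference, using only the standard library. The element names here are invented for the example and have nothing to do with the real DRX format; encoding="unicode" is a Python 3 feature that makes tostring return a str instead of bytes:

```python
import xml.etree.ElementTree as ET

# Build a tiny tree by hand (hypothetical element names).
root = ET.Element("notams")
child = ET.SubElement(root, "notam", id="1")
child.text = "example"
tree = ET.ElementTree(root)

# tostring() wants an Element, so hand it the tree's root, not the tree itself.
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
print(xml_text)
```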

# ? Jan 15, 2014 18:15

Dominoes: Sep 20, 2007

OnceIWasAnOstrich posted:

What has been unsuccessful? Are you unable to get a string from ElementTree, or unable to create a response with it? It is possible you are using ET.tostring() on the ElementTree instead of an Element.
Python code:
response_text = ElementTree.tostring(xml.getroot())

Hey, I think you found the problem. I can't test it now since my code's temporarily broken, but that's consistent with the ElementTree docs; I was indeed using it on a Tree.

edit: You nailed it brother.

Python code:

xml, filename = lowfly_code.drx_from_db(Notam)
texml = ET.tostring(xml.getroot(), encoding='unicode')
response = HttpResponse(FileWrapper(io.StringIO(texml)), content_type='application/xml')
response['Content-Disposition'] = 'attachment; filename="{0}"'.format(filename)
return response

Dominoes fucked around with this message at 19:30 on Jan 15, 2014

# ? Jan 15, 2014 18:44

OnceIWasAnOstrich: Jul 22, 2006

I can't say I am that familiar with Django so FileWrapper may do something that I am not aware of (maybe it streams the file? but you aren't using a streaming HttpResponse), but it seems like wrapping a wrapper of a string is excessive. HttpResponse can be easily created with just a plain string since you are already keeping the whole thing in memory at some point.

# ? Jan 15, 2014 19:41

Pollyanna: Mar 5, 2005; Milk's on them.

Has anyone here done Shift-JIS decoding/encoding? I have a .bin file encoded with Shift-JIS (Japanese text), and I want to read each byte and translate them to UTF-8 or UTF-16. What I was thinking of was looping through it with file.read(1), then decoding/encoding the selected bytes (if that makes any sense). Would this work? Cause so far I just get strings like '\x00\xac' and I don't know how to change that to readable text. Has this been done before? Should I be reading one or two bytes at a time? What final encoding should I use? How I do :saddowns:

# ? Jan 15, 2014 19:47

Dominoes: Sep 20, 2007

OnceIWasAnOstrich posted:

I can't say I am that familiar with Django so FileWrapper may do something that I am not aware of (maybe it streams the file? but you aren't using a streaming HttpResponse), but it seems like wrapping a wrapper of a string is excessive. HttpResponse can be easily created with just a plain string since you are already keeping the whole thing in memory at some point.

Right again; it still works after removing the FileWrapper.

# ? Jan 15, 2014 19:53

Luigi Thirty: Apr 30, 2006; Emergency confection port.

Is it possible to get the current Windows sound volume via the Windows API libraries? I'm good at Python but bad at Windows and I'm getting conflicting information on doing it from the Googletron.

# ? Jan 15, 2014 20:46

BeefofAges: Jun 5, 2004; Cry 'Havoc!', and let slip the cows of war.

Luigi Thirty posted:

Is it possible to get the current Windows sound volume via the Windows API libraries? I'm good at Python but bad at Windows and I'm getting conflicting information on doing it from the Googletron.

https://stackoverflow.com/questions/18112457/python-change-windows-7-master-volume ?

pywin32 or https://pypi.python.org/pypi/WMI/ are probably your best bets if you can find the Windows APIs for audio volume. Apparently the Windows audio volume APIs are cryptic and poorly documented. Good luck!

# ? Jan 15, 2014 21:04

RyceCube: Dec 22, 2003

I have a bunch of data that looks like this:

code:

{"Game":"Chess","title":"just for fun!","size":"2","entriesData":
["PLAYERNAME","IMAGEHERE"],"entryFee":1,"prizeSummary","gameId":"9436","tableSpecId":"1079","dateUpdated":13898
10809648,"dateCreated":1389659697294,"stack":235,"entryHTML":null}

(added linebreaks to prevent table from breaking)

with basically a bunch of entries one after another all on one line from the website.

I want to parse this data to get playername, game type, etc.

I know I should use the JSON library to accomplish this.

The page I get the code from has a bunch of HTML on it as well. Is it okay to use the json.load on the html, or should I strip that from it first?

I'm not really entirely sure where to begin solving this problem, and am a bit confused by the JSON documentation.

Any tips or hints would be greatly appreciated.

# ? Jan 15, 2014 21:06

Luigi Thirty: Apr 30, 2006; Emergency confection port.

BeefofAges posted:

https://stackoverflow.com/questions/18112457/python-change-windows-7-master-volume ?

pywin32 or https://pypi.python.org/pypi/WMI/ are probably your best bets if you can find the Windows APIs for audio volume. Apparently the Windows audio volume APIs are cryptic and poorly documented. Good luck!

gently caress that, nevermind. I don't need to know the volume that badly.

Phiberoptik posted:

I have a bunch of data that looks like this:
code:
{"Game":"Chess","title":"just for fun!","size":"2","entriesData":
["PLAYERNAME","IMAGEHERE"],"entryFee":1,"prizeSummary","gameId":"9436","tableSpecId":"1079","dateUpdated":13898
10809648,"dateCreated":1389659697294,"stack":235,"entryHTML":null}
(added linebreaks to prevent table from breaking)

with basically a bunch of entries one after another all on one line from the website.

I want to parse this data to get playername, game type, etc.

I know I should use the JSON library to accomplish this.

The page I get the code from has a bunch of HTML on it as well. Is it okay to use the json.load on the html, or should I strip that from it first?

I'm not really entirely sure where to begin solving this problem, and am a bit confused by the JSON documentation.

Any tips or hints would be greatly appreciated.

You need to get it down to just the JSON data if you want to load it into a JSON library. The HTML will make it barf. You could try clever applications of .split() on the raw page to try to get just the JSON separated out.

Luigi Thirty fucked around with this message at 21:13 on Jan 15, 2014

# ? Jan 15, 2014 21:10

BeefofAges: Jun 5, 2004; Cry 'Havoc!', and let slip the cows of war.

Phiberoptik posted:

I have a bunch of data that looks like this:
code:
{"Game":"Chess","title":"just for fun!","size":"2","entriesData":
["PLAYERNAME","IMAGEHERE"],"entryFee":1,"prizeSummary","gameId":"9436","tableSpecId":"1079","dateUpdated":13898
10809648,"dateCreated":1389659697294,"stack":235,"entryHTML":null}
(added linebreaks to prevent table from breaking)

with basically a bunch of entries one after another all on one line from the website.

I want to parse this data to get playername, game type, etc.

I know I should use the JSON library to accomplish this.

The page I get the code from has a bunch of HTML on it as well. Is it okay to use the json.load on the html, or should I strip that from it first?

I'm not really entirely sure where to begin solving this problem, and am a bit confused by the JSON documentation.

Any tips or hints would be greatly appreciated.

Is there a different API or endpoint you can call that will just give you the JSON without any HTML?

Trying to parse the JSON out of a bunch of HTML sounds like you're doing it wrong.

# ? Jan 15, 2014 21:40

Dominoes: Sep 20, 2007

Phiberoptik posted:

I have a bunch of data that looks like this:

code:

{"Game":"Chess","title":"just for fun!","size":"2","entriesData":
["PLAYERNAME","IMAGEHERE"],"entryFee":1,"prizeSummary","gameId":"9436",
"tableSpecId":"1079","dateUpdated":13898
10809648,"dateCreated":1389659697294,"stack":235,"entryHTML":null}

(added linebreaks to prevent table from breaking)

You're on the right track. Splitting the data from the HTML will be the hard part. I've heard good things about Beautiful Soup.

Once you've isolated the data as a string, run json.loads() to turn it into a dict.
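
If there really is no JSON-only endpoint, one possible way to fish the objects out of the page text is json.JSONDecoder.raw_decode, which parses one JSON value starting at a given index and reports where it ended. This is only a sketch: the function name and the sample page are made up, and it will also pick up any other brace-delimited text on the page that happens to be valid JSON:

```python
import json


def extract_json_objects(page_text):
    """Return every JSON object that can be decoded starting at a '{'."""
    decoder = json.JSONDecoder()
    found = []
    index = page_text.find("{")
    while index != -1:
        try:
            obj, end = decoder.raw_decode(page_text, index)
        except ValueError:
            # Not valid JSON from this brace; move on to the next one.
            index = page_text.find("{", index + 1)
            continue
        found.append(obj)
        index = page_text.find("{", end)
    return found


# Hypothetical page, invented for this sketch.
page = (
    "<html><head><style>body { color: red }</style></head><body><script>"
    'var games = [{"Game": "Chess", "title": "just for fun!", "entryHTML": null},'
    ' {"Game": "Checkers", "title": "quick one", "entryHTML": null}];'
    "</script></body></html>"
)
games = extract_json_objects(page)
print([game["Game"] for game in games])
```

Each result is an ordinary dict, so game["Game"] and friends work as usual, and JSON null comes back as None.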

# ? Jan 15, 2014 21:40

Lysidas: Jul 26, 2002; John Diefenbaker is a madman who thinks he's John Diefenbaker.; Pillbug

Pollyanna posted:

Has anyone here done Shift-JIS decoding/encoding? I have a .bin file encoded with Shift-JIS (Japanese text), and I want to read each byte and translate them to UTF-8 or UTF-16. What I was thinking of was looping through it with file.read(1), then decoding/encoding the selected bytes (if that makes any sense). Would this work? Cause so far I just get strings like '\x00\xac' and I don't know how to change that to readable text. Has this been done before? Should I be reading one or two bytes at a time? What final encoding should I use? How I do

How big is the file? You shouldn't read individual bytes for this -- these character encodings are standard in Python and you shouldn't even begin to reimplement them. You can do this line by line or (if the file's small enough) by reading the entire file into memory.

I'm curious about the .bin extension -- does the file only contain Shift-JIS text? If so, you can probably do

Python code:

#!/usr/bin/env python3
from argparse import ArgumentParser

p = ArgumentParser()
p.add_argument('input_file')
p.add_argument('output_file')
args = p.parse_args()

with open(args.input_file, encoding='shift-jis') as i, open(args.output_file, 'w', encoding='utf-8') as o:
    for line in i:
        o.write(line)

EDIT: Note that this is a bad reimplementation of the iconv utility. You should probably just use that.
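
On the "one or two bytes at a time" question: Shift-JIS characters are one or two bytes long, so decoding fixed-size slices on their own can cut a character in half. A small in-memory sketch, with sample text made up for the example, showing a whole-buffer decode and, if the data truly has to arrive in pieces, the standard library's incremental decoder:

```python
import codecs

original = "日本語のテキスト"        # made-up sample text
raw = original.encode("shift_jis")  # stands in for the bytes read from the file

# Simplest route: decode everything at once, then re-encode as UTF-8.
decoded = raw.decode("shift_jis")
utf8_bytes = decoded.encode("utf-8")

# Chunked route: the incremental decoder buffers a dangling lead byte
# until the rest of the character arrives.
decoder = codecs.getincrementaldecoder("shift_jis")()
pieces = []
for i in range(len(raw)):
    pieces.append(decoder.decode(raw[i:i + 1]))  # one byte per call
pieces.append(decoder.decode(b"", final=True))   # flush at end of input
streamed = "".join(pieces)

print(len(raw), len(utf8_bytes), streamed == original)
```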

Lysidas fucked around with this message at 22:14 on Jan 15, 2014

# ? Jan 15, 2014 22:11

John DiFool: Aug 28, 2013

Phiberoptik posted:

I have a bunch of data that looks like this:
code:
{"Game":"Chess","title":"just for fun!","size":"2","entriesData":
["PLAYERNAME","IMAGEHERE"],"entryFee":1,"prizeSummary","gameId":"9436","tableSpecId":"1079","dateUpdated":13898
10809648,"dateCreated":1389659697294,"stack":235,"entryHTML":null}
(added linebreaks to prevent table from breaking)

...

The page I get the code from has a bunch of HTML on it as well. Is it okay to use the json.load on the html, or should I strip that from it first?

...

Are you sure there isn't any way to grab that data from the server without the HTML? If that's dynamic data on an HTML page then I wouldn't be surprised if the page uses JavaScript to load the JSON from some specific URL on the server.

# ? Jan 15, 2014 22:15
