|
tef posted:javascript/json only has double precision floating point That's just JavaScript; JSON as a spec allows arbitrary-sized integers/decimals.
|
# ? Aug 14, 2012 21:55 |
|
leterip posted:That's just javascript. json as a spec allows arbitrary sized integers/decimals. JavaScript, and most parsers that I've encountered. JSON is designed to be a subset of JavaScript; that is why we have things like this quote:To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair.
|
# ? Aug 14, 2012 22:00 |
|
The King of Swag posted:I've seen that before and the problem is that I need to save my data out as binary and be able to easily read it all back. I second Protobuf for this: it makes forward/backward compatibility a lot easier, and sooner or later you will change the format and have to load older files. Plus, it's language-agnostic and portable. JSON is fine if you don't want a binary file format.
|
# ? Aug 14, 2012 22:04 |
|
Python's JSON parser distinguishes between ints and floats based on the presence of a decimal point (or exponent) in the literal. http://hg.python.org/cpython/file/5395f96588d4/Lib/json/scanner.py#l48
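A quick demonstration of that behavior, including the fact that Python will happily round-trip integers a double couldn't represent exactly:

```python
import json

# No decimal point or exponent -> int; otherwise -> float
print(type(json.loads("42")))    # <class 'int'>
print(type(json.loads("42.0")))  # <class 'float'>

# 2**53 + 1 is not exactly representable as an IEEE 754 double,
# but Python parses it as an arbitrary-precision int, so it survives
big = json.loads("9007199254740993")
print(big == 2**53 + 1)          # True
```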
|
# ? Aug 14, 2012 22:05 |
|
It also handles NaN and +/- Infinity, which are excluded from the spec. quote:Numeric values that cannot be represented as sequences of digits (such as Infinity and NaN) are not permitted.
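You can see both halves of this: the parser accepts the non-standard tokens by default, and the encoder emits them unless you opt into strict mode:

```python
import json
import math

# Accepted by default, even though the JSON spec excludes them
print(json.loads("NaN"))          # nan
print(json.loads("Infinity"))     # inf
print(json.loads("-Infinity"))    # -inf
assert math.isnan(json.loads("NaN"))

# json.dumps emits them too, unless you pass allow_nan=False
try:
    json.dumps(float("nan"), allow_nan=False)
except ValueError as err:
    print("strict mode rejects it:", err)
```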
|
# ? Aug 14, 2012 22:22 |
|
Sure, but that says nothing about the validity of the way Python's JSON scanner interprets number literals. There's nothing explicit, in the JSON RFC or otherwise, saying that all numbers must be within the representable range of IEEE 754 double-precision floats.
|
# ? Aug 14, 2012 22:25 |
|
I just don't like JSON
|
# ? Aug 14, 2012 22:43 |
|
Sure. You don't have to like it. That's your opinion, and I respect it. But don't state things that are verifiably false. That's rude.
|
# ? Aug 14, 2012 22:52 |
|
Sorry, I just assumed that the design criteria of JSON, being a subset of Javascript, implied that the numbers were doubles and that the strings didn't handle things outside the BMP. The latter is explicit; the former is just my faulty recollection. Also, you smell.
|
# ? Aug 14, 2012 23:03 |
|
How would I go about writing a family tree in Python, and what charting library should I use for it?
|
# ? Aug 15, 2012 00:24 |
|
ufarn posted:How would I go about writing a family tree in Python, and what charting library should I use for it? I'd pick a graph library (like igraph or NetworkX); they usually come with drawing support, or at least Graphviz export.
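A minimal sketch of the data-structure side using plain dicts (every name here is invented for illustration); igraph or NetworkX would give you the same directed-graph model plus drawing/Graphviz export for free:

```python
# A family tree as a directed graph: parent -> list of children.
tree = {
    "Grandma": ["Mom"], "Grandpa": ["Mom"],
    "Mom": ["Me"], "Dad": ["Me"],
    "Me": ["Kid"],
}

def ancestors(tree, person):
    """All transitive ancestors of `person` (anyone with a path down to them)."""
    found = set()
    frontier = [person]
    while frontier:
        node = frontier.pop()
        for parent, children in tree.items():
            if node in children and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found

print(sorted(ancestors(tree, "Me")))  # ['Dad', 'Grandma', 'Grandpa', 'Mom']
```

With NetworkX the same structure would be a `DiGraph` with parent->child edges, and `nx.ancestors(g, "Me")` replaces the hand-rolled walk.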
|
# ? Aug 15, 2012 00:41 |
|
I have what may be a dumb question. I have a python script that is a small command-line app. It's a proof of concept for what may end up being a more fully-featured command-line app. I need to demonstrate this to someone who is not technically savvy enough to install python and whatnot, and I can't just email them an exe. I don't know much about web stuff, but is there some website where I can upload the .py and have it run in a browser, so I can send this person to, e.g., http://myscript.idleonline.com or something?
|
# ? Aug 15, 2012 16:42 |
|
Ideone can run both Python 2 and 3 (and more). It's really handy.
|
# ? Aug 15, 2012 16:51 |
|
That is handy, thanks. But I need it to be interactive. It's along the lines of:code:
|
# ? Aug 15, 2012 17:02 |
|
You can put the input sequence in "stdin" field, and it'll work. Alternatively, you can try repl.it, which actually runs the interpreter compiled to JavaScript inside the browser.
|
# ? Aug 15, 2012 17:12 |
|
Or you could send the exe: http://www.pyinstaller.org/ works for me.
|
# ? Aug 15, 2012 23:42 |
|
I am writing a webapp (using CherryPy) for visualization of some data sets. This isn't going to be on the internet, just run on the user's computer. Currently I am loading the data (1-6+ integer lists, 3M+ elements long) into memory. Then the client POSTs and CherryPy responds with a JSON dump of the slice of the datasets that was requested. This works great because I only have one dataset and an obscene amount of RAM, both things that users may lack. I'm trying to think of the most appropriate way to access the data as needed without loading everything into memory. I thought about sqlite, but SQL doesn't seem to have a way to easily deal with array-like data. Any suggestions?
|
# ? Aug 16, 2012 22:43 |
|
evilentity posted:Or you could send the exe I considered that, but py2exe was building 73 MB of poo poo that needed to come along with the exe, so I gave up. I ended up doing a GoToMeeting, which worked fine. But I'll check out PyInstaller. Thanks.
|
# ? Aug 16, 2012 22:48 |
|
OnceIWasAnOstrich posted:I'm trying to think of the most appropriate way to access the data as needed without loading everything into memory. I thought about sqlite but sql doesnt seem to have a way to easily deal with array-like data. Any suggestions? You could use HDF5. If you're not building something you intend to generalize to other things, though, you could also split up your arrays into a series of separate files (like pickled numpy arrays, each containing 100k elements or something) and load them as needed.
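A rough sketch of that split-into-files idea (the chunk size, file naming, and helper names are all made up; .npy files via numpy.save stand in for pickles):

```python
import os
import tempfile
import numpy as np

CHUNK = 100000  # elements per chunk file, the size floated above

def save_chunks(arr, directory, name):
    """Split one big array into fixed-size .npy chunk files."""
    for i in range(0, len(arr), CHUNK):
        path = os.path.join(directory, "%s_%06d.npy" % (name, i // CHUNK))
        np.save(path, arr[i:i + CHUNK])

def load_slice(directory, name, start, stop):
    """Load only the chunk files that overlap [start, stop)."""
    parts = []
    for c in range(start // CHUNK, (stop - 1) // CHUNK + 1):
        chunk = np.load(os.path.join(directory, "%s_%06d.npy" % (name, c)))
        lo = max(start - c * CHUNK, 0)
        hi = min(stop - c * CHUNK, len(chunk))
        parts.append(chunk[lo:hi])
    return np.concatenate(parts)

d = tempfile.mkdtemp()
save_chunks(np.arange(250000), d, "counts")
window = load_slice(d, "counts", 99990, 100010)  # straddles two chunk files
assert (window == np.arange(99990, 100010)).all()
```

Only the chunks a request touches ever hit memory, which is the whole point; numpy.memmap is another way to get the same effect with a single file.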
|
# ? Aug 16, 2012 22:49 |
|
OnceIWasAnOstrich posted:I'm trying to think of the most appropriate way to access the data as needed without loading everything into memory. I thought about sqlite but sql doesnt seem to have a way to easily deal with array-like data. Any suggestions? PyTables might help.
|
# ? Aug 16, 2012 23:30 |
|
PiotrLegnica posted:PyTables might help. This (for HDF5) is probably what I will use, although what I've gathered from reading up on all this is that I should really be using NumPy arrays instead of plain Python lists. I don't know why, but I have some sort of weird bias against NumPy.
|
# ? Aug 17, 2012 00:35 |
|
I'm trying to start using Tkinter and I'm off to a bad start. I can't import it. Here's the error I get: code:
EDIT: I was running Python 2.7; upgrading to 2.7.3 fixed it, if anybody wanted to know. Chimp_On_Stilts fucked around with this message at 03:18 on Aug 19, 2012 |
# ? Aug 18, 2012 22:46 |
|
OnceIWasAnOstrich posted:This for HDF5 is probably what I will use, although what I've gathered from reading up on all this is I should really be using Numpy arrays instead of cPython lists. I don't know why but I have some sort of weird bias against Numpy. While HDF5 and PyTables are actually pretty amazing for what I'm doing (and save a shitton of memory and improve speed for a number of internal parts), there is no way in hell that most of the people who will want to use my program will be able to get HDF5 + PyTables installed correctly, especially on OSX. On another note, I have discovered another reason for my dislike of numpy. Python code:
|
# ? Aug 20, 2012 16:21 |
|
OnceIWasAnOstrich posted:Apparently cPython's optimization of list summation is better than numpy's optimized summation of arrays. When you have a script where ~80% of the execution time is tied up in integer list summation...welp. This is Python 2.7.3 installed through Macports: Python code:
|
# ? Aug 20, 2012 16:41 |
|
Apparently a.sum() and numpy.sum(a) aren't exactly the same for some reason. But yeah, don't use the built-in sum for NumPy arrays; you'll still benefit on large datasets anyway (because NumPy arrays are unboxed, so at the very least you'll be more cache-friendly). For 200 elements I get (Py 2.7, Win7 x64, NP 1.6.1): code:
code:
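The timing listings above didn't survive the archive, but something along these lines reproduces the comparison (exact numbers will vary by machine):

```python
import timeit
import numpy as np

a_list = list(range(200000))
a_arr = np.arange(200000, dtype=np.int64)  # int64 so the total can't overflow

# Built-in sum() walks the ndarray one boxed scalar at a time, while
# ndarray.sum() is a single vectorized loop in C.
for label, fn in [("sum(list)", lambda: sum(a_list)),
                  ("sum(ndarray)", lambda: sum(a_arr)),
                  ("ndarray.sum()", lambda: a_arr.sum())]:
    print("%-14s %.4fs" % (label, timeit.timeit(fn, number=20)))

# All three agree on the answer; they just get there at very different speeds.
assert sum(a_list) == int(a_arr.sum()) == int(np.sum(a_arr))
```

Typically sum(ndarray) is the slowest of the three and ndarray.sum() the fastest, which matches the surprise in the posts above: the built-in sum over an ndarray pays boxing overhead on every element.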
|
# ? Aug 20, 2012 17:42 |
|
Emacs Headroom posted:Uh, that's pretty weird. Is your numpy install hosed somehow? This is 100% possible and something I should probably investigate, although I have been way too busy to bother. I have a lot of trouble compiling any Python packages with gcc on 10.7. I probably horribly broke something somewhere, since I was new to both OSX and Python. Trying to install anything that needs to use gcc with distribute gives me this sort of error: code:
I should probably just nuke my OS and start over.
|
# ? Aug 20, 2012 18:07 |
|
I must be retarded or crazy. I've installed Python too many times on my machine and it's totally hosed me, I think. Don't know if this is a Unix problem or a Python problem or both. Set up Pygame on OSX Mountain Lion and everything. Friend started a code project, I grab his code. Guess I have all my modules installed to /usr/bin/python2.7. Cool. So: if I go into a terminal and type "which python2.7" I get /usr/bin/python2.7; if I call "python2.7 main.py" I get an error about a missing module; if I call "/usr/bin/python2.7 main.py" it works just fine. Do I have to gently caress with my bash profile or something?
|
# ? Aug 21, 2012 03:14 |
|
Can you post the error? Have you hosed with your PYTHONPATH recently?
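One quick way to diagnose this kind of mismatch is to ask each invocation which interpreter actually ran and where it looks for modules (the paths printed will of course be machine-specific):

```python
import sys

# Run this via both "python2.7 check.py" and "/usr/bin/python2.7 check.py".
# If sys.executable differs, two installs are shadowing each other; if only
# sys.path differs, PYTHONPATH (or a site-packages mismatch) is the culprit,
# since PYTHONPATH entries land near the front of sys.path.
print(sys.executable)
print(sys.path[:4])
```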
|
# ? Aug 21, 2012 03:19 |
|
welp, added export PYTHONPATH=~/Library/Frameworks:$PYTHONPATH to my .bash_profile and now it works. Cool. I really hate OSX changing its directory structure every release.
|
# ? Aug 21, 2012 03:26 |
|
I couldn't imagine wrestling with the system Python in OSX. Macports might not do everything right, but it makes Python a lot less painful.
|
# ? Aug 21, 2012 03:31 |
|
Sockser posted:welp It's annoying when you don't know about it and they change it suddenly, but I think moving that folder into the user's home folder is a smart move. Although, like Emacs Headroom said, using Macports or Homebrew is usually a lot less painful.
|
# ? Aug 21, 2012 03:50 |
|
So I'm back with more python3 str/bytes stuff. The basic problem that I'm trying to solve is this. I read some data in from a binary file (comes in as a bytestring); however, at the time of reading, I don't actually know the proper encoding. By default the files use 'iso8859', but if it's anything other than that, it is specified at the end of the file (annoying, I know). So basically I've tried to create a subclass of str that essentially stores the bytestring and then allows us to decode it later with the specified encoding if there is one. The user only has to deal with unicode and never accesses this class directly. Instead, another part of my program creates a new String instance and can specify the encoding determined after reading so that it can be stored and the unicode can be properly encoded when I need to write it back to the file. I hope that makes sense. I've tried to use the unicode sandwich as much as possible so that the user only deals in unicode and the only time bytestrings are necessary is when reading/writing the data to a file. If you could take a look at my code, I would really appreciate it. I may be over-complicating things a bit. http://pastie.org/4562873
|
# ? Aug 21, 2012 17:04 |
|
Seek to the end of the file, read the encoding first?
|
# ? Aug 22, 2012 03:13 |
|
What Suspicious Dish said, or use something like mmap.
|
# ? Aug 22, 2012 05:35 |
|
When reading a file: Read the entire file into a bytes object. Read the encoding from the end, then return the file (minus encoding) decoded with that encoding. When writing a file: Encode the data (unicode) using your specified encoding, yielding bytes. Append the encoding. Write it to the file.
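That procedure can be sketched like this. Note the simplifying assumption: here the encoding name lives in a fixed-width trailer at the very end of the file (MARKER bytes, ASCII, padded), whereas the real format described later in the thread stores it in a tagged field; the read-payload-then-decode shape is the same either way:

```python
import os
import tempfile

MARKER = 16  # assumed fixed-width trailer holding the encoding name

def write_tagged(path, text, encoding):
    """Encode the payload, then append the encoding name as a padded trailer."""
    with open(path, "wb") as f:
        f.write(text.encode(encoding))
        f.write(encoding.ljust(MARKER).encode("ascii"))

def read_tagged(path):
    """Read everything, peel the trailer off, decode the rest with it."""
    with open(path, "rb") as f:
        raw = f.read()
    encoding = raw[-MARKER:].decode("ascii").strip()
    return raw[:-MARKER].decode(encoding)

path = os.path.join(tempfile.mkdtemp(), "demo.bin")
write_tagged(path, u"caf\xe9 \u2603", "utf-8")
assert read_tagged(path) == u"caf\xe9 \u2603"
```

The user-facing side of the sandwich only ever sees unicode; bytes exist solely inside these two functions.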
|
# ? Aug 22, 2012 07:55 |
|
Modern Pragmatist posted:So I'm back with more python3 str/bytes stuff. Is the data ever stored as Unicode using the BOM, which must be at the beginning of the file? What about large files? If there is a 2 GB file, you're going to suck up a ton of memory reading it in as a bytestring and then creating a decoded copy of it. I haven't used Python 3, but I read some documentation on it: the Unicode HOWTO - http://docs.python.org/release/3.0.1/howto/unicode.html and the open function - http://docs.python.org/release/3.0.1/library/functions.html#open Python 3's open function allows you to specify an encoding for reading. If you were to specify the proper encoding when you opened the file, you would never have to worry about bytestrings, encoding, or decoding. You would also not be required to load the entire contents of the file into memory in order to properly decode it (which means you could work with large files). I recommend that you seek to the end of the file where the encoding specification is, read the encoding, then re-open the file with the proper encoding. You can/should get rid of your intermediate string class; Python 3's open will do all of the work for you.
|
# ? Aug 22, 2012 22:02 |
|
Dren posted:Python 3's open function allows you to specify an encoding for reading. If you were to specify the proper encoding when you opened the file you would never have to worry with bytestrings, encoding, or decoding. You would also not be required to load the entire contents of the file into memory in order to properly decode them (which means you could work with large files). It's there in Python 2, too: codecs.open or io.open (but the latter is only usable in 2.7; 2.6 had a lovely implementation of io).
|
# ? Aug 22, 2012 23:44 |
|
Dren posted:Is the data ever stored as Unicode using the BOM So the files I'm dealing with are a little complicated. Basically, you have the following binary format: <ID><Length of Data><Datatype><Data><ID><Length of Data><Datatype><Data>... And this repeats for ~100 data fields. The encoding is actually contained in one of these fields. Also, only 4 of the 15 or so possible datatypes are encoded using the specified encoding; the others are encoded using iso8859. Because of this mix of encodings, I don't think setting the encoding for open would work out. The field that contains the encoding to be used is approximately the 25th sequential field in the file, but that isn't consistent (so it's not easy to seek for). That's why my initial thought was to store all the bytestrings and then decode them all after I've read in all fields (including the encoding). It wouldn't be so bad if the encoding field always occurred before any fields that required it, but that is rarely the case. I can try to do something similar to what Suspicious Dish suggested, but I'll have to figure out how to work that into the existing code. As it is, most fields are oblivious of the other fields, so it's difficult for each field to check for a specific encoding during file reading.
|
# ? Aug 23, 2012 02:57 |
|
Read as bytestrings, and decode after you read something. Don't create a subclass of str, that's a really terrible idea.
|
# ? Aug 23, 2012 03:10 |
|
|
Ok so your files have a record format, <ID><Length of Data><Datatype><Data> Each file consists of a collection of these records. How many datatypes are there? It sounds like some are metadata, for instance the record that tells the encoding of the other types. I imagine that some of the datatypes have data that is of a known encoding, or is an integer or something that doesn't need to be decoded. Furthermore, I expect that the ID, length, and datatype fields are fixed-width fields. I suggest that you create a class to model the record. Something like this, apologies if I'm using Python 2.x style: code:
My read methodology and while loop could probably be more pythonic; I've been writing C for a while. You should probably use the with syntax for the open. Handle IO errors in the scope of the file open, not inside the Record class. In the Record class it'd be a good idea to throw an error if you see an incomplete record or an enum value of an unknown type. You might want to store the records in something other than a list for access purposes. You might also want to implement some comparison operators and @total_ordering in the Record class so that you can easily sort records with sorted(). http://docs.python.org/py3k/library/functools.html?highlight=functools#functools If you have records of a type other than string, you could rework decode to properly decode them and call decode on everything.
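The code block in the post above was lost in archiving. Based on the surrounding description, a sketch along these lines captures the idea; the header widths, the struct layout, and the type codes here are all invented, since the thread never gives the real ones:

```python
import os
import struct
import tempfile

# Hypothetical fixed-width header: 2-byte ID, 4-byte length, 1-byte datatype.
HEADER = struct.Struct("<HIB")
STRING_TYPES = {4}  # made-up code(s) for payloads that carry encoded text

class Record(object):
    def __init__(self, rec_id, datatype, payload):
        self.rec_id = rec_id
        self.datatype = datatype
        self.payload = payload  # raw bytes until the encoding is known
        self.text = None

    def decode(self, encoding):
        """Second pass: decode text payloads once the encoding record is found."""
        if self.datatype in STRING_TYPES:
            self.text = self.payload.decode(encoding)
        return self.text

def read_records(path):
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER.size)
            if not header:
                break
            if len(header) < HEADER.size:
                raise ValueError("truncated header")
            rec_id, length, datatype = HEADER.unpack(header)
            payload = f.read(length)
            if len(payload) < length:
                raise ValueError("truncated record %d" % rec_id)
            records.append(Record(rec_id, datatype, payload))
    return records

# Round trip with one text record
p = os.path.join(tempfile.mkdtemp(), "demo.rec")
with open(p, "wb") as f:
    f.write(HEADER.pack(1, 4, 4) + u"caf\xe9".encode("iso8859-1"))
recs = read_records(p)
assert recs[0].decode("iso8859-1") == u"caf\xe9"
```

This is the two-pass shape the thread converges on: read every record as raw bytes, find the encoding field, then call decode on the text-typed records, with no str subclass needed.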
|
# ? Aug 23, 2012 18:25 |