|
I have a quasi-religious (i.e. style) question, since I briefly paused today when writing something to consider what would be the most natural way for other people. The background is how to write/compose a set of transformations, e.g. funneling a dict through a series of functions which each take a dict, operate on it, and return a dict.
The reason I paused and wondered is that in e.g. Clojure, I would probably write the whole chain as a single threaded pipeline.
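For a concrete sketch of that kind of dict funnel in plain Python (add_total and drop_b are invented stand-in transforms, not the original code):

```python
from functools import reduce

# Invented stand-in transforms: each takes a dict and returns a new dict.
def add_total(d):
    return {**d, "total": d["a"] + d["b"]}

def drop_b(d):
    return {k: v for k, v in d.items() if k != "b"}

def thread(data, *funcs):
    """Feed data through each function in turn, like Clojure's -> macro."""
    return reduce(lambda acc, f: f(acc), funcs, data)

result = thread({"a": 1, "b": 2}, add_total, drop_b)
```

The reduce-over-functions version reads left to right like Clojure's threading, at the cost of being less idiomatic Python.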
Hollow Talk fucked around with this message at 18:35 on Apr 28, 2020 |
# ? Apr 28, 2020 18:32 |
|
|
Hollow Talk posted: Opinions? Flip off Guido and import hy?
|
# ? Apr 28, 2020 18:44 |
|
It kinda depends pretty heavily on what your actual use case is. In your trivial example I'd almost prefer plain Python with a single list comprehension.
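For illustration, the single-comprehension version of a transform pipeline might look like this (the transforms are invented examples, not the original code):

```python
# Invented stand-in transforms: each takes a row dict and returns a new one.
def normalize(row):
    return {k.lower(): v for k, v in row.items()}

def add_flag(row):
    return {**row, "flagged": row.get("score", 0) > 10}

rows = [{"Score": 12}, {"Score": 3}]

# The whole pipeline as a single comprehension, innermost transform first.
result = [add_flag(normalize(row)) for row in rows]
```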
|
# ? Apr 28, 2020 18:44 |
|
Phobeste posted: It kinda depends pretty heavily on what your actual use case is. In your trivial example I'd almost prefer...

If it was very simple, this. If it wasn't extremely obvious, the first or second, depending on the day of the week.
|
# ? Apr 28, 2020 19:03 |
|
I'm in the "it doesn't matter" camp, but if row were a class, a chain of method calls might be easier to read.
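A rough sketch of that method-chaining idea, with Row and its transforms invented here for illustration:

```python
class Row:
    """Invented example class wrapping a dict so transforms can chain."""

    def __init__(self, data):
        self.data = dict(data)

    # Each transform returns a new Row, so calls read left to right.
    def normalize(self):
        return Row({k.lower(): v for k, v in self.data.items()})

    def add_flag(self):
        return Row({**self.data, "flagged": self.data.get("score", 0) > 10})

row = Row({"Score": 12}).normalize().add_flag()
```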
|
# ? Apr 28, 2020 19:10 |
|
1. With no other facts at hand, the advice is to do what is most idiomatic in Python, and that's list comprehensions.
2. If you're working in an existing code base or in an organization, you should match what is idiomatic in that code base or what the organization's style guides indicate.
3. If no one else is looking at the code except for you, it's tempting to do whatever you want. I'd caution that it's bad to build up habits and preferences at odds with the larger Python universe.
4. Sometimes it's OK to break idioms, conventions, and guides if it makes a big difference in readability for a specific case.
|
# ? Apr 28, 2020 19:24 |
|
Thermopyle posted: do what is most idiomatic in Python and that's list comprehensions.
|
# ? Apr 28, 2020 19:26 |
|
Not saying it's the way to go, but I can't resist suggesting toolz.
There are other functional programming packages in Python, but they all seem too "magic" for me. On the other hand, toolz is elegantly simple. The only exception might be curried functions. SurgicalOntologist fucked around with this message at 19:49 on Apr 28, 2020 |
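toolz's pipe is essentially just left-to-right function application; a stdlib-only approximation (not the real toolz implementation) looks like this:

```python
def pipe(data, *funcs):
    """Roughly what toolz.pipe does: pass data through funcs left to right."""
    for f in funcs:
        data = f(data)
    return data

result = pipe(
    {"a": 1},
    lambda d: {**d, "b": d["a"] + 1},  # add a derived key
    lambda d: sum(d.values()),         # collapse to a number
)
```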
# ? Apr 28, 2020 19:46 |
|
Thanks for all the replies so far. For the record, I tend to use list or generator comprehensions for these things as well, but something in my brain had a "what if" moment earlier. The toolz syntax is nice for data pipelining, but it's another dependency.
|
# ? Apr 28, 2020 20:02 |
|
Personally, I'd prefer a list comprehension in the most abstract sense, where we're just talking about the example code. If this is a series of transforms that is applied repeatedly in the same order, I'd prefer a function. And if you have multiple lists like this that need some set of transforms applied to them in different orders, I'd prefer a class of some kind. Removing all of the context of what the list is and what the transformations are kind of makes the question meaningless: I'm happy to do something a little less idiomatic if it is significantly more comprehensible to the people who have to deal with it in the future (myself included). Or, you know, what Thermopyle said, but less concisely.
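A minimal sketch of the class-of-transforms idea (all names here are invented for illustration):

```python
class TransformPipeline:
    """Invented sketch: a fixed, reusable sequence of transforms."""

    def __init__(self, *funcs):
        self.funcs = funcs

    def __call__(self, item):
        for f in self.funcs:
            item = f(item)
        return item

    def map(self, items):
        return [self(item) for item in items]

# Invented stand-in transforms
def double(d):
    return {**d, "x": d["x"] * 2}

def label(d):
    return {**d, "big": d["x"] > 5}

clean = TransformPipeline(double, label)
out = clean.map([{"x": 2}, {"x": 4}])
```

The same pipeline object can then be applied to any number of lists, which is the case where a class starts to pay for itself.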
|
# ? Apr 28, 2020 21:58 |
|
Count me in for a single list comprehension, like the one Phobeste posted
|
# ? Apr 28, 2020 22:07 |
|
I have what is essentially a ~100gb Python dictionary: 200 million 4kb float32 arrays, each with a string key. I need to provide this to some people who will be using Python to randomly access these. This is a better plan than having them generate the arrays on demand, because they are costly to generate (~half a second to generate a single array). I'm looking for suggestions on the best way to distribute these.

My options seem to be to create a database with a traditional database server and have to host it (either SQL or, more likely, a dedicated key-value store), or some sort of file-based setup (sqlite, hdf5, bdb). I would really prefer the latter from a cost perspective. My goal is to minimize the effort and code required to access the data. Something that lets you create a Python object by handing it a file/folder name and then access it just like a dictionary would be the ideal solution. sqlitedict is available, and dbm/bdb seems to be in the standard library and works this way, but may not be portable. Has anyone done something like this and have good/bad experience to share?
|
# ? Apr 30, 2020 20:44 |
|
EDIT NVM dumb answer
|
# ? Apr 30, 2020 20:57 |
|
OnceIWasAnOstrich posted: I have what is essentially a ~100gb Python dictionary, 200 million 4kb float32 arrays each with a string key. I need to provide this to some people who will be using Python to randomly access these. This is a better plan than having them generate them on-demand because they are costly to generate (~half a second to generate a single array).

Would pickle or shelve work for you? I haven't used them for that much data... Both are in the standard library. Of course, the users of your data have to trust you, since pickle (and shelve, since it uses pickle under the hood) can run arbitrary code when unpickling.
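A minimal shelve sketch, with a plain list standing in for a float32 array (the path and key are invented):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "arrays_demo")

# Writing: shelve pickles each value into an on-disk dbm database.
with shelve.open(path) as db:
    db["key1"] = [0.5, 1.5, 2.5]  # stand-in for a 4kb float32 array

# Reading, possibly much later or in another process: dict-style access.
with shelve.open(path, flag="r") as db:
    values = db["key1"]
```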
|
# ? Apr 30, 2020 20:58 |
|
Yeah, I would go with the latter option: if there is a de-facto modern Python k-v store format, use that. Otherwise pick your favorite that is well supported; I think sqlite would be, at least. If you can give them an example Python program, it should be OK (sqlite3 is in the standard library, so there's nothing to pip install). You don't want to host a db server, and they don't want to use your db server either.
|
# ? Apr 30, 2020 21:09 |
|
Thermopyle posted: Would pickle or shelve work for you? I haven't used them for that much data...

Don't you have to load an entire pickle into memory? Since they don't know what's going to be loaded ahead of time, he likely can't chunk it, leaving him trying to load 100GB into memory.
|
# ? Apr 30, 2020 21:15 |
|
Thermopyle posted: Would pickle or shelve work for you? I haven't used them for that much data...

Pickle alone doesn't solve my problem, but shelve might. It seems to sometimes be backed by a Python dict, but it has options to act basically as a wrapper around dbm, which is definitely on-disk. I'll need to check whether the underlying files are sufficiently portable, but that does solve the issue of not needing to distribute extra code or install libraries. Since everything is effectively already bytes, or can be interpreted as bytes, I suppose I could use the dbm module directly and avoid the pickle security issues. Those won't be a big deal in this instance, because they are already trusting me for more than that. I vaguely remember having technical or portability issues with that module years ago, but that was back in the 2.7 days.
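Using dbm directly with raw bytes might look like this; the stdlib array module stands in for numpy here, and the path and key are invented:

```python
import array
import dbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "arrays_db")

# Stand-in for a numpy float32 array: the stdlib array module.
values = array.array("f", [0.5, 1.5, 2.5])

# Store the raw bytes under a string key -- no pickle anywhere.
with dbm.open(path, "c") as db:
    db["key1"] = values.tobytes()

# Read the bytes back and reinterpret them as float32s.
with dbm.open(path, "r") as db:
    restored = array.array("f")
    restored.frombytes(db["key1"])
```

With numpy, the read side would be `np.frombuffer(db[key], dtype=np.float32)` instead; either way the pickle layer drops out entirely.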
|
# ? Apr 30, 2020 22:20 |
|
I am not a fan of using shelve for data quantities approaching "massive"; by that point I just use HDF5. Store the arrays as datasets, using their dict key as the dataset name. There, you're done; this should take fewer than 10 lines with a for loop. If you want to split the data across multiple files, that's just a flag passed to h5py's File constructor. Users fetch the list of dataset names from the file, which is equivalent to fetching a list of keys from the dictionary. Then they load the dataset with square-bracket syntax, just like a dict. You can enable gzip compression on the datasets with literally one line, if that's desirable, and users don't have to do anything extra to read the data; HDF5 performs the decompression automatically.
|
# ? May 1, 2020 00:00 |
|
QuarkJets posted: I am not a fan of using shelve for data quantities approaching "massive" amounts, by that point I just use HDF5.

I was curious about this when I was looking into HDF5. The "dataset" terminology made me worry it would choke when I made hundreds of millions of datasets.

edit: It seems like there is a practical limit somewhere below a million datasets in one group: https://forum.hdfgroup.org/t/limit-on-the-number-of-datasets-in-one-group/5892 I could split up my items/datasets into groups, maybe by prefix, and write a little wrapper script to hide that without too much effort. I'm also a little skeptical of shelve being able to handle that many keys without choking, unless it happens to use a really scalable search-tree setup. OnceIWasAnOstrich fucked around with this message at 00:25 on May 1, 2020 |
# ? May 1, 2020 00:16 |
|
Drat, I did not realize that HDF5 had trouble with millions of datasets in a group. A simple workaround might be to split the datasets across arbitrarily-named groups (say 10k datasets per group), and then, since you know your own hierarchy, you can create a pair of datasets that provide the mapping from keys to dataset paths. Sort of like this:

Data/                    # Group
Data/DataGroup1/         # Group
Data/DataGroup1/key_1    # Dataset
Data/DataGroup1/key_2    # Dataset
...
Data/DataGroupM/         # Group
Data/DataGroupM/key_N    # Dataset

etc., replacing key_N with the actual key names. And then you'd have a pair of index datasets that look like this:

keys:  key_1, key_2, ..., key_N
paths: Data/DataGroup1/key_1, Data/DataGroup1/key_2, ..., Data/DataGroupM/key_N

Now you fetch the list of keys from the keys dataset and look up each key's dataset path in paths.
QuarkJets fucked around with this message at 01:45 on May 1, 2020 |
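A small h5py sketch of that grouping-plus-index layout (sizes shrunk and names invented for illustration; assumes h5py and numpy are installed):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "demo.h5")
keys = [f"key_{i}" for i in range(25)]
GROUP_SIZE = 10  # cap datasets per group to stay below the practical limit

with h5py.File(path, "w") as f:
    paths = []
    for i, key in enumerate(keys):
        dset_path = f"Data/DataGroup{i // GROUP_SIZE}/{key}"
        f.create_dataset(dset_path, data=np.arange(4, dtype=np.float32))
        paths.append(dset_path)
    # The key -> dataset-path index, stored as two parallel datasets.
    f.create_dataset("keys", data=[k.encode() for k in keys])
    f.create_dataset("paths", data=[p.encode() for p in paths])

# A reader rebuilds the mapping once, then uses dict-style access.
with h5py.File(path, "r") as f:
    index = dict(zip(f["keys"][...], f["paths"][...]))
    arr = f[index[b"key_3"].decode()][...]
```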
# ? May 1, 2020 01:42 |
|
OnceIWasAnOstrich posted: I have what is essentially a ~100gb Python dictionary, 200 million 4kb float32 arrays each with a string key. I need to provide this to some people who will be using Python to randomly access these. This is a better plan than having them generate them on-demand because they are costly to generate (~half a second to generate a single array).

Use a filesystem, one array per file. If there's only one level of hierarchy, let the kernel deal with it / use directories; store an index somewhere if you must. Or hdf5. Pickling is bad and dumb; don't use it unless you have to, and densely packed 4kb float arrays are not such a case. Malcolm XML fucked around with this message at 02:27 on May 1, 2020 |
# ? May 1, 2020 02:15 |
|
pickling is fine if you're using it correctly. I don't think I'd use it for this just because it's so big.
|
# ? May 1, 2020 02:41 |
|
Yeah, pickling is fine. I wind up using HDF5 even for small things just because I'm so familiar with it; it's becoming pretty widespread in astronomy circles. Plenty of greybeards still using FITS, though, which was probably a pretty huge breakthrough back in the 1980s.
|
# ? May 1, 2020 05:08 |
|
Thermopyle posted: pickling is fine if you're using it correctly.

The set of valid cases for pickling that don't lead to tears is vanishingly small. Unless you need to serialize _arbitrary_ Python objects, there are usually better, faster, smaller ways that don't restrict you to only Python.
|
# ? May 1, 2020 06:28 |
|
Hollow Talk posted: I have a quasi-religious (i.e. style) question, since I briefly paused today when writing something to consider what would be the most natural way for other people. Background is how to write/compose a set of transformations, e.g. funneling a dict through a series of functions which each take a dict, operate on it, and return a dict.
|
# ? May 1, 2020 12:36 |
|
Malcolm XML posted: the valid cases for pickling that don't lead to tears is vanishingly small

This is not a vanishingly small use case, it's an extremely common one, especially in data science. E.g.:
- using your trained ML model in production
- having a DataFrame containing Python objects that you want to export
- very quickly loading/writing a few GB of mixed-dtype table
The last one can be accomplished using other formats, though somewhat slower. CarForumPoster fucked around with this message at 13:32 on May 1, 2020 |
# ? May 1, 2020 13:27 |
|
Malcolm XML posted: the valid cases for pickling that don't lead to tears is vanishingly small

Really, there are technically better ways to do almost everything that pickle does. But surprisingly often, the betterness of those other ways is trumped by the fact that pickle is in the standard library and takes almost zero time to implement. You should default to the easy, low-cost way and re-evaluate when it becomes apparent you need something more. You've lost almost nothing and quite possibly saved yourself a lot. Of course, this requires knowing what pickle does and its shortcomings. There's a very wide gray area where it just might work fine.
|
# ? May 1, 2020 15:55 |
|
Every one of those cases has awful gotchas if pushed beyond one-offs, and after fixing many cases of pickling gone wrong: if you can't spare 10 seconds to find a better domain-specific solution, that's about the only case to use it. Lmao at deploying pickles; that's another case where it'll explode in tears (and it's bad engineering too, since it can easily break when Python, your own code, or library code changes). It's dog slow, breaks on common object structures, and is a giant security hole if you load untrusted pickles, since the pickle VM allows arbitrary code execution by default. Arrow/Parquet is far better for data frames. There's a lot of crap in the Python stdlib.
|
# ? May 1, 2020 17:05 |
|
Malcolm XML posted: Every one of those cases has awful gotchas if pushed beyond one-offs, and after fixing many cases of pickling gone wrong: if you can't spare 10 seconds to find a better domain-specific solution, that's about the only case to use it
|
# ? May 1, 2020 17:17 |
|
One-offs are surprisingly common. I certainly wouldn't advocate using it for anything other than that. One-offs like distributing a data structure to a handful of known systems are a perfect use case for pickle. Claiming that pickling is bad and dumb isn't really the right way to say it, if only because it's obviously wrong depending on the use case. Claiming that pickling is bad and dumb for X, Y, and Z is a much more defensible position. The standard library is an awful mess. However, adding dependencies is quite often worse. A big problem with pickle is that it's so easy to use that 9 times out of 10 that it gets used, it's the inappropriate tool. This colors everyone's perception of the thing. Thermopyle fucked around with this message at 18:48 on May 1, 2020 |
# ? May 1, 2020 18:39 |
|
Thermopyle posted:
This is exactly why it's bad and dumb: if it has to have a giant red box in the docs calling out its insecurity, it probably shouldn't be shipped in the stdlib. It's a giant footgun. Sure, there's a very tiny use case for it, but 99% of the time use something else. If 99% of the time it's bad and dumb, I think it's completely worth calling it bad and dumb in general. If you are advanced enough to know how the pickle VM and protocol work, and where they don't, you can make up your own mind. Most people do not.

Good use case: "I want to rehydrate some complex object state that isn't accidentally circular, for something exploratory, and that I don't rely on to work in exactly the same environment as when I saved it." It's exactly the same as Java serialization: more problems than benefits.

quote: One-offs like distributing a data structure to a handful of known systems are a perfect use case for pickle.

This is exactly when you run into some of the dumb fuckin' issues of compatibility and have to use a dependency like dill to debug it.

Anyway, for this guy's use of it in production: numpy will wrap arrays into pickle using save, but just use parquet and a filesystem arrangement. You're then not locked into Python if you need to transfer the data elsewhere, and you don't have to discover __reduce_ex__. Also you get statistics and various computations for free, plus compression. Malcolm XML fucked around with this message at 21:47 on May 1, 2020 |
# ? May 1, 2020 21:44 |
|
Malcolm XML posted: This is exactly why it's bad and dumb, e.g., if it has to have a giant red box in the docs calling out insecurity it probably shouldn't be shipped in the stdlib.

I think that stupid red box is a consequence of the rest of the pickle docs being bad, not a consequence of something bad being included in the stdlib. Anyway...

Malcolm XML posted: I think it's completely worth calling it bad and dumb in general.

A more accurate description would be "don't use it unless you're very sure of the consequences". I think we'll just have to disagree on this one.
|
# ? May 1, 2020 23:35 |
|
What should and shouldn't be in the std lib is a subjective can of worms. Python (initially, at least) positioned itself as a batteries-included language, and has a deliberately broad std lib (which is now in varying states of quality, maintenance, and API style). This happens to be one of the sharpest tools in the lib.
|
# ? May 1, 2020 23:38 |
|
Malcolm XML posted: Anyway, for this guy's use of it in production: numpy will wrap arrays into pickle using save, but just use parquet and a filesystem arrangement. You're then not locked into Python if you need to transfer the data elsewhere, and you don't have to discover __reduce_ex__. Also you get statistics and various computations for free, plus compression.
|
# ? May 2, 2020 00:00 |
|
Thermopyle posted: I think that stupid red box is a consequence of the rest of the pickle docs being bad and not a consequence of something bad being included in the stdlib.

pickle is inherently designed insecurely, and it's not at all obvious that you are running an unrestricted bytecode interpreter to reconstruct the object.

Dominoes posted: What should and shouldn't be in the std lib is a subjective can of worms. ... This happens to be one of the sharpest tools in the lib.

I think we can all agree that a tool that's virtually impossible to use safely and correctly is not a great choice for the stdlib, which gives it a seal of quality it doesn't deserve. It should've been removed in Python 3 and made an optional lib, so that you have to opt in very explicitly. https://www.python.org/dev/peps/pep-0594/ cannot get here quickly enough, though. In this case the battery is leaking acid all over the place.

Yeah, parquet is a beast, but it's the common format for easy analytics data transfer (other than hdf5 or .mat, lol). It's also ~big data~ compliant, so you can throw spark/presto/hadoop/etc. at it without any real work.
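The "unrestricted interpreter" point is easy to demonstrate: __reduce__ lets a pickle name any callable to be invoked at load time (eval here is a benign stand-in for something like os.system):

```python
import pickle

class Innocent:
    # __reduce__ tells pickle how to rebuild an object: any callable + args.
    # A hostile pickle can therefore run arbitrary code at *load* time.
    def __reduce__(self):
        return (eval, ("6 * 7",))  # could just as easily be os.system(...)

payload = pickle.dumps(Innocent())
result = pickle.loads(payload)  # evaluates the expression during unpickling
```

Nothing about `payload` looks suspicious to the loader, which is why untrusted pickles are unsalvageable by design.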
|
# ? May 2, 2020 00:03 |
|
Malcolm XML posted: It should've been removed in Python 3 and made an optional lib, so that you have to opt in very explicitly.
|
# ? May 2, 2020 00:09 |
|
I've played around a bit with using whatever filesystem someone happens to be running and storing individual binary files with raw bytes. Putting all 200M files in a single directory is a no-go; it causes everything I tried (ext4, xfs, btrfs) to blow up. Apparently even the B-trees can't handle that. It works reasonably well if I create a directory hierarchy per character, since I have a max of 36 possibilities [A-Z0-9] for each character and a max of 10 characters, so the hierarchy doesn't end up too deep or with too many files in a single folder, just an insane number of folders, but well within the capabilities of most filesystems.

This makes copying it an exercise in "you better loving have an SSD". It's probably not so bad once I manage to create a tar, and the writing shouldn't be too random. I guess if I just made it its own FS, copied it at a block level, and distributed an image, that would solve the random I/O problem but add in requiring people to deal with a filesystem image. This is basically the same thing I did with HDF5 (and a recursive version of what QuarkJets suggested) but with the folders replaced by HDF5 groups. HDF5 has the benefit of being a single file object I can read/write with sequential I/O. I guess this is kind of the same thing as a filesystem image, except that the code to deal with it is in the HDF5 library instead of the kernel.

For practical purposes, this is going to be used by a dozen or fewer of my students for research purposes for now, so I can make them do whatever I want as long as I teach them how to do it. This is all making putting it all in a real database (and maybe creating a database dump) and a docker image more and more appealing. OnceIWasAnOstrich fucked around with this message at 01:35 on May 2, 2020 |
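That per-character sharding can be captured in a tiny helper (key_to_path and the .bin suffix are invented for illustration):

```python
import os

def key_to_path(root, key, depth=3):
    """Invented helper: shard [A-Z0-9] keys into per-character directories,
    e.g. 'ABC123' -> root/A/B/C/ABC123.bin, keeping each directory small."""
    return os.path.join(root, *key[:depth], key + ".bin")

p = key_to_path("store", "ABC123")
```

A depth of 3 caps any one directory at 36^1 subdirectories and spreads 200M keys across up to 36^3 ≈ 47k leaf directories.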
# ? May 2, 2020 01:31 |
|
Anyone know how the subprocess library works? I'm doing some latency-measurement stuff across two computers, running a shell command on my receiver through Python's subprocess library, and I've noticed that if I run the command in an actual terminal, the latency is a more-or-less stable oscillation like I'm expecting, but running it from Python as a subprocess causes my measured latency to slowly increase toward infinity. I suspect it has something to do with the subprocess receiving fewer CPU resources, but I am not certain.
|
# ? May 5, 2020 19:47 |
|
It should have the same resources available to it; it's just another process. Does the script generate a lot of output to either stderr or stdout? Is that perhaps getting buffered in the Python process?
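One way to rule out pipe-buffer stalls is to consume the child's stdout as it is produced; a minimal sketch (the echoing child here is a stand-in for the real command):

```python
import subprocess
import sys

# Stand-in child process that writes to stdout; the real command would
# be the receiver-side shell command being timed.
child = [sys.executable, "-u", "-c", "print('tick'); print('tock')"]

proc = subprocess.Popen(child, stdout=subprocess.PIPE, text=True)

# Read line by line as output arrives, so the OS pipe buffer never fills
# up and blocks the child (a full, unread pipe will stall the writer).
lines = [line.strip() for line in proc.stdout]
proc.wait()
```

If the child writes faster than the parent reads, the pipe fills, the child's writes block, and any timing it does drifts, which would match the slowly growing latency described above.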
|
# ? May 5, 2020 20:54 |
|
|
Yeah, what's Python being used for here? Just launching a subprocess and nothing else? Also, there are many ways to create a subprocess; which are you using, and with what options?
|
# ? May 6, 2020 00:46 |