|
Progressive JPEG posted:
ive been using avro as a serialization format for some one-way record streaming stuff and its worked out p well. avro's nice for streaming because the spec defines a framing format so you dont have to invent your own every time. seriously why didnt protobuf at least define some kind of convention for like a size header to use when dealing with a list of multiple adjacent records, it could've just been '8 bits of magic followed by 24 bits of size' and itd have been plenty;

you mean like a repeated field?
|
# ? May 24, 2016 08:47 |
|
|
Progressive JPEG posted:
ive been using avro as a serialization format for some one-way record streaming stuff and its worked out p well. avro's nice for streaming because the spec defines a framing format so you dont have to invent your own every time. seriously why didnt protobuf at least define some kind of convention for like a size header to use when dealing with a list of multiple adjacent records, it could've just been '8 bits of magic followed by 24 bits of size' and itd have been plenty; lol at anyone with a 16mb proto message. also the avro schema can either be provided up-front with the data or in a separate channel, making the data fully self-describing, while the records themselves have little overhead since any decoding fully depends on the contents of the provided schema

how quickly are you generating 16mb of streaming data? your link makes it seem like schemas need to be shuffled around all the time, but saving one alongside each file wouldn't be so bad

Progressive JPEG posted:
avro specifies its own socket protocol too, but afaict the only existing implementation of that, even among all the first party libs, only exists in the java lib, which sorta makes sense given the hadoop lineage. but im intending to support 4-5 or so different langs (which all do have good 1st-party or 3rd-party serialization implementations) and i dont want to require more diying than necessary for users of the interface
|
# ? May 24, 2016 08:56 |
|
JawnV6 posted:
protobuf couldn't do repeated entries with a single size, records are variable length in a way dependent on the data. even for ints. so unpacking requires linked listish traversal

most of the strange in protobufs can be explained by protobufs being used heavily for serialization-to-disk at google. the way they did it, converting from one to many is just shoving more things in the payload, where avro would need another message type. it doesn't make sense for a wire protocol (my bytesss), but does if you are trying to read two year old blobs
|
# ? May 24, 2016 12:05 |
|
eschaton posted:
a friend had this in his .signature for a long time:

lol
|
# ? May 24, 2016 13:15 |
|
aw man, you can't remote debug DOS4/GW applications in Open Watcom because the required DLL is missing
|
# ? May 24, 2016 14:30 |
|
Bloody posted:
what is property based testing

Adapting a post from an internal email at work (I do write long emails, bear with me)

Before diving into what property-based testing is and how it works, let's go for some wild claims we've had at the Erlang Factory last March. In the slide set[1] from Thomas Arts, it is said that they used quickcheck (the canonical property-based testing tool) to run tests on Project FIFO, a financial cloud open source project[2]. Their end result was:

So how does that work? That's the magic of property-based testing. There are two basic modes for property-based testing: data structure generation and Finite State Machine model validation.

Data structure generation is fairly simple and has been reimplemented in many languages[3] from the initial Haskell version of quickcheck. What it does is go from a type declaration or generator declaration and generate more and more complex instances of that type to pass to a function or method. For example, the type 'integer' may start at 0, and progressively hit -1, +1, -3, +5, ..., up until more complex boundaries. The type 'string' may generate "", as well as "ap321..`;r". Custom generators can be written such that if I gave it the proper rules, I could generate a data structure that represents a dyno's logical state, for example, or possible user input.

What is done then is that these types as declared are passed to a test case, which is repeated. At each repetition, the complexity and size of generated terms is increased. This forces you to write tests differently, as a set of properties (hence the name). So instead of writing test cases like:
code:
code:
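A minimal Python sketch of that contrast, standing in for the kind of code the post describes. This is not the Erlang from the email: `my_max`, `gen_int_list` and `check` are names made up here, and the hand-rolled `check` loop only imitates what a quickcheck-style tool does (generate inputs of growing size, report the first one that breaks the property).

```python
import random

def my_max(xs):
    """Function under test: return the largest element of a non-empty list."""
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best

# Example-based style: every case is picked and written out by hand.
def test_examples():
    assert my_max([1, 2, 3]) == 3
    assert my_max([3, 2, 1]) == 3
    assert my_max([-5]) == -5

# Property-based style: state one rule that must hold for any input.
def prop_max_is_last_of_sorted(xs):
    return my_max(xs) == sorted(xs)[-1]

def gen_int_list(rng, size):
    """Generator: a non-empty list of ints whose length and range grow with size."""
    return [rng.randint(-size, size) for _ in range(rng.randint(1, size + 1))]

def check(prop, runs=200, seed=0):
    """Run prop on progressively larger inputs; return the first failure, or None."""
    rng = random.Random(seed)
    for size in range(runs):
        xs = gen_int_list(rng, size)
        if not prop(xs):
            return xs
    return None

test_examples()
print(check(prop_max_is_last_of_sorted))  # None: no counterexample found
```

The three hand-written asserts cover three inputs; the single property is exercised on 200 generated lists, and raising `runs` costs nothing in test code.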
If a failure occurs, the property-based test framework then performs something called shrinking. Shrinking means taking the current failing term, and trying to reduce it to the simplest form that still provokes the failure. For example, the test case above could fail initially on a list such as:

[0, 2321, 41234.2123, 18.32130, -1, -3412312, 0.0, 3123]

Which makes it non-obvious what the bug is. Shrinking could return me a failure case where the failing input is:

[0, 0.0]

Which would let me know that either the property or the max value is wrong based on float vs. integer sorting order (or sort stability). I don't have to dig, the test framework does it for me.

That's basic property-based testing in a nutshell.

The most advanced kind, which only exists in Erlang as far as I can tell (and is the one discussed in Thomas Arts' presentation), works by doing a special thing: if we can represent a program's execution sequence as a data structure, we can use quickcheck on it. So the big magical thing they do is build a framework where you define a model consisting of a state machine containing:

It then, step by step, compares the output of the model to the output of the real thing, and ensures all post conditions still hold. If one doesn't, or the model disagrees with the implementation, you have a test failure. And much like regular test cases, it then tries to shrink that result by finding simpler sequences of operations that find problems. So if we had a bug that went 'user adds an app, removes an app, re-adds the app, scales up, moves up a tier, moves down a tier, scales down, scales up, runs a new release' and the bug was in the sequence 'moves up a tier, scales down', the shrinking mechanism could possibly find that.

---

The big strength of this is that it's not exactly the same as fuzzing; the more complete test suites are more like model validation through generated command sequences (non-exhaustive). The basic stuff is a bit more like fuzzing, but with the requirement of the properties/invariants being maintained. The other big distinction is that since all of that exploration is guided by how types and command sequences are created, you do have the ability to shrink problems to a simpler form.

I'm still spending a bunch of time toying with some of these frameworks (specifically PropEr[4], a GPL variant of the Quviq Quickcheck framework[5]); if there's interest I can walk through more complete examples or give sample runs of [6]

---

Additional reading material

References:
[1]: http://www.erlang-factory.com/static/upload/media/1461230674757746pbterlangfactorypptx.pdf -- video at https://youtu.be/iW2J7Of8jsE
[2]: https://project-fifo.net/
[3]: List in 4th paragraph of https://en.wikipedia.org/wiki/QuickCheck
[4]: http://proper.softlab.ntua.gr/
[5]: http://www.quviq.com/
[6]: Sample test suite I have written to property-check a python ATM from Erlang with the FSM stuff
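The state machine flavour described in this post can be imitated in a few lines of Python. Everything here is an assumption for illustration: `ListSet` is a deliberately buggy toy standing in for "the real thing", Python's built-in `set` plays the model, and `gen_commands`, `run`, `shrink` and `find_bug` are made-up helpers, not PropEr or Quviq QuickCheck APIs.

```python
import random

class ListSet:
    """System under test: a set backed by a list, with a deliberate bug."""
    def __init__(self):
        self.items = []

    def add(self, key):
        self.items.append(key)        # bug: duplicates are stored twice

    def remove(self, key):
        if key in self.items:
            self.items.remove(key)    # removes only the first occurrence

    def contains(self, key):
        return key in self.items

def run(commands):
    """Replay commands on the real thing and the model; True if they always agree."""
    real, model = ListSet(), set()
    for name, key in commands:
        if name == "add":
            real.add(key)
            model.add(key)
        elif name == "remove":
            real.remove(key)
            model.discard(key)
        elif real.contains(key) != (key in model):   # postcondition of "contains"
            return False
    return True

def gen_commands(rng, size):
    """A random command sequence over a small key space, so collisions are common."""
    return [(rng.choice(["add", "remove", "contains"]), rng.randint(0, 3))
            for _ in range(size)]

def shrink(commands):
    """Drop single commands for as long as the shorter sequence still fails."""
    changed = True
    while changed:
        changed = False
        for i in range(len(commands)):
            shorter = commands[:i] + commands[i + 1:]
            if not run(shorter):
                commands, changed = shorter, True
                break
    return commands

def find_bug(seed=0, runs=500):
    rng = random.Random(seed)
    for size in range(1, runs):
        commands = gen_commands(rng, size)
        if not run(commands):
            return shrink(commands)
    return None

print(find_bug())
```

The shrunk output is a short command list that still makes the toy and the model disagree, which is the same idea as boiling a nine-step user scenario down to the two steps that matter.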
|
# ? May 24, 2016 14:38 |
|
Wheany posted:
so what is the reasoning behind "never write if" in code?

it's exactly what you said:

quote:
trying to minimize the number of branches, so the code is easier to reason about

except apparently that one guy took it literally and just replaced all his ifs with whiles? lol.
|
# ? May 24, 2016 15:31 |
|
Luigi Thirty posted:
oh i am literally retarded thanks

can't you enable warnings, like idk try compiling the logical part of the code in a modern compiler before compiling into the old dos stuff

seriously I can't see myself doing any C++ without a shitton of warnings enabled
|
# ? May 24, 2016 15:40 |
|
Jabor posted:
you mean like a repeated field?

itd be the equivalent of an ongoing stream of proto messages, each one individually decodable along the way

proto doesnt encode the length of each message so you gotta do it yourself

seriously if they just like added a sentence to that section saying, 'but if you DID need to use a header, we recommend this format' then itd save so much trouble for anyone with that use case (whether its storing multiple messages to a file, or streaming multiple messages over a network) and there'd be a possibility of a standard by convention that any tooling could support. i mean its obvious that this isnt a foreign concept to them so whats the deal with leaving it totally undefined

JawnV6 posted:
protobuf couldn't do repeated entries with a single size, records are variable length in a way dependent on the data. even for ints. so unpacking requires linked listish traversal

you can even parse an individual protobuf message without having the original .proto spec, you just lose the field labels and end up with tag numbers as though they were all unknown fields (AMA about writing a real slick debugging tool to print potentially unknown protos a couple jobs ago). but when you start having multiple records, it needs some kind of delimiter or size header to determine when one message ends and another begins, or else it just parses as a single long message with a mishmash of repeated fields

quote:
how quickly are you generating 16mb of streaming data? your link makes it seem like schemas need to be shuffled around all the time, but saving one alongside each file wouldn't be so bad

the 16mb example was to say that you wouldnt want a single protobuf message that big anyway since the libraries quickly start falling over in a sea of mallocs. iirc the protobuf implementations have some hardcoded limits at 100mb so if youre being really terrible youd need to hack libraries at both ends to parse messages exceeding that. in that situation gee maybe youd want to represent your data as a series of small messages hmmmm

also yeah swagger's ok but a goal was to keep sizes in check over long intervals if needed
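For reference, the '8 bits of magic followed by 24 bits of size' header floated in this thread is small enough to sketch in Python. This is only an illustration of the idea, not a protobuf or avro convention: the magic value `0xA5`, the big-endian layout and the names `write_frame` and `read_frames` are all assumptions made up here. Payloads are opaque bytes, so they could be serialized messages from any library.

```python
import io
import struct

MAGIC = 0xA5                 # hypothetical magic byte, not taken from any spec
MAX_SIZE = (1 << 24) - 1     # a 24-bit size field caps one record just under 16 MiB

def write_frame(stream, payload: bytes) -> None:
    """Write one record as a 4-byte header followed by the payload."""
    if len(payload) > MAX_SIZE:
        raise ValueError("payload does not fit in a 24-bit size field")
    # One big-endian 32-bit word: top 8 bits are magic, low 24 bits are the size.
    stream.write(struct.pack(">I", (MAGIC << 24) | len(payload)))
    stream.write(payload)

def read_frames(stream):
    """Yield payloads until the stream ends; raise on a damaged frame."""
    while True:
        header = stream.read(4)
        if not header:
            return                       # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated header")
        (word,) = struct.unpack(">I", header)
        if word >> 24 != MAGIC:
            raise ValueError("bad magic byte")
        size = word & MAX_SIZE
        payload = stream.read(size)
        if len(payload) < size:
            raise ValueError("truncated payload")
        yield payload

buf = io.BytesIO()
for record in (b"first", b"", b"third record"):
    write_frame(buf, record)
buf.seek(0)
print(list(read_frames(buf)))  # [b'first', b'', b'third record']
```

The sketch reads from a file-like object where `read(n)` returns all `n` bytes unless the data runs out; on a raw socket a short read is normal, so a real reader would loop until it has the full header and payload.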
|
# ? May 24, 2016 15:45 |
|
ErIog posted:
I'm not like a Good Programmer, but isn't the compiler already doing that?

probably. i think either way they end up as a beq and then some code to put the return value onto the stack and then break down the stack frame.

MononcQc posted:
Adapting a post from an internal email at work (I do write long emails, bear with me)

one day i want to work on grown up poo poo like this
|
# ? May 24, 2016 15:57 |
|
Progressive JPEG posted:
itd be the equivalent of an ongoing stream of proto messages, each one individually decodable along the way

iirc theres an easy way of forcing the libs to let you have really big protobuf messages

i dont remember what it is but i definitely used it once to serialize multi-gigabyte data structures lmao
|
# ? May 24, 2016 16:08 |
|
MononcQc posted:
Adapting a post from an internal email at work (I do write long emails, bear with me)

this is an awesome post, tyvm
|
# ? May 24, 2016 16:09 |
|
i fixed my usb interface block in some ways but broke it in others. now sometimes when transmitting the device just says ehhh gently caress it and throws driver errors until you disconnect-reconnect the device. lmao
|
# ? May 24, 2016 16:10 |
|
but the proto format does encode the length of everything? like, if i had the following proto: code:
|
# ? May 24, 2016 16:24 |
|
LeftistMuslimObama posted:
one day i want to work on grown up poo poo like this

same
|
# ? May 24, 2016 16:29 |
|
Question: does it bother anyone else if the function defs are touching?
|
# ? May 24, 2016 16:37 |
|
Jabor posted:
but the proto format does encode the length of everything?

Progressive JPEG posted:
proto doesnt encode the length of each message so you gotta do it yourself

seriously, Jabor's right and they have specified a "header" format if you just wrap messages up like he did. each one gets a tag and length delimiter. you still have to chew through each one, but idk how much more definition you'd need

Progressive JPEG posted:
you can even parse an individual protobuf message without having the original .proto spec, you just lose the field labels and end up with tag numbers as though they were all unknown fields (AMA about writing a real slick debugging tool to print potentially unknown protos a couple jobs ago). but when you start having multiple records, it needs some kind of delimiter or size header to determine when one message ends and another begins, or else it just parses as a single long message with a mishmash of repeated fields

really, how difficult is it to provide both the reader & writer schema, versioning them, etc.? your link made it sound like an awful pain

Progressive JPEG posted:
gee maybe youd want to represent your data as a series of small messages hmmmm

i haven't worked on something with the unimaginable luxury of 16mb lying around without having to checksum every 512kb or whatever, but it kinda sounds like you're using it for long term storage as well as encoding? why aren't you using something else as a data store once you're off the streaming device?
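The "tag and length delimiter" point can be shown at the byte level without any protobuf library. The sketch below hand-encodes what a wrapper along the lines of `message Stream { repeated Record records = 1; }` would put on the wire; that wrapper is hypothetical (the proto from the earlier post is not shown), as are the helper names `wrap_records` and `unwrap_records`. The encoding rules themselves are the documented ones: a key of `(field_number << 3) | wire_type`, wire type 2 for length-delimited fields, and a base-128 varint for the length.

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf: bytes, pos: int):
    """Return (value, next_position) for the varint starting at buf[pos]."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

WIRE_LEN = 2  # wire type 2: length-delimited (strings, bytes, embedded messages)

def wrap_records(records, field_number=1):
    """Encode each already-serialized record as one entry of a repeated field."""
    key = encode_varint((field_number << 3) | WIRE_LEN)
    return b"".join(key + encode_varint(len(r)) + r for r in records)

def unwrap_records(buf, field_number=1):
    """Walk the buffer entry by entry: read key, read length, slice the payload."""
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        if key != (field_number << 3) | WIRE_LEN:
            raise ValueError("unexpected field number or wire type")
        size, pos = decode_varint(buf, pos)
        yield buf[pos:pos + size]
        pos += size

print(wrap_records([b"abc"]))                            # b'\nabc' would be wrong: see hex below
print(wrap_records([b"abc"]).hex())                      # 0a03616263
print(list(unwrap_records(wrap_records([b"abc", b"de"]))))  # [b'abc', b'de']
```

Because each entry carries its own key and length, entries can be appended to a file or stream one at a time and still decode as a single wrapper message; the reader does have to walk them in order, which is the "chew through each one" cost mentioned above.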
|
# ? May 24, 2016 17:12 |
HoboMan posted:Question: does it bother anyone else if the function defs are touching?
|
|
# ? May 24, 2016 17:35 |
|
you should be padding functions with documenting comments
|
# ? May 24, 2016 17:37 |
|
i use a form feed
|
# ? May 24, 2016 17:41 |
|
Wheany posted:
i use a form feed

vertical tabs
|
# ? May 24, 2016 17:58 |
|
the Russians used a pencil
|
# ? May 24, 2016 18:10 |
Power Ambient posted:
you should be padding functions with documenting comments

2 empty lines
general function comments and short working example
commented function
2 empty lines
general funct...
|
|
# ? May 24, 2016 18:15 |
project im on is meant to be self explanatory at a glance to people unfamiliar with it, so my functions have like at least a line of comments per line of code that i dont even try to cram into long single line statements, and each class/method is introduced with a good half a dozen to a dozen lines of generic how-tos and poo poo. and then theres 20 pages of documentation
|
|
# ? May 24, 2016 18:17 |
speaking of which, anyone willing to share a word or two about the sphinx documentation thing for python?
|
|
# ? May 24, 2016 18:18 |
|
Today I learned about context managers in python
|
# ? May 24, 2016 18:31 |
|
kalstrams posted:
speaking of which, anyone willing to share a word or two about the sphinx documentation thing for python?

i've used it briefly and it was okay. it's kind of the only game in town for python docs. if you want to have proper python documentation, you should use it

just be aware that the autodoc features require a lot of manual tweaking to make good documentation. for example, the docs folder for tornado (https://github.com/tornadoweb/tornado/tree/master/docs) includes a lot of files like this where the autoclass generator wasn't good enough so they went in and manually indexed the methods, and that can introduce some maintenance overhead.

also you may want to use "google-style" docstrings since they're much more human readable http://www.sphinx-doc.org/en/stable/ext/napoleon.html though iirc these have some limitations in the actual built doc output
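For a feel of what a "google-style" docstring looks like, here is a small Python sketch. The function `clamp` is a made-up example; the only Sphinx-specific parts are the two extension names in the comment, `sphinx.ext.autodoc` and `sphinx.ext.napoleon`, both of which ship with Sphinx.

```python
# In the Sphinx project's conf.py, the extensions list would need at least:
#   extensions = ["sphinx.ext.autodoc", "sphinx.ext.napoleon"]

def clamp(value, low, high):
    """Limit a number to the closed interval [low, high].

    Args:
        value (float): The number to clamp.
        low (float): Lower bound of the interval.
        high (float): Upper bound of the interval.

    Returns:
        float: ``value`` if it lies inside the interval, else the nearest bound.

    Raises:
        ValueError: If ``low`` is greater than ``high``.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))

print(clamp(5, 0, 3))  # 3
```

With those extensions enabled, an `.. autofunction:: yourmodule.clamp` directive in an .rst file (module name assumed) pulls the docstring in, and napoleon turns the `Args`, `Returns` and `Raises` sections into regular field lists at build time.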
# ? May 24, 2016 18:40 |
|
just got handed an old C project with K&R style function declarations now i'm sad
|
# ? May 24, 2016 19:10 |
|
Power Ambient posted:
you should be padding functions with documenting comments

lol @ this guy
|
# ? May 24, 2016 19:16 |
|
avro is pretty great if you do anything with spark. i moved a ton of our spark inputs from csv to avro and millions of problems went away instantly

also, property based testing is the poo poo. i have about 1200 lines of PropEr code (the erlang framework mentioned) that exhaustively test and exercise a pretty complex postgres db with ~70 stored procs that exposed a bunch of logic errors in the stored procs

i also have a json parser that is modelled internally on push down automata that has ~300 lines of PropEr tests that test every single possible state transition (count: a lot)

i almost think erlang is worth learning just for PropEr alone if you work on anything that resembles a state machine
|
# ? May 24, 2016 19:34 |
|
oh wait if i can't get my debugger working, can't I just use serial output from dosbox going to a virtual serial port as a dumb terminal for debug statements I'm going back in time from 1993 to 1983
|
# ? May 24, 2016 19:43 |
|
the talent deficit posted:
i almost think erlang is worth learning just for PropEr alone if you work on anything that resembles a state machine

yeah, I'm trying to push it internally to use as a blackbox test tool for some more complex ruby components and get some of that ~synergy~ going between some of our Erlang and ruby peeps in this here company.
|
# ? May 24, 2016 19:45 |
I read through that email in its entirety, but since I'm a foreigner with no computer education, can someone frame the applicability of property based testing in layman's terms
|
|
# ? May 24, 2016 21:01 |
|
HoboMan posted:
lol @ this guy

#nocomments
|
# ? May 24, 2016 21:15 |
|
kalstrams posted:
I read through that email in its entirety, but since I'm a foreigner with no computer education, can someone frame the applicability of property based testing in layman's terms

Rather than going in and explicitly listing all the relevant cases and finding the edge conditions yourself, you write a specification that says "I expect this type of data" and then a bit of code that says "here's what I expect to be true no matter what data you have". Then the test tool generates all the tests for you and finds edge conditions you wouldn't have thought of.

So coming back to the example in the email, let's say I have a piece of code that finds me the greatest value in a list of values. I could write tests such as:
code:

So the thing I can do is find the properties of max($list). The trick is to figure out how to say what it does without the implementation itself. One way would be, for example, to use an existing 'min' function and say that the biggest number is the one for which all the other ones are smaller. Or I could take a sort() function that returns me all the items in order, and then picking the last item means it needs to be the greatest:
code:

So what's interesting is that instead of having say, 30 lines of test code for all kinds of interleavings for the max() function, I have 2 effective lines of code that cover the same data. I can ask the test framework to generate longer or more complex cases, or to generate a hundred million variations of them if I want.

As iterations go forward, the framework adds complexity to the types in the dataset. Let's say I have the example of the property above failing on the input:

[120, -2321, -41234.2123, 18.32130, -1, -3412312, 120.0, -3123]

Shrinking (the framework goes back and tries to simplify the input) could produce say:

[120, 120.0]

or even better:

[0, 0.0]

This means that max([0, 0.0]) != last(sort([0, 0.0])). It's not obvious why that is. Maybe the code is wrong, or maybe the test is. In this case, a likely candidate could be that sort() is stable (if two values are compared equal, they remain in the same order in the final list) whereas max() could always favor the leftmost item in the list to be the greatest.

This requires me to refine my property: do I want max() to be stable, or do I not care? If I don't care, then I need to modify my property such that either value can be good (make sure 0 == 0.0 is true); if not, then I need to modify my implementation to always return the rightmost operand in the list as greatest.

And that stuff would likely not have been exposed by manual testing, but property-based testing uncovers all kinds of crap like that in code.

The advanced use cases work on the same principle, but instead of having <list of numbers generator>, you generate sequences of commands a more complex state machine would contain (log in, write poo poo post, get suspended, refresh, log out, buy new account, shitpost, etc.) and plug that into the system.
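The shrinking step can be imitated with a few lines of plain Python. This is a toy under stated assumptions, not how PropEr or QuickCheck implement it: `leftmost_max`, `candidates` and `shrink` are made-up names, and the property compares value and type on purpose, because in Python `0 == 0.0` is true and a plain `==` would hide the leftmost versus rightmost difference the post describes.

```python
def leftmost_max(xs):
    """Keeps the first of several equal maxima (a strict > never replaces a tie)."""
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best

def prop_max_is_last_of_sorted(xs):
    got, want = leftmost_max(xs), sorted(xs)[-1]
    # Strict check: same value AND same type, so 0 and 0.0 count as different.
    return got == want and type(got) is type(want)

def candidates(xs):
    """Simpler variants of xs: one element dropped, or one value zeroed everywhere."""
    if len(xs) > 1:
        for i in range(len(xs)):
            yield xs[:i] + xs[i + 1:]
    for x in xs:
        if x != 0:
            # Zero every element equal to x, each keeping its own type (int or float).
            yield [type(y)(0) if y == x else y for y in xs]

def shrink(prop, failing):
    """Greedily move to any simpler input that still fails, until none is left."""
    current = failing
    progress = True
    while progress:
        progress = False
        for cand in candidates(current):
            if not prop(cand):
                current, progress = cand, True
                break
    return current

failing = [120, -2321, -41234.2123, 18.32130, -1, -3412312, 120.0, -3123]
assert not prop_max_is_last_of_sorted(failing)
print(shrink(prop_max_is_last_of_sorted, failing))  # [0, 0.0]
```

Python's `sorted` is stable, so the float `120.0` ends up last while `leftmost_max` returns the int `120`; the shrinker first drops every element that is not needed for that disagreement, then replaces the surviving pair with zeros of the same types.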
# ? May 24, 2016 21:28 |
|
another huge win of property based testing is testing optimizations. you can model your system as hopelessly naive but obvious and then run sequences of operations on the naive but obviously correct model alongside the optimized but hard to understand model and have reasonable certainty they have the same properties and results
|
# ? May 24, 2016 21:33 |
|
the talent deficit posted:
another huge win of property based testing is testing optimizations. you can model your system as hopelessly naive but obvious and then run sequences of operations on the naive but obviously correct model alongside the optimized but hard to understand model and have reasonable certainty they have the same properties and results

Yeah that's especially great for data structures. I did this for my merkle tree implementations where the objective is to have a data structure that is very fast at doing diffs; just make a separate, very naive list comparison function that is obviously correct and compare the two. If the fast merkle tree is giving the same result as the naive-but-correct implementation, then you can trust the complex one as much as you trust the simple one, except it's gonna be efficient enough to be usable.

Once you get that, a shitload of property tests are simpler to write, and it becomes a weak (but useful) form of model testing.
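A scaled-down Python sketch of that "naive model versus optimized version" idea. The merkle tree itself is not reproduced here; `fast_diff` is only a stand-in for an optimized implementation, and `naive_diff`, `prop_same_as_model` and `check` are made-up names for illustration.

```python
import random

def naive_diff(a, b):
    """Obviously correct, quadratic: the distinct keys of a that do not appear in b."""
    return sorted({x for x in a if x not in b})

def fast_diff(a, b):
    """Optimized stand-in: one merge pass over two sorted, de-duplicated key lists."""
    xs, ys = sorted(set(a)), sorted(set(b))
    out, j = [], 0
    for x in xs:
        while j < len(ys) and ys[j] < x:
            j += 1                      # skip keys of b that are smaller than x
        if j == len(ys) or ys[j] != x:
            out.append(x)               # x has no match in b
    return out

def prop_same_as_model(a, b):
    """The property: the fast version must agree with the naive model."""
    return fast_diff(a, b) == naive_diff(a, b)

def check(runs=300, seed=0):
    """Compare both versions on growing random inputs; return a failing pair or None."""
    rng = random.Random(seed)
    for size in range(runs):
        a = [rng.randint(0, size) for _ in range(rng.randint(0, size))]
        b = [rng.randint(0, size) for _ in range(rng.randint(0, size))]
        if not prop_same_as_model(a, b):
            return a, b
    return None

print(check())  # None: the two implementations agreed on every generated pair
```

The property never says what a correct diff looks like; it only says the fast version must match the slow one, which is why this style is cheap to write once a trustworthy naive version exists.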
|
# ? May 24, 2016 21:36 |
|
MononcQc posted:
Rather than going in and explicitly listing all the relevant cases and finding the edge conditions yourself, you write a specification that says "I expect this type of data" and then a bit of code that says "here's what I expect to be true no matter what data you have". Then the test tool generates all the tests for you and finds edge conditions you wouldn't have thought of.

i like this in idea

but in practice, i cant see how to actually apply this. rarely will you be dealing with simple methods like that

ive worked on some pretty shite codebases so far, I know this
|
# ? May 24, 2016 21:51 |
|
Valeyard posted:
i like this in idea

like any kind of programmatic testing, you do have to design your code somewhat to facilitate it. but really all that's doing is asking you to formulate an invariant for a region of code and then procedurally testing a large generated sample of the legal inputs to check that the invariant is satisfied. if you can't write an invariant for a region of code, there's an extremely good chance that it's doing something it's not supposed to.

when you take an algorithms course one of the basic things you need to do is define an invariant for your algorithm and then prove it inductively. here you just define the invariant (aka, what your code is supposed to be doing) and then the framework hunts for counterexamples across generated legal inputs, which gives you strong evidence rather than a proof.
|
# ? May 24, 2016 22:25 |
|
|
i know nothing about java. is there a useful database migration tool/library? it's a small project, so "don't bother" is a valid answer.
|
# ? May 24, 2016 22:45 |