|
rotor, please do the needful
|
# ? May 13, 2014 20:47 |
|
Brain Candy posted:if only java had immutable data structures with minimal performance costs, like, java.util.Collections.unmodifiableSet
|
# ? May 13, 2014 20:48 |
|
Bloody posted:as of this post i would up the punishment to 3 day probations for lawchat
that seems reasonable, ok
|
# ? May 13, 2014 21:02 |
https://github.com/aspnet
|
|
# ? May 13, 2014 21:09 |
|
Shaggar posted:if the databases are on the same server, union the tables across all the databases.
different servers
|
# ? May 13, 2014 21:21 |
|
Soricidus posted:that's not an immutable data structure
|
# ? May 13, 2014 21:32 |
|
Sweeper posted:so i have a data set where it is a string and a count for some number of sets. i basically want to find the top 10 things by count in a union of all these sets. the problem is this data is very large and exists in multiple databases.... what should i be looking for? do i actually need to look through the entirety of each set to guarantee that nothing is missed?
hadoop / cascading
|
# ? May 13, 2014 21:42 |
|
seriously though, counting things is the "hello world" for hadoop/cascading. write an etl package to dump everything to hdfs or s3, spin up a few nodes, whomp.
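a minimal single-process sketch of the "count things" job described above, assuming each database can be dumped as a string-to-count mapping. the `shards` data and the `map_shard` / `merge` names are made up for illustration, this is not hadoop or cascading code:

```python
from collections import Counter
from functools import reduce

# Hypothetical dumps: one string -> count mapping per database.
shards = [
    {"apple": 5, "pear": 2, "kiwi": 1},
    {"apple": 1, "pear": 7, "plum": 4},
    {"kiwi": 6, "plum": 2},
]

def map_shard(rows):
    """Map step: turn one shard's rows into a partial Counter."""
    return Counter(rows)

def merge(left, right):
    """Reduce step: add two partial Counters key by key."""
    return left + right

# Every row of every shard is read exactly once, so the result is exact.
totals = reduce(merge, map(map_shard, shards), Counter())
top_two = totals.most_common(2)
print(top_two)
```

the merge is associative, which is why a framework can run the map step on many nodes and combine the partial counts in any order.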
|
# ? May 13, 2014 21:43 |
|
Brain Candy posted:totally is, if nobody gets to alter the backing set. maybe you think persistent is the same as immutable?
it solves one problem, i.e. passing someone a collection without copying it and knowing they won't modify it. but it doesn't solve other problems, such as being passed a collection without copying it and knowing the caller won't modify it.
|
# ? May 13, 2014 22:29 |
|
Kevin Mitnick P.E. posted:java gets it right. except i guess in the case that you give command line arguments that can't be decoded by the platform default codec in which case you've earned whatever hell you find yourself in
two words: surrogate characters
|
# ? May 14, 2014 01:46 |
|
As long as humans continue to exist, data will continue to be messy, and attempts to say "this is text and this is bytes" are naive and unrealistic, no matter how firmly you say it. Python 3's Unicode support is truly embarrassing.
|
# ? May 14, 2014 02:04 |
|
Soricidus posted:no, i think "if nobody gets to alter the backing set" is a very big if.
you know it's immutable because that's what it said in the documentation/annotations. as long as non-language designers get to make new data structures, that's about where you are going to end up, +/- pretend guarantees like const
everything you point out about the data structure also applies to the things it contains. if you want to get a stronger guarantee than this, you'll need to stop people from making mutable things period
|
# ? May 14, 2014 02:06 |
|
Suspicious Dish posted:As long as humans continue to exist, data will continue to be messy, and attempts to say "this is text and this is bytes" are naive and unrealistic, no matter how firmly you say it.
How long is the data here? Is it 4 bytes? Or is it 2 16-bit characters?
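To make the "4 bytes or 2 16-bit characters" question concrete, here is a small sketch. The character U+1F600 is an assumption picked for illustration (any code point above U+FFFF behaves the same way); the calls are plain Python 3 standard library:

```python
import struct

# U+1F600 lies outside the Basic Multilingual Plane.
s = "\U0001F600"

code_points = len(s)                    # Python 3 str counts code points
utf8_bytes = len(s.encode("utf-8"))     # bytes on the wire as UTF-8
utf16_bytes = len(s.encode("utf-16-le"))
utf16_units = utf16_bytes // 2          # 16-bit code units, as Java's char sees it

# The two UTF-16 code units are a surrogate pair.
high, low = struct.unpack("<2H", s.encode("utf-16-le"))

print(code_points, utf8_bytes, utf16_units, hex(high), hex(low))
```

All of the answers are right depending on the unit: one code point, four bytes, or two 16-bit code units.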
|
# ? May 14, 2014 02:35 |
|
Notorious b.s.d. posted:seriously though, counting things is the "hello world" for hadoop/cascading.
too slow it is for a UI, no? It is also filterable so it's in a relational database atm. whoever did the setup decided that when they needed more space they would just stand up another pair of databases (master/slave) and now we have 7 pairs of databases and I want to be able to do joins on the data across them
like I could have just dumped the data into dynamo and done scans on it, but it ends up being super loving expensive and a poo poo ton of full table scans to count the counts for every column in the database.
|
# ? May 14, 2014 03:23 |
|
Brain Candy posted:you know it's immutable because that's what it said in the documentation/annotations. as long as non-language designers get to make new data structures, that's about where you are going to end up, +/- pretend guarantees like const
you'd know it was immutable if you used a good programming language where everything is immutable unless explicitly specified otherwise
|
# ? May 14, 2014 03:33 |
|
Sweeper posted:too slow it is for a UI, no? It is also filterable so it's in a relational database atm. whoever did the setup decided that when they needed more space they would just stand up another pair of databases (master/slave) and now we have 7 pairs of databases and I want to be able to do joins on the data across them
there exist sharding proxies for mysql (tungsten) and postgres (pgpool) that let you use a subset of sql in parallel... but they're a pain in the butt
if your data is small enough, the cheapest/best answer is probably gonna be a DB server that is 10x as big. off the shelf x86 will do several TB of RAM now
Sweeper posted:like I could have just dumped the data into dynamo and done scans on it, but it ends up being super loving expensive and a poo poo ton of full table scans to count the counts for every column in the database.
by definition a precise count requires a table scan. that's why map/reduce frameworks are good at counting: guaranteed ceiling on worst-case time and it is a simple aggregate function
you can use indices to get estimated counts on postgres/oracle but they're only estimates
|
# ? May 14, 2014 04:30 |
|
if you can make any assumptions about the data you could probably whip up an estimator that was good to within an acceptable confidence interval in less execution time but this would probably take a lot more effort than just querying 7 databases
|
# ? May 14, 2014 04:44 |
|
Notorious b.s.d. posted:there exist sharding proxies for mysql (tungsten) and postgres (pgpool) that let you use a subset of sql in parallel... but they're a pain in the butt
i'm thinking i could stop query after some count < x where x is probably the lowest in the top 10 counts. i was hoping for something simple some smart math person came up with where i could be like 85% sure or w/e that there won't be any lower summed counts than that
no one is going to green light my one giant db plan sadly
|
# ? May 14, 2014 05:08 |
|
Sweeper posted:i'm thinking i could stop query after some count < x where x is probably the lowest in the top 10 counts. i was hoping for something simple some smart math person came up with where i could be like 85% sure or w/e that there won't be any lower summed counts than that
if estimates are good enough, take a random sample and work with that data instead. i'm no ace at statistics, but a sample of 1,000 adults is good enough to make high-confidence estimates about the behavior of 100 million voters, so a little thought could go a long way
|
# ? May 14, 2014 05:19 |
|
Sweeper posted:i'm thinking i could stop query after some count < x where x is probably the lowest in the top 10 counts. i was hoping for something simple some smart math person came up with where i could be like 85% sure or w/e that there won't be any lower summed counts than that
that might work, id probably do a two pass thing where if you want the top 10 overall, take the top 50 (+/-) strings from each db, and then do a second pass where you found the counts of each of the strings from the first pass and took the top 10 of those
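a rough sketch of the two pass idea above, assuming each db can answer "give me your top N strings" and "what is your count for this string". `two_pass_top`, `per_shard` and the toy `shards` are hypothetical names for illustration:

```python
from collections import Counter

def two_pass_top(shards, k, per_shard):
    """Heuristic top-k over several string -> count mappings."""
    # Pass 1: collect the locally biggest strings from every shard.
    candidates = set()
    for shard in shards:
        local_top = Counter(shard).most_common(per_shard)
        candidates.update(name for name, _ in local_top)
    # Pass 2: sum the exact count of every candidate across all shards.
    totals = Counter()
    for shard in shards:
        for name in candidates:
            totals[name] += shard.get(name, 0)
    return totals.most_common(k)

# "c" is never first in any shard but has the largest sum overall.
shards = [{"a": 10, "c": 9}, {"b": 11, "c": 9}]

too_narrow = two_pass_top(shards, k=1, per_shard=1)
wide_enough = two_pass_top(shards, k=1, per_shard=2)
print(too_narrow, wide_enough)
```

the counts that come back are exact for every candidate, but a string that sits just below the cutoff in every db never becomes a candidate, so the result is only as good as the choice of `per_shard`.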
|
# ? May 14, 2014 05:45 |
|
Notorious b.s.d. posted:if estimates are good enough, take a random sample and work with that data instead.
depending on how the counts are distributed and how big the samples are id be worried that you wouldn't be able to tell #10 from #11 (or #5 from #30) very reliably, if there's lots of strings the frequencies of any individual one are probably gonna be pretty low, and then you're in the situation of estimating small binomial probabilities which is not where i personally like being
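a back-of-the-envelope sketch of why small frequencies are the awkward case, using the usual binomial standard error sqrt(p(1-p)/n) for a sampled proportion. the 0.1% frequency is an assumed number for illustration, not something taken from the actual data:

```python
import math

def relative_error(p, n):
    """Standard error of a sampled proportion, as a fraction of p itself."""
    return math.sqrt(p * (1 - p) / n) / p

poll = relative_error(0.5, 1000)      # a 50/50 question asked of 1,000 people
rare = relative_error(0.001, 1000)    # a string that makes up 0.1% of the rows

print(round(poll, 4), round(rare, 4))
```

with the same 1,000 samples the poll is off by about 3% of its value, while the rare string's estimate is off by roughly 100% of its value, which is too noisy to rank neighbours like #10 and #11.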
|
# ? May 14, 2014 05:48 |
|
Booblord Zagats posted:I figure I should share a story my dad told me about idiot contractors. Not sure If I've told it on here before, but since he just told it again the other night when we were talking to him, I'll go ahead and tell it again because it's a good story.
|
# ? May 14, 2014 06:10 |
|
Brain Candy posted:everything you point out about the data structure also applies to the things it contains. if you want to get a stronger guarantee than this, you'll need to stop people from making mutable things period
but that doesn't alter the only fact i'm arguing, which is that java does not provide immutable collections. all it provides is a way to create an immutable view of a mutable collection, which is plang level poo poo. it's not even visible in the type system, let alone statically checked.
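the view-vs-immutable distinction can be sketched outside java too. this is an analogy, not the java api: it assumes python's `types.MappingProxyType` as a stand-in for an unmodifiable wrapper and `frozenset` as a stand-in for an actual immutable copy:

```python
from types import MappingProxyType

backing = {"a": 1}
view = MappingProxyType(backing)   # read-only view, no copy is made
snapshot = frozenset(backing)      # immutable copy of the keys

# Writing through the view is rejected...
try:
    view["b"] = 2
    view_rejects_writes = False
except TypeError:
    view_rejects_writes = True

# ...but whoever still holds the backing dict can change what the view shows.
backing["b"] = 2
seen_through_view = dict(view)

print(view_rejects_writes, seen_through_view, sorted(snapshot))
```

the holder of `view` cannot write, but also cannot rely on the contents staying put; only `snapshot` gives that guarantee, and it paid for a copy to get it.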
|
# ? May 14, 2014 08:26 |
|
Damiya posted:js owns and node remains a great platform for scripts and tasks..
typescript is cool
|
# ? May 14, 2014 08:46 |
|
Soricidus posted:yes? i agree. java already did this for strings and it worked out really well. taking it further would be great.
are you an objc bro *tries to do the secret handshake with u*
|
# ? May 14, 2014 08:47 |
|
Brain Candy posted:totally is, if nobody gets to alter the backing set. maybe you think persistent is the same as immutable?
I can't believe a clanger like me has to point this out, but that's totally not an immutable data structure. an immutable data structure has mutator methods, except instead of mutating the object they return a new, immutable object with the result of the mutation, which you can then operate on, or compare-and-swap with the original to make the result globally visible (the GC will then handle the clean-up of the original object, and as a clanger I'm terribly envious of this). usually, the structure of a true immutable data structure is specifically optimized for this kind of copy-on-write mutation, so that multiple objects resulting from the mutation of a common object can share most or all of the data that wasn't affected by the mutation. this kind of structure tends to look an awful lot like the call tree of a recursive algorithm, frozen in memory, and that's why immutable data structures are a gateway drug to functional programming
(flangers please don't take offense at my very clangerish view of immutable data structures)
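a minimal sketch of the "mutator returns a new object and shares the rest" idea, using an unbalanced binary search tree with path copying. `Node`, `insert` and `keys` are names made up for this example, and a real library would use a balanced tree or a trie:

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    """Immutable tree node; NamedTuple fields cannot be reassigned."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(node, key):
    """Return a tree that contains key; the input tree is left untouched."""
    if node is None:
        return Node(key)
    if key < node.key:
        return node._replace(left=insert(node.left, key))
    if key > node.key:
        return node._replace(right=insert(node.right, key))
    return node  # key already present: reuse the whole subtree

def keys(node):
    """In-order traversal, returns the keys in sorted order."""
    if node is None:
        return []
    return keys(node.left) + [node.key] + keys(node.right)

old = None
for k in (5, 2, 8):
    old = insert(old, k)

new = insert(old, 9)   # copies only the path 5 -> 8, then adds 9

print(keys(old), keys(new), new.left is old.left)
```

only the nodes on the path from the root to the insertion point are copied; the untouched left subtree is the very same object in both versions, which is the sharing described above.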
|
# ? May 14, 2014 09:16 |
|
hackbunny posted:I can't believe a clanger like me has to point this out, but that's totally not an immutable data structure. an immutable data structure has mutator methods, except instead of mutating the object they return a new, immutable object with the result of the mutation, which you can then operate on, or compare-and-swap with the original to make the result globally visible (the GC will then handle the clean-up of the original object, and as a clanger I'm terribly envious of this). usually, the structure of a true immutable data structure is specifically optimized for this kind of copy-on-write mutation, so that multiple objects resulting from the mutation of a common object can share most or all of the data that wasn't affected by the mutation. this kind of structure tends to look an awful lot like the call tree of a recursive algorithm, frozen in memory, and that's why immutable data structures are a gateway drug to functional programming
yeah this is pretty much it
CoW is so amazingly powerful, systems peeps have been using it forever w.r.t virtual memory/fork etc
that + TCO basically making recursion costless is great
|
# ? May 14, 2014 09:44 |
|
i can see why j-langers don't like immutability if the only experience they have with it is mutable containers except with the setters removed
|
# ? May 14, 2014 10:03 |
|
everything should be immutable starting with this thread
|
# ? May 14, 2014 10:17 |
|
ComradeCosmobot posted:two words: surrogate characters
just stream over .codePoints() no problem
Malcolm XML posted:yeah this is pretty much it
yeah, costless except everything is a tree and causes a page fault
|
# ? May 14, 2014 11:02 |
|
i'd have to dig up the post i made like a hundred pages ago about doing reference counting by doing lifetime management on the collections level to make this post complete, but one of the niceties of that type of approach is that you can have the collections mutate if they only have a single reference, while maintaining the appearance of immutability in all other cases. makes for a solid optimization while keeping things nice and simple
|
# ? May 14, 2014 11:17 |
|
Cybernetic Vermin posted:i'd have to dig up the post i made like a hundred pages ago about doing reference counting by doing lifetime management on the collections level to make this post complete, but one of the niceties of that type of approach is that you can have the collections mutate if they only have a single reference, while maintaining the appearance of immutability in all other cases. makes for a solid optimization while keeping things nice and simple
isnt that what rust is doing?
i wonder how rust would handle memory alignment and packing stuff that high performance software want
|
# ? May 14, 2014 12:27 |
|
lol
|
# ? May 14, 2014 12:43 |
|
Soricidus posted:yes? i agree. java already did this for strings and it worked out really well. taking it further would be great.
below is the definition of persistent data structures which everybody has been calling immutable for some reason. it's an immutable view of a mutable collection w. idempotence.
hackbunny posted:I can't believe a clanger like me has to point this out, but that's totally not an immutable data structure. an immutable data structure has mutator methods, except instead of mutating the object they return a new, immutable object with the result of the mutation, which you can then operate on, or compare-and-swap with the original to make the result globally visible (the GC will then handle the clean-up of the original object, and as a clanger I'm terribly envious of this). usually, the structure of a true immutable data structure is specifically optimized for this kind of copy-on-write mutation, so that multiple objects resulting from the mutation of a common object can share most or all of the data that wasn't affected by the mutation. this kind of structure tends to look an awful lot like the call tree of a recursive algorithm, frozen in memory, and that's why immutable data structures are a gateway drug to functional programming
|
# ? May 14, 2014 12:45 |
|
for my next trick i will get mad when somebody says you can't use recursion in a lang without TCO
|
# ? May 14, 2014 14:29 |
|
how about environments that have flaky TCE and detonate your stack when you expected you'd be fine
|
# ? May 14, 2014 14:31 |
|
Otto Skorzeny posted:how about environments that have flaky TCE and detonate your stack when you expected you'd be fine
this never happens if you use this haskell extension
|
# ? May 14, 2014 16:33 |
|
Went ahead and published a new article in Xplain: http://magcius.github.io/xplain/article/window-tree.html It doesn't contain everything I'd like it to contain, namely the WM part of it, but I figured you guys might appreciate it.
|
# ? May 14, 2014 22:54 |
|
oh god I hope this hasn't been posted yet http://codeofrob.com/entries/you-have-ruined-javascript.html
quote:To configure the dealer, all we have to do is
|
# ? May 15, 2014 10:06 |
|
Max Facetime posted:oh god I hope this hasn't been posted yet http://codeofrob.com/entries/you-have-ruined-javascript.html
|
# ? May 15, 2014 10:18 |