Share Bear
Apr 27, 2004

Shaggar posted:

json schema is an oxymoron

which is why we're proud to introduce mongoDb, it's webscale

Arcsech
Aug 5, 2008

Share Bear posted:

which is why we're proud to introduce mongoDb, it's webscale

does webscale just mean "ok to use for your startup ~*~unicorn~*~ because you'll be dead in a year anyway"

Valeyard
Mar 30, 2012


Grimey Drawer
You should use JSON for your closely related inter-app communication, and use XML for anything that's being treated as an "external source" to the receiver

Bloody
Mar 3, 2013

you should never use json

Valeyard
Mar 30, 2012


Grimey Drawer
Yeah use YAML and SOAP

Valeyard
Mar 30, 2012


Grimey Drawer
I need to deal with YAML too often

Arcsech
Aug 5, 2008
learning about cassandradb since we use it in one of our products

holy poo poo this is pretty badass. like people talk about "scalability" a lot but this is the real deal*

i mean look at this poo poo:


(*as long as your data is mostly/exclusively append-only. if you want to update or delete then things get kind of... fuzzy**. fortunately for us we're basically logging and reading a shitload of realtime statistics so updates/deletes happen approximately never)

(**this link is about a now-very-old version of cassandra, and im too stupid to know whether any of those things have changed.)

Baxate
Feb 1, 2011

Valeyard posted:

Yeah use YAML and SOAP

YAML doesn't support hard tabs so i don't use it
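(for reference, PyYAML makes the tab thing concrete; a minimal sketch, assuming the PyYAML package:)

code:
import yaml  # PyYAML

print(yaml.safe_load("key:\n  nested: 1"))   # spaces for indentation: fine

try:
    yaml.safe_load("key:\n\tnested: 1")      # a hard tab in the indentation
except yaml.YAMLError as exc:
    print(exc)  # scanner error: yaml won't accept tabs as indentation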

fritz
Jul 26, 2003


it almost makes me wish they had used matlab instead

Bloody
Mar 3, 2013

a million writes per second doesn't sound like very many

Toady
Jan 12, 2009


there's precedent for that change

JawnV6
Jul 4, 2004

So hot ...
BLE switched to "central/peripheral"

kitten emergency
Jan 13, 2008

get meow this wack-ass crystal prison
yaml sucks

kitten emergency
Jan 13, 2008

get meow this wack-ass crystal prison
i fixed a three year old hacky workaround by changing one line of code today

redleader
Aug 18, 2005

Engage according to operational parameters

Shaggar posted:

you can create maps like $aThing = @{ SomeProp="fffff"; whatever="guuuuhhhhh" } and then $aThing.SomeProp or $aThing.whatever. I haven't done much diving into the syntax cause I was doing something pretty simple.

nice, didn't know you could access keys of hash tables like that

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

Arcsech posted:

learning about cassandradb since we use it in one of our products

holy poo poo this is pretty badass. like people talk about "scalability" a lot but this is the real deal*

i mean look at this poo poo:


(*as long as your data is mostly/exclusively append-only. if you want to update or delete then things get kind of... fuzzy**. fortunately for us we're basically logging and reading a shitload of realtime statistics so updates/deletes happen approximately never)

(**this link is about a now-very-old version of cassandra, and im too stupid to know whether any of those things have changed.)

you need almost 300 "nodes" to perform a million writes per second?

fritz
Jul 26, 2003

i still cant get over how terrible a name 'cassandra' is for a data store

tef
May 30, 2004

-> some l-system crap ->

Arcsech posted:

learning about cassandradb since we use it in one of our products

things i know about cassandra: no-one i know who worked with cassandra at their last job is looking to work with it at their next one. also, they were all doing operations. the other problem is that you're hosed for a better out-of-the-box product for heavy read/write workloads

Zaxxon
Feb 14, 2004

Wir Tanzen Mekanik

Arcsech posted:

learning about cassandradb since we use it in one of our products

holy poo poo this is pretty badass. like people talk about "scalability" a lot but this is the real deal*

i mean look at this poo poo:


(*as long as your data is mostly/exclusively append-only. if you want to update or delete then things get kind of... fuzzy**. fortunately for us we're basically logging and reading a shitload of realtime statistics so updates/deletes happen approximately never)

(**this link is about a now-very-old version of cassandra, and im too stupid to know whether any of those things have changed.)

Just don't fool yourself into thinking cassandra actually works like a database. CQL is very deceptive.

tef
May 30, 2004

-> some l-system crap ->
i have been learning about cassandra

the lore passed down to me is that cassandra works as long as you balance your node ring by hand, keep your clocks tightly in sync, and never use counters. it's really easy to do bad things in cassandra.

the best way to use cassandra is to denormalize the gently caress out of everything, and by that i mean: instead of writing to different tables for each model and joining them together to get results, have a whole bunch of tables that represent the results you're interested in and write to every one of them on insert. you wouldn't have an artists table and an albums table and a tracks table, you'd have a tracks_artist table, a tracks_album table, and a tracks_tracks table, and write "track, artist, album" to each of them, but using a different column as the primary key.
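a minimal sketch of that pattern with the datastax python driver (the table names are the ones above; the keyspace, column types, and local node are assumptions for illustration):

code:
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("music")

# same three columns in every table, different primary key per table, because
# a cassandra table can only be queried efficiently by its key
for ddl in (
    "CREATE TABLE IF NOT EXISTS tracks_artist (artist text, album text, track text, PRIMARY KEY (artist, album, track))",
    "CREATE TABLE IF NOT EXISTS tracks_album  (album text, artist text, track text, PRIMARY KEY (album, artist, track))",
    "CREATE TABLE IF NOT EXISTS tracks_tracks (track text, artist text, album text, PRIMARY KEY (track, artist, album))",
):
    session.execute(ddl)

def insert_track(artist, album, track):
    # one logical insert fans out to one physical write per query table, so
    # each read later hits exactly one table by its own key
    session.execute("INSERT INTO tracks_artist (artist, album, track) VALUES (%s, %s, %s)",
                    (artist, album, track))
    session.execute("INSERT INTO tracks_album (album, artist, track) VALUES (%s, %s, %s)",
                    (album, artist, track))
    session.execute("INSERT INTO tracks_tracks (track, artist, album) VALUES (%s, %s, %s)",
                    (track, artist, album))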

there is no other way to be able to search efficiently by artist/album/track than duplication. that is the way in which cassandra scales: writes are more costly than reads. and herein lies the problem: you have no guarantees, when you write all those copies, that all (or any) of them will be visible at the same time.

there's a couple of problems you can have here, but a lot of it is that nothing is atomic. you do a read and find old and new versions of the same key. you can search through a secondary index and not find the key it points to. you'll have problems with deleting things, because eventual consistency will try real hard to bring them back. you might have problems with rebalancing. you'll also get things where count(unread) = 1 but unread = {}. consistency is really hard even if you're just append-only. even restoring is hard because you're not sure what data you lost.

this is fine for analytics or approximate data, and cassandra does have "lightweight" transactions of a multiround paxos variant, but most of the time you'll be trading consistency for speed, and it's a big tradeoff

quote:

fortunately for us we're basically logging and reading a shitload of realtime statistics

the only real problem is realtime. cassandra is not really good for stream processing, and there's a lot of real interesting work around timeseries databases (and some stuff by facebook, which has a whole bunch of neat tricks), but you can sorta plug that stuff atop cassandra anyhow.

:shrug: the best thing cassandra is good for is analytic queries over large datasets which get transformed, exported, or thrown away. it's great for event logs.

but do people use cassandra like this? do they gently caress

Arcsech
Aug 5, 2008

tef posted:

the best way to use cassandra is to denormalize the gently caress out of everything, and by that i mean, instead of writing to different tables for each model and joining them together to get results, have a whole bunch of tables that represent the results you're interested in and write to every table required when you insert. you wouldn't have an artists table and a albums table and a tracks table, you'd have a tracks_artist table, a tracks_album, a tracks_tracks table, and write "track, artist, album" to each of them, but using a different column as the primary key.

:shrug: the best thing cassandra is good for is analytic queries over large datasets which get transformed, exported, or thrown away. it's great for event logs.

but do people use cassandra like this? do they gently caress

actually that is exactly what we do, both the "denormalize like gently caress" and the analytic queries, I just phrased it really drat poorly.

What we do is log per-minute statistics derived from (lots and lots of) sensor data to do queries/analysis/monitoring on, then we give the option to pull out the detailed sensor data for closer inspection based on the statistics queries
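a guess at what a per-minute stats table like that might look like (table, columns, and values here are all invented; it just shows the time-bucketed, append-only shape):

code:
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("telemetry")

# one partition per sensor per day; rows within it only ever get appended,
# which is the workload cassandra is happiest with
session.execute("""
    CREATE TABLE IF NOT EXISTS stats_by_sensor_day (
        sensor_id text,
        day       text,
        minute    timestamp,
        metric    text,
        value     double,
        PRIMARY KEY ((sensor_id, day), minute, metric)
    )""")

session.execute(
    "INSERT INTO stats_by_sensor_day (sensor_id, day, minute, metric, value) "
    "VALUES (%s, %s, %s, %s, %s)",
    ("sensor-17", "2015-12-05", "2015-12-05 14:52:00+0000", "avg_temp", 21.4))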

We also have a wrapper around it, built by the senior dev, that enforces doing the right thing so the rest of us morons don't gently caress poo poo up.

Also that's all we use it for, the product also uses two other databases (one relational, one nosql) for config and account data

tef
May 30, 2004

-> some l-system crap ->
well tbh it sounds like you've already got 90% of cassandra

can you explain to me what is up with this

http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0

can anyone

the talent deficit
Dec 20, 2003

self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture

between ansible, cloudformation and swagger basically all i write is yaml

the talent deficit
Dec 20, 2003

self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture

Arcsech posted:

actually that is exactly what we do, both the "denormalize like gently caress" and the analytic queries, I just phrased it really drat poorly.

What we do is log per-minute statistics derived from (lots and lots of) sensor data to do queries/analysis/monitoring on, then we give the option to pull out the detailed sensor data for closer inspection based on the statistics queries

We also have a wrapper around it, built by the senior dev, that enforces doing the right thing so the rest of us morons don't gently caress poo poo up.

Also that's all we use it for, the product also uses two other databases (one relational, one nosql) for config and account data

we replaced riak with cassandra for a similar workload then replaced that with redshift + a poo poo ton of ram and an lru cache. redshift + cache utterly dominates cassandra in perf for our read load (where 99% of queries involve less than 1% of the table) and we can afford to batch load to redshift so write throughput is better too

it also costs us like $4k less a month
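roughly the shape of that setup, as a sketch (connection details, table, and cache size are hypothetical; redshift speaks the postgres wire protocol, so psycopg2 works against it):

code:
import functools
import psycopg2

conn = psycopg2.connect(host="warehouse.example.com", dbname="analytics", user="readonly")

@functools.lru_cache(maxsize=1_000_000)
def lookup(key):
    # only a cache miss touches redshift; if 99% of queries hit <1% of the
    # table, almost everything is served out of the in-process cache
    with conn.cursor() as cur:
        cur.execute("SELECT value FROM events WHERE key = %s", (key,))
        row = cur.fetchone()
    return row[0] if row else None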

Arcsech
Aug 5, 2008

I'm still reeling from my transition from embedded c to this so, no. I also don't fully understand the jepsen article I linked to, tbh

Also this is a product we sell licenses of (or sell preloaded on a set of servers as an appliance), so we don't actually run any cassandra instances ourselves except for dev and test environments. Our customers buy hardware or cloud hosting and our SEs do ops on the customer's hosting, basically (to my understanding)

I guess part of the reason for going with cassandra was that our customers seem to have the shittiest, slowest disks in existence and apparently cassandra is good at covering for shitass hardware

by this point you could probably figure out where I work now, so I should probably stop posting

tef
May 30, 2004

-> some l-system crap ->
like "Fortunately, you can also achieve strong consistency in a fully distributed, masterless system like Cassandra with quorum reads and writes", assuming every clock on the network is in sync at all times, but yeah https://issues.apache.org/jira/browse/CASSANDRA-6178 (spoiler: no, you can't) and like, woo, i guess, paxos, even if it's also twice as many rounds as paxos. it's lightweight, i guess.

but yeah, the first new round is for committing to disk, so what, this is a 3-phase thing? but it doesn't seem to take locks, so uh. yes. and then another phase to do a read then a write operation, instead of a compare-and-swap

why does it need a read and a write phase? like, if paxos is working, propose a CAS and if a majority accept it, it'll pass. but if you don't do a quorum read you won't get an accurate read, and uh, different leaders may have different timestamps. a leader replaced by another server with a slow clock won't be able to write.

and oh, what, there's a ps: "ConsistencyLevel.SERIAL has been added to allow reading the current (possibly un-committed) Paxos state without having to propose a new update. If a SERIAL read finds an uncommitted update in progress, it will commit it as part of the read."

that's right, for the isolation mode "commit any uncommitted data read in read order (?)", but paxos, yay

paxos is clever, but it doesn't like contention, so i'll 'read the source for how we handle errors'

code:
     *  Note that since we are performing a CAS rather than a simple update, we perform a read (of committed
     *  values) between the prepare and accept phases.  This gives us a slightly longer window for another
     *  coordinator to come along and trump our own promise with a newer one but is otherwise safe.
so yeah
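for contrast, this is all the client actually sees of an LWT (python driver, hypothetical table; the prepare/read/propose/commit rounds being picked apart above all happen server-side behind the IF clause):

code:
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("app")

# plain write: last timestamp wins, no isolation, no paxos
session.execute("UPDATE accounts SET balance = 90 WHERE id = 1")

# compare-and-set: only applied if the condition still holds when paxos commits
result = session.execute(
    "UPDATE accounts SET balance = 90 WHERE id = 1 IF balance = 100")
row = result.one()
print(row[0])  # first column is the [applied] flag; False means another
               # coordinator won the race, and the row echoes the current value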

tef
May 30, 2004

-> some l-system crap ->

the talent deficit posted:

between ansible, cloudformation and swagger basically all i write is yaml

i'm sorry

tef
May 30, 2004

-> some l-system crap ->

the talent deficit posted:

we replaced riak with cassandra for a similar workload then replaced that with redshift + a poo poo ton of ram and an lru cache. redshift + cache utterly dominates cassandra in perf for our read load (where 99% of queries involve less than 1% of the table) and we can afford to batch load to redshift so write throughput is better too

it also costs us like $4k less a month

redshift is MPP too.

DONT THREAD ON ME
Oct 1, 2002

by Nyc_Tattoo
Floss Finder
holy gently caress i just spent like an entire day trying to understand how webpack works. docs are confusing rear end gently caress holy poo poo

brap
Aug 23, 2004

Grimey Drawer
webpack is a bitch. I just downloaded someone's project skeleton that had the poo poo set up how I needed it.

FamDav
Mar 29, 2008

MALE SHOEGAZE posted:

so i got pretty deep into json schema and holy poo poo it's garbage and every implementation behaves differently.

every implementation works differently because the spec is written such that the implementation details of, say, arrays are split across 4-5 different sections of the spec. if you aren't considering all of these sections at the same time, you will write something incorrect. it also has features like anonymous ADTs that are seldom used and difficult to represent in statically typed languages (i spent some time hacking on a model generator for it).

also tuples and arrays are defined using the same keyword but then use ~3 boolean properties to determine how tuple-y or array-y the thing is
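a small illustration of that overloading with the python jsonschema package (draft-04 era semantics, where "items" does double duty and "additionalItems" decides how strict the tuple is):

code:
from jsonschema import Draft4Validator

# "array-y": items is a single schema applied to every element
array_schema = {"type": "array", "items": {"type": "integer"}}

# "tuple-y": same keyword, but a list of schemas applied by position
tuple_schema = {
    "type": "array",
    "items": [{"type": "string"}, {"type": "integer"}],
    "additionalItems": False,
}

print(Draft4Validator(array_schema).is_valid([1, 2, 3]))        # True
print(Draft4Validator(tuple_schema).is_valid(["id", 42]))       # True
print(Draft4Validator(tuple_schema).is_valid(["id", 42, "x"]))  # False: extra element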

FamDav
Mar 29, 2008
if you want something that allows you to take arbitrary json and apply some schema to it, no matter how crazy or abstract that schema might be, then json schema is for you

if you want something that gives you a schema for json that is mappable into any programming language you would use, then json schema is probably not for you

~Coxy
Dec 9, 2003

R.I.P. Inter-OS Sass - b.2000AD d.2003AD

Soricidus posted:

i'm sure it's great for low-level c programming. what i don't understand is why does something like that get used when my .net code fails an assertion when running within the visual studio 2015 debugger? i don't get why it's using ancient low-level c runtime functionality instead of just raising an exception or something like everything else in .net. idk it's not a big deal it's just ... weird

press control+alt+E and turn break on exception back on

Mahatma Goonsay
Jun 6, 2007
Yum

MALE SHOEGAZE posted:

holy gently caress i just spent like an entire day trying to understand how webpack works. docs are confusing rear end gently caress holy poo poo

yeah it is super confusing to get set up, but is pretty great once it works. if you are using react the hot module replacement is basically magic.

Mahatma Goonsay fucked around with this message at 14:52 on Dec 5, 2015

Vanadium
Jan 8, 2005

p sure there's a bug in gtk# that makes it go "(foo:5377): GLib-CRITICAL **: Source ID 6 was not found when attempting to remove it" after every non-recurring timeout :smith:

Bloody
Mar 3, 2013

visual studio updated and now it just doesnt work at all

lol

:rip:

MrMoo
Sep 14, 2000

Some craziness today: Node.js can now run on Microsoft's Chakra JavaScript engine, since they open-sourced it. And from some crappy image host:

quote:

Imgix serves 1 billion images per day. Resizing an image takes ~700ms. They process images on Mac Pros using Core Image to handle graphics processing in their own datacenter. Some of the tools they use are HAProxy, Heka (logs), Prometheus, Graphite, Riemann, C, Objective-C, and Lua.

https://scaleyourcode.com/interviews/interview/19

Someone is actually using Julia:

quote:

Federal Reserve Bank of NY converts major economic model to Julia. Bret Victor: Here’s an opinion you might not hear much — I feel that one effective approach to addressing climate change is contributing to the development of Julia. Julia is a modern technical language, intended to replace Matlab, R, SciPy, and C++ on the scientific workbench. It’s immature right now, but it has beautiful foundations, enthusiastic users, and a lot of potential.

https://www.reddit.com/r/programming/comments/3vb5rz/federal_reserve_bank_of_ny_converts_major/
http://worrydream.com/ClimateChange/

MrMoo fucked around with this message at 17:14 on Dec 5, 2015

Soricidus
Oct 21, 2010
freedom-hating statist shill

Bloody posted:

visual studio updated and now it just doesnt work at all

lol

:rip:

join the club

uninstalling, deleting localappdata, and then installing the updated version from scratch worked for me

apart from that hiccup, though, my experience of visual studio has been that shaggar was right

JawnV6
Jul 4, 2004

So hot ...
we just started using cassandra

it replaced mongo

MeruFM
Jul 27, 2010
we use hadoop with hive on top

according to the cassandra site, cassandra is way better though
