DENORMALIZE YOURSELF AND FACE TO WEB SCALE. STATE IS A FUCK

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > YOSPOS > DENORMALIZE YOURSELF AND FACE TO WEB SCALE. STATE IS A FUCK

«‹›7 »

git apologist: Jun 4, 2003

Roosevelt posted:

mongodb ftw

actually it�s quite bad

# ? Mar 3, 2024 05:29

Adbot: ADBOT LOVES YOU

# ? May 3, 2024 00:49

git apologist: Jun 4, 2003

what a perfect snipe for yospos

# ? Mar 3, 2024 05:29

Shaggar: Apr 26, 2006

git apologist posted:

shaggar says induced demand is fictional

for roadways it is. the correct term in that case is unrealized demand

# ? Mar 3, 2024 05:36

psiox: Oct 15, 2001; Babylon 5 Street Team

mongo is such amateur (derogatory) trash and i miss riak being a real thing

# ? Mar 3, 2024 05:39

fart simpson: Jul 2, 2005; DEATH TO AMERICA

Elder Postsman posted:

i use postgres at home, and most of the stuff i've written that touches it uses entity framework

why are you database touching at home???

# ? Mar 3, 2024 05:40

Elder Postsman: Aug 30, 2000; i used hot bot to search for "teens"

i use it to keep track of dnr fishing surveys, and also to track airplanes so i can get a notification when interesting ones are flying nearby

# ? Mar 3, 2024 05:45

rotor: Jun 11, 2001; classic case of pineapple derangement syndrome

git apologist posted:

shaggar says induced demand is fictional

shaggar is a loveable scamp

# ? Mar 3, 2024 06:11

Scud Hansen: Dec 13, 2015; Darkness and Evil

Elder Postsman posted:

so i can get a notification when interesting planes are flying nearby

hmm ive been doing this manually like a loving fool

# ? Mar 3, 2024 06:14

Powerful Two-Hander: Mar 10, 2004; Mods please change my name to "Tooter Skeleton" TIA.

I'm about to have an argument that SQL is good, actually, because every time databases come up someone says "microservices!" As if that is somehow a substitute for ACID

# ? Mar 3, 2024 11:42

git apologist: Jun 4, 2003

sql is good for almost everything, you should qualify out of that first before building a rube goldberg app

# ? Mar 3, 2024 11:57

Subjunctive: Sep 12, 2006; ✨sparkle and shine✨

Powerful Two-Hander posted:

I'm about to have an argument that SQL is good, actually, because every time databases come up someone says "microservices!" As if that is somehow a substitute for ACID

SQL isn�t good, but that has exactly zero to do with microservices being bad, which they are.

and SQL doesn�t always get you ACID, and you can have ACiD without SQL

# ? Mar 3, 2024 12:08

Cybernetic Vermin: Apr 18, 2005

sql, query planning, and acid are all rather separate things though.

sql is obviously bad because its only purpose is to be a friendly syntactic sugar layered on top of a database, to make it easy for anyone to learn and work with databases. and it pretty much does the opposite. not that much in the way of alternatives though, but otoh things just doing fake "sql" tend to immediately do a better job since it is so easy to grab a subset which lets most people get started, and then do whatever you need to to make the rest friendlier.

query planning is not *as* obviously bad, but in designing the database you need to plan for how common queries will be handled *anyway* (i.e. which indices do we include, where do we tweak normalization to make it practical), and query planning basically undermines this work. it is too hard a problem to solve reliably, and end result it might give you some tiny wins but mostly you just run the risk of something without warning turning expensive in production (which is hard to predict/diagnose in test since its all masked by query optimization).

acid is cool, good and the main job of the system, accept no imitations.

e: the combination of sql the language and query planning is what makes this especially problematic, because really you should be able to express something akin to a plan to the system, and then the system can try to optimize that under the constraint that it has to provably not introduce huge regressions. i.e. what you'd expect a compiler for programming languages to do: find all clean wins, but don't imagine that you'll just rewrite the program in an entirely different way. but sql is not really structured in a way that makes that easy.

Cybernetic Vermin fucked around with this message at 12:15 on Mar 3, 2024

# ? Mar 3, 2024 12:11

shitface: Nov 23, 2006

Powerful Two-Hander posted:

I'm about to have an argument that SQL is good, actually, because every time databases come up someone says "microservices!" As if that is somehow a substitute for ACID

git apologist posted:

sql is good for almost everything, you should qualify out of that first before building a rube goldberg app

we�ve gone through some weird-rear end attempts at replacing our traditional analytics solution whose only crime was first having an ugly java web start client (when this was a common thing) to then having an ugly web client. this thing worked absolutely loving perfectly, could suck down events like nobody�s business and wrote them to a standard rdbms of your choice. everyone we sold it to could use it to produce useful analytics, since everyone has some sql skills, somewhere in their business. occasionally they would lose grip on performance and each time I�ve gone in and with an index here and a query rewrite there and taken executions times down like 5x to 60x

since then we�ve had several attempts at replacing it with some proprietary temporal db thing we bought, with elastic, and probably some others that I�ve blocked out of my memory. none of them loving properly and none of them our customers had any actual hope of understanding (yeah yeah, skill issue etc.), but especially the proprietary temporal one. if I had a penny for every time I heard, �hey shitface my servers are 100% cpu all the time, what happened� and the answer is �somehow you ingested an event dated 6 months ago and it�s now recalculating everything between then and now. good luck�

just write a better looking web ui on the sensible poo poo already so we can get back to some sanity

e: but also what cybernetic vermin said. sql itself is not a magic bullet but its generally way easier to work with a traditional rdbms than some shard up your rear end bullshit

# ? Mar 3, 2024 12:23

Asleep Style: Oct 20, 2010

gather round everybody, it's time for an effort post on the biggest piece of poo poo that AWS has ever created: their Quantum Ledger Database, aka QLDB

from their docs: Amazon Quantum Ledger Database (Amazon QLDB) is a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log.

I'm pretty sure that the docs used to say BLOCKCHAIN a lot but they've toned that down apparently

QLDB is a noSQL document db, but the most important thing to understand about it is that it's journal-first. it doesn't store data directly, it stores transactions. you want your data? no problem, the query will replay the transaction history and reconstruct your data. this architecture has Implications

let's start with some of the nice things, since that will be brief. you get audit logs more or less for free, since the whole transaction history is right there. it's serverless, so spinning up a new ledger is easy. cloud costs are low compared to something like postgres in RDS

now for some of the downsides. I'm going to rip this first one straight from the AWS docs, because they say it better than I can.

quote:

Backup and Restore
Can I take a snapshot or backup of my ledger?

Amazon QLDB does not support a backup and restore feature as of now. At present, an export to S3 functionality is available. Using this functionality you can export the contents of your QLDB journal to S3.

Can I restore my ledger to a particular point in time?

Amazon QLDB does not support a point-in-time restore feature as of now.

yes, that's right. no backups

you're also limited to only 20 tables. yes, it's a document db, not a relational db, so you won't be normalizing your data into a million tables, but with only 20 you will run out quickly. oh, also a dropped table counts as an inactive table, and you're limited to a total of 40 active and inactive tables. with enough new features and architecture changes in your application you may reach a point where your only option is to create an entirely new ledger

another limitation is that each table can only have 5 indexes. since the db is journal first, god help you if you need to query something without an index. if you have a sufficiently large amount of data in a table, even an index won't save you because of the biggest problem of all:

30 second transaction timeout. if your query takes longer than 30 seconds, it times out and returns nothing. it takes a surprisingly small number of documents in a table before even a query like:

code:

SELECT id FROM <table>

will time out, EVEN THOUGH ID IS AN INDEXED FIELD. no problem, let's just just add a LIMIT statement. what do you mean that's not supported? fine, we really need this query to run, can we beef up the instance? oh right, serverless

your best shot to actually get data out when you need it is to have a high-cardinality indexed field that you can set a range on. something like:

code:

SELECT created_time FROM <table>
    WHERE created_time > 2024-03-01::00:00 AND 
    created_time < 2024-03-01::00:15

make the window small enough so that you no longer get transaction timeouts and slide it over whatever range you actually need. without equality predicates you don't actually get an indexed lookup, you get a full table scan and TRANSACTION TIMEOUT. if you have a gun to your head and need to build a greenfield QLDB-based application after reading this post, I suggest integer ids

would you like to do a join? go gently caress yourself. it's supported, but journal first means that it is going to take way too long. both your join criteria and your equality predicates had better be high-cardinality indexes

is QLDB the wrong choice for what mostly boils down to a CRUD app? absolutely, but that decision predates me and this was the VP's pet technology. the thing is I can't think of a situation where QLDB is possibly the right choice. there's no problem space where the benefits outweigh the downsides. dealing with your own custom audit table implementation is going to be an order of magnitude less trouble than QLDB.

that VP left so I was finally able to get permission to migrate our application to postgres. our integration test suite takes 45 minutes to run with QLDB. I've migrated ~25% of the tests so far and I still haven't cracked a minute. this is the most naive possible implementation, dropping tables and recreating them for every single test. using an ORM without any by-hand SQL queries. when the migration is done it's looking like the application is going to be somewhere between 10-100x faster with a bunch of meat left on the bone for future optimizations if we ever need them. I can't wait to be rid of this horrible piece of poo poo

# ? Mar 3, 2024 17:41

Shaggar: Apr 26, 2006

SQL is pretty good, actually. If you write sql and its bad you should probably learn to write good sql instead

# ? Mar 3, 2024 18:20

Cybernetic Vermin: Apr 18, 2005

nah, you're unusually just wrong there. the basics work but are really stupidly structured, and as you get into the depths (which happens after the first 5% of the spec) it is just pants-on-head nonsense.

if datalog had been recursion-free by default it would likely have won out, and we would all have been way better off. the sql spec has the dumb recursion too of course (let's just do unbounded computation in our database system guys!), but datalog headlines that arguable misfeature.

# ? Mar 3, 2024 18:26

Subjunctive: Sep 12, 2006; ✨sparkle and shine✨

does actual SQL have recursion, or is it just vendor extensions?

not being able to name and reference a subquery makes nested subqueries pretty lovely to read; this is a thing that Malloy gets right IMO

# ? Mar 3, 2024 18:32

Cybernetic Vermin: Apr 18, 2005

Subjunctive posted:

does actual SQL have recursion, or is it just vendor extensions?

not being able to name and reference a subquery makes nested subqueries pretty lovely to read; this is a thing that Malloy gets right IMO

i believe it is in the spec with some keyword, but obviously in one of the more recent ones that are checkboxes more about "are you a big enough vendor to deal with this crap and retain dod approved status?" than anything

# ? Mar 3, 2024 18:35

Shaggar: Apr 26, 2006

idk if CTEs are in "actual sql" but i use them for recursion.

# ? Mar 3, 2024 18:38

Powerful Two-Hander: Mar 10, 2004; Mods please change my name to "Tooter Skeleton" TIA.

Subjunctive posted:

does actual SQL have recursion, or is it just vendor extensions?

not being able to name and reference a subquery makes nested subqueries pretty lovely to read; this is a thing that Malloy gets right IMO

I think you can do that with CTEs?

the godawful dB on this application at work has like no schema at all which is an artefact of the front end platform but the tables written internally are also poo poo and the procs used are loving terrible as well

temp tables out the rear end, CTEs, dynamic SQL, parameters passed as separated strings, xml generated in the database :discourse:

I mean I remember when I first discovered temp tables and used them for everything but this is extreme

# ? Mar 3, 2024 18:38

Cybernetic Vermin: Apr 18, 2005

Shaggar posted:

idk if CTEs are in "actual sql" but i use them for recursion.

oh, yeah, that's what i was grasping for. sql99 apparently, so not even that late

# ? Mar 3, 2024 18:40

Powerful Two-Hander: Mar 10, 2004; Mods please change my name to "Tooter Skeleton" TIA.

this is the one people keep shouting "microservices!" about when it comes to a replacement and I'm supposed to be starting the feature spec for the replacement next week and I fully expect to have many many arguments about why [insert web scale platform that the architects have a boner for right now] with the internal architecture functins

# ? Mar 3, 2024 18:40

Subjunctive: Sep 12, 2006; ✨sparkle and shine✨

ah, OK

my SQL has never been very strong and I�ve never felt like I really understood it systematically (relational division? what?), so I�m nibbling my way through SQL and Relational Theory by Chris Date and it has been very interesting so far

also I�m pretty sure he has ADHD

# ? Mar 3, 2024 18:41

Subjunctive: Sep 12, 2006; ✨sparkle and shine✨

real architects carry pagers

# ? Mar 3, 2024 18:43

PIZZA.BAT: Nov 12, 2016

Asleep Style posted:

QLDB is a noSQL document db, but the most important thing to understand about it is that it's journal-first. it doesn't store data directly, it stores transactions. you want your data? no problem, the query will replay the transaction history and reconstruct your data. this architecture has Implications

gotcha. so it's like most nosql databases only they turned off the journal cleanup, as a joke

# ? Mar 4, 2024 00:56

Subjunctive: Sep 12, 2006; ✨sparkle and shine✨

so why aren�t SQL user-defined types used more widely? they�re in 2003 (but not mysql of course, and maybe only partially in some others?), but I don�t see ORMs producing them or pretty much any tooling around them, or recommendations to use them in documentation or learning materials

seems like they would be a pretty valuable tool for data modelling and abstraction, given how central the RDBMS representation is to a lot of software design these days

# ? Mar 4, 2024 01:48

Shaggar: Apr 26, 2006

They have a history of being misused to represent structures that should just be a table and these days if you want to cram a bunch of random trash into a column theres the json data type.

# ? Mar 4, 2024 02:08

Sapozhnik: Jan 2, 2005; Nap Ghost

sql has a bunch of problems, i'd say that a big problem is that it is too low-level but also doesn't lend itself to clean abstractions being built on top of it for whatever reason.

typically with a business database you want to be able to do stuff like soft deletes, point-in-time snapshots, auditing, time-series partitioning, that sort of thing. these are all related, but you have to solve all of these problems from scratch in your application layer, and these are problems everybody has eventually. the closest thing to a common solution to a lot of these problems is orms and that cure seems worse than the disease (i find myself saying that phrase a lot).

people replacing sql-based databases always seem to be trying to solve the wrong problems. they're always trying to build "web scale" poo poo but actual "web scale" operations don't buy off the shelf solutions, they build in-house talent that can build and maintain a suitable solution (note the emphasis on the talent and not the solution). or not even that, stackoverflow, facebook(?), uber etc all use mostly standard open source sql data stores with some kv stuff on the side aiui. what everybody else needs is something that solves the common problems (including joins and transactions tyvm) at medium scale and doesn't have bizarre performance characteristics that cause it to poo poo the bed at 3 am. even then i'd be real wary about making some vc backed bullshit the core of my business, if i was building a business i mean.

Cybernetic Vermin posted:

sql, query planning, and acid are all rather separate things though.

sql is obviously bad because its only purpose is to be a friendly syntactic sugar layered on top of a database, to make it easy for anyone to learn and work with databases. and it pretty much does the opposite. not that much in the way of alternatives though, but otoh things just doing fake "sql" tend to immediately do a better job since it is so easy to grab a subset which lets most people get started, and then do whatever you need to to make the rest friendlier.

query planning is not *as* obviously bad, but in designing the database you need to plan for how common queries will be handled *anyway* (i.e. which indices do we include, where do we tweak normalization to make it practical), and query planning basically undermines this work. it is too hard a problem to solve reliably, and end result it might give you some tiny wins but mostly you just run the risk of something without warning turning expensive in production (which is hard to predict/diagnose in test since its all masked by query optimization).

acid is cool, good and the main job of the system, accept no imitations.

e: the combination of sql the language and query planning is what makes this especially problematic, because really you should be able to express something akin to a plan to the system, and then the system can try to optimize that under the constraint that it has to provably not introduce huge regressions. i.e. what you'd expect a compiler for programming languages to do: find all clean wins, but don't imagine that you'll just rewrite the program in an entirely different way. but sql is not really structured in a way that makes that easy.

iawtp

# ? Mar 4, 2024 02:29

Asleep Style: Oct 20, 2010

PIZZA.BAT posted:

gotcha. so it's like most nosql databases only they turned off the journal cleanup, as a joke

yes, exactly. then the marketing folks got ahold of it and decided that actually the joke was the central feature

# ? Mar 4, 2024 05:04

redleader: Aug 18, 2005; Engage according to operational parameters

my query planner high score is something like "estimated number of rows to read: 50, actual number of rows read: 50,000,000,000". the actual result set returned to the client would have been <= 100 rows

# ? Mar 4, 2024 06:40

redleader: Aug 18, 2005; Engage according to operational parameters

the mssql query planner is an incredible piece of engineering. it takes some of the most hosed up, dogshit sql i've ever seen on large tables on a chronically underresourced server, and returns results in a (usually) very reasonable amount of time

pity that query planning is a problem that shouldn't be solved by building a smarter planner

# ? Mar 4, 2024 06:44

cowboy beepboop: Feb 24, 2001

we just use MySQL, it's been good enough ever since innodb was introduced

# ? Mar 4, 2024 09:43

Powerful Two-Hander: Mar 10, 2004; Mods please change my name to "Tooter Skeleton" TIA.

just lol if you haven't spent an afternoon rewriting an entire query to try and make it more efficient by using all sorts of look through tricks and ending up with either the exact same performance or performance improvements equivalent to taking the planner index suggestions and just spamming the poo poo out of them

# ? Mar 4, 2024 12:40

PIZZA.BAT: Nov 12, 2016

i'm hearing vector databases are the new hotness that's getting all the vc attention

# ? Mar 20, 2024 14:17

Archduke Frantz Fanon: Sep 7, 2004

PIZZA.BAT posted:

i'm hearing vector databases are the new hotness that's getting all the vc attention

Did you like it when your queries returned exactly what you asked for? well now it simply guesses!

# ? Mar 20, 2024 14:33

Kernel Sanders: Sep 15, 2020

i just use xml files op

# ? Mar 20, 2024 14:58

Subjunctive: Sep 12, 2006; ✨sparkle and shine✨

Archduke Frantz Fanon posted:

Did you like it when your queries returned exactly what you asked for? well now it simply guesses!

are vector databases now doing LLM-style random walk everywhere? I thought it was just nearest-neighbour with lots of dimensions

# ? Mar 20, 2024 15:01

Archduke Frantz Fanon: Sep 7, 2004

Subjunctive posted:

are vector databases now doing LLM-style random walk everywhere? I thought it was just nearest-neighbour with lots of dimensions

that would make more sense but the article i quickly looked up was really focused on their use for machine learning so expect that to infect them in the coming months

# ? Mar 20, 2024 15:11

Generic Monk: Oct 31, 2011

redleader posted:

the mssql query planner is an incredible piece of engineering. it takes some of the most hosed up, dogshit sql i've ever seen on large tables on a chronically underresourced server, and returns results in a (usually) very reasonable amount of time

pity that query planning is a problem that shouldn't be solved by building a smarter planner

iawtp, although it's the main thing keeping my coworkers employed so who's to say whether it's good or not

# ? Mar 20, 2024 15:15

Adbot: ADBOT LOVES YOU

# ? May 3, 2024 00:49

TheFluff: Dec 13, 2006; FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Sapozhnik posted:

typically with a business database you want to be able to do stuff like soft deletes, point-in-time snapshots, auditing, time-series partitioning, that sort of thing. these are all related, but you have to solve all of these problems from scratch in your application layer, and these are problems everybody has eventually. the closest thing to a common solution to a lot of these problems is orms and that cure seems worse than the disease (i find myself saying that phrase a lot).

system-versioned temporal tables have been in the standard since SQL:2011 but so far I think only mssql and oracle have implemented them. I've used the feature in mssql and it owns, you get basically all the stuff you mention for free with really nice syntax for time traveling too

# ? Mar 20, 2024 16:14

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > YOSPOS > DENORMALIZE YOURSELF AND FACE TO WEB SCALE. STATE IS A FUCK

«‹›7 »