Bruegels Fuckbooks
Sep 14, 2004

Now, listen - I know the two of you are very different from each other in a lot of ways, but you have to understand that as far as Grandpa's concerned, you're both pieces of shit! Yeah. I can prove it mathematically.

the dude who wrote that and his company are complete idiots, but if you didn't do this for a living, you'd never know it from how well the blog is presented.

Sir Bobert Fishbone
Jan 16, 2006

Beebort
Ugh one of our critical software vendors is just starting to use Octopus to handle upgrades and deployments for their customers. This sure instills a lot of confidence!

nielsm
Jun 1, 2009



SQL code:
select CONCAT('a', 'b')
pre:
Msg 195, Level 15, State 10, Line 1
'CONCAT' is not a recognized built-in function name.
WTF?
It even autocompletes CONCAT in the SSMS editor.
MS SQL Server 10.50.6000.

Edit: Apparently it was only added in SQL Server 2012 and this is 2008 R2.
I thought CONCAT was supposed to be ancient ANSI SQL.
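
The workaround on 2008 R2 seems to be the plain + operator, something like this (untested sketch; the ISNULL wrapping only matters because + returns NULL if either side is NULL, where CONCAT treats NULLs as empty strings, and the table/column names below are made up):

SQL code:
-- plain string concatenation, available long before SQL Server 2012
select 'a' + 'b'

-- with nullable columns, guard against NULLs to mimic CONCAT
-- (dbo.people / first_name / last_name are hypothetical)
select ISNULL(first_name, '') + ' ' + ISNULL(last_name, '')
from dbo.people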

redleader
Aug 18, 2005

Engage according to operational parameters

nielsm posted:

SQL code:
select CONCAT('a', 'b')
pre:
Msg 195, Level 15, State 10, Line 1
'CONCAT' is not a recognized built-in function name.
WTF?
It even autocompletes CONCAT in the SSMS editor.
MS SQL Server 10.50.6000.

Edit: Apparently it was only added in SQL Server 2012 and this is 2008 R2.
I thought CONCAT was supposed to be ancient ANSI SQL.

let this be your first lesson on the true meaning of :airquote: "ansi sql"

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Nth Doctor posted:

It's this and never making indices

Or covering everything in them 'for speed'

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Or their exposure to SQL is MySQL.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

PhantomOfTheCopier posted:

Or their exposure to SQL is MySQL.

This is stupid; there is no more reason to avoid joining tables in MySQL than there is with any other database product.

NinpoEspiritoSanto
Oct 22, 2013




Hammerite posted:

This is stupid; there is no more reason to avoid joining tables in MySQL than there is with any other database product.

MySQL had horrendous performance with joins for a long time.

redleader
Aug 18, 2005

Engage according to operational parameters
can mysql do hash joins yet?

Bruegels Fuckbooks
Sep 14, 2004

Now, listen - I know the two of you are very different from each other in a lot of ways, but you have to understand that as far as Grandpa's concerned, you're both pieces of shit! Yeah. I can prove it mathematically.

redleader posted:

can mysql do hash joins yet?

as of 8.0.18 (released october 2019), mysql supports hash joins.
https://dev.mysql.com/doc/refman/8.0/en/nested-loop-joins.html

prior to 5.3 mysql join performance with large datasets was pretty bad because it only did naive nested loop joins, so joining large datasets could be really slow. after 5.3 it got more and more sophisticated about optimizing joins.

I think oracle could do hash joins in 1992.

McGlockenshire
Dec 16, 2005

GOLLOCKS!
e: n/m, pedantic bs

But if you're avoiding joins because you think they are slow you are dumb.

McGlockenshire fucked around with this message at 19:45 on Oct 17, 2019

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
I work in a place that avoids relational databases because they think they are slow. :eng99:

In point of actual fact, they just think that "infinite scalability" of cloud nosquirrel is a better operational model than stupid devs causing an outage by deploying relational databases with default settings.

They offered no indication of how to maintain relational consistency in their improved operational model. Apparently devs that can't set up a database can design protocols and data safety features atop distributed keystores.

NinpoEspiritoSanto
Oct 22, 2013




The art of the data structure is being lost in mainstream development I feel.

If I were working on anything new, if I remotely cared about consistency/safety of my data I'm using an RDBMS (personal preference Postgres).

The rest is just a crapshoot of which superfast thing you can use as a state bucket for a load balanced thing where it's not the end of the world if the data goes poof.

While software engineering is a relatively young discipline compared to other engineering fields, there are still some things I think are now in a state of "correct" and don't need further messing with. A modern RDBMS, used properly as a fast, reliable, and consistent data store, is one of them.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


But how is that going to help our capture rate? It’s vital we leverage a scalable keystore solution to enable high-velocity workflows. This lets our devs work closer to the grain vis a vis data, giving us a better operational model and higher agility in our core market space. Do you understand?

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
I have a MySQL variable issue that is driving me nuts.

I have a DOS script that iterates over a set of files in a directory and runs a SQL script against each file (basically loading and post-processing of each file), and as part of that process the filename and timestamp are captured and inserted into a different table.

@externalstring is the current file name. There are four steps to post-processing so I collect five timestamps:

code:

insert into elapsed_time (fileid, phase0) select @externalString,now();

update user_daily_elapsed_time set phase1=now() where fileid=@externalString;

update user_daily_elapsed_time set phase2=now() where fileid=@externalString;

update user_daily_elapsed_time set phase3=now() where fileid=@externalString;

update user_daily_elapsed_time set phase4=now() where fileid=@externalString;
For whatever reason, the insert statement works just fine, but each of the subsequent update statements fails with the following error:

quote:

Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (cp850_general_ci,IMPLICIT) for operation '='

All my tables are set to latin1_swedish_ci so for whatever reason @externalstring is being recognized as cp850_general_ci during the updates. I tried setting the table to be cp850 but either I did it wrong (likely) or this wasn't the problem.

Help, please?

nielsm
Jun 1, 2009



Agrikk posted:

All my tables are set to latin1_swedish_ci so for whatever reason @externalstring is being recognized as cp850_general_ci during the updates. I tried setting the table to be cp850 but either I did it wrong (likely) or this wasn't the problem.

Help, please?

Try the CONVERT(a USING b) function to convert your input string to the database's encoding.
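
Something along these lines, reusing one of your updates (untested sketch; assumes the fileid column really is latin1):

code:
-- coerce the variable to the column's character set before comparing
update user_daily_elapsed_time
set phase1 = now()
where fileid = CONVERT(@externalString USING latin1);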

NinpoEspiritoSanto
Oct 22, 2013




Looks like the client you're running in the script is defaulting to cp850 on the connection, you should configure client charset and collation to match the tables if memory serves.
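
E.g. if the DOS script shells out to the stock mysql client, something like this (sketch; the user/database/script names are made up):

code:
rem force the connection charset to match the tables
mysql --default-character-set=latin1 -u loader -p mydb < postprocess.sql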

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Agrikk posted:

I have a MySQL variable issue that is driving me nuts.

I have a DOS script that iterates over a set of files in a directory and runs a SQL script against each file (basically loading and post-processing of each file) and as part of that process, the filename and timestamp is captured and inserted into a different table.

@externalstring is the current file name. There are four steps to post-processing so I collect five timestamps:

code:

insert into elapsed_time (fileid, phase0) select @externalString,now();

update user_daily_elapsed_time set phase1=now() where fileid=@externalString;

update user_daily_elapsed_time set phase2=now() where fileid=@externalString;

update user_daily_elapsed_time set phase3=now() where fileid=@externalString;

update user_daily_elapsed_time set phase4=now() where fileid=@externalString;
For whatever reason, the insert statement works just fine, but each of the subsequent update statements fail with the following error:


All my tables are set to latin1_swedish_ci so for whatever reason @externalstring is being recognized as cp850_general_ci during the updates. I tried setting the table to be cp850 but either I did it wrong (likely) or this wasn't the problem.

Help, please?

You don't just have to worry about the tables' (columns') character sets; you also have to worry about the character set used by the connection. Are you telling it to use latin1_swedish_ci for the connection character set as well? SET NAMES latin1 COLLATE latin1_swedish_ci;

BTW it's probably better to use utf8mb4 than latin1, or ascii if you are certain you'll only store ASCII data.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Ruggan posted:

But how is that going to help our capture rate? It’s vital we leverage a scalable keystore solution to enable high-velocity workflows. This lets our devs work closer to the grain vis a vis data, giving us a better operational model and higher agility in our core market space. Do you understand?

Imagine driving down the highway knowing where only 80% of the other drivers are at any given time. Now take that but randomly switch which 20% are missing every second. Now go read about eventual consistency. :clint:

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003
imagine still using mysql in the year of our lord jesus christ two thousand nineteen

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


PhantomOfTheCopier posted:

Imagine driving down the highway knowing where only 80% of the other drivers are at any given time. Now take that but randomly switch which 20% are missing every second. Now go read about eventual consistency. :clint:

ButtcoinDB

NinpoEspiritoSanto
Oct 22, 2013




DELETE CASCADE posted:

imagine still using mysql in the year of our lord jesus christ two thousand nineteen

Yeah I'm stuck with it at work on this particular product but gently caress me I wouldn't put anything new on it. Not even latest/greatest MariaDB. Sure it works to a point but there's no compelling reason to use it over anything else, certainly not Postgres.

Pollyanna
Mar 5, 2005

Milk's on them.


DELETE CASCADE posted:

imagine still using mysql in the year of our lord jesus christ two thousand nineteen

There's been a push at my company to force all new services to use MySQL to get everyone on the same database, even though our MySQL DBs are consistently the ones that eat poo poo all the time and piss off our customers.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

DELETE CASCADE posted:

imagine still using mysql in the year of our lord jesus christ two thousand nineteen

I’m curious about this sentiment.

What should be used instead? Postgres? MariaDB? Hadoop? :D

Does your feeling carry over to AWS Aurora MySQL vs Aurora Postgres?

NinpoEspiritoSanto
Oct 22, 2013




The problem is with MySQL and derivatives. While it's improving, a lot of decisions made early on still plague MySQL and lead to scenarios where unexpected things happen to data. Until recently, the default behaviour on a NOT NULL char column was simply to insert an empty string if you passed NULL. This is just one example of many.

Just recently I logged a bug report against a stable version of Maria where you can make it reliably crash with one of the query optimizers enabled. Up until very recently the table cache and its iterator were still single threaded, which runs into performance walls very quickly on a DBMS with many tables. Character set and collation handling remain a total mess. The query planner still makes incredibly odd decisions, and it, along with InnoDB, still has a googolplex of configuration options, none of whose defaults are particularly useful in anything outside a toy scenario. Point-in-time snapshots are still hugely time consuming with the official tools and all rely on crash recovery to restore. The JSON data type in Maria is a thin veneer around longtext, and MySQL, Percona and Maria all use their own takes on InnoDB.

InnoDB still has arbitrary row size restrictions and it remains impossible to truly predict how much RAM it will consume based on config/data set.

It's a loving pig of a DBMS and there's no reason to pick it over Postgres in 2019 and I'm sure there's other better open source alternatives too these days.

e: Also, because of weird default behaviours that were in place for literal decades, programmers became accustomed to them, which opens up a whole other can of worms when the defaults change or options become available to tighten that stuff up. It's nowhere near the top of my list if I'm considering a database for any new project, and serious questions should be asked of anyone saying "we should use MySQL/Percona/MariaDB for this".

NinpoEspiritoSanto fucked around with this message at 16:35 on Oct 19, 2019

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Bundy posted:

:eng101:

... serious questions should be asked of anyone saying "we should use MySQL/Percona/MariaDB for this".
An excellent post.

On the last point: developers, I swear. "But it's what everyone uses" is the extent of their knowledge, and then they ignore every reason you've given to choose a more appropriate database, and here I mean the specific reasons that a different DB solves their particular problem. The other great one is "but no one here is familiar with PostgreSQL". I hate to break it to you, but you aren't really familiar with MySQL either.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Agrikk posted:

I’m curious about this sentiment.

What should be used instead? Postgres? MariaDB? Hadoop? :D

Does your feeling carry over to AWS Aurora MySQL vs Aurora Postgres?

Postgres or MS SQL Server. The latter is expensive though.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
"All the developers here have used MySQL before and are used to it, and none of them have used PostgreSQL before" is a very strong reason to use MySQL for a new project. Mere technical superiority in multiple aspects of the software is irrelevant.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Bundy posted:

It's a loving pig of a DBMS and there's no reason to pick it over Postgres in 2019 and I'm sure there's other better open source alternatives too these days.

"AWS charges less for MySQL/Maria instances." Glad I don't work there anymore.

Hammerite posted:

"All the developers here have used MySQL before and are used to it, and none of them have used PostgreSQL before" is a very strong reason to use MySQL for a new project. Mere technical superiority in multiple aspects of the software is irrelevant.

The level of familiarity most devs have with any particular relational database is superficial enough to transfer to any other, with the possible exception of old versions of MySQL where you'd have to do very silly things to work around its shortcomings, in which case they really need to unlearn those habits.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
I tell you right now that I’ve been doing A/B comparisons between Aurora Postgres and Aurora MySQL and insert performance on MySQL is garbage.

I’m trying to get MySQL to work because I want to play with multi-region replication, but I can’t get 2200 CSV files to load in anything resembling a reasonable amount of time.

NinpoEspiritoSanto
Oct 22, 2013




Even with load data infile?

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
Yeah. I’m loading 2200 CSV files [each with 1.5 million rows of data consisting of five columns with a total character width of no more than 86] into a single table with an auto-incrementing rowID and an index on three of the five columns.

I recognize that I could drop the index and recreate it after, but that won’t help me in the long run since I’ll be adding data to this table every night and the table has to be useable as fresh data is added.
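
For reference, the per-file load is roughly this (sketch; table and column names are placeholders, the real ones don't matter here):

code:
LOAD DATA LOCAL INFILE 'file_0001.csv'
INTO TABLE big_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(col_a, col_b, col_c, col_d, col_e);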

abelwingnut
Dec 23, 2002


a bit of a doozy here.

i have a table A that contains a computer, its ip address, and various other fields. i have another table B that contains an ip range, in the form of a.b.c.d/e, and fields about that ip range.

i am trying to join these two tables on the ip address. basically, if an ip address in table A falls into an ip range defined in table B, join the rows. so for instance, let's say i have comp1 at ip 1.2.3.4 in table A. if in table B there's an entry for ip range 1.2.3.0/24, these two should join.

the service i am using is called snowflake. it allows json objects within a normal sql row, so to speak. it also has a function called PARSE_IP, which when i run that on the ip ranges in table B, gives me an object with the start and end ip address for that range, among other information. i can also use a function GET to extract each of those fields within the object.

so given those options, i thought i'd write a join like this, knowing full well it would likely be slow:

code:
SELECT
    blah
FROM
    tablea a
    JOIN tableb b ON
        a.ipaddress
        BETWEEN
        Get(Parse_ip(b.ipaddressrange), 'ipStartAddress')
        AND
        Get(Parse_ip(b.ipaddressrange), 'ipEndAddress')
this, it turns out, is just too drat slow. these tables are massive, the ranges vary, and there's just too much going on, so it's impossible to use even with snowflake's decent computational speeds.

so next thought was to flatten table B into, essentially, a lookup table. therefore, in the example i wrote above, where 1 row contained ipAddressRange = 1.2.3.0/24 and the other fields, you'd have 256 rows:

1.2.3.0 | other fields for that range
1.2.3.1 | other fields for that range
1.2.3.2 | other fields for that range
...

then perform this on the entire table B, producing a huge lookup, which would then clean that JOIN up into something way simpler and more efficient.

coming from straight sql land where i don't have to deal with inserted json objects (like the idea, but boy howdy is it slow in this case), this feels like the natural solution.

however, i'm wondering if any of you can think of a better way to join these given what i've specified. i'm not entirely sure how to 'flatten' the object like that, especially given i can't write functions in this drat thing. i think i would have to convert the addresses to binary or decimal, then tell it to fill rows from a to b somehow. i don't know, snowflake is a mess.

like would it make more sense to separate the octets into their own columns, and create a join on all four columns? snowflake doesn't use indexing like normal sql. it uses its own clustering ability, which is loving godawful. so i'm trying to battle the inefficiency of their function and their lack of true indexing.

any thoughts are appreciated.

abelwingnut fucked around with this message at 17:40 on Oct 23, 2019

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
Have you tried treating the IP's as 32-bit integers? As in, to check if 1.2.3.4 is in the range 1.2.3.0/24 you mask off the lowest 8 bits by binary AND'ing the IP with a mask that has the 24 highest bits set, then compare that with the range's network address, so like (0x01020304 & 0xFFFFFF00) == 0x01020300 (C-like syntax, & is bitwise AND). May not actually be faster if you have to convert the addresses from strings, but if they're stored as 4-byte binary values it's probably worth a shot.
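
In Snowflake-flavoured SQL that masked compare might look something like this (very rough sketch, not tested; assumes the addresses really are dotted-quad strings, leans on SPLIT_PART/BITAND/BITSHIFTLEFT, and reuses the table/column names from your post):

code:
-- turn 'a.b.c.d' into a 32-bit integer once per row
WITH a_num AS (
    SELECT a.*,
           SPLIT_PART(a.ipaddress, '.', 1)::int * 16777216
         + SPLIT_PART(a.ipaddress, '.', 2)::int * 65536
         + SPLIT_PART(a.ipaddress, '.', 3)::int * 256
         + SPLIT_PART(a.ipaddress, '.', 4)::int            AS ip_num
    FROM tablea a
),
b_num AS (
    SELECT b.*,
           -- network part of 'a.b.c.d/e' as an integer
           SPLIT_PART(SPLIT_PART(b.ipaddressrange, '/', 1), '.', 1)::int * 16777216
         + SPLIT_PART(SPLIT_PART(b.ipaddressrange, '/', 1), '.', 2)::int * 65536
         + SPLIT_PART(SPLIT_PART(b.ipaddressrange, '/', 1), '.', 3)::int * 256
         + SPLIT_PART(SPLIT_PART(b.ipaddressrange, '/', 1), '.', 4)::int AS net_num,
           -- mask with the top e bits set, e.g. /24 -> 0xFFFFFF00
           BITAND(4294967295,
                  BITSHIFTLEFT(4294967295, 32 - SPLIT_PART(b.ipaddressrange, '/', 2)::int)) AS mask_num
    FROM tableb b
)
SELECT blah
FROM a_num a
JOIN b_num b
  ON BITAND(a.ip_num, b.mask_num) = b.net_num;
Still comparing every pair of rows, but it's cheap integer math per pair instead of building a PARSE_IP object for each comparison, so it might dodge the worst of it.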

TheFluff fucked around with this message at 19:15 on Oct 23, 2019

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Who owns the schema? Do you have the ability to make structural changes to tables?

Edit: Prob not, rereading your post. What database?

Ruggan fucked around with this message at 21:36 on Oct 23, 2019

abelwingnut
Dec 23, 2002


i can create tables at will, and i might have some leeway in altering existing tables, but probably not much.

yea, like when i mentioned flattening table B, i meant making a new table C that had one row for every ip within all ranges found in table B. flattening might not be the right word, but uhh...unnesting? squashing? not sure but i hope you catch my drift.

ideally i could add columns with the address stored in binary, decimal, hex in those tables and just compare without having to call the PARSE_IP. that would have to be so much faster. but yea, i don't think i can add those. too many other things feed into them and that'd be a massive rewrite of some scripts.

abelwingnut fucked around with this message at 21:37 on Oct 23, 2019

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Yeah. What database system?

abelwingnut
Dec 23, 2002


it's on this online service snowflake. it's its own thing. https://www.snowflake.com/

its sql language is largely similar to t-sql or mysql, but you're VERY limited beyond basic dml and basic ddl. like i can't create functions or sprocs or anything like that.

if i go the route of flattening/unnesting/squashing that i mentioned, i'm reading about doing a rowcount cte combined with a number table. i...think i'm wrapping my head around it, at least.

more info here: https://dba.stackexchange.com/questions/59309/how-to-dynamically-generate-rows-based-on-value-in-a-column

abelwingnut fucked around with this message at 21:52 on Oct 23, 2019

Just-In-Timeberlake
Aug 18, 2003

abelwingnut posted:

it's on this online service snowflake. it's its own thing. https://www.snowflake.com/

its sql language is largely similar to t-sql or mysql, but you're VERY limited beyond basic dml and basic ddl. like i can't create functions or sprocs or anything like that.

man that product name is loving on the nose

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Creating a whole range of rows for every combo sounds very messy though. You could instead take your example of 1.2.3.0/24 and represent it with three columns:

FirstThreeOctets: 1.2.3
FourthOctetFrom: 0
FourthOctetTo: 255

Can you create persisted calculated columns in your tables that represent it in this way? Or even fully separate the octets? This way you can continue to store it the way you do today but facilitate better joins.
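
Then the join becomes something like this (rough sketch; only works if every range is /24 or narrower, and it reuses the column names above plus the tables from the earlier post):

code:
SELECT blah
FROM tablea a
JOIN tableb b
  ON  b.FirstThreeOctets = SPLIT_PART(a.ipaddress, '.', 1) || '.'
                        || SPLIT_PART(a.ipaddress, '.', 2) || '.'
                        || SPLIT_PART(a.ipaddress, '.', 3)
  AND SPLIT_PART(a.ipaddress, '.', 4)::int
      BETWEEN b.FourthOctetFrom AND b.FourthOctetTo;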
