abelwingnut
Dec 23, 2002


I've been learning SQL on the job over the last 3 months. Pretty useful, but it's definitely got some weird characteristics. Are there any more recent, friendlier scripting languages that are in vogue these days? I realize SQL's old and there's probably something 'better', at least in some sense, out there. Just curious what the alternatives are.

Mac friendly would be nice too.

abelwingnut
Dec 23, 2002


any sql server dbas here? things be making no sense with integration services (IS) and sql server agent and i'm looking for guidance.

abelwingnut fucked around with this message at 02:20 on Feb 27, 2016

abelwingnut
Dec 23, 2002


will do. i'll preface by saying i'm a developer, not a dba. this may be some basic login credential chicanery.

running sql server 2012. IT just created the IS catalog this past week. i built a package on my machine locally, debugged it, built it, deployed it to the server. great. runs fine from vs on my local machine in debug mode.

when i right-click -> execute it from ssms on my local machine under my active directory-controlled account, it fails. when i right-click -> execute it from ssms on the server under the active directory-controlled, default account for that machine, it runs. when it runs as a job from sql server agent, it fails. when it fails, it errors with 'data flow error: warning: access is denied', file destination subcomponent. the package queries and puts the results in a tab-delimited file on a network share. my account has full access to the share.

what is going on here? my account has the same rights as the other account that succeeds. the fact it runs at all tells me it's not the package itself.

so yea, there are two aspects here: how can i get it to run from my local machine under my account, and how can i get sql server agent to run it.
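
for what it's worth, the usual fix for the agent half of this--a job running as LocalSystem that can't reach a network share--is a credential plus a proxy, then setting the job step's "run as" to that proxy. a rough sketch only; the account, secret, and proxy names here are made up:

code:
-- maps to a windows account that actually has rights on the share
CREATE CREDENTIAL ShareAccess
    WITH IDENTITY = 'DOMAIN\svc_ssis',      -- hypothetical account
    SECRET = 'password-goes-here';

-- make a proxy from it
EXEC msdb.dbo.sp_add_proxy
    @proxy_name = N'SSISFileProxy',
    @credential_name = N'ShareAccess',
    @enabled = 1;

-- let SSIS job steps use the proxy
EXEC msdb.dbo.sp_grant_proxy_to_subsystem
    @proxy_name = N'SSISFileProxy',
    @subsystem_name = N'SSIS';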

abelwingnut fucked around with this message at 18:14 on Feb 27, 2016

abelwingnut
Dec 23, 2002


just right-click > run as admin on vs, then deploy from there?

the default account on the server is a network account. sql server agent and IS are running under LocalSystem, i think

abelwingnut
Dec 23, 2002


another SSIS question

i wrote an SSIS package that generates a report. basic flow is declare table variable > fill table variable with a select statement that's a union of two other tables > another select statement calling from about eight tables, the table variable included > spit out results to csv. the query takes 40 minutes to run for some reason--i'll post about that later. but it turns out that when i was testing the package it locked the tables it was calling from and crashed the client programs. i'm not writing anything to any tables, only to the table variable. when i run the query outside of the package, no such crashing.

do IS packages lock tables or something? i've never experienced that in the past, probably because i've never had a query run this long.
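
for reference, if the extract really is read-only, the blunt way to keep a long SELECT from blocking writers is to drop the isolation level, at the cost of dirty reads. a sketch with made-up table names:

code:
-- session level: the SELECT takes no shared locks, so writers aren't blocked, but dirty reads are possible
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT r.col1, s.col2
FROM dbo.ReportSource1 AS r
JOIN dbo.ReportSource2 AS s ON s.id = r.id;

-- or per table, inside the existing query:
SELECT r.col1
FROM dbo.ReportSource1 AS r WITH (NOLOCK);
if dirty reads aren't acceptable, turning on READ_COMMITTED_SNAPSHOT at the database level is the cleaner route, but that's a db setting rather than something the package controls.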

abelwingnut fucked around with this message at 02:21 on Mar 10, 2016

abelwingnut
Dec 23, 2002


thanks. i defined them with source assistant, sql server.

how should i define them to avoid this apparent locking? i'm googling but not having much luck.

abelwingnut
Dec 23, 2002


speaking of, isolation levels and locking and transaction management are something i never really got into. have a good primer on hand?

abelwingnut
Dec 23, 2002


could use GROUP BY or PARTITION BY ... OVER. i'll leave the actual query writing as an exercise for the reader :)

abelwingnut
Dec 23, 2002


isn't this what PARTITION BY ... OVER is built for?
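
for anyone skimming later, the difference in one breath (table and columns made up): GROUP BY collapses rows, PARTITION BY ... OVER keeps every row and attaches the aggregate to each one.

code:
-- GROUP BY: one row per group
SELECT group_id, SUM(amount) AS total
FROM dbo.Orders
GROUP BY group_id;

-- PARTITION BY ... OVER: every row kept, group total attached to each
SELECT group_id, amount,
       SUM(amount) OVER (PARTITION BY group_id) AS group_total
FROM dbo.Orders;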

abelwingnut
Dec 23, 2002


customize this for your specific table

SELECT
table_schema
,table_name
,column_name
,data_type
,character_maximum_length
,is_nullable
,column_default
,numeric_precision
,numeric_scale
FROM INFORMATION_SCHEMA.columns
WHERE table_name = 'blah'

abelwingnut
Dec 23, 2002


SELECT max(len(column)) FROM table

abelwingnut
Dec 23, 2002


i get what you're saying now. i don't know how to do that in one go off the top of my head without writing an sp. if this is a one-off and there aren't too many columns, i'd investigate each column. something like:

SELECT id FROM table WHERE len(column) > whatever the maximum character length is of the target column in the target table

if no results, then you know the column's fine. if there are results, then fix that column.

alternatively, you could just insert data with a LEFT on the data so that it truncates to whatever length you need. not sure how important the integrity is. also, be sure to use rtrim and ltrim when inserting.
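
the truncating insert would look something like this--all names hypothetical, and 50 stands in for whatever the target column's max length actually is:

code:
INSERT INTO dbo.TargetTable (targetColumn)
SELECT LEFT(LTRIM(RTRIM(sourceColumn)), 50)   -- trim whitespace, then truncate to the target length
FROM dbo.SourceTable;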

abelwingnut fucked around with this message at 14:40 on Jun 29, 2016

abelwingnut
Dec 23, 2002


speaking of, how do you guys fine tune? just comb through the execution plans and fix any and all areas with lots of reads? what other stats do you look at?

abelwingnut
Dec 23, 2002


dropping in to say just how loving much i hate importing and exporting between different db engines and all the fun with encodings and datatypes this brings. currently trying to export data from sql server into an acceptable format for postgres and its copy function. apparently whatever utf-8 sql server spits out is not acceptable for postgres, but the regular latin works? and now i'm having such fun with NULL in an integer column. and this is all after learning that i can't import data into a postgres rds instance from an s3 bucket, yet you can into a mysql rds instance, which just seems amazingly dumb. oh, and this is all because my company refuses to just allow me to connect sql server to the aws rds instance and use database migration service to link them up

such joy!
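
for the record, the NULL-in-an-integer-column half of this can usually be handled on the postgres side of COPY rather than in the export. a sketch, assuming a tab-delimited file where empty fields should become NULL--the table, path, and encoding are placeholders, and the encoding has to match whatever sql server actually wrote:

code:
COPY staging.computers
FROM '/tmp/export.tsv'
WITH (FORMAT text, DELIMITER E'\t', NULL '', ENCODING 'LATIN1');
psql's \copy takes the same options if the file lives client-side (which it will on rds), and ENCODING can move to 'UTF8' once the export side produces clean utf-8 without a BOM.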

abelwingnut fucked around with this message at 00:28 on Mar 11, 2019

abelwingnut
Dec 23, 2002


yea, i just discovered checking unicode defaults to utf-16, not utf-8. so that is certainly part of the problem. reworking that all now and going to see if that helps.

but yes, i do believe the byte order marker is spooking postgres. i get the '0xFF' error, even. not really sure how to avoid sql server spitting that out

abelwingnut
Dec 23, 2002


just came across memSQL. anyone used it?

abelwingnut
Dec 23, 2002


a dingus posted:

How do I query dates from an access server when the server is storing dates as strings in the format 'YYYY-MM-DD 00:00:00.00000'? I am trying to pull a range of dates, sometimes the > or < operators work but other times the call fails.

that’s standard datetime format. i would use:

WHERE date BETWEEN 'oldestdate' AND 'mostrecentdate'

also, since those values are strings, BETWEEN compares them as text--which happens to sort in date order for that zero-padded format--so make sure the upper bound includes the time portion (or use >= and < with the next day). and note that sql engines assume 00:00:00.00000 if no time is given when casting to an actual datetime.

abelwingnut
Dec 23, 2002


lookup or dimension table

abelwingnut
Dec 23, 2002


my god. have any of you had to use snowflake?

:ohno:

abelwingnut
Dec 23, 2002


Tax Oddity posted:

No, but I've heard lots of buzz about it, been meaning to check it out. No good?

i feel like the system underlying it might be worthwhile, but the interface and mechanics of it are so drat buggy. the ui is complete garbo and slow and debilitating, the history section is inaccurate, and there are so many odd limitations in every little thing that you have to find silly, annoying, and costly ways around. its ability to handle tables with trillions upon trillions of rows is great and all, and would certainly bring sql server to its knees, but it's just so unwieldy. i'm constantly fighting the drat thing.

oh, and you can't write sps or schedule any sort of tasks, there's no way to write functions, and you can't send multiple sql statements in a single api call. need to create a table with certain conditions, then insert data, then somehow offload that data? three api calls that are just annoying to cron together on an outside server.

also seems wildly expensive but that's on my client and they seem to have no care for cost so whatever.

abelwingnut fucked around with this message at 15:39 on Aug 12, 2019

abelwingnut
Dec 23, 2002


yea, i have to use node instead of python.

abelwingnut
Dec 23, 2002


if the tables are indexed and joined on those indices properly, then absolutely a JOIN is faster. at worst the JOIN's performance is roughly equal to the subquery. the subquery form, at least in that example, is just more intuitive and readable--that's it.

abelwingnut fucked around with this message at 12:44 on Aug 22, 2019

abelwingnut
Dec 23, 2002


they reached for the subquery first because it's just more natural to think that way. that's how i tend to 'first draft' any such query. basically, the thinking's 'compile a dataset A, then run my query on A to get a resulting dataset B'. it just makes more sense, at least to me. a JOIN isn't natural unless you're constantly thinking in SQL logic and execution plans, which i would imagine only dedicated SQL developers do.

and the bigger the tables, the more helpful indices will be.

abelwingnut
Dec 23, 2002


there is no reason to have the 'sal' on line 2. there's also a chance the percents could gently caress with your code, but that depends on which flavor of sql.

and for completeness in the case, you want an ELSE

abelwingnut fucked around with this message at 22:13 on Sep 23, 2019

abelwingnut
Dec 23, 2002


TheFluff posted:

NOT IN(subquery) is also far harder for the query planner to optimize than NOT EXISTS IIRC, although that might be Postgres-specific? Don't remember.

e: right, at least it used to be that if you selected a nullable column in the subquery the query planner would go bananas. Not sure if that's the case anymore, but as mentioned you gotta make sure you don't get nulls in a NOT IN anyway.

this is true in ms-sql as well.
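
to make the NULL thing concrete (hypothetical tables):

code:
-- if tableB.y contains even one NULL, this returns no rows at all:
SELECT a.*
FROM tableA a
WHERE a.x NOT IN (SELECT b.y FROM tableB b);

-- NOT EXISTS ignores the NULLs and is generally easier for the planner to turn into an anti-join:
SELECT a.*
FROM tableA a
WHERE NOT EXISTS (SELECT 1 FROM tableB b WHERE b.y = a.x);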

abelwingnut
Dec 23, 2002


a bit of a doozy here.

i have a table A that contains a computer, its ip address, and various other fields. i have another table B that contains an ip range, in the form of a.b.c.d/e, and fields about that ip range.

i am trying to join these two tables on the ip address. basically, if an ip address in table A falls into an ip range defined in table B, join the rows. so for instance, let's say i have comp1 at ip 1.2.3.4 in table A. if in table B there's an entry for ip range 1.2.3.0/24, these two should join.

the service i am using is called snowflake. it allows json objects within a normal sql row, so to speak. it also has a function called PARSE_IP which, when i run it on the ip ranges in table B, gives me an object with the start and end ip address for that range, among other information. i can also use a function GET to extract each of those fields within the object.

so given those options, i thought i'd write a join like this, knowing full well it would likely be slow:

code:
SELECT
	blah
FROM
	tablea a
        JOIN tableb b ON
			 a.ipaddress
			 BETWEEN
			 Get(Parse_ip(b.ipaddressrange), 'ipStartAddress')
			 AND
			 Get(Parse_ip(b.ipaddressrange), 'ipEndAddress') 
this, it turns out, is just too drat slow. these tables are massive, the ranges vary, and there's just too much going on for it to be usable, even with snowflake's decent computational speeds.

so next thought was to flatten table B into, essentially, a lookup table. therefore, in the example i wrote above, where 1 row contained ipAddressRange = 1.2.3.0/24 and the other fields, you'd have 256 rows:

1.2.3.0 | other fields for that range
1.2.3.1 | other fields for that range
1.2.3.2 | other fields for that range
...

then perform this on the entire table B, producing a huge lookup, which would then clean that JOIN up into something way simpler and more efficient.

coming from straight sql land where i don't have to deal with inserted json objects (like the idea, but boy howdy is it slow in this case), this feels like the natural solution.

however, i'm wondering if any of you can think of a better way to join these given what i've specified. i'm not entirely sure how to 'flatten' the object like that, especially given i can't write functions in this drat thing. i think i would have to convert the addresses to binary or decimal, then tell it to fill rows from a to b somehow. i don't know, snowflake is a mess.

like would it make more sense to separate the octets into their own columns, and create a join on all four columns? snowflake doesn't use indexing like normal sql. it uses its own clustering ability, which is loving godawful. so i'm trying to battle the inefficiency of their function and their lack of true indexing.

any thoughts are appreciated.

abelwingnut fucked around with this message at 17:40 on Oct 23, 2019

abelwingnut
Dec 23, 2002


i can create tables at will, and i might have some leeway in altering existing tables, but probably not much.

yea, like when i mentioned flattening table B, i meant making a new table C that had one row for every ip within all ranges found in table B. flattening might not be the right word, but uhh...unnesting? squashing? not sure but i hope you catch my drift.

ideally i could add columns with the address stored in binary, decimal, hex in those tables and just compare without having to call the PARSE_IP. that would have to be so much faster. but yea, i don't think i can add those. too many other things feed into them and that'd be a massive rewrite of some scripts.
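
if a new table really is fair game, the least painful version i can picture is materializing the parsed bounds once as plain numbers, so the join never touches PARSE_IP on table B again. a sketch--the 'INET' argument and the ipv4 / ipv4_range_* field names are from the docs as i remember them, so sanity-check them against what PARSE_IP actually spits out for you:

code:
-- one-time build: numeric start/end of every range
CREATE TABLE tablec AS
SELECT
	b.*,
	GET(PARSE_IP(b.ipaddressrange, 'INET'), 'ipv4_range_start')::NUMBER AS range_start,
	GET(PARSE_IP(b.ipaddressrange, 'INET'), 'ipv4_range_end')::NUMBER   AS range_end
FROM tableb b;

-- the join then only has to parse table A's addresses
SELECT
	blah
FROM
	tablea a
	JOIN tablec c ON
		GET(PARSE_IP(a.ipaddress, 'INET'), 'ipv4')::NUMBER
		BETWEEN c.range_start AND c.range_end;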

abelwingnut fucked around with this message at 21:37 on Oct 23, 2019

abelwingnut
Dec 23, 2002


it's on this online service snowflake. it's its own thing. https://www.snowflake.com/

its sql language is largely similar to t-sql or mysql, but you're VERY limited beyond basic dml and basic ddl. like i can't create functions or sprocs or anything like that.

if i go the route of flattening/unnesting/squashing that i mentioned, i'm reading about doing a rowcount cte combined with a number table. i...think i'm wrapping my head around it, at least.

more info here: https://dba.stackexchange.com/questions/59309/how-to-dynamically-generate-rows-based-on-value-in-a-column
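
to sketch the number-table idea from that link in snowflake terms (same caveat as above about the PARSE_IP field names; ROWCOUNT has to be at least as big as the widest range, so 2^25 covers the /7):

code:
CREATE TABLE ip_lookup AS
WITH numbers AS (
	-- 0 .. 33554431
	SELECT ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1 AS n
	FROM TABLE(GENERATOR(ROWCOUNT => 33554432))
),
ranges AS (
	SELECT
		b.*,
		GET(PARSE_IP(b.ipaddressrange, 'INET'), 'ipv4_range_start')::NUMBER AS range_start,
		GET(PARSE_IP(b.ipaddressrange, 'INET'), 'ipv4_range_end')::NUMBER   AS range_end
	FROM tableb b
)
SELECT
	r.*,
	r.range_start + n.n AS ip_as_int
FROM ranges r
JOIN numbers n ON n.n <= r.range_end - r.range_start;
table A would then join on plain equality against ip_as_int, via PARSE_IP's integer field on a.ipaddress or however the address ends up stored.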

abelwingnut fucked around with this message at 21:52 on Oct 23, 2019

abelwingnut
Dec 23, 2002


Just-In-Timeberlake posted:

man that product name is loving on the nose

yes, it loving sucks. i did a project for this same group that took like 4x as long as it should have. like, i had to create a separate loving server, install cron on there, make sure that the js involved with cron didn't interfere with the formatting of the sql, move all of the code into separate files, then, because you can't send more than one sql statement via their api, had to make sure that things triggered successively as they should.

instead of just using a loving stored procedure.

i won't even get into the lack of true indexing and the nebulous hell that is clustering on snowflake.

abelwingnut fucked around with this message at 22:02 on Oct 23, 2019

abelwingnut
Dec 23, 2002


Ruggan posted:

Creating a whole range of rows for every combo sounds very messy though. You could instead take your example of 1.2.3.0/24 and represent it with three columns:

FirstThreeOctets: 1.2.3
FourthOctetFrom: 0
FourthOctetTo: 24

Can you create persisted calculated columns in your tables that represent it in this way? Or even fully separate the octets? This way you can continue to store it the way you do today but facilitate better joins.

the ranges are variable. they range from /7 to /32. yes, it's wild. that one /7 would take up 33MM rows alone. but yea, therefore, i'd have to create a column for each octet. also not sure how leading zeroes would work in the join in that scenario?

also, what do you mean by persisted calculated columns? i can definitely separate the octets, either by a function i found in there or by mathing it down. i have the equations to convert these to decimal, and can also extract that from the object that is created from PARSE_IP.

and yes, creating a whole table of those ranges does sound messy...in any other system. but this platform is built for massive tables. there's one in here (not a table i'm working with, by the way) that's literally 228,995,504,699 rows. that table alone is 24.6TB. and this platform actually is somewhat fast at dealing with it, admittedly.

abelwingnut fucked around with this message at 22:10 on Oct 23, 2019

abelwingnut
Dec 23, 2002


nielsm posted:

If you can modify the schema and get everyone to agree on it, handle the addresses as 32 bit integers and just store the first and last address of each range, and do a simple between check.

yea, i'm fairly sure that is a no-go.

PhantomOfTheCopier posted:

My first two thoughts:

Use PostgreSQL foreign data wrappers and the built in network address type and be done with it. Roops custom service for data, yeah that won't work.

Next, use regular expressions. Netmasks are straightforward to represent as patterns. The issue here is that pattern matching is not a standard column collation, so it'll be slow.

Next thought, the issue is there speed will suck because the join condition is "inside out". IE, you're trying to find a value between two calculated values, and that'll be slow, maybe even O(mn) depending on the database. So, flip the problem around. What if you wrote a big florping case statement that checked all possible netmasks for a given ip? Then you could do a direct equality match with B.

But yeah, if you're doing range checks in a database without geometry or custom types, you really want integers, coupled with some indexes optimized for greater/less-than queries.

Have you tried contacting snowflake? :cop:

and yea, i have to use snowflake. i have told them how awful it is, and that's just not happening.

contacting snowflake is my next step, but wanted to make sure i wasn't missing something.

abelwingnut
Dec 23, 2002


yea, i like that. i'm not sure how i could possibly code that, though. would have to be a series of CASE statements, each computing the binary form, then converting it to the string. or something like that. i'll have to think it through a bit.

abelwingnut
Dec 23, 2002


general sql question:

let's say i have:

code:
SELECT
	*
FROM
	tableA a
	JOIN tableB b ON udfBlah(a.x) = b.y
so in my mind, the way the engine works, it looks at the first row in tableA, evaluates a.x, then goes to tableB, then looks for b.y in the first row. whatever that result may be, true or false, what happens next?

my guess is it has held the result of udfBlah(a.x) in memory so that it doesn't have to reevaluate, and can then move on to row 2 in tableB, evaluate it, then row 3 in tableB, and so forth until it reaches the end of the table B. at that point it then reevaluates udfBlah(a.x) for row 2, then cycles through table B again while holding the new value of udfBlah(a.x) in memory.

do i have that correct, or does it reevaluate udfBlah(a.x) with each evaluation?

abelwingnut
Dec 23, 2002


call me crazy but i have always, always, always avoided triggers. they seem like way more trouble than they're worth, and it seems like they can get out of hand super, super quick. like how would you even start to reverse engineer a data problem stemming from one of them?

abelwingnut
Dec 23, 2002


t-sql here, is it possible to have stored procedures that produce temp tables that are then joined within the scope of a grand stored procedure? like:

--

spOverall
- spProducesTemp1
- spProducesTemp2
- spProducesTemp3

SELECT blah FROM #Temp1 JOIN #Temp2 JOIN #Temp3

<result is update to a non-temp table>

END --spOverall

--

And yes, spProducesTemp2 could use the temp table from spProducesTemp1 to generate #Temp2.
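
fwiw, the wrinkle is scope: a #temp created inside a child proc is dropped the moment that proc returns, while a #temp created in the outer proc is visible to every child proc it calls. so the usual shape is the outer proc creating the tables and the children filling them--rough sketch, all names hypothetical:

code:
CREATE PROCEDURE dbo.spOverall
AS
BEGIN
	-- created here so the child procs and the final join can all see them
	CREATE TABLE #Temp1 (id INT PRIMARY KEY, val1 INT);
	CREATE TABLE #Temp2 (id INT PRIMARY KEY, val2 INT);

	EXEC dbo.spFillTemp1;   -- does INSERT INTO #Temp1 ...
	EXEC dbo.spFillTemp2;   -- can read #Temp1 while filling #Temp2

	UPDATE t
	SET t.someColumn = x.val2
	FROM dbo.RealTable AS t
	JOIN (
		SELECT t1.id, t2.val2
		FROM #Temp1 AS t1
		JOIN #Temp2 AS t2 ON t2.id = t1.id
	) AS x ON x.id = t.id;
END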

abelwingnut fucked around with this message at 00:41 on May 21, 2020

abelwingnut
Dec 23, 2002


thanks, will check out all of this tonight or tomorrow. got a 4k-line single stored procedure that i should probably reduce down to specific jobs!

abelwingnut
Dec 23, 2002


alright, another question.

what sql engine handles decentralization/distribution best? like i could see needing to have this one db i'm working with on 5-10 servers, maybe more. what techniques are best? is there sort of a primer on this? i've only managed centralized databases so this is all new to me.

and yea, i know most versions of nosql are all about this, but the data we're dealing with is pretty straightforward. the structure won't change much once we get it finalized.

or maybe i'm thinking of some form of instant replication? like, i'm imagining having two servers, say in nyc and california, and for prime speed, i'd want west coast users to hit the ca server and east coast users to hit the nyc server. but i'd want to make sure the information inserted into one server is replicated over fairly quickly to the other.

or is it better to just have one centralized server that is juiced up? thinking we'd ultimately have a ton of connections and calls so i'm just trying to lighten the load based on location. or am i overthinking this?

abelwingnut fucked around with this message at 17:45 on Jun 4, 2020

abelwingnut
Dec 23, 2002


is there a good free db or table for all postal/zip/mail codes in the world?

i'm also not sure if this stuff changes? i'm american and really only know this system. it feels like it never changes? maybe the british codes do?

abelwingnut
Dec 23, 2002



blammo, danke.

abelwingnut
Dec 23, 2002


also, quick question.

i have a junction table connecting two other tables. as it stands the columns for the junction table are:

code:
CREATE TABLE junctionAtoB (
	id INT NOT NULL AUTO_INCREMENT,
	idA INT NOT NULL,
	idB INT NOT NULL,
	PRIMARY KEY (id),
	FOREIGN KEY (idA) REFERENCES A(id),
	FOREIGN KEY (idB) REFERENCES B(id)
) ENGINE=INNODB
;
is there a better way of doing this? i always have a row identifier (junctionAtoB.id here), and i always make that the primary key, but i'm seeing that a lot of people don't use those in junction tables. instead they make the primary key (idA, idB). is there an advantage there? should i alter my practice of making the row identifier the primary key and start using multiple columns? in A and B, the single id makes sense as the primary key since that's the main way of joining, so i guess it makes less sense in the junction table, as i'd almost never be searching on its own id. just thinking aloud.
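
for comparison, the composite-key version would look something like this (same hypothetical tables; the extra index on idB covers lookups coming from the B side):

code:
CREATE TABLE junctionAtoB (
	idA INT NOT NULL,
	idB INT NOT NULL,
	PRIMARY KEY (idA, idB),            -- doubles as a uniqueness constraint on the pair
	KEY idxB (idB),                    -- for joins/filters that start from B
	FOREIGN KEY (idA) REFERENCES A(id),
	FOREIGN KEY (idB) REFERENCES B(id)
) ENGINE=INNODB
;
the main practical win is that the composite key stops duplicate A/B pairs for free; with the surrogate id you'd want a separate UNIQUE (idA, idB) anyway.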

furthermore, anything else i should alter about this? do i have the foreign keys like i want? i need to add constraints for cascading and all that, but just wanting to get the primary key and base setup first.

thanks.
