Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
Generic Monk
Oct 31, 2011

For the past couple of jobs now I've been doing ETL stuff with MSSQL. I don't really have any formal training with this stuff so I've pretty much been picking up best practices on the fly and not trying to completely cock stuff up like my colleagues do on the regular.

It's kind of difficult to get people without much knowledge or care for best practices to follow a source control proces without it being braindead simple, so in these jobs I've settled into the routine of 'The create table/procedure/function etc scripts are tracked in Git; if you want to edit an entity in the database you open/edit/run these files so you're not making changes to stuff in the database that then doesn't get recorded in source control'. This works but it's prone to error and feels like a clunky hack.

How do others manage this? It seems like SQL server management studio really isn't remotely interested in the idea of source control or that you would regularly be running external files, but surely you would want to have this stuff in source control. Am I just using the wrong tool? I just found out DataGrip exists which looks really nice, but It's quite possible my workflow is just hosed up.

Adbot
ADBOT LOVES YOU

Generic Monk
Oct 31, 2011

monochromagic posted:

For ETL stuff I'd recommend looking into dbt. There are many similar tools out there, and some might be better for your use case, but start with dbt. It is by far the largest and most well-supported, and seems to fit your needs, and you can run it open source. Also hooks nicely into CI/CD processes so you don't actually have to do the changes manually.

(As an aside - maybe also look into doing ELT instead, it's what all the cool kids are doing)

The business is going to move to data warehousing in Redshift IIRC after this project is done, but this project is just a bunch of stored procedures that take data from 3 source systems and spit out a bunch of CSV files to be imported into a third party's CRM, so dbt is a bit out of scope at this stage. I think the moral is that the whole way we're doing things has been a bit hosed, I'm just trying to make the best out of a bad situation. I do need to learn dbt though; it keeps coming up. I do bounce off the marketing jargon in their website a little though.

Generic Monk
Oct 31, 2011


is it bad? the company is all in on amazon everything including the VPN which takes 2 solid minutes to make a connection and doesn’t remember your credentials, and their citrix/terminal server product on which they can’t figure out how to increase the space from 10GB on the user partition if anyone needs it

Generic Monk
Oct 31, 2011

Lib and let die posted:

I will occasionally tap into ChatGPT when I need my statements to bridge the gap between "clever" and "fancy", and did so this morning when working with some data exported from salesforce (my goal was to take each Account with a type of 'Household' from the Accounts file and populate, in-line, the primary household member (indicated in the one2oneContact field as per usual SF) and all associated members of the household identified by the [AccountId] in the Contacts file - the insert script for this particular item in our database is built around the expectation that we are getting data in the form of [HouseholdID][Member1][Member2][Member3] and so on - suffice to say it needs some serious reworking but that's neither here nor there).

Anyway, chatGPT gave me a fine output. It might not be the most efficient way to do it, but it gives me the results I want. I've got to export the results to excel and do a text to columns then re-import it, but it's still way less annoying than it could be.

My question, ultimately, is with ChatGPT's followup 'notes' or whatever.

What exactly is the "STUFF and FOR XML PATH('') technique" and what is it doing?

I can provide the full output here, but the relevant code bit is this:

code:
SELECT
        pc.AccountID,
        STUFF((SELECT ', ' + CAST(sc.ContactID AS VARCHAR(500))
               FROM SecondaryContacts sc
               WHERE pc.AccountID = sc.AccountID
               FOR XML PATH('')), 1, 2, '') AS SecondaryContactIDs
    FROM PrimaryContacts pc
pc and sc are aliases assigned to CTEs for primary and secondary contacts defined a bit further up in the output, for reference

It's away to work around the lack of a GROUP_CONCAT function (returning the rows of a query result stitched together in one row) in older versions of MS SQL server.

FOR XML in SQL server allows you to return the results of a query as XML elements, i.e. text that is marked up with XML tags. Adding the PATH('') modifier removes the tags and replaces them with nothing so you have all the rows concatted together. You want a comma-separated list of values so each value in the subquery begins with a comma and a space (the SELECT ', ' + part). But you don't want the result to begin with a comma and a space, so the STUFF function is just taking 2 characters of the string, starting from position 1, and replacing them with nothing (the 1, 2, ''' at the end are the first position, 2 characters, and a blank string (nothing) respectively).

It's just an ugly hack to work around the lack of a purpose built function that does that in SQL Server. 2017 and later have the STRING_AGG function which does what you want in a more intuitive and readable way. You may or may not be using this, but the way chatGPT has done it is the 'classic' way that will be all over the internet, hence why it recommended it :)

Generic Monk
Oct 31, 2011

Lib and let die posted:

And there's the problem with ChatGPT: I can use STRING_AGG lmfao

It's a great resource, it just behoves you to sanity check what it tells you. Which is what you're doing, so sounds like you're using it correctly :)

Generic Monk
Oct 31, 2011

e: I'm an idiot and overcomplicating a simple problem LOL

Generic Monk fucked around with this message at 17:35 on Aug 24, 2023

Adbot
ADBOT LOVES YOU

Generic Monk
Oct 31, 2011

GigaPeon posted:

Here's what I was looking at

SQL Prompt: https://www.red-gate.com/products/sql-prompt/ People at the summit I was talking to were big on this one. Saying it saved them 100s of hours. Refactoring and templates and macros. Let's you define or pick a house style. Of course now they're using ChatGPT integration for some stuff. "Make this sp more efficient". Talking to a Redgate guy at an afterparty, he said that they're planning on making it so you can limit training to your own data for security concerns. I've been using their online formatter for a long time (https://www.red-gate.com/website/sql-formatter) so I was interested in the more robust features of this

a few of the older members of my team use this and swear by it. it looks ok but i do balk at the concept of having to pay for plugins to fix the barebones functionality of SSMS. datagrip is in the same ballpark price wise and is generally just way more full featured and integrated; it’s the nicest way i’ve found to write sql.

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply