|
credburn posted:In order to do that, I need to first find out what districts are most common. That should be easy. But then I need to find out what actors most frequently appear in the films being rented by those districts. That's where, conceptually, I get lost. I feel like this is the sort of mundane and typical task that will become second nature in a career setting, but right now I find I'm overwhelmed. Assuming that you are doing things in multiple SQL statements, you might consider something like this: 1. Find out what districts are most common. You say this is easy, so I'll assume it is. 2. Find out, in those districts, what films are being rented. Since your statement was "where the most movies are rented", I'm going to assume here that duplicate rentals should count twice here. 3. Find out, in those rentals, what actors appear the most.
|
# ? Jul 25, 2023 20:07 |
|
|
# ? May 21, 2024 01:57 |
That ER diagram technically seems to be missing an arrow from inventory to store, but that shouldn't be relevant to the question. Intermediate questions: How would you get the actors that appear in the most films? How would you get the actors that appear in the most films that are in inventory for a specific store? How would you find the most rented out films for a specific store? How would you find the single most rented out film for every store? How would you find the most popular actors for the films rented out for a specific store?
|
|
# ? Jul 25, 2023 20:15 |
|
I got hired for a DB Admin role and I have no idea what I'm doing. Oh god.
|
# ? Jul 25, 2023 22:07 |
|
ulmont posted:Assuming that you are doing things in multiple SQL statements, you might consider something like this: nielsm posted:That ER diagram technically seems to be missing an arrow from inventory to store, but that shouldn't be relevant to the question. Thanks so much, gang! This is actually exactly what I was looking for without realizing it!
|
# ? Jul 25, 2023 23:05 |
|
Away all Goats posted:I got hired for a DB Admin role and I have no idea what I'm doing. Oh god. if you're dealing with a small oltp, relatively low traffic database, you probably won't have to do much admin-wise. just reindex now and again, least privilege security, backups, etc. just don't be an ioiot/rock its boat, obviously. cam definitely get hairier so hopefully that's all.
|
# ? Jul 25, 2023 23:09 |
|
abelwingnut posted:if you're dealing with a small oltp, relatively low traffic database, you probably won't have to do much admin-wise. just reindex now and again, least privilege security, backups, etc. just don't be an ioiot/rock its boat, obviously. Thanks, that does make me feel a bit better. It's a small (30~ people) company dealing with healthcare data. But it's my first bigboy job in the industry that isn't an internship/co-op(and they were unrelated to databases) so I'm getting a huge sense of imposter syndrome. Away all Goats fucked around with this message at 00:59 on Jul 26, 2023 |
# ? Jul 26, 2023 00:56 |
|
i haven't worked in healthcare, but i imagine it has its own hurdles. hipaa, probably lots of auditing, etc. can't help you there, but it can't be impossible.
|
# ? Jul 26, 2023 03:57 |
|
Away all Goats posted:I got hired for a DB Admin role and I have no idea what I'm doing. Oh god. You got this! What type(s) of databases are you going to work with? SQL Server? Oracle?
|
# ? Jul 26, 2023 04:31 |
|
Hughmoris posted:You got this! Thanks. They mentioned it was mostly working with PostgreSQL, which I was honest about and said I had no experience in. I mostly worked with Oracle previously. From what I've seen they're pretty similar.
|
# ? Jul 26, 2023 18:43 |
|
You'll do fine. Everyone is as bad or worse than you are. Even the healthcare side of things, despite regulation, is pretty rotten. I am not a pessimist and don't ask me how I know this.
|
# ? Jul 27, 2023 12:42 |
|
Away all Goats posted:Thanks. They mentioned it was mostly working with PostgreSQL, which I was honest about and said I had no experience in. I mostly worked with Oracle previously. From what I've seen they're pretty similar. The SQL syntax is basically identical because the Postgres devs follow what Oracle puts on the spec. They're pretty different still in management, but your main things are going to be the obvious a) find where the backups are, test them, confirm they work, b) work out how to tell how your DB is doing and have it yell at you if it goes out of the regular boundaries.
|
# ? Jul 28, 2023 20:06 |
|
For the past couple of jobs now I've been doing ETL stuff with MSSQL. I don't really have any formal training with this stuff so I've pretty much been picking up best practices on the fly and not trying to completely cock stuff up like my colleagues do on the regular. It's kind of difficult to get people without much knowledge or care for best practices to follow a source control proces without it being braindead simple, so in these jobs I've settled into the routine of 'The create table/procedure/function etc scripts are tracked in Git; if you want to edit an entity in the database you open/edit/run these files so you're not making changes to stuff in the database that then doesn't get recorded in source control'. This works but it's prone to error and feels like a clunky hack. How do others manage this? It seems like SQL server management studio really isn't remotely interested in the idea of source control or that you would regularly be running external files, but surely you would want to have this stuff in source control. Am I just using the wrong tool? I just found out DataGrip exists which looks really nice, but It's quite possible my workflow is just hosed up.
|
# ? Jul 30, 2023 15:41 |
|
Generic Monk posted:For the past couple of jobs now I've been doing ETL stuff with MSSQL. I don't really have any formal training with this stuff so I've pretty much been picking up best practices on the fly and not trying to completely cock stuff up like my colleagues do on the regular. For ETL stuff I'd recommend looking into dbt. There are many similar tools out there, and some might be better for your use case, but start with dbt. It is by far the largest and most well-supported, and seems to fit your needs, and you can run it open source. Also hooks nicely into CI/CD processes so you don't actually have to do the changes manually. (As an aside - maybe also look into doing ELT instead, it's what all the cool kids are doing)
|
# ? Jul 30, 2023 18:15 |
|
monochromagic posted:For ETL stuff I'd recommend looking into dbt. There are many similar tools out there, and some might be better for your use case, but start with dbt. It is by far the largest and most well-supported, and seems to fit your needs, and you can run it open source. Also hooks nicely into CI/CD processes so you don't actually have to do the changes manually. The business is going to move to data warehousing in Redshift IIRC after this project is done, but this project is just a bunch of stored procedures that take data from 3 source systems and spit out a bunch of CSV files to be imported into a third party's CRM, so dbt is a bit out of scope at this stage. I think the moral is that the whole way we're doing things has been a bit hosed, I'm just trying to make the best out of a bad situation. I do need to learn dbt though; it keeps coming up. I do bounce off the marketing jargon in their website a little though.
|
# ? Jul 30, 2023 19:58 |
|
Generic Monk posted:Redshift Wrt dbt marketing lingo I agree - I'd simply go straight to the docs. They have a pretty good sample project as well which showcases common usage patterns. Totally understand making the best of a bad situation, and fwiw it sounds like you're on the right track given the circumstances. But yeah, for levelling up the ETL game in the future I'd recommend dbt, Airbyte and Dagster - but I'm just waffling on about data engineering now and it probably isn't the thread for that.
|
# ? Jul 31, 2023 15:28 |
|
monochromagic posted:but I'm just waffling on about data engineering now and it probably isn't the thread for that. Is there a thread for that?
|
# ? Jul 31, 2023 16:17 |
|
we moved our warehose to redshift from snowflake a few months ago. while it is quicker and better structurally, the web ide is loving awful. it'll run queries up to 300k characters, but it won't save queries longer than 30k, which...what the gently caress? and yes, those numbers are correct. it'll run queries it can't save. baffling. furthermore, it has the dumbest autocomplete/intellisense in the known universe. there's zero in the way of auto-formatting. the lefthand panel containing the schema is a ux nightmare. oh, and it crashes like every tenth query you run. don't go to redshift if you can avoid it! at least snowflake wouldn't crash all the time, even if writing in it is hellish.
|
# ? Jul 31, 2023 17:33 |
|
abelwingnut posted:we moved our warehose to redshift from snowflake a few months ago. while it is quicker and better structurally, the web ide is loving awful. it'll run queries up to 300k characters, but it won't save queries longer than 30k, which...what the gently caress? and yes, those numbers are correct. it'll run queries it can't save. baffling. Are you an analyst, engineer, or administrator for Redshift? Or a combination of all three? From the outside looking in, I often hear that Redshift is a platform that can offer a lot of juice but you have to know what you're doing in terms of modeling and writing queries. I've toyed with it on a few personal projects but my job doesn't use it.
|
# ? Jul 31, 2023 17:38 |
|
Average Lettuce posted:Is there a thread for that? I created one years ago, but there was zero interest. The data engineering subreddit is pretty decent, though.
|
# ? Jul 31, 2023 17:46 |
|
Hughmoris posted:Are you an analyst, engineer, or administrator for Redshift? Or a combination of all three? engineer, but they're trying to shoehorn me into a bunch of analysis and i'm not liking it!
|
# ? Jul 31, 2023 17:51 |
|
Seventh Arrow posted:I created one years ago, but there was zero interest. The data engineering subreddit is pretty decent, though. Yeah, that's my go-to as well.
|
# ? Jul 31, 2023 17:57 |
|
I'm all for using this as a general purpose Database/Analyst/Engineer thread. It's not like we are getting so much traffic that we can't keep up with the topic. Thoughts?
|
# ? Jul 31, 2023 18:00 |
|
i always assumed this thread was just a catch-all for data jobs given there were no other threads at the time? i've got this bookmarked in my control panel and honeslty haven't looked at the subforum's main page since then, which was like...two years ago?, so maybe there are some new threads now. but yea, one thread for data makes a ton of sense to me. analyst, admin, engineer, visualizer/whatever they're called, scientist, anything. abelwingnut fucked around with this message at 18:10 on Jul 31, 2023 |
# ? Jul 31, 2023 18:08 |
|
I'd be open to create an OP for a data thread and see if there's interest. I love talking about that poo poo and am currently a one person data engineering army, so having some input would be welcome. Ofc, there might be decent overlap with this thread as SQL is still king in this area, but I feel that there's a decent amount of stuff that's out of scope for people mostly interested in DB management (love you guys and everything you do).
|
# ? Jul 31, 2023 18:31 |
|
monochromagic posted:I'd be open to create an OP for a data thread and see if there's interest. I love talking about that poo poo and am currently a one person data engineering army, so having some input would be welcome. Ofc, there might be decent overlap with this thread as SQL is still king in this area, but I feel that there's a decent amount of stuff that's out of scope for people mostly interested in DB management (love you guys and everything you do). I think that's a great idea! I've done data analyst work for a while and really want to break into data engineering. Specifically, I (think) I want to become a Databricks engineer so I've been brushing up on that. Any of yous doing Databricks work?
|
# ? Jul 31, 2023 19:41 |
|
Hughmoris posted:I think that's a great idea! If you want to break into data engineering, the things you should prioritize the most are: Python SQL Data Modeling ETL Principals One of the big 3 cloud technologies Databricks has kind of evolved beyond pyspark, but being proficient at pyspark should nevertheless help you get your foot in the door at least.
|
# ? Jul 31, 2023 19:56 |
|
Seventh Arrow posted:If you want to break into data engineering, the things you should prioritize the most are: Thanks for the feedback. I have a good grasp on Python and SQL, with a decent grasp on Data Modeling and ETL. I've picked up an AWS Solutions Architect - Associate cert, and the Azure Data Engineer cert. My current job has a LOT of downtime, and they foot the bill on some training costs (hence the certs). I'm trying to stay productive until I job hop, which could be later this year. Some of the DE-related topics I've been looking at diving into: - Databricks - Snowflake - Redshift/Glue/EMR - Elasticsearch and Kibana (I know the least about these) - Splunk
|
# ? Jul 31, 2023 20:05 |
|
Generic Monk posted:For the past couple of jobs now I've been doing ETL stuff with MSSQL. I don't really have any formal training with this stuff so I've pretty much been picking up best practices on the fly and not trying to completely cock stuff up like my colleagues do on the regular. We use Liquibase for this: https://www.liquibase.org/ Every change, update, or insert for our various DBs is written as a liquibase change set and saved in the relevant git repo. When it comes time for a new version deployment, the whole thing gets run and takes care of any changes needed, with history tracking and all that good stuff. It can take some getting used to for people to write their statements correctly (You have to be good about using IF NOT EXISTS, setting the context for stuff like "run on changes, run always, run in different environments, or you'll get weird poo poo happening), but for any decently sized project it feels like a must. Gwaihir fucked around with this message at 20:24 on Jul 31, 2023 |
# ? Jul 31, 2023 20:17 |
|
Seventh Arrow posted:If you want to break into data engineering, the things you should prioritize the most are: python? you mean the money laundering language?
|
# ? Jul 31, 2023 20:35 |
|
abelwingnut posted:engineer, but they're trying to shoehorn me into a bunch of analysis and i'm not liking it! Do you know why the company switched off Snowflake? It seems the biggest reason I hear about is Snowflake is pretty drat pricey.
|
# ? Jul 31, 2023 20:44 |
|
Hughmoris posted:Do you know why the company switched off Snowflake? It seems the biggest reason I hear about is Snowflake is pretty drat pricey. our new cto is a huge amazon fanboy and has switched over our entire infrastructure to aws. it'd be possible to have snowflake as the warehouse while having glue, dms,, et al do their various jobs and write to snowflake and what not. but if you're already moving over dbt and orchestration over to aws, might as well move to redshift as well. from a usability standpoint, geesh, i don't know which i'd choose. snowflake has its own litany of ui/ux problems. i'd probably choose redshift given aws seems to actually be working on its products. snowflake has barely changed in like 3yr and already feels somewhat over. never used azure so no idea how it compares. i hear it's better, though. abelwingnut fucked around with this message at 20:58 on Jul 31, 2023 |
# ? Jul 31, 2023 20:56 |
|
RandomBlue posted:python? you mean the money laundering language? welcome to coding prison, sucker!
|
# ? Jul 31, 2023 21:07 |
|
monochromagic posted:I'd be open to create an OP for a data thread and see if there's interest. I love talking about that poo poo and am currently a one person data engineering army, so having some input would be welcome. Ofc, there might be decent overlap with this thread as SQL is still king in this area, but I feel that there's a decent amount of stuff that's out of scope for people mostly interested in DB management (love you guys and everything you do). I’d be in for a data thread as well. Transitioning my role from Full stack developer to more of a Data Engineering role so I’m interested in discussing more.
|
# ? Aug 1, 2023 03:55 |
|
Why is Oracle (the company) unpleasant to deal with?
|
# ? Aug 1, 2023 04:31 |
|
A lot of Data Engineering jobs mention python - where should I start? I've done a lot of data related dev stuff, almost all on c#/ms SQL if that helps? Looking at one of the Azure certs mentioned above too.
|
# ? Aug 1, 2023 09:57 |
|
is it bad? the company is all in on amazon everything including the VPN which takes 2 solid minutes to make a connection and doesn’t remember your credentials, and their citrix/terminal server product on which they can’t figure out how to increase the space from 10GB on the user partition if anyone needs it
|
# ? Aug 1, 2023 10:05 |
|
mortarr posted:A lot of Data Engineering jobs mention python - where should I start? I've done a lot of data related dev stuff, almost all on c#/ms SQL if that helps? Looking at one of the Azure certs mentioned above too. Star War Sex Parrot fucked around with this message at 19:27 on Aug 1, 2023 |
# ? Aug 1, 2023 17:31 |
|
Hey gang, I am struggling to join these tables and produce the results I'm looking for. I'm using postgres and this diagram: This is a slight variation from my earlier goal; now what I am trying to accomplish is, I want to find out what the five most common districts are, and in those I want to produce what genre is most popular in each of the five districts. I am able to get the five most common districts pretty simply: code:
code:
code:
Specifically the error I'm getting back when I use this code altogether is: ERROR: missing FROM-clause entry for table "address" LINE 10: JOIN customer ON top_district = address.district
|
# ? Aug 1, 2023 20:10 |
|
The problem is this join: "JOIN customer ON top_districts.district = address.district". One side of the ON has to include the table you want to join to (customer). Also, because it's within the subquery, the address table is not available to the rest of your query unless you join it back on or expose one of its IDs in the subquery.
|
# ? Aug 1, 2023 20:30 |
|
|
# ? May 21, 2024 01:57 |
|
credburn posted:Hey gang, I am struggling to join these tables and produce the results I'm looking for. I'm using postgres and this diagram: a couple things:
e: f;b
|
# ? Aug 1, 2023 20:33 |