|
I could be totally wrong on this, but wouldn't you also have to do a nolock on the SELECT query?
|
# ¿ Aug 10, 2022 02:37 |
|
|
# ¿ May 6, 2024 00:10 |
|
I'm sure this will be kindergarten-level for most of you, but I'm trying to get the lowest salary, along with the department: https://www.db-fiddle.com/f/4EuiV3Pgbd7BTzRR4BKxZC/20 My impression was that if I don't put a GROUP BY, then it should give me a single result with the lowest overall salary, am I mistaken? If I don't put it, I get an error complaining about the lack of a GROUP BY.
|
# ¿ Aug 29, 2022 17:02 |
|
nielsm posted:No, you can't get the smallest like that. I will try that, thank you! Hughmoris posted:Would something like this work for you? I'm just seeing the same thing that I posted
|
# ¿ Aug 29, 2022 17:15 |
|
That works, thanks! I would not have thought of using a subquery, very interesting.
|
# ¿ Aug 29, 2022 17:19 |
|
kumba posted:When using any sort of summarization function, imagine the individual rows you're considering to be sheets of paper and your summarization to be a folder those papers are in. In this example, your names & salaries are all individual sheets of paper and your MIN function is your folder. This is a good analogy, thanks! I will mull it over some.
|
# ¿ Aug 29, 2022 17:31 |
|
I'm trying to work out a SQL problem from a supposed Reddit interview question: https://www.interviewquery.com/interview-guides/reddit-data-analyst (Ad comments) I tried hammering out a solution by populating the tables with some fake data (which is probably wrong): https://www.db-fiddle.com/f/3frdKCCacKFCUUYiL9dJj7/12 but now I'm not even sure I can make sense of the question. The idea is that you're supposed to calculate a percentage of comments for each ad in each of the tables. But isn't there going to be a one-to-one relationship between the amount of comments on given ad and the instances of the ad ID? In other words, every time someone comments on "polo shirts", then surely the ad ID for "polo shirts" is going to be there too, making the ratio 100%(?) It's a bit tricky because they don't provide a sampling of data, just the schemas. In completely unrelated news, I have a data engineer interview with Reddit this week
|
# ¿ Sep 11, 2022 20:06 |
|
Yes, I was kind of wondering about that, since there only seemed to be one "instance" of each ad. How would you calculate the percentage, then? Just assume that "ad" is "10" and do something like "SELECT COUNT comment_id/10 * 100..."?
|
# ¿ Sep 11, 2022 20:32 |
|
Yeah I admittedly usually bomb these SQL/Python interview tests but I'm not getting to work with this stuff in my current job anymore, so I might as well try. Also, whenever possible I try to copy/paste the interview challenge and analyze it later so I can get better at this stuff.
|
# ¿ Sep 12, 2022 14:55 |
|
Is there a way to prevent an aliased column from returning nulls? Here's the query that I've come up with so far: code:
It's not liking the 'where' clause, though. I halfway suspect that I should be using LAG's 'default value' instead, but I'm not quite sure how to use it. Here are the requirements, if you're curious: quote:Let’s see how temperature has changed each year in each state. quote:Hint
|
# ¿ Sep 22, 2022 20:35 |
|
Thanks, I will look into that!
|
# ¿ Sep 22, 2022 22:00 |
|
Yes you're right and I was just looking at that in my SQL cheat sheet yesterday: • When we want to limit the results of a query based on values of the individual rows, use WHERE. • When we want to limit the results of a query based on an aggregate property, use HAVING. HAVING statement always comes after GROUP BY, but before ORDER BY and LIMIT. and of course I immediately forgot about it right when it could've been useful
|
# ¿ Sep 23, 2022 00:03 |
|
Perhaps StrataScratch?
|
# ¿ Oct 4, 2022 03:36 |
|
I'm still a newbie data engineer but what kind of metadata governance are we talking about? In the gigs I've been in, there's always been a data governance team that makes the decisions, which they hand down to the DE's. Is that not the case here? Are they looking for a software solution to handle metadata? There could be a lot of facets to data governance, like the handling of personal information, dimensional modeling, etc.
|
# ¿ Oct 6, 2022 03:14 |
|
I'm getting murdered in data engineering interviews because I've let whatever SQL skills I had atrophy over time. I've noticed that when I look at a problem my brain tends to freeze at first and then I tend to overthink everything. I start thinking things like "maybe I'll need to filter, so a subquery might be required..." or "maybe I'll need a CTE instead" and then it kind of snowballs from there. I got whammied by a "top x of y" problem (i.e. "top earners in the department") because I wasn't sure what kind of tool to approach it with. Should I just be grinding StrataScratch, or is there a more strategic way to approach this?
|
# ¿ Mar 29, 2023 20:50 |
|
DELETE CASCADE posted:select * from (... rank() over (partition by department order by earnings desc) ...) where rank >= x Thanks - really, I think I need to just keep practicing and doing challenges that will let me start to recognize patterns and their solutions. This article helped somewhat.
|
# ¿ Apr 8, 2023 14:28 |
|
Average Lettuce posted:Is there a thread for that? I created one years ago, but there was zero interest. The data engineering subreddit is pretty decent, though.
|
# ¿ Jul 31, 2023 17:46 |
|
Hughmoris posted:I think that's a great idea! If you want to break into data engineering, the things you should prioritize the most are: Python SQL Data Modeling ETL Principals One of the big 3 cloud technologies Databricks has kind of evolved beyond pyspark, but being proficient at pyspark should nevertheless help you get your foot in the door at least.
|
# ¿ Jul 31, 2023 19:56 |
|
RandomBlue posted:python? you mean the money laundering language? welcome to coding prison, sucker!
|
# ¿ Jul 31, 2023 21:07 |
|
mortarr posted:A lot of Data Engineering jobs mention python - where should I start? I've done a lot of data related dev stuff, almost all on c#/ms SQL if that helps? Looking at one of the Azure certs mentioned above too. Our humble SA thread https://forums.somethingawful.com/showthread.php?threadid=3812541 https://www.reddit.com/r/inventwithpython/ https://www.reddit.com/r/learnpython/ https://www.reddit.com/r/Python/ https://www.reddit.com/r/PythonNoobs/ Books Python Crash Course Automate the Boring Stuff with Python Think Python Youtube Indently Tech With Tim mCoding ArjanCodes Python Simplified Online Learning Codecademy Real Python The Official Guide DataCamp Discussion Official Python IRC Discord Brain-busters LeetCode Practice Python HackerRank Edabit
|
# ¿ Aug 1, 2023 22:37 |
|
lazerwolf posted:I’d be in for a data thread as well. Transitioning my role from Full stack developer to more of a Data Engineering role so I’m interested in discussing more. A Data Engineering thread is finally upon us, huzzah! https://forums.somethingawful.com/showthread.php?threadid=4050611&pagenumber=1#lastpost thanks to monochromagic!
|
# ¿ Dec 29, 2023 15:05 |
|
I like Codecademy a lot for learning. They have an interactive query window and a course that will take you from basic select queries to calculating churn. Of course, once you're past the free period you have to fork over to continue. I think it's worth it though, given the amount of stuff you can learn on their platform. YMMV though. If you need free stuff, then on youtube there's Ankit Bansal and Alex Freberg. There's also learnsql.com, and sql-practice.com for when you have enough under your belt to start practicing with some challenges.
|
# ¿ Mar 4, 2024 06:02 |
|
|
# ¿ May 6, 2024 00:10 |
|
"And that, my son, is when mankind discovered time travel."
|
# ¿ Mar 12, 2024 03:54 |