Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
Seventh Arrow
Jan 26, 2005

I could be totally wrong on this, but wouldn't you also have to do a nolock on the SELECT query?

Adbot
ADBOT LOVES YOU

Seventh Arrow
Jan 26, 2005

I'm sure this will be kindergarten-level for most of you, but I'm trying to get the lowest salary, along with the department:

https://www.db-fiddle.com/f/4EuiV3Pgbd7BTzRR4BKxZC/20

My impression was that if I don't put a GROUP BY, then it should give me a single result with the lowest overall salary, am I mistaken? If I don't put it, I get an error complaining about the lack of a GROUP BY.

Seventh Arrow
Jan 26, 2005

nielsm posted:

No, you can't get the smallest like that.
Order by the salary ascending (smallest first), and get the first one row.

I will try that, thank you!



I'm just seeing the same thing that I posted :confused:

Seventh Arrow
Jan 26, 2005


That works, thanks! I would not have thought of using a subquery, very interesting.

Seventh Arrow
Jan 26, 2005

kumba posted:

When using any sort of summarization function, imagine the individual rows you're considering to be sheets of paper and your summarization to be a folder those papers are in. In this example, your names & salaries are all individual sheets of paper and your MIN function is your folder.

As soon as you try to calculate that function, imagine writing the answer on the outside of that folder and closing it shut - now you only have access to what you wrote on the folder and cannot look at any of the details inside it. This is why your query fails, as you're trying to return both aggregated information and non-aggregated information. Your query will run just fine if the only thing you try to select is the minimum salary without the details.

If you need to get the department along with the salary, you can use a subquery, select TOP 1 and order by the salary, or you can look into CTEs for more advanced stuff

This is a good analogy, thanks! I will mull it over some.

Seventh Arrow
Jan 26, 2005

I'm trying to work out a SQL problem from a supposed Reddit interview question: https://www.interviewquery.com/interview-guides/reddit-data-analyst (Ad comments)

I tried hammering out a solution by populating the tables with some fake data (which is probably wrong):

https://www.db-fiddle.com/f/3frdKCCacKFCUUYiL9dJj7/12

but now I'm not even sure I can make sense of the question. The idea is that you're supposed to calculate a percentage of comments for each ad in each of the tables. But isn't there going to be a one-to-one relationship between the amount of comments on given ad and the instances of the ad ID? In other words, every time someone comments on "polo shirts", then surely the ad ID for "polo shirts" is going to be there too, making the ratio 100%(?)

It's a bit tricky because they don't provide a sampling of data, just the schemas. In completely unrelated news, I have a data engineer interview with Reddit this week :tif:

Seventh Arrow
Jan 26, 2005

Yes, I was kind of wondering about that, since there only seemed to be one "instance" of each ad. How would you calculate the percentage, then? Just assume that "ad" is "10" and do something like "SELECT COUNT comment_id/10 * 100..."?

Seventh Arrow
Jan 26, 2005

Yeah I admittedly usually bomb these SQL/Python interview tests but I'm not getting to work with this stuff in my current job anymore, so I might as well try. Also, whenever possible I try to copy/paste the interview challenge and analyze it later so I can get better at this stuff.

Seventh Arrow
Jan 26, 2005

Is there a way to prevent an aliased column from returning nulls?

Here's the query that I've come up with so far:

code:
select state, year, tempc,
tempc - lag(tempc) over (partition by state order by year) as change_in_temp
from state_climate
where change_in_temp is not null;
https://www.db-fiddle.com/f/kgQLu7Mcg2DqEXEUxYfGAQ/11

It's not liking the 'where' clause, though. I halfway suspect that I should be using LAG's 'default value' instead, but I'm not quite sure how to use it.

Here are the requirements, if you're curious:

quote:

Let’s see how temperature has changed each year in each state.

Write a query to select the same columns but now you should write a window function that returns the change_in_temp from the previous year (no null values should be returned).

Which states and years saw the largest changes in temperature?
Is there a particular part of the United States that saw the largest yearly changes in temperature?

quote:

Hint

This one is a little tricky! In order to answer the questions you need to:

find the difference in temperature from the previous year using subtraction and LAG
you need all three parameters that LAG accepts in order to look to the previous row and never return a null
PARTITION BY state and ORDER BY year
lastly your entire results set needs to use ORDER BY change_in_temp (use ASC or DESC!)

Seventh Arrow
Jan 26, 2005

Thanks, I will look into that!

Seventh Arrow
Jan 26, 2005

Yes you're right and I was just looking at that in my SQL cheat sheet yesterday:

• When we want to limit the results of a query based on values of the individual rows, use WHERE.
• When we want to limit the results of a query based on an aggregate property, use HAVING.
HAVING statement always comes after GROUP BY, but before ORDER BY and LIMIT.

and of course I immediately forgot about it right when it could've been useful :cripes:

Seventh Arrow
Jan 26, 2005

Perhaps StrataScratch?

Seventh Arrow
Jan 26, 2005

I'm still a newbie data engineer but what kind of metadata governance are we talking about? In the gigs I've been in, there's always been a data governance team that makes the decisions, which they hand down to the DE's. Is that not the case here? Are they looking for a software solution to handle metadata? There could be a lot of facets to data governance, like the handling of personal information, dimensional modeling, etc.

Seventh Arrow
Jan 26, 2005

I'm getting murdered in data engineering interviews because I've let whatever SQL skills I had atrophy over time. I've noticed that when I look at a problem my brain tends to freeze at first and then I tend to overthink everything. I start thinking things like "maybe I'll need to filter, so a subquery might be required..." or "maybe I'll need a CTE instead" and then it kind of snowballs from there. I got whammied by a "top x of y" problem (i.e. "top earners in the department") because I wasn't sure what kind of tool to approach it with. Should I just be grinding StrataScratch, or is there a more strategic way to approach this?

Seventh Arrow
Jan 26, 2005

DELETE CASCADE posted:

select * from (... rank() over (partition by department order by earnings desc) ...) where rank >= x

common interview question because some databases have special syntax for this and the interviewer wants to show off that he knows it. definitely a realistic problem that does show up in real work, tho


Thanks - really, I think I need to just keep practicing and doing challenges that will let me start to recognize patterns and their solutions. This article helped somewhat.

Seventh Arrow
Jan 26, 2005

Average Lettuce posted:

Is there a thread for that?

I created one years ago, but there was zero interest. The data engineering subreddit is pretty decent, though.

Seventh Arrow
Jan 26, 2005

Hughmoris posted:

I think that's a great idea!

I've done data analyst work for a while and really want to break into data engineering. Specifically, I (think) I want to become a Databricks engineer so I've been brushing up on that. Any of yous doing Databricks work?

If you want to break into data engineering, the things you should prioritize the most are:

Python
SQL
Data Modeling
ETL Principals
One of the big 3 cloud technologies

Databricks has kind of evolved beyond pyspark, but being proficient at pyspark should nevertheless help you get your foot in the door at least.

Seventh Arrow
Jan 26, 2005

RandomBlue posted:

python? you mean the money laundering language?

welcome to coding prison, sucker!

Seventh Arrow
Jan 26, 2005

mortarr posted:

A lot of Data Engineering jobs mention python - where should I start? I've done a lot of data related dev stuff, almost all on c#/ms SQL if that helps? Looking at one of the Azure certs mentioned above too.

Our humble SA thread

https://forums.somethingawful.com/showthread.php?threadid=3812541

Reddit

https://www.reddit.com/r/inventwithpython/
https://www.reddit.com/r/learnpython/
https://www.reddit.com/r/Python/
https://www.reddit.com/r/PythonNoobs/

Books

Python Crash Course
Automate the Boring Stuff with Python
Think Python

Youtube

Indently
Tech With Tim
mCoding
ArjanCodes
Python Simplified

Online Learning

Codecademy
Real Python
The Official Guide
DataCamp

Discussion

Official Python IRC
Discord

Brain-busters

LeetCode
Practice Python
HackerRank
Edabit

Seventh Arrow
Jan 26, 2005

lazerwolf posted:

I’d be in for a data thread as well. Transitioning my role from Full stack developer to more of a Data Engineering role so I’m interested in discussing more.

A Data Engineering thread is finally upon us, huzzah!

https://forums.somethingawful.com/showthread.php?threadid=4050611&pagenumber=1#lastpost

thanks to monochromagic!

Seventh Arrow
Jan 26, 2005

I like Codecademy a lot for learning. They have an interactive query window and a course that will take you from basic select queries to calculating churn. Of course, once you're past the free period you have to fork over to continue. I think it's worth it though, given the amount of stuff you can learn on their platform. YMMV though.

If you need free stuff, then on youtube there's Ankit Bansal and Alex Freberg. There's also learnsql.com, and sql-practice.com for when you have enough under your belt to start practicing with some challenges.

Adbot
ADBOT LOVES YOU

Seventh Arrow
Jan 26, 2005

"And that, my son, is when mankind discovered time travel."

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply