|
Roundboy posted:But I think I did figure it out. You need to assign default rights as the owner of the table. My unique case is that I have a dozen or more potential owners. 100% of the use cases I find on the Google deal with pure universal read-only users (I only want this schema) or single-owner scenarios.

But it feels like there's a bug here. If alice and bob are members of schema_users, and schema_users is granted permission to create objects within the schema, then alice can create a table only through an inherited privilege. Why then should alice's default privileges apply to the new object? If schema_users were NOINHERIT, then alice could SET ROLE first and create the table, which would then be owned by schema_users directly and carry schema_users' default privileges.

I suspect the actual answer to your question is: it's a staging area for table playing, so does per-user authority truly matter? Typical role membership is granted to classes of users; administrators have equal power to destroy the database, and everyone else falls into the read-only bucket. Individuals requiring full privilege isolation are typically granted CREATEDB or otherwise given a database of their own that is inaccessible to the public. Per-user settings are at least possible but, given this (historical) expectation for access control, it's unsurprising that the desired granularity is rather difficult to obtain. Adopt the convention that each table named "username" is owned by "username" and tell your users to be friendly to one another (and that they should have backups). First offense, forgive; second offense, "bye bye this is why we can't have nice things build your own staging area with sqlite3".

For others'/future reference, there is a rather thorough walkthrough of layered privileges at https://dba.stackexchange.com/questions/117109/how-to-manage-default-privileges-for-users-on-a-database-vs-schema

Edit: Suppose such a per-user default was implemented. If the user was not "trustworthy", they could simply revoke all as soon as they create an object, so there's no way to enforce the SELECT privilege for others. PhantomOfTheCopier fucked around with this message at 13:41 on Jun 8, 2017 |
# ? Jun 8, 2017 04:59 |
|
|
|
In database theory, we say that a good primary key is stable: its value should not change. I'm currently trying to design a database with mostly natural keys, but most of the candidates have the possibility of changing. It just doesn't seem like a big deal to me that, say, a user might change their email address and cascade the change to the other tables. Is there something I'm missing about the dangers of primary key instability?
|
# ? Jun 8, 2017 13:12 |
|
rt4 posted:In database theory, we say that a good primary key is one that is stable, that its value should not change. I'm currently trying to design a database with mostly natural keys, but most of the candidates have the possibility of changing. It just doesn't seem like a big deal to me that, say, a user might change its email address and cascade the change to the other tables. Is there a reason you want to use natural primary keys over INT IDENTITY? Primary key instability can break FK relationships, and if your PK is also your clustered index, it can cause lots of reorganization of the index as values change.
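To make the churn concrete, here's a minimal sqlite3 sketch (the table and column names are invented for the demo): with a natural key and ON UPDATE CASCADE, one email change rewrites every referencing row, which is exactly the work a surrogate key avoids.

```python
import sqlite3

# Hypothetical schema: users keyed by email (a natural key), with a
# child table referencing it. ON UPDATE CASCADE keeps the FK intact,
# but at the cost of rewriting every referencing row on a key change.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE users (
        email TEXT PRIMARY KEY
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        email    TEXT NOT NULL
                 REFERENCES users(email) ON UPDATE CASCADE
    );
""")
con.execute("INSERT INTO users VALUES ('alice@old.example')")
con.executemany("INSERT INTO orders(email) VALUES (?)",
                [("alice@old.example",)] * 3)

# One "small" profile edit fans out to all child rows.
con.execute("UPDATE users SET email = 'alice@new.example'")
rewritten = con.execute(
    "SELECT COUNT(*) FROM orders WHERE email = 'alice@new.example'"
).fetchone()[0]
print(rewritten)  # 3
```

With a surrogate INT key, the same edit would touch exactly one row and leave the key's indexes alone.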
|
# ? Jun 8, 2017 16:00 |
|
Given two tables, tableleft and tableright, with a left join, would it be possible that the resulting join has more rows than tableleft? Edit: Should have just tested it. Looks like it would only be an issue when joining on a column that doesn't have unique values. huhu fucked around with this message at 16:00 on Jun 12, 2017 |
# ? Jun 12, 2017 15:57 |
|
huhu posted:Given two tables, tableleft and tableright, with a left join, would it be possible that the resulting join has more rows than tableleft? absolutely code:
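A minimal sqlite3 demonstration of the effect (toy tables, with the non-unique join key deliberately on the right side): a left join returns one row per match, so duplicate keys on the right multiply the left side's rows.

```python
import sqlite3

# Two toy tables: tableright has duplicate join keys, so the
# LEFT JOIN produces more rows than tableleft itself has.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tableleft  (k INTEGER);
    CREATE TABLE tableright (k INTEGER);
    INSERT INTO tableleft  VALUES (1), (2);
    INSERT INTO tableright VALUES (1), (1), (1);  -- non-unique key
""")
left_rows = con.execute("SELECT COUNT(*) FROM tableleft").fetchone()[0]
join_rows = con.execute("""
    SELECT COUNT(*)
    FROM tableleft l
    LEFT JOIN tableright r ON l.k = r.k
""").fetchone()[0]
print(left_rows, join_rows)  # 2 4
```

Row k=1 matches three times, row k=2 matches nothing and survives as a single NULL-extended row: two left rows become four output rows.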
e2: im having to pass captchas to post in here. what the gently caress e3: to get a screenshot of this captcha, gently caress u cloudflare kumba fucked around with this message at 16:10 on Jun 12, 2017 |
# ? Jun 12, 2017 16:06 |
|
Uggh is this one of those problems solved by single passes in sql or do I have to deal with temp tables? I have a table of addresses that I have determined to be dupes. I am using a ranking function to pick the 'winner' of the dupe pair. code:
code:
code:
|
# ? Jun 13, 2017 14:50 |
|
Roundboy posted:Uggh is this one of those problems solved by single passes in sql or do I have to deal with temp tables? If I understand what you're trying to do correctly, something like this?

;with CTE (main_id, winner_id, loser_id, row_number) as (
    select main_id, winner_id, loser_id,
           row_number() over (partition by whatever order by things)
    from myTable
    where clause
)
SELECT main_id, winner_id, loser_id
FROM CTE
WHERE winner_id NOT IN (select loser_id from CTE)
  and otherClauses
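A runnable sketch of that row_number-in-a-CTE idea, using sqlite3 (window functions exist there since 3.25) and made-up address data. This is the simplified variant that just keeps the first-ranked row per group rather than the NOT IN filter above:

```python
import sqlite3

# Rank duplicate address rows inside each group, then keep only the
# first-ranked "winner". Table and column names are invented.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE addresses (id INTEGER PRIMARY KEY, street TEXT);
    INSERT INTO addresses VALUES
        (1, '1 main st'), (2, '1 main st'),   -- dupe pair
        (3, '9 elm ave');
""")
winners = con.execute("""
    WITH ranked AS (
        SELECT id, street,
               ROW_NUMBER() OVER (PARTITION BY street ORDER BY id) AS rn
        FROM addresses
    )
    SELECT id, street FROM ranked WHERE rn = 1 ORDER BY id
""").fetchall()
print(winners)  # [(1, '1 main st'), (3, '9 elm ave')]
```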
|
# ? Jun 13, 2017 15:01 |
|
Roundboy posted:[snip] code:
|
# ? Jun 13, 2017 18:43 |
|
dupes come in pairs because i can only find them with a self join a la: code:
the correct output for a situation like this : code:
Self table joins are the only way I am aware of to check for duplicate info because of a problem in our data, and the id is unique and not based on actual values. Typically I resolve the dupe pair like what was posted above, e.g. id1 < id2 or rank. Using where not in eliminated all records since it's on both sides of the equation. Taking a step back, is there a better way or is it just a slog through it? Edit: I think I got it. I switched to using rank() vs row_number() because I actually wanted it to rank, and adjusted my partition by because it was grabbing too much. The ultimate result was that there were the same 3 pairs of groups as above, but the ones I wanted were all rnk = 1, which I can just select for in the CTE. Roundboy fucked around with this message at 19:18 on Jun 13, 2017 |
# ? Jun 13, 2017 19:03 |
|
Seems like given your structure, doing like you said and having a "WHERE winner_id < loser_id " would do the trick. That's how I've solved things like that in the past.
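The "winner_id < loser_id" trick in miniature (sqlite3, invented data): the inequality makes each unordered dupe pair appear exactly once instead of as both (a, b) and (b, a).

```python
import sqlite3

# A self join on the duplicated value finds each dupe pair; adding
# a.id < b.id keeps only one ordering of every pair.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE addr (id INTEGER PRIMARY KEY, street TEXT);
    INSERT INTO addr VALUES (1, 'x'), (2, 'x'), (3, 'y');
""")
pairs = con.execute("""
    SELECT a.id AS winner_id, b.id AS loser_id
    FROM addr a
    JOIN addr b ON a.street = b.street
               AND a.id < b.id          -- each pair reported once
""").fetchall()
print(pairs)  # [(1, 2)]
```

Without the inequality (using a.id <> b.id instead), the same pair would come back twice, once in each order.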
|
# ? Jun 13, 2017 19:10 |
|
Roundboy posted:dupes come in pairs because i can only find them with a self join a la: code:
|
# ? Jun 13, 2017 20:09 |
|
kumba posted:Seems like given your structure, doing like you said and having a "WHERE winner_id < loser_id " would do the trick. That's how I've solved things like that in the past. Yeah it's probably a bad thing to do but I've used the incremental nature of synthetic IDs like this as a shorthand for chronology/first-in in the past as well.
|
# ? Jun 13, 2017 20:29 |
|
zen death robot posted:FYI this thread sets off the WAF for a capcha check because of all the sql code in it. The best I can do is whitelist your IP if it becomes an issue. Just hit me up in PMs and let me know. hahaha holy poo poo this owns i am not trying to inject anything in your database zdr i promi */ GO DROP TABLE dbo.Users; GO
|
# ? Jun 14, 2017 20:02 |
|
Huh, I didn't even think of what that kind of stuff would trigger, I think I'm going to go post SELECT * FROM CREDITCARDS in all the threads I don't like
|
# ? Jun 15, 2017 01:26 |
|
Hello, friends, I haven't done any SQL since my undergraduate days, but I was recently asked some in an interview and messed up some straightforward questions. I decided to work through HackerRank problems to relearn/teach myself SQL, and it's pretty easy to pick up once you work through examples, or rather the basic queries are! One of the questions was to write a query to do the following: return the maximum earnings an employee could make, computed by multiplying salary x months (two separate fields), then on the same line also return the number of employees who had that maximum. My attempts were like this and ofc didn't work code:
code:
e: this is MySQL btw!
|
# ? Jun 15, 2017 15:12 |
|
That first line is returning earnings as salary * months, as well as the number of records with that level of earnings (because of the 2nd line). That GROUP BY clause is effectively putting the different earnings levels into buckets, and the COUNT(*) is giving you the number of records in each bucket. The last line is ordering by earnings descending to give you the highest earners first, and the limit 1 means you're only getting results for the highest earnings amount found. So for example, if your data looked like: code:
code:
code:
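Pulling that explanation together into a runnable form (sqlite3 standing in for MySQL, sample data invented): bucket employees by salary * months, order the buckets descending, keep the top one.

```python
import sqlite3

# Each row is one employee. Earnings = salary * months; GROUP BY
# buckets the earnings levels, COUNT(*) sizes each bucket, and
# ORDER BY ... DESC LIMIT 1 keeps only the top bucket.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employee (salary INTEGER, months INTEGER);
    INSERT INTO employee VALUES
        (1000, 12), (2000, 6), (3000, 4),   -- all earn 12000
        (500, 10);                          -- earns 5000
""")
row = con.execute("""
    SELECT salary * months AS earnings, COUNT(*)
    FROM employee
    GROUP BY earnings
    ORDER BY earnings DESC
    LIMIT 1
""").fetchone()
print(row)  # (12000, 3)
```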
|
# ? Jun 15, 2017 15:29 |
|
This makes a tonne of sense and really clears up what was going on, thanks so so much!
|
# ? Jun 15, 2017 17:22 |
|
Is this the right thread for ETL questions? I'm in the early stages of needing to build a data warehouse and have been messing around with ETL tools, namely Pentaho and Talend. So far it feels like they're built with the assumption that your production DB and data sources are designed with some sort of sanity behind them. In order to put together a row in one of my dimension or fact tables I usually need to go through a process like code:
My other thought was to do something like code:
|
# ? Jun 16, 2017 03:46 |
|
Portland Sucks posted:Is this the right thread for ETL questions? That definitely does sound like something Pentaho should be able to do with a couple of lookups, filters and such. If you can give me a more detailed overview of what you have and what your goal is, I can try to help. Pentaho can be very finicky at times, especially with data warehousing.
|
# ? Jun 16, 2017 06:10 |
|
I have been poking around a lot lately regarding the logic of backing MySQL's temporary data location with a RAM disk. I have spent a lot of time reading up on temporary table creation, both in memory and on disk. In the instances I manage I am seeing configurations where we are looking at a 50% split between on-disk and in-memory temp tables; in particular, one application is consistently doing massive joins every few minutes that all go to disk. My logic here is that if those spills went to "disk" that is actually backed by RAM, they should complete a lot faster. This seems to be the generally accepted outcome based on every "improve MySQL performance" article I have read so far. The only real concern I can see is what might occur should a query consume the memory assigned to the RAM disk; provided you have allocated the machine's overall memory sensibly, I see no concern with exhausting it. My bigger point, and something I have been looking for but not seeing a lot on: are there good solid reasons why I should find some other way to deal with these scenarios rather than jumping straight onto the "back it with RAM" bandwagon? I am happy to go either way, but I can't help wanting to hear the flip side of the coin from those who would not suggest such a thing.
|
# ? Jun 19, 2017 03:02 |
|
There is a setting or two that deals with exactly how much of the operation will take place in memory, after which point it will dump to disk and continue. That spill is a very costly proposition. You can set this for joins, groups, etc., but it is typically pretty small and it's set per operation per connection, so simply setting it to a huge number will kill your box. It's possible to set it right before a particular run and then put it back. With a RAM disk, I would imagine you are using RAM directly, and then the "disk" dump goes to more RAM, but there is still a transfer. You may want to set your per-operation limit lower than usual, so these big runs hit "disk" right away, but I have very little experience with this, and it seems not many people have done it. I had issues with my low-use MySQL box chugging on big asks with a moderate dataset; that, along with other analytical shortfalls, drove me to Postgres.
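For reference, the knobs being alluded to look roughly like this in my.cnf. The variable names are real MySQL system variables (tmp_table_size and max_heap_table_size jointly cap in-memory temp tables, join_buffer_size is allocated per join, tmpdir is where spills land), but the values below are placeholders, not recommendations:

```ini
[mysqld]
# In-memory temp tables may grow to min(tmp_table_size,
# max_heap_table_size) before spilling to tmpdir on disk.
tmp_table_size      = 64M
max_heap_table_size = 64M
# Allocated per join, per connection: easy to over-provision.
join_buffer_size    = 4M
# Where on-disk temp tables are written, e.g. a RAM-backed mount.
tmpdir              = /mnt/ramdisk
```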
|
# ? Jun 19, 2017 03:38 |
|
This seems strange. As RAM is still considered "expensive", the usual approach is to configure the database to ensure near-maximum memory availability (and consumption) by the workers; there should be very little space left in which to create a ramdisk. Environments with continuous, complex workloads are moving to NVMe, but those looking to reduce working latencies typically make do by switching some portion of storage to general SSDs. PostgreSQL is notably better if you have more than a simple join, but still... Are you quite sure the database is configured correctly?
|
# ? Jun 19, 2017 05:36 |
|
Roundboy posted:There is a setting or two that deals with exactly how much of the transaction will take place in memory, and after which point, it will dump to disk and continue. This is a very costly proposition. The current value for this on the configured machine is actually quite large, ironically. PhantomOfTheCopier posted:This seems strange. As RAM is still considered "expensive", the usual approach is to configure the database to ensure near-maximum memory availability (and consumption) by the workers. There should be very little space in which to create a ramdisk. Environments with continuous, complex workloads are moving to NVMe, but those looking to reduce working latencies typically make do by switching some portion of storage to general SSDs. PostgreSQL is notably better if you have more than a simple join, but still... Most of the workloads we run (and ever really see the need to even consider tuning, in most cases) are all the same. This question comes more as a result of "do we allow these queries, which are not that massive, to run in the 512MB of space we have available in /tmp, or specifically send them to actual disk?" PhantomOfTheCopier posted:Are you quite sure the database is configured correctly? I believe the one I am looking at is; in fact I find it far more likely it's the queries being run, given what I have seen. It's join after join with no indexes, processes constantly waiting on query_cache locks, and so on. Again, this was largely for my own interest as I poke deeper into MySQL beyond "I killed this stuck process for you.".
|
# ? Jun 19, 2017 08:50 |
|
Question about PostgreSQL streaming replication and WAL archiving. The documentation states:quote:If archive_mode is set to on, the archiver is not enabled during recovery or standby mode. If the standby server is promoted, it will start archiving after the promotion, but will not archive any WAL it did not generate itself. Does that mean that after a promotion the archive will look like [last archived master server segment] [missing segment] [first segment archived from new master] etc, and it won't be possible to recover from such an archive because there is a gap? The alternative is to set archive_mode=always, in which case we either archive each standby independently under a different prefix, multiplying the archive's size by the number of standbys running, or use a shared archive: quote:The shared archive can be handled similarly, but the archive_command must test if the file being archived exists already, and if the existing file has identical contents. This requires more care in the archive_command, as it must be careful to not overwrite an existing file with different contents, but return success if the exactly same file is archived twice. And all that must be done free of race conditions, if two servers attempt to archive the same file at the same time.
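As a sketch of what that shared-archive requirement implies, an archive step could look like the following. The wrapper, directory variable, and wiring are all assumptions, not from the docs, and this is not race-free across hosts (which the docs also warn about); a real setup would add locking or an atomic rename on the shared storage.

```shell
#!/bin/sh
# Sketch of a shared-archive-safe archive step: never overwrite an
# existing file with different contents, but succeed if the exact
# same segment was already archived. $ARCHIVE_DIR is a placeholder
# for the shared archive mount, set by the caller.
archive_wal() {
    src="$1"                       # full path to the WAL segment (%p)
    name="$2"                      # segment file name only (%f)
    dest="$ARCHIVE_DIR/$name"

    if [ -f "$dest" ]; then
        cmp -s "$src" "$dest"      # same bytes already archived -> 0
    else
        cp "$src" "$dest"
    fi
}
# In postgresql.conf this would be wired up roughly as:
#   archive_command = '/usr/local/bin/archive_wal.sh %p %f'
```

Re-archiving an identical segment returns success (idempotent), while a segment whose contents differ from what is already archived fails loudly instead of clobbering the archive.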
|
# ? Jun 22, 2017 12:13 |
|
Has anyone ever actually used the MySQL spider storage engine? The documentation casually mentions it as though it's incredibly simple to query multiple databases as if they were one, but it sounds too good to be true.
|
# ? Jun 22, 2017 14:16 |
|
Forgall posted:Question about Postgresql streaming replication and WAL-archiving: Documentation states that: Suppose you have M1 streaming+archiving, S1 streaming standby, S2 replaying. M1 goes down mid-WAL (likely). Now S1 and S2 have different views of reality and you are required to maintain consistency. You promote S1. What becomes of S2? For bonus points, what if S3 was replaying archives from S1?
|
# ? Jun 22, 2017 14:59 |
|
PhantomOfTheCopier posted:Read the next two sentences of the documentation. There's no "failure" with a replaying standby, but there's a gap for a streaming standby. There's kinda a reason. Those seem to say that S1 does receive the latest log entries from M1 via streaming replication, but does not archive them after being promoted. PhantomOfTheCopier posted:Suppose you have M1 streaming+archiving, S1 streaming standby, S2 replaying. M1 goes down mid-WAL (likely). Now S1 and S2 have different views of reality and you are required to maintain consistency. You promote S1. What becomes of S2? For bonus points, what if S3 was replaying archives from S1?
|
# ? Jun 22, 2017 16:00 |
|
Forgall posted:Those seem to say that S1 does receive latest log entries from M1 via streaming replication, but does not archive them after being promoted. Does it say that? quote:To get a complete series of WAL files in the archive, you must ensure that all WAL is archived, before it reaches the standby. This is inherently true with file-based log shipping, as the standby can only restore files that are found in the archive, but not if streaming replication A few features have been added since I was a PG replication expert, so I'll probably make at least one false statement here, but there are a few pieces of the picture you're not quite seeing. It would be best to go back and rethink the purpose of WAL archiving, specifically to think about PITR and timelines. There's a difference between a single transaction and a complete WAL, so a streaming replication standby has a different notion of the state of the data than does a server rebuilding the database from WAL (even in a "slow replication" scenario). Transaction logs are tightly coupled to a server and its timeline, so it's probably better to think that M1 WAL contains transactions M1.1 through M1.a that get written into WAL segment M1.a. If S1 is streaming transaction logs from M1, one of the very first things performed at startup/reconnect is validation that the xlogs still belong to M1 and that the same timeline is being replayed (see the replication protocol). If M1 goes down or executes PITR onto a different timeline, S1 can't replicate because it would cause data inconsistency. During hot standby, quote:Transactions started during hot standby will never be assigned a transaction ID and cannot write to the system write-ahead log. So while I cannot find you a single document supporting this view, one assumption that I'd say is false is that all replicas in a system are replaying "transactions with ID 1 through 9999" or whatever. 
Instead, replay happens based on the transaction log ID, and the WAL entries are applied directly to the on-disk blocks. The transaction ID, as far as I can find, is only used during hot standby feedback; if you check the above doc for "xmin", you'll see that the receiver can specify a "minimum transaction id" to ensure the sender doesn't enter what would be a wraparound state on the hot standby. The transaction ID in archived WAL is necessary for PITR, of course, but it doesn't really drive replication. So, what does all that mean? During S1 failover, if it starts archiving logs after recovery, those will be S1.N logs, not M1.N logs. This means that an archive-replaying S2 has no timeline it can follow, and even though they patched to permit some timeline switches, that only works when S2 was streaming (and can be switched to follow S1). Once S1 has started writing logs, they exist so that S1 can perform PITR and describe the S1 timeline, not the M1 timeline. If S1 immediately undergoes failure (power cycle) it must perform recovery against its own WAL, not the WAL from M1. Moreover, while S1 can replay transactions from M1 as long as possible, it cannot guarantee that it has a consistent view of the M1 log. What if M1 did archive a WAL containing transactions that have not yet been received on S1? Could S1 always properly reconstruct M1.N? Instead the system is designed to be very, very safe regarding data consistency and how it is done. S1 cannot bastardize S2 data in any way, because S2 is tracking a completely different timeline. Oh, you'll get S2's view of the world before the master went down, but just because S1 is promoted doesn't mean M1 isn't going to come back and complete its archiving. If you want to have S2 reflect some other reality, it has to base off S1 and follow that timeline, which could well take place at, after, or before the last transaction received on S2. 
Meanwhile, S3 wants to happily continue following S1, so in an always-archiving scenario, S3 doesn't care a whit about M1.N WAL, and only wants to continue tracking S1 WAL, so S1 certainly doesn't want to claim creation of M1 entries during failover. Go back and read the replication documents again and review what you think replication does and how it does it. You may find you need to change the methods of replication used in your database cluster.
|
# ? Jun 23, 2017 05:02 |
|
There's a lot of expert questions in here, so hopefully this isn't too simple. I'm a complete SQL noob trying to do something for my fantasy basketball site in MySQL. I have a table PlayerStats that has lots of rows for each player's yearly statistics code:
The question is: how do I sort descending by Points, but on the combined totals where there are multiple records with the same PlayerID and Year? So the result would be: code:
|
# ? Jun 25, 2017 02:39 |
|
Seems like this is easy in MySQL code:
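The idea in runnable form, sketched against sqlite3 with invented data (column names follow the question): aggregate the duplicated (PlayerID, Year) rows with SUM, then sort on the aggregate.

```python
import sqlite3

# Rows sharing (PlayerID, Year) are combined by GROUP BY, and the
# combined totals drive the descending sort.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE PlayerStats (PlayerID INTEGER, Year INTEGER, Points INTEGER);
    INSERT INTO PlayerStats VALUES
        (1, 2016, 10), (1, 2016, 15),   -- same player+year: combine
        (2, 2016, 20);
""")
rows = con.execute("""
    SELECT PlayerID, Year, SUM(Points) AS TotalPoints
    FROM PlayerStats
    GROUP BY PlayerID, Year
    ORDER BY TotalPoints DESC
""").fetchall()
print(rows)  # [(1, 2016, 25), (2, 2016, 20)]
```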
KernelSlanders fucked around with this message at 04:16 on Jun 25, 2017 |
# ? Jun 25, 2017 04:14 |
|
KernelSlanders posted:Seems like this is easy in mysql Thanks! This is exactly what I needed. I will change the Year column name but seems to work okay for now. skrath fucked around with this message at 05:21 on Jun 25, 2017 |
# ? Jun 25, 2017 05:07 |
|
skrath posted:Thanks! This is exactly what I needed. I will change the Year column name but seems to work okay for now. Idk about mysql, but in Microsoft SQL Server you can use brackets around most reserved words to make syntax highlighting behave like: SELECT [SELECT] FROM dbo.[Database]
|
# ? Jun 25, 2017 17:28 |
|
zen death robot posted:It's cloudflare and I'm sorry I thought I had this thread filtered I thought KernelSlanders was talking about code-tag syntax highlighting, but I'm browsing on mobile and couldn't tell. For once ZDR, it may not be Cloudflare.
|
# ? Jun 25, 2017 19:10 |
|
Nth Doctor posted:Idk about mysql, but in Microsoft SQL Server you can use brackets around most reserved words to make syntax highlighting behave like: It's double quotes in MySQL: SELECT "select" FROM "table" WHERE "select" LIKE 'select'; but still not a great practice to name tables or columns after keywords.
|
# ? Jun 25, 2017 19:14 |
|
KernelSlanders posted:It's double quotes in MySQL: SELECT "select" FROM "table" WHERE "select" LIKE 'select'; but still not a great practice to name tables or columns after keywords. Dunno if they changed it, but it used to be that you had to set the SQL mode appropriately to be able to use double quotes as identifier quotes. Otherwise double quotes are just an alternative to single quotes for quoting literal strings. Backticks reliably work as identifier quotes (this is MySQL-specific afaik). You should be setting the SQL mode anyway if working with MySQL, though, because setting the right mode makes it less sloppy. Double quotes are identifier quotes per ANSI SQL iirc.
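SQLite happens to follow that ANSI convention out of the box, so it makes a compact demo of double quotes as identifier quotes versus single quotes as string quotes (a keyword-named table and column chosen deliberately; still a bad idea in practice):

```python
import sqlite3

# "table" and "select" are identifiers here because of the double
# quotes; 'select' is a string literal because of the single quotes.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "table" ("select" TEXT)')
con.execute("INSERT INTO \"table\" (\"select\") VALUES ('select')")
val = con.execute(
    "SELECT \"select\" FROM \"table\" WHERE \"select\" LIKE 'select'"
).fetchone()[0]
print(val)  # select
```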
|
# ? Jun 26, 2017 14:11 |
|
Anyone got an ear to bend about Galera clustering? I want to replicate everything to a single point for backups and replicate content from that single point back to the other shards, but I want shards to have their own exclusive databases for per-shard data. A bit like:

Master/backup: content, sharda, shardb
Sharda: content, sharda
Shardb: content, shardb

I just want to know if I can exclude remote databases selectively, I guess. I've been googling pretty hard and don't see why not just yet, as long as the shard dbs have different names. Edit: I was being too specific while googling about. Galera just uses the vanilla 'replicate-ignore-db' in my.cnf. I'll just symlink a .cnf file in my setup script. Ranzear fucked around with this message at 00:45 on Jul 1, 2017 |
# ? Jul 1, 2017 00:32 |
|
Hammerite posted:Dunno if they changed it, but it used to be that you had to set the SQL mode appropriately to be able to use double quotes as identifier quotes. Otherwise double quotes are just an alternative to single quotes for quoting literal strings. Backticks reliably work as identifier quotes (this is MySQL-specific afaik). Yeah, you're right. I've been switching back and forth between mysql and postgresql too much
|
# ? Jul 1, 2017 18:52 |
|
I have a dumb general database organization question. I have a spreadsheet containing an employee work schedule that looks something like this: code:
Now, I suspect that creating a column for every month in a single data table is an incredibly dumb way to do it, especially since I already know this will have to extend into next year as well. I'm thinking about looping through the spreadsheet with a VBA script to generate an Access table with every person-month pairing (turning the label into a proper date format), but maybe there's a smarter way to do it?
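The usual normalization here is one row per (employee, month) mark rather than one column per month; a sqlite3 sketch with invented rows (the same shape works in Access):

```python
import sqlite3

# One row per X in the spreadsheet. Extending to 2018 is just more
# rows, no schema change, and the per-cell numbers/attachments can
# hang off this table via foreign keys.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE assignment (
        employee TEXT NOT NULL,
        month    TEXT NOT NULL,    -- a real date, e.g. '2017-07-01'
        PRIMARY KEY (employee, month)
    )
""")
con.executemany("INSERT INTO assignment VALUES (?, ?)", [
    ("alice", "2017-07-01"),
    ("alice", "2017-08-01"),
    ("bob",   "2017-07-01"),
])
july = con.execute(
    "SELECT COUNT(*) FROM assignment WHERE month = '2017-07-01'"
).fetchone()[0]
print(july)  # 2
```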
|
# ? Jul 11, 2017 12:34 |
|
grate deceiver posted:I have a dumb general database organization question. I have a spreadsheet containing employee work schedule that looks something like this: Can you pivot the data in Excel?
|
# ? Jul 11, 2017 12:57 |
|
|
|
Kuule hain nussivan posted:Can you pivot the data in Excel? Basically, for every one of those X's I need to keep track of a bunch of numbers, a separate spreadsheet file, and some .pdf attachments. There are 12 (probably 24) months and 200 people; doing it in Excel is starting to get annoying as hell.
|
# ? Jul 11, 2017 13:22 |