PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Roundboy posted:

But I think I did figure it out. You need to assign default rights as the owner of the table. My unique case is that I have a dozen, if not more, potential owners. 100% of the use cases I find on the Google deal with pure universal read-only users (I only want this schema) or single-owner scenarios.
Okay, I'm with you now. The expectation is indeed that a user with object-creation privileges will be in charge of the access granted to that object, so each user will assign their own default privileges within the schema.

But, it feels like there's a bug here. If alice and bob are members of schema_users, and schema_users is granted permissions to create objects within the schema, then alice can create a table only through an inherited privilege. Why then should alice's default privileges apply to the new object? If schema_users was NOINHERIT, then alice could SET ROLE first and create a table, which would then be owned by schema_users directly and contain schema_users' default privileges.
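
To make that concrete, here's a minimal sketch of the SET ROLE pattern (all names hypothetical; note PostgreSQL hangs the INHERIT/NOINHERIT attribute on the member role, so it lands on alice below):

code:
-- Group role owns the schema objects; alice must SET ROLE to use it.
CREATE ROLE schema_users;
CREATE ROLE alice LOGIN NOINHERIT;
GRANT schema_users TO alice;
GRANT CREATE ON SCHEMA staging TO schema_users;

-- Default privileges declared once, for the owning group role:
ALTER DEFAULT PRIVILEGES FOR ROLE schema_users IN SCHEMA staging
    GRANT SELECT ON TABLES TO PUBLIC;

-- As alice: NOINHERIT blocks direct use of the inherited privilege,
-- so she switches roles; the new table is then owned by schema_users
-- and picks up schema_users' default privileges, not alice's.
SET ROLE schema_users;
CREATE TABLE staging.t (id int);
RESET ROLE;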

I suspect the actual answer to your question is, It's a staging area for table playing, so does per-user authority truly matter? Typical role membership is granted to classes of users; administrators have equal power to destroy the database, and everyone else falls into the readonly bucket. Individuals requiring full privilege isolation are typically granted CREATEDB or otherwise provided a database of their own that is inaccessible to the public. At least per-user settings are possible but, given this (historical) expectation for access control, it's unsurprising that it's rather difficult to obtain the desired granularity.

Adopt an expectation that each table named "username" is owned by "username" and tell your users to be friendly to one another (and that they should have backups). First offense, forgive; second offense, "bye bye this is why we can't have nice things build your own staging area with sqlite3".


For others'/future reference, there is a rather thorough walkthrough of layered privileges at https://dba.stackexchange.com/questions/117109/how-to-manage-default-privileges-for-users-on-a-database-vs-schema


Edit: Suppose such a per-user default were implemented. If the user was not "trustworthy", they could simply revoke all as soon as they create an object, so there's no way to enforce the SELECT privilege for others.
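
For instance, nothing stops the owner from undoing the defaults immediately (hypothetical names again):

code:
CREATE TABLE staging.t2 (id int);
REVOKE ALL ON staging.t2 FROM PUBLIC;  -- any granted defaults, gone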

PhantomOfTheCopier fucked around with this message at 13:41 on Jun 8, 2017


spiritual bypass
Feb 19, 2008

Grimey Drawer
In database theory, we say that a good primary key is one that is stable, that its value should not change. I'm currently trying to design a database with mostly natural keys, but most of the candidates have the possibility of changing. It just doesn't seem like a big deal to me that, say, a user might change its email address and cascade the change to the other tables.

Is there something I'm missing about the dangers of primary key instability?

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


rt4 posted:

In database theory, we say that a good primary key is one that is stable, that its value should not change. I'm currently trying to design a database with mostly natural keys, but most of the candidates have the possibility of changing. It just doesn't seem like a big deal to me that, say, a user might change its email address and cascade the change to the other tables.

Is there something I'm missing about the dangers of primary key instability?

Is there a reason why you want to use natural primary keys over INT-IDENTITY?
Primary key instability can break FK relationships, and if your PK is also your clustered index, changing values can cause a lot of reorganization of the index.
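
For contrast, a minimal sketch of the surrogate-key shape (hypothetical tables): the IDENTITY value never changes, so FKs and the clustered index stay put, while the natural candidate stays enforced but freely updatable:

code:
-- Surrogate PK is the FK target and clustered index; email can change
-- without cascading anywhere.
CREATE TABLE Users
  (
     UserID INT IDENTITY(1, 1) PRIMARY KEY,
     Email  VARCHAR(255) NOT NULL UNIQUE
  );

CREATE TABLE Orders
  (
     OrderID INT IDENTITY(1, 1) PRIMARY KEY,
     UserID  INT NOT NULL REFERENCES dbo.Users(UserID)
  );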

huhu
Feb 24, 2006
Given two tables, tableleft and tableright, with a left join, would it be possible that the resulting join has more rows than tableleft?

Edit: Should have just tested it. Looks like it would only be an issue when joining on a column that doesn't have unique values.

huhu fucked around with this message at 16:00 on Jun 12, 2017

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy

huhu posted:

Given two tables, tableleft and tableright, with a left join, would it be possible that the resulting join has more rows than tableleft?

absolutely

code:
DECLARE @tableleft TABLE 
  ( 
     id INT 
  ) 
DECLARE @tableright TABLE 
  ( 
     id INT 
  ) 

INSERT INTO @tableleft 
            (id) 
SELECT 1 
UNION ALL 
SELECT NULL 
UNION ALL 
SELECT 2 

INSERT INTO @tableright 
            (id) 
SELECT 1 
UNION ALL 
SELECT 1 
UNION ALL 
SELECT 1 
UNION ALL 
SELECT NULL 
UNION ALL 
SELECT 2 
UNION ALL 
SELECT NULL 
UNION ALL 
SELECT 2 
UNION ALL 
SELECT 2 
UNION ALL 
SELECT 2 
UNION ALL 
SELECT 0 

SELECT * 
FROM   @tableleft 

SELECT * 
FROM   @tableleft l 
       LEFT JOIN @tableright r 
              ON l.id = r.id 
e: i see you answered this already oh well leaving for posterity

e2: im having to pass captchas to post in here. what the gently caress

e3: to get a screenshot of this captcha, gently caress u cloudflare

kumba fucked around with this message at 16:10 on Jun 12, 2017

Roundboy
Oct 21, 2008
Ugh, is this one of those problems solved by a single pass in SQL, or do I have to deal with temp tables?

I have a table of addresses that I have determined to be dupes. I am using a ranking function to pick the 'winner' of the dupe pair.

code:
main_id         addr_ID1             addr_id2        row_number()
999                111                   222            1
999                222                   111            2 
I take row #1 in the case above. But the problem is now I have multiple dupes on the same record

code:
main_id         addr_ID1             addr_id2        row_number()
999                111                   222            1
999                222                   111            2 
999                111                   333            3
999                333                   222            2  
Using a CTE, I pull all these records and select the lowest one, but how can I avoid the situation where the loser in one case is a winner in another?

code:
main_id         winner_id             loser_id        
999                111                   222           
999                111                   333            
999                222                   333                       
The last condition there should be dropped because it's handled in a previous case. I'm starting to go down another path of self-joins and I think I'm making a mess of things.

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy

Roundboy posted:

Ugh, is this one of those problems solved by a single pass in SQL, or do I have to deal with temp tables?

I have a table of addresses that I have determined to be dupes. I am using a ranking function to pick the 'winner' of the dupe pair.

code:
main_id         addr_ID1             addr_id2        row_number()
999                111                   222            1
999                222                   111            2 
I take row #1 in the case above. But the problem is now I have multiple dupes on the same record

code:
main_id         addr_ID1             addr_id2        row_number()
999                111                   222            1
999                222                   111            2 
999                111                   333            3
999                333                   222            2  
Using a CTE, I pull all these records and select the lowest one, but how can I avoid the situation where the loser in one case is a winner in another?

code:
main_id         winner_id             loser_id        
999                111                   222           
999                111                   333            
999                222                   333                       
The last condition there should be dropped because it's handled in a previous case. I'm starting to go down another path of self-joins and I think I'm making a mess of things.

If I understand what you're trying to do correctly, something like this?

code:
;with CTE (main_id, winner_id, loser_id, row_number) as
(
    select main_id, winner_id, loser_id,
           row_number() over (partition by whatever order by things)
    from myTable
    -- your where clause here
)
SELECT main_id, winner_id, loser_id
FROM CTE
WHERE winner_id NOT IN (select loser_id from CTE)
-- and any other clauses

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!
Why are you concerned about pairs of records when the dups don't come in pairs? Is there no universal ranking you can use to determine the one winner, so pairwise comparison is necessary? Assuming you can just order by things (as kumba put it):

code:
;with dups as (
select main_id,
       addr_ID,
       row_number() over (partition by main_id order by things) addr_rnk,
       count(*) over (partition by main_id) dup_ct
  from base_table),
dup_win as (
select main_id,
       addr_ID winner_id
  from dups
 where addr_rnk = 1
       and dup_ct > 1)
select main_id,
       winner_id
  from dup_win

Roundboy
Oct 21, 2008
Dupes come in pairs because I can only find them with a self-join, a la:

code:
select 
a.main_id,a.addr_id,b.addr_id
,row_number() over(partition by a.main_id order by a.addr_rank asc,a.addr_status desc) as rnk
from address a
inner join address b on a.main_id = b.main_id 
where a.address_status = 'A'
and b.address_status = 'A'
and a.addr_id <> b.addr_id
and a.address_line1 = b.address_line1
and a.address_line2 = b.address_line2
and a.state = b.state
and a.zip = b.zip
The ranking function helped to drive the winner address id to the top

The correct output for a situation like this:
code:
main_id	  addr_id_winner	addr_id_loser
11	        a1	       a2
11	        a1	       a3
11	        a2	       a1
11	        a2	       a3
11	        a3	       a2
11	        a3	       a1
would be a1 is the winner, with a2 and a3 rolling up to it. a3 into a2 is redundant and useless, and a1 into the others is the other side of a dupe pair.

Self table joins are the only way I am aware of to check for duplicate info because of a problem in our data, and the id is unique and not based on actual values.

Typically I resolve the dupe pair like what was posted above, with id1 < id2 or rank. Using WHERE NOT IN eliminated all records, since every id shows up on both sides of the equation.

Taking a step back, is there a better way, or is it just a slog through it?



Edit: I think I got it. I switched to using rank() vs row_number() because I actually wanted it to rank, and adjusted my partition by because it was grabbing too much.

The ultimate result was that there were the same 3 pairs of groups as above, but the ones I wanted were all rnk = 1, which I can just select for in the CTE.

Roundboy fucked around with this message at 19:18 on Jun 13, 2017

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy
Seems like given your structure, doing like you said and having a "WHERE winner_id < loser_id " would do the trick. That's how I've solved things like that in the past.
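
Applied to the address self-join from above (same table and columns Roundboy posted), the canonicalization looks something like:

SQL code:
-- Each unordered pair appears exactly once; the lower addr_id is
-- arbitrarily crowned the winner, so mirror rows never show up.
select a.main_id,
       a.addr_id as winner_id,
       b.addr_id as loser_id
from   address a
       inner join address b
               on a.main_id = b.main_id
              and a.addr_id < b.addr_id  -- replaces a.addr_id <> b.addr_id
where  a.address_status = 'A'
       and b.address_status = 'A'
       and a.address_line1 = b.address_line1
       and a.address_line2 = b.address_line2
       and a.state = b.state
       and a.zip = b.zip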

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Roundboy posted:

Dupes come in pairs because I can only find them with a self-join, a la:


Edit: I think I got it. I switched to using rank() vs row_number() because I actually wanted it to rank, and adjusted my partition by because it was grabbing too much.

The ultimate result was that there were the same 3 pairs of groups as above, but the ones I wanted were all rnk = 1, which I can just select for in the CTE.

code:
select 
a.main_id,a.addr_id
,row_number() over(partition by a.main_id, a.address_status, a.address_line1,
                   a.address_line2, a.state, a.zip
                   order by a.addr_rank asc, a.addr_status desc) as rnk
from address a

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

kumba posted:

Seems like given your structure, doing like you said and having a "WHERE winner_id < loser_id " would do the trick. That's how I've solved things like that in the past.

Yeah, it's probably a bad thing to do, but I've used the incremental nature of synthetic IDs like this as a shorthand for chronology/first-in in the past as well.

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy

zen death robot posted:

FYI this thread sets off the WAF for a capcha check because of all the sql code in it. The best I can do is whitelist your IP if it becomes an issue. Just hit me up in PMs and let me know.

hahaha holy poo poo this owns

i am not trying to inject anything in your database zdr i promi

*/ GO DROP TABLE dbo.Users; GO

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Huh, I didn't even think of what that kind of stuff would trigger. I think I'm going to go post SELECT * FROM CREDITCARDS in all the threads I don't like.

EmmyOk
Aug 11, 2013

Hello, friends. I haven't done any SQL since my undergrad, but I was recently asked some in an interview and messed up some straightforward questions. I decided to work through HackerRank problems to relearn/teach myself SQL, and it's pretty easy to pick up once you work through examples — or rather, the basic queries are! One of the questions was to write a query to do the following:

Return the maximum salary an employee could earn, computed by multiplying two fields, salary × months.

Then on the same line also return the number of employees who had that maximum salary.

My attempts were like this and ofc didn't work

code:
SELECT MAX(salary * months), COUNT(salary * months = MAX(salary * months)) FROM Employee
A correct solution ended up being this

code:
SELECT salary * months AS earnings, COUNT(*) FROM Employee
GROUP BY earnings
ORDER BY earnings DESC limit 1;
However I'm not too clear on how it works. What does that first line return?

e: this is MySQL btw!

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy
That first line is returning earnings as salary * months, as well as the number of records with that level of earnings (because of the 2nd line).

That GROUP BY clause is effectively putting the different earnings levels into buckets, and the COUNT(*) is giving you the number of records in each bucket. The last line is ordering by earnings descending to give you the highest earners first, and the limit 1 means you're only getting results for the highest earnings amount found.

So for example, if your data looked like:

code:
EmployeeID	salary	months
1		1000	4
2		2000	2
3		1800	2
4		1500	2
5		2000	1
Your top earners are those making 4000 (1000 * 4 and 2000 * 2; so, two people). That query would return:

code:
earnings	(No column name)
4000		2
If you didn't have the LIMIT 1 there, it would instead return:

code:
earnings	(No column name)
4000		2
3600		1
3000		1
2000		1
Hopefully that makes sense. I know very little about MySQL, but your post did teach me that you can group by aliases, which is neat, because MSSQL does not allow that and it's super annoying!
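
For reference, a sketch of the MSSQL equivalent (assuming the same Employee table): since the alias trick isn't available there, you repeat the expression in the GROUP BY, and TOP 1 stands in for LIMIT 1:

SQL code:
-- MSSQL can't GROUP BY an alias, but it can GROUP BY the expression.
SELECT TOP 1 salary * months AS earnings,
             COUNT(*)        AS earner_count
FROM   Employee
GROUP  BY salary * months
ORDER  BY earnings DESC;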

EmmyOk
Aug 11, 2013

This makes a tonne of sense and really clears up what was going on, thanks so so much!

Portland Sucks
Dec 21, 2004
༼ つ ◕_◕ ༽つ
Is this the right thread for ETL questions?

I'm in the early stages of needing to build a data warehouse and have been messing around with ETL tools, namely Pentaho and Talend, and so far it feels like they're built with the assumption that your production DB and data sources are designed with some sort of sanity behind them. In order to put together a row in one of my dimensional or fact tables, I usually need to go through a process like:

code:
query from one table to build a list of values and grab some additional values i'll need for my row

use previous values to query another table and if that doesn't work then query the same table with slightly different values for another set of results

join the two result sets and then iterate over that set to query from one of a dozen or so different DBs to extract the final information
I haven't had any success with either of those tools performing the final queries that are dependent on previous queries row by row. I'm wondering if these tools are just overhyped and I should be scripting this, or if there is a light at the end of the tunnel with these things that I'm just not seeing.

My other thought was to do something like

code:
Sources -> automated complicated scripts -> staging area -> Pentaho -> Warehouse
mostly because I do really like the idea of having a simple abstracted layer between our mess and the warehouse and we know that we'll end up needing to make more dimensional models in the future from the same sources.

Kuule hain nussivan
Nov 27, 2008

Portland Sucks posted:

Is this the right thread for ETL questions?

I'm in the early stages of needing to build a data warehouse and have been messing around with ETL tools, namely Pentaho and Talend, and so far it feels like they're built with the assumption that your production DB and data sources are designed with some sort of sanity behind them. In order to put together a row in one of my dimensional or fact tables, I usually need to go through a process like:

code:
query from one table to build a list of values and grab some additional values i'll need for my row

use previous values to query another table and if that doesn't work then query the same table with slightly different values for another set of results

join the two result sets and then iterate over that set to query from one of a dozen or so different DBs to extract the final information
I haven't had any success with either of those tools performing the final queries that are dependent on previous queries row by row. I'm wondering if these tools are just overhyped and I should be scripting this, or if there is a light at the end of the tunnel with these things that I'm just not seeing.

My other thought was to do something like

code:
Sources -> automated complicated scripts -> staging area -> Pentaho -> Warehouse
mostly because I do really like the idea of having a simple abstracted layer between our mess and the warehouse and we know that we'll end up needing to make more dimensional models in the future from the same sources.

That definitely does sound like something Pentaho should be able to do with a couple of lookups, filters, and such. If you can give me a more detailed overview of what you have and what your goal is, I can try to help. Pentaho can be very finicky at times, especially with data warehousing.

insidius
Jul 21, 2009

What a guy!
I have been poking around a lot lately in regards to the logic of backing MySQL's temporary data location onto a RAM disk. I have spent a lot of time reading up on temporary table creation, both in memory and on disk. I am seeing configurations in the instances I manage where we are looking at a 50% split between on disk/in memory. In particular, one application is consistently doing massive joins every few minutes, all of which go to disk. My logic here is that if those spills went to a "disk" that is actually backed by RAM, they should complete a lot faster. This seems to be the generally accepted outcome based on every "Improve MySQL performance" article I have read so far.

The only real concern I can see would be what might happen should a query consume the available memory assigned to the RAM disk in this scenario. Provided you have accounted for it in the machine's overall memory allocation, I see no concern about exhausting memory.

My bigger question, and something I have been looking for but not seeing much on: are there good, solid reasons why I should find some other way to deal with these scenarios rather than jumping straight on the "back it onto RAM" bandwagon? I am happy to go either way, but I can't help wanting to hear the flip side of the coin from those who would advise against it.
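
For anyone curious what the setup looks like, a sketch of the tmpfs approach (mount point hypothetical, tmpfs sized to match the 512MB mentioned below; tmpdir is the stock MySQL option):

code:
# /etc/fstab: RAM-backed scratch space for MySQL's on-disk temp tables
tmpfs  /var/lib/mysql-tmp  tmpfs  size=512M,uid=mysql,gid=mysql,mode=0750  0 0

# my.cnf
[mysqld]
tmpdir = /var/lib/mysql-tmp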

Roundboy
Oct 21, 2008
There is a setting or two that deals with exactly how much of the transaction will take place in memory; past that point, it dumps to disk and continues. This is a very costly proposition.

You can set this for joins, groups, etc., but it is typically pretty small, and it's set per transaction, per connection. Simply setting this to a huge number will kill your box.

It's possible to set this right before a particular run, then set it back.
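
Something like this, as a sketch (the variable names are stock MySQL, the sizes are made up; in-memory temp tables are capped by the lesser of the two, so bump both):

SQL code:
SET SESSION tmp_table_size      = 268435456;  -- 256MB
SET SESSION max_heap_table_size = 268435456;
-- ... run the big join/group query ...
SET SESSION tmp_table_size      = DEFAULT;
SET SESSION max_heap_table_size = DEFAULT;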

With a RAM disk, I would imagine the in-memory portion is plain RAM and the "disk" dump just goes to more RAM, but there is still a transfer. You may want to set your per-transaction limit lower than usual, so the big runs hit "disk" right away; but I have very little experience with this, and it seems not many people have done it.

I had issues with my low-use MySQL box just chugging on big asks with a moderate dataset; that, along with other analytical shortfalls, is why I went to Postgres.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
This seems strange. As RAM is still considered "expensive", the usual approach is to configure the database to ensure near-maximum memory availability (and consumption) by the workers. There should be very little space left in which to create a ramdisk. Environments with continuous, complex workloads are moving to NVMe, but those looking to reduce working latencies typically make do by switching some portion of storage to general SSDs. PostgreSQL is notably better if you have more than a simple join, but still...

Are you quite sure the database is configured correctly?

insidius
Jul 21, 2009

What a guy!

Roundboy posted:

There is a setting or two that deals with exactly how much of the transaction will take place in memory, and after which point, it will dump to disc and continue. This is a very costly proposition.

You can set this for joins, groups, etc. But it is typically pretty small and it's set per transaction per connection. Simply setting this to a huge number will kill your box.


The current value for this configured machine is actually quite large, ironically.

PhantomOfTheCopier posted:

This seems strange. As RAM is still considered "expensive", the usual approach is to configure the database to ensure near-maximum memory availability (and consumption) by the workers. There should be very little space left in which to create a ramdisk. Environments with continuous, complex workloads are moving to NVMe, but those looking to reduce working latencies typically make do by switching some portion of storage to general SSDs. PostgreSQL is notably better if you have more than a simple join, but still...

Most of the workloads we run (and ever really see the need to even consider tuning, in most cases) are all the same. This question comes more from: "Do we allow these queries, which are not that massive, to run in the 512MB of space we have available in /tmp, or specifically send them to actual disk?"

PhantomOfTheCopier posted:

Are you quite sure the database is configured correctly?

I believe the one I am looking at is; in fact, I find it far more likely it's the queries being run, given what I have seen. It's join after join with no indexes, processes constantly holding for query_cache locks, and so on. Again, this was largely for my own interest as I poke deeper into MySQL beyond "I killed this stuck process for you."

Forgall
Oct 16, 2012

by Azathoth
Question about PostgreSQL streaming replication and WAL archiving. The documentation states:

quote:

If archive_mode is set to on, the archiver is not enabled during recovery or standby mode. If the standby server is promoted, it will start archiving after the promotion, but will not archive any WAL it did not generate itself.
So if the master server crashes with its latest WAL segment not archived, and a standby server with archive_mode=on is promoted to master, then the archive is going to look like:
[last archived master server segment] [missing segment] [first segment archived from new master] etc
And it won't be possible to recover from such archive because there is a gap?

The alternative is to set archive_mode=always, in which case we either archive each standby independently under a different prefix, multiplying the archive's size by the number of standbys running, or use a shared archive:

quote:

The shared archive can be handled similarly, but the archive_command must test if the file being archived exists already, and if the existing file has identical contents. This requires more care in the archive_command, as it must be careful to not overwrite an existing file with different contents, but return success if the exactly same file is archived twice. And all that must be done free of race conditions, if two servers attempt to archive the same file at the same time.
But what isn't clear to me is: in what situations will master and standby try to archive segments with different contents? How does this interact with the archive_timeout setting? Unless there is some mechanism to synchronize it, timed segment rotation will not trigger on master and standby at the same time, so the segments will always differ, and there are no savings from using a shared archive.
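
For what it's worth, the shape the docs seem to be asking for is something like the sketch below (archive path hypothetical; note the test-then-copy is still racy as written, which is exactly the caveat the docs raise):

code:
# postgresql.conf: succeed when an identical file is already archived,
# fail when a different file already sits under the same name.
archive_command = 'test -f /archive/%f && cmp -s %p /archive/%f || (test ! -f /archive/%f && cp %p /archive/%f)'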

spiritual bypass
Feb 19, 2008

Grimey Drawer
Has anyone ever actually used the MySQL spider storage engine? The documentation casually mentions it as though it's incredibly simple to query multiple databases as if they were one, but it sounds too good to be true.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Forgall posted:

Question about PostgreSQL streaming replication and WAL archiving. The documentation states:
Read the next two sentences of the documentation. There's no "failure" with a replaying standby, but there's a gap for a streaming standby. There's kind of a reason.

Suppose you have M1 streaming+archiving, S1 a streaming standby, S2 replaying. M1 goes down mid-WAL (likely). Now S1 and S2 have different views of reality and you are required to maintain consistency. You promote S1. What becomes of S2? For bonus points, what if S3 was replaying archives from S1?

Forgall
Oct 16, 2012

by Azathoth

PhantomOfTheCopier posted:

Read the next two sentences of the documentation. There's no "failure" with a replaying standby, but there's a gap for a streaming standby. There's kind of a reason.
Those seem to say that S1 does receive latest log entries from M1 via streaming replication, but does not archive them after being promoted.

PhantomOfTheCopier posted:

Suppose you have M1 streaming+archiving, S1 a streaming standby, S2 replaying. M1 goes down mid-WAL (likely). Now S1 and S2 have different views of reality and you are required to maintain consistency. You promote S1. What becomes of S2? For bonus points, what if S3 was replaying archives from S1?
If M1 fails when writing out segment number N, and S1 has archive_mode=on, then I assume segments 1 to N-1 are archived by M1, N is missing from archive, and after S1 is promoted it will start writing segments starting from N+1. Seems like S2 will just wait forever for segment N to appear in archive.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Forgall posted:

Those seem to say that S1 does receive latest log entries from M1 via streaming replication, but does not archive them after being promoted.

If M1 fails when writing out segment number N, and S1 has archive_mode=on, then I assume segments 1 to N-1 are archived by M1, N is missing from archive, and after S1 is promoted it will start writing segments starting from N+1. Seems like S2 will just wait forever for segment N to appear in archive.

Does it say that?

quote:

To get a complete series of WAL files in the archive, you must ensure that all WAL is archived, before it reaches the standby. This is inherently true with file-based log shipping, as the standby can only restore files that are found in the archive, but not if streaming replication is enabled.

A few features have been added since I was a PG replication expert, so I'll probably make at least one false statement here, but there are a few pieces of the picture you're not quite seeing. It would be best to go back and rethink the purpose of WAL archiving, specifically to think about PITR and timelines. There's a difference between a single transaction and a complete WAL, so a streaming-replication standby has a different notion of the state of the data than does a server rebuilding the database from WAL (even in a "slow replication" scenario).

Transaction logs are tightly coupled to a server and its timeline, so it's probably better to think that M1 WAL contains transactions M1.1 through M1.a that get written into WAL segment M1.a. If S1 is streaming transaction logs from M1, one of the very first things performed at startup/reconnect is validation that the xlogs still belong to M1 and that the same timeline is being replayed (see the replication protocol). If M1 goes down or executes PITR onto a different timeline, S1 can't replicate because it would cause data inconsistency. During hot standby,

quote:

Transactions started during hot standby will never be assigned a transaction ID and cannot write to the system write-ahead log.

So while I cannot find you a single document supporting this view, one assumption that I'd say is false is that all replicas in a system are replaying "transactions with ID 1 through 9999" or whatever. Instead, replay happens based on the transaction log ID, and the WAL entries are applied directly to the on-disk blocks. The transaction ID, as far as I can find, is only used during hot standby feedback; if you check the above doc for "xmin", you'll see that the receiver can specify a "minimum transaction id" to ensure the sender doesn't enter what would be a wraparound state on the hot standby.

The transaction ID in archived WAL is necessary for PITR, of course, but it doesn't really drive replication.

So, what does all that mean? During S1 failover, if it starts archiving logs after recovery, those will be S1.N logs, not M1.N logs. This means that an archive-replaying S2 has no timeline it can follow, and even though they patched to permit some timeline switches, that only works when S2 was streaming (and can be switched to follow S1). Once S1 has started writing logs, they exist so that S1 can perform PITR and describe the S1 timeline, not the M1 timeline. If S1 immediately undergoes failure (power cycle), it must perform recovery against its own WAL, not the WAL from M1.

Moreover, while S1 can replay transactions from M1 as long as possible, it cannot guarantee that it has a consistent view of the M1 log. What if M1 did archive a WAL containing transactions that have not yet been received on S1? Could S1 always properly reconstruct M1.N?

Instead the system is designed to be very, very safe regarding data consistency and how it is done. S1 cannot bastardize S2 data in any way, because S2 is tracking a completely different timeline. Oh, you'll get S2's view of the world before the master went down, but just because S1 is promoted doesn't mean M1 isn't going to come back and complete its archiving. If you want to have S2 reflect some other reality, it has to base off S1 and follow that timeline, which could well take place at, after, or before the last transaction received on S2. Meanwhile, S3 wants to happily continue following S1, so in an always-archiving scenario, S3 doesn't care a whim about M1.N WAL, and only wants to continue tracking S1 WAL, so S1 certainly doesn't want to claim creation of M1 entries during failover.

Go back and read the replication documents again and review what you think replication does and how it does it. You may find you need to change the methods of replication used in your database cluster.

skrath
Nov 14, 2000
Horsum venit vir qui fert locustas!
There are a lot of expert questions in here, so hopefully this isn't too simple. I'm a complete SQL noob trying to do something for my fantasy basketball site in MySQL.

I have a table PlayerStats that has lots of rows for each player's yearly statistics:

code:
PlayerID	Year	Team		Points
100		2016	Chicago		345
159		2016	Los Angeles	210
271		2016	Miami		614
100		2016	Philadelphia	400
Where a player gets traded in the same year (see PlayerID 100), I need to combine the stats when sorting.

The question is: How do I sort descending by Points, but sort on the combined records where there are multiple records with the same PlayerID and year? So the result would be:

code:
PlayerID	Year	Points		Team
100		2016	745		Chicago-Philadelphia
271		2016	614		Miami
159 		2016	210		Los Angeles
The rows are completely different apart from the PlayerID and Year, so I don't want to merge the data in any way, just present it as sorted output for whatever column I'm sorting on. Also, if there's a way to concatenate the Team field so that any output (using PHP) can show that, that would be even better.

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.
Seems like this is easy in mysql

SQL code:
select playerId, year, sum(points) as pts, GROUP_CONCAT(team SEPARATOR '-') 
from players
group by playerId, year
order by pts desc;
e: as the SA code highlighting points out "year" is a SQL keyword.

KernelSlanders fucked around with this message at 04:16 on Jun 25, 2017

skrath
Nov 14, 2000
Horsum venit vir qui fert locustas!

KernelSlanders posted:

Seems like this is easy in mysql

SQL code:
select playerId, year, sum(points) as pts, GROUP_CONCAT(team SEPARATOR '-') 
from players
group by playerId, year
order by pts desc;
e: as the SA code highlighting points out "year" is a SQL keyword.

Thanks! This is exactly what I needed. I will change the Year column name but seems to work okay for now.

skrath fucked around with this message at 05:21 on Jun 25, 2017

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


skrath posted:

Thanks! This is exactly what I needed. I will change the Year column name but seems to work okay for now.

Idk about mysql, but in Microsoft SQL Server you can use brackets around most reserved words to make syntax highlighting behave like:
SELECT [SELECT] FROM dbo.[Database]

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


zen death robot posted:

It's cloudflare and I'm sorry I thought I had this thread filtered

I thought KernelSlanders was talking about code-tag syntax highlighting, but I'm browsing on mobile and couldn't tell. For once ZDR, it may not be Cloudflare. :shrug:

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

Nth Doctor posted:

Idk about mysql, but in Microsoft SQL Server you can use brackets around most reserved words to make syntax highlighting behave like:
SELECT [SELECT] FROM dbo.[Database]

It's double quotes in MySQL: SELECT "select" FROM "table" WHERE "select" LIKE 'select'; but still not a great practice to name tables or columns after keywords.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

KernelSlanders posted:

It's double quotes in MySQL: SELECT "select" FROM "table" WHERE "select" LIKE 'select'; but still not a great practice to name tables or columns after keywords.

Dunno if they changed it, but it used to be that you had to set the SQL mode appropriately to be able to use double quotes as identifier quotes. Otherwise double quotes are just an alternative to single quotes for quoting literal strings. Backticks reliably work as identifier quotes (this is MySQL-specific afaik).

You should be setting the SQL mode anyway if working with MySQL, though, because setting the right mode makes it less sloppy.

Double quotes are identifier quotes per ANSI SQL iirc.
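
A quick sketch of the mode switch being described (table and column names hypothetical):

SQL code:
-- With ANSI_QUOTES set, double quotes quote identifiers, not strings.
SET sql_mode = 'ANSI_QUOTES';
SELECT "year" FROM "players";  -- both now parse as identifiers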

Ranzear
Jul 25, 2013

Anyone got an ear to bend about Galera clustering? I want to replicate everything to a single point for backups and replicate content from that single point back to the other shards, but I want shards to have their own exclusive databases for per-shard data. A bit like:

Master/backup:
content
sharda
shardb

Sharda:
content
sharda

Shardb:
content
shardb

I just want to know if I can exclude remote databases selectively I guess. I've been googling pretty hard and don't see why not just yet, as long as the shard dbs have different names.

Edit: I was being too specific while googling about. Galera just uses the vanilla 'replicate-ignore-db' in my.cnf. I'll just symlink a .cnf file in my setup script.
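
For reference, the exclusion sketched out with the shard names from above (replicate-ignore-db being the stock option mentioned):

code:
# my.cnf on sharda's node: don't apply the other shard's database
[mysqld]
replicate-ignore-db = shardb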

Ranzear fucked around with this message at 00:45 on Jul 1, 2017

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

Hammerite posted:

Dunno if they changed it, but it used to be that you had to set the SQL mode appropriately to be able to use double quotes as identifier quotes. Otherwise double quotes are just an alternative to single quotes for quoting literal strings. Backticks reliably work as identifier quotes (this is MySQL-specific afaik).

You should be setting the SQL mode anyway if working with MySQL, though, because setting the right mode makes it less sloppy.

Double quotes are identifier quotes per ANSI SQL iirc.

Yeah, you're right. I've been switching back and forth between mysql and postgresql too much

grate deceiver
Jul 10, 2009

Just a funny av. Not a redtext or an own ok.
I have a dumb general database organization question. I have a spreadsheet containing an employee work schedule that looks something like this:

code:
Name		Jan		Feb		Mar		Apr		...
Jim		O		X		X		O
Bob		X		O		X		X
Ted		O		O		X		O
There's a column for every month in the year; X means that the person is working that month, O that they aren't. I need to move this spreadsheet into Access in order to be able to attach some additional information to those X's and do some other stuff that is a pain in the rear end to do in Excel. The number of hours and specific start and end dates don't matter, only whether they work that month.

Now, I suspect that creating a column for every month in a single data table is an incredibly dumb way to do it, especially since I already know that this will have to extend to the next year as well. I'm thinking about looping through the spreadsheet with a VBA script to generate an Access table with every person-month pairing (turning the label into proper date format), but maybe there's a smarter way to do it?
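
For what it's worth, a sketch of that person-month shape (names hypothetical, generic SQL; Access's DDL spells the types slightly differently): one row per X, nothing at all for the O's, and the extra numbers and attachment paths hang off that row:

code:
CREATE TABLE WorkMonths
  (
     PersonName VARCHAR(50),
     WorkMonth  DATE,          -- e.g. 2017-01-01 stands for "Jan"
     PRIMARY KEY (PersonName, WorkMonth)
  );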

Kuule hain nussivan
Nov 27, 2008

grate deceiver posted:

I have a dumb general database organization question. I have a spreadsheet containing an employee work schedule that looks something like this:

code:
Name		Jan		Feb		Mar		Apr		...
Jim		O		X		X		O
Bob		X		O		X		X
Ted		O		O		X		O
There's a column for every month in the year; X means that the person is working that month, O that they aren't. I need to move this spreadsheet into Access in order to be able to attach some additional information to those X's and do some other stuff that is a pain in the rear end to do in Excel. The number of hours and specific start and end dates don't matter, only whether they work that month.

Now, I suspect that creating a column for every month in a single data table is an incredibly dumb way to do it, especially since I already know that this will have to extend to the next year as well. I'm thinking about looping through the spreadsheet with a VBA script to generate an Access table with every person-month pairing (turning the label into proper date format), but maybe there's a smarter way to do it?

Can you pivot the data in Excel?


grate deceiver
Jul 10, 2009

Just a funny av. Not a redtext or an own ok.

Kuule hain nussivan posted:

Can you pivot the data in Excel?

Basically, for every one of those X's I need to keep track of a bunch of numbers, a separate spreadsheet file, and some .pdf attachments. There are 12 (probably 24) months and 200 people; doing it in Excel is starting to get annoying as hell.

  • Reply