|
If you were using postgres, a unique partial index would make your job really easy. SQL code:
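A sketch of what that partial index could look like — the table and column names here are invented, since the original schema isn't shown:

```sql
-- Hypothetical schema: enforce unique titles only for rows flagged as
-- needing it; rows with is_unique = false can share a title freely
CREATE TABLE books (
    id        serial  PRIMARY KEY,
    title     text    NOT NULL,
    is_unique boolean NOT NULL DEFAULT true
);

-- The unique constraint only covers rows matching the WHERE clause
CREATE UNIQUE INDEX books_title_unique
    ON books (title)
    WHERE is_unique;
```

The WHERE clause on the index is the whole trick: rows outside it simply aren't indexed, so they never collide.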
|
# ? Aug 27, 2014 18:05 |
|
|
|
EAT THE EGGS RICOLA posted:I have a postgres database that I need to replicate across a dozen servers. If you're using the last few 9.x releases, look into pg_basebackup and pg_receivexlog and friends. http://www.safaribooksonline.com/library/view/postgresql-9-high/9781849516969/ch06s07.html posted:With this new utility, PostgreSQL has the ability to transmit transaction logs to a remote system without the need for a dedicated PostgreSQL server. This also means that we can avoid ad hoc tools such as rsync when maintaining an archive server to save old WAL files.
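On the primary, the streaming-replication prerequisites boil down to a few postgresql.conf settings. Values here are illustrative for a dozen standbys on a 9.x release:

```
# postgresql.conf on the primary (PostgreSQL 9.x)
wal_level = hot_standby        # emit enough WAL detail for standbys
max_wal_senders = 12           # one sender slot per replica
wal_keep_segments = 64         # cushion so a slow standby doesn't fall behind the archive
```

Each standby then runs pg_basebackup against the primary and streams from there.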
|
# ? Aug 27, 2014 18:42 |
EAT THE EGGS RICOLA posted:I have a postgres database that I need to replicate across a dozen servers. Use a VPN or tunnel over SSH. Definitely don't open up the database port to the world!
|
|
# ? Aug 27, 2014 18:53 |
|
EAT THE EGGS RICOLA posted:I have a postgres database that I need to replicate across a dozen servers. As the other posts have indicated, it depends entirely on what you're permitted to do. I read this as "nothing but whole file transfers", which means you use standard, version eight WAL shipping. That would affect your sync rate, of course.
|
# ? Aug 27, 2014 23:45 |
|
Sedro posted:If you were using postgres, a unique partial index would make your job really easy

Yeah. Postgres would be nice. If we were actually using MariaDB 10.0 (but still InnoDB), I'm guessing that wouldn't make this any easier, since I can't find anything on unique partial indices in MariaDB, either. Other ideas I had:

1. SELECT ... FOR UPDATE with a WHERE clause crafted to scan and lock each row in my whole table of almost 2 million records* (actually, writes only make up about 2% of our observed usage, so this isn't necessarily the worst thing ever, but it's spiky and I'm not sure how many of those are writes to this particular table...)

2. GET_LOCK in a stored procedure... but this is unsafe for replication.

3. Allow these transactions to make dirty reads via READ UNCOMMITTED, check that the record exists, then optimistically insert the record assuming that a similar transaction isn't occurring at the same time, then do another select on the title and roll back the transaction if multiple records exist with the title. But this opens us up to a different race condition: our pre- and post-checks see the after-effects of a delete on a duplicate title and insert the new record, only for the delete to be rolled back, leaving us with the undesired duplicate titles anyway.

* It would be neat if I could just do a SELECT ... FOR UPDATE on a range of title strings that only includes the title I'm interested in, plus one title "greater" and one title "lesser", or something similar, to prevent reads on the gap between checking and inserting a new record. Any idea if I can get something close to this? Or do I misunderstand how gap locks work in InnoDB?
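For the footnote's range-lock idea, the query itself is easy to write; whether InnoDB actually blocks concurrent inserts into the gap depends on there being an index on title and on running at REPEATABLE READ or stronger. A sketch with an invented table and range bounds:

```sql
-- Hypothetical table with an index on title, REPEATABLE READ isolation.
-- Under InnoDB next-key locking this locks the index range around the
-- target title, gaps included, so a concurrent INSERT of 'My Title'
-- should block until we commit.
START TRANSACTION;

SELECT id, title
FROM   books
WHERE  title >= 'My Titl' AND title <= 'My Titlf'
FOR UPDATE;

-- check the result for an existing row, INSERT if absent, then:
COMMIT;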
|
# ? Aug 28, 2014 02:22 |
|
Safe and Secure! posted:So let's say I have Book records, where each record has Why not rename it to IsUnique and leave it NULL in the overlap case? A standard index permits multiple NULLs, no? Edit: Sorry, that doesn't quite do it... seems like you'll need a coverage map of sorts, then. PhantomOfTheCopier fucked around with this message at 04:18 on Aug 28, 2014 |
# ? Aug 28, 2014 04:07 |
|
I'm working on a mostly key/value storage app in which ALL data access must be logged. I've set up a trigger-populated audit table that tracks data modifications (user, time, before/after), but I'm stumped on the best way to log selects. I'm more of a developer than a DBA. Database is Postgres.

I could write a function that captures and logs all select queries, but that won't scale with lots of users getting large result sets. Only logging result row keys won't work because I need to log the data returned as it was at query time, not its present, potentially modified-since-then state ("show me all the data Bob accessed from time X to Y").

My latest thought is to log the result row keys and, when queried for the data, "rollback" a copy of the table using the audit table (plpython, probably). Am I crazy? Is there a better way to do this? Audit queries would be very rare, so performance isn't a major concern.
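For reference, the modification side described above is usually done with something like this — all names are invented, and row_to_json needs 9.2+ (hstore is the older idiom):

```sql
-- Hypothetical audit table; old_row/new_row hold full row images
CREATE TABLE audit_log (
    id         bigserial   PRIMARY KEY,
    table_name text        NOT NULL,
    changed_by text        NOT NULL DEFAULT current_user,
    changed_at timestamptz NOT NULL DEFAULT now(),
    old_row    json,
    new_row    json
);

CREATE OR REPLACE FUNCTION log_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, new_row)
        VALUES (TG_TABLE_NAME, row_to_json(NEW));
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, old_row, new_row)
        VALUES (TG_TABLE_NAME, row_to_json(OLD), row_to_json(NEW));
        RETURN NEW;
    ELSE  -- DELETE
        INSERT INTO audit_log (table_name, old_row)
        VALUES (TG_TABLE_NAME, row_to_json(OLD));
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER kv_audit AFTER INSERT OR UPDATE OR DELETE ON kv_store
    FOR EACH ROW EXECUTE PROCEDURE log_change();
```

The select-logging problem is harder precisely because there's no trigger hook for reads, which is what the rest of the discussion turns on.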
|
# ? Aug 31, 2014 02:55 |
|
How many unique queries will you end up having? Would it be realistic to put all the auditable selects behind stored procedures?
|
# ? Aug 31, 2014 04:50 |
|
Minus Pants posted:I'm working on a mostly key/value storage app in which ALL data access must be logged. I've setup a trigger-populated audit table that tracks data modifications (user, time, before/after), but I'm stumped on the best way to log selects. I'm more of a developer than a DBA. Database is Postgres. You're not crazy to think that you shouldn't log all the results and should instead leverage your audit entries to construct the result sets when needed, but with a perfectly capable database sitting in front of you, you really ought not be using something external like plpython; you're likely to get it wrong. You'd be better off developing the proper stored rollback procedures now so that code can track your table structure: If anyone deems it necessary to add columns at some future date, they will be required to ensure the procedures for the triggers are correct, along with the rollback procedures. Don't be confusing matters by adding some external code that can desync from your database. That will only lead to utterly useless (i.e., corrupt) audit tables. What do you do when someone comes in and says they don't know Python? Make them learn Python? Okay, good, go read the 2000 pages of PostgreSQL documentation. It's a great read.
|
# ? Aug 31, 2014 05:45 |
|
McGlockenshire posted:How many unique queries will you end up having? Would it be realistic to put all the auditable selects behind stored procedures? Yes, that would be doable. There would only be about a dozen selects. PhantomOfTheCopier posted:The answer depends distinctly on the actual requirements, which might be somewhat obscured by your interpretation of them based on a few of your comments here. You should start by reading the documentation, of course: http://www.postgresql.org/docs/9.1/static/runtime-config-logging.html The options you seek are the CSV log and log_statement=all for starters. log_statement is very useful, but logging the results of selects (the data itself or references to it) is the part I'm struggling with. PhantomOfTheCopier posted:You're not crazy to think that you shouldn't log all the results and should instead leverage your audit entries to construct the result sets when needed, but with a perfectly capable database sitting in front of you, you really ought not be using something external like plpython; you're likely to get it wrong. You'd be better off developing the proper stored rollback procedures now so that code can track your table structure: If anyone deems it necessary to add columns at some future date, they will be required to ensure the procedures for the triggers are correct, along with the rollback procedures. Don't be confusing matters by adding some external code that can desync from your database. That will only lead to utterly useless (i.e., corrupt) audit tables. Thanks, this post gave me an epiphany (use the db, stupid!). I'll make some coffee and crack open the pg manual
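One shape the stored-procedure approach could take — a function that records the access before returning rows, with all names hypothetical:

```sql
-- Who asked for what, and when
CREATE TABLE select_log (
    who       text        NOT NULL DEFAULT current_user,
    logged_at timestamptz NOT NULL DEFAULT now(),
    k         text        NOT NULL
);

-- Clients get EXECUTE on this and no direct SELECT on kv_store,
-- so every read leaves a log entry
CREATE OR REPLACE FUNCTION get_value(p_key text)
RETURNS TABLE (k text, v text) AS $$
BEGIN
    INSERT INTO select_log (k) VALUES (p_key);
    RETURN QUERY SELECT kv.k, kv.v FROM kv_store kv WHERE kv.k = p_key;
END;
$$ LANGUAGE plpgsql;
```

With about a dozen distinct selects, one function per query keeps this manageable; the roles-and-grants lockdown mentioned below is what makes it enforceable.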
|
# ? Aug 31, 2014 14:45 |
|
Minus Pants posted:log_statement is very useful, but logging the results of selects (the data itself or references to it) is the part I'm struggling with.

If you must have an instantly-accessible log of all requests to the database and the results from it, then, indeed, you need to create an entire architecture to restrict the database access methods to prevent users from executing anything that isn't expressly permitted. You'll want a thorough understanding of triggers and procedures, as well as roles and grants. In such a system, you'll want to keep as much as possible within the database so it survives backup and replication and cannot be easily circumvented by external software bugs.

If, on the other hand, you need the ability to retrieve some result sets after the fact, or otherwise intend to perform batch audits at some future point, you should consider another solution. Your audit tables would permit you to roll backward and forward again, but as you have noted, that takes a little bit of work. Depending on how far into history you need to reach, a single audit could be fast or slow, and you have to perform every query again after the data has been reconstructed. Presumably the trade-off is to reduce the storage size at the expense of audit speed.

If you're running batch audits, you might be better off coupling the query log with WAL shipping to an audit server. Determine a nominal period for a basebackup and ensure that you have all WAL segments during the intervening period. Operating on a separate system, you can then configure the PG server to start from the basebackup and replay to any point in time. You could couple this with audit replay to roll forward, and so forth.

Yeah, it sounds like you need to stay in the research phase a bit longer, then the design phase a bit more after that. Leave the agile python out of it or you're going to end up with users running their own queries with no one the wiser.
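The replay side of that batch-audit idea is ordinary point-in-time recovery. A recovery.conf on the audit server might look like this — paths and target time are purely illustrative:

```
# recovery.conf on the audit server, started from a basebackup
restore_command = 'cp /wal_archive/%f "%p"'
recovery_target_time = '2014-08-15 12:00:00'
```

Start the server, let it replay archived WAL up to the target, then run the logged queries against the reconstructed state.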
|
# ? Aug 31, 2014 16:42 |
|
This is another 'works in regex, trying to figure out how to work it in SQL' question (MSSQL 2008). I've got these quasi variable SKU numbers that have information in them I need to extract. They're constantly changing and being added to, so I need to be able to do this in as 'real time' as possible (e.g. running a query once to store the values I need in another table isn't a good solution). They're structured like: code:
1. A variable number of letters, followed by a variable number of digits — let's call it the 'base' SKU.
2. A finish code for the product, two letters.
3. A color code for the product, three letters.

The idea being, I want to be able to select the 'base' SKU and do a GROUP BY so I can see all color/finish variations rolled up to a parent. Easy peasy, right? Just do SELECT LEFT(SKU, LEN(SKU)-5). However, there is a problem: segment 2 (the finish code) is not always present, which means the -5 method will sometimes truncate the base SKU too far, and I get unrelated products in the GROUP BY. In RegEx this is very straightforward: code:
Is there a way to construct a SELECT query that can grab the 'base' SKU accurately? Which hopefully doesn't involve CTEs or installing code on a production server. The only distinctions I can rely on are:

- The first part (the 'base' SKU) will always be a range of 5-10 letters followed by 3-5 numbers.
- The finish code is always two letters; the color code is always three letters.
- The color code is always present.

I realize I can build a WHERE clause that will tell me which ones have finish codes or not (by cobbling together %A-Z0-9% etc.), but I can't actually SELECT that value.
|
# ? Sep 2, 2014 20:26 |
|
If I'm understanding your requirements, you can extract characters [0,n] where n is the index of the last numeric character. You can use patindex to find the first occurrence of a number. Reverse the string to find the last occurrence. Then take the substring to extract that first segment. Something like this SQL code:
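Reconstructed from the description above (table and column names are assumed), the query would be along these lines:

```sql
-- PATINDEX on the REVERSEd string finds the last digit counted from
-- the right; everything up to and including it is the base SKU.
-- e.g. 'ABCDE12345FNCOL' -> 'ABCDE12345', 'ABCDE12345COL' -> 'ABCDE12345'
SELECT LEFT(SKU, LEN(SKU) - PATINDEX('%[0-9]%', REVERSE(SKU)) + 1) AS BaseSku
FROM   Products
GROUP BY LEFT(SKU, LEN(SKU) - PATINDEX('%[0-9]%', REVERSE(SKU)) + 1);
```

If a SKU contains no digits at all, PATINDEX returns 0 and the expression just yields the whole string, which is probably the sanest fallback.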
|
# ? Sep 2, 2014 20:55 |
|
Sedro posted:If I'm understanding your requirements, you can extract characters [0,n] where n is the index of the last numeric character. Well gat dang that is clever. I was on the patindex to find the first position but couldn't think of anything to do with it; it's the reverse that's the sneaky part here. Works exactly as stated above, all I had to do was replace the string literals with my column name. Thank you kindly pard. EDIT-ARRGH the fuckers don't even follow their own rules. Sometimes characters come after the digits, that aren't finish or color codes. FML. Scaramouche fucked around with this message at 00:07 on Sep 3, 2014 |
# ? Sep 2, 2014 21:39 |
|
This might not be the right place, but just wondering if there is a way to determine the hashing algorithm used for a user table field in a DB if you have access to the table. Like, I have the username, password and salt and can change things to whatever I want. I know this isn't really a SQL-specific question since the hashing isn't done by SQL but by the client program, but I figure people who work with SQL all the time are more likely to know this stuff. The software was coded in VB6, also. An example:

username: something
password: Awful1!!
salt: icRpbUhwhgL9h5G
hash: 95D69CBBEC9B818642F2CB1BF232DAB7

If this is a dumb place to ask this, any guidance to a better spot would be appreciated. Demonachizer fucked around with this message at 15:25 on Sep 3, 2014 |
# ? Sep 3, 2014 15:19 |
|
Demonachizer posted:This might not be the right place but just wondering if there is a way to determine a hashing algorithm used for a user table field in a DB if you have access to the table. Like I have the username, password and salt and can change things to whatever I want. I know this isn't really a SQL specific question since the hashing isn't done by SQL but the client program but I figure people who work with SQL all the time are more likely to know this stuff. The software was coded in VB6 also. Given that there's no information about the hash in the DB, you have to brute force it. Try some common hash algorithms with some combinations of password and salt and user name until you can match the hash in the database. You know the length of the hashed value so you only need to try 128-bit algorithms like MD5. Given that the software was developed in VB6, it's either old (or designed by lazy programmers if new) so it probably isn't a complicated scheme. For future reference, there's a general programming questions thread.
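If the server happens to be one that exposes hash functions in SQL (MySQL here, purely as an example — the original post doesn't say which engine it is), you can probe orderings directly in a query. The 32-hex-character length is what points at MD5:

```sql
-- Try a few salt/password orderings against the stored hash
SELECT ordering, guess
FROM (
    SELECT 'MD5(salt||pass)' AS ordering,
           UPPER(MD5(CONCAT('icRpbUhwhgL9h5G', 'Awful1!!'))) AS guess
    UNION ALL
    SELECT 'MD5(pass||salt)',
           UPPER(MD5(CONCAT('Awful1!!', 'icRpbUhwhgL9h5G')))
    UNION ALL
    SELECT 'MD5(user||salt||pass)',
           UPPER(MD5(CONCAT('something', 'icRpbUhwhgL9h5G', 'Awful1!!')))
) AS candidates;
-- Compare each guess against '95D69CBBEC9B818642F2CB1BF232DAB7'
```

The same probing can be done in any scripting language if the DB doesn't have hash functions; the point is just to enumerate orderings and case variants systematically.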
|
# ? Sep 3, 2014 19:06 |
|
Demonachizer posted:This might not be the right place but just wondering if there is a way to determine a hashing algorithm used for a user table field in a DB if you have access to the table. Like I have the username, password and salt and can change things to whatever I want. I know this isn't really a SQL specific question since the hashing isn't done by SQL but the client program but I figure people who work with SQL all the time are more likely to know this stuff. The software was coded in VB6 also. Nothing in here: http://www.insidepro.com/hashes.php, but it isn't truly comprehensive. If I were you I would try every permutation of username/password/salt with 128-bit hashing algorithms combined with toLower/toUpper on everything.
|
# ? Sep 3, 2014 19:28 |
|
Architectural question time! I have a number of machines that generate event recorder data. I'm looking at using Infobright or another columnar DB to store this as it's highly (~300x) compressible. Most reads will be aggregate/sequential reads from segments defined by a datetime, which I use as the key. Multiple machines come together to describe one asset, so evaluations will have to sample a time period from multiple machines. How am I best off structuring the tables? I'd first leap to having one table per machine, but the idea of having the table name as a variable in queries strikes me as not best practice. How does a real DBA get around this design problem?
|
# ? Sep 4, 2014 10:55 |
|
I've got a working around bad architecture problem. Basically, I have to look up cases by "[year]-[two letter 'code']-[morenumbers]". There is no table to match the code, nothing in the schema at all to support these two letter codes, etc. The UI just has three text boxes the user types in "2004" "CG" "butts" into and it concatenates it into "2004-CG-butts" and searches from there, and when partial it uses a like. I was wondering if there's any non lovely way to look at the table and get the unique two letter 'codes', but it looks like I'd have to do a drat table scan. I'm also not sure how to do the actual string manipulation in tSQL before I just selected distinct.
|
# ? Sep 4, 2014 19:23 |
|
Does anyone know how to strip out special characters out of a text file for importing into a database? I have a list (1.5 million rows) of user names and a bunch of them include non-printable ASCII characters (characters 0 - 31) that cause importing to a Postgres database to fail. This list is coming from a SQL Server table export (where they load in just fine, strangely enough) into a text file. FWIW: These special characters can be found throughout the name, not just at the beginning. Anyone have a technique they use? This is a windows-based environment.
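One way that stays inside SQL Server before the export — table and column names are assumed, and NUL (CHAR(0)) is deliberately skipped because T-SQL treats it as a string terminator, so it may need separate handling. Run this against a copy if the originals must stay byte-identical:

```sql
-- Strip ASCII control characters 1-31 from user_name before exporting
DECLARE @i int = 1;
WHILE @i <= 31
BEGIN
    UPDATE dbo.Users
    SET    user_name = REPLACE(user_name, CHAR(@i), '')
    WHERE  user_name LIKE '%' + CHAR(@i) + '%';
    SET @i = @i + 1;
END;
```

The WHERE clause keeps each pass from rewriting rows that don't contain that character, which matters at 1.5 million rows.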
|
# ? Sep 4, 2014 19:29 |
|
gently caress them posted:I've got a working around bad architecture problem. Is this list of two letter codes static or is it getting updated? I'm not sure of your load and read/write patterns, but maybe you can create an initial table that stores all of your two letter codes based on a SELECT DISTINCT, and then as new records come in on other tables you could perform an upsert into the two letter codes table?
|
# ? Sep 4, 2014 19:35 |
|
Nam Taf posted:Architectural question time! How about a single table with a primary key on TIMESTAMP, MACHINE_NAME? I do the same thing to aggregate web logs from multiple web servers into a single table. edit: Oh wait, you said columnar database. No idea if PKs work the same way on columns instead of rows. Agrikk fucked around with this message at 19:41 on Sep 4, 2014 |
# ? Sep 4, 2014 19:37 |
|
Agrikk posted:Is this list of two letter codes static or is it getting updated? I believe it's static, but it's also a lot of stuff. It's every case that still has a physical copy somewhere within the court system. So, not a small number. Thankfully, all new stuff is "e-filed". My state sucks but is actually on the ball with regards to this! How do I select distinct from a sub string? If the server could optimize that that would be great. I know how to do this in C#/whatever but not in SQL. If new stuff was going on, I'd make a code table, but the thing is I don't know what the codes mean, and neither does anyone else. A few are known, but new ones were just made up ad-hoc.
|
# ? Sep 4, 2014 19:40 |
|
gently caress them posted:How do I select distinct from a sub string? If the server could optimize that that would be great. I know how to do this in C#/whatever but not in SQL. If you need the query to perform well (ie. you will re-run the query often because the set of distinct codes change) then you need to use a functional index.
|
# ? Sep 4, 2014 19:46 |
|
gently caress them posted:I believe it's static, but it's also a lot of stuff. It's every case that still has a physical copy somewhere within the court system. So, not a small number. Thankfully, all new stuff is "e-filed". My state sucks but is actually on the ball with regards to this! Use SELECT DISTINCT SUBSTRING(CaseNumber, x, 2) FROM Tablename, where x is the position in the string of the first letter of the two-letter code. e;f;b
|
# ? Sep 4, 2014 19:56 |
|
Sedro posted:select distinct substring(NameOfYourColumn, 6, 2) from ... An ongoing project is actually indexing tables at all, on that note. Bit of a horror but, meh. Many of the indexed tables indexed the wrong column. Since I have the time, how would I set up a function based index?
|
# ? Sep 4, 2014 20:29 |
|
gently caress them posted:An ongoing project is actually indexing tables at all, on that note. Bit of a horror but, meh. Many of the indexed tables indexed the wrong column. SQL code:
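In MSSQL 2008 the usual stand-in for a functional index is a persisted computed column — a sketch with assumed table/column names and the position-6 offset from the earlier query:

```sql
-- Materialize the two-letter code once, then index it
ALTER TABLE dbo.Cases
    ADD CaseCode AS SUBSTRING(CaseNumber, 6, 2) PERSISTED;

CREATE INDEX IX_Cases_CaseCode ON dbo.Cases (CaseCode);

-- Now this is a cheap index scan instead of a full table scan
SELECT DISTINCT CaseCode FROM dbo.Cases;
```

The PERSISTED keyword stores the computed value in the row, which is what makes it indexable without restrictions on determinism settings.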
|
# ? Sep 4, 2014 21:27 |
|
Agrikk posted:How about a single table with a primary key on TIMESTAMP, MACHINE_NAME? I could make it one table but then I'd have hundreds of machines in one giant table and that also seems not the best. The problem is that machines aren't permanently grouped to assets - a group of machines comes together to be an asset but then can dissolve and move to different assets later on. Machines A, B and C may be asset 1 and D, E and F asset 2 one day, but then A, D and F may be 1 and B, C and E 2 the next.
|
# ? Sep 5, 2014 02:56 |
|
1.8m rows were scanned in .4 seconds. I take it 1.8m rows isn't that much for a production SQL server? I still have no basis for what "big" is, or isn't. Also, there is an index on that column, apparently. If new data was coming in I'd build a (proper, useful) code table and add the column. Databases are just a drat rabbit hole.
|
# ? Sep 5, 2014 03:28 |
|
gently caress them posted:1.8m rows were scanned in .4 seconds. I take it 1.8m rows isn't that much for a production SQL server? I Still have no basis of what "big" is, or isn't. 1.8 Million rows is a decent size, but not that big. Billions is "big" these days. Ask me about having to table copy 10TB of Db2 tables because of poo poo design. Actually, never mind. I'm going to drink until I loving forget how terrible our Architects are.
|
# ? Sep 5, 2014 04:01 |
|
Nam Taf posted:I could make it one table but then I'd have hundreds of machines in one giant table and that also seems not the best. What's wrong with hundreds of machines IDs in a table? If the logs have the same format dump them all into a single table. It certainly beats joining across a hundred tables... I have a pool of web servers in an VPC that get spun up and destroyed all the time due to auto scaling events that match demand. Each of these instances pushes web logs to a central S3 bucket that is polled by a process to snort the logs and collects the machine name. The pool of servers ranges from 6-20 web servers at any given moment and my log table contains over five hundred machine IDs.
|
# ? Sep 5, 2014 07:45 |
|
Fair enough. I guess it just doesn't feel 'right' to have a database with 13 million rows (150 machines) a day for 1 table. I mean it would work, but so would calling SQL in Python with table name as a variable input. I parsed 24 million rows (1 machine's last year of data) in today and Infobright compressed the 1.3GB csv input to 4.1MB. That was pretty sexy. It had 0.00 second query times for aggregate functions too. Columnstore databases are amazing.
|
# ? Sep 5, 2014 16:11 |
|
Nam Taf posted:Fair enough. I guess it just doesn't feel 'right' to have a database with 13 million rows (150 machines) a day for 1 table.
|
# ? Sep 5, 2014 19:01 |
|
Nam Taf posted:Fair enough. I guess it just doesn't feel 'right' to have a database with 13 million rows (150 machines) a day for 1 table. Welcome to Big Data. Around these parts in my company it's quite normal to have tables of billions and billions of rows increasing by 50-100 million rows a day. ...and yeah. ^^^ Summarize, summarize, summarize.
|
# ? Sep 5, 2014 19:46 |
|
Hey guys, got a newbie question regarding PL/SQL. Suppose I've got a procedure that's got a fairly complex SELECT to query the data I want to work on, and I'd like to store the results for the runtime of the procedure so I can then run further queries on that. I've done some googling about temporary tables and such, and supposedly on an MS system building temporary tables at runtime is no big deal, but Oracle doesn't support it. The consensus pretty much seems to be "It's not supposed to work that way, don't do it", followed by (what looks to me like) horribly kludge-y ways to get it done anyway by explicitly creating, filling and then dropping a table.

I could just have the explicit query in there multiple times, but that seems inelegant when I really just need a way to say "Okay, remember this poo poo we did at the beginning? We're working with that again." Or wrap the whole thing up in a FOR loop so I can refer back to that outer query inside the loop however I want. But it feels like there's probably a better solution than what a newbie with Google can figure out.

But even if there isn't, at this stage I'm mostly just genuinely curious: why isn't it done like this? Is this supposed to be one of those cases where the language wants to force you to clean up your thoughts (and program flow) to the point where you no longer need temp tables to get it done? Or am I just looking in the wrong place?
|
# ? Sep 6, 2014 09:01 |
|
Gatac posted:Hey guys, got a newbie question regarding PL/SQL. Suppose I've got a procedure that's got a fairly complex SELECT to query the data I want to work on, and I'd like to store the results for the runtime of the procedure so I can then run further queries on that. I've done some googling about temporary tables and such and supposedly on an MS system, building temporary tables at runtime is no big deal, but Oracle doesn't support it. The consensus pretty much seems to be "It's not supposed to work that way, don't do it", followed by (what looks to me) horribly kludge-y ways to get it done anyway by explicitly creating, filling and then dropping a table. You can do this with a GLOBAL TEMPORARY TABLE. Depending on your result set size and default sampling level, you might need to hint CARDINALITY or OPTIMIZER_DYNAMIC_SAMPLING to avoid a bad plan (block size is used in the optimizer if you don't have stats, iirc).
|
# ? Sep 6, 2014 14:08 |
|
I posted in this thread a while ago about being a dev that's ended up as a bit of a weird DBA/dev cross. Anyway, can anyone recommend any certs/training, both from a "this is actually useful" and a "this is bullshit but HR departments love it" angle? I don't know how long I will stay doing this vs. going back to pure dev work, but I'm enjoying it a lot at the moment, and if I can learn more or get some handy bits of paper on the company's dime I certainly wouldn't say no.
|
# ? Sep 6, 2014 21:05 |
|
var1ety posted:You can do this with a GLOBAL TEMPORARY TABLE. Depending on your result set size and default sampling level, you might need to hint CARDINALITY or OPTIMIZER_DYNAMIC_SAMPLING to avoid a bad plan (block size is used in the optimizer if you don't have stats, iirc). Hmm, okay. Different question, is it possible to construct an UPDATE that doesn't actually change a field? I know it's possible to just have it write back what's already in there, but is there a way to avoid any change at all? Basically, I'd like to update a record, but certain fields only need to be changed under certain circumstances. (Using CASE statements when assigning the values.) I could just externalize those into separate UPDATEs wrapped in IF blocks, of course, but is it possible to centralize this into one neat UPDATE?
|
# ? Sep 9, 2014 06:19 |
|
Gatac posted:Hmm, okay. Different question, is it possible to construct an UPDATE that doesn't actually change a field? I know it's possible to just have it write back what's already in there, but is there a way to avoid any change at all? Basically, I'd like to update a record, but certain fields only need to be changed under certain circumstances. (Using CASE statements when assigning the values.) I could just externalize those into separate UPDATEs wrapped in IF blocks, of course, but is it possible to centralize this into one neat UPDATE? I can say that this is 100% possible to write in MSSQL. It's possible to write many, many things that you should not write, though, and this is one of them. You're much better off having multiple UPDATE statements with differing WHERE clauses than having one massive UPDATE that selectively updates based on the value in the row -- isn't that what WHERE clauses are for?
|
# ? Sep 9, 2014 07:15 |
|
|
|
Anaxandrides posted:I can say that this is 100% possible to write in MSSQL. It's possible to write many, many things that you should not write, though, and this is one of them. You're much better off having multiple UPDATE statements with differing WHERE clauses than having one massive UPDATE that selectively updates based on the value in the row -- isn't that what WHERE clauses are for? What is the concern about updating columns that aren't changing? The only case where that could theoretically matter is if you update rows that don't need any updates at all, but even that can be offset by an efficient WHERE clause. Just update all the columns if any of them have changed...or do you have triggers on the table that care if certain columns were touched? code:
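The single-UPDATE-with-CASE shape being discussed would look roughly like this (table and column names invented). Note that column references on the right-hand side read the row's pre-update values, per standard UPDATE semantics:

```sql
-- One statement: each column falls back to its current value
-- when its condition doesn't apply, so nothing changes needlessly
UPDATE orders
SET    status     = CASE WHEN paid_at IS NOT NULL THEN 'PAID'
                         ELSE status END,
       shipped_at = CASE WHEN status = 'SHIPPED' AND shipped_at IS NULL
                         THEN CURRENT_TIMESTAMP
                         ELSE shipped_at END
WHERE  id = 42;
```

Whether this beats several narrow UPDATEs with tighter WHERE clauses is exactly the trade-off the two posts above are debating.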
SLOSifl fucked around with this message at 13:12 on Sep 9, 2014 |
# ? Sep 9, 2014 13:05 |