Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Is there only one order? Can users customize the order? Is the order actually entirely arbitrary, or are you not telling us about a property of the data that induces the order? Are the items arranged into some manner of subgroups? I ask because it seems questionable to require the display, in random/chosen order, of a list that's 20k elements long.

It's not clear to me that those numeric sorting IDs should be a field in that table (as opposed to some other). It's also not clear --- sacrilege here --- why, if this order is so important, that this isn't part of the primary key for the thingies (create sequence... increment 100, for example). I can see it going either direction on both questions depending.

Adbot
ADBOT LOVES YOU

pangstrom
Jan 25, 2003

Wedge Regret
Kind of repeating poster above me, and only saying this only because you said you were new to this stuff: typically the "why" of the ordering is manifest in "the data" itself. Maybe it's who created the record or modified_date or store_id or whatever else or combo therein. I guess I can see the speed/simplicity advantage if those criteria are complicated at just having an abstract "sort_order" field, but unless you have a reason NOT to I would focus on having the "why" in "the data" and just putting "order by" at the end of your queries.

pangstrom fucked around with this message at 14:23 on Apr 30, 2018

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

PhantomOfTheCopier posted:

It's also not clear --- sacrilege here --- why, if this order is so important, that this isn't part of the primary key for the thingies (create sequence... increment 100, for example).

So if you change the sort order, you're going to go through and update all the other tables that relate to this with new keys?

Surrogate keys are the way to go, for this and many other reasons.

Pekinduck
May 10, 2008

PhantomOfTheCopier posted:

Is there only one order? Can users customize the order? Is the order actually entirely arbitrary, or are you not telling us about a property of the data that induces the order? Are the items arranged into some manner of subgroups? I ask because it seems questionable to require the display, in random/chosen order, of a list that's 20k elements long.

I'm sorry I have to be vague I hope this makes sense. Each row refers to a physical object in storage and the order is the order they're in on the shelves. The order is basically arbitrary; there's nothing in the data that could be used to determine it. I can't change the order because there's so many of the things and various documents refer to this order. I need to occasionally change it if objects are rearranged.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Pekinduck posted:

I'm sorry I have to be vague I hope this makes sense. Each row refers to a physical object in storage and the order is the order they're in on the shelves. The order is basically arbitrary; there's nothing in the data that could be used to determine it. I can't change the order because there's so many of the things and various documents refer to this order. I need to occasionally change it if objects are rearranged.

Are these spaces on shelves well defined? Like every foot of shelf space is a separate "object spot"? If that's the case, make a location table and FK it to your items table with a unique index such that only one item could be in any spot on the shelves, and then enforce a rigorous data-defined sort order on your locations and order by that.

Rubellavator
Aug 16, 2007

I need a sanity check.

Can anyone think of a practical reason to write a query like this:

code:
select ...
from T1
inner join (
	T2 
	inner join T3 on T2.KEY = T3.KEY
	inner join (
		T4 inner join T5 on T4.KEY = T5.KEY
	) on T2.KEY = T4.KEY
	left outer join T6 on T2.KEY = T6.KEY
) on T1.KEY = T2.KEY
instead of this:

code:
select ...
from T1
inner join T2 on T1.KEY = T2.KEY
inner join T3 on T2.KEY = T3.KEY
inner join T4 on T2.KEY = T4.KEY
inner join T5 on T4.KEY = T5.KEY
left outer join T6 on T2.KEY = T6.KEY
If a table is joined to multiple times and it's not the original "from" table, then those joins are nested. Is there a reason to do this? Our homegrown QueryBuilder does and I have no idea why and the people who wrote it are long gone. The former runs fine on Oracle/Postgres but takes ~20 minutes on Derby, and the latter takes about 1 second. I've never really used or seen queries that used nested joins like that, so I feel like I might be missing something.

The code that generates the query looks like this:

code:
query.from('T1')
	.join('T1', T1.T2Relation)
	.join('T2',  T2.T3Relation)
	.join('T2', T2.T5Relation)
	.join('T4', 'T4.T5Relation)
	.leftouter('T2', T2.T6Relation)

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Inner joins nested on other inner joins don’t do anything except potentially confuse the optimizer depending on your dbms. Probably better off doing the latter.

Nesting an inner join on another join is typically used when you want to compute that join first. For example, let’s say you have three tables: employees, awards, awardtypes. If you want to get every employee and the awards they’ve received whose frequency is annual, you’d do something like this:

code:
select * 
from employees e 
left join awards a 
    inner join awardtypes t 
    on a.awardtypeid=t.awardtypeid 
    and t.frequency=’annual’ 
on e.employeeid=a.employeeid
In the example above, the inner join is only filtering the awards table and doesn’t affect the employees that are returned.

Maybe they generalized and built it this way to be accommodating. But for your example you could swap in your new query just fine.

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Re: user-editable ordering in SQL, begriffs recently wrote a good blog post on the subject covering various approaches

https://begriffs.com/posts/2018-03-20-user-defined-order.html

I also recommend the HN discussion of that post:

https://news.ycombinator.com/item?id=16635440

Pekinduck
May 10, 2008

Nth Doctor posted:

Are these spaces on shelves well defined? Like every foot of shelf space is a separate "object spot"? If that's the case, make a location table and FK it to your items table with a unique index such that only one item could be in any spot on the shelves, and then enforce a rigorous data-defined sort order on your locations and order by that.

Alas no, the objects vary widely in size and are just packed in.

The current system is a huge card catalog with the cards in the order the objects are in on the shelves. Its catagorized a bit but in broad categories.

Pekinduck fucked around with this message at 17:58 on May 1, 2018

McGlockenshire
Dec 16, 2005

GOLLOCKS!
I suggest renaming the field to something like "shelf_position" then. You aren't sorting the data, you're storing the relative position on the shelf. It's actual data, not metadata about the row ordering.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Am I crazy for toying with the idea of a linked list implementation to define adjacency over the table?

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Nth Doctor posted:

Am I crazy for toying with the idea of a linked list implementation to define adjacency over the table?
Naup, I had that thought too, but it's clear above... model the actual data. In this case, physical position is real data and needs to be there.

Now then, what happens when something is removed? Is it necessary to enforce a pigeonhole principle on the range of values?

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


add a few columns for width, height, length to measure the items and also a few columns of xpos, ypos, zpos, to represent relative location from a geotagged spot in your store (your choice just be consistent) and also an item_orientation column storing a vector representation so you know what way it’s facing

after you’ve set that all up, load your items and run some collision validation using a combination of spatial sql and 3d modeling techniques to make sure all your data makes sense

then write a machine learning algorithm that chooses the most likely route a person will choose for any given pick list... implement it as a CLR function in sql

finally, pass your list of items to the function and sort by pick order. Boom, done!!! just make sure you work out a process that keeps your sql tables up to date

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Ruggan posted:

add a few columns for width, height, length to measure the items and also a few columns of xpos, ypos, zpos, to represent relative location from a geotagged spot in your store (your choice just be consistent) and also an item_orientation column storing a vector representation so you know what way it’s facing

after you’ve set that all up, load your items and run some collision validation using a combination of spatial sql and 3d modeling techniques to make sure all your data makes sense

then write a machine learning algorithm that chooses the most likely route a person will choose for any given pick list... implement it as a CLR function in sql

finally, pass your list of items to the function and sort by pick order. Boom, done!!! just make sure you work out a process that keeps your sql tables up to date

i like this post!

Particularly the part where it glosses over the hardest part in the last sentence, lol.

Pekinduck
May 10, 2008

McGlockenshire posted:

I suggest renaming the field to something like "shelf_position" then. You aren't sorting the data, you're storing the relative position on the shelf. It's actual data, not metadata about the row ordering.

Alright, thanks. Luckily the collection of objects is fairly static; its the sort of deal where you take one, use it, then put it back where it was.

Ruggan posted:

add a few columns for width, height, length to measure the items and also a few columns of xpos, ypos, zpos, to represent relative location from a geotagged spot in your store (your choice just be consistent) and also an item_orientation column storing a vector representation so you know what way it’s facing

after you’ve set that all up, load your items and run some collision validation using a combination of spatial sql and 3d modeling techniques to make sure all your data makes sense

then write a machine learning algorithm that chooses the most likely route a person will choose for any given pick list... implement it as a CLR function in sql

finally, pass your list of items to the function and sort by pick order. Boom, done!!! just make sure you work out a process that keeps your sql tables up to date
Whoa

LargeHadron
May 19, 2009

They say, "you mean it's just sounds?" thinking that for something to just be a sound is to be useless, whereas I love sounds just as they are, and I have no need for them to be anything more than what they are.
I have a SQL index that uses the INCLUDE clause. My layman's understanding is that this increases performance, as the columns in the INCLUDE clause are available for read in the index itself, removing the need for a lookup in the main table after finding the records I want. I have to reproduce this index in a SQLite database, but SQLite doesn't seem to support the INCLUDE clause. What are my options here?

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

LargeHadron posted:

I have a SQL index that uses the INCLUDE clause. My layman's understanding is that this increases performance, as the columns in the INCLUDE clause are available for read in the index itself, removing the need for a lookup in the main table after finding the records I want. I have to reproduce this index in a SQLite database, but SQLite doesn't seem to support the INCLUDE clause. What are my options here?

Either just make an index with the indexed columns and deal with the probably pretty negligible performance impact (since if you cared that much you probably wouldn't be using SQLite), or add the included columns to the index in some order that makes sense for how they're used.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Our system has a query that runs a lot to drive a part of our UI that's central to the daily work of a good chunk of the business that I've been trying to optimize. Mostly, it's gone pretty well, but the other morning, SQLServer poo poo out a Bad Plan. Here's the diagram of the relevant section of that Bad Plan:


That's an inner join on two tables with a one-to-many relationship that just aggregates the count of the many side. It's not a real foreign key relationship, which is probably why this is happening, but actually fixing this schema is a Out of Scope whereas, say, adding the right query hint to keep it limping along is In Scope. Sorry, I know that's triggering and, hell, it bugs the poo poo out of me, but that's life ATM.

Here's the diagram of the same part of the good plan:


GoodPlan.Object4 == BadPlan.Object3 is the Many side of the relationship
GoodPlan.Object3 == BadPlan.Object2 is the One side
(sorry about that - it's SentryOne Plan Explorer's anonymized output)

The Bad Plan does use a different index that's probably useless because it's basically 'not deleted' which is most of the table, but hinting it to the index used in the Good Plan did not change the merge strategy downstream. That is, it still read a bunch of the table and did the aggregate on that rather than merging and then doing the aggregate on the much smaller intermediate result set.

Is there anything I can do to get it to consistently pick the merge strategy from the Good Plan?

e: OPTION (LOOP JOIN) seems to be helping a lot 🤞

Munkeymon fucked around with this message at 20:32 on May 2, 2018

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


What are your predicates on the table that is a scan in the bad diagram?

You basically need to find out why the optimizer is preferring a table scan to a seek. If you can figure that out and fix it your performance should return.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Ruggan posted:

What are your predicates on the table that is a scan in the bad diagram?

You basically need to find out why the optimizer is preferring a table scan to a seek. If you can figure that out and fix it your performance should return.

The only predicate was that IsDeleted column for some reason. LOOP JOIN actually improved other parts of the plan I was going to work on later where it was doing similar over-reads, but with better indexes on smaller tables, too. I'm also going to drop that index because it just can't be that useful to index on a binary that includes most of the table in the overwhelmingly common case. Well, of course it's loving nullable so trinary, I guess :rolleyes:

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Is it a filtered index (index with a where clause)? Those are actually super useful because they let the optimizer avoid needing to analyze the table for that filter criteria. The classic example is yours exactly - some stupid flag that is always filtered in the same way for 90% of use cases (like employee.isActive = 1).

Is your query basically:

FROM A
INNER JOIN B
ON A.id = B.id
WHERE A.IsDeleted = 0

If so, a filtered index on A with the same where clause is probably appropriate. Even better if it is an index on whatever that join key is. Then it will almost certainly do an efficient operation.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


One last thing. You absolutely need an index on the many side of that one-to-many on whatever your join key is. That is what allows the optimizer to do a seek to find all matches in the many side from the one side.

What I can gather from your original pics of the query plan is that the optimizer could not effectively seek the appropriate records so instead it calculated the aggregate for the ENTIRE table (which involves an implicit sort on the grouper) and then joined that to your other table by picking out the then sorted records.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



There is an index on the join key - it'd, be something like B.aId in your example, but the optimizer used a different, unrelated one on B.isDeleted instead*. A.id is a real primary key, but I made a covering index for A on the parts of the where clause that apply to it, so it's good to go. (B is Object4 in the Good Plan and Object3 in the Bad Plan; A is Object3 in the Good and Object2 on the Bad - the anonymization wasn't consistent, again: sorry for the confusion!)

Filtering the index on isDeleted is a good idea. I'll try that and see how it shakes out.

* I'm not sure why it's even there - I think a lot of indexes have been haphazardly applied over the years because "it made the plan look good in SSMS one time"

Munkeymon fucked around with this message at 15:01 on May 4, 2018

Shugojin
Sep 6, 2007

THE TAIL THAT BURNS TWICE AS BRIGHT...


I love to concatenate location, date, and transaction ID columns to make unique identifiers because the $30k verifone pos reuses transaction IDs from day to day

E: I even more love to be doing it because the software lacks a specific functionality that sales assured us it had and was the main reason we bought it

Shugojin fucked around with this message at 13:54 on May 11, 2018

Sad Panda
Sep 22, 2004

I'm a Sad Panda.
I'm an SQL moron. I'm using MySQL and used AES_ENCRYPT just fine to store passwords in my database. I'm trying to retreive them. I got the following to work, but that returns the blob rather than the password.

SQL code:
SELECT Bookies.BookName, 
       Bookies.BookURL, 
       Accounts.AccountOwner, 
       Accounts.Username, 
       Accounts.Password 
FROM   Bookies 
       INNER JOIN Accounts 
               ON Bookies.ID = Accounts.BookID; 
The following returns the passwords in blob format, but I can't seem to join the following into the above.
SQL code:
SELECT AES_DECRPYT(`password`, 'secret') AS `Password` 
FROM   `Accounts`
My broken attempt was

edit : Got it.
SQL code:
SELECT Bookies.BookName, 
       Bookies.BookURL, 
       Accounts.AccountOwner, 
       Accounts.Username, 
       AES_DECRYPT(`Password`, 'secret') 
FROM   Bookies 
       INNER JOIN Accounts 
               ON Bookies.ID = Accounts.BookID; 

Sad Panda fucked around with this message at 14:51 on May 14, 2018

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Sad Panda posted:

I'm an SQL moron. I'm using MySQL and used AES_ENCRYPT just fine to store passwords in my database. I'm trying to retreive them.

You... you do know this is a thing that you should never, ever, ever do under ordinary circumstances, right? I mean, I assume there are real life scenarios where there are good reasons for doing this, but normally when you store passwords you always store them in non-retrievable form, and you didn't specifically say "I know what I'm doing, don't get on my case"...

McGlockenshire
Dec 16, 2005

GOLLOCKS!
On top of "don't do that," MySQL AES_ENCRYPT uses a non-standard padding method that will cause problems if you can't use AES_DECRYPT later (read: need to decrypt the data in your own code). Also, I hope you aren't using replication, because the plaintext and encryption key are gonna be in your replication logs now.

So yes, please don't do that. It's bad and wrong and a giant loving security threat. Using two-way encryption on passwords is never the right choice. Ever. Even if the specification says the password must be retrievable, the specification is wrong. Do not do it that way, ever. Use a one-way hash. The current industry standard is bcrypt, with implementations available for every single language and platform. Another adequate solution is Argon2, the winner of the Password Hashing Competition, but implementations are slightly more rare. If you feel like being weird, you can use PBKDF2, which hasn't actually been recommended for password storage, but implementations are easy and everywhere.

McGlockenshire fucked around with this message at 18:50 on May 14, 2018

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
What company do you work for I would like to know what other insane security holes you guys must have

Sad Panda
Sep 22, 2004

I'm a Sad Panda.
It's a personal project. I'm creating a Python application that stores my login information for a bunch of websites in a database. It retrives the requested login information from the database, and logs into the website to scrape some information from it.

The MySQL database is stored locally, and the Python is too. What's the security threat other than someone getting my Mac?

If there's a better option then I'm all ears.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Sad Panda posted:

It's a personal project. I'm creating a Python application that stores my login information for a bunch of websites in a database. It retrives the requested login information from the database, and logs into the website to scrape some information from it.

The MySQL database is stored locally, and the Python is too. What's the security threat other than someone getting my Mac?

If there's a better option then I'm all ears.

If you are only going to access the MySQL server via programs running on the machine itself, then make sure it is configured to only accept connections from localhost. That might be the default.

maybe relevant: https://stackoverflow.com/questions...=google_rich_qa

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
also, lol

Sad Panda
Sep 22, 2004

I'm a Sad Panda.

Hammerite posted:

If you are only going to access the MySQL server via programs running on the machine itself, then make sure it is configured to only accept connections from localhost. That might be the default.

maybe relevant: https://stackoverflow.com/questions...=google_rich_qa

Thanks. Googling makes it sound like it's the default. I edited /etc/my.cnf to include

code:
[mysqld]
bind-address = 127.0.0.1
port=3306
sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
skip-networking

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
It doesn't matter. Do it the right way. Bad routines breed future failures:

https://it.slashdot.org/story/18/05/02/0613205/github-accidentally-exposes-some-plaintext-passwords-in-its-internal-logs

thebigcow
Jan 3, 2001

Bully!

Sad Panda posted:

It's a personal project. I'm creating a Python application that stores my login information for a bunch of websites in a database. It retrives the requested login information from the database, and logs into the website to scrape some information from it.

The MySQL database is stored locally, and the Python is too. What's the security threat other than someone getting my Mac?

If there's a better option then I'm all ears.

Oauth if the websites support it, which I assume they don't.

So you get to store passwords and watch people freak out when you ask for help with the only way to solve your problem.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Freaking out is the correct default response to storing the passwords like that.

McGlockenshire
Dec 16, 2005

GOLLOCKS!

Sad Panda posted:

If there's a better option then I'm all ears.

In addition to what the others have said, at least push down the encryption stuff to the application layer instead of doing it in the database, and decrypt to the plaintext at the last practical moment. As I mentioned above, the builtin "AES" stuff is not actually AES the way other things understand AES.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Just get your passwords out of the Keychain, the secure password storage database that comes with your OS? https://gist.github.com/gcollazo/9434580

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Unrelated: we recently migrated from on-perm to an AWS instance for our production database. It probably could have been handled better, but I can't say that for sure because I wasn't involved much until D-day when I spent 12 hours helping patch up holes an incomplete replication left and adding indexes because the could server's performance was so different for some reason... *looks pointedly at 'max server memory' setting*. On second thought, maybe I can say that for sure after all. (<- figured I'd spare people my griping if they don't want it)

Today I found a bunch of records in a VARBINARY(MAX) column were truncated to exactly 32768 in length and all of them were last updated before the migration. Weird! It hasn't cause any issues because the data in those columns had already been consumed and represented elsewhere - they're just temporary backup data. Still, I'd like to know what did that. I suspect it was replication, but I can't find any confirmation that replication can do that. The logs from when we migrated have rotated out and the old production server is... IDK - in some infrastructure guy's basement now? The backups I can get to with my access level have the same truncated old data, but whole newer data.

I just wanna know if this is A Thing and my google-fu is failing me. Or I'm wrong about the cause.

Data Graham
Dec 28, 2009

📈📊🍪😋



Munkeymon posted:

Just get your passwords out of the Keychain, the secure password storage database that comes with your OS? https://gist.github.com/gcollazo/9434580

Bet the specification for Keychain that said passwords must be retrievable wasn't quite wrong :v:

Adbot
ADBOT LOVES YOU

SnatchRabbit
Feb 23, 2006

by sebmojo
Does anyone have a good link for quick and dirty cliff notes style tips for formatting a spool in oracledb?

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply