wolrah
May 8, 2006
what?

Victor posted:

noonches, you have to be kidding me. You're advocating an RDBMS which returns different results based on configuration like indexes. That means I could attempt to speed up a platform backed by MySQL by adding an index, and all of a sudden break the application. How on earth could that be construed in any way other than retarded?

AFAIK there are MySQL options that can disable most if not all of the retardedness. Unfortunately it suffers from the same problem as PHP: there are a ton of applications out there built around the assumption of retarded behavior. Obviously these break when the system is configured in a sane manner, so the defaults and the settings commonly used on shared hosts stay insane.
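For example, something along these lines goes a long way toward sane behavior (just a sketch; the exact mode flags available vary by MySQL version):

code:
-- Hedged example: enable strict mode so bad data raises errors instead of
-- being silently truncated or coerced. Flag names vary between versions.
SET GLOBAL sql_mode = 'STRICT_ALL_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE,ONLY_FULL_GROUP_BY';
-- or in my.cnf so it survives restarts:
-- [mysqld]
-- sql_mode = STRICT_ALL_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE,ONLY_FULL_GROUP_BY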

It's a bad cycle too, since leaving that crap enabled just means more idiots will use it, meaning there are even more applications which depend on bad features, and so on.

wolrah
May 8, 2006
what?
Hopefully simple MySQL question.

I have a query that returns two columns, a customer ID and a cost for whatever month I've specified in the query. What I'd like to do is run the same query again for a different month and have its result inserted as another column. Some customers have no data in some months because they were added to or removed from the system, so it'll need to fill those spots in the output with NULLs or zeros.

edit: the query I'm using to get one month of data

code:
SELECT `group`, SUM((cdr_new.chargeDuration / 60 * rates.rate)) AS 'cost'
FROM cdr_new
LEFT JOIN rates
ON `product_code-call` = rates.code
WHERE LEFT(`startTime`, 6) = '200902'
GROUP BY `group`
and a small subset of the returned data (obviously can't publish identifiable customer info):

code:
+-------------------------------+-------------------+
| group                         | cost              |
+-------------------------------+-------------------+
| Resi K                        |  7.41592000000002 | 
| Resi1                         |                 0 | 
| ResiA                         |           1.69735 | 
| ResiB                         |          25.33945 | 
| ResiC                         |           0.22568 | 
| ResiF                         |           0.97017 | 
| ResiH                         |            0.0312 | 
| ResiS                         |           4.58148 | 
| ResiW                         |            1.7869 |  
| _Testing and Demo_            |           0.49162 | 
| _Testing_                     |                 0 | 
+-------------------------------+-------------------+
208 rows in set (3 min 48.35 sec)
I will gladly admit that I'm a SQL retard (I used JOIN for the first time ever last night designing this query), and the cdr_new table is utter crap: it's a raw import of CSV files delivered from my softswitch, with one index and no real typing. I'm working on replacing it with something using the appropriate types and more indexes.


edit2: I just figured out how to treat SELECT results as a table in a JOIN, so I came up with this absolute beast of a query

http://pastebin.com/f6b48d991

I'm fairly certain I did about a million things wrong there, and it's going to take nearly an hour to complete if it's fair to assume it'll take 14 times as long as the single-month query did. That said, I only need to run it once.
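The general shape of it, for anyone curious (a reconstruction of the idea rather than the actual pastebin contents, with only two months shown):

code:
-- Hedged sketch: treat each month's version of the query as a derived table
-- and LEFT JOIN the later month onto the first, so missing months come back NULL.
SELECT m1.`group`, m1.cost AS cost_200902, m2.cost AS cost_200903
FROM (
    SELECT `group`, SUM(cdr_new.chargeDuration / 60 * rates.rate) AS cost
    FROM cdr_new
    LEFT JOIN rates ON `product_code-call` = rates.code
    WHERE LEFT(`startTime`, 6) = '200902'
    GROUP BY `group`
) AS m1
LEFT JOIN (
    SELECT `group`, SUM(cdr_new.chargeDuration / 60 * rates.rate) AS cost
    FROM cdr_new
    LEFT JOIN rates ON `product_code-call` = rates.code
    WHERE LEFT(`startTime`, 6) = '200903'
    GROUP BY `group`
) AS m2 ON m1.`group` = m2.`group`;
-- Caveat: customers with activity only in the later month won't appear, since
-- MySQL has no FULL OUTER JOIN; emulating one with a UNION would cover that.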

edit3: I hate 32 bit memory limits...

After an hour of running, the server returned results and Sequel Pro (a CocoaMySQL fork) shat itself. Restarting using the 64-bit CLI client and just having it dump to CSV.

edit4: I'm not sure what I did wrong there, but I managed to end up with over 1000 rows per customer in the output. I killed the job when I noticed the CSV was over 7GB, when it should be on the order of KB.

wolrah fucked around with this message at 15:29 on Mar 3, 2009

wolrah
May 8, 2006
what?
I have recently discovered that one of my former coworkers made a horrible mistake or was incredibly lazy when initially designing the main table for our billing and call record systems. Every single one of the 98 columns is a varchar with a maximum length equal to the largest amount of data those fields may contain in the CSVs we feed into it from our switch. This table has a little over 73.5 million rows and takes up nearly 30GB on disk. I've added two indexes over time, which helped immensely with some queries, but performance is still pretty bad.

I'm completely retarded when it comes to optimizing databases so I have a few questions:

1. How horrible is this? Are we talking he owes me a beer next time I see him or he should be set on fire?

2. Would a half-rear end temporary solution of changing them to CHAR instead of VARCHAR gain me anything noticeable? Some back-of-the-envelope math (73.5 million rows at a fixed row width of a bit over 3KB) indicates this would result in the table's size exploding to just short of 238GB, but we have plenty of disk space.

2a. If so, what's the best way to go about doing this? (See the sketch after this list.)

3. Any ideas for other tweaks I can make to a table this size while I bury myself in books trying to come up with a replacement that's a bit more properly designed than "let's just dump all of these fields in here exactly as we got them"?
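Re: 2a, the brute-force route would presumably look something like the following, with the lengths made up here; batching the changes into one ALTER matters because each ALTER rewrites the whole table.

code:
-- Hedged sketch: same column names as my earlier query, lengths invented.
-- One ALTER with several MODIFYs rebuilds the 30GB table once instead of 98 times.
ALTER TABLE cdr_new
    MODIFY COLUMN `group` CHAR(30) NOT NULL DEFAULT '',
    MODIFY COLUMN `product_code-call` CHAR(10) NOT NULL DEFAULT '',
    MODIFY COLUMN `startTime` CHAR(14) NOT NULL DEFAULT '';
-- (and so on for the rest of the columns; the real gains would come from
-- proper types like INT/DATETIME/DECIMAL plus indexes on whatever the WHERE
-- clauses actually hit)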

wolrah
May 8, 2006
what?
Is there any way in MySQL to give priority to queries involving a certain table or database?

I have a server that originally just hosted files, then later was also turned into a basic email server for a voicemail system. The email setup uses MySQL for its user database, just a few small tables which are very rarely changed and basically just respond to a SELECT when someone receives a voicemail or accesses their account.

Recently, due to another site being taken down, we moved over a very large database with millions of rows in one large table (yes, it's the same terribly designed one I've asked a few questions about in the past). Unfortunately, we've discovered that doing almost anything on this table outside of simple INSERTs and queries that hit the indexes properly seems to bring the entire server to a halt and results in users being unable to access their voicemail until the query finishes.

A new server is being purchased to permanently store the large database, but until that is available I'm looking for any stopgap solutions which will allow the mail server's database to have priority over everything else. I don't care if queries on the big table are entirely halted every time something comes in on the voicemail side; nothing against the big one is time-sensitive.

So far I've found plenty of ways to limit the number of queries a user can issue in a given time or what can be queried, but this table is so large that a single very simple non-indexed SELECT against it is enough to kill the server, so none of those are helpful.

wolrah
May 8, 2006
what?

McGlockenshire posted:

That won't help much if it's limited server resources that are the problem. One instance could easily consume all available I/O.

But if the other database is small enough to be kept in RAM, and both instances are tuned properly, it could work out.

My best guess is that it's disk I/O, since the server has a single hard drive and the big table is about 40GB with dozens of varchars. The voicemail database is incredibly tiny (2x 16KB + 1x 176KB tables) and basically static, so keeping it entirely in RAM seems like it would do the job.

What's the best way to achieve this? A MEMORY table with an --init-file script that loads it from the on-disk tables when the server starts?
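Roughly what I'm picturing for the init file, if that's even the right approach (database and table names made up here):

code:
-- Hedged sketch of an init script, run via mysqld --init-file=/path/to/init.sql.
-- Copies the small, mostly static voicemail tables into MEMORY clones at
-- startup; the mail setup would then read from the _mem copies.
DROP TABLE IF EXISTS vmail.users_mem;
CREATE TABLE vmail.users_mem ENGINE=MEMORY AS SELECT * FROM vmail.users;
DROP TABLE IF EXISTS vmail.aliases_mem;
CREATE TABLE vmail.aliases_mem ENGINE=MEMORY AS SELECT * FROM vmail.aliases;
-- Note: MEMORY tables don't support BLOB/TEXT columns, so this only works if
-- the voicemail tables stick to fixed-width/VARCHAR-style types.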

wolrah
May 8, 2006
what?

Melraidin posted:

Salt and hash the passwords. This will prevent people with access to the DB from being able to determine the original password in any reasonable time. If you only hash the passwords then rainbow tables could be used to find values that would hash to the same password.

I use something along these lines:

code:
dbPassword = CONCAT(
    LEFT( dbPassword, {SaltLength} ),
    MD5( CONCAT( LEFT( dbPassword, {SaltLength} ), '{userPassword}' ) )
)
Where dbPassword is the field in the DB containing the salted and hashed PW, SaltLength is the length of the salt in characters, and userPassword is the PW the user has entered you're testing against.

Don't do this. It was the common wisdom for some time, but has since been shown to be a bad idea. See this link for the details, but the short version is that common hashing functions like MD5 and SHA1 were designed for speed. That's good when using them to verify a lot of data, but bad when using them for passwords. Salts can help dodge rainbow tables when done right, but with GPU acceleration a modern computer can brute-force even salted passwords fairly rapidly. See here for a real-world example.

Basically, use a hashing algorithm intended for passwords (bcrypt, scrypt, PBKDF2, and the like), not for speed. Speedy hashes are bad for password security.

wolrah
May 8, 2006
what?
Here's one that seems simple on the surface but is breaking my brain:

Given a table full of records with DATETIME columns for start time and end time, I'm trying to produce an output table with the number of active records in given intervals, including those which both started and ended within the interval.

Right now, for any single point in time, the following query seems to work:
code:
SELECT `account`, COUNT(*) AS `Active` FROM sessions WHERE `startTime` <= '2011-01-19 13:01' AND `releaseTime` >= '2011-01-19 13:01' GROUP BY `account`
What I'm aiming for is something which would basically loop the above query within MySQL and return either one row per minute per account with a column for the count (as I currently get for any given minute using the above), one row per account with a count column per minute (preferred), or one row per minute with a count column per account (easy enough to rotate the table in software afterward).

edit: welp, got the first option working, no idea how bad this is though

code:
SELECT minutes.minute, sessions.account, COUNT(*) AS 'Active'
FROM minutes
LEFT JOIN sessions
ON TIME(sessions.startTime) <= minutes.minute AND TIME(sessions.releaseTime) >= minutes.minute
WHERE DATE(sessions.startTime) <= '2011-01-19' AND DATE(sessions.releaseTime) >= '2011-01-19' AND sessions.typeOfNetwork = 'public'
GROUP BY sessions.account, minutes.minute
I had to build a table with one column of TIME entries, one per minute, to get it working.
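In case anyone wants to replicate it, the minutes table is easy enough to generate; something along these lines works even on older MySQL with no CTE support (a sketch, not necessarily how I actually built mine):

code:
-- Hedged sketch: build a helper table with one TIME value per minute of the
-- day (1440 rows) from a cross-join of digit tables.
CREATE TABLE minutes (minute TIME NOT NULL PRIMARY KEY);

INSERT INTO minutes (minute)
SELECT SEC_TO_TIME(60 * (a.d + 10 * b.d + 100 * c.d + 1000 * e.d))
FROM (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
      UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
      UNION ALL SELECT 8 UNION ALL SELECT 9) a
CROSS JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
      UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
      UNION ALL SELECT 8 UNION ALL SELECT 9) b
CROSS JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
      UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
      UNION ALL SELECT 8 UNION ALL SELECT 9) c
CROSS JOIN (SELECT 0 AS d UNION ALL SELECT 1) e
WHERE a.d + 10 * b.d + 100 * c.d + 1000 * e.d < 1440;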

wolrah fucked around with this message at 01:26 on Feb 21, 2011

wolrah
May 8, 2006
what?

Thel posted:

What username are you using to connect to your SQLite database? Does it have permission to access those tables?

(I may be blowing smoke here since I have not the foggiest idea how security in SQLite works.)

SQLite databases are just files and there's no security to speak of; your filesystem permissions on the file are all there is.

Canine Blues, if you're using dotConnect as I think you are, the problem may be that you have "Data Source=" in your SQLiteConnection line rather than "DataSource="

wolrah fucked around with this message at 23:54 on Jul 9, 2012

wolrah
May 8, 2006
what?
I'm working on a pet-project disk-spanning pseudo-filesystem (think Greyhole or Windows Home Server) for which I'm currently using SQLite to keep track of which files are on which actual disks.

The issue I'm running into is that my main goal is to tolerate disk failure with as little impact as possible. The file-handling side of this is pretty easy: just don't show any files that only exist on a disk that's unavailable. My problem is on the database side. I don't want to depend on any data outside of the disks in the pool, so I currently have a "master" disk which contains the live SQLite DB, and the service simply copies this to the other drives when the pseudo-FS is unmounted.

Obviously this has major flaws if it's not unmounted cleanly, particularly if the master disk were to fail while mounted.
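(A possible partial mitigation, sketched here with made-up paths: snapshot the live DB to the other disks periodically instead of only at unmount, which SQLite can do safely against an open database.)

code:
-- Hedged sketch, paths invented. VACUUM INTO (SQLite 3.27+) writes a
-- consistent snapshot of the live database to another file:
VACUUM INTO '/mnt/disk2/.pool/catalog.db';
-- Older SQLite can do the same from the shell via the online backup API:
--   sqlite3 /mnt/disk1/.pool/catalog.db ".backup /mnt/disk2/.pool/catalog.db"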

It doesn't seem like there's any way to simultaneously write to multiple database files in SQLite, so is there another similar self-contained database which might be better suited to my use (I don't need much beyond basic SELECTs with a bit of JOIN going on)? Is there some third-party SQLite replication system I've been unable to find? Or is the overlap between those who need self-contained databases and those who need replication so small that there's just nothing there?

wolrah
May 8, 2006
what?
I don't see it mentioned often anywhere, but is anyone here familiar with Sybase? I have a client with a dental management program that runs on a Sybase Adaptive Server Anywhere 7.x backend, and let's just say that the application in question has earned a "Best of" DailyWTF entry for how badly it uses said database. http://thedailywtf.com/Articles/Classic-WTF-Rutherford,-Price,-Atkinson,-Strickland,-and-Associates-Dentistry,-Inc.aspx

We actually have the remote sites all connecting back to a pair of terminal servers to run this application "locally" at the main site thanks to this idiotic implementation, and don't even get me started on the hacky backup scripts needed to make it all work right.

Anyway, we're having odd load issues with the server where the database process's idle CPU usage never drops below 25% and performance suffers greatly, whereas under normal use it floats near zero. Disconnecting all clients has no immediate impact, though usually if left alone for 30-60 minutes it will suddenly fall back to zero. Restarting the DB server resolves the issue until it comes back again some days or weeks later.

The machine is a dedicated 2008R2 box with 16GB of RAM and a Xeon E5640, though ASA 7 is apparently 32-bit without large address awareness, so it never uses more than 2GB.

The application vendor has been less than helpful, so I'm looking for any resources that might help me try to at least determine what the database is doing when it gets stuck like this.


Sybase's public docs are not proving amazingly useful; they seem to be sort of like MSDN in that the information is probably there somewhere, but finding it is a challenge. Being apparently eight versions behind current doesn't help either, of course, as much of the documentation I can find is written for much newer code.
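One avenue that might be worth poking at is SQL Anywhere's connection-level system procedures, which can supposedly show what each connection is up to. I have no idea whether these names exist that far back in 7.x, so treat this as a guess rather than gospel:

code:
-- Hedged sketch; these procedure/option names come from newer SQL Anywhere
-- docs and may not all exist (or match) in ASA 7.x.
CALL sa_conn_info();          -- one row per connection: user, request type, last request time
CALL sa_conn_activity();      -- last statement prepared by each connection
-- Capturing last statements may require turning the server option on first:
CALL sa_server_option('RememberLastStatement', 'ON');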

wolrah
May 8, 2006
what?

MrMoo posted:

I want to say w3m and a website but I would hope there is something native that works with ncurses and is less retarded.

Though the w3m approach would mean you'd also have a web interface so it could still be useful when the users aren't on a terminal.

If you really want it to be terminal based for the long term, I've used this before with Python for some personal projects and it didn't suck: http://urwid.org/

I'm sure you can find some kind of similar library for your favorite programming language.

wolrah
May 8, 2006
what?
I'd go with the latter simply because you're then testing your backups as well. If the restore to Test doesn't work right you know you have a backup problem and need to act accordingly.

wolrah
May 8, 2006
what?

Kuule hain nussivan posted:

It looked fine while I was the most recent poster :(

Testing this theory. It shows six unread for me right now, none unread within the normal thread view, but the last six posts (which I've already seen a few dozen times) are showing as unread on the "post reply" page.

edit: Nope, still the same behavior.

wolrah
May 8, 2006
what?

Kuule hain nussivan posted:

It seemed to be a feature of the Awful app. When I post, it marks the thread as having no unread posts, but this changes back to 6 unread as soon as I refresh my bookmarks-/subforum-view. In short, still hosed.

Makes sense. I think the weird behavior I see on the post page is an artifact of the SALR Chrome extension and not actually something the forums are directly doing (though it's still obviously getting triggered by the forum bug).

wolrah
May 8, 2006
what?

kiwid posted:

I keep being told I should stop using MySQL and use a "real" database.

What do people mean by this? I always ask for more clarification and then conversation stops.

Like PHP, MySQL was a lot shittier in previous versions and that reputation lives on. AFAIK the largest problems these days are actually bad defaults: it's possible to configure it to be decent enough, but the default settings do a lot of insane things like silently dropping or modifying data.

wolrah
May 8, 2006
what?
You definitely don't want any situations where the client has the ability to modify someone else's score data. Many years ago Super Meat Boy did that and it didn't take long before that bit them in the rear end.

https://forums.somethingawful.com/showthread.php?noseen=0&threadid=2803713&pagenumber=258#post398884189

I'm not familiar with Firebase so I'm not sure what parts of the equation it may handle for you, but the way I'd do this would be to throw together a quick web API that sat between the database and the clients, with an endpoint for submitting high scores and another for retrieving the leaderboard. That app could then do whatever other housekeeping work you wanted it to do.
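To give a rough idea of what would sit behind those two endpoints (table and column names made up here; whatever Firebase gives you, the concept is the same):

code:
-- Hedged sketch with invented names: the table plus the two statements the
-- API endpoints would run. The client never touches these directly; the API
-- validates input and uses parameter placeholders, never string concatenation.
CREATE TABLE high_scores (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    player_name VARCHAR(32)  NOT NULL,
    score       INT UNSIGNED NOT NULL,
    submitted   DATETIME     NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- "submit score" endpoint, run server-side after validating the payload:
INSERT INTO high_scores (player_name, score) VALUES (?, ?);

-- "get leaderboard" endpoint:
SELECT player_name, score
FROM high_scores
ORDER BY score DESC
LIMIT 10;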

Rule #1 of internet-facing services: Never trust the client. Assume any safeguards you put in to client code will be defeated. Validate everything they send you and don't let them interact with anything they don't need to.

wolrah
May 8, 2006
what?

Shadow0 posted:

Web is even more terrifying to me than the database, I'm not sure I'm up to that task.

Whatever programming language you're most comfortable with, there is almost certainly a widely used framework or two available which handles the hard parts of dealing with HTTP and just lets you write your code surrounded by a bit of boilerplate. Since it sounds like you're treating this more as a toy for now and presumably don't want to spend money on running it, I'd look into any "serverless" options. Those basically abstract away all of the web parts, including the server operation, and let you just focus on the code.

quote:

Yeah, I definitely assumed that!
I don't know how I would go about making sure that scores sent over were valid (I know nothing about security [Actually, it seems like I might know nothing about anything]). Maybe whenever the value is increased naturally, I store the answer as a hash, then send that with the score and have the server hash the score and compare them? Does that sound right?
I'm pretty confident approximately 0 people will download my app, but I'm trying to get in good practice.

By validation I mean more ensuring the data is correctly formatted and isn't being fed directly into the database in a way that might allow injection attacks.

If the game is running entirely on the client, there's no way to actually ensure the scores are legitimate. Depending on your game's design there may be ways to sanity-check them (if a good high score should be ~30,000, a client submitting 4.2 billion is probably doing something shady), but unless the server is involved in running the game you pretty much have to trust the client as far as the score is concerned.

Shadow0 posted:

Stored procedures look interesting, thanks! I think that might work well. Firestore/Firebase doesn't seem to have exactly that, but it does have something called "Cloud Functions". It seems pretty similar to the web API idea. to the thing you said not to do... I'll keep looking around, haha. :negative: I think they run in the background though, so they are non-blocking. If I'm reading it correctly (I'm sure I'm not).
I guess I can expose a method like "sendScoreToServer", and then it fires off and does all its thing on the server. Seems like it'll be perfect. Then I'll just have to figure out how the security mess works and limit the database to only take that call and queries. Thanks for the help, guys!

Cloud Functions seem to be Google's take on the "serverless" idea I mentioned above. Use HTTP triggers for your "submit score" and "get leaderboard" functions that the client connects to; the trusted code you've set up on the Google platform then does the database interaction.

wolrah
May 8, 2006
what?
Let's Encrypt put up a blog post today about their new database servers, with some details on how they've configured ZFS and MariaDB to work together.

https://letsencrypt.org/2021/01/21/next-gen-database-servers.html

wolrah
May 8, 2006
what?

Just-In-Timeberlake posted:

We have a table with ~20 years of data in it that's approaching 9 million rows, and it's plenty fast to query with the right indexes in place.

9 million rows isn't exactly massive in the database world, especially if we're talking about 20 years of data. I work for a small VoIP provider; we generate a couple hundred thousand rows in our CDR table every month, and we're a fart in the breeze compared to our upstreams. When I set up our first CDR processing system 15 years ago I didn't even know what an index was, and it still ran fine on a random Pentium 4 desktop I had turned into a server until the tables got large enough to fall into swap.

  • Reply