|
Pvt. Public posted:I was recently told to build a wish list of changes for my office and one of them (in a long list) is to upgrade the vCenter cluster we run. Right now it runs two nodes (both active) and stores VMs on iSCSI LUNs held on a QNAP 459U. The first thing I am going to do is get rid of the lovely Trendnet 8-port GigE switch the cluster is using in favor of something else (potentially two switches in HA).

Buy a rackmount server with a bunch of internal bays and put ZFS on it. It won't do synchronous replication, but in most cases you don't really want that anyway, even if you think you do. And it will likely be a lot more stable than any cheap SAN running something proprietary.
|
# ? Aug 23, 2012 05:40 |
|
|
Thanks for the info, guys. That was pretty much what I found when looking into it further today. I will just be getting a second QNAP and drives as a closet cold spare.
|
# ? Aug 23, 2012 22:24 |
|
NetApp's DataMotion for Volumes is trash. I was all excited about it, finally got all of my clients to upgrade to ONTAP 8+, but after actually trying to use it, here are a few... catches:

1) Doesn't work with deduped volumes. Yes, seriously, after NetApp recommends that you turn on dedupe on EVERY volume, they pull this poo poo. I have heard Tom Georgens recommend multiple times to enable dedupe everywhere! No penalties! The volume will be deswizzled/reinflated on the destination, which means you will need to delete all of the snapshots to re-dedupe the volume at all.

2) Doesn't work on CIFS/NFS-exported volumes. So, no VMware or CIFS/SMB volumes. You have to move these the old-fashioned way, with plenty of downtime and SnapMirror.

3) Doesn't work on SnapVault secondaries (still needs to be tested). When combined with the "no dedupe" rule, this will make it essentially worthless on any backup filer.

So if you have an iSCSI-only volume that doesn't have dedupe enabled on your primary filer, well, now you have a good candidate. Unfortunately, with 99+% of my overall volumes being deduped and/or SnapVault destinations and/or CIFS/NFS, DataMotion is worthless. Man, I am so mad. NetApp really needs to get their poo poo together w/r/t deduplication. It seems like most of their poo poo simply does not work with dedupe, and yet they want it on everywhere.
|
# ? Aug 25, 2012 18:17 |
|
madsushi posted:VMware ... You have to move these the old fashioned way, with plenty of downtime Does not compute.
|
# ? Aug 25, 2012 18:36 |
|
Misogynist posted:Does not compute. Sorry, I should've said: "if you need to do a flash-cut, then you're stuck with SnapMirror and then a downtime-causing cut to the new volume." If you have:

*VMware Enterprise Plus licensing
*weeks/months to transfer your data (and wait for replication)
*twice the space available on your SnapVault destination

Then yeah, you can make a new source volume and use sMotion to move a few VMs into the new source, wait for replication via SnapVault, move a few more, repeat, then wait several weeks/months before deleting the old source/destination volumes, because you need to maintain those snapshots in case of a restore request.
|
# ? Aug 25, 2012 18:40 |
|
madsushi posted:Man, I am so mad. NetApp really needs to get their poo poo together w/r/t deduplication. It seems like most of their poo poo simply does not work with dedupe, and yet they want it on everywhere. Vol move works fine with de-duplicated volumes. It just doesn't work when dedupe is actively running on them. You need to stop any ongoing scans before moving the volumes. The fact that it only works with block protocols sucks. I haven't found any explanation for that but my guess is that they couldn't find an easy way to deal with stale file-handles since they basically have to un-export and re-export the filesystem as part of the cutover which NFS clients don't like. It's probably possible, but most resources are devoted to cluster-mode these days.
|
# ? Aug 26, 2012 18:38 |
|
NippleFloss posted:Vol move works fine with de-duplicated volumes. It just doesn't work when dedupe is actively running on them. You need to stop any ongoing scans before moving the volumes. I was basing the dedupe issue off of this quote:

DEDUPLICATION and COMPRESSION
*If deduplication is active on the source FlexVol volume, then it must be stopped for a successful cutover.
*DataMotion for Volumes does not move the fingerprint database and change logs of a deduplicated FlexVol volume. After the DataMotion for Volumes process is complete, users must execute sis start -s to rebuild the fingerprint DB at the destination.

"sis start -s" isn't going to work on a volume with snapshots, right? Or am I mistaken here?
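Taken together, the sequence that doc implies would look something like this in 7-Mode. This is only a sketch assembled from the quoted recommendations; the volume and aggregate names are placeholders, and the exact syntax should be checked against your ONTAP release:

```
# stop any active dedupe scan so the cutover can succeed
sis stop /vol/myvol

# perform the nondisruptive move to the destination aggregate
vol move start myvol dest_aggr

# after cutover: rebuild the fingerprint database on the destination
sis start -s /vol/myvol
```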
|
# ? Aug 26, 2012 19:11 |
|
madsushi posted:"sis start -s" isn't going to work on a volume with snapshots, right? Or am I mistaken here? Why wouldn't it?
|
# ? Aug 26, 2012 19:49 |
|
adorai posted:Why wouldn't it? sis can't touch blocks that are locked by snapshots, which is why you always have to run dedupe before your regular snapshots. Same for enabling dedupe on a volume with snapshots - dedupe can only touch new blocks, not the blocks already locked.
|
# ? Aug 26, 2012 20:43 |
|
madsushi posted:sis can't touch blocks that are locked by snapshots, which is why you always have to run dedupe before your regular snapshots. Same for enabling dedupe on a volume with snapshots - dedupe can only touch new blocks, not the blocks already locked.
|
# ? Aug 26, 2012 22:15 |
|
Okay, so my RAID 5 disappeared overnight last night. Not exactly sure what happened yet, still looking into it. I did just update my AMD drivers, which includes that RAIDXpert utility, so that may have something to do with it. Anyway, I don't have all my specs and such on hand, so I am asking this in a generic sense. If I need additional help, I'll make a thread in Haus of Tech Support.

1) What is the best tool to recreate the array without losing my data? I see a few tools via Google search, but I am wondering if there is a standard recommended way of doing this.

2) My controller is onboard, so I am thinking of getting a more legitimate controller card. What are my options here? I need something with at least 5 SATA III ports, can handle RAID 5, and can detect an existing array. This seems to be the front-runner: http://www.newegg.com/Product/Product.aspx?Item=N82E16816103220 Am I going to find anything better/cheaper/more effective?
|
# ? Aug 27, 2012 19:46 |
|
Some of you may remember my upgrade nightmare with my VNX5300 a few pages back. I'm happy to report that as of this morning, EMC was able to get both the file and block portions at the same version they need to be. The 5300 is now working as I expect it to and without spewing warnings. All it took was EMC releasing a new firmware to fix the issue that bit me in the rear end when I tried to do the upgrade myself. As a side note, the tech they sent was awesome but we had to do the upgrade via the command line instead of USM, which is currently broken for upgrades. So don't use USM to do upgrades for a while...
|
# ? Aug 27, 2012 20:05 |
|
Bea Nanner posted:Okay, so my RAID 5 disappeared overnight last night. Not exactly sure what happened yet, still looking into it. I did just update my AMD drivers, which includes that RAIDXpert utility, so that may have something to do with it. Anyway, I don't have all my specs and such on hand, so I am asking this in a generic sense. If I need additional help, I'll make a thread in Haus of Tech Support. Have you tried rolling the drivers back?
|
# ? Aug 27, 2012 20:10 |
|
Rhymenoserous posted:Have you tried rolling the drivers back? No, I haven't done much at the moment. I could do a Windows System Restore. It could be the new drivers, but I'm not totally sold on that. It may just be a controller failure. The drives are detected by Windows, but the RAID not so much. Also, there is a BIOS controller for the RAID, which makes me skeptical that the fix is within Windows. Do you really think a System Restore or something similar will make the RAID magically show up again? Oh, also: I saw some recommendations (via Google) that said "just make a new raid with the same name as the old raid." I could try this, but I'm scared it will just build a new blank array and all my poo poo will be wiped.
|
# ? Aug 27, 2012 20:27 |
|
You have good backups, right? No guts, no glory.
|
# ? Aug 27, 2012 20:36 |
|
For 9TBs? LOL. This IS the backup.
|
# ? Aug 27, 2012 20:38 |
|
You're probably not going to have an import feature with onboard RAID. If you reinitialize, your data is gone. I think you're probably hosed.
|
# ? Aug 27, 2012 20:42 |
|
Bea Nanner posted:for 9TBs? LOL. This IS the backup. RAID is not a backup for exactly this reason.
|
# ? Aug 27, 2012 20:47 |
|
'Arf a brick at it.
Rhymenoserous fucked around with this message at 20:53 on Aug 27, 2012 |
# ? Aug 27, 2012 20:50 |
|
Okay, so devil's advocate: I lose everything and I hate my onboard controller. What do I get instead?
|
# ? Aug 27, 2012 20:55 |
|
Bea Nanner posted:Okay, so devil's advocate: I lose everything and I hate my onboard controller. What do I get instead? Depends. Is your storage locally attached to a machine you use regularly for other things, or is it basically just network-attached storage?
|
# ? Aug 27, 2012 22:41 |
|
Bea Nanner posted:Okay, so devil's advocate: I lose everything and I hate my onboard controller. What do I get instead? What's it for, precious? Also how big is the bag of money.
|
# ? Aug 27, 2012 22:42 |
|
Misogynist posted:Depends. Is your storage locally attached to a machine you use regularly for other things, or is it basically just network-attached storage? evil_bunnY posted:What's it for, precious? Also how big is the bag of money. Currently it's local, so the easiest solution would be a PCI card, I think. I'm not against migrating the whole thing to an enclosure. Usage is primarily media storage and playback. A good bit of disk usage due to torrents, streaming, etc. Money... well... I can save up if necessary. The card I linked was around $500. Can I get a decent SATA III card for any less? Is that one good? I think an enclosure would be more. Closer to $1000 - $1500, right? So set me straight. What do I need for a good permanent solution for my storage so I don't have to worry about my controller ever again?
|
# ? Aug 28, 2012 00:21 |
|
I'm currently running this tool: http://www.freeraidrecovery.com/ Which has been going for about 24 hours (lots of data, so not too surprising) and hasn't turned anything up yet. It seems... sort of legit, so I figured it's worth a shot. Did I just Bonzi Buddy my HTPC?
|
# ? Aug 28, 2012 00:27 |
|
Yah, so basically that tool was worthless. I've been playing around with my setup and come to a few realizations. First, I probably should be posting in this thread?? http://forums.somethingawful.com/showthread.php?threadid=2801557 I was linked here from the A/V forum, so that's my excuse. You folks have been helpful so far, so unless you kick me out I will carry on. So now I have two options. I can try to rebuild the array exactly like it was with the existing controller (which seems perfectly fine and dandy despite deciding "you don't have a RAID anymore" the other day). This won't erase my data, right? I'm not convinced it will show back up, though. So I need to back it up or get some data out of this. I know the hard drives are fine. All the data is there. So how do I get it out? I have a spare 3TB drive (which I was going to add to the array), so I want to at least recover the hard-to-replace stuff. So what is the best way to recover this data? I think the best route is to 1) migrate all possible data using X data recovery tool, 2) buy X new controller card, and 3) recreate a new array and then migrate the recovered data back. Are my assumptions correct? I'm a bit in the dark here, so any help is appreciated.
|
# ? Aug 28, 2012 17:01 |
|
If you can re-create the array without zeroing the drives, you can then run recovery tools on the virtual device your controller presents to the OS. e: I have no idea if the controller would allow that though.
|
# ? Aug 28, 2012 17:26 |
|
Bea Nanner posted:I'm currently running this tool: http://www.freeraidrecovery.com/ This is a dumb question, but have you talked to your RAID vendor yet?
|
# ? Aug 28, 2012 17:34 |
|
Misogynist posted:This is a dumb question, but have you talked to your RAID vendor yet? It's onboard so...uhh....I think maybe it's an AMD chipset?
|
# ? Aug 28, 2012 18:21 |
|
Bea Nanner posted:It's onboard so...uhh....I think maybe it's an AMD chipset?
|
# ? Aug 28, 2012 18:35 |
|
I'm going to make this suggestion again: roll back to prior to the update that broke everything, then go get a real backup solution. Since this is all on a DAS, rolling back your drivers shouldn't affect anything on the DAS itself. RAID is not backup. If this server + DAS is holding "live" data, then you need another device to which you back up that live data, at the bare minimum. Not to beat up on you, but at one point you are saying "This is the backup," yet at others you are alluding that there is live data on this that would cause issues if it were to go missing. Which is it? Or is it both?
|
# ? Aug 29, 2012 16:17 |
|
I have an EqualLogic array with two members. One is a 6010 with 16 600GB 10K SAS drives, the other is a 6010 with 16 100GB SSD drives. Both are running RAID 50, and we have a single large volume on top of them for SQL stuff. (Yeah, we really should have split the SSD into its own pool, but we were told by Dell that autotiering would handle it. That's neither here nor there at this point.) Two Force10 S4810 10Gb switches are handling the iSCSI traffic, and we only have one host that uses the array (our SQL server). I also have constant complaints from management that the array is "not performing well". They're basing that entirely on the average latency numbers from SAN HQ. Here's what I'm seeing for the past day (this includes overnight spikes from index rebuilds and consistency checks):

General Information
Read/write %: <0.1%/99.9%
Est. I/O load: Low

Average I/O Size
Reads: 64 KB
Writes: 6.1 KB
Inbound Replication: 0 KB
Weighted Avg: 16 KB

Average IOPS
Reads: 117.5
Writes: 566.6
Inbound Replication: 0
Total: 684.1

Average Latency
Reads: 4.0 ms
Writes: 2.1 ms
Inbound Replication: 0 ms
Weighted Avg: 2.4 ms

Average I/O Rate
Reads: 7.3 MB/sec
Writes: 3.4 MB/sec
Inbound Replication: 0 KB/sec
Total: 10.7 MB/sec

Average Queue Depth: 3.0

Like I said, they're wigging out about latency. To me, an average latency of 2.4 ms isn't anything to be worried about. I believe our IOPS are low due to the fact that the SQL server has 256GB of RAM and doesn't have to read from disk often (most data should be in RAM; the database is only about 300GB and most of that is archive stuff that isn't touched). Seeing as this is the first time I've had a SAN, though, I don't have any experience as far as what point latency becomes "bad". The application is super responsive, no issues on that front. They're just worried about that number. Should I tell them to let it go, or do I actually have a problem?
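For what it's worth, those SAN HQ averages hang together arithmetically: the weighted figures fall out of the per-op numbers if the IOPS are used as the weights (my assumption about how SAN HQ computes them). A quick sanity check:

```python
# Sanity-check the SAN HQ averages quoted above, assuming the
# "Weighted Avg" rows are IOPS-weighted means of reads and writes.
read_iops, write_iops = 117.5, 566.6
read_lat_ms, write_lat_ms = 4.0, 2.1
read_kb, write_kb = 64.0, 6.1

total_iops = read_iops + write_iops  # 684.1, matching "Total"
w_lat = (read_iops * read_lat_ms + write_iops * write_lat_ms) / total_iops
w_size = (read_iops * read_kb + write_iops * write_kb) / total_iops
read_rate_mb = read_iops * read_kb / 1024
write_rate_mb = write_iops * write_kb / 1024

print(f"weighted latency:  {w_lat:.1f} ms")   # ~2.4 ms, as reported
print(f"weighted I/O size: {w_size:.1f} KB")  # ~16 KB, as reported
print(f"rates: {read_rate_mb:.1f} / {write_rate_mb:.1f} MB/sec")  # ~7.3 / 3.4
```

So the 2.4 ms headline number is dominated by the small writes, which are fast; the 64 KB reads at 4 ms are the slower half, and both are healthy.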
|
# ? Aug 29, 2012 20:07 |
|
Boogeyman posted:General Information Read/write %: <0.1%/99.9% Est. I/O load: Low Average IOPS Reads: 117.5 Writes: 566.6 Inbound Replication: 0 Total: 684.1

Are your reads minuscule or something? And who the gently caress cares about latency <10ms? Low average IOPS nobody cares about: it literally means nothing until the moving average (over, say, 10 secs) gets close to the array's capacity. An interesting metric would be to look at how much of your reads are coming from flash, and if it's low, whether that's caused by the EQL algos or your workload patterns. Unless your workload is highly heterogeneous over time, 700 IOPS is trivial. It's much more interesting to look at outliers, and try to figure out what's causing actual poo poo performance.

Boogeyman posted:the database is only about 300GB and most of that is archive stuff that isn't touched

Nukelear v.2 posted:Also who the hell pays for a rack of SSD for 684 IOPS.

evil_bunnY fucked around with this message at 20:36 on Aug 29, 2012
# ? Aug 29, 2012 20:27 |
|
Boogeyman posted:Should I tell them to let it go, or do I actually have a problem? I wouldn't worry about this at all. 4ms on a 64k read is good. Also, who the hell pays for a tray of SSD for 684 IOPS. Edit: drat you quoted me before I could fix rack->tray To give you an idea, I run EQL's hybrid SSD/10K array for my DB, and on sqlio benchmarks on 64k seq reads my avg latency is 12ms. You really need to actually put the thing under load to see if you have any real underlying issues. It's a 300G data set on 256G of RAM; of course it's hitting memory for reads. Doesn't even need to be an intelligently designed schema. To optimize SQL, here are the steps I'd take, in the order I'd take them, as you grow/people bitch:

1) Do nothing, cause holy hell Dell saw you coming
2) Make an SSD partition and put your SQL logs there
3) Figure out what indices matter on your problem queries, create new data files, and migrate your indices onto SSD (autotiering is going to do this for you, but in a delayed fashion; this will force fresh data onto SSDs)

VVV Nukelear v.2 fucked around with this message at 21:04 on Aug 29, 2012
# ? Aug 29, 2012 20:32 |
|
The database is growing exponentially; rather than wait until it outgrew the onboard storage on the R910, they decided to migrate everything to EqualLogic before we started having capacity and performance problems. Yeah, the IOPS are piddly poo poo right now, but they'll be increasing in the future. Peak IOPS now are around 2800 to 3000 or so. Thanks to good indexing and table compression, I'm pretty sure that most of the reads are just hitting RAM on the server and not having to hit the SAN. This will change as we accumulate more data. I'd like to split the pool into an SSD pool for highly accessed tables and a SAS pool for the rest of the data, but I didn't have a lot of luck with that when I first tested the arrays before putting them into production. It's been a while, but I think I had to drop the SSD member from the pool altogether, and it screwed a bunch of stuff up. Like I said, I was told to set it up this way by Dell: "make it all RAID 50 and autotiering will figure it out". I don't doubt that it will, but I'm pretty sure it needs to see some kind of workload before it can make those kinds of decisions. It's pretty much sitting there with its thumb up its butt right now. Anyways, thanks for the info, I'll tell them to quit worrying about it.
|
# ? Aug 29, 2012 20:54 |
|
Boogeyman posted:
If this is an MS SQL server you can verify this pretty trivially by looking at your cache hit ratio
|
# ? Aug 29, 2012 21:04 |
|
Syano posted:If this is an MS SQL server you can verify this pretty trivially by looking at your cache hit ratio I should have thought of that earlier (SQLServer:Buffer Manager - Buffer cache hit ratio, right?). I watched it for about five minutes and didn't see it drop below 99.99%. For even greater laughs, they're considering replacing the EqualLogic stuff with this to "solve the latency problem": http://www.violin-memory.com/products/6000-flash-memory-array/ I'm half tempted to tell them just to go ahead and do it if that's what they want.
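One wrinkle if you ever pull that number from SQL Server's DMVs instead of perfmon: my understanding is that the ratio is exposed as two raw counters, "Buffer cache hit ratio" and its companion "base" counter, and the percentage is their quotient. A minimal sketch (the sample values below are made up for illustration):

```python
# Sketch: compute the buffer cache hit ratio percentage from the two
# raw counter values SQL Server exposes (cntr_value of the ratio row
# divided by cntr_value of the "... base" row). Sample numbers are invented.
def hit_ratio_pct(ratio_counter: int, ratio_base: int) -> float:
    """Percentage of page requests served from the buffer pool."""
    if ratio_base == 0:
        return 0.0  # avoid dividing by zero right after instance start
    return 100.0 * ratio_counter / ratio_base

print(hit_ratio_pct(9999, 10000))  # → 99.99
```

Reading the "Buffer cache hit ratio" row alone, without the base, gives a meaningless raw number, which is why perfmon (which does the division for you) and naive DMV queries can disagree.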
|
# ? Aug 29, 2012 21:13 |
|
Boogeyman posted:I should have thought of that earlier (SQLServer:Buffer Manager - Buffer cache hit ratio, right?). I watched it for about five minutes and didn't see it drop below 99.99%. Unless they can actually quantify a problem caused by 4ms avg latency, this is officially the stupidest thing I've ever seen. If this isn't a SQL cluster, take your SQL logs back to local storage. Buy a new Dell with the PCI-E connected SSD drives and run your logs off that (or buy those PCI-E add-in boards for your current server). Anything that has to go outside your box is going to have latency that will be unacceptable to them. Edit: Basically it sounds like something is tweaked further up in your application stack and they are trying to scapegoat the storage. Nukelear v.2 fucked around with this message at 21:35 on Aug 29, 2012
# ? Aug 29, 2012 21:22 |
|
Nukelear v.2 posted:Edit: Basically it sounds like something is tweaked further up in your application stack and they are trying to scapegoat the storage. I can tell you exactly what that is. Every month we have to run a process that looks at about a billion records individually, does some stuff, then inserts a new copy of each of those records in another table. Moving from local disk to the SAN greatly increased the length of that process, and they're convinced that it's due to the latency. I agree with them to a point, it's a huge volume of very small reads and writes. When they first brought it up, I told them to stop doing it that way and to make an effort to work on larger batches. They're in the process of rewriting it now, and I'm assuming that if they select and update large batches of records instead of onesie-twosie like they're currently doing, it won't be such a big deal.
|
# ? Aug 29, 2012 21:47 |
|
Boogeyman posted:I can tell you exactly what that is. Every month we have to run a process that looks at about billion records individually, does some stuff, then inserts a new copy of each of those records in another table. Moving from local disk to the SAN greatly increased the length of that process, and they're convinced that it's due to the latency. I think we found your problem.
|
# ? Aug 29, 2012 22:03 |
|
|
Boogeyman posted:I should have thought of that earlier (SQLServer:Buffer Manager - Buffer cache hit ratio, right?). I watched it for about five minutes and didn't see it drop below 99.99%. Yeah dude, you're totally answering all your queries out of RAM.
|
# ? Aug 29, 2012 22:07 |