|
Okay, cool. I scrounged up a Cisco 2940. Let's see if that gets us through for the time being.
|
# ¿ May 20, 2010 21:34 |
|
oblomov posted:Can anyone recommend an open source distributed file system that can stretch across multiple jbod systems ala Google FS? I am looking for one that will stripe across different drives in this "storage cluster". I've looked at ParaScale, HDFS, OpenAFS, etc... HDFS seems the most promising out of the bunch, but target metrics of multitude huge files is not quite what I was looking for. Edit: There's also MogileFS.
|
# ¿ May 21, 2010 14:06 |
|
Syano posted:I am looking for something entry level. We have about 14 servers we want to virtualize and we need some flexibility in the device so we are able to grow with it and my vendors keep coming back with these out of the ballpark priced solutions. What do you guys suggest for a good entry level unit?

The big cost savings of virtualization is simple consolidation, and you can't determine how to maximize your consolidation ratio if you don't have the right numbers. From a storage perspective, we need to know what kind of IOPS (I/O operations per second) you're looking for, and what the basic performance profile of your applications is (random vs. sequential I/O, reads vs. writes, and where your bottlenecks are). We also need to know what kind of reliability you're looking for, what your plans are regarding replication and disaster recovery features, whether you need to boot from SAN, and how you plan to back all of this up. You need to think about these things from both a business and a technical perspective.

In terms of the raw numbers, here are the counters you need to look at (I'm assuming you're a Windows shop):

\PhysicalDisk(_Total)\Disk Transfers/sec
\PhysicalDisk(_Total)\Disk Bytes/sec
\PhysicalDisk(_Total)\Current Disk Queue Length

You should probably let those perfmon logs cook for about a week before you take a look at them and start figuring out specs. These numbers, by themselves, don't indicate what your real requirements are. You need to look at the bottlenecks in your application to make sure that your disk is really performing where it should be. For example, if your system is starved for memory, you may be paging, or your database servers may be hitting disk a lot more than they need to because they don't have the memory they need for a query cache.

Your best bet is to post over in the Virtualization Megathread, and I'll help you out with getting all the other numbers you need. 
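Purely as an illustration of what to do with those perfmon numbers once they've cooked for a week, here's a hedged Python sketch. The sample values and the 30% growth headroom are made up for illustration, not a sizing standard; the point is to size against a high percentile rather than the absolute peak, which is often just your backup window.

```python
import math

def percentile(samples, pct):
    """Return the pct-th percentile (nearest-rank) of a list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # nearest-rank method
    return ordered[rank - 1]

def sizing_target(transfers_per_sec, headroom=1.3):
    """Size for the 95th-percentile load plus ~30% growth headroom
    (both numbers are illustrative assumptions, not a standard)."""
    p95 = percentile(transfers_per_sec, 95)
    return round(p95 * headroom)

# made-up week of Disk Transfers/sec samples
samples = [220, 340, 190, 810, 450, 300, 275, 990, 410, 360]
print(sizing_target(samples))
```

Feed it the real exported counter log instead of a hand-typed list, obviously.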
Vulture Culture fucked around with this message at 15:07 on May 21, 2010 |
# ¿ May 21, 2010 15:03 |
|
I'm running Nagios, and NSClient++ on our Windows servers. Most are QLogic HBAs with SANsurfer, some are Emulex with HBAnyware. How do I monitor Windows MPIO to make sure each server sees the proper number of paths to its configured storage? (It should be 4 paths per LU: 1 active and 1 passive on each of the 2 connected fabrics.) Bonus points if you can answer this for ESX, Linux, AIX or Solaris.
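For reference, this is the shape of the Nagios check I have in mind — a sketch only, since the platform-specific part (scraping the live path count out of something like `mpclaim -s -d` on Windows or `multipath -ll` on Linux) is assumed and left out:

```python
# Nagios-style path-count check skeleton. Getting `seen` is the
# platform-specific bit that's omitted here.
OK, WARNING, CRITICAL = 0, 1, 2

def check_paths(seen, expected=4):
    """Return (exit_code, message) in the usual Nagios convention:
    all paths present -> OK, degraded but alive -> WARNING, none -> CRITICAL."""
    if seen == expected:
        return OK, "OK - %d/%d paths" % (seen, expected)
    if seen > 0:
        return WARNING, "WARNING - %d/%d paths" % (seen, expected)
    return CRITICAL, "CRITICAL - 0/%d paths" % expected

print(check_paths(4))
print(check_paths(2))
```

The WARNING-on-degraded behavior is the important part: a server limping along on one path works fine right up until the next fabric event.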
|
# ¿ May 24, 2010 22:35 |
|
adorai posted:I don't know what crashplan is, but windows server can replicate data all on its own, transferring only block level changes (and looking for already existing blocks it can copy before putting it on the wire) to keep any number of file shares in sync.

soj89 posted:Bottom line: What's the best RAID type to put in place? What about the controller type? The more I've been reading, it seems like Raid 1+0 is preferred over Raid 5 in any case. Would an i7 quad with 8 gb be overkill for the main file server? Especially since it's not particularly i/o intensive.

These things generally don't matter too much performance-wise for small bursts if you have a battery-backed controller, because the controller often has enough cache to soak up the whole write long before it gets put to disk. Wikipedia actually has a really nice writeup on the performance characteristics of different RAID levels: http://en.wikipedia.org/wiki/Standard_RAID_levels My tl;dr version:
An interesting property of striped RAID arrays is that if you want a normalized stripe width, you need to carefully pick your number of disks, generally to 4N for RAID-0, 4N+1 for RAID-5 and 4N+2 for RAID-6. Generally, performance isn't important enough for you to be anal-retentive to this level. Vulture Culture fucked around with this message at 04:48 on Jun 1, 2010 |
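That 4N rule is easy enough to sanity-check in a couple of lines — a sketch of the arithmetic, nothing more:

```python
# Data spindles per stripe for a given disk count and RAID level.
# "Normalized" here means the data stripe width is a multiple of 4,
# which is what the 4N / 4N+1 / 4N+2 disk counts buy you.
PARITY = {"raid0": 0, "raid5": 1, "raid6": 2}

def data_spindles(disks, level):
    return disks - PARITY[level]

def normalized(disks, level):
    n = data_spindles(disks, level)
    return n > 0 and n % 4 == 0

print(normalized(8, "raid0"))   # 8 data disks
print(normalized(9, "raid5"))   # 8 data + 1 parity
print(normalized(10, "raid6"))  # 8 data + 2 parity
print(normalized(7, "raid5"))   # 6 data disks -> not normalized
```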
# ¿ Jun 1, 2010 04:06 |
|
Vanilla posted:Read IO is the same for R1 and R6 if you're looking at the same number of drives.

In a RAID-1 (or RAID-0+1, etc.) array, if you're doing a large sequential read, the heads on both disks in the mirror set are in the same place (and passing over the same blocks), so you get zero speed benefit out of the mirror set even though you're technically reading from both disks at once. Versus RAID-0, your throughput is actually cut in half. For large sequential reads, your performance will almost always be better on an array without duplicated data, because you get to actually utilize more spindles.

With RAID-5/6, you do lose a spindle or two on each stripe (though, unlike RAID-4, not always the same spindle) because you aren't reading real data off of the disks containing the parity information for a given stripe. This implies that for random workloads with a lot of seeks, RAID-0+1 will give you better IOPS. RAID-5/6 for read throughput, RAID-0+1 for read IOPS.

Vulture Culture fucked around with this message at 07:45 on Jun 1, 2010 |
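A toy model of the large-sequential-read argument, assuming n spindles each streaming at rate s — mirrored copies add nothing because the heads overlap, and parity segments are skipped-over garbage during the read:

```python
def seq_read_throughput(n, s, level):
    """Idealized large-sequential-read rate for n spindles at s MB/s each."""
    if level == "raid0":
        return n * s
    if level in ("raid1", "raid10"):
        return (n // 2) * s   # overlapping heads: only half the spindles count
    if level == "raid5":
        return (n - 1) * s    # one segment per stripe is parity
    if level == "raid6":
        return (n - 2) * s    # two segments per stripe are parity
    raise ValueError(level)

for lvl in ("raid0", "raid10", "raid5", "raid6"):
    print(lvl, seq_read_throughput(8, 100, lvl))
```

With 8 spindles at 100 MB/s each, RAID-10 comes out at exactly half of RAID-0, with RAID-5/6 in between — which is the whole point.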
# ¿ Jun 1, 2010 07:41 |
|
Vanilla posted:It's a file server, so the IO profile is likely going to be highly random read, cache miss. The caveat being that it may be some kind of specific file server (CAD, Images) dealing with huge file sizes and then a boost in sequential read would be beneficial as you note.

Keep in mind, though, that cache hit/miss is also going to be heavily influenced by the behavior of cache prefetch on the array, which can be substantial on large files. Also consider that for small uncached reads, major portions of the MFT/inode table/whatever are probably going to be cached anyway (though the OS will probably handle plenty of this before the requests hit the storage subsystem again).

Vanilla posted:I do disagree that you lose a spindle or two with each stripe, the parity is distributed and does not consume whole drives - all drives participate in the random read operations without penalty and worst case random read IO for a 10k spindle - about 130 IOPS. However, I may be totally wrong here because i'm not factoring in the data stripe itself but have never seen it as a factor in IOPS comparison calcs!

Determining streaming throughput is very linear -- your factors are the number of spindles, the rotational speed of the spindles, and the areal density of the data on the platters. When you're reading from a spinning disk and every Nth segment doesn't contain usable data, that knocks your streaming rate down to (N-1)/N of the maximum, because you're throwing away a good chunk of the data as garbage. For visual reference, just look at a diagram like this one: http://www.accs.com/p_and_p/RAID/images/RAID5Segments.gif Ignore the array as an array, and just focus on how the data is laid out on a single spindle. You have six segments there, and you're only reading data from five of them. It's going to be slower than if you were reading six pieces of data off the same amount of disk.

The reason this doesn't matter in IOPS calculations is that your main bottleneck becomes how fast you can seek, not how fast you can stream data off the disk, and since the parity disk is idle at that particular moment, it can move the head to where it's needed next and participate in another read. Anyway, I didn't mean to go off on too big a tangent, because like I said, this is sort of a boundary case that rarely actually matters in real life. The RAID-1+0 sequential read bit is the more important one, because a lot of people don't consider the overlapping-head issue and it can mean losing up to half your read throughput.

Vulture Culture fucked around with this message at 09:32 on Jun 1, 2010 |
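The seek-bound case boils down to trivial arithmetic: every spindle can be servicing a different small random read at once, so parity placement stops mattering and array IOPS is roughly spindle count times per-spindle IOPS. A sketch using the 130 IOPS worst-case figure for a 10k drive quoted above:

```python
def random_read_iops(spindles, per_spindle_iops=130):
    """Rough seek-bound random-read ceiling: every spindle seeks
    independently, so parity layout doesn't enter into it.
    130 IOPS/spindle is the worst-case 10k-drive figure from the post."""
    return spindles * per_spindle_iops

print(random_read_iops(8))
```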
# ¿ Jun 1, 2010 09:18 |
|
Hey, any other IBM guys in here have issues with Midrange Storage Series SANs just hanging [incomplete] remote mirror replication until it's manually suspended and restarted? I had one LUN sitting at 10 hours remaining for about 9 days until I finally just kicked it and I'd like to know if it's a unique problem before I call IBM to open a case.
|
# ¿ Jun 8, 2010 19:18 |
|
Oh man, am I ever regretting agreeing to work on IBM SAN gear. Should have taken the job at the EMC shop instead. I think "storage manager client deletes a completely different array than the one I clicked" is sort of a bad thing.
|
# ¿ Jun 23, 2010 21:07 |
|
Nukelear v.2 posted:This thread needs a bump.
|
# ¿ Jun 24, 2010 07:47 |
|
I got to see what happens when an IBM SAN gets unplugged in the middle of production hours today, thanks to a bad controller and a SAN head design that really doesn't work well with narrow racks. (Nothing, if you plug it right back in. It's battery-backed and completely skips the re-initialization process. Because of this incidental behavior, I still have a job.)
|
# ¿ Jun 29, 2010 05:40 |
|
Nomex posted:If it's a DS4xxx unit I would schedule a maintenance window so you can power it off and reboot it properly. DS units are touchy, and you might see some glitches down the road.

I also have a call open with engineering asking why the gently caress the controllers on a $250,000 SAN are allowed to ever be out of sync with one another for any reason, but at least because of the battery-backed memory I'm not helping our Exchange admin rebuild corrupt mailboxes while I do it!

Vulture Culture fucked around with this message at 17:21 on Jun 30, 2010 |
# ¿ Jun 30, 2010 17:18 |
|
Nomex posted:You should see what happens when you turn one on in the wrong order. I hate IBM DS equipment with the fury of a thousand suns.
|
# ¿ Jun 30, 2010 19:10 |
|
So it looks like Enhanced Remote Mirroring on the IBM DS4000/5000 doesn't actually replicate any LUNs bigger than 2TB, even though it lets you configure the relationship, runs it and then tells you that it's completely synchronized. Also, this limitation isn't documented anywhere. Hope this doesn't burn anyone else!
|
# ¿ Jul 15, 2010 16:53 |
|
AmericanCitizen posted:We have three of these:
|
# ¿ Jul 16, 2010 04:54 |
|
IBM SAN guys: I figured out IBM's mostly-undocumented recovery procedure for an accidentally-deleted LUN. Is there any way to make recover logicalDrive (or equivalent) preserve the old NAA ID of the deleted LUN? I'm paranoid about resignaturing in VMware, especially when boot LUNs get involved.
Vulture Culture fucked around with this message at 06:24 on Jul 28, 2010 |
# ¿ Jul 27, 2010 06:08 |
|
Are the iSCSI target and initiator on the same subnet? Can you put them on the same subnet? Are you using jumbo frames? Does it work any better if you fall back to a 1500-byte MTU?

Do your performance problems go away if you dismantle the port channel for testing? (The port channel probably isn't doing you any good whatsoever, because port channels only distribute traffic from different hosts, and can't saturate more than a single link with traffic from a single other host. You can round-robin your traffic to the Oracle server (writes), but you'd probably get better application performance dedicating one NIC to the iSCSI initiator and one to Oracle connections from applications.)

Since you're running from a software iSCSI initiator, do you see anything funny going on (e.g. lots of retransmits) if you run Wireshark as you test?

Vulture Culture fucked around with this message at 03:52 on Jul 29, 2010 |
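To illustrate the port-channel point: typical EtherChannel load balancing hashes the (source, destination) address pair, so every frame of a single flow lands on the same member link. A sketch — `crc32` just stands in for whatever deterministic hash the switch actually uses, and the MAC strings are made up:

```python
import zlib

def member_link(src_mac, dst_mac, n_links):
    # crc32 stands in for the switch's deterministic address-pair hash
    return zlib.crc32((src_mac + dst_mac).encode()) % n_links

# one initiator/target pair: 1000 "frames", always the same member link
picks = {member_link("00:11:22:33:44:55", "aa:bb:cc:dd:ee:ff", 2)
         for _ in range(1000)}
print(len(picks))

# many different hosts are what actually spread across the members
hosts = ["02:00:00:00:00:%02x" % i for i in range(8)]
print({member_link(h, "aa:bb:cc:dd:ee:ff", 2) for h in hosts})
```

One flow, one link — which is why the channel can't get a single iSCSI session past 1 Gb/s.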
# ¿ Jul 29, 2010 03:42 |
|
adorai posted:verify that your LUNs are properly aligned.

Probably a good thing to check just for due diligence, though. Besides, it never hurts for guys interacting with SANs to actually understand segments, stripes and alignment.
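The alignment check itself is one modulo operation — a quick sketch, with a 64 KB segment size assumed purely for illustration:

```python
def is_aligned(offset_bytes, segment_bytes=65536):
    """A partition offset that isn't a multiple of the array's segment
    size makes many I/Os straddle two segments (hence two back-end
    operations where one would do)."""
    return offset_bytes % segment_bytes == 0

# the classic offender: old Windows partitions start at sector 63
print(is_aligned(63 * 512))    # 32256-byte offset -> misaligned
print(is_aligned(2048 * 512))  # 1 MiB boundary -> aligned
```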
|
# ¿ Jul 29, 2010 03:57 |
|
Anyone know the standard log write size for Exchange 2010, perchance? I know the DB page size was increased to 32k, which actually makes it possible to run on RAID-5 in our environment, but I can't find anything about the transaction logs.
|
# ¿ Aug 3, 2010 16:07 |
|
cosmin posted:regarding the NICs discussion, is anything wrong with using multiport NICs (apart from availability issues) or is a single port NIC the best solution?

Theoretically you could have issues with crap cards whose ToE implementations can't keep up with 4Gb being pushed through in round-robin or something, but I've never run into anything like that.
|
# ¿ Aug 21, 2010 23:31 |
|
Eyecannon posted:Building a new fileserver, would like to use ZFS, but I would rather use linux than solaris... is this legit now?: http://github.com/behlendorf/zfs/wiki
|
# ¿ Aug 26, 2010 15:56 |
|
Bluecobra posted:What's wrong with x86 Solaris 10?
|
# ¿ Aug 26, 2010 19:09 |
|
Keep in mind that there may be a substantial performance penalty associated with the snapshot depending on how intensive, and how latency-sensitive, your database workload is. Generally, snapshots are intended for fast testing/rollback or for hot backup and should be deleted quickly; don't rely on the snapshots themselves sticking around as part of your backup strategy. The performance penalty scales with the size of the snapshot delta.
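A toy copy-on-write model makes the penalty concrete (note this models COW arrays; redirect-on-write implementations behave differently): the first write to any block after a snapshot pays an extra read+write to preserve the old data, so overhead tracks the snapshot delta rather than total write volume.

```python
def backend_ops(writes, snapshot_active):
    """writes: iterable of block numbers written. Returns back-end op
    count: 1 op per write, plus 2 extra (read old block + write it to
    the snapshot area) for each block's first post-snapshot write."""
    copied = set()
    ops = 0
    for block in writes:
        ops += 1
        if snapshot_active and block not in copied:
            ops += 2
            copied.add(block)
    return ops

workload = [1, 2, 1, 3, 1, 2]        # 6 writes touching 3 unique blocks
print(backend_ops(workload, False))  # no snapshot: 6 ops
print(backend_ops(workload, True))   # snapshot: 6 + 3*2 = 12 ops
```

Delete the snapshot and the delta (and the penalty) goes away, which is exactly why you don't let them linger.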
|
# ¿ Aug 31, 2010 23:49 |
|
Syano posted:So talk to me a bit more in depth about hardware based snapshots then. I am guessing based on what you guys are saying that my best use of them as a backup tool would be to take a snapshot, mount that snapshot to a backup server, run my backup out of band from the snapshot, then delete the snapshot when that is done so as not to incur a ton of overhead? That sound about right?
|
# ¿ Sep 1, 2010 14:12 |
|
paperchaseguy posted:XIV does this, but at the block level. XIV snapshots are incredibly easy to work with.
|
# ¿ Sep 3, 2010 13:36 |
|
adorai posted:there is likely zero benefit to defragging a san.
|
# ¿ Sep 3, 2010 20:35 |
|
Would any of you IBM guys happen to be able to come up with a reason why every morning at 2:03 AM, my ESX servers flag a SCSI bus reset from our DS5100 that happens just long enough to gently caress up my Exchange cluster and bring every single mailbox server in the DAG offline? Every once in a while, but not all the time, this is also enough to upset replication to the storage head in question from our DS4800 on the same fabric. I'm just about to place a case, but I figured I'd ask here first in case anyone knew offhand what might be special about 2 AM on these devices.

Vulture Culture fucked around with this message at 22:38 on Sep 8, 2010 |
# ¿ Sep 8, 2010 22:34 |
|
H110Hawk posted:Had BlueArc gotten any better than the steaming pile of poo poo they were in the Titan 2000 days? Apparently our sales guy was super shady and eventually got fired. I met the new guy and he said most of his time is spent cleaning up the mess the old guy made.

Adding a dose of reality to that, only half of our problems were likely caused by the sales guy; the other half were due to poor hardware. It's not uncommon for vendors to resell kit, especially storage kit, from other vendors. Almost all of the major vendors resell something or other made by LSI. In most of these cases, the vendors buy and rebrand, and have access to the firmware code to investigate and fix bugs and forward their bugs and fixes upstream. BlueArc doesn't take responsibility for the products that they sell. Instead, they go "oh, it's an LSI problem" and sit on their hands for a few months while we have production issues with devices in a permanent error state. I don't know if the firmware licenses are really expensive and that's why their solutions are so cheap, or if they just don't have anyone on staff familiar enough with the products they sell to actually maintain the firmware, but I definitely haven't been thrilled with the whole product support experience.

We used to have great experiences with their support, but our main support guy moved down to Philly and now we only hear from him when things go really, really wrong. We had someone last week come in to set up replication between our Titan cluster and the Mercury head at our DR site, and he somehow ended up knocking out one of our Titans used by half of our HPC cluster during production hours, presumably by having butterfingers with the FC cabling on the backend. He then denied that anything happened, which is perfectly plausible from a factual standpoint, but the fact that this thing just happened to go down when he was fiddling around behind the rack was a little bit too convenient a coincidence for my tastes. 
My overall recommendation is that BlueArc is remarkably cost-effective if you need a reasonably inexpensive vendor for scalable storage because you have huge storage requirements (e.g. you're in multimedia production or life sciences and keep huge amounts of data online basically forever) but you should be absolutely aware of what you're getting into before you ink a contract, because they definitely do not have the reliability track record of a tier-1 storage vendor like EMC.
|
# ¿ Sep 9, 2010 12:15 |
|
Intraveinous posted:Could you expand at all on your ill-fated Xyratex adventures? They seem to be the OEM for just about everything that's not LSI made or small/big enough to be made in house, but I keep hearing about problems with their stuff.
|
# ¿ Sep 17, 2010 20:02 |
|
And hey, while I'm here, can someone explain to me how long-distance ISLs (~3km) are supposed to be configured on Brocade 5000/300 switches? Obviously I didn't set something up right on our longer pair of campus ISLs, because I started anal-retentively monitoring SNMP counters today and I'm noticing a crapload of swFCPortNoTxCredits on those ports.
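For anyone else chasing swFCPortNoTxCredits on a long ISL, the back-of-the-envelope arithmetic behind buffer-to-buffer credit sizing is roughly one credit per 2 km at 1 Gb/s, scaling linearly with link speed, for full-size FC frames. A hedged sketch — the 1.5x margin for smaller-than-full-size frames is my own assumption, and Brocade's portcfglongdistance modes are the authoritative knob, not this:

```python
import math

def min_bb_credits(distance_km, speed_gbps, margin=1.5):
    """Rule-of-thumb minimum BB credits to keep a long link full:
    distance_km * speed_gbps / 2 for full-size frames, padded by
    `margin` (an assumed fudge factor) for smaller frames."""
    base = distance_km * speed_gbps / 2
    return math.ceil(base * margin)

print(min_bb_credits(3, 4))  # the ~3 km ISL from the post at 4 Gb/s
```

If the port is configured in a mode that only grants the default handful of credits, the link stalls waiting for R_RDYs exactly the way those counters suggest.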
|
# ¿ Sep 17, 2010 20:03 |
|
sanchez posted:"An issue exists on HP StorageWorks 2000 Modular Smart Array products running firmware versions J200P46, J210P22, or J300P22 that will eventually cause controller configuration information to be lost, with subsequent loss of management capability from that controller. Array management, event messaging, and logging will cease functioning, but host I/O will continue to operate normally. This issue affects the ability to manage the array from the affected controller only; if a partner controller is available, the array can be managed through the partner controller. Because configuration information is stored in non-volatile memory, resetting or powering off the controller will not clear this error. If the issue occurs, the controller must be replaced. This failure mode is time sensitive and HP recommends immediately upgrading firmware on all MSA2000 controllers. This is not a hardware issue and proactive replacement of a controller is not a solution. To avoid this condition, you must upgrade your controller to the latest version of firmware."
|
# ¿ Sep 20, 2010 15:31 |
|
Syano posted:That post scares the hell out of me. We are virtualizing our entire network at the moment all on top of a Dell MD3200i. Now when I say our entire network I am really just talking 15ish servers, give or take. But hearing a story of a SAN go tits up even with 'redundant' options gives me nightmares.

Lots of people use great resiliency features as an excuse to be lazy and stupid with everything else. I can't even wait to see the havoc that Exchange 2010 wreaks on the most retarded of administrators.

H110Hawk posted:Redundant is another word for money spent to watch multiple things fail simultaneously.
|
# ¿ Sep 21, 2010 13:10 |
|
adorai posted:I hope you aren't implying that lazy admins will use replication as an excuse to not backup their data.
|
# ¿ Sep 21, 2010 13:31 |
|
Is there anywhere I can find, in plain English, the real-life implications, causes, and resolutions of common Brocade port errors like swFCPortTooManyRdys and swFCPortRxBadOs?
|
# ¿ Sep 21, 2010 15:56 |
|
ghostinmyshell posted:Brocade usually says replace or upgrade the firmware

Cultural Imperial posted:Is anyone out there looking at 10GbE?

Vulture Culture fucked around with this message at 22:43 on Sep 21, 2010 |
# ¿ Sep 21, 2010 22:33 |
|
oblomov posted:On Exchange 2010, we are going to go no backup route. However, we'll have 3 DAG copies, including 1 lagged DAG over wan to remote datacenter, and we have redundant message archiving infrastructure as well.

MS is also not doing backups on Exchange 2010 internally, and they don't even have lagged copies.
|
# ¿ Sep 22, 2010 02:22 |
|
oblomov posted:I figure with new and improved I/O handling for 2010, we could get away running this off SATA SAN, either NetApp or Equallogic. We'll see how our testing goes. I am not sure that virtuals will turn out to be cheaper, considering SAN costs, increased license costs (smaller virtual servers vs. larger hardware ones) and vmware costs, but we'll be doing the number crunching. Anyway, enough derailing the thread.

Right now we're running on FC because we had literally terabytes of spare FC capacity just sitting here, but I don't really see any compelling reason why we couldn't run on SATA with the I/O numbers we're pulling off the SAN.
|
# ¿ Sep 22, 2010 12:52 |
|
Cultural Imperial posted:Reading all these posts about people aggregating 6+ ethernet ports together, I was curious if anyone had thought about using 10GbE instead.
|
# ¿ Sep 22, 2010 20:19 |
|
adorai posted:It's not just about speed, it's also about redundancy and network segregation. I am not going to run iSCSI, management, and guest networking on one link, even if it is 10GbE, because I don't want a vmotion to suck up all of my iscsi bandwidth.

http://blog.aarondelp.com/2010/09/keeping-vmotion-tiger-in-10gb-cage-part.html
http://www.mseanmcgee.com/2010/09/great-minds-think-alike-%E2%80%93-cisco-and-vmware-agree-on-sharing-vs-limiting/
http://bradhedlund.com/2010/09/15/vmware-10ge-qos-designs-cisco-ucs-nexus/

Putting aside the concerns about link redundancy, it certainly seems in the virtualization spirit to consolidate links and let the bandwidth be partitioned out as needed, letting software handle the boundaries dynamically, rather than installing dramatically underutilized hardware to enforce resource boundaries.
|
# ¿ Sep 22, 2010 23:16 |
|
1000101 posted:I think with 10gigE that best practice will gradually phase out and be replaced with logical separation and traffic shaping. It takes a LOT of work to saturate a 10 gig link.
|
# ¿ Sep 26, 2010 22:34 |