|
Moey posted:So their storage and server networks are not segregated? I'm so happy I'm not the only one that thought that. For a second I just assumed that I was the stupid one. It may still be true but at least I'm not alone.
|
# ? Oct 17, 2012 20:43 |
|
|
Physically no, just logically. E: Just changed the default route on my lab; I realized it had a route and could ping my management interface. I disabled it and can still R/W to the datastores. It has to be doing something at the host level. Dilbert As FUCK fucked around with this message at 20:49 on Oct 17, 2012 |
# ? Oct 17, 2012 20:45 |
|
I think it just automagically routes iscsi traffic through whatever vmk portgroup(s) is(are) in the same subnet as the iscsi target. It's probably pretty common for people who don't have MPIO set up, and just have the one vmk portgroup handling vmotion/iscsi/ftlogging, since it Just Works without binding a vmk portgroup to the swiscsi initiator manually.
|
# ? Oct 17, 2012 20:54 |
|
Mierdaan posted:I think it just automagically routes iscsi traffic through whatever vmk portgroup(s) is(are) in the same subnet as the iscsi target. It's probably pretty common for people who don't have MPIO set up, and just have the one vmk portgroup handling vmotion/iscsi/ftlogging, since it Just Works without binding a vmk portgroup to the swiscsi initiator manually. That was what I was thinking: it sees the VSS, the VMK, and the subnet, so it just throws it onto that network. Either way their SAN/HBAs are getting replaced; this thing is on its last legs, and that came from the top IT guy to me.
|
# ? Oct 17, 2012 21:05 |
|
Corvettefisher posted:Physically no, just logically.
|
# ? Oct 18, 2012 01:46 |
|
On the topic of vmkernel portgroups and storage without setting up iSCSI port binding: ESXi uses the first vmkernel interface on the same IP subnet as the target storage. If you had vmk0 on 192.168.0.0/24 for management and vmk1 on 172.16.0.0/16 for iSCSI connectivity, with 172.16.x.x being your target/portal IP, it'll be talking over vmk1 even without port binding. On another note, VLAN tagging is not taken into account by ESXi's routing table. So with a configuration like vmk0 on 192.168.0.0/24 in VLAN 5 and vmk1 on 192.168.0.0/24 in VLAN 10, both are seen as being in the same IP network as far as the routing table is concerned. This can be a problem, and confusing for some, especially if vmk0 and vmk1 talk to different switches/networks/hardware. Another way to think about it is that the vmkernel portgroups aren't responsible for tagging; the vSwitch or Distributed vSwitch is, and that's a different layer.
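A toy sketch of the interface-selection logic described above (illustrative only; `pick_vmk` and the vmk list are made up for this example, this is not VMware's actual implementation):

```python
import ipaddress

def pick_vmk(vmks, target_ip):
    """Return the first vmkernel interface whose subnet contains the
    target IP (mimicking ESXi's on-link interface selection), or None
    if the target is off-link and traffic would have to route."""
    target = ipaddress.ip_address(target_ip)
    for name, cidr in vmks:
        if target in ipaddress.ip_network(cidr):
            return name
    return None

vmks = [("vmk0", "192.168.0.0/24"),   # management
        ("vmk1", "172.16.0.0/16")]    # iSCSI

print(pick_vmk(vmks, "172.16.4.20"))  # vmk1 (on-link, used even without port binding)
print(pick_vmk(vmks, "10.9.9.9"))     # None (would go via the default gateway)
```

Note that this model, like ESXi's routing table, knows nothing about VLANs: two vmks on the same subnet in different VLANs look identical here, which is exactly the confusion described above.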
|
# ? Oct 18, 2012 02:59 |
|
iSCSI traffic will try going out any kernel interface that is within the subnet mask of the target or (god help you) it will route to get there. So assuming your management network is something like 192.168.1.x/24 and the storage is 192.168.2.x/24, but they're all on the same physical network and the gateway can route the packets between the two, then it is entirely possible that you're bouncing your iSCSI traffic through a route hop. That's a bad thing and you don't want to do it.
|
# ? Oct 18, 2012 19:06 |
|
BangersInMyKnickers posted:iSCSI traffic will try going out any kernel interface that is within the subnet mask of the target or (god help you) it will route to get there. So assuming your management network is something like 192.168.1.x/24 and the storage is 192.168.2.x/24, but they're all on the same physical network and the gateway can route the packets between the two, then it is entirely possible that you're bouncing your iSCSI traffic through a route hop. That's a bad thing and you don't want to do it. Yeah, that is what I found when I was looking into it last night; we are changing it either way.
|
# ? Oct 18, 2012 19:21 |
|
Has anyone ever seen a Windows CIFS copy knock down a host NIC before? One of our developers did a big copy (26k files, 1.5 GB) from one VM web server to another across hosts. Within a few seconds the copy failed and nearly every VM on the destination VM's host lost network for 5 minutes. The host has an entry: 'Uplink vmnic0 has recovered from a transient failure due to a watchdog timeout'. The funny thing is that if I change targets around, wherever I send the copy, the target VM's host interface dies, so it wouldn't just be a single failing NIC. As one other datapoint, we have mirrored hardware at our DR site and I can perform the copy there just fine. The only difference I can see is that the problem site uses vmxnet3 and our DR site uses E1000. Edit: Using ESX 5. Both hosts use teamed active NICs with the default source port balancing policy. These are also not our iSCSI NICs, so the increased iSCSI load shouldn't be affecting it. If the copy occurs between two VMs on the same host, it completes just fine. Nukelear v.2 fucked around with this message at 20:14 on Oct 18, 2012 |
# ? Oct 18, 2012 20:05 |
|
BangersInMyKnickers posted:iSCSI traffic will try going out any kernel interface that is within the subnet mask of the target or (god help you) it will route to get there. So assuming your management network is something like 192.168.1.x/24 and the storage is 192.168.2.x/24, but they're all on the same physical network and the gateway can route the packets between the two, then it is entirely possible that you're bouncing your iSCSI traffic through a route hop. That's a bad thing and you don't want to do it. Interesting to know. I am assuming it will only try to route to the storage if the real iSCSI path is unavailable? So let's say my regular network is 10.1.1.1/24 and storage is 192.168.1.1/24, and I have a route that is not locked down in any way from 10.1.1.X to 192.168.1.X. If the path that was set up for iSCSI (192.168.1.X) goes down, my host will take a path that was set up for management or VM Network and route the iSCSI traffic over that? So today I finally got around to getting a syslog server up and running (long overdue). We have an ESXi host that is set up for test/dev purposes. I am noticing that it is logging an error every 67 minutes. Below is a screenshot. Anyone have any idea what this is about? Google really isn't helping me.
|
# ? Oct 18, 2012 20:10 |
|
If that secondary route is available then it should get discovered during the discovery scan of the software iSCSI initiator (I doubt dynamic discovery will work since it is in a different subnet, so you'll be manually adding the target IP(s)). At that point, the multiple paths will be listed under the initiator properties, and it is up to you to configure the LUN multipathing options to define which is the primary path and how it handles failover, or whether you just run it in round-robin. But again, routing iSCSI traffic is almost always going to be a bad idea and should be avoided.
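A rough sketch of the two path-selection policies mentioned (fixed-with-failover vs. round-robin); the class names and path strings here are invented for illustration, this is not the actual pathing plugin:

```python
class FixedPath:
    """Always use the preferred path; fail over only when it's dead."""
    def __init__(self, paths, preferred):
        self.paths, self.preferred = paths, preferred

    def next_path(self, dead=()):
        if self.preferred not in dead:
            return self.preferred
        # Preferred path is down: take the first surviving alternative.
        for p in self.paths:
            if p not in dead:
                return p
        raise RuntimeError("all paths dead")

class RoundRobin:
    """Rotate I/O across every live path."""
    def __init__(self, paths):
        self.paths, self.i = paths, 0

    def next_path(self, dead=()):
        live = [p for p in self.paths if p not in dead]
        if not live:
            raise RuntimeError("all paths dead")
        p = live[self.i % len(live)]
        self.i += 1
        return p

paths = ["vmk1->target", "vmk0->target (routed)"]
fixed = FixedPath(paths, preferred="vmk1->target")
print(fixed.next_path())                        # stays on the direct path
print(fixed.next_path(dead={"vmk1->target"}))   # falls back to the routed one
```

The fixed policy is what you'd want here: the routed path only ever carries traffic when the direct one has failed, rather than round-robin spraying half your I/O through a router hop.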
|
# ? Oct 18, 2012 20:42 |
|
Nukelear v.2 posted:Has anyone ever seen a Windows CIFS copy knock down a host NIC before? Check your switch for output drops? sh int count err
|
# ? Oct 18, 2012 20:49 |
|
OK guys, need a little help. Our main VMware guy is leaving in a month. I have a solid understanding of how virtualization works and can do basic administrative tasks in vCenter, etc, but I need to get into the nuts and bolts of this poo poo in case something goes wrong. We have Premium support or whatever for all our production stuff, and believe it or not there's money for training. I'm a MS guy though, so my VMware experience is limited. Besides the Lowe book, what other books should I purchase? I can swing a training class up to a week long; which one should I go for? Any recommendations? My thoughts are to pick up the Lowe Mastering vSphere 5 book and the official VCP5 cert guide books and then take the vSphere: Install, Configure & Manage course.
|
# ? Oct 18, 2012 20:50 |
|
I will say that we just moved off iSCSI to NFS in our environment, and it is fantastic to not be beholden to SCSI lock contention and all this volume/LUN/VMFS sizing nonsense. It simplified the heck out of our deduplication and disaster recovery volume replication schedules, and we're seeing 80+% deduplication rates on the NFS volume dedicated to OS partitions, and it has only been going up as we add more VMs.
|
# ? Oct 18, 2012 20:51 |
|
skipdogg posted:OK guys, need a little help. Our main VMWare guy is leaving in a month, I have a solid understanding on how Virtualization works, can do basic administrative tasks in VCenter, etc, but I need to get into the nuts and bolts of this poo poo in case something goes wrong. We have Premium support or whatever for all our production stuff, and believe it or not there's money for training. I'm a MS guy though, so my VMWare experience is limited. Was your VMware guy also managing the network configuration and/or storage systems? You're going to need to bone up on those since more likely than not those will be the things that bite you in the butt, especially if they were done wrong and you need to fix it.
|
# ? Oct 18, 2012 20:53 |
|
^^ Absolutely. skipdogg posted:OK guys, need a little help. Our main VMWare guy is leaving in a month, I have a solid understanding on how Virtualization works, can do basic administrative tasks in VCenter, etc, but I need to get into the nuts and bolts of this poo poo in case something goes wrong. We have Premium support or whatever for all our production stuff, and believe it or not there's money for training. I'm a MS guy though, so my VMWare experience is limited. If you have the training money, take the Fast Track course. They cram a lot more into that one, but it's more $$$. The O'Reilly VMware Cookbook is pretty good too, but Lowe's book is just great.
|
# ? Oct 18, 2012 20:54 |
|
The EMC v2 book is also a really good purchase; searching "EMC" in Books on Amazon will bring it up. Nukelear v.2 posted:Has anyone ever seen a Windows CIFS copy knock down a host NIC before? One of our developers did a big copy (26k files 1.5Gb) from one vm web server to another across hosts. Within a few seconds the copy failed and nearly every VM on the destination VM's host lost network for 5 minutes. The host has an entry 'Uplink vmnic0 has recovered from a transient failure due to a watchdog timeout' I am a bit confused here; what does your setup look like? If it is 26k files, that is a ton of I/O requests; you may be maxing out your IOPS, basically DoS'ing the storage, causing the VMs to lose access to disk, and funky stuff happens when datastores are DoS'd. Other than that you may want to check some of the NICs, make sure drivers are up to date and bugfixes applied. Dilbert As FUCK fucked around with this message at 21:11 on Oct 18, 2012 |
# ? Oct 18, 2012 21:03 |
|
Mierdaan posted:Check your switch for output drops? sh int count err I'll loop in the network guys to take a look. I would have expected to see drop errors in the vmware host performance graphs, but there was nothing.
|
# ? Oct 18, 2012 21:03 |
|
Corvettefisher posted:I am a bit confused here what is your setup look like? If it is 26k files that is a ton of I/O requests, you may be maxing out your IOPS. Basically DOS'ing the storage, causing the VM's to lose access to disk and funky stuff happens when datastores are DOS'd. Yeah, but if that were the case it should be throwing a bunch of alarms about storage latency thresholds getting exceeded.
|
# ? Oct 18, 2012 21:12 |
|
Mierdaan posted:If you have the training money, take the Fast Track course. They cram a lot more into that one, but it's more $$$.
|
# ? Oct 18, 2012 21:13 |
|
BangersInMyKnickers posted:Was your VMware guy also managing the network configuration and/or storage systems? You're going to need to bone up on those since more likely than not those will be the things that bite you in the butt, especially if they were done wrong and you need to fix it. The EMC SAN is FC and managed by my boss. Shouldn't be on my radar for a while I hope. Mierdaan posted:^^ absolutely. I'll check this out. I looked at it, but it mentioned extended hours and I have a hard stop at 5PM most days to get the kids. If I take it in East Coast time though it might work.
|
# ? Oct 18, 2012 21:14 |
|
BangersInMyKnickers posted:Yeah but if that was the case it should be throwing a bunch of alarms about storage latency thresholds getting exceeded.
|
# ? Oct 18, 2012 21:17 |
|
BangersInMyKnickers posted:I will say that we just moved off iSCSI for NFS in our environment and it is fantastic to not be beholden to SCSI lock contention and all this volume/lun/VMFS sizing nonsense. It simplified the heck out of our deduplication and disaster recovery colume replication schedules and we're seeing 80+% deduplication rates on the NFS volume dedicated to OS paritions and it has been only going up as we add more VMs. We moved to NFS quite some time ago to work around those issues as well, also not having to worry about VM corruption in the off chance there is a communication interruption. Being able to add storage without holding your breath while rescanning HBAs is nice too.
|
# ? Oct 18, 2012 21:18 |
|
skipdogg posted:I'll check this out. I looked at it, but it mentioned extended hours and I have a hard stop at 5PM most days to get the kids. If I take it in East Coast time though it might work.
|
# ? Oct 18, 2012 21:18 |
|
skipdogg posted:The EMC SAN is FC and managed by my boss. Shouldn't be on my radar for a while I hope. Well, at least you don't need to worry about that end of it. But the networking in front of the cluster can be equally important: if you don't have proper redundancy set up on the switches carrying your hosts' management interfaces, you can easily hit a situation where hosts think they are isolated and start shutting down VMs out of nowhere. If you do need to manage the networking, make sure you are familiar with link aggregation and spanning tree.
|
# ? Oct 18, 2012 21:19 |
|
Misogynist posted:Specifically, it consolidates in some of the material from the Datacenter Design and Manage for Performance classes. If you're planning on taking both of these quickly, you probably don't need to invest the extra money into the Fast Track, but it's a great deal if you're looking to learn the most important stuff as fast as you can. Eh, I would really stay away from the Fast Track courses; they really don't do much. They are good, but so is setting up a lab and going through some labs from http://www.amazon.com/VMware-vSphere-Administration-Instant-Reference/dp/1118024435 Really good book. http://www.amazon.com/Administering-vSphere-Planning-Implementing-Troubleshooting/dp/1435456548/ref=pd_sim_b_4 Has a bunch of nice labs and how-to's, and comes with a TrainSignal CD as well.
|
# ? Oct 18, 2012 21:19 |
|
Misogynist posted:Your employer is looking at paying $5500 for the course but you can't get a babysitter for a few hours out of one week? Oh I could, I just like picking my kids up from daycare and spending time with them. 5:30 to 9:30PM is kiddo time, I try to avoid anything work related between these hours.
|
# ? Oct 18, 2012 21:23 |
|
PM me if you want to go over some stuff this/next weekend, I am more than willing to help. Oh, look at the time, time for class. Can't remember who emailed me, but call if you have my number.
|
# ? Oct 18, 2012 21:24 |
|
Corvettefisher posted:I am a bit confused here what is your setup look like? If it is 26k files that is a ton of I/O requests, you may be maxing out your IOPS. Basically DOS'ing the storage, causing the VM's to lose access to disk and funky stuff happens when datastores are DOS'd. Each host runs 6 NICs total: 2x iSCSI (10G), 2x production traffic (1G), 2x management (1G). 4 hosts with an EQL PS4110 backend. The storage backend is dual 10G NICs on separate 10G switches, a different NIC/switch than the one that dies. I do get warnings on storage latency when this copy happens, but I think it's because ESX 5 got really sensitive about throwing them; it's complaining that latency went from under 1k to 81k MICROseconds. The NIC that dies isn't an iSCSI NIC, it's a production network traffic NIC. If I do this copy between VMs that are on the same host (thus no actual network traffic across the production links), it completes just fine, with the same latency warnings. I'll double check, but everything seems as up to date as I can get without going to 5.1. Nukelear v.2 fucked around with this message at 23:20 on Oct 18, 2012 |
# ? Oct 18, 2012 21:33 |
|
skipdogg posted:Oh I could, I just like picking my kids up from daycare and spending time with them. 5:30 to 9:30PM is kiddo time, I try to avoid anything work related between these hours.
|
# ? Oct 18, 2012 22:14 |
|
Still, the FT course is a lot of information in a short time and assumes you already have a great deal of VMware experience going in. Not saying it won't help or get you on the right track, but realize it will assume you know a large portion of it already. Nukelear v.2 posted:4 hosts with an EQL PS4110 backend. Dilbert As FUCK fucked around with this message at 14:48 on Oct 19, 2012 |
# ? Oct 18, 2012 22:25 |
|
Misogynist posted:It's really not a huge issue if the training is close to you, hours typically run 8:00 to 6:00 with a short lunch in the middle. I'd probably do this Online Live... I'm not seeing an in person class here in San Antonio anytime for the rest of the year. I did my SCCM training with that Online Live format and it wasn't terrible. Lots of downtime though waiting for other folks to roll through the labs. 2 Grand is a big price difference for the extra 10 hours of instruction.
|
# ? Oct 18, 2012 22:41 |
|
skipdogg posted:I'd probably do this Online Live... I'm not seeing an in person class here in San Antonio anytime for the rest of the year. I did my SCCM training with that Online Live format and it wasn't terrible. Lots of downtime though waiting for other folks to roll through the labs. 2 Grand is a big price difference for the extra 10 hours of instruction. ICM is garbage and not worth the money if you can read and aren't getting the cert. Also, apparently there are some items covered in the Fast Track course that aren't gone into in great detail in ICM (most of the technical questions I asked were considered out of scope, but the instructor did mention that they were covered in FT).
|
# ? Oct 18, 2012 23:08 |
|
Great info to know. Thanks!
|
# ? Oct 18, 2012 23:18 |
|
Yeah, 99% of the class when I took it for my 5.0 for work were people in there just to take the test. The instructor let everyone know first thing that taking this course will not teach you to be an admin or guarantee you pass the cert if you have minimal work experience. They really expect you to have a good deal of experience with it beforehand; the academic ICM class, however, is almost completely different and takes a zero-to-hero approach.
|
# ? Oct 18, 2012 23:47 |
|
Seems to be a non-issue, if anyone else ends up staring at this message wondering what is going on. quote:This seems to be a known issue and is harmless to your system. The fix for this issue is found in ESXi 5.0 Update 2.
|
# ? Oct 19, 2012 00:48 |
|
Misogynist posted:It's really not a huge issue if the training is close to you, hours typically run 8:00 to 6:00 with a short lunch in the middle. If you're fast with labs, you can be done by 4PM every day. I guess it depends on how fast your instructor lectures too, though I have no idea how the online stuff works. BangersInMyKnickers posted:Yeah but if that was the case it should be throwing a bunch of alarms about storage latency thresholds getting exceeded. I dunno, in my environment I can overrun the buffers on my making GBS threads 3560X pretty much just by putting a host in maintenance mode. Sometimes this results in several hundred output drops (no big deal), sometimes several thousand output drops on every port on the switch, not just the ESXi hosts' connected ports. Here's an output, with counters cleared about a minute before, from doing this: code:
Mierdaan fucked around with this message at 01:15 on Oct 19, 2012 |
# ? Oct 19, 2012 01:08 |
|
Mierdaan posted:If you're fast with labs, you can be done by 4PM every day. I guess it depends on how fast your instructor lectures too, though I have no idea how the online stuff works. I've done training courses where they were a combination of in-person and online, so you had the instructor in front of the class with a camera that fed to the online people. It was a pain in the rear end from the start because everything ran through the JRE and the first 2 hours were just trying to get the online people up and working. And then after every section there was a long pause while he waited for the online people to type questions, and answering them was painful because of that disconnect when you're trying to explain new concepts to people who aren't physically in front of you. And then the online people were constantly ducking out with phone calls from work or kid stuff because they were working from home, and the instructor was trying to get them caught up. It was annoying to say the least. But in both cases where the course was structured like that, all the labs and materials were laid open from the start so you could work ahead or poke through the optional labs and materials.
|
# ? Oct 19, 2012 14:45 |
|
janitorx posted:We moved to NFS quite some time ago to work around those issues as well, also not having to worry about VM corruption in the off chance there is a communication interruption. This sounds weird to me, I've grown iSCSI virtual filesystems fairly often without a hitch. Hell I just bumped one of my datastores a few hours ago. What problems do you guys have?
|
# ? Oct 19, 2012 16:39 |
|
|
Rhymenoserous posted:This sounds weird to me, I've grown iSCSI virtual filesystems fairly often without a hitch. Hell I just bumped one of my datastores a few hours ago. What problems do you guys have? It's a little annoying because there are multiple steps (grow volume, grow LUN, grow VMFS), but the biggest issue is SCSI lock contention. Thin-provisioned VMDK files grow in 1-8 MB chunks depending on how you set up the VMFS. Because iSCSI was designed to really be a single-host protocol, VMFS has to overcome that by being multi-host aware. Every time one of those blocks needs to grow, it issues a SCSI lock command, which can hold up any other VMDK trying to grow at the same time. This is the VMFS locking issue you may have heard about, and why they recommend limiting the number of VMs on an iSCSI datastore to somewhere in the 10-15 range. That means you have to manage a bunch of different LUNs, which means more wasted space as you try to maintain adequate free space on each of them. And more LUNs means it is harder to keep your OS VMDK files separate from data ones to minimize the time it takes to run deduplication cycles. And, at least on our NetApp, we could only grow LUNs to 10x their initial size, which caused some headaches when we were getting started and doing the initial P2V conversion. And if you are growing out LUNs, you have to force a rescan on the iSCSI initiator, which can be a slow, painful experience in an environment with a lot of LUN mappings. NFS just doesn't have any of these weird limitations because it is multi-host aware and way more flexible.
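A back-of-the-envelope model of why the per-chunk locking hurts: every thin-VMDK grow takes an exclusive datastore-wide reservation, so grows from different VMs serialize behind each other. The 5 ms lock cost below is an assumed illustrative number, not a measurement:

```python
def grow_cost(total_mb, chunk_mb, lock_ms):
    """Each chunk allocation needs one SCSI reservation, holding the
    datastore exclusively for roughly lock_ms; return the number of
    reservations and the total time spent holding the lock (ms)."""
    locks = -(-total_mb // chunk_mb)  # ceiling division
    return locks, locks * lock_ms

# Growing one thin disk by 10 GB, 1 MB chunks vs 8 MB chunks:
print(grow_cost(10 * 1024, 1, 5))  # (10240, 51200) -> ~51 s of exclusive locking
print(grow_cost(10 * 1024, 8, 5))  # (1280, 6400)   -> ~6.4 s
```

Multiply that across 10-15 busy thin-provisioned VMs on one datastore and the serialization adds up fast, which is why the per-datastore VM limits quoted above exist at all; NFS sidesteps it because the filer, not a shared SCSI reservation, arbitrates allocation.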
|
# ? Oct 19, 2012 17:00 |