|
GrandMaster posted:I thought the status within vSphere was dependent on the vSwitch having more than 1 NIC connected.. The status is based on being able to verify that you have two or more vmkernels going out over independent paths, both of which have access to all initiator IPs and their LUNs. It's the only way the software can verify for itself that it has full redundancy. In Moey's case, he does have full redundancy in his configuration, but the software just can't tell that because of how it is set up, so you get a report of partial/no redundancy. It's an easy enough fix, though, so long as you aren't already in production.
|
# ? Feb 1, 2013 06:41 |
|
KS posted:If you're not doing something special on those switches (like a VPC) then that's not really a good idea for iscsi traffic. With that diagram and traffic on a single subnet, are you relying on layer 2 redundancy and spanning tree? It's just not fast enough. Even if you are doing VPC and port-channels, IMO that's great for NFS but you should be relying on MPIO for iscsi. It's relying on MPIO in the iSCSI initiators. It's a recommended and supported configuration. There is no spanning tree because there are no loops between switches, just multiple endpoints and interconnected switches. The switch interconnect gives a bit more protection in a double-failure scenario, in the event that a NIC on the host and the NAS both failed at the same time. Without it, running two independent switches side-by-side, if the NIC on the left side of the host and the right side of the NAS went out, you would completely lose storage traffic. The interconnect would allow the right host NIC to talk to the left NAS NIC in that scenario. Not too likely, but it's easy, works, and is generally a better configuration. e: It also allows you to transition to NFS with full redundancy much more easily. We learned that the hard way when we had to take everything down to re-wire.
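The double-failure scenario above is easy to sanity-check with a toy connectivity model (the names and topology here are illustrative, not anyone's actual config): dual-homed host and NAS, two switches, with and without the inter-switch link.

```python
from collections import defaultdict, deque

def reachable(links, start, goal):
    """Breadth-first search over an undirected link list."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False

# Host and NAS each dual-homed to two switches.
base = [("host", "swA"), ("host", "swB"),
        ("nas", "swA"), ("nas", "swB")]
interconnect = [("swA", "swB")]

# Double failure: host's link to swA and the NAS's link to swB both die.
failed = {("host", "swA"), ("nas", "swB")}
surviving = [l for l in base if l not in failed]

print(reachable(surviving, "host", "nas"))                 # False: storage is cut off
print(reachable(surviving + interconnect, "host", "nas"))  # True: traffic crosses the ISL
```

With independent side-by-side switches the surviving host NIC and surviving NAS NIC sit on different switches and can't see each other; the interconnect is what saves that (admittedly unlikely) case.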
|
# ? Feb 1, 2013 06:46 |
|
Yeah, thought about it a bit more and edited out the stp stuff, but it's still wrong. Without doing some sort of fixed path or weighted MPIO (and it'd be manual because nothing in the vmware plugin would detect this), you'd send half of your traffic from initiators on switch A to targets on switch B. You'd see link saturation on the cross-link and probably some out-of-order packet fun as well. Sure, it'll work, but it's far from optimal. If you were doing it right to move to NFS, those switches would be in a VPC, iscsi would STILL be in independent VLANs on both switches, and you'd have a VLAN for NFS (or re-use ONE of the iscsi VLANs). iSCSI hosts would then get a link to each switch, and NFS hosts would get a port-channel to both switches. I can post NX-OS configs for this scenario. BangersInMyKnickers posted:The switch interconnect gives a bit more protection in a double-failure scenario in the event that a nic on the host and the nas both failed at the same time. edit: the good news is, if you're really running this way you could gain a fair bit of performance when you fix it. I'd suggest monitoring traffic volume on your ISL. You CAN do a layer-2 redundant setup on one subnet, but it involves VPC-capable switches and port-channels to the hosts. I don't really see any indication that that's going on in that diagram. KS fucked around with this message at 07:17 on Feb 1, 2013
# ? Feb 1, 2013 06:55 |
|
I'm having some difficulty conceptually with how a hypervisor (specifically ESXi 5.1) works. So, on a certain level, I understand that the hypervisor takes whatever resources are available to it and divides them between the virtual machines it is hosting. At the most basic level, this apparently means actual CPU cycles, which is why VMware measures "used performance" on a VM in Mhz. You can also give guests more than one virtual CPU, and this is where it gets a bit weird for me. I've been told that giving a VM more cores doesn't necessarily make it faster. This seems counter-intuitive, especially with applications that are multithreaded and benefit from running on a physical box with an abundance of cores. So for some tasks, you might have a physical server with 8 cores, but a VM with only 2, and the VM will even perform better. This breaks my brain. What's going on here?
|
# ? Feb 1, 2013 07:38 |
|
Powdered Toast Man posted:I'm having some difficulty conceptually with how a hypervisor (specifically ESXi 5.1) works. This is a really tiny thing, and it's all due to a minor little detail called co-scheduling. You might have heard the term "SMP" thrown around when talking about multi-CPU systems. SMP stands for Symmetric Multi-Processing, and "Symmetric" is really the key to understanding this. As a physical system with multiple CPUs runs, those CPUs need to be kept in lock-step; this is why all the CPUs in a multi-socket system need to be identical. If one of the CPUs somehow falls out of sync with the others, the entire system will crash and burn. This is, for obvious reasons, something you want to avoid whether you're virtualizing or not. So in the olden days of virtualization, hypervisors did exactly this: they virtualized SMP by literally running all of the virtual CPUs simultaneously on the same number of physical CPUs. If you had a 4-CPU machine, and you wanted to run a 4-CPU VM on it, the hypervisor would wait until all four CPUs were free before giving your VM its time slice, letting it run, and then resuming business as usual. So, the hypervisor has to context-switch the VMs running on those CPUs, wait until the last one is off and do nothing with those completely free CPUs in the meantime, then run the gigantic monster VM. Nothing else could run while that VM was scheduled to monopolize all the CPUs in the system. SMP virtual machines weren't just slow; they dragged down your entire virtual environment, and not by any small amount. SMP really sucked in those olden days. Nowadays, we have something called relaxed co-scheduling. Some smartypants engineers with VMware figured out that you don't really need to keep the CPUs completely in lockstep, you just need to keep them really, really close. This sounds like a really minor distinction, but it has a major architectural implication -- you don't need to run all your vCPUs at once anymore. 
So now one CPU at a time can run, but if it gets too far ahead of the slowest CPU, it needs to hang back and wait for that CPU to catch up or bad things can happen. So now there's a lot more flexibility, but the devil's in the details. If you're running an SMP VM, now your CPUs need something to do. If one CPU is constantly running ahead of the other, you end up going start-stop-start-stop-start-stop and never really being able to run for more than a small fraction of the consecutive time that the VM is supposed to run. If you've dealt with OS internals at all, you know that too many context switches are a bad thing; in addition to the overhead of storing all those registers so you can run something else on the CPU, you're also killing your CPU cache in the process. Most hypervisors now implement some form of relaxed co-scheduling. Relaxed co-scheduling reduces the overhead of scheduling an SMP VM from really loving bad to minimally tolerable in most cases. In the best case, a CPU-bound SMP VM churning at full tilt will incur almost no SMP overhead at all. Bottom line: if you're using a properly multithreaded application, and not routinely processing a single-threaded batch job that blasts a single vCPU, you're fine with SMP (within reason). If you're running a small number of single-threaded workloads on a larger number of vCPUs, SMP will hurt your performance. Most operating systems in the Year of Our Lord 2013 support hot-adding CPUs, so as a general rule, start your VMs small and add CPUs as you need them. I still wouldn't recommend doing something like running an 8-vCPU VM on an 8-core box if you value your sanity. Vulture Culture fucked around with this message at 08:32 on Feb 1, 2013 |
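The skew idea is easy to make concrete with a toy model. This is a sketch of the relaxed policy, not VMware's actual scheduler: pCPU slices are handed out greedily, but a vCPU that is too far ahead of its slowest sibling gives up its slice (a "co-stop") until the sibling catches up.

```python
def co_schedule(ticks, n_vcpus, n_pcpus, skew_limit):
    """Toy relaxed co-scheduler for one SMP VM whose vCPUs
    outnumber the pCPU slices it can get per tick."""
    runtime = [0] * n_vcpus   # accumulated run time per vCPU
    co_stops = 0
    nxt = 0                   # round-robin pointer over the vCPUs
    for _ in range(ticks):
        granted, offered = 0, 0
        while granted < n_pcpus and offered < n_vcpus:
            i = nxt % n_vcpus
            nxt += 1
            offered += 1
            if runtime[i] - min(runtime) < skew_limit:
                runtime[i] += 1   # this vCPU takes its slice
                granted += 1
            else:
                co_stops += 1     # too far ahead: held back until siblings catch up
    return runtime, co_stops

runtime, co_stops = co_schedule(ticks=200, n_vcpus=4, n_pcpus=2, skew_limit=3)
print(max(runtime) - min(runtime))  # skew never exceeds skew_limit
```

Strict co-scheduling is the degenerate case: an all-or-nothing grant with effectively zero tolerated skew, which is exactly what forced the hypervisor to idle pCPUs waiting for the whole VM — the behavior described above that made old SMP VMs drag down the entire host.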
# ? Feb 1, 2013 08:26 |
|
Powdered Toast Man posted:So for some tasks, you might have a physical server with 8 cores, but a VM with only 2, and the VM will even perform better. This breaks my brain. What's going on here? You're probably thinking of instances where four 2-vCPU VMs will outperform the one 8-core physical box. VMware has some Hadoop benchmark documents that show this (it's a very slight advantage). This is for the same reason that 4 2-core physical boxes would (probably) also outperform that 8-core box on certain tasks.
|
# ? Feb 1, 2013 12:16 |
|
KS posted:Yeah, thought about it a bit more and edited out the stp stuff, but it's still wrong. Without doing some sort of fixed path or weighted MPIO (and it'd be manual because nothing in the vmware plugin would detect this), you'd send half of your traffic from initiators on switch A to targets on switch B. You'd see link saturation on the cross-link and probably some out of order packet fun as well. Sure, it'll work, but it's far from optimal. We handled the switch interconnect by bundling two links between them. Then you can do round-robin all day long if you want, and you won't have saturation there any more than any other place on the SAN. But there also isn't anything wrong with doing failover with affinity, which would do the job as well, so the switch interconnect only gets used under a failure scenario. That configuration is extremely similar to your suggestion of two parallel switches, just with an added layer of redundancy.
|
# ? Feb 1, 2013 15:41 |
|
BangersInMyKnickers posted:The status is based on being able to verify that you have two or more vmkernels going out over independent paths, both of which have access to all initiator IPs and their LUNs. It's the only way the software can verify for itself that it has full redundancy. In Moey's case, he does have full redundancy in his configuration but the software just can't tell that because of how it is set up, so you get a report of partial/no redundancy. It's an easy enough fix though so long as you aren't already in production. I am not taking any performance hit on this, am I? The SAN has two controllers with connections from each controller split between the two switches. The hosts have their iSCSI connections split between NICs. Each host has two vmkernels for iSCSI, each bound to its own NIC. Pathing policy is set to round robin. The number of paths to each datastore matches the number of logical paths (including crossing from switch A to switch B).
|
# ? Feb 1, 2013 16:45 |
|
So I assume that the latest ESXi is pretty good at co-scheduling...that being said, is it a bad idea to have more than twice as many vCPUs on guests as you have actual logical processors on the host?
|
# ? Feb 1, 2013 20:31 |
|
Powdered Toast Man posted:So I assume that the latest ESXi is pretty good at co-scheduling...that being said, is it a bad idea to have more than twice as many vCPUs on guests as you have actual logical processors on the host? No, over-provisioning of vCPUs is very common; it is fine to have more vCPUs than physical/logical cores, but be aware that too much over-provisioning can cause performance degradation on the host and in turn the virtual machines. You'll need to check your CPU ready and the performance of your virtual machines in order to find the load that your host can take. Dilbert As FUCK fucked around with this message at 20:44 on Feb 1, 2013
# ? Feb 1, 2013 20:40 |
|
Moey posted:I am not taking any performance hit on this am I? No, you have full gigabit (I assume) fabric throughout, and so long as you have enough target LUNs (probably 4 or more) the round robin should roughly balance out the traffic across your two paths without having to specify specific paths. If you stick with the /24 subnet mask, you might as well remove the switch interconnect. No traffic would be able to pass over it anyway because your subnet mask will stop it.
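The "round robin roughly balances it" claim is easy to illustrate with a sketch (this models the idea, not the actual VMW_PSP_RR policy, which switches paths per batch of I/Os): each LUN just cycles its I/Os over its available paths, so even wildly uneven per-LUN loads come out nearly even per path.

```python
from itertools import cycle

def distribute(io_per_lun, paths):
    """Round-robin each LUN's I/Os over the available paths."""
    load = dict.fromkeys(paths, 0)
    for ios in io_per_lun:
        rr = cycle(paths)          # each LUN alternates paths independently
        for _ in range(ios):
            load[next(rr)] += 1
    return load

# Four LUNs with uneven I/O counts, two paths (one per switch).
load = distribute([1000, 250, 4000, 75], ["pathA", "pathB"])
print(load)  # each path ends up within a few I/Os of the other
```

With per-LUN alternation the worst-case imbalance is one I/O per LUN; the "4 or more LUNs" guidance matters much more for fixed-path policies, where whole LUNs pin to a single path and you need enough of them for the pinning to average out.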
|
# ? Feb 1, 2013 20:49 |
|
movax posted:Giving this a try, but running into some issues. Had to set a different package path or something to get the compat modules to install, and looks like the VMXNET 3 modules aren't working properly in pfSense (won't get DHCP IPs, etc). After doing that several times in the past few years, I decided to stick with E1000s and suck up the additional hit in CPU usage. I can't really speak for the completely manual VMware Tools (on BSD) install, but if pfSense updates still break poo poo, I'm staying far away. E1000E is also similar enough if you yearn for the 10-Gbit rates, but generally speaking the virtual NICs all achieve far greater speeds than their advertised/negotiated rates. For instance, between two 10Gbit VMXNET3 interfaces, I got 14.5 Gbit/s between two VMs on the same box, and 9.5 Gbit/s over the actual CNAs.
|
# ? Feb 1, 2013 21:08 |
|
We had our EMC rep come by and give us a little pep-talk/update on whatever, blah blah blah. She brought along one of their VARs (side note: I loving hate it when companies do this) and they were giving me the whole "SANs are dead, in a couple years nobody will care, it'll just be a big, slow NAS with big flash caches on every host." Anyway, after talking with them a little more, I didn't realize that they had gotten the application acceleration flash caches working in VMWare, with vMotion also working. But what I would like to know is what kind of real world performance increases have people seen when using it. Which products did you go with and why? I mean, I don't think I would go with EMC, because it locks you into their software, and their flash hardware, but maybe they're the best option?
|
# ? Feb 1, 2013 21:56 |
|
Corvettefisher posted:No, the over-provisioning of vCPU's is very common, it is fine to have more vCPU's than physical/logical cores, but be aware that too much over provisioning can cause performance derogation on the host and in term the virtual machines. You'll need to check your CPU ready and performance of virtual machines in order to find the load that your host can take. I should probably also mention that we're one host failure away from taking down an entire $500M company. Our redundancy is laughable. To be fair, the host servers are good quality hardware (Dell R710), but...it's like this small nagging voice in the back of my head is always telling me how totally hosed we would be if anything ever went wrong. It always cracks me up when people say things "aren't in the budget" but don't consider the possible costs of system failures.
|
# ? Feb 1, 2013 22:26 |
|
Serfer posted:We had our EMC rep come by and give us a little pep-talk/update on whatever, blah blah blah. She brought along one of their VARs (side note: I loving hate it when companies do this) and they were giving me the whole "SANs are dead, in a couple years nobody will care, it'll just be a big, slow NAS with big flash caches on every host." Dell has been brewing up a lot of crazy stuff, and their push to go private is partially due to their desire to do more skunkworks projects and actually produce something of their own instead of slapping their badges on things other people make for them. I have a feeling that the way we think of storage currently is going to change a lot over the next three to five years, and that VAR isn't very far off. If you can put yourself into a holding pattern for that long to see what is going to happen, I would recommend it.
|
# ? Feb 1, 2013 22:29 |
|
Serfer posted:We had our EMC rep come by and give us a little pep-talk/update on whatever, blah blah blah. She brought along one of their VARs (side note: I loving hate it when companies do this) and they were giving me the whole "SANs are dead, in a couple years nobody will care, it'll just be a big, slow NAS with big flash caches on every host." Some of the storage vendors are doing some pretty cool stuff with local-to-shared type stuff. Not to mention flash technology is really changing a lot of things. Flash is loving amazing for VDI; EMC's FAST (seriously, look this up) is almost a have-to-have on any VDI deploy due to how VDI environments operate. Fusion-IO is really amazing stuff as well, with HA and deploying new desktops. I have a feeling 'traditional' SANs will still be around for quite some time for enterprises; however, the SMB market and how it addresses storage networks will change more dynamically in the coming years. Here is a demo of 510 virtual desktops booting with a VNX and FAST. https://www.youtube.com/watch?v=-qmxMVPqo_o It's pretty loving cool Dilbert As FUCK fucked around with this message at 22:52 on Feb 1, 2013
# ? Feb 1, 2013 22:45 |
|
BangersInMyKnickers posted:No, you have full gigabit (I assume) fabric throughout, and so long as you have enough target LUNs (probably 4 or more) the round robin should roughly balance out the traffic across your two paths without having to specify specific paths. If you stick with the /24 subnet mask, you might as well remove the switch interconnect. No traffic would be able to pass over it anyway because your subnet mask will stop it. Full gigabit (I got to implement 10GbE at my old place). Maybe I am missing something, but why do you say the switch interconnect does nothing? Everything is within the same /24, so I do not see why it wouldn't be routable over it.
|
# ? Feb 2, 2013 00:37 |
|
Moey posted:Full gigabit (I got to implement 10GbE at my old place ). Initially you said that while you were using the /24 subnet, your IPs were in the 192.168.x.x range. Is that not actually the case?
|
# ? Feb 2, 2013 01:12 |
|
BangersInMyKnickers posted:Dell has been brewing up a lot of crazy stuff and their push to go private in partially due to their desire to do more skunkworks projects and actually produce something of their own instead of slapping their badges on things other people make for them. I have a feeling that the way we think of storage currently is going to change a lot over the next three to five years and that VAR isn't very far off. If you can put yourself in to a holding pattern for that long to see what is going to happen, I would recommend it. Oh, I agree with them that SAN systems are going to die out, whether it's something like openstack that kills it with distributed storage, or host-based flash caches that kill it. I just want to know if anyone has had any experience with any of the systems: Fusion-IO's turbo system, or EMC's VFCache, or even SanDisk's FlashSoft. I mean, I know it's only been like 4 months since it started working with VMWare stuff, but some people are on the bleeding edge.
|
# ? Feb 2, 2013 01:28 |
|
BangersInMyKnickers posted:Initially you said that while you were using the /24 subnet, your IPs were in the 192.168.x.x range. Is that not actually the case? This is the case. Everything (SAN and hosts) is in the same 192.168.x.x range. So in the picture that I drew, the host has 4 paths to the SAN. If I pull the switch interconnect, that host will only have 2 paths. Edit: I said "routable" in my last post, that was the wrong term.
|
# ? Feb 2, 2013 02:24 |
|
BangersInMyKnickers posted:Dell has been brewing up a lot of crazy stuff and their push to go private in partially due to their desire to do more skunkworks projects and actually produce something of their own instead of slapping their badges on things other people make for them. I have a feeling that the way we think of storage currently is going to change a lot over the next three to five years and that VAR isn't very far off. If you can put yourself in to a holding pattern for that long to see what is going to happen, I would recommend it. They've bought companies to provide basically the entire IT infrastructure. I wouldn't be surprised if their last acquisition before going private is a business that reuses shipping containers, and then they'll just start drop shipping data centers ala Project Blackbox.
|
# ? Feb 2, 2013 04:24 |
|
Powdered Toast Man posted:So I assume that the latest ESXi is pretty good at co-scheduling...that being said, is it a bad idea to have more than twice as many vCPUs on guests as you have actual logical processors on the host?
|
# ? Feb 2, 2013 05:01 |
|
But if you are overcommitted 5 to 1 because each one of your virtual servers has the same number of vCPUs as your processors have cores... that may be a problem.
|
# ? Feb 2, 2013 05:45 |
|
Internet Explorer posted:But if you are overcommitted 5 to 1 because each one of your virtual servers has the same number of vCPUs as your processors have cores... that may be a problem. I was under the impression that as long as you have enough physical memory, things might get really slow if you're heavily overcommitted, but it wouldn't necessarily tank your VMs. As an example, we have a dual hex-core box with 24 logical processors that has about 50 vCPUs on it. Everything runs pretty well. Then again, it has 128GB of RAM.
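For what it's worth, the ratio being debated here is just total assigned vCPUs over logical processors. The thresholds are rough rules of thumb from this thread, not official VMware numbers — the real test is CPU ready, as noted above.

```python
def overcommit_ratio(total_vcpus, logical_cpus):
    """vCPU:pCPU overcommit ratio for a host."""
    return total_vcpus / logical_cpus

# The host described above: dual hex-core with HT = 24 logical CPUs, ~50 vCPUs.
ratio = overcommit_ratio(50, 24)
print(f"{ratio:.2f}:1")  # 2.08:1
```

So the example box is only a hair past the "twice as many vCPUs as logical processors" line asked about earlier, which is consistent with it running fine; the 5:1 scenario with monster per-VM vCPU counts is where co-scheduling pressure really shows up.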
|
# ? Feb 2, 2013 05:55 |
|
Internet Explorer posted:But if you are overcommitted 5 to 1 because each one of your virtual servers has the same number of vCPUs as your processors have cores... that may be a problem.
|
# ? Feb 2, 2013 06:04 |
|
Powdered Toast Man posted:I was under the impression that as long as you have enough physical memory, things might get really slow if you're heavily overcommited but it wouldn't necessarily tank your VMs. Generally RAM is the factor that limits a box; however, improper vCPU assignment can hinder what your host can acceptably sustain. Monitoring CPU ready and processing time is important; don't assign more than what you need or is recommended.
|
# ? Feb 2, 2013 06:41 |
|
VMware lab question. I had mine all set up perfectly, but was in a real deadspot at work with nothing to do over the past few weeks so I just deleted the entire thing and started fresh. Fun stuff. But now, I've got a few ESXi instances running, all connected through vcenter, and with openfiler playing SAN on the backend. Carved out a 200gb VMs LUN and presented it, connected it up, put my ISOs on a separate datastore, created the VM itself, ran it Aaaannnddd tiiiimmmeee ssslllooowwweeeeeddd ddoooooowwwwnnnnnnnnnn I've never seen Windows install this slowly, and I've never had this problem before, so I'm a little worried about it. I'm 20 minutes into a Server 2008 install, and it's at 8% on expanding files (not even installing). Has anyone seen this before, or have suggestions? It is not a "yo poo poo is slow" problem, it is a "yo poo poo is hosed" problem, but I can't figure out where it's hosed. ---- Problem #2 - I have moved 50gb worth of ISOs and data to a datastore which is hosted on my SAN. The files are there. The files are accessible. But where are the files? Like, I am browsing the hard drive of the lab box, and I swear to everything holy I can not find my god damned datastore. The SAN's vmdk files are a total of like 1gb. What is going on on this lab box? I am in bizarro world. MC Fruit Stripe fucked around with this message at 07:46 on Feb 2, 2013
# ? Feb 2, 2013 07:26 |
|
Repost from the cisco thread, since half the problem is virtualization-related. Maybe one of you guys has a clue. I'm once again stumped by my Nexus's unwillingness to do my bidding. The problem is that I can't get my port-channels to some new ESXi hosts to come up and stay up. Config: (identical on both switches) code:
code:
Log file: code:
ESXi config: vSwitch1 vmnic5: vmnic4: teaming:
|
# ? Feb 4, 2013 15:48 |
|
MC Fruit Stripe posted:I am browsing the hard drive of the lab box, and I swear to everything holy I can not find my god damned datastore. The SAN's vmdk files are a total of like 1gb. What is going on on this lab box? I am in bizarro world.
|
# ? Feb 4, 2013 15:50 |
|
On your host you have the nic balancing as IP HASH, right? Here's the KB on what you are doing http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004048 MC Fruit Stripe posted:VMware lab question. I had mine all set up perfectly, but was in a real deadspot at work with nothing to do over the past few weeks so I just deleted the entire thing and started fresh. Fun stuff.
First off:
Are you using software iSCSI or hardware?
Version of ESXi?
Is this the only VM on the box, or are there multiple VMs?
If it is an Openfiler SAN, did you install tools and use the proper storage adapter?
Thick or thin disks?
NFS or iSCSI? If NFS, I would suggest you assign it 1-2GB RAM and 2 vCPUs; iSCSI should be fine with 1 vCPU and 1GB RAM.
Run a conary updateall on the Openfiler box. (You can also try FreeNAS, which has some VAAI/ZFS support.)
What I would try:
Move the ISO to the local datastore and see if the time required to deploy goes down.
Check what the datastore is reporting as used, as well as the network to the datastore.
Check for high latency to each datastore.
What are the specs of what you are installing? Sometimes I'll assign more resources to a VM to build a template, then trim them down after install/updates.
Problem #2: I have no idea what you are asking here Dilbert As FUCK fucked around with this message at 16:41 on Feb 4, 2013
# ? Feb 4, 2013 16:32 |
|
Yeah:
|
# ? Feb 4, 2013 16:36 |
|
For the hell of it, have you tried taking the vSwitch off the ALL VLAN setting and specifying 739? E: Also, is it 5.0 or 5.1? And they aren't Broadcom NICs, are they? Dilbert As FUCK fucked around with this message at 17:06 on Feb 4, 2013
# ? Feb 4, 2013 16:42 |
|
I have not. It's 5.0 and Intel 10GbE NICs.
|
# ? Feb 4, 2013 17:32 |
|
Corvettefisher posted:No, over-provisioning of vCPUs is very common; it is fine to have more vCPUs than physical/logical cores, but be aware that too much over-provisioning can cause performance degradation on the host and in turn the virtual machines. You'll need to check your CPU ready and performance of virtual machines in order to find the load that your host can take. Esxtop on the two hosts I was concerned about showed most of the VMs under 1% RDY, although one is hovering at around 5%, which is supposed to be bad, I guess? It's a very high-traffic web server.
|
# ? Feb 4, 2013 17:51 |
|
Powdered Toast Man posted:Esxtop on the two hosts I was concerned about showed most of the VMs under 1% RDY, although one is hovering at around 5%, which is supposed to be bad, I guess? It's a very high-traffic web server. <1% is great, 1-3% is okay, 5+ is where the VM might be experiencing some slowdowns. How many vCPUs does it have, and how utilized are those cores (reported from within the web server, i.e. Task Manager/top)?
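A note for anyone following along in vCenter instead of esxtop: esxtop's %RDY column is already a per-world percentage, but the vCenter performance charts report CPU ready as a summation in milliseconds per sample, which has to be converted before comparing against the thresholds above. The real-time chart samples every 20 seconds; check your chart's interval if you're on a historical rollup.

```python
def cpu_ready_percent(ready_ms, interval_s=20):
    """Convert a vCenter CPU-ready summation (ms per sample interval)
    to a percentage: ready_ms / (interval_s * 1000) * 100, simplified."""
    return ready_ms / (interval_s * 10)

print(cpu_ready_percent(200))   # 1.0  -> fine
print(cpu_ready_percent(1000))  # 5.0  -> the "might be experiencing slowdowns" range
```

So a real-time chart showing a steady 1000 ms of ready time per sample corresponds to the ~5% figure being discussed here.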
|
# ? Feb 4, 2013 18:12 |
|
Here's a current view of esxtop on the host in question: (whited out areas are VM names) The server has two X5680 processors and 96GB of memory; it's a Dell R710.
|
# ? Feb 4, 2013 18:26 |
|
Here's something I'm really curious about: If Hyper-V's only intended to run Windows as dom0, and the Windows kernel has awareness of Hyper-V, why the hell isn't the hypervisor integrated into the kernel? You know, similar to KVM, probably able to get some more performance out of it all, because there wouldn't be this hypercall barrier between the hypervisor and the host OS.
|
# ? Feb 4, 2013 18:33 |
|
Powdered Toast Man posted:Here's a current view of esxtop on the host in question: Things I would check: Reduce the number of vCPUs on the VM. It sounds a bit counter-intuitive, but reducing the time the scheduler spends asking "can I process this?" can reduce the CPU ready. Enable Hot Add, which allows non-disruptive assignment of vCPUs. Enable Hyperthreading if not already done, as well as VT-d. Move it to a host with a newer/higher-performing CPU (assuming you set up EVC, this should be non-disruptive).
|
# ? Feb 4, 2013 18:44 |
|
evil_bunnY posted:Repost from the cisco thread, since half the problem is virtualization-related. Maybe one of you guys has a clue. You're using LACP with a standard vSwitch. This won't work. Change to 'channel-group X mode on'. Also make sure the switch is configured for src-dst-ip, assuming that still exists on a Nexus. src-dst-ip-port (or whatever it is) may also work. 1000101 fucked around with this message at 18:53 on Feb 4, 2013
# ? Feb 4, 2013 18:46 |
|
I'm researching dumping XenServer, as we're up for renewal, and moving over to a VMware vSphere Essentials kit. I wanted to handle the whole data-recovery piece in one shot. How is VDP in the Essentials Plus kit? Can I get away with using it with a backup NAS instead of Veeam?
|
# ? Feb 4, 2013 19:18 |