Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

chutwig posted:

I got stuck with trying to sort out the FCoE plans of somebody who had left my last company. He had armed all the servers with Intel X520-DA2s. VMware and bare-metal Linux both used software FCoE initiators with these cards and were completely unable to keep up with even modest traffic, so VMs would constantly kernel panic when FCoE on the hypervisor poo poo the bed and all their backing datastores dropped off. I ordered a couple of Emulex OneConnect cards that obviated the need for the software FCoE crap and they worked really well, right up until one of the cards crashed due to faulty firmware. At that point I'd had enough of trying to salvage the FCoE poo poo, sent everything back and bought a pair of MDSes and some Qlogic FC HBAs, and never thought about my SAN crashing again.

Moral of the story: it did make a difference for me, but don't do FCoE unless you have a Cisco engineer on-site and a company-funded expense account with the nearest boozeria.
I can't remember the last time I worked with any piece of Emulex hardware, NIC or HBA, that didn't have catastrophic, production-impacting firmware problems.

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Vulture Culture posted:

I see "DevOps" being used to mean "tool engineer," and boy oh boy, everyone's sick of hearing me go down this road again. Fencing is a quadrilateral and STONITH is a rhombus. Fencing is good. Self-fencing is good. STONITH is not a synonym for either of these things.

If you're writing really robust, generalizable code, I agree. But I get the very clear impression that you've never worked on code for a startup. You're not going to spend days or even hours working on robust reconnection logic for a service when you might toss out that message broker, or that database, or that method of handling sharding, or that language, or that entire product a week or a month down the road in response to some unanticipated business or technical need. You throw your hands up, you say "gently caress it," and let the service start over. Same thing if your services start out of order, or whatever the "race" (probably not a real race) situation is. Duct tape is the greatest invention in the history of mankind.

I've done the same thing. Service starts and a dependent upstream service isn't running yet? Exit and try again.

yeah im not gonna invest tons of man-hours attempting to correct a once-every-few-weeks issue that is solved by rebooting if it's not impacting my customers at all
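The whole fix is basically this much toy Python (the broker host/port are made-up placeholders, and it assumes systemd/supervisord/whatever restarts the process when it exits):

code:

import socket
import sys
import time

BROKER = ("broker.internal", 5672)   # hypothetical upstream dependency

def connect_or_give_up(retries=5, delay=2.0):
    for attempt in range(retries):
        try:
            return socket.create_connection(BROKER, timeout=3)
        except OSError:
            print("broker not up yet (attempt %d), sleeping" % (attempt + 1))
            time.sleep(delay)
    # give up and let the process supervisor start us over
    sys.exit("upstream never came up, exiting so we get restarted")

if __name__ == "__main__":
    conn = connect_or_give_up()
    print("connected, doing actual work now")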

DevNull
Apr 4, 2007

And sometimes is seen a strange spot in the sky
A human being that was given to fly

Ahdinko posted:

Yes thank you VMware, I was curious about the throughput and performance of NSX, it is good to know that Bridge gets 1 more receive than VXLAN, this chart has been very useful for me.
(I didn't crop any units/axis labels off, this is as it came from them)



A few years ago the performance team at VMware did a comparison of PCoIP vs. Blast (which is really just beefed-up VNC now) and they did this same type of thing. They made everything a unitless number. We asked for actual FPS information and they wouldn't give us those numbers, not even internally. They don't like to publish real numbers with performance results, because then they will be held to a standard of meeting those numbers. It is all marketing fluff.

Dr. Arbitrary
Mar 15, 2006

Bleak Gremlin
I've got a legacy application that needs XP.

It's hard to find XP keys now so I was thinking of using nested virtualization: ESXi -> Win7 -> XP

I don't know if that'll work. I also worry that if I convert the XP mode computer into a VMware vm it'll lose its licensing.

What's the right way to do this?

Gucci Loafers
May 20, 2006

Ask yourself, do you really want to talk to pair of really nice gaudy shoes?


Nested Virt is supported, so it should...

1000101
May 14, 2003

BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY FRUITCAKE!

Ahdinko posted:

Yes thank you VMware, I was curious about the throughput and performance of NSX, it is good to know that Bridge gets 1 more receive than VXLAN, this chart has been very useful for me.
(I didn't crop any units/axis labels off, this is as it came from them)



That should be in gigabits per second. Essentially, if a hardware bridge can do ~19 Gbps (split between transmit and receive) on a 10GbE port, then it can do only slightly less using VXLAN.

Pile Of Garbage
May 28, 2007



Our new CommVault Snap Backup system went haywire over the weekend when some datastores ran out of space, and actually managed to hit the VM snapshot hierarchy depth limit in vSphere:



:getin:

Wicaeed
Feb 8, 2005
Is LACP usage still only officially supported with the vSphere Enterprise license?

It's been a while since I fudged around with it, but I recall that the free (and lower-licensed) tiers never really worked "right" or something.

Or maybe I'm just retarded.

Wicaeed fucked around with this message at 17:04 on Jul 7, 2015

kiwid
Sep 30, 2013

Wicaeed posted:

Is LACP usage still only officially supported with the vSphere Enterprise license?

It's been a while since I fudged around with it, but I recall that the free (and lower-licensed) tiers never really worked "right" or something.

Or maybe I'm just retarded.

You need the Enterprise Plus license.

Cidrick
Jun 10, 2001

Praise the siamese

Wicaeed posted:

Is LACP usage still only officially supported with the vSphere Enterprise license?

It's been a while since I fudged around with it, but I recall that the free (and lower-licensed) tiers never really worked "right" or something.

Or maybe I'm just retarded.

Real 802.3ad LACP link aggregation only works if you're using distributed vSwitches, which are only available on Enterprise Plus.

Most people don't need true LACP link aggregation anyway, and the standard vSwitch active/active NIC teaming is simpler to configure. I'm not a networking expert, but I think the only benefit LACP gives you is better load balancing on the trunk, since it's considered one logical "link" and you let the magic of the protocol balance across the physical links for you.
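To make that concrete, here's a toy Python illustration (nothing VMware actually runs, names and numbers are made up) of the difference between the default pin-each-vNIC-to-an-uplink policy and an IP-hash/LACP-style policy:

code:

import zlib

UPLINKS = ["vmnic0", "vmnic1"]

def uplink_by_port_id(virtual_port_id):
    # default standard-vSwitch behaviour: every frame from this vNIC uses the
    # same uplink, no matter how many flows the VM has going
    return UPLINKS[virtual_port_id % len(UPLINKS)]

def uplink_by_ip_hash(src_ip, dst_ip):
    # IP-hash/LACP-style behaviour: the uplink is picked per src/dst IP pair,
    # so one VM's traffic can end up spread across both physical links
    key = ("%s-%s" % (src_ip, dst_ip)).encode()
    return UPLINKS[zlib.crc32(key) % len(UPLINKS)]

print(uplink_by_port_id(7))                        # always the same uplink
print(uplink_by_ip_hash("10.0.0.5", "10.0.1.10"))  # may land on either uplink
print(uplink_by_ip_hash("10.0.0.5", "10.0.1.11"))  # and this one may differ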

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Cidrick posted:

Real 802.3ad LACP link aggregation only works if you're using distributed vSwitches, which are only available on Enterprise Plus.

Most people don't need true LACP link aggregation anyway, and the standard vSwitch active/active NIC teaming is simpler to configure. I'm not a networking expert, but I think the only benefit LACP gives you is better load balancing on the trunk, since it's considered one logical "link" and you let the magic of the protocol balance across the physical links for you.
I'm going to switch to the term "aggregate" since "trunk" is overloaded and has other specific meanings in a VLAN context.

The LACP specification doesn't define any hashing or traffic distribution algorithms. These are entirely vendor-dependent on both the switch and operating system sides, and you still need to configure them to distribute traffic using a method that's appropriate for your environment (MAC hash, IP hash, etc.). As a result, LACP only very rarely gives you better load balancing on an aggregate. What it does give you is better failure handling: each side of the pair can be alerted when a physical port is going down and make topology changes automatically, rather than having to detect (often incorrectly) the down port on the other end.

The real benefit of LACP is that you don't end up with configuration mismatches between the two sides: for example, a different number of ports in the aggregate, or a condition where the aggregate isn't brought up correctly on the other end of the link.

1000101
May 14, 2003

BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY FRUITCAKE!
LACP adds some complexity but it may give you better link utilization. Just make sure you're still connecting to 2 physical switches and that they support MLAG/vPC/VSS/whatever your switch vendor provides, so you can have a switch die and not take down your ESXi hosts.

If you're still running on Cisco Catalyst without VSS then just avoid LACP and load balance on originating port ID, or use load-based teaming.

quote:

The LACP specification doesn't define any hashing or traffic distribution algorithms. These are entirely vendor-dependent on both the switch and operating system sides, and you still need to configure them to distribute traffic using a method that's appropriate for your environment (MAC hash, IP hash, etc.). As a result, LACP only very rarely gives you better load balancing on an aggregate. What it does give you is better failure handling: each side of the pair can be alerted when a physical port is going down and make topology changes automatically, rather than having to detect (often incorrectly) the down port on the other end.

While this is all technically true (the best kind of true), there are good odds your dvSwitch and the upstream switch can now agree on a pretty specific hash that'll most likely give you pretty solid distribution of flows across the bundle's member links.

As of 5.5 you can load balance based on any of these hashes (which are supported by every cisco switch I've touched):

Destination IP address and TCP/UDP port
Destination IP address, TCP/UDP port and VLAN
Source IP address and TCP/UDP port
Source IP address, TCP/UDP port and VLAN
Source and destination IP address and TCP/UDP port
Source and destination IP address, TCP/UDP port and VLAN

I pruned out everything but the most specific stuff. It's come a long way from src-dst-ip.
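Quick toy demo (not the real ESXi or Cisco hash, just crc32 over made-up fields) of why adding the L4 port matters when lots of flows run between the same two IPs:

code:

import random
import zlib
from collections import Counter

LINKS = 2
SRC, DST = "10.0.0.10", "10.0.0.20"     # e.g. one host talking to one array

def pick_link(*fields):
    return zlib.crc32("-".join(str(f) for f in fields).encode()) % LINKS

ports = [random.randint(1024, 65535) for _ in range(1000)]   # 1000 flows

ip_only = Counter(pick_link(SRC, DST) for _ in ports)
with_l4 = Counter(pick_link(SRC, DST, p) for p in ports)

print("src-dst-ip only:     ", dict(ip_only))    # every flow on one link
print("src-dst-ip + L4 port:", dict(with_l4))    # roughly a 50/50 split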

That said, if I'm using 10GbE I still don't bother with LACP since it's pretty hard to make even an ESXi host chew up a pair of 10 gig interfaces long enough for anyone to notice.

1000101 fucked around with this message at 18:30 on Jul 7, 2015

Wicaeed
Feb 8, 2005

Vulture Culture posted:

I'm going to switch to the term "aggregate" since "trunk" is overloaded and has other specific meanings in a VLAN context.

The LACP specification doesn't define any hashing or traffic distribution algorithms. These are entirely vendor-dependent on both the switch and operating system sides, and you still need to configure them to distribute traffic using a method that's appropriate for your environment (MAC hash, IP hash, etc.). As a result, it only very rarely gives you better load balancing capabilities on an aggregate -- specifically, each side of the pair can be alerted when a physical port is going down and make topology changes automatically, rather than having to detect (often incorrectly) the down port on the other end.

The real benefit of LACP is that you don't end up with configuration mismatches on both sides re:, for example, the number of ports in the aggregate or, say, some kind of condition where the aggregate isn't brought up correctly on the other end of the link.

Yeah, our Jr. networking guy keeps doing this. It's actually a bitch to troubleshoot since it doesn't always appear until you vMotion a VM from one host to another when it's the only VM on the VLAN or something stupid, ugh.

Pile Of Garbage
May 28, 2007



Wait so is the general rule on the Cisco-switch side to not use port-channels?

BangersInMyKnickers
Nov 3, 2004

I have a thing for courageous dongles

PernixData is making a free version of their ram/SSD acceleration software for VMware hosts.

http://www.pernixdata.com/declare-your-storage-independence

Only does read acceleration, and support is through the community forum, but the read caching is distributed and fault-tolerant. The paid enterprise version gets you write caching and some other stuff. It operates on a block level instead of per-VMDK like how VMware does SSD acceleration, so you basically build a big collective pool of redundant SSD (or ramdisk) between all your hosts and the software takes care of caching hot blocks and intercepting disk requests so they're satisfied either by local SSD or SSD on a peer host instead of going back to your NAS.
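My rough mental model of the read path, as toy Python rather than anything resembling their actual code (all the names here are invented):

code:

class ToyDatastore:
    """Stand-in for the NAS/SAN backing datastore."""
    def read(self, block):
        return "data-for-%s-from-the-array" % block

class HostCache:
    def __init__(self, name, backing):
        self.name = name
        self.backing = backing
        self.peers = []      # other hosts in the acceleration pool
        self.local = {}      # block -> data on local SSD/ramdisk

    def read(self, block):
        if block in self.local:              # 1. local flash/RAM hit
            return self.local[block]
        for peer in self.peers:              # 2. a peer host already cached it
            if block in peer.local:
                self.local[block] = peer.local[block]
                return self.local[block]
        data = self.backing.read(block)      # 3. miss, go back to the NAS
        self.local[block] = data
        return data

array = ToyDatastore()
host_a, host_b = HostCache("esx-a", array), HostCache("esx-b", array)
host_a.peers, host_b.peers = [host_b], [host_a]
host_a.read("block-42")          # misses everywhere, hits the array
print(host_b.read("block-42"))   # served from host_a's cache instead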

Pile Of Garbage
May 28, 2007



BangersInMyKnickers posted:

PernixData is making a free version of their ram/SSD acceleration software for VMware hosts.

http://www.pernixdata.com/declare-your-storage-independence

Only does read acceleration, and support is through the community forum, but the read caching is distributed and fault-tolerant. The paid enterprise version gets you write caching and some other stuff. It operates on a block level instead of per-VMDK like how VMware does SSD acceleration, so you basically build a big collective pool of redundant SSD (or ramdisk) between all your hosts and the software takes care of caching hot blocks and intercepting disk requests so they're satisfied either by local SSD or SSD on a peer host instead of going back to your NAS.

Lol it's SAP HANA for your entire infrastructure. Sounds entirely unscalable and clicking through their website I still can't find hard numbers as to performance. I can see a potential use-case of squeezing all possible performance from a small SMB setup but beyond that the gains are doubtful, especially since they talk about caching between host DAS.

BangersInMyKnickers
Nov 3, 2004

I have a thing for courageous dongles

One of the major hosting providers in the area runs it for all their production VDI. For bursty workloads it seems to hold up fine for what they let me touch: hundreds of sessions running with zero measurable storage latency, backed against the shittiest 12-spindle SATA Dell storage appliance they could buy. What makes you think it isn't scalable? The redundancy is against two hosts (or more, it's configurable) in the pool, not every single host.

And I can't say I'm surprised they don't publish performance numbers; it's completely based on how much money you are willing to throw at your SSD, be it SATA, SAS, PCIe, or even a PCIe ramdisk.

Pile Of Garbage
May 28, 2007



Lol come on, I've literally never heard of PernixData until now and their model of distributed cache over DAS on multiple hosts is entirely unscalable, as your interconnect requirements will scale linearly with workload and exponentially with the number of hosts. Also you can throw as much money as you want at storage but the bottleneck is always going to be interconnect and hey, you can only install so many 10Gb quad-port adapters in a host :)

ragzilla
Sep 9, 2005
don't ask me, i only work here


BangersInMyKnickers posted:

One of the major hosting providers in the area runs it for all their production VDI.

This is one of the niches that local RAM/SSD based caching tends to play best in, since on VDI you're normally using internal disks or lovely DAS, and host memory is cheap. As soon as you're running on a SAN just let the SAN take care of caching for you.

-edit-
See also CBRC, Infinio, ILIO.

ragzilla fucked around with this message at 21:26 on Jul 7, 2015

BangersInMyKnickers
Nov 3, 2004

I have a thing for courageous dongles

cheese-cube posted:

Lol come on, I've literally never heard of PernixData until now and their model of distributed cache over DAS on multiple hosts is entirely unscalable, as your interconnect requirements will scale linearly with workload and exponentially with the number of hosts. Also you can throw as much money as you want at storage but the bottleneck is always going to be interconnect and hey, you can only install so many 10Gb quad-port adapters in a host :)

Yeah, you're still misunderstanding how it works under the hood. If it's servicing a read request it will attempt to pull out of the local cache, so ideally you're at PCIe bus speed and latency and the network interface doesn't get touched. If the data isn't in the local host's cache then it looks for it on an adjacent host which may have populated those blocks and pulls it host-to-host over the SAN, so again you're talking microsecond latencies without touching the uplink to your storage appliance. If neither of those are available then it gets processed normally. I don't know how much money they are throwing around where you are for storage, but flash pool/cache expansion on our NetApp is an expensive proposition.

On write it commits to the local cache disk and (by default) one other host's cache, and those systems become authoritative for those blocks to the cluster while they are waiting to commit back to the SAN. If one of the hosts up and dies before the write cache has committed back to the SAN then the other takes over for it. Again, I'm not seeing how there is a scaling issue, especially on storage fabric bandwidth. In the environment I got to see, they were satisfying 70% of read requests directly from the PCIe bus, which cut down on SAN traffic considerably, and the host-to-host traffic was going over LAG interconnects on the storage switches that were barely being used anyhow.
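As a toy sketch of my reading of that write path (again, not vendor code): the write is acked once it's in the local cache plus one peer's cache, and gets destaged to the array later:

code:

class ToyArray:
    """Stand-in for the SAN; committed writes end up here."""
    def __init__(self):
        self.blocks = {}
    def write(self, block, data):
        self.blocks[block] = data

class WriteBackHost:
    def __init__(self, name, array):
        self.name = name
        self.array = array
        self.cache = {}      # block -> data on local flash
        self.dirty = set()   # blocks acked to the VM but not yet on the array

    def write(self, block, data, replica_peer):
        # acknowledged once it sits in the local cache plus one peer's cache
        self.cache[block] = data
        self.dirty.add(block)
        replica_peer.cache[block] = data
        replica_peer.dirty.add(block)
        return "ack"

    def destage(self):
        # asynchronous commit back to the SAN; the peer copy covers us if this
        # host dies before the commit happens
        for block in sorted(self.dirty):
            self.array.write(block, self.cache[block])
            self.dirty.discard(block)

san = ToyArray()
h1, h2 = WriteBackHost("esx-a", san), WriteBackHost("esx-b", san)
h1.write("block-7", "new data", replica_peer=h2)
print(san.blocks)    # {} -- the VM already has its ack, the SAN hasn't seen it
h1.destage()
print(san.blocks)    # {'block-7': 'new data'}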

BangersInMyKnickers
Nov 3, 2004

I have a thing for courageous dongles

ragzilla posted:

This is one of the niches that local RAM/SSD based caching tends to play best in, since on VDI you're normally using internal disks or lovely DAS, and host memory is cheap. As soon as you're running on a SAN just let the SAN take care of caching for you.

-edit-
See also CBRC, Infinio, ILIO.

It would have been incredibly useful for me about 2 years ago when management kept loving me over for storage funding, so I was stuck on year 8 of a FAS3020c that was falling over with 100ms storage latency every hour or so from bursty workloads. $10k for some PCIe cards is a drat sight easier to swallow than $200k to rip and replace end-of-life units, and I doubt I am the only person in the world who found themselves in that boat. It could also be helpful in a situation where you need to normalize storage latency because you're forced to run VDI and virtual servers against the same storage appliance.

Pile Of Garbage
May 28, 2007



BangersInMyKnickers posted:

Yeah, you're still misunderstanding how it works under the hood. If it's servicing a read request it will attempt to pull out of the local cache, so ideally you're at PCIe bus speed and latency and the network interface doesn't get touched. If the data isn't in the local host's cache then it looks for it on an adjacent host which may have populated those blocks and pulls it host-to-host over the SAN, so again you're talking microsecond latencies without touching the uplink to your storage appliance. If neither of those are available then it gets processed normally. I don't know how much money they are throwing around where you are for storage, but flash pool/cache expansion on our NetApp is an expensive proposition.

On write it commits to the local cache disk and (by default) one other host's cache, and those systems become authoritative for those blocks to the cluster while they are waiting to commit back to the SAN. If one of the hosts up and dies before the write cache has committed back to the SAN then the other takes over for it. Again, I'm not seeing how there is a scaling issue, especially on storage fabric bandwidth. In the environment I got to see, they were satisfying 70% of read requests directly from the PCIe bus, which cut down on SAN traffic considerably, and the host-to-host traffic was going over LAG interconnects on the storage switches that were barely being used anyhow.

You're assuming an entirely unsaturated interconnect. And you're assuming the best-case scenario for cache distribution, wherein the nearest neighbour has the data. Just to clarify, I'm not directly making GBS threads on this system; it can and will work for small, contained workloads spread statically across a number of hosts. However my contention is that this system is not scalable. This again comes back to my point about interconnect, which drives home a simple truth: cache ceases to be useful when its access time exceeds that of the targeted data. Scaling out a system where an intermediate storage-layer cache is bound to DAS, and where the cross-host cache traffic grows exponentially with the number of compute nodes added, is insane!

Erwin
Feb 17, 2006

cheese-cube posted:

Lol come on, I've literally never heard of PernixData until now and their model of distributed cache over DAS on multiple hosts is entirely unscalable, as your interconnect requirements will scale linearly with workload and exponentially with the number of hosts. Also you can throw as much money as you want at storage but the bottleneck is always going to be interconnect and hey, you can only install so many 10Gb quad-port adapters in a host :)

They certainly send me enough marketing emails. I'm sure they'd be happy to add you to their list.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

cheese-cube posted:

You're assuming an entirely unsaturated interconnect. And you're assuming the best-case scenario for cache distribution, wherein the nearest neighbour has the data. Just to clarify, I'm not directly making GBS threads on this system; it can and will work for small, contained workloads spread statically across a number of hosts. However my contention is that this system is not scalable. This again comes back to my point about interconnect, which drives home a simple truth: cache ceases to be useful when its access time exceeds that of the targeted data. Scaling out a system where an intermediate storage-layer cache is bound to DAS, and where the cross-host cache traffic grows exponentially with the number of compute nodes added, is insane!

The vMotion network is the interconnect. Most shops can't saturate a single 10GbE link with storage IO, so if you've got two per host for vMotion you're never going to saturate it with cache IO.

VMware clusters aren't really hugely scalable either. They can get very big, sure, but the largest container for a single workload is a single host so there isn't a ton of benefit to building very large clusters.

Also, a bunch of systems currently work this way, like XtremIO, ScaleIO, VSAN, SimpliVity, and Nutanix. Distributed caching is pretty common and there are more than enough fast, low-latency interconnect options out there to make it feasible at all but the most massive scales, and those tend to use sharding or some other form of partitioning.

YOLOsubmarine fucked around with this message at 23:13 on Jul 7, 2015

1000101
May 14, 2003

BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY FRUITCAKE!

cheese-cube posted:

Wait so is the general rule on the Cisco-switch side to not use port-channels?

It depends! I've done it both ways depending on what my virtualization hosts are running, what workloads are running on them and what my upstream switches are doing.

10GbE is a lot of bandwidth. If I'm on 1 gig I might consider it if my upstream switches support some form of MLAG.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Erwin posted:

They certainly send me enough marketing emails. I'm sure they'd be happy to add you to their list.

A couple of the popular VMware bloggers went over to Pernix a while ago too.

kiwid
Sep 30, 2013

I'm having trouble getting nested esxi hosts to work in two networks, one being the labeled vm network (with management port), and the second being a labeled iscsi network on a different vswitch. Here is a screenshot of the current setup:



10.10.10.10 is the physical/top level host and the screenshot shows the networking setup.

vswitch0 has the labeled vm network and the management port and uses both physical nics in teaming.
vswitch1 has the labeled iscsi network and also has a port in the iscsi subnet and uses no physical nics (because they aren't needed as all the networking should be in the vswitches).

The 4 vms are two nested esxi hosts, one vcenter vm, and one freenas with iscsi. Each VM has two nics, one in the labeled vm network and one in the labeled iscsi network.

The vcenter vm doesn't really need to be in the iscsi network but I have one nic in there for troubleshooting and pinging other vms. The vcenter vm can ping the freenas and vice versa.

The problem is that I don't know how to set up the nested esxi host networking. Here is a screenshot of how I envisioned it should look (.13 is esxi1 and .14 is esxi2):



Like I said, the vcenter and freenas vms can ping each other on either subnet, how come I'm having problems with the esxi vms?

edit: I can ping the management port on the nested esxi vms, just not the iscsi port.

edit2: Solved. Apparently I need to enable promiscuous mode on the physical hosts vswitches, though, I don't really know why. I saw it in this article: http://www.vladan.fr/nested-esxi-6-in-a-lab/ and it appears to have solved my networking woes. If someone wants to explain why this is required, I'm all ears.

kiwid fucked around with this message at 03:33 on Jul 9, 2015

1000101
May 14, 2003

BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY BIRTHDAY FRUITCAKE!
Enable MAC change, forged transmits and promiscuous mode for any vSwitch or portgroup that a nested ESXi host is using. That should sort you out. (make sure you do all 3 or your guests may not have connectivity)

edit:

It happens because the nested ESXi server's vnic is going to need access to traffic that isn't destined for the MAC address in its .vmx file. Whenever you create a vmk (say for management or NFS) it's going to be another virtual interface that gets its own MAC, which is going to differ from the underlying "physical" MAC.
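If it helps, here's a toy model (nothing like the real vSwitch code) of why those frames get dropped without promiscuous mode; forged transmits is the same idea in the outbound direction:

code:

class ToyVSwitchPort:
    def __init__(self, vmx_mac, promiscuous=False):
        self.vmx_mac = vmx_mac          # the only MAC the outer vSwitch knows about
        self.promiscuous = promiscuous
        self.delivered = []

    def maybe_deliver(self, frame):
        # frames only get handed to the port if the destination MAC matches the
        # .vmx MAC, unless the port is allowed to see everything
        if self.promiscuous or frame["dst_mac"] == self.vmx_mac:
            self.delivered.append(frame)

nested_esxi = ToyVSwitchPort(vmx_mac="00:50:56:aa:aa:01")
frame_for_vmk = {"dst_mac": "00:50:56:bb:bb:02", "payload": "iSCSI traffic"}

nested_esxi.maybe_deliver(frame_for_vmk)
print(len(nested_esxi.delivered))    # 0: dropped, the vmk MAC != the .vmx MAC

nested_esxi.promiscuous = True       # what that article has you switch on
nested_esxi.maybe_deliver(frame_for_vmk)
print(len(nested_esxi.delivered))    # 1: now the nested host's vmk sees it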

1000101 fucked around with this message at 04:23 on Jul 9, 2015

kiwid
Sep 30, 2013

What are you guys' opinions on Nutanix?

some kinda jackal
Feb 25, 2003

 
 
I've got a VCSA6 server at home that I'm deploying some test VMs on.

I need to get access to this from work for reasons, just so I can tweak VMs, etc.

I've forwarded https://mydynds.url.com:44481 to https://vcsa.localdomain:443 and while the initial connection works fine (I get a screen prompting me to click here to access the Web UI), once I click through it tries to redirect me to https://vcsa.localdomain:443/something/something, which I obviously won't be able to resolve externally.

Is there any way to tell VCSA to not force a URL redirect and just use the URL I came in on?

Yes, I realize that this is ridiculously insecure but I don't really have the cycles to set up a VPN home right now and I was hoping this would be a quick workaround. This isn't a long term solution, but I'd like to get this working today, if it's at all possible. So far my googling hasn't come up with much.

kiwid
Sep 30, 2013

Martytoof posted:

I've got a VCSA6 server at home that I'm deploying some test VMs on.

I need to get access to this from work for reasons, just so I can tweak VMs, etc.

I've forwarded https://mydynds.url.com:44481 to https://vcsa.localdomain:443 and while the initial connection works fine (I get a screen prompting me to click here to access the Web UI), once I click through it tries to redirect me to https://vcsa.localdomain:443/something/something, which I obviously won't be able to resolve externally.

Is there any way to tell VCSA to not force a URL redirect and just use the URL I came in on?

Yes, I realize that this is ridiculously insecure but I don't really have the cycles to set up a VPN home right now and I was hoping this would be a quick workaround. This isn't a long term solution, but I'd like to get this working today, if it's at all possible. So far my googling hasn't come up with much.

What happens if you just hit the IP address instead?

evol262
Nov 30, 2010
#!/usr/bin/perl

Martytoof posted:

I've got a VCSA6 server at home that I'm deploying some test VMs on.

I need to get access to this from work for reasons, just so I can tweak VMs, etc.

I've forwarded https://mydynds.url.com:44481 to https://vcsa.localdomain:443 and while the initial connection works fine (I get a screen prompting me to click here to access the Web UI), once I click through it tries to redirect me to https://vcsa.localdomain:443/something/something, which I obviously won't be able to resolve externally.

Is there any way to tell VCSA to not force a URL redirect and just use the URL I came in on?

Yes, I realize that this is ridiculously insecure but I don't really have the cycles to set up a VPN home right now and I was hoping this would be a quick workaround. This isn't a long term solution, but I'd like to get this working today, if it's at all possible. So far my googling hasn't come up with much.

Set up a rewriting (rewriting probably optional) reverse proxy
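Something like this toy Python is all I mean by "rewriting" (GET-only, no TLS on the outside, and it won't make the full Web Client work since that also wants websockets and the SSO redirects; the hostnames are just the ones from your post):

code:

import ssl
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

INTERNAL = "https://vcsa.localdomain"        # internal VCSA (from the post)
EXTERNAL = "https://mydynds.url.com:44481"   # external URL (from the post)
SKIP = {"connection", "transfer-encoding", "keep-alive", "content-length"}

class RewritingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        ctx = ssl._create_unverified_context()   # VCSA's cert is self-signed
        try:
            resp = urllib.request.urlopen(INTERNAL + self.path, context=ctx)
        except urllib.error.HTTPError as err:    # pass 4xx/5xx straight through
            resp = err
        body = resp.read()
        self.send_response(resp.getcode())
        for name, value in resp.headers.items():
            if name.lower() in SKIP:
                continue
            if name.lower() == "location":       # the actual "rewriting" part
                value = value.replace(INTERNAL + ":443", EXTERNAL)
                value = value.replace(INTERNAL, EXTERNAL)
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), RewritingProxy).serve_forever()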

some kinda jackal
Feb 25, 2003

 
 

kiwid posted:

What happens if you just hit the IP address instead?

Still tries to redirect to https://vcsa.localdomain:443 :[

evol262 posted:

Set up a rewriting (rewriting probably optional) reverse proxy

To be honest, if I have to set up a proxy I'll just go to the trouble of setting up the VPN instead; I was hoping there was some XML thing I could change to "yes" in one of the arcane configs on the VCSA to prevent a redirect, but it looks like it wasn't meant to be :\

bull3964
Nov 18, 2000

DO YOU HEAR THAT? THAT'S THE SOUND OF ME PATTING MYSELF ON THE BACK.


Looks like I actually tracked down my NTFS corruption issue I mentioned a few months ago. It was a VMware bug in 6.0.

Tracked it down to these error messages in my logs:

Creating cbt node 7cb4c6-cbt failed with error Cannot allocate memory (0xbad0014, Out of memory).
Could not attach vmkernel change tracker: ESXi tracking filter failed (0x143c). Disk will be opened, but change tracking info will be invalidated.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2114076

Those errors occurred at the exact minute that the first NTFS corruption showed up. Looks like CBT was borked in the initial releases of 6.0.

So if I had sVMotioned these on the 5.1 hosts, I would have had no problems. It was 6.0 that caused it all along.

kiwid
Sep 30, 2013

Is it recommended that we have at least one physical domain controller?

Also, do you guys virtualize vCenter or does that sit on a physical box as well?

Moey
Oct 22, 2010

I LIKE TO MOVE IT

kiwid posted:

Is it recommended that we have at least one physical domain controller?

Also, do you guys virtualize vCenter or does that sit on a physical box as well?

All DCs and vCenter virtual. Been doing this for like 5 years with no issues.

If you have a huge cluster, maybe pin your vCenter server to a host or two so if it goes down, you can find it faster.

BangersInMyKnickers
Nov 3, 2004

I have a thing for courageous dongles

Moey posted:

All DCs and vCenter virtual. Been doing this for like 5 years with no issues.

If you have a huge cluster, maybe pin your vCenter server to a host or two so if it goes down, you can find it faster.

Are you doing CPU reservations or putting the DC in a high CPU priority resource pool? I'm running a fairly small domain with one DC vm and one physical (physical holds all the roles) but I was considering going virtual on both once I have a second cluster at the DR site running so I'm active-active instead of active-standby. Clock drift is always a concern, but the virtual DC is hanging at .9% cpu latency so that's probably not an issue.

Moey
Oct 22, 2010

I LIKE TO MOVE IT

BangersInMyKnickers posted:

Are you doing CPU reservations or putting the DC in a high CPU priority resource pool? I'm running a fairly small domain with one DC vm and one physical (physical holds all the roles) but I was considering going virtual on both once I have a second cluster at the DR site running so I'm active-active instead of active-standby. Clock drift is always a concern, but the virtual DC is hanging at .9% cpu latency so that's probably not an issue.

Nope. While it wouldn't be a bad idea, I have lots of headroom resource-wise in my current and past environments, so it has not been an issue. DCs really just loaf along all day. I run AD DS, DNS, DHCP and NAP on my DCs. 1 vCPU and 4GB memory. Server 2012.

Docjowles
Apr 9, 2009

I run (well, ran, just changed jobs) at least one physical box for any super critical service. DNS, LDAP, DHCP etc. But that's because I was forced to use OpenStack + KVM as our virtual environment by management and the entire thing would not infrequently tip over. Let me tell you how fun it is to troubleshoot an outage when you can't loving resolve hostnames and have to dig fallback credentials out of the password manager to log in because LDAP is down.

On a properly licensed and configured VMware cluster, I wouldn't have any reservations.

evol262
Nov 30, 2010
#!/usr/bin/perl
DCs in OpenStack just seem like such a bad idea. Nova doesn't even set up libvirt auth or anything. Why don't they run DNS/LDAP/DHCP/etc. in normal VMs (on KVM if they want) with some kind of HA manager instead of dumping it in OpenStack?
