Tuesday, March 3, 2009

ESX, EMC, and Cisco

Fault tolerance is the name of the game.
Within my company we have 3 main locations with 3 data centers, the main one being at the location where I work. We have 2 ways of setting up networking for our ESX servers. The stats are as follows:

HQ
VMware ESX networking setup:
Storage - EMC NS-20 via iSCSI
Network - Cisco Catalyst 6509s w/ VSS
Hosts - Dell PowerEdge 2950s, dual/quad procs, w/ 36GB RAM
70+ guest VMs

2 of our 4 hosts have 6 NICs in them and the other 2 have 8 NICs:
2 for the VM network
2 for VMotion
2 for iSCSI
and the additional 2 in the other hosts are for DMZ VM guest access.
We connect them so that each NIC of a teamed pair resides on a different PCI card.
Within both the network config and ESX we set up teaming/bonding/trunking between the 2 NICs.
Because our 6509s are utilizing VSS, they share backplane info and can be treated as a single virtual switch. Why is this huge? Because we can take advantage of LACP (not within ESX but elsewhere) AND have chassis redundancy (a rough sketch of the VSS pairing itself follows the port configs below).
On the ESX side we use NIC team load-balancing by IP hash, which is only recommended for NICs connecting to the same switch chassis; with VSS, the pair presents itself as one chassis. LACP is not supported in ESX at the time of this writing. Below is an example of a NIC's port configuration:

interface GigabitEthernet2/1/14
 description ESX04-vmnic5-vmotion
 switchport
 switchport access vlan 180
 switchport mode access
 switchport nonegotiate
 channel-group 31 mode on
 spanning-tree portfast
 spanning-tree bpduguard enable

interface Port-channel31
 description ESX04-VMkernel
 switchport
 switchport access vlan 180
 switchport mode access
 switchport nonegotiate

interface GigabitEthernet1/2/10
 description W-IT-ESX04-vmnic0
 switchport
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport nonegotiate
 flowcontrol send off
 channel-group 32 mode on
 spanning-tree portfast trunk
 spanning-tree bpduguard enable

interface Port-channel32
 description W-IT-ESX04-vm/sc
 switchport
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport nonegotiate
 spanning-tree portfast trunk
 spanning-tree bpduguard enable

interface GigabitEthernet1/2/9
 description W-IT-ESX04-vmnic1
 switchport
 switchport access vlan 165
 switchport mode access
 switchport nonegotiate
 mtu 9216
 flowcontrol send off
 channel-group 30 mode on
 spanning-tree portfast
 spanning-tree bpduguard enable

interface Port-channel30
 description W-IT-ESX04-iSCSI
 switchport
 switchport access vlan 165
 switchport mode access
 switchport nonegotiate
 mtu 9216
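
For reference, here is roughly what the VSS pairing itself looks like; it's what lets those channel-group members live on two different chassis. This is only a sketch, and the domain number, VSL channel number, and interface are placeholders rather than our production values (switch 2 gets the mirror-image config):

switch virtual domain 100
 switch 1
 switch 1 priority 110
!
! dedicated EtherChannel carrying the virtual switch link (VSL) to the peer 6509
interface Port-channel10
 description VSL to peer 6509
 switch virtual link 1
 no shutdown
!
interface TenGigabitEthernet5/4
 description VSL member
 channel-group 10 mode on
 no shutdown

Once both chassis have their half of this, the exec command "switch convert mode virtual" merges them into the single virtual switch.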


Our shared storage is an EMC Celerra (NS-20). It has the ability to serve disk up via CIFS, NFS, and iSCSI. This unit is set up similarly to our ESX hosts in that it has an LACP channel spanning both switches for iSCSI and normal TCP/IP traffic. This setup is robust, efficient, and performs well.
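
The switch side of the Celerra's channel looks a lot like the ESX port configs above, except the members actually run LACP (mode active). A sketch only; the interface, channel number, and description are placeholders, not pulled from our running config:

interface GigabitEthernet1/3/1
 description NS-20 data mover port
 switchport
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport nonegotiate
 mtu 9216
 channel-group 40 mode active
 no shutdown

interface Port-channel40
 description NS-20 LACP channel
 switchport
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport nonegotiate
 mtu 9216

The second member lives on the other chassis (e.g. a Gi2/x/x port joining channel-group 40), which is exactly what VSS buys us.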

A frequent misconception is that through a 2-link LACP channel we have an aggregate 2Gb at our disposal. This is not entirely true. If 1 of the gig connections "fills up", data does not spill over into the other connection. Also, if a single host has a 2Gb LACP channel to the Celerra, a given conversation will always traverse whatever link it is currently using to get there. Other data flowing to/from that host will use the other link in the LACP channel. This can be important when configuring your iSCSI targets on the Celerra. We created 4 targets on the Celerra across a 2Gb LACP channel. This effectively load balances the iSCSI LUN traffic over the entire channel. Had we only used a single target for the entire channel, it would only use 1 connection. LACP is not the greatest for 1-to-1 connections; it is meant for 1-to-many and many-to-many. We chose 4 targets as a happy medium rather than creating a target for each LUN, which could add admin complexity.
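
That "one conversation sticks to one link" behavior comes from the EtherChannel hash, which picks a member link per flow instead of splitting packets across links. On the 6500 that hash is a global setting; a minimal sketch, assuming we hash on source/destination IP to line up with the IP-hash teaming on the ESX side:

! global setting - one member link is chosen per src/dst IP pair
port-channel load-balance src-dst-ip

With multiple targets, a host has several destination IPs to hash against (assuming each target answers on its own interface/IP), which is what spreads its LUN traffic across both links.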

Thus far, results have been very positive. We've had no reports of slowness or other negative observations with this setup. Network failover is near-instantaneous and lets us be as resilient as our VM guest OSes and shared storage allow.
