Wednesday, September 16, 2009

Thank you Apple!

In attempting to upgrade to Snow Leopard, the process failed horribly. Thankfully, I had backed everything up beforehand with Carbon Copy Cloner, because Time Machine doesn't work for me. I had suspected that my system had issues. Thankfully Apple knew my system was beyond repair (ok, they probably didn't know) and refused to upgrade my MacBook. They instead coerced me into reinstalling fresh from 10.5 and THEN upgrading to Snow Leopard. Even though it was a time-consuming PITA, it was probably the best solution.

The results are positive thus far. It seems quicker, though I can't say whether that's the fresh install or the upgrade. The built-in Cisco VPN client is nice. I haven't tried Time Machine yet to see if I gain that functionality. Safari is fast and stable, and so far all my apps continue to work.

All in all it's been good. The upgrade sucked eggs. If I were Joe-user, I'd be super-p!ssed, since Macs are just supposed to work. At least I'm technical and can RTFM and work my way out of it. Ole Joe will probably have to go see a "genius".

Thursday, September 3, 2009

NineInchNails, many thanks

For laying down an unreal site for your content. If you like NIN even a little bit, check out
remix.nin.com. So very good.

Friday, July 31, 2009

VMware guest optimization: Linux

In an effort to squeeze every bit of performance and efficiency out of our vSphere farm, I came across adjusting "ticks" in recent kernel releases. As far as I understand it, ticks are periodic timer interrupts the kernel uses to check for work to schedule on the CPU(s). The more the ticking, the more the checking. Frequent ticks can be favorable on desktop installations: while a compile is hammering the processor, the ticks still give the kernel regular chances to notice mouse movement and paint it fluidly across the screen.

Most server installations don't use a mouse often, so the tradeoff is easy: a little less interactivity in exchange for the kernel being able to work on jobs longer without interruption. A side effect, among many, is less power consumption, because the kernel isn't constantly nagging the processor(s). It is recommended that Linux VMs employ some sort of tick management to reduce the activity. Here is a link from VMware showing the various distros that support tick management and how to enable it.
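
For the record, on a CentOS 5 guest the change amounts to one kernel boot parameter in grub. Something like the sketch below; the exact parameter and kernel line vary by distro and release, so check the VMware doc before copying it:

# /boot/grub/grub.conf -- add divider=10 to the kernel line, which drops the
# default 1000 Hz timer down to 100 Hz (kernel version shown is just an example)
title CentOS (2.6.18-128.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/VolGroup00/LogVol00 divider=10
        initrd /initrd-2.6.18-128.el5.img

# after rebooting, compare timer interrupt counts a few seconds apart to confirm
grep -i timer /proc/interrupts; sleep 10; grep -i timer /proc/interrupts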

I changed a test CentOS VM and the results were interesting. Check out the idle CPU prior to the reboot (spiked activity) and after. Changing this on one or two VMs may not yield much in results. However, across the 15-30 VMs per host we consistently run in the farm, we can gain back some needed processing powa!


Thursday, July 23, 2009

Apple's Time Machine brought down by Alex's Voice

I have been toiling with Apple's Time Machine backup software for some time. I use a MacBook Pro as my primary workstation for work. On paper, Time Machine looks righteous. It snapshots your disk(s) periodically and lets you restore to certain points in time. YMMV depending on how much disk you have for backups.

I only got it to work once; then it died and never worked again. I couldn't find a resolution within a couple of hours, so I gave up. My Mac is becoming less and less reliable these days, it seems. I've been meaning to get it working again. My hope was that this problem had been flushed out in the forums or through updates since the last time I tried, about a year ago. NOPE! Like a Volkswagen door handle, it's still broken. I let it run and attempted to find out what files it hung up on. Lo and behold, it was Alex's voice from the speech synthesizer. I speculate it is corrupt. Interesting note found here. This is the largest file in Leopard. Nice. Incidentally, the OS weighs in at around 10 GB. After this, Time Machine appeared to run. Let's see if it works on subsequent backups or if it pukes again.
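
If you want to hunt down your own hang, this is roughly how I cornered it; the voice path is from memory, so double-check it on your box:

# watch what backupd is chewing on while Time Machine grinds away
tail -f /var/log/system.log | grep backupd

# the speech voices live under here on Leopard -- Alex is the monster
du -sh /System/Library/Speech/Voices/*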

Wednesday, July 22, 2009

VMware VDR, presently, is junk

What else can I say? VMware really messed this one up. It's released as GA, but it isn't ready for beta. It flat out doesn't work. Every aspect of it. Well, OK, I was able to successfully import the OVF and get it installed with an IP, and I added CIFS as well as local datastores. Neither works now on 1.0.1. So, I have a useless appliance in the farm.
I had high hopes for this product; hopefully they fix it fast before others start trying to deploy it in production.
Until then, VCB it is.

Thursday, June 11, 2009

Thanks H1N1!


Thanks for bringing proper hand washing techniques back to everyone's attention!

Wednesday, May 13, 2009

File serving to change as we know it

I'm convinced: deduplication is the way of the future for mass file serving, storage, backups, and the like. Dedupe is no longer bleeding edge. As I mentioned earlier, it is a must-have for IT. It started out as an option to ditch tape for backups at the block level. Now we are seeing it creep into the file sharing realm in the form of file-level dedupe. My company owns 2 of EMC's "unified storage" devices, aka Celerras. Two months ago EMC released a software upgrade that allows for compression AND file-level dedupe. Moore's law tells us we have CPU cycles to spare, in most file servers' cases, so it seems like a reasonable tradeoff: increased CPU processing for more storage capacity. In the end, a much more efficient all-around solution is realized. In a time when services are being offloaded to the cloud and servers are being consolidated, virtualized, or axed altogether, efficiency is the name of the game.

It won't be long (I predict less than a year) before we see more dedupe solutions in the file server space.

Tuesday, April 14, 2009

IT Thrill Seeking

Disclaimer: The ensuing hilarity is not condoned or suggested, merely posted for your reading pleasure.

Not all IT "professionals" are Mt. Dew drinking, dark-office-sitting, non-showering, non-documenting geeks. Some of us are inherent thrill seekers or closet adrenaline junkies, and the IT industry presents plenty of challenges, or say, opportunities for thrill fulfillment. A few examples: I am notorious for moving servers within racks "hot". I've honed my skills over time. It's not the smartest thing, but sometimes it's the only option. Just make sure you have all your bases covered: dual-homed NICs, dual power supplies with a good, long extension cord, a place to set the running server, and steady hands.



BTW, this was an ESX server housing roughly 15ish VMs, withOUT vmotion.
Got any good IT thrill seeking stories of your own? Do share please!

The Sun debacle

I've had this issue stewing for some time now and am just now posting it. I'll start by saying I like Sun. As a whole, I think Solaris is a GREAT OS when coupled with Sparc processors. Great things happen when you can develop an OS around your own hardware; Sun and Apple come to mind.
A b!tch to set up and equally painful to troubleshoot, but once you get a system dialed, you're golden, for a long time or until hardware fails or someone mucks it up. I also REALLY like ZFS. If I could butter my bread with it, wear it under my pants, or kiss it good night, I would. ZFS is a GREAT piece of software that no one knows about.

Now, here is my issue. A few years ago, I set up 2 T2000s to run as Oracle DB servers. These servers front-ended a Sun-branded StorEdge 6130. I chose to go with ZFS rather than Veritas or UFS. ZFS had only been out for about a year, but I felt the benefits were worth it and it wasn't exactly bleeding edge. So, for this setup, 3 of the 4 variables are Sun-owned. I turned off caching within the kernel for ZFS and let the RAID handle it, due to known issues. I followed every best practice article I could find to make this setup solid. And it was: it lasted almost 2 years and through 2 SAN switch failures without issue.
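
(If memory serves, that caching tweak boiled down to a couple of lines in /etc/system, roughly like the following; the ARC cap value is purely illustrative:)

* /etc/system -- ZFS bits for the Oracle boxes (values illustrative)
* don't send cache flushes to the array; the 6130's battery-backed cache handles it
set zfs:zfs_nocacheflush = 1
* cap the ARC so ZFS doesn't wrestle Oracle's SGA for memory (4 GB here)
set zfs:zfs_arc_max = 0x100000000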

Fast forward to January. We run an average-to-small-sized Oracle DB as far as enterprise DBs go; it's about 100 GB. Not crazy, right? So, our DBA adds a small 5 GB DB onto the server and BOOM GOES DYNAMITE! We get tons of CPU bottlenecking and throttling. Long story short, Oracle doesn't see an issue with the setup. Sun doesn't see any issue with the setup, yet acknowledges this could be a ZFS kernel issue. Bugs happen and I'm OK with that.

The real problem occurs with Sun's response to our issue. I basically pleaded with them to attempt to reproduce our errors. I questioned what is different about our setup from anyone else's in the world. Surely we can't be the only ones running an Oracle 10g DB on ZFS over Solaris 10u4. Little old me? I'm the only one with the gumption, guile, stupidity, or whatever to run an Oracle DB on ZFS? C'mon. Is Sun hurting that bad? Maybe I'm the only customer left. The Sun tech guy says there is no patch and it won't even be addressed in Solaris 11. WTF?!?! So they know of it, but I must be the only one with an issue. Sun tech guy: "we'll write you a patch, but you need to sign something stating you'll actually deploy it. And test it out. On production machines." HA! Good one, Sun tech guy. After literally days of arguing and pleading to get them to do something, I get a recommendation from him to go to UFS and enable directIO. Woohoo, beaten by your own OS. Way to stay with it. Prior to this incident I had nothing but great experiences with Sun support. This took the cake. We are so soured that RAC on Red Hat is the next implementation.
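
For anyone keeping score, the UFS/directIO "fix" they suggested boils down to a mount option. The vfstab entry would look something like this (device names made up):

# /etc/vfstab -- UFS with directio for the Oracle data filesystem (devices made up)
#device to mount   device to fsck      mount point   FS type  fsck  boot  options
/dev/dsk/c2t0d0s6  /dev/rdsk/c2t0d0s6  /u02/oradata  ufs      2     yes   forcedirectio,logging

# or mounted by hand
mount -F ufs -o forcedirectio,logging /dev/dsk/c2t0d0s6 /u02/oradata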

Condemnation for ZFS? No way. I still love it. I think this is an isolated incident, though I won't be deploying it in the same fashion. I'll be hard pressed to be allowed to purchase any more Sparc systems though. I'll have to settle for x86. Could be worse, I guess. I could be dealing with HP-UX.

Mo 'Green' tidbits

If you know you will idle in your car longer than 8 seconds, shut it off. That is the sweet spot for fuel waste. Concerned about prematurely killing your starter by shutting off your car all the time?
When was the last time you replaced a starter in a modern (post-carburetor) car?

Friday, April 3, 2009

Another Google WOW - Servers

One of Google's inspirational achievements:
http://news.cnet.com/8301-1001_3-10209580-92.html

There are so many cool things happening in this article.

Wednesday, March 25, 2009

Save with CFLs. Yes and no.

One thing no one ever mentions about saving money with CFLs is that you should only use them as replacements when the existing bulb burns out. If you don't, you're essentially wasting the remaining life of the existing bulb and filling landfills with working bulbs. It may seem insignificant, but it all adds up.

Riding out those bulbs until they burn out may save you enough for a Grain Belt Premium. And that is one Premo you didn't have before.

Tuesday, March 17, 2009

Gotta Have IT

It has occurred to me that there are 3 vital game changers out there.
Data deduplication
Virtualization
WAN optimization

All three are about maximizing efficiency.
Data dedupe gives you back 30 to 40 percent of your storage by finding identical blocks of data and placing pointers to them rather than writing full copies. So, with file-level dedupe, you have 1 copy of a Word document with 5 pointers referencing it rather than 6 full copies. With block-level dedupe, you save 1 version of the text "thank you" from every file, with additional pointers, rather than 8 billion copies of that string. It ain't cheap, but it pays for itself.
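
To make the file-level idea concrete, here's a toy sketch using nothing but a hash and hard links. Real dedupe gear does this transparently (and at the block level), but the "one copy, many pointers" principle is the same. The path is made up; don't run this on anything you love:

#!/bin/sh
# Toy file-level dedupe: replace byte-identical files under /data/docs
# with hard links to a single copy.
cd /data/docs || exit 1
prev_hash=""
prev_file=""
md5sum * | sort | while read hash file; do
    if [ "$hash" = "$prev_hash" ]; then
        # same bytes as prev_file: keep 1 copy, point this name at it
        ln -f "$prev_file" "$file"
    else
        prev_hash="$hash"
        prev_file="$file"
    fi
done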

Virtualization. The newest in sliced bread. Remember mainframes? The more things change, the more they stay the same. Jam a bunch of separate, functional apps in one place and gain resource, hardware, and operational efficiencies. With the average standalone server running at around 5% utilization, the one-app-per-box model is very wasteful of the aforementioned resources. The savings can be astronomical depending on the environment.

WAN optimization. This merges many technology concepts to provide reduced latency and response times across slow links through caching, compression, and other secret sauce. Response times are akin to being on a LAN.

Coolness comes at a cost, though. In the long run, however, all three will save you $.

Reduce, reduce, recycle.

....and above ALL, help someone.
I recently coordinated a donation of some of my company's outdated PCs (Pentium 3s) to a great nonprofit organization, pcsforpeople. They take in used Pentium 3s or better, fix them up if needed, and distribute them to underprivileged families that can use them.

<preach>
Think about this: something we take for granted (computers, the internet, and email), some folks have yet to even experience. Hopefully, these families can take advantage of the vast information that becomes available to them. Can you imagine a day without email or surfing? I can't. It's become a necessity for most and a privilege for some.
</preach>

You can help. I found the organization on a suggestion from a coworker. I was looking for someone who could use these computers rather than sending them off to be trashed. That is all it takes! More than likely there is someone in your area doing the very same thing as pcsforpeople. Find them.

Those in the southern Minnesota area, contact Andy Elofson at andy.elofson@co.blue-earth.mn.us or pcsforpeople.com

Tuesday, March 3, 2009

ESX, EMC, and Cisco

Fault tolerance is the name of the game.
Within my company we have 3 main locations with 3 data centers, the main one being at the location I work in. We have 2 ways of setting up networking for our ESX servers. The stats are as follows:

HQ
VMware ESX networking setup:
Storage - EMC NS-20 via iSCSI
Network - Cisco Catalyst 6509 w/VSS
Hosts - Dell PowerEdge 2950, dual/quad procs, w/ 36 GB RAM
70+ guest VMs

2 of our 4 hosts have 6 NICs in them and the other 2 have 8 NICs:
2 for the VM network
2 for vmotion
2 for iSCSI
and the additional 2 in the other hosts are for DMZ VM guest access.
We connect them so that each of the 2 teamed connections resides on a different PCI card.
Within the network config and ESX, we set up teaming/bonding/trunking between the 2 NICs.
Because our 6509s utilize VSS, we can share backplane info and treat both of them as a single virtual switch. Why is this huge? Because we can take advantage of LACP (not within ESX, but elsewhere) AND have chassis redundancy.
On the ESX side we use team load balancing by IP hash, which is only recommended for NICs connecting to the same switch chassis. LACP is not supported within ESX at the time of this writing. Below is an example of a NIC's port configuration:

interface GigabitEthernet2/1/14
description ESX04-vmnic5-vmotion
switchport
switchport access vlan 180
switchport mode access
switchport nonegotiate
channel-group 31 mode on
spanning-tree portfast
spanning-tree bpduguard enable

interface Port-channel31
description ESX04-VMkernel
switchport
switchport access vlan 180
switchport mode access
switchport nonegotiate

interface GigabitEthernet1/2/10
description W-IT-ESX04-vmnic0
switchport
switchport trunk encapsulation dot1q
switchport mode trunk
switchport nonegotiate
flowcontrol send off
channel-group 32 mode on
spanning-tree portfast trunk
spanning-tree bpduguard enable

interface Port-channel32
description W-IT-ESX04-vm/sc
switchport
switchport trunk encapsulation dot1q
switchport mode trunk
switchport nonegotiate
spanning-tree portfast trunk
spanning-tree bpduguard enable

interface GigabitEthernet1/2/9
description W-IT-ESX04-vmnic1
switchport
switchport access vlan 165
switchport mode access
switchport nonegotiate
mtu 9216
flowcontrol send off
channel-group 30 mode on
spanning-tree portfast
spanning-tree bpduguard enable

interface Port-channel30
description W-IT-ESX04-iSCSI
switchport
switchport access vlan 165
switchport mode access
switchport nonegotiate
mtu 9216
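
For completeness, the ESX side of one of those 2-NIC teams boils down to a vSwitch with two uplinks. From the service console it looks roughly like this (the port group name, VLAN number, and second vmnic are made up for the example, and the "route based on IP hash" teaming policy itself gets set in the VI client, not here):

esxcfg-vswitch -a vSwitch1                       # vSwitch for VM traffic + service console
esxcfg-vswitch -L vmnic0 vSwitch1                # first uplink (PCI card 1)
esxcfg-vswitch -L vmnic2 vSwitch1                # second uplink (PCI card 2)
esxcfg-vswitch -A "VM Network" vSwitch1          # port group for guests
esxcfg-vswitch -v 100 -p "VM Network" vSwitch1   # tag the guest VLAN (number made up)
esxcfg-vswitch -l                                # sanity check the layout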


Our shared storage is an EMC Celerra (NS-20). It has the ability to serve up disk via CIFS, NFS, and iSCSI. This unit is set up similarly to our ESX hosts in that it has an LACP channel spanning both switches for iSCSI and normal TCP/IP traffic. This setup is robust, efficient, and performs well.

A frequent misconception is that with a 2-link LACP channel, we have an aggregate 2 Gb at our disposal. This is not entirely true. If 1 of the gig connections "fills up", data does not spill over into the other connection. Also, a single host with a 2 Gb LACP channel to the Celerra will always traverse whatever link it is currently using to get there; other data flowing to/from that host will use the other link in the channel. This can be important when configuring your iSCSI targets on the Celerra. We created 4 targets on the Celerra across the 2 Gb LACP channel, which effectively load balances the iSCSI LUN traffic over the entire channel. Had we used only a single target for the entire channel, it would use only 1 connection. LACP is not the greatest for 1-to-1 connections; it is meant for 1-to-many and many-to-many. We chose 4 targets as a happy medium rather than creating a target for each LUN, which would add administrative complexity.
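
On the ESX side, the hosts just point the software initiator at the Celerra and let SendTargets discovery find all 4 targets. From memory it's something like the below (IP and vmhba number made up), so verify against the iSCSI SAN config guide for your ESX version:

esxcfg-swiscsi -e                          # enable the software iSCSI initiator
vmkiscsi-tool -D -a 10.10.165.50 vmhba32   # add the Celerra portal as a SendTargets address
esxcfg-swiscsi -s                          # rescan; the 4 Celerra targets show up as separate paths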

Thus far, results have been very positive. We've had no reports of slowness or other negative observations with this setup. Network failover is near instantaneous and allows us to be as resilient as our VM guest OSes and shared storage allow us to be.