Performance Tuning VMWare Server

Stole this from http://www.fewt.com/2008/06/performance-tuning-vmware-server-on.html for personal reference.
EDIT 2015: The link is now dead, so props to me for doing so. Damn you, Internet.

My only addition to this would be that it works perfectly on Debian as well.

This write-up aims to provide detail on building and tuning Linux and VMWare as a Virtual Server solution on 64bit Ubuntu, based on my experience building this system.

Here are the example specs for the new VMWare Server:
AMD x2 4200+ 65nm CPU
4GB PC2-6400 DDR2 RAM (Dual Channel)
2x Seagate Barracuda 7200RPM 250GB SATA 3.0Gb drives
ASRock ALiveNF6P-VSTA System Board
580W Power Supply

Operating System: Ubuntu 7.10 Server 64bit

Drives configured as follows:

/dev/sda
1 – Swap (1GB)
2 – md0 (250MB)
3 – md1 (231GB)

/dev/sdb
1 – Swap (1GB)
2 – md0 (250MB)
3 – md1 (231GB)

/dev/hda
1 – ext3 (100GB)

All MD devices are configured as RAID-1 using Linux Software Raid

Device md0 was formatted as ext3 and exists as /boot
Device md1 was added to volume group vg00; Logical Volume ROOT was then created inside it and mounted on /

LILO was used, and it is written to both sda and sdb (so the system remains bootable if sda fails)
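
For anyone reproducing this layout by hand rather than through the installer, the rough mdadm/LVM equivalent would look something like the following sketch (device and volume names are taken from the layout above; adjust to your own partitions):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
mkfs.ext3 /dev/md0
pvcreate /dev/md1
vgcreate vg00 /dev/md1
lvcreate -l 100%FREE -n ROOT vg00
mkfs.ext3 /dev/vg00/ROOT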

Using Logical Volume Manager (LVM) for / provides flexibility: I can add two more 250GB drives as md2 and grow / to 500GB with a few simple commands (vgextend, lvextend, resize2fs). In addition, LVM provides the ability to mirror or snapshot volumes. Using Software Raid instead of the integrated NVidia MediaShield provides better RAID management flexibility, and since the MCP chips have no internal RAID processor, both approaches are CPU-backed anyway, so there is really no performance gain from using the integrated controller.
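
As a rough sketch of that grow operation (assuming the two new drives come up as /dev/sdc and /dev/sdd, their mirror becomes /dev/md2, and the root volume is /dev/vg00/ROOT as above):

mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
pvcreate /dev/md2
vgextend vg00 /dev/md2
lvextend -l +100%FREE /dev/vg00/ROOT
resize2fs /dev/vg00/ROOT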

Tuning:

After building the system and installing VMWare, I started creating Virtual Machines. Something I immediately noticed was that during even small amounts of IO, wait states climbed to 100% CPU and network dropoffs occurred; connections to the web server VM were dropping, and performance was absolutely terrible even for static content. I initially thought the problem was the Software Raid devices, but I quickly identified that the problem child was actually several problem children, and that the RAID wasn’t one of them. I installed a product called monitorix and started collecting data, which was instrumental in identifying the performance bottlenecks.

Virtual Machine configuration:

When creating virtual machines on this platform I elected to use pre-allocated disks. Using pre-allocated disks reduces disk fragmentation and improves overall performance. Additionally, I always remove the floppy device. Here is an overview of the configurations (a sketch of creating a pre-allocated disk from the command line follows the list):

Generic base configuration:

Linux 32bit and 64bit VMs:
384MB RAM, 20GB pre-allocated SCSI disk, 1 Ethernet device (bridged)

Windows VMs:
256MB RAM, 8GB pre-allocated SCSI disk, 1 Ethernet device (bridged)
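
If you’d rather create the pre-allocated disk up front from the command line instead of through the console, vmware-vdiskmanager can do it; the size and path here are just examples for a hypothetical VM directory:

vmware-vdiskmanager -c -s 20GB -a lsilogic -t 2 /vm/web01/web01.vmdk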

It’s recommended by VMWare that Windows VMs be configured to use IDE; however, in my experience the virtual IDE devices use far more CPU time than the SCSI device. This is due to the level of emulation involved and the lack of I/O threading in VMWare’s IDE controller. I have to assume this is a problem with IDE in general, as it has never been very good at multithreaded I/O (one big reason it has never been used for servers). Additionally, I recommend using the LSILogic controller, as it supports multithreaded IO while the Buslogic controller doesn’t.
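
For a VM that was created with the Buslogic controller, the adapter type can be switched in its VMX file with something like the following (the guest may also need the LSILogic driver installed before it will boot from the disk):

scsi0.present = "TRUE"
scsi0.virtualDev = "lsilogic"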

One of my virtual machines was destined to become a print server, printing to a USB printer directly connected to the VMWare server. Ubuntu 7.10 doesn’t configure USBFS out of the box. This can be corrected by editing a few files:

Add to fstab:

usbfs /proc/bus/usb usbfs auto 0 0

Edit /etc/init.d/mountdevsubfs.sh, and uncomment the following lines:

#mkdir -p /dev/bus/usb/.usbfs
#domount usbfs "" /dev/bus/usb/.usbfs -obusmode=0700,devmode=0600,listmode=0644
#ln -s .usbfs/devices /dev/bus/usb/devices
#mount --rbind /dev/bus/usb /proc/bus/usb

This is done by removing the # from the front of each line. Once this is done, go ahead and run the script.

/etc/init.d/mountdevsubfs.sh start

In the Virtual Machine configuration, I needed to ensure that the printer was always connected on startup so I inserted the following configuration into that Virtual Machine’s VMX file:

usb.present = "TRUE"
usb.generic.autoconnect = "FALSE"
usb.autoConnect.device0 = "0x0000:0x0000"
usb.autoConnect.device1 = "0x04e8:0x327e"
usb.generic.skipsetconfig = "TRUE"

You can get the IDs for your devices by issuing lsusb on the VMWare Server; the command will produce output similar to the following:

Bus 002 Device 002: ID 04e8:327e Samsung Electronics Co., Ltd
Bus 002 Device 001: ID 0000:0000
Bus 001 Device 001: ID 0000:0000

Additionally, I had to blacklist usblp on the VMWare Server so the host wouldn’t claim the printer and make it unavailable to the guest.

echo "blacklist usblp" >>/etc/modprobe.d/blacklist

As you can see above, I have used the device ID for the Samsung printer, as well as the USB hub it’s connected to (0000:0000). Now, whenever the Virtual Machine restarts, it scans the USB bus on the host and automatically connects those devices. Printing now “just works” after rebooting. Of course, for it to “just work” you also need to configure CUPS or Windows printer sharing, but that is out of the scope of this article ;-).

Add each of these to /etc/vmware/config:

mainMem.useNamedFile tells VMWare where to put its temporary workspace file. This file contains the contents of Virtual Machine memory that is not in active use. By default it is placed in the directory with the virtual machine, but that can seriously impact performance, so we’ll turn it off.

mainMem.useNamedFile = FALSE

tmpDirectory is the default path for any temp files. We need to change that to be a shared memory filesystem (in RAM).

tmpDirectory="/dev/shm"

prefvmx.useRecommendedLockedMemSize and prefvmx.minVmMemPct tell VMWare whether to use a fixed-size chunk of memory or to balloon and shrink memory as needed. Since I have 4GB of memory in this “server”, I want to make sure I use a fixed chunk of memory to reduce disk IO.

prefvmx.useRecommendedLockedMemSize="TRUE"
prefvmx.minVmMemPct="100"

To tune each Virtual Machine, I installed VMWare tools and then made the following changes to each VMX file:

Set the time in the Virtual Machine to the host’s time (I use NTP on the host):

tools.syncTime = "TRUE"

When I reboot the host, I want to gracefully stop each VM instead of just powering it off:

autostop = "softpoweroff"

I don’t care about collapsing memory into a shared pool; these settings tell the VM not to share memory, which saves CPU cycles:

mem.ShareScanTotal=0
mem.ShareScanVM=0
mem.ShareScanThreshold=4096
sched.mem.maxmemctl=0
sched.mem.pshare.enable = "FALSE"

This basically performs the same action as the configuration I put in /etc/vmware/config, telling the VM to eliminate the temp files and not to balloon and shrink memory; it doesn’t hurt anything to have it in both locations:

mainMem.useNamedFile = "FALSE"
MemTrimRate = "0"
MemAllowAutoScaleDown = "FALSE"

Additionally, by default Ubuntu writes an access time stamp to every inode that’s accessed. This is pretty excessive and known to cause bottlenecks in high I/O scenarios. It doesn’t negatively impact the filesystem unless you care about access time stamps, so in each VM and on the VMWare host I add “noatime” as an option to all of my mounted disks in /etc/fstab.
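
As an example, a root filesystem entry in /etc/fstab with noatime added might look like this (the device and other options will vary per system):

/dev/mapper/vg00-ROOT / ext3 defaults,noatime,errors=remount-ro 0 1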

In order for the VMWare configuration to work properly with shared memory, you’ll need to increase the default shared memory size for tmpfs to match the amount of memory in your system. This can be done by editing /etc/default/tmpfs:

SHM_SIZE=4G

You can use ‘mount -o remount /dev/shm’ and ‘df -h’ to implement and verify the change.

Next, I edit /etc/sysctl.conf on the VMWare Server to tune the kernel to perform better as a Virtual Server, inserting the following configuration:

vm.swappiness = 0
vm.overcommit_memory = 1
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
vm.dirty_expire_centisecs = 1000
dev.rtc.max-user-freq = 1024
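
These settings are applied at boot; to load them immediately you can also run:

sysctl -p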

Lastly, I disable the tickless kernel option in kernel 2.6.22, which further reduces Virtual Machine I/O overhead by reverting to regular timer ticks, which VMWare supports better. This can be done by adding the following option to the kernel options line in /boot/grub/menu.lst or /etc/lilo.conf:

nohz=off
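
In menu.lst that just means appending the option to the end of the kernel line, for example (the kernel version and root device shown here are only illustrative):

kernel /vmlinuz-2.6.22-14-server root=/dev/mapper/vg00-ROOT ro nohz=off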

With all of these options configured, the VMWare server now performs wonderfully at under 20% host CPU utilization with 6 Virtual Machines all running various flavors of Windows and Linux.
