Channel: Virtualization – ATS Cloud

vSphere 5 Troubleshoot & Enhance Performance


I’ve compiled the following list from the VMware documentation. The suggestions in this section are not meant to be a comprehensive guide to diagnosing and troubleshooting problems in the virtual environment; rather, they cover some common problems that can be solved without contacting VMware Technical Support. At the very least, they should give you a good idea of where to start.

——————————————————————-

Solutions for Consistently High CPU Usage

Temporary spikes in CPU usage indicate that you are making the best use of CPU resources. Consistently high CPU usage might indicate a problem. You can use the vSphere Client CPU performance charts to monitor CPU usage for hosts, clusters, resource pools, virtual machines, and vApps.

Problem
- Host CPU usage is constantly high. A high CPU usage value can lead to increased ready time and processor queuing of the virtual machines on the host.
- Virtual machine CPU usage is above 90% and the CPU ready value is above 20%. Application performance is impacted.
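The CPU ready figure in the vSphere Client real-time chart is reported as a summation in milliseconds, not as a percentage. A minimal sketch of the conversion, assuming the real-time chart's 20-second sample interval and dividing by the number of vCPUs to get a per-vCPU average; the function names are hypothetical and the inputs are values you would read off the chart yourself:

```python
REALTIME_INTERVAL_S = 20  # assumed: vSphere Client real-time charts sample every 20 seconds


def ready_percent(ready_ms: float, vcpus: int = 1, interval_s: float = REALTIME_INTERVAL_S) -> float:
    """Convert a CPU ready summation (milliseconds per sample) into a per-vCPU percentage."""
    # Multiply first so that round numbers stay exact in floating point.
    return ready_ms * 100 / (interval_s * 1000 * vcpus)


def cpu_contended(usage_pct: float, ready_ms: float, vcpus: int = 1) -> bool:
    """Hypothetical check for the symptom above: usage above 90% and ready above 20%."""
    return usage_pct > 90 and ready_percent(ready_ms, vcpus) > 20


# A 2-vCPU VM reporting 10,000 ms of ready time in one 20-second sample:
print(ready_percent(10_000, vcpus=2))  # 25.0
print(cpu_contended(95, 10_000, vcpus=2))  # True
```

For chart views with longer rollup intervals, pass that view's sample interval as `interval_s` instead of the 20-second default.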

Cause
The host is probably lacking the CPU resources required to meet the demand.

Solution
- Verify that VMware Tools is installed on every virtual machine on the host.
- Compare the CPU usage value of a virtual machine with the CPU usage of other virtual machines on the host or in the resource pool. The stacked bar chart on the host’s Virtual Machine view shows the CPU usage for all virtual machines on the host.
- Determine whether the high ready time for the virtual machine resulted from its CPU usage reaching the CPU limit setting. If so, increase the CPU limit on the virtual machine.
- Increase the CPU shares to give the virtual machine more opportunities to run. The total ready time on the host might remain at the same level if the host system is constrained by CPU. If the host ready time doesn’t decrease, set CPU reservations for high-priority virtual machines to guarantee that they receive the required CPU cycles.
- Increase the amount of memory allocated to the virtual machine. For applications that cache, this decreases disk and/or network activity, which might lower disk I/O and reduce the need for the host to virtualize the hardware. Virtual machines with smaller resource allocations generally accumulate more CPU ready time.
- Reduce the number of virtual CPUs on a virtual machine to only the number required to execute the workload. For example, a single-threaded application on a four-way virtual machine benefits from only a single vCPU, but the hypervisor’s maintenance of the three idle vCPUs takes CPU cycles that could be used for other work.
- If the host is not already in a DRS cluster, add it to one. If the host is in a DRS cluster, increase the number of hosts and migrate one or more virtual machines onto the new host.
- Upgrade the physical CPUs or cores on the host if necessary.
- Use the newest version of hypervisor software, and enable CPU-saving features such as TCP Segmentation Offload (TSO), large memory pages, and jumbo frames.
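The limit check in the third step comes down to comparing the virtual machine's CPU usage with its configured limit. A rough sketch, assuming you have read both values in MHz from the VM's resource allocation settings and performance charts; the function name and the 95% tolerance that stands in for "reaching the limit" are my own assumptions, and `None` represents an unlimited VM:

```python
from typing import Optional


def limit_explains_ready(usage_mhz: float, limit_mhz: Optional[float], tolerance: float = 0.95) -> bool:
    """Return True when CPU usage is at (or within tolerance of) the VM's CPU limit.

    limit_mhz=None stands for "Unlimited"; tolerance is a hypothetical cut-off.
    """
    if limit_mhz is None:
        return False  # no limit configured, so the limit cannot cause the ready time
    return usage_mhz >= limit_mhz * tolerance


# Usage of 1,990 MHz against a 2,000 MHz limit: raise the limit before touching shares.
print(limit_explains_ready(1990, 2000))  # True
print(limit_explains_ready(1990, None))  # False
```

If this returns False, the following steps in the list (shares, then reservations) are the ones to try.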

——————————————————————————————————————————————————————-

Solutions for Memory Performance Problems

Host machine memory is the hardware backing for guest virtual memory and guest physical memory. Host machine memory must be at least slightly larger than the combined active memory of the virtual machines on the host. A virtual machine’s memory size must be slightly larger than the average guest memory usage. Increasing the virtual machine memory size results in more overhead memory usage.
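The two sizing rules above can be written as simple comparisons. A minimal sketch, assuming you have collected the active memory of each virtual machine (for example from the memory performance charts) in MB; the function names and the optional `headroom` factor, which stands in for "slightly larger", are hypothetical:

```python
from typing import Sequence


def host_memory_sufficient(host_mb: float, active_mb_per_vm: Sequence[float],
                           headroom: float = 0.0) -> bool:
    """True when host memory exceeds the VMs' combined active memory.

    headroom is a hypothetical margin (0.1 = 10%) standing in for "slightly larger".
    """
    return host_mb > sum(active_mb_per_vm) * (1 + headroom)


def vm_size_sufficient(configured_mb: float, avg_guest_usage_mb: float) -> bool:
    """True when a VM's configured memory is larger than its average guest usage."""
    return configured_mb > avg_guest_usage_mb


print(host_memory_sufficient(65_536, [20_000, 20_000, 20_000]))  # True
print(host_memory_sufficient(65_536, [20_000, 20_000, 20_000], headroom=0.1))  # False
```

These checks ignore the per-VM overhead memory mentioned above, so a passing result is a lower bound rather than a guarantee.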

Problem
- Memory usage is constantly high (94% or greater) or constantly low (24% or less).
- Free memory is consistently 6% or less and swapping occurs frequently.
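Because these symptoms are about sustained behaviour, it helps to check a series of samples rather than a single reading. A minimal sketch, assuming you have exported usage and free-memory percentages plus a per-sample "was swapping" flag from the performance charts; the function name and the readings of "constantly" and "frequently" are assumptions:

```python
from typing import List, Sequence


def memory_findings(usage_pct: Sequence[float], free_pct: Sequence[float],
                    swapping: Sequence[bool]) -> List[str]:
    """Match a series of memory samples against the problem thresholds above.

    "Constantly" is read as "in every sample" and "frequently" as "in more than
    half of the samples"; both readings are assumptions.
    """
    findings = []
    if usage_pct and all(u >= 94 for u in usage_pct):
        findings.append("usage constantly high (94% or greater)")
    elif usage_pct and all(u <= 24 for u in usage_pct):
        findings.append("usage constantly low (24% or less)")
    if free_pct and all(f <= 6 for f in free_pct) and sum(swapping) > len(swapping) / 2:
        findings.append("free memory 6% or less with frequent swapping")
    return findings


print(memory_findings([95, 97, 96], [5, 4, 6], [True, True, False]))
```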

Cause
- The host is probably lacking the memory required to meet the demand: the active memory size is the same as the granted memory size, so memory resources are not sufficient for the workload. Conversely, if active memory is constantly low, too much memory has been granted.
- Host machine memory resources are not enough to meet the demand, which leads to memory reclamation and degraded performance.

Solution
- Verify that VMware Tools is installed on each virtual machine. The balloon driver is installed with VMware Tools and is critical to performance.
- Verify that the balloon driver is enabled. The VMkernel regularly reclaims unused virtual machine memory by ballooning and swapping. Generally, this does not impact virtual machine performance.
- Reduce the memory allocated to the virtual machine, and correct the cache size if it is too large. This frees up memory for other virtual machines.
- If the memory reservation of the virtual machine is set to a value much higher than its active memory, decrease the reservation so that the VMkernel can reclaim the idle memory for other virtual machines on the host.
- Migrate one or more virtual machines to a host in a DRS cluster.
- Add physical memory to the host.
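Deciding whether a reservation is "much higher" than active memory is a judgment call. A minimal sketch, assuming a hypothetical cut-off of twice the active memory and assuming that lowering the reservation to the active memory is an acceptable target; both numbers are my own, not VMware guidance:

```python
def reservation_to_release(reservation_mb: float, active_mb: float, ratio: float = 2.0) -> float:
    """MB of reservation that could be released if it were lowered to the active memory.

    Returns 0.0 when the reservation is not "much higher" than active memory;
    ratio (2.0 = twice the active memory) is a hypothetical cut-off.
    """
    if reservation_mb > active_mb * ratio:
        return float(reservation_mb - active_mb)
    return 0.0


print(reservation_to_release(8192, 2048))  # 6144.0
print(reservation_to_release(4096, 3000))  # 0.0
```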

——————————————————————————————————————————————————————-

Solutions for Storage Performance Problems

Datastores represent storage locations for virtual machine files. A storage location can be a VMFS volume, a directory on Network Attached Storage, or a local file system path. Datastores are platform-independent and host-independent.

Problem
- Snapshot files are consuming a lot of datastore space.
- The datastore is at full capacity when the used space is equal to the capacity. Allocated space can be larger than datastore capacity, for example, when you have snapshots and thin-provisioned disks.
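The distinction between used space and allocated (provisioned) space can be captured in a few lines. A minimal sketch, assuming you read capacity, used, and provisioned space in GB from the datastore summary; the class and method names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class DatastoreUsage:
    """Hypothetical container for values read from the datastore summary."""
    capacity_gb: float
    used_gb: float
    provisioned_gb: float  # space allocated to VMs, including thin disks and snapshots

    def is_full(self) -> bool:
        # The datastore is full once used space reaches capacity.
        return self.used_gb >= self.capacity_gb

    def free_gb(self) -> float:
        return max(float(self.capacity_gb - self.used_gb), 0.0)

    def overcommit_ratio(self) -> float:
        # Above 1.0, thin disks and snapshots could eventually outgrow the datastore.
        return self.provisioned_gb / self.capacity_gb


ds = DatastoreUsage(capacity_gb=1000, used_gb=820, provisioned_gb=1500)
print(ds.is_full(), ds.free_gb(), ds.overcommit_ratio())  # False 180.0 1.5
```

An overcommit ratio above 1.0 is not a problem by itself, but it means the datastore can fill up as thin disks grow, even though it is not full today.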

Solution
- Consider consolidating snapshots to the virtual disk when they are no longer needed. Consolidating the snapshots deletes the redo log files and removes the snapshots from the vSphere Client user interface.
- If possible, provision more space to the datastore, add disks to it, or use shared datastores.

——————————————————————————————————————————————————————-

Solutions for Disk Performance Problems


Use the disk charts to monitor average disk loads and to determine trends in disk usage. For example, you might notice a performance degradation with applications that frequently read from and write to the hard disk. If you see a spike in the number of disk read/write requests, check if any such applications were running at that time.

Problem
- The value for the kernelLatency data counter is greater than 4 ms.
- The value for the deviceLatency data counter is greater than 15 ms, which indicates that there are probably problems with the storage array.
- The value for the queueLatency data counter is above zero.
- Latency spikes.
- Unusual increases in read/write requests.
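A quick way to apply these thresholds to numbers you read off the advanced disk chart or esxtop (where the corresponding per-command latencies appear as KAVG/cmd, DAVG/cmd and QAVG/cmd). The function name and message strings are hypothetical; the counter names and cut-offs are the ones listed above:

```python
from typing import List


def disk_latency_findings(kernel_ms: float, device_ms: float, queue_ms: float) -> List[str]:
    """Compare disk latency counters (in milliseconds) with the thresholds above."""
    findings = []
    if kernel_ms > 4:
        findings.append("kernelLatency above 4 ms")
    if device_ms > 15:
        findings.append("deviceLatency above 15 ms: check the storage array")
    if queue_ms > 0:
        findings.append("queueLatency above zero: I/O is queuing")
    return findings


print(disk_latency_findings(kernel_ms=1.2, device_ms=22.0, queue_ms=0.0))
# ['deviceLatency above 15 ms: check the storage array']
```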

Cause
- The virtual machines on the host are trying to send more throughput to the storage system than the configuration supports.
- The storage array is probably experiencing internal problems.
- The workload is too high and the array cannot process the data fast enough.

Solution
- If the virtual machines on the host are trying to send more throughput to the storage system than the configuration supports, check the CPU usage and increase the queue depth.
- Move the active VMDK to a volume with more spindles, or add disks to the LUN.
- Increase the virtual machine memory. This should allow for more operating system caching, which can reduce I/O activity. Note that this may also require you to increase the host memory. More memory also helps databases, which can use system memory to cache data and avoid disk access.
- Check swap statistics in the guest operating system to verify that virtual machines have adequate memory. Increase the guest memory, but not to an extent that leads to excessive host memory swapping. Install VMware Tools so that memory ballooning can occur.
- Defragment the file systems on all guests.
- Disable antivirus on-demand scans on the VMDK and VMEM files.
- Use the vendor’s array tools to determine the array performance statistics. When too many servers simultaneously access common elements on an array, the disks might have trouble keeping up. Consider array-side improvements to increase throughput.
- Use Storage vMotion to migrate I/O-intensive virtual machines across multiple datastores.
- Balance the disk load across all available physical resources. Spread heavily used storage across LUNs that are accessed by different adapters. Use separate queues for each adapter to improve disk efficiency.
- Configure the HBAs and RAID controllers for optimal use. Verify that the queue depths and cache settings on the RAID controllers are adequate. If not, increase the number of outstanding disk requests for the virtual machine by adjusting the Disk.SchedNumReqOutstanding parameter.
- For resource-intensive virtual machines, separate the virtual machine’s physical disk drive from the drive with the system page file. This alleviates disk spindle contention during periods of high use.
- On systems with sizable RAM, disable memory trimming by adding the line MemTrimRate=0 to the virtual machine’s .vmx file.
- If the combined disk I/O is higher than a single HBA’s capacity, use multipathing or multiple links.
- For ESXi hosts, create virtual disks as preallocated: when you create a virtual disk for a guest operating system, select Allocate all disk space now. The performance degradation associated with reassigning additional disk space does not occur, and the disk is less likely to become fragmented.
- Use the most current hypervisor software.
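The MemTrimRate change is a one-line edit to the .vmx file, which conventionally stores options as `key = "value"`. A small sketch that sets or replaces the option in the file's text, assuming that layout and matching the key case-insensitively (an assumption of this sketch); the helper name is hypothetical, and the file should be edited only while the virtual machine is powered off:

```python
import re


def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Set key = "value" in .vmx text, replacing an existing entry or appending a new one."""
    # Match the key at the start of a line, followed by optional spaces/tabs and "=".
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.IGNORECASE | re.MULTILINE)
    new_line = f'{key} = "{value}"'
    if pattern.search(vmx_text):
        # A function replacement avoids backslashes in new_line being treated as escapes.
        return pattern.sub(lambda _match: new_line, vmx_text)
    if vmx_text and not vmx_text.endswith("\n"):
        vmx_text += "\n"
    return vmx_text + new_line + "\n"


print(set_vmx_option('displayName = "db01"\n', "MemTrimRate", "0"), end="")
# displayName = "db01"
# MemTrimRate = "0"
```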

——————————————————————————————————————————————————————-

Solutions for Poor Network Performance

Network performance is dependent on application workload and network configuration. Dropped network packets indicate a bottleneck in the network. Slow network performance can be a sign of load-balancing problems.

Problem
Network problems can manifest in many ways:
- Packets are being dropped.
- Network latency is high.
- Data receive rate is low.

Cause
Network problems can have several causes:
- Virtual machine network resource shares are too few.
- Network packet size is too large, which results in high network latency. Use the VMware AppSpeed performance monitoring application or a third-party application to check network latency.
- Network packet size is too small, which increases the demand for the CPU resources needed for processing each packet. Host CPU, or possibly virtual machine CPU, resources are not enough to handle the load.
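The packet-size trade-off is easiest to see as packets per second: for a fixed throughput, smaller frames mean more packets, and each packet costs CPU time to process. A back-of-the-envelope sketch that ignores Ethernet preamble, inter-frame gap and header overhead; the function name is illustrative:

```python
def packets_per_second(throughput_bps: float, frame_bytes: int) -> float:
    """Packets per second needed to carry a given throughput at a fixed frame size."""
    return throughput_bps / (frame_bytes * 8)


for size in (64, 1500, 9000):
    print(size, round(packets_per_second(1_000_000_000, size)))
# 64 1953125
# 1500 83333
# 9000 13889
```

Moving from 1,500-byte frames to 9,000-byte jumbo frames cuts the packet rate about sixfold at the same 1 Gbps, which is why jumbo frames and TSO appear among the CPU-saving features in the CPU section.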

Solution
- Determine whether packets are being dropped by using esxtop or the advanced performance charts to examine the droppedTx and droppedRx network counter values. Verify that VMware Tools is installed on each virtual machine.
- Check the number of virtual machines assigned to each physical NIC. If necessary, perform load balancing by moving virtual machines to different vSwitches or by adding more NICs to the host. You can also move virtual machines to another host or increase the host CPU or virtual machine CPU.
- If possible, use vmxnet3 NIC drivers, which are available with VMware Tools. They are optimized for high performance.
- If virtual machines running on the same host communicate with each other, connect them to the same vSwitch to avoid the cost of transferring packets over the physical network.
- Assign each physical NIC to a port group and a vSwitch.
- Use separate physical NICs to handle the different traffic streams, such as network packets generated by virtual machines, iSCSI traffic, and vMotion traffic.
- Ensure that the physical NIC capacity is large enough to handle the network traffic on that vSwitch. If it is not, consider using a high-bandwidth physical NIC (10 Gbps) or moving some virtual machines to a vSwitch with a lighter load or to a new vSwitch.
- If packets are being dropped at the vSwitch port, increase the virtual network driver ring buffers where applicable.
- Verify that the reported speed and duplex settings for the physical NIC match the hardware expectations and that the hardware is configured to run at its maximum capability. For example, verify that 1 Gbps NICs have not been reset to 100 Mbps because they are connected to an older switch.
- Verify that all NICs are running in full duplex mode. Hardware connectivity issues might result in a NIC resetting itself to a lower speed or to half duplex mode.
- Use vNICs that are TSO-capable, and verify that TSO and jumbo frames are enabled where possible.
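To compare the droppedTx and droppedRx counters across virtual machines, it helps to express them as a share of all packets in the same interval. A minimal sketch, assuming you also read the matching transmitted or received packet counts for that interval; the function name is hypothetical, and esxtop's network view already reports similar percentages as %DRPTX and %DRPRX:

```python
def drop_percent(dropped: int, delivered: int) -> float:
    """Dropped packets as a percentage of all packets offered in the interval."""
    total = dropped + delivered
    return 0.0 if total == 0 else dropped * 100 / total


tx = drop_percent(dropped=0, delivered=120_000)
rx = drop_percent(dropped=1_500, delivered=148_500)
print(tx, rx)  # 0.0 1.0
```

Any non-zero value is worth investigating, since dropped packets indicate a bottleneck in the network.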


