Test Steps for Auditing Virtualization
How to review critical controls that protect the confidentiality, integrity, or availability of the environment for the supported operating systems and users that rely on the environment.
Editor's Note: IT Auditing: Using Controls to Protect Information Assets (Second Edition) authors Chris Davis and Mike Schiller (with Kevin Wheeler) provide a handbook for creating an organization’s IT audit function and for performing its IT audits. This first in a series of excerpts explores key factors for auditing your virtualization environment, including security, data protection, processes and procedures, and capacity management.
The virtualization audit covered here is designed to review critical controls that protect the confidentiality, integrity, or availability of the environment for the supported operating systems and users that rely on the environment. Each of the following steps applies to some extent; however, use your judgment to determine the depth to which you decide to take any one step. For example, an auditor reviewing high-performance environments supporting a business-critical application might spend more time asking questions and reviewing vendor-specific analysis output that verifies that the virtualized environment has the capacity and performance necessary to handle peak loads.
NOTE: This audit focuses on the hypervisor and management of the virtual environment, regardless of where the hypervisor is installed. If the hypervisor is installed as an application on another operating system, audit the underlying operating system separately using the appropriate test steps in Chapter 6, “Auditing Windows Operating Systems,” or Chapter 7, “Auditing UNIX and Linux Operating Systems.”
Note that there are several excellent hardening guides and configuration checking utilities, and we encourage the use of these tools to help provide consistency across the environment. Vendors take different approaches to shipping products. Some ship products with unnecessary services and features enabled. Others ship their products in a hardened state in which the administrator must enable additional services. Note also that many of the hardening guides have a narrow scope that focuses on the compromise of the hypervisor as opposed to ensuring that controls support business processes and objectives. Ensuring that controls support the business is the value provided by Control Objectives for IT (COBIT).
Setup and General Controls
Step 1: Document the overall virtualization management architecture, including the hardware and supporting network infrastructure.
The team responsible for managing virtualization should maintain documentation illustrating the virtualization architecture and how it interfaces with the rest of the environment. Documentation should include supported systems, management systems, and the connecting network infrastructure. This information will be used by the auditor to help interpret the results of subsequent audit steps.
How: Discuss and review existing documentation with the administrator. As applicable, verify that document structure and management are aligned with corporate standards. Verify that the entire environment, including management, storage, and network components, is properly documented.
Step 2: Obtain the software version of the hypervisor and compare with policy requirements.
Review the software version to ensure that the hypervisor is in compliance with policy. Older software may have reliability, performance, or security issues that can increase the difficulty of managing the virtualization platform(s). Additionally, disparate software versions may increase the scope of the administrator’s responsibilities as he or she attempts to maintain control over the different hypervisors and their feature, control, and administration differences.
How: Work with the administrator to obtain this information from the system and review vendor documentation. Ensure that the software is a version the vendor continues to support and does not contain widely known and patchable vulnerabilities that would bypass existing controls. Also verify that the current running version does not contain performance or reliability issues that would affect your environment. Review any mitigating factors with the administrator, such as issues that have not been fixed but are not applicable to the environment.
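The version comparison in this step can be sketched in a few lines of Python. This is only an illustration: the host names, version strings, and policy minimum are hypothetical, and the sketch assumes purely numeric dotted versions with the same number of parts, which will not fit every vendor's numbering scheme.

```python
def parse_version(text):
    """Turn a dotted version string such as '6.7.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def below_policy_minimum(inventory, minimum):
    """Return the hosts whose hypervisor version is older than the policy minimum."""
    floor = parse_version(minimum)
    return sorted(
        host for host, version in inventory.items()
        if parse_version(version) < floor
    )


# Hypothetical inventory gathered with the administrator.
inventory = {"hv-01": "6.7.0", "hv-02": "6.5.3", "hv-03": "7.0.1"}
print(below_policy_minimum(inventory, "6.7.0"))  # ['hv-02']
```

Comparing tuples of integers rather than raw strings avoids ranking "10.0.0" below "9.9.9". A version that passes this check may still be unsupported by the vendor, so the result is a starting point for the discussion with the administrator, not a conclusion.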
Step 3: Verify that policies and procedures are in place to identify when patches are available and to evaluate and apply applicable patches. Ensure that all approved patches are installed per your policy requirements.
Most virtualization vendors have regularly scheduled patch releases. You should be prepared for the scheduled releases so that you can plan appropriately for testing and installation of the patches. If all the patches are not installed, widely known security vulnerabilities or critical performance issues could exist.
How: Interview the administrator to determine who reviews advisories from vendors, including timely notifications about new vulnerabilities and zero-day attacks, what steps are taken to prepare for the patches, and how the patches are tested before being applied to the production systems. Ask to review notes from the previous patching cycle. Obtain as much information as possible about the latest patches through conversations with the administrator and review of vendor documentation, and determine the scope of the vulnerabilities addressed by the patches. Compare the available patches with the patches applied to the hypervisor. Talk with the administrator about steps taken to mitigate potential risk if the patches are not applied in a timely manner.
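Comparing available patches with applied patches amounts to a set difference. The sketch below assumes you have already collected two lists of patch identifiers (the identifiers shown are made up) and, optionally, a list of patches the administrator has formally deferred.

```python
def missing_patches(available, applied, deferred=()):
    """Return patches the vendor has released that are neither applied nor deferred."""
    return sorted(set(available) - set(applied) - set(deferred))


def deferred_but_unreleased(available, deferred):
    """Return deferral records that match no vendor patch (likely stale paperwork)."""
    return sorted(set(deferred) - set(available))


# Hypothetical patch identifiers, for illustration only.
available = ["P-101", "P-102", "P-103", "P-104"]
applied = ["P-101", "P-103"]
deferred = ["P-104"]

print(missing_patches(available, applied, deferred))  # ['P-102']
```

Each patch in the result should have either an installation plan or a documented mitigation. Each deferred patch should have a recorded reason that you can review with the administrator.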
Step 4: Determine what services and features are enabled on the system and validate their necessity with the system administrator.
Unnecessary services and features increase risk exposure to misconfigurations, vulnerabilities, and performance issues and complicate troubleshooting efforts.
How: Today’s virtualization systems range from the very simple to the extremely complex. Work closely with the virtualization administrator to discuss enabled services and their applicability to the environment. Review and evaluate procedures for assessing vulnerabilities associated with necessary services and features and keeping them properly configured and patched.
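One way to structure the conversation about enabled services is to record the business justification for each one and flag whatever has none. The following sketch assumes a hypothetical list of enabled services and a hypothetical justification register; neither comes from a specific product.

```python
def unjustified_services(enabled, justifications):
    """Return enabled services that have no recorded business justification."""
    return sorted(
        service for service in enabled
        if not justifications.get(service, "").strip()
    )


# Hypothetical data collected during the interview.
enabled = ["ssh", "snmp", "web-console", "ntp"]
justifications = {
    "ssh": "Remote administration by the virtualization team",
    "ntp": "Time synchronization for log correlation",
    "snmp": "",  # listed in the register, but no reason recorded
}

print(unjustified_services(enabled, justifications))  # ['snmp', 'web-console']
```

A blank entry is treated the same as a missing one, because a service that appears in the register without a reason has not really been validated.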
Account and Resource Provisioning and Deprovisioning
Administrative accounts in the virtual environment must be managed appropriately, as should the provisioning and deprovisioning of virtual machines.
Step 5: Review and evaluate procedures for creating administrative accounts and ensuring that accounts are created only when a legitimate business need has been identified. Also review and evaluate processes for ensuring that accounts are removed or disabled in a timely fashion in the event of termination or job change.
Effective controls should govern account creation and deletion. Inappropriate or lacking controls could result in unnecessary access to system resources, placing the integrity and availability of sensitive data at risk.
How: Interview the system administrator, and review account-creation procedures. This process should include some form of verification that the user has a legitimate need for access. Take a sample of accounts and review evidence that they were approved properly prior to being created. Alternatively, take a sample of accounts and validate their legitimacy by investigating and understanding the job function of the account owners.
Review the process for removing accounts when access is no longer needed. This process could include a component driven by the company’s human resources (HR) department providing information on terminations and job changes. Or the process could include a periodic review and validation of active accounts by the system administrator and/or other knowledgeable managers. Obtain a sample of accounts and verify that they are owned by active employees and that each employee has a legitimate business requirement for administrative access.
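The sampling and validation described above might look like the following sketch. The account records, owner identifiers, and HR list are hypothetical, and the fixed random seed is an assumption made so that the same sample can be reproduced in the audit workpapers.

```python
import random


def sample_accounts(accounts, size, seed=0):
    """Draw a reproducible random sample of administrative accounts."""
    rng = random.Random(seed)
    return rng.sample(accounts, min(size, len(accounts)))


def accounts_without_active_owner(sampled, active_employees):
    """Return account names whose owner is missing or not an active employee."""
    return sorted(
        account["name"] for account in sampled
        if account.get("owner") not in active_employees
    )


# Hypothetical administrative accounts and HR list of active employees.
accounts = [
    {"name": "adm-jlee", "owner": "jlee"},
    {"name": "adm-mroy", "owner": "mroy"},
    {"name": "adm-backup", "owner": None},
]
active_employees = {"jlee"}

sampled = sample_accounts(accounts, size=3)
print(accounts_without_active_owner(sampled, active_employees))
# ['adm-backup', 'adm-mroy']
```

An account flagged here is not automatically a finding. A shared or service account may be legitimate, but it should still have a named, accountable owner.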
Step 6: Verify the appropriate management of provisioning and deprovisioning new virtual machines, including appropriate operating system and application licenses.
Written policies should govern the process used to create new virtual machines, manage users, and allocate software licenses. The ease of spinning up new servers for development and testing has created a new challenge for managing hardware and license resources.
Policies or procedures should also exist for “cleaning up” or removing virtual machines, rights, and licenses that are no longer needed when a project is completed. Failure to manage virtual host allocation could unnecessarily expend virtualization capacity and software licenses.
Virtual machines should be accountable to specific groups or users. Failure to govern rights management may allow users that should no longer have access to hosts to maintain inappropriate levels of access.
How: Discuss policies and procedures for provisioning and deprovisioning new hosts and accounts with the virtualization administrator, including license allocation, user management, and host ownership. Several tools help manage this process, particularly in development environments where server sprawl tends to become a problem. For example, VMware’s Lab Manager allows the provisioning administrator to set time limits for how long a virtual machine can be active. This control keeps the environment from being overrun by virtual machines that consume resources needed by other virtual machines.
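Where no such tool is in place, a simple inventory review can surface the same problems. This sketch assumes a hypothetical inventory export in which each virtual machine has a name, an owner, and an optional expiry date; it does not use any vendor API.

```python
from datetime import date


def flag_virtual_machines(vms, today):
    """Return (name, reasons) pairs for VMs with no owner or an expired lease."""
    findings = []
    for vm in vms:
        reasons = []
        if not vm.get("owner"):
            reasons.append("no owner")
        expires = vm.get("expires")
        if expires is not None and expires < today:
            reasons.append("lease expired")
        if reasons:
            findings.append((vm["name"], reasons))
    return findings


# Hypothetical inventory export.
vms = [
    {"name": "dev-01", "owner": "team-a", "expires": date(2011, 2, 1)},
    {"name": "dev-02", "owner": "", "expires": date(2011, 6, 1)},
    {"name": "dev-03", "owner": "team-b", "expires": None},
]

print(flag_virtual_machines(vms, today=date(2011, 3, 1)))
# [('dev-01', ['lease expired']), ('dev-02', ['no owner'])]
```

A virtual machine with no expiry date is not flagged in this sketch, on the assumption that production hosts are long-lived. A stricter policy for development environments could treat a missing date as a finding.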
Virtual Environment Management
The virtual environment must be managed appropriately to support existing and future business objectives. Resources must be monitored and evaluated for capacity and performance. Resources must also support the organization’s Business Continuity/Disaster Recovery objectives.
Step 7: Evaluate how hardware capacity is managed for the virtualized environment to support existing and future business requirements.
Business and technical requirements for virtualization can change quickly and frequently, driven by changes in infrastructure, business relationships, customer needs, and regulatory requirements. The virtualization hardware and infrastructure must be managed to support existing business needs and immediate anticipated growth. Inadequate infrastructure places the business at risk and may impede critical business functions that need more hardware capacity.
How: Virtual machine capacity is managed by the hypervisor to allocate a specific amount of storage, processor, and memory to each host. Verify that capacity requirements have been documented and that customers have agreed to abide by them. Capacity allocation may directly affect performance. Review processes for monitoring capacity usage for storage, memory, and processing, noting when they exceed defined thresholds. Evaluate processes in place for responding and taking action when capacity usage exceeds customer-approved thresholds. For example, some organizations utilize cloud bursting to offload increases in demand for internal computing capacity, whereby a service provider makes additional capacity available as needed. Discuss the methods used to determine present virtualization requirements and anticipated growth. Review growth plans with the administrator to verify that the hardware can meet the performance requirements, capacity requirements, and feature requirements to support infrastructure and business growth.
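The threshold comparison in this step is simple arithmetic: utilization is usage divided by capacity, compared against the agreed limit. The figures and thresholds below are hypothetical and stand in for whatever the customer has approved.

```python
def over_threshold(usage, capacity, thresholds):
    """Return the resources whose utilization exceeds the agreed threshold."""
    findings = {}
    for resource, limit in thresholds.items():
        utilization = usage[resource] / capacity[resource]
        if utilization > limit:
            findings[resource] = round(utilization, 2)
    return findings


# Hypothetical cluster figures and customer-approved thresholds.
usage = {"storage_gb": 9000, "memory_gb": 384, "cpu_cores": 40}
capacity = {"storage_gb": 10000, "memory_gb": 512, "cpu_cores": 64}
thresholds = {"storage_gb": 0.80, "memory_gb": 0.85, "cpu_cores": 0.75}

print(over_threshold(usage, capacity, thresholds))  # {'storage_gb': 0.9}
```

A single snapshot like this shows only the current position. The growth review described above needs the same calculation repeated over time so that the trend toward a threshold is visible before it is crossed.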
Step 8: Evaluate how performance is managed and monitored for the virtualization environment to support existing and anticipated business requirements.
Virtualization performance of the infrastructure as a whole and for each virtual machine is driven by many factors, including the physical virtualization media, communication protocols, network, data size, CPU, memory, and storage architecture. Inadequate virtualization infrastructure places the business at risk of losing access to critical business applications. It’s possible to have adequate capacity but incorrectly configured and underperforming virtual machines that fail to deliver on the Service Level Agreement (SLA).
How: Verify that regular periodic performance reviews of the processor, memory, and bandwidth loads on the virtualization architecture are performed to identify growing stresses on the architecture. A common performance measurement for virtual environments is based on Input/Output Operations Per Second (IOPS). Verify that performance requirements have been documented and that customers have agreed to abide by them. Review processes for monitoring performance and noting when performance falls below defined thresholds. Evaluate processes in place for responding and taking action when performance falls below customer-agreed thresholds. Discuss the methods used to determine present performance requirements and anticipated changes.
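When reviewing monitoring output, it can help to express performance as the share of measurements that fell below the agreed floor. The sketch below assumes a hypothetical series of IOPS samples and a hypothetical customer-agreed minimum.

```python
def breach_ratio(samples, floor):
    """Return the fraction of samples that fell below the agreed floor."""
    if not samples:
        raise ValueError("no samples to evaluate")
    breaches = sum(1 for sample in samples if sample < floor)
    return breaches / len(samples)


# Hypothetical IOPS measurements taken at regular intervals.
iops_samples = [5200, 4800, 3900, 5100]
agreed_floor = 4000

print(breach_ratio(iops_samples, agreed_floor))  # 0.25
```

Whether one breach in four samples is acceptable depends on how the agreement is worded, which is one reason to confirm that the performance requirements are documented.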
NOTE: A review of capacity management and performance planning is essential to this audit. Be careful to ensure that the administrator has a capacity management plan in place and verifies that performance needs are appropriate for the organization.
Step 9: Evaluate the policies, processes, and controls for data backup frequency, handling, and offsite management.
Processes and controls should meet policy requirements, support Business Continuity/Disaster Recovery (BC/DR) objectives, and protect sensitive information. Data backups present monumental challenges for organizations, particularly when it comes to the central data repositories in the organization, namely the databases and virtualization platforms. Vendors offer several solutions to manage the frequency, handling, and offsite delivery of data and system backups. The implemented solution should be appropriate to meeting the stated goals of the BC/DR plans.
How: Review policy requirements for meeting Recovery Point Objectives (RPOs), which affect how much data might be lost from a disaster, and Recovery Time Objectives (RTOs), which affect how long it will take to restore data after a disaster occurs. The RPOs and RTOs, shown in Figure 11-3, for virtualized hosts should be aligned with the BC/DR programs. Discuss the relative priority to other systems based on business criticality and dependencies. Verify that an appropriate Service Level Agreement (SLA) is in place that supports your stated RPOs and RTOs if part of this process is outsourced or handled by another party. You should also ensure that sensitive data is encrypted prior to offsite storage.
Figure 11-3: Recovery Point Objective and Recovery Time Objective
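An RPO can be tested against backup records directly: if the most recent successful backup is older than the RPO, a disaster now would lose more data than policy allows. The sketch below assumes a hypothetical backup log that gives, for each host, the time of the last successful backup and its RPO in hours.

```python
from datetime import datetime, timedelta


def rpo_violations(backups, now):
    """Return hosts whose last successful backup is older than their RPO.

    backups maps a host name to (last successful backup time, RPO in hours).
    """
    late = []
    for name, (last_backup, rpo_hours) in backups.items():
        if now - last_backup > timedelta(hours=rpo_hours):
            late.append(name)
    return sorted(late)


# Hypothetical backup records.
now = datetime(2011, 3, 1, 12, 0)
backups = {
    "erp-db": (datetime(2011, 3, 1, 7, 0), 4),       # 5 hours old, RPO 4 hours
    "file-01": (datetime(2011, 2, 28, 20, 0), 24),   # 16 hours old, RPO 24 hours
}

print(rpo_violations(backups, now))  # ['erp-db']
```

This checks only the recovery point. The RTO can be verified only by timing an actual restore test, so ask for evidence of the most recent one.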
Step 10: Review and evaluate the security of your remote hypervisor management.
Secure remote hypervisor management protects the hypervisor from remote attacks that might otherwise disrupt the hypervisor or hosted virtual machines. Each of the hypervisors has its own management tools designed to allow remote administration of the hypervisor and virtual machines. Many of these commercial tools can manage other commercial hypervisors in an effort to manage heterogeneous virtual environments seamlessly. Despite their obvious differences, the areas that should be reviewed have some commonalities.
Unused services, accessible APIs, and installed applications may subject the hypervisor to additional attack vectors if a security flaw is discovered. In addition, remote users should be forced to access the system using accounts that can be tied to a specific user for logging and tracking. The difference between this step and step 4 is the careful analysis of network-accessible components for the hypervisor with regard to remote management. Unless specifically required and appropriately controlled, network-accessible features should not be enabled. Enable only those components that are necessary and appropriately configured for remote management.
How: Each vendor provides specific security guides for enabling remote management. These security guides are generally easy to read and should be reviewed in detail prior to beginning the audit. The execution of this step consists of a policy review, account permissions review, and a configuration review.
Review remote access policies and access methods with the administrator. Verify that all remote access is logged to a system separate from the environment. Question the need for any clear-text communications used for remote access. Identify and validate the appropriateness of administrative accounts that have remote access.
NOTE: The use of secure protocols is particularly important in a DMZ and other high-risk environments. It is also advisable to use secure protocols for management on internal networks to minimize internal attack vectors. Attackers will use a single compromised beachhead system to learn about the environment, pivot, and attack other systems from within.
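A quick way to raise the clear-text question is to compare the ports open on management interfaces with the conventional ports of unencrypted protocols. The list of open ports below is hypothetical, and port numbers are only conventions, so a match prompts a question for the administrator rather than proving a finding.

```python
# Conventional ports for protocols that transmit credentials or data unencrypted.
CLEARTEXT_PORTS = {21: "FTP", 23: "Telnet", 80: "HTTP"}


def cleartext_services(open_ports):
    """Return the open ports that conventionally carry clear-text protocols."""
    return {
        port: CLEARTEXT_PORTS[port]
        for port in sorted(open_ports)
        if port in CLEARTEXT_PORTS
    }


# Hypothetical ports found open on a hypervisor management interface.
open_ports = [22, 23, 80, 443, 902]

print(cleartext_services(open_ports))  # {23: 'Telnet', 80: 'HTTP'}
```

The reverse also holds: a clear-text service moved to a nonstandard port will not appear here, so this check complements the configuration review and does not replace it.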
Obtain the appropriate vendor guidance for configuring secure remote hypervisor access. Use this guidance to identify and verify that the environment is securely configured for remote access. This process can be conducted manually, but we highly recommend using one of the several available configuration checking tools. For example, the tool developed by Tripwire and VMware verifies the following settings, which may also assist you with other parts of this audit:
Virtual network labeling
Port Group settings
Network isolation for VMotion and iSCSI
NIC Mode settings / Layer 2 Security settings
MAC address parameters
VMware ESX Service Console security settings
SAN resource masking and zoning
Disk partitioning for Root File System
VirtualCenter database configuration
Excerpted from IT Auditing: Using Controls to Protect Information Assets; copyright 2011 by The McGraw-Hill Companies. Used by permission of McGraw-Hill.