What Is Virtualization?
Many software programs, Microsoft Word for example, are designed to be installed and operated on a single computer. Only your computer can use the software programs that are installed on your computer’s drive. Under this model, two people using Microsoft Word would require separate installations of the program on two separate computers.
Virtualization, on the other hand, allows a single installation of a software program on a single computer (running as a server) to run multiple instances of that program. The software thereby becomes available to multiple people via the Internet, without any of them having to install it on their own computers.
As discussed by Peng Li (2010), there are several models for virtualization. The virtualization model adopted by the PAVEL project, discussed below, corresponds most closely to the “Centralized Virtual Labs Powered by Desktop Virtualization” model described by Li, though in the PAVEL model, individual applications are virtualized and run from a virtualized server, allowing multiple instances of the same application to be run from one Operating System (OS). This model allows for the greatest efficiency in terms of both performance and scalability, while still allowing for the installation of multiple operating systems. The model used by the PAVEL project is also ideal for running high-end, expensive software on the many different kinds of computers students may have.
Benefits of virtualization
- Software can be virtualized once and used by many people simultaneously.
- A user can install a Remote Desktop Client and run any piece of software for which a Remote Desktop Protocol (RDP) file has been created.
- Since the processing is done remotely on the server, users with older and less powerful computers can run more resource intensive applications.
- It allows an instructional computing model that is not reliant on a physical lab space. Students and faculty can therefore access the virtual lab from anywhere and at any time via an Internet connection.
Drawbacks of virtualization
- Virtualized programs can operate more slowly than locally installed programs during resource-intensive activities. This was evident when students used GoldenThread, a software tool for assessing, verifying, and communicating image digitization quality standards, to analyse scanned images. Using the hypervisor (see below) to allocate additional resources to a virtual server may help alleviate such limitations.
- Since virtualization adds a layer of complexity, virtualized applications may need some extra configuration to run properly on a virtualized server. Such extra effort would not be necessary in the case of a non-virtualized local installation onto a single computer.
PAVEL Project’s Virtualization Approach
Server side (physical and virtual)
The following illustration gives an overview of the approach to virtualization taken by the PAVEL project. Individual components of this illustration will be discussed in the following paragraphs.
All of the virtualized software used by the PAVEL project is installed on a single Dell physical server.
This physical server is set up like an extremely powerful personal computer. For example, while your personal computer may have 2 CPUs (dual core), the PAVEL server uses 12 CPUs (2 x 6 core). Like a personal computer, the physical server has CPUs, RAM, and hard drives. We’ll look at each of these in turn.
The CPU, or central processing unit, is the component of the system that carries out the instructions, or logic, of each piece of software. Many current personal computers will use between one and four CPUs, depending on the processing power needed by the user. Since our physical server will be used to run multiple virtual machines, we will need more than what is normally available in a personal computer.
Currently the PAVEL server is using 12 CPUs, set up in a 2 x 6 core configuration. The best cost-performance value to obtain the necessary processing power for the PAVEL system is currently the AMD Opteron Six-Core CPU. Two of these six-core (or “hexa-core”) CPUs give our server a total of 12 CPUs (at 2.593 GHz each).
RAM, or random access memory, is the component of the system that allows data to be accessed in any order. This is the memory that stores data and instructions when a user is interacting with the device in real time. At present, the PAVEL system is configured to use 32 GB of standard dual-ranked 800 MHz DIMMs (dual in-line memory modules) for RAM. This currently provides the best cost-performance value.
The hard drives, or data storage, of the system are where software programs, operating systems, and application data and information are physically stored. This type of storage is organized in a hierarchical structure, as in a directory tree.
For virtualization purposes, we suggest using at least four hard drives in a RAID (redundant array of independent disks) implementation. RAID technology allows multiple disks to be accessed as a single drive by the operating system. Different RAID configurations are available, each with a different balance between data redundancy and input/output speeds.
The PAVEL server’s six hard drives are configured as follows:
- Two 160 GB SATA drives in a RAID1 configuration. In RAID1, data is written to both drives, producing a set of identical drives. This is where the VMware OS (the hypervisor; see next paragraph) is stored. Because there is little read/write activity on this volume, RAID1 is used: it offers the best redundancy when read/write speed is not a concern.
- Four 1 TB SATA drives in RAID10 configuration. The RAID10 configuration combines the features of the RAID1 configuration (identical drives) with the features of RAID0 configuration. RAID0 features a striped set of drives, in which chunks of data from one file are written sequentially to separate drives, providing faster access than a single drive could. This is the main datastore for the virtualized machines, as well as the software they will run and the data to run them. RAID10 is the configuration for these hard drives, because it offers the fastest read/write access while still maintaining redundancy.
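The capacity trade-off between these configurations can be worked through with a little arithmetic. The following Python sketch is illustrative only and is not part of the PAVEL setup: the function name `usable_capacity_gb` is hypothetical, the drives in each set are assumed to be identical, and 1 TB is treated as 1000 GB.

```python
def usable_capacity_gb(level: str, drive_count: int, drive_size_gb: float) -> float:
    """Return the usable capacity of a RAID set built from identical drives."""
    if level == "RAID0":
        # Striping only: every drive contributes its full capacity.
        return drive_count * drive_size_gb
    if level == "RAID1":
        # Mirroring: every drive holds the same data, so only one drive's
        # worth of space is usable.
        if drive_count < 2:
            raise ValueError("RAID1 needs at least two drives")
        return drive_size_gb
    if level == "RAID10":
        # A striped set of mirrored pairs: half of the raw capacity is usable.
        if drive_count < 4 or drive_count % 2 != 0:
            raise ValueError("RAID10 needs an even number of drives, at least four")
        return drive_count * drive_size_gb / 2
    raise ValueError(f"unsupported RAID level: {level}")


# The two drive sets described above (1 TB assumed to be 1000 GB).
hypervisor_volume_gb = usable_capacity_gb("RAID1", 2, 160)
datastore_volume_gb = usable_capacity_gb("RAID10", 4, 1000)

print(f"Hypervisor volume (RAID1): {hypervisor_volume_gb:.0f} GB usable")
print(f"Datastore volume (RAID10): {datastore_volume_gb:.0f} GB usable")
```

Under these assumptions, the mirrored pair provides 160 GB of usable space for the hypervisor, and the RAID10 set provides about 2 TB for the datastore, which is half of its 4 TB of raw capacity.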
A hypervisor, software that is used to manage virtual servers, is installed on this physical server. There are many brands of hypervisors available. The PAVEL project selected the VMware vSphere Hypervisor (also known as VMware ESXi) based on consultation with our computing staff.
Instead of installing server software, such as Windows Server 2008, directly onto the physical server, the server software is installed through the hypervisor. This installation through the hypervisor enables the physical server to deliver programs virtually to multiple users. The hypervisor can also be used to control user permissions for accessing the virtualized programs as well as other settings, such as how much processing power (CPUs), RAM, and hard drive space to allocate to a virtualized server.
These virtual servers, enabled through a hypervisor, can be accessed remotely and operated in basically the same way as non-virtualized servers.
On the PAVEL project server, the hypervisor is set up to create four virtual machines, with resources allocated through the hypervisor. These virtual machines are configured differently in order to meet the requirements of the various applications deployed through the grant.
- “NEH528” Server: 4 virtual CPUs, 4 GB RAM, 50 GB data storage, Windows 7 OS
  - Software installed: Jacksum (Checksum), JHOVE, JHOVE2, BagIt
- “NEH675” Server: 4 virtual CPUs, 4 GB RAM, 110 GB data storage, Windows Server 2008 R2 OS
  - Software installed: GoldenThread, Object Level Database, Device Level Database
- “PAVEL” Server: 4 virtual CPUs, 4 GB RAM, 132 GB data storage, Windows Server 2008 R2 OS
  - Software installed: Oxygen XML, Altova XMLSPY, Archivists Toolkit Client, Archivists Toolkit Maintenance Program, MarcEdit
- “TITANIA” Server: 4 virtual CPUs, 16 GB RAM, 214 GB data storage, Red Hat Enterprise Linux 5 OS
  - Software installed: Archon, ICA AtoM, MySQL server, Collective Access - Providence, Collective Access - Pawtucket
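To see how these allocations relate to the physical server, the figures listed above can be totalled and compared with the host's resources. The Python sketch below is only an illustration: the `VirtualMachine` class and the variable names are hypothetical, and the datastore size assumes that the four 1 TB drives in RAID10 provide roughly 2000 GB of usable space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualMachine:
    name: str
    vcpus: int
    ram_gb: int
    storage_gb: int


# Allocations as listed for the four PAVEL virtual machines.
VMS = [
    VirtualMachine("NEH528", vcpus=4, ram_gb=4, storage_gb=50),
    VirtualMachine("NEH675", vcpus=4, ram_gb=4, storage_gb=110),
    VirtualMachine("PAVEL", vcpus=4, ram_gb=4, storage_gb=132),
    VirtualMachine("TITANIA", vcpus=4, ram_gb=16, storage_gb=214),
]

# Physical host resources described earlier in this section.
HOST_CORES = 12      # two six-core processors
HOST_RAM_GB = 32
DATASTORE_GB = 2000  # assumption: usable space of four 1 TB drives in RAID10

total_vcpus = sum(vm.vcpus for vm in VMS)
total_ram_gb = sum(vm.ram_gb for vm in VMS)
total_storage_gb = sum(vm.storage_gb for vm in VMS)

print(f"Virtual CPUs: {total_vcpus} allocated, {HOST_CORES} physical cores")
print(f"RAM:          {total_ram_gb} GB allocated of {HOST_RAM_GB} GB")
print(f"Storage:      {total_storage_gb} GB allocated of {DATASTORE_GB} GB")
```

The four machines are allocated 16 virtual CPUs against 12 physical cores, which is possible because a hypervisor schedules virtual CPUs onto physical cores rather than dedicating one core to each. The 28 GB of allocated RAM and 506 GB of allocated storage fit within the 32 GB and roughly 2 TB available.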
Once enabled, each virtual server can be used to install programs that can be accessed by remote users. For example, once the Archivists Toolkit has been installed on a virtual machine (in this case, the PAVEL server), it can be made available to all users who have been granted permission to access this virtual server. In the PAVEL project’s implementation, the instructor passes a list of student login names to our computing staff, who then enter the names into the Active Directory software to grant the students access.
Figure 2: Virtualized software running (via RDP) on a Mac.
Once the software program (in our example, Archivists Toolkit) is installed on the virtual server, RemoteApp Manager software is used to create an RDP (Remote Desktop Protocol, or .rdp) file for that specific program. RDP is a protocol developed by Microsoft that gives the user a graphical user interface to another machine running remotely (see Fig. 2). The Microsoft RemoteApp software generates the RDP file as a text file containing the correct settings for the connection made by the Remote Desktop Client software (see next section). Windows versions of RDP files must be TXT files, while their Mac counterparts must be converted to XML files. This conversion is achieved by opening the TXT file on the Mac and adjusting some of the settings stored in the file so that they work properly on a Mac, such as adding the name of the Active Directory server used to grant access. Users can then download the RDP file and store it on their personal computer so that they can access the program (Archivists Toolkit) from the virtual server. The RDP file can be made available for download online to ease and standardize access for multiple users.
Figure 3: Example RDP files for project software applications for Windows and Mac operating systems
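To give a sense of what the Windows (plain text) form of such a file contains, the Python sketch below writes out a few `name:type:value` settings of the kind found in RemoteApp RDP files. It is a simplified illustration, not the output of RemoteApp Manager: files generated by that tool contain many more settings, and the function name `build_remoteapp_rdp`, the server address `pavel.example.edu`, and the application alias `ArchivistsToolkit` are hypothetical placeholders. The Mac (XML) form is not covered here.

```python
def build_remoteapp_rdp(server: str, app_alias: str, app_name: str) -> str:
    """Build the text of a minimal RemoteApp .rdp file (Windows text form)."""
    settings = [
        # Host name of the (virtual) server to connect to.
        ("full address", "s", server),
        # 1 = launch a single published program rather than a full desktop.
        ("remoteapplicationmode", "i", "1"),
        # Alias of the published program, prefixed with "||".
        ("remoteapplicationprogram", "s", f"||{app_alias}"),
        # Display name shown to the user.
        ("remoteapplicationname", "s", app_name),
    ]
    # Each setting is written on its own line as name:type:value.
    return "\n".join(f"{name}:{kind}:{value}" for name, kind, value in settings) + "\n"


# Hypothetical values for illustration only.
rdp_text = build_remoteapp_rdp(
    server="pavel.example.edu",
    app_alias="ArchivistsToolkit",
    app_name="Archivists Toolkit",
)
print(rdp_text, end="")
```

Saving this text with an .rdp extension would produce a file that a Remote Desktop Client can open, although in practice the file generated by RemoteApp Manager for the actual server should be used.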
To access the virtualized program, users will need to download the RDP file illustrated in Fig. 3 above. They will also need to install a Remote Desktop Client (RDC) program that can run the RDP file. For example, Windows computers would use the Remote Desktop Client program, which comes pre-installed on Windows XP and later. Separate client programs can be downloaded and installed on Mac and Linux computers.
Once the RDP file and the Remote Desktop Client program are in place, users can click on the RDP file, log into the server (see Fig. 5), and access the virtualized application. The application will appear and function basically as it would if it were installed on the user’s own computer. For example, a Mac user logging into the “PAVEL” server will see the following:
Figure 5: The server login screens for a user running Microsoft RDC on a Mac.
From the user’s perspective, accessing and using software on a virtualized system is no different from accessing it on a traditional physical server. As can be seen in Fig. 4 above, when the user logs into and accesses software on, for instance, the “PAVEL” virtualized server, it appears to the user that he or she is simply accessing a physical server named “PAVEL.”