borwell have recently installed a multi-server cluster to host our internal IT services. This allows the company to run a large number of virtual machines across multiple hosts, whilst reducing downtime and increasing resiliency. We use a number of features to increase uptime, including:
* High Availability Clustering
* Automated Failover
* Multi-Path IO (MPIO)
* Cluster Aware Updating
In the event of a hardware failure, virtual machines automatically resume on another server, ensuring that essential services such as e-mail and the phone system stay available. Because running virtual machines can also be migrated between servers, physical servers can be updated and rebooted without taking any virtual machines offline.
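To illustrate the failover behaviour, here is a toy Python model, not the clustering software itself: when a host fails, each of its virtual machines is placed on the surviving host currently running the fewest VMs. The `Host` class, the host and VM names, and the least-loaded placement rule are all hypothetical assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A hypothetical cluster node and the VMs it is currently running."""
    name: str
    vms: list[str] = field(default_factory=list)
    online: bool = True


def fail_over(hosts: list[Host], failed: str) -> dict[str, str]:
    """Mark `failed` as offline and move each of its VMs to the least-loaded
    surviving host. Returns a mapping of VM name -> new host name."""
    down = next(h for h in hosts if h.name == failed)
    down.online = False
    survivors = [h for h in hosts if h.online]
    if not survivors:
        raise RuntimeError("no surviving hosts to fail over to")
    placements: dict[str, str] = {}
    for vm in down.vms:
        # Assumed placement rule: pick the surviving host with the fewest VMs.
        target = min(survivors, key=lambda h: len(h.vms))
        target.vms.append(vm)
        placements[vm] = target.name
    down.vms.clear()
    return placements


if __name__ == "__main__":
    cluster = [
        Host("host-a", ["email", "phone-system"]),
        Host("host-b", ["file-server"]),
        Host("host-c", []),
    ]
    print(fail_over(cluster, "host-a"))
```

Running it prints `{'email': 'host-c', 'phone-system': 'host-b'}`: the empty `host-c` takes the first VM, and the resulting tie between the two survivors goes to the first one in the list.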
Each server has redundant power supplies and network cards, reducing the risk of downtime if a single component fails.
A third-party software product exposes each server's local disks to the cluster and keeps them synchronised across every host.
Data is synchronised between servers over multiple 10Gbps links: each server has two direct connections to every other server in the cluster. Each server is configured to use iSCSI with Multi-Path IO (MPIO), which lets it reach the same target disk over several network paths, so the loss of a single link or network card does not cut off access to storage.
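The effect of MPIO can be sketched with a small Python model, assuming a round-robin policy (one of several load-balancing policies MPIO can use): I/O requests alternate between the healthy paths to the same target disk, and a failed path is simply skipped. The `MultipathDisk` class and the path names are hypothetical; this is not the MPIO driver's actual interface.

```python
import itertools


class MultipathDisk:
    """Toy model of one iSCSI target disk reachable over several network paths.

    Path names are hypothetical. I/O is spread round-robin across healthy
    paths, and a failed path is skipped.
    """

    def __init__(self, paths: list[str]) -> None:
        self.healthy = {p: True for p in paths}
        self._cycle = itertools.cycle(paths)
        self._count = len(paths)

    def mark_failed(self, path: str) -> None:
        self.healthy[path] = False

    def next_path(self) -> str:
        # Try each path at most once per request before giving up.
        for _ in range(self._count):
            path = next(self._cycle)
            if self.healthy[path]:
                return path
        raise IOError("all paths to the target disk are down")


if __name__ == "__main__":
    disk = MultipathDisk(["link-1", "link-2"])
    print([disk.next_path() for _ in range(4)])  # alternates between both links
    disk.mark_failed("link-1")
    print([disk.next_path() for _ in range(3)])  # I/O continues over link-2 only
```

The disk only becomes unreachable once every path has failed, which is the extra resiliency the two direct connections per server provide.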
Updates are applied automatically on a weekly schedule using Cluster Aware Updating. For each server in turn, it migrates the virtual machines to other hosts, installs the updates, reboots the server if required, and then migrates the virtual machines back before moving on to the next host.
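The rolling sequence can be sketched as a Python simulation. This is a hypothetical model, not how Cluster Aware Updating is actually invoked: the `Node` class, the host names, the least-loaded placement rule and the log format are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical cluster node and the VMs it is running."""
    name: str
    vms: list[str] = field(default_factory=list)
    patched: bool = False


def rolling_update(nodes: list[Node], needs_reboot: bool = True) -> list[str]:
    """Patch nodes one at a time, draining each one first.

    Returns a log of the steps taken. The placement rule (least-loaded
    other node) and the log format are assumptions for this sketch.
    """
    if len(nodes) < 2:
        raise ValueError("a rolling update needs at least two nodes")
    log: list[str] = []
    for node in nodes:
        others = [n for n in nodes if n is not node]
        drained = list(node.vms)
        # Drain: migrate every VM to the least-loaded other node.
        for vm in drained:
            target = min(others, key=lambda n: len(n.vms))
            target.vms.append(vm)
            log.append(f"migrate {vm}: {node.name} -> {target.name}")
        node.vms.clear()
        # Install updates (and reboot if required) while the node is empty.
        node.patched = True
        log.append(f"update {node.name}" + (" and reboot" if needs_reboot else ""))
        # Migrate the drained VMs back before moving on to the next node.
        for vm in drained:
            holder = next(n for n in others if vm in n.vms)
            holder.vms.remove(vm)
            node.vms.append(vm)
            log.append(f"migrate {vm}: {holder.name} -> {node.name}")
    return log


if __name__ == "__main__":
    cluster = [Node("host-a", ["email"]), Node("host-b", ["phone-system"])]
    for step in rolling_update(cluster):
        print(step)
```

Because each node is drained before it is patched, every virtual machine is running on another host while its usual host is being updated.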
borwell chose hyper-converged storage rather than dedicated storage hardware such as a SAN, because it is a better long-term investment when it comes to hardware replacement.
The servers will need replacing at the end of their working life regardless, but because our storage lives on the servers in the cluster, we have effectively removed the future cost of replacing a SAN when it reaches end of life.
As with anything, you need to plan for growth! In this case, we may need more compute power in the future. Adding another server to the cluster is a relatively simple task: it just needs to be installed in our racks, have disk synchronisation configured, and then be joined to the cluster. Virtual machines will then be able to migrate to and from it as required.
The same method would be used for a complete hardware refresh: new servers would be added to the cluster, all virtual machines would then be migrated to them, and once this is complete the old servers would be removed.
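A hardware refresh follows an add, migrate, remove pattern. The Python sketch below models it on a plain dictionary mapping host names to virtual machines; the host names and the least-loaded placement rule are hypothetical assumptions, and a real refresh would also involve the rack installation and disk synchronisation steps described above.

```python
def hardware_refresh(
    cluster: dict[str, list[str]], new_hosts: list[str]
) -> dict[str, list[str]]:
    """Return a new cluster layout after replacing every existing host.

    Steps (illustrative only):
      1. join the new hosts to the cluster, empty to begin with;
      2. migrate every VM from the old hosts to the least-loaded new host;
      3. remove the now-empty old hosts.
    """
    if not new_hosts:
        raise ValueError("at least one new host is required")
    if set(new_hosts) & set(cluster):
        raise ValueError("new host names must differ from existing ones")
    layout = {host: list(vms) for host, vms in cluster.items()}  # copy input
    old_hosts = list(layout)
    # 1. Add the new servers to the cluster.
    for host in new_hosts:
        layout[host] = []
    # 2. Migrate all VMs from the old servers to the new ones.
    for old in old_hosts:
        for vm in layout[old]:
            target = min(new_hosts, key=lambda h: len(layout[h]))
            layout[target].append(vm)
        layout[old] = []
    # 3. Remove the old servers once they are empty.
    for old in old_hosts:
        del layout[old]
    return layout


if __name__ == "__main__":
    old = {"host-a": ["email", "phone-system"], "host-b": ["file-server"]}
    print(hardware_refresh(old, ["host-x", "host-y"]))
```

Running it prints `{'host-x': ['email', 'file-server'], 'host-y': ['phone-system']}`; the original layout is left unchanged, since the function works on a copy.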