How to select infrastructure that enhances and optimizes VMware ESXi environments comes up in many of the conversations we have with our clients. Virtualization engineers, storage engineers, network engineers, and application owners are all quick to explain the issues they are having with their current environments. Some of the recurring themes we hear:
- It takes too long to get new datastores approved and configured!
- The virtualization platform is growing too complicated, too quickly for storage to keep up!
- It takes too long to provision VMs!
- The hypervisor is a black box to the networking team!
- It's so hard to troubleshoot issues with all these vendors!
- I can't wait until Friday!
Some of these issues stem from problems with the people and processes involved in virtualization workflows, and we will save those for another blog post.
Other issues, however, come from architecture decisions made without putting the unique requirements of virtualized workloads first. It can be tempting to continue designing systems the way we always have. Tried-and-true multipurpose architectures built around shared storage were around long before VMware, and they serve a purpose. Optimizing a virtualized environment, though, calls for a closer look at some specific requirements:
Networking
- A solution that integrates with vCenter or other areas of the underlying infrastructure to show port status and VLAN assignment, trace packets, and report on performance from the VM to the physical switch port.
- A solution that allows for microsegmentation in the future, even if it won't be implemented on day one.
Storage
- The ability to track performance not just at the LUN level, but all the way from the vDisk to the array.
- Disk pooling, because no one should need to deal with disk groups.
- Granular storage-based snapshots of individual vDisks.
- The storage system should automatically migrate data to new disks without intervention from an admin.
- The ability to meet or exceed performance requirements at low latency.
High Availability and DR
- N+1 redundancy for all components, and N+2 for anything mission-critical, across storage, compute, and networking.
- A solution that provides the flexibility to use either application-aware replication and failover or storage-based replication.
- Load balancing and clustering for mission-critical applications.
- Built-in replication technologies that minimize the number of solutions required to replicate and fail over workloads.
Scalability
- The ability to scale storage, CPU, memory, and networking quickly and easily.
- Horizontal and vertical scaling based on growth patterns.
- The ability to quickly and easily scale down, without manual data evacuation.
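The N+1 and N+2 targets listed under High Availability and DR come down to simple arithmetic: after losing one (or two) components, is there still enough capacity left to carry the workload? The following is a minimal Python sketch of that check for compute hosts. The function name, host count, and capacity figures are hypothetical, and it assumes identical hosts and a single resource dimension (memory); a real design review would repeat the check for CPU, storage, and network paths.

```python
def survives_failures(host_count, capacity_per_host, required_capacity, failures):
    """Return True if the hosts left after `failures` losses can carry the load.

    Assumes all hosts are identical and capacity is measured in one unit (e.g. GB of RAM).
    """
    remaining = host_count - failures
    return remaining > 0 and remaining * capacity_per_host >= required_capacity


# Hypothetical cluster: 6 identical hosts with 512 GB RAM each,
# and 2,300 GB of total VM memory demand.
hosts, per_host_gb, demand_gb = 6, 512, 2300

n_plus_1 = survives_failures(hosts, per_host_gb, demand_gb, failures=1)
n_plus_2 = survives_failures(hosts, per_host_gb, demand_gb, failures=2)

print(f"Meets N+1: {n_plus_1}")
print(f"Meets N+2: {n_plus_2}")
```

With these example numbers the cluster tolerates one host failure but not two, so it would meet the general N+1 target while falling short of N+2 for mission-critical workloads.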
Infrastructure Lifecycle Management
- A solution that provides easy upgrades for software, firmware, and the hypervisor itself.
- The ability to add new components and remove old components all non-disruptively and with little to no intervention from administrators.
- The capability to establish a version baseline across all components of the solution and remediate any anomalies.
Vendor Support
- Limit the vendors in the environment to the extent possible to drive faster time to resolution. We like to say, "one back to pat" as opposed to "one throat to choke."
- Search for a vendor with a great Net Promoter Score (NPS).
- Understand each of the vendors' SLAs and the specifics around how they work with one another to solve any issues that arise.
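For readers less familiar with the NPS mentioned above, it is calculated from 0-10 "how likely are you to recommend" survey ratings as the percentage of promoters (ratings of 9 or 10) minus the percentage of detractors (ratings of 0 through 6). A minimal Python sketch, using made-up sample ratings rather than any real vendor's data:

```python
def net_promoter_score(ratings):
    """Compute NPS from 0-10 ratings: % promoters (9-10) minus % detractors (0-6)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical survey responses for a single vendor (illustrative only).
sample_ratings = [10, 9, 9, 8, 7, 10, 6, 9, 3, 10]
score = net_promoter_score(sample_ratings)
print(f"NPS: {score:.0f}")
```

The result ranges from -100 (all detractors) to +100 (all promoters), so it is most useful when comparing vendors against one another rather than as an absolute number.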
Looking at each of the requirements above helps form a basis for evaluating infrastructure solutions for virtualized environments. At AEBS we help clients make the best choices around virtualization infrastructure for their specific environments. Often, this involves looking at the requirements above, ranking how important each is to that client, then working through demos and proofs of concept to prove out whether each requirement is met to that client's satisfaction. We'd love to guide you to a solution that optimizes your VMware environment as well!
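One simple way to structure the ranking step described above is a weighted scoring matrix: each requirement gets an importance weight, each candidate solution gets a rating per requirement after the demo or proof of concept, and the weighted averages are compared. The sketch below is illustrative only. The requirement names are drawn from the lists above, but the weights, ratings, solution names, and the 0-5 scale are all hypothetical, and this is not a description of any specific AEBS methodology.

```python
def weighted_score(weights, ratings):
    """Weighted average of ratings (0-5), using relative importance weights.

    Requirements missing from `ratings` are scored as 0 (not demonstrated).
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("weights must not all be zero")
    return sum(w * ratings.get(req, 0) for req, w in weights.items()) / total_weight


# Hypothetical importance weights (1 = nice to have, 5 = critical).
weights = {
    "vDisk-level performance visibility": 5,
    "Non-disruptive upgrades": 4,
    "Built-in replication": 3,
    "Scale-down without manual evacuation": 2,
}

# Hypothetical proof-of-concept ratings per candidate (0 = not met, 5 = fully met).
candidates = {
    "Solution A": {
        "vDisk-level performance visibility": 4,
        "Non-disruptive upgrades": 3,
        "Built-in replication": 5,
        "Scale-down without manual evacuation": 2,
    },
    "Solution B": {
        "vDisk-level performance visibility": 3,
        "Non-disruptive upgrades": 5,
        "Built-in replication": 3,
        "Scale-down without manual evacuation": 4,
    },
}

scores = {name: weighted_score(weights, r) for name, r in candidates.items()}
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, value in ranking:
    print(f"{name}: {value:.2f}")
```

A close result like the one these sample numbers produce is a useful signal in itself: it suggests revisiting the weights with stakeholders or extending the proof of concept before committing.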