Many of our vSAN customers are familiar with the terms "objects" and "components," but new and experienced users alike occasionally get confused about what they are, why they exist, and how they differ from each other. Whether you are a seasoned vSAN administrator or a novice looking to better understand how vSAN works, this post explains these concepts in new ways so that you can better understand what these terms mean and why they matter.
Why Objects and Components Exist in vSAN
Storage systems often house data on a block storage device or unit, such as an individual SSD or a LUN on a storage array. SCSI commands are used to send and receive data, and that data is presented through some addressable form, such as a file system. Whether it is an OS like Windows using NTFS, Linux using ext4, or vSphere using a clustered file system like VMFS, these file systems tend to treat a single volume as a logical boundary of data that must remain pristine, with all metadata and data readily available. If there are issues anywhere in the system, portions of the metadata responsible for presenting data and directory structure information may become unavailable, and the impact can land arbitrarily anywhere in the file system. While that sounds disconcerting, file systems work well when the physical boundary is contained within a single storage device, such as a laptop, or within a storage array that has its redundancy built into a single physical enclosure.
Distributed systems provide storage by aggregating storage resources from several nodes or hosts. A well-designed distributed storage system must account for temporary or sustained failure conditions in hosts or devices across a cluster, and it must scale out easily as more hosts are added. A classic file system used in a distributed architecture would not exhibit these traits under such conditions.
Therefore, vSAN takes a novel approach to storage, one that is most analogous to an object store model. Rather than use a classic file system that would create a large boundary of data spread across all hosts, vSAN uses a small boundary of data for each entity you wish to store on the system, such as a virtual machine. This approach shares some similarities with vVols.
Figure 1. Comparing how vSAN uses a smaller boundary of data through an object store model.
With this model, we use the terms "objects" and "components" to describe a very generic data structure that is easy to visualize. Let's look at these in more detail.
In vSAN, we can think of an object as a logical boundary of data. It is treated as its own block storage device or unit and uses SCSI commands to send and receive data, but it has a much smaller boundary of data than the traditional storage systems described earlier. A VM stored on a vSAN datastore is usually composed of a handful of objects. While vSAN has many types of objects, the most common example is a virtual disk, or VMDK, which is the example we'll use throughout this post. Compared to a clustered file system, this gives a smaller, more granular boundary both for data access and for determining data availability. This provides important benefits for vSAN administrators.
- Better availability. Data availability is determined on a per-object basis, rather than an entire file system spread across all hosts.
- Simplified scalability. vSAN is the arbiter of access instead of relying on the locking mechanisms of a file system. It understands ownership, knowing which hosts can and should access an object at any given moment. This removes access complexity as clusters scale up in host count.
- A more granular level of management. Administrators can easily prescribe desired outcomes to different objects using storage policies, rather than taking a one-size-fits-all approach.
A vSAN object is the smallest manageable unit of data. A single storage policy can prescribe a common outcome to a group of VMs (many objects), a single VM (the objects that belong to just that VM), or even a single object, such as a VMDK.
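To make the difference in availability boundaries concrete, here is a minimal Python sketch under heavily simplified assumptions. The `VsanObject` class, the host names, and both functions are hypothetical illustrations, not vSAN APIs; each object simply records which hosts hold its data. With a per-object boundary, a host failure touches only the objects that have data on that host. With a single shared volume, metadata could live anywhere, so every file on the volume is potentially exposed.

```python
from dataclasses import dataclass, field


@dataclass
class VsanObject:
    """Hypothetical, simplified object: a name plus the hosts holding its data."""
    name: str
    hosts: set = field(default_factory=set)


def affected_objects(objects, failed_host):
    """Per-object boundary: only objects with data on the failed host are touched.
    Whether they stay available depends on their storage policy (not modeled here)."""
    return [obj.name for obj in objects if failed_host in obj.hosts]


def affected_volume(objects, failed_host):
    """Single-volume boundary: metadata may live on any host, so a failure on any
    host that backs the volume potentially exposes every file on it."""
    backing_hosts = set().union(*(obj.hosts for obj in objects))
    return [obj.name for obj in objects] if failed_host in backing_hosts else []


objects = [
    VsanObject("vm01.vmdk", {"esx01", "esx02"}),
    VsanObject("vm02.vmdk", {"esx03", "esx04"}),
    VsanObject("vm03.vmdk", {"esx02", "esx03"}),
]
print(affected_objects(objects, "esx01"))  # ['vm01.vmdk']
print(affected_volume(objects, "esx01"))   # ['vm01.vmdk', 'vm02.vmdk', 'vm03.vmdk']
```

The same host failure touches one object in the first model and the entire volume in the second, which is the practical meaning of a smaller boundary of data.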
Figure 2. A resilient VMDK object courtesy of an assigned storage policy.
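The following Python sketch illustrates these three scopes of policy assignment using a simplified, hypothetical inventory model. `inventory`, `apply_policy`, and the policy names are all illustrative and are not vSAN or vSphere APIs. Whatever scope you choose when assigning a policy, the result is recorded per object, which is why a single VMDK can carry a different policy than the rest of its VM.

```python
# Hypothetical inventory: VM name -> the objects that make up that VM.
inventory = {
    "app01": ["app01-home", "app01-disk0", "app01-disk1"],
    "app02": ["app02-home", "app02-disk0"],
}


def apply_policy(assignments, object_ids, policy):
    """Record `policy` on every object in `object_ids`; a later assignment overwrites."""
    for obj_id in object_ids:
        assignments[obj_id] = policy


# Object ID -> assigned policy name. The object is what ultimately carries a policy.
assignments = {}

# A group of VMs: every object of every VM in the group.
apply_policy(assignments, [o for vm in ("app01", "app02") for o in inventory[vm]], "Mirror-FTT1")
# A single VM: only that VM's objects.
apply_policy(assignments, inventory["app02"], "RAID5-FTT1")
# A single object: one VMDK.
apply_policy(assignments, ["app01-disk1"], "RAID6-FTT2")

for obj_id, policy in sorted(assignments.items()):
    print(obj_id, policy)
```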
Although vSAN refers to these units of storage as objects, it does not mimic all traits of object storage, such as immutability of data or compatibility with other object store types. You can, however, store S3-compatible object data on vSAN using our Data Persistence platform paired with a certified solution from partners such as Cloudian, Dell EMC ECS, or MinIO.
To make objects resilient, vSAN breaks objects into smaller chunks of data and ensures that redundant data is placed on different hosts across the cluster. A component represents a chunk of data contained within an object. An object may consist of one or more components, and those components may be arranged in a hierarchical relationship with one another, often called a RAID tree.
The layout of an object's components, such as their location, count, and relationships, is determined and managed by vSAN. Several factors influence this layout, including the resilience rules in the assigned storage policy, the amount of data in the object, and the available disk capacity on each host. For example, vSAN ensures that no two components in the data+parity stripe of an object using a RAID-5/6 data placement scheme reside on the same host. This "anti-affinity" is built right into vSAN and keeps the data available in the event of a host failure. vSAN takes care of all of this so you don't have to.
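A RAID tree maps naturally onto a recursive data structure. The Python sketch below uses hypothetical class names and node labels to model one VMDK object protected by a mirror whose two replicas are each striped across two components. It illustrates that the whole tree belongs to a single object with a single object ID, and that the components are simply its leaves. The UUID is a stand-in for an object ID, and the host names are made up.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Component:
    """A leaf of the RAID tree: one chunk of the object's data on one host."""
    host: str


@dataclass
class RaidNode:
    """An interior node; `kind` is an illustrative label such as "RAID-1" or "RAID-0"."""
    kind: str
    children: List[Union[RaidNode, Component]] = field(default_factory=list)


@dataclass
class VsanObject:
    name: str
    root: RaidNode
    # One object, one ID, no matter how many components sit beneath it.
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def components(node) -> List[Component]:
    """Walk the tree depth-first and return every leaf component."""
    if isinstance(node, Component):
        return [node]
    return [leaf for child in node.children for leaf in components(child)]


# A mirror (RAID-1) of two replicas, each striped (RAID-0) across two components.
disk = VsanObject(
    "vm01.vmdk",
    RaidNode("RAID-1", [
        RaidNode("RAID-0", [Component("esx01"), Component("esx02")]),
        RaidNode("RAID-0", [Component("esx03"), Component("esx04")]),
    ]),
)
print(disk.object_id, [c.host for c in components(disk.root)])
```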
Figure 3. Components of an object when assigned a RAID-1 storage policy.
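The host anti-affinity rule for erasure-coded objects can be expressed as a simple placement check. This Python sketch assumes the classic stripe layouts of 3 data + 1 parity for RAID-5 and 4 data + 2 parity for RAID-6, as used in the vSAN Original Storage Architecture; other layouts exist, so treat those numbers as an assumption. `place_stripe` and `satisfies_anti_affinity` are hypothetical helpers, and a real placer also weighs free capacity and other factors that are not modeled here.

```python
# Assumed (data, parity) component counts per stripe; illustrative only.
LAYOUTS = {"RAID-5": (3, 1), "RAID-6": (4, 2)}


def place_stripe(scheme, hosts):
    """Give each component of one stripe its own host, or fail if that is impossible.

    Many erasure-coding implementations rotate parity across components, so this
    sketch only counts components instead of labeling one as "the parity component".
    """
    data, parity = LAYOUTS[scheme]
    needed = data + parity
    distinct_hosts = sorted(set(hosts))
    if len(distinct_hosts) < needed:
        raise ValueError(
            f"{scheme} needs {needed} hosts for anti-affinity, "
            f"but only {len(distinct_hosts)} are available"
        )
    # Deterministic choice for the sketch: the first `needed` hosts by name.
    return {f"component{i}": host for i, host in enumerate(distinct_hosts[:needed])}


def satisfies_anti_affinity(placement):
    """True if no two components of the stripe share a host."""
    hosts = list(placement.values())
    return len(hosts) == len(set(hosts))


cluster = [f"esx{n:02d}" for n in range(1, 7)]  # six hypothetical hosts
layout = place_stripe("RAID-6", cluster)
print(layout, satisfies_anti_affinity(layout))
```

Failing loudly when there are too few hosts mirrors the idea that the anti-affinity rule is not negotiable: without enough hosts, the scheme cannot be placed at all rather than being placed unsafely.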
In many ways, it is best to think of components as an implementation detail. They are not a user-manageable entity, and you never need to perform management tasks at the component level. So why do we talk about them? We describe them through words and illustrations to help customers understand data availability, data placement schemes, and data movement. It can be useful to know when a particular object is in a degraded state, or how a new storage policy will change the way its data is placed.
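To show why component placement matters when spotting a degraded object, here is a Python sketch that classifies an object as healthy, degraded, or inaccessible from the hosts its components live on. It is a deliberate simplification: the tolerated-loss numbers are assumptions based on common vSAN layouts, RAID-1 replicas are modeled as single unstriped components, and vSAN's witness components and vote-based quorum are ignored entirely. The function name, scheme labels, and state names are hypothetical.

```python
# Assumed number of lost components each scheme can absorb; illustrative only.
TOLERATED_LOSSES = {
    "RAID-1, 2 replicas": 1,
    "RAID-1, 3 replicas": 2,
    "RAID-5, 3+1": 1,
    "RAID-6, 4+2": 2,
}


def object_state(scheme, component_hosts, failed_hosts):
    """Classify one object from where its components live and which hosts are down."""
    lost = sum(1 for host in component_hosts if host in failed_hosts)
    if lost == 0:
        return "healthy"
    if lost <= TOLERATED_LOSSES[scheme]:
        return "degraded"      # still readable, but with less redundancy than the policy asks for
    return "inaccessible"      # more losses than the scheme can absorb


raid6_hosts = ["esx01", "esx02", "esx03", "esx04", "esx05", "esx06"]
print(object_state("RAID-6, 4+2", raid6_hosts, set()))                      # healthy
print(object_state("RAID-6, 4+2", raid6_hosts, {"esx01", "esx02"}))         # degraded
print(object_state("RAID-6, 4+2", raid6_hosts, {"esx01", "esx02", "esx03"}))  # inaccessible
```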
Visual Representations of Objects and Components
Data placement schemes for object components can get complex. An object using an erasure code will have a different placement scheme than an object using a mirror. Components may be split into smaller chunks as a result of the "Number of disk stripes per object" rule in a storage policy, or they may be split due to capacity constraints within a cluster. The good news is that administrators don't need to concern themselves with where the components live. Even so, vCenter Server shows the details of object data, including component locations.
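The effect of the "Number of disk stripes per object" rule, and of splitting large components, on the component count can be approximated with a little arithmetic. The Python sketch below covers RAID-1 mirroring only, counts data components but not witness components, and assumes a 255 GB maximum component size, a limit commonly cited for the vSAN Original Storage Architecture; treat that value and the function name as assumptions. vSAN makes the actual layout decisions, so real counts can differ.

```python
import math

# Assumed maximum component size, in GB (commonly cited for vSAN OSA).
MAX_COMPONENT_GB = 255


def mirrored_data_components(size_gb, failures_to_tolerate=1, stripe_width=1):
    """Approximate the number of data components for a RAID-1 object.

    replicas    = failures_to_tolerate + 1
    per replica = stripe_width stripes
    per stripe  = enough pieces that none is larger than MAX_COMPONENT_GB
    """
    replicas = failures_to_tolerate + 1
    stripe_size_gb = size_gb / stripe_width
    pieces_per_stripe = max(1, math.ceil(stripe_size_gb / MAX_COMPONENT_GB))
    return replicas * stripe_width * pieces_per_stripe


print(mirrored_data_components(100, stripe_width=2))  # 2 replicas x 2 stripes -> 4
print(mirrored_data_components(600))                  # 2 replicas x 3 pieces -> 6
```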
When an object is using a storage policy that uses a RAID-1/mirroring data placement scheme, you will sometimes see illustrations (such as in Figure 3) that show multiple object replicas. This may give the impression that there are multiple objects for the VMDK. We do this to help illustrate the hierarchical relationship of this mirrored data as replicas, but an object using a RAID-1 mirror is only one object and has only one object ID. We group the mirrored components in a manner that represents the RAID tree accurately. A RAID-5 or RAID-6 object uses a data+parity erasure code to place the data in a resilient way across hosts and would look similar to Figure 4, below.
Figure 4. Components of an object when assigned a RAID-6 storage policy.
Management and Monitoring
Because a vSAN object is the smallest manageable unit of data, it is where an administrator's attention should be focused. vSAN objects are managed by applying a storage policy to them. The storage policy defines the desired outcome, and vSAN works to achieve that outcome if the cluster is capable of doing so. Since components play a part in an object's availability, vCenter Server shows the placement of object components, which also helps illustrate the details of an availability issue.
Figure 5. Identifying the placement of components in an object.
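Whether "the cluster is capable of doing so" can be illustrated with a simple host-count check. The Python sketch below uses minimum host counts commonly documented for the vSAN Original Storage Architecture (2 * FTT + 1 hosts for RAID-1, 4 for RAID-5, 6 for RAID-6); treat them as assumptions, since real policy compliance also depends on capacity, fault domains, and more. `min_hosts` and `can_satisfy` are hypothetical names, not vSAN APIs.

```python
def min_hosts(scheme, failures_to_tolerate):
    """Assumed minimum host count for a placement scheme (illustrative, OSA-style)."""
    if scheme == "RAID-1":
        # Mirroring that tolerates n host failures is assumed to need 2n + 1 hosts.
        return 2 * failures_to_tolerate + 1
    if scheme == "RAID-5" and failures_to_tolerate == 1:
        return 4
    if scheme == "RAID-6" and failures_to_tolerate == 2:
        return 6
    raise ValueError(f"unsupported combination: {scheme} with FTT={failures_to_tolerate}")


def can_satisfy(scheme, failures_to_tolerate, hosts_in_cluster):
    """True if the cluster has enough hosts for the policy's placement scheme."""
    return hosts_in_cluster >= min_hosts(scheme, failures_to_tolerate)


for scheme, ftt in [("RAID-1", 1), ("RAID-5", 1), ("RAID-6", 2)]:
    print(scheme, f"FTT={ftt}", "5-host cluster OK:", can_satisfy(scheme, ftt, 5))
```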
Skyline Health for vSAN also provides dozens of health checks that focus on the health of each object and, when a health alert is triggered, also provide information about the related components.
Figure 6. Using Skyline Health to observe the health status and compliance status of objects.
Recommendation: Use Skyline Health and the Monitor tab for most of your vSAN administration. Unlike with traditional datastores, there is very little need to use the "Datastore" view in vCenter Server for a vSAN-powered cluster.
vSAN uses an approach to data storage that is most analogous to an object store model. Since vSAN is a distributed storage solution, this model helps us provide predictable levels of availability at a granular level, while also providing the agility to scale easily when needed.