VMware Data Services Manager (DSM) - FAQ

Answers to technical questions you may have about Data Services Manager (DSM)

VMware Data Services Manager is a VMware solution that offers a data-as-a-service toolkit for on-demand provisioning and automated management of PostgreSQL and MySQL databases on a vSphere environment. VMware Data Services Manager provides a graphical user interface and a REST API in the toolkit, enabling administrators and developers to get the most out of the service. VMware Data Services Manager 1.5 is now available with several improvements to help deploy and manage databases faster.

 

Architecture.

General Questions 

Which vSphere versions are supported with VMware DSM?

DSM v1.5 supports vSphere 6.7, vSphere 7.x and vSphere 8.0.x as per the VMware DSM v1.5 Release Notes - https://docs.vmware.com/en/VMware-Data-Services-Manager/1.5/data-services-manager/GUID-release_notes.html#release-1.5.x

Which Guest Operating System is used for the Database VMs in DSM?

For DSM v1.4 and DSM v1.5, the Guest OS is VMware Photon OS 3.0.

Is there a need to change vCenter networking configuration? 

No. There is no need to change any networking configuration for your vCenter or vSphere hosts. vCenter is not directly connected to VMware Cloud. There is no need to build VPNs.

Can Photon OS be upgraded using non-DSM based tools (e.g., suite manager)?

No. Upgrades to Photon OS are only available through the images that VMware provides for DSM. Patching must be done via the DSM UI/API; it cannot be done through any other tool set.

How do you configure multiple agents in multiple vCenter Servers managed from the same DSM deployment?

Database replication for PostgreSQL and MySQL cannot work over different vCenter Servers in version 1.4 or 1.5 of DSM. Thus all components belonging to the same (clustered) database must be deployed to infrastructure that is managed by a single vCenter Server.

How is the licensing managed?

For the DSM product, licensing is done on a per-core basis which is the same as vSphere. This is reported within DSM Manager.

Resources and Limits

General information about the resources required and limits of the platform. 

What are the resource requirements of the Provider VM?

As of DSM v1.5, the resource requirements of the Provider VM are 8 vCPUs, 16GB Memory and 5 VMDKs totaling 735.81GB. All disks follow the storage policy settings for thick, thin, SPBM, etc.

What are the resource requirements of the Agent VM?

As of DSM v1.5, the resource requirements of the Agent VM are 8 vCPUs, 16GB Memory and 6 VMDKs totaling 667.88GB. All disks follow the storage policy settings for thick, thin, SPBM, etc.

How many operations can be run in parallel in VMware DSM v1.5?

VMware Data Services Manager supports the following Agent concurrency limits, as per the DSM v1.5 Release Notes https://docs.vmware.com/en/VMware-Data-Services-Manager/1.5/data-services-manager/GUID-release_notes.html#release-1.5.0:

  • 8 parallel database deployments (create, restore, PITR, clone, add replica operations) with a bounded waiting queue of size two (2).

  • Operations with an unbounded waiting queue:

    • 8 parallel database management operations (for example: power on/off, promote, generate log bundle).
    • 5 parallel backups.
    • 5 parallel transaction log uploads.
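The deployment limit above can be pictured as a fixed pool of slots with a small waiting queue. The following Python sketch is only an illustration of that reading and is not DSM code; the class and method names are hypothetical. It assumes that a request arriving when all 8 slots and both queue positions are taken is rejected, which the release notes quoted here do not spell out.

```python
from collections import deque


class DeploymentAdmission:
    """Toy model: 8 running slots plus a bounded waiting queue of size 2."""

    def __init__(self, max_running=8, max_waiting=2):
        self.max_running = max_running
        self.max_waiting = max_waiting
        self.running = set()
        self.waiting = deque()

    def submit(self, op_id):
        """Return 'running', 'queued' or 'rejected' for a new operation."""
        if len(self.running) < self.max_running:
            self.running.add(op_id)
            return "running"
        if len(self.waiting) < self.max_waiting:
            self.waiting.append(op_id)
            return "queued"
        # Assumption: the behaviour of a full queue is not described in the FAQ.
        return "rejected"

    def finish(self, op_id):
        """Free a slot and start the oldest queued operation, if any."""
        self.running.discard(op_id)
        if self.waiting and len(self.running) < self.max_running:
            self.running.add(self.waiting.popleft())


admission = DeploymentAdmission()
results = [admission.submit(f"db-{i}") for i in range(11)]
print(results.count("running"), results.count("queued"), results.count("rejected"))
```

With eleven simultaneous requests, eight start immediately, two wait and one is turned away under the stated assumption.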

 

 

 

What are the resource requirements of a PostgreSQL PG_Monitor VM?

2 vCPUs, 2GB Memory and 10GB storage.

Is it possible to store/process the data in a different location?

Not at the moment. Expansion to other regions including Europe is on the roadmap. We do not have an estimated time of release yet.

Is the infrastructure dedicated to VMware Cloud Services?

Yes.

How to enable gateway High Availability?

Currently, there is a one-to-many Appliance-vCenter relationship. Only one Cloud Gateway can be connected to a single vCenter at a time. No high availability capabilities have been included. We recommend utilizing vSphere high availability to protect the Cloud Gateway appliance.

How to backup and restore the Cloud Gateway appliance?

The vCenter Cloud Gateway Appliance is a stateless appliance, which means that it doesn’t contain any data that needs to be backed up or restored. If something goes wrong with the appliance, administrators are encouraged to engage support so that procedures to fix the appliance can be followed; as a last resort, a replacement appliance may need to be deployed.

Take note of all your configuration details such as IP address, DNS and appliance name, and store them in a safe place, as they will be needed in the case of a redeployment.

What happens if Cloud Gateway loses cloud connectivity? Will vSphere environments go offline?

No. vSphere, vCenter and Virtual Machines continue to run. There is no impact to vSphere environments in the case of gateway disconnection. In VMware Cloud Services console, the Cloud Gateway will appear as disconnected. vCenter data will not be updated on the cloud console. Deleted and disconnected Cloud Gateway instances are removed from the cloud after 30 days.

After approximately 7 days, administrators will start seeing some errors while connecting to vCenter Server to perform administrative tasks. In order to log in, they must follow the procedure to generate an emergency token described in KB Article 83798. Connectivity should be restored as soon as possible.

What if there is a need for a disconnected or ‘air-gapped’ environment?

If establishing a permanent connection from Cloud Gateway to VMware Cloud is not an option, not even through a proxy, then a different licensing option might need to be considered. 

Will there be downtime to vSphere environments when connecting to vSphere+?

No. The only change that happens is the connection is established between the Cloud Gateway and the cloud, and the Cloud Gateway against the vCenter(s).

 

Databases

Database FAQ.

Which databases are currently supported by VMware?

As of release version 1.5 of VMware DSM, the supported databases are PostgreSQL, MySQL and, as early access, MS SQL Server. Check the Release Notes (https://docs.vmware.com/en/VMware-Data-Services-Manager/1.5/data-services-manager/GUID-release_notes.html#release-1.5.0) for the exact versions of each database that are supported.

Does DSM manage SQL Server database deployed on Windows OS?

No. The MS SQL Server that is deployed by VMware DSM is the Linux container image available from the Microsoft Container Registry. The Guest OS that we use to host this database is VMware Photon OS. This is similar to how PostgreSQL and MySQL databases are deployed, both of which are also based on Docker container images.

What is happening during the preconfigure and configure phases of database creation?

The preconfigure step makes the database VM ready for configuration by adding the necessary software components. When the configure step begins, the database configuration files are created, the user and user database are created, and the database is started.

How is failover handled in PostgreSQL?

When enabling HA on a PostgreSQL DB in DSM v1.4, PostgreSQL uses an extension called pg_auto_failover. pg_auto_failover is a service for PostgreSQL that monitors and manages automated failover for a PostgreSQL cluster. This requires the creation of a monitor node/VM which acts as a witness and is also used for orchestration - see https://pg-auto-failover.readthedocs.io/en/main/intro.html. The monitor observes the state of the database nodes and assigns them a specific goal state. It coordinates the cluster and will initiate failovers when appropriate. The primary.<fqdn> mapping points to the IP of the new Primary, and this is the DNS entry which should be used by clients to connect to the database. Thus, a correctly working DNS service is essential for PostgreSQL clustering.

Though PostgreSQL has auto failover, the DSM control plane adjusts its metadata once the auto failover has happened (by polling the cluster status). During this process, the control plane triggers a task called Promote Replica which adjusts the control plane metadata. The primary.<fqdn> mapping is shifted to the IP of the new Primary during the Promote Replica process. Clients should continue to connect to the DNS entry primary.<fqdn> to ensure that there is minimal impact to their DB connection in the event of a failover. An interruption may be observed because it can take up to 5 minutes (the polling interval) for the control plane to detect the auto failover and trigger the promote task.
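Because clients are expected to keep using the primary.<fqdn> DNS name across a failover, a client-side retry loop is one way to ride out the interruption. The sketch below is a generic illustration, not part of DSM: `connect_with_retry` and its parameters are hypothetical, the default attempt count and delay are arbitrary, and `connect` stands for whatever call your database driver uses to open a connection.

```python
import time


def connect_with_retry(connect, host, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call connect(host) until it succeeds or the attempts run out.

    `connect` is any callable that opens a database connection to `host`
    and raises OSError on failure. Passing the DNS name rather than an IP
    address lets each attempt look the name up again, so a later attempt
    can reach a newly promoted primary once the mapping has moved.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect(host)
        except OSError as error:
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)
    raise ConnectionError(f"could not reach {host} after {attempts} attempts") from last_error
```

In practice the total retry window (attempts multiplied by delay) would need to be sized against the polling interval mentioned above.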

How is failover handled in MySQL?

In DSM v1.4, MySQL HA has a virtual Cluster IP which users can use to access the database. In the event of a failover, there is no need for users to worry about DNS mappings. Users can stay connected to the virtual IP, which routes the client connections to the new primary. This is handled by an internal MySQL Router component. The virtual IP address (Cluster IP) is associated with the promoted replica, so there is no need for an additional VM such as PG_Monitor.

VMware DSM uses MySQL InnoDB Cluster, which is built on the Group Replication feature. An InnoDB Cluster consists of at least three MySQL Server instances. More at https://dev.mysql.com/doc/refman/8.0/en/mysql-innodb-cluster-introduction.html.

Clients should not use the default port of 3306, which is the port used when connecting to a standalone MySQL database. Instead, clients should use the following ports with the virtual IP address (Cluster IP) of the clustered MySQL database:

  • Port 6446 - Read-Write Connection
  • Port 6447 - Read-Only Connection
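The port rules above are easy to get wrong in application configuration, so it can help to centralize them. The following is a minimal sketch using only the port numbers given in this FAQ; the function and constant names are hypothetical.

```python
STANDALONE_PORT = 3306
CLUSTER_PORTS = {"read-write": 6446, "read-only": 6447}


def mysql_port(clustered, mode="read-write"):
    """Pick the client port for a DSM-deployed MySQL database.

    Standalone databases use 3306. Clustered databases are reached through
    the Cluster IP on 6446 (read-write) or 6447 (read-only).
    """
    if mode not in CLUSTER_PORTS:
        raise ValueError(f"unknown mode: {mode!r}")
    if not clustered:
        return STANDALONE_PORT
    return CLUSTER_PORTS[mode]


print(mysql_port(clustered=True, mode="read-only"))
```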

 

As mentioned, MySQL InnoDB Cluster also utilizes MySQL Router, a lightweight middleware that provides transparent routing between your client application and InnoDB Cluster. MySQL Router provides routing and load-balancing for client connections. An InnoDB Cluster usually runs in single-primary mode, with one primary instance (read-write) and multiple secondary instances (read-only).

 

Can I encrypt databases?

Yes, you can encrypt database contents, but it has to be done at the infrastructure layer. On vSphere, either vSAN encryption or VM encryption can be used. The vSAN encryption feature encrypts the whole of the vSAN datastore, so any workloads deployed to this datastore are encrypted. VM encryption (or VMcrypt) works at the more granular level of a VM or VMDK. This requires a storage policy which uses VMcrypt, and also requires a KMS (Key Management Service) to be enabled on the vSphere infrastructure. Since storage policies cannot be selected from within the DSM UI, the step to associate a DB VM/VMDK with the encryption storage policy must be taken in the vSphere client.

Does VMware DSM support integration of database users with an identity and access management system (IAM), e.g. Active Directory, LDAP, etc?

No. VMware DSM leaves the task of managing database users to the customer. LDAP integration is available for DSM users (Provider Admin, Org Admin, Org Users), but the database users themselves are not integrated with any IAM via DSM. Instead, we provide a single database user (dbaas) to log in to the database. Our thought process here is that customers would integrate their own IAM for the database users.

What are the health checks that VMware DSM runs against a deployed database?

There are 12 health checks: (1) Connectivity, (2) Data Disk Health, (3) System Disk Health, (4) CPU Health, (5) Max Connections, (6) Metrics, (7) NTP Sync, (8) Telegraf Service, (9) Database Bin Log, (10) Database Bin Log Cloud Sync, (11) Database Service and (12) VM Password Expiry.

Are database backups full, differential or incremental?

The first backup taken of a database is a full backup. All subsequent backups are incremental. Once the Backup Retention time has expired, a new backup chain is started. Previous backup chains are still available for restore until purged. A backup chain won't be purged until the last backup in the chain has expired. Restores apply the full backup in the chain to begin, and then apply all relevant incremental backups to reach the recovery point requested.
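The restore logic described above (start from the full backup, then apply incrementals up to the requested recovery point) can be sketched as follows. This is an illustrative model only, not how DSM stores or selects backups; the `Backup` type and `backups_to_apply` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    kind: str  # "full" or "incremental"


def backups_to_apply(chain, recovery_point):
    """Return the full backup plus the incrementals needed to reach recovery_point.

    `chain` is a single backup chain: one full backup followed by its
    incremental backups. The result is in the order it would be applied.
    """
    ordered = sorted(chain, key=lambda backup: backup.taken_at)
    if not ordered or ordered[0].kind != "full":
        raise ValueError("a chain must start with a full backup")
    if recovery_point < ordered[0].taken_at:
        raise ValueError("recovery point is older than this chain")
    return [backup for backup in ordered if backup.taken_at <= recovery_point]


chain = [
    Backup(datetime(2024, 1, 1), "full"),
    Backup(datetime(2024, 1, 2), "incremental"),
    Backup(datetime(2024, 1, 3), "incremental"),
]
plan = backups_to_apply(chain, datetime(2024, 1, 2, 12, 0))
print([backup.kind for backup in plan])
```

For a point-in-time restore, the transaction logs recorded after the last selected backup would then be replayed up to the exact recovery point.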

How are database backups and restores implemented on PostgreSQL?

For PostgreSQL databases, DSM uses pgBackRest, a backup and restore tool for PostgreSQL (https://pgbackrest.org/). DSM has used this mechanism since version 1.4. It relies on archived WAL (Write-Ahead Log) files, which means that we can restore to any PIT (point in time). This feature uses incremental backups, not differential ones. Thus, when there is a restore, a full backup is used first and then the incremental backups are applied one by one to get to the PIT. PITR is essentially a restore plus a replay of the transactions held in the backed-up transaction logs (the generic name for WAL/binary logs). WAL is the mechanism that PostgreSQL uses to ensure that no committed changes are lost. Transactions are written sequentially to the WAL, and a transaction is considered to be committed when those writes are flushed to disk. Afterwards, a background process writes the changes into the main database cluster files (also known as the heap). In the event of a crash, the WAL is replayed to make the database consistent.

Note that restore operations involve the deployment of a new Photon OS OVA, installation and configuration of DSM components, and the creation of a new PostgreSQL database before the original database contents are restored. It is not an in-place restore, and the restore task requests that you provide a new VM name as part of the workflow.

How are database backups and restores implemented on MySQL?

For backing up and restoring MySQL databases, DSM uses Percona’s XtraBackup tool. 

Note that restore operations involve the deployment of a new Photon OS OVA, installation and configuration of DSM components, and the creation of a new MySQL database before the original database contents are restored. It is not an in-place restore, and the restore task requests that you provide a new VM name as part of the workflow.

How does Restore differ from a Recover operation?

All restore operations are done to a new VM to avoid propagating any VM issues. VM deployment and configuration take up a lot of the time during a restore operation. Recover operations are in-place. Recover is intended to be used as a fast "get out of trouble" tool. If a recover fails because there is some problem outside of the database data, a restore operation would be required.

What time does the backup purger run?

Purging of backups is done daily at 12 by a scheduler. The chain expiryTime = chainCreationTime + localRetentionDays.
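The expiry formula above is plain date arithmetic. As a minimal sketch in Python, assuming the retention value is a whole number of days (the function name is hypothetical; the variable names mirror the formula):

```python
from datetime import datetime, timedelta


def chain_expiry_time(chain_creation_time, local_retention_days):
    """expiryTime = chainCreationTime + localRetentionDays."""
    return chain_creation_time + timedelta(days=local_retention_days)


expiry = chain_expiry_time(datetime(2024, 1, 1, 9, 30), local_retention_days=7)
print(expiry)  # 2024-01-08 09:30:00
```

A chain created at 09:30 on 1 January with 7 days of local retention therefore expires at 09:30 on 8 January, and would be picked up by the next daily purger run after that time.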

Can I recover a clustered database from backup?

Yes, you can recover the Primary database, but you cannot recover the replicas of a clustered database. If you recover a Primary database from a backup that was created earlier than the replica, the replicas of the Primary database become INACTIVE. You need to delete the replicas in the INACTIVE state and recreate the replicas of the Primary database.

Can I do an in-place restore of a database, overwriting the current database?

No. The database restore operation in DSM creates a new Photon OS VM, builds a new database and restores the backup to this new database. There is no in-place restore mechanism in DSM. However, there is a recover operation, which is done in-place. If a recover operation fails because there is some problem outside of the database data, a restore operation would be needed. This will create a new VM and database.

Does VMware DSM control the size of the WAL (Write-Ahead Log) in PostgreSQL databases and binary logs in MySQL databases?

If automated backups are enabled, VMware Data Services Manager limits the size of WAL files of PostgreSQL Primary databases (15% of actual disk size) and Binary log files of MySQL Primary databases (10% of actual disk size) of a High Availability (HA) cluster. This feature prevents the exponential growth of WAL files and of Binary log files and helps in the efficient functioning of database HA clusters. Write-Ahead Log (WAL) files or binary log files log all the updates of the Primary database of a HA cluster.
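As a quick worked example of the percentages above, the sketch below computes the log size cap for a given data disk. The function and table names are hypothetical, and it assumes the percentage is applied to the disk size expressed in GB.

```python
# Percent of actual disk size, as stated in this FAQ.
LOG_LIMIT_PERCENT = {"postgresql": 15, "mysql": 10}


def log_size_limit_gb(engine, disk_size_gb):
    """Return the WAL (PostgreSQL) or binary log (MySQL) size cap in GB."""
    return disk_size_gb * LOG_LIMIT_PERCENT[engine] / 100


print(log_size_limit_gb("postgresql", 100))  # 15.0
print(log_size_limit_gb("mysql", 200))       # 20.0
```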

Is it possible to do additional customization of databases via predefined templates over and above what is available as advanced parameters in the UI? For instance, is it possible to configure the database names and configuration parameters? Is it possible to tune the PostgreSQL database parameters with DSM?

There are a number of database parameters which can be customized using DSM. For PostgreSQL, please look at the official documentation on the topic located here: https://docs.vmware.com/en/Data-Services-Manager/1.4/data-services-manager/GUID-postgresql.html#dbopt_cfg

Can the database creation parameters be reused for creating another database, i.e. can one export the database creation config file of an existing DB and use it as a template to create a new database?

It is possible to export the parameters through the API, but it's not in the same format as the creation config. In general, API payloads can be re-used with minimal modifications.

For example:

If a customer wants to deploy multiple PostgreSQL 13 databases with the same configuration, the only parameter that has to change is the DB instance name. The remaining parameters can stay the same. If the customer wants to deploy a few PostgreSQL 12 or MySQL databases, then the database instance name and the logical build id (which refers to the DB type) will be different. Other parameters can remain the same.

  • Exporting a database configuration is not possible via the UI. However, if you use Postman for API requests, the configuration can be exported to multiple formats (curl, Java, etc.) and reused with minor modifications.
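Reusing a creation payload as a template, as described above, amounts to copying it and overriding one or two fields. The sketch below illustrates the idea only; the field names `instanceName` and `logicalBuildId` and the example values are hypothetical placeholders, so check the DSM API reference for the real payload schema.

```python
import copy


def derive_payload(template, instance_name, logical_build_id=None):
    """Copy a creation payload, changing only the fields that must differ."""
    payload = copy.deepcopy(template)
    payload["instanceName"] = instance_name  # hypothetical field name
    if logical_build_id is not None:
        payload["logicalBuildId"] = logical_build_id  # hypothetical field name
    return payload


template = {
    "instanceName": "pg13-app-a",
    "logicalBuildId": "postgres-13",
    "options": {"vcpus": 2, "memoryGb": 4},
}
second = derive_payload(template, "pg13-app-b")
other_version = derive_payload(template, "pg12-app-c", logical_build_id="postgres-12")
print(second["instanceName"], other_version["logicalBuildId"])
```

The deep copy keeps nested options in the template from being modified when a derived payload is edited afterwards.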

 

When upgrading a clustered database, either PostgreSQL or MySQL, is the upgrade done in a rolling fashion which allows the database to stay online?

Yes. For PostgreSQL, upgrades on a cluster happen in a rolling fashion in this sequence: monitor --> replicas --> primary DB. When the primary DB is being upgraded, auto failover happens and an upgraded replica is promoted to be the new primary. MySQL uses a virtual IP address to front the database, and this virtual IP address is routed to the new primary during a rolling upgrade, so access is not impacted during upgrades either.
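The PostgreSQL upgrade sequence (monitor, then replicas, then primary) can be expressed as a simple ordering by role. This is a sketch for illustration, not DSM's implementation; the function name and node names are hypothetical.

```python
def rolling_upgrade_order(nodes):
    """Order node names as monitor --> replicas --> primary.

    `nodes` maps a node name to its role: 'monitor', 'replica' or 'primary'.
    """
    rank = {"monitor": 0, "replica": 1, "primary": 2}
    return sorted(nodes, key=lambda name: rank[nodes[name]])


cluster = {
    "pg-1": "primary",
    "pg-mon": "monitor",
    "pg-2": "replica",
    "pg-3": "replica",
}
print(rolling_upgrade_order(cluster))
```

Upgrading the primary last means that an already upgraded replica is available to take over when the failover happens.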

Networking and Security

Considerations for networking deployment and security of the solution

How many networks are required for DSM v1.5 implementation on vSphere?

Two networks are required for DSM v1.5: a management network, where the Provider communicates with the Agents and the Agents communicate with the vCenter Server, and an application network, where end-users access the deployed database VMs. Note that there may be multiple application networks. This topology has been simplified from the three networks required for DSM v1.4.

Is DHCP required on any of the networks?

In DSM v1.5, DHCP is required on the application network for database VMs to pick up client side IP addresses. In DSM v1.4, DHCP is required on the control plane network to provide IP addresses to the database VMs (Provider and Agent must be provided with static IP addresses on this network). In DSM 1.4, DHCP is also required on the application network for database VMs to pick up client side IP addresses, same as in v1.5.

Is DNS required on any of the networks?

Yes. DNS is required on the application network when configuring clustering to create highly available databases. The different database VMs need to be able to resolve the FQDNs of the other participants of the DB cluster. The easiest way to configure DNS for DSM is to add a Conditional Forwarder to your central DNS for the DNS suffixes used by your organizations. The forwarder should point to the Provider VM, which uses systemd-resolved as its DNS resolver. The Provider keeps all components belonging to the databases in its own /etc/hosts. Any requests for resolution using the DNS suffixes used by your organizations are first sent to the central DNS, and then on to the Provider VM, which can then resolve the names. Note that in DSM v1.4, DNS is not required on the control plane network. DNS is also not needed on the application network for standalone databases.

Do the different DSM networks need to be able to route to each other?

In the case of the control plane network in DSM v1.4, it needs to be able to route to the management network. The Agent VM needs to be able to connect to vCenter to discover environments. Since the Agent is only plumbed up onto the control plane network, it needs a route to the vCenter via the control plane network.

For the application network in the case of DSM v1.4 and DSM v1.5, there are also some routing configurations. If the centralized DNS is located on, for instance, the management network, then there will need to be a route from the application network to the management network. This is because the clustering configuration relies on the FQDN of the nodes and thus will need to be able to resolve the names to IP addresses.

Similarly, if the lookup for resolving the FQDN is sent to the Provider as part of a conditional forward for resolving, then the application network will need to reach the provider on the management network (assuming that is what was added to the conditional forward configuration of the central DNS server). This will mean that the customer may need to have the DHCP server updated to provide the database VMs with some additional routing information along with the IP address.

How do I use the DNS resolver capabilities of the Provider?

The VMware DSM Provider VM has a built-in DNS resolver for the database VM names and the primary database name. You can add a conditional forwarder to your central DNS which redirects queries for the clustered database back to the IP address of the Provider VM.

Can I deploy VMware DSM in an air-gapped vSphere environment which has no connectivity to the internet?

Yes. Instructions on how to set up DSM in an air-gapped vSphere environment are in the official documentation. Put simply, administrators will need to manually populate S3 buckets with database templates and updates rather than automatically downloading them from VMware. Instructions can be found here - https://docs.vmware.com/en/VMware-Data-Services-Manager/1.4/data-services-manager/GUID-provider-manual_provrepo.html

Can customers use multiple app networks for the same agent? For example, if customers have applications hosted on different networks?

Yes. Multiple application networks can be used for the same Agent / DSM environment. Connectivity to the application network is not expected from any DSM component, and this network is solely for DB access. Hence, customers can isolate this network however they want. However, in the case of a database cluster, if VMs belonging to the same DB cluster span multiple application networks, then routing between those networks must be configured by the user for replication to work as expected.

Does DSM use a firewall? If so, which ports are open on the database VM?

These ports are open on the DB VM:

  • 22 - (eth0 and eth1)
  • 443 - (eth0)
  • 33060 - (eth1)
  • 33061 - (eth1)
  • 6446 - (eth1)
  • 6447 - (eth1)
  • 6448 - (eth1)
  • 6449 - (eth1)
  • 3306 - (eth1) - MySQL
  • 5432 - (eth1) - Postgres
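When troubleshooting connectivity, it can be useful to check the ports listed above from a client machine. The following sketch uses only the Python standard library; the table reproduces the port list from this FAQ, while the function names are hypothetical and the check only proves that a TCP connection can be opened, not that the service behind it is healthy.

```python
import socket

# Ports listed in this FAQ as open on a database VM, keyed by interface.
DB_VM_OPEN_PORTS = {
    "eth0": [22, 443],
    "eth1": [22, 33060, 33061, 6446, 6447, 6448, 6449, 3306, 5432],
}


def port_is_reachable(host, port, timeout_seconds=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False


def unreachable_ports(host, interface):
    """List the expected-open ports on `interface` that cannot be reached at `host`."""
    return [port for port in DB_VM_OPEN_PORTS[interface] if not port_is_reachable(host, port)]
```

Note that not every port will have a listener on every VM (for example, 5432 is only relevant to a PostgreSQL database), so an unreachable port is not necessarily a firewall problem.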

Are the exchanges within DSM encrypted?

For backups, it depends on the backup storage. If you use AWS S3 or MinIO S3 Object storage with TLS, then the traffic is encrypted. If you use MinIO without TLS, then it's plain HTTP traffic. Note that as of DSM v1.4, TLS is required on the S3 buckets. For replication, SSL is enabled.

Agent and DB are communicating over two-way TLS. So, this traffic is encrypted.

Client to database communication can also have TLS configured, but this is not enabled by default.

Storage

Storage FAQs.

Why do I need to configure so many S3 buckets? What are they used for?

Three buckets are for the Provisioner. The first is used to store database templates and updates downloaded from VMware. Another is used for Provisioner logs, and the final one is used as a backup location for the Provisioner.

There are 2 other S3 buckets required for database backups. One backup bucket is described as a 'local' bucket while the other is described as a 'cloud' bucket.

There is also a bucket required for each Agent as each agent should have its own unique bucket. These Agent buckets are used to hold a copy of the database templates for that agent's environment.

When configuring TLS for S3 object stores, are there any special requirements for the certificates?

Yes, the certificate needs to include a CN (Common Name). Some S3 providers such as MinIO have tools that create self-signed certificates that do not include a CN (e.g. certgen). This can lead to certificate validation/handshake errors e.g. 'NoneType' object has no attribute 'decode' when trying to store items in the buckets. We are looking to add a certificate validation step in future releases.
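Until such a validation step exists, the CN requirement can be checked ahead of time. The sketch below inspects a certificate dictionary in the shape returned by Python's `ssl.SSLSocket.getpeercert()`; the function name is hypothetical, and how you obtain the dictionary for your S3 endpoint (for example, a TLS handshake that trusts your CA) is left as an assumption.

```python
def common_name(cert):
    """Return the subject CN from a getpeercert()-style dict, or None if absent."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


with_cn = {"subject": ((("organizationName", "Example"),), (("commonName", "s3.example.com"),))}
without_cn = {"subject": ((("organizationName", "Example"),),)}
print(common_name(with_cn), common_name(without_cn))
```

A result of `None` indicates a certificate of the kind described above, which may lead to the handshake errors mentioned.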
