VMware Greenplum on Azure: Ripe and Ready for Consumption

September 29, 2022

The recent release of VMware Greenplum 6.7 on Microsoft Azure is loaded with new and improved menu items that are sure to satisfy your big data and analytics appetite. VMware Greenplum on Microsoft Azure is a mouthful, so for the rest of this article, I will refer to it as simply Greenplum on Azure. Let's sample the new additions below:

New Instance Options

Microsoft Azure offers a wide range of cloud instance types, from general-purpose to resource-optimized (CPU, Memory, Graphics, etc.). Within each instance type there is an even wider range of virtual machine sizes, or instances. The primary consideration when choosing both the right instance type and the right instance is the performance the application running on that instance requires.

Database performance depends heavily on disk I/O and memory, and sufficient available memory is critical to query processing and query performance. Greenplum on Azure is therefore best suited to Memory Optimized instances, as shown in the table below.

Instance Types

Choosing the right instance within an instance type is a little bit of art and a whole lot of science. It is often a delicate balance between cost and performance. Earlier releases supported only a small number of instances: D13_v2, D14_v2, H8, and H16. Version 6.7 of Greenplum on Azure removes support for the higher-cost, high-performance H8/H16 instances and adds support for four new lower-cost, high-performance E8/E16 instances (see the chart below).

Instance Size

The addition of these newly supported instances provides the consumer with enhanced flexibility for the following:

  • Deployment of new Greenplum on Azure clusters
  • Migrating existing Greenplum on Azure deployments
  • Scaling existing Greenplum on Azure deployments
    • Scaling combines deploying a new, scaled-out cluster with migrating to that new cluster, because instance types cannot be mixed within a cluster

 

The current list of supported instances allows a consumer to choose the instance size that optimizes cost and satisfies the initial performance requirements of the intended use case. As those requirements change over time, a consumer has the flexibility to scale out the existing deployment or migrate to other supported instances.

Network Deployment Flexibility

Prior to version 6.7, each Greenplum on Azure cluster had to be deployed to a dedicated Azure Virtual Network. That network architecture increased the operational burden, as it required virtual network peering or a custom routing solution to allow existing Azure cloud resources to communicate with each Greenplum on Azure cluster deployment.

Different Subnet

Greenplum on Azure 6.7 enables the deployment of multiple Greenplum clusters into the same virtual network with different subnets. Azure cloud resources and Greenplum clusters that reside in the same virtual network but different subnets can communicate by default, requiring no additional or special network constructs. This makes it easier to integrate Greenplum clusters with cloud applications and services running on these existing virtual networks.
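As a rough illustration of the layout described above, the following Python sketch checks that each cluster's subnet lies inside the shared virtual network and that no two cluster subnets overlap. The function name, cluster names, and address ranges are hypothetical and chosen only for illustration; this is not part of the Greenplum on Azure tooling.

```python
import ipaddress
from itertools import combinations


def check_cluster_subnets(vnet_cidr, cluster_subnets):
    """Return a list of problems; an empty list means the layout is consistent.

    vnet_cidr       -- address space of the shared virtual network (hypothetical)
    cluster_subnets -- dict mapping a cluster name to its subnet CIDR (hypothetical)
    """
    vnet = ipaddress.ip_network(vnet_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cluster_subnets.items()}
    problems = []

    # Every cluster subnet must sit inside the virtual network's address space.
    for name, net in nets.items():
        if not net.subnet_of(vnet):
            problems.append(f"{name}: {net} is outside virtual network {vnet}")

    # Each cluster gets its own subnet, so no two subnets may overlap.
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} and {b}: {net_a} overlaps {net_b}")

    return problems


if __name__ == "__main__":
    layout = {"gp-cluster-1": "10.0.1.0/24", "gp-cluster-2": "10.0.2.0/24"}
    print(check_cluster_subnets("10.0.0.0/16", layout))
```

A check like this could run before a deployment is submitted, so that address conflicts surface early rather than partway through the deployment.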

Same Subnet

Static IP Address Support

Greenplum on Azure uses an Azure Resource Manager (ARM) template to automate the deployment of a cluster in the Azure cloud. To execute a deployment, the user must provide a few network details as shown in the configuration parameters screenshot below.

Network Parameters

In previous releases, once a deployment was executed, each master and segment node obtained an IP address dynamically. Furthermore, that dynamically assigned IP address could change over time as a result of various operations on the cloud instances. This automatic addressing mechanism increased the administrative burden, especially for deployments with multiple Greenplum clusters.

As mentioned in the "Network Deployment Flexibility" section, Greenplum on Azure 6.7 requires each Greenplum cluster to be deployed into a dedicated subnet. Static IP addresses are automatically configured for all instances within the deployment: the master node is allocated the .4 address of the subnet, and segment node addressing starts at the .5 address of the subnet. As a result, all Greenplum on Azure cluster deployments have standardized, consistent network configurations, which makes them much easier to administer.
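The addressing scheme can be sketched in a few lines of Python. This is a minimal illustration, not the logic of the ARM template itself: the function name is hypothetical, and it assumes that "the .4 address" means the fourth address after the subnet's network address. That reading fits the fact that Azure reserves the first four addresses and the last address of every subnet.

```python
import ipaddress


def cluster_addresses(subnet_cidr, segment_count):
    """Return (master_ip, [segment_ips]) for a cluster in a dedicated subnet.

    Assumption: the master takes offset 4 from the network address and the
    segment nodes take consecutive offsets starting at 5.
    """
    subnet = ipaddress.ip_network(subnet_cidr)
    base = subnet.network_address

    # The last segment lands on offset 4 + segment_count; it must stay
    # below the final (broadcast) address of the subnet.
    last_offset = 4 + segment_count
    if last_offset >= subnet.num_addresses - 1:
        raise ValueError(f"{subnet} is too small for {segment_count} segment nodes")

    master = str(base + 4)
    segments = [str(base + 5 + i) for i in range(segment_count)]
    return master, segments


if __name__ == "__main__":
    master, segments = cluster_addresses("10.0.1.0/24", 3)
    print("master:  ", master)
    print("segments:", segments)
```

Because the addresses follow directly from the subnet, an administrator can predict every node's address for every cluster without looking anything up.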

Static IP

Object Storage Flexibility

Greenplum on Azure uses a number of tools to automate the deployment, management, and execution of a Greenplum cluster in the Azure cloud. The gprelease utility upgrades a Greenplum cluster to the latest minor release available. It automatically downloads the software binaries, copies them to the instances in the Greenplum cluster, stops the cluster, installs the newly downloaded version, and then starts the cluster again. In versions prior to 6.7, an Amazon S3 bucket was required as the centralized container for the software binaries.

S3 Storage

In version 6.7, the gprelease utility lets you specify an Azure Blob Storage endpoint as the centralized container for software binaries. If a Blob Storage endpoint is not specified, the default S3 endpoint is required. Using Azure Blob Storage for software updates simplifies the configuration and allows all updates to be executed and maintained within a single cloud environment. VMware is responsible for providing and maintaining the binaries and packages in S3. Using Azure Blob Storage, however, requires the user to copy the binaries and packages from the default S3 bucket to the Azure Blob Storage container, and the contents of the container must remain synchronized with the contents of the S3 bucket. While this increases the administrative effort of maintaining object storage, the added effort becomes negligible when scripting and automation keep the two in sync.
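The core of such a synchronization script is a comparison of the two listings. The Python sketch below shows only that comparison step, under stated assumptions: each listing has already been fetched into a dictionary that maps an object name to a checksum, and the function name and file names are hypothetical. The actual listing and copying would go through whatever S3 and Azure Blob Storage tooling you already use.

```python
def plan_sync(s3_objects, blob_objects):
    """Decide what must change so the Blob container mirrors the S3 bucket.

    Both arguments are dicts of {object_name: checksum} (hypothetical listings).
    Returns (to_copy, to_delete) as sorted lists of object names.
    """
    # Copy anything missing from the container or whose checksum differs.
    to_copy = sorted(
        name for name, checksum in s3_objects.items()
        if blob_objects.get(name) != checksum
    )
    # Remove anything in the container that no longer exists in the bucket.
    to_delete = sorted(name for name in blob_objects if name not in s3_objects)
    return to_copy, to_delete


if __name__ == "__main__":
    # Hypothetical listings, for illustration only.
    s3 = {"package-a.rpm": "c1", "package-b.rpm": "c2", "package-c.rpm": "c3"}
    blob = {"package-a.rpm": "c1", "package-b.rpm": "old", "package-z.rpm": "c9"}
    to_copy, to_delete = plan_sync(s3, blob)
    print("copy:  ", to_copy)
    print("delete:", to_delete)
```

Run on a schedule, a plan like this keeps the work incremental: only new or changed packages are copied, and stale ones are flagged for removal.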

Azure Storage

Resource Management Flexibility

Prior to version 6.7, each Greenplum on Azure cluster had to be deployed in a dedicated, empty resource group. The primary benefit of resource groups is that they group resources that will be managed collectively. This simplifies operational tasks such as applying permissions and managing resource lifecycles. Requiring a dedicated, empty resource group per Greenplum cluster therefore increased the operational burden of managing those resources.

Multiple RGs

In version 6.7, Greenplum on Azure clusters can be deployed to existing, non-empty resource groups. This allows the consumer to deploy a cluster in the resource group of their choosing, and the new Greenplum resources inherit the policies and permissions of that resource group. Deploying Greenplum clusters into existing resource groups lessens the operational burden of managing these new resources.

Single RG

Conclusion

Assuming the above items did not completely fill you up, check out the following link to review other feature enhancements, updates, and consumption details included in the latest release of Greenplum on Azure. This new version is loaded with new and improved features that make it easier to deploy for many, more flexible to manage for most, and more performant for all. Don't wait… migrate to VMware Greenplum 6.7 on Microsoft Azure today!

Enjoy!
