vSphere Bitfusion 4.0.0 Delights Critics — Hollywood raves, "A Virtual GPU Pool Party", "All Secrets Revealed", "A Romp!"

September 01, 2021

I miss the deprivations of the past. I am not alone. I read an author nostalgic for fruit being out of season…because it was so special when it was in season. He noted that while blueberries still have a season when they are cheapest and best, there is never a time when you can't get them at all. 

For myself, I miss movies that you could only see once a year when they were on TV — on one of three American broadcast channels. It's hard to cherish a movie when it is always available. I don't want to watch something if I can just as well watch it next week.

But this blog is about six features and updates of the Bitfusion 4.0.0 release that I would miss if they went away, and also about six movies that used to be on TV in the '60s and '70s.


VMware's vSphere Bitfusion allows you to create pools of GPU servers from which remote GPU services can be dynamically allocated to client VMs running AI/ML applications.


Shall I compare thee to a summer's day?
No, I shall compare thee to a clickbait headline.

Six movies Bitfusion barred from viewing, until now…number three will drive you mad!

Dark Mode — "Toto, I've a feeling we're not in Kansas anymore"

The Wizard of Oz is the ur-movie of annually anticipated movies. The first time I watched it, at age 6, we went to the neighbors' house because they had a color TV, and we could enjoy the transition from black-and-white Kansas to living-color Munchkinland. Although it is a children's favorite with a happy ending, its dark scenes (scary forests, flying monkeys, the Wicked Witch of the West) scared a lot of young fans. But now, Bitfusion has a dark side, too…

In a GUI, light mode displays dark text on a light background, and dark mode displays light text on a dark background. Internet sites disagree about GUI dark mode. Save energy! Reduce eye strain in low light! Passing fad! But vCenter has it, and with release 4.0.0, the Bitfusion plug-in supports it too.

These screen shots are all the explanation you need to form your own opinion, go online, and argue with the opposing tribes.

Bitfusion plug-in themes. From left to right: dark mode, light mode, and the user menu to switch themes.

Filters — "Howard has had discussions with Leonard Bernstein about the possibility of conducting an avalanche…in E flat"

1972 brought a late entry to the category of annually broadcast movies. But it was brilliant. What's Up, Doc? delivered a screwball comedy with odd characters, clever dialog, and chaos that built and built to a fantastic, hilarious resolution. This is incredibly difficult to do. Every choice requires careful filtering. The story has to proceed through the most careful contradictions: logical yet impossible, believable yet ridiculous. If you credit the construction, you'll covet the collapse. Now, Bitfusion allows you to filter your GPU choices…

1972 Screwball comedy, What's Up, Doc?

Comedy gold or deadly dross? You can filter the funny from the flop by how organically the chaos is set up.

Many customers ask me how to allocate particular GPU types with a Bitfusion run command. Until this release, I told them to populate each server with a single GPU family and then use the -l switch to supply a list of the servers with the desired family. Bitfusion 4.0.0 now has a direct and expanded filtering capability: you can start applications on GPUs and servers with specific properties.

Here is a command that allocates one GPU and starts an application, without any filters yet (hello_gpu.py is a fictional ML app for example purposes).

bitfusion run -n 1 -- python3 hello_gpu.py --meaning_lf 42

Now, let us introduce a filter so that hello_gpu.py runs only on GPUs belonging to the T4 family.

bitfusion run -n 1 --filter 'device.name=T4' -- python3 hello_gpu.py --meaning_lf 42

With this filter, we limit our search to Bitfusion servers whose hostnames contain the substring "Turing".

bitfusion run -n 1 --filter 'server.hostname=Turing' -- python3 hello_gpu.py --meaning_lf 42

To filter for a minimum amount of GPU framebuffer memory (in MB):

bitfusion run -n 1 --filter 'device.memory>=16384' -- python3 hello_gpu.py --meaning_lf 42

You can filter for servers that have RDMA capability, and you can apply more than one filter at a time.

bitfusion run -n 1 --filter 'server.has-rdma=true' --filter 'device.name=Tesla A100' -- python3 hello_gpu.py --meaning_lf 42

I recommend enclosing each filter expression in quotes so you don't have to chase down errors caused by the shell processing spaces or special characters.
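To see why the quotes matter, the sketch below stands in a hypothetical count_args shell function for bitfusion run; it only reports how many separate arguments the shell hands it. It is an illustration of shell word splitting, not part of Bitfusion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many separate arguments the shell passes in.
count_args() { echo "$#"; }

# Unquoted, the space in the device name splits the filter value in two:
count_args --filter device.name=Tesla A100      # prints 3
# Quoted, the whole filter expression arrives as a single argument:
count_args --filter 'device.name=Tesla A100'    # prints 2

# Unquoted, device.memory>=16384 would be worse: the shell would read ">" as an
# output redirection to a file named "=16384" (not executed here).
```

With quotes, bitfusion run sees --filter followed by exactly one expression, which is what the multi-filter example above relies on.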

You can also filter on device capability and index number, as well as on server address, CUDA version, and driver version. The 4.0.0 User Guide gives complete filter details.

Kubernetes Secrets — "It's buried under a big 'W'"

More than 20 big-name comedians and major stars appeared in It's a Mad, Mad, Mad, Mad World. The cast count reached 50 even before you got to folks who didn't make the credits (including Jack Benny!). As a group of strangers helps a man in a hurry who has driven his car off a cliff, he tells them about some buried money he was heading for, just before he kicks the bucket. More and more of the ensemble cast is drawn into the secret, the mayhem, and the chase for the cash. My brother and I, holed up in our room watching it while my parents hosted a party one evening, made some money of our own off this movie. Several male party guests burst in and paid us to switch channels for the end of a basketball game. We failed to keep this secret from my parents. Now, Bitfusion coughs up its secrets to Kubernetes and smooths its orchestration…

Kubernetes allows you to pass "secrets", objects holding sensitive data such as passwords or tokens, into a runtime environment without openly putting this information into a pod specification or building it into a container image. Bitfusion clients need a token to access GPU resources on Bitfusion servers. The 4.0.0 release can now publish its tokens as Kubernetes secrets.

Enabling the pods to access Bitfusion servers is easy work (assuming you have already set up a working Kubernetes cluster — check out the vSphere with Tanzu Quick Start Guide).

You can find the complete "Kubernetes secrets" instructions in the second half of the Enabling Client page of the Bitfusion Installation Guide.

If you are new to Kubernetes, however, the following diagram might make these instructions clearer. 

The process to create a Bitfusion token as a Kubernetes secret

The left box (orange) is the machine the administrator uses to manage the Kubernetes cluster (e.g., running kubectl commands). This machine has a Kubeconfig file specifying the Kubernetes cluster's URL and the credentials (such as certificate data) needed to access it. vCenter Server needs this file to publish secrets on the cluster. From the administrator's machine, browse to vCenter Server and navigate to the "Tokens" tab of the Bitfusion plug-in. From there, you can upload the Kubeconfig file and complete the remaining steps to create and publish the token. Again, these steps are detailed in the Bitfusion Installation Guide. Finally, you have to edit your pod's YAML file to map to the secrets you have just published; the Installation Guide describes this mapping as well.
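For readers new to Kubernetes, here is a minimal sketch of the generic way a pod consumes a secret: it mounts the secret as a read-only volume. This is not the Bitfusion mapping itself. The function pod_spec_with_secret, the pod name bitfusion-client, the image my-ml-image, the volume name bitfusion-token, and the secret name and mount path you pass in are all hypothetical placeholders; the Installation Guide gives the names and paths Bitfusion actually expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a minimal pod spec that mounts a Kubernetes secret
# as read-only files. $1 = secret name, $2 = mount path inside the container.
pod_spec_with_secret() {
  local secret_name=$1 mount_path=$2
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: bitfusion-client          # hypothetical pod name
spec:
  containers:
  - name: app
    image: my-ml-image:latest     # hypothetical image
    volumeMounts:
    - name: bitfusion-token       # must match the volume name below
      mountPath: $mount_path
      readOnly: true
  volumes:
  - name: bitfusion-token
    secret:
      secretName: $secret_name    # the secret published from vCenter
EOF
}
```

For example, `pod_spec_with_secret my-bitfusion-token /etc/my-token | kubectl apply -f -` would create the pod, since `kubectl apply -f -` reads the manifest from standard input.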

Data Retention Policy — "I'm half-horse, half-alligator, a little touched with the snapping turtle"

The full quote, a brag from Davy Crockett: King of the Wild Frontier, just gets better. This movie was often broadcast in three parts, one hour each Sunday evening, on reruns of The Wonderful World of Disney. Certainly a hagiography of the American folk hero, it was a fun one; and a boy needs heroes. With its fatal last stand at the Alamo, it instilled in my psyche the importance of bravery, of retaining your principles even at the cost of your life (at age seven, this is a pretty casual commitment). Now, Bitfusion retains its GPU usage and allocation data under policies you dictate, at an acceptable cost…

[OMINOUS MUSIC] As clients allocate GPUs, access their memory, and use their cores, Bitfusion is watching. Bitfusion is recording. Bitfusion remembers… [MUSIC SCREECHES TO A HALT] Well, Bitfusion servers do keep a database of metrics and usage records; that is what they are supposed to do. However, storage can fill up, so Bitfusion throws old data away to make room for new metrics. Bitfusion also keeps summarized data, which consumes less space than the full, detailed records. With release 4.0.0, Bitfusion gives you control over the retention policies. You can:

  • specify how many days the full, detailed data is kept
  • specify how many days the summary data is kept

These settings are shown below.

Bitfusion data retention policy settings

  • When an event occurs, both detailed and summary database entries are created. The countdown to deletion day starts immediately for both entries; they run concurrently.
  • The Summary Usage Data checkbox places a retention limit on summary data. If it is unchecked, summary data is never deleted.
  • Perhaps obviously, we recommend keeping the summarized data for more days than the detailed data.
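A small worked example makes the "concurrent countdowns" rule concrete. The sketch below uses a hypothetical retention_schedule function with day numbers rather than dates; passing an empty summary value to model an unchecked Summary Usage Data box is a convention of this sketch, not of the plug-in.

```shell
#!/usr/bin/env bash
# Hypothetical worked example of the retention rules above.
# $1 = day the event is recorded, $2 = detailed-data days, $3 = summary-data days.
# Both countdowns start on the event day; an empty $3 models the unchecked box.
retention_schedule() {
  local event_day=$1 detailed_days=$2 summary_days=$3
  echo "detailed: deleted on day $((event_day + detailed_days))"
  if [ -n "$summary_days" ]; then
    echo "summary: deleted on day $((event_day + summary_days))"
  else
    echo "summary: never deleted"
  fi
}

retention_schedule 10 30 365   # summary goes on day 375, not 10 + 30 + 365 = 405
retention_schedule 10 30 ""    # summary entries are kept indefinitely
```

Because the countdowns overlap, the summary setting is measured from the event, not from the day the detailed record disappears; that is why a summary value at or below the detailed value buys you nothing.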

Monitoring Plug-ins — "I can't fly. I haven't got my wings."

Being a Christmas movie, It's a Wonderful Life is the one movie on this list that is still broadcast once a year. And I watch it almost every year. George Bailey has never been able to escape small-town duties and family to pursue the dreams and adventures he has longed for since childhood. Crushed by the incompetence of his fellows and the perfidy of a scheming, rich, and powerful enemy, and facing bankruptcy, scandal, and prison, the despondent George contemplates suicide. He is rescued by an angel who has monitored his whole life and shows him how much worse the world would have been without his kindness, goodness, and integrity. Plus Christmas. Now, Bitfusion makes it easy to monitor its servers. Will it make your life wonderful…?

Bitfusion servers now come with the Linux monitoring-plugins package pre-installed. It contains more than fifty standard plugins for Icinga, Naemon, Nagios, Shinken, Sensu, and other monitoring applications. Each plugin is a stand-alone command-line tool that performs a specific type of check. Typically, your monitoring software runs these plugins to determine the current status of hosts and services on your network. Some of the plugins check local system metrics such as load averages, processes, or disk space usage.

To use these monitoring tools, you must log in to each Bitfusion server and set a password for the "monitoring" user account.

export bfm_ip="[IP of BFM VM]"   # replace with the Bitfusion server's IP address
ssh customer@$bfm_ip
# The next commands run on the Bitfusion server
sudo passwd monitoring
# Now enter password as prompted
exit

Next, from the account and machine that you use for monitoring, copy your public key into the monitoring user's authorized_keys file on the Bitfusion server. Note that this scp command replaces any keys already in that file.

scp ~/.ssh/id_rsa.pub monitoring@$bfm_ip:~/.ssh/authorized_keys

And now you can run a quick test. On the machine you use for monitoring, install the monitoring plugins and use the check_by_ssh plugin to run check_disk on the Bitfusion server.

# On Ubuntu 20.04
# Install the monitoring plugins (this provides check_by_ssh; it does not install Nagios itself)
sudo apt install -y monitoring-plugins

# Run check_disk on the Bitfusion server over SSH
/usr/lib/nagios/plugins/check_by_ssh -H $bfm_ip -l monitoring -C '/usr/libexec/check_disk --units GB --critical 15 -p /'
# Sample output:
# DISK OK - free space: / 36 GB (78% inode=98%);| /=10GB;;34;0;49
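The text after the "|" in that output is performance data, which monitoring tools graph and compare against thresholds. Nagios-style plugins format each item as label=value[UOM];warn;crit;min;max, with empty fields allowed. The parse_perfdata function below is a hypothetical helper (not part of the monitoring-plugins package) that splits a single-item line into those fields. Read this way, the sample reports 10 GB used on / out of 49 GB, with a critical threshold of 34 GB used, which is consistent with 49 GB minus the 15 GB of free space passed to --critical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split the first performance-data item of a plugin's
# output into label, value, warn, crit, min, and max (empty fields stay empty).
parse_perfdata() {
  local perf label rest value warn crit min max
  perf=${1#*'|'}     # keep only the text after the first "|"
  perf=${perf# }     # drop a single leading space, if present
  label=${perf%%=*}  # everything before the first "="
  rest=${perf#*=}    # value;warn;crit;min;max
  IFS=';' read -r value warn crit min max <<EOF
$rest
EOF
  printf 'label=%s value=%s warn=%s crit=%s min=%s max=%s\n' \
    "$label" "$value" "$warn" "$crit" "$min" "$max"
}

parse_perfdata 'DISK OK - free space: / 36 GB (78% inode=98%);| /=10GB;;34;0;49'
# prints: label=/ value=10GB warn= crit=34 min=0 max=49
```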

Supported Versions — "As God is my witness, I'll never be hungry again"

This may be the movie that defined the word "epic." Gone with the Wind spanned history and geography, love and war, family, comedy, and tragedy. Complicated characters, neither truly heroes nor truly villains, struggle through the Civil War in the American South. What can they support, and what should they support, as everything is swept away? Now, Bitfusion updates its support matrix, but just with deprecation notices…

Bitfusion 4.0.0 removes support for Ubuntu 16.04.  It is gone with the wind.

Bitfusion, in a coming release, will deprecate support for clients running Bitfusion 2.x.x. Soon it will be gone, too.

Bitfusion continues support for CUDA 11.2.2, cuDNN 8.1.1, and NCCL 2.8.4.

Conclusion — "You didn't forget me, did you?"

Were I to include a seventh movie in my list, it would be Heidi (1968). Perhaps not an all-time great, but it is a wonderful story with a great cast (and John Williams did the music). It was the first movie I let my older sister talk me into watching even though I thought it would be stupid, and it ended up teaching me that there were marvelous things to enjoy that I hadn't imagined before. She did that for me more than once. Of course, this was the same sister who prevented me from watching Jaws and Psycho, so her record wasn't perfect. Now, a quick recap of Bitfusion's new facilities…

I'm most happy about the new filtering capabilities, having anticipated them for so long, but Kubernetes secrets, monitoring, and retention policy management are all excellent updates. And with the new theme, there's no longer any reason to fear the dark.

James Brogan

James Brogan started his career as an ASIC designer on mainframe computers. Since then, he has worked at multiple software and silicon startups in engineering and field roles on products such as virtual system prototypes, multi-core processors, network processors, and software analysis and optimization. He came to VMware with the acquisition of Bitfusion, where he brought GPU virtualization solutions to customers and homemade chocolate chip cookies to co-workers. He continues working with AI/ML acceleration and virtualization at VMware as a Sr. Technical Marketing Architect. He dreams of getting past the beginner level on guitar, and enjoys reading and public speaking.