Providing machine readable VMware release data

Update [2022-10-17]: The VMware KnowledgeBase APIs have introduced a breaking change, so the release data can no longer be updated without further development.

I started a project with the goal of reformatting VMware product release information so that it can be used in automation and scripting. These days, even Excel can handle JSON. When it comes to new projects, I often need an external trigger to take the first step. In this case, a customer I have been working with gave me the reason to get started (thanks, Tim!).

TL;DR:

Code repo: Transform VMware product builds
Output repo: Machine-readable VMware release data

Motivation

VMware publishes a KB article titled Correlating build numbers and versions of VMware products (1014508), which links to a set of pages containing release information for various products. However, that information is only offered as HTML tables, which are not well suited for automation.
I saw this project as a way to extend my Python practice. After finishing the bulk of the project, however, I discovered that Florian over at virten not only has a great blog but also does a very good job of providing a set of JSON files with release information, especially for ESXi. This project is not meant to be a copycat; I hope its added benefit lies in trying to cover all products in the overview KB and thereby reaching a broader audience.

The workflow

In the primary repository, Transform VMware product builds, the data from the master KB (1014508) is parsed.
For each KB article listed in the “resolution” table, an object holding the key data is created.
Finally, the output is written to JSON in various data orientations.
A scheduled GitHub Action executes the code once a day and pushes the output to a second repository, Machine-readable VMware release data.
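The parsing step and the per-article objects can be sketched with Python's standard-library `html.parser`. This is a minimal sketch, not the project's actual code: the row layout of the resolution table (product name in the first cell, a link to the KB article in the second), the `KbArticle` fields, and the KB ID and URL in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class KbArticle:
    """Key data for one KB article from the master KB (hypothetical fields)."""
    product: str
    kb_id: str
    url: str


class ResolutionTableParser(HTMLParser):
    """Collect one KbArticle per table row shaped like
    <tr><td>Product</td><td><a href="...">KB id</a></td></tr>.
    This row layout is an assumption, not VMware's documented markup."""

    def __init__(self) -> None:
        super().__init__()
        self.articles: list[KbArticle] = []
        self._cells: list[str] = []
        self._href = ""
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # New row: forget the cells and link of the previous row
            self._cells, self._href = [], ""
        elif tag == "td":
            self._in_cell = True
            self._cells.append("")
        elif tag == "a" and self._in_cell:
            self._href = dict(attrs).get("href") or ""

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and len(self._cells) >= 2 and self._href:
            # Header rows use <th>, so they never collect cells and are skipped
            product, kb_id = (cell.strip() for cell in self._cells[:2])
            self.articles.append(KbArticle(product, kb_id, self._href))

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data


# Hypothetical markup with a made-up KB ID and URL
html = """
<table>
  <tr><th>Product</th><th>KB</th></tr>
  <tr><td>VMware ESXi</td><td><a href="https://kb.example.com/1111111">1111111</a></td></tr>
</table>
"""
parser = ResolutionTableParser()
parser.feed(html)
print(parser.articles)
```

Sticking to the standard library keeps the sketch dependency-free; `pandas.read_html` is an alternative when only the cell text is needed, but it does not return the link targets by default.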

Output format and folder structures

The output is structured like this:

- Directory: named after the Pandas option for the [JSON data orientation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html)
    - Files: KB(a)_(b)_table(c)_release_as_(d)
        - a: knowledge base article ID, the unique ID of the KB article
        - b: product name, the first product name from the metadata, in lower case with spaces replaced by underscores
        - c: an ID to distinguish multiple HTML tables in the KB article (starting at 0)
        - d: JSON data orientation (see above)
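The naming scheme can be expressed as a small helper. This is a sketch under assumptions: the function name `output_path`, the `.json` extension and the exact set of orientations written by the project are not stated in the repository description; the tuple below simply lists the values that `pandas.DataFrame.to_json` accepts for `orient`.

```python
# Values accepted by pandas.DataFrame.to_json(orient=...); which of them the
# project actually writes is an assumption.
ORIENTATIONS = ("split", "records", "index", "columns", "values", "table")


def output_path(kb_id: str, product: str, table_no: int, orient: str) -> str:
    """Build '<d>/KB<a>_<b>_table<c>_release_as_<d>.json' (hypothetical helper)."""
    if orient not in ORIENTATIONS:
        raise ValueError(f"unknown orientation: {orient}")
    # b: product name in lower case, spaces replaced by underscores
    product_slug = product.strip().lower().replace(" ", "_")
    return f"{orient}/KB{kb_id}_{product_slug}_table{table_no}_release_as_{orient}.json"


# Hypothetical KB ID
print(output_path("1111111", "VMware ESXi", 0, "index"))
# → index/KB1111111_vmware_esxi_table0_release_as_index.json
```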

Example

This is an excerpt from a JSON file that uses the build number as the key:

    "17551050": {
        "Version": "ESXi 7.0 Update 1d",
        "Release Name": "ESXi 7.0 Update 1d",
        "Release Date": "2021-02-04T00:00:00.000Z",
        "Installer Build Number": null
    }

In case you are wondering why I do not store the build number as a number: JSON requires object keys to be strings.
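A short sketch shows where the string keys come from. Assuming the excerpt above was produced with `DataFrame.to_json(orient="index")` (the orientation in which index values become object keys) and `date_format="iso"`, an integer build number in the index ends up as a string key. The one-row DataFrame is rebuilt from the excerpt; the project's real column handling may differ.

```python
import json

import pandas as pd

# One row rebuilt from the excerpt above
df = pd.DataFrame(
    {
        "Version": ["ESXi 7.0 Update 1d"],
        "Release Name": ["ESXi 7.0 Update 1d"],
        "Release Date": pd.to_datetime(["2021-02-04"]),
        "Installer Build Number": [None],
    },
    index=[17551050],  # the build number is an integer in the DataFrame index
)

# orient="index" turns each index value into a JSON object key;
# date_format="iso" writes ISO 8601 timestamps instead of epoch milliseconds
text = df.to_json(orient="index", date_format="iso")
data = json.loads(text)

print(list(data))  # → ['17551050']
print(data["17551050"]["Version"])  # → ESXi 7.0 Update 1d

# The standard library behaves the same way: integer keys are serialized as strings
print(json.dumps({17551050: "ESXi 7.0 Update 1d"}))  # → {"17551050": "ESXi 7.0 Update 1d"}
```

Consumers that need a numeric build therefore have to convert the key back, for example with `int(key)`.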

Hurdles

As always, the most difficult part is dealing with exceptions. I started testing with three KB articles but later found that my model did not fit every product release: the information, headings and data formatting varied between articles. There are still gaps that need to be addressed; the biggest one is the complexity of the VCF release table.
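One common way to cope with varying headings is to map them onto a canonical set of column names before further processing. The sketch below is illustrative only: the alias table is hypothetical and does not list the actual variations found in the KB tables.

```python
import pandas as pd

# Hypothetical heading variants mapped to canonical column names
COLUMN_ALIASES = {
    "version": "Version",
    "release name": "Release Name",
    "release date": "Release Date",
    "date": "Release Date",
    "build number": "Build Number",
    "build": "Build Number",
}


def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename differently worded headings to one canonical set of names."""
    def canonical(name) -> str:
        key = " ".join(str(name).split()).lower()  # collapse whitespace, ignore case
        return COLUMN_ALIASES.get(key, str(name).strip())

    # DataFrame.rename accepts a function that is applied to every column label
    return df.rename(columns=canonical)


raw = pd.DataFrame({"Release  Date": ["2021-02-04"], "Build": ["17551050"]})
print(list(normalize_columns(raw).columns))  # → ['Release Date', 'Build Number']
```

Unknown headings pass through unchanged, so a new variant shows up as an unexpected column rather than silently disappearing.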

Takeaway

It’s been a small, fun project that will hopefully save someone a bit of time and effort. At the very least, I gained some solid experience in handling data with Pandas. As always, GitHub Actions has been a good friend for setting up automation very quickly.