Can you performance tune this workbook?

This week’s #WOW2021 challenge was a bit different. Rather than being challenged to recreate a viz, Candra challenged us to performance tune a workbook she’d put together, with the aim of getting every step under 0.5s.

I started by downloading the slow published workbook (waiting patiently for it to download), opening the workbook (waiting patiently for it to open), and then assessing how it had been built.

It took a while to download… it was 99MB…

When it finally opened, I could see that:

  • The workbook had multiple data sources, which were being blended
  • There were cross data source filters being applied
  • There were lots of fields in the data source that weren’t being used
  • One of the data sources contained 2.2 million rows

Tackling these points first was key.

The requirements stated that the end user only cares about OB/GYNs (specifically Pri Spec = OBSTETRICS/GYNECOLOGY), so the hint here was that we’re allowed to restrict the size of the data source.

The requirements also stated any data prep/transformation had to happen in Tableau Desktop, hinting that some re-modelling would be required.

This means you do need to download the original data source files (which also took some time). My aim was to create a single data source, restricted to the OB/GYN data only, that I could then repoint the existing report to.

I created a new connection within the ‘slow’ workbook by connecting to the two files and relating them – the relationship of St to St should happen automagically 🙂

I then added Data Source Filters (right-click on the data source > Edit Data Source Filters), setting them to restrict to the relevant Pri Spec and to exclude certain states (which I found by looking at the data source filters applied to one of the original data sources).
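As an aside, if you like to see filter logic written down, the combined effect of those filters is roughly the boolean calculation below. The state codes are placeholders only (the real exclusions come from the original data source’s filters), and the IN operator needs Tableau 2020.3 or later:

// Sketch of the data source filter logic (not a field in the workbook)
// "XX" and "YY" are placeholder state codes
[Pri Spec] = "OBSTETRICS/GYNECOLOGY"
AND NOT ([St] IN ("XX", "YY"))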

Next I repointed the workbook to use this new data source: right-click on the original DAC_NationalDownloadableFile data source > Replace Data Source, and repoint to the DAC_NationalDownloadableFile.csv+ (Multiple Connections) data source.

This caused an issue with the NPI field, which was a string in the original and a number in the replacement.

Resolve this by right-clicking on the NPI (1) field, selecting Replace References, and pointing to the NPI field. This then fixes the issue with the Providers per Capita field too.

On the Providers per Capita sheet, remove both pills from the Filter shelf. Then replace the Population field on the Tooltip shelf that comes from the secondary data source (you can tell because it has an orange tick against it) with the Population field from the newly created data source (literally drag it and drop it onto the existing pill so it replaces it). You’ll need to format the Population field to millions with 1 decimal place.
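If you do this with a custom number format, something along the lines of the string below should work; Tableau accepts Excel-style format codes, where each trailing comma scales the value down by a thousand (so two commas show millions). Treat it as a sketch and check the result in the tooltip:

#,##0.0,," M"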

Edit the Providers per Capita measure from

round(COUNTD([NPI])/(SUM([Sheet1 (State Regions)].[Population])/100000),0)

to

COUNTD([NPI])/(SUM([Population])/100000)

This pill is no longer referencing data from another data source. The ROUND isn’t necessary as this can be managed with the number formatting (so it’s one less calculation to be done and shaves a teeeeeny bit of time).
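To make the maths concrete with some made-up numbers: if the filtered data held 1,200 distinct NPIs and the population summed to 6,000,000, the measure would evaluate as

1,200 / (6,000,000 / 100,000)
= 1,200 / 60
= 20 providers per 100,000 people

and formatting the result to 0 decimal places handles the rounding that ROUND was doing.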

You can now remove the original two data sources (right-click the data source > Close).

Now extract the remaining data source: right-click > Extract Data. The data source filters you applied will automatically be set as extract filters. Also set the extract to Hide Unused Fields. The extract filters reduce the number of rows in the data source; hiding unused fields reduces the number of columns, which optimises the data source as much as it can be.

Change to the dashboard sheet now.

Remove the Primary Speciality filter as it’s no longer relevant (it only contains one value).

I chose to change the City filter to ‘All values in database’ and to be a Wildcard Match input, so the user types a value in rather than selecting from a list. Using a field with a high number of values as a filter and displaying all the possible values can impact performance, as a separate query needs to run just to populate the list. Using ‘relevant values’ instead causes that query to re-run each time selections are made in other filters/actions, which also adds overhead.

For the same reasons, I changed the NPI filter in the same way.

I also changed the Gender filter to be a Single Value List (i.e. radio buttons), set to Include rather than Exclude values; include filters generally perform better than exclude filters in Tableau.

With these changes I ran the performance recording (Help > Settings and Performance > Start Performance Recording), followed the required steps (select Texas, deselect Texas, set Gender to F, reset back to All), then stopped the performance recording (Help > Settings and Performance > Stop Performance Recording). This automatically loads a new workbook with the results.

The best I got on my personal laptop is below – not the 0.5s suggested in the challenge, but a lot better than it was.

Running the same process against the same workbook on my more powerful work laptop, I got:

which suggests hardware does help 🙂

After a bit of discussion with another #WOW participant, it was suggested that reordering the columns in the table (just reversing the order) should help. I tried this and shaved off a bit more time (this is on my personal laptop, so compare with the first image above).

I’m interested in seeing what other solutions and suggestions get shared and what Candra came up with herself.

Happy vizzin’! Stay Safe!

Donna
