Continuing on from last week’s challenge, Candra Mcrae set this #WOW2021 challenge to introduce another new feature of v2020.4 (so you’re going to need this version as a minimum to complete this challenge). Predictive modelling isn’t something I use at all, so I was quite reliant on the help documentation Candra referenced. I’m also not overly familiar with the various different models, and when they should be used, so this isn’t something I’m likely to use personally, without careful consideration and thought – it’s worth reiterating Candra’s warning :
Viz Responsibly: While predictive analysis in Tableau has never been easier, we encourage you to be intentional in how you use this powerful feature because not doing so (e.g., selecting a model type not appropriate for your data) could lead to inaccurate results.
The focus of this blog will be
- Building the main chart
- Creating the tooltips
- Determining the data for the title
- Building the measure selector
- Adding the measure selector interactivity
Building the main chart
The data provided has some records related to years before 1993, but the requirement is just to use data from 1993 onwards, so the first thing I did was to set a data source filter (right click on data source -> Edit Data Source Filters) to restrict the whole data source to Year 1 >=1993
Next I created the measures we need to display
[Total enrollment 2 All students]
simply references the existing measure.
% Black Students
SUM([Total enrollment 2 Black students]) / SUM([Total Enrollment])
% Non-Black Students
1- [% Black Students]
These 3 measures need to be displayed on a single chart where the actual measure displayed is determined by user selection. This selection will be driven by the use of a parameter, that will be set by a parameter action. For now, we just need to establish the parameter.
This is simply a string parameter which will store the value of Total Enrollment by default.
Using this parameter, we can now decide which value we’re going to plot on the chart
WHEN ‘Total Enrollment’ THEN SUM([Total Enrollment])
WHEN ‘% Black Students’ THEN [% Black Students]
ELSE [% Non-Black Students]
Add this to a view by placing
- Year 1 as continuous (green) pill on Columns
- Actual Value on Rows
and you get the basic shape, although it’s not as ‘peaky’. This is resolved by editing the Actual Value axis (right click axis -> edit) and unchecking the Include zero checkbox.
Now change the text in the pSelect_Measure input parameter that’s displayed to % Black Students, and the chart will change. Verify it changes with the text % Non-Black Students too.
In reading through the Tableau KB documentation about Predictive Modeling Functions in Time Series Visualizations, I came to learn of the feature to Extend Date Range, something I’ve never come across before, and I’m not sure what version it first appeared in. Anyway, for this, you need to be working with a field which is a date datatype. The Year 1 field provided is an int.
I’m not entirely sure what I’ve done next is the optimum method, but it worked for me… some of it involved a bit trial and error when I came to defining and invoking the modelling feature later on. In writing this up, I’m essentially helping you to avoid the ‘back and forth’ steps I took in getting it all to work.
Anyway, I needed a date field to represent the year
This resolves to a field containing 1st Jan YYYY for each Year in the data set.
Replace Year 1 on the chart with this field, and changing it to the ‘continuous’ instance of the date by choosing the second ‘Year’ option from the context menu
This changes the pill to green and then the option to Extend Date Range is visible on the menu. Set this to 5 years using the custom option
After doing this you’ll get a 5 nulls indicator displayed, which is basically saying there’s some years without any data. This what we expect for now.
Now onto the modelling part. We need a new calculated field to contain the predicted value using the Gaussian process regression.
MODEL_QUANTILE(‘model=gp’, 0.5,[Actual Value],ATTR(DATETRUNC(‘year’,[Year])))
again, this took a bit of trial and error to get the date field in the required format.
Add this to the Rows and again edit the axis to ‘exclude zero’. You should now see the data extending for the predicted value beyond 2018. You can now hide the 5 nulls indicator (right click -> hide indicator)
You can now combine to be a dual axis chart (don’t forget to synchronise axis), and apply the relevant formatting to change the marks, apply relevant colours, hide axis & gridlines etc. Note, I set the area chart to 25% opacity, rather than the 75% stated in the requirement, as this gave me the colour most similar to the solution.
Creating the tooltips
Hovering over the chart, the tooltips display the Actual, Predicted and Residual (Actual-Predicted) value for each point. But depending on the measure selected, the format differs – Total Enrollment is in K and the others are in %.
We can’t use the number formatting feature of a field to resolve this, so we need to be a bit more creative. I confess I started creating individual fields for each measure (Actual, Predicted, Residual) based on the measure type selected (Total Enrollment, % Black Students, % Non-Black Students), but this meant I was creating loads of calculated fields, which just seemed a bit unnecessary.
So after a bit of a think, I realised there was a better way.
First up, let’s get our residual
[Actual Value]-[Predicted Value]
Now, if we put the values in a tabular form, you can see what the precision of the values are depending on the measure stated
We need to format appropriately. Both displays required the values to be formatted to 1 decimal place, and both have a suffix, either a K or a %.
To get the value in the required display format
Tooltip – Actual Value
IF [pSelect_Measure]=’Total Enrollment’ THEN [Actual Value]/1000
ELSE [Actual Value] * 100
Format this to 1 dp.
Create a similar field for the predicted value, which should also be formatted to 1 dp.
Tooltip – Predicted Value
IF [pSelect_Measure] = ‘Total Enrollment’ THEN [Predicted Value]/1000
ELSE [Predicted Value] * 100
And finally Tooltip – Residual
[Tooltip – Actual Value] – [Tooltip – Predicted Value]
This needs to be custom formatted to +#,##0.0;-#,##0.0 which ensures a + symbol is always displayed for positive values.
Pop these onto the tabular display we built earlier, and you can see the values are now displaying in the way we need
Finally we need to create a field to store the required suffix
Tooltip – Value Suffix
IF [pSelect_Measure] = ‘Total Enrollment’ THEN ‘K’
We can now add these 4 fields onto the Tooltip shelf of the ‘All’ marks card, and create the tooltip as required
Determining the data for the title
As we need to only use 2 sheets to build this solution, and 1 sheet will be required for the measure selection, we have to incorporate the summary data displayed at the top of the dashboard as part of the title of the chart viz.
In the title, we need to display the following :
- The latest year in the provided data set (ie 2018)
- The latest year including the extended date range (ie 2023 – 5 years later)
- The actual value from 2018 based on the selected measure
- The predicted value from 2023 based on the selected measure
- An indicator to show whether the values were likely to increase or decrease
The requirement was to ensure there was no ‘hardcoding’. And as we’re working on getting information related to a specific row (ie year) in a set of data that consists of multiple rows (years), then we’re going to need to make use of table calculations for this.
Let’s start with the ‘easier’ bits first. We want the Year of the latest in the actual data set, and we want this value to be essentially stored against every row in the data
This returns the value of 2018.
Latest Year + 5
This simply adds 5 years to the Latest Year, so returns 2023.
Now when I’m using table calculations, I often prefer to see what the data is doing in the table itself, so I can be sure I’m doing things correctly. With the ‘extended year’ stuff, it’s a bit fiddly creating the table from scratch, so I simply started by duplicating the chart sheet ‘as crosstab’ (right click on the sheet name tab, -> Duplicate as Crosstab). Rearrange the fields so Measure Names is on Columns and Year is on Rows and the ‘Tooltip’ named fields are visible. Add Latest Year and Latest Year+5 to Rows, and you can see how these fields show the same value against every row.
Now, remove these fields, as by adding them, we’ve lost the additional ‘extended dates’ rows (ie the ‘fake’ rows that don’t actually exist in the data). Ok, so now we want to get the Actual Value associated to 2018, but then perpetuate this across every row in the data.
Latest Year – Actual
WINDOW_MAX(IF MIN([Year]) = MIN([Latest Year]) THEN [Tooltip – Actual Value] END)
If the Year is the same as Latest Year, then display the value from the Tooltip – Actual Value field. The WINDOW_MAX table calc, then spreads this same value across all rows displayed. Format to 1dp and add this to the table.
We need to do something similar to get the Predicted Value for 2023
Latest Year +5 – Predicted
WINDOW_MAX(IF LAST()=0 THEN [Tooltip – Predicted Value] END)
If we’re at the last row in the data, then display the value from the Tooltip – Predicted Value field. Again the WINDOW_MAX spreads the value across all the rows. Set this to 1 dp and add to the table.
And now we just need to get the increase/decrease indicator
Increase | Decrease
IF ([Latest Year – Actual])-[Latest Year +5 – Predicted]>0 THEN ‘decrease’ ELSE ‘increase’ END
So now we know we’ve got the correct values we need, we can add these fields to the Detail shelf of the chart sheet, so we can reference them in the Title of the chart.
We also need the Latest Year and Latest Year +5 fields added to the Detail shelf, but when you add these, you’ll notice that you lose the ‘extended years’. You can fix this by wrapping the fields in an ATTR function. Double click on the field, which will allow you to ‘type in’ to the field.
You should now be able to create the text in the chart title
Building the measure selector
Phew! Are you still with me… there’s a fair bit going on already, and now we’ve got to build the chart that will drive the user selection.
On a separate sheet, add
- Measure Names to Rows
- Measure Values to Detail
- Measure Names to Text
- Measure Names to Filter, and restricted to the 3 original measures – Total Enrollment, % Black Students, %Non-Black Students
Uncheck Show Header on the pill on the Rows, then format
- Set background colour of the pane to a navy blue
- Set row & column borders to be white
- Set the text label to be white text, centred, and increase the font
- Turn off the tooltip
Adding the measure selector interactivity
Create the dashboard and add both the charts. To add the interactivity so that on click of a row in the Measure Selection sheet, it changes the measure being displayed, we need to add a dashboard action, that changes a parameter (Dashboard menu -> Actions -> Add Action -> Change Parameter). Set the action to run on Select when the Measure Select sheet is clicked. The target parameter is pSelect_Measure and the Measure Names field should be passed into this.
And with all that, you should hopefully now have a working solution. My published viz is here (note, my Measure Selection sheet is slightly different from what I’ve described above). The above is a bit simpler I think.
Happy vizzin’! Stay Safe!