Can you predict the future?

February 4, 2021 Donna ColesLeave a comment

Continuing on from last week’s challenge, Candra Mcrae set this #WOW2021 challenge to introduce another new feature of v2020.4 (so you’re going to need this version as a minimum to complete this challenge). Predictive modelling isn’t something I use at all, so I was quite reliant on the help documentation Candra referenced. I’m also not overly familiar with the various different models, and when they should be used, so this isn’t something I’m likely to use personally, without careful consideration and thought – it’s worth reiterating Candra’s warning :

Viz Responsibly: While predictive analysis in Tableau has never been easier, we encourage you to be intentional in how you use this powerful feature because not doing so (e.g., selecting a model type not appropriate for your data) could lead to inaccurate results.

The focus of this blog will be

Building the main chart
Creating the tooltips
Determining the data for the title
Building the measure selector
Adding the measure selector interactivity

Building the main chart

The data provided has some records related to years before 1993, but the requirement is just to use data from 1993 onwards, so the first thing I did was to set a data source filter (right click on data source -> Edit Data Source Filters) to restrict the whole data source to Year 1 >=1993

Next I created the measures we need to display

Total Enrollment

[Total enrollment 2 All students]

simply references the existing measure.

% Black Students

SUM([Total enrollment 2 Black students]) / SUM([Total Enrollment])

% Non-Black Students

1- [% Black Students]

These 3 measures need to be displayed on a single chart where the actual measure displayed is determined by user selection. This selection will be driven by the use of a parameter, that will be set by a parameter action. For now, we just need to establish the parameter.

pSelect_Measure

This is simply a string parameter which will store the value of Total Enrollment by default.

Using this parameter, we can now decide which value we’re going to plot on the chart

Actual Value

CASE [pSelect_Measure]
WHEN ‘Total Enrollment’ THEN SUM([Total Enrollment])
WHEN ‘% Black Students’ THEN [% Black Students]
ELSE [% Non-Black Students]
END

Add this to a view by placing

Year 1 as continuous (green) pill on Columns
Actual Value on Rows

and you get the basic shape, although it’s not as ‘peaky’. This is resolved by editing the Actual Value axis (right click axis -> edit) and unchecking the Include zero checkbox.

Now change the text in the pSelect_Measure input parameter that’s displayed to % Black Students, and the chart will change. Verify it changes with the text % Non-Black Students too.

In reading through the Tableau KB documentation about Predictive Modeling Functions in Time Series Visualizations, I came to learn of the feature to Extend Date Range, something I’ve never come across before, and I’m not sure what version it first appeared in. Anyway, for this, you need to be working with a field which is a date datatype. The Year 1 field provided is an int.

I’m not entirely sure what I’ve done next is the optimum method, but it worked for me… some of it involved a bit trial and error when I came to defining and invoking the modelling feature later on. In writing this up, I’m essentially helping you to avoid the ‘back and forth’ steps I took in getting it all to work.

Anyway, I needed a date field to represent the year

Year

MAKEDATE([Year 1],1,1)

This resolves to a field containing 1st Jan YYYY for each Year in the data set.

Replace Year 1 on the chart with this field, and changing it to the ‘continuous’ instance of the date by choosing the second ‘Year’ option from the context menu

This changes the pill to green and then the option to Extend Date Range is visible on the menu. Set this to 5 years using the custom option

After doing this you’ll get a 5 nulls indicator displayed, which is basically saying there’s some years without any data. This what we expect for now.

Now onto the modelling part. We need a new calculated field to contain the predicted value using the Gaussian process regression.

Predicted Value

MODEL_QUANTILE(‘model=gp’, 0.5,[Actual Value],ATTR(DATETRUNC(‘year’,[Year])))

again, this took a bit of trial and error to get the date field in the required format.

Add this to the Rows and again edit the axis to ‘exclude zero’. You should now see the data extending for the predicted value beyond 2018. You can now hide the 5 nulls indicator (right click -> hide indicator)

You can now combine to be a dual axis chart (don’t forget to synchronise axis), and apply the relevant formatting to change the marks, apply relevant colours, hide axis & gridlines etc. Note, I set the area chart to 25% opacity, rather than the 75% stated in the requirement, as this gave me the colour most similar to the solution.

Creating the tooltips

Hovering over the chart, the tooltips display the Actual, Predicted and Residual (Actual-Predicted) value for each point. But depending on the measure selected, the format differs – Total Enrollment is in K and the others are in %.

We can’t use the number formatting feature of a field to resolve this, so we need to be a bit more creative. I confess I started creating individual fields for each measure (Actual, Predicted, Residual) based on the measure type selected (Total Enrollment, % Black Students, % Non-Black Students), but this meant I was creating loads of calculated fields, which just seemed a bit unnecessary.

So after a bit of a think, I realised there was a better way.

First up, let’s get our residual

Prediction Residual

[Actual Value]-[Predicted Value]

Now, if we put the values in a tabular form, you can see what the precision of the values are depending on the measure stated

We need to format appropriately. Both displays required the values to be formatted to 1 decimal place, and both have a suffix, either a K or a %.

To get the value in the required display format

Tooltip – Actual Value

IF [pSelect_Measure]=’Total Enrollment’ THEN [Actual Value]/1000
ELSE [Actual Value] * 100
END

Format this to 1 dp.

Create a similar field for the predicted value, which should also be formatted to 1 dp.

Tooltip – Predicted Value

IF [pSelect_Measure] = ‘Total Enrollment’ THEN [Predicted Value]/1000
ELSE [Predicted Value] * 100
END

And finally Tooltip – Residual

[Tooltip – Actual Value] – [Tooltip – Predicted Value]

This needs to be custom formatted to +#,##0.0;-#,##0.0 which ensures a + symbol is always displayed for positive values.

Pop these onto the tabular display we built earlier, and you can see the values are now displaying in the way we need

Finally we need to create a field to store the required suffix

Tooltip – Value Suffix

IF [pSelect_Measure] = ‘Total Enrollment’ THEN ‘K’
ELSE ‘%’
END

We can now add these 4 fields onto the Tooltip shelf of the ‘All’ marks card, and create the tooltip as required

Determining the data for the title

As we need to only use 2 sheets to build this solution, and 1 sheet will be required for the measure selection, we have to incorporate the summary data displayed at the top of the dashboard as part of the title of the chart viz.

In the title, we need to display the following :

The latest year in the provided data set (ie 2018)
The latest year including the extended date range (ie 2023 – 5 years later)
The actual value from 2018 based on the selected measure
The predicted value from 2023 based on the selected measure
An indicator to show whether the values were likely to increase or decrease

The requirement was to ensure there was no ‘hardcoding’. And as we’re working on getting information related to a specific row (ie year) in a set of data that consists of multiple rows (years), then we’re going to need to make use of table calculations for this.

Let’s start with the ‘easier’ bits first. We want the Year of the latest in the actual data set, and we want this value to be essentially stored against every row in the data

Latest Year

{FIXED: MAX([Year])}

This returns the value of 2018.

Latest Year + 5

DATE(DATEADD(‘year’,5,[Latest Year]))

This simply adds 5 years to the Latest Year, so returns 2023.

Now when I’m using table calculations, I often prefer to see what the data is doing in the table itself, so I can be sure I’m doing things correctly. With the ‘extended year’ stuff, it’s a bit fiddly creating the table from scratch, so I simply started by duplicating the chart sheet ‘as crosstab’ (right click on the sheet name tab, -> Duplicate as Crosstab). Rearrange the fields so Measure Names is on Columns and Year is on Rows and the ‘Tooltip’ named fields are visible. Add Latest Year and Latest Year+5 to Rows, and you can see how these fields show the same value against every row.

Now, remove these fields, as by adding them, we’ve lost the additional ‘extended dates’ rows (ie the ‘fake’ rows that don’t actually exist in the data). Ok, so now we want to get the Actual Value associated to 2018, but then perpetuate this across every row in the data.

Latest Year – Actual

WINDOW_MAX(IF MIN([Year]) = MIN([Latest Year]) THEN [Tooltip – Actual Value] END)

If the Year is the same as Latest Year, then display the value from the Tooltip – Actual Value field. The WINDOW_MAX table calc, then spreads this same value across all rows displayed. Format to 1dp and add this to the table.

We need to do something similar to get the Predicted Value for 2023

Latest Year +5 – Predicted

WINDOW_MAX(IF LAST()=0 THEN [Tooltip – Predicted Value] END)

If we’re at the last row in the data, then display the value from the Tooltip – Predicted Value field. Again the WINDOW_MAX spreads the value across all the rows. Set this to 1 dp and add to the table.

And now we just need to get the increase/decrease indicator

Increase | Decrease

IF ([Latest Year – Actual])-[Latest Year +5 – Predicted]>0 THEN ‘decrease’ ELSE ‘increase’ END

So now we know we’ve got the correct values we need, we can add these fields to the Detail shelf of the chart sheet, so we can reference them in the Title of the chart.

We also need the Latest Year and Latest Year +5 fields added to the Detail shelf, but when you add these, you’ll notice that you lose the ‘extended years’. You can fix this by wrapping the fields in an ATTR function. Double click on the field, which will allow you to ‘type in’ to the field.

You should now be able to create the text in the chart title

Building the measure selector

Phew! Are you still with me… there’s a fair bit going on already, and now we’ve got to build the chart that will drive the user selection.

On a separate sheet, add

Measure Names to Rows
Measure Values to Detail
Measure Names to Text
Measure Names to Filter, and restricted to the 3 original measures – Total Enrollment, % Black Students, %Non-Black Students

Uncheck Show Header on the pill on the Rows, then format

Set background colour of the pane to a navy blue
Set row & column borders to be white
Set the text label to be white text, centred, and increase the font
Turn off the tooltip

Adding the measure selector interactivity

Create the dashboard and add both the charts. To add the interactivity so that on click of a row in the Measure Selection sheet, it changes the measure being displayed, we need to add a dashboard action, that changes a parameter (Dashboard menu -> Actions -> Add Action -> Change Parameter). Set the action to run on Select when the Measure Select sheet is clicked. The target parameter is pSelect_Measure and the Measure Names field should be passed into this.

And with all that, you should hopefully now have a working solution. My published viz is here (note, my Measure Selection sheet is slightly different from what I’ve described above). The above is a bit simpler I think.

Happy vizzin’! Stay Safe!

Donna

Can you find the variance along a line?

January 8, 2021 Donna ColesLeave a comment

So after a couple of weeks off blogging due to Christmas, I’m back providing solutions for #WOW2021! New contributor Candra Mcrae started off the year with this gentle workout focussing on table calculations. For those who have followed my previous blogged solutions, you’ll probably recall that table calcs don’t phase me as much as they do some – I started using Tableau before LODs existed, so they were the only tool at my disposal in the past. That said, I’m still not such an expert that I get it right first time – there’s often plenty of trial and error as I choose which fields I want to compute by, although I’ve got much better at this since I read Andy Kriebel’s Table Calculations Overview a couple of years ago.

As with any challenge that involves table calcs, I tend to start by building out a tabular view of all the data I’m going to need to build the viz. This ensures I can validate the data much easier and set the table calculation settings to what I need.

So let’s start….

For the line chart itself, we’re simply going to be plotting Year against Food insecurity which has been formatted to 1 decimal place and displays the % as a suffix

We will also need to capture 2 other values which will represent the coloured circles. Parameters are needed to help identify what these circles will be.

pSelected Year is an integer parameter that can be built by right clicking on Year and selecting Create -> Parameter. This will populate the parameter with all the values of the Year field. Default the value to 2018 and adjust the formatting of the displayed value so that it is 0 decimal places and include thousand separators is unchecked.

With this parameter, we can capture the value associated which is represented by the pink dot in the viz

Selected Year %

IF [Year] = [pSelected Year] THEN [Food insecurity (includes low and very low food security) Percent of households] END

It’s a bit more work to identify the black dot, as this will vary based on another parameter

pComparison

I created this an integer parameter storing the values 0, 1 & 2 (defaulted to 2), but aliased for display as First Year, Most Recent Year, Previous Year. I’ll be using this is an case statement/if clause later and comparing integers to strings is much more efficient in Tableau.

I like to take things step by step, especially when there’s table calcs involved, to ensure all the values I’m referencing are correct, so rather than identifying the selected value in a single calculated field, I’m using multiple.

Firstly I want to identify the 1st year in the dataset (without hardcoding).

First Year

WINDOW_MIN(MIN([Year]))

This will store the value of the earliest year (1995) against every row in the data that is outputted in the view (a bit like you would get with {FIXED : MIN([Year])}, but this is a no LOD challenge).

From this I can work out

First Year %

IF MIN([Year]) = [First Year] THEN MIN([Food insecurity (includes low and very low food security) Percent of households]) END

Notice the MIN() functions used in this statement, as opposed to the Selected Year % above. This is because First Year is a table calc which is an aggregation, and subsequently other fields referenced in the calculation also need to be aggregated. In this case other aggregations such as AVG, MAX, ATTR would also suffice.

Similarly, I’m going to derive the Latest Year % with

Latest Year

WINDOW_MAX(MAX([Year]))

Latest Year %

IF MIN([Year]) = [Latest Year] THEN MIN([Food insecurity (includes low and very low food security) Percent of households]) END

Finally, I’m also going to work out the value for the previous year

Previous Year

[pSelected Year]-1

Previous Year %

IF [Year] = [Previous Year] THEN [Food insecurity (includes low and very low food security) Percent of households] END

With these 3 % fields, I can now create a fourth field which stores the value of the % I want to compare with

Selected Comparison %

CASE [pComparison]
WHEN 0 THEN [First Year %]
WHEN 1 THEN [Latest Year %]
WHEN 2 THEN SUM([Previous Year %])
END

This ultimately just stores the 2nd value required for the black dot. This could all have been written within this single calculation, but I find it easier to troubleshoot if things are broken down a bit.

Putting these onto a table we can see how the values in each row change, as the parameters are changed.

Note, for all the table calculation fields (all denoted by the triangle symbol on the pill), I have explicitly set them to Compute Using the specific dimension of Year rather than the default of table down. While this will give the same result, I prefer to be explicit to ensure the values don’t change if pills get subsequently moved around the canvas (in the case of the Selected Comparison % field, all Nested Calculations within the Edit Table Calculation dialog box need to be set).

This is enough information to build the main viz itself by

adding Year to Columns (green continuous pill)
adding Food insecurity to Rows
adding Selected Year % to Rows next to Food insecurity
then drag Selected Comparison % to the Selected Year % axis which will automatically change the display to have Measure Values on Rows instead. Set the table calculation setting to compute by Year

This chart can then be set to be dual-axis and the axes synchronised. The Food insecurities should remain as as line mark, and the Measure Values should be a circle. The colours, formatting and tooltip then need to be applied.

Now we need to go back to our table of data to build out other calculations. The requirement is a single viz, so we need to provide the % value of the selected year within the title of the chart, along with the difference from the comparison value. For this to work, we need to store the relevant values against every row in the data set.

We already have the Selected Year % value identified, but this is only captured against the row of the selected year in the data output. To get it to display against every row we need

Window Max – Selected Year %

WINDOW_MAX(MIN([Selected Year %]))

This is formatted to 0 dp with the % sign as suffix, as this is the field that will be displayed in the title. Added to our data table, with the table calc set to compute by Year, you can see the value replicated across every row

Similarly, we already have the Selected Comparison % captured, but in order to work out the difference, we also need to get this value against every row too

Window Max – Selected Comparison %

CASE [pComparison]
WHEN 0 THEN WINDOW_MAX([First Year %])
WHEN 1 THEN WINDOW_MAX([Latest Year %])
WHEN 2 THEN WINDOW_MAX(MAX([Previous Year %]))
END

Adding this in to the table (and remembering to set the table calc settings), you can also see the relevant value perpetuated against every row. Change the pComparison value and you’ll see the values change accordingly, but still the same on every row.

So for the difference…. at this point I chose to deviate from the solution published. We’re already dealing with a measure that is quantified as a % (as opposed to a measure that is quantified as £ say). So to get the % difference between two percentage measures, I simply chose to show the difference between the two values (ie selected – comparison). Some would refer to this a the point difference between the values. This makes most sense to me for this particular scenario. The alternative is to calculate the % difference in the more traditional way of (selected – comparison) / comparison as you may do if you were presenting the % difference between the value of sales in different years. But I personally found the result could be confusing to the reader .

Difference

[Win Max – Selected Year %]-[Win Max – Selected Comparison %]

BUT, the value displayed within the title is the absolute value (ie no -ve sign) and actually doesn’t display if the value is 0 (which you get if you select the year to be 2019 and comparison to most recent year). So I resolved this with

DISPLAY : Difference

IF [Difference] <> 0 THEN ABS([Difference]) END

which is formatted to 1 dp with % suffix

Finally, we have a symbol that is used to indicate if the change is +ve, -ve or remains the same.

Difference Indicator

IF [Difference] > 0 THEN ‘▲’
ELSEIF [Difference] < 0 THEN ‘▼’
ELSE ‘N/C’
END

Add this into the table too, once again remembering to set the table calc properties for all nested calculations.

So now you have these fields, you can add Window Max – Selected Year % and DISPLAY : Difference and Difference Indicator to the Detail shelf of the All marks card on the chart viz, and these will then be available in title dialog to add. Once again, ensure you set the table calcs to compute by Year.

You’ll also need to add some spacing in the title, to allow the parameter controls to be ‘floated’ into place on the dashboard. Getting the position right is really tricky. I positioned it very carefully on Desktop, but when I published, the controls were in completely different places. The easiest way I found to resolve this, was to use the Edit feature in Tableau public to edit the dashboard online and move the objects that way.

My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Donna + DataViz

ramblings in all things Tableau (and occasionally some other Stuff….)

Tag: Window_Max

Can you predict the future?

Can you find the variance along a line?