Can you build a “must include” filter?

Ann Jackson set this week’s challenge inspired by a business requirement she’d encountered, which was to be able to filter the data based on a user selection, and then additionally apply another filter to narrow down the required data.

The task ‘splash’ page had already indicated the challenge would involve sets, and the requirements additionally stated you’d need version 2020.2 or higher, which added an additional hint that set controls would be involved. Examining Ann’s solution, I could also see that the tell-tale set icon displayed when hovering near the input control, which confirmed the ‘filters’ were indeed using the set control functionality.

  • Building the bar chart
  • Viz in Tooltip
  • BANs

Building the bar chart

Let’s start by focusing on the bar chart. The first thing to do is to rename several fields. Ann likes capital letters, so the following fields just need to be renamed so they’re all in uppercase: Sales, Quantity, Product, Order ID and Customer (Name). Sales can also be formatted to $ with 0dp.

We create the first set we need by simply right-clicking on the PRODUCT field -> Create -> Set. I named this 1st Products and set to the Use All condition.

Lets build out a basic table and then we can play with how we need to use this set. Put

  • ORDER ID on Rows
  • CUSTOMER on Rows
  • Add Sales to Text
  • Drag Quantity over the Sales column, so Measure Names gets added to Columns and Measure Values is on Text
  • Sort by Sales descending

If we now add the 1st Products set to the Filter shelf and choose the Show Set option, the list of products in the set will be displayed for selection.

Choose the 1st item in the list, and the data will be filtered to just those customers who have ordered that product

But, while this list matches the names if you do the same on the solution, the Sales and Quantity values differ. This is because, we have actually filtered the data just to the lines containing the selected product. If you bring PRODUCT onto Rows, you’ll see you have 1 line per customer.

Now, while you can use LOD calculations to compute the total sales/quantity at an order id level, to resolve this, it won’t solve the next step of the puzzle, to filter the data further by other products on the same orders, as we’ve already filtered out all the other data. So we need to do something else.

To demonstrate, we just want to keep a handle on these specific ORDER IDs so ctrl-click to select them all, and then Include to add them to the Filter shelf. Now remove the 1st Products set from the Filter shelf, then re-add back to the Rows.

You’ll see we now have all the product lines on each order, with an In or Out displayed against each row. The rows which match the PRODUCT selected in the set, is marked as In. This is essentially what the filter is doing if 1st Products is on the Filter shelf – it is filtering the rows to those which return as In (the set).

We basically want a way to be able to identify all the rows on an order which have at least 1 In row, so we can filter on that instead. For this we can create a new field,

Order has 1st Products

{FIXED [ORDER ID]: MAX([1st Products])}

What this is doing is saying for each ORDER ID, get the maximum 1st Products value, which is then ‘stored’ against each row for the same ORDER ID. This may look odd, as 1st Products from the display is showing an ‘In’ or an ‘Out’, so how does the MAX work? Well, whilst the display is In or Out, under the bonnet this is actually a boolean field of 1 = True = In or 0 = False = Out, so the MAX is actually returning either a 1 (true) or a 0 (false).

If you add Order has 1st Products to Rows, you can now see every row in this restricted data set is listed as True

Remove the ORDER ID from the Filter shelf, and you should observe all the other orders listed have False listed against every row.

So we can now add Order Has 1st Product = True to the Filter shelf, and we can remove 1st Products from Rows too.

So we’ve mastered the 1st product filter, time for the 2nd. Once again right-click on PRODUCT and create another set with the Use All condition. This time I called the set 2nd Product.

Just as before, we want to identify all orders that contain this 2nd product selected, so we’ll create another calculated field

Order has 2nd Product

{FIXED [ORDER ID]: MAX([2nd Product])}

Add this field to the Filter shelf and select True. Now click on the 2nd Product set in the Dimesions pane, and selected Show Set, to display the list of products

You’ll notice though that all products are listed, and not just the ones that are associated to the set of orders already filtered. To fix this, we need to add the Order Has 1st Products field to context (click on the Order Has 1st Products pill on the Filter shelf and Add to Context). This will change the pill to a grey colour, and you can now restrict the values in the 2nd Products set to only show All Values in Context.

The add to context function allows us to dictate the order in which the filtering is applied, so the data is primarily filtered based on our 1st Products set and then filtered based on the 2nd Product.

You should now be able to play around with the filtering and see how the rows for complete orders are retained.

Removing PRODUCT from the Rows, and we get back to the total order value / quantity values we’re expecting.

From this, you should now be able to build out the viz. Firstly, duplicate the sheet so we retain the original table, then move SALES & QUANTITY to the Columns, add Measure Names to Colour, add Labels to match mark colour, and format accordingly.

Viz in Tooltip

As an ‘added extra’ Ann added the ability to hover on a bar and see the details of the order in a simple table. We can use the original table we built above to demonstrate the functionality. Add PRODUCT to Row, and replace the QUANTITY field, with a new field you need to create called QTY, which is just based on QUANTITY ie

QTY

[QUANTITY]

Then hide the ORDER ID & CUSTOMER fields (uncheck Show Header) on the pill. You could remove completely, but I like to retain just in case we need to debug if things don’t work.

We will need to show some totals, but this isn’t the Grand Total. Grand Totals will work, but the background of the row will always be a fixed colour. Either white, by default, or set to grey if you choose. However I noticed that depending on the order id I hovered on, the Grand Total row may display grey or white depending on the existing banding. For this we need to use subtotals instead (Analysis menu -> Totals -> Add All Subtotals).

Because we have multiple (hidden) dimensions on the rows, multiple Total rows will display. We can remove by unchecking the Subtotals option against the ORDER ID pill.

And finally, we need to relabel the row. Right-click on the Total row and select Format. Change the label of the Totals section from ‘Total’ to ‘Grand Total’.

With additional formatting to change the font size etc, this sheet can now be added as a ‘Viz in Tooltip’ to the bar chart. On the Tooltip shelf associated to the bar chart, you can Insert -> Sheets -> <Select Sheet>

Hovering over a bar should now automatically filter the tabular display to the appropriate CUSTOMER & ORDER ID.

BANs

For the BANs, we need several additional calculated fields

# ORDERS

COUNTD([ORDER ID])

simply counts the number of distinct Order IDs in the filtered view.

Total Orders

{FIXED:COUNTD([ORDER ID])}

an LOD calculation that counts the overall number of distinct Order IDs, regardless of any filters applied.

% OF TOTAL ORDERS

[# ORDERS]/SUM([Total Orders])

formatted as a % to 1 dp.

AVG ORDER AMOUNT

SUM([SALES]) / [# ORDERS]

formatted to $ 0 dp.

AVG ORDER QUANTITY

SUM([QUANTITY])/[# ORDERS]

formatted to a number with 0 dp.

Add Measure Names to the Filter shelf, and filter to the relevant 4 measures. Add Measure Values and Measure Names to the Text shelf, and Measure Names on Columns. Add the Order Has 1st Products = true and Order has 2nd Product = true to the Filter shelf, but this time don’t add anything to context. If you do, it will affect the Total Orders measure, and the % of orders calc will be wrong. Hide the Measure Names on the columns (uncheck show header), and format the text appropriately.

To get the thick lines, format and set the Row Divider on the sheet to be a thick coloured line.

And you should now have all the core building blocks needed to add onto a dashboard.

My published viz is here.

Happy vizzin’! Stay safe!

Donna

Can you hide a chart in map layers?

Candra set the challenge this week to use the new map layers to build a map display which, on click of a country, filtered the display to that country and additionally displayed a donut chart indicating the percentage of urban dwellers in that country.

If map layers are very new to you, then the webinar by Adam McCann referenced in the challenge, has some VERY useful pointers for this challenge (the workbook for that can be downloaded from here, as I found some things needed closer inspection).

I think this challenge is going to be best described by walking through the steps.

Building the 1st map layer

Double-Click on Country/Region to load a map, and change the mark type to (filled) Map. Add Region to the Colour shelf and assign the appropriate colours. Adjust the map background via the Map -> Map Layers menu and set the Style to dark, and remove all the selections against the Map Layers list

The intention is when a country is selected the map will ‘drill into’ /filter that country, and display additional information. We will drive this by a parameter, which will get set via a parameter action, but for now we’ll manually set the value.

Create a new string parameter, that is default to nothing/empty string/ ”, and then show this on your sheet.

pSelectedCountry

We need the 1st layer of the map to display, when there is no value in the parameter. We need a calculated field to help drive this.

All Countries

IF [pSelectedCountry]=” THEN [Country/Region] END

By default this will create a field of type ‘string’ but we need it to be a geographic data type. so change this as below.

Add this field to the Detail shelf of the map, and remove the Country/Region field that was automatically added when we first built the map.

Enter the name of a country, eg China, into the parameter. The map should essentially go blank (black screen).

Building the 2nd map layer

We need the country to display, if it’s entered into the parameter. For this, we need another calculated field

Selected Country

IF [pSelectedCountry]=[Country/Region] THEN [Country/Region] END

Once again change this to a geographic role data type of type County/Region.

Next click and drag this field onto the map, and drop it onto the Add Marks Layer option that displays. This will create a new Selected Country marks card, although nothing will obviously change on the map display itself.

Move the Selected Country pill to be on the Detail shelf instead, and add Region onto the Colour shelf. Change the mark type from circle to Map.

Now if you enter a county into the parameter, eg China, the display should ‘filter’ & ‘zoom in’ on China.

So what we have is the 1st layer only showing when no countries have been selected, and vice versa, the 2nd layer only showing when a country has been selected.

We need to now add further layers for the donut chart, which only want to show the country has been selected as well.

Building the 3rd map layer

A donut chart, in the past, is traditionally created by building a pie chart, then using a dual axis to add a circle, sized smaller that the pie chart, on top (see this Tableau KB for info). Rather than a dual axis, we’re going to use map layers – 1 layer for the pie chart, and then another layer for the central circle.

Keeping a country selected (so we can see what we’re building), drag Selected Country onto the map again to create another map layer. Change the mark type to Pie and increase the Size to as large as possible. Move Selected Country to Detail.

In the data set we have a field called Population Urban which stores the ‘percentage’ value of urban dwellers eg 0.17 is 17%. To create the angles for the pie chart, we need to know

Population Non-Urban

1-[Population Urban]

Drag Measure Values onto the Angle shelf. This will automatically add Measure Names to the Filter shelf. Edit the filter to just select the Population Urban and Population Non-Urban measures. Drag the Measure Names field that was also automatically added to the Detail shelf, to the Colour shelf. Adjust colours accordingly, and set the border of the pie chart to white (under the Colour shelf options).

Verify that if you set the parameter to empty again, the whole world map displays, and you can’t see any pie charts.

Building the 4th map layer

Now we need to make the donut hole. Once again, ensure a country is selected, so your pie chart is visible, then drag Selected Country onto the map again, and drop to add another map layer.

This time, move the Selected Country field onto the Detail shelf, add Region to the Colour shelf, and adjust the size of the circle, so its smaller than the pie. Set the border of the circle to be white again too,

Add Population Urban onto the Label shelf, and format to a percentage with 0 dp. The best way to do this, is to format the Population Urban measure in the data pane (right click->default properties -> number format).

At this point you’ll notice the number is huge… we need to add Year to the Filter shelf, and select 2012.

Align the label to be middle centre, and adjust the font to be much bigger text. Add ‘urban dwellers’ underneath.

Once again, verify you get the expected behaviour as you change the values in the parameter from nothing to Russia or China etc.

The final step on this sheet is to add text to the Tooltips. Unlike when working with dual axis, you don’t have an All marks card, so you’ll need to add the required fields (Country/Region, Region, Population Total (formatted to Millions with 0dp), Population Urban to the Tooltip shelf on each of the relevant layers.

Setting the parameter interactively

Create a dashboard sheet, and add the map sheet you’ve built. Then create a dashboard action which sets the pSelectedCountry parameter, impacting the All Countries field, and that when the selection is cleared, the value is reset to ”.

My published viz is available here. Enjoy!

Happy vizzin’! Stay Safe!

Donna

Can you use the Brush Filter Extension?

For this week’s challenge, Sean Miller decided to continue with the theme of dashboard extensions, introducing the brush filter extension. The whole of the top section is the extension, so there’s actually only a single chart that needs to be built for this challenge. The main areas of focus for this blog are

  • The basic chart
  • Colouring the chart
  • Identifying the date range
  • Adding the extension

The basic chart

The chart needs to plot the number of rat sightings vs the Created Date by Borough. The number of sightings is based on the automatically generated field that is related to the name of your data source. To simplify, I created a new field based on this:

# of Sightings

COUNT([2021_02_24_WW08_Rat_Sightings.csv])

and then created the chart as follows

  • Borough on Columns
  • Month(Created Date) on Columns as green continuous date pill, formatted to mmm yy
  • # of Sightings on Rows
  • Borough on Filter, excluding the value Unspecified

The Boroughs were then manually sorted into the required order.

Colouring the chart

Each Borough needs to be coloured based on the difference between the start & end points. So we need to identify what these are for each Borough. I used table calculations for this

Start Count

WINDOW_MAX(IF FIRST()=0 THEN [# of Sightings] END)

If we’re at the first point in the partition, then record the number of sightings. All the other values in that partition will be NULL. The WINDOW_MAX statement then takes that value and ‘spreads’ it across all the data points/rows in that partition, as shown below. In this instance the partition is the Borough.

When Start Count is added to the view, the table calculation is set to compute by the Created Date.

We create a similar field for the end point, this time using the LAST() table calculation

End Count

WINDOW_MAX(IF LAST()=0 THEN [# of Sightings] END)

With these two calculated fields, we can now create

Change

IF [End Count]-[Start Count]>0 THEN ‘INCREASE’
ELSEIF [End Count]-[Start Count]< 0 THEN ‘DECREASE’
ELSE ‘NO CHANGE’
END

To be honest, I outputted the words, as I thought I’d be using these in a legend somewhere, but I didn’t. Outputting as values 1, 0 , -1 or similar will have been more efficient.

This field can now be added to the Colour shelf with the table calculation setting applied to compute by Created Date

Identifying the Date Range

As part of the chart title, we need to identify the date range, ie the earliest month and the latest month within the whole window (whole chart). Again we can use table calculations for this

Start Month

WINDOW_MAX(IF FIRST()=0 THEN MIN([Created Date]) END)

If we’re at the first point in the partition (this time the partition is the whole table), then get the Created Date (this is wrapped within a MIN, as the field needs to be an aggregate; using MAX would have worked just as well). The WINDOW_MAX statement once again ‘spreads’ this value across all the rows. This field is formatted to mmmm yyyy

Similarly we have

End Month

WINDOW_MAX(IF LAST()=0 THEN MIN([Created Date]) END)

also formatted to mmmm yyyy.

Both these fields can then be added to the Detail shelf, but this time the table calculation is set as table across. Once on the Detail shelf, the fields can be referenced in the chart title.

Adding the Brush Filter Extension

This challenge makes use of the Brush Filter extension, which you need to download from this link and save the .trex file somewhere appropriate on your machine.

Create a dashboard and add the chart you’ve already built, then add the Extension object to the dashboard. When prompted select My Extensions and browse to the Brush Filter .trex file you’ve saved.

You’ll then be prompted to configure the extension as below, selecting the name of the sheet containing your chart, the Created Date field that will be filtered, and the relevant measure to display. You can then adjust the colours of the ‘filter’ control, and scroll down to save.

As well as a new sightings by month profile chart being displayed, this will have the effect of adding the Created Date to the Filter shelf on your worksheet. The ‘grey’ box over the chart, can then be adjusted by dragging in the ends, or moving the whole box, to apply the required filter.

And apart from formatting adjustments to the chart, that’s all there is to it for today. My published viz is here.

Happy Vizzin’! Stay Safe!

Donna

Can you use Dashboard Extensions?

For this week’s #WOW2021 challenge, Lorna tasked us with using dashboard extensions, some of which can also be used on Tableau Public (see here). I haven’t had an opportunity to use extensions before, so this was going to be a brand new learning experience for me.

Modelling the Data

In a break from the ‘norm’ Lorna provided us with a new set of data to use, which had to be retrieved from the sources Lorna provided in the challenge

  • Emoji Sentiment – a csv file containing a summary of how often a particular emoji was used within tweets in 2015 and whether the tweets were classified as being positive, neutral, or negative in sentiment.
  • Emoji Database – a link to a site where after registering, you can download a csv file which defines and classifies each emoji

The first part of the challenge requires these data sources to be modelled using relationships in Tableau Desktop. Lorna hinted that a join calculation would be required on the codepoint fields.

I found this a bit odd, as both data sources contained an Emoji field which was automatically set to be the relationship field, and seemed to work.

For the purpose of this blog though, I’m going to attempt to rebuild my solution as I type, and use the suggested fields. The Unicode codepoint field in the Emoji Sentiment data needs to map to the codepoint field in the Emoji Database, but they’re not an exact match. We can prefix ‘0x’ to the codepoint field and ensure the case of the letters match. We can use a relationship calculation for this.

Grouping the Groups

This chart shows the top 20 emojis based on Occurrences and should filter to the appropriate group when a ‘category’ is clicked on the dashboard extension image on the left hand side.

The Emjoi Database data has a Group field, but this doesn’t exactly match the groupings on the image – some groups are ‘grouped’ together. I used Tableau’s in built grouping functionality to group the entries as required (right click field -> create -> group).

Category

I grouped People & Body and Smileys & Emotion together and named them Smileys and People

Filtering by Top 20

The list of Emojis needs to show the Top 20 based on Occurrences, but also filtered by Category. Build out a basic table of

  • Emoji Sentiment : Emoji1, Unicode name on Rows
  • Occurrences on Text
  • Emoji1 on Filter, set to filter by Top 20 of Occurrences
  • Category as a Context Filter – select all values to filter by, then right click -> add to context. This means the data will be filtered based on the Category first before the Top 20 filter is applied. The pill will change to grey to show it is a context filter. Show the filter and you can test it’s working

Building out the Top 20 Chart

Now we have the basics of the filtering functionality, we can build out the rest of the chart.

We need to display percentages, so need fields

% Positive

SUM([Positive])/SUM([Occurrences])

formatted to 1 decimal place. Create similar fields for % Negative and % Neutral

First start by creating a duplicate of the Occurrences field and call it #. Then add this as a discrete pill to the Rows shelf. Remove Occurrences from the Text shelf. Then add Measure Values to Columns and filter by Measure Names so only % Positive, % Neutral and % Negative measures are displayed. Add Measure Names to Colour.

All this has been done on one axis, so now we can add another to display the Position – add this field to Columns. This will create a 2nd marks card. Remove the Measure Names fields from the card, and change the mark type to circle.

Add an additional field MIN(1) to Columns by ‘typing in’ (double click on the Columns area) . Then click on the pill and select dual axis which will combine this field with the Position measure. Synchronise axis. This unfortunately will probably set all the marks to be circle type

so we need to reset…

  • change the mark type of the Measure Values card to bar
  • remove Measure Names from the Position card
  • remove Measure Names from the MIIN(1) card
  • change mark type of MIN(1) card to bar, and reduce size to as small as possible

This should give you the core chart, which can now be formatted accordingly.

Rounded Bar Chart

I always forget how to build these for some reason, so a quick google gave me a refresher via Andy Kriebel’s tutorial on YouTube

  • Add Category to Rows
  • Occurrences to Columns
  • Type in MIN(0) on Columns
  • Drag the MIN(0) pill and drop it on the Occurrences axis. This will change the view so Measure Values is now on Columns and Measure Names is on Rows
  • Change the mark type to Line and drag the Measure Names pill from the Rows onto the Path shelf
  • Increase the size of the mark

Exclude the Null Category and manually sort the categories to match the order in which the entries are listed on the left hand image.

Set to show mark labels, only displaying them at the end of the line, and aligning middle right

Add Category to the Colour shelf and colour accordingly, then apply the relevant formatting again to remove gridlines, axis and hide the Category pill from displaying.

Adding & Configuring the Dashboard Extension

This challenge makes use of the Image Map Filter extension, which you need to download from this link and save the .trex file somewhere appropriate on you machine. I also chose to make use of the image Lorna used rather than create anything, so saved the image from the challenge page to my laptop.

Create a dashboard and add the 2 charts you’ve already built, then add the Extension object to the dashboard. When prompted select My Extensions and browse to the Image Map Filter .trex file you’ve saved.

You’ll then be prompted to configure the extension as below, selecting the image you’ve saved, which is set to scale to container; choosing the Category dimension which is what the chart is filtered by, and then selecting the Top 20 sheet

You then need to select the rectangle option, which allows you to ‘draw’ on the image

Create a rectangle around one of the options, and when prompted select the appropriate Category which the selection relates to.

Repeat this for all the options in the image.

Average Legend

To build this I simply duplicated the Top 20 chart, removed all the pills from the Rows shelf, removed the Emoji pill from the Filter shelf and then changed the aggregation of the Position pill from SUM to AVG. I had to re-tweak the tooltips too.

And after all that, I hope you have all the components you need to deliver this solution. My published viz is here.

Happy Vizzin’! Stay Safe!

Donna

Can you build a Fancy Text Table?

Ann Jackson provided this week’s challenge, to deliver a text table using only Measure Names & Measure Values. I thought with Ann’s introduction that “This challenge should be straightforward for users of all levels” that this would be relatively straightforward, but I have to confess there were moments that I struggled with this. I knew the fundamentals that I’d need to complete this; that all the columns except the first were going to need to be numbers (ie measures), that I’d have to use custom formatting to display the number in the required format (a shape, a date, a word), and that I’d need to use the ‘legends per measure’ functionality to colour each column independently of each other. But determining the best/worst date to display proved to be a bit tricksy! I got there in the end, but there was a fair bit of trial and error.

Custom Formatting

I’m going to step through the build of this, as I think that’s probably the easiest way to describe this challenge. But before I do, one of the core fundamentals to this is knowing about how numbers can be custom formatted. By that I mean when you right-click on a measure -> default properties -> number format -> custom

This box allows you to type in, but you need to know what format/syntax to use. If you set the formatting via one of the other options, then look at the Custom option, it’ll have an entry that will give you a starting point. The above is the format for a number set to 1 decimal place, and shows that negative numbers will be prefixed by a minus sign (-). If we wanted to always show a plus sign in front of a positive number, we can edit this custom formatting to +#,##0.0;-#,##0.0.

The first entry to the left of the semi colon (;) indicates what’s applied to positive numbers. The next entry, to the right of the semi colon, indicates what’s applied to a negative number.

With this knowledge, you can apply more ‘creative’ custom formatting to any numeric measure that contains positive and negative numbers. For example if you want to show a ☑ or a ☒ depending on a ‘yes/no’ or ‘true/false’ concept, then we can create a version of the field as a number along the lines of

Field as Number

If [Field] = ‘XXXX’ THEN 1 ELSE -1 END.

We can than custom format this field by entering ☑;☒ into the text box

The field can still be treated as a measure, since the underlying value is still a number (in this case +/- 1), it’s just displayed differently.

Building the measures

So now we’ve covered how this ‘sneaky formatting’ is working, we’ll get on with the overall build.

The data just needs data from 2019 & 2020, so I chose to set a data source filter to restrict to just these two years.

But, I wanted the rest of the challenge to derive the current year instead of hardcoding, so I created fields

Current Year

YEAR({MAX([Order Date])})

Last Year

[Current Year] – 1

From these, I could then use LoDs to create

CY SALES

IF YEAR([Order Date]) = [Current Year] THEN [Sales] END

This is formatted to $ with 0 dp

and then

LY SALES

IF YEAR([Order Date]) = [Last Year] THEN [Sales] END

again formatted to $ with 0 dp.

We then need an indicator which is ‘true’ if CY SALES is greater than LY SALES, but as discussed above, we need this to be a ‘measure’, which we can custom format.

CY vs LY

IF SUM([CY SALES])>SUM([LY SALES]) THEN 1 ELSE -1 END

Custom format this as ✅;❌ (just copy these symbols from this page… they’ll look black and white in the dialog) – check out this page, to lift the images/other symbols from.

The actual difference identified by △ (again just copy and paste this symbol into the field name) is simply

SUM([CY SALES]) – SUM([LY SALES])

formatted to $ with 0 dp. Once you’ve done this using the Currency(Custom) option, then go to the Custom option and add + to the front of the string :

+”$”#,##0;-“$”#,##0

Next up is the percentage difference

% DIFF
[△]/SUM([LY SALES])

again format this first to a Percentage at 1 dp, then edit the Custom format to +0.0%;-0.0%

Now we’re getting to the slightly more complex part of the challenge – to identify the best and worst day in the month. We’ll start with the best day. We’re using FIXED LoDs throughout this, and while it’s probably possible to do in a single calculation, we’ll use multiple calcs to build up the components.

Order Month

DATENAME(‘month’,[Order Date])

This is the one dimension that’s going to be used in the final output, and simply outputs the month name (January, February etc).

In the data set, there can be multiple sales (ie orders) in a single day. We want to identify the total sales in 2020 (ie the current year) for each order date.

Sales Per Day

{FIXED [Order Date] : SUM([CY SALES])}

Now we’ve got the total sales per day, we want to identify the value of the maximum daily sales in each month

Max CY Sales Per Month

{FIXED [Order Month]: MAX([Sales Per Day])}

Now we need to identify the date in the month that the max daily sales ocurred

BEST DAY

INT({FIXED [Order Month]: MAX(IF [Sales Per Day] = [Max CY Sales Per Month] THEN ([Order Date]) END)}) + 2

WOAH! WHAT??? Let’s try to break this down…

IF [Sales Per Day] = [Max CY Sales Per Month] THEN ([Order Date]) END

If the daily sales value is the maximum daily sales in the month, then return the associated Order Date. But we need to get a date per month, so we’ve wrapped this in a FIXED LoD, for each Order Month. LoDs require the value to be aggregated, so the IF statement gets wrapped in a MAX statement (note MIN would work just as well).

{FIXED [Order Month]: MAX(IF [Sales Per Day] = [Max CY Sales Per Month] THEN ([Order Date]) END)}

Finally, due to the nature of this challenge, that requires we only work with Measure Names & Measure Values, we will convert this date field to a number using the INT function.

The intention here, is that we can then use the Custom formatting option once again, to set the number as a date format – I chose dd mmm yyyy (ie 01 Jan 2020 format, as I feel its less confusing that working out whether the date is in UK or US format).

However, by a very weird circumstance, converting a date to an INT then formatting as a date, will give you a date 2 days out from the one you converted. I don’t understand why, and it left me scratching my head for some time. I had to sense check with a fellow #WOWer who had the same, and checking Ann’s solution, she also was handling the oddity, which is the reason for the +2 on the calculation.

We just create similar fields for identifying the worst day

Min CY Sales Per Month

{FIXED [Order Month]: MIN([Sales Per Day])}

WORST DAY

INT({FIXED [Order Month]: MAX(IF [Sales Per Day] = [Min CY Sales Per Month] THEN ([Order Date]) END)}) + 2

format this to dd mmm yyyy

The final measure we need is based on determining the rank of the CY SALES per month. Ie if we ordered the months based on CY SALES descending, the top 6 would be marked as ‘Top’ and the rest as ‘Bottom’.

RANK

IF RANK(SUM([CY SALES])) <= 6 THEN 1 ELSE -1 END

We can then custom format this to “Top”;”Bottom”

Formatting the Table

Create a text table by

  • Order Month on Rows
  • Measure Names on Columns
  • Measure Values on Text
  • Measure Names on Filter, filtered to just the relevant measures

Add Measure Values to the Colour shelf, and select the Use Separate Legends option to display multiple diverging colour legend controls.

It’s now a case of going through each measure and editing the colour palette, and other settings. Some of this was again a bit of trial and error for me – I chose options that worked.

For the black text fields (CY SALES, LY SALES, %DIFF), choose a diverging colour palette, then click on the coloured squares at each end and select black from the colour picker. Select Stepped Colour and reduce the steps to 2.

Apply the same concept to the BEST DAY and WORST DAY legends, but select the appropriate green or red colour instead.

For the remaining fields, select a diverging colour palette, select the appropriate red at one end, and green at the other, reduce the steps to 2

And subject to some other formatting tweaks (increase font sizes, centre text), this is enough to complete the challenge. My final published viz is here. Note, the published viz does have slight differences to what I’ve blogged… as with many things, you sometimes realise things can be simpler when you try again.

Happy vizzin’! Stay Safe!

Donna

Can you predict the future?

Continuing on from last week’s challenge, Candra Mcrae set this #WOW2021 challenge to introduce another new feature of v2020.4 (so you’re going to need this version as a minimum to complete this challenge). Predictive modelling isn’t something I use at all, so I was quite reliant on the help documentation Candra referenced. I’m also not overly familiar with the various different models, and when they should be used, so this isn’t something I’m likely to use personally, without careful consideration and thought – it’s worth reiterating Candra’s warning :

Viz Responsibly: While predictive analysis in Tableau has never been easier, we encourage you to be intentional in how you use this powerful feature because not doing so (e.g., selecting a model type not appropriate for your data) could lead to inaccurate results.  

The focus of this blog will be

  • Building the main chart
  • Creating the tooltips
  • Determining the data for the title
  • Building the measure selector
  • Adding the measure selector interactivity

Building the main chart

The data provided has some records related to years before 1993, but the requirement is just to use data from 1993 onwards, so the first thing I did was to set a data source filter (right click on data source -> Edit Data Source Filters) to restrict the whole data source to Year 1 >=1993

Next I created the measures we need to display

Total Enrollment

[Total enrollment 2 All students]

simply references the existing measure.

% Black Students

SUM([Total enrollment 2 Black students]) / SUM([Total Enrollment])

% Non-Black Students

1- [% Black Students]

These 3 measures need to be displayed on a single chart where the actual measure displayed is determined by user selection. This selection will be driven by the use of a parameter, that will be set by a parameter action. For now, we just need to establish the parameter.

pSelect_Measure

This is simply a string parameter which will store the value of Total Enrollment by default.

Using this parameter, we can now decide which value we’re going to plot on the chart

Actual Value

CASE [pSelect_Measure]
WHEN ‘Total Enrollment’ THEN SUM([Total Enrollment])
WHEN ‘% Black Students’ THEN [% Black Students]
ELSE [% Non-Black Students]
END

Add this to a view by placing

  • Year 1 as continuous (green) pill on Columns
  • Actual Value on Rows

and you get the basic shape, although it’s not as ‘peaky’. This is resolved by editing the Actual Value axis (right click axis -> edit) and unchecking the Include zero checkbox.

Now change the text in the pSelect_Measure input parameter that’s displayed to % Black Students, and the chart will change. Verify it changes with the text % Non-Black Students too.

In reading through the Tableau KB documentation about Predictive Modeling Functions in Time Series Visualizations, I came to learn of the feature to Extend Date Range, something I’ve never come across before, and I’m not sure what version it first appeared in. Anyway, for this, you need to be working with a field which is a date datatype. The Year 1 field provided is an int.

I’m not entirely sure what I’ve done next is the optimum method, but it worked for me… some of it involved a bit trial and error when I came to defining and invoking the modelling feature later on. In writing this up, I’m essentially helping you to avoid the ‘back and forth’ steps I took in getting it all to work.

Anyway, I needed a date field to represent the year

Year

MAKEDATE([Year 1],1,1)

This resolves to a field containing 1st Jan YYYY for each Year in the data set.

Replace Year 1 on the chart with this field, and changing it to the ‘continuous’ instance of the date by choosing the second ‘Year’ option from the context menu

This changes the pill to green and then the option to Extend Date Range is visible on the menu. Set this to 5 years using the custom option

After doing this you’ll get a 5 nulls indicator displayed, which is basically saying there’s some years without any data. This what we expect for now.

Now onto the modelling part. We need a new calculated field to contain the predicted value using the Gaussian process regression.

Predicted Value

MODEL_QUANTILE(‘model=gp’, 0.5,[Actual Value],ATTR(DATETRUNC(‘year’,[Year])))

again, this took a bit of trial and error to get the date field in the required format.

Add this to the Rows and again edit the axis to ‘exclude zero’. You should now see the data extending for the predicted value beyond 2018. You can now hide the 5 nulls indicator (right click -> hide indicator)

You can now combine to be a dual axis chart (don’t forget to synchronise axis), and apply the relevant formatting to change the marks, apply relevant colours, hide axis & gridlines etc. Note, I set the area chart to 25% opacity, rather than the 75% stated in the requirement, as this gave me the colour most similar to the solution.

Creating the tooltips

Hovering over the chart, the tooltips display the Actual, Predicted and Residual (Actual-Predicted) value for each point. But depending on the measure selected, the format differs – Total Enrollment is in K and the others are in %.

We can’t use the number formatting feature of a field to resolve this, so we need to be a bit more creative. I confess I started creating individual fields for each measure (Actual, Predicted, Residual) based on the measure type selected (Total Enrollment, % Black Students, % Non-Black Students), but this meant I was creating loads of calculated fields, which just seemed a bit unnecessary.

So after a bit of a think, I realised there was a better way.

First up, let’s get our residual

Prediction Residual

[Actual Value]-[Predicted Value]

Now, if we put the values in a tabular form, you can see what the precision of the values are depending on the measure stated

We need to format appropriately. Both displays required the values to be formatted to 1 decimal place, and both have a suffix, either a K or a %.

To get the value in the required display format

Tooltip – Actual Value

IF [pSelect_Measure]=’Total Enrollment’ THEN [Actual Value]/1000
ELSE [Actual Value] * 100
END

Format this to 1 dp.

Create a similar field for the predicted value, which should also be formatted to 1 dp.

Tooltip – Predicted Value

IF [pSelect_Measure] = ‘Total Enrollment’ THEN [Predicted Value]/1000
ELSE [Predicted Value] * 100
END

And finally Tooltip – Residual

[Tooltip – Actual Value] – [Tooltip – Predicted Value]

This needs to be custom formatted to +#,##0.0;-#,##0.0 which ensures a + symbol is always displayed for positive values.

Pop these onto the tabular display we built earlier, and you can see the values are now displaying in the way we need

Finally we need to create a field to store the required suffix

Tooltip – Value Suffix

IF [pSelect_Measure] = ‘Total Enrollment’ THEN ‘K’
ELSE ‘%’
END

We can now add these 4 fields onto the Tooltip shelf of the ‘All’ marks card, and create the tooltip as required

Determining the data for the title

As we need to only use 2 sheets to build this solution, and 1 sheet will be required for the measure selection, we have to incorporate the summary data displayed at the top of the dashboard as part of the title of the chart viz.

In the title, we need to display the following :

  • The latest year in the provided data set (ie 2018)
  • The latest year including the extended date range (ie 2023 – 5 years later)
  • The actual value from 2018 based on the selected measure
  • The predicted value from 2023 based on the selected measure
  • An indicator to show whether the values were likely to increase or decrease

The requirement was to ensure there was no ‘hardcoding’. And as we’re working on getting information related to a specific row (ie year) in a set of data that consists of multiple rows (years), then we’re going to need to make use of table calculations for this.

Let’s start with the ‘easier’ bits first. We want the Year of the latest in the actual data set, and we want this value to be essentially stored against every row in the data

Latest Year

{FIXED: MAX([Year])}

This returns the value of 2018.

Latest Year + 5

DATE(DATEADD(‘year’,5,[Latest Year]))

This simply adds 5 years to the Latest Year, so returns 2023.

Now when I’m using table calculations, I often prefer to see what the data is doing in the table itself, so I can be sure I’m doing things correctly. With the ‘extended year’ stuff, it’s a bit fiddly creating the table from scratch, so I simply started by duplicating the chart sheet ‘as crosstab’ (right click on the sheet name tab, -> Duplicate as Crosstab). Rearrange the fields so Measure Names is on Columns and Year is on Rows and the ‘Tooltip’ named fields are visible. Add Latest Year and Latest Year+5 to Rows, and you can see how these fields show the same value against every row.

Now, remove these fields, as by adding them, we’ve lost the additional ‘extended dates’ rows (ie the ‘fake’ rows that don’t actually exist in the data). Ok, so now we want to get the Actual Value associated to 2018, but then perpetuate this across every row in the data.

Latest Year – Actual

WINDOW_MAX(IF MIN([Year]) = MIN([Latest Year]) THEN [Tooltip – Actual Value] END)

If the Year is the same as Latest Year, then display the value from the Tooltip – Actual Value field. The WINDOW_MAX table calc, then spreads this same value across all rows displayed. Format to 1dp and add this to the table.

We need to do something similar to get the Predicted Value for 2023

Latest Year +5 – Predicted

WINDOW_MAX(IF LAST()=0 THEN [Tooltip – Predicted Value] END)

If we’re at the last row in the data, then display the value from the Tooltip – Predicted Value field. Again the WINDOW_MAX spreads the value across all the rows. Set this to 1 dp and add to the table.

And now we just need to get the increase/decrease indicator

Increase | Decrease

IF ([Latest Year – Actual])-[Latest Year +5 – Predicted]>0 THEN ‘decrease’ ELSE ‘increase’ END

So now we know we’ve got the correct values we need, we can add these fields to the Detail shelf of the chart sheet, so we can reference them in the Title of the chart.

We also need the Latest Year and Latest Year +5 fields added to the Detail shelf, but when you add these, you’ll notice that you lose the ‘extended years’. You can fix this by wrapping the fields in an ATTR function. Double click on the field, which will allow you to ‘type in’ to the field.

You should now be able to create the text in the chart title

Building the measure selector

Phew! Are you still with me… there’s a fair bit going on already, and now we’ve got to build the chart that will drive the user selection.

On a separate sheet, add

  • Measure Names to Rows
  • Measure Values to Detail
  • Measure Names to Text
  • Measure Names to Filter, and restricted to the 3 original measures – Total Enrollment, % Black Students, %Non-Black Students

Uncheck Show Header on the pill on the Rows, then format

  • Set background colour of the pane to a navy blue
  • Set row & column borders to be white
  • Set the text label to be white text, centred, and increase the font
  • Turn off the tooltip

Adding the measure selector interactivity

Create the dashboard and add both the charts. To add the interactivity so that on click of a row in the Measure Selection sheet, it changes the measure being displayed, we need to add a dashboard action, that changes a parameter (Dashboard menu -> Actions -> Add Action -> Change Parameter). Set the action to run on Select when the Measure Select sheet is clicked. The target parameter is pSelect_Measure and the Measure Names field should be passed into this.

And with all that, you should hopefully now have a working solution. My published viz is here (note, my Measure Selection sheet is slightly different from what I’ve described above). The above is a bit simpler I think.

Happy vizzin’! Stay Safe!

Donna

Can you use map layers to show profit at state and city levels?

It was Sean Miller’s turn to present this week’s #WOW2021 challenge, and one of the common features of Sean’s challenges is to utilise new features of Tableau. So for this challenge, you’re going to need v2020.4 (or higher if you’re reading this in a few months time 🙂 )

The core components of this challenge are relatively basic, and I am therefore making assumptions you know how to build a map and a bar chart. The focus for this will be on

  • Create the layered map
  • Sizing the circles (cities)
  • Colouring the circles (cities)
  • Removing interactivity on the states
  • Colouring the Profit on the Tooltip
  • Hover action from bar to cities

Create the layered map

Start by building out a basic filled map (double click State, then double click Profit and move Profit to the Colour shelf – if a map of the US doesn’t appear, you’ll need to edit locations via the Map -> Edit Locations option). Change the Profit colour legend to be red-black diverging, and set the centre point to 0. Remove all the map ‘features’, via the Map -> Map Layers option.

Then drag the City field onto the map, and when the Add a Marks Layer option appears, drop the field onto the ‘layer’ icon.

This will create a marks card for each field on the layers – one for City and one for State

Note if you get an indicator on the lower right of the map that suggests there’s multiple ‘unknown’ cities, click on the indicator to Edit Locations and make sure the State field is mapped to State.

Sizing the circles (cities)

The City layer needs to be sized based on Profit, so drag this field to the Size shelf of the City marks card. By default this will size from small to large with the smallest value being the lowest Profit value

But in the solution, you’ll see Profit is sized based on its ‘absolute’ value, ie -13,000 should be the same size as +13,000.

This was the last thing I actually resolved when building this viz, as it isn’t anything I’ve seen before. I knew it couldn’t be something too tricky, due to the nature of the current set of challenges, so I had a bit of google and discovered it’s just a case of setting the Sizes vary option to be From zero. You might need to the adjust the slider on the Size shelf itself to get the circles at an appropriate size

Colouring the circles (cities)

Create a calculated field

Profit is +ve?

SUM([Profit])>=0

which will return true or false and add this to the Colour shelf of the City marks card. Colour accordingly, and add a grey border to the circles (setting on the Colour shelf).

Removing interactivity on the states

As you move your mouse over the map, you’ll see the borders of the states are ‘selected’. To remove this, click the arrow next to the title of the State marks card, and choose Disable Selection.

As an additional step, remove the tooltips from showing on the State marks card.

Colouring the Profit on the Tooltip

The tooltip displays +ve profit in black and -ve profit in red. For this you’ll need two calculated fields, one to store the profit value only when it’s positive, and one to store the profit value only when it’s negative.

+ve Profit

IF [Profit +ve?] THEN SUM([Profit]) END

-ve Profit

IF NOT([Profit +ve?]) THEN SUM([Profit]) END

Add both these fields to the Tooltip shelf of the City marks card, then ensure they sit side by side in the tooltip text. Colour the text accordingly.

Hover action from bar to cities

When the 2 charts are on the dashboard, add a dashboard Highlight action to run on Hover from the Bar sheet to the Map sheet

Hopefully this has covered all the key points of this challenge. My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you build a Control Chart?

Lorna Brown returned this week to set another table calculation based challenge involving a line chart which ‘on click’ of a point, exposed the ability to ‘drill down’ to view a tabular view of the data ‘behind’ the point. This is classic Tableau in my opinion – show the summarised data (in the line chart) and with ‘drill down on demand’. Lorna added some additional features on the dashboard; hiding/showing filter controls to change how the data is displayed in the chart, and a back navigational button on the ‘detail’ list.

The areas I’m going to focus on in this blog are

  • Setting up the parameters
  • Defining the date to plot on the chart
  • Restricting the data to the relevant years
  • Defining the reference bands
  • Colouring the marks
  • Working out the date range for the tooltip
  • Building the table
  • Drill down from chart to table
  • Un-highlight selected marks
  • Hide/Show filter controls
  • Add navigation button

Setting up the parameters

This challenge requires 3 parameters.

Select a Date

a string parameter containing the 2 options available (Date Submitted & Date Selected) for selection, which when displayed on the dashboard will be set to a single value list control (ie radio buttons)

Latest X Years

an integer parameter, defaulted to 3, which allows a range of values from 1 to 5.

NOTE – Ensure the step size is set to 1, as this is what allows the Show buttons option to be enabled when customising the Slider parameter control type.

STD

another integer parameter, defaulted to 1, that allows a range of values from 1 to 3

Defining the date to plot on the chart

The Select a Date parameter is used to switch the view between different dates in the data set. This means you can’t plot your chart based on date field that already exists in the data set. We have to create a new field that determines which date field to select based on the parameter

Date to Plot

DATE(DATETRUNC(‘week’, IIF([Select a Date]=’Date Submitted’,[Date sent to company], [Date received])))

The nested IIF statement, is basically saying, if the parameter is ‘Date Submitted’ then use the Date sent to company field, else use the Date received field. This is all wrapped within a DATETRUNC statement to reset all the dates to the 1st day of the week (since the requirement is to report at a weekly level).

Note – there was some confusion which field the parameter option should map to. I have chosen the above, but you may see solutions with the opposite. Don’t get hung up on this, as the principal of how this all works is most important.

Restricting the data to the relevant years

The requirement is to show ‘x’ years worth of data, where 1 year’s worth of data is the data associated to the latest year in the data set (ie from 01 Jan to latest date, rather than 12 months worth of data). So to start with I calculated, rather than hardcoded, the maximum year in the data

Year of Latest Date

YEAR({MAX([Date to Plot])})

Then I could work out which dates I wanted via

Dates to Include

[Date to Plot]>= MAKEDATE([Year of Latest Date] – ([Latest X Years]-1),1,1)

In the MAKEDATE function, I’m building a date that is the 1st Jan of the relevant based on how many years we need to show.

So if Year of Latest Date is 2020 and Latest X Years =1 then Year of Latest Date – (Latest X Years -1) = 2020 – (1-1) = 2020 – 0 = 2020. So we’re looking for dates >= 01 Jan 2020.

So if Year of Latest Date is 2020 and Latest X Years =3 then Year of Latest Date – (Latest X Years -1) = 2020 – (3-1) = 2020 – 2 = 2018. So we’re looking for dates >= 01 Jan 2018.

This field is added to the Filter shelf and set to true.

So at this point, our basic chart can be built as

  • Year on Columns (where Year = YEAR([Date to Plot])), and allows the Year header to display at the top
  • Date to Plot on Columns, set to Week Number display via the pill dropdown option, and also set to be discrete (blue pill). This field is ultimately hidden on the display.
  • Number of Complaints on Rows (where Number of Complaints = COUNT([XXXX.csv], the auto generated field relating to the name of the datasource).

To get the line and the circles displayed, this needs to become a dual axis chart by duplicating the Number of Complaints measure on the Rows, synchronising the axis and setting one instance to be a line mark type, and the other a circle.

Defining the reference bands

The reference bands are based on the number of standard deviations away from the mean/ average value per year.

Avg Complaints Per Year

WINDOW_AVG([Number of Complaints])

Once we have the average, we need to define and upper and lower limit based on the standard deviations

Upper Limit

[Avg Complaints Per Year] + (WINDOW_STDEV([Number of Complaints]) * [STD])

Lower Limit

[Avg Complaints Per Year] + (WINDOW_STDEV([Number of Complaints]) * -1 * [STD])

Add both these fields to the Detail shelf of the chart viz (at the All marks card level) and set the table calculation of each field to Compute By Date to Plot

This ‘squashes’ everything up a bit, but we’ll deal with that later.

Add a Reference Band (right click on axis – > Add Reference Line) that ranges from the Lower Limit to Upper Limit.

If an Average Line also appears on the display, then remove it, by right clicking on the axis -> Remove Reference Line – > Average

Colouring the marks

I created a boolean field based on whether the Number of Complaints is within the Upper Limit and Lower Limit

Within STD Limits?

[Number of Complaints]<[Upper Limit] AND [Number of Complaints]>[Lower Limit]

Add this to the Colour shelf of the circle mark type, and set to Compute Using Date to Plot. The values will be True, False or Null. Right click on the Null option in the Colour Legend, and select Exclude. This will add the Within STD Limits? to the Filter shelf, and the chart will revert back to how it was. Adjust the colours accordingly.

The Tooltip doesn’t show true or false though, so I had another field to use on that

In or Out of Upper or Lower Limits?

If [Within STD Limits?] THEN ‘In’ ELSE ‘Out’ END

Working out the date range for the tooltip

The Tooltip shows the start and end of the dates within the week. I simply built 2 calculated fields to show this.

Date From

[Date to Plot]

This 2nd instance is required as Date to Plot is formatted to dd mmm yyyy format and also used in the Tooltip. Whereas Date From is displayed in a dd/mm/yyyy format.

Date To

DATE(DATEADD(‘day’, 6, [Date to Plot]))

Just add 6 days to the 1st day of the week.

Building the table

Create a new sheet and add all the relevant columns required to Rows in the required order. For the last column, Company response to consumer, add that to the Text shelf instead (to replace the usual ‘Abc’ text). The in the Columns shelf, double click and type in ‘Company response to consumer’ which creates a ‘fake’ column heading. Format all the text etc to make it all look the same.

Add the Dates to include = true filter.

Also add the WEEK(Date to Plot) field to the Rows shelf, as a blue discrete field (exactly the same format as on the line chart). But hide this field (uncheck Show Header). This is the key linking field from the chart to the detail.

Drill down from chart to table

Create one dashboard (Chart DB) that displays the chart viz. And another dashboard that displays the table detail (Table DB). On the Chart dashboard, add a Filter Dashboard Action (Dashboard menu -> Actions -> Add Action -> Filter), that starts from the Chart sheet, runs as a Menu option, and targets the Detail sheet on the Detail dashboard. Set the action to exclude all values when no selection has been made. Name the action Click to Show Details

On the line chart, if you now click a point on the chart, the tooltip will display, along with a link, which when clicked on, will then take you to the Detail dashboard and present you with the list of complaints. The number of rows displayed should match the number you clicked on

Un-highlight selected marks

What you might also notice, is when you click on a point on the chart, the other marks will all ‘fade’ out, leaving just the one you selected highlighted. It’s not always desirable for this to happen. To prevent this, create a new field called Dummy which just contains the text ‘Dummy’. Add this onto the Detail shelf of the All marks card on the chart viz.

Then on the chart dashboard, add another dashboard action, but this time choose a highlight action. Set the action to run on select and set the source & target sheets to be the same sheet on the same dashboard. But target highlighting to selected fields, and select the Dummy field only

Hide/Show filter controls

Check out this post by The Data School that explains very simply how to work with floating containers to show/hide buttons. When creating in Desktop, the ‘onclick’ interactivity won’t work, you’ll have to manually select to show and hide, but once published to Tableau Public, it’ll behave as desired.

You have options to customise what the button looks like when the container contents are hidden, and what it looks like when they’re shown, via the Button Appearance

Add Navigation Button

On the Detail dashboard, simply add a Navigation object to the dashboard

and edit the settings to navigate back to the chart dashboard, as well as customise the appearance

Hopefully I’ve covered all the key features of this challenge. My published viz is here.

Happy vizzin’! Stay safe!

Donna

Can you build a Customer Lifetime Value Matrix?

Ann Jackson challenged us this week to build this matrix depicting the average worth of customer cohorts during their lifetime.

This challenge involves a mix of LoDs (Level of Detail calculations) and table calculations.

First up , we need to define our customer cohorts (ie group the customers), which for this challenge is based on identifying the quarter they placed their first order in. This will involve an LoD calculation. For a good introduction to LoDs with some worked examples (including a similar cohort analysis example), check out this Tableau blog post.

ACQUISITION QUARTER

DATE(DATETRUNC(‘quarter’,{FIXED [Customer ID] : MIN([Order Date])}))

The 2nd part of the formula in the { … } returns the earliest Order Date associated to the Customer ID, which is then truncated to the 1st day of the quarter that date falls in ie 23 Feb 2019 is truncated to 01 Jan 2019.

For the ‘quarters since birth’ field, we need to calculate the difference in quarters, between the ACQUISITION QUARTER and the ‘quarter’ associated to the Order Date of each order in the dataset.

QUARTERS SINCE BIRTH

DATEDIFF(‘quarter’,[ACQUISITION QUARTER],DATETRUNC(‘quarter’,[Order Date]))

Drag this field into the ‘dimensions’ area of the left hand data pane (above the line if you’re using later versions of Tableau).

Lets sense check what this looks like, by adding

  • ACQUISITION QUARTER to Rows (Discrete, Exact Date)
  • ORDER DATE to Rows, set to Quarter (quarter year ie May 2015 format which will make a green pill), then set to discrete to change to blue
  • QUARTERS SINCE BIRTH to Rows

You can see that while the first row against each cohort starts with a different quarter, the QUARTERS SINCE BIRTH always starts at 0 and counts sequentially down the table.

Next we want to count the number of distinct customers in each cohort, and we’ll use another LOD for this.

CUSTOMERS

{FIXED [ACQUISITION QUARTER]: COUNTD([Customer ID])}

Once again move this field into the Dimensions section of the data pane.

Add this onto the Rows of the above data table, and you should get every row for the same cohort displaying the same number

Add Sales onto Text to get the value of sales made by the customer in each cohort in each quarter. The ‘customer lifetime value’ we need is defined as the total sales so far / number of customers in the cohort.

Remove the QUARTER(Order Date) field from the table, as we’re not going to need this for the display, and it’ll affect the next steps if it’s left.

To get the cumulative sales, we need a Running Total Quick Table Calculation. Click on the Sales pill on the Text shelf and select Quick Table Calculation -> Running Total. The click again and Compute By -> QUARTERS SINCE BIRTH. Add Sales back into the table, so you can see the quarterly Sales value and how it’s cumulating until it reaches the next cohort.

We’ve now got the building blocks we need for the CLTV value we need to plot

Avg Lifetime Value

RUNNING_SUM(SUM([Sales])) / SUM([CUSTOMERS])

Note – I purposefully haven’t called this field what you might expect, as I’m going to ‘fill in the gaps’ that Ann describes in the requirements, and I’ll use that name then.

Pop this field into the table above, again setting the table calculation to compute by QUARTERS SINCE BIRTH

You can now use the data table above to validate the calculation is what you expected.

Now let’s build the viz out.

On a new sheet

  • QUARTERS SINCE BIRTH to Columns
  • ACQUISITION QUARTER (exact date, discrete blue pill) to Rows
  • Avg Lifetime Value to Text, setting the table calculation to Compute By QUARTERS SINCE BIRTH

From this basic text table, you can see the ‘blank’ fields, Ann mentioned. In the data table view, it’s not so obvious. The blank is there because there are no sales in those quarters for those cohorts. To fix we need another table calculation

CUSTOMER LIFETIME VALUE (CLTV)

IF ISNULL([Avg Lifetime Value]) AND NOT ISNULL(LOOKUP([Avg Lifetime Value],-1)) AND NOT ISNULL(LOOKUP([Avg Lifetime Value],1))

THEN LOOKUP([Avg Lifetime Value],-1) ELSE [Avg Lifetime Value] END

This says, if the Avg Lifetime Value field is NULL but neither the previous or the subsequent values are NULL, then use the Avg Lifetime Value value from the previous column (LOOKUP).

Replace the Avg Lifetime Value with the CUSTOMER LIFETIME VALUE (CLTV) field (setting the Compute By again), and the empty spaces have disappeared.

If you hover over the cells in the lower right hand side of the view, you’ll see tooltips showing, indicating that a mark has been drawn on the viz with Null data. To fix this, add CUSTOMER LIFETIME VALUE (CLTV) to the Filter shelf and specify non-null values only to show via the Special tab.

Now if you hover over that area you don’t get any tooltips displaying, as there aren’t any marks there.

Now it’s just a case of formatting the viz a bit more

  • Add CUSTOMERS to Rows
  • Add CUSTOMER LIFETIME VALUE (CLTV) to the Colour shelf by holding down the Ctrl key then clicking on the field that’s already on the Text shelf, and dragging via the mouse onto the Colour shelf. Using Ctrl in this way has the effect of copying the field including the table calculation settings, so you don’t need to apply them again. This will change the colour of the Text.
  • Then change the mark type to Square, which will then fill out the background based on the colour.
  • Then edit the colour legend to the relevant palette (which you may need to install via Ann’s link).
  • Set the border of the mark via the Colour shelf to white
  • Remove the row & column dividers
  • Set the row Axis Ruler to a dark black/grey line
  • Format the 1st 2 columns so the font is the same and centred. Widen the columns if required.
  • Update the tooltip

And then you should be ready to add the viz to your dashboard. My published version is here.

This blog is a bit more detailed that my recent posts, but I’m also conscious I’ve skipped over some bits that if you’re brand new to Tableau, you may not be sure how to do. Feel free to add comments if you need help!

Happy vizzin’! Stay Safe!

Donna

Can you find the variance along a line?

So after a couple of weeks off blogging due to Christmas, I’m back providing solutions for #WOW2021! New contributor Candra Mcrae started off the year with this gentle workout focussing on table calculations. For those who have followed my previous blogged solutions, you’ll probably recall that table calcs don’t phase me as much as they do some – I started using Tableau before LODs existed, so they were the only tool at my disposal in the past. That said, I’m still not such an expert that I get it right first time – there’s often plenty of trial and error as I choose which fields I want to compute by, although I’ve got much better at this since I read Andy Kriebel’s Table Calculations Overview a couple of years ago.

As with any challenge that involves table calcs, I tend to start by building out a tabular view of all the data I’m going to need to build the viz. This ensures I can validate the data much easier and set the table calculation settings to what I need.

So let’s start….

For the line chart itself, we’re simply going to be plotting Year against Food insecurity which has been formatted to 1 decimal place and displays the % as a suffix

We will also need to capture 2 other values which will represent the coloured circles. Parameters are needed to help identify what these circles will be.

pSelected Year is an integer parameter that can be built by right clicking on Year and selecting Create -> Parameter. This will populate the parameter with all the values of the Year field. Default the value to 2018 and adjust the formatting of the displayed value so that it is 0 decimal places and include thousand separators is unchecked.

With this parameter, we can capture the value associated which is represented by the pink dot in the viz

Selected Year %

IF [Year] = [pSelected Year] THEN [Food insecurity (includes low and very low food security) Percent of households] END

It’s a bit more work to identify the black dot, as this will vary based on another parameter

pComparison

I created this an integer parameter storing the values 0, 1 & 2 (defaulted to 2), but aliased for display as First Year, Most Recent Year, Previous Year. I’ll be using this is an case statement/if clause later and comparing integers to strings is much more efficient in Tableau.

I like to take things step by step, especially when there’s table calcs involved, to ensure all the values I’m referencing are correct, so rather than identifying the selected value in a single calculated field, I’m using multiple.

Firstly I want to identify the 1st year in the dataset (without hardcoding).

First Year

WINDOW_MIN(MIN([Year]))

This will store the value of the earliest year (1995) against every row in the data that is outputted in the view (a bit like you would get with {FIXED : MIN([Year])}, but this is a no LOD challenge).

From this I can work out

First Year %

IF MIN([Year]) = [First Year] THEN MIN([Food insecurity (includes low and very low food security) Percent of households]) END

Notice the MIN() functions used in this statement, as opposed to the Selected Year % above. This is because First Year is a table calc which is an aggregation, and subsequently other fields referenced in the calculation also need to be aggregated. In this case other aggregations such as AVG, MAX, ATTR would also suffice.

Similarly, I’m going to derive the Latest Year % with

Latest Year

WINDOW_MAX(MAX([Year]))

Latest Year %

IF MIN([Year]) = [Latest Year] THEN MIN([Food insecurity (includes low and very low food security) Percent of households]) END

Finally, I’m also going to work out the value for the previous year

Previous Year

[pSelected Year]-1

Previous Year %

IF [Year] = [Previous Year] THEN [Food insecurity (includes low and very low food security) Percent of households] END

With these 3 % fields, I can now create a fourth field which stores the value of the % I want to compare with

Selected Comparison %

CASE [pComparison]
WHEN 0 THEN [First Year %]
WHEN 1 THEN [Latest Year %]
WHEN 2 THEN SUM([Previous Year %])
END

This ultimately just stores the 2nd value required for the black dot. This could all have been written within this single calculation, but I find it easier to troubleshoot if things are broken down a bit.

Putting these onto a table we can see how the values in each row change, as the parameters are changed.

Note, for all the table calculation fields (all denoted by the triangle symbol on the pill), I have explicitly set them to Compute Using the specific dimension of Year rather than the default of table down. While this will give the same result, I prefer to be explicit to ensure the values don’t change if pills get subsequently moved around the canvas (in the case of the Selected Comparison % field, all Nested Calculations within the Edit Table Calculation dialog box need to be set).

This is enough information to build the main viz itself by

  • adding Year to Columns (green continuous pill)
  • adding Food insecurity to Rows
  • adding Selected Year % to Rows next to Food insecurity
  • then drag Selected Comparison % to the Selected Year % axis which will automatically change the display to have Measure Values on Rows instead. Set the table calculation setting to compute by Year

This chart can then be set to be dual-axis and the axes synchronised. The Food insecurities should remain as as line mark, and the Measure Values should be a circle. The colours, formatting and tooltip then need to be applied.

Now we need to go back to our table of data to build out other calculations. The requirement is a single viz, so we need to provide the % value of the selected year within the title of the chart, along with the difference from the comparison value. For this to work, we need to store the relevant values against every row in the data set.

We already have the Selected Year % value identified, but this is only captured against the row of the selected year in the data output. To get it to display against every row we need

Window Max – Selected Year %

WINDOW_MAX(MIN([Selected Year %]))

This is formatted to 0 dp with the % sign as suffix, as this is the field that will be displayed in the title. Added to our data table, with the table calc set to compute by Year, you can see the value replicated across every row

Similarly, we already have the Selected Comparison % captured, but in order to work out the difference, we also need to get this value against every row too

Window Max – Selected Comparison %

CASE [pComparison]
WHEN 0 THEN WINDOW_MAX([First Year %])
WHEN 1 THEN WINDOW_MAX([Latest Year %])
WHEN 2 THEN WINDOW_MAX(MAX([Previous Year %]))
END

Adding this in to the table (and remembering to set the table calc settings), you can also see the relevant value perpetuated against every row. Change the pComparison value and you’ll see the values change accordingly, but still the same on every row.

So for the difference…. at this point I chose to deviate from the solution published. We’re already dealing with a measure that is quantified as a % (as opposed to a measure that is quantified as £ say). So to get the % difference between two percentage measures, I simply chose to show the difference between the two values (ie selected – comparison). Some would refer to this a the point difference between the values. This makes most sense to me for this particular scenario. The alternative is to calculate the % difference in the more traditional way of (selected – comparison) / comparison as you may do if you were presenting the % difference between the value of sales in different years. But I personally found the result could be confusing to the reader .

Difference

[Win Max – Selected Year %]-[Win Max – Selected Comparison %]

BUT, the value displayed within the title is the absolute value (ie no -ve sign) and actually doesn’t display if the value is 0 (which you get if you select the year to be 2019 and comparison to most recent year). So I resolved this with

DISPLAY : Difference

IF [Difference] <> 0 THEN ABS([Difference]) END

which is formatted to 1 dp with % suffix

Finally, we have a symbol that is used to indicate if the change is +ve, -ve or remains the same.

Difference Indicator

IF [Difference] > 0 THEN ‘▲’
ELSEIF [Difference] < 0 THEN ‘▼’
ELSE ‘N/C’
END

Add this into the table too, once again remembering to set the table calc properties for all nested calculations.

So now you have these fields, you can add Window Max – Selected Year % and DISPLAY : Difference and Difference Indicator to the Detail shelf of the All marks card on the chart viz, and these will then be available in title dialog to add. Once again, ensure you set the table calcs to compute by Year.

You’ll also need to add some spacing in the title, to allow the parameter controls to be ‘floated’ into place on the dashboard. Getting the position right is really tricky. I positioned it very carefully on Desktop, but when I published, the controls were in completely different places. The easiest way I found to resolve this, was to use the Edit feature in Tableau public to edit the dashboard online and move the objects that way.

My published viz is here.

Happy vizzin’! Stay Safe!

Donna