Can you build a retention heat map with a marginal histogram?

Week 50 of #WorkoutWednesday saw Curtis deliver a heatmap based challenge with a few twists.

The challenge consisted of the following main components :

• Identify a ‘cohort’ for each customer based on first week of sale
• Understand the % of customers in each cohort making purchases per week for the following x weeks, where x was driven by a user defined input.
• Build a heat map for weeks 1 to x-1, but only include ‘full’ weeks
• Build a bar chat for week x
• Display a summary BAN for volume of week x returning customers as a proportion of the customers in the cohorts displayed.

Identify cohort

Each customer needed to be put in a cohort based on the week of their first order, which uses a FIXED LoD (level of detail) calculation.

Cohort

DATE({FIXED [customer_id]: MIN(DATETRUNC(‘week’, [order_week]))})

% of Customers

The number of unique customers making a purchase is a simple calculation

Customer Count

COUNTD([customer_id])

and plotting these fields alongside order_week, we start to see where we’re heading

But we need to index each order_week in relation to the cohort: For the cohort dated 01 Jan, week 1 is order_week 08 Jan, but for the cohort dated 08 Jan, week 1 is order_week 15 Jan.

Week Index

DATEDIFF(‘week’,[Cohort],[order_week])

We also need against each row, the number of customers in the cohort, which is equivalent to the number of customers in week 0.

New Customers

{FIXED [Cohort]: COUNTD(IF ([order_week])=([Cohort]) THEN [customer_id] END)}

This is saying for each cohort, count the number of distinct customers when the order week is the same as the cohort week.

Now we’ve got that, we can work out the % of returning customers each week:

% of Customers

[Customer Count]/SUM([New Customers])

Putting all these onto a data table, we can see all the figures are making sense

But we want to limit the number of weeks, based on the user input, so we need a parameter, that is limited to range between 10 and 26 weeks. Note how the display format of the parameter is set.

Time Period

To restrict the weeks, we need another calculated field we can filter by

FILTER:Weeks Index to Display

[Week Index]> 0 AND [Week Index]<[Time Period]

This is added to the filter shelf and set to True.

This verifies we’ve got the numbers we need to start to build the heat map.

Heat Map

Duplicating the data sheet I’ve created above, then moving the pills around, we get the ‘bones’ of what we’re after, but you can see towards the bottom of the cohort list, that we have missing entries, where there aren’t enough weeks for the cohort week being considered. We don’t want these rows to display, so need to filter the data someway.

I approached this by determining the number of unique customers at week x, then seeing whether there was any or not :

Count Customers at Time Period Param

{FIXED [Cohort]: COUNTD(IF ([Week Index])=([Time Period]) THEN [customer_id] END)}

FILTER: Complete Cohorts

[Count Customers at Time Period Param]>0

Adding the Count Customers at Time Period Param field to the view, you can see that for a 10 week time period, from the cohort dated 29th October onwards, the count of customers is 0, so these are the rows I’m going to filter out, by adding the Filter : Complete Cohorts field to the filter shelf, and setting to True.

So lets turn this into the heat map display:

• Remove the Count Customers at Time Period Param field
• Move % of Customers onto the colour shelf
• Move order_week and Customer Count onto the Tooltip shelf
• Change the Sort of the Cohort to descending
• Add New Customers as a discrete field next to Cohort
• Change the colour palette to use Vidris
• Change the formatting of the Cohort and order_week fields to suit
• Set the tooltips
• Apply other formatting changes to adjust font size, alignment, row/column lines etc

Bar Chart

For the bar chart, we only want the data for week x, so we need to filter the data just to this week again

FILTER : Time Period

[Week Index]=[Time Period]

The bar is basically plotting Cohort against % of Customers, but we need some additonal wizardy to get the desired display.

The requirement states the bar chart needs to be hardcoded from 0-50%, and so we need the light grey bars behind each bar to ‘fill up’ to 50%.

We do this using a synchronised dual axis, with the secondary axis set to MIN(0.5), coloured light grey, and the secondary axis ‘pushed to the back’. Adjust the size of the bars to suit.

But we need to label the end of the secondary axis with the % of Customers, and the label needs to be displayed to the right of the bar.

As we haven’t fixed the axis at all, adding % of Customers to the label shelf of the secondary axis, and reducing the font size and setting to be right- aligned, we get what we need. Then just add all the relevant fields to both marks cards that are needed for the tooltip, and hide both axes and the Cohort field.

But the chart does need to be labelled with Week x. So this requires a calculated field

LABEL: Week Index

‘Week ‘ + STR([Week Index])

which is then added to the Columns shelf and left aligned, and Hide Field Labels for Columns

Other formatting of the chart is applied to ensure no row/column borders or gridlines are being displayed.

Summary BAN

The summary BAN is simply % of Customers displayed as Text but filtered by FILTER: Time Period = True, with the formatting of the text adjusted to suit.

Putting it all together

When adding it all to the dashboard, the heat map and the bar chart need to be sited side by side and both set to Fit Entire View so the rows for each cohort line up. The colour legend is then sited below the heat map, and I then used blank objects either side to make the legend line up with the heat map.

The line at the top of the dashboard is created by using a vertical container object, where a blank object within is then set to height 0 with a black background. Working with containers can be a bit fiddly at times.

I found this a really enjoyable challenge – thanks Curtis!

My published viz is here.

Happy vizzin’

Donna