Can you recommend profitable return customer product bundles?

This week Candra Mcrae provided a market basket analysis challenge which had a bit of all sorts going on as well as the core ‘what else was bought with product x’ puzzle – parameter actions, sheet swapping, custom toggle buttons.

Hopefully I’ll cover all the key elements :

  • Identifying the 2nd customer order
  • Identifying the orders containing the selected sub-category
  • Identifying the ‘other’ sub-categories ordered
  • Building the bar chart
  • Building the tree map
  • Creating the toggle button
  • Controlling what viz to display
  • Creating the dashboard and interactivity

Identifying the 2nd customer order

The requirements state the analysis is based on returning customers, and specifically their 2nd order. So we need a way to identify this set of data.

In the superstore data set, a customer only ever places one order on a day, so we will make use of this feature, but this technique won’t always work in other situations – I personally have been known to place multiple orders with Amazon on the same day, especially when Christmas shopping!

If we look at the dates customers have ordered, we’re looking to identify the orders associated to the customer & dates highlighted below

We can identify whether the date is the first order date using a FIXED LoD calculation

Is Customer’s First Order?

{FIXED [Customer ID]: MIN([Order Date])} = [Order Date]

The FIXED LoD statement within the {..} is finding the minimum Order Date for each Customer ID. This is then being checked against each Order Date, and returning true when it matches.

So this is giving us the first record, but we want the second….

I solved this, by first adding Is Customer’s First Order? = False as a data source filter (right click on the Superstore data source -> edit data source filters)

This essentially removes all the rows associated to the first order for each customer from the data set, and consequently, the Is Customer’s First Order? is now reported as True against what was originally the 2nd order…
If you compare the image above to the one further up the page where I marked the 2nd orders, you should see the values match what now seems to be listed as the first!

This works due to Tableau’s order of operations; the data source filter is filtering the data ‘early on’, so the rows of data we’re now working against, has no knowledge of those rows. Subsequently the customer’s first order is now actually reporting the customer’s 2nd order as the minimum (aka first) order date.

We can now add Is Customer’s First Order? to the Filter shelf and set to True, to just work with this set of orders.

NOTEIf the data source has not yet been extracted, which will be required if you want to publish to Tableau Public, then you will need to do one of the following :

  1. At the point of extraction, the data source filter will automatically be applied as an extract filter. If you keep it, then after extracting, remove the data source filter if its still there, otherwise you’ll find you’re reporting against the 3rd customer order OR
  2. At the point of extraction, remove the extract filter that is automatically applied. After extraction, re-add the data source filter if it’s been removed.

An alternative to using any data source/extract filters is to use the following LoD (which I only got my head round after building out the above, so I didn’t put it in my solution., but works just as well).

Is Customer’s 2nd Order?

{FIXED [Customer ID]:MIN(IIF([Is Customer’s First Order]=False,[Order Date],NULL))} = [Order Date]

The IIF statement is getting the Order Date for all the records which aren’t the Customer’s 1st order. The FIXED LOD, is then returning the minimum/smallest of these dates, which will be the date of the 2nd order. This is then compared to each Order Date to identify the appropriate record.

If you strip off any data source/extract filters, you can see how this works, and Is Customer’s 2nd Order? can be added to the Filter shelf instead, set to True, to get the required rows.

Identifying the orders containing the selected sub-category

To start with we need a parameter to store the selected Sub-Category. Right click on Sub-Category and select Create -> Parameter to create a string parameter which lists the sub-categories. I named mine pSelected_SubCat and it’s defaulted to Accessories.

What we’re now trying to find is the rows relating to Order IDs that contain the stated Sub-Category, in this instance Accessories

So we first need to identify the rows which match

Customer Ordered Selected SubCat

IIF([pSelected_SubCat]=[Sub-Category],1,0)

Add this to the view, and you can see the rows associated to Accessories are marked with 1

But we don’t just want these rows; we also need the rest of the rows on the same order, so in the case of order CA-2018-131534 above, we also need the row associated to the Paper Sub-Category.

For this, I used Sets as per the technique described in the link Candra provided within the hint.

We want the set of Order IDs where the sum of the Customer Ordered Selected SubCat field is >=1. Right click on Order ID and Create -> Set. On the Condition tab, set the relevant properties.

Adding the Orders with Selected SubCat set into the view, and we can see it’s picked up the rows we care about, and we can now filter by this set

An alternative to using Sets, is to use another FIXED LoD calculation

{FIXED [Order ID]: SUM([Customer Ordered Selected SubCat])}>=1

For each Order ID, sum up the Customer Ordered Selected SubCat field, and if it’s >=1 then it will return True, and you can then filter by this instead.

Identifying the ‘other’ sub-categories ordered

We want to count the orders which contain each Sub-Category, which is simply

Count Orders

COUNTD([Order ID])

With a basic view that is filtered as required, that lists Sub-Category by Count Orders (and sorted by Count Orders descending), we get

And you’ll see that Accessories is listed at the top. Change the parameter, and that value will also be listed. But we don’t want to see this – we need to filter it out. For this, we can create

Other Items Ordered

If [pSelected_SubCat]=[Sub-Category] THEN ‘N/A’ ELSE [Sub-Category] END

Pop this into the view, and you can see the ‘N/A’ is listed against the selected Sub-Category.

This means we can add a further Filter using the Other Items Ordered field and set it to Exclude N/A.

Building the bar chart

So now we’ve got the 3 filters we need to identify the rows of data we need to consider, we can easily build a bar chart.

To colour the bars, we need to work out the profitability

Profit Ratio

SUM([Profit])/SUM([Sales])

Is Profitable?

[Profit Ratio]>0

Add Is Profitable? onto the Colour shelf and adjust accordingly

You just then need to format to remove gridlines/axes etc, and add the relevant fields (correctly formatted) to the Tooltip to create the required text.

Building the treemap

The simplest way to do this, is to duplicate the sheet you’ve built for the bar chart, then use the Show Me option (top right) and select TreeMap. This will put most of the pills in the right places. The only thing that needs to be shifted is to add Is Profitable? back onto the Colour shelf, and to explicitly set the Sort order against the Sub-Category field. Count Orders also needs adding to the Label shelf, so it can be displayed.

Creating the toggle control

To create the toggle control which will control the switching of the vizzes do the following :

  • In the Columns shelf, type in MIN(0)
  • Then next to it, type in MIN(1)
  • Drag the MIN(1) pill from the Columns shelf and drop onto the MIN(0) axis, when you see the ‘2 green columns’ appear
  • Remove Measure Names from the Rows shelf
  • Change the mark type to Line
  • Create a parameter called Toggle which is an integer containing the values 0 & 1, defaulted to 0. Show this parameter on the view.
  • Create a new calculated field, Selected Toggle which just references the Toggle parameter
  • Add Selected Toggle to the Columns shelf and change to aggregate to MIN rather than SUM
  • Make it dual axes, and synchronise the axes
  • Remove Measure Names from the Colour shelf on the All marks card.
  • On the Selected Toggle card, change the mark type to Circle, and increase the Size. If you change the Toggle parameter the circle should shift to 1.
  • To colour the line based on the position, create a new calculated field

Colour

IF [Toggle]=0 THEN ‘grey’ ELSE ‘green’ END

  • Add Colour to the Colour shelf of the All marks card. With the Toggle parameter set to 0, set the colour to grey. Change the parameter to 1, and set the colour to green.
  • To set the tooltip create a new calculated field

Toggle Label

IF [Toggle]=0 THEN ‘Bar’ ELSE ‘Treemap’ END

  • Add Toggle Label to the Tooltip shelf of the Selected Toggle marks card. Adjust the tooltip text to just show this value.
  • Remove the text from the Tooltip of the Measure Names marks card.
  • Hide the axes headers and remove the row & column dividers

Controlling what viz to display

With the Toggle parameter set to 0, add the Selected Toggle field to the Filter shelf of the bar chart. and set it to be 0 (or at most 0 depending what filter control displays)

Then set the Toggle parameter to 1, and add the Selected Toggle field to the Filter shelf of the Treemap chart. Set this to be 1 (or at least 1 depending on the filter control displayed)

You should now find that if you switch to the bar chart sheet, nothing is displayed. Change the Toggle parameter back to 0, and the bar chart will now show, but there will be nothing on the treemap chart.

Creating the dashboard and interactivity

Create a dashboard sheet.

Add a vertical layout container

Add the bar chart into the vertical layout container. Set the chart to fit entire view and don’t display the title.

Add the treemap chart beneath the bar chart in the same vertical container. Whilst nothing is actually displaying, still set the chart to fit entire view and again don’t display the title.

The Toggle parameter should have automatically displayed on the right hand side, so test changing the parameter to see how the chart switches.

Add the Toggle Control sheet into the right hand container too.

Add a dashboard action to change the Toggle parameter when the Toggle Control is interacted with. It should take effect on Select of the Toggle chart, affect the Toggle parameter and pass the Measure Values field into the parameter.

Clicking on each end of the Toggle chart should now make the viz change from bar to treemap.

You just now need to tidy up the dashboard – add a title, remove any unrequired objects, set the container background of the right hand panel to grey (you will need to set the background on the toggle sheet to the same shade of grey too).

And hopefully that’s it. My published version is here.

Happy vizzin’! Stay Safe!

Donna

Can you build a Fancy Text Table?

Ann Jackson provided this week’s challenge, to deliver a text table using only Measure Names & Measure Values. I thought with Ann’s introduction that “This challenge should be straightforward for users of all levels” that this would be relatively straightforward, but I have to confess there were moments that I struggled with this. I knew the fundamentals that I’d need to complete this; that all the columns except the first were going to need to be numbers (ie measures), that I’d have to use custom formatting to display the number in the required format (a shape, a date, a word), and that I’d need to use the ‘legends per measure’ functionality to colour each column independently of each other. But determining the best/worst date to display proved to be a bit tricksy! I got there in the end, but there was a fair bit of trial and error.

Custom Formatting

I’m going to step through the build of this, as I think that’s probably the easiest way to describe this challenge. But before I do, one of the core fundamentals to this is knowing about how numbers can be custom formatted. By that I mean when you right-click on a measure -> default properties -> number format -> custom

This box allows you to type in, but you need to know what format/syntax to use. If you set the formatting via one of the other options, then look at the Custom option, it’ll have an entry that will give you a starting point. The above is the format for a number set to 1 decimal place, and shows that negative numbers will be prefixed by a minus sign (-). If we wanted to always show a plus sign in front of a positive number, we can edit this custom formatting to +#,##0.0;-#,##0.0.

The first entry to the left of the semi colon (;) indicates what’s applied to positive numbers. The next entry, to the right of the semi colon, indicates what’s applied to a negative number.

With this knowledge, you can apply more ‘creative’ custom formatting to any numeric measure that contains positive and negative numbers. For example if you want to show a ☑ or a ☒ depending on a ‘yes/no’ or ‘true/false’ concept, then we can create a version of the field as a number along the lines of

Field as Number

If [Field] = ‘XXXX’ THEN 1 ELSE -1 END.

We can than custom format this field by entering ☑;☒ into the text box

The field can still be treated as a measure, since the underlying value is still a number (in this case +/- 1), it’s just displayed differently.

Building the measures

So now we’ve covered how this ‘sneaky formatting’ is working, we’ll get on with the overall build.

The data just needs data from 2019 & 2020, so I chose to set a data source filter to restrict to just these two years.

But, I wanted the rest of the challenge to derive the current year instead of hardcoding, so I created fields

Current Year

YEAR({MAX([Order Date])})

Last Year

[Current Year] – 1

From these, I could then use LoDs to create

CY SALES

IF YEAR([Order Date]) = [Current Year] THEN [Sales] END

This is formatted to $ with 0 dp

and then

LY SALES

IF YEAR([Order Date]) = [Last Year] THEN [Sales] END

again formatted to $ with 0 dp.

We then need an indicator which is ‘true’ if CY SALES is greater than LY SALES, but as discussed above, we need this to be a ‘measure’, which we can custom format.

CY vs LY

IF SUM([CY SALES])>SUM([LY SALES]) THEN 1 ELSE -1 END

Custom format this as ✅;❌ (just copy these symbols from this page… they’ll look black and white in the dialog) – check out this page, to lift the images/other symbols from.

The actual difference identified by △ (again just copy and paste this symbol into the field name) is simply

SUM([CY SALES]) – SUM([LY SALES])

formatted to $ with 0 dp. Once you’ve done this using the Currency(Custom) option, then go to the Custom option and add + to the front of the string :

+”$”#,##0;-“$”#,##0

Next up is the percentage difference

% DIFF
[△]/SUM([LY SALES])

again format this first to a Percentage at 1 dp, then edit the Custom format to +0.0%;-0.0%

Now we’re getting to the slightly more complex part of the challenge – to identify the best and worst day in the month. We’ll start with the best day. We’re using FIXED LoDs throughout this, and while it’s probably possible to do in a single calculation, we’ll use multiple calcs to build up the components.

Order Month

DATENAME(‘month’,[Order Date])

This is the one dimension that’s going to be used in the final output, and simply outputs the month name (January, February etc).

In the data set, there can be multiple sales (ie orders) in a single day. We want to identify the total sales in 2020 (ie the current year) for each order date.

Sales Per Day

{FIXED [Order Date] : SUM([CY SALES])}

Now we’ve got the total sales per day, we want to identify the value of the maximum daily sales in each month

Max CY Sales Per Month

{FIXED [Order Month]: MAX([Sales Per Day])}

Now we need to identify the date in the month that the max daily sales ocurred

BEST DAY

INT({FIXED [Order Month]: MAX(IF [Sales Per Day] = [Max CY Sales Per Month] THEN ([Order Date]) END)}) + 2

WOAH! WHAT??? Let’s try to break this down…

IF [Sales Per Day] = [Max CY Sales Per Month] THEN ([Order Date]) END

If the daily sales value is the maximum daily sales in the month, then return the associated Order Date. But we need to get a date per month, so we’ve wrapped this in a FIXED LoD, for each Order Month. LoDs require the value to be aggregated, so the IF statement gets wrapped in a MAX statement (note MIN would work just as well).

{FIXED [Order Month]: MAX(IF [Sales Per Day] = [Max CY Sales Per Month] THEN ([Order Date]) END)}

Finally, due to the nature of this challenge, that requires we only work with Measure Names & Measure Values, we will convert this date field to a number using the INT function.

The intention here, is that we can then use the Custom formatting option once again, to set the number as a date format – I chose dd mmm yyyy (ie 01 Jan 2020 format, as I feel its less confusing that working out whether the date is in UK or US format).

However, by a very weird circumstance, converting a date to an INT then formatting as a date, will give you a date 2 days out from the one you converted. I don’t understand why, and it left me scratching my head for some time. I had to sense check with a fellow #WOWer who had the same, and checking Ann’s solution, she also was handling the oddity, which is the reason for the +2 on the calculation.

We just create similar fields for identifying the worst day

Min CY Sales Per Month

{FIXED [Order Month]: MIN([Sales Per Day])}

WORST DAY

INT({FIXED [Order Month]: MAX(IF [Sales Per Day] = [Min CY Sales Per Month] THEN ([Order Date]) END)}) + 2

format this to dd mmm yyyy

The final measure we need is based on determining the rank of the CY SALES per month. Ie if we ordered the months based on CY SALES descending, the top 6 would be marked as ‘Top’ and the rest as ‘Bottom’.

RANK

IF RANK(SUM([CY SALES])) <= 6 THEN 1 ELSE -1 END

We can then custom format this to “Top”;”Bottom”

Formatting the Table

Create a text table by

  • Order Month on Rows
  • Measure Names on Columns
  • Measure Values on Text
  • Measure Names on Filter, filtered to just the relevant measures

Add Measure Values to the Colour shelf, and select the Use Separate Legends option to display multiple diverging colour legend controls.

It’s now a case of going through each measure and editing the colour palette, and other settings. Some of this was again a bit of trial and error for me – I chose options that worked.

For the black text fields (CY SALES, LY SALES, %DIFF), choose a diverging colour palette, then click on the coloured squares at each end and select black from the colour picker. Select Stepped Colour and reduce the steps to 2.

Apply the same concept to the BEST DAY and WORST DAY legends, but select the appropriate green or red colour instead.

For the remaining fields, select a diverging colour palette, select the appropriate red at one end, and green at the other, reduce the steps to 2

And subject to some other formatting tweaks (increase font sizes, centre text), this is enough to complete the challenge. My final published viz is here. Note, the published viz does have slight differences to what I’ve blogged… as with many things, you sometimes realise things can be simpler when you try again.

Happy vizzin’! Stay Safe!

Donna

Can you build a Customer Lifetime Value Matrix?

Ann Jackson challenged us this week to build this matrix depicting the average worth of customer cohorts during their lifetime.

This challenge involves a mix of LoDs (Level of Detail calculations) and table calculations.

First up , we need to define our customer cohorts (ie group the customers), which for this challenge is based on identifying the quarter they placed their first order in. This will involve an LoD calculation. For a good introduction to LoDs with some worked examples (including a similar cohort analysis example), check out this Tableau blog post.

ACQUISITION QUARTER

DATE(DATETRUNC(‘quarter’,{FIXED [Customer ID] : MIN([Order Date])}))

The 2nd part of the formula in the { … } returns the earliest Order Date associated to the Customer ID, which is then truncated to the 1st day of the quarter that date falls in ie 23 Feb 2019 is truncated to 01 Jan 2019.

For the ‘quarters since birth’ field, we need to calculate the difference in quarters, between the ACQUISITION QUARTER and the ‘quarter’ associated to the Order Date of each order in the dataset.

QUARTERS SINCE BIRTH

DATEDIFF(‘quarter’,[ACQUISITION QUARTER],DATETRUNC(‘quarter’,[Order Date]))

Drag this field into the ‘dimensions’ area of the left hand data pane (above the line if you’re using later versions of Tableau).

Lets sense check what this looks like, by adding

  • ACQUISITION QUARTER to Rows (Discrete, Exact Date)
  • ORDER DATE to Rows, set to Quarter (quarter year ie May 2015 format which will make a green pill), then set to discrete to change to blue
  • QUARTERS SINCE BIRTH to Rows

You can see that while the first row against each cohort starts with a different quarter, the QUARTERS SINCE BIRTH always starts at 0 and counts sequentially down the table.

Next we want to count the number of distinct customers in each cohort, and we’ll use another LOD for this.

CUSTOMERS

{FIXED [ACQUISITION QUARTER]: COUNTD([Customer ID])}

Once again move this field into the Dimensions section of the data pane.

Add this onto the Rows of the above data table, and you should get every row for the same cohort displaying the same number

Add Sales onto Text to get the value of sales made by the customer in each cohort in each quarter. The ‘customer lifetime value’ we need is defined as the total sales so far / number of customers in the cohort.

Remove the QUARTER(Order Date) field from the table, as we’re not going to need this for the display, and it’ll affect the next steps if it’s left.

To get the cumulative sales, we need a Running Total Quick Table Calculation. Click on the Sales pill on the Text shelf and select Quick Table Calculation -> Running Total. The click again and Compute By -> QUARTERS SINCE BIRTH. Add Sales back into the table, so you can see the quarterly Sales value and how it’s cumulating until it reaches the next cohort.

We’ve now got the building blocks we need for the CLTV value we need to plot

Avg Lifetime Value

RUNNING_SUM(SUM([Sales])) / SUM([CUSTOMERS])

Note – I purposefully haven’t called this field what you might expect, as I’m going to ‘fill in the gaps’ that Ann describes in the requirements, and I’ll use that name then.

Pop this field into the table above, again setting the table calculation to compute by QUARTERS SINCE BIRTH

You can now use the data table above to validate the calculation is what you expected.

Now let’s build the viz out.

On a new sheet

  • QUARTERS SINCE BIRTH to Columns
  • ACQUISITION QUARTER (exact date, discrete blue pill) to Rows
  • Avg Lifetime Value to Text, setting the table calculation to Compute By QUARTERS SINCE BIRTH

From this basic text table, you can see the ‘blank’ fields, Ann mentioned. In the data table view, it’s not so obvious. The blank is there because there are no sales in those quarters for those cohorts. To fix we need another table calculation

CUSTOMER LIFETIME VALUE (CLTV)

IF ISNULL([Avg Lifetime Value]) AND NOT ISNULL(LOOKUP([Avg Lifetime Value],-1)) AND NOT ISNULL(LOOKUP([Avg Lifetime Value],1))

THEN LOOKUP([Avg Lifetime Value],-1) ELSE [Avg Lifetime Value] END

This says, if the Avg Lifetime Value field is NULL but neither the previous or the subsequent values are NULL, then use the Avg Lifetime Value value from the previous column (LOOKUP).

Replace the Avg Lifetime Value with the CUSTOMER LIFETIME VALUE (CLTV) field (setting the Compute By again), and the empty spaces have disappeared.

If you hover over the cells in the lower right hand side of the view, you’ll see tooltips showing, indicating that a mark has been drawn on the viz with Null data. To fix this, add CUSTOMER LIFETIME VALUE (CLTV) to the Filter shelf and specify non-null values only to show via the Special tab.

Now if you hover over that area you don’t get any tooltips displaying, as there aren’t any marks there.

Now it’s just a case of formatting the viz a bit more

  • Add CUSTOMERS to Rows
  • Add CUSTOMER LIFETIME VALUE (CLTV) to the Colour shelf by holding down the Ctrl key then clicking on the field that’s already on the Text shelf, and dragging via the mouse onto the Colour shelf. Using Ctrl in this way has the effect of copying the field including the table calculation settings, so you don’t need to apply them again. This will change the colour of the Text.
  • Then change the mark type to Square, which will then fill out the background based on the colour.
  • Then edit the colour legend to the relevant palette (which you may need to install via Ann’s link).
  • Set the border of the mark via the Colour shelf to white
  • Remove the row & column dividers
  • Set the row Axis Ruler to a dark black/grey line
  • Format the 1st 2 columns so the font is the same and centred. Widen the columns if required.
  • Update the tooltip

And then you should be ready to add the viz to your dashboard. My published version is here.

This blog is a bit more detailed that my recent posts, but I’m also conscious I’ve skipped over some bits that if you’re brand new to Tableau, you may not be sure how to do. Feel free to add comments if you need help!

Happy vizzin’! Stay Safe!

Donna