Can you quickly tell the net change between two dates?

Continuing with ‘Community Challenge’ month, it was the turn of Will Perkins to set the challenge for this week; a challenge inspired by Google’s stock tracker.

By interacting with the published solution and reading the requirements, I deduced that I was likely to need 3 sheets – 1 for the Region headings & KPIs, 1 for the trend line chart, and 1 to drive the timeframe selections. The trend line chart looked like it was going to involve a dual axis combining a line and and area chart, along with ‘filled’ reference bands, although exactly how it would work I wasn’t entirely sure initially. Finally, there was going to be some ‘parameter actions’ action along with the ‘true = false’ trick to ensure selected marks didn’t remain highlighted.

But before we can tackle the actual chart build, we need to nail some of the calculations involved.

Identifying the date range to highlight

The intention of the chart is that on initial load, it has highlighted the timeframe for the last 14 days up to ‘today’. As this chart is being built with a static data set, which only has data up to the end of 2021, I chose to ‘hardcode’ my ‘today’ value into a parameter. This is so that in a year’s time when I might look at this again, I won’t be presented with a broken looking viz.

pToday

Date parameter defaulted to 20 Sept 2021

The user also has the ability to highlight/select dates on the chart itself, which will define a start and end date range. So we also need some additional parameters to capture this information.

pStartRange

Date parameter defaulted to 01 Jan 1900

Similarly you’ll need a pEndRange parameter too, also defaulted to 01 Jan 1900.

Later on we’ll define parameter actions which will ‘set’ these values based on user interaction.

With these fields, we can then define calculated fields to store the start and end dates depending on whether we’re using the defaults due to initial load (ie 14 days to today), or a user selected range.

Selected Range Start Date

IF [pStartRange] = #1900-01-01# THEN DATE(DATEADD(‘day’,-14,[pToday]))
ELSE DATE([pStartRange])
END

Selected Range End Date

IF [pEndRange] = #1900-01-01# THEN DATE([pToday])
ELSE DATE([pEndRange])
END

We’re going to be plotting Order Date on our axis at the day level, and so to simplify things IMO, I created

Order Date Day

DATE(DATETRUNC(‘day’,[Order Date]))

which I then reference in the following calculated field, which is just to capture all the days within the range selected

Selected Dates To Plot

IF [Order Date Day]>= [Selected Range Start Date] AND [Order Date Day]<=[Selected Range End Date] THEN [Order Date Day] END

We can now start to build out the basic chart

Plotting Order Date Day and Selected Dates to Plot side by side you can see the date axis differ, with only the dates from 06 Sep – 20 Sep 21 displaying on the right hand side. The marks type for Selected Date to Plot is set to Area, and to get the marks to join up, you need to turn Stack Marks Off (Analysis -> Stack Marks -> Off menu).

Defining the Timeframe to Display

We’re going to use another parameter to store the timeframe value

pTimeframe

String parameter defaulted to 6 MONTHS (note the case – it’s simpler to match it to the display format that’s going to be used)

We then need a calculated field to tell us what to do with this value

Timeframe to Display

CASE [pTimeframe]
WHEN ‘1 MONTH’ THEN [Order Date Day]>=DATEADD(‘month’,-1,[pToday]) AND [Order Date Day]<= [pToday]

WHEN ‘6 MONTHS’ THEN [Order Date Day]>=DATEADD(‘month’, -6, [pToday]) AND [Order Date Day]<= [pToday]

WHEN ‘YTD’ THEN [Order Date Day]>=DATETRUNC(‘year’,[pToday]) AND [Order Date Day]<= [pToday] WHEN ‘1 YEAR’ THEN [Order Date Day]>= DATEADD(‘year’,-1,[pToday]) AND [Order Date Day]<= [pToday]
ELSE [Order Date Day] <= [pToday]
END

This field will return true for all the dates that fall within each statement and false otherwise.

Add this field to the Filter shelf and select True.

You can test how the left hand side of the chart is affected by manually typing the different values into the parameter

Colouring the chart

The line and area charts are coloured based on whether dates fall in the selected range and whether the difference between the sales values at the start and end of the selected range is positive or not. We need several more calculated fields to work this out.

We firstly need to capture the min and max dates of the selected area for each region. Now, you initially might think that the Selected Range Start Date and Selected Range End Date fields already have these values. However there isn’t always a sale in every region for these dates. You could argue, that in that case, the sales value for that date should be 0 (ie there were no sales on that day), but to match the solution (and it was easier), we just get the min and max dates within the selected range that have a sales value for each region.

Min Selected Date Per Region

{FIXED [Region]: MIN([Selected Dates to Plot])}

Max Selected Date Per Region

{FIXED [Region]: MAX([Selected Dates to Plot])}

Pop these out into a quick view, and you can see how the dates differ per region compared to the default start & end date values

Now we want to work out the sales value on these dates

Min Date Sales

{FIXED [Region]: SUM(IF [Order Date Day]=[Min Selected Date Per Region] THEN [Sales] END)}

Max Date Sales

{FIXED [Region]: SUM(IF [Order Date Day]=[Max Selected Date Per Region] THEN [Sales] END)}

and then we can work out the difference and the % difference

Range Sales Diff

SUM([Max Date Sales])-SUM([Min Date Sales])

custom formatted to +”$”#,##0.00;-“$”#,##0.00 to show a ‘+’ prefix for positive values

Range Sales % Diff

[Range Sales Diff]/SUM([Min Date Sales])

custom formatted to ▲0.0%;▼0.0%

Now we can compute a field to use to colour the line/area chart

Colour – Trend

IF MIN([Order Date Day]) >= [Selected Range Start Date] AND MIN([Order Date Day])<= [Selected Range End Date] THEN

IF [Range Sales Diff]>= 0 THEN 1 ELSE -1 END
ELSE 0
END

If we’re within the selected date range, then test to see if the value is positive (set to 1) or negative (set to -1), otherwise we’re outside the selected date range, so set to 0

Go back to the trend chart and add this field to the Colour shelf of the All Marks card (so it gets added to both sets of marks). Change it to be a discrete (blue) pill and the adjust the colours accordingly. At this point you may want to change the background colour if you’re using a white line. I’m just setting it to a light grey at this point, but eventually it’ll get set to black.

Adding the highlight band

This took a lot of thinking. I knew I’d need a reference band, but it took some time to figure out how to get the backgrounds coloured differently, since you only have the option to fill between the band with one colour.

The trick is to make use of the two date axes we have and to apply a band per pane.

But we need some more fields to make this happen.

Ref Line Start Date -ve

IF [Range Sales Diff]<0 THEN [Selected Range Start Date] END

Ref Line End Date -ve

IF [Range Sales Diff]<0 THEN [Selected Range End Date] END

Add these fields to the Detail shelf of the Order Date Day card and set to be continuous (green). Then add a reference band to this axis, applying the settings as below (note, the Line is a white dotted line, so isn’t showing up in the field setting, though you can see it on the viz).

Because the reference band has been set at the pane level, and the reference line dates are only relevant if the difference is negative, then the band is just showing on one row.

We then do something very similar, but this time we get some dates only if the difference is positive.

Ref Line Start Date +ve

IF [Range Sales Diff]>=0 THEN [Selected Range Start Date] END

Ref Line End Date +ve

IF [Range Sales Diff]>=0 THEN [Selected Range End Date] END

Add these as continuous pills on the Detail shelf of the Selected Dates to Plot card, and add another reference band to this axis instead.

Now you can set the chart to be a dual axis, synchronising the axes, and removing the Measure Names field from the All Marks card which will have automatically been added

This is the core viz, that will need further formatting before its ready to put on the dashboard – remove gridlines, borders etc, set background, remove headers. NOTE– You’ll need to manually re-sort the Regions before the field is hidden.

The KPI table

We need to build a ‘fake table’ for this, by putting Region on Rows and typing MIN(0) on Columns, then adding the Range Sales Diff and Range Sales % Diff fields to the Text shelf. We need an additional field to colour the text though.

Colour – KPI

[Range Sales Diff]>=0

Finally, I capitalised the Region values by using Aliases. This is a quick method when there aren’t many values, but otherwise I would usually create a field with UPPER([Region}).

Once again don’t forget to sort the Regions, and the apply relevant formatting.

The Timeframe Selector

Will states you can use a separate data source for this, so create the list in Excel

and then copy and paste (via the Data > Paste) menu into your workbook

On a new sheet add Date Range to Columns and Date Range to Text. The size and colour of the text differs based on which one has been selected. So create a field

Timeframe Selected

[Date Range] = [pTimeframe]

and add this field to both the Size and Colour shelves. You’ll need to adjust the settings, and hide headers, remove gridlines etc. Try to avoid touching the Text formatting directly, as you might find the Size then doesn’t adjust.

Adding the interactivity

You’ll need to use layout containers to organise all the objects on the dashboard. Then you can add the various dashboard parameter actions needed

Selecting the timeframe

Add a parameter action that on select passes the Date Range field into the pTimeframe parameter

Selecting the date range to highlight

You’ll need 2 parameter actions for this, one that passes the minimum Order Date Day selected into the pStartRange parameter, and the other that passes the maximum Order Date Day selected into the pEndRange parameter.

Deselecting the highlighted marks

By default when you click on a mark/select marks in Tableau, they are highlighted/selected and the other marks are faded, until you ‘click’ again. To stop this from happening I use a ‘true = false’ trick, that has become very common in #WOW challenges, and I’ve blogged many times before.

Create calculated fields

True

True

and

False

False

and add these to the Detail shelf of the All Marks card on the trend line chart.

Then on the dashboard add a Filter action that on select targets the sheet directly, mapping the true field to the false field. As this can never be ‘true’ the filter doesn’t apply, and the marks become unselected.

Repeat the same on the Timeframe Selector sheet.

Hopefully that’s covered all the core points. My published viz is here.

Happy vizzin’! Stay Safe!

Donna

How much change has occurred?

A relatively straightforward challenge was set by Luke this week, to visualise the difference in Sales between 2020 and 2021 in a slightly different format than what you might usually think of.

Start by filtering the data to just the years 2020 and 2021 (add Order Date to the Filter shelf and select specific years, or add a data source filter to limit the whole data set).

Add Sub-Category to Rows, and Sales to Columns, then add Order Date to Colour which by default will display as YEAR(Order Date). Colour the years appropriately.

Now unstack the marks (Analysis menu -> Stack Marks -> Off), and re-order the colour legend, so 2021 is listed first (this makes the 2021 bars sit ‘on top’ of 2020).

Adjust the size to make the bars thinner.

Now add another instance of Sales to the Columns shelf, and make the chart dual axis (synchronising the axis). Reset the mark type of the original SUM(Sales) marks card back to bar.

We need the circle mark for the 2021 Sales to be blue. To do this, duplicate the Order Date field, then add Order Date (copy) to the Colour shelf of the SUM(Sales)(2) marks card. This will show another colour legend, and you can set the colours accordingly. Add a white border around the circle marks.

To work out the % difference to display on the label, we need the following fields

2021 Sales

{FIXED [Sub-Category]: SUM(IF YEAR([Order Date])=2021 Then [Sales] END)}

This returns the value of the 2021 sales for each Sub-Category against all the years in the data set. Similarly we need

2020 Sales

{FIXED [Sub-Category]: SUM(IF YEAR([Order Date])=2020 Then [Sales] END)}

which means we can then create

% Difference

(SUM([2021 Sales])-SUM([2020 Sales]))/SUM([2020 Sales])

format this using custom formatting to display as +0%;-0%

Now we can add % Difference to the Label field of the Sum(Sales)(2) marks card.

You’ll notice you’ll have duplicate labels displayed. To resolve this, you need to adjust the label settings as below

To sort the rows, you need to sort the Sub-Category field by 2021 Sales descending

And finally to show the value of the 2021 Sales, add this field to the Rows shelf, and change to be discrete (blue pill).

All that’s left to do now is adjust the wording of the tooltips as you see fit, and format to remove gridlines, headers etc.

My published version is here.

Happy vizzin’! Stay Safe!

Donna

Can you find the needle in the haystack?

It was Candra’s turn to ‘set’ the #WOW2021 challenge this week providing a hint in the challenge description that the solution would involve sets.

As with many challenges, I built the data out in tabular format to start with to verify I had all the components and calculations correct. The areas of focus are

  • Identify number of distinct customers per product
  • Identify overall average number of distinct customers per product
  • Identify if product above or below average distinct customers
  • Identify Top 50 products by Sales
  • Identify Unprofitable Products
  • Identify products that are both in the top 50 AND unprofitable
  • Building the viz

Identify number of distinct customers per product

To start off, add Product Name, Sub-Category, Category to the Rows shelf to begin building out a table. Add Sales (formatted to $k 0dp) and Profit (formatted to $k 0dp with negative values as () ) to Text and sort by Sales descending.

To identify the distinct customers per product, we can create

Customer Count per Product

{FIXED [Product Name] : COUNTD([Customer ID])}

Add this to the view.

Identify overall average number of distinct customers per product

What we’re looking for here is the average of all the values we’ve got listed in the Customer Count per Product column. Ie we want to sum up those values displayed and divide by the number of rows.

The number of rows is equivalent to the number of products, which we can get from

Count Products

{FIXED : COUNTD([Product Name])}

And so to get the overall average we calculate

Avg Overall Customer Count

{FIXED: SUM([Customer Count Per Product])} / [Count Products]

Add these fields to the view as well, so you can see how the values work per row. The last two calculations give you the same value across all rows.

Identify if product above or below average distinct customers

Given the above display, this is just a case of comparing values in 2 columns

Higher than Avg Customer Count

AVG([Customer Count Per Product]) > SUM([Avg Overall Customer Count])

this returns true or false – add this to the view too.

Identify Top 50 products by Sales

We can create a set for this. Right click on Product Name > Create > Set. Name the set something suitable eg Top 50 Products, and on the Top tab, state the number (50) and the field (Sales) and the aggregation (Sum)

Add this to the view, and if you’ve sorted by the sales, you should find the top 50 rows are all In the set, and the rest are Out.

Identify Unprofitable Products

We can use another set for this. Again create a set off of Product Name, call it Unprofitable Products, and on the Condition tab, set the condition so that the Sum of Profit is less than 0

Add this onto the view too.

Identify products that are both in the top 50 AND unprofitable

For this, we’re explicitly looking for the rows that are both In the Top 50 Products set and In the Unprofitable Products set.

We can use the Combined Set functionality to do this.

In the left hand data pane, select both the Top 50 Products and the Unprofitable Products sets (hold down ctrl to multi select), then right click and Create Combined Set. I called the set Products to Include, and select to combine the sets by including Shared members in both sets

If you then add this field to the Filter shelf, you will be left with just the 13 Products that match

This is the single filter field you can use as per Candra’s requirements.

Building the viz

To get the text to display to the left of the bar, you actually need to create a ‘fake’ bar chart.

  • Add Products to Include to Filter
  • Add Product Name to Rows
  • On the Columns shelf, double click and type in MIN(1)
  • Add Sales to Columns to the right of MIN(1)
  • Sort by Sales descending

Against the MIN(1) marks card

  • Change the Size to small
  • Set the Opacity of the Colour to 0% and the border to None
  • Add Product Name, Sub-Category and Category to the Label shelf and adjust accordingly, aligning left
  • Increase the height of each row to make the text visible

On the Sales marks card

  • Add Higher than Avg Customer to the Colour shelf and adjust
  • Show mark labels
  • Create a new field Profit Ratio : SUM([Profit])/SUM([Sales]) Format to % with 0dp and add to Tooltip
  • Add Profit, and Customer Count by Product to Tooltip and adjust accordingly

Finally, uncheck Show Header against Product Name and MIN(1) and Sales and format the borders/gridlines etc. Add the title, then add to the dashboard.

All done (I hope…)! My published version is here.

Happy vizzin’! Stay Safe!

Donna

Can you visualise the Cholera Outbreak?

Lorna delivered an exciting map based challenge this week, to recreate one of the most famous visualisations ever created. If you’re part of the data viz community and never heard of John Snow and his cholera map, then I suggest you go and google now 🙂

While I love map based challenges, as its an area I don’t get to use regularly, they’re also the ones that can give me the most frustrations/take the longest to complete, as the functions and concepts I need to use don’t come as readily as with other challenges. I had to refer back to my own blogs on previous map based challenges (specifically this one and this one), to help me out. However these were written before map layers were introduced, so not everything was applicable.

  • Modelling the data
  • Building the Avg Distance to Pump bar chart
  • Building the Total Deaths within Avg Distance bar chart
  • Building Map
  • Adding the dashboard interactivity

Modelling the data

3 sets of data was provided

  • Pumps – 1 row per pump with its location
  • Deaths – 1 row per death and its location (can be multiple rows for the same location)
  • Deaths aggregated – 1 row per location where at least 1 death was recorded, including the count of deaths at that location

This all needed to be ‘combined’ so we can work across all 3 sets of data and compare the locations. I used relationship calculations to create a relationship of 1=1 between the related data sets, which means all the aggregated deaths are mapped to each pump, and all the individual deaths are mapped to each pump too.

Building the Avg Distance to Pump bar chart

I chose to start building the bar charts first so I could get better acquainted with how the data was working together, and I knew the maps would take more effort.

To start we need to figure out the distance from each pump to each death, so we need to get the location of each pump, and the location of each death

Pump Location

MAKEPOINT([Pump Lat],[Pump Lon])

Individual Death Location

MAKEPOINT([Death Lat],[Death Lon])

These will both create calculated fields of a geographic data type as indicated by the globe icon to the left of the field.

And with these fields, we can then work out the distance between them in metres

Distance Pump to Death

DISTANCE([Pump Location], [Individual Death Location],’m’)

Now we want the average distance per pump

Avg Distance

{FIXED [Pump ID]: AVG([Distance Pump to Death])}

I formatted this to 0 dp and added a suffix of ‘m’

And with this, we can build the first bar chart – Pump ID on Rows and Avg Distance on Columns. You need to apply formatting

  • Change the font – I set the font to Times New Roman at the Workbook level, to make life easier (Format menu > Workbook).
  • Set the worksheet background colour. I used #f5f1f0
  • Add a black border to each bar (via the Colour shelf)
  • Show labels on bars (via Label shelf)
  • Hide Avg Distance axis (uncheck show header on the Avg Distance pill)
  • Remove all row & column borders
  • Set the Axes Ruler on Rows to black
  • Set the Tooltips

A parameter is needed to store the selected pump

pSelectedPump – an integer defaulted to 1

With this, we can then create a field to indicate which pump is selected

Is Selected Pump?

[Pump ID]=[pSelectedPump]

Add this field to the Colour shelf and set the True option to black and the false option to match the background colour you used.

Building the Total Deaths within Avg Distance bar chart

We now need to identify if the death occurred within the average distance that has been calculated.

Number Deaths within Avg Distance

SUM(INT([Distance Pump to Death] <= [Avg Distance]))

This is a bit of short hand instead of writing IF [Distance Pump to Death] <= [Avg Distance] THEN 1 ELSE 0, and summing up the result. It relies on the the fact that the equation [Distance Pump to Death] <= [Avg Distance] returns a boolean of true or false, which can be converted to an integer as true =1 and 0 = false.

At a row level, the field just returns 1 or 0, but just plot this field against Pump ID, you get the desired count

And the easiest way to build this bar chart, is to duplicate the one created above (on the worksheet tab, right click & duplicate sheet), then replace the Avg Distance pill with the Number Deaths within Avg Distance pill (simply drag the latter from the data pane and drop directly onto the former in the columns shelf). Then adjust tooltips etc to suit. Doing it this way preserves (most of) the formatting.

Building the Map

We’re going to use 4 layers in creating the map, and I’m documenting from the bottom upwards (although this isn’t the order in which I actually built the viz in practice – I shuffled things round afterwards).

We’ll start with the density layer – the blue coloured area on the map.

On a new sheet, add Individual Death Location and Death ID to the Detail shelf. This should automatically create a map view with the auto generated lat & long fields. Then

  • Change the mark type to Density
  • Change the colour to Density Blue-Teal Light
  • Alter the Intensity to 75%, the Opacity to 60%
  • Increase the Size to maximum
  • Delete the info in the tooltip

We need to set the background map

Via the Map > Map Layers menu option, set the Style to Normal and uncheck all options and set Streets, Highways, Routes

Then set the worksheet background to the colour we’ve used before

Onto Layer 2 – the pump buffer circle.

We’ve already identified the selected pump, but now we need to get it’s location, and for that we need the lat and long of the selected pump.

Selected Pump Lat

{FIXED: MIN(IIF([Is Selected Pump?],[Pump Lat],NULL))}

This essentially gets the latitude of the selected pump and sets that value across all the rows in the data.

Selected Pump Long

{FIXED: MIN(IIF([Is Selected Pump?],[Pump Lon],NULL))}

And with these we can get

Selected Pump Location

MAKEPOINT([Selected Pump Lat],[Selected Pump Long])

We also need to get the average distance of the selected pump spread across every row

Selected Pump Avg Distance

{FIXED :SUM( IF [Is Selected Pump?] THEN [Avg Distance] END)}

Now we have this, we can build the buffer field

Pump Buffer

BUFFER([Selected Pump Location],[Selected Pump Avg Distance],’m’)

Drag this Pump Buffer field onto the map and drop it onto the ‘Add a Marks Layer’ section that should appear. This will create an additional marks card.

Set the colour to the same background colour previously used, and drop the opacity to around 5%. Add Pump ID to Detail and Selected Pump Avg Distance to the Tooltip shelf. Adjust tooltip to match.

Onto Layer 3 – the pumps

Drag Pump Location onto the map and Add a Marks Layer.

Add Pump ID to Text and Location to Tooltip to the relevant marks card. Change mark type to circle, increase the size and change colour to red and add a dark grey border. Adjust text to be centred, and set the tooltips accordingly.

Onto the final layer now, the aggregated death locations. We need a further location field

Death Location

MAKEPOINT([DeathLat (deaths aggregate.csv)], [DeathLon (deaths aggregate.csv)])

Drag this onto the map to Add a Marks Layer then on the subsequent marks card add Loc ID to the Detail shelf. Change the mark type to circle, and add Deaths to the Size shelf – adjust accordingly. Change the colour to black with a white border. Adjust tooltip.

Adding the dashboard interactivity

Add the 3 sheets onto a new dashboard sheet. Create a new dashboard action to Change Parameter, that runs on select of any of the 3 charts, and targets the pSelectedPump parameter by setting it with the value from the Pump ID field.

Now if you select a pump on the map, or a bar on either of the bar charts, all the charts will reflect the selection made.

The other interactvity you may notice, is that on click, of the bars, or the pumps, the other data ‘fades’ (or the selected data is highlighted). We don’t want this.

To resolve, create a calculated field

True

True

and another

False

False

Add both these fields to the Detail shelf on both bar chart sheets and to the Detail shelf on the Pumps marks card.

Then on the dashboard, create a dashboard filter action which filters on selected fields setting True = False (see below). You’ll need to create one of these actions for each of the 3 sheets.

And hopefully, you now have a working viz. My published version is here.

Happy vizzin’! Stay Safe!

Donna

Generation Population

For week 17 of #WOW2021, Sean Miller decided to challenge us with recreating a chart by Nathan Yau (see here for the original). The aim was to recreate within 1 sheet, which I managed to do. So how did I do it? Read on 🙂

  • The groundwork
  • Colouring the bars
  • Adding the year labels
  • Final formatting

The groundwork

The chart itself follows a straight forward structure of multiple blue dimension fields on Columns with a green measure on Rows similar to this view below based on Superstore Sales data – it’s just formatted a bit more creatively!

The data provided just contains 3 fields : SEX, AGE and 2019 Population. We want to present the 2019 Population measure for the Total SEX only by Year. We need to create the Year field, which is simply

Year

2019-[AGE]

I dragged this into the ‘dimensions’ section of the data pane (above the line).

This allows us to create the basic bar chart required

We now need to define the various fields that we will need to add as additional dimensions on the Columns shelf to create the ‘generation’ data panes.

Generation

IF [Year]<= 1927 THEN ‘Greatest
Generation’
ELSEIF [Year]<= 1945 THEN ‘Silent Generation’
ELSEIF [Year]<=1964 THEN ‘Baby Boomer’
ELSEIF [Year] <= 1980 THEN ‘Generation X’
ELSEIF [Year]<=1996 THEN ‘Millennials’
ELSEIF [Year]<=2012 THEN ‘Generation Z’
ELSE ‘Gen
Alpha’
END

NOTE – there is a deliberate carriage return in the condition for ‘Greatest Generation’ and ‘Gen Alpha’ which will force the field to ‘wrap’ when displayed.

Having defined the above, we need to determine

Total Population Per Generation

{FIXED [Generation], [SEX]: SUM([2019 Population])}

and then

% of Total Population

SUM([Total Population Per Generation]) / TOTAL(SUM([Total Population Per Generation]))

NOTE – to create this field, I originally created a ‘quick table calculation’ against the Total Population Per Generation field which I’d displayed on a view, and then dragged the resulting pill into the measures pane to create the new field with the desired calc.

Let’s put these in a table, so we can then check the values, and see that the 2nd and 3rd columns are the same value for each row associated to a particular generation, which is what we need.

Right, so now we need to determine the rank based on the Total Population Per Generation

Rank

RANK_DENSE(SUM([Total Population Per Generation]))

Format this to a custom number with 0 decimal places, but prefixed with #

When added to the table we get

The intention, is that Rank will be displayed as discrete ‘header’ pill rather than a measure, so let’s move Rank to be the 1st pill on the Rows shelf and change to be discrete.

But we need the Total Population Per Generation and % of Total Population fields to be combined into a single pill. So we need to do a bit of string manipulation/ number formatting for this

Total | Percent

STR(ROUND(SUM([Total Population Per Generation])/1000000,1)) + ‘M’ + ‘ | ‘ + STR(ROUND([% of Total Population] * 100,1)) +’%’

This looks complicated, but its because even though you may have applied the relevant display number formatting against the individual numeric measures of Total Population Per Generation and % of Total Population, the formatting is not preserved, when converted into a string field, which this field needs to be. So the relevant calculations need to be applied within the field itself.

This outputs the below

Now we need a way to sort the data so the ‘Greatest Generation’ associated to the earliest years is listed first. I did this by determining the minimum date within each Generation.

Min Year Per Generation

{FIXED [Generation], [SEX]: MIN([Year])}

Add this into the view as the first pill in Rows, and the data should automatically sort from lowest to highest

We can now build the viz – duplicate the table sheet, remove Total Population Per Generation and % of Total Population from the Measure Values section. Drag 2019 Population to Columns, then click the swap rows & columns button :

Colouring the bars

The bars are coloured based on each ‘Generation’ pane. You could hardcode this along the lines of ‘Generation = x OR Generation = y or Generation = z etc’ where x, y and z etc are generations of the same colour. This would return true or false, which you can then add to the colour shelf and adjust accordingly.

I decided to be a bit more dynamic, deciding I wanted to set the colour based on whether it was an odd or even pane.

For this I created another ‘rank’ field based on the field I’d used to ‘sort’ the data, the Min Year Per Generation field.

Sort Position

RANK_DENSE(MIN([Min Year per Generation]),’asc’)

If you add this into the data table, you’ll see each section is numbered 1 -7

From this, we can then determine if the number is even (or not)

Sort Position is even number

[Sort Position]%2=0

Add this onto the Colour shelf which will return True or False and colour accordingly.

Adding the year labels

The year labels are achieved by using a dual axis chart, to plot a point for each specific year (based on the Min Year Per Generation field) at some arbitrary value.

Point to Plot Year Label

IF [Year]=[Min Year per Generation] AND [Year]<>1919 THEN 4700000 END

For each ‘min year’ that isn’t 1919, plot a value at 4.7M.

Add this field to the Rows shelf, change mark type to circle, reduce size to as small as it can, and set the colour transparency to 0.

Add Min Year Per Generation to the Label shelf, then change the alignment to vertical.

Now you make the chart dual axis and synchronise the axis. Some of the colours/marks may change, so reset by removing Measure Names from the Colour shelf and changing the mark types bar to bar & circle.

Final formatting

So at this point all the main components are there. It’s now a case of formatting – removing right and bottom axes, removing gridlines. The vertical dashed lines, are column dividers, set at the pane level only.

The solid left hand axis is set via

The text is formatted using the fonts advised in the requirements and sizes adjusted to suit.

Add on a tooltip, and set the background colour of the worksheet and you should be done.

My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you use Quick LoDs to recreate this view?

It was Lorna’s turn to set the challenge this week, and she took the opportunity to ask us to use a new feature in Tableau v2021.1 – Quick LoDs (you’ll obviously need v2021.1 to use the functionality, but the LoDs can be created manually in earlier versions if need be).

This blog will focus on

  • Creating the Sub-Category Average Sales by Category LoD using Quick LoDs
  • Formatting the difference calculation
  • Colouring the bars
  • Text for Tooltip
  • Putting it all together

The viz itself isn’t that complex once you’ve nailed the LoD, so lets’ start with that bit.

Creating the Sub-Category Average Sales by Category LoD using Quick LoDs

The what…? It’s a bit of a mouthful,.. the “Sub-Category Average Sales by Category”. What we’re essentially after here is the Total Sales per Category / No of Sub-Categories in the Category. To do this with LoDs, we first need to create an LoD to represent the total sales in each Sub-Category.

As mentioned in the requirements / referenced in the KB article above, you use a combination of ctrl/command click & drag to create an LoD via the Quick LoD feature. In this case I dragged the Sales measure onto the Sub-Category dimension, and this automatically created

Sales (Sub-Category)

{ FIXED [Sub-Category]: SUM([Sales]) }

which is the sales per sub-category.

From this, I then dragged this measure onto the Category dimension, which automatically then created

Sales (Sub-Category) (Category)

{ FIXED [Category]: SUM([Sales (Sub-Category)]) }

BUT I then edited this to change the aggregation to AVG, so the field became

{ FIXED [Category]: AVG([Sales (Sub-Category)]) }

This gives the average value required, which you can see is the same across the rows in a single Category :

Formatting the difference calculation

The viz displays the % difference between the Sub-Category sales and the average. This is calculated with

Difference

(SUM([Sales])- SUM([Sales (Sub-Category) (Category)])) / SUM([Sales (Sub-Category) (Category)])

This is then custom formatted as ▲0%;▼0% (I use this site to get my shapes from).

Colouring the bars

The bars need to be coloured based on the value of the Difference field, so another calculated field is required

Colour

[Difference]>0

This will just return true or false and can dropped on the Colour shelf of the bars.

Text for Tooltip

Within the tooltip, the % difference is displayed, along with some text which indicates if it is above or below the average. For this we need another calculated field to reference in the Tooltip.

Above | Below

IF [Difference]>0 THEN ‘Above’ ELSE ‘Below’ END

Putting it all together

With all the calculated fields built, the the chart itself is a relatively simple dual axis chart

  • Add Category then Sub-Category to Rows
  • Add Sales to Columns and sort descending
  • Add Sales (Sub-Category) (Category) to Columns. Make dual axis and synchronise axis
  • Click on the All marks card and remove Measure Names from the Colour shelf.
  • Click on the Sales marks card and change mark type to bar; add the Colour field to the Colour shelf and adjust accordingly; add Difference to the Label shelf and format appropriately.
  • Click on the Sales (Sub-Category) (Category) marks card and change mark type to Gantt; adjust the Size to be as large as possible; set the colour to the relevant grey.
  • On the All marks card, add the Above | Below field to the Tooltip shelf, then edit the Tooltip on the All marks card to create the required text.
  • Finally, remove axis, remove the column heading labels, remove grid lines and column borders and format the displayed text appropriately. Title the viz.

A very brief post today, but hopefully I’ve ticked off all the required elements, My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Tableau Website Analytics

This week’s #WOW2021 challenge by Ann Jackson is focussed on dashboard design/layout, reference lines and formatting time, so that will be the focus of this blog too. I’m hoping it might be a fairly short post this week 🙂

  • Filtering the data
  • Creating the key measures, formatting time, adding average lines
  • Dashboard layout

Filtering the data

The data Ann provided is based on the Google Analytics data the team have harnessed related to the activity on the #WOW website. It’s a static data set, which contains data from 29 Dec 2019 up to 12 April 2021, but Ann states her solution only includes data up to 10 April 2021. I therefore added the Date field to the Filter shelf on the sheet and set it to end on 10 April 2021

Knowing I’d be building multiple sheets for this challenge, I set this filter to be a global filter by setting it to apply to worksheets – > all using this data source

Creating the key measures, formatting time, adding average lines

For this challenge, we’ll be creating a sheet for each BAN, and a sheet for each trend line depicting a measure by week, so 8 sheets in total.

For the Sessions measure, both the BAN and the trend chart are straightforward.

The BAN just shows SUM([Sessions]) on the Text shelf, appropriately formatted. The other BAN sheets are pretty much the same, but just show the appropriate measure in the appropriate colour.

The trend line displays Week([Date]) by SUM([Sessions]), with the average line added by dragging Average Line from the Analytics pane onto the chart and then formatting.

If the numbers don’t quite match up, it may be because your week is set to start on a different day. By default as I’m UK based, dates are set to start their week on a Monday. For this challenge to match Ann’s solution, your week needs to start on a Sunday. You can set this by right clicking on the data source itself and changing via the Date Properties option

To determine the average session duration, we need to build a calculated field, that then needs to be formatted to show minutes and seconds.

Average Session Duration

(SUM([Session Duration]) / SUM([Sessions]))/86400

Session Duration / Sessions will return a value in seconds. To be able to format this in the way required, we need to get the number of seconds as a proportion of a day. There are 86400 seconds in a day (60 * 60 * 24), so we divide by this.

We can then use a custom format on this field and use the nn:ss notation. NOTE not mm:ss. If you needed to format this as hours:minutes:seconds, the format would be hh:mm:ss, but mm:ss does not provide you with the right values. This video demonstrates all this, if you’re interested.

When it comes to building the trend line for this measure, the average line, can’t be added as simply in the way it could for the Sessions trend chart (at least I couldn’t get it to work that way…).

We need to build a calculated field that will show the same overall average value alongside the weekly averages. This value needs to match what’s displayed in the BAN chart.

Overall Session Duration mm:ss

{FIXED : [Avg Session Duration]}

This is returning the Avg Session Duration based on all the rows in the data. As the Date field has been added as a global filter, it is acting like a data source filter, so the dates we don’t want have been excluded from the data set that the FIXED LoD is being applied against. If the Date filter was a simple ‘quick filter’, this calculation wouldn’t work, as the data for the 11th & 12th April would also be included in the calculation.

Format this to nn:ss as well. Add this field to the Detail shelf of the trend chart, then right click on the Average Session Duration axis and Add Reference Line, and reference the Overall Session Duration mm:ss field.

For the bounce rate, we simply need

Bounce Rate

SUM([Bounces])/SUM([Sessions])

which is formatted to a percentage of 1 dp.

When building the trend line, I added the average line from the analytics pane, but that gave me a different value from my BAN. So I built

Overall Bounce Rate

{FIXED : [Bounce Rate]}

This was formatted to % 1dp, and added as a reference line as described above.

Finally the last measure, we need

Avg Time on Page

(SUM([Time on Page])/(SUM([Pageviews])- SUM([Exits])))/86400

formatted to nn:ss

and

Overall Time on Page mm:ss

{FIXED : [Avg Time on Page]}

and again formatted to nn:ss.

The charts for these are built exactly like the Average Session Duration.

Dashboard Layout

The easiest way to describe the layout I built is to show it 🙂 Note the Item Hierarchy on the left hand side of the image below.

I have a vertical container as the Main page.

The first row in this container is the title in a Text object.

The second row is a Blank object and is the yellow line. The background of the blank object is set to the relevant yellow, the outer padding is set to 0 all round, and then the height is set to 4. This gives the appearance of a thick coloured line.

The third row is another vertical container, and I’ve done this, so I can ultimately use the option to Distribute Contents Evenly on the container to ensure the horizontal container ‘rows’, which I’ll be adding into this container, are evenly spaced.

So ‘within’ the 3rd row, the 4th-7th rows are managed using a horizontal container, which in turn contains a blank object (the coloured vertical line), the BAN sheet and the trend sheet. Around each horizontal container I set the outer padding to 10 all round to give some spacing. The blank object in each ‘row’ is given the relevant background colour and set to a width of 15.

Finally, I finished off with an additional horizontal container at the bottom which is where I added my standard #WOW footer. Note this horizontal container is essentially the 4th row of the original Main vertical container though.

Hopefully I’ve provided enough for you to build the challenge yourself / resolve any issues you might have. If there’s anything I’ve missed, do please comment to let me know.

My published viz is here. Please note, that for some reason (and I don’t know why), Tableau Public does not seem to display my time axis properly (just shows 0). As the workbook renders on Public, the values on the axis show, but once fully loaded, they change. If you download the workbook, the values are fine. I published from Tableau Desktop 2021.1.0. I’m putting this down to an issue with Tableau Public.

Happy vizzin’! Stay Safe!

Donna

Can you recommend profitable return customer product bundles?

This week Candra Mcrae provided a market basket analysis challenge which had a bit of all sorts going on as well as the core ‘what else was bought with product x’ puzzle – parameter actions, sheet swapping, custom toggle buttons.

Hopefully I’ll cover all the key elements :

  • Identifying the 2nd customer order
  • Identifying the orders containing the selected sub-category
  • Identifying the ‘other’ sub-categories ordered
  • Building the bar chart
  • Building the tree map
  • Creating the toggle button
  • Controlling what viz to display
  • Creating the dashboard and interactivity

Identifying the 2nd customer order

The requirements state the analysis is based on returning customers, and specifically their 2nd order. So we need a way to identify this set of data.

In the superstore data set, a customer only ever places one order on a day, so we will make use of this feature, but this technique won’t always work in other situations – I personally have been known to place multiple orders with Amazon on the same day, especially when Christmas shopping!

If we look at the dates customers have ordered, we’re looking to identify the orders associated to the customer & dates highlighted below

We can identify whether the date is the first order date using a FIXED LoD calculation

Is Customer’s First Order?

{FIXED [Customer ID]: MIN([Order Date])} = [Order Date]

The FIXED LoD statement within the {..} is finding the minimum Order Date for each Customer ID. This is then being checked against each Order Date, and returning true when it matches.

So this is giving us the first record, but we want the second….

I solved this, by first adding Is Customer’s First Order? = False as a data source filter (right click on the Superstore data source -> edit data source filters)

This essentially removes all the rows associated to the first order for each customer from the data set, and consequently, the Is Customer’s First Order? is now reported as True against what was originally the 2nd order…
If you compare the image above to the one further up the page where I marked the 2nd orders, you should see the values match what now seems to be listed as the first!

This works due to Tableau’s order of operations; the data source filter is filtering the data ‘early on’, so the rows of data we’re now working against, has no knowledge of those rows. Subsequently the customer’s first order is now actually reporting the customer’s 2nd order as the minimum (aka first) order date.

We can now add Is Customer’s First Order? to the Filter shelf and set to True, to just work with this set of orders.

NOTEIf the data source has not yet been extracted, which will be required if you want to publish to Tableau Public, then you will need to do one of the following :

  1. At the point of extraction, the data source filter will automatically be applied as an extract filter. If you keep it, then after extracting, remove the data source filter if its still there, otherwise you’ll find you’re reporting against the 3rd customer order OR
  2. At the point of extraction, remove the extract filter that is automatically applied. After extraction, re-add the data source filter if it’s been removed.

An alternative to using any data source/extract filters is to use the following LoD (which I only got my head round after building out the above, so I didn’t put it in my solution., but works just as well).

Is Customer’s 2nd Order?

{FIXED [Customer ID]:MIN(IIF([Is Customer’s First Order]=False,[Order Date],NULL))} = [Order Date]

The IIF statement is getting the Order Date for all the records which aren’t the Customer’s 1st order. The FIXED LOD, is then returning the minimum/smallest of these dates, which will be the date of the 2nd order. This is then compared to each Order Date to identify the appropriate record.

If you strip off any data source/extract filters, you can see how this works, and Is Customer’s 2nd Order? can be added to the Filter shelf instead, set to True, to get the required rows.

Identifying the orders containing the selected sub-category

To start with we need a parameter to store the selected Sub-Category. Right click on Sub-Category and select Create -> Parameter to create a string parameter which lists the sub-categories. I named mine pSelected_SubCat and it’s defaulted to Accessories.

What we’re now trying to find is the rows relating to Order IDs that contain the stated Sub-Category, in this instance Accessories

So we first need to identify the rows which match

Customer Ordered Selected SubCat

IIF([pSelected_SubCat]=[Sub-Category],1,0)

Add this to the view, and you can see the rows associated to Accessories are marked with 1

But we don’t just want these rows; we also need the rest of the rows on the same order, so in the case of order CA-2018-131534 above, we also need the row associated to the Paper Sub-Category.

For this, I used Sets as per the technique described in the link Candra provided within the hint.

We want the set of Order IDs where the sum of the Customer Ordered Selected SubCat field is >=1. Right click on Order ID and Create -> Set. On the Condition tab, set the relevant properties.

Adding the Orders with Selected SubCat set into the view, and we can see it’s picked up the rows we care about, and we can now filter by this set

An alternative to using Sets, is to use another FIXED LoD calculation

{FIXED [Order ID]: SUM([Customer Ordered Selected SubCat])}>=1

For each Order ID, sum up the Customer Ordered Selected SubCat field, and if it’s >=1 then it will return True, and you can then filter by this instead.

Identifying the ‘other’ sub-categories ordered

We want to count the orders which contain each Sub-Category, which is simply

Count Orders

COUNTD([Order ID])

With a basic view that is filtered as required, that lists Sub-Category by Count Orders (and sorted by Count Orders descending), we get

And you’ll see that Accessories is listed at the top. Change the parameter, and that value will also be listed. But we don’t want to see this – we need to filter it out. For this, we can create

Other Items Ordered

If [pSelected_SubCat]=[Sub-Category] THEN ‘N/A’ ELSE [Sub-Category] END

Pop this into the view, and you can see the ‘N/A’ is listed against the selected Sub-Category.

This means we can add a further Filter using the Other Items Ordered field and set it to Exclude N/A.

Building the bar chart

So now we’ve got the 3 filters we need to identify the rows of data we need to consider, we can easily build a bar chart.

To colour the bars, we need to work out the profitability

Profit Ratio

SUM([Profit])/SUM([Sales])

Is Profitable?

[Profit Ratio]>0

Add Is Profitable? onto the Colour shelf and adjust accordingly

You just then need to format to remove gridlines/axes etc, and add the relevant fields (correctly formatted) to the Tooltip to create the required text.

Building the treemap

The simplest way to do this, is to duplicate the sheet you’ve built for the bar chart, then use the Show Me option (top right) and select TreeMap. This will put most of the pills in the right places. The only thing that needs to be shifted is to add Is Profitable? back onto the Colour shelf, and to explicitly set the Sort order against the Sub-Category field. Count Orders also needs adding to the Label shelf, so it can be displayed.

Creating the toggle control

To create the toggle control which will control the switching of the vizzes do the following :

  • In the Columns shelf, type in MIN(0)
  • Then next to it, type in MIN(1)
  • Drag the MIN(1) pill from the Columns shelf and drop onto the MIN(0) axis, when you see the ‘2 green columns’ appear
  • Remove Measure Names from the Rows shelf
  • Change the mark type to Line
  • Create a parameter called Toggle which is an integer containing the values 0 & 1, defaulted to 0. Show this parameter on the view.
  • Create a new calculated field, Selected Toggle which just references the Toggle parameter
  • Add Selected Toggle to the Columns shelf and change to aggregate to MIN rather than SUM
  • Make it dual axes, and synchronise the axes
  • Remove Measure Names from the Colour shelf on the All marks card.
  • On the Selected Toggle card, change the mark type to Circle, and increase the Size. If you change the Toggle parameter the circle should shift to 1.
  • To colour the line based on the position, create a new calculated field

Colour

IF [Toggle]=0 THEN ‘grey’ ELSE ‘green’ END

  • Add Colour to the Colour shelf of the All marks card. With the Toggle parameter set to 0, set the colour to grey. Change the parameter to 1, and set the colour to green.
  • To set the tooltip create a new calculated field

Toggle Label

IF [Toggle]=0 THEN ‘Bar’ ELSE ‘Treemap’ END

  • Add Toggle Label to the Tooltip shelf of the Selected Toggle marks card. Adjust the tooltip text to just show this value.
  • Remove the text from the Tooltip of the Measure Names marks card.
  • Hide the axes headers and remove the row & column dividers

Controlling what viz to display

With the Toggle parameter set to 0, add the Selected Toggle field to the Filter shelf of the bar chart. and set it to be 0 (or at most 0 depending what filter control displays)

Then set the Toggle parameter to 1, and add the Selected Toggle field to the Filter shelf of the Treemap chart. Set this to be 1 (or at least 1 depending on the filter control displayed)

You should now find that if you switch to the bar chart sheet, nothing is displayed. Change the Toggle parameter back to 0, and the bar chart will now show, but there will be nothing on the treemap chart.

Creating the dashboard and interactivity

Create a dashboard sheet.

Add a vertical layout container

Add the bar chart into the vertical layout container. Set the chart to fit entire view and don’t display the title.

Add the treemap chart beneath the bar chart in the same vertical container. Whilst nothing is actually displaying, still set the chart to fit entire view and again don’t display the title.

The Toggle parameter should have automatically displayed on the right hand side, so test changing the parameter to see how the chart switches.

Add the Toggle Control sheet into the right hand container too.

Add a dashboard action to change the Toggle parameter when the Toggle Control is interacted with. It should take effect on Select of the Toggle chart, affect the Toggle parameter and pass the Measure Values field into the parameter.

Clicking on each end of the Toggle chart should now make the viz change from bar to treemap.

You just now need to tidy up the dashboard – add a title, remove any unrequired objects, set the container background of the right hand panel to grey (you will need to set the background on the toggle sheet to the same shade of grey too).

And hopefully that’s it. My published version is here.

Happy vizzin’! Stay Safe!

Donna

Can you build a Fancy Text Table?

Ann Jackson provided this week’s challenge, to deliver a text table using only Measure Names & Measure Values. I thought with Ann’s introduction that “This challenge should be straightforward for users of all levels” that this would be relatively straightforward, but I have to confess there were moments that I struggled with this. I knew the fundamentals that I’d need to complete this; that all the columns except the first were going to need to be numbers (ie measures), that I’d have to use custom formatting to display the number in the required format (a shape, a date, a word), and that I’d need to use the ‘legends per measure’ functionality to colour each column independently of each other. But determining the best/worst date to display proved to be a bit tricksy! I got there in the end, but there was a fair bit of trial and error.

Custom Formatting

I’m going to step through the build of this, as I think that’s probably the easiest way to describe this challenge. But before I do, one of the core fundamentals to this is knowing about how numbers can be custom formatted. By that I mean when you right-click on a measure -> default properties -> number format -> custom

This box allows you to type in, but you need to know what format/syntax to use. If you set the formatting via one of the other options, then look at the Custom option, it’ll have an entry that will give you a starting point. The above is the format for a number set to 1 decimal place, and shows that negative numbers will be prefixed by a minus sign (-). If we wanted to always show a plus sign in front of a positive number, we can edit this custom formatting to +#,##0.0;-#,##0.0.

The first entry to the left of the semi colon (;) indicates what’s applied to positive numbers. The next entry, to the right of the semi colon, indicates what’s applied to a negative number.

With this knowledge, you can apply more ‘creative’ custom formatting to any numeric measure that contains positive and negative numbers. For example if you want to show a ☑ or a ☒ depending on a ‘yes/no’ or ‘true/false’ concept, then we can create a version of the field as a number along the lines of

Field as Number

If [Field] = ‘XXXX’ THEN 1 ELSE -1 END.

We can than custom format this field by entering ☑;☒ into the text box

The field can still be treated as a measure, since the underlying value is still a number (in this case +/- 1), it’s just displayed differently.

Building the measures

So now we’ve covered how this ‘sneaky formatting’ is working, we’ll get on with the overall build.

The data just needs data from 2019 & 2020, so I chose to set a data source filter to restrict to just these two years.

But, I wanted the rest of the challenge to derive the current year instead of hardcoding, so I created fields

Current Year

YEAR({MAX([Order Date])})

Last Year

[Current Year] – 1

From these, I could then use LoDs to create

CY SALES

IF YEAR([Order Date]) = [Current Year] THEN [Sales] END

This is formatted to $ with 0 dp

and then

LY SALES

IF YEAR([Order Date]) = [Last Year] THEN [Sales] END

again formatted to $ with 0 dp.

We then need an indicator which is ‘true’ if CY SALES is greater than LY SALES, but as discussed above, we need this to be a ‘measure’, which we can custom format.

CY vs LY

IF SUM([CY SALES])>SUM([LY SALES]) THEN 1 ELSE -1 END

Custom format this as ✅;❌ (just copy these symbols from this page… they’ll look black and white in the dialog) – check out this page, to lift the images/other symbols from.

The actual difference identified by △ (again just copy and paste this symbol into the field name) is simply

SUM([CY SALES]) – SUM([LY SALES])

formatted to $ with 0 dp. Once you’ve done this using the Currency(Custom) option, then go to the Custom option and add + to the front of the string :

+”$”#,##0;-“$”#,##0

Next up is the percentage difference

% DIFF
[△]/SUM([LY SALES])

again format this first to a Percentage at 1 dp, then edit the Custom format to +0.0%;-0.0%

Now we’re getting to the slightly more complex part of the challenge – to identify the best and worst day in the month. We’ll start with the best day. We’re using FIXED LoDs throughout this, and while it’s probably possible to do in a single calculation, we’ll use multiple calcs to build up the components.

Order Month

DATENAME(‘month’,[Order Date])

This is the one dimension that’s going to be used in the final output, and simply outputs the month name (January, February etc).

In the data set, there can be multiple sales (ie orders) in a single day. We want to identify the total sales in 2020 (ie the current year) for each order date.

Sales Per Day

{FIXED [Order Date] : SUM([CY SALES])}

Now we’ve got the total sales per day, we want to identify the value of the maximum daily sales in each month

Max CY Sales Per Month

{FIXED [Order Month]: MAX([Sales Per Day])}

Now we need to identify the date in the month that the max daily sales ocurred

BEST DAY

INT({FIXED [Order Month]: MAX(IF [Sales Per Day] = [Max CY Sales Per Month] THEN ([Order Date]) END)}) + 2

WOAH! WHAT??? Let’s try to break this down…

IF [Sales Per Day] = [Max CY Sales Per Month] THEN ([Order Date]) END

If the daily sales value is the maximum daily sales in the month, then return the associated Order Date. But we need to get a date per month, so we’ve wrapped this in a FIXED LoD, for each Order Month. LoDs require the value to be aggregated, so the IF statement gets wrapped in a MAX statement (note MIN would work just as well).

{FIXED [Order Month]: MAX(IF [Sales Per Day] = [Max CY Sales Per Month] THEN ([Order Date]) END)}

Finally, due to the nature of this challenge, that requires we only work with Measure Names & Measure Values, we will convert this date field to a number using the INT function.

The intention here, is that we can then use the Custom formatting option once again, to set the number as a date format – I chose dd mmm yyyy (ie 01 Jan 2020 format, as I feel its less confusing that working out whether the date is in UK or US format).

However, by a very weird circumstance, converting a date to an INT then formatting as a date, will give you a date 2 days out from the one you converted. I don’t understand why, and it left me scratching my head for some time. I had to sense check with a fellow #WOWer who had the same, and checking Ann’s solution, she also was handling the oddity, which is the reason for the +2 on the calculation.

We just create similar fields for identifying the worst day

Min CY Sales Per Month

{FIXED [Order Month]: MIN([Sales Per Day])}

WORST DAY

INT({FIXED [Order Month]: MAX(IF [Sales Per Day] = [Min CY Sales Per Month] THEN ([Order Date]) END)}) + 2

format this to dd mmm yyyy

The final measure we need is based on determining the rank of the CY SALES per month. Ie if we ordered the months based on CY SALES descending, the top 6 would be marked as ‘Top’ and the rest as ‘Bottom’.

RANK

IF RANK(SUM([CY SALES])) <= 6 THEN 1 ELSE -1 END

We can then custom format this to “Top”;”Bottom”

Formatting the Table

Create a text table by

  • Order Month on Rows
  • Measure Names on Columns
  • Measure Values on Text
  • Measure Names on Filter, filtered to just the relevant measures

Add Measure Values to the Colour shelf, and select the Use Separate Legends option to display multiple diverging colour legend controls.

It’s now a case of going through each measure and editing the colour palette, and other settings. Some of this was again a bit of trial and error for me – I chose options that worked.

For the black text fields (CY SALES, LY SALES, %DIFF), choose a diverging colour palette, then click on the coloured squares at each end and select black from the colour picker. Select Stepped Colour and reduce the steps to 2.

Apply the same concept to the BEST DAY and WORST DAY legends, but select the appropriate green or red colour instead.

For the remaining fields, select a diverging colour palette, select the appropriate red at one end, and green at the other, reduce the steps to 2

And subject to some other formatting tweaks (increase font sizes, centre text), this is enough to complete the challenge. My final published viz is here. Note, the published viz does have slight differences to what I’ve blogged… as with many things, you sometimes realise things can be simpler when you try again.

Happy vizzin’! Stay Safe!

Donna

Can you build a Customer Lifetime Value Matrix?

Ann Jackson challenged us this week to build this matrix depicting the average worth of customer cohorts during their lifetime.

This challenge involves a mix of LoDs (Level of Detail calculations) and table calculations.

First up , we need to define our customer cohorts (ie group the customers), which for this challenge is based on identifying the quarter they placed their first order in. This will involve an LoD calculation. For a good introduction to LoDs with some worked examples (including a similar cohort analysis example), check out this Tableau blog post.

ACQUISITION QUARTER

DATE(DATETRUNC(‘quarter’,{FIXED [Customer ID] : MIN([Order Date])}))

The 2nd part of the formula in the { … } returns the earliest Order Date associated to the Customer ID, which is then truncated to the 1st day of the quarter that date falls in ie 23 Feb 2019 is truncated to 01 Jan 2019.

For the ‘quarters since birth’ field, we need to calculate the difference in quarters, between the ACQUISITION QUARTER and the ‘quarter’ associated to the Order Date of each order in the dataset.

QUARTERS SINCE BIRTH

DATEDIFF(‘quarter’,[ACQUISITION QUARTER],DATETRUNC(‘quarter’,[Order Date]))

Drag this field into the ‘dimensions’ area of the left hand data pane (above the line if you’re using later versions of Tableau).

Lets sense check what this looks like, by adding

  • ACQUISITION QUARTER to Rows (Discrete, Exact Date)
  • ORDER DATE to Rows, set to Quarter (quarter year ie May 2015 format which will make a green pill), then set to discrete to change to blue
  • QUARTERS SINCE BIRTH to Rows

You can see that while the first row against each cohort starts with a different quarter, the QUARTERS SINCE BIRTH always starts at 0 and counts sequentially down the table.

Next we want to count the number of distinct customers in each cohort, and we’ll use another LOD for this.

CUSTOMERS

{FIXED [ACQUISITION QUARTER]: COUNTD([Customer ID])}

Once again move this field into the Dimensions section of the data pane.

Add this onto the Rows of the above data table, and you should get every row for the same cohort displaying the same number

Add Sales onto Text to get the value of sales made by the customer in each cohort in each quarter. The ‘customer lifetime value’ we need is defined as the total sales so far / number of customers in the cohort.

Remove the QUARTER(Order Date) field from the table, as we’re not going to need this for the display, and it’ll affect the next steps if it’s left.

To get the cumulative sales, we need a Running Total Quick Table Calculation. Click on the Sales pill on the Text shelf and select Quick Table Calculation -> Running Total. The click again and Compute By -> QUARTERS SINCE BIRTH. Add Sales back into the table, so you can see the quarterly Sales value and how it’s cumulating until it reaches the next cohort.

We’ve now got the building blocks we need for the CLTV value we need to plot

Avg Lifetime Value

RUNNING_SUM(SUM([Sales])) / SUM([CUSTOMERS])

Note – I purposefully haven’t called this field what you might expect, as I’m going to ‘fill in the gaps’ that Ann describes in the requirements, and I’ll use that name then.

Pop this field into the table above, again setting the table calculation to compute by QUARTERS SINCE BIRTH

You can now use the data table above to validate the calculation is what you expected.

Now let’s build the viz out.

On a new sheet

  • QUARTERS SINCE BIRTH to Columns
  • ACQUISITION QUARTER (exact date, discrete blue pill) to Rows
  • Avg Lifetime Value to Text, setting the table calculation to Compute By QUARTERS SINCE BIRTH

From this basic text table, you can see the ‘blank’ fields, Ann mentioned. In the data table view, it’s not so obvious. The blank is there because there are no sales in those quarters for those cohorts. To fix we need another table calculation

CUSTOMER LIFETIME VALUE (CLTV)

IF ISNULL([Avg Lifetime Value]) AND NOT ISNULL(LOOKUP([Avg Lifetime Value],-1)) AND NOT ISNULL(LOOKUP([Avg Lifetime Value],1))

THEN LOOKUP([Avg Lifetime Value],-1) ELSE [Avg Lifetime Value] END

This says, if the Avg Lifetime Value field is NULL but neither the previous or the subsequent values are NULL, then use the Avg Lifetime Value value from the previous column (LOOKUP).

Replace the Avg Lifetime Value with the CUSTOMER LIFETIME VALUE (CLTV) field (setting the Compute By again), and the empty spaces have disappeared.

If you hover over the cells in the lower right hand side of the view, you’ll see tooltips showing, indicating that a mark has been drawn on the viz with Null data. To fix this, add CUSTOMER LIFETIME VALUE (CLTV) to the Filter shelf and specify non-null values only to show via the Special tab.

Now if you hover over that area you don’t get any tooltips displaying, as there aren’t any marks there.

Now it’s just a case of formatting the viz a bit more

  • Add CUSTOMERS to Rows
  • Add CUSTOMER LIFETIME VALUE (CLTV) to the Colour shelf by holding down the Ctrl key then clicking on the field that’s already on the Text shelf, and dragging via the mouse onto the Colour shelf. Using Ctrl in this way has the effect of copying the field including the table calculation settings, so you don’t need to apply them again. This will change the colour of the Text.
  • Then change the mark type to Square, which will then fill out the background based on the colour.
  • Then edit the colour legend to the relevant palette (which you may need to install via Ann’s link).
  • Set the border of the mark via the Colour shelf to white
  • Remove the row & column dividers
  • Set the row Axis Ruler to a dark black/grey line
  • Format the 1st 2 columns so the font is the same and centred. Widen the columns if required.
  • Update the tooltip

And then you should be ready to add the viz to your dashboard. My published version is here.

This blog is a bit more detailed that my recent posts, but I’m also conscious I’ve skipped over some bits that if you’re brand new to Tableau, you may not be sure how to do. Feel free to add comments if you need help!

Happy vizzin’! Stay Safe!

Donna