Can you build a Customer Lifetime Value Matrix?

Ann Jackson challenged us this week to build this matrix depicting the average worth of customer cohorts during their lifetime.

This challenge involves a mix of LoDs (Level of Detail calculations) and table calculations.

First up , we need to define our customer cohorts (ie group the customers), which for this challenge is based on identifying the quarter they placed their first order in. This will involve an LoD calculation. For a good introduction to LoDs with some worked examples (including a similar cohort analysis example), check out this Tableau blog post.

ACQUISITION QUARTER

DATE(DATETRUNC(‘quarter’,{FIXED [Customer ID] : MIN([Order Date])}))

The 2nd part of the formula in the { … } returns the earliest Order Date associated to the Customer ID, which is then truncated to the 1st day of the quarter that date falls in ie 23 Feb 2019 is truncated to 01 Jan 2019.

For the ‘quarters since birth’ field, we need to calculate the difference in quarters, between the ACQUISITION QUARTER and the ‘quarter’ associated to the Order Date of each order in the dataset.

QUARTERS SINCE BIRTH

DATEDIFF(‘quarter’,[ACQUISITION QUARTER],DATETRUNC(‘quarter’,[Order Date]))

Drag this field into the ‘dimensions’ area of the left hand data pane (above the line if you’re using later versions of Tableau).

Lets sense check what this looks like, by adding

  • ACQUISITION QUARTER to Rows (Discrete, Exact Date)
  • ORDER DATE to Rows, set to Quarter (quarter year ie May 2015 format which will make a green pill), then set to discrete to change to blue
  • QUARTERS SINCE BIRTH to Rows

You can see that while the first row against each cohort starts with a different quarter, the QUARTERS SINCE BIRTH always starts at 0 and counts sequentially down the table.

Next we want to count the number of distinct customers in each cohort, and we’ll use another LOD for this.

CUSTOMERS

{FIXED [ACQUISITION QUARTER]: COUNTD([Customer ID])}

Once again move this field into the Dimensions section of the data pane.

Add this onto the Rows of the above data table, and you should get every row for the same cohort displaying the same number

Add Sales onto Text to get the value of sales made by the customer in each cohort in each quarter. The ‘customer lifetime value’ we need is defined as the total sales so far / number of customers in the cohort.

Remove the QUARTER(Order Date) field from the table, as we’re not going to need this for the display, and it’ll affect the next steps if it’s left.

To get the cumulative sales, we need a Running Total Quick Table Calculation. Click on the Sales pill on the Text shelf and select Quick Table Calculation -> Running Total. The click again and Compute By -> QUARTERS SINCE BIRTH. Add Sales back into the table, so you can see the quarterly Sales value and how it’s cumulating until it reaches the next cohort.

We’ve now got the building blocks we need for the CLTV value we need to plot

Avg Lifetime Value

RUNNING_SUM(SUM([Sales])) / SUM([CUSTOMERS])

Note – I purposefully haven’t called this field what you might expect, as I’m going to ‘fill in the gaps’ that Ann describes in the requirements, and I’ll use that name then.

Pop this field into the table above, again setting the table calculation to compute by QUARTERS SINCE BIRTH

You can now use the data table above to validate the calculation is what you expected.

Now let’s build the viz out.

On a new sheet

  • QUARTERS SINCE BIRTH to Columns
  • ACQUISITION QUARTER (exact date, discrete blue pill) to Rows
  • Avg Lifetime Value to Text, setting the table calculation to Compute By QUARTERS SINCE BIRTH

From this basic text table, you can see the ‘blank’ fields, Ann mentioned. In the data table view, it’s not so obvious. The blank is there because there are no sales in those quarters for those cohorts. To fix we need another table calculation

CUSTOMER LIFETIME VALUE (CLTV)

IF ISNULL([Avg Lifetime Value]) AND NOT ISNULL(LOOKUP([Avg Lifetime Value],-1)) AND NOT ISNULL(LOOKUP([Avg Lifetime Value],1))

THEN LOOKUP([Avg Lifetime Value],-1) ELSE [Avg Lifetime Value] END

This says, if the Avg Lifetime Value field is NULL but neither the previous or the subsequent values are NULL, then use the Avg Lifetime Value value from the previous column (LOOKUP).

Replace the Avg Lifetime Value with the CUSTOMER LIFETIME VALUE (CLTV) field (setting the Compute By again), and the empty spaces have disappeared.

If you hover over the cells in the lower right hand side of the view, you’ll see tooltips showing, indicating that a mark has been drawn on the viz with Null data. To fix this, add CUSTOMER LIFETIME VALUE (CLTV) to the Filter shelf and specify non-null values only to show via the Special tab.

Now if you hover over that area you don’t get any tooltips displaying, as there aren’t any marks there.

Now it’s just a case of formatting the viz a bit more

  • Add CUSTOMERS to Rows
  • Add CUSTOMER LIFETIME VALUE (CLTV) to the Colour shelf by holding down the Ctrl key then clicking on the field that’s already on the Text shelf, and dragging via the mouse onto the Colour shelf. Using Ctrl in this way has the effect of copying the field including the table calculation settings, so you don’t need to apply them again. This will change the colour of the Text.
  • Then change the mark type to Square, which will then fill out the background based on the colour.
  • Then edit the colour legend to the relevant palette (which you may need to install via Ann’s link).
  • Set the border of the mark via the Colour shelf to white
  • Remove the row & column dividers
  • Set the row Axis Ruler to a dark black/grey line
  • Format the 1st 2 columns so the font is the same and centred. Widen the columns if required.
  • Update the tooltip

And then you should be ready to add the viz to your dashboard. My published version is here.

This blog is a bit more detailed that my recent posts, but I’m also conscious I’ve skipped over some bits that if you’re brand new to Tableau, you may not be sure how to do. Feel free to add comments if you need help!

Happy vizzin’! Stay Safe!

Donna

Can you find the variance along a line?

So after a couple of weeks off blogging due to Christmas, I’m back providing solutions for #WOW2021! New contributor Candra Mcrae started off the year with this gentle workout focussing on table calculations. For those who have followed my previous blogged solutions, you’ll probably recall that table calcs don’t phase me as much as they do some – I started using Tableau before LODs existed, so they were the only tool at my disposal in the past. That said, I’m still not such an expert that I get it right first time – there’s often plenty of trial and error as I choose which fields I want to compute by, although I’ve got much better at this since I read Andy Kriebel’s Table Calculations Overview a couple of years ago.

As with any challenge that involves table calcs, I tend to start by building out a tabular view of all the data I’m going to need to build the viz. This ensures I can validate the data much easier and set the table calculation settings to what I need.

So let’s start….

For the line chart itself, we’re simply going to be plotting Year against Food insecurity which has been formatted to 1 decimal place and displays the % as a suffix

We will also need to capture 2 other values which will represent the coloured circles. Parameters are needed to help identify what these circles will be.

pSelected Year is an integer parameter that can be built by right clicking on Year and selecting Create -> Parameter. This will populate the parameter with all the values of the Year field. Default the value to 2018 and adjust the formatting of the displayed value so that it is 0 decimal places and include thousand separators is unchecked.

With this parameter, we can capture the value associated which is represented by the pink dot in the viz

Selected Year %

IF [Year] = [pSelected Year] THEN [Food insecurity (includes low and very low food security) Percent of households] END

It’s a bit more work to identify the black dot, as this will vary based on another parameter

pComparison

I created this an integer parameter storing the values 0, 1 & 2 (defaulted to 2), but aliased for display as First Year, Most Recent Year, Previous Year. I’ll be using this is an case statement/if clause later and comparing integers to strings is much more efficient in Tableau.

I like to take things step by step, especially when there’s table calcs involved, to ensure all the values I’m referencing are correct, so rather than identifying the selected value in a single calculated field, I’m using multiple.

Firstly I want to identify the 1st year in the dataset (without hardcoding).

First Year

WINDOW_MIN(MIN([Year]))

This will store the value of the earliest year (1995) against every row in the data that is outputted in the view (a bit like you would get with {FIXED : MIN([Year])}, but this is a no LOD challenge).

From this I can work out

First Year %

IF MIN([Year]) = [First Year] THEN MIN([Food insecurity (includes low and very low food security) Percent of households]) END

Notice the MIN() functions used in this statement, as opposed to the Selected Year % above. This is because First Year is a table calc which is an aggregation, and subsequently other fields referenced in the calculation also need to be aggregated. In this case other aggregations such as AVG, MAX, ATTR would also suffice.

Similarly, I’m going to derive the Latest Year % with

Latest Year

WINDOW_MAX(MAX([Year]))

Latest Year %

IF MIN([Year]) = [Latest Year] THEN MIN([Food insecurity (includes low and very low food security) Percent of households]) END

Finally, I’m also going to work out the value for the previous year

Previous Year

[pSelected Year]-1

Previous Year %

IF [Year] = [Previous Year] THEN [Food insecurity (includes low and very low food security) Percent of households] END

With these 3 % fields, I can now create a fourth field which stores the value of the % I want to compare with

Selected Comparison %

CASE [pComparison]
WHEN 0 THEN [First Year %]
WHEN 1 THEN [Latest Year %]
WHEN 2 THEN SUM([Previous Year %])
END

This ultimately just stores the 2nd value required for the black dot. This could all have been written within this single calculation, but I find it easier to troubleshoot if things are broken down a bit.

Putting these onto a table we can see how the values in each row change, as the parameters are changed.

Note, for all the table calculation fields (all denoted by the triangle symbol on the pill), I have explicitly set them to Compute Using the specific dimension of Year rather than the default of table down. While this will give the same result, I prefer to be explicit to ensure the values don’t change if pills get subsequently moved around the canvas (in the case of the Selected Comparison % field, all Nested Calculations within the Edit Table Calculation dialog box need to be set).

This is enough information to build the main viz itself by

  • adding Year to Columns (green continuous pill)
  • adding Food insecurity to Rows
  • adding Selected Year % to Rows next to Food insecurity
  • then drag Selected Comparison % to the Selected Year % axis which will automatically change the display to have Measure Values on Rows instead. Set the table calculation setting to compute by Year

This chart can then be set to be dual-axis and the axes synchronised. The Food insecurities should remain as as line mark, and the Measure Values should be a circle. The colours, formatting and tooltip then need to be applied.

Now we need to go back to our table of data to build out other calculations. The requirement is a single viz, so we need to provide the % value of the selected year within the title of the chart, along with the difference from the comparison value. For this to work, we need to store the relevant values against every row in the data set.

We already have the Selected Year % value identified, but this is only captured against the row of the selected year in the data output. To get it to display against every row we need

Window Max – Selected Year %

WINDOW_MAX(MIN([Selected Year %]))

This is formatted to 0 dp with the % sign as suffix, as this is the field that will be displayed in the title. Added to our data table, with the table calc set to compute by Year, you can see the value replicated across every row

Similarly, we already have the Selected Comparison % captured, but in order to work out the difference, we also need to get this value against every row too

Window Max – Selected Comparison %

CASE [pComparison]
WHEN 0 THEN WINDOW_MAX([First Year %])
WHEN 1 THEN WINDOW_MAX([Latest Year %])
WHEN 2 THEN WINDOW_MAX(MAX([Previous Year %]))
END

Adding this in to the table (and remembering to set the table calc settings), you can also see the relevant value perpetuated against every row. Change the pComparison value and you’ll see the values change accordingly, but still the same on every row.

So for the difference…. at this point I chose to deviate from the solution published. We’re already dealing with a measure that is quantified as a % (as opposed to a measure that is quantified as £ say). So to get the % difference between two percentage measures, I simply chose to show the difference between the two values (ie selected – comparison). Some would refer to this a the point difference between the values. This makes most sense to me for this particular scenario. The alternative is to calculate the % difference in the more traditional way of (selected – comparison) / comparison as you may do if you were presenting the % difference between the value of sales in different years. But I personally found the result could be confusing to the reader .

Difference

[Win Max – Selected Year %]-[Win Max – Selected Comparison %]

BUT, the value displayed within the title is the absolute value (ie no -ve sign) and actually doesn’t display if the value is 0 (which you get if you select the year to be 2019 and comparison to most recent year). So I resolved this with

DISPLAY : Difference

IF [Difference] <> 0 THEN ABS([Difference]) END

which is formatted to 1 dp with % suffix

Finally, we have a symbol that is used to indicate if the change is +ve, -ve or remains the same.

Difference Indicator

IF [Difference] > 0 THEN ‘▲’
ELSEIF [Difference] < 0 THEN ‘▼’
ELSE ‘N/C’
END

Add this into the table too, once again remembering to set the table calc properties for all nested calculations.

So now you have these fields, you can add Window Max – Selected Year % and DISPLAY : Difference and Difference Indicator to the Detail shelf of the All marks card on the chart viz, and these will then be available in title dialog to add. Once again, ensure you set the table calcs to compute by Year.

You’ll also need to add some spacing in the title, to allow the parameter controls to be ‘floated’ into place on the dashboard. Getting the position right is really tricky. I positioned it very carefully on Desktop, but when I published, the controls were in completely different places. The easiest way I found to resolve this, was to use the Edit feature in Tableau public to edit the dashboard online and move the objects that way.

My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you build a Premier League table?

Luke provided this fun challenge this week. It was nice to see an English sport that I’m very familiar with feature, even if I had to keep retyping the word ‘Draw’ for ‘Tie’ 🙂

This blog will cover

  • Re-modelling the data
  • Building the table & bar chart
  • Building the Last 5 matches chart
  • Getting the data for the Last 5 matches chart tooltip

Re-modelling the data

Sometimes the #WOW challenges are set with a very specific direction in mind eg no LODs, use 1 sheet only, no data modelling etc. This challenge had no real restrictions, and there were minimal clues. The Latest Challenges page indicated they’d be table calculations involved

and Luke’s tweet introducing the challenge suggested there was a ‘trick’ that would make this a lot simpler to solve…. hmmmm???

I looked at the data referenced to get a feel of the shape and layout – it showed 1 row per game, with the home & away team for each game stored in different columns. Trying to count the number of games a specific team played was likely to be complicated in this format. Given Luke had suggested a ‘trick’, I decided I was going to pivot the data to expand the data to give me 1 row per game per team, hoping this was the ‘trick’ he implied, and there was no requirement to state that data modelling wasn’t allowed.

I did this in Tableau itself, in the data source pane. After connecting to the data, highlight the columns Home Team and Away Team, right click and select Pivot.

This has the effect of duplicating the rows of data and adding 2 additional columns to the end Pivot Field Names containing the column names that had been pivoted, and Pivot Field Values containing the values of those fields.

I then simply renamed Pivot Field Names to Home or Away and Pivot Field Values to Team.

With the data structured this way it was very simple to calculate the various measures required :

Win-Loss-Draw

IF [Home or Away] = ‘Home Team’ AND FTR = ‘H’ THEN ‘W’
ELSEIF [Home or Away] = ‘Away Team’ AND FTR = ‘A’ THEN ‘W’
ELSEIF FTR = ‘D’ THEN ‘T’
ELSE ‘L’
END

FTR in the data set stands for full-time result and indicated whether it was the Home (H) or the Away (A) team that won, or whether it was a Draw (D). So this field indicates whether the Team in each row won, lost or drew (tied) the match.

From this we can then work out the points the Team gained in the match.

Points

IF [Win-Loss-Draw] = ‘W’ THEN 3
ELSEIF [Win-Loss-Draw] = ‘T’ THEN 1
ELSE 0
END

To determine the total number of Wins for each team I created

Wins

FIXED [Team]: SUM(IF [Win-Loss-Draw] = ‘W’ THEN 1 ELSE 0 END)}

and then repeated this similarly to get a Losses and Ties measure.

Finally I determined the total number of games each team played by using the provided tablename (count) field that is now added in more recent versions of v2020 (supersedes the Number of Records field).

Matches Played

{FIXED [Team]: (COUNT([2020_11_11_WW46_EPL.csv]))}

Building the table & bar charts

In my initial built, I created one sheet to show both the table summary with the bar chart, and a separate sheet to show the last 5 matches, with the intention the charts would be lined up side by side on the dashboard, removing padding to make the charts look to be one.

But with the table and bar on the same chart, I couldn’t get the labels of each column and the ‘Total Points’ to be aligned in the way displayed. So I ended up using 3 different sheets.

Table

Given we’ve already got all the measures, this was a very simple table, with the Team field sorted descending by the Points measure.

Bar Chart

Again this is a pretty basic chart with Team on Rows,(which is then hidden) and Points on Columns, and Win-Lose-Draw on Colour.

To label the end of the chart with the total points, I created an additional measure

Total Points

{FIXED [Team]: SUM([Points])}

which I added to Columns and made dual axis, with the Total Points being a Gantt type, and the mark being labelled.

Building the last 5 matches chart

This chart is slightly more complex, and where the table calculations clue comes into play.

In the data, a match for each Team is identified by the Date, but we can’t use Date to plot against as not everyone has their matches on the same date. We just want to number each match played by each team in order of the date played.

We can use a field Index = INDEX() to number our matches per team – you’ll see that some teams have played 8 matches while others 7.

But we only want the last 5 matches, which for some teams will be matches 3-7 and others 4-8.

To do this, we need to determine the number of matches played. In hindsight I should have realised, I’ve already got this in a variable, but as my head was in ‘table calc’ mode, I created a field Size = SIZE()

As you can see, this returns the number of rows for each team against every row for that team.

I then created another calculated field

Last 5 matches only

[Index] > [Size]-5

Adding this into the table, and verifying the table calc settings for both the nested table calcs are set as above, you can see the last 5 rows of each team are set to True

This can then be added to the Filter shelf to reduce the rows shown.

However, as some teams have played less matches than others, the Index values shown aren’t consistent for every team, and we need this.

So another field is required

Index to Plot

//want a value from 1 to 5 based on the index
[Index] – ([Size]-5)

This gives us the 1-5 sequence for each row we need to build the chart

The layout below is the basic premise.

The circles need to be coloured based on the Win-Loss-Draw field, and the labelled with the same field which is set to match mark colour. Reducing the transparency of the colour will fade the circle colour and make the label stand out.

Getting the data for the Last 5 matches chart tooltip

So just by pivoting the data, I was able to crack through the measures and the charts pretty quickly, and felt happy that I’d worked out Luke’s trick.

But then I came to adding the tooltip to the circle chart which needs information about the opposition, and the score.

The score is always displayed on the tooltip as

Team ‘s score – Opposition score.

For example when Arsenal played Sheffield United at Home where Arsenal won 2-1, the score is displayed on the tooltip as 2-1.

And when Arsenal played Manchester City Away, where Manchester City won 1-0, the score for Arsenal is displayed on the tooltip as 0-1.

So to get the team’s score I created

Team Score

IF [Home or Away]=’Home Team’ THEN [Fthg] ELSE [Ftag] END

Where FTHG is Full Time Home Goals and FTAG is Full Time Away Goals. So if the team is listed as the Home team, their score is identified by FTHG, but if they are the Away team, their score is identified by FTAG.

Correspondingly, we need

Opposition Score

IF [Home or Away]=’Away Team’ THEN [Fthg] ELSE [Ftag] END

If the team we’re interested in is listed as the Away Team, then the opposition will be the Home team, so FTHG identifies the goals they scored, otherwise the opposition is the Away team, so FTAG is the goals they scored.

Both of these fields may sound complicated, but it was retrievable in the data I had.

However the actual name of the Opposition team wasn’t – this was lost once I pivoted the data – the Opposition is now located on a completely different row of data and as we have no ‘match id’ or similar in the data, I can’t actually identify who the opposition might have been, since multiple matches can play on the same date.

This is where I started to question whether ‘my trick’ was the right one…

I had a bit of a think, and came up with a solution, but it was a bit ‘off piste’ for a #WOW challenge as it required blending… typically when blending is required they’d be some mention of it in the requirements, but since there was nothing stated to say it couldn’t be used, I went with it.

I added the original unpivoted data set to my workbook as an additional data source. I renamed this with the word ‘Home‘ on the end.

When I work with blending, I try to be specific which fields the data sources are going to be linked/blended with.

So in the pivoted data source I’d been working with for most of the challenge, I create a new field

BLEND – Team

[Team]

Then in the Home data source I’d added, I also created a field with the same name, but it contained a different field.

BLEND – Team

[Home Team]

I then added another instance of the original unpivoted data source to my workbook, this time renaming it with the word Away on the end. In this data source I created

BLEND – Team

[Away Team]

so from the 3 data sources, I could now build up a table to verify I could get to the data I needed:

  • Pivoted Data.Team
  • Pivoted Data.Date
  • Home Data.Away Team
  • Away Data.Home Team

and when blended to the secondary data sources, the links for both BLEND – Team and Date fields are enabled

From this data, we can see that either the Home Team column or the Away Team column is populated, and provides the name of the opposition. So this means we can then create in the primary data source (the pivoted data source)

Opposition

IF ISNULL(ATTR([2020_11_11_WW46_EPL – Home].[Away Team])) THEN ATTR([2020_11_11_WW46_EPL – Away].[Home Team])
ELSE ATTR([2020_11_11_WW46_EPL – Home].[Away Team])
END

If the Away Team is NULL, then store the Home Team value, else store the Away Team value

And from this we now have all the building blocks we need to finalise the Last 5 matches chart.

Just make sure that as you add fields to the chart, you may need to alter the various table calculations to ensure they are still computing correctly (typically all the fields other than Team should be selected in the table calc dialog). And also keep an eye on the fields being blended – BLEND – Team and Date should always be selected for both.

After I posted my solution up, I did indicate I thought I might have done something a bit ‘out of the ordinary’, but apparently my approach match Luke’s! That was a real surprise.

My published version is here.

Happy vizzin’! Stay Safe!

Donna

Can you create a small multiple waterfall chart?

For this week’s #WOW2020 challenge, Lorna Brown asked us to recreate a waterfall chart – a chart style that hasn’t featured in many previous challenges (if at all), and is always a useful one to know how to build.

I’m familiar with these, and this challenge didn’t cause me too many issues, so this blog is going to be brief.

  • Building the waterfall
  • Small multiple / grid layout
  • Adding the month label

Building the waterfall

So I’d built out the basic waterfall for each month and day by plotting Year(Order Date), Month(Order Date) and Day(Order Date) on Columns. The Day(Order Date) field was set to Show Missing Values so each day without an order was still plotted, and I was trying to figure out how to get the additional ‘long’ bar at the end.

I worked out it was essentially a ‘total’ bar and when I duplicated my data as crosstab and played round with the data in a tabular form, I got the subtotals I needed displayed.

But I seemed to be having issues displaying these on the chart view. So I turned to my usual route, Google, and had a search, and came across this blog from Tim Ryan at The Data School, which gives you the complete guide to building the waterfall, so there’s no need for me to repeat it all – thanks Tim! 🙂

My issue was I had a green continuous Day(Order Date) field rather than a blue discrete one – doh!

The only couple of things you need to make note of – you need to ensure you have 0 displayed for the missing dates

Actual Profit

ZN(SUM([Profit]))

and the gantt bars should be coloured red for negative profit, blue for positive and grey for the missing days

Colour

IF SUM([Profit])<0 THEN ‘Red’ ELSEIF SUM([Profit]) > 0 THEN ‘Blue’
ELSE ‘Grey’
END

Small multiple / grid layout

For this you need fields to add to the Rows and Columns shelf that position the month in the appropriate cell.

The Quarter(Order Date) (blue discrete) on Rows, is just used to define the row a month lands in.

You then need

Cols

IF MONTH([Order Date])%3 = 0 THEN 3
ELSE MONTH([Order Date])%3
END

which assigns each month a value of 1,2 or 3. You’ll need this on the Columns shelf.

Adding the month label

The label shows the month and the total profit in the month, so I created an LOD for this

Profit for Month

{FIXED YEAR([Order Date]),MONTH([Order Date]): SUM([Profit])}

From this I wanted the maximum value of all the monthly profits

Max Monthly Profit in Year

{FIXED Year([Order Date]): MAX([Profit for Month])}

I then used a dual axis to plot this field on the Rows, set the mark type to a line and set the opacity of the line colour to 0%, so it disappears.

The month and value were then added to the Label shelf and the label set to Label start of line only and also right aligned to get the required positioning.

And that’s it. I said this would be brief 🙂 My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you build a Strava workout calendar?

For 2020 Week 42, the #WoW founder, Andy Kriebel, returned with a challenge to reproduce the Strava training calendar. Compared to some challenges recently, this looked to be quite straight forward; Andy threw in some specific requirements to test certain features – ie no data modelling and no LoDs.

I’ve been doing #WorkoutWednesday challenges since they first started, so I know that Andy is a stickler for formatting and layout – points not necessarily listed as a requirement, just expected as part of the challenge to reproduce. I kept my fingers tightly crossed when I published that I’d got all the finer details, but alas, Andy still found fault – my month summaries weren’t right aligned (my bad – missed that little nuance completely), and my bars had borders on them… Andy must have eyes like Superman to have seen that, as it wasn’t obvious. It also wasn’t a setting I’d intentionally added. I later found out that adding a particular type of pill to the Detail shelf caused borders to automagically be added… There’s always something to learn when Andy’s about!

So onto the challenge – as with previous weeks, I’m going to try to focus on the areas that may be a bit trickier / newer to some rather than detail the complete build step by step.

  • Using the data sets – blending
  • Building the calendar grid
  • Ensuring a 0 measure value is displayed for missing days
  • Adding the monthly hours summary
  • Building the BANs
  • Year Filter control
  • Remove highlighting
  • Setting the colour of the Calendar chart background

Using the data sets – blending

Andy was very specific that the 2 data sets provided should be used separately and not joined in any via the data pane.

This meant the data sources would need to be blended (further detail on this is here). Blending used to be one of the only ways within Tableau you could combine data together.

When blending, the number of rows in your output will never be more than the number of rows in your primary* data source. If there are multiple matching rows in the secondary data source, then the results will be aggregated in the display.

* whatever data source the first pill you add to your canvas comes from, will be the primary, and is denoted by a small blue icon by your data source. Secondary data sources are denoted by a small orange icon.

In the case of this challenge, we had a data set containing a list of dates (1 row per day from 01 Jan 2014 up to 31 Dec 2021), along with Andy’s Strava activity, containing a row for each activity recorded, which included the date time the activity occurred. This data could vary in that there could be multiple activities on the same day, and equally days when no activity occurred at all.

So the Calendar data set is our primary data source, as we need to show a bar on the calendar chart for every day of the year, regardless if there’s any activity. The Activity data set is our secondary data source. The number of hours, number of activities etc can all be aggregated from this data set.

When blending data sources, especially on dates, I prefer to create explicit calculated fields that define the fields I want to blend on. So in the Calendar data source I created

BLEND: Date

[Date]

essentially just a duplicate of the existing Date field, and in the Activity data source, I also created

BLEND: Date

DATE([Date Time])

Note the fields are spelled exactly the same, so Tableau automatically uses them as the linking fields when the view is built.

If you now do the following

  • Add the Calendar.BLEND: Date field to the Filter shelf, and select the Year = 2020,
  • Add Calendar.BLEND: Date as an exact date to Rows
  • Add Activity.BLEND: Date as an exact date to Rows
  • Add Activity.Seconds to Text

You can see that fields from the secondary data source have an orange icon by them; and that there are Null/missing values for the records from the secondary data source as these were the days when there was no activity recorded. You can also see a red link icon against the BLEND: Date field in the left hand data source pane, as this identifies how the two data sets are being matched.

Building the calendar grid

The calendar is essentially a ‘small multiple’ layout with each month being positioned in a particular row or column. To build out this layout we need to define the row number and the column number. There are many ways to build a dynamic small multiple grid which can flex based on the number of items you might be trying to organise, but for the purpose of this exercise, we can keep it simple. We’re working with 12 months that are to be displayed in a 4 x 3 grid layout. Create the following calculated fields in the Calendar data source.

Rows

IF MONTH([Date])<=4 THEN 0
ELSEIF MONTH([Date]) <=8 THEN 1
ELSE 2
END

Cols

(MONTH([Date])-1)%4

I make both of these to be dimensions rather than measures by dragging them above the line on the left hand data source pane. If you build out the view as below, you can see how these calcs are working

As we want to show a mark for every day in the month, we need to add the day of the month from the Calendar data source to Columns. Drag Calendar.BLEND: Date to Columns, then select the drop down to change to the Day date part

We need to show the amount of time in hours rather than seconds. In the Activity data source, create the field

Hours

([Seconds]/60)/60

and drag this onto the Rows, and change the mark type to bar. If need be re-add the YEAR(BLEND: Date) = 2020 to the Filter shelf. Now add Calendar.BLEND: Date as an exact date to the Detail shelf. You should now have

where you can see the gaps in the days where no activities took place, and if you hover vertically, you should find that the days of the month are vertically aligned – ie 30th Jan aligns with 30th May etc.

Ensuring a 0 measure value is displayed for missing days

With the above we displayed the Activity.Hours field, but if you hover over the day when there is no activity, nothing displays on the tooltip rather then 0.

To fix this, create a calculated field in the primary Calendar data source

Hours

ZN(SUM([Sheet1 (Activities Summary)].[Hours]))

This is basically just referencing the field in the secondary blended data source, but wrapping in a ZN() function means it will display 0 when no match can be found

Use this field from the primary data source instead on the calendar viz.

Adding the monthly hours summary

The requirements meant Andy expected the summary to be displayed within the same sheet as the daily calendar viz.

For this I used an old friend MIN(0) to create another axis, which is placed on the Rows in front of the Hours measure.

What I now plan to do is set this axis to be Text and plot the month, monthly hours, and the word ‘hours’ at a specific point to the right of the each cell – I’m choosing day 28 – you might want to experiment and choose a different day.

First up though, I need to build some fields to plot.

Month Name Abbrev

IF DAY([Date]) = 28 THEN
UPPER(LEFT(DATENAME(‘month’,[Date]),3))
END

Hours in Month

IF MIN(DAY([Date])) = 28 THEN
WINDOW_SUM([Hours])
END

LABEL: Hours

IF DAY([Date]) = 28 THEN ‘HOURS’ END

Month

DATE(DATETRUNC(‘month’,[Date]))

Add Month as an exact discrete date to Detail and the other 3 fields to the Text shelf of the Min(0) marks card (change the mark type to Text if you haven’t already done so). Alter the table calculation setting of the Hours in Month field to compute by all fields except Month

Building the BANs

These use a similar concept as above, by using 3 instances of MIN(0) placed side by side on the Columns shelf and set to the Text mark type. This creates 3 marks cards which you can then add the relevant measures and text on.

The measures are all coming from fields in the primary data source that reference measures in the seconday data source ie

# Activities (in Activity data source)

COUNT([Activity ID])

# Activities (in Calendar data source)

ZN([Sheet1 (Activities Summary)].[# Activities])

#Miles (in Calendar data source)

ZN(SUM([Sheet1 (Activities Summary)].[Miles]))

Year Filter Control

All the sheets you are building need to be filtered by the same BLEND: Date field from the Calendar data source (set the filter to Apply to all worksheets).

When this field is added to the dashboard, you can customise it so the All values does not show and the slider control also doesn’t display

Remove highlighting

To stop items on the dashboard from highlighting when they are clicked on, I use a trick that has been probably been the ‘most used trick of #WOW2020’ 🙂

In the primary data source, create a field called True which contains the value TRUE and a field False containing the value FALSE. Add both these fields to the Detail shelf of each sheet you don’t want highlighting on.

On the dashboard, create a Filter URL action for the each sheet that goes from the sheet on the dashboard to the sheet itself, and passes selected fields setting true = false. As this condition will never be true, then there is nothing to ‘filter’ so the marks don’t highlight. This needs to be repeated for each sheet on the dashboard, so I had 3 filter dashboard actions.

NOTE – a consequence of adding the True and False fields to the Detail shelf on the bar sheets, was that it caused a border to be added around the bars.

This wasn’t something I noticed, as it isn’t at all obvious, but Andy called it out!

Setting the colour of the Calendar chart background

You need to format the sheet and set the fill colour of the Pane rather than the whole sheet to grey.

There’s obviously a lot of other formatting settings to apply to get rid of all the row/column borders and gridlines etc, but this was a slight difference that I wanted to call out, as ended up with a ‘border’ on my dashboard that wasn’t required when I set the whole worksheet background.

Right, I think that’s about it for this week! Thanks for the fun challenge Andy – great to have you back!

My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Is Profit Ratio Influenced by Quantity?

For this week’s challenge, Ann asked we recreated a scatter plot that ‘on click’ became a connected scatter plot and also displayed an ‘insights’ box to the user.

Building out the data

The scatter plot is to show Profit Ratio by Quantity, so first up we need

Profit Ratio

SUM([Profit])/SUM([Sales])

formatted to percentage with 0 dp.

A mark is shown on the scatter plot per Category and Month/Year. When working with dates like this, I often prefer to created a dedicated field to use as an ‘exact date’ which I can then format. So I built

Month Year

DATE(DATETRUNC(‘month’,[Order Date]))

and formatted it to the March 2001 format (ie mmmm yyyy).

We’re also going to need a set based on the Category field, as that is going to drive the interactivity to help us connect the dots ‘on click’. So right click on Category and Create Set and just tick one of the categories displayed for now.

Selected Category

Let’s build these out into a table as follows

  • Order Date to Filter, filtered by years 2018 & 2019
  • Category, Selected Category & Month Year (exact date , discrete) onto Rows
  • Measure Names to Text and Filter to Quantity & Profit Ratio
  • Right click on Selected Category and select Show Set to display a selection control to easily allow the set values to be changed

To help drive the interactivity, we’re going to need to store the value of the Profit Ratio for the Selected Category in the set.

Selected PR

IF ATTR([Selected Category]) THEN [Profit Ratio] END

Format this to percentage 0 dp too, and add this to the table – it will only display the profit ratio against the rows associated to the category in the set

At this point, we’ve got what we need to build out the scatter plot, but I’m going to continue to create the fields needed for the ‘Insights’ piece.

These are looking for the months that contained the Max & Min values for the Profit Ratio and Quantity. I’m going to tackle this with Table Calcs.

Max PR per Category

WINDOW_MAX([Profit Ratio])

Adding this to the table, and setting the table calculation to compute by Month Year only, gives us the Max Profit Ratio for each Category

We need 3 more fields like this :

Max Qty per Category

WINDOW_MAX(SUM([Quantity]))

Min PR per Category

WINDOW_MIN([Profit Ratio])

Min Qty per Category

WINDOW_MIN(SUM([Quantity]))

Add all these to the table, setting the table calc in the same way as described above.

Now we know what the min & max values are, we need to know the month when these occurred.

Max PR Month

WINDOW_MAX(IF ATTR([Selected Category]) AND [Profit Ratio] = [Max PR per Category] THEN MIN([Month Year]) END)

Format this to March 2001 format.

If the category is that selected, and the profit ratio matches the maximum value, then retrieve the Month Year value, and duplicate that value across all the rows (this is what the outer WINDOW_MAX does).

Add this to the table in the Rows, and adjust the table calc settings for both the nested calculations to compute by Month Year only. The month for the selected category which had the highest profit ratio will be displayed against all rows for that category, and null for all other rows.

Again we need similar fields for the other months

Max Qty Month

WINDOW_MAX(IF ATTR([Selected Category]) AND SUM([Quantity]) = [Max Qty per Category] THEN MIN([Month Year]) END)

Min PR Month

WINDOW_MAX(IF ATTR([Selected Category]) AND [Profit Ratio] = [Min PR per Category] THEN MIN([Month Year]) END)

Min Qty Month

WINDOW_MAX(IF ATTR([Selected Category]) AND SUM([Quantity]) = [Min Qty per Category] THEN MIN([Month Year]) END)

Format all these too, and add to the table with the appropriate table calc settings applied.

If you now change the value in the the Selected Category set, you should see the table update.

Building the Scatter Plot

On a new sheet, build out the basic scatter plot by

  • Order Date to Filter, filtered by years 2018 & 2019
  • Profit Ratio to Rows
  • Quantity to Columns
  • Category to Colour and adjust accordingly
  • Month Year exact date, continuous to Detail
  • Change Shape to filled circle
  • Adjust the Tooltip to suit
  • Change the axis titles to be capitalised
  • Change the sheet title
  • Remove all row/column borders & gridlines. Leave the zero lines and set the axis rulers to be a sold grey line

Add Selected PR to Rows, and change the mark type of that card to line, and add Month Year to path. In the image below, I still have Furniture selected in my set.

Add Month Year to the Label shelf too and format so the text is smaller and coloured to match mark colour

Make the chart dual axis, synchronise the axis and remove the Measure Names from the colour shelf of each mark. Hide the Selected PR axis.

If you now change the selected set value, you can see how the other categories ‘join up’.

Building the Insights Box

Take a duplicate of the data table built originally, and add Selected Category to Filter. If you’ve still got an entry in your set, this should reduce the rows to just those records for that Category.

Remove the fields Profit Ratio, Quantity, Selected PR from the Measure Values section, and remove Selected Category from Rows.

Create a new field

Index = Size

INDEX()=SIZE()

and add this to the Filter shelf, setting the value to true.

The INDEX() function will number each row, the SIZE() function will number each row based on the total number of rows displayed (ie all rows will display the same number), so INDEX()=SIZE() will return true for the last row only

Now we’ve only got 1 row displayed, we’ve got the info we need to build the data in the Insights box by

  • Move Month Year to Detail
  • Category to Colour
  • Move Max PR Month, Max Qty Month, Min PR Month, Min Qty Month to Text
  • Move Max PR per Category, Max Qty per Category, Min PR Per Category, Min Qty per Category to Text

This will probably show duplicate instances of the values. This is because we didn’t fix the table calculation setting of the Index=Size field. Edit that to compute by Month Year

and then re-edit the Index=Size filter to just be true again, and expand the display so you can see all the values.

Now you can easily format the Text to the required display

I used this site to get the bullet point image from

Building the dashboard & interactivity

Add the scatter plot to the dashboard, and remove the legends that automatically get added.

Float the Insights sheet and position bottom right, remove the title. fit entire view, and also remove the colour legend which will have been added.

Add a dashboard Set Action to assign values to the Selected Category set and remove all values when the selection is cleared

Additionally, add a highlight action that highlights the Category field only (this keeps the lines highlighted on selection of a single mark).

Now practice clicking on and off marks on the scatter plot and you should see your lines & insights box being added and removed.

Finally add the title and header line by adding a vertical container at the top above the scatter plot.

Add a text box for the title, and underneath add a blank object.

Set the padding of the blank object to 0 and the background to black. Then edit the height to 4, and adjust the height of the container if it’s now too large.

Make adjustments to the position of the floating insights box if need be so it isn’t overlapping any axis.

And that should be it! My published viz is here.

I’ll be taking a break for a couple of weeks, so will be on catch up when I return from some R&R.

Happy vizzin’! Stay Safe!

Donna

What is the distribution of total orders by customer?

Luke Stanke set the challenge this week, and posted a ‘sneak peak’ on Twitter before the challenge was formally released

Challenges from Luke can sometimes be on the harder end of the scale, so with the bit of extra time available and only the gif in the tweet as a clue , I had a play to see if I could get close at all. And surprisingly I could, so once it was formally issued, it was just a case of tidying up some of my calcs.

Calculations

First up we need to establish the basic calculations

# Customers

COUNTD([Customer ID])

# Orders Per Customers

{FIXED [Segment],[Customer ID]: COUNTD([Order ID])}

Plotting these out onto a table with Segment we get

Note #Orders Per Customers needs to be set to be a discrete dimension.

So while this shows us a summary of the number of customers, we’ve only got 1 mark showing the summarised count. When plotting the chart we need something else in the view that will generate more marks. This is Customer ID.

At this point in order to help us build up our tabular view to ‘see’ what’s going on, I’ll filter the table to just show Segment = Consumer, and I’ll add Customer ID to Rows

As expected, our # Customers is now showing a count of 1 per row, but the data is now expanded as we have now got a row (ie a mark) per customer for each # Orders Per Customer ‘bucket’. But we still need a handle on the total customers in each ‘bucket’.

Customers Per Order Count

WINDOW_SUM([# Customers])

Adding this to the view and setting the table calc to compute based on Segment & Customer ID, we get the summarised value back again.

But we don’t want to actually show that number of marks; the number of circles to plot on the chart is dependent on a user parameter:

pMarkIndicator

Based on the requirements, if the number of customers is 15 and the user parameter is 5, then 3 circles should be drawn (15 / 5 =3 ), but if the number of customers was 14, only 2 circles should be drawn (14 / 5 = 2.8), ie the number of circles will always be set to the integer of the equal or lesser value (essentially the FLOOR() function). This can also be achieved by

Marks to Plot

INT([# Customers per Order Count]/[pMarkIndicator])

For some reason FLOOR can’t be used in the above as a table calculation is being used, but INT does the job just fine, and adding to the tabular view and adjust the table calculation accordingly we get

ie for the Consumer Segment, 6 customers have made 1 order in total, so based on batching the customers into groups of 5, this means 1 circle should be displayed. Whereas, 30 customers have made 3 orders in total, so 6 circles should be displayed.

But we can’t actually reduce the amount of rows (ie marks) displayed – we either have 1 row (by removing Customer ID) or a row per customer. But that’s fine, we don’t need to.

What we want is something in our data to group each row into the relevant batch size.

First up let’s generate an ID per row for each customer that restarts for each #Orders Per Customer.

Index

INDEX()

Add this as a discrete pill to Rows, and adjust the table calc

Now we have this, we calculate which ‘column’ each customer can sit in, a bit like what we would do if we were building a small multiple table, arranging objects in rows and columns (see here for an example of what I mean).

Cols

[Index]%([Marks to Plot])

This uses the modulo (%) notation which returns the remainder of the division sum. Lets put this on the view

For customers who have only ordered once, and where we’re only going to plot 1 mark, the Cols value is the same (0) for all rows.

Whereas for customers who have ordered twice, and where we want to plot 2 marks, the Cols value is either 0 or 1.

We’ve now got a value we can use to plot on an axis. We’re still going to plot a mark for each Customer ID, but some marks will be plotted in the same position, ie on top of each other, which therefore looks like just one mark.

Let’s show this more graphically, by duplicating the sheet, deleting some pills, moving some around, changing some to continuous, and setting the mark type to circle as below

Change the pMarksIndicator parameter and the number of circles will adjust as required.

So far, so good. We’ve got the right number of marks, it’s just not looking as nice and symmetrical as it should be.

We need to shift the marks to the left. But how far it shifts is dependent on whether we’ve plotted an odd or even number of marks.

If we have an odd number, the middle mark should be plotted at 0. If we have an even number the middle two marks should be plotted at -0.5 and +0.5 respectively. The calculation below will achieve this

Cols Shifted

IF [Marks to Plot]%2 = 0 //we’re even
THEN [Cols] – ([Marks to Plot]/2) + 0.5
ELSE //we’re odd
[Cols] – (([Marks to Plot]-1)/2)
END

To demonstrate this, I’ve added Cols Shifted along side Cols on the viz (this time make sure all the table calculation settings (including the nested calcs) are applied to compute based on Customer ID only which is different from the calcs above)..

Now you can see how it all works, you can remove the Cols and the Segment from the Filter shelf.

And now its just a case of applying the various formatting to clean up the display, and adding to the dashboard.

My published version is here.

Happy vizzin’! Stay Safe!

Donna

Can you compare a 3-day vs 14-day moving average and describe the latest trend?

This week for #WOW2020, Ann provided a table calculation feast of a challenge! This certainly is not for the faint-hearted! As well as cracking all the table calcs, the challenge features multiple views, measure swapping, parameters, BANs, filtering, sorting …. it’s got it all going on!

Ann hinted you’d probably want to start with the table, and even if there hadn’t been a table output in the display, this is what I would have done. If you’ve read enough of my blogs, you’ll know I often like to build up a ‘check data’ sheet, which just contains the data I need in tabular form as a quick reference. When working with table calculations this is an absolute must have!

So let’s build out that Check Data table to start with. I have a feeling this is going to be a lengthy blog 🙂

Initial Set up

First up, the requirements stated that the latest date would be 7 June, but I found records with a 8 June date. All the associated info for this date was null though, so I set a data source filter to exclude this. This means I wouldn’t get any issues if I needed to store the max date in a FIXED LoD calculation at any point.

I also found it easier to rename a couple of the measures provided to match the output, so rename PEOPLE_POSITIVE_NEW_CASES_COUNT to New Cases and PEOPLE_POSITIVE_CASES_COUNT to Reported Cases. I’ll refer to these renamed fields going forward.

Building all the Calculated Fields

To build out the table, we’re just going to focus on one State & County, as there’s a lot of data. So add Province State Name = Tennessee and County = Davidson to the Filter shelf.

Add Report Date (discrete exact date – blue pill) and New Cases & Reported to Rows. As you scroll down, you’ll see data starting to come in on 8 March.

We want to create our moving average calculations

3 Day Moving Avg

WINDOW_AVG(SUM([New Cases]), -2, 0)

14 Day Moving Avg

WINDOW_AVG(SUM([New Cases]), -13, 0)

Notice the number of rows to average over is 1 less than you might expect, as the current row is included, so the calculation is saying ‘current row’ and 2 | 13 previous rows.

Add these to the table, and adjust the table calculation so it is explicitly calculating by Report Date. This would have happened automatically, as the calculation would have been computing ‘down’ the table, but it’s best to fix the computation, so it doesn’t matter where the pill gets moved to in the view.

We now need to work out whether there is an increase or not between the 3-day and 14-day average.

Is Increase?

IF [3 Day Moving Avg] > [14 Day Moving Avg] THEN 1 ELSE 0 END

Is Decrease?

IF [3 Day Moving Avg] <= [14 Day Moving Avg] THEN 1 ELSE 0 END

I’m using 1s and 0s as it’s going to help with a later calculation.

NOTE – I’m assuming that if there is ‘no change’ it’ll be recorded as a decrease. This is how I interpreted the requirement, “ …whether it is an increase or a decrease (or no change)” and it wasn’t easy to find any matches anyway.

I also need some text to indicate the increase or decrease

Increase | Decrease

UPPER(IF [Is Increase?]=1 THEN ‘Increase’ ELSE ‘Decrease’ END)

The UPPER is used as that’s part of the tooltip formatting.

Let’s get these onto the view, always making sure the table calculations are set to Report Date.

We need to calculate the number of days that has been reported INCREASE in succession, and the number of days where successive DECREASE has been reported.

So first, let’s identify which rows match the previous row.

Match Prev Value?

LOOKUP([Is Increase?],-1) = [Is Increase?]

If the value of the Is Increase? field in the previous (-1) row is the same as the Is Increase? field in the current row, then this is true, else false.

Add to the view, and verify the table calculation for itself and all nested calculations being referenced, is set to Report Date.

We now have all the information we need to help us work out the number of days in the increase/decrease ‘trend’.

Days in Trend

IF (FIRST()=0) OR(NOT([Match Prev Value?])) THEN 1
ELSEIF [Increase | Decrease] = ‘INCREASE’ THEN ([Is Increase?]+PREVIOUS_VALUE([Is Increase?]))
ELSEIF [Increase | Decrease] = ‘DECREASE’ THEN ([Is Decrease? ]+PREVIOUS_VALUE([Is Decrease? ]))
END

If the row in the table is the very first entry (so there’s nothing previous to compare against), or the row in the table didn’t match it’s predecessor (ie there was a change), then we’re starting a new ‘trend run’, which obviously starts at 1.

Otherwise, if the current row we’re on indicates an increase, then we’ll add the value of the Is Increase? field (which is 1) to the previous value (which is also 1). PREVIOUS_VALUE works recursively though, so it essentially builds up a running sum, which gives our trend.

We ultimately do the same thing using the Is Decrease? column. This is why using 1 & 0s in the earlier calculation help.

Adding into the view, and setting the table calculation correctly, you should get something similar to this…

Finally, there’s one key field we need to add; something to help identify the latest row as we will need it for filtering in the table that’s displayed on the dashboard. Simply applying a standard ‘quick filter’ won’t work, as the table requires we show the 3-day & 14-day moving averages. A ‘quick filter’ to limit the data to the latest date (7th June), will show the wrong values, as the data related to the other days will be filtered out, so the table calc won’t have the information to correctly compute over.

We need to create another table calculation that we can use as a filter, and that due to Tableau’s ‘order of operations’ will apply later in the filtering process than a traditional quick filter.

Max Date

{FIXED : MAX([Report Date])}

The latest date in the whole data set.

Show Data for Latest Date

LOOKUP(MIN([Report Date]),0) = MIN([Max Date])

If the Report Date of the current row is the same as the maximum date in the whole data set, then return true.

We’ve now got all the core data components we need to create the various charts.

In the interest of time (my time in writing this out), I’m going to attempt not to describe the building of all the charts in too much detail, but just call out the useful bits you might need. If you’re attempting this challenge with the table calcs above, I’m assuming you know Tableau enough to not need everything defined to the lowest level.

The whole report is driven off a parameter which the user must enter a State – County combo.

You’ll need a calculated field to store the combo

State – County

[Province State Name] + ‘ – ‘ + [County]

and then create a parameter (State – County Parameter) off of this (right click, Create -> Parameter) which will create a string parameter with all the permutations.

When displaying on the dashboard, set this to be of type Type In

BAN

The BAN is a basic summary of the latest trend for the entered state county.

We need to filter the sheet to the value entered in the parameter

Is Selected State County?

[State – County Parameter] = [State – County]

Add this to the Filter shelf as true, along with the Show Data for Latest Date.

Add the relevant fields to the Text shelf to display the required text. The Report Date needs to be custom formatted to ddd, mmm d to get the Sun, Jun 7 display

Map

For the map, as well as filtering the latest date, we’re also going to need to filter just to the state only (not state & county) as above. So I created

Is Selected State?

LEFT([State – County Parameter], FIND([State – County Parameter],’-‘)-2) = [Province State Name]

This is unpicking the State – County combined string stored in the parameter, to just find the State part and compare to the Province State Name.

Build a filled map based on County and filter to the latest date and the selected state. I set the Map Layers to that below, which seems to match up

You’ll need to set both the Is Selected State County? and Increase|Decrease fields to the Colour shelf.

Bar & Line Chart

You’re going to need a few more calculated fields for this.

Moving Avg Selector

for the user to choose what the line should display. I’ve set it to an intger parameter that displays text

We then need a field to show on the display depending on what’s been selected in the parameter

Moving Avg to Display

If [Moving Avg Selector] = 3 THEN [3 Day Moving Avg] ELSE [14 Day Moving Avg] END

You’ll need a Dual Axis chart plotting New Cases and Moving Avg to Display against Report Date (continuous exact date)

The data only starts from 8th March, so I added Report Date to filter to start from 8th March. 8th March is also added as a constant reference line.

Table

Based on the State / County entered, the table is filtered to show the data for the latest date for all the counties in the state entered. Although not stated in the requirements, the first row is the county selected, with the rest ordered by Reported Cases.

You can get the selected county to the top, by adding Is Selected State County as a hidden field to the Rows, and moving ‘True’ to the top.

And that should be everything you need to build the dashboard, which is pretty much just stacking all the sheets one on top of each other in a single column.

My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you build a dynamic date drilling chart?

Week 19 of #WOW2020 saw Ann provide this challenge, which she declared would be easier than Luke’s from the previous week.

The aim was to start with a chart by week, but on selection drill in to show a by day view, with the slight twist that all days between the first and last selected dates should display, even if the specific week in between wasn’t explicitly selected.

I knew this was going to involve set actions (although there was a clue on the ‘latest challenges‘ page – I tend to pick up the challenges through Twitter, so often bypass this page, but it’s worth a look for extra clues :-))

So on the face of it, the challenge seemed as if it should be ok, but it’s usually only when I start building that things can start to unravel.

One area that Ann hadn’t been explicit about in the requirements, was the behaviour in certain scenarios. I had to revise some of my initial attempts/calculations in order to match Ann’s workbook. This week I’m not going to go into all the wrong turns, but this is the behaviour I observed in Ann’s workbook, and so ultimately tried to replicate.

  1. When at the week level, if multiple points are selected, the days then displayed should start from the first day of the first week selected (which will be a Sunday as we’re working with how the US standardises a week start), and finish on the last day of the last week selected (which will be a Saturday).
  2. When at the week level, if a single point is selected then the 7 days in that week only should be displayed.
  3. When at the day level, if multiple points are selected, the days then displayed start from the first day of the week the first day is in and end on the last day of the week the last selected day is in (so you may get more days than actually selected). eg if the 1st day selected is a Tuesday and the last day selected is a Friday, the ‘drill in’, will start on the previous Sunday, and end on the following Saturday. This isn’t necessarily what you may expect to happen.
  4. Based on the above, when at the day level, a single day is selected, the ‘drill in’ will show the 7 days in the same week. Once you’re down to displaying the 7 days in the same week, clicking on any single day or selecting a couple of days, won’t appear to do anything as the same results are displayed.

Being aware of the above, will explain why some the calculations I end up with look the way they do.

Ok, let’s get on with the build…..

Ann stated that she’d hardcoded to the ‘last 52 weeks’, but essentially it looked like she was displaying data for all of 2019. This wasn’t really something the challenge was testing, so I simply started by adding a data source filter for the year 2019 (right click on data source -> Edit data source filters). I also had to set the date properties of my data source to set a week to start on a Sunday as I’m UK based, so my week’s are defaulted to start on Mondays (again right click on data source -> Date Properties). As a result of my year filter, I didn’t seem to have the same starting/ending dates as Ann, but as already stated, this wasn’t the main aim of the challenge, so I didn’t stress about it.

I decided quite early on that I was going to also use a Parameter Action to decide the level I was at (ie had I drilled in or not). I wasn’t entirely sure when I started how many ‘levels’ I might need, so chose to use an integer parameter for this. I ended up only needing 2 levels, so a boolean could have worked equally as well, or, as I type, I think I could have used this parameter to store the ‘date level’ (day or week) I want to display my dates at, which would have made some of my calculations easier to read. I’m not going to do this though.

Drill Down

Integer parameter set to 0 by default.

Now I want to define a date field that I’m going to use on the axis, that will vary depending on the ‘level’ we’re at.

Date to Plot

IF [Drill Down]>0 THEN DATETRUNC(‘day’, [Order Date]) ELSE
DATETRUNC(‘week’, [Order Date])
END

Note – I used >0 as I wasn’t sure if I’d have levels 0, 1 & 2… as it turns out I just used 0 & 1 in the end.

Add Date to Plot to Columns as a Continuous, Exact Date and Sales to Rows and we’ve got our starting point

Change the Drill Down parameter to 1 and the chart will change to display at the day level

Drilling Down on selection

On selection, we want to add the selected dates into a set, so first up, we need to define that set.

Right click on Date to Plot and Create -> Set

Selected Dates

Select some random dates so we can test with. These will get set properly later based on the Set Action we define on the dashboard.

Based on the dates in the set, we need to determine a min and a max date we can then use to restrict the dates being plotted on the chart.

Min Date

{FIXED:MIN(IF [Selected Dates] OR [Drill Down]=0 THEN DATETRUNC(‘week’,[Date to Plot]) END) }

This either gets the 1st day of the week based on the earliest date in the set, or the 1st day of the week of the whole data set (when we’re at the ‘starting’ level with Drill Down =0).

We also need

Max Date

IF [Drill Down]=0 THEN
{FIXED:MAX(DATEADD(‘day’,-1,[Date to Plot]))}
ELSE
DATEADD(‘week’,1,{FIXED:MAX(IF [Selected Dates] THEN DATETRUNC(‘week’,[Date to Plot]) END)})-1
END

If we’re at the starting level (Drill Down = 0) then we want the last day within the latest week in the data set (this field is used in the title display, so necessary to get this to display right), otherwise we need to get the last day of the week associated to the maximum date in the set.

Now we need to be able to restrict the dates displayed in the chart based on these

Dates to Include

[Order Date]>= [Min Date] AND [Order Date]<= [Max Date]

Add this to the Filter shelf and set to True.

Your display shouldn’t change, as we’re still at ‘level 0’, even though our set has random dates selected. If you now manually change Drill Down to 1, you should see a change

Setting up the dashboard actions

Ok, now we’ve got the basic idea, let’s get it all working properly with dashboard actions before we sort out all the other bits n bobs.

First up, let’s manually reset everything by setting the Drill Down parameter to 0 and emptying all the values selected in the set.

Add the sheet onto the dashboard, and create the set action, which is set to target the Selected Dates set and to Keep set values if you click on a blank area of the chart after selection.

However, this on it’s own won’t change the display. We need to set the Drill Down parameter to 1 too.

For this we need another field

Set Drill Down Level

1

Add this to the Detail shelf of the chart.

Then back on the dashboard, add a Parameter Action that targets the Drill Down parameter using the value stored in the Set Drill Down Level field

Now if you select the dates in the dashboard, you should get the desired behaviour, and if you select again, you should filter the days selected further (as per the behaviour described at the top of the page).

Resetting the display

The Reset button is actually another sheet.

I created a ‘fake’ bar chart by adding MIN(1) to Rows, and double clicking in the space below the Detail and Tooltip shelves on the Marks card, and typing the text ‘CLEAR SELECTION’. This creates a ‘pill’ without having it defined explicitly as a calculated field, and I added this to the Text shelf, and centred/formatted appropriately. I then set the axis to be fixed from 0 to 1 and hid it.

I set the colour to #a26dc2 and set the text to ‘match mark colour.

We only want this sheet to display, if we’ve ‘drilled down’, so I need

Show Reset

[Drill Down]=1

which is added to the Filter shelf and set to True.

I also need a parameter action off this sheet, to reset the Drill Down to 0 on selection. For this I need another field

Reset

0

which is added to the Detail shelf.

Depending on what you’ve already been playing around with, there’s a chance this sheet may already be empty. Get the dashboard into a state where you’ve drilled down to the day level, then add this sheet, and add another Parameter Action.

Reset is set to run off the ‘button’ sheet only, to target the Drill Down parameter by using the value in the Reset field.

You should now be able to test all this out and get the desired behaviour.

Adding Animations

To get the chart to transition between selections, we need to use the Animations functionality.

On the Format menu, select Animations and adjust the settings as you choose.

Play around and you should have the main features of this challenge now working

Dynamic Title

The title needs to change based on whether you’re at the weekly or daily level. It also needs to show total and average sales. So for all this I need

LABEL: Level

IF [Drill Down]>0 THEN ‘Day’ ELSE ‘Week’ END

LABEL: Subtitle Level

IF [Drill Down]>0 THEN ‘Daily’ ELSE ‘Weekly’ END

LABEL: Instruction

IF [Drill Down] = 0 THEN ‘SELECT WEEKS TO DRILL DOWN TO DAILY VIEW’
ELSE ‘CLEAR SELECTION USING BUTTON’
END

Total Sales

WINDOW_SUM(SUM([Sales]))

Avg Sales

WINDOW_AVG(SUM([Sales]))

All these fields are added to the Detail shelf of the main chart along with Min Date and Max Date, and then the title is edited and formatted accordingly to reference these.

Tooltips

In a similar manner, the tooltips also need adjusting, they just need to refer to the LABEL: Level field

Colour of the Line

Although not explicitly mentioned, the line colour seemed to change from a lighter shade at the weekly level to more intense at the daily level. To achieve this I created

COLOUR

[Drill Down]

and added to the Colour shelf. This gives a colour legend displaying either 1 or 0 depending at what state you are in the interaction. Set colour values accordingly.

Average Line

Add this to the chart simply by selecting Average Line from the Analytics pane and dragging then dropping onto the Table shelf that appears. Format appropriately.

And apart from other basic formatting to remove gridlines/ axis titles etc, that should be it.

My published viz is here.

Happy vizzin’! Stay Safe!

Donna

How does the Sales Pipeline look?

Luke posted the challenge this week to build a Sales funnel using bar charts and table calculations.

The data provided was structured as follows

Each row identified a record in the pipeline along with its value. The last_stage field indicated what stage in the process the record was currently at. The value and last_stage are the only two bits of data needed for this challenge, along with the knowledge that the process moves through the stages in the following order :

  1. Prospect
  2. Lead
  3. Qualified
  4. Opportunity
  5. Negotiations
  6. Closed

which you can tell from the diagram anyway.

The challenge is that the value of any record currently at stage 2 (lead) or above, also needs to be incorporated into the total value of the previous stages.

When tackling these types of challenges involving table calculations, I start by creating a table view of all the data I need. So let’s get cracking…

Building up the data required

First up, I need a field to help drive the order of the stages, so I created

last_stage Order

CASE [last_stage]
WHEN ‘Prospect’ THEN 1
WHEN ‘Lead’ THEN 2
WHEN ‘Qualified’ THEN 3
WHEN ‘Opportunity’ THEN 4
WHEN ‘Negotiations’ THEN 5
ELSE 6
END

Add this to a view along with the value

This is essentially the data needed to build the Current Status column in the output.

Next we need to work out the Overall Funnel values. As Luke stated in the requirements, this is going to involve table calculations. What we want is a running sum but one that starts at the bottom at stage 6, and goes ‘up’ the table. But there is no Table Up option in Tableau, so we need to be creative.

We’ll still need a running sum though, so create this

Running Sum

RUNNING_SUM(SUM([value]))

Add this to the view, it will automatically be applied Table Down, which is what we need at this point – each row is the sum of the previous rows above it.

We also need the overall total captured against each row in the table.

Window Sum

WINDOW_SUM(SUM([value]))

Now we’re in a position to work out our ‘cumulative’ value or Overall Funnel, which is a calculation involving all 3 fields

Cumulative Value

[Window Sum]- ([Running Sum] – SUM([value]))

Note – for the values that will be displayed, I’ve applied formatting of $ to 0 dp.

Now onto the final Percent to Close calculation which is the value of records at stage closed, as a proportion of the cumulative value. So we need to get the closed value stored against every row

Closed Value

{FIXED : SUM(IF [last_stage order] = 6 THEN [value] END)}

so then we can determine

% To Close

SUM([Closed Value]) / [Cumulative Value]

which is formatted to percentage to 0 dp

So now we have all the data we need to build the viz. I like to save this sheet for future reference, and to double check as we start moving pills around to build the viz.

Building the viz

Start by duplicating the table sheet, and remove the calculations, so only the 3 necessary (Sales, Cumulative Value & % To Close) are listed.

Then move these 3 onto the Columns shelf, and in doing so, the viz should reformat as a bar chart automatically.

Set the colour of the bars to the dark green (#00646d) and tick the Show mark labels option on the Label shelf (you may need to expand the width of the rows to get the label to show.

The first 2 columns are displaying what we need, but the final one needs to show the bars ‘filling up’ to 100%. We need a dual axis for this.

Type into the Columns shelf Min(1) to create a fourth column. Uncheck show mark labels for this column only.

The set this field to be Dual axis, and synchronise the axis. It’s likely the marks will all change to circles, so reset the ‘All’ marks card back to bar. On the Min axis, right click and select Move marks to back, to ensure the % Closed values are sitting on top.

Now change the colour of the Min(1) field to light green (#7cadb2), untick Show Header against last_stage order & SUM([Value]) to hide the axis. Also right click on last_stage in the view, and select Hide field labels for rows.

Finally adjust the size of the labels displayed to 8pt. On the % To Close marks card, change the label alignment to left, and set the font colour to white. Uncheck Show Tooltips for all marks.

And you should now have the main display.

Oh, adjust the Size of the bars to suit.

When added to the dashboard, add a horizontal container above the viz, and use text boxes to display the column titles.

My published viz is here.

Happy vizzin’!

Donna