Can you find the needle in the haystack?

It was Candra’s turn to ‘set’ the #WOW2021 challenge this week providing a hint in the challenge description that the solution would involve sets.

As with many challenges, I built the data out in tabular format to start with to verify I had all the components and calculations correct. The areas of focus are

  • Identify number of distinct customers per product
  • Identify overall average number of distinct customers per product
  • Identify if product above or below average distinct customers
  • Identify Top 50 products by Sales
  • Identify Unprofitable Products
  • Identify products that are both in the top 50 AND unprofitable
  • Building the viz

Identify number of distinct customers per product

To start off, add Product Name, Sub-Category, Category to the Rows shelf to begin building out a table. Add Sales (formatted to $k 0dp) and Profit (formatted to $k 0dp with negative values as () ) to Text and sort by Sales descending.

To identify the distinct customers per product, we can create

Customer Count per Product

{FIXED [Product Name] : COUNTD([Customer ID])}

Add this to the view.

Identify overall average number of distinct customers per product

What we’re looking for here is the average of all the values we’ve got listed in the Customer Count per Product column. Ie we want to sum up those values displayed and divide by the number of rows.

The number of rows is equivalent to the number of products, which we can get from

Count Products

{FIXED : COUNTD([Product Name])}

And so to get the overall average we calculate

Avg Overall Customer Count

{FIXED: SUM([Customer Count Per Product])} / [Count Products]

Add these fields to the view as well, so you can see how the values work per row. The last two calculations give you the same value across all rows.

Identify if product above or below average distinct customers

Given the above display, this is just a case of comparing values in 2 columns

Higher than Avg Customer Count

AVG([Customer Count Per Product]) > SUM([Avg Overall Customer Count])

this returns true or false – add this to the view too.

Identify Top 50 products by Sales

We can create a set for this. Right click on Product Name > Create > Set. Name the set something suitable eg Top 50 Products, and on the Top tab, state the number (50) and the field (Sales) and the aggregation (Sum)

Add this to the view, and if you’ve sorted by the sales, you should find the top 50 rows are all In the set, and the rest are Out.

Identify Unprofitable Products

We can use another set for this. Again create a set off of Product Name, call it Unprofitable Products, and on the Condition tab, set the condition so that the Sum of Profit is less than 0

Add this onto the view too.

Identify products that are both in the top 50 AND unprofitable

For this, we’re explicitly looking for the rows that are both In the Top 50 Products set and In the Unprofitable Products set.

We can use the Combined Set functionality to do this.

In the left hand data pane, select both the Top 50 Products and the Unprofitable Products sets (hold down ctrl to multi select), then right click and Create Combined Set. I called the set Products to Include, and select to combine the sets by including Shared members in both sets

If you then add this field to the Filter shelf, you will be left with just the 13 Products that match

This is the single filter field you can use as per Candra’s requirements.

Building the viz

To get the text to display to the left of the bar, you actually need to create a ‘fake’ bar chart.

  • Add Products to Include to Filter
  • Add Product Name to Rows
  • On the Columns shelf, double click and type in MIN(1)
  • Add Sales to Columns to the right of MIN(1)
  • Sort by Sales descending

Against the MIN(1) marks card

  • Change the Size to small
  • Set the Opacity of the Colour to 0% and the border to None
  • Add Product Name, Sub-Category and Category to the Label shelf and adjust accordingly, aligning left
  • Increase the height of each row to make the text visible

On the Sales marks card

  • Add Higher than Avg Customer to the Colour shelf and adjust
  • Show mark labels
  • Create a new field Profit Ratio : SUM([Profit])/SUM([Sales]) Format to % with 0dp and add to Tooltip
  • Add Profit, and Customer Count by Product to Tooltip and adjust accordingly

Finally, uncheck Show Header against Product Name and MIN(1) and Sales and format the borders/gridlines etc. Add the title, then add to the dashboard.

All done (I hope…)! My published version is here.

Happy vizzin’! Stay Safe!

Donna

Generation Population

For week 17 of #WOW2021, Sean Miller decided to challenge us with recreating a chart by Nathan Yau (see here for the original). The aim was to recreate within 1 sheet, which I managed to do. So how did I do it? Read on 🙂

  • The groundwork
  • Colouring the bars
  • Adding the year labels
  • Final formatting

The groundwork

The chart itself follows a straight forward structure of multiple blue dimension fields on Columns with a green measure on Rows similar to this view below based on Superstore Sales data – it’s just formatted a bit more creatively!

The data provided just contains 3 fields : SEX, AGE and 2019 Population. We want to present the 2019 Population measure for the Total SEX only by Year. We need to create the Year field, which is simply

Year

2019-[AGE]

I dragged this into the ‘dimensions’ section of the data pane (above the line).

This allows us to create the basic bar chart required

We now need to define the various fields that we will need to add as additional dimensions on the Columns shelf to create the ‘generation’ data panes.

Generation

IF [Year]<= 1927 THEN ‘Greatest
Generation’
ELSEIF [Year]<= 1945 THEN ‘Silent Generation’
ELSEIF [Year]<=1964 THEN ‘Baby Boomer’
ELSEIF [Year] <= 1980 THEN ‘Generation X’
ELSEIF [Year]<=1996 THEN ‘Millennials’
ELSEIF [Year]<=2012 THEN ‘Generation Z’
ELSE ‘Gen
Alpha’
END

NOTE – there is a deliberate carriage return in the condition for ‘Greatest Generation’ and ‘Gen Alpha’ which will force the field to ‘wrap’ when displayed.

Having defined the above, we need to determine

Total Population Per Generation

{FIXED [Generation], [SEX]: SUM([2019 Population])}

and then

% of Total Population

SUM([Total Population Per Generation]) / TOTAL(SUM([Total Population Per Generation]))

NOTE – to create this field, I originally created a ‘quick table calculation’ against the Total Population Per Generation field which I’d displayed on a view, and then dragged the resulting pill into the measures pane to create the new field with the desired calc.

Let’s put these in a table, so we can then check the values, and see that the 2nd and 3rd columns are the same value for each row associated to a particular generation, which is what we need.

Right, so now we need to determine the rank based on the Total Population Per Generation

Rank

RANK_DENSE(SUM([Total Population Per Generation]))

Format this to a custom number with 0 decimal places, but prefixed with #

When added to the table we get

The intention, is that Rank will be displayed as discrete ‘header’ pill rather than a measure, so let’s move Rank to be the 1st pill on the Rows shelf and change to be discrete.

But we need the Total Population Per Generation and % of Total Population fields to be combined into a single pill. So we need to do a bit of string manipulation/ number formatting for this

Total | Percent

STR(ROUND(SUM([Total Population Per Generation])/1000000,1)) + ‘M’ + ‘ | ‘ + STR(ROUND([% of Total Population] * 100,1)) +’%’

This looks complicated, but its because even though you may have applied the relevant display number formatting against the individual numeric measures of Total Population Per Generation and % of Total Population, the formatting is not preserved, when converted into a string field, which this field needs to be. So the relevant calculations need to be applied within the field itself.

This outputs the below

Now we need a way to sort the data so the ‘Greatest Generation’ associated to the earliest years is listed first. I did this by determining the minimum date within each Generation.

Min Year Per Generation

{FIXED [Generation], [SEX]: MIN([Year])}

Add this into the view as the first pill in Rows, and the data should automatically sort from lowest to highest

We can now build the viz – duplicate the table sheet, remove Total Population Per Generation and % of Total Population from the Measure Values section. Drag 2019 Population to Columns, then click the swap rows & columns button :

Colouring the bars

The bars are coloured based on each ‘Generation’ pane. You could hardcode this along the lines of ‘Generation = x OR Generation = y or Generation = z etc’ where x, y and z etc are generations of the same colour. This would return true or false, which you can then add to the colour shelf and adjust accordingly.

I decided to be a bit more dynamic, deciding I wanted to set the colour based on whether it was an odd or even pane.

For this I created another ‘rank’ field based on the field I’d used to ‘sort’ the data, the Min Year Per Generation field.

Sort Position

RANK_DENSE(MIN([Min Year per Generation]),’asc’)

If you add this into the data table, you’ll see each section is numbered 1 -7

From this, we can then determine if the number is even (or not)

Sort Position is even number

[Sort Position]%2=0

Add this onto the Colour shelf which will return True or False and colour accordingly.

Adding the year labels

The year labels are achieved by using a dual axis chart, to plot a point for each specific year (based on the Min Year Per Generation field) at some arbitrary value.

Point to Plot Year Label

IF [Year]=[Min Year per Generation] AND [Year]<>1919 THEN 4700000 END

For each ‘min year’ that isn’t 1919, plot a value at 4.7M.

Add this field to the Rows shelf, change mark type to circle, reduce size to as small as it can, and set the colour transparency to 0.

Add Min Year Per Generation to the Label shelf, then change the alignment to vertical.

Now you make the chart dual axis and synchronise the axis. Some of the colours/marks may change, so reset by removing Measure Names from the Colour shelf and changing the mark types bar to bar & circle.

Final formatting

So at this point all the main components are there. It’s now a case of formatting – removing right and bottom axes, removing gridlines. The vertical dashed lines, are column dividers, set at the pane level only.

The solid left hand axis is set via

The text is formatted using the fonts advised in the requirements and sizes adjusted to suit.

Add on a tooltip, and set the background colour of the worksheet and you should be done.

My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you use Quick LoDs to recreate this view?

It was Lorna’s turn to set the challenge this week, and she took the opportunity to ask us to use a new feature in Tableau v2021.1 – Quick LoDs (you’ll obviously need v2021.1 to use the functionality, but the LoDs can be created manually in earlier versions if need be).

This blog will focus on

  • Creating the Sub-Category Average Sales by Category LoD using Quick LoDs
  • Formatting the difference calculation
  • Colouring the bars
  • Text for Tooltip
  • Putting it all together

The viz itself isn’t that complex once you’ve nailed the LoD, so lets’ start with that bit.

Creating the Sub-Category Average Sales by Category LoD using Quick LoDs

The what…? It’s a bit of a mouthful,.. the “Sub-Category Average Sales by Category”. What we’re essentially after here is the Total Sales per Category / No of Sub-Categories in the Category. To do this with LoDs, we first need to create an LoD to represent the total sales in each Sub-Category.

As mentioned in the requirements / referenced in the KB article above, you use a combination of ctrl/command click & drag to create an LoD via the Quick LoD feature. In this case I dragged the Sales measure onto the Sub-Category dimension, and this automatically created

Sales (Sub-Category)

{ FIXED [Sub-Category]: SUM([Sales]) }

which is the sales per sub-category.

From this, I then dragged this measure onto the Category dimension, which automatically then created

Sales (Sub-Category) (Category)

{ FIXED [Category]: SUM([Sales (Sub-Category)]) }

BUT I then edited this to change the aggregation to AVG, so the field became

{ FIXED [Category]: AVG([Sales (Sub-Category)]) }

This gives the average value required, which you can see is the same across the rows in a single Category :

Formatting the difference calculation

The viz displays the % difference between the Sub-Category sales and the average. This is calculated with

Difference

(SUM([Sales])- SUM([Sales (Sub-Category) (Category)])) / SUM([Sales (Sub-Category) (Category)])

This is then custom formatted as ▲0%;▼0% (I use this site to get my shapes from).

Colouring the bars

The bars need to be coloured based on the value of the Difference field, so another calculated field is required

Colour

[Difference]>0

This will just return true or false and can dropped on the Colour shelf of the bars.

Text for Tooltip

Within the tooltip, the % difference is displayed, along with some text which indicates if it is above or below the average. For this we need another calculated field to reference in the Tooltip.

Above | Below

IF [Difference]>0 THEN ‘Above’ ELSE ‘Below’ END

Putting it all together

With all the calculated fields built, the the chart itself is a relatively simple dual axis chart

  • Add Category then Sub-Category to Rows
  • Add Sales to Columns and sort descending
  • Add Sales (Sub-Category) (Category) to Columns. Make dual axis and synchronise axis
  • Click on the All marks card and remove Measure Names from the Colour shelf.
  • Click on the Sales marks card and change mark type to bar; add the Colour field to the Colour shelf and adjust accordingly; add Difference to the Label shelf and format appropriately.
  • Click on the Sales (Sub-Category) (Category) marks card and change mark type to Gantt; adjust the Size to be as large as possible; set the colour to the relevant grey.
  • On the All marks card, add the Above | Below field to the Tooltip shelf, then edit the Tooltip on the All marks card to create the required text.
  • Finally, remove axis, remove the column heading labels, remove grid lines and column borders and format the displayed text appropriately. Title the viz.

A very brief post today, but hopefully I’ve ticked off all the required elements, My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you build a Customer Lifetime Value Matrix?

Ann Jackson challenged us this week to build this matrix depicting the average worth of customer cohorts during their lifetime.

This challenge involves a mix of LoDs (Level of Detail calculations) and table calculations.

First up , we need to define our customer cohorts (ie group the customers), which for this challenge is based on identifying the quarter they placed their first order in. This will involve an LoD calculation. For a good introduction to LoDs with some worked examples (including a similar cohort analysis example), check out this Tableau blog post.

ACQUISITION QUARTER

DATE(DATETRUNC(‘quarter’,{FIXED [Customer ID] : MIN([Order Date])}))

The 2nd part of the formula in the { … } returns the earliest Order Date associated to the Customer ID, which is then truncated to the 1st day of the quarter that date falls in ie 23 Feb 2019 is truncated to 01 Jan 2019.

For the ‘quarters since birth’ field, we need to calculate the difference in quarters, between the ACQUISITION QUARTER and the ‘quarter’ associated to the Order Date of each order in the dataset.

QUARTERS SINCE BIRTH

DATEDIFF(‘quarter’,[ACQUISITION QUARTER],DATETRUNC(‘quarter’,[Order Date]))

Drag this field into the ‘dimensions’ area of the left hand data pane (above the line if you’re using later versions of Tableau).

Lets sense check what this looks like, by adding

  • ACQUISITION QUARTER to Rows (Discrete, Exact Date)
  • ORDER DATE to Rows, set to Quarter (quarter year ie May 2015 format which will make a green pill), then set to discrete to change to blue
  • QUARTERS SINCE BIRTH to Rows

You can see that while the first row against each cohort starts with a different quarter, the QUARTERS SINCE BIRTH always starts at 0 and counts sequentially down the table.

Next we want to count the number of distinct customers in each cohort, and we’ll use another LOD for this.

CUSTOMERS

{FIXED [ACQUISITION QUARTER]: COUNTD([Customer ID])}

Once again move this field into the Dimensions section of the data pane.

Add this onto the Rows of the above data table, and you should get every row for the same cohort displaying the same number

Add Sales onto Text to get the value of sales made by the customer in each cohort in each quarter. The ‘customer lifetime value’ we need is defined as the total sales so far / number of customers in the cohort.

Remove the QUARTER(Order Date) field from the table, as we’re not going to need this for the display, and it’ll affect the next steps if it’s left.

To get the cumulative sales, we need a Running Total Quick Table Calculation. Click on the Sales pill on the Text shelf and select Quick Table Calculation -> Running Total. The click again and Compute By -> QUARTERS SINCE BIRTH. Add Sales back into the table, so you can see the quarterly Sales value and how it’s cumulating until it reaches the next cohort.

We’ve now got the building blocks we need for the CLTV value we need to plot

Avg Lifetime Value

RUNNING_SUM(SUM([Sales])) / SUM([CUSTOMERS])

Note – I purposefully haven’t called this field what you might expect, as I’m going to ‘fill in the gaps’ that Ann describes in the requirements, and I’ll use that name then.

Pop this field into the table above, again setting the table calculation to compute by QUARTERS SINCE BIRTH

You can now use the data table above to validate the calculation is what you expected.

Now let’s build the viz out.

On a new sheet

  • QUARTERS SINCE BIRTH to Columns
  • ACQUISITION QUARTER (exact date, discrete blue pill) to Rows
  • Avg Lifetime Value to Text, setting the table calculation to Compute By QUARTERS SINCE BIRTH

From this basic text table, you can see the ‘blank’ fields, Ann mentioned. In the data table view, it’s not so obvious. The blank is there because there are no sales in those quarters for those cohorts. To fix we need another table calculation

CUSTOMER LIFETIME VALUE (CLTV)

IF ISNULL([Avg Lifetime Value]) AND NOT ISNULL(LOOKUP([Avg Lifetime Value],-1)) AND NOT ISNULL(LOOKUP([Avg Lifetime Value],1))

THEN LOOKUP([Avg Lifetime Value],-1) ELSE [Avg Lifetime Value] END

This says, if the Avg Lifetime Value field is NULL but neither the previous or the subsequent values are NULL, then use the Avg Lifetime Value value from the previous column (LOOKUP).

Replace the Avg Lifetime Value with the CUSTOMER LIFETIME VALUE (CLTV) field (setting the Compute By again), and the empty spaces have disappeared.

If you hover over the cells in the lower right hand side of the view, you’ll see tooltips showing, indicating that a mark has been drawn on the viz with Null data. To fix this, add CUSTOMER LIFETIME VALUE (CLTV) to the Filter shelf and specify non-null values only to show via the Special tab.

Now if you hover over that area you don’t get any tooltips displaying, as there aren’t any marks there.

Now it’s just a case of formatting the viz a bit more

  • Add CUSTOMERS to Rows
  • Add CUSTOMER LIFETIME VALUE (CLTV) to the Colour shelf by holding down the Ctrl key then clicking on the field that’s already on the Text shelf, and dragging via the mouse onto the Colour shelf. Using Ctrl in this way has the effect of copying the field including the table calculation settings, so you don’t need to apply them again. This will change the colour of the Text.
  • Then change the mark type to Square, which will then fill out the background based on the colour.
  • Then edit the colour legend to the relevant palette (which you may need to install via Ann’s link).
  • Set the border of the mark via the Colour shelf to white
  • Remove the row & column dividers
  • Set the row Axis Ruler to a dark black/grey line
  • Format the 1st 2 columns so the font is the same and centred. Widen the columns if required.
  • Update the tooltip

And then you should be ready to add the viz to your dashboard. My published version is here.

This blog is a bit more detailed that my recent posts, but I’m also conscious I’ve skipped over some bits that if you’re brand new to Tableau, you may not be sure how to do. Feel free to add comments if you need help!

Happy vizzin’! Stay Safe!

Donna

Can you create a small multiple waterfall chart?

For this week’s #WOW2020 challenge, Lorna Brown asked us to recreate a waterfall chart – a chart style that hasn’t featured in many previous challenges (if at all), and is always a useful one to know how to build.

I’m familiar with these, and this challenge didn’t cause me too many issues, so this blog is going to be brief.

  • Building the waterfall
  • Small multiple / grid layout
  • Adding the month label

Building the waterfall

So I’d built out the basic waterfall for each month and day by plotting Year(Order Date), Month(Order Date) and Day(Order Date) on Columns. The Day(Order Date) field was set to Show Missing Values so each day without an order was still plotted, and I was trying to figure out how to get the additional ‘long’ bar at the end.

I worked out it was essentially a ‘total’ bar and when I duplicated my data as crosstab and played round with the data in a tabular form, I got the subtotals I needed displayed.

But I seemed to be having issues displaying these on the chart view. So I turned to my usual route, Google, and had a search, and came across this blog from Tim Ryan at The Data School, which gives you the complete guide to building the waterfall, so there’s no need for me to repeat it all – thanks Tim! 🙂

My issue was I had a green continuous Day(Order Date) field rather than a blue discrete one – doh!

The only couple of things you need to make note of – you need to ensure you have 0 displayed for the missing dates

Actual Profit

ZN(SUM([Profit]))

and the gantt bars should be coloured red for negative profit, blue for positive and grey for the missing days

Colour

IF SUM([Profit])<0 THEN ‘Red’ ELSEIF SUM([Profit]) > 0 THEN ‘Blue’
ELSE ‘Grey’
END

Small multiple / grid layout

For this you need fields to add to the Rows and Columns shelf that position the month in the appropriate cell.

The Quarter(Order Date) (blue discrete) on Rows, is just used to define the row a month lands in.

You then need

Cols

IF MONTH([Order Date])%3 = 0 THEN 3
ELSE MONTH([Order Date])%3
END

which assigns each month a value of 1,2 or 3. You’ll need this on the Columns shelf.

Adding the month label

The label shows the month and the total profit in the month, so I created an LOD for this

Profit for Month

{FIXED YEAR([Order Date]),MONTH([Order Date]): SUM([Profit])}

From this I wanted the maximum value of all the monthly profits

Max Monthly Profit in Year

{FIXED Year([Order Date]): MAX([Profit for Month])}

I then used a dual axis to plot this field on the Rows, set the mark type to a line and set the opacity of the line colour to 0%, so it disappears.

The month and value were then added to the Label shelf and the label set to Label start of line only and also right aligned to get the required positioning.

And that’s it. I said this would be brief 🙂 My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you show the Year-on-Year Trend?

It’s Community Month over at #WOW HQ this month, which means guest posters, and Kyle Yetter kicked it all off with this challenge. Having completed numerous YoY related workbooks both through work and previous #WOW challenges, this looked like it might be relatively straight forward on the surface. But Kyle threw in some curve balls, which I’ll try to explain within this blog. The points I’ll be focussing on

  • YoY % calculation for colouring the map
  • Displaying the circles on the map
  • Restricting the Date parameter to 1st Jan – 14th July only
  • Showing Daily or Weekly dates on the viz in tooltip
  • Restricting to full weeks only (in weekly view)

YoY % calculation

The data provided includes dates from 1st Jan 2019 to 21st July 2020. We need to be able to show Current Year (CY) values alongside Previous Year (PY) values and the YoY% difference. I built up the following calculations for all this

Today

#2020-07-15#

This is just hardcoded based on the requirement. In a business scenario where the data changes, you may use the TODAY() function to get the current date.

Current Year

YEAR([Today])

simply returns 2020, which I could have hardcoded as well, but I prefer to build solutions as if the data were more dynamic.

CY

IF YEAR([Subscription Date]) = [Current Year] THEN [Subscription] END

stores the value of the Subscription field but only for records associated to 2020

PY

IF YEAR([Subscription Date]) = [Current Year]-1 THEN [Subscription] END

stores the value of the Subscription field but only for records associated to 2019 (ie 2020-1)

YoY%

(SUM([CY])- SUM([PY]))/SUM([PY])

format this to a percentage with 0 decimal places. This ultimately is the measure used to colour the map. CY, PY & YoY% are also referenced on the Tooltip.

Displaying circles on the map

This is achieved using a dual axis map (via a second instance of the Latitude pill on Rows). One ‘axis’ is a map mark type coloured by the YoY% and the other is a circle mark type, sized by CY, explicitly coloured black.

The Tooltip for the circle mark type also shows the % of Total subscriptions for the current year, which is a Percent of Total Quick Table Calculation

Restricting the Date parameter to 1st Jan – 14th July only

As mentioned the Subscription Date contains dates from 01 Jan 2019 to 21 July 2020, but we can’t simply add a filter restricting this date to 01 Jan 20 to 14 Jul 20 as that would remove all the rows associated to the 2019 data which we need available to provide the PY and YoY% values.

So to solve this we need a new date field, and we need to baseline / normalise the dates in the data set to all align to the same year.

Baseline Date

//set all dates to be based on current year
MAKEDATE([Current Year], MONTH([Subscription Date]), DAY([Subscription Date]))

So if the Subscription Date is 01 Jan 2019, the equivalent Baseline Date associated will be 01 Jan 2020. The Subscription Date of 01 Jan 2020 will also have a Baseline Date of 01 Jan 2020.

We also want to ensure we don’t have dates beyond ‘today’

Include Dates < Today

[Baseline Date]< [Today]

Add Include Dates < Today to the Filter shelf, and set to True.

Add Baseline Date to the Filter shelf, choose Range of Dates , and by default the dates 01 Jan 2020 to 14 Jul 2020 should be displayed

Select to Show Filter, and when the filter displays, select the drop down arrow (top right) and change to Only Relevant Values

Whilst you can edit the start and end dates in the filter to be before/after the specific dates, this won’t actually use those dates, and the filter control slider can only be moved between the range we want.

The Baseline Date field should then be custom formatted to mmmm dd to display the dates in the January 01 format.

Showing Daily or Weekly dates on the viz in tooltip

The requirements state that if the date range selected is <=30 days, the trend chart shown on the Viz in Tooltip should display daily data, otherwise it should be weekly figures, where the week ‘starts’ on the minimum date selected in the range.

There’s a lot going on to meet this requirement.

First up we need to be able to identify the min & max dates selected by the user via the Baseline Date filter.

This did cause me some trouble. I knew what I wanted, but struggled. A FIXED LOD always gave me the 1st Jan 2020 for the Min Date, regardless of where I moved the slider, whereas a WINDOW_MIN() table calculation function caused issues as it required the data displayed to be at a level of detail that I didn’t want.

A peak at Kyle’s solution and I found he’d added the date filters to context. This means a FIXED LOD would then return the min & max dates I was after.

Min Date

{MIN([Baseline Date])}

Note this is a shortened notation for {FIXED : MIN([Baseline Date])}

Max Date

{MAX([Baseline Date])}

With these, we can work out

Days between Min & Max

DATEDIFF(‘day’,[Min Date], [Max Date])

which in turn we can categorise

Daily | Weekly

IF [Days between Min & Max]<=30 THEN ‘Daily’ ELSE ‘Weekly’ END

We also need to understand the day the weeks will start on.

Day of Week Min Date

DATEPART(‘weekday’,[Min Date])

This returns a number from 1 (Sunday) to 7 (Saturday) based on the Min Date selected.

Using this we can essentially ‘categorise’ and therefore ‘group’ the Baseline Date into the appropriate week.

Baseline Date Week

CASE [Day of Week Min Date]
WHEN 1 THEN DATETRUNC(‘week’,([Baseline Date]),’Sunday’)
WHEN 2 THEN DATETRUNC(‘week’,([Baseline Date]),’Monday’)
WHEN 3 THEN DATETRUNC(‘week’,([Baseline Date]),’Tuesday’)
WHEN 4 THEN DATETRUNC(‘week’,([Baseline Date]),’Wednesday’)
WHEN 5 THEN DATETRUNC(‘week’,([Baseline Date]),’Thursday’)
WHEN 6 THEN DATETRUNC(‘week’,([Baseline Date]),’Friday’)
WHEN 7 THEN DATETRUNC(‘week’,([Baseline Date]),’Saturday’)
END

Ideally we want to simplify this using something like DATETRUNC(‘week’, [Baseline Date], DATEPART(‘weekday’, [Min Date])), but unfortunately, at this point, Tableau won’t accept a function as the 3rd parameter of the DATETRUNC function.

Let’s just have a look at what we’ve got so far

Rows for California only showing the Subscription Dates from 01 Jan 2019 – 10 Jan 2019 and 01 Jan 2020 to 10 Jan 2020. Min & Max date for all rows are identical and matches the values in the filter. The Baseline Date field for both 01 Jan 2019 and 01 Jan 2020 is January 01. The Baseline Date Week for 01 Jan 2019 – 07 Jan 2019 AND 01 Jan 2020 – 07 Jan 2020 is 01 Jan 2020. The other dates are associated with the week starting 08 Jan 20202.

So now we have all this information, we need yet another date field that will be plotted on the date axis of the Viz in Tooltip.

Date to Plot

IF [Days between Min & Max] <=30 THEN ([Baseline Date]) ELSE [Baseline Date Week] END

If you add this field to the tabular display I built out above, you can see how the value changes as you move the filter dates to be within 30 days of each other and out again.

When added to the actual viz, this field is formatted to dd mmm ie 01 Jan, and then is plotted as a continuous, exact date (green pill) field on the Columns alongside the Daily | Weekly field, with State & Subscription on Rows. The YEAR(Subscription Date) provides the separation of the data into 2 lines.

Restricting to full weeks only (in weekly view)

The requirements state only full weeks (ie 7 days of data) should be included when the data is plotted at a weekly level. For this we need to ascertain the ‘week’ the maximum date falls in

Max Date Week

CASE [Day of Week Min Date]
WHEN 1 THEN DATETRUNC(‘week’,([Max Date]),’Sunday’)
WHEN 2 THEN DATETRUNC(‘week’,([Max Date]),’Monday’)
WHEN 3 THEN DATETRUNC(‘week’,([Max Date]),’Tuesday’)
WHEN 4 THEN DATETRUNC(‘week’,([Max Date]),’Wednesday’)
WHEN 5 THEN DATETRUNC(‘week’,([Max Date]),’Thursday’)
WHEN 6 THEN DATETRUNC(‘week’,([Max Date]),’Friday’)
WHEN 7 THEN DATETRUNC(‘week’,([Max Date]),’Saturday’)
END

so if the maximum date selected is a Thursday (eg Thurs 11th June 2020) but the minimum date happens to be a Tuesday, then the week starts on a Tuesday, and this field will return the previous Tuesday date (eg Tues 9th June 2020).

And then to restrict to complete weeks only…

Full Weeks Only

IF [Daily | Weekly]=’Weekly’ THEN
[Date To Plot]< [Max Date Week]
ELSE TRUE
END

If we’re in the ‘weekly’ mode, the Date To Plot field will be storing dates related to the start of the week, so will return true for all records where the field is less than the week of the max date. Otherwise if we’re in ‘daily’ mode we just want all records.

This field is added to the Filter shelf and set to true.

Hopefully that covers off all the complicated bits you need to know to complete this challenge. My published solution is here.

Happy vizzin’! Stay Safe!

Donna

Can you create a BCG growth share matrix with Sales and Orders?

Week 39 of #WorkoutWednesday was set by Lorna this week. Details of the challenge and a description of what a BCG Growth Share Matrix is here.

The main complexity of this challenge is the calculations, and more so with Lorna’s stipulation ‘no table calcs allowed’.

I started using Tableau before LoDs were introduced, so I tend to be much more comfortable with Table Calcs than I do with LoDs; I know there are many people who are the other way round.

So, let’s get started. In the interest of time (it’s now Sunday afternoon, and I’ve dipped into this challenge on & off since Wednesday), I’m going to focus on the calcs involved in achieving my solution, and not necessarily going into the detail of putting it all together on the Viz.

Sales Scatter

To build the Sales Scatter Plot, we need the following information to help get the %growth and %market share values :

  • The total sales across all the specified years per subcategory
  • The sales for the latest year per subcategory
  • The total sales for the previous years per subcategory
  • The overall total sales across all the specified years

The number of years to compare against is set by a parameter to be 2,3,or 4 years (No.of Years).

To limit the data just to these years, I created a Dates To Include as

YEAR([Order Date]) >= [Max Year in Dataset] – ([No of Years]-1)

Max Year in Dataset is another field that I could have simply hardcoded to 2019, but I decided to be more dynamic, and derive it using an LoD

YEAR({FIXED: MAX([Order Date])})

which gets the latest Order Date in the data and then retrieves the Year part of it, which happens to be 2019 in this instance. Dates To Include is then True if the year of the order date is greater than, or equal to, the maximum year less one less than the No of Years param.

Eg if No of Years = 2, then Dates to Include is True if the year of the order date >= 2019 – (2-1) = 2019-1 = 2018.

if No of Years = 4, then Dates to Include is True if the year of the order date >= 2019 – (4-1) = 2019-3= 2016.

To sense check the numbers I need, I built a tabular view by Sub Category adding Dates to Include = True to the filters shelf.

Total sales across all the specified years per subcategory

is simply achieved by adding SUM([Sales]) to the view.

Sales Latest Year

The sales for the latest year per subcategory is

IF YEAR([Order Date]) = [Max Year in Dataset] THEN [Sales] END

Sales Previous Years

The sales for the previous years per subcategory is then just the opposite

IF YEAR([Order Date]) <> [Max Year in Dataset] THEN [Sales] END

The value of this field ultimately changes as the No of Years parameter changes. If No of Years = 2, then this will contain the 2018 sales, if its 3, then it will contain the sum of the sales in 2017 & 2018 etc.

The Growth % is stated as being the change for the latest year in comparison to the average sales of the previous years. So next I need

Average Sales Previous Years

SUM([Sales Previous Years])/([No of Years]-1)

Note the [No of Years]-1, as when comparing across 2 years, I’m really only being asked to compare 2019 with the previous year, ie 2018.

Now I can find the Sales Growth %

Growth Sales

(SUM([Sales Latest Year])-[Average Sales Previous Years])/[Average Sales Previous Years]

The difference of the latest year compared to the average, as a proportion of the average.

Now we need the Market Share of Sales. This would usually be a pretty simple Percent of Total quick table calculation on SUM([Sales]), but I’ve been challenged to do this without table calcs. So an LoD is needed instead. I created

Sales All Years

{FIXED [Dates to Include]:SUM([Sales])}

which gives me my total sales across the specified years, at the ‘overall’ level.

From this I can calculate

Market Share Sales

SUM([Sales])/SUM([Sales All Years])

The table below gives the breakdown of all these fields at a sub-category level for 2 years

Next step is to categorise each row into Cash Cows, Dogs, Question Marks or Stars based on their % Growth & % Market Share values. Lorna stated that 7% should be considered the mid-point for % Market Share, but that the Growth mid point should be half of the maximum growth value.

From the table above, 64.8% (against appliances) is the maximum value in the set, so I need a new field against each row that outputs 32.4%. Once again, with Table Calcs I would just have used WINDOW_MAX to find the max and halve it. But instead I needed another LOD, this one a bit more complex…

Mid Sales Growth Constant (Sub Cat)

({EXCLUDE [Sub-Category]:MAX({INCLUDE [Sub-Category]: (SUM([Sales Latest Year])-[Average Sales Previous Years])/[Average Sales Previous Years]})})/2

I find INCLUDE and EXCLUDE LoDs a bit of a black art, and there may well have been another way to do this, but it was a bit of trial and error in the end, and so once it gave the value I was after, I stuck with it 🙂 I’ve tried to write some sentences to explain it better, but can’t 😦

So with the mid point defined, the categorisation is then

Sales Category (Sub Cat)

IF [Growth Sales]<=MIN([Mid Sales Growth Constant (Sub Cat)]) AND [Market Share Sales]<=MIN([Mid Market Share Constant]) THEN ‘Dogs’ ELSEIF [Market Share Sales]<=MIN([Mid Market Share Constant]) AND [Growth Sales]>MIN([Mid Sales Growth Constant (Sub Cat)]) THEN ‘Question Marks’
ELSEIF [Growth Sales]>MIN([Mid Sales Growth Constant (Sub Cat)]) AND [Market Share Sales]>MIN([Mid Market Share Constant]) THEN ‘Stars’
ELSE ‘Cash Cows’
END

This is all the information now needed to plot the Sales Scatter chart. Custom shapes were sourced from flaticon.com and I used Paint to colour and resize them appropriately, before saving them into a folder in the Shapes folder of the My Tableau Repository directory on my laptop.

Order Scatter

The steps for creating the Orders scatter are the same in principle. I need to find

  • The total orders across all the specified years per subcategory
  • The number of orders for the latest year per subcategory
  • The number of orders for the previous years per subcategory
  • The overall number of orders across all the specified years

However in doing this I got quite different numbers from Lorna, and its due to how you choose to count orders. Unlike a sales value, where the value is just attributed to a single line in the data set, an Order ID can span multiple lines, which means they can span sub-categories too. Eg imagine you have

Order A – contains 2 line of appliances, and 1 line of phones

Order B – contains 1 line of appliances.

How many orders are there? Answer = 2

How many order lines are there? Answer = 4

How many orders contain appliances? Answer = 2 (100%)

How many order lines contain appliances? Answer = 3 (75%)

How many orders contain phones? Answer 1 (50%)

How many order lines contain phones? Answer 1 (25%)

So when it comes to the ‘market share %’, should I be considering order lines (which means it’ll sum to 100%) or distinct orders which means it’ll sum to more than 100%.

I wrestled with this, and ultimately concluded, this challenge is purely for illustrative purposes, and to stick with my original assumption based on distinct orders (which I’d already calculated before finding the discrepancy). So my Market Share % won’t sum to 100%, and probably isn’t really a definition of Market Share, but I made peace with myself and moved on 🙂

# Orders

COUNTD([Order ID])

Orders Latest Year

COUNTD(IF YEAR([Order Date]) = [Max Year in Dataset] THEN [Order ID] END)

Orders Previous Years

COUNTD(IF YEAR([Order Date]) <> [Max Year in Dataset] THEN [Order ID] END)

Average Orders Previous Years

[#Orders Previous Years]/([No of Years]-1)

Growth Orders

([#Orders Latest Year]-[Average Orders Previous Years])/[Average Orders Previous Years]

Total Orders

{FIXED [Dates to Include] : COUNTD([Order ID])}

Market Share Orders

([# Orders])/MIN([Total Orders])

For 2 years this gave me

Once again the Market share mid point was defined to be a 7% constant, while the growth % mid point needed to be half the max value, which in this instance based on the above, was 1/2 of 51.2%. The same type of LOD was required which gave me

Mid Orders Growth Constant (Sub Cat)

({EXCLUDE [Sub-Category]:MAX({INCLUDE [Sub-Category]: ([#Orders Latest Year]-[Average Orders Previous Years])/[Average Orders Previous Years]})})/2

and from this I could the categorise with

Orders Category (Sub Cat)

IF [Growth Orders]<=MIN([Mid Orders Growth Constant (Sub Cat)]) AND [Market Share Orders]<=MIN([Mid Market Share Constant]) THEN ‘Dogs’ ELSEIF [Market Share Orders]<=MIN([Mid Market Share Constant]) AND [Growth Orders]>MIN([Mid Orders Growth Constant (Sub Cat)]) THEN ‘Question Marks’
ELSEIF [Growth Orders]>MIN([Mid Orders Growth Constant (Sub Cat)]) AND [Market Share Orders]>MIN([Mid Market Share Constant]) THEN ‘Stars’
ELSE ‘Cash Cows’
END

Top 20 Manufacturers Bar Chart

To create the bar chart, I created a new field Manufacturer-SubCat as

[Manufacturer] + ‘ (‘ + [Sub-Category] + ‘)’

I then set about creating the same data tables as I’ve included above, one for Sales and one for Orders.

The majority of the measures used were the same, but when it came to determining the growth mid-point, I found creating a separate field was simpler than trying to use a single field that worked across both Sub-Category and Manufacturer-SubCat. I followed the same methodology though:

Mid Sales Growth Constant (Manu Sub Cat)

({EXCLUDE [Manufacturer-SubCat]:MAX({INCLUDE [Manufacturer-SubCat]: (SUM([Sales Latest Year])-[Average Sales Previous Years])/[Average Sales Previous Years]})})/2

Mid Orders Growth Constant (Manu Sub Cat)

({EXCLUDE [Manufacturer-SubCat]:MAX({INCLUDE [Manufacturer-SubCat]: ([#Orders Latest Year]-[Average Orders Previous Years])/[Average Orders Previous Years]})})/2

I also found though that the 7% market share constant didn’t really give me the split, so I decided a 3% constant would be better to give me categorisations at this level into each bracket. As I had different comparison values, I needed new categorisation fields too.

Sales Category (Manu Sub Cat)

IF [Growth Sales]<=MIN([Mid Sales Growth Constant (Manu Sub Cat)]) AND [Market Share Sales]<=MIN([Mid Market Share Constant (Manu Sub Cat)]) THEN ‘Dogs’ ELSEIF [Market Share Sales]<=MIN([Mid Market Share Constant (Manu Sub Cat)]) AND [Growth Sales]>MIN([Mid Sales Growth Constant (Manu Sub Cat)]) THEN ‘Question Marks’
ELSEIF [Growth Sales]>MIN([Mid Sales Growth Constant (Manu Sub Cat)]) AND [Market Share Sales]>MIN([Mid Market Share Constant (Manu Sub Cat)]) THEN ‘Stars’
ELSE ‘Cash Cows’
END

Orders Category (Manu Sub Cat)

IF [Growth Orders]<=MIN([Mid Orders Growth Constant (Manu Sub)]) AND [Market Share Orders]<=MIN([Mid Market Share Constant (Manu Sub Cat)]) THEN ‘Dogs’ ELSEIF [Market Share Orders]<=MIN([Mid Market Share Constant (Manu Sub Cat)]) AND [Growth Orders]>MIN([Mid Orders Growth Constant (Manu Sub)]) THEN ‘Question Marks’
ELSEIF [Growth Orders]>MIN([Mid Orders Growth Constant (Manu Sub)]) AND [Market Share Orders]>MIN([Mid Market Share Constant (Manu Sub Cat)]) THEN ‘Stars’
ELSE ‘Cash Cows’
END


To Sort the Manufacturer-SubCategory data by Sales or Orders, I created a SORT BY parameter with values of Sales and Orders.

I then created

Sort Value

IF [Sort] = ‘Order’ THEN [# Orders] ELSE SUM([Sales]) END

On the Manufacturer-SubCat field, I then set the Sort by property to be

Manufacturer-SubCat was then also added to the Filters shelf, with the filter set to

And that covers the main calcs and complexities that went into my version/interpretation of this challenge. I hate it when I can’t get the numbers to match 😦 but ultimately now feel comfortable with what I did and I hope it might explain why some of you may have got differences too…

My version of the viz is here, and I’ve included additional detail on the tooltips too which should show the numbers that went into the % calcs for each mark.

Happy vizzin!

Donna

Which months have the higher number of orders?

For week 29 of #WorkoutWednesday2019, Luke Stanke set the challenge above (described here), which is comparing the overall average of orders placed per day against the average for each month.

On the face of it, this didn’t seem too bad (especially for a challenge set by Luke).  Some days when I tackle these challenges, the path I take can be long and arduous with several false starts along the way; the result being more of a ‘happy accident’ than anything of real coherence. Luckily for me, since I’d promised to start blogging on these challenges, this wasn’t one of them.

My usual approach to any of these challenges is to take some time reviewing the published viz on Tableau Public; hovering over the various marks to understand what’s on the tooltips and seeing if I can get any clues into how the various objects (marks, views, legends, titles etc) have been rendered, whether axis are being sneakily used, understanding the interactivity at play etc.

My next step is to then get the figures right based on what’s presented.  I typically like to create a ‘data’ sheet in the workbook – a tabular view of the data and associated calculations I have built, so I can easily sense check with the published viz whether my assumptions and computations are valid, and it provides a useful reference point if I’m trying to figure out what I did sometime later. 

So I started this challenge the same way.

First up, I assumed the ‘line’ was probably a reference line, and hovering over the viz confirmed this.  Everything else seemed pretty straightforward, so onto the figures.

1. Average no of orders per day per segment

This is what the line represents, and do to this I need to find the total number of orders placed per segment and the total number of days on which orders were placed for each segment.  The average is then just count orders / count days.

Some people will create everything in a single calculated field, but I like to break things up to help me troubleshoot if things don’t quite work as intended, so I ended up with 3 calculated fields, and since the overall average was required at a level higher than the level of detail being displayed (which is month in this viz), I figured LoD calculations were the way to go.

Count Orders Per Segment

{FIXED [Segment]: COUNTD([Order ID])}

For each segment, count the number of distinct orders that exist.

Count Days Per Segment

{FIXED [Segment]: COUNTD([Order Date])}

For each segment, count the number of distinct days on which an order was placed.

Overall Avg Orders Per Day Per Segment

SUM([Count Orders per Segment]) / SUM([Count Days Per Segment])

Format this to 2dp.

2. Avg no of orders per day per segment per month

This is what each coloured ‘bar’ represents, and for this I needed to find the total number of orders placed per segment per month, and the total number of days in each month on which orders were placed for each segment.   I chose to stick with LoDs again for this :

Count Orders Per Segment Per Month

{FIXED [Segment], MONTH([Order Date]): COUNTD([Order ID])}

For each segment and month, count the number of distinct orders that exist.

Count Days Per Segment Per Month

{FIXED [Segment], MONTH([Order Date]): COUNTD([Order Date])}

For each segment and month, count the number of distinct days that an order was placed.

Avg Orders Per Day Per Segment Per Month

SUM([Count Orders Per Segment Per Month]) / SUM([Count Days Per Segment Per Month])

Format this to 2dp.

Putting these fields out in a table, you can see the first three columns contain the same values for each segment even though the data is being displayed at the month level.  The final 3 columns are the monthly figures.  These numbers all reconcile back to the data displayed on the viz.

So I’m heading in the right direction, now onto the next bit.

The requirement is to Label the bars with the percent difference between the monthly value and the overall value”, so I need another field…

3.  % Difference

…which is basically the difference between column 3 and column 6 above, as a proportion of column 3

([Avg Orders Per Day Per Segment Per Month]-[Overall Avg Orders Per Day Per Segment])/[Overall Avg Orders Per Day Per Segment]

By default though, this shows 0s when added to the table as the numbers calculated are actually 0.38, -0.33 etc, so it needs formatting as a percentage to 0 decimal places.  However, applying the standard ‘percentage’ number format won’t quite cut it for this challenge, as Luke has labelled the positive numbers with a + too.  To get this, I need to apply custom formatting.

The easiest way to get this right I find, is to use one of the default number formatting options to set the number in whatever ‘main’ format you need, eg if you have a monetary value to display in £k, use the Currency (Custom) to get all the settings right.  In this instance though I want Percentage set to 0 dp..

Once done, press ok to close the dialog box, then go to set the number format again, but this time choose Custom.  The formatting ‘style’ applied previously will be shown

and it can then be modified to get the desired format, in this instance I change to

It’s often these little formatting tips that get thrown into the #WorkoutWednesday challenges that I love the most, although sometimes they can be tucked away and hard to find (or remember).

So at this point I now think I can start to build the viz, so adding the various pills I need, and adding a reference line I get this…

A line chart isn’t what I want though, and it’s not bars either, as they start from 0.  The mark type I need is gantt

which makes me realise I need to create another field…

4.  Difference

The gantt chart has the marks in the right position, but to make the ‘bars’ I need to alter the size, and that size is the difference between the mark position and reference line, which is

([Avg Orders Per Day Per Segment Per Month]-[Overall Avg Orders Per Day Per Segment])

Placing this field on size, gives me this but while the bars are the right height, they’re not in the right place.

Using gantt bars in this way is akin to the technique used in building waterfall charts, and is rectified simply by applying a multiple of -1 to the pill on the size shelf (in this instance [Difference]).

As a shortcut, I simply type this into the pill on the size shelf itself, which is a nifty little trick.

And voila! The viz is essentially there now.  Just need to add colour and further formatting …

The ‘bars’ are coloured based on whether they’re above or below the line, ie whether the difference is positive or negative, so another field is needed :

5. COLOUR : Difference

IF [Difference]>=0 THEN ‘green’ ELSE ‘blue’ END

And popping this on the colour shelf, and adjusting the colours to suit, gives me

The main thing left now is the little formatting bits and pieces :

Borders : remove columns

Axis : remove title, set to be independent, and set not to start at zero

Reference Line : change the label font size, align left middle, and set the shading to have a white background and 100% opacity

Reference Line TootlTip (new feature in v2019.2): set to Custom as below

Note if I’d simply called my field ‘Average’ rather than ‘Overall Avg Orders Per Day Per Segment’, I wouldn’t have needed this step, though it’s always useful to try out the new ‘little’ features if you can 🙂

Label : add [% Difference] to label shelf, and format centre middle

Tooltip : add relevant fields and adjust formatting as appropriate

The final thing I noticed was the axis scale – my scale was 2dp due to the formatting of my [Avg Orders Per Segment Per Month] field.  Luke’s axis was mixed – some scales at 1dp and some with 0.

I tried a few things, like formatting the axis to be Number (Standard) which has the effect of ‘automatically’ showing a number as a decimal or a whole numbers (something I recall from a very early WorkoutWednesday challenge a couple of years back).  But this didn’t give me the desired effect.  I ended up setting the axis format to be 1 dp.  But this then meant the value on the tooltip also ended up displaying as 1dp, when I wanted it to be 2 *sigh*

To fix this I created a duplicate field of the measure being displayed (a copy of [Avg Orders Per Segment Per Month]), and formatted it to 2dp, and placed this on the tooltip instead. 

I’m going to have to have a peak at Luke’s solution to figure out what magic he’s done here….

Very final step was to add to a dashboard, and add the title and my own custom footer.

So phew done!  My published version is here.

If you’ve got this far, thank you for reading J  I can’t guarantee all write ups will be to this level – it’ll partly depend on the challenge itself, and what path I head down to solve it.

I’m now off to have a peak at Luke’s challenge to figure out that pesky axis……. or I would if his workbook was downloadable 😦

Happy vizzin!

Donna