The data provided contained 1 row per person. I started by creating a bin based on the Age field, which I dragged into the dimensions section of the data pane (above the line) after I connected to the data. Right click Age > Create Bins
Age Group
Create bins every 5 ‘units’
Adding this on to Rows you’ll see you get the relevant values
With this field, we can create the other calculations we will need
{FIXED [Age Group]:SUM( IIF([Gender]<>’Female’ AND [Gender]<>’Male’,1,0))}
Format both of these to 0 dp
Males
IIF([Gender]=’Male’,1,0)
Format this to 0dp
Females
IIF([Gender]=’Female’,-1,0)
Note, this is -1 due to how we’re going to plot on the chart, but we don’t want the labels displayed with negative numbers, so custom format this field to ,##0;#,##0
Headcount Gap
SUM([Males]) + SUM([Females])
Note – we’re adding since the Females value is actually a negative number
Also custom format this to ,##0;#,##0
Add Total Headcount to Rows and change to be discrete (blue pill). Add Other Gender to Rows too, and again change to discrete.
Add Females to Columns and then add Males to Columns too.
Make the chart dual axis and synchronise the axis. Change the mark type on the All marks card to bar. Adjust the Measure Names colour legend.
Make the rows a bit wider. On the All marks card, on the Label shelf, tick show mark labels and align the labels middle centre and bold.
Add Headcount Gap to Columns. Remove the Measure Names field from the Colour shelf on the Headcount Gap marks card.
Create a new field
Colour – Headcount Gap
[Headcount Gap] <0
and add to the Colour shelf on the Headcount Gap marks card and adjust colours accordingly.
I chose to make the tooltip on all the bars to match exactly and reference the information shown. For this I created some other fields
Tooltip – Least Gender
IIF(NOT([Colour – Headcount Gap]),’women’, ‘men’)
Tooltip – Most Gender
IIF([Colour – Headcount Gap],’women’, ‘men’)
On the All marks card, add Males, Females, Headcount Gap, Tooltip – Least Gender and Tooltip – Most Gender to the Tooltip shelf and adjust the tooltip as required
Add a constant 0 reference line that displays a solid black line to both the Female axis and the Headcount Gap axis.
Finally tidy up by
Hide the top axis (uncheck show header)
Editing the title of the Female axis to read Headcount
Remove all gridlines
Remove column dividers
Update the viz title.
Add the sheet to a dashboard and ta-dah! My published viz is here.
It was Yoshi’s first challenge as a #WOW coach this week, and it provided us with an opportunity to develop our data densification and date calculation skills. I will admit, this certainly was a challenge that made me have to think a lot and reference other content where I’d done similar things before.
Modelling the data
Yoshi provided us with an adapted data set of patient waiting times, based on the Healthcare dataset from the Real World Fake Data (#RWFD) site. This provides 1 row per patient with a wait start time and end time. The requirement was to understand how many patients were waiting in each 30 min time slot in a 24hr period, so if a patient waited over 30 mins, or the start & wait time spanned a 30 min time slot, then the data needed to be densified appropriately. Yoshi therefore provided an additional data set to use for this densification, and I have to be honest, it wasn’t quite what I was expecting.
So the first challenge was to understand how to relate the two sets of data together, and this took some testing before I got it right. Yoshi provided hints about using a DATEDIFF calculation to find the time difference between the wait start time (rounded to 30 mins) and the wait end time (rounded to 30 mins).
To work out the calculation required, I actually first connected to the Hospital ER data set and then spent time working out the various parts of the formula required. I wanted to work out what the waiting start time was ’rounded down’ to the previous 30 min time slot, and what the waiting end time was ’rounded up’ to the next 30 min time slot.
If the minute of Date Start is between 0 and 29, then return the date at the hour mark, otherwise (if the minute of Date Start is between 30-59) then find the date at the hour mark, and add on 30 minutes. So if Date Start is 23/04/2019 13:23:00 then return 23/04/2019 13:00:00, but if DateStart is 23/04/2019 13:47, then return 23/04/2019 13:30:00.
If the minute of Date End is between 0 and 29, then find the date at the hour mark, and add on 30 minutes, otherwise (if the minute of Date End is between 30-59) then find the date at the hour mark, and add on 1 hour. So if Date End is 23/04/2019 13:23:00 then return 23/04/2019 13:30:00, but if DateEnd is 23/04/2019 13:47, then return 23/04/2019 14:00:00.
With this, I was then able to relate the Hospital ER data set to the dummy by 30 min data set by adding a relationship calculation to the Hospital ER data set of
(which returns the difference in minutes between the ’rounded down’ Date Start and the ’rounded up’ Date End – you can’t reference calculated fields in the relationship, so you have to recreate)
and set this to be >= to Range End from the dummy by 30 min data set
Let’s see what this has done to the data.
On a sheet add Patient ID. Date Start (as discrete exact date – blue pill), Date End(as discrete exact date – blue pill), Date Start Round (as discrete exact date – blue pill), Date End Round(as discrete exact date – blue pill) and Index (as discrete dimension – blue pill) to Rows and add Patient Waittime to Text.
Looking at the data for Patient Id 899-89-7946 : they started waiting at 06:18 and ended at 07:17. This meant they spanned three 30 min time slots: 06:00-06:30, 06:30-07:00 and 07:00-07:30, and consequently there are 3 rows displayed, and the ’rounded’ start & end dates go from 06:00 to 07:30.
Identifying the axis to plot the 24hr time period against
Having ‘densified’ the data, we now need to get a date against each row related to the 30 min time slot it represents. ie, for the example above we need to capture a date with the 06:00 time slot, the 06:30 time slot and the 07:00 time slot.
But, as the chart we want to display is depicted over a 24hr timeframe, we need to align all the dates to the exact same day, while retaining the time period associated to each record. This technique to shift the dates to the same date, is described in this Tableau Knowledge Base article.
Add this into the table, you can see what this data all looks like – note, I’m just choosing a arbitrary date of 01 Jan 1990, so my baseline dates are all on this date, but the time portions reflect that of the Date Start Round field.
With the day all aligned, we now want to get a time reflective of each 30 min timeslot. The logic took a bit of trial and error to get right, and there may well be a much better method than what I came up with. It became tricky as I had to handle what happens if the time is after 11pm (23:00) as I needed to make sure I didn’t end up returning dates on a different day (ie 2nd Jan 1990). I’ll describe my logic first.
If the Index is 0 then we just want the date time we’ve already adjusted – ie Date Start Baseline
If the Index is 1, then we need to shift the Date Start Baseline by 30 minutes. However, if Date Start Baseline is already 23:30, then we need to hardcode the date to be 01 Jan 1990 00:00:00, as ‘adding’ 30 mins will result in 02 Jan 1990 00:00:00.
If the Index is 2, then we need to shift the Date Start Baseline by 1 hour. However, if Date Start Baseline is already 23:30, then we need to hardcode the date to be 01 Jan 1990 00:30:00, as ‘adding’ 1 hour will result in 02 Jan 1990 00:30:00. Similarly, if Date Start Baseline is already 23:00, then we need to hardcode the date to 01 Jan 1990 00:00:00, otherwise we’ll end up with 02 Jan 1990 00:00:00.
As the relationship didn’t result in any instances of Index > 2, I stopped my logic there. This is all encapsulated within the calculation
Date to Plot
CASE [Index] WHEN 0 THEN [Date Start Baseline ] WHEN 1 THEN IIF(DATEPART(‘hour’, [Date Start Baseline ])=23 AND DATEPART(‘minute’, [Date Start Baseline ])=30, #1990-01-01 00:00:00#, DATEADD(‘minute’, 30, [Date Start Baseline ])) WHEN 2 THEN IIF(DATEPART(‘hour’, [Date Start Baseline ])=23 AND DATEPART(‘minute’, [Date Start Baseline ])=30, #1990-01-01 00:30:00#, IIF(DATEPART(‘hour’, [Date Start Baseline ])=23 AND DATEPART(‘minute’, [Date Start Baseline ])=0, #1990-01-01 00:00:00#, DATEADD(‘hour’, 1, [Date Start Baseline ]))) END
Phew!
Add this to the table as a discrete exact date (blue pill), and you can see what’s happening.
For Patient Id 899-89-7946, we have the 3 timeslots we wanted: 06:00 (representing 06:00-06:30), 06:30 (representing 06:30 -07:00) and 07:00 (representing 07:00-07:30).
On a new sheet, add Date To Plot as a discrete exact date (blue pill) and then create
Count Patients
COUNT([Patient Id])
and add to Text and you have a list of the 48 30 min time slots that exist within a 24hr period, with the patient counts.
We need to be able to filter this based on a quarter, so create
Date Start Quarter
DATE(DATETRUNC(‘quarter’,[Date Start]))
and custom format this to yyyy-“Q”q
Add to the Filter shelf selecting individual dates and filter to 2019-Q2, and the results will adjust, and you should be able to reconcile against the solution.
Building the 30 minute time slot bar chart
On a new sheet, add Date Start Quarter to the Filter shelf and set to 2019-Q2. Then add Date to Plot as a continuous exact date (green pill) to Columns and Count Patients to Rows. Change the mark type to a bar and show mark labels.
Custom format Date To Plot to hh:nn.
We need to identify a selected bar in a different colour. We’ll use a parameter to capture the selected bar.
pSelectTimePeriod
date parameter defaulted to 01 Jan 1990 14:00 with a display format of hh:nn
Then create a calculated field
Is Selected Time Period
[Date to Plot] = [pSelectTimePeriod]
and add to the Colour shelf, adjusting colours to suit and setting the font on the labels to match mark colour.
A ‘bonus’ requirement is to make each bar ’30 mins’ wide. For this we need
Size – 30 Mins
//number of seconds in 30 mins divided by number of seconds in a day (24hrs) (30 * 60) / 86400
Add this to the Size shelf, change it to be a dimension (so no longer aggregated to SUM), then click on the Size shelf, and set it to be fixed width, left aligned.
Via the Colour shelf, change the border of the mark to white.
Next, we need to get the tooltip to reflect the ‘current’ time slot, as well as the next time slot. For this I created
Next Time Period
IIF(LAST()=0, #1990-01-01 00:00:00#, LOOKUP(MIN([Date to Plot]),1))
This is a table calculation. If it’s the last record (ie 01 Jan 1990 23:30), then set the time slot to be 1st Jan 1990 00:00, otherwise return the next time slot (lookup the next (+1) record). Custom format this to hh:nn and add to the Tooltip shelf. Set the table calculation to compute by Date to Plot and Is Selected Time Period.
Update the Tooltip as required.
The final part of this bar chart is another ‘bonus’ requirement – to add 2 hr time interval ‘bandings’. For this I created another calculated field
2hr Time Period to Plot
IIF(DATEPART(‘hour’, MIN([Date to Plot]))%4 = 0 AND DATEPART(‘minute’, MIN([Date to Plot]))=0,WINDOW_MAX([Count Patients])*1.1,NULL)
If the hour part of the date is divisible by 4 (ie 0, 4, 8,12,16, 20), and we’re at the hour mark (minute of date is 0), then get the value of highest patient count displayed in the chart, and uplift it by 10%, otherwise return nothing.
Add this to Rows. On the 2nd marks card created, remove Is Selected Time Period from Colour and Next Time Period from Tooltip. Adjust the table calculation of the 2hr Time Period to Plot to compute by Date to Plot only. Remove labels from displaying.
The bars are currently showing all at the same height (as required) but at a width of 30 minutes. We want this to be 2 hrs instead, so create
Size – 2 Hrs
//number of seconds in 2hrs / number of seconds in a day (24 hrs) (60*60*2)/86400
Add this to the Size shelf instead (fixing it to left again). Adjust the colour to a pale grey, and delete all the text from the Tooltip.
Hide the nulls indicator. Make the chart dual axis and synchronise the axis. Right click on the 2hr Time Period Plot axis (the right hand axis) and move marks to back.
Tidy up the chart by removing row & column dividers, hiding the right hand axis, removing the titles from the other axes. Update the title accordingly.
Building the Patient Detail bar chart
On a new sheet, add Is Selected Time Period to Filter and set to True. Then go back to the other bar chart, and set the Date Start QuarterFilter to Apply to Worksheets -> Selected Worksheets and select the new one being built.
Add Department Referral, Patient Id, Date Start (as discrete exact date – blue pill) and Date End(as discrete exact date – blue pill) to Rows and Patient Waittime to Columns. Manually drag the NoneDepartment Referral option to the top of the chart. Add Patient Waittime to Colour and adjust to use the grey colour palette. Remove column dividers.
The title of this chart needs to display the number of patients in the selection, as well as the timeframe of the 30 min period. We already have the start of this captured in the parameter pSelectTimePeriod. We can use parameters to capture the other values too.
pNumberPatients
integer parameter defaulted to 0
pNextTimePeriod
date parameter, defaulted to 01 Jan 1990 14:30:00, with a custom display format of hh:nn
Update the title of the Patient Detail bar chart to reference these parameters (don’t worry about the count not being right at this point).
Capturing the selections
Add the sheets onto a dashboard. Then create the following dashboard parameter actions.
Set Selected Start
On select of the 30 Min bar chart, set the pSelectTimePeriod parameter passing in the value from the Date To Plot field.
Set Selected End
On select of the 30 Min bar chart, set the pNextTimePeriod parameter passing in the value from the Next Time Period field.
Set Count of Patients
On select of the 30 Min bar chart, set the pNumberPatients parameter passing in the value from the Count Patients field, aggregated at the SUM level.
With these all set, you should find you can click on a bar in the 30 minute chart and filter the list of patients below, with the title reflecting the bar chosen.
The basic viz itself wasn’t overly tricky, once you get over the hurdle of the calculations needed for the relationship and identifying the relevant 30 min time periods.
This week’s challenge, set by Lorna, made use of the newly released ‘dotted line’ format in v2023.2, so you’ll need that version in order to achieve the dotted line that spans the 2 marks being compared. If you don’t have this version, you can still recreate the challenge, but you’ll just have a continuous line.
Building the calculations
This challenge involves table calculations, so my preferred startig point is to build out all the fields I need in a tabular format.
So start by adding Category to Columns and Order Date at the discrete (blue) week level to Rows. Add Sales to Text (I formatted Sales to $ with 0 dp, just for completeness).
Apply a Moving Average Quick Table Calculation to the Sales pill, then edit the table calculation to it is averaging over the previous 5 values (including the current value – ie 6 weeks in total), and it is computing by Order Date only.
Add another instance of Sales back into the table, so you can check the values.
The ‘moving average’ Sales pill is what will be used to plot the main line chart.
But we need to identify 2 points on the chart to compare – the values associated to a start date determined by the user clicking on the chart, and the values associated to an end date determined by the user hovering on the chart. We’ll make use of parameters to store these dates
pDateClick
date parameter defaulted to 27th Dec 2020
pDateHover
date parameter defaulted to 28 Nov 2011
We can then determine what the moving average Sales values were at these two dates
Sales to Compare
IF DATETRUNC(‘week’, MIN([Order Date])) = [pDateClick] OR DATETRUNC(‘week’, MIN([Order Date])) = [pDateHover] THEN WINDOW_AVG(SUM([Sales]), -5, 0) END
Add this to the table, and the values for the moving average will only be populated for the two dates stored in the parameters
This is the field we will use to plot the points to draw the lines with.
But we also need to work out the difference between these values so we can display the labels.
If the date matches the pDateHover (end) date then return the moving average and then spread that value over every row.
Add these into the table, and you can see how the table calculations are working
The moving avg value for the start date is displayed in every row against the Sales to Compare Start column, while the moving avg for the end date is displayed in every row for the Sales to Compare End column.
With these values now displayed on the same row, we can calculate
Difference
[Sales to Compare End]-[Sales to Compare Start]
formatted to $ with 0 dp
and
% Difference
[Difference]/[Sales to Compare Start]
formatted to a custom number format of ▲0.0%;▼0.0%;0.0%
Popping these into the tables, the values populate over every row, but when it comes to the label, we only want to show data against the comparison date, so I created
Label Difference
IF DATETRUNC(‘week’, MIN([Order Date])) = [pDateHover] THEN [Difference] END
formatted to $ with 0 dp, and
Label Difference %
IF DATETRUNC(‘week’, MIN([Order Date])) = [pDateHover] THEN [% Difference] END
formatted to a custom number format of ▲0.0%;▼0.0%;0.0%
With all these fields, we can now build the chart
Building the viz
On a new sheet, add Order Date set to the continuous (green) week level to Columns and Category to Rows. Add Sales to Rows and apply the moving average quick tablecalculation discussed above, adjusting so it is computing over 5 previous weeks and by Order Date only
Add Category to Colour and adjust accordingly, then reduce the opacity to around 40%
Add pDateClick to Detail then add a reference line to the Order Date axis the references this field. Adjust the Label and Tooltip fields, and set the line format to be a thick grey line.
Format the reference line text so it is aligned top right.
Then add pDateHover to Detail and add another reference line, but this time set the line so it is formatted as a dashed line.
Next, add Sales to Compare to Rows. Adjust the colour of the associated marks cards so it is 100% opacity.
Right click on the Sales to Compare pill and Format. On the Pane tab on the left hand side, set the Marks property of the Special Values section to Hide (Connect Lines). Click on the Path shelf and choose the dashed display option.
Set the chart to be Dual axis and synchronise the axis. Remove Measure Names from the All marks card. Hide the right hand axis.
Add Label Difference and Label Difference % to the Label shelf. Set the font to match mark colour and adjust font size and positioning. I left aligned the font but added a couple of spaces so it wasn’t as close to the line. I also added a couple of extra carriage returns so the text shifted up too.
Finally adjust tooltips, remove column dividers and hide the Category column title. Adjust the axis titles.
Adding the interactivity
Add the sheet onto a dashboard, then add 2 parameter actions
Set Start on Click
On select of the Viz sheet, pass the Order Date (week) into the pDateClick parameter. Leave the current value when selection is cleared (ie when user clicks elsewhere in viz, but not on another mark).
and
Set Comparison on Hover
On select of the Viz sheet, pass the Order Date (week) into the pDateHover parameter. Leave the current value when selection is cleared (ie when user clicks elsewhere in viz, but not on another mark).
Lorna Brown set the challenge this week to build this butterfly chart, so called because of the symmetrical display. I’ve built these before so hoped it wouldn’t be too taxing.
The data set isn’t very verbose, fields for gender & year specific values by age bracket
When connecting to this excel file, you need to tick the Use Data Interpreter checkbox which removes all the superfluous rows you can see in the excel file.
Now, I started by building a version that required no further data reshaping, which I’ve published here. This version uses two sets of dual axis, and I use the reversed axis feature to display the male figures on the left. This version required me to float the Population axis label onto the dashboard so it was central. Overall it works, the only annoying snag is that I have two 0M labels displayed on the bottom axis, since this isn’t a single axis. As a result, I came up with an alternative, which matches the solution, and which I’m going to blog about.
Reshaping the data
So once I connected to the data and applied the Use Data Interpreter option, I then selected the Males, 2021 and Females, 2021 columns, right clicked and selected Pivot
This results in a row for the Male figures and a row for the Female figures, with a column Pivot Names indicating the labels for each row. I renamed this to Gender, and the Pivot Values column to Population.
Building the Chart
On a sheet, add Age to Rows and Population to Columns and Gender to Colour. Set Stack Marks Off (Analysis > Stack Marks > Off).
You’ll see that both the values for the Male & Female data is displaying in the positive x-axis. We don’t want this. So let’s create
Population – Split
IF [Gender]= ‘Males, 2021’ THEN [Population]*-1 ELSE [Population] END
Replace the Population pill on Columns with Population Split
So this is the basic butterfly, but now we need to show the Total Population. Firstly, let’s create
Total Population per Age Bracket
{FIXED Age: SUM([Population])}
And then similarly to before, we need to display this in both directions against the Male and the Female side, so create
Total Population Split
IF [Gender]= ‘Males, 2021’ THEN [Total Population per Age Bracket]*-1 ELSE [Total Population per Age Bracket] END
Drag this field onto the Population Split axis until the two green column icon appears, then let go of the mouse. This is creating a combined axis, where multiple measures are on the same axis.
Move Measure Names from Rows to the Detail shelf and then change the symbol to the left, so the pill is also added to the Colour shelf. Adjust the colours of the 4 options accordingly.
Edit the axis and rename to Population. Then Format the axis, and set the scale to display in millions to 0dp
The tooltip needs to display % of total, so we’ll need
% of Total
SUM([Population]) / SUM([Total Population per Age Bracket])
Add this to the Tooltip shelf
We also need the population figures on the tooltip displayed differently, so explicitly add Population Split and Total Population Split to the Tooltip shelf. Use the context menu of the pills on this shelf, to set the display format in the pane to be millions to 2 decimal places. But this time, once set using the Number(Custom) dialog, switch to the Custom option and remove the – (minus sign) from the format string. This will make the values displayed against the Males which are plotted on the -ve axis, to show as a positive number.
Finally, set aliases against the Gender field, so it displays as Males or Females (right click on Gender > Aliases).
Now you have all the information to set the tooltip text.
Now add a title to the sheet to match that displayed, hide field labels for rows, and adjust any other formatting with row/column dividers etc that may be required, and add to a dashboard.
And that should be it. My published version of this build is here.
This week, Luke set the challenge which is focussed on manipulating time. Medical admissions over many days is represented in a bar chart which spans a 24 hour period. All admissions needs to be ‘bucketed’ into 15 minute intervals over the 24 hours ie admissions between midnight and 12:14am is counted within the same 12:00am ‘bucket’.
The data for this challenge is embedded within a workbook which you need to download via the challenge page. I did as Luke instructed; downloaded the workbook, deleted all the existing sheets, then re-saved as my own file.
The first step that is required is to ‘baseline’ / normalise the admission dates so they all look to be on the same day.
There’s different ways to do this; on this occasion I used the 3rd method from this Tableau KB, although I simply hardcoded a date of 1 Jan 2023 rather than use TODAY(). It doesn’t matter what this particular date is, its just an arbitrary date.
Once we’ve got this, we then need to manipulate this date again to ‘group’ into the 15 min interval. This isn’t something I know ‘just to do’, but I know I’ve done it before. So a quick google was needed and I used this blog for the required calculation.
Pop these fields out into a table to see how these calculated fields are working
The Baseline Admission Date 15 mins is what we’ll use for the x-axis. The next step is work out the value being plotted Stays per Day.
Now when the challenge was first placed, a couple of the requirements were missing, so there was a bit of head-scratching trying to figure out what numbers were being used to get the values presented.
The following fields need to be added to the Filter shelf:
Stay Type = Outpatient
Admission Date starting from 30 Aug 2017 00:00
When on the filters shelf, both these should then be added to Context, as the data needs to be filtered before the LOD calc we need to use gets calculated (defined below).
Count Days with 15 min interval
{FIXED [Baseline Admission Date 15 mins]: COUNTD(DATETRUNC(‘day’, [Admission Date]))}
This is counting the distinct days when there was admission within each 15 minute period ie if there were 2 admissions on the same day within the same 15 minute window, the day would only count as 1.
From this we can then compute
Stays per Day
SUM([Number of Records])/SUM([Count Days with 15 min interval])
Now we’ve got the data we need, so we can build the viz.
Add Baseline Admission Date 15 mins to Columns as a green continuous pill, and Stays per Day to Rows. Don’t forget to add Admissions Date and Stay Type to the Filter shelf as mentioned above.
The bars look ‘blocky’. You can manually adjust the size, but you might notice that the widths between isn’t exact – the whitespace looks larger between some bars than others. To resolve this, I created a field to control the size, which is based on the number of 15 minute intervals there are in a 24 hour period – 96.
Size
1/96
Add this to the Size shelf, change the aggregation to MIN, and adjust the size to be Fixed and aligned Centre.
Add another instance of Stays per Day to the Rows shelf. Then make it dual axis and synchronise axis. Change the mark type of the 2nd Stays per Day instance to Gantt and change the colour. Then change the colour of the 1st Stays per Day.
Show mark labels, and set the Label just to show the max value.
Right click on the time axis, and format, and custom format to h:nn am/pm
And essentially, that’s it. There’s formatting to do to remove the secondary axis, column & row banding etc and add tooltips, but the core of the viz is complete.
My published instance is here. Note the time formatting seems to be an issue on Tableau Public. Someone did comment that this was an issue with the MAKETIME function, but I didn’t actually use this function.
It was Lorna’s turn to set the challenge this week, and she took the opportunity to ask us to use a new feature in Tableau v2021.1 – Quick LoDs (you’ll obviously need v2021.1 to use the functionality, but the LoDs can be created manually in earlier versions if need be).
This blog will focus on
Creating the Sub-Category Average Sales by Category LoD using Quick LoDs
Formatting the difference calculation
Colouring the bars
Text for Tooltip
Putting it all together
The viz itself isn’t that complex once you’ve nailed the LoD, so lets’ start with that bit.
Creating the Sub-Category Average Sales by Category LoD using Quick LoDs
The what…? It’s a bit of a mouthful,.. the “Sub-Category Average Sales by Category”. What we’re essentially after here is the Total Sales per Category / No of Sub-Categories in the Category. To do this with LoDs, we first need to create an LoD to represent the total sales in each Sub-Category.
As mentioned in the requirements / referenced in the KB article above, you use a combination of ctrl/command click & drag to create an LoD via the Quick LoD feature. In this case I dragged the Sales measure onto the Sub-Category dimension, and this automatically created
Sales (Sub-Category)
{ FIXED [Sub-Category]: SUM([Sales]) }
which is the sales per sub-category.
From this, I then dragged this measure onto the Category dimension, which automatically then created
Sales (Sub-Category) (Category)
{ FIXED [Category]: SUM([Sales (Sub-Category)]) }
BUT I then edited this to change the aggregation to AVG, so the field became
{ FIXED [Category]: AVG([Sales (Sub-Category)]) }
This gives the average value required, which you can see is the same across the rows in a single Category :
Formatting the difference calculation
The viz displays the % difference between the Sub-Category sales and the average. This is calculated with
This is then custom formatted as ▲0%;▼0% (I use this site to get my shapes from).
Colouring the bars
The bars need to be coloured based on the value of the Difference field, so another calculated field is required
Colour
[Difference]>0
This will just return true or false and can dropped on the Colour shelf of the bars.
Text for Tooltip
Within the tooltip, the % difference is displayed, along with some text which indicates if it is above or below the average. For this we need another calculated field to reference in the Tooltip.
Above | Below
IF [Difference]>0 THEN ‘Above’ ELSE ‘Below’ END
Putting it all together
With all the calculated fields built, the the chart itself is a relatively simple dual axis chart
Add Category then Sub-Category to Rows
Add Sales to Columns and sort descending
Add Sales (Sub-Category) (Category) to Columns. Make dual axis and synchronise axis
Click on the All marks card and remove Measure Names from the Colour shelf.
Click on the Sales marks card and change mark type to bar; add the Colour field to the Colour shelf and adjust accordingly; add Difference to the Label shelf and format appropriately.
Click on the Sales (Sub-Category)(Category) marks card and change mark type to Gantt; adjust the Size to be as large as possible; set the colour to the relevant grey.
On the All marks card, add the Above | Below field to the Tooltip shelf, then edit the Tooltip on the All marks card to create the required text.
Finally, remove axis, remove the column heading labels, remove grid lines and column borders and format the displayed text appropriately. Title the viz.
A very brief post today, but hopefully I’ve ticked off all the required elements, My published viz is here.
This weeks #WorkoutWednesday was set by the lovely Ann Jackson who often delivers some ‘challenging’ problems, all beautifully presented to fool you into thinking it’s going to be straightforward.
This week was no different. Time constraints meant I couldn’t dedicate the usual time to it on Wednesday, and then when I did get to it, I ended up with several false starts, that got very nearly there, but just fell at the final hurdle. I started again this evening, and finally got to something I’m happy with. So let’s get to it.
Ann’s challenge here, was to show a set of monthly KPI BANs (big-ass numbers) with a day by day comparison to the same time month in the previous year. From initial inspection, I figured that several table calculations were going to be needed. She also stated that we could use as many sheets as we liked. I ended up with 4 in my final viz; 1 displaying the BAN numbers, 1 displaying the trend chart, 1 displaying the red/green indicators to the left and 1 for the ‘days until month end’ subtitle.
Let’s start with the BAN numbers.
Ann wanted the chart to be dynamic, to be based as if you were looking at the data based on the month of ‘today’, and for it to change if you looked at it tomorrow. Since the Superstore dataset being used only contains data from 2015-2018, you can’t use the real ‘today’ date.
I authored my viz on 20th Sept 2019. I set up a table calculation to simulate today’s date as follows
Today
//simulate today to be based on the latest year in the dataset MAKEDATE( YEAR({FIXED:MAX([Order Date])}), MONTH(TODAY()), DAY(TODAY()) )
This produces a date of 20 Sept 2018 (or whatever date in 2018 you happen to be building your viz).
Since the data set is fixed, I could have simply hardcoded the year to 2018, but used the above FIXED LoD expression to be more generic. This LoD finds the year of the maximum date in the whole dataset.
I need to know the month to date sales for the month I’m in (in this case sales from the 1st to 20th September).
Sales MTD This Year
IF [Order Date]>=DATETRUNC(‘month’, [Today]) AND [Order Date]<= [Today] THEN [Sales] ELSE 0 END
This returns the Sales value for the records dated between 01 Sept 2018 and 20 Sept 2018.
This gives me my basic headline BAN number
For the BAN, I also need % change from previous year which requires
Today Last Year
DATEADD(‘year’, -1,[Today])
which returns 20 Sept 2017
Sales MTD Last Year
IF [Order Date]>=DATETRUNC(‘month’, [Today Last Year]) AND [Order Date]<= [Today Last Year] THEN [Sales] ELSE 0 END
which returns the Sales value for the records dated between 01 Sept 2017 and 20 Sept 2017.
% Change
(SUM([Sales MTD This Year]) – SUM([Sales MTD Last Year]))/Sum([Sales MTD Last Year])
This gives me the YoY difference, which I then custom formatted to
▲ 0%;▼ 0%
I could then set up my BAN sheet, by adding the relevant fields to the Text shelf, and formatting accordingly
For the KPI indicator, I required an additional field to set the colouring based on the value of %Change
Colour:BAN
IF [% Change] < 0 THEN ‘red’ ELSE ‘green’ END
I then created a very simple bar chart using an ‘old favourite’ MIN(1) to create an axis for a bar chart. The axis was fixed to end at 1, so the bar fills the space.
So that’s the straightforward bits… now onto the more challenging part – the trend chart.
This chart is showing the following:
The daily month to date sales for the current month up to ‘today’. This is the red/green line which is labelled with the total MTD sales as at today. At the point I’m writing this is the sales from 1-20 Sept 2018.
The daily month to date sales for the equivalent month last year, from the start of the month up to the same date last year (in my case 1-20 Sept 2017). This is the darker grey area chart up to the dotted ‘today’ reference line.
The daily month to date sales for the equivalent month last year from the start of the month up to the end of the month (in my case 1-30 Sept 2017). This is the dark + light grey area chart.
For this I knew I’d need a dual axis chart using an area chart for one and line chart for the other.
Given there’s a reference line on the axis indicating ‘Today’, I know I needed a continuous date axis, and chose to use the idea of baselining all the dates to the same year, and then filtering the viz just to use the dates in the current month (in this case September).
Date Aligned
//reset all data to pretend all against same year MAKEDATE(YEAR([Today]),MONTH([Order Date]), DAY([Order Date]))
You can see from above regardless of the year of the actual Order Date, the re-aligned date field, has the same date.
Month To Include
MONTH([Order Date]) = MONTH(TODAY())
adding this to the filter shelf and setting to True filters to just the September dates in the data set.
Area Chart
The area chart is last year’s data. So far I’ve only built a Sales MTD – Last Year field, but plotting this as a running total table calc against Date Aligned (exact date), doesn’t give me what I need….
…as it flattens out after 20 Sept, as that is when I defined the Sales value to stop being counted. I need a Sales field that continues to grow until the end of the month. I also need a Sales field that gives me my running total up to 20 Sept, but then stops.
Sales Full Month Last Year
IF DATETRUNC(‘month’,[Order Date]) = DATETRUNC(‘month’,[Today Last Year]) THEN [Sales] ELSE 0 END
Adding this to the view and changing to use a Running Total quick table calc gives me what I need
Running Sum Sales MTD Last Year
IF ATTR([Date Aligned])<=ATTR([Today]) THEN RUNNING_SUM(SUM([Sales MTD Last Year])) ELSE NULL END
This is basically only storing the running sum if the date is prior or on today.
I then changed these to sit on the same axis, rather than side by side, changed the mark type to Area, turned stack marks to off, and changed the colour to grey. As there are measures that overlap each other they give the appearance of a darker shade (sneaky huh?).
Line Chart
In a similar way described above, I can’t just use a running total of my existing Sales MTD – This Year field for the line, as that will also continue beyond 20 Sept. So I need
Running Sum Sales MTD This Year
IF ATTR([Date Aligned])<=ATTR([Today]) THEN RUNNING_SUM(SUM([Sales MTD This Year])) ELSE NULL END
Added to the view as a dual axis (synchronised) and mark type of line I get
To change the colour of the line I can’t just use the field I used to make the KPI indicator above, as my data is now at a much more granular level, and it will return me multiple % changes. I just want the overall % change. I had to create more calculated fields for this :
Total Sales MTD
WINDOW_SUM(SUM([Sales MTD This Year]))
Total Sales MTD – Last Year
WINDOW_SUM(SUM([Sales MTD Last Year]))
% Total Change
([Total Sales MTD] -[Total Sales MTD – Last Yr]) / [Total Sales MTD – Last Yr]
Colour : Line
IF [% Total Change ] < 0 THEN ‘red’ ELSE ‘green’ END
You can obviously combine all these steps into one, but I find it easier to read this way. No doubt there’s also another way I could have achieved this.
So that’s the main trend chart complete you think (don’t forget to add Today as a reference line, and label the end of the line chart), until you examine the tooltips and notice things aren’t quite giving you what you need.
Against each mark, Ann wants us to show:
Sales MTD for this year, which rises until ‘today’, then remains the same
Sales MTD for previous year, which also rises until ‘today’, then remains the same
Sales MTD for previous year, which continues until the end of the month
With the measures I’ve got on the view, the MTD Sales up to today for this year and last year stop once I pass ‘today’.
But not to worry, this actually isn’t too hard; I just need to add Sales MTD This Year,Sales MTD Last Year and Sales Full Month Last Year to the tooltip and change all the be Running Total table calcs.
Apply relevant formatting to the tooltip, and gridlines etc, hide headers & axis and this chart is now good to go!
When I then added these 3 views to the dashboard, I placed them side by side in a horizontal container, and changed the padding on each view to 0 on all sides, so they all butted up against each other and the lines for each row appeared joined up.
The subtitle showing the days until the end of month is simply a sheet showing another calculated field Days Until End of Month in the text
And so that’s about it I think… on reflection I wonder why I was being such a knob with my initial attempts where the table calcs I was using seemed to be getting out of hand…. we just all have those days I guess 🙂