Can you visualise yearly rank change in Sub-Category sales?

August 30, 2025 Donna ColesLeave a comment

Yusuke set this interesting challenge : to combine a ‘bump’/’slope’ chart visualising the change in rank whilst also visually displaying the Sales value for the relevant Sub-Category in the ranked position.

Defining the calculations

This challenge will involve table calculations, so I’m going to start by building out the various calculations that will be required and displaying in a tabular view.

Add Category to Filter and select Office Supplies. Then add Sub-Category and Order Date at the Year level as a discrete (blue) pill to Rows. Add Sales to Text.

Create a new field

Sales Rank

RANK(SUM([Sales]))

And add to the table, and verify the table calculation is set to compute by Sub-Category only.

We will need to ‘colour’ the viz based on the rank compared to the previous year. For this create

Is Min Year

{MIN(YEAR([Order Date]))} = YEAR([Order Date])

which will return true for the first year in the data (in this instance 2022) and then create

Colour

IF [Sales Rank] = LOOKUP([Sales Rank],-1) OR ATTR([Is Min Year]) THEN ‘Same as last year’
ELSE ‘Different from last year’
END

If the rank is the same as the previous one, or it’s the first year, then treat as the same, otherwise treat as different.

Add the Colour field to the table, and this time make sure the table calculation for Colour is computing by Year of Order Date only (while the nested calc for Sales Rank should still be computed by Sub-Category only)

The labels on the viz only want to show in certain scenarios – if it’s the first record (ie for 2022) or there has been a change in rank. We need

Label : Rank & Sub Cat

IF ATTR([Is Min Year]) OR [Colour] = ‘Different from last year’ THEN STR([Sales Rank]) + ‘ | ‘ + MIN([Sub-Category]) END

and

Label : Sales

IF ATTR([Is Min Year]) OR [Colour] = ‘Different from last year’ THEN SUM([Sales]) END

format this to $ with 0dp

Add these to the sheet, and double check the nested table calculations on each pill are computing as required (Sales Rank by Sub-Category only, Colour by Year Order Date only)

Now we have all this, we can start building

Creating the Viz

On a new sheet, add Category to Filter and select Office Supplies. The add Order Date to Columns, but set to be continuous (green) pill at the Year level. Add Sub-Category to Detail and add Sales Rank to Rows as a discrete (blue) pill. Verify the table calculation setting against the Sales Rank pill is by Sub-Category only.

Change the mark type to line and then add Order Date to Path. By default it should be at the Year level as a discrete pill.

This is the ‘bump’ chart.

Now add another instance of Order Date to Columns as a continuous pill at the Year level to essentially duplicate the display. On the 2nd marks card, change the mark type to Gantt

This gives us the ‘starting point’ for each ‘bar’. But we need to determine the size for each bar. First we’re going to ‘normalise’ the sales values for all the sales being displayed so we get a value between 0 and 1, where 0 is the smallest sale, and 1 is the largest.

Normalised Sales

((SUM([Sales]) – WINDOW_MIN(SUM([Sales]))) / (WINDOW_MAX(SUM([Sales])) – WINDOW_MIN(SUM([Sales]))))

To see what this is doing, format the field to 2dp, then add the field to the tabular view, and ensure the table calculation is computing by both Sub-Category and Year Order Date.

But the ‘axis’ we want to plot the bar length against is in years, so we need to adjust this size to be a proportion of a year (ie 365 days)

Gantt Size

//proportion of a year
[Normalised Sales] * 365

Add this to the Size shelf on the 2nd marks card on the viz. Adjust the table calc setting so it is computing by all the fields listed.

We now have the core concept so now we can start finalising the display.

Make the chart dual axis and synchronise the axis.

Set the view to fit height.

On the 1st marks card (that represents the line)

change the line style to dotted (via the Path shelf)
reduce the Size to suit
change the colour to pale grey
Add Label : Sales and Label : Rank & Sub Cat to the Label shelf.
- Adjust the table calc settings of each so the nested table calcs in each have Sales Ranks by Sub Category only and Colour by both the Year Order Date fields only.
- Adjust the layout of the text as required
- Align the font to be top right
- Change the font style (bold & black)
- Ensure the Label is set to ‘allow labels to overlap marks’
Remove the Tooltip

On the 2nd marks card, the gantt bar

Add Colour to the Colour shelf and adjust the colours accordingly.
- Verify the table calc settings are as expected
- I chose to reduce the opacity slightly, so I could see the dotted line underneath (set to 70%)
Add Sales to Tooltip (format to $ with 0 dp) and the adjust Tooltip as required

Then we just ned to finalise the formatting/display

Set the font of the years and rank numbers to black & bold.
hide the Sales Rank label heading (right click > hide field labels for rows)
remove row & column dividers
Add black column gridlines (I set to the 2nd thickness level), and remove any row gridlines
Edit the top axis to have a fixed start (use default option) and end at 31/12/2025 so the 2026 label and line disappears.
Remove the title from the top axis.
Edit the bottom axis – remove the tile, and then set the tick marks to None, so the bottom axis now looks empty.

And that should be it. Now add the sheet to a dashboard and display the category filter as a single select, customising the remove the ‘all’ option.

My published viz is here

Happy vizzin’!

Donna

Can you recreate this marginal histogram?

August 16, 2025 Donna ColesLeave a comment

For this week’s challenge, Lorna visualised the output of a crazy golf game she had with the latest Data School cohort, and thought it would be a fun challenge to share.

While the data set is very small, there’s several core calculations required, which we’ll start off creating.

Defining the core calculations

After connecting to the data, I initially dragged the Hole field from Measures to Dimensions (dragged it to be above the line on the data pane).

We need to know the average score per person (across all 11 holes)

Overall Average Score

{FIXED [Person]: AVG([Score])}

format this to 1dp

We also need the average score per hole

Avg Score Per Hole

{FIXED [Hole]: AVG([Score])}

format this to 1 dp

We need each person’s score up to 9 holes

Score up to 9

{FIXED [Person]:SUM( IF [Hole]<=9 THEN [Score] END)}

We need the tie break score for the 10th & 11th holes

Tie Break

{FIXED [Person]:SUM( IF [Hole]>9 THEN [Score] ELSE 0 END)}

We need the average score across 9 holes

Avg Score

[Score up to 9]/9

format to 1 dp

We need to the lowest score per person

Lowest

{FIXED [Person]: MIN([Score])}

With this information, we can then define a field we can use for sorting which is a numeric field that is a combination of multiple fields as stated in the requirements

Sort

(SUM([Score up to 9])*10000) +
(SUM([Tie Break]) * 1000) +
(SUM([Avg Score]) * 100)+
SUM([Lowest])

Building the Heat Map

On a new sheet add Score up to 9 to Rows and change it to be discrete. Repeat the same for Tie Break, Avg Score and Lowest. Then add Person to Rows and Hole to Columns.

Double click into Columns and manually type MIN(1.0) to create a ‘fake axis’. Change the mark type to bar and increase the Size to the maximum. Edit the axis to be fixed from 0 to 1 and the hide the axis. Manually increase the width of each row and set the chart to fit width.

Add Score to Colour and choose a diverging colour palette and then manually change the start & end colours to match the solution (or define your own colour style). I used #62a14b (green) and #13316d (dark blue). Check the box to use full colour range

Update the Tooltip as required.

We then need to include some indicators on the heat map. To determine the arrow indicator we first need

Previous Score

LOOKUP(SUM([Score]),-1)

and then create

Change Indicator

IF SUM([Score]) > [Previous Score] THEN ‘↗’
ELSEIF SUM([Score]) < [Previous Score] THEN ‘↘’
ELSEIF SUM([Score]) = [Previous Score] THEN ‘→’
END

(I sourced the arrow characters from https://jrgraphix.net/r/Unicode/2190-21FF)

Add Change Indicator to the Label shelf and then adjust the table calculation so it is computing by the Hole field only

We also need

Hole in 1 indicator

IF SUM([Score]) = 1 AND MIN([Hole])<> 9 THEN ‘●’ END

and

Max Shots Indicator

IF SUM([Score]) = 10 THEN ‘🙁’ END

(I used this site https://unicode-explorer.com/emoji/ for the emoji icon)

Add these to Label too and adjust label so it is aligned bottom centre, and coloured white. You may need to adjust size of font for each label item, and increase the height of each row to get the information to display.

Finalise the formatting by

Set the background colour of the whole sheet to #7299aa
Set the font of all the column headings and label headings to Tableau Medium, 9pt, white
Hide the Hole label heading (right click the label and hide field labels for columns)
Add a white border to the bars via the Colour shelf
Remove all column dividers
Remove row dividers from the pane only, ensure the row dividers for the headers remain.
Remove all gridlines , zero lines, axis rulers & ticks

Name the sheet HeatMap or similar.

Building the Average Score Per Person bar chart

On a new sheet, add Person to Rows and Overall Avg Score to Columns. Add a Sort to the Person pill that references the Sort field ascending

Add Overall Avg Score to the Colour shelf, and then adjust the colour scale so it uses a diverging colour scale which is then edited to use the same colour range as before and spans the same range, which means explicitly stating the start (-1) , middle (0) and end (10) values

We want to display a * against the winner

Winner Icon

IF SUM([Overall Avg Score]) = WINDOW_MIN(SUM([Overall Avg Score])) THEN ‘★’ ELSE ” END

Add this to Rows and adjust the table calculation so it is explicitly computing by Person only.

Create

Winner Label

IF SUM([Overall Avg Score]) = WINDOW_MIN(SUM([Overall Avg Score])) THEN ‘Coach Always Wins!’ END

and add this to Label along with Overall Avg Score. Increase the width of the bars to see the text and align middle left and change the font to white. Adjust the Tooltip.

Format the sheet

Set the worksheet background colour
Hide the Person column (uncheck Show Header)
Hide the Overall Avg Score axis (uncheck Show Header)
Remove all column/row dividers, gridlines, axis lines, zero lines
Add a white border around the bars (via the Colour shelf)
Increase the Size of the bars so there is a small gap between
Format the * to be white font
Hide the Winner Icon label (right click > hide field labels for rows)
Make the Winner Icon column as narrow as possible

Name the Sheet Avg Per Person or similar.

Building the Average Score Per Hole bar chart

On a new sheet, add Hole to Columns and Avg Score Per Hole to Rows. Add Avg Score Per Hole to Colour and adjust the scale as we did above so it ranges from green to blue and -1 to 10.

Show mark labels, adjust the tooltip. Set the background colour of the worksheet and remove all gridlines, zero lines, row/column dividers etc. Hide the Avg Score Per Hole axis and the hole labels (uncheck show header on the Hole pill). Adjust the font of the labels to be Tableau Medium and white. Add a white border around the bars. Increase the Size to leave a small gap.

Name the sheet Avg Per Hole or similar.

Building the dashboard

Set the background of the dashboard to the same colour of the worksheets.

Using containers, add a horizontal container and add a text object (for the title), the Avg Per Hole sheet and then a blank object. Remove the container that gets added with the colour legend.

Add another Horiztonal container beneath and add the Heatmap sheet and the Avg Per Person sheet.

Remove all padding. Remove all titles from the sheets. Set all the charts to fit entire view. Manually line up everything, but you’ll find you have an issue getting your horizontal bar chart to align to the player rows due to the header in the heatmap

(Note – you may notice your labels on the heat map aren’t displaying due to the space available). You can continue to adjust on Desktop so they do appear (make the heatmap have more vertical space), or wait until you’ve published to Tableau Public and see if you get the desired result… although at the point of writing , Tableau Public is having issues with the table calcs and causing odd behaviour with the display).

To fix this, go back to the Avg Per Person sheet and double click into column and manually type “” to create a dummy header row with no text. Hide the “” label (right click > hide field labels for columns). You can then adjust the height of this header label section to help get the alignment right.

Then make any final adjustments required – add the title, and any imagery etc. My published viz is here.

Happy vizzin’!

Donna

How are EPL Football Teams performing this season compared to previous seasons?

January 17, 2025 Donna ColesLeave a comment

I set this week’s challenge and I tried to deliver something to hopefully suit everyone wherever they are on their Tableau journey. The primary focus for this challenge is on table calculations, but there’s a few other features/functionality included too.

I love football – all the family are involved in it in some way, so thought I’d see what I could get from a different data set this week. The data set contains the results of matches played in the English Premier League for the last five seasons – from the 2020-21 season to the current season 2024-25. Matches up to the end of 2024 only are included.

As I wrote the challenge in such a way that should allow it to be ‘built upon’ from the beginner challenge, I’m going to author this blog in the same way. I’ll describe how to build the beginner challenge first, then will adapt /add on to that solution. Obviously, as in most cases, this is just how I built my solution – there may well be other ways to achieve the same result.

Beginner Challenge

Modelling the data

The data set provided displays 1 row per match with the teams being displayed in the Home and the Away columns respectively, and the FTR (full time result) column indicating if the Home team won (H), the Away team won (A) or if the match was a draw (D).

The first thing we need to do is pivot this data so we have 2 rows per match, with a column displaying the Team and another indicating if the team is the Home or Away team. To do this, in the data source window, select the Home and the Away columns (Ctrl-Click to multi-select – they’ll be highlighted in blue), then from the context menu on either column, select the Pivot option.

This will duplicate the rows, and generate two new fields called Pivot Field Names and Pivot Field Values. Rename the fields by double-clicking into the field heading

Pivot Field Names to Home or Away
Pivot Field Values to Team

Creating the points calculation

We need to determine how many points each team gained out of the match which is based on whether they won (3pts), lost (0pts) or drew (1pt). Create a calculated field

Pts

IF [Home or Away] = ‘Home’ AND [FTR] = ‘H’ THEN 3
ELSEIF [Home or Away] = ‘Away’ AND [FTR] = ‘A’ THEN 3
ELSEIF [FTR] = ‘D’ THEN 1
ELSE 0
END

Creating the cumulative points line chart

We want to create a line chart that displays the cumulative number of points each team has gained by each week in each season.

Start by adding Wk to Columns and Pts to Rows as these are the core 2 fields we want to plot. But we need to split this by Team and by season, so add Team and Season End Year to the Detail shelf.

This gives us all the data points we need, but at the moment, it’s currently just showing how many points each team gained per week in each season.

To get the cumulative value, we add a running total quick table calculation to the Pts field.

which gives us the display we need

While we can leave the SUM(Pts) field as is in the Rows, I tend to like to ‘bake’ this field into the data set, so I have a dedicated field representing the running total. I can create this field in 2 ways

Create a calculated field called Cumulative Points Per Season which contains the text RUNNING_SUM(SUM([Pts])). Add this field to Rows instead of just Pts.
Hold down Ctrl and then click on the SUM([Pts]) pill in Rows and drag into the left hand data pane and then release the mouse. This will automatically create a new field which you can rename to Cumulative Points Per Season. The field will already contain the text from the calculation used within the quick table calculation, and the field on Rows will automatically be updated.

Filtering the data

Add Team to the Filter shelf and when prompted just select a single entry eg Arsenal.

Show the Filter control. From the context menu, change the control to be a single value (list) display, and then select customise and uncheck the show all value option, so All can’t be selected.

Format the line

To make the line stepped, click the path button and select the step option

To identify the latest season without hardcoding, we can create

Is Latest Season

[Season End Year] = {MAX([Season End Year])}

The {MAX([Season End Year])} is a FIXED LOD (level of detail) calculation, and is a shortened notation for {FIXED: MAX([Season End Year])} which basically returns the latest value of Season End Year across every row in the data set. This calculation returns true if the Season End Year value of the row matches the overall value.

Add this field to the Colour shelf and adjust colours to suit. Also add the same field to the Size shelf

The line for the latest season is being displayed ‘behind’ the other seasons. To fix this, drag the True value in either the colour or size legend to be listed before the False value. Then edit the sizes from the context menu of the size legend, and check the reversed checkbox to make True the thicker line.

If the thick line seems too thick, adjust the mark size range to get it as you’d prefer

Label the line

Add Season End Year to the Label shelf. Align middle right. Allow labels to overlap marks. And match mark colour.

Create the Tooltip

We need an additional calculated field for this, as we want to display the season in the tooltip in the format 2024-2025 and not just the ending year of the season.

Season Start Year

[Season End Year]-1

Add this field to the Tooltip shelf, and then edit the Tooltip to build up the text in the required format. Use the Insert button on the tooltip to add referenced fields.

Final Formatting

To tidy up the display we want to

Change the axis titles
- Right click on each axis and Edit Axis then adjust the title (or remove altogether if you don’t want a title to display)
Remove gridlines and axis ruler
- Right click anywhere on the chart canvas, and select Format. In the left hand pane, select the format lines option and set grid Lines to None and axis rulers to None.

Set all text to dark purple
- Select the Format menu option at the top of the screen and select Workbook. The under the All fonts section, change the colour to that required

Update the title to reference the selected team
- double click in to the title of the sheet and amend accordingly, using the Insert option to add relevant fields.

Intermediate Challenge

For this part of the challenge we’re looking at using a dual axis to display another set of marks – these ones are circular and only up to 1 mark per season should display. As this now takes a bit more thought, and to help verify the calculations required, I’m going to build up the calculations I need in a tabular form first.

Defining the additional mark to plot

On a new sheet add Team, Season End Year and Wk to Rows. Set the latter 2 fields to be discrete (blue) pills. Add Cumulative Points Per Season to Text. Add Team to Filter and select Arsenal.

We need to identify the date of the last match played by each team, so we can use an LOD for this

Latest Date Per Team

{FIXED [Team] : MAX([Date])}

Add this to Rows as a discrete (blue pill) exact date. For Arsenal, the last match was on 27 Dec 2024, whereas for Chelsea it’s 22 Dec 2024.

With this, we can work out what the latest points are for each team in the current season.

Latest Points Per Team

WINDOW_MAX(IF MIN([Date]) = MIN([Latest Date Per Team]) THEN [Cumulative Points Per Season] END)

Breaking this down : the inner part of the statement says “if the date associated to the row matches the latest date, then return the points associated with that row”. Only 1 row in the table of data has a value at this point, all the rest of the rows are ‘nothing’. The WINDOW_MAX statement, then essentially ‘floods’ that value across every row in the data, because the ‘value’ returned by the inner statement is the maximum value (it’s higher than nothing). Add this field into the table.

We’re trying to identify the week in each season where the points are at least the same as the latest points. We’re going to capture the previous week’s points against every row.

Previous Points

LOOKUP([Cumulative Points Per Season],-1)

This is a table calculation that returns the value of the Cumulative Points Per Season from the previous row (-1). If we wanted the next row, the function parameter would be 1. 0 identifies the ‘current row’.

Add this to the table.

We can see the behaviour – The Previous Points associated to 2025 week 18 for Arsenal is 33, which is the value associated to Cumulative Points Per Season for week 17. But we can also see that week 38 from season 2024 is being reported as the previous points for week 1 of season 2025, which is wrong – we don’t want a previous value for this row.

To resolve, edit the table calculation of the Previous Points field and adjust so the calculation for Previous Points is just computing by the Wk field only.

With this we can identify the week in each season we want to ‘match’ against. In the case of the latest season, we just want the last row of data, but for previous seasons, we want to identify the first row where the number of points was at least the same as the latest points; the row where the points in the row are the same or greater than the latest points, and the points in the previous row are less.

Matching Week

//for latest season, just label latest record, otherwise find the week where the team had scored at least the same number of points as the current season so far
IF MIN([Is Latest Season]) AND LAST()=0 THEN TRUE
ELSE
IF NOT(MIN([Is Latest Season])) AND [Cumulative Points Per Season]>= [Latest Points Per Team] AND [Previous Points] < [Latest Points Per Team] THEN TRUE
ELSE FALSE
END
END

Add this to Rows and check the data. In the example below, in Season 2020-21, in week 26, Arsenal had 37 points. The previous week they had 34 points. Arsenal’s latest points are 36, so since 37 >=36 and 34 < 36, then week 26 is the matching week.

Looking at season 2023-24, in both week 15 and 16, Arsenal had 36 points. But only week 15 is highlighted as the match, as in week 14, Arsenal had 33 points so 36 >=36 and 33 <36, but for week 16, as the previous week was also 36, the 2nd half of the logic isn’t true : 36 is not less than 36.

So now we’ve identified the row in each season we want to display on the viz, we need to get the relevant points isolated in their own field too.

Matching Week Points

IF [Matching Week] THEN [Cumulative Points Per Season] END

Add this to the table

We now have data in a field we can plot.

Visualise the additional mark

If you want to retain your ‘Beginner’ solution, then the first step is to duplicate the Beginner worksheet, other wise, just build on what you have.

Add Matching Week Points to Rows to create an additional axis. By default only 1 mark may have displayed. Adjust the table calculation setting of the field, so the Latest Points Per Team calculation is computing by all fields except the Team field.

Change the mark type of the Matching Week Points marks card to Circle and remove Season End Year from the Label shelf (simply drag the pill against the T symbol off off the marks card)

Size the circles

We want the circles to be bigger, but if we adjust the Size, the lines change too, as the Size is being set based on the Is Latest Year pill on both marks cards. To resolve this, create a duplicate instance of Is Latest Year (right click pill and duplicate). This will automatically create Is Latest Season (copy) in the dimensions pane. Drag this onto the size shelf of the Matching Week Points marks card instead to make the sizing independent (you will probably find you need to readjust the table calculation for the Matching Week Points pill to include Is Latest Season (copy)). Then adjust the sizes as required.

Label the circles

Add Latest Points Per Team to the Label shelf of the Matching Week Points marks card. Adjust the table calculation setting, so the Latest Points Per Team calc is computing just by Wk only, so only the latest value displays.

Then format the Label so the text is aligned middle centre, is white & bold and slightly larger font.

Adjust the Tooltip text on the Matching Week Points mark, so it reads the same as on the line mark. You will need to reference the Matching Week Points value instead.

Then make the chart dual axis and synchronise the axis. Remove the Measure Names field that has automatically been added to the All marks card

Remove Label for Latest Season

We don’t want the season label to display for the current year, so create a new field

Label:Season

IF NOT([Is Latest Season]) THEN [Season End Year] END

Replace the Season End Year pill on the Label shelf of the Cumulative Points Per Season marks card, with this one instead.

Final Formatting

To tidy up

Remove right hand axis
- right click on axis and uncheck show header
Remove 165 nulls indicator
- right click on indicator and hide indicator
Remove row & column dividers
- Right click on the canvas and Format. Select Format Borders and set Row and Column Divider values to None

Advanced Challenge

For the final part of the challenge we want to add some additional text to the tooltip and adjust the filter control. As before either duplicate the Intermediate challenge sheet or just build on.

We’ll start with the tooltip text.

Expand the Tooltip Text

For this, we’ll go back to the data table sheet we were working with to validate the calculations required.

We want to calculate the difference in the number of weeks between the latest week of the current season, and the week number of the matching record from previous seasons. So first, we want to identify the latest week of the current season, and ‘flood’ that over every row.

Latest Week Number Per Team

WINDOW_MAX(MAX(IF ([Date]) = ([Latest Date Per Team]) THEN [Wk] END))

This is the same logic we used above when getting the Latest Points Per Team, although as this time the Wk field isn’t already an aggregated field like Cumulative Points Per Season is, we have to wrap the conditional statement with an aggregation (eg MAX) before applying the WINDOW_MAX.

Add this to the table.

And then we need, the week number in each season where the match was found, but this needs to be spread across every row associated to that season.

Matching Week No Per Team and Season

WINDOW_MAX(IF [Matching Week] THEN MIN([Wk]) END)

Add to the table, but adjust the table calculation so the field is computing by all fields except the Season End Year.

Now we have these 2 fields, we can compute the difference

Week Difference

[Latest Week Number Per Team] – [Matching Week No Per Team and Season]

Add to the table. With this we can then define what this means for the text in the tooltip

Less/More

IF [Week Difference]<0 THEN ‘more’ ELSEIF [Week Difference]>0 then ‘less’
ELSE ”
END

and also build the tooltip text completely

Tooltip- other seasons

IF NOT(MIN([Is Latest Season])) THEN
IF [Week Difference] = 0 THEN ‘It took the same amount of weeks to accrue at least the same number of points as the current season’
ELSE ‘It took ‘ + STR(ABS([Week Difference])) + ‘ weeks ‘ + [Less/More] + ‘ to accrue at least the same number of points as the current season’
END
END

Add this to the Tooltip shelf of the Matching Week Points marks card. Adjust the Tooltip to reference the field.

Adjust the Tooltip – other seasons table calculation, so the Latest Points Per Team nested calculation is computing for all fields except Team (this will make the circles reappear)

and also adjust the Latest Week Number Per Team nested calculation to compute by all fields except Team. This should make the text in the tooltip appear.

Filtering by teams in the current season only

We need to get a list of the teams in the current season only, which we can define by

Filter Team

IF {FIXED [Team]: MAX([Season End Year])} = {MAX([Season End Year])} THEN [Team] END

Breaking this down: {FIXED [Team]: MAX([Season End Year])} returns the maximum season for each team, which is compared against the maximum season in the data set. So if there is a match, then name of the team is returned. Listing this out in a table we get Null for all the teams whose latest season didn’t match the overall maximum

We can then build a set off of this field. Right click on the Filter Team field Create > Set and click the Null field and select Exclude

Then add this to the Filter shelf, and the list of teams displayed in the Team filter display will be reduced.

And with that the challenge is completed. My published viz is here.

Happy vizzin’!

Donna

Can you design a multi-dimensional Panel Chart for detailed analysis?

July 12, 2024 Donna ColesLeave a comment

Yusuke set this week’s challenge which is pretty tough-going. I got there eventually, but it wasn’t smooth sailing, and had a lot of false starts and changes to calculations throughout to get to the end. I will endeavour to explain where I had difficulties, but some of the calculations I came up with were more down to trial and error ( eg I wonder if this will work…?) rather than a known direction.

Modelling the data

We were provided with a version of Superstore and a State Abbreviations data set. I chose to use the Superstore Excel file I already had. I combined the two data sets in the data source pane using a relationship where State/Province in the Orders table of the Superstore excel file matched with the Full Name field in the State Abbreviations csv file.

As the data was just related to 2024, I added this as a data source filter (Year of Order Date = 2024)

Identifying the month

I decided I was going to use a parameter to select the month to filter. I first created

Order Date MY

DATE(DATETRUNC(‘month’,[Order Date]))

and then created a parameter

pOrderDate

date parameter defaulted to 01 Dec 2024, that I populated as a list using the values from the Order Date MY field. I set the display format to be a custom format of mmmm yyyy

Examining the data

On a new sheet, add State/Province and Abbreviation to Rows and Category to Columns to create a very simple ‘existence table’.

The presence of the Abc in the text table, shows that at least 1 record exists for the State/Category combination during 2024.

We can see some states, such as Arkansas, District of Colombia, Kansas etc don’t have any records at all for some Categories. However, when we look at the viz, we can see markers and labels associated to these Categories. In the image below, Kansas shows markers and labels against Furniture and Technology, when there are no records for these combinations at any point in 2024.

Handling this situation is the reason some of the calculations that follow are more complicated than you might expect.

Building the Trellis/Panel Chart calculations

Now when I built the viz, putting it into a trellis display was the last thing I did, but in rebuilding in order to write the blog, I’m finding that the table calcs needed start to help with the missing marks discussed above.

We need to count the number of States. For this create

State Count

SIZE()

Make this discrete and add to Rows. Adjust the table calculation to compute by State/Province and Abbreviation. 47 should be listed against every row in the column which is the number of rows displayed, and an Abc mark now exists for every State/Category combination.

We’ll also create

Index

INDEX()

make this discrete and add to Rows as well adjusting the table calculation to also compute by State/Province and Abbreviation

This has the effect of providing a counter for every row from 1-47.

We also need a parameter to define how many columns the display will be over

pCols

integer parameter defaulted to 8 which displays a range from 1 – 15 with a step of 1

With these fields, we can now define which row each record will sit on based on the counter for each State (the Index) and the number of columns to display (pCols).

Rows

INT(([Index]-1)/[pCols])

and we can also work out which column each record will sit in

Cols

([Index]-1)%[pCols]

Make both these fields discrete and add to Rows. Verify the table calc setting is as before. Show the pCols parameter, and test changing it and observe how the Rows and Cols values change

Remove State Count and Index from Rows. These won’t be needed in the final viz, as they’re just building blocks for other calculated fields.

Calculating the Profit Ratios

We need two Profit Ratio values – one for each State and Category per month and one ‘overall’ for each State by month

At the State/Category level, create

Monthly Sales

IF [Order Date MY] = [pOrderDate] THEN [Sales] ELSE 0 END

and

Monthly Profit

IF [Order Date MY] = [pOrderDate] THEN [Profit] ELSE 0 END

and from this create

Monthly PR

ZN(SUM([Monthly Profit])/SUM([Monthly Sales]))

custom format this to 0.0%;-0.0%;N/A which will then display the text N/A whenever the value is 0. Add this to the Text shelf.

Some additional State/Province records have appeared at this point with no Abbreviation (null). Filter these out.

To get the Overall profit ratio for the State, I created

State Sales for Month

{FIXED [State/Province]: SUM(
IF [Order Date MY] = [pOrderDate] THEN [Sales] ELSE 0 END)
}

and

State Profit for Month

{FIXED [State/Province]: SUM(
IF [Order Date MY] = [pOrderDate] THEN [Profit] ELSE 0 END)
}

and then created

PR for Month

SUM([State Profit for Month])/SUM([State Sales for Month])

and formatted this to a % with 1 dp.

Add PR for Month into the table and then apply a Sort to the State/Province field to sort by PR for Month descending

We’ve still got some gaps – we want to see the PR for Month value populated against every Category which have a profit ratio. For this create

State PR for Month

WINDOW_MAX([PR for Month])

Add this to the table and apply the table calculation to compute by Category only – we now have the overall profit ratio for each State plotted against every Category.

Note – we needed both PR for Month and State PR for Month as the latter is a table calculation and we can’t apply sorting on a table calc.

Building the panel chart

In building this I referred to my own blog post on a previous challenge, as the bar chart we need to build, isn’t a typical bar chart.

On a new sheet, add State/Province, Abbreviation and Category to Rows. Add Monthly PR to Text. Move the Abbreviation pill from Rows to Filters and exclude Null. Sort the State/Province field by PR for Month descending. Show the pOrderDate and pCols parameters.

Add Cols to Columns and Rows to Rows, so it’s the first pill listed. Move State/Province to Detail. Adjust the table calculations of both the Rows and Cols fields to be computing by State/Province and Category (listed in that order), and at the level of State/Province.

To build the bars, we’re using ‘fake axis’ on both the x & y axis to position the marks we want to display. We need

PR Bar Axis

ZN(LOOKUP(MIN(0.0),0))

and

Y-Axis

IFNULL(LOOKUP(MIN(0.5),0),0.5)

As alluded to above, these evolved as I played around to get the behaviour required.

Add Y-Axis to Rows and adjust the table calculation to compute by State/Province and Category (in that order) at the level of State/Province. Edit the axis to be fixed from -1 to 4.

Add PR Bar Axis to Columns. Again adjust the table calculation setting to be as the others. Change the mark type to Bar. Add another instance of Monthly PR to the Size shelf, and then adjust the Size to be Fixed and aligned Left.

Add Monthly PR to the Colour shelf, and adjust the colour legend to be fixed from -1.5 to 1.5 and centred at 0.

Edit the PR Bar Axis axis to be fixed from -1.25 to 1.25, to ensure the bars remain centred aligned regardless of the month selected.

Adjust the Tooltip as required.

Format the chart to remove all gridlines, remove the zero line for rows, and make the zero line for columns more prominent – dashed and darker. Remove all axis ticks. Add dark row and column dividers against the pane only and at the appropriate level so the information for a single State is contained within the boundaries.

Add row banding against the pane only at the appropriate band size and level

Reduce the width of each row, and widen each column to get a ‘squarer’ display.

Now we need to add the State/overall profit square. for this we need to ‘position’ a mark on the viz

Square Point

IF LAST()=0 THEN 0.9 ELSE NULL END

Add this to Columns. Edit the table calculation so it’s just computing by Category, then make the chart dual axis and synchronise axis.

This has created a second marks card, and essentially ‘duplicated’ the information for the ‘last’ Category only in each State.

Change the mark type to square. Remove Monthly PR from Colour and Size and Label . Add State PR for Month to Label and Colour instead. Adjust the table calculations to be computing by Category only. Align the label to be middle centre. Increase the Size of the mark so it’s roughly square.

Adjust the Colour legend of the State PR for Month so it ranges from -1.5 to 1.5 like the other one.

Adding Abbreviation to Label doesn’t work for every cell, so we need

Label – Abbreviation

WINDOW_MAX(MAX([Abbreviation]))

Add to Label and adjust the table calculation to be computing by Category only. Adjust the layout of the Label to match.

Update the Tooltip to reference State PR for Month instead. Hide the null indicator.

Finally hide the axis and the Cols and Rows pills (right click, uncheck show header). Hide the Category column title. Name the sheet.

And that should be the main viz. Add to a dashboard and only show one of the colour legends, renamed accordingly.

My published viz is here.

Happy vizzin’!

Donna

Can you filter a small multiples chart?

March 14, 2024 Donna ColesLeave a comment

Erica set the challenge this week, and I’m not gonna lie, I found this tough. On the face of it, it looks like something I felt I should be ok at, but nuances cropped up as I was building that meant I often had to change tact and try something different.

My intention was to build each version – Beginner, Intermediate then Advanced, adding to my solution each time, but decisions I made early on, then caused me grief later. For example, I did choose to utilise a lot of table calcs, but that meant when it came to applying the sorting mechanism that I wanted, I couldn’t reference the sort field I needed, as it contained a table calc. So I had to unpick the logic and build with LODs instead which took me a while to get right. Getting single lines to display in each ‘state’ cell also proved tricky at times, and that was even before I’d got to the requirement to pad out the ‘missing values’ with 0s. I also seemed to find that some things only seemed to work if I added pills and applied settings in a particular order. All in all, quite a challenge, and while I did get there in the end, I did have to peek at the solution at times to figure out if I was going nuts, but I found trying to scale back Erica’s solution to the beginner/intermediate version also suffered the same issues I was experiencing. I built with Desktop v2024.1 and there were times I was wondering if something had “broken” in that version, although having finally reached the end, I’ve yet to test that theory.

So, I’m blogging this guide with several caveats – Going from beginner to intermediate is ‘okay’, but when it gets to advanced I had to start again(ish). Some of the calcs I provide will just be ‘as is’. I will do my best to explain what’s going on, but there are times that I just don’t get it, and it’s just been trial and error that got me the results I needed – sorry!

So with all that in mind, let’s get building.

Initial steps

After connecting to the provided hyper file, I found I did have to create the modified sales value

Sales Modified

IF SUM([Sales])<5000 THEN SUM([Sales])*10 ELSE SUM([Sales]) END

I also decided to add a data source filter (right click data source > edit data source filters) to restrict the data to just Country/Region = USA.

The later versions of superstore have Canadian Provinces included too, and I don’t think these were listed in Erica’s solution. It just felt easier all round to exclude these records from source.

Beginner challenge

When building a trellis chart, we need to determine which Row and which Column our specific dimensions (in this case State/Province) will sit in. As the requirements already stated it was to be a 3 column grid, our calculations for this didn’t need to be so complex.

Cols

(INDEX()-1)%3

INDEX() returns an incremental number starting at 1 for whatever dimension or set of dimensions we’re counting over. In this case we’re counting the number of State/Provinces. %3 returns the remainder when the index is divided by 3, so we get values of 0, 1 and 2.

Change this field to be discrete (right click -> convert to discrete)

Add State/Province to Rows then add Cols to Rows. Edit the table calculation so the field is computing by State/Province. You can see that the first 3 rows will be positioned in columns 0, 1,2 respectively and so on.

Create a new field

Rows

INT((INDEX()-1) / 3)

This takes the index value (minus 1), divides by 3 and ’rounds’ to a whole number. Make this discrete too and add to Rows, setting the table calculation as described above. Now we can see the first 3 rows will all actually be in the same row (row 0), then next 3 rows in row 1 and so on.

Shift the pills around so Cols is on Columns, Rows is on Rows and State/Province is on Detail. Add Sales Modified to Rows.

Create a new field

Quarter Date

DATE(DATETRUNC(‘quarter’, [Order Date]))

and add this to Columns setting it as a continuous exact date (green pill). We’ve got a bit of unexpected ‘spaghetti’ going on…

To fix it, do the following ..

Add Quarter Date to Detail as a discrete exact date (blue pill). Change Quarter Date on Columns to be a continuous attribute (green pill – first change to attribute, then change to continuous). Edit the table calculation settings for both the Rows and the Cols fields to be computing by both State/Province and Quarter Date at the level of State/Province.

I had to reference previous challenges and blog posts I’d written to manage this… maybe there is something simpler, as this is pretty taxing for the ‘beginner’ part of the challenge.

Add another instance of ATTR(Quarter Date) to Columns and make dual axis and synchronise the axis. This will create an axis at the top of the chart as well as the bottom.

Format the ATTR(Quarter Date) pill so the axis format is custom formatted to “Q”q “‘”yy

Edit all the axis (top/bottom and left) and update the title. Adjust the Tooltip. Hide the Cols and Rows fields (right click and uncheck show header). Change the Colour of the line to grey.

This should be the Beginner solution. I have published this here.

Intermediate challenge

For this part of the challenge, we need to set up lots of new calculations, so let’s do this first. As usual, I’ll manage this in a tabular format. So on a new sheet, add State/Province and Quarter Date as a discrete exact date (blue pill) to Rows. Add Sales Modified to Text.

We need to get the threshold value for each State/Province, which is the average of the numbers all listed above, multiplied by 2. We’ll use LODs for this

Threshold

{FIXED [State/Province]: AVG({FIXED [State/Province], [Quarter Date]: ([Sales Modified])})} * 2

working inside out… get the value of the Sales Modified value for each State/Province and Quarter Date (which is the same as the values you see listed above) and then average this at the State/Province level and multiple the final result by 2. Add this to the table. This is the field we’ll be using for the horizontal reference line.

Next we need to identify the rows where the Sales Modified value exceeds the threshold, and then return the Sales Modified values for only these rows. I’ll do this in 2 stages

Is Above Threshold?

INT([Sales Modified] > SUM([Threshold]))

This returns a 1 or 0 depending on whether the statement is True or False. Using actual numeric values rather than boolean helps later on. Set this field to be discrete and add to the table.

Above Threshold Sales

IF [Is Above Threshold?]=1 THEN [Sales Modified] END

Add this to the table too. For the rows where we have 1’s, a value is displayed. This is the field we’ll be using for the red circles.

Finally we need to determine some fields to help us define a reference band. These need to be dates as they’ll be applied to the date axis, but the band doesn’t stretch to the previous/next quarter, and is only present if the last value is over the threshold.

Again using LODs let’s get the final date in the quarter

Max Quarter Per State

{FIXED [State/Province]: MAX([Quarter Date])}

Add this to the table as a discrete exact date (blue pill).

Now we need to know if the value associated with the final quarter is above the threshold or not

Final Quarter Above Threshold

INT(MIN([Quarter Date]) = MIN([Max Quarter per State]) AND [Is Above Threshold?]=1)

Again this will return a 1 or 0. Change to discrete and pop that into the table too. We can see Colorado is the first state listed where this is true.

Now we want to ‘spread’ that value across every row associated to the state

State Has Final Quarter Above Threshold

{FIXED [State/Province]: MAX(
{FIXED [State/Province], [Quarter Date]: ([Final Quarter Above Threshold])})
}

For each State/Province and Quarter Date, get the Final Quarter Above Threshold value and then get the maximum value of this for each State/Province. This is where having the values as 1’s and 0s helps.

Make discrete and add this to the table. Every row for Colorado has this set to 1

Now we can work out some dates

Ref Band Min

DATE(IF [State Has Final Quarter Above Threshold] = 1 THEN
DATEADD(‘month’, -1, [Max Quarter per State])
END)

If the State/Province is over the threshold for it’s final quarter, then get a date and set it to be 1 month less than the final quarter for that state.

Similarly

Ref Band Max

DATE(IF [State Has Final Quarter Above Threshold] = 1 THEN
DATEADD(‘month’, 1, [Max Quarter per State])
END)

If the State/Province is over the threshold for it’s final quarter, then get a date and set it to be 1 month more than the final quarter for that state.

Add both of these of the table as discrete exact dates (blue pills).

Now we have all the building blocks to build the next bit of the challenge.

Start by duplicating the Beginner viz.

Add Above Threshold Sales to Rows and make Dual axis and synchronise the axis.. Remove the 2nd instance of the Quarter Date from Columns, so we now have marks cards relating to the 2 measures rather than the 2 dates. Remove Measure Names from the All marks card.

Change the mark type on the Above Threshold Sales marks card to circle and change the colour to red. Adjust size to suit.

Add Threshold to the Detail shelf of the All marks card. Right click on the Sales axis and add reference line. Set it to be per pane and use the Threshold field. Display the value as the Label. Don’t show a Tooltip. Display a dotted black line.

Update the Tooltip to now have a reference to the threshold value too.

Add Ref Band Max and Ref Band Min to the Detail shelf of the All marks card. Set them both to be continuous attributes (green pills).

Right click on the Quarter Date axis and add reference line. Set it to be a band per pane that goes from the Ref Band Min to the Ref Band Max. Don’t display labels or tooltips or a line. Fill with a pale shade ot red/pink.

Add back in the additional Quarter Date dual axis as described in the Beginner section to get the dates listed the top again.

Hide the right hand axis (uncheck show header) and hide the NULL indicator.

This completes the Intermediate challenge. My version is published here.

Advanced challenge

Go back to the tabular sheet we were using to check the calculations needed for the Intermediate challenge, as we’ll build on this.

Firstly the sorting. We need to sort the states based on whether the final quarter was above the threshold or not, and then by the number of times the state was above the threshold. Let’s get the count to start with

Above Threshold Count Per State

{FIXED [State/Province]: SUM(
{FIXED [State/Province], [Quarter Date]: ([Is Above Threshold?])})
}

For each State/Province and Quarter Date, get the Is Above Threshold? value and then sum these up for each State/Province. This is where again having the values as 1’s and 0s helps.

Make this discrete and add to the table

then create a field we’ll use for the sort. This is going to be a numeric field

Sort

IF [State Has Final Quarter Above Threshold] = 1 THEN
100000 + [Above Threshold Count Per State]
ELSE
[Above Threshold Count Per State]
END

We’re just using a very large arbitrary number to force those states where the final quarter is over the threshold to be higher in the list. Make this discrete and add to the table too.

We can now apply a Sort to the State/Province field to sort by the Sort field descending

This results in Colorado moving to the top of the list followed by Minnesota, which also has 3 quarters above the threshold, including the last quarter, but alphabetically falls after Colorado so is listed 2nd.

To filter the data, I created another field, just for ease

Filter

IIF([State Has Final Quarter Above Threshold]=1,’Urgent’,’Non-Urgent’)

Add this to the Filter shelf and set to Urgent. The states should now be restricted to just those where the final quarter is above threshold.

For the final requirement of this challenge, we’ll build out another table on another sheet to demonstrate, as we need to work with a different instance of the quarter date.

On a new sheet, add State/Province to Rows and add Order Date at the Quarter (month year) level as a discrete field (blue pill) to Rows. Add Sales Modified to Text

For Alabama, we don’t have a 2020 Q3 or a 2023 Q3. Click on the Quarter(Order Date) pill and select Show Missing Values. These quarters appear but with no Sales Modified value.

We’ve had to use this different way to define the date quarter as if we tried to ‘show missing values’ against the Quarter Date field that is set to ‘exact date’, we send up getting every day that is missing, not just the dates relating to the quarter. Also, the ATTR(Quarter Date) field we’ve used on the previous vizzes, doesn’t allow the Show Missing Dates option.

Anyway, we need to get 0s in to these dates.

Show Modified with 0

ZN(LOOKUP([Sales Modified],0))

Apart from the Rows/Cols calcs needed for the trellis, this is the only table calculation I ended up using. It’s basically looking up it’s own row (LOOKUP([field],0)) and if it can’t find a value (as it’s missing) it’s returning 0 (the ZN() function). Add that to the table.

Ok so now we have the components needed, let’s build the viz. We can use the Intermediate version as a starting point, but will need to reapply some of the features.

Duplicate the Intermediate viz.

Remove the 2nd instance of the ATTR(Quarter Date) pill on Columns. Drag the Sales Modified with 0 pill and drop it directly over the Sales Modified pill so it replaces it. Hopefully the chart should still look the same.

Right click the Order Date field from the left hand data pane, and drag it directly onto the ATTR(Quarter Date) pill. Release the mouse and select the continuous quarter/year option from the dialog that displays.

Things will start to look a bit odd… you’ve lost your lines… Remove the Quarter Date pill from the Detail shelf on the All marks card. Fix the Rows and Cols table calc fields by just updating them to compute by State/Province only. Adjust the table calc of the Sales Modified With 0 field to compute by Quarter of Order Date and State/Province in that order.

Things still look crazy…. but just one more step… Show Missing Values on the Quarter(Order Date) pill.

Now every cell should be associated to a single State/Province with no broken lines, and Wyoming, right at the bottom, should show more than a single dot. This was A LOT of trial and error to fathom all this out.

Add back in the reference band following the instructions above (you should find the reference line for the threshold value will just then appear) and re-update the Tooltip.

Add another instance of Order Date at the quarter/year continuous level (green pill) to Columns and make dual axis and synchronise the axis. Edit the axis titles, and format them to the “Q”q “‘”yy custom format.

Apply the sort to the State/Province field on the Detail shelf so it is sorting by Sort Descending. Add Filter to the Filter shelf and select Urgent.

Add this to a dashboard, and that should be the completed Advanced challenge which I’ve published here.

There really was some black magic going on here at times. Tough one this week!

Happy vizzin’!

Donna

Can you visualise when it gets busy?

March 7, 2024 Donna ColesLeave a comment

It was Yoshi’s first challenge as a #WOW coach this week, and it provided us with an opportunity to develop our data densification and date calculation skills. I will admit, this certainly was a challenge that made me have to think a lot and reference other content where I’d done similar things before.

Modelling the data

Yoshi provided us with an adapted data set of patient waiting times, based on the Healthcare dataset from the Real World Fake Data (#RWFD) site. This provides 1 row per patient with a wait start time and end time. The requirement was to understand how many patients were waiting in each 30 min time slot in a 24hr period, so if a patient waited over 30 mins, or the start & wait time spanned a 30 min time slot, then the data needed to be densified appropriately. Yoshi therefore provided an additional data set to use for this densification, and I have to be honest, it wasn’t quite what I was expecting.

So the first challenge was to understand how to relate the two sets of data together, and this took some testing before I got it right. Yoshi provided hints about using a DATEDIFF calculation to find the time difference between the wait start time (rounded to 30 mins) and the wait end time (rounded to 30 mins).

To work out the calculation required, I actually first connected to the Hospital ER data set and then spent time working out the various parts of the formula required. I wanted to work out what the waiting start time was ’rounded down’ to the previous 30 min time slot, and what the waiting end time was ’rounded up’ to the next 30 min time slot.

Date Start Round

//Rounding down the wait start time

IIF(DATEPART(‘minute’, [Date Start])<30 , DATETRUNC(‘hour’, [Date Start]) , DATEADD(‘minute’, 30, DATETRUNC(‘hour’, [Date Start])))

If the minute of Date Start is between 0 and 29, then return the date at the hour mark, otherwise (if the minute of Date Start is between 30-59) then find the date at the hour mark, and add on 30 minutes. So if Date Start is 23/04/2019 13:23:00 then return 23/04/2019 13:00:00, but if Date Start is 23/04/2019 13:47, then return 23/04/2019 13:30:00.

Date End Round

//Rounding up the wait end time

IIF(DATEPART(‘minute’, [Date End])<30 , DATEADD(‘minute’, 30, DATETRUNC(‘hour’, [Date End])) , DATEADD(‘hour’, 1,DATETRUNC(‘hour’, [Date End])))

If the minute of Date End is between 0 and 29, then find the date at the hour mark, and add on 30 minutes, otherwise (if the minute of Date End is between 30-59) then find the date at the hour mark, and add on 1 hour. So if Date End is 23/04/2019 13:23:00 then return 23/04/2019 13:30:00, but if Date End is 23/04/2019 13:47, then return 23/04/2019 14:00:00.

With this, I was then able to relate the Hospital ER data set to the dummy by 30 min data set by adding a relationship calculation to the Hospital ER data set of

DATEDIFF(‘minute’,
IIF(DATEPART(‘minute’, [Date Start])<30 , DATETRUNC(‘hour’, [Date Start]) , DATEADD(‘minute’, 30, DATETRUNC(‘hour’, [Date Start]))),
IIF( DATEPART(‘minute’, [Date End])<30 , DATEADD(‘minute’, 30, DATETRUNC(‘hour’, [Date End])) , DATEADD(‘hour’, 1,DATETRUNC(‘hour’, [Date End])))
)

(which returns the difference in minutes between the ’rounded down’ Date Start and the ’rounded up’ Date End – you can’t reference calculated fields in the relationship, so you have to recreate)

and set this to be >= to Range End from the dummy by 30 min data set

Let’s see what this has done to the data.

On a sheet add Patient ID. Date Start (as discrete exact date – blue pill), Date End (as discrete exact date – blue pill), Date Start Round (as discrete exact date – blue pill), Date End Round (as discrete exact date – blue pill) and Index (as discrete dimension – blue pill) to Rows and add Patient Waittime to Text.

Looking at the data for Patient Id 899-89-7946 : they started waiting at 06:18 and ended at 07:17. This meant they spanned three 30 min time slots: 06:00-06:30, 06:30-07:00 and 07:00-07:30, and consequently there are 3 rows displayed, and the ’rounded’ start & end dates go from 06:00 to 07:30.

Identifying the axis to plot the 24hr time period against

Having ‘densified’ the data, we now need to get a date against each row related to the 30 min time slot it represents. ie, for the example above we need to capture a date with the 06:00 time slot, the 06:30 time slot and the 07:00 time slot.

But, as the chart we want to display is depicted over a 24hr timeframe, we need to align all the dates to the exact same day, while retaining the time period associated to each record. This technique to shift the dates to the same date, is described in this Tableau Knowledge Base article.

Date Start Baseline

DATEADD(‘day’, DATEDIFF(‘day’, [Date Start Round], #1990-01-01#), [Date Start Round])

Add this into the table, you can see what this data all looks like – note, I’m just choosing a arbitrary date of 01 Jan 1990, so my baseline dates are all on this date, but the time portions reflect that of the Date Start Round field.

With the day all aligned, we now want to get a time reflective of each 30 min timeslot. The logic took a bit of trial and error to get right, and there may well be a much better method than what I came up with. It became tricky as I had to handle what happens if the time is after 11pm (23:00) as I needed to make sure I didn’t end up returning dates on a different day (ie 2nd Jan 1990). I’ll describe my logic first.

If the Index is 0 then we just want the date time we’ve already adjusted – ie Date Start Baseline

If the Index is 1, then we need to shift the Date Start Baseline by 30 minutes. However, if Date Start Baseline is already 23:30, then we need to hardcode the date to be 01 Jan 1990 00:00:00, as ‘adding’ 30 mins will result in 02 Jan 1990 00:00:00.

If the Index is 2, then we need to shift the Date Start Baseline by 1 hour. However, if Date Start Baseline is already 23:30, then we need to hardcode the date to be 01 Jan 1990 00:30:00, as ‘adding’ 1 hour will result in 02 Jan 1990 00:30:00. Similarly, if Date Start Baseline is already 23:00, then we need to hardcode the date to 01 Jan 1990 00:00:00, otherwise we’ll end up with 02 Jan 1990 00:00:00.

As the relationship didn’t result in any instances of Index > 2, I stopped my logic there. This is all encapsulated within the calculation

Date to Plot

CASE [Index]
WHEN 0 THEN [Date Start Baseline ]
WHEN 1 THEN IIF(DATEPART(‘hour’, [Date Start Baseline ])=23 AND DATEPART(‘minute’, [Date Start Baseline ])=30, #1990-01-01 00:00:00#, DATEADD(‘minute’, 30, [Date Start Baseline ]))
WHEN 2 THEN IIF(DATEPART(‘hour’, [Date Start Baseline ])=23 AND DATEPART(‘minute’, [Date Start Baseline ])=30, #1990-01-01 00:30:00#, IIF(DATEPART(‘hour’, [Date Start Baseline ])=23 AND DATEPART(‘minute’, [Date Start Baseline ])=0, #1990-01-01 00:00:00#, DATEADD(‘hour’, 1, [Date Start Baseline ])))
END

Phew!

Add this to the table as a discrete exact date (blue pill), and you can see what’s happening.

For Patient Id 899-89-7946, we have the 3 timeslots we wanted: 06:00 (representing 06:00-06:30), 06:30 (representing 06:30 -07:00) and 07:00 (representing 07:00-07:30).

On a new sheet, add Date To Plot as a discrete exact date (blue pill) and then create

Count Patients

COUNT([Patient Id])

and add to Text and you have a list of the 48 30 min time slots that exist within a 24hr period, with the patient counts.

We need to be able to filter this based on a quarter, so create

Date Start Quarter

DATE(DATETRUNC(‘quarter’,[Date Start]))

and custom format this to yyyy-“Q”q

Add to the Filter shelf selecting individual dates and filter to 2019-Q2, and the results will adjust, and you should be able to reconcile against the solution.

Building the 30 minute time slot bar chart

On a new sheet, add Date Start Quarter to the Filter shelf and set to 2019-Q2. Then add Date to Plot as a continuous exact date (green pill) to Columns and Count Patients to Rows. Change the mark type to a bar and show mark labels.

Custom format Date To Plot to hh:nn.

We need to identify a selected bar in a different colour. We’ll use a parameter to capture the selected bar.

pSelectTimePeriod

date parameter defaulted to 01 Jan 1990 14:00 with a display format of hh:nn

Then create a calculated field

Is Selected Time Period

[Date to Plot] = [pSelectTimePeriod]

and add to the Colour shelf, adjusting colours to suit and setting the font on the labels to match mark colour.

A ‘bonus’ requirement is to make each bar ’30 mins’ wide. For this we need

Size – 30 Mins

//number of seconds in 30 mins divided by number of seconds in a day (24hrs)
(30 * 60) / 86400

Add this to the Size shelf, change it to be a dimension (so no longer aggregated to SUM), then click on the Size shelf, and set it to be fixed width, left aligned.

Via the Colour shelf, change the border of the mark to white.

Next, we need to get the tooltip to reflect the ‘current’ time slot, as well as the next time slot. For this I created

Next Time Period

IIF(LAST()=0, #1990-01-01 00:00:00#, LOOKUP(MIN([Date to Plot]),1))

This is a table calculation. If it’s the last record (ie 01 Jan 1990 23:30), then set the time slot to be 1st Jan 1990 00:00, otherwise return the next time slot (lookup the next (+1) record). Custom format this to hh:nn and add to the Tooltip shelf. Set the table calculation to compute by Date to Plot and Is Selected Time Period.

Update the Tooltip as required.

The final part of this bar chart is another ‘bonus’ requirement – to add 2 hr time interval ‘bandings’. For this I created another calculated field

2hr Time Period to Plot

IIF(DATEPART(‘hour’, MIN([Date to Plot]))%4 = 0 AND DATEPART(‘minute’, MIN([Date to Plot]))=0,WINDOW_MAX([Count Patients])*1.1,NULL)

If the hour part of the date is divisible by 4 (ie 0, 4, 8,12,16, 20), and we’re at the hour mark (minute of date is 0), then get the value of highest patient count displayed in the chart, and uplift it by 10%, otherwise return nothing.

Add this to Rows. On the 2nd marks card created, remove Is Selected Time Period from Colour and Next Time Period from Tooltip. Adjust the table calculation of the 2hr Time Period to Plot to compute by Date to Plot only. Remove labels from displaying.

The bars are currently showing all at the same height (as required) but at a width of 30 minutes. We want this to be 2 hrs instead, so create

Size – 2 Hrs

//number of seconds in 2hrs / number of seconds in a day (24 hrs)
(60*60*2)/86400

Add this to the Size shelf instead (fixing it to left again). Adjust the colour to a pale grey, and delete all the text from the Tooltip.

Hide the nulls indicator. Make the chart dual axis and synchronise the axis. Right click on the 2hr Time Period Plot axis (the right hand axis) and move marks to back.

Tidy up the chart by removing row & column dividers, hiding the right hand axis, removing the titles from the other axes. Update the title accordingly.

Building the Patient Detail bar chart

On a new sheet, add Is Selected Time Period to Filter and set to True. Then go back to the other bar chart, and set the Date Start Quarter Filter to Apply to Worksheets -> Selected Worksheets and select the new one being built.

Add Department Referral, Patient Id, Date Start (as discrete exact date – blue pill) and Date End (as discrete exact date – blue pill) to Rows and Patient Waittime to Columns. Manually drag the None Department Referral option to the top of the chart. Add Patient Waittime to Colour and adjust to use the grey colour palette. Remove column dividers.

The title of this chart needs to display the number of patients in the selection, as well as the timeframe of the 30 min period. We already have the start of this captured in the parameter pSelectTimePeriod. We can use parameters to capture the other values too.

pNumberPatients

integer parameter defaulted to 0

pNextTimePeriod

date parameter, defaulted to 01 Jan 1990 14:30:00, with a custom display format of hh:nn

Update the title of the Patient Detail bar chart to reference these parameters (don’t worry about the count not being right at this point).

Capturing the selections

Add the sheets onto a dashboard. Then create the following dashboard parameter actions.

Set Selected Start

On select of the 30 Min bar chart, set the pSelectTimePeriod parameter passing in the value from the Date To Plot field.

Set Selected End

On select of the 30 Min bar chart, set the pNextTimePeriod parameter passing in the value from the Next Time Period field.

Set Count of Patients

On select of the 30 Min bar chart, set the pNumberPatients parameter passing in the value from the Count Patients field, aggregated at the SUM level.

With these all set, you should find you can click on a bar in the 30 minute chart and filter the list of patients below, with the title reflecting the bar chosen.

The basic viz itself wasn’t overly tricky, once you get over the hurdle of the calculations needed for the relationship and identifying the relevant 30 min time periods.

My published viz is here. Enjoy!

Happy vizzin’!

Donna

Can you create these KPI Cards?

February 29, 2024 Donna ColesLeave a comment

Lorna set a table calculation filled challenge this week to recreate some KPI cards using an aggregated and amended version of the Superstore data set.

Lorna purposefully used a string Period field to define the timeframe to encourage the use of table calculations. No doubt, there are ways you could use string functions/regex to extract the relevant year and month number to come up with a different solution, but I’m going to head down the intended route.

And as with any table calculation based challenge, I find it best to always build out all the calculations I need into a tabular display to start with before building the viz. So let’s get started…

Building out the Calculations

We need to have a handle on more of the data than just that associated to the Period we’re interested in, so we can’t use a simple Quick Filter on the Period field to restrict the data, otherwise we can’t ‘access’ data for the previous period. So to manage which Period we want to focus on, I created a parameter

pSelectedPeriod

string field that uses a list where the values are added from the Period field. Default to P12 Y2022/23

On a new sheet, add Period to Rows and then add the Sales , Profit and Quantity measures to Text. Show the pSelectedPeriod parameter

What we’re going to do is get the values for each measure that is associated to the pSelectedPeriod, and display that value over every row.

Curr Sales

WINDOW_MAX(SUM(IF [Period] = [pSelectedPeriod] THEN [Sales] END))

If the Period matches that in the parameter, then get the Sales value and then use Window_MAX to ‘spread’ that value over every row

Add this to the table, and edit the table calculation to ensure the value is being computed by Period.

Repeat the same for Profit and Quantity

Curr Profit

WINDOW_MAX(SUM(IF [Period] = [pSelectedPeriod] THEN [Profit] END))

Curr Qty

WINDOW_MAX(SUM(IF [Period] = [pSelectedPeriod] THEN [Quantity] END))

If you change the parameter, you should see all the values in the last 3 columns changing to reflect the value from the first 3 columns of the relevant row.

Now we need to get the value from the previous period, ie the data from the previous row

Prev Sales

WINDOW_MAX(IF MIN([Period]) = [pSelectedPeriod] THEN LOOKUP(SUM([Sales]),-1) END)

The LOOKUP function is taking the Sales value from the previous 1 row (-1), and then WINDOW_MAX is once again ‘spreading’ this value across every row. We also need

Prev Profit

WINDOW_MAX(IF MIN([Period]) = [pSelectedPeriod] THEN LOOKUP(SUM([Profit]),-1) END)

Prev Qty

WINDOW_MAX(IF MIN([Period]) = [pSelectedPeriod] THEN LOOKUP(SUM([Quantity]),-1) END)

Add these to the table, and again adjust the table calc settings for every field to compute by Period.

Now we have the current and previous values for each measure, we can work out the % difference

Diff Sales %

([Curr Sales]-[Prev Sales])/[Prev Sales]

Format this to a % with 1 dp

Diff Profit %

([Curr Profit]-[Prev Profit])/[Prev Profit]

Diff Qty %

([Curr Qty]-[Prev Qty])/[Prev Qty]

Add all these onto the sheet, and remember the table calc settings (for these, there are nested table calcs, so make sure both are set properly).

We now need an arrow indicator to display up or down depending on the % value, and this needs to display as a different colour, so we need two fields per measure.

Diff Sales Indicator +ve

IF [Diff Sales %] >= 0 THEN ‘▲’ END

Diff Sales Indicator -ve

IF [Diff Sales %] < 0 THEN ‘▼’ END

I use this site to copy the text symbols from

Diff Profit Indicator +ve

IF [Diff Profit %] >= 0 THEN ‘▲’ END

Diff Profit Indicator -ve

IF [Diff Profit %] < 0 THEN ‘▼’ END

Diff Qty Indicator +ve

IF [Diff Qty %] >= 0 THEN ‘▲’ END

Diff Qty Indicator -ve

IF [Diff Qty %] < 0 THEN ‘▼’ END

Add these to the sheet if you wish too (apply the table calc settings), but you’ll only get a value for one or the other field depending on whether the difference was +ve or -ve. Below I’ve just added the two Sales indicators

The Tooltip displays the value of the two Periods being compared. One of these is in the parameter, but we need to capture the other

Previous Period

WINDOW_MAX(IF [pSelectedPeriod] = MIN([Period]) THEN LOOKUP(MIN([Period]),-1) END)

So now we have all the values we need for the KPIs captured against every row in the dataset. So now we want to just show a single row. It could be the first, it could be the last… based on Lorna’s hint, let’s filter to just show the row related to the pSelectedPeriod value.

Filter Selected Period

[pSelectedPeriod] = LOOKUP(MIN([Period]),0)

Using the offset of 0 with the LOOKUP, returns the value for the row you’re on, so adding this to the FIlter shelf and selecting True, filters the display to the row where the Period matches the parameter. NOTE if you adjust the table calc settings of this field after adding to the filter shelf, you’ll need to reselect the option to filter to True.

As this is a table calculation, the ‘filter’ is applied later in the order of operations, so information about the other rows in the table can be referenced. Filtering just by Period as a quick filter, is essentially a dimension filter and that happens earlier on in the process, meaning the data about the other rows would be inaccessible.

So we have all the fields, now build the cards.

Building the KPI Cards

On a new sheet, double click into Columns and type in MIN(0). Repeat this 2 more times. This gives us 3 axis to build each of the 3 cards.

On the All marks card, add Period to the Detail shelf. Show the pSelectedPeriod parameter. Add the Filter Selected Period to the Filter shelf and set to true (adjusting the table calc and resetting the filter value as required).

Change the mark type of the All marks card to shape and select a transparent shape (see here for more details).

On the first MIN(0) marks card, add Curr Sales, Prev Sales, Diff Sales %, Diff Sales Indicator +ve and Diff Sales Indicator -ve to the Label shelf. Remember to apply the table calc settings for all of the fields!

Adjust the text within the label, so it is formatted and positioned as required and then align middle centre.

On the middle MIN(0) marks card, do similar by adding the equivalent profit fields onto the Label shelf, and then repeat again for the bottom MIN(0) marks card, adding the quantity fields to the Label shelf.

On the All marks card, add Previous Period, Curr Sales, Curr Profit, Curr Qty Prev Sales, Prev Profit, Prov Qty, Diff Sales %, Diff Profit % and Diff Qty % to the Tooltip shelf (remember to set those table calc settings!) Then adjust the Tooltip to display the text as required. I used the ruler to shift the starting position, along with tabs (the tab keyboard button) to ‘try’ to get everything to align, and it works for most circumstances….

Finally format the KPI card

Set the worksheet background colour to light grey
Remove all gridlines, zero line, axis lines
Set the column divider to be a thick white line
Set the row divider to be a thick white line
Hide the axis

Add the sheet onto a dashboard, and you should be done.

My published viz is here.

Happy vizzin’!

Donna

Let’s Analyse New Product Sales

October 5, 2023 Donna ColesLeave a comment

For the challenge this week, Sean wanted us to identify the sales of ‘new’ products compared to previous years. This challenge was born out of a requirement he had at work, where he discovered the definition of a ‘new product’ was different to what he’d assumed.

In this instance, the ‘client’ only cared about comparing sales in the same month across the years. Any sales related to the products outside of the specific month were irrelevant and were to be ignored. A product was then counted as ‘new’ in the first year (of the specified month) that there was a sale.

In this case we are always looking at the month of September (the previous month to ‘today), so if Product X was sold in Sept 2021 and Sept 2023 only, then Product X would count as new in Sept 2021, and only the sales for Product X in Sept 2021 would be included in the subsequent values displayed.

Now I thought I’d nailed my understanding of the requirement, and went ahead building out a tabular view of the data using table calculations, but my final output did not match Sean’s result – in fact it was quite way off. I stepped through my logic, even sharing some samples with Sean to sense check, and we seemed to concur on the products identified as new, so I was incredibly puzzled as to what was causing the value discrepancy.

I messaged my friend, Rosario Gauna, to share my numbers and see what she came up with…. she matched Sean! Doh! I then had to spend time dissecting my solution and Rosario’s to see which products I was/wasn’t counting. Eventually I found the problem….

In using table calculations, I had included all the fields the viz required which included the Segment (Home Office, Corporate, Consumer). This meant that in my logic to identify a ‘new product’, I was counting the product as new in the first year that the product was sold in each Segment – ie if Product X sold in Sept 2022 against Segment Home Office and then sold in Sept 2023 against Segment Corporate, I counted both sales as being ‘new’ as the segment differed. However I only wanted the Sept 2022 sale to count as ‘new’.

With this understood, I tried to revise my initial table calc build, but to no avail. So I opened up a new workbook and started again, this time taking a different track. This is what I’ll blog, but I felt it important to explain how even a requirement I thought I’d understood, could still get mis-interpreted.

Building out the calculations

The first requirement is that we’re only looking at comparing the data for the previous month to ‘today’. If this was being developed for a live business application we’d reference the TODAY() function, but since I want this viz to continue to display on Tableau Public after we hit 2024, then I’m going to use a parameter to hardcode ‘today’.

pToday

date parameter defaulted to 4th October 2023

From this, I can then determine the month we need to compare

Month to Compare

DATEPART(‘month’, DATEADD(‘month’, -1,DATETRUNC(‘month’, [pToday])))

working from right to left (inside to out), this takes ‘today’ and finds the 1st of the month (ie 1st October), then goes back 1 month (ie 1st September), then gets the month number ie 9.

As this returns a number, it automatically is listed in the ‘measures’ section of the data pane (under the line), but I dragged it into the top half as I want the number to be discrete.

I then created

Filter Month

DATEPART(‘month’, [Order Date]) = [Month to Compare]

which returns true for all the orders where the month of the Order Date is in September.

Let’s build up a table so we can start to see what we’re working with

Add Product Name and Filter Month to Rows, Order Date (at the Year level) to Columns and Sales to Text.

In the example above, 3-ring staple pack has sales in September 2022 and in other months in 2020 and 2021, but we don’t care about those months, so this product is counted as new in 2022, with a value of $11.

3M Hangers With… have no sales in September in any year, so we won’t be counting that product at all.

6″ Cubicle Wall Clock has sales in both September 2022 and 2023, so this product will be counted as new in 2022 with a value of $83.

So we can filter the date just to Filter Month = True (move Filter Month from Rows to the Filter shelf).

Now we want to identify the earliest year for each Product Name.

Year First Purchased in Month

{FIXED [Product Name],[Filter Month]: MIN(YEAR([Order Date]))}

Add this onto Rows, and you can see below we’re picking up the right year.

Now we can identify the value of the sales we want to count.

New Product Sales

IF [Year First Purchased in Month] = YEAR([Order Date]) THEN [Sales] END

Only get Sales for the year that matches the minimum year. Format this field to $ with 0 dp.

Adding this in to the table, we can see we only have a value against the first year.

So we’ve identified the sales values of the products we care about. We can now aggregate this at a higher level, and incorporate the other dimensions.

Remove Sales from Text, add Category to Rows and Segment to Columns. Remove Product Name and Year First Purchased in Month.

The viz displays the % of total new product sales by Category per Segment. Right click on the New Product Sales pill and add Quick Table Calculation, selecting Percent of Total. Right click on the pill again and select Compute Using -> Category. Right click on the pill and Format and format to % with 0dp. Add another instance of New Product Sales back into the table.

Now we want to be able to compare the sales for the current year/category/segment against that of the previous year.

Previous Year Sales

LOOKUP(SUM([New Product Sales]),-1)

This looks up the New Product Sales value of the previous (-1) field. Add this field into the table, and right click and Edit Table Calculation. Adjust so the calculation is computing by Year of Order Date only.

You should be able to see that the values from the previous year for each Segment & Category are now displayed against each cell. As there is no data for 2019, there are no values listed for this field against the 2020 fields.

Now we have the values we want to compare in the same ‘row’ of data, we can compute

% Diff From Previous Year

(SUM([New Product Sales]) – [Previous Year Sales]) / [Previous Year Sales]

custom format this to ▲0%;▼0% (use this site to get the images to copy & paste), and add this into the table ensuring the table calculation is set to compute using Year of Order Date only, as you did above.

And now we have all the components needed to build the viz.

Building the bar chart

Duplicate the table sheet, then apply the following steps

Move % of Total New Product Sales to Rows
Move Category to Colour and adjust accordingly
Move New Product Sales and % Diff from Previous Year to Label
Remove Previous Year Sales
Add another instance of % of Total New Product Sales to Label ( I tend to do this by pressing ctrl, and then clicking on the pill in the Rows and dragging onto Text – this creates a duplicate of the pill and preserves the table calculation settings. Otherwise add an instance of New Product Sales to Label, then add the Percent of Total Quick Table Calculation and ensure the pill is set to compute by Category.)
Move Segment to be in front of the Year(Order Date) field on Columns – if this changes the viz display, ensure the mark type is set to bar.

Tidy up the display by

removing all gridlines/zero lines etc
remove row dividers
Adjust title of the y-axis
Hide the Segment/Order Date column heading
Hide the tooltips

The text and formatting of the labels also needs to be adjusted. We don’t want the ‘vs PY’ text displaying for the 2020 years when there is no difference from previous year.

Label PY Text

IF NOT ISNULL([Previous Year Sales]) THEN ‘vs PY’ END

Add this to the Label shelf and ensure the table calculation is set to compute using YEAR of Order Date only. Then adjust the layout of the label text and the format the size of the text.

Now you can just add to a dashboard!

Happy vizzin’!

Donna

Can you build a heat map with bathymetry lines?

May 18, 2023 Donna Coles2 Comments

What??? That was the first thought that went through my mind when Luke set this challenge. Bathy..huh… ??? What’s that all about. Well read the challenge to find out more 🙂

The what?? was quickly followed by errr…..? The indicator on the challenge overview page said ‘hard’ and this was a Luke challenge after all, so certainly not for the faint hearted! I wasn’t sure how this was going to go, so just started with a basic tabular view and went from there. (hint – it might be worth reading the blog to the end before you start building.. it could save some time 😉 )

I quickly built a basic heat map. I added Order Date at the discrete (blue) month level to Columns and Sub-Category to Rows. I added Sales to Colour and changed the mark type to square then added Sales to Label.

We ultimately need the cumulative Percent of Total Sales per Sub-Category, to display as the label. Click on the Sales pill on the Label shelf and Add Table Calculation. From the dialog window, choose Running Total and verify the calculation is computing Table (across) (or amend and select Specific Dimensions and ensure Month Order Date is selected). Then check Add secondary calculation and in the secondary calc dialog, select Percent of Total, again ensuring Table (across).

If applied correctly, the values in the December column should all be 100%.

I wanted to ensure this calculation was stored so I could reuse, so while pressing shift key, I dragged the Sales pill from the label shelf into the data pane, and renamed the calculation

Cumulative % of Total

RUNNING_SUM(SUM([Sales])) / TOTAL(SUM([Sales]))

Format the field to be % with 0 dp

Editing the field should show the details above. Of course, you can just create the calculated field manually and type in the syntax. If you do that, then replace the Sales field on the label field, with this one. Centre align the text.

With this field, we can create the calculation Luke provides for the colouring

Colour

ROUND([Cumulative % of Total]*50, -1)

Replace the Sales field on the Colour shelf with this field, and adjust the colour to use the Blue sequential colour palette. You should now have the basic structure and colouring of the heat map.

Adjust the font style and alignment of the Sub-Category and Order Date Month label headings, and change the month labels to be abbreviated. Hide the Order Date column label and the Sub-Category title (right click hide field labels for columns/rows). Remove all gridlines, row/column dividers etc.

Looking good.. but what next.. how to get those divider lines…. I wasn’t too sure at this point, but I knew I had to identify the cells when a ‘change in colour’ happened both horizontally and vertically.

So the first thing I did was to duplicate the above sheet into a basic crosstab (right click sheet > duplicate as crosstab), and I removed Cumulative % of Total from the display, so I just had the values used for the colour.

I decided I wanted to flag each cell with a 1 or 0 depending on whether the next cell was different or not. I started horizontally. So for each row, and starting with January, I wanted to compare the colour value for Jan with the colour value for Feb. If they were the same, I wanted to record 0 against Jan. If they were different I wanted to record 1 against Jan. I used the following calculation

Horizontal – Next Value Diff

IF ([Colour] = LOOKUP([Colour],1) OR LAST()=0) THEN 0 ELSE 0.99 END

If the Colour value for the current cell (eg Jan) matches the Colour value of the next cell (eg Feb) OR, the current cell is the last month (ie Dec), then return 0 otherwise there is a mismatch so return 0.99. Originally I used 1, but later found I had to adjust the calc to make the line show as I wanted.

Add this field to the tabular display and verify the table calculations used within the field are running Table (across).

You can see that for Accessories, the Colour value for Feb is not equal to the Colour value for Mar, and so the Horizontal – Next Value Diff value for Feb is 0.99. Whereas as the Colour value for Apr matches May, the Horizontal – Next Value Diff for Apr is 0.

I created a similar calculation to test the vertical settings

Vertical – Next Value Diff

IF ([Colour] = LOOKUP([Colour],1) OR LAST()=0) THEN 0 ELSE 1 END

This is essentially exactly the same calculation, but when added to the tabular view, the table calculation for the Vertical – Next Value Diff calc should be set to Table (down) instead

You can see that in Feb, the Colour value for Paper is equal to the Colour value for Phones, so the Vertical – Next Value Diff calc for Paper is 0. But in Mar, the Colour value for Storage is different from the Colour value for Supplies, and so the Vertical – Next Value Diff calc for Storage is 1.

Using these markers, I now want to ‘plot’ them on a viz.

So back to the heat map sheet we built above, we need some axis.

Double click into the Columns shelf and type MIN(1.0) and change the mark type to bar.

You can see a hole has appeared. To fix this, on the Analysis menu, select Infer Properties from Missing Values. The cell will populate.

Confession – I didn’t get this bit initially. I had the chart built with the lines in the correct places, but with a hole. I’d tried all sorts of combinations of ZN, and LOOKUP(expression,0) in calculations to try to make the number appear, but couldn’t get anything to work. I showed my colleague Sam Parsons, who mentioned the above setting. I have honestly never ever used it and was completely unaware of it’s existence. But it solved the problem, so massive hi-five to Sam 🙂

Double click into the Rows shelf and again type MIN(1.0). Having an axis on both rows and columns, means we can now fix the Size of the bar. Click on the Size shelf and select Fixed, width in axis units to 1 and alignment right.

Now we have both an x and y axis, we can add additional detail to get the white bar dividers displaying.

First, adjust the label so it is aligned middle centre.

Add Horizontal – Next Value Diff to Columns. Ensure the table calc settings are Table (across).

Remove both the Colour and the Cumulative % of Total fields from the marks card, and then add Horizontal – Next Value Diff as a discrete (blue) pill to Colour. Remove all gridlines, zero lines, axis ticks and row/column dividers. Change the Size to be Manual and reduce size. Make the chart dual axis and synchronise the axis.

You should have something similar to the above, assuming the colours for the Horizontal – Next Value Diff are distinguishable enough.

I adjusted the colours to set 0 to be a transparent colour (refer to this blog to learn how to add a transparent colour hex code into your custom colour palette, and then 0.99 to be white. Edit the x-axis and fix to be from 0 to 1. The white dividers between the relevant months should now be noticeable.

To set the dividers between the Sub-Category, I decided I just needed to use a constant 0 reference line. For this I needed

Vertical Ref Line

IF [Vertical – Next Value Diff] = 1 THEN 0 END

Add this to the Detail shelf on the All marks card, and verify the table calc setting for the Vertical – Next Value Diff calc is set to Table (down).

Right click on the y-axis and Add Reference Line. Set it per cell using the Vertical Ref Line field, with no label or tooltip displaying. Set the Line properties to be white, 100% opacity and a thick width.

Then edit the y-axis to also be fixed between 0 and 1. The lines separating the Sub-Category should now be prominent.

Then it’s just a case of tidying up – hide the axis, stop tooltips from displaying, and then add to the dashboard, setting as fit to entire view.

If need be you may want to tweak either the size of the bar on the Horizontal – Next Value Diff marks card, or change the thickness of the reference line to get similar sizes.

I have no idea when I’d ever need to use this type of display, but I enjoyed the puzzle and discovered something new which is what I love about using Tableau!

This version of my published viz is here.

BUT….

after writing all this up, and stepping away for a bit, I suddenly realised I’d over-complicated things! I didn’t need a dual axis after all. I could just use a reference line to show the dividers between the months, exactly like I did for the 0 constant reference line – doh! So I made some adjustments..

Amend

Horizontal – Next Value Diff

IF ([Colour] = LOOKUP([Colour],1) OR LAST()=0) THEN 0 ELSE 1 END

Then create

Horizontal Ref Line

IF [Horizontal – Next Value Diff] = 1 THEN 1 END

Then remove the Horizontal – Next Value Diff field from the Rows shelf, so there is no longer a dual axis. Add Horizontal Ref Line to Detail , and display the x-axis MIN(1.0) axis, and add a reference line per cell which references Horizontal Ref Line. Apply the same settings to the reference line as detailed above.

And this resulted in a much simpler looking viz

I know I could reduce the number of calcs used, but I like to have ‘building blocks’ to follow my thought process. I have published the simpler version within the same workbook on a separate tab, here.

Happy vizzin’!

Donna

Can you use Tableau to estimate Aaron Judge’s home run trajectories?

October 23, 2022 Donna ColesLeave a comment

It’s community month still for #WOW2022, and this week saw Samuel Epley set this challenge to visualise the home run trajectories of Aaron Judge.

I had a little mini-break to Rome this week, so was hoping I was going to be able to get this week’s challenge done and dusted on the Tuesday evening if it landed early enough, as I wasn’t going to be around.

It did land on the Tuesday for me, but wow! it was not going to be easy! I managed to build the KPIs & the scatter plots on the Tuesday evening, and knowing I didn’t have much time, just chose to use the Home Runs stats data set only. I knew these charts weren’t going to need any data densification, so found this approach simpler.

I’m afraid I’m still constrained by time at the moment, so this post isn’t going to be the detailed walkthrough you might usually expect – sorry! I’m just going to try to pull out key points from each chart.

KPIs

I built this on a single sheet, using Measure Names and Measure Values.

I used aliases on the Measure Names (right click -> Aliases) to change the label you can see displayed ie the Distance pill is aliased to ‘Average Distance’

I also custom formatted the various numbers and applied suffixes to display the unit of measure

Note – to To get the degree symbol, I typed Alt+ 0176

Scatter Plots

I built the Exit Velocity by Distance scatter plot first, and completed all the formatting & tooltips. Then I duplicated the sheet to form the basis of the other scatter plots, and just swapped the relevant pills as needed.

For the ball shape, I loaded the provided images as custom shapes into my shapes repository. I then just created the following calculated field to use as a discrete dimension I could add to the Shape shelf

Ball Shape

[HR Number]%9

It’s not as completely randomised as perhaps it should be, but it looks random enough on the display.

The Pitcher in the data is in the format <Surname>, <Forename>, but on the tooltip it needs to display as <Forename> <Surname>, so I just used a transformation on the Pitcher field to split the field based on the comma (right click Pitcher -> Transform -> Split). This automatically created 2 fields I could use on the Tooltip.

I also noticed a very subtle wording change in the tooltip based on whether the match was Home or Away. If Home, the tooltip read ‘New York Yankees vs. <Opposition>’ otherwise it read ‘New York Yankees at <Opposition>’. I used a calculated field for this logic

TOOLTIP: vs or at

IIF([Location]=’Home’,’vs.’, ‘at’)

The Trajectory Plot

OK, so this was the hardest part of this challenge, and mainly due to getting your head round the physics involved, as so many of the calculations are dependent on each other.

I’m generally pretty confident with my maths, but this was complex, especially with the force calculations for the y-axis. Samuel stated that both gravity and drag impacted the Y-axis calcs, but it wasn’t clear to me how both these forces should be applied (a bit of trial and error and I ended up adding them within the formula).

By the time I came to tackle this challenge, Samuel had already posted a video walkthrough, which can be viewed here and is another reason why I’m not going down to the nth degree in this post.

My suggestion is to watch Samuel’s video and/or feel free to download my workbook. I built my workbook independent of Samuel’s video, so there may be steps/calculations that differ.

However, I have tried to number my calculations in the order in which I created them, so you can hopefully follow the thought process. I have also left a CHK:Data sheet in the workbook, which I used to sense check what I was doing.

All the table calculations in the CHK:Data sheet are just set to the default ‘table down’ as I have filtered the sheet to a specific Home Run (HR Number = 1) only (ie I didn’t change any of the table calc settings as I added the pills to the sheet).

However, when you build the main trajectory chart, you have multiple HR Numbers in the view, so all the table calculations must be set so that calculations are only working for each HR Number. This means that any table calc (and any nested calculations) need to have all the fields except HR Number checked

When using the Pages shelf, which isn’t something I’ve ever really had to do before, you need to Show History and adjust the various settings to get the trail lines to show

To rotate the ball (the bonus option), you need another field to use on the Shape shelf. I had lost the will to live a bit by this point, so used the formula from my friend Rosario Gauna’s solution.

Rotation Shape

STR(IIF([14-Start Position Y m] <= 0, 0,
(MIN([Time Interval]) * 1000 / 25) % 9))

Note – when you add this to the Shape shelf, and select your baseball palette, just then use the Assign Palette button to automatically assign a ball to a number – this will get them into the correct order, without you having to do it one by one.

Finally, when adding the reference average lines, be sure to set the scope to per pane rather than table, otherwise you’ll end up with the wrong figures.

I think I’ve pretty much covered all the ‘little’ points that I came across that may trip you up, aside from all the tricky calcs of course!

My published workbook is here. I hope what I’ve written is enough for you to build it yourself. I think I’d still be here next year if I tried to do anything more fully! I’m off for a lie down now!

Happy vizzin’!

Donna

Donna + DataViz

ramblings in all things Tableau (and occasionally some other Stuff….)

Tag: LOOKUP

Can you visualise yearly rank change in Sub-Category sales?

Can you recreate this marginal histogram?

How are EPL Football Teams performing this season compared to previous seasons?

Can you design a multi-dimensional Panel Chart for detailed analysis?

Can you filter a small multiples chart?

Can you visualise when it gets busy?

Can you create these KPI Cards?

Let’s Analyse New Product Sales

Can you build a heat map with bathymetry lines?

Can you use Tableau to estimate Aaron Judge’s home run trajectories?