Can you create a waffle chart?

December 30, 2022 Donna ColesLeave a comment

Kyle Yetter ended the #WOW2022 year with this challenge, to create a waffle chart. I haven’t built one for a while, so needed a quick google to refresh my memory.

You’ll need a ‘template’ data source which defines the grid pattern layout, and what each square represents. Every square represents a percentage point from 1% to 100%, and is positioned in a row numbered from 1 to 10 and a column numbered from 1 to 10. This information is stored in the ‘template’. You can build your own, or you can use the waffle_template.xlsx file I created that is stored here.

Connect to the Waffle_Template excel file and make Row and Column discrete dimensions (just drag them to above the line on the left hand data pane). Then add Row to Rows and Column to Columns and Percent to Text.

Building the waffle

Now connect to the Orders data from the SuperStoreSales data source. In that data source, create a new field

Consumer %

ROUND(SUM(IF [Segment]=’Consumer’ THEN [Sales] END) / SUM({FIXED:SUM([Sales])}),2)

If the Segment is Consumer, get the sum of Sales as a proportion of the total Sales. Round the result to 2 decimal places.

Create similar fields for the other segments

Corporate %

ROUND(SUM(IF [Segment]=’Corporate’ THEN [Sales] END) / SUM({FIXED:SUM([Sales])}),2)

Home Office %

ROUND(SUM(IF [Segment]=’Home Office’ THEN [Sales] END) / SUM({FIXED:SUM([Sales])}),2)

Then in the Waffle_Template datasource, create the following field

Colour : Consumer

[Orders (Sample – Superstore)].[Consumer %]>=SUM([Percent])

The above calculation will return true when the % Sales for Consumer is greater than or equal to the Percent value.

Repeat and create equivalent fields for the other segments

Colour : Corporate

[Orders (Sample – Superstore)].[Corporate %]>=SUM([Percent])

Colour : Home Office

[Orders (Sample – Superstore)].[Home Office %]>=SUM([Percent])

Add Colour : Consumer to the Colour shelf, and adjust colours. If you get any warning messages about blending, just acknowledge.

To make the blocks, double click in the Columns shelf and type in MIN(1). Change the mark type to bar. Edit the MIN(1) axis, and fix it from 0 to 1. Remove the Percent field from the Text shelf.

Set the row and column dividers to be thick white lines, and hide the column and row headings and axis (uncheck Show Header). Uncheck Show Tooltip on the Tooltip shelf.

Name this sheet Consumer.

Then duplicate this sheet, and replace the Colour : Consumer pill on the Colour shelf with the Colour : Corporate pill. Adjust colours to suit. Name the sheet Corporate.

Then duplicate again, and repeat the process for the Colour : Home Office pill, naming the sheet Home Office.

Building the KPIs

On a new sheet, using the Orders – SuperStoreSales data source add Segment to Columns.

Create a new field

Percent of Total

ROUND(SUM([Sales]) / TOTAL(SUM([Sales])),2)

format this to percent with 0 dp and add this to the Text shelf.

Create a new field

Segment UPPER

UPPER([Segment])

and add this to the Text shelf.

Format the text and align centrally

Remove the row dividers, the Segment column labels, and uncheck Show Tooltips from the Tooltip shelf. Name the sheet BANs.

Creating the dashboard

Use a horizontal container to position the three waffle charts in. Ensure each chart is set to fit entire view. Hide the titles, and from the context menu on the container, distribute the contents evenly. Use padding on the objects to create more white space.

Add the BANs object underneath, remove the title and set to fit entire view. Adjust the heights and widths as required to ensure the waffles are spaced evenly and are square, and the BANs line up underneath.

My published viz is here.

Happy vizzin’!

Donna

Can you track headcount?

April 7, 2022 Donna ColesLeave a comment

Luke decided to set us a challenge this week based on Human Resources (HR) data, using a dataset provided as part of the #RWFD Real World Fake Data project managed by Mark Bradbourne.

The challenge focussed on reporting monthly headcount, using a dataset that contained 1 row per employee with a column relating to the start date (Hire Date) and a column relating to the date the employee left (Termdate), which could have been a future date or NULL/blank.

I’ve worked on headcount reporting before in my day job, but have used a snapshotted data source, which captures the active employees as at midnight on the 1st day of every month. This method makes counting the headcount each month very simple, as it’s just counting the number of rows per snapshot date.

Obviously, this wasn’t an option in this case. Luke gave very little clues as to what approach would be required, apart from saying ‘you will also need to manipulate the data’….

A word about the data…

Just a little side note here…. I ended up in discussions with my fellow #WOW participant Rosario Gauna as after attempting one method, I wasn’t getting as high numbers as the solution originally suggested. There were rows in the data set that had NULL values for most of the data except job title, but I think this was only around 225 rows, so even including them I didn’t get close to the figures. Rosario however, was somehow managing to see approx 15,000 rows of null data. I ended up building and publishing my initial solutions based on a version of the data sent to me directly by Rosario.

Since then, it seems others were also confused, and so the solution was adjusted to ignore all the null rows. As a result I reworked one of my solutions to use the original data source I downloaded from the site, and it’s this I will focus on in the blog.

A word about my solutions…

With little instruction, it took a bit of head-scratching to figure out how to work through the problem. I initially decided to use a scaffold dataset to help. I built a simple file in Excel which contained 1 column listing 1 date per month which went from 01 Jan 2017 to 01 Jan 2021. I then joined this to the HR data in Tableau Desktop using the physical data layer (ie I did not use relations), using join calculations as below (note, this is using the amended copy of data I received where Termdate was already defined as a date field.

The solution using this method is published here.

However, I didn’t feel completely comfortable with this approach. Luke had suggested the difficulty of this was 9/10 and referred to nested tableau calculations… the above solution just used quick table calcs. So I decided to see if I could come up with an alternative, just using the provided data, and that’s what I’ll document below.

Setting up the data

I downloaded the Human Resources.csv file from the link provided, and connected to it in Tableau Desktop. I found the Termdate field presented as a string field, so I converted it to a date, just by clicking on the ABC symbol of the data type against the field. I also added a data source filter to exclude all records where the Id field was NULL.

I then added a new data source, and connected to the same file again. This meant I had 2 instances of the Human Resources data source listed in my Data pane. I renamed the 2nd instance to be Leavers Only, changed the data type of the Termdate field again and added a data source filter so that only records with non-Null Termdate values were retained.

Note – What I will be doing here is using the Human Resources data source to manage the months we’re reporting headcount over. This method only works if there is at least 1 new starter every month, which in this case there is.

Building the calculations

The requirements state that a new starter doesn’t count in the monthly headcount figures until the following month, so the date field we need to use in the output, needs to be created as

Report Date

DATE(DATETRUNC(‘month’,DATEADD(‘month’,1, [Hire Date])))

This basically shifts the date the new starter joined to the 1st day of the following month.

To make the calculations easier to read, I also created

Starters

COUNT([2022_04_06_WW14_Human Resources.csv])

which is literally a reference to the field that automatically gets added as part of the connection – it’s just a bit of a mouthful as its named according the data connection.

Let’s pop these out in a table as below – I’m deliberately using the dateparts of the Report Date field as they’ll be used in the viz like this.

So this is just showing the new starters against the month at which they will count for headcount. We need to get details of leavers now, and we’ll do this by blending.

When I blend data sources, I tend to create specific fields so it’s clear to me that they’re being used for blending, rather than defining blend relationships between differently named fields. So in the Human Resources data source I’ve created

BLEND – Date

[Report Date]

and then in the Leavers data source, I’ve created

BLEND – Date

DATE(DATETRUNC(‘month’,DATEADD(‘month’,1,[Termdate])))

which is shifting the Termdate forward to the the 1st of the next month, since leavers should also only be recorded in the following month.

In the Human Resources data source I then created

Leavers Per Month

ZN(COUNT([2022_04_06_WW14_Human Resources – Leavers Only].[2022_04_06_WW14_Human Resources.csv]))

which is just referencing the automatically generated ‘count’ field from the Leavers data source. I’ve wrapped it in ZN so 0 is reported in the event no match is found.

Add Leavers Per Month into the table, and ensure the linking field from the secondary Leaver data source is connected on the BLEND – Date field

So now we know how many starters and leavers per month, we now need to total these up by generating a cumulative running sum of the starters, but subtracting any leavers along the way.

Official Headcount

([Starters] + PREVIOUS_VALUE(0)) – [Leavers Per Month]

Takes the value of the Starters in the current row, adds it to the value of Official Headcount from the previous row, then subtracts any Leavers recorded against the current row. Add this onto the table, and as its a table calculation, edit it so it is explicitly computing my the month & year of Report Date

For the annual change, we’re looking to compare the difference of the Official Headcount value for the current month eg Jan 2021, with the Official Headcount value for the same month 1 year ago eg Jan 2020.

Annual Change

(ZN([Official Headcount]) – LOOKUP(ZN([Official Headcount]), -12)) / ABS(LOOKUP(ZN([Official Headcount]), -12))

Take the Official Headcount from the current row, subtract the Official Headcount from 12 rows before, then divide the result by the Official Headcount from 12 rows before.

Format this to a percentage at 1 dp. (For some reason my numbers here don’t seem to match Luke’s revised solution…).

Again add to the sheet and set the table calcs of both the nested calcs to compute by month and year

The final measure was a bit of a strangely named one IMO. I felt I should be doing some rolling calculation, but ultimately, it just seemed to be reporting the difference between the headcount now and that 12 months ago ie the numerator of the calculation above.

Net Rolling 12 Change

(ZN([Official Headcount]) – LOOKUP(ZN([Official Headcount]), -12))

Pop this into the view, adjust the table calc settings again

Building the viz

On a new sheet, add Report Date at the month level, but then set to be continuous, to Columns and Report Date at the year level to Colour. Add Official Headcount to Rows (adding the linking field for the blend when prompted), and adjust the table calc settings to compute over both month & year, ensuring the Year of Report Date is listed first.

Then in the legend, select all the years from 2000-2017 and ‘hide’ (this is probably a cheat way, but the quickest – we need to retain the data from the previous years, so can’t just ‘filter’).

Edit the axis to not start from zero. Adjust colours to suit.

Add Annual Change and Net Rolling 12 Change to the Rows shelf, adjust both of the table calc settings, so all nested calcs are computing by Year & then Month.

Edit the axis of both these additional measures to also exclude 0. On the All marks card, click the Label button and tick the Show mark labels option.

Final steps ….

…on the All marks card, click the Tooltip button and uncheck Show tooltips so no tooltips display on hover.

Right click on the Month axis and format the axis so the dates in the Scale section are displayed as abbreviated dates

Edit the same axis and remove the title, then format the whole chart to

remove all row and column dividers
to set the row banding with a band size of 1
adjust the colour of the row and column gridlines to be a slightly darker shade of grey

And you’re done 🙂 My published viz is here

Had a bit of all sorts this week… I’ll be intrigued to see how Luke solved it!

Happy vizzin’!

Donna

Can you build a Premier League table?

November 13, 2020 Donna ColesLeave a comment

Luke provided this fun challenge this week. It was nice to see an English sport that I’m very familiar with feature, even if I had to keep retyping the word ‘Draw’ for ‘Tie’ 🙂

This blog will cover

Re-modelling the data
Building the table & bar chart
Building the Last 5 matches chart
Getting the data for the Last 5 matches chart tooltip

Re-modelling the data

Sometimes the #WOW challenges are set with a very specific direction in mind eg no LODs, use 1 sheet only, no data modelling etc. This challenge had no real restrictions, and there were minimal clues. The Latest Challenges page indicated they’d be table calculations involved

and Luke’s tweet introducing the challenge suggested there was a ‘trick’ that would make this a lot simpler to solve…. hmmmm???

I looked at the data referenced to get a feel of the shape and layout – it showed 1 row per game, with the home & away team for each game stored in different columns. Trying to count the number of games a specific team played was likely to be complicated in this format. Given Luke had suggested a ‘trick’, I decided I was going to pivot the data to expand the data to give me 1 row per game per team, hoping this was the ‘trick’ he implied, and there was no requirement to state that data modelling wasn’t allowed.

I did this in Tableau itself, in the data source pane. After connecting to the data, highlight the columns Home Team and Away Team, right click and select Pivot.

This has the effect of duplicating the rows of data and adding 2 additional columns to the end Pivot Field Names containing the column names that had been pivoted, and Pivot Field Values containing the values of those fields.

I then simply renamed Pivot Field Names to Home or Away and Pivot Field Values to Team.

With the data structured this way it was very simple to calculate the various measures required :

Win-Loss-Draw

IF [Home or Away] = ‘Home Team’ AND FTR = ‘H’ THEN ‘W’
ELSEIF [Home or Away] = ‘Away Team’ AND FTR = ‘A’ THEN ‘W’
ELSEIF FTR = ‘D’ THEN ‘T’
ELSE ‘L’
END

FTR in the data set stands for full-time result and indicated whether it was the Home (H) or the Away (A) team that won, or whether it was a Draw (D). So this field indicates whether the Team in each row won, lost or drew (tied) the match.

From this we can then work out the points the Team gained in the match.

Points

IF [Win-Loss-Draw] = ‘W’ THEN 3
ELSEIF [Win-Loss-Draw] = ‘T’ THEN 1
ELSE 0
END

To determine the total number of Wins for each team I created

Wins

FIXED [Team]: SUM(IF [Win-Loss-Draw] = ‘W’ THEN 1 ELSE 0 END)}

and then repeated this similarly to get a Losses and Ties measure.

Finally I determined the total number of games each team played by using the provided tablename (count) field that is now added in more recent versions of v2020 (supersedes the Number of Records field).

Matches Played

{FIXED [Team]: (COUNT([2020_11_11_WW46_EPL.csv]))}

Building the table & bar charts

In my initial built, I created one sheet to show both the table summary with the bar chart, and a separate sheet to show the last 5 matches, with the intention the charts would be lined up side by side on the dashboard, removing padding to make the charts look to be one.

But with the table and bar on the same chart, I couldn’t get the labels of each column and the ‘Total Points’ to be aligned in the way displayed. So I ended up using 3 different sheets.

Table

Given we’ve already got all the measures, this was a very simple table, with the Team field sorted descending by the Points measure.

Bar Chart

Again this is a pretty basic chart with Team on Rows,(which is then hidden) and Points on Columns, and Win-Lose-Draw on Colour.

To label the end of the chart with the total points, I created an additional measure

Total Points

{FIXED [Team]: SUM([Points])}

which I added to Columns and made dual axis, with the Total Points being a Gantt type, and the mark being labelled.

Building the last 5 matches chart

This chart is slightly more complex, and where the table calculations clue comes into play.

In the data, a match for each Team is identified by the Date, but we can’t use Date to plot against as not everyone has their matches on the same date. We just want to number each match played by each team in order of the date played.

We can use a field Index = INDEX() to number our matches per team – you’ll see that some teams have played 8 matches while others 7.

But we only want the last 5 matches, which for some teams will be matches 3-7 and others 4-8.

To do this, we need to determine the number of matches played. In hindsight I should have realised, I’ve already got this in a variable, but as my head was in ‘table calc’ mode, I created a field Size = SIZE()

As you can see, this returns the number of rows for each team against every row for that team.

I then created another calculated field

Last 5 matches only

[Index] > [Size]-5

Adding this into the table, and verifying the table calc settings for both the nested table calcs are set as above, you can see the last 5 rows of each team are set to True

This can then be added to the Filter shelf to reduce the rows shown.

However, as some teams have played less matches than others, the Index values shown aren’t consistent for every team, and we need this.

So another field is required

Index to Plot

//want a value from 1 to 5 based on the index
[Index] – ([Size]-5)

This gives us the 1-5 sequence for each row we need to build the chart

The layout below is the basic premise.

The circles need to be coloured based on the Win-Loss-Draw field, and the labelled with the same field which is set to match mark colour. Reducing the transparency of the colour will fade the circle colour and make the label stand out.

Getting the data for the Last 5 matches chart tooltip

So just by pivoting the data, I was able to crack through the measures and the charts pretty quickly, and felt happy that I’d worked out Luke’s trick.

But then I came to adding the tooltip to the circle chart which needs information about the opposition, and the score.

The score is always displayed on the tooltip as

Team ‘s score – Opposition score.

For example when Arsenal played Sheffield United at Home where Arsenal won 2-1, the score is displayed on the tooltip as 2-1.

And when Arsenal played Manchester City Away, where Manchester City won 1-0, the score for Arsenal is displayed on the tooltip as 0-1.

So to get the team’s score I created

Team Score

IF [Home or Away]=’Home Team’ THEN [Fthg] ELSE [Ftag] END

Where FTHG is Full Time Home Goals and FTAG is Full Time Away Goals. So if the team is listed as the Home team, their score is identified by FTHG, but if they are the Away team, their score is identified by FTAG.

Correspondingly, we need

Opposition Score

IF [Home or Away]=’Away Team’ THEN [Fthg] ELSE [Ftag] END

If the team we’re interested in is listed as the Away Team, then the opposition will be the Home team, so FTHG identifies the goals they scored, otherwise the opposition is the Away team, so FTAG is the goals they scored.

Both of these fields may sound complicated, but it was retrievable in the data I had.

However the actual name of the Opposition team wasn’t – this was lost once I pivoted the data – the Opposition is now located on a completely different row of data and as we have no ‘match id’ or similar in the data, I can’t actually identify who the opposition might have been, since multiple matches can play on the same date.

This is where I started to question whether ‘my trick’ was the right one…

I had a bit of a think, and came up with a solution, but it was a bit ‘off piste’ for a #WOW challenge as it required blending… typically when blending is required they’d be some mention of it in the requirements, but since there was nothing stated to say it couldn’t be used, I went with it.

I added the original unpivoted data set to my workbook as an additional data source. I renamed this with the word ‘Home‘ on the end.

When I work with blending, I try to be specific which fields the data sources are going to be linked/blended with.

So in the pivoted data source I’d been working with for most of the challenge, I create a new field

BLEND – Team

[Team]

Then in the Home data source I’d added, I also created a field with the same name, but it contained a different field.

BLEND – Team

[Home Team]

I then added another instance of the original unpivoted data source to my workbook, this time renaming it with the word Away on the end. In this data source I created

BLEND – Team

[Away Team]

so from the 3 data sources, I could now build up a table to verify I could get to the data I needed:

Pivoted Data.Team
Pivoted Data.Date
Home Data.Away Team
Away Data.Home Team

and when blended to the secondary data sources, the links for both BLEND – Team and Date fields are enabled

From this data, we can see that either the Home Team column or the Away Team column is populated, and provides the name of the opposition. So this means we can then create in the primary data source (the pivoted data source)

Opposition

IF ISNULL(ATTR([2020_11_11_WW46_EPL – Home].[Away Team])) THEN ATTR([2020_11_11_WW46_EPL – Away].[Home Team])
ELSE ATTR([2020_11_11_WW46_EPL – Home].[Away Team])
END

If the Away Team is NULL, then store the Home Team value, else store the Away Team value

And from this we now have all the building blocks we need to finalise the Last 5 matches chart.

Just make sure that as you add fields to the chart, you may need to alter the various table calculations to ensure they are still computing correctly (typically all the fields other than Team should be selected in the table calc dialog). And also keep an eye on the fields being blended – BLEND – Team and Date should always be selected for both.

After I posted my solution up, I did indicate I thought I might have done something a bit ‘out of the ordinary’, but apparently my approach match Luke’s! That was a real surprise.

My published version is here.

Happy vizzin’! Stay Safe!

Donna

Can you build a Strava workout calendar?

October 16, 2020 Donna ColesLeave a comment

For 2020 Week 42, the #WoW founder, Andy Kriebel, returned with a challenge to reproduce the Strava training calendar. Compared to some challenges recently, this looked to be quite straight forward; Andy threw in some specific requirements to test certain features – ie no data modelling and no LoDs.

I’ve been doing #WorkoutWednesday challenges since they first started, so I know that Andy is a stickler for formatting and layout – points not necessarily listed as a requirement, just expected as part of the challenge to reproduce. I kept my fingers tightly crossed when I published that I’d got all the finer details, but alas, Andy still found fault – my month summaries weren’t right aligned (my bad – missed that little nuance completely), and my bars had borders on them… Andy must have eyes like Superman to have seen that, as it wasn’t obvious. It also wasn’t a setting I’d intentionally added. I later found out that adding a particular type of pill to the Detail shelf caused borders to automagically be added… There’s always something to learn when Andy’s about!

So onto the challenge – as with previous weeks, I’m going to try to focus on the areas that may be a bit trickier / newer to some rather than detail the complete build step by step.

Using the data sets – blending
Building the calendar grid
Ensuring a 0 measure value is displayed for missing days
Adding the monthly hours summary
Building the BANs
Year Filter control
Remove highlighting
Setting the colour of the Calendar chart background

Using the data sets – blending

Andy was very specific that the 2 data sets provided should be used separately and not joined in any via the data pane.

This meant the data sources would need to be blended (further detail on this is here). Blending used to be one of the only ways within Tableau you could combine data together.

When blending, the number of rows in your output will never be more than the number of rows in your primary* data source. If there are multiple matching rows in the secondary data source, then the results will be aggregated in the display.

* whatever data source the first pill you add to your canvas comes from, will be the primary, and is denoted by a small blue icon by your data source. Secondary data sources are denoted by a small orange icon.

In the case of this challenge, we had a data set containing a list of dates (1 row per day from 01 Jan 2014 up to 31 Dec 2021), along with Andy’s Strava activity, containing a row for each activity recorded, which included the date time the activity occurred. This data could vary in that there could be multiple activities on the same day, and equally days when no activity occurred at all.

So the Calendar data set is our primary data source, as we need to show a bar on the calendar chart for every day of the year, regardless if there’s any activity. The Activity data set is our secondary data source. The number of hours, number of activities etc can all be aggregated from this data set.

When blending data sources, especially on dates, I prefer to create explicit calculated fields that define the fields I want to blend on. So in the Calendar data source I created

BLEND: Date

[Date]

essentially just a duplicate of the existing Date field, and in the Activity data source, I also created

BLEND: Date

DATE([Date Time])

Note the fields are spelled exactly the same, so Tableau automatically uses them as the linking fields when the view is built.

If you now do the following

Add the Calendar.BLEND: Date field to the Filter shelf, and select the Year = 2020,
Add Calendar.BLEND: Date as an exact date to Rows
Add Activity.BLEND: Date as an exact date to Rows
Add Activity.Seconds to Text

You can see that fields from the secondary data source have an orange icon by them; and that there are Null/missing values for the records from the secondary data source as these were the days when there was no activity recorded. You can also see a red link icon against the BLEND: Date field in the left hand data source pane, as this identifies how the two data sets are being matched.

Building the calendar grid

The calendar is essentially a ‘small multiple’ layout with each month being positioned in a particular row or column. To build out this layout we need to define the row number and the column number. There are many ways to build a dynamic small multiple grid which can flex based on the number of items you might be trying to organise, but for the purpose of this exercise, we can keep it simple. We’re working with 12 months that are to be displayed in a 4 x 3 grid layout. Create the following calculated fields in the Calendar data source.

Rows

IF MONTH([Date])<=4 THEN 0
ELSEIF MONTH([Date]) <=8 THEN 1
ELSE 2
END

Cols

(MONTH([Date])-1)%4

I make both of these to be dimensions rather than measures by dragging them above the line on the left hand data source pane. If you build out the view as below, you can see how these calcs are working

As we want to show a mark for every day in the month, we need to add the day of the month from the Calendar data source to Columns. Drag Calendar.BLEND: Date to Columns, then select the drop down to change to the Day date part

We need to show the amount of time in hours rather than seconds. In the Activity data source, create the field

Hours

([Seconds]/60)/60

and drag this onto the Rows, and change the mark type to bar. If need be re-add the YEAR(BLEND: Date) = 2020 to the Filter shelf. Now add Calendar.BLEND: Date as an exact date to the Detail shelf. You should now have

where you can see the gaps in the days where no activities took place, and if you hover vertically, you should find that the days of the month are vertically aligned – ie 30th Jan aligns with 30th May etc.

Ensuring a 0 measure value is displayed for missing days

With the above we displayed the Activity.Hours field, but if you hover over the day when there is no activity, nothing displays on the tooltip rather then 0.

To fix this, create a calculated field in the primary Calendar data source

Hours

ZN(SUM([Sheet1 (Activities Summary)].[Hours]))

This is basically just referencing the field in the secondary blended data source, but wrapping in a ZN() function means it will display 0 when no match can be found

Use this field from the primary data source instead on the calendar viz.

Adding the monthly hours summary

The requirements meant Andy expected the summary to be displayed within the same sheet as the daily calendar viz.

For this I used an old friend MIN(0) to create another axis, which is placed on the Rows in front of the Hours measure.

What I now plan to do is set this axis to be Text and plot the month, monthly hours, and the word ‘hours’ at a specific point to the right of the each cell – I’m choosing day 28 – you might want to experiment and choose a different day.

First up though, I need to build some fields to plot.

Month Name Abbrev

IF DAY([Date]) = 28 THEN
UPPER(LEFT(DATENAME(‘month’,[Date]),3))
END

Hours in Month

IF MIN(DAY([Date])) = 28 THEN
WINDOW_SUM([Hours])
END

LABEL: Hours

IF DAY([Date]) = 28 THEN ‘HOURS’ END

Month

DATE(DATETRUNC(‘month’,[Date]))

Add Month as an exact discrete date to Detail and the other 3 fields to the Text shelf of the Min(0) marks card (change the mark type to Text if you haven’t already done so). Alter the table calculation setting of the Hours in Month field to compute by all fields except Month

Building the BANs

These use a similar concept as above, by using 3 instances of MIN(0) placed side by side on the Columns shelf and set to the Text mark type. This creates 3 marks cards which you can then add the relevant measures and text on.

The measures are all coming from fields in the primary data source that reference measures in the seconday data source ie

# Activities (in Activity data source)

COUNT([Activity ID])

# Activities (in Calendar data source)

ZN([Sheet1 (Activities Summary)].[# Activities])

#Miles (in Calendar data source)

ZN(SUM([Sheet1 (Activities Summary)].[Miles]))

Year Filter Control

All the sheets you are building need to be filtered by the same BLEND: Date field from the Calendar data source (set the filter to Apply to all worksheets).

When this field is added to the dashboard, you can customise it so the All values does not show and the slider control also doesn’t display

Remove highlighting

To stop items on the dashboard from highlighting when they are clicked on, I use a trick that has been probably been the ‘most used trick of #WOW2020’ 🙂

In the primary data source, create a field called True which contains the value TRUE and a field False containing the value FALSE. Add both these fields to the Detail shelf of each sheet you don’t want highlighting on.

On the dashboard, create a Filter URL action for the each sheet that goes from the sheet on the dashboard to the sheet itself, and passes selected fields setting true = false. As this condition will never be true, then there is nothing to ‘filter’ so the marks don’t highlight. This needs to be repeated for each sheet on the dashboard, so I had 3 filter dashboard actions.

NOTE – a consequence of adding the True and False fields to the Detail shelf on the bar sheets, was that it caused a border to be added around the bars.

This wasn’t something I noticed, as it isn’t at all obvious, but Andy called it out!

Setting the colour of the Calendar chart background

You need to format the sheet and set the fill colour of the Pane rather than the whole sheet to grey.

There’s obviously a lot of other formatting settings to apply to get rid of all the row/column borders and gridlines etc, but this was a slight difference that I wanted to call out, as ended up with a ‘border’ on my dashboard that wasn’t required when I set the whole worksheet background.

Right, I think that’s about it for this week! Thanks for the fun challenge Andy – great to have you back!

My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you calculate weekday run rate?

April 25, 2020 Donna ColesLeave a comment

This week’s #WoW2020 challenge was set by Meera Umasankar who once again was tackling the concept of ‘run rate’, but this time with an added twist – only consider the working days (ie the typical Mon-Fri weekdays), rather than every day in the month, the assumption being, this ‘business’ does not trade on weekends. Following on from last week’s challenge, Meera also chose to include a bit of blending to combine actual orders against the plan/target. For this Meera provided a custom dataset which just included a sheet of Actuals by Region for each day in April 2020 up to 23rd April, and Plan by Region for the whole month of April.

As per usual I started by putting together a table of data with the core numbers I was going to need per region : MTD value, Run Rate value, Plan value.

Building the key data fields

Whilst Meera had provided data just for April up to April 23rd, I decided to build this in a way as if the data could change.

Today

{FIXED: MAX([Date])}

This stores the maximum date from the Actuals data source – ie 23 April 2020.

Current Month Only

[Date]>=DATETRUNC(‘month’, [Today]) AND [Date] <= [Today]

When true, this will just consider the records in the Actuals data source that are dated between 1st April & 23 April. As it happens, due to the data provided, this will be everything, but in a typical business situation, you’re actuals would probably contain previous months data too.

MTD

IF [Current Month Only] THEN [Sales] END

Only stores the sales for the month we want to report on.

To get the plan we need to blend to the Plan data source. As the data in the Actual data source is per day, and the Plan is per month, we need to blend the data at the month level. Whilst this can be set in other ways, I like to be explicit when using blending, so in my Actuals data source I created

BLEND – Date

DATE(DATETRUNC(‘month’,[Date]))

This stores the 1st day of the month (1st April 2020) against every row of data.

In the Plan data I created a similar field, which is just essentially a duplicate field of the existing Date field, but by having the same name, it allows the blend joins to be automatically picked up.

Blend – Date

[Date]

Ok, let’s get these 2 measures on a table, to sense check we have the right figures so far :

Add Region to Rows (from the Actuals data)
Add MTD to Text
Add Plan to Text (from the Plan data)
Ensure the blend join links on both Region and BLEND – Date are clicked (due to the minimal data we have, the blend on Region only will work, but it’s good practice to include the date blend too if the Plan data contained different months).
Apply formatting as required to the MTD & Plan numbers

Calculating the Run Rate

Meera defines the Run Rate as being the value of Sales expected to be received in the whole month (the end of month position/forecast), based on the rate of sales so far in the month. So we’re looking to work out average sales made per day, then extrapolate that across the number of days in the month.

However, the twist in this challenge, is to only give consideration to the number of weekdays (ie working days).

As with many things, I chose to use my best friend ‘Google’ to see if it would throw up anything that may help this requirement, and it did, very quickly. There is an existing Tableau KB article that describes exactly how to work out the number of weekdays between 2 dates. You can find it here.

To work out the Run Rate I need

to work out the average Sales per weekday so far
multiply that by number of weekdays in the month

So I need to work out the number of working days between 1st April and 23rd April, and also the number of working days between 1st April & 30th April. I need a fair few calculated fields for all this, which I’ll build up rather than combine altogether.

Start of Current Month

DATETRUNC(‘month’,[Today])

simply truncates to 1st of month.

End of Current Month

DATEADD(‘day’, -1, DATEADD(‘month’, 1, [Start of Current Month]))

Adds 1 month onto start of month, then takes off 1 day to get the last day in the month

Following the steps in the article, I need to adjust these dates if they happen to fall on a weekend.

Start of Current Month (shift to weekday)

IF DATEPART(‘weekday’, [Start of Current Month]) = 1
THEN DATEADD(‘day’, 1, [Start of Current Month])
ELSEIF DATEPART(‘weekday’, [Start of Current Month]) = 7
THEN DATEADD(‘day’, 2, [Start of Current Month])
ELSE [Start of Current Month]
END

If the Start of Current Month lands on a Saturday or a Sunday, the start is shifted forward to the following Monday.

End of Current Month (shift to weekday)

IF DATEPART(‘weekday’, [End of Current Month]) = 1
THEN DATEADD(‘day’, -2, [End of Current Month])
ELSEIF DATEPART(‘weekday’, [End of Current Month]) = 7
THEN DATEADD(‘day’, -1, [End of Current Month])
ELSE [End of Current Month]
END

If the End of Current Month lands on a Saturday or Sunday, the end is shifted back to the previous Friday.

#Weekdays in Month

MIN(
(DATEDIFF(‘day’, [Start of Current Month (shift to weekday)], [End of Current Month (shift to weekday)]) + 1) – (2 * DATEDIFF(‘week’, [Start of Current Month (shift to weekday)], [End of Current Month (shift to weekday)])))

This is working out the number of days between the adjusted start & end dates, then adding 1 to this number. It then works out the number of weeks between the adjusted start & end dates, multiples by 2 (since in every week there are 2 weekend days), and then this number is subtracted from the first.

We then need to repeat this to work out the working days from start to today.

Today (shift to weekday)

IF DATEPART(‘weekday’, [Today]) = 1
THEN DATEADD(‘day’, -2, [Today])
ELSEIF DATEPART(‘weekday’, [Today]) = 7
THEN DATEADD(‘day’, -1, [Today])
ELSE [Today]
END

This is our end date, so the date is once again shifted back to the previous Friday if it happens to be a Saturday or Sunday.

# Weekdays from start to Today

MIN(
(DATEDIFF(‘day’, [Start of Current Month (shift to weekday)], [Today (shift to weekday)])+ 1) – (2 * DATEDIFF(‘week’, [Start of Current Month (shift to weekday)], [Today (shift to weekday)])))

Now we have these values, we can work out

Run Rate

(SUM([MTD])/[# Weekdays from Start to Today]) * [# Weekdays in Month]

Format this and add to check table

Building the chart

The left hand side of the chart is all text, but to present it as required, we need a fake axis.

From the Actuals data source, add Region to Rows
In the Columns shelf type in MIN(0) to create the fake axis.
Change the Mark Type to Text
Add Region, MTD, Run Rate from the Actuals data source to Text shelf
Add Plan from the Plan data source to the Text shelf (don’t forget to check the blend links)
Make each row bigger if everything all seems a bit squashed.

We’ll come back to formatting these fields later. Let’s now get the bar displayed & target displayed. This is a dual axis combining a bar and a gantt chart. Add to the chart as follows

Add Run Rate to Columns
Change the Mark Type of this measure to Bar
Remove all fields apart from Run Rate from the Label shelf of this card
Change Alignment of the Label to be left aligned
Add Plan to Columns (check the blend links)
Change Mark Type to be Gantt and remove fields from Label shelf of this card
Set to Dual Axis and Synchronise Axis.
Remove Measure Names from Colour shelf of the Bar and Gantt marks cards
Change Colour of the Gantt Bar (Plan) to black and add a black border to make it a bit thicker
Turn off Tooltips on All marks cards.

Indicating if Plan isn’t going to be met

The bar chart should be red if the Run Rate is less than Plan. The Run Rate on the Text side should also be displayed in red too if it doesn’t meet and black otherwise. We’re going to need some additional fields for this.

Run Rate < Plan

[Run Rate] < SUM([Plan (2020_04_22_WW17_Sales Projection)].[Plan])

Add this to the Colour shelf on the Bar marks card, and adjust the True/False colours accordingly

We can’t conditionally format an individual field in a Text display, so we need to create 2 further instances of the Run Rate field, where only one will ever display.

Run Rate < Plan (red)

If [Run Rate < Plan] THEN [Run Rate] END

Run Rate > Plan (black)

If NOT([Run Rate < Plan]) THEN [Run Rate] END

Format these accordingly, then add to the Text shelf of the Text marks card. Remove the original Run Rate field. You should still only have 1 run rate value displayed per row.

Now we can tidy up the display of this text. Ensure the Run Rate < Plan (red) and Run Rate > Plan (black) fields are on the same line of text with no spaces between, then colour the fonts to match the requirements

Finally, remove axis/row headers, tidy up gridlines etc, and adjust the width of the bars to suit.

Title Sheet

As the title needs to include the date, and to ensure it would be dynamic, I created a simple text sheet to display the title, and set the worksheet background to a light grey.

I then added both sheets to a dashboard, with both set to ‘Fit Entire View’, and titles hidden.

To get the Phone Layout display, I then selected the Phone option, clicked the padlock to Edit layout, and set to Fit all, and made adjustments to suit. The issue you might have though is that while things all look a bit squashed on your laptop display, it actually will render ok when published. This can unfortunately be a bit of trial & error.

My published viz is here.

Happy vizzin’ and stay safe!

Donna

Can you show the adjusted target and missing pipeline?

April 17, 2020 Donna ColesLeave a comment

For #WOW2020 Week 16, Lorna set a slightly different challenge that involved data blending. Blending is a technique in Tableau used to combine data from different data sources. You can read more about it here.

Lorna’s scenario is quite a common one – you have a data source which stores some ‘actual’ data (that in a typical scenario is likely to change as you move through the year), along with a more static data source, storing plan/budget/target data for each month. This is typically created at the start of the year and rarely changes. Comparing actuals to target is a very common business requirement.

Once again, I’m going to tackle this challenge but working out all the numbers I need for each month in a tabular format, before I go onto build the viz.

Building out the data

For this challenge we have 2 data sources, the pipeline data containing multiple years and the target data just containing data for 2020. So the first this we need to do is add a filter for Closed Date from the Pipeline Data source to be the Year 2020.

The data has been specially crafted as if it’s at a particular point in time in April, in my case at the point of building it was 15 April 2020. If this was being built for a real life scenario, we’d want to be reporting based off the Today() function. To simulate this, I created a calculated field to hardcode my ‘today’ date, but if I was doing this ‘for real’, I’d have set it to TODAY().

Today

#2020-04-15#

I need to be able to report the Pipeline Data that is at Stage=Closed Won separately from data that is still in the pipeline (hasn’t been closed as won or lost). I’ll use some calculated fields for this

Closed Won

ZN(IF Stage = ‘Closed Won’ THEN [Sales] END)

Note – the ZN will display as 0 if there is no Sales.

Pipeline

IF [Stage]= ‘Negotiating’ OR [Stage] = ‘Proposing’ THEN [Sales] END

Let’s start to build the table out:

Month of Closed Date on Rows
Closed Won and Pipeline on Cols (as Measure Values)
Year of Closed Date = 2020 on Filter Shelf

Let’s now add in the target from the Target Data source. This will be a blend. When we blend we need to define how to ‘join’ the data sources together. I prefer to make it obvious what fields I am blending on, so although I can use existing fields and define a rule, I prefer created explicit calculated fields so it’s clear.

The Target Data contains a record for each month, dated as per the 1st of each month. In the Target Data, create a new field

BLEND – Month

[Date]

The in the Pipeline Data, create a field named exactly the same

BLEND – Month

DATE(DATETRUNC(‘month’,[Closed Date]))

but in this case we’re truncating the Closed Date to the 1st of each month, and ensuring it too is a Date rather than Datetime data type, so the fields can match.

Now add Target from the Target Data onto the table. If you get a warning message, click ok, then click the ‘link’ symbol that is currently greyed out against the Blend – Month field in the Target Data.

The ‘link’ symbol will go red and indicate that the data is being ‘joined’ on this field. The Target values in the table will now match the values if you check the data source excel file directly, and the Target pill in the Measure Values will show a ‘database’ symbol with an ‘orange tick’ which indicates it’s from a secondary data source. The data sources listed in the Data pane (top left) will also be coloured blue (primary) and orange (secondary), which indicates data blending is being used.

We now need to start working out how much off the YTD target we are so far, so we then work out how much pipeline is potentially missing from each future month.

So first up, how much has been closed won so far this year (only considering complete months). Ie how much has been won in Jan, Feb & March?

YTD Closed

WINDOW_SUM(SUM(IF [Closed Date] < DATETRUNC(‘month’, [Today]) THEN [Closed Won] END))

If the Closed Date is before the 1st of the current month (ie April in this example), get the Closed Won value already computed, but SUM all the values we have for all the months.

Add this onto the table, and you can see the total of the Closed Won values for Jan, Feb & Mar is listed against every month.

The table calc has automatically computed ‘table down’, but I’m, going to explicitly set it, as I know I’m going to move the fields around later, and I don’t want that value to change based on where it gets moved to,

Right click on the YTD Closed pill -> Edit Table Calculation and check Month of Closed Date

We need to work out how much we should have closed in the first 3 months of the year too. So in the Target Data, create a similar calculated field

YTD Target

WINDOW_SUM(SUM(IF [Date] < DATETRUNC(‘month’, [Today]) THEN Target END))

(Note – a Today calculated field also hardcoded to 15th April 2020 needs to be added to this data source too).

Add this field into the table too, and again, set the table calculation to be explicitly set against Month of Closed Date.

In order to work out how much missing pipeline to add to each month, we need to figure out how far we’re currently ‘off’, and then distribute this value across the remaining months.

I’m doing all this in steps, so I can sense check the calcs as I go. We can work out how much we’re off by creating a new field in the Pipeline Data

Missed Sales Value

[Monthly Target (2020_04_15_WW16_Sales Pipeline)].[YTD Target] – [YTD Closed]

Basically this is YTD Target – YTD Closed, but when you refer to a field from the secondary data source, the field will be prefixed by the data source name.

Add this to the table, and again verify the table calc is set explicitly.

As this is a field based on other table calcs, you will see them listed as Nested Calculations, and you need to verify each one listed is set appropriately.

To work out how many months in the year are remaining that we need to distribute the above value over, we need

Remaining Months

12 – (DATEPART(‘month’, [Today]) -1)

As Today is in April, which is month 4, then the remaining months is 12 – (4-1) = 9.

An now we can work out how much needs to be added per month

Distributed Missed Sales Value

[Missed Sales Value] / [Remaining Months]

Pop this onto the table, and verify the table calc again.

Now we have this, we can work out what the Target needs to be adjusted to for each of the remaining months to make up the shortfall, which is basically adding the monthly shortfall above to the existing Target for the month (but only for the current and future months).

Adjusted Target

IF MIN([Closed Date]) >= DATETRUNC(‘month’, [Today]) THEN SUM([Monthly Target (2020_04_15_WW16_Sales Pipeline)].[Target]) + [Distributed Missed Sales Value] END

Note – we wrap Closed Date in a MIN function as we’re working with aggregated fields, so the Date needs to be aggregated too. MAX would work just the same.

Finally we need to work out what the shortfall is in the existing Pipeline to meet the Adjusted Target (if there is any).

For the months beyond the current month, this is simply the difference between the Pipeline value and the Adjusted Target (but only if the Pipeline is less than the Adjusted Target). For the current month though, it’s the difference between the Pipeline + Closed Won values and the Adjusted Target.

Missing Pipeline

IF ZN(SUM([Pipeline])) = 0 THEN NULL

ELSEIF DATETRUNC(‘month’, MIN([Closed Date])) = DATETRUNC(‘month’,[Today]) THEN
//it’s current month, so need to consider what’s closed & what’s remaining
IF (SUM([Closed Won]) + SUM([Pipeline])) < [Adjusted Target]
THEN ZN([Adjusted Target] – (SUM([Closed Won]) + SUM([Pipeline])))
END

ELSEIF SUM([Pipeline]) < [Adjusted Target] THEN ZN([Adjusted Target] – SUM([Pipeline]))

ELSE 0
END

Add this onto the table

And we’ve now got all the pieces we need to start to build the viz. Name this sheet Check Data or similar. We want this as our reference sheet to make sure our figures remain correct.

Building the Bar Chart

Firstly, duplicate the table viz, and remove the fields we don’t need in the final display (YTD Closed, YTD Target, Missed Sales Value, Distributed Missed Sales Value).

Now move the pills as follows :

Closed Date from Rows to Columns
Measure Values from Text to Rows
Measure Names from Rows to Colour shelf
Change Mark Type to Bar

Now move Adjusted Target and Target to the Detail shelf.

Adjust the colours of the remaining measures to suit, and reorder, so that the bars a stacked with Closed Won on the bottom and Missing Pipeline on the top.

Before we deal with the target lines, we’re going to sort the Tooltip out. It’s quite tricky… it might be there’s a better way, but I had to create a few custom calculated fields to get the display required.

Creating the Tooltip

For the first 3 months, the tooltip just needs to display the Closed Won value, but from April onwards, we need to display values for Closed Won, Pipeline & Missing Pipeline, even if the values are 0. Also the first 3 months just show the Target, but the remaining months need the Adjusted Target too. These values are displayed with | symbols in between along with labels, which should only show if relevant.

Firstly, we need to make sure all the measure values displayed, are accessible regardless as to which bar we hover over. So all of the 3 measures (Closed Won, Pipeline & Missing Pipeline) need to be added to the Tooltip. This is done by holding down Ctrl as you drag each pill from the Measure Values area onto the Tooltip shelf. This has the effect of duplicating the pill, and retaining any table calc settings that have been applied.

We only want the text ‘| Adjusted Target :’ to display if there is an Adjusted Target value :

Tooltip : Adjusted Target

IF [Adjusted Target] > 0 THEN ‘ | Adjusted Target : ‘
END

Add this to the Tooltip shelf.

We only want the text ‘| Pipeline :’ to display if there is a Pipeline value

Tooltip : Pipeline

IF [Pipeline] > 0 THEN ‘ | Pipeline : ‘
END

Add this to the Tooltip shelf.

And we only want the text ‘| Missing Pipeline:’ to display if we’re in the current or future months.

Tooltip : Missing Pipeline

IF DATETRUNC(‘month’, [Closed Date]) >= DATETRUNC(‘month’,[Today]) THEN ‘ | Missing Pipeline : ‘ END

Add this to the Tooltip shelf.

Now modify the Tooltip so the various pills are referenced and formatted as required

Finally adjust the Month axis, to set the months to be displayed as abbreviated values.

Adding the Target lines

At first glance, you might think the two target lines are both reference lines. However, if you hover over the tooltip of the Target (the solid line), you’ll see you have the same tooltip as the bars. Whilst there is some ability to control the tooltip of a reference line now, you can’t reference all the pills this tooltip requires.

So the Target is actually a dual axis mark. The Adjust Target however, is a reference line.

To get the Target to display, hold ctrl & drag the Target pill from the Detail shelf to the Rows shelf (to duplicate the pill), next to Measure Values.

On the Target marks card,

Remove Measure Names from the Colour shelf
Change the Mark Type to Gantt
Change the Colour to black, and add a black border too (to make the mark thicker)
Make the chart Dual Axis and Synchronise Axis
Uncheck Show Header on the Target axis

If you hover over the Gantt mark/Target line, you should have the same tooltip as when you hover over the bar.

The Adjusted Target is a reference line. To add this, right click on the left hand axis and Add Reference Line. Adjust settings as follows :

Scope – per cell
Value – Adjusted Target
Label – None
Tooltip – Custom, set to ‘Adjusted Target (Dashed) :’ then add Value from the selector
Change the Line to be black and dashed

Both target lines should now be displayed. It’s just now a case of applying some formatting to remove gridlines, row & column lines, adjust font sizes and remove axis title and column titles.

Building the legend

The dashboard displays a custom colour legend. As always there are multiple ways to do this. I chose to ‘fake it’ using aliases and some values associated to a completely different and unused dimension in the data.

Duplicate the Opportunity Name dimension. I just left it as Opportunity Name (copy). On a new sheet, add Oppotunity Name (copy) to the Filter shelf, and select 5 values only.

Then right click on Opportunity Name (copy) and select Aliases. For each of the values you selected in the filter, set an alias based on the legend names to display

Then build the legend as follows

Add Opportunity Name (copy) to Columns
Type in MIN(1) to the Columns shelf to create a fake axis
Add Opportunity Name (copy) to the Text shelf
Add Opportunity Name (copy) to the Colour shelf
Fix the axis of Min(1) to start at 0 and end at 1
Reorder the displayed values to suit.
Format to remove all rows/column lines and hide the headers.
Format the Label to be centred and size font
Clear the tooltip.

Note – I chose to copy the Opportunity Name pill just to make sure I didn’t inadvertently break anything, and to easily revert if things didn’t go to plan :-).

Now the 2 sheets can be placed on the dashboard along with a suitable title.

One final tip – to prevent the user from inadvertently clicking on the legend viz when on the dashboard, add a floating blank image and position over the top of the legend.

My published viz is here.

Happy vizzin’ & stay safe!

Donna

Donna + DataViz

ramblings in all things Tableau (and occasionally some other Stuff….)

Tag: Blending

Can you create a waffle chart?

Can you track headcount?

Can you build a Premier League table?

Can you build a Strava workout calendar?

Can you calculate weekday run rate?

Can you show the adjusted target and missing pipeline?