# Can you visualise daily and weekly sales in the same view?

This week, another guest poster, Jami Delagrange, provided the #WOW2020 challenge. And I’m not gonna lie, I found this pretty tough.

The challenge suggests you can build the data model yourself, and I thought about doing this, but then figured I’d solve the visualisation challenge itself before looking back at the data model… and I’m glad I did, because it took some time to get through the challenge (and I haven’t attempted to build the model – I’ll wait for Jami’s solution for that).

So what were the areas I struggled with….

Initially it was the mark type, and how to get the ‘bars on bars’, in such a way that the fat weekly bar was sat in the middle of the thin daily bar. I tried a variety of things and spent a lot of time, with no success, before I gave up and had to look at Rosario Guana‘s solution. The answer Gantt bars. But I’d already attempted them, and hadn’t made much progress, and even after trying again having seen that Gantt;s were needed, I was still puzzled. So I had to have another look, and realised that my problem was that I was trying to plot a measure to make a vertical axes (ie put a measure on the rows), and that wasn’t needed.

So now armed with those 2 bits of information – I needed Gantt bars and set the measures for the dual axis on the Columns – I could start figuring out all the pieces I needed.

Within this blog, I’ll focus on

• Calcs required for plotting the Weekly Sales data
• Calcs required for plotting the Daily Sales data
• Building the Gantt chart
• Comparison filter
• Event Group filter

Weekly Sales data

There isn’t a single ‘launch date’, across all the events, but the charts all start at the ‘same point’, so we can’t be plotting the date on the axis. Instead we need to calculate the week no relative to the launch date per event, using

Week of Launch Date

DATE(DATETRUNC(‘week’,[Launch Date],’Monday’))

(the requirements state the week should start on a Monday)

Week of Actual Date

DATE(DATETRUNC(‘week’,[Actual Date],’Monday’))

Week No From Launch

DATEDIFF(‘week’, [Week of Launch Date], [Week of Actual Date]) + 1

This returns an integer starting from 1, and is essentially counting the weeks from launch up until the week of the Event itself.

For the weekly data, we also need the ticket sales for the week, the cumulative weekly ticket sales (both displayed on the tooltip), the total sales and the % of sales per week (used to colour the mark, and displayed on the tooltip).

Ticket Sales

ZN([Sold Amount])

by wrapping this in ZN the field will return 0 in the event there are no tickets sales in a week. This means a mark will exist for every week.

For the cumulative ticket sales, I just used a Running Sum quick table calculation on Ticket Sales, so didn’t create a dedicated field, but if I had if would have been RUNNING_SUM(SUM([Ticket Sales])).

Total Sales Per Event

TOTAL(SUM([Sold Amount]))

% Total Sales

RUNNING_SUM(SUM([Ticket Sales])) / ([Total Sales Per Event])

This is formatted to a percentage with 1 dp.

If we put this all out into a table as follows, then you can see how the data all holds together

• Event Description (Dim Event) &Week No From Launch (blue discrete pill) on Rows
• Measure Names on Columns/ Measure Values on Text, filtered to measures : Ticket Sales, Ticket Sales (running sum table calc), Total Sales Per Event, & % Total Sales.. All table calcs are set to compute using Week No from Launch.

Daily Sales Data

For each week of sales, we need to plot 7 marks for each day of sales. Again we can’t plot using the actual date of sale. Instead we need to ‘index’ each day in the week – ie number each day from 1 – 7.

The data set already has a Day of Week field associated to Actual Date, but this returns 1 for a Sunday and 7 for a Saturday, but we need 1 to be associated to a Monday, so we just need to translate this field

Day No of Week

IF [Day of Week] = 1 THEN 7
ELSE [Day of Week] – 1
END

The days plotted are also coloured based on which ‘type of day’ they are

Type of Day

IF [Actual Date] = [Launch Date] THEN ‘Launch Date’
ELSEIF [Actual Date] = [Event Date (Dim Event)] THEN ‘Event Date’
ELSE ‘Regular’
End

Once again let’s plot the info we need into a table, so you can see what’s going on:

Put Event Description (Dim Date), Actual Date (exact date, discrete blue pill), Week No from Launch, Day Of Week Abbrev, Day No of Week, Type of Day on Rows and Ticket Sales on Text.

So these are the basic building blocks we need.

Building the Gantt Chart

For the ‘daily’ view, we need

• Event Description (DimEvent) on Rows
• Week No from Launch (discrete, blue bill) on Columns
• Day No of Week (Continuous, dimension, green pill) on Columns

By just placing these 3 pills in these locations, the mark type is ‘automatic’ but has immediately selected the ‘gantt bar’ type. Change it to be specific.

You can now add Type of Day to Colour and format the marks appropriately (change the colour, set a border around each mark).

For the ‘weekly’ view, we just need a way of creating a mark that ‘spans’ the 7 days, as we’ve already ‘segmented’ the display by Week No From Launch. I typed in MIN(0) to Columns for this

I then ‘typed’ in Min(1) on the Marks shelf and added it to the Size shelf

Editing the MIN(0) axis to fix it from 0 to 1, fills the space, and then adjust the Size of this mark to make it narrower.

If you now make this ‘dual axis’, you’ve got the view you need

And now it’s just a case of formatting the colours, the Tooltips for each mark, hiding axis headers etc.

Comparison Filter

This filter changes the order in which the rows are displayed.

This is driven by a Parameter which contains the values ‘Event Year’ and ‘YoY Event’ and a calculated field

Display

IF [Comparison] = ‘Event Year’ THEN LEFT([Event Description (DimEvent)],4)
ELSE RIGHT([Event Description (DimEvent)],2)
END

Add the Display to the Rows in front of the Event Description (DimEvent) field and hide it.

Event Group Filter

Another parameter Event Group is created which I’ve set to store integers but display as a string

I also then created

Event Group Filter

([Event Group] = 1 AND QUARTER([Event Date (Dim Event)])<=2) OR ([Event Group] =2 AND QUARTER([Event Date (Dim Event)])>2)
OR [Event Group]=0

which is added to the Filter shelf and set to true.

And hopefully that should be all you need to get this challenge sorted. My published viz is here, and if you download, I also have other sheets which try to breakdown some of the information.

Happy vizzin’! Stay safe!

Donna

# Can you show the Year-on-Year Trend?

It’s Community Month over at #WOW HQ this month, which means guest posters, and Kyle Yetter kicked it all off with this challenge. Having completed numerous YoY related workbooks both through work and previous #WOW challenges, this looked like it might be relatively straight forward on the surface. But Kyle threw in some curve balls, which I’ll try to explain within this blog. The points I’ll be focussing on

• YoY % calculation for colouring the map
• Displaying the circles on the map
• Restricting the Date parameter to 1st Jan – 14th July only
• Showing Daily or Weekly dates on the viz in tooltip
• Restricting to full weeks only (in weekly view)

YoY % calculation

The data provided includes dates from 1st Jan 2019 to 21st July 2020. We need to be able to show Current Year (CY) values alongside Previous Year (PY) values and the YoY% difference. I built up the following calculations for all this

Today

#2020-07-15#

This is just hardcoded based on the requirement. In a business scenario where the data changes, you may use the TODAY() function to get the current date.

Current Year

YEAR([Today])

simply returns 2020, which I could have hardcoded as well, but I prefer to build solutions as if the data were more dynamic.

CY

IF YEAR([Subscription Date]) = [Current Year] THEN [Subscription] END

stores the value of the Subscription field but only for records associated to 2020

PY

IF YEAR([Subscription Date]) = [Current Year]-1 THEN [Subscription] END

stores the value of the Subscription field but only for records associated to 2019 (ie 2020-1)

YoY%

(SUM([CY])- SUM([PY]))/SUM([PY])

format this to a percentage with 0 decimal places. This ultimately is the measure used to colour the map. CY, PY & YoY% are also referenced on the Tooltip.

Displaying circles on the map

This is achieved using a dual axis map (via a second instance of the Latitude pill on Rows). One ‘axis’ is a map mark type coloured by the YoY% and the other is a circle mark type, sized by CY, explicitly coloured black.

The Tooltip for the circle mark type also shows the % of Total subscriptions for the current year, which is a Percent of Total Quick Table Calculation

Restricting the Date parameter to 1st Jan – 14th July only

As mentioned the Subscription Date contains dates from 01 Jan 2019 to 21 July 2020, but we can’t simply add a filter restricting this date to 01 Jan 20 to 14 Jul 20 as that would remove all the rows associated to the 2019 data which we need available to provide the PY and YoY% values.

So to solve this we need a new date field, and we need to baseline / normalise the dates in the data set to all align to the same year.

Baseline Date

//set all dates to be based on current year
MAKEDATE([Current Year], MONTH([Subscription Date]), DAY([Subscription Date]))

So if the Subscription Date is 01 Jan 2019, the equivalent Baseline Date associated will be 01 Jan 2020. The Subscription Date of 01 Jan 2020 will also have a Baseline Date of 01 Jan 2020.

We also want to ensure we don’t have dates beyond ‘today’

Include Dates < Today

[Baseline Date]< [Today]

Add Include Dates < Today to the Filter shelf, and set to True.

Add Baseline Date to the Filter shelf, choose Range of Dates , and by default the dates 01 Jan 2020 to 14 Jul 2020 should be displayed

Select to Show Filter, and when the filter displays, select the drop down arrow (top right) and change to Only Relevant Values

Whilst you can edit the start and end dates in the filter to be before/after the specific dates, this won’t actually use those dates, and the filter control slider can only be moved between the range we want.

The Baseline Date field should then be custom formatted to mmmm dd to display the dates in the January 01 format.

Showing Daily or Weekly dates on the viz in tooltip

The requirements state that if the date range selected is <=30 days, the trend chart shown on the Viz in Tooltip should display daily data, otherwise it should be weekly figures, where the week ‘starts’ on the minimum date selected in the range.

There’s a lot going on to meet this requirement.

First up we need to be able to identify the min & max dates selected by the user via the Baseline Date filter.

This did cause me some trouble. I knew what I wanted, but struggled. A FIXED LOD always gave me the 1st Jan 2020 for the Min Date, regardless of where I moved the slider, whereas a WINDOW_MIN() table calculation function caused issues as it required the data displayed to be at a level of detail that I didn’t want.

A peak at Kyle’s solution and I found he’d added the date filters to context. This means a FIXED LOD would then return the min & max dates I was after.

Min Date

{MIN([Baseline Date])}

Note this is a shortened notation for {FIXED : MIN([Baseline Date])}

Max Date

{MAX([Baseline Date])}

With these, we can work out

Days between Min & Max

DATEDIFF(‘day’,[Min Date], [Max Date])

which in turn we can categorise

Daily | Weekly

IF [Days between Min & Max]<=30 THEN ‘Daily’ ELSE ‘Weekly’ END

We also need to understand the day the weeks will start on.

Day of Week Min Date

DATEPART(‘weekday’,[Min Date])

This returns a number from 1 (Sunday) to 7 (Saturday) based on the Min Date selected.

Using this we can essentially ‘categorise’ and therefore ‘group’ the Baseline Date into the appropriate week.

Baseline Date Week

CASE [Day of Week Min Date]
WHEN 1 THEN DATETRUNC(‘week’,([Baseline Date]),’Sunday’)
WHEN 2 THEN DATETRUNC(‘week’,([Baseline Date]),’Monday’)
WHEN 3 THEN DATETRUNC(‘week’,([Baseline Date]),’Tuesday’)
WHEN 4 THEN DATETRUNC(‘week’,([Baseline Date]),’Wednesday’)
WHEN 5 THEN DATETRUNC(‘week’,([Baseline Date]),’Thursday’)
WHEN 6 THEN DATETRUNC(‘week’,([Baseline Date]),’Friday’)
WHEN 7 THEN DATETRUNC(‘week’,([Baseline Date]),’Saturday’)
END

Ideally we want to simplify this using something like DATETRUNC(‘week’, [Baseline Date], DATEPART(‘weekday’, [Min Date])), but unfortunately, at this point, Tableau won’t accept a function as the 3rd parameter of the DATETRUNC function.

Let’s just have a look at what we’ve got so far

So now we have all this information, we need yet another date field that will be plotted on the date axis of the Viz in Tooltip.

Date to Plot

IF [Days between Min & Max] <=30 THEN ([Baseline Date]) ELSE [Baseline Date Week] END

If you add this field to the tabular display I built out above, you can see how the value changes as you move the filter dates to be within 30 days of each other and out again.

When added to the actual viz, this field is formatted to dd mmm ie 01 Jan, and then is plotted as a continuous, exact date (green pill) field on the Columns alongside the Daily | Weekly field, with State & Subscription on Rows. The YEAR(Subscription Date) provides the separation of the data into 2 lines.

Restricting to full weeks only (in weekly view)

The requirements state only full weeks (ie 7 days of data) should be included when the data is plotted at a weekly level. For this we need to ascertain the ‘week’ the maximum date falls in

Max Date Week

CASE [Day of Week Min Date]
WHEN 1 THEN DATETRUNC(‘week’,([Max Date]),’Sunday’)
WHEN 2 THEN DATETRUNC(‘week’,([Max Date]),’Monday’)
WHEN 3 THEN DATETRUNC(‘week’,([Max Date]),’Tuesday’)
WHEN 4 THEN DATETRUNC(‘week’,([Max Date]),’Wednesday’)
WHEN 5 THEN DATETRUNC(‘week’,([Max Date]),’Thursday’)
WHEN 6 THEN DATETRUNC(‘week’,([Max Date]),’Friday’)
WHEN 7 THEN DATETRUNC(‘week’,([Max Date]),’Saturday’)
END

so if the maximum date selected is a Thursday (eg Thurs 11th June 2020) but the minimum date happens to be a Tuesday, then the week starts on a Tuesday, and this field will return the previous Tuesday date (eg Tues 9th June 2020).

And then to restrict to complete weeks only…

Full Weeks Only

IF [Daily | Weekly]=’Weekly’ THEN
[Date To Plot]< [Max Date Week]
ELSE TRUE
END

If we’re in the ‘weekly’ mode, the Date To Plot field will be storing dates related to the start of the week, so will return true for all records where the field is less than the week of the max date. Otherwise if we’re in ‘daily’ mode we just want all records.

This field is added to the Filter shelf and set to true.

Hopefully that covers off all the complicated bits you need to know to complete this challenge. My published solution is here.

Happy vizzin’! Stay Safe!

Donna

# Can you find all counties within n miles of a selected county?

I’ve been on annual leave for a few weeks. I’ve managed to catch up on all the challenges but haven’t blogged a solution for a while. It’s been a real struggle to get back into things to be honest – back to work, back to school, back to football clubs for my kids, and I’m wondering where I found the time before ðŸ˜¦

Trying to think about when/if I was going to get this blog out has caused me a bit of stress, which I don’t need, and so I need to change my mindset a bit…try to relax a bit … if I don’t manage to post a blog, just accept it and move on. I’ve also got to try to reduce the time it takes me to blog. I’m a ‘detail’ person, so often end up documenting to such a low level, but for my own sanity, I’m going to have to make an effort to be a bit briefer. I can’t guarantee I’ll stick to this though.. will just have to see how things pan out.

Going forward, I’m going to try to focus on the points that I think are key to the challenge, or those I found a bit tricksy.

So onto this week’s challenge. Sean Miller returned as guest poster with a COVID-19 related distance challenge.

Note – I connected to the provided hyper extract file rather than the csv file.

The core points I’m going to discuss are

• Identify counties within n miles
• How to select a county without set actions
• How to handle ‘All’ counties being selected
• Colouring based on ‘percentile’ of hospital beds
• How to only show county borders of counties within selected range

Identify counties within n miles

The number of miles is stored within an integer parameter, n miles, that is defaulted to 100.

A county to act as the ‘start point’ is stored in a set, Selected County, based on the County, State field.

The long/lat coordinates of this selected county need to be captured.

Selected County Lat

{FIXED : AVG(IF [Selected County] THEN [Latitude] END)}

Selected County Long

{FIXED : AVG(IF [Selected County] THEN [Longitude] END)}

By using a FIXED LoD calculation, the values are stored against every ‘row’ in the data set.

With these, the starting point can be determined

Selected County Start Point

MAKEPOINT([Selected County Lat],[Selected County Long])

The position of every other county – the ‘destination’ / end point – also needs to be determined

End Point

MAKEPOINT([Latitude],[Longitude])

With these, the distance between can be computed

Distance

DISTANCE([Selected County Start Point],[End Point],’miles’)

And the counties can then be restricted by

Within n miles

[Distance]<= [n miles]

which can be used as a filter and set to True.

How to select a county without set actions

This is managed via the set control feature; right click on the Selected County set and choose Show Set to display the list of counties with the option to select which ones are in or out of the set. Change the display of the control via the dropdown arrow (top right) to be Single value dropdown which automatically provides as ‘All’ option or the ability to select a single set only.

How to handle ‘All’ counties being selected

When ‘All’ is chosen via the Set Control selector, this has the effect of adding all the State, County values into the set, which means we don’t really have a starting state. So the n miles parameter is essentially redundant. But we need to make the Within n miles filter understand this.

We can manage this by first identifying how many values we have in the set

Count Selected States

{FIXED : COUNTD(IF [Selected County] THEN [County, State] END)}

There will either be 1 or if All is selected, then they’ll be 3000+, and we use the FIXED LoD, so the total is stored against all rows.

We can then update our Within n miles filter to consider this value

Within n miles

([Count Selected States]=1 AND [Distance]<= [n miles]) OR [Count Selected States] > 1

This now returns true if either one State, County is selected and the other records are within n miles OR all the states are selected (the count is > 1).

Colouring based on ‘percentile’ of hospital beds

This caused me a little bit of headscratching. I assumed ‘percentile’ would be based on a percentage of the total num_staffed_beds (note I simply renamed this field to Hospital Beds), associated to the state,counties being displayed (ie within n miles). But after building the calculation I thought, and adding it to colour and choosing the Green-Gold colour range, I didn’t get the same spread as shown in the solution.

I messaged Sean to question whether I’d made the right assumption, but while waiting for a response, I did an online search for “Tableau Percentile Rank”, and quickly spotted that a Table Calculation exists. As you can guess, I don’t use this particular calculation very much at all ðŸ™‚

So to colour, add Hospital Beds , and simply add a Percentile table calculation

How to only show county borders of counties within selected range

When working with Maps, you can use the options within the Map Layers menu to select which features of the map should be displayed. One of these options is State/Province Borders, which you might think is what is needed.

However, this will show all the state borders, which is evident if you zoom out a bit, including those for states that don’t have counties within n miles.

This isn’t the requirement – we only want to show the borders of the states which have counties within the desired range. So instead, we don’t want to show any State/Province Borders via the Map Layers. And we’ll utilise a dual axis instead, by first identifying the states that are within n miles

States within n miles

If [Within n miles] THEN [State] END

Duplicate the Latitude (generated) field on the Rows, remove all the fields on that mark, and then add the States within n miles to the Detail shelf. Set the colour of this mark to white, and swap the Latitude (generated) fields if needed, so the States are on the top, as below.

Then make dual axis ðŸ™‚

And that’s the key points that I hope will help you solve this challenge. My published version is here, which is available to be downloaded. If there’s anything I haven’t covered that you’re not sure about, feel free to contact me.

Happy vizzin’! Stay safe!

Donna

# Is Profit Ratio Influenced by Quantity?

For this week’s challenge, Ann asked we recreated a scatter plot that ‘on click’ became a connected scatter plot and also displayed an ‘insights’ box to the user.

Building out the data

The scatter plot is to show Profit Ratio by Quantity, so first up we need

Profit Ratio

SUM([Profit])/SUM([Sales])

formatted to percentage with 0 dp.

A mark is shown on the scatter plot per Category and Month/Year. When working with dates like this, I often prefer to created a dedicated field to use as an ‘exact date’ which I can then format. So I built

Month Year

DATE(DATETRUNC(‘month’,[Order Date]))

and formatted it to the March 2001 format (ie mmmm yyyy).

We’re also going to need a set based on the Category field, as that is going to drive the interactivity to help us connect the dots ‘on click’. So right click on Category and Create Set and just tick one of the categories displayed for now.

Selected Category

Let’s build these out into a table as follows

• Order Date to Filter, filtered by years 2018 & 2019
• Category, Selected Category & Month Year (exact date , discrete) onto Rows
• Measure Names to Text and Filter to Quantity & Profit Ratio
• Right click on Selected Category and select Show Set to display a selection control to easily allow the set values to be changed

To help drive the interactivity, we’re going to need to store the value of the Profit Ratio for the Selected Category in the set.

Selected PR

IF ATTR([Selected Category]) THEN [Profit Ratio] END

Format this to percentage 0 dp too, and add this to the table – it will only display the profit ratio against the rows associated to the category in the set

At this point, we’ve got what we need to build out the scatter plot, but I’m going to continue to create the fields needed for the ‘Insights’ piece.

These are looking for the months that contained the Max & Min values for the Profit Ratio and Quantity. I’m going to tackle this with Table Calcs.

Max PR per Category

WINDOW_MAX([Profit Ratio])

Adding this to the table, and setting the table calculation to compute by Month Year only, gives us the Max Profit Ratio for each Category

We need 3 more fields like this :

Max Qty per Category

WINDOW_MAX(SUM([Quantity]))

Min PR per Category

WINDOW_MIN([Profit Ratio])

Min Qty per Category

WINDOW_MIN(SUM([Quantity]))

Add all these to the table, setting the table calc in the same way as described above.

Now we know what the min & max values are, we need to know the month when these occurred.

Max PR Month

WINDOW_MAX(IF ATTR([Selected Category]) AND [Profit Ratio] = [Max PR per Category] THEN MIN([Month Year]) END)

Format this to March 2001 format.

If the category is that selected, and the profit ratio matches the maximum value, then retrieve the Month Year value, and duplicate that value across all the rows (this is what the outer WINDOW_MAX does).

Add this to the table in the Rows, and adjust the table calc settings for both the nested calculations to compute by Month Year only. The month for the selected category which had the highest profit ratio will be displayed against all rows for that category, and null for all other rows.

Again we need similar fields for the other months

Max Qty Month

WINDOW_MAX(IF ATTR([Selected Category]) AND SUM([Quantity]) = [Max Qty per Category] THEN MIN([Month Year]) END)

Min PR Month

WINDOW_MAX(IF ATTR([Selected Category]) AND [Profit Ratio] = [Min PR per Category] THEN MIN([Month Year]) END)

Min Qty Month

WINDOW_MAX(IF ATTR([Selected Category]) AND SUM([Quantity]) = [Min Qty per Category] THEN MIN([Month Year]) END)

Format all these too, and add to the table with the appropriate table calc settings applied.

If you now change the value in the the Selected Category set, you should see the table update.

Building the Scatter Plot

On a new sheet, build out the basic scatter plot by

• Order Date to Filter, filtered by years 2018 & 2019
• Profit Ratio to Rows
• Quantity to Columns
• Category to Colour and adjust accordingly
• Month Year exact date, continuous to Detail
• Change Shape to filled circle
• Adjust the Tooltip to suit
• Change the axis titles to be capitalised
• Change the sheet title
• Remove all row/column borders & gridlines. Leave the zero lines and set the axis rulers to be a sold grey line

Add Selected PR to Rows, and change the mark type of that card to line, and add Month Year to path. In the image below, I still have Furniture selected in my set.

Add Month Year to the Label shelf too and format so the text is smaller and coloured to match mark colour

Make the chart dual axis, synchronise the axis and remove the Measure Names from the colour shelf of each mark. Hide the Selected PR axis.

If you now change the selected set value, you can see how the other categories ‘join up’.

Building the Insights Box

Take a duplicate of the data table built originally, and add Selected Category to Filter. If you’ve still got an entry in your set, this should reduce the rows to just those records for that Category.

Remove the fields Profit Ratio, Quantity, Selected PR from the Measure Values section, and remove Selected Category from Rows.

Create a new field

Index = Size

INDEX()=SIZE()

and add this to the Filter shelf, setting the value to true.

The INDEX() function will number each row, the SIZE() function will number each row based on the total number of rows displayed (ie all rows will display the same number), so INDEX()=SIZE() will return true for the last row only

Now we’ve only got 1 row displayed, we’ve got the info we need to build the data in the Insights box by

• Move Month Year to Detail
• Category to Colour
• Move Max PR Month, Max Qty Month, Min PR Month, Min Qty Month to Text
• Move Max PR per Category, Max Qty per Category, Min PR Per Category, Min Qty per Category to Text

This will probably show duplicate instances of the values. This is because we didn’t fix the table calculation setting of the Index=Size field. Edit that to compute by Month Year

and then re-edit the Index=Size filter to just be true again, and expand the display so you can see all the values.

Now you can easily format the Text to the required display

I used this site to get the bullet point image from

Building the dashboard & interactivity

Add the scatter plot to the dashboard, and remove the legends that automatically get added.

Float the Insights sheet and position bottom right, remove the title. fit entire view, and also remove the colour legend which will have been added.

Add a dashboard Set Action to assign values to the Selected Category set and remove all values when the selection is cleared

Additionally, add a highlight action that highlights the Category field only (this keeps the lines highlighted on selection of a single mark).

Now practice clicking on and off marks on the scatter plot and you should see your lines & insights box being added and removed.

Finally add the title and header line by adding a vertical container at the top above the scatter plot.

Add a text box for the title, and underneath add a blank object.

Set the padding of the blank object to 0 and the background to black. Then edit the height to 4, and adjust the height of the container if it’s now too large.

Make adjustments to the position of the floating insights box if need be so it isn’t overlapping any axis.

And that should be it! My published viz is here.

I’ll be taking a break for a couple of weeks, so will be on catch up when I return from some R&R.

Happy vizzin’! Stay Safe!

Donna

# COVID-19 New Case Trends Cartogram

The challenge this time was set by Luke using the data being collated via Tableau’s Covid-19 data hub.

This viz is essentially equivalent to a small multiple display where the charts for a specific dimension (in this case State) get displayed across numerous rows and columns. The difference here, is the row and column for the State to be displayed in, is specifically defined, rather than just sequentially based on how the data is being sorted.

Luke very kindly provided the logic to determine the rows and columns.

Building out the data

Once again, I’m going to start by putting all my data into a table so I can check my calculations, especially since this challenge does involve table calculations (there’s a hint on the Latest Challenges page)

Data Source Filter

Although not explicitly mentioned in the requirements, the information we need to present is based on the dates from 1st March 2020 to 31st July 2020. I messaged Luke to check this, before I then realised it was stated in the title of the viz – doh!

To make things easier, I therefore added a data source filter to remove all the other dates, setting the Report Date to range from 01 March 2020 to 31 July 2020 (right click on the data source -> add data source filter

I then added the basic fields I needed to a table

• Province State Name to Rows
• Report Date discrete, exact date (blue pill) to Rows, custom formatted to mmmm, dd
• People Positive New Cases to Text

I excluded the fields where Province State Name = Null

We need to calculate the 7 day rolling average per Province State Name which we can do by using the UI to create a Moving Average quick table calculation against the People Positive New Cases pill, and then editing to compute over the previous 6 records (+ the current record makes 7 days). But I want to be able to reference this field, so I’m going to ‘bake it’ into the data model by creating a specific calculated field

7 day moving Avg

WINDOW_AVG(SUM([People Positive New Cases Count]), -6, 0)

Add this into the table, and edit the table calculation to compute by Report Date only and set the Null if not enough values checkbox

Do a basic sense check that the averages are correct by summing up 7 sequential rows and working out the average by dividing by 7.

So it looks like we’ve got the core measure to be plotted, but we’re going to need some additional fields in the presentation.

Again it’s not explicitly stated in the requirements, but it is in the title, that the values plotted need to be normalised. This is to ensure the data for each state is visible; if a state with a relatively low number of cases is positioned in the same row as one with a very high number of cases, it will be hard to see the data for the state with the low cases, because while the axis can be set to be independent, this will only work against a row and not an individual instance of a chart.

To normalise, we need to understand the maximum 7 day rolling average for each state.

Max Avg Per State

WINDOW_MAX([7 day moving Avg])

Add this to the table, setting both the nested table calcs to compute by Report Date.

In normalising, what we’re essentially going to do is determine the 7 day moving Avg as a proportion of the Max Avg Per State, so every value to plot will be on a range of 0-1.

Normalised Value

[7 day moving Avg]/[Max Avg Per State]

Format this to 2 dp, and add to chart, remembering to check the table calcs continue to compute by Report Date only.

In the chart displayed, each State is titled by the name of the State. In a small multiple grid of rows & columns, we can’t use a dimension field for this, as it won’t appear where we want it. Instead we’re going to achieve this using dual axis, and plotting a mark at the centre point. For this we need to determine the centre date

Centre Date

FLOOR(DATEDIFF(‘day’,{FIXED:MIN([Report Date])},{FIXED:MAX([Report Date])})/2)
,
{FIXED:MIN([Report Date])}
)

This looks a bit complex, so I’ll break it down. What we’re doing is finding the number of days between the minimum date in the data set and the maximum date in the data set. This is

DATEDIFF(‘day’,{FIXED:MIN([Report Date])},{FIXED:MAX([Report Date])})

We’re then halving this value ( /2) and rounding it down to the nearest whole number (FLOOR).

We then add this number of whole days to the minimum date in the data set (DATEADD), to get our central date – 16 May 2020.

Now we need a point where we can plot a mark against that date. It needs to be above the maximum value in the main chart (which is 1). After a bit of trial and error, I decided 1.75 worked

Plot State

IF [Report Date] = [Centre Date] THEN 1.75 END

Finally we need to create our Rows and Columns fields which provides the co-ordinates to plot each state. The calculations for these were just lifted straight out of the requirements – thanks Luke!

Building the Viz

Start by adding the Rows and Columns fields to their respective shelf. Set them to be discrete dimensions (blue pills). You should immediately see a ‘map’ type layout of the US States.

Exclude the Null Rows value.

Now add Report Date as a continuous exact date (green pill) to Columns and Normalised Value to Rows, remembering to set the table calc to compute by Report Date only for all nested calculations. Change the mark type to Area.

Add 7 day moving Avg to Label and set the label to display the max value only and adjust the font size – I ended up at 7pt. Then add Province State Name & People Positive New Cases Count to Tooltip. Format the tooltip to match.

Remove all column/row lines and grid lines, zero lines etc.

There is a requirement to ‘add a line underneath each of the area trends’.

For this I added a 0 constant reference line formatted to be a solid black line.

But you’ll notice that for the charts that sit directly side by side, the line seems to be continuous, but I want to break it up. I re-added the column divider line to be a thick white line to get the desired effect.

Right, now lets get the State label added.

Add Plot State to Rows before Normalised Value and change the aggregation from SUM to MIN.

Change the Mark Type to Text and move the Province State Name field from Tooltip to the Text shelf. Adjust the text label to remove any other fields that are displaying, and resize the font – again I used 7. Clear the Tooltip for the this mark, so nothing displays on hover.

Make the chart dual axis and synchronise axis. Remove the Measure Names pill from the Colour shelf on both marks cards which will have automatically been added.

And now all you need to do is remove all the headers (uncheck Show Header) against Rows, Columns, Report Date & Plot Value, then right click on the >8k nulls label at the bottom right and select Hide Indicator.

You’re all done – you just need to add to a dashboard now. My published version is here.

I really enjoyed this challenge – a nice mix of calculations & format complexity but not overly cumbersome, which meant this blog didn’t take so many hours to write this week ðŸ™‚

Happy vizzin’! Stay Safe!

Donna

# Can you show the top 10 rank over time for each Olympic country?

The #WOW2020 week 31 challenge was combined with a #PreppinData challenge and launched at a virtual live event. I was fortunate enough to be able to join this event for the first hour and had the pleasure of meeting up with some #WOW regulars Sean Miller, Kyle Yetter & Tim Beard, along with #WOW challenge setter Lorna Brown, and #PreppinData setters, Jenny Martin and Tom Prowse. I was gutted I couldn’t stay on for the whole session, but work commitments got in the way – bah!

I did complete the #PreppinData challenge first, and the last time the team ran a combined event, I did blog on how I built both challenges, but I’m struggling for time, and the #WOW has got a fair bit going on, so I’m afraid the Prep challenge won’t make the cut this time – sorry #Preppers!

So onto the #WOW challenge. Part of this challenge was to utilise the new Relationships feature in 2020.2, so you need to be on this version to follow along.

Creating the data source

If you’d completed the #PreppinData challenge, you could use your own outputs as the inputs for #WOW challenge. I did that initially, but as I was building I had a few minor discrepancies from the solution, so chose to replace my data sources with the hyper files that are referenced in the Prep challenge, these being

• Host Countries – 1 row per Olympic Host City (and Country) per Year since 1896 to 2016
• Country Medals – 1 row per Country attending the Olympics per Year, summarising the number of Bronze, Silver and Gold medals won by that country.
• Medalists – For every Country, for every Year, there is 1 row per Athlete who won a Medal including the type of Medal (Bronze, Silver, Gold).

In Tableau Desktop, connect to the Host Countries hyper file and drag the ‘table’ named Extract into the data source pane. If you right click on the table, you can the rename it – I chose Host.

Then Add a connection to the Country Medals hyper file, and again drag the ‘table’ named Extract into the data source pane so it connects to the Host table. Set the relationship to be on Year. I renamed the ‘table’ again to be Medals.

Now add another connection to the Medalists hyper file, and drag the ‘table’ named Extract to connect to the Medals table, this time setting the relationship both on Country and Year. I renamed the ‘table’ again to be Medalists.

Building the Bump Chart

As with many challenges, if I can, I build the data I need into a table to start with, so I can check my calculations, so lets get the basics

Note – depending on the order you connected your tables in the data source, some field names that exist in multiple tables will be suffixed with fieldname (Extract x). I won’t refer to the Extract part, but will reference fields I use in this blog by prefixing with the table name instead.

Drag out Host.Year, Host.Host Country, Medals.Country to Rows. Lets show the number of medals of each type each country won, so we can use this to sense check some calculations later : put Medals.Gold, Medals.Silver, Medals.Bronze into the table

The bump chart we need to build is ranked based on the score of each country which in turn is determined by giving 3 points for each gold medal, 2 for a silver and 1 for a bronze, so we need a calculated field

Score

IFNULL(SUM([Bronze]),0) + (IFNULL(SUM([Silver]),0) * 2) + (IFNULL(SUM([Gold]),0) * 3)

We need to rank this score, and I’m going to have a calculated field to store this explicitly

Rank Score

RANK_UNIQUE([Score])

Add these 2 fields to the table, and adjust the table calculation of Rank Score so all fields except Year are ticked

Now we could start building out the bump chart now, and I did when I was creating this and would then flit back and forth between doing something on the chart and checking new calcs. However, to keep the blog a bit easier, we’ll continue building out the table.

So first up, we need another rank field. When building out the bump chart initially, I was doing all sorts of things to show the Top 10 only, but whatever I did, I couldn’t stop the lines from joining up between countries when they didn’t exist in consecutive years eg Greece is 1st in 1896, but doesn’t appear in the Top 10 again until 1904. As 1900 was missed, the lines shouldn’t join, but mine were. I was faffing over this for some time, so eventually caved and checked out Lorna’s solution. She’d resolved this simply with

Top 10 Rank Only

IF [Rank Score] < 11 THEN [Rank Score] ELSE NULL END

ie only show the rank if its in the Top 10. Simples really!

Add this onto the table and check the table calculation setting is as previously.

So for the bones of the bump chart we have the two fields we’re going to plot against – Year and Top 10 Rank Only, but before we do that, let’s get the fields we need that are displayed on the tooltip.

# Countries Per Year

{FIXED [Year] : COUNT([Medals])}

This will give us the number of countries that participated in each year. The COUNT([Medals]) comes from dragging the Medals(Count) that is automatically generated as part of the Medals table into the calculated field dialog- in the new Relationships model, this Count field against each table is essentially the equivalent of Number of Records.

# Medals

IFNULL(SUM([Bronze]),0) + IFNULL(SUM([Silver]),0) + IFNULL(SUM([Gold]),0)

A simple tally of the total number of medals won by each country.

Is Host?

[Host Country]=[Country (Extract2)]

where this is checking Host.Host Country against Medals.Country and returns true if they match.

Hosted | Participated

IF [Is Host?] THEN ‘hosted’ ELSE ‘participated’ END

The text on the tooltip differs slightly dependent on whether the country hosted or not.

COLOUR: Country

IF [Is Host?] THEN [Host Country] END

The stars on the bump chart need to be coloured based on country, but we don’t want the circles coloured too, so this field is necessary.

Let’s get all these into our table… you’ll notice (or you may not), but adding some of these fields causes the Rank Score & Top 10 Rank Only to change, so readjust the tableau calculation so only Year remains unchecked.

Now we have everything to build the core bump chart.

Add Host.Year to Columns, Medals.Country to Detail and then add Top 10 Rank Only on Rows. Change the Mark Type to a Line, and verify the table calculation on the Top 10 Rank Only is set to compute by Country only.

Edit the Top 10 Rank Only axis and set the scale to be reversed

Change the Colour to grey and make the Size smaller.

Then add another instance of Top 10 Rank Only to site alongside the existing one on the Rows.(I tend to click on the existing pill, hold down ctrl and then drag – this will create a duplicate instance and will retain the table calculation settings).

Now make the chart dual axis & synchronise the axis.

Change the mark type of the second axis to be a shape, and add Is Host? to the Shape shelf. Adjust so that false is a filled circle and true is a filled star.

Also add Is Host? to the Size shelf on this marks card, and adjust the sizes so true is bigger than false.

At this point you will need to adjust the table calculation of the 2nd Top 10 Rank Only pill, to compute by both Country and Is Host?.

Add COLOUR: Country to the Colour shelf, and adjust the colours to use the Hue Circle palette. Set the NULL value to the same shade of grey as the line.

You’ll need to adjust the table calculation again of the 2nd Top 10 Rank By County pill, so only Year is unselected.

All the following fields need to be added to the Tooltip of the All marks card.

• Score
• # Countries Per Year
• Hosted | Participated
• Top 10 Rank Only (setting the table calc to compute by everything except Year)
• # Medals

Adjust the Tooltip accordingly, then tidy up the formatting of the chart

• Hide the Top 10 Rank Only axis
• Rotate the labels of the Year
• Hide the Year field label
• Remove all row & column lines and gridlines/zero lines

Medals Viz in Tooltip

The Bump chart has 2 Viz in Tooltips, one showing the count of the different medals won and the other showing the top 10 athletes. Build the Medals chart by

• Host.Year to Rows
• Medals.Country to Rows
• Medals.Bronze to Columns
• Then drag the Medals.Silver pill to the bottom of the chart where the Bronze axis is, and when you see 2 green columns, drop the pill. This should have the effect of Measure Values automatically being added to Columns, and Measure Names being automatically added to Rows and the Filter shelf.
• Then add Medals.Gold into the Measure Values pane
• Reorgansie the pills in the Measure Values pane so they are listed Gold, Silver, Bronze
• Show the mark Label

Now hide the Year and the Country columns, and the Value axis at the bottom. Remove all the formatting (the quickest way is to go to the bump chart sheet you should hopefully have formatted already, right click on the tab of the sheet at the bottom, and select Copy Formatting, then go back to the sheet you’re working on, and on the tab, right click & Paste Formatting

If this doesn’t clean everything up enough, just adjust formatting manually.

Finally, set the fit on this sheet to be Entire View. This will squash everything up, but when its referenced from the viz in tooltip, the view will be filtered to the Year and Country. Doing this will remove the ‘this view is too large’ message that may appear on the Viz in Tooltip.

Switch back to the Bump chart and add the sheet to the Tooltip by Insert -> Sheets -> <Select your sheet>. Adjust the maxwidth property to 500 and maxheight property to 100 to make the viz fit better

Top 10 Athletes Viz in Tooltip

Build the initial viz by

• Host.Year to Rows
• Medals.Country to Rows
• Medalists.Athlete to Rows
• Medalists,Medal to Columns and manually reorder the columns.
• Medalists.Medalists(Count) to Columns
• Medalists.Medal to Colour

To work out the Top 10, we need to first work out the score per medal

Medalist Score

CASE [Medal]
WHEN ‘Gold’ THEN 3
WHEN ‘Silver’ THEN 2
WHEN ‘Bronze’ THEN 1
END

then calculate the total score per athlete per games, since an athlete can win more than one medal and appear in multiple games

Total Score per Athlete

{FIXED [Year], [Country (Extract2)], [Athlete] : SUM([Medalist Score])}

where the Country is from the Medals table.

Use this field to sort the Athlete pill

We only want the Top 10 though, so add Athlete to the Filter shelf, and filter by the top 10 of Total Score per Athlete

At this point, things won’t look right, as the filter will be applying over all the data. To get this right, we now need to add the sheet to the Viz in Tooltip, so go back to the Bump chart, and on the tooltip add a reference to this sheet.

Adjust the width and explicitly filter by Host.Year, Medals.Country

If you now return to the Top 10 sheet, an additional filter will have been automatically added to the Filter shelf. Add this filter to context, so it will change to a grey pill.

Now return to the Bump chart and just test out that the chart is presenting correctly when hovering over various fields.

Now return to the Top 10 chart and tidy up all the formatting and hide various fields, and set to Entire View.

The bump chart should now be complete

Building the strip plot

The strip plot is showing a mark for each host with an indicator to show if they finished in the Top 10 or not. Additional information on the tooltip shows the host’s score and medals count. We’ll build this into a table first as below

We ultimately need to show 1 row per year, but the country level data is required to work out the rank. So we’re going to use some more table calculations.

We want to capture the host’s score and medals accrued against all the rows associated with each year.

Score for Host

WINDOW_MAX(IF ATTR([Is Host?]) THEN [Score] END)

Medals for Host

WINDOW_MAX(IF ATTR([Is Host?]) THEN [# Medals] END)

Add these to the table, setting the table calculation to compute over all fields except Year.

We also need to know if the host was in the top 10 or not

Is Host in Top 10?

WINDOW_MAX(IF ATTR([Is Host?]) AND [Rank Score]<=10 THEN 1 ELSE 0 END)

Add this onto the table, and again, sense check the table calculations (this one is nested) to ensure all are computing by all fields except Year

So we have all the bits of data we’ll need, we just need to reduce to 1 row per year.

Index = Size

INDEX() = SIZE()

Add this onto the Filter shelf, and set to true. Check the table calculation again to ensure all fields except Year are set. Recheck the filter is still only selecting true.

This should reduce the data, and just show the last row listed for each country.

• Host.Year to Columns
• Medals.Country to Detail
• Mark Type to Shape
• Is Host in Top 10? to Size and to Shape. Adjust the table calcs and shape/size accordingly (you’ll need to reverse the size).
• Host Country to Tooltip
• Medals for Host to Tooltip (remember table calc)
• Score for Host to Tooltip (remember table calc)
• Index = Size to Filter set to true (again remember table calc)
• Change colour to red

Top 10 Countries Viz in Tooltip

Add Host.Year, Host Country and Medals.Country to Rows and Score to Columns, then click the Sort button in the menu to sort the Country descending. Add Rank Score to Rows and change to be discrete (blue pill). Move it to sit in front of the Country field

We need to restrict the rows based on whether the Country in is the top 10 or it’s the host country

Is Host or Top 10?

ATTR([Is Host?]) OR [Rank Score]<=10

Add this onto the Filter shelf and set to true; once again make sure the table calc is computing by all fields other than Year (and double check its still set to true after adjusting). If you scroll to the bottom, you should see 11 rows for 2016, as the host, Brazil, finished 12th.

And now we need to colour the bars

COLOUR:Host

IF [Rank Score] > 10 THEN ‘Red’
ELSEIF ATTR([Is Host?]) THEN ‘Blue’
ELSE ‘Grey’
END

Add to the Colour shelf and adjust accordingly. Then add Score and # Medals to the Text shelf and format the label displayed. Finally hide some of the fields and format the chart to remove axis/gridlines etc.

Finally, set the fit to Entire View.

Go to the strip plot chart and add this sheet to the Tooltip, setting the filter to be on Year

And that should be all the components you need to build the dashboard.

My published version is here.

Happy vizzin’! Stay Safe!

Donna

# Can you create a drill down using set actions?

For week 30, #WorkoutWednesday alumni Emma Whyte returned re-posting this challenge which was originally set in Week 41 of 2017 (see here). The idea behind this was to see how the challenge could be achieved using features that have been released since that challenge – in this case set actions.

I’ve been doing the #WorkoutWednesday challenges since they were first introduced, so I completed the original challenge, which is posted here.

Despite it being over 2 1/2 years ago, I had a strong recollection as to what was required to achieve this. So the challenge I set myself, was to recreate without looking at my own solution.

Building out the data

This is one of those challenges where we can build the data out into a table to check the functionality before building the actual viz. I always like to do this where possible, as I find it a good reference to make sure I’m getting the logic & the calculated fields I need right.

Start by adding State & City to Rows and add Sales & Profit via Measure Names on Columns .

As the challenge is to use Set Actions, then naturally, we’re going to need a Set. The Set we need is based on State with the idea being that when there is a State(s) in the set, then the City will display instead.

Selected State

Right click on State and Create -> Set. Select an option in the dialog, eg Alabama say

We will need to show the marks based on State or City depending on whether a State has been selected or not. We need a single field that we will use in the viz that displays the dimension we need to show

Display Value

IF [Selected State] THEN [City] ELSE [State] END

Add this onto the Rows and you’ll see how this is working

We can test the functionality of putting values into and out of the set without the need for the dashboard action at this point, by right-clicking on Selected State and selecting Show Set – the list of set values to select will display (a bit like a filter list).

We need a way to figure out what rows to show – how to identify whether there’s anything selected in the set.

Count States Selected

{FIXED : COUNTD(IF [Selected State] THEN [State] END)}

By being an LOD, this will set the count of the items in the set across all the rows in the data. Add to the sheet so you can see how this works

So we want to show information when either there isn’t anything in the set, or for the rows associated to the Selected States only

Records to Show

[Count States Selected] = 0 OR [Selected State]

Add this to Rows and test out… with no State in the set, all the rows are True

but with a State selected, only the rows associated to that State are True

But we seem to have too many marks showing when there’s nothing in the set….?

That’s fine.. just take City out of the view now, and if you deselect all States you should get the 48 rows we’re going to start with listed, and all are Records to Show = True. The Sales & Profit values will also now be aggregated to the appropriate level.

Building the Viz

Ensure your Selected States set is empty, and build out the scatter plot

• Profit on Columns
• Sales on Rows
• Display Value on Detail & Label
• Records to Show on Filter set to True
• Mark Type = Shape set to x

Verify the functionality by clicking a State in the list, and the view should change to show the City.

We need to colour the marks based on Profit

-ve Profit?

SUM([Profit])<0

Finally we need to look at how the title/subtitle changes based on which level we’re at.

Title

IF [Count States Selected] = 0 THEN ‘Sales vs. Profit by State’
ELSE ‘Sales vs. Profit for ‘ + [State]
END

Subtitle

IF [Count States Selected] = 0 THEN ‘Select a state to drill down to city level’
ELSE ‘Double-click a city to drill up to state level’
END

Add these onto the Detail shelf, then they’ll be available to reference in the Title of the sheet.

And then adjust the Tooltip, and we’re pretty much ready to go.

Create a dashboard and add the scatter plot sheet to it.

Add a dashboard action to Change Set Values which runs on the Select action, and assigns values to the Selected State. On clearing the selection, values are removed from the set.

And that should pretty much be it. My published version is here. I thoroughly enjoyed the ‘throwback’ to previous challenges, and would like to see this theme continue on occasion.

Happy vizzin’! Stay Safe

Donna

# Can you dynamically display a label on a heat map?

Ivett Kovacs returned with the #WOW challenge for this week, providing 2 versions. She provided some hints in the requirements which helped. With a bit of head scratching I managed to deliver an advanced version too. It didn’t exactly match what Ivett had delivered, but I was quite happy with my version. Until my follow #WOW participant Rosario Guana pointed out that the display ‘on click’ of a row or column total didn’t work properly ðŸ˜¦

I’ve spent A LOT of time trying to figure out what’s required but can’t resolve it. I’ve checked out Ivett’s solution and noticed a slight flaw in her logic too, so I feel uncomfortable changing my flaw for another. I’ve also looked at other solutions, and all are showing slightly different behaviour when you click on a column or row cell total. As a result, I’ve decided to blog based on the Intermediate version only. Others have posted blogs about their Advanced versions, so I’m going to save some time today, and reference them instead.

Building the basic heat map

The core of this challenge is going to involve Sets, and we’re going to use Set Actions to identify which cell is being interacted with.

For the intermediate version, we’re interacting based on hover actions.

The heatmap displays a grid of months by years, and we need to build sets based off of these. So we’ll start by creating dedicated calculated fields storing these values

Month Order Date

DATENAME(‘month’,[Order Date])

Year Order Date

YEAR([Order Date])

Add Month Order Date to Columns and Year Order Date to Rows, then drag Quantity onto Colour and change to AVG. You should have your basic heatmap.

From the Analysis -> Totals menu, select to shoe row & column grand totals. They will initially display as white cells, so you need to Edit Colours on the colour legend and select Include Totals/

Let’s sort some formatting out at this point.

Set the mark type to Square

Format the borders by setting the row & column dividers to none, but setting the Grand Total pane & header to be thick, white lines

And change the word Grand Total, by right clicking -> format on each cell and changing the Grand Total label.

Adjust the size of the Month & Year labels and remove header labels from displaying and you have

Labelling the total cells

We need to permanently display the label on the total row/column, and only show the label on the cells ‘on hover’.

We need a way to identify when we’re on a total cell. Ivett gave a hint “What is the Size() of the totals and the cells?“. This is a trick I’ve read about and discussed here. Basically the size() of the grand total row or column will be 1, so this is what we’re going to base some logic on.

Size of Months

SIZE()

Size of Years

SIZE()

Total Label

IF [Size of Years]= 1 OR [Size of Months] = 1 THEN AVG([Quantity]) END

If we’re on a row or column grand total, then show the AVG(Quantity). Add this onto the Label shelf, and edit the table calculation so the Size of Years nested calc is computing by Year Order Date, and the Size of Months nested calc is computing by Month Order Date.

Labelling the central cells

We need to create some sets, as these are going to determine whether we are hovering on a cell or not. The dashboard action will add values to a set on hover.

Right click on Month Order Date and Create Set, selecting an arbitrary month.

Hover Month Set

Do the same with Year Order Date

Hover Year Set

We then need a field to label the cell

Cell Label

IF ATTR([Hover Month Set]) AND ATTR([Hover Year Set]) AND [Size of Months]>1 AND [Size of Years]>1 THEN AVG([Quantity])
END

If something has been selected in the Hover Month Set and something has been selected in the Hover Year Set, and we’re not on a grand total row/column, then show the AVG(Quantity).

Add this to the Label shelf, and edit the table calculation settings as above.

Tooltip

The tooltip displays the Profit Ratio

SUM([Profit])/SUM([Sales])

so you’ll need to create this field, add to the tooltip and adjust the display accordingly.

Add the above sheet onto a dashboard, then create a dashboard action to Change Set Values, which is set to

• run on Hover
• Target the Hover Month Set
• Assign values to the set when run and remove values from the set when cleared

Repeat to create a set action to target the Hover Year Set.

Test it out – hover your mouse over the cells. If you hover on a total, you should get the values displayed for all the values in the row or column

As I stated in the intro, I didn’t end up with something I was entirely happy with. While writing this up, I’ve tried to rebuild using ideas from other published solutions, but I still have got anything ‘perfect’.

Ivett’s solution is obviously the closest to what we’re aiming for, since that is the challenge. The minor flaw I found in that though was that in some of the labelling logic there’s a step which compares the ‘average of the selected cell’ with the ‘average of the current cell’, and if they match, display the AVG(Quantity), otherwise show the difference. This logic is basically trying to work out ‘if current cell = selected cell’, but it doesn’t fully work, as there are other cells where the values match – it’s subtle as the displayed values are faded out with the cell highlighting, as I’ve shown in the example below.

Kyle Yetter has blogged his solution here, but if you click on a row or a column total in his solution, the values are all the same, and not showing a difference.

Rosario Guana has blogged her solution here. Rosario has adapted her solution slightly and displays both the average values and the difference ‘on select’, so the logic I was struggling with, isn’t being used in the same way here.

No doubt with a bit more time, I might finally crack this one, but for now, I need to ‘put it to bed’ ðŸ™‚

Happy vizzin’! Stay safe!

Donna

# Are Sales on track with Goal?

This week’s #WOW challenge was set by Ann primarily to demonstrate the recent 2020.2 release of relationships in Tableau. This is a change to how the data modelling works when combining data together, in this case actual sales data and sales goal (target) data. Ann provided a ‘tweaked’ version of superstore sales data for the actual sales, alongside a custom ‘goals’ data set which provided a sales goal value for each month per Segment and Category (ie the goals data was not as granular as the actual sales data).

Combining the Data

Obviously, to make use of the new relationships feature, you’ll need at least v2020.2.1 installed.

When you add the 2 data sources to the Data Source pane and add the Orders sheet from the Superstore data and then add the Sheet 1 from the Sales Goals data, you’re prompted to add a relationship if matches can’t automatically be found (ie the field names aren’t identical).

I added relationships for the Segment, Category and Month of Order Date fields, which then provides a ‘noodle’ link between the two sets of data

Note – to add multiple relationships, you have to close the field listing to then see the Add more fields option.

With relationships, Tableau is now smarter at how it manages the queries between the data. I’m not going to go into it at length (as there’s plenty of good Tableau documentation on the subject), but I’ll quickly show the difference between this and a traditional ‘join’.

In the Relationships model, if I show Sales and Goal by Month & Segment, the Goal is based on the sum of the rows associated to that Month and Segment that you can find if you looked at the source spreadsheet directly

But if I was using a traditional join with these data sets :

and then presented the same info as above

The Goal data is way off, as it’s being influenced by all the additional rows of data that have been constructed by the join clause.

If we expand to the Category level in each example above (the level at which the Goal data is stored), the Relationship version still shows the correct value

while the Join version is still incorrect, and only changing to an Average aggregation do we get the figure needed

Joining the data requires more work (calculations / knowledge) to ensure the Goal data is reporting correctly at the various levels, whereas the Relationships model ‘just works’.

Building the Table

When I started building the solution, I actually started with the YoY trend chart, but found I had to revisit it (swap a couple of pills about) as I progressed through, so I’m going to start with the table of data itself.

This turned out to be really quite tricksy… it’s just a table right… looks harmful enough…. but Ann had lobbed a few ‘gotcha’ grenades into the mix, that took a bit of head-scratching!

First up, Ann likes to use capitalisation, so I simply renamed the following fields

• Category (from Sheet 1 – the Goals data) -> CATEGORY
• Segment (from Sheet 1 – the Goals data) -> SEGMENT
• Month of Order Date (from Sheet 1 – the Goals data) -> MONTH
• Sub-Category (from Orders – the Superstore Sales data) -> SUB-CATEGORY

And I then created a hierarchy called SEGMENT which stored SEGMENT – > CATEGORY -> SUB-CATEGORY.

We also needed to ‘fake’ today so we can show sales to date. The Orders Superstore data set contains data from the start of 2016 to the end of 2019. We needed to pretend ‘today’ was 1st July 2019 and not show any sales beyond the end of the month. So I created the field

Today

#2019-07-01#

and then

ACTUAL SALES

IF DATETRUNC(‘month’,[Order Date])<= [Today] THEN [Sales] END

If we build out the basics of the table

• SEGMENT hierarchy on Rows
• MONTH (discrete, exact date, formatted to mmmm yyyy) on Rows
• Measure Names on Columns and on Filter, filtered to ACTUAL SALES and Goal
• Measure Values on Text
• MONTH on Filter, set to exclude 2016

we get

and as we expand the hierarchy to the next level, the ACTUAL SALES and Goal adjusts as expected

but when we reach the 3rd level we see a couple of issues

1. The Goal is showing the values that were displayed at the CATEGORY level, since it isn’t set at any further level. The requirement we have is to show the value ‘No Goal’ instead once we reach this point.
2. We also have a set of values with a NULL SUB-CATEGORY. This is because there are dates with goal values in the Goals data set, but we have no Sales data for those dates, so a match at the SUB-CATEGORY level can’t be found. We need to remove these records when at this level, and actually only show records where we have ACTUAL SALES, so records from July 2019 onwards should also be excluded.

At this point, I did end up having a sneak look at Ann’s solution. I had an idea of what I wanted to do, BUT I was worried that I may be missing something related to the new data model… that perhaps there was a feature of relationships that I needed to be exploiting as part of this, rather than go down the table calculation route I was heading towards. After all, part of doing these challenges for me is to make use of new features & functions if I can, so I have a reference workbook for future use. I had a read up on the blog posts related to relationships, but couldn’t spot anything obvious, hence a quick check at Ann’s solution…. nope nothing special required in respect of the data model….so instead, we need some additional calculations to help resolve all this.

What we need is some way to identify when we’re at the lowest SUB-CATEGORY level. We’ll start by counting the number of distinct sub-categories we have

# Sub Cats

ZN(COUNTD([SUB-CATEGORY]))

Wrapped in a ZN will return the value 0 when none exist.

Dropping this into the table of data, we can see the 0s for all the future Goal records, and 1 for the rest of the rows, since we’ve expanded to the SUB-CATEGORY level.

If you collapse up to the CATEGORY level, the #Sub Cats count changes, as each row typically contains more than 1 SUB-CATEGORY.

The term ‘typically’ in the above statement though is the reason we can’t assume we’re at the lowest level if the count is 1 though. It is possible for the sales against a particular CATEGORY in a MONTH to only span a single SUB-CATEGORY (Check out Home Office -> Furniture in Jan 2018). So we also need

Max Sub-Cats in Window

WINDOW_MAX([# Sub-Cats])

Which will count the maximum number of SUB-CATEGORY records for each SEGMENT and CATEGORY. In the case of Furniture Sales in the Home Office in Jan 2018, you can see below the count of sub-categories is 1 but the maximum count in the window is 4.

This shows we’re not at the SUB-CATGEORY level, as when we are, the #Sub Cats = Max Sub Cats in Window (ie 1=1)

So with that knowledge we can now derive an alternative field for the goal

SALES GOAL

IF ([# Sub-Cats] = 1 AND [# Sub-Cats]=[Max Sub-Cats in Window]) THEN 0 ELSE SUM([Goal]) END

Remove the original Goal field from the table and add this instead, setting the table calculation to compute by MONTH and you’ll see the SALES GOAL value change from a value to 0 as you expand down to the SUB-CATEGORY level.

So how do we show ‘NO GOAL’ instead of 0? Well this is a bit of a sneaky formatting trick, and one I couldn’t figure out (this step was actually the last thing I changed before publishing).

In my original SALES GOAL calculation I had set the value to be NULL rather than 0 expecting there to be an option to show the NULL values as something else (ie NO GOAL). But the feature I was looking for wasn’t available (maybe I was mistaken about when it appears.. maybe it’s disappeared in 2020.2… I still need to investigate). Anyhow, my alternative was to create a string field to store the value in, but this would make formatting the number quite cumbersome, and I was sure that wasn’t the case.

So I had to check Ann’s solution. Instead of NULL, she had set the value to be 0, and consequently used the following custom number formatting :

“\$”#,##0;-“\$”#,##0;”NO GOAL”

where the last bit is how to format 0. Very very sneaky huh!

So we’ve worked out how to identify when we’re at the SUB-CATEGORY level, and managed to display ‘NO GOAL’ when at that level, now we need to filter out some of the rows (but only when at this level).

Records to Show

IF [Max Sub-Cats in Window] = 0 THEN FALSE
ELSEIF ([# Sub-Cats] = 1 AND [# Sub-Cats]=[Max Sub-Cats in Window])
THEN //at lowest level
IF ATTR([MONTH]) < [Today] THEN TRUE
ELSE FALSE
END
ELSEIF [# Sub-Cats] = 0 AND [Max Sub-Cats in Window] = 1 THEN FALSE //still at lowest level but no sales so exclude
ELSE TRUE
END

This took a bit of trial & error to work out, and there may well be something much more simplified.

Adding this to the table, and again setting to compute by MONTH you can see which rows are reporting True and False. When collapsed to the SEGMENT or CATEGORY level, all rows are True, but the values start to vary when at the SUB-CATEGORY level.

Now, we need to work on the fields to help show us the actual & % difference.

RESULT

IF ([# Sub-Cats] = 1 AND [# Sub-Cats]=[Max Sub-Cats in Window]) THEN NULL ELSE SUM([ACTUAL SALES])-([SALES GOAL]) END

The difference should only show if we’re not at the lowest level. This needs to be custom formatted as

â†‘#,##0;â†“#,##0

I use this site to find the arrows needed.

ACTUAL vs GOAL

[RESULT]/SUM([ACTUAL SALES])

will give me the % difference (format to percentage with 0 dp).

And the final piece of data we need (I think) is to compute a RAG status for each row. This is dependent on some parameters

GREEN (WITHIN X%)

Type Float, set to 0.1 but displayed as a percentage

and similarly

RED (ABOVE Y%)

With these we can work out

RAG

IF ABS([ACTUAL vs GOAL]) > [RED (ABOVE Y%)] THEN ‘WAY OFF TRACK’
ELSEIF ABS([ACTUAL vs GOAL]) < [GREEN (WITHIN X%)] THEN ‘ON TRACK’ ELSEIF ATTR([MONTH])>=[Today] THEN ‘NO COMPARISON’
ELSEIF [SALES GOAL]=0 THEN ‘NO COMPARISON’
ELSE ‘OFF TRACK’
END

If you add all these fields to your table then you can check how they all compare/change as you expand/collapse the hierarchy (make sure you set the compute by MONTH on all fields).

So now we’ve got all the building blocks you can create the actual table.

I’d recommend duplicating the above (and keeping as a ‘check data’ sheet). Then remove the fields you don’t need, and add the Records To Keep = True to the Filter shelf. Format the table appropriately, and add the title so you’re left with

Building the Line Chart

As mentioned above, I started with this chart, but ended up having to revisit it as I found I needed to use different pills that got created as I built the table. There is actually 1 additional field needed, which isn’t obvious until you start playing with the dashboard actions later:

GOAL

IF [SALES GOAL] > 0 THEN [SALES GOAL] END

This is needed so that when we filter by SUB-CATEGORY later and there is no goal, we only get a single line displaying the sales values, rather than also having a line plotted at 0.

• MONTH on Filter, excluding Year 2016
• MONTH (exact date, continuous) on Columns
• Measure Values on Rows and Filtered to ACTUAL SALES and GOAL
• Coloured accordingly by Measure Names

Then add TODAY to Detail and add a Reference Line to the MONTH axis, setting it to TODAY, adjusting the format of the line, and setting the fill above colour to grey.

Format the gridlines, tooltips & axis accordingly to match

Barcode / Strip Chart

For this we’re creating a bar chart using MIN(1), and fixing the axis to range from 0-1 so it fills up the entire space as follows

• MONTH on Filter excluding Year 2016
• MONTH as exact date, discrete (blue pill) on Columns
• MIN(1) on Rows
• RAG on Colour adjusted accordingly
• ACTUAL vs GOAL on Tooltip

Then use the ‘type in’ functionality to create a pill labelled ‘vs Goal’ on the Rows

Creating the Dashboard

I used a vertical container to add the components onto the dashboard, as I used blank objects that are formatted with a black background, 0 padding and a height of 4 to create the horizontal line separators.

The interactivity between the table and the other charts is managed via Dashboard Filter actions.

To get these to work properly though, and in such a way that the selected SEGMENT and/or CATEGORY and/or SUB-CATEGORY can be referenced in the title of the line chart, 3 separate dashboard actions are required, with each one specifying the field to filter by, as follows :

Once these are set up, the associated pills will appear on the Filter shelf of the Line and Barcode charts, and are then available for selection when building the title of the line chart

You should now have your completed chart.

My published version is here.

Happy vizzin’! Stay Safe!

Donna

# What is the distribution of total orders by customer?

Luke Stanke set the challenge this week, and posted a ‘sneak peak’ on Twitter before the challenge was formally released

Challenges from Luke can sometimes be on the harder end of the scale, so with the bit of extra time available and only the gif in the tweet as a clue , I had a play to see if I could get close at all. And surprisingly I could, so once it was formally issued, it was just a case of tidying up some of my calcs.

Calculations

First up we need to establish the basic calculations

# Customers

COUNTD([Customer ID])

# Orders Per Customers

{FIXED [Segment],[Customer ID]: COUNTD([Order ID])}

Plotting these out onto a table with Segment we get

Note #Orders Per Customers needs to be set to be a discrete dimension.

So while this shows us a summary of the number of customers, we’ve only got 1 mark showing the summarised count. When plotting the chart we need something else in the view that will generate more marks. This is Customer ID.

At this point in order to help us build up our tabular view to ‘see’ what’s going on, I’ll filter the table to just show Segment = Consumer, and I’ll add Customer ID to Rows

As expected, our # Customers is now showing a count of 1 per row, but the data is now expanded as we have now got a row (ie a mark) per customer for each # Orders Per Customer ‘bucket’. But we still need a handle on the total customers in each ‘bucket’.

Customers Per Order Count

WINDOW_SUM([# Customers])

Adding this to the view and setting the table calc to compute based on Segment & Customer ID, we get the summarised value back again.

But we don’t want to actually show that number of marks; the number of circles to plot on the chart is dependent on a user parameter:

pMarkIndicator

Based on the requirements, if the number of customers is 15 and the user parameter is 5, then 3 circles should be drawn (15 / 5 =3 ), but if the number of customers was 14, only 2 circles should be drawn (14 / 5 = 2.8), ie the number of circles will always be set to the integer of the equal or lesser value (essentially the FLOOR() function). This can also be achieved by

Marks to Plot

INT([# Customers per Order Count]/[pMarkIndicator])

For some reason FLOOR can’t be used in the above as a table calculation is being used, but INT does the job just fine, and adding to the tabular view and adjust the table calculation accordingly we get

ie for the Consumer Segment, 6 customers have made 1 order in total, so based on batching the customers into groups of 5, this means 1 circle should be displayed. Whereas, 30 customers have made 3 orders in total, so 6 circles should be displayed.

But we can’t actually reduce the amount of rows (ie marks) displayed – we either have 1 row (by removing Customer ID) or a row per customer. But that’s fine, we don’t need to.

What we want is something in our data to group each row into the relevant batch size.

First up let’s generate an ID per row for each customer that restarts for each #Orders Per Customer.

Index

INDEX()

Add this as a discrete pill to Rows, and adjust the table calc

Now we have this, we calculate which ‘column’ each customer can sit in, a bit like what we would do if we were building a small multiple table, arranging objects in rows and columns (see here for an example of what I mean).

Cols

[Index]%([Marks to Plot])

This uses the modulo (%) notation which returns the remainder of the division sum. Lets put this on the view

For customers who have only ordered once, and where we’re only going to plot 1 mark, the Cols value is the same (0) for all rows.

Whereas for customers who have ordered twice, and where we want to plot 2 marks, the Cols value is either 0 or 1.

We’ve now got a value we can use to plot on an axis. We’re still going to plot a mark for each Customer ID, but some marks will be plotted in the same position, ie on top of each other, which therefore looks like just one mark.

Let’s show this more graphically, by duplicating the sheet, deleting some pills, moving some around, changing some to continuous, and setting the mark type to circle as below

Change the pMarksIndicator parameter and the number of circles will adjust as required.

So far, so good. We’ve got the right number of marks, it’s just not looking as nice and symmetrical as it should be.

We need to shift the marks to the left. But how far it shifts is dependent on whether we’ve plotted an odd or even number of marks.

If we have an odd number, the middle mark should be plotted at 0. If we have an even number the middle two marks should be plotted at -0.5 and +0.5 respectively. The calculation below will achieve this

Cols Shifted

IF [Marks to Plot]%2 = 0 //we’re even
THEN [Cols] – ([Marks to Plot]/2) + 0.5
ELSE //we’re odd
[Cols] – (([Marks to Plot]-1)/2)
END

To demonstrate this, I’ve added Cols Shifted along side Cols on the viz (this time make sure all the table calculation settings (including the nested calcs) are applied to compute based on Customer ID only which is different from the calcs above)..

Now you can see how it all works, you can remove the Cols and the Segment from the Filter shelf.

And now its just a case of applying the various formatting to clean up the display, and adding to the dashboard.

My published version is here.

Happy vizzin’! Stay Safe!

Donna