Can you recreate this difference chart?

November 18, 2021 Donna Coles2 Comments

Lorna created this challenge for #WOW2021 this week incorporating tips from the Speed Tipping session she and fellow WOW leader Ann Jackson had presented at TC21.

Defining the calculations

The requirements were to ensure there were only 7 calculated fields used, and no date hardcoding (including in the title – a feature I missed to start with). So let’s start by just going through the required calculations.

We need to identify the latest year in the data set

Current Year

YEAR({FIXED:MAX([Order Date])})

This uses an LoD (Level of Detail) calculation to identify the maximum date in the whole data set, which is 31st Dec 2021, and then extracts the Year of this ie 2021.

From this, we work out

Previous Year

[Current Year] – 1

Both of these fields return numbers, so automatically sit in the measures section of the left hand data pane (ie under the horizontal line). I want to treat these as dimensions, so I just drag the fields above the line.

We now need to create dedicated fields to store the Sales values for both years

CY Sales

IF YEAR([Order Date])=[Current Year] THEN [Sales] END

PY Sales

IF YEAR([Order Date])=[Previous Year] THEN [Sales] END

and with both of these, we can work out the

Difference

SUM([CY Sales])-SUM([PY Sales])

[TIP] This is custom formatted to △#,##0;▽#,##0.

I googled ‘UTF 8 triangles’ and used this link to find the suitable shapes which I just copied and pasted into the number format field.

We’re going to need to determine whether the difference is positive or not.

Is Loss?

MAX(0,[Difference]) =0

This is another [TIP] making use of the array function. If the Difference is negative, it will return 0 as this is the maximum of the two numbers. I’m not entirely sure if this is more efficient than simply writing Difference<=0, but I wanted to incorporate another of the tips presented.

The final calculation we need is another of the PY Sales field, as we need another distinct Measure Name value to display. I simply chose to duplicate the existing field to have a PY Sales (copy) field.

Building the viz

Add Category to Columns, Segment to Rows and then add CY Sales to Columns, which will create a horizontal bar chart. Then drag PY Sales to the CY Sales axis, and when the ‘two columns’ icon appears, drop the field.

This will automatically change the pills so Measure Values is on Columns and Measure Names is on Rows.

Swap the order of the pills on the Measure Values section on the left hand side, so PY Sales is listed before CY Sales.

Add Measure Names to the Colour shelf and adjust. Increase the width of the rows.

Check the Show Mark Labels option on the Label shelf and adjust alignment to display the text to the left

Increase the Size of the bars to the maximum size, and add a white border (via option on Colour shelf)

Add PY Sales (Copy) to Columns, and change the mark type to Gantt Bar. Remove Measure Names from the Colour shelf of this marks card, as it will automatically have been added. Instead add the Is Loss? field to Colour and adjust.

Add Difference to the Size shelf, then click on Size, and reduce it to as small as possible. Set the border of this mark to Automatic (it should become a little thicker).

Next add the Difference field to the Label shelf, align right and set the font colour to match mark colour.

Now make the chart dual axis, synchronise axis, and set the mark type of the Measure Names mark type back to a bar.

On the All marks card, add CY Sales, PY Sales and Difference to the Tooltip shelf. And add Current Year and Previous Year to the Detail shelf.

Adjust the Tooltip against the All marks card, so it is the same when you hover on all of the marks. And edit the title of the chart, referencing the Current Year and Previous Year fields.

The challenge has a ‘space’ between each Segment, and this is the final TIP I used.

On the Measure Values section on the left below the marks card, type in MIN(NULL). This will initially create a new ‘blank’ row between the bars and the gantt marks, which isn’t where we want the blank row to be.

To resolve this, simply click on the MIN(NULL) text in the chart and drag the text below the PY Sales (copy) text

And now you just need to uncheck Show Header against the Measure Names pill on Rows, and the Measure Values and PY Sales (copy) fields on the Columns. Then remove all row and column borders and gridlines and hide labels for rows and columns.

Hopefully you’ve got the final viz which you can now add to a dashboard. My published version is here.

Happy vizzin’! Stay Safe!

Donna

Customer Purchasing Habits (RFM Analysis)

November 11, 2021 Donna ColesLeave a comment

Ann set this week’s #WOW2021 challenge during #TC21 and chose to live stream her build. I couldn’t watch it but I did then manage to catch a snippet of Kyle Yetter who bravely chose to share his attempt at recreating the challenge via a live stream too with Ann & Luke watching. I had already completed my build by the time I watched Kyle, and it was interesting to see where our approaches differed.

The main via is a single chart, so I started by building out all the data I needed in tabular form, so I could verify the sort.

Defining all the calculations for the main viz

First up, the data needs to be compared to ‘today’ where ‘today’ is set to 01 Jan 2022. I used a parameter to store this

pToday

Date constant set to 01 Jan 2022

We also need to understand the latest order date per customer, so need

Max Order Date By Customer

DATE({FIXED [Customer ID]: MAX([Order Date])})

and so with this we can calculate the days since last order, which is one of our key measures.

Days Since Last Order

DATEDIFF(‘day’, [Max Order Date By Customer], [pToday])

The other measures we need are

#Orders

COUNTD([Order ID])

Avg. Order Value

SUM([Sales])/[#Orders]

along with Sales and Quantity.

These are the 5 main measures displayed, but there are some additional calculations needed for the display labels and tooltips.

The label to indicate the ‘time since last purchase’ is displayed in days or years, so for this I created 2 specific fields

LABEL:Days Since Last Order

IF [Days Since Last Order] <365 THEN [Days Since Last Order] END

This will only store a value when the days is less than 365. I then formatted this field to have a ‘ days’ suffix

Similarly, I created

LABEL:Years Since Last Order

IF [Days Since Last Order]>=365 THEN [Days Since Last Order]/365 END

which stores a value for records >=365 only. This was formatted to have 1 dp and the ‘ years’ suffix instead.

The colour of the first measure is based on whether the customer is considered ‘active’ or not. The threshold for this is managed via a parameter

pActiveCustThreshold

An integer parameter defaulted to 90.

We can then create

Is Active Customer?

[Days Since Last Order]<=[pActiveCustThreshold]

This is a boolean field and will return true or false, but to get the tooltip value to display appropriately, I edited the alias of this field (right click > Aliases)

For the tooltip on the ‘total products’ (ie quantity measure), we need to display the number of distinct products orders, which we can capture in

#Products

{FIXED [Customer ID]: COUNTD([Product ID])}

Pop all these fields, along with the Customer ID and Customer Name into a table and you can see validate all the values are as you expect

Sorting the Sort

We need another parameter to manage the sort.

pSort

I used an integer parameter, but set the display to the required text strings. This is because writing logic on integers is more efficient than strings

I then created a calculated field to determine the measure to sort by

Sort By

CASE [pSort]
WHEN 1 THEN SUM([Days Since Last Order]) * -1
WHEN 2 THEN [#Orders]
WHEN 3 THEN SUM([Sales])
WHEN 4 THEN [Avg. Order Value]
WHEN 5 THEN SUM([Quantity])
END

Notice the first measure is being multiplied by -1 since this measure is to be displayed in Ascending order rather than Descending.

In the table, amend the Customer ID pill to Sort by the Sort By field descending

Show the pSort parameter and test the functionality by switching the values.

Building the main viz

Having got all the calcs defined, this viz is relatively straight-forward. It is a multiple axis viz by Customer ID and Customer Name (note I use Customer ID as well, which I hide, just in case there are multiple customers with the same name).

Customer ID and Customer Name on Rows. Apply sort to Customer ID as described above. Hide Customer ID (uncheck Show Header). Align Customer Name to right, and adjust font.
Type in MIN(1) on Columns. Edit Axis to be fixed from 0-1. Add Is Active Customer? to Colour. and adjust. Add LABEL:Days Since Last Order and LABEL:Years Since Last Order to Label. Align Centre and match mark colour. Make sure the two fields are side by side on the label, and not underneath each other. Add Max Order Date By Customer to the Tooltip shelf. Adjust Tooltip
Add #Orders to Columns. Strip off all the fields that have been automatically added to this marks shelf. Set the Colour accordingly. Show the Label at the end of the bar. Adjust Tooltip
Add Sales to Columns. Set the Colour accordingly. Show the Label at the end of the bar. Adjust Tooltip
Add Avg. Order Value to Columns. Set the Colour accordingly. Show the Label at the end of the bar. Adjust Tooltip
Type in MIN(0) on the Columns shelf. Change mark type to Circle. Set the Colour accordingly. Add Quantity to Label. Add #Products to Tooltip. Adjust Tooltip
Remove all row & column borders and gridlines, zero lines etc.
Add a 0 constant Reference Line to the #Orders, Sales and Avg. Order Value axes.
Hide the axes, hide field labels for rows.

Building the Viz in Tooltip

The Viz in Tooltip shows the Top 10 Products per Customer by Sales.

On a new sheet, add Customer ID, Customer Name and Product Name to Rows and Sales and Quantity (via Measure Values) to Text.

Add a Rank Quick Table Calculation to the Sales pill, then edit the table calc to be unique and to compute by Product Name

Click on the Sales rank pill , press Ctrl and drag it into the measures pane. Name the field Sales Rank. Format the field to have a # prefix. Add Sales back into the view.

Now drag the Sales Rank field from the Measure Values section and add to Rows, then change to be a discrete (blue) field, then move so it is to the left of the Product Name field. This should cause everything to re-sort into the required order automatically. It’s always worth double checking the table calc is still computing as expected.

Click on the Sales Rank pill and press Ctrl and this time drag to the Filter shelf and check values 1 to 10.

Now hide the Customer ID and Customer Name fields, and format the display.

Finally, set the display to Entire View. This will make the ‘view’ unreadable, but is necessary to ensure you don’t get a ‘viz is too large to display’ message when you get to add the Viz to the tooltip.

Adding the Viz in Tooltip

Back to the chart sheet, and edit the tooltip of the MIN(0) marks card. Insert the Top 10 sheet via Insert > Sheets. Adjust the width and height values and change the filter to just use <Customer ID>

Hovering on the circle mark should then display the Top 10 Products for that specific customer.

Adding the column headings

On the dashboard, use a horizontal layout container above the main viz. Add 6 text objects, enter the relevant text and centre. Adjust the width of each text object to line up with each column.

The help icon

Add a floating text object to the sheet – it’ll appear over the top of the chart. Position the text object to the top left, and set the background to white. Enter the text. From the context menu of the object, select the Add Show/Hide Button. Position the button in the appropriate location and resize. Edit the button options to have a grey background and show ? when hidden and X when shown.

I think that’s covered all the core points. My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you show the top and bottom profit performers for the selected time period?

November 4, 2021 Donna ColesLeave a comment

With #TC21 looming next week, Candra’s set this week’s challenge, based on inspiration from past Tableau Conferences – a simple looking, but effective visualisation for understanding profit performance within some pre-established timeframes.

Building the BANs
Identifying Top 5 / Bottom 5 / Everything Else
Building the Chart and Labelling the Bars
Adding the interactivity

Building the BANs

The timeframes we need to report over need to be based on a specific date. In this case it’s the latest date in the data set. If you were using this for a business dashboard, you might be basing it on Today / 1st of the Current Month etc. Rather than hardcode the date I need, I’ve worked out the latest month I want to use by

Max Month

{FIXED:MAX(DATE(DATETRUNC(‘month’,[Order Date])))}

Set all the Order Dates in the data set to be the 1st of the month, then get the maximum of these dates. So as the last date in the data set is 30th Dec 2021, that’s been truncated to 1st Dec 2021 which is then what this field stores.

I then want to capture the profit values for each month, quarter, year into separate fields, so we have

Month

IF DATETRUNC(‘month’, [Order Date])=[Max Month] THEN [Profit] END

This only stores Profit values for rows where the Order Date is also in December

Quarter

IF DATETRUNC(‘quarter’,[Order Date])=DATETRUNC(‘quarter’,[Max Month]) THEN [Profit] END

This only stores Profit values for rows where the Order Date is in the same quarter as December (ie the 4th quarter which is months Oct-Dec).

Year

IF DATETRUNC(‘year’,[Order Date])=DATETRUNC(‘year’,[Max Month]) THEN [Profit] END

This stores Profit data for rows where the Order Date is in the same year.

All these fields are formatted to be $ with 0 dp.

A basic viz can the be built with Measure Names on Columns and Measure Names and Measure Values on Text. The Measure Names heading is then hidden, and the font and table formatting adjusted so the sheet looks as below.

Note – Naming these fields Month, Quarter, Year rather than Monthly Sales, Quarterly Sales etc, makes this display much easier and also helps with the interaction later.

Identifying the Top 5 / Bottom 5 / Everything Else

We need to be able to identify the Sub-Categories which have the best profits, those that have the worst, and the ‘rest’. We’re going to use Sets to help us with this. However the set entries could change depending on whether we’re looking by month, by quarter or by year. So first we need to create a field that is going to store the particular Profit value we need depending on what time period is being selected.

We need a parameter pDatePart to capture the time frame. This is a string field which is just defaulted to the text ‘Month’.

The interactivity later will set this parameter to the different values.

So now we know the ‘selected’ date part, we need to get the appropriate profit value

Value To Plot

CASE [pDatePart]
WHEN ‘Month’ THEN [Month]
WHEN ‘Quarter’ THEN [Quarter]
ELSE [Year]
END

This just uses the values from the 3 measures we created to start with.

So now we can create the sets we need. Right click on Sub-Category > Create > Set and create a set called Top 5 that is based on the Top 5 Value to Plot values

Then create another set in the same way called Bottom 5

With these sets, we can now determine the Sub-Category ‘label’ that will be displayed

Sub-Category Display

IF [Top 5] OR [Bottom 5] THEN [Sub-Category] ELSE ‘Everyone Else’ END

and the grouping that will be used to colour the bars

Sub Cat Group

IF [Top 5] THEN ‘Top 5’
ELSEIF [Bottom 5] THEN ‘Bottom 5’
ELSE ‘Everything Else’
END

Building the Chart & Labelling the Bars

Ok, so now we’ve got the building blocks in place, we can build the chart. You will probably be tempted to build a bar chart (I did to start with), but positioning the labels then became a bit tricksy. When we get to the labels, we’re going to need to use the left and right alignment options. However, when you build a bar chart, if you right align the label, the label will be positioned outside at the end of the bar (even though this seems a little odd with negative values, as it looks to be on the left…).

Right aligned labels

But then we set the labels to be left aligned, the labels appear inside the bar instead, and not outside on the left.

Left aligned labels

So instead, rather than using the bar mark type, we need to build this chart using the gantt mark type, and base the Size on the Value to Plot field.

However, the value being plotted is actually an average value based on the number of Sub-Categories being ‘grouped’ as otherwise the value associated to Everything Else can end up bigger than all the rest. I created the following field

Avg Value To Plot

SUM([Value to Plot])/COUNTD([Sub-Category])

formatted to $ with 0dp.

So now we start building by adding Sub-Category Display to Rows and type in MIN(0) into Columns. Change the mark type to Gantt and add Avg Value To Plot to Size. Add Sub-Cat Group to Colour and adjust accordingly. Sort the Sub-Category Display field by Avg Value To Plot descending.

Now we can’t just label by a single field of the value or the sub-category, as while the ‘automatic’ label alignment option, almost puts the labels in the right positions, there is no way to define an ‘opposite’ to the ‘automatic’ alignment. We need to define some dedicated label fields based on where we want them to display.

Label – Left – Profit

IF [Avg Value To Plot]<0 THEN [Avg Value To Plot]
END

If we’re in the bottom half of the chart, we’re going to display the Profit value on the left side.

Label – Left – Sub Cat

IF [Avg Value To Plot]>=0 THEN ATTR([Sub-Category Display])
END

If we’re in the top half of the chart, we’re going to display the Sub-Category Display on the left side.

Add both these fields to the Label shelf and then adjust the label alignment to be left.

To label the other ends, we need to create two further label fields

Label – Right – Profit

IF [Avg Value To Plot]>=0 THEN [Avg Value To Plot]
END

Label – Right – Sub Cat

IF [Avg Value To Plot]<0 THEN ATTR([Sub-Category Display])
END

We then need to create another MIN(0) on Columns (easiest way is to hold down control, then click on the existing MIN(0) field and drag it next to itself to create a duplicate. Then on the 2nd marks card, remove the two Label – Left – xxx fields and add the two Label – Right -xxx fields. Change the alignment to right.

The make the chart Dual Axis and synchronise the axis.

Now you can hide the Sub-Category Display header from showing, hide the axis, remove gridlines etc.

Adding the interactivity

Once the two sheets are on the dashboard, you can add a dashboard parameter action which will on select of the KPI/BAN chart, pass the Measure Name into the pDatePart parameter. When the mark is unselected, the parameter value should stay as it is.

And hopefully, you should now have a working viz. My published version is here.

Enjoy #TC21!

Happy vizzin’! Stay Safe!

Donna

Filtering Table Calcs

October 28, 2021 Donna ColesLeave a comment

It was Sean Miller’s turn this week to set the challenge, which had an added twist – to complete only using the web authoring feature in Tableau Public, something I’d never tried before.

Most challenges use a version of Superstore Sales which I usually already have on my laptop, due to the multiple instances of Tableau Desktop installed, but this time, I had to get the data from data.world. I saved this locally then logged into Tableau Public online and started the process to build a viz online, by clicking the Create a Viz (Beta) option.

When prompted I then chose to upload from computer, the data source file I’d downloaded, and once done I was presented with the online canvas to start building

Unsure as to whether there was any auto save facility online, I decided to save the workbook pretty much immediately, although I was first prompted to navigate to the Data Source tab and create an Extract before I could save.

Now I was ready to start building out the requirements.

I started with the trend chart. The grain of the chart was at the month level, so I first created a specific field to store the Order Date at this particular level

Order Date Month

DATE(DATETRUNC(‘month’,[Order Date]))

I then added this to the Columns shelf as a continuous (green) exact date, and added Sales to the Rows shelf. I changed the mark type to Area, set the Colour to a blue, dropping the opacity to 50% in an attempt to match Sean’s colouring and adjusted the tooltip.

I then added another instance of Sales to the Rows shelf next to the existing one, changing the mark type of this instance to Line and setting the Colour to a darker blue at 100% opacity. I then used the context menu on this pill to set a Quick Table Calculation of type Moving Average.

By default this sets the average to the 3-month rolling average as per the requirement. I adjusted the tooltip, made the chart dual axis and synchronised the axis, then removed Measure Names from the Colour shelf that had automatically been added.

My next step at this point was to work on the filtering requirement. To help with this, the first thing I did was to Duplicate as crosstab (right click on the chart tab). This created a tabular view of the data so I could work on getting the calculations I need for the filtering. I flipped the columns and rows, so my starting point was as below.

I chose to use parameters to capture the min and max dates that the user selects on the dashboard.

pMinDate

Date parameter defaulted to 01 Jan 1900

And I also created pMaxDate exactly the same way.

I then needed fields to store the relevant dates depending on whether a selection had been made or not

Min Date Selected

IF [pMinDate]=#1900-01-01# THEN {FIXED : MIN([Order Date Month])}
ELSE [pMinDate]
END

The FIXED statement sets the date to the minimum month in the data set, if the parameter is set to its default. The below doed similar to get the maximum date in the data set.

Max Date Selected

IF [pMaxDate]=#1900-01-01# THEN {FIXED : MAX([Order Date Month])}
ELSE [pMaxDate]
END

Using these dates, I then created a field to determine whether the month was within the min & max dates

Is Month Selected?

[Order Date Month]>=[Min Date Selected] AND
[Order Date Month]<= [Max Date Selected]

Pop this field onto the table view, and show the parameters and set them to some other values eg Min Date = 01 Jun 2018 and Max Date = 01 Sept 2018. You’ll also need to adjust the table calc setting of the Moving Average Sales pill to compute by Is Month Selected as well.

You’ll see True is stamped against the rows which match the date range. Now, if you make a note of the Moving Average values against these rows, and then add Is Month Selected? = True to the Filter shelf, you should get just the True rows displaying, BUT the Moving Average values are now different. This is because, with this type of field as a filter, the filter is being applied BEFORE the table calculation is being applied – the table calculation essentially has ‘no knowledge’ of the rest of the rows of data as they have been filtered out. Solving this issue is the crux of this challenge.

Remove Is Month Selected? from the filter shelf. We’re going to create a new field to filter by instead

FILTER

LOOKUP(MIN([Is Month Selected?]),0)

This field uses the LOOKUP table calculation, and is basically looking at the value of the Is Month Selected? field in it’s ‘own row’ (denoted by the 0 parameter).

Pop this onto the table view, and you can see that it’s essentially a duplicate of the Is Month Selected? field.

Now add the FILTER =True field to the Filter shelf instead, and this time the rows will once again filter as required, but the Moving Average values will be the same as those in the unfiltered list, which is what we need.

This works because the field we’re filtering by is a table calculation. When added to the filter shelf, these types of fields are applied at the ‘end’ of the processing; ie the table calculation is computed against all the rows in the data AND THEN the filter gets applied. As a result the Moving Average values are retained.

This is all part of Tableau’s Order Of Operations and is a fundamental concept when working with filters.

Armed with this knowledge, we can now apply the FILTER=True to the Filter shelf of the chart we built.

Now we can start to work on the bar charts, which will show Sales along with Average Monthly Sales. For this 2nd measure, I need to work out the number of months captured between the min & max dates.

No. Months

DATEDIFF(‘month’, [Min Date Selected],[Max Date Selected])+1

And with this I can now create

Avg Monthly Sales

SUM([Sales])/MIN([No. Months])

The bar chart viz can then be quickly created by adding Sub-Category to Rows and both Sales and Avg Monthly Sales to Columns, and sorting by Sales descending. The colours of each bar are adjusted to suit.

The bars also need to react to the dates selected, but we can simply add the Is Month Selected? = True to the Filter shelf for this, as none of the values displayed on this chart are reliant on table calculations.

To remove the gridlines from both charts, select Format > Workbook from the menu, scroll down the Format dialog displayed on the right and set Gridlines to Off

Finally, title both sheets appropriately, and then add to a dashboard. We need to add 3 actions to this dashboard – 2 parameter actions and 1 filter action.

Create a parameter action which will on Select, set the pMinDate parameter by passing the Minimum Order Date Month, and will reset back to 01 Jan 1900 when unselected.

Then create another instance, which sets the pMaxDate parameter instead, passing the Maximum Order Date Month.

Finally, create a Filter Action which on Select of the bar chart, filters the trend chart

And with that, the challenge should be complete. My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you use set actions to create a reference date range?

October 21, 2021 Donna ColesLeave a comment

There’s a lot packed into the challenge this week, which was “set” by Lorna Brown and Erica Hughes to test our Set Action skills. How detailed this blog will go, I have yet to decide… I’ve got a couple of hours to get this nailed, so it could get quite brief as we get towards the end 🙂

I’ve got 6 sheets/charts making up this dashboard, so my intention is to summarise each one, and I’ll define the various calculations that are going to be needed as we go.

The overall summary table
The selected months summary table
The trend line
The donut chart
The top 3 states table
The map
Adding the interactivity

The overall summary table

This challenge is focused on understanding the Sales per month. Whilst its possible to use the built in aggregation features of a date field, I often prefer to create explicit date fields at the level I require, so it’s easier to reference. Therefore, the first field I created for this challenge was

Order Date To Plot

DATETRUNC(‘month’, [Order Date])

This essentially ‘groups’ every order placed in a month to be tagged with the 1st of the month. I custom formatted this field to MMM yyyy (ie Oct 21).

For the overall summary table, I need to capture the total sales of the whole data set, and I use a Fixed LoD calculation for this.

Total Sales

{FIXED: SUM([Sales])}

This field is formatted to $0.00M

NOTE – I actually named this field <space>Total Sales<space> as I want to display the name of the field (the measure name) in the summary table, but the ‘selected months’ summary table also has a Total Sales measure which is a different calculation (see later). Adding the <spaces> is a sneaky way to get two fields with what appears to be the same name. As this field when displayed will be centred, the <spaces> aren’t noticeable.

We also need to get the monthly average sales for the whole data set

Average Sales by Month

AVG({FIXED [Order Date To Plot]: SUM([Sales])})

Format this to to $0.0K

We can now build the summary table by adding Measure Names to the Filter shelf and selecting these 2 fields. The placing Measure Names on Rows and Measure Names and Measure Values on Text. Reorder the measures as required, hide Measure Names on Rows and format the Text as required.

Change the title of the sheet to Superstore Sales, ensure the tooltip doesn’t display and remove all gridlines / row banding etc.

The selected months summary table

The core requirement of this challenge is to make use of set actions, so we’re obviously going to need a set which will contain the dates (months) the user will select on the chart. This set will be based off of Order Date To Plot. Right click on the field > Create > Set. Name it Order Dates To Plot Set and by default, select all the months between Oct 20 to Jun 21 inclusive.

Later I’ll describe how the values of this set will get updated, but for now, we need to get some information relating to the sales in these selected months.

Firstly, we want the total sales for the months in this set.

Total Sales

IF [Order Date To Plot Set] THEN [Sales] END

The default format for this field is set to $ with 0 dp.

Note – this is the other ‘total sales’ field mentioned earlier. This field name has no leading/trailing spaces.

To get the average, I needed a field just to store each member of the set (ie each selected month)

Selected Dates

IF [Order Date To Plot Set] THEN [Order Date To Plot] END

and with this I can then work out

Average Sales

AVG({FIXED [Selected Dates]: SUM([Total Sales])})

The final measure required for this section is the change within the date range, which is basically comparing the value of sales at the first month in the selection with the sales in the final month selected. We need a few fields to get to this.

Firstly, we want to identify the first and last months

Min Selected Date

{FIXED:MIN(IF [Order Date To Plot Set] THEN [Order Date To Plot] END)}

If the date is in the set, then return the date and then take the minimum of all the dates, and store against all the rows in the data. Similarly we have

Max Selected Date

{FIXED:MAX(IF [Order Date To Plot Set] THEN [Order Date To Plot] END)}

Putting this info into a table, you can see how the calculations are working. The values for the Min & Max dates are the same across every row.

Next we need to get the Sales at the min & max points, and spread that value across all rows

Sales at Min Date

{FIXED: SUM(IF [Order Date To Plot]=[Min Selected Date] THEN [Sales] END)}

Sales at Max Date

{FIXED: SUM(IF [Order Date To Plot]=[Max Selected Date] THEN [Sales] END)}

Now we can work out the difference

Change within Date Range

([Sales at Max Date]-[Sales at Min Date])/[Sales at Min Date]

format this to a percentage set to 1 dp

Finally, we need to know the number of months in the set, which is displayed in the title of the monthly summary sheet.

Months in Set

{FIXED: COUNTD(IF [Order Date To Plot Set] THEN [Order Date To Plot] END)}

If the date is within the set, then capture the date, and the count the distinct set of dates captured.

Make this a discrete field (move from the measures section at the bottom of the data pane to the dimensions section at the top (above the line), and add to the tabular view

Now we can build the summary sheet.

Add Measure Names to the Filter shelf and this time filter by Total Sales, Average Sales and Change within Date Range. Add Measure Names to Rows and Measure Values to Text. Reorder the measures.

Format the Total Sales to be in $K, by selecting the Format option from the context menu of the Total Sales pill on the Measure Values shelf (hover on the pill and click the carrot/down arrow that appears – by formatting this way, we’re changing the display of this field for this sheet only).

Add Min Selected Date and Max Selected Date to the Detail shelf and set to be Exact Date. Format both these fields via the pill context menu to be the ‘March 2001’ format

Also add Months in Set to the Detail shelf.

Adjust the title of the sheet as below

Finally you need to set the background of the worksheet to the relevant purple (I used #8074a8), adjust the colours of all the fonts to white and adjust the size/style of the fonts in the table. Remove all gridlines/row banding etc, and you should have something like below

The Trend Line

By this point we’ve built all the calculated fields we need for this chart. This is a dual axis line chart, as we want the colour of the line for the selected dates to be different from the non selected ones, and we want to display a label for the highest sales in the selected timeframe.

Add Order Date to Plot to Columns, and set as a Continuous (green) pill set to Exact Date
Add Sales to Rows
Add Total Sales to Rows
Make the chart dual axis, and synchronise axis.
Adjust the colours of the Measure Names colour legend
On the Label shelf of the Total Sales marks card, set to label the maximum value only

On the All Marks Card, add Min Selected Date and Max Selected Date to the Detail shelf and set to Exact Date.
Right click on the Order Date To Plot axis and Add Reference Line
Create a reference band that starts at the constant Min Selected Date, ends at the constant Max Selected Date, is bounded by dotted lines and shaded between

Hide the Sales and Total Sales axis, format tooltips and adjust the row & column dividers.
Change the title and you should get to

The donut chart

Donut charts are 2 different sized pie charts on top of each other, created using a dual axis chart. On the Rows shelf type in MIN(0). Then type the same next to it. This gives you two axis and two marks cards.

We only care about information related to the selected dates for this chart, so we can add Order Date To Plot Set to the Filter shelf, which by default will just restrict the information to the data ‘in’ the set.

Change the mark type of the 1st MIN(0) marks card to Circle and add Sales to the Tooltip shelf. Adjust the size of the charts/mark.

Change the mark type of the 2nd MIN(0) marks card to Pie and add Sales to the Angle shelf. Add State to the Detail shelf. Sort the State field by Sales descending.

Note – the circles might look the same at this point, but if you hover over the bottom one, you should see that it’s segmented by State.

We need some new fields now to help us identify the top ranking states.

Sales Rank

RANK(SUM([Sales]))

This is a table calculation, so it’s best to see how this field will work in a table view – build one out as below, and set the table calculation on the Sales Rank field as shown

We’re now going to ‘group’ the ranks into the top 3 and everything else

Sales Rank Group

IF [Sales Rank]<=3 THEN [Sales Rank] ELSE 10 END

We can now use this Sales Rank Group field to colour the pie chart. On the 2nd MIN(0) marks card, add Sales Rank Group to the Colour shelf. Adjust the table calculation to compute using State as above, then change the field to be Discrete (blue). Adjust the colours to suit.

Now make the chart dual axis, and synchronise the axis. Adjust the size of the 1st MIN(0) circle to be smaller than the pie. If it’s not showing, right click on the right hand axis and move marks to back. Colour the circle white. Adjust tooltips to suit and hide axis, column/row dividers etc. Update the title. You should have

The top 3 states table

Add Order Date To Plot Set to Filter
Add State to Rows and Sales to Text and sort descending.
Add Sales Rank to Filter and set to At Most is 3. This will just show the top 3 states.
Add State to Text
Add a Percent of Total Quick Table Calculation to the existing Sales field that’s on the Text shelf (via the context menu of the pill)
Add another instance of Sales back onto the Text shelf
Adjust / format the font size and layout of the fields on the Text shelf
Add Sales Rank to the Size shelf and set to be discrete (blue) and set the mark type to be Text. Adjust the size of the marks – it’s likely it’ll need to be reversed and the range adjusted.

Hide the State field on Rows, adjust the font colours, remove row banding and row/column dividers. You should end up with…

The map

Add Order Date To Plot Set to the Filter shelf
Add State to Detail – this should create a map (edit locations to be US if need be – Map -> Edit Locations menu)
Add Sales to the Colour shelf
Edit the colour range to a suitable purple range ( I set the darkest colour of the range to #6c638f)
Adjust the map layers (Map -> Map Layers) so only the option highlighted below is selected.

Adding the interactivity

Add all the sheets to the dashboard, using vertical and horizontal containers to arrange the relevant layout. Then add a Set Action (Dashboard > Actions > Add Action > Change Set Values). The Set Action should be configured as below :

And fingers crossed, you should now be able to select marks on the trend line and see all the other charts, except the initial summary table, all update. My published viz is here.

Happy vizzin’! Stay Safe!

Donna

How much do top sub-categories contribute to sales?

October 14, 2021 Donna ColesLeave a comment

A colourful #WorkoutWednesday challenge this week, courtesy of Ann Jackson incorporating pie charts, top N functionality and interactivity and a highlight table. Pie charts can cause much debate amongst the data viz community and if this one was just representing the multitude of sub-categories, it certainly wouldn’t be ideal. But when the core aim is to simply present 2 key measures (those in the top N against the rest), the pie is a familiar and effective visual. In this instance, the outer ring segmenting all the sub-categories provides additional context without detracting from the main purpose of the viz.

So lets build…

Creating the core calculations
Building the Pie Chart
Building the Highlight Table
Adding the Interactivity

Creating the core calculations

First up, we’re going to need a parameter to define the ‘Top N’. Create an integer parameter with a range from 1 to 17, that steps every 1 interval, and is defaulted to 5.

pTopN

Next we’re going to use a Set to capture the Sub-Categories that are in the Top N Sales. Right click on Sub-Category -> Create ->Set. Use the Top tab to define a set captures the Sum of Sales that is based on the pTopN parameter.

Now, we want to create a grouping of those in and out of the set, which will be used as part of the highlight table

Sub-Cat Group

IF [Sub-Category Set] THEN ‘IN TOP ‘ + STR([pTopN])
ELSE ‘ALL ELSE’
END

Pop all these fields out into a table so you can see what’s going on as you change the pTopN parameter. Sort the Sub-Category by Sales descending.

Now we need to identify the % value of Sales for the Sub-Categories that are in the Top N (this is the label on the darker segment of the central pie chart), so for that we need

Total Sales

{FIXED:SUM([Sales])}

Top N Sales (in hindsight, this should have been named Sales per Group or similar)

{FIXED [Sub-Category Set] : SUM([Sales])}

Top N Sales %

IF ATTR([Sub-Category Set]) THEN
SUM([Top N Sales])/SUM([Total Sales])
END

Format this to percentage with 0 dp.

Adding to the table, we can see the values

The final field we need in order to build the pie, is an additional one to store the label text

Label:SubText

IF [Sub-Category Set] THEN ‘TOP ‘ + STR([pTopN]) END

Building the Pie Chart

To achieve this we’re going to build a dual axis pie chart, where one pie is used to define the In/Out of Top N segmentation in the centre, and the other pie is used to create the outer ring.

Create an axis by typing in MIN(0) onto the Rows shelf, and then adding another instance of MIN(0) next to it. This will generate 2 marks cards, which is where the fields to build the pie charts will be placed.

In the first MIN(0) marks card, change the mark type to Pie, then add Top N Sales to the Angle shelf and Sub-Category Set to the Colour shelf. Adjust colours to suit. Then add Top N Sales % and Label:SubText to the Label shelf. Adjust size of the view and the chart to suit. Also remove all text from the Tooltip.

Positioning the text is a bit fiddly. If you click on the text so the cursor changes to a cross symbol, you can then drag it to a better location. However, when you change the Top N parameter, the text will move. You can go through each parameter value and reposition the text each time (which I did.. it wasn’t too onerous for 17 values), however I found when published to Tableau Public, the positioning wasn’t honoured. Ann’s solution was the same, so I didn’t get too hung up on this, although if anyone resolved it, I’d love to know!

Now on the 2nd MIN(0) marks card, again change the mark type to Pie, and this time add Sales to the Angle shelf and Sub-Category to Colour. Sort the Sub-Category field by Sales descending. Additionally add Sub-Category Set to the Detail shelf (this will be needed later on to make the interactivity work). Edit the colour palette to use the Hue Circle options. Adjust the size of the pie chart. Adjust the tooltip too.

Now make the chart dual axis and synchronize the axis. If the colourful chart is displayed ‘on top’, then right click on the right hand axis and select move marks to back. Adjust the sizes of both pies, so the colour wheel is slightly larger than the other one.

Now hide the axis, and remove all borders and gridlines.

Building the Highlight Table

I’ve built the highlight table as a bar chart. Start off by adding Sub-Category Set, Sub-Cat Group and Sub-Category to Rows. Sort Sub-Category by Sales descending. Then type in MIN(1) into the Columns shelf.

Now add subtotals via the Analysis > Totals > Add all Subtotals menu. This adds 2 additional rows to each section

But we don’t want the ‘grand total’, so click on the Sub-Category Set context menu, and uncheck Subtotals

To position the totals at the top, go to Analysis > Totals > Column Totals To Top

Then add Sub-Category to the Colour shelf, and adjust the colour of the Total bar to white

We now need to get some text onto those bars, but we need some additional calculations to help up with this. Firstly, we want to get the rank of the Sub-Category. We’ll use a table calculation for this

Sales Rank

RANK(SUM([Sales]))

We also need a way to identify the Total rows differently from the main Sub-Categories. I referred to this Tableau KB for help here, and subsequently created

Size

SIZE()

To see what this is doing, add Size to the Label shelf, and adjust the table calculation setting to compute by all fields except the Sub-Category Set. The size of the total rows is 1.

Based on this logic, we can then create

LABEL:Bar

IF [SIZE]=1 THEN ‘SUBTOTAL FOR GROUP’
ELSE ‘#’+STR([Sales Rank]) + ‘ ‘ + ATTR([Sub-Category])
END

Add this to the Label shelf instead of the Size field and adjust the table calc settings as above. Align left. Then add Sales to the Label shelf too and adjust so its on the same row. Adjust the tooltip too.

Now hide the Sub-Category Set and the Sub-Category fields. Right click on the ‘IN TOP x’ text and Rotate Label, then click on Sub-Cat Group text and Hide Field Labels for Rows. Format the header text to suit.

Hide the MIN(1) axis, and set columns and gridlines to None. Adjust the Row dividers to be darker

Adding the Interactivity

Add the 2 sheets onto a dashboard, and add a Highlight Dashboard Action, that on Hover of either of the charts, it highlights the other chart based on the Sub-Category Set only.

I think that’s covered everything. My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you make stacked bar charts easier to compare?

October 8, 2021 Donna ColesLeave a comment

Candra McRae was back to set the challenge this week. I found it relatively straightforward, so if you’re relatively new to Tableau, this is quite a good challenge to start with. In this we’ll cover

Grouping the states
Applying the sort
Adding the total value
Adding the interactivity

Grouping the states

The states need to be grouped based on the initial letter. Candra stated she wasn’t expecting a large IF STARTWITH… type formula. I did it by making use of the fact characters can be converted into ASCII which provides a numerical representation of a letter, which we can then utilise. So we need

State Initial ASCII

ASCII(UPPER(LEFT([State],1)))

This takes the 1st letter of the State, ensures it is uppercase, and converts to ASCII. You can see below what this looks does

With this knowledge, we can then create

State Group

IF [State Initial ASCII] <=77 THEN ‘A-M’
ELSE ‘N-Z’ END

and you can now easily create the stacked bar chart (note, I’ve already removed the various gridlines etc)

Applying the Sort

The ultimate intention is to capture the Category a user clicks on into a parameter, so we need to define that parameter

pSelectedCategory

A string parameter defaulted to Office Supplies

Show this parameter on your sheet, so you can manually test how the sort will work by manually changing the values.

Create a new calculated field to define the sort

Sort

IF [pSelectedCategory]=[Category] THEN 0 ELSE 1 END

Then edit the sort property of the Category field that’s on the Colour shelf as below

Now change the value in the parameter box to Technology and see how the chart changes.

Adding the total value

Create a new field to store the total values

Total Sales by State Group

{FIXED [State Group]: SUM([Sales])}

Add this onto the Detail shelf, then add a reference line per line as below

Adding the interactivity

Once you’ve added the chart onto a dashboard, you need to add a parameter action which will set the pSelectedCategory parameter with the value from the Category field on select.

To prevent the selected category from ‘remaining selected’ on click, I applied the ‘true=false’ trick I use a lot.

Create a field True = true and a field False = false, then add both to the Detail shelf of the chart viz. On the dashboard add a new filter action which on select passes selected fields only setting true = false. As this condition can never be true, the filter doesn’t apply and this clears the highlight action.

And that is it for this week – short & sweet, but covers a handful of bite-size concepts. My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you quickly tell the net change between two dates?

September 23, 2021 Donna ColesLeave a comment

Continuing with ‘Community Challenge’ month, it was the turn of Will Perkins to set the challenge for this week; a challenge inspired by Google’s stock tracker.

By interacting with the published solution and reading the requirements, I deduced that I was likely to need 3 sheets – 1 for the Region headings & KPIs, 1 for the trend line chart, and 1 to drive the timeframe selections. The trend line chart looked like it was going to involve a dual axis combining a line and and area chart, along with ‘filled’ reference bands, although exactly how it would work I wasn’t entirely sure initially. Finally, there was going to be some ‘parameter actions’ action along with the ‘true = false’ trick to ensure selected marks didn’t remain highlighted.

But before we can tackle the actual chart build, we need to nail some of the calculations involved.

Identifying the date range to highlight

The intention of the chart is that on initial load, it has highlighted the timeframe for the last 14 days up to ‘today’. As this chart is being built with a static data set, which only has data up to the end of 2021, I chose to ‘hardcode’ my ‘today’ value into a parameter. This is so that in a year’s time when I might look at this again, I won’t be presented with a broken looking viz.

pToday

Date parameter defaulted to 20 Sept 2021

The user also has the ability to highlight/select dates on the chart itself, which will define a start and end date range. So we also need some additional parameters to capture this information.

pStartRange

Date parameter defaulted to 01 Jan 1900

Similarly you’ll need a pEndRange parameter too, also defaulted to 01 Jan 1900.

Later on we’ll define parameter actions which will ‘set’ these values based on user interaction.

With these fields, we can then define calculated fields to store the start and end dates depending on whether we’re using the defaults due to initial load (ie 14 days to today), or a user selected range.

Selected Range Start Date

IF [pStartRange] = #1900-01-01# THEN DATE(DATEADD(‘day’,-14,[pToday]))
ELSE DATE([pStartRange])
END

Selected Range End Date

IF [pEndRange] = #1900-01-01# THEN DATE([pToday])
ELSE DATE([pEndRange])
END

We’re going to be plotting Order Date on our axis at the day level, and so to simplify things IMO, I created

Order Date Day

DATE(DATETRUNC(‘day’,[Order Date]))

which I then reference in the following calculated field, which is just to capture all the days within the range selected

Selected Dates To Plot

IF [Order Date Day]>= [Selected Range Start Date] AND [Order Date Day]<=[Selected Range End Date] THEN [Order Date Day] END

We can now start to build out the basic chart

Plotting Order Date Day and Selected Dates to Plot side by side you can see the date axis differ, with only the dates from 06 Sep – 20 Sep 21 displaying on the right hand side. The marks type for Selected Date to Plot is set to Area, and to get the marks to join up, you need to turn Stack Marks Off (Analysis -> Stack Marks -> Off menu).

Defining the Timeframe to Display

We’re going to use another parameter to store the timeframe value

pTimeframe

String parameter defaulted to 6 MONTHS (note the case – it’s simpler to match it to the display format that’s going to be used)

We then need a calculated field to tell us what to do with this value

Timeframe to Display

CASE [pTimeframe]
WHEN ‘1 MONTH’ THEN [Order Date Day]>=DATEADD(‘month’,-1,[pToday]) AND [Order Date Day]<= [pToday]

WHEN ‘6 MONTHS’ THEN [Order Date Day]>=DATEADD(‘month’, -6, [pToday]) AND [Order Date Day]<= [pToday]

WHEN ‘YTD’ THEN [Order Date Day]>=DATETRUNC(‘year’,[pToday]) AND [Order Date Day]<= [pToday] WHEN ‘1 YEAR’ THEN [Order Date Day]>= DATEADD(‘year’,-1,[pToday]) AND [Order Date Day]<= [pToday]
ELSE [Order Date Day] <= [pToday]
END

This field will return true for all the dates that fall within each statement and false otherwise.

Add this field to the Filter shelf and select True.

You can test how the left hand side of the chart is affected by manually typing the different values into the parameter

Colouring the chart

The line and area charts are coloured based on whether dates fall in the selected range and whether the difference between the sales values at the start and end of the selected range is positive or not. We need several more calculated fields to work this out.

We firstly need to capture the min and max dates of the selected area for each region. Now, you initially might think that the Selected Range Start Date and Selected Range End Date fields already have these values. However there isn’t always a sale in every region for these dates. You could argue, that in that case, the sales value for that date should be 0 (ie there were no sales on that day), but to match the solution (and it was easier), we just get the min and max dates within the selected range that have a sales value for each region.

Min Selected Date Per Region

{FIXED [Region]: MIN([Selected Dates to Plot])}

Max Selected Date Per Region

{FIXED [Region]: MAX([Selected Dates to Plot])}

Pop these out into a quick view, and you can see how the dates differ per region compared to the default start & end date values

Now we want to work out the sales value on these dates

Min Date Sales

{FIXED [Region]: SUM(IF [Order Date Day]=[Min Selected Date Per Region] THEN [Sales] END)}

Max Date Sales

{FIXED [Region]: SUM(IF [Order Date Day]=[Max Selected Date Per Region] THEN [Sales] END)}

and then we can work out the difference and the % difference

Range Sales Diff

SUM([Max Date Sales])-SUM([Min Date Sales])

custom formatted to +”$”#,##0.00;-“$”#,##0.00 to show a ‘+’ prefix for positive values

Range Sales % Diff

[Range Sales Diff]/SUM([Min Date Sales])

custom formatted to ▲0.0%;▼0.0%

Now we can compute a field to use to colour the line/area chart

Colour – Trend

IF MIN([Order Date Day]) >= [Selected Range Start Date] AND MIN([Order Date Day])<= [Selected Range End Date] THEN

IF [Range Sales Diff]>= 0 THEN 1 ELSE -1 END
ELSE 0
END

If we’re within the selected date range, then test to see if the value is positive (set to 1) or negative (set to -1), otherwise we’re outside the selected date range, so set to 0

Go back to the trend chart and add this field to the Colour shelf of the All Marks card (so it gets added to both sets of marks). Change it to be a discrete (blue) pill and the adjust the colours accordingly. At this point you may want to change the background colour if you’re using a white line. I’m just setting it to a light grey at this point, but eventually it’ll get set to black.

Adding the highlight band

This took a lot of thinking. I knew I’d need a reference band, but it took some time to figure out how to get the backgrounds coloured differently, since you only have the option to fill between the band with one colour.

The trick is to make use of the two date axes we have and to apply a band per pane.

But we need some more fields to make this happen.

Ref Line Start Date -ve

IF [Range Sales Diff]<0 THEN [Selected Range Start Date] END

Ref Line End Date -ve

IF [Range Sales Diff]<0 THEN [Selected Range End Date] END

Add these fields to the Detail shelf of the Order Date Day card and set to be continuous (green). Then add a reference band to this axis, applying the settings as below (note, the Line is a white dotted line, so isn’t showing up in the field setting, though you can see it on the viz).

Because the reference band has been set at the pane level, and the reference line dates are only relevant if the difference is negative, then the band is just showing on one row.

We then do something very similar, but this time we get some dates only if the difference is positive.

Ref Line Start Date +ve

IF [Range Sales Diff]>=0 THEN [Selected Range Start Date] END

Ref Line End Date +ve

IF [Range Sales Diff]>=0 THEN [Selected Range End Date] END

Add these as continuous pills on the Detail shelf of the Selected Dates to Plot card, and add another reference band to this axis instead.

Now you can set the chart to be a dual axis, synchronising the axes, and removing the Measure Names field from the All Marks card which will have automatically been added

This is the core viz, that will need further formatting before its ready to put on the dashboard – remove gridlines, borders etc, set background, remove headers. NOTE– You’ll need to manually re-sort the Regions before the field is hidden.

The KPI table

We need to build a ‘fake table’ for this, by putting Region on Rows and typing MIN(0) on Columns, then adding the Range Sales Diff and Range Sales % Diff fields to the Text shelf. We need an additional field to colour the text though.

Colour – KPI

[Range Sales Diff]>=0

Finally, I capitalised the Region values by using Aliases. This is a quick method when there aren’t many values, but otherwise I would usually create a field with UPPER([Region}).

Once again don’t forget to sort the Regions, and the apply relevant formatting.

The Timeframe Selector

Will states you can use a separate data source for this, so create the list in Excel

and then copy and paste (via the Data > Paste) menu into your workbook

On a new sheet add Date Range to Columns and Date Range to Text. The size and colour of the text differs based on which one has been selected. So create a field

Timeframe Selected

[Date Range] = [pTimeframe]

and add this field to both the Size and Colour shelves. You’ll need to adjust the settings, and hide headers, remove gridlines etc. Try to avoid touching the Text formatting directly, as you might find the Size then doesn’t adjust.

Adding the interactivity

You’ll need to use layout containers to organise all the objects on the dashboard. Then you can add the various dashboard parameter actions needed

Selecting the timeframe

Add a parameter action that on select passes the Date Range field into the pTimeframe parameter

Selecting the date range to highlight

You’ll need 2 parameter actions for this, one that passes the minimum Order Date Day selected into the pStartRange parameter, and the other that passes the maximum Order Date Day selected into the pEndRange parameter.

Deselecting the highlighted marks

By default when you click on a mark/select marks in Tableau, they are highlighted/selected and the other marks are faded, until you ‘click’ again. To stop this from happening I use a ‘true = false’ trick, that has become very common in #WOW challenges, and I’ve blogged many times before.

Create calculated fields

True

True

and

False

False

and add these to the Detail shelf of the All Marks card on the trend line chart.

Then on the dashboard add a Filter action that on select targets the sheet directly, mapping the true field to the false field. As this can never be ‘true’ the filter doesn’t apply, and the marks become unselected.

Repeat the same on the Timeframe Selector sheet.

Hopefully that’s covered all the core points. My published viz is here.

Happy vizzin’! Stay Safe!

Donna

When do extracts run during the day?

September 12, 2021 Donna Coles5 Comments

When I was approached by the #WOW crew to provide a guest challenge, I was a little unsure as to what I could do. I primarily work as a Tableau Server admin, so rarely have a need to build dashboards (which is why I like to do the weekly #WOW challenges, to keep up my Desktop skills). Then the next day I was looking at a dashboard I’d built to monitor extracts on our Tableau Servers, and I thought it would be an ideal candidate for a challenge. I also thought it would provide any users of Tableau Server with the opportunity to implement this dashboard in their own organisation if they wished, by sharing with their Server Admins.

As a Tableau Server Admin, you get access to a set of ‘out of the box’ Admin views, one of which is called ‘Background Tasks for Extracts’ which gives you a view of when extracted data sources and workbooks run on the server. However while the provided view is fine if you want to quickly see what’s going on now, it’s not ideal if you want to see how things ran over a longer timeframe – it involves a lot of horizontal scrolling.

Many server admins will have ‘opened’ up access to the Tableau repository, the PostgreSQL database which stores a rich array of data, all about your Tableau Server [see here for further info], and enables admins to extend their analysis beyond the provided Admin views. This site even provides a set of pre-curated data sources to help you get started! These aren’t formally supported by Tableau, but is the brain-child of Matt Coles, a server admin at Tableau (no relation to me though!).

My dashboard doesn’t actually use one of these data sources though. For the challenge, I’ve just created some sample, anonymised data in the required structure. I’ll explain later at the end of the post how to go about converting this to use ‘real’ server data, if you do want to plug it into your own server environment.

Understanding the data

When using Tableau Server, published data sources and workbooks can connect to their underlying data source (eg a SQL Server database, an Excel file etc) directly (ie via a live connection) or via an extract. An extract means that a copy of the data is pulled from the underlying data source and stored on Tableau Server utilising Tableau’s ‘in memory’ engine when the data is then queried. An extract gets added to a schedule which dictates the time when the extract will get refreshed; this may be weekly, daily, hourly etc. Every time the extract runs, a background task is created which provides all the relevant details about that task. The data for this challenge provides 1 row for each extract task that was created between Monday 11th Jan 2021 and Friday 5th Feb 2021. The key fields of note are:

Id – uniquely identifies a task
Created At – when the task was created
Started At – when the task actually started running (if too many tasks are set to run at the same time, they will queue until the server resources are available to execute them).
Completed At – when the task finished, will be NULL if task hasn’t finished.
Finish Code – indicates the completion status of the job (0=success, 1=failed, 2= cancelled)
Progress – supposed to define the % complete, but has been observed to only ever contain 0 or 100, where 100 is complete.
Title – the name of the extract
Site – the name of the site on the server the extract is associated to

Based on the Finish Code and Progress, I have derived a calculated field to determine the state of the extract (to be honest, I think this is a definition I have inherited from closer analysis of the Background Tasks for Extracts Server Admin view, so am trusting Tableau with the logic).

Extract Status

IF [Finish Code] = 1 AND [Progress] <> 100 THEN ‘In Progress’
ELSEIF [Finish Code] = 0 AND NOT [Progress] = 1 THEN ‘Success’
ELSE ‘Failed’
END

Building the required calculated fields

The intention when being used ‘in real life’, is to have visibility of what’s going on ‘Now’ as well as how extracts over the previous few days have performed. As we’re working with static data, we need to hardcode what ‘Now’ is. I’ll use a parameter for this, so that in the event you do choose to plug this into your own server, you only have to replace any reference to the Now parameter with the function NOW().

Now

Datetime parameter defaulted to 05 Feb 2021 16:30

The chart we are going to build is a Gantt chart, with 1 bar related to the waiting time of the task, and 1 bar related to the running time of the task. We only have the dates, so need to work out the duration of both of these. These need to be calculated as a proportion of 1 day, since that is what the timeframe is displayed over.

Waiting Time

(DATEDIFF(‘second’, [Created At], IF ISNULL(Started At]) THEN [Now] ELSE [Started At ] END))/86400

Find the difference in seconds, between the create time and start time (or Now, if the task hasn’t yet started), and divide by 86400 which is the number of seconds in a day.

We repeat this for the processing/running time, but this time comparing the start time with the completed time.

Processing Time

(DATEDIFF(‘second’, [Started At], IF ISNULL([Completed At]) THEN [Now] ELSE [Completed At] END))/86400

As mentioned the timeframe we’re displaying over is a 24 hr period, and we want to display the different days over multiple rows, rather than on a single continuous time axis spanning several days.

To achieve this, we need to ‘baseline’ or ‘normalise’ the Created At field to be the exact same day for all the rows of data, but the time of day needs to reflect the actual Created At time . This technique to shift the dates to the same date, is described in this Tableau Knowledge Base article.

Created At Baseline

DATEADD(‘day’, DATEDIFF(‘day’, [Created At], #2021-01-01#), [Created At])

And again, we’re going to need to do a similar thing with the Started At field

Started At Baseline

DATEADD(‘day’, DATEDIFF(‘day’, [Started At], #2021-01-01#), [Started At])

Putting this out into a table, you can see what this data all looks like (note, I’m just choosing a arbitrary date of 01 Jan 2021, so my baseline dates are all on this date:

Building the Gantt chart

We’re going to build a dual axis Gantt chart for this.

Add Site to Rows
Add Title to Rows
Add Created At to Rows. Set it to the day/month/year level and set to be discrete (ie blue pill). Format the field to custom format of ddd dd mmm yyyy so it displays like Mon 11 Jan 2021 etc
Add Created At Baseline to Columns, set to exact date
Add Waiting Time (Avg) to Size and adjust to be thin

This will automatically create a Gantt chart view

Add Started At Baseline to Columns, set to exact date, and move so the pill is now placed to the right of the Created At Baseline pill
On the Started At Baseline marks card, remove Waiting Time and add Processing Time (Avg) to the Size shelf instead. Adjust so the size is thicker.
Set the chart to be dual axis and synchronise the axes

The thicker bars based on the Started At / Processing Time need to be coloured based on Extract Status. Add this field to the Colour shelf of the Started At Baseline marks card and adjust accordingly.

The thinner bars based on the Created At / Waiting Time need to be coloured based on how long the wait time is (over 10 mins or not).

Over Wait Time Threshold

[Waiting Time] > 0.007

0.007 represents 10 mins as a proportion of the number of minutes in a day (10 / (60*24) ).

Add this field to the Colour shelf of the Created At Baseline marks card and adjust accordingly (I chose to set to the same red/grey values used to colour the other bars, but set the transparency of these bars to 50%).

Formatting the Tooltips

The tooltip for the Waiting Time bar displays

The Created At Baseline and Started At Baseline should both be added to the Tooltip shelf and then custom formatted to h:mm am/pm

The Waiting Time needs to be custom formatted to hh:mm:ss

The tooltip for the Processing Time bar is similar but there are small differences in the display,

Formatting the axes

The dates on the axes are displayed as time in am/pm format.

To set this, the Created At Baseline / Started At Baseline pills on the Columns shelf need to be formatted to h:mm am/pm

Adding the reference band

The reference band is used to highlight core working hours between 8am and 5pm. Right click on the Created At Baseline axis and Add Reference Line. Create a reference band using constants, and set the fill colour accordingly.

Apply further formatting to suit – adjust sizes of fonts, add vertical gridlines, hide column/axes titles.

Filtering the dates displayed

As discussed above, when using this chart in my day to day job, I’d be looking at the data ‘Now’. As a consequence I can simply use a relative date quick filter on the Started At field, which I default to Last 7 days.

However, as this challenge is based on static data, we need to craft this functionality slightly differently.

We’re only going to show up to 10 days worth of data, and will drive this using a parameter.

pDaysToShow

An integer parameter, ranging from 1 to 10, defaulted to 7, and formatted to display with a suffix of ‘ days’.

We then need a calculated field to use to filter the dates

Filter : Days to Show

DATETRUNC(‘day’,[Created At]) >= DATEADD(‘day’,([pDaysToShow]-1)*-1,DATETRUNC(‘day’,[Now]))

Add this to the Filter shelf and set to True.

Additionally, the chart can be filtered by Site, so add this to the Filter shelf too.

Building the Key legend

Some people may build this by adding a separate data source, but I’m just going to work with the data we have. This technique is reliant on knowing your data well and whether it will always exist.

On a new sheet, add Site to the Filter shelf and filter to sites 7 and 9.

Create a new field

Key Label

If [Site] = ‘Site 9’ THEN ‘Waiting’ ELSE ‘Processing’ END

and add this to the Columns shelf and sort the field descending, so Waiting is listed before Processing.

Alongside this field, type directly into the Columns shelf MIN(1).

Edit the axes to be fixed to from 0 to 1. Then add the Site field to the Colour shelf and also to the Size shelf and adjust accordingly (you may need to reverse the sizes). I lightened the colour by changing the opacity to 50%.

Now hide the axes, remove row & column borders, hide the column title and turn off tooltips.

The information can all now be added to a dashboard.

Using your own data

To use this chart with your own Tableau Server instance, you need to create a data source against the Tableau postgres repository that connects to the _background_tasks (bgt) table with an inner join to the _sites (s) table on bgt.site_id = s.Id. Rename the name field from the _sites table to Site. If you don’t use multiple sites on your Tableau Server instance, then the join is not required. The sole purpose of the join is to get the actual name of the site to use in the display/filter.

You should then be able to repoint the data source from the Excel sheet to the postgres connection. You may find you need to readjust some of the colours though.

When I run this, I’m using a live connection so I can see what is happening at the point of viewing, rather than using a scheduled extract. To help with this, I add a data source filter to limit the days of data to return from the query (eg Created at <=10 days), which significantly reduces the data volume returned with a live connection.

Hopefully you enjoyed this ‘real world’ challenge, and your server admins are singing your praises over the brilliance of this dashboard 🙂

My published version is here.

If you’ve got any feedback or suggestions on improvements to enhance the viz even further, please do let me know.

Happy vizzin’! Stay safe!

Donna

How much change has occurred?

September 2, 2021 Donna Coles2 Comments

A relatively straightforward challenge was set by Luke this week, to visualise the difference in Sales between 2020 and 2021 in a slightly different format than what you might usually think of.

Start by filtering the data to just the years 2020 and 2021 (add Order Date to the Filter shelf and select specific years, or add a data source filter to limit the whole data set).

Add Sub-Category to Rows, and Sales to Columns, then add Order Date to Colour which by default will display as YEAR(Order Date). Colour the years appropriately.

Now unstack the marks (Analysis menu -> Stack Marks -> Off), and re-order the colour legend, so 2021 is listed first (this makes the 2021 bars sit ‘on top’ of 2020).

Adjust the size to make the bars thinner.

Now add another instance of Sales to the Columns shelf, and make the chart dual axis (synchronising the axis). Reset the mark type of the original SUM(Sales) marks card back to bar.

We need the circle mark for the 2021 Sales to be blue. To do this, duplicate the Order Date field, then add Order Date (copy) to the Colour shelf of the SUM(Sales)(2) marks card. This will show another colour legend, and you can set the colours accordingly. Add a white border around the circle marks.

To work out the % difference to display on the label, we need the following fields

2021 Sales

{FIXED [Sub-Category]: SUM(IF YEAR([Order Date])=2021 Then [Sales] END)}

This returns the value of the 2021 sales for each Sub-Category against all the years in the data set. Similarly we need

2020 Sales

{FIXED [Sub-Category]: SUM(IF YEAR([Order Date])=2020 Then [Sales] END)}

which means we can then create

% Difference

(SUM([2021 Sales])-SUM([2020 Sales]))/SUM([2020 Sales])

format this using custom formatting to display as +0%;-0%

Now we can add % Difference to the Label field of the Sum(Sales)(2) marks card.

You’ll notice you’ll have duplicate labels displayed. To resolve this, you need to adjust the label settings as below

To sort the rows, you need to sort the Sub-Category field by 2021 Sales descending

And finally to show the value of the 2021 Sales, add this field to the Rows shelf, and change to be discrete (blue pill).

All that’s left to do now is adjust the wording of the tooltips as you see fit, and format to remove gridlines, headers etc.

My published version is here.

Happy vizzin’! Stay Safe!

Donna

Donna + DataViz

ramblings in all things Tableau (and occasionally some other Stuff….)

Can you recreate this difference chart?

Customer Purchasing Habits (RFM Analysis)

Can you show the top and bottom profit performers for the selected time period?

Filtering Table Calcs

Can you use set actions to create a reference date range?

How much do top sub-categories contribute to sales?

Can you make stacked bar charts easier to compare?

Can you quickly tell the net change between two dates?

When do extracts run during the day?

How much change has occurred?