Can you build a Top N Bar Chart on a Single Worksheet?

So back from my holibobs and catching up on the #WorkoutWednesday challenges I’ve missed while I’ve been sunning myself on a Greek beach.

First up is another guest challenge posted by Jeffrey Shaffer who asked us to create a Top N bar chart of the quantity per manufacturer per region on a single viz. The main twist with this is that a manufacturer could be ranked in different locations for each region. The full challenge is here.

Determine Manufacturer

In the 2019.1 Superstore Sales data set I had on my machine, there is no Manufacturer field. I couldn’t see one in the link to the dataset included on the blog page either. Perhaps I was missing something, but given I was on catch up, I chose not to find out the real reason. Instead I decided to derive my own Manufacturer field by just taking the first word of the Product Name. It meant my quantity figures wouldn’t match up, but the principles behind the challenge wouldn’t be affected.

MID([Product Name],1, FINDNTH([Product Name],’ ‘,1)-1)

Note – I’ve watched Ann’s solution (published on the challenge page), since I published my version, and Ann just lifts a pre-defined Manufacturer group onto her viz, which also already has an ‘Other’ manufacturer. My calculated field above has no such value. Ann’s solution is therefore a bit more straightforward than the work I had to put in to determine ‘Other’.

I interpreted ‘Other’ to be the total quantity of all the manufacturers not in the top n. But of course the top n had to take into account the Region, so it was possible that a manufacturer listed explicitly in the top n for the Central region say, might be included within the ‘other’ pot for another region.

To work this out, I created a quantity field per region ie

Central – Qty

IF [Region] = ‘Central’ THEN [Quantity] END

East – Qty

IF [Region] = ‘East’ THEN [Quantity] END

South – Qty

IF [Region] = ‘South’ THEN [Quantity] END

West – Qty

IF [Region] = ‘West’ THEN [Quantity] END

I then created a set per region to store the top n manufacturers based on the Quantity per region

The set was based on the Manufacturer field, and limited to the Top n Manufacturers parameter I set up, associated to the relevant [Region] – Qty field created above.

I had 4 sets in the end : Top Central, Top East, Top South, Top West

Note – It is entirely possible with this set up, that each set could contain a completely different set of manufacturers, so up to top n X 4 manufacturers in total

So having got a set which stored my top n per region, I needed to get the ‘Other’ grouping. To do this I created a new field

Manufacturer Category

IF [Region] = ‘Central’ AND [Top Central] THEN [Manufacturer]
ELSEIF [Region] = ‘East’ AND [Top East] THEN [Manufacturer]
ELSEIF [Region] = ‘South’ AND [Top South] THEN [Manufacturer]
ELSEIF [Region] = ‘West’ AND [Top West] THEN [Manufacturer]
ELSE ‘Other’
END

With the Top n parameter set to 10, this gives me

There are 12 rows listed, but only 11 values per region (the top 10 + other). You can see here that Wilson is within the top 10 for East & West, but not Central and South, whilst Hon is in top 10 for Central, East & South but not West. This is why there are 12 rows. If I changed the Top n parameter to 15, I’d get 20 rows listed, as there are 19 manufacturers across the top 15 for all the regions + other.

To get the sorting, we need to introduce an old friend :

Index

INDEX().

Adding this as a discrete (blue) pill onto the rows, moving the Manufacturer Category onto Text, and setting the Table Calculation Properties of the Index field to …

  • Compute over both Region and Manufacturer Category (with Region listed first)
  • To restart every Region
  • To sort by SUM of Quantity descending

… I start to get what I need

But I’ve still got 1 too many rows, I need to deal with the ‘Other’ pot. If my top n parameter is set to 10, then I essentially want to be able to show rows 1-10 above if the Include Other parameter is true, or show rows 2-11 if the Include Other parameter is false (Include Other is a boolean parameter I created).

FILTER – Other

NOT([Include Other]) AND [Manfacturer Category] = ‘Other’

Adding this to the Filter shelf and setting to False, basically hides the Manufacturer Category = Other when Include Other = No, or shows Manufacturer Category = Other when Include Other = Yes

So when Include Other = No, I have 10 rows indexed 1-10, and when Include Other = Yes, I have 11 rows indexed 1-11.

I only want the number of rows based on my Top N parameter, so I created an additional field

FILTER – Index

[Index] <= [Top n Manufacturers]

which is added to the Filters shelf and set to true, so only the rows that are less than or equal to the parameter value are displayed.

So we’re nearly there, except Jeff added an additional curveball…

Display Manufacturer & Rank on hover

To tackle this, I needed 3 more calculated field and a set, as we’re going to need Set Actions to handle this.

Index Rank

‘(#’ + STR([Index]) + ‘)’

this simply formats the index value into the format (#1)

Manufacturer + Rank

ATTR([Manufacturer Category]) + ‘ ‘ + [Index Rank]

this concatenates the two fields together – note the ATTR() is needed as Index Rank references the aggregated table calculation of INDEX().

Highlighted Manufacturer

This is a set created by simply right-clicking on Manufacturer Category and selecting Create -> Set, then choosing an arbitrary option (eg Avery). This will be changed via Set Actions later…

LABEL: Manufacturer

IF ATTR([Highlighted Manufacturer]) THEN ([Manufacturer + Rank]) ELSE ATTR([Manfacturer Category]) END

This is basically displaying the manufacturer with the rank if the manufacturer is in the set.

Adding LABEL:Manufacturer into our data table above, you can see that the Avery manufacturer is displayed with the rank, but the others aren’t

To make the value of the set change on hover, the view needs to be added onto a dashboard, and a Set Action created.

Create a dashboard sheet, and add the view. Then select Dashboard -> Actions -> Add Action -> Change Set Values

Set the properties in the dialog as below – the action to run on Hover, the Target Set to the be the Highlighted Manufacturer created above, and remove all set values when you ‘hover off’

Hovering on Eldon, for example, you can see the manufacturer label is changing in the relative positions

So that’s all the building blocks now in place, the viz just needs to be created, which I did by duplicating my table version above, and moving the pills around.

I basically created a synchronised dual axis with Index on rows, Region, MIN(0), SUM(Quantity) on columns. MIN(0) was set to be a Gantt bar and labelled with the LABEL:Manufacturer field set to be left aligned. The SUM(Quantity) field is labelled with the Quantity. Both labels were set to match the mark colour, which was based on Region using the Nuriel Stone palette specified. Headers were hidden and various formatting applied to the grid lines/rows/columns etc.

My version of the viz is here.

I’m off to ponder about where that Manufacturer field came from, as I think it would have all been a lot simpler if it already existed… hmmmm…

Happy vizzin!

Donna

Can you create step area charts?

Continuing on from last year, August is #WorkoutWednesday takeover month, with challenges posted by invited guests. First up, for Week 32 was this challenge from Klaus Schulte to create a step area chart WITHOUT the use of data densification (ie duplicating the data set).

WARNING! This blog describes my approach to solving the challenge which I can only describe as trying to find your way through a maze for the first time – you take a few wrong turns, double back on yourself several times, but eventually get to the the centre, feeling quite satisfied. However, you later find, that if only you’d turned right rather than left to start with, there would have been a much quicker direct route to get to the end. My approach to this challenge ended up being quite cumbersome due to the path I started down. Klaus’ solution is much simpler 🙂

There were minimal clues in the challenge requirements, apart from the fact that table calcs were going to be required and, as stated, you couldn’t duplicate the data set. So I started by viewing Klaus’ solution on Tableau Public to see what I could glean by mousing over the chart. From doing this I understood that

  1. The stepped area chart had to be managed using a single axis, since the dual axis would be needed for the ‘bonus’, highlighting the max & min drop.
  2. The stepped area chart was indeed an ‘area’ mark type (and not carefully disguised bars or similar), since when you hovered over a mark, an indicator appeared at both the top and bottom of the chart, and if you selected anywhere, the whole section was highlighted

Stepped Area Chart

So where to start…. I decided to watch the relevant section of the TC Europe talk Klaus referred to in the challenge intro, in which he showed a way of doing this by duplicating the data, to see if there were any clues there. By doing this, I understood that

  1. The difference between an area & line chart was that with an area chart, there is no Path shelf, so you can’t control how the marks will be joined up. An Area chart will just join the marks from left to right. This was already familiar, but the reminder was useful to jog the memory.
  2. Whilst Klaus’ solution was using a duplicated data set, he incorporated a method of offsetting the point to plot forward (+0.499) or backward (-0.499) depending on what data set was being referenced. This reminded me of something I’d done in a previous #WorkoutWednesday challenge a long time ago, where I needed to display a side by side bar chart

So I figured I needed to find a way to ‘expand’ the data to give me 2 rows per month by using some offset logic, which I did by creating new fields

Month Position To Plot

DATE(IF MONTH([Order Date]) = 12 THEN DATETRUNC(‘month’,[Order Date])
ELSE
IF DAY([Order Date]) <=15 THEN DATETRUNC(‘month’,[Order Date]) ELSE DATEADD(‘month’,1,DATETRUNC(‘month’,[Order Date]))-1 END
END)

This sets the date to be 1st Dec if the Order Date was December (as I didn’t want an extra point), otherwise, if the Order Date was before 15th of the month, the date would be the 1st day of the month, otherwise it would be the last day of the month.

Sales In Month

{FIXED [Category], DATETRUNC(‘month’, [Order Date]): SUM([Sales])}

I needed this LoD to store the total value of the monthly sales. Without it, due to my Month Position To Plot field above, the Sales would only show the partial sales in the month.

This made my data look like

Plotting this on a chart, setting the [Month Position to Plot] to be a continuous (green) pill set to the Day level, and choosing the Area mark type, and I got a stepped area chart 🙂 The days where the marks are plotted end up being so close together, that the vertical line looks straight to the eye.

Side notes

  1. I initially created a numeric field based on the month’s number offset by +/- 4.99 to plot, instead of the date field [Month Position To Plot], assuming I might be able to use custom formatting to show the date on the axis. When I eventually found this wasn’t possible, I confirmed the axis on Klaus’ solution was actually associated to the main view, and not part of another view carefully placed on the dashboard. I found this by clicking on the small + icon that appears when hovering by the date. This isn’t a feature you’re encouraged to use, as it makes the view go all bonkers 🙂
  2. I used the alias feature to display the Category field in uppercase, as there were only 3 values, so it wasn’t overly cumbersome to type them in.
  3. The date axis was fixed to a suitable start & end to allow a decent amount of spacing between each category. I also set the axis tick months to start on 01 Jan 18 and recur every 11 months, so only Jan & Dec would display.

So this didn’t seem too bad – I’d got a step area chart using a single axis – everything was all good. On to the next bit…

Tooltips

The tooltip needed to show the category, the month/year, the sales value and the value of the change from the previous month.

Because of the way I’d gone about structuring my data, to work out the change I actually needed to get the value of the next row, and find the difference between the two. This may sound counter-intuitive since we’re asked to find the change from the previous month, but it was necessary due to that path I had chosen to build my area chart 🙂

Sales Next Value

LOOKUP(SUM([Sales In Month]),1)

This table calculation needed to be applied for each category (ie in the Edit Table Calculation window, all dimensions except the Category needed to be selected in the Compute Using section). The difference was just then comparing the two fields

Sales Diff

[Sales Next Value] – SUM([Sales In Month])

But whilst this was getting me the numbers I needed, I was stumbling into another issue.

On Klaus’ viz, the point at the top and bottom of each vertical line, represented the same month, whereas the point at the start and end of each horizontal line was the same month for me 😦

So I had to 1) build another calculated field to shift the months so I could label the points correctly, and 2) since every row didn’t show the Sales Difference (the rows with $0 above), I need another table calc to get the info relating to the previous row. This gave me

Labels

So having sorted the tooltips, my focus now changed to the labels that were displayed which were

  • the points which had the biggest rise (the maximum change) or biggest fall (the minimum change) per category
  • the points at the start and end of each year (if they didn’t correspond to being the max/min change above).

You can use WINDOW_MAX() to find the maximum difference for each category, and then compare this to the actual difference to find the matching row. However, yet again, due to my set up, the matching row wasn’t on the right point, I had to shift it again using a LOOKUP() function to find the previous row.

I had to repeat the exercise using WINDOW_MIN to find the minimum difference, and then apply similar logic using FIRST() and LAST() to identify the start and end points. It was all very convoluted and involved multiple table calculations, but it produced the output I needed. (I’ve included my data table sheet in my published workbook, for reference).

Highlighting biggest rise and smallest drop

So now onto the 2nd part of the challenge. As I mentioned at the start, I knew a dual axis would be involved to achieve the lines.

My initial thought was to use the ‘waterfall’ type technique involving a gantt mark type, sized by the difference.

However, I spotted that the blue and red vertical lines were thicker than the grey vertical lines which were also visible, which meant gantt wouldn’t be an option, as I couldn’t use the Size shelf for two purposes.

I recalled the fact you can place multiple measures on the same axis and use the line to join them up vertically (I did this creating a candlestick chart for #WorkoutWednesday2019 week 13, solution here), so started trying this out. But in this case I couldn’t just use any existing fields, as I’d ‘duplicated rows’, so had use some logic to create further fields

Sales in Month Dual Axis

IF [Sales Diff]<>0 THEN SUM([Sales In Month]) END

Sales Next Value Dual Axis

IF [Sales Diff] <> 0 THEN [Sales Next Value] END

These were basically just giving me values for every other row (so de-duplicating the data I’d expanded). I could then plot these two measures on the same secondary axis, remove Measure Names from the Colour shelf, and by setting the mark type to line, I got what I needed. I applied logic to create a field to set the colour based on whether the change was the max or min or not, and also created a field to set the size again based on whether it was max/min or not.

I then applied the dual axis function, not forgetting to synchronise the axis, tidied up the formatting and added to a dashboard. My published version is here.

But you don’t want to do any of that 🙂

You want to have a read of Rody Zakovich‘s blog from October 2017 – Create a step chart in Tableau without data modelling, and just change the mark type to Area…all the vertical dates are the same, so no messing around to get the data for the labels/tooltips …. simples really… *sigh*

Happy vizzin!

Donna