Can you create a BCG growth share matrix with Sales and Orders?

Week 39 of #WorkoutWednesday was set by Lorna this week. Details of the challenge and a description of what a BCG Growth Share Matrix is here.

The main complexity of this challenge is the calculations, and more so with Lorna’s stipulation ‘no table calcs allowed’.

I started using Tableau before LoDs were introduced, so I tend to be much more comfortable with Table Calcs than I do with LoDs; I know there are many people who are the other way round.

So, let’s get started. In the interest of time (it’s now Sunday afternoon, and I’ve dipped into this challenge on & off since Wednesday), I’m going to focus on the calcs involved in achieving my solution, and not necessarily going into the detail of putting it all together on the Viz.

Sales Scatter

To build the Sales Scatter Plot, we need the following information to help get the %growth and %market share values :

  • The total sales across all the specified years per subcategory
  • The sales for the latest year per subcategory
  • The total sales for the previous years per subcategory
  • The overall total sales across all the specified years

The number of years to compare against is set by a parameter to be 2,3,or 4 years (No.of Years).

To limit the data just to these years, I created a Dates To Include as

YEAR([Order Date]) >= [Max Year in Dataset] – ([No of Years]-1)

Max Year in Dataset is another field that I could have simply hardcoded to 2019, but I decided to be more dynamic, and derive it using an LoD

YEAR({FIXED: MAX([Order Date])})

which gets the latest Order Date in the data and then retrieves the Year part of it, which happens to be 2019 in this instance. Dates To Include is then True if the year of the order date is greater than, or equal to, the maximum year less one less than the No of Years param.

Eg if No of Years = 2, then Dates to Include is True if the year of the order date >= 2019 – (2-1) = 2019-1 = 2018.

if No of Years = 4, then Dates to Include is True if the year of the order date >= 2019 – (4-1) = 2019-3= 2016.

To sense check the numbers I need, I built a tabular view by Sub Category adding Dates to Include = True to the filters shelf.

Total sales across all the specified years per subcategory

is simply achieved by adding SUM([Sales]) to the view.

Sales Latest Year

The sales for the latest year per subcategory is

IF YEAR([Order Date]) = [Max Year in Dataset] THEN [Sales] END

Sales Previous Years

The sales for the previous years per subcategory is then just the opposite

IF YEAR([Order Date]) <> [Max Year in Dataset] THEN [Sales] END

The value of this field ultimately changes as the No of Years parameter changes. If No of Years = 2, then this will contain the 2018 sales, if its 3, then it will contain the sum of the sales in 2017 & 2018 etc.

The Growth % is stated as being the change for the latest year in comparison to the average sales of the previous years. So next I need

Average Sales Previous Years

SUM([Sales Previous Years])/([No of Years]-1)

Note the [No of Years]-1, as when comparing across 2 years, I’m really only being asked to compare 2019 with the previous year, ie 2018.

Now I can find the Sales Growth %

Growth Sales

(SUM([Sales Latest Year])-[Average Sales Previous Years])/[Average Sales Previous Years]

The difference of the latest year compared to the average, as a proportion of the average.

Now we need the Market Share of Sales. This would usually be a pretty simple Percent of Total quick table calculation on SUM([Sales]), but I’ve been challenged to do this without table calcs. So an LoD is needed instead. I created

Sales All Years

{FIXED [Dates to Include]:SUM([Sales])}

which gives me my total sales across the specified years, at the ‘overall’ level.

From this I can calculate

Market Share Sales

SUM([Sales])/SUM([Sales All Years])

The table below gives the breakdown of all these fields at a sub-category level for 2 years

Next step is to categorise each row into Cash Cows, Dogs, Question Marks or Stars based on their % Growth & % Market Share values. Lorna stated that 7% should be considered the mid-point for % Market Share, but that the Growth mid point should be half of the maximum growth value.

From the table above, 64.8% (against appliances) is the maximum value in the set, so I need a new field against each row that outputs 32.4%. Once again, with Table Calcs I would just have used WINDOW_MAX to find the max and halve it. But instead I needed another LOD, this one a bit more complex…

Mid Sales Growth Constant (Sub Cat)

({EXCLUDE [Sub-Category]:MAX({INCLUDE [Sub-Category]: (SUM([Sales Latest Year])-[Average Sales Previous Years])/[Average Sales Previous Years]})})/2

I find INCLUDE and EXCLUDE LoDs a bit of a black art, and there may well have been another way to do this, but it was a bit of trial and error in the end, and so once it gave the value I was after, I stuck with it 🙂 I’ve tried to write some sentences to explain it better, but can’t 😦

So with the mid point defined, the categorisation is then

Sales Category (Sub Cat)

IF [Growth Sales]<=MIN([Mid Sales Growth Constant (Sub Cat)]) AND [Market Share Sales]<=MIN([Mid Market Share Constant]) THEN ‘Dogs’ ELSEIF [Market Share Sales]<=MIN([Mid Market Share Constant]) AND [Growth Sales]>MIN([Mid Sales Growth Constant (Sub Cat)]) THEN ‘Question Marks’
ELSEIF [Growth Sales]>MIN([Mid Sales Growth Constant (Sub Cat)]) AND [Market Share Sales]>MIN([Mid Market Share Constant]) THEN ‘Stars’
ELSE ‘Cash Cows’
END

This is all the information now needed to plot the Sales Scatter chart. Custom shapes were sourced from flaticon.com and I used Paint to colour and resize them appropriately, before saving them into a folder in the Shapes folder of the My Tableau Repository directory on my laptop.

Order Scatter

The steps for creating the Orders scatter are the same in principle. I need to find

  • The total orders across all the specified years per subcategory
  • The number of orders for the latest year per subcategory
  • The number of orders for the previous years per subcategory
  • The overall number of orders across all the specified years

However in doing this I got quite different numbers from Lorna, and its due to how you choose to count orders. Unlike a sales value, where the value is just attributed to a single line in the data set, an Order ID can span multiple lines, which means they can span sub-categories too. Eg imagine you have

Order A – contains 2 line of appliances, and 1 line of phones

Order B – contains 1 line of appliances.

How many orders are there? Answer = 2

How many order lines are there? Answer = 4

How many orders contain appliances? Answer = 2 (100%)

How many order lines contain appliances? Answer = 3 (75%)

How many orders contain phones? Answer 1 (50%)

How many order lines contain phones? Answer 1 (25%)

So when it comes to the ‘market share %’, should I be considering order lines (which means it’ll sum to 100%) or distinct orders which means it’ll sum to more than 100%.

I wrestled with this, and ultimately concluded, this challenge is purely for illustrative purposes, and to stick with my original assumption based on distinct orders (which I’d already calculated before finding the discrepancy). So my Market Share % won’t sum to 100%, and probably isn’t really a definition of Market Share, but I made peace with myself and moved on 🙂

# Orders

COUNTD([Order ID])

Orders Latest Year

COUNTD(IF YEAR([Order Date]) = [Max Year in Dataset] THEN [Order ID] END)

Orders Previous Years

COUNTD(IF YEAR([Order Date]) <> [Max Year in Dataset] THEN [Order ID] END)

Average Orders Previous Years

[#Orders Previous Years]/([No of Years]-1)

Growth Orders

([#Orders Latest Year]-[Average Orders Previous Years])/[Average Orders Previous Years]

Total Orders

{FIXED [Dates to Include] : COUNTD([Order ID])}

Market Share Orders

([# Orders])/MIN([Total Orders])

For 2 years this gave me

Once again the Market share mid point was defined to be a 7% constant, while the growth % mid point needed to be half the max value, which in this instance based on the above, was 1/2 of 51.2%. The same type of LOD was required which gave me

Mid Orders Growth Constant (Sub Cat)

({EXCLUDE [Sub-Category]:MAX({INCLUDE [Sub-Category]: ([#Orders Latest Year]-[Average Orders Previous Years])/[Average Orders Previous Years]})})/2

and from this I could the categorise with

Orders Category (Sub Cat)

IF [Growth Orders]<=MIN([Mid Orders Growth Constant (Sub Cat)]) AND [Market Share Orders]<=MIN([Mid Market Share Constant]) THEN ‘Dogs’ ELSEIF [Market Share Orders]<=MIN([Mid Market Share Constant]) AND [Growth Orders]>MIN([Mid Orders Growth Constant (Sub Cat)]) THEN ‘Question Marks’
ELSEIF [Growth Orders]>MIN([Mid Orders Growth Constant (Sub Cat)]) AND [Market Share Orders]>MIN([Mid Market Share Constant]) THEN ‘Stars’
ELSE ‘Cash Cows’
END

Top 20 Manufacturers Bar Chart

To create the bar chart, I created a new field Manufacturer-SubCat as

[Manufacturer] + ‘ (‘ + [Sub-Category] + ‘)’

I then set about creating the same data tables as I’ve included above, one for Sales and one for Orders.

The majority of the measures used were the same, but when it came to determining the growth mid-point, I found creating a separate field was simpler than trying to use a single field that worked across both Sub-Category and Manufacturer-SubCat. I followed the same methodology though:

Mid Sales Growth Constant (Manu Sub Cat)

({EXCLUDE [Manufacturer-SubCat]:MAX({INCLUDE [Manufacturer-SubCat]: (SUM([Sales Latest Year])-[Average Sales Previous Years])/[Average Sales Previous Years]})})/2

Mid Orders Growth Constant (Manu Sub Cat)

({EXCLUDE [Manufacturer-SubCat]:MAX({INCLUDE [Manufacturer-SubCat]: ([#Orders Latest Year]-[Average Orders Previous Years])/[Average Orders Previous Years]})})/2

I also found though that the 7% market share constant didn’t really give me the split, so I decided a 3% constant would be better to give me categorisations at this level into each bracket. As I had different comparison values, I needed new categorisation fields too.

Sales Category (Manu Sub Cat)

IF [Growth Sales]<=MIN([Mid Sales Growth Constant (Manu Sub Cat)]) AND [Market Share Sales]<=MIN([Mid Market Share Constant (Manu Sub Cat)]) THEN ‘Dogs’ ELSEIF [Market Share Sales]<=MIN([Mid Market Share Constant (Manu Sub Cat)]) AND [Growth Sales]>MIN([Mid Sales Growth Constant (Manu Sub Cat)]) THEN ‘Question Marks’
ELSEIF [Growth Sales]>MIN([Mid Sales Growth Constant (Manu Sub Cat)]) AND [Market Share Sales]>MIN([Mid Market Share Constant (Manu Sub Cat)]) THEN ‘Stars’
ELSE ‘Cash Cows’
END

Orders Category (Manu Sub Cat)

IF [Growth Orders]<=MIN([Mid Orders Growth Constant (Manu Sub)]) AND [Market Share Orders]<=MIN([Mid Market Share Constant (Manu Sub Cat)]) THEN ‘Dogs’ ELSEIF [Market Share Orders]<=MIN([Mid Market Share Constant (Manu Sub Cat)]) AND [Growth Orders]>MIN([Mid Orders Growth Constant (Manu Sub)]) THEN ‘Question Marks’
ELSEIF [Growth Orders]>MIN([Mid Orders Growth Constant (Manu Sub)]) AND [Market Share Orders]>MIN([Mid Market Share Constant (Manu Sub Cat)]) THEN ‘Stars’
ELSE ‘Cash Cows’
END


To Sort the Manufacturer-SubCategory data by Sales or Orders, I created a SORT BY parameter with values of Sales and Orders.

I then created

Sort Value

IF [Sort] = ‘Order’ THEN [# Orders] ELSE SUM([Sales]) END

On the Manufacturer-SubCat field, I then set the Sort by property to be

Manufacturer-SubCat was then also added to the Filters shelf, with the filter set to

And that covers the main calcs and complexities that went into my version/interpretation of this challenge. I hate it when I can’t get the numbers to match 😦 but ultimately now feel comfortable with what I did and I hope it might explain why some of you may have got differences too…

My version of the viz is here, and I’ve included additional detail on the tooltips too which should show the numbers that went into the % calcs for each mark.

Happy vizzin!

Donna

Can you build a sales comparison chart with performance indicators?

This weeks #WorkoutWednesday was set by the lovely Ann Jackson who often delivers some ‘challenging’ problems, all beautifully presented to fool you into thinking it’s going to be straightforward.

This week was no different. Time constraints meant I couldn’t dedicate the usual time to it on Wednesday, and then when I did get to it, I ended up with several false starts, that got very nearly there, but just fell at the final hurdle. I started again this evening, and finally got to something I’m happy with. So let’s get to it.

Ann’s challenge here, was to show a set of monthly KPI BANs (big-ass numbers) with a day by day comparison to the same time month in the previous year. From initial inspection, I figured that several table calculations were going to be needed. She also stated that we could use as many sheets as we liked. I ended up with 4 in my final viz; 1 displaying the BAN numbers, 1 displaying the trend chart, 1 displaying the red/green indicators to the left and 1 for the ‘days until month end’ subtitle.

Let’s start with the BAN numbers.

Ann wanted the chart to be dynamic, to be based as if you were looking at the data based on the month of ‘today’, and for it to change if you looked at it tomorrow. Since the Superstore dataset being used only contains data from 2015-2018, you can’t use the real ‘today’ date.

I authored my viz on 20th Sept 2019. I set up a table calculation to simulate today’s date as follows

Today

//simulate today to be based on the latest year in the dataset
MAKEDATE(
YEAR({FIXED:MAX([Order Date])}),
MONTH(TODAY()),
DAY(TODAY())
)

This produces a date of 20 Sept 2018 (or whatever date in 2018 you happen to be building your viz).

Since the data set is fixed, I could have simply hardcoded the year to 2018, but used the above FIXED LoD expression to be more generic. This LoD finds the year of the maximum date in the whole dataset.

I need to know the month to date sales for the month I’m in (in this case sales from the 1st to 20th September).

Sales MTD This Year

IF [Order Date]>=DATETRUNC(‘month’, [Today]) AND [Order Date]<= [Today] THEN [Sales] ELSE 0 END

This returns the Sales value for the records dated between 01 Sept 2018 and 20 Sept 2018.

This gives me my basic headline BAN number

For the BAN, I also need % change from previous year which requires

Today Last Year

DATEADD(‘year’, -1,[Today])

which returns 20 Sept 2017

Sales MTD Last Year

IF [Order Date]>=DATETRUNC(‘month’, [Today Last Year]) AND [Order Date]<= [Today Last Year] THEN [Sales] ELSE 0 END

which returns the Sales value for the records dated between 01 Sept 2017 and 20 Sept 2017.

% Change

(SUM([Sales MTD This Year]) – SUM([Sales MTD Last Year]))/Sum([Sales MTD Last Year])

This gives me the YoY difference, which I then custom formatted to

â–² 0%;â–¼ 0%

I could then set up my BAN sheet, by adding the relevant fields to the Text shelf, and formatting accordingly

For the KPI indicator, I required an additional field to set the colouring based on the value of %Change

Colour:BAN

IF [% Change] < 0 THEN ‘red’ ELSE ‘green’ END

I then created a very simple bar chart using an ‘old favourite’ MIN(1) to create an axis for a bar chart. The axis was fixed to end at 1, so the bar fills the space.

So that’s the straightforward bits… now onto the more challenging part – the trend chart.

This chart is showing the following:

  • The daily month to date sales for the current month up to ‘today’. This is the red/green line which is labelled with the total MTD sales as at today. At the point I’m writing this is the sales from 1-20 Sept 2018.
  • The daily month to date sales for the equivalent month last year, from the start of the month up to the same date last year (in my case 1-20 Sept 2017). This is the darker grey area chart up to the dotted ‘today’ reference line.
  • The daily month to date sales for the equivalent month last year from the start of the month up to the end of the month (in my case 1-30 Sept 2017). This is the dark + light grey area chart.

For this I knew I’d need a dual axis chart using an area chart for one and line chart for the other.

Given there’s a reference line on the axis indicating ‘Today’, I know I needed a continuous date axis, and chose to use the idea of baselining all the dates to the same year, and then filtering the viz just to use the dates in the current month (in this case September).

Date Aligned

//reset all data to pretend all against same year
MAKEDATE(YEAR([Today]),MONTH([Order Date]), DAY([Order Date]))

You can see from above regardless of the year of the actual Order Date, the re-aligned date field, has the same date.

Month To Include

MONTH([Order Date]) = MONTH(TODAY())

adding this to the filter shelf and setting to True filters to just the September dates in the data set.

Area Chart

The area chart is last year’s data. So far I’ve only built a Sales MTD – Last Year field, but plotting this as a running total table calc against Date Aligned (exact date), doesn’t give me what I need….

…as it flattens out after 20 Sept, as that is when I defined the Sales value to stop being counted. I need a Sales field that continues to grow until the end of the month. I also need a Sales field that gives me my running total up to 20 Sept, but then stops.

Sales Full Month Last Year

IF DATETRUNC(‘month’,[Order Date]) = DATETRUNC(‘month’,[Today Last Year]) THEN [Sales] ELSE 0 END

Adding this to the view and changing to use a Running Total quick table calc gives me what I need

Running Sum Sales MTD Last Year

IF ATTR([Date Aligned])<=ATTR([Today]) THEN
RUNNING_SUM(SUM([Sales MTD Last Year])) ELSE NULL END

This is basically only storing the running sum if the date is prior or on today.

I then changed these to sit on the same axis, rather than side by side, changed the mark type to Area, turned stack marks to off, and changed the colour to grey. As there are measures that overlap each other they give the appearance of a darker shade (sneaky huh?).

Line Chart

In a similar way described above, I can’t just use a running total of my existing Sales MTD – This Year field for the line, as that will also continue beyond 20 Sept. So I need

Running Sum Sales MTD This Year

IF ATTR([Date Aligned])<=ATTR([Today]) THEN
RUNNING_SUM(SUM([Sales MTD This Year])) ELSE NULL END

Added to the view as a dual axis (synchronised) and mark type of line I get

To change the colour of the line I can’t just use the field I used to make the KPI indicator above, as my data is now at a much more granular level, and it will return me multiple % changes. I just want the overall % change. I had to create more calculated fields for this :

Total Sales MTD

WINDOW_SUM(SUM([Sales MTD This Year]))

Total Sales MTD – Last Year

WINDOW_SUM(SUM([Sales MTD Last Year]))

% Total Change

([Total Sales MTD] -[Total Sales MTD – Last Yr]) / [Total Sales MTD – Last Yr]

Colour : Line

IF [% Total Change ] < 0 THEN ‘red’ ELSE ‘green’ END

You can obviously combine all these steps into one, but I find it easier to read this way. No doubt there’s also another way I could have achieved this.

So that’s the main trend chart complete you think (don’t forget to add Today as a reference line, and label the end of the line chart), until you examine the tooltips and notice things aren’t quite giving you what you need.

Against each mark, Ann wants us to show:

  • Sales MTD for this year, which rises until ‘today’, then remains the same
  • Sales MTD for previous year, which also rises until ‘today’, then remains the same
  • Sales MTD for previous year, which continues until the end of the month

With the measures I’ve got on the view, the MTD Sales up to today for this year and last year stop once I pass ‘today’.

But not to worry, this actually isn’t too hard; I just need to add Sales MTD This Year, Sales MTD Last Year and Sales Full Month Last Year to the tooltip and change all the be Running Total table calcs.

Apply relevant formatting to the tooltip, and gridlines etc, hide headers & axis and this chart is now good to go!

When I then added these 3 views to the dashboard, I placed them side by side in a horizontal container, and changed the padding on each view to 0 on all sides, so they all butted up against each other and the lines for each row appeared joined up.

The subtitle showing the days until the end of month is simply a sheet showing another calculated field Days Until End of Month in the text

DATEDIFF(‘day’,[Today],DATEADD(‘month’,1,DATETRUNC(‘month’,[Today])))

And so that’s about it I think… on reflection I wonder why I was being such a knob with my initial attempts where the table calcs I was using seemed to be getting out of hand…. we just all have those days I guess 🙂

My published viz is here

Happy vizzin!

Donna

Rounded Bar Charts : What % of Sales is from the East Region?

For Week 37 2019 of #WorkoutWednesday, Luke Stanke challenged us to create a rounded bar chart, displaying the % of sales by sub-category in a selected region as a proportion of the whole.

In the challenge (here), Luke hinted that this could be achieved by unioning the data set together, but I recalled doing something similar before, and so didn’t think I’d need this.

Before tackling the bars though, I needed to set up the data required, which was pretty straightforward. I needed a parameter to store the region, and then a couple of calculated fields

Region Sales

If [Region] = [Region Param] THEN [Sales] END

which stored the value of the sales for the region selected and

Region % of Sales

SUM([Region Sales]) / SUM ([Sales])

which stored the % value required.

Having set that up, I then went to my stored ‘go to’ references as a refresher:

Andy Kriebel‘s Tableau Tip : How to create rounded bars and Ryan Sleeper‘s : How to Make Rounded Bars & Scales in Tableau

I started with Andy’s technique first, which uses MIN(0) plotted on the same axis as Region % of Sales, displayed as a line chart with Measure Names on the Path shelf to force the points to join up. Increasing the size of the line produces rounded bars.

However, I then spent some time puzzling over how to get the 2nd bar, which needed to be on a second (dual) axis, and ideally also needed to be created in a similar technique using MIN(0) to MIN(1) instead. I tried a few things, but couldn’t crack it this way, so used the technique Ryan adopts which overlays bars and circles on a dual axis.

For this, MIN(0), MIN(1) and Region % of Sales are all added to the same axis, so Measure Values is plotted against Sub-Category and Measure Names added to colour

However if you look closely, you’ll see the values go beyond 1, as the marks are all stacked. We don’t want this, so we need to turn stacking off…


.. and then move Region % of Sales to the top of the list in Measure Values, to bring it to the front, and adjust the colours to suit

This gives our bars. Now, for the rounded bit.

Duplicate Measure Values, by clicking on the Measure Values pill in the columns shelf, holding down Ctrl and dragging your mouse to the right. This will create a copy of the Measure Values pill which you can drop next to it

Make the chart dual axis, synchronise the axis and change the mark types to be a bar for one set of Measure Values and a circle for the other. Adjust the sizing so the marks appear as one ‘lozenge’ shape

Labelling the end of the bar with the % values, needed a little bit of creativity. Just adding Region % of Sales to the label of the ‘bar’ marks card showed the label a bit too close to the display, even when right-aligned, as the circle mark was taking up the space

Labelling the circles instead would have meant a bit of trickery to only label the last circle.

So the quickest & easiest thing to do, was to simply create a label field to make the spacing work :

LABEL : % Sales

” ” + STR(ROUND([Region % of Sales] * 100,0)) + “%”

Adding this to the Label shelf on the bar marks gave me the desired result.

I just then needed to tidy up the formatting (fix axis start, hide axis etc) and add to the dashboard.

My version of the challenge is here, but do check out others. There will be multiple ways of achieving the same thing!

Happy vizzin!

Donna

Custom Axis with a tracking reference line

#WorkoutWednesday 2019 Week 36 was set by Curtis Harris, and for me was a nice gentle challenge to finish up my post holiday catch up.

The full challenge is posted here, with the main requirement being to create a ‘custom axis’, which on hover, showed intersecting reference lines on the line chart, as shown below.

This challenge involves the use of 2 views (one for the axis and one for the line chart) and then Set Actions to drive the interactivity.

To build this, I created a calculated field to essentially store the month/year of each record

Month Order Date

DATE(DATETRUNC(‘month’,[Order Date]))

This is then custom formatted to m-yyyy to get the required presentation

The line chart then simply plots Month Order Date (set to continuous, exact date) against SUM([Sales])

Using Month Order Date, I then created a set, Selected Date Set, and just selected one of the values listed. It is this set that will get changed via the use of Set Actions later.

Selected Date

IF [Selected Date Set] THEN [Month Order Date] END

This field stores the date that has been selected in the set, and can then be added to the Detail shelf on the line chart (as a continuous, attribute), so it can be used as a reference line on the date axis.

Sales Ref

IF [Month Order Date]=[Selected Date] THEN [Sales] END

This field stores the value of the sales for the date selected in the set, and can also be added to the Detail shelf on the line chart, so it can be used as a reference line on the Sales axis.

Custom Axis

The custom axis simply plots Month Order Date on the columns with mark type of circle.

However the requirement states that if the month is the last month or the start of a quarter, the text should display rather than a dot. To do this I needed…

Max Date Month

DATE(DATETRUNC(‘month’,{FIXED : MAX([Order Date])}))

The month associated to the latest Order Date

Order Date Display

IF ([Month Order Date] = DATETRUNC(‘quarter’,[Month Order Date]))
OR ([Month Order Date] = [Max Date Month])
THEN
[Month Order Date]
END

If the month is also the same as the quarter, or the month is the last month, return the month.

Putting this field onto the Label shelf, centre aligning and allowing labels to overlap marks, almost gives the required affect….


I don’t want the dots showing through when the labels exist, so I created

Colour : Circle

IF ISNULL([Order Date Display]) THEN ‘teal’ ELSE ‘white’ END

and added this to the Colour shelf,and adjusted the colours to suit.

Now it’s just a case of putting the sheets together onto a dashboard, so we can then invoke the Set Action, which is sourced from the Custom Axis sheet on Hover, affects the Selected Date Set, and empties the set when the mouse is moved off (so causing the reference lines to disappear).

That’s the core features this challenge is testing. There’s a few other bits and bobs listed, which I haven’t gone into, so do let me know if there’s something you’re struggling with that I have mentioned.

My published viz is here.

Happy vizzin!

Donna

Drill Up and Down with Parameter Actions

The final guest #WorkoutWednesday challenge for August was set by zen master, ambassador and fellow #WorkoutWednesday enthusiast Rosairo Gauna.

The challenge is here and built on a challenge set earlier in the year by Luke Stanke which involved Set Actions. Rosario was challenging us to use Parameter Actions instead to navigate the hierarchy (they weren’t around when Luke set his challenge), as well as not duplicate the data set.

I have to say, this really stumped me. Huge kudos to Rosario for managing to figure out how to use Parameter Actions in this way.

I had to reference Rosario’s own blog post to get to the bottom of this, so I won’t be documenting this myself.

This is going to be one of those challenges that demonstrates a technique that if I ever want to use again, I’ll be referencing this post (well Rosario’s post via this post), rather than remembering all the steps/calculations… a bit like I do when I have a need to build a trellis chart or if I ever had to build a radial chart again – I have my ‘go to’ reminder posts.

My solution is here.

Happy vizzin!

Donna

Can you build a Top N Bar Chart on a Single Worksheet?

So back from my holibobs and catching up on the #WorkoutWednesday challenges I’ve missed while I’ve been sunning myself on a Greek beach.

First up is another guest challenge posted by Jeffrey Shaffer who asked us to create a Top N bar chart of the quantity per manufacturer per region on a single viz. The main twist with this is that a manufacturer could be ranked in different locations for each region. The full challenge is here.

Determine Manufacturer

In the 2019.1 Superstore Sales data set I had on my machine, there is no Manufacturer field. I couldn’t see one in the link to the dataset included on the blog page either. Perhaps I was missing something, but given I was on catch up, I chose not to find out the real reason. Instead I decided to derive my own Manufacturer field by just taking the first word of the Product Name. It meant my quantity figures wouldn’t match up, but the principles behind the challenge wouldn’t be affected.

MID([Product Name],1, FINDNTH([Product Name],’ ‘,1)-1)

Note – I’ve watched Ann’s solution (published on the challenge page), since I published my version, and Ann just lifts a pre-defined Manufacturer group onto her viz, which also already has an ‘Other’ manufacturer. My calculated field above has no such value. Ann’s solution is therefore a bit more straightforward than the work I had to put in to determine ‘Other’.

I interpreted ‘Other’ to be the total quantity of all the manufacturers not in the top n. But of course the top n had to take into account the Region, so it was possible that a manufacturer listed explicitly in the top n for the Central region say, might be included within the ‘other’ pot for another region.

To work this out, I created a quantity field per region ie

Central – Qty

IF [Region] = ‘Central’ THEN [Quantity] END

East – Qty

IF [Region] = ‘East’ THEN [Quantity] END

South – Qty

IF [Region] = ‘South’ THEN [Quantity] END

West – Qty

IF [Region] = ‘West’ THEN [Quantity] END

I then created a set per region to store the top n manufacturers based on the Quantity per region

The set was based on the Manufacturer field, and limited to the Top n Manufacturers parameter I set up, associated to the relevant [Region] – Qty field created above.

I had 4 sets in the end : Top Central, Top East, Top South, Top West

Note – It is entirely possible with this set up, that each set could contain a completely different set of manufacturers, so up to top n X 4 manufacturers in total

So having got a set which stored my top n per region, I needed to get the ‘Other’ grouping. To do this I created a new field

Manufacturer Category

IF [Region] = ‘Central’ AND [Top Central] THEN [Manufacturer]
ELSEIF [Region] = ‘East’ AND [Top East] THEN [Manufacturer]
ELSEIF [Region] = ‘South’ AND [Top South] THEN [Manufacturer]
ELSEIF [Region] = ‘West’ AND [Top West] THEN [Manufacturer]
ELSE ‘Other’
END

With the Top n parameter set to 10, this gives me

There are 12 rows listed, but only 11 values per region (the top 10 + other). You can see here that Wilson is within the top 10 for East & West, but not Central and South, whilst Hon is in top 10 for Central, East & South but not West. This is why there are 12 rows. If I changed the Top n parameter to 15, I’d get 20 rows listed, as there are 19 manufacturers across the top 15 for all the regions + other.

To get the sorting, we need to introduce an old friend :

Index

INDEX().

Adding this as a discrete (blue) pill onto the rows, moving the Manufacturer Category onto Text, and setting the Table Calculation Properties of the Index field to …

  • Compute over both Region and Manufacturer Category (with Region listed first)
  • To restart every Region
  • To sort by SUM of Quantity descending

… I start to get what I need

But I’ve still got 1 too many rows, I need to deal with the ‘Other’ pot. If my top n parameter is set to 10, then I essentially want to be able to show rows 1-10 above if the Include Other parameter is true, or show rows 2-11 if the Include Other parameter is false (Include Other is a boolean parameter I created).

FILTER – Other

NOT([Include Other]) AND [Manfacturer Category] = ‘Other’

Adding this to the Filter shelf and setting to False, basically hides the Manufacturer Category = Other when Include Other = No, or shows Manufacturer Category = Other when Include Other = Yes

So when Include Other = No, I have 10 rows indexed 1-10, and when Include Other = Yes, I have 11 rows indexed 1-11.

I only want the number of rows based on my Top N parameter, so I created an additional field

FILTER – Index

[Index] <= [Top n Manufacturers]

which is added to the Filters shelf and set to true, so only the rows that are less than or equal to the parameter value are displayed.

So we’re nearly there, except Jeff added an additional curveball…

Display Manufacturer & Rank on hover

To tackle this, I needed 3 more calculated field and a set, as we’re going to need Set Actions to handle this.

Index Rank

‘(#’ + STR([Index]) + ‘)’

this simply formats the index value into the format (#1)

Manufacturer + Rank

ATTR([Manufacturer Category]) + ‘ ‘ + [Index Rank]

this concatenates the two fields together – note the ATTR() is needed as Index Rank references the aggregated table calculation of INDEX().

Highlighted Manufacturer

This is a set created by simply right-clicking on Manufacturer Category and selecting Create -> Set, then choosing an arbitrary option (eg Avery). This will be changed via Set Actions later…

LABEL: Manufacturer

IF ATTR([Highlighted Manufacturer]) THEN ([Manufacturer + Rank]) ELSE ATTR([Manfacturer Category]) END

This is basically displaying the manufacturer with the rank if the manufacturer is in the set.

Adding LABEL:Manufacturer into our data table above, you can see that the Avery manufacturer is displayed with the rank, but the others aren’t

To make the value of the set change on hover, the view needs to be added onto a dashboard, and a Set Action created.

Create a dashboard sheet, and add the view. Then select Dashboard -> Actions -> Add Action -> Change Set Values

Set the properties in the dialog as below – the action to run on Hover, the Target Set to the be the Highlighted Manufacturer created above, and remove all set values when you ‘hover off’

Hovering on Eldon, for example, you can see the manufacturer label is changing in the relative positions

So that’s all the building blocks now in place, the viz just needs to be created, which I did by duplicating my table version above, and moving the pills around.

I basically created a synchronised dual axis with Index on rows, Region, MIN(0), SUM(Quantity) on columns. MIN(0) was set to be a Gantt bar and labelled with the LABEL:Manufacturer field set to be left aligned. The SUM(Quantity) field is labelled with the Quantity. Both labels were set to match the mark colour, which was based on Region using the Nuriel Stone palette specified. Headers were hidden and various formatting applied to the grid lines/rows/columns etc.

My version of the viz is here.

I’m off to ponder about where that Manufacturer field came from, as I think it would have all been a lot simpler if it already existed… hmmmm…

Happy vizzin!

Donna

Corey’s Table Challenge

I’m going to attempt to be really brief this week, as I have very little time to get this drafted before I head off on my holibobs for a couple of weeks (so I’ll be off grid for the next couple of #WorkoutWednesdays but will attempt to catch up when I return).

This week’s challenge was set by IronViz 2018 finalist, Corey Jones, and focuses on formatting a table – full challenge details here.

Presenting data in Tableau in an ‘out of the box’ table chart type, does not allow row level formatting in the manner described in this challenge (Tableau is not Excel you know :-))

So you have to be creative, and this is what this challenge is all about.

Corey gives the clue that no more than 3 sheets are allowed, hinting that the above presentation is not all on one view, but at least 2 carefully arranged side by side on a dashboard.

I had 3 views in my dashboard – the shaded table of sub-category and measures, the stacked bar of % of region sales vs rest, and the title.

Now I ended up with 3 including the title, as I used the [Region] field as a quick filter throughout my build, and so to enable the title to change dynamically I needed to create a sheet so I could reference that field. Having looked at Corey’s solution later on, he chose to use a parameter to drive the Region selection. Parameters can be referenced in dashboard titles, so in Corey’s solution, the Sub Category list on the left hand side is also a view, whereas I combined with the measures. The only thing I couldn’t achieve with the combination was being able to left-align the Sub-Category heading, while leaving the other headings (Sales, Profit etc) centre-aligned.

The formatted table

The magic ingredient in building formatted tables is using an axis with labels. In this instance the field MIN(1) is your best friend.

  1. Add [Sub-Category] to rows
  2. Type MIN(1) into columns
  3. Set mark type to bar
  4. Add SUM([Sales]) to the Text shelf and right align
  5. Fix the axis from to start from 0 and ends at 1

and you get…

… something that looks pretty tabular, but you now have more flexibility with formatting styles.

This concept forms the basis for the whole table, but we need to step back quickly, as we need to set up some calculated fields.

Sales, Profit & Quantity are already defined in the dataset, but we also need

Profit Ratio

SUM([Profit]) / SUM(Sales)] formatted as a percentage to 0 decimal places

Total Sales for Year & Category

{FIXED YEAR([Order Date]), [Sub-Category]: SUM([Sales])}

This is a Level of Detail (LoD) calculation that stores the total sales for the year (2018) and subcategory, regardless of the [Region] filter being selected

% Sales for Selected Region

SUM([Sales])/SUM([Total Sales for Year & Category]) formatted as a percentage to 0 decimal places

Note Technically this field will return % sales if any filter is applied, not just region

Highlight

[% Sales for Selected Region] > ([Highlight Threshold]/100)

This returns true where the % sales is more than the threshold selected by the user by entering a value into the [Highlight Threshold] parameter

The [Highlight] field can now be used to format the ‘table’ by adding to the Colour Shelf, setting true to be grey and false to be white.

Making the text bold

My solution differs from Corey’s. I chose to create the following calculated fields

LABEL : Sales Bold

IF [Highlight] THEN SUM([Sales]) END

This returns a value only against the rows where [Highlight] is true.

LABEL: Sales Normal

IF NOT([Highlight]) THEN SUM([Sales]) END

This returns a value only against the rows where [Highlight] is false.

Both these fields are placed side by side on the Label shelf, and with the ‘bold’ field formatted to be bold. As no row will ever have a value set in both fields at the same time, only a single value is displayed

Showing the column title at the top

I did this by creating an additional axis MIN(0), and applying dual axis which I synchronised. I removed all the pills from the MIN(0) marks card, and the [Measure Names] that had been automatically added to the colour shelf on the MIN(1) marks card.

This gave me an axis at the top which I could then edit – I changed the title to Sales, and set the axis tick marks to None for both major & minor ticks marks. On the bottom axis, I also had to edit, in this case setting the title to nothing as well as setting the axis ticks to none.

The height of the axis was also adjusted to make it appear closer and the font formatted accordingly. The Sub-Category was also set to not ‘Show Header’.

To create the other columns, the process is repeated with further dual axis MIN(1) & MIN(0) fields added to the columns, with further bold/normal Label fields also created.

The More Fields to the bottom left lists all the MIN(1) & MIN(0) cards in order.

To get the ‘column’ line to appear between the sub-category and sales columns, I set the transparency of the MIN(0) card associated to the Sales measure back to 100%, and it gave the appearance of a solid line.

Stacked Bar Chart

This is also a dual axis chart, which plots the % Sales for Selected Region field alongside our new friend MIN(1), where the MIN(1) axis is set to just over 1.

  • The Measure Names colours are set to grey (% Sales) and white (MIN(1))
  • The MIN(1) axis is set to the back
  • % Sales for Selected Region is the label on the associated card, and left aligned
  • A new field which is basically 1 – [ % Sales for Selected Region] is added to the label of the MIN(1) card and right aligned.
  • A border is applied to the bars via the Colour shelf

To create the ‘column heading’ a new field is required

LABEL: Bar

[Region] + ‘ vs. All Other Regions’

This is then added to the column shelf to get

so when the Region filter changes, the title changes too.

Both these sheets are then positioned carefully side by side on a dashboard, setting the Fit to Entire View. The padding of the table sheet is adjusted so it has 0 outer padding on the right, whilst the padding of the stacked bar is adjusted so it has 0 outer padding on the left.

And finally, the Region filter is added to the dashboard, set to floating positioned just near to the heading of the stacked bar. It’s size is also adjusted so only the arrow part remains.

There’s a fair few other little formatting bits and bobs to get the text right, remove columns etc, but these are hopefully settings you already know about. Feel free to contact me if you can’t figure something out.

My solution is available here.

Happy vizzin!

I’m off for some sunshine 🙂

Donna

Can you create step area charts?

Continuing on from last year, August is #WorkoutWednesday takeover month, with challenges posted by invited guests. First up, for Week 32 was this challenge from Klaus Schulte to create a step area chart WITHOUT the use of data densification (ie duplicating the data set).

WARNING! This blog describes my approach to solving the challenge which I can only describe as trying to find your way through a maze for the first time – you take a few wrong turns, double back on yourself several times, but eventually get to the the centre, feeling quite satisfied. However, you later find, that if only you’d turned right rather than left to start with, there would have been a much quicker direct route to get to the end. My approach to this challenge ended up being quite cumbersome due to the path I started down. Klaus’ solution is much simpler 🙂

There were minimal clues in the challenge requirements, apart from the fact that table calcs were going to be required and, as stated, you couldn’t duplicate the data set. So I started by viewing Klaus’ solution on Tableau Public to see what I could glean by mousing over the chart. From doing this I understood that

  1. The stepped area chart had to be managed using a single axis, since the dual axis would be needed for the ‘bonus’, highlighting the max & min drop.
  2. The stepped area chart was indeed an ‘area’ mark type (and not carefully disguised bars or similar), since when you hovered over a mark, an indicator appeared at both the top and bottom of the chart, and if you selected anywhere, the whole section was highlighted

Stepped Area Chart

So where to start…. I decided to watch the relevant section of the TC Europe talk Klaus referred to in the challenge intro, in which he showed a way of doing this by duplicating the data, to see if there were any clues there. By doing this, I understood that

  1. The difference between an area & line chart was that with an area chart, there is no Path shelf, so you can’t control how the marks will be joined up. An Area chart will just join the marks from left to right. This was already familiar, but the reminder was useful to jog the memory.
  2. Whilst Klaus’ solution was using a duplicated data set, he incorporated a method of offsetting the point to plot forward (+0.499) or backward (-0.499) depending on what data set was being referenced. This reminded me of something I’d done in a previous #WorkoutWednesday challenge a long time ago, where I needed to display a side by side bar chart

So I figured I needed to find a way to ‘expand’ the data to give me 2 rows per month by using some offset logic, which I did by creating new fields

Month Position To Plot

DATE(IF MONTH([Order Date]) = 12 THEN DATETRUNC(‘month’,[Order Date])
ELSE
IF DAY([Order Date]) <=15 THEN DATETRUNC(‘month’,[Order Date]) ELSE DATEADD(‘month’,1,DATETRUNC(‘month’,[Order Date]))-1 END
END)

This sets the date to be 1st Dec if the Order Date was December (as I didn’t want an extra point), otherwise, if the Order Date was before 15th of the month, the date would be the 1st day of the month, otherwise it would be the last day of the month.

Sales In Month

{FIXED [Category], DATETRUNC(‘month’, [Order Date]): SUM([Sales])}

I needed this LoD to store the total value of the monthly sales. Without it, due to my Month Position To Plot field above, the Sales would only show the partial sales in the month.

This made my data look like

Plotting this on a chart, setting the [Month Position to Plot] to be a continuous (green) pill set to the Day level, and choosing the Area mark type, and I got a stepped area chart 🙂 The days where the marks are plotted end up being so close together, that the vertical line looks straight to the eye.

Side notes

  1. I initially created a numeric field based on the month’s number offset by +/- 4.99 to plot, instead of the date field [Month Position To Plot], assuming I might be able to use custom formatting to show the date on the axis. When I eventually found this wasn’t possible, I confirmed the axis on Klaus’ solution was actually associated to the main view, and not part of another view carefully placed on the dashboard. I found this by clicking on the small + icon that appears when hovering by the date. This isn’t a feature you’re encouraged to use, as it makes the view go all bonkers 🙂
  2. I used the alias feature to display the Category field in uppercase, as there were only 3 values, so it wasn’t overly cumbersome to type them in.
  3. The date axis was fixed to a suitable start & end to allow a decent amount of spacing between each category. I also set the axis tick months to start on 01 Jan 18 and recur every 11 months, so only Jan & Dec would display.

So this didn’t seem too bad – I’d got a step area chart using a single axis – everything was all good. On to the next bit…

Tooltips

The tooltip needed to show the category, the month/year, the sales value and the value of the change from the previous month.

Because of the way I’d gone about structuring my data, to work out the change I actually needed to get the value of the next row, and find the difference between the two. This may sound counter-intuitive since we’re asked to find the change from the previous month, but it was necessary due to that path I had chosen to build my area chart 🙂

Sales Next Value

LOOKUP(SUM([Sales In Month]),1)

This table calculation needed to be applied for each category (ie in the Edit Table Calculation window, all dimensions except the Category needed to be selected in the Compute Using section). The difference was just then comparing the two fields

Sales Diff

[Sales Next Value] – SUM([Sales In Month])

But whilst this was getting me the numbers I needed, I was stumbling into another issue.

On Klaus’ viz, the point at the top and bottom of each vertical line, represented the same month, whereas the point at the start and end of each horizontal line was the same month for me 😦

So I had to 1) build another calculated field to shift the months so I could label the points correctly, and 2) since every row didn’t show the Sales Difference (the rows with $0 above), I need another table calc to get the info relating to the previous row. This gave me

Labels

So having sorted the tooltips, my focus now changed to the labels that were displayed which were

  • the points which had the biggest rise (the maximum change) or biggest fall (the minimum change) per category
  • the points at the start and end of each year (if they didn’t correspond to being the max/min change above).

You can use WINDOW_MAX() to find the maximum difference for each category, and then compare this to the actual difference to find the matching row. However, yet again, due to my set up, the matching row wasn’t on the right point, I had to shift it again using a LOOKUP() function to find the previous row.

I had to repeat the exercise using WINDOW_MIN to find the minimum difference, and then apply similar logic using FIRST() and LAST() to identify the start and end points. It was all very convoluted and involved multiple table calculations, but it produced the output I needed. (I’ve included my data table sheet in my published workbook, for reference).

Highlighting biggest rise and smallest drop

So now onto the 2nd part of the challenge. As I mentioned at the start, I knew a dual axis would be involved to achieve the lines.

My initial thought was to use the ‘waterfall’ type technique involving a gantt mark type, sized by the difference.

However, I spotted that the blue and red vertical lines were thicker than the grey vertical lines which were also visible, which meant gantt wouldn’t be an option, as I couldn’t use the Size shelf for two purposes.

I recalled the fact you can place multiple measures on the same axis and use the line to join them up vertically (I did this creating a candlestick chart for #WorkoutWednesday2019 week 13, solution here), so started trying this out. But in this case I couldn’t just use any existing fields, as I’d ‘duplicated rows’, so had use some logic to create further fields

Sales in Month Dual Axis

IF [Sales Diff]<>0 THEN SUM([Sales In Month]) END

Sales Next Value Dual Axis

IF [Sales Diff] <> 0 THEN [Sales Next Value] END

These were basically just giving me values for every other row (so de-duplicating the data I’d expanded). I could then plot these two measures on the same secondary axis, remove Measure Names from the Colour shelf, and by setting the mark type to line, I got what I needed. I applied logic to create a field to set the colour based on whether the change was the max or min or not, and also created a field to set the size again based on whether it was max/min or not.

I then applied the dual axis function, not forgetting to synchronise the axis, tidied up the formatting and added to a dashboard. My published version is here.

But you don’t want to do any of that 🙂

You want to have a read of Rody Zakovich‘s blog from October 2017 – Create a step chart in Tableau without data modelling, and just change the mark type to Area…all the vertical dates are the same, so no messing around to get the data for the labels/tooltips …. simples really… *sigh*

Happy vizzin!

Donna

Can you create a hub and spoke map?

This week’s #WorkoutWednesday challange was a mash up with my other favourite weekly challenge, #PreppinData, the idea being to manipulate the data for the challenge in Tableau Prep, then create the viz in Tableau with the output.

I don’t plan to document this challenge to the nth degree, but just pull out the bits that I think are of most relevance… always happy for anyone to reach out though if there’s something I don’t mention that they’re not clear on 🙂

Anyway, so onto the challenge. Let’s start with the data prep….

#PreppinData Week 25

If you haven’t yet discovered it, #PreppinData is a set of weekly challenges to help you learn Tableau Prep. Carl Allchin and Jonathan Allenby set this up inspired by the success of #WorkoutWednesday. I had barely used the tool before they set the challenges up, and it’s definitely a great resource to get you started.

So this week’s challenge is defined here, and consisted of the following input

  • a file with 1 row per concert involving either Ed Sheeran or Ben Howard (ie the artists) including concert location
  • a file with lat/long data for each location
  • a file with the artists’ hometowns and lat/long

The basic requirement was to

  1. Join the concert/location data together
  2. Find the fellow artists who performed at the concert along with Ben and Ed, and expand the data so there was 1 row per concert per fellow artist as well as row for the artist themselves.
  3. Remove some ‘obvious’ duplicate rows
  4. Add in the home location for the artist

1.Join concert/location data

The location data contained Latitude and Longitude data in a single field LongLats in the format below

LocationLongLats
Rochester Hills, MI42.680396, -83.133654
Camden, NJ39.925640, -75.121151

So before combining with the concert data, this needed to be split up, which I did using the Split Values -> Automatic Split function

This automatically created 2 fields, splitting the data by the comma (,) which I renamed Long & Lat.

Warning : I don’t use mapping data that often and am therefore not overly familiar with what a typical lat/long number looks like. Given the field was originally named LongLats, I assumed the first number was Longitude and the second Latitude. This was an incorrect assumption, which I only discovered later when I displayed in Tableau to find the marks spread up and down the viz rather than across. So I came back and rectified my flow with the correct labels.

I then joined the concert data with the location data using a ‘left outer’ JOIN, to ensure all concert data was retained, even if a match to location couldn’t be found.

2.Find fellow artists and expand 1 row per concert & fellow artist

This bit took some thinking.

The Concert field contained a string of text in varying forms, that was either

  • blank
  • contained the name of the artist on their own, eg Ben Howard
  • contained the name of the concert eg Multiply Tour
  • contained a string of ‘fellow artists’, where each artist was separated by a forward slash ( / ). The number of artists listed could vary considerably
    1. eg Ed Sheeran / Taylor Swift or
    2. Mumford & Sons / Felice Brothers / Ben Howard etc

The instruction was that if the Concert field didn’t contain a slash ( / ), then there were no fellow artists.

I knew that ultimately I was going to need to split the Concert field up, pivot the data to expand 1 row into 1 or more rows, and then delete the unnecessary rows which had no fellow artist data. This is because when a field is split, you end up with however many columns are necessary for the longest string. So for the examples above, 3 split fields would be created due to example 2 above, but example 1 would also have 3 fields, with the 3rd field being blank. When this data is then pivoted, there would be 3 rows per original row, so I’d want to delete the rows where fellow artist was then blank. HOWEVER, if there had been no fellow artist in the first place (ie the Concert field did not contain a / ), I also needed to ensure I wouldn’t inadvertently be deleting rows of data I actually wanted.

I approached this in the following way (by all means there may be something more straightforward, but this is just how my brain was working at the time 🙂 )

  1. Created the Fellow Artist field, and set it to <blank string> ( ” ) if the Concert field was NULL or did not contain a slash ( / ). Otherwise I stored the Concert data.
    • This involved the use of FIND() and ISNULL() functions
  2. Removed the name of the artist if he also appeared in the listing (as I wasn’t sure it was always there).
    • This involved the use of REPLACE() to replace the artist with <blank string>
    • TIP : To manipulate a single field in multiple steps, you don’t need to keep creating a field with a new name each time, and then remove/rename. You just name the calculated field the same as the one you’re working with (eg Fellow Artist in this instance), and you can refer to the field itself in the calculation. This isn’t something you can do in Tableau Desktop, so might not be obvious if you’re familiar with that.
  3. Added the name of the artist back in to every row, including the ones where there was no artist listed. This may seem odd, given I’d just removed it, but as I said above I couldn’t be sure that the artist themselves was always present in the original list, and I wanted to ensure the Fellow Artist field in every row contained the name of the artist once only. This would ensure that when it came to deleting rows later, I wouldn’t remove rows I still needed (ie those relating to Concerts with no fellow artists).
    • If the Fellow Artist did not contain ‘ / ‘ , then store Artist else add Artist/ to the front of the Fellow Artist field.
  4. I then used the Custom Split function to split all fields separated by /

This generated several SPLIT calculated fields based on the maximum number of ‘ / ‘ found in the string. NOTE – if my input data then changed and there was an even longer string containing more / , then I’d have to manually add further calculated fields to cope with this.

I could now add a PIVOT step to transpose the data from 1 row into multiple, ensuring I was pivoting columns to rows and adding all my ‘split’ fields as the columns to pivot

This generated 15 rows for every original row, with the automatically named field Fellow Artist Split containing either blank or the name of a single artist. Adding a step to then filter all the rows where Fellow Artist Split <> ” then removed all the unnecessary rows in the data.

3. Remove some obvious duplicate rows

Determining what the authors had defined a ‘duplicate’ took a couple of attempts. They had stated in the requirements that the provided Concert ID field could not be relied upon, but that was the only clue. In my mind the Date, Artist , Fellow Artist Split and Location was probably enough, but although when eyeballing the data, this seemed to be a sensible grouping, this didn’t yield the expected number of records. In the end it seemed Date, Artist, Fellow Artist Split, Concert, Location,Venue were the fields deemed to define uniqueness.

So I created an AGGREGATE step, grouping the data by the fields listed above, and then I stored the minimum value of Concert ID against each summarised row. I then joined this aggregation back to the step before the aggregation joining on the Concert ID fields. This resulted in the expected number of records stated in the challenge.

I then applied a few tidy up steps – removing unwanted columns, renaming columns as appropriate and most importantly (to match the requirements), setting the Fellow Artist field to NULL if it only contained the name of the Artist himself.

4. Add in Home Location of Artist

This was a simple step to Inner JOIN to the Artist’s hometown location file, using the Artist field as the linking field.

I then saved the output. Complete flow below.

#WorkoutWednesday Week 31 – hub and spoke chart

So now the data was prepped, I could start to tackle Lorna’s challenge to visualise Ed Sheeran’s & Ben Howard’s tour data. The full challenge is defined here, with the core requirements of

  1. Group Locations into Regions
  2. Create Hub & Spoke maps
  3. Create a Word Cloud of Fellow Artists

In addition to the above, there are also charts summarising the number of concerts per artist per region, and a cumulative count over time. These are pretty standard / typical outputs, so I’m not going to describe how to build these, but you may notice that the numbers I present differ from those on Lorna’s solution.

This is due to the way I decided to calculate the Number of Concerts (# Concerts). As part of the #PreppinData challenge, I still had a Concert ID in my output. Due to the way I approached the challenge, the Concert ID ended up being a unique value per Concert in my output, so I chose to use this as my way to count the # Concerts (using COUNTD() function).

I did check Lorna’s solution to see what she did, but decided to proceed with my definition, knowing that ultimately it didn’t really make that much difference to the visualisations required.

1. Group Locations into Regions

I did this simply by manually selecting marks on a map.

Simply double clicking on my Lat & Long fields, then unchecking Aggregate Measures on the Analysis menu automatically presented me with a map of the world. I then selected the marks in Europe with the intention to use the automatic Group function, but it wasn’t available.

This puzzled me momentarily, but then I realised I needed a dimension (blue pill) in the view, so I added Location to the Detail shelf, and now I had the Group function visible. I repeated the process with the other marks to create multiple groupings which I renamed appropriately.

2. Create Hub & Spoke Maps

This was the primary focus of this week’s challenge, providing the ability to try out some very new features – the MAKEPOINT() and MAKELINE() functions which have only been released in v2019.2.

Being someone who doesn’t work with map data very much at all, Lorna quite helpfully added a pointer to this Tableau blog which introduced the new functions and how to use them.

So following the instructions, I created a field called Route as follows:

MAKELINE(MAKEPOINT([Hometown Latitude],[Hometown Longitude]), MAKEPOINT([Lat],[Long]))

MAKELINE() requires a start & end point. In the above the start is the point associated to the artist’s hometown, defined by the first MAKEPOINT() function, and the end is the point associated to the location of the concert, defined by the second MAKEPOINT() function.

Simply double-clicking on this new field, automatically generated the display below

This then needed to be split by Artist, filtered & coloured by Region, sized by # Concerts, and have Venue and Concert added to the Detail shelf. The Tooltip was then added and the sizing adjusted to get a better differentiation.

To get the circles at each destination, a dual axis map is required, where the second axis is just showing the point related to the Concert location. This is basically the 2nd part of the MAKELINE function above, so I had to create a new field Destination which is

MAKEPOINT([Lat],[Long])

In hindsight, to make things clearer, I probably should have created a Start field representing the starting point, and a Destination field representing the end point, and then adjusted the Route field to reference these…

Anyway, to get the circle, I did the following:

  • Duplicated the Longitude(generated) field on Columns
  • On the second Marks card, replaced the Route field with the Destination field, and removed the # Concerts from the Size shelf
  • Set the circle to have a white border (via the Colour shelf properties)
  • Made the chart dual axis
  • Set the background style of the map to Dark (via the Map -> Map Layers menu)
  • Unticked all the map option features to prevent the pan/zoom/selector toolbar from showing when you hover over the map (via the Map -> Map Options menu)

NOTE – I think there may be a bug in v2019.2.2 as whilst the map toolbar stopped being visible on the worksheet view, when I later added the map to the dashboard, it was visible again 😦 Testing the property in v2018.3.3, which is the version I use for work, I didn’t get this behaviour.

I tidied up by applying various other formatting features to meet the required display, then duplicated the sheet and adjusted the filters for other regions.

3. Create a Word Cloud of Fellow Artists

I’ve never actually had a need to create a word cloud before, so figured it would be useful to document, just as much as for my future reference 🙂

  1. Add Fellow Artist to Text shelf
  2. Add Fellow Artist to Size shelf, then change to be measure Count
  3. At this point, the chart will change to be a treemap style, so you need to change the Mark type to Text

Then just apply the various formatting/colour etc to match the requirement

Obviously there’s more views that need to be built for this challenge, and then they all need to be placed on a single dashboard with the relevant formatting and dashboard action being applied to make the Word Cloud change when a summary value is selected.

Hopefully these tasks should be quite straightforward and this is already a very lengthy post. If you’ve got this far, then thanks very much for taking the time to read 🙂

My solution is published here

Happy Vizzin’

Donna

Creating a Navigating KPI Block

You know you get those days when your mind just isn’t in the right place?  Well, that was me yesterday – even forgot to collect my daughter from her holiday club… I’d driven home before I remembered – eeek!  She was none the wiser though, as I wasn’t late – phew!

After reading through Ann Jackson’s challenge for Week 30 of #WorkoutWednesday2019 (here), I thought this should be pretty straightforward (Ann had even said so herself), so found it quite perplexing to then be sitting staring at my screen trying to figure out how to get 4 independent measures into a single view in the grid format displayed…..

So I closed my laptop lid, and walked away… well actually went and worked up a sweat in my weekly Zumba class.

After a good physical workout, shower and food, I set about readdressing the mental workout challenge, and promptly had that ‘doh!’ moment…

The focus of the challenge was to utilise some of the new features in Tableau to aid in navigation between dashboards.  Subsequently you’ll need Tableau v2018.3 to be able to complete this challenge.

The solution is built of some very simple views/worksheets, which are then placed on a number of dashboards.  The Go To Sheet dashboard action and Button dashboard object are then utilised to provide the navigation – these are the features introduced in v2018.3.

All relatively straightforward, and I have no idea why I had such a mental block….

My solution consists of 8 worksheets and 5 dashboards, and to build really ended up involving a lot of right-click -> duplicate worksheet. 

1. 4 Block KPI dashboard

This consists of 4 separate views, 1 for each measure. I built the first view (Customers) as follows:

a. Dragged Customer_ID to Text shelf, then changed the aggregation to Count (Distinct) by clicking on the white arrow (carrot) to the right of the pill.

b. Changed the mark type to Square, changed the Size to be as large as possible, and changed the ‘fit type’ to Entire View

c. Edited the text and formatted the size, set the font to ‘match mark colour’ and adjusted the alignment to middle centre

d. Coloured the box, using a recent tip I believe I got from Lorna Eden…

If you just select the Colour shelf with the view above, the colour palette below is displayed

Ann mentions in her blog that she uses the Hue Circle colour palette.  The easiest way to access this specific palette is to create a ‘dummy’ pill to put on the Colour shelf.  To do this, simply double click in the space on the marks shelf, below where the text measure is shown, and type the word ‘dummy’ (including quotes) into the area

This creates a new blue (discrete) pill, which you can then add to the Colour shelf, and this will then display the Colour Legend, which you can edit, find the Hue Circle colour palette and select the appropriate colour.

e. Finally edit the tooltips to uncheck the ‘show tooltip’ option.

f. Once done, this sheet can then be duplicated to create the other KPI blocks; you just then need to change the field on the text to be CNTD(Product Name), CNTD(Order ID) or CNTD(City State), where [City State] is a calculated field I created of UPPER([City] + ‘, ‘ + [State]).

g. NOTE – You do need to remove then recreate the ‘dummy’ field on each duplicated sheet though, as otherwise, if you change the colour, it will affect all the sheets.  Simply remove the field, then create a new one by typing directly in the same manner described above.  Whilst you may name this field the same ie ‘dummy’, the removal and recreation actually creates a new instance of the pill ‘under the bonnet’, which is therefore independent of the ‘dummy’ pill on the other sheets, so can be coloured differently.

h. Add these 4 views to a dashboard.

2. Bar chart dashboard

On clicking on each block on the 4 block KPI dashboard, a new dashboard is presented showing a sorted Sales by xxxx bar chart.  Let’s work through creating the Sales By Customer dashboard.

a. Ann likes her UPPERCASE, so first create a calculated field to store UPPER([Customer Name]), and add this to rows and SUM(Sales) to columns and sort.

b. Add both fields to the Label shelf, edit the font sizes to suit, match mark colour and align left middle. Untick ‘show tooltips’ from the Tooltip shelf.

c. Apply formatting to the chart to remove gridlines, columns, rows etc, but keep the axis rulers on rows – make the line thicker/darker to suit.

d. Add a ‘dummy’ pill to set the colour again, and choose the appropriate colour from the hue circle palette again.

e. Untick ‘Show Header’ from the pills on the rows & columns shelf to remove the axis and the Customer column.

f. Change the title to SALES BY CUSTOMER, adjusting the font size & colour to match (you should hopefully see the colour you’ve recently selected out of the hue circle palette in the ‘recently used’ section of the text colour selector

g. Create a new dashboard and add this view to it, and set to ‘fit width’ if it isn’t already

h. Add a floating button object to the dashboard and position top right

i.  Edit button, by clicking the carrot and changing the option

Navigate to – select the name you’ve given the 4 KPI block dashboard

 Button style – Text

Title – GO BACK (and adjust font to suit)

Background – select the relevant colour to match

j. Back to the KPI dashboard, and select Dashboard -> Actions from the menu, and add a ‘Go To Sheet’ action.

Give it a suitable title, then select the source sheet on the KPI dashboard as the one displaying the Customer Count KPI, and set the Target sheet to be the name of the Customers dashboard created above.

Once you’re happy all is working as expected, then repeat the steps outlined above to create a Sales by Product, a Sales By City and a Sales by Order dashboard.  You may need to create some further calculated fields to handle the UPPER case formatting, and don’t forget to recreate the ‘dummy’ field to help with the colouring.

My version of the challenge is published here.

Happy vizzin’

Donna