If you attended / watched Ann & Laura’s Speed Tip Session at TC this year, then you’ll have seen how to complete the core part of this challenge. I watched the session after I’d completed it, and found I’d approached it slightly differently from Ann. I’ll walk through what I did.
So the core requirement of this challenge was to
Identify % of Sales per State
Group those with a value less than a threshold defined by the user, into an ‘Other’ pot, and colour the ‘Other’ pot differently.
Sort the display based on % Sales descending, except the ‘Other’ pot should always be at the bottom
Label each bar with State/’Other’ pot and % sales, positioning the label above the bar, and colour accordingly
Add tooltips for each bar, coloured accordingly
Incorporate a summary of the number of states in and out of the group within the chart title (since it all needed to be created on a single sheet).
As I often do with these challenges, I played around with Ann’s published solution; changing the parameter and seeing how the viz altered, and hovering my mouse over various points, to see if it would give me a clue as to how it had been built – in this case I was mainly interested in the labels.
What I found from doing this was there was another mark at the 0 point of the axis; it looked like a gantt bar mark. This suggested dual axis would be involved.
So armed with that information, let’s crack on…
Identify % Sales Per State
Total Sales
{FIXED:SUM([Sales])}
This simply stores the sum of all the sales against each row in the data source
% Sales Per State
{FIXED [State]:SUM([Sales]) / SUM([Total Sales])}
The sum of the sales per State is determined as a proportion of all sales
Group those with a value less than a threshold
A [Threshold Percent] parameter was created to store the user defined variable (defaulted to 3%), and then this was used to determine how the State should be grouped for display
State Grouping
IF [% Sales Per State] >= [Threshold Percent] THEN [State] ELSE ‘All Other States’ END
Sort the display based on % Sales descending
Whilst the States needed to be sorted with the largest % first, the ‘Other’ pot always had to display at the end. To do this I created a new calculated field
State Order
IF [State Grouping] = ‘All Other States’ THEN 1 ELSE 0 END
With the basic components now defined, we can start to build the chart
The [State Order] field placed at the front, meant that the States are listed first, and forces ‘All Other States’ to the bottom.
Setting the Sort on the [State Grouping] field to be sorted based on the [% Sales Per State] desc, gives the required result:
Colouring based on [State Order] gives the split required too.
Label the Bar
As stated above, a dual axis is needed for this, utilising our old friend, MIN(0) on the columns. The Mark Type of MIN(0) is set to Gantt Bar, with the [State Grouping] and [% Sales Per State] added to the Label, with the font adjusted to Match Mark Colour, and aligned top right. The Colour of the Gantt Bar mark is also set to 0 transparency so it looks invisible.
To make the label appear when the axis is ‘dualled’ and synchronised, requires a bit of ‘jiggery pokery’ with the row height and bar size. I found this to be very much trial and error, even publishing a couple of times, as while it looked overlapped on my laptop screen, the view did render ok on public.
Note – what I found with this solution is that changing the parameter could make the labels overlap again, which wasn’t ideal. Ann’s solution also suffered from this. I’ve been trying to figure out an alternative, but haven’t got there yet.
Tooltips
The tooltips on the bars look straightfoward, but the State ‘label’ is coloured based on the bars, so requires a couple of calculated fields
Label : State
IF [State Order]=0 THEN [State] END
Label : Other States
IF [State Order] = 1 THEN ‘All Other States’ END
Both these fields are added to the Tooltip side by side and coloured accordingly. As only one will ever be true per row, only one will display :
Count of States for chart title
To count the states based on which group they were in, required a couple of table calcs :
Count States Listed
WINDOW_SUM(IF ATTR([State Order]) = 0 THEN COUNTD([State]) END)
Count States Grouped
WINDOW_SUM(IF ATTR([State Order]) = 1 THEN COUNTD([State]) END)
These were added to the Detail shelf, so could then be referenced in the title of the Viz. The table calc needed to be applied per State Grouping
Then it was just a case of hiding various pills, and formatting to remove row lines and gridlines. A reference line for the selected parameter was also added.
My published version is here. The main difference to Ann’s solution is that she used Sets to group the States together.
One of the main reasons I like to partake in the weekly #WorkoutWednesday challenges, is to get the chance to try out the newer features of Tableau that I can’t use at work, due to the version we’re using (we’re never in a position to react to upgrades at the same speed as Tableau releases).
This week, Lorna provided a challenge to introduce Explain Data which is a feature Tableau released as part of v2019.3, so to complete this challenge, you’ll need this version of Tableau Desktop.
The challenge itself isn’t that taxing, so I’m going to attempt not to make this blog that lengthy (which is a challenge in itself… 🙂 ).
Having not used Explain Data before, then the first thing I did was google it, and I watched & read the info Tableau had published here. There’s absolutely nothing wrong in using online resources to help you complete these challenges :-).
So with that understood, I built the basic vizzes required:
Sales by Quarter
Straightforward line graph….
This Year vs Last Year
A few calculated fields needed for this one:
Latest Year
{FIXED : MAX(YEAR([Order Date]))}
Previous Year
[Latest Year]-1
Sales This Year
IF YEAR([Order Date]) = [Latest Year] THEN [Sales] END
Sales Last Year
IF YEAR([Order Date]) = [Previous Year] THEN [Sales] END
The [Sales This Year] is then plotted against [Sub-Category] with [Sales Last Year] added to the Detail shelf and displayed as a reference line against each cell.
The [Sub-Category] field is hidden, so the row labels aren’t duplicated when displayed next to each other on the dashboard.
Sales by Order ID
This is a dot plot that has then been ‘jittered’ to spread the marks out vertically in a random manner.
Without the jitter, you’d just get…
…where all the circle marks are arranged on the same horizontal line, and subsequently, the majority all appear on top of each other.
To make the jitter effect, I make use of a hidden function in Tableau called Random() which assigns a random no between 0 and 1 to each mark on the row.
Jitter
Random()
Then adding Jitter to the rows (and disaggregating), a new axis gets added and the marks get spread vertically, giving a much better sense of the volume of marks that are around the same values.
Hide the axis to get the desired display. The vertical dotted line is a reference line set to be the average sales for each row.
For further reading on creating jitter plots using Random() or Index() please check out :
So with the 3 charts built, these can be added side by side on a dashboard, and with all set to Fit Entire View, you can guarantee each row will line up. You can adjust the padding of each object to make it look like the row divider lines actually continue as a single line (I forgot to do this on my published version :-)).
Explain Data
So having read the information on Explain Data, I switched to my dot plot, selected a mark, and used the ‘lightbulb’ in the command toolbar to view the Explain Data information. I then saved each of the suggested explanation visuals as it’s own sheet, which I added side by side on another dashboard.
The text at the bottom of each worksheet, is simply the caption text that is automatically generated, and can be exposed on the dashboard by enabling the caption to show.
And that’s it for this week. A relatively straight-forward challenge, but providing some good learning opportunities. My published viz is here.
It’s #data19 in Vegas, so this week’s #WorkoutWednesday challenge, set by Curtis was run ‘live’ at the Conference. Not being there, I had to wait patiently at home until it was released and build solo 😦
Given it was a live challenge, Curtis wanted to ensure it offered the right balance for those newer to Tableau & #WorkoutWednesday, while still giving the more experienced participants something to get their teeth into. And I think he managed this well.
Building the scatter plot
The scatter plot is built to be dynamic, with both the axes and displayed marks changing based on user selection. This is driven by 3 parameters
(quick side note here…. the challenge states to use the Superstore Sales dataset that comes with v2019.1. To get the Manufacturer field, you need to use the provided Tableau data source and not the excel file – this always trips me up!).
(another note … the challenge says to use Profit, but the solution provided used Orders – I chose to use Orders too so I could verify a match)
Set Y-Axis
The same as above (just duplicate the parameter and rename).
So with the parameters in place, we need to build calculated fields to reference them.
LOD
CASE [Set Level of Detail] WHEN ‘Customer’ THEN [Customer Name] WHEN ‘Manufacturer’ THEN [Manufacturer] WHEN ‘Product’ THEN [Product Name] END
X-Axis
CASE [Set X-Axis] WHEN ‘Quantity’ THEN SUM([Quantity]) WHEN ‘Sales’ THEN SUM([Sales]) WHEN ‘Orders’ THEN COUNTD([Order ID]) WHEN ‘Discount’ THEN SUM([Discount]) END
Y-Axis
CASE [Set Y-Axis] WHEN ‘Quantity’ THEN SUM([Quantity]) WHEN ‘Sales’ THEN SUM([Sales]) WHEN ‘Orders’ THEN COUNTD([Order ID]) WHEN ‘Discount’ THEN SUM([Discount]) END
(basically duplicate X-Axis and edit & rename)
With these we can now build the basic scatter : [X-Axis] on Columns, [Y-Axis] on Rows and [LOD] on Detail :
Changing the parameters and the scatter will change too.
So now its just a case of formatting and adding a few additions to match the display.
First up, the solution has the X-Axis scale on the top, rather than the bottom. This is achieved by dual-axis.
Duplicate the [X-Axis] on the Columns shelf (Press Ctrl, click on the [X-Axis] pill and drag next to it.
Make dual-axis and synchronise
Edit the bottom axis, deleting the title of the axis, and changing the Tick Marks to None.
The solution also shows each axis is labelled based on the parameter selection. To do this, edit both the left and top axis to remove the titles, then add the [Set X-Axis] parameter to Columns and [Set Y-Axis] parameter to Rows.
Now right-click on the displayed Set X-Axis and Set Y-Axis and ‘hide field labels for columns/rows’, and right-click on Discount and ‘rotate label’.
Finally, adjust the tooltip to contain :
<Parameters.Set Level of Detail> : <LOD>
That’s the core viz built, it just needs some formatting applied, to
remove column and row lines
remove gridlines (the 0-lines are retained)
Change mark type to circle, adjust colour to suit and add border
Bar Charts & Proportional Brushing
When it’s all together on the dashboard, the selection of marks in the scatter should cause the bar charts to update to show the % of each variable accounted for in the selection. This concept is referred to as proportional brushing, and makes use of the Set Actions feature. Tableau’s own KB article describing Set Actions and proportional brushing can be seen here.
The first thing we need to do is build a set : right click on the [LOD] field and select Create -> Set. In the Set dialog, name the set Selected LOD and randomly select some values (not all)
We also need various calculated fields :
# LOD
COUNTD([LOD])
Just counts the number of distinct items in the user selected LOD. When bringing the Selected LOD set into play, we can the count the number of items in or out of the set, and using a quick table calculation, change to get the % of total. I tend to start using data tables at this point, to sense check the data I need before I viz :
I’ve turned on Grand Totals for columns too, just so I can check the numbers are what I expect.
Adding the [X-Axis] and [Y-Axis] values into the table, and finding % of total too
and we’ve got all the % needed to make up the bars. At this point, edit each table calculation and fix it to compute by the In/Out Selected LOD (so if you shift pills around later, you won’t change the values).
While we could build the viz at this point, I noticed that the tooltip on the bar chart displays the same information whether you hover the ‘in’ or the ‘out’ portion of the bar
With the data we currently have, we couldn’t do this, so we need a few more fields/calculations. We need to be able to get the total value of the LOD/measure against each row, as well as the number and % of the ‘In’ items.
We’ll use table calculations for this.
Total #LOD
WINDOW_SUM([# LOD])
# LOD Selected
ZN(WINDOW_MAX(IF ATTR([Selected LOD]) THEN COUNTD([LOD]) END))
This counts the number of items selected in the set, and the WINDOW_MAX statement will then ‘spread’ that value across both the rows (as only 1 row contains a value).
% LOD Selected
[# LOD Selected]/[Total # LOD]
This is formatted to 1 dp.
Adding into the ‘check’ table, you can see how the values with table calculations are matching the initial set
A similar set of steps are the repeated to get the X & Y axis values
So now I’m happy I’ve got all the fields needed to build the bars, we can, and this is done in 3 separate sheets (it can be done on a single sheet, but I noticed some spacing on Curtis’ solution that suggested it wasn’t).
LOD Bar
You can build from scratch, or as I tend to do, duplicate the check sheets, remove the pills I don’t want, and shuffle the rest around. You ultimately want :
[# LOD] as % of total on Columns
[Selected LOD] set on Colour (adjust colours, and make sure the Out is listed before In on the colour legend, so the darker values show first)
The format to remove axis title, show axis ticks etc.
Name the sheet, then duplicate to create an X-Axis version, dragging the equivalent pills over the top, and do the same for the Y-Axis version. If you do this, just double check all the values are matching your check sheet, and the table calc settings haven’t been lost.
Putting it all together on the dashboard
Build the dashboard, and add the 4 sheets (the scatter and 3 bars) into their appropriate place, and expose the 3 parameters too.
To get the proportional brushing technique to apply to the bars, add a Set Action to change the value of the Selected LOD set when marks are selected in the scatter plot. The Set Action needs to be set to ‘remove all values from set’ when the marks are unselected.
And that’s it! My published version is here. Enjoy!
This week Luke challenged us to create Donut Charts in Tableau, but added additional requirements to make it just a bit spicier. The full details are here.
Taking inspiration from Ann’s challenge the week before, Luke provided an advanced version to incorporate grey circles for each date when there were no orders, but stated this should be achieved without any data duplication or densification. Deep down, I knew Luke really meant ‘only one sheet’, but he didn’t state this, so my initial interpretation was to tackle the challenge in a similar way to Ann’s, and use 2 sheets; a background with all the grey circles displayed, and a foreground with the donuts/green circles, and float one over the other. This week, while still tricky, I managed to get them aligned, so that was the version I published and released.
I documented this floating technique last week, here, so have no plans to repeat, except to state that I got all the dates to display by creating a Baseline Date field of
which is basically setting all the dates in the dataset to the same year. I then used this date to build my background viz, and as there was an order in every region for the day/month combination, I didn’t get any gaps. If the dataset didn’t exhibit this behaviour, this technique would not work, so it isn’t the most robust solution.
Right, now I’ve finished talking about that, let’s get onto building the chart…
Fixing the report date
Luke stated the report should be based on the last 7 days from 6th Nov 2019, so this date had to essentially be ‘hardcoded’ into the report. The easiest way to do this in a way that could be quickly changed if need be, was to create a parameter, Report Date, defaulted to 6th Nov 2019.
Restricting to 7 days
I created a calculated field to limit the data just to the days I was interested in
This required a few calculated fields. First up, we needed to know if the order had shipped before the report date or not
Has Shipped?
[Ship Date]<= [Report Date]
# Shipped
IIF([Has Shipped?],1,0)
This returns 1 or 0 depending on whether the order has shipped before the report date or not. This field when added to the viz, can then be aggregated (summed) to provide a count of the orders that have shipped.
% Shipped
SUM([# Shipped])/SUM([Number of Records])
How many shipped as a proportion of all records. NOTE – in this solution, an order equates to a row in the data set. However technically, an order can contain multiple rows, so when counting orders, you could think you need to COUNTD([OrderID]). This is certainly valid, and IMO more accurate, but things do get a bit more complex this way. Luke had counted rows, so I decided to work with getting the data to match his solution.
Finally we needed to know if the orders had been fully shipped or not, as this is what will determine whether we’re showing a green circle with a tick, or a donut.
Fully Shipped?
[% Shipped]=1
Building the Donuts
I’ve worked with donut charts before, so knew that the trick was in using a dual axis technique to show a pie chart with a smaller circle mark on top.
I also needed ‘axis’ to work with, so incorporated our old friend MIN(0), and added 2 instances as synchronised dual axis like below:
The first instance, I changed to be of mark type Pie chart, adding SUM([Number of Records]) to the Angle shelf, and Has Shipped? to the Colour shelf. The colours were set to green if Has Shipped? was true and grey otherwise. The colours were reordered, so True is listed first, so the green starts at 12 o’clock.
On the second instance, I changed the mark to be a circle, adjusted the size so it was smaller, and added Fully Shipped? to the Colour shelf. This time the colour is set to green if true, but white if false.
For the labels, I created 2 calculated fields :
Label Tick
IF [Fully Shipped?] THEN ‘✓’ END
Label % Shipped
IF NOT([Fully Shipped?]) THEN [% Shipped] END
and added these side by side to the Label shelf of the Circle mark. The font was adjusted to fit, and centre aligned.
Removing the column/row labels, hiding some of the pills and formatting the axis/gridlines etc, and the core of the viz is built. This would achieve the ‘standard’ solution.
Date Format
The order date was set to a custom format of mmm d, ‘yy to get the desired display.
Displaying the grey circles
As stated above, I originally used the 2 sheets & floating technique to show the circles within the gaps.
After publishing, I had a look at Luke’s solution to see how he’d tackled it.
However much I stared at his workbook though, I could not see what he’d done. All I could tell by playing around was that there were no further marks than what I had already got displayed.
Then Rosario Guana posted up her solution, so I checked hers out too…. still couldn’t see it. I went through every menu option I could think of, trying to find what the secret was…..
Eventually I pinged Rosario a message… background image was the response.
Doh! I don’t use these often, well actually I’ve only used it once on another #WorkoutWednesday challenge that Rody Zakovich set sometime ago. Tableau’s help on using Background Images (which you often associate with maps, and hence is under the Maps menu option), is here.
So now I had the clue, what did I need to do….
First up, I needed an image to use. As I’d already downloaded Rosario’s workbook, I used the one in her workbook.
Secondly, I needed ‘proper’ pills to map the co-ordinates of my image to, my ‘typed in’ MIN(0) wouldn’t suffice. So I created
Col
0.0
Row
0.0
I then rebuilt the viz with Col (set to be a dimension), and added Row to the rows shelf (again set to be a dimension). I hid all the headers again
I then added the image via Map > Background Image > <Data source name>, then Add Image, and applied settings below
The sizes just then needed to be adjusted to fit, but this is best done when displayed on the dashboard itself, as what you see in the sheet view might not present the same when on the dashboard.
Both my versions of this challenge can be viewed here.
An interesting challenge set by Ann this week, which seemed initially to me that it should be quite straightforward, but as usual ended up causing all sorts of frustrations.
The trickiest part was how to show marks for days when there weren’t any sales. From my initial observations of Ann’s viz (hovering over marks etc), I’d figured out that she had made use of the ability to set a sheet’s background as transparent and then float one sheet over the other. This was because when I hovered over the circles with no days displayed, the mark didn’t ‘highlight’ in any way, indicating that it was ‘behind’ something else.
So with this thought in mind, I went about trying to build the basic viz – a calendar view with all days for the background, and a calendar view with just the days with sales to layer over the top.
Ann didn’t state any requirements about how this could be tackled, so I assumed she hadn’t used any sort of date scaffolding (ie using another data set containing all the dates). I therefore tried to do the same, and did end up with something, but it just felt too complex, and wasn’t really ‘future’ proof for a real world scenario, as it relied on each day across a year existing in the dataset being used, regardless of year (ie there was an entry for 1st Jan, 2nd Jan etc all the way through to 31st Dec, but it could be 1st Jan 2016, 2nd Jan 2018 etc). Knowing I had to blog about what I’d done, I decided to take a step back, and approach the challenge based on what I would have done if I was asked to do something similar for work.
I still wanted to incorporate the transparency/floating sheets requirement though, since this was a piece of functionality I believe Ann was trying to demonstrate the use of in this challenge.
Date Scaffold
I first needed to create a data source to use as my date scaffold. This is simply an excel sheet containing 1 column, Date, with 1 row per day ranging from 01 Jan 2017 through to 31 Dec 2019.
I created this, saved the sheet and then connected Tableau Desktop to it. This will provide the data for my background calendar.
Building the Background Calendar
The overall calendar needs to be organised into 3 columns (based on months) and 4 rows (based on quarters). Within each column, there is then a column per day of week and within each row there is a row per week of the year. Essentially we’re building a trellis type chart.
Column Number
When building trellis charts, there’s all sorts of clever techniques to flex the number of rows & columns based on how many ‘entities’ in the data you’re trying to organise. In this instance we’re dealing with months and we know the data is going to be static, so I just created a simple calculated field to determine which column each month should sit in :
CASE MONTH([Date]) WHEN 1 THEN 1 WHEN 2 THEN 2 WHEN 3 THEN 3 WHEN 4 THEN 1 WHEN 5 THEN 2 WHEN 6 THEN 3 WHEN 7 THEN 1 WHEN 8 THEN 2 WHEN 9 THEN 3 WHEN 10 THEN 1 WHEN 11 THEN 2 WHEN 12 THEN 3 END
This field was then added to the columns shelf, and next to it I added WEEKDAY([Date]) .
Top Tip – by right clicking on a Date field in your dimensions pane, and dragging onto the columns/rows shelf, you get a date selection dialog displayed when you release the mouse, so you can quickly choose the type of date you want to display
By default, as I’m based in the UK, my weeks are set to start on Mondays, but the display starts on a Sunday. To fix this, right click on the datasource and select Date Properties
Onto the rows, I added QUARTER([Date]) using same technique as above.
Now I need to get the weeks to display, but if I just add WEEK([Date]) to rows, I get too many rows for each quarter
What I want is to be able to show weeks 6-9 and 10-14 on the same rows as weeks 1-5. To achieve this I created
The FIXED part of the calculation is finding the minimum week number for each month. So for January, the minimum week number is 1, for February it’s 5, for March it’s 9. That minimum number is then being subtracted from each week number, so for
week 1 (in Jan), I have 1-1 = 0
week 2 (in Jan), I have 2-1 = 1 etc
week 5 (in Feb) I have 5-5 = 0
week b (in Feb) I have 6-5 = 1 etc
Adding Week Index to Rows rather than WEEK([Date]) gives me
which is the layout I’m after.
Restrict the Year
I created a parameter SELECT YEAR which I listed 2017, 2018, 2019. I then created a new calculated field
FILTER : Year
YEAR([Date]) = [SELECT YEAR]
which I added to the Filter shelf and set to True
I then just had to apply some formatting :
Set the mark type to circle and choose a light grey colour, increase the size slightly
Set the row banding
Lighten the row/column divider lines
Set the Q1, Q2 etc label headings to white font (this is the background sheet, so I don’t want them to display at all)
Set the Sunday, Monday etc label headings to white font
Don’t show the Week Index header
Don’t show the Column Number header
Hide all other column/row labels
Name the sheet Backgound
Building the Foreground Calendar
Using the Superstore dataset, I basically repeated all the steps above, but this time referencing the Order Date field in the data set so
Column Number
CASE MONTH([Order Date]) WHEN 1 THEN 1 WHEN 2 THEN 2 WHEN 3 THEN 3 WHEN 4 THEN 1 WHEN 5 THEN 2 WHEN 6 THEN 3 WHEN 7 THEN 1 WHEN 8 THEN 2 WHEN 9 THEN 3 WHEN 10 THEN 1 WHEN 11 THEN 2 WHEN 12 THEN 3 END
which I again added to the Filter shelf and set to True
When plotted on the sheet in the same way, you can start to see the gaps appearing where there is no sales for that day.
I then applied the following changes
Set mark type to circle, colour slightly darker grey than that selected above, added border to circle, and adjusted size slightly to match the other sheet
Removed the row banding completely
Removed the row/column divider lines
Set the Q1, Q2 etc label headings to larger font aligned middle & centre
Set the Sunday, Monday etc label headings to darker font, and formatted to just show First Letter
Don’t show the Week Index header
Don’t show the Column Number header
Hide all other column/row labels
Labelled the sheet Foreground
Added Order Date to the Text shelf, setting it to be Discrete Exact Date, then changed the format to dd/mm. Changed font size to 7 and centre aligned.
Added Sales to the tooltip, setting the format to be $ with no decimal places
Created a copy of Order Date (right click > duplicate), an added the copy to the tooltip, setting the format to be Wednesday, 14 March 2001
Adjusted the tooltip to match
Changed the background of the whole sheet from white to ‘None’ – this sets it to be transparent, and is the main trick for this display
Highlight Top 3
To achieve this requirement, I first created a parameter HIGHLIGHT TOP 3 containing the values DAYS, WEEKS, MONTHS
When MONTHS was selected, I needed to find the top 3 months in the year, etc. So I created some calculated fields
This stores the total week’s sales against every row
Value
IF [HIGHLIGHT TOP 3] = ‘MONTHS’ THEN SUM([Total Monthly Sales])
ELSEIF [HIGHLIGHT TOP 3] = ‘WEEKS’ THEN SUM([Total Weekly Sales])
ELSE SUM([Sales])
END
This is basically storing the relevant value I need to consider for the Top 3 based on what was selected in the parameter.
I then also created an additional field
Group By Date
IF [HIGHLIGHT TOP 3] = ‘MONTHS’ THEN DATETRUNC(‘month’,[Order Date]) ELSEIF [HIGHLIGHT TOP 3] = ‘WEEKS’ THEN DATETRUNC(‘week’, [Order Date]) ELSE [Order Date] END
This captures a date at the relevant level on each row depending on the parameter selection.
InTop 3
From the Group By Date field, I then created a set, which I set to be Top 3 by Value
To see how all these fields interact, build a basic viz with Group By Date on rows (exact date, discrete), and Value on Text. Add In top 3 to rows too. Then add FILTER- Year to the filters shelf, and add to context. This step is crucial to ensure the year filter is applied before the set computes its top 3. Use the parameters to see how the values of Group By Date and Value change
So now you can see how the set is working with the parameter, the set can now be added to the Colour shelf of the Foreground sheet, and the colours adjusted accordingly. The FILTER – Year needs to be added to context on this sheet too.
Building the dashboard
So now the two sheets have been created, the dashboard can be built.
I started by setting the size of the dashboard to 1600 x 1300, adding a text field for the title, and text fields underneath for my standard ‘footer’. I then added the Background sheet into the middle between my title & footer, moving the parameters to form part of the title row. I hid the title of the sheet and set to Fit -> Entire View.
At this point everything on the dashboard is tiled.
I then changed the option to floating, and added my Foreground sheet. As with the background, I hid the title and set to Fit -> Entire View.
I then used the position values of the Background sheet (the x & y position and the height & width), to set the position values of the Foreground sheet to be exactly the same. The intention here was the circles on the Foreground should then be positioned directly over the circles on the Background, and as the Foreground was transparent, the circles that were missing on the Foreground where there were no sales, would show through from the Background sheet, along with the row banding & the row/column lines.
However, this just wouldn’t work as I hoped. I checked the padding options, I shifted things slightly left, right, up & down but when I got some circles lining up, others wouldn’t. It really was quite frustrating and I spent some time trying to fix this, but ended up publishing as it was, which you can see here. In truth I was secretly hoping that by publishing to Tableau Public, it would miraculously work, but it didn’t :-(.
I checked out Ann’s viz and she had managed to get it all to line up beautifully, although the position values on her sheets weren’t exact either, so I’m not sure if this was just trial & error to get right too. I’ll have to watch her solution when it’s published.
So that was attempt 1, but I really wasn’t happy, so came up with an alternative…
Take Two – Join the data at source
For this version, I used Tableau’s ability to join two excel data sources together to create a single data source, with 1 row for every day from 01 Jan 2017 to 31 Dec 2019, supplemented by the relevant superstore sales data against each date if there was a sale on that date.
The data was left outer joined, using Date = Order Date as the joining key
I then created the viz on a single sheet using Dual Axis. To get an axis though, I had to create a Week Index (Date) field and a Week Index (Order Date) field (just as the two Week Index fields described above), but when added to the view, they were set to continuous with the axis reversed to ensure the 0 was at the top rather than the bottom.
All the data required to colour the circles, apply the tooltips etc was added to the 2nd Week Index (Order Date) marks card.
The only difference with this version, is that when you hover on the non-labelled circles, they do ‘highlight’. I can live with this, and felt the solution was much ‘cleaner’ and far less complex 🙂 However it does rely on the fact that the types of data sources I was working with could be joined. In my day job, a lot of the data sources I use are published Tableau Server data sources, and as yet Tableau doesn’t allow these to be joined 😦
It’s getting near #data19, Tableau’s annual customer conference, which once again is being held in Las Vegas. So for #WorkoutWednesday this week, Lorna decided to incorporate this into her challenge, which was also intended to showcase some of the new features introduced into v2019.3 (so you’ll need this version at least to complete the challenge).
For the challenge, Lorna provided a dataset in a .hyper format that simply consisted of the (casino) name, address and it’s associated latitude and longitude values.
The core requirement was to show all the casinos within a user defied radius from another casino, which was selected by user by clicking on the map.
So we need to
identify the selected casino
determine the lat and long of the selected casino (the start)
determine the distance (in miles) between the selected casino and all the other casinos
identify the radius
restrict the display to just those within the specified radius
Identify the selected casino
This is simply a parameter Selected Casino which is initially hardcoded to have the default of “Mandalay Bay Resort & Casino” (it’s important you get this spelt right using the exact case too).
I then have a boolean calculated field
Is Selected Casino?
name = [Selected Casino]
which is true if the name of the casino in the dataset matches the value stored in the parameter
Lat & Long of selected casino
My aim is to get the latitude and longitude of the selected casino to be stored against each row in the dataset, so I need to use an LoD to do this
The first MAKEPOINT function above determines the start point using the information about the selected casino, and the second MAKEPOINT determines the end point, using the casino location information stored on each row.
Identify the radius
The viz requires the data to be restricted just to those within a defined radius, so we need another parameter, How many miles? to capture this value.
In Lorna’s viz, the parameter has a fixed range, which is specified as part of the parameter properties
In order to then restrict the casinos shown to just those within range, we need
Within specified miles?
[Distance (miles)]<= [How many Miles?]
The final thing we also need at this point, is a Distance (metres) field as this is used to colour the marks on the viz, and is also referenced on the tooltip.
Building the map
So now we’re at the point we can start building the map.
My initial observations upon hovering over Lorna’s published solution, was this would need to be a dual axis map. The text on the tooltip differed dependent on whether the mark was the selected one or not. The colour of the marks also differed; for the selected mark, the colour is black, while for the non selected marks the colour varied based on the distance.
So with this in mind, I decided to build another calculated field
IsSelected Lat
IF [Is Selected Casino?] THEN [latitude] END
which would just store a latitude value against the row of the selected casino only, as demonstrated below (I set it to have a geographic role of latitude)
I figured I’d then plot this as a dual axis on the map, but as soon as I added, second latitude pill, my display changed from having a map background, to a standard scatter plot chart 😦
So I started to go down the route of just making it a single axis chart, but the tooltips started to get too complicated, and the colouring just wasn’t working….
So I reverted back to the original plan, but this time, used the same latitude pill for my dual axis, which retained the map background
I now needed to differentiate on each map between the selected casino, and the others, and did this initially by Size
Size (selected)
IF [Is Selected Casino?] THEN 2 ELSE 1 END
Size (non selected)
IF [Is Selected Casino?] THEN 2 ELSE 1 END
These are independent fields which can be adjusted differently, and one is placed on the Size shelf of each marks card and adjusted so on one chart the selected mark is the biggest, and all the other marks are as small as they can be. On the other chart, the selected mark is made as small as possible.
I then started adding the various other features
added Distance (metres) to the colour shelf on the non- selected marks card, changing the colour palette to purple, and reversing the range so the lower the distance, the darker the colour
changed the mark type to Shape on the selected marks card and added Is Selected Casino? to the shape shelf, changing the shapes accordingly
The location marker image I found on flaticon.com (remember to add an attribution on your viz if need be to the icon creator)
See this blog for how to get the shape into Tableau
Set the tooltip appropriately for each marks card
Filtered the view to restrict to the casinos in range
The map background needs to be changed to street view, which is set via the Map -> Map Layers setting in the menu
We can now make the map properly dual axis, and can put on a dashboard.
Changing the selected casino
Now it’s on the dashboard, we can use a parameter action to change the value of the selected casino ‘on click’
Displaying count of casinos within radius
The dashboard shows a summary on selection, which I added as the title of the map sheet
I created a calculated field
Count Casinos
TOTAL(SUM([Number of Records])) – 1
which needs to exclude the selected casino, hence the subtraction. I added this to the Detail shelf, so I could reference it in the sheet title.
Ensuring all the casinos remain ‘visible’ on selection
By default, on selection of a different casino, all the other casinos within range are faded out, which we don’t want to happen
So I used the same ‘dummy’ trick with highlight action that I documented in a previous blog to prevent this from happening
The dashboard just needed a few more tweaks – adding the title (which references the parameters), adding the colour legend and the distance parameter, and I was done. I published my viz.
Except I wasn’t completely happy…
When I increased the number of miles, the location shape became ‘hidden’ beneath the other marks, but if I changed the order of the axis, tiny marks were visible on each casino, which I couldn’t get the colour to match
This stumped me, so I had to check out Lorna’s solution, but I stared for some time and just couldn’t figure out what she’d done. It was Rosario Gauna’s solution where I immediately saw what I had to do, as Rosario had made a comment in the worksheet caption as a reminder.
Frustratingly, this is exactly what I do to remind me of this sneaky trick… the small marks on the ‘selected casino’ mark shelf, were ‘hidden’ by simply right clicking on the Is Selected Casino? shape legend and ‘hiding’ the false values
Doh! I wish there was a way you could see more easily on the sheet in Desktop where someone has done this.
My initial thoughts when I first saw this week’s challenge posted by Ann Jackson, was this would be pretty straightforward. And on the whole it was. I’ve dealt with defining specific date periods for comparison in the past, both with other #WorkoutWednesday challenges and within my job too. But Ann threw a curveball with her dynamic input requirement using what she termed as ‘custom buttons’.
The date listing didn’t phase me – the date forms part of the dataset, so displaying that to look like formatted ‘buttons’ (aka circle marks) was something I knew I could do.
It was the time range options that had me puzzled. The values ‘Last 3 Months’, ‘Last 6 Months’ etc aren’t values associated to any field in the dataset. Typically they would be options of a parameter. However parameters can’t be formatted in the way Ann presented. The closest you can get with a traditional parameter is a radio button list, where you have limited formatting options – certainly nothing as colourful as Ann’s requirement.
When I came to tackle this challenge, as this was the area I was most stumped over, I chose to just use the basic parameter concept to start with to get everything else working as I hoped. When I finally figured out how to do it, I adapted what I’d built to incorporate my change. For the purposes of this write up though, I’m going to start with the ‘dynamic inputs’.
Pick Time Range
From observing the PICK TIME RANGE selector in Ann’s published viz and interacting with it (hovering over the marks, clicking on an option etc) I deduced I needed to build a viz using Circle marks, and a row per option. The problem, as discussed above, was there was no dimension in the dataset that I could use to give me the values I wanted.
I needed to create one, but how?
By using a type of data densification technique, that I’ve used on other challenges, to essentially ‘fake’ a new field. My fellow #WorkoutWednesday pal, Rosario Gauna, recently co-authored a blog all about this technique, which you can read here. As a consequence I’m not going to go into too much detail about this.
I created the following calculated field
Pick Time Range
IF MONTH([Order Date]) = 1 THEN ‘LAST 3 MONTHS’ ELSEIF MONTH([Order Date]) = 2 THEN ‘LAST 6 MONTHS’ ELSEIF MONTH([Order Date]) = 3 THEN ‘LAST 9 MONTHS’ ELSEIF MONTH([Order Date]) = 4 THEN ‘LAST 12 MONTHS’ ELSE ‘YEAR TO DATE’ END
Given the dataset contains records for every month in the 4 years, I could guarantee there would be a record for each of these options.
I used the MIN(0) trick (or MIN(0.0) to create the display I was after, as this ensured I could align the text to the right of the circle mark I needed. This was plotted against my new Pick Time Range field (which has Show Header unchecked). Using MIN(0.0) meant I had a bit more precision when it came to fixing my axis to keep everything starting from the left when I eventually put on the dashboard. My axis were fixed from -0.01 to 0.1.
To make a value look like it is ‘selected’ did still require the use of a parameter….
Time Range
… and another calculated field
Time Selected
[Pick Time Range]=[Time Range]
This was then added to the Colour shelf, with the colours adjusted accordingly
Finally to make the colour change ‘on selection’, I needed a Parameter Action.
First I added the sheet created above to a dashboard. By default, this automatically displays the Time Range parameter and the colour legend to the dashboard. Leave these for now. I then selected Dashboard > Actions > Add Action > Change Parameter, and configured the options as below
When an option is clicked on, the Time Range parameter can be seen to change, and in turn, the selected option changes to pink.
The legend, and the Time Range parameter can now be removed from the display on the dashboard. This bit is done, onto the Date Selector 🙂
Pick End Month
Initially this follows similar principles to the above. I created a calculated field to store the month of the date
Order Date Month
DATE(DATETRUNC(‘month’,[Order Date]))
This sets every date to be the 1st of the associated month. I formatted this date to display as <month> <year>, and added these to a view in the same way I did the above, using MIN(0.0), a circle mark and, right aligning the label. The dates need to be sorted to descending.
Another parameter is needed to store the ‘selected’ end month, which I defaulted to 01 October 2019 (aka October 2019)
Date
But this time, we don’t just need to highlight the date selected, we need to colour the circles based on whether the date is within the ‘current period’, which in turn is based on the Time Range selected above, or within the ‘previous period’.
So if the end month is October 2019 and the Time Range is LAST 6 MONTHS, the Current period is the dates from May 2019 to October 2019, and the previous period is the 6 months prior to that.
To determine all this, I need to determine the start and end of the current period and the start and end of the previous period.
The end of the current period is simple, its the date selected
End Date Current Period
DATETRUNC(‘month’,DATE([Date]))
this is probably a bit overkill, as the default date is 1st October and all the other dates have been truncated to be the 1st of the month, but this just ‘makes sure’ 🙂
To get the end of the previous period, I need to either go back a number of months from the end date, or set to the same date of the previous year, if Year To Date is selected. I need to use a DATEADD function, but I need the number of months as an integer, so I need another calculated field to help with this, which will return me a number based on the value in the Time Range parameter
Time Value
CASE [Time Range] WHEN ‘LAST 3 MONTHS’ THEN 3 WHEN ‘LAST 6 MONTHS’ THEN 6 WHEN ‘LAST 9 MONTHS’ THEN 9 WHEN ‘LAST 12 MONTHS’ THEN 12 ELSE 0 END
So now I have this, I can work out
End Date Previous Period
IF [Time Value] = 0 THEN //take off a year DATEADD(‘year’,-1,[End Date Current Period]) ELSE //go back a few months DATEADD(‘month’,([Time Value] * -1), [End Date Current Period]) END
and
Start Date Current Period
IF [Time Value] = 0 THEN //1st of Jan for current year DATETRUNC(‘year’,[End Date Current Period]) ELSE //calc start time period as 1 month on from previous period end DATEADD(‘month’, 1, [End Date Prior Period]) END
and
Start Date Prior Period
IF [Time Value] = 0 THEN //1st of Jan for previous year DATETRUNC(‘year’,[End Date Prior Period]) ELSE //calc start time period (x months – 1) before end date DATEADD(‘month’, ([Time Value]*-1)+1, [End Date Prior Period]) END
With all the dates now sorted, we can determine which colour each month should be set to
COLOUR:Date Selector
IF [Order Date Month]>=[Start Date Current Period] AND [Order Date Month]<=[End Date Current Period] THEN ‘blue’ ELSEIF [Order Date Month]>=[Start Date Prior Period] AND [Order Date Month]<=[End Date Prior Period] THEN ‘dark grey’ ELSE ‘grey’ END
which can then be added to the Colour shelf of my date list sheet.
To get the Date parameter to change on selection of the sheet, we need to use parameter actions again, so add the sheet to the dashboard, and create a new Dashboard Action
Now we’re ready to build the line chart…
Period by Period Line Chart
This is essentially a ‘Sales Over Time’ chart, but we can’t just plot a date on the columns shelf as we need to be able to compare the time periods, ie the value of sales associated to the date of the current start period must be plotted in the same ‘column’ as the value of the sales associated to the the date of the previous start period.
This is managed using the table calculation INDEX() to essentially number the dates from 1 to x for each period band (which I could use my COLOUR:Date Selector field to segregate). This concept of plotting dates from a set point is sometimes referred to as a rocket chart, although there are other names for them.
My Order Date Month field has to exist on the Detail shelf, and and the Index is set to compute using that field too
However, as you can see, I have 3 date periods – the dates based on the current period, the dates in the previous, and all other dates (the light grey line). We don’t want these, so another calculated field…
Dates To Include
([Order Date Month]>=[Start Date Prior Period] AND [Order Date Month]<=[End Date Prior Period]) OR ([Order Date Month]>=[Start Date Current Period] AND [Order Date Month]<=[End Date Current Period])
… is added to the Filter shelf and set to True (the dates to display must either be within the Current Period, or within the Previous Period).
Add this to the Dashboard, and you can now start to see how the line chart will change as the dates are changed and the time period are changed.
So final step now…
Title
The title of the dashboard updates based on the inputs selected, so the title is just built on another sheet, placing all the relevant fields on the Text shelf and formatting accordingly
Adding this onto the dashboard too, and you have all the components this challenge requires. You just need to sort any formatting/placement changes.
#WorkoutWednesday Week 41 saw Luke challenge us with this task, based on a real scenario he’d encountered during his job.
Before I’d even saw Luke’s tweet advertising the challenge, I knew this was going to require Set Actions
To avoid any discrepancies interpreting the requirements, Luke very kindly defined all the calculations, so I’m not going to dwell on these. This blog will focus on the requirements I think will be most useful / those I had to think about, so I’m making assumptions you can build the calcs and the basic vizzes required. My published workbook (referenced at the end of the blog) can be downloaded if you’re not too sure.
Only Show Customers with Sales over $500
I created a fixed level of detail (LoD) calculation to store the total sales per customer
Total Sales per Customer
{FIXED [Customer Name] : SUM([Sales])}
which I then referenced in another calculation
Sales > 500
[Total Customer Sales]>500
which I added to the Filter shelf of all the sheets I built, setting the value to True.
Change the circle colour on the scatter plot for selected customers
I created a Set based on the Customer Name
I called the Set Selected Customer and just ticked a random set of customers in the list presented (I’ll describe how the set members, ie the customers, change later).
Dragging the Selected Customer set onto the Colour shelf of my scatter plot, gives me an In/Out colour option, where the ‘In’ represents the customers that are members of the set (those I randomly selected above), and ‘Out’ is those customers that aren’t members.
Move the selected customers to the top of the table
In the table viz, add the Selected Customer set onto the Rows shelf, in front of the Customer Name. This will add a level of ‘grouping’ to the table, with those ‘In’ the set being listed above those ‘Out’ of the set.
Then untick Show Header against this pill, to hide the In/Out from the display
Show a totals for each cohort
This is achieved by adding subtotals to the table.
Go to Analysis > Totals > Add all Subtotals
This will add a Total row which can then be formatted via a right-click, where the Label can be renamed to ‘Cohort’ and bold text applied.
Change the colour of the ‘unselected’ customers to dark grey in the table
I have to admit, I did scratch my head on this one for a few moments…but then the light bulb switched on 🙂
This is simply an adjustment of the row banding formatting option, ensuring the band size & level are set appropriately
Sort Customers by Lost Sales
In the table, this is just done by applying a Sort to the Customer Name field, using the Lost Sales calculated field that you should already have created.
Change the members of the Selected Customer Set…
…which in turn will change which customers are displayed at the top of the table.
This is where Set Actions come into play.
Add both the scatter viz and the table viz to a dashboard.
Then go to Dashboard -> Actions ->Add Action > Change Set Values
Select the Scatter viz to be the source sheet, and the Target Set to be the Selected Customer set. Choose to run action on Select and opt to remove all values from the set when the selection is cleared.
Now test the set action, by selecting circles on the scatter plot – you should see your selections changing colour and the customers selected being listed at the top of the table…. we’re nearly there… one final tricky requirement…
Make sure all members remain un-highlighted on the scatterplot
What you’ll notice when testing the set action above, is that on selection, while your selected customer circles change colour, the unselected customer circles’fade out’
The challenge is for the unselected circles to remain the same colour as in the screenshot to the left above.
We need to use a highlight action to do this, but trick it in such a way that it doesn’t actually highlight anything…
I did this by creating a new calculated field called Dummy which I just set to be an empty string (“”).
Add this to the Detail shelf on the scatter viz.
On the dashboard, add a Highlight dashboard action
Set the source and target sheets to be the scatter viz and change the Target Highlighting section to be Selected Fields, choosing the Dummy field.
And now, on selection, the other customer circles won’t fade away! Hurrah!
After last week’s tricky calcs that meant my solution didn’t match up, it was a welcome relief to get this challenge from Curtis Harris. Whilst some ‘seasoned’ #WorkoutWednesdayers may find this challenge very straightfoward, the main intention was to ensure everyone had been given an opportunity to implement the technique of sheet swapping.
I put myself into the seasoned category and have completed sheet swapping challenges in the past. In the past I would have been googling ‘tableau sheet swapping’ and referring to the raft of blogs & videos that provide you with the technique (maybe this one will become one of those in future!).
The challenge for me this time, was whether I could remember what to do without having to access previous workbooks or online references.
The answer was ‘yes’. I got through this in about 30 mins, which was very satisfying. So here’s what I did.
Build the Views
I created 6 views for this challenge; 3 for the main chart, and 3 for the preview.
I started by creating the Sales by Month line chart, applied all the various formatting to remove the axis & gridlines, set the tooltips etc. Once happy, I duplicated the sheet and changed the mark type to ‘bar’, then I duplicated again and changed the mark type ‘area’
I then named the 3 sheets line, bar , area.
Next I created a ‘preview’ chart. I duplicated the bar chart, removed the text from the tooltip, then added another instance of the date pill alongside to give me a secondary marks card, which I changed to be area.
I duplicated this chart 2 more times, changing the mark types so one showed a line and area, and another line and bar.
I named these Preview :Line, Preview: Bar and Preview:Area.
Show & Hide the Data
I created a parameter called Choose Display Type to store 3 string values : Line, Bar, Area.
The parameter won’t do anything until referenced within another field, so I created the following calculated field
FILTER: Display
CASE [Choose Display Type] WHEN ‘Line’ THEN ‘Line’ WHEN ‘Bar’ THEN ‘Bar’ ELSE ‘Area’ END
I then navigated to the Line sheet created above, showed the parameter control and set the option to Line. Then I added the FILTER:Display field to the Filters shelf. The only option available to select is ‘Line’ , so I ticked it.
I then selected to ‘apply this filter’ to ‘selected worksheets’, selecting my Preview:Line sheet
Then I went to the Bar sheet. Again showed the parameter control, and changed it to ‘bar’. Then I added the FILTER:Display field to the Filters shelf. The only option available this time to select is ‘Bar’ , so I ticked it, and then ‘applied the filter’ to the Preview:Bar sheet too.
At this point, navigating back to the Line sheet, the display is blank because ‘bar’ is the selected option in the parameter, and this sheet is filtered to ‘Line’. Looking at the filter, you see as the parameter is set to bar, the only option for selection in the filter is also bar. As line <> bar, the data doesn’t show.
Finally I went to the Area sheet. Again showed the parameter control, and changed it to ‘area’. Then I added the FILTER:Display field to the Filters shelf. The only option available this time to select is ‘Area’ , so I ticked it, and then ‘applied the filter’ to the Preview:Area sheet too.
Adding to the Dashboard
So while the parameter is driving the visibility of the views, they now need to be added to the dashboard.
The trick is to use Containers objects. These can be a bit fiddly and take some time to get used to. If you haven’t used them before, I recommend you have a watch of Tim Ngwena‘s videos :
I added a vertical container to the dashboard, and one of the first things I always do when working with containers is add a blank object. I then added the 3 initial charts (line, bar, area). Due to the parameter driving the hiding function, only one of these views actually displayed, although the title of each is visible, so you can see it’s been added. As a double check to make sure all are inside the same container, select one of the views on the dashboard (it will have a grey border when selected), then double click on the ‘handle’ at the top
This will then select the layout container the view is in, identified by the blue border
and you should see the 3 sheets + the blank object surrounded by a solid blue border, separated by dotted lines.
Test changing the parameter, and the appropriate sheet should display and the others hidden. Now hide the titles of each sheet, and remove the blank object. The displayed chart should look to fill up the whole space
The ‘preview’ sheets also need adding to another vertical container, but this vertical container needs to be positioned next to some text, so what I actually need first is a horizontal container.
I add the horizontal container to exist inside the same vertical container as the 3 sheets, so it’s at the bottom. The Item Hierarchy on the left nav bar of the Layout tab, helps show where items are positioned
The horizontal container has pushed the other objects up, but I’ll deal with that shortly.
I then add a text object into this container, then place another vertical container to the right of the text object
So now we have a vertical container inside a horizontal container, inside a vertical container. Phew! Once again double-clicking on the container handle will highlight the container the object is within to help you get to grips with the arrangement (I personally wish there was a way to see the layout by displayed lines in some way that don’t show on publication – a bit like the grid display, so you can get a better sense of the structure).
Now into this final vertical container add the 3 preview sheets.
and remove all the titles again, and test the parameter control.
Finally, the requirement is for the preview to be 50 pixels. Containers allow you to set the height for horizontal containers, and the width for vertical containers.
So select the horizontal container (either by selecting on the dashboard, or click the relevant horizontal container on the Item Hierarchy).
then choose Edit Height from the container menu accessible from the right hand side. Set to 50, and voila, the section shrinks.
Week 39 of #WorkoutWednesday was set by Lorna this week. Details of the challenge and a description of what a BCG Growth Share Matrix is here.
The main complexity of this challenge is the calculations, and more so with Lorna’s stipulation ‘no table calcs allowed’.
I started using Tableau before LoDs were introduced, so I tend to be much more comfortable with Table Calcs than I do with LoDs; I know there are many people who are the other way round.
So, let’s get started. In the interest of time (it’s now Sunday afternoon, and I’ve dipped into this challenge on & off since Wednesday), I’m going to focus on the calcs involved in achieving my solution, and not necessarily going into the detail of putting it all together on the Viz.
Sales Scatter
To build the Sales Scatter Plot, we need the following information to help get the %growth and %market share values :
The total sales across all the specified years per subcategory
The sales for the latest year per subcategory
The total sales for the previous years per subcategory
The overall total sales across all the specified years
The number of years to compare against is set by a parameter to be 2,3,or 4 years (No.of Years).
To limit the data just to these years, I created a Dates To Include as
YEAR([Order Date]) >= [Max Year in Dataset] – ([No of Years]-1)
Max Year in Dataset is another field that I could have simply hardcoded to 2019, but I decided to be more dynamic, and derive it using an LoD
YEAR({FIXED: MAX([Order Date])})
which gets the latest Order Date in the data and then retrieves the Year part of it, which happens to be 2019 in this instance. Dates To Include is then True if the year of the order date is greater than, or equal to, the maximum year less one less than the No of Years param.
Eg if No of Years = 2, then Dates to Include is True if the year of the order date >= 2019 – (2-1) = 2019-1 = 2018.
if No of Years = 4, then Dates to Include is True if the year of the order date >= 2019 – (4-1) = 2019-3= 2016.
To sense check the numbers I need, I built a tabular view by Sub Category adding Dates to Include = True to the filters shelf.
Total sales across all the specified years per subcategory
is simply achieved by adding SUM([Sales]) to the view.
Sales Latest Year
The sales for the latest year per subcategory is
IF YEAR([Order Date]) = [Max Year in Dataset] THEN [Sales] END
Sales Previous Years
The sales for the previous years per subcategory is then just the opposite
IF YEAR([Order Date]) <> [Max Year in Dataset] THEN [Sales] END
The value of this field ultimately changes as the No of Years parameter changes. If No of Years = 2, then this will contain the 2018 sales, if its 3, then it will contain the sum of the sales in 2017 & 2018 etc.
The Growth % is stated as being the change for the latest year in comparison to the average sales of the previous years. So next I need
Average Sales Previous Years
SUM([Sales Previous Years])/([No of Years]-1)
Note the [No of Years]-1, as when comparing across 2 years, I’m really only being asked to compare 2019 with the previous year, ie 2018.
The difference of the latest year compared to the average, as a proportion of the average.
Now we need the Market Share of Sales. This would usually be a pretty simple Percent of Total quick table calculation on SUM([Sales]), but I’ve been challenged to do this without table calcs. So an LoD is needed instead. I created
Sales All Years
{FIXED [Dates to Include]:SUM([Sales])}
which gives me my total sales across the specified years, at the ‘overall’ level.
From this I can calculate
Market Share Sales
SUM([Sales])/SUM([Sales All Years])
The table below gives the breakdown of all these fields at a sub-category level for 2 years
Next step is to categorise each row into Cash Cows, Dogs, Question Marks or Stars based on their % Growth & % Market Share values. Lorna stated that 7% should be considered the mid-point for % Market Share, but that the Growth mid point should be half of the maximum growth value.
From the table above, 64.8% (against appliances) is the maximum value in the set, so I need a new field against each row that outputs 32.4%. Once again, with Table Calcs I would just have used WINDOW_MAX to find the max and halve it. But instead I needed another LOD, this one a bit more complex…
I find INCLUDE and EXCLUDE LoDs a bit of a black art, and there may well have been another way to do this, but it was a bit of trial and error in the end, and so once it gave the value I was after, I stuck with it 🙂 I’ve tried to write some sentences to explain it better, but can’t 😦
So with the mid point defined, the categorisation is then
Sales Category (Sub Cat)
IF [Growth Sales]<=MIN([Mid Sales Growth Constant (Sub Cat)]) AND [Market Share Sales]<=MIN([Mid Market Share Constant]) THEN ‘Dogs’ ELSEIF [Market Share Sales]<=MIN([Mid Market Share Constant]) AND [Growth Sales]>MIN([Mid Sales Growth Constant (Sub Cat)]) THEN ‘Question Marks’ ELSEIF [Growth Sales]>MIN([Mid Sales Growth Constant (Sub Cat)]) AND [Market Share Sales]>MIN([Mid Market Share Constant]) THEN ‘Stars’ ELSE ‘Cash Cows’ END
This is all the information now needed to plot the Sales Scatter chart. Custom shapes were sourced from flaticon.com and I used Paint to colour and resize them appropriately, before saving them into a folder in the Shapes folder of the My Tableau Repository directory on my laptop.
Order Scatter
The steps for creating the Orders scatter are the same in principle. I need to find
The total orders across all the specified years per subcategory
The number of orders for the latest year per subcategory
The number of orders for the previous years per subcategory
The overall number of orders across all the specified years
However in doing this I got quite different numbers from Lorna, and its due to how you choose to count orders. Unlike a sales value, where the value is just attributed to a single line in the data set, an Order ID can span multiple lines, which means they can span sub-categories too. Eg imagine you have
Order A – contains 2 line of appliances, and 1 line of phones
Order B – contains 1 line of appliances.
How many orders are there? Answer = 2
How many order lines are there? Answer = 4
How many orders contain appliances? Answer = 2 (100%)
How many order lines contain appliances? Answer = 3 (75%)
How many orders contain phones? Answer 1 (50%)
How many order lines contain phones? Answer 1 (25%)
So when it comes to the ‘market share %’, should I be considering order lines (which means it’ll sum to 100%) or distinct orders which means it’ll sum to more than 100%.
I wrestled with this, and ultimately concluded, this challenge is purely for illustrative purposes, and to stick with my original assumption based on distinct orders (which I’d already calculated before finding the discrepancy). So my Market Share % won’t sum to 100%, and probably isn’t really a definition of Market Share, but I made peace with myself and moved on 🙂
# Orders
COUNTD([Order ID])
Orders Latest Year
COUNTD(IF YEAR([Order Date]) = [Max Year in Dataset] THEN [Order ID] END)
Orders Previous Years
COUNTD(IF YEAR([Order Date]) <> [Max Year in Dataset] THEN [Order ID] END)
Once again the Market share mid point was defined to be a 7% constant, while the growth % mid point needed to be half the max value, which in this instance based on the above, was 1/2 of 51.2%. The same type of LOD was required which gave me
IF [Growth Orders]<=MIN([Mid Orders Growth Constant (Sub Cat)]) AND [Market Share Orders]<=MIN([Mid Market Share Constant]) THEN ‘Dogs’ ELSEIF [Market Share Orders]<=MIN([Mid Market Share Constant]) AND [Growth Orders]>MIN([Mid Orders Growth Constant (Sub Cat)]) THEN ‘Question Marks’ ELSEIF [Growth Orders]>MIN([Mid Orders Growth Constant (Sub Cat)]) AND [Market Share Orders]>MIN([Mid Market Share Constant]) THEN ‘Stars’ ELSE ‘Cash Cows’ END
Top 20 Manufacturers Bar Chart
To create the bar chart, I created a new field Manufacturer-SubCat as
[Manufacturer] + ‘ (‘ + [Sub-Category] + ‘)’
I then set about creating the same data tables as I’ve included above, one for Sales and one for Orders.
The majority of the measures used were the same, but when it came to determining the growth mid-point, I found creating a separate field was simpler than trying to use a single field that worked across both Sub-Category and Manufacturer-SubCat. I followed the same methodology though:
I also found though that the 7% market share constant didn’t really give me the split, so I decided a 3% constant would be better to give me categorisations at this level into each bracket. As I had different comparison values, I needed new categorisation fields too.
Sales Category (Manu Sub Cat)
IF [Growth Sales]<=MIN([Mid Sales Growth Constant (Manu Sub Cat)]) AND [Market Share Sales]<=MIN([Mid Market Share Constant (Manu Sub Cat)]) THEN ‘Dogs’ ELSEIF [Market Share Sales]<=MIN([Mid Market Share Constant (Manu Sub Cat)]) AND [Growth Sales]>MIN([Mid Sales Growth Constant (Manu Sub Cat)]) THEN ‘Question Marks’ ELSEIF [Growth Sales]>MIN([Mid Sales Growth Constant (Manu Sub Cat)]) AND [Market Share Sales]>MIN([Mid Market Share Constant (Manu Sub Cat)]) THEN ‘Stars’ ELSE ‘Cash Cows’ END
Orders Category (Manu Sub Cat)
IF [Growth Orders]<=MIN([Mid Orders Growth Constant (Manu Sub)]) AND [Market Share Orders]<=MIN([Mid Market Share Constant (Manu Sub Cat)]) THEN ‘Dogs’ ELSEIF [Market Share Orders]<=MIN([Mid Market Share Constant (Manu Sub Cat)]) AND [Growth Orders]>MIN([Mid Orders Growth Constant (Manu Sub)]) THEN ‘Question Marks’ ELSEIF [Growth Orders]>MIN([Mid Orders Growth Constant (Manu Sub)]) AND [Market Share Orders]>MIN([Mid Market Share Constant (Manu Sub Cat)]) THEN ‘Stars’ ELSE ‘Cash Cows’ END
To Sort the Manufacturer-SubCategory data by Sales or Orders, I created a SORT BY parameter with values of Sales and Orders.
I then created
Sort Value
IF [Sort] = ‘Order’ THEN [# Orders] ELSE SUM([Sales]) END
On the Manufacturer-SubCat field, I then set the Sort by property to be
Manufacturer-SubCat was then also added to the Filters shelf, with the filter set to
And that covers the main calcs and complexities that went into my version/interpretation of this challenge. I hate it when I can’t get the numbers to match 😦 but ultimately now feel comfortable with what I did and I hope it might explain why some of you may have got differences too…
My version of the viz is here, and I’ve included additional detail on the tooltips too which should show the numbers that went into the % calcs for each mark.