Can you recommend profitable return customer product bundles?

This week Candra Mcrae provided a market basket analysis challenge which had a bit of all sorts going on as well as the core ‘what else was bought with product x’ puzzle – parameter actions, sheet swapping, custom toggle buttons.

Hopefully I’ll cover all the key elements :

  • Identifying the 2nd customer order
  • Identifying the orders containing the selected sub-category
  • Identifying the ‘other’ sub-categories ordered
  • Building the bar chart
  • Building the tree map
  • Creating the toggle button
  • Controlling what viz to display
  • Creating the dashboard and interactivity

Identifying the 2nd customer order

The requirements state the analysis is based on returning customers, and specifically their 2nd order. So we need a way to identify this set of data.

In the superstore data set, a customer only ever places one order on a day, so we will make use of this feature, but this technique won’t always work in other situations – I personally have been known to place multiple orders with Amazon on the same day, especially when Christmas shopping!

If we look at the dates customers have ordered, we’re looking to identify the orders associated to the customer & dates highlighted below

We can identify whether the date is the first order date using a FIXED LoD calculation

Is Customer’s First Order?

{FIXED [Customer ID]: MIN([Order Date])} = [Order Date]

The FIXED LoD statement within the {..} is finding the minimum Order Date for each Customer ID. This is then being checked against each Order Date, and returning true when it matches.

So this is giving us the first record, but we want the second….

I solved this, by first adding Is Customer’s First Order? = False as a data source filter (right click on the Superstore data source -> edit data source filters)

This essentially removes all the rows associated to the first order for each customer from the data set, and consequently, the Is Customer’s First Order? is now reported as True against what was originally the 2nd order…
If you compare the image above to the one further up the page where I marked the 2nd orders, you should see the values match what now seems to be listed as the first!

This works due to Tableau’s order of operations; the data source filter is filtering the data ‘early on’, so the rows of data we’re now working against, has no knowledge of those rows. Subsequently the customer’s first order is now actually reporting the customer’s 2nd order as the minimum (aka first) order date.

We can now add Is Customer’s First Order? to the Filter shelf and set to True, to just work with this set of orders.

NOTEIf the data source has not yet been extracted, which will be required if you want to publish to Tableau Public, then you will need to do one of the following :

  1. At the point of extraction, the data source filter will automatically be applied as an extract filter. If you keep it, then after extracting, remove the data source filter if its still there, otherwise you’ll find you’re reporting against the 3rd customer order OR
  2. At the point of extraction, remove the extract filter that is automatically applied. After extraction, re-add the data source filter if it’s been removed.

An alternative to using any data source/extract filters is to use the following LoD (which I only got my head round after building out the above, so I didn’t put it in my solution., but works just as well).

Is Customer’s 2nd Order?

{FIXED [Customer ID]:MIN(IIF([Is Customer’s First Order]=False,[Order Date],NULL))} = [Order Date]

The IIF statement is getting the Order Date for all the records which aren’t the Customer’s 1st order. The FIXED LOD, is then returning the minimum/smallest of these dates, which will be the date of the 2nd order. This is then compared to each Order Date to identify the appropriate record.

If you strip off any data source/extract filters, you can see how this works, and Is Customer’s 2nd Order? can be added to the Filter shelf instead, set to True, to get the required rows.

Identifying the orders containing the selected sub-category

To start with we need a parameter to store the selected Sub-Category. Right click on Sub-Category and select Create -> Parameter to create a string parameter which lists the sub-categories. I named mine pSelected_SubCat and it’s defaulted to Accessories.

What we’re now trying to find is the rows relating to Order IDs that contain the stated Sub-Category, in this instance Accessories

So we first need to identify the rows which match

Customer Ordered Selected SubCat

IIF([pSelected_SubCat]=[Sub-Category],1,0)

Add this to the view, and you can see the rows associated to Accessories are marked with 1

But we don’t just want these rows; we also need the rest of the rows on the same order, so in the case of order CA-2018-131534 above, we also need the row associated to the Paper Sub-Category.

For this, I used Sets as per the technique described in the link Candra provided within the hint.

We want the set of Order IDs where the sum of the Customer Ordered Selected SubCat field is >=1. Right click on Order ID and Create -> Set. On the Condition tab, set the relevant properties.

Adding the Orders with Selected SubCat set into the view, and we can see it’s picked up the rows we care about, and we can now filter by this set

An alternative to using Sets, is to use another FIXED LoD calculation

{FIXED [Order ID]: SUM([Customer Ordered Selected SubCat])}>=1

For each Order ID, sum up the Customer Ordered Selected SubCat field, and if it’s >=1 then it will return True, and you can then filter by this instead.

Identifying the ‘other’ sub-categories ordered

We want to count the orders which contain each Sub-Category, which is simply

Count Orders

COUNTD([Order ID])

With a basic view that is filtered as required, that lists Sub-Category by Count Orders (and sorted by Count Orders descending), we get

And you’ll see that Accessories is listed at the top. Change the parameter, and that value will also be listed. But we don’t want to see this – we need to filter it out. For this, we can create

Other Items Ordered

If [pSelected_SubCat]=[Sub-Category] THEN ‘N/A’ ELSE [Sub-Category] END

Pop this into the view, and you can see the ‘N/A’ is listed against the selected Sub-Category.

This means we can add a further Filter using the Other Items Ordered field and set it to Exclude N/A.

Building the bar chart

So now we’ve got the 3 filters we need to identify the rows of data we need to consider, we can easily build a bar chart.

To colour the bars, we need to work out the profitability

Profit Ratio

SUM([Profit])/SUM([Sales])

Is Profitable?

[Profit Ratio]>0

Add Is Profitable? onto the Colour shelf and adjust accordingly

You just then need to format to remove gridlines/axes etc, and add the relevant fields (correctly formatted) to the Tooltip to create the required text.

Building the treemap

The simplest way to do this, is to duplicate the sheet you’ve built for the bar chart, then use the Show Me option (top right) and select TreeMap. This will put most of the pills in the right places. The only thing that needs to be shifted is to add Is Profitable? back onto the Colour shelf, and to explicitly set the Sort order against the Sub-Category field. Count Orders also needs adding to the Label shelf, so it can be displayed.

Creating the toggle control

To create the toggle control which will control the switching of the vizzes do the following :

  • In the Columns shelf, type in MIN(0)
  • Then next to it, type in MIN(1)
  • Drag the MIN(1) pill from the Columns shelf and drop onto the MIN(0) axis, when you see the ‘2 green columns’ appear
  • Remove Measure Names from the Rows shelf
  • Change the mark type to Line
  • Create a parameter called Toggle which is an integer containing the values 0 & 1, defaulted to 0. Show this parameter on the view.
  • Create a new calculated field, Selected Toggle which just references the Toggle parameter
  • Add Selected Toggle to the Columns shelf and change to aggregate to MIN rather than SUM
  • Make it dual axes, and synchronise the axes
  • Remove Measure Names from the Colour shelf on the All marks card.
  • On the Selected Toggle card, change the mark type to Circle, and increase the Size. If you change the Toggle parameter the circle should shift to 1.
  • To colour the line based on the position, create a new calculated field

Colour

IF [Toggle]=0 THEN ‘grey’ ELSE ‘green’ END

  • Add Colour to the Colour shelf of the All marks card. With the Toggle parameter set to 0, set the colour to grey. Change the parameter to 1, and set the colour to green.
  • To set the tooltip create a new calculated field

Toggle Label

IF [Toggle]=0 THEN ‘Bar’ ELSE ‘Treemap’ END

  • Add Toggle Label to the Tooltip shelf of the Selected Toggle marks card. Adjust the tooltip text to just show this value.
  • Remove the text from the Tooltip of the Measure Names marks card.
  • Hide the axes headers and remove the row & column dividers

Controlling what viz to display

With the Toggle parameter set to 0, add the Selected Toggle field to the Filter shelf of the bar chart. and set it to be 0 (or at most 0 depending what filter control displays)

Then set the Toggle parameter to 1, and add the Selected Toggle field to the Filter shelf of the Treemap chart. Set this to be 1 (or at least 1 depending on the filter control displayed)

You should now find that if you switch to the bar chart sheet, nothing is displayed. Change the Toggle parameter back to 0, and the bar chart will now show, but there will be nothing on the treemap chart.

Creating the dashboard and interactivity

Create a dashboard sheet.

Add a vertical layout container

Add the bar chart into the vertical layout container. Set the chart to fit entire view and don’t display the title.

Add the treemap chart beneath the bar chart in the same vertical container. Whilst nothing is actually displaying, still set the chart to fit entire view and again don’t display the title.

The Toggle parameter should have automatically displayed on the right hand side, so test changing the parameter to see how the chart switches.

Add the Toggle Control sheet into the right hand container too.

Add a dashboard action to change the Toggle parameter when the Toggle Control is interacted with. It should take effect on Select of the Toggle chart, affect the Toggle parameter and pass the Measure Values field into the parameter.

Clicking on each end of the Toggle chart should now make the viz change from bar to treemap.

You just now need to tidy up the dashboard – add a title, remove any unrequired objects, set the container background of the right hand panel to grey (you will need to set the background on the toggle sheet to the same shade of grey too).

And hopefully that’s it. My published version is here.

Happy vizzin’! Stay Safe!

Donna

What percentage of sub-orders are profitable?

It was Luke’s turn to provide the challenge for this week – to produce a modified mekko chart (or marimekko chart / mosaic plot – see here for more information).

Luke suggested it would require LODs to solve, but that it would also be possible with table calcs. I tackled it with a mixture of both, mainly based on what felt right at the time.

Like most challenges I worked out the data I needed for the final viz in tabular form first, so I could ratify the numbers and calculations I built before building the viz.

  • Building out the data
  • Building the viz

Building out the data

The y-axis of the viz plots the % of profitable orders for each Sub-Category. So the first thing we need to do is identify a profitable order. A single order can have multiple product lines, where each product line might be associated to a different Sub-Category. We need to determine whether the profit against all the products for the same Sub-Category on a single order is positive (ie profitable) or not. We use an LoD for this

Order Is Profitable

IIF({FIXED [Order ID], [Sub-Category]: SUM([Profit])}>0,1,0)

For each Sub-Category within a single Order ID, check if the total profit is a positive number. If it is then return 1 else 0. I’m purposefully choosing 1 and 0 to help with the next step.

To determine the % of profitable orders, we need to know the total of all the profitable orders as a proportion of all the orders for a Sub-Category.

Count Orders

COUNTD([Order ID])

The number of distinct orders.

Profitable Orders

SUM([Order Is Profitable])/[Count Orders]

formatted to be a percentage with 0 dp.

Put these into a table, along with the Sales measure, and you can see what’s going on

We need to sort this data, first by Profitable Orders desc, then by Sales desc. And we need to sort in a way that can be used once this table of data is displayed in the required Viz format. If we just wanted to sort this table of data displayed here, we can use a technique described in Tableau’s KB here. However this doesn’t work for the viz, as it relies on the newly created dimension existing on the Rows shelf, which won’t work when we get to building the viz, as we can’t put it on rows. Since the field also contains table calculations (Rank), you can’t reference the field in the Sort option of a pill.

Anyway, based on this, and after a bit of trial and error, I managed to create a sort field that I could reference.

Sort

STR(ROUND([Profitable Orders],2)) + “-” + STR(ROUND(SUM([Sales])/1000000,2))

Here I’m building up a string field combining the Profitable Orders field, that I’ve rounded to 2 decimal places, with the Sales field that I’ve rounded to $millions at 2 decimal places. This is to ensure that since we’re working with string data, 800 is ordered after 8000 when sorting descending.

Pop this into the view, and set the sort property of the Sub-Category pill to sort by Sort descending

There may be a better way of doing this, and I’m not sure it would work in all circumstances, but it worked for this challenge. I’ll be interested in seeing how others approach this element of the challenge.

Now, that’s resolved, let’s get the other fields we need.

Along with the Profitable Orders %, we also need to display the percentage of non-profitable orders

Non-Profitable Orders

1 – [Profitable Orders]

I also chose to format this to % with 0dp (purely for display purposes in the table).

The width of the bars in the viz, is based on the % of total sales, so let’s work that out…

% of Total Sales

SUM([Sales])/TOTAL(SUM([Sales]))

TOTAL is a table calculation that ‘totals’ up the sales in the table. In this case we care about the whole table, but as with any table calculation it can be set to apply to certain partitions in the view.

Having worked out the % of Total Sales, we now need to work out where to plot each Sub-Category along the x axis, as due to the variable width of the bars, we can’t just use the Sub-Category dimension on the Columns shelf.

Each Sub-Category is going to be positioned based on its % of Total Sales relative to it’s position in the sort order. That is we need to work out the cumulative value of % of Total Sales.

% of Total Sales Cumulative

RUNNING_SUM([% of Total Sales])

This totals up the % of Total Sales as we go down the rows in the table, as shown below. I’ve included column grand totals, so you see that % of Total Sales adds up to 100%, and the cumulative version sums up the previous values as it goes down the table, ending in 100% (note the numbers have been rounded due to 0dp, but if you changed the formatting, you’d see this better).

Now we have all the core elements we need to build the viz.

Building the Viz

On a new sheet,

  • add Profitable Orders to Rows
  • add Sub-Category to Detail
  • add % Total Sales Cumulative to Columns

Edit the table calculation setting of the % Total Sales Cumulative field, so that it is computing by Sub-Category (Compute Using -> Sub-Category option from the drop down /context menu of the pill).

Set the sort option against the Sub-Category pill to sort by the Sort field descending

Change the mark type to Bar.

Add % of Total Sales to the Size shelf. Then change the Size option (by clicking on the button) from Manual to Fixed, and set the alignment to Right

Add Non-Profitable Orders to Rows, then right-click on the relevant axis, Edit Axis, and set to reversed

This is the basic viz – it just now needs formatting

  • Add Measure Names to Colour and adjust accordingly
  • Set a white border around each bar (via the Colour shelf)
  • Edit the % Total Sales Cumulative axis, and change to start from -0.05 which will give a bit of space at the front
  • Remove all row / column borders and gridlines
  • To get the black bar visible across the 0 line, I ended up adding a reference line; a constant set to 0, formatted to a black line.
  • Hide the axes
  • Add appropriate fields to the Tooltip and set accordingly.
  • Add Segment to the Filter shelf.

Then just add to a dashboard, and you’re all set. My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you do Comparative Quantity Analysis?

For this week’s challenge, Sean wanted to test us on a particular aspect of Tableau – the use of INCLUDE & EXCLUDE LOD expressions, although he then finished off the challenge intro with the statement “You can complete this challenge however you like but just DON’T USE A FIXED LOD”.

Like many, I am pretty comfortable using a straightforward FIXED LoD, but rarely get INCLUDE & EXCLUDE ‘out of my toolbox’. I usually end up using them because I’ve followed an example online to achieve a specific task. With this challenge, I had every intention of trying to make use of these ‘lesser used’ LoDs when I started out.

However I did struggle. I got to a point where I knew what I’d do with table calculations, but however I tried with various LoD statements, I couldn’t get the behaviour I desired. And at this point, I had a working solution that hadn’t used FIXED LoDs either, so based on Sean’s closing statement, I had achieved a result. When doing these challenges, I do have to consider how much of my time to devote to them (although COVID restrictions does mean my home/family/social life if not as busy at the moment), so I chose to publish so I could feel content I had completed the challenge. However, there was a niggle, that I hadn’t managed to apply the functions the challenge was really testing for.

A twitter post the next day from a fellow #WOW participant Nik Eveleigh, calling out how helpful Tim Ngwena‘s Include LoD YouTube video was, piqued my interest, and that evening I set about watching the video, and subsequently managing to create another solution using the required INCLUDE & EXCLUDE LoDs, without any table calculations either.

So for this blog, you’re gonna get two solutions 🙂 I’ll explain the various fields required for each solution, and build out a tabular view to demonstrate. Building the view is then pretty much the same.

  • Getting Started
  • The Table Calculation Solution
  • The LoD Solution
  • Building the Viz
  • Adding the dashboard interactivity

Getting Started

First up, just a note about the data set. You need to use the Superstore dataset, but need a version that includes the Manufacturer field. This doesn’t seem to exist by default in the excel file included as part of Tableau’s install. So I used the data provided by Sean on data.world.

Secondly, once connected to the data source, we’re going to need a parameter that is ultimately going to be used to drive the dashboard interactivity later on. The parameter needs to store the value of the Sub-Category the user has selected on the chart. The parameter is required regardless of which solution you choose to follow.

pSelectedSubCat

This is a string parameter that contains the text Fasteners by default. Create this and then show it on the view.

The Table Calculation Solution

The first measure being displayed on the viz is the average Quantity by Manufacturer for each Sub-Category; that is for the total quantity of goods sold against each sub-category, how much on average did each manufacturer order?

Count Manufacturer

COUNTD([Manufacturer])

Identifies the distinct number of manufacturers

Avg Qty

SUM([Quantity])/[Count Manufacturer]

Lets put these fields out into a table, so we can see the values. I’ve sorted the rows by Avg Qty descending, and you can see the values match the required output.

Now we need to identify the Avg Qty associated to the selected Sub-Category.

Selected Avg

IF [pSelectedSubCat]=MIN([Sub-Category]) THEN [Avg Qty] ELSE 0 END

This will return the value we want for the row that matches the value of the parameter, and 0 otherwise. The Sub-Category dimension is wrapped in MIN as Avg Qty is an aggregated value, so the dimensions also need to be aggregated. MAX or ATTR will work as well.

Pop this field onto the table.

For the next step, we need to find a way to ‘replicate’ this 152 value across all the rows displayed. When building, I knew exactly what I would do next with a table calc, but I thought this is where the LoD needed to be used, and despite my best efforts I just couldn’t figure it, but more on that later. For now, we’re going to use a table calculation to replicate the value across the whole ‘window’.

Avg of Selected Sub Cat

WINDOW_MAX([Selected Avg])

This is basically looking for the maximum value of the Selected Avg field, which is 152 (as all the other values are 0).

Add this to the view, and set the table calculation to compute using Sub-Category. This is just good practice when using table calcs, as it means wherever the pill gets moved to in the view, it will retain the same computation.

Variance

[Avg Qty]-[Avg of Selected Sub Cat]

Add this onto the table, and again set the table calculation to compute by Sub-Category.

If you change the value in the pSelectedSubCat parameter, you’ll see the last 3 columns change.

So now we’ve just got a few more fields to create that we’ll need on the viz.

We need to be able to identify the selected Sub-Category which will be used to colour the bars.

Is Selected Sub Cat?

[Sub-Category]=[pSelectedSubCat]

The tooltip also shows the text ‘more’ or ‘less’ when describing the display, so we need a field for this

TOOLTIP: more or less

IF [Variance] <= 0 THEN ‘less’ ELSE ‘more’ END

And finally, the tooltip displays the variance as an absolute number, so we need a field for this too

TOOLTIP:Variance

ABS([Variance])

So this is all the core fields you’ll need to build the solution, so if you’re not interested in the LoD route, skip over the next bit 🙂

The LoD Solution

After watching Tableau Tim’s video, I realised why I had struggled to get an LoD solution to work. I needed to revisit my approach to calculating the average quantity, a step I hadn’t originally considered.

Now in the interest of time, I’m not going to write out why the calculation below works in detail when Tim demonstrates it so eloquently in his video 🙂

Qty Inc Manufacturer

{INCLUDE [Manufacturer] : SUM([Quantity])}

This field is going to help us get our ‘average quantity’ value.

Add Sub-Category, Quantity and Qty Inc Manufacturer into a tabular view, and change the aggregation of the Qty Inc Manufacturer pill from SUM to AVG. Order by this field descending, and you can see we have the same values as the Avg Qty field I created in the above solution.

This time, I’m going to start with the field to identify the selected Sub Category.

Is Selected Sub Cat

[pSelectedSubCat]=[Sub-Category]

Then we need to identify the value associated to the selected sub category, and will do this with two fields

Selected Qty

IF [Is Selected SubCat] THEN [Quantity] END

Selected Qty Inc Manufacturer

{INCLUDE [Manufacturer] :SUM([Selected Qty])}

Add these onto the view, and you can see the behaviour. Note this time I chose not to bother with an ‘Else 0’ statement (it wasn’t necessary for the above either).

Now we want to be able to ‘spread’ this value across all rows, and the Comparative Sales Analysis example in this Tableau blog post helps here (and I’ve just seen that Tableau Tim has produced an EXCLUDE LoD video too, which I’m sure will explain things very well).

Overall Selected Qty Inc Manufacturer

{EXCLUDE [Sub-Category], [Is Selected SubCat]: AVG([Selected Qty Inc Manufacturer])}

Note – I found I needed to add the Is Selected SubCat dimension into the statement, to get the tooltips working on both the bar charts.

And once again, we can now compute the variance, and the variance required for the tooltip, and the more or less values:

Variance

AVG([Qty Inc Manufacturer]) – MIN([Overall Selected Qty Inc Manufacturer])

TOOLTIP:Variance

ABS([Variance])

TOOLTIP : more or less

IF [Variance] <= 0 THEN ‘less’ ELSE ‘more’ END

Building the Viz

To build the viz

  • Add Sub-Category to Rows
  • Add Avg Qty (or Qty Inc Manufacturer set to AVG) to Columns. Sort by the field descending.
  • Add Variance to Columns
  • On the Avg Qty marks card, add Is Selected Sub Cat to the Colour shelf and adjust accordingly
  • On the Variance marks card, add Variance to the Colour shelf, and notch down the transparency to about 75%
  • Add TOOLTIP:Variance and TOOLTIP:more or less to the Tooltip shelf of the All marks card
Then it’s just a case of formatting the display and the tooltips.

Remember – if you’re working with the table calc solution, make sure all the table calculation fields in the display (those with the triangle symbol) have all been set to compute by Sub-Category.

Adding the dashboard interactivity

When you click on a Sub-Category, the variance needs to change. This is managed via a Parameter Dashboard Action. Once the viz has been added to a dashboard then add a dashboard action that on select of the viz, sets the pSelectedSubCat parameter, with the value of the field from the Sub-Category dimension.

There’s also an additional feature in this dashboard that stops the selected field from being highlighted on click. To stop this from happening, create 2 new fields

True

True

False

False

Add both of these to the Detail shelf on the All marks card of the chart viz.

Then on the dashboard, add a dashboard filter action, which on select targets the chart sheet, and maps True to False. As this is never a match ie ‘true’, the filter doesn’t apply.

Hopefully I’ve covered everything. My published vizzes are here

Happy vizzin’! Stay Safe!

Donna

Can you recreate the work of Hans Rosling?

This week, Lorna set the challenge to recreate an iconic data viz by the legend, Hans Rosling. You ABSOLUTELY MUST watch the TED talk link from the challenge if you haven’t ever seen this before. Hans is utterly engaging in making this data ‘come to life’!

The challenge involves the following key components

  • modelling the data
  • building the scatter chart
  • displaying the year in the chart
  • highlighting the regions

Modelling the Data

The data provided consisted of 4 files

  1. Life Expectancy
  2. Population
  3. Income
  4. Region Mapping

The first 3 files are stored as a matrix of Country (rows) by Year (columns) with the appropriate measure in the intersection. This isn’t the ideal format for Tableau so the data needs to pivoted to show 3 columns – Country, Year and the relevant measure (life expectancy, population or income depending on which file is referenced).

In Tableau Desktop, connect to the Life Expectancy file. If you find the column names all seem to nave generic labels, then right click on the ‘object’ in the data source pane, and select Field names are in the first row.

This will change so all the years are now listed as column headings

Now we need to pivot this data; click on the 1800 column, then scroll across to the end, press shift and click on the final column to select all the year columns. While highlighted, right-click and select Pivot.

Your data will be reshaped into 3 columns

I then renamed each column as

  • Country
  • Year
  • Life Expectancy

I also chose to change the datatype of the Year column to be a number (whole), as I knew I’d be needing to relate the data on this field later, and working with numeric data is more efficient than strings.

Now I want to bring in the next file – Population (via the Add connection option). But before following these steps, just read on a bit, as I hit a hurdle, which at the point of writing I’m still a bit unclear about…..

I added the Population file into the data source pane. By default it showed the Edit Relationship dialog, but I had the same issues with the column headings.

So I closed the dialog, and fixed the headings by setting the Field names are the first row option. Trying to apply the pivot at this point wasn’t available – no Pivot option when I selected all the columns. So I then manually edited the relationship and set County = country (population)

This populated the data, but again when I tried to apply the pivot option, it wouldn’t display 😦

I’m not sure whether this is possible (Google hasn’t shown me that it isn’t, but it hasn’t shown me that it is either…). I’ve tried using csv and excel versions of the file…..

So until I see the official solution (or I find out otherwise), I carried out the following approach instead :

I removed Population object, so I was just left with the Life Expectancy object. I then went to sheet 1, right clicked on the data source and selected Extract Data -> Extract and when prompted saved the file as Life Expectancy.hyper.

I then started a new instance of Desktop, and connected to the Population file, and went through the steps outlined above – set the Field names are first row, selected all the years to pivot, renamed the fields, set the datatype of the Year field to a whole number, then extracted and saved as Population.hyper.

I then repeated the exact same process with the Income file.

Once I had the 3 hyper files created, I then instantiated another instance of Desktop, and connected to the Life Expectancy.hyper file.

This automatically added an object called Extract into the pane. I renamed this (via right click) to DS-Life Expectancy. I also found my field names hadn’t been retained, so I renamed these too.

I then added the Population.hyper file. I set the relationships as below (once again I’d lost field names 😦 )

I then renamed the object & columns – note it won’t let you add fields with the same name as in the other object in the data pane, so I suffixed the columns

The process was once again repeated to add the Income data

and finally the Region data was also added matching Country to Name

And now, after all that, I have the data in the format I need to build the scatter. Phew!

[Side Note] – whilst writing this blog, I have been messaging my fellow #WOW participant Rosario Gauna. We often check in with each other, if we’ve noticed something odd with the challenges, to sense check whether its an issue with the data/the requirement, or just our own personal interpretation/development. In this instance we confirmed we were both using v2020.4.2, BUT Rosario had been able to do exactly what I set out to do at the start of this blog!! She had a pivot option on her 2nd connection. We’ve tried to figure it out, but have been unable to… I am using Windows, but Rosario is on a Mac, so there might be something there….. It will be interesting to understand if anyone else encountered my problem…. I really didn’t think the modelling part of the challenge was intended to be as fiddly as I’ve described above.

Building the Scatter Chart

The data we have goes beyond 2021, so we need to filter. Add Year to the Filter shelf and set the condition as below

We also have some countries that aren’t mapped to a region (Eight Regions field), so all add this field to the filter shelf and set to exclude Nulls.

Set both these filters to Apply to All Using this Datasource as they’re relevant across all sheets that will be built.

Now build the scatter by

  • Add Year to the Pages shelf
  • Add Income to Columns and change to Dimension (rather than Sum) – still leave to be Continuous
  • Add Life Expectancy to Rows and change to Dimension (rather than Sum) – still leave to be Continuous
  • Add Country to Detail

If you change the Year on the pages control to 2021 you should get something like

It’s roughly the right shape, but not spread as expected.

If you examine the Income axis on the solution, you’ll see the scale isn’t uniform. This is because it’s using a logarithmic scale instead, which you can set by right-clicking on the Income axis -> Edit Axis and selecting the relevant checkbox.

Now we can add Population to Size, change the mark type to circle, and we also want to colour by the region, but we need a new field for this to get it in the format we need for tooltips etc:

Region

UPPER(REPLACE([Eight Regions],’_’,’ ‘))

Add this to Colour and adjust accordingly, reducing the transparency a bit. Format the tooltips and you should have the main chart

Displaying the Year in the Chart

On a new sheet build a very simple view that just shows the Year in the Text shelf, that has been formatted to a very large (72pt) point, centred in the middle, and a pale grey colour.

On the dashboard, add this Year sheet. Then when you add the Scatter sheet, add it as a Floating object and position directly on top of the Year sheet. The Year sheet won’t be visible, so navigate to your Scatter sheet, right click on the canvas and format. Set the background of the worksheet to None ie transparent

If you return to the dashboard, the Year sheet should now be visible.

Note, as you add more objects to your dashboard, you will have to move things around a bit to get the layout as desired.

Highlighting the Regions

Create a simple table that displays the Regions in a row as below (the Show Header on the Region pill on the Columns shelf is unchecked).

Add this to the dashboard, then add a Highlight Dashboard Action that sources from the Region view and affects/targets the Scatter chart.

The final comment to make is that you can’t remove the ‘speed control’ from the page control display. However, when published to Tableau Public, it doesn’t show, so I assume is not enabled (or broken) there.

Hopefully this now gives you what’s needed to rebuild this fabulous viz. My published version is here.

Please do let me know if you had issues with your data modelling like I did!

Happy vizzin’! Stay Safe!

Donna

Can you build a “must include” filter?

Ann Jackson set this week’s challenge inspired by a business requirement she’d encountered, which was to be able to filter the data based on a user selection, and then additionally apply another filter to narrow down the required data.

The task ‘splash’ page had already indicated the challenge would involve sets, and the requirements additionally stated you’d need version 2020.2 or higher, which added an additional hint that set controls would be involved. Examining Ann’s solution, I could also see that the tell-tale set icon displayed when hovering near the input control, which confirmed the ‘filters’ were indeed using the set control functionality.

  • Building the bar chart
  • Viz in Tooltip
  • BANs

Building the bar chart

Let’s start by focusing on the bar chart. The first thing to do is to rename several fields. Ann likes capital letters, so the following fields just need to be renamed so they’re all in uppercase: Sales, Quantity, Product, Order ID and Customer (Name). Sales can also be formatted to $ with 0dp.

We create the first set we need by simply right-clicking on the PRODUCT field -> Create -> Set. I named this 1st Products and set to the Use All condition.

Lets build out a basic table and then we can play with how we need to use this set. Put

  • ORDER ID on Rows
  • CUSTOMER on Rows
  • Add Sales to Text
  • Drag Quantity over the Sales column, so Measure Names gets added to Columns and Measure Values is on Text
  • Sort by Sales descending

If we now add the 1st Products set to the Filter shelf and choose the Show Set option, the list of products in the set will be displayed for selection.

Choose the 1st item in the list, and the data will be filtered to just those customers who have ordered that product

But, while this list matches the names if you do the same on the solution, the Sales and Quantity values differ. This is because, we have actually filtered the data just to the lines containing the selected product. If you bring PRODUCT onto Rows, you’ll see you have 1 line per customer.

Now, while you can use LOD calculations to compute the total sales/quantity at an order id level, to resolve this, it won’t solve the next step of the puzzle, to filter the data further by other products on the same orders, as we’ve already filtered out all the other data. So we need to do something else.

To demonstrate, we just want to keep a handle on these specific ORDER IDs so ctrl-click to select them all, and then Include to add them to the Filter shelf. Now remove the 1st Products set from the Filter shelf, then re-add back to the Rows.

You’ll see we now have all the product lines on each order, with an In or Out displayed against each row. The rows which match the PRODUCT selected in the set, is marked as In. This is essentially what the filter is doing if 1st Products is on the Filter shelf – it is filtering the rows to those which return as In (the set).

We basically want a way to be able to identify all the rows on an order which have at least 1 In row, so we can filter on that instead. For this we can create a new field,

Order has 1st Products

{FIXED [ORDER ID]: MAX([1st Products])}

What this is doing is saying for each ORDER ID, get the maximum 1st Products value, which is then ‘stored’ against each row for the same ORDER ID. This may look odd, as 1st Products from the display is showing an ‘In’ or an ‘Out’, so how does the MAX work? Well, whilst the display is In or Out, under the bonnet this is actually a boolean field of 1 = True = In or 0 = False = Out, so the MAX is actually returning either a 1 (true) or a 0 (false).

If you add Order has 1st Products to Rows, you can now see every row in this restricted data set is listed as True

Remove the ORDER ID from the Filter shelf, and you should observe all the other orders listed have False listed against every row.

So we can now add Order Has 1st Product = True to the Filter shelf, and we can remove 1st Products from Rows too.

So we’ve mastered the 1st product filter, time for the 2nd. Once again right-click on PRODUCT and create another set with the Use All condition. This time I called the set 2nd Product.

Just as before, we want to identify all orders that contain this 2nd product selected, so we’ll create another calculated field

Order has 2nd Product

{FIXED [ORDER ID]: MAX([2nd Product])}

Add this field to the Filter shelf and select True. Now click on the 2nd Product set in the Dimesions pane, and selected Show Set, to display the list of products

You’ll notice though that all products are listed, and not just the ones that are associated to the set of orders already filtered. To fix this, we need to add the Order Has 1st Products field to context (click on the Order Has 1st Products pill on the Filter shelf and Add to Context). This will change the pill to a grey colour, and you can now restrict the values in the 2nd Products set to only show All Values in Context.

The add to context function allows us to dictate the order in which the filtering is applied, so the data is primarily filtered based on our 1st Products set and then filtered based on the 2nd Product.

You should now be able to play around with the filtering and see how the rows for complete orders are retained.

Removing PRODUCT from the Rows, and we get back to the total order value / quantity values we’re expecting.

From this, you should now be able to build out the viz. Firstly, duplicate the sheet so we retain the original table, then move SALES & QUANTITY to the Columns, add Measure Names to Colour, add Labels to match mark colour, and format accordingly.

Viz in Tooltip

As an ‘added extra’ Ann added the ability to hover on a bar and see the details of the order in a simple table. We can use the original table we built above to demonstrate the functionality. Add PRODUCT to Row, and replace the QUANTITY field, with a new field you need to create called QTY, which is just based on QUANTITY ie

QTY

[QUANTITY]

Then hide the ORDER ID & CUSTOMER fields (uncheck Show Header) on the pill. You could remove completely, but I like to retain just in case we need to debug if things don’t work.

We will need to show some totals, but this isn’t the Grand Total. Grand Totals will work, but the background of the row will always be a fixed colour. Either white, by default, or set to grey if you choose. However I noticed that depending on the order id I hovered on, the Grand Total row may display grey or white depending on the existing banding. For this we need to use subtotals instead (Analysis menu -> Totals -> Add All Subtotals).

Because we have multiple (hidden) dimensions on the rows, multiple Total rows will display. We can remove by unchecking the Subtotals option against the ORDER ID pill.

And finally, we need to relabel the row. Right-click on the Total row and select Format. Change the label of the Totals section from ‘Total’ to ‘Grand Total’.

With additional formatting to change the font size etc, this sheet can now be added as a ‘Viz in Tooltip’ to the bar chart. On the Tooltip shelf associated to the bar chart, you can Insert -> Sheets -> <Select Sheet>

Hovering over a bar should now automatically filter the tabular display to the appropriate CUSTOMER & ORDER ID.

BANs

For the BANs, we need several additional calculated fields

# ORDERS

COUNTD([ORDER ID])

simply counts the number of distinct Order IDs in the filtered view.

Total Orders

{FIXED:COUNTD([ORDER ID])}

an LOD calculation that counts the overall number of distinct Order IDs, regardless of any filters applied.

% OF TOTAL ORDERS

[# ORDERS]/SUM([Total Orders])

formatted as a % to 1 dp.

AVG ORDER AMOUNT

SUM([SALES]) / [# ORDERS]

formatted to $ 0 dp.

AVG ORDER QUANTITY

SUM([QUANTITY])/[# ORDERS]

formatted to a number with 0 dp.

Add Measure Names to the Filter shelf, and filter to the relevant 4 measures. Add Measure Values and Measure Names to the Text shelf, and Measure Names on Columns. Add the Order Has 1st Products = true and Order has 2nd Product = true to the Filter shelf, but this time don’t add anything to context. If you do, it will affect the Total Orders measure, and the % of orders calc will be wrong. Hide the Measure Names on the columns (uncheck show header), and format the text appropriately.

To get the thick lines, format and set the Row Divider on the sheet to be a thick coloured line.

And you should now have all the core building blocks needed to add onto a dashboard.

My published viz is here.

Happy vizzin’! Stay safe!

Donna

Can you hide a chart in map layers?

Candra set the challenge this week to use the new map layers to build a map display which, on click of a country, filtered the display to that country and additionally displayed a donut chart indicating the percentage of urban dwellers in that country.

If map layers are very new to you, then the webinar by Adam McCann referenced in the challenge, has some VERY useful pointers for this challenge (the workbook for that can be downloaded from here, as I found some things needed closer inspection).

I think this challenge is going to be best described by walking through the steps.

Building the 1st map layer

Double-Click on Country/Region to load a map, and change the mark type to (filled) Map. Add Region to the Colour shelf and assign the appropriate colours. Adjust the map background via the Map -> Map Layers menu and set the Style to dark, and remove all the selections against the Map Layers list

The intention is when a country is selected the map will ‘drill into’ /filter that country, and display additional information. We will drive this by a parameter, which will get set via a parameter action, but for now we’ll manually set the value.

Create a new string parameter, that is default to nothing/empty string/ ”, and then show this on your sheet.

pSelectedCountry

We need the 1st layer of the map to display, when there is no value in the parameter. We need a calculated field to help drive this.

All Countries

IF [pSelectedCountry]=” THEN [Country/Region] END

By default this will create a field of type ‘string’ but we need it to be a geographic data type. so change this as below.

Add this field to the Detail shelf of the map, and remove the Country/Region field that was automatically added when we first built the map.

Enter the name of a country, eg China, into the parameter. The map should essentially go blank (black screen).

Building the 2nd map layer

We need the country to display, if it’s entered into the parameter. For this, we need another calculated field

Selected Country

IF [pSelectedCountry]=[Country/Region] THEN [Country/Region] END

Once again change this to a geographic role data type of type County/Region.

Next click and drag this field onto the map, and drop it onto the Add Marks Layer option that displays. This will create a new Selected Country marks card, although nothing will obviously change on the map display itself.

Move the Selected Country pill to be on the Detail shelf instead, and add Region onto the Colour shelf. Change the mark type from circle to Map.

Now if you enter a county into the parameter, eg China, the display should ‘filter’ & ‘zoom in’ on China.

So what we have is the 1st layer only showing when no countries have been selected, and vice versa, the 2nd layer only showing when a country has been selected.

We need to now add further layers for the donut chart, which only want to show the country has been selected as well.

Building the 3rd map layer

A donut chart, in the past, is traditionally created by building a pie chart, then using a dual axis to add a circle, sized smaller that the pie chart, on top (see this Tableau KB for info). Rather than a dual axis, we’re going to use map layers – 1 layer for the pie chart, and then another layer for the central circle.

Keeping a country selected (so we can see what we’re building), drag Selected Country onto the map again to create another map layer. Change the mark type to Pie and increase the Size to as large as possible. Move Selected Country to Detail.

In the data set we have a field called Population Urban which stores the ‘percentage’ value of urban dwellers eg 0.17 is 17%. To create the angles for the pie chart, we need to know

Population Non-Urban

1-[Population Urban]

Drag Measure Values onto the Angle shelf. This will automatically add Measure Names to the Filter shelf. Edit the filter to just select the Population Urban and Population Non-Urban measures. Drag the Measure Names field that was also automatically added to the Detail shelf, to the Colour shelf. Adjust colours accordingly, and set the border of the pie chart to white (under the Colour shelf options).

Verify that if you set the parameter to empty again, the whole world map displays, and you can’t see any pie charts.

Building the 4th map layer

Now we need to make the donut hole. Once again, ensure a country is selected, so your pie chart is visible, then drag Selected Country onto the map again, and drop to add another map layer.

This time, move the Selected Country field onto the Detail shelf, add Region to the Colour shelf, and adjust the size of the circle, so its smaller than the pie. Set the border of the circle to be white again too,

Add Population Urban onto the Label shelf, and format to a percentage with 0 dp. The best way to do this, is to format the Population Urban measure in the data pane (right click->default properties -> number format).

At this point you’ll notice the number is huge… we need to add Year to the Filter shelf, and select 2012.

Align the label to be middle centre, and adjust the font to be much bigger text. Add ‘urban dwellers’ underneath.

Once again, verify you get the expected behaviour as you change the values in the parameter from nothing to Russia or China etc.

The final step on this sheet is to add text to the Tooltips. Unlike when working with dual axis, you don’t have an All marks card, so you’ll need to add the required fields (Country/Region, Region, Population Total (formatted to Millions with 0dp), Population Urban to the Tooltip shelf on each of the relevant layers.

Setting the parameter interactively

Create a dashboard sheet, and add the map sheet you’ve built. Then create a dashboard action which sets the pSelectedCountry parameter, impacting the All Countries field, and that when the selection is cleared, the value is reset to ”.

My published viz is available here. Enjoy!

Happy vizzin’! Stay Safe!

Donna

Can you use the Brush Filter Extension?

For this week’s challenge, Sean Miller decided to continue with the theme of dashboard extensions, introducing the brush filter extension. The whole of the top section is the extension, so there’s actually only a single chart that needs to be built for this challenge. The main areas of focus for this blog are

  • The basic chart
  • Colouring the chart
  • Identifying the date range
  • Adding the extension

The basic chart

The chart needs to plot the number of rat sightings vs the Created Date by Borough. The number of sightings is based on the automatically generated field that is related to the name of your data source. To simplify, I created a new field based on this:

# of Sightings

COUNT([2021_02_24_WW08_Rat_Sightings.csv])

and then created the chart as follows

  • Borough on Columns
  • Month(Created Date) on Columns as green continuous date pill, formatted to mmm yy
  • # of Sightings on Rows
  • Borough on Filter, excluding the value Unspecified

The Boroughs were then manually sorted into the required order.

Colouring the chart

Each Borough needs to be coloured based on the difference between the start & end points. So we need to identify what these are for each Borough. I used table calculations for this

Start Count

WINDOW_MAX(IF FIRST()=0 THEN [# of Sightings] END)

If we’re at the first point in the partition, then record the number of sightings. All the other values in that partition will be NULL. The WINDOW_MAX statement then takes that value and ‘spreads’ it across all the data points/rows in that partition, as shown below. In this instance the partition is the Borough.

When Start Count is added to the view, the table calculation is set to compute by the Created Date.

We create a similar field for the end point, this time using the LAST() table calculation

End Count

WINDOW_MAX(IF LAST()=0 THEN [# of Sightings] END)

With these two calculated fields, we can now create

Change

IF [End Count]-[Start Count]>0 THEN ‘INCREASE’
ELSEIF [End Count]-[Start Count]< 0 THEN ‘DECREASE’
ELSE ‘NO CHANGE’
END

To be honest, I outputted the words, as I thought I’d be using these in a legend somewhere, but I didn’t. Outputting as values 1, 0 , -1 or similar will have been more efficient.

This field can now be added to the Colour shelf with the table calculation setting applied to compute by Created Date

Identifying the Date Range

As part of the chart title, we need to identify the date range, ie the earliest month and the latest month within the whole window (whole chart). Again we can use table calculations for this

Start Month

WINDOW_MAX(IF FIRST()=0 THEN MIN([Created Date]) END)

If we’re at the first point in the partition (this time the partition is the whole table), then get the Created Date (this is wrapped within a MIN, as the field needs to be an aggregate; using MAX would have worked just as well). The WINDOW_MAX statement once again ‘spreads’ this value across all the rows. This field is formatted to mmmm yyyy

Similarly we have

End Month

WINDOW_MAX(IF LAST()=0 THEN MIN([Created Date]) END)

also formatted to mmmm yyyy.

Both these fields can then be added to the Detail shelf, but this time the table calculation is set as table across. Once on the Detail shelf, the fields can be referenced in the chart title.

Adding the Brush Filter Extension

This challenge makes use of the Brush Filter extension, which you need to download from this link and save the .trex file somewhere appropriate on your machine.

Create a dashboard and add the chart you’ve already built, then add the Extension object to the dashboard. When prompted select My Extensions and browse to the Brush Filter .trex file you’ve saved.

You’ll then be prompted to configure the extension as below, selecting the name of the sheet containing your chart, the Created Date field that will be filtered, and the relevant measure to display. You can then adjust the colours of the ‘filter’ control, and scroll down to save.

As well as a new sightings by month profile chart being displayed, this will have the effect of adding the Created Date to the Filter shelf on your worksheet. The ‘grey’ box over the chart, can then be adjusted by dragging in the ends, or moving the whole box, to apply the required filter.

And apart from formatting adjustments to the chart, that’s all there is to it for today. My published viz is here.

Happy Vizzin’! Stay Safe!

Donna

Can you use Dashboard Extensions?

For this week’s #WOW2021 challenge, Lorna tasked us with using dashboard extensions, some of which can also be used on Tableau Public (see here). I haven’t had an opportunity to use extensions before, so this was going to be a brand new learning experience for me.

Modelling the Data

In a break from the ‘norm’ Lorna provided us with a new set of data to use, which had to be retrieved from the sources Lorna provided in the challenge

  • Emoji Sentiment – a csv file containing a summary of how often a particular emoji was used within tweets in 2015 and whether the tweets were classified as being positive, neutral, or negative in sentiment.
  • Emoji Database – a link to a site where after registering, you can download a csv file which defines and classifies each emoji

The first part of the challenge requires these data sources to be modelled using relationships in Tableau Desktop. Lorna hinted that a join calculation would be required on the codepoint fields.

I found this a bit odd, as both data sources contained an Emoji field which was automatically set to be the relationship field, and seemed to work.

For the purpose of this blog though, I’m going to attempt to rebuild my solution as I type, and use the suggested fields. The Unicode codepoint field in the Emoji Sentiment data needs to map to the codepoint field in the Emoji Database, but they’re not an exact match. We can prefix ‘0x’ to the codepoint field and ensure the case of the letters match. We can use a relationship calculation for this.

Grouping the Groups

This chart shows the top 20 emojis based on Occurrences and should filter to the appropriate group when a ‘category’ is clicked on the dashboard extension image on the left hand side.

The Emjoi Database data has a Group field, but this doesn’t exactly match the groupings on the image – some groups are ‘grouped’ together. I used Tableau’s in built grouping functionality to group the entries as required (right click field -> create -> group).

Category

I grouped People & Body and Smileys & Emotion together and named them Smileys and People

Filtering by Top 20

The list of Emojis needs to show the Top 20 based on Occurrences, but also filtered by Category. Build out a basic table of

  • Emoji Sentiment : Emoji1, Unicode name on Rows
  • Occurrences on Text
  • Emoji1 on Filter, set to filter by Top 20 of Occurrences
  • Category as a Context Filter – select all values to filter by, then right click -> add to context. This means the data will be filtered based on the Category first before the Top 20 filter is applied. The pill will change to grey to show it is a context filter. Show the filter and you can test it’s working

Building out the Top 20 Chart

Now we have the basics of the filtering functionality, we can build out the rest of the chart.

We need to display percentages, so need fields

% Positive

SUM([Positive])/SUM([Occurrences])

formatted to 1 decimal place. Create similar fields for % Negative and % Neutral

First start by creating a duplicate of the Occurrences field and call it #. Then add this as a discrete pill to the Rows shelf. Remove Occurrences from the Text shelf. Then add Measure Values to Columns and filter by Measure Names so only % Positive, % Neutral and % Negative measures are displayed. Add Measure Names to Colour.

All this has been done on one axis, so now we can add another to display the Position – add this field to Columns. This will create a 2nd marks card. Remove the Measure Names fields from the card, and change the mark type to circle.

Add an additional field MIN(1) to Columns by ‘typing in’ (double click on the Columns area) . Then click on the pill and select dual axis which will combine this field with the Position measure. Synchronise axis. This unfortunately will probably set all the marks to be circle type

so we need to reset…

  • change the mark type of the Measure Values card to bar
  • remove Measure Names from the Position card
  • remove Measure Names from the MIIN(1) card
  • change mark type of MIN(1) card to bar, and reduce size to as small as possible

This should give you the core chart, which can now be formatted accordingly.

Rounded Bar Chart

I always forget how to build these for some reason, so a quick google gave me a refresher via Andy Kriebel’s tutorial on YouTube

  • Add Category to Rows
  • Occurrences to Columns
  • Type in MIN(0) on Columns
  • Drag the MIN(0) pill and drop it on the Occurrences axis. This will change the view so Measure Values is now on Columns and Measure Names is on Rows
  • Change the mark type to Line and drag the Measure Names pill from the Rows onto the Path shelf
  • Increase the size of the mark

Exclude the Null Category and manually sort the categories to match the order in which the entries are listed on the left hand image.

Set to show mark labels, only displaying them at the end of the line, and aligning middle right

Add Category to the Colour shelf and colour accordingly, then apply the relevant formatting again to remove gridlines, axis and hide the Category pill from displaying.

Adding & Configuring the Dashboard Extension

This challenge makes use of the Image Map Filter extension, which you need to download from this link and save the .trex file somewhere appropriate on you machine. I also chose to make use of the image Lorna used rather than create anything, so saved the image from the challenge page to my laptop.

Create a dashboard and add the 2 charts you’ve already built, then add the Extension object to the dashboard. When prompted select My Extensions and browse to the Image Map Filter .trex file you’ve saved.

You’ll then be prompted to configure the extension as below, selecting the image you’ve saved, which is set to scale to container; choosing the Category dimension which is what the chart is filtered by, and then selecting the Top 20 sheet

You then need to select the rectangle option, which allows you to ‘draw’ on the image

Create a rectangle around one of the options, and when prompted select the appropriate Category which the selection relates to.

Repeat this for all the options in the image.

Average Legend

To build this I simply duplicated the Top 20 chart, removed all the pills from the Rows shelf, removed the Emoji pill from the Filter shelf and then changed the aggregation of the Position pill from SUM to AVG. I had to re-tweak the tooltips too.

And after all that, I hope you have all the components you need to deliver this solution. My published viz is here.

Happy Vizzin’! Stay Safe!

Donna

Can you build a Fancy Text Table?

Ann Jackson provided this week’s challenge, to deliver a text table using only Measure Names & Measure Values. I thought with Ann’s introduction that “This challenge should be straightforward for users of all levels” that this would be relatively straightforward, but I have to confess there were moments that I struggled with this. I knew the fundamentals that I’d need to complete this; that all the columns except the first were going to need to be numbers (ie measures), that I’d have to use custom formatting to display the number in the required format (a shape, a date, a word), and that I’d need to use the ‘legends per measure’ functionality to colour each column independently of each other. But determining the best/worst date to display proved to be a bit tricksy! I got there in the end, but there was a fair bit of trial and error.

Custom Formatting

I’m going to step through the build of this, as I think that’s probably the easiest way to describe this challenge. But before I do, one of the core fundamentals to this is knowing about how numbers can be custom formatted. By that I mean when you right-click on a measure -> default properties -> number format -> custom

This box allows you to type in, but you need to know what format/syntax to use. If you set the formatting via one of the other options, then look at the Custom option, it’ll have an entry that will give you a starting point. The above is the format for a number set to 1 decimal place, and shows that negative numbers will be prefixed by a minus sign (-). If we wanted to always show a plus sign in front of a positive number, we can edit this custom formatting to +#,##0.0;-#,##0.0.

The first entry to the left of the semi colon (;) indicates what’s applied to positive numbers. The next entry, to the right of the semi colon, indicates what’s applied to a negative number.

With this knowledge, you can apply more ‘creative’ custom formatting to any numeric measure that contains positive and negative numbers. For example if you want to show a ☑ or a ☒ depending on a ‘yes/no’ or ‘true/false’ concept, then we can create a version of the field as a number along the lines of

Field as Number

If [Field] = ‘XXXX’ THEN 1 ELSE -1 END.

We can than custom format this field by entering ☑;☒ into the text box

The field can still be treated as a measure, since the underlying value is still a number (in this case +/- 1), it’s just displayed differently.

Building the measures

So now we’ve covered how this ‘sneaky formatting’ is working, we’ll get on with the overall build.

The data just needs data from 2019 & 2020, so I chose to set a data source filter to restrict to just these two years.

But, I wanted the rest of the challenge to derive the current year instead of hardcoding, so I created fields

Current Year

YEAR({MAX([Order Date])})

Last Year

[Current Year] – 1

From these, I could then use LoDs to create

CY SALES

IF YEAR([Order Date]) = [Current Year] THEN [Sales] END

This is formatted to $ with 0 dp

and then

LY SALES

IF YEAR([Order Date]) = [Last Year] THEN [Sales] END

again formatted to $ with 0 dp.

We then need an indicator which is ‘true’ if CY SALES is greater than LY SALES, but as discussed above, we need this to be a ‘measure’, which we can custom format.

CY vs LY

IF SUM([CY SALES])>SUM([LY SALES]) THEN 1 ELSE -1 END

Custom format this as ✅;❌ (just copy these symbols from this page… they’ll look black and white in the dialog) – check out this page, to lift the images/other symbols from.

The actual difference identified by △ (again just copy and paste this symbol into the field name) is simply

SUM([CY SALES]) – SUM([LY SALES])

formatted to $ with 0 dp. Once you’ve done this using the Currency(Custom) option, then go to the Custom option and add + to the front of the string :

+”$”#,##0;-“$”#,##0

Next up is the percentage difference

% DIFF
[△]/SUM([LY SALES])

again format this first to a Percentage at 1 dp, then edit the Custom format to +0.0%;-0.0%

Now we’re getting to the slightly more complex part of the challenge – to identify the best and worst day in the month. We’ll start with the best day. We’re using FIXED LoDs throughout this, and while it’s probably possible to do in a single calculation, we’ll use multiple calcs to build up the components.

Order Month

DATENAME(‘month’,[Order Date])

This is the one dimension that’s going to be used in the final output, and simply outputs the month name (January, February etc).

In the data set, there can be multiple sales (ie orders) in a single day. We want to identify the total sales in 2020 (ie the current year) for each order date.

Sales Per Day

{FIXED [Order Date] : SUM([CY SALES])}

Now we’ve got the total sales per day, we want to identify the value of the maximum daily sales in each month

Max CY Sales Per Month

{FIXED [Order Month]: MAX([Sales Per Day])}

Now we need to identify the date in the month that the max daily sales ocurred

BEST DAY

INT({FIXED [Order Month]: MAX(IF [Sales Per Day] = [Max CY Sales Per Month] THEN ([Order Date]) END)}) + 2

WOAH! WHAT??? Let’s try to break this down…

IF [Sales Per Day] = [Max CY Sales Per Month] THEN ([Order Date]) END

If the daily sales value is the maximum daily sales in the month, then return the associated Order Date. But we need to get a date per month, so we’ve wrapped this in a FIXED LoD, for each Order Month. LoDs require the value to be aggregated, so the IF statement gets wrapped in a MAX statement (note MIN would work just as well).

{FIXED [Order Month]: MAX(IF [Sales Per Day] = [Max CY Sales Per Month] THEN ([Order Date]) END)}

Finally, due to the nature of this challenge, that requires we only work with Measure Names & Measure Values, we will convert this date field to a number using the INT function.

The intention here, is that we can then use the Custom formatting option once again, to set the number as a date format – I chose dd mmm yyyy (ie 01 Jan 2020 format, as I feel its less confusing that working out whether the date is in UK or US format).

However, by a very weird circumstance, converting a date to an INT then formatting as a date, will give you a date 2 days out from the one you converted. I don’t understand why, and it left me scratching my head for some time. I had to sense check with a fellow #WOWer who had the same, and checking Ann’s solution, she also was handling the oddity, which is the reason for the +2 on the calculation.

We just create similar fields for identifying the worst day

Min CY Sales Per Month

{FIXED [Order Month]: MIN([Sales Per Day])}

WORST DAY

INT({FIXED [Order Month]: MAX(IF [Sales Per Day] = [Min CY Sales Per Month] THEN ([Order Date]) END)}) + 2

format this to dd mmm yyyy

The final measure we need is based on determining the rank of the CY SALES per month. Ie if we ordered the months based on CY SALES descending, the top 6 would be marked as ‘Top’ and the rest as ‘Bottom’.

RANK

IF RANK(SUM([CY SALES])) <= 6 THEN 1 ELSE -1 END

We can then custom format this to “Top”;”Bottom”

Formatting the Table

Create a text table by

  • Order Month on Rows
  • Measure Names on Columns
  • Measure Values on Text
  • Measure Names on Filter, filtered to just the relevant measures

Add Measure Values to the Colour shelf, and select the Use Separate Legends option to display multiple diverging colour legend controls.

It’s now a case of going through each measure and editing the colour palette, and other settings. Some of this was again a bit of trial and error for me – I chose options that worked.

For the black text fields (CY SALES, LY SALES, %DIFF), choose a diverging colour palette, then click on the coloured squares at each end and select black from the colour picker. Select Stepped Colour and reduce the steps to 2.

Apply the same concept to the BEST DAY and WORST DAY legends, but select the appropriate green or red colour instead.

For the remaining fields, select a diverging colour palette, select the appropriate red at one end, and green at the other, reduce the steps to 2

And subject to some other formatting tweaks (increase font sizes, centre text), this is enough to complete the challenge. My final published viz is here. Note, the published viz does have slight differences to what I’ve blogged… as with many things, you sometimes realise things can be simpler when you try again.

Happy vizzin’! Stay Safe!

Donna

Can you predict the future?

Continuing on from last week’s challenge, Candra Mcrae set this #WOW2021 challenge to introduce another new feature of v2020.4 (so you’re going to need this version as a minimum to complete this challenge). Predictive modelling isn’t something I use at all, so I was quite reliant on the help documentation Candra referenced. I’m also not overly familiar with the various different models, and when they should be used, so this isn’t something I’m likely to use personally, without careful consideration and thought – it’s worth reiterating Candra’s warning :

Viz Responsibly: While predictive analysis in Tableau has never been easier, we encourage you to be intentional in how you use this powerful feature because not doing so (e.g., selecting a model type not appropriate for your data) could lead to inaccurate results.  

The focus of this blog will be

  • Building the main chart
  • Creating the tooltips
  • Determining the data for the title
  • Building the measure selector
  • Adding the measure selector interactivity

Building the main chart

The data provided has some records related to years before 1993, but the requirement is just to use data from 1993 onwards, so the first thing I did was to set a data source filter (right click on data source -> Edit Data Source Filters) to restrict the whole data source to Year 1 >=1993

Next I created the measures we need to display

Total Enrollment

[Total enrollment 2 All students]

simply references the existing measure.

% Black Students

SUM([Total enrollment 2 Black students]) / SUM([Total Enrollment])

% Non-Black Students

1- [% Black Students]

These 3 measures need to be displayed on a single chart where the actual measure displayed is determined by user selection. This selection will be driven by the use of a parameter, that will be set by a parameter action. For now, we just need to establish the parameter.

pSelect_Measure

This is simply a string parameter which will store the value of Total Enrollment by default.

Using this parameter, we can now decide which value we’re going to plot on the chart

Actual Value

CASE [pSelect_Measure]
WHEN ‘Total Enrollment’ THEN SUM([Total Enrollment])
WHEN ‘% Black Students’ THEN [% Black Students]
ELSE [% Non-Black Students]
END

Add this to a view by placing

  • Year 1 as continuous (green) pill on Columns
  • Actual Value on Rows

and you get the basic shape, although it’s not as ‘peaky’. This is resolved by editing the Actual Value axis (right click axis -> edit) and unchecking the Include zero checkbox.

Now change the text in the pSelect_Measure input parameter that’s displayed to % Black Students, and the chart will change. Verify it changes with the text % Non-Black Students too.

In reading through the Tableau KB documentation about Predictive Modeling Functions in Time Series Visualizations, I came to learn of the feature to Extend Date Range, something I’ve never come across before, and I’m not sure what version it first appeared in. Anyway, for this, you need to be working with a field which is a date datatype. The Year 1 field provided is an int.

I’m not entirely sure what I’ve done next is the optimum method, but it worked for me… some of it involved a bit trial and error when I came to defining and invoking the modelling feature later on. In writing this up, I’m essentially helping you to avoid the ‘back and forth’ steps I took in getting it all to work.

Anyway, I needed a date field to represent the year

Year

MAKEDATE([Year 1],1,1)

This resolves to a field containing 1st Jan YYYY for each Year in the data set.

Replace Year 1 on the chart with this field, and changing it to the ‘continuous’ instance of the date by choosing the second ‘Year’ option from the context menu

This changes the pill to green and then the option to Extend Date Range is visible on the menu. Set this to 5 years using the custom option

After doing this you’ll get a 5 nulls indicator displayed, which is basically saying there’s some years without any data. This what we expect for now.

Now onto the modelling part. We need a new calculated field to contain the predicted value using the Gaussian process regression.

Predicted Value

MODEL_QUANTILE(‘model=gp’, 0.5,[Actual Value],ATTR(DATETRUNC(‘year’,[Year])))

again, this took a bit of trial and error to get the date field in the required format.

Add this to the Rows and again edit the axis to ‘exclude zero’. You should now see the data extending for the predicted value beyond 2018. You can now hide the 5 nulls indicator (right click -> hide indicator)

You can now combine to be a dual axis chart (don’t forget to synchronise axis), and apply the relevant formatting to change the marks, apply relevant colours, hide axis & gridlines etc. Note, I set the area chart to 25% opacity, rather than the 75% stated in the requirement, as this gave me the colour most similar to the solution.

Creating the tooltips

Hovering over the chart, the tooltips display the Actual, Predicted and Residual (Actual-Predicted) value for each point. But depending on the measure selected, the format differs – Total Enrollment is in K and the others are in %.

We can’t use the number formatting feature of a field to resolve this, so we need to be a bit more creative. I confess I started creating individual fields for each measure (Actual, Predicted, Residual) based on the measure type selected (Total Enrollment, % Black Students, % Non-Black Students), but this meant I was creating loads of calculated fields, which just seemed a bit unnecessary.

So after a bit of a think, I realised there was a better way.

First up, let’s get our residual

Prediction Residual

[Actual Value]-[Predicted Value]

Now, if we put the values in a tabular form, you can see what the precision of the values are depending on the measure stated

We need to format appropriately. Both displays required the values to be formatted to 1 decimal place, and both have a suffix, either a K or a %.

To get the value in the required display format

Tooltip – Actual Value

IF [pSelect_Measure]=’Total Enrollment’ THEN [Actual Value]/1000
ELSE [Actual Value] * 100
END

Format this to 1 dp.

Create a similar field for the predicted value, which should also be formatted to 1 dp.

Tooltip – Predicted Value

IF [pSelect_Measure] = ‘Total Enrollment’ THEN [Predicted Value]/1000
ELSE [Predicted Value] * 100
END

And finally Tooltip – Residual

[Tooltip – Actual Value] – [Tooltip – Predicted Value]

This needs to be custom formatted to +#,##0.0;-#,##0.0 which ensures a + symbol is always displayed for positive values.

Pop these onto the tabular display we built earlier, and you can see the values are now displaying in the way we need

Finally we need to create a field to store the required suffix

Tooltip – Value Suffix

IF [pSelect_Measure] = ‘Total Enrollment’ THEN ‘K’
ELSE ‘%’
END

We can now add these 4 fields onto the Tooltip shelf of the ‘All’ marks card, and create the tooltip as required

Determining the data for the title

As we need to only use 2 sheets to build this solution, and 1 sheet will be required for the measure selection, we have to incorporate the summary data displayed at the top of the dashboard as part of the title of the chart viz.

In the title, we need to display the following :

  • The latest year in the provided data set (ie 2018)
  • The latest year including the extended date range (ie 2023 – 5 years later)
  • The actual value from 2018 based on the selected measure
  • The predicted value from 2023 based on the selected measure
  • An indicator to show whether the values were likely to increase or decrease

The requirement was to ensure there was no ‘hardcoding’. And as we’re working on getting information related to a specific row (ie year) in a set of data that consists of multiple rows (years), then we’re going to need to make use of table calculations for this.

Let’s start with the ‘easier’ bits first. We want the Year of the latest in the actual data set, and we want this value to be essentially stored against every row in the data

Latest Year

{FIXED: MAX([Year])}

This returns the value of 2018.

Latest Year + 5

DATE(DATEADD(‘year’,5,[Latest Year]))

This simply adds 5 years to the Latest Year, so returns 2023.

Now when I’m using table calculations, I often prefer to see what the data is doing in the table itself, so I can be sure I’m doing things correctly. With the ‘extended year’ stuff, it’s a bit fiddly creating the table from scratch, so I simply started by duplicating the chart sheet ‘as crosstab’ (right click on the sheet name tab, -> Duplicate as Crosstab). Rearrange the fields so Measure Names is on Columns and Year is on Rows and the ‘Tooltip’ named fields are visible. Add Latest Year and Latest Year+5 to Rows, and you can see how these fields show the same value against every row.

Now, remove these fields, as by adding them, we’ve lost the additional ‘extended dates’ rows (ie the ‘fake’ rows that don’t actually exist in the data). Ok, so now we want to get the Actual Value associated to 2018, but then perpetuate this across every row in the data.

Latest Year – Actual

WINDOW_MAX(IF MIN([Year]) = MIN([Latest Year]) THEN [Tooltip – Actual Value] END)

If the Year is the same as Latest Year, then display the value from the Tooltip – Actual Value field. The WINDOW_MAX table calc, then spreads this same value across all rows displayed. Format to 1dp and add this to the table.

We need to do something similar to get the Predicted Value for 2023

Latest Year +5 – Predicted

WINDOW_MAX(IF LAST()=0 THEN [Tooltip – Predicted Value] END)

If we’re at the last row in the data, then display the value from the Tooltip – Predicted Value field. Again the WINDOW_MAX spreads the value across all the rows. Set this to 1 dp and add to the table.

And now we just need to get the increase/decrease indicator

Increase | Decrease

IF ([Latest Year – Actual])-[Latest Year +5 – Predicted]>0 THEN ‘decrease’ ELSE ‘increase’ END

So now we know we’ve got the correct values we need, we can add these fields to the Detail shelf of the chart sheet, so we can reference them in the Title of the chart.

We also need the Latest Year and Latest Year +5 fields added to the Detail shelf, but when you add these, you’ll notice that you lose the ‘extended years’. You can fix this by wrapping the fields in an ATTR function. Double click on the field, which will allow you to ‘type in’ to the field.

You should now be able to create the text in the chart title

Building the measure selector

Phew! Are you still with me… there’s a fair bit going on already, and now we’ve got to build the chart that will drive the user selection.

On a separate sheet, add

  • Measure Names to Rows
  • Measure Values to Detail
  • Measure Names to Text
  • Measure Names to Filter, and restricted to the 3 original measures – Total Enrollment, % Black Students, %Non-Black Students

Uncheck Show Header on the pill on the Rows, then format

  • Set background colour of the pane to a navy blue
  • Set row & column borders to be white
  • Set the text label to be white text, centred, and increase the font
  • Turn off the tooltip

Adding the measure selector interactivity

Create the dashboard and add both the charts. To add the interactivity so that on click of a row in the Measure Selection sheet, it changes the measure being displayed, we need to add a dashboard action, that changes a parameter (Dashboard menu -> Actions -> Add Action -> Change Parameter). Set the action to run on Select when the Measure Select sheet is clicked. The target parameter is pSelect_Measure and the Measure Names field should be passed into this.

And with all that, you should hopefully now have a working solution. My published viz is here (note, my Measure Selection sheet is slightly different from what I’ve described above). The above is a bit simpler I think.

Happy vizzin’! Stay Safe!

Donna