Can you performance tune this workbook?

This week’s #WOW2021 challenge was a bit different. Rather than being challenged to recreate a viz, Candra challenged us to performance tune a workbook she’d put together, with the aim for all steps to be sub 0.5s.

I started by downloading the slow published workbook (waiting patiently for it to download), opening the workbook ( waiting patiently for it to open), and then assessing how it had been built.

It took a while to download… it was 99MB…

When it finally opened, I could see

  • The workbook had multiple data sources, which were being blended
  • There were cross data source filters being applied
  • There were lots of fields in the data source that weren’t being used
  • One of the data sources contained 2.2 million rows

Tackling these points initially was key.

The requirements stated that the end user only cares about OB/GYNS (specifically Pri Spec = OBSTETRICS/GYNECOLOGY), so the hint here was we’re allowed to restrict the size of the data source.

The requirements also stated any data prep/transformation had to happen in Tableau Desktop, hinting that some re-modelling would be required.

This means you do need to download the original data source files (which also took some time). My aim was to create a single data source, that would be restricted to the OB/GYNS data only, that I could then repoint the existing report to.

I created a new connection within the ‘slow’ workbook by connecting to the two files and relating them – the relationship of St to St should happen automagically πŸ™‚

I then added Data Source Filters (right click on the data source > Edit Data Source Filters), and set the filters to restrict to the relevant Pri Spec and exclude certain states (which I found was applied when looking at the data source filters applied to one of the original data sources).

Next I repointed the workbook to use this new data source : right click on the original DAC_NationalDownloadableFile data source > Replace Data Source, and repoint to the DAC_NationalDownloadableFile.csv+ (Multiple Connections) data source

This caused an issue with the NPI field which was a string in the original and a number in the replacement.

Resolve this by right clicking on the NPI (1) field and select Replace References and point to the NPI field. This will then fix the issue with the Providers per Capita field too.

On the Providers per Capita sheet, remove both the pills from the Filter shelf and replace the Population field on the Tooltip shelf that is from the secondary data source (since there is an orange tick against it), with the Population field from the newly created data source (literally drag it and drop it onto the existing pill so it replaces it). You’ll need to format the Population field to millions with 1 dp.

Edit the Providers per Capita measure from

round(COUNTD([NPI])/(SUM([Sheet1 (State Regions)].[Population])/100000),0)

to

COUNTD([NPI])/(SUM([Population])/100000)

This pill is no longer referencing data from another data source. The ROUND isn’t necessary as this can be managed with the number formatting (so its one less calculation to be done and shaves a teeeeeny bit of time).

You can now remove the original 2 data sources (right click data source and close).

Now extract the remaining data source – right click > Extract Data. The data source filters you applied, will automatically be set to be extract filters. Also set the extract to Hide Unused Fields. The extract filters will reduce the number of rows in the data source; the hide unused fields will reduce the number of columns, all of which optimises the data source as much as it can be.

Change to the dashboard sheet now.

Remove the Primary Speciality filter as its no longer relevant (contains 1 value).

I chose to change the City filter to ‘All values in database‘ and to be a Wildcard Match input, so the user types in rather than selecting from a list. Using fields that have a high number of values as a filter and displaying all the possible values in the filter can impact performance. A separate query needs to be run to populate the list. Using ‘relevant values’ again causes a query to happen each time other selections are made with other filters/actions. This can also add overhead.

For the same reasoning, I also changed the NPI filter in the same way,

I also changed the Gender filter to be a Single Value List (ie radio button) entry, set to Include rather than Exclude values.

With these changes I ran the performance recording (Help menu -> settings and performance -> start performance recording), followed the required steps (select Texas, deselect Texas, set Gender to F, reset back to All), then stopped the performance recording ( Help menu -> settings and performance -> stop performance recording). This will automatically load a new workbook with the results.

The best I got to on my personal laptop is below – not the 0.5s suggested in the challenge, but a lot better than it was.

Running the same process against the same workbook on my more powerful work laptop, I got :

which suggest hardware does help πŸ™‚

After a bit of discussion with another #WOW participant, it was suggested reordering the columns in the table (just reverse the order) should help. So I tried this, and shaved a bit more time (this is personal laptop, so compare to top image above).

I’m interested in seeing what other solutions and suggestions get shared and what Candra came up with herself.

Happy vizzin’! Stay Safe!

Donna

Generation Population

For week 17 of #WOW2021, Sean Miller decided to challenge us with recreating a chart by Nathan Yau (see here for the original). The aim was to recreate within 1 sheet, which I managed to do. So how did I do it? Read on πŸ™‚

  • The groundwork
  • Colouring the bars
  • Adding the year labels
  • Final formatting

The groundwork

The chart itself follows a straight forward structure of multiple blue dimension fields on Columns with a green measure on Rows similar to this view below based on Superstore Sales data – it’s just formatted a bit more creatively!

The data provided just contains 3 fields : SEX, AGE and 2019 Population. We want to present the 2019 Population measure for the Total SEX only by Year. We need to create the Year field, which is simply

Year

2019-[AGE]

I dragged this into the ‘dimensions’ section of the data pane (above the line).

This allows us to create the basic bar chart required

We now need to define the various fields that we will need to add as additional dimensions on the Columns shelf to create the ‘generation’ data panes.

Generation

IF [Year]<= 1927 THEN ‘Greatest
Generation’
ELSEIF [Year]<= 1945 THEN ‘Silent Generation’
ELSEIF [Year]<=1964 THEN ‘Baby Boomer’
ELSEIF [Year] <= 1980 THEN ‘Generation X’
ELSEIF [Year]<=1996 THEN ‘Millennials’
ELSEIF [Year]<=2012 THEN ‘Generation Z’
ELSE ‘Gen
Alpha’
END

NOTE – there is a deliberate carriage return in the condition for ‘Greatest Generation’ and ‘Gen Alpha’ which will force the field to ‘wrap’ when displayed.

Having defined the above, we need to determine

Total Population Per Generation

{FIXED [Generation], [SEX]: SUM([2019 Population])}

and then

% of Total Population

SUM([Total Population Per Generation]) / TOTAL(SUM([Total Population Per Generation]))

NOTE – to create this field, I originally created a ‘quick table calculation’ against the Total Population Per Generation field which I’d displayed on a view, and then dragged the resulting pill into the measures pane to create the new field with the desired calc.

Let’s put these in a table, so we can then check the values, and see that the 2nd and 3rd columns are the same value for each row associated to a particular generation, which is what we need.

Right, so now we need to determine the rank based on the Total Population Per Generation

Rank

RANK_DENSE(SUM([Total Population Per Generation]))

Format this to a custom number with 0 decimal places, but prefixed with #

When added to the table we get

The intention, is that Rank will be displayed as discrete ‘header’ pill rather than a measure, so let’s move Rank to be the 1st pill on the Rows shelf and change to be discrete.

But we need the Total Population Per Generation and % of Total Population fields to be combined into a single pill. So we need to do a bit of string manipulation/ number formatting for this

Total | Percent

STR(ROUND(SUM([Total Population Per Generation])/1000000,1)) + ‘M’ + ‘ | ‘ + STR(ROUND([% of Total Population] * 100,1)) +’%’

This looks complicated, but its because even though you may have applied the relevant display number formatting against the individual numeric measures of Total Population Per Generation and % of Total Population, the formatting is not preserved, when converted into a string field, which this field needs to be. So the relevant calculations need to be applied within the field itself.

This outputs the below

Now we need a way to sort the data so the ‘Greatest Generation’ associated to the earliest years is listed first. I did this by determining the minimum date within each Generation.

Min Year Per Generation

{FIXED [Generation], [SEX]: MIN([Year])}

Add this into the view as the first pill in Rows, and the data should automatically sort from lowest to highest

We can now build the viz – duplicate the table sheet, remove Total Population Per Generation and % of Total Population from the Measure Values section. Drag 2019 Population to Columns, then click the swap rows & columns button :

Colouring the bars

The bars are coloured based on each ‘Generation’ pane. You could hardcode this along the lines of ‘Generation = x OR Generation = y or Generation = z etc’ where x, y and z etc are generations of the same colour. This would return true or false, which you can then add to the colour shelf and adjust accordingly.

I decided to be a bit more dynamic, deciding I wanted to set the colour based on whether it was an odd or even pane.

For this I created another ‘rank’ field based on the field I’d used to ‘sort’ the data, the Min Year Per Generation field.

Sort Position

RANK_DENSE(MIN([Min Year per Generation]),’asc’)

If you add this into the data table, you’ll see each section is numbered 1 -7

From this, we can then determine if the number is even (or not)

Sort Position is even number

[Sort Position]%2=0

Add this onto the Colour shelf which will return True or False and colour accordingly.

Adding the year labels

The year labels are achieved by using a dual axis chart, to plot a point for each specific year (based on the Min Year Per Generation field) at some arbitrary value.

Point to Plot Year Label

IF [Year]=[Min Year per Generation] AND [Year]<>1919 THEN 4700000 END

For each ‘min year’ that isn’t 1919, plot a value at 4.7M.

Add this field to the Rows shelf, change mark type to circle, reduce size to as small as it can, and set the colour transparency to 0.

Add Min Year Per Generation to the Label shelf, then change the alignment to vertical.

Now you make the chart dual axis and synchronise the axis. Some of the colours/marks may change, so reset by removing Measure Names from the Colour shelf and changing the mark types bar to bar & circle.

Final formatting

So at this point all the main components are there. It’s now a case of formatting – removing right and bottom axes, removing gridlines. The vertical dashed lines, are column dividers, set at the pane level only.

The solid left hand axis is set via

The text is formatted using the fonts advised in the requirements and sizes adjusted to suit.

Add on a tooltip, and set the background colour of the worksheet and you should be done.

My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you use Quick LoDs to recreate this view?

It was Lorna’s turn to set the challenge this week, and she took the opportunity to ask us to use a new feature in Tableau v2021.1 – Quick LoDs (you’ll obviously need v2021.1 to use the functionality, but the LoDs can be created manually in earlier versions if need be).

This blog will focus on

  • Creating the Sub-Category Average Sales by Category LoD using Quick LoDs
  • Formatting the difference calculation
  • Colouring the bars
  • Text for Tooltip
  • Putting it all together

The viz itself isn’t that complex once you’ve nailed the LoD, so lets’ start with that bit.

Creating the Sub-Category Average Sales by Category LoD using Quick LoDs

The what…? It’s a bit of a mouthful,.. the “Sub-Category Average Sales by Category”. What we’re essentially after here is the Total Sales per Category / No of Sub-Categories in the Category. To do this with LoDs, we first need to create an LoD to represent the total sales in each Sub-Category.

As mentioned in the requirements / referenced in the KB article above, you use a combination of ctrl/command click & drag to create an LoD via the Quick LoD feature. In this case I dragged the Sales measure onto the Sub-Category dimension, and this automatically created

Sales (Sub-Category)

{ FIXED [Sub-Category]: SUM([Sales]) }

which is the sales per sub-category.

From this, I then dragged this measure onto the Category dimension, which automatically then created

Sales (Sub-Category) (Category)

{ FIXED [Category]: SUM([Sales (Sub-Category)]) }

BUT I then edited this to change the aggregation to AVG, so the field became

{ FIXED [Category]: AVG([Sales (Sub-Category)]) }

This gives the average value required, which you can see is the same across the rows in a single Category :

Formatting the difference calculation

The viz displays the % difference between the Sub-Category sales and the average. This is calculated with

Difference

(SUM([Sales])- SUM([Sales (Sub-Category) (Category)])) / SUM([Sales (Sub-Category) (Category)])

This is then custom formatted as β–²0%;β–Ό0% (I use this site to get my shapes from).

Colouring the bars

The bars need to be coloured based on the value of the Difference field, so another calculated field is required

Colour

[Difference]>0

This will just return true or false and can dropped on the Colour shelf of the bars.

Text for Tooltip

Within the tooltip, the % difference is displayed, along with some text which indicates if it is above or below the average. For this we need another calculated field to reference in the Tooltip.

Above | Below

IF [Difference]>0 THEN ‘Above’ ELSE ‘Below’ END

Putting it all together

With all the calculated fields built, the the chart itself is a relatively simple dual axis chart

  • Add Category then Sub-Category to Rows
  • Add Sales to Columns and sort descending
  • Add Sales (Sub-Category) (Category) to Columns. Make dual axis and synchronise axis
  • Click on the All marks card and remove Measure Names from the Colour shelf.
  • Click on the Sales marks card and change mark type to bar; add the Colour field to the Colour shelf and adjust accordingly; add Difference to the Label shelf and format appropriately.
  • Click on the Sales (Sub-Category) (Category) marks card and change mark type to Gantt; adjust the Size to be as large as possible; set the colour to the relevant grey.
  • On the All marks card, add the Above | Below field to the Tooltip shelf, then edit the Tooltip on the All marks card to create the required text.
  • Finally, remove axis, remove the column heading labels, remove grid lines and column borders and format the displayed text appropriately. Title the viz.

A very brief post today, but hopefully I’ve ticked off all the required elements, My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Tableau Website Analytics

This week’s #WOW2021 challenge by Ann Jackson is focussed on dashboard design/layout, reference lines and formatting time, so that will be the focus of this blog too. I’m hoping it might be a fairly short post this week πŸ™‚

  • Filtering the data
  • Creating the key measures, formatting time, adding average lines
  • Dashboard layout

Filtering the data

The data Ann provided is based on the Google Analytics data the team have harnessed related to the activity on the #WOW website. It’s a static data set, which contains data from 29 Dec 2019 up to 12 April 2021, but Ann states her solution only includes data up to 10 April 2021. I therefore added the Date field to the Filter shelf on the sheet and set it to end on 10 April 2021

Knowing I’d be building multiple sheets for this challenge, I set this filter to be a global filter by setting it to apply to worksheets – > all using this data source

Creating the key measures, formatting time, adding average lines

For this challenge, we’ll be creating a sheet for each BAN, and a sheet for each trend line depicting a measure by week, so 8 sheets in total.

For the Sessions measure, both the BAN and the trend chart are straightforward.

The BAN just shows SUM([Sessions]) on the Text shelf, appropriately formatted. The other BAN sheets are pretty much the same, but just show the appropriate measure in the appropriate colour.

The trend line displays Week([Date]) by SUM([Sessions]), with the average line added by dragging Average Line from the Analytics pane onto the chart and then formatting.

If the numbers don’t quite match up, it may be because your week is set to start on a different day. By default as I’m UK based, dates are set to start their week on a Monday. For this challenge to match Ann’s solution, your week needs to start on a Sunday. You can set this by right clicking on the data source itself and changing via the Date Properties option

To determine the average session duration, we need to build a calculated field, that then needs to be formatted to show minutes and seconds.

Average Session Duration

(SUM([Session Duration]) / SUM([Sessions]))/86400

Session Duration / Sessions will return a value in seconds. To be able to format this in the way required, we need to get the number of seconds as a proportion of a day. There are 86400 seconds in a day (60 * 60 * 24), so we divide by this.

We can then use a custom format on this field and use the nn:ss notation. NOTE not mm:ss. If you needed to format this as hours:minutes:seconds, the format would be hh:mm:ss, but mm:ss does not provide you with the right values. This video demonstrates all this, if you’re interested.

When it comes to building the trend line for this measure, the average line, can’t be added as simply in the way it could for the Sessions trend chart (at least I couldn’t get it to work that way…).

We need to build a calculated field that will show the same overall average value alongside the weekly averages. This value needs to match what’s displayed in the BAN chart.

Overall Session Duration mm:ss

{FIXED : [Avg Session Duration]}

This is returning the Avg Session Duration based on all the rows in the data. As the Date field has been added as a global filter, it is acting like a data source filter, so the dates we don’t want have been excluded from the data set that the FIXED LoD is being applied against. If the Date filter was a simple ‘quick filter’, this calculation wouldn’t work, as the data for the 11th & 12th April would also be included in the calculation.

Format this to nn:ss as well. Add this field to the Detail shelf of the trend chart, then right click on the Average Session Duration axis and Add Reference Line, and reference the Overall Session Duration mm:ss field.

For the bounce rate, we simply need

Bounce Rate

SUM([Bounces])/SUM([Sessions])

which is formatted to a percentage of 1 dp.

When building the trend line, I added the average line from the analytics pane, but that gave me a different value from my BAN. So I built

Overall Bounce Rate

{FIXED : [Bounce Rate]}

This was formatted to % 1dp, and added as a reference line as described above.

Finally the last measure, we need

Avg Time on Page

(SUM([Time on Page])/(SUM([Pageviews])- SUM([Exits])))/86400

formatted to nn:ss

and

Overall Time on Page mm:ss

{FIXED : [Avg Time on Page]}

and again formatted to nn:ss.

The charts for these are built exactly like the Average Session Duration.

Dashboard Layout

The easiest way to describe the layout I built is to show it πŸ™‚ Note the Item Hierarchy on the left hand side of the image below.

I have a vertical container as the Main page.

The first row in this container is the title in a Text object.

The second row is a Blank object and is the yellow line. The background of the blank object is set to the relevant yellow, the outer padding is set to 0 all round, and then the height is set to 4. This gives the appearance of a thick coloured line.

The third row is another vertical container, and I’ve done this, so I can ultimately use the option to Distribute Contents Evenly on the container to ensure the horizontal container ‘rows’, which I’ll be adding into this container, are evenly spaced.

So ‘within’ the 3rd row, the 4th-7th rows are managed using a horizontal container, which in turn contains a blank object (the coloured vertical line), the BAN sheet and the trend sheet. Around each horizontal container I set the outer padding to 10 all round to give some spacing. The blank object in each ‘row’ is given the relevant background colour and set to a width of 15.

Finally, I finished off with an additional horizontal container at the bottom which is where I added my standard #WOW footer. Note this horizontal container is essentially the 4th row of the original Main vertical container though.

Hopefully I’ve provided enough for you to build the challenge yourself / resolve any issues you might have. If there’s anything I’ve missed, do please comment to let me know.

My published viz is here. Please note, that for some reason (and I don’t know why), Tableau Public does not seem to display my time axis properly (just shows 0). As the workbook renders on Public, the values on the axis show, but once fully loaded, they change. If you download the workbook, the values are fine. I published from Tableau Desktop 2021.1.0. I’m putting this down to an issue with Tableau Public.

Happy vizzin’! Stay Safe!

Donna

Can you recommend profitable return customer product bundles?

This week Candra Mcrae provided a market basket analysis challenge which had a bit of all sorts going on as well as the core ‘what else was bought with product x’ puzzle – parameter actions, sheet swapping, custom toggle buttons.

Hopefully I’ll cover all the key elements :

  • Identifying the 2nd customer order
  • Identifying the orders containing the selected sub-category
  • Identifying the ‘other’ sub-categories ordered
  • Building the bar chart
  • Building the tree map
  • Creating the toggle button
  • Controlling what viz to display
  • Creating the dashboard and interactivity

Identifying the 2nd customer order

The requirements state the analysis is based on returning customers, and specifically their 2nd order. So we need a way to identify this set of data.

In the superstore data set, a customer only ever places one order on a day, so we will make use of this feature, but this technique won’t always work in other situations – I personally have been known to place multiple orders with Amazon on the same day, especially when Christmas shopping!

If we look at the dates customers have ordered, we’re looking to identify the orders associated to the customer & dates highlighted below

We can identify whether the date is the first order date using a FIXED LoD calculation

Is Customer’s First Order?

{FIXED [Customer ID]: MIN([Order Date])} = [Order Date]

The FIXED LoD statement within the {..} is finding the minimum Order Date for each Customer ID. This is then being checked against each Order Date, and returning true when it matches.

So this is giving us the first record, but we want the second….

I solved this, by first adding Is Customer’s First Order? = False as a data source filter (right click on the Superstore data source -> edit data source filters)

This essentially removes all the rows associated to the first order for each customer from the data set, and consequently, the Is Customer’s First Order? is now reported as True against what was originally the 2nd order…
If you compare the image above to the one further up the page where I marked the 2nd orders, you should see the values match what now seems to be listed as the first!

This works due to Tableau’s order of operations; the data source filter is filtering the data ‘early on’, so the rows of data we’re now working against, has no knowledge of those rows. Subsequently the customer’s first order is now actually reporting the customer’s 2nd order as the minimum (aka first) order date.

We can now add Is Customer’s First Order? to the Filter shelf and set to True, to just work with this set of orders.

NOTEIf the data source has not yet been extracted, which will be required if you want to publish to Tableau Public, then you will need to do one of the following :

  1. At the point of extraction, the data source filter will automatically be applied as an extract filter. If you keep it, then after extracting, remove the data source filter if its still there, otherwise you’ll find you’re reporting against the 3rd customer order OR
  2. At the point of extraction, remove the extract filter that is automatically applied. After extraction, re-add the data source filter if it’s been removed.

An alternative to using any data source/extract filters is to use the following LoD (which I only got my head round after building out the above, so I didn’t put it in my solution., but works just as well).

Is Customer’s 2nd Order?

{FIXED [Customer ID]:MIN(IIF([Is Customer’s First Order]=False,[Order Date],NULL))} = [Order Date]

The IIF statement is getting the Order Date for all the records which aren’t the Customer’s 1st order. The FIXED LOD, is then returning the minimum/smallest of these dates, which will be the date of the 2nd order. This is then compared to each Order Date to identify the appropriate record.

If you strip off any data source/extract filters, you can see how this works, and Is Customer’s 2nd Order? can be added to the Filter shelf instead, set to True, to get the required rows.

Identifying the orders containing the selected sub-category

To start with we need a parameter to store the selected Sub-Category. Right click on Sub-Category and select Create -> Parameter to create a string parameter which lists the sub-categories. I named mine pSelected_SubCat and it’s defaulted to Accessories.

What we’re now trying to find is the rows relating to Order IDs that contain the stated Sub-Category, in this instance Accessories

So we first need to identify the rows which match

Customer Ordered Selected SubCat

IIF([pSelected_SubCat]=[Sub-Category],1,0)

Add this to the view, and you can see the rows associated to Accessories are marked with 1

But we don’t just want these rows; we also need the rest of the rows on the same order, so in the case of order CA-2018-131534 above, we also need the row associated to the Paper Sub-Category.

For this, I used Sets as per the technique described in the link Candra provided within the hint.

We want the set of Order IDs where the sum of the Customer Ordered Selected SubCat field is >=1. Right click on Order ID and Create -> Set. On the Condition tab, set the relevant properties.

Adding the Orders with Selected SubCat set into the view, and we can see it’s picked up the rows we care about, and we can now filter by this set

An alternative to using Sets, is to use another FIXED LoD calculation

{FIXED [Order ID]: SUM([Customer Ordered Selected SubCat])}>=1

For each Order ID, sum up the Customer Ordered Selected SubCat field, and if it’s >=1 then it will return True, and you can then filter by this instead.

Identifying the ‘other’ sub-categories ordered

We want to count the orders which contain each Sub-Category, which is simply

Count Orders

COUNTD([Order ID])

With a basic view that is filtered as required, that lists Sub-Category by Count Orders (and sorted by Count Orders descending), we get

And you’ll see that Accessories is listed at the top. Change the parameter, and that value will also be listed. But we don’t want to see this – we need to filter it out. For this, we can create

Other Items Ordered

If [pSelected_SubCat]=[Sub-Category] THEN ‘N/A’ ELSE [Sub-Category] END

Pop this into the view, and you can see the ‘N/A’ is listed against the selected Sub-Category.

This means we can add a further Filter using the Other Items Ordered field and set it to Exclude N/A.

Building the bar chart

So now we’ve got the 3 filters we need to identify the rows of data we need to consider, we can easily build a bar chart.

To colour the bars, we need to work out the profitability

Profit Ratio

SUM([Profit])/SUM([Sales])

Is Profitable?

[Profit Ratio]>0

Add Is Profitable? onto the Colour shelf and adjust accordingly

You just then need to format to remove gridlines/axes etc, and add the relevant fields (correctly formatted) to the Tooltip to create the required text.

Building the treemap

The simplest way to do this, is to duplicate the sheet you’ve built for the bar chart, then use the Show Me option (top right) and select TreeMap. This will put most of the pills in the right places. The only thing that needs to be shifted is to add Is Profitable? back onto the Colour shelf, and to explicitly set the Sort order against the Sub-Category field. Count Orders also needs adding to the Label shelf, so it can be displayed.

Creating the toggle control

To create the toggle control which will control the switching of the vizzes do the following :

  • In the Columns shelf, type in MIN(0)
  • Then next to it, type in MIN(1)
  • Drag the MIN(1) pill from the Columns shelf and drop onto the MIN(0) axis, when you see the ‘2 green columns’ appear
  • Remove Measure Names from the Rows shelf
  • Change the mark type to Line
  • Create a parameter called Toggle which is an integer containing the values 0 & 1, defaulted to 0. Show this parameter on the view.
  • Create a new calculated field, Selected Toggle which just references the Toggle parameter
  • Add Selected Toggle to the Columns shelf and change to aggregate to MIN rather than SUM
  • Make it dual axes, and synchronise the axes
  • Remove Measure Names from the Colour shelf on the All marks card.
  • On the Selected Toggle card, change the mark type to Circle, and increase the Size. If you change the Toggle parameter the circle should shift to 1.
  • To colour the line based on the position, create a new calculated field

Colour

IF [Toggle]=0 THEN ‘grey’ ELSE ‘green’ END

  • Add Colour to the Colour shelf of the All marks card. With the Toggle parameter set to 0, set the colour to grey. Change the parameter to 1, and set the colour to green.
  • To set the tooltip create a new calculated field

Toggle Label

IF [Toggle]=0 THEN ‘Bar’ ELSE ‘Treemap’ END

  • Add Toggle Label to the Tooltip shelf of the Selected Toggle marks card. Adjust the tooltip text to just show this value.
  • Remove the text from the Tooltip of the Measure Names marks card.
  • Hide the axes headers and remove the row & column dividers

Controlling what viz to display

With the Toggle parameter set to 0, add the Selected Toggle field to the Filter shelf of the bar chart. and set it to be 0 (or at most 0 depending what filter control displays)

Then set the Toggle parameter to 1, and add the Selected Toggle field to the Filter shelf of the Treemap chart. Set this to be 1 (or at least 1 depending on the filter control displayed)

You should now find that if you switch to the bar chart sheet, nothing is displayed. Change the Toggle parameter back to 0, and the bar chart will now show, but there will be nothing on the treemap chart.

Creating the dashboard and interactivity

Create a dashboard sheet.

Add a vertical layout container

Add the bar chart into the vertical layout container. Set the chart to fit entire view and don’t display the title.

Add the treemap chart beneath the bar chart in the same vertical container. Whilst nothing is actually displaying, still set the chart to fit entire view and again don’t display the title.

The Toggle parameter should have automatically displayed on the right hand side, so test changing the parameter to see how the chart switches.

Add the Toggle Control sheet into the right hand container too.

Add a dashboard action to change the Toggle parameter when the Toggle Control is interacted with. It should take effect on Select of the Toggle chart, affect the Toggle parameter and pass the Measure Values field into the parameter.

Clicking on each end of the Toggle chart should now make the viz change from bar to treemap.

You just now need to tidy up the dashboard – add a title, remove any unrequired objects, set the container background of the right hand panel to grey (you will need to set the background on the toggle sheet to the same shade of grey too).

And hopefully that’s it. My published version is here.

Happy vizzin’! Stay Safe!

Donna

What percentage of sub-orders are profitable?

It was Luke’s turn to provide the challenge for this week – to produce a modified mekko chart (or marimekko chart / mosaic plot – see here for more information).

Luke suggested it would require LODs to solve, but that it would also be possible with table calcs. I tackled it with a mixture of both, mainly based on what felt right at the time.

Like most challenges I worked out the data I needed for the final viz in tabular form first, so I could ratify the numbers and calculations I built before building the viz.

  • Building out the data
  • Building the viz

Building out the data

The y-axis of the viz plots the % of profitable orders for each Sub-Category. So the first thing we need to do is identify a profitable order. A single order can have multiple product lines, where each product line might be associated to a different Sub-Category. We need to determine whether the profit against all the products for the same Sub-Category on a single order is positive (ie profitable) or not. We use an LoD for this

Order Is Profitable

IIF({FIXED [Order ID], [Sub-Category]: SUM([Profit])}>0,1,0)

For each Sub-Category within a single Order ID, check if the total profit is a positive number. If it is then return 1 else 0. I’m purposefully choosing 1 and 0 to help with the next step.

To determine the % of profitable orders, we need to know the total of all the profitable orders as a proportion of all the orders for a Sub-Category.

Count Orders

COUNTD([Order ID])

The number of distinct orders.

Profitable Orders

SUM([Order Is Profitable])/[Count Orders]

formatted to be a percentage with 0 dp.

Put these into a table, along with the Sales measure, and you can see what’s going on

We need to sort this data, first by Profitable Orders desc, then by Sales desc. And we need to sort in a way that can be used once this table of data is displayed in the required Viz format. If we just wanted to sort this table of data displayed here, we can use a technique described in Tableau’s KB here. However this doesn’t work for the viz, as it relies on the newly created dimension existing on the Rows shelf, which won’t work when we get to building the viz, as we can’t put it on rows. Since the field also contains table calculations (Rank), you can’t reference the field in the Sort option of a pill.

Anyway, based on this, and after a bit of trial and error, I managed to create a sort field that I could reference.

Sort

STR(ROUND([Profitable Orders],2)) + “-” + STR(ROUND(SUM([Sales])/1000000,2))

Here I’m building up a string field combining the Profitable Orders field, that I’ve rounded to 2 decimal places, with the Sales field that I’ve rounded to $millions at 2 decimal places. This is to ensure that since we’re working with string data, 800 is ordered after 8000 when sorting descending.

Pop this into the view, and set the sort property of the Sub-Category pill to sort by Sort descending

There may be a better way of doing this, and I’m not sure it would work in all circumstances, but it worked for this challenge. I’ll be interested in seeing how others approach this element of the challenge.

Now, that’s resolved, let’s get the other fields we need.

Along with the Profitable Orders %, we also need to display the percentage of non-profitable orders

Non-Profitable Orders

1 – [Profitable Orders]

I also chose to format this to % with 0dp (purely for display purposes in the table).

The width of the bars in the viz, is based on the % of total sales, so let’s work that out…

% of Total Sales

SUM([Sales])/TOTAL(SUM([Sales]))

TOTAL is a table calculation that ‘totals’ up the sales in the table. In this case we care about the whole table, but as with any table calculation it can be set to apply to certain partitions in the view.

Having worked out the % of Total Sales, we now need to work out where to plot each Sub-Category along the x axis, as due to the variable width of the bars, we can’t just use the Sub-Category dimension on the Columns shelf.

Each Sub-Category is going to be positioned based on its % of Total Sales relative to it’s position in the sort order. That is we need to work out the cumulative value of % of Total Sales.

% of Total Sales Cumulative

RUNNING_SUM([% of Total Sales])

This totals up the % of Total Sales as we go down the rows in the table, as shown below. I’ve included column grand totals, so you see that % of Total Sales adds up to 100%, and the cumulative version sums up the previous values as it goes down the table, ending in 100% (note the numbers have been rounded due to 0dp, but if you changed the formatting, you’d see this better).

Now we have all the core elements we need to build the viz.

Building the Viz

On a new sheet,

  • add Profitable Orders to Rows
  • add Sub-Category to Detail
  • add % Total Sales Cumulative to Columns

Edit the table calculation setting of the % Total Sales Cumulative field, so that it is computing by Sub-Category (Compute Using -> Sub-Category option from the drop down /context menu of the pill).

Set the sort option against the Sub-Category pill to sort by the Sort field descending

Change the mark type to Bar.

Add % of Total Sales to the Size shelf. Then change the Size option (by clicking on the button) from Manual to Fixed, and set the alignment to Right

Add Non-Profitable Orders to Rows, then right-click on the relevant axis, Edit Axis, and set to reversed

This is the basic viz – it just now needs formatting

  • Add Measure Names to Colour and adjust accordingly
  • Set a white border around each bar (via the Colour shelf)
  • Edit the % Total Sales Cumulative axis, and change to start from -0.05 which will give a bit of space at the front
  • Remove all row / column borders and gridlines
  • To get the black bar visible across the 0 line, I ended up adding a reference line; a constant set to 0, formatted to a black line.
  • Hide the axes
  • Add appropriate fields to the Tooltip and set accordingly.
  • Add Segment to the Filter shelf.

Then just add to a dashboard, and you’re all set. My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you do Comparative Quantity Analysis?

For this week’s challenge, Sean wanted to test us on a particular aspect of Tableau – the use of INCLUDE & EXCLUDE LOD expressions, although he then finished off the challenge intro with the statement “You can complete this challenge however you like but just DON’T USE A FIXED LOD”.

Like many, I am pretty comfortable using a straightforward FIXED LoD, but rarely get INCLUDE & EXCLUDE ‘out of my toolbox’. I usually end up using them because I’ve followed an example online to achieve a specific task. With this challenge, I had every intention of trying to make use of these ‘lesser used’ LoDs when I started out.

However I did struggle. I got to a point where I knew what I’d do with table calculations, but however I tried with various LoD statements, I couldn’t get the behaviour I desired. And at this point, I had a working solution that hadn’t used FIXED LoDs either, so based on Sean’s closing statement, I had achieved a result. When doing these challenges, I do have to consider how much of my time to devote to them (although COVID restrictions does mean my home/family/social life if not as busy at the moment), so I chose to publish so I could feel content I had completed the challenge. However, there was a niggle, that I hadn’t managed to apply the functions the challenge was really testing for.

A twitter post the next day from a fellow #WOW participant Nik Eveleigh, calling out how helpful Tim Ngwena‘s Include LoD YouTube video was, piqued my interest, and that evening I set about watching the video, and subsequently managing to create another solution using the required INCLUDE & EXCLUDE LoDs, without any table calculations either.

So for this blog, you’re gonna get two solutions πŸ™‚ I’ll explain the various fields required for each solution, and build out a tabular view to demonstrate. Building the view is then pretty much the same.

  • Getting Started
  • The Table Calculation Solution
  • The LoD Solution
  • Building the Viz
  • Adding the dashboard interactivity

Getting Started

First up, just a note about the data set. You need to use the Superstore dataset, but need a version that includes the Manufacturer field. This doesn’t seem to exist by default in the excel file included as part of Tableau’s install. So I used the data provided by Sean on data.world.

Secondly, once connected to the data source, we’re going to need a parameter that is ultimately going to be used to drive the dashboard interactivity later on. The parameter needs to store the value of the Sub-Category the user has selected on the chart. The parameter is required regardless of which solution you choose to follow.

pSelectedSubCat

This is a string parameter that contains the text Fasteners by default. Create this and then show it on the view.

The Table Calculation Solution

The first measure being displayed on the viz is the average Quantity by Manufacturer for each Sub-Category; that is for the total quantity of goods sold against each sub-category, how much on average did each manufacturer order?

Count Manufacturer

COUNTD([Manufacturer])

Identifies the distinct number of manufacturers

Avg Qty

SUM([Quantity])/[Count Manufacturer]

Lets put these fields out into a table, so we can see the values. I’ve sorted the rows by Avg Qty descending, and you can see the values match the required output.

Now we need to identify the Avg Qty associated to the selected Sub-Category.

Selected Avg

IF [pSelectedSubCat]=MIN([Sub-Category]) THEN [Avg Qty] ELSE 0 END

This will return the value we want for the row that matches the value of the parameter, and 0 otherwise. The Sub-Category dimension is wrapped in MIN as Avg Qty is an aggregated value, so the dimensions also need to be aggregated. MAX or ATTR will work as well.

Pop this field onto the table.

For the next step, we need to find a way to ‘replicate’ this 152 value across all the rows displayed. When building, I knew exactly what I would do next with a table calc, but I thought this is where the LoD needed to be used, and despite my best efforts I just couldn’t figure it, but more on that later. For now, we’re going to use a table calculation to replicate the value across the whole ‘window’.

Avg of Selected Sub Cat

WINDOW_MAX([Selected Avg])

This is basically looking for the maximum value of the Selected Avg field, which is 152 (as all the other values are 0).

Add this to the view, and set the table calculation to compute using Sub-Category. This is just good practice when using table calcs, as it means wherever the pill gets moved to in the view, it will retain the same computation.

Variance

[Avg Qty]-[Avg of Selected Sub Cat]

Add this onto the table, and again set the table calculation to compute by Sub-Category.

If you change the value in the pSelectedSubCat parameter, you’ll see the last 3 columns change.

So now we’ve just got a few more fields to create that we’ll need on the viz.

We need to be able to identify the selected Sub-Category which will be used to colour the bars.

Is Selected Sub Cat?

[Sub-Category]=[pSelectedSubCat]

The tooltip also shows the text ‘more’ or ‘less’ when describing the display, so we need a field for this

TOOLTIP: more or less

IF [Variance] <= 0 THEN ‘less’ ELSE ‘more’ END

And finally, the tooltip displays the variance as an absolute number, so we need a field for this too

TOOLTIP:Variance

ABS([Variance])

So this is all the core fields you’ll need to build the solution, so if you’re not interested in the LoD route, skip over the next bit πŸ™‚

The LoD Solution

After watching Tableau Tim’s video, I realised why I had struggled to get an LoD solution to work. I needed to revisit my approach to calculating the average quantity, a step I hadn’t originally considered.

Now in the interest of time, I’m not going to write out why the calculation below works in detail when Tim demonstrates it so eloquently in his video πŸ™‚

Qty Inc Manufacturer

{INCLUDE [Manufacturer] : SUM([Quantity])}

This field is going to help us get our ‘average quantity’ value.

Add Sub-Category, Quantity and Qty Inc Manufacturer into a tabular view, and change the aggregation of the Qty Inc Manufacturer pill from SUM to AVG. Order by this field descending, and you can see we have the same values as the Avg Qty field I created in the above solution.

This time, I’m going to start with the field to identify the selected Sub Category.

Is Selected Sub Cat

[pSelectedSubCat]=[Sub-Category]

Then we need to identify the value associated to the selected sub category, and will do this with two fields

Selected Qty

IF [Is Selected SubCat] THEN [Quantity] END

Selected Qty Inc Manufacturer

{INCLUDE [Manufacturer] :SUM([Selected Qty])}

Add these onto the view, and you can see the behaviour. Note this time I chose not to bother with an ‘Else 0’ statement (it wasn’t necessary for the above either).

Now we want to be able to ‘spread’ this value across all rows, and the Comparative Sales Analysis example in this Tableau blog post helps here (and I’ve just seen that Tableau Tim has produced an EXCLUDE LoD video too, which I’m sure will explain things very well).

Overall Selected Qty Inc Manufacturer

{EXCLUDE [Sub-Category], [Is Selected SubCat]: AVG([Selected Qty Inc Manufacturer])}

Note – I found I needed to add the Is Selected SubCat dimension into the statement, to get the tooltips working on both the bar charts.

And once again, we can now compute the variance, and the variance required for the tooltip, and the more or less values:

Variance

AVG([Qty Inc Manufacturer]) – MIN([Overall Selected Qty Inc Manufacturer])

TOOLTIP:Variance

ABS([Variance])

TOOLTIP : more or less

IF [Variance] <= 0 THEN ‘less’ ELSE ‘more’ END

Building the Viz

To build the viz

  • Add Sub-Category to Rows
  • Add Avg Qty (or Qty Inc Manufacturer set to AVG) to Columns. Sort by the field descending.
  • Add Variance to Columns
  • On the Avg Qty marks card, add Is Selected Sub Cat to the Colour shelf and adjust accordingly
  • On the Variance marks card, add Variance to the Colour shelf, and notch down the transparency to about 75%
  • Add TOOLTIP:Variance and TOOLTIP:more or less to the Tooltip shelf of the All marks card
Then it’s just a case of formatting the display and the tooltips.

Remember – if you’re working with the table calc solution, make sure all the table calculation fields in the display (those with the triangle symbol) have all been set to compute by Sub-Category.

Adding the dashboard interactivity

When you click on a Sub-Category, the variance needs to change. This is managed via a Parameter Dashboard Action. Once the viz has been added to a dashboard then add a dashboard action that on select of the viz, sets the pSelectedSubCat parameter, with the value of the field from the Sub-Category dimension.

There’s also an additional feature in this dashboard that stops the selected field from being highlighted on click. To stop this from happening, create 2 new fields

True

True

False

False

Add both of these to the Detail shelf on the All marks card of the chart viz.

Then on the dashboard, add a dashboard filter action, which on select targets the chart sheet, and maps True to False. As this is never a match ie ‘true’, the filter doesn’t apply.

Hopefully I’ve covered everything. My published vizzes are here

Happy vizzin’! Stay Safe!

Donna

Can you recreate the work of Hans Rosling?

This week, Lorna set the challenge to recreate an iconic data viz by the legend, Hans Rosling. You ABSOLUTELY MUST watch the TED talk link from the challenge if you haven’t ever seen this before. Hans is utterly engaging in making this data ‘come to life’!

The challenge involves the following key components

  • modelling the data
  • building the scatter chart
  • displaying the year in the chart
  • highlighting the regions

Modelling the Data

The data provided consisted of 4 files

  1. Life Expectancy
  2. Population
  3. Income
  4. Region Mapping

The first 3 files are stored as a matrix of Country (rows) by Year (columns) with the appropriate measure in the intersection. This isn’t the ideal format for Tableau so the data needs to pivoted to show 3 columns – Country, Year and the relevant measure (life expectancy, population or income depending on which file is referenced).

In Tableau Desktop, connect to the Life Expectancy file. If you find the column names all seem to nave generic labels, then right click on the ‘object’ in the data source pane, and select Field names are in the first row.

This will change so all the years are now listed as column headings

Now we need to pivot this data; click on the 1800 column, then scroll across to the end, press shift and click on the final column to select all the year columns. While highlighted, right-click and select Pivot.

Your data will be reshaped into 3 columns

I then renamed each column as

  • Country
  • Year
  • Life Expectancy

I also chose to change the datatype of the Year column to be a number (whole), as I knew I’d be needing to relate the data on this field later, and working with numeric data is more efficient than strings.

Now I want to bring in the next file – Population (via the Add connection option). But before following these steps, just read on a bit, as I hit a hurdle, which at the point of writing I’m still a bit unclear about…..

I added the Population file into the data source pane. By default it showed the Edit Relationship dialog, but I had the same issues with the column headings.

So I closed the dialog, and fixed the headings by setting the Field names are the first row option. Trying to apply the pivot at this point wasn’t available – no Pivot option when I selected all the columns. So I then manually edited the relationship and set County = country (population)

This populated the data, but again when I tried to apply the pivot option, it wouldn’t display 😦

I’m not sure whether this is possible (Google hasn’t shown me that it isn’t, but it hasn’t shown me that it is either…). I’ve tried using csv and excel versions of the file…..

So until I see the official solution (or I find out otherwise), I carried out the following approach instead :

I removed Population object, so I was just left with the Life Expectancy object. I then went to sheet 1, right clicked on the data source and selected Extract Data -> Extract and when prompted saved the file as Life Expectancy.hyper.

I then started a new instance of Desktop, and connected to the Population file, and went through the steps outlined above – set the Field names are first row, selected all the years to pivot, renamed the fields, set the datatype of the Year field to a whole number, then extracted and saved as Population.hyper.

I then repeated the exact same process with the Income file.

Once I had the 3 hyper files created, I then instantiated another instance of Desktop, and connected to the Life Expectancy.hyper file.

This automatically added an object called Extract into the pane. I renamed this (via right click) to DS-Life Expectancy. I also found my field names hadn’t been retained, so I renamed these too.

I then added the Population.hyper file. I set the relationships as below (once again I’d lost field names 😦 )

I then renamed the object & columns – note it won’t let you add fields with the same name as in the other object in the data pane, so I suffixed the columns

The process was once again repeated to add the Income data

and finally the Region data was also added matching Country to Name

And now, after all that, I have the data in the format I need to build the scatter. Phew!

[Side Note] – whilst writing this blog, I have been messaging my fellow #WOW participant Rosario Gauna. We often check in with each other, if we’ve noticed something odd with the challenges, to sense check whether its an issue with the data/the requirement, or just our own personal interpretation/development. In this instance we confirmed we were both using v2020.4.2, BUT Rosario had been able to do exactly what I set out to do at the start of this blog!! She had a pivot option on her 2nd connection. We’ve tried to figure it out, but have been unable to… I am using Windows, but Rosario is on a Mac, so there might be something there….. It will be interesting to understand if anyone else encountered my problem…. I really didn’t think the modelling part of the challenge was intended to be as fiddly as I’ve described above.

Building the Scatter Chart

The data we have goes beyond 2021, so we need to filter. Add Year to the Filter shelf and set the condition as below

We also have some countries that aren’t mapped to a region (Eight Regions field), so all add this field to the filter shelf and set to exclude Nulls.

Set both these filters to Apply to All Using this Datasource as they’re relevant across all sheets that will be built.

Now build the scatter by

  • Add Year to the Pages shelf
  • Add Income to Columns and change to Dimension (rather than Sum) – still leave to be Continuous
  • Add Life Expectancy to Rows and change to Dimension (rather than Sum) – still leave to be Continuous
  • Add Country to Detail

If you change the Year on the pages control to 2021 you should get something like

It’s roughly the right shape, but not spread as expected.

If you examine the Income axis on the solution, you’ll see the scale isn’t uniform. This is because it’s using a logarithmic scale instead, which you can set by right-clicking on the Income axis -> Edit Axis and selecting the relevant checkbox.

Now we can add Population to Size, change the mark type to circle, and we also want to colour by the region, but we need a new field for this to get it in the format we need for tooltips etc:

Region

UPPER(REPLACE([Eight Regions],’_’,’ ‘))

Add this to Colour and adjust accordingly, reducing the transparency a bit. Format the tooltips and you should have the main chart

Displaying the Year in the Chart

On a new sheet build a very simple view that just shows the Year in the Text shelf, that has been formatted to a very large (72pt) point, centred in the middle, and a pale grey colour.

On the dashboard, add this Year sheet. Then when you add the Scatter sheet, add it as a Floating object and position directly on top of the Year sheet. The Year sheet won’t be visible, so navigate to your Scatter sheet, right click on the canvas and format. Set the background of the worksheet to None ie transparent

If you return to the dashboard, the Year sheet should now be visible.

Note, as you add more objects to your dashboard, you will have to move things around a bit to get the layout as desired.

Highlighting the Regions

Create a simple table that displays the Regions in a row as below (the Show Header on the Region pill on the Columns shelf is unchecked).

Add this to the dashboard, then add a Highlight Dashboard Action that sources from the Region view and affects/targets the Scatter chart.

The final comment to make is that you can’t remove the ‘speed control’ from the page control display. However, when published to Tableau Public, it doesn’t show, so I assume is not enabled (or broken) there.

Hopefully this now gives you what’s needed to rebuild this fabulous viz. My published version is here.

Please do let me know if you had issues with your data modelling like I did!

Happy vizzin’! Stay Safe!

Donna

Can you build a “must include” filter?

Ann Jackson set this week’s challenge inspired by a business requirement she’d encountered, which was to be able to filter the data based on a user selection, and then additionally apply another filter to narrow down the required data.

The task ‘splash’ page had already indicated the challenge would involve sets, and the requirements additionally stated you’d need version 2020.2 or higher, which added an additional hint that set controls would be involved. Examining Ann’s solution, I could also see that the tell-tale set icon displayed when hovering near the input control, which confirmed the ‘filters’ were indeed using the set control functionality.

  • Building the bar chart
  • Viz in Tooltip
  • BANs

Building the bar chart

Let’s start by focusing on the bar chart. The first thing to do is to rename several fields. Ann likes capital letters, so the following fields just need to be renamed so they’re all in uppercase: Sales, Quantity, Product, Order ID and Customer (Name). Sales can also be formatted to $ with 0dp.

We create the first set we need by simply right-clicking on the PRODUCT field -> Create -> Set. I named this 1st Products and set to the Use All condition.

Lets build out a basic table and then we can play with how we need to use this set. Put

  • ORDER ID on Rows
  • CUSTOMER on Rows
  • Add Sales to Text
  • Drag Quantity over the Sales column, so Measure Names gets added to Columns and Measure Values is on Text
  • Sort by Sales descending

If we now add the 1st Products set to the Filter shelf and choose the Show Set option, the list of products in the set will be displayed for selection.

Choose the 1st item in the list, and the data will be filtered to just those customers who have ordered that product

But, while this list matches the names if you do the same on the solution, the Sales and Quantity values differ. This is because, we have actually filtered the data just to the lines containing the selected product. If you bring PRODUCT onto Rows, you’ll see you have 1 line per customer.

Now, while you can use LOD calculations to compute the total sales/quantity at an order id level, to resolve this, it won’t solve the next step of the puzzle, to filter the data further by other products on the same orders, as we’ve already filtered out all the other data. So we need to do something else.

To demonstrate, we just want to keep a handle on these specific ORDER IDs so ctrl-click to select them all, and then Include to add them to the Filter shelf. Now remove the 1st Products set from the Filter shelf, then re-add back to the Rows.

You’ll see we now have all the product lines on each order, with an In or Out displayed against each row. The rows which match the PRODUCT selected in the set, is marked as In. This is essentially what the filter is doing if 1st Products is on the Filter shelf – it is filtering the rows to those which return as In (the set).

We basically want a way to be able to identify all the rows on an order which have at least 1 In row, so we can filter on that instead. For this we can create a new field,

Order has 1st Products

{FIXED [ORDER ID]: MAX([1st Products])}

What this is doing is saying for each ORDER ID, get the maximum 1st Products value, which is then ‘stored’ against each row for the same ORDER ID. This may look odd, as 1st Products from the display is showing an ‘In’ or an ‘Out’, so how does the MAX work? Well, whilst the display is In or Out, under the bonnet this is actually a boolean field of 1 = True = In or 0 = False = Out, so the MAX is actually returning either a 1 (true) or a 0 (false).

If you add Order has 1st Products to Rows, you can now see every row in this restricted data set is listed as True

Remove the ORDER ID from the Filter shelf, and you should observe all the other orders listed have False listed against every row.

So we can now add Order Has 1st Product = True to the Filter shelf, and we can remove 1st Products from Rows too.

So we’ve mastered the 1st product filter, time for the 2nd. Once again right-click on PRODUCT and create another set with the Use All condition. This time I called the set 2nd Product.

Just as before, we want to identify all orders that contain this 2nd product selected, so we’ll create another calculated field

Order has 2nd Product

{FIXED [ORDER ID]: MAX([2nd Product])}

Add this field to the Filter shelf and select True. Now click on the 2nd Product set in the Dimesions pane, and selected Show Set, to display the list of products

You’ll notice though that all products are listed, and not just the ones that are associated to the set of orders already filtered. To fix this, we need to add the Order Has 1st Products field to context (click on the Order Has 1st Products pill on the Filter shelf and Add to Context). This will change the pill to a grey colour, and you can now restrict the values in the 2nd Products set to only show All Values in Context.

The add to context function allows us to dictate the order in which the filtering is applied, so the data is primarily filtered based on our 1st Products set and then filtered based on the 2nd Product.

You should now be able to play around with the filtering and see how the rows for complete orders are retained.

Removing PRODUCT from the Rows, and we get back to the total order value / quantity values we’re expecting.

From this, you should now be able to build out the viz. Firstly, duplicate the sheet so we retain the original table, then move SALES & QUANTITY to the Columns, add Measure Names to Colour, add Labels to match mark colour, and format accordingly.

Viz in Tooltip

As an ‘added extra’ Ann added the ability to hover on a bar and see the details of the order in a simple table. We can use the original table we built above to demonstrate the functionality. Add PRODUCT to Row, and replace the QUANTITY field, with a new field you need to create called QTY, which is just based on QUANTITY ie

QTY

[QUANTITY]

Then hide the ORDER ID & CUSTOMER fields (uncheck Show Header) on the pill. You could remove completely, but I like to retain just in case we need to debug if things don’t work.

We will need to show some totals, but this isn’t the Grand Total. Grand Totals will work, but the background of the row will always be a fixed colour. Either white, by default, or set to grey if you choose. However I noticed that depending on the order id I hovered on, the Grand Total row may display grey or white depending on the existing banding. For this we need to use subtotals instead (Analysis menu -> Totals -> Add All Subtotals).

Because we have multiple (hidden) dimensions on the rows, multiple Total rows will display. We can remove by unchecking the Subtotals option against the ORDER ID pill.

And finally, we need to relabel the row. Right-click on the Total row and select Format. Change the label of the Totals section from ‘Total’ to ‘Grand Total’.

With additional formatting to change the font size etc, this sheet can now be added as a ‘Viz in Tooltip’ to the bar chart. On the Tooltip shelf associated to the bar chart, you can Insert -> Sheets -> <Select Sheet>

Hovering over a bar should now automatically filter the tabular display to the appropriate CUSTOMER & ORDER ID.

BANs

For the BANs, we need several additional calculated fields

# ORDERS

COUNTD([ORDER ID])

simply counts the number of distinct Order IDs in the filtered view.

Total Orders

{FIXED:COUNTD([ORDER ID])}

an LOD calculation that counts the overall number of distinct Order IDs, regardless of any filters applied.

% OF TOTAL ORDERS

[# ORDERS]/SUM([Total Orders])

formatted as a % to 1 dp.

AVG ORDER AMOUNT

SUM([SALES]) / [# ORDERS]

formatted to $ 0 dp.

AVG ORDER QUANTITY

SUM([QUANTITY])/[# ORDERS]

formatted to a number with 0 dp.

Add Measure Names to the Filter shelf, and filter to the relevant 4 measures. Add Measure Values and Measure Names to the Text shelf, and Measure Names on Columns. Add the Order Has 1st Products = true and Order has 2nd Product = true to the Filter shelf, but this time don’t add anything to context. If you do, it will affect the Total Orders measure, and the % of orders calc will be wrong. Hide the Measure Names on the columns (uncheck show header), and format the text appropriately.

To get the thick lines, format and set the Row Divider on the sheet to be a thick coloured line.

And you should now have all the core building blocks needed to add onto a dashboard.

My published viz is here.

Happy vizzin’! Stay safe!

Donna

Can you hide a chart in map layers?

Candra set the challenge this week to use the new map layers to build a map display which, on click of a country, filtered the display to that country and additionally displayed a donut chart indicating the percentage of urban dwellers in that country.

If map layers are very new to you, then the webinar by Adam McCann referenced in the challenge, has some VERY useful pointers for this challenge (the workbook for that can be downloaded from here, as I found some things needed closer inspection).

I think this challenge is going to be best described by walking through the steps.

Building the 1st map layer

Double-Click on Country/Region to load a map, and change the mark type to (filled) Map. Add Region to the Colour shelf and assign the appropriate colours. Adjust the map background via the Map -> Map Layers menu and set the Style to dark, and remove all the selections against the Map Layers list

The intention is when a country is selected the map will ‘drill into’ /filter that country, and display additional information. We will drive this by a parameter, which will get set via a parameter action, but for now we’ll manually set the value.

Create a new string parameter, that is default to nothing/empty string/ ”, and then show this on your sheet.

pSelectedCountry

We need the 1st layer of the map to display, when there is no value in the parameter. We need a calculated field to help drive this.

All Countries

IF [pSelectedCountry]=” THEN [Country/Region] END

By default this will create a field of type ‘string’ but we need it to be a geographic data type. so change this as below.

Add this field to the Detail shelf of the map, and remove the Country/Region field that was automatically added when we first built the map.

Enter the name of a country, eg China, into the parameter. The map should essentially go blank (black screen).

Building the 2nd map layer

We need the country to display, if it’s entered into the parameter. For this, we need another calculated field

Selected Country

IF [pSelectedCountry]=[Country/Region] THEN [Country/Region] END

Once again change this to a geographic role data type of type County/Region.

Next click and drag this field onto the map, and drop it onto the Add Marks Layer option that displays. This will create a new Selected Country marks card, although nothing will obviously change on the map display itself.

Move the Selected Country pill to be on the Detail shelf instead, and add Region onto the Colour shelf. Change the mark type from circle to Map.

Now if you enter a county into the parameter, eg China, the display should ‘filter’ & ‘zoom in’ on China.

So what we have is the 1st layer only showing when no countries have been selected, and vice versa, the 2nd layer only showing when a country has been selected.

We need to now add further layers for the donut chart, which only want to show the country has been selected as well.

Building the 3rd map layer

A donut chart, in the past, is traditionally created by building a pie chart, then using a dual axis to add a circle, sized smaller that the pie chart, on top (see this Tableau KB for info). Rather than a dual axis, we’re going to use map layers – 1 layer for the pie chart, and then another layer for the central circle.

Keeping a country selected (so we can see what we’re building), drag Selected Country onto the map again to create another map layer. Change the mark type to Pie and increase the Size to as large as possible. Move Selected Country to Detail.

In the data set we have a field called Population Urban which stores the ‘percentage’ value of urban dwellers eg 0.17 is 17%. To create the angles for the pie chart, we need to know

Population Non-Urban

1-[Population Urban]

Drag Measure Values onto the Angle shelf. This will automatically add Measure Names to the Filter shelf. Edit the filter to just select the Population Urban and Population Non-Urban measures. Drag the Measure Names field that was also automatically added to the Detail shelf, to the Colour shelf. Adjust colours accordingly, and set the border of the pie chart to white (under the Colour shelf options).

Verify that if you set the parameter to empty again, the whole world map displays, and you can’t see any pie charts.

Building the 4th map layer

Now we need to make the donut hole. Once again, ensure a country is selected, so your pie chart is visible, then drag Selected Country onto the map again, and drop to add another map layer.

This time, move the Selected Country field onto the Detail shelf, add Region to the Colour shelf, and adjust the size of the circle, so its smaller than the pie. Set the border of the circle to be white again too,

Add Population Urban onto the Label shelf, and format to a percentage with 0 dp. The best way to do this, is to format the Population Urban measure in the data pane (right click->default properties -> number format).

At this point you’ll notice the number is huge… we need to add Year to the Filter shelf, and select 2012.

Align the label to be middle centre, and adjust the font to be much bigger text. Add ‘urban dwellers’ underneath.

Once again, verify you get the expected behaviour as you change the values in the parameter from nothing to Russia or China etc.

The final step on this sheet is to add text to the Tooltips. Unlike when working with dual axis, you don’t have an All marks card, so you’ll need to add the required fields (Country/Region, Region, Population Total (formatted to Millions with 0dp), Population Urban to the Tooltip shelf on each of the relevant layers.

Setting the parameter interactively

Create a dashboard sheet, and add the map sheet you’ve built. Then create a dashboard action which sets the pSelectedCountry parameter, impacting the All Countries field, and that when the selection is cleared, the value is reset to ”.

My published viz is available here. Enjoy!

Happy vizzin’! Stay Safe!

Donna