Can you create digital signage in Tableau?

Community Challenge month continues this week, and Yusuke Nakanishi set this challenge where the concept is to engage the audience by providing some automated highlighting of the dimensions in the viz.

This challenge did take a bit of thinking, and what I document below is ultimately what you’ll need to build it as I did; however it took me a bit of ‘back and forth’ to get there.

Modelling the data

The viz essentially ‘ranks’ the Sub-Categories based on their Sales in 2023 (Note – I used the version of Superstore shipped with v2023.1, so I filtered to 2023 to get the same data as presented in the solution). The page control cycles through the rankings, highlighting the Sub-Category with the matching rank. Adding pills to the Page shelf essentially applies a filter to the data, so I couldn’t use the ranking calculation I created, as that would filter out all other data. I therefore created a ‘scaffold’ data source in Excel, which contained a single column called Number and contained a row for each number from 1 to 50 (I didn’t want to limit it to the number of Sub-Categories I knew existed, so I could demonstrate a more ‘generic’ solution).

I then related the Superstore data set to the excel sheet using a calculated relationship of 1=1

Creating the core calculations

We’ll work against a tabular view of the data to start with, so add Order Date to Filter and filter to the latest year in the data set you’re using (for me this was 2023, but it might be 2022 for you). Then add Category and Sub-Category to Rows and Sales to Text. Sort by Sales descending. Format Sales to $ with 0 dp.

We need to calculate two different rankings. One is the overall rank of each Sub-Category and is used in the highlighting via the page shelf. The other is the rank of each Sub-Category per Category and is needed so the bars can be displayed in the right places.

Overall Rank

RANK(SUM([Sales]))

Add this to the table, and set the table calculation properties to compute using both Category and Sub-Category, so the ranking runs from 1 to 17

Rank by Category

RANK(SUM([Sales]))

Add this to the sheet, and while this is the same calculation as the overall rank, apply the table calculation to compute by Sub-Category only. This means the ranking will restart for each Category

We need to capture the total number of Sub-Categories

Total Sub Cats

{FIXED Year([Order Date]): COUNTD([Sub-Category])}

Add this to the table.

At this point, the only data we’ve referred to in the calculations is the data from the Superstore data set. If we now add Number to Rows, we get every row duplicated 50 times. We need the Number to help with the paging, but we only want to page through 17 items (the number of Sub-Categories). To restrict this we need

Index to Include

[Number]<=[Total Sub Cats]

Add this to the Filter shelf and set to True. This will limit the duplicate rows to 17.

Now add Number to the Pages shelf. The page control will automatically be set to 1, and you’ll see all the data not related to Number =1 disappear.

We need to be able to identify/highlight the Sub-Category whose overall rank matches the value in the page control.

Rank to Highlight

[Overall Rank] = SUM([Number])

Add this to Rows and verify the table calculation is set to compute by Category & Sub-Category only. There should only be 1 row that is True per Sub-Category, and only 1 row in all the data that has Rank to Highlight = True and has data displayed in the other columns.

Move Rank to Highlight to the Colour shelf and remove the Number field from Rows. You should now only have 1 row per Sub-Category and as you cycle through the page control, the relevant row is highlighted in a different colour.

The above calculations are enough for us to build the bar chart, but we also need to be able to identify the selected Category to highlight on the BANs.

Category to Highlight

MIN([Category]) = WINDOW_MAX(IF [Rank to Highlight] THEN MIN([Category]) END)

Breaking this down: If the row is the rank we want to highlight, then return the Category associated to this row, and ‘spread’ it across all other rows (via the WINDOW_MAX function). If the Category for the row, matches this value, then return True.

To see how this is working, add the field the the Tooltip shelf of the table of data, and set the table calculations to compute by Category and Sub-Category (note there are two nested calculations within this calculation and both need setting).

If you hover over any row that has the same Category as the row that is highlighted, the Category to Highlight value will be True, but hovering over rows in the other Categories, will return False.

Building the Bar Chart

On a new sheet add Category to Columns, Sales to Columns and Sub-Category to Detail. Then add Rank By Category as a blue discrete pill to Rows. Verify the table calculation is set to compute by Sub-Category only.

Add Order Date to Filter and restrict to the latest year. Add Index to Include to Filter and set to True.

Double click into the Columns shelf and manually type MIN(0) to create another axis.

Change the mark of the MIN(0) marks card to be Gantt Bar and set the Size to be as small as possible. Add Sub-Category and Sales to the Label shelf of the MIN(0) marks card, and adjust the label so the text is arranged as required. Add Measure Names to the Rows shelf to create a row per Measure.

Make the chart Dual Axis and Synchronise the axis. Change the mark type of the Sales marks card back to a bar, and remove Measure Names from the All marks card.

Add Number to the Pages shelf, then add Rank to Highlight to the Colour shelf of the All marks card. Adjust the table calculation so it is computing by both Category and Sub-Category, then adjust the colours to suit.

Edit the Label of the MIN(0) marks card, so the font is bigger/bolder (I used Tableau Medium Bold 10pt) and set the font to match mark colour.

Hide the MIN(0) top axis, the Category headings and the Measure Names column (uncheck show header). Edit the Sales axis: remove the title, fix the axis from 0 to 199,000, and fix the axis ticks every 100,000. Format the Sales axis, so the tick marks are labelled in $k format. Format the Rank by Category column so the text is bolder and larger (I used Tableau Medium 12pt). Hide the column label (hide field labels for rows), and make the column narrower

Format the sheet, so the background is grey, remove all row dividers, set the column dividers to be thicker, to not apply to the header, and to be at Level 2

Adjust the colour of the column grid lines so they show up on the grey background. Adjust the formatting of the Sales axis labels. Remove all tooltips. Scroll through the page control and verify the highlighted dimensions change.

Building the BANs

On a new sheet, add Category to Columns and Sub-Category to rows. Add Order Date to Filter and restrict to the latest year. Add Index to Include to filter and set to True. Create a new field

Sales per Category

{FIXED YEAR([Order Date]), [Category]: SUM([Sales])}

format to $ with 0dp and add to Text.

Add Rank By Category as a blue discrete pill to the Rows shelf, and move Sub-Category to Detail. Verify the Rank by Category table calculation is set to compute by Sub-Category only.

Add Number to the Pages shelf, and then add Category to Highlight to the Colour shelf. Verify both the nested table calculations are set to compute by Category and Sub-Category only, then adjust the colour accordingly.

Move the Rank by Category field from the Rows shelf to the Filter shelf and set to filter to 1, so only the first row displays. Add Category to the Text shelf, and adjust the text positioning and size as required, and align centrally. If you wish, change the mark type to Shape and set to use a transparent shape. (this just gives a nicer user experience when clicking/hovering on the text – see this post for information). If you do this, you’ll need to adjust the Label to match mark colour. Hide the Category header, remove row dividers and format the background to be light grey. Remove the tooltip and update the title.

Building the Summary Ranking

While the requirements suggested the viz should be built using 2 sheets, I couldn’t manage this (I was hoping to use the Caption feature on the bar chart, but as I used table calculations, the values wouldn’t retain). So I built this section with a 3rd sheet, and checking the solution after posting, found that it also had 3 sheets.

On a new sheet, add Order Date to Filter and set to the latest year, and add Index to Include to filter and set to True. Add Sub-Category to Detail, Number to the Pages shelf and then add Rank to Highlight to Filter and set to True. Verify the table calculation is set to compute by Sub-Category only, and if adjusted, recheck the filter is set to just True. You should have 1 mark displayed, and on hover, it should list the Sub-Category with the rank that matches the number of the page.

Add Overall Rank (set to compute by Sub-Category only), Total Sub Cats and Sub-Category to the Text shelf. Adjust the size and format of the text and align centrally. Again if preferred, change the mark to Shape and use a transparent shape.

Set the background to light grey, hide the tooltip, and verify the display changes as you cycle through the pages.

Now arrange all the sheets onto a dashboard using padding if required to provide some spacing.

My published viz is here.

Happy vizzin’!

Donna

Can you create a jittered bar chart?

The theme of alternative chart types continued with Lorna setting this challenge to create a jittered bar chart. We needed to use the same fake survey data set used in week 4, so initially some data remodelling was required. I’ve already blogged that here, so please refer to the modelling the data section if you need help.

The questions all need to be grouped based on the text that preceded the question, so I created

Question Group

TRIM( SPLIT( [Question], “-“, 1 ) )

which split off the text in front of the first ‘-‘. I actually created this by right-clicking on the Question field > Transform > Custom Split and selecting to to split off the first 1 column using the – separator.

I also created a field to number the responses

Response No

CASE [Response]
WHEN ‘Strongly Disagree’ THEN 1
WHEN ‘Disagree’ THEN 2
WHEN ‘Neutral’ THEn 3
WHEN ‘Agree’ THEN 4
ELSE 5
END

Once I’d done that I was able to ratify the numbers by building a simple table with Response No and Response on Columns, Question Group on Rows, and CNT(Id) on Text.

Eyeballing these numbers against where the ‘bars’ in Lorna’s viz stopped and I figured that was on the right track. However then I wasn’t particularly sure what to do next.

Lorna’s instructions were very brief. I figured I’d need to use a jitter (ie RANDOM()) somewhere, and as I’d need to add Id to the Detail shelf at some point, to generate 1 dot per respondent, I deduced I’d also need

Total Respondents

{FIXED [Response], [Question Group]:COUNT([Id])}

Adding this into by table above and I got the same values as CNT(Id) which is what I expected.

After scratching my head a bit, I decided that Lorna was possibly being vague, as there was probably help ‘out there’. So I googled, and immediately stumbled upon this very useful blog post, which pointed me in the right direction for the additional calculations required.

I created

Columns

RANDOM()

Jitter

RANDOM()

note – due to the way RANDOM() works, Jitter and Columns won’t contain the same value.

Rows

[Jitter] * [Total Respondents]

And with these I could build out the viz

Add Response No, Response and Columns to Columns. Change Columns to be a dimension.

Add Question Group and Rows to Rows. Change Rows to be a dimension. Add Id to Detail.

Change the mark type to circle, reduce the size and add Question Group to Colour and adjust accordingly.

Finally add Question to Tooltip and adjust the tooltip. Then remove all gridlines, hide the Columns axis, change the title of the Rows axis, and remove the row and column titles (hide field labels for rows / columns).

A pretty short blog today! My published viz is here.

Happy vizzin’!

Donna

Can you visualise survey data?

It was Sean’s first challenge of 2023, and he asked us to recreate this chart . Sean referred to it in the requirements as a floating bar chart, others may refer to it as a diverging bar chart or likert chart.

Modelling the data

Fake survey data was provided which was structured with each question in a column

The first thing we need to do is to pivot the data, so we have 1 row per question, per respondent. On the data source tab in Tableau Desktop, multi-select the question columns (Ctrl-click), then click the context menu for one of the selected columns and select Pivot

Due to the width of the screen and the number of columns, you may have to do this in multiple steps. Just multi-select the next set of columns, and select Add Data to Pivot from the context menu instead.

Once complete you should have a data set with 3 columns

Rename Pivot Field Names to Question and Pivot Field Values to Response.

Building the calculations

All the questions have 5 options to respond to ranging from Strongly Disagree -> Disagree -> Neutral -> Agree – > Strongly Agree. Sean hinted we would need to group the questions based on their sentiment. Some questions were worded in a positive way, which meant ‘agree’ responses could be taken as a ‘good thing’. While other questions were worded in a negative way, meaning agree’ responses were actually a ‘bad thing’.

So the first thing to do was to group the questions. I used the group functionality (right click on Question -> Create -> Group).

Sentiment

All the questions where I deemed an ‘agree’ answer was a favourable response to the product I grouped under Positive.

All the question where I deemed an ‘agree’ answer was a non-favourable response to the product I grouped under Negative.

I then wanted to bucket the responses into 3 groups – Disagree, Neutral and Agree – based on the combination of the actual Response, and the Sentiment grouping.

Response Group

IF [Sentiment] = ‘Positive’ THEN
IF CONTAINS([Response],’Disagree’) THEN ‘Disagree’
ELSEIF CONTAINS([Response], ‘Agree’) THEN ‘Agree’
ELSE ‘Neutral’
END
ELSEIF [Sentiment] = ‘Negative’ THEN
IF CONTAINS([Response],’Disagree’) THEN ‘Agree’
ELSEIF CONTAINS([Response], ‘Agree’) THEN ‘Disagree’
ELSE ‘Neutral’
END
END

With this I am essentially swapping disagree responses against questions of a negative sentiment into the agree/positive grouping and vice versa.

Finally I need another couple of calculations just to sense check were we are

Count Respondents

COUNTD([Id])

Question – Display

TRIM( SPLIT( [Question], “-“, -1 ) )

This removes the text that precedes the ‘-‘ from the Question string. I actually created this field using the Split functionality. Right click on Question -> Transform -> Custom Split. Use the separator – and split off the Last 1 columns

This will generate the field int he dimensions pane which I then renamed.

If we put all these out into a table, we can see how the data is shaping up

The next step is to work out the NPS values. When I’ve dealt with NPS in the past, I’ve referred to terms such as Promoters and Detractors, with an NPS score being computed as (Number of Promoters – Number of Detractors)/Total No of Respondents.

In this case a Promoter is anyone who is in the agree bucket.

Promoters

IF [Response Group]=’Agree’ THEN 1 ELSE 0 END

while a Detractor is anyone who is in the disagree bucket

Detractors

IF [Response Group]=’Disagree’ THEN 1 ELSE 0 END

The total number of respondents is

Total Respondents

{FIXED:COUNTD([Id])}

So I can now calculate

NPS

(SUM([Promoters])-SUM([Detractors]))/[Count Respondents]

Format this to percentage with 0 dp.

Let’s put this into another table so we can validate the data.

Ok, so this gives us the data required to plot the circle marks.

But we need to work out what’s need to plot the bars. And for this we need to know the respondents who are neither Promoters or Detractors.

Neutrals

IF [Response Group] = ‘Neutral’ THEN 1 ELSE 0 END

We are going to plot 2 bars for each row. 1 bar that represents the proportion of respondents who are in the positive side and 1 bar to represent the proportion of respondents who are in the negative side. The positive side mainly consist of the promoters, while the negative is the detractors. However, we need to consider the neutrals too, and we do this by halving the number and making half positive and half negative. Let’s see what I mean

# Respondents – Positive

(SUM([Promoters]) + (SUM([Neutrals])/2))/SUM([Total Respondents])

format this to percentage with 0 dp.

Similarly

# Respondents – Negative

-1 * ((SUM([Detractors]) + (SUM([Neutrals])/2))/SUM([Total Respondents]))

format this to percentage with 0 dp.

In this case the result is multipled by -1 as we want the bar to be plotted on the negative side of the axis.

Let’s pop these into the table too.

Ok. So now we have the core building blocks required to start building the viz.

Building the Likert Chart

Add Question – Display to Rows and # Respondents – Positive to Columns.

Click and drag # Respondents – Negative onto the canvas and drop the pill on the bottom axis when you see the 2 green columns icon appear.

This will automatically add the pill onto the same axis, and the view will update to use Measure Names and Measure Values

Remove Measure Names from the Rows and the bars will all appear on the same row. Sort the Question – Display field by NPS descending.

Add Response Group to the Colour shelf and adjust colours accordingly (using the Nuriel Stone palette). I also set the transparency to about 70%. You may need to reorder the values of Response Group – just do this manually in the colour legend if need be.

Increase the width of each row a bit, expand the Question – Display column, and remove the tooltips from displaying. Format the Question – Display column so it is aligned left and the font is a bit bolder/darker. Remove the Question – Display column heading (right click label > hide field labels for rows).

Format the axis to display as percentages to 0 dp, then edit the axis and

  • remove the title
  • fix the axis from -1.05 to 1.05
  • fix the tick marks to display every 0.5

Adjust the row banding to show, and remove any column/row dividers. Remove all gridlines. Adjust the column axis rulers and tick marks to be a solid dark line. Set the zero line for the columns to be a wider solid white line – it should just about be visible through the viz.

Now we need to label the agree and the disagree sections only. For this we need additional fields

Label – Positive

If MIN([Response Group]) = ‘Agree’ THEN [# Respondents – Positive] END

and

Label – Negative

If MIN([Response Group]) = ‘Disagree’ THEN [# Respondents – Negative] END

Add both these fields to the Label shelf and arrange side by side. Manually adjust the colour of the font to a dark grey.

This is the core design of a likert chart. It could be bult using dual axis too, but that wasn’t an option in this instance, as we need to add the additional NPS circles to the chart.

Adding the NPS circles

Add NPS to Columns. On the NPS marks card, remove all the fields, and change the mark type to circle. Add NPS to the Colour shelf, and adjust the colours so they range from the same yellow/green to blue/teal colours already used. The easiest way to do this is to click on the coloured square at each of the edit colour dialog and use the pick screen colour option to select the colours already used.

Set the colour to be 100% opacity and add a white border around the circles (via the Colour shelf)

Set the chart to be dual axis and synchronise the axis. Change the mark types back to bar and circle accordingly if they changed.

Now we can see some of the labels on the bar chart are positioned where the circles are/very close to them. Manually move those affected – click on the section of the bar to highlight it, then click on the label and when the cursor changes to a crosshatch, drag the label to where you want. Note – this isn’t something I would typically do if the data was to change behind this viz without any manual involvement.

Hide the top axis (uncheck show header) and remove all row/column dividers.

Now we just need to label the circles. I need additional fields for this. Firstly, the NPS field is on the wrong format – it’s actually stored as a decimal, formatted to a percentage. Secondly, I ended up with two label fields, due to the colouring. Sometimes when you add a label to a mark, the colour of the text may display black or white depending on the contrast with the underlying colour. Tableau automatically does this, and it isn’t something you can really control. As soon as you start manually adjusting any colours, it can be hard to get back to the right state. I fiddled around trying to get things to work properly for a while using the auto colouring (which involved rebuilding by starting with the NPS dot plot and then adding the bars), but it still wasn’t a ‘cast-iron’ solution. So I ended up creating

Label -NPS-Black

IF [NPS] * 100 <=70 THEN [NPS] *100 END

Label-NPS-White

IF [NPS]*100 >70 THEN [NPS]*100 END

I just chose an arbitrary ‘cut off’ for when the colour might change.

Add both these fields to the Label shelf of the NPS marks card, and arrange them side by side. Edit to font so the Label-NPS-White field is coloured with white font, and the other black. Set the alignment of the text to be middle centre. Adjust the size of the circles if required. Verify tooltips don’t exist on any marks.

Stopping the chart from being clicked

In the solution, if you try to click on the bar chart, you get no interaction – nothing highlights, like it does if you click on the axis or the questions. On a dashboard, this is managed by adding a floating blank object and carefully placing it over the top of the section you don’t want to be clickable.

And that should be it! My published viz is here.

I have to admit I did have a few false starts when building this out, and its highly probable this could have been created without all the fields I ended up with. It’s sometimes just the path your brain follows, and there’s nothing wrong with that!

Happy vizzin’!

Donna

Can you performance tune this workbook?

This week’s #WOW2021 challenge was a bit different. Rather than being challenged to recreate a viz, Candra challenged us to performance tune a workbook she’d put together, with the aim for all steps to be sub 0.5s.

I started by downloading the slow published workbook (waiting patiently for it to download), opening the workbook ( waiting patiently for it to open), and then assessing how it had been built.

It took a while to download… it was 99MB…

When it finally opened, I could see

  • The workbook had multiple data sources, which were being blended
  • There were cross data source filters being applied
  • There were lots of fields in the data source that weren’t being used
  • One of the data sources contained 2.2 million rows

Tackling these points initially was key.

The requirements stated that the end user only cares about OB/GYNS (specifically Pri Spec = OBSTETRICS/GYNECOLOGY), so the hint here was we’re allowed to restrict the size of the data source.

The requirements also stated any data prep/transformation had to happen in Tableau Desktop, hinting that some re-modelling would be required.

This means you do need to download the original data source files (which also took some time). My aim was to create a single data source, that would be restricted to the OB/GYNS data only, that I could then repoint the existing report to.

I created a new connection within the ‘slow’ workbook by connecting to the two files and relating them – the relationship of St to St should happen automagically 🙂

I then added Data Source Filters (right click on the data source > Edit Data Source Filters), and set the filters to restrict to the relevant Pri Spec and exclude certain states (which I found was applied when looking at the data source filters applied to one of the original data sources).

Next I repointed the workbook to use this new data source : right click on the original DAC_NationalDownloadableFile data source > Replace Data Source, and repoint to the DAC_NationalDownloadableFile.csv+ (Multiple Connections) data source

This caused an issue with the NPI field which was a string in the original and a number in the replacement.

Resolve this by right clicking on the NPI (1) field and select Replace References and point to the NPI field. This will then fix the issue with the Providers per Capita field too.

On the Providers per Capita sheet, remove both the pills from the Filter shelf and replace the Population field on the Tooltip shelf that is from the secondary data source (since there is an orange tick against it), with the Population field from the newly created data source (literally drag it and drop it onto the existing pill so it replaces it). You’ll need to format the Population field to millions with 1 dp.

Edit the Providers per Capita measure from

round(COUNTD([NPI])/(SUM([Sheet1 (State Regions)].[Population])/100000),0)

to

COUNTD([NPI])/(SUM([Population])/100000)

This pill is no longer referencing data from another data source. The ROUND isn’t necessary as this can be managed with the number formatting (so its one less calculation to be done and shaves a teeeeeny bit of time).

You can now remove the original 2 data sources (right click data source and close).

Now extract the remaining data source – right click > Extract Data. The data source filters you applied, will automatically be set to be extract filters. Also set the extract to Hide Unused Fields. The extract filters will reduce the number of rows in the data source; the hide unused fields will reduce the number of columns, all of which optimises the data source as much as it can be.

Change to the dashboard sheet now.

Remove the Primary Speciality filter as its no longer relevant (contains 1 value).

I chose to change the City filter to ‘All values in database‘ and to be a Wildcard Match input, so the user types in rather than selecting from a list. Using fields that have a high number of values as a filter and displaying all the possible values in the filter can impact performance. A separate query needs to be run to populate the list. Using ‘relevant values’ again causes a query to happen each time other selections are made with other filters/actions. This can also add overhead.

For the same reasoning, I also changed the NPI filter in the same way,

I also changed the Gender filter to be a Single Value List (ie radio button) entry, set to Include rather than Exclude values.

With these changes I ran the performance recording (Help menu -> settings and performance -> start performance recording), followed the required steps (select Texas, deselect Texas, set Gender to F, reset back to All), then stopped the performance recording ( Help menu -> settings and performance -> stop performance recording). This will automatically load a new workbook with the results.

The best I got to on my personal laptop is below – not the 0.5s suggested in the challenge, but a lot better than it was.

Running the same process against the same workbook on my more powerful work laptop, I got :

which suggest hardware does help 🙂

After a bit of discussion with another #WOW participant, it was suggested reordering the columns in the table (just reverse the order) should help. So I tried this, and shaved a bit more time (this is personal laptop, so compare to top image above).

I’m interested in seeing what other solutions and suggestions get shared and what Candra came up with herself.

Happy vizzin’! Stay Safe!

Donna

Can you recreate the work of Hans Rosling?

This week, Lorna set the challenge to recreate an iconic data viz by the legend, Hans Rosling. You ABSOLUTELY MUST watch the TED talk link from the challenge if you haven’t ever seen this before. Hans is utterly engaging in making this data ‘come to life’!

The challenge involves the following key components

  • modelling the data
  • building the scatter chart
  • displaying the year in the chart
  • highlighting the regions

Modelling the Data

The data provided consisted of 4 files

  1. Life Expectancy
  2. Population
  3. Income
  4. Region Mapping

The first 3 files are stored as a matrix of Country (rows) by Year (columns) with the appropriate measure in the intersection. This isn’t the ideal format for Tableau so the data needs to pivoted to show 3 columns – Country, Year and the relevant measure (life expectancy, population or income depending on which file is referenced).

In Tableau Desktop, connect to the Life Expectancy file. If you find the column names all seem to nave generic labels, then right click on the ‘object’ in the data source pane, and select Field names are in the first row.

This will change so all the years are now listed as column headings

Now we need to pivot this data; click on the 1800 column, then scroll across to the end, press shift and click on the final column to select all the year columns. While highlighted, right-click and select Pivot.

Your data will be reshaped into 3 columns

I then renamed each column as

  • Country
  • Year
  • Life Expectancy

I also chose to change the datatype of the Year column to be a number (whole), as I knew I’d be needing to relate the data on this field later, and working with numeric data is more efficient than strings.

Now I want to bring in the next file – Population (via the Add connection option). But before following these steps, just read on a bit, as I hit a hurdle, which at the point of writing I’m still a bit unclear about…..

I added the Population file into the data source pane. By default it showed the Edit Relationship dialog, but I had the same issues with the column headings.

So I closed the dialog, and fixed the headings by setting the Field names are the first row option. Trying to apply the pivot at this point wasn’t available – no Pivot option when I selected all the columns. So I then manually edited the relationship and set County = country (population)

This populated the data, but again when I tried to apply the pivot option, it wouldn’t display 😦

I’m not sure whether this is possible (Google hasn’t shown me that it isn’t, but it hasn’t shown me that it is either…). I’ve tried using csv and excel versions of the file…..

So until I see the official solution (or I find out otherwise), I carried out the following approach instead :

I removed Population object, so I was just left with the Life Expectancy object. I then went to sheet 1, right clicked on the data source and selected Extract Data -> Extract and when prompted saved the file as Life Expectancy.hyper.

I then started a new instance of Desktop, and connected to the Population file, and went through the steps outlined above – set the Field names are first row, selected all the years to pivot, renamed the fields, set the datatype of the Year field to a whole number, then extracted and saved as Population.hyper.

I then repeated the exact same process with the Income file.

Once I had the 3 hyper files created, I then instantiated another instance of Desktop, and connected to the Life Expectancy.hyper file.

This automatically added an object called Extract into the pane. I renamed this (via right click) to DS-Life Expectancy. I also found my field names hadn’t been retained, so I renamed these too.

I then added the Population.hyper file. I set the relationships as below (once again I’d lost field names 😦 )

I then renamed the object & columns – note it won’t let you add fields with the same name as in the other object in the data pane, so I suffixed the columns

The process was once again repeated to add the Income data

and finally the Region data was also added matching Country to Name

And now, after all that, I have the data in the format I need to build the scatter. Phew!

[Side Note] – whilst writing this blog, I have been messaging my fellow #WOW participant Rosario Gauna. We often check in with each other, if we’ve noticed something odd with the challenges, to sense check whether its an issue with the data/the requirement, or just our own personal interpretation/development. In this instance we confirmed we were both using v2020.4.2, BUT Rosario had been able to do exactly what I set out to do at the start of this blog!! She had a pivot option on her 2nd connection. We’ve tried to figure it out, but have been unable to… I am using Windows, but Rosario is on a Mac, so there might be something there….. It will be interesting to understand if anyone else encountered my problem…. I really didn’t think the modelling part of the challenge was intended to be as fiddly as I’ve described above.

Building the Scatter Chart

The data we have goes beyond 2021, so we need to filter. Add Year to the Filter shelf and set the condition as below

We also have some countries that aren’t mapped to a region (Eight Regions field), so all add this field to the filter shelf and set to exclude Nulls.

Set both these filters to Apply to All Using this Datasource as they’re relevant across all sheets that will be built.

Now build the scatter by

  • Add Year to the Pages shelf
  • Add Income to Columns and change to Dimension (rather than Sum) – still leave to be Continuous
  • Add Life Expectancy to Rows and change to Dimension (rather than Sum) – still leave to be Continuous
  • Add Country to Detail

If you change the Year on the pages control to 2021 you should get something like

It’s roughly the right shape, but not spread as expected.

If you examine the Income axis on the solution, you’ll see the scale isn’t uniform. This is because it’s using a logarithmic scale instead, which you can set by right-clicking on the Income axis -> Edit Axis and selecting the relevant checkbox.

Now we can add Population to Size, change the mark type to circle, and we also want to colour by the region, but we need a new field for this to get it in the format we need for tooltips etc:

Region

UPPER(REPLACE([Eight Regions],’_’,’ ‘))

Add this to Colour and adjust accordingly, reducing the transparency a bit. Format the tooltips and you should have the main chart

Displaying the Year in the Chart

On a new sheet build a very simple view that just shows the Year in the Text shelf, that has been formatted to a very large (72pt) point, centred in the middle, and a pale grey colour.

On the dashboard, add this Year sheet. Then when you add the Scatter sheet, add it as a Floating object and position directly on top of the Year sheet. The Year sheet won’t be visible, so navigate to your Scatter sheet, right click on the canvas and format. Set the background of the worksheet to None ie transparent

If you return to the dashboard, the Year sheet should now be visible.

Note, as you add more objects to your dashboard, you will have to move things around a bit to get the layout as desired.

Highlighting the Regions

Create a simple table that displays the Regions in a row as below (the Show Header on the Region pill on the Columns shelf is unchecked).

Add this to the dashboard, then add a Highlight Dashboard Action that sources from the Region view and affects/targets the Scatter chart.

The final comment to make is that you can’t remove the ‘speed control’ from the page control display. However, when published to Tableau Public, it doesn’t show, so I assume is not enabled (or broken) there.

Hopefully this now gives you what’s needed to rebuild this fabulous viz. My published version is here.

Please do let me know if you had issues with your data modelling like I did!

Happy vizzin’! Stay Safe!

Donna