Can you create a jitterfly chart?

This week it’s the Tableau Conference edition – and as I was at #data23, I’m a bit delayed in getting the challenge and solution posted – too much going on and too much sleep to catch up on over the weekend 🙂

It was great to catch up with the #WOW crew in person and to meet so many participants at the social nights out and at the hands on live session too. And thank you for all those who complimented me on this blog – it really does make the time and effort I put into it worthwhile, when I get to hear how much it has helped you all!

Here’s a few pics from the Tableau Conference, and a massive shout out to Chris McClellan, who made me a custom WOW t-shirt!

Ok, enough of the pre-amble, onto the challenge and solution. Lorna set this challenge to recreate what she referred to as a ‘jitterfly’ chart.

Modelling the data

The data needs to be downloaded from data world. This contains multiple files and it was expected to connect to the central_trend_2017_base.xslx file. This file contains multiple sheets. The Population – Females and Population – Males sheets need to be unioned together in the data source canvas.

Drag Population – Females onto the canvas, then drag Population- Males on too, and drop it when the Union option appears beneath the Population – Females

You’ll know you’ve done it right, if you only have a single object in the canvas and the union symbol is displayed

The data displays all years in separate columns – we need to transpose this, so we have a column for Year and a column for the population value.

Click on the first year column (2011), then scroll across to the last year column (2050). Hold down shift and click the last column, and all the columns in between should be highlighted/selected. Right-click on any of the selected columns and select Pivot.

Rename the Pivot Field Names column to Year and rename Pivot Field Values to Population.

To tidy things up a bit, hide the following fields (right click on column and hide) : Sheet, Table Name, Gss Code, Component. Also, change the data type of the Year field to be a number rather than a string. This should leave you with 5 columns

Navigate to Sheet 1 and in the left hand data pane, drag Age from the lower ‘measures’ section to above the line and into the dimensions section. Age isn’t going to be something we aggregate (sum/avg) etc – it is simply a categorical property of the record. This step isn’t critical, but I just like things to be neat :-).

Finally, though it isn’t stated in the requirements, the data relating to District = London needs to be excluded, as it is a summarised total of the other rows. To handle this I added this exclusion as a Data Source Filter, so once applied, I didn’t have to worry about it when I built the chart. Right click on the data source listed on the top left, and select Edit Data Source Filters, then add a condition to exclude the London District.

Creating the calculations

The chart requires a user to select two years to compare – I’ll refer to these as the Primary and Secondary year. We’ll create parameters to enable the selections.

pPrimaryYear

Integer parameter, defaulted to 2033 and displayed in ‘2033’ format (ie no commas). It should allow all other years to be selected (easiest way to get this populated is to right-click on the Year field in the data pane, and Create > Parameter). If you don’t do this, then use the Add values from button to select the Year field to populate the list.

pSecondaryYear

as above, but default to 2023 (the easiest way to create this parameter is to duplicate the first and just amend the name and default value).

With these parameters, we can then create the calculated fields needed to present the relevant values for each year and gender (as hinted by Lorna).

Population – Primary Male

IF [Year] = [pPrimaryYear] AND [Sex] = ‘male’ THEN [Population] END

format this to K with 1 dp

Population – Primary Female

IF [Year] = [pPrimaryYear] AND [Sex] = ‘female’ THEN [Population] END

format this to K with 1 dp

Population – Secondary Male

IF [Year] = [pSecondaryYear] AND [Sex] = ‘male’ THEN [Population] END

format this to K with 1 dp

Population – Secondary Female

IF [Year] = [pSecondaryYear] AND [Sex] = ‘female’ THEN [Population] END

format this to K with 1 dp

popping these into a table as below, you can see the results

The final calculation we need is for the age banding. We can’t use the in-built ‘bins’ function as the final ‘bin’ contains the ‘rest’ and not just 10 values. Also if we used ‘bins’ the ‘labels’ would be based on the data values and not a custom display as we have here.

Age Bracket

IF [Age] <= 10 THEN ‘<=10’
ELSEIF [Age] <=20 THEN ’11-20′
ELSEIF [Age] <=30 THEN ’21-30′
ELSEIF [Age] <=40 THEN ’31-40′
ELSEIF [Age]<=50 THEN ’41-50′
ELSEIF [Age]<=60 THEN ’51-60′
ELSEIF [Age]<=70 THEN ’61-70′
ELSEIF [Age]<=80 THEN ’71-80′
ELSE ’80+’
END

Note – I altered the calculation and label to be <=10 rather than just <10 as is shown in the solution

Building the Viz

On a new sheet, add Age Bracket to Rows, Population Primary Male to Columns and District to Detail. Change the mark type to circle. Manually re-sort the Age Bracket values so <=10 is listed at the top (just drag the value from the bottom to the top).

Drag Population Primary Female onto the canvas and drop it on the Population Primary Male axis when the double green column symbol appears

This will make the values for both male & female display on the same axis, and Measure Names and Measure Values automatically gets added to the viz. But the values are all displaying in the same direction (the positive axis).

To resolve this, double click into the Population Primary Female pill that is in the Measure Values box underneath the marks card and type in *-1 to the end of the pill and press return

The female values will then be displayed in the opposite direction. Adjust the colours of the Measure Names legend to suit.

To make the dots ‘jitter’, that is appear in a random vertical position, we need a measure (green pill) on the Rows so we generate a y-axis.

Now typically when I am creating jitter plots I use the undocumented RANDOM() function which generates a random number between 0 and 1. The function is undocumented, as it only works for some data sources (excel being one). Using RANDOM() was something I mentioned to several attendees of the Live WOW session at Tableau Conference.

However, due to a later requirement, you’ll need to use a different function instead – in this case INDEX(). For clarity I created an explicit calculated field for this

Jitter

INDEX()

INDEX()is a table calculation that creates a unique sequence number from 1 to n for each record in the table partition. In this case the partition is each Age Bracket. Add Jitter to Rows and adjust the table calculation setting so it is computing by District only. Set the fit of the chart to Fit Width to ensure you can see the display better.

Adjust the colour of the marks to 50% opacity.

To handle the detail displayed on the Tooltip we need to create some additional fields for the population values, these ones not split based on gender.

Population – Primary Year

IF [Year] = [pPrimaryYear] THEN [Population] END

format to K with 1 dp

Population – Secondary Year

IF [Year] = [pSecondaryYear] THEN [Population] END

format to K with 1 dp

Then

Population Change

(SUM([Population – Primary Year]) – SUM([Population – Secondary Year]))/SUM([Population – Primary Year])

custom format this as ▲0.0%;▼0.0%;0.0%

Add Sex, Population – Secondary Year, Population – Primary Year and Population Change to the Tooltip and adjust accordingly.

The coloured bands are based on the average for the ‘primary’ year and sex. Add Population Primary Female and Population Primary Male to the Detail shelf. Double click on Population Primary Female and type in *-1 to ensure you get the relevant negative value.

Right click on the ‘value’ x-axis and Add Reference Line.

Change the option to be a reference band per pane, and set a band to go from Population Primary Female *-1 : Average to a Constant of 0. Ensure no labels/tooltips display, and set the fill colour of the band accordingly.

Add another reference band for the which goes from constant 0 to the Average of Population Primary Male. Adjust fill colour to suit.

Add Population Secondary Male and Population Secondary Female to the Detail shelf. Double click into the Population Secondary Female pill and add *-1 to the end.

Right click on the ‘value’ x-axis and Add Reference Line.

Add a reference line that is the Average of the Population Secondary Female * -1 field. Adjust colour and thickness of line to suit (I used the middle thickness and line coloured at 80% transparency).

Repeat the same to add an average reference line for the Population Secondary Male field.

The final step is to add the age banding label into the centre of the viz.

Double click into the Columns shelf and type MIN(0). This will create a secondary axis with a new marks card. Remove all the pills except District from the MIN(0) marks card. Add Age Bracket to the Label shelf. Adjust the label properties as below – to label the Min/Max per pane; at the District field level, and label the maximum value only.

This positions the label in the same place on each age banding. This works because we have used INDEX() to control the jittering which means the maximum value is always the same for each bracket. If we had used RANDOM() to define the jittering, there would be no guarantee the same maximum value would have existed for every banding.

Reduce the opacity to 0% and size of the circle to be as small as possible on the MIN(0) marks card, and then make the chart dual axis and synchronise the axis.

Finally format the chart by

  • Hide the Age Bracket column (uncheck show header)
  • Hide the Jitter axis (uncheck show header)
  • Hide the MIN(0) axis (again uncheck show header)
  • Format the Value axis so the title is Population, and the scale is in 0K format and positive in both directions (custom format ,##0,”K”;#,##0,”K”) NOTE wrapping the K in “” ensures the display is retained when publishing to Tableau Public – thank you Deborah for the tip!)
  • Hide the nulls indicator (right click – hide indicator)
  • Remove all gridlines, zero line, axis ticks etc
  • Remove column dividers
  • Set Row dividers to white with the widest thickness
  • Set the chart to fit entire view

Add the sheet to a dashboard. Use a text object to display the title and sub-tile which should reference the pPrimaryYear and pSecondaryYear parameters. Float the parameter controls and resize to give the appearance of just the drop down option – this will take a bit of tweaking to get just right, and you may need to edit via Tableau Public to get the positioning right.

My published viz is here.

Happy vizzin’!

Donna

One thought on “Can you create a jitterfly chart?

Leave a comment