How are EPL Football Teams performing this season compared to previous seasons?

I set this week’s challenge and I tried to deliver something to hopefully suit everyone wherever they are on their Tableau journey. The primary focus for this challenge is on table calculations, but there’s a few other features/functionality included too.

I love football – all the family are involved in it in some way, so thought I’d see what I could get from a different data set this week. The data set contains the results of matches played in the English Premier League for the last five seasons – from the 2020-21 season to the current season 2024-25. Matches up to the end of 2024 only are included.

As I wrote the challenge in such a way that should allow it to be ‘built upon’ from the beginner challenge, I’m going to author this blog in the same way. I’ll describe how to build the beginner challenge first, then will adapt /add on to that solution. Obviously, as in most cases, this is just how I built my solution – there may well be other ways to achieve the same result.

Beginner Challenge

Modelling the data

The data set provided displays 1 row per match with the teams being displayed in the Home and the Away columns respectively, and the FTR (full time result) column indicating if the Home team won (H), the Away team won (A) or if the match was a draw (D).

The first thing we need to do is pivot this data so we have 2 rows per match, with a column displaying the Team and another indicating if the team is the Home or Away team. To do this, in the data source window, select the Home and the Away columns (Ctrl-Click to multi-select – they’ll be highlighted in blue), then from the context menu on either column, select the Pivot option.

This will duplicate the rows, and generate two new fields called Pivot Field Names and Pivot Field Values. Rename the fields by double-clicking into the field heading

  • Pivot Field Names to Home or Away
  • Pivot Field Values to Team

Creating the points calculation

We need to determine how many points each team gained out of the match which is based on whether they won (3pts), lost (0pts) or drew (1pt). Create a calculated field

Pts

IF [Home or Away] = ‘Home’ AND [FTR] = ‘H’ THEN 3
ELSEIF [Home or Away] = ‘Away’ AND [FTR] = ‘A’ THEN 3
ELSEIF [FTR] = ‘D’ THEN 1
ELSE 0
END

Creating the cumulative points line chart

We want to create a line chart that displays the cumulative number of points each team has gained by each week in each season.

Start by adding Wk to Columns and Pts to Rows as these are the core 2 fields we want to plot. But we need to split this by Team and by season, so add Team and Season End Year to the Detail shelf.

This gives us all the data points we need, but at the moment, it’s currently just showing how many points each team gained per week in each season.

To get the cumulative value, we add a running total quick table calculation to the Pts field.

which gives us the display we need

While we can leave the SUM(Pts) field as is in the Rows, I tend to like to ‘bake’ this field into the data set, so I have a dedicated field representing the running total. I can create this field in 2 ways

  1. Create a calculated field called Cumulative Points Per Season which contains the text RUNNING_SUM(SUM([Pts])). Add this field to Rows instead of just Pts.
  2. Hold down Ctrl and then click on the SUM([Pts]) pill in Rows and drag into the left hand data pane and then release the mouse. This will automatically create a new field which you can rename to Cumulative Points Per Season. The field will already contain the text from the calculation used within the quick table calculation, and the field on Rows will automatically be updated.

Filtering the data

Add Team to the Filter shelf and when prompted just select a single entry eg Arsenal.

Show the Filter control. From the context menu, change the control to be a single value (list) display, and then select customise and uncheck the show all value option, so All can’t be selected.

Format the line

To make the line stepped, click the path button and select the step option

To identify the latest season without hardcoding, we can create

Is Latest Season

[Season End Year] = {MAX([Season End Year])}

The {MAX([Season End Year])} is a FIXED LOD (level of detail) calculation, and is a shortened notation for {FIXED: MAX([Season End Year])} which basically returns the latest value of Season End Year across every row in the data set. This calculation returns true if the Season End Year value of the row matches the overall value.

Add this field to the Colour shelf and adjust colours to suit. Also add the same field to the Size shelf

The line for the latest season is being displayed ‘behind’ the other seasons. To fix this, drag the True value in either the colour or size legend to be listed before the False value. Then edit the sizes from the context menu of the size legend, and check the reversed checkbox to make True the thicker line.

If the thick line seems too thick, adjust the mark size range to get it as you’d prefer

Label the line

Add Season End Year to the Label shelf. Align middle right. Allow labels to overlap marks. And match mark colour.

Create the Tooltip

We need an additional calculated field for this, as we want to display the season in the tooltip in the format 2024-2025 and not just the ending year of the season.

Season Start Year

[Season End Year]-1

Add this field to the Tooltip shelf, and then edit the Tooltip to build up the text in the required format. Use the Insert button on the tooltip to add referenced fields.

Final Formatting

To tidy up the display we want to

  • Change the axis titles
    • Right click on each axis and Edit Axis then adjust the title (or remove altogether if you don’t want a title to display)
  • Remove gridlines and axis ruler
    • Right click anywhere on the chart canvas, and select Format. In the left hand pane, select the format lines option and set grid Lines to None and axis rulers to None.
  • Set all text to dark purple
    • Select the Format menu option at the top of the screen and select Workbook. The under the All fonts section, change the colour to that required

  • Update the title to reference the selected team
    • double click in to the title of the sheet and amend accordingly, using the Insert option to add relevant fields.

Intermediate Challenge

For this part of the challenge we’re looking at using a dual axis to display another set of marks – these ones are circular and only up to 1 mark per season should display. As this now takes a bit more thought, and to help verify the calculations required, I’m going to build up the calculations I need in a tabular form first.

Defining the additional mark to plot

On a new sheet add Team, Season End Year and Wk to Rows. Set the latter 2 fields to be discrete (blue) pills. Add Cumulative Points Per Season to Text. Add Team to Filter and select Arsenal.

We need to identify the date of the last match played by each team, so we can use an LOD for this

Latest Date Per Team

{FIXED [Team] : MAX([Date])}

Add this to Rows as a discrete (blue pill) exact date. For Arsenal, the last match was on 27 Dec 2024, whereas for Chelsea it’s 22 Dec 2024.

With this, we can work out what the latest points are for each team in the current season.

Latest Points Per Team

WINDOW_MAX(IF MIN([Date]) = MIN([Latest Date Per Team]) THEN [Cumulative Points Per Season] END)

Breaking this down : the inner part of the statement says “if the date associated to the row matches the latest date, then return the points associated with that row”. Only 1 row in the table of data has a value at this point, all the rest of the rows are ‘nothing’. The WINDOW_MAX statement, then essentially ‘floods’ that value across every row in the data, because the ‘value’ returned by the inner statement is the maximum value (it’s higher than nothing). Add this field into the table.

We’re trying to identify the week in each season where the points are at least the same as the latest points. We’re going to capture the previous week’s points against every row.

Previous Points

LOOKUP([Cumulative Points Per Season],-1)

This is a table calculation that returns the value of the Cumulative Points Per Season from the previous row (-1). If we wanted the next row, the function parameter would be 1. 0 identifies the ‘current row’.

Add this to the table.

We can see the behaviour – The Previous Points associated to 2025 week 18 for Arsenal is 33, which is the value associated to Cumulative Points Per Season for week 17. But we can also see that week 38 from season 2024 is being reported as the previous points for week 1 of season 2025, which is wrong – we don’t want a previous value for this row.

To resolve, edit the table calculation of the Previous Points field and adjust so the calculation for Previous Points is just computing by the Wk field only.

With this we can identify the week in each season we want to ‘match’ against. In the case of the latest season, we just want the last row of data, but for previous seasons, we want to identify the first row where the number of points was at least the same as the latest points; the row where the points in the row are the same or greater than the latest points, and the points in the previous row are less.

Matching Week

//for latest season, just label latest record, otherwise find the week where the team had scored at least the same number of points as the current season so far
IF MIN([Is Latest Season]) AND LAST()=0 THEN TRUE
ELSE
IF NOT(MIN([Is Latest Season])) AND [Cumulative Points Per Season]>= [Latest Points Per Team] AND [Previous Points] < [Latest Points Per Team] THEN TRUE
ELSE FALSE
END
END

Add this to Rows and check the data. In the example below, in Season 2020-21, in week 26, Arsenal had 37 points. The previous week they had 34 points. Arsenal’s latest points are 36, so since 37 >=36 and 34 < 36, then week 26 is the matching week.

Looking at season 2023-24, in both week 15 and 16, Arsenal had 36 points. But only week 15 is highlighted as the match, as in week 14, Arsenal had 33 points so 36 >=36 and 33 <36, but for week 16, as the previous week was also 36, the 2nd half of the logic isn’t true : 36 is not less than 36.

So now we’ve identified the row in each season we want to display on the viz, we need to get the relevant points isolated in their own field too.

Matching Week Points

IF [Matching Week] THEN [Cumulative Points Per Season] END

Add this to the table

We now have data in a field we can plot.

Visualise the additional mark

If you want to retain your ‘Beginner’ solution, then the first step is to duplicate the Beginner worksheet, other wise, just build on what you have.

Add Matching Week Points to Rows to create an additional axis. By default only 1 mark may have displayed. Adjust the table calculation setting of the field, so the Latest Points Per Team calculation is computing by all fields except the Team field.

Change the mark type of the Matching Week Points marks card to Circle and remove Season End Year from the Label shelf (simply drag the pill against the T symbol off off the marks card)

Size the circles

We want the circles to be bigger, but if we adjust the Size, the lines change too, as the Size is being set based on the Is Latest Year pill on both marks cards. To resolve this, create a duplicate instance of Is Latest Year (right click pill and duplicate). This will automatically create Is Latest Season (copy) in the dimensions pane. Drag this onto the size shelf of the Matching Week Points marks card instead to make the sizing independent (you will probably find you need to readjust the table calculation for the Matching Week Points pill to include Is Latest Season (copy)). Then adjust the sizes as required.

Label the circles

Add Latest Points Per Team to the Label shelf of the Matching Week Points marks card. Adjust the table calculation setting, so the Latest Points Per Team calc is computing just by Wk only, so only the latest value displays.

Then format the Label so the text is aligned middle centre, is white & bold and slightly larger font.

Adjust the Tooltip text on the Matching Week Points mark, so it reads the same as on the line mark. You will need to reference the Matching Week Points value instead.

Then make the chart dual axis and synchronise the axis. Remove the Measure Names field that has automatically been added to the All marks card

Remove Label for Latest Season

We don’t want the season label to display for the current year, so create a new field

Label:Season

IF NOT([Is Latest Season]) THEN [Season End Year] END

Replace the Season End Year pill on the Label shelf of the Cumulative Points Per Season marks card, with this one instead.

Final Formatting

To tidy up

  • Remove right hand axis
    • right click on axis and uncheck show header
  • Remove 165 nulls indicator
    • right click on indicator and hide indicator
  • Remove row & column dividers
    • Right click on the canvas and Format. Select Format Borders and set Row and Column Divider values to None

Advanced Challenge

For the final part of the challenge we want to add some additional text to the tooltip and adjust the filter control. As before either duplicate the Intermediate challenge sheet or just build on.

We’ll start with the tooltip text.

Expand the Tooltip Text

For this, we’ll go back to the data table sheet we were working with to validate the calculations required.

We want to calculate the difference in the number of weeks between the latest week of the current season, and the week number of the matching record from previous seasons. So first, we want to identify the latest week of the current season, and ‘flood’ that over every row.

Latest Week Number Per Team

WINDOW_MAX(MAX(IF ([Date]) = ([Latest Date Per Team]) THEN [Wk] END))

This is the same logic we used above when getting the Latest Points Per Team, although as this time the Wk field isn’t already an aggregated field like Cumulative Points Per Season is, we have to wrap the conditional statement with an aggregation (eg MAX) before applying the WINDOW_MAX.

Add this to the table.

And then we need, the week number in each season where the match was found, but this needs to be spread across every row associated to that season.

Matching Week No Per Team and Season

WINDOW_MAX(IF [Matching Week] THEN MIN([Wk]) END)

Add to the table, but adjust the table calculation so the field is computing by all fields except the Season End Year.

Now we have these 2 fields, we can compute the difference

Week Difference

[Latest Week Number Per Team] – [Matching Week No Per Team and Season]

Add to the table. With this we can then define what this means for the text in the tooltip

Less/More

IF [Week Difference]<0 THEN ‘more’ ELSEIF [Week Difference]>0 then ‘less’
ELSE ”
END

and also build the tooltip text completely

Tooltip- other seasons

IF NOT(MIN([Is Latest Season])) THEN
IF [Week Difference] = 0 THEN ‘It took the same amount of weeks to accrue at least the same number of points as the current season’
ELSE ‘It took ‘ + STR(ABS([Week Difference])) + ‘ weeks ‘ + [Less/More] + ‘ to accrue at least the same number of points as the current season’
END
END

Add this to the Tooltip shelf of the Matching Week Points marks card. Adjust the Tooltip to reference the field.

Adjust the Tooltip – other seasons table calculation, so the Latest Points Per Team nested calculation is computing for all fields except Team (this will make the circles reappear)

and also adjust the Latest Week Number Per Team nested calculation to compute by all fields except Team. This should make the text in the tooltip appear.

Filtering by teams in the current season only

We need to get a list of the teams in the current season only, which we can define by

Filter Team

IF {FIXED [Team]: MAX([Season End Year])} = {MAX([Season End Year])} THEN [Team] END

Breaking this down: {FIXED [Team]: MAX([Season End Year])} returns the maximum season for each team, which is compared against the maximum season in the data set. So if there is a match, then name of the team is returned. Listing this out in a table we get Null for all the teams whose latest season didn’t match the overall maximum

We can then build a set off of this field. Right click on the Filter Team field Create > Set and click the Null field and select Exclude

Then add this to the Filter shelf, and the list of teams displayed in the Team filter display will be reduced.

And with that the challenge is completed. My published viz is here.

Happy vizzin’!

Donna

Can you create a jittered bar chart?

The theme of alternative chart types continued with Lorna setting this challenge to create a jittered bar chart. We needed to use the same fake survey data set used in week 4, so initially some data remodelling was required. I’ve already blogged that here, so please refer to the modelling the data section if you need help.

The questions all need to be grouped based on the text that preceded the question, so I created

Question Group

TRIM( SPLIT( [Question], “-“, 1 ) )

which split off the text in front of the first ‘-‘. I actually created this by right-clicking on the Question field > Transform > Custom Split and selecting to to split off the first 1 column using the – separator.

I also created a field to number the responses

Response No

CASE [Response]
WHEN ‘Strongly Disagree’ THEN 1
WHEN ‘Disagree’ THEN 2
WHEN ‘Neutral’ THEn 3
WHEN ‘Agree’ THEN 4
ELSE 5
END

Once I’d done that I was able to ratify the numbers by building a simple table with Response No and Response on Columns, Question Group on Rows, and CNT(Id) on Text.

Eyeballing these numbers against where the ‘bars’ in Lorna’s viz stopped and I figured that was on the right track. However then I wasn’t particularly sure what to do next.

Lorna’s instructions were very brief. I figured I’d need to use a jitter (ie RANDOM()) somewhere, and as I’d need to add Id to the Detail shelf at some point, to generate 1 dot per respondent, I deduced I’d also need

Total Respondents

{FIXED [Response], [Question Group]:COUNT([Id])}

Adding this into by table above and I got the same values as CNT(Id) which is what I expected.

After scratching my head a bit, I decided that Lorna was possibly being vague, as there was probably help ‘out there’. So I googled, and immediately stumbled upon this very useful blog post, which pointed me in the right direction for the additional calculations required.

I created

Columns

RANDOM()

Jitter

RANDOM()

note – due to the way RANDOM() works, Jitter and Columns won’t contain the same value.

Rows

[Jitter] * [Total Respondents]

And with these I could build out the viz

Add Response No, Response and Columns to Columns. Change Columns to be a dimension.

Add Question Group and Rows to Rows. Change Rows to be a dimension. Add Id to Detail.

Change the mark type to circle, reduce the size and add Question Group to Colour and adjust accordingly.

Finally add Question to Tooltip and adjust the tooltip. Then remove all gridlines, hide the Columns axis, change the title of the Rows axis, and remove the row and column titles (hide field labels for rows / columns).

A pretty short blog today! My published viz is here.

Happy vizzin’!

Donna

Can you visualise survey data?

It was Sean’s first challenge of 2023, and he asked us to recreate this chart . Sean referred to it in the requirements as a floating bar chart, others may refer to it as a diverging bar chart or likert chart.

Modelling the data

Fake survey data was provided which was structured with each question in a column

The first thing we need to do is to pivot the data, so we have 1 row per question, per respondent. On the data source tab in Tableau Desktop, multi-select the question columns (Ctrl-click), then click the context menu for one of the selected columns and select Pivot

Due to the width of the screen and the number of columns, you may have to do this in multiple steps. Just multi-select the next set of columns, and select Add Data to Pivot from the context menu instead.

Once complete you should have a data set with 3 columns

Rename Pivot Field Names to Question and Pivot Field Values to Response.

Building the calculations

All the questions have 5 options to respond to ranging from Strongly Disagree -> Disagree -> Neutral -> Agree – > Strongly Agree. Sean hinted we would need to group the questions based on their sentiment. Some questions were worded in a positive way, which meant ‘agree’ responses could be taken as a ‘good thing’. While other questions were worded in a negative way, meaning agree’ responses were actually a ‘bad thing’.

So the first thing to do was to group the questions. I used the group functionality (right click on Question -> Create -> Group).

Sentiment

All the questions where I deemed an ‘agree’ answer was a favourable response to the product I grouped under Positive.

All the question where I deemed an ‘agree’ answer was a non-favourable response to the product I grouped under Negative.

I then wanted to bucket the responses into 3 groups – Disagree, Neutral and Agree – based on the combination of the actual Response, and the Sentiment grouping.

Response Group

IF [Sentiment] = ‘Positive’ THEN
IF CONTAINS([Response],’Disagree’) THEN ‘Disagree’
ELSEIF CONTAINS([Response], ‘Agree’) THEN ‘Agree’
ELSE ‘Neutral’
END
ELSEIF [Sentiment] = ‘Negative’ THEN
IF CONTAINS([Response],’Disagree’) THEN ‘Agree’
ELSEIF CONTAINS([Response], ‘Agree’) THEN ‘Disagree’
ELSE ‘Neutral’
END
END

With this I am essentially swapping disagree responses against questions of a negative sentiment into the agree/positive grouping and vice versa.

Finally I need another couple of calculations just to sense check were we are

Count Respondents

COUNTD([Id])

Question – Display

TRIM( SPLIT( [Question], “-“, -1 ) )

This removes the text that precedes the ‘-‘ from the Question string. I actually created this field using the Split functionality. Right click on Question -> Transform -> Custom Split. Use the separator – and split off the Last 1 columns

This will generate the field int he dimensions pane which I then renamed.

If we put all these out into a table, we can see how the data is shaping up

The next step is to work out the NPS values. When I’ve dealt with NPS in the past, I’ve referred to terms such as Promoters and Detractors, with an NPS score being computed as (Number of Promoters – Number of Detractors)/Total No of Respondents.

In this case a Promoter is anyone who is in the agree bucket.

Promoters

IF [Response Group]=’Agree’ THEN 1 ELSE 0 END

while a Detractor is anyone who is in the disagree bucket

Detractors

IF [Response Group]=’Disagree’ THEN 1 ELSE 0 END

The total number of respondents is

Total Respondents

{FIXED:COUNTD([Id])}

So I can now calculate

NPS

(SUM([Promoters])-SUM([Detractors]))/[Count Respondents]

Format this to percentage with 0 dp.

Let’s put this into another table so we can validate the data.

Ok, so this gives us the data required to plot the circle marks.

But we need to work out what’s need to plot the bars. And for this we need to know the respondents who are neither Promoters or Detractors.

Neutrals

IF [Response Group] = ‘Neutral’ THEN 1 ELSE 0 END

We are going to plot 2 bars for each row. 1 bar that represents the proportion of respondents who are in the positive side and 1 bar to represent the proportion of respondents who are in the negative side. The positive side mainly consist of the promoters, while the negative is the detractors. However, we need to consider the neutrals too, and we do this by halving the number and making half positive and half negative. Let’s see what I mean

# Respondents – Positive

(SUM([Promoters]) + (SUM([Neutrals])/2))/SUM([Total Respondents])

format this to percentage with 0 dp.

Similarly

# Respondents – Negative

-1 * ((SUM([Detractors]) + (SUM([Neutrals])/2))/SUM([Total Respondents]))

format this to percentage with 0 dp.

In this case the result is multipled by -1 as we want the bar to be plotted on the negative side of the axis.

Let’s pop these into the table too.

Ok. So now we have the core building blocks required to start building the viz.

Building the Likert Chart

Add Question – Display to Rows and # Respondents – Positive to Columns.

Click and drag # Respondents – Negative onto the canvas and drop the pill on the bottom axis when you see the 2 green columns icon appear.

This will automatically add the pill onto the same axis, and the view will update to use Measure Names and Measure Values

Remove Measure Names from the Rows and the bars will all appear on the same row. Sort the Question – Display field by NPS descending.

Add Response Group to the Colour shelf and adjust colours accordingly (using the Nuriel Stone palette). I also set the transparency to about 70%. You may need to reorder the values of Response Group – just do this manually in the colour legend if need be.

Increase the width of each row a bit, expand the Question – Display column, and remove the tooltips from displaying. Format the Question – Display column so it is aligned left and the font is a bit bolder/darker. Remove the Question – Display column heading (right click label > hide field labels for rows).

Format the axis to display as percentages to 0 dp, then edit the axis and

  • remove the title
  • fix the axis from -1.05 to 1.05
  • fix the tick marks to display every 0.5

Adjust the row banding to show, and remove any column/row dividers. Remove all gridlines. Adjust the column axis rulers and tick marks to be a solid dark line. Set the zero line for the columns to be a wider solid white line – it should just about be visible through the viz.

Now we need to label the agree and the disagree sections only. For this we need additional fields

Label – Positive

If MIN([Response Group]) = ‘Agree’ THEN [# Respondents – Positive] END

and

Label – Negative

If MIN([Response Group]) = ‘Disagree’ THEN [# Respondents – Negative] END

Add both these fields to the Label shelf and arrange side by side. Manually adjust the colour of the font to a dark grey.

This is the core design of a likert chart. It could be bult using dual axis too, but that wasn’t an option in this instance, as we need to add the additional NPS circles to the chart.

Adding the NPS circles

Add NPS to Columns. On the NPS marks card, remove all the fields, and change the mark type to circle. Add NPS to the Colour shelf, and adjust the colours so they range from the same yellow/green to blue/teal colours already used. The easiest way to do this is to click on the coloured square at each of the edit colour dialog and use the pick screen colour option to select the colours already used.

Set the colour to be 100% opacity and add a white border around the circles (via the Colour shelf)

Set the chart to be dual axis and synchronise the axis. Change the mark types back to bar and circle accordingly if they changed.

Now we can see some of the labels on the bar chart are positioned where the circles are/very close to them. Manually move those affected – click on the section of the bar to highlight it, then click on the label and when the cursor changes to a crosshatch, drag the label to where you want. Note – this isn’t something I would typically do if the data was to change behind this viz without any manual involvement.

Hide the top axis (uncheck show header) and remove all row/column dividers.

Now we just need to label the circles. I need additional fields for this. Firstly, the NPS field is on the wrong format – it’s actually stored as a decimal, formatted to a percentage. Secondly, I ended up with two label fields, due to the colouring. Sometimes when you add a label to a mark, the colour of the text may display black or white depending on the contrast with the underlying colour. Tableau automatically does this, and it isn’t something you can really control. As soon as you start manually adjusting any colours, it can be hard to get back to the right state. I fiddled around trying to get things to work properly for a while using the auto colouring (which involved rebuilding by starting with the NPS dot plot and then adding the bars), but it still wasn’t a ‘cast-iron’ solution. So I ended up creating

Label -NPS-Black

IF [NPS] * 100 <=70 THEN [NPS] *100 END

Label-NPS-White

IF [NPS]*100 >70 THEN [NPS]*100 END

I just chose an arbitrary ‘cut off’ for when the colour might change.

Add both these fields to the Label shelf of the NPS marks card, and arrange them side by side. Edit to font so the Label-NPS-White field is coloured with white font, and the other black. Set the alignment of the text to be middle centre. Adjust the size of the circles if required. Verify tooltips don’t exist on any marks.

Stopping the chart from being clicked

In the solution, if you try to click on the bar chart, you get no interaction – nothing highlights, like it does if you click on the axis or the questions. On a dashboard, this is managed by adding a floating blank object and carefully placing it over the top of the section you don’t want to be clickable.

And that should be it! My published viz is here.

I have to admit I did have a few false starts when building this out, and its highly probable this could have been created without all the fields I ended up with. It’s sometimes just the path your brain follows, and there’s nothing wrong with that!

Happy vizzin’!

Donna

Can you build a butterfly chart?

Lorna Brown set the challenge this week to build this butterfly chart, so called because of the symmetrical display. I’ve built these before so hoped it wouldn’t be too taxing.

The data set isn’t very verbose, fields for gender & year specific values by age bracket

When connecting to this excel file, you need to tick the Use Data Interpreter checkbox which removes all the superfluous rows you can see in the excel file.

Now, I started by building a version that required no further data reshaping, which I’ve published here. This version uses two sets of dual axis, and I use the reversed axis feature to display the male figures on the left. This version required me to float the Population axis label onto the dashboard so it was central. Overall it works, the only annoying snag is that I have two 0M labels displayed on the bottom axis, since this isn’t a single axis. As a result, I came up with an alternative, which matches the solution, and which I’m going to blog about.

Reshaping the data

So once I connected to the data and applied the Use Data Interpreter option, I then selected the Males, 2021 and Females, 2021 columns, right clicked and selected Pivot

This results in a row for the Male figures and a row for the Female figures, with a column Pivot Names indicating the labels for each row. I renamed this to Gender, and the Pivot Values column to Population.

Building the Chart

On a sheet, add Age to Rows and Population to Columns and Gender to Colour. Set Stack Marks Off (Analysis > Stack Marks > Off).

You’ll see that both the values for the Male & Female data is displaying in the positive x-axis. We don’t want this. So let’s create

Population – Split

IF [Gender]= ‘Males, 2021’ THEN [Population]*-1 ELSE [Population] END

Replace the Population pill on Columns with Population Split

So this is the basic butterfly, but now we need to show the Total Population. Firstly, let’s create

Total Population per Age Bracket

{FIXED Age: SUM([Population])}

And then similarly to before, we need to display this in both directions against the Male and the Female side, so create

Total Population Split

IF [Gender]= ‘Males, 2021’ THEN [Total Population per Age Bracket]*-1 ELSE [Total Population per Age Bracket] END

Drag this field onto the Population Split axis until the two green column icon appears, then let go of the mouse. This is creating a combined axis, where multiple measures are on the same axis.

Move Measure Names from Rows to the Detail shelf and then change the symbol to the left, so the pill is also added to the Colour shelf. Adjust the colours of the 4 options accordingly.

Edit the axis and rename to Population. Then Format the axis, and set the scale to display in millions to 0dp

The tooltip needs to display % of total, so we’ll need

% of Total

SUM([Population]) / SUM([Total Population per Age Bracket])

Add this to the Tooltip shelf

We also need the population figures on the tooltip displayed differently, so explicitly add Population Split and Total Population Split to the Tooltip shelf. Use the context menu of the pills on this shelf, to set the display format in the pane to be millions to 2 decimal places. But this time, once set using the Number(Custom) dialog, switch to the Custom option and remove the – (minus sign) from the format string. This will make the values displayed against the Males which are plotted on the -ve axis, to show as a positive number.

Finally, set aliases against the Gender field, so it displays as Males or Females (right click on Gender > Aliases).

Now you have all the information to set the tooltip text.

Now add a title to the sheet to match that displayed, hide field labels for rows, and adjust any other formatting with row/column dividers etc that may be required, and add to a dashboard.

And that should be it. My published version of this build is here.

Happy vizzin’!

Donna

Can you recreate the work of Hans Rosling?

This week, Lorna set the challenge to recreate an iconic data viz by the legend, Hans Rosling. You ABSOLUTELY MUST watch the TED talk link from the challenge if you haven’t ever seen this before. Hans is utterly engaging in making this data ‘come to life’!

The challenge involves the following key components

  • modelling the data
  • building the scatter chart
  • displaying the year in the chart
  • highlighting the regions

Modelling the Data

The data provided consisted of 4 files

  1. Life Expectancy
  2. Population
  3. Income
  4. Region Mapping

The first 3 files are stored as a matrix of Country (rows) by Year (columns) with the appropriate measure in the intersection. This isn’t the ideal format for Tableau so the data needs to pivoted to show 3 columns – Country, Year and the relevant measure (life expectancy, population or income depending on which file is referenced).

In Tableau Desktop, connect to the Life Expectancy file. If you find the column names all seem to nave generic labels, then right click on the ‘object’ in the data source pane, and select Field names are in the first row.

This will change so all the years are now listed as column headings

Now we need to pivot this data; click on the 1800 column, then scroll across to the end, press shift and click on the final column to select all the year columns. While highlighted, right-click and select Pivot.

Your data will be reshaped into 3 columns

I then renamed each column as

  • Country
  • Year
  • Life Expectancy

I also chose to change the datatype of the Year column to be a number (whole), as I knew I’d be needing to relate the data on this field later, and working with numeric data is more efficient than strings.

Now I want to bring in the next file – Population (via the Add connection option). But before following these steps, just read on a bit, as I hit a hurdle, which at the point of writing I’m still a bit unclear about…..

I added the Population file into the data source pane. By default it showed the Edit Relationship dialog, but I had the same issues with the column headings.

So I closed the dialog, and fixed the headings by setting the Field names are the first row option. Trying to apply the pivot at this point wasn’t available – no Pivot option when I selected all the columns. So I then manually edited the relationship and set County = country (population)

This populated the data, but again when I tried to apply the pivot option, it wouldn’t display 😦

I’m not sure whether this is possible (Google hasn’t shown me that it isn’t, but it hasn’t shown me that it is either…). I’ve tried using csv and excel versions of the file…..

So until I see the official solution (or I find out otherwise), I carried out the following approach instead :

I removed Population object, so I was just left with the Life Expectancy object. I then went to sheet 1, right clicked on the data source and selected Extract Data -> Extract and when prompted saved the file as Life Expectancy.hyper.

I then started a new instance of Desktop, and connected to the Population file, and went through the steps outlined above – set the Field names are first row, selected all the years to pivot, renamed the fields, set the datatype of the Year field to a whole number, then extracted and saved as Population.hyper.

I then repeated the exact same process with the Income file.

Once I had the 3 hyper files created, I then instantiated another instance of Desktop, and connected to the Life Expectancy.hyper file.

This automatically added an object called Extract into the pane. I renamed this (via right click) to DS-Life Expectancy. I also found my field names hadn’t been retained, so I renamed these too.

I then added the Population.hyper file. I set the relationships as below (once again I’d lost field names 😦 )

I then renamed the object & columns – note it won’t let you add fields with the same name as in the other object in the data pane, so I suffixed the columns

The process was once again repeated to add the Income data

and finally the Region data was also added matching Country to Name

And now, after all that, I have the data in the format I need to build the scatter. Phew!

[Side Note] – whilst writing this blog, I have been messaging my fellow #WOW participant Rosario Gauna. We often check in with each other, if we’ve noticed something odd with the challenges, to sense check whether its an issue with the data/the requirement, or just our own personal interpretation/development. In this instance we confirmed we were both using v2020.4.2, BUT Rosario had been able to do exactly what I set out to do at the start of this blog!! She had a pivot option on her 2nd connection. We’ve tried to figure it out, but have been unable to… I am using Windows, but Rosario is on a Mac, so there might be something there….. It will be interesting to understand if anyone else encountered my problem…. I really didn’t think the modelling part of the challenge was intended to be as fiddly as I’ve described above.

Building the Scatter Chart

The data we have goes beyond 2021, so we need to filter. Add Year to the Filter shelf and set the condition as below

We also have some countries that aren’t mapped to a region (Eight Regions field), so all add this field to the filter shelf and set to exclude Nulls.

Set both these filters to Apply to All Using this Datasource as they’re relevant across all sheets that will be built.

Now build the scatter by

  • Add Year to the Pages shelf
  • Add Income to Columns and change to Dimension (rather than Sum) – still leave to be Continuous
  • Add Life Expectancy to Rows and change to Dimension (rather than Sum) – still leave to be Continuous
  • Add Country to Detail

If you change the Year on the pages control to 2021 you should get something like

It’s roughly the right shape, but not spread as expected.

If you examine the Income axis on the solution, you’ll see the scale isn’t uniform. This is because it’s using a logarithmic scale instead, which you can set by right-clicking on the Income axis -> Edit Axis and selecting the relevant checkbox.

Now we can add Population to Size, change the mark type to circle, and we also want to colour by the region, but we need a new field for this to get it in the format we need for tooltips etc:

Region

UPPER(REPLACE([Eight Regions],’_’,’ ‘))

Add this to Colour and adjust accordingly, reducing the transparency a bit. Format the tooltips and you should have the main chart

Displaying the Year in the Chart

On a new sheet build a very simple view that just shows the Year in the Text shelf, that has been formatted to a very large (72pt) point, centred in the middle, and a pale grey colour.

On the dashboard, add this Year sheet. Then when you add the Scatter sheet, add it as a Floating object and position directly on top of the Year sheet. The Year sheet won’t be visible, so navigate to your Scatter sheet, right click on the canvas and format. Set the background of the worksheet to None ie transparent

If you return to the dashboard, the Year sheet should now be visible.

Note, as you add more objects to your dashboard, you will have to move things around a bit to get the layout as desired.

Highlighting the Regions

Create a simple table that displays the Regions in a row as below (the Show Header on the Region pill on the Columns shelf is unchecked).

Add this to the dashboard, then add a Highlight Dashboard Action that sources from the Region view and affects/targets the Scatter chart.

The final comment to make is that you can’t remove the ‘speed control’ from the page control display. However, when published to Tableau Public, it doesn’t show, so I assume is not enabled (or broken) there.

Hopefully this now gives you what’s needed to rebuild this fabulous viz. My published version is here.

Please do let me know if you had issues with your data modelling like I did!

Happy vizzin’! Stay Safe!

Donna

Can you build a divergent histogram?

It was Ann Jackson’s last #WOW2020 challenge of the year, to build this divergent histogram depicting the average life expectancy by gender, with each bar indicating the number of countries with that life expectancy.

Ann hinted that reshaping the data may help, so I chose to do this by pivoting the data. To do this, in the data pane, I selected both the Life Expectancy Male and Life Expectancy Female columns, then right-clicked and selected Pivot.

This results in the number of rows in the data set being doubled, with a field called Pivot Field Names containing the values Life Expectancy Male and Life Expectancy Female, and then a field called Pivot Field Values which contains the life expectancy value. This field I renamed to Life Expectancy Age and moved it to the top section of the fields pane (above the line), so it is treated like a dimension rather than a measure.

We need to count the countries, so I created

Country Count

COUNTD([Country/Region])

Let’s see what this all looks like

Filtering by Year (2012), adding Life Expectancy Age to Rows, Pivot Field Names to Columns and Country Count to Text we get

Filtering out the Null and we have the basic data we want to plot, which we can do by

  • Year on Filter set to 2012
  • Life Expectancy Age on Filter set to exclude Null
  • Pivot Field Names on Columns
  • Life Expectancy Age on Columns (continuous)
  • Country Count on Rows
  • Mark Type set to Bar
  • Size set to Fixed

If we move Pivot Field Names from Columns to Colour, we get an overlapping bar.

But we want the males to be going down the chart, so let’s change the Country Count field to be

Country Count

IF CONTAINS(ATTR([Pivot Field Names]),’Female’) THEN COUNTD([Country/Region]) ELSE COUNTD([Country/Region])*-1 END

which is basically inverting the values for the male, and is pretty much the crux of the challenge 🙂

We need to label the max bar for each gender, or in this case the max & min. So showing mark labels and setting to just label the max and min values, we have a couple of issues….

… the labels are positioned in the centre, and males is labelled -13, rather than 13. To fix the positioning, we need to turn stacked marks off via the Analysis menu (took a bit of while to figure this one out).

To resolve the negative labelling, I formatted the County Count field to custom format to #,##0;#,##0

The final requirement is to ensure the axis range is identical. To do this I created reference lines based on the maximum value

Max Count Ref Line

WINDOW_MAX([Country Count])

which I then copied

Max Count Ref Line (Copy)

WINDOW_MAX([Country Count]) *-1

Adding both of these to the Detail shelf, and setting the table calculation to compute by the fields below

And apart from some formatting and the tooltips, that should be complete. My published viz is here. When I looked at Ann’s version, she tackled it differently, which is the beauty of these challenges – there’s often more than one solution.

Happy vizzin’! Stay Safe!

Donna

Can you build a Premier League table?

Luke provided this fun challenge this week. It was nice to see an English sport that I’m very familiar with feature, even if I had to keep retyping the word ‘Draw’ for ‘Tie’ 🙂

This blog will cover

  • Re-modelling the data
  • Building the table & bar chart
  • Building the Last 5 matches chart
  • Getting the data for the Last 5 matches chart tooltip

Re-modelling the data

Sometimes the #WOW challenges are set with a very specific direction in mind eg no LODs, use 1 sheet only, no data modelling etc. This challenge had no real restrictions, and there were minimal clues. The Latest Challenges page indicated they’d be table calculations involved

and Luke’s tweet introducing the challenge suggested there was a ‘trick’ that would make this a lot simpler to solve…. hmmmm???

I looked at the data referenced to get a feel of the shape and layout – it showed 1 row per game, with the home & away team for each game stored in different columns. Trying to count the number of games a specific team played was likely to be complicated in this format. Given Luke had suggested a ‘trick’, I decided I was going to pivot the data to expand the data to give me 1 row per game per team, hoping this was the ‘trick’ he implied, and there was no requirement to state that data modelling wasn’t allowed.

I did this in Tableau itself, in the data source pane. After connecting to the data, highlight the columns Home Team and Away Team, right click and select Pivot.

This has the effect of duplicating the rows of data and adding 2 additional columns to the end Pivot Field Names containing the column names that had been pivoted, and Pivot Field Values containing the values of those fields.

I then simply renamed Pivot Field Names to Home or Away and Pivot Field Values to Team.

With the data structured this way it was very simple to calculate the various measures required :

Win-Loss-Draw

IF [Home or Away] = ‘Home Team’ AND FTR = ‘H’ THEN ‘W’
ELSEIF [Home or Away] = ‘Away Team’ AND FTR = ‘A’ THEN ‘W’
ELSEIF FTR = ‘D’ THEN ‘T’
ELSE ‘L’
END

FTR in the data set stands for full-time result and indicated whether it was the Home (H) or the Away (A) team that won, or whether it was a Draw (D). So this field indicates whether the Team in each row won, lost or drew (tied) the match.

From this we can then work out the points the Team gained in the match.

Points

IF [Win-Loss-Draw] = ‘W’ THEN 3
ELSEIF [Win-Loss-Draw] = ‘T’ THEN 1
ELSE 0
END

To determine the total number of Wins for each team I created

Wins

FIXED [Team]: SUM(IF [Win-Loss-Draw] = ‘W’ THEN 1 ELSE 0 END)}

and then repeated this similarly to get a Losses and Ties measure.

Finally I determined the total number of games each team played by using the provided tablename (count) field that is now added in more recent versions of v2020 (supersedes the Number of Records field).

Matches Played

{FIXED [Team]: (COUNT([2020_11_11_WW46_EPL.csv]))}

Building the table & bar charts

In my initial built, I created one sheet to show both the table summary with the bar chart, and a separate sheet to show the last 5 matches, with the intention the charts would be lined up side by side on the dashboard, removing padding to make the charts look to be one.

But with the table and bar on the same chart, I couldn’t get the labels of each column and the ‘Total Points’ to be aligned in the way displayed. So I ended up using 3 different sheets.

Table

Given we’ve already got all the measures, this was a very simple table, with the Team field sorted descending by the Points measure.

Bar Chart

Again this is a pretty basic chart with Team on Rows,(which is then hidden) and Points on Columns, and Win-Lose-Draw on Colour.

To label the end of the chart with the total points, I created an additional measure

Total Points

{FIXED [Team]: SUM([Points])}

which I added to Columns and made dual axis, with the Total Points being a Gantt type, and the mark being labelled.

Building the last 5 matches chart

This chart is slightly more complex, and where the table calculations clue comes into play.

In the data, a match for each Team is identified by the Date, but we can’t use Date to plot against as not everyone has their matches on the same date. We just want to number each match played by each team in order of the date played.

We can use a field Index = INDEX() to number our matches per team – you’ll see that some teams have played 8 matches while others 7.

But we only want the last 5 matches, which for some teams will be matches 3-7 and others 4-8.

To do this, we need to determine the number of matches played. In hindsight I should have realised, I’ve already got this in a variable, but as my head was in ‘table calc’ mode, I created a field Size = SIZE()

As you can see, this returns the number of rows for each team against every row for that team.

I then created another calculated field

Last 5 matches only

[Index] > [Size]-5

Adding this into the table, and verifying the table calc settings for both the nested table calcs are set as above, you can see the last 5 rows of each team are set to True

This can then be added to the Filter shelf to reduce the rows shown.

However, as some teams have played less matches than others, the Index values shown aren’t consistent for every team, and we need this.

So another field is required

Index to Plot

//want a value from 1 to 5 based on the index
[Index] – ([Size]-5)

This gives us the 1-5 sequence for each row we need to build the chart

The layout below is the basic premise.

The circles need to be coloured based on the Win-Loss-Draw field, and the labelled with the same field which is set to match mark colour. Reducing the transparency of the colour will fade the circle colour and make the label stand out.

Getting the data for the Last 5 matches chart tooltip

So just by pivoting the data, I was able to crack through the measures and the charts pretty quickly, and felt happy that I’d worked out Luke’s trick.

But then I came to adding the tooltip to the circle chart which needs information about the opposition, and the score.

The score is always displayed on the tooltip as

Team ‘s score – Opposition score.

For example when Arsenal played Sheffield United at Home where Arsenal won 2-1, the score is displayed on the tooltip as 2-1.

And when Arsenal played Manchester City Away, where Manchester City won 1-0, the score for Arsenal is displayed on the tooltip as 0-1.

So to get the team’s score I created

Team Score

IF [Home or Away]=’Home Team’ THEN [Fthg] ELSE [Ftag] END

Where FTHG is Full Time Home Goals and FTAG is Full Time Away Goals. So if the team is listed as the Home team, their score is identified by FTHG, but if they are the Away team, their score is identified by FTAG.

Correspondingly, we need

Opposition Score

IF [Home or Away]=’Away Team’ THEN [Fthg] ELSE [Ftag] END

If the team we’re interested in is listed as the Away Team, then the opposition will be the Home team, so FTHG identifies the goals they scored, otherwise the opposition is the Away team, so FTAG is the goals they scored.

Both of these fields may sound complicated, but it was retrievable in the data I had.

However the actual name of the Opposition team wasn’t – this was lost once I pivoted the data – the Opposition is now located on a completely different row of data and as we have no ‘match id’ or similar in the data, I can’t actually identify who the opposition might have been, since multiple matches can play on the same date.

This is where I started to question whether ‘my trick’ was the right one…

I had a bit of a think, and came up with a solution, but it was a bit ‘off piste’ for a #WOW challenge as it required blending… typically when blending is required they’d be some mention of it in the requirements, but since there was nothing stated to say it couldn’t be used, I went with it.

I added the original unpivoted data set to my workbook as an additional data source. I renamed this with the word ‘Home‘ on the end.

When I work with blending, I try to be specific which fields the data sources are going to be linked/blended with.

So in the pivoted data source I’d been working with for most of the challenge, I create a new field

BLEND – Team

[Team]

Then in the Home data source I’d added, I also created a field with the same name, but it contained a different field.

BLEND – Team

[Home Team]

I then added another instance of the original unpivoted data source to my workbook, this time renaming it with the word Away on the end. In this data source I created

BLEND – Team

[Away Team]

so from the 3 data sources, I could now build up a table to verify I could get to the data I needed:

  • Pivoted Data.Team
  • Pivoted Data.Date
  • Home Data.Away Team
  • Away Data.Home Team

and when blended to the secondary data sources, the links for both BLEND – Team and Date fields are enabled

From this data, we can see that either the Home Team column or the Away Team column is populated, and provides the name of the opposition. So this means we can then create in the primary data source (the pivoted data source)

Opposition

IF ISNULL(ATTR([2020_11_11_WW46_EPL – Home].[Away Team])) THEN ATTR([2020_11_11_WW46_EPL – Away].[Home Team])
ELSE ATTR([2020_11_11_WW46_EPL – Home].[Away Team])
END

If the Away Team is NULL, then store the Home Team value, else store the Away Team value

And from this we now have all the building blocks we need to finalise the Last 5 matches chart.

Just make sure that as you add fields to the chart, you may need to alter the various table calculations to ensure they are still computing correctly (typically all the fields other than Team should be selected in the table calc dialog). And also keep an eye on the fields being blended – BLEND – Team and Date should always be selected for both.

After I posted my solution up, I did indicate I thought I might have done something a bit ‘out of the ordinary’, but apparently my approach match Luke’s! That was a real surprise.

My published version is here.

Happy vizzin’! Stay Safe!

Donna