How are EPL Football Teams performing this season compared to previous seasons?

I set this week’s challenge and I tried to deliver something to hopefully suit everyone wherever they are on their Tableau journey. The primary focus for this challenge is on table calculations, but there’s a few other features/functionality included too.

I love football – all the family are involved in it in some way, so thought I’d see what I could get from a different data set this week. The data set contains the results of matches played in the English Premier League for the last five seasons – from the 2020-21 season to the current season 2024-25. Matches up to the end of 2024 only are included.

As I wrote the challenge in such a way that should allow it to be ‘built upon’ from the beginner challenge, I’m going to author this blog in the same way. I’ll describe how to build the beginner challenge first, then will adapt /add on to that solution. Obviously, as in most cases, this is just how I built my solution – there may well be other ways to achieve the same result.

Beginner Challenge

Modelling the data

The data set provided displays 1 row per match with the teams being displayed in the Home and the Away columns respectively, and the FTR (full time result) column indicating if the Home team won (H), the Away team won (A) or if the match was a draw (D).

The first thing we need to do is pivot this data so we have 2 rows per match, with a column displaying the Team and another indicating if the team is the Home or Away team. To do this, in the data source window, select the Home and the Away columns (Ctrl-Click to multi-select – they’ll be highlighted in blue), then from the context menu on either column, select the Pivot option.

This will duplicate the rows, and generate two new fields called Pivot Field Names and Pivot Field Values. Rename the fields by double-clicking into the field heading

  • Pivot Field Names to Home or Away
  • Pivot Field Values to Team

Creating the points calculation

We need to determine how many points each team gained out of the match which is based on whether they won (3pts), lost (0pts) or drew (1pt). Create a calculated field

Pts

IF [Home or Away] = ‘Home’ AND [FTR] = ‘H’ THEN 3
ELSEIF [Home or Away] = ‘Away’ AND [FTR] = ‘A’ THEN 3
ELSEIF [FTR] = ‘D’ THEN 1
ELSE 0
END

Creating the cumulative points line chart

We want to create a line chart that displays the cumulative number of points each team has gained by each week in each season.

Start by adding Wk to Columns and Pts to Rows as these are the core 2 fields we want to plot. But we need to split this by Team and by season, so add Team and Season End Year to the Detail shelf.

This gives us all the data points we need, but at the moment, it’s currently just showing how many points each team gained per week in each season.

To get the cumulative value, we add a running total quick table calculation to the Pts field.

which gives us the display we need

While we can leave the SUM(Pts) field as is in the Rows, I tend to like to ‘bake’ this field into the data set, so I have a dedicated field representing the running total. I can create this field in 2 ways

  1. Create a calculated field called Cumulative Points Per Season which contains the text RUNNING_SUM(SUM([Pts])). Add this field to Rows instead of just Pts.
  2. Hold down Ctrl and then click on the SUM([Pts]) pill in Rows and drag into the left hand data pane and then release the mouse. This will automatically create a new field which you can rename to Cumulative Points Per Season. The field will already contain the text from the calculation used within the quick table calculation, and the field on Rows will automatically be updated.

Filtering the data

Add Team to the Filter shelf and when prompted just select a single entry eg Arsenal.

Show the Filter control. From the context menu, change the control to be a single value (list) display, and then select customise and uncheck the show all value option, so All can’t be selected.

Format the line

To make the line stepped, click the path button and select the step option

To identify the latest season without hardcoding, we can create

Is Latest Season

[Season End Year] = {MAX([Season End Year])}

The {MAX([Season End Year])} is a FIXED LOD (level of detail) calculation, and is a shortened notation for {FIXED: MAX([Season End Year])} which basically returns the latest value of Season End Year across every row in the data set. This calculation returns true if the Season End Year value of the row matches the overall value.

Add this field to the Colour shelf and adjust colours to suit. Also add the same field to the Size shelf

The line for the latest season is being displayed ‘behind’ the other seasons. To fix this, drag the True value in either the colour or size legend to be listed before the False value. Then edit the sizes from the context menu of the size legend, and check the reversed checkbox to make True the thicker line.

If the thick line seems too thick, adjust the mark size range to get it as you’d prefer

Label the line

Add Season End Year to the Label shelf. Align middle right. Allow labels to overlap marks. And match mark colour.

Create the Tooltip

We need an additional calculated field for this, as we want to display the season in the tooltip in the format 2024-2025 and not just the ending year of the season.

Season Start Year

[Season End Year]-1

Add this field to the Tooltip shelf, and then edit the Tooltip to build up the text in the required format. Use the Insert button on the tooltip to add referenced fields.

Final Formatting

To tidy up the display we want to

  • Change the axis titles
    • Right click on each axis and Edit Axis then adjust the title (or remove altogether if you don’t want a title to display)
  • Remove gridlines and axis ruler
    • Right click anywhere on the chart canvas, and select Format. In the left hand pane, select the format lines option and set grid Lines to None and axis rulers to None.
  • Set all text to dark purple
    • Select the Format menu option at the top of the screen and select Workbook. The under the All fonts section, change the colour to that required

  • Update the title to reference the selected team
    • double click in to the title of the sheet and amend accordingly, using the Insert option to add relevant fields.

Intermediate Challenge

For this part of the challenge we’re looking at using a dual axis to display another set of marks – these ones are circular and only up to 1 mark per season should display. As this now takes a bit more thought, and to help verify the calculations required, I’m going to build up the calculations I need in a tabular form first.

Defining the additional mark to plot

On a new sheet add Team, Season End Year and Wk to Rows. Set the latter 2 fields to be discrete (blue) pills. Add Cumulative Points Per Season to Text. Add Team to Filter and select Arsenal.

We need to identify the date of the last match played by each team, so we can use an LOD for this

Latest Date Per Team

{FIXED [Team] : MAX([Date])}

Add this to Rows as a discrete (blue pill) exact date. For Arsenal, the last match was on 27 Dec 2024, whereas for Chelsea it’s 22 Dec 2024.

With this, we can work out what the latest points are for each team in the current season.

Latest Points Per Team

WINDOW_MAX(IF MIN([Date]) = MIN([Latest Date Per Team]) THEN [Cumulative Points Per Season] END)

Breaking this down : the inner part of the statement says “if the date associated to the row matches the latest date, then return the points associated with that row”. Only 1 row in the table of data has a value at this point, all the rest of the rows are ‘nothing’. The WINDOW_MAX statement, then essentially ‘floods’ that value across every row in the data, because the ‘value’ returned by the inner statement is the maximum value (it’s higher than nothing). Add this field into the table.

We’re trying to identify the week in each season where the points are at least the same as the latest points. We’re going to capture the previous week’s points against every row.

Previous Points

LOOKUP([Cumulative Points Per Season],-1)

This is a table calculation that returns the value of the Cumulative Points Per Season from the previous row (-1). If we wanted the next row, the function parameter would be 1. 0 identifies the ‘current row’.

Add this to the table.

We can see the behaviour – The Previous Points associated to 2025 week 18 for Arsenal is 33, which is the value associated to Cumulative Points Per Season for week 17. But we can also see that week 38 from season 2024 is being reported as the previous points for week 1 of season 2025, which is wrong – we don’t want a previous value for this row.

To resolve, edit the table calculation of the Previous Points field and adjust so the calculation for Previous Points is just computing by the Wk field only.

With this we can identify the week in each season we want to ‘match’ against. In the case of the latest season, we just want the last row of data, but for previous seasons, we want to identify the first row where the number of points was at least the same as the latest points; the row where the points in the row are the same or greater than the latest points, and the points in the previous row are less.

Matching Week

//for latest season, just label latest record, otherwise find the week where the team had scored at least the same number of points as the current season so far
IF MIN([Is Latest Season]) AND LAST()=0 THEN TRUE
ELSE
IF NOT(MIN([Is Latest Season])) AND [Cumulative Points Per Season]>= [Latest Points Per Team] AND [Previous Points] < [Latest Points Per Team] THEN TRUE
ELSE FALSE
END
END

Add this to Rows and check the data. In the example below, in Season 2020-21, in week 26, Arsenal had 37 points. The previous week they had 34 points. Arsenal’s latest points are 36, so since 37 >=36 and 34 < 36, then week 26 is the matching week.

Looking at season 2023-24, in both week 15 and 16, Arsenal had 36 points. But only week 15 is highlighted as the match, as in week 14, Arsenal had 33 points so 36 >=36 and 33 <36, but for week 16, as the previous week was also 36, the 2nd half of the logic isn’t true : 36 is not less than 36.

So now we’ve identified the row in each season we want to display on the viz, we need to get the relevant points isolated in their own field too.

Matching Week Points

IF [Matching Week] THEN [Cumulative Points Per Season] END

Add this to the table

We now have data in a field we can plot.

Visualise the additional mark

If you want to retain your ‘Beginner’ solution, then the first step is to duplicate the Beginner worksheet, other wise, just build on what you have.

Add Matching Week Points to Rows to create an additional axis. By default only 1 mark may have displayed. Adjust the table calculation setting of the field, so the Latest Points Per Team calculation is computing by all fields except the Team field.

Change the mark type of the Matching Week Points marks card to Circle and remove Season End Year from the Label shelf (simply drag the pill against the T symbol off off the marks card)

Size the circles

We want the circles to be bigger, but if we adjust the Size, the lines change too, as the Size is being set based on the Is Latest Year pill on both marks cards. To resolve this, create a duplicate instance of Is Latest Year (right click pill and duplicate). This will automatically create Is Latest Season (copy) in the dimensions pane. Drag this onto the size shelf of the Matching Week Points marks card instead to make the sizing independent (you will probably find you need to readjust the table calculation for the Matching Week Points pill to include Is Latest Season (copy)). Then adjust the sizes as required.

Label the circles

Add Latest Points Per Team to the Label shelf of the Matching Week Points marks card. Adjust the table calculation setting, so the Latest Points Per Team calc is computing just by Wk only, so only the latest value displays.

Then format the Label so the text is aligned middle centre, is white & bold and slightly larger font.

Adjust the Tooltip text on the Matching Week Points mark, so it reads the same as on the line mark. You will need to reference the Matching Week Points value instead.

Then make the chart dual axis and synchronise the axis. Remove the Measure Names field that has automatically been added to the All marks card

Remove Label for Latest Season

We don’t want the season label to display for the current year, so create a new field

Label:Season

IF NOT([Is Latest Season]) THEN [Season End Year] END

Replace the Season End Year pill on the Label shelf of the Cumulative Points Per Season marks card, with this one instead.

Final Formatting

To tidy up

  • Remove right hand axis
    • right click on axis and uncheck show header
  • Remove 165 nulls indicator
    • right click on indicator and hide indicator
  • Remove row & column dividers
    • Right click on the canvas and Format. Select Format Borders and set Row and Column Divider values to None

Advanced Challenge

For the final part of the challenge we want to add some additional text to the tooltip and adjust the filter control. As before either duplicate the Intermediate challenge sheet or just build on.

We’ll start with the tooltip text.

Expand the Tooltip Text

For this, we’ll go back to the data table sheet we were working with to validate the calculations required.

We want to calculate the difference in the number of weeks between the latest week of the current season, and the week number of the matching record from previous seasons. So first, we want to identify the latest week of the current season, and ‘flood’ that over every row.

Latest Week Number Per Team

WINDOW_MAX(MAX(IF ([Date]) = ([Latest Date Per Team]) THEN [Wk] END))

This is the same logic we used above when getting the Latest Points Per Team, although as this time the Wk field isn’t already an aggregated field like Cumulative Points Per Season is, we have to wrap the conditional statement with an aggregation (eg MAX) before applying the WINDOW_MAX.

Add this to the table.

And then we need, the week number in each season where the match was found, but this needs to be spread across every row associated to that season.

Matching Week No Per Team and Season

WINDOW_MAX(IF [Matching Week] THEN MIN([Wk]) END)

Add to the table, but adjust the table calculation so the field is computing by all fields except the Season End Year.

Now we have these 2 fields, we can compute the difference

Week Difference

[Latest Week Number Per Team] – [Matching Week No Per Team and Season]

Add to the table. With this we can then define what this means for the text in the tooltip

Less/More

IF [Week Difference]<0 THEN ‘more’ ELSEIF [Week Difference]>0 then ‘less’
ELSE ”
END

and also build the tooltip text completely

Tooltip- other seasons

IF NOT(MIN([Is Latest Season])) THEN
IF [Week Difference] = 0 THEN ‘It took the same amount of weeks to accrue at least the same number of points as the current season’
ELSE ‘It took ‘ + STR(ABS([Week Difference])) + ‘ weeks ‘ + [Less/More] + ‘ to accrue at least the same number of points as the current season’
END
END

Add this to the Tooltip shelf of the Matching Week Points marks card. Adjust the Tooltip to reference the field.

Adjust the Tooltip – other seasons table calculation, so the Latest Points Per Team nested calculation is computing for all fields except Team (this will make the circles reappear)

and also adjust the Latest Week Number Per Team nested calculation to compute by all fields except Team. This should make the text in the tooltip appear.

Filtering by teams in the current season only

We need to get a list of the teams in the current season only, which we can define by

Filter Team

IF {FIXED [Team]: MAX([Season End Year])} = {MAX([Season End Year])} THEN [Team] END

Breaking this down: {FIXED [Team]: MAX([Season End Year])} returns the maximum season for each team, which is compared against the maximum season in the data set. So if there is a match, then name of the team is returned. Listing this out in a table we get Null for all the teams whose latest season didn’t match the overall maximum

We can then build a set off of this field. Right click on the Filter Team field Create > Set and click the Null field and select Exclude

Then add this to the Filter shelf, and the list of teams displayed in the Team filter display will be reduced.

And with that the challenge is completed. My published viz is here.

Happy vizzin’!

Donna

Let’s go streaking!

It was Sean Miller’s turn to set the challenge this week, where the primary focus was to find the highest number of consecutive months where the monthly sales value was higher than the previous month.

This was a table calculations based challenge, and I always tackle these by building out the data required in a tabular format. The challenge was also reminiscent of a previous challenge Sean has set, which I’ve blogged about here, and admit I used as a reference myself.

So let’s get started.

To start with, we need the month date, the Sub-Category, the Sales value and the difference in Sales from the previous month. For the month date, I like to define this explicitly

Order Date Month

DATE(DATETRUNC(‘month’,[Order Date]))

This aligns all Order Dates to the 1st of the relevant month.

Add Sales Category, Order Date Month (set to discrete exact date blue pill), and Sales into a view, then set a Quick Table Calculation of Difference on the Sales pill

Edit the table calculation to compute by Order Date Month only, so the previous calculation restarts at each Sub-Category.

Then drag this pill from the marks card into the left hand data pane to ‘bake’ the calculated field into the data model. Name the field Sales Diff. The re-add Sales back into the view too, so you can double check the figures.

Identify whether there is an increase with the field

Diff is +ve

IF [Sales Diff]>0 THEN 1 ELSE 0 END

Add this into the view too, and verify the calculation is computing by Order Date Month only again.

Now we need to work out if the row matches the previous value

Match Prev Value

LOOKUP([Diff Is +ve],-1) = [Diff Is +ve]

The LOOKUP is looking at the previous row (identified by the -1) and comparing to the current. If they match then it returns True else False.

Again add into the view, and again double check the table calc settings. In this case there is nested calculations so you need to double check the settings against each calc referenced in the drop down

Now we need to work out when there are consecutive increases, and how many of them there are

Increase Streak

IF (NOT([Match Prev Value])) AND ([Diff Is +ve] = 1) THEN 1
ELSEIF [Diff Is +ve] = 1 THEN ([Diff Is +ve]+PREVIOUS_VALUE([Diff Is +ve]))

END

If the current row has a +ve difference and the previous row wasn’t +ve, then we’re at the start of an increase streak, so set to 1. Else, if the current row has a +ve difference then we must be on a consecutive increase, so add to the previous row, and this becomes a recursive calculation, so builds up the values..

Add this onto the view, set the table calc settings, and you can see how this is working…

So now we’ve identified the streaks in each Sub-Category, we just want the maximum value.

Longest Streak

WINDOW_MAX([Increase Streak])

Add this and set the table calc setting again. You’ll see the max value is spread across every row per Sub-Category.

Finally we need to identify Sales values in the months when the streak is at its highest.

Sales of Month with Longest Streak

IF [Longest Streak]=[Increase Streak] THEN SUM([Sales]) END

Add this into the view again (don’t forget those table calc settings), and you’ll notice that for some Sub-Categorys there are multiple points with the same max streak

With all this we can now build the viz, which is relatively straight forward….

Add Order Date Month (exact date, continuous green pill) to Columns, Sub-Category to Rows and Sales to Rows. Edit the Sales axis to be independent, then change the line type of the Path to stepped

Add Sales of Month with Longest Streak to Rows and set to dual axis, and synchronise. Make sure the mark type of the 2nd axis is set to circle, and remove Measure Names from the colour shelf of both marks.

Manually set the colour of the line chart to grey. Add Longest Streak to the Colour shelf of the circle marks card. Adjust the colour to use the green palette, set to stepped of 5 value and ensure the range starts at 0 and ends at 5 (don’t forget to edit the table calc settings!).

Now add Longest Streak as a discrete blue pill to the view too.

This is all the core components. The last thing we need to do is sort the list. I wasn’t entirely sure how it had been sorted, apart from the largest Longest Streak at the top. I created a new field for this

Sort

[Longest Streak]*-1

and added this as a blue discrete pill in front of Sub-Category….

…, then hid the column.

Then just apply the tooltip and relevant formatting on the chart.

For the legend, I created a new field

Legend

CASE [Sub-Category]
WHEN ‘Art’ THEN 0
WHEN ‘Chairs’ THEN 1
WHEN ‘Labels’ THEN 2
WHEN ‘Paper’ THEN 3
WHEN ‘Phones’ THEN 4
ELSE 5 END

and added this into a new sheet as below

The components then just need to be added to the dashboard. My published version is here.

Happy vizzin’! Stay Safe!

Donna

How often is Sean listening to his favourite songs?

A “scatter plot with a twist” music based challenge this week was posed by Sean Miller, using data from his last.fm account. On the surface, a simple scatter plot of each song, comparing the number of days between first and last listen against total plays. But click on song, and you get to see a timeline depicting days since first play vs cumulative plays, on the same chart. Hmmm…

The data set provided consists of 1 row per song per date played since 2017. For the scatter plot, we’re looking to summarise the data at a song level. For the timeline related to the selected song, we need to have the data at the song and date level. Ultimately we’re mixing levels of granularity within a single chart.

This certainly took some thinking. Creating the 2 charts independently was pretty straightforward, but trying to put them together took some thought. I knew I was going to want to use a set action to drive the interactivity and apply different logic based on whether a song was selected or not, but it took a bit of trial and error to get a solution.

First up, the data provided contained some timestamp date fields, but these were string data types. I chose to use the Timestamp UTC field to get a ‘proper’ date. I actually did this by duplicating the field, renaming it to Date Played and changing the datatype to a date. This generated a calculation which is below… I’d have never of typed this myself 🙂

Date Played

DATE(IF NOT ISNULL( DATEPARSE ( “MM/dd/yyyy HH:mm”, [TimeStamp UTC] ) ) THEN DATEPARSE ( “MM/dd/yyyy HH:mm”, [TimeStamp UTC] ) ELSEIF NOT ISNULL ( DATEPARSE ( “MM/dd/yyyy hh:mm:ss a”, [TimeStamp UTC] ) ) THEN DATEPARSE ( “MM/dd/yyyy hh:mm:ss a”, [TimeStamp UTC] ) END)

Now we have that, we can work out the first date a song was played

Min Date Per Song

{FIXED [Song ID]:MIN([Date Played])}

and the latest dates

Max Date Per Song

{FIXED [Song ID]:MAX([Date Played])}

and then we can derive the days between

Days since first listen

DATEDIFF(‘day’, [Min Date per Song], [Max Date per Song])

We can also get the total plays per song using

Total Plays per Song

{FIXED [Song ID]: [Total Plays]}

and with these calculated fields, we can build the basic scatter, using Total Plays per Song as a range filter, and setting both the axes not to start at zero.

So far so good. Now let’s think about the timeline. We need to identify a ‘selected song’ to help build this, so let’s create a set by right clicking on Song ID > Create > Set, and selecting a single option

We need to capture the number of plays on a date

Total Plays

COUNT([2021_05_26_WW21_My Streaming Activity.csv])

This is the default count field generated (the equivalent of the Number of Records if you’re on older versions of Tableau).

And we also need to capture the number of days from first play date to the current date, as we need to plot on a consistent axis when putting the charts together (ie we can’t plot with date on the axis).

Days to Date

DATEDIFF(‘day’,[Min Date per Song],[Date Played])

Let’s put this out into a table so you can see what’s going on.

Add the Selected Song = True to the filter shelf

There are days when there were multiple plays, so the Days To Date field needs to be set to AVG rather than SUM, to get the correct figure.

For Total Plays we need to plot the cumulative value, so we can set a quick table calculation against the Total Plays field of Running Sum (right click on Total Plays field). We’re going to explicitly set the table calculation to compute by Date Played as when depicted on a viz, the default of table down, might not give the correct values.

So with these fields, we can build the timeline viz (duplicate the table sheet and move the fields around). Set the mark type to line, and change the setting on the Path to make a stepped chart.

Ok, so now we have the 2 charts and hopefully understand what we’re aiming for. But how do we now go about getting everything onto a single chart?

We’re going to need a dual axis chart, since we have different mark types in play. And we’re going to want to plot different measures depending on whether we’re working with the selected song or not.

We’re going to build the data up in a table to get the logic for the fields we need. To just test the concept, we’ll filter to just a few songs, including the one in the Selected Song set. Add Song ID to the filter shelf and filter to a few songs.

Then build out a table as below:

What we’re aiming for, is for the records where In/Out of the set is Out, we want to plot the information we’re getting from the 1st two columns, but for the records where In/Out the set is In, we want the information from the other columns.

So let’s build this out.

Days to Plot

IF ATTR([Selected Song]) THEN AVG([Days To Date]) ELSE SUM([Days since first listen]) END

Plays to Plot

IF ATTR([Selected Song]) THEN RUNNING_SUM([Total Plays]) ELSE SUM([Total Plays per Song]) END

Pop these in the table, making sure any fields which are table calculations are set to compute by Date Played.

If you scroll to find where the data changes from a song out of the set to the one in, you can see how the two new fields are working.

So lets try plotting the chart out using these fields instead.

  • Days to Plot on Columns
  • Total Plays Per Song on Rows
  • Song ID on Detail shelf
  • Plays to Plot on Columns
  • Date Played (set to exact date) on Detail of the All Marks card.
  • Set the table calculation of Plays to Plot to compute by Date Played only.
  • Change mark type of the Plays to Plot to Line and set the Path to stepped line
  • Add Total Plays Per Song to Filter shelf and set to range from 50 to 100 (just to make the chart less busy).

You should end up with the below

You’ll notice we have the line of circles in the top chart, which is plotting a mark per day for the Selected Song. If we remove the Date Played pill from the Detail shelf of the Total Plays Per Song marks card, we lose these marks including the mark for the Selected Song too, which we need.

To resolve this, we need another field.

Is Last?

LAST()=0

Last Plays to Plot

IF [Is Last?] THEN Sum([Total Plays per Song]) END

This is just saying give me the total plays for the last mark in the list. Add these to the table to see what’s going on, making sure to set the table calc to compute by Date Played only

Now if we replace Total Plays per Song with the Last Plays to Plot field, we get
Now make the chart dual axis (don’t forget to synchronise), and we can then sort the formatting.
  • Set the marks type on the Last Plays to Plot to circle
  • Remove Measure Names from the Colour shelf, and set the mark colour to #3dde3c. Reduce the opacity to about 90%. Add a dark grey border.
  • Add Selected Song to the Size shelf, and adjust so the selected song is larger than the others.
  • On the Plays to Plot marks card, again remove Measure Names from the Colour shelf, and set the colour to black.
  • Set the Path to stepped line.
  • Click on the right hand axis and select Move marks to back
  • Reduce the Size of the mark.
  • Add Selected Song to the Colour shelf, and adjust the colours so the line is black and the tiny dots for all the other marks that you can see in the circles, is set to #3dde3c

You’ve now got the core chart, which needs to be further tidied to remove grid lines, axes, add tooltip etc. Once done, you can add to a dashboard, where you can then set the interactivity.

Add a dashboard action to change set values that sets the Selected Song set on Select. Set to work on single-select only.

Finally, you’ll find that if you select a mark, while you’ll get the trend line, the other points will now ‘fade out’

Create a new field True = True, and add this to the Detail shelf of the All Marks card. Then on the dashboard, add a new dashboard highlight action, which is set to Target Highlighting to the True field only.

Now if you click on and off a mark, you should get the trendline show and disappear, and all marks remain at the same transparency throughout.

Hopefully you’ve got enough now to complete this challenge. My published viz is here.

Happy vizzin’! Stay Safe!

Donna