Can you use Tableau to estimate Aaron Judge’s home run trajectories?

It’s community month still for #WOW2022, and this week saw Samuel Epley set this challenge to visualise the home run trajectories of Aaron Judge.

I had a little mini-break to Rome this week, so was hoping I was going to be able to get this week’s challenge done and dusted on the Tuesday evening if it landed early enough, as I wasn’t going to be around.

It did land on the Tuesday for me, but wow! it was not going to be easy! I managed to build the KPIs & the scatter plots on the Tuesday evening, and knowing I didn’t have much time, just chose to use the Home Runs stats data set only. I knew these charts weren’t going to need any data densification, so found this approach simpler.

I’m afraid I’m still constrained by time at the moment, so this post isn’t going to be the detailed walkthrough you might usually expect – sorry! I’m just going to try to pull out key points from each chart.

KPIs

I built this on a single sheet, using Measure Names and Measure Values.

I used aliases on the Measure Names (right click -> Aliases) to change the label you can see displayed ie the Distance pill is aliased to ‘Average Distance’

I also custom formatted the various numbers and applied suffixes to display the unit of measure

Note – to get the degree symbol, I typed Alt+0176

Scatter Plots

I built the Exit Velocity by Distance scatter plot first, and completed all the formatting & tooltips. Then I duplicated the sheet to form the basis of the other scatter plots, and just swapped the relevant pills as needed.

For the ball shape, I loaded the provided images as custom shapes into my shapes repository. I then just created the following calculated field to use as a discrete dimension I could add to the Shape shelf

Ball Shape

[HR Number]%9

It’s not as completely randomised as perhaps it should be, but it looks random enough on the display.

The Pitcher in the data is in the format <Surname>, <Forename>, but on the tooltip it needs to display as <Forename> <Surname>, so I just used a transformation on the Pitcher field to split the field based on the comma (right click Pitcher -> Transform -> Split). This automatically created 2 fields I could use on the Tooltip.

I also noticed a very subtle wording change in the tooltip based on whether the match was Home or Away. If Home, the tooltip read ‘New York Yankees vs. <Opposition>’ otherwise it read ‘New York Yankees at <Opposition>’. I used a calculated field for this logic

TOOLTIP: vs or at

IIF([Location]='Home','vs.','at')
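If it helps to see the two tooltip tweaks outside of Tableau, here is a minimal Python sketch of the same logic. The function name and the sample pitcher and team are made up for illustration; they are not from the data set.

```python
def tooltip_parts(pitcher, location, opposition):
    """Return the display name and matchup text used on the tooltip."""
    # Split '<Surname>, <Forename>' on the first comma, like Transform -> Split
    surname, forename = [part.strip() for part in pitcher.split(",", 1)]
    # Mirror the IIF: 'vs.' for home games, 'at' for everything else
    joiner = "vs." if location == "Home" else "at"
    return f"{forename} {surname}", f"New York Yankees {joiner} {opposition}"


# Hypothetical example values
name, matchup = tooltip_parts("Doe, John", "Away", "Boston Red Sox")
print(name)     # John Doe
print(matchup)  # New York Yankees at Boston Red Sox
```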

The Trajectory Plot

OK, so this was the hardest part of this challenge, and mainly due to getting your head round the physics involved, as so many of the calculations are dependent on each other.

I’m generally pretty confident with my maths, but this was complex, especially with the force calculations for the y-axis. Samuel stated that both gravity and drag impacted the Y-axis calcs, but it wasn’t clear to me how both these forces should be applied (a bit of trial and error and I ended up adding them within the formula).
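To give a feel for why the calculations depend on each other, here is a rough Python sketch of stepping a ball through time with both gravity and drag acting on the vertical velocity. This is not Samuel's formula or the one in my workbook: the drag constant, the time step and the starting height are all assumptions I picked for illustration, and the drag model (proportional to speed squared) is just a common textbook choice.

```python
import math

G = 9.81        # gravity in m/s^2
DRAG_K = 0.005  # hypothetical drag constant (per metre), NOT from the challenge
DT = 0.025      # assumed time step in seconds


def trajectory(exit_velocity, launch_angle_deg, drag_k=DRAG_K, start_height=1.0):
    """Step the ball forward until it lands; returns a list of (x, y) points in metres."""
    angle = math.radians(launch_angle_deg)
    x, y = 0.0, start_height
    vx = exit_velocity * math.cos(angle)
    vy = exit_velocity * math.sin(angle)
    points = [(x, y)]
    while y > 0:
        speed = math.hypot(vx, vy)
        ax = -drag_k * speed * vx        # drag only on the x-axis
        ay = -G - drag_k * speed * vy    # gravity and drag added together on the y-axis
        # Each new velocity depends on the previous one, and each new
        # position depends on the new velocity, hence the chained calcs.
        vx += ax * DT
        vy += ay * DT
        x += vx * DT
        y += vy * DT
        points.append((x, y))
    return points
```

Each pass through the loop plays the role of one row in the densified data, which is why the table calcs in the workbook have to be built (and numbered) in order.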

By the time I came to tackle this challenge, Samuel had already posted a video walkthrough, which can be viewed here and is another reason why I’m not going down to the nth degree in this post.

My suggestion is to watch Samuel’s video and/or feel free to download my workbook. I built my workbook independent of Samuel’s video, so there may be steps/calculations that differ.

However, I have tried to number my calculations in the order in which I created them, so you can hopefully follow the thought process. I have also left a CHK:Data sheet in the workbook, which I used to sense check what I was doing.

All the table calculations in the CHK:Data sheet are just set to the default ‘table down’ as I have filtered the sheet to a specific Home Run (HR Number = 1) only (ie I didn’t change any of the table calc settings as I added the pills to the sheet).

However, when you build the main trajectory chart, you have multiple HR Numbers in the view, so all the table calculations must be set so that they compute separately for each HR Number. This means that every table calc (and any nested calculation) needs to have all the fields except HR Number checked.

When using the Pages shelf, which isn’t something I’ve ever really had to do before, you need to Show History and adjust the various settings to get the trail lines to show

To rotate the ball (the bonus option), you need another field to use on the Shape shelf. I had lost the will to live a bit by this point, so used the formula from my friend Rosario Gauna’s solution.

Rotation Shape

STR(IIF([14-Start Position Y m] <= 0, 0,
(MIN([Time Interval]) * 1000 / 25) % 9))

Note – when you add this to the Shape shelf, and select your baseball palette, just then use the Assign Palette button to automatically assign a ball to a number – this will get them into the correct order, without you having to do it one by one.

Finally, when adding the reference average lines, be sure to set the scope to per pane rather than table, otherwise you’ll end up with the wrong figures.

I think I’ve pretty much covered all the ‘little’ points that I came across that may trip you up, aside from all the tricky calcs of course!

My published workbook is here. I hope what I’ve written is enough for you to build it yourself. I think I’d still be here next year if I tried to do anything more fully! I’m off for a lie down now!

Happy vizzin’!

Donna


What is the time-weighted return?

Indeed! What is the time-weighted return? It’s not a term I’d ever heard of until Luke set the challenge this week. Fortunately he provided a link to a web page to explain all about it – phew!

I actually approached the challenge initially by downloading the data source, copying the contents of sheet2 into a separate excel workbook, and then stepping through the example in the web page, creating new columns in the excel sheet for each part of the calculation. I didn’t apply any rounding and ended up with slightly different figures from Luke. However after a chat with my fellow WOW participant Rosario Gauna, we were both aligned, so I was happy. I’d understood the actual calculation itself, so I then cracked open Desktop to begin building. This bit didn’t really take that long – most of the time on this challenge was actually spent in Excel, but shhhhh! don’t tell anyone 🙂

Building the calculations

I’m going to build lots of calculated fields for this, primarily because I like to do things step by step and validate I’m getting what I expect at each stage. You could probably actually do this in just 2 or 3 calcs. As I built each calc, I added it into a tabular view of the data. I’m not going to screen shot what the table looks like after every calc though.

To start though, I chose to rename some of the existing fields as follows :

  • Flows -> A: Flows
  • Start Value -> B: Start Value
  • End Value -> C: End Value

I did this because I have several new fields to make, and coming up with a meaningful name for each was going to be tricky – you’ll understand as you read on 🙂

First up, create a sheet and add Reporting Date as an exact date discrete blue pill to Rows then add the above 3 fields.

Now onto the calculations, and this all just comes from the website. Each calc builds on from the previous. I chose to add inline comments as a reminder. After I built each calc I added to the table.

D: A+B

//sum flow + start point
SUM([A: Flows]) + SUM([B: Start Value])

E: C-D

//difference between end value and (flow+start)
SUM([C: End Value]) - [D: A+B]

F: E/D

//Difference as a proportion of (start + flow)
[E: C-D]/ [D: A+B]

I formatted this one to 3 decimal places so the figure showed on screen, but ‘under the bonnet’ Tableau will use the full number in the onward calcs.

At this point now, F: E/D stores what the article describes as the ‘HP’ value for each date. Now we need to start building the calcs we need to compute the TWR value.

G: 1+F

1+[F: E/D]

formatted to 3 dp

H: Multiply G values

//Running Product of column G
[G: 1+F]*PREVIOUS_VALUE(1)

This is the core calculation for this challenge. It takes the value of the G: 1+F field in each row and multiplies it by the value this calculation (H) returned for the previous row; PREVIOUS_VALUE(1) supplies 1 on the first row, where there is no previous value. As each row ends up being the product of itself and everything before it, this builds up a ‘running product’ value.

Add these fields onto the table and explicitly set the table calculation of the H: Multiply G values field to compute using Reporting Date.

And finally we can create

I: TWR

//subtract 1 from the running product
[H: Multiply G values]-1

format this to be a % with 0dp. Add to the table and again ensure the table calc (and all nested calcs) are set to compute using Reporting Date
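For anyone who prefers to sense check the logic outside of Tableau (I did mine in Excel), the chain of calcs from D through to I can be sketched in a few lines of Python. The function name and the sample figures below are invented for illustration and are not from Luke's data.

```python
def time_weighted_return(rows):
    """rows: list of (flows, start_value, end_value) ordered by Reporting Date.

    Returns the cumulative TWR for each row, mirroring calcs D to I.
    """
    running_product = 1.0  # plays the role of PREVIOUS_VALUE(1) on the first row
    twr_by_row = []
    for flows, start_value, end_value in rows:
        d = flows + start_value        # D: A+B
        e = end_value - d              # E: C-D
        f = e / d                      # F: E/D (the 'HP' value)
        g = 1 + f                      # G: 1+F
        running_product *= g           # H: Multiply G values
        twr_by_row.append(running_product - 1)  # I: TWR
    return twr_by_row


# Made-up example: two periods that each return 10%
sample = [(0, 100, 110), (10, 110, 132)]
print(time_weighted_return(sample))  # roughly [0.10, 0.21]
```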

The final calculation needed is one to use to display the TWR value for the last date.

J: Latest TWR Value

//get the last value computed
WINDOW_MAX([I: TWR])

format this to be a % with 0dp.

This function looks for the highest number across the whole table, and spreads that value across every row. This works here because the highest I: TWR value falls on the last date; a running product can go down as well as up, so if the return dipped at the end, the maximum would no longer be the latest value. Add this onto the table and verify all the table calcs are set to compute by Reporting Date.

And now you can build the viz

Building the Viz

Add Reporting Date as an exact date continuous green pill to Columns and I:TWR to Rows, setting the compute by of the table calculation as described above. Add J:Latest TWR Value to Detail, and again set the compute by.

Then update the title of the sheet to reference the J:Latest TWR Value field, amend the tooltips and update the label on the y-axis. Add to a dashboard and you’re done 🙂

My published viz is here.

Happy vizzin’!

Donna

Can you track headcount?

Luke decided to set us a challenge this week based on Human Resources (HR) data, using a dataset provided as part of the #RWFD Real World Fake Data project managed by Mark Bradbourne.

The challenge focussed on reporting monthly headcount, using a dataset that contained 1 row per employee with a column relating to the start date (Hire Date) and a column relating to the date the employee left (Termdate), which could have been a future date or NULL/blank.

I’ve worked on headcount reporting before in my day job, but have used a snapshotted data source, which captures the active employees as at midnight on the 1st day of every month. This method makes counting the headcount each month very simple, as it’s just counting the number of rows per snapshot date.

Obviously, this wasn’t an option in this case. Luke gave very few clues as to what approach would be required, apart from saying ‘you will also need to manipulate the data’….

A word about the data…

Just a little side note here…. I ended up in discussions with my fellow #WOW participant Rosario Gauna as after attempting one method, I wasn’t getting as high numbers as the solution originally suggested. There were rows in the data set that had NULL values for most of the data except job title, but I think this was only around 225 rows, so even including them I didn’t get close to the figures. Rosario however, was somehow managing to see approx 15,000 rows of null data. I ended up building and publishing my initial solutions based on a version of the data sent to me directly by Rosario.

Since then, it seems others were also confused, and so the solution was adjusted to ignore all the null rows. As a result I reworked one of my solutions to use the original data source I downloaded from the site, and it’s this I will focus on in the blog.

A word about my solutions…

With little instruction, it took a bit of head-scratching to figure out how to work through the problem. I initially decided to use a scaffold dataset to help. I built a simple file in Excel which contained 1 column listing 1 date per month, running from 01 Jan 2017 to 01 Jan 2021. I then joined this to the HR data in Tableau Desktop using the physical data layer (ie I did not use relationships), using join calculations as below (note, this is using the amended copy of data I received where Termdate was already defined as a date field).

The solution using this method is published here.

However, I didn’t feel completely comfortable with this approach. Luke had suggested the difficulty of this was 9/10 and referred to nested tableau calculations… the above solution just used quick table calcs. So I decided to see if I could come up with an alternative, just using the provided data, and that’s what I’ll document below.

Setting up the data

I downloaded the Human Resources.csv file from the link provided, and connected to it in Tableau Desktop. I found the Termdate field presented as a string field, so I converted it to a date, just by clicking on the ABC symbol of the data type against the field. I also added a data source filter to exclude all records where the Id field was NULL.

I then added a new data source, and connected to the same file again. This meant I had 2 instances of the Human Resources data source listed in my Data pane. I renamed the 2nd instance to be Leavers Only, changed the data type of the Termdate field again and added a data source filter so that only records with non-Null Termdate values were retained.

Note – What I will be doing here is using the Human Resources data source to manage the months we’re reporting headcount over. This method only works if there is at least 1 new starter every month, which in this case there is.

Building the calculations

The requirements state that a new starter doesn’t count in the monthly headcount figures until the following month, so the date field we need to use in the output, needs to be created as

Report Date

DATE(DATETRUNC('month',DATEADD('month',1,[Hire Date])))

This basically shifts the date the new starter joined to the 1st day of the following month.
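As a quick illustration of what that date shift does, here is the same idea as a small Python function. The function name is mine, and the dates are just examples.

```python
from datetime import date


def report_date(hire_date):
    """Return the 1st day of the month after hire_date."""
    # month is 1-12, so December (12) rolls over into January of the next year
    year_offset, month_index = divmod(hire_date.month, 12)
    return date(hire_date.year + year_offset, month_index + 1, 1)


print(report_date(date(2020, 1, 31)))   # 2020-02-01
print(report_date(date(2020, 12, 15)))  # 2021-01-01
```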

To make the calculations easier to read, I also created

Starters

COUNT([2022_04_06_WW14_Human Resources.csv])

which is literally a reference to the field that automatically gets added as part of the connection – it’s just a bit of a mouthful as it’s named according to the data connection.

Let’s pop these out in a table as below – I’m deliberately using the dateparts of the Report Date field as they’ll be used in the viz like this.

So this is just showing the new starters against the month at which they will count for headcount. We need to get details of leavers now, and we’ll do this by blending.

When I blend data sources, I tend to create specific fields so it’s clear to me that they’re being used for blending, rather than defining blend relationships between differently named fields. So in the Human Resources data source I’ve created

BLEND – Date

[Report Date]

and then in the Leavers data source, I’ve created

BLEND – Date

DATE(DATETRUNC('month',DATEADD('month',1,[Termdate])))

which is shifting the Termdate forward to the 1st of the next month, since leavers should also only be recorded in the following month.

In the Human Resources data source I then created

Leavers Per Month

ZN(COUNT([2022_04_06_WW14_Human Resources – Leavers Only].[2022_04_06_WW14_Human Resources.csv]))

which is just referencing the automatically generated ‘count’ field from the Leavers data source. I’ve wrapped it in ZN so 0 is reported in the event no match is found.

Add Leavers Per Month into the table, and ensure the linking field from the secondary Leaver data source is connected on the BLEND – Date field

So now we know how many starters and leavers per month, we now need to total these up by generating a cumulative running sum of the starters, but subtracting any leavers along the way.

Official Headcount

([Starters] + PREVIOUS_VALUE(0)) - [Leavers Per Month]

This takes the value of the Starters in the current row, adds it to the value of Official Headcount from the previous row, then subtracts any Leavers recorded against the current row. Add this onto the table, and as it’s a table calculation, edit it so it is explicitly computing by the month & year of Report Date
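The running total is easier to picture with a few numbers. This is a minimal Python sketch of the same recursion, using invented starter and leaver counts rather than the real data.

```python
def official_headcount(starters, leavers):
    """Running headcount per month: previous headcount + starters - leavers."""
    headcount = []
    previous = 0  # matches PREVIOUS_VALUE(0) on the first row
    for month_starters, month_leavers in zip(starters, leavers):
        previous = month_starters + previous - month_leavers
        headcount.append(previous)
    return headcount


# Made-up monthly figures
print(official_headcount([5, 3, 4], [0, 1, 2]))  # [5, 7, 9]
```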

For the annual change, we’re looking to compare the difference of the Official Headcount value for the current month eg Jan 2021, with the Official Headcount value for the same month 1 year ago eg Jan 2020.

Annual Change

(ZN([Official Headcount]) - LOOKUP(ZN([Official Headcount]), -12)) / ABS(LOOKUP(ZN([Official Headcount]), -12))

Take the Official Headcount from the current row, subtract the Official Headcount from 12 rows before, then divide the result by the Official Headcount from 12 rows before.
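Here is a small Python sketch of that ‘12 rows back’ comparison. It assumes one row per month with no gaps, and returns None where there is no row 12 months earlier (where the LOOKUP in Tableau would give Null). The function name and figures are only for illustration.

```python
def annual_change(headcount, offset=12):
    """Percentage change versus the value `offset` rows earlier."""
    changes = []
    for i, current in enumerate(headcount):
        if i < offset or headcount[i - offset] == 0:
            # No row to look back to (or nothing to divide by)
            changes.append(None)
            continue
        prior = headcount[i - offset]
        changes.append((current - prior) / abs(prior))
    return changes


# Made-up headcount growing by 1 each month for 13 months
monthly = list(range(100, 113))
print(annual_change(monthly)[12])  # 0.12, ie 112 vs 100 a year earlier
```

The numerator on its own, current minus prior, is the same comparison without the division.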

Format this to a percentage at 1 dp. (For some reason my numbers here don’t seem to match Luke’s revised solution…).

Again add to the sheet and set the table calcs of both the nested calcs to compute by month and year

The final measure was a bit of a strangely named one IMO. I felt I should be doing some rolling calculation, but ultimately, it just seemed to be reporting the difference between the headcount now and that 12 months ago ie the numerator of the calculation above.

Net Rolling 12 Change

(ZN([Official Headcount]) - LOOKUP(ZN([Official Headcount]), -12))

Pop this into the view, adjust the table calc settings again

Building the viz

On a new sheet, add Report Date at the month level, but then set to be continuous, to Columns and Report Date at the year level to Colour. Add Official Headcount to Rows (adding the linking field for the blend when prompted), and adjust the table calc settings to compute over both month & year, ensuring the Year of Report Date is listed first.

Then in the legend, select all the years from 2000-2017 and ‘hide’ (this is probably a cheat way, but the quickest – we need to retain the data from the previous years, so can’t just ‘filter’).

Edit the axis to not start from zero. Adjust colours to suit.

Add Annual Change and Net Rolling 12 Change to the Rows shelf, adjust both of the table calc settings, so all nested calcs are computing by Year & then Month.

Edit the axis of both these additional measures to also exclude 0. On the All marks card, click the Label button and tick the Show mark labels option.

Final steps ….

…on the All marks card, click the Tooltip button and uncheck Show tooltips so no tooltips display on hover.

Right click on the Month axis and format the axis so the dates in the Scale section are displayed as abbreviated dates

Edit the same axis and remove the title, then format the whole chart to

  • remove all row and column dividers
  • to set the row banding with a band size of 1
  • adjust the colour of the row and column gridlines to be a slightly darker shade of grey

And you’re done 🙂 My published viz is here

Had a bit of all sorts this week… I’ll be intrigued to see how Luke solved it!

Happy vizzin’!

Donna

How many consecutive starts?

Another table calculation related challenge this week, set by Luke, visualising cumulative starts for NFL Quarterbacks per team from 2006.

Luke provided the data within a Tableau workbook template on Tableau Public, so I started by downloading the workbook and understanding the data structure.

The challenge talks about teams playing over 17 weeks, but the data showed some data associated to weeks 28-32. So I excluded these weeks by filtering them out.

I then started to build out the data in tabular form, so I could start to build up what was required. I added Team, Season, Week and Player ID to Rows, and just to reduce the amount of data I was working with while I built up the calcs, chose to filter to the Teams ARZ, ATL & BLT.

What we’re looking to do is examine each Player ID and work out whether it is the first record for the Team or whether it differs from the previous row’s data. If so then we’re at the ‘start’ of a run, so we record a ‘counter’ value of 1. If not, the values match, so we need to increment the counter.

We’ll do this in stages.

Firstly, let’s get the previous Player ID value.

Prev Player ID

LOOKUP(MIN([Player Id]),-1)

This ‘looks up’ the record in the previous (-1) row. Change this field to be discrete and add to the Rows. Set the table calculation to compute by all fields except Team.

Each Prev Player ID value matches the Player ID from the row before, unless it’s the first row for a new Team, in which case the value is Null.

Then we can create a field to check if the values match

Match Prev Player ID

MIN([Player Id])=[Prev Player ID]

Add this to the view and set the table calc as above, and the data shows True, False or NULL

Now we can work out the consecutive streak values

Consecutive Streak

IF (NOT([Match Prev Player ID])) OR ISNULL([Match Prev Player ID]) THEN 1
ELSE 1+PREVIOUS_VALUE(-1)
END

If we don’t match the previous value or we’re at the start of a new team (as value is NULL), then start the streak by setting the counter to 1, otherwise increment the counter. Add this to the view and set the table calc for both the nested calculations as per the settings described above.
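If it helps, the same ‘reset to 1 or add 1’ logic looks like this in Python. It is only a sketch: the team codes and player ids are invented, and the rows are assumed to already be sorted by team, season and week.

```python
def consecutive_streaks(starts):
    """starts: list of (team, player_id) sorted by team, season, week."""
    streaks = []
    prev_team = prev_player = None
    counter = 0
    for team, player_id in starts:
        if team != prev_team or player_id != prev_player:
            counter = 1       # new team or a different quarterback: streak restarts
        else:
            counter += 1      # same quarterback as the previous row: keep counting
        streaks.append(counter)
        prev_team, prev_player = team, player_id
    return streaks


sample = [("ARZ", 1), ("ARZ", 1), ("ARZ", 2), ("ATL", 2), ("ATL", 2)]
print(consecutive_streaks(sample))  # [1, 2, 1, 1, 2]
```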

Next we need to identify the last value of the Consecutive Streak for each Team.

Current Streak

WINDOW_MAX(IF LAST()=0 THEN [Consecutive Streak] END)

The inner IF statement, will return the value of Consecutive Streak stored against the last row for the Team. All other rows will be Null/blank. The WINDOW_MAX() statement then ‘spreads’ this value across all the rows for the Team.
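In Python terms, ‘take the value on the last row and spread it across the team’ could be sketched like this (again with invented values, and assuming rows are in season/week order within each team):

```python
def current_streak_by_team(teams, streaks):
    """Spread each team's final streak value across all of that team's rows."""
    latest = {}
    for team, streak in zip(teams, streaks):
        latest[team] = streak  # later rows overwrite earlier ones, so the last row wins
    return [latest[team] for team in teams]


teams = ["ARZ", "ARZ", "ARZ", "ATL", "ATL"]
streaks = [1, 2, 1, 1, 2]
print(current_streak_by_team(teams, streaks))  # [1, 1, 1, 2, 2]
```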

Add this onto the view, and set the table calc for all the nested calcs.

Finally, we need one more bit of data. The chart essentially plots values from 2006 week 1 through to 2020 week 17. We need to ‘index’ these values, so we have a continuous week number value from the 1st week. We can use the Index table calculation for this

Index

INDEX()

Add this field to the view, set it to be discrete (blue pill) and position after the Week field on the Rows. Set the table calc as usual, so the Index restarts at each Team.

Now we’ve got all the data points we need, we can build the viz. I did this by duplicating the tabular view and then

  • Remove Prev Player ID and Match Prev Player ID
  • Move Season, Week and Player ID from Rows to Detail
  • Move Current Streak from Text to Rows and change to be discrete (blue)
  • Move Index from Rows to Columns and change to be continuous (green)
  • Move Consecutive Streak from Text to Rows
  • Change mark type to bar, and set to fit width to expand the view.
  • Change Size to be Fixed, width size 1 and aligned right
  • Set the border on the Colour shelf to be None.
  • Remove the Team filter and adjust the row height

All that’s left now is to set the tooltip (add Player to the Tooltip shelf to help this), and then apply the formatting. You can use workbook formatting (Format -> Workbook menu) to set all the font to Times New Roman.

Hopefully this is enough to get you to the end 🙂 My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Let’s go streaking!

It was Sean Miller’s turn to set the challenge this week, where the primary focus was to find the highest number of consecutive months where the monthly sales value was higher than the previous month.

This was a table calculations based challenge, and I always tackle these by building out the data required in a tabular format. The challenge was also reminiscent of a previous challenge Sean has set, which I’ve blogged about here, and admit I used as a reference myself.

So let’s get started.

To start with, we need the month date, the Sub-Category, the Sales value and the difference in Sales from the previous month. For the month date, I like to define this explicitly

Order Date Month

DATE(DATETRUNC('month',[Order Date]))

This aligns all Order Dates to the 1st of the relevant month.

Add Sub-Category, Order Date Month (set to discrete exact date blue pill), and Sales into a view, then set a Quick Table Calculation of Difference on the Sales pill

Edit the table calculation to compute by Order Date Month only, so the difference calculation restarts at each Sub-Category.

Then drag this pill from the marks card into the left hand data pane to ‘bake’ the calculated field into the data model. Name the field Sales Diff. Then re-add Sales back into the view too, so you can double check the figures.

Identify whether there is an increase with the field

Diff Is +ve

IF [Sales Diff]>0 THEN 1 ELSE 0 END

Add this into the view too, and verify the calculation is computing by Order Date Month only again.

Now we need to work out if the row matches the previous value

Match Prev Value

LOOKUP([Diff Is +ve],-1) = [Diff Is +ve]

The LOOKUP is looking at the previous row (identified by the -1) and comparing to the current. If they match then it returns True else False.

Again add into the view, and again double check the table calc settings. In this case there are nested calculations, so you need to double check the settings against each calc referenced in the drop down

Now we need to work out when there are consecutive increases, and how many of them there are

Increase Streak

IF (NOT([Match Prev Value])) AND ([Diff Is +ve] = 1) THEN 1
ELSEIF [Diff Is +ve] = 1 THEN ([Diff Is +ve]+PREVIOUS_VALUE([Diff Is +ve]))
END

If the current row has a +ve difference and the previous row wasn’t +ve, then we’re at the start of an increase streak, so set to 1. Otherwise, if the current row has a +ve difference then we must be on a consecutive increase, so add 1 to the previous row’s streak value. PREVIOUS_VALUE makes this a recursive calculation, so the count builds up row by row.
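A minimal Python sketch of the same streak counter, for one Sub-Category at a time. One difference to be aware of: the Tableau calc returns Null for months that aren’t an increase, whereas this sketch uses 0. The sales figures are invented.

```python
def increase_streaks(monthly_sales):
    """Count consecutive month-on-month increases; 0 where there is no increase."""
    streaks = []
    streak = 0
    for i, sales in enumerate(monthly_sales):
        # The first month has nothing to compare against, so it never counts
        is_increase = i > 0 and sales > monthly_sales[i - 1]
        streak = streak + 1 if is_increase else 0
        streaks.append(streak)
    return streaks


print(increase_streaks([100, 120, 130, 125, 140, 150, 160]))
# [0, 1, 2, 0, 1, 2, 3]
```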

Add this onto the view, set the table calc settings, and you can see how this is working…

So now we’ve identified the streaks in each Sub-Category, we just want the maximum value.

Longest Streak

WINDOW_MAX([Increase Streak])

Add this and set the table calc setting again. You’ll see the max value is spread across every row per Sub-Category.

Finally we need to identify Sales values in the months when the streak is at its highest.

Sales of Month with Longest Streak

IF [Longest Streak]=[Increase Streak] THEN SUM([Sales]) END

Add this into the view again (don’t forget those table calc settings), and you’ll notice that for some Sub-Categories there are multiple points with the same max streak
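To show why more than one point can be flagged, here is a small Python sketch that takes the sales and the streak count per month and keeps the sales value only where the streak equals the longest one. The numbers are made up; in this example the longest streak of 2 is reached twice.

```python
def sales_at_longest_streak(monthly_sales, streaks):
    """Keep the sales value only in months where the streak equals the longest streak."""
    longest = max(streaks)  # like WINDOW_MAX spread across every row
    return [
        sales if streak == longest else None
        for sales, streak in zip(monthly_sales, streaks)
    ]


sales = [100, 120, 130, 125, 140, 150]
streaks = [0, 1, 2, 0, 1, 2]
print(sales_at_longest_streak(sales, streaks))
# [None, None, 130, None, None, 150]
```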

With all this we can now build the viz, which is relatively straight forward….

Add Order Date Month (exact date, continuous green pill) to Columns, Sub-Category to Rows and Sales to Rows. Edit the Sales axis to be independent, then change the line type of the Path to stepped

Add Sales of Month with Longest Streak to Rows and set to dual axis, and synchronise. Make sure the mark type of the 2nd axis is set to circle, and remove Measure Names from the colour shelf of both marks.

Manually set the colour of the line chart to grey. Add Longest Streak to the Colour shelf of the circle marks card. Adjust the colour to use the green palette, set it to stepped colour with 5 steps, and ensure the range starts at 0 and ends at 5 (don’t forget to edit the table calc settings!).

Now add Longest Streak as a discrete blue pill to the view too.

This is all the core components. The last thing we need to do is sort the list. I wasn’t entirely sure how it had been sorted, apart from the largest Longest Streak at the top. I created a new field for this

Sort

[Longest Streak]*-1

and added this as a blue discrete pill in front of Sub-Category….

…, then hid the column.

Then just apply the tooltip and relevant formatting on the chart.

For the legend, I created a new field

Legend

CASE [Sub-Category]
WHEN 'Art' THEN 0
WHEN 'Chairs' THEN 1
WHEN 'Labels' THEN 2
WHEN 'Paper' THEN 3
WHEN 'Phones' THEN 4
ELSE 5 END

and added this into a new sheet as below

The components then just need to be added to the dashboard. My published version is here.

Happy vizzin’! Stay Safe!

Donna

Can you compare a 3-day vs 14-day moving average and describe the latest trend?

This week for #WOW2020, Ann provided a table calculation feast of a challenge! This certainly is not for the faint-hearted! As well as cracking all the table calcs, the challenge features multiple views, measure swapping, parameters, BANs, filtering, sorting …. it’s got it all going on!

Ann hinted you’d probably want to start with the table, and even if there hadn’t been a table output in the display, this is what I would have done. If you’ve read enough of my blogs, you’ll know I often like to build up a ‘check data’ sheet, which just contains the data I need in tabular form as a quick reference. When working with table calculations this is an absolute must have!

So let’s build out that Check Data table to start with. I have a feeling this is going to be a lengthy blog 🙂

Initial Set up

First up, the requirements stated that the latest date would be 7 June, but I found records with a 8 June date. All the associated info for this date was null though, so I set a data source filter to exclude this. This means I wouldn’t get any issues if I needed to store the max date in a FIXED LoD calculation at any point.

I also found it easier to rename a couple of the measures provided to match the output, so rename PEOPLE_POSITIVE_NEW_CASES_COUNT to New Cases and PEOPLE_POSITIVE_CASES_COUNT to Reported Cases. I’ll refer to these renamed fields going forward.

Building all the Calculated Fields

To build out the table, we’re just going to focus on one State & County, as there’s a lot of data. So add Province State Name = Tennessee and County = Davidson to the Filter shelf.

Add Report Date (discrete exact date – blue pill) and New Cases & Reported Cases to Rows. As you scroll down, you’ll see data starting to come in on 8 March.

We want to create our moving average calculations

3 Day Moving Avg

WINDOW_AVG(SUM([New Cases]), -2, 0)

14 Day Moving Avg

WINDOW_AVG(SUM([New Cases]), -13, 0)

Notice the number of rows to average over is 1 less than you might expect, as the current row is included, so the calculation is saying ‘current row’ and 2 | 13 previous rows.

Add these to the table, and adjust the table calculation so it is explicitly calculating by Report Date. This would have happened automatically, as the calculation would have been computing ‘down’ the table, but it’s best to fix the computation, so it doesn’t matter where the pill gets moved to in the view.
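The ‘current row plus the 2 (or 13) before it’ idea can be sketched in Python like this. As with WINDOW_AVG, the first few rows simply average over however many rows are available so far. The function name and case counts are invented for illustration.

```python
def moving_average(values, window):
    """Average of the current value and the (window - 1) values before it."""
    averages = []
    for i in range(len(values)):
        start = max(0, i - (window - 1))  # eg window=3 looks back 2 rows
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


new_cases = [3, 6, 9, 12]
print(moving_average(new_cases, 3))   # [3.0, 4.5, 6.0, 9.0]
print(moving_average(new_cases, 14))  # [3.0, 4.5, 6.0, 7.5]
```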

We now need to work out whether there is an increase or not between the 3-day and 14-day average.

Is Increase?

IF [3 Day Moving Avg] > [14 Day Moving Avg] THEN 1 ELSE 0 END

Is Decrease?

IF [3 Day Moving Avg] <= [14 Day Moving Avg] THEN 1 ELSE 0 END

I’m using 1s and 0s as it’s going to help with a later calculation.

NOTE – I’m assuming that if there is ‘no change’ it’ll be recorded as a decrease. This is how I interpreted the requirement, “ …whether it is an increase or a decrease (or no change)” and it wasn’t easy to find any matches anyway.

I also need some text to indicate the increase or decrease

Increase | Decrease

UPPER(IF [Is Increase?]=1 THEN ‘Increase’ ELSE ‘Decrease’ END)

The UPPER is used as that’s part of the tooltip formatting.

Let’s get these onto the view, always making sure the table calculations are set to Report Date.

We need to calculate the number of successive days on which an INCREASE has been reported, and the number of successive days on which a DECREASE has been reported.

So first, let’s identify which rows match the previous row.

Match Prev Value?

LOOKUP([Is Increase?],-1) = [Is Increase?]

If the value of the Is Increase? field in the previous (-1) row is the same as the Is Increase? field in the current row, then this is true, else false.

Add to the view, and verify the table calculation for itself and all nested calculations being referenced, is set to Report Date.

We now have all the information we need to help us work out the number of days in the increase/decrease ‘trend’.

Days in Trend

IF (FIRST()=0) OR (NOT([Match Prev Value?])) THEN 1
ELSEIF [Increase | Decrease] = 'INCREASE' THEN ([Is Increase?]+PREVIOUS_VALUE([Is Increase?]))
ELSEIF [Increase | Decrease] = 'DECREASE' THEN ([Is Decrease?]+PREVIOUS_VALUE([Is Decrease?]))
END

If the row in the table is the very first entry (so there’s nothing previous to compare against), or the row in the table didn’t match its predecessor (ie there was a change), then we’re starting a new ‘trend run’, which starts at 1.

Otherwise, if the current row indicates an increase, we add the value of the Is Increase? field (which is 1) to PREVIOUS_VALUE. PREVIOUS_VALUE returns the result of this same calculation from the previous row (the expression passed to it is only used when there is no previous row), so it works recursively and builds up a running count, which gives our trend.

We do the same thing using the Is Decrease? column. This is why using 1s and 0s in the earlier calculation helps.
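To make the recursion a bit more concrete, here is a rough Python sketch of the same run-length logic. It is only an illustration: days_in_trend is a name I have made up, and the input is assumed to be the list of Is Increase? flags already ordered by Report Date.

```python
def days_in_trend(is_increase_flags):
    """Return the length of the current run at each row of a list of 1/0 flags."""
    trend = []
    for i, flag in enumerate(is_increase_flags):
        first_row = i == 0  # equivalent of FIRST() = 0
        # Equivalent of Match Prev Value?
        matches_prev = (not first_row) and is_increase_flags[i - 1] == flag
        if first_row or not matches_prev:
            trend.append(1)  # a new run starts here
        else:
            # The row's own flag for its direction is 1, added to the
            # previous result of this same calculation.
            trend.append(1 + trend[-1])
    return trend
```

For example, days_in_trend([1, 1, 1, 0, 0, 1]) gives [1, 2, 3, 1, 2, 1]: three days of increase, two of decrease, then a new increase run.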

Adding into the view, and setting the table calculation correctly, you should get something similar to this…

Finally, there’s one key field we need to add; something to help identify the latest row, as we will need it for filtering in the table that’s displayed on the dashboard. Simply applying a standard ‘quick filter’ won’t work, as the table needs to show the 3-day & 14-day moving averages. A ‘quick filter’ to limit the data to the latest date (7th June) will show the wrong values: the rows for the other days are filtered out before the table calculations run, so the calculations no longer have the data they need to compute over.

We need to create another table calculation that we can use as a filter, and that due to Tableau’s ‘order of operations’ will apply later in the filtering process than a traditional quick filter.

Max Date

{FIXED : MAX([Report Date])}

The latest date in the whole data set.

Show Data for Latest Date

LOOKUP(MIN([Report Date]),0) = MIN([Max Date])

If the Report Date of the current row is the same as the maximum date in the whole data set, then return true.
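The difference between the two kinds of filter is easier to see with numbers. The Python sketch below is an assumption-laden illustration: the case counts are made up, and trailing_avg is a hypothetical stand-in for a moving average over the current row and the rows before it.

```python
def trailing_avg(values, window):
    """Trailing mean over up to `window` rows ending at each row."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

new_cases = [10, 20, 30, 40]  # made-up values, ordered by Report Date

# Quick filter: rows are removed first, so the average only sees the latest row.
filtered_first = trailing_avg(new_cases[-1:], 3)[-1]

# Table calculation filter: the average is computed over every row,
# and only then is the latest row kept.
computed_first = trailing_avg(new_cases, 3)[-1]
```

Here filtered_first is 40.0 while computed_first is 30.0, which is the value we actually want to display.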

We’ve now got all the core data components we need to create the various charts.

In the interest of time (my time in writing this out), I’m going to attempt not to describe the building of all the charts in too much detail, but just call out the useful bits you might need. If you’re attempting this challenge with the table calcs above, I’m assuming you know Tableau enough to not need everything defined to the lowest level.

The whole report is driven off a parameter, into which the user must enter a State – County combo.

You’ll need a calculated field to store the combo

State – County

[Province State Name] + ' - ' + [County]

and then create a parameter (State – County Parameter) off of this (right click, Create -> Parameter) which will create a string parameter with all the permutations.

When displaying on the dashboard, set this to be of type Type In

BAN

The BAN is a basic summary of the latest trend for the entered state county.

We need to filter the sheet to the value entered in the parameter

Is Selected State County?

[State – County Parameter] = [State – County]

Add this to the Filter shelf as true, along with the Show Data for Latest Date.

Add the relevant fields to the Text shelf to display the required text. The Report Date needs to be custom formatted to ddd, mmm d to get the Sun, Jun 7 display

Map

For the map, as well as filtering the latest date, we’re also going to need to filter just to the state only (not state & county) as above. So I created

Is Selected State?

LEFT([State – County Parameter], FIND([State – County Parameter],'-')-2) = [Province State Name]

This is unpicking the State – County combined string stored in the parameter, to just find the State part and compare to the Province State Name.
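As a quick sanity check of the string arithmetic, here is a small Python sketch of the same idea. It assumes the state and county are separated by a space, a plain hyphen and a space, which is what the FIND call above looks for; selected_state is a made-up name.

```python
def selected_state(state_county):
    """Return the State part of a 'State - County' string."""
    # Tableau's FIND is 1-based while Python's str.find is 0-based, hence the + 1.
    pos = state_county.find("-") + 1
    # LEFT(..., FIND(...) - 2) keeps everything before the space and the hyphen.
    return state_county[: pos - 2]
```

So selected_state("New York - Kings") returns "New York", which can then be compared to Province State Name.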

Build a filled map based on County and filter to the latest date and the selected state. I set the Map Layers to that below, which seems to match up

You’ll need to add both the Is Selected State County? and Increase | Decrease fields to the Colour shelf.

Bar & Line Chart

You’re going to need a few more calculated fields for this.

Moving Avg Selector

This is a parameter for the user to choose what the line should display. I’ve set it to an integer parameter that displays text

We then need a field to show on the display depending on what’s been selected in the parameter

Moving Avg to Display

IF [Moving Avg Selector] = 3 THEN [3 Day Moving Avg] ELSE [14 Day Moving Avg] END

You’ll need a Dual Axis chart plotting New Cases and Moving Avg to Display against Report Date (continuous exact date)

The data only starts from 8th March, so I added Report Date to filter to start from 8th March. 8th March is also added as a constant reference line.

Table

Based on the State / County entered, the table is filtered to show the data for the latest date for all the counties in the state entered. Although not stated in the requirements, the first row is the county selected, with the rest ordered by Reported Cases.

You can get the selected county to the top by adding Is Selected State County? as a hidden field on Rows, and moving ‘True’ to the top.
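The ordering amounts to a two-level sort: selected county first, then everything else by Reported Cases. A minimal Python sketch is below. The rows are made up, order_table is a hypothetical name, and I am assuming the remaining counties are sorted largest first, which the requirements don’t state explicitly.

```python
def order_table(rows, selected_county):
    """Put the selected county first, then sort the rest by reported cases (largest first)."""
    return sorted(
        rows,
        # False sorts before True, so the selected county comes out on top.
        key=lambda r: (r["county"] != selected_county, -r["reported_cases"]),
    )

# Made-up rows for illustration only.
rows = [
    {"county": "Alpha", "reported_cases": 120},
    {"county": "Beta", "reported_cases": 300},
    {"county": "Gamma", "reported_cases": 45},
]
ordered = [r["county"] for r in order_table(rows, "Gamma")]
```

With Gamma selected, the order comes out as Gamma, Beta, Alpha.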

And that should be everything you need to build the dashboard, which is pretty much just stacking all the sheets one on top of each other in a single column.

My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you create a concatenated list of values?

The challenge for this week’s #WOW was set by Sean Miller, with a focus on creating a more ‘traditional’ list style report. The challenge was related to the structure of the data, and using Tableau to take multiple rows of data for a patient, and present as a single row, with some of the information concatenated into a string of comma separated values.

Whilst there are other tools & methods that could be used to shape the data before it hits Tableau, which may be more appropriate/performant for a business use, this is after all a Tableau challenge.

I figured out what I was going to need pretty quickly – a table calculation to compare rows to build up the string. However I did struggle with the volume of data being used. It caused Tableau to keep crashing initially, so I had to ping Sean to sense check I was on the right track, and there wasn’t something I was missing. Sean confirmed I was doing the right thing, so I decided to start building against a much smaller set of data. I restricted the data by adding a data source filter, and using a wildcard filter against Member Name starts with ‘a’.

So onto the build.

Building up the concatenated string

Add Member Name and Health Check Name to Rows to get the set of rows we need to work with.

What we’re aiming for is to get a concatenated string of each Health Check Name for each Member Name, on a row.

We’re going to achieve this by building up a string that combines the Health Check Name for the current row, with the combined list from the previous row.

This requires the use of a couple of Table Calculations. The first one is Index(). This can be used to number the rows in a table. I typically create a calculated field to store this.

Index

INDEX()

Set it to be discrete rather than continuous, and add the field to the Rows.

By default, the index is basically displaying a number per row, starting from 1 to however many rows are being displayed. This is because the table calculation has been set to compute Table Down.

We actually want the calculation to restart at 1 when the Member Name changes, ie for each Member Name.

Change this by clicking the ‘caret’ / arrow on the Index pill and setting Compute Using -> Health Check Name

Next we need to create a new calculated field that will build up the string

Health Check Name List

IF [Index]=1 THEN ATTR([Health Check Name])
ELSE PREVIOUS_VALUE(ATTR([Health Check Name])) + ', ' + ATTR([Health Check Name])
END

If we’re on the first row, then store the name from the current row; otherwise take the list built up as of the previous row and append the current name, separated by a comma and a space.

Add this field to the Text shelf, and verify the table calculation is set to Compute by Health Check Name

You can see how the list is building up for each row, so by the time you get to the last row for the Member Name you’ve got the complete list.
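For anyone who finds it easier to follow as a loop, here is a rough Python sketch of the same build-up. The names and sample rows are hypothetical, and the input is assumed to be sorted by Member Name so that each member’s rows sit together, like a partition in the table.

```python
def build_lists(rows):
    """rows: (member_name, health_check_name) pairs, sorted by member name."""
    out = []
    prev_member, running = None, ""
    for member, check in rows:
        if member != prev_member:
            running = check                   # Index = 1: first row for this member
        else:
            running = running + ", " + check  # previous list + ', ' + current name
        out.append((member, running))
        prev_member = member
    return out

# Made-up rows for illustration only.
sample = [("Ann", "Eye Test"), ("Ann", "Flu Jab"), ("Bob", "Eye Test")]
lists = build_lists(sample)
```

The last entry for each member in lists holds the complete list, for example ("Ann", "Eye Test, Flu Jab").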

Restrict to 1 row per Member Name

The last row for each Member Name is ultimately what we want. To identify this, I need another table calculation

Size

SIZE()

This does exactly what it ‘says on the tin’; returns the number of rows being displayed, and like INDEX() it can be computed over different partitions in the table.

Change it to be discrete and add to Rows, once again setting it to Compute By -> Health Check Name

Size is displaying the total number of rows associated to each Member Name.

To reduce the data to 1 row per Member Name, we just need the following:

Index = Size?

[Index] = [Size]

This returns true if the values match, so add this to the Filter shelf and set to True, and hey presto! 1 row per Member Name.
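The same ‘keep the last row of each partition’ idea can be sketched in Python as below. This is only an illustration with made-up names and data; the rows are assumed to be sorted by Member Name, as they are in the table.

```python
from itertools import groupby

def last_row_per_member(rows):
    """rows: (member_name, running_list) pairs sorted by member. Keep rows where index equals size."""
    kept = []
    for _member, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        size = len(group)                       # like SIZE() within the partition
        for index, row in enumerate(group, 1):  # like INDEX() within the partition
            if index == size:
                kept.append(row)
    return kept

# Made-up rows for illustration only.
running = [
    ("Ann", "Eye Test"),
    ("Ann", "Eye Test, Flu Jab"),
    ("Bob", "Eye Test"),
]
final_rows = last_row_per_member(running)
```

Only the row holding the complete list survives for each member.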

You can now start adding all the other fields to the output and remove Index & Size from the rows.

You will need to ensure Health Check Name remains on the canvas though, as otherwise the table calculations will break. You can either hide it from displaying as a column, by unchecking Show Header, or move it to the Detail shelf.

To ‘label’ the concatenated string field, so it ‘looks’ like just another regular column, I type the text directly into the Columns shelf

This gives us a ‘double’ column heading, and we can simply remove one

Min # Healthchecks filter

This is achieved by adding Size to the Filter shelf, but to get a range slider, the field needs to be a green continuous pill rather than blue.

List Must Contain filter

Right click on the Health Check Name field and Create -> Parameter

The Edit Parameter dialog will open with the list of distinct values pre-populated. Add the value ‘All’ to the bottom.

This parameter will be presented on the dashboard to allow user selection. But as it stands it won’t do anything. We need another field to read in the value selected in the parameter and filter the rows.

FILTER: Health Check Names

CONTAINS([Health Check Name List],[Health Check Name Parameter]) OR [Health Check Name Parameter]='All'

This returns True if ‘All’ is selected or the concatenated list we built, contains the value from the parameter.

Add this to the Filter shelf, and set to True.
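The filter boils down to a substring test with an ‘All’ escape hatch. A tiny Python sketch, with a made-up function name and sample values:

```python
def keep_row(check_list, selected):
    """Keep the row if 'All' is selected or the concatenated list contains the selected name."""
    return selected == "All" or selected in check_list
```

Because this is a plain substring match (as CONTAINS is too), a selected name that happens to be part of a longer health check name would also match, which may be worth bearing in mind depending on the values in the data.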

Note: throughout this challenge, all the table calculations must be set to compute by Health Check Name. This includes nested calculations.

It’s worth double checking, and if you do alter them, recheck the filter value is selected to True and hasn’t reset itself.

And that’s the core of the tricky stuff all sorted. The additional filters need adding for Physician & Age Bracket, formatting applied and then all displayed nicely on a dashboard.

Once I’d got all this I saved my work, and then took off my data source filter, keeping my fingers crossed it didn’t all grind to a halt… it didn’t, but it wasn’t the quickest either.

My published solution is here.

Apparently someone completed the challenge with LODs… I’m off to investigate further!

Happy vizzin’!

Donna