This week’s challenge was set by Sean and was a recreation of a PowerBI WOW challenge set a few weeks ago. The data set just contains 2 pieces of information – a datetime field and a glucose measurement.
Creating the calculations
On a new sheet, add Device Timestamp to Rows as a discrete (blue) pill at the hour level.
Then create the following fields, which are aggregations of the glucose levels at each hourly time period.
Minimum
MIN([Glucose mg/dL])
5th Percentile
PERCENTILE([Glucose mg/dL],0.05)
25th Percentile
PERCENTILE([Glucose mg/dL],0.25)
Median
PERCENTILE([Glucose mg/dL],0.5)
75th percentile
PERCENTILE([Glucose mg/dL],0.75)
95th percentile
PERCENTILE([Glucose mg/dL],0.95)
Maximum
MAX([Glucose mg/dL])
Add all these fields to the table and then format the Hour field to use the 12hr format by default
Building the Viz
On a new sheet, add Device Timestamp as a continuous (green) pill at the hour level to Columns. Then add Median to Rows. Set the mark type to line and set the path to be a dashed line and colour black. Format the Device Timestamp pill so the default format on the pane is 12hr format (axis should remain as numeric).
Add Minimum to Rows to create a second marks card. Change the mark type to Area and set the colour to be 100% opacity.
Then drag 5th Percentile to the Minimum axis and drop when the ‘2 green column’ symbol appears.
This will automatically add Measure Values to Rows instead and Measure Names to Colour. Add 25th Percentile, 75th Percentile, 95th Percentile and Maximum to the axis too. Reorder the fields in the Measure Values section so the fields are listed in the appropriate order from Min to Max. Adjust the Colours accordingly. Unstack the marks (Analysis -> Stack Marks -> Off).
Add Measure Names to Label and align middle right. Allow labels to overlap marks.
Add Measure Names to the Label on the Median marks card. Also align middle right and format to be underlined.
Make the chart dual axis, synchronise the axis, and then right click on the right hand axis and move marks to back.
Hide the right hand axis (uncheck show header). Edit the Device Timestamp axis to fix from 0 to 26, set the axis to display a value every 3 hours and remove the title.
Add constant Reference Lines to the left hand axis for the values 80 and 180
Edit the left hand axis title. Remove row and column dividers. Add axis rulers. Add a title.
Add to a dashboard and then publish. My published version is here.
Note – the vertical grid lines that appear on the Tableau Public version are a nuance of Tableau Public. I originally recreated on Desktop thinking it was a requirement of the challenge, by adding multiple constant reference lines for each hour. But when I checked Sean’s solution after publishing, I found he didn’t have any and the intention was not to have any.
Erica Hughes set her first #WOW challenge this week, which is all about reference lines. The intention by the #WOW crew is to use this initial challenge as the basis for the next few weeks, so it’s going to be interesting to see how that works out.
But back to this week. We’re building a basic Sales vs Profit Ratio scatter plot by State, but then need various calculated fields to plot the reference lines and the reference box.
Building the basic chart
Adding the dotted reference lines
Adding the reference box
Making the y-axis symmetrical
Highlighting the selected state
Building the basic chart
First up we need to create our favourite calculated field
Profit Ratio
SUM([Profit]) / SUM( [Sales])
format to percentage with 0 dp.
Create the basic scatter plot as below
Remove all gridlines, zero lines and axis rulers and tick marks.
Adding the dotted reference lines
These lines are the median values for the Profit Ratio and Sales.
Sales – Median
WINDOW_MEDIAN(SUM([Sales]))
format to Custom Currency with 1 dp, $ prefix and display units to Thousands (K).
PR- Median
WINDOW_MEDIAN([Profit Ratio])
format to percentage with 0 dp.
Add both these fields to the Detail shelf, and adjust the table calculation settings of both fields so they are computing by State
Add a reference line to the Sales axis (I do this by right clicking on the axis > Add Reference Line, but you can also drag from the Analytics pane).
Set the reference line to be against the Entire Table, using the Sales – Median field, no label, and the tooltip set to custom as below. Set the line to be dotted.
Apply a similar reference line to the Profit Ratio axis, using the PR – Median value instead.
Adding the reference box
I have to admit, this bit took a bit of trial and error. I knew it was going to involve a combination of bands and lines and colouring above and below, but things didn’t always go to plan. It was this post by Jonathan Allenby that helped.
The block is basically bounded by the 25th & 75th percentile of the Sales and Profit Ratio values. So lets create them
Sales – 25th Percentile
WINDOW_PERCENTILE(SUM([Sales]),0.25)
Sales – 75th Percentile
WINDOW_PERCENTILE(SUM([Sales]),0.75)
Add these fields to the Detail shelf and set the table calculation on both to compute by State.
Add a reference line again to the Sales axis, and this time select Band that goes from the Sales – 25th Percentile to the Sales – 75th Percentile and fill based on the colour stated in the requirements.
Now add a reference line to the Profit Ratio axis. This time select Distribution.
Select the computation to be Percentiles and enter 25 as the value. Set Label, Tooltip and Line to None. Now this is the odd bit… set the Fill to the colour white, then check the Fill Below box, which then seems to set the Fill option to ‘No Fill’. Doing this seems to have the desired effect. Trying this in any other order, or manually selecting the Fill option to be No Fill, just doesn’t seem to work…
Now repeat the process adding another reference line on the Profit Ratio axis, and this time create a distribution for the 75th percentile. This time, after setting the Fill to white, check the Fill Above checkbox. Weirdly this time, the fill on my laptop got set the Grey Light, although it was correctly displaying as white on the screen. I manually changed it to ‘No Fill’.
It feels like there’s something a bit ‘buggy’ with all this, which might explain while my initial attempts were failing. I’ll be interested in knowing if you see this behaviour too (I created on v 2021.4.2).
Making the y-axis symmetrical
This was a ‘bonus’ feature, but is again achieved with a reference line. What we’re looking for here is to understand what is the maximum absolute value so we can determine where to place a hidden reference line – ie if the maximum profit ratio is 20% and the minimum is -30%, the maximum absolute value is 30%, and we’d need a reference line +30%, so the axis extends from -30% to +30%. Conversely if the maximum profit ratio is 40% and the minimum is -25%, the maximum absolute value is 40% and we need a reference line at -40% for symmetry.
I’ve encapsulated all this logic within one field below
Ref Line – Profit Ratio Symmetry
IF MAX(WINDOW_MAX([Profit Ratio]), ABS(WINDOW_MIN([Profit Ratio]))) = WINDOW_MAX([Profit Ratio]) THEN -1*WINDOW_MAX([Profit Ratio]) ELSE -1*WINDOW_MIN([Profit Ratio]) END
The statement MAX(WINDOW_MAX([Profit Ratio]), ABS(WINDOW_MIN([Profit Ratio]))) is returning the maximum of the max and min values (ie MAX(20, ABS(-30)) = MAX(20,30) = 30, MAX(40, ABS(-25)) = MAX(40,25) = 40.
Add this onto the Detail shelf and add another reference line to the Profit Ratio axis, setting it to use the Minimum of the Ref Line – Profit Ratio Symmetry field
Test how the field works, by selecting a few of the State marks towards the top and exclude; the axis should change, but still be symmetrical.
Highlighting the selected state
For this we need a parameter
pSelectedState
string parameter defaulted to Virginia
Then we need a field to identify which state has been captured
Selected State
[State]=[pSelectedState]
which returns a boolean. Change the mark type to Circle and add Selected State to the Colour shelf, and adjust accordingly (add a border to the circle too via the Colour shelf). Also add Selected State to the Size shelf and again tweak.
Once the sheet has been added to the dashboard, create a dashboard parameter action that passes the State field into the pSelectedState parameter on Select of the chart.
The final step to add is to stop all the other marks from ‘fading’ out when a State mark is selected. This is achieved by creating the following calculated fields
True
True
False
False
Add both of these to the Detail shelf of the scatter chart.
On the dashboard, create a dashboard filter action as below, which passes selected fields setting True = False, which can never be true and prevents the mark from being highlighted.
And with that, you should have the desired dashboard. I’m interested to see if this matches Erica’s solution (it’s likely I’ll start with the provided workbook next week, rather than mine, just in case there are some discrepancies – eg I’ve used a lot of table calcs… LoDs may have been possible…)
Sean Miller provided the challenge for this week, resurrecting a challenge originally set by Emma Whyte in 2017. Revisiting these older challenges is great fun, as often newer product features provide a different way of solving. For me, I also like the fact I know I’ve already solved it once, and have my own work to reference if I get stuck – ha ha!
Sean hinted that this wasn’t a challenge to ‘overthink’ – no table calcs or LoDs required. You need to be able to display average responses per question per university alongside the overall average response for the question. Simply filtering by university isn’t going to cut it, as the quick filter will immediately eliminate all the data that isn’t associated to the selected university, which means you can’t compute an ‘overall average’ without using LoDs.
The key to this challenge is to use a parameter to drive the University selection. Create this by right clicking on the University field -> Create > Parameter. This will create the parameter dialog box, prepopulated with all the university values. Set the default to University of Liverpool.
pUniversity
With this, we can now create calculated fields to store the values associated to the selected university only.
Note 1, there is already field called Sample Size in the data set. The actual name of this field is <space>Sample Size<space> which Tableau sees as a different name. In hindsight I should have just renamed the original field, so I could then have ‘Sample Size‘. Be mindful of this when I refer to the field later; unless I call it out, I’m referring to my version.
Note 2, I chose to apply the AVG aggregation within the calc rather than changing the default aggregation on the pill when added to the view. There was a reason I did this, but I can’t recall what it was, and think it wasn’t necessary in the end….
University Avg
AVG(IF [University] = [pUniversity ] THEN [% Agree] END)
formatted to percentage, 1 dp
We can also then define the overall average for comparison
Overall Avg
AVG([% Agree])
formatted to percentage, 1 dp
and with that can calculate the variance between the two
Delta
([University Avg]) – ([Overall Avg])
This is custom formatted to 0.00%▲;0.00%▼ (I use this site to get the arrow characters)
And then we need a field to define how the mark needs to be coloured
Colour
[Delta]>=0
We can put these all out in a view
Question Number (which I renamed to No) on Rows
Question Text on Rows
Sample Size on Rows (set to be a discrete blue pill)
University Avg on Rows (set to be discrete)
Overall Avg on Rows (set to be discrete)
Delta on Rows (set to be discrete)
University Avg on Columns (continuous green pill)
Change Mark Type to Circle
Add Colour to the Colour shelf and adjust
Now we need to work on adding the various lines and bands on the chart. This is all managed by adding reference lines (or bands).
Drag % Agree to the Detail shelf, and change to be AVG.
The drag the same field % Agree on Detail again, this time change to MIN. Repeat again, and change to MAX.
Right click on the University Avg axis > Add Reference Line. Create a line, per pane using the AVG(% Agree) field.
Add another reference line (by right clicking on the axis again). This time create a band that starts at MIN(% Agree) and ends at MAX(% Agree). Set the Fill colour to light grey.
We need to create some new fields for the quartile values.
Lower Quartile
PERCENTILE([% Agree],0.25)
Upper Quartile
PERCENTILE([% Agree],0.75)
Add both these fields to the Detail shelf again, then add another reference line (band) similar to that above, but referencing the quartile fields. Set the Fill colour to be a darker grey.
Adjust the formatting and set the tooltips and you’ve got the main chart…. well almost…
In the solution, the first column, the question no, is not labelled. I couldn’t figure out how to do this, which is why I relabelled to simply No. I tried various things, including using text boxes as column headings on the dashboard, but the layout just didn’t work.
BUT I’ve now found out how to do it… because I googled, and I didn’t yesterday when I was building 😦 Andy Kriebel explains it all here. He searches for a ‘zero width space’ character on this site , and then copies the resulting ‘character image’ displayed and pastes into the label of a calculated field. Watch the video to see it in action, but I’ve noted the steps here, just as much for my own benefit when I can’t remember what to do in future… I can see this type of feature cropping up often 🙂
The legend utilises a lot of the concepts above, but we don’t what the mark changing with each university selection. So let’s just hardcode
Legend Avg
AVG(IF [University] = ‘Middlesex University’ THEN [% Agree] END)
and we’ll need a dedicated field for the colour
Legend Colour
[Legend Avg] – [Overall Avg] >=0
The legend sheet can then be built just by plotting the Legend Avg pill on the Columns shelf, with a mark type of circle, the Legend Colour on the Colour shelf, and the same pills used in the reference lines above on the Detail shelf.
When adding the reference lines and bands this time, you will need to add labels and format their position.
The quartile band, also has dotted lines indicating the end of the band, which you can apply as part of the band properties. However the quartile band also has one label left aligned, while the other is right aligned. For a single reference band, the labels can either both be formatted left aligned or both right aligned. To resolve this, don’t add a label to the ‘band to’ section. Create another reference line for the Upper Quartile value, and you can then format the label of this independently.
My published via associated to this challenge is here.
My version based on the original challenge from 2017 is here. The requirement was a bit more complicated it would seem, and it looks like I utilised FIXED LODs quite heavily.
I’ve been on annual leave for a few weeks. I’ve managed to catch up on all the challenges but haven’t blogged a solution for a while. It’s been a real struggle to get back into things to be honest – back to work, back to school, back to football clubs for my kids, and I’m wondering where I found the time before 😦
Trying to think about when/if I was going to get this blog out has caused me a bit of stress, which I don’t need, and so I need to change my mindset a bit…try to relax a bit … if I don’t manage to post a blog, just accept it and move on. I’ve also got to try to reduce the time it takes me to blog. I’m a ‘detail’ person, so often end up documenting to such a low level, but for my own sanity, I’m going to have to make an effort to be a bit briefer. I can’t guarantee I’ll stick to this though.. will just have to see how things pan out.
Going forward, I’m going to try to focus on the points that I think are key to the challenge, or those I found a bit tricksy.
So onto this week’s challenge. Sean Miller returned as guest poster with a COVID-19 related distance challenge.
Note – I connected to the provided hyper extract file rather than the csv file.
The core points I’m going to discuss are
Identify counties within n miles
How to select a county without set actions
How to handle ‘All’ counties being selected
Colouring based on ‘percentile’ of hospital beds
How to only show county borders of counties within selected range
Identify counties within n miles
The number of miles is stored within an integer parameter, n miles, that is defaulted to 100.
A county to act as the ‘start point’ is stored in a set, Selected County, based on the County, State field.
The long/lat coordinates of this selected county need to be captured.
Selected County Lat
{FIXED : AVG(IF [Selected County] THEN [Latitude] END)}
Selected County Long
{FIXED : AVG(IF [Selected County] THEN [Longitude] END)}
By using a FIXED LoD calculation, the values are stored against every ‘row’ in the data set.
With these, the starting point can be determined
Selected County Start Point
MAKEPOINT([Selected County Lat],[Selected County Long])
The position of every other county – the ‘destination’ / end point – also needs to be determined
End Point
MAKEPOINT([Latitude],[Longitude])
With these, the distance between can be computed
Distance
DISTANCE([Selected County Start Point],[End Point],’miles’)
And the counties can then be restricted by
Within n miles
[Distance]<= [n miles]
which can be used as a filter and set to True.
How to select a county without set actions
This is managed via the set control feature; right click on the Selected County set and choose Show Set to display the list of counties with the option to select which ones are in or out of the set. Change the display of the control via the dropdown arrow (top right) to be Single value dropdown which automatically provides as ‘All’ option or the ability to select a single set only.
How to handle ‘All’ counties being selected
When ‘All’ is chosen via the Set Control selector, this has the effect of adding all the State, County values into the set, which means we don’t really have a starting state. So the n miles parameter is essentially redundant. But we need to make the Within n miles filter understand this.
We can manage this by first identifying how many values we have in the set
Count Selected States
{FIXED : COUNTD(IF [Selected County] THEN [County, State] END)}
There will either be 1 or if All is selected, then they’ll be 3000+, and we use the FIXED LoD, so the total is stored against all rows.
We can then update our Within n miles filter to consider this value
Within n miles
([Count Selected States]=1 AND [Distance]<= [n miles]) OR [Count Selected States] > 1
This now returns true if either one State, County is selected and the other records are within n miles OR all the states are selected (the count is > 1).
Colouring based on ‘percentile’ of hospital beds
This caused me a little bit of headscratching. I assumed ‘percentile’ would be based on a percentage of the total num_staffed_beds (note I simply renamed this field to Hospital Beds), associated to the state,counties being displayed (ie within n miles). But after building the calculation I thought, and adding it to colour and choosing the Green-Gold colour range, I didn’t get the same spread as shown in the solution.
I messaged Sean to question whether I’d made the right assumption, but while waiting for a response, I did an online search for “Tableau Percentile Rank”, and quickly spotted that a Table Calculation exists. As you can guess, I don’t use this particular calculation very much at all 🙂
So to colour, add Hospital Beds , and simply add a Percentile table calculation
How to only show county borders of counties within selected range
When working with Maps, you can use the options within the Map Layers menu to select which features of the map should be displayed. One of these options is State/Province Borders, which you might think is what is needed.
However, this will show all the state borders, which is evident if you zoom out a bit, including those for states that don’t have counties within n miles.
This isn’t the requirement – we only want to show the borders of the states which have counties within the desired range. So instead, we don’t want to show any State/Province Borders via the Map Layers. And we’ll utilise a dual axis instead, by first identifying the states that are within n miles
States within n miles
If [Within n miles] THEN [State] END
Duplicate the Latitude (generated) field on the Rows, remove all the fields on that mark, and then add the States within n miles to the Detail shelf. Set the colour of this mark to white, and swap the Latitude (generated) fields if needed, so the States are on the top, as below.
Then make dual axis 🙂
And that’s the key points that I hope will help you solve this challenge. My published version is here, which is available to be downloaded. If there’s anything I haven’t covered that you’re not sure about, feel free to contact me.