Can you make Spine Charts?

Sean Miller provided the challenge for this week, resurrecting a challenge originally set by Emma Whyte in 2017. Revisiting these older challenges is great fun, as often newer product features provide a different way of solving. For me, I also like the fact I know I’ve already solved it once, and have my own work to reference if I get stuck – ha ha!

Sean hinted that this wasn’t a challenge to ‘overthink’ – no table calcs or LoDs required. You need to be able to display average responses per question per university alongside the overall average response for the question. Simply filtering by university isn’t going to cut it, as the quick filter will immediately eliminate all the data that isn’t associated to the selected university, which means you can’t compute an ‘overall average’ without using LoDs.

The key to this challenge is to use a parameter to drive the University selection. Create this by right clicking on the University field -> Create > Parameter. This will create the parameter dialog box, prepopulated with all the university values. Set the default to University of Liverpool.

pUniversity

With this, we can now create calculated fields to store the values associated to the selected university only.

Sample Size

AVG(IF [University]=[pUniversity ]THEN [Sample Size] END)

Note 1, there is already field called Sample Size in the data set. The actual name of this field is <space>Sample Size<space> which Tableau sees as a different name. In hindsight I should have just renamed the original field, so I could then have ‘Sample Size‘. Be mindful of this when I refer to the field later; unless I call it out, I’m referring to my version.

Note 2, I chose to apply the AVG aggregation within the calc rather than changing the default aggregation on the pill when added to the view. There was a reason I did this, but I can’t recall what it was, and think it wasn’t necessary in the end….

University Avg

AVG(IF [University] = [pUniversity ] THEN [% Agree] END)

formatted to percentage, 1 dp

We can also then define the overall average for comparison

Overall Avg

AVG([% Agree])

formatted to percentage, 1 dp

and with that can calculate the variance between the two

Delta

([University Avg]) – ([Overall Avg])

This is custom formatted to 0.00%▲;0.00%▼ (I use this site to get the arrow characters)

And then we need a field to define how the mark needs to be coloured

Colour

[Delta]>=0

We can put these all out in a view

  • Question Number (which I renamed to No) on Rows
  • Question Text on Rows
  • Sample Size on Rows (set to be a discrete blue pill)
  • University Avg on Rows (set to be discrete)
  • Overall Avg on Rows (set to be discrete)
  • Delta on Rows (set to be discrete)
  • University Avg on Columns (continuous green pill)
  • Change Mark Type to Circle
  • Add Colour to the Colour shelf and adjust

Now we need to work on adding the various lines and bands on the chart. This is all managed by adding reference lines (or bands).

Drag % Agree to the Detail shelf, and change to be AVG.

The drag the same field % Agree on Detail again, this time change to MIN. Repeat again, and change to MAX.

Right click on the University Avg axis > Add Reference Line. Create a line, per pane using the AVG(% Agree) field.

Add another reference line (by right clicking on the axis again). This time create a band that starts at MIN(% Agree) and ends at MAX(% Agree). Set the Fill colour to light grey.

We need to create some new fields for the quartile values.

Lower Quartile

PERCENTILE([% Agree],0.25)

Upper Quartile

PERCENTILE([% Agree],0.75)

Add both these fields to the Detail shelf again, then add another reference line (band) similar to that above, but referencing the quartile fields. Set the Fill colour to be a darker grey.

Adjust the formatting and set the tooltips and you’ve got the main chart…. well almost…

In the solution, the first column, the question no, is not labelled. I couldn’t figure out how to do this, which is why I relabelled to simply No. I tried various things, including using text boxes as column headings on the dashboard, but the layout just didn’t work.

BUT I’ve now found out how to do it… because I googled, and I didn’t yesterday when I was building 😦 Andy Kriebel explains it all here. He searches for a ‘zero width space’ character on this site , and then copies the resulting ‘character image’ displayed and pastes into the label of a calculated field. Watch the video to see it in action, but I’ve noted the steps here, just as much for my own benefit when I can’t remember what to do in future… I can see this type of feature cropping up often 🙂

The legend utilises a lot of the concepts above, but we don’t what the mark changing with each university selection. So let’s just hardcode

Legend Avg

AVG(IF [University] = ‘Middlesex University’ THEN [% Agree] END)

and we’ll need a dedicated field for the colour

Legend Colour

[Legend Avg] – [Overall Avg] >=0

The legend sheet can then be built just by plotting the Legend Avg pill on the Columns shelf, with a mark type of circle, the Legend Colour on the Colour shelf, and the same pills used in the reference lines above on the Detail shelf.

When adding the reference lines and bands this time, you will need to add labels and format their position.

The quartile band, also has dotted lines indicating the end of the band, which you can apply as part of the band properties. However the quartile band also has one label left aligned, while the other is right aligned. For a single reference band, the labels can either both be formatted left aligned or both right aligned. To resolve this, don’t add a label to the ‘band to’ section. Create another reference line for the Upper Quartile value, and you can then format the label of this independently.

My published via associated to this challenge is here.

My version based on the original challenge from 2017 is here. The requirement was a bit more complicated it would seem, and it looks like I utilised FIXED LODs quite heavily.

Happy vizzin’! Stay Safe!

Donna

Can you visualise the Cholera Outbreak?

Lorna delivered an exciting map based challenge this week, to recreate one of the most famous visualisations ever created. If you’re part of the data viz community and never heard of John Snow and his cholera map, then I suggest you go and google now 🙂

While I love map based challenges, as its an area I don’t get to use regularly, they’re also the ones that can give me the most frustrations/take the longest to complete, as the functions and concepts I need to use don’t come as readily as with other challenges. I had to refer back to my own blogs on previous map based challenges (specifically this one and this one), to help me out. However these were written before map layers were introduced, so not everything was applicable.

  • Modelling the data
  • Building the Avg Distance to Pump bar chart
  • Building the Total Deaths within Avg Distance bar chart
  • Building Map
  • Adding the dashboard interactivity

Modelling the data

3 sets of data was provided

  • Pumps – 1 row per pump with its location
  • Deaths – 1 row per death and its location (can be multiple rows for the same location)
  • Deaths aggregated – 1 row per location where at least 1 death was recorded, including the count of deaths at that location

This all needed to be ‘combined’ so we can work across all 3 sets of data and compare the locations. I used relationship calculations to create a relationship of 1=1 between the related data sets, which means all the aggregated deaths are mapped to each pump, and all the individual deaths are mapped to each pump too.

Building the Avg Distance to Pump bar chart

I chose to start building the bar charts first so I could get better acquainted with how the data was working together, and I knew the maps would take more effort.

To start we need to figure out the distance from each pump to each death, so we need to get the location of each pump, and the location of each death

Pump Location

MAKEPOINT([Pump Lat],[Pump Lon])

Individual Death Location

MAKEPOINT([Death Lat],[Death Lon])

These will both create calculated fields of a geographic data type as indicated by the globe icon to the left of the field.

And with these fields, we can then work out the distance between them in metres

Distance Pump to Death

DISTANCE([Pump Location], [Individual Death Location],’m’)

Now we want the average distance per pump

Avg Distance

{FIXED [Pump ID]: AVG([Distance Pump to Death])}

I formatted this to 0 dp and added a suffix of ‘m’

And with this, we can build the first bar chart – Pump ID on Rows and Avg Distance on Columns. You need to apply formatting

  • Change the font – I set the font to Times New Roman at the Workbook level, to make life easier (Format menu > Workbook).
  • Set the worksheet background colour. I used #f5f1f0
  • Add a black border to each bar (via the Colour shelf)
  • Show labels on bars (via Label shelf)
  • Hide Avg Distance axis (uncheck show header on the Avg Distance pill)
  • Remove all row & column borders
  • Set the Axes Ruler on Rows to black
  • Set the Tooltips

A parameter is needed to store the selected pump

pSelectedPump – an integer defaulted to 1

With this, we can then create a field to indicate which pump is selected

Is Selected Pump?

[Pump ID]=[pSelectedPump]

Add this field to the Colour shelf and set the True option to black and the false option to match the background colour you used.

Building the Total Deaths within Avg Distance bar chart

We now need to identify if the death occurred within the average distance that has been calculated.

Number Deaths within Avg Distance

SUM(INT([Distance Pump to Death] <= [Avg Distance]))

This is a bit of short hand instead of writing IF [Distance Pump to Death] <= [Avg Distance] THEN 1 ELSE 0, and summing up the result. It relies on the the fact that the equation [Distance Pump to Death] <= [Avg Distance] returns a boolean of true or false, which can be converted to an integer as true =1 and 0 = false.

At a row level, the field just returns 1 or 0, but just plot this field against Pump ID, you get the desired count

And the easiest way to build this bar chart, is to duplicate the one created above (on the worksheet tab, right click & duplicate sheet), then replace the Avg Distance pill with the Number Deaths within Avg Distance pill (simply drag the latter from the data pane and drop directly onto the former in the columns shelf). Then adjust tooltips etc to suit. Doing it this way preserves (most of) the formatting.

Building the Map

We’re going to use 4 layers in creating the map, and I’m documenting from the bottom upwards (although this isn’t the order in which I actually built the viz in practice – I shuffled things round afterwards).

We’ll start with the density layer – the blue coloured area on the map.

On a new sheet, add Individual Death Location and Death ID to the Detail shelf. This should automatically create a map view with the auto generated lat & long fields. Then

  • Change the mark type to Density
  • Change the colour to Density Blue-Teal Light
  • Alter the Intensity to 75%, the Opacity to 60%
  • Increase the Size to maximum
  • Delete the info in the tooltip

We need to set the background map

Via the Map > Map Layers menu option, set the Style to Normal and uncheck all options and set Streets, Highways, Routes

Then set the worksheet background to the colour we’ve used before

Onto Layer 2 – the pump buffer circle.

We’ve already identified the selected pump, but now we need to get it’s location, and for that we need the lat and long of the selected pump.

Selected Pump Lat

{FIXED: MIN(IIF([Is Selected Pump?],[Pump Lat],NULL))}

This essentially gets the latitude of the selected pump and sets that value across all the rows in the data.

Selected Pump Long

{FIXED: MIN(IIF([Is Selected Pump?],[Pump Lon],NULL))}

And with these we can get

Selected Pump Location

MAKEPOINT([Selected Pump Lat],[Selected Pump Long])

We also need to get the average distance of the selected pump spread across every row

Selected Pump Avg Distance

{FIXED :SUM( IF [Is Selected Pump?] THEN [Avg Distance] END)}

Now we have this, we can build the buffer field

Pump Buffer

BUFFER([Selected Pump Location],[Selected Pump Avg Distance],’m’)

Drag this Pump Buffer field onto the map and drop it onto the ‘Add a Marks Layer’ section that should appear. This will create an additional marks card.

Set the colour to the same background colour previously used, and drop the opacity to around 5%. Add Pump ID to Detail and Selected Pump Avg Distance to the Tooltip shelf. Adjust tooltip to match.

Onto Layer 3 – the pumps

Drag Pump Location onto the map and Add a Marks Layer.

Add Pump ID to Text and Location to Tooltip to the relevant marks card. Change mark type to circle, increase the size and change colour to red and add a dark grey border. Adjust text to be centred, and set the tooltips accordingly.

Onto the final layer now, the aggregated death locations. We need a further location field

Death Location

MAKEPOINT([DeathLat (deaths aggregate.csv)], [DeathLon (deaths aggregate.csv)])

Drag this onto the map to Add a Marks Layer then on the subsequent marks card add Loc ID to the Detail shelf. Change the mark type to circle, and add Deaths to the Size shelf – adjust accordingly. Change the colour to black with a white border. Adjust tooltip.

Adding the dashboard interactivity

Add the 3 sheets onto a new dashboard sheet. Create a new dashboard action to Change Parameter, that runs on select of any of the 3 charts, and targets the pSelectedPump parameter by setting it with the value from the Pump ID field.

Now if you select a pump on the map, or a bar on either of the bar charts, all the charts will reflect the selection made.

The other interactvity you may notice, is that on click, of the bars, or the pumps, the other data ‘fades’ (or the selected data is highlighted). We don’t want this.

To resolve, create a calculated field

True

True

and another

False

False

Add both these fields to the Detail shelf on both bar chart sheets and to the Detail shelf on the Pumps marks card.

Then on the dashboard, create a dashboard filter action which filters on selected fields setting True = False (see below). You’ll need to create one of these actions for each of the 3 sheets.

And hopefully, you now have a working viz. My published version is here.

Happy vizzin’! Stay Safe!

Donna

Sales Goal Selection Tool

Ann Jackson set this week’s #WOW2021 challenge, based on a recent ‘real world’ situation she had encountered.

Analysing Ann’s solution (by interacting with her published solution), I deduced we’d need to use set actions to add and remove the selected Sub-Categories into and out of the set (I hadn’t noticed Ann had tagged the challenge on the main page with Set Actions 🙂 ). I also realised the initial visual in the first column, wasn’t using reference lines to depict the target, as the tooltip displayed on hover, was much more detailed than what you can add to a reference line tooltip.

So armed with this knowledge, I set about building what was required.

  • Defining the required calculations
  • Building the BANs
  • Building the viz
  • Adding the interactivity

Defining the required calculations

This is one of those challenges where I want to get all the calcs sorted up front in a tabular view, before even attempting the viz. I’ll go through what I ended up with, but be assured this did take a bit of time and change of direction to get what I needed.

Let’s start with parameters. Firstly I created a parameter to simulate ‘today’, that is hardcoded to 9th June 2021

I then created a parameter to store the % uplift we want to use for one of the goal options. I default this to 0.1 (ie 10%)

We need to work out the sales so far this year (ie the sales in 2021 up to and including 9th June 2021), and the equivalent sales for the previous year (ie the sales in 2020 up to and including 9th June 2020).

SALES YTD by Sub Cat

IF YEAR([Order Date]) = YEAR([pToday]) AND [Order Date] <= [pToday] THEN [Sales] END

SALES LY YTD by Sub Cat

IF YEAR([Order Date]) = YEAR([pToday])-1 AND [Order Date] <= DATEADD(‘year’,-1,[pToday]) THEN [Sales] END

Let’s pop these into a table with the Category and Sub-Category fields, and add grand totals, so we can see what figures we need to be aiming for the BANs at the top.

Now let’s work out the values for the goals.

One goal is based on finding the average sales per day in 2021, then extrapolating this across the whole year.

So firstly, we need to know how many days in the year up to ‘today’.

# Days So Far This Year

DATEDIFF(‘day’,DATETRUNC(‘year’,[pToday]), [pToday] ) +1

This finds the number of days between 01 Jan 2021 and 09 June 2021 (then adds 1, as we need to include 9th June too).

# Days in Year

DATEDIFF(‘day’,DATETRUNC(‘year’,[pToday]), DATEADD(‘year’,1,DATETRUNC(‘year’,[pToday])))

This finds the number of days between 01 Jan 2021 and 01 Jan 2022.

These methods ensure the right number of days is recorded if the ‘current’ year happens to be a leap year.

We can now work out one of the goals

Same Pace Goal

(SUM([SALES YTD by Sub Cat])/[# Days So Far This Year]) * [# Days In Year]

Add up the Sales so far this year and divide by the number of days so far to get the average sales value per day, then multiply by the total number of days in the year.

For the other goal, we need to determine the sales for the whole of the previous year, and multiple by the uplift %.

SALES LY

IF YEAR([Order Date]) = YEAR([pToday])-1 THEN [Sales] END

This gives the total sales for 2020.

LY + Percent Increase

SUM([SALES LY])*(1+[pPercentIncrease])

This applies the % increase parameter to the total sales for 2020.

Now we want to decide which of the goal values to use based on ‘selection’. To define the selected Sub-Categories, we’re going to use a set. Right-click on Sub-Category > Create > Set, name accordingly and select ‘Phones’ as the initial value in the set.

Use Last Year With Increase

Now we can use whether the record is in or out of the set in the logic to determine the goal to use

YEAR_END GOAL per Sub-Cat

IF ATTR([Use Last Year With Increase]) then [LY + Percent Increase Goal] ELSE [Same Pace Goal] END

Add the set and the goal field to the table, and you’ll see the value in the final column is matching the relevant previous columns, depending on whether the Sub-Category is IN or OUT of the set.

NOTE – when you do this, you’ll get In or Out displayed against each row. To get the description as I’ve got displayed, right click on the text In or Out and Edit Alias.

These data fields are going to be used to build out the central viz. There’s a couple of other fields we need to finish off the data requirements for the viz

Value to Goal

[YEAR-END GOAL Per Sub Cat]-SUM([SALES YTD by Sub Cat])

and

% of Goal

SUM([SALES YTD by Sub Cat])/[YEAR-END GOAL Per Sub Cat]

This field needs to be formatted to percentage with 0 dp (all other monetary fields need to be formatted to $ with 0 dp).

So now we have the core fields stored against each row that will help us build out the main viz. You can edit the set and add further sub-categories, so you can validate how the values change.

Now we want to work on the data we need for the BANs. You may think the grand totals sum up all the rows above, however while this works fine for the SALES YTD by Sub Cat and SALES LY YTD by Sub Cat columns, it isn’t the case for the YEAR-END GOAL by Sub Cat column.

If we select all the rows in that column, and examine the tooltip that appears on hover, the total of the selected columns differs from the total at the bottom of the column.

The value in the hover text is what we need. The discrepancy with the column grand total is because it’s working across the whole data and not row by row. Because at least 1 record is in the set, its taking the logic to use the same pace goal. You’ll notice the total of this column matches the total of the Same Pace Goal column. In fact as you add more values to the set, the total will match the Same Pace Goal total up until all values are in the set. At that point the total will match the LY + Percent Increase column.

So back to the summary measures. We’re going to use table calcs to solve this.

SALES YTD

WINDOW_SUM(SUM([SALES YTD by Sub Cat]))

SALES LY YTD

WINDOW_SUM(SUM([SALES LY YTD by Sub Cat]))

YEAR-END GOAL

WINDOW_SUM([YEAR-END GOAL Per Sub Cat])

These are all basically summing up the values displayed in the rows on the screen. Add them to the table, and you can see the same values are displayed on each row which tally to other data in the table

Right, we have all the data fields we need, let’s start building.

Building the BANs

On a new sheet, add the fields as below

We want to turn this into 1 row.

Create a field called

Index

INDEX()

and add this to the Rows shelf and change to be discrete (blue pill). Each row should be numbered from 1 to 17.

Now drag the Index field from the Rows shelf onto the Filter shelf and when prompted select 1.

You’ve now got 1 row displayed, but the data associated to all the rows is being included in the calcs (if you’d filtered just to Sub Category = Accessories for example, the date would be just related to the rows with the Accessories value).

Now we can tidy this to look how we want

  • Add Measure Names to Text shelf (change display to entire view if need be).
  • Hide Sub-Category from displaying (uncheck Show Header)
  • Right Click on each column title and Edit Alias to remove the ‘along…’ text
  • Format the text appropriately and align centrally
  • Adjust the tooltip to remove the Sub-Category info
  • Hide the Measure Names from displaying (uncheck Show Header).
  • Format the Row Dividers to be thick

Building the Viz

Add Category and Sub-Category to the Rows shelf on a new sheet. Change the sort of the Category field to sort by data source order, descending. Change the sort of the Sub-Category field to sort as below

Add SALES YTD by Sub Cat to Columns, then drag SALES LY YTD by Sub Cat onto the canvas and drop onto the SALES YTD by Sub Cat axis when you see the 2 columns appear

Drag Measure Names from the Rows shelf to the Size shelf. Add Measure Names to the Colour shelf. Adjust sizes and colours accordingly. Add a white border to the bars on the colour shelf.

Turn stack marks off (Analysis > Stack Marks > Off) to allow the bars to overlay each other.

Add both SALES YTD by Sub Cat and SALES LY YTD by Sub Cat to the Tooltip shelf, so you can reference both values regardless which bar you’re hovering over.

Add YEAR-END GOAL by Sub Cat to the Columns shelf and set to dual axis and synchronise axis. Reset the mark type of the Measure Values card to bar, and of the YEAR-END GOAL by Sub Cat card to Gantt.

Remove the Measure Names from the Size and Colour on the YEAR-END GOAL by Sub Cat card. Set the colour of this mark to black.

On the All marks card, add YEAR-END GOAL by Sub Cat to the Tooltip shelf so once again it’s value can be referenced by all marks.

Adjust the tooltip on the All marks card to match.

‘Type in’ MIN(0) to the Columns shelf (double click in the space next to the pills to enable the text edit feature). On this MIN(0) marks card, change mark type to Text and add Value to Goal to the Text shelf. Adjust the text to include the additional wording and adjust size too if need be. Adjust tooltip too.

Repeat this process for the % of Goal field too.

For the circles, create another MIN(0) column in the same way, but this time change mark type to circle, and add Use Last Year with Increase set to the Colour shelf and adjust. Adjust tooltips.

Format axes/gridlines/text/row banding accordingly and rotate text of the Category field.

Adding the interactivity

Add the sheets to a dashboard. Add a set action to add Sub-Categories to the set on click (Dashboard > Action > Add Action > Change Set Values). Set the action to run on select (on click) of the chart viz, to add values to the Use Last Year with Increase set.

We then need another set action to remove the Sub-Categories, but this needs to work by clicking a link in the tooltip which displays. The name of this action will display on the tooltip.

And fingers crossed.. that should be it! My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you structure the unstructured?

As soon as I saw that Candra’s challenge for this week was going to involve Regular Expressions (RegEx), I gave a little groan. RegEx just isn’t my thing 😦 I only ever seem to use them for these challenges, and not in my working life, so have minimal experience. I always think I should focus some time on learning them properly, but other things just end up taking priority. Ho Hum…

So most of my time was spent trying to wrangle the info I needed to identify ‘how many bedrooms’ each property had. I did a bit of googling to try to find the right expressions I think I needed, used the regex101 site to test my expression to find certain patterns of text against some of the data in the Description field, and then tried to plug that into a calculated field in Tableau to extract the data I needed.

But I couldn’t get it to work 😦 I could find matching text using the REGEXP_MATCH function, but when I then tried to use the REXP_EXTRACT functions I couldn’t get anything out…

So I ended up having to look at the solutions that had already been published by the time I started, Candra’s, Lorna Brown’s and Sam Epley’s. I just needed to get my head round what I was obviously doing wrong and give me some pointers. All 3 had slightly different approaches. I absorbed, then closed their workbooks and attempted again from memory. With a lot more trial and error I got somewhere… it isn’t perfect and has some mismatches from the others (but they don’t all match each other either…).

Once I’d got a grouping for each property, the actual Tableau stuff was quite straightforward…

  • Identifying the ‘Number of Bedrooms’
  • Building the Histogram
  • Adding the Average Price
  • Building the Map
  • Adding the Interactivity

Identifying the Number of Bedrooms

So the way I approached this, was to try to identify all the various permutations that represented the word ‘bedroom’ and replace it with the word ‘Bedroom’. But one of the options was BR or br, and the Description field contained html markup with the term <br />. I didn’t want all these to become ‘bedroom’, so I got rid of them all first,

Desc with Bedroom

REGEXP_REPLACE(LOWER(REPLACE([Description],'<br />’, ‘ ‘)),’bedroom|br |bdrm|bed|bd|br, |br/|rooms’,’ Bedroom’)

Firstly, replace any occurence of <br /> with a space, then replace any occurrence of the text bedroom or br<space> or bdrm or bed or bd or br<comma> or br<forward slash> or rooms with the word Bedroom.

I basically added more options to the or statement (identified by the | separator), as I went on examining the descriptions that were left. Using the LOWER function meant that bedroom or Bedroom or BedRoom etc would all be covered with one option.

Then I attempted to extract the number of bedrooms or identify as a studio

Studio | Beds

IF CONTAINS(LOWER([Desc with Bedroom]), ‘studio’) THEN ‘Studio’
ELSEIF REGEXP_MATCH(LOWER([Desc with Bedroom]),’\d bedroom’) THEN REGEXP_EXTRACT(LOWER([Desc with Bedroom]),'(\d+) bedroom’)
ELSEIF REGEXP_MATCH(LOWER([Desc with Bedroom]),’\d bedroom’) THEN REGEXP_EXTRACT(LOWER([Desc with Bedroom]),'(\d+) bedroom’)
ELSEIF CONTAINS(LOWER([Desc with Bedroom]), ‘six bedroom’) THEN ‘6’
END

If the revised description contains the word ‘studio’ then assume its a Studio.

Else if the revised description contains a number (\d) followed by 2 spaces then the word ‘bedroom’ then extract the numbers (\d+) that occur before the word bedroom. The brackets around the \d+ is what is used to identify what bit of the matching pattern to extract… this is the bit that I didn’t really know about and why I couldn’t get things to work.

Else if the revised description contains a number (\d) followed by 3 spaces then the word ‘bedroom’ then extract the numbers (\d+) that occur before the word bedroom. This just happened to be another pattern that occurred and meant some records didn’t get picked up by the prior statement. There’s probably a better way of doing this in one statement…

Finally, if the revised description contains the text ‘six bedroom’ then assume the property has 6 rooms.

This logic seemed to get a match against every record although it’s not 100% accurate, but it was close enough given my struggles.

I then wanted to get the rooms grouped

Room Grouping

CASE [Studio | Beds]
WHEN ‘Studio’ THEN ‘Studio’
WHEN ‘1’ THEN ‘1 Bedroom’
WHEN ‘2’ THEN ‘2 Bedrooms’
WHEN ‘3’ THEN ‘3 Bedrooms’
WHEN ‘4’ THEN ‘4 Bedrooms’
ELSE ‘5 or more Bedrooms’
END

I planned to use this field as my filter, but in doing so the value listed alphabetically, so Studio ended up at the bottom of the list.

To resolve this I created a parameter which meant I could define the order I wanted :

pBedroomSelector

And then I created a new field to use for the filter

Filter Room

[pBedroomSelector] = ‘All’ OR
[pBedroomSelector] = [Room Grouping]

I could then add this onto the filter shelf of the sheets I needed to build, setting the value to True.

Building the Histogram

For this chart, we need to ‘bin’ the Price of each property into groups of $100 ranges. However if we use the built in ‘bin’ function, the field created can’t be referenced in other calculations, and I needed to do this. So instead I determined the ‘lower’ value of the range by

Price per Night Min

FLOOR([Price]/100) *100

Divide the price by 100, round down to the nearest whole integer (so 1.9 will round down to 1), then multiply the result by 100.

And given that, I can then calculate

Price per Night Max

[Price per Night Min]+100

I also created a ‘friendlier’ field to store the number of properties

# of Listings

COUNT([listings copy_listings copy])

which is just a reference to the auto generated field created when you connect to the data source.

With these I can plot the histogram

  • Price per Night Min on Columns (set to discrete, continuous)
  • # of Listings on Rows
  • Mark type of Bar
  • Size set to be Fixed with a width of 100
  • Filter Room on the Filter shelf, set to True.
  • Adjust the colour via the Colour shelf and set a white border
  • Show the pBedroomSelector parameter
  • Add Price per Night Max to the Tooltip shelf and set to be an attribute.
  • Set the Tooltip accordingly and format gridlines, axes labels etc

Adding the Average Price

I wasn’t entirely sure what the average price on Candra’s solution represented, so I chose to go for the average price of the properties in the filtered selection; that is of all the 2-bedroom properties for example, find the average price per night, based on the total price per night of all the properties divided by the number of properties. ie I was looking for these values in the 3rd column.

But I couldn’t simply add the Price field aggregated to Avg to the bar chart. Doing so gave me different values per Price per Night Min grouping.

I just want the value on the grand total line spread across the all the data in the chart. So I created

Window Avg Price

WINDOW_SUM(SUM([Price])) / WINDOW_SUM([# of Listings])

This table calculation, set to compute by Price per Night Min gives the value I want across all rows of data

Add Window Avg Price to the Detail shelf of the histogram, set the calc to compute as above. Then you can add a reference line to the Price per Night Min axis.

Building the Map

To build maps you need fields that are geographic data types. For me, the Longitude field was already set, but I had to manually set the Latitude field (right click -> Geographic Role -> Latitude).

Once done, the map could be quickly built by double-clicking the Longitude field, then double clicking the Latitude field, then adding Name and Listing URL to the Detail shelf, and Price to the Tooltip shelf. Finally set Filter Room = True to the Filter shelf.

I then adjusted the colour of the circles, reduced the opacity to 50% and added a border (all via the Colour shelf).

I also added Area Code Boundaries via the Map -> Map Layers menu to get the map style Candra had used.

Adding the Interactivity

Add the 2 sheets to a dashboard. Each chart can be used to filter each other. This functionality can easily be added by clicking on the context menu of the dashboard object, and selecting Use as Filter. A filter dashboard action will automatically be added. Do this for both charts.

The final requirement, is for a link to the actual listing to be available from the map tooltip. This is a dashboard URL Action (Dashboard -> Actions -> Add Action -> Go to URL). Set as below

The words in the Name field will what is displayed on the tooltip.

The layout requires use of containers, background colours and a bit of padding. This is typically a bit of trial and error to get this right. You can check out my published version here.

Happy vizzin’! Stay Safe!

Donna

How often is Sean listening to his favourite songs?

A “scatter plot with a twist” music based challenge this week was posed by Sean Miller, using data from his last.fm account. On the surface, a simple scatter plot of each song, comparing the number of days between first and last listen against total plays. But click on song, and you get to see a timeline depicting days since first play vs cumulative plays, on the same chart. Hmmm…

The data set provided consists of 1 row per song per date played since 2017. For the scatter plot, we’re looking to summarise the data at a song level. For the timeline related to the selected song, we need to have the data at the song and date level. Ultimately we’re mixing levels of granularity within a single chart.

This certainly took some thinking. Creating the 2 charts independently was pretty straightforward, but trying to put them together took some thought. I knew I was going to want to use a set action to drive the interactivity and apply different logic based on whether a song was selected or not, but it took a bit of trial and error to get a solution.

First up, the data provided contained some timestamp date fields, but these were string data types. I chose to use the Timestamp UTC field to get a ‘proper’ date. I actually did this by duplicating the field, renaming it to Date Played and changing the datatype to a date. This generated a calculation which is below… I’d have never of typed this myself 🙂

Date Played

DATE(IF NOT ISNULL( DATEPARSE ( “MM/dd/yyyy HH:mm”, [TimeStamp UTC] ) ) THEN DATEPARSE ( “MM/dd/yyyy HH:mm”, [TimeStamp UTC] ) ELSEIF NOT ISNULL ( DATEPARSE ( “MM/dd/yyyy hh:mm:ss a”, [TimeStamp UTC] ) ) THEN DATEPARSE ( “MM/dd/yyyy hh:mm:ss a”, [TimeStamp UTC] ) END)

Now we have that, we can work out the first date a song was played

Min Date Per Song

{FIXED [Song ID]:MIN([Date Played])}

and the latest dates

Max Date Per Song

{FIXED [Song ID]:MAX([Date Played])}

and then we can derive the days between

Days since first listen

DATEDIFF(‘day’, [Min Date per Song], [Max Date per Song])

We can also get the total plays per song using

Total Plays per Song

{FIXED [Song ID]: [Total Plays]}

and with these calculated fields, we can build the basic scatter, using Total Plays per Song as a range filter, and setting both the axes not to start at zero.

So far so good. Now let’s think about the timeline. We need to identify a ‘selected song’ to help build this, so let’s create a set by right clicking on Song ID > Create > Set, and selecting a single option

We need to capture the number of plays on a date

Total Plays

COUNT([2021_05_26_WW21_My Streaming Activity.csv])

This is the default count field generated (the equivalent of the Number of Records if you’re on older versions of Tableau).

And we also need to capture the number of days from first play date to the current date, as we need to plot on a consistent axis when putting the charts together (ie we can’t plot with date on the axis).

Days to Date

DATEDIFF(‘day’,[Min Date per Song],[Date Played])

Let’s put this out into a table so you can see what’s going on.

Add the Selected Song = True to the filter shelf

There are days when there were multiple plays, so the Days To Date field needs to be set to AVG rather than SUM, to get the correct figure.

For Total Plays we need to plot the cumulative value, so we can set a quick table calculation against the Total Plays field of Running Sum (right click on Total Plays field). We’re going to explicitly set the table calculation to compute by Date Played as when depicted on a viz, the default of table down, might not give the correct values.

So with these fields, we can build the timeline viz (duplicate the table sheet and move the fields around). Set the mark type to line, and change the setting on the Path to make a stepped chart.

Ok, so now we have the 2 charts and hopefully understand what we’re aiming for. But how do we now go about getting everything onto a single chart?

We’re going to need a dual axis chart, since we have different mark types in play. And we’re going to want to plot different measures depending on whether we’re working with the selected song or not.

We’re going to build the data up in a table to get the logic for the fields we need. To just test the concept, we’ll filter to just a few songs, including the one in the Selected Song set. Add Song ID to the filter shelf and filter to a few songs.

Then build out a table as below:

What we’re aiming for, is for the records where In/Out of the set is Out, we want to plot the information we’re getting from the 1st two columns, but for the records where In/Out the set is In, we want the information from the other columns.

So let’s build this out.

Days to Plot

IF ATTR([Selected Song]) THEN AVG([Days To Date]) ELSE SUM([Days since first listen]) END

Plays to Plot

IF ATTR([Selected Song]) THEN RUNNING_SUM([Total Plays]) ELSE SUM([Total Plays per Song]) END

Pop these in the table, making sure any fields which are table calculations are set to compute by Date Played.

If you scroll to find where the data changes from a song out of the set to the one in, you can see how the two new fields are working.

So lets try plotting the chart out using these fields instead.

  • Days to Plot on Columns
  • Total Plays Per Song on Rows
  • Song ID on Detail shelf
  • Plays to Plot on Columns
  • Date Played (set to exact date) on Detail of the All Marks card.
  • Set the table calculation of Plays to Plot to compute by Date Played only.
  • Change mark type of the Plays to Plot to Line and set the Path to stepped line
  • Add Total Plays Per Song to Filter shelf and set to range from 50 to 100 (just to make the chart less busy).

You should end up with the below

You’ll notice we have the line of circles in the top chart, which is plotting a mark per day for the Selected Song. If we remove the Date Played pill from the Detail shelf of the Total Plays Per Song marks card, we lose these marks including the mark for the Selected Song too, which we need.

To resolve this, we need another field.

Is Last?

LAST()=0

Last Plays to Plot

IF [Is Last?] THEN Sum([Total Plays per Song]) END

This is just saying give me the total plays for the last mark in the list. Add these to the table to see what’s going on, making sure to set the table calc to compute by Date Played only

Now if we replace Total Plays per Song with the Last Plays to Plot field, we get
Now make the chart dual axis (don’t forget to synchronise), and we can then sort the formatting.
  • Set the marks type on the Last Plays to Plot to circle
  • Remove Measure Names from the Colour shelf, and set the mark colour to #3dde3c. Reduce the opacity to about 90%. Add a dark grey border.
  • Add Selected Song to the Size shelf, and adjust so the selected song is larger than the others.
  • On the Plays to Plot marks card, again remove Measure Names from the Colour shelf, and set the colour to black.
  • Set the Path to stepped line.
  • Click on the right hand axis and select Move marks to back
  • Reduce the Size of the mark.
  • Add Selected Song to the Colour shelf, and adjust the colours so the line is black and the tiny dots for all the other marks that you can see in the circles, is set to #3dde3c

You’ve now got the core chart, which needs to be further tidied to remove grid lines, axes, add tooltip etc. Once done, you can add to a dashboard, where you can then set the interactivity.

Add a dashboard action to change set values that sets the Selected Song set on Select. Set to work on single-select only.

Finally, you’ll find that if you select a mark, while you’ll get the trend line, the other points will now ‘fade out’

Create a new field True = True, and add this to the Detail shelf of the All Marks card. Then on the dashboard, add a new dashboard highlight action, which is set to Target Highlighting to the True field only.

Now if you click on and off a mark, you should get the trendline show and disappear, and all marks remain at the same transparency throughout.

Hopefully you’ve got enough now to complete this challenge. My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you compare Same Day to a Selected Date?

Week 20 for #WOW2021, and Lorna Brown set the challenge based on comparing dates. The requirement was to avoid LoDs in preference of table calculations. Unintentionally, I managed to build a solution without table calcs either. This just happened based on the problem solving process I ended up taking. I figured I would encounter a need for a table calc as I progressed, but it didn’t happen.

  • Defining the key calculated fields
  • Building the BAN
  • Building the Line Chart
  • The Reset button
  • Adding the dashboard interactivity

Defining the key calculated fields

To drive this viz, you’ll need 2 parameters – one to store the selected date and one for the selected comparison timeframe.

pSelectedDate

pComparison

Set to be an integer (more efficient in calcs later), but display text on screen

We now need to work out what the comparison date will be for each of the selections.

We’ll start with the date last year. We’re looking for the ‘equivalent’ day of the year in the previous year. I didn’t read the hint provided, so came up with my own method based on the month and whether we’re in a leap year or not

Equivalent Day Last Year

DATE(IF ((DATEPART(‘month’,[pSelectedDate]) <=2) AND ((YEAR([pSelectedDate])-1)%4=0)) OR ((DATEPART(‘month’,[pSelectedDate])>=3) AND (YEAR([pSelectedDate])%4=0))
THEN DATEADD(‘day’,2,DATEADD(‘year’,-1,[pSelectedDate]))
ELSE DATEADD(‘day’,1,DATEADD(‘year’,-1,[pSelectedDate]))
END)

So what is this saying….

If the month of the pSelectedDate is Jan or Feb AND the previous year is a leap year (as it can be divided by 4 with no remainder), OR if the month of pSelectedDate is not Jan or Feb and this year is a leap year, then get the exact same date from last year, but add on 2 days, otherwise get the exact same date from last year, but add on just 1 day.

Now let’s consider the same day for last month. I played with Lorna’s solution, entering difference dates to see what the compare date came back as, and it wasn’t always clear to me what the logic was that was being used. So this is what I came up with. If the pSelectedDate was for example, the 3rd Tuesday in the month, then I wanted to get the 3rd Tuesday in the previous month. However, based on the length of months and when months start and end, some months can have nearly 5 weeks in a month, while the previous may only have 4. In my logic therefore, if there was no 5th Tuesday in the previous month, I would return the 4th Tuesday of the month. So in that instance if pSelectedDate is the 4th Tuesday in the month, I’d compare to the 4th Tuesday of the previous month. If pSelectedDate is moved to the 5th Tuesday in the same month, it would also compare to the 4th Tuesday of the previous month – PHEW! all a bit mind boggling perhaps, and it certainly took the most amount of time in this challenge.

I broke this down into multiple calculations.

DoW Selected Date

DATEPART(‘weekday’,[pSelectedDate],’Sunday’)

What is the day of the week number for the pSelectedDate? If pSelectedDate is 18th May 2021, this will be 3 (ie Tuesday).

Exact Date Prev Month

DATE(DATEADD(‘month’, -1, [pSelectedDate]))

as it says on the tin – exact same date last month, so if pSelectedDate is 18th May, this will be 18th April.

Prior DoW Prev Mth

DATE(CASE [DoW Selected Date]
WHEN 1 THEN DATETRUNC(‘week’, [Exact Date Prev Month],’Sunday’)
WHEN 2 THEN DATETRUNC(‘week’, [Exact Date Prev Month],’Monday’)
WHEN 3 THEN DATETRUNC(‘week’, [Exact Date Prev Month],’Tuesday’)
WHEN 4 THEN DATETRUNC(‘week’, [Exact Date Prev Month],’Wednesday’)
WHEN 5 THEN DATETRUNC(‘week’, [Exact Date Prev Month],’Thursday’)
WHEN 6 THEN DATETRUNC(‘week’, [Exact Date Prev Month],’Friday’)
WHEN 7 THEN DATETRUNC(‘week’, [Exact Date Prev Month],’Saturday’)
END)

If pSelectedDate is a Tuesday, then this returns the date of the Tuesday that is before the exact date last month. If pSelectedDate is Tuesday 18th May, Exact Date Prev Month is 18th April, which is a Sunday. Truncating this date to the start of the week, where the week starts on a Tuesday, returns Tues 13th April.

Next DoW Prev Mth

DATE(DATEADD(‘week’, 1, [Prior DoW Prev Mth]))

Just adds 1 week onto the above.

Equivalent Day Last Mth

IF MONTH([pSelectedDate]) = MONTH([Next DoW Prev Month]) THEN [Prior DoW Prev Mth]
ELSE [Next DoW Prev Month]
END

If the months of pSelectedDate and New DoW Prev Mth are the same, then use the ‘previous’ date, otherwise use the ‘next date’. This took some pen and paper to work out!

So now having worked out what the dates might be, I just plugged them into the below to get the comparison date we need.

Compare Date

DATE(IF [pComparison] = 1 THEN [Equivalent Day Last Year]
ELSEIF [pComparison] = 2 THEN [Equivalent Day Last Month]
ELSE DATEADD(‘week’, -1, [pSelectedDate])
END)

Further fields we then need are

Selected Date Sales

IF [Order Date] = [pSelectedDate] THEN [Sales] END

Compared Date Sales

IF [Order Date] = [Compare Date] THEN [Sales] END

% Difference

(SUM([Selected Date Sales]) – SUM([Compared Date Sales])) / SUM([Compared Date Sales])

custom formatted to ▲0.0%;▼0.0%

Building the BAN

I built this using a bar mark type with MIN(1) on Columns and the axis fixed from 0 to 1.

Then all the relevant fields were added to the Label field and organised accordingly. All date fields were formatted to the <weekday>, dd mmmm, yyyy format.

The % Difference field was added to Colour, and then the colour legend formatted as below

Building the Line Chart

The line chart only needs to show the information between the Compare Date and the pSelectedDate.

Dates to Show

[Order Date]>=[Compare Date] AND [Order Date]<=[pSelectedDate]

This needs to be on Filter shelf and set to True.

The main line chart is then just simply Order Date (exact date, continuous) plotted against Sales

For the circle markers, we can add Selected Date Sales and Compared Date Sales as ‘shared measures’ onto another axis

To get the ‘label’ points to show, we’re going to plot 2 more marks at an arbitrary point on the axis, but at the dates we care about. For this I created

Minus 500

IF [Order Date] = [pSelectedDate] OR [Order Date]= [Compare Date] THEN -500 END

Add this into the Measure Values section, but aggregated to Avg rather than Sum

We want to label these marks, but not with the -500 value.

Selected | Compared Date Sales

IF [Order Date] = [pSelectedDate] THEN ([Selected Date Sales])
ELSEIF [Order Date] = [Compare Date] THEN ([Compared Date Sales])
END

Add this field onto the Label shelf of the Measure Values marks card and show mark labels

But we only want the labels to show against the lower marks. I have to admit I tried many things to try to make this work, but in the end had to peek at Lorna’s solution. The label setting needs to be – its the Measure Names field that is key here!

I then set the colour of the Minus 500 measure to white, so you can’t see it

This chart can then be set to dual axis (don’t forget to synchronise those axis). You may need to reset some of the mark types etc. Remove all headers/borders/gridlines.

The Reset Button

I created a Default Date field storing DATE(#2021-05-18#) and added to a sheet as below.

Adding the dashboard interactivity

On click of a point on the chart, it should set the pSelectedDate parameter. This is a parameter action.

We don’t what the chart to show the point as ‘selected’/highlighted though, so we need to use a dashboard filter action to supress this from happening.

Create fields True = True and False = False. Add these 2 fields to the Detail shelf on the line chart.

Then on the dashboard, add a dashboard filter action as below, setting the fields to map True to False.

Finally, the reset button needs to set the parameter too :

Hopefully this should be enough to help you complete this challenge. My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Regional Sales Readout

The challenge this week from Ann Jackson was all about map layers, a feature introduced in v 2020.4. As an extra twist Ann specified that you couldn’t use table calculations or LoDs, and that the viz should all be on a single sheet.

The key features of this challenge are

  • display the map segmented by region
  • colour the borders of each region with different colours
  • display the cities sized by sales
  • display sales per region
  • on hover of a city, display city sales, city name and % of regional sales

Display the map segmented by region

Maps work with geographic data types, so to split map by Region, the field needs to be converted to this type. To do this, right click on Region > Geographic Role > Create From > State

Double clicking on Region should automatically create a map visual. Change the mark type to filled map, and you’ll see the region boundaries.

Strip off all the background & labels etc via the Map > Map Layers menu. Uncheck all the options on the right hand side.

Set the colour of the map to grey (via the Colour shelf), then set the background of the worksheet to a darker grey/black. Remove row & column dividers.

Colour the borders of each region with different colours

On the colour shelf, you can set a border colour, but this is a fixed single colour, not one that can be associated to a field. This is where the map layers start to come in.

We need to create a layer per region.

To start we need a calculated field

West Region

IF [Region]=’West’ THEN [Region] END

This should still be a geographic role (if it hasn’t got a globe symbol next to it, then make it geographic as you did above).

Drag this field onto the map, and drop it on the Add a Marks Layer symbol that displays. A new West Region marks card will display. Change the mark type to filled map.

Now on the Colour shelf, reduce the opacity to 0. Change the border colour to #7ec886, and the halo to None.

Finally, and this is the really sneaky bit that took me some time to figure out, right click on the Null on the legend on the right hand side, and Hide. Only the West region now has the coloured border.

Repeat these same steps for the Central (#82d7d6), East (#b6b6dd) and South (#fd8b59) regions.

A final step at this point is to turn off tooltips across all 5 layers. Also, we don’t want the regions ‘highlighting’ as you hover the mouse over the map, so on each layer, disable selection

Display the cities sized by sales

Drag the City field onto the map, and drop it to create another layer. Drag State onto the Detail shelf of the City marks card (as there are some cities that are associated to different states).

Add Sales to Size and adjust accordingly.

Add Region to Colour, and reduce the Opacity to suit. Remove tooltips from displaying.

Display sales per region

Add another layer by dragging Region onto the map again. Change the mark type to Text. Add Sales to the Text shelf. Format appropriately. Set Disable Selection on the layer.

Good practice, rename the layer to Sales Text or similar.

On hover of a city, display city sales, city name and % of regional sales

For this we need to store a selected city, and we’ll use sets to do this. I did this by right-clicking on one of the cities in the viz, and choosing Create Set. Rename the set Selected City.

We can now work out the value of the Sales for this city, and the % of sales.

Selected City Sales

IF [Selected City] THEN [Sales] END

format to $ 0dp

% Total Sales

SUM([Selected City Sales])/SUM([Sales])

format to % 1dp.

Add these onto the Text shelf of the Sales Text marks card and format.

Now we want to add the City Name, some other additional text.

Selected City Name

IF [Selected City] THEN [City] END

However, if you add this field to the Text shelf, the marks move and things go a bit crazy.

I had to ponder about this for a long time, and I can’t explain why it does it. However it didn’t happen when I added measures (green pills) to the Text shelf. So I converted the field to a measure by editing the field and changing it to

Selected City Name

MIN(IF [Selected City] THEN [City] END)

Adding this to the Text shelf and the text stayed where it was.

A further calculated field is required, which is simply

Selected City Sub Text

MIN(IF [Selected City] THEN ‘OF REGION TOTAL’ END)

Add this and adjust the text.

So now we have the text showing, we need to make it change based on hover.

Add the map to a dashboard, then add a set action (Dashboard > Action > Add Action > Change Set Values)

Set the properties as below, and then test the interactivity by hovering over different cities.

Hopefully that’s everything you need to complete this challenge. The borders and the ‘moving marks’ certainly caused me sometime of puzzlement, but it is a very satisfying feeling when you get that ‘lightbulb’ moment. My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you performance tune this workbook?

This week’s #WOW2021 challenge was a bit different. Rather than being challenged to recreate a viz, Candra challenged us to performance tune a workbook she’d put together, with the aim for all steps to be sub 0.5s.

I started by downloading the slow published workbook (waiting patiently for it to download), opening the workbook ( waiting patiently for it to open), and then assessing how it had been built.

It took a while to download… it was 99MB…

When it finally opened, I could see

  • The workbook had multiple data sources, which were being blended
  • There were cross data source filters being applied
  • There were lots of fields in the data source that weren’t being used
  • One of the data sources contained 2.2 million rows

Tackling these points initially was key.

The requirements stated that the end user only cares about OB/GYNS (specifically Pri Spec = OBSTETRICS/GYNECOLOGY), so the hint here was we’re allowed to restrict the size of the data source.

The requirements also stated any data prep/transformation had to happen in Tableau Desktop, hinting that some re-modelling would be required.

This means you do need to download the original data source files (which also took some time). My aim was to create a single data source, that would be restricted to the OB/GYNS data only, that I could then repoint the existing report to.

I created a new connection within the ‘slow’ workbook by connecting to the two files and relating them – the relationship of St to St should happen automagically 🙂

I then added Data Source Filters (right click on the data source > Edit Data Source Filters), and set the filters to restrict to the relevant Pri Spec and exclude certain states (which I found was applied when looking at the data source filters applied to one of the original data sources).

Next I repointed the workbook to use this new data source : right click on the original DAC_NationalDownloadableFile data source > Replace Data Source, and repoint to the DAC_NationalDownloadableFile.csv+ (Multiple Connections) data source

This caused an issue with the NPI field which was a string in the original and a number in the replacement.

Resolve this by right clicking on the NPI (1) field and select Replace References and point to the NPI field. This will then fix the issue with the Providers per Capita field too.

On the Providers per Capita sheet, remove both the pills from the Filter shelf and replace the Population field on the Tooltip shelf that is from the secondary data source (since there is an orange tick against it), with the Population field from the newly created data source (literally drag it and drop it onto the existing pill so it replaces it). You’ll need to format the Population field to millions with 1 dp.

Edit the Providers per Capita measure from

round(COUNTD([NPI])/(SUM([Sheet1 (State Regions)].[Population])/100000),0)

to

COUNTD([NPI])/(SUM([Population])/100000)

This pill is no longer referencing data from another data source. The ROUND isn’t necessary as this can be managed with the number formatting (so its one less calculation to be done and shaves a teeeeeny bit of time).

You can now remove the original 2 data sources (right click data source and close).

Now extract the remaining data source – right click > Extract Data. The data source filters you applied, will automatically be set to be extract filters. Also set the extract to Hide Unused Fields. The extract filters will reduce the number of rows in the data source; the hide unused fields will reduce the number of columns, all of which optimises the data source as much as it can be.

Change to the dashboard sheet now.

Remove the Primary Speciality filter as its no longer relevant (contains 1 value).

I chose to change the City filter to ‘All values in database‘ and to be a Wildcard Match input, so the user types in rather than selecting from a list. Using fields that have a high number of values as a filter and displaying all the possible values in the filter can impact performance. A separate query needs to be run to populate the list. Using ‘relevant values’ again causes a query to happen each time other selections are made with other filters/actions. This can also add overhead.

For the same reasoning, I also changed the NPI filter in the same way,

I also changed the Gender filter to be a Single Value List (ie radio button) entry, set to Include rather than Exclude values.

With these changes I ran the performance recording (Help menu -> settings and performance -> start performance recording), followed the required steps (select Texas, deselect Texas, set Gender to F, reset back to All), then stopped the performance recording ( Help menu -> settings and performance -> stop performance recording). This will automatically load a new workbook with the results.

The best I got to on my personal laptop is below – not the 0.5s suggested in the challenge, but a lot better than it was.

Running the same process against the same workbook on my more powerful work laptop, I got :

which suggest hardware does help 🙂

After a bit of discussion with another #WOW participant, it was suggested reordering the columns in the table (just reverse the order) should help. So I tried this, and shaved a bit more time (this is personal laptop, so compare to top image above).

I’m interested in seeing what other solutions and suggestions get shared and what Candra came up with herself.

Happy vizzin’! Stay Safe!

Donna

Generation Population

For week 17 of #WOW2021, Sean Miller decided to challenge us with recreating a chart by Nathan Yau (see here for the original). The aim was to recreate within 1 sheet, which I managed to do. So how did I do it? Read on 🙂

  • The groundwork
  • Colouring the bars
  • Adding the year labels
  • Final formatting

The groundwork

The chart itself follows a straight forward structure of multiple blue dimension fields on Columns with a green measure on Rows similar to this view below based on Superstore Sales data – it’s just formatted a bit more creatively!

The data provided just contains 3 fields : SEX, AGE and 2019 Population. We want to present the 2019 Population measure for the Total SEX only by Year. We need to create the Year field, which is simply

Year

2019-[AGE]

I dragged this into the ‘dimensions’ section of the data pane (above the line).

This allows us to create the basic bar chart required

We now need to define the various fields that we will need to add as additional dimensions on the Columns shelf to create the ‘generation’ data panes.

Generation

IF [Year]<= 1927 THEN ‘Greatest
Generation’
ELSEIF [Year]<= 1945 THEN ‘Silent Generation’
ELSEIF [Year]<=1964 THEN ‘Baby Boomer’
ELSEIF [Year] <= 1980 THEN ‘Generation X’
ELSEIF [Year]<=1996 THEN ‘Millennials’
ELSEIF [Year]<=2012 THEN ‘Generation Z’
ELSE ‘Gen
Alpha’
END

NOTE – there is a deliberate carriage return in the condition for ‘Greatest Generation’ and ‘Gen Alpha’ which will force the field to ‘wrap’ when displayed.

Having defined the above, we need to determine

Total Population Per Generation

{FIXED [Generation], [SEX]: SUM([2019 Population])}

and then

% of Total Population

SUM([Total Population Per Generation]) / TOTAL(SUM([Total Population Per Generation]))

NOTE – to create this field, I originally created a ‘quick table calculation’ against the Total Population Per Generation field which I’d displayed on a view, and then dragged the resulting pill into the measures pane to create the new field with the desired calc.

Let’s put these in a table, so we can then check the values, and see that the 2nd and 3rd columns are the same value for each row associated to a particular generation, which is what we need.

Right, so now we need to determine the rank based on the Total Population Per Generation

Rank

RANK_DENSE(SUM([Total Population Per Generation]))

Format this to a custom number with 0 decimal places, but prefixed with #

When added to the table we get

The intention, is that Rank will be displayed as discrete ‘header’ pill rather than a measure, so let’s move Rank to be the 1st pill on the Rows shelf and change to be discrete.

But we need the Total Population Per Generation and % of Total Population fields to be combined into a single pill. So we need to do a bit of string manipulation/ number formatting for this

Total | Percent

STR(ROUND(SUM([Total Population Per Generation])/1000000,1)) + ‘M’ + ‘ | ‘ + STR(ROUND([% of Total Population] * 100,1)) +’%’

This looks complicated, but its because even though you may have applied the relevant display number formatting against the individual numeric measures of Total Population Per Generation and % of Total Population, the formatting is not preserved, when converted into a string field, which this field needs to be. So the relevant calculations need to be applied within the field itself.

This outputs the below

Now we need a way to sort the data so the ‘Greatest Generation’ associated to the earliest years is listed first. I did this by determining the minimum date within each Generation.

Min Year Per Generation

{FIXED [Generation], [SEX]: MIN([Year])}

Add this into the view as the first pill in Rows, and the data should automatically sort from lowest to highest

We can now build the viz – duplicate the table sheet, remove Total Population Per Generation and % of Total Population from the Measure Values section. Drag 2019 Population to Columns, then click the swap rows & columns button :

Colouring the bars

The bars are coloured based on each ‘Generation’ pane. You could hardcode this along the lines of ‘Generation = x OR Generation = y or Generation = z etc’ where x, y and z etc are generations of the same colour. This would return true or false, which you can then add to the colour shelf and adjust accordingly.

I decided to be a bit more dynamic, deciding I wanted to set the colour based on whether it was an odd or even pane.

For this I created another ‘rank’ field based on the field I’d used to ‘sort’ the data, the Min Year Per Generation field.

Sort Position

RANK_DENSE(MIN([Min Year per Generation]),’asc’)

If you add this into the data table, you’ll see each section is numbered 1 -7

From this, we can then determine if the number is even (or not)

Sort Position is even number

[Sort Position]%2=0

Add this onto the Colour shelf which will return True or False and colour accordingly.

Adding the year labels

The year labels are achieved by using a dual axis chart, to plot a point for each specific year (based on the Min Year Per Generation field) at some arbitrary value.

Point to Plot Year Label

IF [Year]=[Min Year per Generation] AND [Year]<>1919 THEN 4700000 END

For each ‘min year’ that isn’t 1919, plot a value at 4.7M.

Add this field to the Rows shelf, change mark type to circle, reduce size to as small as it can, and set the colour transparency to 0.

Add Min Year Per Generation to the Label shelf, then change the alignment to vertical.

Now you make the chart dual axis and synchronise the axis. Some of the colours/marks may change, so reset by removing Measure Names from the Colour shelf and changing the mark types bar to bar & circle.

Final formatting

So at this point all the main components are there. It’s now a case of formatting – removing right and bottom axes, removing gridlines. The vertical dashed lines, are column dividers, set at the pane level only.

The solid left hand axis is set via

The text is formatted using the fonts advised in the requirements and sizes adjusted to suit.

Add on a tooltip, and set the background colour of the worksheet and you should be done.

My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you use Quick LoDs to recreate this view?

It was Lorna’s turn to set the challenge this week, and she took the opportunity to ask us to use a new feature in Tableau v2021.1 – Quick LoDs (you’ll obviously need v2021.1 to use the functionality, but the LoDs can be created manually in earlier versions if need be).

This blog will focus on

  • Creating the Sub-Category Average Sales by Category LoD using Quick LoDs
  • Formatting the difference calculation
  • Colouring the bars
  • Text for Tooltip
  • Putting it all together

The viz itself isn’t that complex once you’ve nailed the LoD, so lets’ start with that bit.

Creating the Sub-Category Average Sales by Category LoD using Quick LoDs

The what…? It’s a bit of a mouthful,.. the “Sub-Category Average Sales by Category”. What we’re essentially after here is the Total Sales per Category / No of Sub-Categories in the Category. To do this with LoDs, we first need to create an LoD to represent the total sales in each Sub-Category.

As mentioned in the requirements / referenced in the KB article above, you use a combination of ctrl/command click & drag to create an LoD via the Quick LoD feature. In this case I dragged the Sales measure onto the Sub-Category dimension, and this automatically created

Sales (Sub-Category)

{ FIXED [Sub-Category]: SUM([Sales]) }

which is the sales per sub-category.

From this, I then dragged this measure onto the Category dimension, which automatically then created

Sales (Sub-Category) (Category)

{ FIXED [Category]: SUM([Sales (Sub-Category)]) }

BUT I then edited this to change the aggregation to AVG, so the field became

{ FIXED [Category]: AVG([Sales (Sub-Category)]) }

This gives the average value required, which you can see is the same across the rows in a single Category :

Formatting the difference calculation

The viz displays the % difference between the Sub-Category sales and the average. This is calculated with

Difference

(SUM([Sales])- SUM([Sales (Sub-Category) (Category)])) / SUM([Sales (Sub-Category) (Category)])

This is then custom formatted as ▲0%;▼0% (I use this site to get my shapes from).

Colouring the bars

The bars need to be coloured based on the value of the Difference field, so another calculated field is required

Colour

[Difference]>0

This will just return true or false and can dropped on the Colour shelf of the bars.

Text for Tooltip

Within the tooltip, the % difference is displayed, along with some text which indicates if it is above or below the average. For this we need another calculated field to reference in the Tooltip.

Above | Below

IF [Difference]>0 THEN ‘Above’ ELSE ‘Below’ END

Putting it all together

With all the calculated fields built, the the chart itself is a relatively simple dual axis chart

  • Add Category then Sub-Category to Rows
  • Add Sales to Columns and sort descending
  • Add Sales (Sub-Category) (Category) to Columns. Make dual axis and synchronise axis
  • Click on the All marks card and remove Measure Names from the Colour shelf.
  • Click on the Sales marks card and change mark type to bar; add the Colour field to the Colour shelf and adjust accordingly; add Difference to the Label shelf and format appropriately.
  • Click on the Sales (Sub-Category) (Category) marks card and change mark type to Gantt; adjust the Size to be as large as possible; set the colour to the relevant grey.
  • On the All marks card, add the Above | Below field to the Tooltip shelf, then edit the Tooltip on the All marks card to create the required text.
  • Finally, remove axis, remove the column heading labels, remove grid lines and column borders and format the displayed text appropriately. Title the viz.

A very brief post today, but hopefully I’ve ticked off all the required elements, My published viz is here.

Happy vizzin’! Stay Safe!

Donna