Let’s go streaking!

It was Sean Miller’s turn to set the challenge this week, where the primary focus was to find the highest number of consecutive months where the monthly sales value was higher than the previous month.

This was a table calculations based challenge, and I always tackle these by building out the data required in a tabular format. The challenge was also reminiscent of a previous challenge Sean has set, which I’ve blogged about here, and admit I used as a reference myself.

So let’s get started.

To start with, we need the month date, the Sub-Category, the Sales value and the difference in Sales from the previous month. For the month date, I like to define this explicitly

Order Date Month

DATE(DATETRUNC(‘month’,[Order Date]))

This aligns all Order Dates to the 1st of the relevant month.

Add Sales Category, Order Date Month (set to discrete exact date blue pill), and Sales into a view, then set a Quick Table Calculation of Difference on the Sales pill

Edit the table calculation to compute by Order Date Month only, so the previous calculation restarts at each Sub-Category.

Then drag this pill from the marks card into the left hand data pane to ‘bake’ the calculated field into the data model. Name the field Sales Diff. The re-add Sales back into the view too, so you can double check the figures.

Identify whether there is an increase with the field

Diff is +ve

IF [Sales Diff]>0 THEN 1 ELSE 0 END

Add this into the view too, and verify the calculation is computing by Order Date Month only again.

Now we need to work out if the row matches the previous value

Match Prev Value

LOOKUP([Diff Is +ve],-1) = [Diff Is +ve]

The LOOKUP is looking at the previous row (identified by the -1) and comparing to the current. If they match then it returns True else False.

Again add into the view, and again double check the table calc settings. In this case there is nested calculations so you need to double check the settings against each calc referenced in the drop down

Now we need to work out when there are consecutive increases, and how many of them there are

Increase Streak

IF (NOT([Match Prev Value])) AND ([Diff Is +ve] = 1) THEN 1
ELSEIF [Diff Is +ve] = 1 THEN ([Diff Is +ve]+PREVIOUS_VALUE([Diff Is +ve]))

END

If the current row has a +ve difference and the previous row wasn’t +ve, then we’re at the start of an increase streak, so set to 1. Else, if the current row has a +ve difference then we must be on a consecutive increase, so add to the previous row, and this becomes a recursive calculation, so builds up the values..

Add this onto the view, set the table calc settings, and you can see how this is working…

So now we’ve identified the streaks in each Sub-Category, we just want the maximum value.

Longest Streak

WINDOW_MAX([Increase Streak])

Add this and set the table calc setting again. You’ll see the max value is spread across every row per Sub-Category.

Finally we need to identify Sales values in the months when the streak is at its highest.

Sales of Month with Longest Streak

IF [Longest Streak]=[Increase Streak] THEN SUM([Sales]) END

Add this into the view again (don’t forget those table calc settings), and you’ll notice that for some Sub-Categorys there are multiple points with the same max streak

With all this we can now build the viz, which is relatively straight forward….

Add Order Date Month (exact date, continuous green pill) to Columns, Sub-Category to Rows and Sales to Rows. Edit the Sales axis to be independent, then change the line type of the Path to stepped

Add Sales of Month with Longest Streak to Rows and set to dual axis, and synchronise. Make sure the mark type of the 2nd axis is set to circle, and remove Measure Names from the colour shelf of both marks.

Manually set the colour of the line chart to grey. Add Longest Streak to the Colour shelf of the circle marks card. Adjust the colour to use the green palette, set to stepped of 5 value and ensure the range starts at 0 and ends at 5 (don’t forget to edit the table calc settings!).

Now add Longest Streak as a discrete blue pill to the view too.

This is all the core components. The last thing we need to do is sort the list. I wasn’t entirely sure how it had been sorted, apart from the largest Longest Streak at the top. I created a new field for this

Sort

[Longest Streak]*-1

and added this as a blue discrete pill in front of Sub-Category….

…, then hid the column.

Then just apply the tooltip and relevant formatting on the chart.

For the legend, I created a new field

Legend

CASE [Sub-Category]
WHEN ‘Art’ THEN 0
WHEN ‘Chairs’ THEN 1
WHEN ‘Labels’ THEN 2
WHEN ‘Paper’ THEN 3
WHEN ‘Phones’ THEN 4
ELSE 5 END

and added this into a new sheet as below

The components then just need to be added to the dashboard. My published version is here.

Happy vizzin’! Stay Safe!

Donna

Can you rebuild the Olympic Schedule?

This week’s #WOW challenge was a joint one with the #PreppinData crew, with the intention to use the #PreppinData challenge to create the data set needed for the Tableau challenge. I completed the Prep challenge, but decided to use the output provided by the #PreppinData crew as the input to this challenge (just in case I had inadvertently ended up with discrepancies).

  • Sport Selector
  • Adjusting the time
  • The Schedule Viz
  • Event Counts in Tooltip
  • Event listing Viz in Tooltip
  • Other Sports bar chart Viz in Tooltip
  • Dashboard interactivity

Sport Selector

This is a simple chart that lists the Sport Groups. I chose to build a bar chart using MIN(1) on the Columns and Sport Group on Rows. The axis was then fixed from 0-1.

Create a set based on Sport Group and select a few values to be ‘in’ the set (eg Boxing, Gymnastics, Martial Arts).

Add the Sport Group Set to the Colour shelf to identify the selected sports. Adjust colours accordingly.

Adjusting the Time

Create a parameter pTimeAdjust which is an integer paramater, defaulted to 0 and ranges from -12 to +12. Set the step value to 1 as this will ensure when you add the parameter to the dashboard, the prev/next buttons can be displayed alongside the slider.

Create a calculated field to store the time of the event based on the ‘timezone’ selected via the above parameter

Date Time Adjust

DATEADD(‘hour’, [pTimeAdjust], [UK Date Time])

This field will be used to display the full event date & time on the event listing viz in tooltip, along with building the schedule viz itself.

Additionally, create a field based on the above, which just stores the day of the adjusted datetime field above

Day of Adjusted Date

DATE(DATETRUNC(‘day’,[Date Time Adjust]))

This field is needed to help with the filtering required for the viz in tooltips to display.

The Schedule Viz

Add Date Time Adjust set to the Month datepart (blue pill) to the Columns shelf, and alongside it add the same field set to the Day datepart (blue pill). On the Rows, add Sport Group and Sport. Add the Sport Group Set to the Filter shelf. This will give you the ‘bones’ of the schedule

In viewing the provided solution, there was a bit of a discrepancy between when a ‘medal’ icon should show or not, compared to the Medal Ceremony? field provided in the data. It transpired Lorna had made an adjustment, as there were some events that had a ‘final’, but did not include a gold medal or ceremony event.

So to try to match up with Lorna’s output, I too made adjustments, but I can’t guarantee it matches any published solution.

First up I identify the Victory Ceremony events

Is Victory Ceremony?

CONTAINS([Event],’Victory Ceremony’)

I chose to exclude all these events from the schedule, so this field is added to the Filter shelf and set to False.

I also identify the events which appear to be a ‘final’

Is Final?

CONTAINS([Event],’Gold Medal’) OR CONTAINS([Event],’ Final’)

This field will separate the events into two types. Change the Mark Type to Shape, then add this field onto that shelf. Set the shapes accordingly. Note – to add the medal shape, save the image Lorna provided to your machine, then follow these instructions so it’s available for selection.

I chose to add the Is Final? to the Size shelf too, so the shapes can be adjusted to something more suitable.

If you add the rows and columns dividers, you’ll notice the single circles aren’t centred. To resolve this, we’re going to need some axis.

Add MIN(1) to the Rows shelf (y-axis). This will give us some vertical headspace.

Now we need to manage the horizontal space, and ensure the marks don’t overlap each other. When there’s no finals, we want the circle to be plotted in the middle. When there’s both non-final and final ‘events’ we want the two marks to be off-centre, one to the left and one to the right.

We need some calculations to help with this.

#Events by Sport Per Day

{FIXED [Day of Adjusted Date], [Is Victory Ceremony?],[Sport]: COUNT([Event Schedule])}

This helps us count the number of of events per day for a specific sport

#Event Finals By Sport By Day

{FIXED [Day of Adjusted Date], [Is Victory Ceremony?],[Sport]: SUM(IIF([Is Final?],1,0))}

This basically helps us count the number of finals for each sport on a day.

With this we can build

X-Axis

IF [#Event Finals by Sport Per Day] =0 THEN 5
ELSEIF [#Event Finals by Sport Per Day]-[#Events by Sport Per Day] =0 THEN 5
ELSEIF [Is Final?] THEN 7
ELSE 3
END

If there’s no final, plot at 5, if there’s only a final, plot at 5 otherwise plot a final at 7 and a non-final at 3.

Add this to the Columns shelf (set to be a dimension ie not SUM), and edit the axis to be fixed from 0-10.

Events Count in Tooltip

I was also a bit puzzled by some of the numbers being displayed in the tooltip, so chose to compute and show the following 3 measures

  • Number of Events for Sport on that day (this is the #Events by Sport per Day already calculated)
  • Number of Event Finals for Sport on that day (this is the #Event Finals by Sport by Day already calculated)
  • Total Number of events for all other sports on that day (ie the selected sport is excluded from the count).

For this last measure we need

#Events per Day

{FIXED [Day of Adjusted Date], [Is Victory Ceremony?] : COUNT([Event Schedule])}

#Events per Day of Other Sports

SUM([# Events Per Day]) – SUM([#Events by Sport Per Day])

Pop all these on the Tooltip shelf and format appropriately.

You’ll also need to add the Day of Adjusted Date to the Tooltip. This should be set to exact date and discrete (blue pill).

Event Listing in Tooltip

Build out a data listing view of Sport Group, Sport, Date Time Adjust and add Event (I renamed the Event Split field) to the Text shelf. Add Day of Adjusted Date to the Detail shelf. Hide Sport Group and format.

On the schedule viz, add this worksheet to the tooltip, passing Sport and Day of Adjusted Date as filters on the string

Other Sports bar chart Viz in Tooltip

Once again this is a relatively simple chart to build out, with the Day of Adjusted Date field hidden in the display (but necessary for the VIT to filter properly).

However, this will display all sports, and we need this chart to not show the sport that has been initially selected (hovered).

Create a parameter pSportToExclude which is a string parameter. For the purpose of demonstration, enter the text Football.

Create a field

Excluded Sport?

[Sport]=[pSportToExclude]

Add this field to the Filter shelf and set to False, and the sport will disappear from the list

Add a reference to this sheet from the tooltip of the schedule viz, this time passing just Day of Adjusted Date as the filter.

Dashboard Interactivity

Hide / Show Sport Selector

When adding the Sport Selector sheet and the Schedule viz to the dashboard, you need to make sure they exist side by side in the same horizontal container.

Then, providing you are using v2021.2, you can set the Sport Selector object to Hide/Show. See this video for help.

Add remove sports

You will need 2 dashboard set actions for this. They should run on ‘menu’, and one will add items to the set, and the other remove

Set the selected sport to exclude

We’ll use a parameter action for this to run on hover and set the pSportToExclude parameter

Stop Sport Selector highlighting

Create a new field Dummy containing the text “Dummy” and add this to the Detail shelf of the Sport Selector viz.

The add a highlight action against this sheet only

Hopefully I’ve ticked off all the core elements here. There was a fair bit going on, and I’m conscious I’ve drafted this blog fairly quickly in comparison. My published viz is here .

Note, there are a couple of elements in my viz that I added which weren’t on the original solution. I’ve chosen not to include in the blog as the images/characters I chose to use didn’t render on Tableau Public. If you download the workbook, you’ll be able to see what my intention was.

I did also create an alternative view ‘heatmap’ style view as well which you can see here.

Happy vizzin’! Enjoy the Olympics! Stay Safe!

Donna

Can you build an app to visualise wildfires?

Ann Jackson’s husband Josh (@VizJosh) set the challenge this week, to build an ‘application’ to help visualise the scale of wildfires; that is when a fire is said to be 5000 acres, you can use the app to view how that compares to an area of the world you may know, so you can really appreciate just how large (or small) the fire is.

I have to hold my hand up here, and say that after reading the requirements several times, I was absolutely stumped as to where to start. We were provided with some ‘data’ to copy which consisted of 5 rows, which I duly copied and pasted into Desktop, but I then like ‘what now….?’ I knew I needed something geographic to build a map, but couldn’t understand the relevance of the 5 rows… I’ve said before I don’t use maps that often, so was unsure whether there was something I needed to do with this data. After staring at the screen for what seemed like an age, I ended up looking at the solution.

The data is just ‘dummy’ data and is just something to allow you to ‘connect’ Tableau to. You can’t build anything in Tableau without a data source. It could just have been 1 row with a column headed ‘Dummy’ and a value of 0. If it had been that, it might have been more obvious to me 🙂

  • Defining the parameters
  • Building the map
  • Apply button
  • Dashboard Actions

Defining the parameters

Ultimately the ‘data’ being used to build the viz is driven by parameters – the Location selector and the Latitude & Longitude inputs.

pLocation

An integer list parameter that stores values, but displays worded locations – wherever you choose. I opted for my hometown of Didcot in the UK alongside locations Josh had used, mainly so I could validate how the rest of the ‘app’ would work when I came to build it.

pLongitude

Float input, defaulted to the longitude of location 1 (ie Didcot) above.

I just googled Didcot Latitude and Longitude to find the relevant values

Note – Longitude W means an input of -1 * value. Similarly for Latitude S needs to be a negative input.

Then I created

pLatitude

Since we’re talking about parameters, there’s a couple more required, so lets create them now

Acres

Integer parameter defaulted to 5000

pZoom

Integer, list parameter with the values below, defaulted to 2.

Building the map

Now we have some lat & long values (in the pLatitude and pLongitude parameters), we can create some geographic data needed to build a map.

Location

MAKEPOINT([pLatitude], [pLongtitude])

This gives us the centre point which we want to build the ‘fire size’ buffer around. For this we need the calculation JOsh kindly provided :

Acres to Feet

SQRT(([Acres]*43560)/PI())

and then we can create the buffer

Fire Size

BUFFER([Location],[Acres to Feet],’ft’)

Double click on this field and it should automatically create you a ‘map’

Adjust the map ‘format’ via the Map > Map Layers menu option. I chose to set it to the dark style at 20% washout, then ticked various selections to give the details I needed (I added and removed options as I was testing against Josh’s version). I also set the colour of the mark via the Colour shelf to be pale red.

Also, as per the requirement, turn off the map options via Map > Map Options menu, and unchecking all the selections.

So this is the basic map, and you can input different lats & longs into the parameters to test things out.

Now we need to deal with the zoom requirement.

I wasn’t entirely sure about this, so had a bit of a search and found Jeffrey Shaffer’s blog post How to create a map zoom with buffer calculation in Tableau – bingo!

The zoom had to be x times the size of the circle on the map, so achieved by

Zoom

BUFFER([Location],[pZoom] * [Acres to Feet],’ft’)

Add this a map layer (drag field onto the map and drop onto the Add a Marks Layer section that displays)

This has generated a 2nd circle and consequently caused the background map to zoom out. We don’t want this circle to show, nor to be selected, so on the Colour shelf, set the Opacity to 0%, and the Border and Halo to None. To prevent the circle from showing when you hover your mouse on the map, you need to Disable Selection of the Zoom marks card

Apply Button

On a separate sheet, double click into the space below the Marks card, and type ‘Apply’ into the resulting ‘text pill’ that displays, and then press return.

This will create a blue pill, which you can then add to the Label/Text shelf. Align the text to be middle centre

This view is essentially going to act as your ‘Apply’ button on the dashboard. When it is clicked on, we want it to take the Lat & Long values associated to the place listed in the pLocation parameter, and update the pLatitude & pLongitude parameter values.

For this, we need a couple of extra calculated fields

Location Lat

CASE [pLocation]
WHEN 1 THEN 51.6080
WHEN 2 THEN 40.7812
WHEN 3 THEN 51.5007
WHEN 4 THEN 48.8584
END

Note – as before, all these values were worked out via Google as shown above.

Location Long

CASE [pLocation]
WHEN 1 THEN -1.2448
WHEN 2 THEN -73.9665
WHEN 3 THEN 0.1246
WHEN 4 THEN 2.2945
END

Add both these fields to the Detail shelf of the Apply sheet.

Dashboard Actions

When you add the 2 sheets to the dashboard, you then need to add parameter actions to set the values of the pLongitude & pLatitude parameters on click of the Apply button

Set Lat

A parameter action that runs on Select of the Apply sheet, setting the pLatitude parameter with the value from the Location Lat field.

You need another action Set Long which does a similar thing by passing the Location Long field into the pLongitude variable.

Finally, you don’t want the ‘Apply’ button to look ‘selected’ (ie highlighted pale blue) once clicked. Create calculated fields True = True and False = False and add both of these to the Detail shelf on your Apply button sheet.

Then add a dashboard filter action that uses Selected Fields and maps True to False

Hopefully, this should provide you with all the core features to get the functionality working as required. Ultimately once I got out of the starting blocks, it wasn’t too bad…

My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you find the top and bottom performers?

The challenge this week came from Candra McRae, where the focus was to use statistics to identify the top & bottom performers, rather than the more common ‘top n’ and ‘bottom n’. By statistics, we’re specifically looking records within the top 25th percentile and the bottom 25th percentile.

So let’s dive in.

  • Identify Current date
  • Defining the calculations
  • Building the chart
  • Month Selector and interaction

Identify Current date

The data we’re using is the Superstore Sales data from 2021.1 which includes data up to 31st Dec 2021. The requirement talks about the current rolling x months worth of data compared to the previous x months worth of data.

This means we need a way to determine what ‘current’ is. Typically, in a real sales environment, you’d probably only have data up to ‘today’, and I did consider working up a solution based on ‘today’, but equally I like to deliver a solution that I know matches the challenger, as it helps to validate my workings, and also I like to have a solution that I can look back on in the future and know there’s data.

So I took Candra’s hint and based ‘current’ off of the maximum date in the data set, derived by

Max Date

DATE({FIXED:MAX([Order Date])})

Defining the calculations

If you’re a regular reader of my blogs, you’ll know when I can, I like to build the data out into a tabular form, so I can verify the calculations, and then I’ll build out the viz.

First up, we want to get a value for the ‘current rolling n months’.

To define ‘n’ we need a parameter.

pRollingMonth

An integer defaulted to 12. It doesn’t need to be a list, as this will be populated via a parameter action from another sheet – more on that later.

Current Rolling Sales

IF [Order Date]>= DATEADD(‘month’, -1 * [pRollingMonth] ,DATETRUNC(‘month’,[Max Date])) AND [Order Date] <= [Max Date] THEN [Sales] END

let’s break this down… DATETRUNC(‘month’,[Max Date]) truncates the Max Date which is 31st Dec 2021 to the 1st of the month ie it returns 01 Dec 2021.

DATEADD(‘month’, -1 * [pRollingMonth] ,DATETRUNC(‘month’,[Max Date])) , is then going back to the 12 months prior (-1×12=-12) , so is 01 Dec 2020.

So we’re only going to get a sales value if the Order Date is >= 01 Dec 2020 and <= 31 Dec 2021 (essentially 13 months of sales data).

For the previous sales, we first need

Prev Month

DATEADD(‘month’, -1, [Max Date])

so in our current example, this will be 30 Nov 2021.

and then to get the previous rolling 12 month sales, we can apply similar logic using Prev Month instead of Max Date

Previous Rolling Sales

IF [Order Date]>= DATEADD(‘month’, -1 * [pRollingMonth] ,DATETRUNC(‘month’,[Prev Month])) AND [Order Date] <= [Prev Month] THEN [Sales] END

Both these fields can be formatted to 1 decimal place, $ prefix and format in thousands (k).

And then we also need a difference to display on the tooltip

Difference

SUM([Current Rolling Sales]) – SUM([Previous Rolling Sales])

This needs to be additionally formatted so that negatives are displayed in brackets ().

Add all these into a table and sort by the Current Rolling Sales descending

So we’ve got the data needed for the bar, the line and the tooltip. We now need to work on the crux of the challenge – the calculations needed to identify the top & bottom.

We’re looking to identify the 25th percentile value based on Current Rolling Sales values displayed on screen

25th Percentile

WINDOW_PERCENTILE(SUM([Current Rolling Sales]),0.25)

and also we need the 75th percentile

75th Percentile

WINDOW_PERCENTILE(SUM([Current Rolling Sales]),0.25)

If you pop these table calcs into the table, you’ll see the values for each field are the same for each row

and with these we can now identify where each row falls

Colour

IF SUM([Current Rolling Sales]) >= [75th Percentile] THEN ‘Top’
ELSEIF SUM([Current Rolling Sales]) <= [25th Percentile] THEN ‘Bottom’
ELSE ‘Middle’
END

Finally we need to identify the rows with a negative difference and flag with a circle.

Sales Contraction Indicator

IF [Difference]<0 THEN ‘●’ ELSE ” END

I use this site to get the symbols for these types of requirements,

Pop these two fields in to the table, and you’ve got all the data needed to build the chart:

Building the chart

Candra states that we can’t use a reference line to display the previous sales data, so for the core chart we need to build a dual axis chart plotting Sub-Category against Current Rolling Sales (bar chart) and Previous Rolling Sales (gantt chart).

Current Rolling Sales is coloured by Colour. I created a Label:Current Rolling Sales field just based on Current Rolling Sales but formatted to 0dp to add to the Label shelf.

To get the circles displayed, and to retain the order of the display, duplicate the Sub-Category field so you have a Sub-Category (copy) field. Add this to Rows alongside the existing Sub-Category field.

Then add the Sales Contraction Indicator field between these 2 fields, and format the font of that field so it is in red text (getting a coloured circle, was the part of this challenge I struggled most over, yet it really was very simple once the penny dropped!).

Then hide the first Sub-Category field (uncheck Show Header) so it no longer displays.

Apply various formatting to remove the row & column lines, gridlines etc, and adjust the tooltip and you should be done.

Month Selector and Interaction

A separate sheet is needed for this. We need to build a basic viz that has 12 data points with values 1-12. And we can get this from the Order Date field

Month Order Date

MONTH([Order Date])

will return the month number

Add this field as a discrete (blue) pill to the Columns shelf, set the mark type to square, and add Month Order Date to the Label shelf too. Colour the mark pale grey, and remove all borders etc, and hide the headers

When you add the 2 sheets onto the dashboard, you need to set a parameter action from the Month Selector sheet that sets the pRollingMonth parameter, using the value from the Month Order Date field. When unselecting, the value should default back to 12.

Hopefully there’s enough here to get you to the end! My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Profitability with Dual Axis Charts

Luke Stanke returned for this week’s challenge, to build a pareto chart & bar chart on an unsynchronised dual axis. The crux of this challenge is table calculations, so as with any challenge like this, I’m going to build out what I need in tabular form first, so I can thoroughly validate I’m getting the right values. Once that is done, I’ll build the chart, then finally I’ll look at how to get the measures needed for the subtitle text.

  • Defining the core calculations
  • Building the chart
  • Working out the measures for the subtitle

Defining the core calculations

For the pareto, we need to plot % of orders against cumulative profit, so we need to build up some fields to get to these.

Add Order ID to Rows and Profit to Text and sort by Profit descending.

For the cumulative profit, we can add a Running Total Quick Table Calculation to the Profit pill

Add another Profit pill back into the view, and you can see how the table calculation is adding up the values of the Profit from the previous rows.

The triangle symbol indicates the field is a table calculation. By default, if you edit the table calculation, the calculation is computing down the table. I always choose to ‘fix’ how my calculations are computing, so that the values don’t inadvertently change if I move the pill elsewhere. So I recommend you set the table calc to Compute Using Order ID

I also want to ‘bake’ this table calculation into the data model (ie create a dedicated calculated field) that I can pick up and reuse. The simplest way to do this is to press Ctrl, then drag the field into the left hand data field pane (this will effectively copy the field rather than remove it from the view). Name the field and then you can verify it’s contents.

Cumulative Profit

RUNNING_SUM(SUM([Profit]))

So that’s one of the measures we need. Onto the next.

First of all we need to get a cumulative count of the number of orders.

Count Orders

COUNTD([Order ID])

Add this to the measures and it will display the value 1 per row (since each row is an order). Add a Running Total Quick Table Calculation to this field too, and again set to Compute Using Order ID. ‘Bake’ this into the data model too, by dragging the field as described above, and create a new field

Cumulative Order Count

RUNNING_SUM([Count Orders])

Now we need to get a handle on the total number of orders. I could do this with a LoD, but will stick with table calcs

Total Order Count

WINDOW_SUM([Count Orders])

Add to the view, and compute using Order ID again.

Now we can calculate the cumulative % of total orders

Cumulative % of Total Orders

[Cumulative Order Count]/[Total Order Count]

Format this to a % with 2 dp.

Add to the view and again compute using Order ID. You should see the values increase until 100%.

NOTE – I could have got this value by adding a Running Total table calculation to the Order Count field, and then editing that table calculation and adding a secondary table calculation to get to the % of total. However, I want to be able to reference the output of this field later on, so having a dedicated calculated field is the better option.

Ok, so now we have the 2 measures we need to plot the basic chart – Cumulative Profit and Cumulative % of Total Orders.

Building the chart

I typically start by duplicating the data sheet and then moving pills around

  • Duplicate Sheet
  • Remove Cumulative Order Count and Total Order Count
  • Move Order ID to the Detail shelf. Reset the sort on this pill to sort by Profit Descending
  • Remove Measure Names
  • Move Cumulative Profit to Rows
  • Move Cumulative % of Total Orders to Columns
  • Move Profit to Tooltip
  • Change mark type to Line
  • Add Sales to Tooltip and adjust tooltip accordingly

The chart needs to be coloured based on whether the marks has a profit > 0 or not. So for this we need

Profit is +ve

SUM([Profit]) >0

Add this to the Colour shelf and adjust accordingly.

Now we can add the second axis by adding Sales to the Rows shelf, then

  • Change mark type of the Sales marks card to bar
  • Remove the Profit is +ve field from the Colour shelf
  • Change the size to the smallest value
  • Adjust the tooltip
  • Make dual axis
  • Bring the Cumulative Profit axis to the front (right click on the axis > move marks to front)

Now the chart just needs to be formatted

  • remove column and row borders
  • edit the axis titles
  • format all the axes to to 8pt, and change the font of the axis title to Times New Roman
  • format the % of Total Orders axis to be 0dp

Working out the measures for the subtitle

For this, we are going to revert back to the tabular view.

We need to identify the point at which the Profit value starts to become negative. Let’s add the Profit is +ve field to Rows.

We’re looking for the row highlighted, which is the row where the previous value is true, while itself is false, which is achieved by

Profitable Marker

LOOKUP([Profit is +ve],-1) AND NOT([Profit is +ve])

Let’s add this now (ensuring the compute using Order ID)

We need to get a handle on the Cumulative % of Total Orders value for this row, but spread it across all the rows in data set, which we can do by

% of Total Profitable

WINDOW_MAX(IF [Profitable Marker] THEN [Cumulative % of Total Orders] END)

Add this on, compute by Order ID, and you can see the value for the ‘true’ line is displayed against every row. Format this field to % 0 dp.

For the potential profitability decrease, we need to get the Cumulative Profit value for the Profitable Marker row, along with the final (total) Cumulative Profit value.

Total Cumulative Profit

WINDOW_MAX(IF LAST()=0 THEN [Cumulative Profit] END)

This takes the value from the very last row in the data and again spreads across the all the rows.

With this, we can now work out the potential decrease

Potential Profitability Decrease

WINDOW_MAX(IF [Profitable Marker] THEN ([Cumulative Profit]-[Total Cumulative Profit])/[Cumulative Profit] END)

of the profitable marker row, take the difference between the ‘current’ cumulative profit and the final cumulative profit, as a proportion of the current value. Spread this across every row. Format to % of 0dp.

Now, as we have worked out these 2 values, % of Total Profitable and Potential Profitability Decrease to be the same across every row, you can add them to the Detail shelf of the All marks card on the chart viz, and reference them in the Title of the viz. (Don’t forget to ensure all table calc fields are set to compute using Order ID).

My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you make Spine Charts?

Sean Miller provided the challenge for this week, resurrecting a challenge originally set by Emma Whyte in 2017. Revisiting these older challenges is great fun, as often newer product features provide a different way of solving. For me, I also like the fact I know I’ve already solved it once, and have my own work to reference if I get stuck – ha ha!

Sean hinted that this wasn’t a challenge to ‘overthink’ – no table calcs or LoDs required. You need to be able to display average responses per question per university alongside the overall average response for the question. Simply filtering by university isn’t going to cut it, as the quick filter will immediately eliminate all the data that isn’t associated to the selected university, which means you can’t compute an ‘overall average’ without using LoDs.

The key to this challenge is to use a parameter to drive the University selection. Create this by right clicking on the University field -> Create > Parameter. This will create the parameter dialog box, prepopulated with all the university values. Set the default to University of Liverpool.

pUniversity

With this, we can now create calculated fields to store the values associated to the selected university only.

Sample Size

AVG(IF [University]=[pUniversity ]THEN [Sample Size] END)

Note 1, there is already field called Sample Size in the data set. The actual name of this field is <space>Sample Size<space> which Tableau sees as a different name. In hindsight I should have just renamed the original field, so I could then have ‘Sample Size‘. Be mindful of this when I refer to the field later; unless I call it out, I’m referring to my version.

Note 2, I chose to apply the AVG aggregation within the calc rather than changing the default aggregation on the pill when added to the view. There was a reason I did this, but I can’t recall what it was, and think it wasn’t necessary in the end….

University Avg

AVG(IF [University] = [pUniversity ] THEN [% Agree] END)

formatted to percentage, 1 dp

We can also then define the overall average for comparison

Overall Avg

AVG([% Agree])

formatted to percentage, 1 dp

and with that can calculate the variance between the two

Delta

([University Avg]) – ([Overall Avg])

This is custom formatted to 0.00%▲;0.00%▼ (I use this site to get the arrow characters)

And then we need a field to define how the mark needs to be coloured

Colour

[Delta]>=0

We can put these all out in a view

  • Question Number (which I renamed to No) on Rows
  • Question Text on Rows
  • Sample Size on Rows (set to be a discrete blue pill)
  • University Avg on Rows (set to be discrete)
  • Overall Avg on Rows (set to be discrete)
  • Delta on Rows (set to be discrete)
  • University Avg on Columns (continuous green pill)
  • Change Mark Type to Circle
  • Add Colour to the Colour shelf and adjust

Now we need to work on adding the various lines and bands on the chart. This is all managed by adding reference lines (or bands).

Drag % Agree to the Detail shelf, and change to be AVG.

The drag the same field % Agree on Detail again, this time change to MIN. Repeat again, and change to MAX.

Right click on the University Avg axis > Add Reference Line. Create a line, per pane using the AVG(% Agree) field.

Add another reference line (by right clicking on the axis again). This time create a band that starts at MIN(% Agree) and ends at MAX(% Agree). Set the Fill colour to light grey.

We need to create some new fields for the quartile values.

Lower Quartile

PERCENTILE([% Agree],0.25)

Upper Quartile

PERCENTILE([% Agree],0.75)

Add both these fields to the Detail shelf again, then add another reference line (band) similar to that above, but referencing the quartile fields. Set the Fill colour to be a darker grey.

Adjust the formatting and set the tooltips and you’ve got the main chart…. well almost…

In the solution, the first column, the question no, is not labelled. I couldn’t figure out how to do this, which is why I relabelled to simply No. I tried various things, including using text boxes as column headings on the dashboard, but the layout just didn’t work.

BUT I’ve now found out how to do it… because I googled, and I didn’t yesterday when I was building 😦 Andy Kriebel explains it all here. He searches for a ‘zero width space’ character on this site , and then copies the resulting ‘character image’ displayed and pastes into the label of a calculated field. Watch the video to see it in action, but I’ve noted the steps here, just as much for my own benefit when I can’t remember what to do in future… I can see this type of feature cropping up often 🙂

The legend utilises a lot of the concepts above, but we don’t what the mark changing with each university selection. So let’s just hardcode

Legend Avg

AVG(IF [University] = ‘Middlesex University’ THEN [% Agree] END)

and we’ll need a dedicated field for the colour

Legend Colour

[Legend Avg] – [Overall Avg] >=0

The legend sheet can then be built just by plotting the Legend Avg pill on the Columns shelf, with a mark type of circle, the Legend Colour on the Colour shelf, and the same pills used in the reference lines above on the Detail shelf.

When adding the reference lines and bands this time, you will need to add labels and format their position.

The quartile band, also has dotted lines indicating the end of the band, which you can apply as part of the band properties. However the quartile band also has one label left aligned, while the other is right aligned. For a single reference band, the labels can either both be formatted left aligned or both right aligned. To resolve this, don’t add a label to the ‘band to’ section. Create another reference line for the Upper Quartile value, and you can then format the label of this independently.

My published via associated to this challenge is here.

My version based on the original challenge from 2017 is here. The requirement was a bit more complicated it would seem, and it looks like I utilised FIXED LODs quite heavily.

Happy vizzin’! Stay Safe!

Donna

Can you visualise the Cholera Outbreak?

Lorna delivered an exciting map based challenge this week, to recreate one of the most famous visualisations ever created. If you’re part of the data viz community and never heard of John Snow and his cholera map, then I suggest you go and google now 🙂

While I love map based challenges, as its an area I don’t get to use regularly, they’re also the ones that can give me the most frustrations/take the longest to complete, as the functions and concepts I need to use don’t come as readily as with other challenges. I had to refer back to my own blogs on previous map based challenges (specifically this one and this one), to help me out. However these were written before map layers were introduced, so not everything was applicable.

  • Modelling the data
  • Building the Avg Distance to Pump bar chart
  • Building the Total Deaths within Avg Distance bar chart
  • Building Map
  • Adding the dashboard interactivity

Modelling the data

3 sets of data was provided

  • Pumps – 1 row per pump with its location
  • Deaths – 1 row per death and its location (can be multiple rows for the same location)
  • Deaths aggregated – 1 row per location where at least 1 death was recorded, including the count of deaths at that location

This all needed to be ‘combined’ so we can work across all 3 sets of data and compare the locations. I used relationship calculations to create a relationship of 1=1 between the related data sets, which means all the aggregated deaths are mapped to each pump, and all the individual deaths are mapped to each pump too.

Building the Avg Distance to Pump bar chart

I chose to start building the bar charts first so I could get better acquainted with how the data was working together, and I knew the maps would take more effort.

To start we need to figure out the distance from each pump to each death, so we need to get the location of each pump, and the location of each death

Pump Location

MAKEPOINT([Pump Lat],[Pump Lon])

Individual Death Location

MAKEPOINT([Death Lat],[Death Lon])

These will both create calculated fields of a geographic data type as indicated by the globe icon to the left of the field.

And with these fields, we can then work out the distance between them in metres

Distance Pump to Death

DISTANCE([Pump Location], [Individual Death Location],’m’)

Now we want the average distance per pump

Avg Distance

{FIXED [Pump ID]: AVG([Distance Pump to Death])}

I formatted this to 0 dp and added a suffix of ‘m’

And with this, we can build the first bar chart – Pump ID on Rows and Avg Distance on Columns. You need to apply formatting

  • Change the font – I set the font to Times New Roman at the Workbook level, to make life easier (Format menu > Workbook).
  • Set the worksheet background colour. I used #f5f1f0
  • Add a black border to each bar (via the Colour shelf)
  • Show labels on bars (via Label shelf)
  • Hide Avg Distance axis (uncheck show header on the Avg Distance pill)
  • Remove all row & column borders
  • Set the Axes Ruler on Rows to black
  • Set the Tooltips

A parameter is needed to store the selected pump

pSelectedPump – an integer defaulted to 1

With this, we can then create a field to indicate which pump is selected

Is Selected Pump?

[Pump ID]=[pSelectedPump]

Add this field to the Colour shelf and set the True option to black and the false option to match the background colour you used.

Building the Total Deaths within Avg Distance bar chart

We now need to identify if the death occurred within the average distance that has been calculated.

Number Deaths within Avg Distance

SUM(INT([Distance Pump to Death] <= [Avg Distance]))

This is a bit of short hand instead of writing IF [Distance Pump to Death] <= [Avg Distance] THEN 1 ELSE 0, and summing up the result. It relies on the the fact that the equation [Distance Pump to Death] <= [Avg Distance] returns a boolean of true or false, which can be converted to an integer as true =1 and 0 = false.

At a row level, the field just returns 1 or 0, but just plot this field against Pump ID, you get the desired count

And the easiest way to build this bar chart, is to duplicate the one created above (on the worksheet tab, right click & duplicate sheet), then replace the Avg Distance pill with the Number Deaths within Avg Distance pill (simply drag the latter from the data pane and drop directly onto the former in the columns shelf). Then adjust tooltips etc to suit. Doing it this way preserves (most of) the formatting.

Building the Map

We’re going to use 4 layers in creating the map, and I’m documenting from the bottom upwards (although this isn’t the order in which I actually built the viz in practice – I shuffled things round afterwards).

We’ll start with the density layer – the blue coloured area on the map.

On a new sheet, add Individual Death Location and Death ID to the Detail shelf. This should automatically create a map view with the auto generated lat & long fields. Then

  • Change the mark type to Density
  • Change the colour to Density Blue-Teal Light
  • Alter the Intensity to 75%, the Opacity to 60%
  • Increase the Size to maximum
  • Delete the info in the tooltip

We need to set the background map

Via the Map > Map Layers menu option, set the Style to Normal and uncheck all options and set Streets, Highways, Routes

Then set the worksheet background to the colour we’ve used before

Onto Layer 2 – the pump buffer circle.

We’ve already identified the selected pump, but now we need to get it’s location, and for that we need the lat and long of the selected pump.

Selected Pump Lat

{FIXED: MIN(IIF([Is Selected Pump?],[Pump Lat],NULL))}

This essentially gets the latitude of the selected pump and sets that value across all the rows in the data.

Selected Pump Long

{FIXED: MIN(IIF([Is Selected Pump?],[Pump Lon],NULL))}

And with these we can get

Selected Pump Location

MAKEPOINT([Selected Pump Lat],[Selected Pump Long])

We also need to get the average distance of the selected pump spread across every row

Selected Pump Avg Distance

{FIXED :SUM( IF [Is Selected Pump?] THEN [Avg Distance] END)}

Now we have this, we can build the buffer field

Pump Buffer

BUFFER([Selected Pump Location],[Selected Pump Avg Distance],’m’)

Drag this Pump Buffer field onto the map and drop it onto the ‘Add a Marks Layer’ section that should appear. This will create an additional marks card.

Set the colour to the same background colour previously used, and drop the opacity to around 5%. Add Pump ID to Detail and Selected Pump Avg Distance to the Tooltip shelf. Adjust tooltip to match.

Onto Layer 3 – the pumps

Drag Pump Location onto the map and Add a Marks Layer.

Add Pump ID to Text and Location to Tooltip to the relevant marks card. Change mark type to circle, increase the size and change colour to red and add a dark grey border. Adjust text to be centred, and set the tooltips accordingly.

Onto the final layer now, the aggregated death locations. We need a further location field

Death Location

MAKEPOINT([DeathLat (deaths aggregate.csv)], [DeathLon (deaths aggregate.csv)])

Drag this onto the map to Add a Marks Layer then on the subsequent marks card add Loc ID to the Detail shelf. Change the mark type to circle, and add Deaths to the Size shelf – adjust accordingly. Change the colour to black with a white border. Adjust tooltip.

Adding the dashboard interactivity

Add the 3 sheets onto a new dashboard sheet. Create a new dashboard action to Change Parameter, that runs on select of any of the 3 charts, and targets the pSelectedPump parameter by setting it with the value from the Pump ID field.

Now if you select a pump on the map, or a bar on either of the bar charts, all the charts will reflect the selection made.

The other interactvity you may notice, is that on click, of the bars, or the pumps, the other data ‘fades’ (or the selected data is highlighted). We don’t want this.

To resolve, create a calculated field

True

True

and another

False

False

Add both these fields to the Detail shelf on both bar chart sheets and to the Detail shelf on the Pumps marks card.

Then on the dashboard, create a dashboard filter action which filters on selected fields setting True = False (see below). You’ll need to create one of these actions for each of the 3 sheets.

And hopefully, you now have a working viz. My published version is here.

Happy vizzin’! Stay Safe!

Donna

Sales Goal Selection Tool

Ann Jackson set this week’s #WOW2021 challenge, based on a recent ‘real world’ situation she had encountered.

Analysing Ann’s solution (by interacting with her published solution), I deduced we’d need to use set actions to add and remove the selected Sub-Categories into and out of the set (I hadn’t noticed Ann had tagged the challenge on the main page with Set Actions 🙂 ). I also realised the initial visual in the first column, wasn’t using reference lines to depict the target, as the tooltip displayed on hover, was much more detailed than what you can add to a reference line tooltip.

So armed with this knowledge, I set about building what was required.

  • Defining the required calculations
  • Building the BANs
  • Building the viz
  • Adding the interactivity

Defining the required calculations

This is one of those challenges where I want to get all the calcs sorted up front in a tabular view, before even attempting the viz. I’ll go through what I ended up with, but be assured this did take a bit of time and change of direction to get what I needed.

Let’s start with parameters. Firstly I created a parameter to simulate ‘today’, that is hardcoded to 9th June 2021

I then created a parameter to store the % uplift we want to use for one of the goal options. I default this to 0.1 (ie 10%)

We need to work out the sales so far this year (ie the sales in 2021 up to and including 9th June 2021), and the equivalent sales for the previous year (ie the sales in 2020 up to and including 9th June 2020).

SALES YTD by Sub Cat

IF YEAR([Order Date]) = YEAR([pToday]) AND [Order Date] <= [pToday] THEN [Sales] END

SALES LY YTD by Sub Cat

IF YEAR([Order Date]) = YEAR([pToday])-1 AND [Order Date] <= DATEADD(‘year’,-1,[pToday]) THEN [Sales] END

Let’s pop these into a table with the Category and Sub-Category fields, and add grand totals, so we can see what figures we need to be aiming for the BANs at the top.

Now let’s work out the values for the goals.

One goal is based on finding the average sales per day in 2021, then extrapolating this across the whole year.

So firstly, we need to know how many days in the year up to ‘today’.

# Days So Far This Year

DATEDIFF(‘day’,DATETRUNC(‘year’,[pToday]), [pToday] ) +1

This finds the number of days between 01 Jan 2021 and 09 June 2021 (then adds 1, as we need to include 9th June too).

# Days in Year

DATEDIFF(‘day’,DATETRUNC(‘year’,[pToday]), DATEADD(‘year’,1,DATETRUNC(‘year’,[pToday])))

This finds the number of days between 01 Jan 2021 and 01 Jan 2022.

These methods ensure the right number of days is recorded if the ‘current’ year happens to be a leap year.

We can now work out one of the goals

Same Pace Goal

(SUM([SALES YTD by Sub Cat])/[# Days So Far This Year]) * [# Days In Year]

Add up the Sales so far this year and divide by the number of days so far to get the average sales value per day, then multiply by the total number of days in the year.

For the other goal, we need to determine the sales for the whole of the previous year, and multiple by the uplift %.

SALES LY

IF YEAR([Order Date]) = YEAR([pToday])-1 THEN [Sales] END

This gives the total sales for 2020.

LY + Percent Increase

SUM([SALES LY])*(1+[pPercentIncrease])

This applies the % increase parameter to the total sales for 2020.

Now we want to decide which of the goal values to use based on ‘selection’. To define the selected Sub-Categories, we’re going to use a set. Right-click on Sub-Category > Create > Set, name accordingly and select ‘Phones’ as the initial value in the set.

Use Last Year With Increase

Now we can use whether the record is in or out of the set in the logic to determine the goal to use

YEAR_END GOAL per Sub-Cat

IF ATTR([Use Last Year With Increase]) then [LY + Percent Increase Goal] ELSE [Same Pace Goal] END

Add the set and the goal field to the table, and you’ll see the value in the final column is matching the relevant previous columns, depending on whether the Sub-Category is IN or OUT of the set.

NOTE – when you do this, you’ll get In or Out displayed against each row. To get the description as I’ve got displayed, right click on the text In or Out and Edit Alias.

These data fields are going to be used to build out the central viz. There’s a couple of other fields we need to finish off the data requirements for the viz

Value to Goal

[YEAR-END GOAL Per Sub Cat]-SUM([SALES YTD by Sub Cat])

and

% of Goal

SUM([SALES YTD by Sub Cat])/[YEAR-END GOAL Per Sub Cat]

This field needs to be formatted to percentage with 0 dp (all other monetary fields need to be formatted to $ with 0 dp).

So now we have the core fields stored against each row that will help us build out the main viz. You can edit the set and add further sub-categories, so you can validate how the values change.

Now we want to work on the data we need for the BANs. You may think the grand totals sum up all the rows above, however while this works fine for the SALES YTD by Sub Cat and SALES LY YTD by Sub Cat columns, it isn’t the case for the YEAR-END GOAL by Sub Cat column.

If we select all the rows in that column, and examine the tooltip that appears on hover, the total of the selected columns differs from the total at the bottom of the column.

The value in the hover text is what we need. The discrepancy with the column grand total is because it’s working across the whole data and not row by row. Because at least 1 record is in the set, its taking the logic to use the same pace goal. You’ll notice the total of this column matches the total of the Same Pace Goal column. In fact as you add more values to the set, the total will match the Same Pace Goal total up until all values are in the set. At that point the total will match the LY + Percent Increase column.

So back to the summary measures. We’re going to use table calcs to solve this.

SALES YTD

WINDOW_SUM(SUM([SALES YTD by Sub Cat]))

SALES LY YTD

WINDOW_SUM(SUM([SALES LY YTD by Sub Cat]))

YEAR-END GOAL

WINDOW_SUM([YEAR-END GOAL Per Sub Cat])

These are all basically summing up the values displayed in the rows on the screen. Add them to the table, and you can see the same values are displayed on each row which tally to other data in the table

Right, we have all the data fields we need, let’s start building.

Building the BANs

On a new sheet, add the fields as below

We want to turn this into 1 row.

Create a field called

Index

INDEX()

and add this to the Rows shelf and change to be discrete (blue pill). Each row should be numbered from 1 to 17.

Now drag the Index field from the Rows shelf onto the Filter shelf and when prompted select 1.

You’ve now got 1 row displayed, but the data associated to all the rows is being included in the calcs (if you’d filtered just to Sub Category = Accessories for example, the date would be just related to the rows with the Accessories value).

Now we can tidy this to look how we want

  • Add Measure Names to Text shelf (change display to entire view if need be).
  • Hide Sub-Category from displaying (uncheck Show Header)
  • Right Click on each column title and Edit Alias to remove the ‘along…’ text
  • Format the text appropriately and align centrally
  • Adjust the tooltip to remove the Sub-Category info
  • Hide the Measure Names from displaying (uncheck Show Header).
  • Format the Row Dividers to be thick

Building the Viz

Add Category and Sub-Category to the Rows shelf on a new sheet. Change the sort of the Category field to sort by data source order, descending. Change the sort of the Sub-Category field to sort as below

Add SALES YTD by Sub Cat to Columns, then drag SALES LY YTD by Sub Cat onto the canvas and drop onto the SALES YTD by Sub Cat axis when you see the 2 columns appear

Drag Measure Names from the Rows shelf to the Size shelf. Add Measure Names to the Colour shelf. Adjust sizes and colours accordingly. Add a white border to the bars on the colour shelf.

Turn stack marks off (Analysis > Stack Marks > Off) to allow the bars to overlay each other.

Add both SALES YTD by Sub Cat and SALES LY YTD by Sub Cat to the Tooltip shelf, so you can reference both values regardless which bar you’re hovering over.

Add YEAR-END GOAL by Sub Cat to the Columns shelf and set to dual axis and synchronise axis. Reset the mark type of the Measure Values card to bar, and of the YEAR-END GOAL by Sub Cat card to Gantt.

Remove the Measure Names from the Size and Colour on the YEAR-END GOAL by Sub Cat card. Set the colour of this mark to black.

On the All marks card, add YEAR-END GOAL by Sub Cat to the Tooltip shelf so once again it’s value can be referenced by all marks.

Adjust the tooltip on the All marks card to match.

‘Type in’ MIN(0) to the Columns shelf (double click in the space next to the pills to enable the text edit feature). On this MIN(0) marks card, change mark type to Text and add Value to Goal to the Text shelf. Adjust the text to include the additional wording and adjust size too if need be. Adjust tooltip too.

Repeat this process for the % of Goal field too.

For the circles, create another MIN(0) column in the same way, but this time change mark type to circle, and add Use Last Year with Increase set to the Colour shelf and adjust. Adjust tooltips.

Format axes/gridlines/text/row banding accordingly and rotate text of the Category field.

Adding the interactivity

Add the sheets to a dashboard. Add a set action to add Sub-Categories to the set on click (Dashboard > Action > Add Action > Change Set Values). Set the action to run on select (on click) of the chart viz, to add values to the Use Last Year with Increase set.

We then need another set action to remove the Sub-Categories, but this needs to work by clicking a link in the tooltip which displays. The name of this action will display on the tooltip.

And fingers crossed.. that should be it! My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you structure the unstructured?

As soon as I saw that Candra’s challenge for this week was going to involve Regular Expressions (RegEx), I gave a little groan. RegEx just isn’t my thing 😦 I only ever seem to use them for these challenges, and not in my working life, so have minimal experience. I always think I should focus some time on learning them properly, but other things just end up taking priority. Ho Hum…

So most of my time was spent trying to wrangle the info I needed to identify ‘how many bedrooms’ each property had. I did a bit of googling to try to find the right expressions I think I needed, used the regex101 site to test my expression to find certain patterns of text against some of the data in the Description field, and then tried to plug that into a calculated field in Tableau to extract the data I needed.

But I couldn’t get it to work 😦 I could find matching text using the REGEXP_MATCH function, but when I then tried to use the REXP_EXTRACT functions I couldn’t get anything out…

So I ended up having to look at the solutions that had already been published by the time I started, Candra’s, Lorna Brown’s and Sam Epley’s. I just needed to get my head round what I was obviously doing wrong and give me some pointers. All 3 had slightly different approaches. I absorbed, then closed their workbooks and attempted again from memory. With a lot more trial and error I got somewhere… it isn’t perfect and has some mismatches from the others (but they don’t all match each other either…).

Once I’d got a grouping for each property, the actual Tableau stuff was quite straightforward…

  • Identifying the ‘Number of Bedrooms’
  • Building the Histogram
  • Adding the Average Price
  • Building the Map
  • Adding the Interactivity

Identifying the Number of Bedrooms

So the way I approached this, was to try to identify all the various permutations that represented the word ‘bedroom’ and replace it with the word ‘Bedroom’. But one of the options was BR or br, and the Description field contained html markup with the term <br />. I didn’t want all these to become ‘bedroom’, so I got rid of them all first,

Desc with Bedroom

REGEXP_REPLACE(LOWER(REPLACE([Description],'<br />’, ‘ ‘)),’bedroom|br |bdrm|bed|bd|br, |br/|rooms’,’ Bedroom’)

Firstly, replace any occurence of <br /> with a space, then replace any occurrence of the text bedroom or br<space> or bdrm or bed or bd or br<comma> or br<forward slash> or rooms with the word Bedroom.

I basically added more options to the or statement (identified by the | separator), as I went on examining the descriptions that were left. Using the LOWER function meant that bedroom or Bedroom or BedRoom etc would all be covered with one option.

Then I attempted to extract the number of bedrooms or identify as a studio

Studio | Beds

IF CONTAINS(LOWER([Desc with Bedroom]), ‘studio’) THEN ‘Studio’
ELSEIF REGEXP_MATCH(LOWER([Desc with Bedroom]),’\d bedroom’) THEN REGEXP_EXTRACT(LOWER([Desc with Bedroom]),'(\d+) bedroom’)
ELSEIF REGEXP_MATCH(LOWER([Desc with Bedroom]),’\d bedroom’) THEN REGEXP_EXTRACT(LOWER([Desc with Bedroom]),'(\d+) bedroom’)
ELSEIF CONTAINS(LOWER([Desc with Bedroom]), ‘six bedroom’) THEN ‘6’
END

If the revised description contains the word ‘studio’ then assume its a Studio.

Else if the revised description contains a number (\d) followed by 2 spaces then the word ‘bedroom’ then extract the numbers (\d+) that occur before the word bedroom. The brackets around the \d+ is what is used to identify what bit of the matching pattern to extract… this is the bit that I didn’t really know about and why I couldn’t get things to work.

Else if the revised description contains a number (\d) followed by 3 spaces then the word ‘bedroom’ then extract the numbers (\d+) that occur before the word bedroom. This just happened to be another pattern that occurred and meant some records didn’t get picked up by the prior statement. There’s probably a better way of doing this in one statement…

Finally, if the revised description contains the text ‘six bedroom’ then assume the property has 6 rooms.

This logic seemed to get a match against every record although it’s not 100% accurate, but it was close enough given my struggles.

I then wanted to get the rooms grouped

Room Grouping

CASE [Studio | Beds]
WHEN ‘Studio’ THEN ‘Studio’
WHEN ‘1’ THEN ‘1 Bedroom’
WHEN ‘2’ THEN ‘2 Bedrooms’
WHEN ‘3’ THEN ‘3 Bedrooms’
WHEN ‘4’ THEN ‘4 Bedrooms’
ELSE ‘5 or more Bedrooms’
END

I planned to use this field as my filter, but in doing so the value listed alphabetically, so Studio ended up at the bottom of the list.

To resolve this I created a parameter which meant I could define the order I wanted :

pBedroomSelector

And then I created a new field to use for the filter

Filter Room

[pBedroomSelector] = ‘All’ OR
[pBedroomSelector] = [Room Grouping]

I could then add this onto the filter shelf of the sheets I needed to build, setting the value to True.

Building the Histogram

For this chart, we need to ‘bin’ the Price of each property into groups of $100 ranges. However if we use the built in ‘bin’ function, the field created can’t be referenced in other calculations, and I needed to do this. So instead I determined the ‘lower’ value of the range by

Price per Night Min

FLOOR([Price]/100) *100

Divide the price by 100, round down to the nearest whole integer (so 1.9 will round down to 1), then multiply the result by 100.

And given that, I can then calculate

Price per Night Max

[Price per Night Min]+100

I also created a ‘friendlier’ field to store the number of properties

# of Listings

COUNT([listings copy_listings copy])

which is just a reference to the auto generated field created when you connect to the data source.

With these I can plot the histogram

  • Price per Night Min on Columns (set to discrete, continuous)
  • # of Listings on Rows
  • Mark type of Bar
  • Size set to be Fixed with a width of 100
  • Filter Room on the Filter shelf, set to True.
  • Adjust the colour via the Colour shelf and set a white border
  • Show the pBedroomSelector parameter
  • Add Price per Night Max to the Tooltip shelf and set to be an attribute.
  • Set the Tooltip accordingly and format gridlines, axes labels etc

Adding the Average Price

I wasn’t entirely sure what the average price on Candra’s solution represented, so I chose to go for the average price of the properties in the filtered selection; that is of all the 2-bedroom properties for example, find the average price per night, based on the total price per night of all the properties divided by the number of properties. ie I was looking for these values in the 3rd column.

But I couldn’t simply add the Price field aggregated to Avg to the bar chart. Doing so gave me different values per Price per Night Min grouping.

I just want the value on the grand total line spread across the all the data in the chart. So I created

Window Avg Price

WINDOW_SUM(SUM([Price])) / WINDOW_SUM([# of Listings])

This table calculation, set to compute by Price per Night Min gives the value I want across all rows of data

Add Window Avg Price to the Detail shelf of the histogram, set the calc to compute as above. Then you can add a reference line to the Price per Night Min axis.

Building the Map

To build maps you need fields that are geographic data types. For me, the Longitude field was already set, but I had to manually set the Latitude field (right click -> Geographic Role -> Latitude).

Once done, the map could be quickly built by double-clicking the Longitude field, then double clicking the Latitude field, then adding Name and Listing URL to the Detail shelf, and Price to the Tooltip shelf. Finally set Filter Room = True to the Filter shelf.

I then adjusted the colour of the circles, reduced the opacity to 50% and added a border (all via the Colour shelf).

I also added Area Code Boundaries via the Map -> Map Layers menu to get the map style Candra had used.

Adding the Interactivity

Add the 2 sheets to a dashboard. Each chart can be used to filter each other. This functionality can easily be added by clicking on the context menu of the dashboard object, and selecting Use as Filter. A filter dashboard action will automatically be added. Do this for both charts.

The final requirement, is for a link to the actual listing to be available from the map tooltip. This is a dashboard URL Action (Dashboard -> Actions -> Add Action -> Go to URL). Set as below

The words in the Name field will what is displayed on the tooltip.

The layout requires use of containers, background colours and a bit of padding. This is typically a bit of trial and error to get this right. You can check out my published version here.

Happy vizzin’! Stay Safe!

Donna

How often is Sean listening to his favourite songs?

A “scatter plot with a twist” music based challenge this week was posed by Sean Miller, using data from his last.fm account. On the surface, a simple scatter plot of each song, comparing the number of days between first and last listen against total plays. But click on song, and you get to see a timeline depicting days since first play vs cumulative plays, on the same chart. Hmmm…

The data set provided consists of 1 row per song per date played since 2017. For the scatter plot, we’re looking to summarise the data at a song level. For the timeline related to the selected song, we need to have the data at the song and date level. Ultimately we’re mixing levels of granularity within a single chart.

This certainly took some thinking. Creating the 2 charts independently was pretty straightforward, but trying to put them together took some thought. I knew I was going to want to use a set action to drive the interactivity and apply different logic based on whether a song was selected or not, but it took a bit of trial and error to get a solution.

First up, the data provided contained some timestamp date fields, but these were string data types. I chose to use the Timestamp UTC field to get a ‘proper’ date. I actually did this by duplicating the field, renaming it to Date Played and changing the datatype to a date. This generated a calculation which is below… I’d have never of typed this myself 🙂

Date Played

DATE(IF NOT ISNULL( DATEPARSE ( “MM/dd/yyyy HH:mm”, [TimeStamp UTC] ) ) THEN DATEPARSE ( “MM/dd/yyyy HH:mm”, [TimeStamp UTC] ) ELSEIF NOT ISNULL ( DATEPARSE ( “MM/dd/yyyy hh:mm:ss a”, [TimeStamp UTC] ) ) THEN DATEPARSE ( “MM/dd/yyyy hh:mm:ss a”, [TimeStamp UTC] ) END)

Now we have that, we can work out the first date a song was played

Min Date Per Song

{FIXED [Song ID]:MIN([Date Played])}

and the latest dates

Max Date Per Song

{FIXED [Song ID]:MAX([Date Played])}

and then we can derive the days between

Days since first listen

DATEDIFF(‘day’, [Min Date per Song], [Max Date per Song])

We can also get the total plays per song using

Total Plays per Song

{FIXED [Song ID]: [Total Plays]}

and with these calculated fields, we can build the basic scatter, using Total Plays per Song as a range filter, and setting both the axes not to start at zero.

So far so good. Now let’s think about the timeline. We need to identify a ‘selected song’ to help build this, so let’s create a set by right clicking on Song ID > Create > Set, and selecting a single option

We need to capture the number of plays on a date

Total Plays

COUNT([2021_05_26_WW21_My Streaming Activity.csv])

This is the default count field generated (the equivalent of the Number of Records if you’re on older versions of Tableau).

And we also need to capture the number of days from first play date to the current date, as we need to plot on a consistent axis when putting the charts together (ie we can’t plot with date on the axis).

Days to Date

DATEDIFF(‘day’,[Min Date per Song],[Date Played])

Let’s put this out into a table so you can see what’s going on.

Add the Selected Song = True to the filter shelf

There are days when there were multiple plays, so the Days To Date field needs to be set to AVG rather than SUM, to get the correct figure.

For Total Plays we need to plot the cumulative value, so we can set a quick table calculation against the Total Plays field of Running Sum (right click on Total Plays field). We’re going to explicitly set the table calculation to compute by Date Played as when depicted on a viz, the default of table down, might not give the correct values.

So with these fields, we can build the timeline viz (duplicate the table sheet and move the fields around). Set the mark type to line, and change the setting on the Path to make a stepped chart.

Ok, so now we have the 2 charts and hopefully understand what we’re aiming for. But how do we now go about getting everything onto a single chart?

We’re going to need a dual axis chart, since we have different mark types in play. And we’re going to want to plot different measures depending on whether we’re working with the selected song or not.

We’re going to build the data up in a table to get the logic for the fields we need. To just test the concept, we’ll filter to just a few songs, including the one in the Selected Song set. Add Song ID to the filter shelf and filter to a few songs.

Then build out a table as below:

What we’re aiming for, is for the records where In/Out of the set is Out, we want to plot the information we’re getting from the 1st two columns, but for the records where In/Out the set is In, we want the information from the other columns.

So let’s build this out.

Days to Plot

IF ATTR([Selected Song]) THEN AVG([Days To Date]) ELSE SUM([Days since first listen]) END

Plays to Plot

IF ATTR([Selected Song]) THEN RUNNING_SUM([Total Plays]) ELSE SUM([Total Plays per Song]) END

Pop these in the table, making sure any fields which are table calculations are set to compute by Date Played.

If you scroll to find where the data changes from a song out of the set to the one in, you can see how the two new fields are working.

So lets try plotting the chart out using these fields instead.

  • Days to Plot on Columns
  • Total Plays Per Song on Rows
  • Song ID on Detail shelf
  • Plays to Plot on Columns
  • Date Played (set to exact date) on Detail of the All Marks card.
  • Set the table calculation of Plays to Plot to compute by Date Played only.
  • Change mark type of the Plays to Plot to Line and set the Path to stepped line
  • Add Total Plays Per Song to Filter shelf and set to range from 50 to 100 (just to make the chart less busy).

You should end up with the below

You’ll notice we have the line of circles in the top chart, which is plotting a mark per day for the Selected Song. If we remove the Date Played pill from the Detail shelf of the Total Plays Per Song marks card, we lose these marks including the mark for the Selected Song too, which we need.

To resolve this, we need another field.

Is Last?

LAST()=0

Last Plays to Plot

IF [Is Last?] THEN Sum([Total Plays per Song]) END

This is just saying give me the total plays for the last mark in the list. Add these to the table to see what’s going on, making sure to set the table calc to compute by Date Played only

Now if we replace Total Plays per Song with the Last Plays to Plot field, we get
Now make the chart dual axis (don’t forget to synchronise), and we can then sort the formatting.
  • Set the marks type on the Last Plays to Plot to circle
  • Remove Measure Names from the Colour shelf, and set the mark colour to #3dde3c. Reduce the opacity to about 90%. Add a dark grey border.
  • Add Selected Song to the Size shelf, and adjust so the selected song is larger than the others.
  • On the Plays to Plot marks card, again remove Measure Names from the Colour shelf, and set the colour to black.
  • Set the Path to stepped line.
  • Click on the right hand axis and select Move marks to back
  • Reduce the Size of the mark.
  • Add Selected Song to the Colour shelf, and adjust the colours so the line is black and the tiny dots for all the other marks that you can see in the circles, is set to #3dde3c

You’ve now got the core chart, which needs to be further tidied to remove grid lines, axes, add tooltip etc. Once done, you can add to a dashboard, where you can then set the interactivity.

Add a dashboard action to change set values that sets the Selected Song set on Select. Set to work on single-select only.

Finally, you’ll find that if you select a mark, while you’ll get the trend line, the other points will now ‘fade out’

Create a new field True = True, and add this to the Detail shelf of the All Marks card. Then on the dashboard, add a new dashboard highlight action, which is set to Target Highlighting to the True field only.

Now if you click on and off a mark, you should get the trendline show and disappear, and all marks remain at the same transparency throughout.

Hopefully you’ve got enough now to complete this challenge. My published viz is here.

Happy vizzin’! Stay Safe!

Donna