Can you visualise when it gets busy?

It was Yoshi’s first challenge as a #WOW coach this week, and it provided us with an opportunity to develop our data densification and date calculation skills. I will admit, this certainly was a challenge that made me have to think a lot and reference other content where I’d done similar things before.

Modelling the data

Yoshi provided us with an adapted data set of patient waiting times, based on the Healthcare dataset from the Real World Fake Data (#RWFD) site. This provides 1 row per patient with a wait start time and end time. The requirement was to understand how many patients were waiting in each 30 min time slot in a 24hr period, so if a patient waited over 30 mins, or the start & end time spanned a 30 min time slot, then the data needed to be densified appropriately. Yoshi therefore provided an additional data set to use for this densification, and I have to be honest, it wasn't quite what I was expecting.

So the first challenge was to understand how to relate the two sets of data together, and this took some testing before I got it right. Yoshi provided hints about using a DATEDIFF calculation to find the time difference between the wait start time (rounded to 30 mins) and the wait end time (rounded to 30 mins).

To work out the calculation required, I actually first connected to the Hospital ER data set and then spent time working out the various parts of the formula required. I wanted to work out what the waiting start time was 'rounded down' to the previous 30 min time slot, and what the waiting end time was 'rounded up' to the next 30 min time slot.

Date Start Round

//Rounding down the wait start time

IIF(DATEPART('minute', [Date Start]) < 30, DATETRUNC('hour', [Date Start]), DATEADD('minute', 30, DATETRUNC('hour', [Date Start])))

If the minute of Date Start is between 0 and 29, then return the date at the hour mark, otherwise (if the minute of Date Start is between 30-59) then find the date at the hour mark, and add on 30 minutes. So if Date Start is 23/04/2019 13:23:00 then return 23/04/2019 13:00:00, but if Date Start is 23/04/2019 13:47, then return 23/04/2019 13:30:00.

Date End Round

//Rounding up the wait end time

IIF(DATEPART('minute', [Date End]) < 30, DATEADD('minute', 30, DATETRUNC('hour', [Date End])), DATEADD('hour', 1, DATETRUNC('hour', [Date End])))

If the minute of Date End is between 0 and 29, then find the date at the hour mark, and add on 30 minutes, otherwise (if the minute of Date End is between 30-59) then find the date at the hour mark, and add on 1 hour. So if Date End is 23/04/2019 13:23:00 then return 23/04/2019 13:30:00, but if Date End is 23/04/2019 13:47, then return 23/04/2019 14:00:00.
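If it helps to see the rounding rules outside Tableau, here's a quick Python sketch of the same logic (purely illustrative, using the example timestamps above):

from datetime import datetime, timedelta

def round_down_30(dt):
    # keep the hour, then snap the minutes back to 0 or 30
    return dt.replace(minute=0 if dt.minute < 30 else 30, second=0, microsecond=0)

def round_up_30(dt):
    # snap forward to the next 30 minute boundary
    base = dt.replace(minute=0, second=0, microsecond=0)
    return base + timedelta(minutes=30 if dt.minute < 30 else 60)

print(round_down_30(datetime(2019, 4, 23, 13, 23)))   # 2019-04-23 13:00:00
print(round_down_30(datetime(2019, 4, 23, 13, 47)))   # 2019-04-23 13:30:00
print(round_up_30(datetime(2019, 4, 23, 13, 23)))     # 2019-04-23 13:30:00
print(round_up_30(datetime(2019, 4, 23, 13, 47)))     # 2019-04-23 14:00:00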

With this, I was then able to relate the Hospital ER data set to the dummy by 30 min data set by adding a relationship calculation to the Hospital ER data set of

DATEDIFF('minute',
IIF(DATEPART('minute', [Date Start]) < 30, DATETRUNC('hour', [Date Start]), DATEADD('minute', 30, DATETRUNC('hour', [Date Start]))),
IIF(DATEPART('minute', [Date End]) < 30, DATEADD('minute', 30, DATETRUNC('hour', [Date End])), DATEADD('hour', 1, DATETRUNC('hour', [Date End])))
)

(which returns the difference in minutes between the 'rounded down' Date Start and the 'rounded up' Date End – you can't reference calculated fields in the relationship, so you have to recreate them)

and set this to be >= Range End from the dummy 'by 30 min' data set.
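To see what that >= relationship is doing, here's the join mocked up in Python. This assumes the dummy 'by 30 min' table pairs Index 0, 1, 2, … with a Range End of 30, 60, 90, … minutes (check the exact values in your own copy of the data; this is just to illustrate the densification):

# rounded wait of 06:00 -> 07:30 is a 90 minute span
minutes_waited = 90

# assumed shape of the dummy 'by 30 min' table: Index 0, 1, 2, ... with Range End 30, 60, 90, ...
dummy = [{"Index": i, "Range End": (i + 1) * 30} for i in range(48)]

# the relationship keeps every dummy row whose Range End is covered by the DATEDIFF
densified = [row for row in dummy if minutes_waited >= row["Range End"]]
print(densified)   # Index 0, 1 and 2 -> the patient appears against three 30 min slots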

Let’s see what this has done to the data.

On a sheet add Patient Id, Date Start (as discrete exact date – blue pill), Date End (as discrete exact date – blue pill), Date Start Round (as discrete exact date – blue pill), Date End Round (as discrete exact date – blue pill) and Index (as discrete dimension – blue pill) to Rows, and add Patient Waittime to Text.

Looking at the data for Patient Id 899-89-7946: they started waiting at 06:18 and ended at 07:17. This meant they spanned three 30 min time slots: 06:00-06:30, 06:30-07:00 and 07:00-07:30, and consequently there are 3 rows displayed, and the 'rounded' start & end dates go from 06:00 to 07:30.

Identifying the axis to plot the 24hr time period against

Having ‘densified’ the data, we now need to get a date against each row related to the 30 min time slot it represents. ie, for the example above we need to capture a date with the 06:00 time slot, the 06:30 time slot and the 07:00 time slot.

But, as the chart we want to display is depicted over a 24hr timeframe, we need to align all the dates to the exact same day, while retaining the time associated to each record. This technique of shifting the dates onto a common day is described in this Tableau Knowledge Base article.

Date Start Baseline

DATEADD('day', DATEDIFF('day', [Date Start Round], #1990-01-01#), [Date Start Round])

Add this into the table and you can see what the data looks like – note, I'm just choosing an arbitrary date of 01 Jan 1990, so my baseline dates are all on this date, but the time portions reflect that of the Date Start Round field.
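In Python terms the shift looks like this (not a literal translation of the DATEADD/DATEDIFF pair, but the effect is identical: keep the time of day, move the day onto 01 Jan 1990):

from datetime import datetime

def baseline(dt, anchor=datetime(1990, 1, 1)):
    # keep the time-of-day, move the date portion onto the anchor day
    return datetime.combine(anchor.date(), dt.time())

print(baseline(datetime(2019, 4, 23, 6, 0)))   # 1990-01-01 06:00:00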

With the day all aligned, we now want to get a time reflective of each 30 min timeslot. The logic took a bit of trial and error to get right, and there may well be a much better method than what I came up with. It became tricky as I had to handle what happens if the time is after 11pm (23:00) as I needed to make sure I didn’t end up returning dates on a different day (ie 2nd Jan 1990). I’ll describe my logic first.

If the Index is 0 then we just want the date time we’ve already adjusted – ie Date Start Baseline

If the Index is 1, then we need to shift the Date Start Baseline by 30 minutes. However, if Date Start Baseline is already 23:30, then we need to hardcode the date to be 01 Jan 1990 00:00:00, as ‘adding’ 30 mins will result in 02 Jan 1990 00:00:00.

If the Index is 2, then we need to shift the Date Start Baseline by 1 hour. However, if Date Start Baseline is already 23:30, then we need to hardcode the date to be 01 Jan 1990 00:30:00, as ‘adding’ 1 hour will result in 02 Jan 1990 00:30:00. Similarly, if Date Start Baseline is already 23:00, then we need to hardcode the date to 01 Jan 1990 00:00:00, otherwise we’ll end up with 02 Jan 1990 00:00:00.

As the relationship didn’t result in any instances of Index > 2, I stopped my logic there. This is all encapsulated within the calculation

Date to Plot

CASE [Index]
WHEN 0 THEN [Date Start Baseline]
WHEN 1 THEN IIF(DATEPART('hour', [Date Start Baseline])=23 AND DATEPART('minute', [Date Start Baseline])=30, #1990-01-01 00:00:00#, DATEADD('minute', 30, [Date Start Baseline]))
WHEN 2 THEN IIF(DATEPART('hour', [Date Start Baseline])=23 AND DATEPART('minute', [Date Start Baseline])=30, #1990-01-01 00:30:00#, IIF(DATEPART('hour', [Date Start Baseline])=23 AND DATEPART('minute', [Date Start Baseline])=0, #1990-01-01 00:00:00#, DATEADD('hour', 1, [Date Start Baseline])))
END

Phew!
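As an aside, the wrap-around could also be handled with a modulo rather than hard-coded cases. Here's the idea sketched in Python (just the equivalent logic, not the Tableau calc above): shift the baseline on by 30 minutes per Index step, then wrap anything that spills past midnight back onto 01 Jan 1990.

from datetime import datetime, timedelta

BASE = datetime(1990, 1, 1)

def date_to_plot(baseline_dt, index):
    # minutes past midnight, shifted by 30 mins per Index step, wrapped at 24 hrs (1440 mins)
    mins = (baseline_dt - BASE).total_seconds() / 60
    return BASE + timedelta(minutes=(mins + 30 * index) % 1440)

print(date_to_plot(datetime(1990, 1, 1, 23, 30), 2))   # 1990-01-01 00:30:00
print(date_to_plot(datetime(1990, 1, 1, 6, 0), 1))     # 1990-01-01 06:30:00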

Add Date to Plot to the table as a discrete exact date (blue pill), and you can see what's happening.

For Patient Id 899-89-7946, we have the 3 timeslots we wanted: 06:00 (representing 06:00-06:30), 06:30 (representing 06:30 -07:00) and 07:00 (representing 07:00-07:30).

On a new sheet, add Date To Plot as a discrete exact date (blue pill) and then create

Count Patients

COUNT([Patient Id])

and add to Text and you have a list of the 48 30 min time slots that exist within a 24hr period, with the patient counts.

We need to be able to filter this based on a quarter, so create

Date Start Quarter

DATE(DATETRUNC('quarter', [Date Start]))

and custom format this to yyyy-"Q"q

Add to the Filter shelf selecting individual dates and filter to 2019-Q2, and the results will adjust, and you should be able to reconcile against the solution.

Building the 30 minute time slot bar chart

On a new sheet, add Date Start Quarter to the Filter shelf and set to 2019-Q2. Then add Date to Plot as a continuous exact date (green pill) to Columns and Count Patients to Rows. Change the mark type to a bar and show mark labels.

Custom format Date To Plot to hh:nn.

We need to identify a selected bar in a different colour. We’ll use a parameter to capture the selected bar.

pSelectTimePeriod

date parameter defaulted to 01 Jan 1990 14:00 with a display format of hh:nn

Then create a calculated field

Is Selected Time Period

[Date to Plot] = [pSelectTimePeriod]

and add to the Colour shelf, adjusting colours to suit and setting the font on the labels to match mark colour.

A ‘bonus’ requirement is to make each bar ’30 mins’ wide. For this we need

Size – 30 Mins

//number of seconds in 30 mins divided by number of seconds in a day (24hrs)
(30 * 60) / 86400
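(For reference, (30 * 60) / 86400 = 0.0208…; the continuous date axis is measured in days, so a size of 0.0208 of a day is exactly 30 minutes.)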

Add this to the Size shelf, change it to be a dimension (so no longer aggregated to SUM), then click on the Size shelf, and set it to be fixed width, left aligned.

Via the Colour shelf, change the border of the mark to white.

Next, we need to get the tooltip to reflect the ‘current’ time slot, as well as the next time slot. For this I created

Next Time Period

IIF(LAST()=0, #1990-01-01 00:00:00#, LOOKUP(MIN([Date to Plot]),1))

This is a table calculation. If it’s the last record (ie 01 Jan 1990 23:30), then set the time slot to be 1st Jan 1990 00:00, otherwise return the next time slot (lookup the next (+1) record). Custom format this to hh:nn and add to the Tooltip shelf. Set the table calculation to compute by Date to Plot and Is Selected Time Period.

Update the Tooltip as required.

The final part of this bar chart is another ‘bonus’ requirement – to add 2 hr time interval ‘bandings’. For this I created another calculated field

2hr Time Period to Plot

IIF(DATEPART('hour', MIN([Date to Plot])) % 4 = 0 AND DATEPART('minute', MIN([Date to Plot])) = 0, WINDOW_MAX([Count Patients]) * 1.1, NULL)

If the hour part of the date is divisible by 4 (ie 0, 4, 8, 12, 16, 20), and we're at the hour mark (minute of date is 0), then get the value of the highest patient count displayed in the chart and uplift it by 10%, otherwise return nothing.

Add this to Rows. On the 2nd marks card created, remove Is Selected Time Period from Colour and Next Time Period from Tooltip. Adjust the table calculation of the 2hr Time Period to Plot to compute by Date to Plot only. Remove labels from displaying.

The bars are currently showing all at the same height (as required) but at a width of 30 minutes. We want this to be 2 hrs instead, so create

Size – 2 Hrs

//number of seconds in 2hrs / number of seconds in a day (24 hrs)
(60*60*2)/86400

Add this to the Size shelf instead (fixing it to left again). Adjust the colour to a pale grey, and delete all the text from the Tooltip.

Hide the nulls indicator. Make the chart dual axis and synchronise the axis. Right click on the 2hr Time Period Plot axis (the right hand axis) and move marks to back.

Tidy up the chart by removing row & column dividers, hiding the right hand axis, removing the titles from the other axes. Update the title accordingly.

Building the Patient Detail bar chart

On a new sheet, add Is Selected Time Period to Filter and set to True. Then go back to the other bar chart, and set the Date Start Quarter Filter to Apply to Worksheets -> Selected Worksheets and select the new one being built.

Add Department Referral, Patient Id, Date Start (as discrete exact date – blue pill) and Date End (as discrete exact date – blue pill) to Rows and Patient Waittime to Columns. Manually drag the None Department Referral option to the top of the chart. Add Patient Waittime to Colour and adjust to use the grey colour palette. Remove column dividers.

The title of this chart needs to display the number of patients in the selection, as well as the timeframe of the 30 min period. We already have the start of this captured in the parameter pSelectTimePeriod. We can use parameters to capture the other values too.

pNumberPatients

integer parameter defaulted to 0

pNextTimePeriod

date parameter, defaulted to 01 Jan 1990 14:30:00, with a custom display format of hh:nn

Update the title of the Patient Detail bar chart to reference these parameters (don’t worry about the count not being right at this point).

Capturing the selections

Add the sheets onto a dashboard. Then create the following dashboard parameter actions.

Set Selected Start

On select of the 30 Min bar chart, set the pSelectTimePeriod parameter passing in the value from the Date To Plot field.

Set Selected End

On select of the 30 Min bar chart, set the pNextTimePeriod parameter passing in the value from the Next Time Period field.

Set Count of Patients

On select of the 30 Min bar chart, set the pNumberPatients parameter passing in the value from the Count Patients field, aggregated at the SUM level.

With these all set, you should find you can click on a bar in the 30 minute chart and filter the list of patients below, with the title reflecting the bar chosen.

The basic viz itself wasn’t overly tricky, once you get over the hurdle of the calculations needed for the relationship and identifying the relevant 30 min time periods.

My published viz is here. Enjoy!

Happy vizzin’!

Donna

What is the lifetime value of customers?

Luke Stanke set the #WOW2022 challenge this week (see here) asking us to visualise the lifetime value of customers. I have to admit, I did struggle to really understand what was being requested, and to get the numbers to match Luke’s. I ended up referring to my own blog post on a similar challenge Ann Jackson set in week 2 of 2021 to get the calculations I needed 🙂

We first need to define the quarter that each customer made their first purchase.

Customer First Order Quarter

DATE(DATETRUNC('quarter', {FIXED [Customer ID]: MIN([Order Date])}))

The {FIXED LOD} calculation returns the minimum order date per customer, then truncates this to the first day of the quarter that date falls in.

We then need to determine the difference in quarters between the Customer First Order Quarter and the quarter associated to the Order Date of each order in the data set. The requirement indicated we needed to add 1 to the result. I split this into multiple calculated fields. Firstly,

Order Date Quarter

DATE(DATETRUNC('quarter', [Order Date]))

gets me the quarter of each Order Date, then

Quarters Since First Purchase

DATEDIFF('quarter', [Customer First Order Quarter], [Order Date Quarter]) + 1
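So, for example, a customer whose first order was placed on 14 May 2019 has a Customer First Order Quarter of 01 Apr 2019; an order they then placed on 02 Nov 2019 has an Order Date Quarter of 01 Oct 2019, so the DATEDIFF returns 2 and Quarters Since First Purchase is 3 (ie the order falls in the 3rd quarter of that customer's 'lifetime').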

Let’s pop these fields out into a table to check things are behaving as expected.

Add Customer First Order Quarter and Order Date Quarter to Rows as discrete exact dates (blue pills), and add Quarters Since First Purchase to the Text field, but set it to be a dimension, so it is disaggregated.

Now we need to count the number of distinct customers that made a purchase in each Customer First Order Quarter (the cohort)

Count Customers Per Cohort

{FIXED [Customer First Order Quarter]: COUNTD([Customer ID])}

Add this to the sheet, along with Sales (I moved Quarters Since First Purchase to Rows too).

As expected, we can see the same customer count for each Customer First Order Quarter cohort.

We've now got all the building blocks to move on to the next requirement, but I'm going to rearrange the table a bit, to start to reflect the data actually needed for the output.

The x-axis of the chart is going to be based on the Quarters Since First Purchase field, so move that to be the first column of data (1st entry in Rows). Then remove Customer First Order Quarter and Order Date Quarter, as we don’t need this level of information in the final viz.

We now have the sum of all the customers and the total sales, so we can now create the division requirement

Sales / Customer

SUM([Sales])/SUM([Count Customers Per Cohort])

I set this to currency $ with 0 dp.

Add this to the table

Then to make this a running total, create a Running Total quick table calculation off of this field (right click on the field -> Quick Table Calculation -> Running Total). Add back in Sales / Customer.

We've now got all the components needed to build the viz, but we do require an additional calculation to be displayed in the tooltip, which is the difference between each row.

Right click the existing running total Sales / Customer pill (the one with the triangle) and choose to Edit Table Calculation. Tick the Add secondary calculation checkbox and choose the Difference From table calc to run down the table, making the calc relative to the previous row.

Re-add a Running Total quick table calc to the other Sales / Customer pill, and then add Sales / Customer back into the view (it’s annoying that Tableau won’t let you add multiple pills of the same measure name unless they have a different calculation against them).

The snag with this is that we need a value to display for the first row. We need to create a new field that can reference the data in the second table calculation. Click and drag the ‘difference’ table calculation field into the Data pane, and name the field Difference. It should look like below, but you shouldn’t have needed to type any of that.

Difference

ZN(RUNNING_SUM([Sales / Customers])) - LOOKUP(ZN(RUNNING_SUM([Sales / Customers])), -1)

Now create a new calculated field

Tooltip:Difference

IF FIRST()=0 THEN [Sales / Customers] ELSE [Difference] END

Set this to a custom number format of 1dp, prefixed by + (the data is always cumulative, so it's never going to have a negative value, which means this works).
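If it helps to see what the pair of table calcs is producing, here's the arithmetic sketched in Python with some made-up Sales / Customer values:

sales_per_customer = [120.0, 35.0, 28.0, 22.0]   # made-up Sales / Customer values by quarter

running = []
tooltip_diff = []
for i, value in enumerate(sales_per_customer):
    running.append(value if i == 0 else running[-1] + value)            # RUNNING_SUM
    # first row shows its own value, every other row shows the step up from the previous running total
    tooltip_diff.append(running[0] if i == 0 else running[i] - running[i - 1])

print(running)        # [120.0, 155.0, 183.0, 205.0]
print(tooltip_diff)   # [120.0, 35.0, 28.0, 22.0]

Which also shows that the tooltip figure is really just each quarter's own contribution to the running total.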

Now we can build the viz.

On a new sheet add Quarters Since First Purchase to Columns as a continuous dimension (green pill), then add Sales / Customers to Rows and set to be a Running Total quick table calc. Change the mark type to be a Gantt Bar.

Create a new field

Size

[Tooltip:Difference]*-1

and add this to the Size shelf.

Add a second instance of the Sales / Customer running total calc (press ctrl and click and drag the existing pill in rows to create another next to it). Change the mark type of this to be bar. Remove the Size pill, and then click the Size shelf button, and set the Size to be fixed, aligned left.

Now make the charts dual axis and synchronise the axis. Set the colour of each mark type to the relevant palette, and set the mark borders to None. Hide the right hand axis.

On the All marks card, add the Tooltip:Difference and Count Customers Per Cohort fields to the Tooltip shelf, and amend the tooltip to match.

On the bar marks card, click the Label shelf button and tick the show mark labels checkbox. Align these left.

Remove the left hand axis, remove all gridlines and row/column dividers. Add column axis ruler and dark tick marks

Edit the x-axis and rename the title to Quarter of Order.

If you find you have the x-axis going from 0 to 18, then change the data type of Quarters Since First Purchase from a whole number to a decimal (right click the pill in the data pane -> Change Data Type -> Number (decimal)).

Add the chart to a dashboard and boom! You're done. My public viz is here. (Note: I did notice some vertical dividers displaying in the area chart on Public, but I think this is a Tableau Public 'bug', as I've switched off all the lines I can think of, and Desktop displays fine.)

Happy vizzin’!

Donna

How many consecutive starts?

Another table calculation related challenge this week, set by Luke, visualising cumulative starts for NFL Quarterbacks per team from 2006.

Luke provided the data within a Tableau workbook template on Tableau Public, so I started by downloading the workbook and understanding the data structure.

The challenge talks about teams playing over 17 weeks, but the data also contained rows associated to weeks 28-32, so I excluded these weeks by filtering them out.

I then started to build out the data in tabular form, so I could start to build up what was required. I added Team, Season, Week and Player ID to Rows, and just to reduce the amount of data I was working with while I built up the calcs, chose to filter to the Teams ARZ, ATL & BLT.

What we’re looking to do is examine each Player ID and work out whether it is the first record for the Team or whether it differs from the previous row’s data. If so then we’re at the ‘start’ of a run, so we record a ‘counter’ value of 1. If not, the values match, so we need to increment the counter.
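Before translating this into table calcs, here's the counter logic sketched in Python for one team's rows (made-up Player IDs, ordered by season and week):

starters = ["P1", "P1", "P2", "P2", "P2", "P1"]   # made-up Player IDs, one row per week

streak = []
for i, player in enumerate(starters):
    if i == 0 or player != starters[i - 1]:
        streak.append(1)                  # first row for the team, or the starter changed
    else:
        streak.append(streak[-1] + 1)     # same starter as last week, so extend the run

print(streak)   # [1, 2, 1, 2, 3, 1]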

We’ll do this in stages.

Firstly, let’s get the previous Player ID value.

Prev Player ID

LOOKUP(MIN([Player Id]),-1)

This ‘looks up’ the record in the previous (-1) row. Change this field to be discrete and add to the Rows. Set the table calculation to compute by all fields except Team.

Each Prev Player ID value matches the Player ID from the row before, unless it's the first row for a new Team, in which case the value is Null.

Then we can create a field to check if the values match

Match Prev Player ID

MIN([Player Id])=[Prev Player ID]

Add this to the view and set the table calc as above, and the data shows True, False or NULL

Now we can work out the consecutive streak values

Consecutive Streak

IF (NOT([Match Prev Player ID])) OR ISNULL([Match Prev Player ID]) THEN 1
ELSE 1+PREVIOUS_VALUE(-1)
END

If we don’t match the previous value or we’re at the start of a new team (as value is NULL), then start the streak by setting the counter to 1, otherwise increment the counter. Add this to the view and set the table calc for both the nested calculations as per the settings described above.

Next we need to identify the last value of the Consecutive Streak for each Team.

Current Streak

WINDOW_MAX(IF LAST()=0 THEN [Consecutive Streak] END)

The inner IF statement will return the value of Consecutive Streak stored against the last row for the Team. All other rows will be Null/blank. The WINDOW_MAX() statement then 'spreads' this value across all the rows for the Team.

Add this onto the view, and set the table calc for all the nested calcs.

Finally, we need one more bit of data. The chart essentially plots values from 2006 week 1 through to 2020 week 17. We need to ‘index’ these values, so we have a continuous week number value from the 1st week. We can use the Index table calculation for this

Index

INDEX()

Add this field to the view, set it to be discrete (blue pill) and position after the Week field on the Rows. Set the table calc as usual, so the Index restarts at each Team.

Now we’ve got all the data points we need, we can build the viz. I did this by duplicating the tabular view and then

  • Remove Prev Player ID and Match Prev Player ID
  • Move Season, Week and Player ID from Rows to Detail
  • Move Current Streak from Text to Rows and change to be discrete (blue)
  • Move Index from Rows to Columns and change to be continuous (green)
  • Move Consecutive Streak from Text to Rows
  • Change mark type to bar, and set to fit width to expand the view.
  • Change Size to be Fixed, width size 1 and aligned right
  • Set the border on the Colour shelf to be None.
  • Remove the Team filter and adjust the row height

All that’s left now is to set the tooltip (add Player to the Tooltip shelf to help this), and then apply the formatting. You can use workbook formatting (Format -> Workbook menu) to set all the font to Times New Roman.

Hopefully this is enough to get you to the end 🙂 My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you structure the unstructured?

As soon as I saw that Candra’s challenge for this week was going to involve Regular Expressions (RegEx), I gave a little groan. RegEx just isn’t my thing 😦 I only ever seem to use them for these challenges, and not in my working life, so have minimal experience. I always think I should focus some time on learning them properly, but other things just end up taking priority. Ho Hum…

So most of my time was spent trying to wrangle the info I needed to identify 'how many bedrooms' each property had. I did a bit of googling to try to find the right expressions I thought I needed, used the regex101 site to test my expression for finding certain patterns of text against some of the data in the Description field, and then tried to plug that into a calculated field in Tableau to extract the data I needed.

But I couldn't get it to work 😦 I could find matching text using the REGEXP_MATCH function, but when I then tried to use the REGEXP_EXTRACT function I couldn't get anything out…

So I ended up having to look at the solutions that had already been published by the time I started: Candra's, Lorna Brown's and Sam Epley's. I just needed to get my head round what I was obviously doing wrong and get some pointers. All 3 had slightly different approaches. I absorbed, then closed their workbooks and attempted again from memory. With a lot more trial and error I got somewhere… it isn't perfect and has some mismatches from the others (but they don't all match each other either…).

Once I’d got a grouping for each property, the actual Tableau stuff was quite straightforward…

  • Identifying the ‘Number of Bedrooms’
  • Building the Histogram
  • Adding the Average Price
  • Building the Map
  • Adding the Interactivity

Identifying the Number of Bedrooms

So the way I approached this was to try to identify all the various permutations that represented the word 'bedroom' and replace them with the word 'Bedroom'. But one of the options was BR or br, and the Description field contained html markup with the term <br />. I didn't want all of these tags to become 'Bedroom', so I got rid of them all first,

Desc with Bedroom

REGEXP_REPLACE(LOWER(REPLACE([Description], '<br />', ' ')), 'bedroom|br |bdrm|bed|bd|br, |br/|rooms', ' Bedroom')

Firstly, replace any occurrence of <br /> with a space, then replace any occurrence of the text bedroom or br<space> or bdrm or bed or bd or br<comma> or br<forward slash> or rooms with the word Bedroom.

I basically added more options to the or statement (identified by the | separator), as I went on examining the descriptions that were left. Using the LOWER function meant that bedroom or Bedroom or BedRoom etc would all be covered with one option.
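Here's the equivalent in Python on a made-up description, just to show what the REGEXP_REPLACE is doing (the pattern is the same list of alternatives separated by the | character):

import re

desc = "Cosy 2 bdrm flat<br />close to the park"   # made-up description
cleaned = desc.replace("<br />", " ").lower()
# swap any of the 'bedroom' spellings for the single word 'Bedroom'
print(re.sub(r"bedroom|br |bdrm|bed|bd|br, |br/|rooms", " Bedroom", cleaned))
# -> 'cosy 2  Bedroom flat close to the park' (note the double space, which matters below)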

Then I attempted to extract the number of bedrooms or identify as a studio

Studio | Beds

IF CONTAINS(LOWER([Desc with Bedroom]), 'studio') THEN 'Studio'
ELSEIF REGEXP_MATCH(LOWER([Desc with Bedroom]), '\d  bedroom') THEN REGEXP_EXTRACT(LOWER([Desc with Bedroom]), '(\d+)  bedroom')
ELSEIF REGEXP_MATCH(LOWER([Desc with Bedroom]), '\d   bedroom') THEN REGEXP_EXTRACT(LOWER([Desc with Bedroom]), '(\d+)   bedroom')
ELSEIF CONTAINS(LOWER([Desc with Bedroom]), 'six bedroom') THEN '6'
END

If the revised description contains the word 'studio' then assume it's a Studio.

Else if the revised description contains a number (\d) followed by 2 spaces then the word 'bedroom', then extract the numbers (\d+) that occur before the word bedroom. The brackets around the \d+ are what identify which bit of the matching pattern to extract… this is the bit that I didn't really know about and why I couldn't get things to work.

Else if the revised description contains a number (\d) followed by 3 spaces then the word ‘bedroom’ then extract the numbers (\d+) that occur before the word bedroom. This just happened to be another pattern that occurred and meant some records didn’t get picked up by the prior statement. There’s probably a better way of doing this in one statement…

Finally, if the revised description contains the text 'six bedroom' then assume the property has 6 bedrooms.

This logic seemed to get a match against every record although it’s not 100% accurate, but it was close enough given my struggles.
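And the capture group point, shown again in Python: matching only needs a pattern, but to get something back out you have to put brackets around the part you want, which is how REGEXP_EXTRACT behaves too (\s+ is used here rather than the literal runs of spaces in the Tableau calc, but the idea is the same):

import re

text = "cosy 2  Bedroom flat close to the park".lower()
match = re.search(r"(\d+)\s+bedroom", text)   # the brackets capture just the digits
if match:
    print(match.group(1))   # -> 2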

I then wanted to get the rooms grouped

Room Grouping

CASE [Studio | Beds]
WHEN 'Studio' THEN 'Studio'
WHEN '1' THEN '1 Bedroom'
WHEN '2' THEN '2 Bedrooms'
WHEN '3' THEN '3 Bedrooms'
WHEN '4' THEN '4 Bedrooms'
ELSE '5 or more Bedrooms'
END

I planned to use this field as my filter, but in doing so the values listed alphabetically, so Studio ended up at the bottom of the list.

To resolve this I created a parameter which meant I could define the order I wanted:

pBedroomSelector

And then I created a new field to use for the filter

Filter Room

[pBedroomSelector] = 'All' OR
[pBedroomSelector] = [Room Grouping]

I could then add this onto the filter shelf of the sheets I needed to build, setting the value to True.

Building the Histogram

For this chart, we need to ‘bin’ the Price of each property into groups of $100 ranges. However if we use the built in ‘bin’ function, the field created can’t be referenced in other calculations, and I needed to do this. So instead I determined the ‘lower’ value of the range by

Price per Night Min

FLOOR([Price]/100) *100

Divide the price by 100, round down to the nearest whole integer (so 1.9 will round down to 1), then multiply the result by 100.
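So a price of $347, for example, gives FLOOR(3.47) = 3, and multiplying back up gives a Price per Night Min of 300, ie the property falls into the $300-$400 bucket.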

And given that, I can then calculate

Price per Night Max

[Price per Night Min]+100

I also created a ‘friendlier’ field to store the number of properties

# of Listings

COUNT([listings copy_listings copy])

which is just a reference to the auto generated field created when you connect to the data source.

With these I can plot the histogram

  • Price per Night Min on Columns (set to be a continuous dimension – green pill)
  • # of Listings on Rows
  • Mark type of Bar
  • Size set to be Fixed with a width of 100
  • Filter Room on the Filter shelf, set to True.
  • Adjust the colour via the Colour shelf and set a white border
  • Show the pBedroomSelector parameter
  • Add Price per Night Max to the Tooltip shelf and set to be an attribute.
  • Set the Tooltip accordingly and format gridlines, axes labels etc

Adding the Average Price

I wasn't entirely sure what the average price on Candra's solution represented, so I chose to go for the average price of the properties in the filtered selection; that is, for all the 2-bedroom properties for example, find the average price per night, based on the total price per night of all the properties divided by the number of properties, ie I was looking for these values in the 3rd column.

But I couldn’t simply add the Price field aggregated to Avg to the bar chart. Doing so gave me different values per Price per Night Min grouping.

I just want the value on the grand total line spread across all the data in the chart. So I created

Window Avg Price

WINDOW_SUM(SUM([Price])) / WINDOW_SUM([# of Listings])

This table calculation, set to compute by Price per Night Min, gives the value I want across all rows of data.

Add Window Avg Price to the Detail shelf of the histogram, set the calc to compute as above. Then you can add a reference line to the Price per Night Min axis.

Building the Map

To build maps you need fields that are geographic data types. For me, the Longitude field was already set, but I had to manually set the Latitude field (right click -> Geographic Role -> Latitude).

Once done, the map could be quickly built by double-clicking the Longitude field, then double-clicking the Latitude field, then adding Name and Listing URL to the Detail shelf, and Price to the Tooltip shelf. Finally add Filter Room to the Filter shelf, set to True.

I then adjusted the colour of the circles, reduced the opacity to 50% and added a border (all via the Colour shelf).

I also added Area Code Boundaries via the Map -> Map Layers menu to get the map style Candra had used.

Adding the Interactivity

Add the 2 sheets to a dashboard. Each chart can be used to filter each other. This functionality can easily be added by clicking on the context menu of the dashboard object, and selecting Use as Filter. A filter dashboard action will automatically be added. Do this for both charts.

The final requirement, is for a link to the actual listing to be available from the map tooltip. This is a dashboard URL Action (Dashboard -> Actions -> Add Action -> Go to URL). Set as below

The words in the Name field will be what is displayed on the tooltip.

The layout requires use of containers, background colours and a bit of padding. It typically takes a bit of trial and error to get this right. You can check out my published version here.

Happy vizzin’! Stay Safe!

Donna

What percentage of sub-orders are profitable?

It was Luke’s turn to provide the challenge for this week – to produce a modified mekko chart (or marimekko chart / mosaic plot – see here for more information).

Luke suggested it would require LODs to solve, but that it would also be possible with table calcs. I tackled it with a mixture of both, mainly based on what felt right at the time.

Like most challenges, I worked out the data I needed for the final viz in tabular form first, so I could validate the numbers and calculations I built before building the viz.

  • Building out the data
  • Building the viz

Building out the data

The y-axis of the viz plots the % of profitable orders for each Sub-Category. So the first thing we need to do is identify a profitable order. A single order can have multiple product lines, where each product line might be associated to a different Sub-Category. We need to determine whether the profit against all the products for the same Sub-Category on a single order is positive (ie profitable) or not. We use an LoD for this

Order Is Profitable

IIF({FIXED [Order ID], [Sub-Category]: SUM([Profit])}>0,1,0)

For each Sub-Category within a single Order ID, check if the total profit is a positive number. If it is then return 1 else 0. I’m purposefully choosing 1 and 0 to help with the next step.
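For example, if an order has three Chairs lines with profits of +10, -4 and -8, the Sub-Category total is -2, so all three of those rows get a 0; any Tables lines on the same order are assessed separately.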

To determine the % of profitable orders, we need to know the total of all the profitable orders as a proportion of all the orders for a Sub-Category.

Count Orders

COUNTD([Order ID])

The number of distinct orders.

Profitable Orders

SUM([Order Is Profitable])/[Count Orders]

formatted to be a percentage with 0 dp.

Put these into a table, along with the Sales measure, and you can see what’s going on

We need to sort this data, first by Profitable Orders desc, then by Sales desc. And we need to sort in a way that can still be used once this table of data is displayed in the required viz format. If we just wanted to sort the table of data displayed here, we could use a technique described in Tableau's KB here. However this doesn't work for the viz, as it relies on the newly created dimension existing on the Rows shelf, and we won't be able to put it on Rows when we get to building the viz. And since the field also contains table calculations (Rank), you can't reference it in the Sort option of a pill.

Anyway, based on this, and after a bit of trial and error, I managed to create a sort field that I could reference.

Sort

STR(ROUND([Profitable Orders],2)) + "-" + STR(ROUND(SUM([Sales])/1000000,2))

Here I'm building up a string field combining the Profitable Orders field, which I've rounded to 2 decimal places, with the Sales field, which I've rounded to $millions at 2 decimal places. This is to ensure that, even though we're working with string data, a sales value of 800 is still ordered after one of 8000 when sorting descending.
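So, for example, a Sub-Category with 47% profitable orders and $1,234,567 of sales gets a Sort value of '0.47-1.23'.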

Pop this into the view, and set the sort property of the Sub-Category pill to sort by Sort descending

There may be a better way of doing this, and I’m not sure it would work in all circumstances, but it worked for this challenge. I’ll be interested in seeing how others approach this element of the challenge.

Now, that’s resolved, let’s get the other fields we need.

Along with the Profitable Orders %, we also need to display the percentage of non-profitable orders

Non-Profitable Orders

1 - [Profitable Orders]

I also chose to format this to % with 0dp (purely for display purposes in the table).

The width of the bars in the viz, is based on the % of total sales, so let’s work that out…

% of Total Sales

SUM([Sales])/TOTAL(SUM([Sales]))

TOTAL is a table calculation that ‘totals’ up the sales in the table. In this case we care about the whole table, but as with any table calculation it can be set to apply to certain partitions in the view.

Having worked out the % of Total Sales, we now need to work out where to plot each Sub-Category along the x axis, as due to the variable width of the bars, we can’t just use the Sub-Category dimension on the Columns shelf.

Each Sub-Category is going to be positioned based on its % of Total Sales relative to its position in the sort order. That is, we need to work out the cumulative value of % of Total Sales.

% of Total Sales Cumulative

RUNNING_SUM([% of Total Sales])

This totals up the % of Total Sales as we go down the rows in the table, as shown below. I’ve included column grand totals, so you see that % of Total Sales adds up to 100%, and the cumulative version sums up the previous values as it goes down the table, ending in 100% (note the numbers have been rounded due to 0dp, but if you changed the formatting, you’d see this better).

Now we have all the core elements we need to build the viz.

Building the Viz

On a new sheet,

  • add Profitable Orders to Rows
  • add Sub-Category to Detail
  • add % of Total Sales Cumulative to Columns

Edit the table calculation setting of the % of Total Sales Cumulative field, so that it is computing by Sub-Category (Compute Using -> Sub-Category option from the drop down/context menu of the pill).

Set the sort option against the Sub-Category pill to sort by the Sort field descending

Change the mark type to Bar.

Add % of Total Sales to the Size shelf. Then change the Size option (by clicking on the button) from Manual to Fixed, and set the alignment to Right

Add Non-Profitable Orders to Rows, then right-click on the relevant axis, Edit Axis, and set to reversed

This is the basic viz – it just now needs formatting

  • Add Measure Names to Colour and adjust accordingly
  • Set a white border around each bar (via the Colour shelf)
  • Edit the % of Total Sales Cumulative axis, and change it to start from -0.05, which will give a bit of space at the front
  • Remove all row / column borders and gridlines
  • To get the black bar visible across the 0 line, I ended up adding a reference line; a constant set to 0, formatted to a black line.
  • Hide the axes
  • Add appropriate fields to the Tooltip and set accordingly.
  • Add Segment to the Filter shelf.

Then just add to a dashboard, and you’re all set. My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you create a variable width bar chart?

For #WOW2020 Week 48, Jami Delagrange set the challenge based on a problem she’d been puzzling over for a while.

The data set was one Jami created herself about turkeys creating a ‘conference’ (it is Thanksgiving week in the US).

The required output is basically 2 bar charts plotting the time an event happened (on the x-axis) against the number of turkeys attending the event (on the y-axis). One bar chart is coloured based on the location; the other is a white bar whose width is based on the duration of the event.

There are a couple of calcs needed for the basic data

# Turkeys

COUNTD([User Name (Original Name)])

Duration (mins)

DATEDIFF('minute', [Start time], [End time])

With these we can get a basic set of data

The Start Time and End Time fields can be custom formatted to hh:mm AM/PM.

The first bar chart we need is very simple, plotting Start Time against # Turkeys, coloured by Location.

For the second chart, we need to make use of the Size shelf, and the feature that allows you to fix the size based on a field. If we add Duration (mins), we don’t get what we need

The bars are far too wide. As the Start Time being plotted is at the 'minute' level, each unit of the axis is 1 minute of a day. There are 24 * 60 = 1440 minutes in a day. So the size needs to be expressed as a proportion of the number of minutes in a day, ie

Duration Proportion of Day

[Duration (mins)] / (24 * 60)
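For example, an event lasting 45 minutes gives 45 / 1440 = 0.03125, ie just over 3% of the width of the day.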

Adding into our data table, you can see what these values translate to

and if we drop this onto the Size shelf instead, we get the required display

Build these charts as a dual axis, and you've got the output. Tableau's own help article discusses the Size feature a bit more.

My published viz is here.

Happy vizzin’! Stay Safe!

Donna