How many consecutive starts?

Another table calculation related challenge this week, set by Luke, visualising cumulative starts for NFL Quarterbacks per team from 2006.

Luke provided the data within a Tableau workbook template on Tableau Public, so I started by downloading the workbook and understanding the data structure.

The challenge talks about teams playing over 17 weeks, but the data showed some data associated to weeks 28-32. So I excluded these weeks by filtering them out.

I then started to build out the data in tabular form, so I could start to build up what was required. I added Team, Season, Week and Player ID to Rows, and just to reduce the amount of data I was working with while I built up the calcs, chose to filter to the Teams ARZ, ATL & BLT.

What we’re looking to do is examine each Player ID and work out whether it is the first record for the Team or whether it differs from the previous row’s data. If so then we’re at the ‘start’ of a run, so we record a ‘counter’ value of 1. If not, the values match, so we need to increment the counter.

We’ll do this in stages.

Firstly, let’s get the previous Player ID value.

Prev Player ID

LOOKUP(MIN([Player Id]),-1)

This ‘looks up’ the record in the previous (-1) row. Change this field to be discrete and add to the Rows. Set the table calculation to compute by all fields except Team.

Each Prev Player ID value matches the Player ID from the row before, unless its the first row for a new Team in which case the value is Null.

Then we can create a field to check if the values match

Match Prev Player ID

MIN([Player Id])=[Prev Player ID]

Add this to the view and set the table calc as above, and the data shows True, False or NULL

Now we can work out the consecutive streak values

Consecutive Streak

IF (NOT([Match Prev Player ID])) OR ISNULL([Match Prev Player ID]) THEN 1
ELSE 1+PREVIOUS_VALUE(-1)
END

If we don’t match the previous value or we’re at the start of a new team (as value is NULL), then start the streak by setting the counter to 1, otherwise increment the counter. Add this to the view and set the table calc for both the nested calculations as per the settings described above.

Next we need to identify the last value of the Consecutive Streak for each Team.

Current Streak

WINDOW_MAX(IF LAST()=0 THEN [Consecutive Streak] END)

The inner IF statement, will return the value of Consecutive Streak stored against the last row for the Team. All other rows will be Null/blank. The WINDOW_MAX() statement then ‘spreads’ this value across all the rows for the Team.

Add this onto the view, and set the table calc for all the nested calcs.

Finally, we need one more bit of data. The chart essentially plots values from 2006 week 1 through to 2020 week 17. We need to ‘index’ these values, so we have a continuous week number value from the 1st week. We can use the Index table calculation for this

Index

INDEX()

Add this field to the view, set it to be discrete (blue pill) and position after the Week field on the Rows. Set the table calc as usual, so the Index restarts at each Team.

Now we’ve got all the data points we need, we can build the viz. I did this by duplicating the tabular view and then

  • Remove Prev Player ID and Match Prev Player ID
  • Move Season, Week and Player ID from Rows to Detail
  • Move Current Streak from Text to Rows and change to be discrete (blue)
  • Move Index from Rows to Columns and change to be continuous (green)
  • Move Consecutive Streak from Text to Rows
  • Change mark type to bar, and set to fit width to expand the view.
  • Change Size to be Fixed, width size 1 and aligned right
  • Set the border on the Colour shelf to be None.
  • Remove the Team filter and adjust the row height

All that’s left now is to set the tooltip (add Player to the Tooltip shelf to help this), and then apply the formatting. You can use workbook formatting (Format -> Workbook menu) to set all the font to Times New Roman.

Hopefully this is enough to get you to the end 🙂 My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you structure the unstructured?

As soon as I saw that Candra’s challenge for this week was going to involve Regular Expressions (RegEx), I gave a little groan. RegEx just isn’t my thing 😦 I only ever seem to use them for these challenges, and not in my working life, so have minimal experience. I always think I should focus some time on learning them properly, but other things just end up taking priority. Ho Hum…

So most of my time was spent trying to wrangle the info I needed to identify ‘how many bedrooms’ each property had. I did a bit of googling to try to find the right expressions I think I needed, used the regex101 site to test my expression to find certain patterns of text against some of the data in the Description field, and then tried to plug that into a calculated field in Tableau to extract the data I needed.

But I couldn’t get it to work 😦 I could find matching text using the REGEXP_MATCH function, but when I then tried to use the REXP_EXTRACT functions I couldn’t get anything out…

So I ended up having to look at the solutions that had already been published by the time I started, Candra’s, Lorna Brown’s and Sam Epley’s. I just needed to get my head round what I was obviously doing wrong and give me some pointers. All 3 had slightly different approaches. I absorbed, then closed their workbooks and attempted again from memory. With a lot more trial and error I got somewhere… it isn’t perfect and has some mismatches from the others (but they don’t all match each other either…).

Once I’d got a grouping for each property, the actual Tableau stuff was quite straightforward…

  • Identifying the ‘Number of Bedrooms’
  • Building the Histogram
  • Adding the Average Price
  • Building the Map
  • Adding the Interactivity

Identifying the Number of Bedrooms

So the way I approached this, was to try to identify all the various permutations that represented the word ‘bedroom’ and replace it with the word ‘Bedroom’. But one of the options was BR or br, and the Description field contained html markup with the term <br />. I didn’t want all these to become ‘bedroom’, so I got rid of them all first,

Desc with Bedroom

REGEXP_REPLACE(LOWER(REPLACE([Description],'<br />’, ‘ ‘)),’bedroom|br |bdrm|bed|bd|br, |br/|rooms’,’ Bedroom’)

Firstly, replace any occurence of <br /> with a space, then replace any occurrence of the text bedroom or br<space> or bdrm or bed or bd or br<comma> or br<forward slash> or rooms with the word Bedroom.

I basically added more options to the or statement (identified by the | separator), as I went on examining the descriptions that were left. Using the LOWER function meant that bedroom or Bedroom or BedRoom etc would all be covered with one option.

Then I attempted to extract the number of bedrooms or identify as a studio

Studio | Beds

IF CONTAINS(LOWER([Desc with Bedroom]), ‘studio’) THEN ‘Studio’
ELSEIF REGEXP_MATCH(LOWER([Desc with Bedroom]),’\d bedroom’) THEN REGEXP_EXTRACT(LOWER([Desc with Bedroom]),'(\d+) bedroom’)
ELSEIF REGEXP_MATCH(LOWER([Desc with Bedroom]),’\d bedroom’) THEN REGEXP_EXTRACT(LOWER([Desc with Bedroom]),'(\d+) bedroom’)
ELSEIF CONTAINS(LOWER([Desc with Bedroom]), ‘six bedroom’) THEN ‘6’
END

If the revised description contains the word ‘studio’ then assume its a Studio.

Else if the revised description contains a number (\d) followed by 2 spaces then the word ‘bedroom’ then extract the numbers (\d+) that occur before the word bedroom. The brackets around the \d+ is what is used to identify what bit of the matching pattern to extract… this is the bit that I didn’t really know about and why I couldn’t get things to work.

Else if the revised description contains a number (\d) followed by 3 spaces then the word ‘bedroom’ then extract the numbers (\d+) that occur before the word bedroom. This just happened to be another pattern that occurred and meant some records didn’t get picked up by the prior statement. There’s probably a better way of doing this in one statement…

Finally, if the revised description contains the text ‘six bedroom’ then assume the property has 6 rooms.

This logic seemed to get a match against every record although it’s not 100% accurate, but it was close enough given my struggles.

I then wanted to get the rooms grouped

Room Grouping

CASE [Studio | Beds]
WHEN ‘Studio’ THEN ‘Studio’
WHEN ‘1’ THEN ‘1 Bedroom’
WHEN ‘2’ THEN ‘2 Bedrooms’
WHEN ‘3’ THEN ‘3 Bedrooms’
WHEN ‘4’ THEN ‘4 Bedrooms’
ELSE ‘5 or more Bedrooms’
END

I planned to use this field as my filter, but in doing so the value listed alphabetically, so Studio ended up at the bottom of the list.

To resolve this I created a parameter which meant I could define the order I wanted :

pBedroomSelector

And then I created a new field to use for the filter

Filter Room

[pBedroomSelector] = ‘All’ OR
[pBedroomSelector] = [Room Grouping]

I could then add this onto the filter shelf of the sheets I needed to build, setting the value to True.

Building the Histogram

For this chart, we need to ‘bin’ the Price of each property into groups of $100 ranges. However if we use the built in ‘bin’ function, the field created can’t be referenced in other calculations, and I needed to do this. So instead I determined the ‘lower’ value of the range by

Price per Night Min

FLOOR([Price]/100) *100

Divide the price by 100, round down to the nearest whole integer (so 1.9 will round down to 1), then multiply the result by 100.

And given that, I can then calculate

Price per Night Max

[Price per Night Min]+100

I also created a ‘friendlier’ field to store the number of properties

# of Listings

COUNT([listings copy_listings copy])

which is just a reference to the auto generated field created when you connect to the data source.

With these I can plot the histogram

  • Price per Night Min on Columns (set to discrete, continuous)
  • # of Listings on Rows
  • Mark type of Bar
  • Size set to be Fixed with a width of 100
  • Filter Room on the Filter shelf, set to True.
  • Adjust the colour via the Colour shelf and set a white border
  • Show the pBedroomSelector parameter
  • Add Price per Night Max to the Tooltip shelf and set to be an attribute.
  • Set the Tooltip accordingly and format gridlines, axes labels etc

Adding the Average Price

I wasn’t entirely sure what the average price on Candra’s solution represented, so I chose to go for the average price of the properties in the filtered selection; that is of all the 2-bedroom properties for example, find the average price per night, based on the total price per night of all the properties divided by the number of properties. ie I was looking for these values in the 3rd column.

But I couldn’t simply add the Price field aggregated to Avg to the bar chart. Doing so gave me different values per Price per Night Min grouping.

I just want the value on the grand total line spread across the all the data in the chart. So I created

Window Avg Price

WINDOW_SUM(SUM([Price])) / WINDOW_SUM([# of Listings])

This table calculation, set to compute by Price per Night Min gives the value I want across all rows of data

Add Window Avg Price to the Detail shelf of the histogram, set the calc to compute as above. Then you can add a reference line to the Price per Night Min axis.

Building the Map

To build maps you need fields that are geographic data types. For me, the Longitude field was already set, but I had to manually set the Latitude field (right click -> Geographic Role -> Latitude).

Once done, the map could be quickly built by double-clicking the Longitude field, then double clicking the Latitude field, then adding Name and Listing URL to the Detail shelf, and Price to the Tooltip shelf. Finally set Filter Room = True to the Filter shelf.

I then adjusted the colour of the circles, reduced the opacity to 50% and added a border (all via the Colour shelf).

I also added Area Code Boundaries via the Map -> Map Layers menu to get the map style Candra had used.

Adding the Interactivity

Add the 2 sheets to a dashboard. Each chart can be used to filter each other. This functionality can easily be added by clicking on the context menu of the dashboard object, and selecting Use as Filter. A filter dashboard action will automatically be added. Do this for both charts.

The final requirement, is for a link to the actual listing to be available from the map tooltip. This is a dashboard URL Action (Dashboard -> Actions -> Add Action -> Go to URL). Set as below

The words in the Name field will what is displayed on the tooltip.

The layout requires use of containers, background colours and a bit of padding. This is typically a bit of trial and error to get this right. You can check out my published version here.

Happy vizzin’! Stay Safe!

Donna

What percentage of sub-orders are profitable?

It was Luke’s turn to provide the challenge for this week – to produce a modified mekko chart (or marimekko chart / mosaic plot – see here for more information).

Luke suggested it would require LODs to solve, but that it would also be possible with table calcs. I tackled it with a mixture of both, mainly based on what felt right at the time.

Like most challenges I worked out the data I needed for the final viz in tabular form first, so I could ratify the numbers and calculations I built before building the viz.

  • Building out the data
  • Building the viz

Building out the data

The y-axis of the viz plots the % of profitable orders for each Sub-Category. So the first thing we need to do is identify a profitable order. A single order can have multiple product lines, where each product line might be associated to a different Sub-Category. We need to determine whether the profit against all the products for the same Sub-Category on a single order is positive (ie profitable) or not. We use an LoD for this

Order Is Profitable

IIF({FIXED [Order ID], [Sub-Category]: SUM([Profit])}>0,1,0)

For each Sub-Category within a single Order ID, check if the total profit is a positive number. If it is then return 1 else 0. I’m purposefully choosing 1 and 0 to help with the next step.

To determine the % of profitable orders, we need to know the total of all the profitable orders as a proportion of all the orders for a Sub-Category.

Count Orders

COUNTD([Order ID])

The number of distinct orders.

Profitable Orders

SUM([Order Is Profitable])/[Count Orders]

formatted to be a percentage with 0 dp.

Put these into a table, along with the Sales measure, and you can see what’s going on

We need to sort this data, first by Profitable Orders desc, then by Sales desc. And we need to sort in a way that can be used once this table of data is displayed in the required Viz format. If we just wanted to sort this table of data displayed here, we can use a technique described in Tableau’s KB here. However this doesn’t work for the viz, as it relies on the newly created dimension existing on the Rows shelf, which won’t work when we get to building the viz, as we can’t put it on rows. Since the field also contains table calculations (Rank), you can’t reference the field in the Sort option of a pill.

Anyway, based on this, and after a bit of trial and error, I managed to create a sort field that I could reference.

Sort

STR(ROUND([Profitable Orders],2)) + “-” + STR(ROUND(SUM([Sales])/1000000,2))

Here I’m building up a string field combining the Profitable Orders field, that I’ve rounded to 2 decimal places, with the Sales field that I’ve rounded to $millions at 2 decimal places. This is to ensure that since we’re working with string data, 800 is ordered after 8000 when sorting descending.

Pop this into the view, and set the sort property of the Sub-Category pill to sort by Sort descending

There may be a better way of doing this, and I’m not sure it would work in all circumstances, but it worked for this challenge. I’ll be interested in seeing how others approach this element of the challenge.

Now, that’s resolved, let’s get the other fields we need.

Along with the Profitable Orders %, we also need to display the percentage of non-profitable orders

Non-Profitable Orders

1 – [Profitable Orders]

I also chose to format this to % with 0dp (purely for display purposes in the table).

The width of the bars in the viz, is based on the % of total sales, so let’s work that out…

% of Total Sales

SUM([Sales])/TOTAL(SUM([Sales]))

TOTAL is a table calculation that ‘totals’ up the sales in the table. In this case we care about the whole table, but as with any table calculation it can be set to apply to certain partitions in the view.

Having worked out the % of Total Sales, we now need to work out where to plot each Sub-Category along the x axis, as due to the variable width of the bars, we can’t just use the Sub-Category dimension on the Columns shelf.

Each Sub-Category is going to be positioned based on its % of Total Sales relative to it’s position in the sort order. That is we need to work out the cumulative value of % of Total Sales.

% of Total Sales Cumulative

RUNNING_SUM([% of Total Sales])

This totals up the % of Total Sales as we go down the rows in the table, as shown below. I’ve included column grand totals, so you see that % of Total Sales adds up to 100%, and the cumulative version sums up the previous values as it goes down the table, ending in 100% (note the numbers have been rounded due to 0dp, but if you changed the formatting, you’d see this better).

Now we have all the core elements we need to build the viz.

Building the Viz

On a new sheet,

  • add Profitable Orders to Rows
  • add Sub-Category to Detail
  • add % Total Sales Cumulative to Columns

Edit the table calculation setting of the % Total Sales Cumulative field, so that it is computing by Sub-Category (Compute Using -> Sub-Category option from the drop down /context menu of the pill).

Set the sort option against the Sub-Category pill to sort by the Sort field descending

Change the mark type to Bar.

Add % of Total Sales to the Size shelf. Then change the Size option (by clicking on the button) from Manual to Fixed, and set the alignment to Right

Add Non-Profitable Orders to Rows, then right-click on the relevant axis, Edit Axis, and set to reversed

This is the basic viz – it just now needs formatting

  • Add Measure Names to Colour and adjust accordingly
  • Set a white border around each bar (via the Colour shelf)
  • Edit the % Total Sales Cumulative axis, and change to start from -0.05 which will give a bit of space at the front
  • Remove all row / column borders and gridlines
  • To get the black bar visible across the 0 line, I ended up adding a reference line; a constant set to 0, formatted to a black line.
  • Hide the axes
  • Add appropriate fields to the Tooltip and set accordingly.
  • Add Segment to the Filter shelf.

Then just add to a dashboard, and you’re all set. My published viz is here.

Happy vizzin’! Stay Safe!

Donna

Can you create a variable width bar chart?

For #WOW2020 Week 48, Jami Delagrange set the challenge based on a problem she’d been puzzling over for a while.

The data set was one Jami created herself about turkeys creating a ‘conference’ (it is Thanksgiving week in the US).

The above is basically 2 bar charts plotting the time an event happened (on the x-axis) against the number of turkeys attending the event (on the y-axis). One bar chart is coloured based on the location, the other is a white bar which has a width based on the duration of the event.

There are a couple of calcs needed for the basic data

# Turkeys

COUNTD([User Name (Original Name)])

Duration (mins)

DATEDIFF(‘minute’,[Start time],[End time])

With these we can get a basic set of data

The Start Time and End Time fields can be custom formatted to hh:mm AMPM.

The first bar chart we need is very simple, plotting Start Time against # Turkeys, coloured by Location.

For the second chart, we need to make use of the Size shelf, and the feature that allows you to fix the size based on a field. If we add Duration (mins), we don’t get what we need

The bars are far too wide. As the Start Time being plotted is at the ‘minute’ level, each unit of the axis is 1 minute of a day. There are 24 * 60 = 1440 minutes in a day. So the size needs to be based as a proportion of the number of minutes in a day ie

Duration Proportion of Day

[Duration (mins)] / (24 * 60)

Adding into our data table, you can see what these values translate to

and if we drop this onto the Size shelf instead, we get the required display

Build these charts as a dual axis, and you’ve got the output. Tableau’s own help article, discusses the Size feature a bit more.

My published viz is here.

Happy vizzin’! Stay Safe!

Donna