What is the distribution of total orders by customer?

Luke Stanke set the challenge this week, and posted a ‘sneak peak’ on Twitter before the challenge was formally released

Challenges from Luke can sometimes be on the harder end of the scale, so with the bit of extra time available and only the gif in the tweet as a clue , I had a play to see if I could get close at all. And surprisingly I could, so once it was formally issued, it was just a case of tidying up some of my calcs.

Calculations

First up we need to establish the basic calculations

# Customers

COUNTD([Customer ID])

# Orders Per Customers

{FIXED [Segment],[Customer ID]: COUNTD([Order ID])}

Plotting these out onto a table with Segment we get

Note #Orders Per Customers needs to be set to be a discrete dimension.

So while this shows us a summary of the number of customers, we’ve only got 1 mark showing the summarised count. When plotting the chart we need something else in the view that will generate more marks. This is Customer ID.

At this point in order to help us build up our tabular view to ‘see’ what’s going on, I’ll filter the table to just show Segment = Consumer, and I’ll add Customer ID to Rows

As expected, our # Customers is now showing a count of 1 per row, but the data is now expanded as we have now got a row (ie a mark) per customer for each # Orders Per Customer ‘bucket’. But we still need a handle on the total customers in each ‘bucket’.

Customers Per Order Count

WINDOW_SUM([# Customers])

Adding this to the view and setting the table calc to compute based on Segment & Customer ID, we get the summarised value back again.

But we don’t want to actually show that number of marks; the number of circles to plot on the chart is dependent on a user parameter:

pMarkIndicator

Based on the requirements, if the number of customers is 15 and the user parameter is 5, then 3 circles should be drawn (15 / 5 =3 ), but if the number of customers was 14, only 2 circles should be drawn (14 / 5 = 2.8), ie the number of circles will always be set to the integer of the equal or lesser value (essentially the FLOOR() function). This can also be achieved by

Marks to Plot

INT([# Customers per Order Count]/[pMarkIndicator])

For some reason FLOOR can’t be used in the above as a table calculation is being used, but INT does the job just fine, and adding to the tabular view and adjust the table calculation accordingly we get

ie for the Consumer Segment, 6 customers have made 1 order in total, so based on batching the customers into groups of 5, this means 1 circle should be displayed. Whereas, 30 customers have made 3 orders in total, so 6 circles should be displayed.

But we can’t actually reduce the amount of rows (ie marks) displayed – we either have 1 row (by removing Customer ID) or a row per customer. But that’s fine, we don’t need to.

What we want is something in our data to group each row into the relevant batch size.

First up let’s generate an ID per row for each customer that restarts for each #Orders Per Customer.

Index

INDEX()

Add this as a discrete pill to Rows, and adjust the table calc

Now we have this, we calculate which ‘column’ each customer can sit in, a bit like what we would do if we were building a small multiple table, arranging objects in rows and columns (see here for an example of what I mean).

Cols

[Index]%([Marks to Plot])

This uses the modulo (%) notation which returns the remainder of the division sum. Lets put this on the view

For customers who have only ordered once, and where we’re only going to plot 1 mark, the Cols value is the same (0) for all rows.

Whereas for customers who have ordered twice, and where we want to plot 2 marks, the Cols value is either 0 or 1.

We’ve now got a value we can use to plot on an axis. We’re still going to plot a mark for each Customer ID, but some marks will be plotted in the same position, ie on top of each other, which therefore looks like just one mark.

Let’s show this more graphically, by duplicating the sheet, deleting some pills, moving some around, changing some to continuous, and setting the mark type to circle as below

Change the pMarksIndicator parameter and the number of circles will adjust as required.

So far, so good. We’ve got the right number of marks, it’s just not looking as nice and symmetrical as it should be.

We need to shift the marks to the left. But how far it shifts is dependent on whether we’ve plotted an odd or even number of marks.

If we have an odd number, the middle mark should be plotted at 0. If we have an even number the middle two marks should be plotted at -0.5 and +0.5 respectively. The calculation below will achieve this

Cols Shifted

IF [Marks to Plot]%2 = 0 //we’re even
THEN [Cols] – ([Marks to Plot]/2) + 0.5
ELSE //we’re odd
[Cols] – (([Marks to Plot]-1)/2)
END

To demonstrate this, I’ve added Cols Shifted along side Cols on the viz (this time make sure all the table calculation settings (including the nested calcs) are applied to compute based on Customer ID only which is different from the calcs above)..

Now you can see how it all works, you can remove the Cols and the Segment from the Filter shelf.

And now its just a case of applying the various formatting to clean up the display, and adding to the dashboard.

My published version is here.

Happy vizzin’! Stay Safe!

Donna

Can you create a concatenated list of values?

The challenge for this week’s #WOW was set by Sean Miller, with a focus on creating a more ‘traditional’ list style report. The challenge was related to the structure of the data, and using Tableau to take multiple rows of data for a patient, and present as a single row, with some of the information concatenated into a string of comma separated values.

Whilst there are other tools & methods that could be used to shape the data before it hits Tableau, which may be more appropriate/performant for a business use, this is after all a Tableau challenge.

I figured out what I was going to need pretty quickly – a table calculation to compare rows to build up the string. However I did struggle with the volume of data being used. It caused Tableau to keep crashing initially, so I had to ping Sean to sense check I was on the right track, and there wasn’t something I was missing. Sean confirmed I was doing the right thing, so I decided to start building against a much smaller set of data. I restricted the data by adding a data source filter, and using a wildcard filter against Member Name starts with ‘a’.

So onto the build.

Building up the concatenated string

Add Member Name and Health Check Name to Rows to get the set of rows we need to work with.

What we’re aiming for is to get a concatenated string of each Health Check Name for each Member Name, on a row.

We’re going to achieve this by building up a string that combines the Health Check Name for the current row, with the combined list from the previous row.

This requires the use of a couple of Table Calculations. The first one is Index(). This can be used to number the rows in a table. I typically create a calculated field to store this.

Index

INDEX()

Set it to be discrete rather than continuous, and add the field to the Rows.

By default, the index is basically displaying a number per row, starting from 1 to however many rows are being displayed. This is because the table calculation has been set to compute Table Down.

We actually want the calculation to restart at 1 when the Member Name changes, ie for each Member Name.

Change this by clicking the ‘carrot’ / arrow on the Index pill and set to Compute Using -> Health Check Name

Next we need to create new calculated field that will build up the string

Health Check Name List

IF [Index]=1 THEN ATTR([Health Check Name])
ELSE PREVIOUS_VALUE(ATTR([Health Check Name])) + ‘, ‘ + ATTR([Health Check Name])
END

If we’re the first row, then store the name of the current row, else get the value from the previous row and concatenate to the current value, separating with a ‘,’.

Add this field to the Text shelf, and verify the table calculation is set to Compute by Health Check Name

You can see how the list is building up for each row, so by the time you get to the last row for the Member Name you’ve got the complete list.

Restrict to 1 row per Member Name

The last row for each Member Name is ultimately what we want. To identify this, I need another table calculation

Size

SIZE()

This does exactly what it ‘says on the tin’; returns the number of rows being displayed, and like INDEX() it can be computed over different partitions in the table.

Change it to be discrete and add to Rows, once again setting it to Compute By -> Health Check Name

Size is displaying the total number of rows associated to each Member Name.

To reduce the data to 1 row per Member Name, we just need the following:

Index = Size?

Index = Size

This returns true if the values match, so add this to the Filter shelf and set to True, and hey presto! 1 row per Member Name.

You can now start adding all the other fields to the output and remove Index & Size from the rows.

You will need to ensure Health Check Name remains on the canvas though, as otherwise the table calculations will break. You can either hide it from displaying as a column, by unchecking Show Header, or move it to the Detail shelf.

To ‘label’ the concatenated string field, so it ‘looks’ like just another regular column, I type the text directly into the Columns shelf

This gives us a ‘double’ column heading, and we can simply remove one

Min # Healthchecks filter

This is achieved by adding Size to the Filter shelf, but to get a range slider, the field needs to be a green continuous pill rather than blue.

List Must Contain filter

Right click on the Health Check Name field and Create -> Parameter

The Edit Parameter dialog will open with the list of distinct values pre-populated. Add the value ‘All’ to the bottom.

This parameter will be presented on the dashboard to allow user selection. But as it stands it won’t do anything. We need another field to read in the value selected in the parameter and filter the rows.

FILTER: Health Check Names

CONTAINS([Health Check Name List],[Health Check Name Parameter]) OR [Health Check Name Parameter]=’All’

This returns True if ‘All’ is selected or the concatenated list we built, contains the value from the parameter.

Add this to the Filter shelf, and set to True.

Note throughout this challenge, all the table calculations must be set to compute by Health Check Name. This includes Nested Calculations

It’s worth double checking, and if you do alter them, recheck the filter value is selected to True and hasn’t reset itself.

And that’s the core of the tricky stuff all sorted. The additional filters need adding for Physician & Age Bracket, formatting applied and then all displayed nicely on a dashboard.

Once I’d got all this I saved my work, and then took off my data source filter, keeping my fingers crossed it didn’t all grind to a halt… it didn’t, but it wasn’t the quickest either.

My published solution is here.

Apparently someone completed the challenge with LODs… I’m off to investigate further!

Happy vizzin’!

Donna