# Can you find the needle in the haystack?

It was Candra’s turn to ‘set’ the #WOW2021 challenge this week providing a hint in the challenge description that the solution would involve sets.

As with many challenges, I built the data out in tabular format to start with to verify I had all the components and calculations correct. The areas of focus are

• Identify number of distinct customers per product
• Identify overall average number of distinct customers per product
• Identify if product above or below average distinct customers
• Identify Top 50 products by Sales
• Identify Unprofitable Products
• Identify products that are both in the top 50 AND unprofitable
• Building the viz

Identify number of distinct customers per product

To start off, add Product Name, Sub-Category, Category to the Rows shelf to begin building out a table. Add Sales (formatted to \$k 0dp) and Profit (formatted to \$k 0dp with negative values as () ) to Text and sort by Sales descending.

To identify the distinct customers per product, we can create

Customer Count per Product

{FIXED [Product Name] : COUNTD([Customer ID])}

Identify overall average number of distinct customers per product

What we’re looking for here is the average of all the values we’ve got listed in the Customer Count per Product column. Ie we want to sum up those values displayed and divide by the number of rows.

The number of rows is equivalent to the number of products, which we can get from

Count Products

{FIXED : COUNTD([Product Name])}

And so to get the overall average we calculate

Avg Overall Customer Count

{FIXED: SUM([Customer Count Per Product])} / [Count Products]

Add these fields to the view as well, so you can see how the values work per row. The last two calculations give you the same value across all rows.

Identify if product above or below average distinct customers

Given the above display, this is just a case of comparing values in 2 columns

Higher than Avg Customer Count

AVG([Customer Count Per Product]) > SUM([Avg Overall Customer Count])

this returns true or false – add this to the view too.

Identify Top 50 products by Sales

We can create a set for this. Right click on Product Name > Create > Set. Name the set something suitable eg Top 50 Products, and on the Top tab, state the number (50) and the field (Sales) and the aggregation (Sum)

Add this to the view, and if you’ve sorted by the sales, you should find the top 50 rows are all In the set, and the rest are Out.

Identify Unprofitable Products

We can use another set for this. Again create a set off of Product Name, call it Unprofitable Products, and on the Condition tab, set the condition so that the Sum of Profit is less than 0

Add this onto the view too.

Identify products that are both in the top 50 AND unprofitable

For this, we’re explicitly looking for the rows that are both In the Top 50 Products set and In the Unprofitable Products set.

We can use the Combined Set functionality to do this.

In the left hand data pane, select both the Top 50 Products and the Unprofitable Products sets (hold down ctrl to multi select), then right click and Create Combined Set. I called the set Products to Include, and select to combine the sets by including Shared members in both sets

If you then add this field to the Filter shelf, you will be left with just the 13 Products that match

This is the single filter field you can use as per Candra’s requirements.

Building the viz

To get the text to display to the left of the bar, you actually need to create a ‘fake’ bar chart.

• Add Products to Include to Filter
• Add Product Name to Rows
• On the Columns shelf, double click and type in MIN(1)
• Add Sales to Columns to the right of MIN(1)
• Sort by Sales descending

Against the MIN(1) marks card

• Change the Size to small
• Set the Opacity of the Colour to 0% and the border to None
• Add Product Name, Sub-Category and Category to the Label shelf and adjust accordingly, aligning left
• Increase the height of each row to make the text visible

On the Sales marks card

• Add Higher than Avg Customer to the Colour shelf and adjust
• Show mark labels
• Create a new field Profit Ratio : SUM([Profit])/SUM([Sales]) Format to % with 0dp and add to Tooltip
• Add Profit, and Customer Count by Product to Tooltip and adjust accordingly

Finally, uncheck Show Header against Product Name and MIN(1) and Sales and format the borders/gridlines etc. Add the title, then add to the dashboard.

All done (I hope…)! My published version is here.

Happy vizzin’! Stay Safe!

Donna

# Can you show the top 10 rank over time for each Olympic country?

The #WOW2020 week 31 challenge was combined with a #PreppinData challenge and launched at a virtual live event. I was fortunate enough to be able to join this event for the first hour and had the pleasure of meeting up with some #WOW regulars Sean Miller, Kyle Yetter & Tim Beard, along with #WOW challenge setter Lorna Brown, and #PreppinData setters, Jenny Martin and Tom Prowse. I was gutted I couldn’t stay on for the whole session, but work commitments got in the way – bah!

I did complete the #PreppinData challenge first, and the last time the team ran a combined event, I did blog on how I built both challenges, but I’m struggling for time, and the #WOW has got a fair bit going on, so I’m afraid the Prep challenge won’t make the cut this time – sorry #Preppers!

So onto the #WOW challenge. Part of this challenge was to utilise the new Relationships feature in 2020.2, so you need to be on this version to follow along.

Creating the data source

If you’d completed the #PreppinData challenge, you could use your own outputs as the inputs for #WOW challenge. I did that initially, but as I was building I had a few minor discrepancies from the solution, so chose to replace my data sources with the hyper files that are referenced in the Prep challenge, these being

• Host Countries – 1 row per Olympic Host City (and Country) per Year since 1896 to 2016
• Country Medals – 1 row per Country attending the Olympics per Year, summarising the number of Bronze, Silver and Gold medals won by that country.
• Medalists – For every Country, for every Year, there is 1 row per Athlete who won a Medal including the type of Medal (Bronze, Silver, Gold).

In Tableau Desktop, connect to the Host Countries hyper file and drag the ‘table’ named Extract into the data source pane. If you right click on the table, you can the rename it – I chose Host.

Then Add a connection to the Country Medals hyper file, and again drag the ‘table’ named Extract into the data source pane so it connects to the Host table. Set the relationship to be on Year. I renamed the ‘table’ again to be Medals.

Now add another connection to the Medalists hyper file, and drag the ‘table’ named Extract to connect to the Medals table, this time setting the relationship both on Country and Year. I renamed the ‘table’ again to be Medalists.

Building the Bump Chart

As with many challenges, if I can, I build the data I need into a table to start with, so I can check my calculations, so lets get the basics

Note – depending on the order you connected your tables in the data source, some field names that exist in multiple tables will be suffixed with fieldname (Extract x). I won’t refer to the Extract part, but will reference fields I use in this blog by prefixing with the table name instead.

Drag out Host.Year, Host.Host Country, Medals.Country to Rows. Lets show the number of medals of each type each country won, so we can use this to sense check some calculations later : put Medals.Gold, Medals.Silver, Medals.Bronze into the table

The bump chart we need to build is ranked based on the score of each country which in turn is determined by giving 3 points for each gold medal, 2 for a silver and 1 for a bronze, so we need a calculated field

Score

IFNULL(SUM([Bronze]),0) + (IFNULL(SUM([Silver]),0) * 2) + (IFNULL(SUM([Gold]),0) * 3)

We need to rank this score, and I’m going to have a calculated field to store this explicitly

Rank Score

RANK_UNIQUE([Score])

Add these 2 fields to the table, and adjust the table calculation of Rank Score so all fields except Year are ticked

Now we could start building out the bump chart now, and I did when I was creating this and would then flit back and forth between doing something on the chart and checking new calcs. However, to keep the blog a bit easier, we’ll continue building out the table.

So first up, we need another rank field. When building out the bump chart initially, I was doing all sorts of things to show the Top 10 only, but whatever I did, I couldn’t stop the lines from joining up between countries when they didn’t exist in consecutive years eg Greece is 1st in 1896, but doesn’t appear in the Top 10 again until 1904. As 1900 was missed, the lines shouldn’t join, but mine were. I was faffing over this for some time, so eventually caved and checked out Lorna’s solution. She’d resolved this simply with

Top 10 Rank Only

IF [Rank Score] < 11 THEN [Rank Score] ELSE NULL END

ie only show the rank if its in the Top 10. Simples really!

Add this onto the table and check the table calculation setting is as previously.

So for the bones of the bump chart we have the two fields we’re going to plot against – Year and Top 10 Rank Only, but before we do that, let’s get the fields we need that are displayed on the tooltip.

# Countries Per Year

{FIXED [Year] : COUNT([Medals])}

This will give us the number of countries that participated in each year. The COUNT([Medals]) comes from dragging the Medals(Count) that is automatically generated as part of the Medals table into the calculated field dialog- in the new Relationships model, this Count field against each table is essentially the equivalent of Number of Records.

# Medals

IFNULL(SUM([Bronze]),0) + IFNULL(SUM([Silver]),0) + IFNULL(SUM([Gold]),0)

A simple tally of the total number of medals won by each country.

Is Host?

[Host Country]=[Country (Extract2)]

where this is checking Host.Host Country against Medals.Country and returns true if they match.

Hosted | Participated

IF [Is Host?] THEN ‘hosted’ ELSE ‘participated’ END

The text on the tooltip differs slightly dependent on whether the country hosted or not.

COLOUR: Country

IF [Is Host?] THEN [Host Country] END

The stars on the bump chart need to be coloured based on country, but we don’t want the circles coloured too, so this field is necessary.

Let’s get all these into our table… you’ll notice (or you may not), but adding some of these fields causes the Rank Score & Top 10 Rank Only to change, so readjust the tableau calculation so only Year remains unchecked.

Now we have everything to build the core bump chart.

Add Host.Year to Columns, Medals.Country to Detail and then add Top 10 Rank Only on Rows. Change the Mark Type to a Line, and verify the table calculation on the Top 10 Rank Only is set to compute by Country only.

Edit the Top 10 Rank Only axis and set the scale to be reversed

Change the Colour to grey and make the Size smaller.

Then add another instance of Top 10 Rank Only to site alongside the existing one on the Rows.(I tend to click on the existing pill, hold down ctrl and then drag – this will create a duplicate instance and will retain the table calculation settings).

Now make the chart dual axis & synchronise the axis.

Change the mark type of the second axis to be a shape, and add Is Host? to the Shape shelf. Adjust so that false is a filled circle and true is a filled star.

Also add Is Host? to the Size shelf on this marks card, and adjust the sizes so true is bigger than false.

At this point you will need to adjust the table calculation of the 2nd Top 10 Rank Only pill, to compute by both Country and Is Host?.

Add COLOUR: Country to the Colour shelf, and adjust the colours to use the Hue Circle palette. Set the NULL value to the same shade of grey as the line.

You’ll need to adjust the table calculation again of the 2nd Top 10 Rank By County pill, so only Year is unselected.

All the following fields need to be added to the Tooltip of the All marks card.

• Score
• # Countries Per Year
• Hosted | Participated
• Top 10 Rank Only (setting the table calc to compute by everything except Year)
• # Medals

Adjust the Tooltip accordingly, then tidy up the formatting of the chart

• Hide the Top 10 Rank Only axis
• Rotate the labels of the Year
• Hide the Year field label
• Remove all row & column lines and gridlines/zero lines

Medals Viz in Tooltip

The Bump chart has 2 Viz in Tooltips, one showing the count of the different medals won and the other showing the top 10 athletes. Build the Medals chart by

• Host.Year to Rows
• Medals.Country to Rows
• Medals.Bronze to Columns
• Then drag the Medals.Silver pill to the bottom of the chart where the Bronze axis is, and when you see 2 green columns, drop the pill. This should have the effect of Measure Values automatically being added to Columns, and Measure Names being automatically added to Rows and the Filter shelf.
• Then add Medals.Gold into the Measure Values pane
• Reorgansie the pills in the Measure Values pane so they are listed Gold, Silver, Bronze
• Show the mark Label

Now hide the Year and the Country columns, and the Value axis at the bottom. Remove all the formatting (the quickest way is to go to the bump chart sheet you should hopefully have formatted already, right click on the tab of the sheet at the bottom, and select Copy Formatting, then go back to the sheet you’re working on, and on the tab, right click & Paste Formatting

If this doesn’t clean everything up enough, just adjust formatting manually.

Finally, set the fit on this sheet to be Entire View. This will squash everything up, but when its referenced from the viz in tooltip, the view will be filtered to the Year and Country. Doing this will remove the ‘this view is too large’ message that may appear on the Viz in Tooltip.

Switch back to the Bump chart and add the sheet to the Tooltip by Insert -> Sheets -> <Select your sheet>. Adjust the maxwidth property to 500 and maxheight property to 100 to make the viz fit better

Top 10 Athletes Viz in Tooltip

Build the initial viz by

• Host.Year to Rows
• Medals.Country to Rows
• Medalists.Athlete to Rows
• Medalists,Medal to Columns and manually reorder the columns.
• Medalists.Medalists(Count) to Columns
• Medalists.Medal to Colour

To work out the Top 10, we need to first work out the score per medal

Medalist Score

CASE [Medal]
WHEN ‘Gold’ THEN 3
WHEN ‘Silver’ THEN 2
WHEN ‘Bronze’ THEN 1
END

then calculate the total score per athlete per games, since an athlete can win more than one medal and appear in multiple games

Total Score per Athlete

{FIXED [Year], [Country (Extract2)], [Athlete] : SUM([Medalist Score])}

where the Country is from the Medals table.

Use this field to sort the Athlete pill

We only want the Top 10 though, so add Athlete to the Filter shelf, and filter by the top 10 of Total Score per Athlete

At this point, things won’t look right, as the filter will be applying over all the data. To get this right, we now need to add the sheet to the Viz in Tooltip, so go back to the Bump chart, and on the tooltip add a reference to this sheet.

Adjust the width and explicitly filter by Host.Year, Medals.Country

If you now return to the Top 10 sheet, an additional filter will have been automatically added to the Filter shelf. Add this filter to context, so it will change to a grey pill.

Now return to the Bump chart and just test out that the chart is presenting correctly when hovering over various fields.

Now return to the Top 10 chart and tidy up all the formatting and hide various fields, and set to Entire View.

The bump chart should now be complete

Building the strip plot

The strip plot is showing a mark for each host with an indicator to show if they finished in the Top 10 or not. Additional information on the tooltip shows the host’s score and medals count. We’ll build this into a table first as below

We ultimately need to show 1 row per year, but the country level data is required to work out the rank. So we’re going to use some more table calculations.

We want to capture the host’s score and medals accrued against all the rows associated with each year.

Score for Host

WINDOW_MAX(IF ATTR([Is Host?]) THEN [Score] END)

Medals for Host

WINDOW_MAX(IF ATTR([Is Host?]) THEN [# Medals] END)

Add these to the table, setting the table calculation to compute over all fields except Year.

We also need to know if the host was in the top 10 or not

Is Host in Top 10?

WINDOW_MAX(IF ATTR([Is Host?]) AND [Rank Score]<=10 THEN 1 ELSE 0 END)

Add this onto the table, and again, sense check the table calculations (this one is nested) to ensure all are computing by all fields except Year

So we have all the bits of data we’ll need, we just need to reduce to 1 row per year.

Index = Size

INDEX() = SIZE()

Add this onto the Filter shelf, and set to true. Check the table calculation again to ensure all fields except Year are set. Recheck the filter is still only selecting true.

This should reduce the data, and just show the last row listed for each country.

• Host.Year to Columns
• Medals.Country to Detail
• Mark Type to Shape
• Is Host in Top 10? to Size and to Shape. Adjust the table calcs and shape/size accordingly (you’ll need to reverse the size).
• Host Country to Tooltip
• Medals for Host to Tooltip (remember table calc)
• Score for Host to Tooltip (remember table calc)
• Index = Size to Filter set to true (again remember table calc)
• Change colour to red

Top 10 Countries Viz in Tooltip

Add Host.Year, Host Country and Medals.Country to Rows and Score to Columns, then click the Sort button in the menu to sort the Country descending. Add Rank Score to Rows and change to be discrete (blue pill). Move it to sit in front of the Country field

We need to restrict the rows based on whether the Country in is the top 10 or it’s the host country

Is Host or Top 10?

ATTR([Is Host?]) OR [Rank Score]<=10

Add this onto the Filter shelf and set to true; once again make sure the table calc is computing by all fields other than Year (and double check its still set to true after adjusting). If you scroll to the bottom, you should see 11 rows for 2016, as the host, Brazil, finished 12th.

And now we need to colour the bars

COLOUR:Host

IF [Rank Score] > 10 THEN ‘Red’
ELSEIF ATTR([Is Host?]) THEN ‘Blue’
ELSE ‘Grey’
END

Add to the Colour shelf and adjust accordingly. Then add Score and # Medals to the Text shelf and format the label displayed. Finally hide some of the fields and format the chart to remove axis/gridlines etc.

Finally, set the fit to Entire View.

Go to the strip plot chart and add this sheet to the Tooltip, setting the filter to be on Year

And that should be all the components you need to build the dashboard.

My published version is here.

Happy vizzin’! Stay Safe!

Donna