Can you create a jittered bar chart?

February 16, 2023 Donna Coles1 Comment

The theme of alternative chart types continued with Lorna setting this challenge to create a jittered bar chart. We needed to use the same fake survey data set used in week 4, so initially some data remodelling was required. I’ve already blogged that here, so please refer to the modelling the data section if you need help.

The questions all need to be grouped based on the text that preceded the question, so I created

Question Group

TRIM( SPLIT( [Question], “-“, 1 ) )

which split off the text in front of the first ‘-‘. I actually created this by right-clicking on the Question field > Transform > Custom Split and selecting to to split off the first 1 column using the – separator.

I also created a field to number the responses

Response No

CASE [Response]
WHEN ‘Strongly Disagree’ THEN 1
WHEN ‘Disagree’ THEN 2
WHEN ‘Neutral’ THEn 3
WHEN ‘Agree’ THEN 4
ELSE 5
END

Once I’d done that I was able to ratify the numbers by building a simple table with Response No and Response on Columns, Question Group on Rows, and CNT(Id) on Text.

Eyeballing these numbers against where the ‘bars’ in Lorna’s viz stopped and I figured that was on the right track. However then I wasn’t particularly sure what to do next.

Lorna’s instructions were very brief. I figured I’d need to use a jitter (ie RANDOM()) somewhere, and as I’d need to add Id to the Detail shelf at some point, to generate 1 dot per respondent, I deduced I’d also need

Total Respondents

{FIXED [Response], [Question Group]:COUNT([Id])}

Adding this into by table above and I got the same values as CNT(Id) which is what I expected.

After scratching my head a bit, I decided that Lorna was possibly being vague, as there was probably help ‘out there’. So I googled, and immediately stumbled upon this very useful blog post, which pointed me in the right direction for the additional calculations required.

I created

Columns

RANDOM()

Jitter

RANDOM()

note – due to the way RANDOM() works, Jitter and Columns won’t contain the same value.

Rows

[Jitter] * [Total Respondents]

And with these I could build out the viz

Add Response No, Response and Columns to Columns. Change Columns to be a dimension.

Add Question Group and Rows to Rows. Change Rows to be a dimension. Add Id to Detail.

Change the mark type to circle, reduce the size and add Question Group to Colour and adjust accordingly.

Finally add Question to Tooltip and adjust the tooltip. Then remove all gridlines, hide the Columns axis, change the title of the Rows axis, and remove the row and column titles (hide field labels for rows / columns).

A pretty short blog today! My published viz is here.

Happy vizzin’!

Donna

Can you create a normalised jitter plot?

February 9, 2023 Donna Coles1 Comment

Continuing the theme of alternative chart types, Kyle decided to challenge us to recreate this jitter plot inspired by an example from The Big Book of Dashboards.

I’ve built jitter plots in the past for #WorkoutWednesday challenges (the hidden RANDOM() function is your friend in this), but I wanted to see how far I could get without having to peak at my previous solutions.

So I connected to the baseball data provided, and cracked on, building the jitter in a single sheet using Measure Names on the columns. But then I got stuck when it came to labelling the tooltips…. surely this didn’t need a sheet per measure did it…

…maybe it did… so I proceeded to recreate as 4 separate sheets, and felt quite smug that I’d managed to make use of the ‘little used’ worksheet caption to provide the summary detail at the bottom of each measure. When I’d finished, I checked Kyle’s solution, as the summary values for the SLG & OPS measures seemed to be mixed up…. and what did I find…. he had managed to build the jitter within a single sheet as he had pivoted the data first! Argggghhhh! It just hadn’t crossed my mind, and Kyle had chosen not to drop that hint in the requirements…. hey ho! c’est la vie! I may well recreate with a pivoted version at a later date, but for now, I am blogging what I did…

My solution has ended up with a lot of calculated fields as a result, as equivalent fields needed to be created for every measure. Most of this was managed via duplicating and editing existing fields, so it actually wasn’t too onerous.

Building the calculated fields

We’ll start as usual by building out the fields required, and will focus on the BA measure initially.

Add Name and BA into a tabular view and Sort descending. Format the BA measure to be a number with 3 decimal places.

We need to know the rank of each player. Create a calculated field

Rank BA

RANK(SUM([BA]))

Add this to the view, and we should get the player rankings displayed from 1 downwards.

Now the requirement wants us to normalise the measures so they can be displayed on the same axis (or in my case, since it’s not a single chart I’m building), within the same axis range.

What this means is we want to plot the measure on a scale between 0 and 1 where 0 represents the lowest measure value, and 1 the highest. for this we need

Min BA

{FIXED :MIN([BA])}

and

Max BA

{FIXED :MAX([BA])}

The normalised value then becomes

Normalise BA

([BA] – [Min BA])/([Max BA]-[Min BA])

The difference between the current value and the lowest value, as a proportion of the range (ie the difference between the highest and lowest values).

Adding this to the table, you should see that the Normalise BA value for the highest ranked player is 1 and that for the lowest ranked is 0.

As part of the information displayed, we also need to know the percentile each player is in.

Percentile BA

RANK_PERCENTILE(SUM([BA]))

Format this to a percentage with 0 dp and add into the table.

Next we need to identify the player selected, so we’re going to create a parameter based off of the Name.

Select a Player

Right click Name -> create -> Parameter. This will open the parameter dialog and auto populate the list of options with the values from the Name field. Default the parameter to Julio Rodriguez.

We can then create a field to identify if the player is the one selected

Is Selected Player?

[Name] = [Select a Player]

Add this into the table, on the Rows before Name and sort so True is listed at the top (just easier to check the results).

So now we need to identify the rank and percentile of the selected player only

Selected Player BA Rank

WINDOW_MAX(IF ATTR([Is Selected Player]) THEN [Rank BA] END)

format this to a number with 0 dp

Selected Player BA Percentile

WINDOW_MAX(IF ATTR([Is Selected Player]) THEN [Percentile BA] END)

format this to a percentage with 0 dp.

The window_max function has the effect of ‘spreading’ the result over all the rows.

Finally we need to get a count of all the players

Count Players

{FIXED:COUNTD([Name])}

format this to a number with 0 dp.

Building the Jitter Plot

To build a jitter plot, we need to plot each mark against 2 axes. The Normalise BA measure is one axis, but we need to create ‘something’ for the other. This is the value to make the ‘jitter’ which is essentially an arbitrary value between 0 and 1 that we can plot the mark against, and means the marks don’t all end up in a single line on top of each other, and we can get a better ‘feel’ for the volume of data being represented.

Jitter

RANDOM()

The random() function is a ‘hidden’ function that will, as it’s name suggests, generate a random number. It is ‘hidden’ as it only works with some data sources. Excel for example is fine, but if you were connected to a Snowflake database, you can’t use it.

The nature of random, also means that you can’t guarantee the value it produces, and it will regenerate on data refresh, so if you’re looking to compare your solution directly, your dots will not be positioned exactly the same.

On a new sheet add Jitter to Columns and Normalise BA to Rows. Add Name to Detail and change the mark type to Circle.

Add Is Selected Player to Colour, adjust accordingly and add a border to the circle. I dropped the opacity to 70%. Order the colour legend, so True is listed first, to ensure this circle is always ‘on top’.

Then add Is Selected Player to Size. Edit the sizes so they are reversed and adjust the sizes until you’re happy.

To label just the selected player mark

Label:BA

IF [Is Selected Player] THEN [BA] END

format this with a custom number font ,##.000;-#,##.000

Add this to the Label shelf and adjust the font colour, and align centrally

Add Rank BA and BA to the Tooltip shelf and adjust tooltip to suit. You will need to adjust the table calculation setting of the Rank BA field so that it is computing by all the fields.

Add Selected Player BA Rank and Selected Player BA Percentile and Count Players to the Detail shelf. Adjust the table calculations as above (including any nested calcs), then show the worksheet caption (Worksheet -> Show Caption), and edit the caption to display the relevant text.

From the analytics pane, drag the Median with Quartiles option onto the canvas and drop it on the table / Normalise BA axis option. Remove the quartile reference lines (right cick axis -> remove reference line), and edit the median reference line to be a dashed line with no label.

Finally remove all gridlines/dividers/axes lines and hide the axes. Title the sheet as per the measure ie BA, and align centrally.

Format the Caption and the Title to have a light grey background and a slightly darker thin border.

Now, repeat all that for the other measures 🙂 This isn’t that bad. All the fields above labelled BA, need duplicating, renaming and updated to reference the next measure eg OBP.

Once done, duplicate the BA jitter plot sheet, and replace all the ‘BA’ related fields with the equivalent ones, by dragging the equivalent field and dropping it directly on top. Sense check the table calculation settings are all ok. You may need to update the text in the caption, as that seems to lose anything to do with the table calculation fields referenced when they get touched.

Ultimately you should end up with 4 sheets.

Putting it all together

On a dashboard, use a horizontal container to position all 4 sheets in side by side. Show the worksheet caption for each sheet. Reduce the outer padding for each sheet to 0, and add a thin border around each sheet.

Add a parameter action to drive the interactivity ‘on click’ of a circle

Select Player

On select of any of the source sheets, update the Select a Player parameter with the value from the Name field. Retain the selected value on ‘unclick’

To prevent the selection on click from being ‘highlighted’, and al the other marks ‘fading’, we need one final step.

Create new calculated fields

True

TRUE

False

FALSE

Add both these fields to the Detail shelf of each of the 4 sheets.

Then add a dashboard filter action for each sheet which on select, goes from the sheet on the dashboard to the worksheet itself, passing the selected fields of True = False. Show all values when unselected.

My published viz is here. Kyle’s solution with a lot less calculated fields, and only 2 sheets (1 for the jitter and 1 for the summary section at the bottom) is here. You will need to pivot the data via the data source pane first through 🙂 Next time, when I really feel something should be able to be done in 1 sheet, I’ll try to think a little longer…. upshot though, I impressed myself at the use of the caption for the summary – something I must consider using more often!

Happy vizzin’!

Donna

Can you create a jittered box plot?

March 24, 2022 Donna ColesLeave a comment

It was Kyle’s turn to set this jittered box plot challenge this week. While it may sound complicated, this is quite a straightforward challenge this week, made more so as Kyle very kindly provides references to other blogs which help you.

So let’s build…

Firstly you need to download the 2 excel files Kyle provides, then relate them via the Team field (this should happen automatically).

On a new sheet add Team to Columns, Age to Rows and Player Code to Detail. Change the mark type to circle, and reduce the Size slightly.

From the Analytics pane, drag box plot onto the sheet and drop onto the Cell image that displays.

Create a new field which is the key field for the jittering functionality

Jitter

RANDOM()

This just generates a random number between 0 and 1.

Add this to Columns and change the view to Entire View so its not all squashed up.

We’ve got our jittered box plot. Now we just have to add in the additional functionality required.

Drag the Playoffs field so that it’s above the line in the data pane on the right hand side (ie change it from a measure to a dimension). Then right click > Aliases to alias to 0 and 1 values to the labels required

Add this field to the Columns in front of the Team field. Re-order so ‘Playoff Teams’ is listed first (I just click on the field name and drag it to the left).

We also need to sort the data based on the median value per team. We need a new field for this.

Median Age per Team

{FIXED [Team]:MEDIAN([Age])}

Add a sort to the Team field, so it sorts by the Median Age per Team descending

Add League to the Filter shelf and set to AL.

Format the worksheet and set the Column Banding to be level 0 and band size 1 to shade the sections as required.

Then adjust the format of the gridlines to remove all row and column gridlines.

Finally, add Name to the Tooltip and adjust accordingly. Then hide the Jitter axes (uncheck Show Header) and adjust the Age axis so it is fixed to start at 18.

You can now add this to a dashboard and you’re done! My published viz is here.

Happy vizzin!

Donna

Donna + DataViz

ramblings in all things Tableau (and occasionally some other Stuff….)

Tag: Random

Can you create a jittered bar chart?

Can you create a normalised jitter plot?

Can you create a jittered box plot?