Can you compare salaries to the median?

For the first WOW challenge of 2024, I set the task of completing a viz related to HR salary data.

Creating the calculations

To start, we’ll build out all the necessary calculations required, and display them in tabular form. First create

Employee

[Forename] + ‘ ‘ + [Surname]

then add Employee ID and Employee to Rows, along with Department and Job Title. Add Pay Level as a discrete dimension (blue pill) to Rows too, then format Salary to £ with 0 dp, and add to Text.

We want to compare each employee’s salary with the median salary of all the employees in the same pay level. For this we need

Median Salary per Pay Level

{FIXED [Pay Level]: MEDIAN([Salary])}

Add this to the view too. You should see that the value is the same for those rows where the Pay Level is the same.

We need to compute the difference between these numbers (to draw a line from the median to the salary)

Difference From Median

SUM([Salary])- SUM([Median Salary per Pay Level])

and we also need to understand the salary as a proportion of the median

Salary / Median %

SUM([Salary])/SUM([Median Salary per Pay Level])

format this to % with 0 dp and add both these fields to the table

Each employee is given a scale value based on the proportion

Salary Scale

IF [Salary / Median %] > 1.2 THEN 6
ELSEIF [Salary / Median %] >1.1 AND [Salary / Median %] <=1.2 THEN 5

ELSEIF [Salary / Median %] >1 AND [Salary / Median %] <= 1.1 THEN 4

ELSEIF [Salary / Median %] > 0.9 AND [Salary / Median %] <= 1 THEN 3

ELSEIF [Salary / Median %] > 0.8 AND [Salary / Median %] <=0.9 THEN 2
ELSE 1
END

Set this to be discrete, then add to the view on Rows.

Finally we need the markers for the 80%, 90% etc of the median salary

80% Median

0.8 * [Median Salary per Pay Level]

90% Median

0.9 * [Median Salary per Pay Level]

110% Median

1.1 * [Median Salary per Pay Level]

120% Median

1.2 * [Median Salary per Pay Level]

Building the Viz

On a new sheet, add Employee ID, Employee, Department and Job Title to Rows. Add Salary to Columns and set the mark type to Circle. Set the Colour of the circle to suit (or use #64cdcc).

Add Median Salary per Pay Level, 80% Median, 90% Median, 100% Median, 120% Median to the Detail shelf. Add Pay Level, Salary Scale and Salary / Median % to Tooltip.

Format the axis (right click) and set the scale to be at £k with 0dp

This will also change the formatting of the salary on the tooltip, but we want it to be more detailed so create

Tooltip-Salary

[Salary]

and format this to £ with 0 dp. Add this to Tooltip too and adjust tooltip to match.

Add a reference line to the axis (right click axis > add reference line) for the Median Salary per Pay Level. Set it to be per cell and display a dashed line with no labels/tooltips displaying.

Add another reference line. This time set it to be a band at the cell level which ranges from the 80% Median to the 90% Median. Don’t display any labels or tooltips or lines and fill the band with the appropriate colour.

Repeat the above process, adding a reference band from 90% – 110% of the median, and a further band from 110% – 120% of the median.

Add another instance of Salary to Columns. Change the mark type of the 2nd Salary marks card to gantt and adjust the colour. Add Difference From Median to Size and decrease the size as small as possible. Double click into the Difference From Median pill on the size shelf and manually type in * -1 to the end.

Now make the chart dual axis and synchronise the axis. Right click on the top axis and move marks to back.

We want the user to be able to filter the chart, so add Department, Pay Level (select All values) and Salary Scale to the Filter shelf.

Finally, format the chart – make each row a bit wider; set the background colour of the worksheet to #f9f8f7; remove column dividers; set row dividers to a thick white line; remove all gridlines, zero lines and axis rulers. Reduce the size of the axis values and increase the width of the initial 3 columns so the text doesn’t wrap as much. Test your filters.

If all working, then add the sheet to a dashboard.

Building the Legend

To build the legend, I simply duplicated the core sheet, then filtered to a specific Employee ID and hid the Employee ID, Employee, Department and Job Title fields. I then edited the reference lines to add custom labels to label each line band, formatting the text to display to the top or bottom. I added an annotation to the Salary circle mark to label that as ‘Employee Salary’ which I then manually moved into position.

When I added this to the dashboard, I then floated a blank object over the top so the legend could not be interacted with.

My published viz is here.

Happy vizzin’!

Donna

Can you track headcount?

Luke decided to set us a challenge this week based on Human Resources (HR) data, using a dataset provided as part of the #RWFD Real World Fake Data project managed by Mark Bradbourne.

The challenge focussed on reporting monthly headcount, using a dataset that contained 1 row per employee with a column relating to the start date (Hire Date) and a column relating to the date the employee left (Termdate), which could have been a future date or NULL/blank.

I’ve worked on headcount reporting before in my day job, but have used a snapshotted data source, which captures the active employees as at midnight on the 1st day of every month. This method makes counting the headcount each month very simple, as it’s just counting the number of rows per snapshot date.

Obviously, this wasn’t an option in this case. Luke gave very little clues as to what approach would be required, apart from saying ‘you will also need to manipulate the data’….

A word about the data…

Just a little side note here…. I ended up in discussions with my fellow #WOW participant Rosario Gauna as after attempting one method, I wasn’t getting as high numbers as the solution originally suggested. There were rows in the data set that had NULL values for most of the data except job title, but I think this was only around 225 rows, so even including them I didn’t get close to the figures. Rosario however, was somehow managing to see approx 15,000 rows of null data. I ended up building and publishing my initial solutions based on a version of the data sent to me directly by Rosario.

Since then, it seems others were also confused, and so the solution was adjusted to ignore all the null rows. As a result I reworked one of my solutions to use the original data source I downloaded from the site, and it’s this I will focus on in the blog.

A word about my solutions…

With little instruction, it took a bit of head-scratching to figure out how to work through the problem. I initially decided to use a scaffold dataset to help. I built a simple file in Excel which contained 1 column listing 1 date per month which went from 01 Jan 2017 to 01 Jan 2021. I then joined this to the HR data in Tableau Desktop using the physical data layer (ie I did not use relations), using join calculations as below (note, this is using the amended copy of data I received where Termdate was already defined as a date field.

The solution using this method is published here.

However, I didn’t feel completely comfortable with this approach. Luke had suggested the difficulty of this was 9/10 and referred to nested tableau calculations… the above solution just used quick table calcs. So I decided to see if I could come up with an alternative, just using the provided data, and that’s what I’ll document below.

Setting up the data

I downloaded the Human Resources.csv file from the link provided, and connected to it in Tableau Desktop. I found the Termdate field presented as a string field, so I converted it to a date, just by clicking on the ABC symbol of the data type against the field. I also added a data source filter to exclude all records where the Id field was NULL.

I then added a new data source, and connected to the same file again. This meant I had 2 instances of the Human Resources data source listed in my Data pane. I renamed the 2nd instance to be Leavers Only, changed the data type of the Termdate field again and added a data source filter so that only records with non-Null Termdate values were retained.

Note – What I will be doing here is using the Human Resources data source to manage the months we’re reporting headcount over. This method only works if there is at least 1 new starter every month, which in this case there is.

Building the calculations

The requirements state that a new starter doesn’t count in the monthly headcount figures until the following month, so the date field we need to use in the output, needs to be created as

Report Date

DATE(DATETRUNC(‘month’,DATEADD(‘month’,1, [Hire Date])))

This basically shifts the date the new starter joined to the 1st day of the following month.

To make the calculations easier to read, I also created

Starters

COUNT([2022_04_06_WW14_Human Resources.csv])

which is literally a reference to the field that automatically gets added as part of the connection – it’s just a bit of a mouthful as its named according the data connection.

Let’s pop these out in a table as below – I’m deliberately using the dateparts of the Report Date field as they’ll be used in the viz like this.

So this is just showing the new starters against the month at which they will count for headcount. We need to get details of leavers now, and we’ll do this by blending.

When I blend data sources, I tend to create specific fields so it’s clear to me that they’re being used for blending, rather than defining blend relationships between differently named fields. So in the Human Resources data source I’ve created

BLEND – Date

[Report Date]

and then in the Leavers data source, I’ve created

BLEND – Date

DATE(DATETRUNC(‘month’,DATEADD(‘month’,1,[Termdate])))

which is shifting the Termdate forward to the the 1st of the next month, since leavers should also only be recorded in the following month.

In the Human Resources data source I then created

Leavers Per Month

ZN(COUNT([2022_04_06_WW14_Human Resources – Leavers Only].[2022_04_06_WW14_Human Resources.csv]))

which is just referencing the automatically generated ‘count’ field from the Leavers data source. I’ve wrapped it in ZN so 0 is reported in the event no match is found.

Add Leavers Per Month into the table, and ensure the linking field from the secondary Leaver data source is connected on the BLEND – Date field

So now we know how many starters and leavers per month, we now need to total these up by generating a cumulative running sum of the starters, but subtracting any leavers along the way.

Official Headcount

([Starters] + PREVIOUS_VALUE(0)) – [Leavers Per Month]

Takes the value of the Starters in the current row, adds it to the value of Official Headcount from the previous row, then subtracts any Leavers recorded against the current row. Add this onto the table, and as its a table calculation, edit it so it is explicitly computing my the month & year of Report Date

For the annual change, we’re looking to compare the difference of the Official Headcount value for the current month eg Jan 2021, with the Official Headcount value for the same month 1 year ago eg Jan 2020.

Annual Change

(ZN([Official Headcount]) – LOOKUP(ZN([Official Headcount]), -12)) / ABS(LOOKUP(ZN([Official Headcount]), -12))

Take the Official Headcount from the current row, subtract the Official Headcount from 12 rows before, then divide the result by the Official Headcount from 12 rows before.

Format this to a percentage at 1 dp. (For some reason my numbers here don’t seem to match Luke’s revised solution…).

Again add to the sheet and set the table calcs of both the nested calcs to compute by month and year

The final measure was a bit of a strangely named one IMO. I felt I should be doing some rolling calculation, but ultimately, it just seemed to be reporting the difference between the headcount now and that 12 months ago ie the numerator of the calculation above.

Net Rolling 12 Change

(ZN([Official Headcount]) – LOOKUP(ZN([Official Headcount]), -12))

Pop this into the view, adjust the table calc settings again

Building the viz

On a new sheet, add Report Date at the month level, but then set to be continuous, to Columns and Report Date at the year level to Colour. Add Official Headcount to Rows (adding the linking field for the blend when prompted), and adjust the table calc settings to compute over both month & year, ensuring the Year of Report Date is listed first.

Then in the legend, select all the years from 2000-2017 and ‘hide’ (this is probably a cheat way, but the quickest – we need to retain the data from the previous years, so can’t just ‘filter’).

Edit the axis to not start from zero. Adjust colours to suit.

Add Annual Change and Net Rolling 12 Change to the Rows shelf, adjust both of the table calc settings, so all nested calcs are computing by Year & then Month.

Edit the axis of both these additional measures to also exclude 0. On the All marks card, click the Label button and tick the Show mark labels option.

Final steps ….

…on the All marks card, click the Tooltip button and uncheck Show tooltips so no tooltips display on hover.

Right click on the Month axis and format the axis so the dates in the Scale section are displayed as abbreviated dates

Edit the same axis and remove the title, then format the whole chart to

  • remove all row and column dividers
  • to set the row banding with a band size of 1
  • adjust the colour of the row and column gridlines to be a slightly darker shade of grey

And you’re done 🙂 My published viz is here

Had a bit of all sorts this week… I’ll be intrigued to see how Luke solved it!

Happy vizzin’!

Donna