On SPACE and DevEx: The Pitfalls of Using Surveys to Measure Software Engineering
A new study released today indicates flaws in the principal measurement technique used in the SPACE and DevEx frameworks, the successors to the “Four Key Metrics” from Google’s DORA team.
In a May 2023 article introducing the DevEx framework, the authors claimed: “Surveys in particular are a crucial tool for measuring DevEx and capturing feedback from developers about points of friction in the software-delivery process.”
However, the new research shows multiple issues with relying on subjective survey metrics rather than empirical source-of-truth data, even when survey responses are aggregated into statistics. Earlier research points to the same failures; for example, prior work cited in the investigation found that “those with the lowest programming skill” are the most likely to be over-optimistic when evaluating software delivery performance in large projects.
A 1992 study found that at one company, 32% of software engineers rated their performance in the top 5%; at the second company studied, this rose to 42%. Of 714 participants, only one rated their performance below average. The variance between the two companies highlights the problems with running such studies at a team or company level, while the clearly “statistically absurd result” highlights the problems with using such surveys as performance metrics.
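To see just how absurd that result is, here is a minimal sketch of the arithmetic. It assumes (purely for illustration) that each self-rating is independent and that, under a roughly symmetric performance distribution, an accurate rater has about a 50% chance of landing below average:

```python
from math import comb

# Illustrative assumption: accurate, independent self-ratings, where each
# engineer has probability p = 0.5 of correctly rating themselves below average.
n = 714   # participants in the 1992 study
p = 0.5   # assumed chance an accurate self-rating is "below average"

# Binomial probability that at most 1 of the 714 raters says "below average",
# i.e. P(X <= 1) for X ~ Binomial(714, 0.5).
p_at_most_one = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(2))
print(f"P(at most 1 below-average rating) = {p_at_most_one:.3e}")
```

Under those assumptions the probability is astronomically small (on the order of 10⁻²¹², far beyond any plausible chance outcome), which is exactly what “statistically absurd” means here: the observed ratings cannot be reconciled with accurate self-assessment.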
In the new research, when software engineers were asked to rate their own performance, 94% rated themselves average or better. To show how a team’s demographics can shape subjective assessments of performance, the research found that men are 26% more likely than women to consider themselves better-than-average performers. However, this positivity doesn’t extend to management: software engineers are nearly 17% more likely, on average, to agree to a great or moderate extent that managers elsewhere in the industry are generally good than to say the same of their own.
The research also highlights other barriers that stop engineers from speaking up. A majority (75%) of software engineers who did speak up about something they had seen at work reported facing retaliation, and a majority (59%) of those who didn’t speak up said they stayed silent out of fear of retaliation.
The research also points to the risk of excluding certain software engineers when these practices are used. Nearly one in three (31%) felt their achievements at work were celebrated either not at all or only “to a small extent”. With nearly one in four software engineers saying they were unable to take calculated risks without fear of negative consequences, the research underscores the challenges of using such surveys in team settings.
These cognitive biases can be seen at a population-wide level, but they are even trickier to manage at a team or company level, where each environment (and the degree to which software engineers feel they can voice their opinions) matters greatly. For this reason, engineering managers who use subjective surveys to measure developer experience will sometimes see their ratings decrease as they start to make improvements, because engineers feel better able to speak up.
Therefore, to celebrate successes and to mitigate invisible risks, software delivery must be evaluated objectively and empirically. At a team or company level, subjective measures such as surveys (even when the results are quantified) should not be the sole, or even the main, source of input.