Size and Effort
In software development, larger applications take more effort to develop than smaller ones. That seems reasonable, and is what we would expect. What isn’t obvious is that the relationships between the core metrics are all nonlinear. For example, effort does not grow in direct proportion to size.
This relationship causes some surprises. For example, software productivity (size produced per unit of effort) rises as software size rises. QSM’s data (more than 10,000 software projects) shows a clear upward trend in productivity as application size increases. This is true whether we use measures like QSM’s PI (Productivity Index) or ratio-based productivity measures (e.g., SLOC or Function Points per person-month of effort).
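To make the arithmetic concrete, here is a minimal sketch of one way productivity can rise with size. It assumes effort follows a power law in size with an exponent below 1. The function names and the coefficients `A` and `B` are hypothetical, chosen only for illustration; they are not QSM values or QSM's model.

```python
# Hypothetical power-law effort model: effort = A * size**B.
# The coefficients are made up for illustration; they are not QSM values.
A = 0.05  # assumed scale factor
B = 0.8   # assumed exponent; any value below 1 gives rising productivity


def effort_pm(size_sloc: float) -> float:
    """Effort in person-months predicted by the assumed power law."""
    return A * size_sloc ** B


def productivity(size_sloc: float) -> float:
    """Ratio-based productivity: SLOC per person-month."""
    return size_sloc / effort_pm(size_sloc)


sizes = [1_000, 10_000, 100_000, 1_000_000]
rows = [(s, effort_pm(s), productivity(s)) for s in sizes]
for size, effort, prod in rows:
    print(f"{size:>9} SLOC  {effort:>10.1f} PM  {prod:>8.1f} SLOC/PM")
```

Under this assumption, total effort still grows with size, but more slowly than size does, so the SLOC/PM ratio climbs.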
I took another look at productivity data. The follow-up question I answered was, “Does productivity (measured as SLOC/PM) always increase with system size, or could the size-productivity relationship actually behave differently in certain regions of the size spectrum?” To answer this question, I used standardized residuals to evaluate the size/productivity regression trend.
Simply put, residuals measure the difference between predicted values (the value of the regression trend at a particular size) and actual metric values. If the regression line provides a poor “fit” in certain size regimes, the residual values will reflect the gap between the values predicted by the trend and actual productivity values for that size regime.
As can be seen in the figure above, the residuals form an almost perfect normal distribution. This implies that there was no unexplained skew in the data.
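The residual check described above can be sketched in a few lines. This is an illustration on synthetic data, not the QSM analysis: the sample, the log10 transformation of both variables, the trend coefficients, and the helper `fit_line` are all assumptions. It also uses a simple standardization (each residual divided by the residual standard error); statistical packages often apply an additional leverage adjustment.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the synthetic sample is repeatable

# Synthetic projects (NOT QSM data): log10 size, and log10 productivity that
# follows an assumed linear trend plus normally distributed noise.
log_size = [random.uniform(3.0, 6.0) for _ in range(500)]
log_prod = [1.0 + 0.2 * x + random.gauss(0.0, 0.3) for x in log_size]


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = fit_line(log_size, log_prod)

# Residual = actual value minus the value predicted by the trend line.
residuals = [y - (intercept + slope * x) for x, y in zip(log_size, log_prod)]

# Simple standardization: divide by the residual standard error
# (n - 2 degrees of freedom for a two-parameter fit).
n = len(residuals)
std_err = math.sqrt(sum(r * r for r in residuals) / (n - 2))
std_resid = [r / std_err for r in residuals]

# Inspect each size band separately: a mean standardized residual far from
# zero would flag a size regime where the trend fits poorly.
for lo, hi in [(3.0, 4.0), (4.0, 5.0), (5.0, 6.0)]:
    band = [z for x, z in zip(log_size, std_resid) if lo <= x < hi]
    print(f"log10(size) in [{lo}, {hi}): mean z = {statistics.fmean(band):+.3f}")
```

Grouping the standardized residuals by size band is one way to ask the question in the text: whether the trend over- or under-predicts productivity in a particular region of the size spectrum.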
Productivity and Staff Paradox
Effort, productivity, and staff size all tend to be higher on larger projects. This can be seen in the following figure, which uses over 4,000 projects completed between 2001 and 2011.
Previous research (e.g., see Armel in References) has shown that larger team sizes (higher staff) result in lower productivity.
So, large projects have higher productivity. And large projects have higher staff. But higher staff results in lower productivity. How can this be?
To understand the underlying relationships, we need a way to visually examine three variables at once: size, productivity, and staff.
Clustered boxplots provide a view of the trends. A box in a boxplot represents the interquartile range of the data. The bottom of the box is at the first quartile (25th percentile). The dark line inside the box is the median (50th percentile). The top of the box is the third quartile (75th percentile). The “whiskers” extending out from the box represent the range of the values. Individual outliers (if any) show up as circles, and extreme values are asterisks.
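As a rough sketch, the numbers behind a single box can be computed as follows. The sample values are made up, and `box_stats` is a hypothetical helper. The article does not state the cutoffs that separate outliers from extreme values; the 1.5 × IQR and 3 × IQR thresholds used here are a common convention and are an assumption.

```python
import statistics


def box_stats(values):
    """Quartiles plus an outlier/extreme label for each value."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box

    def classify(v):
        # Distance beyond the nearer edge of the box (0 if inside the box).
        distance = max(q1 - v, v - q3, 0)
        # Assumed convention, not stated in the article:
        # more than 3 * IQR is "extreme", more than 1.5 * IQR is "outlier".
        if distance > 3 * iqr:
            return "extreme"
        if distance > 1.5 * iqr:
            return "outlier"
        return "inside"

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "labels": [classify(v) for v in values],
    }


# Made-up productivity values (SLOC/PM), for illustration only.
sample = [120, 150, 160, 175, 180, 190, 210, 230, 400, 900]
stats = box_stats(sample)
print(stats)
```

With this sample, 400 is labeled an outlier and 900 an extreme value, while the remaining points fall within 1.5 × IQR of the box.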
To create the following plot, the projects were first divided into quartiles for size and also into quartiles for peak staff. Quartile 1 has the smallest 25% of projects and quartile 4 has the largest 25%. Productivity on the vertical axis is expressed on a log scale to improve readability.
In the above plot, productivity decreases as peak staff increases within each quartile of size. To see this, pick any of the size quartiles, and compare the position of the 4 adjacent boxes. In the next graph, this has been done with an oval drawn around the second quartile of size. Productivity drops as staff increases, for a given size.
Next, we can see that within each quartile of peak staff, productivity increases as size increases. Pick any color of box, and compare the position of the 4 boxes with the same color (one from each quartile of size). For a given staff size, productivity is higher on larger projects. In the following graph, the largest staff sizes are identified with arrows.
Simple productivity is higher on larger software development projects. Smaller team sizes tend to have higher productivity. With this data set, we’ve shown that these two statements are not mutually exclusive. Larger teams become more productive as project size increases, but productivity increases even further as team size decreases.
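The quartile comparison described above can be reproduced in outline on synthetic data. The sketch below is not the QSM data set or analysis: the toy model (staff tends to grow with size, productivity rises with size and falls with staff), its coefficients, and the helper `quartile_of` are assumptions made only to show the mechanics of binning projects by size quartile and peak staff quartile.

```python
import random
import statistics
from collections import defaultdict

random.seed(1)  # fixed seed so the synthetic sample is repeatable

# Synthetic projects (NOT QSM data), all values on a log10 scale.
projects = []
for _ in range(4000):
    log_size = random.uniform(3.0, 6.0)
    log_staff = 0.2 * log_size + random.gauss(0.0, 0.3)
    log_prod = 0.8 * log_size - 0.5 * log_staff + random.gauss(0.0, 0.1)
    projects.append((log_size, log_staff, log_prod))


def quartile_of(value, cuts):
    """Return 1..4 given the three quartile cut points."""
    return 1 + sum(value > c for c in cuts)


size_cuts = statistics.quantiles([p[0] for p in projects], n=4)
staff_cuts = statistics.quantiles([p[1] for p in projects], n=4)

# Group productivity values by (size quartile, staff quartile).
cells = defaultdict(list)
for log_size, log_staff, log_prod in projects:
    key = (quartile_of(log_size, size_cuts), quartile_of(log_staff, staff_cuts))
    cells[key].append(log_prod)

median_prod = {key: statistics.median(vals) for key, vals in cells.items()}

# Rows are size quartiles, columns are staff quartiles 1..4.
for sq in range(1, 5):
    row = "  ".join(f"{median_prod[(sq, tq)]:5.2f}" for tq in range(1, 5))
    print(f"size Q{sq}: {row}")
```

Reading across a row of the printed table compares staff quartiles at a fixed size quartile; reading down a column compares size quartiles at a fixed staff quartile. These are the same two comparisons the clustered boxplots support.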
For additional information on this topic, please take a look at the QSM Benchmark Table and the other resources listed below.
About the Author: Paul Below has over 30 years of experience in technology measurement, statistical analysis, estimating, Six Sigma, and data mining. As a Principal Consultant with QSM, he provides clients with statistical analysis of operational performance, process improvement, and predictability. He is a Six Sigma Black Belt, and has one US Patent.
- MB: Main Build, which includes Design, Code, and Test.
- MM: Man Month. 1 MM is 1 person working 1 month.
- N: Sample size, or the number of projects used in the analysis.
- PM: Programmer Month (similar to MM).
- SLOC: Source Lines of Code.
References
- “Best-in-Class Performers Use Smaller Teams”, p. 33 of Armel, Kate. “Data-driven Estimation, Management Lead to High Quality”. QSM Software Almanac: Application Development Series, 2014 Research Edition. 25-42. Print and online PDF.
- QSM Performance Benchmark Tables. http://www.qsm.com/resources/performance-benchmark-tables
- Armel, Kate. “Small Teams Deliver Lower Cost, Higher Quality”. QSM Software Almanac: Application Development Series, 2014 Research Edition. 73-75. Print and PDF.
- Below, Paul. “Maximizing Value: Understanding Metrics Paradoxes through use of Transformation”. The IFPUG Guide to IT and Software Measurement: A Comprehensive International Guide. Ed. IFPUG, New York, CRC Press, 2012. 319-333. Print.
- Below, Paul. “Data Mining for Process Improvement”. Department of Defense Journal Crosstalk, Jan/Feb 2011, 10-14. Print and online PDF.