In-Depth
Oops, There It Is: Object-Oriented Project Size Estimation
Java development efforts are sweeping through the enterprise. Existing legacy applications are being reengineered to include Web front ends, and e-business is an idea whose time has come. IBM’s VisualAge for Java includes a mechanism that allows a Java Web-based GUI front end to communicate with a CICS transaction running on the mainframe. Millions of lines of legacy COBOL/CICS code are being salvaged as a result of these enhancement efforts.
Unlike FORTRAN, COBOL, PL/I and Assembler, Java is an object-oriented language, and sizing an object-oriented project differs from sizing a procedural one. In the procedural world, lines-of-code estimates for each module can be plugged into Barry Boehm’s COCOMO model to determine the amount of time required to develop the procedural software artifacts. COCOMO and other procedural sizing techniques, such as function point estimation, have been used successfully for years to size procedural projects.
This article introduces an Object-Oriented Project Size Estimation technique that can size object-oriented projects. The technique, similar to its procedural counterparts, scores the software components and passes the scores to a predictive equation to calculate development time estimates.
Discussion
The statistical estimation technique discussed in this article was originally formulated by Dr. Simon Moser and Dr. Oscar Nierstrasz as part of a research project at the University of Berne in Switzerland. Their article, "The Effect of Object-Oriented Frameworks on Developer Productivity," describes the research effort in the September 1996 issue of IEEE "Computer." Within the article, the researchers define a concept called System Meter. System Meter is a measure of the complexity of a set of object-oriented classes and is related to the amount of time required to design, code and test a set of objects. According to the estimation technique, an object’s complexity is a function of the object’s name and of the number of its attributes, its methods and the parameters to those methods. Each of these object components contributes to a Point value for an object. According to the researchers, the Point value of an object is directly related to the amount of time needed to develop the object.
To prove the premise, the researchers gathered the actual amount of time to design, code and test objects written in various object-oriented programming languages. The application areas of the systems developed in the languages covered a broad range. Objects from these systems were analyzed to determine the Point value of each object. Point values were then combined with the actual amount of time taken to produce each object. This effort yielded a set of Point/Days Required to Develop ordered pairs. The ordered pairs were then subjected to regression analysis. The functional form of the regression model is as follows:
$$\text{Days Required to Develop} = (B_1 \cdot \text{Points}) + (B_2 \cdot \text{Points}^2)$$
The regression was performed and the researchers came up with B1 equal to 0.367 and B2 equal to 0.0000696 from their initial analysis. Note that the values of B1 and B2 are based upon objects that were part of 36 object-oriented systems that spanned various application domains.
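The predictive equation is simple enough to sketch in a few lines of Java. The class and method names below (`OopsizeEquation`, `daysToDevelop`) are illustrative assumptions, not part of the Oopsize software; only the two parameter values come from the article.

```java
final class OopsizeEquation {
    // Initial parameter estimates quoted in the article.
    static final double B1 = 0.367;
    static final double B2 = 0.0000696;

    /** Days Required to Develop = B1 * Points + B2 * Points^2. */
    static double daysToDevelop(double points, double b1, double b2) {
        return b1 * points + b2 * points * points;
    }

    /** Same equation, using the initial parameter estimates. */
    static double daysToDevelop(double points) {
        return daysToDevelop(points, B1, B2);
    }

    public static void main(String[] args) {
        // Print the estimate for a few sample Point values.
        for (int points : new int[] {10, 50, 100}) {
            System.out.printf("%d points -> %.2f days%n", points, daysToDevelop(points));
        }
    }
}
```

With these parameter values the quadratic term stays small: at 100 Points it adds about 0.7 days to a linear term of 36.7 days.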
The Object-Oriented Project Size Estimation (Oopsize) technique uses the initial estimates of B1 and B2 to predict how much time is required to design, code and test an object. The objects can be described in a Rational Rose class model, for instance. In the class model, the analyst defines object names, object attributes, object methods and parameters to object methods. Oopsize operates on the information in the class model, calculates the Point value for each object in the model, and plugs the Point values into the predictive equation to estimate the time needed to create each object.
Determining the Point Values for a Set of Objects
Point calculations are best illustrated by example. Consider the set of object definitions written in Java in Figure 1.
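The listing for Figure 1 does not appear in this text. The sketch below is a hypothetical reconstruction that is merely consistent with the walkthrough that follows. The names MyObject1, MyObject2, Method11, Method12, Parm11, Parm12, Attribute21, Method21 and Arg1 and the types String, int and float come from the walkthrough. The three attribute names of MyObject1, the argument types of Method11, the return type of Method12 and all method bodies are assumptions.

```java
// Hypothetical reconstruction of Figure 1; not the original listing.
class MyObject1 {
    // Three typed attributes. The attribute names are placeholders.
    String Attribute11;
    String Attribute12;
    String Attribute13;

    // A method with two typed arguments (argument types assumed).
    void Method11(String Parm11, String Parm12) {
        Attribute11 = Parm11;
        Attribute12 = Parm12;
    }

    // A method with a return value and no arguments (return type assumed).
    String Method12() {
        return Attribute13;
    }
}

class MyObject2 {
    // One attribute of type int.
    int Attribute21;

    // One method with a single float argument.
    void Method21(float Arg1) {
        Attribute21 = Math.round(Arg1);
    }
}
```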
Point values for the objects are formulated as follows:
a) For MyObject1
1. Points for MyObject1 are set equal to 0.
2. The name MyObject1 is parsed. MyObject1 contains two unique tokens: My and Object1. Therefore, Points = Points + 2, or 2 total points.
3. MyObject1 has 3 attributes. Each attribute contains 1 unique token. Therefore, Points = Points + 3, or 5. Each Attribute has a type (i.e., String). Therefore, Points = Points + 3, or 8.
4. MyObject1 contains a method: Method11. Method11 contains 1 unique token. Therefore, Points = Points + 1, or 9.
5. Method11 has two arguments: Parm11 and Parm12. Therefore, Points = Points + 2, or 11. Each argument has a type. Therefore, Points = Points + 2, or 13.
6. MyObject1 contains a second method: Method12. Method12 contains 1 unique token. Therefore, Points = Points + 1, or 14. Note that no points are awarded for the return value of Method12.
b) For MyObject2
1. Points for MyObject2 are set equal to 0.
2. The name MyObject2 is parsed. MyObject2 contains 1 unique token: Object2 ("My" was already used in MyObject1). Therefore, Points = Points + 1, or 1.
3. MyObject2 has 1 attribute: Attribute21. It contains 1 unique token and has a type (i.e., int). Therefore, Points = Points + 2, or 3.
4. MyObject2 contains Method21. Method21 contains 1 unique token. Therefore, Points = Points + 1, or 4.
5. Method21 has 1 argument: Arg1, which has a type (i.e., float). Therefore, Points = Points + 2, or 6.
Total points for MyObject1 and MyObject2 are 14 and 6, respectively. Substituting these Points values into the predictive equation given earlier (with B1 = 0.367 and B2 = 0.0000696), the number of days to design, code and test MyObject1 is 5.15 and the number of days to produce MyObject2 is 2.20. The total number of days for the entire project is 7.35.
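The counting rules in the walkthrough can be sketched in Java as follows. This is a minimal illustration rather than the Oopsize implementation, and it rests on several assumptions: names are split into tokens before each upper-case letter, a token earns a point only the first time it appears anywhere in the model, and each attribute or argument type earns one point regardless of repetition. `PointCounter`, `MethodSpec` and the three attribute names of MyObject1 are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PointCounter {
    /** A method name plus its parameter names (types are implied, one per parameter). */
    static final class MethodSpec {
        final String name;
        final List<String> params;
        MethodSpec(String name, String... params) {
            this.name = name;
            this.params = Arrays.asList(params);
        }
    }

    // Tokens already seen in the model; a repeated token earns no point.
    private final Set<String> seenTokens = new HashSet<String>();

    /** Splits a name before each upper-case letter: "MyObject1" -> [My, Object1]. */
    static List<String> tokenize(String name) {
        List<String> tokens = new ArrayList<String>();
        for (String token : name.split("(?=[A-Z])")) {
            if (!token.isEmpty()) tokens.add(token);
        }
        return tokens;
    }

    /** One point for every token of the name that has not been seen before. */
    int namePoints(String name) {
        int points = 0;
        for (String token : tokenize(name)) {
            if (seenTokens.add(token)) points++;
        }
        return points;
    }

    /** Scores one class; every attribute and parameter is assumed to have a type. */
    int classPoints(String className, List<String> attributes, List<MethodSpec> methods) {
        int points = namePoints(className);
        for (String attribute : attributes) {
            points += namePoints(attribute) + 1;   // name tokens + 1 for the type
        }
        for (MethodSpec method : methods) {
            points += namePoints(method.name);     // the return value earns nothing
            for (String param : method.params) {
                points += namePoints(param) + 1;   // name tokens + 1 for the type
            }
        }
        return points;
    }

    public static void main(String[] args) {
        PointCounter counter = new PointCounter();
        int first = counter.classPoints("MyObject1",
                Arrays.asList("Attribute11", "Attribute12", "Attribute13"),
                Arrays.asList(new MethodSpec("Method11", "Parm11", "Parm12"),
                              new MethodSpec("Method12")));
        int second = counter.classPoints("MyObject2",
                Arrays.asList("Attribute21"),
                Arrays.asList(new MethodSpec("Method21", "Arg1")));
        System.out.println("MyObject1: " + first + " points");   // prints "MyObject1: 14 points"
        System.out.println("MyObject2: " + second + " points");  // prints "MyObject2: 6 points"
    }
}
```

The order in which classes are scored matters under these rules: MyObject2 earns only 1 point for its name because "My" was already counted for MyObject1.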
Custom Size Estimates
In effect, the Oopsize technique can act as a starting point for object-oriented sizing efforts in your programming shop. The predictive equation parameter values are based upon industry averages. With a modest effort, the predictive equation can be modified to reflect your actual experience in developing objects. That way, your sizings will more accurately reflect development variables and conditions within your own programming shop.
To calibrate Oopsize for development in your own shop, you must record the actual amount of time required to produce a set of objects. You can track development times of objects for a Java project or two. Be sure to save the time estimates that Oopsize originally predicted, so that you can compare them with the actual times. This effort will result in a set of Points/Actual Days Required to Develop ordered pairs. Fit the predictive equation to this data by least squares, as follows:
Let $Y_i$ be the Actual Days Required to Develop object $i$ and $X_i$ the Points for object $i$, for $i = 1, \dots, n$. Minimize
$$Q = \sum_{i=1}^{n} \left(Y_i - B_1 X_i - B_2 X_i^2\right)^2$$
In order to find the minimum, differentiate with respect to each parameter B1 and B2. This yields two equations:
$$\frac{\partial Q}{\partial B_1} = -2 \sum_{i=1}^{n} \left(Y_i - B_1 X_i - B_2 X_i^2\right) X_i$$
and
$$\frac{\partial Q}{\partial B_2} = -2 \sum_{i=1}^{n} \left(Y_i - B_1 X_i - B_2 X_i^2\right) X_i^2$$
Setting the equations equal to 0 to find the minimum produces:
$$B_1 \sum_{i=1}^{n} X_i^2 + B_2 \sum_{i=1}^{n} X_i^3 = \sum_{i=1}^{n} X_i Y_i$$
and
$$B_1 \sum_{i=1}^{n} X_i^3 + B_2 \sum_{i=1}^{n} X_i^4 = \sum_{i=1}^{n} Y_i X_i^2$$
Then, if we define the following:
$$A = \sum_{i=1}^{n} X_i^2$$
$$B = \sum_{i=1}^{n} X_i^3$$
$$C = \sum_{i=1}^{n} Y_i X_i$$
$$D = \sum_{i=1}^{n} X_i^4$$
and
$$E = \sum_{i=1}^{n} Y_i X_i^2$$
we have
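$$B_1 = \frac{C - B_2 B}{A}$$
where
$$B_2 = \frac{E A - B C}{D A - B^2}$$
Here $B$ without a subscript is the sum of the $X_i^3$ terms, not a regression parameter. Compute $B_2$ first, then use it to compute $B_1$.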
Substituting quantities A, B, C, D and E in the above two equations produces the new predictive parameter estimates, B1 and B2. Use the new B1 and B2 in calculating object-oriented development time estimates, calibrated for development in your programming shop.
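The recalibration can be sketched in Java along these lines. This is an illustration under stated assumptions, not the Oopsize code: the class name `OopsizeCalibration` and the method `fit` are hypothetical. The sample data in `main` is synthetic, generated from known parameters purely to show that the closed-form solution recovers them.

```java
final class OopsizeCalibration {
    /**
     * Fits days = B1 * points + B2 * points^2 by least squares
     * and returns the pair {B1, B2}.
     */
    static double[] fit(double[] points, double[] days) {
        if (points.length != days.length || points.length < 2) {
            throw new IllegalArgumentException("need at least two matching observations");
        }
        // The sums A..E as defined in the article.
        double a = 0, b = 0, c = 0, d = 0, e = 0;
        for (int i = 0; i < points.length; i++) {
            double x = points[i];
            double y = days[i];
            double x2 = x * x;
            a += x2;        // sum of X^2
            b += x2 * x;    // sum of X^3
            c += y * x;     // sum of Y * X
            d += x2 * x2;   // sum of X^4
            e += y * x2;    // sum of Y * X^2
        }
        double denominator = d * a - b * b;
        if (denominator == 0.0) {
            // Happens, for example, when every object has the same Point value.
            throw new ArithmeticException("observations do not determine B1 and B2");
        }
        double b2 = (e * a - b * c) / denominator;
        double b1 = (c - b2 * b) / a;
        return new double[] {b1, b2};
    }

    public static void main(String[] args) {
        // Synthetic observations generated from B1 = 0.4 and B2 = 0.001.
        double[] points = {5, 10, 20, 40};
        double[] days = new double[points.length];
        for (int i = 0; i < points.length; i++) {
            days[i] = 0.4 * points[i] + 0.001 * points[i] * points[i];
        }
        double[] fitted = fit(points, days);
        System.out.printf("B1 = %.4f, B2 = %.6f%n", fitted[0], fitted[1]);
    }
}
```

Real development times will not lie exactly on the curve, so the fitted parameters will only approximate the data. The guard on the denominator matters in practice: the objects used for calibration need at least two distinct Point values.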
The more data points you use in recalibrating the predictive equation, the better your new parameter estimates. To put that statement in perspective, 60 actual object development times are better than 30.
Another point: If you feel that the factors that explain the amount of time required to develop objects in your shop have changed over time (e.g., through personnel turnover or the attainment of coding experience), then recalibrate Oopsize with the new actual times. You can recalibrate Oopsize at any time by gathering actual development times and recalculating the parameter estimates in the predictive equation.
The accuracy of your new predictive equation is measured by a statistic produced by Oopsize called RSQUARED. RSQUARED ranges from 0 to 1 and measures the extent to which the variation in Days Required to Develop is explained by the variation in Points. An RSQUARED value of .90 or better indicates that you have generated a good predictive equation. An RSQUARED value of .75 or better is largely acceptable, and may indeed be a better predictor of development times than the parameter estimates initially provided with Oopsize. For a complete discussion of "goodness of fit," pick up a statistical text on regression analysis.
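The article does not give the formula Oopsize uses for RSQUARED. The sketch below assumes the common textbook definition, 1 minus the ratio of the residual sum of squares to the total sum of squares about the mean. For a model with no intercept term, as here, this definition can drop below 0 for a very poor fit, so Oopsize may compute the statistic differently. The class and method names are hypothetical.

```java
final class GoodnessOfFit {
    /**
     * R-squared of the model days = b1 * points + b2 * points^2,
     * computed as 1 - SS_residual / SS_total (assumed definition).
     */
    static double rSquared(double[] points, double[] days, double b1, double b2) {
        if (points.length != days.length || points.length == 0) {
            throw new IllegalArgumentException("need matching, non-empty observations");
        }
        double mean = 0;
        for (double y : days) {
            mean += y;
        }
        mean /= days.length;

        double ssResidual = 0;
        double ssTotal = 0;
        for (int i = 0; i < points.length; i++) {
            double predicted = b1 * points[i] + b2 * points[i] * points[i];
            ssResidual += (days[i] - predicted) * (days[i] - predicted);
            ssTotal += (days[i] - mean) * (days[i] - mean);
        }
        if (ssTotal == 0.0) {
            throw new ArithmeticException("all actual times are identical");
        }
        return 1.0 - ssResidual / ssTotal;
    }
}
```

A call such as `GoodnessOfFit.rSquared(points, actualDays, b1, b2)` with your recalibrated parameters gives a value to compare against the .90 and .75 thresholds mentioned above.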
Conclusion
The Oopsize technique has been used successfully to size object-oriented projects at two major corporations here in Colorado. The comment was made that the estimates produced by Oopsize put the project manager "in the ballpark" with regard to sizing object-oriented development times. Other estimates, put together using best guesses, etc., proved to be off by orders of magnitude.
Oopsize certainly does a better job. For a copy of software that performs the functions described in this article, visit www.softengprod.com.
About the Author: Dick Brodine has been a teacher, writer and software developer on mainframes and other platforms for 23 years. He holds a Masters of Science in Computer Science, a Masters of Science in Operations Research and a Masters in Energy Resources. He can be reached at rbrodine@us.ibm.com.