Forecasting the Final Cost of Iraqi Public School Projects Using Regression Analysis

The actual final cost of public school building projects, like that of other construction projects, is unknown to the owner until the final account statement is prepared. This study attempts to predict the final cost of such projects before work starts, using the backward elimination regression analysis technique. The study covers two-story (12-class) school projects awarded by the lowest bid system. Records of (65) school projects completed during (2007-2012) are employed to develop and verify the regression model. Based on experts' opinions, nine factors are considered to have the most significant impact on the final cost; hence they are used as model input parameters. These factors are: awarded bid price, average bid price, estimated cost, contractor rank, resident engineer experience, project location, number of bidders, year of contracting, and contractual project duration. It was found that the developed regression model has the ability to predict the final cost (FC) of school projects, as an output, with very good accuracy, having a correlation coefficient ($R$) of (93%), a determination coefficient ($R^2$) of (86.5%), and an average accuracy percentage of (92.02%).


INTRODUCTION
Construction project costs are influenced by several factors. These factors are related to project characteristics, construction teams, and market conditions. When unexpected events occur during the execution phase of construction projects, their final costs are driven up. Most such events are uncontrollable factors that increase the gap between the contract award price and the final completion cost. It is therefore important that the client know what contingencies must be in hand to ensure the project's final completion in time. Lack of information about these factors, lack of relevant data, and weak anticipation of the circumstances the project may face are the main challenges facing researchers in this field. This research attempts to use real measurable parameters, available before the project starts, as predictors of the expected final cost of school projects.

Research Objectives
This research aims at the following objectives:
• To explore the factors that can be used to predict the final cost of school building projects before starting works.
• To raise the efficiency of estimating initial costs using data already in hand.
• To build a mathematical model using multiple regression analysis to predict cost deviation in school building projects before starting works.

Research Hypothesis
At the project start phase, it can be said that awarded bid price, average bid price, estimated cost, contractor rank, resident engineer experience, project location, number of bidders, year of contracting, and contractual project duration are good predictors of the final cost of public school building projects before starting works.

Research Justification
The reasons for carrying out this research are:
• The large number of school projects under construction, accompanied by persistent cost overruns, and the ever-growing demand for additional school buildings in Iraq.
• The need for an accurate anticipated final cost of a construction project before starting works, which is essential for budgeting, especially for contingency allocation.

Suitability of Multiple Regression
The objective of most parametric cost estimating approaches is to use historical cost data to find a functional relationship between changes in cost and the factors affecting the final cost. The regression technique is a statistical modeling method that can be used for analysis and prediction in different knowledge domains. Multiple regression estimation models are well established and widely used in cost estimation. They are effective due to their well-defined mathematical procedure, as well as their ability to explain the significance of each variable and the relationships between them. Basically, regression models are intended to find the linear combination of variables which best correlates with the dependent variable. The general regression equation is expressed as follows [1]:

$$Y = A_0 + A_1 I_1 + A_2 I_2 + \dots + A_n I_n$$

where $Y$ is the total estimated final result, $A_0$ is a constant estimated by regression analysis, and $A_1, A_2, \dots, A_n$ are coefficients also estimated by regression analysis, given the availability of some relevant data $I_1, I_2, \dots, I_n$ as measured distinguishable variables that may help in estimating $Y$ [2].
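As an illustration of how the constant and coefficients of the general regression equation can be estimated, the following is a minimal sketch using ordinary least squares in NumPy. The function names `fit_linear_model` and `predict` are hypothetical, and the data are synthetic values chosen for illustration only; they are not taken from the study.

```python
import numpy as np

def fit_linear_model(inputs, y):
    """Estimate A_0..A_n in Y = A_0 + A_1*I_1 + ... + A_n*I_n by least squares."""
    inputs = np.asarray(inputs, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the constant A_0.
    design = np.column_stack([np.ones(len(y)), inputs])
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients

def predict(coefficients, inputs):
    """Evaluate the fitted equation for new input rows."""
    inputs = np.asarray(inputs, dtype=float)
    return coefficients[0] + inputs @ coefficients[1:]

# Synthetic, illustrative data (assumption): Y = 2 + 3*I_1 - 1*I_2 exactly.
I = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]])
Y = 2.0 + 3.0 * I[:, 0] - 1.0 * I[:, 1]

A = fit_linear_model(I, Y)   # A[0] is the constant, A[1:] are the slopes
Y_hat = predict(A, I)
```

Because the synthetic outputs are generated without noise, the fitted coefficients recover the generating values; with real project records the fit would only approximate the data.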

Literature Review
The literature shows a variety of methods used to predict the final cost of a project and its deviations. Many variables were used as predictors in those studies. Williams [3] developed five mathematical models to predict the final cost of highway construction in five states in the USA, using the low bid as the independent variable. From competitive bidding of highway construction projects, the low bid price and the cost of the completed contract were obtained for each project. These models aimed at predicting the project final cost with the low bid as the only input. Wibowo and Wuryanti [4] studied education building projects in Indonesia to prepare early-stage cost estimate models. They found that the estimated cost at early stages could be predicted from the total project area.
Olatunji [5] collected data on (137) public contract projects executed between (2003) and (2007) in Nigeria. Lowest/winning bid, average bid, consultant's cost estimate, and gross floor area were the model variables used to predict the final construction cost. The resulting regression model has an adjusted $R^2$ value of (0.949). Mahamid and Amund [6] investigated the statistical relationship between actual and estimated cost of road construction projects. Data collected from (169) road construction projects awarded in the West Bank in Palestine over the years (2004-2008) were analyzed. The study concluded that (100%) of road projects in Palestine suffer from cost deviation. They developed a regression model with a coefficient of determination ($R^2$) of (0.96).
Bedford [7] studied the risks of excessive competition in the Canadian public sector, which awards contracts solely to the low bid. It was concluded that the bidding process is a good indicator of the final cost and possible cost escalations. Ganiyu and Zubairu [8] developed a predictive cost model for public building projects in Nigeria using principal components regression. The study showed that the project cost basically depends on factors related to: adequacy of equipment, experience in similar projects, time allowed for bidding, level of technology, client commitment to time, repetitive work, design complexity, communications, project scope, construction complexity, and previous relationship with the client.
Mohd et al. [9] studied the historical data of (83) school projects in the Malaysian public sector. Multiple regression analysis was used to assess the effect of the lowest bid, average tender price, and the winning tender in explaining the deviation in cost estimation. The regression model from mean bids showed that project size, number of tenders, type of school, and location are the best-fitted predictors to explain biased estimates.
Aziz [10] investigated and ranked factors perceived to affect cost variation in Egyptian wastewater projects. It was found that factors such as lowest bid procurement method, additional works, bureaucracy in bidding, tendering method, wrong method of cost estimation, and funding problems were crucial in causing cost variation, while inaccurate cost estimation, mode of payments, unexpected ground conditions, inflation, and price fluctuation were less important.

Data Collection
The initial parameters intended for use in the model were collected from the literature review of previous studies, as shown in Table (1). A questionnaire was directed to (50) local experts, all owner's representative engineers in the public sector, in order to determine the most significant factors in predicting the final cost of school projects before they start. Thirty-two completed forms, representing (64%) of the total number of questionnaires, were returned. The respondents were asked to select the parameters that they believe are most important in developing the mathematical model.

Model Formulation
Previous studies showed different methods used to study the relation between the final cost and the factors believed to influence it. In this research, a backward elimination regression technique is adopted to analyze historical cost data in order to provide a powerful model to assist budgeting and cost estimating before work starts. The Statistical Package for the Social Sciences (SPSS, version 20) and MS Excel are used to analyze the data and develop a suitable model. In order to remove the linear trend from the data, some of the variables are transformed by taking their natural logs. A linear model is then developed using, in each run, the natural log of the final project cost (FC) as the dependent variable and, as independent variables, the natural logs of accepted bid price ($I_1$), average bid price ($I_2$), and estimated cost ($I_3$), together with the untransformed parameters contractor rank ($I_4$), experience of the resident engineer (R.E.) ($I_5$), location of project ($I_6$), number of bidders ($I_7$), year of contracting ($I_8$), and contractual duration ($I_9$). The backward elimination technique is adopted to develop the regression model, as in Tables (2) and (3). The procedure of this technique is to enter all nine variables in the model equation first, then sequentially remove, in each run, the variable with the smallest partial correlation with the dependent variable.
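The backward elimination procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not a reproduction of the SPSS routine: the removal rule uses a fixed threshold on the absolute t-statistic (`t_threshold`, a hypothetical parameter), relying on the fact that, within one run, the predictor with the smallest absolute t-statistic is also the one with the smallest partial correlation with the dependent variable. The data, the variable names, and the function names are synthetic or hypothetical.

```python
import numpy as np

def slope_t_values(X, y):
    """t-statistics of the slope coefficients of an OLS fit (constant excluded)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    dof = len(y) - design.shape[1]                 # residual degrees of freedom
    sigma2 = residuals @ residuals / dof           # residual mean square
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return (coef / np.sqrt(np.diag(cov)))[1:]

def backward_elimination(X, y, names, t_threshold=2.0):
    """Start with all predictors; drop the weakest one per run until all pass."""
    kept = list(range(X.shape[1]))
    while kept:
        t = np.abs(slope_t_values(X[:, kept], y))
        weakest = int(np.argmin(t))                # smallest partial correlation
        if t[weakest] >= t_threshold:
            break                                  # every remaining predictor passes
        kept.pop(weakest)
    return [names[i] for i in kept]

# Synthetic data (assumption): two log-transformed predictors drive ln(FC),
# while a third variable is pure noise.
rng = np.random.default_rng(0)
n = 60
ln_bid = np.log(rng.uniform(500.0, 1500.0, n))
ln_estimate = np.log(rng.uniform(500.0, 1500.0, n))
noise_var = rng.normal(0.0, 1.0, n)
X = np.column_stack([ln_bid, ln_estimate, noise_var])
ln_fc = 0.2 + 0.7 * ln_bid + 0.3 * ln_estimate + rng.normal(0.0, 0.02, n)

names = ["ln_bid", "ln_estimate", "noise_var"]
kept = backward_elimination(X, ln_fc, names)
```

A statistical package would normally base the removal decision on a significance level rather than a fixed t-value; the threshold here is only a stand-in for that criterion.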

Resulting Equation
After applying multiple regression analysis to the historical data of all (60) school projects, the resulting final construction cost estimation equation, written in terms of the retained predictors with the coefficient values estimated by the regression, takes the form:

$$\ln(FC) = A_0 + A_1 \ln(I_1) + A_3 \ln(I_3) + A_4 I_4 + A_5 I_5 + A_7 I_7 + A_9 I_9$$

This model is chosen based on the smallest Standard Error of Estimate, which is (0.1198686), and the Residual Mean Square, which is (0.014). Following the advice of Tabachnick and Fidell [1], the relative importance of the independent variables is assessed by examining their respective standardized coefficients, i.e. the Beta values in Table (3). Predictors with higher standardized coefficients, such as ln accepted bid price ($I_1$), ln estimated cost ($I_3$), and number of bidders ($I_7$), are more important to the regression equation than those with lower values, such as contractor rank ($I_4$), experience of R.E. ($I_5$), and contractual duration ($I_9$). It can therefore be concluded that $I_1$, $I_3$, and $I_7$ contribute significantly to the regression model. The small coefficients of $I_5$ and $I_9$ in the model equation reflect the small effect of the experience of the R.E. and the contractual duration. The average bid price ($I_2$), location ($I_6$), and year of contracting ($I_8$) are excluded because of their insignificance.
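To illustrate why standardized coefficients (Beta values) rather than raw coefficients are compared when ranking predictors, the sketch below computes them as the raw slope multiplied by the ratio of the predictor's standard deviation to that of the dependent variable. The function name `standardized_betas` is hypothetical and the data are synthetic, chosen so that the predictor with the much smaller raw coefficient turns out to have the larger Beta.

```python
import numpy as np

def standardized_betas(X, y):
    """Beta_j = b_j * s(x_j) / s(y), with b_j the raw OLS slope of predictor j."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)

# Synthetic data (assumption): the second predictor is measured in much
# larger units, so its raw coefficient (0.01) is far below the first (0.5).
X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0],
              [4.0, 500.0], [5.0, 400.0], [6.0, 600.0]])
y = 0.5 * X[:, 0] + 0.01 * X[:, 1]

betas = standardized_betas(X, y)
# The second predictor has the larger standardized coefficient despite the
# smaller raw one, because standardization removes the effect of units.
```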

Multi-Collinearity Assessment
To assess multi-collinearity among the variables, tolerances and variance inflation factors (VIF) are examined, as shown in Table (4). Tolerance refers to the proportion of the variance of a variable that is not accounted for by the other predictors in the model and is calculated as $1 - R^2$ for each variable, where $R^2$ is obtained by regressing that variable on the remaining predictors. Tolerances range from (0), i.e. perfect collinearity, to (1), i.e. no collinearity. A tolerance of less than (0.1) typically indicates a multi-collinearity problem. The variance inflation factor (VIF) is another index for diagnosing multi-collinearity and is simply the inverse of the tolerance value. A high (VIF) for a variable indicates a strong association between that variable and the remaining predictors [1]. Variables that have high tolerances necessarily have small variance inflation factors. A variance inflation factor in excess of (10) indicates a multi-collinearity problem [15]. Since the final cost model predictors have tolerances and (VIF) values that do not violate the aforementioned criteria, multi-collinearity is not a serious problem in this analysis.
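The tolerance and VIF definitions above can be sketched in a few lines. Each predictor is regressed on the remaining predictors, and its tolerance is taken as $1 - R^2$ of that auxiliary regression. The function name `tolerance_and_vif` is hypothetical and the data are synthetic.

```python
import numpy as np

def tolerance_and_vif(X):
    """Return a list of (tolerance, VIF) pairs, one per predictor column."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    results = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary regression of predictor j on all other predictors.
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        r2 = 1.0 - residuals @ residuals / np.sum((target - target.mean()) ** 2)
        tolerance = 1.0 - r2
        results.append((tolerance, 1.0 / tolerance))
    return results

# Synthetic data (assumption): two predictors with a correlation of 0.8.
X = np.column_stack([[1.0, 2.0, 3.0, 4.0, 5.0],
                     [2.0, 1.0, 4.0, 3.0, 5.0]])
diagnostics = tolerance_and_vif(X)

# Flag predictors that violate the thresholds quoted in the text.
problems = [j for j, (tol, vif) in enumerate(diagnostics) if tol < 0.1 or vif > 10]
```

With only two predictors, the tolerance reduces to one minus the squared correlation between them, so a correlation of 0.8 gives a tolerance of 0.36 and a VIF of about 2.78, well inside both thresholds.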

Model Validation
One of the most important steps in developing a cost model is to test its accuracy and validity. This process is also referred to as model validation. It involves testing and evaluating the developed model with some test or validation data. The validation data should be representative of the targeted population but must not have been used in the development of the model. In this study, the validation data are extracted from the same historical data file, but for five additional, randomly selected projects. They are not part of the (60) projects used in the development of the model. The predicted costs of these five projects, computed using the model equation, are compared with the actual cost records, and the results of this comparison are shown in Table (5). The model performed very well: its predictions deviate from the actual costs by only (-8.702%) to (6.513%).
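The validation comparison can be illustrated with the short sketch below. Since the model's dependent variable is the natural log of the final cost, predictions are first converted back with the exponential function and then compared with the actual costs. The sign convention (predicted minus actual, relative to actual) is an assumption, as the source does not state it, and all numbers and function names are hypothetical.

```python
import math

def back_transform(ln_predictions):
    """Convert predicted ln(FC) values back to cost units."""
    return [math.exp(v) for v in ln_predictions]

def percent_deviation(actual, predicted):
    """Signed deviation of each prediction, as a percentage of the actual cost."""
    return [(p - a) / a * 100.0 for a, p in zip(actual, predicted)]

# Hypothetical validation records (not the study's data).
actual_costs = [100.0, 200.0, 400.0]
ln_predicted = [math.log(110.0), math.log(190.0), math.log(400.0)]

predicted_costs = back_transform(ln_predicted)
deviations = percent_deviation(actual_costs, predicted_costs)
deviation_range = (min(deviations), max(deviations))
```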

Figure (1): Comparison of Predicted and Actual Final Costs
The coefficient of determination ($R^2$) is found to be (97%); therefore, it can be concluded that this model shows good agreement with the actual measurements. It is finally clear that this model for school building projects in Iraq has the

Model Evaluation
The statistical measures used to evaluate the performance of the model [17] are given in Table (6). The mean absolute percentage error (MAPE) and average accuracy (AA) generated by the multiple regression (MR) model of (FC) are found to be (7.97%) and (92.02%), respectively. The ($R^2$) value is (86.5%), which indicates that most of the variability in the total cost is explained by the terms in the model. Therefore, it can be concluded that this (MR) model of (FC) shows good agreement with the actual measurements.
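The evaluation measures can be sketched as follows. The formulas are the commonly used definitions and are an assumption here, since the contents of Table (6) are not reproduced in the text: MAPE is the mean of the absolute percentage errors, AA is taken as 100% minus MAPE, and $R^2$ is one minus the ratio of the residual sum of squares to the total sum of squares. The data and function names are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors) * 100.0

def average_accuracy(actual, predicted):
    """Average accuracy, assumed here to be 100% minus MAPE."""
    return 100.0 - mape(actual, predicted)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual and predicted final costs (not the study's data).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

model_mape = mape(actual, predicted)
model_aa = average_accuracy(actual, predicted)
model_r2 = r_squared(actual, predicted)
```

Under this assumed definition, the reported MAPE and AA values are complementary, which agrees to within rounding with the (7.97%) and (92.02%) figures quoted above.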