Construction cost estimation of reinforced and prestressed concrete bridges using machine learning

Seven state-of-the-art machine learning techniques for estimation of construction costs of reinforced-concrete and prestressed concrete bridges are investigated in this paper, including artificial neural networks (ANN) and ensembles of ANNs, regression tree ensembles (random forests, boosted and bagged regression trees), support vector regression (SVR) method, and Gaussian process regression (GPR). A database of construction costs and design characteristics for 181 reinforced-concrete and prestressed-concrete bridges is created for model training and evaluation.


Introduction
There are currently more than two million bridges in operation worldwide, and their number is constantly increasing [1]. According to the American Road and Transportation Builders Association (ARTBA), the total investment costs of bridges in the USA were estimated at US$ 27 billion in 2014 [2]. In the European Union, 20.4 billion euros are planned for the construction of Trans-European Networks (TENS) within the transport sector (Connecting Europe Facility) for the 2014-2020 period [3]. This trend will certainly continue in the coming years, hence the estimation of construction costs, which are the most significant part of total investment costs, is of utmost importance [4]. Predicting construction costs is one of the most important preliminary steps in any construction project, since cost prediction is crucial to avoid construction delays and ensure successful project completion [5]. The main problem in the estimation of transport infrastructure project costs is the significant deviation between the estimated costs and the actual construction costs, due to intentional underestimation in the initial project phases, when the costs are evaluated in order to decide whether the transport infrastructure should be built. Based on the analysis of 258 transport infrastructure projects worth $90 billion (U.S.), it was found that in the vast majority of projects actual costs were significantly higher than initially estimated, e.g. 34 % higher on average for bridges and tunnels [6]. This underestimation is clearly not a random error; it is prone to subjectivity and may introduce biases into the decision-making process [6]. Therefore, being able to objectively forecast these costs is highly desirable. The estimation of construction costs in transport infrastructure is a complex process influenced by a variety of factors, uncertainty, and imprecision.
Methods based on machine learning have shown promising results, enabling automation of the construction cost estimation process and eliminating the biases introduced by the human factor. Hegazy and Ayed designed an artificial neural network (ANN) model for the assessment of highway construction costs [7]. Backpropagation, simplex optimization and a genetic algorithm (GA) were used for network training. The network was trained using a set of eighteen highway projects constructed in Newfoundland, Canada. Marcous et al. used an ANN with the backpropagation learning algorithm to predict the volume of concrete and the weight of prestressing steel in bridge superstructures [8]. A set of twenty-two prestressed concrete bridges over the Nile in Egypt was used for network training. Marinelli et al. used a feed-forward ANN model to predict the quantities of superstructure material (concrete, prestressed steel, and reinforcing steel) using project data from 68 highway bridges constructed in Greece [9]. Mostafa used multiple regression analysis to estimate the costs of 54 bridges and 72 culverts [10]. Using multiple regression analysis, Hollar et al. assessed the costs of preliminary engineering of bridges, determined as a percentage of construction costs [11]. The dataset consisted of bridge projects in North Carolina, USA, between 2001 and 2009. Cheng and Wu applied support vector machines (SVM) to predict construction costs using a set of twenty-nine construction projects as training cases, with an average prediction error of less than 10 % [12]. Kim and Kim studied preliminary cost estimation using case-based reasoning (CBR) and GA [13]. Fragkakis et al. presented a prediction model for bridge foundation costs that predicted material quantities for various types of foundations and estimated the total foundation costs using backward stepwise regression [14]. Cirilovic et al. studied prediction models based on multiple regression analysis and ANNs for the unit costs of road reconstruction works, using a dataset of 200 contracts from 14 countries in Europe and Central Asia signed between 2000 and 2010 [15]. Pesko et al. conducted similar research on the estimation of traffic infrastructure reconstruction costs in urban areas using ANNs [16]. Elfaki et al. reviewed methods for estimating construction costs, including machine learning, rule-based systems, evolutionary systems, agent-based systems, and hybrid systems [5]. Chou et al. studied models based on multiple regression analysis, CBR and ANNs to predict bid prices for bridge construction projects in Taiwan [17]. The best prediction was obtained using the ANN model, with MAPE, used as the performance criterion, amounting to 13.09 %. It can be concluded from previous studies that researchers have relied on established cost estimation models, whose advantage is that the wider professional community is familiar with them. The disadvantage is that most researchers use either linear regression models, whose assumption of linearity biases the whole estimation process, or neural network models, which are significantly harder to interpret (black box models) and require a more extensive database, or hybrid models, which are even more complicated. This paper offers a comprehensive comparative analysis of seven state-of-the-art machine learning techniques for the estimation of construction costs of RC and PC bridges. Some of the proposed models, such as GPR, have not been, to the best of our knowledge, previously used for estimating construction costs of transport infrastructure projects.
GRAĐEVINAR 73 (2021) 1, 1-13 Construction cost estimation of reinforced and prestressed concrete bridges using machine learning
Each layer is composed of one or more processing units called neurons, where each neuron in one layer is connected to each neuron of the next layer.
Multiple neuron layers with nonlinear transfer functions allow the network to learn nonlinear relationships between input and output vectors [18]. An MLP with one hidden layer with a bipolar sigmoid activation function and an output layer with a linear activation function can approximate an arbitrary multidimensional function for a given dataset, given a sufficient number of neurons in the hidden layer [19]. The number of neurons in the hidden layer, $N_H$, can be determined experimentally for the given dataset, with the upper limit calculated by:
$$N_H \le 2 N_i + 1 \qquad (1)$$
$$N_H \le \frac{N_s}{N_i + 1} \qquad (2)$$
where $N_i$ represents the number of inputs in the neural network, and $N_s$ represents the number of instances used for training. It is suggested to accept the lower value of the number of neurons in the hidden layer given by (1) and (2) [20,21]. Neural network ensembles can be used to improve the generalization of ANNs, where many neural networks are used together to predict the unseen data. The components that form an ensemble are denoted as base models, or submodels, and each submodel is allowed to have a different number of neurons in the hidden layer. MLP neural networks, with early stopping of training to avoid overfitting, are used as submodels in an ANN ensemble.
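As a worked illustration of the upper limit on the hidden layer size, the following minimal sketch assumes that Eqs. (1) and (2) take the commonly cited heuristic forms $N_H \le 2N_i + 1$ and $N_H \le N_s/(N_i + 1)$. These forms, as well as the function name `max_hidden_neurons`, are assumptions made for illustration. The input values (9 input variables, 181 bridges) are taken from the paper.

```python
def max_hidden_neurons(n_inputs: int, n_samples: int) -> int:
    """Upper limit on hidden neurons: the lower of two heuristic bounds (assumed forms)."""
    bound_from_inputs = 2 * n_inputs + 1               # assumed form of Eq. (1)
    bound_from_samples = n_samples // (n_inputs + 1)   # assumed form of Eq. (2), rounded down
    return min(bound_from_inputs, bound_from_samples)


# 9 input variables and 181 bridges, as reported in the paper.
print(max_hidden_neurons(9, 181))
```

Under these assumptions the result for the dataset considered here is 18, which agrees with the maximum number of hidden neurons reported in the results.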

Regression tree ensembles
Linear regression represents a global model, where a single formula describes the relations between the inputs and the outputs of the model over the entire data space. It is very hard to design a single global model when there are many features interacting in nonlinear ways. An alternative approach is to divide the data space into smaller partitions, where the modelling of these interactions is easier to achieve. These partitions can be further divided into even smaller regions, until finally one gets data space cells where simple models can be applied. This is called recursive partitioning. A regression tree uses a tree to represent the recursive partition. It splits the input data space into partitions and assigns a prediction value to each partition. The terminal nodes of the tree, denoted as leaves, represent these partition cells. In order to determine to which leaf the input data belongs, and to assign it the prediction value, the algorithm starts from the root node and asks successive binary questions. Depending on the outcome of the question, the sub-branch of the tree is chosen. Eventually, the algorithm arrives at the leaf node, where the prediction is made. This prediction is found as an average of all training data instances which reach that leaf node. Suppose a training dataset $D = \{(x_i, y_i) \in \mathbb{R}^n \times \mathbb{R},\ i = 1, 2, \ldots, l\}$ which consists of l training pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)$, where $x_i \in \mathbb{R}^n$ is the n-dimensional vector denoting the model's inputs and $y_i$ are the observed responses to these inputs (model's outputs). Suppose further a division of the input data space into M partitions $R_m$, m = 1, 2, …, M, where the response is modelled as a constant $c_m$ in each partition:
$$f(x) = \sum_{m=1}^{M} c_m \, I\{x \in R_m\} \qquad (3)$$
where $I\{x \in R_m\}$ is a binary function that takes the value 0 or 1 depending on the outcome of the questions at the tree split points [22]. The constant $c_m$ can be determined as the average of the responses $y_i$ in the region $R_m$.
A greedy algorithm is used to determine the split points [23,24]. Regression trees can be combined in an ensemble, which represents a predictive model composed of a weighted combination of multiple regression trees. Various algorithms can be used for ensemble learning, such as bagging and boosting.
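The recursive partitioning and the greedy choice of split points described above can be sketched as follows. This is a minimal illustration and not the implementation used in the paper: the sum of squared errors as split criterion, the `min_leaf` stopping rule, and all function names are assumptions.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean of the values."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y, min_leaf):
    """Greedy search: try every feature and threshold, keep the lowest total SSE."""
    best = None  # (total_sse, feature_index, threshold)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= threshold]
            right = [yi for row, yi in zip(X, y) if row[j] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # both partitions must keep at least min_leaf instances
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, j, threshold)
    return best


def build_tree(X, y, min_leaf=2):
    """Recursive partitioning; each leaf stores the mean response of its region."""
    split = best_split(X, y, min_leaf)
    if split is None:
        return {"leaf": fmean(y)}
    _, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], min_leaf),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], min_leaf),
    }


def predict(tree, x):
    """Answer the binary questions from the root until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]
```

The search is greedy in the sense that each split is chosen to be the best one at the current node only, without looking ahead to the splits that follow.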

Bagging
GRAĐEVINAR 73 (2021) 1, 1-13 Miljan Kovačević, Nenad Ivanišević, Predrag Petronijević, Vladimir Despotović
A major problem with regression trees is high variance: even a minor change in the data can produce a significantly different tree structure. This happens because an error in one of the top splits propagates all the way down to the leaves. In bootstrap aggregation, or bagging, multiple data subsets D i are created from the training dataset D by sampling from it randomly and with replacement [25]. Each of these subsets is called a bootstrap sample (or simply bootstrap). Since replacement is allowed, the bootstraps might have duplicated data instances, or some instances may be omitted, resulting in bootstraps different from the initial dataset. Each of these bootstraps is used to build a single regression tree, which might have a different number of leaves and a different structure in comparison to the original tree. All individual trees are further combined in an ensemble (see Figure 2). The predictions are averaged over all trees in the ensemble, thus decreasing the variance and improving prediction.
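The bagging procedure can be illustrated with the following minimal sketch. To keep the example short, the base learner is a one-split regression tree (a stump) rather than a fully grown tree. The function names, the default number of trees and the random seed are assumptions made for illustration only.

```python
import random
from statistics import fmean


def fit_stump(X, y):
    """Illustrative base learner: a regression tree with a single split."""
    best = None
    for j in range(len(X[0])):
        # The largest value is skipped because it would leave the right side empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            c_left, c_right = fmean(left), fmean(right)
            err = (sum((v - c_left) ** 2 for v in left)
                   + sum((v - c_right) ** 2 for v in right))
            if best is None or err < best[0]:
                best = (err, j, t, c_left, c_right)
    if best is None:  # every feature is constant: predict the mean response
        c = fmean(y)
        return lambda x: c
    _, j, t, c_left, c_right = best
    return lambda x: c_left if x[j] <= t else c_right


def bagging_fit(X, y, n_trees=50, seed=0):
    """Fit one base learner per bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(y)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Average the predictions of all trees in the ensemble."""
    return fmean([model(x) for model in models])
```

Because every bootstrap contains a slightly different set of instances, the individual trees differ, and averaging their predictions reduces the variance of the ensemble.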

Random Forests
Random forests represent an extension of bagging that reduces the correlation between the individual trees, thus building an ensemble of decorrelated trees. Suppose that the training dataset D is composed of l observations and n features. First, a sample is drawn from the training dataset randomly with replacement to create a bootstrap. Before each split, m ≤ n features are randomly selected as candidates for splitting. The best feature (split point) among these m candidates is used to split the node, and the procedure is repeated for the subsequent nodes [22,26]. A single tree is grown for each bootstrap and predictions are averaged over all trees in the forest. A typical value of m for regression problems is approximately n/3 [20,24]. Reducing m reduces the correlation between any pair of trees in the ensemble, thus reducing the variance of the average. While in bagging only the data subsets are randomly sampled from the initial dataset for each tree, in random forests the feature subsets are also randomly selected, instead of using all features to grow the trees. Many random trees form a random forest.
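The only step that distinguishes random forests from bagging, the random selection of candidate features before each split, can be sketched as follows. The function name and the default m = n/3 (rounded down) are assumptions for illustration.

```python
import random


def candidate_features(n_features, rng, m=None):
    """Pick m of the n features at random, without replacement, as split candidates."""
    if m is None:
        m = max(1, n_features // 3)  # assumed default: rule of thumb for regression
    return sorted(rng.sample(range(n_features), m))


rng = random.Random(42)
# With 9 input variables, 3 candidate features are drawn before each split.
for split_number in range(3):
    print(split_number, candidate_features(9, rng))
```

The split search of an ordinary regression tree is then restricted to the returned feature indices, and a new random subset is drawn for the next split.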

Boosting
Boosting is an ensemble technique where predictors are created sequentially, rather than independently, as in bagging.
The rationale behind this is that each subsequent predictor learns from the mistakes committed by previous predictors [22]. When gradient boosting is applied to regression tree ensembles, the first regression tree is the one that maximally reduces the loss function for the selected tree structure and the given training dataset. The residual (prediction error) is then calculated. It represents the mistake committed by the predictor model (the first regression tree). In the next step, a new tree is fitted to the residuals of the first tree. In each step, a new tree is added to the model, which is fitted to the residuals of the previous one. The residual values are usually multiplied by the learning rate (value less than 1) to avoid overfitting. The final model obtained by boosting is simply a linear combination of all trees (usually hundreds or thousands of trees), as shown in Figure 3.
The main idea of boosting is that, instead of using a complex single regression tree, which is easily overfitted, a much better fit is produced if many simple regression trees are trained iteratively, each of them improving the prediction performance of the previous ones [22].
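The sequential fitting of trees to residuals can be illustrated with the following minimal sketch of gradient boosting for the squared-error loss. It assumes a single input feature with at least two distinct values and uses one-split trees (stumps) as base models. The function names and default parameter values are assumptions for illustration.

```python
from statistics import fmean


def fit_stump(x, residuals):
    """Fit a one-split regression tree to the residuals over a single feature x."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        c_left, c_right = fmean(left), fmean(right)
        err = (sum((r - c_left) ** 2 for r in left)
               + sum((r - c_right) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, c_left, c_right)
    return best[1:]  # (threshold, left value, right value)


def boost(x, y, n_trees=100, learning_rate=0.1):
    """Each new stump is fitted to the residuals of the current ensemble."""
    base = fmean(y)  # initial constant prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, c_left, c_right = fit_stump(x, residuals)
        stumps.append((t, c_left, c_right))
        # The contribution of the new stump is shrunk by the learning rate.
        pred = [pi + learning_rate * (c_left if xi <= t else c_right)
                for xi, pi in zip(x, pred)]
    return {"base": base, "rate": learning_rate, "stumps": stumps}


def boost_predict(model, xi):
    """The final model is a linear combination of all stumps."""
    total = sum(c_left if xi <= t else c_right for t, c_left, c_right in model["stumps"])
    return model["base"] + model["rate"] * total
```

A smaller learning rate makes each tree correct only a fraction of the remaining error, so more trees are needed, but the resulting fit is typically less prone to overfitting.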

Support vector regression (SVR)
Suppose a training dataset $\{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\} \subset \mathbb{R}^n \times \mathbb{R}$ is given, where $x_i \in \mathbb{R}^n$ is the n-dimensional vector denoting the model's inputs and $y_i$ are the observed responses to these inputs (model outputs). SVR tries to find an approximating function f(x) that deviates by at most ε from the observed response $y_i$ for all training data $x_i$. This approximating function for the nonlinear SVR [27] is
$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(x_i, x) + b \qquad (4)$$
In Eq. (4), K denotes the kernel function, and $\alpha_i^*$, $\alpha_i$ and b are the parameters derived by minimizing the objective function under the given constraints (Figure 4). $\xi_i$ and $\xi_i^*$ denote the slack variables, which allow the regression errors to cope to a certain extent with otherwise infeasible constraints of the optimization problem. The constant C > 0 is a parameter chosen by the user that determines the extent to which deviations larger than ε are tolerated. An increase in C penalizes larger errors more heavily. Another parameter chosen by the user is the required precision ε. The RBF kernel function is used in this paper [28].
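The building blocks of the SVR model can be illustrated with the following minimal sketch. It shows the RBF kernel, the evaluation of the approximating function for given dual coefficients, and the ε-insensitive loss. It does not solve the underlying optimization problem, for which the paper uses LIBSVM. The function names, the sign convention of the dual coefficients (each coefficient stands for the difference of the two multipliers of one training instance), and the numerical values in the usage example are assumptions for illustration.

```python
import math


def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """f(x) = sum_i dual_coefs[i] * K(x_i, x) + b."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + b


def eps_insensitive_loss(y_true, y_pred, eps):
    """Deviations smaller than eps cost nothing; larger ones are penalized linearly."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical model with two support vectors in a one-dimensional input space.
prediction = svr_predict([0.0], [[0.0], [1.0]], [0.5, -0.5], b=0.1, gamma=1.0)
print(prediction, eps_insensitive_loss(0.4, prediction, eps=0.05))
```

Only training instances with non-zero dual coefficients (the support vectors) contribute to the sum, which is why the prediction function takes them as an explicit argument.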

Gaussian process regression
The GP method is a non-parametric method in which a Gaussian process is defined as an infinite set of random variables such that every finite subset follows a multivariate Gaussian distribution. By extending the multivariate Gaussian distribution to an infinite set of random variables, it is possible to view the GP as a distribution over random functions, while Bayes' rule is applied to determine the posterior distribution from the training data in a supervised machine learning setup. Consider a problem of nonlinear regression:
$$y_i = f(x_i) + \varepsilon, \quad i = 1, \ldots, n \qquad (5)$$
where the function $f(\cdot): \mathbb{R}^n \to \mathbb{R}$ is unknown and needs to be estimated, $y_i$ is the target variable, $x_i$ are the input variables and ε is normally distributed additive noise with variance $\sigma^2$. Gaussian process regression [29] assumes that f(·) follows a Gaussian process with mean function µ(·) and covariance function k(·,·). The n observations in an arbitrary data set $y = \{y_1, \ldots, y_n\}$ can always be imagined as a sample from some multivariate (n-variate) Gaussian distribution
$$(y_1, \ldots, y_n)^T \sim N(\mu, K) \qquad (6)$$
where $\mu = (\mu(x_1), \ldots, \mu(x_n))^T$ is the mean vector, and K is the n × n covariance matrix whose (i, j)-th element is
$$K_{ij} = k(x_i, x_j) + \sigma^2 \delta_{ij} \qquad (7)$$
Here $\delta_{ij}$ is the Kronecker delta function. Let $x^*$ be any test point and $y^*$ be the corresponding response value. The joint distribution of $(y_1, \ldots, y_n, y^*)$ is an (n + 1)-variate normal distribution $(y_1, \ldots, y_n, y^*) \sim N(\mu^*, \Sigma)$, where $\mu^* = (\mu(x_1), \ldots, \mu(x_n), \mu(x^*))^T$, and the covariance matrix is:
$$\Sigma = \begin{pmatrix} K & K^* \\ K^{*T} & K^{**} \end{pmatrix} \qquad (8)$$
where $K^* = (k(x^*, x_1), \ldots, k(x^*, x_n))^T$ and $K^{**} = k(x^*, x^*) + \sigma^2$. The conditional distribution of $y^*$, given $y = (y_1, \ldots, y_n)^T$, is then
$$y^* \mid y \sim N(\hat{y}^*, \hat{\sigma}^{*2}) \qquad (9)$$
with
$$\hat{y}^* = \mu(x^*) + K^{*T} K^{-1} (y - \mu), \qquad \hat{\sigma}^{*2} = K^{**} - K^{*T} K^{-1} K^* \qquad (10)$$
The covariance function is a crucial part of the model specification.
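Various covariance functions are used in the experiments. Each of these covariance functions depends on hyperparameters whose values also need to be tuned. For some covariance functions, the hyperparameters can be used to determine which inputs are more relevant than others, using automatic relevance determination (ARD). For example, consider the squared exponential covariance function with a different length-scale parameter for each input (ARD SE):
$$k(x_p, x_q) = v^2 \exp\left(-\frac{1}{2} \sum_{i=1}^{n} \left(\frac{x_{p,i} - x_{q,i}}{r_i}\right)^2\right) \qquad (11)$$
where v scales the amplitude of the covariance and $r_i$ denotes the length scale of the covariance function along the input dimension i. If $r_i$ is very large, the relative importance of the i-th input is smaller [29]. The hyperparameters $\{v, r_1, \ldots, r_n\}$ and the noise variance $\sigma^2$ can be estimated by the maximum likelihood method. The log-likelihood of the training data is given by:
$$\log p(y \mid X) = -\frac{1}{2} (y - \mu)^T K^{-1} (y - \mu) - \frac{1}{2} \log |K| - \frac{n}{2} \log 2\pi \qquad (12)$$
where K is the covariance matrix of the training data and n in Eq. (12) denotes the number of training observations.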
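The computations behind Gaussian process regression can be illustrated with the following minimal sketch, which assumes a zero mean function, the ARD SE covariance function with amplitude written as v squared, and fixed (not optimized) hyperparameters. The Cholesky-based solver and all function names are choices made for this illustration and do not represent the implementation used in the paper.

```python
import math


def ard_se(x, z, v, r):
    """ARD squared exponential: v^2 * exp(-0.5 * sum_i ((x_i - z_i) / r_i)^2)."""
    return v ** 2 * math.exp(-0.5 * sum(((a - b) / ri) ** 2 for a, b, ri in zip(x, z, r)))


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_chol(L, b):
    """Solve (L L^T) a = b by forward and backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, n))) / L[i][i]
    return a


def gpr_fit(X, y, v, r, noise_var):
    """Build K_ij = k(x_i, x_j) + noise_var * delta_ij and compute the log-likelihood."""
    n = len(X)
    K = [[ard_se(X[i], X[j], v, r) + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_chol(L, y)  # alpha = K^{-1} y
    log_lik = (-0.5 * sum(yi * ai for yi, ai in zip(y, alpha))
               - sum(math.log(L[i][i]) for i in range(n))  # equals -0.5 * log|K|
               - 0.5 * n * math.log(2 * math.pi))
    return L, alpha, log_lik


def gpr_predict(x_star, X, L, alpha, v, r, noise_var):
    """Predictive mean and variance of the noisy response at x_star."""
    k_star = [ard_se(xi, x_star, v, r) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    w = solve_chol(L, k_star)  # K^{-1} k*
    var = ard_se(x_star, x_star, v, r) + noise_var - sum(k * wi for k, wi in zip(k_star, w))
    return mean, var
```

In practice the hyperparameters v, r and the noise variance would be chosen by maximizing the returned log-likelihood. A very large length scale for one input makes the covariance almost insensitive to that input, which is the mechanism behind ARD.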

Dataset
The proposed cost estimation methods rely on the development of a dataset that includes project and contract documentation of RC and PC bridges constructed on Corridor X, which is one of the most important Pan-European transport corridors, connecting Austria, Hungary, Slovenia, Croatia, Serbia, Bulgaria, the Republic of North Macedonia, and Greece (Figure 5). The literature review [7-17] shows that a significant number of models start from a particular assumption about the model. In this paper, an attempt is made to obtain the model from the experimental data without making any prior assumptions about the model, using a narrow set of data that are available at preliminary stages of project development.
Bridge design is generally affected by many variables; hence selecting the input variables plays a crucial role in modelling construction costs of RC and PC bridges. As concrete and metal works are the most cost intensive (accounting for, on average, almost 80 % of all costs), variables that are directly related to the amount of concrete works and the amount of metal works are adopted as input variables of the model. In this regard, the following variables were considered: Total bridge span length, Bridge width, Average pier height, Foundation type.
Because the same Total bridge span length can be composed of a small number of long individual spans or a large number of short individual spans, a new variable, Average bridge span, was introduced, which better characterizes the bridge length. It is obtained by dividing the Total bridge span length by the number of bridge spans. According to [30], the costs related to formwork and scaffolding can amount to up to 20 % of the total construction costs. In order to consider the potential impact of these costs, a variable Type of bridge construction was introduced in this paper. The variables Gross salary, Quarried aggregate price index and Steel price index allow comparison of construction costs of bridges that have been contracted with a different base date.

Evaluation and performance measures
In this study, the performance of the models was assessed using both absolute and relative statistical performance criteria, as suggested by Legates and McCabe [31]. The considered statistical measures were root mean square error (RMSE) and mean absolute error (MAE) as absolute measures, and Pearson's linear correlation coefficient (R) and the mean absolute percentage error (MAPE) as relative measures. RMSE is a measure of the differences between the values predicted by the model, $o_k$, and the actually observed (measured) values, $d_k$. It is a measure of the general accuracy of the model:
$$RMSE = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (d_k - o_k)^2} \qquad (13)$$
MAE represents the mean absolute error of the model:
$$MAE = \frac{1}{N} \sum_{k=1}^{N} |d_k - o_k| \qquad (14)$$
R is a measure of linear correlation between the values predicted by the model, $o_k$, and the actually observed (measured) values, $d_k$:
$$R = \frac{\sum_{k=1}^{N} (d_k - \bar{d})(o_k - \bar{o})}{\sqrt{\sum_{k=1}^{N} (d_k - \bar{d})^2 \sum_{k=1}^{N} (o_k - \bar{o})^2}} \qquad (15)$$
where $\bar{d}$ represents the mean of $d_k$ and $\bar{o}$ represents the mean of $o_k$, k = 1, 2, ..., N, and N is the number of instances in the dataset. MAPE is a percentage-based measure of prediction accuracy. It is calculated as the average of the absolute percentage errors:
$$MAPE = \frac{100}{N} \sum_{k=1}^{N} \left| \frac{d_k - o_k}{d_k} \right| \qquad (16)$$
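The four performance measures can be computed as in the following minimal sketch, which uses their standard definitions. Here `d` holds the observed values and `o` the values predicted by the model. The function names and the numbers in the usage example are assumptions for illustration and are not taken from the dataset.

```python
import math


def rmse(d, o):
    """Root mean square error."""
    return math.sqrt(sum((dk - ok) ** 2 for dk, ok in zip(d, o)) / len(d))


def mae(d, o):
    """Mean absolute error."""
    return sum(abs(dk - ok) for dk, ok in zip(d, o)) / len(d)


def pearson_r(d, o):
    """Pearson's linear correlation coefficient."""
    d_mean, o_mean = sum(d) / len(d), sum(o) / len(o)
    cov = sum((dk - d_mean) * (ok - o_mean) for dk, ok in zip(d, o))
    var_d = sum((dk - d_mean) ** 2 for dk in d)
    var_o = sum((ok - o_mean) ** 2 for ok in o)
    return cov / math.sqrt(var_d * var_o)


def mape(d, o):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    return 100.0 / len(d) * sum(abs((dk - ok) / dk) for dk, ok in zip(d, o))


# Hypothetical observed and predicted costs.
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
print(rmse(observed, predicted), mae(observed, predicted),
      pearson_r(observed, predicted), mape(observed, predicted))
```

RMSE and MAE are expressed in the units of the output variable, whereas R and MAPE are dimensionless, which is why the two groups are reported separately.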
The machine learning methods used in this paper were evaluated using 10-fold cross-validation, where the dataset is randomly partitioned into 10 subsets, 9 of them being used for training the model and the remaining one for model validation (testing). The cross-validation procedure is repeated 10 times, with each of the subsets used exactly once for validation, and the 10 results obtained are then averaged to produce a single estimate.
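The cross-validation procedure can be sketched as follows. The functions `fit`, `predict` and `score` are hypothetical placeholders for any of the models and performance measures considered here, and the random seed is an assumption.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly partition the indices 0..n_samples-1 into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, score, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(y)) if i not in held_out_set]
        model = fit([X[i] for i in train], [y[i] for i in train])
        predictions = [predict(model, X[i]) for i in held_out]
        scores.append(score([y[i] for i in held_out], predictions))
    return sum(scores) / k
```

With 181 instances and k = 10, one fold contains 19 instances and the remaining nine contain 18, so each model is trained on 162 or 163 bridges.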

Results and discussion
Several [...] Table 2. Construction cost expressed in €/m² is the output variable that needs to be predicted. MLP-ANN with one hidden layer was trained using the Levenberg-Marquardt algorithm [32]. The criterion to stop the training was either the maximum number of epochs (set to 1000), the minimum gradient magnitude (set to 10⁻⁵) or the network performance (measured as the mean square error and set to 0). All input data are normalized to the range [-1,1] prior to training. The number of neurons in the input layer is determined by the number of input variables, i.e. it consists of 9 neurons, while there is only one neuron in the output layer. The maximum number of neurons in the hidden layer was determined using Eqs. (1) and (2) and equals 18. Figure 6a shows the performance obtained using RMSE and MAE as absolute measures, while Figure 6b presents results using R and MAPE as relative measures. The best performing model according to MAE, R and MAPE is the MLP-ANN with 10 neurons in the hidden layer. In order to further improve model performance, ensembles of MLP-ANNs with early stopping were analysed, with base models having up to 18 neurons in the hidden layer. Each base model is allowed to have a different number of neurons in the hidden layer. Optimal base models that form an ensemble are determined based on the minimum RMSE. Ensembles with 1 to 100 base models were tested, as shown in Figure 7. Learning curves presenting RMSE and MAE vs. the number of base models in the ensemble (see Figure 7a) show that performance improves as the number of base models increases, but the curves saturate at approximately 40 base models; adding further base models would therefore increase the model complexity without a significant improvement in performance. Similar behaviour can be observed in Figure 7b, where R and MAPE are used as performance criteria.
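The normalization of the inputs to a fixed range can be illustrated with the following minimal sketch of min-max scaling. The function names, the handling of constant features, and the numbers in the usage example are assumptions for illustration.

```python
def fit_scaler(X):
    """Record the minimum and maximum of every feature in the training data."""
    columns = list(zip(*X))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(x, scaler, lo=-1.0, hi=1.0):
    """Map each feature linearly so that its training minimum becomes lo and maximum hi."""
    mins, maxs = scaler
    # A constant feature carries no information and is mapped to the centre of the range.
    return [lo + (hi - lo) * (v - mn) / (mx - mn) if mx > mn else 0.5 * (lo + hi)
            for v, mn, mx in zip(x, mins, maxs)]


# Hypothetical inputs: total span length in metres and a constant second feature.
scaler = fit_scaler([[10.0, 1.0], [20.0, 1.0], [30.0, 1.0]])
print(scale([20.0, 1.0], scaler))            # range [-1, 1]
print(scale([20.0, 1.0], scaler, 0.0, 1.0))  # range [0, 1]
```

One possible design choice, not stated in the paper, is to compute the minima and maxima on the training folds only, so that no information from the validation fold enters the scaling.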
Regression tree ensembles realized using bootstrap aggregation (bagging) are optimized for different model parameters, including the number of trees in the ensemble (limited to 500) and the minimum leaf size (ranging from 2 to 15). Grid search is used for optimization. Learning curves presenting MSE vs. the number of trees in the ensemble for different minimum leaf sizes are shown in Figure 8. Minimum leaf sizes of 2 and 3 give the best performance measured by MSE. There is no need to use more than 50 trees in the ensemble, as no improvement is observed with a further increase in the number of trees.
Random forests are analysed for different model parameters, including the number of trees in the ensemble (limited to 500), the minimum leaf size (ranging from 2 to 10), and the number of randomly selected features as candidates for splitting. The rule of thumb is that m = n/3 features should be used as candidates for splitting for regression problems [24]. Values of m = 2, m = 3 and m = 4 are considered in this paper. Grid search is used for optimization.
Regression tree ensembles realized using boosting are optimized for different model parameters, including the number of trees in the ensemble, the learning rate, the number of splits and the number of observations per parent node. The learning rate determines the training speed. Learning rates equal to 0.001, 0.01, 0.1, 0.5, 0.75 and 1.0 are analysed in this paper. The number of splits is increased exponentially, from 2^0 = 1 to 2^7 = 128. The number of observations per parent node varies between 5 and 20. The optimal model is obtained using 64 splits and 11 observations per parent node. Grid search is used for optimization. Learning curves presenting MSE vs. the number of trees in the ensemble for different learning rates are shown in Figure 9. The learning rate equal to 0.1 gives the best performance measured by MSE.
There is no need to use more than 30 trees in the ensemble, as no improvement is observed with a further increase in the number of trees.
SVR was analysed using the RBF kernel function. The LIBSVM library was used for SVR implementation [33]. The normalization, which scales all input data into the range [0,1], was done prior to training and testing. The model hyperparameters C, γ and ε were first roughly tuned using grid search, as shown in Figure 10. The SVR model was then fine-tuned by iteratively narrowing down the search area, leading to the optimum hyperparameters C = 1.7271, γ = 18.7334 and ε = 0.0157. The number of iterations is limited to 100.
A Gaussian process is completely defined by its mean and covariance function, and so the GPR algorithm is tested using different covariance functions, such as exponential, squared exponential, Matern, and rational quadratic, as well as their equivalent ARD covariance functions that have a separate length scale for each input variable (see Table 3). All inputs and targets are normalized to have zero mean and unit variance. The mean of the Gaussian process is set to zero and the covariance function parameters are determined by maximizing the log marginal likelihood.
Table 4 shows summarized results of the prediction of construction costs of RC and PC bridges using all machine learning algorithms considered in this paper. The best performing model is highlighted. The worst prediction is obtained using the single MLP-ANN, which is expected as most of the competing models are ensemble-based models. On the other hand, the ensemble of MLP-ANNs has one of the best performances according to both absolute and relative performance measures.
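The two-stage hyperparameter tuning described for SVR, a rough grid search followed by iterative narrowing of the search area, can be sketched as follows. This is a generic illustration and not the procedure implemented in the paper. The objective would be, for instance, a cross-validated error as a function of the hyperparameters, positive hyperparameters such as C, γ and ε would typically be searched on a logarithmic scale, and the function names, the shrink factor and the number of grid points are assumptions.

```python
import itertools


def grid_search(objective, grid):
    """Evaluate every parameter combination and return (best_score, best_params)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if best is None or score < best[0]:
            best = (score, params)
    return best


def refine(grid, best_params, shrink=0.5, points=5):
    """Build a narrower grid centred on the current best value of each parameter."""
    new_grid = {}
    for name, values in grid.items():
        half_width = shrink * (max(values) - min(values)) / 2
        centre = best_params[name]
        new_grid[name] = [centre - half_width + 2 * half_width * i / (points - 1)
                          for i in range(points)]
    return new_grid


def coarse_to_fine(objective, grid, iterations=100):
    """Rough grid search followed by repeated narrowing of the search area."""
    score, params = grid_search(objective, grid)
    for _ in range(iterations):
        grid = refine(grid, params)
        new_score, new_params = grid_search(objective, grid)
        if new_score < score:
            score, params = new_score, new_params
    return score, params
```

Each refinement halves the width of the search interval of every parameter, so the cost per iteration stays constant while the resolution of the search increases.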
Figure 7 shows that at least 10 base models are needed to achieve sufficient generalization; however, an ensemble with 100 base models is adopted as the representative and used in further experiments. Regression tree ensembles using bagging, as well as random forests, have proven to be relatively poor predictors for the given dataset, unlike regression tree ensembles using boosting, which perform substantially better. SVR using the RBF kernel has shown a solid performance with R = 0.86 and MAPE = 12.03 %. We also tested linear and sigmoid kernels, but the prediction was poor. Finally, the best prediction performance was obtained using GPR with the ARD exponential covariance function, with R = 0.89 and MAPE = 11.60 %.
The additional benefit of GPR models is the fact that their training time is significantly lower in comparison to any of the ensemble methods. As GPR with the ARD exponential covariance function performs best according to all performance measures, it will be used for further comparisons. Parameters of ARD covariance functions can be used to decide which inputs (features) are relevant for predicting a particular output, and to remove less relevant inputs. The analysis of input relevance using the ARD exponential covariance function is shown in Figure 11, using the length-scale hyperparameters of the covariance function as the criterion. The higher the value of the length-scale hyperparameter, the less relevant the particular input. Note that the inputs 3 (bridge width), 4 (type of bridge construction) and 7 (gross salary) have significantly higher values of the length-scale parameter; therefore, they can be considered less relevant. For gross salary, this can be explained by the fact that the quarried aggregate price depends on the gross salary, and might carry more information than the gross salary itself. Hence, gross salary is implicitly represented by the quarried aggregate price. Regarding the bridge width, the output variable "Construction costs" is defined in EUR per square meter of the bridge superstructure, which might explain the lower relevance of the bridge width as a feature. In addition, the values of bridge width in the considered dataset fall within a narrow range, as the bridges carrying the motorway dominate (148), while the number of overpasses is smaller (33). This is one of the reasons why the bridge width has less relevance to the model. By expanding the dataset in future research with a significant number of bridges of different widths, the influence of this variable could be determined more accurately.
The minor relevance of the variable Type of bridge construction may be explained by slightly higher costs of making a PC span superstructure, although there are potential savings in the assembly work. The construction of RC bridges is cheaper, but scaffolding is more expensive. In both cases, the impact of the project implementation time frame on costs was not analysed. No significant difference between these two construction methods was observed using the proposed model. Table 5 presents results obtained for different combinations of inputs (features) used for modelling. The binary value 0 or 1 denotes whether a particular feature is omitted from the model or not. Note that all the models with a reduced number of features outperform the one with the full set of features. The benefit is not only in performance gain, but also in lower complexity and faster training of the model. The best performing model is model 2 in Table 5 (see regression plot of modelled and targeted values in Figure 12), which depends on the following input variables: average bridge span, total bridge span length, bridge width, average pier height, foundation type, quarried aggregate price index, steel price index. The performance improves in comparison to the model with the full set of features by 0.8 %, measured by MAPE, leading to MAPE equal to 10.86 %. The improvement is also observed for all other performance measures.
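The binary encoding of feature subsets can be illustrated with the following minimal sketch, which enumerates all masks obtained by keeping or omitting the three inputs identified as less relevant. Whether Table 5 was generated in exactly this way is not stated in the paper, so the enumeration strategy is an assumption. The positions of inputs 3, 4 and 7 follow the numbering quoted above, while the order of the remaining features and the function names are assumed for illustration.

```python
from itertools import product

# Feature order: inputs 3, 4 and 7 follow the text; the other positions are assumed.
FEATURES = [
    "average bridge span", "total bridge span length", "bridge width",
    "type of bridge construction", "average pier height", "foundation type",
    "gross salary", "quarried aggregate price index", "steel price index",
]


def candidate_masks(n_features, optional):
    """All 0/1 masks that always keep the other features and toggle the optional ones."""
    masks = []
    for bits in product([1, 0], repeat=len(optional)):
        mask = [1] * n_features
        for j, bit in zip(optional, bits):
            mask[j] = bit
        masks.append(tuple(mask))
    return masks


def apply_mask(row, mask):
    """Keep only the feature values whose mask entry is 1."""
    return [value for value, keep in zip(row, mask) if keep]


# Zero-based indices of bridge width, type of bridge construction and gross salary.
for mask in candidate_masks(len(FEATURES), optional=[2, 3, 6]):
    print(mask, apply_mask(FEATURES, mask))
```

Each mask would then be evaluated with the same cross-validation procedure, and the mask with the lowest error retained. The mask (1, 1, 1, 0, 1, 1, 0, 1, 1) corresponds to the seven input variables listed for the best performing model.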

Conclusions
In order to make a decision about the need to build transport infrastructure that includes RC or PC bridges, it is necessary to estimate the cost of construction as accurately as possible in the early phase of project implementation. The estimation of construction costs of RC or PC bridges is a complex process that is influenced by a variety of factors. This paper gives a comprehensive overview of the state-of-the-art machine learning methods that can be used for estimating these costs, including MLP-ANN, ensembles of MLP-ANNs, regression tree ensembles (random forests, boosted and bagged regression trees), SVR with RBF kernel, and GPR with exponential, squared exponential, Matern, and rational quadratic covariance functions.
In order to train and assess the models, a dataset was created that includes project and contract documentation for 181 RC and PC bridges constructed on Pan-European Corridor X. All models were trained and tested under equal conditions using 10-fold cross-validation. According to the considered performance measures, most of the tested models capture the complex interrelations between the input features very well and demonstrate strong generalization capability. Although ensembles of ANNs, regression tree ensembles using boosting, and SVR with the RBF kernel perform well, they require a considerable amount of time to train, especially if the number of base models in an ensemble is high. On the other hand, the complexity of models based on Gaussian processes is substantially lower, but they are still able to outperform the ensemble models. Moreover, feature reduction is easy to combine with Gaussian process regression using ARD, leading to models with better performance and even lower complexity. Two out of nine input features can be removed without any negative influence on model performance. To the best of our knowledge, no results were previously reported for the implementation of Gaussian process regression in estimating construction costs. The research carried out in this paper has confirmed that methods based on machine learning eliminate the biases introduced by the human factor, and offer a fast and reliable tool for the construction industry to estimate construction costs of concrete bridges, even in early implementation stages, when only the basic technical and economic characteristics are available. Further research might be aimed at improving the dataset used for model training and evaluation by including additional relevant data about both existing and new bridges. The problem of estimating construction costs is considered here as a regression problem. However, it can also be treated as a classification problem if the costs are divided into groups. In that case, classification algorithms can be applied. The developed models can also be applied, with some modifications, to other costs during the project life cycle.
Table 5. Prediction of construction costs of RC and PC bridges using GPR with ARD exponential covariance function