A new model selection strategy in artificial neural networks

Erol Eğrioğlu a,*, Çağdaş Hakan Aladağ b, Süleyman Günay b

a Department of Statistics, University of Ondokuz Mayıs, Samsun 55100, Turkey
b Department of Statistics, University of Hacettepe, Ankara 06800, Turkey

* Corresponding author. E-mail address: erole1977@yahoo.com (E. Eğrioğlu).

Applied Mathematics and Computation 195 (2008) 591–597. doi:10.1016/j.amc.2007.05.005. © 2007 Elsevier Inc. All rights reserved.

Abstract

In recent years, artificial neural networks have been used for time series forecasting. Determining the architecture of an artificial neural network is a very important problem in applications. In this study, the problem of forecasting time series with feed forward neural networks is examined. Various model selection criteria have been used for determining the architecture. In addition, a new model selection strategy based on well-known model selection criteria is proposed. The proposed strategy is applied to real and simulated time series. Moreover, a new direction accuracy criterion, called the modified direction accuracy criterion, is discussed. The new model selection strategy is more reliable than the known model selection criteria.

Keywords: Artificial neural networks; Feed forward neural networks; Time series forecasting; Model selection criteria

1. Introduction

Artificial neural networks (ANNs) have attracted more and more attention from both academic researchers and industrial practitioners in recent years [6]. ANNs have been widely used to model time series in various fields of application [2], and they are a good alternative method for both linear and nonlinear time series forecasting. Zhang et al. [15] presented a review of the current status of applications of neural networks for forecasting.

One of the most popular neural network paradigms is the feed forward neural network (FNN) and the associated back propagation (BP) training algorithm [13]. In this article, we focus on the three-layered FNN that consists of an input layer, a hidden layer and an output layer with one node. The numbers of nodes of the input and hidden layers are varied over a specified range, and the optimal numbers of nodes of the input and hidden layers are determined by model selection criteria on test data. Qi and Zhang [6] investigated well-known criteria such as AIC [1], BIC [9], root mean squared error (RMSE), mean absolute percentage error (MAPE) and direction accuracy (DA). Buhamra et al. [2] proposed a model selection strategy based on Box–Jenkins methods. In addition, for model selection of ANNs, Siestema and Dow [10] and Reed [7] proposed pruning algorithms, Roy et al. [8] proposed a polynomial time algorithm, Murata et al. [5] suggested the network information criterion and Wang et al. [14] introduced the canonical decomposition technique.

In this article, a new model selection strategy is introduced. The proposed strategy is based on the AIC, BIC, RMSE, MAPE and DA criteria together with the modified direction accuracy (MDA) criterion, which is proposed in this study. The remainder of this paper is organized as follows. In Section 2, the elements of ANNs are introduced. In Section 3, the new model selection strategy is proposed. In Section 4, the proposed strategy is applied to real and simulated time series and compared with classical model selection criteria.
2. Elements of the artificial neural networks

Determining the elements of the ANN that affect its forecasting performance must be considered carefully. The elements of an ANN are the network architecture, the learning algorithm and the activation function. One critical decision is to determine the appropriate architecture, that is, the number of layers, the number of nodes in each layer and the number of arcs which interconnect the nodes [16]. FNNs have been used in many studies for forecasting; therefore, our focus is on feed forward networks. Determining the architecture depends on the problem at hand. Since, in the literature, there are no general rules for determining the best architecture, many architectures must be examined to obtain correct results. Fig. 1 depicts a broad feed forward ANN architecture that has a single hidden layer and a single output. Other important architectures include direct connections from the input nodes to the output nodes; Fig. 2 depicts such an architecture.

[Fig. 1. A broad feed forward ANN architecture.]
[Fig. 2. A direct connected feed forward ANN architecture.]

Learning of an ANN for a specific task is equivalent to finding the values of all weights such that the desired output is generated for the corresponding input. Various training algorithms have been used for determining the optimal weight values. The most popularly used training method is the back propagation algorithm [11]. In the back propagation algorithm, learning of the artificial neural network consists of adjusting all weights so that the error measure between the desired output and the actual output is minimized [4].

Another element of an ANN is the activation function. It determines the relationship between the inputs and outputs of a node and of a network. In general, the activation function introduces a degree of nonlinearity that is valuable for most ANN applications. The well-known activation functions are the logistic, hyperbolic tangent, sine (or cosine) and linear functions. Among them, the logistic transfer function is the most popular one [15].
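As a concrete illustration of the three-layer FNN just described, the following Python sketch (ours, not part of the paper) computes a single forecast from a p-h-1 network with a logistic hidden layer and a one-node output; all function and variable names are our own, and in practice the weights would be fitted by back propagation rather than drawn at random.

```python
import numpy as np

def logistic(x):
    """Logistic (sigmoid) activation used in the hidden layer."""
    return 1.0 / (1.0 + np.exp(-x))

def fnn_forecast(inputs, W_hidden, b_hidden, w_out, b_out):
    """One forward pass of a three-layer FNN with a single output node.

    inputs   : vector of p lagged observations (the input layer)
    W_hidden : (h, p) weights from the input layer to the h hidden nodes
    b_hidden : (h,) hidden-layer biases
    w_out    : (h,) weights from the hidden layer to the single output node
    b_out    : scalar output bias
    """
    hidden = logistic(W_hidden @ inputs + b_hidden)
    return float(w_out @ hidden + b_out)

# Example: one forecast from a randomly initialised 4-6-1 architecture
rng = np.random.default_rng(0)
p, h = 4, 6
y_hat = fnn_forecast(rng.normal(size=p),
                     rng.normal(size=(h, p)), rng.normal(size=h),
                     rng.normal(size=h), 0.0)
```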
3. A new model selection strategy

The total available data were divided into a training set, a test set and a validation set. The training set was used for ANN model development, the test set was used to evaluate the forecasting ability of the model and the validation set was used to compare model selection criteria.

The numbers of nodes of the input and hidden layers are selected by model selection criteria on the test data. The well-known model selection criteria are AIC, BIC, RMSE, MAPE and DA. AIC and BIC are criteria which penalize large models. RMSE and MAPE are measures of the deviations of the predicted values from the actual values. DA is a measure of forecast direction accuracy. The RMSE and MAPE criteria can also be used solely for model selection. The common forms of these criteria are

\mathrm{AIC} = \log\!\left(\frac{\sum_{i=1}^{T}(y_i - \hat{y}_i)^2}{T}\right) + \frac{2m}{T},  (3.1)

\mathrm{BIC} = \log\!\left(\frac{\sum_{i=1}^{T}(y_i - \hat{y}_i)^2}{T}\right) + \frac{m\log(T)}{T},  (3.2)

\mathrm{RMSE} = \left(\frac{\sum_{i=1}^{T}(y_i - \hat{y}_i)^2}{T}\right)^{1/2},  (3.3)

\mathrm{MAPE} = \frac{1}{T}\sum_{i=1}^{T}\left|\frac{y_i - \hat{y}_i}{y_i}\right|,  (3.4)

\mathrm{DA} = \frac{1}{T}\sum_{i=1}^{T} a_i, \qquad a_i = \begin{cases} 1 & \text{if } (y_{i+1} - y_i)(\hat{y}_{i+1} - y_i) > 0, \\ 0 & \text{otherwise}, \end{cases}  (3.5)

where y_i is the actual value, \hat{y}_i is the predicted value, T is the number of data and m is the number of weights.

In this study, a novel criterion called modified direction accuracy (MDA) is proposed. It is based on the direction accuracy criterion in particular, and we received inspiration from Chumby and Modest [3]. The MDA criterion is computed as follows:

A_i = \begin{cases} 1, & y_{i+1} - y_i \le 0, \\ 0, & y_{i+1} - y_i > 0, \end{cases} \qquad
F_i = \begin{cases} 1, & \hat{y}_{i+1} - \hat{y}_i \le 0, \\ 0, & \hat{y}_{i+1} - \hat{y}_i > 0, \end{cases} \qquad
D_i = (A_i - F_i)^2, \qquad
\mathrm{MDA} = \frac{\sum_{i=1}^{T-1} D_i}{T-1}.  (3.6)

All the criteria mentioned above measure different aspects of forecasting. In this paper, our proposed criterion, which combines all these aspects, consists of summing weighted versions of AIC, BIC, RMSE, MAPE, DA and MDA. We call this new criterion the weighted information criterion (WIC). The algorithm of the suggested model selection strategy based on WIC is as follows:

Step 1: The possible architectures are determined. For example, if the number of nodes of the output layer is 1 and the numbers of nodes of the input and hidden layers each range from 1 to 12, the total number of possible architectures is 144.

Step 2: The best values of the weights are determined by using the training data, and AIC, BIC, RMSE, MAPE, DA and MDA are calculated for the test data.

Step 3: AIC, BIC, RMSE, MAPE, DA and MDA are standardized over the possible architectures. For example, the 144 AIC values are standardized as follows:

\mathrm{AIC}_i = \frac{\mathrm{AIC}_i - \min(\mathrm{AIC})}{\max(\mathrm{AIC}) - \min(\mathrm{AIC})}.

Step 4: WIC is computed in the following way:

\mathrm{WIC} = 0.1(\mathrm{AIC} + \mathrm{BIC}) + 0.2(\mathrm{RMSE} + \mathrm{MAPE}) + 0.2((1 - \mathrm{DA}) + \mathrm{MDA}).  (3.7)

Step 5: The architecture which has the minimum WIC is selected.

The WIC criterion, in which each criterion appears with its corresponding weight, is computed by (3.7). Because AIC and BIC often select the smallest models, in which the numbers of hidden and input nodes are 1 or 2, their weights were set to 0.1. Since we consider the other criteria to have the same importance for forecasting accuracy, their weights were set to 0.2.
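To make Steps 2–5 concrete, here is a brief Python sketch (ours, not from the paper; the function names are hypothetical) that computes the six criteria of Eqs. (3.1)–(3.6) for each fitted candidate architecture, applies the min–max standardization of Step 3 across the candidates, and combines the standardized values with the weights of Eq. (3.7). Following the tables in Section 4, the direction criterion is carried as 1 − DA, and the direction terms are averaged over the T − 1 consecutive pairs.

```python
import numpy as np

def criteria(y, y_hat, m):
    """Raw criteria of Eqs. (3.1)-(3.6) for one architecture on the test data.

    y, y_hat : actual and predicted test values (length T); m : number of weights.
    Returns [AIC, BIC, RMSE, MAPE, 1-DA, MDA]."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    T = len(y)
    sse = np.sum((y - y_hat) ** 2)
    aic = np.log(sse / T) + 2 * m / T                             # Eq. (3.1)
    bic = np.log(sse / T) + m * np.log(T) / T                     # Eq. (3.2)
    rmse = np.sqrt(sse / T)                                       # Eq. (3.3)
    mape = np.mean(np.abs((y - y_hat) / y))                       # Eq. (3.4)
    da = np.mean((y[1:] - y[:-1]) * (y_hat[1:] - y[:-1]) > 0)     # Eq. (3.5)
    A = (y[1:] - y[:-1] <= 0).astype(float)                       # actual direction
    F = (y_hat[1:] - y_hat[:-1] <= 0).astype(float)               # forecast direction
    mda = np.mean((A - F) ** 2)                                   # Eq. (3.6)
    return np.array([aic, bic, rmse, mape, 1.0 - da, mda])

def select_by_wic(crit_matrix):
    """crit_matrix: one row per candidate architecture, columns
    [AIC, BIC, RMSE, MAPE, 1-DA, MDA] computed on the test data."""
    lo, hi = crit_matrix.min(axis=0), crit_matrix.max(axis=0)
    z = (crit_matrix - lo) / np.where(hi > lo, hi - lo, 1.0)      # Step 3
    w = np.array([0.1, 0.1, 0.2, 0.2, 0.2, 0.2])
    wic = z @ w                                                   # Step 4, Eq. (3.7)
    return int(np.argmin(wic)), wic                               # Step 5
```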
4. The application of the proposed strategy

The proposed strategy is applied to real and simulated time series. The real time series are main indicators of the foreign trade of Turkey, namely the proportion of imports covered by exports (PICE) and the share of exports in GNP (SE) time series, which were obtained from the State Institute of Statistics, Prime Ministry, Republic of Turkey [12]. The simulated time series are generated using a first order seasonal autoregressive process. The simulated SAR model is given as follows:

X_t = 0.7X_{t-1} + 0.7X_{t-12} + e_t, \qquad e_t \sim N(0, 1).  (4.1)

The architecture given in Fig. 1 is used for the real time series. The logistic activation function is used in this architecture. This function is given as

f(x) = (1 + \exp(-0.8x))^{-1}.  (4.2)

The architecture given in Fig. 2 is used for the simulated time series. The logistic activation function is used in the nodes of the hidden layer, and the linear activation function, given below, is used in the output node:

f(x) = x.  (4.3)

In all the architectures, the number of inputs varies from 1 to 12 and the number of nodes in the hidden layer varies from 1 to 12. Thus, for each time series, 144 architectures are analyzed. The time series data are divided into three sections: training data, test data and validation data.
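The resulting grid search can be organized as in the sketch below (again ours, not from the paper). It assumes the criteria and select_by_wic helpers from the previous sketch are in scope, and fit_and_forecast is a hypothetical stand-in for training a p-h-1 network on the training data by back propagation and forecasting the test data.

```python
import numpy as np

def choose_architecture(train, test, fit_and_forecast):
    """Evaluate every p-h-1 architecture (p, h = 1..12) on the test data and
    pick the one with the smallest WIC (Steps 1-5 of Section 3).
    Assumes criteria() and select_by_wic() from the previous sketch."""
    candidates, rows = [], []
    for p in range(1, 13):                                # input-layer nodes
        for h in range(1, 13):                            # hidden-layer nodes
            y_hat = fit_and_forecast(train, test, p, h)   # hypothetical trainer
            m = (p + 1) * h + (h + 1)                     # rough weight count, biases included
            rows.append(criteria(test, y_hat, m))
            candidates.append((p, h))
    best, wic = select_by_wic(np.vstack(rows))
    return candidates[best], wic[best]                    # e.g. (4, 2) means a 4-2-1 network
```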
Results for the real time series are briefly given in Tables 1 and 2. The best architecture according to RMSE for the real time series is given in Table 1, and the best architecture according to WIC for the real time series is given in Table 2. For example, in Table 1, for the PICE time series, the best architecture according to the RMSE criterion for the test data has 4 input layer nodes, 6 hidden layer nodes and 1 output layer node. For this architecture, the RMSE value is 7.56 for the test set and 13.09 for the validation set. In Table 2, for example, for the PICE data, the best architecture according to the WIC criterion for the test data is 4-2-1. When this architecture is used, the RMSE value is 10.04 for the test set and 12.05 for the validation set.

Table 1. The best architecture according to RMSE for the real time series

Series  Best architecture for test data  Data        RMSE    MAPE   AIC    BIC    1-DA  MDA   WIC
PICE    4-6-1                            Test        7.56    6.56   16.04  13.70  0.2   0.25  0.21
                                         Validation  13.09   18.20  17.14  14.80  0.2   0.5   0.21
GNP     8-8-1                            Test        2.1805  16.06  30.35  24.73  0.2   0     0.13
                                         Validation  7.48    34.09  32.82  27.20  0.4   0.25  0.25

Table 2. The best architecture according to WIC for the real time series

Series  Best architecture for test data  Data        RMSE   MAPE   AIC    BIC    1-DA  MDA   WIC
PICE    4-2-1                            Test        10.04  9.22   8.61   7.83   0.20  0.25  0.10
                                         Validation  12.05  16.64  8.97   8.19   0.2   0     0.08
GNP     8-8-1                            Test        2.18   16.06  30.35  24.73  0.2   0     0.13
                                         Validation  7.48   34.09  32.82  27.20  0.4   0.25  0.25

A good model selection criterion must be consistent between the test data and the validation data; namely, an architecture which has a good criterion value for the test data should have a good criterion value for the validation data too. A measure of consistency is the correlation between the criterion values for the test data and the validation data. The obtained correlation values for the real time series are given in Table 3.

Table 3. The correlations of criterion values between the test data and the validation data

Series  RMSE   WIC
PICE    0.345  0.750
GNP     0.563  0.677

Results of the simulated time series study are briefly given in Tables 4 and 5. The best architecture according to RMSE for the simulated time series is given in Table 4, and the best architecture according to WIC for the simulated time series is given in Table 5. The obtained correlations of the test data criterion values with the validation data criterion values for the simulated time series are shown in Table 6.

Table 4. The best architecture according to RMSE for the simulated time series

Series  Best architecture  Data        RMSE  MAPE    AIC    BIC    DA   MDA   WIC
1       12-1-1             Test        0.68  106.22  4.45   3.44   0.2  0.25  0.17
                           Validation  0.60  101.57  4.21   3.19   0.2  0.5   0.19
2       12-11-1            Test        0.56  110.32  56.03  44.86  0.6  1     0.50
                           Validation  0.85  389.33  56.88  45.71  0.2  0     0.27
3       12-12-1            Test        0.56  40.07   61.22  49.04  0.2  0     0.24
                           Validation  0.76  54.33   61.86  49.68  0.4  0.25  0.33
4       4-3-1              Test        0.70  94.74   5.30   4.12   0.2  0     0.05
                           Validation  1.69  268.86  7.05   5.88   0.6  0.75  0.40
5       8-1-1              Test        0.91  210.33  3.41   2.71   0.2  0.25  0.12
                           Validation  0.92  127.78  3.43   2.73   0.2  0.25  0.17

We observe that the WIC criterion has higher correlation values for all series according to Tables 3 and 6. As a result of Tables 3 and 6, the WIC criterion is shown to be more consistent than the RMSE criterion; according to these results, it is clear that the WIC criterion is more reliable than RMSE. From Tables 1, 2, 4 and 5, we see that the WIC criterion is better than the RMSE criterion, and the tables also show that the WIC criterion selects smaller models. The WIC criterion is better than the RMSE criterion for Series 1, 2, 3 and the PICE time series. The RMSE and WIC criteria selected the same architecture for Series 4, 5 and the GNP time series.
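The consistency figures in Tables 3 and 6 are simply correlations, across the 144 candidate architectures, between a criterion's test-data values and its validation-data values; a minimal sketch (our own naming, illustrative numbers only) is given below.

```python
import numpy as np

def consistency(test_values, validation_values):
    """Correlation, across candidate architectures, between a criterion's
    test-data values and its validation-data values (as in Tables 3 and 6)."""
    return float(np.corrcoef(test_values, validation_values)[0, 1])

# Illustrative call with dummy numbers; a value nearer 1 indicates a more consistent criterion.
print(consistency([0.21, 0.35, 0.90, 0.44], [0.25, 0.31, 0.97, 0.52]))
```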