FITSUM BEKELE TILAHUN
Renewable Energy Systems
ITT, TH Köln (Cologne University of Applied Science)
Betzdorfer Strasse 2, 50679 Köln
GERMANY
ftsebeek@gmail.com www.amu.edu.et
RAMCHANDRA BHANDARI
Renewable Energy Systems
ITT, TH Köln (Cologne University of Applied Science)
Betzdorfer Strasse 2, 50679 Köln
GERMANY
ramchandra.bhandari@thkoeln.de www.thkoeln.de
MENEGESHA MAMO
Electrical & Computer Engineering
Addis Ababa University Institute of Technology (AAIT)
King George VI Street, Addis Ababa
ETHIOPIA
menegesha.mamo@aau.edu.et www.aait.edu.et
Abstract: Recent research has demonstrated the capability of Artificial Neural Networks (ANNs) to learn and generalize when solving complex industrial problems. However, few studies have investigated whether ANNs are also effective in identifying energy use patterns in industrial processes. In this work, a multilayer perceptron (MLP) neural network trained with a resilient gradient descent variant of backpropagation is developed to determine steam consumption patterns as a function of production rate in a textile factory. The model is tested using real-time data from each steam-consuming machine's daily production and the meter reading of an electrical steam boiler. A portion of these data (85%) was randomly selected to train the network, and the remaining data were used to test the performance of the trained network. The results showed an acceptable error performance index of around 0.0674. The model also gave a correlation coefficient (R) between the estimated and target values of 0.9781. Thus, the proposed neural network can be used as a valuable tool for approximating energy use in industrial production processes. Moreover, with the availability of more training data, an improved prediction capability can be achieved.
Key-Words: Artificial Neural Networks (ANNs), multilayer neural network (MLP), resilient gradient descent, industrial processes, steam consumption prediction.
1 Introduction
One common and important application of Artificial Neural Networks (ANNs) in practical use is function approximation. Function approximation ranges from determining a realizable feedback function that relates measured outputs to control inputs in control systems, to finding a function that correlates past values of an input signal with the output in adaptive filtering. Lately, ANNs have been used extensively to find the underlying functional relations of engineering processes. This stems from the ability of ANNs to predict or solve nonlinear problems with a high degree of accuracy, given enough data to learn from. A wide variety of ANNs have been used, with configurations chosen to suit the specific requirements of each application.
A widely used and efficient ANN function approximator is the multilayer perceptron (MLP) network based on the backpropagation (BP) learning algorithm. Although research on these ANNs is ongoing, several studies have identified backpropagation as the leading learning algorithm for multilayer perceptrons [1-3].
The accuracy and convergence speed of MLPs usually depend on the network architecture as well as on the choice of tunable parameters during implementation. In previous studies, researchers have applied these algorithms to real applications. However, hardly any studies have predicted the energy consumption of industrial processes from production data. This paper attempts to address this gap by implementing one of the most powerful ANNs, the MLP, while considering issues related to its practical application.
Rosenblatt's realization of the perceptron concept in 1958 was a milestone for ANNs. The perceptron is an individual processing unit that accepts weighted inputs and produces a rule-based threshold output. The MLP is a feedforward ANN built by customizing these fundamental units. This customization introduced additional layers of neurons and a nonlinear transfer function [2, 3].
2 Problem Formulation
Determining energy consumption is perhaps the first crucial element in demand side energy management (DSM). In addition, when integrating renewables such as a solar plant into an industrial site, knowledge of the load is a necessary requirement. One way to obtain it is by direct measurement of generation and consumption, otherwise known as an energy audit. This method is costly and may require repeated measurements under different industrial production conditions. Another way is to take the average energy consumption of the industrial processes from the manufacturer's specification. This method, although simple, is often not usable in practice, because it does not take into account energy utilization under changing scenarios such as off-nominal operation, changing input parameters in production processes, and the changing behavior of machines over their life cycle. The last method, which is proposed in this study, is to use ANNs to predict energy use patterns under changing production processes in real time. This, however, requires a substantial amount of data and several model configuration trials in order to generalize well.
This work is part of a larger project called “Control and Optimization of a Large-Scale Solar Plant in Ethiopian Textile Industry”. The primary aim of the project is the smooth integration of an economically realizable solar plant for the feed water of an existing steam boiler. During the course of this project, determining the thermal energy demand was deemed necessary for optimal sizing and operation of the solar plant. This task was difficult to achieve due to the absence of funds for an energy audit. Nor was it possible to determine the average thermal energy use from specifications, since the factory is very old and no specifications are available. These combined factors led to the idea of predicting energy use patterns from other available related data through an ANN. These related data are the daily production of the steam-consuming machines and the kWh meter reading of a boiler.
The proposed research work employs the well-known BP algorithm for a multilayer feedforward neural network. Figure 1 depicts the methodology used. The work takes into consideration the issues pertinent to the practical implementation of these ANNs. To this end, a MATLAB script was written that incorporates the above-mentioned issues and achieved acceptable performance at run time.

2.1 Fundamentals of Artificial Neural Networks (ANNs)
ANNs are defined as a collection of processing units that interact with each other through weighted interconnections [3]. The aim of these networks is to replicate, in a rather simplified manner, the workings of the human central nervous system. The performance of an ANN depends, in a manner that is not clearly defined, on the number, interconnection and interaction of its constituent units.
These units are known as neurons. A neuron receives signals from, and sends signals to, the other units to which it is connected.
A neuron model is shown in Figure 2. The output strength of the neuron is determined by the function f, which itself depends on the values of the weight (W) and bias (b) associated with each interconnection. The process begins when an input is presented to the network and is propagated through it by the transfer function, otherwise known as the activation function. For an MLP this process continues from neuron to neuron and layer by layer up to the output layer, which produces the final value.
An MLP is trained by examples before it is used as a working network. The training iteratively adjusts the connection weights and biases using known training data. To do so, the outputs of the network are compared to the target examples, and the difference is quantified by the error performance index (PI). This error is propagated back through the network to adjust the weights and biases until an acceptable PI is achieved.

The final stage of the network implementation involves fixing the adaptive weights and biases at the last values from the training stage. The network then computes the output directly to give an estimated value for the given inputs.
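The forward computation described in this section can be illustrated with a minimal sketch. It assumes a tanh (tansig) hidden layer and a linear output layer; the weights, biases and input values are made up for illustration and are not taken from the trained model.

```python
import math

def tansig(n):
    """Hyperbolic tangent sigmoid transfer function."""
    return math.tanh(n)

def layer_forward(weights, biases, inputs, transfer):
    """Compute a = f(W p + b) for one layer, one neuron per weight row."""
    outputs = []
    for w_row, b in zip(weights, biases):
        net = sum(w * p for w, p in zip(w_row, inputs)) + b  # net input n
        outputs.append(transfer(net))
    return outputs

# Illustrative (made-up) parameters: 2 inputs -> 2 hidden neurons -> 1 linear output
W1, b1 = [[0.5, -0.3], [0.1, 0.8]], [0.0, -0.1]
W2, b2 = [[1.0, -1.0]], [0.2]

hidden = layer_forward(W1, b1, [1.0, 2.0], tansig)
output = layer_forward(W2, b2, hidden, lambda n: n)
```

Once training has fixed the weights and biases, estimating an output amounts to repeating this layer-by-layer evaluation for each new input.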
2.2 The MLP Architecture and the Backpropagation (BP) Algorithm
The three-layer MLP network with the associated notation is depicted in Fig. 2.
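For an MLP, the output of each layer feeds the following layer:
$$\mathbf{a}^{m+1} = \mathbf{f}^{m+1}\left(\mathbf{W}^{m+1}\mathbf{a}^{m} + \mathbf{b}^{m+1}\right), \quad m = 0, 1, \ldots, M-1$$
where $M$ is the number of layers in the network.
The neurons in the first layer accept the network inputs:
$$\mathbf{a}^{0} = \mathbf{p}$$
The outputs of the neurons in the final layer are taken as the network outputs:
$$\mathbf{a} = \mathbf{a}^{M}$$
The inputs and targets presented to the network are:
$$\{\mathbf{p}_1, \mathbf{t}_1\}, \{\mathbf{p}_2, \mathbf{t}_2\}, \ldots, \{\mathbf{p}_Q, \mathbf{t}_Q\}$$
where $\mathbf{p}_q$ and $\mathbf{t}_q$ are an input and the corresponding target for the network, respectively.
The performance of the network is judged by the mean square error, approximated at iteration $k$ by the squared error
$$\hat{F}(\mathbf{x}) = \left(\mathbf{t}(k) - \mathbf{a}(k)\right)^{T}\left(\mathbf{t}(k) - \mathbf{a}(k)\right) = \mathbf{e}^{T}(k)\,\mathbf{e}(k)$$
Using the steepest descent algorithm, the recursive learning rule of the network is
$$w_{i,j}^{m}(k+1) = w_{i,j}^{m}(k) - \alpha \frac{\partial \hat{F}}{\partial w_{i,j}^{m}}, \qquad b_{i}^{m}(k+1) = b_{i}^{m}(k) - \alpha \frac{\partial \hat{F}}{\partial b_{i}^{m}}$$
where $\alpha$ is the learning rate.
Since the error is not an explicit function of the weights in the hidden layers, the chain rule is used to compute the derivatives. For a function $f$ that is an explicit function only of $n$, the derivative with respect to a variable $w$ on which $n$ depends is
$$\frac{d f(n(w))}{d w} = \frac{d f(n)}{d n} \times \frac{d n(w)}{d w}$$
Applied to the gradient terms, this gives
$$\frac{\partial \hat{F}}{\partial w_{i,j}^{m}} = \frac{\partial \hat{F}}{\partial n_{i}^{m}} \times \frac{\partial n_{i}^{m}}{\partial w_{i,j}^{m}}, \qquad \frac{\partial \hat{F}}{\partial b_{i}^{m}} = \frac{\partial \hat{F}}{\partial n_{i}^{m}} \times \frac{\partial n_{i}^{m}}{\partial b_{i}^{m}}$$
Calculation of the second factor in each of these equations is straightforward, because there is a simple relation between the net input to layer $m$ and the weights and bias in that layer:
$$n_{i}^{m} = \sum_{j=1}^{S^{m-1}} w_{i,j}^{m}\, a_{j}^{m-1} + b_{i}^{m}$$
where $S^{m}$ is the number of neurons in layer $m$. Thus
$$\frac{\partial n_{i}^{m}}{\partial w_{i,j}^{m}} = a_{j}^{m-1}, \qquad \frac{\partial n_{i}^{m}}{\partial b_{i}^{m}} = 1$$
Let us define
$$s_{i}^{m} \equiv \frac{\partial \hat{F}}{\partial n_{i}^{m}}$$
where $s_{i}^{m}$ is the sensitivity of $\hat{F}$ to changes in the $i$th element of the net input at layer $m$. With this definition, the gradient expressions take the simpler form
$$\frac{\partial \hat{F}}{\partial w_{i,j}^{m}} = s_{i}^{m}\, a_{j}^{m-1}, \qquad \frac{\partial \hat{F}}{\partial b_{i}^{m}} = s_{i}^{m}$$
Thus the steepest descent algorithm can be written as
$$w_{i,j}^{m}(k+1) = w_{i,j}^{m}(k) - \alpha\, s_{i}^{m}\, a_{j}^{m-1}, \qquad b_{i}^{m}(k+1) = b_{i}^{m}(k) - \alpha\, s_{i}^{m}$$
The condensed matrix representation is given by:
$$\mathbf{W}^{m}(k+1) = \mathbf{W}^{m}(k) - \alpha\, \mathbf{s}^{m} \left(\mathbf{a}^{m-1}\right)^{T}, \qquad \mathbf{b}^{m}(k+1) = \mathbf{b}^{m}(k) - \alpha\, \mathbf{s}^{m}$$
where
$$\mathbf{s}^{m} \equiv \frac{\partial \hat{F}}{\partial \mathbf{n}^{m}} = \left[\frac{\partial \hat{F}}{\partial n_{1}^{m}},\ \frac{\partial \hat{F}}{\partial n_{2}^{m}},\ \ldots,\ \frac{\partial \hat{F}}{\partial n_{S^{m}}^{m}}\right]^{T}$$
The sensitivities are also computed using the chain rule. Because the sensitivity at layer $m$ is computed from the sensitivity at layer $m+1$, that is, backward through the network, the algorithm is named backpropagation.
Let us now define the Jacobian matrix used to backpropagate the sensitivities, $\partial \mathbf{n}^{m+1} / \partial \mathbf{n}^{m}$, whose $i, j$ element is
$$\frac{\partial n_{i}^{m+1}}{\partial n_{j}^{m}} = \frac{\partial \left(\sum_{l=1}^{S^{m}} w_{i,l}^{m+1} a_{l}^{m} + b_{i}^{m+1}\right)}{\partial n_{j}^{m}} = w_{i,j}^{m+1} \frac{\partial a_{j}^{m}}{\partial n_{j}^{m}} = w_{i,j}^{m+1}\, \dot{f}^{m}(n_{j}^{m})$$
where
$$\dot{f}^{m}(n_{j}^{m}) = \frac{\partial f^{m}(n_{j}^{m})}{\partial n_{j}^{m}}$$
Thus, the Jacobian matrix is given as:
$$\frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^{m}} = \mathbf{W}^{m+1}\, \dot{\mathbf{F}}^{m}(\mathbf{n}^{m})$$
where
$$\dot{\mathbf{F}}^{m}(\mathbf{n}^{m}) = \mathrm{diag}\left(\dot{f}^{m}(n_{1}^{m}),\ \dot{f}^{m}(n_{2}^{m}),\ \ldots,\ \dot{f}^{m}(n_{S^{m}}^{m})\right)$$
Finally, using the chain rule, the sensitivities can be given as:
$$\mathbf{s}^{m} = \frac{\partial \hat{F}}{\partial \mathbf{n}^{m}} = \left(\frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^{m}}\right)^{T} \frac{\partial \hat{F}}{\partial \mathbf{n}^{m+1}} = \dot{\mathbf{F}}^{m}(\mathbf{n}^{m}) \left(\mathbf{W}^{m+1}\right)^{T} \mathbf{s}^{m+1}$$
These sensitivities are propagated backward, layer by layer, from the output layer to the first layer:
$$\mathbf{s}^{M} \rightarrow \mathbf{s}^{M-1} \rightarrow \cdots \rightarrow \mathbf{s}^{2} \rightarrow \mathbf{s}^{1}$$
starting from the output layer, where differentiating the squared error gives $\mathbf{s}^{M} = -2\, \dot{\mathbf{F}}^{M}(\mathbf{n}^{M}) (\mathbf{t} - \mathbf{a})$.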
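The backward propagation of sensitivities can be checked numerically on a very small network. The sketch below assumes one input, a tanh hidden layer with two neurons, one linear output and a squared-error performance index; all parameter values are illustrative. It compares the gradient obtained from the sensitivities with a finite-difference estimate.

```python
import math

def forward(p, W1, b1, W2, b2):
    """One input, tanh hidden layer, single linear output."""
    n1 = [w * p + b for w, b in zip(W1, b1)]
    a1 = [math.tanh(n) for n in n1]
    a2 = sum(w * a for w, a in zip(W2, a1)) + b2
    return a1, a2

def backprop_gradients(p, t, W1, b1, W2, b2):
    """Gradients of F = (t - a2)^2 obtained by propagating sensitivities backward."""
    a1, a2 = forward(p, W1, b1, W2, b2)
    s2 = -2.0 * (t - a2)                 # output sensitivity (linear layer, f' = 1)
    s1 = [(1.0 - a * a) * w * s2         # hidden sensitivity, tanh' = 1 - a^2
          for a, w in zip(a1, W2)]
    return {
        "W2": [s2 * a for a in a1],      # dF/dW2 = s2 * a1
        "b2": s2,
        "W1": [s * p for s in s1],       # dF/dW1 = s1 * p (the layer-0 output)
        "b1": s1,
    }

def numeric_grad_W1(p, t, W1, b1, W2, b2, i, h=1e-6):
    """Central-difference estimate of dF/dW1[i], used only as a check."""
    up = list(W1); up[i] += h
    dn = list(W1); dn[i] -= h
    f_up = (t - forward(p, up, b1, W2, b2)[1]) ** 2
    f_dn = (t - forward(p, dn, b1, W2, b2)[1]) ** 2
    return (f_up - f_dn) / (2 * h)

W1, b1, W2, b2 = [0.4, -0.7], [0.1, 0.2], [0.3, -0.5], 0.05
grads = backprop_gradients(1.5, 0.8, W1, b1, W2, b2)
```

A steepest descent step would then subtract the learning rate times each of these gradients from the corresponding weight or bias.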
2.2.1 Resilient Gradient algorithm
Although BP is the most widely used learning algorithm for MLP networks, in its basic form it has two major limitations: long learning time and the possibility of converging to local minima [1, 3-5]. A variant of the basic BP algorithm known as the resilient gradient method, which is reported to reduce these drawbacks, is therefore used [4, 5].
In this algorithm, only the sign of the derivative is used to determine the direction of the weight update; the size of the update is held in a separate update value for each weight. The algorithm follows these rules:
a) If the partial derivative with respect to a weight has the same sign for two consecutive iterations, the update value of that weight is increased by a factor ɳ+.
b) If the sign of the partial derivative changes from the previous iteration, the update value is decreased by a factor ɳ−.
c) If the derivative is zero, the update value remains the same.
d) As a result, if a weight continues to change in the same direction for several iterations, the magnitude of its change grows, whereas if the weight oscillates, its change is reduced.
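The rules above can be sketched in a few lines. This is a minimal illustration of the resilient update in its simplest form (without weight backtracking), not the script used in this work; the factors 1.2 and 0.5, the step limits and the toy objective are assumed values chosen for the example.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def rprop_step(weights, grads, prev_grads, steps,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One resilient update, in place. Only the sign of each derivative is used."""
    for i, g in enumerate(grads):
        same = g * prev_grads[i]
        if same > 0:        # same sign twice in a row: grow the update value
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif same < 0:      # sign changed: a minimum was overshot, shrink it
            steps[i] = max(steps[i] * eta_minus, step_min)
        # same == 0: update value unchanged
        weights[i] -= sign(g) * steps[i]
        prev_grads[i] = g

# Toy problem: minimise F(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2
w, prev, steps = [0.0, 0.0], [0.0, 0.0], [0.1, 0.1]
for _ in range(100):
    grad = [2 * (w[0] - 3), 20 * (w[1] + 1)]
    rprop_step(w, grad, prev, steps)
```

Because the gradient magnitude is ignored, the two weights converge at a similar pace even though the toy objective is ten times steeper along the second one.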
2.3 Implementation of BP Algorithm for steam consumption prediction
The diagram in Figure 4 depicts the ANN training procedure followed. The procedure is an iterative process that starts with data collection and preprocessing, which make the neural network training more efficient. At this first step, the data were partitioned into training and testing sets. Next, a suitable network type and architecture (e.g., number of hidden layers, number of nodes in these layers) were selected. Then an appropriate training algorithm was chosen from the many available to handle the task. Finally, once the ANN was trained, its performance was analyzed. This last stage dealt with practical issues concerning the data, the network architecture, and the training algorithm. The whole procedure was then iterated until an acceptable performance was achieved.
2.3.1 Pre-Training Steps
The pre-training steps comprise three separate tasks, namely data collection, data preprocessing, and choice of network type and architecture.
2.3.1.1 Data Collection
Input data, namely the actual daily production of all steam-consuming machines, were collected for the year 2016 at the Bahir Dar textile factory. Part of these data, for the first week of August 2016, is shown in Figure 3. The daily total steam production of an electrical boiler (Collins Walker) was used as the output data. The specification of the existing electrical steam boiler is given in Table 1. Meter readings for the same year and days as the input data were also recorded. Figure 2 depicts these meter readings for the same days of August 2016.
2.3.1.2 Data preprocessing
The aim of this step is to prepare the ground for better network training. Though several data preprocessing steps exist in the literature, this work used feature extraction, normalization, and handling of missing data.
The data available for the ANN output are the meter readings of an electrical boiler. These data show the total electrical energy (kWh) consumed by the boiler. To make these data useful, they are converted into the total steam delivered at the steam-consuming machines. The procedure is as follows:
The total steam delivered at the steam-consuming machines is given by
$$S_{d} = S_{b} - S_{l}$$
where $S_{d}$ is the steam delivered, $S_{b}$ is the total boiler steam produced and $S_{l}$ is the steam transmission loss.
The total daily steam produced by the boiler can be determined from
$$S_{b} = \frac{E_{b}}{P_{r}}\, \dot{S}_{r}$$
where $E_{b}$ is the daily electrical energy consumed by the boiler (kWh), $P_{r}$ is the rated boiler power (kW), and $\dot{S}_{r}$ is the rated steam output (kg/hr) that relates the rated power to the boiler steam production $S_{b}$ in kg, as given in the boiler specification in Table 1.
Table 1: Production rate vs boiler meter reading
Table 1 Boiler specification
Specification description | Specification value
Name and type | COLLINE, electrical boiler
Permissible & working pressure | 13 bar, 10.3 bar
Design & max. steam temperature | 190 °C, 184 °C
Rated steam output | 3348 kg/hr per boiler
Power consumption | 2106 kW per boiler
The steam loss can range from 5-20% of the steam produced [6]. In the current model, this loss is represented stochastically as a uniform distribution between the minimum and maximum values. This was done to account for the uncertainty in quantifying the steam loss across the varying steam distribution networks.
Figure 3 Daily production rates from steam-consuming machines, 1st week of August 2016
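The conversion from meter reading to delivered steam can be sketched as follows. The sketch assumes that the daily steam produced equals the equivalent full-load hours (energy divided by rated power) multiplied by the rated steam output from Table 1; this relation, the function names and the example reading are assumptions made for illustration. The transmission loss is drawn uniformly between 5% and 20%.

```python
import random

RATED_POWER_KW = 2106.0          # Table 1: power consumption per boiler
RATED_STEAM_KG_PER_HR = 3348.0   # Table 1: rated steam output per boiler

def steam_produced_kg(daily_energy_kwh):
    """Assumed relation: equivalent full-load hours times rated steam output."""
    full_load_hours = daily_energy_kwh / RATED_POWER_KW
    return full_load_hours * RATED_STEAM_KG_PER_HR

def steam_delivered_kg(daily_energy_kwh, rng, loss_min=0.05, loss_max=0.20):
    """Subtract a transmission loss drawn uniformly between loss_min and loss_max."""
    loss_fraction = rng.uniform(loss_min, loss_max)
    produced = steam_produced_kg(daily_energy_kwh)
    return produced * (1.0 - loss_fraction)

rng = random.Random(0)                          # fixed seed for repeatability
delivered = steam_delivered_kg(21060.0, rng)    # hypothetical 21,060 kWh day
```

Because the loss is random, repeated runs with different seeds give a range of delivered steam values for the same meter reading.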
It is reported in [7-8] that rescaling or normalization of training data improves the learning and convergence of a network. The normalization procedure used in this work aims to adjust the data so that they have a specified mean and variance, typically 0 and 1. This can be done with the transformation
where is the minimum of the input vectors in the data set, and is the maximum value.
In practice, this normalization shifts the zero of the scale and normalizes the spread of the data. The data were also shuffled so that the network does not learn from similar sets of data at the expense of others.
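The text refers both to a target mean and variance and to the minimum and maximum of the inputs, so the sketch below shows the two common forms of normalization side by side, together with a paired shuffle. Which form the original script used is not stated; the function names and the sample values are illustrative assumptions.

```python
import random
import statistics

def minmax_scale(values):
    """Map values linearly onto [-1, 1] using their minimum and maximum."""
    lo, hi = min(values), max(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def zscore_scale(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def shuffle_pairs(inputs, targets, seed=0):
    """Shuffle inputs and targets together so each pair stays aligned."""
    pairs = list(zip(inputs, targets))
    random.Random(seed).shuffle(pairs)
    xs, ts = zip(*pairs)
    return list(xs), list(ts)

production = [120.0, 80.0, 100.0, 140.0, 60.0]   # made-up daily production values
scaled = minmax_scale(production)
standardised = zscore_scale(production)
```

Whichever form is used, the scaling parameters computed on the training data should also be applied to any test input before it is presented to the network.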
Because the data are limited, samples with missing values could not simply be discarded. Instead, two strategies were used, depending on whether the missing value was in the input or in the output. When an input value was missing, a flag (either 1 or 0) was set to mark it and the missing component was replaced with the average value of that input. When an output value was missing, the error performance calculation was modified so that the sample was skipped, nullifying its contribution to the learning process.
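Both strategies can be sketched briefly. The example below assumes that missing values are marked with None, that the flag is appended as an extra input column, and that the error is averaged only over samples with a known target; these representation choices and the sample records are assumptions for illustration.

```python
def impute_inputs(rows):
    """Replace missing inputs (None) with the column mean and append a 0/1 flag per column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        means.append(sum(present) / len(present))
    filled = []
    for r in rows:
        values = [means[j] if r[j] is None else r[j] for j in range(n_cols)]
        flags = [1.0 if r[j] is None else 0.0 for j in range(n_cols)]
        filled.append(values + flags)
    return filled

def masked_mse(outputs, targets):
    """Mean square error over the samples whose target is known (not None)."""
    errors = [(t - a) ** 2 for a, t in zip(outputs, targets) if t is not None]
    return sum(errors) / len(errors)

rows = [[10.0, 4.0], [None, 6.0], [14.0, None]]   # made-up production records
filled = impute_inputs(rows)
mse = masked_mse([1.0, 2.0, 3.0], [1.5, None, 2.0])
```

The flag lets the network distinguish a measured value from an imputed average, while the masked error keeps samples with unknown targets from influencing the weight updates.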

Finally, the collected data were divided into two sets: training and testing. The training set made up 85% of the full data set, with the testing set making up the remaining 15%. Care was taken to make each set representative of the full data set, so that the test set covers the same region of the input space as the training set. To this end, each set was selected randomly from the full data set.
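A random 85/15 partition of this kind can be sketched as follows; the function name, the seed and the stand-in data are assumptions for illustration.

```python
import random

def train_test_split(samples, train_fraction=0.85, seed=0):
    """Randomly partition samples into training and testing subsets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * len(samples))
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test

days = list(range(100))            # stand-in for 100 daily records
train_days, test_days = train_test_split(days)
```

Shuffling the indices before cutting the list keeps both subsets spread over the whole year rather than splitting it by date.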
2.3.2 Choice of Network Architecture
The standard network architecture for fitting problems is the multilayer perceptron [1-3]. As described in [3], this standard configuration uses the tansig function in the hidden layers and a linear function in the output layer. The tansig function is preferred over the log-sigmoid function in the hidden layers because it produces outputs (which are inputs to the next layer) that are centered near zero, whereas the log-sigmoid function always produces positive outputs.
The choice of the optimum number of hidden units depends on many factors whose interactions are not easy to understand. These factors are the amount of training data, the number of input and output units, the level of generalization required from the network, the type of transfer function and the training algorithm [9]. Conflicting trends are observed when the number of hidden units varies: too few leads to underfitting, while too many results in overfitting and a slow learning process. However, it is rarely necessary to use more than two hidden layers for a standard function approximation problem [3].
To fix the number of neurons in the hidden layer, different authors suggest rules of thumb from their experience. In [10] it is given as
$$n = \sqrt{n_{i} + n_{o}} + a$$
where $n$ is the number of hidden neurons, $n_{i}$ and $n_{o}$ are the numbers of neurons in the input and output layers, and $a$ is a constant between 1 and 10.
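As a quick worked example, the sketch below evaluates this rule of thumb in its commonly quoted form, n = sqrt(n_in + n_out) + a with a between 1 and 10. The form of the rule and the network dimensions (five inputs, one output) are assumptions for illustration.

```python
import math

def hidden_neuron_range(n_inputs, n_outputs, a_min=1, a_max=10):
    """Candidate hidden-layer sizes from n = sqrt(n_in + n_out) + a, a in [a_min, a_max]."""
    base = math.sqrt(n_inputs + n_outputs)
    return [round(base + a) for a in range(a_min, a_max + 1)]

# Hypothetical case: five steam-consuming machines as inputs, one steam output
candidates = hidden_neuron_range(5, 1)
```

The rule only narrows the search to a range of candidate sizes; the final choice still has to come from comparing the trained networks.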
Another work [11] suggested to use
Where is hidden neuron numbers, is number of training samples, are input and output neurons.
The authors believe that the best approach is to run multiple trials over a range of hidden layers, with different numbers of neurons in each layer, and observe the network performance. For the current work, two hidden layers with ten neurons in each layer achieved the set performance criterion.
2.3.3 Weight Initialization
ANN weights should be initialized with small random values. Since the BP algorithm updates all weights in the same fashion, initializing the weights to identical values would make all units learn in the same way [12-14]. Small random values also result in a network output that corresponds to the largest weight update [13]. In this work, effort was made to make the performance of the final trained neural network independent of the choice of initial weight values. To this end, the network was run several times with different initial weight values, which resulted in similar performance.
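Repeated random initialization can be sketched as below. The uniform range of plus or minus 0.5, the layer sizes and the seeds are assumed values for illustration, not those of the original script.

```python
import random

def init_layer(n_inputs, n_neurons, rng, scale=0.5):
    """Draw weights and biases uniformly from [-scale, scale]."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [rng.uniform(-scale, scale) for _ in range(n_neurons)]
    return weights, biases

def init_network(layer_sizes, seed):
    """layer_sizes = [inputs, hidden..., outputs]; returns one (W, b) pair per layer."""
    rng = random.Random(seed)
    return [init_layer(n_in, n_out, rng)
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

# Several independent initialisations of a hypothetical 5-10-10-1 network
networks = [init_network([5, 10, 10, 1], seed) for seed in range(3)]
```

Training each of these networks and comparing the final performance indicates whether the result depends on the starting weights.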
2.3.4 Choice of Training Algorithm
For multilayer networks performing function approximation, the resilient gradient descent training algorithm is reported to minimize the error function reliably with a relatively fast convergence rate [4, 14, 16]. In this work, the algorithm was tested to check its suitability for the task at hand.
2.3.5 Stopping Criteria
For the majority of practical neural networks, the training error never converges exactly to zero. As a result, other criteria for deciding when to stop the training are generally considered. Several methods are reported in the literature, such as stopping when the performance index reaches a certain level, setting a high number of training iterations, training for a fixed number of iterations and then restarting with the weights from the previous training as initial weights, and stopping when the gradient of the performance index is sufficiently low [17-18]. In this work, training stops when either the performance index goal is met or a large number of iterations is reached, for the simple reason that this met the practical requirement of the task.
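The combined stopping rule can be sketched as a small training loop. The function names, the goal value and the toy error sequence are assumptions for illustration; `step_fn` stands in for one training epoch of whatever algorithm is used.

```python
def train_until(step_fn, goal, max_epochs):
    """Run step_fn() until its returned error is <= goal or max_epochs is reached.

    step_fn performs one training epoch and returns the current performance index.
    Returns (epochs_run, final_error, reason).
    """
    error = float("inf")
    for epoch in range(1, max_epochs + 1):
        error = step_fn()
        if error <= goal:
            return epoch, error, "goal met"
    return max_epochs, error, "max epochs reached"

# Toy stand-in for a training epoch: the error halves on every call
state = {"error": 1.0}

def fake_epoch():
    state["error"] *= 0.5
    return state["error"]

epochs, final_error, reason = train_until(fake_epoch, goal=0.07, max_epochs=1000)
```

Returning the reason alongside the error makes it easy to tell a converged run from one that simply ran out of iterations.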
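2.3.6 Post-Training Analysis
Before concluding the work, the trained network must be analyzed to see whether the training was successful. A powerful way of doing this is to perform a regression between the trained network outputs and the corresponding targets [3]. For that, we fit a linear function of the form
$$a_{q} = m\, t_{q} + c + \varepsilon_{q}$$
where $m$ and $c$ are the slope and offset, respectively, of the linear function, $t_{q}$ is a target value, $a_{q}$ is a trained network output, and $\varepsilon_{q}$ is the residual error of the regression.
The terms in the regression can be computed as follows:
$$\hat{m} = \frac{\sum_{q=1}^{Q} (t_{q} - \bar{t})(a_{q} - \bar{a})}{\sum_{q=1}^{Q} (t_{q} - \bar{t})^{2}}, \qquad \hat{c} = \bar{a} - \hat{m}\, \bar{t} \tag{35}$$
where $Q$ is the number of samples and
$$\bar{a} = \frac{1}{Q} \sum_{q=1}^{Q} a_{q}, \qquad \bar{t} = \frac{1}{Q} \sum_{q=1}^{Q} t_{q}$$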
A plot of this fitting to gauge the performance of the proposed ANNs is discussed in the results section.
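The regression terms and the correlation coefficient can be computed with a short routine such as the one below. It is a sketch of the standard least-squares line and Pearson correlation between targets and network outputs; the sample values are made up.

```python
import math

def regression_fit(targets, outputs):
    """Least-squares line outputs ~ m * targets + c, plus correlation coefficient R."""
    q = len(targets)
    t_mean = sum(targets) / q
    a_mean = sum(outputs) / q
    s_tt = sum((t - t_mean) ** 2 for t in targets)
    s_aa = sum((a - a_mean) ** 2 for a in outputs)
    s_ta = sum((t - t_mean) * (a - a_mean) for t, a in zip(targets, outputs))
    m = s_ta / s_tt                       # slope
    c = a_mean - m * t_mean               # offset
    r = s_ta / math.sqrt(s_tt * s_aa)     # correlation coefficient
    return m, c, r

targets = [1.0, 2.0, 3.0, 4.0]          # made-up target values
outputs = [1.1, 1.9, 3.2, 3.8]          # made-up network outputs
m, c, r = regression_fit(targets, outputs)
```

A slope near 1, an offset near 0 and an R value near 1 together indicate that the network outputs track the targets closely.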
3 Results and Discussions
A MATLAB script implementing the resilient gradient variant of the BP algorithm was written. The code was run for different learning rates and varying numbers of hidden neurons, and the regression coefficient (R) and Mean Square Error (MSE) were compared. As can be seen from Figure 5, the resilient gradient method performs better as the complexity of the neural network increases.
Figure 5 Effect of learning rate and neuron number variation on (a) correlation and (b) mean square error
Next we consider the performance of the best resilient configuration. Figure 6 shows the regression analysis, where the solid line represents the linear regression, the thin dotted line represents the perfect match, and the circles represent the data points. From this figure it can be seen that the match is good, although not perfect. A few points diverge from the regression line. This can arise from an incorrect data point, or because the data point is far from the other training points. The latter is the case here, since the data used are not representative of the whole input space. The scatter plot in Figure 7 shows this clearly.
Adding points that span the whole data space would improve the generalization capability of the proposed neural network. Additionally, the correlation coefficient between the estimated and target values, the R value, was computed.
Generally, the R value varies from -1 to 1; however, it should be close to 1 for prediction applications of the BP algorithm. R = 1 means that all of the data points lie exactly on the regression line, and R = 0 means that they are randomly scattered rather than clustered around the regression line. In this case, as can be seen from Figure 6, the data do not fall exactly on the regression line, but the variation is very small.
Figure 7 scatter plot of the training data and the steam consumption
The MSE values for the best network configuration are given in Figure 8. As can be seen from this figure, the error overshoots at the start of the training and subsequently recedes to a stable lower value. As stated in section 2, the stopping criterion was based on the mean square error between the actual target and the output of the network. Although the neural model reached a relatively low error at around 300 iterations, training was continued to obtain a better correlation coefficient. This tradeoff is considered acceptable since the overall error of the neural model is low, about 0.0674.
Figure 9 gives the output of the final trained neural network when the test data were presented. The variation in the neural output is due to the variation in the daily production rates of the steam-consuming machines.

Table 2 summarizes the final result, i.e. the steam consumption rate of each textile machine. To obtain it, the average production value of the machine is presented to the network as input. The output value is given as a range because of the stochastic nature of the steam loss and of the weight initialization used.
Figure 9 output of trained neural model using test data
Table 2 Final steam consumption estimates
Industrial process | Steam consumption (kg/kg)
Bleaching | 0.6-0.9
Washing | 0.7-1.1
Calendering | 0.8-1.4
Jigger | 1.2-4.5
Sizing | 7.8-9.0
4 Conclusion
In this paper, an effort has been made to estimate industrial steam consumption from the machines' daily production rates and the boiler meter reading. The neural algorithm used is explained in detail, together with the associated practical issues of implementation. Several simulation runs were carried out in MATLAB to arrive at an optimum neural configuration. Finally, real textile factory data were used to train and test this final optimized neural model.
From the simulation results of the MATLAB implementation, it was found that the resilient gradient descent algorithm for an MLP is a valuable tool for function approximation tasks such as energy use prediction. However, practical considerations relating to preprocessing and to the selection of representative input data were found to be prerequisites for implementation.
Moreover, it was found that the number of layers and the number of neurons in those layers have a direct influence on the accuracy of the network. In the experiment, two hidden layers with hundred neurons in those layers resulted in the best performance of the network. However, the number of neurons in a layer could be reduced if more data were available to train the network.
References:
[1] Yu-Rong Zeng, Yi Zeng, Beomjin Choi, Lin Wang. Multifactor-Influenced Energy Consumption Forecasting Using Enhanced Back-propagation Neural Network. Energy, 2017; 127:381-396.
[2] Uzlu E, Kankal M, Akpınar A, Dede T. Estimates of energy consumption in Turkey using neural networks with the teaching-learning-based optimization algorithm. Energy, 2014; 75:295-303.
[3] Martin T. Hagan, Howard B. Demuth. Neural Network Design, 2nd Edition, 2014.
[4] Alaa Ali Hameed, Bekir Karlik, Mohammad Shukri Salman. Back-propagation Algorithm with Variable Adaptive Momentum. Knowledge-Based Systems, 2016; 114:79-87.
[5] C.G. Looney, “Advances in feedforward neural networks: demystifying knowledge acquiring black boxes”, IEEE Transactions on Knowledge and Data Engineering, Volume: 8, Issue: 2,1996
[6] “Energy Audit of Bahir Dar Textile Share Company, Ethiopia”, Bangalore: The Energy and Resources Institute; 53 pp., Project Report No. 2013IB22, 2014
[7] J. Sola, “Importance of input data normalization for the application of neural networks to complex industrial problems”, IEEE Transactions on Nuclear Science, Volume: 44, Issue: 3, 1997
[8] Zhang Q., Sun S. Weighted Data Normalization Based on Eigenvalues for Artificial Neural Network Classification. In: Leung C.S., Lee M., Chan J.H. (eds) Neural Information Processing. ICONIP. Lecture Notes in Computer Science, Springer, 2009; 5863.
[9] N. Murata, S. Yoshizawa & S. Amari, “Network information criterion-determining the number of hidden units for an artificial neural network model”, IEEE Transactions on Neural Networks, Volume: 5, Issue: 6, 1994
[10] Saduf Afzal, Mohd. Arif Wani “Comparative Study of Adaptive Learning Rate with Momentum and Resilient Back Propagation Algorithms for Neural Net Classifier Optimization”
[11] Wahed, M. A. “Adaptive learning rate versus Resilient back propagation for numeral recognition”, Journal of Al-Anbar University for Pure Science, 94-105, 2008
[12] D. Erdogmus. Accurate initialization of neural network weights by backpropagation of the desired response. Proceedings of the International Joint Conference on Neural Networks, 2003
[13] Go J., Baek B., Lee C. Analyzing Weight Distribution of Feedforward Neural Networks and Efficient Weight Initialization. In: Fred A., Caelli T.M., Duin R.P.W., Campilho A.C., de Ridder D. (eds) Structural, Syntactic, and Statistical Pattern Recognition. Lecture Notes in Computer Science, Springer, 2004;3138
[14] Weipeng Cao, Xizhao Wang, Zhong Ming, Jinzhu Gao. A Review on Neural Networks with Random Weights. Neurocomputing, 2017; In Press, Corrected Proof.
[15] E. Barnard, “Optimization for training neural nets,” IEEE Trans. on Neural Networks, vol. 3, no. 2, pp. 232–240, 1992.
[16] T. P. Vogl, J. K. Mangis, A. K. Zigler, W. T. Zink and D. L. Alkon, “Accelerating the convergence of the backpropagation method,” Biological Cybernetics., vol. 59, pp. 256–264, 1988.
[17] Liu, Y., Starzyk, J.A., Zhu, Z. Optimized approximation algorithm in neural networks without overfitting. IEEE Trans. Neural Networks 19 (6), 2008; 983-995.
[18] Masoud Yaghini, Mohammad M. Khoshraftar, Mehdi Fallahi. A hybrid algorithm for artificial neural network training. Engineering Applications of Artificial Intelligence, 2013; 26:293-301.