
AUTHOR(S): 

Fitsum Bekele Tilahun, Ramchandra Bhandari, Menegesha Mamo

 

TITLE

Industrial Process Steam-Consumption Prediction through an Artificial Neural Networks (ANNS) Approach


 

ABSTRACT

Current research has demonstrated the capability of Artificial Neural Networks (ANNs) to learn and generalize when solving complex industrial problems. However, few studies have investigated whether ANNs are also effective in identifying energy use patterns in industrial processes. In this work, a multilayer perceptron (MLP) neural network trained with the resilient gradient descent variant of backpropagation is developed to determine steam consumption patterns as a function of production rate in a textile factory. The model is tested using real operational data: the daily production of each steam-consuming machine and the meter reading of an electrical steam boiler. A randomly selected 85% of these data was used to train the network, and the remaining data were used to test the performance of the trained network. The results showed an acceptable error performance index of about 0.0674 and a correlation coefficient (R) between the estimated and target values of 0.9781. The proposed neural network can therefore serve as a valuable tool for approximating energy use in industrial production processes. Moreover, with more training data, its prediction capability could be further improved.

 

KEYWORDS

Artificial Neural Networks (ANNs), multilayer perceptron (MLP), resilient gradient descent, industrial processes, steam consumption prediction.

 

1 Introduction

A common and important practical application of Artificial Neural Networks (ANNs) is function approximation. Function approximation ranges from determining a realizable feedback function that relates measured outputs to control inputs in control systems, to finding a function that relates past values of an input signal to the output in adaptive filtering. Lately, ANNs have been used extensively to find the underlying functional relations of engineering processes, owing to their ability to solve nonlinear problems with a high degree of accuracy given enough data to learn from. A wide variety of ANNs have been used, with configurations that suit the specific requirements of each application.

A widely used and efficient ANN for function approximation is the MLP (multilayer perceptron) network trained with the BP (backpropagation) learning algorithm. Although research on these ANNs is ongoing, several studies have identified the backpropagation learning algorithm as the leading algorithm among multilayer perceptron training algorithms [1-3].

The accuracy and convergence speed of an MLP usually depend on its architectural configuration and on the choice of tunable parameters during implementation. Previous studies have applied these algorithms to solve real problems. However, hardly any of them have predicted the energy consumption of industrial processes from production data. This paper attempts to fill this gap by implementing one of the most powerful ANNs, the MLP, while considering issues relating to its practical application.

The realization of the perceptron concept by Rosenblatt in 1958 was a milestone for ANNs. The perceptron is an individual processing unit that accepts weighted inputs and produces a rule-based threshold output. An MLP is a feedforward ANN built from these fundamental units, extended by adding layers of neurons and nonlinear transfer functions [2, 3].

 

2 Problem Formulation

Determining energy consumption is perhaps the first crucial element of demand-side energy management (DSM). In addition, knowledge of the load is a prerequisite for integrating renewables, such as solar plants, into industries. One way to obtain it is to measure generation and consumption directly, which is known as an energy audit. This method is costly and may require repeated measurements under different industrial production conditions. Another way is to take the average energy consumption of the industrial processes from the manufacturers' specifications. Although simple, this method is usually of limited practical use, because it does not account for energy use under changing scenarios such as off-nominal operation, changing input parameters in the production processes, and the changing behavior of machines over their life cycle. The last method, which is proposed in this study, is to use ANNs to predict energy use patterns under changing production conditions. This, however, requires substantial data and several model configuration trials for the network to generalize well.

This work is part of a larger project called “Control and Optimization of a Large-Scale Solar Plant in Ethiopian Textile Industry”. The primary aim of the project is the smooth integration of an economically feasible solar plant for the feed water of the existing steam boiler. During the project, determining the thermal energy demand was deemed necessary for the optimal sizing and operation of the solar plant. This was difficult because no funds were available for an energy audit. Nor was it possible to determine the average thermal energy use from specifications, since the factory is very old and no specifications are available. These combined factors led to the idea of predicting energy use patterns from other available related data with an ANN. These related data are the daily production of the steam-consuming machines and the kWh meter readings of a boiler.

The proposed work employs the well-known BP algorithm for a multilayer feedforward neural network and takes into account the issues pertinent to the practical implementation of these ANNs. To this end, a Matlab script was written that addresses these issues and achieved acceptable performance.

 

2.1 Fundamentals of Artificial Neural Networks (ANNs)

ANNs are defined as a collection of processing units that interact with each other through weighted interconnections [3]. The aim of these networks is to replicate, in a simplified manner, the workings of the biological central nervous system. The performance of an ANN depends, in a way that is not clearly defined, on the number, interconnection, and interaction of its constituent units.

These units are known as neurons. Each neuron receives input signals from, and sends output signals to, the units to which it is connected.

A single-neuron model is shown in Figure 1. The neuron output is produced by the transfer function f (also known as the activation function) applied to the weighted input plus the bias, so it depends on the weight (W) and bias (b) associated with each interconnection. During operation, an input presented to the network is propagated through it and transformed into an output by these transfer functions. In an MLP, this process proceeds from neuron to neuron and layer by layer up to the output layer, which produces the final value. A typical MLP neural network is depicted in Figure 2.
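As a minimal sketch of the single-neuron computation a = f(w·p + b), the snippet below assumes the tansig (hyperbolic tangent sigmoid) transfer function that is later used in the hidden layers; the input, weight, and bias values are arbitrary illustrations, not data from this study.

```python
import math

def tansig(n: float) -> float:
    """Hyperbolic tangent sigmoid 2 / (1 + exp(-2n)) - 1, numerically equal to tanh(n)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

def neuron_output(p: list[float], w: list[float], b: float) -> float:
    """Output a = f(w . p + b) of a single neuron with weight vector w and bias b."""
    if len(p) != len(w):
        raise ValueError("input and weight vectors must have the same length")
    n = sum(wi * pi for wi, pi in zip(w, p)) + b  # net input
    return tansig(n)

# Illustrative values only: net input n = 0.2 - 0.3 - 0.2 + 0.1 = -0.2
a = neuron_output(p=[0.5, -1.0, 0.25], w=[0.4, 0.3, -0.8], b=0.1)
```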

An MLP must be trained on examples before it can be used. Training iteratively adjusts the connection weights and biases using known training data: the network outputs are compared with the target examples, and the resulting error measure is known as the performance index (PI). This error is propagated back through the network to adjust the weights and biases until an acceptable PI is achieved.


 

Fig. 1 A single-neuron model

Fig. 2 Three-Layer Network, Abbreviated Notation

The final stage of network implementation involves fixing the weights and biases at their final values from the training stage. The trained network then computes its output directly, giving an estimate for each new input.

 

2.2 The MLP Architecture and the Back Propagation (BP) Algorithm

The three-layer MLP network with the associated notation is depicted in Fig. 2.

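For an MLP, the output of each layer feeds the following layer:

$$\mathbf{a}^{m+1} = \mathbf{f}^{m+1}\left(\mathbf{W}^{m+1}\mathbf{a}^{m} + \mathbf{b}^{m+1}\right), \quad m = 0, 1, \ldots, M-1 \tag{1}$$

where $M$ is the number of layers in the network.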

The neurons in the first layer receive the network inputs:

$$\mathbf{a}^{0} = \mathbf{p} \tag{2}$$

The outputs of the neurons in the final layer are taken as the network outputs:

$$\mathbf{a} = \mathbf{a}^{M} \tag{3}$$
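The layer-by-layer forward computation described above can be sketched in plain Python as follows, assuming tansig (tanh) hidden layers and a linear output layer, the configuration discussed in Section 2.3.1.3; the layer sizes and weight values are hypothetical.

```python
import math

Matrix = list[list[float]]

def forward(p: list[float], weights: list[Matrix], biases: list[list[float]]) -> list[float]:
    """Propagate p through the layers: a^0 = p, a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}).

    Hidden layers use tanh (tansig); the last layer is linear (purelin).
    """
    a = list(p)  # a^0 = p
    last = len(weights) - 1
    for m, (W, b) in enumerate(zip(weights, biases)):
        n = [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i for row, b_i in zip(W, b)]
        a = n if m == last else [math.tanh(n_i) for n_i in n]
    return a  # a = a^M

# Hypothetical 2-input, 2-hidden-neuron, 1-output network
W1 = [[0.5, -0.2], [0.1, 0.3]]
b1 = [0.0, -0.1]
W2 = [[1.0, -1.0]]
b2 = [0.2]
out = forward([1.0, 2.0], [W1, W2], [b1, b2])
```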

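The network is trained on a set of input/target pairs:

$$\{\mathbf{p}_{1}, \mathbf{t}_{1}\}, \{\mathbf{p}_{2}, \mathbf{t}_{2}\}, \ldots, \{\mathbf{p}_{Q}, \mathbf{t}_{Q}\} \tag{4}$$

where $\mathbf{p}_{q}$ is an input to the network and $\mathbf{t}_{q}$ is the corresponding target.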

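The performance of the network is judged by the mean square error

$$F(\mathbf{x}) = E\left[\mathbf{e}^{T}\mathbf{e}\right] = E\left[(\mathbf{t}-\mathbf{a})^{T}(\mathbf{t}-\mathbf{a})\right] \tag{5}$$

where $\mathbf{x}$ is the vector of all network weights and biases. In practice, the expectation is approximated at iteration $k$ by the squared error of the current example, $\hat{F}(\mathbf{x}) = (\mathbf{t}(k)-\mathbf{a}(k))^{T}(\mathbf{t}(k)-\mathbf{a}(k))$.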

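Using the steepest descent algorithm, the weights and biases are updated recursively as

$$w_{i,j}^{m}(k+1) = w_{i,j}^{m}(k) - \alpha \frac{\partial \hat{F}}{\partial w_{i,j}^{m}} \tag{6}$$

$$b_{i}^{m}(k+1) = b_{i}^{m}(k) - \alpha \frac{\partial \hat{F}}{\partial b_{i}^{m}} \tag{7}$$

where $\alpha$ is the learning rate.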

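Because the error is not an explicit function of the weights in the hidden layers, the derivatives are computed with the chain rule. For a function $f$ of an explicit variable $n$ that is itself a function of $w$, the derivative with respect to the implicit variable $w$ is

$$\frac{d f(n(w))}{d w} = \frac{d f(n)}{d n} \times \frac{d n(w)}{d w} \tag{8}$$

Applying this to the performance index gives

$$\frac{\partial \hat{F}}{\partial w_{i,j}^{m}} = \frac{\partial \hat{F}}{\partial n_{i}^{m}} \times \frac{\partial n_{i}^{m}}{\partial w_{i,j}^{m}} \tag{9}$$

$$\frac{\partial \hat{F}}{\partial b_{i}^{m}} = \frac{\partial \hat{F}}{\partial n_{i}^{m}} \times \frac{\partial n_{i}^{m}}{\partial b_{i}^{m}} \tag{10}$$

where $n_{i}^{m}$ is the net input to neuron $i$ in layer $m$.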

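The second factor in equations (9) and (10) is straightforward to compute, because the net input to layer $m$ is a simple function of the weights and bias in that layer:

$$n_{i}^{m} = \sum_{j=1}^{S^{m-1}} w_{i,j}^{m} a_{j}^{m-1} + b_{i}^{m} \tag{11}$$

where $S^{m-1}$ is the number of neurons in layer $m-1$. Thus

$$\frac{\partial n_{i}^{m}}{\partial w_{i,j}^{m}} = a_{j}^{m-1} \tag{12}$$

$$\frac{\partial n_{i}^{m}}{\partial b_{i}^{m}} = 1 \tag{13}$$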

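Let

$$s_{i}^{m} \equiv \frac{\partial \hat{F}}{\partial n_{i}^{m}} \tag{14}$$

where $s_{i}^{m}$ is the sensitivity of $\hat{F}$ to changes in the $i$th element of the net input at layer $m$. With this definition, equations (9) and (10) simplify to

$$\frac{\partial \hat{F}}{\partial w_{i,j}^{m}} = s_{i}^{m} a_{j}^{m-1} \tag{15}$$

$$\frac{\partial \hat{F}}{\partial b_{i}^{m}} = s_{i}^{m} \tag{16}$$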

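Thus the steepest descent algorithm becomes

$$w_{i,j}^{m}(k+1) = w_{i,j}^{m}(k) - \alpha\, s_{i}^{m} a_{j}^{m-1} \tag{17}$$

$$b_{i}^{m}(k+1) = b_{i}^{m}(k) - \alpha\, s_{i}^{m} \tag{18}$$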

 

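In matrix form:

$$\mathbf{W}^{m}(k+1) = \mathbf{W}^{m}(k) - \alpha\, \mathbf{s}^{m}\left(\mathbf{a}^{m-1}\right)^{T} \tag{19}$$

$$\mathbf{b}^{m}(k+1) = \mathbf{b}^{m}(k) - \alpha\, \mathbf{s}^{m} \tag{20}$$

where

$$\mathbf{s}^{m} \equiv \frac{\partial \hat{F}}{\partial \mathbf{n}^{m}} = \begin{bmatrix} \dfrac{\partial \hat{F}}{\partial n_{1}^{m}} & \dfrac{\partial \hat{F}}{\partial n_{2}^{m}} & \cdots & \dfrac{\partial \hat{F}}{\partial n_{S^{m}}^{m}} \end{bmatrix}^{T} \tag{21}$$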

The sensitivities themselves are also computed with the chain rule. Because the sensitivities of each layer are computed from those of the following layer, working backward from the output layer, the algorithm is called backpropagation.

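To propagate the sensitivities backward, define the Jacobian matrix

$$\frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^{m}} = \begin{bmatrix} \dfrac{\partial n_{1}^{m+1}}{\partial n_{1}^{m}} & \cdots & \dfrac{\partial n_{1}^{m+1}}{\partial n_{S^{m}}^{m}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial n_{S^{m+1}}^{m+1}}{\partial n_{1}^{m}} & \cdots & \dfrac{\partial n_{S^{m+1}}^{m+1}}{\partial n_{S^{m}}^{m}} \end{bmatrix} \tag{22}$$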

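The $i, j$ element of this matrix is

$$\frac{\partial n_{i}^{m+1}}{\partial n_{j}^{m}} = \frac{\partial \left(\sum_{l=1}^{S^{m}} w_{i,l}^{m+1} a_{l}^{m} + b_{i}^{m+1}\right)}{\partial n_{j}^{m}} = w_{i,j}^{m+1} \frac{\partial a_{j}^{m}}{\partial n_{j}^{m}} = w_{i,j}^{m+1}\, \dot{f}^{m}(n_{j}^{m}) \tag{23}$$

where

$$\dot{f}^{m}(n_{j}^{m}) = \frac{\partial f^{m}(n_{j}^{m})}{\partial n_{j}^{m}} \tag{24}$$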

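Thus, the Jacobian matrix can be written as

$$\frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^{m}} = \mathbf{W}^{m+1}\, \dot{\mathbf{F}}^{m}(\mathbf{n}^{m}) \tag{25}$$

where

$$\dot{\mathbf{F}}^{m}(\mathbf{n}^{m}) = \operatorname{diag}\left(\dot{f}^{m}(n_{1}^{m}), \dot{f}^{m}(n_{2}^{m}), \ldots, \dot{f}^{m}(n_{S^{m}}^{m})\right) \tag{26}$$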

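Finally, applying the chain rule in matrix form, the sensitivities are

$$\mathbf{s}^{m} = \frac{\partial \hat{F}}{\partial \mathbf{n}^{m}} = \left(\frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^{m}}\right)^{T} \frac{\partial \hat{F}}{\partial \mathbf{n}^{m+1}} = \dot{\mathbf{F}}^{m}(\mathbf{n}^{m}) \left(\mathbf{W}^{m+1}\right)^{T} \mathbf{s}^{m+1} \tag{27}$$

These sensitivities are propagated backward layer by layer, from the output layer to the first layer,

$$\mathbf{s}^{M} \rightarrow \mathbf{s}^{M-1} \rightarrow \cdots \rightarrow \mathbf{s}^{2} \rightarrow \mathbf{s}^{1} \tag{28}$$

starting from the output-layer sensitivity $\mathbf{s}^{M} = -2\,\dot{\mathbf{F}}^{M}(\mathbf{n}^{M})(\mathbf{t}-\mathbf{a})$, which follows from differentiating $\hat{F}$ with respect to $\mathbf{n}^{M}$.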
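A minimal sketch of one backpropagation pass for a two-layer network (tansig hidden layer, linear output layer), assuming the single-example squared-error performance index. It computes the output sensitivity, propagates it back to the hidden layer, forms the gradients s^m (a^{m-1})^T and s^m, and checks one weight gradient against a central finite difference. Network sizes and values are hypothetical.

```python
import math

def matvec(W, x):
    """Matrix-vector product for nested lists."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def forward(p, W1, b1, W2, b2):
    """Two-layer forward pass: tansig (tanh) hidden layer, linear output layer."""
    a1 = [math.tanh(v + b) for v, b in zip(matvec(W1, p), b1)]
    a2 = [v + b for v, b in zip(matvec(W2, a1), b2)]
    return a1, a2

def loss(p, t, W1, b1, W2, b2):
    """Squared-error performance index F = (t - a)^T (t - a) for one example."""
    _, a2 = forward(p, W1, b1, W2, b2)
    return sum((ti - ai) ** 2 for ti, ai in zip(t, a2))

def backprop(p, t, W1, b1, W2, b2):
    """Return (dF/dW1, dF/db1, dF/dW2, dF/db2) via backpropagated sensitivities."""
    a1, a2 = forward(p, W1, b1, W2, b2)
    # Output layer: s^2 = -2 F'(n^2) (t - a); the linear transfer function has derivative 1
    s2 = [-2.0 * (ti - ai) for ti, ai in zip(t, a2)]
    # Hidden layer: s^1 = F'(n^1) (W^2)^T s^2, with tanh'(n) = 1 - a^2
    s1 = [(1.0 - a1[j] ** 2) * sum(W2[i][j] * s2[i] for i in range(len(s2)))
          for j in range(len(a1))]
    # Gradients: dF/dW^m = s^m (a^{m-1})^T and dF/db^m = s^m
    gW1 = [[s * pj for pj in p] for s in s1]
    gW2 = [[s * aj for aj in a1] for s in s2]
    return gW1, s1, gW2, s2

# Hypothetical 2-input, 2-hidden-neuron, 1-output network and one training pair
p, t = [0.5, -1.0], [0.3]
W1, b1 = [[0.2, -0.4], [0.7, 0.1]], [0.05, -0.3]
W2, b2 = [[0.6, -0.5]], [0.1]
gW1, gb1, gW2, gb2 = backprop(p, t, W1, b1, W2, b2)

# Central finite-difference check of dF/dw^1_{1,2} (row 0, column 1 of W1)
h = 1e-6
W1_plus = [row[:] for row in W1]
W1_plus[0][1] += h
W1_minus = [row[:] for row in W1]
W1_minus[0][1] -= h
numeric = (loss(p, t, W1_plus, b1, W2, b2) - loss(p, t, W1_minus, b1, W2, b2)) / (2 * h)
```

The finite-difference check is a common way to validate a hand-written backpropagation routine before using it for training.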

 

2.2.1 Resilient Gradient algorithm

Although the BP algorithm is regarded as the best among the MLP training algorithms [1], in its basic form it has two major limitations: long learning times and the possibility of getting trapped in local minima [1, 3-5]. Therefore, a variant of the basic BP algorithm known as the resilient gradient method (Rprop), which is reported to alleviate these drawbacks [4, 5], is used.

In this algorithm, only the sign of the partial derivative, not its magnitude, determines the direction of each weight change, and each weight has its own update value. The implementation follows these rules:

a) If the partial derivative with respect to a weight has the same sign in two consecutive iterations, the update value of that weight is increased by a factor η+ > 1.

b) If the sign of the partial derivative changes between two consecutive iterations, the update value is decreased by a factor 0 < η- < 1.

c) If either of the two consecutive derivatives is zero, the update value remains the same.

d) The weight is then changed by its update value in the direction opposite to the sign of the current derivative. As a result, a weight that keeps changing in the same direction over several iterations takes increasingly larger steps, whereas its update value shrinks when the derivative changes sign.
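The rules above can be sketched as follows. This is a simplified variant without weight backtracking (often called Rprop-) and not necessarily the exact variant used in this work; η+ = 1.2 and η- = 0.5 are the values recommended in [4], while the initial, minimum, and maximum update values are assumptions. It minimizes a hypothetical quadratic whose gradient is known in closed form.

```python
def sign(x: float) -> int:
    """Return +1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)

def rprop_minimize(grad, w, iterations=100, eta_plus=1.2, eta_minus=0.5,
                   delta0=0.1, delta_min=1e-6, delta_max=50.0):
    """Rprop- (no weight backtracking): adapt a per-weight step size from derivative signs."""
    w = list(w)
    delta = [delta0] * len(w)     # per-weight update values
    prev_g = [0.0] * len(w)       # derivatives from the previous iteration
    for _ in range(iterations):
        g = grad(w)
        for i in range(len(w)):
            product = prev_g[i] * g[i]
            if product > 0:       # rule a): same sign, enlarge the update value
                delta[i] = min(delta[i] * eta_plus, delta_max)
            elif product < 0:     # rule b): sign change, shrink the update value
                delta[i] = max(delta[i] * eta_minus, delta_min)
            # rule c): product == 0 leaves delta[i] unchanged
            w[i] -= sign(g[i]) * delta[i]   # rule d): step against the derivative's sign
            prev_g[i] = g[i]
    return w

# Hypothetical test function F(w) = sum_i (w_i - c_i)^2 with gradient 2 (w_i - c_i)
target = [3.0, -2.0]
w_opt = rprop_minimize(lambda w: [2.0 * (wi - ci) for wi, ci in zip(w, target)], [0.0, 0.0])
```

Because only the sign of the derivative is used, the step size does not shrink merely because the gradient is small, which is the property that motivates Rprop for sigmoid networks with very small derivatives [4].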

 

2.3 Implementation of BP Algorithm for Steam-Consumption Prediction

The diagram in Fig. 3 depicts the ANN training procedure followed. The procedure is iterative. It starts with data collection and preprocessing, which make network training more efficient; at this first step the data were also partitioned into training and testing sets. Next, a suitable network type and architecture (e.g., the number of hidden layers and the number of neurons in each) were selected. Then an appropriate training algorithm was chosen from the many available. Finally, once the ANN was trained, its performance was analyzed. This last stage dealt with practical issues related to the data, the network architecture, and the training algorithm. The whole procedure was repeated until acceptable performance was achieved.

 

2.3.1 Pre-Training Steps

The pre-training steps comprise three tasks: data collection, data preprocessing, and choice of network type and architecture.

 

2.3.1.1 Data Collection

The input data, i.e., the actual daily production of all steam-consuming machines, were collected for the year 2016 at the Bahir Dar textile factory. Part of these data, for August 2016, is shown in Table 1. The daily total steam production of an electrical boiler (Collins Walker) was used as the output data; the specification of this existing electrical steam boiler is given in Table 2. The boiler meter readings were recorded for the same year and days as the input data, and the last column of Table 1 shows these readings for August 2016.

 

2.3.1.2 Data pre-processing

The aim of this step is to prepare the ground for better network training. Although several data preprocessing steps exist in the literature, this work used feature extraction, normalization, and handling of missing data.

The available data for the ANN output are the meter readings of an electrical boiler, which show the total electrical energy (kWh) consumed by the boiler. To make these data usable, they were converted into the total steam delivered at the steam-consuming machines, as follows:

The total steam delivered at the steam-consuming machines is given by

$$m_{sd} = m_{sp} - m_{sl} \tag{29}$$

where $m_{sd}$ is the steam delivered, $m_{sp}$ is the total steam produced by the boiler, and $m_{sl}$ is the steam transmission loss.

The total daily steam produced by the boiler can be determined from

$$m_{sp} = \frac{E_{d}}{P_{r}}\,\dot{m}_{b} \tag{30}$$

where $E_{d}$ is the daily electrical energy consumed by the boiler (kWh), $P_{r}$ is the rated boiler power (kW), and $\dot{m}_{b}$ is the corresponding rated steam output (kg/h), both taken from the boiler specification in Table 2.

The steam loss can range from 5% to 20% of the steam produced [6]. In the current model, this loss was represented stochastically by a uniform distribution between these minimum and maximum values, to account for the uncertainty in quantifying the steam loss across the various steam distribution networks.
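A minimal sketch of this conversion for one day, using the rated values of Table 2 and a loss fraction drawn uniformly between 5% and 20%. The function name and the 10,000 kWh daily energy are hypothetical, and the daily energy is taken as a given input rather than derived from the cumulative meter readings.

```python
import random

RATED_STEAM_KG_PER_H = 3348.0   # Table 2: rated steam output per boiler
RATED_POWER_KW = 2106.0         # Table 2: power consumption per boiler

def steam_delivered(daily_energy_kwh: float, rng: random.Random,
                    loss_min: float = 0.05, loss_max: float = 0.20) -> float:
    """Delivered steam (kg) = produced steam minus a uniformly random transmission loss."""
    # kWh / kW gives equivalent full-load hours; hours * kg/h gives kg of steam
    produced = daily_energy_kwh / RATED_POWER_KW * RATED_STEAM_KG_PER_H
    loss_fraction = rng.uniform(loss_min, loss_max)
    return produced * (1.0 - loss_fraction)

rng = random.Random(0)  # fixed seed for reproducibility
samples = [steam_delivered(10_000.0, rng) for _ in range(1000)]
```

Repeated draws give a spread of delivered-steam targets for the same meter reading, which is one source of the ranges reported later in Table 3.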

It is reported in [7] that rescaling (normalizing) the training data improves the learning and convergence of a network. The normalization used in this work maps each element of the data to the range [-1, 1] with the transformation

$$p^{n} = \frac{2\left(p - p^{\min}\right)}{p^{\max} - p^{\min}} - 1 \tag{31}$$

where $p^{\min}$ is the minimum and $p^{\max}$ is the maximum of the corresponding element over the input vectors in the data set.

In practice, this normalization shifts and rescales each input so that all inputs vary over the same range. The data were also shuffled, so that the network does not learn similar sets of data at the expense of others.
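A minimal sketch of the min-max transformation and its inverse (the inverse maps normalized network outputs back to physical units), applied column by column. It assumes every column has distinct minimum and maximum values; the sample rows are the first three Bleach and Wash production values from Table 1.

```python
def minmax_fit(rows: list[list[float]]) -> tuple[list[float], list[float]]:
    """Column-wise minimum and maximum of a data set (rows = samples)."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def normalize(rows, p_min, p_max):
    """Map each element to [-1, 1]: p_n = 2 (p - p_min) / (p_max - p_min) - 1."""
    return [[2.0 * (v - lo) / (hi - lo) - 1.0 for v, lo, hi in zip(row, p_min, p_max)]
            for row in rows]

def denormalize(rows, p_min, p_max):
    """Inverse mapping: p = (p_n + 1) (p_max - p_min) / 2 + p_min."""
    return [[(v + 1.0) * (hi - lo) / 2.0 + lo for v, lo, hi in zip(row, p_min, p_max)]
            for row in rows]

data = [[19700.0, 27294.0], [17276.0, 12161.0], [15500.0, 23199.0]]
p_min, p_max = minmax_fit(data)
scaled = normalize(data, p_min, p_max)
restored = denormalize(scaled, p_min, p_max)
```

The minimum and maximum should be computed on the training set and reused unchanged for the test set and for new inputs.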

Because the data set is limited, records with missing values could not simply be discarded. Instead, two strategies were used, depending on whether the missing value was an input or an output. When an input value was missing, it was replaced with the average value of that input, and a flag (either 1 or 0) was set to mark whether the value was missing. When an output value was missing, the performance index was modified so that this data point was skipped in the error calculation, nullifying its contribution to the learning process.
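The two missing-data strategies can be sketched as follows, with None marking a missing value. Appending one flag per input column as an extra input element, and the helper names, are assumptions for illustration.

```python
from typing import Optional

def impute_with_flags(rows: list[list[Optional[float]]]) -> list[list[float]]:
    """Replace missing inputs with the column mean and append one 0/1 flag per column (1 = missing).

    Assumes every column has at least one non-missing value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        means.append(sum(present) / len(present))
    out = []
    for r in rows:
        values = [means[j] if r[j] is None else r[j] for j in range(n_cols)]
        flags = [1.0 if r[j] is None else 0.0 for j in range(n_cols)]
        out.append(values + flags)
    return out

def masked_mse(outputs: list[float], targets: list[Optional[float]]) -> float:
    """Mean squared error that skips samples whose target is missing."""
    errors = [(t - a) ** 2 for a, t in zip(outputs, targets) if t is not None]
    return sum(errors) / len(errors)

rows = [[1.0, None], [3.0, 4.0], [None, 8.0]]
filled = impute_with_flags(rows)
err = masked_mse([1.0, 2.0, 3.0], [1.5, None, 2.0])
```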

Finally, the collected data were divided into two sets: training and testing. The training set made up 85% of the full data set and the testing set the remaining 15%. Care was taken to make each set representative of the full data set, i.e., to ensure that the test set covers the same region of the input space as the training set. To this end, the members of each set were selected randomly from the full data set.
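A minimal sketch of a random 85%/15% split by shuffling sample indices; the seed and the use of Python's random module are illustrative, and the check that both sets cover the same input region is not automated here.

```python
import random

def train_test_split(samples: list, train_fraction: float = 0.85, seed: int = 0):
    """Shuffle sample indices and split them into training and testing subsets."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_train = round(train_fraction * len(samples))
    train = [samples[i] for i in idx[:n_train]]
    test = [samples[i] for i in idx[n_train:]]
    return train, test

days = list(range(1, 32))  # e.g. the 31 days of August 2016 listed in Table 1
train, test = train_test_split(days)
```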

 
 

Fig. 3 Backpropagation algorithm implementation

Table 2 Collins Walker boiler specification

| Specification | Value |
|---|---|
| Name and type | Collins Walker, electrical boiler |
| Permissible and working pressure | 13 bar, 10.3 bar |
| Design and maximum steam temperature | 190 °C, 184 °C |
| Rated steam output | 3348 kg/h per boiler |
| Power consumption | 2106 kW per boiler |

Table 1 Input and output data for training (production rate vs boiler meter reading)

| Date | Bleach (m²) | Wash (m²) | Calendar (m²) | Size (m²) | Jigger (m²) | Boiler meter (kWh) |
|---|---|---|---|---|---|---|
| 1-Aug-16 | 19700 | 27294 | 7413 | 13443 | 10600 | 38934 |
| 2-Aug-16 | 17276 | 12161 | 18325 | 14820 | 13090 | 38939 |
| 3-Aug-16 | 15500 | 23199 | 0 | 11154 | 11900 | 38942 |
| 4-Aug-16 | 10484 | 8765 | 4088 | 15187 | 8600 | 38947 |
| 5-Aug-16 | 13198 | 17699 | 15944 | 15275 | 6730 | 38950 |
| 6-Aug-16 | 22546 | 12974 | 19427 | 13622 | 7300 | 38954 |
| 7-Aug-16 | 0 | 4326 | 0 | 3654 | 800 | 0 |
| 8-Aug-16 | 0 | 3100 | 884 | 4163 | 3800 | 38454 |
| 9-Aug-16 | 8400 | 0 | 0 | 3760 | 4450 | 38960 |
| 10-Aug-16 | 1300 | 7923 | 0 | 8852 | 11950 | 38961 |
| 11-Aug-16 | 7300 | 13267 | 5500 | 13828 | 4800 | 38964 |
| 12-Aug-16 | 11950 | 18240 | 0 | 15654 | 14600 | 38968 |
| 13-Aug-16 | 28695 | 3700 | 3601 | 16306 | 72466 | 38972 |
| 14-Aug-16 | 16200 | 19671 | 0 | 15392 | 7300 | 0 |
| 15-Aug-16 | 13600 | 2986 | 7484 | 14709 | 17100 | 38980 |
| 16-Aug-16 | 4460 | 24151 | 0 | 14362 | 16000 | 38983 |
| 17-Aug-16 | 15137 | 20968 | 22084 | 14027 | 11900 | 38987 |
| 18-Aug-16 | 22000 | 13246 | 0 | 13042 | 11200 | 38992 |
| 19-Aug-16 | 23960 | 28312 | 0 | 13533 | 13200 | 38995 |
| 20-Aug-16 | 13620 | 19793 | 10306 | 12531 | 2000 | 39000 |
| 21-Aug-16 | 4200 | 3167 | 0 | 4538 | 500 | 0 |
| 22-Aug-16 | 9000 | 5730 | 10935 | 3769 | 4550 | 39005 |
| 23-Aug-16 | 9500 | 20940 | 26936 | 9029 | 5900 | 39007 |
| 24-Aug-16 | 8200 | 19855 | 17364 | 12430 | 12250 | 39011 |
| 25-Aug-16 | 17500 | 23378 | 192 | 13260 | 6500 | 39014 |
| 26-Aug-16 | 9880 | 19764 | 0 | 14060 | 9700 | 39018 |
| 27-Aug-16 | 0 | 7300 | 10944 | 12865 | 2100 | 39022 |
| 28-Aug-16 | 5150 | 3755 | 3760 | 9873 | 6300 | 0 |
| 29-Aug-16 | 0 | 0 | 1698 | 3017 | 0 | 39027 |
| 30-Aug-16 | 0 | 3900 | 0 | 0 | 0 | 0 |
| 31-Aug-16 | 0 | 0 | 0 | 0 | 0 | 0 |

2.3.1.3 Choice of Network Architecture

The universally accepted network architecture for fitting problems is the multilayer perceptron [1-3]. As recommended in [3], this standard configuration uses the tansig (hyperbolic tangent sigmoid) transfer function in the hidden layers and a linear transfer function in the output layer. Tansig is preferred over the log-sigmoid (logsig) function in the hidden layers because its outputs, which are the inputs to the next layer, are centered near zero, whereas logsig outputs are always positive.
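The difference between the two sigmoid functions can be checked numerically with a short sketch. It uses the standard tansig and logsig definitions (function names follow the conventions of [3]; tansig equals tanh) over a purely illustrative symmetric input grid.

```python
import math

def tansig(n: float) -> float:
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0   # range (-1, 1), equals tanh(n)

def logsig(n: float) -> float:
    return 1.0 / (1.0 + math.exp(-n))                # range (0, 1), always positive

def purelin(n: float) -> float:
    return n                                         # linear output-layer transfer function

grid = [k / 10.0 for k in range(-30, 31)]            # symmetric inputs in [-3, 3]
mean_tansig = sum(tansig(n) for n in grid) / len(grid)
mean_logsig = sum(logsig(n) for n in grid) / len(grid)
```

Over symmetric inputs the tansig outputs average to zero, while the logsig outputs average to 0.5, so every input to the next layer would carry the same positive offset.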

The choice of the optimum number of hidden units depends on many factors whose interactions are not easy to understand. These factors include the amount of training data, the numbers of input and output units, the required level of generalization, the type of transfer function, and the training algorithm [8]. The number of hidden units involves a trade-off: too few lead to under-fitting, while too many result in over-fitting and a slow learning process. However, more than two hidden layers are rarely needed for a standard function approximation problem [3].

To fix the number of neurons in the hidden layer, different authors suggest rules of thumb based on their experience. In [9] it is given as

$$n = \sqrt{n_{i} + n_{o}} + a \tag{32}$$

where $n$ is the number of hidden neurons, $n_{i}$ and $n_{o}$ are the numbers of input and output neurons, and $a$ is a constant between 1 and 10.
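For the five production inputs (Bleach, Wash, Calendar, Size, Jigger) and the single steam output of this study, the rule of thumb from [9] gives the candidate range below; rounding to the nearest integer is an assumption.

```python
import math

def hidden_neuron_candidates(n_inputs: int, n_outputs: int) -> list[int]:
    """Rule of thumb n = sqrt(n_i + n_o) + a for a = 1, ..., 10, rounded to integers."""
    base = math.sqrt(n_inputs + n_outputs)
    return [round(base + a) for a in range(1, 11)]

candidates = hidden_neuron_candidates(n_inputs=5, n_outputs=1)  # sqrt(6) is about 2.45
```

The resulting range of 3 to 12 neurons contains the ten neurons per hidden layer used in this work.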

Another work [10] suggested a rule that determines the number of hidden neurons from the number of training samples and the numbers of input and output neurons.

In the authors' view, the best approach is to perform multiple runs over a range of numbers of hidden layers and neurons per layer and to observe the network performance. For the current work, two hidden layers with ten neurons in each layer met the set performance criterion.

 

2.3.2 Training the Network

2.3.2.1 Weight Initialization

ANN weights should be initialized with small random values. Because the BP algorithm treats all weights in the same way, initializing them to identical values would make all units learn the same thing. In addition, small random values keep the net inputs of the sigmoid neurons in the region where the derivative of the transfer function, and hence the weight update, is largest [11]. In this work, care was taken to make the performance of the final trained network independent of the choice of initial weights: the network was trained several times with different initial weights, and these runs resulted in similar performance.
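A small sketch of why identical initial weights are avoided. In a hypothetical 1-input network with two tanh hidden neurons and a linear output, equal initial weights give both hidden neurons identical gradients, so they can never become different, whereas small random weights break this symmetry. The uniform range [-0.5, 0.5] is an assumption, not the Nguyen-Widrow scheme of [11].

```python
import math
import random

def hidden_weight_grads(w1, b1, w2, b2, p, t):
    """dF/dw1 for each hidden neuron of a 1-input, 2-hidden (tanh), 1-output (linear) net."""
    a1 = [math.tanh(w * p + b) for w, b in zip(w1, b1)]
    a2 = sum(v * a for v, a in zip(w2, a1)) + b2
    s2 = -2.0 * (t - a2)                                     # output sensitivity
    s1 = [(1.0 - a ** 2) * v * s2 for a, v in zip(a1, w2)]   # hidden sensitivities
    return [s * p for s in s1]                               # dF/dw1_i = s1_i * p

p, t = 0.8, 0.5
same = hidden_weight_grads([0.3, 0.3], [0.1, 0.1], [0.2, 0.2], 0.0, p, t)

rng = random.Random(1)
w1 = [rng.uniform(-0.5, 0.5) for _ in range(2)]
b1 = [rng.uniform(-0.5, 0.5) for _ in range(2)]
w2 = [rng.uniform(-0.5, 0.5) for _ in range(2)]
diff = hidden_weight_grads(w1, b1, w2, 0.0, p, t)
```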

 

2.3.2.2 Choice of Training Algorithm

For multilayer networks performing function approximation, the resilient gradient descent training algorithm is reported to minimize the error function reliably with a relatively fast convergence rate [4, 12, 14]. In this work, this algorithm was tested to check its suitability for the task at hand.

 

2.3.2.3 Stopping Criteria

For most practical neural networks, the training error never converges exactly to zero, so other criteria are used to decide when to stop training. Several methods are reported in the literature, such as stopping when the performance index reaches a certain level, setting a large maximum number of training iterations, training for a fixed number of iterations and then restarting with the final weights of the previous run as initial weights, and stopping when the gradient of the performance index is sufficiently small [14, 15]. In this work, training stops when either the performance goal is met or a large maximum number of iterations is reached, simply because this met the practical requirements of the task.
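The combined stopping rule can be sketched as a generic training loop that stops when the performance goal is met or the epoch limit is reached. The names, the goal value, and the toy one-parameter model (fitting y = w x by plain gradient descent) are hypothetical.

```python
def train_until(step, performance, goal=1e-3, max_epochs=1000):
    """Run training epochs until performance() <= goal or max_epochs is reached.

    step() performs one training epoch; performance() returns the current MSE.
    Returns (epochs_run, final_performance, reason).
    """
    perf = performance()
    for epoch in range(1, max_epochs + 1):
        step()
        perf = performance()
        if perf <= goal:
            return epoch, perf, "goal met"
    return max_epochs, perf, "max epochs reached"

# Hypothetical toy problem: fit y = w * x by gradient descent
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
state = {"w": 0.0}

def mse():
    return sum((y - state["w"] * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gd_step(lr=0.05):
    grad = sum(-2.0 * x * (y - state["w"] * x) for x, y in zip(xs, ys)) / len(xs)
    state["w"] -= lr * grad

epochs, final_mse, reason = train_until(gd_step, mse, goal=1e-6, max_epochs=1000)
```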

 

2.3.3 Post-Training Analysis

Before concluding the work, the trained network must be analyzed to see whether training was successful. A powerful method for doing this is to fit a regression line between the trained network outputs and the corresponding targets [3]. For that, a linear function of the form

$$a_{q} = m\, t_{q} + c + \varepsilon_{q} \tag{34}$$

is fitted, where $m$ and $c$ are the slope and offset of the linear function, $t_{q}$ is a target value, $a_{q}$ is a trained network output, and $\varepsilon_{q}$ is the residual error of the regression.

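The terms in the regression can be computed as follows:

$$\hat{m} = \frac{\sum_{q=1}^{Q} (t_{q} - \bar{t})(a_{q} - \bar{a})}{\sum_{q=1}^{Q} (t_{q} - \bar{t})^{2}}, \qquad \hat{c} = \bar{a} - \hat{m}\,\bar{t} \tag{35}$$

where $Q$ is the number of data points and

$$\bar{t} = \frac{1}{Q}\sum_{q=1}^{Q} t_{q}, \qquad \bar{a} = \frac{1}{Q}\sum_{q=1}^{Q} a_{q}$$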
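A minimal sketch of this post-training regression: it computes the least-squares slope and offset of a_q = m t_q + c + ε_q together with the correlation coefficient R between targets and outputs. The sample vectors are hypothetical.

```python
import math

def regression_analysis(targets: list[float], outputs: list[float]) -> tuple[float, float, float]:
    """Least-squares fit outputs = m * targets + c, plus the correlation coefficient R."""
    q = len(targets)
    t_bar = sum(targets) / q
    a_bar = sum(outputs) / q
    s_ta = sum((t - t_bar) * (a - a_bar) for t, a in zip(targets, outputs))
    s_tt = sum((t - t_bar) ** 2 for t in targets)
    s_aa = sum((a - a_bar) ** 2 for a in outputs)
    m = s_ta / s_tt
    c = a_bar - m * t_bar
    r = s_ta / math.sqrt(s_tt * s_aa)
    return m, c, r

m, c, r = regression_analysis([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Here m is about 0.94, c about 0.15, and R about 0.991. An R near 1 indicates a close positive linear match; R = -1 would indicate a perfect negative linear relationship, and R near 0 no linear relationship.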

A plot of this fit, used to gauge the performance of the proposed ANN, is discussed in the results section.

 

3 Results and Discussions

A Matlab script implementing the resilient gradient variant of the BP algorithm was written. The code was run for different learning rates and numbers of hidden neurons, and the resulting correlation coefficient (R) and mean square error (MSE) were compared. As can be seen from Fig. 4, the performance of the resilient gradient method improves as the complexity of the neural network increases.

Fig. 4 R and MSE values for learning rates of 0.1, 0.15 and 0.2 and different numbers of hidden neurons

Next, the performance of the best resilient configuration is considered. Fig. 5 shows the regression analysis, where the solid line represents the linear regression, the thin dotted line represents the perfect match, and the circles represent the data points. The match is good, although not perfect: a few points diverge from the regression line. This may arise from an incorrect data point, or from a data point that lies far from the other training points. The latter is the case here, since the data used do not represent the whole input space, as the scatter plot of the input data in Figure 6 clearly shows.


Fig. 5 Regression results for the best BP network

Fig. 6 Scatter graph of the input data

Adding data points that span the whole input space should improve the generalization capability of the proposed neural network. In addition, the correlation coefficient between the estimated and target values, i.e., the R value, was computed.

The R value ranges from -1 to 1 and should be close to 1 for prediction applications of the BP algorithm. R = 1 means that all data points lie exactly on a regression line with positive slope, R = -1 means that they lie exactly on a line with negative slope, and R close to 0 means that there is no linear relationship between the outputs and the targets. In this case, as can be seen from Figure 4, the data do not fall exactly on the regression line, but the deviation is very small.

The MSE and the plot of the MLP output versus the target values for the best network configuration are given in Figures 7 and 8, respectively.

 

 


Fig. 7 MSE of the best MLP

 

Table 3 gives the final result, i.e., the steam consumption rate of each textile machine, obtained by presenting the average production value of each machine to the network as input. The output is given as a range because of the stochastic steam-loss model and the random weight initialization.

 
 

 

 

 

 


Fig. 8 Trained vs target output

Table 3 Textile machine steam consumption rate

| Textile machine | Steam consumption (kg/kg) |
|---|---|
| Bleaching | 0.6-0.9 |
| Washing | 0.7-1.1 |
| Calendaring | 0.8-1.4 |
| Jigger | 1.2-4.5 |
| Sizing | 7.8-9 |

 

  

 

 

4 Conclusion

The results of the Matlab implementation show that the resilient gradient descent algorithm for an MLP is a valuable tool for function approximation tasks such as energy use prediction. However, practical considerations relating to preprocessing and to the selection of representative input data were found to be prerequisites for its implementation.

Moreover, the number of layers and the number of neurons in those layers were found to directly influence the accuracy of the network. In the experiments, two hidden layers with a hundred neurons in those layers resulted in the best performance of the network. However, the number of neurons in a layer could be reduced if more data become available to train the network.

 

REFERENCES

[1] M. Alsmadi, K. Omar and S. Noah, “Back propagation algorithm: the best algorithm among the multi-layer perceptron algorithm,” IJCSNS International Journal of Computer Science and Network Security, vol. 9, no. 4, pp. 378-383, 2009.

[2] K. M. Hornik, M. Stinchcombe and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359-366, 1989.

[3] M. T. Hagan and H. B. Demuth, Neural Network Design, 2nd ed., eBook.

[4] M. Riedmiller and H. Braun, “A direct adaptive method for faster backpropagation learning: the RPROP algorithm,” IEEE International Conference on Neural Networks, 1993.

[5] C. G. Looney, “Advances in feedforward neural networks: demystifying knowledge acquiring black boxes,” IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 2, 1996.

[6] “Energy Audit of Bahir Dar Textile Share Company, Ethiopia,” The Energy and Resources Institute, Bangalore, Project Report No. 2013IB22, 53 pp., 2014.

[7] J. Sola, “Importance of input data normalization for the application of neural networks to complex industrial problems,” IEEE Transactions on Nuclear Science, vol. 44, no. 3, 1997.

[8] N. Murata, S. Yoshizawa and S. Amari, “Network information criterion: determining the number of hidden units for an artificial neural network model,” IEEE Transactions on Neural Networks, vol. 5, no. 6, 1994.

[9] S. Afzal and M. A. Wani, “Comparative study of adaptive learning rate with momentum and resilient back propagation algorithms for neural net classifier optimization.”

[10] M. A. Wahed, “Adaptive learning rate versus resilient back propagation for numeral recognition,” Journal of Al-Anbar University for Pure Science, pp. 94-105, 2008.

[11] D. Nguyen and B. Widrow, “Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights,” Proceedings of the IJCNN, vol. 3, pp. 21-26, July 1990.

[12] E. Barnard, “Optimization for training neural nets,” IEEE Transactions on Neural Networks, vol. 3, no. 2, pp. 232-240, 1992.

[13] T. P. Vogl, J. K. Mangis, A. K. Zigler, W. T. Zink and D. L. Alkon, “Accelerating the convergence of the backpropagation method,” Biological Cybernetics, vol. 59, pp. 256-264, 1988.

[14] W. S. Sarle, “Stopped training and other remedies for overfitting,” Proceedings of the 27th Symposium on the Interface, 1995.

[15] C. Wang, S. S. Venkatesh and J. S. Judd, “Optimal stopping and effective machine complexity in learning,” in Advances in Neural Information Processing Systems, J. D. Cowan, G. Tesauro and J. Alspector, Eds., vol. 6, pp. 303-310, 1994.

Cite this paper

Fitsum Bekele Tilahun, Ramchandra Bhandari, Menegesha Mamo. (2017) Industrial Process Steam-Consumption Prediction through an Artificial Neural Networks (ANNS) Approach. International Journal of Mechanical Engineering, 2, 72-81

 

Copyright © 2017 Author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0