 VARIANCE-TO-MEAN RATIO IN ICT TRAFFIC

_______________________________________

* West Pomeranian University of Technology.

Przemysław WŁODARSKI*

In this paper, ICT network traffic is analysed from the point of view of the variance-to-mean ratio. The analysis is based on real traffic captured at the West Pomeranian University of Technology. The results of the analysis of traffic averaged over different time scales show the relation between the variance-to-mean ratio and the level of self-similarity, which in turn affects the quality of service in ICT networks.

KEYWORDS: Variance-to-mean ratio, self-similarity, ICT network traffic

1. INTRODUCTION

Network traffic analysis is crucial for anomaly identification [1, 2], detection of denial-of-service attacks [3], and traffic modeling and prediction [4, 5].

As far as attacks are concerned, the most popular approach is to use many dispersed hosts, or even whole networks, to aggregate requests to a server in order to suspend a particular service or host (DDoS, Distributed Denial of Service). Most symptoms can be observed by analysing network performance, which is usually degraded as a result of high bandwidth consumption. However, flood attacks are not the only effective way of blocking services or servers. Many other methods exhibit distinctive patterns in network traffic variability, which can be captured by variance-to-mean ratio (VMR) analysis.

Another application of VMR in network traffic analysis is the estimation of the Hurst exponent, which is associated with self-similarity and long-range dependence, a common phenomenon in ICT traffic [6, 7, 8]. One of the main properties of self-similarity is that the autocorrelation function decays very slowly and is not summable:

$$\sum_{k=0}^{\infty} r(k, H) = \infty, \tag{1}$$

where:

$$r(k, H) = \frac{1}{2}\left[(k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right], \quad \text{for } k = 0, 1, \ldots \tag{2}$$
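As a numerical illustration of (1) and (2), the following minimal Python sketch evaluates the autocorrelation function and its partial sums. It is not part of the original analysis: the function names `fgn_autocorrelation` and `partial_sums`, the use of NumPy, the chosen lags, and the use of |k - 1| in place of (k - 1) (so that the k = 0 term is well defined) are assumptions made for illustration only.

```python
import numpy as np


def fgn_autocorrelation(k, hurst):
    """Evaluate r(k, H) from equation (2).

    Assumption: |k - 1| replaces (k - 1) so that the k = 0 term is well defined.
    """
    k = np.asarray(k, dtype=float)
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1.0) ** two_h - 2.0 * k ** two_h + np.abs(k - 1.0) ** two_h)


def partial_sums(hurst, max_lag):
    """Cumulative sums of r(k, H) over k = 1, ..., max_lag."""
    lags = np.arange(1, max_lag + 1)
    return np.cumsum(fgn_autocorrelation(lags, hurst))


if __name__ == "__main__":
    for hurst in (0.5, 0.8):
        sums = partial_sums(hurst, 100_000)
        print(f"H = {hurst}: sum to lag 1000 = {sums[999]:.3f}, "
              f"sum to lag 100000 = {sums[-1]:.3f}")
```

For H = 0.5 every term with k ≥ 1 vanishes, whereas for H = 0.8 the partial sums keep growing as the maximum lag increases, which is the behaviour expressed by (1).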

Another property, resulting from (1) and (2), is that the variance of the aggregated random variable $X_k^{(m)}$ is proportional to the aggregation level $m$ raised to the power of $2H - 2$:

$$\operatorname{Var}\left(X^{(m)}\right) \sim m^{2H-2} \tag{3}$$

where:

$$X_k^{(m)} = m^{-1}\left(X_{km-m+1} + \ldots + X_{km}\right) \tag{4}$$

The Hurst exponent $H$ can be estimated from (3) by taking logarithms and applying linear regression. It is a measure of self-similarity, and for positive autocorrelation, $r(k, H) > 0$ for $k \neq 0$, its values lie between 0.5 and 1. The special case $H = 0.5$ corresponds to an uncorrelated time series, for which $r(k) = 0$.

2. VARIANCE-TO-MEAN RATIO

The variance-to-mean ratio (VMR) of a random variable $X$, also called the index of dispersion, is defined as [9, 10]:

$$\operatorname{VMR}(X) = \frac{\operatorname{Var}(X)}{E(X)} \tag{5}$$

Variance represents how dispersed the data are and is closely related to the variability of the values in the analysed time series. The expected value, on the other hand, corresponds to the average level of intensity (e.g. the number of packets per time unit). The ratio in (5) enables the analysis of second-order properties while ignoring fluctuations of the sample mean. VMR can be used as a simple measure to detect a change in the variability of a time series. Although there are no general criteria for determining a threshold level, one can be derived from the properties of the particular problem. For some distributions VMR takes characteristic values. For example, $\operatorname{VMR}(X) = 1$ for the Poisson distribution, $0 < \operatorname{VMR}(X) < 1$ for the binomial distribution, and $\operatorname{VMR}(X) > 1$ for the negative binomial distribution.

Taking into account (4) and (5), one can obtain another measure associated with VMR, the index of dispersion for counts (IDC), which is defined as:

$$\operatorname{IDC}(L) = \frac{\operatorname{Var}\left(\sum_{j=1}^{L} X_j\right)}{E\left(\sum_{j=1}^{L} X_j\right)} \tag{6}$$

which is proportional to a power of the aggregation level, as in the variance case in (3):

$$\operatorname{IDC}(L) \sim L^{2H-1} \tag{7}$$

After taking logarithms of both sides of (7), the slope of the linear regression of log IDC(L) against log L equals 2H - 1, so it can be used to estimate the value of the Hurst exponent.
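A minimal Python sketch of the IDC-based estimation in (6) and (7) is given below. It is an illustration under stated assumptions rather than the implementation used for the results: the function names, the use of non-overlapping windows, the aggregation levels and the synthetic Poisson input are hypothetical. The slope of log IDC(L) against log L is taken as 2H - 1, so H = (slope + 1) / 2.

```python
import numpy as np


def idc(x, level):
    """Index of dispersion for counts, equation (6), over non-overlapping windows."""
    x = np.asarray(x, dtype=float)
    n_windows = len(x) // level
    sums = x[: n_windows * level].reshape(n_windows, level).sum(axis=1)
    return sums.var(ddof=1) / sums.mean()


def hurst_idc(x, levels):
    """Return (slope, H) from the regression of log IDC(L) on log L, equation (7)."""
    values = [idc(x, level) for level in levels]
    slope, _intercept = np.polyfit(np.log10(levels), np.log10(values), 1)
    return slope, (slope + 1.0) / 2.0  # slope = 2H - 1


if __name__ == "__main__":
    # Synthetic, uncorrelated stand-in data (assumption): slope near 0, H near 0.5.
    rng = np.random.default_rng(3)
    counts = rng.poisson(17, size=2**16)
    levels = [10, 16, 25, 40, 63, 100, 160]
    slope, hurst = hurst_idc(counts, levels)
    print(f"slope = {slope:.4f}, H = {hurst:.3f}")
```

For uncorrelated Poisson counts IDC(L) stays close to 1 at every level, so the fitted slope is near zero; traffic with long-range dependence would show IDC(L) increasing with L.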

3. RESULTS

The results are based on measurements of real computer network traffic recorded in 2014 on the main Ethernet switch (HP ProCurve 4000M) located in the building of the Faculty of Chemical Technology and Engineering, West Pomeranian University of Technology. Incoming and outgoing traffic was monitored on the 1000Base-LX port (1 Gb/s) connected to the main academic router located at the Academic Centre of Computer Science. The number of packets passing through this port was saved to a log file every 10 ms.

The measurement lasted 7372.8 seconds (about 122.9 minutes). In order to examine the variability and the value of the Hurst exponent over time, the data were divided into 720 blocks of 1024 samples each (10.24 s per block). The overall sample mean was 17 pkts/10 ms (1700 pkts/s). The overall variance was 38.371 (pkts/10 ms)² (standard deviation: 6.194 pkts/10 ms). Fig. 1 and Fig. 2 show the mean traffic intensity and the variance-to-mean ratio for all blocks.
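The block-wise statistics described above can be sketched in Python as follows. The captured log is not available here, so the input is a synthetic Poisson series with the reported mean of 17 pkts/10 ms; this stand-in, the function name `block_statistics` and the use of NumPy are assumptions made for illustration.

```python
import numpy as np

SAMPLE_PERIOD_S = 0.01      # one sample every 10 ms
SAMPLES_PER_BLOCK = 1024
N_BLOCKS = 720


def block_statistics(counts, samples_per_block):
    """Return per-block sample mean and variance-to-mean ratio."""
    counts = np.asarray(counts, dtype=float)
    n_blocks = len(counts) // samples_per_block
    blocks = counts[: n_blocks * samples_per_block].reshape(n_blocks, samples_per_block)
    means = blocks.mean(axis=1)
    return means, blocks.var(axis=1, ddof=1) / means


if __name__ == "__main__":
    # Synthetic stand-in for the measured packet counts (assumption).
    rng = np.random.default_rng(2)
    counts = rng.poisson(17, size=N_BLOCKS * SAMPLES_PER_BLOCK)
    means, vmrs = block_statistics(counts, SAMPLES_PER_BLOCK)
    start_min = np.arange(N_BLOCKS) * SAMPLES_PER_BLOCK * SAMPLE_PERIOD_S / 60.0
    print(f"duration: {counts.size * SAMPLE_PERIOD_S:.1f} s, blocks: {means.size}")
    print(f"last block starts at {start_min[-1]:.2f} min, mean VMR = {vmrs.mean():.3f}")
```

Plotting `means` and `vmrs` against `start_min` would give curves of the same kind as Fig. 1 and Fig. 2, although the synthetic data will not reproduce the measured peaks.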

[Figure 1: mean traffic intensity (pkts/10 ms) versus time (min.)]

Fig. 1. Mean traffic intensity scaled in packets per 10 ms

[Figure 2: variance-to-mean ratio versus time (min.), with the points P1 and P2 marked]

Fig. 2. Average values of the variance-to-mean ratio

The mean number of packets per time unit changes smoothly over time (Fig. 1), while the influence of traffic intensity is removed in the VMR results (Fig. 2). One can observe some peaks and higher values (above 2) that may reflect different properties of the traffic. In order to investigate these properties, a period P1–P2 in which the values of VMR are relatively high was chosen. P1 is 6144 s (about 102.4 min.) and P2 is 6502.4 s (about 108.4 min.). For comparison, 3 random blocks of data within the range between P1 and P2 and 3 random blocks outside this range were selected and analysed from the point of view of self-similar properties.
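The mapping from the period P1–P2 to block indices, and the random choice of blocks inside and outside it, can be sketched in Python as below. With 1024 samples of 10 ms per block, each block lasts 10.24 s, so both P1 and P2 fall exactly on block boundaries. How the random blocks were actually drawn is not stated in the text; the uniform draw without replacement, the seed and all function names are assumptions.

```python
import numpy as np

SAMPLE_MS = 10
SAMPLES_PER_BLOCK = 1024
BLOCK_MS = SAMPLE_MS * SAMPLES_PER_BLOCK   # 10240 ms per block
N_BLOCKS = 720


def period_to_blocks(p1_ms, p2_ms):
    """Return (first block, end block) so that the period covers blocks first..end-1."""
    return p1_ms // BLOCK_MS, p2_ms // BLOCK_MS


def select_blocks(first, end, n_blocks, n_select=3, seed=0):
    """Pick n_select random block indices inside and outside [first, end) (assumed procedure)."""
    rng = np.random.default_rng(seed)
    inside_pool = np.arange(first, end)
    outside_pool = np.setdiff1d(np.arange(n_blocks), inside_pool)
    inside = rng.choice(inside_pool, size=n_select, replace=False)
    outside = rng.choice(outside_pool, size=n_select, replace=False)
    return np.sort(inside), np.sort(outside)


if __name__ == "__main__":
    # P1 = 6144 s and P2 = 6502.4 s, expressed in integer milliseconds.
    first, end = period_to_blocks(6_144_000, 6_502_400)
    inside, outside = select_blocks(first, end, N_BLOCKS)
    print(f"period covers blocks {first}..{end - 1} ({end - first} blocks)")
    print(f"inside: {inside}, outside: {outside}")
```

Integer milliseconds are used to avoid floating-point rounding when dividing by the block length.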

The index of dispersion for counts (IDC) method was applied to estimate the value of the Hurst exponent. The results of this estimation are presented in Fig. 3 and Table 1. All points in Fig. 3 correspond to the values calculated according to equations (6) and (7), after taking logarithms, for a particular level of aggregation (x axis). Solid lines and points denoted by boxes correspond to the results for the selected period. Dotted lines and points denoted by x correspond to the results for the other randomly selected data outside the selected range. One can see that for data within the range between P1 and P2 the value of the Hurst exponent is significantly higher (above 0.8) than for data outside this range (below 0.6). This may be caused by some network traffic pattern related to a specific service provided by a remote server or a local network device.

[Figure 3: log(IDC) versus log(aggregation level), with points and regression lines for random traces 1a, 2a, 3a and for random traces 1b, 2b, 3b within the range of P1 and P2]

Fig. 3. The results of the index of dispersion for counts for random traces in and out of the range of P1 and P2

Table 1. The results of Hurst exponent estimation

Trace       Slope coefficient   Hurst exponent
trace 1a    0.0700              0.535
trace 2a    0.1560              0.578
trace 3a    0.1118              0.556
trace 1b    0.6871              0.844
trace 2b    0.7291              0.865
trace 3b    0.8018              0.901
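The relation between the two numeric columns of Table 1 can be checked with a short worked calculation. Taking logarithms of (7) gives a straight line whose slope is 2H - 1; the additive constant c below is notation introduced here only for illustration.

```latex
\begin{align*}
\log \mathrm{IDC}(L) &= (2H - 1)\,\log L + c, \\
H &= \frac{\text{slope} + 1}{2}, \\
\text{trace 1a:}\quad H &= \frac{0.0700 + 1}{2} = 0.535, \\
\text{trace 3b:}\quad H &= \frac{0.8018 + 1}{2} = 0.9009 \approx 0.901.
\end{align*}
```

The remaining rows of the table follow in the same way.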

4. CONCLUSIONS

ICT network traffic that is highly overdispersed exhibits additional second-order statistical properties related to self-similarity and long-range dependence. VMR analysis makes it possible to remove the sample mean bias connected with traffic intensity fluctuations that change very slowly over time.

The results show a relationship between VMR values and the values of the Hurst exponent, which determines the level of self-similarity. It is not known why this relationship holds, but it is probably caused by the behaviour of a network service hosted by a server or another network device, locally or remotely. Each network service (or even protocol) can be characterised, from the point of view of network traffic, by rules governing the amount of transmitted data and the times at which the data are sent. Such a service is presumably responsible for the specific, highly correlated patterns observed in the analysed traffic.

The collected data do not provide enough information to determine what exactly causes this phenomenon, but they are a good starting point for further research. Moreover, one well-known consequence of higher values of H is worse network performance, caused by less efficient operation of the queueing systems implemented in all network devices. This in turn leads to lower quality of service and to difficulties in traffic shaping and congestion control.

REFERENCES

[1] Barford P., Kline J., Plonka D., Ron A., A Signal Analysis of Network Traffic Anomalies, Proc. 2nd ACM SIGCOMM Workshop on Internet Measurement, 2002, pp. 71-82.

[2] Kim M., Kong H., Hong S., Chung S., A Flow-based Method for Abnormal Network Traffic Detection, NOMS, IEEE/IFIP, Volume 1, 2004, pp. 599-612.

[3] Shinde P., Guntupalli S., Early DoS Attack Detection using Smoothened Time-Series and Wavelet Analysis, Information Assurance and Security, 2007. IAS 2007, pp. 215-220.

[4] Hinich M. J., Molyneux R. E., Predicting information flows in network traffic, Journal of the American Society for Information Science and Technology, Volume 54, Issue 2, 2003, pp. 161-168.

[5] Field A. J., Harder U., Harrison P. G., Measurement and Modelling of Self-similar Traffic in Computer Networks, IEE Proc.-Commun., Volume 151, Issue 4, 2004, pp. 355-386.

[6] Leland W., Taqqu M., Willinger W., Wilson D., On the self-similar nature of Ethernet traffic (extended version), IEEE/ACM Transactions on Networking, Volume 2, Issue 1, 1994, pp. 1-15.

[7] Taqqu M., Willinger W., Sherman R., Proof of a Fundamental Result in Self-Similar Traffic Modeling, ACM SIGCOMM Computer Communication Review, Volume 27, Issue 2, 1997, pp. 5-23.

[8] Park K., Kim G. T., Crovella M. E., On the relationship between file sizes, transport protocols and self-similar network traffic, Technical report, Boston University, Computer Science Department, 1996.

[9] Faber M. H., Statistics and Probability Theory, Springer, 2012.

[10] Garvey P. R., Book S. A., Covert R. P., Probability Methods for Cost Uncertainty Analysis: A Systems Engineering Perspective, Second Edition, Chapman and Hall, 2015.

(Received: 5. 02. 2016, revised: 4. 03. 2016)