METODY ILOŚCIOWE W BADANIACH EKONOMICZNYCH

EMPIRICAL STUDY

Normalization was applied to the values of particulate air-pollutant emissions from plants that are particularly burdensome to air quality. The study covered 379 districts (powiaty). The data were taken from the Central Statistical Office (GUS) and refer to 2005. The particulate emission indicator was chosen because it varies widely depending on the nature of the plants located in individual districts. Many districts, located in non-industrialized areas, have very low emissions, practically close to zero. There are also many districts in heavily industrialized regions where emissions are considerable. In addition, a small number of districts contain very large industrial plants, such as power plants. These plants emit several hundred times more pollutants than typical plants. They are untypical objects that distort the values of synthetic measures.

From the particulate emission indicator, a variable was constructed by dividing the indicator value by the number of firms registered in a given district. This gave the variable: particulate emissions per hundred firms. The variable was standardized using both the ordinary standard deviation and the weighted standard deviation. The standardized values for the first hundred objects are shown in Figure 2. Standardization with weights increased the fluctuations of the standardized variable.
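A minimal Python sketch of the two variants may help make the comparison concrete. It assumes population (divide-by-n) standard deviations and, in the weighted variant, a weighted mean as well. The weights are a hypothetical input: this section does not restate how the weighting is defined, so the function names and the choice of giving the untypical object weight 0 are illustrative only.

```python
import math


def standardize(x):
    """Classical standardization (x - mean) / s, with s the population standard deviation."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / s for v in x]


def weighted_standardize(x, w):
    """Standardization with a weighted mean and a weighted standard deviation.

    The weights w are supplied by the caller; the weighting scheme is an
    assumption, not the one defined in the paper.
    """
    sw = sum(w)
    mean = sum(wi * v for wi, v in zip(w, x)) / sw
    s = math.sqrt(sum(wi * (v - mean) ** 2 for wi, v in zip(w, x)) / sw)
    return [(v - mean) / s for v in x]


# Four typical objects and one untypical object (e.g. a power plant).
x = [1, 2, 3, 4, 100]
z = standardize(x)
# Hypothetical weights: the untypical object gets weight 0.
zw = weighted_standardize(x, [1, 1, 1, 1, 0])
```

With the ordinary standard deviation the outlier inflates s, so the four typical objects end up squeezed into a narrow band; with the weighted variant s reflects only the typical objects, and their standardized values spread out, which is the increased fluctuation described above.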

Figure 2. Comparison of standardization and standardization with weights for the first hundred objects

[Chart: value of the variable after standardization (vertical axis, -2 to 2) against object number (horizontal axis, 0 to 100); series: Standardization, Standardization with weights]

Source: own calculations

192 Kesra Nermend

The variable under study was normalized by zero unitarization and by unitarization with threshold values (Fig. 3). The threshold values were calculated from standard deviations, with w_σ set to 1.5. Unitarization with threshold values increased the oscillation of the values, but at the same time shifted the plot considerably upwards. This upward shift is caused by the asymmetry of the distribution of values: depending on the form of asymmetry, the plot is shifted upwards or downwards. A downward shift is particularly unfavourable, because negative values appear, and some methods of constructing synthetic measures do not allow them.
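The shift can be reproduced with a short Python sketch. It assumes, since the section does not give the formula, that the standard-deviation thresholds are mean ± w_σ·σ (population standard deviation) and that values beyond the thresholds are not clipped; the function names are illustrative.

```python
import math


def zero_unitarize(x):
    """Zero unitarization: (x - min) / (max - min), mapping the data onto [0, 1]."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]


def sigma_thresholds(x, w_sigma=1.5):
    """Assumed thresholds mean - w_sigma*std and mean + w_sigma*std (population std)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return mean - w_sigma * std, mean + w_sigma * std


def threshold_unitarize(x, lower, upper):
    """Unitarization with thresholds; values outside them are NOT clipped,
    so they can fall below 0 or above 1."""
    return [(v - lower) / (upper - lower) for v in x]


# Right-skewed data: many small values and one very large one.
x = [0, 1, 1, 2, 2, 3, 100]
lo, hi = sigma_thresholds(x)
u = threshold_unitarize(x, lo, hi)      # lower threshold < min: whole plot lifted above 0

# Mirrored (left-skewed) data: the lower threshold exceeds the minimum.
xm = [100 - v for v in x]
lo_m, hi_m = sigma_thresholds(xm)
u_m = threshold_unitarize(xm, lo_m, hi_m)   # contains negative values
```

Because the thresholds are symmetric around the mean while the data are not, the skewness alone decides whether the result drifts upwards or produces negative values.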

Figure 3. Comparison of zero unitarization and unitarization with threshold values computed from a histogram for the first hundred objects

[Chart: value of the variable after unitarization (vertical axis, 0 to 1) against object number (horizontal axis, 0 to 100); series: Unitarization, Unitarization with thresholds]

Source: own calculations

Threshold values determined from a histogram do not have the drawback of a shifted plot. The histogram used to compute the threshold values had one hundred bins. The left and right cut-offs used to determine the threshold values were equal, each set at twenty elements. As a result, the fluctuations of the variable increased considerably while the plot shifted only very slightly. A drawback of this way of determining threshold values is that a fairly large number of objects is needed to determine them accurately.
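The histogram rule can be sketched in Python as follows. The reading of "cut-offs set at twenty elements" used here is an assumption: the lower threshold is taken as the left edge of the first bin at which the cumulative count from the left reaches k objects, and the upper threshold as the right edge of the first bin at which the cumulative count from the right reaches k. Sparse tails holding fewer than k objects, such as a handful of power plants, are thus cut off. Function names are illustrative.

```python
def histogram_thresholds(x, bins=100, k=20):
    """Lower/upper thresholds read from an equal-width histogram of x (assumed rule)."""
    if len(x) < 2 * k:
        # with at least 2k objects the lower threshold cannot exceed the upper one
        raise ValueError("need at least 2*k objects")
    x_min, x_max = min(x), max(x)
    if x_max == x_min:
        raise ValueError("variable is constant")
    width = (x_max - x_min) / bins
    counts = [0] * bins
    for v in x:
        # the maximum lies on the last edge, so put it in the last bin
        counts[min(int((v - x_min) / width), bins - 1)] += 1

    cum, lower = 0, x_min
    for i, c in enumerate(counts):              # scan from the left
        cum += c
        if cum >= k:
            lower = x_min + i * width           # left edge of that bin
            break

    cum, upper = 0, x_max
    for i in range(bins - 1, -1, -1):           # scan from the right
        cum += counts[i]
        if cum >= k:
            upper = x_min + (i + 1) * width     # right edge of that bin
            break
    return lower, upper


def unitarize(x, lower, upper):
    """Unitarization with thresholds; values beyond them are not clipped."""
    return [(v - lower) / (upper - lower) for v in x]


# 130 typical objects and 5 untypical ones.
x = [0] * 50 + [1] * 50 + [2] * 30 + [1000] * 5
lower, upper = histogram_thresholds(x)
u = unitarize(x, lower, upper)
```

Here the thresholds come out as (0.0, 10.0): the value 2 maps to 0.2 instead of 0.002 under zero unitarization, while the outliers land far above 1. The need for at least 2k objects mirrors the remark that the method requires a fairly large number of objects.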

SUMMARY

Several methods of eliminating the influence of untypical objects on the normalization of variables were tested. All of them increase the fluctuation of values, which improves the discrimination between typical objects with respect to the given variable. In the case of unitarization with a small number of objects, the threshold values can be determined using the standard deviation; the distribution of the variable should then preferably be symmetric. With a large number of objects, the threshold values can be determined from a histogram, and the symmetry of the distribution is then not required.

REFERENCES

Amir D. A. (2000) Statystyka w zarządzaniu, PWN, Warszawa

Borys T. (1978) Metody normowania cech w statystycznych badaniach porównawczych, Przegląd Statystyczny, no. 2

Grabiński T., Wydymus S., Zeliaś A. (1989) Metody taksonomii numerycznej w modelowaniu zjawisk społeczno-gospodarczych, PWN, Warszawa

Kolenda M. (2006) Taksonomia numeryczna. Klasyfikacja, porządkowanie i analiza obiektów wielocechowych, Wydawnictwo Akademii Ekonomicznej im. Oskara Langego we Wrocławiu, Wrocław, ISBN 83-7011-805-4

Kozak R., Staudhammer C., Watts S. (2007) Introductory Probability and Statistics: Applications for Forestry and Natural Sciences, CABI, ISBN 1845932757

Kukuła K. (2000) Metoda unitaryzacji zerowanej, PWN, Warszawa, ISBN 83-01-13097-0

Nermend K. (2008) Rachunek wektorowy w analizie rozwoju regionalnego, Wydawnictwo Naukowe Uniwersytetu Szczecińskiego, Szczecin, ISBN 978-83-7241-660-5

Nowak E. (1990) Metody taksonomiczne w klasyfikacji obiektów społeczno-gospodarczych, PWE, Warszawa, ISBN 83-208-0689-5

Standardization of Variables Describing Untypical Objects

Abstract: Many studies encounter the problem of untypical objects whose characteristics take very large values. Such objects may significantly affect the results, because during normalization they compress the range of the variable's values for all other objects. The negative impact of untypical values can be reduced by using appropriate normalization methods.

The article presents two such methods: standardization with the weighted standard deviation and unitarization with threshold values.

Key words: untypical objects, standardization methods

METODY ILOŚCIOWE W BADANIACH EKONOMICZNYCH

X, 2008, pp. 194-206

IS MULTIPLE LINEAR REGRESSION THE PROPER TOOL OF MODELLING A BEHAVIOUR OF REAL SYSTEMS?

Jacek J. Nowak

Department of Management, Szkoła Wyższa im. Bogdana Jańskiego, Warsaw

e-mail: jacek.nowak@janski.edu.pl

Abstract: The methodological assumption that multiple linear regression is an adequate tool for modelling the behaviour of real systems is tested. To this end, an experiment is carried out on a simple "real" system represented as a finite discrete automaton. The main result is that in "black box" modelling, approximating the output variables with multiple linear regressions (from several samples and under different conditions) may fail every criterion of feasible approximation of the system's behaviour, even when the true relation between the input and output variables is strictly linear and only one variable is omitted.

Keywords: model of a real system, multiple linear regression, real system structure, discrete automaton, "black box" modelling, quality of approximation

INTRODUCTION

After almost 40 years, W. Leontief's main postulates on modelling real economic systems are still relevant¹. In the present paper we take one step along this difficult road of verifying the assumptions of modelling real systems.

1 We mean especially the postulate: "What is really needed, in most cases, is a very difficult and seldom very neat assessment and verification of these assumptions in terms of observed facts." [W. Leontief, 1971, p. 2].

Multiple linear regression is often used to describe, and then to analyse, forecast or simulate, the behaviour of real systems such as economic, environmental, technical and social ones. Most of these models are quantitative models based on statistical data. Applying such quantitative models still leads to failures or unsatisfactory results, especially when forecasting or simulating the behaviour of economic systems. Experienced practitioners and methodologists² often point out that these failures are caused mainly by structural changes in the modelled systems³ or by one or more of the principal factors listed below:

a) omitted or misspecified variables,

b) multicollinearity (or near multicollinearity) between explanatory variables,

c) use of bad data (e.g. inadequate, incomplete or erroneous data),

d) stochastic nature of relations,

e) series of data containing time trends,

f) misspecification of the functional form of relations between variables,

g) inadequate or incomplete theory of the modelled system, or none at all.

After a period of considerable creativity that produced many ideas for models accounting for structural changes⁴, research in recent years has paid more attention to testing hypotheses of structural breaks⁵.

Still, a more fundamental problem than hypothesis testing is how to construct an adequate model. The prevailing opinion is that if none of the above-mentioned causes of failure is present (except a missing or incomplete theory), then a regression model with good statistics of its stochastic structure ought to be suitable for representing, forecasting or simulating a given real system.

However, models of real systems, especially econometric ones, are often constructed in a "black box" situation. This means that the data available to the modeller describe only the input and output series, not the (internal) states of the modelled system.

In [Nowak 2000] the following problem was posed: is multiple regression an adequate tool for modelling and forecasting in "black box" situations, under the assumption that none of the above-mentioned causes of failure is present (except a missing or incomplete theory)?

To answer this question, an experiment was conducted in which a simple system was modelled under the above assumption. We generated several samples of observations, estimated the parameters of linear regressions with two explanatory (input) variables, and approximated the endogenous (output) variables within and out of sample (using the true, observed values of the explanatory variables).
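The finite automaton used in the experiment is not described in this section, so the following Python/NumPy sketch substitutes a hypothetical toy system: the output is strictly linear in two observed inputs and in one hidden state variable the modeller never sees. It fits the two-regressor model by ordinary least squares within the sample and then approximates the output out of sample after the hidden state has changed. It illustrates the mechanism only and is not the paper's experiment; all names and numbers are illustrative.

```python
import numpy as np


def fit_two_regressors(x1, x2, y):
    """OLS fit of y = b0 + b1*x1 + b2*x2; returns coefficients and in-sample R^2."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2


def predict(beta, x1, x2):
    return beta[0] + beta[1] * np.asarray(x1, float) + beta[2] * np.asarray(x2, float)


def system(x1, x2, x3):
    """Hypothetical 'real' system: strictly linear, x3 is a hidden state."""
    return 2 * np.asarray(x1, float) + 3 * np.asarray(x2, float) + 5 * x3


# Within the sample the hidden state stays constant, so it is absorbed
# into the intercept and the fit looks perfect.
x1_in, x2_in = [1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]
beta, r2 = fit_two_regressors(x1_in, x2_in, system(x1_in, x2_in, x3=1.0))

# Out of sample the hidden state has changed; the regression cannot know.
x1_out, x2_out = [7, 8], [8, 7]
y_out = system(x1_out, x2_out, x3=4.0)
rel_err = (predict(beta, x1_out, x2_out) - y_out) / y_out
```

In this toy case R² equals 1 within the sample, yet both out-of-sample relative errors are about 26%, even though the true relation is linear and only one variable is omitted.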

Despite the above assumption being fulfilled⁶ and very high values of the coefficient of determination, R², the quality of the out-of-sample approximation of the dependent variables was very poor. Most regressions produced mainly approximation errors⁷ greater than 10% of the actually observed values of the endogenous variables.

2 See e.g. Mayer (1993), Broemeling (1982), Hendry, Richard (1983), survey in Nowak (1981).

3 The main cause of these structural changes is the nature of economic (social) reality (cf. Marschak, 1950 or Leontief, 1971).

4 See e.g. the survey of such ideas in Nowak (2004).

5 See e.g. Bai (1999), Elliott and Müller (2006) or Juhl and Xiao (2009).

6 The only thing we could not avoid was some correlation between the explanatory variables in some samples, despite generating their values from random number tables.

196 Jacek J. Nowak

In the remaining regressions, nearly half of the errors were infeasible in this sense. Several errors exceeded 100% (!) of the approximated values, some of them exceeding 500%. The whole sample, consisting of all generated observations, had 34 elements. In addition, the estimates of the regression parameters varied greatly from one sample to another.

There are several practical criteria for the adequacy (feasibility) of approximations (or forecasts). One can apply one, or a combination, of at least the following three criteria:

1) no error is greater than 10% of the observed (approximated) value (a very strong condition),

2) no more than 10% of the errors are greater than 10% of the observed (approximated) values,

3) the mean of the absolute values of the relative errors is not greater than 10%.
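The three criteria translate directly into checks on relative errors. This Python sketch assumes errors are measured relative to the observed values (the text allows either observed or approximated values as the base); the function name is illustrative.

```python
def feasibility_criteria(observed, approximated, tol=0.10):
    """Return (c1, c2, c3): whether each of the three feasibility criteria holds."""
    rel = [abs(a - o) / abs(o) for o, a in zip(observed, approximated)]
    c1 = max(rel) <= tol                              # 1) no error above 10%
    c2 = sum(r > tol for r in rel) / len(rel) <= tol  # 2) at most 10% of errors above 10%
    c3 = sum(rel) / len(rel) <= tol                   # 3) mean absolute relative error <= 10%
    return c1, c2, c3


# Nine errors of 5% and one error of 20%.
observed = [100.0] * 10
approximated = [105.0] * 9 + [120.0]
result = feasibility_criteria(observed, approximated)
```

Here a single 20% error is enough to break criterion 1, while criteria 2 and 3 still hold, which shows why the first criterion is called very strong.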

The results of [Nowak 2000] showed that none of the above feasibility criteria was met by any regression from any sample.

The general conclusion of that investigation was that in a "black box" situation, and despite the absence of the principal causes of modelling failure, multiple linear regression is not an adequate tool for modelling the behaviour of a system.

The goal of the present investigation is to check whether the results described above were accidental and whether the conclusion remains valid when two types of conditions are changed: A) different initial states and B) a quite different sequence of observations. To check this, we repeat the experiment in two parts, corresponding to A and B.
