
Classification of countries belonging to the European Union with respect to the Internet usage


Academic year: 2021


MONIKA KRAWIEC SGGW Warszawa

Summary

The new Information and Communication Technologies (ICTs) have made a great impact on all aspects of European society and economy, changing the way in which people do business, learn and spend their leisure time. They are also becoming a major force behind global development. Access to these technologies is spreading rapidly. In the last few years the importance of the Internet, and dependence on it, have grown significantly. Thus the aim of the paper is to provide a classification of European Union members with respect to Internet usage by means of multivariate data analysis on the basis of statistical data obtained from EUROSTAT.

Keywords: information technology, Internet usage, cluster analysis

1. Introduction

During the past decade Information and Communication Technologies (ICTs) have become available, i.e. accessible and affordable, for the general public. IT, telecommunication, television and other media of electronic information transfer are more and more commonly used in the exchange of products and services in countries with a high level of economic development [3]. However, a gap remains between users and non-users, or between “haves” and “have nots”. There are several reasons for this “digital divide”: from missing infrastructure or access, to missing incentives to use ICTs, to a lack of the computer literacy or skills necessary to take part in the Information Society [1].

The Information Society has produced a large collection of new Information and Communication Technologies that have transformed the approach to global development. Access to these technologies is spreading rapidly; for example, in 2005 the number of Internet users in developing countries crossed 500 million. By some estimates more than 75 percent of the world’s population now lives within range of a mobile network, and the number of mobile phones shows the fastest growth [5].

Thus, Information Technologies are becoming a major driving force behind the development of our civilization, which is heading towards the Information Society. At the same time Information Society developments present us with new challenges and opportunities across all areas of society. The impact of Information and Communication Technologies is transforming economic and social activity. The Information Society Commission agrees that broadband telecommunications infrastructure is increasingly seen as having an importance in the 21st century that parallels that of electricity in the 20th and railroads in the 19th (www.isc.ie).


The importance of the Internet, and dependence on it, have also grown tremendously over the last few years. Thus the aim of the paper is to provide a classification of European Union members with respect to Internet usage by means of multivariate data analysis on the basis of statistical data obtained from EUROSTAT.

2. Methods of analysis

Multivariate methods involve analysis of more than one statistical variable at a time. This enables comparison of the considered objects, which are characterized by many variables assessing a given complex phenomenon. The group of objects may include countries, administrative units, enterprises, households etc. These objects can be compared, one to another, by using an available set of diagnostic variables. The data collected are usually displayed in a matrix where rows represent observations and columns represent variables.

In multivariate comparative analysis it is important to ensure that the final diagnostic variables are comparable. This means, among other things, that it is necessary to strip the variables of the natural units in which the diagnostic characteristics are expressed and to bring them to a state in which they lend themselves to comparison, which implies equalizing the range of variability of the characteristics. In order to achieve this, methods of normalization of diagnostic variables are used [9]. Many normalization procedures are described and applied in the literature [4]. In this research we applied standardization, the most commonly used normalization method.
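As an illustration only, standardization of a single diagnostic variable can be sketched as follows. The function name `standardize`, the sample values and the use of the population standard deviation are assumptions made for this sketch; the paper does not state which variant of the standard deviation was used.

```python
from statistics import mean, pstdev

def standardize(column):
    """Return z-scores: (value - mean) / standard deviation."""
    m = mean(column)
    s = pstdev(column)  # population standard deviation (an assumption)
    if s == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [(v - m) / s for v in column]

# Hypothetical values of one diagnostic variable for five objects.
x = [20.0, 35.0, 50.0, 65.0, 80.0]
z = standardize(x)
print([round(v, 3) for v in z])
```

After this transformation every variable has mean 0 and standard deviation 1, so variables measured in different units contribute on a comparable scale to the distances between objects.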

Cluster analysis is a classification method that is used to arrange a set of objects into clusters. The aim is to establish a set of clusters such that objects within a cluster are more similar to each other than they are to objects in other clusters. Algorithms designed to perform cluster analysis are usually divided into two classes called hierarchical and nonhierarchical methods [6]. A hierarchical clustering is a sequence of partitions in which each partition is nested into the next partition in the sequence. In our research we applied this approach.

A key step in hierarchical clustering is to select a distance measure. Common distance measures include the Euclidean distance, the squared Euclidean distance, the "City-Block" distance and many others [2]. Given a distance measure, objects can be combined. Hierarchical clustering builds (agglomerative) or breaks up (divisive) a hierarchy of clusters. The traditional representation of this hierarchy is a tree data structure called a dendrogram. The dendrogram summarizes the process of clustering. Similar objects are joined by links whose position in the diagram is determined by the level of similarity between the objects. Agglomerative algorithms begin at the leaves of the tree, with each object on its own, whereas divisive algorithms begin at the root, with all objects in one cluster.
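The three distance measures named above can be written down in a few lines. This is a minimal sketch; the function names and the two sample points are hypothetical.

```python
import math

def squared_euclidean(a, b):
    """Sum of squared coordinate differences."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))

def city_block(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(p - q) for p, q in zip(a, b))

# Two hypothetical objects described by two standardized variables.
a = [0.0, 0.0]
b = [3.0, 4.0]
print(euclidean(a, b), squared_euclidean(a, b), city_block(a, b))
```

Compared with the plain Euclidean distance, the squared Euclidean distance places progressively greater weight on objects that are far apart.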

The agglomerative methods begin with each object considered as a separate cluster and then proceed to combine clusters until all objects belong to one cluster. The algorithms most often recommended for hierarchical clustering are average linkage, complete linkage and Ward's linkage [6].

• In average linkage clustering the distance between two clusters is defined as the average of the distances between all pairs of objects, where each pair is made up of one object from each group.

• The complete linkage method (also called the furthest-neighbor method) measures the distance between clusters as the distance between the two points, one from each cluster, that are furthest from one another.

• Ward’s linkage differs from the two methods above because it uses an analysis of variance approach to evaluate the distances between clusters. In short, at each step this method merges the two clusters whose fusion gives the smallest increase in the within-cluster sum of squares.

Each linkage algorithm may produce different results using the same data, so we decided to apply all three above mentioned methods.
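The three linkage rules can be compared on a toy data set with a naive agglomerative loop. This is a sketch under stated assumptions: all function names and the five points are hypothetical, the squared Euclidean distance is used for every rule, and the quadratic search over cluster pairs is chosen for readability rather than speed.

```python
from itertools import combinations

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def centroid(cluster):
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]

def average_linkage(c1, c2):
    # Mean distance over all pairs with one object from each cluster.
    return sum(sq_dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

def complete_linkage(c1, c2):
    # Distance between the two furthest objects, one from each cluster.
    return max(sq_dist(a, b) for a in c1 for b in c2)

def ward_linkage(c1, c2):
    # Increase in the within-cluster sum of squares caused by the merge.
    n1, n2 = len(c1), len(c2)
    return n1 * n2 / (n1 + n2) * sq_dist(centroid(c1), centroid(c2))

def agglomerate(points, linkage, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index j is still valid
        del clusters[j]
    return clusters

# Hypothetical objects forming two well separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (8.0, 8.0), (8.0, 9.0)]
for rule in (average_linkage, complete_linkage, ward_linkage):
    print(rule.__name__, agglomerate(points, rule, 2))
```

On data this clearly separated the three rules agree; differences appear when the groups overlap or differ strongly in size.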

One of the biggest problems in cluster analysis is identifying the final number of clusters. As the fusion process continues, increasingly dissimilar clusters must be fused, i.e. the classification becomes increasingly artificial. Deciding upon the proper number of clusters is largely subjective, although looking at a graph of the level of similarity at fusion versus the number of clusters may help. There will be sudden jumps in the level of similarity as dissimilar groups are fused. The agglomeration distance obtained that way enables us to fix a final number of clusters via dendrograms.

3. Research results

The analysis was carried out on the basis of the most recent complete data obtained from the EUROSTAT database for the year 2005. Our data set contained about 5% of missing values, which were replaced by the means of the respective variables. Countries with more than 20% of missing values were excluded, so finally 23 European countries were investigated from the point of view of features (variables) referring to Internet usage published within the Information Society section. The set of proposed variables included:
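The treatment of missing values described above can be sketched as follows. The object names, the values and both function names are hypothetical, and the order of the two steps (exclusion first, imputation second) is an assumption, since the paper does not state it.

```python
def drop_sparse_rows(rows, max_missing_share=0.20):
    """Keep objects whose share of missing values (None) is at most the limit."""
    kept = {}
    for name, values in rows.items():
        share = sum(v is None for v in values) / len(values)
        if share <= max_missing_share:
            kept[name] = values
    return kept

def impute_column_means(rows):
    """Replace each missing value with the mean of the observed values of its variable."""
    names = list(rows)
    n_vars = len(rows[names[0]])
    means = []
    for j in range(n_vars):
        observed = [rows[n][j] for n in names if rows[n][j] is not None]
        means.append(sum(observed) / len(observed))
    return {
        n: [means[j] if v is None else v for j, v in enumerate(rows[n])]
        for n in names
    }

# Hypothetical data: four objects, five variables, None marks a missing value.
data = {
    "A": [10.0, 20.0, 30.0, 40.0, 50.0],
    "B": [20.0, None, 50.0, 60.0, 70.0],   # 20% missing: kept
    "C": [None, None, 10.0, 20.0, 30.0],   # 40% missing: excluded
    "D": [30.0, 40.0, 40.0, 50.0, 60.0],
}
clean = impute_column_means(drop_sparse_rows(data))
print(clean)
```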

x1 - percentage of households with Internet access at home (percentage of households who have Internet access at home; all forms of Internet use are included; the population considered is aged 16 to 74);

x2 - share of households having a broadband connection (the availability of broadband is measured by the percentage of households that are connectable to an exchange that has been converted to support xDSL-technology, to a cable network upgraded for Internet traffic, or to other broadband technologies; it covers all households having at least one member in the age group 16 to 74 years);

x3 - percentage of individuals who accessed the Internet on average at least once a week (this indicator covers all individuals aged 16 to 74 who access the Internet, on average, at least once a week, within the last three months before the survey);

x4 - share of individuals using the Internet for interacting with public authorities - for obtaining information from public authorities web sites (this indicator covers all individuals aged 16 to 74);

x5 - share of individuals using the Internet for interacting with public authorities - for downloading official forms (this indicator covers all individuals aged 16 to 74);

x6 - share of individuals using the Internet for interacting with public authorities - for sending filled forms (this indicator covers all individuals aged 16 to 74);

x7 - share of individuals having ordered/bought goods or services for private use over the Internet in the last three months (this indicator covers all individuals aged 16 to 74; financial investments are excluded);

x8 - share of enterprises having a broadband connection (the availability of broadband is measured by the percentage of enterprises that are connectable to an exchange that has been converted to support xDSL-technology, to a cable network upgraded for Internet traffic, or to other broadband technologies; it covers enterprises with 10 or more full-time employees);

x9 - share of enterprises using the Internet for interacting with public authorities - for obtaining information (covers all enterprises with 10 or more full-time employees);

x10 - share of enterprises using the Internet for interacting with public authorities - for obtaining forms (covers all enterprises with 10 or more full-time employees);

x11 - share of enterprises using the Internet for interacting with public authorities - for returning filled in forms (covers all enterprises with 10 or more full-time employees);

x12 - share of enterprises having received orders on-line (this indicator covers on-line selling via Internet and EDI or other networks within the previous year; only enterprises selling more than 1% on-line are included; enterprises with 10 or more full-time employees are covered);

x13 - number of personal computers per 100 inhabitants;

x14 - broadband penetration rate - number of broadband lines subscribed in percentage of the population (this indicator shows how widely broadband access to the internet has spread in the countries on the general level, not specifying by user group);

x15 - number of internet hosts per 100 inhabitants.

At the beginning a preliminary data analysis was carried out. In order to establish the discriminating capacities of the characteristics, the variation of the diagnostic variables was evaluated. All variables were characterized by coefficients of variation higher than 20%, so at this stage none of the variables was eliminated. Then, dendrograms obtained by the use of various methods were analyzed. There are three key pieces of information that one can get from a dendrogram: the rough percentage of all individuals that fall within each cluster, how similar to one another the elements of a cluster are, and how different one cluster is from its closest neighbor. Figure 1 presents the dendrogram for European countries obtained by the use of Ward’s method.
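The screening by coefficient of variation can be sketched as below. The variable names and values are hypothetical, and the population standard deviation is an assumption; only the 20% level comes from the text.

```python
from statistics import mean, pstdev

def coefficient_of_variation(column):
    """Standard deviation expressed as a percentage of the mean."""
    return pstdev(column) / mean(column) * 100.0

# Hypothetical variables: v1 varies strongly across objects, v2 hardly at all.
columns = {
    "v1": [10.0, 20.0, 30.0, 40.0],
    "v2": [49.0, 50.0, 51.0, 50.0],
}
kept = [name for name, col in columns.items()
        if coefficient_of_variation(col) > 20.0]
print(kept)
```

A variable with very low variation takes nearly the same value for every object and therefore contributes little to discriminating between them.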

[Figure 1: dendrogram, Ward's method. Axis: Distance (scale 0 to 300). Leaf order as extracted: SE, DK, NL, LU, FI, DE, IT, SK, HU, PL, LT, GR, CZ, LV, CY, UK, PT, SL, ES, BE, IR, EE, AT.]

Fig. 1. Dendrogram for European countries¹ obtained by the use of Ward’s method (squared Euclidean distance)

¹ Acronyms: AT - Austria, BE - Belgium, CZ - Czech Republic, CY - Cyprus, DE - Germany, DK - Denmark, EE - Estonia, ES - Spain, FI - Finland, GR - Greece, HU - Hungary, IT - Italy, IR - Ireland, LT - Lithuania, LV - Latvia, LU - Luxembourg, NL - Netherlands, PL - Poland, PT - Portugal, SE - Sweden, SK - Slovakia, SL - Slovenia, UK - United Kingdom.


The number of clusters was assessed graphically on the basis of the dendrogram, so one could state that in 2005 there were two main clusters. The first cluster could be divided into two groups. The first of them contained two countries: Sweden and Denmark; the second contained four countries: the Netherlands, Luxembourg, Finland and Germany. The second cluster contained three groups of countries. Belgium, Estonia, Ireland, the United Kingdom, Austria, Slovenia, Portugal and Spain belonged to the first group. Poland, Greece, Slovakia, Lithuania, Hungary, Italy and the Czech Republic formed the second group. Cyprus and Latvia were the members of the third group. Detailed results of clustering by the use of Ward’s method are displayed in Table 1.

Table 1. Clusters obtained by the use of Ward’s method (squared Euclidean distance)

Cluster  Group  Countries
1        1      Sweden, Denmark
1        2      Netherlands, Luxembourg, Finland, Germany
2        1      Belgium, Estonia, Ireland, UK, Austria, Slovenia, Portugal, Spain
2        2      Poland, Greece, Slovakia, Lithuania, Hungary, Italy, Czech Rep.
2        3      Cyprus, Latvia

Source: own computations
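The graphical assessment of the number of clusters relies on spotting a sudden jump in the fusion distance. A numerical counterpart of that visual rule can be sketched as follows; the function name and the fusion distances are hypothetical and are not read from Figure 1.

```python
def suggest_n_clusters(fusion_distances):
    """Suggest a number of clusters from the largest jump between successive merges.

    fusion_distances: distances at which the merges happen, in merge order.
    """
    if len(fusion_distances) < 2:
        raise ValueError("at least two merges are needed to measure a jump")
    n_objects = len(fusion_distances) + 1
    jumps = [b - a for a, b in zip(fusion_distances, fusion_distances[1:])]
    k = jumps.index(max(jumps))  # largest jump lies between merge k and merge k + 1
    # Stopping after merge k (0-based) leaves n_objects - (k + 1) clusters.
    return n_objects - (k + 1)

# Hypothetical fusion distances for six objects (five merges).
heights = [1.0, 1.5, 2.0, 2.5, 9.0]
print(suggest_n_clusters(heights))
```

Such a rule only formalizes one reading of the dendrogram; the choice remains largely subjective, as noted in Section 2.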

As mentioned, different clustering methods may provide different results, so we decided to apply two other methods, too. Both the complete linkage method and the group average method also resulted in two clusters, the same as those given in Table 1. The only difference was that the United Kingdom joined the first cluster.

In the next step of the research, coefficients of correlation for all considered diagnostic variables were evaluated; they are presented in Table 2.

Table 2. Matrix of correlation coefficients evaluated for diagnostic variables

     x1    x2    x3    x4    x5    x6    x7    x8    x9    x10   x11   x12   x13   x14   x15
x1   1.00
x2   0.86  1.00
x3   0.88  0.91  1.00
x4   0.84  0.79  0.94  1.00
x5   0.70  0.65  0.81  0.87  1.00
x6   0.58  0.57  0.71  0.76  0.87  1.00
x7   0.82  0.61  0.76  0.78  0.65  0.59  1.00
x8   0.71  0.83  0.76  0.71  0.58  0.53  0.55  1.00
x9   0.21  0.26  0.28  0.20  0.30  0.35  0.26  0.24  1.00
x10  0.20  0.24  0.26  0.22  0.32  0.38  0.29  0.22  0.93  1.00
x11  0.14  0.22  0.16  0.12  0.15  0.35  0.10  0.29  0.67  0.68  1.00
x12  0.71  0.61  0.65  0.57  0.36  0.41  0.74  0.57  0.50  0.45  0.37  1.00
x13  0.70  0.49  0.65  0.70  0.61  0.69  0.84  0.53  0.26  0.27  0.13  0.67  1.00
x14  0.85  0.96  0.87  0.77  0.61  0.53  0.61  0.88  0.26  0.20  0.18  0.65  0.50  1.00
x15  0.73  0.78  0.78  0.74  0.60  0.51  0.49  0.57  0.29  0.27  0.18  0.55  0.34  0.80  1.00
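A matrix of this kind can be computed as sketched below. We assume here that the coefficients are Pearson linear correlations, which the paper does not state explicitly; the function names and the three sample variables are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Full symmetric matrix of pairwise correlations between variables."""
    return [[pearson(a, b) for b in columns] for a in columns]

# Hypothetical variables observed on four objects.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly positively related to the first
    [4.0, 3.0, 2.0, 1.0],   # perfectly negatively related to the first
]
R = correlation_matrix(columns)
for row in R:
    print([round(v, 2) for v in row])
```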


Because the correlation matrix revealed strong and significant relationships between diagnostic variables, we decided to eliminate some of them. For this purpose the reduction method given in [8] was employed. Based on the matrix of correlation coefficients, strongly correlated diagnostic variables were eliminated (the critical absolute value of the correlation coefficient was set to 0.7), so only the following variables were accepted for further cluster analysis: x3, x9, x11, x12 and x13. We applied Ward’s method once again and obtained the results presented in Table 3.
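The reduction procedure of [8] is not spelled out in the paper. The sketch below therefore assumes one common procedure of this type: repeatedly pick the "central" variable with the largest sum of absolute correlations with the remaining variables, keep it, and discard every variable correlated with it at or above the critical level. The function names are hypothetical; the coefficients are those of Table 2.

```python
NAMES = [f"x{i}" for i in range(1, 16)]

# Lower triangle of Table 2: row i holds the correlations of x(i+1) with x1..x(i+1).
LOWER = [
    [1.00],
    [0.86, 1.00],
    [0.88, 0.91, 1.00],
    [0.84, 0.79, 0.94, 1.00],
    [0.70, 0.65, 0.81, 0.87, 1.00],
    [0.58, 0.57, 0.71, 0.76, 0.87, 1.00],
    [0.82, 0.61, 0.76, 0.78, 0.65, 0.59, 1.00],
    [0.71, 0.83, 0.76, 0.71, 0.58, 0.53, 0.55, 1.00],
    [0.21, 0.26, 0.28, 0.20, 0.30, 0.35, 0.26, 0.24, 1.00],
    [0.20, 0.24, 0.26, 0.22, 0.32, 0.38, 0.29, 0.22, 0.93, 1.00],
    [0.14, 0.22, 0.16, 0.12, 0.15, 0.35, 0.10, 0.29, 0.67, 0.68, 1.00],
    [0.71, 0.61, 0.65, 0.57, 0.36, 0.41, 0.74, 0.57, 0.50, 0.45, 0.37, 1.00],
    [0.70, 0.49, 0.65, 0.70, 0.61, 0.69, 0.84, 0.53, 0.26, 0.27, 0.13, 0.67, 1.00],
    [0.85, 0.96, 0.87, 0.77, 0.61, 0.53, 0.61, 0.88, 0.26, 0.20, 0.18, 0.65, 0.50, 1.00],
    [0.73, 0.78, 0.78, 0.74, 0.60, 0.51, 0.49, 0.57, 0.29, 0.27, 0.18, 0.55, 0.34, 0.80, 1.00],
]

def r(i, j):
    """Correlation between variables with 0-based indices i and j."""
    return LOWER[i][j] if i >= j else LOWER[j][i]

def select_variables(threshold=0.7):
    remaining = list(range(len(NAMES)))
    selected = []
    while remaining:
        # Central variable: largest sum of absolute correlations with the rest.
        central = max(
            remaining,
            key=lambda i: sum(abs(r(i, j)) for j in remaining if j != i),
        )
        selected.append(central)
        # Drop the central variable together with its strongly correlated "satellites".
        remaining = [j for j in remaining
                     if j != central and abs(r(central, j)) < threshold]
    return [NAMES[i] for i in sorted(selected)]

print(select_variables())
```

Under these assumptions the sketch returns x3, x9, x11, x12 and x13, the same five variables as reported above, although this agreement does not confirm that [8] uses exactly this rule.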

Table 3. Clusters obtained by the use of Ward’s method after reducing number of variables

Cluster  Countries (one group per row)
1        Sweden, Denmark
2        Luxembourg, Finland, Germany
2        Netherlands, Belgium, Estonia, Ireland, Austria, Slovenia, Slovakia
3        Poland, Spain, Hungary, Italy, Czech Rep.
3        Portugal, Greece, Lithuania
3        Cyprus, Latvia

Source: own computations

This time, there were three clusters. Their composition was identical to that obtained by the use of the complete linkage method, while the group average method provided slightly different results. In any case, regardless of the method applied, Sweden and Denmark always belonged to the same group of “leaders” with regard to Internet usage, while Cyprus and Latvia were the “losers”. Sweden and Denmark together reported maximal values for 13 of the 15 considered diagnostic variables.

However, Estonia and Slovenia were the “leaders” among the new EU members. These two countries revealed the highest proportion of enterprises with Internet access (above 90%) and the highest share of enterprises having a broadband connection (67% in Estonia and 74% in Slovenia). On the other hand, the situation was worst in Cyprus and Latvia, with values below 50%. Estonia and Slovenia were also the best in the group of new Member States with respect to the percentage of households having Internet access at home.

Unfortunately, in 2005 Poland, together with Hungary and the Czech Republic, belonged to the group of “weak” countries. In Poland, except for two variables, “Share of enterprises using the Internet for interacting with public authorities for obtaining and returning forms” (x10 and x11), all diagnostic variables had values lower than the European Union average.

Generally, our studies confirmed that the highest use of computers and the Internet was observed in Nordic countries, while the lowest rates were reported in the new European Union Member States. In our opinion the main reasons for this situation appear to be the excessive costs of access and/or equipment as well as a lack of the skills needed to use the Internet.

4. Concluding remarks

The new Information and Communication Technologies have made a great impact on all aspects of our society and economy, changing the way in which people do business, learn and spend their leisure time. It is obvious that, applied properly, Information Technology is a source of economic development and enhanced quality of life through increased openness and interchange of information as well as better public and private service. These create important challenges for the Polish government.

The research results presented in the paper show that in general Internet penetration is the highest in Scandinavian countries and tends to be lower in the new EU Member States as well as in the Mediterranean countries. In Poland the use of Information Technologies is insufficient, and our country belongs to the so-called “B-team” with regard to Information Technology.

Although the former government released a draft for a large scale effort on IT entitled “Proposed directions of Information Society development in Poland up to 2020”, what Poland needs most is a clear and stable policy that would be pursued by successive governments regardless of their political orientation. Moreover, the costs of Internet access and of equipment, which are among the highest in Europe, should be reduced. Then, such an overall strategy could hopefully lead Poland into the “A-team” in respect of ICT use.

Bibliography

1. Demunter Ch. (2005): The digital divide in Europe. [in] Statistics in focus. No 38, pp. 1-7.

2. Everitt B. S., Dunn G. (2001): Applied Multivariate Data Analysis. Arnold Publishers, London, pp. 213.

3. Kubiak B.F., Korowicki A. (2006): Computer networks and multimedia techniques in development of virtual organizations and e-commerce. [in] Studies and proceedings of Polish Association for Knowledge Management. No 6. Bydgoszcz, pp. 44-56.

4. Kukuła K. (2000): Zero Unitarization Method. PWN, Warsaw.

5. Łukasik-Makowska B. (2005): Social adaptation of students to new Information and Communication Technologies (ICTs). [in] Models and methods of managing information and knowledge. Wroclaw University of Economics Pub., pp. 160-171 (in Polish).

6. Milligan G.W., Cooper M.C. (1985): An Examination of Procedures for Determining the Number of Clusters in a Data Set. [in] Psychometrika. No 50, pp. 159-179.

7. Timm N. H. (2002): Applied Multivariate Analysis. Springer-Verlag, New York, pp. 445-465.

8. Witkowska D. (2002): Artificial neural networks and statistical methods. C.H. Beck Pub., Warsaw, pp. 71-73 (in Polish).

9. Zelia A. (2002) Some Notes on the Selection of Normalization of Diagnostic Variables. [in] Statistics in Transition. Vol. 5, No 5, pp. 787–802.

HANNA DUDEK

e-mail: hdudek@mors.sggw.waw.pl

MONIKA KRAWIEC

e-mail: mkrawiec@mors.sggw.waw.pl

Szkoła Główna Gospodarstwa Wiejskiego

Wydział Ekonomiczno-Rolniczy

Katedra Ekonometrii i Informatyki
