
The structure of do it yourself activities; an application of nonmetric principal components analysis


OTB WORKING PAPER 92-01

Henny Coolen

Gaston Hilkhuysen


OTB Research Institute for Policy Sciences and Technology
Delft University of Technology
Thijsseweg 11, 2629 JA Delft
Telephone (015) 783005


The OTB Working Papers are published by:
Delft University Press
Stevinweg 1
2628 CN Delft
Telephone: (015) 783254

On behalf of:
OTB Research Institute for Policy Sciences and Technology
Thijsseweg 11
2629 JA Delft

CIP-DATA KONINKLIJKE BIBLIOTHEEK, DEN HAAG

Coolen, Henny

The structure of do it yourself activities; an application of nonmetric principal components analysis / Henny Coolen, Gaston Hilkhuysen. - Delft : Delft University Press. - Ill. - (OTB Working Paper / OTB Research Institute for Policy Sciences and Technology, ISSN 0923-9871 ; 92-01)

ISBN 90-6275-773-1
NUGI 655
Keywords: do it yourself; housing; statistical analysis

No part of this book may be reproduced in any form by print, photoprint, microfilm or any other means without permission from the publisher, Delft University Press, Delft, The Netherlands.

CONTENTS

1 INTRODUCTION

2 DATA

3 NONMETRIC PRINCIPAL COMPONENTS ANALYSIS
3.1 Principal components analysis
3.2 Nonmetric principal components analysis

4 RESULTS OF THE ANALYSIS
4.1 Component loadings
4.2 Fit of the Components
4.3 Clusters of DIY activities
4.4 Optimal transformations
4.5 Relationship with other variables

5 CONCLUSIONS

'Is that it?' said Eeyore.
'Yes,' said Christopher Robin.
'Is that what we were looking for?'
'Yes,' said Pooh.
'Oh!' said Eeyore. 'Well, anyhow - it didn't rain,' he said.

A.A. Milne

1

INTRODUCTION

Researchers often collect data from a number of subjects on a relatively large number of measures, which are thought to be indicators of a smaller number of basic variables, so-called latent variables. Given these data, one then wants to find out whether the observed scores can be described in terms of such a small set of latent variables. This can be done by a technique called principal components analysis. The aim of this technique is the orderly simplification of a number of interrelated measures (Burt, 1940). The terms latent variable and component have the same meaning in this context. When a set of measures has, for some reason, a great deal in common, a latent variable may be said to exist. Whether such a variable 'exists' is not a statistical matter. The researcher decides on its interpretation and has to show its validity and usefulness for research.

Principal components analysis is a well-known technique, but it can only be applied to numerical variables. In many research situations (e.g. surveys) the researcher is confronted with a set of non-numerical (ordinal or nominal) variables and also wants to find out whether this set can be reduced to a much smaller set of latent variables. This can be done by a nonmetric principal components analysis, a technique which is not as well known as its numerical counterpart.

The purpose of this paper is to describe this technique and to show its application to a set of 40 non-numerical measures of Do It Yourself activities. The data to which we apply nonmetric principal components analysis will be described in paragraph 2. A concise description of the technique may be found in paragraph 3. Paragraph 4 contains the results of the application of nonmetric principal components analysis to the measures of Do It Yourself activities. In this paragraph we also try to validate the solution of the analysis and we relate it to other variables not included in the analysis to show its usefulness in further research. Paragraph 5 of the paper contains the conclusions.

We wish to thank our colleague Ans Metselaar for her stimulating comments on a preliminary draft and her valuable suggestions on several aspects of our paper.


2

DATA

Since the autumn of 1982 the OTB Research Institute for Policy Sciences and Technology (OTB) of the University of Delft has evaluated several 'Do It Yourself' (DIY) projects on the Dutch housing market. The data which are analysed in this paper come from the following projects:

ERA flats complex, Zoetermeer; the primary interest of the researchers was the DIY activities within the rental sector. The complex involved 288 housing units built in 1970 with moveable inner walls. In 1983 the current tenants of 104 housing units filled out a questionnaire concerning their abilities to perform DIY activities. The project is reported in Hoenderdos and Metselaar (1985).

Zevenkamp, Rotterdam; in this project 28 new owner-occupiers built their own patio bungalows in timber frame constructions. Afterwards the owners received a questionnaire about their experiences with the DIY activities. The survey was held in 1984 and reported in Priemus, Van Bokhoven and Groetelaers (1989).

Jubileumplan, Huizen; a contractor built the shells of 93 houses. The new tenants completed the shells. During the project they filled out two questionnaires monitoring their preferences for and experiences with DIY activities. The project, which took place in 1985, was reported by Hoenderdos and Metselaar (1990).

Build-It-Yourself, Amsterdam; nineteen new owner-occupiers completed shells built by a contractor. In 1986 they had completed their DIY activities and they received a questionnaire concerning their experiences, activities and abilities.

The projects differed with respect to several characteristics relevant to DIY work. One of these characteristics is the housing sector in which a project takes place. Zoetermeer and Huizen are in the rental sector and Amsterdam and Rotterdam in the owner-occupier sector. Owner-occupiers are known to be more active in DIY work than people who rent a house (Knulst, 1983). It also makes a difference whether the project concerns already built houses (Zoetermeer) or newly built ones (the other projects), since in the latter situation one has more opportunities for DIY activities. Finally, it is worth noting that some of the projects (Huizen, Amsterdam and Rotterdam) were intended as DIY projects. It seems reasonable to assume that these projects attract people who are not afraid of DIY activities. Although the projects differed with respect to these characteristics, for most of the analyses the four sets of respondents were taken together and treated as one dataset. In paragraph 4.5 we shall come back to the differences between the projects.

The four questionnaires contained the same list of 40 DIY activities. The respondents indicated whether they could do the DIY activity themselves, only with the help of someone, or whether they had to leave the DIY activity to a craftsman. The categories were originally coded as 1, 2 and 3 respectively. The DIY activities and the relative frequencies of the responses for the whole dataset are shown in Table 1.

In this paper we shall try to answer several questions with respect to the DIY activities. What are the relationships between the DIY activities? Did each question about a particular DIY activity provide new information about the respondent, or would one question concerning his or her overall skill, or a few questions, have been sufficient?

If it was not the overall skill that determined whether one did an activity without any help or whether one left it to a craftsman, what did determine the responses? One can think of the bulk of the activity, the risk of the activity or the type of material used.

Apart from the responses on the DIY activities, the respondents also differed with respect to age, education, and the project they were involved in. Did these variables distinguish between their responses on the DIY activities?

Before we start answering these questions we shall give a concise description of the technique we used to help us answer them.

Table 1 Relative frequencies (in percent) of responses on DIY activities (N=244)

    DIY activity                               by oneself  with help  by craftsman  no resp.
 1  Replace a central-heating boiler                   21         20            57         2
 2  Install central heating                            21         22            56         1
 3  Install a boiler                                   26         19            54         1
 4  Make a gas-connection                              34         20            46         1
 5  Construct dormer window                            29         26            43         2
 6  Repair flat roof                                   40         17            40         2
 7  Install an outlet                                  49         14            37         0
 8  Repair tiled roof                                  42         15            41         3
 9  Construct an electricity circuit                   34         28            38         0
10  Sweep the chimney                                  44         10            43         3
11  Repair gutter                                      47         15            36         2
12  Connect outside tap                                49         16            33         2
13  Install bathroom equipment                         51         18            30         0
14  Fix insulation under roof                          52         13            32         2
15  Install and connect a kitchen-unit                 51         18            31         1
16  Fix insulation under ground floor                  51         15            33         1
17  Connect a tap                                      53         16            30         0
18  Plaster a wall                                     38         27            34         1
19  Build a wall of sand-lime bricks                   35         29            30         5
20  Hang a door                                        57         18            24         1
21  Build a brick wall                                 34         34            31         1
22  Tile a wall                                        54         18            27         1
23  Tile a floor                                       55         17            28         1
24  Put glass in a window or door                      63         13            23         1
25  Install a plug socket                              72         14            14         0
26  Clean gutter                                       77          4            18         1
27  Lay a parquet floor                                44         24            29         2
28  Construct a wall of wood and plasterboard          67         13            20         0
29  Construct a lower ceiling                          57         20            22         1
30  Repair a lock                                      67         11            22         0
31  Put a lock in door                                 71         11            18         0
32  Carpeting                                          68         14            17         1
33  Wainscot a wall                                    75         11            13         1
34  Pave garden path                                   81          8            10         1
35  Change tapwasher                                   89          6             6         0
36  Exterior painting                                  85          6             9         0
37  Hang lamps                                         92          4             4         0
38  Wallpapering                                       81         12             7         0
39  Interior painting                                  91          5             4         0
40  Whitewashing                                       89          6             5         0

3

NONMETRIC PRINCIPAL COMPONENTS ANALYSIS

Principal components analysis (PCA) is a multivariate technique for analyzing the relationships within a set of numerical variables. It is mainly used for two purposes: to identify groups of inter-correlated variables, viz. to find the structure within a set of variables, and to reduce the number of variables being studied. These purposes stem from the fact that phenomena under study are often observed through multiple indicators. One then wants to find out whether the variables are associated as expected and whether the set of variables under study can be reduced to a much smaller set which can be used for further analysis.

Hotelling (1933) gave the first systematic account of principal components analysis as a data analytic technique by using homogeneity ideas as his starting point. He looked for the linear combination of the observed variables with the largest variance, which is the same as the linear composite with maximum sum of squared correlations with the variables. This linear combination is called the principal component. Second and consecutive best solutions, which are required to be orthogonal to the already computed solutions, resulted in additional components.

Eckart and Young (1936) introduced a natural way to define principal components analysis for p components, i.e. to formulate optimality properties which hold for the first p components simultaneously. They used the least squares properties of the singular value decomposition to obtain a simultaneous solution in p components. Although this is a more natural way to introduce multidimensional solutions, it results in the same solution as a successive analysis (Rao, 1964). In the next paragraph we shall describe this simultaneous solution.

3.1 Principal components analysis

Assume we have $n$ observations on $m$ standardized variables $h_j$, $j = 1, \dots, m$, which are collected in the $n \times m$ matrix $H$. The typical element $h_{ij}$ of $H$ contains the standardized score of observation $i$ on variable $j$. In principal components analysis

(15)
(16)

component loadings and they are collected in the matrix $A$. In figure 1, $\cos(\alpha)$ represents the component loading of variable $j$ on component 1 and $\cos(90^\circ - \alpha)$ gives the loading of this variable on the second component.

Formally the PCA solution can be found by minimizing the least squares loss function

$$\sigma(X;A) = \operatorname{tr}\,(H - XA^T)^T (H - XA^T) \qquad (1)$$

where $H$, $A$ and $X$ are defined as before and where tr means trace. Eckart and Young (1936) showed that the solution for optimal $X$ and $A$ can be found by computing the singular value decomposition of $H$ and by retaining the largest $p$ singular values with corresponding singular vectors. Let $H = K \Lambda L^T$ be the $p$-truncated singular value decomposition of $H$, with $K$ an $n \times p$ matrix of left singular vectors and $K^T K = I$, $L$ an $m \times p$ matrix of right singular vectors and $L^T L = I$, and $\Lambda$ a $p \times p$ diagonal matrix containing the $p$ largest singular values. Now $X = K$ and $A = L\Lambda$ are the solutions to the PCA problem for the component scores $X$ and the component loadings $A$.
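The singular value solution described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data are made up, the helper names `standardize`, `pca_svd` and `loss` are hypothetical, and "standardized" is taken to mean centered columns of unit length.

```python
import numpy as np


def standardize(data):
    """Center each column and scale it to unit length (assumed meaning of 'standardized')."""
    centered = data - data.mean(axis=0)
    return centered / np.linalg.norm(centered, axis=0)


def pca_svd(H, p):
    """Component scores X (n x p) and loadings A (m x p) from the p-truncated SVD of H."""
    K, singular_values, Lt = np.linalg.svd(H, full_matrices=False)
    X = K[:, :p]                         # X = K
    A = Lt[:p].T * singular_values[:p]   # A = L Lambda
    return X, A


def loss(H, X, A):
    """Least squares loss: tr (H - X A^T)^T (H - X A^T)."""
    residual = H - X @ A.T
    return np.trace(residual.T @ residual)


# Made-up correlated data, purely for illustration.
rng = np.random.default_rng(0)
raw = rng.normal(size=(50, 6)) @ rng.normal(size=(6, 6))
H = standardize(raw)
X, A = pca_svd(H, p=2)
print(loss(H, X, A))
```

With this normalization the loadings in `A` coincide with the correlations `H.T @ X` between the variables and the components, and the loss equals the sum of the squared singular values that were discarded.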

Textbooks on multivariate analysis often consider PCA as a technique to analyse and decompose the correlation matrix of a set of variables. If $H$ is standardized then $H^T H = R$ is the correlation matrix. Now $R = H^T H = (K \Lambda L^T)^T (K \Lambda L^T) = L \Lambda^2 L^T$, which is called the eigenvalue decomposition of $R$. So taking the largest $p$ singular values of $H$ is the same as taking the largest $p$ eigenvalues of $R$. The solutions for the component scores $X$ and the component loadings $A$ are now given by $A = L\Lambda$ and $X = HL\Lambda^{-1}$ (Green, 1976).
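The route through the correlation matrix can be checked numerically. The sketch below is an illustration on made-up data, again assuming centered columns of unit length. It rescales the product of $H$ and the eigenvectors by the inverse singular values so that the scores have unit length, matching $X = K$ from the singular value decomposition.

```python
import numpy as np

# Made-up data, purely for illustration.
rng = np.random.default_rng(1)
raw = rng.normal(size=(40, 5)) @ rng.normal(size=(5, 5))
centered = raw - raw.mean(axis=0)
H = centered / np.linalg.norm(centered, axis=0)    # centered, unit-length columns

R = H.T @ H                                        # correlation matrix
eigenvalues, eigenvectors = np.linalg.eigh(R)      # eigh returns ascending order
order = np.argsort(eigenvalues)[::-1]              # reorder to descending
eigenvalues, L = eigenvalues[order], eigenvectors[:, order]

p = 2
Lam = np.sqrt(eigenvalues[:p])                     # singular values of H
A = L[:, :p] * Lam                                 # loadings: A = L Lambda
X = H @ L[:, :p] / Lam                             # scores:   X = H L Lambda^{-1}
```

The square roots of the largest eigenvalues of `R` agree with the largest singular values of `H`, which is the equivalence stated in the text.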

3.2 Nonmetric principal components analysis

The exposition of PCA so far applies to numerical variables only, because the computation of correlations is somewhat problematical when some or all of the variables are non-numerical. In order to be able to use non-numerical variables in PCA one will have to code them numerically. The correlations between the variables and consequently the PCA solution, which is based on these correlations (see above), will now both depend on the particular codings chosen. This means that different codings of the variables will generally lead to different PCA solutions. In nonmetric principal components analysis (NMPCA), as outlined here, one is looking for codings of non-numerical variables which are optimal in a well-defined sense.

Non-numerical variables in research are generally of two types: ordinal or nominal. An ordinal variable consists of a set of ordered classes. An example of such a variable is level of education with categories low, middle low, middle high and high. Nominal variables are classifications made up of just a set of equivalence classes. The variable type of housing tenure with categories private renter,

Figure 2 An example of an identity (left) and a monotone (right) transformation (horizontal axes: original codes; vertical axes: transformed scores)

and ordinal variables will be made suitable for PCA by optimally scaling these variables. The general idea behind optimal scaling is to scale the variables in a way that optimizes an objective criterion, which in the case of PCA would be an appropriate version of loss function (1). The terms scaling, coding and transformation are used interchangeably in this paper. A scaling of a variable is a real-valued function defined on its values. We shall use the notation $s_j: h_j \to \mathbb{R}$. The type of scaling that is employed will be determined by what we know or assume about a variable, i.e. which measurement level (numerical, ordinal or nominal) we associate with a variable.

For nominal variables the transformation of such a variable is required to maintain the equivalence structure of the original values. Such a transformation is called an identity transformation. Let '$\sim$' be the relation 'has the same value as', then we can express this restriction as

$$h_{ij} \sim h_{kj} \;\Rightarrow\; s_j(h_{ij}) = s_j(h_{kj}) \qquad (2)$$

which means that observations in the same class remain in the same class after the transformation.

For ordinal variables we require in addition that the transformation be monotone with the order of the original values. If '$\prec$' denotes the empirical order relation, the additional constraint for ordinal variables becomes

$$h_{ij} \prec h_{kj} \;\Rightarrow\; s_j(h_{ij}) \le s_j(h_{kj}) \qquad (3)$$

Both types of transformations are illustrated in figure 2. A more elaborate treatment of measurement levels and optimal scaling can be found in Young (1981) and Gifi (1990).
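The two kinds of restriction can be made concrete with a small check. The functions below are a hypothetical illustration, not part of any program mentioned in this paper. They test whether a proposed scaling keeps observations of the same class together (the restriction for nominal variables) and whether it additionally respects the order of the original codes (the restriction for ordinal variables).

```python
def respects_identity(codes, scaled):
    """Observations with the same original code must receive the same scaled value."""
    seen = {}
    for code, value in zip(codes, scaled):
        if seen.setdefault(code, value) != value:
            return False
    return True


def respects_order(codes, scaled):
    """Identity restriction, plus: a higher original code never gets a lower scaled value."""
    if not respects_identity(codes, scaled):
        return False
    by_code = sorted(dict(zip(codes, scaled)).items())
    return all(low[1] <= high[1] for low, high in zip(by_code, by_code[1:]))


# Made-up example with the three response codes 1, 2 and 3.
codes = [1, 2, 3, 2, 1]
identity_only = [0.5, -1.0, 0.2, -1.0, 0.5]   # classes kept together, order not kept
monotone = [-1.0, -0.8, 1.5, -0.8, -1.0]      # classes kept together and order kept
```

Here `identity_only` would be admissible for a nominal variable but not for an ordinal one, while `monotone` is admissible for both.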

We can now outline the NMPCA problem that has to be solved. Let $Y$ be the $n \times m$ matrix with optimally scaled observations, i.e. $y_j = s_j(h_j)$ $(j = 1, \dots, m)$. In NMPCA one now has to minimize the loss function

$$\sigma(Y;X;A) = \operatorname{tr}\,(Y - XA^T)^T (Y - XA^T) \qquad (4)$$

under the conditions

$$u^T y_j = 0, \qquad y_j^T y_j = 1, \qquad y_j \in C_j \qquad \text{for } j = 1, \dots, m \qquad (5)$$

Here $u$ is an $n$-vector with only ones. The first two conditions guarantee that the variables are standardized. The notation $y_j \in C_j$ indicates that there may be scaling restrictions for variable $j$.

So in the approach to NMPCA we advocate here not only the component scores X and the component loadings A but also the scalings of the variables Y are parameters that have to be estimated.

Minimization of (4) can be accomplished by using an iterative algorithm based on the principle of alternating least squares (ALS) (Young, 1981; Gifi, 1990). This principle involves the partitioning of the set of parameters into subsets. The ALS algorithm then minimizes the loss function by alternatingly optimizing it with respect to one of the subsets. At each stage of the algorithm this gives the conditional least squares estimates of one of the subsets, while keeping all the other parameters fixed at their current value. Once the conditional least squares estimates of a subset are obtained the old estimates of these parameters are replaced by the new ones. The algorithm then switches to another set of parameters and the process is repeated. By cycling through the sets of parameters in this way we obtain a convergent algorithm. For the minimization of loss function (4) each iteration would consist of two steps. The conditional least squares estimates of the component scores X and the component loadings A for fixed scalings of the variables Y can be computed, as in ordinary PCA, by means of a singular value decomposition of Y. Minimization of (4) over the scaling parameters Y for fixed X and A amounts to solving a normalized cone regression problem (Gifi, 1990).
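The two alternating steps can be sketched as follows for ordinal variables. This is a minimal illustration under stated assumptions and not the PRINCALS implementation. The function names are hypothetical, the data are simulated, a fixed number of iterations replaces a convergence test, and the scaling step is taken to be a weighted monotone regression on the category means (pool adjacent violators) followed by centering and normalization.

```python
import numpy as np


def monotone_fit(values, weights):
    """Weighted least squares nondecreasing fit via pool-adjacent-violators."""
    blocks = []  # each block holds [mean, weight, number of pooled categories]
    for value, weight in zip(values, weights):
        blocks.append([float(value), float(weight), 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, c2 = blocks.pop()
            mean1, w1, c1 = blocks.pop()
            pooled = (mean1 * w1 + mean2 * w2) / (w1 + w2)
            blocks.append([pooled, w1 + w2, c1 + c2])
    return np.concatenate([np.full(count, mean) for mean, _, count in blocks])


def scale_ordinal(codes, target):
    """Monotone quantification of one variable closest to target, centered and normalized."""
    categories = np.unique(codes)                       # sorted category codes
    counts = np.array([np.sum(codes == c) for c in categories])
    means = np.array([target[codes == c].mean() for c in categories])
    quantified = monotone_fit(means, counts)[np.searchsorted(categories, codes)]
    quantified = quantified - quantified.mean()
    length = np.linalg.norm(quantified)
    return None if length < 1e-12 else quantified / length


def nmpca(codes, p=2, iterations=30):
    """Alternate between (X, A) for fixed Y and Y for fixed (X, A)."""
    Y = codes - codes.mean(axis=0)
    Y = Y / np.linalg.norm(Y, axis=0)                   # start from the original codes
    losses = []
    for _ in range(iterations):
        # Step 1: scores and loadings for fixed scalings, as in ordinary PCA.
        K, s, Lt = np.linalg.svd(Y, full_matrices=False)
        X, A = K[:, :p], Lt[:p].T * s[:p]
        # Step 2: rescale every variable for fixed scores and loadings.
        target = X @ A.T
        for j in range(codes.shape[1]):
            y = scale_ordinal(codes[:, j], target[:, j])
            if y is not None:                           # keep old scaling if degenerate
                Y[:, j] = y
        losses.append(float(np.sum((Y - target) ** 2)))
    return Y, X, A, losses


# Simulated three-category responses driven by one latent skill (illustration only).
rng = np.random.default_rng(2)
skill = rng.normal(size=200)
codes = np.column_stack([
    np.digitize(skill + rng.normal(scale=0.7, size=200), [-0.5 + 0.1 * j, 0.6 + 0.1 * j]) + 1
    for j in range(8)
])
Y, X, A, losses = nmpca(codes)
```

Because each step can only lower the loss for the parameters it updates, the recorded values in `losses` do not increase from one iteration to the next.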

The computer program that we used for the computation of the NMPCA solutions that are presented in this paper is called PRINCALS (Gifi, 1985). For computational reasons (Gifi, 1990) it minimizes the loss function

$$\sigma(Y;X;A) = \operatorname{tr}\,(X - YA)^T (X - YA) \qquad (6)$$

It can be shown (Gifi, 1990) that minimizing (6) is equivalent to minimizing loss function (4).

4

RESULTS OF THE ANALYSIS

The columns of the data matrix represent the 40 DIY activities; the respondents form the 244 rows. In the analysis each DIY activity was treated as an ordinal variable. This means that during the process of optimal scaling the order of the categories of each variable is fixed, viz. doing the activity oneself, doing the activity with the help of someone or leaving it to a craftsman. In contrast to a classical PCA, the category quantifications were allowed to vary as long as their order remained fixed. Hence only monotone transformations were allowed. In our analysis a two-dimensional solution was computed, i.e. the first two principal components were extracted.

4.1 Component loadings

Table 2 shows the component loadings and the fit of the solution. The second and third columns show the component loadings of the variables, i.e. the correlations of the optimally scaled DIY activities with the principal components. As usual the squares of these correlations indicate the proportion of explained variance. They are displayed under the heading 'fit' (columns 4 and 5).

The first component explained more than 40 percent of the variance for all DIY activities except for 'install a plug socket', 'carpeting', 'change tapwasher', 'exterior painting', 'hang lamps', 'wallpapering', 'interior painting' and 'whitewashing'. On the second component the following DIY activities had a fit larger than 10 percent: 'replace a central-heating boiler', 'install central heating', 'carpeting', 'wainscot a wall', 'pave garden path', 'change tapwasher', 'exterior painting', 'hang lamps', 'wallpapering', 'interior painting' and 'whitewashing'.

Notice that the DIY activities with a relatively low fit on the first component have a relatively high fit on the second component (i.e. 'carpeting', 'change tapwasher', 'exterior painting', 'hang lamps', 'wallpapering', 'interior painting' and 'whitewashing'). The first component accounted for the highest possible amount of common variance. The second component accounted for the highest amount of common variance unexplained by the first component. DIY activities that were badly represented on the first component, but had a large amount of variance in common are represented on the second component.

Table 2 Component loadings and fits of DIY activities on first and second component, ordered on the component loadings of the second component

    DIY activity                                 loading 1  loading 2      fit 1      fit 2
 1  Replace a central-heating boiler                 -.691       .346       .477       .120
 2  Install central heating                          -.711       .333       .506       .111
 3  Install a boiler                                 -.739       .312       .546       .097
 4  Make a gas-connection                            -.742       .290       .551       .084
 5  Construct dormer window                          -.804       .246       .646       .061
 6  Repair flat roof                                 -.806       .228       .650       .052
 7  Install an outlet                                -.849       .210       .721       .044
 8  Repair tiled roof                                -.747       .198       .558       .039
 9  Construct an electricity circuit                 -.688       .192       .473       .037
10  Sweep the chimney                                -.707       .186       .500       .035
11  Repair gutter                                    -.810       .186       .656       .035
12  Connect outside tap                              -.853       .165       .728       .027
13  Install bathroom equipment                       -.840       .135       .706       .018
14  Fix insulation under roof                        -.824       .131       .679       .017
15  Install and connect a kitchen-unit               -.866       .120       .750       .014
16  Fix insulation under ground floor                -.818       .111       .669       .012
17  Connect a tap                                    -.794       .099       .630       .010
18  Plaster a wall                                   -.705       .093       .497       .009
19  Build a wall of sand-lime bricks                 -.794       .093       .630       .009
20  Hang a door                                      -.742       .053       .551       .003
21  Build a brick wall                               -.729       .043       .531       .002
22  Tile a wall                                      -.771       .012       .594       .000
23  Tile a floor                                     -.760      -.045       .578       .002
24  Put glass in a window or door                    -.818      -.049       .669       .002
25  Install a plug socket                            -.596      -.059       .355       .003
26  Clean gutter                                     -.657      -.064       .432       .004
27  Lay a parquet floor                              -.655      -.066       .429       .004
28  Construct a wall of wood and plasterboard        -.825      -.091       .681       .008
29  Construct a lower ceiling                        -.825      -.105       .681       .011
30  Repair a lock                                    -.706      -.118       .498       .014
31  Put a lock in door                               -.723      -.139       .523       .019
32  Carpeting                                        -.554      -.352       .307       .124
33  Wainscot a wall                                  -.756      -.352       .572       .124
34  Pave garden path                                 -.638      -.424       .407       .180
35  Change tapwasher                                 -.504      -.450       .254       .203
36  Exterior painting                                -.579      -.484       .335       .234
37  Hang lamps                                       -.456      -.542       .208       .294
38  Wallpapering                                     -.387      -.596       .150       .355
39  Interior painting                                -.454      -.641       .206       .411
40  Whitewashing                                     -.507      -.693       .257       .480
    Eigenvalue                                                              .520       .083

Figure 3 Plot of the component loadings. DIY activities are represented as vectors. The component loadings determine their directions. Numbers correspond with the numbers of the DIY activities in Table 1.

Figure 3 shows the DIY activities as vectors in the plane defined by the two principal components which form the reference axes. The component loadings determine the direction of the vectors. The plot has some interesting characteristics. For each DIY activity the sum of its fits over the components equals the squared length of its vector, which therefore signifies the proportion of variance in the DIY activity that the two components account for. A DIY activity that is well represented in the solution has a long vector, whereas a short vector indicates that the components scarcely represent the corresponding DIY activity. So 'whitewashing' (40) is better represented in the two-dimensional solution than for instance 'install a plug socket' (25).
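The relation between loadings, fits and vector length can be retraced for the two activities just mentioned. The loadings are copied from Table 2; the helper names are hypothetical and serve only this illustration.

```python
import math

# Component loadings on components 1 and 2, copied from Table 2.
loadings = {
    "whitewashing (40)": (-0.507, -0.693),
    "install a plug socket (25)": (-0.596, -0.059),
}


def fits(loading_pair):
    """Fit per component: the squared loading, rounded to three decimals."""
    return tuple(round(a * a, 3) for a in loading_pair)


def vector_length(loading_pair):
    """Length of the activity's vector in the plane of the two components."""
    return math.hypot(*loading_pair)


for name, pair in loadings.items():
    total_fit = sum(a * a for a in pair)      # equals the squared vector length
    print(name, fits(pair), round(total_fit, 3), round(vector_length(pair), 3))
```

The total fit of 'whitewashing' is roughly twice that of 'install a plug socket', which is why its vector in the plot is clearly longer.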

Figure 4 Plot of the component scores. The plot shows the respondents, represented as plus signs, relative to the two components.

4.2 Fit of the Components

The mean fit over the variables within a component, called the eigenvalue, shows the significance of a component. It indicates the proportion of the total variance in all the DIY activities that a component accounts for. Because each component explains the largest amount of variance unaccounted for by previous components, the size of the eigenvalues decreases for each new component. The bottom row of Table 2 shows the eigenvalues of the two components. The first one has an eigenvalue of .52, indicating that the 40 questions concerning the respondents' DIY activities had a large amount of variance in common. The eigenvalue for the second component is .08. Hence the solution with two components accounted for 60% of the total variance in the DIY activities, of which almost 90% was accounted for by the first principal component. This clear dominance of the first component suggests that one component may be the best representation for the 40 DIY activities. Apart from this first component the activities did not have much variance in common. This idea of one-componental data was further confirmed when we plotted the respondents relative to the two components, using their component scores. As can be seen in figure 4 they formed approximately a so-called 'horse shoe', which is also an indication of one-dimensionality of the data (Van Rijckevorsel, 1987).

Table 3 Category coordinates on the first component

DIY activity                                 by oneself  with help  by craftsman
Replace a central-heating boiler                   .958       .643        -0.589
Install central heating                            .942        .6n        -0.625
Install a boiler                                   .909       .644        -0.682
Make a gas-connection                              .808       .436        -0.796
Construct dormer window                            .891       .526        -0.899
Repair flat roof                                   .778       .447        -0.956
Install an outlet                                  .762       .168        -1.085
Repair tiled roof                                  .712       .423        -0.879
Construct an electricity circuit                   .711         .m        -0.854
Sweep the chimney                                  .677       .489        -0.799
Repair gutter                                      .753       .186        -1.029
Connect outside tap                                .766      -.003        -1.155
Install bathroom equipment                         .714      -.003        -1.210
Fix insulation under roof                           .n7       .028        -1.109
Install and connect a kitchen-unit                 .766      -.073        -1.217
Fix insulation under ground floor                  .708       .035        -1.340
Connect a tap                                      .671      -.071        -1.134
Plaster a wall                                     .676       .226        -0.952
Build a wall of sand-lime bricks                   .808       .168        -1.138
Hang a door                                        .544      -.029        -1.256
Build a brick wall                                 .724       .196        -1.047
Tile a wall                                        .633      -.195        -1.177
Tile a floor                                       .630      -.216        -1.134
Put glass in a window or door                      .595      -.548        -1.318
Install a plug socket                              .340      -.384        -1.352
Clean gutter                                       .348      -.364        -1.353
Lay a parquet floor                                .598       .047        -0.961
Construct a wall of wood and plasterboard          .557      -.666        -1.440
Construct a lower ceiling                          .635      -.248        -1.418
Repair a lock                                      .470      -.513        -1.218
Put a lock in door                                 .433      -.488        -1.428
Carpeting                                          .348      -.341        -1.123
Wainscot a wall                                    .409      -.763        -1.733
Pave garden path                                   .303      -.822        -1.659
Change tapwasher                                   .175      -.984        -1.745
Exterior painting                                  .225      -.535        -1.745
Hang lamps                                         .129     -1.275        -1.868
Wallpapering                                       .163      -.349        -1.287
Interior painting                                  .126      -.588        -2.052
Whitewashing                                       .162      -.611        -2.008
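The statement that an eigenvalue is the mean fit over the variables can be retraced from the fit columns of Table 2. The lists below are copied from that table in the order of the 40 activities. Two first-component and second-component entries that are damaged in the scan (.477 for activity 1 and .097 for activity 3) are taken here as the squares of the corresponding loadings, which is an assumption.

```python
# Fits on the first and second component, activities 1 to 40 (Table 2).
fit_1 = [
    .477, .506, .546, .551, .646, .650, .721, .558, .473, .500,
    .656, .728, .706, .679, .750, .669, .630, .497, .630, .551,
    .531, .594, .578, .669, .355, .432, .429, .681, .681, .498,
    .523, .307, .572, .407, .254, .335, .208, .150, .206, .257,
]
fit_2 = [
    .120, .111, .097, .084, .061, .052, .044, .039, .037, .035,
    .035, .027, .018, .017, .014, .012, .010, .009, .009, .003,
    .002, .000, .002, .002, .003, .004, .004, .008, .011, .014,
    .019, .124, .124, .180, .203, .234, .294, .355, .411, .480,
]

eigenvalue_1 = sum(fit_1) / len(fit_1)     # mean fit on component 1
eigenvalue_2 = sum(fit_2) / len(fit_2)     # mean fit on component 2
total = eigenvalue_1 + eigenvalue_2        # variance accounted for by both components
share_first = eigenvalue_1 / total         # part of that due to the first component

print(round(eigenvalue_1, 3), round(eigenvalue_2, 3))
print(f"{total:.1%} in total, {share_first:.1%} of it from the first component")
```

The rounded means reproduce the eigenvalues in the bottom row of Table 2, and the share of the first component comes out a little below the 'almost 90%' quoted in the text.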

Table 3 aided the interpretation of the first component. It shows the category coordinates of the DIY activities, which are the mean component scores within a certain category of a DIY activity, on the first component. The category 'leave the activity to a craftsman' had a negative category score for all the variables, while 'can do activity oneself' showed positive category scores. Therefore the first component seemed to represent the overall subjective skill of the respondent. This interpretation was further confirmed when we found a high positive correlation of .95 between the total number of activities a respondent claimed he or she could do without any help and the component scores on the first component. The number of activities a respondent claimed he or she had to leave to a craftsman had a high negative correlation of -.98 with the component scores of the first component.

4.3 Clusters of DIY activities

Although the second component had a low eigenvalue, and hence was of little significance, it allowed for an interesting additional interpretation. Figure 3 clearly shows two distinct clusters of DIY activities. Nine DIY activities form the lower cluster: 'whitewashing', 'interior painting', 'wallpapering', 'hang lamps', 'exterior painting', 'change tapwasher', 'pave garden path', 'wainscot a wall' and 'carpeting'. These DIY activities concern minor jobs, with little risk either to the performer or of failure. Most of them are performed when one moves to another house. The large contrast between the upper and lower cluster could possibly mask a subclustering of activities within the large upper cluster. We see in figure 5 what happens if we zoom in on the upper cluster. It shows that the upper cluster contains several subclusters.

The cluster at the bottom of figure 5 consists of nine DIY activities: 'put a lock in door', 'repair a lock', 'construct a lower ceiling', 'construct a wall of wood and plasterboard', 'lay a parquet floor', 'clean gutter', 'install a plug socket', 'put glass in a window or door' and 'tile a floor'. These DIY activities concern jobs with risk for loss of material in case of improper performance. There is no risk for the performer. Four of the jobs concerned carpentry. On top of this cluster there are three DIY activities: 'tile a wall', 'build a brick wall' and 'hang a door', which do not form a separate cluster. They are intermediate types between the masonry and carpentry jobs in the adjacent clusters.

The second highest cluster contains fifteen DIY activities. From bottom to top these are: 'build a wall of sand-lime bricks', 'plaster a wall', 'connect a tap', 'fix insulation under ground floor', 'install and connect a kitchen-unit', 'fix insulation under roof', 'install bathroom equipment', 'connect outside tap', 'repair gutter', 'sweep the chimney', 'construct an electricity circuit', 'repair tiled roof', 'install an outlet', 'repair flat roof', and 'construct dormer window'. These jobs have a much higher risk than the ones mentioned in the previous clusters. Incorrect performance can lead to damage of the house, due to leakage, fire or the falling of heavy materials. Nine of the jobs concern plumbing.

Figure 5 Plot of the component loadings, zoomed in. The second component is expanded to make the clusters better visible. The moving activities are omitted.

'Make a gas-connection', 'install a boi-ler', 'install centra! heating' and 'replace

a central-heating boiler' form the top cluster. All these jobs concern gasfitting and have a high risk for the house as weIl as for the performer: mistakes can lead to explosions.

The distinction between the lowest cluster with the moving DIY activities and the others is clear. The distinction between the remaining clusters is more complicated. The types of the activities, such as carpentry, plumbing, masonry and electric installation, are distributed over the clusters, but each cluster shows a dominating type: carpentry for the lowest cluster, plumbing for the cluster in the middle and gasfitting for the upper cluster in figure 5. Hence the DIY activities are clustered on activity type. The order of the DIY activities along the second component seems to represent the risk of the activity: activities without risk have a negative or low positive component loading, and risky activities have high positive component loadings.

Although the second component allows for an interesting interpretation, one has to bear in mind that it accounts for just 8% of the variance in the variables. Its use should therefore depend on the researcher's goals. 'Subjective skill' might be too poor an interpretation of the data to satisfy these goals. But, while using the second component, one should remember the dominantly one-componental character of the data.

(27)

4.4 Optimal transformations

In the NMPCA all the DIY activities were treated as ordinal variables, so that monotone transformations of the original codings were allowed. Figure 6 shows what the optimal transformations look like for a selection of the DIY activities. For the other activities the transformations are similar to the ones shown. The optimal transformations are of two types. For 'whitewashing', 'construct a lower ceiling' and several other activities, which are not shown, the optimal transformations are linear or nearly linear. This means that as far as these activities are concerned a PCA and a NMPCA would result in almost the same solution. This is not the case for other DIY activities, of which 'build a brick wall', 'install central heating', 'make a gas-connection' and 'repair flat roof' are good examples. The optimal transformations of these activities are clearly nonlinear. This is especially the case for 'install central heating'. For this activity the optimal transformation even tends towards a dichotomy: leave the activity to a craftsman versus do the activity oneself with or without the help of someone else.

The categories 1 (do the activity oneself) and 2 (do the activity only with the help of someone else) have transformed scores that are very close, while category 3 (leave the activity to a craftsman) has a distant transformed score. Although all the nonlinear transformations show this tendency, some do so to a lesser extent than others, as can be seen in figure 6. Judging from the optimal transformations, we think it was appropriate to perform a nonmetric analysis on these data.
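
How an ordinal restriction can produce transformations that tend towards a dichotomy may be illustrated with a small sketch. It assumes, which the text does not state explicitly, that each category first receives an unrestricted quantification and that these values are then made monotone by weighted monotone regression (pool-adjacent-violators). The function name `monotone_regression`, the quantifications and the frequencies are hypothetical and are not taken from the DIY data.

```python
def monotone_regression(values, weights):
    """Weighted least-squares fit of a non-decreasing sequence
    (pool-adjacent-violators); a sketch, not the PRINCALS code."""
    # Each block holds [weighted mean, total weight, number of pooled categories].
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Pool neighbouring blocks for as long as the order is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand the pooled blocks back to one value per category.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical unrestricted quantifications for the codes 1, 2 and 3,
# with hypothetical category frequencies as weights.
raw = [-0.60, -0.70, 1.20]
freq = [50, 30, 40]
transformed = monotone_regression(raw, freq)
```

With these made-up numbers the first two categories violate the order and are pooled into one tied value, while the third category stays far away: the kind of near-dichotomy described above for 'install central heating'.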

4.5 Relationship with other variables

So far our analysis has focused on the evaluation and interpretation of the two-dimensional NMPCA solution. In this section we want to find out whether this solution and our interpretation of it are valid and useful. One way to find out is to relate the solution to other variables, which were of course not included in the NMPCA but which were expected to be related to overall subjective skill. We did so by first computing the mean component scores for every category of the variables we wanted to relate to the NMPCA solution and then depicting these categories as points in the Euclidean space defined by the two components. Figure 7 contains these points and shows the relationships between the NMPCA solution and the variables 'subjective skill', 'project respondent was in', 'age' and 'education'.
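
The computation described here, averaging the component scores within each category of a background variable, can be sketched as follows. The function name `category_means`, the category labels and the scores are hypothetical illustrations and not the survey data.

```python
from collections import defaultdict


def category_means(categories, scores):
    """Return {category: (mean score on component 1, mean score on component 2)}."""
    # Running sums of both component scores and a respondent count per category.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for cat, (s1, s2) in zip(categories, scores):
        acc = sums[cat]
        acc[0] += s1
        acc[1] += s2
        acc[2] += 1
    return {cat: (a / n, b / n) for cat, (a, b, n) in sums.items()}


# Hypothetical respondents: a category of a background variable
# and the respondent's scores on the two components.
skill = ["layman", "layman", "craftsman", "craftsman"]
scores = [(-1.0, 0.2), (-0.5, -0.2), (1.0, 0.1), (1.5, 0.3)]
points = category_means(skill, scores)
```

Each resulting pair of means can then be plotted as one point in the plane of the two components.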

What struck us immediately in this figure was the order of the categories of the variable 'subjective skill', which is an independently measured indicator for this characteristic, along the first component. The category perfect layman was located at the far left, occasional DIY activist just to the left of the origin, experienced DIY activist just to the right of the origin and craftsman at the far right. This relationship further confirmed our interpretation of the first component as representing the overall subjective skill of the respondent.

Figure 7 also shows the relation with the variable 'project respondent was in'. Zoetermeer was located on the left of the figure, Huizen was placed near the origin, while Rotterdam (R'dam) and Amsterdam (A'dam) were located on the right side. Zoetermeer concerned a project in the rental sector, where the respondents already occupied their apartments, which contained moveable inner walls. The Huizen project is also in the rental sector, but here it concerns newly built houses, which were to be finished by the new tenants. As was to be expected, see section 2, this project attracted people with a higher overall subjective skill than the Zoetermeer project. The projects in Amsterdam and Rotterdam were in the owner-occupier sector and concerned newly built shells which were to be completed by the new owner-occupier. The difference between the latter and the former projects is not only the housing sector in which the projects took place, but also the fact that the Amsterdam and Rotterdam projects offered more opportunities for DIY work. As was predicted in section 2, this attracted people with an even higher overall subjective skill than the former projects did. It is clear from inspecting figure 7 that the differences between the projects are represented very well, and in a predictable way, in the NMPCA solution.

(28)

Figure 6 Optimal transformations of selected variables

Panels: Whitewashing; Build a brick wall; Make a gas-connection; Construct a lower ceiling; Install central heating; Repair flat roof. Horizontal axis: original codes.

(29)

The variable 'age' showed its temporal order on the first component. The youngest group was located on the right of the figure, while each older group was located to the left of the younger groups. Hence the overall subjective skill decreases with increasing age. This is as one might expect. Generally speaking, younger people are more active and physically better able to perform DIY activities.

The variable 'education' represents the respondent's highest education with diploma. It shows an interesting order along the first component. The category none or grammar school was located at the far left side of the plot, while HAVO and MAVO lay closer to the origin. To the right of the origin we found MBO, LBO and HBO/University. Hence respondents with no education or a general education (MAVO, HAVO) were situated to the left of the origin, indicating low overall subjective skill. Respondents who received professional (MBO, LBO and HBO) or university education generally claimed a higher subjective skill, hence they are situated to the right of the origin. Although the fact that university education correlates with higher subjective skill may seem puzzling, this result has been found before. Knulst (1983) has shown that since the sixties DIY work has for several reasons become popular among the higher educated, while at the same time the skill for doing DIY activities became a status symbol for this group.

The findings described in this section not only make our interpretation of the principal component plausible, they also show that it is a useful variable in research. Finally, we would like to point out another interesting feature of figure 7. As can clearly be seen, the variables we related to the NMPCA solution vary mainly along the first component. Although there is some variation in these variables along the second, it is much less. This is another indication of the one-componental character of the data.

(30)

Figure 7 Mean component scores per category of selected variables

Single digits represent subjective skill: 1 = perfect layman, 2 = occasional DIY activist, 3 = experienced DIY activist, 4 = craftsman.

(31)
(32)

5 CONCLUSIONS

The nonmetric principal components analysis of the 40 DIY activities resulted in a dominantly one-dimensional solution. Both the eigenvalues and the component scores pointed in this direction. The projections of the categories of the DIY activities on the principal component showed a clear pattern, which indicated that it measures the overall subjective skill of the respondents. This interpretation was validated by correlating the first component with two different measures of subjective skill and by relating it to an independently measured indicator of subjective skill. On the second component the analysis groups the DIY activities in four more or less distinct clusters which represent the amount of risk involved in the activities. Each cluster has a dominant type of activity. Therefore the second component added to the interpretation of the solution. But we have to keep in mind that the second component is not very important for the solution compared with the first. It 'explains' slightly over 10% of the variance that the two components account for. Besides, as can clearly be seen in figure 7, it is mainly the first component that discriminates well between the categories of the background variables. This makes the principal component a useful variable in further research.

Given the NMPCA solution of the DIY activities and our interpretation of it, one may wonder whether it is necessary to present respondents with a list of 40 such activities. Would not just one general question be enough? The answer to this question depends to a large extent on the purpose of the research. If one is only interested in the overall subjective skill of the respondents, one general question, with possibly a few more categories, seems enough. If, on the other hand, the interest also lies in the type of activity, one should ask a few more questions. For instance, one could choose a characteristic activity from each cluster that is distinguished in figures 3 and 5. However, in both cases it does not seem necessary to use a list of 40 DIY activities.

In this paper we analysed the relationships between 40 DIY activities by means of nonmetric principal components analysis. This technique is a generalization of classical principal components analysis. It allows the researcher to analyse not only numerical variables but also ordinal and nominal variables. Although we treated all the variables as ordinal in our analysis, the technique allows for analyses using a set of variables with mixed measurement levels. As we have shown, the interpretation of the NMPCA parameters is similar to that of the parameters in PCA. Besides, we have for every variable in the analysis a set of scaling parameters. It is up to the researcher to decide whether the optimal scalings are useful. Since NMPCA is mostly used as an exploratory data analysis technique, researchers applying the technique also rely heavily, as we have done, on the use of plots and graphs to support the understanding and interpretation of the results. We believe graphs and plots to be very valuable tools not only of NMPCA but of data analysis in general. Although these tools seem to be used more in exploratory analyses, there is nothing against using them also with the more classical techniques. The plots we presented in this paper could for instance also be used when presenting the results of a classical PCA.

(33)

(34)

REFERENCES

Burt, C., 1940, The factors of the mind, London (University of London Press).

Eckart, C. and G. Young, 1936, The approximation of one matrix by another of lower rank, Psychometrika, 1, p. 211-218.

Gifi, A., 1985, PRINCALS, Report UG-85-03, Leiden (Department of Datatheory, University of Leiden).

Gifi, A., 1990, Nonlinear multivariate analysis, New York (Wiley).

Green, P.E. and J.D. Carroll, 1976, Mathematical tools for applied multivariate analysis, New York (Wiley).

Hoenderdos, A.L.M. and A.W.C. Metselaar, 1985, Huurderszelfwerkzaamheid in een ERA-flat, Zoetermeer, Delft (DUP), (in Dutch).

Hoenderdos, A.L.M. and A.W.C. Metselaar, 1990, Cascobouw in de huursector, evaluatie van het jubileumproject van 93 afbouwwoningen in Huizen, Delft (DUP), (in Dutch).

Hotelling, H., 1933, Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology, 24, p. 417-441, 498-520.

Knulst, W.P., 1983, Doe-het-zelf; wat weten we erover? Een verkenning onder het doe-het-zelverspubliek, in: Groetelaers, P. and H. Priemus, Konfrontatiekollege Volkshuisvesting Doe Het Zelf, inleidingen, Delft (DUP), (in Dutch).

Milne, A.A., 1973, Winnie-The-Pooh, London (Methuen & Co Ltd).

Priemus, H., J.H.M. van Bokhoven and P. Groetelaers, 1989, Zelfbouw in Rotterdam, evaluatie van zelfwerkzaamheidsprojecten in de premie-A-koopsector in de Rotterdamse wijken Zevenkamp en Beverwaard, Delft (DUP), (in Dutch).

(35)

Rao, C.R., 1964, The use and interpretation of principal component analysis in applied research, Sankhya, 22, p. 329-358.

Van Rijckevorsel, J.L.A., 1987, The application of fuzzy coding and horseshoes in multiple correspondence analysis, Leiden (DSWO Press).

Young, F.W., 1981, Quantitative analysis of qualitative data, Psychometrika, 46, p. 347-388.

(36)
(37)
(38)
(39)


OTB Research Institute for Policy Sciences and Technology Department of Computer Applications and Information Systems

P.O. Box 5030 2600 GA Delft Thijsseweg 11 2629 JA Delft Phone +31.15.783005 Telefax +31.15.784422
