Bayesian Exponential Survival Model in the Analysis of Unemployment Duration Determinants

FOLIA OECONOMICA 269, 2012

Wioletta Grzenda

BAYESIAN EXPONENTIAL SURVIVAL MODEL IN THE ANALYSIS OF UNEMPLOYMENT DURATION DETERMINANTS

Abstract. The primary objective of this work is to identify the demographic and socio-economic factors influencing unemployment duration in Poland in the recent period. Different approaches to the problem have been applied. In this paper we use a parametric survival model in a Bayesian approach. The following determinants are considered in the model: sex, marital status, education level, information about continuing education, region of Poland, and age of the respondent. The empirical analysis is based on the “Household budgets in 2008” survey of the Central Statistical Office and indicates the main factors influencing unemployment duration.

Key words: unemployment; exponential survival model; Bayesian inference; MCMC method.

I. INTRODUCTION

The significance of unemployment results from its economic, social and political aspects. To investigate the determinants of unemployment, event history models (Drobnič and Frątczak, 2001) and logit models (Daras and Jerzak, 2005; Collier, 2003) are usually applied. Another approach, based on standardised unemployment rates, can be found in Socha and Sztanderska (2000). The most frequently reported factors related to unemployment are sex, age, education status and place of residence. Besides these, many other determinants are considered, such as job search techniques, the number of job offers received, the period of unemployment benefit, minimum wage rates, etc.

The primary objective of this work is to identify the demographic and socio-economic features that influence unemployment. A Bayesian exponential survival model was applied to analyse the determinants of the duration of unemployment.


II. THE SCOPE OF RESEARCH

The empirical analysis is based on the “Household budgets in 2008” survey of the Central Statistical Office. In line with the aim of this research, we consider unemployed persons who were looking for a job and were ready to take one (Eurostat). As different factors can influence unemployment depending on its duration, we decided to consider only persons who had been unemployed for at most 24 months. In this way we selected 2512 individuals. Of these, 214 had already found a job and were waiting to start work; for these persons the event is observed, while the others are censored.

In the model, the dependent variable is time, defined as the number of months of unemployment. The characteristics of the independent variables that may have an impact on unemployment duration are discussed below.

The first potential determinant is sex: 1 – man (49.56%), 2 – woman (50.44%). One can expect men to have a higher chance of finding a job than women, who devote more time to their families.

Marital status is one of the factors considered by employers when hiring new employees. Hence, it is important to examine whether unmarried people have a better chance of finding a job. Marital status was encoded as follows: 1 – unmarried (49.32%), 2 – married (42.40%), 3 – separated (1.04%), 4 – widowed (2.07%), 5 – divorced (5.18%).

We can suppose that education status is one of the most important determinants influencing the chance of finding a job. Education level was encoded as follows: 1 – higher (10.19%), 2 – post-secondary (2.95%), 3 – secondary professional (21.38%), 4 – secondary general (12.18%), 5 – basic vocational (34.63%), 6 – primary school (18.67%). A related factor potentially influencing unemployment periods is whether a respondent continues education. This variable takes two values: 1 – yes (8.16%), 2 – no (91.84%).

The regions of Poland differ in economic and technological development; hence we can suppose that residents of the west and central regions of Poland have a better chance of finding a job than residents of the remaining regions. Region of Poland was defined as follows: 1 – central (provinces: łódzkie, mazowieckie) (18.95%), 2 – southwest (provinces: dolnośląskie, opolskie) (10.91%), 3 – south (provinces: małopolskie, śląskie) (14.81%), 4 – northwest (provinces: wielkopolskie, zachodniopomorskie, lubuskie) (17.12%), 5 – north (provinces: kujawsko-pomorskie, warmińsko-mazurskie, pomorskie) (17%), 6 – east (provinces: lubelskie, podkarpackie, świętokrzyskie, podlaskie) (21.22%).

The next determinant taken into consideration is age (min = 17, max = 66). It is important to examine whether young persons have a better chance of finding a job.
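The coding of the categorical variables can be turned into a design matrix as sketched below. The sketch assumes that the last level of each variable serves as the reference category, which is what the reported coefficients (Sex 1, Education 1 to 5, Region 1 to 5) suggest; the function name `reference_coding` and the three respondents are hypothetical and are not taken from the survey data.

```python
import numpy as np

def reference_coding(codes, n_levels):
    """Return an (n, n_levels - 1) matrix of 0/1 indicators.

    Levels 1 .. n_levels - 1 each get a column; the last level is the
    reference category and is represented by a row of zeros.
    """
    codes = np.asarray(codes)
    levels = np.arange(1, n_levels)
    return (codes[:, None] == levels[None, :]).astype(float)

# Hypothetical respondents (illustration only)
sex = [1, 2, 1]            # 1 = man, 2 = woman
education = [1, 6, 3]      # 1 = higher ... 6 = primary school
region = [6, 1, 4]         # 1 = central ... 6 = east
age = [25, 40, 58]

X = np.column_stack([
    np.ones(len(age)),               # intercept
    reference_coding(sex, 2),        # Sex 1
    reference_coding(education, 6),  # Education 1..5
    reference_coding(region, 6),     # Region 1..5
    np.asarray(age, dtype=float),    # Age
])
print(X.shape)
```

With this layout each coefficient of a categorical variable compares one level with the reference level (woman, primary school, east region).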


III. RESEARCH METHOD

In this paper we have used a Bayesian survival exponential model. The Bayesian methods combine subjective prior knowledge with the information acquired from the data by using Bayes’ theorem (Bolstad, 2007; Bernardo and Smith, 2004; Gelman et al., 2000).

The proposed exponential model is one of the most important models in survival analysis (Blossfeld et al., 1989; Blossfeld and Rohwer, 1995). This parametric survival model is presented here in a Bayesian approach. The Bayesian analysis of parametric survival models has been discussed in many works (Ibrahim et al., 2001). Suppose we have independent identically distributed survival times $\mathbf{y} = (y_1, \ldots, y_n)'$, with each $y_i$, $i = 1, \ldots, n$, having an identical exponential distribution with parameter $\lambda$. We denote the censoring indicators by $\mathbf{v} = (v_1, \ldots, v_n)'$, where $v_i = 0$ if $y_i$ is right censored and $v_i = 1$ if $y_i$ is a failure time, $i = 1, \ldots, n$. The density function for $y_i$ is $f(y_i \mid \lambda) = \lambda \exp(-\lambda y_i)$ and the survival function is $S(y_i \mid \lambda) = \exp(-\lambda y_i)$. In regression models we have one additional element, a matrix of independent variables $X$ ($n \times p$). Let $\mathbf{x}_i'$ denote the $i$th row of the matrix; then $D = (n, \mathbf{y}, X, \mathbf{v})$ is the observed data.
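As a small illustration of the two functions just defined, the sketch below evaluates the exponential density and survival function and checks that their ratio, the hazard, is constant and equal to $\lambda$. The rate value and the time grid are arbitrary assumptions chosen for the example.

```python
import numpy as np

def exp_density(y, lam):
    """Density f(y | lambda) = lambda * exp(-lambda * y)."""
    return lam * np.exp(-lam * y)

def exp_survival(y, lam):
    """Survival function S(y | lambda) = exp(-lambda * y)."""
    return np.exp(-lam * y)

lam = 0.05                                   # assumed rate per month
months = np.array([1.0, 6.0, 12.0, 24.0])    # assumed durations

hazard = exp_density(months, lam) / exp_survival(months, lam)
print(hazard)                     # the same value, lam, at every duration
print(exp_survival(24.0, lam))    # probability of still being unemployed after 24 months
```

A constant hazard is the defining property of the exponential model: the chance of leaving unemployment does not depend on how long the spell has already lasted.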

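Let $\lambda_i = \varphi(\mathbf{x}_i'\boldsymbol{\beta})$, where $\mathbf{x}_i$ ($p \times 1$) is a vector of covariates, $\boldsymbol{\beta}$ ($p \times 1$) is a vector of regression coefficients and $\varphi(\cdot)$ is a known function. For $\varphi(\mathbf{x}_i'\boldsymbol{\beta}) = \exp(\mathbf{x}_i'\boldsymbol{\beta})$, we have the following likelihood function:

$$
\begin{aligned}
L(\boldsymbol{\beta} \mid D) &= \prod_{i=1}^{n} f(y_i \mid \lambda_i)^{v_i}\, S(y_i \mid \lambda_i)^{1 - v_i} \\
&= \prod_{i=1}^{n} \left[\exp(\mathbf{x}_i'\boldsymbol{\beta}) \exp\bigl(-y_i \exp(\mathbf{x}_i'\boldsymbol{\beta})\bigr)\right]^{v_i} \left[\exp\bigl(-y_i \exp(\mathbf{x}_i'\boldsymbol{\beta})\bigr)\right]^{1 - v_i} \\
&= \exp\left\{\sum_{i=1}^{n} v_i\, \mathbf{x}_i'\boldsymbol{\beta}\right\} \exp\left\{-\sum_{i=1}^{n} y_i \exp(\mathbf{x}_i'\boldsymbol{\beta})\right\}. \qquad (1)
\end{aligned}
$$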

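For the regression coefficients $\boldsymbol{\beta}$ one often chooses an improper uniform prior or a normal prior. In our model we take a $p$-dimensional normal prior $N_p(\boldsymbol{\mu}_0, \Sigma_0)$ for $\boldsymbol{\beta}$, where $\boldsymbol{\mu}_0$ denotes the prior mean vector and $\Sigma_0$ denotes the prior covariance matrix. Then the posterior distribution for $\boldsymbol{\beta}$ is given by

$$p(\boldsymbol{\beta} \mid D) \propto L(\boldsymbol{\beta} \mid D)\, p(\boldsymbol{\beta} \mid \boldsymbol{\mu}_0, \Sigma_0), \qquad (2)$$

where $p(\boldsymbol{\beta} \mid \boldsymbol{\mu}_0, \Sigma_0)$ denotes the multivariate normal density with mean $\boldsymbol{\mu}_0$ and covariance matrix $\Sigma_0$.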
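To make the posterior (2) concrete, the sketch below codes the logarithm of likelihood (1) plus a zero-mean normal log-prior and draws from the result with a plain random-walk Metropolis sampler. This is an illustration only: the paper's estimation was carried out in SAS, and the sampler, the step size, the burn-in length and the synthetic data set are all assumptions introduced here.

```python
import numpy as np

def log_posterior(beta, X, y, v, prior_var=1e6):
    """Log of (2) up to a constant.

    Log-likelihood from (1): sum_i v_i x_i'beta - sum_i y_i exp(x_i'beta),
    plus the log-density of a N(0, prior_var * I) prior (assumed prior mean 0).
    """
    eta = X @ beta
    loglik = np.sum(v * eta) - np.sum(y * np.exp(eta))
    logprior = -0.5 * np.sum(beta ** 2) / prior_var
    return loglik + logprior

def metropolis(X, y, v, n_iter=5000, step=0.1, seed=0):
    """Random-walk Metropolis sampler; returns an (n_iter, p) array of draws."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    current = log_posterior(beta, X, y, v)
    draws = np.empty((n_iter, beta.size))
    for t in range(n_iter):
        proposal = beta + step * rng.standard_normal(beta.size)
        candidate = log_posterior(proposal, X, y, v)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < candidate - current:
            beta, current = proposal, candidate
        draws[t] = beta
    return draws

# Synthetic data (hypothetical): hazard exp(b0 + b1 * x), censoring at 24 months
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.integers(0, 2, n).astype(float)])
true_beta = np.array([-3.0, 0.5])
t_event = rng.exponential(1.0 / np.exp(X @ true_beta))
y = np.minimum(t_event, 24.0)            # observed duration
v = (t_event <= 24.0).astype(float)      # 1 = event observed, 0 = right censored

draws = metropolis(X, y, v)
print(draws[1000:].mean(axis=0))         # posterior means after an assumed burn-in
```

The closed form in the last line of (1) is what keeps the log-likelihood to a single line of code; a production analysis would use a tuned sampler and the convergence checks described in the next section.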


IV. MODEL ESTIMATION

Estimation and verification of all the models were performed using the SAS system. In order to obtain objective results, we used prior distributions that have a minimal impact on the posterior distribution. Moreover, we have neither results of statistical modelling for the investigated time period nor data for the entire country, and only credible prior information can improve the quality of model estimation. Therefore, non-informative independent normal prior distributions were used for all regression parameters in all the models: $\boldsymbol{\beta} \sim N(\mathbf{0}, 10^{6} I)$.
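A short calculation indicates how flat this prior is over the range of coefficient values that matter here. It assumes a single coefficient $\beta_j$ with prior variance $10^6$ and compares the log-density at 0 and at 5, a value chosen only for illustration.

```latex
\begin{align*}
\log p(\beta_j) &= -\tfrac{1}{2}\log\bigl(2\pi \cdot 10^{6}\bigr) - \frac{\beta_j^{2}}{2 \cdot 10^{6}}, \\
\log p(0) - \log p(5) &= \frac{5^{2}}{2 \cdot 10^{6}} = 1.25 \times 10^{-5}.
\end{align*}
```

On the log scale the prior therefore changes by a negligible amount across this range, so the posterior is driven almost entirely by the likelihood.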

The estimated models were evaluated to assess the convergence of the generated Markov chains. Inference in Bayesian analysis under unchecked convergence for some model parameters may result in wrong conclusions. Using Geweke's test (Geweke, 1992), we found no indication of non-convergence of the Markov chain for any parameter of the investigated models, at any significance level.
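The idea behind Geweke's diagnostic can be sketched as a z-score comparing the mean of an early segment of the chain with the mean of a late segment. The version below is a simplification: it estimates the variance of each segment mean with batch means instead of the spectral density estimate used in Geweke (1992), and the segment shares (first 10%, last 50%), the batch count and both test chains are assumptions made for the example.

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5, n_batches=20):
    """Z-score for equality of means of the first and last parts of a chain.

    Simplified: variances of the segment means come from batch means,
    not from a spectral density estimate.
    """
    chain = np.asarray(chain, dtype=float)
    n = chain.size
    a = chain[: int(first * n)]
    b = chain[int((1 - last) * n):]

    def var_of_mean(x):
        means = np.array([part.mean() for part in np.array_split(x, n_batches)])
        return means.var(ddof=1) / n_batches

    return (a.mean() - b.mean()) / np.sqrt(var_of_mean(a) + var_of_mean(b))

# Hypothetical chains: a stationary AR(1) series and one with a drifting mean
rng = np.random.default_rng(0)
noise = rng.standard_normal(10_000)
stationary = np.empty_like(noise)
stationary[0] = noise[0]
for t in range(1, noise.size):
    stationary[t] = 0.5 * stationary[t - 1] + noise[t]
drifting = stationary + np.linspace(0.0, 5.0, noise.size)

print(geweke_z(stationary), geweke_z(drifting))
```

A z-score far from zero, as for the drifting chain, signals that the early and late parts of the chain disagree and the chain has not settled.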

The same result was obtained for the Heidelberger-Welch test (Heidelberger and Welch, 1983), which consists of two parts: a stationarity test and a halfwidth test. The halfwidth test additionally reports whether the sample size is adequate to meet the required accuracy for the mean estimate.

Thus, it can be assumed that the obtained posterior samples are appropriate for statistical inference. The results of model estimation are summarized in Table 1.

Table 1. Posterior sample mean and interval statistics

| Parameter | Mean | HPD interval (α = 0.05), lower | HPD interval (α = 0.05), upper | Exp(Mean) | Exp(-Mean) |
|---|---|---|---|---|---|
| Intercept | 4.8315 | 4.1880 | 5.4644 | 125.399 | 0.008 |
| Sex 1 | -0.3547 | -0.6374 | -0.0850 | 0.701 | 1.426 |
| Education 1 | -0.9193 | -1.4357 | -0.3769 | 0.399 | 2.508 |
| Education 2 | -0.2906 | -1.2897 | 0.6852 | 0.748 | 1.337 |
| Education 3 | -0.6864 | -1.1603 | -0.2497 | 0.503 | 1.987 |
| Education 4 | -0.5770 | -1.1029 | -0.0260 | 0.562 | 1.781 |
| Education 5 | -0.3488 | -0.8063 | 0.0671 | 0.706 | 1.417 |
| Region 1 | -0.4681 | -0.8869 | -0.0566 | 0.626 | 1.597 |
| Region 2 | 0.1815 | -0.4396 | 0.7919 | 1.199 | 0.834 |
| Region 3 | -0.3878 | -0.8294 | 0.0572 | 0.679 | 1.474 |
| Region 4 | -0.3229 | -0.7676 | 0.1152 | 0.724 | 1.381 |
| Region 5 | -0.2975 | -0.7364 | 0.1300 | 0.743 | 1.346 |
| Age | 0.0150 | 0.00262 | 0.0263 | 1.015 | 0.985 |

HPD interval: Highest Probability Density Interval.
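The last two columns of Table 1 are exponentials of the posterior means, which the snippet below recomputes for a few rows. The reading attached to them is an inference from how the conclusions use the numbers, not something the source states: Exp(Mean) appears to act as a multiplier on expected duration and Exp(-Mean) as a multiplier on the chance of finding a job, which would place the reported coefficients on the log-duration scale, i.e. with the opposite sign to $\mathbf{x}_i'\boldsymbol{\beta}$ in likelihood (1).

```python
import math

# Posterior means copied from Table 1
posterior_means = {
    "Intercept": 4.8315,
    "Sex 1": -0.3547,
    "Education 1": -0.9193,
    "Region 1": -0.4681,
    "Age": 0.0150,
}

for name, mean in posterior_means.items():
    # exp(mean): assumed multiplier on duration; exp(-mean): assumed multiplier on hazard
    print(f"{name:12s} exp(mean) = {math.exp(mean):8.3f}   exp(-mean) = {math.exp(-mean):6.3f}")
```

The recomputed values agree with the tabulated Exp(Mean) and Exp(-Mean) columns to three decimals.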


Based on the highest probability density intervals (Bolstad, 2007), the statistically significant variables are sex, age and at least one level of each of the other variables.
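The significance rule used here can be sketched as follows: compute the shortest interval holding 95% of the posterior draws and check whether it excludes zero. The helper names and the two sets of normal draws are hypothetical; they only mimic one clearly non-zero coefficient and one that is not.

```python
import numpy as np

def hpd_interval(samples, alpha=0.05):
    """Shortest interval containing a (1 - alpha) share of the sorted draws."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil((1 - alpha) * n))        # number of draws inside the interval
    widths = x[k - 1:] - x[: n - k + 1]      # width of every candidate interval
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

def excludes_zero(samples, alpha=0.05):
    """True when the HPD interval lies entirely on one side of zero."""
    lo, hi = hpd_interval(samples, alpha)
    return lo > 0 or hi < 0

# Hypothetical posterior draws (synthetic, for illustration only)
rng = np.random.default_rng(0)
draws_a = rng.normal(-0.35, 0.14, 20_000)    # tightly concentrated below zero
draws_b = rng.normal(-0.29, 0.50, 20_000)    # wide, straddles zero

print(hpd_interval(draws_a), excludes_zero(draws_a))
print(hpd_interval(draws_b), excludes_zero(draws_b))
```

For a symmetric unimodal posterior the HPD interval nearly coincides with the equal-tailed interval; the two differ when the posterior is skewed.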

V. SUMMARY AND CONCLUSIONS

We found that among the variables chosen for modelling (sex, marital status, education level, information about continuing education, region of Poland and age at the time of the survey), only two turned out to be statistically insignificant: marital status and information about continuing education.

In the case of the first of these determinants, the earlier assumption that unmarried people have a better chance of finding a job was not confirmed. Information about continuing education turned out to be statistically insignificant, but we can state that by improving one's education status one can increase the chances of finding a job in the future. We found that individuals with an education higher than primary have a better chance of finding a job. Persons with a secondary professional education have a 98.7% higher chance of finding a job than those who attended primary school only, and for persons with a higher education this chance is more than twice as high as for those with primary education only. Our research confirms the earlier assumption and is consistent with the results of other studies (Daras and Jerzak, 2005).

The results for the sex variable also confirm our earlier expectations: men have a 42.6% higher chance of finding a job than women. The results of other research (Daras and Jerzak, 2005; Socha and Sztanderska, 2000) indicate a worse situation of women in the labour market, even when women are better educated and search for a job more actively. Moreover, employers are more likely to hire men than women because of the role women play in their families, i.e. they more frequently take care of children.

One of the important factors influencing unemployment duration is age: the chance of finding a job decreases by about 1.5% with each additional year of the respondent's age. According to other researchers (Daras and Jerzak, 2005), the unemployment rate also depends on age, with persons aged over 44 having the lowest chance of finding a job.

The results for the region variable show that only one level of this variable is statistically significant, namely the central region, i.e. the łódzkie and mazowieckie provinces. We found that the mean unemployment duration for residents of this region is 37.4% shorter than for residents of the east region.
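The percentages quoted in this section can be traced back to the exponentiated posterior means in Table 1. The arithmetic below assumes that Exp(-Mean) multiplies the chance of finding a job and Exp(Mean) multiplies the expected duration, each relative to the reference level (woman, primary school, east region) or, for age, per additional year.

```latex
\begin{align*}
\text{Sex 1 (men vs. women):} \quad & (1.426 - 1) \times 100\% = 42.6\% \\
\text{Education 3 (secondary professional vs. primary):} \quad & (1.987 - 1) \times 100\% = 98.7\% \\
\text{Education 1 (higher vs. primary):} \quad & 2.508 \text{ times the chance} \\
\text{Age (one additional year):} \quad & (1 - 0.985) \times 100\% = 1.5\% \\
\text{Region 1 (central vs. east, duration):} \quad & (1 - 0.626) \times 100\% = 37.4\%
\end{align*}
```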

The model applied in this article enables the identification of demographic and socio-economic factors influencing unemployment duration. The advantage of survival models is that they take into account the whole history of an individual. However, these models require data that are frequently not provided by commonly conducted surveys.


REFERENCES

Bernardo J.M., Smith A.F.M. (2004), Bayesian Theory, John Wiley & Sons, New York.

Blossfeld H.P., Hamerle A., Mayer K. (1989), Event history analysis, Statistical theory and application in the social sciences, Hillsdale, NJ: L. Erlbaum.

Blossfeld H.P., Rohwer G. (1995), Techniques of event history modeling, New approaches to causal analysis, Hillsdale, NJ: L. Erlbaum.

Bolstad W.M. (2007), Introduction to Bayesian statistics, John Wiley & Sons, New York.

Collier W. (2003), The impact of demographic and individual heterogeneity on unemployment duration: A regional study, Studies in Economics, 0302.

Daras T., Jerzak M. (2005), Wpływ cech społeczno-demograficznych osób bezrobotnych na możliwość znalezienia pracy, Materiały i Studia, Zeszyt nr 189.

Drobnič S., Frątczak E. (2001), Employment patterns of married women in Poland, Careers of couples in contemporary society, New York.

Gelman A., Carlin J.B., Stern H.S., Rubin D.B. (2000), Bayesian data analysis, Chapman & Hall/CRC, London.

Geweke J. (1992), Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In Bernardo J., Berger J., Dawid A., Smith A., Bayesian Statistics, 4, 169-193.

Heidelberger P., Welch P. (1983), Simulation run length control in the presence of an initial transient, Operations Research, 31, 1109-1144.

Ibrahim J.G., Chen M-H, Sinha D. (2001), Bayesian survival analysis, Springer-Verlag, New York.

Socha M., Sztanderska U. (2000), Strukturalne podstawy bezrobocia w Polsce, PWN, Warszawa.

Wioletta Grzenda

BAYESIAN EXPONENTIAL SURVIVAL MODEL IN THE ANALYSIS OF THE DETERMINANTS OF UNEMPLOYMENT DURATION

The aim of this study is to identify the demographic and socio-economic factors that influence the duration of unemployment. The data set used in the study comes from the Central Statistical Office survey “Household Budgets 2008”.

A Bayesian exponential survival model was used to analyse the determinants of the duration of unemployment. The model was estimated using Markov chain Monte Carlo methods, in particular the Gibbs sampler.

The analysis showed that among the explanatory variables selected for modelling (sex, marital status, education level, information on whether the respondent is continuing education, the region of Poland in which the respondent lives, and age at the time of the survey), only two turned out to be statistically insignificant: marital status and the information on whether the respondent is continuing education.
