

Epidemiological concepts of validation of biomarkers for the identification/quantification of environmental carcinogenic exposures

3

Edited by


of the potential of diet to prevent cancer and of the ways in which heredity can affect individual susceptibility to carcinogens, with the ultimate aim of reducing the cancer burden in Europe. ECNIS is coordinated by Prof. Konrad Rydzyński, Nofer Institute of Occupational Medicine, Św. Teresy 8, 91-348 Łódź, Poland.

This review has been prepared as part of ECNIS Work Package 8: Evaluation of the contribution of biomarker technology to the identification/quantification of environmental carcinogenic exposures.

© ECNIS, 2007

All rights reserved. No part of this book may be reproduced in any form without the permission of the publisher.

Compiled and edited by Paolo Vineis and Valentina Gallo
Imperial College, London, and ISI Foundation, Torino
Department of Epidemiology and Public Health
St Mary's Campus, Norfolk Place, W2 1PG London (UK)
Tel: +44 (0) 20 7594 3372
Fax: +44 (0) 20 7594 3456
Website: http://www.ecnis.org

With the contribution of:

Bernadette Schoket, David Phillips, Sy Garte, Emanuela Taioli, Micheline Kirsch-Volders, Steffen Loft, Jakob Linseisen, Sabine Rohrmann, Roel Vermeulen, Fabrizio Veglia, Raluca Mateuca, Sara Geneletti, Peter Farmer, Cosetta Minelli, Ulf Stromberg, Giuseppe Matullo, John Thompson, Erika Győrffy, Livia Anna, Katalin Kovács, Carlos A. González.

We are grateful to Soterios Kyrtopoulos and two anonymous reviewers for very thoughtful suggestions.

ISBN 978-83-60818-03-9

Technical editor: Katarzyna Rogowska

Cover design, computer typesetting: Beata Grabska

Published by Nofer Institute of Occupational Medicine
Św. Teresy 8, 91-348 Łódź, Poland

Tel.: +48 (0) 42 631 45 04 Fax: +48 (0) 42 656 83 31 E-mail: ecnis@ecnis.org


Executive summary

Introduction

1. The epidemiological theory: principles of biomarker validation Paolo Vineis, Valentina Gallo

1.1. Validity and reliability

1.2. Impact of measurement error: random and systematic error

1.3. Sources of variability: inter-subject, intra-subject, laboratory

1.4. Measurement of variation

1.5. Publication bias

1.6. Laboratory drift; study design; quality control

1.7. Overall evaluation: ACCE

2. Challenges from new technologies and new biomarkers

2.1. How to screen for promising intermediate markers? Paolo Vineis, Roel Vermeulen

2.2. The use of DAGs in causality assessment for biomarkers Sara Geneletti

2.3. A methodology to integrate individual data with ecologic data in molecular epidemiology Cosetta Minelli, Paolo Vineis, Emanuela Taioli, John Thompson

2.4. Semi-Bayes for balancing false-positive and false-negative findings in molecular epidemiology studies Ulf Stromberg

2.5. Strategic issues: genotype-phenotype correlations Micheline Kirsch-Volders, Raluca Mateuca, Giuseppe Matullo


3.1. DNA adducts Valentina Gallo, David Phillips

3.2. Heterocyclic Aromatic Amines Sabine Rohrmann, Jakob Linseisen

3.3. 1-Hydroxypyrene Katalin Kovács, Erika Győrffy, Livia Anna, Bernadette Schoket

3.4. Oxidative damage to DNA Steffen Loft, Peter Møller

3.5. Adducts of N-nitroso compounds Carlos A. González

3.6. Predictive ability of biomarkers: an updated meta-analysis on bulky DNA adducts Fabrizio Veglia, Giuseppe Matullo, Paolo Vineis

4. Comments and suggestions for future work

Appendix 1. DNA adducts

Appendix 2. Heterocyclic Aromatic Amines

Appendix 3. 1-Hydroxypyrene


The aim of this report is to review concepts of biomarker validation and to assess the current status of validation, mainly with regard to the use of biomarkers in molecular epidemiology.

The first part of the report describes the epidemiological criteria of biomarker validation in population studies. Then some chapters address newer problems encountered in the introduction of intermediate biomarkers into use, in the management of missing data, in causality assessment for biomarker and genetic data, and in the study of phenotype-genotype correlations.

The final part applies criteria of validation to some of the main biomarkers used today in molecular epidemiology (DNA adducts, oxidative damage to DNA, urinary 1-hydroxypyrene, markers for heterocyclic aromatic amines, and N-nitroso compound adducts). We have found that there are some important gaps in knowledge that deserve attention. Most validation studies have dealt with analytical validation, such as reproducibility and repeatability. Some studies also considered validity in terms of sensitivity and specificity in comparison with a "gold standard" (which, however, does not exist for avant-garde markers). Very few studies considered clinical validity, i.e. the ability to predict disease, or the relationship between a decrease in the marker and a decrease in disease risk. The latter gaps are challenges for future ECNIS research, with the following goals:

1. To develop a validation strategy, i.e. to identify those validation criteria that are essential before any marker is introduced into epidemiological practice.

2. To conduct pooled analyses to peruse the existing data and fill the gaps (for example, the pooled analysis of DNA adducts we show in Chapter 3).

3. Finally, when pooled analyses are insufficient, to conduct field research to obtain original data on biomarker validity, for example within the existing, large prospective population studies.


The present report has been prepared by a group of European scientists in the context of the EU-funded Network of Excellence ECNIS, whose goal is to provide effective biomarkers for the study of the relationships between environmental toxicants, dietary habits and cancer. The use of biomarkers in cancer epidemiology has a rather long history (the term "molecular epidemiology" was originally proposed by Perera and Weinstein in 1982) and great successes have been achieved, like the investigation of the predictive ability of chromosome aberrations, the relationship between aromatic amines in tobacco, the NAT2 genotype and bladder cancer, or the mechanisms by which benzene induces leukaemia. However, many biomarkers are introduced into research without proper validation, and this hampers successful research, introducing bias or simply blurring existing associations. In addition, new and complex issues are emerging with the introduction of high-throughput technologies, like the expected large number of false positives, or the complex interplay between environmental exposures, intermediate markers, confounders and disease. Causality assessment has truly become a challenge.

For these reasons, we think it can be useful to summarize the main criteria for biomarker validation (Chapter 1); to offer some insights into new technical and statistical developments for the management and interpretation of biomarker data (Chapter 2); and to offer some examples of existing information (usually sparse) on biomarker validation (Chapter 3).

Paolo Vineis and Valentina Gallo
Imperial College London, United Kingdom


1. The epidemiological theory: principles of biomarker validation

Paolo Vineis and Valentina Gallo
Imperial College, London, United Kingdom

1.1. Validity and reliability

To achieve an accurate estimate of the association between any marker and disease, in epidemiology we need reliable and valid measurements of exposure, covariates (potential confounders and effect modifiers), and outcomes. Causal inferences cannot be drawn in the absence of such requirements. We will distinguish, in what follows, between a marker (any variable that can be measured and is informative for the purposes of the study), an assay (a specific laboratory test which aims at measuring that marker), and a measurement (the concrete act of measuring the value of a marker in an individual, by a specific assay). For example, PAH-DNA adducts are a type of marker, 32P-postlabelling is a type of assay, and the actual data are the measurements. Validity is defined as the (relative) lack of systematic measurement error when comparing the actual observation with a standard, that is, a reference method which represents the "truth". Such "truth" is in fact an abstract concept we are interested in, for example "cancer" as defined through a histologic characterization. While validity entails a "standard", reliability concerns the extent to which an experiment or any measuring procedure yields the same results on repeated trials (1). By "the same results" we do not mean an absolute correspondence, but a relative concept, i.e. "a tendency towards consistency of repeated measurements of the same phenomenon". Reliability is relative to the type and purpose of the measurement: for some purposes we may accept a level of unreliability that is unacceptable for other purposes. In addition to being reliable, the marker must be valid, i.e. provide an accurate representation of the abstract concept of interest.

Validity and reliability are independent: a measurement may be perfectly reliable (reproducible in different laboratories and repeatable at different times), but consistently wrong, i.e. far from the true value. For example, a gun may be completely reliable if all the shots cluster within a small area, but seriously biased if that area is far from the target; conversely, the gun is unbiased but unreliable if the shots are distributed on average around the center of the target but dispersed over a large area.

We are interested in both validity and reliability; however, since validity is often not measurable, reliability is sometimes used (incorrectly) as a surrogate. An aspect that is clearly relevant to the discussion of measurement error is timing: any inference about the meaning of biomarker measurements should be strictly time-specific, since time influences the results in several different ways.


The major components of biomarker variability that affect the design of epidemiologic studies are variability between subjects (inter-subject), within subjects (intra-subject) and variability due to measurement errors. The impact of the three categories of variability on the biomarker response can be represented by a linear model of this form (2):

Y_ijk = u + a_i + b_j + e_ijk   [1.1.]

where

Y_ijk — the marker response for subject i at time j and replicate measurement k;
u — the true population mean response;
a_i — the offset in mean response for subject i (assumed to be normally distributed with mean = 0 and variance = s_i²; this variance represents the extent of inter-subject variability);
b_j — the offset in response at time j (assumed to be normally distributed with mean = 0 and variance = s_j²; this variance represents the extent of intra-subject variability);
e_ijk — the assay measurement error (normally distributed with mean = 0 and variance = s_ijk²) (2).

The normality of distribution assumed in the model must be verified. In fact, many biomarkers have distributions that are far from normal; normalization can be achieved through an appropriate transformation, for example a log transformation. The model is based on a linear (additive) assumption, which implies that measurement errors are independent of average measurements. This assumption must be verified case by case, for example by checking whether errors correlate with the mean.
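As an illustration, the following sketch (Python; all parameter values hypothetical) simulates data of the form [1.1.] and recovers the three variance components by a method-of-moments decomposition. Intra-subject variation is modelled here as a subject-by-time effect, a common simplification of the crossed model above.

    # Simulation of the variance-components model [1.1.] with hypothetical values.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n_subj, n_time, n_rep = 50, 3, 2
    u = 10.0                        # true population mean response
    s_a, s_b, s_e = 2.0, 1.0, 0.5   # inter-subject, intra-subject, assay SDs

    rows = []
    a = rng.normal(0, s_a, n_subj)              # subject offsets a_i
    b = rng.normal(0, s_b, (n_subj, n_time))    # time offsets within subject
    for i in range(n_subj):
        for j in range(n_time):
            for k in range(n_rep):
                rows.append((i, j, k, u + a[i] + b[i, j] + rng.normal(0, s_e)))
    df = pd.DataFrame(rows, columns=["subject", "time", "rep", "y"])

    # Method-of-moments recovery of the components (balanced design):
    var_e = df.groupby(["subject", "time"])["y"].var().mean()           # assay error
    cell = df.groupby(["subject", "time"])["y"].mean().reset_index()
    var_b = cell.groupby("subject")["y"].var().mean() - var_e / n_rep   # intra-subject
    subj_means = cell.groupby("subject")["y"].mean()
    var_a = subj_means.var() - (var_b + var_e / n_rep) / n_time         # inter-subject
    print(f"inter {var_a:.2f}, intra {var_b:.2f}, assay {var_e:.2f}")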

1.2. Impact of measurement error: random and systematic error

The errors of marker measurement may have a different impact depending on the error distribution. If the epidemiological study has been conducted blindly, i.e. the laboratory analyses have been done with no knowledge of the exposed/unexposed or diseased/healthy status of the subjects, we expect that the measurement error will be evenly distributed across strata of exposure or disease. (This is true only if the error is equally distributed across the scale of the exposure — e.g. non-smokers may be more difficult to characterize than smokers. Also, detection limits may affect controls, or the unexposed subjects, more than cases, or the exposed subjects: more controls than cases may have undetectable levels, and therefore the measurement error may influence cases and controls differentially.) This kind of misclassification leads to underestimation of the risk ratio due to a "blurring" of the relationship between exposure and disease.

Both underestimation and overestimation of the association of interest may occur when misclassification is not evenly distributed across the study variables. We may have a more general distortion of the etiologic relationship, and not only a "blurring" of the association, if the classification of exposure depends on the outcome (diseased/healthy status). Blurring is "bias towards the null", while distortion as a consequence of an uneven distribution of misclassification can go in either direction, both towards and away from the null hypothesis.
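The attenuation caused by non-differential misclassification can be checked numerically. The sketch below (hypothetical numbers) applies the same sensitivity and specificity of exposure classification to cases and controls and shows that the observed odds ratio moves towards the null.

    # Effect of non-differential exposure misclassification on the odds ratio.
    def observed_or(true_or: float, p_exp: float, sens: float, spec: float) -> float:
        # p_exp: true exposure prevalence among controls.
        odds_ctrl = p_exp / (1 - p_exp)
        odds_case = odds_ctrl * true_or
        p_case = odds_case / (1 + odds_case)

        def classified_odds(p):  # same sens/spec in both groups: non-differential
            pos = sens * p + (1 - spec) * (1 - p)
            return pos / (1 - pos)

        return classified_odds(p_case) / classified_odds(p_exp)

    print(f"true OR 3.0 -> observed OR {observed_or(3.0, 0.3, 0.8, 0.9):.2f}")  # ~2.2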

A realistic example of bias depending on the knowledge of the disease status by the researcher relates to the degradation of analytes when biological samples are stored for a long time. If the samples from the cases affected by the disease of interest and those from controls (within a cohort design) are analyzed at different times, bias can arise from differential degradation in the two series. For example, the researcher may decide (incorrectly) to analyze the samples from the cases as soon as these arise in the cohort, while the controls are analyzed at the end of the study. Since the levels of, say, vitamin C decrease rapidly with time, serious bias may arise from differential timing of measurement in the two series. For this reason, biochemical analyses should be performed after matching cases and controls for the time since sample collection.

1.3. Sources of variability: inter-subject, intra-subject, laboratory

Variation in biomarkers includes inter-individual (inter-subject) variation, intra-subject variation (i.e. variation in the marker over a particular time period), biological sampling variation (i.e. variation depending on the circumstances of biologic sample collection) and laboratory variation. Sometimes the intra-individual and/or sampling variations are so large that the laboratory measurement variation makes only a marginal contribution to overall variation. A particular example of intra-subject variation is associated with error due to the handling, processing, and storing of specimens; such variability can be measured only if repeated samples from the same individual are collected.

Inter-subject variability in marker response may derive from factors such as ethnic group, gender, diet or other characteristics. Similarly, the marker response may vary within the same subject over time due to the changes in the diet, health status, variation in exposure to the compound of interest (for dietary items, season is an important variable), and variation in exposure to other compounds that influence the marker response.

Biological sampling variation is related to the circumstances of biological sample collection. For example, hyperproliferation of colonic cells is extremely variable at different points of the colonic mucosa. Therefore, not only is the intra-subject variation over time important, due to varying exposure to agents that induce cell proliferation, but the measurements are also strongly influenced by how and where the mucosa is sampled. For example, one study (3) estimated that 20% of the variability of the rectal mucosa proliferation index (measured by nuclear antigen immunohistochemistry) is due to the subject, 30% to the biopsy within the subject, and 50% to crypts within a biopsy. In other words, as much as 80% of the variation is related to sampling.


Laboratory measurements can have many sources of error; in particular, there are two general classes of laboratory errors: those that occur between analytical batches and those that occur within batches. An example of a study designed to assess the different sources of laboratory variation is reported by Taioli et al. (2), using the model described above. In one experiment, they drew blood from five subjects three times in three different weeks, in order to measure DNA-protein cross-links. The results indicated that the variation between batches was quite important and larger than the variation between subjects.

Methodological issues should be discussed, as much as possible, within biomarker categories, due to the specificities of each category. The following table shows how methodological data can be organized according to biomarker type. Intra-individual and sampling variations are considered because of the extent of their influence on actual measurements for most markers.

Table 1.1. Methodological data organisation according to biomarker type

Marker type                                      Intra-individual variation          Sampling variation

Internal dose
  Hormones                                       Yes (diurnal variation)             No
  Water-soluble nutrients                        Yes (short half-life)               No
  Organochlorines                                No (longer half-life)               No

Biologically effective dose
  Peripheral white blood cell DNA adducts        Yes (half-life: weeks to months)    No
  Exfoliated urothelial cell DNA adducts         Yes (half-life: months)             Yes

Early biological effect
  Lymphocyte metaphase chromosome aberrations    More or less stable                 ?
  Somatic cell mutations (glycophorin A)         Probably low                        No (?)

Intermediate markers
  Cervical dysplasia                             Yes                                 Yes
  Colonic hyperproliferation                     Yes                                 Yes

Genetic susceptibility
  Genotype assay                                 No                                  No
  Noninducible phenotype                         No                                  No
  Inducible phenotype                            Yes                                 No

Tumour markers                                   Yes                                 Yes


1.4. Measurement of variation

The extent of variability in measurements can itself be quantified in several ways. Let us distinguish between continuous measurements and categorical measurements. A general measure of the extent of variation for continuous measurements is the coefficient of variation (CV — standard deviation/mean, expressed as a percentage). A more useful measure is the ratio between CVb and CVw: CVw measures the extent of laboratory variation within the same sample in the same assay, CVb measures the between-subject variation, and the CVb/CVw ratio indicates the extent of the between-subject variation relative to the laboratory error. Large degrees of laboratory error can be tolerated if between-person differences in the parameter to be measured are large.

A frequently used measure of reliability for continuous measurements is the intra-class correlation coefficient (ICC), i.e. the between-person variance divided by the total (between- plus within-subject) variance. The intra-class coefficient is equal to 1.0 if there is exact agreement between the two measures on each subject (thus differing from the Pearson correlation coefficient, which takes the value 1.0 when one measure is a linear function of the other, not only when the two exactly agree). A coefficient of 1.0 occurs when within-subject variation is null, i.e. laboratory measurements are totally reliable. The intra-class correlation coefficient can then be used to correct measures of association (e.g. relative risks) in order to allow for laboratory error. The intra-class correlation coefficient can be used to estimate the extent of between-subject variability in relation to total variability; the latter includes variation due to different sources (reproducibility, repeatability, and sampling variation). To measure reproducibility, i.e. the ability of two laboratories to agree when measuring the same analyte in the same sample, the mean difference between observers (and the corresponding confidence interval) has been proposed (4).
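The sketch below (Python, simulated data) estimates the within- and between-subject components from duplicate assay runs and derives the intra-class correlation coefficient and the CVb/CVw ratio; the decomposition assumes a balanced design with two runs per subject.

    # ICC and CV-based reliability measures from duplicate runs per subject.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100
    truth = rng.normal(50, 10, n)                      # between-subject values
    runs = truth[:, None] + rng.normal(0, 3, (n, 2))   # two runs per subject

    within_var = ((runs[:, 0] - runs[:, 1]) ** 2 / 2).mean()       # laboratory error
    between_var = runs.mean(axis=1).var(ddof=1) - within_var / 2   # between subjects
    icc = between_var / (between_var + within_var)

    cv_w = np.sqrt(within_var) / runs.mean() * 100     # within-subject CV, %
    cv_b = np.sqrt(between_var) / runs.mean() * 100    # between-subject CV, %
    print(f"ICC {icc:.2f}, CVb/CVw {cv_b / cv_w:.1f}")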

In addition to reproducibility, i.e. agreement between readers on the same set of observations, with similar techniques we can measure repeatability, i.e. agreement within the same observer at different times (repeat observations).

Another concept that is used in biomarker validation is inter-observer concordance in the classification of binary outcomes. It should be borne in mind that concordance between two observers can arise by pure chance; therefore, agreement beyond chance is measured. Total potential agreement between two readers cannot be 100%: to be fair, we must subtract chance agreement from 100% to obtain an estimate of the total attainable agreement. The final measure is the difference between observed agreement and chance agreement, divided by the total possible agreement beyond chance; this measure is called the kappa index. There are other measures of agreement beyond chance, and kappa has to be used cautiously since there are some methodological pitfalls; for example, the value of kappa strictly depends on the prevalence of the condition under study: with a high underlying prevalence we expect a high level of agreement (4).
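A minimal sketch of the kappa computation on a hypothetical 2 × 2 agreement table for two raters:

    # Cohen's kappa: agreement beyond chance between two raters.
    import numpy as np

    table = np.array([[40, 10],    # rater A positive: rater B +, rater B -
                      [5, 45]])    # rater A negative: rater B +, rater B -
    n = table.sum()
    p_obs = np.trace(table) / n                 # observed agreement
    row = table.sum(axis=1) / n
    col = table.sum(axis=0) / n
    p_chance = (row * col).sum()                # agreement expected by chance
    kappa = (p_obs - p_chance) / (1 - p_chance)
    print(f"observed {p_obs:.2f}, chance {p_chance:.2f}, kappa {kappa:.2f}")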


Until now we have considered reliability as a property of the assay in the hands of different readers (reproducibility) or at repeat measurements (repeatability). Let us consider, now, validity of assessment, i.e. correspondence with a standard. It is essential to bear in mind that two readers may show very high levels of agreement, as measured e.g. by the Pearson correlation coefficient (i.e. r = 0.9), even if the first consistently records twice the value of the second observer. Or, alternatively (for example, when using the intra-class correlation coefficient), two readers could show high levels of agreement (e.g. ICC = 0.9) but poor validity if the same errors repeat themselves for both raters. Now we are interested in the correspondence of the measurement with a conceptual entity, for example accumulation of the p53 protein as a consequence of gene mutation (in fact, without a mutation the protein has a very short half-life and rapidly disappears from the cells).

The Table 1.2. below shows data on the correspondence between immunohistochemistry and p53 mutations. Sensitivity of immunohistochemistry is estimated as 85%, i.e. false negatives are 15% of all samples containing mutations; specificity is estimated as 71%, i.e. 29% of samples not containing mutations are falsely positive at immunohistochemistry. A combined estimate of sensitivity and specificity is the area under the Receiver Operating Characteristic (ROC) curve, i.e. a curve which graphically represents the relationship between sensitivity and (1 − specificity). It is usually believed (5) that sensitivity and specificity indicate properties of a test irrespective of the frequency of the condition to be detected (however, this is an assumption that requires verification). In the example of the Table, the proportion of samples showing a mutation is high (32/73 = 44%); it would be much lower, for example, in patients with benign bladder conditions or in healthy subjects. A measure which is useful to predict how many subjects, among those testing positive, are really affected by the condition we aim to detect is the positive predictive value. In the example, among 39 patients testing positive at immunohistochemistry, 27 actually have mutations, i.e. immunohistochemistry correctly predicts mutations in 69% of the positive cases. Let us suppose, however, that the prevalence of mutations is not 44%, but 4.4% (32/730). With the same sensitivity and specificity values (85% and 71%, respectively) we would have a positive predictive value of 11.8%, i.e. much lower. The predictive value is a very useful measure, because it indicates how many true positive cases we will obtain within a population of subjects who test positive with the assay we are applying. However, we must bear in mind that the predictive value is strongly influenced by the prevalence of the condition: a very low predictive value may simply indicate that we are studying a population in which very few subjects actually have the condition we want to identify.

Table 1.2. Validity of p53 immunohistochemistry as compared to mutations in the p53 gene (bladder cancer patients) (15)

                 p53 nuclear reactivity (immunohistochemistry)
                 −      +      ++     Total
No mutations     29     7      5      41
All mutations    5      8      19     32
Total            34     15     24     73

Sensitivity of immunohistochemistry (+ and ++) = 27/32 = 85%. Specificity = 29/41 = 71%.
Positive predictive value = (8 + 19)/(15 + 24) = 27/39 = 69%.
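The dependence of the predictive value on prevalence can be verified directly from the standard formula; the sketch below reproduces the example above (sensitivity 85%, specificity 71%, prevalence 32/73 vs. 32/730).

    # Positive predictive value as a function of prevalence.
    def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
        true_pos = sensitivity * prevalence
        false_pos = (1 - specificity) * (1 - prevalence)
        return true_pos / (true_pos + false_pos)

    for prev in (32 / 73, 32 / 730):
        print(f"prevalence {prev:.1%} -> PPV {ppv(0.85, 0.71, prev):.1%}")
    # prevalence 43.8% -> PPV ~69.6%; prevalence 4.4% -> PPV ~11.9%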

1.5. Publication bias

It has been suggested that publication bias is an important issue in epidemiological research, especially when performing meta-analyses or pooled analyses of epidemiological studies (6,7). The hypothesis underlying publication bias is that studies with statistically significant outcomes are, in general, more likely to be published than non-significant studies (8).

Publication bias can be investigated by a specific statistical analysis involving funnel plots. Asymmetry in the plot of study precision versus the logarithm of the point estimate (usually the odds ratio, OR) suggests publication bias, and it can be tested statistically (9). A significant asymmetry indicates the presence of bias: if publication bias existed, the funnel plot of the published data would show a certain degree of asymmetry because of the lack of small published studies with negative results. It has been suggested that time-lag bias could be an alternative to publication bias (10), with the first studies giving more favourable results compared with the subsequent studies (Fig. 1.1.).

Fig. 1.1. Eight topics in which the results of the first study or studies differed beyond chance (P < 0.05) from the results of the subsequent studies. The strength of the association is shown as an estimate of the odds ratio (OR) without confidence intervals. Adapted by permission from Macmillan Publishers Ltd: Nature Genetics (29(3):306–309), copyright (2001).
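Funnel-plot asymmetry of the kind described above is commonly tested with Egger's regression (9), in which the standardized effect is regressed on precision and a non-zero intercept suggests small-study bias. Below is a minimal sketch on hypothetical log odds ratios and standard errors.

    # Egger's regression test for funnel-plot asymmetry (hypothetical data).
    import numpy as np
    from scipy import stats

    log_or = np.array([0.45, 0.30, 0.15, 0.60, 0.10, 0.55, 0.20, 0.70])
    se = np.array([0.30, 0.15, 0.10, 0.35, 0.08, 0.40, 0.12, 0.45])

    res = stats.linregress(1 / se, log_or / se)   # precision vs. standardized effect
    t = res.intercept / res.intercept_stderr
    p = 2 * stats.t.sf(abs(t), df=len(se) - 2)    # t-test of the intercept
    print(f"Egger intercept {res.intercept:.2f} (p = {p:.2f})")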


1.6. Laboratory drift; study design; quality control

When we organize and analyze an epidemiologic study employing biomarkers, we want to minimize total intra-group variability, in order to identify inter-group differences (e.g. between exposed and unexposed or between diseased and healthy subjects), if they exist. Total intra-group variation is the weighted sum of inter-subject, intra-subject, sampling and laboratory variation, with weights that are inversely correlated with the numbers of subjects, measurements per subject, and analytical replicates used in the study design, respectively. Obviously, if we do not have detailed information we cannot adjust for intra-group variation. This is the reason why in epidemiologic studies employing biomarkers it is important to collect, whenever possible: a) repeat samples (day-to-day, month-to-month or year-to-year variation may be relevant depending on the marker); b) potentially relevant information on subject characteristics that may influence inter-subject variation; and c) the conditions under which samples have been collected and laboratory analyses have been conducted (batch, assay, specific procedures). Concerning item c), measurement variation may occur as a consequence of many different aspects that are related not only to the choice of the assay, but also to:

— collection of the sample (how and when a blood sample was drawn; the type of test tube utilized; amount of biological material collected; for some measurements, whether the subject was fasting; avoidance of exposure to light if we are interested in vitamin C);
— processing of the sample (e.g. speed of centrifuging to separate different blood components; use of a gradient to separate lymphocytes);
— storing (in a simple refrigerator at –20°C; at –70°C; in liquid nitrogen at –196°C; for how long);
— laboratory analyses (inter-laboratory variation; assay; technician performing the assay; batch; accidental contamination of the sample).

Therefore, in order to minimize intra-group variation, technical details should be considered. As an example, for blood collection the following variables need controlling (11–12):
1. Collection tube contamination; for example, in the case of trace metals all materials (needles, tubes, pipettes, etc.) should be of a type which does not release metals.
2. Type of additives; for example, collection of plasma entails the use of heparin.
3. Order of collection tubes; to avoid carry-over of trace additives, tubes without additives should be processed first.
4. Time of venipuncture; for example, measurement of compounds that undergo substantial changes during the day, like hormones, requires very accurate timing.
5. Subject posture; physiological compounds like proteins, iron and cholesterol can be increased by 5–15% in the standing position in comparison with the supine position.
6. Hemolysis, which may occur as a consequence of tube transport and manipulation.
7. Storage conditions.

Laboratory drift is a special problem which, however, is not peculiar to laboratory analyses (for example, drift in the quality of interviews typically occurs during longitudinal epidemiological studies). Laboratory drift is a consequence of changes in procedures and accuracy in the course of time, so that the first samples that are analyzed tend to differ from subsequent samples. Avoidance of laboratory drift implies a monitoring program, which consists of repeated quality controls. For example, measurements may be compared with a standard at different points in time. Another source of "drift", which cannot be technically avoided, is degradation of analytes when they are stored for a long time.
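A monitoring program can be as simple as charting repeated measurements of a reference standard across analytical batches. The sketch below (all values hypothetical) flags batches whose standard measurement deviates from the nominal value by more than three times the known assay SD.

    # Simple quality-control chart for detecting laboratory drift.
    import numpy as np

    standard_true = 100.0                 # nominal value of the reference standard
    sd_qc = 0.5                           # known assay SD on the standard
    batch_means = np.array([100.2, 99.8, 100.1, 100.4, 101.0, 101.6, 102.3])

    z = (batch_means - standard_true) / sd_qc
    for i, zi in enumerate(z):
        print(f"batch {i}: z = {zi:+.1f}  {'DRIFT?' if abs(zi) > 3 else 'ok'}")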

To learn more about how the variability in laboratory measurements influences study design decisions, see Rundle et al. (13).

1.7. Overall evaluation: ACCE

ACCE is an evaluation framework that takes its name from the four components of evaluation — analytical validity; clinical validity; clinical utility; and ethical, legal, and social implications and safeguards. The effort builds on a methodology previously described for evaluating screening and diagnostic tests. The ACCE process includes collecting, evaluating, interpreting, and reporting data about DNA (and related) testing for disorders with a genetic component, in a format that allows policymakers to have access to up-to-date and reliable information for decision-making. The ACCE model contains a list of 44 questions, targeting the four areas of ACCE, to develop a comprehensive review of a candidate test for potential use (14).

Analytical validity

Analytical validity focuses on the ability of the test to measure accurately and reliably the marker/genotype of interest. The components of analytical validity are sensitivity, specificity, and test reliability. Sensitivity evaluates how well the test measures the marker/genotype when it is present. Specificity, on the other hand, evaluates how well the test performs when the marker/genotype is not present. The reliability of a test measures how often the same results are obtained when a sample is retested.

Clinical validity

Clinical validity focuses on the ability of the genetic test to detect or predict the associated disorder (phenotype). Clinical validity can also be expressed as the positive predictive value (PPV), that is, the proportion of individuals who develop the disease among those who have the marker/genotype.

Clinical utility

Clinical utility addresses the elements that need to be considered when evaluating the risks and benefits associated with the introduction of the test into routine clinical practice. A test that has clinical utility, such as blood cholesterol, provides the individual with valuable information that can be used for prevention, treatment, or life planning, regardless of the results.

References

1. Carmines EG, Zeller RA. Reliability and validity assessment. London: Sage Publications; 1979.
2. Taioli E, Kinney P, Zhitkovich A, Fulton H, Voitkun V, Cosma G, et al. Application of reliability models to studies of biomarker validation. Environ Health Perspect 1994;102:306–9.
3. Lyles CM, Sandler RS, Keku TO, Kupper LL, Millikan RC, Murray SC, et al. Reproducibility and variability of the rectal mucosal proliferation index using proliferating cell nuclear antigen immunohistochemistry. Cancer Epidemiol Biomarkers Prev 1994;3:597–605.
4. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. Br Med J 1992;304:1491–4.
5. Fletcher RH, Fletcher SW, Wagner EH. Clinical epidemiology — the essentials. 2nd edition. Baltimore: Williams and Wilkins; 1988.
6. Friedenreich CM. Methods for pooled analyses of epidemiologic studies. Epidemiology 1993;4:295–302.
7. Colhoun HM, McKeigue PM, Davey Smith G. Problems of reporting genetic associations with complex outcomes. Lancet 2003;361:865–72.
8. Stern JM, Simes RJ. Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ 1997;315(7109):640–5.
9. Egger M, Smith GD, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. Br Med J 1997;315:629–34.
10. Ioannidis JP, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG. Replication validity of genetic association studies. Nat Genet 2001;29:306–9.
11. Young DS, Bermes EW. Specimen collection and processing: sources of biological variation. In: Tietz NW, editor. Textbook of clinical chemistry. Philadelphia: W.B. Saunders Co.; 1986.
12. Pickard NA. Collection and handling of patient specimens. In: Kaplan LA, Pesce AJ, editors. Clinical chemistry: theory, analysis and correlation. 2nd edition. St. Louis: C.V. Mosby Co.; 1989.
13. Rundle AG, Vineis P, Ahsan H. Design options for molecular epidemiology research within cohort studies. Cancer Epidemiol Biomarkers Prev 2005;14(8):1899–907.
14. Sanderson S, Zimmern R, Kroese M, Higgins J, Patch C, Emery J. How can the evaluation of genetic tests be enhanced? Lessons learned from the ACCE framework and evaluating genetic tests in the United Kingdom. Genet Med 2005;7(7):495–500.
15. Esrig D, Spruck CH III, Nichols PW. p53 nuclear protein accumulation correlates with mutations in the p53 gene, tumor grade and stage in bladder cancer. Am J Pathol 1993;143:1389–97.


2. Challenges from new technologies and new biomarkers

2.1. How to screen for promising intermediate markers?

Paolo Vineis (1) and Roel Vermeulen (2)

(1) Imperial College, London, United Kingdom
(2) IRAS, Utrecht, The Netherlands

Intermediate markers

Intermediate biomarkers directly or indirectly represent events on the continuum between exposure and disease. Intermediate biomarkers can provide important mechanistic insight into the pathogenesis of cancer, including early effects that are proximate to exposure and subsequent pre-neoplastic alterations. As such, they complement classic epidemiological studies that use cancer endpoints. In addition, intermediate biomarkers can provide initial clues about the carcinogenic potential of new exposures years before cancer develops (1–5).

One group of intermediate biomarkers, biomarkers of early biologic effect (6), generally measures early biologic changes that reflect early, non-clonal and generally non-persistent effects. Examples of early biologic effect biomarkers include measures of cellular toxicity, chromosomal alterations, DNA, RNA and protein/peptide expression, and early non-neoplastic alterations in cell function (e.g., altered DNA repair, altered immune function). Generally, early biologic effect markers are measured in substances such as blood and blood components (red blood cells, white blood cells (WBCs), DNA, RNA, plasma, sera, urine, etc.) because they are easily accessible and because in some instances it is reasonable to assume that they can serve as surrogates for other organs. Early biological effect markers can also be measured in other accessible tissues such as skin, cervical and colon biopsies; epithelial cells from surface tissue scrapings or sputum samples; exfoliated urothelial cells in urine; colonic cells in feces; and epithelial cells in breast nipple aspirates. Other early effect markers include measures of circulating biologically active compounds in plasma that may have epigenetic effects on cancer development (e.g., hormones, growth factors, cytokines).

For maximum utility, an intermediate biomarker must be shown to be predictive of developing cancer, preferably in prospective cohort studies (7) or potentially in carefully designed case-control studies of cases with low-stage/grade tumors. The criteria for validating intermediate biomarkers have been discussed by Schatzkin and colleagues (8,9) and focus on the calculation of the etiologic fraction (also known as the attributable fraction or proportion) of the intermediate endpoint, which varies from 0 to 1. The closer the etiologic fraction is to 1, the more closely the biologic marker reflects events, either directly or indirectly, on the causal pathway to disease. The availability of numerous prospective cohort studies with stored blood specimens should enhance our ability to rapidly test the relationship between a wide variety of early biologic effect markers, measured using both standard and emerging technologies (10,11), and cancer risk. Such studies could ultimately produce a new generation of endpoints to evaluate the carcinogenic potential and mechanisms of action of various risk factors. In addition, this line of research may one day identify a panel of intermediate markers, easily analyzed from blood samples, that can be used to identify individuals at elevated risk of developing cancer in the future, who may then benefit from targeted primary and secondary preventive strategies.
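For illustration, an etiologic (attributable) fraction can be computed from the relative risk and the marker prevalence among cases with the standard Levin/Miettinen formula; this is a hedged sketch with hypothetical inputs, not necessarily Schatzkin's exact estimator.

    # Etiologic fraction of an intermediate endpoint (attributable proportion).
    def etiologic_fraction(rr: float, p_marker_in_cases: float) -> float:
        """Proportion of disease attributable to marker positivity."""
        return p_marker_in_cases * (rr - 1) / rr

    # e.g. relative risk of 4 for marker-positive subjects, 60% of cases positive:
    print(f"{etiologic_fraction(4.0, 0.60):.2f}")   # 0.45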

A second group of intermediate markers represents events further down the continuum from exposure to disease, where early hyperplastic or pre-neoplastic alterations may have occurred, sometimes due to clonal expansion of a genetically or epigenetically altered cell. These have been referred to as biomarkers of altered structure and function (12). Some of these events can be identified by standard histologic techniques, at times enhanced through the use of special methods. More subtle and earlier pre-neoplastic changes may be detected through proliferation and apoptosis assays as well as molecular analyses that reflect early clonal events in cell cycle control. These markers are frequently analyzed in tissues from organ sites of interest.

Screening for potential intermediate markers

The screening for potential intermediate markers has a strong parallel with the field of genomics, where both candidate gene approaches and genome-wide screens are performed to identify genetic risk factors. In the candidate gene approach, priors are formulated and single nucleotide polymorphisms (SNPs) in genes are selected. In genome-wide screens, no priors are formulated, as essentially all genes in the whole genome are studied (discovery approach). The methodologies in the domain of intermediate markers are less developed than in the field of genomics; however, similar approaches can be distinguished.

Candidate approach

Numerous markers have been proposed as markers of carcinogenesis. These were based mostly on either the observation that they were frequently detected in tumors (e.g. t(14;18), t(8;21), t(15;17), P53 and KRAS mutations, and HPV and SV40 infections) or the observation that they were associated with genotoxic exposures (e.g. chromosomal aberrations (CA), sister chromatid exchanges (SCE), micronuclei (MN), DNA adducts). Selection of intermediate endpoints based on the observation that a marker is linked with either the disease or the exposure of interest is a reasonable approach. However, its validity should still be tested against the full criteria of validity, i.e. that the marker is linked to both the exposure and the disease of interest (i.e. lies on, or reflects, the causal pathway). Unfortunately, many of the historically proposed markers have not been properly validated or have turned out not to fulfil these criteria, with the exception of chromosomal aberrations, micronuclei and several virus infections such as human papilloma virus (HPV). In the 1990s, historical cohort studies from Scandinavian countries and Italy, using archived data from many laboratories active in the field of cytogenetic biomonitoring, evaluated the level of CA, SCEs and MN in peripheral blood cells (PBCs) as biomarkers of cancer risk (13–16). A significant association between the frequency of CA in healthy individuals and subsequent incidence of, or mortality from, all cancers was observed. These findings were subsequently confirmed in two independent cohorts in five Central-Eastern European countries (17), and to a lesser extent in the Czech Republic (18,19). No association was found between mean SCE level and cancer risk (13,20,21). The possible role of MN as a predictor of cancer risk has been studied in an international collaborative study, and preliminary results based on a cohort of almost 7000 subjects showed a significant association between MN frequency and the risk of cancer (22). Nine studies have examined the association between cancer at different sites and the levels of "bulky" DNA adducts. A global 73% excess of adduct levels was found in cases compared with controls among current smokers (95% CI: 31 to 115%). No association between cancer status and DNA adduct levels was found among former smokers, whereas never-smokers showed inconsistent results. These observations are in accordance with the findings of three prospective studies (23–25), which also found that DNA adduct levels measured in WBCs were predictive of lung cancer. However, results were inconsistent in that two studies found the association only in current smokers while the third found it in never-smokers (current smokers were not investigated). It is fair to conclude that the candidate approach to the selection of intermediate markers has so far not been overly successful. However, given the ever increasing knowledge of the etiology of many diseases at the molecular level, improved molecular techniques and the availability of prospective studies with biological materials, the candidate approach should deliver promising markers in the near future.

Discovery approach

Recent breakthroughs in biotechnology (e.g. genomics, transcriptomics, proteomics, and metabonomics) offer the possibility of developing a new series of biomarkers of cancer risk. These techniques enable investigators to broadly explore biologic responses to exogenous and endogenous exposures, to evaluate potential modification of those responses by variants in essentially the entire genome, and to define tumors at the chromosomal, DNA, mRNA and protein levels. Given their ome-wide coverage, these techniques do not require any prior knowledge and can be used purely as discovery tools. Of these new techniques, proteomics and metabonomics are hypothetically the most promising tools for identifying new markers. In light of the fact that the human genome consists of approximately 25,000–30,000 genes (26), a fraction of what was originally expected, it has become clear that mammalian systems are more complex than what can be determined by genes alone. Alternative splicing, as well as over 200 post-translational modifications, affects protein structure, function, stability and interactions with other molecules (27). It is therefore likely that a number of different proteins/peptides may be expressed by a single gene. Both proteomics and metabonomics have recently been adapted to high-throughput, highly sensitive technologies (e.g. SELDI-TOF/MS, NMR), making it possible to screen large numbers of samples for a large number of potential markers. The results until now have not, however, been as promising as originally thought. This is largely due to limitations in the current techniques, small sample sizes and generally poor study designs. However, as techniques are still improving and methodological issues are better taken into consideration, it is merely a matter of time before these techniques produce new leads in the discovery of novel intermediate markers. However, given that these techniques are still only discovery tools, such leads need to be carefully investigated and compared with existing biological information from in vivo or in vitro tests. Second, they should be confirmed in other independent studies (candidate approach), preferably using different platforms.

Conclusions

Future biomarker research will bring further improvements in the sensitivity, specificity and throughput of existing classical and omic methodologies. As more of these technologies become commercially available, their application in epidemiological studies will require large-scale efforts to validate them and establish them as useful tools in biomarker research. Screening of potential biomarkers should be based on both candidate and discovery approaches.

References

1. Schatzkin A, Freedman LS, Schiffman MH, Dawsey SM. Validation of intermediate end points in cancer research. J Natl Cancer Inst 1990;82(22):1746–52.

2. National Research Council. Biological markers in environmental health research. Committee on Biological Markers of the National Research Council. Environ Health Perspect 1987;74:3–9.
3. Schulte PA, Rothman N, Schottenfeld D. Design considerations in molecular epidemiology. In: Schulte PA, Perera FP, editors. Molecular epidemiology: principles and practices. San Diego, CA: Academic Press; 1993. p. 159–98.

4. Schatzkin A, Gail M. The promise and peril of surrogate end points in cancer research. Nat Rev Cancer 2002;2(1):19–27.


5. Toniolo P, Boffetta P, Shuker DEG, Rothman N, Hulka B, Pearce N. Application of Biomarkers in Cancer Epidemiology. Lyon: IARC; 1997.

6. National Research Council. Biological markers in environmental health research. Committee on Biological Markers of the National Research Council. Environ Health Perspect 1987;74:3–9.
7. Schatzkin A, Freedman LS, Schiffman MH, Dawsey SM. Validation of intermediate end points in cancer research. J Natl Cancer Inst 1990;82(22):1746–52.

8. Schatzkin A, Freedman LS, Schiffman MH, Dawsey SM. Validation of intermediate end points in cancer research. J Natl Cancer Inst 1990;82(22):1746–52.

9. Schatzkin A, Gail M. The promise and peril of surrogate end points in cancer research. Nat Rev Cancer 2002;2(1):19–27.

10. Tomer KB, Merrick BA. Toxicoproteomics: a parallel approach to identifying biomarkers. Environ Health Perspect 2003;111(11):A578–9.

11. Nicholson JK, Wilson ID. Opinion: understanding 'global' systems biology: metabonomics and the continuum of metabolism. Nat Rev Drug Discov 2003;2(8):668–76.

12. National Research Council. Biological markers in environmental health research. Committee on Biological Markers of the National Research Council. Environ Health Perspect 1987;74:3–9.
13. Bonassi S, Znaor A, Norppa H, Hagmar L. Chromosomal aberrations and risk of cancer in humans: an epidemiologic perspective. Cytogenet Genome Res 2004;104(1–4):376–82.
14. Bonassi S, Hagmar L, Stromberg U, Montagud AH, Tinnerberg H, Forni A, et al. Chromosomal aberrations in lymphocytes predict human cancer independently of exposure to carcinogens. European Study Group on Cytogenetic Biomarkers and Health. Cancer Res 2000;60(6):1619–25.
15. Hagmar L, Bonassi S, Stromberg U, Brogger A, Knudsen LE, Norppa H, et al. Chromosomal aberrations in lymphocytes predict human cancer: a report from the European Study Group on Cytogenetic Biomarkers and Health (ESCH). Cancer Res 1998;58(18):4117–21.

16. Hagmar L, Bonassi S, Stromberg U, Mikoczy Z, Lando C, Hansteen IL, et al. Cancer predictive value of cytogenetic markers used in occupational health surveillance programs: a report from an ongoing study by the European Study Group on Cytogenetic Biomarkers and Health. Mutat Res 1998;405(2):171–8.

17. Boffetta P, van der Hel O, Norppa H, Fabianova E, Fucic A, Gundy S, et al. Chromosomal aberrations and cancer risk: results of a cohort study from Central Europe. Am J Epidemiol 2007;165(1):36–43.

18. Rossner P, Boffetta P, Ceppi M, Bonassi S, Smerhovsky Z, Landa K, et al. Chromosomal aberrations in lymphocytes of healthy subjects and risk of cancer. Environ Health Perspect 2005;113(5):517–20.

19. Boffetta P, van der Hel O, Norppa H, Fabianova E, Fucic A, Gundy S, et al. Chromosomal aberrations and cancer risk: results of a cohort study from Central Europe. Am J Epidemiol 2007;165(1):36–43.

20. Norppa H, Bonassi S, Hansteen IL, Hagmar L, Stromberg U, Rossner P, et al. Chromosomal aberrations and SCEs as biomarkers of cancer risk. Mutat Res 2006;600(1–2):37–45.

21. Hagmar L, Bonassi S, Stromberg U, Brogger A, Knudsen LE, Norppa H, et al. Chromosomal aberrations in lymphocytes predict human cancer: a report from the European Study Group on Cytogenetic Biomarkers and Health (ESCH). Cancer Res 1998;58(18):4117–21.


22. Znaor A, Fucic A, Strnad M, Barkovic D, Skara M, Hozo I. Micronuclei in peripheral blood lymphocytes as a possible cancer risk biomarker: a cohort study of occupationally exposed workers in Croatia. Croat Med J 2003;44(4):441–6.

23. Tang D, Phillips DH, Stampfer M, Mooney LA, Hsu Y, Cho S, et al. Association between carcinogen-DNA adducts in white blood cells and lung cancer risk in the physicians health study. Cancer Res 2001;61(18):6708–12.

24. Peluso M, Munnia A, Hoek G, Krzyzanowski M, Veglia F, Airoldi L, et al. DNA adducts and lung cancer risk: a prospective study. Cancer Res 2005;65(17):8042–8.

25. Bak H, Autrup H, Thomsen BL, Tjonneland A, Overvad K, Vogel U, et al. Bulky DNA adducts as risk indicator of lung cancer in a Danish case-cohort study. Int J Cancer 2006;118(7):1618–22.

26. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, et al. The sequence of the human genome. Science 2001;291(5507):1304–51.

27. Banks RE, Dunn MJ, Hochstrasser DF, Sanchez JC, Blackstock W, Pappin DJ, et al. Proteomics: new perspectives, new biomedical opportunities. Lancet 2000;356(9243):1749–56.


2.2. The use of DAGs in causality assessment for biomarkers

Sara Geneletti

Imperial College, London, United Kingdom

The present contribution aims to summarise some new approaches to the elucidation of "causal" pathways that involve biomarkers or gene-environment interactions. We do not address two philosophical issues that underlie such approaches. One issue concerns the definition of 'cause'. It is well known that the definition varies according to different schools of thought, although it has generally been agreed in the literature (1–4) that the key is to invoke interventions. These are defined as deliberate interferences with the natural course of events, as for instance in an experiment. However, in epidemiology we usually deal with observations and not experiments, and thus we will sometimes use the word 'cause' to represent a more intuitive and speculative concept, in contrast to the more rigorous "interventional" concept. The second issue is related to the interpretation of graphical representations of causal pathways (DAGs, see below). This is a very powerful and general approach, and broadly speaking there are two approaches to causal inference that make use of Directed Acyclic Graphs (DAGs). One approach, put forward by Judea Pearl (1), combines graphical models with functional equations and counterfactuals. The second approach (2,5,6) is based on statistical decision theory and takes the view that the aim of causal inference is to inform future decision-making processes. This makes it particularly appropriate for epidemiological research, as it can readily be applied to health policy decisions. The decision theoretic approach is used in the exposition below. However, we focus on the practical side of graphical models rather than promoting a specific philosophical interpretation.

Causal inference

The main objective of causal inference is to separate the language of statistical association from the language of causality. In this contribution, we focus on the use of graphical models for causal inference in the context of biomarker research and gene-environment interactions. The web of relationships between disease, exposure, genes and other biomarkers is in fact easily visualised in terms of graphical models.

Directed Acyclic Graphs (DAGs) are graphical representations of relationships among variables, and they can be used to disentangle causal relationships from simple statistical associations. However, causal assumptions must be applied to distinguish the causal arrows from the merely associational ones, and this external information needs to be explicitly stated, as it is not in itself contained in a DAG. In particular, DAGs derived from observational data without interventions are just representations of conditional independences (see below) and thus are purely probabilistic until additional causal information can be brought to bear.


Components of the Decision Theoretic (DT) approach

Conditional Independences and DAGs

Conditional (in)dependence is a probabilistic concept: consider three variables A, B and C. We say A is independent of B conditional on C (A ⊥ B | C) if P(A,B | C) = P(A | C) × P(B | C). Simply put, this says that if we know what C is, knowing what A is gives us no further information about B. An example from genetics is as follows: if we want to know the genetic makeup of an individual (B), we might be able to gain some information on it by looking at the genetic makeup of a sibling (A). If, however, we can access the genetic makeup of the parents (C), then knowing the sibling's makeup gives us no further information about B.

DAGs encode such conditional independences. It is important to note that more than one DAG can encode the same set of conditional independences. The independence above corresponds to the three DAGs in Figure 2.1. below; however, only the first corresponds to the example given above.

1. A ← C → B

2. A → C → B

3. B → C → A

Fig. 2.1. A is independent of B conditional on C.
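To make the conditional-independence statement concrete, the sketch below simulates the common-cause structure A ← C → B and shows that A and B are strongly correlated marginally but nearly uncorrelated once C is partialled out; the data and the regression-based partial correlation are purely illustrative.

    # Numerical check of A independent of B given C under A <- C -> B.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 100_000
    C = rng.normal(size=n)
    A = C + rng.normal(scale=0.5, size=n)
    B = C + rng.normal(scale=0.5, size=n)

    def partial_corr(x, y, z):
        # Correlation of the residuals of x and y after regressing out z.
        rx = x - np.polyval(np.polyfit(z, x, 1), z)
        ry = y - np.polyval(np.polyfit(z, y, 1), z)
        return np.corrcoef(rx, ry)[0, 1]

    print(f"corr(A, B)     = {np.corrcoef(A, B)[0, 1]:.2f}")   # ~0.8 marginally
    print(f"corr(A, B | C) = {partial_corr(A, B, C):.2f}")     # ~0 given C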

Interventions and Regimes

In order to make rigorous causal inference, interventions need to take place or, when no interventions take place, conditional independences need to be found that enable us to interpret the consequences of natural events as those of interventions (as in the case of Mendelian randomization, see below). A clear example of why interventions and naturally occurring events do not necessarily lead to the same results is confounding. Often the key to making causal inference from observations is whether some randomization has occurred.

F → HDL → CVD

Fig. 2.2. F is an intervention on HDL-cholesterol intake, which is associated with CVD (cardiovascular disease).

Formally, causality is encoded in DAGs by using the concept of regime. A regime indicates the intervention status of a variable and is denoted by F. Thus, if we are looking at a binary variable X (0,1) and its possible regimes, we will consider the "active treatment regime" F = 1, the "baseline treatment regime" F = 0 and the observational regime F = ∅ (the empty set), corresponding to setting X = 1 with no uncertainty, setting X = 0 with no uncertainty and, finally, letting X arise without manipulations of any sort. The last regime is what is generally termed the observational regime and generally corresponds to the data-gathering situation. Interventions are represented by decision nodes (square boxes) in augmented DAGs (2), and these can be used to make some causal inferences from DAGs, as shown in Figure 2.2.
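The difference between the observational regime F = ∅ and the interventional regimes F = 0, 1 can be illustrated with a toy simulation (all probabilities hypothetical) in which an unobserved confounder U drives both X and Y, so the observational contrast overstates the effect recovered under intervention.

    # Observational vs. interventional regimes under confounding.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 500_000
    U = rng.random(n) < 0.5                        # unobserved confounder
    X = rng.random(n) < np.where(U, 0.8, 0.2)      # F = empty set: X arises naturally
    Y = rng.random(n) < 0.2 + 0.3 * U + 0.1 * X

    obs = Y[X].mean() - Y[~X].mean()               # observational contrast (~0.28)
    # F = 1 and F = 0: X is set by intervention, breaking the U -> X arrow
    Y1 = rng.random(n) < 0.2 + 0.3 * U + 0.1
    Y0 = rng.random(n) < 0.2 + 0.3 * U
    print(f"observational {obs:.2f} vs interventional {(Y1.mean() - Y0.mean()):.2f}")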


It is often questionable to extract a DAG from observational data and assume that the arrows represent causal relationships.

The role of DAGs in biomarker research

When the interest is in the role of a biomarker in a biological pathway, DAGs can be used in the first instance as a tool to play around with possible configurations. Here they are only a visual aid and encode no statistical information, and the arrows can be interpreted as speculative causal relationships. When data are available, DAGs can be extracted using conditional independence tests (1,7).

The DAG representations can be very useful for determining which set of relationships fits the observed data. Note again that at this stage DAGs represent purely statistical relationships and do not say anything about causal relationships. However, they can then be augmented by causal assumptions, if there are any plausible ones. For instance, time ordering can be invoked to restrict the number of candidate DAGs, or knowledge about interventions or randomization can be included to constrain their number.

An example concerning intermediate biomarkers

A recent example of data that are difficult to interpret concerning an intermediate biomarker is the role played by C-reactive protein (CRP) in cardiovascular disease (CVD). Is CRP really an intermediate marker, i.e. does it lie on the biological pathway between the risk factors and the disease, or is it simply an indicator of the risk factors? For example, both metabolic changes, such as triglyceride or HDL-cholesterol elevation, and inflammation are risk factors for CVD: is CRP just an epiphenomenon of their alteration, or is it a genuine mechanism that mediates their action on CVD?

Consider the following simple example. If we have observational (i.e. F = ∅) data on the variables HDL, CVD and CRP, then, in theory, we can determine what conditional (in)dependences exist between them. Say, for instance, that we have found that

CVD ⊥ CRP | HDL, F = ∅ [2.5]

i.e. knowing about CRP does not give us additional information about the likelihood of CVD if we already know the HDL levels in the observational regime. This results in the 3 DAGs in Figure 2.3A. These DAGs are associational: can we reduce them to one that represents causal relationships? We can do this by making explicit assumptions that are external to the DAG:

1. CVD cannot cause increased HDL, because it is always the temporal endpoint, i.e. CVD does not precede increased levels of HDL. This reduces the DAGs to those in Figure 2.3B.

2. We know that CRP production is governed by a particular gene G, and we have data on the population under study indicating that people who naturally produce less CRP have (say) lower HDL levels than those who produce more CRP. From these data we get that G ⊥ HDL | CRP, F = ∅, that is, once we know the CRP levels, the type of gene (mutated or not) gives us no further information about the levels of HDL. If we believe that this emulates a controlled trial on CRP, and further that G ⊥ CVD | HDL, CRP (i.e. CVD does not depend on the gene when we know what the HDL and CRP levels are), then we can use this information to make causal inference. This reduces the possible DAGs to Figure 2.3C.

Fig. 2.3. (A) The three DAGs that encode the conditional independence CVD ⊥ CRP | HDL, F = ∅. (B) The two DAGs that remain once CVD has to be “downstream”. (C) The single DAG consistent with the additional information that G ⊥ HDL | CRP, F = ∅: G → CRP → HDL → CVD.

3. Another way to get from the scenario in Figure 2.3B to a causal interpretation is to have the results of a randomized trial, in another population, of a medication that changes the levels of HDL (and only HDL), the aim of which is to reduce the incidence of CVD. We can think of this as a case where F = 1 for those who got the treatment and F = 0 for those who did not. If the results of the trial indicate that CVD incidence depends on the treatment (for instance, those taking the active treatment are less likely to experience CVD), then we get Figure 2.4A, provided we can extract CVD ⊥ F | HDL from the data, i.e. CVD depends on the treatment only through the HDL levels; if this does not hold, then the treatment is affecting CVD through both HDL and some other unobserved factors. If we also monitor the CRP levels and the treatment does not affect these, then we may conclude, based on the evidence above and that from the trial, that Figure 2.4B describes the associations between the variables and, given the causal assumptions we have made so far, can be interpreted as encoding causal relationships.

Fig. 2.4. The DAGs (A and B) resulting from the combination of the evidence from the observational and experimental regimes.


Depending on the set of conditional independences found in the data, different, and usually more complex, dependence structures can be found. The exposition above aims to show that the DAGs themselves are not causal, and can only be interpreted as such by making additional assumptions that contain the causal information. Further, this information is (a) non-trivial and (b) must be plausible in the context under investigation. For example, in point 3 above, we are drawing on information from another study to inform the results of the current study; we must have grounds to believe that the two populations are comparable (exchangeable) in order to combine the two pieces of evidence.
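As a check on the reasoning above, the following sketch simulates data from the structure of Figure 2.3C (G → CRP → HDL → CVD, all effect sizes hypothetical) and verifies the stated conditional independences with a partial-correlation test (redefined here so that the sketch is self-contained).

import numpy as np
from scipy import stats

def partial_corr(x, y, controls):
    # Partial correlation of x and y given the variables in `controls`
    Z = np.column_stack([np.ones(len(x))] + list(controls))
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return stats.pearsonr(rx, ry)

rng = np.random.default_rng(3)
n = 20_000
G = rng.binomial(1, 0.5, n).astype(float)   # gene governing CRP production
CRP = 0.8 * G + rng.normal(size=n)
HDL = 0.6 * CRP + rng.normal(size=n)
CVD = 0.7 * HDL + rng.normal(size=n)        # CRP influences CVD only through HDL

print("CVD vs CRP | HDL:", partial_corr(CVD, CRP, [HDL]))        # ~0, as in [2.5]
print("G vs HDL | CRP:", partial_corr(G, HDL, [CRP]))            # ~0
print("G vs CVD | HDL, CRP:", partial_corr(G, CVD, [HDL, CRP]))  # ~0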

Association vs. Intervention in the context of “genetic causality”

Experiments and interventions are the gold standard for causal inference. If we have a randomized controlled trial, then we can make inference about causal relationships between the treatment and the effect. When we are faced with observational data, we cannot make causal inference without making some strong assumptions, as we will probably have to contend with confounders. Where does this leave us when we consider genetic data? Essentially, when someone says “this gene mutation causes increased incidence of this disease”, they must have evidence to the effect that if someone were to intervene on that gene and mutate it, we would see increased incidence of the disease on average in the population under study. What type of evidence would be convincing? A randomized controlled trial with the mutation as a treatment would be, but this is not generally possible in a human population. Perhaps something similar in animals may be plausible, but then this might require additional assumptions (see examples of research on knock-out mice, for example on genes for hypertension). Finally, if enough data are available on the mutation in a suitably diverse population, a Mendelian randomization argument can be convincing, as it replaces the controlled randomization of a trial with natural randomization, thus breaking the link with non-genetic confounders. Mendelian randomization can also be used in an instrumental-variable framework to make causal inference about exposures (8).
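A minimal sketch of Mendelian randomization as an instrumental-variable analysis (all effect sizes hypothetical): because the genotype G is randomized at conception and affects the outcome Y only through the exposure X, the ratio of the G–Y slope to the G–X slope (the Wald ratio) recovers the causal effect that a naive regression misses.

import numpy as np

rng = np.random.default_rng(4)
n = 200_000
G = rng.binomial(2, 0.3, n).astype(float)   # allele count, randomized at conception
U = rng.normal(size=n)                      # unobserved confounder of X and Y
X = 0.4 * G + U + rng.normal(size=n)        # exposure raised by the variant
Y = 0.3 * X + U + rng.normal(size=n)        # true causal effect of X on Y is 0.3

def slope(y, x):
    # Simple least-squares regression slope of y on x
    return np.cov(y, x)[0, 1] / np.var(x, ddof=1)

print("naive regression of Y on X:", slope(Y, X))                     # biased upward by U
print("Wald ratio with G as instrument:", slope(Y, G) / slope(X, G))  # close to 0.3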

When none of the above arguments can be brought to bear and the basis for the statement is that a statistical association between the mutation and the disease has been found, then a statement about causality can be made only in a speculative sense. This can be useful as it can help determine what types of studies or experiments are needed in order to make more rigorous causal inference. However, more evidence is needed before serious policy decisions can be made.

DAGs in gene-environment interactions

Another issue in epidemiology and causality is that of gene-environment interactions (GEIs). DAGs can again offer a visual and also more rigorous description of GEIs. We look at Ottman's (9) models for GEI and cast them as conditional independences and corresponding DAGs.


First, some notation and assumptions:

1. Gene: G = 1 mutation, G = 0 normal.
2. Exposure: W = 1 exposed, W = 0 unexposed.
3. Disease: Y = 1 presence, Y = 0 absence.

4. Assume that there are no confounders between W and Y.

5. E(Y | W = w, G = g) denotes the expected disease status when W = w and G = g. Although we only consider a binary response here, this can be extended to a continuous disease measure.

In the DAGs below, a double box denotes a deterministic node, and a hollow arrowhead denotes a deterministic relationship. Note that a deterministic node is like a functional relationship: it has no associated uncertainty. However, it is not like a decision node (with a single box), as it depends not on an external decision one could make, but on the interaction we are considering.

Ottman Models

We include F in the exposition below only when it is necessary; when it is omitted, the expectation is conditional on F = ∅.

Model A

The genotype produces an exposure which can also be produced non-genetically. Ottman notes that this is not strictly an interaction model, so the DAG should be:

Fig. 2.5. Ottman Model A. F is the intervention, W is the environmental exposure, G is the genotype and Y is the health outcome of interest.

Conditional independences are:
— Y ⊥ F | (G, W),
— G ⊥ F,
and
— E(Y = 1 | G = 1, W = 1, F = ∅) − E(Y = 1 | G = 1, W = 0, F = ∅) > 0,
— E(Y = 1 | G = 0, W = 1, F = ∅) − E(Y = 1 | G = 0, W = 0, F = ∅) > 0,

that is, under no intervention, the natural presence of the exposure makes disease more likely, whether or not the genotype is present, and

— E(Y = 1 | G = 1, F = ∅) − E(Y = 1 | G = 1, F = 0) > 0.


If we intervene and set the exposure to zero (F = 0), then a subject with the genotype is less likely to get the disease than if she has the genotype and we do not intervene.

Model B

The genotype exacerbates the effect of the exposure, but there is no effect of the genotype on the unexposed. We introduce IW as the interaction indicator for W. It is a deterministic function of W, and we can see it as a “switch” for the influence of G on the W–Y relationship. Thus, when W = 1, IW is “on” (1) and allows G to interact with W and alter the effect on Y. When W = 0, IW is “off” (0) and blocks G's effect on Y. IW is a non-random node and takes the same values as W; the W subscript indicates that it depends only on W. The arrow between Y and W says that there is an association between Y and W irrespective of G.

Conditional (in)dependences:
— Y ⊥ G | IW = 0,
— Y ⊥̸ G | IW = 1,
— Y ⊥ F | (W, G),
— G ⊥ W.

Fig. 2.6. Ottman Model B.

The first conditional independence says that G does not affect Y when IW = 0 (which by definition means W = 0). The second (dependence) says that Y is affected by G when IW = 1. The final two are as before. So:

— E(Y = 1 | W = 1) − E(Y = 1 | W = 0) > 0 means that the exposure W has a positive effect.
— E(Y = 1 | G = 1, W = 0, IW = 0) − E(Y = 1 | G = 0, W = 0, IW = 0) = 0 means that there is no effect of G when W = 0.
— And finally, E(Y = 1 | G = 1, W = 1, IW = 1) − E(Y = 1 | G = 0, W = 1, IW = 1) > 0 means that G has a positive effect when W = 1.
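A small simulation of Model B (all risks hypothetical) reproduces this pattern: the exposure raises risk on its own, while the genotype alters risk only among the exposed, where the switch IW is on.

import numpy as np

rng = np.random.default_rng(5)
n = 400_000
G = rng.binomial(1, 0.2, n)               # genotype, independent of exposure (G ⊥ W)
W = rng.binomial(1, 0.5, n)               # exposure
I_W = W                                   # deterministic switch: "on" iff W = 1
risk = 0.05 + 0.10 * W + 0.20 * G * I_W   # G alters risk only when the switch is on
Y = rng.binomial(1, risk)

for g in (0, 1):
    for w in (0, 1):
        cell = (G == g) & (W == w)
        print(f"E(Y = 1 | G={g}, W={w}) ~", round(Y[cell].mean(), 3))
# The two W = 0 cells agree (~0.05): no effect of G on the unexposed;
# among the exposed, G = 1 raises risk from ~0.15 to ~0.35.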

Model C

In this scenario, exposure exacerbates the effect of the genotype but has no effect on persons with the low-risk genotype. We introduce IG, which behaves in a similar way to IW above, regulating G's interaction with W. The arrow between Y and G says that there is an association between Y and G irrespective of W.



Conditional (in)dependences:
— Y ⊥ W | IG = 0,
— Y ⊥̸ W | IG = 1,
— Y ⊥ F | (W, G),
— G ⊥ W.

The first conditional independence says that W does not affect Y when IG = 0; the second (dependence) says that Y is affected by W when IG = 1. The final two are as usual. So:
— E(Y = 1 | G = 1) − E(Y = 1 | G = 0) > 0 means that G has a positive effect.
— E(Y = 1 | G = 0, W = 1, IG = 0) − E(Y = 1 | G = 0, W = 0, IG = 0) = 0 means that W has no effect when G = 0.
— And finally, E(Y = 1 | G = 1, W = 1, IG = 1) − E(Y = 1 | G = 1, W = 0, IG = 1) > 0 means that there is a positive effect of W when G = 1.

Fig. 2.7. Ottman Model C.

Model D

Both exposure and genotype are required to increase risk.

Fig. 2.8. Ottman Model D.

Here, I acts as a “switch” for both W and G and is defined as follows: I = “on” (1) if and only if W = G = 1; otherwise I = “off” (0).

Conditional (in)dependences:
— Y ⊥ (G, W) | I = 0,
— Y ⊥̸ (G, W) | I = 1,
— Y ⊥ F | (W, G, I = 1),
— G ⊥ W.
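A corresponding sketch for Model D (all risks hypothetical): the switch I is the logical AND of W and G, so only the doubly positive cell departs from the baseline risk.

import numpy as np

rng = np.random.default_rng(6)
n = 400_000
G = rng.binomial(1, 0.2, n)
W = rng.binomial(1, 0.5, n)
I = ((G == 1) & (W == 1)).astype(int)     # I = "on" iff both W = 1 and G = 1
Y = rng.binomial(1, np.where(I == 1, 0.30, 0.05))

for g in (0, 1):
    for w in (0, 1):
        cell = (G == g) & (W == w)
        print(f"E(Y = 1 | G={g}, W={w}) ~", round(Y[cell].mean(), 3))
# Only the G = 1, W = 1 cell departs from the ~0.05 baseline.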
