

4.3. Search for selection at molecular level – case study

4.3.2. Multi-null-hypotheses method

Until recently, demonstrations of natural selection at the molecular level in the human genome were not numerous. By now, however, there are a number of examples (Bamshad et al. 2002, Gilad et al. 2002, Toomajian and Kreitman 2002, Wooding et al. 2002), with perhaps the most spectacular being the ASPM locus, a major contributor to brain size regulation in primates (Zhang 2003, Evans et al. 2004). Usually, the model used for detection of selection is the Wright-Fisher model of genetic drift with mutation. A significant departure from predictions under the null hypothesis of neutrality may provide evidence for an alternative hypothesis of selection. However, other alternatives exist that may cause departures from the null and mimic the effect of natural selection. Among these, the most important are population substructure and past changes of population size (Nielsen 2001). These influences may be difficult to disentangle from the effects of selection. This section describes the author's approach, which applies a series of nested null hypotheses instead of just one. Comparison of test outcomes against these nulls will, arguably, help eliminate genetic and population-related factors other than selection as causes of departures from strict neutrality.

In a series of papers (for example Bonnen et al. 2000, Bonnen et al. 2002, Trikka et al. 2002), scientists from Houston genetic centers investigated SNP haplotypes at four genes: ataxia telangiectasia mutated (ATM), human helicase RECQL, Bloom's syndrome (BLM), and Werner's syndrome (WRN). Since these genes are also implicated in human familial cancers and impaired DNA repair, they could potentially be subject to natural selection.

The ATM gene product is a member of a family of large proteins implicated in regulation of the cell cycle and response to DNA damage. Predominant abnormalities in this gene, which exhibits a remarkable diversity, involve point mutations or small rearrangements leading to splicing mutations (Teraoka et al. 1999). Li and Swift (2000) determined that patients heterozygous for splice site mutations have significantly longer survival than those homozygous for single truncating mutations. Some of the ATM mutations are responsible for ataxia telangiectasia, a recessive pleiotropic disorder clinically characterized by cerebellar ataxia, oculocutaneous telangiectasia, immunodeficiency, sensitivity to radiomimetic agents, and predisposition to cancer.

In the above-mentioned work (Bonnen et al. 2000, 2002), the analysis of haplotypes revealed reduced recombination and extensive linkage disequilibrium at the ATM locus. Because of this, association studies using ATM haplotypes have significant potential for detecting genetic backgrounds that contribute to disease. By comparing the detected SNPs with the corresponding sequences of great apes, our group discovered a bi-allelic polymorphism shared by humans and chimpanzees (Bonnen et al. 2000). Perhaps this polymorphism arose independently in the two species, but if it were the consequence of a polymorphism present in a common ancestor of humans and chimpanzees, the finding would imply the existence of very old mutations in ATM. The latter hypothesis is consistent only with overdominance at the ATM locus, because only this form of selection can preserve mutations for an almost arbitrarily long time (Slatkin and Rannala 2000). However, until the author's work, no tests of this hypothesis had been performed.

The remaining three genes analyzed are human DNA helicases. All polypeptides encoded by these genes share a central region of seven helicase domains (Siitonen et al. 2003). They are involved in many aspects of DNA metabolism, including transcription, accurate chromosomal segregation, recombination, and repair. Helicase-dependent DNA repair includes mismatch repair, nucleotide excision repair, and direct repair. Since genomes are subject to damage by chemical and physical agents in the environment, as well as by free radicals, endogenously generated alkylating agents and replication errors, the genetically determined effectiveness of repair is one of the important factors determining the fitness of the corresponding phenotype.

Bloom and Werner syndromes, which, like ataxia telangiectasia, are rare autosomal recessive disorders, have overlapping clinical features, of which a high predisposition to malignancies is the most remarkable (Siitonen et al. 2003). WRN plays an additional role in preventing premature aging via a mechanism suggested to be common to eukaryotes (Sinclair et al. 1997) and is involved in exonuclease activity (Huang et al. 1998). It has BLM-binding regions containing the N-terminal exonuclease domain, whose activity is inhibited by BLM binding. At the same time, the WRN helicase activity is not affected by BLM binding (Von Kobbe et al. 2002). Cells in Bloom syndrome exhibit hypermutability, including hyperrecombination between sister chromatids and homologous chromosomes (Yusa et al. 2004). Karow et al. (2000) emphasize the role of BLM as an antirecombinase in the suppression of tumorigenesis. Wu and Hickson (2003) proposed a similar mechanistic explanation of BLM-based tumorigenesis suppression. BLM-catalyzed dissolution of double Holliday junctions prevents sister chromatid exchange, and through suppression of ectopic recombination and crossing-over between homologous chromosomes the BLM product prevents loss of heterozygosity. Adams et al. (2003) concluded that BLM maintains genomic stability by promoting efficient repair DNA synthesis and thereby prevents double-strand break repair by less precise pathways.

Interestingly, Ellis et al. (1994) determined that a 6-bp ATCTGA deletion and a 7-bp TAGATTC insertion at nucleotide 2281 of the BLM cDNA form a mutation inherited from a founder of the Ashkenazi Jewish population; nearly all Ashkenazi Jews with Bloom syndrome carry this mutation, named blmAsh, identical by descent from this common ancestor. Cells derived from individuals suffering from either of the two syndromes show significant genomic instability caused by an increased level of chromosomal aberrations (Yamagata et al. 1998). RECQL, however, has not been related to any disease, and its functions other than DNA unwinding remain unknown. Geneticists from Houston (Trikka et al. 2002) performed a detailed linkage disequilibrium and recombination analysis for these helicases, with results not as extreme as for ATM. For BLM, we confirmed the founder haplotype of Ashkenazi Jews homozygous for blmAsh.

The range of functions crucial for survival enumerated above, as well as the characteristic patterns of polymorphism present in our samples, suggest that these genes may be under detectable selective forces. The simplest form that could be postulated, directional selection against deleterious variants, is unlikely because of the existence of old mutations at all loci. A form of balancing selection is more plausible. The current section tackles the problem of identifying selection and presents a methodology based on incorporating demography into null hypotheses.

To detect departures from the neutral model, the following statistics, described in detail in section 4.2, were used: Tajima's (1989) T (for uniformity, the nomenclature of Fu (1997) and Wall (1999) is followed), Fu and Li's (1993) F*, Kelly's (1997) ZnS, and Wall's (1999) Q.

The choice of the above tests was dictated by (a) the type of data at our disposal, and (b) the proposed methodology for verifying whether a detected departure from the neutral expectation can be considered a result of a given type of selection operating at the locus. Issues (a) and (b) are discussed in more detail below.

a) Since the SNPs analyzed come from intronic regions of the target genes, it was not possible to use McDonald-Kreitman (1991) type tests, which are based on the differences in ratios of nonsynonymous and synonymous mutation rates within and between species (resulting in polymorphism and divergence, respectively), although they are reported to be very powerful in detecting selection and independent of population demographic effects (Nielsen 2001). Similar reasons excluded the application of Akashi's (1995) test, as well as Nielsen and Weinreich's (1999) test, in which the ages of nonsynonymous and synonymous mutations are estimated and compared with predictions of the neutral model.

The HKA test of Hudson, Kreitman and Aguade (1987) was not used because chimpanzee sequences were not available for all introns containing our SNPs. A test using an interspecific divergence rate calculated from only the few introns that could be obtained by a BLAST search of the databases of the Chimp Sequencing Project was considered potentially biased.

b) Natural selection is not the only genetic force causing departures from the predictions of the neutral Wright-Fisher model in its usual form, i.e. assuming a panmictic population of constant size. Since neither of these assumptions strictly holds for actual human demography, we propose to incorporate demographic effects into the null hypotheses. A departure from the modified nulls can then be attributed to selection. One delicate point of this approach is that only a general outline of past human demography is known. Nevertheless, it is possible to assume a demography that is more realistic than that of the classical Wright-Fisher model and, at the same time, conservative. Conservative means that it is more difficult to reject the null under the assumed demography than it would have been under the actual, unknown demography. This implies that conservative parameter values have to be used for the growth and migration rates of an expanding and substructured human population. These parameter values differ for different types of selection. This is why it is crucial to know, before proposing modified null hypotheses, whether the genealogy implied by the data is similar to that caused by (i) growth, deleterious selection or positive selective sweeps, or (ii) shrinkage, substructure or balancing selection. As presented in greater detail in section 4.2, the tests that can reliably assign the pattern of departure to (i) or (ii) are those belonging to Fu's (1997) F’(r,r’) class. Tajima's T and Fu's F* are the two most extreme cases of such tests: F’(0, 1) and F’(0, ) respectively. The first relies on estimates of θ = 4Nμ based on the average number of nucleotide differences and on the number of segregating sites; the second compares the number of mutations located on external and internal branches of the genealogy.

A similar idea of comparing the lengths of old and recent branches of the genealogy is incorporated in Kelly's ZnS statistic, which is based on the average linkage disequilibrium at the locus. However, this statistic produces similar, inflated patterns both for selective sweeps with recombination and for balancing selection. Wall's W and Q tests, based on the number of adjacent congruent segregating sites, employ a similar principle. The latter pair of tests is reported (Wall 1999) to be especially well suited to the detection of balancing selection, which may be suspected to operate on genes associated with disease and presenting a polymorphism with an excess of old mutations (test Q is preferred over W if recombination is present).
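As a rough illustration of the first of these statistics, the sketch below computes Tajima's T from a small matrix of 0/1 haplotypes, contrasting the estimate of θ from the average number of pairwise differences with the estimate from the number of segregating sites. The function name `tajima_t` and the toy samples are hypothetical; the constants follow the standard formulas of Tajima (1989) and are not taken from the software used in this study.

```python
from math import sqrt

def tajima_t(haplotypes):
    """Tajima's (1989) statistic for a list of equal-length 0/1 strings."""
    n = len(haplotypes)
    # Keep only segregating sites (columns where both alleles occur).
    columns = [col for col in zip(*haplotypes) if len(set(col)) > 1]
    s = len(columns)
    if s == 0:
        raise ValueError("no segregating sites in the sample")
    # Estimator 1: average number of pairwise nucleotide differences.
    pairs = n * (n - 1) / 2
    pi = sum(col.count("1") * col.count("0") for col in columns) / pairs
    # Estimator 2: number of segregating sites scaled by a1.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Normalized difference between the two estimators of theta.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Toy samples (hypothetical): intermediate-frequency sites vs. singletons.
old_like = ["1100", "1100", "0011", "0011"]
young_like = ["1000", "0100", "0010", "0001"]
print(tajima_t(old_like), tajima_t(young_like))
```

In this toy case the sample dominated by intermediate-frequency variants gives a positive value and the sample of singletons a negative one, which matches the direction of departure that the text associates with an excess of old versus young mutations.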

In order to exclude genetic forces other than selection as sources of significant test outcomes, we applied four different null hypotheses:

H00, panmictic population, with population size constant in time,

H01, panmictic population, with population size increasing exponentially 10-fold over a period of 5,000 human generations, to reach the present effective population size Nend = 100,000,

H02, substructured population, growing as in H01, composed of 4 demes with a split 5,000 generations ago and between-deme migration rate m × Nend = 100,

H03, demography as in H02, but with recombination of estimated intensities (Table 6).
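The per-generation parameters implied by these nulls can be worked out with a few lines of arithmetic. The sketch below is only an illustration: it assumes that the migration parameter of H02 is the product of m and Nend, and that the population size was constant before growth began; all names are hypothetical.

```python
from math import exp, log

N_END = 100_000       # present effective population size (H01)
GROWTH_FACTOR = 10    # total increase of the population size (H01)
GENERATIONS = 5_000   # duration of growth and time since the split
M_TIMES_N_END = 100   # migration parameter of H02, read here as m * N_END

def population_size(t_ago):
    """Effective size t_ago generations before present under H01."""
    rate = log(GROWTH_FACTOR) / GENERATIONS
    # Assumption: the size is held constant before growth started.
    t = min(t_ago, GENERATIONS)
    return N_END * exp(-rate * t)

growth_rate = log(GROWTH_FACTOR) / GENERATIONS   # per generation
migration_rate = M_TIMES_N_END / N_END           # per generation

print(f"growth rate per generation: {growth_rate:.6f}")
print(f"size 5,000 generations ago: {population_size(5_000):.0f}")
print(f"migration rate m: {migration_rate}")
```

Under these assumptions the ancestral size is 10,000, the growth rate is ln(10)/5000 per generation, and m equals 0.001.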

The influence of genetic forces assumed in the null hypotheses on site frequency spectra of the ATM gene for African Americans, predicted under selective neutrality, is presented in Figure 2.

Fig. 4.3:2. Illustration of the influence of the null hypothesis on the expected frequencies of segregating sites of types 1 to n/2

A segregating site is said to be of type i if it has i and n-i copies of its two variants in a sample; therefore, the rarer the less frequent variant at a segregating site, the closer to one is its type (reaching one for singletons). The charts in Fig. 2 present simulated frequencies for a sample composed of n = 142 sequences 13.5 kbp long, conditioned on 13 segregating sites (corresponding to the ATM sequence for the AfAm population), assuming selective neutrality under null hypotheses (a) H00, (b) H01, (c) H02 and (d) H03. Observe the excess of rare segregating sites and the reduction of frequent segregating sites under H01 compared to H00. Such a reduction is characteristic of samples corresponding to all considered genes and populations (results not shown). H02 and H03 result in a slight excess of rare segregating sites over H00. Since the neutral site frequency spectrum changes with the null hypothesis, so should the critical values of tests based on the shape of such spectra (for example T or F*). The horizontal axis denotes the type of the segregating site, while the vertical axis shows the relative frequency of sites of a given type. In the charts, vertical bars indicate the average frequencies over all simulations, whereas the upper and lower horizontal bars indicate the maximum and minimum values of these frequencies, respectively. Note that the lower horizontal bars, for all types of segregating sites except the rarest, indicate frequency zero and are therefore hardly visible.
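The definition of site types above can be made concrete with a short sketch that tabulates the relative frequency of segregating sites of types 1 to n/2 in a sample. The function names and the toy haplotypes are hypothetical, and the code handles only bi-allelic sites.

```python
from collections import Counter

def site_types(haplotypes):
    """Type of each segregating site: copies of its less frequent variant."""
    types = []
    for column in zip(*haplotypes):
        counts = Counter(column)
        if len(counts) == 2:  # bi-allelic segregating site
            types.append(min(counts.values()))
    return types

def type_frequencies(haplotypes):
    """Relative frequency of segregating sites of types 1 .. n // 2."""
    n = len(haplotypes)
    types = site_types(haplotypes)
    if not types:
        raise ValueError("no segregating sites in the sample")
    tally = Counter(types)
    return [tally[i] / len(types) for i in range(1, n // 2 + 1)]

# Toy sample (hypothetical) of n = 4 sequences with 3 segregating sites.
sample = ["100", "110", "010", "011"]
print(site_types(sample))        # one type per segregating site
print(type_frequencies(sample))  # share of sites of type 1 and type 2
```

Averaging such vectors over many simulated samples would give bar heights of the kind shown in the charts.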

For detection of balancing selection, H01 and H02 are less conservative than H00, although they are still conservative in the sense of either preserving the excess of older mutations, or reducing the number of younger mutations, or both, for feasible scenarios of human population history. The reason is that the actual increase of the human population size was most likely larger than the 10-fold growth over 5,000 generations. This makes H01 conservative if the direction of departure from neutrality is towards an excess of old mutations, a reduced number of young mutations, or both (Fu 1996, Fu 1997).

Since H02 is always more conservative in the sense discussed above than H01, if H01 is conservative, so must be H02. H03 assumes the same demography as H02, but takes recombination into account. It is therefore the most conservative null and includes the maximum number of genetic forces. Hence, departures from H03 should be interpreted as most likely caused by balancing selection. The results of testing for all loci, populations and null hypotheses are presented in Tables 8, 9, 10 and 11 for tests T, ZnS, F* and Q, respectively.

Outcomes of tests T and F* against H00 are similar and significantly positive for ATM and RECQL. Such outcomes indicate that the polymorphism at the loci considered exhibits an excess of old mutations, a deficit of young mutations, or both, compared to the neutral Wright-Fisher model (Fu 1997). At the same time, WRN and BLM do not show significant deviation from neutrality, although they deviate in the same direction as ATM and RECQL.

Site by site comparison of human SNPs with corresponding ape sequences confirms the existence of old mutations in all loci.

For all helicases, a sample composed of 10 chromosomes from 2 chimpanzees, 1 bonobo and 2 gorillas indicates that the sites polymorphic in humans are monomorphic among apes, so the common ape haplotype can be treated as the ancestral sequence. For all genes considered, this ancestral haplotype is present in the human population at low frequencies.

Some of the above-mentioned SNPs, for example IVS15+33444t-c in RECQL, represent young mutations with mutated nucleotides present at very low frequencies, but others, such as IVS1-30638g-c or IVS19-30329g-t in the same gene, include derived mutations observed in the second, third and fourth most frequent haplotypes. Such mutations, and especially those present in the most common haplotypes, like IVS1-8213g-a in WRN or IVS1-20561t-c in BLM, are frequent and therefore likely to be old, consistent with the positive outcomes of Fu's F’(r,r’) tests.

Table 4.3:8 Significance of Tajima's T test for various null hypotheses. Dark, significant for 3-4 populations. Light, non significant for 1-2 populations. Unshaded, non significant for 3-4 populations

| Gene | Population | Value | T (H00) | T (H01) | T (H02) | T (H03) |
|---|---|---|---|---|---|---|
| ATM | AfAm | 2.42 | * | *** | * | * |
| ATM | Caucasian | 3.48 | *** | *** | *** | *** |
| ATM | Asian | 2.55 | * | *** | ** | ** |
| ATM | Hispanic | 3.20 | ** | *** | ** | ** |
| RECQL | AfAm | 2.83 | * | *** | * | * |
| RECQL | Caucasian | 3.10 | ** | *** | ** | ** |
| RECQL | Asian | 2.65 | * | *** | * | * |
| RECQL | Hispanic | 2.93 | ** | *** | ** | ** |
| WRN | AfAm | 0.79 | NS^a | * | NS | NS |
| WRN | Caucasian | 1.26 | NS | * | NS | NS |
| WRN | Asian | 1.36 | NS | * | NS | NS |
| WRN | Hispanic | 1.10 | NS | * | NS | NS |
| BLM | AfAm | 2.06 | NS | *** | * | * |
| BLM | Caucasian | 2.50 | * | *** | ** | ** |
| BLM | Asian | 1.78 | NS | ** | NS | NS |
| BLM | Hispanic | 1.87 | NS | ** | NS | NS |

***: p < 0.001, **: 0.01 > p ≥ 0.001, *: 0.05 > p ≥ 0.01, a: NS (non significant): p > 0.05.
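The significance codes in the table footnote can be attached to a test outcome as sketched below. This is only an illustration under stated assumptions: the p-value is taken as the share of values simulated under a given null hypothesis that are at least as large as the observed one (a one-sided comparison, which the text does not specify), the boundary p = 0.05 is lumped with NS, and the function names and toy numbers are hypothetical.

```python
def empirical_p_value(observed, simulated):
    """Share of simulated null values that are >= the observed value."""
    if not simulated:
        raise ValueError("no simulated values supplied")
    hits = sum(1 for value in simulated if value >= observed)
    return hits / len(simulated)

def significance_code(p):
    """Map a p-value to the codes used in the table footnote."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NS"  # assumption: p = 0.05 is treated as non significant

# Toy null distribution (hypothetical values, not results of this study).
null_values = [-1.2, -0.4, 0.1, 0.6, 0.9, 1.3, 1.8, 2.1, 2.6, 3.0]
p = empirical_p_value(2.5, null_values)
print(p, significance_code(p))
```

Repeating the same comparison with null distributions simulated under H00, H01, H02 and H03 would yield one code per null hypothesis, as in the rows of the table.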

The excess of old mutations is also observed in ATM, and furthermore this locus contains a bi-allelic trans-polymorphism shared by humans and chimpanzees at SNP IVS62+424g-a (shaded nucleotide in Table 7; note also the framed nucleotides A in gorilla sequences, different from both the chimp and human variants). If this between-species polymorphism is inherited from a common ancestor, the mutation must be several million years old (only balancing selection can preserve such an old mutation). Even if it arose independently in humans and chimpanzees, the comparison of the most probable ancestral sequence, shared by chimp and bonobo, with human haplotypes indicates old mutations with a pattern similar to IVS1-30638g-c or IVS19-30329g-t in RECQL.

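Table 4.3:9 Significance of Kelly's ZnS test for various null hypotheses. The meaning of shaded regions is the same as in Table 8

| Gene | Population | Value | ZnS (H00) | ZnS (H01) | ZnS (H02) | ZnS (H03) |
|---|---|---|---|---|---|---|
| ATM | AfAm | 0.29 | NS^a | * | NS | * |
| ATM | Caucasian | 0.47 | * | ** | ** | ** |
| ATM | Asian | 0.49 | * | ** | * | * |
| ATM | Hispanic | 0.45 | * | ** | * | * |
| RECQL | AfAm | 0.24 | NS | * | NS | NS |
| RECQL | Caucasian | 0.36 | NS | * | * | * |
| RECQL | Asian | 0.52 | * | ** | * | * |
| RECQL | Hispanic | 0.32 | NS | * | NS | NS |
| WRN | AfAm | 0.06 | NS | ?^b | NS | NS |
| WRN | Caucasian | 0.10 | NS | * | NS | NS |
| WRN | Asian | 0.18 | NS | * | NS | NS |
| WRN | Hispanic | 0.12 | NS | * | NS | NS |
| BLM | AfAm | 0.12 | NS | * | NS | NS |
| BLM | Caucasian | 0.18 | NS | * | NS | NS |
| BLM | Asian | 0.17 | NS | * | NS | NS |
| BLM | Hispanic | 0.15 | NS | * | NS | NS |

**: 0.01 > p ≥ 0.001, *: 0.05 > p ≥ 0.01, a: NS (non significant): p > 0.05, b: ? (borderline): p = 0.05.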
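For orientation, the values in the "Value" column are averages of pairwise linkage disequilibrium. The sketch below computes such an average as the mean squared allelic correlation over all pairs of segregating sites, following the standard definition of Kelly's (1997) ZnS. The function name and toy haplotypes are hypothetical, and the code is not the implementation used in this study.

```python
from itertools import combinations

def kelly_zns(haplotypes):
    """Mean squared correlation (r^2) over all pairs of segregating sites."""
    n = len(haplotypes)
    # Keep bi-allelic segregating sites coded as "0"/"1".
    cols = [col for col in zip(*haplotypes) if len(set(col)) == 2]
    if len(cols) < 2:
        raise ValueError("at least two segregating sites are required")
    total = 0.0
    for a, b in combinations(cols, 2):
        p_a = a.count("1") / n
        p_b = b.count("1") / n
        p_ab = sum(1 for x, y in zip(a, b) if x == "1" and y == "1") / n
        d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
        total += d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    n_pairs = len(cols) * (len(cols) - 1) / 2
    return total / n_pairs

# Toy samples (hypothetical): complete association vs. no association.
print(kelly_zns(["11", "11", "00", "00"]))
print(kelly_zns(["11", "10", "01", "00"]))
```

Two perfectly associated sites give the maximum value of one, while sites that combine freely give zero; the observed values in the table lie between these extremes.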

The phylogenetic tree (Fig. 3) reveals that the most ancient human haplotypes 5 and 13 are very rare, and the most frequent haplotypes 2 and 31, with respective frequencies of 31% and 28%, belong to two separate clades. Given this indication of an excess of old mutations, it is possible to understand why the outcomes of T and F* against H01 are significant for all loci. However, the outcomes of testing against H02 and H03 are more interesting, as these nulls incorporate not only growth but also substructure and, in the case of H03, the effect of recombination. For both these hypotheses T and F* are significant for ATM and RECQL, and F* is also significant for BLM.

In Fig. 3, the first number indicates the reference number of the haplotype and the second (if present) its frequency in percent (if absent, the frequency is less than 1%). The number in parentheses gives the rank of the haplotype according to its global frequency in the human population. For example, the uppermost haplotype, number 2, has a frequency of 31% and is the most frequent haplotype.

The pattern found in Kelly's ZnS test outcomes (Table 9) is essentially the same as that in the F’(r,r’) tests, yet the overall power seems lower. Still, the ATM and RECQL outcomes for the most reliable nulls are significant, although the significance is more evident in the case of ATM. BLM and WRN are both non-significant. For the ATM and RECQL loci, Wall's Q outcomes (Table 11) are on the boundary of significance against H03, whereas for WRN and BLM they are non-significant even against H01.

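Table 4.3:10 Significance of Fu's F* test for various null hypotheses. The meaning of shaded regions is the same as in Table 8

| Gene | Population | Value | F* (H00) | F* (H01) | F* (H02) | F* (H03) |
|---|---|---|---|---|---|---|
| ATM | AfAm | 2.10 | * | *** | ** | ** |
| ATM | Caucasian | 2.60 | ** | *** | *** | *** |
| ATM | Asian | 0.96 | NS^a | * | NS | NS |
| ATM | Hispanic | 2.47 | * | *** | ** | ** |
| RECQL | AfAm | 1.68 | NS | ** | * | * |
| RECQL | Caucasian | 2.30 | ** | *** | ** | ** |
| RECQL | Asian | 1.52 | NS | ** | * | * |
| RECQL | Hispanic | 2.23 | * | *** | * | * |
| WRN | AfAm | 0.21 | NS | NS | NS | NS |
| WRN | Caucasian | 1.58 | NS | * | * | * |
| WRN | Asian | 1.47 | NS | * | * | NS |
| WRN | Hispanic | 0.05 | NS | NS | NS | NS |
| BLM | AfAm | 1.72 | NS | *** | * | * |
| BLM | Caucasian | 1.90 | * | *** | ** | ** |
| BLM | Asian | 1.58 | NS | ** | * | * |
| BLM | Hispanic | 1.65 | NS | ** | * | * |

***: p < 0.001, **: 0.01 > p ≥ 0.001, *: 0.05 > p ≥ 0.01, a: NS (non significant): p > 0.05.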

Nielsen (2001) suggests being conservative in conclusions about selection based on tests using only haplotype spectrum data, because other alternative hypotheses lead to similar results. The main alternative is population growth, which can easily be mistaken for selection. These concerns, which are especially important in the case of selective sweeps, as they lead to an excess of young mutations (Fu 1997), are not directly applicable to this study, in which the samples display an excess of old mutations. Furthermore, the concerns of Nielsen are implicitly based on the assumption that testing is performed against H00, i.e. the classical Wright-Fisher model of neutral genetic drift in a panmictic constant-size population. In this study, however, tests were performed not only against H00, but also against other null hypotheses formulated in a conservative way. If conservative rates of growth and migration have been chosen, then demographic factors should not obscure the inferences.

Table 4.3:11 Significance of Wall's Q test for various null hypotheses. The meaning of shaded regions is the same as in Table 8

Gene Population Value Q (H00) Q (H01) Q (H02) Q (H03)

AfAm 0 NSa NS NS NS

ATM Caucasian 0.36 NS * * *

Asian 0.29 NS * NS ? b
