• Nie Znaleziono Wyników

6.2.1. Single sample

Similarly to the analyses on data from Roadmap Epigenomics (chapter 5), to assess the obtained results I looked how well they match characteristics of the peaks we expect to observe in a given experiment. In the case of H3K27me3 ChIP-seq we expect long

Figure 6.1: Example coverages from a single patient (DA01). From the top: input sample;

three ChIP-seqs, against H327me3, H3K27ac and H3K27me3; ATAC-seq; genes in this region (specically, it is a approximately 800 kb long fragment of chromosome 7). Note that the scale for input and H3K27me3 ChIP-seq is 0-30, while for the rest it's 0-70.

domains, up to millions of bp. Furthermore, H3K27me3 is a repressor, so we expect genes within its domains to have lowered expression compared with the genes outside of the domains.

In gure 6.2 I present the distribution of lengths of peaks obtained by three peakcallers:

MACS, SICER and HERON. HERON was run in three settings - with Gaussian distri-bution (with and without control le) and with negative binomial distridistri-bution (without control le). MACS and SICER were run with default parameters. As previous tests have shown that adding "broad" option to MACS to make it look for longer peaks does not inuence results too much I decided not to include this version in the results presented in this chapter.

As one can see, again HERON gives longer peaks than other peakcallers in most of the cases. The only exception are results obtained with negative binomial distribution for patient DA05; here majority of peaks are similar to MACS peaks in terms of their lengths (there are some outliers, however, with length up to 30 Mb). In all the other cases, including results for patient DA05 obtained using Gaussian distribution, HERON peaks are longer than those given by MACS or SICER, both in terms of mean and median.

Again, SICER gives much longer peaks than MACS, but not as long as HERON.

I also checked the expression of genes that overlap with discovered peaks. In gure 6.3 I present the results, aggregated from all the patients. One can see that all the peakcallers seem to return reasonable results - median expression of genes overlapping with a peak is much lower than median expression of all the genes, and that holds for every peakcaller.

I also checked how it looks for separate patients; in gure 6.4 I present six examples.

Figure 6.2: Length of H3K27me3 peaks discovered by MACS, SICER and HERON with dierent distributions. On the left, the whole boxplots are shown together with outliers.

On the right, x axis is zoomed in to range (0, 50 kb); for readability, outliers are not shown here.

Figure 6.3: Median expression of genes overlapping with peaks discovered by HERON, MACS and SICER. HERON G means HERON with Gaussian distribution; HERON NB means HERON with negative binomial distribution; HERON G+C means HERON with Gaussian distribution and with a control le.

Again, results of HERON with negative binomial distribution for patient DA05 are not satisfactory: expression of the genes within those peaks is even higher than expression of all the genes. In all the other cases (including those not shown in the gure for read-ability's sake) all the peakcallers result in peaks that seem to correlate with signicantly lowered expression; the dierence is statistically signicant according to Mann-Whitney test (at the level α < 0.05). Interestingly, unlike in the case from the previous chapter (see gure 5.3), MACS gives very similar results to HERON in terms of expression of the genes that overlap with peaks.

Additionally, I performed these analyses specically for transcripts that dier between patients. I wanted to see if we could catch dierences between patients. I took only those transcipts that have a peak in at least 2 patients and don't have a peak in at least 2 patients, i.e. transcripts that, according to peakcalling, have dierential pattern of H3K27me3 binding. Then, for every patient I checked the expression of each of the transcripts. For every peakcaller, for every transcript I calculated its mean expression in patients where the transcript overlaps with some peak of a given peakcaller, and mean expression in patients where there is no peak discovered that would overlap the transcript.

Next, I calculated the mean over all the selected transcripts of these two sets of values.

I present the results in gure 6.5. As one can see, the expression of transcripts with a peak is lower than those without a peak, and again, it holds true for every peakcaller.

The dierence between expressions is signicant according to Mann-Whitney test1. It is the smallest for HERON with negative binomial distribution, but still signicant.

1The p-values, as estimated using R, were equal: MACS: 1.47−186SICER: 1.043−154, HERON with negative binomial distribution: 5.166−68, with Gaussian distribution: 3.575−155, with Gaussian distribu-tion and with a control le: 1.392−139.

Figure 6.4: Expression of genes (in RPKM) overlapping with peaks discovered by HERON, MACS and SICER for selected example patients. On the left, boxplots for the whole range are shown, together with outliers. On the right, for readability, x axis is zoomed in to range (0, 5) and the outliers are hidden.

Figure 6.5: Median expression of transcripts with and without a H3K27me3 peak (in RPKM). In this plot, only the transcripts for which peak presence diers between the patients are shown.

Another way of assessing results is to see how reproducible they are between samples.

While we can expect some variability between patients, we also expect some peaks to be common. The precise borders might be dierent, but here for the sake of this analysis I assume that if two peaks in two samples overlap with each other by at least one nucleotide, it means they represent the same peak and I call this peak common between the two samples. In gure 6.6 I present how many peaks discovered by each of the tested peakcallers are present in one patient, how many are in two patients etc. In panel A, the raw counts of peaks are presented; this way of visualisation, however, can be misleading, as the peakcallers called drastically dierent numbers of peaks: for example MACS called a total of 172 971 distinctive peaks (if a peak is common between a couple of patients, i.e. in those patients peaks were called that overlap with each other, I count it as one peak), while HERON with negative binomial distribution called only 26 073 distinctive peaks. To take that into account, I additionally calculated normalised counts by dividing each count by the total number of peaks called by the given peakcaller. The results are presented in panel B. It can be seen that for every peakcaller most peaks are present only in one patient, which may be due to the dynamic character of the H3K27me3 modication. For MACS the fraction systematically drops; only 0.0078 (1 344 out of 172 971) of the peaks are present in all 12 patients. For HERON, however, the fraction drops at the beginning, but there is a raise at the end of the curve; for example, for HERON with Gaussian distribution 0.07 peaks (2 771 out of 41 035) are present in all the patients. This suggest HERON manages to discover signicantly more reproducible peaks than MACS. SICER seems to give better results than MACS (in terms of peaks'

Figure 6.6: Reproducibility of peaks between patients. The x axis represents number of patients, denoted "x"; the y axis represents how many peaks are present in x patients, according to the tested peakcallers. In (A), y axis simply represents absolute number of peaks; for example, there are around 100 000 peaks discovered by MACS that are present only in 1 patient, around 44 000 discovered by SICER in 1 patient etc. To take into account the fact that dierent peakcallers called dierent number of peaks - especially MACS called much more peaks than other peakcallers - in panel (B) every number of peaks is represented as a fraction of all the peaks discovered by the given peakcaller;

for example, 0.6 of peaks discovered by MACS are present in 1 patient etc. The data presented in both panels is similar, but in (B) the numbers are normalised to facilitate comparison between the peakcallers.

reproducibility), but still worse than HERON; for example, 0.06 of SICER's peaks are present in 11 or 12 patients, while for HERON it is 0.1, 0.14 and 0.16 (for Gaussian distribution, Gaussian with control le and negative binomial distribution, respectively).

HERON's advantage might lie in the fact that it discovers long peaks, so they overlap

with each other more easily than short MACS peaks and so they can be more easily considered common between some samples. To take that into account I performed a similar analysis, but with transcripts instead of peaks. For every patient, a transcript is considered to have a H3K27me3 peak if it overlaps with some peak at least by one nucleotide. Next, I plotted how many transcripts overlap with a peak in one patient, in two patients etc. The results are shown in gure 6.7A. They are very similar to the ones on gure 6.6; in particular, when it comes to transcripts that overlap with a H3K27me3 domain in majority of patients, MACS still discovers signicantly less of them compared with HERON. Interestingly, here HERON with negative binomial distribution seems less reliable; we can observe a spike at "11 patients", followed by a drop. Because for patient DA05 results obtained with negative binomial distribution were far from satisfactory as far as the length of peaks and expression of genes within them are concerned, I decided to check how the plot would look like if we exclude this patient from the analysis. It is shown in panel B of gure 6.7; as expected, it looks much better. Removing any other sample from the analysis didn't x the issue, that is, there was still a peak at the second to last sample and a drop at the last (results not shown here). It suggests that while theoretically the negative binomial distribution can t the data better, in practice it is less stable than Gaussian and might lead to unsatisfactory results.

6.2.2. Multiple samples

If we're only interested in general landscape in gliomas, not in individual trends in every patient, we can provide many signals at once, which might be especially benecial for poorly enriched data. I checked how providing many samples at once inuences the obtained results.

In gure 6.8 I present the results obtained on ChIP-seq against H3K27me3 modication, using Gaussian distribution. In the top panel, the mean length of the obtained peaks is shown and in the bottom one - the mean expression of genes overlapping the identied peaks. As far as the length is concerned, the results are similar to the ones presented in the previous chapter (see gure 5.4); peaks called using multiple les tend to be much shorter than those called using a single le. As far as the expression of genes is concerned, however, the results are dierent. Genes overlapping peaks called using multiple les tend to have higher expression than those overlapping peaks called with a single le.

To verify the signicance of the dierence between the sets of expressions, I decided to perform multiple one-sided t-tests and Mann-Whitney tests. For every pair of results

"results on patient DA0* vs results on all patients DA", "results on patient GB0*/PG0*

vs results on all patients GB-PG", "results on any patient vs results on all patients" I performed the tests verifying a null hypothesis "there is no dierence between means of these expressions" against an alternative hypothesis "expressions in the rst (/second) set are on average higher". I present the obtained results in table 6.2. According to Mann-Whitney test, in every case expression of genes that overlap peaks identied using a single sample is signicantly lower than of those overlapping peaks identied using multiple samples. The p-values obtained using t-test are more variable: in 15 out of 24

Figure 6.7: Reproducibility of transcripts that overlap with a peak. The x axis represents number of patients, denoted "x"; the y axis represents how many transcripts have a H3K27me3 peak in x patients. As here the scale is similar for all the peakcallers, I did not include the version with normalised counts, where y axis represents fraction of transcripts instead of raw number. In (A), all the patients are included. In (B) patient DA05 is removed.

cases mean expression calculated using the results of peakcalling on one le is lower than the one calculated using the results of peakcalling on multiple les, while in one case it is the other way around. Interestingly, for one patient (DA03) mean expression based on 1-sample peakcalling is lower than the one based on peakcalling on all samples, and at the same time it is higher than the one based on peakcalling on all DA samples, and both these dierences are signicant according to the t-test (p-value < 0.05).

It is worth noting that in gure 6.8 for simplicity I included one bar for every peakcalling run on multiple samples; for example the mean expression denoted "DA" is a mean ex-pression of all the genes that overlap the peaks identied using all DA samples, and the

Figure 6.8: Comparison of peakcalling on single sample and multiple samples. In the top plot, mean peak length is shown; in the bottom one, mean expression of genes within the peaks (in RPKM). In both barplots, the rst 12 bars represent results obtained on a single sample; the rst 5 are for DA samples (light red), and the following 7 for GB and PG samples (light blue). The last three bars represent results obtained on multiple samples: on all the DA samples (red), all the GB and PG samples (blue), and on both these subsets (purple), respectively.

measure of expression is taken from all the patients (so we average over much bigger sam-ple of expression measures than in the case of a single patient analysis). In the statistical tests presented in table 6.2 for every comparison I used only the expressions measured in the specic patient; for example, for comparison "DA01 vs DA" I compare expres-sions of genes overlapping the peaks identied when running HERON only on patient DA01 with expressions of genes overlapping the peaks identied by providing HERON with all the DA samples, and all the expression measures are from RNA-seq experiment conducted on DA01 patient. This is why means and medians of expression dier for

"DA", "GB-PG" and "all" cases; while in every comparison mean "DA" expression is calculated for the same genes, the patient whose expressions are taken into account is dierent. I additionally ran the same statistical tests using the set of all expressions from all the patients for "all" variant, from all DA patients for "DA" variant and from all GB patients for "GB" variant, i.e. the same sets of expressions that were used in gure 6.8.

As the results yielded similar conclusions, I do not present them here. In the previous chapter (see gure 5.4) there was no confusion like that, as for every tissue we only had one measure of expression, even if we had multiple ChIP-seq experiments.

The dierence between the results obtained in the previous chapter and here might stem from the characteristics of the analysed samples. In chapter 5 healthy tissues from Roadmap Epigenomics project were analysed; here we analyse cancer tissues, specically from glioma tumours. The variability of histone modication patterns, both within and between samples, is much higher in cancer samples than in the healthy ones. Additionally, the higher the grade of the glioma the more variability we can expect. It means that for the cancer samples the subset of genomic regions that are covered by H3K27me3 domains in almost all cells in all the analysed samples is much smaller and hence harder to identify than for Roadmap Epigenomics samples. Furthermore, gliomas are known to have an aberrant gene expression; it might be that due to malfunctions in some of the regulatory mechanisms expression of some genes that are indeed located within a H3K27me3 domain is atypically high. In such case using gene expression to assess hypothetical localisation of H3K27me3 peaks would be misleading.

It is worth mentioning here that providing many similar samples might cause issues with algorithm's convergence when we want to use Gaussian distribution with full covariance matrix. For example, when I ran HERON on all the ATAC-seq samples (24 in total) it didn't manage to converge, as the samples were too correlated with each other and it caused the covariance matrix to not be full rank. In such case the distribution is degenerate and does not have a density. This can be easily overcome by using a diagonal covariance matrix instead of a full one, which is why this is the default option in HERON.

The issue seems to appear when the samples are strongly correlated with each other; for example in the case of ATAC-seqs, minimal correlation between the samples was equal to 0.97. In the case of ChIP-seqs the correlation varied more. Obviously the samples were still strongly correlated (which is why we run peakcalling on all of them at once, hoping it will help discover peaks repetitive between them), but they were far from being identical;

for example, in the case of H3K27me3 ChIP-seq minimal correlation was equal to 0.51.

peakcalling

on 1 sample peakcalling on multiple samples

t-test

p-value Mann-Whitney p-value patient set mean median mean median H1: > H1: < H1: > H1: <

DA01 all 2.476 0.056 2.587 0.079 0.562 0.438 1 < 2.2e-16

DA 2.218 0.063 0.363 0.637 1 3.2e-6

DA03 all 2.353 0.05 3.218 0.084 0.999 0.001 1 < 2.2e-16

DA 1.726 0.065 0.003 0.997 1 < 2.2e-16

DA04 all 2.226 0.042 3.149 0.105 0.995 0.005 1 < 2.2e-16

DA 2.214 0.087 0.486 0.514 1 < 2.2e-16

DA05 all 1.09 0.037 1.756 0.068 0.996 0.004 1 < 2.2e-16

DA 1.412 0.053 0.89 0.11 1 < 2.2e-16

DA06 all 1.818 0.042 3.503 0.071 1 1.6e-05 1 < 2.2e-16

DA 1.862 0.056 0.548 0.452 1 < 2.2e-16

GB01 all 2.549 0.061 4.900 0.072 1 1.5e-05 1 4.5e-10

GB 5.056 0.073 1 4e-06 1 3.9e-13

GB03 all 2.827 0.048 3.513 0.066 0.822 0.178 1 < 2.2e-16

GB 3.564 0.066 0.84 0.160 1 < 2.2e-16

GB04 all 2.439 0.077 3.211 0.093 0.947 0.053 1 < 2.2e-16

GB 3.322 0.094 0.968 0.032 1 < 2.2e-16

GB05 all 1.816 0.053 2.646 0.093 1 2.6e-05 1 < 2.2e-16

GB 2.758 0.093 1 2.3e-06 1 < 2.2e-16

GB06 all 1.309 0.028 2.655 0.109 1 2.3e-08 1 < 2.2e-16

GB 2.759 0.109 1 1.8e-09 1 < 2.2e-16

GB07 all 1.783 0.05 2.647 0.093 1 4e-05 1 < 2.2e-16

GB 2.753 0.094 1 5e-06 1 < 2.2e-16

PG11 all 1.079 0.04 2.200 0.063 1 3e-10 1 < 2.2e-16

GB 2.203 0.061 1 2.7e-10 1 < 2.2e-16

Table 6.2: Comparison of expression of genes that overlap peaks detected using either a single or multiple samples. Multiple peakcallings were performed: using only one sample, using all the samples from the given grade ("DA" / "GB") and using all the samples ("all"). For each of the identied peak sets, a set of genes that overlap them was identied.

Then, for every set expressions of genes within it were calculated, using results of RNA-seq experiment performed on the patient of interest (which is why both mean and median are dierent for the set "all" in every patient; while the set of genes whose expression is concerned stays the same, the expressions of dierent patients are taken into account; the same for the sets "DA" and "GB"). Column "set" indicates which set of les was used in the comparison. In columns "patient" and "peakcalling on 1 sample" values would be repeated, so for readability the second row for each patient is left blank. Both statistical tests were performed in two variants: using an alternative hypothesis "mean expression of genes identied with 1-sample peakcalling is higher" (denoted "H1: >" in the column header) and the other way around. The estimated p-value is bolded if it is lower than

Figure 6.9: Peakcalling on H3K4me3 samples belonging to grades PA and GB, with an attempt to identify common and unique peaks. Plot represents means of emissions estimated for each of the states as a results of the EM algorithm. Each dot represents one sample and is coloured according to the grade of the tumour. Right panel shows the same data as the left one, but zoomed in to (0, 1) range.

6.2.3. Multiple samples with subpopulations

As the patients in the project were divided into three groups basing on the grade of the cancer (I will denote them as PA, DA and GB), I decided to try to use the feature of peakcalling on multiple samples divided into subpopulations to dierentiate between the grades. The grades are expected to have dierent pattern of histone modication, and so I wanted to see if we could catch these dierences with HERON; this could be very benecial as far as learning about glioma's molecular mechanisms and development is concerned. As the DA and GB grade are considered to be more similar to each other than to PA, and similarly PA and DA are more similar to each other than to GB, I focused on dierentiating between PA and GB, DA and GB, PA+DA and GB, PA and DA+GB. I decided to try this feature on the H3K27me3 ChIP-seq, as HERON is supposed to work mostly on the long domains characteristic for H3K27me3, and on the H3K4me3 ChIP-seq, as it usually gives very good signal-to-noise ratio, which should make the analysis easier.

Figure 6.9 presents the nal estimated means for H3K4me3 ChIP-seq data, in the setting

"PA vs GB". Similarly to the case from the previous chapter, the states are hard to inter-pret; there is no one state that would represent peaks unique for the PA or GB samples.

the last state seems to represent peaks common for both groups; the other states proba-bly represent background, or - in the case of the third state - very poorly enriched peaks.

The results were similarly unsatisfactory for other settings and peakcalling parameters.

Additionally I've tried to dierentiate between the dierent ChIP-seqs. I've provided HERON with ChIP-seq against H3K4me3 and H3K27ac, either from one sample, all

Figure 6.10: Peakcalling on samples from dierent experiments (H3K4me3 and H3K27ac ChIP-seqs), with an attempt to identify common and unique peaks. Plot represents means of emissions estimated for each of the states as a results of the EM algorithm. Each dot represents one sample and is coloured according to the experiment. The right panel shows the same data as the left one, but zoomed in to the (0, 3) range.

the samples from a specic grade or simply all the samples. I denoted the H3K4me3 samples as one population, and H3K27ac samples as the other. I hoped HERON will discover peaks that are unique for every of these modications (for example we expect active enhancers to have H3K27ac peak, but no H3K4me3 peak) and ones common for them (most active promoters are expected to be marked by both these modications). I tried similar procedure but with H3K27me3 and H3K4me3, and also with all the three ChIP-seqs at once. I ran most tests providing only samples from one grade, as I did not want the dierences between the samples that can be attributed to the dierence in grades to inuence the results and make the analysis harder. In gure 6.10 I present the obtained results for the "H3K4me3 vs H3K27ac" setting, for patients with PA grade.

These results look slightly more promising than the previous ones. As one can see, here the rst three states seem to represent background; the fourth state might represent some of the H3K27ac peaks, specically those that have very poor enrichment. In this state, all the H3K27ac samples have a higher mean than in the previous states and higher than all the H3K4me3 samples. The last state seem to represent all the H3K4me3 peaks, including those common with H3K27ac peaks. Means of H3K4me3 samples are substantially higher here than in any other state and than means of H3K27ac samples. It is worth noting that it is expected that H3K4me3 ChIP-seq would have a higher signal-to-noise ratio than H3K27ac ChIP-seq, and so the means of the H3K4me3 samples in

"peak" states would be higher than those of H3K27ac samples. In the last state all the H3K27ac samples have higher means than in the previous one, which could mean that the H3K27ac peaks that overlap with H3K4me3 peaks tend to be higher.

Powiązane dokumenty