Predicting functional effect of human missense mutations

Delft University of Technology

van den Berg, Bastiaan; Thornton, JM; Reinders, Marcel; de Ridder, Dick; Beer, TAP

Publication date

2013

Document Version

Final published version

Citation (APA)

van den Berg, B., Thornton, JM., Reinders, M., de Ridder, D., & Beer, TAP. (2013). Predicting functional effect of human missense mutations. 1.

Important note

To cite this publication, please use the final published version (if applicable). Please check the document version above.

Copyright

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Takedown policy

Please contact us and provide details if you believe this document breaches copyrights. We will remove access to the work immediately and investigate your claim.

This work is downloaded from Delft University of Technology.

B.A. van den Berg (1,3,4,*), J.M. Thornton (2), M.J.T. Reinders (1,3,4), D. de Ridder (1,3,4), and T.A.P. de Beer (2)

Data set

Introduction

Our aim is to prioritize human missense mutations by their probability of being disease-causing. Such a computational method could be used to obtain a reduced set of mutations with a relatively large fraction of disease-related mutations, thereby aiding the search for disease-causing mutations within a large mutation set.

Whereas a range of methods is available for this purpose, only a few use the 1000 Genomes (1000G) data to obtain a set of neutral mutations. The novelty of our approach is the use of separate classifiers, each trained on the subset of mutations from one amino acid to any other amino acid. Combined, these classifiers outperform the widely used prediction method PolyPhen2.

* b.a.vandenberg@tudelft.nl

1 Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics & Computer Science, Delft University of Technology, Delft, The Netherlands

2 European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

3 Netherlands Bioinformatics Centre, Nijmegen, The Netherlands

4 Kluyver Centre for Genomics of Industrial Fermentation, Delft, The Netherlands

[Figure: a missense mutation in a Pfam domain (substituted amino acid F, introduced amino acid W), aligned with homologous sequences; the amino acids observed at the mutated position (F, Y, W) form the position variation]

Feature data sources

The data sources shown in the figures were used for feature extraction.

Because the availability of structure data is limited, structure-based features were obtained for only part of the mutations.

[Figure panels: protein structure; protein sequence; MSA with homologous sequences]

Results: classification performance

Features

[Figure: prediction performances of the sub-classifiers, combined performance of the sub-classifiers, performance of the classifier trained on the entire data set, and PolyPhen2 performance on the entire data set]

[Figure: 103,627 neutral and 7,788 disease missense mutations in 14,095 proteins]

Classification

protocol: 10-fold cross-validation
classifier: linear discriminant analysis (LDA)
measure: area under the receiver operating characteristic curve (AUROC)
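The evaluation side of this protocol can be sketched in plain Python. This is a minimal sketch under assumptions the poster does not state: the folds are stratified, so that every fold keeps the roughly 7% share of disease mutations, and AUROC is computed with the rank-based (Mann-Whitney U) formula. The LDA classifier itself is left out; any implementation (for example scikit-learn's `LinearDiscriminantAnalysis`) could supply the scores. The function names are hypothetical.

```python
import random


def stratified_folds(labels: list[int], k: int = 10, seed: int = 0) -> list[int]:
    """Assign each sample a fold index 0..k-1, spreading every class evenly over the folds."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin over the k folds.
        for position, i in enumerate(members):
            fold_of[i] = position % k
    return fold_of


def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC via the Mann-Whitney U statistic; tied scores receive their average rank.

    labels: 1 for disease mutations, 0 for neutral mutations.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the run of tied scores starting at position `start`.
        while end + 1 < n and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for j in range(start, end + 1):
            ranks[order[j]] = average_rank
        start = end + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

In a 10-fold run, each mutation would be scored by the model fitted on the nine folds it does not belong to, and `auroc` applied to those out-of-fold scores.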

Missense mutation feature vector

Twenty features encode the missense mutation itself, one per amino acid: the substituted amino acid is set to -1, the introduced amino acid to 1, and all other amino acids to 0.
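A minimal sketch of this encoding, assuming the 20 columns follow alphabetical one-letter order (the poster does not give the order); `encode_mutation` is a hypothetical helper name.

```python
# Assumed column order: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_mutation(substituted: str, introduced: str) -> list[int]:
    """Return the 20-dimensional mutation encoding: -1 for the substituted
    (original) amino acid, +1 for the introduced one, 0 elsewhere."""
    if substituted == introduced:
        raise ValueError("a missense mutation must change the amino acid")
    vector = [0] * len(AMINO_ACIDS)
    # str.index raises ValueError for letters outside the 20 standard amino acids.
    vector[AMINO_ACIDS.index(substituted)] = -1
    vector[AMINO_ACIDS.index(introduced)] = 1
    return vector
```

For the F to W mutation, this puts -1 in the F column and 1 in the W column.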

The first feature is a conservation score based on the MSA with homologous sequences, as obtained from the Evolutionary Trace Server. The second is a binary feature that indicates whether the introduced amino acid occurs in the position variation (the set of amino acids observed at the mutated position in the MSA).
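The binary position-variation feature can be sketched as below. The helper names are hypothetical, gaps are assumed to be written as `-`, and the example column `FFFYWF` (query plus five homologs) is read off the alignment figure, so treat it as illustrative.

```python
def position_variation(msa_column: str) -> set[str]:
    """Amino acids observed at the mutated position across the MSA column (gaps '-' ignored)."""
    return {residue for residue in msa_column.upper() if residue != "-"}


def introduced_in_variation(introduced: str, msa_column: str) -> int:
    """Binary feature: 1 if the introduced amino acid already occurs at this position
    somewhere in the alignment, 0 otherwise."""
    return int(introduced.upper() in position_variation(msa_column))
```

With the column `FFFYWF`, introducing W yields 1, whereas introducing L would yield 0.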

A binary feature indicates whether the mutation falls within a Pfam domain.

Nineteen features give the minimal 'characteristic' distance between the introduced amino acid and the amino acids in the position variation. The characteristics used include hydrophobicity, size, and isoelectric point.
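A sketch of these distance features follows. The poster names hydrophobicity as a characteristic but not the scale, so the Kyte-Doolittle hydrophobicity values used here are an assumption, as are the function names; the other characteristics would be further scale dictionaries of the same shape.

```python
# Kyte-Doolittle hydrophobicity scale, used here only as one example characteristic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def min_characteristic_distance(introduced: str, variation: set[str],
                                scale: dict[str, float]) -> float:
    """Smallest |scale(introduced) - scale(aa)| over the amino acids in the position variation."""
    if not variation:
        raise ValueError("position variation must contain at least one amino acid")
    return min(abs(scale[introduced] - scale[aa]) for aa in variation)


def characteristic_features(introduced: str, variation: set[str],
                            scales: dict[str, dict[str, float]]) -> list[float]:
    """One minimal-distance feature per characteristic scale, in sorted scale-name order."""
    return [min_characteristic_distance(introduced, variation, scales[name])
            for name in sorted(scales)]
```

An introduced amino acid that already occurs in the position variation always gets distance 0 on every scale.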

Protein structure features: solvent-exposed area and the three backbone angles.
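The poster does not say how the backbone angles were obtained. They are presumably the dihedral angles φ, ψ and ω, which can be derived from backbone atom coordinates as sketched below; the function names are hypothetical, and in practice a structure-processing library would usually supply these angles.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle in degrees (between -180 and 180) defined by four atom positions."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For residue i, φ would be `dihedral(C[i-1], N[i], CA[i], C[i])`, ψ `dihedral(N[i], CA[i], C[i], N[i+1])`, and ω `dihedral(CA[i], C[i], N[i+1], CA[i+1])`.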

The data set consists of 111,415 missense mutations in 14,095 proteins. The disease mutations were obtained from the OMIM database and the neutral mutations from the 1000 Genomes project.

The mutations were split into 20 non-overlapping subsets, each containing the mutations from one amino acid to any other amino acid. To increase the subset sizes, the phenylalanine, tryptophan, and tyrosine subsets were combined into one, resulting in 18 subsets in total.

[Figure: amino acid counts per mutation subset (ala, arg, asn, asp, cys, glu, gln, gly, his, ile, leu, lys, met, pro, ser, thr, val, FWY)]
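The split into 18 subsets can be sketched as follows: mutations are grouped by their substituted (original) amino acid, and F, W and Y map onto the shared `FWY` subset. The record layout `(substituted, introduced, label)` and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Phenylalanine, tryptophan and tyrosine share one subset; every other amino acid has its own.
MERGED = {"F": "FWY", "W": "FWY", "Y": "FWY"}


def subset_key(substituted: str) -> str:
    """Name of the subset a mutation belongs to, based on its substituted amino acid."""
    return MERGED.get(substituted, substituted)


def split_by_substituted(mutations):
    """Group (substituted, introduced, label) records into the mutation subsets."""
    subsets = defaultdict(list)
    for mutation in mutations:
        subsets[subset_key(mutation[0])].append(mutation)
    return dict(subsets)
```

Merging three of the 20 amino acids into one key leaves 20 - 3 + 1 = 18 subsets.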

Separate classifiers were trained on each of the 18 mutation subsets using the settings listed under Classification. For comparison, one classifier was trained on the entire data set.
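The poster does not say how the combined performance of the sub-classifiers was computed. One plausible reading, sketched here as an assumption, is that each mutation is scored by the sub-classifier of its own subset and a single AUROC is then computed over the pooled scores. `route_and_score` and the scorer dictionary are hypothetical.

```python
from typing import Callable, Sequence

# A trained sub-classifier, reduced here to a function from feature vector to score.
Scorer = Callable[[Sequence[float]], float]

# Same subset naming as the data split: F, W and Y share the "FWY" sub-classifier.
MERGED = {"F": "FWY", "W": "FWY", "Y": "FWY"}


def route_and_score(mutations, sub_classifiers: dict[str, Scorer]) -> list[float]:
    """Score each (substituted, feature_vector) pair with the sub-classifier of its subset."""
    scores = []
    for substituted, features in mutations:
        key = MERGED.get(substituted, substituted)
        scores.append(sub_classifiers[key](features))
    return scores
```

Pooling the routed scores puts all 18 sub-classifiers on one ranking, which makes their combined result directly comparable with a single classifier or with PolyPhen2 on the same mutations.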

Most of the sub-classifiers, as well as their combined result (green), show improved performance compared to PolyPhen2 (blue). The improvement is particularly striking for the sub-classifiers of the charged (arg, lys, asp, glu) and aliphatic (leu, val) amino acids. The reduced performance of the classifier trained on the entire data set (purple) supports the use of sub-classifiers.

Comparing how often each mutation occurs in the neutral and disease sets shows which mutations are relatively safe (blue) and which are dangerous (red).
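The statistic behind this comparison is not given on the poster. A minimal sketch, assuming a log2 ratio of relative frequencies with a pseudocount, is shown below: positive values mark substitutions over-represented among disease mutations (dangerous), negative values those over-represented among neutral ones (relatively safe). The function name is hypothetical, and the pseudocount must be positive.

```python
import math
from collections import Counter


def substitution_log_ratios(neutral, disease, pseudocount: float = 1.0) -> dict[tuple[str, str], float]:
    """log2(relative frequency in disease set / relative frequency in neutral set)
    for every (substituted, introduced) pair seen in either set."""
    neutral_counts = Counter(neutral)
    disease_counts = Counter(disease)
    pairs = set(neutral_counts) | set(disease_counts)
    # Pseudocounts keep substitutions that are absent from one of the sets finite.
    n_total = sum(neutral_counts.values()) + pseudocount * len(pairs)
    d_total = sum(disease_counts.values()) + pseudocount * len(pairs)
    return {
        pair: math.log2(((disease_counts[pair] + pseudocount) / d_total)
                        / ((neutral_counts[pair] + pseudocount) / n_total))
        for pair in pairs
    }
```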
