RNA 3D structure:
bioinformatics perspective
Janusz M. Bujnicki
IIMCB, Warsaw & UAM, Poznan
Today
MEDICINE and BIOLOGY depend on DNA analysis
identification of specific DNA sequences
…AAGTGCT CGAGCA T…
diagnostics, gene therapy, forensics, etc.
Human karyotype with color added to distinguish chromosome pairs. {{PD-USGov-NIH}}
Today & Tomorrow
rapid growth of research on…
RNA!!!
http://upload.wikimedia.org/wikipedia/commons/f/f2/0324_DNA_Translation_and_Codons.jpg
information storage
executive layer
workhorse
RNA
… T A C G G C G T T A G A C A A G T G C G T G A G T A C A C A …
… A T G C C G C A A T C T G T T C A C G C A C T C A T G T G T …
… U A C G G C G U U A G A C A A G U G C G U G A G U A C A C A …
MASTER REGULATOR OF LIVING CELLS!
RNAs as the… new proteins?
Analogies between proteins and RNAs:
Sequences: linear polymers
Structures: complex 3D shapes
Functions: catalysis, regulation
The “1D-3D-F” code:
sequence ↔ 3D structure ↔ function
RNase P RNA & RNase P protein
• Experimental structure determination is very difficult for RNA
• There are many RNA genes with unknown function (in particular in Eukaryota)
• We need to break the code (at least 1D-3D) to better understand their function
PROTEIN 3D structure prediction: two schools of thought
STATISTICAL THERMODYNAMICS: Ludwig Eduard Boltzmann (1844-1906)
EVOLUTIONARY BIOLOGY: Charles Darwin (1809-1882)
PROTEIN structure prediction: two schools of thought
EVOLUTIONARY MODEL: DIVERGENCE & CONSERVATION
Homologous macromolecules retain very similar 3D structures despite accumulated substitutions in sequences (RNase A, RNase 4)
TIME: ~MILLIONS OF YEARS
GRANULARITY: ~RESIDUES
PHYSICAL MODEL: 1D→3D FOLDING
According to Anfinsen: the native structure corresponds to the global free energy minimum of the system
TIME: ~MILLISECONDS / SECONDS
GRANULARITY: ~ATOMS
RNA folding and evolution: …quite like proteins?
STRUCTURE PREDICTION FOR RNA
UAUCGUAUGCUUUGCGCGCAGCAGCGAAGCGCUGACAC
EVOLUTIONARY MODEL: DIVERGENCE & CONSERVATION
Families of homologous RNAs also retain very similar 3D structures despite accumulated substitutions in sequences (A-riboswitch, G-riboswitch)
TIME: ~MILLIONS OF YEARS
GRANULARITY: ~RESIDUES
PHYSICAL MODEL: 1D→3D FOLDING
RNAs also fold into low energy structures. Thus far this has been exploited on the level of 2D prediction.
TIME: ~MILLISECONDS / SECONDS
GRANULARITY: ~ATOMS
ModeRNA:
RNA homology-modeling
Lena Rother, Kristian Rother
[Figure panels: TEMPLATE, MODEL]
MODEL VS CRYSTAL STRUCTURE: RMSD 2.27 Å
[Figure label: aminoacyl tRNA synthetase]
RNA is more dynamic than proteins
http://pharmacy.utah.edu/medchem/faculty/Davis_D.htm
NMR structure of a functionally important domain of the HCV IRES RNA
in complex with an inhibitor of viral replication.
RNA changes conformation upon ligand binding
decoding region A-site HIV-1 frameshift inducing element
HIV-1 TAR RNA
RNA thermometers
change structure depending on temperature
Jens Kortmann, Franz Narberhaus
Nat Rev Microbiol 2012 Apr 16;10(4):255-65.
Bacterial RNA thermometers: molecular zippers and switches.
Riboswitches:
change structure upon ligand binding
Kim, J. N.; Breaker, R. R.,
Purine sensing by riboswitches.
Biology of the Cell 2008, 100, (1), 1-11.
RNA is much more dynamic than proteins
SimRNA
Coarse-grained model for RNA folding
Michał Boniecki
1zih.pdb: GCAA RNA tetraloop, NMR
5' GGGCGCAAGCCU 3'
Boniecki MJ, Lach G, Dawson WK, Tomala K, Lukasz P, Soltysinski T, Rother KM, Bujnicki JM. SimRNA: a coarse-grained method for RNA folding simulations and 3D structure prediction. Nucleic Acids Res. 2015, in press.
SimRNA Energy function
Statistical potential:
• Distances between close atoms
• Angles between virtual bonds
• Torsion (dihedral) angles
• Residue-residue interactions (short and long-range)
Boltzmann distribution law:
Frequently observed conformations = local energy minima
Eta-Theta map = potential for backbone
Distribution of contacts in 3D = potential for interactions
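The link between "frequently observed" and "low energy" can be illustrated with a minimal sketch of Boltzmann inversion. It assumes a pseudo-energy of the form E = -kT ln(p_obs / p_ref) with a uniform reference distribution. The function name, the pseudocount and the counts are made up for illustration and are not taken from SimRNA.

```python
import math

def boltzmann_inversion(observed_counts, kT=1.0, pseudocount=1.0):
    """Turn a histogram of observed values into one pseudo-energy per bin."""
    n_bins = len(observed_counts)
    total = sum(observed_counts) + pseudocount * n_bins
    p_ref = 1.0 / n_bins  # assumption: uniform reference state
    energies = []
    for count in observed_counts:
        p_obs = (count + pseudocount) / total  # pseudocount avoids log(0)
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies

# Hypothetical histogram, e.g. counts of a virtual-bond angle in five bins.
counts = [2, 10, 50, 10, 2]
energies = boltzmann_inversion(counts)
best_bin = min(range(len(energies)), key=energies.__getitem__)
```

The most populated bin receives the lowest pseudo-energy, which is the sense in which frequently observed conformations act as local energy minima.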
SimRNA Sampling
• Monte Carlo, Metropolis algorithm
  - perform random move
  - if energy decreases, move is accepted
  - otherwise move is accepted with probability: $e^{-(E_2 - E_1)/kT}$
• simulated annealing (folding, unfolding)
• Replica Exchange (typically 10 replicas)
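The Metropolis rule listed on this slide can be sketched on a toy problem. The one-dimensional energy function, the step size and all function names below are illustrative assumptions. This is not SimRNA code, and it does not include simulated annealing or replica exchange.

```python
import math
import random

def metropolis_accept(e_old, e_new, kT, rng=random):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    delta = e_new - e_old
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / kT)

def run_metropolis(energy, x0, kT, n_steps, step=0.5, seed=0):
    """Random-walk Monte Carlo on a 1D coordinate; returns the best state seen."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)  # perform random move
        e_new = energy(x_new)
        if metropolis_accept(e, e_new, kT, rng):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

def toy_energy(x):
    """Hypothetical energy landscape with a single minimum at x = 2."""
    return (x - 2.0) ** 2

best_x, best_e = run_metropolis(toy_energy, x0=-5.0, kT=0.1, n_steps=5000)
```

Lowering `kT` gradually during the run would turn the same loop into a simple simulated annealing schedule.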
SimRNA Restraints
• complete freezing
• secondary structure (WC cis base pairs), e.g. ...(...)...
• atom position (“pinning”)
• atom distance (“tethering”)
[Plots: Score(r) vs r for pinning (labels: radius thrs, slope 1.0); Score(d) vs d for tethering (labels: mindist, max dist, slope 1.0); Score(d) vs d with a well (labels: mindist, max dist, welldepth)]
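The restraint plots suggest piecewise score functions. The sketch below is one reading of the axis labels, not SimRNA's documented behaviour. It assumes that the pinning score is zero inside the radius threshold and grows linearly outside, that the tethering score is zero between the minimum and maximum distance and grows linearly outside, and that the well variant is a square well. All function names are hypothetical.

```python
def pinning_score(r, radius_thrs, slope=1.0):
    """Penalty for an atom displaced by r from its reference position."""
    if r <= radius_thrs:
        return 0.0
    return slope * (r - radius_thrs)

def tethering_slope_score(d, min_dist, max_dist, slope=1.0):
    """Penalty for an atom-atom distance d outside [min_dist, max_dist]."""
    if d < min_dist:
        return slope * (min_dist - d)
    if d > max_dist:
        return slope * (d - max_dist)
    return 0.0

def tethering_well_score(d, min_dist, max_dist, well_depth):
    """Reward of -well_depth while d stays inside [min_dist, max_dist]."""
    if min_dist <= d <= max_dist:
        return -well_depth
    return 0.0
```

Under these assumptions a restraint contributes nothing while it is satisfied, so it only biases the simulation when the model drifts away from the supplied data.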
SimRNA folding simulation: 1l2x.pdb, viral pseudoknot
5' GGCGCGGCACCGUCCGCGGAACAAACGG 3'
[Plots: energy E vs RMSD to the crystal structure; E vs time; results for all replicas; energy barrier marked on the E vs RMSD plot]
NATIVE:
..(((((...)))))...
...(((...)))
PREDICTION:
..(((((...)))))...
...(((...)))
RMSD: 4.2 Å
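RMSD is the accuracy measure used throughout these slides. A minimal sketch of the calculation follows, assuming the two coordinate sets list the same atoms in the same order and have already been superimposed. The optimal superposition step that normally precedes the calculation is deliberately left out, and the coordinates are made up.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))

# Hypothetical three-atom example: the model is the reference shifted by 3 Å in y.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
model = [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0), (2.0, 3.0, 0.0)]
deviation = rmsd(reference, model)
```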
SimRNA folding recapitulates folding funnels for many RNA molecules
[Plots: Energy vs RMSD (0 = native); energy funnel with the native structure at minimal energy]
1a60 1e95
(((((...)))))(((...)))...
...((((((...))))))...
(((((...)))))(((...)))...
...((((((...))))))...
.((((...))))...
...(((((...))))).
(((((...)))))...
...(((((...))))).
((((((...))))))...
...((.((((((...)))))).)).
((((((...))))))...
...(((((((((...))).)))))).
1ymo
.(((((...)))))...
...(((...)))
.(((((...)))))...
...((((...))))
437d
((((...))))...
...(((...)))
((((...))))...
...((((...).)))
2a43
..(((((...)))))...
...(((...)))
..(((((...)))))...
...((((...))))
1l3d
Alternative (suboptimal) structures
1fqz: domain IIID of hepatitis C virus IRES
[Figure panels: NMR, cluster 1, cluster 2, cluster 3]
Non-canonical base pairs predicted correctly
1fqz: domain IIID of hepatitis C virus IRES, superposition
SimRNA workflow
• conversion to coarse-grained representation
• simulation
• clustering of low-energy decoys
• selection of decoys:
  - lowest energy
  - biggest clusters (typically 1st, 2nd, 3rd)
• conversion to full-atom representation
• full-atom (fine-grained) refinement
• additional model quality verification
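The clustering and selection steps of the workflow can be sketched with a simple greedy scheme. The slides do not say which clustering procedure SimRNA uses, so the algorithm, the threshold and the toy distances below are assumptions made purely for illustration.

```python
def greedy_cluster(dist, threshold):
    """Cluster decoys from a symmetric pairwise distance matrix (list of lists).

    Repeatedly picks the decoy with the most unassigned neighbours within
    `threshold` as a cluster centre, so the biggest cluster comes out first.
    """
    unassigned = set(range(len(dist)))
    clusters = []

    def neighbours(i):
        return [j for j in unassigned if dist[i][j] <= threshold]

    while unassigned:
        # sorted() makes tie-breaking deterministic (lowest index wins)
        centre = max(sorted(unassigned), key=lambda i: len(neighbours(i)))
        members = sorted(neighbours(centre))
        clusters.append((centre, members))
        unassigned -= set(members)
    return clusters

# Hypothetical decoys placed on a line; a real run would use pairwise RMSD.
positions = [0.0, 0.5, 1.0, 10.0, 10.4]
dist = [[abs(a - b) for b in positions] for a in positions]
clusters = greedy_cluster(dist, threshold=1.0)
```

Taking the first few entries of `clusters` corresponds to selecting the biggest clusters (1st, 2nd, 3rd) named on the slide.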
QRNAS: physics-based energy function discriminates native-like structures (RMSD < 2 Å)
PDB ID: 1L2X
Decoys generated by unfolding
Juliusz Stasiewicz, Janusz M. Bujnicki, unpublished
QRNAS: MolProbity score improvement
PDB ID: 1BYX
[Figure panels: Original structure, Structure refined with QRNAS]
Juliusz Stasiewicz
ModeRNA / SimRNA / QRNAS modeling pipeline
template: Azoarcus intron structure (1zzn)
target: phage Twort intron
• comparative model (ModeRNA)
• refolded variable parts (SimRNA)
• all-atom refinement (QRNAS)
• model vs reference: phage Twort intron structure (1y0q)
RNA-protein docking
DARS-RNP & QUASI-RNP
Coarse-grained potentials
[Figure: coarse-grained representation. Nucleotide labels: P, RIB, R5, R6, with Watson-Crick, Hoogsteen and Sugar edges. Amino acid labels: CA, S1, S2]
Terms: distance, orientation, nt edge, steric clashes
$E_{stat} = E_r + E_a + E_s + E_p$
Observed contacts: known structures of protein-RNA complexes
Statistical (GRAMM decoys): $p_{AB} = n_{AB}^{obs} / n_{AB}^{exp}$
Quasichemical: $p_{AB} = n_{AB}^{obs} / n_{AB}^{exp}$, with $n_{AB}^{exp} = x_A x_B N_{tot}$
Tuszynska I, Bujnicki JM. DARS-RNP and QUASI-RNP: new statistical potentials for protein-RNA docking. BMC Bioinformatics. 2011 Aug 18;12:348.
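The quasichemical reference state can be sketched from a table of contact counts. The sketch assumes that x_A and x_B are the fractions of all observed contacts that involve residue type A and nucleotide type B, and that the ratio of observed to expected counts is converted to an energy as -kT ln(n_obs / n_exp). Both choices, the function name and the counts are illustrative and are not taken from the QUASI-RNP implementation.

```python
import math

def quasichemical_potential(observed, kT=1.0):
    """observed: dict mapping (A, B) to a positive observed contact count."""
    n_tot = sum(observed.values())
    count_a, count_b = {}, {}
    for (a, b), n in observed.items():
        count_a[a] = count_a.get(a, 0) + n
        count_b[b] = count_b.get(b, 0) + n
    energies = {}
    for (a, b), n_obs in observed.items():
        x_a = count_a[a] / n_tot   # assumed definition of the mole fraction
        x_b = count_b[b] / n_tot
        n_exp = x_a * x_b * n_tot  # expected count if A and B paired at random
        energies[(a, b)] = -kT * math.log(n_obs / n_exp)
    return energies

# Hypothetical contact counts between two amino acids and two nucleotides.
observed = {("ARG", "G"): 30, ("ARG", "U"): 10,
            ("LEU", "G"): 10, ("LEU", "U"): 30}
energies = quasichemical_potential(observed)
```

Pairs seen more often than expected by chance receive a negative (favourable) score, and under-represented pairs receive a positive one.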
NPDock:
rigid body RNA/DNA-protein docking
Tuszynska I, Magnus M, Jonak K, Dawson W, Bujnicki JM. NPDock: a web server for protein-nucleic acid docking. Nucleic Acids Res. 2015 Jul 1;43(W1):W425-30.
SimRNP: modeling of RNA-protein complexes
Michal Boniecki et al.
Representation of protein chain: like in REFINER
Representation of aromatic residues
[Figure: chemical structure drawings; residue labels ARG, HIS, PHE, TYR, TRP]
E_TOTAL = E(protein) + E(RNA) + E(protein,RNA)
4 protein chains, 1 RNA chain
[Animation: 1E8O_P_posRestr_.avi]
PyRy3D:
coarse-grained modeling of complexes
http://genesilico.pl/pyry3d
Joanna Kasprzak
INPUT:
• sequences of all components
• structures/models of some components
• disorder / flexibility
• molecule shapes (cryoEM, SAXS/SANS)
• distance restraints (cross-linking, FRET, etc.)
• accessibility (enzyme active sites exposed etc.)
Is it possible to build models that agree with all the input data?
How many such models exist?
Kasprzak JM, Dobrychłop M, Koryciński M, Potrzebowski W, Susik M, Pogorzelska L, Niemiec R, Rudnicki W, Bujnicki JM. PyRy3D: a software tool for modeling of large macromolecular complexes with user-defined restraints. Unpublished.
PyRy3D benchmarking
• 21 complexes with simulated maps
• 5 complexes with experimental cryoEM maps
PyRy3D Chimera plugin
• ANIMATIONS
• MODEL RANKING
• SOLUTION SCORING
• SIMULATIONS
• PARAMETER TESTING
• INPUT FILE PREPARATION
• RESTRAINT VIOLATION VISUALIZED
• ENERGY / SCORE PLOTS
http://iimcb.genesilico.pl/pyry3d
http://genesilico.pl/pyry3d/
Mateusz Dobrychlop, Joanna Kasprzak et al., collaboration: Witold Rudnicki (ICM)
RNA 3D structure prediction
Grzegorz Lach, Krzysztof Formanowicz, Michal Boniecki et al.
Given a target sequence, predict its 3D structure
UAUCGUAUGCUUUGCGCGCAGCAGCGAAGCGCUGACAC
RNA structure-based sequence design
Grzegorz Lach, Krzysztof Formanowicz, Michal Boniecki et al.
Given target 3D structure, predict a sequence that folds to form that structure
...with the aid of secondary structure design (and prediction)
UAUCGUAUGCUUUGCGCGCAGCAGCGAAGCGCUGACAC
Positive design:
Maximize the positive energetic effect of forming the target structure
Negative design:
Minimize the positive energetic effect of forming all other structures
RNA design with DesiRNA and SimRNA
DesiRNA – algorithm for secondary structure design
• use both positive and negative design
• consider oligomerization (monomers vs homooligomers)
1. generate initial sequences randomly or according to constraints
2. for each sequence compute the MFE and suboptimal structures
3. select sequences forming structures similar to the target structure; penalize sequences that form other structures
4. identify potential sites of mutations with largest effect on structure
5. exhaustively mutate at selected positions
6. go back to 2.
SimRNA – algorithm for 3D structure folding used in a positive design mode
• define target structure (starting coordinates and/or restraints)
• use additional “move”: sequence substitution
• use DesiRNA fitness function for sequence
Grzegorz Lach, Krzysztof Formanowicz, Michal Boniecki et al.
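Step 3 of the DesiRNA loop needs a way to say how similar a predicted secondary structure is to the target. One common choice, assumed here because the slides do not name the measure DesiRNA uses, is the base-pair distance between two dot-bracket strings: the number of base pairs present in one structure but not in the other. The sketch ignores pseudoknots, and the example structures are made up.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced closing bracket")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced opening bracket")
    return pairs

def base_pair_distance(structure_a, structure_b):
    """Count base pairs that occur in exactly one of the two structures."""
    if len(structure_a) != len(structure_b):
        raise ValueError("structures must have the same length")
    return len(base_pairs(structure_a) ^ base_pairs(structure_b))

# Hypothetical target and candidate structures of equal length.
target = "((((...))))"
candidate = "(((.....)))"
distance = base_pair_distance(target, candidate)
```

A distance of zero means the candidate sequence is predicted to fold exactly into the target structure, and larger values could serve as the penalty mentioned in step 3.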
Design, modeling, and engineering of a sequence-specific RNase H
RNase H fused with zinc finger ZF-QQR; substrate: DNA-RNA hybrid
[Workflow diagram: protein structure modeling; nucleic acid structure modeling; protein-nucleic acid docking; re-design of specific contacts; SELEX; enzyme engineering]
Sulej AA, Tuszynska I, Skowronek KJ, Nowotny M, Bujnicki JM. Sequence-specific cleavage of the RNA strand in DNA-RNA hybrids by the fusion of ribonuclease H with a zinc finger. Nucleic Acids Res. 2012 Dec;40(22):11563-70.
BsMiniIII RNase
cuts dsRNA sequence-specifically
Głów D, Pianka D, Sulej AA, Kozłowski ŁP, Czarnecka J, Chojnowski G, Skowronek KJ, Bujnicki JM. Sequence-specific cleavage of dsRNA by Mini-III RNase. Nucleic Acids Res. 2015 Mar 11;43(5):2864-73, Breakthrough Article.