Algorithms Inspired by Petri Nets in Modeling of Complex Biological Systems

Academic year: 2021


Anna Gogolińska

Synopsis of the PhD dissertation

1. Background

The immediate goal of this thesis was to develop new ideas based on the Petri net (PN) formalism and to provide new computational research tools and algorithms for the representation and analysis of biological and structural data.

A biological system may be a population, an organism, a physiological system, a tissue, a cell or even a biomolecule such as a protein or a piece of nucleic acid. The biological sciences have made tremendous progress in recent years. However, many crucial phenomena are still poorly understood, and explaining them requires strong efforts in all fields, including computer science.

Mathematical models of complex systems should capture the main features of the physical reality and translate them into mathematical entities. Many modeling methods have been proposed in recent years, for example ordinary differential equations (ODEs), process calculi, Boolean networks, Bayesian networks, stochastic equations and cellular automata. The mathematical modeling technique used in this thesis is Petri nets (PNs).

1.1 Petri nets

The Petri net formalism is one of the mathematical languages created to describe distributed systems.

Def. 1. A Petri net graph [1-2] is a 4-tuple (P, T, F, W), where:

• P is a finite set of places,

• T is a finite set of transitions (or actions), such that P ∩ T = Ø,

• F is a set of directed arcs satisfying F ∩ (P×P) = F ∩ (T×T) = Ø (a place may be connected with a transition or a transition with a place; two places or two transitions cannot be connected),

• W: F → {1, 2, 3, ...} is a weight function assigned to arcs. By default an arc has weight one.

In a drawing of the net, places are represented by circles, transitions by squares and arcs by arrows. Weights are written as numbers next to the arcs.


A PN graph built according to Def. 1 has no dynamics: it is only a static framework. To obtain dynamics we need a marking, which describes the distribution of tokens over places.

Def. 2. A Petri net is a quintuple (P, T, F, W, M_0), where M_0 is the initial marking and P, T, F, W are as in Def. 1.

Places usually represent objects or elements. Transitions represent events. Transitions transfer tokens from one place to another; this process is called the firing of a transition. A transition may fire (is enabled) only if each of its input places holds at least as many tokens as the weight of the connecting arc.

During firing a transition t consumes tokens from its input places and puts tokens into its output places. The number of tokens transferred is given by the weights of the arcs involved, so the firing of a transition changes the marking. Thanks to these simple rules PNs are a very flexible and powerful modeling tool.

Firing of transitions is a random and concurrent process: one transition is chosen randomly from all transitions that may fire, and transitions which have no common places may fire at the same time. If some transitions have common input places, they may compete for tokens, and the firing of one transition may cause another transition to be no longer enabled. Such transitions are in conflict.
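The firing rule described above can be illustrated with a minimal sketch. The net, the place and transition names and the helper functions below are invented for illustration and are not taken from the thesis. The sketch fires one randomly chosen enabled transition per step, so it shows the random choice but not true concurrent firing.

```python
import random

# A Petri net as plain dictionaries (all names are illustrative only).
# pre[t]  maps the input places of transition t to arc weights W(p, t)
# post[t] maps the output places of transition t to arc weights W(t, p)
pre = {"t1": {"p1": 1}, "t2": {"p1": 1}, "t3": {"p2": 2}}
post = {"t1": {"p2": 1}, "t2": {"p3": 1}, "t3": {"p1": 1}}
marking0 = {"p1": 2, "p2": 0, "p3": 0}


def enabled(t, marking):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking[p] >= w for p, w in pre[t].items())


def fire(t, marking):
    """Return the marking obtained by firing t (the input is not modified)."""
    new = dict(marking)
    for p, w in pre[t].items():
        new[p] -= w
    for p, w in post[t].items():
        new[p] += w
    return new


def simulate(marking, steps, rng):
    """Fire one randomly chosen enabled transition per step."""
    trace = []
    for _ in range(steps):
        candidates = [t for t in sorted(pre) if enabled(t, marking)]
        if not candidates:
            break  # dead marking: nothing can fire
        t = rng.choice(candidates)
        marking = fire(t, marking)
        trace.append(t)
    return marking, trace


final, trace = simulate(marking0, 10, random.Random(0))
print(trace, final)
```

Here t1 and t2 share the input place p1, so firing one of them may disable the other, which is the conflict situation described in the text.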

Def. 3. Two transitions t_1 and t_2 are in a soft conflict if they have at least one common input place.

The Petri net can be represented in an algebraic approach in the form of matrices with integer coefficients: an input matrix, an output matrix and an incidence matrix.

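Def. 4. Let PT = (P, T, F, W, M_0) be a Petri net, where P, T, F, W are as in Def. 1 and M_0 is as in Def. 2, and let n = |P| and m = |T|. The input matrix is a matrix C^+ = (a_ij)_{n×m}, where:

$$a_{ij} = \begin{cases} W(t_j, p_i) & \text{if } (t_j, p_i) \in F, \\ 0 & \text{otherwise.} \end{cases}$$

The output matrix is a matrix C^- = (α_ij)_{n×m}, where:

$$\alpha_{ij} = \begin{cases} W(p_i, t_j) & \text{if } (p_i, t_j) \in F, \\ 0 & \text{otherwise.} \end{cases}$$

Def. 5. The incidence matrix is a matrix C = (c_ij)_{n×m}, where C = C^+ - C^-.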
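As a small worked example of Def. 4 and Def. 5, the sketch below builds the three matrices for a two-place cycle. It assumes the usual convention that rows correspond to places and columns to transitions, that C^+ collects the weights of arcs from transitions to places and C^- the weights of arcs from places to transitions. The function name and the example net are illustrative.

```python
places = ["p1", "p2"]
transitions = ["t1", "t2"]
# Arc weights W: keys are (source, target) pairs taken from F.
weights = {("p1", "t1"): 1, ("t1", "p2"): 1, ("p2", "t2"): 1, ("t2", "p1"): 1}


def matrices(places, transitions, weights):
    """Return (C_plus, C_minus, C) as lists of rows, one row per place."""
    # Assumption: C_plus holds tokens produced in a place, C_minus tokens consumed.
    c_plus = [[weights.get((t, p), 0) for t in transitions] for p in places]
    c_minus = [[weights.get((p, t), 0) for t in transitions] for p in places]
    c = [[a - b for a, b in zip(row_plus, row_minus)]
         for row_plus, row_minus in zip(c_plus, c_minus)]
    return c_plus, c_minus, c


c_plus, c_minus, c = matrices(places, transitions, weights)
for row in c:
    print(row)
```

Each column of C then gives the net change of the marking caused by one firing of the corresponding transition.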

The incidence matrix C is necessary to define one of the most important properties of biological Petri nets: t-invariants.

Def. 6. A t-invariant is a vector x ∈ ℕ^l (where l = |T|) satisfying C∙x = 0.

The analysis of t-invariants and their meaning is one of the most important parts of the examination of a PN-based biological model [3]. A PN model is considered correct if every transition is a part of at least one t-invariant. This property is called "covered by t-invariants" (CTI).
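A hedged sketch of how Def. 6 and the CTI property could be checked for a given incidence matrix is shown below. It only verifies candidate vectors supplied by the caller; it does not compute the t-invariants themselves, which requires a dedicated algorithm. The function names and the example matrix are illustrative.

```python
def is_t_invariant(c, x):
    """True if x is a non-negative integer vector with C . x = 0."""
    if any(v < 0 for v in x):
        return False
    return all(sum(a * v for a, v in zip(row, x)) == 0 for row in c)


def covered_by_t_invariants(c, invariants):
    """CTI check: every transition has a positive entry in some t-invariant."""
    n_transitions = len(c[0])
    valid = [x for x in invariants if is_t_invariant(c, x)]
    return all(any(x[j] > 0 for x in valid) for j in range(n_transitions))


# Cycle p1 -> t1 -> p2 -> t2 -> p1 plus a source transition t3 feeding p1.
# Rows are places (p1, p2), columns are transitions (t1, t2, t3).
c = [[-1, 1, 1],
     [1, -1, 0]]

print(is_t_invariant(c, [1, 1, 0]))
print(covered_by_t_invariants(c, [[1, 1, 0]]))
```

In this toy net the vector (1, 1, 0) is a t-invariant, but t3 does not belong to any supplied invariant, so the CTI check fails.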


Many types and extensions of PNs have been developed, for example colored PNs [4], continuous PNs [5], stochastic PNs [6], timed PNs [7] and priority-based PNs [8].

1.2 Immune system as a model of biological system

Humans and other animals live in an unsafe environment, full of microorganisms such as bacteria, viruses, parasites and fungi, which may be very dangerous for higher organisms. The immune system (IS) plays the central role in the defense against them. It is very complex and its responses can be classified into:

• cellular response: against pathogens living in cells,

• humoral response: against pathogens living in body fluids.

A second division is into:

• innate immune system: provides an immediate defense against the infection in a non-specific way,

• adaptive immune system: can recognize a pathogen and respond in a specific way.

The adaptive immune response consists of a few parts: reaction of the antigen presenting cells, for example dendritic cells, presentation of the antigen to lymphocytes, proliferation of helper T cells, development of Tc cells and the cellular response, production of antibodies by B cells and the humoral response [9].

Many phenomena have an impact on the IS. One example is fever: it does not cause new interactions in the immune response, but only stimulates some existing ones [10]. The process of ageing also affects the IS. It decreases the number of naive T and B cells, but increases the number of "memory" T cells [11]. Diseases, especially those related to the IS, change its behavior. One of the most studied is AIDS, caused by the HIV virus, which kills Th lymphocytes and decreases their number in the organism. AOIS (Adult-Onset Immunodeficiency Syndrome) is a newly discovered disease whose symptoms are similar to those of AIDS [12]. The cause of this disease is the presence of anti-IFN-γ antibodies. Those antibodies bind to IFN-γ cytokines, and the resulting low level of IFN-γ paralyzes the immune system.

Moreover, disorders not directly connected to the IS may have an impact on it, for example autism spectrum disorder (ASD). In autistic people the levels of some cytokines are altered [13]. The IS is rich enough to serve as a good test case for Petri net based modeling.

1.3 Molecular dynamics simulations

Molecular dynamics (MD) simulations [14] are computer calculations of the trajectory of motion of every atom from a given input set, for example a protein. The theoretical basis of an MD simulation is straightforward: the location of every atom in each time step is calculated by solving the Newton or Langevin equations of motion using, for example, the Verlet algorithm. These calculations require the location of the atom in the previous step, the mass of the atom and the force acting on the atom. The force is the most difficult to compute and is obtained using predefined force fields. A force field is a set of parameters and functions describing the potential energy of the system.
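To make the integration step concrete, here is a minimal sketch of the position (Stoermer) Verlet scheme for a single particle in one dimension. The harmonic "force field", the time step and all names are toy assumptions chosen for illustration; a real MD code handles many atoms in three dimensions with a much richer force field.

```python
import math


def verlet(x0, v0, mass, force, dt, steps):
    """Position (Stoermer) Verlet for one particle in one dimension."""
    # Bootstrap the "previous" position with a Taylor step backwards in time.
    x_prev = x0 - v0 * dt + 0.5 * (force(x0) / mass) * dt * dt
    x = x0
    trajectory = [x]
    for _ in range(steps):
        # x(t + dt) = 2 x(t) - x(t - dt) + (F(x(t)) / m) dt^2
        x_next = 2.0 * x - x_prev + (force(x) / mass) * dt * dt
        x_prev, x = x, x_next
        trajectory.append(x)
    return trajectory


# Toy force field: a harmonic spring, F(x) = -k x with k = 1.
traj = verlet(x0=1.0, v0=0.0, mass=1.0, force=lambda x: -x, dt=0.01, steps=628)

# The exact solution is x(t) = cos(t); compare at the final time t = 6.28.
print(traj[-1], math.cos(6.28))
```

The update uses exactly the three ingredients named in the text: the previous position, the mass and the force on the atom.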


MD simulations allow us to look inside the world of proteins and nucleic acids. However, their output trajectories are usually huge datasets that are hard to analyze, especially by human inspection, so sophisticated methods of analysis are necessary.

2. Description of thesis

The following section contains descriptions of chapters of the dissertation.

2.1 Chapter 1

The first chapter of the thesis presents a description of Petri nets (see Background). It contains the basic definitions concerning the nets and the theory of t-invariant analysis. Extended types of Petri nets are described: stochastic, continuous, colored, timed and priority-based.

In priority-based PNs the transition with the highest priority among all enabled transitions is fired [8]. Consequently, when the simulation of a Petri net is started several times from the same initial marking, the same sequence of transition firings is obtained each time. This strict determinism is not desirable in the studies described in Chapter 5. Therefore this type of PN has been modified in the thesis. During firing in the modified priority-based PNs the transition with the highest priority among the transitions in a soft conflict is selected. This yields more diverse sequences of transition firings, although the firing order is still quite strictly controlled.

In order to mimic the relations between events in MD simulations as closely as possible, a new type of Petri net has been developed, called the random priority-based PN. Besides priorities and the elements described in Def. 1, the random priority-based PN also contains a random variable with a continuous uniform distribution on the interval [0, 1]. In those nets transitions fire randomly and the probability of a firing is proportional to the priority of a transition. Transitions are drawn from a set of conflicting transitions. Such firing rules produce diverse sequences of transition firings, but they preserve the relationship between priorities. This property was strongly desirable in the studies presented in Chapter 5.
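The two selection rules can be contrasted in a short sketch. It is an illustration under stated assumptions, not the thesis implementation: the conflict set is built with the soft-conflict test of Def. 3, priorities are positive numbers, and a single uniform sample u from [0, 1] is mapped onto the cumulative priorities. All names and values are invented.

```python
import random


def in_soft_conflict(t1, t2, inputs):
    """Def. 3: two transitions share at least one input place."""
    return bool(inputs[t1] & inputs[t2])


def pick_by_priority(candidates, priority, u):
    """Pick a transition with probability proportional to its priority.

    u is a sample from the uniform distribution on [0, 1].
    """
    total = sum(priority[t] for t in candidates)
    threshold = u * total
    running = 0.0
    for t in candidates:
        running += priority[t]
        if threshold < running:
            return t
    return candidates[-1]  # reached only when u == 1.0


inputs = {"t1": {"p1"}, "t2": {"p1", "p2"}, "t3": {"p3"}}
priority = {"t1": 3, "t2": 1, "t3": 5}

# t1 and t2 compete for p1; t3 is not in conflict with t1.
conflict_set = [t for t in sorted(inputs) if in_soft_conflict("t1", t, inputs)]

# Modified priority-based rule: always the highest priority in the conflict set.
highest = max(conflict_set, key=priority.get)

# Random priority-based rule: t1 should win roughly 3 times out of 4.
rng = random.Random(1)
draws = [pick_by_priority(conflict_set, priority, rng.random()) for _ in range(10000)]
print(highest, draws.count("t1") / len(draws))
```

The deterministic rule always returns t1, while the random rule still lets t2 fire occasionally, in proportion to its lower priority.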

2.2 Chapter 2

In this chapter the PN model of the immune system is described. This model was inspired by the work of Na et al. [15-17], but the model presented in this thesis is more detailed, expanded and up-to-date [18].

My PN model of the IS was initially limited to the adaptive part of the immune response. However, the reaction of macrophages turned out to be important and has been added to the model. The created PN of the IS consists of more than one hundred places and one hundred transitions, and can be divided into five parts: the reaction of dendritic cells, the proliferation of helper T cells, the cellular response, the humoral response and the reaction of macrophages. Many phenomena and diseases were added to the model to study their impact on the IS: fever, ageing, HIV infection, Adult-Onset Immunodeficiency Syndrome (AOIS) and autism spectrum disorder (ASD). Those phenomena were added to the model in different ways. The method used to introduce a particular feature depended on the type of changes it causes and is discussed in detail in the thesis. The added phenomena were tested by PN simulations and by calculating the number of tokens in selected places.

During the simulations interesting effects were observed. A strong positive effect of fever is clearly visible: especially when all processes related to fever are active, the infection is stopped much faster than without fever. The model shows that ageing impairs the IS, which then cannot control the infection. During AIDS or AOIS the immune system does not work properly. Particularly pioneering is our study of the correlation between fever and autism. The authors of [19] reported that fever improves the condition of autistic children, but this phenomenon is not well studied. The performed simulations have shown that fever changes the amount of cytokines and usually brings it closer to the level observed in healthy children, or that the changes are qualitatively the same for healthy and autistic children.

The t-invariants of the IS model were analyzed. All t-invariants and MCT-sets (maximal common transition sets) were calculated and their biological meaning was analyzed. The model has the CTI property, which is a strong condition for the validity of the model. Many t-invariants were found to be very similar and to describe analogous processes, so t-clusters were calculated and analyzed.

This part of the study shows the great potential of the PN model of the IS. My efforts and tests have shown that modeling diverse biological processes using this tool is practical and feasible.

2.3 Chapter 3

The MD trajectory analysis described in the last chapter of the thesis requires scrutiny of huge datasets, which causes problems with simulations of large PNs. One of the designed algorithms (OPOA) can generate very large nets, and simulations of those PNs are an important part of the presented study. To perform simulations of such large nets, a parallel algorithm for PN simulation has been designed. The algorithm is implemented using the CUDA technology [20], which allows a programmer to use the computational capabilities of graphics card processors (GPUs). However, it requires an appropriate organization of the calculations into threads and thread blocks and suitable use of the graphics card memories.

The algorithm consists of two parts: the preprocessing and the simulation. During the preprocessing some values needed in the simulation are calculated. It is performed mostly by the GPU. However, some part of the calculation is very difficult to parallelize because of many possible branches, and this part is executed by the CPU. The simulation is performed on the GPU only.

The algorithm was tested on two graphics cards and two processors. Two groups of PNs were used in the tests: the first were artificially created, the second were generated by the OPOA algorithm described below. For small nets the CPU implementation of the algorithm is faster, but for larger nets the GPU algorithm is more efficient. The larger the net, the bigger the difference between CPU and GPU execution, especially in the preprocessing stage. The algorithm is quite general and may be used by other researchers.


2.4 Chapter 4

The development of PN-based algorithms required real simulation data. The fourth chapter presents the MD study of antigen-antibody protein complexes that provided such data. Two complexes were studied: the timothy grass pollen allergen Phl p2 with its antibody, and the chemokine MCP-1 with its antibody. Those studies gave interesting results in themselves, published in [21]. However, they were performed mainly to generate test datasets for the analysis done in the last chapter. The first complex is connected with allergies; in the second one, the MCP-1 chemokine is a protein which perhaps plays a role in autism spectrum disorder. For both complexes classical molecular dynamics (MD) simulations and many steered molecular dynamics (SMD) simulations were performed. In SMD simulations some atoms are fixed in space and a harmonic potential (an external force) is applied to the others. The SMD method was used to enforce the dissociation of the antigens from the antibodies. About 70 SMD simulations were performed and different directions of the unbinding force vector were tested. We have found that the forced dissociation of the complex in the lateral direction (approximately perpendicular to the main axis of the antibody) requires forces about 30% lower than in the vertical direction. However, the lateral dragging is still measurable. These results support the feasibility of new, faster, AFM-based medical nanodiagnostic procedures [21]. In this chapter the correlation between the results and numerical noise was also tested, and a bioinformatics analysis of MCP-1 was performed. The obtained MD and SMD trajectories were used in the next part of the research.

2.5 Chapter 5

The last chapter opens quite new areas of application of the PN formalism. It describes a possible usage of Petri nets in the analysis of MD simulations. MD trajectories are difficult to analyze, especially by human inspection, therefore a whole framework of analysis has been created. The idea of this novel approach is outlined below.

At the beginning of the new analysis process the MD input data, i.e. an MD trajectory, is used to generate an appropriate PN. Three new, dedicated algorithms of PN generation were developed: OPOA (one place corresponds to a position of one atom), OPOC (one place corresponds to one conformation of the protein) and CON (one place corresponds to a contact between two amino acids). Each algorithm highlights different aspects of the simulations. In the OPOA algorithm one place corresponds to a position of one atom and one transition to a movement of the atom. A token represents the current location of the atom. However, such a Petri net would be very large. Therefore a discretization of the space and a coarse-grained model were used. In the coarse-grained model one amino acid is represented by its central CA atom, so the places represent only locations of those CA atoms. During the discretization a three-dimensional grid is laid over the space and divides it into cubes. The edge of each cube is equal to the resolution of the grid k. Each cube represents one point in the new space, and all atom positions that fall within the same cube in the original space are mapped to that point. During the algorithm development a problem was identified and named "the stealing problem". Two solutions of the stealing problem were created, which resulted in two variants of the OPOA algorithm:


• OPOAv1: in this variant the stealing problem is solved by creating dedicated places for every atom,

• OPOAv2: here an additional auxiliary construction is added to the net.
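The discretization step of OPOA can be sketched as follows. This is only an illustration of mapping CA coordinates onto grid cubes of edge k and collecting the visited cubes and the observed moves between them. It ignores the stealing problem and both OPOA variants, and the function names and coordinates are invented.

```python
import math


def cell_of(position, k):
    """Map a 3D coordinate to the integer index of the grid cube containing it."""
    return tuple(math.floor(c / k) for c in position)


def opoa_places_and_moves(trajectory, k):
    """trajectory[f][a] is the (x, y, z) position of CA atom a in frame f.

    Returns the set of visited cubes (candidate places) and the set of
    observed cube-to-cube moves (candidate transitions).
    """
    places, moves = set(), set()
    previous = None
    for frame in trajectory:
        cells = [cell_of(pos, k) for pos in frame]
        places.update(cells)
        if previous is not None:
            for old, new in zip(previous, cells):
                if old != new:
                    moves.add((old, new))
        previous = cells
    return places, moves


# Two CA atoms, three frames, grid resolution k = 2.0 (all values invented).
trajectory = [
    [(0.5, 0.5, 0.5), (3.1, 0.2, 0.0)],
    [(1.9, 0.4, 0.6), (4.2, 0.1, 0.0)],
    [(2.1, 0.4, 0.6), (4.3, 0.1, 0.0)],
]
places, moves = opoa_places_and_moves(trajectory, k=2.0)
print(sorted(places))
print(sorted(moves))
```

A larger k gives fewer places and a smaller net, at the cost of spatial detail.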

In the OPOC algorithm one place corresponds to one conformation and one transition to a change between conformations [22]. A token marks the current conformation. During the PN generation it has to be determined whether two conformations are the same or different, therefore a structural alignment is necessary. Two well-known and popular structural alignment algorithms were tested [23-24], and two algorithms of my own were also designed. The designed algorithms are approximate, but quite accurate and faster (up to 500 times) than those used so far. During the PN generation in OPOC the structural alignment is calculated many times, so the efficiency of this calculation is very important.
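The OPOC idea can be sketched with the conformation comparison left as a parameter. The function `same` below stands in for the structural alignment step and is a hypothetical placeholder. The toy example uses single numbers instead of protein conformations, purely to show how places, transitions and the visited sequence are collected.

```python
def opoc(frames, same):
    """Assign each frame to a place; same(a, b) decides conformation equality."""
    representatives = []   # one representative conformation per place
    transitions = set()    # (from_place, to_place) index pairs
    sequence = []          # index of the place visited by each frame
    for frame in frames:
        for idx, rep in enumerate(representatives):
            if same(rep, frame):
                break
        else:
            # No known conformation matches: create a new place.
            representatives.append(frame)
            idx = len(representatives) - 1
        if sequence and sequence[-1] != idx:
            transitions.add((sequence[-1], idx))
        sequence.append(idx)
    return representatives, transitions, sequence


# Toy "conformations": numbers that count as equal when closer than 0.5.
frames = [0.0, 0.1, 1.0, 1.2, 0.2]
reps, transitions, sequence = opoc(frames, same=lambda a, b: abs(a - b) < 0.5)
print(reps, sorted(transitions), sequence)
```

Because `same` is called for every frame against every known place, its cost dominates the generation, which is why the efficiency of the alignment matters.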

In the CON algorithm places represent contacts between two amino acids. Transitions correspond to trajectory frames and describe changes in amino acid contacts in consecutive frames. Thus, in the CON algorithm, apart from the generation of the PN, calculations of contacts between amino acids are also important.

Def. 7. Two amino acids a_1 and a_2 are in contact if the distance between their CA atoms is smaller than a given constant c:

$$d(\mathrm{CA}_{a_1}, \mathrm{CA}_{a_2}) < c.$$

An original algorithm for calculating contacts has been designed. The average computational complexity of this algorithm is linear.
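The thesis algorithm for contacts is not reproduced here. As an illustration of how Def. 7 can be evaluated in roughly linear average time, the sketch below uses a standard cell-list technique, which is a substituted textbook method and not necessarily the author's: atoms are binned into cubes of edge c, so a contact can only occur between atoms in the same or neighbouring cubes. The names and coordinates are invented.

```python
import math
from collections import defaultdict
from itertools import product


def contacts(ca_positions, c):
    """Return index pairs (i, j), i < j, whose CA-CA distance is below c."""
    # Bin atoms into cubes of edge c; a contact can only span neighbouring cubes.
    grid = defaultdict(list)
    for i, pos in enumerate(ca_positions):
        grid[tuple(math.floor(v / c) for v in pos)].append(i)

    found = set()
    for i, pos in enumerate(ca_positions):
        cx, cy, cz = (math.floor(v / c) for v in pos)
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                if i < j and math.dist(pos, ca_positions[j]) < c:
                    found.add((i, j))
    return found


positions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0), (3.5, 0.0, 0.0)]
print(sorted(contacts(positions, c=4.0)))
```

The running time stays close to linear as long as the number of atoms per cube is bounded, which holds for proteins because atoms cannot overlap.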

A created Petri net can be analyzed or used in simulations. This is a novel approach and it requires that PN simulations mimic the MD simulations. To achieve this, not only the classical Petri nets (see Def. 1 and Def. 2) but also timed, priority-based and random priority-based PNs have been generated. To reproduce sequences of events during the SMD simulations the Ramchandani timed Petri nets were used. However, the performed tests showed that they are not suitable for PNs generated by the OPOA algorithms, and therefore a new type of Petri net was invented: the guard time Petri net. In guard time PNs a time parameter is assigned to each transition and the firing of transitions increases the clock state. Transitions with a time higher than the current clock state cannot fire, but transitions with a time smaller than the current clock state can fire freely in random order.

The protocol for simulating the extended types of PNs was designed and is presented as pseudo-code in the dissertation.
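The pseudo-code from the dissertation is not available here, so the following is only a rough sketch of the guard time firing rule as described above. Several details are assumptions: the clock starts at zero, each firing advances it by one unit, a transition whose time equals the clock state may fire, and the simulation stops when nothing may fire. The net and all names are invented.

```python
import random


def simulate_guard_time(pre, post, guard, marking, steps, rng):
    """Fire randomly among token-enabled transitions whose guard time has passed."""
    marking = dict(marking)  # work on a copy
    clock = 0                # assumption: the clock starts at zero
    trace = []
    for _ in range(steps):
        ready = [
            t for t in sorted(pre)
            if guard[t] <= clock  # assumption: equality also allows firing
            and all(marking[p] >= w for p, w in pre[t].items())
        ]
        if not ready:
            break  # assumption: stop when nothing may fire
        t = rng.choice(ready)
        for p, w in pre[t].items():
            marking[p] -= w
        for p, w in post[t].items():
            marking[p] += w
        clock += 1  # assumption: each firing advances the clock by one unit
        trace.append((clock, t))
    return trace


pre = {"a": {"p0": 1}, "b": {"p1": 1}, "late": {"p0": 1}}
post = {"a": {"p1": 1}, "b": {"p2": 1}, "late": {"p2": 1}}
guard = {"a": 0, "b": 1, "late": 5}
start = {"p0": 1, "p1": 0, "p2": 0}

trace = simulate_guard_time(pre, post, guard, start, 10, random.Random(0))
print(trace)
```

In this example the transition named late competes with a for the token in p0, but its guard time keeps it from firing early, so the order of events is a followed by b.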

The sequences of transitions and places obtained during the PN simulations may be used to generate a PDB format file [25]. It is a kind of reverse transformation from the PN universe back to the protein structure space. In some cases the PDB file is easier to analyze than the original massive PN simulation data. This transformation is possible only for the OPOA and OPOC algorithms, because a PN generated by CON does not store information about locations in space.
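For OPOA-style nets, the reverse transformation could look roughly like the sketch below. It assumes that each place is a grid cube identified by integer indices, that an atom is put back at the centre of its cube, and that residue names are known from the original structure. The record layout follows the fixed-column ATOM line of the PDB format; the function names are illustrative.

```python
def cell_center(cell, k):
    """Centre of a grid cube with integer index `cell` and edge length k."""
    return tuple((i + 0.5) * k for i in cell)


def ca_record(serial, res_name, res_seq, xyz, chain="A"):
    """Format one fixed-column PDB ATOM record for a CA atom."""
    x, y, z = xyz
    return (
        f"ATOM  {serial:5d}  CA  {res_name:>3s} {chain}{res_seq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {'C':>2s}"
    )


def frame_to_pdb(cells, k, res_names):
    """One structure: CA atom i sits at the centre of cells[i]."""
    lines = [
        ca_record(i + 1, res_names[i], i + 1, cell_center(cell, k))
        for i, cell in enumerate(cells)
    ]
    return "\n".join(lines + ["END"])


# Two residues occupying neighbouring cubes of a grid with k = 2.0 (invented).
text = frame_to_pdb([(0, 0, 0), (1, 0, 0)], 2.0, ["ALA", "GLY"])
print(text)
```

The coordinates recovered this way are accurate only up to the grid resolution k, since the exact position inside a cube is lost during discretization.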


Examples of generated Petri nets, sequences of places and transitions obtained during PN simulations, and created PDB files are presented and analyzed in the thesis. The features of the designed algorithms are also presented.

3. Summary and future prospects

The topics presented in the thesis demonstrate that PNs are useful tools in biological studies. I have tried to show that they are suitable for investigations of the immune system. Our newly created model of the IS has been successfully used in the analysis of numerous phenomena, and further elements can be added to the model relatively easily. A simulation of a PN executed on the GPU using the current CUDA technology is possible, and it is demonstrated that for large nets the GPU simulation is faster than the one executed on the CPU. The presented SMD simulations show that the dragging of an antigen from an antibody depends on the direction of the applied force: a lateral process requires lower forces than a vertical one. This simulation result gives a theoretical foundation for Lateral Force Spectroscopy measurements by the AFM method.

Perhaps the most innovative aspect of this thesis is paving the way to applications of the PN formalism in MD simulations. No such attempts have been reported in the literature. Three algorithms of PN generation from MD trajectories were designed. Using the created PNs, changes of molecular conformations, changes in atom locations and the time evolution of contacts between amino acids can be tracked. It is shown that the PN generated by the OPOC algorithm can be used for clustering of molecular conformations. An advantage of our PN algorithms over other methods of MD data analysis is that PNs represent not only conformations (or positions) of the elements, but also the connections and relations between those objects.

The studies presented in this thesis are the first results in their areas and do not exhaust the topic. Many new ideas may be tested, for example the utility of other types of Petri nets in the analysis of MD trajectories. In particular, a new type or a modification of timed PNs is required to accurately represent the features of SMD trajectories during the PN simulation.

An idea of a new algorithm for PN generation from an MD trajectory has been conceived. This new algorithm may be called OPOV (one place one velocity), and it will describe not the positions of the amino acids, but their velocities. However, first the results of the CON algorithm, which is the least studied, should be investigated more thoroughly. It will also be useful to generate PNs using the OPOC algorithm from MD trajectories calculated at lower temperatures. The existing PN generation algorithms can also be enriched. I would like to generate a "macro" PN, in which every place would represent a smaller, more detailed Petri net. New features can be added and studied in the PN model of the IS, for example a better representation of memory cells.

During the preparation of this thesis and my participation in an NCN-funded project (N519 578138) I have learnt that ASD has a strong genetic component. Together with other colleagues, I have developed a framework for searching for information on new ASD-related genes. We have initiated a project (GENIUS) aimed at finding an inexpensive, PCR-based biochemical test checking selected genes. The GENIUS project is carried out in cooperation with the Pracownia Genetyki Nowotworów w Toruniu lab and the Centre for Modern Interdisciplinary Technologies of Nicolaus Copernicus University in Torun, with financial support from the Kujawsko-Pomorskie Voivodeship.

Publications

[1] W. Reisig, Petri Nets: An Introduction. Springer-Verlag New York, Inc., 1985.

[2] T. Murata, "Petri nets: Properties, analysis and applications," Proceedings of the IEEE, vol. 77, pp. 541-580, 1989.

[3] I. Koch, et al., Modeling in Systems Biology: The Petri Net Approach. London: Springer, 2011.

[4] K. Jensen, "Coloured Petri nets," in Petri Nets: Central Models and Their Properties. Springer, 1987, pp. 248-299.

[5] R. David and H. Alla, Discrete, Continuous, and Hybrid Petri Nets. Springer, 2005.

[6] M. A. Marsan, "Stochastic Petri nets: an elementary introduction," in Advances in Petri Nets 1989. Springer, 1990, pp. 1-29.

[7] J. R. Silva and P. M. del Foyo, "Timed Petri Nets," 2012.

[8] P. J. Fortier and H. E. Michel, Computer Systems Performance Evaluation and Prediction. Elsevier, 2003.

[9] J. Parkin and B. Cohen, "An overview of the immune system," The Lancet, vol. 357, pp. 1777-1789, 2001.

[10] J. D. Hasday, et al., "The role of fever in the infected host," Microbes and Infection, vol. 2, pp. 1891-1904, 2000.

[11] D. Weiskopf, et al., "The aging of the immune system," Transplant International, vol. 22, pp. 1041-1050, 2009.

[12] S. K. Browne, et al., "Adult-Onset Immunodeficiency in Thailand and Taiwan," New England Journal of Medicine, vol. 367, pp. 725-734, 2012.

[13] C. Molloy, et al., "Elevated cytokine levels in children with autism spectrum disorder," Journal of Neuroimmunology, vol. 172, pp. 198-205, 2006.

[14] H. J. C. Berendsen, et al., "Molecular dynamics with coupling to an external bath," The Journal of Chemical Physics, vol. 81, pp. 3684-3690, 1984.

[15] D. Na, et al., "Integration of Immune Models Using Petri Nets," 2004.

[16] I. Park, et al., "Fuzzy Continuous Petri Net-Based Approach for Modeling Helper T Cell Differentiation," in Artificial Immune Systems, vol. 3627, C. Jacob, et al., Eds. Springer Berlin / Heidelberg, 2005, pp. 331-338.

[17] I. Park, et al., "Fuzzy Continuous Petri Net-Based Approach for Modeling Immune Systems," in Neural Nets, vol. 3931, B. Apolloni, et al., Eds. Springer Berlin / Heidelberg, 2006, pp. 278-285.

[18] A. Gogolinska and W. Nowak, "Petri Nets Approach to Modeling of Immune System and Autism," in Artificial Immune Systems, vol. 7597, C. Coello Coello, et al., Eds. Springer Berlin / Heidelberg, 2012, pp. 86-99.

[19] L. K. Curran, et al., "Behaviors associated with fever in children with autism spectrum disorders," Pediatrics, vol. 120, p. e1386, 2007.

[20] NVIDIA, "Programming guide," 2013.

[21] A. Gogolinska and W. Nowak, "Molecular basis of lateral force spectroscopy nano-diagnostics: computational unbinding of autism related chemokine MCP-1 from IgG antibody," Journal of Molecular Modeling, vol. 19, pp. 4773-4780, 2013.

[22] R. Jakubowski, et al., "Computational Studies of TTR Related Amyloidosis: Exploration of Conformational Space through a Petri Net-Based Algorithm," TASK Quarterly, vol. 18, no. 3, p. 267, 2014.

[23] I. N. Shindyalov and P. E. Bourne, "Protein structure alignment by incremental combinatorial extension (CE) of the optimal path," Protein Engineering, vol. 11, pp. 739-747, 1998.

[24] Y. Ye and A. Godzik, "Flexible structure alignment by chaining aligned fragment pairs allowing twists," Bioinformatics, vol. 19, pp. ii246-ii255, 2003.

[25] K. Henrick, et al., "Remediation of the protein data bank archive," Nucleic Acids Research, vol. 36, pp. D426-D433, 2008.
