Computer simulation is required to study the dynamics of fundamental cellular mechanisms such as gene network regulation, signalling and metabolism. The combined use of “-omics” technologies allows us to reconstruct biochemical reaction networks at a dropping cost and at a fast pace. This exciting capability empowers computational systems biology and network medicine, two rapidly emerging and very promising fields expected to have a significant impact on drug design by the end of the decade. Computational Bioengineering and Biocomputing are becoming indispensable, since in silico experimentation can help us get a handle on fundamental questions such as: “How is the cell expected to react if these two genes are blocked?”, or “At which parts of a pathway can we intervene to prevent cell death?” etc.
The computational challenge we face when trying to address such important questions using reconstructed networks is how to deal effectively with the model’s complexity. For example, the E. coli metabolic network reconstruction consists of 1260 molecular species interacting through 2077 reactions. The cytoplasm of a whole E. coli cell is estimated to contain half a million large organic molecules of different types (proteins, ribosomes, tRNA etc.) and ~40 million molecules overall (if we also consider ions and water). The number of biochemical reaction events happening during an E. coli cell cycle is estimated to be in the order of 10^14 to 10^16. Today, the stochastic simulation performance of a state-of-the-art CPU is below 0.5 Mreactions/sec (millions of reactions per second). It is therefore clear that whole-cell simulation (a grand challenge towards Science 2020) is out of reach, unless effective algorithms, parallel processing methods, and specialized computer hardware accelerators are all employed in a concerted effort to successfully address the simulation “speed challenge”.
Coupled Ordinary Differential Equations (ODEs) have traditionally been used to describe the dynamical behaviour of chemically reacting systems. However, ODEs fail to account for the naturally occurring stochasticity in cellular systems, and especially for the intriguing stochastic behaviours exhibited in the low species count regimes. Markov processes, and in particular Gillespie’s Stochastic Simulation Algorithms (SSA), have emerged as the method of choice for stochastic simulation, since they can accurately approximate the solution of the Chemical Master Equation.
Although conceptually simple, the SSA requires computing the propensities of all m reactions in every reaction cycle (RC), leading to an RC complexity in O(m). Considering that a typical simulation may last for billions of reaction cycles, there is a definite need for parallel processing, especially as the size of the network (number of reactions) increases. Concerned with this issue, Gillespie introduced the First Reaction Method (FRM) SSA. The FRM-SSA calculates for every reaction channel its putative next activation time, using a randomly drawn number. The reaction with the smallest putative time is then activated, completing the simulation of an RC. Since the calculation of each putative time can proceed independently of the others, the algorithm can be parallelized across the m reaction channels.
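As an illustration, the per-cycle logic of the FRM can be sketched in a few lines of Python; the toy network, rate constant and state representation below are ours, purely for illustration:

```python
import math
import random

def frm_ssa_step(state, propensities, updates, rng):
    """One FRM reaction cycle: draw a putative firing time for every
    reaction channel (one random number each) and fire the earliest."""
    best_tau, best_j = math.inf, None
    for j, prop in enumerate(propensities):
        a = prop(state)
        if a > 0.0:
            tau = -math.log(rng.random()) / a  # Exp(a)-distributed putative time
            if tau < best_tau:
                best_tau, best_j = tau, j
    if best_j is None:
        return None, state  # no channel can fire
    return best_tau, updates[best_j](state)

# Toy network: A + B -> C with mass-action propensity c*A*B (illustrative only).
c = 0.1
propensities = [lambda s: c * s["A"] * s["B"]]
updates = [lambda s: {"A": s["A"] - 1, "B": s["B"] - 1, "C": s["C"] + 1}]
rng = random.Random(42)
state, t = {"A": 100, "B": 100, "C": 0}, 0.0
while True:
    tau, state = frm_ssa_step(state, propensities, updates, rng)
    if tau is None:
        break
    t += tau
```

Note that the inner loop over the channels has no data dependencies, which is exactly what the proposed FPGA architecture exploits by assigning channels to parallel PEs.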
To address the high complexity of Gillespie’s SSAs, Gibson and Bruck introduced the Next Reaction Method (NRM), with RC complexity in O(log m). For very large networks with thousands of reactions, the NRM is the method of choice in most available software packages. It gains performance by maintaining a priority queue whose root stores the reaction channel with the smallest tentative execution time. Every time a reaction "fires", the priority queue is updated using a single random number. Moreover, only the propensities of reactions whose reactant species are affected by the current reaction need to be updated. Although much faster than the FRM, the NRM is much harder to parallelize, since it needs to maintain a large global data structure. Therefore the FRM is used in most parallel implementations of the SSA.
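A compact sequential sketch of the NRM bookkeeping follows, assuming a mass-action toy model; for brevity it uses lazy deletion of stale heap entries in place of the indexed binary heap of the original paper:

```python
import heapq
import math
import random

def nrm_simulate(x0, props, updates, deps, t_end, seed=0):
    """Minimal Next Reaction Method: absolute putative times in a priority
    queue; after a firing, only dependent channels (deps[j]) are touched."""
    rng = random.Random(seed)
    x = dict(x0)
    a = [p(x) for p in props]
    tau = [(-math.log(rng.random()) / aj) if aj > 0 else math.inf for aj in a]
    heap = [(tau[j], j) for j in range(len(props)) if tau[j] < math.inf]
    heapq.heapify(heap)
    t = 0.0
    while heap:
        tj, j = heapq.heappop(heap)
        if tj != tau[j]:
            continue  # stale entry (lazy deletion instead of an indexed heap)
        if tj > t_end:
            break
        t = tj
        x = updates[j](x)
        for k in deps[j]:
            a_old, a_new = a[k], props[k](x)
            a[k] = a_new
            if a_new <= 0:
                tau[k] = math.inf
                continue
            if k == j or tau[k] == math.inf:
                # Fired (or newly enabled) channel: fresh random number.
                tau[k] = t + (-math.log(rng.random()) / a_new)
            else:
                # Gibson-Bruck rescaling: reuse the old time, no new draw.
                tau[k] = t + (a_old / a_new) * (tau[k] - t)
            heapq.heappush(heap, (tau[k], k))
    return t, x

# Toy demo: a single channel A + B -> C run to exhaustion (illustrative only).
props = [lambda s: 1.0 * s["A"] * s["B"]]
updates = [lambda s: {"A": s["A"] - 1, "B": s["B"] - 1, "C": s["C"] + 1}]
t_final, x_final = nrm_simulate({"A": 10, "B": 10, "C": 0}, props, updates,
                                {0: [0]}, t_end=1e9)
```

The centralized heap and the dependency lists are precisely the "large global data structure" that makes the NRM awkward to parallelize.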
In a typical network simulation we need to compute a very large ensemble of stochastic realizations to approximate the probability density function of the species populations. Since we typically run thousands to millions of different realizations, parallel computing is mandatory. Therefore, when designing a parallel computing resource to accelerate SSAs we have to consider two important requirements: (i) using the available processing elements (PEs) to accelerate a single simulation run of a network; we call this the Single Simulation in Parallel (SSIP) mode of operation; (ii) using the available PEs to perform multiple independent simulation runs in parallel; we call this the Multiple Simulations in Parallel (MSIP) mode of operation.
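To make the MSIP idea concrete, here is a sketch in which independent realizations of a toy decay process are farmed out to worker threads standing in for the PEs of the SoC; the model, parameters and helper name are ours, purely for illustration:

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def ssa_realization(seed, a0=50, c=0.5, t_end=2.0):
    """One independent direct-method SSA run of the decay A -> 0.
    Each realization owns its own RNG stream, so runs are
    embarrassingly parallel (no shared state between PEs)."""
    rng = random.Random(seed)
    a, t = a0, 0.0
    while a > 0:
        tau = -math.log(rng.random()) / (c * a)  # time to next decay event
        if t + tau > t_end:
            break
        t += tau
        a -= 1
    return a  # molecules of A left at t_end

# MSIP mode: run many realizations concurrently, then build ensemble statistics.
with ThreadPoolExecutor(max_workers=8) as pool:
    finals = list(pool.map(ssa_realization, range(1000)))
mean_final = sum(finals) / len(finals)  # estimates E[A(t_end)] = a0 * exp(-c * t_end)
```

Because the realizations never communicate, MSIP scales trivially with the number of PEs; SSIP, in contrast, must parallelize within one realization and is limited by the per-cycle dependencies discussed above.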
Motivated by the above needs, the first two goals set for this multidisciplinary project are to deliver flexible IP core designs which can be used to generate (through high level synthesis) different FRM-SSA and NRM-SSA multiprocessor SoC instances for FPGAs.
Goal 1: Design, develop and validate a fully parametric SoC architecture for realizing Gillespie’s FRM-SSA. The design will be delivered in the form of a soft IP core. It will be described in VHDL and will be fully parametric, in terms of the network characteristics (n = number of species, m = number of reactions, q = max. reaction order) and the target SoC specifications (mode of operation = SSIP vs. MSIP, N = number of parallel PEs). The VHDL descriptions can be used to flexibly generate (by high-level synthesis) any desired SoC instance for an FPGA implementation. We will validate the design by implementing on FPGAs various synthesized SoC instances with N=8 or more PEs for the stochastic simulation of large networks having at least 4K reactions and operating in both modes.
Goal 2: Design, develop and validate a fully parametric SoC architecture realizing Gibson and Bruck’s NRM-SSA. This alternative SoC architecture will also be captured as a soft IP core in synthesizable parametric VHDL code. The same detailed parameterization as for the FRM-SSA will be used. It will be thoroughly tested with a large variety of biomodels using FPGAs and its performance and scalability characteristics will be compared to those of the FRM-SSA SoCs as the size of the simulated network increases.
In the field of computer architecture there is a clear recent trend towards using a network of on-chip routers in order to build powerful CPUs with more than N=8 cores. An interesting prototype is Intel’s Single-chip Cloud Computer (SCC), an experimental many-core processor chip with 48 P54C cores. The cores are organized in pairs, called "tiles", and the tiles are interconnected via a 4x6 mesh network of routers. Our next goal is therefore to develop parallel processing realizations of the FRM-SSA and the NRM-SSA for the large-scale, efficient simulation of biochemical reaction networks using emerging many-core CPUs.
Goal 3: Develop parallel software implementations of the FRM and NRM SSAs for simulating large reacting systems on Intel’s Single-chip Cloud Computer many-core NoC CPU. We will evaluate the scalability of the NoC implementations’ performance as the number of cores increases and as the simulated networks grow large.
Our final goal is to develop a pilot portable computational resource for high performance stochastic simulation accessible to practicing systems biologists over the internet.
Goal 4: Develop a pilot web server prototype that will allow scientists to submit, over the internet, network models in SBML format for parallel stochastic simulation. Submitted network models will be parsed and executed using the best available FPGA SoC implementation, depending on their parameters. The resource will be based on a PC server as the front end, with FPGA boards acting as the back end. The end user will not need any knowledge of parallel processing to use the resource.
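To illustrate the front-end parsing step, the species and reaction counts needed to pick a suitable SoC instance can be extracted from an SBML document with Python's standard library alone; the namespace below is that of SBML Level 3 core, and the helper name and toy model are ours:

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"  # SBML Level 3 core

def parse_network(sbml_text):
    """Extract species and reaction ids from an SBML document. The counts
    (n species, m reactions) select the best-fitting synthesized SoC instance."""
    root = ET.fromstring(sbml_text)
    ns = {"s": SBML_NS}
    species = [e.get("id") for e in root.findall(".//s:listOfSpecies/s:species", ns)]
    reactions = [e.get("id") for e in root.findall(".//s:listOfReactions/s:reaction", ns)]
    return species, reactions

model = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfSpecies>
      <species id="A"/><species id="B"/><species id="C"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="R1"/>
    </listOfReactions>
  </model>
</sbml>"""
species, reactions = parse_network(model)
```

A production front end would of course use a full SBML library to recover kinetic laws and stoichiometries as well; this sketch only shows how little the end user needs to supply beyond the model itself.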
The technical work of the project is organized into three Work Packages. WP1 will address Goals 1 and 2, WP2 will address Goal 3 and WP3 will address Goal 4. Work Package 4 is concerned with the worldwide dissemination of the research results.
Anticipated benefits: It is well recognized that the main limitation of stochastic simulation today is the “speed challenge”. The proposed high performance computing for systems biology research is expected to raise by at least two orders of magnitude the complexity of biochemical reacting systems that can be efficiently simulated stochastically, thus enabling in silico experimentation with cellular systems having thousands of reactions. This will be a decisive step forward towards whole-cell simulation, one of the grand challenges of the 21st century. It will enable a better understanding of the role that single-cell stochasticity (intrinsic and extrinsic "noise") plays in fundamental biological mechanisms such as cell differentiation, tumor proliferation etc. In addition, the international collaborations of the team and the prototypical portable computing platform and web server to be developed will make the project results immediately exploitable by scientists all over the world. This, we hope, will maximize their impact on scientific fields increasingly relying on “executable biology”, ranging from rational drug design to metabolic engineering and personalized medicine.