SSA NoC Software
Through our collaboration with Prof. D. Soudris' group at the National Technical University of Athens (NTUA), we had access to Intel's Single-chip Cloud Computer (SCC), a Network-on-Chip (NoC) CPU with 48 Pentium-class cores. These cores are arranged as 24 dual-core tiles connected through a fast mesh network of routers. We investigated how this processor architecture can be exploited to accelerate stochastic simulation.
Intel’s SCC is an experimental NoC processor consisting of 48 P54C cores, organized in pairs called tiles. The 24 tiles are interconnected through a 6 x 4 mesh network of routers. Apart from its two cores, each tile contains a unified L2 cache (256KB per core), a router that connects the tile to the mesh network, and a Message Passing Buffer (MPB). The MPB is used to write and read the messages exchanged between cores, thus enabling inter-core communication; each core is assigned 8KB of the MPB of its tile. The SCC chip is mounted on a board that contains Dual In-Line Memory Modules (DIMMs) and subsystems for the training and control of the chip. Each core boots its own instance of the Linux operating system. The chip is partitioned into four memory domains, each containing 12 cores (6 tiles) that are served by a single DIMM through a Memory Controller (MC). In this way, a portion of the main memory is assigned to each core as its private memory space. Voltage can be adjusted dynamically for each voltage domain of 4 tiles (8 cores), while frequency can be adjusted dynamically for each tile.
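To make the layout concrete, the Python sketch below maps a core id to its tile, its mesh coordinates and its memory and voltage domains. It is illustrative only: the row-major core numbering and the placement of the domains on the mesh are assumptions made for this example; only the domain sizes (12 cores per memory controller, 8 cores per voltage domain, one tile per frequency domain) come from the description above.

```python
# Illustrative sketch of the SCC layout described above (not project code).
# ASSUMPTION: cores are numbered row-major as core = 2 * (6 * y + x) + z,
# with tile column x in 0..5, tile row y in 0..3 and core-in-tile z in 0..1.
# ASSUMPTION: memory domains are 3 x 2 tile quadrants and voltage domains
# are 2 x 2 tile blocks; only their sizes (12 and 8 cores) come from the text.
from typing import NamedTuple

MESH_COLS, MESH_ROWS, CORES_PER_TILE = 6, 4, 2
NUM_CORES = MESH_COLS * MESH_ROWS * CORES_PER_TILE  # 48


class CoreLocation(NamedTuple):
    core: int
    tile: int         # also the frequency domain (one per tile)
    x: int            # tile column in the mesh
    y: int            # tile row in the mesh
    z: int            # which of the two cores of the tile
    mem_domain: int   # 12 cores (6 tiles) per memory controller
    volt_domain: int  # 8 cores (4 tiles) per voltage domain


def locate(core: int) -> CoreLocation:
    """Map a core id to its tile, mesh coordinates and memory/voltage domains."""
    if not 0 <= core < NUM_CORES:
        raise ValueError(f"core id out of range: {core}")
    tile, z = divmod(core, CORES_PER_TILE)
    y, x = divmod(tile, MESH_COLS)
    mem_domain = (y // 2) * 2 + x // 3   # 2 x 2 grid of 3 x 2 tile quadrants
    volt_domain = (y // 2) * 3 + x // 2  # 3 x 2 grid of 2 x 2 tile blocks
    return CoreLocation(core, tile, x, y, z, mem_domain, volt_domain)


print(locate(47))
```

For example, `locate(47)` yields tile 23 at x = 5, y = 3, z = 1, in memory domain 3 and voltage domain 5. A helper of this kind is one way to select cores that share a memory domain (a "sector") before launching a job.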
The SCC board is connected to a Management Console Personal Computer (MCPC) through a Peripheral Component Interconnect Express (PCIe) cable. The SCC is programmed in a Single Program, Multiple Data (SPMD) fashion. The MCPC provides a dedicated library, called RCCE, that exposes the message-passing and Dynamic Voltage and Frequency Scaling (DVFS) features of the SCC. Code is compiled on the MCPC and runs on the SCC cores. Although each SCC core runs independently, all cores share the /shared directory with each other and with the MCPC, so data can be exchanged between the MCPC and the SCC cores in both directions. The MCPC also provides utilities for measuring the chip's power consumption and other operating parameters.
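In SPMD style, every core runs the same executable and uses only its own rank to decide which part of the work it performs. The sketch below emulates this in plain Python for a batch of independent simulation runs. `runs_for_rank` is a hypothetical helper; on the real SCC the rank and the number of cores would come from RCCE (`RCCE_ue()` and `RCCE_num_ues()`) rather than from function arguments.

```python
# Illustrative SPMD work split, emulated in plain Python (not project code).
# ASSUMPTION: on the SCC each core would obtain `rank` and `num_ranks` from
# RCCE (RCCE_ue() and RCCE_num_ues()); here they are plain arguments.

def runs_for_rank(rank: int, num_ranks: int, total_runs: int) -> range:
    """Contiguous block of independent simulation runs owned by `rank`."""
    base, extra = divmod(total_runs, num_ranks)
    # The first `extra` ranks take one additional run each.
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return range(start, start + count)


# Every core executes the same program; only its rank differs.
for rank in range(4):
    print(rank, list(runs_for_rank(rank, 4, 10)))
```

With 10 runs on 4 cores this prints `0 [0, 1, 2]`, `1 [3, 4, 5]`, `2 [6, 7]` and `3 [8, 9]`. Block sizes differ by at most one, which keeps the load balanced when individual runs take similar time.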
We have efficiently mapped both the First Reaction Method (FRM) and the Next Reaction Method (NRM) stochastic simulation algorithms to the SCC NoC. For this task we designed and implemented a parallel software framework for the stochastic simulation of biochemical reaction networks on many-core CPUs. The framework manages the entire stochastic simulation workflow requested by the user. This includes configuring the simulation job according to: the size of the biomodel (provided as a standard SBML file) in terms of the number of molecular species (n) and reactions (m); the stochastic simulation algorithm (SSA) to be used (FRM or NRM); the desired mode of parallel operation (SSIP or MSIP); the total simulation time (lab time); the sampling period of the returned results; and the processors (cores) available in the "sectors" (memory domains) of the SCC processor. As a result, the framework fully prepares the system and runs the most efficient parallel simulation possible given the user's choices and the available resources.
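For readers unfamiliar with the FRM, the following minimal Python sketch shows one simulation run of Gillespie's First Reaction Method. At each step a tentative exponential firing time is drawn for every reaction with non-zero propensity, and the reaction with the smallest time fires. The function `frm`, the toy A <-> B model and its rate constants are hypothetical; a real framework would build the propensities and state-change vectors from the SBML model.

```python
# Illustrative sketch of Gillespie's First Reaction Method (not project code).
# ASSUMPTION: the model is given directly as propensity functions and
# state-change vectors instead of being parsed from an SBML file.
import math
import random
from typing import Callable, List, Sequence, Tuple

State = List[int]  # molecule count of each of the n species


def frm(x0: Sequence[int],
        propensities: Sequence[Callable[[State], float]],
        changes: Sequence[Sequence[int]],
        t_end: float,
        rng: random.Random) -> List[Tuple[float, State]]:
    """Simulate one run; return (time, state) after every reaction event."""
    t, x = 0.0, list(x0)
    trajectory = [(t, list(x))]
    while True:
        # Draw a tentative firing time tau_j ~ Exp(a_j) for every reaction j.
        best_tau, best_j = math.inf, -1
        for j, a in enumerate(propensities):
            aj = a(x)
            if aj > 0.0:
                tau = -math.log(1.0 - rng.random()) / aj
                if tau < best_tau:
                    best_tau, best_j = tau, j
        # Stop if no reaction can fire or the next event is past t_end.
        if best_j < 0 or t + best_tau > t_end:
            return trajectory
        # Fire the reaction with the smallest tentative time.
        t += best_tau
        x = [xi + d for xi, d in zip(x, changes[best_j])]
        trajectory.append((t, list(x)))


# Usage: reversible isomerisation A <-> B with made-up rates 0.5 and 0.2.
rng = random.Random(1)
traj = frm([100, 0],
           [lambda x: 0.5 * x[0], lambda x: 0.2 * x[1]],
           [[-1, +1], [+1, -1]],
           t_end=10.0, rng=rng)
print(len(traj) - 1, "events; final state", traj[-1][1])
```

The NRM of Gibson and Bruck is also exact but avoids redrawing all m tentative times at every step: it keeps them in an indexed priority queue and, using a dependency graph, updates only the reactions affected by the one that fired.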
A simulation job starts by uploading the parsed model data, proceeds with the parallel execution of one simulation run (SSIP) or of multiple simulation runs (MSIP) on the cores of the SCC NoC CPU, and, once completed, sends the results back to the host PC through which the user accesses the SCC processor board. The simulation results can be massive (possibly thousands of molecular-species time series with thousands of data points each). A dedicated tool developed within the project, the Results Parser, converts them into a form suitable for further processing with whatever data analysis packages are available to the user (e.g. Matlab, SciPy, Excel).
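Because an SSA run is event driven, its raw output has irregular time stamps; returning results at a fixed sampling period means reading the piecewise-constant state at each sampling instant. The sketch below shows that step together with a CSV export that Matlab, SciPy or Excel can read. It is not the project's Results Parser; the function names, the CSV layout and the example run are assumptions.

```python
# Illustrative sketch of fixed-period sampling and CSV export (not the
# project's Results Parser). ASSUMPTION: a run is a time-sorted list of
# (time, counts) pairs starting at t = 0; species names are hypothetical.
import bisect
import csv
import io
import math


def sample(trajectory, period, t_end):
    """State at t = 0, period, 2*period, ... up to t_end."""
    times = [t for t, _ in trajectory]
    n_samples = math.floor(t_end / period + 1e-9) + 1
    rows = []
    for k in range(n_samples):
        tk = k * period
        # SSA states are piecewise constant: take the last event at or before tk.
        i = bisect.bisect_right(times, tk) - 1
        rows.append((tk, trajectory[i][1]))
    return rows


def to_csv(rows, species):
    """One header line, then one line per sample: time, then species counts."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["time", *species])
    for tk, counts in rows:
        writer.writerow([f"{tk:g}", *counts])
    return buf.getvalue()


run = [(0.0, [5, 0]), (0.7, [4, 1]), (1.6, [3, 2])]
print(to_csv(sample(run, period=0.5, t_end=2.0), ["A", "B"]), end="")
```

For the three-event run above this prints a header `time,A,B` followed by the rows `0,5,0`, `0.5,5,0`, `1,4,1`, `1.5,4,1` and `2,3,2`.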
To validate the mapping of the algorithms onto the mesh topology of the SCC NoC, and to assess how performance scales as the number of available cores grows with the problem size, we performed many tests with biological network models of increasing complexity. Initially we investigated how many reactions a biomodel can contain and still be simulated efficiently on a single "tile" of two SCC cores, before communication between tiles becomes necessary. After confirming that this worked correctly, we focused on improving our implementation so that it exploits an increasing number of processors and maintains high efficiency even when all 48 cores of the Intel SCC NoC are used. To achieve this, we developed and implemented new methods of communication between the processor cores of the NoC, which can actually be thought of as a 3-dimensional mesh with a 4 x 6 x 2 configuration. These new generic inter-processor communication methods allow efficient message exchanges in three dimensions: 1) along the X-axis (horizontal) of the network, 2) along the Y-axis (vertical) of the network, and 3) between the two cores within each tile. They also support routing messages through a central "hub" node, through nearest-neighbor communication in the mesh, or through a binary tree of appropriate size embedded into the mesh.
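The trade-off between these patterns can be illustrated by counting communication rounds. The sketch below builds a binary-tree reduction over all 48 cores in numbering order, and a dimension-ordered variant that first combines the two cores of each tile, then reduces along X within each row, then along Y. The core numbering, the notion of a round and all function names are assumptions for illustration; this is not the project's communication code.

```python
# Illustrative comparison of reduction schedules on the 4 x 6 x 2 core grid
# (not the project's communication code). ASSUMPTIONS: cores are numbered
# core = 2 * (6 * y + x) + z, and a "round" is a set of messages that can
# be in flight at the same time, each carrying one partial result.
from typing import Dict, List, Sequence, Tuple

Round = List[Tuple[int, int]]  # (sender, receiver) pairs


def tree_schedule(ranks: Sequence[int]) -> List[Round]:
    """Binary (binomial) tree reduction of `ranks` toward ranks[0]."""
    rounds, d = [], 1
    while d < len(ranks):
        # Position i with i % (2d) == d forwards its partial result to i - d.
        rounds.append([(ranks[i], ranks[i - d])
                       for i in range(d, len(ranks), 2 * d)])
        d *= 2
    return rounds


def parallel(groups: Sequence[List[Round]]) -> List[Round]:
    """Run independent group schedules side by side, round by round."""
    depth = max(len(g) for g in groups)
    return [[m for g in groups if k < len(g) for m in g[k]]
            for k in range(depth)]


def dimension_ordered(rows: int = 4, cols: int = 6, per_tile: int = 2) -> List[Round]:
    """Reduce inside each tile, then along X in each row, then along Y."""
    def core(y: int, x: int, z: int) -> int:
        return per_tile * (cols * y + x) + z

    in_tile = parallel([tree_schedule([core(y, x, z) for z in range(per_tile)])
                        for y in range(rows) for x in range(cols)])
    along_x = parallel([tree_schedule([core(y, x, 0) for x in range(cols)])
                        for y in range(rows)])
    along_y = tree_schedule([core(y, 0, 0) for y in range(rows)])
    return in_tile + along_x + along_y


def reduce_sum(schedule: List[Round], values: Dict[int, int]) -> Dict[int, int]:
    """Apply a schedule; each sender hands its partial sum to its receiver."""
    vals = dict(values)
    for rnd in schedule:
        for src, dst in rnd:
            vals[dst] += vals.pop(src)
    return vals


flat = tree_schedule(range(48))
mesh = dimension_ordered()
for name, s in (("flat tree", flat), ("z, X, Y", mesh)):
    print(name, [len(r) for r in s], reduce_sum(s, {c: c for c in range(48)}))
```

Both schedules finish in 6 rounds with 47 messages and leave the total (1128) on core 0. In the dimension-ordered one every message moves along a single mesh axis, whereas the flat tree, under the assumed numbering, sends for example from core 16 (tile 8, in the second row) to core 0 across both axes. A single central hub would instead receive all 47 messages at one core.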
Deliverables (technical report)
D2.1 A method for mapping efficiently SSAs to Intel’s SCC many-cores CPU