 Research Article
 Open Access
Inferring Parameters of Gene Regulatory Networks via Particle Filtering
 Xiaohu Shen^{1} and
 Haris Vikalo^{1}
https://doi.org/10.1155/2010/204612
© X. Shen and H. Vikalo. 2010
 Received: 6 April 2010
 Accepted: 24 August 2010
 Published: 30 August 2010
Abstract
Gene regulatory networks are highly complex dynamical systems comprising biomolecular components which interact with each other and, through those interactions, determine gene expression levels, that is, the rate of gene transcription. In this paper, a particle filter with a Markov Chain Monte Carlo move step is employed for the estimation of reaction rate constants in gene regulatory networks modeled by chemical Langevin equations. Simulation studies demonstrate that the proposed technique outperforms previously considered methods while being computationally more efficient. Dynamic behavior of gene regulatory networks averaged over a large number of cells can be modeled by ordinary differential equations. For this scenario, we compute an approximation to the Cramér-Rao lower bound on the mean-square error of estimating reaction rates and demonstrate that, when the number of unknown parameters is small, the proposed particle filter can be nearly optimal.
Keywords
 Markov Chain Monte Carlo
 Particle Filter
 Gene Regulatory Network
 Stochastic Simulation Algorithm
 Chemical Master Equation
1. Introduction
Recent development of DNA and protein microarrays sparked a surge of interest in studying gene regulatory mechanisms. The excitement is due to the capability of the microarrays to conduct simultaneous tests of an entire genome of an organism. By testing a number of biological samples taken over a period of time, one can track the network dynamics. The experimental advances have been accompanied by the theoretical developments in modeling and computational studies of the networks. Combination of these research efforts provides critical information about the functionality of cells and organisms, reveals mechanisms of genetic diseases, enables optimization of diagnostic techniques and therapies, and provides aid in the process of drug discovery.
To enable the analysis of gene regulatory networks, we need accurate yet tractable models capturing their dynamical behavior. The molecular interactions in gene regulatory networks are inherently stochastic. For instance, the number of created proteins is a random variable due to thermal fluctuations in a cell which cause promoters to randomly switch between an active and a repressed state. The fluctuations in the number of proteins are enhanced by protein degradation, which is itself a stochastic process. This, along with several other sources of randomness, calls for probabilistic modeling of gene regulatory networks. However, a very detailed description of a network may be difficult to analyze and often requires considerable computational effort. Hence, several models with varying degrees of accuracy and complexity have been proposed. These models rely on representations via chemical master and chemical Langevin equations [4–6], and ordinary differential equations [7, 8] as well as Bayesian [9, 10] and Boolean [3, 11] networks. Having selected one of the above models, we are interested in finding its structure and parameters that provide the best explanation of the experimental data. This requires further computational studies and opens up questions related to, for example, stability and control of the network. However, inference problems in gene regulatory networks are often challenging, and the difficulty of a problem increases with the complexity of the model and the size of the network.
In this paper, we consider models of gene regulatory networks (GRN) based on chemical master equations and study the problem of estimating stochastic rate constants therein. Such models provide the most precise description of the network processes; however, they are also computationally the most demanding. We limit our focus to small-sized networks with a known structure but unknown rate constants. We approximate a chemical master equation by a related chemical Langevin equation [12] and employ a particle filter with a Markov Chain Monte Carlo move step to solve the rate estimation problem. Simulation studies demonstrate that the proposed technique outperforms previously considered methods while being computationally more efficient. Dynamic behavior of gene regulatory networks averaged over a large number of cells can be modeled by ordinary differential equations. For this scenario, we compute an approximation to the Cramér-Rao lower bound on the mean-square error of estimating reaction rates and demonstrate that, when the number of unknown parameters is small, the proposed particle filter can be nearly optimal.
The paper is organized as follows. Section 2 describes the chemical master equation model of a gene regulatory network and its approximation by a chemical Langevin equation. Section 3 presents the particle filtering algorithm for the estimation of the stochastic rate constants and compares its performance with prior work. In Section 4, a deterministic model based on ordinary differential equations is described, and the Cramér-Rao lower bound on the performance of estimating rate constants is computed. Finally, we conclude the paper in Section 5.
2. Models Based on Chemical Master and Chemical Langevin Equations
In (1), denotes the total number of reactions that are possible within the network (i.e., the number of the so-called reaction channels), and is the vector describing change in the number of molecules of each of the species due to the reaction in the reaction channel (e.g., is the change, either positive or negative, in the number of molecules of the network component due to the reaction in the channel). Moreover, in (1) is the so-called propensity function, that is, is the probability that during time interval there is a reaction in the channel. The propensity function can further be expressed as , where is the probability that one reaction takes place in and denotes the number of possible simultaneous reactions. (The coefficients are often referred to as the stochastic rate constants. The function counts all possible combinations of individual molecules that may lead to a reaction in the channel.) The chemical master equation is often used to simulate the Markov process and enable computational studies of GRN. To this end, one may employ various stochastic simulation algorithms, originally proposed by Gillespie [4].
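To make the role of the propensity function concrete, the following is a minimal sketch of Gillespie's direct method [4]. The symbol names (`stoich` for the state-change vectors, `rates` for the stochastic rate constants, `combos` for the combination-counting functions) and the birth-death network with its rate values are assumptions introduced for illustration; they are not taken from the article.

```python
import random

def gillespie_direct(x0, stoich, rates, combos, t_end, rng):
    """Simulate one sample path with Gillespie's direct method.

    x0     : initial molecule counts (list of ints)
    stoich : stoich[j][i] = change in species i when reaction j fires
    rates  : stochastic rate constants, one per reaction channel
    combos : combos[j](x) = number of reactant combinations for channel j
    """
    t, x = 0.0, list(x0)
    path = [(t, tuple(x))]
    while True:
        # Propensity of channel j = rate constant * number of combinations.
        a = [c * h(x) for c, h in zip(rates, combos)]
        a0 = sum(a)
        if a0 <= 0.0:
            break  # no reaction can fire any more
        # The waiting time to the next reaction is exponential with rate a0.
        t += rng.expovariate(a0)
        if t > t_end:
            break
        # Select channel j with probability a[j] / a0.
        u, j, acc = rng.random() * a0, 0, a[0]
        while acc <= u and j < len(a) - 1:
            j += 1
            acc += a[j]
        x = [xi + d for xi, d in zip(x, stoich[j])]
        path.append((t, tuple(x)))
    return path

# Hypothetical birth-death network for a single species X:
#   R1: 0 -> X  (rate constant 10),  R2: X -> 0  (rate constant 0.1 per molecule)
path = gillespie_direct(
    x0=[0],
    stoich=[[+1], [-1]],
    rates=[10.0, 0.1],
    combos=[lambda x: 1, lambda x: x[0]],
    t_end=50.0,
    rng=random.Random(0),
)
print(len(path), path[-1])
```

Each iteration draws one waiting time and one channel index, so the cost grows with the number of reaction events, which is why exact simulation becomes expensive for networks with fast reactions.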
We should point out that while the chemical Langevin equation (2) may be used as a network model for the purpose of parameter estimation, in general it is not sufficiently accurate to provide reliable simulations of the network dynamics. To conduct computational studies of a GRN, we still need to simulate it using stochastic simulation algorithms.
In [14], the authors find the best linear-model fit to the data presumed to be generated by (6), and then infer parameters based on the derived linear model. In [15, 16], the use of statistical mechanics tools for the estimation of the parameters of a network modeled by (6) was considered. In [17, 18], a Markov Chain Monte Carlo (MCMC) algorithm was employed to infer the network parameters. This approach provides sound estimates of the parameters, but it requires a very high computational effort. As an alternative, we propose the use of a particle filter with an MCMC move step. This we describe in the next section.
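As an illustration of how a chemical Langevin equation can be stepped forward in time, the sketch below applies an Euler-Maruyama discretization to the standard form of the equation given by Gillespie [21], in which each reaction channel contributes a drift term proportional to its propensity and a diffusion term proportional to the square root of its propensity. The function name, the argument names, the clipping of negative propensities, and the birth-death network are assumptions made for this example and may differ from the discretization used in the article.

```python
import math
import random

def cle_euler_step(x, stoich, rates, combos, dt, rng):
    """One Euler-Maruyama step of a chemical Langevin equation.

    Each channel j adds  nu_j * (a_j * dt + sqrt(a_j * dt) * N_j)  to the state,
    where a_j is the propensity at the current state and N_j is standard normal.
    """
    new_x = list(x)
    for nu, c, h in zip(stoich, rates, combos):
        a = max(c * h(x), 0.0)  # clip: the diffusion term may push x below zero
        incr = a * dt + math.sqrt(a * dt) * rng.gauss(0.0, 1.0)
        for i, d in enumerate(nu):
            new_x[i] += d * incr
    return new_x

# Hypothetical birth-death network: R1: 0 -> X (c1 = 10), R2: X -> 0 (c2 = 0.1).
stoich = [[+1], [-1]]
rates = [10.0, 0.1]
combos = [lambda x: 1.0, lambda x: x[0]]

rng = random.Random(0)
x = [0.0]
trajectory = []
for _ in range(20000):  # 20000 steps of size 0.01, that is, up to t = 200
    x = cle_euler_step(x, stoich, rates, combos, 0.01, rng)
    trajectory.append(x[0])
late_mean = sum(trajectory[10000:]) / 10000
print(late_mean)  # fluctuates around c1 / c2 = 100
```

Unlike the exact stochastic simulation, the state here is continuous and the cost per step does not depend on the number of reaction events, which is what makes the approximation attractive as a model for inference.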
3. Particle Filter with Markov Chain Monte Carlo Move Step
where , , , , , and denotes the covariance matrix of the measurement noise.
Introduction of the missing values enables propagating (9) by means of a particle filter, where the filter relies on a Gaussian importance density (12). A simple sequential importance resampling (SIR) scheme provides asymptotically consistent estimates, that is, the approximation converges to the true value of the parameters as the number of particles grows. However, the SIR scheme often suffers from sample impoverishment and, therefore, has weak performance. To improve the sample diversity and the performance of the particle filter, we employ the importance sampling scheme with an MCMC move step. Specifically, we use the Metropolis-Hastings algorithm to decide whether a resampled particle will be accepted or not. For implementation details, we refer the reader to the formal algorithm given below.
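Sample impoverishment can be seen in a small numerical example. The sketch below computes the effective sample size of a set of normalized weights and performs multinomial resampling when it falls below a threshold. The effective sample size criterion and the threshold of half the number of particles are common choices and are assumptions here, as are the particle values and weights.

```python
import random

def effective_sample_size(weights):
    """Effective sample size 1 / sum(w_i^2) for normalized weights w_i."""
    return 1.0 / sum(w * w for w in weights)

def multinomial_resample(particles, weights, rng):
    """Draw len(particles) particles with replacement, proportionally to weight."""
    return rng.choices(particles, weights=weights, k=len(particles))

rng = random.Random(1)
particles = [0.1, 0.5, 0.9, 1.3]          # hypothetical parameter particles
weights = [0.94, 0.02, 0.02, 0.02]        # one particle dominates
n_eff = effective_sample_size(weights)    # close to 1 means degenerate weights
if n_eff < 0.5 * len(particles):          # hypothetical threshold: half the particles
    particles = multinomial_resample(particles, weights, rng)
    weights = [1.0 / len(particles)] * len(particles)
print(n_eff, particles)
```

After resampling, most of the particles are typically copies of the dominant one. Resampling alone therefore equalizes the weights but does not restore diversity, which is the motivation for perturbing the resampled particles with an MCMC move.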
Algorithm 1 (initialization).
Set . Draw from the prior density . Assign particle weights , for , and normalize them.
Algorithm 2 (iterations).
 (i)
, , , , .
Set and update the particle weights as
 (ii)
(Normalization) Normalize the weights , and compute
 (iii)
 (iv) (Resample move) If resampling is performed in Algorithm 2( ), then for :
 (a)
Draw a candidate from a kernel density , where is the empirical covariance of in the previous step and is the smoothing parameter.
 (b)
Draw missing data from an importance density and set .
 (c)
 (d)
Set with prob. .
 (a)
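The structure of the iterations above (weight update, normalization, resampling, and a Metropolis-Hastings move) can be sketched on a deliberately simplified problem. The example below estimates a single static parameter observed directly in Gaussian noise, so it has no latent state and no missing data, unlike the algorithm in the article. The model, the uniform prior, the resampling threshold, and all names and numerical values are assumptions chosen for illustration.

```python
import math
import random

def log_post(theta, ys, sigma, lo, hi):
    """Log posterior up to a constant: uniform prior on [lo, hi], Gaussian likelihood."""
    if not lo <= theta <= hi:
        return -math.inf
    return -sum((y - theta) ** 2 for y in ys) / (2.0 * sigma ** 2)

def resample_move_filter(ys, n, sigma, lo, hi, h, rng):
    """Sequentially estimate a static parameter theta from y_k = theta + noise."""
    theta = [rng.uniform(lo, hi) for _ in range(n)]  # initialization from the prior
    logw = [0.0] * n
    w = [1.0 / n] * n
    for k, y in enumerate(ys, start=1):
        # Weight update with the likelihood of the new observation (log domain).
        logw = [lw - (y - th) ** 2 / (2.0 * sigma ** 2) for lw, th in zip(logw, theta)]
        top = max(logw)
        logw = [lw - top for lw in logw]
        w = [math.exp(lw) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        # Resample only when the effective sample size drops below n / 2.
        if 1.0 / sum(wi * wi for wi in w) >= n / 2.0:
            continue
        theta = rng.choices(theta, weights=w, k=n)
        logw, w = [0.0] * n, [1.0 / n] * n
        # Move step: random-walk proposal scaled by the empirical particle spread
        # and by the smoothing parameter h.
        mean = sum(theta) / n
        step = h * math.sqrt(sum((th - mean) ** 2 for th in theta) / n)
        seen = ys[:k]
        for i in range(n):
            cand = rng.gauss(theta[i], step)
            log_alpha = (log_post(cand, seen, sigma, lo, hi)
                         - log_post(theta[i], seen, sigma, lo, hi))
            if math.log(1.0 - rng.random()) < log_alpha:  # Metropolis-Hastings accept
                theta[i] = cand
    estimate = sum(wi * th for wi, th in zip(w, theta))
    return estimate, theta

rng = random.Random(42)
true_theta, sigma = 0.8, 0.5  # hypothetical values
ys = [rng.gauss(true_theta, sigma) for _ in range(50)]
estimate, particles = resample_move_filter(
    ys, n=200, sigma=sigma, lo=-3.0, hi=3.0, h=0.5, rng=rng)
print(estimate, sum(ys) / len(ys))
```

Because the proposal is symmetric, the acceptance ratio reduces to a ratio of posterior densities. The move leaves the posterior given the data seen so far invariant, so it diversifies the duplicated particles without biasing the estimate.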
3.1. Computational Study of a Viral Infection Network
We demonstrate the performance of the proposed algorithm on a viral infection network previously studied in [19, 20]. The network comprises reaction channels,
R^{1}:
R^{2}:
R^{3}:
R^{4}:
R^{5}:
R^{6}:
where denotes viral protein molecules and denotes synthesized viral cells. Reaction is the process of producing viral cells from the viral and protein. Reactions and are the transcription and translation processes of the viral genes, respectively. Reaction models replication of a viral template into a viral .
3.2. Computational Study of Prokaryotic Regulation
In this subsection, we illustrate the performance of the proposed algorithm when employed for estimating reaction rates in a network with parameters. In particular, we consider estimation of the reaction rates in a GRN model of prokaryotic autoregulation. The system is characterized by the following reactions [18]:
R^{1}:
R^{2}:
R^{3}:
R^{4}:
R^{5}:
R^{6}:
R^{7}:
R^{8}:
R^{9}:
R^{10}:
R^{11}:
R^{12}:
Reactions represent the reversible processes of repressor protein binding to and . Reactions are the transcription and translation processes of genes and . Reactions represent the degradation process of proteins and mRNAs in the system. The state vector collects the numbers of components , , , , , and , and hence it is a dimensional state vector.
Similar to the study of the viral infection network in the previous subsection, to infer the reaction rates, we assume that the above network evolves according to (3). However, the network is simulated via Gillespie's algorithm. In particular, we generate noisy observations , , where the measurement noise is Gaussian with (i.e., the noise variance matrix is ). The particle filter (Algorithm 1) is performed with , , and the resampling threshold . The log values of the parameters are initialized from the uniform distribution .
True and estimated parameters for the two algorithms. Algorithm 2(i) employs MCMC iterations, and Algorithm 2(ii) employs iterations.
Parameter   True   Algorithm 1   Algorithm 2(i)   Algorithm 2(ii)
            0.08   0.0707        0.0443           0.0869
            0.82   0.8219        0.6726           0.7134
            0.09   0.0597        0.1121           0.0650
            0.9    0.5625        1.3913           0.5943
            0.25   0.3283        0.1826           0.2862
            0.1    0.1195        0.5800           0.0469
            0.35   0.2875        0.9009           0.2561
            0.3    0.4167        0.8943           0.3577
            0.1    0.1197        0.1573           0.0985
            0.1    0.1432        0.5097           0.2943
            0.12   0.1178        1.2766           0.0984
            0.1    0.1384        0.1669           0.1232
time (s)
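One way to compare the columns of the table above is a mean relative error over the twelve parameters. This metric is not reported in the article; it is an assumption introduced here as a rough summary, and the numbers below are copied from the table.

```python
# Values copied from the table of true and estimated parameters.
true_values = [0.08, 0.82, 0.09, 0.9, 0.25, 0.1, 0.35, 0.3, 0.1, 0.1, 0.12, 0.1]
estimates = {
    "Algorithm 1": [0.0707, 0.8219, 0.0597, 0.5625, 0.3283, 0.1195,
                    0.2875, 0.4167, 0.1197, 0.1432, 0.1178, 0.1384],
    "Algorithm 2(i)": [0.0443, 0.6726, 0.1121, 1.3913, 0.1826, 0.5800,
                       0.9009, 0.8943, 0.1573, 0.5097, 1.2766, 0.1669],
    "Algorithm 2(ii)": [0.0869, 0.7134, 0.0650, 0.5943, 0.2862, 0.0469,
                        0.2561, 0.3577, 0.0985, 0.2943, 0.0984, 0.1232],
}

def mean_relative_error(est, ref):
    """Average of |estimate - true| / true over all parameters."""
    return sum(abs(e - r) / r for e, r in zip(est, ref)) / len(ref)

errors = {name: mean_relative_error(vals, true_values)
          for name, vals in estimates.items()}
for name, err in errors.items():
    print(f"{name}: {err:.3f}")
```

Under this particular metric, Algorithm 1 has the smallest error and Algorithm 2(i) the largest. A single summary over one run should be read with caution, since it weights all parameters equally and says nothing about run-to-run variability.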
4. A Deterministic Model of Gene Regulatory Networks
where .
Other steps of Algorithm 1 remain unchanged.
4.1. Cramér-Rao Lower Bound on the Mean-Square Error of Estimating Reaction Rates
where denotes the matrix with all zero entries except the entry which is equal to .
Notice that are functions of and ; therefore, we can recursively calculate from and . The value of s can be obtained by numerically solving (19) (e.g., using Mathematica). This enables computation of and, therefore, the desired CRLB. (Note that the CRLB computed in this section assumes the discretized model (22); as , it approaches the true bound on estimating in (19)).
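The recursive calculation described above can be illustrated on a one-parameter example. The sketch below assumes a hypothetical first-order decay model with rate c, discretized by a forward Euler step, and observed in additive white Gaussian noise of standard deviation sigma. The sensitivity of the state with respect to c is propagated alongside the state, the Fisher information is accumulated from the squared sensitivities, and the bound is its inverse. The model, the names, and the numerical values are assumptions and do not reproduce the article's equations (19) and (22).

```python
import math

def crlb_decay_rate(c, x0, dt, n_obs, sigma):
    """CRLB on estimating c in x' = -c * x from n_obs noisy samples of x.

    Discretized model (forward Euler):  x[k+1] = x[k] + dt * (-c * x[k]).
    Sensitivity s[k] = d x[k] / d c obeys
        s[k+1] = s[k] + dt * (-x[k] - c * s[k]),  with s[0] = 0.
    """
    x, s, fisher = x0, 0.0, 0.0
    for _ in range(n_obs):
        # Advance state and sensitivity together; both updates use the old x and s.
        x, s = x + dt * (-c * x), s + dt * (-x - c * s)
        fisher += s * s / sigma ** 2  # Gaussian noise: information adds per sample
    return 1.0 / fisher

bound = crlb_decay_rate(c=0.25, x0=100.0, dt=0.1, n_obs=50, sigma=2.0)
print(bound, math.sqrt(bound))  # variance bound and the corresponding standard deviation
```

With several unknown rates, the scalar sensitivity becomes a matrix and the bound is read from the diagonal of the inverse Fisher information matrix. Increasing the number of observations can only add information, so the bound decreases as more data points are collected.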
4.2. Computational Study of a Viral Infection Network
5. Conclusions
In this paper, we studied the problem of estimating reaction rates in a gene regulatory network modeled by a chemical Langevin equation, that is, a high-dimensional stochastic differential equation. We proposed a solution which employs a particle filtering algorithm with a Markov Chain Monte Carlo move step. Extensive simulation studies demonstrated that the proposed technique requires less computation to achieve performance comparable to that of previously proposed methods. Moreover, we considered the deterministic description of the average network dynamics based on an ordinary differential equation model. For this scenario, we computed an approximate Cramér-Rao lower bound on the mean-square error of the estimation and demonstrated that, for some of the parameters, the proposed particle filter can be nearly optimal. The computed CRLB is indicative of the number of data points (i.e., the number of experiments) required to achieve a desired accuracy of inferring reaction rates. Further studies are needed to enable near-CRLB performance in the scenario of estimating a large number of unknown parameters.
Declarations
Acknowledgment
This work was supported in part by the National Science Foundation under Grant no. CCF-0845730.
Authors’ Affiliations
References
 1. Loomis WF, Sternberg PW: Genetic networks. Science 1995, 269(5224):649. 10.1126/science.7624792
 2. Thieffry D: From global expression data to gene networks. BioEssays 1999, 21(11):895–899. 10.1002/(SICI)1521-1878(199911)21:11<895::AID-BIES1>3.0.CO;2-F
 3. Albert R: Boolean Modeling of Genetic Regulatory Networks. In Complex Networks. Springer, New York, NY, USA; 2004.
 4. Gillespie DT: Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry 1977, 81(25):2340–2361. 10.1021/j100540a008
 5. Gillespie DT: A rigorous derivation of the chemical master equation. Physica A 1992, 188(1–3):404–425.
 6. McAdams HH, Arkin A: Stochastic mechanisms in gene expression. Proceedings of the National Academy of Sciences of the United States of America 1997, 94(3):814–819. 10.1073/pnas.94.3.814
 7. Chen T, He HL, Church GM: Modeling gene expression with differential equations. Proceedings of Pacific Symposium on Biocomputing, 1999 29–40.
 8. Grognard F, de Jong H, Gouze JL: Piecewise-Linear Models of Genetic Regulatory Networks: Theory and Examples, Lecture Notes in Control and Information Sciences (LNCIS). Springer, New York, NY, USA; 2007.
 9. Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. Journal of Computational Biology 2000, 7(3–4):601–620. 10.1089/106652700750050961
 10. Heckerman D: A tutorial on learning with Bayesian networks. In Learning in Graphical Models. Kluwer Academic Publishers, Dordrecht, The Netherlands; 1998.
 11. Kauffman SA: Metabolic stability and epigenesis in randomly constructed genetic nets. Journal of Theoretical Biology 1969, 22(3):437–467. 10.1016/0022-5193(69)90015-0
 12. Cai X, Wang X: Stochastic modeling and simulation of gene networks. IEEE Signal Processing Magazine 2007, 24(1):27–36.
 13. Boys RJ, Wilkinson DJ, Kirkwood TBL: Bayesian inference for a discretely observed stochastic kinetic model. Statistics and Computing 2008, 18(2):125–135. 10.1007/s11222-007-9043-x
 14. Chen KC, Wang TY, Tseng HH, Huang CYF, Kao CY: A stochastic differential equation model for quantifying transcriptional regulatory network in Saccharomyces cerevisiae. Bioinformatics 2005, 21(12):2883–2890. 10.1093/bioinformatics/bti415
 15. Berg J: Dynamics of gene expression and the regulatory inference problem. Europhysics Letters 2008, 82(2).
 16. Benecke A: Gene regulatory network inference using out of equilibrium statistical mechanics. HFSP Journal 2008, 2(4):183–188. 10.2976/1.2957743
 17. Golightly A, Wilkinson DJ: Bayesian sequential inference for stochastic kinetic biochemical network models. Journal of Computational Biology 2006, 13(3):838–851. 10.1089/cmb.2006.13.838
 18. Golightly A: Bayesian inference for nonlinear multivariate diffusion processes, Ph.D. thesis. Newcastle University; 2006.
 19. Srivastava R, You L, Summers J, Yin J: Stochastic vs. deterministic modeling of intracellular viral kinetics. Journal of Theoretical Biology 2002, 218(3):309–321. 10.1006/jtbi.2002.3078
 20. Goutsias J: Quasiequilibrium approximation of fast reaction kinetics in stochastic biochemical systems. Journal of Chemical Physics 2005, 122(18):1–15.
 21. Gillespie DT: Chemical Langevin equation. Journal of Chemical Physics 2000, 113(1):297–306. 10.1063/1.481811
 22. Li Z, Osborne MR, Prvan T: Parameter estimation of ordinary differential equations. IMA Journal of Numerical Analysis 2005, 25(2):264–285. 10.1093/imanum/drh016
 23. Cramer H: Mathematical Methods of Statistics. Princeton University Press, Princeton, NJ, USA; 1946.
 24. Vikalo H, Hassibi B, Hassibi A: Limits of performance of quantitative polymerase chain reaction systems. IEEE Transactions on Information Theory 2010, 56(2):688–695.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.