Polynome: Parameter Estimation for Boolean Network Models
Input
Supporting Algorithms/Procedures
Output
Description
Polynome is a web-based tool for parameter estimation and simulation of Boolean models of biological networks.
Given a set of input data, Polynome generates a Boolean network which fits the
(Booleanized) data and simulates the model. We represent Boolean
networks as time discrete dynamical systems, i.e., a Boolean network
with n nodes is represented as an n-tuple of polynomial functions
with coefficients in a field with two elements, where the i-th
polynomial function describes the state transitions of the i-th
node in the network.
There are two types of models that can be generated, deterministic or stochastic.
For a network of n nodes, a deterministic Boolean network will
have n polynomial functions, whereas a stochastic Boolean
network will have n sets of polynomial functions. There are
two ways to simulate a model, in parallel or sequentially. Parallel
simulation updates all nodes simultaneously, whereas sequential
simulation updates nodes according to an update schedule (provided by
the user). Each of these options is described below.
There are a number of options for viewing the model and its simulation. If the user
chooses to view the simulation, some statistics about the state space
are returned to the screen, including the number of components and
their sizes.
Input
Number of nodes
The number of nodes is the number of agents or constituents in the biological network. Currently the maximum value for n is 20; however certain procedures may require a smaller value for n (see Supporting Algorithms).
Format of the input data
Polynome accepts numerical data (floating point or multi-state) in table format: columns correspond to nodes and rows to time-series measurements. In the example below, the data represent 4 time-series measurements of 3 nodes.
| 1.1 | 2.8 | 3.4 |
| 2.5 | 1.2 | 3.3 |
| 3.2 | 2.1 | 2.4 |
| 0.1 | 2.9 | 2.1 |
Polynome is capable of handling multiple data sets. Each data set is preceded by a line beginning with a hash mark (#). Any characters following the hash mark in the same line are treated as user comments and are therefore ignored by the software.
| #First time series |
| 1.1 | 2.8 | 3.4 |
| 2.5 | 1.2 | 3.3 |
| 3.2 | 2.1 | 2.4 |
| 0.1 | 2.9 | 2.1 |
| #Second time series |
| 5.8 | 12.8 | 7.2 |
| 6.5 | 11.2 | 3.3 |
| 9.2 | 12.7 | 8.4 |
| 10.1 | 12.0 | 9.5 |
Data sets can be entered in the Input Data box or uploaded in a plain-text file.
Supporting Algorithms/Procedures
Data
Discretization
Polynome is designed to work with multi-state data, that is, data whose entries come from a finite set.
The current implementation is limited to two (2) states. Booleanizing the data is often done in modeling and it is a reasonable procedure considering the typically small
amounts of data available, as well as its low precision and noise content.
If the input data are not Boolean, they are Booleanized via a procedure based on an existing discretization method.
Inconsistency
We view a time series (s1, s2, ..., sm) as a sequence of input-output pairs ((s1,s2),(s2,s3),...,(sm-1,sm)). Discretization may generate inconsistencies; this refers to discretized measurements where one input returns more than one output. Inconsistencies can occur when time delays, hidden variables or noise in the data are present.
Consider for instance the next discretized time course with 4 time steps (s1, …,s4):
s1 1 1 1
s2 1 1 1
s3 0 0 1
s4 0 0 0
Here, the state (1 1 1) has two outputs: (1 1 1) and (0 0 1).
Because the model class underlying Polynome is that of polynomial functions, inconsistencies are handled as follows. All transitions from a state with multiple outputs are removed. In our example, we are left with the transition s3 -> s4.
An exception is the case where the number of nodes is small enough so that the minimal-model estimation method is used.
Modeling
The set of possible models that fits given data is typically very large.
We have implemented three procedures for searching the model space.
Minimal model sampling
This procedure samples the subspace of minimal models and returns a collection of most likely PDSs. Our implementation is based on a Groebner-fan method which computes the model space using Groebner fans and samples using relative sizes of the cones in the fan. This method returns PDSs that fit the data exactly. It is used for networks with n <= 10. If a wiring diagram is requested, then it returns a labelled graph, where the weight on an incoming edge of a node x indicates the probability of the associated source node being a true input to x. If a stochastic model is requested, then it returns a set of weighted functions per node.
Minimal model estimation
This procedure computes an ``approximate'' polynomial dynamical system (PDS), that is, one that almost fits the data. It is based on the evolutionary algorithm REACT, which optimizes an internal fitness function which depends on model complexity and data fit. Because of its computational complexity, it is only used for networks with n <= 5. It is called if a wiring diagram is requested and the data are inconsistent or if a deterministic model is requested.
Minimal model selection
This procedure selects from the subspace of minimal models a PDS which fits the data. Our implementation is based on the Minimal Sets Algorithm, which restricts the data to essential coordinates as identified by the algorithm and explores the minimal model space involving only those corresponding variables. This procedure is used for networks with n > 10. It returns a minimal PDS and wiring diagram when a deterministic model is requested.
Wiring diagrams
There are two main
Polynomial dynamical systems
Simulation
Output
Type of model
This input determines the type of polynomial model to be constructed.
- Deterministic: this option retrieves a deterministic Boolean model from the input data. According to the input data and network size, a deterministic model will be retrieved.
- Stochastic: The user can choose a stochastic consensus model. The stochastic Boolean model for a network of n nodes will return n sets of polynomial functions, where the i-th set of polynomial functions describes the possible state transitions of the i-th node in the network. At each update, the model uses one of the possible functions chosen at random according to a given probability distribution.
Type of simulation
This will determine update schedule of the polynomial functions in a model, that is, the order in which to evaluate the functions.
- Parallel: This type of simulation is setup by default in Polynome. Also known as synchronous update, in this type of simulation all functions get evaluated at the same time.
- Sequential: the functions are evaluated in some specified order. The function indices must be entered in the input box provided with each index used exactly once and should be in the range 1 to n (number of nodes).
Viewing Options
Booleanized data
If the user selects this option, the discretization into 2 states of the input raw data, will be displayed.
Model
Wiring diagram
The wiring diagram of a network of size n is the directed graph with n nodes where the nodes represent the elements of the network and arrows denoted causal relationships between nodes.
If the user selects this option, Polynome will display wiring diagram corresponding to the inferred network. The user can select the format of the image file from the list of formats (gif, jpg, png and ps).
Polynomial dynamical system
In this option, Polynome displays the polynomial model inferred through the input data.
State space simulation
The State Space of a network of size n is the directed graph with 2n nodes where the nodes represent the different states si of the network and an arrow from si to sj indicates the transition from input si to sj output; in other words, if f is the polynomial model of a given network, then f(si)= sj.
If the user selects this option, Polynome will display the state space corresponding to the inferred network. In a stochastic network, the probability for each transition can be displayed in the state space. The user can select the format of the image file from the list of formats (gif, jpg, png and ps).
Credits
- Elena Dimitrova
- Luis Garcia-Puente
- Franziska Hinkelmann
- Abdul Jarrah
- Reinhard Laubenbacher
- Brandilyn Stigler
- Mike Stillman
- Paola Vera-Licona