A statistical model of macromolecules dynamics for Fluorescence Correlation Spectroscopy data analysis

In this paper we propose a new mathematical model to describe the mechanisms of biological macromolecules interactions. Our model consists of a discrete stationary random sequence given by a solution of difference stochastic equation, characterized by a drift predictive component and by a diffusion term. The relative statistical estimations are very simple and effective, promising to be a good tool for mathematical description of collective biological reactions. This paper presents the mathematical model and its verification on simulated data set, obtained on the basis of the well-known Stokes-Einstein model. In particular we considered several mix of particles of different diffusion coefficient, respectively: D1 = 10μm/sec and D2 = 100μm/sec. The parameters evaluated by this new mathematical model on simulated data, show good estimation accuracy, in comparison with the prior parameters used in the simulations. Furthermore, when analyzing the data for mix of particles with different diffusion coefficient, the proposed model parameters V (regression) and σ (square variance of stochastic component) have a good discriminative ability for the molar fraction determination.


Introduction
Biological systems are intrinsically complex, due the huge number of interacting objects.In nanotechnology and nanobiotechnology applications, an important role is played by dynamics of macromolecules or interaction between them (proteins, metabolites, substrates,).In such wide view, mathematical models can be a useful tool for investigating a large number of questions in metabolism, biochemistry, genetics and gene-environment interactions, genotype-phenotype mapping and their use in medicine.Various complex systems like biochemical reactions are driven by binding / unbiding dynamics of the single species concentrations, described by mathematical models [18,3,7,14].
Protein-ligand binding is a key biological process at molecular level.The identification and characterization of small-molecule binding sites on therapeutically relevant proteins could have tremendous implications for target 234 STATISTICS ANALYSIS FOR BIOLOGICAL INTERACTION evaluation and rational drug design [4].Simulated statistical data based on simple chemical kinetic hypotheses can be used to validate the presence or absence of a binding in a set of molecules that makes up a complex system [11].In some natural conditions and normalization, it has the asymptotical behavior as a discrete Markov diffusion [12].In the simpler case of non-interacting species, the dynamic tracking of particles reduces to the study of mixtures of molecules with different speed of Brownian motion, due to their different hydrodynamic radii and to the corresponding different diffusion coefficients.
In the last two decades, the FCS technique has been developed, based on the laws of molecular diffusion, formulated from Browns observations of random particle motion and used to study the movements and the interactions of biomolecules at extremely dilute concentrations.FCS is a fluorescence based technique that can be used to study a variety of sample types.The analysis of FCS data examines the minute intensity fluctuations caused by spontaneous deviations from the mean at thermal equilibrium.These fluctuations can result from variations in local concentrations owing to molecular mobility or from characteristic intermolecular or intramolecular reactions of fluorescently labeled biomolecules present at low concentrations.
The analytical potential of FCS for the life sciences has been shown in numerous applications since the original work of Rigler and coworkers [1,17] FCS was successfully used to study association and dissociation of nucleic acids [10] and proteins [15].Moreover, these features allowed for real-time investigation of enzyme kinetics [9].
FCS is a method worth considering for a variety of biological and physicochemical questions.FCS has great advantages for quantitatively analyzing biomolecular mobilities and interactions in situ.Precise values of physical parameters, such as diffusion coefficients, are determined relatively easily and quickly.Essential information about processes governing molecular dynamics can, thus, be derived from the temporal pattern by which fluorescence fluctuations arise and decay.Although FCS curves are easy to record, interpreting them may not be straightforward.Many free parameters in the fitting procedure will return seemingly good fit results, even if the values for these parameters fail to reflect the sample properties or have no physical meaning at all.Any changes in molecular shape or size, on binding or cleavage, that affect the hydrodynamic radius of a particle, are also reflected in the diffusion coefficient and, thus, in the average diffusion time through the observation volume.
Interpretation of FCS data generally requires fitting the curves to a mathematical model.The type of applicable model and the parameters that can be extracted depends on the system under investigation.The diffusion coefficient D is derived from the characteristic decay time of the correlation function [19].
Usually, in order to estimate the diffusion coefficient and other parameters of interest, the autocorrelation function of the data is fitted typically using a nonlinear least squares algorithm.In this paper we introduce a new computational model and the corresponding statistical parameters characterizing the mobility of particles, in order to describe the dynamic of binding among different species.The model is based on difference stochastic equation with predictable linear component, and with an additional stochastic component, determined by a stationary, in the wide sense, random sequence.The statistical analysis of real measurements is particularly important when the observation of physical data (biological, chemical etc.) has often no justification in the form of simple physical theory.
In numerous complex systems like biological processes with equilibrium, the dynamics of concentration, or frequencies of some predefined attribute presence, can be described by the mathematical model of binary discrete Markov diffusions, based on statistical data of elementary hypotheses validation about the presence or absence of the predefined attribute in the set of elements that make up a complex system [11].
In our consideration, we proceed from the following principles of statistical analysis of physical data: • all the elements that make up the system can gain or lose said attribute over time, that is the relative frequency of attribute, a discrete dynamic variable; • the basic objects of our study are discrete Markov diffusions, characterized by relative frequencies of presence or absence of the attribute in a sample of fixed volume at each time instant; • it is assumed that the average frequency of the attribute at a fixed time depends only on the average frequency of the attribute relative to the next previous time.Such relationship is called persistent regression and it is used as a fundamental condition for the subsequent analysis of the model.
The present work introduces a new computational method and the corresponding statistical estimators of parameters characterizing the dynamics of frequency (concentration).We proceed from the following Stat., Optim.Inf.Comput.Vol. 4, September 2016 D. KOROLIOUK, V.S. KOROLIUK, E. NICOLAI, P. BISEGNA, L. STELLA, AND N. ROSATO 235 considerations: two different models, the physical (Stokes-Einstein) diffusion process and the mathematical one (stationary discrete Markov diffusion), describing the same physical process.

Motivation of the model
The FCS method of macromolecules dynamic monitoring, registres the number of fluctuations of fluorescently labeled molecules in a confocal observation volume.The basic object of analysis is the fluctuation α t of the fluorescence intensity, that is the difference of its current value at the instant t ≥ 0 and its average value.The fluorescence intensity is proportional to the number of the labeled molecules observed at the instant t.A single molecule can freely diffuse in and out the observation volume or undergo chemical reactions.Thus resulting in a combination of motions of different species characterized by different drift and diffusion parameters.The mathematical model of stationary Gaussian diffusion, defined by a continuous process of Ornstein-Uhlenbeck type, is motivated by its important special case of motion of particles in a viscose liquid medium, known in physics as the Langevin equation.Consider the equation: where m is the particle mass, v(t) is the particle speed, β is the medium viscosity and µ t is a noise component which will further be clarified.
Here, the component of the "force" m v(t) which slows down the particle, is proportional to the "speed" v(t), and the presence of "noise" is associated with chaotic collisions of the particle with the molecules of the medium (due to thermal motion of the latter).In this formulation as the noise is used the random process µ t = Ẇt , where W = {W t , t0} be a stochastic process of Brownian motion.According to Paley-Wiener-Zygmund Theorem, the random function W is nowhere differentiable almost surely.
In order to give meaning to the equation: known in physics as the Langevin equation, should be given a probabilistic meaning of the derivative process Ẇt ,t ≥ 0. This is done by operating the concept of Ito stochastic integral.

Statistical model of stationary discrete Markov diffusion
In our case, we deal with discrete time, so using the structures like (1), we have to operate with finite (stochastic) differences.Therefore, we have no need to operate with stochastic integrals.In [11], a model of discrete Markov diffusions with persistent regression is considered as a stationary discrete Markov diffusion α t , t ∈ N 0 = {0, 1, 2, ...}, given by a solution of the difference stochastic equation: where and The stationary, in wide sense, random sequence α t , t = 0, 1, 2, is characterized by two numerical parameters: V : regression parameter of the predictable component; σ 2 : variance of the stochastic component σw ( t + 1), t = 0, 1, 2, defined by a sequence of independent, identically distributed Gaussian random variables with parameters (3) or alternatively, by martingale-differences which satisfy the conditions of the Central Limit Theorem [8].
An important role is played by the condition of statistical stationarity for the discrete stationary discrete Markov diffusion [12]: where In this case, one can use several sufficient statistics which allow to estimate the numerical parameters of the model generated by the solutions of the difference stochastic equations (2) (3).In statistical analysis of stationary discrete Markov diffusion with persistent linear regression (2) (3), the main problem is to construct the statistical estimations of numerical parameters V (the regression) and σ 2 (the square variance of the stochastic component).The condition of statistical stationarity allows the use of the correlation analysis of the stationary discrete Markov diffusion.The covariances related to the stationary discrete Markov diffusion, defined by solution of equation (2) (3), in terms of the statistical stationarity (4), has the following form: The following relations take place: and also The covariances (5) generate, as estimators, the following statistics: The statistics (8) generate the following two estimators of the predictable component parameter V : which are strongly consistent and asymptotically unbiased.Naturally one has the following obvious estimation The stationarity condition (4) generates the stochastic component dispersion σ 2 : where Remark 1 The properties of the statistical estimator V 0 T are studied in [12].The dispersion-type estimator V T does not depend on its mean value.

Remark 2
The formula (7), under the condition 0 < V < 1, can be also used as control of dynamics as follows: So in logarithmic scale, the covariance of the stationary discrete Markov diffusion is characterized by the linear function a − sb where, by definition, The verification of stationary discrete Markov diffusion property can be done using the following estimations of the parameters a and b:

Kinetic Stokes-Einstein diffusion model
The discretized three-dimensional Brownian motions of n particles moving inside a cubic box of side L centered at the origin of a Cartesian frame have been obtained by numerical simulation (see also [3]).Each Cartesian component of the position of each particle has been modeled as an independent discretized scalar Brownian motion.Let W t denote the value of such scalar Brownian motion at time t=0, 1, 2. The following time-marching scheme has been adopted: where each dW t is an independent random variable of the form √ 2DδN (0, 1).Here N (0, 1) denotes the normal random variable, δ is the sampling time interval, and D is the diffusion coefficient, given by where kT is kinetic energy, r is radius of the particle, η is the viscosity of the medium.Periodic boundary conditions have been enforced at the box boundaries.The initial position of each particle was assumed to be a three-dimensional random variable, uniformly distributed inside the cubic box.A spherical region of interest (ROI) of diameter D R centered at the origin has been considered.At each time instant t, the number α t of particles inside the ROI has been recorded.The algorithm has been implemented using massively parallel GPU computing and MATLAB.Two simulation campaigns have been performed.First, all particles have been given the same diffusion coefficient D, and different values of D have been considered.Then, a mixture of two particle families with different diffusion coefficients D 1 and D 2 have been taken into account.The volume fraction of the slow-diffusing family was denoted by f .
The Parameter values adopted in the simulations are an ROI diameter of 1 µm, 10 particles expected in ROI giving a density of 19.099 particles/µm 3 .Assuming a sampling frequency of 50 kHz (acquisition time: δ = 20 µs) with a simulation time of 10s (500 ksamples).The first run of simulation was performed setting the diffusion coefficient at different values, respectively: 10, 30, 50, 80, 100 µm 2 /s.In the second run, two set of particles were considered to exist concurrently in the ROI, in detail: 10 and 100 µm 2 /s changing the ratio between the two species from 0 to 1, as schematized in the following

The stationary discrete Markov diffusion models verification by direct numerical simulation
The models verification by numerical simulation is performed by mean of numerical simulations of the trajectories of stationary discrete Markov diffusion ( 1) -( 2) for specified parameters V and σ .Next, according to statistical formulas ( 7) -( 11), are calculated the values of estimates V 0 T , σ 2 0T and σ 2 T comparing with the original (theoretical) settings of parameters V , σ.
For different runs of the random numbers generator, we have the following result of the direct numerical simulation.
The following is the model numerical simulation and parameters estimation, case V = 0, 2; σ = 10.
In our calculations, we use 30000 samples of the data length, which give good convergence.The model verification is done for a wide range of V (0 < V < 1) and σ (σ > 0), so we give a calculation for another original setting of parameters V , σ.
The use of stationary discrete Markov diffusion model in the analysis of statistical data, based on the Stokes-Einstein diffusion model, gives effective mechanisms of interactions analysis of slow and fast diffusions (see Fig. 3, 4).Let us first consider the two extreme cases mix-rate =(0|1) and (1|0).
For the mix-rate =( 0|1) one can obtain, using the estimators ( 8) -( 12), the following parameters evaluations: The same calculations for the mix-rate = (1|0) give the following: One can see that the values of the main parameters V , σ in both the cases are very different.This is an indication that the main parameters V , σ can also be discriminant for any value of the mix-rate.So now we should to repeat 240 STATISTICS ANALYSIS FOR BIOLOGICAL INTERACTION the same calculus of the model parameters for the mix-rate values (0,1|0,9); (0,3|0,7); (0,5|0,5); (0,7|0,3), (0,9|0,1) and then to compare the dynamics of parameters V , σ and σ 2 0 by variating the mix-rate.
5. The mix rate discrimination capability of the fundamental parameters of the model using Kinetic Stokes-Einstein diffusion model simulated data The simulated data, based on Stokes-Einstein diffusion model, has been produced for seven mix-rate values (0,1|0,9); (0,3|0,7); (0,5|0,5); (0,7|0,3) and (0,9|0,1).Every simulation run contains 500000 samples, used for evaluation of parameters V , σ and σ 2 0 by direct application of the estimators ( 8) -( 12).The following results has been obtained for the parameter σ : The parameter V evaluation gives us the following: And for the parameter σ 2 0 we have: One can see from the tables and relative plots that, the parameters σ and V have good ability to distinguish the mix rate of two ensembles of the particles with different diffusion coefficient, according to the Stokes-Einstein diffusion model.
However, the parameter σ 2 0 does not have such property.

Conclusions
The following results has been obtained: 1.A new statistical model, presented by the solution of equation ( 2) -( 3) is proposed for the mathematical description of macromolecules interaction, which is characterized by numerical parameters V (regression) and σ 2 (the square variance of the stochastic component).2. There were obtained statistical estimators ( 8), ( 11) and ( 12) of the interaction parameters (V , σ 2 ) promised to be far superior in accuracy and efficiency than the traditionally used method of autocorrelation curve fitting, based on kinetic Stokes-Einstein model ( 16) -( 17).Generally the biochemical systems are quite complex, involving many species and many reactions, thus it is difficult to study and analyze them.The strategy provided utilizes an experimental technique combined with stochastic difference equations to extrapolate parameters directly related to biochemical system dynamics.The proposed model may be used for a large class of biochemical reaction models.A possible application in biophysics is the study of biological macromolecules diffusion in solution (eg.: enzymes and substrates), combining our model with dataset acquired by a fluorescence correlation spectroscopy system.In this case, in fact, the fluorescent fluctuations signal of marked molecules has been observed in a reduced number of molecules in a small measuring volume (about 1 cubic micron) defined by the focusing of excitation laser.The speed of these fluctuations depends, in turn, from the diffusion coefficient of molecules under study, hence by its hydrodynamic radius.Applying in next works the proposed model to experimental data of fluorescence correlation spectroscopy, it will be possible to directly determine the diffusion coefficients and molar fractions of molecules under study.In particular, it will be possible to follow the dynamic of binding/unbinding of enzymatic macromolecules and its substrate, accordingly to the mass variation of the observables.These processes could be of fundamental importance for the development of new drugs as inhibitors of artificial enzymes involved in pathologies.To understand complex biological systems, it is necessary to obtain a thorough understanding of the interaction between molecules and pathways.Such fact is even truer for understanding complex diseases such as cancer, Alzheimer and others as well.For these reasons, we believe the method here presented could provide a new important tool to improve biomedical and pharmaceutical research.

Figure 5 .
Figure 5. Left: the plot of σ estimated values, as a function of mix-rate.Right: the parameter σ estimation for the 7 groups of the samples acquired by 2 different runs of the random numbers generator.

Figure 6 .
Figure 6.Left: the plot of V estimated values, as a function of mix-rate.Right: the parameter V estimation for the 7 groups of the samples acquired by 2 different runs of the random numbers generator.

Figure 7 .
Figure 7. Left: the plot of σ 2 0 estimated values, as a function of mix-rate.Right: the parameter σ 2 0 estimation for the 7 groups of the samples acquired by 2 different runs of the random numbers generator.

3 .
The statistical estimators (8) -(12), applied to a numerically simulated process defined by the statistical model (2) -(3), give excellent coincidence of the parameters evaluation with their real value, which proves the high accuracy of estimators (8) -(12).4. The proposed model has been tested and verified on a set of simulated data derived from a discrete kinetic Stokes-Einstein Brownian process of two biological molecules with different diffusion coefficients and different molar fractions.The statistics (8) -(12) has quasi-linear discriminant capacity for the proportion of different molar fractions containment.