Stochastic dependence examples in life. Problem of mathematical modeling (approximation). Stochastic model of a literary work

When considering the dependence between characteristics, let us first of all single out the dependence between a change in the factor characteristic and the resultant characteristic, where one quite specific value of the factor characteristic corresponds to many possible values of the resultant characteristic. In other words, each value of one variable corresponds to a certain (conditional) distribution of the other variable. Such a dependence is called stochastic. The concept of stochastic dependence arises because the dependent variable is influenced by a number of uncontrolled or unaccounted-for factors, and because changes in the values of the variables are inevitably accompanied by random errors. An example of a stochastic relationship is the dependence of crop yield Y on the mass of applied fertilizer X. We cannot predict the yield exactly, since it is influenced by many factors (precipitation, soil composition, etc.). However, it is clear that as the mass of fertilizer changes, the yield will change as well.

In statistics, observed values of characteristics are studied, so stochastic dependence is usually called statistical dependence.

Because the statistical relationship between the values of the resultant characteristic Y and the values of the factor characteristic X is ambiguous, what is of interest is the dependence scheme averaged over X, i.e., the pattern expressed by the conditional mathematical expectation M(Y | X = x) (calculated for a fixed value of the factor characteristic X = x). Dependencies of this kind are called regressions, and the function φ(x) = M(Y | X = x) is called the regression function of Y on X, or the forecast of Y from X (notation ŷ_x = φ(x)). The resultant characteristic Y is also called the response (or the explained, output, resultant, endogenous) variable, and the factor characteristic X is called the regressor (or the explanatory, input, predictive, predictor, exogenous) variable.

In Section 4.7 it was proved that the conditional mathematical expectation φ(x) = M(Y | X = x) gives the best forecast of Y from X in the root-mean-square sense, i.e., M(Y − φ(X))² ≤ M(Y − g(X))², where g(X) is any other forecast of Y from X.
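As a minimal simulation sketch of this property (not from the original text: the model Y = 2X + noise and all names below are illustrative assumptions), one can compare the mean squared error of the regression forecast φ(x) = M(Y | X = x) with that of any other forecast g(x):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model: Y = 2*X + noise, so the true regression function is phi(x) = 2*x.
n = 100_000
x = rng.uniform(0.0, 10.0, n)
y = 2.0 * x + rng.normal(0.0, 3.0, n)

mse_phi = np.mean((y - 2.0 * x) ** 2)        # forecast by phi(x) = M(Y | X = x)
mse_g = np.mean((y - (2.5 * x - 1.0)) ** 2)  # any other forecast g(x)

print(mse_phi)  # ~9, the noise variance: the best attainable mean squared error
print(mse_g)    # strictly larger, as the inequality above predicts
```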

So, regression is a one-way statistical relationship that establishes a correspondence between characteristics. Depending on the number of factor characteristics describing the phenomenon, one distinguishes paired and multiple regression. For example, a paired regression is the regression between production costs (factor characteristic X) and the volume of output produced by an enterprise (resultant characteristic Y). A multiple regression is the regression between labor productivity (resultant characteristic Y) and the level of mechanization of production processes, working hours, material intensity, and worker qualifications (factor characteristics X₁, X₂, X₃, X₄).

By form, one distinguishes linear and nonlinear regression, i.e., regressions expressed by linear and nonlinear functions.

For example, f(X) = aX + b is a paired linear regression; f(X) = aX² + bX + c is a quadratic regression; f(X₁, X₂, …, X_p) = β₀ + β₁X₁ + β₂X₂ + … + β_pX_p is a multiple linear regression.
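As an illustration (a hedged sketch; the data-generating model and all names are invented for this example), such regressions can be fitted with NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 50)
y = 1.5 * x**2 - 2.0 * x + 4.0 + rng.normal(0.0, 2.0, x.size)  # noisy quadratic data

a, b = np.polyfit(x, y, deg=1)        # paired linear regression: f(X) = a*X + b
a2, b2, c2 = np.polyfit(x, y, deg=2)  # quadratic regression: f(X) = a*X^2 + b*X + c

print(a, b)
print(a2, b2, c2)  # should come out close to 1.5, -2.0, 4.0
```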

The problem of identifying a statistical dependence has two sides: establishing the closeness (strength) of the relationship and determining the form of the relationship.

Correlation analysis is devoted to establishing the closeness (strength) of the relationship; its purpose is to obtain, on the basis of the available statistical data, answers to the following basic questions:

  • how to choose a suitable measure of statistical relationship (correlation coefficient, correlation ratio, rank correlation coefficient, etc.);
  • how to test the hypothesis that the resulting numerical value of this measure really indicates the presence of a statistical relationship.

Regression analysis determines the form of the relationship. Its purpose is to solve the following problems on the basis of the available statistical data:

  • choosing the type of regression function (model selection);
  • finding unknown parameters of the selected regression function;
  • analysis of the quality of the regression function and verification of the adequacy of the equation to empirical data;
  • forecasting unknown values of the resultant characteristic from given values of the factor characteristics.

At first glance, it may seem that the concept of regression is similar to the concept of correlation, since in both cases we are talking about a statistical dependence between the characteristics under study. In reality, however, there are significant differences between them. Regression implies a causal relationship, where a change in the conditional average value of the resultant characteristic occurs because of a change in the factor characteristics. Correlation says nothing about a causal relationship between the characteristics: if there is a correlation between X and Y, this fact does not imply that changes in the values of X determine the change in the conditional average value of Y. Correlation merely states the fact that changes in one quantity, on average, go along with changes in another.

Stochastic empirical dependence

The dependence between random variables is called stochastic dependence. It manifests itself in a change in the distribution law of one of them (the dependent variable) when the others (arguments) change.

Graphically, a stochastic empirical dependence, plotted in the coordinate system of the dependent variable versus the arguments, is a set of randomly scattered points that reflects the general tendency of the dependent variable's behavior as the arguments change.

A stochastic empirical dependence on one argument is called a paired dependence; if there is more than one argument, it is called a multidimensional dependence. An example of a paired linear dependence is shown in Fig. 1.

Fig. 1. An example of a paired linear stochastic dependence.

Unlike an ordinary functional dependence, in which a change in the value of an argument (or several arguments) corresponds to a definite change in the dependent variable, in a stochastic dependence a change in the arguments produces a change in the statistical distribution of the random dependent variable, in particular in its mathematical expectation.

Mathematical modeling (approximation) problem

The construction of a stochastic dependence is also called mathematical modeling (approximation) and consists in finding its mathematical expression (formula).

A mathematical model is an empirically established formula (function) that reflects a true relationship, not always known but objectively existing, and corresponds to the basic, stable, repeating relation between objects, phenomena, or their properties.

The stable relationship of things and their true dependence, whether modeled or not, exists objectively, has a mathematical expression, and is regarded as a law or a consequence of one.

If a suitable law, or a consequence of one, is known, then it is natural to take it as the desired analytical dependence. For example, the empirical dependence of the current I on the circuit voltage U and the load resistance R follows from Ohm's law: I = U / R.

Unfortunately, the true dependence of the variables is a priori unknown in the vast majority of cases, so it has to be discovered from general considerations and theoretical concepts, that is, by constructing a mathematical model of the pattern in question. It is taken into account that the given variables and their increments, against the background of random fluctuations, reflect the mathematical properties of the desired true dependence (the behavior of tangents, extrema, roots, asymptotes, etc.).

The approximating function, selected in one way or another, smooths out (averages) the random fluctuations of the initial empirical values of the dependent variable and, by suppressing the random component in this way, serves as an approximation to the regular component and, therefore, to the desired true dependence.

The mathematical model of the empirical relationship has theoretical and practical significance:

· allows one to establish the adequacy of the experimental data to one or another known law and to identify new patterns;

· solves, for the dependent variable, the problems of interpolation within a given interval of argument values and of prediction (extrapolation) outside that interval.

However, despite the great theoretical interest of finding a mathematical formula for the dependence of quantities, in practice it is often enough only to determine whether there is a connection between them and what its strength is.

The task of correlation analysis

A method for studying the relationship between changing quantities is correlation analysis.

The key concept of correlation analysis, describing the relationship between variables, is correlation (from the English correlation: coordination, connection, relationship, interdependence).

Correlation analysis is used to detect stochastic dependence and assess its strength (significance) by the magnitude of the correlation coefficients and correlation ratio.

If a relationship is found between variables, one says that a correlation is present or that the variables are correlated.

In absolute value, the indicators of the closeness of the relationship (correlation coefficient, correlation ratio) vary from 0 (when there is no relationship) to 1 (when the stochastic dependence degenerates into a functional one).

A stochastic relationship is considered significant (real) if the estimate of the correlation coefficient (correlation ratio) is significant in absolute value, that is, 2-3 times greater than the standard deviation of the coefficient estimate.
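A possible sketch of such a significance check (assuming the classical large-sample approximation σ_r ≈ (1 − r²)/√N for the standard deviation of the correlation-coefficient estimate; the data are simulated):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)     # stochastically related variables

r = np.corrcoef(x, y)[0, 1]          # sample correlation coefficient
sigma_r = (1.0 - r**2) / np.sqrt(n)  # approximate standard deviation of the estimate

print(f"r = {r:.3f}, sigma_r = {sigma_r:.3f}")
print("significant" if abs(r) > 3.0 * sigma_r else "not detectable")
```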

Note that in some cases a connection can be found between phenomena that are not in obvious cause-and-effect relationships.

For example, for some rural areas, a direct stochastic relationship has been found between the number of nesting storks and the number of children born. A spring count of the storks makes it possible to predict how many children will be born that year, but the dependence, of course, does not prove the well-known folk belief; it is explained by parallel processes:

· the birth of children is usually preceded by the formation and establishment of new families, with the setting up of rural houses and farmsteads;

· the expanding nesting opportunities attract birds and increase their numbers.

Such a correlation between characteristics is called a spurious (illusory) correlation, although it may have practical significance.

Probability theory is often perceived as a branch of mathematics that deals with the “calculus of probabilities.”

And all this calculation actually comes down to a simple formula:

"The probability of any event is equal to the sum of the probabilities of the elementary events included in it." In essence, this formula repeats the "incantation" familiar to us since childhood:

"The mass of an object is equal to the sum of the masses of its constituent parts."

Here we will discuss less trivial facts from probability theory. We will talk, first of all, about dependent and independent events.

It is important to understand that the same terms in different branches of mathematics can have completely different meanings.

For example, when they say that the area of a circle S depends on its radius R, then, of course, they mean the functional dependence S = πR².

The concepts of dependence and independence have a completely different meaning in probability theory.

Let's start getting acquainted with these concepts with a simple example.

Imagine that you are conducting a die-throwing experiment in this room, while a colleague in the next room is tossing a coin. Suppose you are interested in event A, that you roll a "two," and event B, that your colleague's coin lands "tails." Common sense dictates: these events are independent!

Although we have not yet introduced the concept of dependence/independence, it is intuitively clear that any reasonable definition of independence must be designed so that these events are defined as independent.

Now let us turn to another experiment. A die is thrown; event A is "a two," and event B is "an odd number of points." Assuming the die is symmetric, we can immediately say that P(A) = 1/6. Now imagine you are told: "As a result of the experiment, event B occurred: an odd number of points came up." What can we now say about the probability of event A? Clearly, this probability has now become zero.

The most important thing for us is that it has changed.

Returning to the first example, we can say that the information that event B has happened in the next room will not affect your ideas about the probability of event A. This probability will not change just because you learned something about event B.

We come to a natural and extremely important conclusion:

if the information that event B has occurred changes the probability of event A, then events A and B should be considered dependent; if it does not change it, independent.

These considerations should be given a mathematical form; the dependence and independence of events should be defined using formulas.

We will proceed from the following thesis: "If A and B are dependent events, then event A contains information about event B, and event B contains information about event A." How can one find out whether it is contained or not? The answer to this question is given by information theory.

From information theory we need only one formula, which allows us to calculate the amount of mutual information I(A, B) for events A and B:

I(A, B) = log [ P(AB) / (P(A) · P(B)) ].

We will not calculate the amount of information for various events or discuss this formula in detail.

It is important for us that if

P(AB) = P(A) · P(B),

then the amount of mutual information between events A and B is equal to zero, and events A and B are independent. If

P(AB) ≠ P(A) · P(B),

then the amount of mutual information is nonzero, and events A and B are dependent.
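A small illustrative computation with this (reconstructed) formula, using the die and coin events discussed above:

```python
import math
from fractions import Fraction

def mutual_information(p_a, p_b, p_ab):
    """Pointwise mutual information I(A, B) = log2( P(AB) / (P(A) * P(B)) )."""
    if p_ab == 0:
        return -math.inf
    return math.log2(p_ab / (p_a * p_b))

# Die in this room, coin next door: A = "a two", B = "tails", P(AB) = 1/12.
print(mutual_information(Fraction(1, 6), Fraction(1, 2), Fraction(1, 12)))  # 0.0 -> independent

# One die: A = "a two", B = "an even number of points", P(AB) = 1/6.
print(mutual_information(Fraction(1, 6), Fraction(1, 2), Fraction(1, 6)))   # 1.0 -> dependent
```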

The appeal to the concept of information is of an auxiliary nature here and, it seems to us, makes the concepts of dependence and independence of events more tangible.

In probability theory, the dependence and independence of events are described more formally.

First of all, we need the concept of conditional probability.

The conditional probability of event A, given that event B has occurred (P(B) ≠ 0), is the quantity P(A|B) calculated by the formula

P(A|B) = P(AB) / P(B).
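A simulation sketch of this formula for the die experiment above (relative frequencies stand in for probabilities; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
rolls = rng.integers(1, 7, size=100_000)  # 100,000 throws of a fair die

A = rolls == 2      # event A: a "two"
B = rolls % 2 == 1  # event B: an odd number of points

p_ab = np.mean(A & B)
p_b = np.mean(B)
print(p_ab / p_b)   # P(A|B) = P(AB)/P(B) = 0: knowing B rules A out completely
print(np.mean(A))   # while the unconditional P(A) is about 1/6
```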

Following the spirit of our approach to understanding the dependence and independence of events, we can expect conditional probability to have the following property: if events A and B are independent, then

P(A|B) = P(A).

This means that information that event B has occurred has no effect on the probability of event A.

And indeed it is!

If events A and B are independent, then

P(AB) = P(A) · P(B).

For independent events A and B we therefore have

P(A|B) = P(AB) / P(B) = P(A) · P(B) / P(B) = P(A),

and, symmetrically, P(B|A) = P(B).
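A quick simulated check of these relations for the die-and-coin example (an illustrative sketch, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
die = rng.integers(1, 7, size=n)   # your die in this room
coin = rng.integers(0, 2, size=n)  # your colleague's coin next door (0 = "tails")

A = die == 2
B = coin == 0

p_a, p_b, p_ab = A.mean(), B.mean(), (A & B).mean()
print(p_ab, p_a * p_b)               # nearly equal: P(AB) = P(A) * P(B)
print((A & B).sum() / B.sum(), p_a)  # P(A|B) ~ P(A): learning B changes nothing
```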


Suppose we need to study the dependence z(x), where both quantities are measured in the same experiments. To do this, a series of experiments is carried out at different values of x, while trying to keep the other experimental conditions unchanged.

The measurement of each quantity contains random errors (we will not consider systematic errors here); therefore, these values ​​are random.

Such a law-governed relationship between random variables is called stochastic. We will consider two problems:

a) establish whether there is (with a certain confidence probability) a dependence of z on x, or whether z does not depend on x;

b) if the dependence exists, describe it quantitatively.

The first task is called analysis of variance, and if a function of many variables z(x, y, …) is considered, multivariate analysis of variance. The second task is called regression analysis. If the random errors are large, they can mask the desired dependence, and it may not be easy to reveal it.

Thus, it suffices to consider the random variable z with x as a parameter. The mathematical expectation of this variable depends on x; this dependence M z = f(x) is the desired one, and it is called the regression law.

Analysis of variance. Let us carry out a small series of measurements z_ij, 1 ≤ j ≤ n_i, at each value x_i and determine the averages over each series. Consider two ways of processing these data that allow us to investigate whether there is a significant (i.e., with the accepted confidence probability) dependence of z on x.

In the first method, the sample standards of a single measurement are calculated for each series separately and for the entire set of measurements:

s_i² = (1 / (n_i − 1)) Σ_j (z_ij − z̄_i)²,   s² = (1 / (N − 1)) Σ_i Σ_j (z_ij − z̄)²,

where N = Σ_i n_i is the total number of measurements, and

z̄_i = (1 / n_i) Σ_j z_ij,   z̄ = (1 / N) Σ_i Σ_j z_ij

are the average values, respectively, for each series and for the entire set of measurements.

Let us compare the variance of the whole set of measurements with the variances of the individual series. If it turns out that, at the chosen confidence level, one can accept s² > s_i² for all i, then there is a dependence of z on x.

If there is no reliable excess, then the dependence cannot be detected (given the accuracy of the experiment and the adopted processing method).

Variances are compared using Fisher's test (30). Since the standard s is determined by the total number of measurements N, which is usually quite large, you can almost always use the Fisher coefficients given in Table 25.
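A hedged sketch of the first method, with SciPy's F quantiles standing in for the Fisher coefficients of Table 25 (the data, the function name, and the significance level are assumptions of this example):

```python
import numpy as np
from scipy.stats import f as fisher_f

def depends_on_x(series, alpha=0.05):
    """First method: z depends on x if the variance of the whole set significantly
    exceeds the variance of every individual series (Fisher's F-test)."""
    all_z = np.concatenate(series)
    s2_total = np.var(all_z, ddof=1)
    n_total = all_z.size
    for z in series:
        s2_i = np.var(z, ddof=1)
        f_crit = fisher_f.ppf(1.0 - alpha, n_total - 1, z.size - 1)
        if s2_total / s2_i <= f_crit:
            return False  # no reliable excess over this series
    return True

rng = np.random.default_rng(5)
xs = [1.0, 2.0, 3.0, 4.0]
series = [rng.normal(2.0 * x, 1.0, size=8) for x in xs]  # z really depends on x
print(depends_on_x(series))  # True (with high probability)
```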

The second method of analysis is to compare the averages z̄_i at different values x_i with one another. The values z̄_i are random and independent, and their own sample standards are equal to s_i / √n_i.

Therefore, they are compared according to the scheme of independent measurements described in paragraph 3. If the differences are significant, i.e., exceed the confidence interval, then the fact of the dependence of z on x is established; if the differences between all the z̄_i are insignificant, then the dependence cannot be detected.

Multivariate analysis of variance has some special features. It is advisable to measure the values of z at the nodes of a rectangular grid, so that it is more convenient to study the dependence on one argument while the other is held fixed. Carrying out a series of measurements at every node of a multidimensional grid is too labor-intensive. It is enough to carry out series of measurements at a few grid points in order to estimate the dispersion of a single measurement; at the other nodes one can limit oneself to single measurements. The analysis of variance is then carried out according to the first method.

Remark 1. If there are many measurements, then in both methods individual measurements or series can, with a noticeable probability, deviate quite strongly from their mathematical expectation. This must be taken into account when choosing a confidence probability close enough to 1 (as was done in setting the limits separating permissible random errors from gross ones).

Regression analysis. Suppose the analysis of variance has indicated that the dependence of z on x exists. How can it be described quantitatively?

To do this, we approximate the desired dependence by some function f(x; a₁, …, a_m). The optimal values of the parameters are found by the least squares method, by solving the problem

Σ_i w_i [z̄_i − f(x_i; a₁, …, a_m)]² = min,   (34)

where the w_i are the measurement weights, chosen inversely proportional to the square of the measurement error at the given point (i.e., w_i = 1 / s²(z̄_i)). This problem was analyzed in Chapter II, § 2. We will dwell here only on those features that are caused by the presence of large random errors.
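A minimal sketch of problem (34) for a straight-line f (the data and error values are invented; NumPy's lstsq applied to weight-scaled data is equivalent to weighted least squares):

```python
import numpy as np

# Minimize sum_i w_i * (z_i - (a0 + a1*x_i))**2, with weights w_i = 1/err_i**2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z = np.array([2.1, 3.9, 6.2, 7.8, 10.3])   # measured averages z_i
err = np.array([0.1, 0.2, 0.2, 0.3, 0.5])  # measurement errors at each point
w = 1.0 / err**2

A = np.vstack([np.ones_like(x), x]).T      # design matrix for f(x) = a0 + a1*x
Aw = A * np.sqrt(w)[:, None]               # scaling rows by sqrt(w) applies the weights
zw = z * np.sqrt(w)
(a0, a1), *_ = np.linalg.lstsq(Aw, zw, rcond=None)
print(a0, a1)
```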

The form of f is selected either from theoretical considerations about the nature of the dependence, or formally, by comparing the graph of the data with the graphs of known functions. If the formula is chosen from theoretical considerations and correctly (from the theoretical standpoint) conveys the asymptotic behavior, then it usually allows one not only to approximate the set of experimental data well, but also to extrapolate the found dependence to other ranges of values. A formally selected function may describe the experiment satisfactorily, but it is rarely suitable for extrapolation.

Problem (34) is easiest to solve when f is an algebraic polynomial. However, such a formal choice of function rarely turns out to be satisfactory. Usually good formulas depend on their parameters nonlinearly (transcendental regression). A transcendental regression is most conveniently constructed by choosing a leveling change of variables ξ = ξ(x), η = η(z) such that the dependence η(ξ) is nearly linear (see Chapter II, § 1, paragraph 8). Then it is easy to approximate it by an algebraic polynomial η ≈ P(ξ).

A leveling change of variables is sought using theoretical considerations and taking into account asymptotics. We will further assume that such a change has already been made.

Remark 2. When passing to the new variables ξ = ξ(x), η = η(z), the least-squares problem (34) takes the form

Σ_i ŵ_i [η(z̄_i) − P(ξ(x_i))]² = min,   (35)

where the new weights are related to the original ones by

ŵ_i = w_i / [η′(z̄_i)]²,

since an error δz in z produces the error δη ≈ η′(z) δz in η. Therefore, even if in the original formulation (34) all measurements had the same accuracy, the weights for the leveling variables will not be the same.

Correlation analysis. It is necessary to check whether the change of variables was really leveling, that is, whether the dependence η(ξ) is close to linear. This can be done by calculating the pair correlation coefficient

r = Σ_i (ξ_i − ξ̄)(η_i − η̄) / √[Σ_i (ξ_i − ξ̄)² · Σ_i (η_i − η̄)²].

It is easy to show that the relation |r| ≤ 1 is always satisfied. If the dependence is strictly linear (and contains no random errors), then r = +1 or r = −1, depending on the sign of the slope of the line. The smaller |r| is, the less the dependence resembles a linear one. Therefore, if |r| is close to 1 and the number of measurements N is large enough, the leveling variables have been chosen satisfactorily.

Such conclusions about the nature of the dependence based on correlation coefficients are called correlation analysis.

Correlation analysis does not require a series of measurements to be taken at each point. It is enough to make one measurement at each point, but then take more points on the curve under study, which is often done in physical experiments.
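A sketch of this check (the exponential model and the logarithmic leveling replacement η = ln z are assumptions chosen for illustration):

```python
import numpy as np

def pair_correlation(xi, eta):
    """Pair correlation coefficient r of two samples."""
    dxi = xi - xi.mean()
    deta = eta - eta.mean()
    return np.sum(dxi * deta) / np.sqrt(np.sum(dxi**2) * np.sum(deta**2))

rng = np.random.default_rng(6)
x = np.linspace(1.0, 10.0, 40)
z = 2.0 * np.exp(0.5 * x) * np.exp(rng.normal(0.0, 0.05, x.size))  # multiplicative noise

print(pair_correlation(x, z))          # raw variables: noticeably below 1
print(pair_correlation(x, np.log(z)))  # leveled variables eta = ln z: |r| close to 1
```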

Remark 3. There are proximity criteria that allow you to indicate whether the dependence is practically linear. We do not dwell on them, since the choice of the degree of the approximating polynomial will be considered below.

Remark 4. The equality r = 0 indicates the absence of a linear dependence, but it does not mean the absence of any dependence. Thus, if η = ξ² on the segment −a ≤ ξ ≤ a, then r = 0.

Optimal degree of the polynomial. Let us substitute into problem (35) an approximating polynomial of degree n:

P(ξ) = a₀ + a₁ξ + … + a_nξⁿ.

Then the optimal values of the parameters satisfy the system of linear equations (2.43):

Σ_{k=0}^{n} a_k (Σ_i ŵ_i ξ_i^{k+l}) = Σ_i ŵ_i η_i ξ_i^{l},   l = 0, 1, …, n,

and they are not difficult to find. But how should the degree of the polynomial be chosen?

To answer this question, let us return to the original variables and calculate the variance of the approximating formula with the coefficients found. An unbiased estimate of this variance is

D_n = (1 / (N − n − 1)) Σ_i w_i [z̄_i − f(x_i)]².   (40)

Obviously, as the degree of the polynomial increases, the dispersion (40) will decrease: the more coefficients are taken, the more accurately the experimental points can be approximated.
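A sketch of degree selection by the unbiased variance estimate (40), under the simplifying assumptions of unit weights and the reconstructed denominator N − n − 1:

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(-1.0, 1.0, 30)
z = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(0.0, 0.1, x.size)  # true degree is 2

N = x.size
for n in range(1, 6):
    coeffs = np.polyfit(x, z, deg=n)      # least-squares polynomial of degree n
    resid = z - np.polyval(coeffs, x)
    D_n = np.sum(resid**2) / (N - n - 1)  # estimate (40) with unit weights
    print(n, D_n)
# D_n drops sharply up to n = 2 and then levels off; degree 2 is optimal.
```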