Mean value and variation of an alternative trait. Studying the shape of the distribution of a characteristic. Main characteristics of distribution patterns Limit value of dispersion

Variation indicators

Variation indicators characterize the fluctuation of individual values ​​of a characteristic in relation to the average value, which is no less important than determining the average itself. The average does not show the structure of the population, how the variants of the averaged characteristic are located around it, whether they are concentrated near the average or significantly deviate from it. The average value of a characteristic in two populations may be the same, but in one case all individual values ​​differ little from it, and in the other these differences are large, i.e. in one case the variation of the trait is small, and in the other it is large.
This can be shown with this example. Let's assume that two teams of 3 people each perform the same job. The number of parts produced per shift by individual workers was:
in the first brigade - 95, 100, 105;
in the second brigade - 75, 100, 125.
The average output per worker in each brigade was

$$\bar{x}_1 = \frac{95 + 100 + 105}{3} = 100, \qquad \bar{x}_2 = \frac{75 + 100 + 125}{3} = 100.$$
The average output is the same, but the fluctuation of the output of individual workers in the first brigade is much less than in the second.
Consequently, the more the variants of individual units of the population differ from each other, the more they differ from their average, and vice versa - the variants that differ little from each other are closer in value to the average, which in this case will more realistically represent the entire population.

Therefore, to characterize and measure the variation of a trait in the aggregate, in addition to the average, the following indicators are used:

  • absolute - variation range, average linear and standard deviation, dispersion;
  • relative - coefficients of variation.

Variation range (or range of variation) is the difference between the maximum and minimum values of the characteristic:

$$R = x_{\max} - x_{\min}$$

In our example, the range of variation in the shift output of workers is R = 105 - 95 = 10 parts in the first brigade and R = 125 - 75 = 50 parts in the second brigade (5 times more). This suggests that the output of the first brigade is more “stable”, but the second brigade has greater reserves for increasing output: if all of its workers reached the maximum output for this brigade, it could produce 3 * 125 = 375 parts, whereas the first brigade could produce only 3 * 105 = 315 parts.
The disadvantage of the variation range indicator is that its value does not reflect all fluctuations of the trait.
The simplest general indicator reflecting all fluctuations of a characteristic is the average linear deviation, which is the arithmetic mean of the absolute deviations of individual variants from their average value:
for ungrouped data
$$\bar{d} = \frac{\sum |x_i - \bar{x}|}{n},$$
for grouped data
$$\bar{d} = \frac{\sum |x_i - \bar{x}| f_i}{\sum f_i},$$
where $x_i$ is the value of the attribute in a discrete series or the midpoint of the interval in an interval distribution, and $f_i$ is the corresponding frequency.
In the above formulas, the differences in the numerator are taken in absolute value; otherwise, by the property of the arithmetic mean, the numerator would always be equal to zero. Because ignoring the sign is a convention, the average linear deviation is rarely used in statistical practice, only in cases where summing indicators without regard to sign makes economic sense. With its help, for example, the composition of the workforce, the profitability of production, and foreign trade turnover are analyzed.
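The role of the absolute value can be checked numerically. The following is a minimal sketch, not part of the source: the function name `mean_linear_deviation` and its `freqs` parameter are illustrative assumptions, and the data are the brigade outputs from the example above.

```python
def mean_linear_deviation(values, freqs=None):
    """Mean of absolute deviations from the (weighted) arithmetic mean.

    `freqs` holds the group frequencies; when omitted, every value
    counts once (ungrouped data).
    """
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    return sum(abs(x - mean) * f for x, f in zip(values, freqs)) / n


brigade_1 = [95, 100, 105]
brigade_2 = [75, 100, 125]

# Without the absolute value the deviations from the mean (100) cancel out.
signed_sum = sum(x - 100 for x in brigade_2)

print(signed_sum)                        # 0
print(mean_linear_deviation(brigade_1))  # about 3.33 parts
print(mean_linear_deviation(brigade_2))  # about 16.67 parts
```

Passing frequencies as the second argument gives the grouped-data version of the same measure.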
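The variance of a trait is the mean square of the deviations of the variants from their average value:
simple variance
$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n},$$
weighted variance
$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}.$$
The formula for calculating variance can be simplified:

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{\sum x_i^2}{n} - 2\bar{x}\,\frac{\sum x_i}{n} + \bar{x}^2 = \frac{\sum x_i^2}{n} - \bar{x}^2$$

Thus, the variance is equal to the difference between the mean of the squares of the variants and the square of their mean:
$$\sigma^2 = \overline{x^2} - \bar{x}^2.$$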
However, because the deviations are squared before they are summed, the variance gives a distorted idea of the size of the deviations, so the standard deviation is calculated from it. The standard deviation shows how much, on average, specific variants of a trait deviate from their average value. It is obtained by taking the square root of the variance:
for ungrouped data
$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}},$$
for variation series
$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}.$$
The smaller the value of the variance and standard deviation, the more homogeneous the population, the more reliable (typical) the average value will be.
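As an illustration, the sketch below computes the variance and standard deviation for the two brigades from the earlier example. The function names are assumptions made for this sketch. The divisor is n, as in the formulas of this text; this corresponds to `statistics.pvariance` in the Python standard library rather than to `statistics.variance`, which divides by n - 1.

```python
import math


def variance(values):
    """Simple (population) variance: mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def std_dev(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(values))


brigade_1 = [95, 100, 105]
brigade_2 = [75, 100, 125]

for name, data in (("brigade 1", brigade_1), ("brigade 2", brigade_2)):
    mean = sum(data) / len(data)
    print(f"{name}: mean={mean:.0f}, variance={variance(data):.2f}, "
          f"sigma={std_dev(data):.2f}")
```

Both brigades have the same mean, but the smaller sigma of the first brigade marks it as the more homogeneous group.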
The average linear deviation and the standard deviation are denominate numbers, i.e. they are expressed in the units of measurement of the characteristic; they are identical in content and close in value.
It is recommended to calculate absolute indicators of variation using tables.
Table 3 - Calculation of variation characteristics (based on data on the shift output of brigade workers)

| Groups of workers by output, pcs. | Number of workers, $f$ | Midpoint of the interval, $x$ | $xf$ | $x - \bar{x}$ | $\lvert x - \bar{x} \rvert f$ | $(x - \bar{x})^2$ | $(x - \bar{x})^2 f$ |
|---|---|---|---|---|---|---|---|
| 170-190 | 10 | 180 | 1800 | -36 | 360 | 1296 | 12960 |
| 190-210 | 20 | 200 | 4000 | -16 | 320 | 256 | 5120 |
| 210-230 | 50 | 220 | 11000 | 4 | 200 | 16 | 800 |
| 230-250 | 20 | 240 | 4800 | 24 | 480 | 576 | 11520 |
| Total: | 100 | - | 21600 | - | 1360 | - | 30400 |

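Average shift output of workers:

$$\bar{x} = \frac{\sum x f}{\sum f} = \frac{21600}{100} = 216 \text{ pcs.}$$

Average linear deviation:

$$\bar{d} = \frac{\sum |x - \bar{x}| f}{\sum f} = \frac{1360}{100} = 13.6 \text{ pcs.}$$

Production variance:

$$\sigma^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f} = \frac{30400}{100} = 304$$

The standard deviation of the output of individual workers from the average output:
$$\sigma = \sqrt{304} \approx 17.44 \text{ pcs.}$$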
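The Table 3 calculation can be reproduced in a few lines. This is a sketch for checking the arithmetic; the variable names are assumptions, while the midpoints and frequencies are taken from the table.

```python
import math

# Interval midpoints (x) and numbers of workers (f) from Table 3.
midpoints = [180, 200, 220, 240]
workers = [10, 20, 50, 20]

n = sum(workers)
mean = sum(x * f for x, f in zip(midpoints, workers)) / n
linear_dev = sum(abs(x - mean) * f for x, f in zip(midpoints, workers)) / n
variance = sum((x - mean) ** 2 * f for x, f in zip(midpoints, workers)) / n
sigma = math.sqrt(variance)

print(mean, linear_dev, variance, round(sigma, 2))  # 216.0 13.6 304.0 17.44
```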

Calculating variances involves cumbersome calculations (especially if the average is expressed as a large number with several decimal places). Calculations can be simplified by using a simplified formula and dispersion properties.
Dispersion has the following properties (provable in mathematical statistics):

1. if all values of a characteristic are reduced or increased by the same value A, the dispersion does not change.


Calculation of the variance of an alternative characteristic

Among the characteristics studied by statistics, there are also those that have only two mutually exclusive values. These are alternative characteristics. They are assigned two quantitative values: 1 and 0. The relative frequency of the value 1, denoted by p, is the proportion of units possessing the characteristic. The difference 1 - p = q is the relative frequency of the value 0. Thus,

| $x_i$ | $w_i$ |
|---|---|
| 1 | p |
| 0 | q |

Arithmetic mean of the alternative characteristic
$$\bar{x} = \frac{\sum x_i w_i}{\sum w_i} = \frac{1 \cdot p + 0 \cdot q}{p + q} = p,$$ because p + q = 1.

Variance of the alternative characteristic
$$\sigma^2 = \frac{(1 - p)^2 p + (0 - p)^2 q}{p + q} = q^2 p + p^2 q = pq(q + p) = pq,$$ because 1 - p = q.
Thus, the variance of an alternative characteristic is equal to the product of the proportion of units possessing this characteristic and the proportion of units not possessing it.
If the values 1 and 0 occur equally often, i.e. p = q = 0.5, the variance reaches its maximum pq = 0.25.
The variance of an alternative attribute is used in sample surveys, for example, of product quality.
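A short sketch can confirm both statements: that pq matches the variance computed directly from 0/1 data, and that the maximum is reached at p = 0.5. The sample and the function names are hypothetical and serve only as an illustration.

```python
import math


def alternative_variance(p):
    """Variance of a 0/1 characteristic with share p of ones: p * q."""
    return p * (1 - p)


def variance_from_data(values):
    """Simple variance computed directly from a list of 0/1 values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


# Hypothetical sample: 3 units with the characteristic, 7 without.
sample = [1] * 3 + [0] * 7
p = sum(sample) / len(sample)

print(p, alternative_variance(p), variance_from_data(sample))

# Scan p over 0.00, 0.01, ..., 1.00 to locate the maximum of p * q.
grid = [k / 100 for k in range(101)]
best_p = max(grid, key=alternative_variance)
print(best_p, alternative_variance(best_p))  # 0.5 0.25
```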

Concept of variation

The average gives a generalizing characteristic of the entirety of the phenomenon being studied.

Variation of a trait is the difference in the individual values of a characteristic within the population being studied.

The average value is an abstract, generalizing characteristic of the characteristic of the population being studied, but it does not show the structure of the population.

The average value does not give an idea of ​​how individual values ​​of the characteristic being studied are grouped around the average, whether they are concentrated close to or significantly deviate from it.

If individual values ​​of a characteristic are close to the arithmetic mean, then in this case the mean well represents the entire population. And vice versa.

The variability of individual values ​​is characterized by indicators of variation.

The term “variation” comes from the Latin variatio – change, fluctuation, difference. However, not all differences are usually called variation.

In statistics, variation means those quantitative changes in the value of the characteristic being studied within a homogeneous population that are caused by the intersecting influence of various factors. Indicators of variation are divided into absolute and relative. The absolute indicators are $R$, $L$, $\sigma$ and $\sigma^2$.

Variation indicators

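Set 1 (n = 5): 80, 100, 120, 200, 300
Set 2 (n = 8): 145, 150, 155, 160, 160, 162, 168, 180

Both sets have the same arithmetic mean, $\bar{x} = 160$, but the values of the first set are scattered much more widely around it.

Therefore, in this case there is a need to determine the variation of the trait, i.e. how the individual values of a series are positioned relative to each other.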

Variation indicators

1. The range of variation is the difference between the maximum and minimum values of a characteristic.

$$R = x_{\max} - x_{\min}$$

For the two sets above: $R_1 = 300 - 80 = 220$, $R_2 = 180 - 145 = 35$.

In practice it is applied to homogeneous populations, for example in product quality control.

2. Indicators that take into account deviations of all options from the arithmetic mean.

a) Average linear deviation

b) Standard deviation

The average linear deviation is the arithmetic mean of the absolute values of the deviations of individual variants from the average.

for ungrouped data:

$$L = \frac{\sum |x_i - \bar{x}|}{n};$$

for grouped data:

$$L = \frac{\sum |x_i - \bar{x}| f_i}{\sum f_i}$$

Practice: it is used to analyze:

1. Composition of employees

2. Rhythm of production

3. Uniform supply of materials

Drawback: this indicator complicates probabilistic calculations and makes it difficult to apply the methods of mathematical statistics.

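The mean square (standard) deviation is the square root of the mean squared deviation of the individual values of the characteristic from the arithmetic mean:

for ungrouped data

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

for grouped data

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$$

For moderately skewed distributions, $\sigma \approx 1.25\,L$.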

The standard deviation, like the mean linear deviation, is an absolute indicator and is expressed in the same units as the arithmetic mean.

The mean square or mean linear deviations of two populations are not comparable if the characteristics themselves differ between the populations. Nor are these indicators comparable for different characteristics of the same population. That is, comparison is possible, and reflects differences in the variation of the trait, only when the means in both populations are expressed in the same units of measurement and are equal.

The standard deviation is a measure of the reliability of the mean. The smaller σ, the better the arithmetic mean reflects the entire represented population.

3. Dispersion (variance) is used to measure the variability of a trait. This indicator reflects the measure of variation more objectively.

for ungrouped data

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$

for grouped data

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}$$

A distinctive feature of this indicator is that, when the deviations are squared, the share of small deviations in the total decreases and the share of large deviations increases.

This is also an absolute indicator

The variance has a number of properties, some of which make it easier to calculate:

1. The variance of a constant value is 0.

2. If all values of the characteristic (x) are reduced by the same number, the variance does not change.

3. If all values are reduced by the same factor (K times), the variance is reduced $K^2$ times.

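Example of the third property: if all values $x$ are reduced 100 times ($x' = x / 100$) and the variance of the reduced values $x'$ is 0.909, then the variance of the original values is $\sigma^2 = 0.909 \cdot 100^2 = 9090$.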
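The three properties listed above can be verified numerically. The sketch below uses the first set from the earlier example; the constants `A` and `K` and the function name are illustrative assumptions.

```python
import math


def variance(values):
    """Simple (population) variance."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


data = [80, 100, 120, 200, 300]   # first set from the example above
A, K = 100, 10                    # illustrative constants

shifted = [x - A for x in data]   # every value reduced by A
scaled = [x / K for x in data]    # every value reduced K times

print(variance([7, 7, 7]))                        # property 1: a constant has zero variance
print(variance(data), variance(shifted))          # property 2: the two values coincide
print(variance(data), variance(scaled) * K ** 2)  # property 3: equal up to rounding
```

In practice this means a variance computed on conveniently reduced values only has to be multiplied back by $K^2$.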

The calculation of variation indicators for quantitative characteristics was discussed above, but the task of estimating variation can also arise for qualitative characteristics. For example, when studying the quality of manufactured products, the products can be divided into good and defective.

In this case, we are talking about alternative characteristics.

Alternative trait variance

Alternative characteristics are those that some units of the population possess and others do not. Examples are whether applicants have work experience, whether university teachers hold an academic degree, etc. The presence of a characteristic in a population unit is conventionally denoted by 1, and its absence by 0: $x_1 = 1$, $x_2 = 0$. The proportion of units possessing the characteristic (in the total population) is denoted by p, and the proportion of units not possessing it by q. That is, p + q = 1, q = 1 - p.

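Let's calculate the average value and the variance of the alternative characteristic:

$$\bar{x} = \frac{1 \cdot p + 0 \cdot q}{p + q} = p; \qquad \sigma_p^2 = \frac{(1 - p)^2 p + (0 - p)^2 q}{p + q} = pq$$

That is, the average value of an alternative characteristic is equal to the proportion of units possessing the characteristic, and its variance is equal to the product of the proportion of units possessing the characteristic and the proportion of units not possessing it.

The standard deviation is equal to $\sigma_p = \sqrt{pq}$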

Example: a quality check of 1000 finished products found 20 defective ones.

The percentage of defects is (20/1000)*100% = 2%, i.e. p = 0.02.

Dispersion has a number of properties, which simplify the calculation.

1. If you subtract some constant number A from all the values, then the standard deviation from this will not change.

Variation is the difference in the individual values of a characteristic among units of the population being studied. The study of variation is of great practical importance and is a necessary link in economic analysis. The need to study variation arises because the average, being a resultant value, performs its main task with varying degrees of accuracy: the smaller the differences in the individual values of the attribute being averaged, the more homogeneous the population and, therefore, the more accurate and reliable the average, and vice versa. Therefore, the degree of variation makes it possible to judge the limits of variation of a characteristic, the homogeneity of the population with respect to that characteristic, the typicality of the average, and the relationship between the factors that determine the variation.

The variation of a characteristic in a population is measured using absolute and relative indicators.

Absolute measures of variation include:

Range of variation (R)

The range of variation is the difference between the maximum and minimum values of the attribute:

$$R = x_{\max} - x_{\min}$$

It shows the limits within which the value of the characteristic varies in the population being studied.

Example. The work experience of the five applicants in previous work is: 2,3,4,7 and 9 years.
Solution: range of variation = 9 - 2 = 7 years.

For a generalized description of differences in attribute values, average indicators of variation are calculated, based on deviations from the arithmetic mean. The deviation is taken as the difference between an individual value and the average, $x_i - \bar{x}$.

In this case, to prevent the sum of the deviations of the variants from the average from turning to zero (the zero property of the average), one must either ignore the signs of the deviations, that is, take them in absolute value, $|x_i - \bar{x}|$, or square the deviations, $(x_i - \bar{x})^2$.

Average linear and square deviation

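The average linear deviation is the arithmetic mean of the absolute deviations of individual values of a characteristic from the average.

The simple average linear deviation:

$$\bar{d} = \frac{\sum |x_i - \bar{x}|}{n}$$

Example. The work experience of the five applicants in previous work is: 2, 3, 4, 7 and 9 years.

In our example: $\bar{x} = \frac{2 + 3 + 4 + 7 + 9}{5} = 5$ years; $\bar{d} = \frac{|2 - 5| + |3 - 5| + |4 - 5| + |7 - 5| + |9 - 5|}{5} = \frac{12}{5} = 2.4$ years.

Answer: 2.4 years.

The weighted average linear deviation applies to grouped data:

$$\bar{d} = \frac{\sum |x_i - \bar{x}| f_i}{\sum f_i}$$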

Because of its conventional nature, the average linear deviation is used relatively rarely in practice (in particular, to characterize the fulfillment of contractual obligations regarding uniformity of delivery, and in the analysis of product quality, taking into account the technological features of production).

Standard deviation

The most complete characteristic of variation is the mean square deviation, which is called the standard deviation. The standard deviation ($\sigma$) is equal to the square root of the mean squared deviation of the individual values of the characteristic from the arithmetic mean ($\bar{x}$):

The simple standard deviation:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

The weighted standard deviation is applied to grouped data:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$$

Under a normal distribution, the following relation holds between the standard deviation and the mean linear deviation: $\sigma \approx 1.25\,\bar{d}$.

The standard deviation, being the main absolute measure of variation, is used in determining the ordinate values ​​of a normal distribution curve, in calculations related to the organization of sample observation and establishing the accuracy of sample characteristics, as well as in assessing the limits of variation of a characteristic in a homogeneous population.

Dispersion

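Dispersion (variance) is the mean square of the deviations of individual values of a characteristic from their average value.

The simple variance:

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$

In our example (work experience of 2, 3, 4, 7 and 9 years, $\bar{x} = 5$):

$$\sigma^2 = \frac{(2 - 5)^2 + (3 - 5)^2 + (4 - 5)^2 + (7 - 5)^2 + (9 - 5)^2}{5} = \frac{34}{5} = 6.8$$

Weighted variance:

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}$$

It is more convenient to calculate the variance using the formula:

$$\sigma^2 = \overline{x^2} - \bar{x}^2$$

which is obtained from the main one through simple transformations. In this case, the mean square of deviations is equal to the mean of the squares of the attribute values minus the square of the mean.

For ungrouped data:

$$\sigma^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2$$

For grouped data:

$$\sigma^2 = \frac{\sum x_i^2 f_i}{\sum f_i} - \left(\frac{\sum x_i f_i}{\sum f_i}\right)^2$$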
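The equivalence of the definition and the more convenient formula can be checked on the applicants' work experience from the example above. This is a minimal sketch with assumed variable names.

```python
import math

experience = [2, 3, 4, 7, 9]   # years, from the applicants example
n = len(experience)
mean = sum(experience) / n

# Definition: mean squared deviation from the mean.
var_definition = sum((x - mean) ** 2 for x in experience) / n

# Shortcut: mean of the squares minus the square of the mean.
var_shortcut = sum(x ** 2 for x in experience) / n - mean ** 2

print(mean, var_definition, var_shortcut)
print(math.isclose(var_definition, var_shortcut))  # True
```

Both routes give a variance of about 6.8; they may differ in the last floating-point digit, which is why the comparison uses `math.isclose`.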

The variation of an alternative characteristic consists in the presence or absence of the property being studied in units of the population. Quantitatively, the variation of an alternative characteristic is expressed by two values: the presence of the studied property in a unit is denoted by one (1), and its absence by zero (0). The proportion of units possessing the characteristic being studied is denoted by $p$, and the proportion of units not possessing it by $q$. Considering that p + q = 1 (hence q = 1 - p), the average value of the alternative characteristic is equal to

$$\bar{x} = \frac{1 \cdot p + 0 \cdot q}{p + q} = p,$$

and the mean square deviation (variance) is

$$\sigma_p^2 = \frac{(1 - p)^2 p + (0 - p)^2 q}{p + q} = pq.$$

Thus, the variance of an alternative characteristic is equal to the product of the proportion of units that have this property ($p$) and the proportion of units that do not have it ($q$).

The mean square deviation (dispersion) takes its maximum value when the shares are equal, i.e. when $p = q = 0.5$, i.e. $\sigma_p^2 = 0.25$. The lower limit of this indicator is zero, which corresponds to a situation in which there is no variation in the population. Standard deviation of the alternative characteristic:

$$\sigma_p = \sqrt{pq}$$

So, if 3% of the products in a manufactured batch turned out to be non-standard, then the variance of the share of non-standard products is $\sigma_p^2 = 0.03 \cdot 0.97 = 0.0291$, and the standard deviation is $\sigma_p = \sqrt{0.0291} \approx 0.171$, or 17.1%.

Standard deviation is equal to the square root of the average square deviation of individual values ​​of the attribute from the arithmetic mean.

Relative Variation Measures

Relative measures of variation include:

Comparing the variation of several populations for the same characteristic, and even more so for different characteristics, using absolute indicators is not possible. In these cases, relative indicators of variation are constructed for a comparative assessment of the degree of difference. They are calculated as the ratio of absolute indicators of variation to the average:

$$V_R = \frac{R}{\bar{x}} \cdot 100\%, \qquad V_{\bar{d}} = \frac{\bar{d}}{\bar{x}} \cdot 100\%, \qquad V_\sigma = \frac{\sigma}{\bar{x}} \cdot 100\%$$

Other relative characteristics are also calculated. For example, to assess variation in the case of a skewed distribution, the ratio of the average linear deviation (taken from the median) to the median is calculated,

$$\frac{\bar{d}_{Me}}{Me},$$

since, by the property of the median, the sum of the absolute deviations of a characteristic from the median is always less than from any other value.

As a relative measure of dispersion that evaluates the variation in the central part of the population, the relative quartile deviation $\frac{Q}{Me}$ is calculated, where $Q = \frac{Q_3 - Q_1}{2}$ is the mean quartile deviation, half the difference between the third (upper) quartile ($Q_3$) and the first (lower) quartile ($Q_1$).

In practice, the coefficient of variation is most often calculated. The lower limit of this indicator is zero, it has no upper limit, but it is known that as the variation of a characteristic increases, its value also increases. The coefficient of variation is, in a certain sense, a criterion for the homogeneity of the population (in the case of normal distribution).

Let's compare the coefficient of variation with the standard deviation using the following example. The consumption of raw materials per unit of production (kg) is characterized by a mean $\bar{x}_1$ and a standard deviation $\sigma_1$ under one technology, and by $\bar{x}_2$ and $\sigma_2$ under the other. A direct comparison of the standard deviations could lead to the misconception that raw material consumption varies more under the first technology than under the second ($\sigma_1 > \sigma_2$). The relative measure of variation ($V_1 < V_2$) allows us to draw the opposite conclusion.

Example of calculation of variation indices

At the stage of selecting candidates to participate in the implementation of a complex project, the company announced a competition for professionals. The distribution of applicants by work experience showed the following results:

Let's calculate the average production experience, years

Let's calculate the variance by length of work experience

The same result is obtained if you use a different formula for calculating variance for the calculation

Let's calculate the standard deviation, years:

Let's determine the coefficient of variation, %:

Variance addition rule

To assess the influence of the factors that determine variation, grouping is used: the population is divided into groups, with one of the determining factors chosen as the grouping characteristic. Then, along with the total variance calculated for the entire population, the within-group variance (or the average of the group variances) and the between-group variance (or the variance of the group means) are calculated.

Total variance characterizes the variation of a trait in the population as a whole, formed under the influence of all factors and conditions.

Intergroup variance measures the systematic variation due to the influence of the factor by which the grouping is made:

$$\delta^2 = \frac{\sum (\bar{x}_i - \bar{x})^2 n_i}{\sum n_i},$$

where $\bar{x}_i$ and $n_i$ are the mean and the size of group $i$, and $\bar{x}$ is the overall mean.

Within-group variance evaluates the variation of a trait that has developed under the influence of other factors not taken into account in this study and is independent of the grouping factor. It is defined as the average of the group variances:

$$\overline{\sigma_i^2} = \frac{\sum \sigma_i^2 n_i}{\sum n_i}$$

All three variances ($\sigma^2$, $\delta^2$, $\overline{\sigma_i^2}$) are related to each other by the following equality, which is known as the rule for adding variances:

$$\sigma^2 = \overline{\sigma_i^2} + \delta^2$$

This relation underlies the indicators that evaluate the influence of the grouping characteristic on the formation of the total variation. These include the empirical coefficient of determination ($\eta^2$) and the empirical correlation ratio ($\eta$).

The empirical coefficient of determination ($\eta^2$) characterizes the share of intergroup variance in the total variance:

$$\eta^2 = \frac{\delta^2}{\sigma^2}$$

and shows how much of the variation of a trait in the population is due to the grouping factor.

The empirical correlation ratio

$$\eta = \sqrt{\frac{\delta^2}{\sigma^2}}$$

evaluates the closeness of the connection between the studied and grouping characteristics. Its limit values are zero and one. The closer it is to one, the closer the connection.
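The rule for adding variances can be illustrated on a small data set. The groups below are hypothetical and are not the data of any example in this text; the helper names are assumptions as well. The sketch computes the total, within-group and between-group variances and then the two empirical indicators.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


# Hypothetical data split into two groups by some grouping factor.
groups = {"A": [10, 12, 14], "B": [20, 22, 24, 26]}

everything = [x for g in groups.values() for x in g]
n = len(everything)
overall_mean = mean(everything)

total = variance(everything)
# Average of the group variances, weighted by group size.
within = sum(variance(g) * len(g) for g in groups.values()) / n
# Variance of the group means around the overall mean, weighted by group size.
between = sum((mean(g) - overall_mean) ** 2 * len(g)
              for g in groups.values()) / n

eta_squared = between / total   # empirical coefficient of determination
eta = math.sqrt(eta_squared)    # empirical correlation ratio

print(round(total, 3), round(within, 3), round(between, 3))
print(round(eta_squared, 3), round(eta, 3))
print(math.isclose(total, within + between))  # True: the addition rule holds
```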

Example. The cost of 1 sq.m of total area (conventional units) on the housing market for ten 17th houses with improved layout was:

It is known that the first five houses were built near the business center, and the rest were built at a considerable distance from it.

To calculate the total variance, let's calculate the average cost of 1 sq.m. total area: The total dispersion is determined by the formula :

Let's calculate the average cost of 1 sq.m. and the dispersion for this indicator for each group of houses that differ in location relative to the city center:

A) for houses built near the center:

b) for houses built far from the center:

The variation in the cost of 1 sq.m of total area caused by differences in the location of houses is determined by the magnitude of the intergroup variance:

The variation in the cost of 1 sq.m of total area due to other indicators that we do not take into account is measured by the within-group variance:

The found variances add up to the total variance

Empirical coefficient of determination:

$$\eta^2 = \frac{\delta^2}{\sigma^2} = 0.818$$

shows that 81.8% of the variance of the cost of 1 sq.m of total area on the housing market is explained by differences in the location of new buildings relative to the business center, and 18.2% by other factors.

The empirical correlation ratio $\eta = \sqrt{0.818} \approx 0.904$ indicates that the location of houses has a significant impact on the cost of housing.

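The rule for adding variances for the share of a characteristic is written as follows:

$$\sigma_p^2 = \overline{\sigma_{p_i}^2} + \delta_p^2$$

and the three types of variance of a share for grouped data are determined by the following formulas:

total variance:

$$\sigma_p^2 = \bar{p}\,(1 - \bar{p})$$

Formulas for intergroup and intragroup variances:

$$\delta_p^2 = \frac{\sum (p_i - \bar{p})^2 n_i}{\sum n_i}, \qquad \sigma_{p_i}^2 = p_i (1 - p_i), \qquad \overline{\sigma_{p_i}^2} = \frac{\sum p_i (1 - p_i)\, n_i}{\sum n_i},$$

where $p_i$ is the share of the characteristic in group $i$, $n_i$ is the size of group $i$, and $\bar{p}$ is the share in the population as a whole.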

Characteristics of the distribution shape

To get an idea of the shape of the distribution, indicators of the average level ($\bar{x}$, $Mo$, $Me$), indicators of variation, asymmetry and kurtosis are used.

In symmetric distributions, the arithmetic mean, mode and median coincide ($\bar{x} = Mo = Me$). If this equality is violated, the distribution is asymmetric.

The simplest indicator of asymmetry is the difference $\bar{x} - Mo$, which is positive in the case of right-sided asymmetry and negative in the case of left-sided asymmetry.

Asymmetrical distribution

To compare the asymmetry of several series, a relative indicator is calculated:

$$A_{Mo} = \frac{\bar{x} - Mo}{\sigma}$$

Central moments of the distribution of the $k$-th order are used as generalizing characteristics of variation; the order corresponds to the power to which the deviations of individual values of a characteristic from the arithmetic mean are raised:

For ungrouped data:

$$\mu_k = \frac{\sum (x_i - \bar{x})^k}{n}$$

For grouped data:

$$\mu_k = \frac{\sum (x_i - \bar{x})^k f_i}{\sum f_i}$$

The first-order moment, according to the property of the arithmetic mean, is equal to zero.

The second order moment is the dispersion.

Moments of the third and fourth orders are used to construct indicators that evaluate the features of the shape of empirical distributions.

The third-order moment is used to measure the degree of skewness of the distribution.

$A_s = \frac{\mu_3}{\sigma^3}$ is the asymmetry coefficient.

In symmetric distributions $\mu_3 = 0$, like all central moments of odd order. A third-order central moment that differs from zero indicates asymmetry of the distribution. Moreover, if $A_s > 0$, the asymmetry is right-sided and the right branch is elongated relative to the maximum ordinate; if $A_s < 0$, the asymmetry is left-sided (on the graph this corresponds to the elongation of the left branch).

To characterize the peakedness or flatness of the distribution, the ratio of the fourth-order moment ($\mu_4$) to the fourth power of the standard deviation ($\sigma^4$) is calculated. For a normal distribution $\frac{\mu_4}{\sigma^4} = 3$, therefore kurtosis is found using the formula:

$$E_x = \frac{\mu_4}{\sigma^4} - 3$$

For a normal distribution it vanishes. For peaked distributions $E_x > 0$, for flat-topped ones $E_x < 0$.

Kurtosis of distribution
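The moment-based indicators can be sketched as follows. The function names are assumptions; the two series are the first brigade's output and the applicants' work experience used in earlier examples, chosen here only because one is symmetric and the other is not.

```python
def central_moment(values, k):
    """Central moment of order k for ungrouped data."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    """Asymmetry coefficient: mu_3 / sigma^3."""
    sigma = central_moment(values, 2) ** 0.5
    return central_moment(values, 3) / sigma ** 3


def excess_kurtosis(values):
    """Kurtosis relative to the normal law: mu_4 / sigma^4 - 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [95, 100, 105]
right_skewed = [2, 3, 4, 7, 9]

print(central_moment(right_skewed, 1))   # first-order moment is zero
print(skewness(symmetric))               # zero for a symmetric series
print(skewness(right_skewed))            # positive: right-sided asymmetry
print(excess_kurtosis(right_skewed))     # negative: flatter than the normal law
```

With so few observations the values are only illustrative; in practice these coefficients are computed on large variation series.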

In addition to the indicators discussed above, a general characteristic of variation in a homogeneous population is a certain order in the change in distribution frequencies in accordance with changes in the value of the characteristic being studied, called distribution pattern.

The nature (type) of the distribution pattern can be revealed by constructing a variation series based on a large volume of observations, and by choosing the number of groups and the size of the intervals so that the pattern appears most clearly.

Analysis of variation series involves identifying the nature of the distribution (as a result of the action of the variation mechanism), establishing the distribution function, and checking the compliance of the empirical distribution with the theoretical one.

Empirical distribution, obtained from observational data, is graphically represented by an empirical distribution curve using a polygon.

In practice, there are various types of distributions, among which we can distinguish symmetric and asymmetric, single-vertex and multi-vertex.

Establishing the type of distribution means expressing the mechanism of pattern formation in analytical form. Many phenomena and their characteristics have typical distribution forms, which are approximated by the corresponding curves. Among the variety of distribution forms, the most widely used theoretical ones are the normal distribution, the Poisson distribution, the binomial distribution, etc.

A special place in the study of variation belongs to the normal law, due to its mathematical properties. For the normal law, the three-sigma rule holds, according to which the individual values of a characteristic lie within three standard deviations of the average value ($\bar{x} \pm 3\sigma$). At the same time, about 68% of all units lie within the boundaries $\bar{x} \pm \sigma$, and about 95% within the boundaries $\bar{x} \pm 2\sigma$.
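The shares quoted for the normal law can be reproduced with the error function from the Python standard library. The helper name `share_within` is an assumption of this sketch; the identity P(|X - mean| <= k·sigma) = erf(k / sqrt(2)) is a standard property of the normal distribution.

```python
import math


def share_within(k):
    """P(|X - mean| <= k * sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```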

The assessment of the correspondence between the empirical and theoretical distributions is carried out using goodness-of-fit criteria, among which the Pearson, Romanovsky, Yastremsky, and Kolmogorov criteria are widely known.

$$\sigma_p^2 = \frac{(1 - p)^2 p + (0 - p)^2 q}{p + q}$$

Substituting q = 1 - p into the variance formula, we get

$$\sigma_p^2 = q^2 p + p^2 q = pq(q + p) = pq$$

Thus, $\sigma_p^2 = pq$: the variance of an alternative characteristic is equal to the product of the proportion of units possessing the characteristic and the proportion of units not possessing it.

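The standard deviation ($\sigma$) is equal to the square root of the variance. Simple standard deviation:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

weighted

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$$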

The standard deviation is a general characteristic of the size of the variation of a characteristic in the aggregate; it shows how much on average specific options deviate from their average value; is an absolute measure of the variability of a characteristic and is expressed in the same units as the variants, therefore it is economically well interpreted.

Standard deviation of an alternative characteristic

$$\sigma_p = \sqrt{pq}$$

In statistical practice, there is often a need to compare variations of different characteristics. For example, it is of great interest to compare variations in the age of workers and their qualifications, length of service and wages, costs and profits, length of service and labor productivity, etc. For such comparisons, indicators of absolute variability of characteristics are unsuitable: it is impossible to compare the variability of work experience, expressed in years, with the variation of wages, expressed in rubles.

To carry out this kind of comparisons, as well as comparisons of the variability of the same characteristic in several populations with different arithmetic averages, relative indicators of variation are used

Relative measures of variation are defined as the ratio of absolute indicators of variation to the arithmetic mean.

They include the oscillation coefficient, defined as the ratio of the range of variation to the arithmetic mean, in percent:
$$V_R = \frac{R}{\bar{x}} \cdot 100\%.$$

The linear coefficient of variation is determined similarly, but from the average linear deviation:
$$V_{\bar{d}} = \frac{\bar{d}}{\bar{x}} \cdot 100\%.$$

The most common of these is the coefficient of variation.

The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:

$$V_\sigma = \frac{\sigma}{\bar{x}} \cdot 100\%$$

Relative indicators of variation characterize the degree of fluctuation of a characteristic within the average value. By the value of, for example, the coefficient of variation, one can determine the degree of homogeneity of the population being studied. The population is considered sufficiently homogeneous if the coefficient of variation does not exceed 33%. Limits have been established to assess the quality and stability of the average value. The best values ​​for the coefficient of variation are
; Values ​​up to 50% are considered acceptable.
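As an illustration, the sketch below computes the three relative measures for the two sets of values given earlier in the text (n = 5 and n = 8) and applies the 33% homogeneity threshold. The function name and the output format are assumptions.

```python
import math


def relative_measures(values):
    """Return (V_R, V_d, V_sigma) in percent for ungrouped data."""
    n = len(values)
    mean = sum(values) / n
    value_range = max(values) - min(values)
    linear_dev = sum(abs(x - mean) for x in values) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return tuple(100 * m / mean for m in (value_range, linear_dev, sigma))


set_1 = [80, 100, 120, 200, 300]
set_2 = [145, 150, 155, 160, 160, 162, 168, 180]

for name, data in (("set 1", set_1), ("set 2", set_2)):
    v_r, v_d, v_sigma = relative_measures(data)
    verdict = "homogeneous" if v_sigma <= 33 else "not homogeneous"
    print(f"{name}: V_R={v_r:.1f}%  V_d={v_d:.1f}%  "
          f"V_sigma={v_sigma:.1f}%  -> {verdict}")
```

Although both sets have the same mean, only the second one passes the 33% criterion.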

6.3. Properties of dispersion and simplified methods for its calculation.

The technique of calculating dispersion using formulas is quite complex, and for large values ​​of options and frequencies it can be cumbersome. The calculation can be simplified using the properties of dispersion (provable in mathematical statistics):

First property: if all values of a characteristic are reduced by the same constant amount A, the variance does not change:

$$\sigma^2_{(x - A)} = \sigma_x^2$$

Second property: if all values of a characteristic are reduced by the same factor $i$, the variance decreases correspondingly by a factor of $i^2$:

$$\sigma^2_{(x / i)} = \frac{\sigma_x^2}{i^2}$$

Third property (property of minimality): the mean square deviation from any value A (different from the arithmetic mean) exceeds the variance of the trait by the squared difference between the arithmetic mean and the value A:

$$\sigma_A^2 = \sigma_x^2 + (\bar{x} - A)^2$$

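Using the properties of dispersion, we obtain the following simplified formula for calculating variance in variation series with equal intervals by the moment method:

$$\sigma^2 = i^2 \cdot (m_2 - m_1^2),$$

where $m_2 = \frac{\sum \left(\frac{x - A}{i}\right)^2 f}{\sum f}$ is the second-order moment,

$m_1^2 = \left(\frac{\sum \frac{x - A}{i}\, f}{\sum f}\right)^2$ is the square of the first-order moment.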
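The moment method can be tried on the Table 3 data, where the intervals are equal (width 20). In this sketch the reference value `A = 220` is an assumed choice (the midpoint with the highest frequency); any other constant would give the same variance.

```python
import math

# Table 3 data: interval midpoints and frequencies.
midpoints = [180, 200, 220, 240]
freqs = [10, 20, 50, 20]

A = 220   # assumed reference value
i = 20    # interval width

n = sum(freqs)
coded = [(x - A) / i for x in midpoints]                 # -2, -1, 0, 1

m1 = sum(c * f for c, f in zip(coded, freqs)) / n        # first-order moment
m2 = sum(c ** 2 * f for c, f in zip(coded, freqs)) / n   # second-order moment

mean = A + i * m1                   # mean recovered from the coded values
variance = i ** 2 * (m2 - m1 ** 2)  # moment-method variance

print(m1, m2, mean, round(variance, 6))
```

The result agrees with the direct calculation from Table 3 (mean 216, variance 304), while the intermediate numbers stay small.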

Based on the last property of dispersion, the simplified variance formula for any series (discrete, or interval with equal or unequal intervals) takes the form:

$$\sigma^2 = \overline{x^2} - \bar{x}^2$$

6.4. Types of dispersions.

The variation of a characteristic is due to various factors, some of these factors can be identified if the statistical population is divided into groups according to any characteristic. Then, along with studying the variation of a trait throughout the entire population as a whole, it becomes possible to study the variation for each of its constituent groups, as well as between these groups. In the simplest case, when the population is divided into groups according to one factor, the study of variation is achieved through the calculation and analysis of three types of variances: general, intergroup and intragroup.

The total variance σ² measures the variation of a trait across the entire population under the influence of all the factors that cause this variation. It equals the mean square deviation of the individual values x of the attribute from the overall mean $\bar{x}$ and can be calculated as a simple or a weighted variance:

$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$ or $\sigma^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f}$

The intergroup variance δ² characterizes the systematic variation of the resultant attribute caused by the factor attribute that forms the basis of the grouping. It equals the mean square deviation of the group (partial) means $\bar{x}_i$ from the overall mean $\bar{x}$ and can be calculated as a simple or a weighted variance, respectively:

$\delta^2 = \frac{\sum (\bar{x}_i - \bar{x})^2}{k}$ or $\delta^2 = \frac{\sum (\bar{x}_i - \bar{x})^2 n_i}{\sum n_i}$

where k is the number of groups and $n_i$ is the number of units in group i.

The intergroup variance thus reflects the variation caused by the characteristic that forms the basis of the grouping.

The within-group (partial) variance $\sigma_i^2$, calculated for each group, reflects random variation, i.e. the part of the variation that is caused by unaccounted factors and does not depend on the factor attribute that forms the basis of the grouping. It equals the mean square deviation of the individual values x within the group from the arithmetic mean of that group $\bar{x}_i$ (the group mean) and can be calculated as a simple or a weighted variance, respectively:

$\sigma_i^2 = \frac{\sum (x - \bar{x}_i)^2}{n_i}$ or $\sigma_i^2 = \frac{\sum (x - \bar{x}_i)^2 f}{\sum f}$

From the within-group variances $\sigma_i^2$ of all the groups, the average of the within-group variances can be determined:

$\overline{\sigma_i^2} = \frac{\sum \sigma_i^2 n_i}{\sum n_i}$

According to the rule for adding variances, the total variance equals the sum of the average of the within-group variances and the intergroup variance:

$\sigma^2 = \overline{\sigma_i^2} + \delta^2$

Using the rule of adding variances, you can always determine the third, unknown, variance from two known variances, and also judge the strength of the influence of the grouping characteristic.

The share of the total variation that is due to the grouping characteristic is given by the empirical coefficient of determination:

$\eta^2 = \frac{\delta^2}{\sigma^2}$
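The rule for adding variances is easy to verify on a small grouped data set. The sketch below assumes two hypothetical groups, "A" and "B"; all variances are population variances weighted by group size.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance of a list of individual values."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


groups = {"A": [2, 4, 6], "B": [8, 10, 12]}  # hypothetical grouped data

all_values = [v for g in groups.values() for v in g]
n_total = len(all_values)
overall_mean = mean(all_values)

# Total variance: all units around the overall mean.
total_var = variance(all_values)

# Intergroup variance: group means around the overall mean, weighted by group size.
between_var = sum(
    (mean(g) - overall_mean) ** 2 * len(g) for g in groups.values()
) / n_total

# Average of the within-group variances, weighted by group size.
within_avg = sum(variance(g) * len(g) for g in groups.values()) / n_total

# Empirical coefficient of determination.
eta_squared = between_var / total_var

print(total_var, between_var, within_avg, eta_squared)
```

Here the grouping accounts for roughly 77% of the total variation, and any one of the three variances could be recovered from the other two.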

23. Variance of an alternative attribute

If a characteristic in a statistical population takes only two mutually exclusive values, its variability is called alternative. Let p be the proportion of units that possess the attribute and q the proportion of units that do not. The variance of an alternative attribute can then be calculated using the formula:

$\sigma^2 = p \cdot q$

Substituting q = 1 - p into this formula, we get:

$\sigma^2 = p(1 - p)$
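As a quick check, the variance of a 0/1 coded attribute computed directly coincides with p(1 - p). The data below are hypothetical: 3 units out of 10 possess the attribute.

```python
def alternative_variance(p):
    """Variance of an alternative (two-valued) attribute with share p."""
    return p * (1 - p)


flags = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # hypothetical: 1 = unit has the attribute
p = sum(flags) / len(flags)

# Direct calculation: mean squared deviation of the 0/1 values from their mean p.
direct = sum((v - p) ** 2 for v in flags) / len(flags)

print(p, direct, alternative_variance(p))
```

The expression p(1 - p) reaches its largest value, 0.25, at p = 0.5.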


The growth coefficient $K_i$ is defined as the ratio of a given level to the previous or to the base level; it shows the relative rate of change in the series. If the growth coefficient is expressed as a percentage, it is called the growth rate.

Base growth coefficient: $K_i = \frac{y_i}{y_0}$

Chain growth coefficient: $K_i = \frac{y_i}{y_{i-1}}$
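A minimal sketch of both coefficients on a hypothetical series of levels. It also checks a standard identity that is not stated in the text: the product of the chain coefficients equals the last base coefficient.

```python
import math


def base_growth_coefficients(levels):
    """Each level divided by the first (base) level."""
    base = levels[0]
    return [y / base for y in levels[1:]]


def chain_growth_coefficients(levels):
    """Each level divided by the level immediately before it."""
    return [cur / prev for prev, cur in zip(levels, levels[1:])]


levels = [200, 220, 253, 303.6]  # hypothetical levels of a time series

base_k = base_growth_coefficients(levels)
chain_k = chain_growth_coefficients(levels)
growth_rates_pct = [k * 100 for k in chain_k]  # growth rates in percent

print(base_k, chain_k, growth_rates_pct)
print(math.prod(chain_k), base_k[-1])
```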

24. Study of the main development trend

One of the most important tasks of statistics is to identify the general trend in the development of a phenomenon from its time series. The development of a phenomenon over time is influenced by various factors, so the analysis of dynamics deals with the main trend, which remains fairly stable throughout the stage of development under study. The main development trend (trend) is a smooth and stable change in the level of a phenomenon over time, free from random fluctuations. To identify it, time series are processed by the methods of interval enlargement, moving average and analytical alignment.

The simplest method of studying the main trend in time series is the enlargement of intervals. It consists in enlarging the time periods to which the levels of the series refer (the number of intervals decreases accordingly).

The main trend can also be identified by the moving average method. An average level is calculated from a certain number, usually odd (3, 5, 7, etc.), of the first levels of the series, then from the same number of levels starting from the second, then starting from the third, and so on. The average thus "slides" along the time series, moving by one term each time. The disadvantage of smoothing is that the smoothed series is "shortened" compared with the actual series, so information is lost.

To obtain a quantitative model that expresses the main trend in the levels of a time series, analytical alignment of the time series is used. Its essence is that the general development trend is calculated as a function of time: $\hat{y}_t = f(t)$, where $\hat{y}_t$ are the levels of the time series calculated from the corresponding analytical equation at time t.
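The first two smoothing methods can be sketched as follows. The monthly figures are hypothetical, and enlarging intervals by summing (months into quarterly totals) is an assumption; averaging within each enlarged interval would serve equally well.

```python
def enlarge_intervals(levels, size):
    """Sum consecutive non-overlapping blocks of `size` levels."""
    return [
        sum(levels[i:i + size])
        for i in range(0, len(levels) - size + 1, size)
    ]


def moving_average(levels, window):
    """Average of each run of `window` consecutive levels, shifting by one term."""
    return [
        sum(levels[i:i + window]) / window
        for i in range(len(levels) - window + 1)
    ]


monthly = [10, 14, 12, 16, 13, 19, 15, 21, 17]  # hypothetical monthly levels

quarterly = enlarge_intervals(monthly, 3)
smoothed = moving_average(monthly, 3)

print(quarterly)
print(smoothed)
```

With a 3-term window the smoothed series has two levels fewer than the original one, which is the "shortening" mentioned in the text.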

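Aligning a time series along a straight line: $\hat{y}_t = a_0 + a_1 t$. The parameters $a_0$ and $a_1$ are found by the least squares method from the following system of normal equations:

$n a_0 + a_1 \sum t = \sum y$
$a_0 \sum t + a_1 \sum t^2 = \sum y t$

where y are the actual (empirical) levels of the series, n is the number of levels and t is time (the ordinal number of the period or moment in time). The calculation of the parameters is greatly simplified if the central interval (moment) is taken as the origin of time (t = 0), so that $\sum t = 0$. The system then takes the form

$n a_0 = \sum y$
$a_1 \sum t^2 = \sum y t$

from which we get:

$a_0 = \frac{\sum y}{n}$;
$a_1 = \frac{\sum y t}{\sum t^2}$.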
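To see that moving the origin of time only simplifies the arithmetic, the general system of normal equations can be solved directly (here by Cramer's rule) and compared with the centered shortcut. The levels below are hypothetical and the function names are illustrative.

```python
def fit_line_normal_equations(t, y):
    """Solve n*a0 + a1*sum(t) = sum(y); a0*sum(t) + a1*sum(t^2) = sum(y*t)."""
    n = len(y)
    st, sy = sum(t), sum(y)
    stt = sum(ti * ti for ti in t)
    sty = sum(ti * yi for ti, yi in zip(t, y))
    det = n * stt - st * st
    a0 = (sy * stt - st * sty) / det
    a1 = (n * sty - st * sy) / det
    return a0, a1


def fit_line_centered(t, y):
    """Shortcut that is valid only when sum(t) == 0."""
    a0 = sum(y) / len(y)
    a1 = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)
    return a0, a1


y = [12, 15, 14, 18, 21]          # hypothetical levels
t_plain = [1, 2, 3, 4, 5]         # ordinal numbers of the periods
t_centered = [-2, -1, 0, 1, 2]    # origin moved to the central period

a0_g, a1_g = fit_line_normal_equations(t_plain, y)
a0_c, a1_c = fit_line_centered(t_centered, y)

fitted_general = [a0_g + a1_g * ti for ti in t_plain]
fitted_centered = [a0_c + a1_c * ti for ti in t_centered]
print(a0_g, a1_g, a0_c, a1_c)
```

The intercepts differ because the origin differs, but the slope and the aligned levels are the same.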
25. Analytical alignment by the least squares method

The least squares method is used for a more accurate quantitative assessment of the dynamics of the phenomenon under study. The simplest relationship, and the one most often encountered in practice, is linear, described by the equation:

Y_x = a + bX, or Y_theor = Y_mean + bX,

where Y_x are the theoretical (calculated) levels of the series for each period;
a is the arithmetic mean of the levels of the series, calculated by the formula:
a = ΣY_fact / n;
b is the slope parameter of the straight line, a coefficient showing the difference between the theoretical levels of the series for adjacent periods, calculated by the formula: b = Σ(X·Y_fact) / ΣX²;
n is the number of levels of the time series;
X are conditional time points: numbers assigned from the middle (center) of the series toward both ends.

If the series has an odd number of levels, the level in the middle position is taken as 0. For example, with 9 levels: -4, -3, -2, -1, 0, +1, +2, +3, +4.

If the series has an even number of levels, the two levels in the middle position are designated -1 and +1, and all the others follow at intervals of 2. For example, with 6 levels: -5, -3, -1, +1, +3, +5.

Calculations are carried out in the following sequence:


  1. The actual levels of the time series (Y_fact) are listed (see table).

  2. The actual levels of the series are summed to obtain ΣY_fact.

  3. Conditional (theoretical) time points X are assigned so that their sum (ΣX) equals 0.

  4. The time points are squared and summed to give ΣX².

  5. The products of X and Y_fact are calculated and summed to obtain ΣXY.

  6. The parameters of the straight line are calculated:
    a = ΣY_fact / n, b = Σ(X·Y_fact) / ΣX²

  7. By successively substituting the values of X into the equation Y_x = a + bX, the aligned levels Y_x are found.
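The sequence of steps above can be sketched as follows, including the coding of the conditional time points for odd and even series. The function names and the six sample levels are illustrative assumptions, not data from the table mentioned in step 1.

```python
def conditional_time_points(n):
    """Time points centered on the middle of the series so that they sum to 0."""
    if n % 2 == 1:
        half = n // 2
        return list(range(-half, half + 1))   # e.g. -4 ... +4 for n = 9
    return list(range(-(n - 1), n, 2))        # e.g. -5, -3, -1, 1, 3, 5 for n = 6


def align_by_least_squares(y_fact):
    n = len(y_fact)
    x = conditional_time_points(n)                        # step 3
    sum_y = sum(y_fact)                                   # step 2
    sum_x2 = sum(xi * xi for xi in x)                     # step 4
    sum_xy = sum(xi * yi for xi, yi in zip(x, y_fact))    # step 5
    a = sum_y / n                                         # step 6
    b = sum_xy / sum_x2
    aligned = [a + b * xi for xi in x]                    # step 7
    return a, b, aligned


y_fact = [10, 12, 15, 17, 20, 22]  # hypothetical actual levels
a, b, aligned = align_by_least_squares(y_fact)
print(a, b, aligned)
```

Because ΣX = 0, the aligned levels add up to the same total as the actual levels, which is a convenient check on the arithmetic.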

26. Analysis of seasonal fluctuations

When quarterly and monthly data for many socio-economic phenomena are compared, periodic fluctuations caused by the change of seasons are often found. In statistics, periodic fluctuations with a definite and constant period equal to one year are called seasonal fluctuations or seasonal waves, and the time series itself is called a seasonal time series.

There are methods in statistics for studying and measuring seasonal fluctuations. The simplest is the construction of special indicators called seasonality indices (I_s); taken together, these indicators describe the seasonal wave. Seasonality indices are the percentage ratios of the actual (empirical) intra-annual levels to theoretical (calculated) levels that serve as the basis for comparison.

To identify a stable seasonal wave, the indices are calculated from data for several years (at least 3), broken down by month. For each month the average level $\bar{y}_i$ is calculated, then the average monthly level $\bar{y}$ for the entire series. The seasonal wave indicator, the seasonality index I_s, is then determined as the percentage ratio of the average for each month to the overall average monthly level of the series:

$I_s = \frac{\bar{y}_i}{\bar{y}} \cdot 100\%$

The average seasonality index over the 12 months should equal 100%, so the sum of the indices should equal 1200.

When the levels show an upward or downward trend, deviations from a constant average level can distort the seasonal fluctuations. In this case the actual data are compared with the aligned data, i.e. with the levels obtained by analytical alignment.
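A minimal sketch of the constant-mean method for seasonality indices. The three years of monthly levels are hypothetical, and the function assumes that every month is observed in every year.

```python
def seasonality_indices(years):
    """years: one list of 12 monthly levels per year; returns indices in percent."""
    n_periods = len(years[0])
    # Average level of each month over all years.
    month_means = [
        sum(year[k] for year in years) / len(years) for k in range(n_periods)
    ]
    # Overall average monthly level of the series.
    overall_mean = sum(month_means) / n_periods
    return [m / overall_mean * 100 for m in month_means]


years = [  # hypothetical monthly levels, January to December
    [80, 85, 95, 100, 110, 120, 125, 120, 105, 95, 85, 80],
    [84, 88, 98, 104, 114, 124, 130, 124, 108, 98, 88, 84],
    [88, 91, 101, 108, 118, 128, 135, 128, 111, 101, 91, 88],
]

indices = seasonality_indices(years)
for month, value in enumerate(indices, start=1):
    print(month, round(value, 1))
print(round(sum(indices), 6))
```

Indices above 100 mark months whose level is above the yearly average, and indices below 100 mark the off-season months.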

27. Interpolation and extrapolation

When studying long-term dynamics, sometimes it becomes necessary to determine unknown levels within a series of dynamics.

Interpolation is the approximate calculation of missing levels within a homogeneous period when the adjacent levels on both sides are known.

Extrapolation is the calculation of the missing level when the level on only one side is known. If the level is calculated towards the future, this is called forward extrapolation; if it is calculated towards the past, it is called retrospective extrapolation.

Both interpolation and extrapolation must be carried out during the period of validity of one pattern. It is assumed that the pattern of development found within the series is preserved.

Methods for calculating an unknown level depend on the nature of the change in the phenomenon under study. If the level changes are smooth, the missing level can be determined by the half-sum of two adjacent levels, by the average absolute increase, by the average growth rate.

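If constant absolute increases are assumed, the missing levels of the time series are calculated as:

$y_t = y_0 + \bar{\Delta} \cdot t$

where $y_0$ is the first level, $\bar{\Delta}$ is the average absolute increase and t is the number of periods separating the required level from the first one.

If constant growth rates are assumed, the missing level of the series is calculated using the formula:

$y_t = y_0 \cdot \bar{K}^{\,t}$

where $\bar{K}$ is the average growth coefficient.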

If sharp fluctuations are observed in the dynamics series, then it is better to use the average absolute increase or the average growth rate for the entire study period, as indicated in the formulas.
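The three ways of calculating an unknown level can be sketched as follows. The series is hypothetical, and the average absolute increase and average growth coefficient are computed with their standard definitions (from the first and last levels), which this excerpt does not spell out.

```python
def interpolate_half_sum(prev_level, next_level):
    """Missing level as the half-sum of its two neighbours."""
    return (prev_level + next_level) / 2


def average_absolute_increase(levels):
    return (levels[-1] - levels[0]) / (len(levels) - 1)


def average_growth_coefficient(levels):
    return (levels[-1] / levels[0]) ** (1 / (len(levels) - 1))


def extrapolate_by_increase(levels, t):
    """Level t periods after the first one, assuming constant absolute increases."""
    return levels[0] + average_absolute_increase(levels) * t


def extrapolate_by_growth(levels, t):
    """Level t periods after the first one, assuming constant growth rates."""
    return levels[0] * average_growth_coefficient(levels) ** t


levels = [100, 110, 121]  # hypothetical known levels for periods 0, 1, 2

middle = interpolate_half_sum(levels[0], levels[2])   # estimate for period 1
next_by_increase = extrapolate_by_increase(levels, 3)  # forecast for period 3
next_by_growth = extrapolate_by_growth(levels, 3)

print(middle, next_by_increase, next_by_growth)
```

The two extrapolations differ because the sample series grows by a constant 10% rather than by a constant amount; which assumption fits depends on the nature of the phenomenon.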

Indexes are comparative relative values that characterize changes in complex socio-economic indicators (indicators consisting of non-summable elements) over time, across space, or relative to a plan.

An index is the result of a comparison of two indicators of the same name, in the calculation of which it is necessary to distinguish between the numerator of the index ratio (the compared or reporting level) and the denominator of the index ratio (the base level with which the comparison is made). The choice of base depends on the purpose of the study. If dynamics are studied, then the size of the indicator in the period preceding the reporting period can be taken as the base value. If it is necessary to make a territorial comparison, then data from another territory can be taken as the base. Planned indicators can be taken as a basis for comparison if it is necessary to use indexes as indicators of plan implementation.

Indices form the most important economic indicators of the national economy and its individual sectors. Index indicators make it possible to analyze the performance of enterprises and organizations that produce a wide variety of products or are engaged in various types of activities. Using indices, you can trace the role of individual factors in the formation of the most important economic indicators and identify the main production reserves. Indices are widely used in comparing international economic indicators when determining living standards, business activity, pricing policy, etc.

There are two approaches to interpreting the capabilities of index indicators: generalizing (synthetic) and analytical, which in turn are determined by different tasks.

29. Aggregate indexes

A general index reflects the change in all elements of a complex phenomenon. If an index does not cover all the elements, it is called a group index or subindex. A distinction is made between aggregate and average indices; their calculation constitutes a special research technique called the index method. When constructing general indexes, you need to: 1. select the elements that should be combined in one index; 2. choose the right co-measure, or weight, i.e. the attribute that is held constant. The choice of weight depends on which attribute is being indexed, a quantitative or a qualitative one.

The main form of general indexes is the aggregate form. An aggregate index is constructed by the method of sums. It is used when element-by-element data are available for both the reporting and the base periods. Below, $p_0$ and $p_1$ denote prices and $q_0$ and $q_1$ quantities in the base and reporting periods.

Turnover index: $I_{pq} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$; index of the physical volume of production: $I_q = \frac{\sum q_1 p_0}{\sum q_0 p_0}$.

The consumer price index is a general measure of inflation. The indexed quantity in it is the price of the product. When a price index is constructed, the quantities of goods sold in the current (reporting) period are usually taken as the weights. An aggregate price index with reporting-period weights was first proposed by Paasche and bears his name. Paasche aggregate price index formula:

$I_p = \frac{\sum p_1 q_1}{\sum p_0 q_1}$,

where $\sum p_1 q_1$ is the actual cost of products (turnover) of the reporting period;
$\sum p_0 q_1$ is the conditional cost of goods sold in the reporting period at base prices.

Formula for the Laspeyres aggregate price index:

$I_p = \frac{\sum p_1 q_0}{\sum p_0 q_0}$
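The two price indices differ only in the quantities used as weights. The sketch below uses hypothetical prices and quantities for two goods.

```python
def paasche_price_index(p0, p1, q1):
    """Prices weighted by reporting-period quantities."""
    actual_cost = sum(p * q for p, q in zip(p1, q1))       # sum of p1*q1
    conditional_cost = sum(p * q for p, q in zip(p0, q1))  # sum of p0*q1
    return actual_cost / conditional_cost


def laspeyres_price_index(p0, p1, q0):
    """Prices weighted by base-period quantities."""
    return sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0))


# Hypothetical data for two goods.
p0, p1 = [10, 20], [12, 22]    # base and reporting prices
q0, q1 = [100, 50], [110, 40]  # base and reporting quantities

paasche = paasche_price_index(p0, p1, q1)
laspeyres = laspeyres_price_index(p0, p1, q0)
print(round(paasche, 4), round(laspeyres, 4))
```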

30. Arithmetic and harmonic average indices and their connection with the aggregate index

Many statistical indicators that characterize various aspects of social phenomena are connected with one another, often in the form of a product. Statistics characterizes these relationships quantitatively. Many economic indicators are closely interrelated and form index systems. The following practice is accepted in factor analysis: if the resultant indicator is the product of a volume factor and a qualitative factor, then, when the influence of the volume factor is determined, the qualitative factor is fixed at the level of the base period; when the influence of the qualitative factor is determined, the volume factor is fixed at the level of the reporting period.

Let us consider the construction of interrelated indices using the price index, the index of physical volume of production (if we are talking about selling prices) or of physical volume of trade turnover (if we are talking about retail prices), and the index of product value (turnover in actual prices). The physical volume and price indices are factor indices with respect to the index of product value (turnover in actual prices):

$I_{pq} = I_p \cdot I_q$, or $\frac{\sum p_1 q_1}{\sum p_0 q_0} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \cdot \frac{\sum q_1 p_0}{\sum q_0 p_0}$.

Thus, the product of the price index and the index of physical volume of production gives the index of product value (turnover in actual prices). The index system makes it possible to find a third, unknown index from two known ones. Index of physical volume of production: $I_q = \frac{\sum q_1 p_0}{\sum q_0 p_0}$.

In addition to the aggregate method, general indices can be calculated as averages of the corresponding individual indices. Such weighted average indexes are resorted to when the available information does not allow the aggregate index to be calculated. For example, if the quantities of individual products in physical units are unknown, but the individual indices $i_q = \frac{q_1}{q_0}$ and the cost of production of the base period ($p_0 q_0$) are known, we can determine the arithmetic average index of the physical volume of production. The starting point is the aggregate form. From the available data only its denominator can be obtained. To find the numerator, the formula for the individual volume index is used, from which it follows that $q_1 = i_q \cdot q_0$. Substituting this expression into the numerator of the aggregate form, we obtain the general index of physical volume as the arithmetic average index of physical volume of production, where the weights are the cost of individual types of products in the base period ($q_0 p_0$):

$I_q = \frac{\sum i_q q_0 p_0}{\sum q_0 p_0}$.
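Both the index system and the arithmetic average form can be verified on a small example. The prices and quantities below are hypothetical, and `aggregate` is an illustrative helper for the sums of products.

```python
def aggregate(prices, quantities):
    """Sum of price times quantity over all goods."""
    return sum(p * q for p, q in zip(prices, quantities))


# Hypothetical data for two goods.
p0, p1 = [10, 20], [12, 22]    # base and reporting prices
q0, q1 = [100, 50], [110, 40]  # base and reporting quantities

value_index = aggregate(p1, q1) / aggregate(p0, q0)   # turnover in actual prices
price_index = aggregate(p1, q1) / aggregate(p0, q1)   # reporting-period weights
volume_index = aggregate(p0, q1) / aggregate(p0, q0)  # base-period prices

# Arithmetic average index: only the individual indices i_q and the
# base-period costs p0*q0 are used, not the quantities themselves.
i_q = [b / a for a, b in zip(q0, q1)]
base_cost = [p * q for p, q in zip(p0, q0)]
average_volume_index = (
    sum(i * c for i, c in zip(i_q, base_cost)) / sum(base_cost)
)

print(value_index, price_index * volume_index)
print(volume_index, average_volume_index)
```

The arithmetic average index reproduces the aggregate volume index exactly, because each weighted term i_q · q0 · p0 is simply q1 · p0.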