Measurements in marketing research: typical scales

E.P. Golubkov, Academician of the International Academy of Informatization, Doctor of Economics, Professor at the Academy of National Economy under the Government of the Russian Federation

1. Measurement scales

To collect data, questionnaires are developed. The information to fill them in is gathered by taking measurements. Measurement is understood as determining the quantitative measure or intensity of a certain characteristic (property) of interest to the researcher.

Measurement is a procedure for comparing objects according to certain indicators or characteristics (features).

Measurements can be qualitative or quantitative and be objective or subjective. Objective qualitative and quantitative measurements are made by measuring instruments, the operation of which is based on the use of physical laws. The theory of objective measurements is quite well developed.

Subjective measurements are made by a person who, as it were, plays the role of a measuring device. Naturally, the results of a subjective measurement are influenced by the psychology of human thinking. A complete theory of subjective measurements has not yet been built. However, we can speak of the creation of a common formal scheme for both objective and subjective measurements. On the basis of logic and the theory of relations, a theory of measurements has been built that makes it possible to consider both objective and subjective measurements from a unified standpoint.

Any measurement includes objects, indicators and a comparison procedure.

The indicators (characteristics) of some objects (consumers, product brands, stores, advertising, etc.) are measured. Spatial, temporal, physical, physiological, sociological, psychological and other properties and characteristics of objects are used as indicators for comparing objects. The comparison procedure includes determining the relationships between objects and how they are compared.

The introduction of specific comparison indicators allows you to establish relationships between objects, for example, "more", "less", "equal", "worse", "preferable", etc. There are various ways to compare objects with each other, for example, sequentially with one object taken as a reference, or with each other in an arbitrary or ordered sequence.

Once a characteristic has been determined for a selected object, the object is said to have been measured with respect to that characteristic. Objective properties (age, income, amount of beer drunk, etc.) are easier to measure than subjective properties (feelings, tastes, habits, attitudes, etc.). In the latter case, the respondent must translate his assessments onto a scale (some numerical system) that the researcher must develop.

Measurements can be taken using various scales. There are four characteristics of the scales: description, order, distance and the presence of a starting point.

The description involves the use of a single descriptor or identifier for each gradation in the scale. For example, "yes" or "no"; “agree” or “disagree”; the age of the respondents. All scales have descriptors that define what is being measured.

The order characterizes the relative size of the descriptors (“greater than”, “less than”, “equal”). Not all scales have order characteristics. For example, one cannot say more or less “buyer” compared to “non-buyer”.

Such a characteristic of the scale as distance is used when the absolute difference between the descriptors is known, which can be expressed in quantitative units. The respondent who bought three packs of cigarettes bought two packs more than the respondent who bought only one pack. It should be noted that when there is a "distance", then there is an order. A respondent who bought three packs of cigarettes bought “more” than a respondent who bought only one pack. The distance in this case is two.

A scale is considered to have a starting point if it has a single origin or zero point. For example, an age scale has a true zero point. However, not all scales have a zero point for the measured properties. Often they have only an arbitrary neutral point. Let's say, answering the question about the preference for a certain brand of car, the respondent replied that he had no opinion. Gradation “I have no opinion” does not characterize the true zero level of his opinion.

Each subsequent characteristic of the scale is built on the previous characteristic. Thus, "description" is the most basic characteristic that is inherent in any scale. If a scale has "distance", it also has "order" and "description".

There are four levels of measurement that determine the type of measurement scale: names, order, interval and ratio. Their relative characteristics are given in Table 1.

Table 1
Characteristics of scales of various types

The scale of names has only the characteristic of description; it assigns only a name to the objects described, and no quantitative characteristics are used. The objects of measurement fall into many mutually exclusive and exhaustive categories. The naming scale establishes an equality relation between the objects combined into one category. Each category is given a name whose numerical designation is an element of the scale. Obviously, measurement at this level is always possible. "Yes"/"No" and "Agree"/"Disagree" are examples of gradations of such scales. If respondents are classified by occupation (a naming scale), the scale gives no information of the type "greater than" or "less than". Table 2 provides examples of questions formulated both in the naming scale and in other scales.

Table 2
Examples of questions formulated in different measurement scales

A. Name scale
1. Please indicate your gender: male, female
2. Select brands of electronic products that you usually buy:
-Sony
-Panasonic
-Phillips
-Orion
-etc.
3. Do you agree or disagree with the statement that the Sony company's image is based on the production of high-quality products? agree / disagree

B. Order Scale
1. Please rank the manufacturers of electronic products according to the system of your preference. Give a “1” to the firm that ranks first in your preference system; "2" - second, etc.:
-Sony
-Panasonic
-Phillips
-Orion
-etc.
2. For each pair of grocers, circle the one you prefer:
Kroger and First National
First National and A&P
A&P and Kroger
3. What can you say about the prices at Wal-Mart:
They are higher than at Sears
The same as at Sears
Lower than at Sears.
C. Interval scale
1. Please rate each brand of product in terms of its quality:

2. Please indicate how much you agree with the following statements by circling one of the numbers:

D. Ratio scale
1. Please enter your age_________ years
2. Indicate approximately how many times during the last month you shopped at the convenience store between 20:00 and 23:00
0 1 2 3 4 5 other number of times _______
3. How likely is it that you will use a lawyer to draw up your will?
______________ percent

The order scale allows the respondents or their responses to be ranked. It has the properties of a naming scale combined with an order relation. In other words, if each pair of categories of a naming scale is ordered relative to the others, an ordinal scale is obtained. Because scale estimates at this level differ from numbers in the ordinary sense, they are called ranks. An example is the frequency of buying a certain product (once a week, once a month, or more often). Such a scale indicates only the relative difference between the measured objects.

Often, the expected clear distinction between assessments is not observed, and the respondents cannot unequivocally choose one or another answer, i.e. some adjacent gradations of responses are superimposed on each other. Such a scale is called semi-ordered; it lies between the scales of names and order.

The interval scale additionally has the characteristic of distance between the individual gradations of the scale, measured with a certain unit of measurement; that is, quantitative information is used. On this scale the differences between the individual gradations are no longer meaningless: one can decide whether they are equal and, if not, which of two is greater. The values of features on this scale can be added. It is usually assumed that the scale is uniform (although this assumption requires justification). For example, if shop assistants are evaluated on a scale with the gradations extremely friendly, very friendly, somewhat friendly, somewhat unfriendly, very unfriendly, extremely unfriendly, it is usually assumed that the distances between the individual gradations are the same (each value differs from the next by one; see Table 2).

The ratio scale is the only scale that has a zero point, so quantitative comparison of the results obtained can be carried out. The zero point makes it possible to speak of the ratio (proportion) a:b for the scale values a and b. For example, one respondent may be 2.5 times older, spend three times as much money and fly twice as often as another respondent (Table 2).

The chosen measurement scale determines the nature of the information the researcher will have when studying an object. More precisely, the choice of a scale for measurements is determined by the nature of the relationships between the objects, the availability of information and the goals of the study. If, say, we need to rank product brands, then, as a rule, we do not need to determine how much better one brand is than another, and there is no need to use quantitative scales (interval or ratio) for such a measurement.

In addition, the type of scale determines what kind of statistical analysis can or cannot be used. With a naming scale it is possible to find frequency distributions, to use the mode as the measure of central tendency, to calculate contingency coefficients between two or more series of properties, and to apply non-parametric tests for hypothesis testing.

Among the statistical indicators at the ordinal level, measures of central tendency such as the median and quartiles are used. To reveal the interdependence of two features, the Spearman and Kendall rank correlation coefficients are used.
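For illustration, the Spearman coefficient for two ordinal series can be computed directly from the rank differences. A minimal sketch in Python, using the no-ties formula; the brand ranks given by the two respondents are hypothetical:

```python
def spearman_rho(x, y):
    """Spearman rank correlation (no-ties formula) for two ordinal series."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # sum of squared rank differences
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical ranks given to five brands by two respondents:
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

With ties present, the averaged-rank correction would be needed; this sketch deliberately keeps to the simplest case.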

Quite a variety of operations can be performed on numbers belonging to an interval scale. The scale can be compressed or stretched any number of times. For example, if the scale has divisions from 0 to 100, then by dividing all the numbers by 100 we obtain a scale with values from 0 to 1. The whole scale can also be shifted so that it consists of numbers from -50 to +50.
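Such admissible transformations of an interval scale are linear. A minimal sketch (the function name and example values are illustrative):

```python
def rescale(values, new_min, new_max, old_min, old_max):
    """Linearly map interval-scale scores from [old_min, old_max] to [new_min, new_max]."""
    span_old = old_max - old_min
    span_new = new_max - new_min
    return [new_min + (v - old_min) * span_new / span_old for v in values]

scores = [0, 25, 50, 75, 100]               # original 0-100 scale
unit = rescale(scores, 0.0, 1.0, 0, 100)    # compress to the 0-1 interval
shifted = rescale(scores, -50, 50, 0, 100)  # shift to the -50...+50 interval
print(unit)     # [0.0, 0.25, 0.5, 0.75, 1.0]
print(shifted)  # [-50.0, -25.0, 0.0, 25.0, 50.0]
```

Any transformation of the form a*x + b (a > 0) preserves the order and the distances, which is exactly what the interval level guarantees.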

In addition to the algebraic operations discussed above, interval scales allow all statistical operations inherent in the ordinal level; it is also possible to calculate the arithmetic mean, dispersion, etc. Instead of rank correlation coefficients, the Pearson pair correlation coefficient is calculated. A multiple correlation coefficient may also be calculated.

All the calculation operations listed above are also applicable to the scale of ratios.

It must be borne in mind that the results obtained can always be converted to a simpler scale, but never vice versa. For example, the gradations “strongly disagree” and “somewhat disagree” (interval scale) can be easily translated into the “disagree” category of the scale of names.

Using measurement scales

In the simplest case, the assessment of the measured attribute by some individual is carried out by choosing, as a rule, one answer from a series of proposed ones or by choosing one numerical score from a certain set of numbers.

To assess the quality being measured, graphic scales are sometimes used, divided into equal parts and provided with verbal or numerical designations. The respondent is asked to make a mark on the scale in accordance with his assessment of this quality.

As mentioned above, the ranking of objects is another widely used measurement technique. When ranking, an assessment is made according to the measured quality of a set of objects by ordering them according to the severity of this feature. The first place, as a rule, corresponds to the highest level. Each object is assigned a score equal to its place in the given ranked series.

The advantage of ranking as a method of subjective measurement is the simplicity of its procedures, which require no time-consuming training of experts. However, it is practically impossible to rank a large number of objects. As experience shows, when there are more than 15-20 objects, experts find it difficult to build a ranking. This is explained by the fact that in the process of ranking the expert must establish the relationships between all the objects, considering them as a single set. As the number of objects increases, the number of links between them grows in proportion to the square of the number of objects. Keeping a large set of relationships between objects in memory and analysing them is limited by a person's psychological capabilities. Therefore, when ranking a large number of objects, experts can make significant errors. In this case the method of paired comparisons can be used.

Paired comparison is a procedure for establishing preferences among objects by comparing all possible pairs and then ordering the objects on the basis of the comparison results. Unlike ranking, which orders all the objects at once, comparing objects in pairs is a simpler task. Pairwise comparison, like ranking, is measurement on an ordinal scale.

However, this approach is more complex and is more likely to be used in surveys of experts rather than mass respondents.

Let us assume that attitudes toward such product values as "benefit", "design", "quality", "warranty period", "after-sales service", "price", etc. are being clarified. We assume that simple ranking (direct determination of the weights of the features) is difficult, or that a sufficiently accurate determination of the weights of the features under study is of great importance, so that their direct expert determination cannot be carried out. For simplicity, let us denote these values by the symbols A1, A2, A3, ..., Ak.

Respondents (experts) compare these characteristics in pairs in order to establish the most important (significant) of them in each pair.

From the symbols we form all possible pairs: (A1A2), (A1A3), etc. In total there will be k(k - 1)/2 such pairs, where k is the number of features being evaluated. The objects are then ranked according to the results of their pairwise comparison.
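As a sketch of the procedure (the attribute names and the expert's preference rule are hypothetical), the objects can be ranked by counting how often each one wins its pairwise comparisons:

```python
from itertools import combinations

def rank_by_pairwise_wins(objects, prefer):
    """Rank objects by counting how often each wins its pairwise comparisons.
    `prefer(a, b)` returns the preferred object of the pair (the expert's judgement)."""
    wins = {obj: 0 for obj in objects}
    pairs = list(combinations(objects, 2))  # k(k - 1)/2 pairs in total
    for a, b in pairs:
        wins[prefer(a, b)] += 1
    ranking = sorted(objects, key=lambda o: wins[o], reverse=True)
    return ranking, len(pairs)

# A hypothetical expert who always prefers the attribute listed earlier:
attrs = ["quality", "price", "design", "warranty"]
order = {a: i for i, a in enumerate(attrs)}
ranking, n_pairs = rank_by_pairwise_wins(attrs, lambda a, b: a if order[a] < order[b] else b)
print(n_pairs)   # 4*3/2 = 6 pairs
print(ranking)   # ['quality', 'price', 'design', 'warranty']
```

In practice an expert's judgements may be intransitive (A beats B, B beats C, C beats A); simple win counts still yield an ordering, which is one reason the method is robust for expert surveys.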

The method of paired comparisons can also be used in determining the relative weights of goals, criteria, factors, etc., carried out when conducting various marketing research.

In many cases, when compiling questionnaires, it is not advisable to develop measurement scales from scratch. It is better to use the standard types of scales used in the marketing research industry. These scales include the Modified Likert Scale, the Life Style Study Scale, and the Semantic Differential Scale.

On the basis of a modified Likert scale (an interval scale) adapted to the goals of the marketing research being conducted, the degree of respondents' agreement or disagreement with certain statements is studied. This scale is symmetrical and measures the intensity of respondents' feelings.

Table 3 presents a questionnaire based on the Likert scale. This questionnaire can be used when conducting telephone surveys of consumers. The interviewer reads out the statements and asks the interviewees to indicate the extent to which they agree with each one.

Table 3
Questionnaire to identify the opinion of the consumer regarding the product of a certain brand

There are various modifications of the Likert scale; for example, a different number of gradations (7 or 9) may be introduced.
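A simple way to code responses on such a scale, including reversing negatively worded statements so that all items point the same way, might be sketched as follows (the coding direction 1 = "strongly agree" is an assumption; questionnaires differ):

```python
def likert_score(response, points=5, reverse=False):
    """Map a response code 1..points to a score; reverse negatively worded items
    so that a higher score always means the same direction of attitude."""
    if not 1 <= response <= points:
        raise ValueError("response outside the scale")
    return points + 1 - response if reverse else response

# "strongly agree" coded as 1 on a favourably worded statement:
print(likert_score(1))                 # 1
# the same response code on a negatively worded statement, reversed:
print(likert_score(1, reverse=True))   # 5
```

The same function works unchanged for 7- or 9-point modifications by passing `points=7` or `points=9`.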

The lifestyle study scale is a special area of application of the modified Likert scale; it is designed to study the value system, personal qualities, interests, opinions about work, leisure, and purchases of various people. This information allows effective marketing decisions to be made. An example of a lifestyle questionnaire is given in Table 4.

Table 4
Life Style Questionnaire

Please circle the number that best represents the degree to which you agree or disagree with each statement.

Statement (circle one: 1 = strongly agree, 2 = agree to some extent, 3 = neutral, 4 = disagree to some extent, 5 = strongly disagree)
1. I buy a lot of special items. 1 2 3 4 5
2. I usually have one or more of the latest fashion items. 1 2 3 4 5
3. The most important thing for me is my children. 1 2 3 4 5
4. I usually keep my house in great order. 1 2 3 4 5
5. I prefer to spend the evening at home rather than go to a party. 1 2 3 4 5
6. I like to watch or listen to broadcasts of football matches. 1 2 3 4 5
7. I often influence my friends' purchases. 1 2 3 4 5
8. Next year I will have more money for shopping. 1 2 3 4 5

The semantic differential scale contains a series of bipolar definitions characterizing various properties of the object under study. Since many marketing stimuli are based on mental associations and attitudes that are not explicitly expressed, this type of scale is often used in determining the image of a brand, a store, etc. The results of studying consumers' opinions of two restaurants (No. 1 and No. 2) using the semantic differential scale are given in Table 5.

Table 5
Comparative evaluation of two restaurants

Legend: the solid line is ratings for restaurant #1, the dotted line is ratings for restaurant #2.

In Table 5 the positive and negative ratings are deliberately not placed on one side only but are mixed at random. This is done to avoid the "halo effect": if the first object evaluated receives high ratings on the left-hand side of the questionnaire compared with the second object, the respondent will tend to keep putting his marks on the left.

One of the advantages of this method is that if the individual gradations of the scale are assigned the numbers 1, 2, 3, etc., and the data of different respondents are entered into a computer, the final results can be obtained in graphical form (Table 5).
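The averaging just described might be sketched as follows (the items, the "flip" marks for the randomly reversed polarities, and the ratings are all hypothetical):

```python
from statistics import mean

# Hypothetical 7-point bipolar items; `flip` marks items whose favourable pole
# was printed on the right to counter the halo effect.
items = [("clean", False), ("slow service", True), ("tasty food", False)]

def profile(ratings_per_item):
    """Mean rating per item, flipping reversed items so 7 is always favourable."""
    out = []
    for (name, flip), ratings in zip(items, ratings_per_item):
        vals = [8 - r if flip else r for r in ratings]  # reflect on a 1..7 scale
        out.append((name, mean(vals)))
    return out

restaurant_1 = [[6, 7, 5], [2, 1, 3], [6, 6, 7]]  # three respondents, three items
print(profile(restaurant_1))
```

Plotting such per-item means for each restaurant and connecting them with a line reproduces the solid/dotted profiles of Table 5.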

When applying the above scales, the question arises of the appropriateness of using a neutral point. It all depends on whether or not the respondents have a neutral opinion. It is not possible to give an unequivocal recommendation on this issue.

The same can be said about whether the scale should be built symmetrically or asymmetrically.

There are a great many options for scales built on the basis of the principles outlined. The final choice is usually made on the basis of a test of the level of reliability and accuracy of measurements made using various scale options.

Reliability and Validity of Marketing Information Measurement

The methods of constructing the scales described above do not give a complete picture of the properties of the estimates obtained. Additional procedures are needed to identify inherent errors in these estimates. Let's call this the measurement reliability problem. This problem is solved by identifying the correctness of the measurement, stability and validity.

The study of correctness establishes the general acceptability of a given measurement method (a scale or a system of scales). The concept of correctness is directly related to the possibility of accounting for various kinds of systematic errors in the measurement results. Systematic errors have a certain stable nature of occurrence: they are either constant or change according to a certain law.

Stability characterizes the degree of coincidence of the measurement results during repeated applications of the measurement procedure and is described by the magnitude of the random error. It is determined by the constancy of the respondent's approach to answering the same or similar questions.

For example, suppose you are one of the respondents answering the questionnaire in Table 5 about a certain restaurant. Because of slow service at this restaurant you were late for a business meeting, so you gave the lowest rating on this indicator. A week later you received a call asking you to confirm that you had actually taken part in the survey. You were then asked to answer a number of additional questions over the phone, including a question about the speed of service on a scale of 1 to 7, where 7 means the fastest service. You gave a 2, demonstrating close agreement between the two scores and hence the stability of your assessments.

The most complex issue of measurement reliability is validity. Validity concerns proving that a well-defined, given property of the object is measured, and not some other more or less similar one.

When establishing reliability, it should be borne in mind that three components are involved in the measurement process: the object of measurement, the measuring means by which the properties of the object are displayed on the numerical system, and the subject (interviewer) who makes the measurement. The prerequisites for a reliable measurement lie in each individual component.

First of all, when a person acts as the object of measurement, he may have a significant degree of uncertainty about the property being measured. Thus, the respondent often does not have a clear hierarchy of life values, and consequently it is impossible to obtain absolutely accurate data characterizing the importance of certain phenomena for him. He may be poorly motivated and therefore answer questions inattentively. However, the respondent himself should be the last place to look for the cause of unreliable estimates.

On the other hand, it may be that the method of obtaining the estimate is unable to yield the most accurate values of the measured property. For example, the respondent has a detailed hierarchy of values, but to obtain information a scale with only the answer options "very important" and "not at all important" is used. As a rule, all the values in the set are then marked "very important", although in reality the respondent distinguishes a greater number of levels of significance.

Finally, even when the first two components of the measurement are highly accurate, the subject making the measurement may commit gross errors: the instructions for the questionnaire are not clearly drawn up, or the interviewer formulates the same question differently each time, using different terminology.

For example, during the interview, during which the respondent's value system should be revealed, the interviewer could not convey the essence of the survey to the respondent, could not achieve a friendly attitude towards the study, etc.

Each component of the measurement process can be a source of error associated with either stability, correctness, or validity. However, as a rule, the researcher is not able to separate these errors according to their sources of origin and therefore studies the errors of stability, correctness and validity of the entire measuring complex in the aggregate. At the same time, the correctness (as the absence of systematic errors) and the stability of information are the elementary prerequisites for reliability. The presence of a significant error in this respect already nullifies the validity check of the measurement data.

In contrast to correctness and stability, which can be measured quite strictly and expressed in the form of a numerical indicator, the validity criteria are determined either on the basis of logical reasoning or on the basis of proxy indicators. Usually, data from one method are compared with data from other methods or studies.

Before proceeding to the study of such components of reliability as stability and validity, it is necessary to make sure that the chosen measurement tool is correct.

It is possible that the subsequent stages will prove redundant if at the very outset the tool turns out to be completely unable to differentiate the population under study at the required level, in other words, if some part of the scale, some gradation of the scale, or some question is systematically unused. Finally, it is possible that the original feature has no differentiating ability at all with respect to the object of measurement. Such shortcomings of the scale must first be eliminated or reduced, and only then should it be used in the study.

Among the shortcomings of the scale used, the absence of scatter in the responses over the scale values should be mentioned first of all. When all the answers fall at one point, this indicates the complete unsuitability of the measuring tool, the scale. Such a situation may arise either because of "normative" pressure towards the generally accepted opinion, or because the gradations (values) of the scale are not related to the distribution of the given property among the objects under consideration (they are irrelevant).

For example, if all respondents agree with the statement "it is good when a construction tool is universal" and there is not a single "disagree" answer, then such a scale will not help to differentiate the respondents' attitudes to different types of building tools.

Using part of the scale. Quite often it turns out that only some part of the scale actually works: one of its poles together with a more or less extensive adjacent zone.

So, if respondents are offered a scale with positive and negative poles, in particular from +3 to -3, then when evaluating an obviously positive situation they do not use the negative assessments but differentiate their opinion only with the help of the positive ones. To calculate the value of the relative measurement error, the researcher must know exactly which metric the respondent actually uses: all seven gradations of the scale or only the four positive ones. Thus, a measurement error of 1 point says little if we do not know what the actual variation of opinions is.

For questions with qualitative gradations of answers, a similar requirement can be applied to each item of the scale: each of them must receive at least 5% of the answers, otherwise that item of the scale is considered inoperative. The 5% filling level for each gradation should not be considered strictly mandatory; depending on the objectives of the study, higher or lower levels can be set.
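Checking the 5% filling level can be sketched as follows (the response data are invented for illustration):

```python
from collections import Counter

def dead_gradations(responses, scale_points, min_share=0.05):
    """Return the scale points chosen by fewer than `min_share` of respondents."""
    counts = Counter(responses)
    n = len(responses)
    return [p for p in scale_points if counts.get(p, 0) / n < min_share]

answers = [1] * 40 + [2] * 35 + [3] * 20 + [4] * 4 + [5] * 1  # 100 respondents
print(dead_gradations(answers, [1, 2, 3, 4, 5]))  # [4, 5] fall below the 5% level
```

The `min_share` parameter makes it easy to apply a stricter or looser filling requirement, as the text suggests.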

Uneven use of individual scale items. It happens that a certain value of a feature systematically falls out of the respondents' field of view, although the neighbouring gradations, which characterize lower and higher degrees of the feature's severity, are filled appreciably.

A similar picture is observed when the respondent is offered a scale with too fine a gradation: unable to operate with all the gradations, the respondent chooses only a few basic ones. For example, respondents often treat a ten-point scale as a modification of a five-point scale, assuming that "ten" corresponds to "five", "eight" to "four", "five" to "three", etc. The basic assessments are then used much more often than the others.

To identify these anomalies of uniform distribution on a scale, the following rule can be proposed: for a sufficiently high confidence probability (1 - α > 0.99) and, therefore, within fairly wide limits, the filling of each value should not differ significantly from the average filling of the neighbouring values. The chi-square test can be used for this check.
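A minimal sketch of such a chi-square check against uniform use of the gradations (the critical value 13.28 is the tabulated chi-square quantile for 4 degrees of freedom at the 0.01 significance level; the counts are invented):

```python
def chi_square_uniform(observed):
    """Chi-square statistic of observed counts against a uniform distribution
    over the gradations of the scale."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

counts = [22, 18, 21, 19, 20]  # fairly even use of a 5-point scale
stat = chi_square_uniform(counts)
# With 5 gradations there are 4 degrees of freedom; the 0.01-level critical
# value is about 13.28, so a statistic below it gives no reason to reject
# uniform use of the scale.
print(stat, stat < 13.28)
```

A strongly skewed count vector, such as [80, 10, 5, 3, 2], would push the statistic far above the critical value and flag non-uniform use.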

Detection of gross errors. In the process of measurement, gross errors sometimes occur; their cause may be incorrect recording of the initial data, sloppy calculations, unskilled use of measuring instruments, etc. They show up as data in the measurement series that differ sharply from the totality of all other values. To decide whether such values should be recognized as gross errors, a critical limit is set so that the probability of the extreme values exceeding it is small enough and corresponds to a certain significance level α. This rule is based on the fact that the appearance of excessively large values in a sample, although possible as a result of natural variability, is unlikely.

If it turns out that some extreme values belong to the population with a very low probability, such values are recognized as gross errors and excluded from further consideration. Identifying gross errors is especially important for small samples: if not excluded from the analysis, they significantly distort the sample parameters. Special statistical criteria for identifying gross errors are used for this.
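One common criterion of this kind is the modified z-score based on the median absolute deviation, which the outliers themselves distort far less than the mean and standard deviation do (a sketch; the threshold 3.5 is a conventional choice, not the specific criterion the text has in mind, and the data are invented):

```python
from statistics import median

def gross_errors(values, threshold=3.5):
    """Flag suspected gross errors using the modified z-score built on the
    median absolute deviation (MAD) rather than the mean and stdev."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # 0.6745 rescales the MAD to be comparable with a standard deviation
    return [v for v in values if mad and 0.6745 * abs(v - med) / mad > threshold]

data = [5.1, 4.9, 5.0, 5.2, 4.8, 50.0]  # 50.0 is a suspected recording error
print(gross_errors(data))
```

A naive three-sigma rule on the same data would miss the outlier, because the outlier inflates the standard deviation itself; robust location and scale estimates avoid that masking effect.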

So, the differentiating ability of a scale, as the first essential characteristic of its reliability, implies: ensuring a sufficient spread of the data; identifying the respondents' actual use of the proposed length of the scale; analysing individual outlying values; and excluding gross errors. After the scales used have been found acceptable in these respects, one should proceed to assessing the stability of measurement on the scale.

Measurement stability. There are several methods for assessing the stability of measurements: re-testing; inclusion of equivalent questions in the questionnaire; and splitting the sample into two parts.

Often interviewers at the end of the survey partially repeat it, saying: "Finishing our work, let us briefly run through the questions of the questionnaire again so that I can check whether I have recorded all your answers correctly." Of course, this means repeating not all the questions, but only the critical ones. It must be remembered, however, that if the interval between testing and retesting is too short, the respondent may simply remember the initial answers; if it is too long, some real changes may take place.

The inclusion of equivalent questions in the questionnaire involves the use of questions on the same problem in one questionnaire, but worded differently. The respondent should perceive them as different questions. The main danger of this method lies in the degree of equivalence of questions; if this is not achieved, then the respondent answers different questions.

The division of the sample into two parts is based on comparing the answers of two groups of respondents. It is assumed that the two groups are identical in composition and that the mean response scores of the two groups are very close. All comparisons are made only on a group basis, so comparison within a group is not possible. For example, college students were surveyed, using a modified five-point Likert scale, about their future careers. The questionnaire contained the statement: "I believe that a brilliant career awaits me." Responses ranged from "strongly disagree" (1 point) to "strongly agree" (5 points). The total sample of respondents was then divided into two groups and the mean scores for these groups were calculated. The mean score was the same for each group and equalled 3 points. These results gave grounds for considering the measurement reliable. But when the group answers were analysed more carefully, it turned out that in one group all the students had answered "both agree and disagree", while in the other group 50% had answered "strongly disagree" and the other 50% "strongly agree". As can be seen, the deeper analysis showed that the answers were not identical.
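The arithmetic of this example is easy to reproduce (a sketch with invented group data): the means coincide while the dispersions reveal that the answer patterns are entirely different.

```python
from statistics import mean, stdev

# Group A: everyone answered "both agree and disagree" (3 points);
# Group B: answers polarised between 1 and 5 points.
group_a = [3] * 10
group_b = [1] * 5 + [5] * 5

print(mean(group_a), mean(group_b))    # both means equal 3
print(stdev(group_a), stdev(group_b))  # 0.0 versus roughly 2.1
```

This is why comparing means alone, as the split-sample method does, can certify as "reliable" a measurement whose underlying distributions do not match at all.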

Due to this shortcoming, this method of assessing the stability of measurements is the least popular.

One can speak of high reliability of a scale only if repeated measurements of the same objects with its help give similar results. If stability is checked on the same sample, it often proves sufficient to make two successive measurements separated by a certain time interval: the interval should not be so long that it affects a change in the object itself, yet not so short that the respondent can "pull up" the data of the second measurement from memory to match the first (in practice its length depends on the object of study and ranges from two to three weeks).

There are various indicators for assessing the stability of measurements. Among them, the root mean square error is most commonly used.

So far, we have been talking about absolute errors, the size of which was expressed in the same units as the measured value itself. This does not allow us to compare the measurement errors of different features on different scales. Therefore, in addition to absolute ones, relative indicators of measurement errors are needed.

To express the absolute error in relative form, one can use the maximum possible error on the scale in question, dividing the arithmetic mean of the measurement errors by it.

However, this indicator often “does not work well” due to the fact that the scale is not used throughout its entire length. Therefore, the relative errors calculated on the actually used part of the scale are more indicative.
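The two relative indicators described above can be sketched as follows; the test-retest scores are hypothetical, and the calculation simply divides the mean absolute error first by the full scale range and then by the range actually used.

```python
# Hypothetical test-retest scores of eight respondents on a 7-point scale.
test1  = [2, 3, 5, 4, 6, 3, 2, 5]
retest = [2, 4, 5, 3, 6, 4, 2, 6]

abs_errors = [abs(a - b) for a, b in zip(test1, retest)]
mean_abs_error = sum(abs_errors) / len(abs_errors)

# Relative to the full scale: the maximum possible error is 7 - 1 = 6.
rel_full = mean_abs_error / (7 - 1)

# Relative to the part of the scale actually used (here 2..6, so the
# maximum error is 4) -- the more indicative variant per the text.
used = test1 + retest
rel_used = mean_abs_error / (max(used) - min(used))

print(mean_abs_error, rel_full, rel_used)
```

Because the scale was used only between 2 and 6, the relative error on the used portion (0.125) is larger than the naive full-scale figure (about 0.083).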

To increase the stability of a measurement, the divisions of the scale must be clearly distinguishable: respondents must be able to fix individual values precisely, with each assessment strictly separated from the next. In practice, this means that in successive trials respondents clearly repeat their assessments. Therefore, high distinguishability of scale divisions should correspond to a small error.

But low stability may also occur with a small number of gradations, i.e., with a scale of low distinguishing capability, and then the fineness of the scale should be increased. This happens when categorical “yes”/“no” answers are imposed on a respondent who would prefer less extreme assessments, and he therefore chooses sometimes “yes” and sometimes “no” in repeated trials.

In the event that a mixture of gradations is detected, one of two methods of scale enlargement is used.

The first way: in the final version, the fineness of the scale is reduced (for example, from a 7-interval scale to a 3-interval one).

The second way: the original fineness of the scale is retained when it is presented to the respondent, and the corresponding points are merged only during processing.

The second method seems preferable, since, as a rule, a finer scale induces a more active reaction from the respondent. When the data are processed, the information should be recoded in accordance with the analysis of the distinguishability of the original scale.
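The recoding step of the second method can be sketched as follows; the 7-to-3 cut points used here (1-2, 3-5, 6-7) are an assumption for illustration, since in practice they would come from the distinguishability analysis.

```python
# Second enlargement method: the respondent answers on the full 7-point
# scale; only during processing are adjacent points merged into 3 categories.
# The cut points below are hypothetical.
def recode_7_to_3(score):
    if score <= 2:
        return 1   # "low"
    if score <= 5:
        return 2   # "middle"
    return 3       # "high"

raw = [1, 2, 3, 4, 5, 6, 7]
coarse = [recode_7_to_3(s) for s in raw]
print(coarse)  # [1, 1, 2, 2, 2, 3, 3]
```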

An analysis of the stability of individual scale questions makes it possible: a) to identify poorly formulated questions and their inadequate understanding by different respondents; b) to refine the interpretation of the scale proposed for assessing a particular phenomenon and to identify a more suitable number of scale divisions.

Validity of the measurement. Checking the validity of a scale is undertaken only after sufficient correctness and stability of the measurement of the initial data have been established.

Validity of measurement data is evidence of the consistency between what is measured and what was supposed to be measured. Some researchers prefer to proceed from so-called face validity, that is, validity in terms of the procedure used. For example, they believe that satisfaction with a product is whatever property is contained in the answers to the question: “Are you satisfied with the product?” In serious marketing research, such a purely empirical approach may not be acceptable.

Let us dwell on possible formal approaches to determining the level of validity of a methodology. They can be divided into three groups: 1) constructing a typology in accordance with the objectives of the study on the basis of several features; 2) the use of parallel data; 3) judging (expert) procedures.

The first option cannot be considered a completely formal method: it is just a schematization of logical reasoning, the beginning of a justification procedure, which can either end there or be supported by more powerful means.

The second option requires the use of at least two sources to identify the same property. Validity is determined by the degree of consistency of the relevant data.

In the latter case, we rely on the competence of the judges, who are asked to determine whether we are measuring the property we need or something else.

Constructing the typology involves the use of control questions which, together with the main ones, give a closer approximation to the content of the property under study by revealing its various aspects.

For example, satisfaction with the car model in use can be determined with the direct question “Are you satisfied with your current car model?” Combining it with two indirect ones, “Do you want to switch to another model?” and “Would you recommend this car model to a friend?”, allows a more reliable differentiation of respondents. The respondents are then classified into five ordered groups, from the most satisfied with the car to the least satisfied.

The use of parallel data consists in the development of two equal methods for measuring a given attribute. This allows you to establish the validity of the methods relative to each other, that is, to increase the overall validity by comparing two independent results.

Let us consider various ways of using this approach, and first of all equivalent scales. Equivalent sets of items can be constructed to measure behavior, attitudes, value orientations, i.e., some disposition. These sets form parallel scales, providing parallel-forms reliability.

We consider each scale as a way to measure some property and, depending on the number of parallel scales, we have a number of measurement methods. The respondent gives answers simultaneously on all parallel scales.

When processing this kind of data, two points should be clarified: 1) consistency of points on a separate scale; 2) consistency of assessments on different scales.

The first problem arises from the fact that response patterns do not represent an ideal picture; answers often contradict each other. Therefore, the question arises of what to take as the true value of the respondent's assessment on this scale.

The second problem directly concerns the comparison of parallel data.
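The comparison of parallel data can be sketched as follows; measuring agreement between two parallel scales by the Pearson correlation of their scores is one common convention (an assumption here, since the text does not fix an indicator), and the data are hypothetical.

```python
# Agreement between two parallel scales, measured as the Pearson
# correlation between the scores of the same respondents.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical scores of five respondents on two parallel scales.
scale_a = [1, 2, 3, 4, 5]
scale_b = [2, 2, 3, 5, 5]
r = pearson(scale_a, scale_b)
print(round(r, 3))  # close to 1 => the parallel scales agree well
```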

Let's consider an example of a failed attempt to improve the reliability of the measure of "car satisfaction" using three parallel ordinal scales. Here are two of them:

Fifteen judgments (in the order indicated on the left, at the beginning of each line) are presented to the respondent in a single list, and he must express his agreement or disagreement with each of them. Each judgment is assigned a score corresponding to its rank on the indicated five-point scale (on the right). (For example, agreement with judgment 4 gives a score of “1”, agreement with judgment 11 a score of “5”, etc.)

The method of presenting judgments as a list makes it possible to analyze the points of the scale for consistency. When ordered nominal scales are used, it is usually assumed that the items forming the scale are mutually exclusive and that the respondent will easily find the one that suits him.

The study of the distributions of answers shows that the respondents agree with contradictory (from the point of view of the initial hypothesis) judgments. For example, on the “B” scale, 42 people out of 100 simultaneously agreed with judgments 13 and 12, that is, with two opposite judgments.

The presence of contradictory judgments in the answers on the B scale leads to the need to consider the scale unacceptable.
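The consistency check used in this example can be sketched as follows; the respondent data and judgment numbers below are hypothetical, and the code simply counts respondents who agree with both members of a pair of judgments that the scale design treats as opposites.

```python
# Each respondent is represented by the set of judgment numbers agreed with.
respondents = [
    {13, 12}, {13}, {12}, {13, 12}, {7, 12},
]
opposite_pair = (13, 12)  # two judgments intended to be mutually exclusive

# Count respondents who agree with both opposed judgments at once.
contradictory = sum(
    1 for answers in respondents
    if opposite_pair[0] in answers and opposite_pair[1] in answers
)
share = contradictory / len(respondents)
print(contradictory, share)  # a large share signals an unacceptable scale
```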

This approach to improving the reliability of a scale is very laborious. It can therefore be recommended only for developing critical tests or instruments intended for mass use or panel studies.

It is possible to test one method on several respondents. If the method is reliable, then different respondents will give the same information, but if their results agree poorly, then either the measurements are unreliable, or the results of individual respondents cannot be considered equivalent. In the latter case, it must be established whether any group of results can be considered more reliable. The solution of this problem is all the more important if it is assumed that obtaining information by any of the considered methods is equally admissible.

The use of parallel methods for measuring the same property faces a number of difficulties.

First, it is not clear to what extent both methods measure the same quality of an object, and, as a rule, there are no formal criteria for testing such a hypothesis. Therefore, it is necessary to resort to a meaningful (logical-theoretical) substantiation of a particular method.

Second, if parallel procedures are found to measure a common property (the data do not differ significantly), the question of the theoretical justification for applying these procedures remains.

It must be admitted that the very principle of using parallel procedures turns out to be not a formal, but rather a substantive principle, the application of which is very difficult to justify theoretically.

One of the widely used approaches to establishing validity is the use of so-called judges, or experts. Researchers approach a specific group of people with a request to act as competent persons. They are offered a set of features intended to measure the object under study and are asked to evaluate the correctness of attributing each feature to this object. Joint processing of the judges' opinions makes it possible to assign weights to the features or, equivalently, scale marks in the measurement of the object under study. A set of features may consist of a list of individual judgments, characteristics of an object, and the like.

Judging procedures are varied. They can be based on methods of paired comparisons, ranking, sequential intervals, etc.

The question of who should be considered judges is quite debatable. Judges chosen as representatives of the studied population must in one way or another constitute its micromodel: from the judges' assessments the researcher determines how adequately particular points of the survey procedure will be interpreted by the respondents.

However, when selecting judges, a hard-to-answer question arises: what influence do the judges' own attitudes have on their assessments? These attitudes can differ significantly from the attitudes of the subjects toward the same object.

In general terms, the solution to the problem is: a) to carefully analyze the composition of the judges with respect to how adequately their life experience and social status match the corresponding indicators of the surveyed population; b) to reveal the effect of individual deviations in the judges' scores relative to the overall distribution of scores. Finally, not only the quality but also the size of the sample of judges should be assessed.

On the one hand, this number is determined by consistency: if the consistency of the judges' opinions is high enough and, accordingly, the measurement error is small, the number of judges can be small. One sets the value of the allowable error and, on that basis, calculates the required sample size.
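One standard way to turn an allowable error into a required sample size is the usual margin-of-error formula n = (z·σ/e)²; this is an assumption on our part, since the text does not fix a formula, but it illustrates the stated relationship between judge consistency and the number of judges needed.

```python
import math

# Required number of judges for a desired margin of error e, at confidence
# level z, given spread sigma of the judges' scores: n = (z * sigma / e)^2.
# (A standard sample-size formula, used here as an illustrative assumption.)
def required_judges(sigma, allowable_error, z=1.96):
    return math.ceil((z * sigma / allowable_error) ** 2)

# The more consistent the judges (smaller sigma), the fewer are needed.
n_consistent = required_judges(sigma=0.5, allowable_error=0.5)
n_dispersed = required_judges(sigma=1.5, allowable_error=0.5)
print(n_consistent, n_dispersed)
```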

If complete uncertainty of the object is detected, i.e., when the judges' opinions are distributed evenly across all assessment categories, no increase in the size of the judge sample will save the situation or bring the object out of its state of uncertainty.

If the object is sufficiently indefinite, then a large number of gradations will only introduce additional interference into the work of the judges and will not bring more accurate information. It is necessary to reveal the stability of the judges' opinions with the help of a repeated test and, accordingly, to narrow the number of gradations.

The choice of one or another particular method, method or technique of testing for validity depends on many circumstances.

First of all, it should be clearly established whether any significant deviations from the planned measurement program are possible. If the research program sets a rigid framework, not one, but several methods of testing the data for validity should be used.

Second, it must be kept in mind that the levels of stability and validity of data are closely related. Information that is unstable, and thus already insufficiently reliable by that criterion, does not call for too strict a validity check. Sufficient stability should be ensured first, and only then should measures be taken to clarify the boundaries of data interpretation (i.e., to identify the level of validity).

Numerous experiments on identifying the level of reliability suggest that, when measurement instruments are tested for reliability, the following sequence of main stages of work is advisable:

a) Preliminary control of the validity of methods for measuring primary data at the stage of developing the methodology. Here it is checked how much the information corresponds to its purpose in essence and what are the limits of the subsequent interpretation of the data. For this purpose, small samples of 10–20 observations are sufficient, followed by adjustment of the method structure.

b) The second stage is the piloting of the methodology and a thorough check of the stability of the initial data, especially the selected indicators and scales. At this stage, a sample is needed that represents a micromodel of the real population of the subjects.

c) During the same pilot study, all operations related to checking the level of validity are carried out. The results of the pilot data analysis lead to improvement of the methodology, to refinement of all its details and, as a result, to the final version of the methodology for the main study.

d) At the beginning of the main study, it is desirable to carry out a stability test of the method used in order to calculate accurate indicators of its stability. The subsequent clarification of the boundaries of validity runs through the entire analysis of the results of the study itself.

Regardless of the reliability assessment method used, the researcher has four successive steps to improve the reliability of measurement results.

First, in the case of extremely low reliability of measurements, some questions are simply dropped from the questionnaire, especially when the degree of reliability can be determined during the development of the questionnaire.

Secondly, the researcher can “collapse” the scales and use fewer gradations. For example, the Likert scale in this case might include only the gradations “agree,” “disagree,” and “have no opinion.” This is usually done when the first step has been completed and the survey has already been carried out.

Third, as an alternative to the second step or as a follow-up to it, reliability is assessed respondent by respondent. Say, a direct comparison is made between respondents' answers on initial and repeated testing, or against some equivalent answer. Answers from unreliable respondents are simply excluded from the final analysis. Obviously, if this approach is used without an objective assessment of respondent reliability, then by throwing out “inconvenient” answers the results of the study can be fitted to the desired ones.

Finally, after the first three steps have been used, the level of reliability of the measurements can be assessed. Usually, the reliability of measurements is characterized by a coefficient that varies from zero to one, where one characterizes the maximum reliability.

It is generally believed that the minimum acceptable level of reliability is characterized by the figures 0.65-0.70, especially if the measurements were taken for the first time.
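A coefficient of this kind can be computed in several ways; a common choice is Cronbach's alpha (an assumption here, since the text does not name a specific coefficient). The sketch below computes it on hypothetical Likert data and compares it with the 0.65-0.70 threshold mentioned above.

```python
# Cronbach's alpha for a set of items measuring the same construct.
# items: one list of scores per question, aligned by respondent.
def cronbach_alpha(items):
    k = len(items)                  # number of items
    n = len(items[0])               # number of respondents

    def var(xs):                    # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = sum(var(item) for item in items)
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - item_vars / var(totals))

# Hypothetical data: three Likert items answered by five respondents.
items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 2, 4],
]
alpha = cronbach_alpha(items)
print(round(alpha, 2))  # above the 0.65-0.70 minimum mentioned in the text
```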

Obviously, in the course of the varied and numerous marketing studies conducted by different firms, measurement scales and the methods of applying them have been progressively adapted to the goals and objectives of specific marketing research. This makes the problems discussed in this section easier to solve, yet their solution remains essential when conducting original marketing research.

Validity of measurements characterizes completely different aspects than reliability. A measurement may be reliable but not valid. Validity characterizes the accuracy of measurements relative to what exists in reality. For example, a respondent whose annual income is less than $25,000 was asked about it. Not wanting to give the interviewer the true figure, the respondent stated an income of “more than $100,000.” On retesting he named the same figure, demonstrating a high level of measurement reliability. Lying is not the only reason for a low level of measurement validity; poor memory, the respondent's poor knowledge of the facts, and so on can also be named.

Let us consider another example illustrating the difference between reliability and validity. Even a watch that runs inaccurately will show the same reading at the same hour every day, demonstrating high reliability. Yet it may run very inaccurately, i.e., the time it displays will be invalid.

The main way of checking the validity of measurements is to obtain information from several sources. This can be done in different ways; the following should be noted first of all.

We must strive to formulate questions in such a way that their wording contributes to obtaining reliable answers. Further, questions related to each other can be included in the questionnaire.

For example, the questionnaire asks to what extent the respondent likes a certain food product of a certain brand, and then asks how much of this product the respondent bought in the last month. The latter question serves to check the validity of the answer to the first.

Often two different methods or sources of information are used to assess the validity of measurements. For example, after questionnaires have been completed in writing, a number of respondents from the initial sample are asked the same questions again by telephone. Validity is judged by the similarity of the answers.

Sometimes two samples of respondents are formed on the basis of the same requirements, and the degree of validity is assessed by comparing their answers.

Review questions:

  1. What is measurement?
  2. How is objective measurement different from subjective measurement?
  3. Describe four scale characteristics.
  4. Define the four types of scales and indicate the types of information contained in each of them.
  5. What are the arguments for and against the use of neutral gradation in a symmetrical scale?
  6. What is the Modified Likert Scale and how does the Lifestyle Scale and the Semantic Differential Scale relate to it?
  7. What is the "halo effect" and how should the researcher control it?
  8. What components determine the content of the concept of "measurement reliability"?
  9. What are the disadvantages of the measurement scale used?
  10. What methods for assessing the stability of measurements do you know?
  11. What approaches to assessing the level of validity of measurements do you know?
  12. How is the reliability of a measurement different from its validity?
  13. When should a researcher evaluate the reliability and validity of a measurement?
  14. Suppose you are a market researcher and you are approached by the owner of a private grocery store with a request to create a positive image for the store. Design a semantic differential scale to measure the relevant metrics of a given store's image. When performing this work, you must do the following:
    a. Conduct a brainstorming session to identify a set of measurable indicators.
    b. Find relevant bipolar definitions.
    c. Determine the number of gradations on the scale.
    d. Select a method for controlling the halo effect.
  15. Design a measurement scale (justify the choice of the scale, the number of gradations, the presence or absence of a neutral point or gradation; think about whether you are measuring what you planned to measure) for the following tasks:
    a. A toy company wants to know how preschoolers react to the Sing With Us video game, in which the child has to sing along with cartoon characters.
    b. A dairy company is testing five new yoghurt flavors and wants to know how consumers rate these flavors in terms of their sweetness, pleasantness and richness.


The semantic differential (English: semantic differential) is a method of constructing individual or group semantic spaces (English: semantic space). An object's coordinates in the semantic space are its ratings on a number of bipolar graduated (three-, five-, or seven-point) rating scales (English: rating scale), whose opposite poles are defined by verbal antonyms. These scales were selected from a variety of trial scales by factor-analysis methods.

The semantic differential method was introduced into psychological research by Charles E. Osgood in 1952. Osgood substantiated the use of three basic seven-point rating scales: evaluation, potency (strength), and activity.

In the narrow sense, the term semantic differential also denotes the bipolar graduated rating scale used in the semantic differential method.

Calculation of the distance indicator between features in the semantic differential.
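The distance indicator between objects rated on the same set of semantic-differential scales is commonly computed as the Euclidean distance between their rating profiles (Osgood's D measure); a minimal sketch with hypothetical 7-point ratings:

```python
import math

# Osgood's distance D between two objects: the Euclidean distance
# between their profiles of ratings on the same bipolar scales.
def sd_distance(profile_a, profile_b):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))

# Hypothetical 7-point ratings of two brands on four bipolar scales.
brand_x = [6, 5, 3, 4]
brand_y = [2, 5, 6, 4]
d = sd_distance(brand_x, brand_y)
print(d)  # smaller D means the objects are perceived as more similar
```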

The semantic differential, as a self-report method in which the researcher obtains the information he needs from the respondent's own words, has the following characteristics:

1) Closedness (boundedness): the value of a feature is assessed on a given scale; the space between the opposite values is perceived by the subject as a continuum of gradations of the feature's intensity.

2) Directedness (controllability): associations about the given objects are directed, being evaluated by respondents on a set of scales representing those characteristics of the object of study that interest the sociologist and that he considers important for studying a particular problem.

3) Evaluation: the semantic differential procedure includes the respondent as the subject of assessment, the object of assessment, and the scale as the assessment instrument.

4) Scaling: the method yields information about how strongly the object exhibits certain qualities, given by a set of scales.

5) Projection: the method rests on the assumption that the object being evaluated acquires significance for the respondent not only because of its objective content, but also for reasons connected with the respondent's personal attitude toward the object of study.

6) Mass character: the method can be used in mass surveys.

7) Standardization: respondents are presented with the same instructions, evaluation objects, and scales.

Possibilities and limits of application of projective techniques

Projective methods are aimed at measuring personality traits and features of intelligence. They have a number of features that distinguish them significantly from standardized methods, namely:

Features of the stimulus material. A distinctive feature of the stimulus material of projective methods is its ambiguity, indefiniteness, and lack of structure, which is a necessary condition for the projection principle to operate. In the process of the personality's interaction with the stimulus material, the material is structured, and in the course of this structuring the personality projects features of its inner world: needs, conflicts, anxiety, and so on.

Features of the task assigned to the respondent. A relatively unstructured task that allows an unlimited variety of possible answers is one of the main features of projective techniques. Testing with projective techniques is disguised testing, since the respondent cannot guess what exactly in his answer will be the subject of the experimenter's interpretation. Projective methods are less susceptible to falsification than questionnaires based on information about the individual.

Features of the processing and interpretation of results. Projective methods face a standardization problem. Some of them provide no mathematical apparatus for objective processing of the results obtained and contain no norms. These methods are characterized above all by a qualitative rather than quantitative approach to the study of personality, unlike psychometric tests, and so adequate methods for checking their reliability and establishing their validity have not yet been developed. For some techniques, parallel forms have been developed (the Holtzman Inkblot Technique) as an example of solving the reliability problem, and there are approaches to solving the problem of the validity of projective methods. For a more precise study, data obtained with projective methods should be correlated with data obtained by other methods.

Focus group procedure

The main parameters of a focus group study, such as the number of participants, their social characteristics, the number of groups, etc., are determined by two factors: the general methodological requirements for conducting group interviews and the requirements arising from the objectives of the study. Let's start with the first and then move on to the second.

Number of participants

This issue is considered well established and has a long history. The key formulation, from which later authors have not departed in principle, was given by R. Merton and his co-authors: “The size of the group should, obviously, be determined by two considerations. It should not be so large as to be unmanageable or to prevent adequate participation by most members, and it should not be so small that it fails to provide substantially greater coverage than an interview with one person. Experience shows that these requirements are best satisfied by a group of 10-12 people.” Whatever the purpose, the group should not be enlarged to the point where most of its members are merely an audience for the few who have the opportunity to speak.

It seems that over the past decades researchers' opinions about the optimal number of participants have trended downward. Today most authors put the most appropriate group size at 8-10 people [b3, b5, etc.]. This may not be the limit, since some authors advocate further reducing the number of participants to 6-8 people [b8, 262, etc.]. Organizing a discussion with 12 participants is now considered excessive by many researchers: conducting groups with that many respondents is possible but difficult, and the moderator is no longer able to involve everyone in the discussion. In two short hours that many respondents cannot get used to interacting with so many new acquaintances, and the moderator likewise cannot give everyone due attention. This is all the more true of groups with an even larger number of participants.

It should be noted that the question of the optimal group size remains controversial. Its solution largely depends on the personal style and qualifications of the moderator. Thus, D. Templeton insists on 10-12 respondents, but reading her work one gets the feeling that as a moderator she has an outstanding, and therefore rare, talent, which makes her experience difficult to replicate. We will return to the relationship between the number of respondents and the moderator's personal style of work in later chapters.

The minimum number of participants at which the specific effects of group discussion can still manifest themselves to some degree is, according to various authors, 4-5 people. A typical opinion on this issue reads as follows:

"A group interview can be conducted with a minimum number of participants equal to five, since this number can still adequately represent the range of opinions to some extent and create general interaction. If there are less than five people in a group, we try to collect information as much as possible, but in principle we try to refuse to hold such groups" [b3].

In describing the balance of factors that determine the optimal group size within the above limits, it should first of all be noted that the properties of groups depend very strongly on their size: a difference of 1-2 people has a very noticeable effect on group dynamics.

An increase in the number of participants beyond 10-12 leads, as already noted, to a decrease in controllability, which manifests itself mainly in two ways: either a passive audience forms and the exchange of remarks takes place among a small number of people who have seized the initiative, or the general discussion breaks up into several private ones conducted between neighbors at the table. In the first case a sample shift occurs, because the opinion of the “silent majority” goes unrepresented, while the set of active participants may constitute a very specific contingent. In the second case, uncontrolled private discussions quickly drift away from the given topic, and recording the discussion becomes technically impossible.

The growth of anarchy characteristic of large groups can be restrained to some extent by tightening the leadership of the discussion and introducing stricter discussion rules. In this case the number of participants can be increased, perhaps even to 15-20 people [Merton], but the group character of the discussion is inevitably lost: the group interview turns into a series of individual, and moreover highly structured, interviews (the latter because of the very small amount of time allotted to each respondent). Such interviews can apparently no longer be called focus groups.

The loss of effectiveness of group discussion as the number of participants decreases is harder to describe. The reduction in the number of opinions offered, which accompanies a reduction in the number of respondents, cannot be considered the main argument, since increasing the number has a natural limit. From our point of view, two factors are the most important. First, respondents should feel a certain shortage of the time allotted for speaking. An acute shortage of time demoralizes, but a moderate one mobilizes: it requires speaking succinctly and only to the point, without indulging in lengthy digressions. In addition, the pressure on the time budget felt by everyone helps the moderator stop those who would like to usurp the right to speak. Second, a decrease in the number of respondents also decreases the number of opinions presented, which impoverishes the discussion and increases the likelihood that it will follow some atypical path that does not adequately model the dynamics of opinion in society. Apparently these are the effects T. Trynbaum has in mind when he points out that members of small groups tend unconsciously to act as experts rather than as average consumers reporting their personal experience. This situation is not constructive, because the average consumer cannot play the role of an expert. D. Morgan adds that small groups are very sensitive to the dynamics of interaction between participants, since individually colored personal relationships (likes, dislikes, etc.) form faster and act much more strongly in small groups than in large ones.

The most appropriate group size depends on the moderator's level of experience and partly on his personal working style. An inexperienced moderator should not aim for the maximum number of participants. In our view, the optimal size of a focus group for a novice moderator is 7 people: this number is optimal in terms of ease of managing the discussion. With 8-10 participants the difficulties grow, but so does the useful effect that an experienced moderator can obtain. With 6 or fewer participants, difficulties of a different kind increase (lengthy monologues, deviation from the topic, a flagging discussion, etc.), which the moderator must also cope with.

SEMANTIC DIFFERENTIAL SCALE

A self-report technique for assessing attitudes in which participants are asked to mark, on a set of scales bounded by polarized adjectives or phrases, the positions that best describe their feelings toward an object.

One of the most popular techniques for measuring attitudes in marketing research is the semantic differential scale.

It has been found to be particularly useful in researching the image of a corporation, brand or product.

This scale grew out of research by Charles Osgood and his colleagues at the University of Illinois into the hidden structure of words. The technique has, however, since been adapted to make it suitable for measuring expectations.

The original semantic differential scale consisted of a large number of bipolar adjective pairs used to determine people's reactions to an object of interest. Osgood found that most reactions can be grouped into three main dimensions: (1) the evaluation dimension, represented by adjective pairs such as bad-good, sweet-sour, useful-useless; (2) the potency dimension, represented by pairs such as mighty-helpless, strong-weak, deep-shallow; (3) the activity dimension, represented by pairs such as fast-slow, alive-dead, quiet-noisy. These three dimensions tended to emerge regardless of the object being evaluated. The generally accepted rule when building a semantic differential scale was therefore to choose a suitable sample of the accepted or basic adjective pairs, so that the object could be assessed on each of the dimensions: evaluation, potency and activity. The object could then be compared with other objects using the scores obtained.

Market researchers took Osgood's general idea and adapted it to their own needs. First, instead of using the basic adjective pairs, they developed their own pairs for the objects of interest. These pairs were not always exact antonyms and did not always consist of only two words. In addition, researchers used whole phrases to label the ends of the scale, and some of these phrases expressed expectations attributed to the product. For example, one end of the scale could be labeled "worth the money" and the other "not worth the money." Second, rather than computing evaluation, potency and activity scores, marketers were more interested in building profiles of the brands, stores, companies, or whatever else was being compared, and in overall scores by which the objects could be compared. In this respect, the use of the semantic differential in marketing research followed the summated rating scale approach to scale construction.

Fig. 14.2 parallels Fig. 14.1 with respect to the characteristics used to describe the bank, but it is in semantic differential format. In Fig. 14.2 we have tried to express in bipolar phrases the characteristics that could describe the bank and thus serve as a basis for forming attitudes, in place of positive and negative statements. Note that the negative phrases appear sometimes on the right side of the list and sometimes on the left. This is done to prevent a respondent with a positive attitude from simply ticking the right-hand or left-hand phrase every time without carefully reading its content.

Fig. 14.2. An example of a semantic differential scale form

Service is impolite :-:-:-:-:-:-:-: Service is polite

Convenient location :-:-:-:-:-:-:-: Location is inconvenient

Opening hours are inconvenient :-:-:-:-:-:-:-: Convenient opening hours

Loan rates are high :-:-:-:-:-:-:-: Loan rates are low

Such a scale can be used in a survey. Each respondent is asked to read the entire set of polar phrases and mark the positions that best describe his feelings about the object of interest. Typically, respondents are instructed to treat the end positions of the scale as extreme descriptions of the object, the central position as neutral, and the intermediate positions as slightly or quite characteristic of the object. For example, if a survey participant feels that service at Bank A is polite, but only to a rather modest extent, he will mark the sixth position on the scale, reading from left to right.

A person may be asked to rate several banks using the same scale. When several banks are evaluated, their individual profiles can be compared. For example, Fig. 14.3 (sometimes called a snake diagram because of its shape) shows that Bank A is seen as more polite, more conveniently located, and offering loans at lower interest rates, but with less convenient opening hours, than Bank B. Note that in constructing these profiles all positive descriptions were placed on the right side; this practice makes the results much easier to interpret. The plotted scores represent the average score of all participants on each description item. The resulting profile gives a clear picture of how respondents perceive the differences between the two banks.
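The averaging behind such a profile comparison can be sketched in a few lines of code. The attribute names and ratings below are hypothetical illustrations, not data from the figure; positions are coded 1..7 with 7 at the positive pole.

```python
attributes = ["polite service", "convenient location",
              "convenient hours", "low loan rates"]

# Each inner list holds one respondent's ratings of the four attributes.
bank_a = [[6, 7, 3, 6], [5, 6, 2, 5], [7, 6, 3, 6]]
bank_b = [[4, 4, 5, 3], [5, 5, 6, 4], [4, 5, 5, 3]]

def profile(responses):
    """Mean score per attribute across all respondents, rounded to 2 places."""
    n = len(responses)
    return [round(sum(r[i] for r in responses) / n, 2)
            for i in range(len(responses[0]))]

# Plotted side by side, the two profiles would form the "snake diagram".
for name, data in (("Bank A", bank_a), ("Bank B", bank_b)):
    print(name, dict(zip(attributes, profile(data))))
```

With these sample data, Bank A scores higher on politeness, location, and loan rates but lower on opening hours, mirroring the pattern described for Fig. 14.3.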

The Likert scale is an assessment of some statement, most often on a symmetrical, usually five-point, scale with the values:

1) unconditionally agree;

2) rather agree;

3) agree and disagree equally;

4) rather disagree;

5) definitely disagree.

Respondents evaluate their attitude toward statements such as:

o "This store sells high quality products."

o "This store has poor customer service."

o "I love shopping at this store."

The semantic differential is a method of constructing individual or group semantic spaces. The coordinates of an object in the semantic space are its scores on a number of bipolar graded (three-, five-, or seven-point) rating scales whose opposite poles are defined by verbal antonyms. These scales were selected from a variety of trial scales using factor analysis.

Osgood justified the use of three basic seven-point rating scales:

evaluation: good 3 2 1 0 -1 -2 -3 bad

potency: strong 3 2 1 0 -1 -2 -3 weak

activity: active 3 2 1 0 -1 -2 -3 passive

The semantic differential (in the narrow sense) is also the name of the bipolar graduated rating scale used in the semantic differential method.
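The two codings in common use, positions 1..7 and values -3..+3, differ only by a shift of the neutral midpoint. A minimal sketch of the conversion:

```python
def to_bipolar(position):
    """Convert a 1..7 scale position to the equivalent -3..+3 coding.

    Position 4 is the neutral midpoint in both schemes, so the mapping
    is a simple shift by 4.
    """
    if not 1 <= position <= 7:
        raise ValueError("position must be between 1 and 7")
    return position - 4

print([to_bipolar(p) for p in range(1, 8)])  # [-3, -2, -1, 0, 1, 2, 3]
```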

8. Bogardus scale. Guttman scale.

Bogardus social distance scale

It is one of the first methods developed to measure attitudes toward racial and ethnic groups. The scale rests on the following fundamental assumption: the greater the prejudice an individual feels toward a particular group, the less he wants to interact with its members. Items are formulated in terms of inclusion or exclusion. "Would you be willing to have an X as a spouse?" is an inclusive item; "Would you ban all Y from coming to America?" is an exclusive item.

Ethnic (interethnic) tolerance is a tolerant or positive attitude toward people of another nationality or race.

Ethnic xenophobia is a negative attitude, fear, or hatred toward representatives of another nationality or race.

Guttman scale. The main assumption is that the attitude is a one-dimensional construct, and that the respondent's answers expressing it should, on a good scale, show a certain coherence and hierarchy. According to Guttman, this means that a respondent with a given attitude accepts (agrees with) some statements and rejects others. The statements thus form an ordered set: from those accepted by most people to those accepted by only a few. The method is based on the principle of homogeneity, and the scale itself is cumulative: the items are formulated and ordered so that a respondent's choice of any one of them implies automatic agreement with all items of lower rank. Measuring an attitude consists in the respondent indicating those statements on the scale that he can accept, using only dichotomous answers ("yes - no" or "agree - disagree"). The attitude score is the corresponding class (point) of the scale. Thus, if the respondent's final score is known, his answers to all the statements can be predicted.
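The cumulative property just described can be illustrated in a few lines: on a perfect Guttman scale a respondent's "yes" answers form an unbroken run over the easiest items, and the total score alone reconstructs the whole answer pattern. The functions below are an illustrative sketch, not a full scalogram analysis (which would also compute a reproducibility coefficient over real data).

```python
def is_perfect_pattern(answers):
    """True if the 'yes' (1) answers form an unbroken prefix over the items,
    which are assumed to be ordered from easiest to hardest to endorse."""
    return all(a >= b for a, b in zip(answers, answers[1:]))

def predict_answers(score, n_items):
    """Reconstruct the answer pattern implied by a total score: agreement
    with the `score` easiest items and disagreement with the rest."""
    return [1] * score + [0] * (n_items - score)

print(is_perfect_pattern([1, 1, 1, 0]))  # a scalable respondent
print(is_perfect_pattern([1, 0, 1, 0]))  # violates cumulativity
print(predict_answers(2, 4))             # [1, 1, 0, 0]
```

In practice some respondents deviate from perfect patterns; the proportion of such deviations is what the reproducibility coefficient summarizes.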

Semantic differential

Instruction

In this part of the study, you indicate what each department store means to you by rating it on a series of scales bounded by opposite adjectives. Using an "X", mark the place on each scale between the opposite adjectives that best describes your opinion of the store.

Please make marks on each scale without missing any.

The form

Sears is:

Powerful:-:-:-:-:X:-:-: Weak

Unreliable:-:-:-:-:-:X:-: Reliable

Modern:-:-:-:-:-:-:X: Old fashioned

Cold:-:-:-:-:-:X:-: Warm

Caring:-:X:-:-:-:-:-: Indifferent

Respondents make marks on the scales in the places that best reflect their opinion of the object being evaluated. Thus, in our example Sears was rated as weak, reliable, very old-fashioned, warm, and caring. A negative adjective or phrase may appear on either the right or the left of the scale. This controls the tendency of some respondents, namely those who feel very positively or very negatively about the object in question, to mark only the right or left side without reading the item descriptions. The author has previously described methods for selecting scale categories and compiling a semantic differential scale; based on that material, the author developed a semantic differential scale for measuring perceptions of people and products (Box 9.2, "Marketing Research Practice").

Individual semantic differential items can take values from -3 to +3 or from 1 to 7. The data obtained are usually analyzed by profile analysis: mean or median values are calculated for each rating scale and then compared by plotting or by statistical analysis. This helps to determine the overall similarities and differences between objects. To assess differences between segments of respondents, the researcher compares the average responses of the different segments. Although the mean is often used as a summary statistic, whether the data may be treated as interval-scaled remains debatable. On the other hand, when the researcher needs an overall comparison of objects, for example when determining store preferences, the scores on the individual items are summed to obtain an overall score for the object.
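The summation step described above requires reverse-coding the items whose positive pole lies on the left, so that a higher number always means a more favorable impression. A hedged sketch, with illustrative item names, polarities, and responses:

```python
SCALE_MAX = 7  # seven-point items, positions coded 1..7 left to right

# (item, True if the positive pole is printed on the left, so that low raw
# positions are favorable and the score must be flipped before summing)
items = [("powerful-weak", True),
         ("unreliable-reliable", False),
         ("modern-old-fashioned", True),
         ("cold-warm", False)]

def overall_score(raw):
    """Sum of item scores after reverse-coding the left-positive items."""
    total = 0
    for (_, flip), r in zip(items, raw):
        total += (SCALE_MAX + 1 - r) if flip else r
    return total

# A respondent who marks near the favorable pole of every item:
print(overall_score([2, 6, 1, 6]))
```

With four seven-point items the overall score ranges from 4 (uniformly unfavorable) to 28 (uniformly favorable), and the scores of several stores or brands can then be compared directly.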

Box 9.2. Marketing research practice

Semantic differential scale for measuring perceptions of people and products

1. Coarse:-:-:-:-:-:-:-: Fine

2. Delightful:-:-:-:-:-:-:-: Calm

3. Uncomfortable:-:-:-:-:-:-:-: Comfortable

4. Dominant:-:-:-:-:-:-:-: Minor

5. Thrifty:-:-:-:-:-:-:-: Wasteful

6. Pleasant:-:-:-:-:-:-:-: Unpleasant

7. Modern:-:-:-:-:-:-:-: Not modern

8. Organized:-:-:-:-:-:-:-: Unorganized

9. Rational:-:-:-:-:-:-:-: Emotional

10. Early:-:-:-:-:-:-:-: Mature

11. Formal:-:-:-:-:-:-:-: Informal

12. Conservative:-:-:-:-:-:-:-: Liberal

13. Complex:-:-:-:-:-:-:-: Simple

14. Colorless:-:-:-:-:-:-:-: Colorful

15. Humble:-:-:-:-:-:-:-: Vain

The versatility of the semantic differential scale has made it very popular in marketing research. It is widely used to compare brands, products, and company images, to develop advertising and promotion strategies, and to develop new products. There are several variants of the basic scale.