SAMPLING STRATEGIES
WHERE WE'VE BEEN
We've discussed how to operationally define outcome variables, how to develop and administer various data collection processes to measure variables so that reliable and valid results can be obtained, and how to report and interpret the results of measurement processes.
WHERE WE'RE GOING NOW
We are going to discuss how to use sampling strategies to collect reliable and valid data about a large group by sampling the characteristics of a smaller group. We'll discuss various types of sampling processes and determine how large a sample should be to lead to sensible inferences.
CHAPTER PREVIEW
The preceding chapters have described strategies for operationally defining variables and developing measurement processes to assess them. In most cases, when we assess variables, we identify the group of people whose abilities, feelings, or attitudes are of interest to us and then measure the performance of each individual within that group. However, when the number of people becomes very large or measurement strategies become complex, it is more practical to use the performance of a subgroup to estimate the performance of the entire group. This estimate can be obtained using the sampling procedures discussed in this chapter. After reading this chapter, you should be able to
As every baseball fan knows, the only really perfect way to find out how the 700 major league baseball players feel about the designated hitter rule is to ask all 700 of them for their opinions. Only then can we conclude with perfect accuracy, for example, that 84% (but only 63% of the nonpitchers) are opposed to that rule. Even here, however, our conclusions would be inaccurate if there were anything ambiguous about our questioning technique or about the responses of the players or if we tried to generalize about next year's players. But what if we wanted to know the answer to this question but could not afford (because of lack of time or money) to contact all 700 players? Obviously, if we asked some of the players, we would have a better idea about their opinions than if we asked none of them. If we asked 10 players, it seems obvious that we could be more confident in our conclusions than if we asked only 2. Likewise, 50 would probably be better than 10. In addition, if these 50 players were evenly split between the National and American leagues, would that not make our conclusions more convincing? And certainly it would be better to have players from several different positions, rather than all pitchers or all designated hitters. How big does the group of players have to be before we can make useful decisions? How can we be confident of getting a representative sample of all the major league players? What does "better" mean when we talk about finding players to whom we can direct our questions?
The problems mentioned in the preceding paragraph focus on sampling. The term sampling refers to strategies that enable us to pick a subgroup from a larger group and then use this subgroup as a basis for making inferences about the larger group - the researcher's goal is always to generalize about the population based on observations of the sample. Sampling strategies not only make it possible to collect data from a smaller number of respondents but also make it possible to go into greater depth with this smaller number - by asking more and deeper questions or by following up the structured questions with more open-ended or qualitative questions (see chapter 9) than would be possible with a larger group of respondents. When we use such a subgroup to make decisions about the larger group, the subgroup must be as representative of the larger group as possible.
In discussing sampling procedures, the term sample refers to the subgroup, and the term population refers to the entire group from which the sample was drawn. In the example at the beginning of this chapter, the 700 major league baseball players are the population (assuming that this is the entire group we are talking about), whereas the smaller group of 50 players to whom we might address our questions would be a sample. The terms sample and population will be used in this way throughout the chapter.
There are many situations in which sampling is not necessary at all. If we want to know how many of the 26 graduate students in our educational research class plan to enroll in the advanced class next semester, the best way to find out would be to ask all the students in the class this question. Likewise, if a third-grade teacher wanted to know how many of her students could add three-digit numbers, the best way to answer this question would be to administer to all the students a valid test involving three-digit addition.
Sampling techniques are useful when we want to know how a large group would be described with regard to several variables, but there would be major added costs, narrow restrictions on the number of questions that could be asked, or some other difficulty in administering the data collection procedure to every member of the target population. If we wanted to find out how many graduate students in the entire university were interested in taking a certain advanced course next semester, it would be difficult and expensive to ask every graduate student this question. Even if we had a list of all their names, it would be difficult to get them all to reply to a questionnaire. Finally, we would probably have time to ask only one question, and it would be unlikely that we would be able to uncover the qualitative reasons behind interest or lack of interest in the course. Similarly, if we wanted to know how many third-grade children throughout the entire United States can add two-digit numbers, it would be far more practical to administer our test to a sample of third graders than to test every third grader in the nation.
The following are a few examples of questions related to education where sampling would be helpful in finding the answers:
In many of the preceding cases, sampling would be helpful simply because there are a large number of persons in the target population. In other cases, there would be a small number of respondents, but the amount of time needed to collect the data from each person would make it desirable to deal with a smaller group. In the second example, for instance, by dealing with a smaller number of freshman students, a single teacher might be able to find the time necessary to discover how many of them use Piaget's formal operational skills. Likewise, in the sixth example, by dealing with a smaller number of teachers, the researcher could spend time with every teacher in the sample to make sure that each one clearly understood the incentive system before expressing an opinion about it.
There are several different types of samples that could be drawn from a population. For instance, the PTA survey in the first example could be administered to any of the following samples:
Each of the preceding approaches has a fundamental weakness, and the rest of this chapter will make these flaws clear. You will see that the quality of a sampling strategy depends on how the sample is drawn (including the response rate) and how many persons are in the sample. These factors will be discussed in the following sections. As you proceed with this chapter, keep in mind that the sampling strategy is not the only factor that contributes to the validity and usefulness of inferences drawn from a sample of a population. It is also essential that the data collection process (the questionnaire, the interviewing procedure, the observational strategy, etc.) be valid. A solid sampling strategy can be rendered useless by a measurement process that is invalid or lacking in depth.
The manner in which a sample is drawn is an important factor in determining how useful the sample will be for making inferences about the population from which it is drawn. It is quite possible to have a very large sample upon which no sound decision can be based. This happens when the respondents in the sample are not really similar to the population about which we want to make generalizations. For example, it is not at all uncommon for magazines to report the results of surveys based on the responses of thousands of readers. A close examination of such surveys often reveals that the results are far less useful than if they were based on 25-50 representative respondents rather than the reported thousands. To be useful, the sample must be representative of the population about which we wish to make generalizations. To provide useful information, magazines should tell their readers the population to which the survey results can be generalized, but few popular magazines do this. (Their reluctance to do so is probably based on their awareness of how bad the surveys really are.)
Random sampling is generally the best and simplest way to draw a sample from a population. With random sampling, every member of the population has an equal opportunity to be included in the sample, and pure chance is the only factor that determines who actually goes into the sample. Keeping this definition in mind, which of the following is an example of random sampling?
The answer is that none of the above is an example of random sampling, because in each case some factor in addition to chance includes or excludes respondents who could be part of the sample. If the population about which we wish to generalize comprises all adult citizens who are eligible to vote in the city, then we cannot use any of the preceding examples to make completely valid generalizations about this population.
The following are examples of random selection of a sample of the adult citizens who are eligible to vote in the city:
In both of these cases, chance is the only factor that determines who will be selected for the sample.
To select a strictly random sample, it is essential to have a complete list of all the members of the population. This is often difficult to accomplish. For instance, in the example under consideration, if the population about which we wish to generalize includes eligible voters who have not registered as well as registered voters, a complete list would be hard to obtain, because it may be difficult to get a list of those who have not registered to vote. On the other hand, if we decided to limit our generalizations to registered voters, then a list would be more readily accessible, although we would have to be aware that our inferences would not necessarily apply to eligible voters who had not registered, voters who had become eligible since the last election, and voters who had recently moved into town. In addition, some voters who had been eligible in the last election might have become ineligible since that time. As we said, it is sometimes very difficult to obtain a complete list of the entire population.
When we deal with smaller, more clearly defined populations, the process of devising a comprehensive list is much simpler. For example, for each of the hypothetical voter populations described earlier, a list would be readily available or could easily be compiled. To draw a random sample, we would merely need to select names from the list at random.
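With a complete list in hand, selecting names at random is a one-line job on a computer. The following is a minimal Python sketch using the standard library's `random.sample`, which selects without replacement so that chance alone decides who is included. The helper name `simple_random_sample` and the voter list are hypothetical, made up for illustration.

```python
import random

def simple_random_sample(names, n, seed=None):
    """Return n names chosen at random, without replacement, from a complete list.

    Every name on the list has the same chance (n / len(names)) of being chosen.
    The optional seed only makes a particular draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(names, n)

# A made-up list standing in for, say, a city's registered voters.
voters = [f"voter_{i:05d}" for i in range(1, 20001)]
sample = simple_random_sample(voters, 200, seed=42)
print(len(sample), sample[:3])
```

Omitting the seed gives a fresh draw each time; recording it lets someone else reproduce exactly which voters were selected.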
Random sampling has such a clear advantage over most other methods with regard to the generalizations it permits that it should be used whenever possible. This means, for example, that if we have a choice of collecting responses from 50 persons selected at random or from 300 respondents selected through a nonrandom process, the small but random sample is preferred.
Biased sampling is the worst way to draw a sample. With this "method," we put together our sample by using naturally occurring or artificially constructed groups of subjects without the benefit of random selection. For example, if a professor wants to determine how many students would enroll in an experimental course, she could ask the 30 students taking her Introductory Educational Research course this question. If she tried to generalize beyond these 30 students, she would have a biased sample. There is no good reason to believe that these 30 students are typical of the 970 other graduate education students in her university. Likewise, if a researcher stood in a shopping mall and questioned whoever walked past, she would be getting a biased sample. How would she know that the people who walked past were representative of anyone about whom she would wish to generalize? And furthermore, what makes her think that the person who is willing to talk to her is similar to the person who looks down and avoids eye contact in order to evade being interviewed?
Using the Table of Random Numbers

There are 525 students who have taken graduate courses at the university during the past year. Ms. Jefferson has all their names and addresses in an AppleWorks (electronic) data base. She has decided to select a random sample of 100 of these students to survey. The electronic data base automatically assigns a number to each record (e.g., Janet Jones is record number 116 of 525). If the computer did not do this, Ms. Jefferson would simply have to assign a number to each student herself. The names and addresses happen to be stored in alphabetical order, but this is not essential. The main reason for listing the names in alphabetical order is that a systematic order makes it easier to keep records accurately (in this case, also to ensure that all the students are really listed).

Ms. Jefferson gets a printout of a table of random numbers like that in Appendix B of this book. However, there is a problem: the table has only two-digit numbers, and most of Ms. Jefferson's students have been assigned three-digit numbers. What can she do? She could use pairs of numbers. For example, she could select a two-digit number and the two-digit number to its right. By putting them together, she would have a four-digit number. She could then use only the last three digits of this combined number (ignoring the left-most digit). This is the strategy Ms. Jefferson will employ, and the following paragraph shows how it works.

Ms. Jefferson decides that she will place her finger in the table at random and then move downward. When she reaches the bottom of a column, her plan is to move to the top of the next column. Following this plan, she enters the table in the 11th line from the top of the third column, where she finds the number 72. Immediately to the right is 58. Combining these two numbers, she gets 7258. She drops the 7 and gets 258, and so she selects student number 258 from her data base. Going down the column, she comes to 73 and 57, and so she selects student 357 from the data base. Next, she finds 65 and 27. Since there are only 525 students in her population, there is no student number 527, and so she skips this number and continues down the column. She next finds 23 and 09 and selects student 309. She continues doing this until she has the 100 subjects she needs for her sample.

This strategy would work. However, since we know that Ms. Jefferson has access to a computer, there are two much more efficient ways to draw her sample: Either of these strategies would be equivalent to the use of the table of random numbers.

The table of random numbers can also be used to draw a systematic sample. For example, Ms. Jefferson could use the table to select a number between 1 and 525. She could then take every fifth student on the list, starting with that number, until she had 100 students. (When she reached the end of the list, she would simply "wrap around" and continue at the beginning.)
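The table-reading procedure in this example can be sketched in Python. This is a minimal illustration under stated assumptions, not a replacement for Appendix B: the printed table is stood in for by a list of columns of two-digit numbers (a simulated one for the full draw), positions are counted from zero (so the 11th line of the third column is row 10 of column 2), and the function name `draw_with_number_table` is hypothetical. The text does not say what to do when a student's number comes up a second time; the sketch assumes such repeats are skipped, as they would be in sampling without replacement.

```python
import random

def draw_with_number_table(table, start_col, start_row, population_size, sample_size):
    """Select record numbers from a table of two-digit random numbers.

    `table` is a list of columns, each a list of two-digit numbers (0-99).
    Each reading pairs the current number with the one immediately to its
    right, forms a four-digit number, and keeps only its last three digits.
    Reading moves down a column and continues at the top of the next column.
    """
    n_cols, n_rows = len(table), len(table[0])
    chosen = []
    col, row = start_col, start_row
    while len(chosen) < sample_size:
        if col + 1 >= n_cols:  # no column to the right left to pair with
            raise ValueError("table exhausted before the sample was complete")
        combined = table[col][row] * 100 + table[col + 1][row]  # 72 and 58 -> 7258
        candidate = combined % 1000                             # drop the 7 -> 258
        # Skip numbers with no matching student (e.g. 527 when there are
        # only 525, or 000) and students who have already been chosen.
        if 1 <= candidate <= population_size and candidate not in chosen:
            chosen.append(candidate)
        row += 1
        if row == n_rows:  # bottom of the column: go to the top of the next one
            col, row = col + 1, 0
    return chosen


# The four readings from the example, placed in the 11th line (index 10)
# onward of the third column (index 2) of an otherwise empty demo table.
demo = [[0] * 14 for _ in range(4)]
for r, (left, right) in enumerate([(72, 58), (73, 57), (65, 27), (23, 9)], start=10):
    demo[2][r], demo[3][r] = left, right
print(draw_with_number_table(demo, 2, 10, population_size=525, sample_size=3))
# [258, 357, 309]

# A full draw of 100 from 525, using a simulated 100-row, 10-column table.
rng = random.Random(7)
table = [[rng.randrange(100) for _ in range(100)] for _ in range(10)]
sample = draw_with_number_table(table, 2, 10, population_size=525, sample_size=100)
```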
The magazine surveys cited earlier are examples of biased sampling, and it is this bias that renders them useless. We know of a magazine that each month published the results of a "reader survey." The editors decided that they wanted a representative sample of readers each month, and so they decided to do something random. They reasoned that the expiration dates on their magazines occurred in a random manner, so with each renewal notice they sent a survey form. This way they were able to get what they presented as a representative sample of views on interesting issues to publish each month. The respondents simply enclosed the survey in the postage-paid envelope when sending in payment for the next year's subscription! The problem, of course, is that neither subscribers who did not renew nor new subscribers would be included in the sample. This may be an interesting sales method, but it is not a good technique for obtaining a representative sample of opinions on the topics covered in the questionnaires.
In recent years, television news programs have begun conducting surveys by instructing viewers with one opinion to call one number and those with the opposite opinion to call another. There is usually a "small charge" for making the phone call. (Some day we expect to see a survey that instructs us to dial one number if we value such surveys and another if we think they are a waste of time. The results would undoubtedly show that people overwhelmingly think these are valuable!) If you do not by this time see the problem of bias in such surveys, you should reread the chapter to this point. The newscasters sometimes have the integrity to point out that these are "not scientific surveys," but the situation would be more accurately stated if the newscaster prefaced the results by saying, "This just in from an unreliable and biased source...."
The textbook says that "if a professor wants to determine how many students would enroll in an experimental course, she could ask the 30 students taking her Introductory Educational Research course this question. If she tried to generalize beyond these 30 students, she would have a biased sample. There is no good reason to believe that these 30 students are typical of the 970 other graduate education students in her university." Our students ask, "Why not? How do you know the sample is biased?"

Our reply is that we don't know. That's the whole point. With random selection, we do know that chance is the only factor that can account for differences between the sample and the population from which it was drawn. And mathematicians have given us precise ways to estimate how large these differences are likely to be. Random selection does not guarantee an accurate sample. Rather, it lets us rule out all sources of bias other than random variability and gives us an estimate of the size of this variability. Short of collecting data from the entire population, a suitably large random sample gives the best estimate of population characteristics.

Yet our students still ask, "But isn't it possible that this professor's class of 30 students could be more representative of the entire population than a random sample of 30 from the whole school?"

And our final answer is this: Yes. Anything is possible, but our concern is what is probable. Basing her decisions on this group of 30 would obviously be better than putting an ad in the newspaper asking for people interested in the experimental course and then conducting a survey with 30 of these respondents. But without random sampling, the professor simply does not have a clear idea of how representative her sample is or how far her inferences can be generalized. However, by using some of the other sampling strategies discussed in this chapter, such as quota sampling, she can improve the quality of her biased sample, which would give her greater confidence in her conclusions.
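The "precise ways" of estimating random variability include the standard error. A minimal Python sketch, assuming a hypothetical population of 1,000 graduate students (the 30 in the class plus the 970 others) of whom 30% would enroll: it draws many random samples of 30 and compares the spread of their results with the usual formula for the standard error of a proportion, including the finite population correction. All numbers are invented for illustration, and the function name is hypothetical.

```python
import math
import random
import statistics

def standard_error_of_proportion(p, n, population_size):
    """Expected spread of a sample proportion under simple random sampling
    without replacement (includes the finite population correction)."""
    fpc = (population_size - n) / (population_size - 1)
    return math.sqrt(p * (1 - p) / n * fpc)

# Hypothetical population: 1,000 graduate students, 300 of whom (30%)
# would enroll in the experimental course.
population = [1] * 300 + [0] * 700
rng = random.Random(11)

# Draw many random samples of 30 and record each sample's proportion.
proportions = [sum(rng.sample(population, 30)) / 30 for _ in range(5000)]

print(round(statistics.mean(proportions), 3))    # close to 0.30
print(round(statistics.pstdev(proportions), 3))  # close to the formula's value
print(round(standard_error_of_proportion(0.3, 30, 1000), 3))  # 0.082
```

Individual samples of 30 miss the true 30% by several percentage points, and occasionally by much more; what random selection buys is that the typical size of these misses is known in advance.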
Biases and Really Bad Biases
On the other hand, many reputable researchers and organizations are forced to rely on nonrandom sampling, because random sampling is sometimes difficult to accomplish. If we want to find out, for example, how American teenagers feel about sex or drugs, it would be difficult to obtain a random sample of all the teenagers in the country - and it would be very expensive even to come close. It would be possible to obtain a random sample of all the teenagers in a specific city, perhaps, but once we try to use the results of a few small-scale surveys to generalize about the whole country, we're really dealing with a biased sample. Nonrandom sampling is most justified when it is actually impossible to obtain a true random sample. In such cases, reputable researchers upgrade their nonrandom sample by making it a quota or stratified sample (both discussed later in this chapter) and by carefully delineating the precise nature of the sample and the limitations in generalizing about specific populations. These limitations should be kept in mind when we interpret such surveys.
In most cases when educators need to use a sampling strategy, they can do better than using a badly biased technique. In spite of this, various forms of biased sampling seem to be the most prevalent sampling techniques in educational settings. Unfortunately, most of the persons who use biased techniques are unaware of what they are doing wrong. For example, many actually believe that talking to every fifth person who walks through the door or talking to whoever happens to be in the lounge are ways to get representative opinions. Such methods are examples of biased sampling, and they often yield nonrepresentative results.
It is important to cite one more example before moving beyond biased samples. What happens if you have a population of 1,000 people and select a genuine random sample of 200, then mail out a survey to which only 25% respond? Are these 50 respondents a random sample? The answer is no! This would represent a biased sampling strategy. The obvious bias is that only those who volunteered to respond are part of the actual sample available for analysis. People who volunteer to respond differ, at the very least in their willingness to respond, from those who do not, and that difference may be related to the very opinions being measured. A much better strategy would be to select a smaller random sample and attempt to get a 100% response through direct contacts. In this case, responses from all 50 of 50 persons selected at random would be vastly superior to 50 responses from a random sample of 200 drawn from the entire population.
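To see why the 50 replies are not a random sample, the following Python sketch simulates both strategies on a made-up population of 1,000 in which 40% favor some proposal. The reply rates (40% for supporters, 15% for everyone else) are pure assumptions, chosen so that about a quarter of the 200 people mailed a survey respond, as in the example; the function name is hypothetical.

```python
import random
import statistics

def compare_strategies(n_reps=2000, seed=5):
    """Average estimates from many simulated surveys of a made-up population."""
    rng = random.Random(seed)
    # Hypothetical population of 1,000: 400 favor a proposal (1), 600 do not (0).
    population = [1] * 400 + [0] * 600
    # Assumed return rates for a mailed survey: supporters reply more often.
    reply_rate = {1: 0.40, 0: 0.15}

    mailed, direct, replies = [], [], []
    for _ in range(n_reps):
        # Strategy A: mail to a random sample of 200; analyze only the replies.
        returned = [x for x in rng.sample(population, 200) if rng.random() < reply_rate[x]]
        replies.append(len(returned))
        if returned:
            mailed.append(sum(returned) / len(returned))
        # Strategy B: random sample of 50, every person reached directly.
        direct.append(sum(rng.sample(population, 50)) / 50)
    return statistics.mean(mailed), statistics.mean(direct), statistics.mean(replies)

mailed_est, direct_est, avg_replies = compare_strategies()
print("true share in favor:       0.40")
print(f"mailed, about {avg_replies:.0f} replies: {mailed_est:.2f}")
print(f"50 of 50 contacted:        {direct_est:.2f}")
```

With these assumed rates, the mailed survey overstates support by a wide margin on average, however often it is repeated, while the fully contacted sample of 50 is correct on average. In a real survey the reply rates are unknown, so the size and direction of the bias cannot be estimated from the replies alone.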
To summarize, random sampling is the best technique; biased sampling is the worst. The strategies discussed next help make a nonrandom sample come as close as possible to possessing the characteristics of a random sample.
Quota sampling provides a way to give respectability to a nonrandom sample. If done well, quota sampling can lead to strong inferences. When using this strategy, researchers identify important characteristics that they already know the target population possesses, and then they select the nonrandom (and therefore biased) sample in such a way as to make it correspond to the population with regard to these known characteristics. We might get a quota sample of American teenagers in a city by consulting census information and discovering what percentage of teenagers in that city is of each gender, what percentage belongs to each of the various races, and what percentage lives in each of several different neighborhoods. Based on this information, we would set quotas even before we set out to conduct our survey, determining that we would get a certain number of males, a certain number of females, a certain number of whites, of blacks, and so forth. When conducting the survey, we would use these quotas to set the limit on how many persons possessing each characteristic we would include in our survey.
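Turning census percentages into quotas is arithmetic: multiply each percentage by the planned sample size. Rounding can leave the quotas one or two short of the total, so this minimal Python sketch uses largest-remainder rounding, which is one common choice rather than the only one. The neighborhood names, percentages, and the function name `proportional_quotas` are hypothetical.

```python
import math

def proportional_quotas(proportions, sample_size):
    """Turn known population proportions into whole-number quotas that add up
    to sample_size (largest-remainder rounding)."""
    if not math.isclose(sum(proportions.values()), 1.0):
        raise ValueError("proportions must sum to 1")
    exact = {group: p * sample_size for group, p in proportions.items()}
    quotas = {group: math.floor(x) for group, x in exact.items()}
    # Hand out the places lost to rounding down, largest fractional part first.
    leftover = sample_size - sum(quotas.values())
    by_remainder = sorted(exact, key=lambda g: exact[g] - quotas[g], reverse=True)
    for group in by_remainder[:leftover]:
        quotas[group] += 1
    return quotas

# Hypothetical census figures for the city's teenagers by neighborhood.
neighborhoods = {"Northside": 0.46, "Riverside": 0.31, "Eastgate": 0.23}
print(proportional_quotas(neighborhoods, 120))
# {'Northside': 55, 'Riverside': 37, 'Eastgate': 28}
```

The same function would be applied separately to gender, race, or any other characteristic for which quotas are set.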
Although it is desirable to set quotas before we select the sample, it is also possible to use quota sampling strategies retrospectively. For example, a researcher who needs a sample but is forced to deal with an intact group might conclude that no research is possible at all (because the only available sample is biased). Instead of abandoning the study, this researcher might include on his questionnaire, in addition to questions related to his outcome variables, some questions about the characteristics of his respondents. These additional questions should focus on the characteristics most likely to introduce biases. When analyzing the data, the researcher could compare the characteristics of the sample with those of the population to check for obvious biases.
For example, an organization with a small budget may be interested in knowing the attitudes of American college students regarding drug and alcohol use. Realizing that students are likely to react to questions by giving socially desirable answers, this organization might hire a researcher at a nearby university who is known for her ability to establish rapport and obtain frank answers from students. Because of time and travel restrictions, this researcher would have to collect her data from respondents at or near her own university. She might obtain detailed responses to a wide variety of questions from all 200 of the students in her classes, which all sophomores are required to take. Can the organization use these results to generalize about "American college students"? A further examination indicates that these students had SAT scores typical of American college students and that the percentages of whites, blacks, Hispanics, Asians, males, females, older students, younger students, rich students, poor students, liberal arts majors, engineering majors, etc. were comparable with the percentages known to be typical of the rest of the country. The researcher also notes that two of her many questions overlapped almost exactly with those asked by a nationally prominent survey organization, and the responses of her students were almost identical to those of the national sample. At this point, the organization has good reason to believe that these results can be generalized. Their confidence cannot be as great as if they had conducted a random survey with an equally good interviewer, but they can be more confident than if they had sent their interviewer over to the local bar to interview students or if they had conducted a random survey in a manner very likely to obtain reactive, false responses.
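The after-the-fact check in this example amounts to setting the sample's makeup beside the known makeup of the population. A minimal Python sketch follows; the categories, counts, national percentages, the 5-percentage-point cutoff, and the function name are all invented for illustration. A formal test such as a chi-square goodness-of-fit test could replace the simple cutoff.

```python
def compare_to_population(sample_counts, population_shares, tolerance=0.05):
    """Set each category's share of the sample beside its known share of the
    population and flag gaps larger than `tolerance` (an arbitrary cutoff)."""
    n = sum(sample_counts.values())
    report = []
    for category, pop_share in population_shares.items():
        sample_share = sample_counts.get(category, 0) / n
        flagged = abs(sample_share - pop_share) > tolerance
        report.append((category, sample_share, pop_share, flagged))
    return report

# Hypothetical majors among the 200 students in the researcher's classes,
# and assumed national percentages for American college students.
class_counts = {"liberal arts": 96, "engineering": 24, "education": 40, "other": 40}
national = {"liberal arts": 0.48, "engineering": 0.12, "education": 0.10, "other": 0.30}

for category, s, p, flagged in compare_to_population(class_counts, national):
    note = "  <- possible bias" if flagged else ""
    print(f"{category:<13} sample {s:4.0%}   national {p:4.0%}{note}")
```

With these invented figures, education majors are overrepresented and "other" majors underrepresented, which is the kind of limitation that, after the fact, can be reported but not corrected.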
The flaw in this after-the-fact quota sampling is that the demographics of the sample may indeed reveal obvious biases with regard to the target characteristics. Then the researcher is left with nothing more than a limitation that can be stated but no longer corrected. For instance, in the preceding example, what would the organization do if the researcher reported that her group included an overrepresentation of Asian students and education majors? This would be a difficult problem with after-the-fact quota sampling. Preplanned quota sampling is more likely to minimize differences. In the preceding example, the problem would have been minimized by selecting fewer Asian students or education majors for the sample. The retrospective strategy can increase our confidence in nonrandom samples when the subjects meet the quotas and caution us regarding the nature of biases when the subjects do not meet the quotas. (This retrospective strategy will be discussed again later in this chapter, when we discuss essentially random samples.)
Systematic sampling is a strategy whereby only two factors determine membership in the sample: chance and "the system." The system is simply a way of facilitating the random selection process. For example, instead of using a table of random numbers to select a sample of 100 individuals from a list of 1,000 names, a researcher might randomly select a number between 1 and 10, start with the name that corresponds to that number, and then take every 10th name on the list thereafter. The resulting sample is essentially the same as a random sample, unless there is a systematic pattern in the way the names appear on the list. If the selection interval happens to coincide with such a pattern, however, this can be a very bad technique. For example, if a classroom were arranged so that boys and girls sat in alternate seats throughout the room, a researcher who took every 2nd or 10th child would get an atypical sample of either all boys or all girls. Likewise, if she had a list of 2,000 names and obtained a sample of 100 by starting with a random number between 1 and 10 and selecting every 10th name, she would exclude the whole second half of the list, which would likely be a serious bias. (This problem would be overcome by starting with a random number between 1 and 20 and taking every 20th name instead of every 10th name.) It is usually relatively easy to identify and eliminate such sources of bias in the selection of subjects by examining the lists ahead of time and asking the compilers how they were assembled.
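A minimal Python sketch of systematic sampling, written with the random start anywhere in the list and the "wrap around" described in the Ms. Jefferson example; the function name `systematic_sample` is hypothetical. The interval is the list length divided by the sample size (rounded down), so a list of 2,000 names and a sample of 100 gives every 20th name, which reaches the whole list. The second usage line reproduces the alternating-seat pitfall.

```python
import random

def systematic_sample(items, n, rng=None):
    """Take every k-th item from a random starting point, wrapping around
    to the beginning of the list when the end is reached."""
    rng = rng or random.Random()
    size = len(items)
    if not 0 < n <= size:
        raise ValueError("sample size must be between 1 and the length of the list")
    k = size // n                # interval: 2,000 names // 100 = every 20th name
    start = rng.randrange(size)  # random starting position anywhere in the list
    # Since n * k <= size, these n positions never repeat.
    return [items[(start + i * k) % size] for i in range(n)]

# A list of 2,000 names (numbered here): every 20th name reaches both halves.
names = list(range(1, 2001))
picked = systematic_sample(names, 100, random.Random(1))
print(sum(1 for x in picked if x <= 1000), sum(1 for x in picked if x > 1000))  # 50 50

# Pitfall: boys and girls in alternating seats, sampling every 2nd seat.
seats = ["boy", "girl"] * 10
print(systematic_sample(seats, 10, random.Random(3)))  # all "boy" or all "girl"
```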
Stratified sampling is a strategy whereby members of a sample are selected in such a way as to guarantee appropriate numbers of subjects for subsequent subdivisions and groupings during the analysis of data. For stratified sampling to be most effective, the respondents within each stratum should be selected at random. Such random stratified sampling is mistakenly thought by many novice researchers to be the ideal sampling technique. This is not quite the case. In general, simple random sampling is the easiest and most desirable procedure. Stratified sampling is useful only when you plan to subdivide the subjects for subsequent analysis to make various comparisons and decisions or when you have too large a population to be able to assign each subject a number in advance. Readers of this book are most likely to use stratified sampling for the first reason. National marketing and political polling organizations often use it for both reasons.
The use of stratified sampling can be clearly illustrated with an example from a political survey. Let's say we are going to conduct a survey during which we will ask respondents to indicate (1) their race, (2) the candidate for whom they intend to vote, and (3) their attitude toward environmental issues. A random sample of 200 people from a town of 50,000 would generally give us a good estimate of (1) how many persons of each race live in the town, (2) how many persons planned to vote for each candidate, and (3) the percentage of persons for and against each environmental issue. However, what if we wanted to know whether black Democrats were more similar to white Democrats than to black Republicans, or whether within the Democratic Party blacks and whites differed on environmental issues? If only 10% of the town's residents were black, we would expect only about 20 blacks in a sample of 200. If blacks in this town tended to be 80% Democrats, this would give us about 4 black Republicans and 16 black Democrats.
As we shall discover in the next section, the validity of inferences based on 4 or 16 subjects is not nearly as strong as the validity of inferences based on 200 subjects. If we want to make such subanalyses, we should "stack" the sample by including many more blacks, so that we would have more for the subanalysis. Such a move, of course, would bias the overall sample by giving disproportionate weight to black subjects. This overrepresentation would have to be countered by a weighting strategy to correct this false emphasis. Stratified sampling would be used in this case specifically to provide ample numbers of subjects for later subdivision, subanalysis, and reporting.
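The arithmetic of the last two paragraphs, and the weighting that undoes the deliberate overrepresentation, can be sketched in Python. The town is simplified to two strata ("black" and "other"), the stratified sample sizes and answers are invented, and the helper names are hypothetical. Each respondent's weight is the stratum's population size divided by the number sampled from it, a standard design weight for stratified samples; it is one way of carrying out a weighting strategy, not necessarily the one a given survey organization would use.

```python
def design_weights(population_sizes, sample_sizes):
    """Weight per stratum: how many population members each respondent stands for."""
    return {h: population_sizes[h] / sample_sizes[h] for h in population_sizes}

def weighted_proportion(yes_counts, sample_sizes, weights):
    """Estimate a population proportion from a stratified sample's yes/no answers."""
    weighted_yes = sum(weights[h] * yes_counts[h] for h in yes_counts)
    weighted_all = sum(weights[h] * sample_sizes[h] for h in sample_sizes)
    return weighted_yes / weighted_all

# Hypothetical town of 50,000, simplified to two strata.
population = {"black": 5_000, "other": 45_000}
total = sum(population.values())

# Simple random sample of 200: expected counts mirror the population.
expected_srs = {h: 200 * size / total for h, size in population.items()}
print(expected_srs)                 # {'black': 20.0, 'other': 180.0}
print(expected_srs["black"] * 0.8)  # 16.0 black Democrats if 80% are Democrats

# Stratified sample that "stacks" black respondents: 100 from each stratum.
sampled = {"black": 100, "other": 100}
in_favor = {"black": 70, "other": 40}  # invented answers to one environmental item

weights = design_weights(population, sampled)
print(weights)                                                    # {'black': 50.0, 'other': 450.0}
print(sum(in_favor.values()) / sum(sampled.values()))             # unweighted: 0.55
print(round(weighted_proportion(in_favor, sampled, weights), 2))  # weighted: 0.43
```

Without weights, the 100 black respondents count as half the sample although they are a tenth of the town, and the unweighted 55% in favor is far from the weighted estimate of 43%.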
What is the difference between stratified sampling and quota sampling? They are similar in that both specify numbers of subjects to be included in the sample based on selected characteristics. However, they differ sharply in their purpose. The purpose of quota sampling is to make the sample as closely representative as possible of the larger population with regard to important characteristics. This is done by selecting proportional numbers of subjects with specific characteristics. To continue the earlier example, if the original population had 10% blacks, then the quota sample should have 10% blacks. On the other hand, the purpose of stratified sampling is to ensure sufficient numbers for subanalysis. To do this, the researcher will often select subjects in such a way as to deliberately make the entire sample dissimilar to the larger population with regard to the specified characteristics. For example, the researcher might select 30% blacks so that there will be enough black respondents in the sample to permit meaningful subanalysis based on the race of respondents.
Essentially random is a term often applied to a sample that was not randomly selected but that the researcher thinks is unbiased anyway. For example, there might be 300 students in the freshman class at your high school, and you might want to find out what percentage of them can read with at least sixth grade ability. Testing all 300 would be expensive and time-consuming, but you could easily test 30 of them. Pursuing the matter, assume you discover that the English classes are heterogeneously grouped. The factor that appears to influence who goes into what English class is when they eat lunch. This, in turn, depends on a combination of alphabetical order and what electives the students are taking. At this point, you might argue that such classes are "essentially randomly selected." However, you could improve your logic further by using quota sampling strategies. You might select one of the classes at random and discover that it contains about the expected number of males and females, the expected number of persons from various racial and ethnic groups, the expected number of persons from various academic tracks, and so forth. This would further strengthen your case that the sample is free of bias. To top it off, you might examine their most recent standardized test scores and note that these are similar to the mean of the entire class. Having done all this, the conclusion that only 35% of the freshmen can read at the level of a sixth grader or better would carry a great deal more weight than if you had no reason to believe that your sample was essentially random.
The comparative advantages and disadvantages of the four major sampling strategies are summarized in Table 8.1. The table makes the following points: (1) random sampling is the simplest and best strategy if there is a complete list of the population; (2) systematic sampling is almost as good as random sampling; (3) quota sampling gives respectability to nonrandom samples; and (4) stratified sampling is useful in situations where no list of the population is available or when subdivision of the sample is intended and adequate numbers are not likely to be present in the subunits through regular random sampling. (Additional sampling strategies, as well as combinations of the preceding strategies, are discussed in Asher [1976]. Purposive sampling, which is often employed in qualitative research studies, is described in chapter 9.)
Table 8.1 The Relative Advantages and Disadvantages of the Four Basic Sampling Techniques
Random sampling. Advantages: Theoretically most accurate. Influenced only by chance. Disadvantages: Sometimes a list of the entire population is unavailable, or practical considerations prevent random sampling.
Systematic sampling. Advantages: Similar to random sampling. Often easier than random sampling. Disadvantages: The system can sometimes be biased.
Quota sampling. Advantages: Can be used when random sampling is impossible. Quick to do. Disadvantages: There may still be biases not controlled by the quota system.
Stratified sampling. Advantages: Ensures a large enough sample to subdivide on important variables. Needed when the population is too large to list. Can be combined with other techniques. Disadvantages: Can be biased if strata are given false weights, unless a weighting procedure is used for the overall analysis.
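The two list-based procedures in Table 8.1 can be sketched in a few lines of Python. This is a minimal illustration, assuming the population is available as a list (here a hypothetical roster of 300 freshmen, echoing the reading-test example); the helper names are our own. Simple random sampling uses the standard library's `random.Random.sample`; systematic sampling picks a random starting point and then takes every k-th name.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n has the same chance of being chosen."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start within the first interval, then take every k-th member."""
    k = len(population) // n       # sampling interval
    start = rng.randrange(k)       # random starting point: 0 .. k-1
    # If len(population) is not a multiple of n, trimming to n means
    # members at the very end of the list are slightly under-represented.
    return population[start::k][:n]

roster = [f"student_{i:03d}" for i in range(300)]   # hypothetical list of 300 freshmen
rng = random.Random(42)                             # fixed seed so the sketch is repeatable

print(simple_random_sample(roster, 5, rng))
print(systematic_sample(roster, 30, rng)[:5])
```

The systematic version also shows where its listed disadvantage comes from: if the roster has a repeating pattern whose period matches k (say, every tenth name belongs to the same homeroom), the sample can pick up that pattern and be biased.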
Part 1
Categorize each of the following as either badly biased sampling, random sampling, systematic sampling, quota sampling, or stratified sampling:
Part 2
Evaluate the quality of each of the sampling techniques from Part 1. How good would the sample be if the strategy were carried out?
HOW LARGE SHOULD THE SAMPLE BE?
In addition to depending on the procedure by which it is selected, the quality of a sample depends upon its size. In general, if a sample is scientifically selected, we can place more confidence in the results of a larger sample than in the results of a smaller sample. This is because the likelihood that the characteristics of a discrepant minority will improperly influence our perceptions of the whole population decreases as the sample grows larger. (Note, however, that adding 25 people to a sample of 5 will result in a much greater increase in accuracy than will adding 25 people to a sample of 100.) At a certain point, the benefits of increasing the size of the sample may be outweighed by the cost of sampling a larger number of respondents.
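A short Python sketch of the diminishing returns described above. It assumes simple random sampling and the conservative 95% margin of error for a proportion (p = 0.5). The normal approximation behind this formula is crude for a sample as small as 5, so treat that figure as illustrative only.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Conservative 95% margin of error for a proportion, in percentage points."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Add the same 25 people to a small sample and to a larger one.
for before, after in [(5, 30), (100, 125)]:
    gain = margin_of_error(before) - margin_of_error(after)
    print(f"{before} -> {after}: ±{margin_of_error(before):.1f}% -> "
          f"±{margin_of_error(after):.1f}% (narrower by {gain:.1f} points)")
```

Because the margin shrinks with the square root of n, the first 25 extra people cut the interval by more than 20 points, while the same 25 added to a sample of 100 cut it by only about one point.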
How large should a sample be? To answer this question, we need to undertake a brief exploration of confidence intervals. (These are closely related to the concept of the standard error of measurement, which was discussed in chapter 5.) A confidence interval states a range of numbers, such as ±5% or ±10%. When we use a sample to estimate a population characteristic, we are aware that it is just that: an estimate. The confidence interval states how accurate we think this estimate is. (Newscasters often use the term probable error when they indicate the confidence intervals of survey results.) The confidence interval can be applied to the sample estimate to indicate the range within which the population characteristic almost certainly falls.
This can best be understood by considering an example. Imagine a PTA president who has found that 37% of a sample of parents responding to a survey stated that they would come to a meeting on Thursday night if child-care services were provided. She would infer from this sample that about 37% of the entire population of parents would have given this same answer if they had all been surveyed. She would be aware that her estimate of 37% is not an exact measurement, such as would be obtained by asking all 1,000 parents the survey question. If she had a confidence interval of ±15% for this estimate, this would mean that the true percentage of persons who would have said yes to this question is likely to fall somewhere between 22% and 52%. This is a very wide range. (The true percentage, of course, could be ascertained by directing the question to everyone in the population.) If the confidence interval were smaller, the range within which the true percentage would be likely to fall would also be smaller. For example, if her confidence interval were ±5%, then the actual percentage of persons willing to come to the meeting would probably be somewhere between 32% and 42%. If the confidence interval were ±1%, then she would expect the actual percentage to fall between 36% and 38%. (A confidence interval of ±3% is considered desirable in high-quality surveys, like those performed by major polling organizations to make projections regarding presidential elections. That's why these national surveys often report a sample size of about 1,300.)
{Note: Strictly speaking, a confidence interval of ±15% typically means that there is a 95% probability that the true percentage falls between 22% and 52%. The tables in this book are based on 95% confidence intervals. Other possibilities (such as 99% confidence intervals) can be used as well.}
As you can see, by keeping the confidence intervals narrow we increase the expectation that we are making an accurate estimate. These confidence intervals are based on sound mathematical theory, which will not be explored here. The critical factor in determining confidence intervals is the number of persons constituting a sample. As the size of the sample increases, the confidence intervals become more narrow, and our estimates are likely to be more accurate. By applying some simple mathematical procedures or consulting an appropriate table before we conduct a survey, we can estimate ahead of time how large a sample we will need in order to obtain confidence intervals that will give us a satisfactory amount of confidence in the accuracy of the results we expect to receive from the sample.
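One common version of these "simple mathematical procedures" is the sample-size formula n = z²p(1 - p)/E², where E is the desired margin of error. The sketch below assumes a 95% level (z = 1.96), the conservative p = 0.5, simple random sampling, and a population large enough that no correction factor is needed. Under those assumptions it asks for 385 respondents for ±5% and about 1,068 for ±3%, in line with the figures quoted in this chapter.

```python
import math

def required_sample_size(margin_pct, p=0.5, z=1.96):
    """Smallest simple random sample whose 95% margin of error is at most
    margin_pct percentage points (large population assumed)."""
    e = margin_pct / 100
    n = z ** 2 * p * (1 - p) / e ** 2
    # Round away floating-point noise (e.g. 9604.000000001) before taking the ceiling.
    return math.ceil(round(n, 6))

for margin in (15, 5, 3, 1):
    print(f"±{margin}%: about {required_sample_size(margin)} respondents")
```

Note how quickly the requirement grows: under these assumptions, ±1% calls for roughly 9,600 respondents, which is why such narrow intervals are rare in practice.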
Table 8.2 Initial Estimates of Confidence Intervals Based on Sample Size
Table 8.2 presents estimates of confidence intervals based on sample size. A sample of 100 gives confidence intervals of ±9.8%. A sample of 200 gives smaller confidence intervals, ±6.9%. By using this table, you can easily estimate the confidence intervals surrounding results obtained from surveys. In addition, you can use this table to decide how large a sample you need in order to have an acceptable degree of confidence in your results. For example, if you live in a city of 500,000 and want to estimate a characteristic with ±5% confidence intervals, a sample of 400 would accomplish this.
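The entries quoted from Table 8.2 (±9.8% for 100, ±6.9% for 200, about ±5% for 400) match the conservative 95% margin of error for a proportion, 1.96 × √(0.25/n). Assuming that is how the table was built, a sketch like the following regenerates it for whatever sample sizes you need.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Conservative 95% margin of error for a proportion, in percentage points."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

for n in (25, 50, 100, 200, 400, 500, 1000, 1300):
    print(f"n = {n:5d}   ±{margin_of_error(n):.1f}%")
```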
{Note: The confidence intervals in Table 8.2 are sometimes conservative estimates. A more complete treatment of sampling theory (e.g., Asher, 1976) would describe correction factors. A correction factor is used when the sample is a major part of the population. For instance, a sample of 50 from a population of 50,000 would yield confidence intervals of ±14%, as shown in Table 8.2. However, a sample of 50 from a population of 100 would yield narrower confidence intervals, about ±9.9%, which could be ascertained by using a correction factor not discussed in this book.}
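The note does not spell out its correction factor. A standard choice is the finite population correction, √((N - n)/(N - 1)), where N is the population size; the sketch below assumes that factor. It leaves a sample of 50 from 50,000 essentially unchanged (about ±13.9%) and gives about ±9.85% for 50 out of 100, in line with the ±9.9% quoted.

```python
import math

def margin_of_error(n, population_size=None, p=0.5, z=1.96):
    """Conservative 95% margin of error in percentage points, optionally
    applying the finite population correction (an assumed choice of factor)."""
    moe = 100 * z * math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        # Shrinks the interval when the sample is a large fraction of the population.
        moe *= math.sqrt((population_size - n) / (population_size - 1))
    return moe

print(f"50 of 50,000: ±{margin_of_error(50, 50_000):.2f}%")
print(f"50 of 100:    ±{margin_of_error(50, 100):.2f}%")
```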
Analyzing Subgroups from Surveys
Survey results can be put to valuable use to perform various subanalyses. For example, a principal might want to know whether the responses of parents who attended PTA meetings the previous year differed from those who had not previously attended meetings. She might also want to know whether parents with more than one child in the school gave different responses than parents with only one child in attendance. Likewise, she might be interested in knowing whether the answers from one-parent families were different from those from two-parent families. Such information can be useful in helping her make decisions.
It is important to remember that whenever we subdivide the original sample for such subanalyses, we reduce the size of our sample, and therefore increase the size of our confidence intervals. The result is a reduction in accuracy. For example, assume that a principal decided to sample 100 families to estimate their attitudes toward a proposed change. This would give her a confidence interval of ±9.8% for her whole sample. What if she wanted to compare the responses of persons who had attended meetings the previous year with those who had not done so? Let's assume that 25% had attended the previous year. This would mean that her sample of 100 would probably contain about 75 nonattenders and 25 attenders. What kind of confidence intervals would she have for her analysis of the responses of attenders? Considering the interval levels given in Table 8.2, we would have an interval of about ±20% for 25 people. The 75 nonattenders, because their subsample is larger, would have a narrower confidence interval of about ±11%.
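The principal's subanalysis can be checked with the same conservative 95% formula. This sketch assumes the 25%/75% split lands exactly in the sample of 100; in practice the subgroup counts would vary somewhat by chance.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Conservative 95% margin of error for a proportion, in percentage points."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

sample_size = 100
attender_share = 0.25   # 25% attended meetings the previous year

groups = {
    "whole sample": sample_size,
    "attenders": round(sample_size * attender_share),
    "nonattenders": round(sample_size * (1 - attender_share)),
}
for label, n in groups.items():
    print(f"{label:13s} n = {n:3d}   ±{margin_of_error(n):.1f}%")
```

The formula gives roughly ±20% for the 25 attenders and a little over ±11% for the 75 nonattenders, so any comparison between the two groups rests on much shakier ground than the ±9.8% for the whole sample.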
Faced with such difficulties, the principal would be left with two alternatives: (1) keep the same sample and merely acknowledge the weaknesses of the confidence intervals in the subanalyses, or (2) draw the sample in such a way as to ensure a sufficiently large subsample for the subanalyses. Accepting the first alternative would be a reasonable decision in many circumstances; it would merely be necessary to keep in mind that inferences based on the subanalysis would not be on as firm a footing as those derived from the overall sample. The second alternative requires stratified sampling, which was discussed earlier in this chapter.
To use stratified sampling, we would select our sample in such a way that each subsample (stratum) for subanalysis will have enough members to provide the desired confidence intervals. Thus, if our principal wanted to do the subanalysis comparing attenders and nonattenders, she would have to select 100 families from each category in order to provide confidence intervals of about ±10% for each of these subgroups. This would be good for the subanalysis, but notice what has happened to the overall analysis. In the original population, there were 75% nonattenders and 25% attenders. In the sample, however, there are now 50% nonattenders and 50% attenders. Thus, the adjustment to improve the confidence intervals has resulted in a bias in the overall sample. A mathematical adjustment (see Asher, 1976) would be necessary to restore the proper proportions in the overall sample.
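Here is a minimal sketch of one common form of that adjustment: weight each stratum's result by its share of the population rather than its share of the sample. The 60% and 40% "yes" rates are hypothetical, chosen only to show how the 50/50 stratified sample would mislead if it were analyzed without weights.

```python
# Hypothetical results: share of families answering "yes" in each stratum.
stratum_results = {"attenders": 0.60, "nonattenders": 0.40}
sample_sizes = {"attenders": 100, "nonattenders": 100}
population_shares = {"attenders": 0.25, "nonattenders": 0.75}

# Unweighted: treats the 50/50 sample as if it mirrored the population.
total_n = sum(sample_sizes.values())
unweighted = sum(stratum_results[g] * sample_sizes[g] for g in stratum_results) / total_n

# Weighted: each stratum counts in proportion to its share of the population.
weighted = sum(stratum_results[g] * population_shares[g] for g in stratum_results)

print(f"unweighted estimate: {unweighted:.1%}")
print(f"weighted estimate:   {weighted:.1%}")
```

With these made-up figures the unweighted estimate is 50%, while the weighted estimate is 45%, because the weighting gives nonattenders the three-quarters share they hold in the population.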
WHAT TO DO ABOUT NONRESPONDENTS
In an ideal world, a researcher would identify a sample of 100 persons to interview, would interview all 100 of them, and would be able to interpret the results according to the confidence intervals derived from Table 8.2. In reality, however, things go wrong. Some people are impossible to find, and others simply refuse to answer our questions.
The confidence intervals listed in Table 8.2 are based on the assumption of random selection of respondents. Collecting data from a badly biased sample of 5,000 people who call a "900" telephone number to register an opinion is no better than collecting data from 5 people who called the same number. But what happens when a researcher makes a legitimate attempt to collect data about a large population but is still able to collect data from only 200 of the 250 respondents selected for the sample? The following questions and answers focus on this problem of what to do when not everyone in a sample responds.
The following guidelines should influence your choice of strategies for dealing with the problem that occurs when people selected to be part of a sample choose not to respond:
The moral of the story is that it's best to use random sampling. However, when random strategies are impossible, the next best course of action may be to plan for random sampling and, if you fail to carry out this plan completely, to be aware of how you failed and to use this information to help interpret the data.
The discussion in this chapter tries to clarify sampling theory by describing how to obtain appropriate samples. If you understand how to draw a good sample, you should be able to interpret the results of surveys reported in professional journals or in the public media. However, a few guidelines may be helpful:
Interpreting the Results of Nonrandom Surveys
For example, if there is a huge percentage of nonrespondents, the results may be useless. Responses from 100 members of a random sample of 125 drawn from a population of 1,000 (an 80% response rate) are likely to be more representative than 100 responses from a questionnaire sent to the entire population of 1,000 (a 10% response rate).
Try to determine the characteristics of the nonrespondents, and judge whether these characteristics are likely to be related to the conclusions of the report. For example, if a researcher sent out 200 questionnaires to alumni and received 100 responses, it would be useful to notice whether former students in the top half of the class academically responded more often than those in the bottom half. If the report focused on the income of the alumni or the rate at which they graduated from college, this would be an obvious bias. On the other hand, if the top and bottom halves of the class were about equally represented among the 100 respondents, then the results would be much more likely to be authentic.
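One way to see why a low response rate can make results useless is to compute worst-case bounds: assume nothing about the nonrespondents and ask how far the answer could move if they had all answered one way or the other. This bounding approach is our illustration rather than a procedure from this chapter, and the 40 "yes" answers are hypothetical. For the sample of 125, the bounds describe that sample; ordinary sampling error comes on top of them.

```python
def nonresponse_bounds(yes, respondents, selected):
    """Range for the true 'yes' proportion among everyone selected,
    assuming nothing about how the nonrespondents would have answered."""
    nonrespondents = selected - respondents
    low = yes / selected                       # every nonrespondent would say no
    high = (yes + nonrespondents) / selected   # every nonrespondent would say yes
    return low, high

# Hypothetical: 40 of the 100 respondents answer "yes" in both surveys.
print(nonresponse_bounds(40, 100, 125))    # 100 responses from a random sample of 125
print(nonresponse_bounds(40, 100, 1000))   # 100 responses from a population of 1,000
```

With an 80% response rate, the true figure among those selected must lie between 32% and 52%; with a 10% response rate, it could be anywhere from 4% to 94%.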
The basic method for interpreting nonrandom samples is to employ the retrospective quota strategies described earlier. The quality of the sample is likely to improve to the extent that the researcher followed the prescribed steps. The author of the report should supply enough information for you to determine how closely the sample of actual respondents resembles the population to which you would like to generalize the results.
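A retrospective quota check can be sketched as a comparison of the respondents' makeup with known population figures. The sketch below is a rough screen, not a formal significance test: it flags any characteristic whose sample share differs from the population share by more than a 95% margin of error computed at the population share. The respondent counts are hypothetical, echoing the top-half/bottom-half example above.

```python
import math

def compare_to_population(sample_counts, population_shares, z=1.96):
    """For each group, return (sample share, population share, flagged?),
    where flagged means the gap exceeds a 95% margin of error."""
    n = sum(sample_counts.values())
    report = {}
    for group, count in sample_counts.items():
        share = count / n
        expected = population_shares[group]
        moe = z * math.sqrt(expected * (1 - expected) / n)
        report[group] = (round(share, 3), expected, abs(share - expected) > moe)
    return report

halves = {"top half": 0.5, "bottom half": 0.5}   # by definition, half the class each

# Hypothetical: 68 of 100 alumni respondents were in the top half of the class.
print(compare_to_population({"top half": 68, "bottom half": 32}, halves))
# Hypothetical: a nearly balanced set of respondents.
print(compare_to_population({"top half": 52, "bottom half": 48}, halves))
```

The first set of respondents is flagged as unrepresentative on class rank; the second is not, which would make its results more credible for conclusions related to academic standing.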
LONGITUDINAL VERSUS CROSS-SECTIONAL SURVEYS
A frequent goal of researchers is to study the characteristics of persons as they go through a program or an institution. For example, a researcher may want to find out how the percentages of students in grades 9 through 12 differ regarding drug use or ability to think at a specified level of abstraction. The following are conclusions that may arise from surveys of learners:
Longitudinal Surveys
When researchers continue to survey the same persons as they move through a program or an institution, this is referred to as a longitudinal survey. The preceding conclusions could be based on longitudinal surveys. (If this were the case, the conclusions should be phrased more carefully to state that the same subjects were followed throughout the study.)
A serious problem that often occurs during longitudinal research is that persons surveyed during an initial stage of a study may be unavailable to be surveyed during later stages. This presents problems comparable to experimental mortality, which will be discussed in chapter 10. Changes in response patterns of the group may occur because the composition of the group changed rather than because of changes in the attitudes or behaviors that the researcher intends to measure. In addition, a serious difficulty in conducting longitudinal research is that it takes a long time to collect the data.
Cross-Sectional Surveys
Researchers may solve some of the problems of longitudinal studies by conducting cross-sectional research. With this method, researchers survey different groups of persons who are simultaneously moving through a program or an institution. Instead of following a single group of students through a program or institution, the researchers examine participants currently at different levels when the survey is conducted. With this approach, there is no problem with dropouts. However, it is possible that differences among the different levels may occur because of differences in the subjects arising from sampling or from factors unrelated to the institution or program being studied. Sometimes researchers combine both methods by following groups of students from several grade levels as they move through a program. This combined method would permit both cross-sectional and longitudinal comparisons.
When researchers conduct either longitudinal or cross-sectional research, they typically employ sampling strategies. It is important to know how the researchers obtained their samples and to evaluate the quality of these samples according to the criteria described in this chapter.
Researchers planning to conduct or interpret extensive surveys should realize that both longitudinal and cross-sectional methods may entail unexpected difficulties. More detailed books (e.g., Magnusson & Bergman, 1990) provide guidelines for avoiding pitfalls. It is also important to note that both longitudinal and cross-sectional studies are descriptive strategies. It is difficult to demonstrate cause-and-effect relationships with survey data. To establish causality, it is best to employ the experimental or quasi-experimental methods described in chapters 11 and 12 of this book. In the drug-use example cited at the beginning of this section, for instance, an apparent absence of change may occur because drug users are more likely to drop out of school, thus reducing the percentage of seniors reporting illegal use.
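The dropout explanation can be made concrete with a little arithmetic. All of the numbers below are hypothetical: they describe a cohort in which drug use truly rises, yet a survey of the seniors still in school shows no increase at all.

```python
def observed_rate(cohort_size, users, users_dropping_out, nonusers_dropping_out):
    """Share of students still enrolled who are drug users."""
    still_enrolled = cohort_size - users_dropping_out - nonusers_dropping_out
    users_enrolled = users - users_dropping_out
    return users_enrolled / still_enrolled

# Hypothetical cohort of 200 students.
grade9_rate = observed_rate(200, 40, 0, 0)    # 40 users in grade 9, no dropouts yet
true_grade12_rate = 60 / 200                  # use really rises to 60 of the full cohort
# Suppose 30 of the 60 users leave school, but only 10 of the 140 nonusers do.
observed_grade12_rate = observed_rate(200, 60, 30, 10)

print(f"grade 9: {grade9_rate:.0%}")
print(f"grade 12, whole cohort: {true_grade12_rate:.0%}")
print(f"grade 12, students still enrolled: {observed_grade12_rate:.1%}")
```

Use in the whole cohort rises from 20% to 30%, but among the 160 students still enrolled it appears to fall slightly, to under 19%. The survey is accurate about the students it reaches; the misleading conclusion comes from who is no longer there to be surveyed.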
To collect background data for his research, Eugene Anderson decided to find out how many people within his city owned pets, what percentage of the pet owners had their pets altered to prevent unwanted pregnancies, and how various groups of people felt about increasing licensing fees for pets. There were 750,000 people living in his city. He wanted to sample 500 respondents, but the task of locating 500 randomly selected people seemed overwhelming, even if he could handle most of the work by telephone. The people would be too spread out and too hard to reach. Therefore, Mr. Anderson contacted a friend at a local marketing research firm, and he discovered that there was a shopping center at which the marketing firm did most of its research. The marketing firm had discovered that samples drawn from this shopping center were usually very similar to the rest of the city with regard to most characteristics. Indeed, the conclusions of the research that the firm conducted at this one location were usually representative of what would have been concluded if the whole city were surveyed.
This was a valuable discovery. By drawing his participants from the shopping center, Mr. Anderson would have a high-quality quota sample. He followed the same procedure his friend at the marketing firm followed. He stood near the fountain in the mall and asked his question to the first person to approach him from the west at the end of each two-minute interval. In this way, he easily got his sample of 500 within two weeks. He found that 62% of the people he interviewed owned pets, 36% did not, and 2% refused to talk to him. (For purposes of his research, he restricted his definition of pets to dogs and cats.) Of those who owned pets, 20% had them spayed or neutered, whereas the other 80% had not done so. Of those who had their pets altered, 45% favored raising the licensing fee. Of those who had not had their pets altered, only 40% favored raising the fee.
Mr. Anderson started to conclude that pet owners who had their pets spayed or neutered were slightly more willing to favor an increase in fees. However, first he checked his confidence intervals. The interval for a sample of 500 was ±4.4%, and therefore he felt that his estimate that 62% owned pets and 36% did not was fairly accurate. Since 310 persons (62% of 500) stated that they owned pets, he had a confidence interval of ±5.6% for the question about altering pets. He had 248 respondents who owned unaltered pets and 62 who owned altered pets. This meant that his estimate for the larger group (nonalterers) was within a satisfactory ±6.2%, but his confidence interval for the smaller group (alterers) was a high ±12.4%. This estimate was imprecise. Therefore, he decided that it would be rash to base any inferences on the relatively small discrepancy in opinions he had discovered.
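Mr. Anderson's caution can be checked with the usual 95% interval for the difference between two independent proportions, a standard technique not covered in this chapter. The group sizes are derived from his percentages: 62% of 500 is 310 pet owners, of whom 20% (62) had altered pets and 80% (248) did not. The sketch assumes his shopping-center sample behaves like a simple random sample.

```python
import math

def difference_interval(p1, n1, p2, n2, z=1.96):
    """Approximate 95% interval for p1 - p2 with independent samples."""
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z * se, diff + z * se

# Alterers: 45% of 62 favor the higher fee; nonalterers: 40% of 248 do.
low, high = difference_interval(0.45, 62, 0.40, 248)
print(f"5-point difference, 95% interval from {low:.1%} to {high:.1%}")
```

The interval runs from about -9 points to about +19 points. Because it includes zero, the 5-point gap is compatible with no real difference at all, which supports his decision not to draw an inference from it.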
Sampling makes it possible to estimate the characteristics of a larger group by examining the characteristics of a smaller group drawn from the larger one. The larger, entire group is referred to as a population. The smaller group drawn from the population is called a sample. To provide an accurate estimate of the characteristics of a population, a sampling procedure should provide a sample that resembles the population as closely as possible. Random sampling is the best procedure for drawing a sample from a population, since it maximizes the probability that the sample will be like the population in all respects except chance variations. Biased sampling is the worst way to draw a sample; since it allows uncontrolled biases into the sample, we no longer know how closely the biased sample resembles the overall population. Quota sampling attempts to upgrade nonrandom sampling by removing some of the most obvious biases. Systematic sampling is very similar to random sampling; it starts at a random point in a population and then systematically selects members for the sample. Stratified sampling is useful when we have no list of the population or when we want to guarantee that we shall have enough members of subgroups within our sample to allow us to perform further subanalyses of the data.
Larger samples are more likely to furnish accurate estimates of their populations than are smaller samples. It is possible to estimate how accurately a sample of a given size from a designated population will represent the characteristics of that population. This information can be used to determine ahead of time how many individuals we should sample to be within a designated degree of accuracy in estimating the characteristics of a target population.
What Comes Next
In chapter 9 we'll study principles regarding the collection of reliable and valid data of an exploratory or interpretive nature in naturalistic settings. Starting in chapter 10, we'll begin to focus on conducting and interpreting research that deals with cause-and-effect relationships and on generalizing results in education. To apply these forms of research, we shall build upon the principles discussed in the first eight chapters.
When conducting survey research, it is important to keep in mind that the goal of selecting a sample is to obtain a small group that is as representative as possible of the larger group about which you are going to make generalizations. The following guidelines from this chapter will be helpful:
Asher, J. W. (1976). Educational research and evaluation methods. Boston: Little, Brown and Company. Chapter 7 provides a thorough discussion of the rationale behind the determination of sample size and sampling methodologies. Tables are given for correction factors.
Freedman, D., Pisani, R., & Purves, R. (1988). Statistics (2nd ed.). New York: W.W. Norton. Chapters 19 to 23 offer a clear and comprehensive treatment of sampling theory. The discussion is replete with useful examples. The presentation is advanced yet comprehensible to the serious reader who wants to learn more about sampling than is contained in the present textbook.
Jaeger, R. M. (1984). Sampling in education and the social sciences. New York: Longman. This book is both theoretically accurate and readable. Practically oriented readers can easily skim over the technical parts. Any educator planning to do serious sampling research should consult this or a similar book.
Krathwohl, D. R. (1993). Methods of educational and social science research: An integrated approach. New York: Longman. Chapter 8 discusses a wider range of sampling strategies than the present chapter has described. In addition, on pages 386-388 of chapter 16, Krathwohl provides a good discussion of dealing with nonrespondents and interpreting survey results with imperfect response rates.
Magnusson, D., & Bergman, L. R. (1990). Data quality in longitudinal research. New York: Cambridge University Press. This book gives a comprehensive treatment of the problems likely to arise in collecting longitudinal data.
Rosenthal, R., & Rosnow, R. L. (1975). The volunteer subject. New York: John Wiley. The authors describe the characteristics that tend to apply to persons who volunteer to take part in surveys or experiments. This kind of information can help us determine how far we can generalize the results of research that employs volunteers.
Review Quiz 8.1
Part 2
Review Quiz 8.2