Chapter 10

Internal Validity

 

The purpose of this chapter is to help you meet the objectives listed below:

1. Define internal validity.

(Review question 1)

(Textbook emphasis pp. 218-221)

2. Define and give examples of each of the major threats to internal validity.

(Review questions 2-5)

(Textbook emphasis pp. 221-240)

(Internal Validity Exercise)

3. Describe how each of the threats operates to weaken the internal validity of a conclusion about a treatment.

(Review question 6)

(Textbook emphasis pp. 221-240)

(Internal Validity Exercise)

 

Chapter 10 in the textbook is written in a format closely resembling the programmed format used in this Workbook. Since that is the case, and since a lengthy treatment of each of the threats to internal validity is often necessary, the programmed treatment will not be repeated here. (You should reread the relevant parts of the textbook chapter instead.) Rather, the Workbook supplies a Review Quiz and an Internal Validity Exercise. The answers to the Review Quiz and Exercise may provide additional instruction to clear up points of confusion. Answer these questions and perform the exercise. If your efforts reveal that you don't understand specific threats to internal validity, refer to the section of the textbook in which these topics are treated.

 

REVIEW QUIZ

1. Which of the following is the best description of internal validity?

a. It deals with the question of how consistently the test measures whatever it is that it claims to be measuring during an experiment.

b. It deals with the question of how worthwhile and how practical the results of an experiment are.

c. It deals with the question of whether the observed outcomes of an experiment are the result of the experiment itself rather than the result of some extraneous factor.

d. It deals with the question of whether the same results would occur if the experiment were replicated in a different setting.

e. It deals with the question of how far the results of a study can be generalized.

2. When a researcher is concerned that the results which she observes after an experiment might have occurred because of the composition of the group itself rather than because of the experimental treatment, with which of the major threats to internal validity is she most obviously concerned?

a. Selection.

b. Pretesting.

c. History.

d. Maturation.

e. Statistical regression.

3. If a researcher is concerned that it might have been some extraneous event that occurred while the experiment was going on rather than the experimental treatment itself which caused an observed outcome, with which of the following threats to internal validity is he most obviously concerned?

a. Selection.

b. History.

c. Maturation.

d. Pretesting.

e. Experimental mortality.

4. Mrs. Brown has administered a test to her students to find out how well they have mastered the basic skills they will need to study her math unit. She has discovered that they are weak in some areas, and therefore she decides to provide a remedial unit to bring them up to par. She provides her remedial unit and afterwards measures the students' performance again to see if they have improved. She finds a substantial degree of improvement and therefore she concludes that her remedial unit was effective. Which of the following appears to be a threat to the internal validity of her conclusion that the remedial unit was effective?

a. Maturation.

b. Pretesting.

c. History.

d. Instrumentation.

e. Experimental mortality.

5. Mr. Dickens teaches in an adult basic education program. One of his major problems is that people sometimes start coming to the program and stop shortly thereafter, without giving the program much of a chance to provide learning experiences. Mr. Dickens identifies several important counseling techniques which he and his assistants who teach the students could employ in order to develop better rapport with the students. He feels that better rapport will increase the likelihood that the students will want to stay in the program. He trains his teachers and implements this new idea, and at about the same time there are major layoffs in the steel mill which is located nearby. A large number of the unemployed persons from the steel mill come to the adult basic education classes, and a much larger percentage of students than in the past remain in the program. Mr. Dickens concludes that this improved participation occurred because of his new counseling technique. What is the most obvious threat to the internal validity of Mr. Dickens' conclusion that this new counseling program is effective?

a. Maturation.

b. Pretesting.

c. Social-psychological factors.

d. History.

e. Instrumentation.

6. How do the threats to internal validity operate in order to weaken a conclusion about a treatment?

a. They make it impossible to determine whether it was the treatment or the threat which caused the outcome observed after a treatment has been administered.

b. They make it impossible to implement the treatment as fully as the experimenter would like to see it implemented.

c. They make it difficult to apply appropriate statistical analyses to the results of an experiment.

d. They make it impossible to assign the subjects to experimental and control groups to examine the soundness of conclusions.

e. They provide unique features which make it impossible to generalize the results of an experiment to anything beyond that experimental situation.

 

POSSIBLE PROBLEMS AND SOLUTIONS

  1. Confusion regarding what constitutes a pretest. Some students think that a formal test has to take place. Actually, any data collection that occurs prior to the treatment may be regarded as a pretest.

  2. Confusion regarding what constitutes maturation. Many students equate maturation with the developmental changes that are studied in child psychology. While these developmental changes do constitute maturation, the term refers to any change which occurs as the natural result of the passage of time and which could influence the performance of the subjects with regard to the outcome variables. Thus, what is normally called fatigue may sometimes be considered maturation.

  3. Confusion regarding what constitutes history. The term history refers to any event that occurs while the treatment is in progress that could influence the performance of the subjects with regard to the outcome variables. Thus, if "things happen differently" during the treatment, these "things" may constitute a threat to internal validity. Note, however, that if the different event is the occurrence of a pretest, this threat is referred to as pretesting, not history. Likewise, if the different event is an alteration in the way data are collected, this threat is referred to as instrumentation, not history.

  4. Confusion regarding what constitutes selection bias. Many students equate selection with volunteering. While volunteering is often an example of selection bias, the term refers to the possibility that the subjects in the experimental group may somehow differ from what is expected even before they receive the treatment. Thus, volunteering is just one potential source of selection bias. The two best ways to control selection bias are (a) to randomly assign subjects from a larger pool to experimental and control conditions (discussed in Chapter 11), and (b) to thoroughly pretest the subjects to ascertain the degree to which they already possess the outcome variable (if there is only one group) or to compare the experimental and control groups (if there are multiple groups).

  5. Confusion regarding what constitutes statistical regression. The simple fact that subjects display extremely high or low performance on an initial data collection process does not in itself create a setting where statistical regression is likely to occur. If an entire group is selected for an experiment and that entire group scores low (or high), no predictable pattern of regression is likely to occur. When an entire group that scored extremely low is retested, the most likely prediction for that group on the retest is another extremely low score. Regression becomes a problem only when the subjects are selected on the basis of their extreme scores to receive the treatment.

  6. Confusion of pretesting and instrumentation. The similarity is that they both deal with the data collection process. The difference is that pretesting may give the false impression that the treatment has been effective by sensitizing the subjects to the subject matter of the data collection process, so that they either actually improve their performance as a result of this sensitization or at least appear to have improved. Instrumentation, on the other hand, may give the false impression that the treatment has been effective by actually changing the data collection process in some way.

  7. Confusion of instability and statistical regression. The similarity is that they arise because data collection processes in education are not perfectly reliable. Instability refers to the idea that the score of the entire group, when retested, may shift up or down, because of the unreliability of the data collection process. Statistical regression refers to the idea that subsets of the entire group (those selected on the basis of extreme performance) are likely to shift toward the mean when retested.

  8. Confusion of history with the treatment itself. Students occasionally believe that since the treatment itself occurs as a historical event, it poses a threat of history. This is not true. The treatment is an event that is supposed to influence the outcome variable. History consists of extraneous events that also influence the outcome variable and therefore make it difficult to ascertain the actual impact of the treatment.

  9. Confusion of threats to internal validity with similar threats to external validity. This confusion occurs with regard to pretesting, selection bias, expectancy, and history, since all of these can influence either internal or external validity. This confusion is unlikely to arise until you have read Chapter 15, so the problem will be discussed there.
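The distinction drawn in items 5 and 7 can be checked with a small simulation. This is a sketch under assumed conditions, not a procedure from the textbook: each observed score is modeled as a stable true score plus independent random measurement error, and the group sizes, score scale, and error size are hypothetical values chosen only for illustration. No treatment is given between the pretest and the retest, so any change is an artifact.

```python
import random
import statistics


def simulate(n_students=6000, n_selected=1000, true_mean=50.0, true_sd=10.0,
             error_sd=5.0, seed=1):
    """Pretest and retest one group with NO treatment in between.

    Assumption (for illustration only): observed score = stable true score
    + independent random measurement error on each administration.
    Returns (whole-group mean shift, subgroup pretest mean, subgroup retest mean).
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_students)]
    pretest = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    retest = [t + rng.gauss(0.0, error_sd) for t in true_scores]

    # Instability: the mean of the ENTIRE group drifts only by chance.
    whole_group_shift = statistics.mean(retest) - statistics.mean(pretest)

    # Statistical regression: choose the subgroup with the lowest PRETEST scores,
    # as a remedial specialist might, and follow the same students on the retest.
    lowest = sorted(range(n_students), key=lambda i: pretest[i])[:n_selected]
    sub_pre = statistics.mean(pretest[i] for i in lowest)
    sub_post = statistics.mean(retest[i] for i in lowest)
    return whole_group_shift, sub_pre, sub_post


if __name__ == "__main__":
    shift, pre, post = simulate()
    print(f"Whole-group mean shift (chance only): {shift:+.2f}")
    print(f"Lowest scorers: pretest {pre:.1f}, retest {post:.1f}, "
          f"apparent gain {post - pre:+.1f} with no treatment")
```

Under these assumptions, the whole group's average moves only slightly and in no predictable direction (instability), while the subgroup chosen for its extremely low pretest scores reliably appears to improve (statistical regression). Setting `error_sd` to zero, which models a perfectly reliable measurement, makes both effects disappear, which is why both threats are tied to the unreliability of data collection.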

 

INTERNAL VALIDITY EXERCISE

1. Mrs. Jones is a remedial reading specialist. She has given reading screening tests to all 60 first graders at Warren Harding Elementary School. She has identified the 10 weakest readers, and now she plans to provide them with her new program of Reading Skills Development. Nine months later, at the end of the school year, Mrs. Jones plans to evaluate her Reading Skills Development Program by retesting all 60 children and seeing if her 10 remedial children improve more than the other 50. Which of the following is most obviously a threat to the internal validity of Mrs. Jones's attempt to evaluate her program?

a. Maturation.

b. Pretesting.

c. Statistical regression.

d. Experimental mortality.

e. History.

2. Mrs. Smith finds that her first graders are having trouble with their basic arithmetic concepts. She therefore tries a new program and plans to evaluate the children at the end of the year to see if they have made improvements. She will consider her program to be successful if the children have mastered a large number of skills at the end of the year which they had not mastered at the beginning of the year. Which of the following is the most obvious threat to Mrs. Smith's evaluation of this program?

a. Maturation.

b. Pretesting.

c. Statistical regression.

d. Experimental mortality.

e. History.

3. Ms. Anderson feels that students can learn to understand Shakespeare's plays better if they read the words while watching the play on the screen rather than merely watching and listening to the soundtrack. To test this hypothesis, she has her first period students watch and listen to the PBS presentation of Hamlet. However, she has her third period class watch the play with the sound turned off and their copies of the play open in front of them. (Both classes have already read the play.) She finds that on her test during the class session after viewing the play, the first and third period classes score about the same. She concludes that there is no additional advantage in hearing the words while viewing the play. Which of the following is the most obvious threat to the internal validity of Ms. Anderson's study?

a. Statistical regression.

b. History.

c. Experimental mortality.

d. Instrumentation.

e. Selection bias.

4. Each year Mr. Brown provides a unit in his physical education class on "The Rules of International Athletic Competition." Since the present year will be an Olympics year, he decides to revise and upgrade this unit. He initiates his program to coincide with the start of the televised portions of the Winter Olympics. On the final exam, he asks his usual ten questions about International Athletic Competition. He finds that the students this year score substantially higher than the students the previous two years on the same questions. He concludes that his new program has been effective and resolves to continue it the next year. Which of the following is the most obvious threat to the internal validity of Mr. Brown's study?

a. Statistical regression.

b. History.

c. Maturation.

d. Instrumentation.

e. Expectancy.

5. Since government officials have predicted an increasing energy shortage, Miss Billingham is planning to teach a unit on energy conservation. In September, she polls all the students to find out their attitudes toward various types of automobiles. She administers her program in November. In April, she polls them again on the same topic. She finds that on the second occasion the students seem to have a much more favorable attitude toward automobiles which conserve energy. She concludes that her energy conservation unit produced this impact. Which of the following is the most obvious threat to the internal validity of Miss Billingham's conclusion?

a. History.

b. Experimental mortality.

c. Pretesting.

d. Maturation.

e. Statistical regression.

6. Mr. Phipps is a basketball coach. He has developed a new strategy for teaching basic skills to his varsity players. These are all reasonably experienced players who should know better, but who sometimes make basic mistakes. Mr. Phipps develops a checklist of essential skills which he feels his players should be able to perform automatically. A student volunteer has agreed to work with the team during the whole season. During the first three practice sessions, Mr. Phipps has this student volunteer (who has never played basketball and knows none of the players) observe the twelve players and indicate whether or not they can perform each of the skills on the checklist. Then Mr. Phipps implements his new strategy. Two months later, he has the same student observe the players again during three more practice sessions. The student rates them on the same checklist. He discovers that they have improved substantially in their mastery of these essential skills. What is the most obvious threat to the internal validity of this study?

a. History.

b. Instrumentation.

c. Pretesting.

d. Experimental mortality.

e. Expectancy.

7. Mrs. Wilson is a teacher of English composition. She has attended a writing workshop at which she has discovered a dramatically new way to teach creative expression. She wants to use the new method in her classes, but the department chairman is skeptical. He says she can depart from the traditional way and use the new method only if she can prove it will work. She receives permission to try it for one semester with one of her classes. Mrs. Wilson is unperturbed, since she is confident the new method will work. She has all four sections of her composition courses write an essay. She reads and grades all the essays on the basis of creative expression. Then she uses the traditional method with three of these sections and the new method with the other one. At the end of the semester, she retests them with another essay. She finds that the students using the traditional method have not improved in creative expression, whereas those using the new method have improved substantially. Her chairman guffaws loudly and tells Mrs. Wilson that he has serious reservations about the internal validity of her study. What is the most obvious threat to the internal validity of Mrs. Wilson's study?

a. History.

b. Experimental mortality.

c. Pretesting.

d. Expectancy.

e. Statistical regression.

 

CROSS-REFERENCES TO OTHER CHAPTERS

Chapter 10 makes reference to the following concepts that are defined and discussed in other chapters. These are listed in the order in which they occur in Chapter 10.

Dependent, independent, and intervening variables (which are the key elements in most of the figures in this chapter) are discussed in Chapter 2.

Experimental design (which is the main strategy for overcoming many of the threats discussed in this chapter) is discussed in Chapters 11 and 12.

The characteristics of volunteers (which often create a problem of selection bias as mentioned on page 223) are discussed on pages 184 and 185.

Interviews (which may cause problems in instrumentation, mentioned on page 227) are discussed in Chapter 7 on pages 133-137. Guidelines for other data collection strategies are also discussed in Chapter 7.

Reliability (which is an important consideration in the discussion of statistical regression and instability) is covered in Chapter 5.

Unobtrusive measurement (which is a solution to the problem of pretesting and is mentioned on page 234) is discussed on pages 142-145.

Tests of significance (which help solve the problem of instability, introduced on page 235) are discussed in Chapter 14.

Selection bias, history, pretesting, and expectancy (which can be threats to both internal and external validity) are discussed in further detail in Chapter 15.

 

EXAMPLES OF IMPORTANT CONCEPTS IN THIS CHAPTER

The textbook chapter contains clear examples of each of the threats to internal validity. The best place to find these is in the review quizzes after each threat. You should try to make clear distinctions between the correct and incorrect answers in each of these exercises. The Internal Validity Exercise in this Workbook provides additional, clear examples.

 

DEFINING KEY TERMS

The following matching exercises focus on the key terms in this chapter. Instead of using them as matching exercises, you may find it effective to try to define each of the terms. The correct answers can be found by checking the answers to the matching exercise.

 

MATCHING EXERCISE "A"

Listed below are several terms that are employed in discussions of research design. Match each term with one of the definitions given below.

a. Experiment.

b. Manipulation.

c. Subject.

d. Treatment.

e. Observation.

f. Pretest.

g. Posttest.

h. Experimental group.

i. Control group.

 

  1. _____ An event or activity which is expected to produce an outcome.

  2. _____ The act of arranging circumstances in such a way that one set of subjects receives the treatment and a different set of subjects receives no treatment or an alternate treatment.

     
  3. _____ An observation or measurement conducted prior to a treatment.

     
  4. _____ An observation or measurement conducted after a treatment has been administered.

     
  5. _____ An attempt to establish a cause-and-effect relationship by administering a treatment to one group and withholding it from another group of subjects.

     
  6. _____ The act of collecting data about the performance of a subject.

     
  7. _____ A person who takes part in an experiment by receiving the treatment.

     
  8. _____ The group of subjects which receives the treatment.

     
  9. _____ The group of subjects from whom the treatment is withheld or who receive an alternate treatment.

 

MATCHING EXERCISE "B"

Listed below are the major threats to internal validity that were discussed in this chapter. Match each term with one of the descriptions given below.

a. History.

b. Selection.

c. Maturation.

d. Pretesting.

e. Statistical regression.

f. Instrumentation.

g. Experimental mortality.

h. Expectancy.

i. Social-psychological threats.

j. Instability.

  1.  _____ A group's performance during a measurement process after a treatment may arise from the selection or composition of the group itself rather than from the treatment.

  2.  _____ Observed differences in an outcome variable could be the result of changes in the data collection process rather than of the treatment itself.

     
  3. _____ Subgroups selected on the basis of extremely high or extremely low scores tend to give a false impression of change by shifting their average toward the mean of the original group on subsequent administrations of the same or related measurement processes.

     
  4. _____ A simultaneous, extraneous event (or combination of events) during the time frame of the experiment other than the treatment is responsible for the observed outcome.

     
  5. _____ The composition of a group may change because of dropouts, and this change gives the false impression that the performance of the members of the group has changed.

     
  6. _____ A change in an outcome variable may occur as a routine result of the passage of time rather than because of the treatment.

     
  7. _____ It might be the reaction to an initial administration of the measurement process rather than the treatment itself which has resulted in the differences observed in the outcome variable.

     
  8. _____ Chance fluctuations in the scores obtained from a measurement process may give a false impression that differences have been found where there are none.

     
  9. _____ Artificial expectations arising from the experimental situation can give the false impression that the treatment has had an impact (or, conversely, that the treatment has had no impact).

     
  10. _____ The dynamics of the social situation surrounding the introduction of the treatment may have the impact of setting up "rival treatments" that compete with the experimental treatment.

 

SUPPLEMENTARY ACTIVITIES

  1. Identify at least one example of a study in the professional literature which you think was seriously lacking in internal validity.

  2. Identify at least one example of a study in the popular media which you think was seriously lacking in internal validity.

  3. Identify at least one situation from your own professional career in which either you or a colleague has drawn a conclusion which was seriously threatened by one of the following threats to internal validity:

    a. History.
    b. Selection.
    c. Maturation.
    d. Pretesting.
    e. Statistical regression.
    f. Instrumentation.
    g. Experimental mortality.
    h. Expectancy.
    i. Instability.
    j. Social-psychological threats.

  4. Identify an example of a study which you could conduct where statistical regression would be a serious threat to the internal validity of that study.

  5. Identify a situation in either the professional or popular literature where one person has criticized the validity of the results obtained by another person. Identify the specific threats to internal validity which are stated in this critique.

  6. Write a letter to the editor of a newspaper pointing out the threat to the internal validity of a conclusion which has been published in that paper.

 

Answers to Quizzes and Exercises

Review Quiz

  1. (c) This is a good statement of the purpose of internal validity. If you chose (d) or (e), that was external validity, which is the topic of Chapter 15.

  2. (a) This is a good statement of the definition of selection bias as a threat to internal validity.

  3. (b) This is a good statement of the definition of history as a threat to internal validity.

  4. (b) The most obvious threat is that the students' scores might improve on the posttest simply because they found out on the pretest what kind of questions they would be asked and adjusted their study skills accordingly. If you thought that statistical regression should have been one of the choices, you were wrong: the entire group performed poorly, and no subgroup was selected on the basis of low performance. In order for (a) maturation to be the correct answer, we would have to have some reason to believe that the students would improve simply as the result of the passage of time.

  5. (d) The layoffs at the steel mill are a historical event that could provide a plausible explanation for the increased willingness to remain in the program.

  6. (a) This is a good paraphrase of how the threats to internal validity work. If you chose (e), that was external validity, which is the topic of Chapter 15.

 

Internal Validity Exercise

  1. (c) Mrs. Jones has first tested the entire group, then selected a subgroup on the basis of their extremely low scores. This subgroup, when retested, is likely to regress toward the mean.

  2. (a) Mrs. Smith should realize that first graders are likely to improve in some math-related skills through ordinary maturation that occurs as time passes during the school year.

  3. (e) She should have done something to determine that the groups were initially equal. As far as Ms. Anderson knows, one group may have been much weaker than the other at the start. If this were the case, the fact that both groups scored about the same at the end of the experiment would indicate that the weaker group had made greater gains.

  4. (b) The Olympics are a historical event that could account for the improved knowledge, even if Mr. Brown's program had no beneficial impact.

  5. (a) or (c) It is likely that news of the impending energy shortage served as a historical event that could have encouraged the students to learn about energy conservation. It is also possible that the pretest (polling the students in September about their attitudes toward various types of automobiles) could have sensitized them to the energy use of different automobiles, and this increased awareness could have led to their more favorable attitudes toward automobiles that conserve energy.

  6. (b) Since the observation strategy was new, it is likely that the student assistant would have been relatively inept at using it during the pretest. If the student assistant simply got better at using the observation strategy by the end of the experiment, this might result in improved scores. Essentially, the instrument would have changed. (The person administering the data collection process is considered to be part of the instrument.)

  7. (d) Mrs. Wilson obviously has a vested interest in seeing the new method work. There is no evidence that she did anything to minimize the chance that her expectations would influence the results, and so expectancy is a serious threat.

 

Matching Exercise "A"

1. d

2. b

3. f

4. g

5. a

6. e

7. c

8. h

9. i

 

Matching Exercise "B"

1. b

2. f

3. e

4. a

5. g

6. c

7. d

8. j

9. h

10. i