
J Chron Dis Vol. 38, No. 7, pp. 543-548. 1985

 

THE "CASE-CONTROL" STUDY: VALID SELECTION OF SUBJECTS

Olli S. Miettinen

School of Public Health, Harvard University, Boston, Massachusetts, U.S.A.

 

INTRODUCTION

Valid selection of subjects in "case-control" studies remains problematic even on the level of principles [1], to say nothing of the difficulties of research practice itself.

As an illustration, consider the hypothetical example of testing the hypothesis that a cause of traveller's diarrhea is the consumption of tequila (a Mexican drink), with cases derived from a hospital in Acapulco, Mexico, over a defined period of time. What might be the proper "control" group?

Lilienfeld, in his textbook on epidemiology [2], expresses the commonly held view that the "control" group should be representative of "the general population" (as to the exposure rate). But the meaning of this is very obscure in the example at hand.

This general concept is made somewhat more restrictive in the recent textbook by Schlesselman, specific to "case-control" studies [3]. He refers to a need, in general, to sample "the target population," defined as "a subset of the general population that is both at risk of the study exposure(s) and the development of the study disease." While, as was noted, the meaning of even "the general population" is already quite unclear in the example at hand, totally mysterious is the concept of its subset that is "at risk" for both tequila use and the development of traveller's diarrhea.

Whatever may be the meanings of those concepts in this example, it is clear that the common practice of using neighbors (who might reside in Boston, Montreal, etc.) as "population controls" would be very inappropriate—grossly exaggerating the causal  relation of the incidence of traveller's diarrhea to the consumption of tequila.

It is the purpose here to propose basic principles of valid selection of subjects in "case-control" studies. They derive from a reassessment of the presumed nature of the "case-control" study.

 

THE ESSENCE OF THE "CASE-CONTROL" STUDY

A commonly held concept of the "case-control" study is that it is the alternative to the cohort study. The distinction is taken to be one of "sampling" in the sense that in a cohort study one "samples" people free of the illness, representing different categories of the causal factor (potential or known) under study, and follows them forward in time to learn about the development of the illness, while in a "case-control" study one selects subjects on the basis of presence or absence of the illness at issue and determines their histories regarding the causal factor. In other words, so goes the thinking, in a cohort study the investigative movement is "from cause to effect" and in a "case-control" study "from effect to cause" [3].

This concept of the essence of "case-control" studies as representing the reverse of cohort studies I believe to be in error. I term it "the 'trohoc' fallacy," using Feinstein's [4] highly descriptive term for the commonly espoused reverse-of-cohort notion.

Almost as erroneous, and misleading, I consider the component notion that in "case-control" studies the concern is to compare cases with non-cases.

Any study—"case-control" or whatever—on the incidence of illness must be based on the incidence experience of a particular population as it moves over time. The study population may be defined by an event experienced by its members, with the membership lasting forever thereafter; or it may be defined by a state, lasting for the duration of that state. The two types of membership criterion define cohorts (i.e. adynamic, or closed populations) and dynamic (open) populations, respectively. They are exemplified, respectively, by the patients in a clinical trial—a cohort defined by the event of enrollment into it—and the catchment population (for a given illness) of a particular hospital—a dynamic population defined by the state that if the illness were to develop, one would go to the hospital. As another illustration of the duality, consider the Framingham Heart Study. The study population is a cohort, defined by the event of enrollment into it, in 1948, once and for all. An alternative to this would have been to follow the population of Framingham residents from 1948 onward—the catchment population for a case registry there—with people entering and exiting the resident status in the course of the study. Thus the term "cohort" refers to one of the two possible types of dynamics for the study population, and the alternative to a cohort study is a dynamic population study—rather than a "case-control" study.

Given a study population's experience over time, or a study base, it is necessary to ascertain the relevant facts about the occurrence of the illness in this experience. One approach to this is to employ a simple census, that is, to ascertain all of the relevant facts on all members of the study population, as is commonplace with populations (cohorts) involved in clinical trials. An alternative to this approach is one of combining census and sampling. First one uses a census of the base population as to outcome—to identify cases. Then a second census is conducted on the cases to ascertain other facts (concerning the determinants, modifiers and confounders) on them. Finally, a sample of the base is used to obtain information of the latter type about it. This alternative to the census approach may be considered the census-sample or case-base approach [5]. It may also be termed the case-referent approach [6], since the study base, which the sample represents, is the direct referent of the empirical pattern of occurrence in the study. On the other hand, the term "case-control" approach is a misnomer, as the base sample is no more a control series than a census of the base (referent) is.

The "case-control" term is, I believe, a reflection of the "trohoc" fallacy of the essence of this type of study. It reflects the misguided notion and practice of comparing cases with noncases in "case-control" studies. If a census of the base were used, then, in a simple situation the concern would be to compare the index rate r1 = c1 / B1 with the reference rate r0 = c0 / B0, ci and Bi denoting the number of cases and the size of base segment represent the i-th category of the determinant. By no means would the natural comparison be between the case series and the base as to their distributions by the determinant. If a sample size b = b1 + b0 is drawn from the base B = B1 + B0 so as to estimate the relative sizes B1 and B0, then r'1 = c1 / b1 and r'0 = c0 / b0 are stochastically proportional to r1 and r0, so that empirical rate ratio is estimable as (c1 / b1) / (c0 / b0). The point is that the contrast between the index and reference categories of the determinant regardless of whether census or sample of the base is used. And the utility of appreciating this lies in its account on the study base—the population experience of which the reference series is to become representative sample (as to the distribution of the determinant).

Summarizing, the "case-control" study is not the alternative, or even an alternative, to the cohort study. A cohort study involves a cohort as its base population, and its alternative is the use of a dynamic population. A "case-control" study involves a census-sample—case-base or case-referent—strategy of ascertaining the facts about the study base, and its alternative is a simple census.

 

DEFINITION OF THE STUDY BASE

With the "case-control" study viewed as a matter of census-sample approach to fact-finding about the study base, valid selection of subjects (case census and base sample) presupposes understanding of the definition of the study base, notably a duality in it. This duality corresponds, roughly, to the common distinction between "population-based" case-referent studies on the side and "hospital-based" ones on the other [3], but it needs clarification. For, in terms of the conceptualization of the essence case-referent studies proposed here, all of them are population-based—having to do with the experience of the base population over the time-frame of the study.

The base may be demarcated a priori—as, for example, the population (dynamic) of a particular metropolitan area over a particular period of calendar time [7]. With such a definition—a primary definition—of the base, the cases of interest are defined, secondarily, as the entirety of cases arising from the base so defined. The challenges are to ascertain those cases on a census basis and to obtain a proper sample of the base itself.

Alternatively, the cases may be defined a priori—as, for example, those appearing in a particular hospital over a particular span of calendar time. Such a case series is, I propose, to be thought of as a census of cases in the corresponding base by definition—with the definition of the base thus secondary to the case selection. The justification for this proposition is the imperative that the case series and the base sample be representative of the same population experience, that is, that they be coherent. With a census ascertainment of cases achieved by definition, the challenge is proper definition of the secondary base inherent in the case enrollment—and, thereupon, its proper sampling.

Given that the case series is the totality of cases in a secondary base, such a base must be the population experience (the entirety of it) in which each potential case, had it occurred, would have been included in the case series. The introductory example serves as an illustration of this. For the cases of traveller's diarrhea identified in an Acapulco hospital over a particular span of time the corresponding base is the experience, over that time span, of the population in which each potential case of traveller's diarrhea, had it occurred, would have appeared in the hospital and would have been enrolled in the case series. In other words, it is the experience of the hospital's catchment population for traveller's diarrhea over the period of case accrual. This definition of the base makes it obvious that neighbors and siblings are unlikely to be even members of the study base (secondary), let alone representative of it (as to tequila use).

Among actual studies, exceptionally illustrative is the International Agranulocytosis and Aplastic Anemia Study [8]. Enrolled are two types of case—the one that is admitted to hospital because of the disease, and that which develops during hospitalization. For the cases of the former type the corresponding population experience (study base) is that of the catchment population (for agranulocytosis and aplastic anemia) of the participating hospitals over the period of case enrollment—an out-of-hospital population experience. By contrast, for the cases developing during hospitalization the base is the in-hospital experience of all patients in whom such development would have been diagnosed in the participating hospitals over the study period, regardless of the admission diagnosis. Thus, for the two types of case, the base samples must be representative of the hospitals' catchment populations and of their monitored patients, respectively. These statements imply that in that study the primary commitment was to hospital-identified cases, and that the challenges were to properly define, and then to properly sample, the corresponding base experiences, with those definitions secondary to the means of case enrollment. This could have been the study plan, consonant with the goal of assessing rate ratios but not absolute rates. In point of fact, the goal was to assess absolute rates. Therefore, the base experiences were made quantifiable in absolute terms by the use of primary definitions for them—the experience of the population of West Berlin over the study period, for example. Thus the challenge was to identify all relevant cases of the two illnesses arising from those population experiences, and monitoring of all hospitals in the areas involved was judged to afford a reasonable approximation to the desired census ascertainment of such cases.

It deserves particular note that even if the base population is of the primary type, it by no means needs to be "the general population" (whatever may be the meaning of this phrase). This principle is well ingrained in the clinical trial paradigm. In these trials one studies the incidence of health outcomes in relation to treatment—not in the "general population" nor the "general patient population" but in the particular type of patient enrolled in the trial. It accords, as well, with the outlook in laboratory science: nobody demands that the study population be representative of "the general rat population" or its counterpart in some other species; what matters is that one is clear on what particular type of population it is.

It is worthy of further note that the definition of the study base, be it primary or secondary, does not restrict the population to people "at risk" for the "exposure" (index category of the determinant under study) nor to those "at risk" for the illness at issue. If everyone in the study population were predestined to their status of "exposure" or "nonexposure," this would be no worse than "self-selection" in the context of being "at risk," especially if predestination were based on randomization (by the Lord). As for the risk of the illness, there is no imperative to have the study population consist of individuals of nonzero risk. The point is, instead, that the inclusion of people of known zero risk is a matter of waste and/or obfuscation.

Finally, the concept and definition of the study base are critical to the understanding of whether the case series should be representative of all cases [2,4], or whether instead this demand is "misplaced" [9]. As long as "all cases" means all cases arising from the study base, this quality is inherent in the census of cases—census ascertainment in the context of a primary base, and census-by-definition with a secondary one.

 

VALID SELECTION OF SUBJECTS

As was noted above, the overriding principle of subject selection in case-referent studies is that the case series and referent sample be representative of the same base experience.

When the base is defined in primary terms, the challenges are, as was noted, to obtain a census of cases in it together with a sample of the base itself, with the latter representative of the base as to its distribution by the determinant(s) of interest. The particulars of complete case ascertainment are matters of procedural technics etc., and the attainment of a valid sample of such a base is a matter of sampling theory in general. Both of these topics are outside the scope of the presentation here.

When the base definition is secondary to case selection, and the case series thus valid by definition, the challenge is valid sampling of the base. This sampling is guided by the definition of the secondary base. Thus, as a first example, for the population experience (population-time) formed by people who would have come to the Acapulco hospital had traveller's diarrhea occurred in the time period of case accrual, an appropriate sample is constituted by people who did come there due to a condition which is known to be interchangeable with traveller's diarrhea as a reason for ending up in the Acapulco hospital, and whose occurrence is known to be unrelated to the use of tequila. Similarly, as a second example, for cases of agranulocytosis appearing in the study hospitals because of this disease, a suitable sample of the base (in respect to recent drug use) is patients who did come to those hospitals for some other acute condition which is known to be referred to the hospitals in the same circumstances (of geographic location etc.) as agranulocytosis is, and whose occurrence is known to be unrelated to the use of the drugs at issue. Neighbors, when not travelling, are likely to be members of this base population during the study period, but their histories of recent drug use are not representative of the study base if the neighbors' histories are taken under circumstances that tend to be atypical in terms of recent drug use. In particular, if the history of recent exposure is taken as of the time of home interview, arranged at the neighbor's convenience, the history of the use of analgesics hours or days earlier is unlikely to be typical of such time segments in the total period of the neighbor's membership in the study population: he/she may be available at a particular time because a condition leading to analgesic use is present at that time, or because it is absent. As a third and final example, for cases of agranulocytosis developing in the participating hospitals (after admission), the reference series must be selected from the in-hospital population, without regard for the reason for hospitalization—in sharp contrast to the reference series for the cases who were hospitalized with, and because of, agranulocytosis.

As has been implied, in the selection of a hospital reference series for cases incident outside of the registry, the need is to be able to defend the diagnostic inclusions, not exclusions. The criteria for admissible diagnostic entities, given above, have to do with diagnostic entities as they represent the reason for entry into the hospital rather than incidental conditions [9]. In order that such a condition (primary diagnosis) be, as a reason for hospitalization, similar to the illness under study, it is commonly important to choose diagnostic entities for which hospitalization is equally elective, or obligatory, as for the illness under study. In respect to incidental (secondary) diagnoses, the admissibility (non-exclusion) criteria for the reference subjects and the cases must be the same, as these series represent, respectively, the study base itself and the cases that develop in it.

If the cases are identified from a registry of deaths, then the selection of reference subjects must be guided by principles completely analogous to those guiding the selection of hospital reference subjects for cases identified from a hospital. The main principle is that for the cases (of death) so identified the corresponding secondary base is the catchment population of the registry for those deaths—the experience of an out-of-registry, living population. Thus, a proper sample of it is not, generally, deaths from all other causes indiscriminately but, insofar as a registry series is to be used at all, deaths from causes whose occurrence is unrelated to the determinant under study [11].

Thus far, in the context of a secondary base, the focus has been on the selection of the reference series—on the premise that the case series is valid by definition. It is important to note, however, that the feasibility of finding a proper base sample depends on how the case series is defined, because the definition of the secondary base is inherent in case selection. Consequently, the attainment of the cardinal condition of validity—coherence of the numerator (case) and denominator (reference) series in terms of the population experiences they represent—may be enhanced by care in the definition of the case series itself. In particular, it is commonly helpful to restrict case (and, secondarily, reference subject) admissibility according to area of residence. The more the admissibility is restricted to the vicinity of the source of cases (hospital, say), the more likely it is that the corresponding reference subjects, had they developed the illness under study, would have presented themselves to the source, thus ensuring, at least, that they are members of the base population. This, in turn, enhances the likelihood that the reference series is representative of the base.

Whatever has been said here about representativeness refers to the distribution of the determinant(s) conditional on subject characteristics controlled in the analysis of the study. Thus, if the reference series is unrepresentative of the age—and thereby determinant—distribution of the base at large (merely by virtue of being a hospital series or as a result of matching by age), the imperatives of representativeness (concerning distribution of the determinant in the base) may still be satisfied conditionally on age; and with age controlled in the analysis, validity is maintained. The requirement of conditional validity here is analogous to that of conditional simple random sampling in the context of stratified sampling.
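A hedged sketch of such a conditional analysis follows (Python, invented numbers): stratum-specific rate ratios are computed within age strata and then pooled; the Mantel-Haenszel form is used here merely as one familiar pooling device, not as a procedure prescribed in this paper.

# Hypothetical age-stratified case-referent data: within each stratum the base
# sample is assumed representative of the base's determinant distribution.
# Tuples: (cases exposed, cases unexposed, base sample exposed, base sample unexposed).
strata = [
    (20, 10, 50, 100),   # younger stratum
    (15, 30, 20, 120),   # older stratum
]

for c1, c0, b1, b0 in strata:
    print("stratum-specific rate ratio:", (c1 / b1) / (c0 / b0))   # 4.0 and 3.0

# Mantel-Haenszel-type summary across strata (age controlled in the analysis).
num = sum(c1 * b0 / (c1 + c0 + b1 + b0) for c1, c0, b1, b0 in strata)
den = sum(c0 * b1 / (c1 + c0 + b1 + b0) for c1, c0, b1, b0 in strata)
print("summary rate ratio, conditional on age:", num / den)        # about 3.5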

The issue of validity in the selection of subjects into a case-referent study is not simply a matter of coherence between the numerator (case) and denominator (reference) series as to what they represent—as census and sample, respectively. To be coupled with the imperative of representativeness of the same base is that of comparability, unrelated to representativeness. The issue is comparability as to the accuracy of information on the determinant under study. For example, if the cases are interviewed in a hospital and their history-giving is influenced by this setting and/or the illness per se, it is necessary to assure, by selection or otherwise, similar influences for the reference series as well. A prime example of this comparability requirement is the study of the etiology of a congenital malformation—suggesting that the reference series be a series of other malformation(s) rather than of well babies [12].

 

SUMMARY

For valid selection of subjects in "case-control" (case-referent) studies it is critical to understand that these studies do not represent an alternative to cohort studies but, rather, to census-ascertainment of the facts about the study base. Specifically, in these studies, the fact-finding scheme is to obtain a census of the study base with respect to outcome, and then a census of the cases together with a sample of the base to gather information on the determinant(s) as well as modifiers and confounders. If the base is defined a priori (primary base), then the challenge is to devise a scheme to obtain a census of the cases in it and a sample of the base itself ("control" or reference series) that is representative of it, conditional on the covariates that will be controlled in the analysis of the data. On the other hand, if the definition of the base is secondary, a corollary of the way the cases are selected, then the case series is best viewed as the totality of the cases in the base, a matter of definition. The corresponding secondary base is the population experience in which each potential case, had it occurred, would have been included in the case series. Representative sampling of a secondary base tends to call for the use of subjects coming to the source of cases because of other conditions—conditions whose occurrence is known to be unrelated to the determinant under study and whose diagnosis and referral to source are known to have the same relation to the determinant as those of the illness under study. With both types of base, primary and secondary, the accuracy of the information on the determinant should be comparable between the case and reference series, and this requirement of comparability, just as that of representativeness, can have important implications for the selection of the study subjects.

 

REFERENCES

1. Ibrahim MA (Ed): Case-Control Study: Consensus and controversy. J Chron Dis 32: 1-90, 1979

2. Lilienfeld AM: Foundations of Epidemiology. New York: Oxford University Press, 1976. 283 pp

3. Schlesselman JJ: Case-Control Studies: Design, Conduct, Analysis. New York: Oxford University Press, 1982. 354 pp

4. Feinstein AR: Clinical biostatistics XX. The epidemiologic trohoc, the ablative risk ratio, and "retrospective" research. Clin Pharmac Ther 14: 291-307, 1973

5. Miettinen OS: Design options in epidemiologic research. An update. Scand J Work Envir Health 8 (suppl. 1): 159-168, 1982

6. Miettinen OS: Estimability and estimation in case-referent studies. Am J Epid 102: 226-235, 1976

7. Cole PT, Monson RR, Haning H et al: Smoking and cancer of the lower urinary tract. N Engl J Med 284: 129-134, 1971

8. The International Agranulocytosis and Aplastic Anemia Study. The design of a study of the drug etiology of agranulocytosis and aplastic anemia. Eur J Clin Pharmac 24: 833-836, 1983

9. Jick H, Vessey MP: Case-Control studies in the evaluation of drug-induced illness. Am J Epid 107: 1-5, 1978

10. Horwitz RI, Feinstein AR: Alternative analytic methods for case-control studies of endometrial cancer. N Engl J Med 299: 1089-1094, 1978

11. Wang JD, Miettinen OS: Occupational mortality studies. Principles of validity. Scand J Work Envir Health 8: 153-158, 1982

12. MacMahon B, Pugh TF: Epidemiology. Principles and Methods. Boston: Little, Brown. 1970. 376 pp


 

J Clin Epidemiol Vol. 41, No. 8, pp. 709-713, 1988

Editors' Note—The controversy in this issue contains three parts. The first two parts represent separate dissents, by Miettinen and by Greenland and Morgenstern, from views previously published by Kramer and Boivin in this journal. The third part contains a response by the original authors. As noted in the accompanying editorial, the dispute involves a fundamental paradigm in epidemiologic research. Readers are invited to add their comments for publication in future issues.

 

STRIVING TO DECONFOUND THE FUNDAMENTALS OF EPIDEMIOLOGIC STUDY DESIGN

Olli S. Miettinen

Department of Epidemiology and Biostatistics, Faculty of Medicine, McGill University, Montreal, Quebec and Department of Theory of Medicine, Faculty of Medicine, Free University, Amsterdam, The Netherlands

 

Abstract—The fundamentals of epidemiologic study design have remained a matter of confusion. Most authors still see the main design options to consist of the "cohort" study and the "case-control" study, augmented by the "cross-sectional" study. Others regard these as options only with respect to the perceived "directionality" dimension of design decisions. Few have come to appreciate that, realistically, there are no options as to directionality in the usual sense of "following forward" vs "investigating backward", or in the related sense of "inferential reasoning" being "from cause to effect" vs "from effect to cause". Related to this, few appreciate that the perceived duality of options constituted by "sampling by exposure" and "sampling by outcome" is, similarly, but an illusion. Old illusions like these confound the discernment of even those who, today, strive to deconfound the fundamentals of epidemiologic study design.

 

1. CONFOUNDING AND A VISION FOR DECONFOUNDING

Kramer and Boivin, in an article that just appeared [1], start out with the observation that

Despite their common usage and general acceptance, existing classifications of research designs for epidemiologic studies are inconsistent and confusing.

They document their contention by references to a dozen or so eminent authors on the subject. The main source of the confusion, Kramer and Boivin write, is

the conceptual 'confounding' of three different aspects of research design: (1) directionality, (2) sample selection, and (3) timing... Directionality refers to the order in which exposure and outcome are investigated...: forward, from exposure to outcome; backward, from outcome to exposure; or simultaneously... Sample Selection pertains to the criteria used to choose study subjects; it can be based on exposure, outcome, or other criteria. Timing concerns the relation between the time of the study proper and the calendar times of exposure and outcome: historical (exposure and outcome both occurred prior to the study); concurrent (exposure and outcome are contemporaneous with the investigation); or mixed timing.

The directionality aspect of design they take to involve the options of the "cohort", the "case-control" and the "cross-sectional" study:

We define a cohort study as a study in which subjects are followed forward from exposure to outcome. By definition, study subjects are free of the outcome at the time exposure begins. Inferential reasoning is from cause to effect... In case-control studies, the directionality is the reverse of that of cohort studies. Study subjects are investigated backwards from outcome to exposure, and the reasoning is from effect to cause. In cross-sectional studies . . .

Given this revelation, Kramer and Boivin set out to deconfound "the three distinct methodologic concepts... by organizing these concepts into a unified, 'unconfounded' classification of epidemiologic study design."

 

2. RECONFOUNDING AND REJECTION OF THE VISION

Now that Kramer and Boivin are on the move "toward an 'unconfounded' classification of epidemiologic research design"—accompanied by their half-a-dozen acknowledged consultants, and by approving referees besides—I find myself in a position to suggest what is likely to be in store for them beyond the station they have now reached: reconfounding and, finally, rejection of the vision.

Their future I surmise on the basis of my own past.

I was at their current stage of "unconfounded-ness" in the early 1970's. Viewing epidemiologic study design as a matter of "attempting to optimize a number of component decisions", I wrote [2, p. 3/1], in 1975, that

these component choices have to do primarily with the time-directionality of the inquiry in the sense of whether, upon admissibility, the interest is in subsequent, preceding or simultaneous experiences; the timing of the noted phenomena in relation to the outset of the research project;...

More specifically, as for the key issue of directionality, I wrote [2, p. 3/2]:

While the process under study has, of course, a forward directionality in time, the inquiry in research, insofar as it does take account of the duration of the process, moves either forward (follow-up study), backward (background or case history study) or both ways (ambidirectional study) in time. On the other hand, the duration may not be allowed for, the inquiry being momentary in time (cross-sectional study).

Subsequent experience has shown me, and may well continue to show to Kramer and Boivin, that this kind of attempt at the attainment of "a unified, 'unconfounded' classification" is well received. Writes Greenland [3], in 1986:

For me and my colleagues, Miettinen's unpublished course text of the early 1970's was the first systematic introduction to theoretical epidemiology ... This text embodied landmark epidemiologic thinking, if only because it synthesized so many previously disconnected theoretical developments (many of them Miettinen's) into a coherent, unitary theory.

Reception aside, if Kramer's and Boivin's journey toward "unconfounded" fundamentals for epidemiologic study design will continue as a replication of mine, they are headed for a period of reconfoundedness: in 1985 I wrote about my early course texts [4, p. viii]:

Although various colleagues kept urging their publication, to me those texts seemed somehow fundamentally unsatisfactory. Indeed, for several years that followed I taught without any text. Instead, I felt it necessary to search, even in the class context, for the true essence of epidemiologic research as a foundation for its principles.

In this dark period I expect Kramer and Boivin, if they indeed will come to it, to agonize over their own writings of 1987, especially over the meaning, if any, of their own words echoing traditional tenets under the heading of "directionality" (Section 1).

If they do find themselves reconfounded, they may also end up concluding, as I did, that despite all the traditional commitment to the ideas of "directionality", this entire "axis of classification" must be rejected as something founded on nonsense; in other words, that needed is a completely new point of departure, a new agenda, for epidemiologic study design.

 

3. A NEW DEPARTURE, A NEW AGENDA

In the early 1970's my point of departure in epidemiologic study design was my upbringing in epidemiology and biostatistics: design, just as analysis, was a matter of methodology to me—a matter of the means toward a preset end. As a consequence, my academic eye was myopic and saw nothing but the processes of study. Seen were, in broadest terms, the processes of subject selection, information-acquisition on the subjects and, down the line, processes of data analysis, with the latter subsuming, in the sense of last resort, matters of contrast-formation and control of confounding. So it was clear, in this myopic realm, that "internal validity involves three distinct components: validity of selection, validity of information and validity of comparison" [2, p. 2/10]. With "internal validity" the key goal in study design—added concerns including "external validity", "efficiency" and "size"—the issues of design were such matters of process as "directionality", "timing", "control of confounding", etc. [2, Chap. 3].

In the beginning of the eighties—under the pressure of preparing an Honorary Guest Lecture on epidemiologic study design [5] while still agonizing about the fundamentals—I took the desperate step of moving one step backward and asked myself what I should really mean by "study design" anyway. This made me take notice of the fact that "'design' means, in general, a vision and stipulation of an end result, as in 'the design of a boat', but not (the) scheme of accomplishing (that) end, such as that for the construction of a boat defined by the design (blueprint)" [4, p. 21]. While this definition in terms of end result was at variance with the concept of study design as a matter of means (only), I went on to adopt the synthetic view [4, p. 21] that

Both meanings can be given ... to 'study design' in epidemiologic research. Thus, this term can refer to the plan for the end result in the sense of the type and quantity of empirical information the study is to yield, and to stipulations regarding the process of securing that information.

This new point of departure for conceptualizing the fundamentals of epidemiologic study design led to everything falling in place, rapidly and comfortably, at long last. It was immediately clear that the end result of an epidemiological study is an empirical occurrence relation, expressing an empirical frequency of the occurrence of the (outcome) phenomenon of interest in relation to some determinant(s) of this frequency, conditionally on some extraneous determinants [4, Section 1.3].

This kind of end result has a particular conceptual form; and it has a particular empirical content in terms of that form, naturally.

The form of the end result can be designed in detail, the component agenda being [4, pp. 21-22] the following:

1. The nature of the occurrence relation itself, including:

a. The outcome state(s) or event(s) whose occurrence the study is to address.

b. The parameter in terms of which the population occurrence of that phenomenon is to be characterized in the study.

c. The determinant(s) (potential or known) to which the occurrence parameter will be related.

d. The time relation between the outcome and the determinant status.

e. Modifiers (potential or known) to be considered.

f. Potential confounders on which the empirical relation is to be conditioned (particularly in the context of causal research).

2. The domain of the empirical occurrence relation, that is, the type of situation for which the occurrence relation is to be studied.

As for the empirical content of the end result, only the approach to it is subject to be designed, naturally: what population experience (population-time) is to be captured as to the empirical occurrence, and how. Thus, design decisions have to be made on the following agenda [4, pp. 22-23]:

1. The study base itself (the experience to be captured in the study) in terms of:

(a) Membership in the base population, with the sub-issues of:

(i) Eligibility criteria (which also determine whether the base population is a cohort or a dynamic population).

(ii) Distribution matrix (distribution of the base experience according to the determinants under study, modifiers of the relations, and confounders).

(iii) Size.

(b) Duration of the time period for outcome information, and the timing of this period.

2. The approach to harvesting the information in the study base, in terms of the following:

(a) The sampling design (simple census vs case-referent strategy).

(b) The scheme of information acquisition on members of the sample(s).

 

4. "DIRECTIONALITY": PURGED FROM THE NEW AGENDA

Greenland [3], while deeming the design agenda above to be "probably essential for future theoretical development", asserts that "most of the important concepts can be found elsewhere...".

What he does not say is that some of the concepts that are elsewhere regarded as centrally important are not part of the agenda he lauds.

Worthy of very careful note is the purge of "directionality" from the agenda of design. Thus, purged is the "axis of classification" that conduces to the "cohort" vs "case-control" duality [2, 1]; to the perceived availability of two symmetrical options, the "cohort" and "trohoc" options [3]; in short, to the "trohoc fallacy" [2].

To understand this purge, consider first the design of the conceptual form of the end result, of the empirical occurrence relation. None of the design topics under this broad heading involves the putative choice between "reasoning from cause to effect" and "reasoning from effect to cause" [1], whatever the intended meanings of those phrases may be. Nor does any of them involve a choice between subjects being "followed forward from exposure to outcome" and "investigated backwards from outcome to exposure" [1]. Only, designed is the form of a function which expresses a parameter of (the frequency of) occurrence in relation to some determinant(s), conditionally on some potential confounders [4, pp. 5-10]. In the context of the putative categories of "directionality" it bears noting that the relation is designed to be either longitudinal, with the time-referent of the determinant antecedent to that of the outcome, or cross-sectional, with the time-referent of the determinant simultaneous with that of the outcome [4, Section 2.4, pp. 226-227]. Thus the outlook in the design of an occurrence relation, inherently outcome-centered, is generally retrospective, with the cross-sectional outlook an extreme, special case of this. (Design of the domain of the relation involves none of these issues, naturally.)

In designing the approach to empirical content for the occurrence relation of the designed form, there is no escape from the need to define a study population, either a closed one (a cohort) or an open one (a dynamic population) [4, pp. 48-53]. All populations move, ineluctably, forward in time. Insofar as the concern is with the occurrence of an event (i.e. with incidence), a segment of the study population's forward movement must be captured in the study; in other words, the study base in incidence studies must be longitudinal, always. On the other hand, when the concern is with the occurrence of a state (i.e. with prevalence), a cross-section of the study population's forward movement will do; in other words, the study base in prevalence studies can be cross-sectional [4, p. 56]. Either way, the cases occurring in the study base must be identified and classified according to their determinant histories whenever the occurrence relation is designed to be longitudinal, otherwise according to determinant status. Again, there is no duality of options as to the directionality of "reasoning": incidence studies always require that the study population (not necessarily a cohort!) be followed forward; and studies with a longitudinal occurrence relation always require that information be secured on study subjects backwards from outcome, whether by means of prospective arrangements or retrospectively [4, pp. 84-85].

In short, in epidemiologic study design,

—directionality of reasoning is not involved;

—directionality of the occurrence relation is always backward in time, except when, in the extreme, the relation is cross-sectional;

and

—directionality of the study population's movement is always forward in time, except when, in the extreme, arrested in a population cross-section.

In other words, no choice between forward and backward directionalities is involved.

 

5. "SAMPLE SELECTION" ON THE NEW AGENDA

Since, according to Kramer and Boivin, "The two concepts that have been most commonly confused are directionality and sample selection", the place of the latter, too, on the new agenda requires comment (considering that those authors, their consultants, and their referees are totally oblivious to the new agenda).

The study population in science is not to be construed as a sample of any "target" population—only as one of representatives of a domain of interest [4, pp. 44-47, 107]. It is formed, with a view to suitable distribution of the determinant(s) etc., with either allocation (in experiments) or selectivity within a source population [4, pp. 56-60].

In any case, in the study base that the study population provides, whether longitudinal or cross-sectional, the concern is, always, to identify all of the cases - to obtain the numerators of the empirical rates. As for the rate denominators, the design options are census and sampling of the base [4, pp. 69-73].

The notion of having the options of selecting study subjects "by either exposure or outcome" [1] is but a corollary of the "directionality" fallacy - a fallacy that I, too, suffered in the early 1970's.

 

REFERENCES

1. Kramer MS, Boivin J-F. Toward an "unconfounded" classification of epidemiologic study design. J Chron Dis 1987; 40: 683-688.

2. Miettinen OS. Principles of Epidemiologic Research. Boston: Harvard; 1975 (unpublished course text).

3. Greenland S. Book review: Theoretical Epidemiology: Principles of Occurrence Research in Medicine, by Olli S. Miettinen. New York: Wiley; 1985. 359 pp.

4. Miettinen OS. Theoretical Epidemiology: Principles of Occurrence Research in Medicine. New York: Wiley; 1985.

5. Miettinen OS. Design options in epidemiologic research: an update. Scand J Work Environ Health 1982; 8 (Suppl. 1): 159-168.

 


 

J Clin Epidemiol Vol. 42, No. 4, pp. 325-331, 1989

 

PRINCIPLES OF NONEXPERIMENTAL ASSESSMENT OF EXCESS RISK, WITH SPECIAL REFERENCE TO ADVERSE DRUG REACTIONS

Olli S. Miettinen1 and J. Jaime Caro2

1Department of Epidemiology and Biostatistics, Faculty of Medicine, McGill University, Montreal, Quebec and Department of Theory of Medicine, Faculty of Medicine, Free University, Amsterdam, The Netherlands; 2Division of General Internal Medicine, Royal Victoria Hospital, Montreal, Quebec, Canada

 

Abstract—The risk of a particular kind of adverse reaction to a particular agent must be thought of with reference to the contemplated type of exposure and the type of person potentially exposed, and critically important in the contemplated exposure is its duration. In the general case, a distinction has to be made between risk during exposure and that after its discontinuation. For both of these risks, modifications of their magnitudes especially by previous exposure to the same agent must be considered. These risks are studied by focusing on incidence densities specific to subintervals of the total periods of risk. Assessment of these densities is generally best accomplished by following a (very large) dynamic (open) population, not specified on the basis of exposure. Cases of the adverse event, without regard for their etiology, that occur in this source population over the period of follow-up need to be identified and classified, first as to whether they arose from the study population proper or from the extraneous segment of the source population. Those arising from the study population and characterized by "recent" exposure—implying potential causation by it—need to be classified according to the attained duration of exposure and, where applicable, time since its discontinuation at the time of the inception of the adverse event. An appropriate sample of the source population over its follow-up needs to be obtained and classified in like manner. Such data provide for estimation of the excess incidence-densities in various duration intervals of continuing exposure and also in the risk period after its discontinuation; and these estimates are translatable to estimates of the risks of interest—specific for particular types of both exposure and potentially exposed person.

 

INTRODUCTION

Recent controversy about the International Agranulocytosis and Aplastic Anemia Study (IAAAS) [1-6] brought into focus a general lack of clarity on the principles of nonexperimental assessment of absolute risks for adverse drug reactions. While there is some literature on those principles [7-9], no exposition sufficient for judging the IAAAS or planning for future studies seems to be available. In particular, whereas the "honorary advisory committee responsible for the design of the study" states that employed in the IAAAS was "the standard epidemiological techniques for calculating excess risks" [2], we are unaware of such a technique. The investigators themselves state that "the calculation is straightforward" [3], but their audiences have found it difficult to understand [4,5]. The IAAAS report itself [1] makes methodologic references to nothing but our own writings on the general subject of risk quantification, writings that fall far short of delineating the main principles of nonexperimental assessment of adverse drug reaction risk, calculational and other. The Lancet editorial [5] found even the very measure, or concept, of excess risk in the study "rather odd". What is more, it went on to suggest as a substitute "the number of cases per million defined daily doses (DDD), or per 100,000 packs sold", and the study's advisory board embraced this proposal—which was not new [4]—as "one that we had made ourselves" [2].

The lack of literature about the principles of nonexperimental assessment of adverse drug reaction risk, coupled with high-level confusion about even its most fundamental aspects in reference to the IAAAS, led us to attempt a delineation of those principles, with special reference to research problems of the sort dealt with in the IAAAS. Two features define research situations of this type, neither one of them specific to risks attendant to drug use. First, the adverse event is rare even among the exposed and/or the delay before the adverse reaction is potentially quite long—each of these implying infeasibility of experimental research. Second, the typical duration of the pathogenic process does not differ dramatically from the typical duration of exposure, which makes it necessary to consider the risk both during continued exposure and after discontinuation of exposure.

This research situation we examine under the premise that nonexperimental research is feasible, specifically that precursor or early stages of the adverse event do not lead to changes in relevant exposure [10]—a premise that, incidentally, may not fully obtain with respect to the IAAAS.

 

THE RESEARCH PROBLEM

Scientific assessment of risks attendant to particular exposures is of practical consequence insofar as it bears on decisions about exposure—be they clinical, regulatory or whatever. In clinical practice such decisions have to do with a given course of treatment contemplated for current implementation, usually without much regard for potential future courses of treatment. In this context, the excess risk needs to be thought of in reference to the situation at issue: the particular type of contemplated exposure on the particular type of potentially exposed person. The type of exposure may be defined in terms of concentration/dose or whatever but, inescapably, the contemplated duration of exposure generally bears on its excess risk. The potentially exposed person, in turn, may be characterized with respect to age, contraindications or whatever, but never to be ignored is his previous exposure to the agent. For, previous adverse reaction to the agent tends to constitute a contraindication to re-exposure, whereas extensive previous exposure without the adverse reaction may be evidence of relative non-susceptibility to it.

Risk is the probability that an adverse event—including inception of an adverse state—will occur. Thus, the risk of an adverse reaction is the probability that the adverse event will occur as a reaction to a particular exposure, given that the exposure will take place. Risk in an individual with a given set of characteristics equals the cumulative incidence (theoretical) over the same period of time for populations (cohorts) with those characteristics [11, Appendix 1]. These rates result from incidence densities (expected numbers of cases divided by the respective population-times of follow-up) specific for the component periods involved.
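As a numerical illustration of this relation (Python, invented values), a risk over a multi-interval period can be assembled from the interval-specific incidence densities by the usual exponential formula for cumulative incidence:

import math

# Hypothetical incidence densities (cases per person-year) and durations (years)
# for the component sub-periods of the period over which risk is to be expressed.
subperiods = [(0.002, 1.0), (0.004, 1.0), (0.001, 3.0)]

cumulative_hazard = sum(density * duration for density, duration in subperiods)
risk = 1.0 - math.exp(-cumulative_hazard)   # theoretical cumulative incidence
print(f"risk over the full period: {risk:.4f}")   # about 0.009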

In the context of an adverse event that is rare in the absence of the exposure at issue, the risk of its induction by exposure is, in practical terms, equal to the excess risk of the adverse event incurred by an exposed person on account of the exposure—over the time period in which such a reaction can occur*. In other words, the adverse reaction risk can be viewed as the risk difference over that period, contrasting exposure to comparable nonexposure. This practical risk is, more specifically, the absolute excess risk of the adverse event in the exposed as distinct from relative excess risk, the latter being a ratio measure of comparative risk between the exposure and an alternative to it.


*This conceptualization of the absolute risk of adverse reaction as the excess risk involves a subtlety: Let R represent the average risk that the adverse event would occur as an adverse reaction to the exposure, given exposure, in a particular population of potentially exposed people, and let R1 and R0 represent the average risks of the adverse event in this population conditionally on exposure and nonexposure, respectively. Then R1 = R + (1 - R) R0, so that R ≈ R1 - R0—if susceptibility to induction of the adverse event by the exposure is independent of susceptibility to its induction by background causes. In the other extreme, if these susceptibilities are completely correlated, then R1 = max (R, R0) [11, Appendix 3. 12]. Thus, the excess risk among users, R1 - R0, does not really equal the adverse reaction risk, R, but it commonly is, nevertheless, the quantity of practical interest.
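The footnote's approximation is easy to verify numerically; the values below are invented, chosen only so that the background risk R0 is small relative to the adverse-reaction risk R:

# Check of the footnote's algebra under independent susceptibilities.
R  = 1e-4    # risk of the adverse event as a reaction to the exposure
R0 = 1e-5    # background risk of the adverse event, absent the exposure

R1 = R + (1 - R) * R0        # risk among the exposed (independence assumed)
print(R1 - R0)               # ~9.999e-05, i.e. approximately R
print(max(R, R0))            # R1 in the other extreme: completely correlated susceptibilities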


In quantification of excess risks for adverse reactions, it is necessary to consider, in broadest terms, two component periods: that during continued exposure and that after its discontinuation. When the contemplated exposure is of very long duration in relation to the duration of the pathogenic process (that from biological precipitation to the clinically evident adverse event), only the risk during continuation of exposure really matters; and when the contemplated exposure is of very short duration in this same sense, the risk after discontinuation of exposure represents practically the entire risk. In the intermediate, general situation, exemplified by agranulocytosis as an adverse reaction to common types of analgesic use, both components need to be considered expressly. The excess incidence (risk) density is not constant over time. The during-continuation excess density might initially increase with the duration of exposure but then decrease, due to depletion of susceptibles. The after-discontinuation excess incidence density decreases in time to zero, and thus the after-discontinuation excess risk is confined, as a practical matter, to a period singular in duration, independent of the duration of the antecedent exposure. Thus a limited, operational period of potential excess risk after discontinuation of exposure can be defined, making it long enough to encompass practically all adverse reactions occurring after the most recent exposure. On the other hand, if this period is made unduly long, both the efficiency (precision) and the validity of the study tend to be compromised. Since the duration of this period usually cannot be set a priori, due to incomplete knowledge of the distribution of the length of the pathogenic process, if not of the elimination of the agent or its metabolites, various segments of a total period deliberately chosen to be excessive must be considered.

In short, then, the research problem in the general case of nonexperimental assessment of excess risk is to quantify the incidence densities of the adverse event at issue separately for various time intervals of continued exposure and for various time intervals after "recent" discontinuation of exposure—"recent" in the sense of exposure within the defined period of after-discontinuation risk. These quantifications need to be made with allowance for other particulars of the exposure and the person at issue, notably history of previous exposure to the same agent. In addition, the incidence density in the absence of influence of the exposure at issue (no "recent" exposure) needs to be quantified, again with allowance for modification by the same set of person characteristics (except that no modification is expected by exposure before the "recent" period).

Quantification of the incidence densities is generally of concern only in the domain of no previous occurrence of the adverse event in association with the exposure, since history of such an occurrence generally constitutes a contraindication to further exposure. Other restrictions of the domain may be adopted as well.

 

STUDY DESIGN

Upon having designed the form of the risk function under the principles delineated above, it remains to design the terms of gaining empirical content for it.

Study of the incidence densities of interest must involve following a population over time. Despite rather common illusions to the contrary, there is no alternative to this [11, 13-15].

One might think of forming a cohort of exposed persons, each of them entered as of the inception of a particular episode of exposure (perhaps regardless of previous exposure), and of following its members over their defined, operational periods of risk. If such an index cohort were formed, it would be supplemented by a comparable reference cohort of non-exposed persons followed for an arbitrary period of time. Such an approach, especially in the context of nonprescription drugs, would generally be quite impractical, however. The rarity of adverse reactions would tend to require immense sizes for the cohorts. In the context of short-term exposures in conjunction with short induction periods, enrolment independently of the outcome would tend to constitute an unsurmountable problem; and in the context of quite long periods of risk, coverage of the risk period by following an entry cohort [11] would be quite impractical.

The feasibility problems inherent in that quasi-emulation of a clinical trial are alleviated substantially by electing to follow a population whose membership is not defined on the basis of entry or nonentry into exposure (or re-exposure) at a particular point in a subject's life. Any population, whether defined in terms of whatever enrolment event (and, thus, a closed population, a cohort [11]) or whatever state (and, thus, an open, or dynamic, population [11]) may be followed for the purpose. It may be given a primary definition, as when following a national population (dynamic); or the definition may be secondary to the way cases of the adverse event are identified, as when following the catchment population (dynamic) for the adverse event of a set of hospitals [11, 13].

At any given moment in its time course (follow-up), such a source population [11], whether open or closed, consists of three dynamic (open) subpopulations to be distinguished among:

(1) The index population, consisting of those from the study domain who have been "recently" exposed, that is, within the defined period for after-discontinuation risk (and who are, therefore, subject to incurring the adverse reaction at that moment).

(2) The reference population, defined by two criteria beyond those for the study domain:

(a) No "recent" exposure (and, hence, not subject to incurring the adverse reaction—even if subject to incurring the adverse event involved—at that moment).

(b) Comparability with the index population in terms of extraneous determinants of the manifest adverse event risk (notably contraindications, exposure to other etiologic agents, and determinants of the detection of the adverse event).

(3) The remaining, extraneous population.

The index and reference populations constitute the study population [11] at that moment; and it may be worth noting again that this population is open (dynamic) over time even when the source population is closed (a cohort). Also to be noted is that the index population consists, at any given moment, of sub-populations defined in terms of the phase of the risk associated with exposure—in broadest terms of people still exposed and those who have "recently" discontinued exposure, with both of these subpopulations composed of further subpopulations according to the attained duration of the current or "recent" exposure. Finally, the reference population, for it to be comparable, need not be identical to the index population in terms of extraneous indicators of risk, as long as it is similar to it and accurate data on these characteristics are recorded (for control of partial confounding in the analysis stage of the study).

As the source population is followed over time, the first-order concern in the context of a primary base is to identify all cases of the adverse event developing in it during the follow-up; but, failing this, it is essential that case detection remain independent of exposure status. Each case needs to be classified as to its origin with respect to the subpopulations of the source population, outlined above. When the source population is defined secondarily to case identification, all cases are identified as a matter of definition [11, 13], but again, the process that brings the cases to attention must be independent of exposure status.

For the purpose of quantifying the (person-time) denominators of the densities of interest, a sample of the study base (the study population over the period of follow-up [11, 13]) needs to be drawn. The principles of such sampling (even with special reference to the IAAAS) have been delineated elsewhere [11, 13], and they are not replicated here. Suffice it to make a couple of remarks on the important topic of matching, remarks that illustrate the distinction between the source population's follow-up on one level and the study base on another (and have novelty in themselves, to us at least). If individual matching is employed in the selection of the base sample, such sampling should be prompted only by those cases which derive from the study population itself (excluding cases arising from the extraneous segment of the source population). Just as the initial series of cases is identified from the source population at large for practical reasons, so too are the case-prompted samplings directed, operationally, to the source population. In a second stage, the members of the source-population sample should again be classified analogously to the first classification of the cases from it, according to their broadest origins; and again, only those members of the sample that derive from the study base itself should be put to use, in the attainment of a set matching ratio, for example.
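Purely as an illustration of that two-stage restriction, a sketch follows; the classification labels and data structures are hypothetical, not part of the study protocol.

```python
# Two-stage restriction in case-prompted (matched) sampling of the base, as described above.
# Each case and each sampled person carries an 'origin' label with respect to the source
# population's subpopulations: "index", "reference", or "extraneous".

def matching_prompts(cases):
    """Stage 1: only cases arising from the study population itself
    (index or reference origin) prompt matched sampling of the base."""
    return [c for c in cases if c["origin"] in ("index", "reference")]

def usable_base_members(sampled):
    """Stage 2: of the persons drawn from the source population, only those belonging
    to the study base are retained, e.g. toward a set matching ratio."""
    return [s for s in sampled if s["origin"] in ("index", "reference")]
```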

Information about previous exposure (before the inception of the ongoing or "recently" discontinued exposure—the index exposure) is needed for all index subjects (cases and members of the base sample derived from the index base). Such exposure is to be assessed as of the time of inception of the index exposure (even if this did not take place "recently"). How best to characterize the entirety of previous exposure, especially in its temporal aspects, needs to be judged in terms that are specific to the adverse event and, perhaps, to the type of exposure as well.

Information on other potential modifiers of the magnitude of the excess risk (modifiers other than previous exposure), and on potential confounders, needs to be secured on both the cases and on members of the base sample arising from the study base itself, if not on all subjects.

 

DATA ANALYSIS

Layout of data

In the usual situation in which the index and reference segments of the study base have different distributions according to extraneous determinants of risk for the adverse event (confounding), and also when appreciable modification of the excess risk (by age or whatever) is suspected, it is necessary, in tabular analysis, to form essentially homogeneous strata according to the confounders/modifiers. For the k-th stratum the data layout and notation might be those in Table 1. The table implies, among other things, that subjects from the index population are classified first according to the category of duration of attained exposure (index i) and further, within it, according to the category of duration of discontinuation (index j), with continuing exposure represented by the first category of the latter (j = 0).
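Purely as an illustration, the cross-classification that Table 1 implies might be represented as in the following sketch; the stratum, the categories and the counts are hypothetical.

```python
# Hypothetical sketch of the stratified data layout implied by Table 1.
# For each stratum k (defined by confounders/modifiers), cases (c) and base-sample
# members (b) are cross-classified by attained duration of exposure (i) and by
# duration of discontinuation (j), with j = 0 denoting continuing exposure;
# the reference category (no "recent" exposure) is kept separately.
stratum_k = {
    "index": {
        # (i, j): {"cases": c_(ij)k, "base": b_(ij)k}
        (1, 0): {"cases": 4, "base": 120},  # 1st duration interval, still exposed
        (1, 1): {"cases": 2, "base": 60},   # 1st duration interval, 1st post-discontinuation interval
        (2, 0): {"cases": 6, "base": 90},
    },
    "reference": {"cases": 10, "base": 900},  # c_0k, b_0k
}
```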

Overall incidence-density in the source population

For the source population's overall incidence density (ID) of the adverse event in the k-th stratum, the numerator (number of cases) is c*k (Table 1)—if the ascertainment was complete.

The corresponding denominator (T*k, population-time of follow-up) may not be available from outside sources of information. But if the source population was, as would be usual, dynamic, and a simple representative sample of it was obtained, then T*k may be estimated as (b*k / b*) T*, where b* is the size of the total sample and T* the total population-time of the source population's follow-up. On the other hand, if, as is usual, a dynamic source population was sampled in a manner that leads to non-representativeness of the sample regarding the source population's distribution by age etc. over the strata (as with sampling through cases of other illnesses [11, 13], or with matching), then it is necessary to make use of outside information as to the sizes of the stratum-specific populations or as to the corresponding overall rates (ID*k) themselves.
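As a minimal numerical sketch of that proportional allocation under representative sampling (all figures hypothetical):

```python
def stratum_person_time(b_k: float, b_total: float, T_total: float) -> float:
    """Estimate T*_k as (b*_k / b*) T* under representative sampling of the source base."""
    return (b_k / b_total) * T_total

# Hypothetical illustration: 900 of 3000 sampled base members fall in stratum k,
# and the source population's follow-up amounts to 50,000 person-years in all.
T_k = stratum_person_time(b_k=900, b_total=3000, T_total=50_000)  # 15,000 person-years
```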

Reference incidence-density

Given the stratum-specific overall incidence density (ID*k), and the presumption that the exposure-classified cases and members of the base sample are (stochastically) representative of all cases (c*k in number) and the entire source-base sample (b*k in size) within the stratum, respectively, the stratum-specific rate (incidence density) in the reference population is estimable as

id0k = [(c0k / b0k) / (c*k / b*k)] (id*k)    (1)

In this formulation, c0k / b0k and c*k / b*k are (stochastically) proportional to ID0k and ID*k, respectively, so that their ratio is an estimate of ID0k / ID*k ; multiplication by id*k then provides an estimate of ID0k. An alternative formulation, applicable in the context of a representative sample (of size b*) of the source population's follow-up (of size T*), is

id0k = (c0k / b0k) (b* / T*)    (2)

If it is also realistic to take the reference population to be the entire non-index segment of the source population—which may be quite exceptional [11, 13-15]—then an alternative formulation for id0k is

id0k = (1 - efk) (id*k)    (3)

where efk represents the estimate of the etiologic fraction [11, 16]—the proportion of cases representing adverse reaction to the exposure—in stratum k, i.e.,

efk = [(idrk - 1) / (idrk)] [(ck - c0k) / ck]    (4)

with idrk = [(ck - c0k) / (bk - b0k)] / (c0k / b0k), the estimate of incidence density ratio contrasting the entire index experience to the reference experience in stratum k [17]. The proportion of cases arising from the index population is taken as (ck - c0k) / ck on the premise, noted above, that there is no extraneous segment in the source population (so that c'k = b'k = 0).
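As a numerical aid only, the two routes to the reference density, equations (1) and (3)-(4), might be coded as in the following sketch; the function names and all counts are hypothetical, and the overall density id*k is taken as given.

```python
def id0_eq1(c0, b0, c_star, b_star, id_star):
    """Equation (1): reference incidence density from case/base-sample quasi-rates."""
    return ((c0 / b0) / (c_star / b_star)) * id_star

def id0_eq3(c, c0, b, b0, id_star):
    """Equations (3)-(4): reference density via the etiologic fraction, on the premise
    that the reference population is the entire non-index segment (no extraneous segment)."""
    idr = ((c - c0) / (b - b0)) / (c0 / b0)     # incidence density ratio, index vs reference
    ef = ((idr - 1.0) / idr) * ((c - c0) / c)   # etiologic fraction among all cases
    return (1.0 - ef) * id_star

# Hypothetical stratum: 22 cases and 1170 base-sample members in all, of which 10 cases
# and 900 base members are in the reference category; the overall density id*_k is taken
# as 2 per 10,000 person-years (2.0e-4 per person-year).
print(id0_eq1(c0=10, b0=900, c_star=22, b_star=1170, id_star=2.0e-4))
print(id0_eq3(c=22, c0=10, b=1170, b0=900, id_star=2.0e-4))  # agrees when no extraneous segment
```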

Index incidence-densities and excess risk

For incidence density in the (i,j) subcategory of exposure, the stratum-specific estimate is

id(ij)k = [(c(ij)k / b(ij)k) / (c0k / b0k)] (id0k) = (idr(ij)k) (id0k)    (5)

where idr again represents the estimate of incidence density ratio (cf. equation 1).

The excess risk, or risk difference, over the i-th interval of duration, RD(i0), then, has the stratum-specific estimate of

rd(i0)k ≈ (idr(i0)k - 1) (id0k) Di    (6)

where Di represents the length of the i-th sub-interval of duration of exposure. (If the risks were not low it would be necessary to avoid the approximation and derive the index risk estimate as r(i0)k = 1 - exp [- ∑i (id(i0)k) Di] and also to use an analogous formulation for the reference estimate, thereupon taking rd(i0)k = r(i0)k - r0k.) In the context of a representative sample of the source base, the estimate may be taken as

rd(i0)k ≈ (c(i0)k / b(i0)k - c0k / b0k) (b* / T*) Di    (7)

(cf. equation 2). Naturally, if T*k is known from outside sources, then even a nonrepresentative sample of the source base at large, as long as it is representative within the strata, provides for taking

rd(i0)k ≈ (c(i0)k / b(i0)k - c0k / b0k) (b*k / T*k) Di    (8)
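A minimal sketch of the stratum-specific computations in equations (5)-(8), together with the exact (non-low-risk) risk formula noted parenthetically above; the function names and the numbers are hypothetical.

```python
import math

def idr(c_i, b_i, c0, b0):
    """Quasi-rate ratio: (c/b) in an index subcategory vs the reference category."""
    return (c_i / b_i) / (c0 / b0)

def id_index(c_i, b_i, c0, b0, id0):
    """Equation (5): incidence density in an index subcategory of exposure."""
    return idr(c_i, b_i, c0, b0) * id0

def rd_interval(c_i, b_i, c0, b0, id0, D_i):
    """Equation (6): approximate excess risk over a duration interval of length D_i."""
    return (idr(c_i, b_i, c0, b0) - 1.0) * id0 * D_i

def risk_exact(id_by_interval, D_by_interval):
    """Exact risk over successive intervals, 1 - exp(-sum(ID_i * D_i)),
    for use when the component risks are not low."""
    return 1.0 - math.exp(-sum(d * t for d, t in zip(id_by_interval, D_by_interval)))

# Hypothetical use, with id0 per person-year and a 0.25-year duration interval:
rd = rd_interval(c_i=6, b_i=90, c0=10, b0=900, id0=1.2e-4, D_i=0.25)
```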

The corresponding overall estimate for the i-th interval of duration, given presumed constancy of RD(i0) over the strata, can be taken as

rd(i0) ≈ {[∑k (c(i0)k b0k - c0k b(i0)k) / (b(i0)k + b0k)] / [∑k b(i0)k b0k / (b(i0)k + b0k)]} (b* / T*) Di    (9)

 

in the context of a representative sample of the source base at large. The first part of this expression is a pooled estimate of case vs base-sample (c / b) odds difference, analogous to an estimate that Mantel and Haenszel [18] gave for the corresponding ratio. This difference of quasi-rates (quasi-incidence-densities) is translated to actual incidence-density difference by multiplication by b* / T* and to risk difference by further multiplication by the duration of the risk period Di. If the sample of the source base is representative within the strata only, then the corresponding estimate is

rd(i0) ≈ {∑k [(c(i0)k b0k - c0k b(i0)k) / (b(i0)k + b0k)] (b*k / T*k)} / {∑k b(i0)k b0k / (b(i0)k + b0k)} Di    (10)
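The pooled computation described in the text, a Mantel-Haenszel-type pooled difference of case vs base-sample quasi-rates translated by b*/T* and by the interval length, might be sketched as follows; the particular stratum weights shown are one reading of equation (9), and the input counts are hypothetical.

```python
def pooled_rd(strata, b_star, T_star, D_i):
    """Overall excess-risk estimate for duration interval i (cf. equation 9):
    a Mantel-Haenszel-type pooled difference of case/base-sample quasi-rates,
    translated to an incidence-density difference by b*/T* and to a risk
    difference by the interval length D_i.
    Each stratum is a tuple (c_i, b_i, c0, b0)."""
    num = sum((c_i * b0 - c0 * b_i) / (b_i + b0) for c_i, b_i, c0, b0 in strata)
    den = sum((b_i * b0) / (b_i + b0) for c_i, b_i, c0, b0 in strata)
    return (num / den) * (b_star / T_star) * D_i

# Hypothetical two strata; with a single stratum the expression reduces to equation (7).
estimate = pooled_rd([(6, 90, 10, 900), (3, 60, 4, 500)], b_star=3000, T_star=50_000, D_i=0.25)
```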

 

For the excess risk during continued exposure from its inception through interval i, the estimate may then be taken as the sum of the estimates, rd(i0), for the intervals involved (with the summation again justified by the smallness of the component risks; cf. context of equation 6). By the same token, for exposure to be terminated within the i-th category of duration, only a fraction—one half perhaps—of the last interval-specific estimate is to be employed.

The incidence density for period j after "recent" discontinuation of exposure in category i of attained duration of use in stratum k (ID(i,j)k, j > 0), has an estimate analogous to that given for ID(i0)k in equation (5). Similarly, estimates of the excess risk in subperiod j after discontinuation of exposure after various durations of attained exposure are arrived at in analogy with equations (6-10); and for the excess risk over the entire after-discontinuation period, after a given duration of attained exposure, the estimate is again the sum of the components involved—over the period of significant excess in after-discontinuation incidence density of the adverse event.

The total excess risk associated with any potential exposure is, as was noted, approximately equal to the sum of the risks during and after the exposure, and an estimate of this is obtained, naturally, as the sum of the component estimates.

Epilogue

The presentation of data analysis here has been focused on point estimation in the context of tabular presentation (Table 1) of the data, because understanding of this topic is fundamental to everything else in the analysis of data on adverse reactions. A null chi square (χ20) may be derived in the manner [18] that is usual for stratified data—even if recast elsewhere [11, p. 144]—contrasting the entire index category to the reference category.

Analysis in terms of regression models is beyond the scope of the presentation here.

Acknowledgement—Thoughtful critique of the draft manuscript by Dr A. L. Gould is gratefully acknowledged, along with contributions by the Journal's referees and Editor (ARF).

 

REFERENCES

1. The International Agranulocytosis and Aplastic Anemia Study. Risks of agranulocytosis and aplastic anemia. A first report of their relation to drug use with special reference to analgesics. JAMA 1986; 256: 1749-1757.

2. Doll R, Lunde PKM, Moeschlin S. Analgesics, agranulocytosis and aplastic anemia. Lancet 1987; Jan 10: 101.

3. Levy M, Shapiro S. Safety of dipyrone. Letter to the Editor. Lancet 1986; Nov 7: 1033-1034.

4. Anonymous. Dipyrone hearing by the German Drug Authority. Lancet 1986; Sept 27: 737-738.

5. Editorial. Analgesics, agranulocytosis, and aplastic anemia: a major case-control study. Lancet 1986; Oct 18: 899-900.

6. Kramer MS, Lane DA, Hutchinson TA. Analgesic use, blood dyscrasias, and case-control pharmacoepidemiology. A critique of the International Agranulocytosis and Aplastic Anemia Study. J Chron Dis 1987; 40: 1073-1081.

7. Jick H, Vessey MP. Case-control studies in the evaluation of drug-induced illness. Am J Epidemiol 1978; 107: 1-7.

8. Mann JI. Principles and pitfalls in drug epidemiology. In: Inman WHW, Ed. Monitoring for Drug Safety. Lancaster: MTP Press Ltd; 1986: 443-458.

9. Finney DJ. Statistical logic in the monitoring of reactions to therapeutic drugs. Ibid; 423-441.

10. Strom BL, Miettinen OS, Melmon KL. Post-marketing studies of drug efficacy: how? Am J Med 1984; 77: 703-708.

11. Miettinen OS. Theoretical Epidemiology. Principles of Occurrence Research in Medicine. New York: Wiley; 1985.

12. Miettinen OS. Causal and preventive interdependence. Elementary principles. Scand J Work Environ Health 1982; 8: 159-168.

13. Miettinen OS. The "case-control" study: valid selection of subjects. J Chron Dis 1985; 38: 543-548.

14. Miettinen OS. Striving to deconfound the fundamentals of epidemiologic study design. J Clin Epidemiol 1988; 41: 709-713.

15. Miettinen OS. The clinical trial as a paradigm for epidemiologic research. J Clin Epidemiol 1989; in press.

16. Miettinen OS. Proportion of disease prevented by a given exposure, trait or intervention. Am J Epidemiol 1974; 99: 325-332.

17. Miettinen OS. Estimability and estimation in case-referent studies. Am J Epidemiol 1976; 103: 226-235.

18. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 1959; 22: 719-748.

 


 

J Clin Epidemiol Vol. 42, No. 6, pp. 491-496, 1989

 

THE CLINICAL TRIAL AS A PARADIGM FOR EPIDEMIOLOGIC RESEARCH

Olli S. Miettinen

Department of Epidemiology and Biostatistics, McGill University, Montreal, Quebec, Canada and Free University, Amsterdam, The Netherlands

 

Abstract—The extent to which the clinical trial serves, and fails, as a paradigm for epidemiologic research in general is examined. It is argued, first, that the traditional paradigms—investigating epidemic and endemic occurrence of illness in the context of public-health activities, inclusive of the deployment of census, vital and morbidity statistics and sample surveys—are misleading for scientific research. Major examples of the consequences of these paradigms are the preoccupations with time and place, and with "the general population" or some other "target population"—both alien from the vantage of clinical trials and, indeed, of science in general. Then it is shown, by the use of the clinical trial paradigm, that traditional epidemiologic thought and practice in cause-effect research are misguided in such common respects as the use of empirical contrasts between exposure and unspecified nonexposure, the employment of "representative" distributions of determinants, and even the belief that cohort and "case-control" studies constitute alternatives to each other. On the other hand, it is argued that for etiologic research the ordinary (parallel) clinical trial is misleading as a paradigm, especially for learning about the essential temporal aspects of the cause-effect relation.

 

INTRODUCTION

In my teaching of theoretical epidemiology [1], the clinical trial has become a very important point of reference. For some central purposes it serves as the supreme paradigm, and for others, I need to caution my students about its inadequacies as a model. As for other teachers and authors, it is my impression that some [2, inter alia] have a tendency to make too much of it as a paradigm for nonexperimental epidemiologic research, while others [3, inter alia] fail to draw certain important lessons from the theory and practice of clinical trials. The issues are important and subtle, and the variation in views calls for an attempt at a systematic exposition of them.

As is well known, clinical trials are experimental studies on the effects of interventive agents (typically drugs) or interventions (treatments) themselves in clinical medicine. Regardless of whether the conceptual focus is on effects of interventive agents (explanatory trial [4]) or of treatments themselves (pragmatic trial [4]), empirically a clinical trial relates the occurrence of some outcome phenomenon to the categories of a treatment—conditionally on various extraneous determinants of the occurrence, or potential confounders. The interest in occurrence is directed, inherently, to the frequencies of occurrence of various categories of the phenomenon at issue, even if in the context of a quantitative phenomenon the frequency distribution may be characterized by some overall descriptive parameter, such as the mean. The relation of the outcome parameter to the determinant (treatment), so long as it is conditional on the entire set of potential confounders, unknown as well as known, can be given a causal interpretation. The requisite conditionality is pursued by randomization, possibly augmented by blocking and/or control in the analysis of the data.

The essence of epidemiologic research has remained less well understood. In recent years, I have come to the view and conviction that such research should be defined as occurrence research [1, p. vii], that is, research in which the occurrence of a phenomenon is related to some (potential) determinant(s) of that occurrence— commonly conditionally on some other determinants (potential confounders) [1, pp. 12-16].

These definitions imply that clinical trials are epidemiologic studies. This, coupled with the relatively advanced state of development of the theory and practice of clinical trials in comparison with epidemiologic research at large, implies the importance of seeking to understand the extent to which the clinical trial serves as a paradigm for nonexperimental epidemiologic research.

 

TRADITIONAL PARADIGMS

For a perspective on the role of the clinical trial as a paradigm for nonexperimental epidemiologic research, cognizance of the traditional paradigms is relevant. A general—and authoritative—sense of these may be gleaned from Lilienfeld's Foundations of Epidemiology [3].

With concern for occurrence of illness and related phenomena in "general" human populations, and with particular concern for etiology and prevention on the community level, the traditional epidemiologist relates rates of occurrence to time, place and personal characteristics; and the latter, while subsuming demographic, social, economic, behavioral and biologic ones, do not include clinical actions [3, pp. 1-2]. Given this traditional separation of epidemiology from clinical medicine, it is no wonder that the clinical trial does not, to the traditionalist, have any particular status as a paradigm for epidemiologic research at large.

Outstanding among the traditional paradigms of epidemiologic research is the investigation of epidemics, and a particularly eminent single investigation in this genre is John Snow's investigation of cholera epidemics in London in 1848-1854 [3, pp. 24-25]. (In witness to this, a picture of the epidemiologically famous Broad Street water pump adorns the cover of Lilienfeld's book, inter alia.)

Another major tradition that today's epidemiology still draws from is that of census, vital, and morbidity statistics on a national scale [3, pp. 22-23, Chaps 4-5], supplemented by morbidity surveys [3, pp. 115-116].

With deference to these traditions, the traditionalist takes the broad strategy of epidemiologic (etiologic) inquiry to consist of two phases: first, relating occurrence rates in "general" populations to characteristics of those populations in the aggregate, and second, relating individual health outcomes to characteristics of individuals [3, pp. 13-15]. (These two types of inquiry have been termed "descriptive" and "analytic" epidemiology, respectively—in defiance, or ignorance, of general principles of science. Properly, the "analytic method" in science is one in which complex phenomena are "resolved into more simple concepts or general principles so that there will be a transition from the particular to the general" [5]; and in any case, "descriptive" and "analytic" do not properly denote alternatives to each other, because meaningful description requires "penetrating analysis" [6]. On the other hand, descriptive problems contrast with ones that are inferential as to causation [1, pp. 11-12].)

The success of John Snow, and others, in applying "the epidemiologic method" to etiologic problems led to the emergence of epidemiology as a "study" [3, p. 1] defined by its concern for occurrence of illness and related phenomena in human populations in the community—with special reference to etiology but without any restriction as to subject-matter (as to the type of health phenomenon or etiologic agent), and with a strong commitment to the methodologic traditions delineated above. Even though concern for etiologic, or occurrence, aspects of phenomena cannot properly define a science, on account of the diversity of subject-matter (cf. morphology), epidemiology—in the subject-matter sense—is viewed by the traditionalist as a field, and even as a science. This outlook is manifest in professorships, societies, journals etc. devoted to "epidemiology". (That the methodology in epidemiologic research be both scientific and singular is not, philosophically, a proper basis for regarding epidemiology as a science [7].)

The methodologic traditions of epidemiology, apart from having given rise to a malformed field of "study" (in the subject-matter sense), are themselves quite inadequate for the purposes of etiologic research itself, to say nothing about the other realms of occurrence research. Some of these shortcomings will come to focus in examining, in the sections below, the utility of the clinical trial as a paradigm for occurrence research.

 

TIME AND PLACE

"The epidemiologist is interested in the occurrence of disease by time, place and persons" [3, p. 1], even though such particularism of interest (spatio-temporal specificity) is generally alien in the context of clinical trials and, indeed, in science in general [7]. In clinical trials, traditional epidemiologists' contrary belief [3, pp. 223-224] notwithstanding, there usually is no particularistic "target population" of interest—no population to be "sampled" and to be treated as the referent, or target, of sample-to-population inference. Instead, clinical trials usually address abstract (nonparticularistic) questions—referring to abstract domains, denned not by time and place but by type (as to indication, age and other person-characteristics) —in the spirit of science in general [7].

What the clinical trial paradigm teaches, here, is that to the extent that in etiologic research the concern is, as it usually is, merely with a cause-effect relation of etiologic interest, the particularistic outlook should give way to the abstract one. On the other hand, actual, ultimate questions of etiology, having to do with causal explanation of cases that actually have occurred [1, p. 326], involve an added component issue, namely realization or distribution of the determinant and of modifiers of its effect—a particularistic issue [1, pp. 255-256]. Thus, whatever may be the cause-effect relation between smoking and the incidence/risk of lung cancer (an abstract issue), the etiologic role of smoking is nil in cases, or in times and places, characterized by absence of smoking (a particularistic characterization of the realization or distribution of the determinant).

When the concern is with an etiologically relevant cause-effect relation (abstract) rather than an actual etiologic problem (particularistic), adoption of the clinical trial paradigm as to the irrelevance of time and place leads to major benefits in terms of both validity and efficiency of research, as illustrated in the sections below. These benefits may be viewed as resulting from liberation from the traditional preoccupation with (particularistic) "general populations" or other "target populations" and the associated survey outlook in epidemiologic research. That preoccupation and outlook is natural when epidemiologists are officers of public health and, thus, concerned with community diagnosis about occurrence, epidemic or endemic; but, as has been noted, it is alien to science [7].

In clinical trials, as in all scientific (abstract) occurrence research, there is an expressly designed, particularistic study population, formed within an explicitly selected source population. The study population represents the domain (abstract) of the study [1, pp. 44-45] instead of "the general population" or any other particularistic "target population"; it is not construed as a sample of any population [1, Chap. 3]. The clinical scientist's outlook is in accord with that of the laboratory scientist, who does not dream of, or pursue, sampling of the "general" rat population in any community of rats.

 

DETERMINANT SCALE AND CONTRAST

The traditional epidemiologist, preoccupied with "the general population", commonly studies the effect of a potential etiologic agent by contrasting the exposed segment of that (or a related) "target population" with the remainder in it, that is, by contrasting exposure with nonexposure, with the latter unspecified apart from the inherent absence of exposure [3, Chaps 8-9].

In a clinical trial concerned with the effect of an agent (explanatory trial [4]), treatment with the agent is contrasted with comparable treatment without the agent (placebo treatment), not with mere absence of the treatment involving the agent. This is understood to be necessary for isolating, empirically, the effect of the agent from that of the extraneous aspects of the treatment with the agent. Only the theoretical contrast is, in its simplest terms, one between presence and absence of the agent [1, p. 30].

The lesson to be gleaned from this, cardinal, feature of the clinical trial paradigm is that, in reference to any loosely defined source population in causality-oriented nonexperimental research, the common two-point scale (exposure vs nonexposure, say) for the determinant in empirical terms is generally indefensible. In the source population there are, at any given moment, people representing the empirical index category of the determinant, embodying the agent at issue (cf. treatment with agent); and there are, of course, people not representing the index category—commonly the vast majority.

Of the latter, non-index, segment of the source population, some represent the reference category of the empirical scale of the determinant, characterized by comparability of extraneous effects with the empirical index category of the determinant [1, pp. 30-31, cf. treatment with placebo]; the remainder—commonly the vast majority—fall in the extraneous, "other" category of the determinant's empirical scale (cf. falling outside the realm of the clinical trial). With this trichotomy pertaining to the source population, the study population must be viewed as a subpopulation of it, consisting of representatives of the index and reference categories but not of the extraneous category, the clinical-trial paradigm being very clear on this. Thus, in proper nonexperimental research on cause-effect relations the study population generally comprises only a small subsegment of the source population (to say nothing about "the general population"—whatever the definition of the latter may be) [1, pp. 29-36, 218-227]. A monument to the still common failure to appreciate the distinction between the non-index range of the empirical scale of the determinant and a proper reference category, as a subsegment of this range, is the concept and problem of "the healthy worker effect"—arising from a contrast between an index population and "the general population" as the reference population [1, pp. 32-33].

That the need to appreciate the essence of the experimental contrast as a paradigm for non-experimental epidemiologic research is not felt is, perhaps, the prime example of the "double standards" [8] still prevalent between the experimental and nonexperimental modalities of cause-effect research—with major implications for the validity of the latter.

 

DETERMINANT DISTRIBUTION

Inherent in the traditional contrast-formation in epidemiologic study design is a propensity to allow the distribution of the source ("target") population to determine the distribution of the determinant(s) in the study population itself [3, Chap. 9]. The famous Framingham Heart Study is an example, and to some perhaps even a paradigm [3, pp. 199-201], of this: the study population was formed as a representative sample of the source population, even when selectivity according to various determinants of risk would have been feasible on the basis of the data from the baseline survey. The practice is rooted in the notion that "the general population" or some other "very large population" is an ideal "target" of generalization, and that an ideal study population is derived by probability sampling of such a "target" population [9]. (This outlook, a derivative of the traditional paradigms of epidemiologic research noted above, again reflects defiance, or ignorance, of established principles of science, specifically as to the essence and foundations of scientific generalization [1, pp. 47, 108-109].)

Theory and practice of clinical trials (and of laboratory research) are to the effect that deliberate choice of the distribution (allocation) of experimental units among the compared categories of the determinant is free of any negative influence on validity yet of great relevance for efficiency of study—with equal allocation optimal in the context of any single contrast with equal unit costs between the categories [1, pp. 55-60].

That no allocation occurs in nonexperimental research (by definition) is not a rational basis for not following the clinical trial paradigm. Determinant-selective admission into the study population, within any source population, remains perfectly feasible in most instances [1, pp. 30, 57], and failure to take advantage of this is inexcusable in the context of expensive follow-up. An example of this error of design, much more startling than the Framingham Heart Study, or even the subsequent Collaborative Perinatal Project [10], is a truly mammoth cohort study just recently initiated in China [11].

 

BASIC TYPES OF STUDY

The most central piece of traditional dogma in epidemiologic research is that there are, in broadest terms, two fundamental types of epidemiologic study (with individuals as the units of observation): the "cohort study", contrasting persons in different categories of a determinant as to subsequent occurrence of (categories of) an outcome phenomenon, and the "case-control study" contrasting persons in different categories of an outcome phenomenon as to an antecedent distribution of a determinant [3, Chaps 8-9].

In an attempt to understand this, consider the clinical trial: the concern is, always, to contrast categories of treatment as to subsequent occurrence of (categories of) some outcome phenomenon, whereas comparing patients in different categories of outcome as to the antecedent distribution of treatment is uninteresting if not downright perverse [2, 12].

That the "case-control" outlook, as defined, has no place in clinical trials suggests that it has no place in cause-effect research in general. And indeed it does not: in all research (applied) on the causation of a phenomenon, a sufficient, and indeed the only proper, concern is to compare the occurrence of categories of this outcome phenomenon among categories of the determinant at issue [1,2, 12, 13]. To this end one must always aim at classifying all members of an explicit study population as to the outcome phenomenon and, in the context of a binary outcome, always classify the identified cases according to their histories of the determinant— to obtain the numerators of the empirical outcome rates for the compared categories of the determinant. And with equal force, one must always assess the sizes of the corresponding rate denominators in the entire study population, that is, classify its members (over the follow-up) according to the determinant, analogously to the classification of the cases. The latter may be done in absolute (census) or relative (sampling) terms; but either way, the proper concern is to contrast the determinant categories as to outcome rates themselves (involving census denominators) or the corresponding quasi-rates (involving sample denominators); the concern is not to contrast the determinant distributions of the cases and the study population, nor of cases and noncases.

In short, the clinical trial paradigm shows that the notion of a "case-control" study, as defined, is but a fallacy ("trohoc fallacy" [12]) and an illusion [13], and not a true alternative to anything. For the employment of a cohort (closed population) as the study population the true alternative is that of using a dynamic (open) population [1, pp. 48-53]; and, as was observed above, for the census approach to the assessment of rate denominators (in addition to numerators), the alternative is sampling of the study population [12; 1, pp. 69-73]—the census-sample, case-base or case-referent strategy.

Integral to the "case-control" fallacy, to its illusion of a fundamental symmetry in design options, is the traditional notion that the case series and the "control group" of noncases should have identical distributions according to various extraneous determinants of the outcome occurrence or, if they do not, special measures should be taken to avoid bias [3, pp. 167-168].

That this notion is wrong becomes apparent upon examination of the clinical trial paradigm. In a trial with identical age-distributions of subjects between the index (agent) and reference (placebo) treatments, comparison of the index and reference rates without regard for age is not biased on account of age, however strongly age may determine the outcome. By the same token, comparison is also valid, without regard for age, between the corresponding quasi-odds of outcome, for which the case series supplies the numerators and a representative sample (in lieu of census) of the noncases provides the denominators; their ratio is a valid estimate of the actual outcome odds ratio contrasting the two treatments. That the case and non-case series have different distributions by age is, even in the context of unconfounded study base, the expected consequence of age being an extraneous determinant of the outcome rate; the difference is not, in such a situation, a manifestation of confounding, and no cause for remedial action [1, pp. 69-70]. Even if the notion is modified so that the pursuit of identity of distributions between cases and noncases is taken to pertain to predictors of determinant status, or to extraneous outcomes related to the determinant, the presumption of symmetry fails again, as examination of the clinical trial paradigm will readily show.
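A small numerical illustration of that point may help; the trial, its counts and the sampling fraction below are entirely hypothetical.

```python
# Hypothetical trial: 1000 subjects per arm, 100 cases on the index treatment, 50 on the reference.
cases_index, cases_ref = 100, 50
noncases_index, noncases_ref = 900, 950

true_odds_ratio = (cases_index / noncases_index) / (cases_ref / noncases_ref)

# A 10% representative sample of noncases in lieu of a census of them supplies the denominators
# of the quasi-odds; the case series supplies the numerators.
sample_index, sample_ref = 90, 95
quasi_odds_ratio = (cases_index / sample_index) / (cases_ref / sample_ref)

# The ratio of quasi-odds recovers the outcome odds ratio, since sampling is representative
# of noncases regardless of treatment category.
assert abs(quasi_odds_ratio - true_odds_ratio) < 1e-9
```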

 

BASIC OUTLOOK

If there is one outstanding context in which the traditional epidemiologist does emulate the practices in clinical trials, it is the context of the most basic outlook in "etiologic" (generally mere cause-effect) research. He dreams of, and pursues, a cohort obtained as a simple random sample of "the target population", classified as of its formation ("baseline", zero time, t0) according to "exposure" categories and followed into the future so as to detect subsequent cases—with the aim of relating the incidence of their occurrence to the status at the baseline [3, p. 194]. This is exactly what is done in clinical trials, short of any sampling, naturally, even if complexities arise from noncompliance, for example; and yet this is at variance with the proper basic outlook in etiologic research, the traditionalist—and even revisionist [2]— epidemiologic presumption notwithstanding.

The baseline has meaning in clinical trials but not, generally, in nonexperimental cohort studies. Its meaningfulness in clinical trials derives from the built-in artifice that the determinant status undergoes intentional perturbation at the baseline only: as a result of subject selection, the determinant commonly has at t0 a history of constancy; and in any case, the intention is that after this point it is artificially kept constant for each subject in its assigned category. In nonexperimental cohorts the determinant commonly has a history of inconstancy before t0, and in any case the intention is not to keep it constant after t0. In nonexperimental cohort studies it is, therefore, necessary to account, expressly, for the time course of the determinant both before and after the cohort's t0, attributing no special import to the determinant status as of the cohort's t0 ; indeed, even in a clinical trial outcome occurrence is not actually related to determinant status at t0 (which is in transit), but to futuristic determinant allocation at t0.

There is an even larger fallacy in any attempted emulation of clinical trials in etiologic research, in that the ordinary ("parallel") clinical trial is not suited, even in principle, for being a paradigm in this context. Consider a couple of examples. If all people at early age (t0) were assigned to, and thereafter continually maintained at, different but constant levels of blood alcohol, one would never learn that it is the level at the time of accident outcome (accident or no accident) that alone matters in etiologic explanation of cases; that the history does not. Similarly, if people at large, at suitably early age, were assigned to, and maintained at, different but constant levels of daily physical activity, one would never learn that the level hours before the outcome in terms of myocardial infarction or no myocardial infarction matters in the sense of causation (precipitation); that the levels during an antecedent period matter in the sense of prevention; and that yet earlier levels do not matter at all.

As those examples suggest, the proper outlook in etiologic research, with keen appreciation of the relevance of time, is that of anchoring the scale of time, individually, to the moment of outcome classification; of forming intervals of (individual) time backward from that moment; and, then, thinking of, and treating, the realizations of "the" determinant at the various intervals of retrospective time as realizations for interval-specific separate determinants, commonly mutually confounding [1, pp. 33-34, 226-227], of incidence density. The dream, properly, is that those determinants specific to ranges of retrospective time do not exhibit an unmanageable degree of collinearity. Preoccupation with a cohort's t0 [2], as was noted above, is misguided [13] in this context; and the added point was that historical constancy of the determinant as of the time of outcome—inherent in parallel clinical trials—is tantamount to intractable confounding among the separate, time-specific historical determinants of current incidence density.

This inherently retrospective, inherently multi-determinant outlook in meaningful etiologic research, even in reference to a single agent, an outlook in which time is inherently anchored to that of outcome classification, cannot be captured by the use of the (parallel) clinical trial as a paradigm.

Acknowledgements—This work was supported by the EMGO Institute, Free University, Amsterdam. Critique of the manuscript by Dr J. Fleiss is gratefully acknowledged.

 

REFERENCES

1. Miettinen OS. Theoretical Epidemiology. Principles of Occurrence Research in Medicine. New York: Wiley; 1985.

2. Horwitz RI. The experimental paradigm and observational studies of cause-effect relationships in clinical medicine. J Chron Dis 1987; 40: 91-99.

3. Lilienfeld AM. Foundations of Epidemiology. New York: Oxford University Press; 1976.

4. Schwartz D, Lellouch J. Explanatory and pragmatic attitudes in therapeutic trials. J Chron Dis 1967; 20: 637-648.

5. Van Laer PH. Philosophy of Science. New York: The Philosophical Library Press Ltd; 1956: Part I, pp. 72-73.

6. Thompson JA. Introduction to Science. Cambridge, Mass.: The University Press; 1911: 40.

7. Friend JW, Feibleman J. What Science Really Means. London: George Allen Unwin; 1937: 110-111, 149, 179.

8. Feinstein AR, Horwitz RI. Double standards, scientific methods, and epidemiologic research. N Engl J Med 1982; 307: 1611-1617.

9. Moore FE. Committee on design and analysis of studies. Am J Publ Health 1960; 50: 10-19.

10. Heinonen OP, Slone D, Shapiro S. Birth Defects and Drugs. Littleton: Publishing Sciences Group; 1977: Chap 2.

11. Hammond EC, You W-C, Wang L-D. Possible contribution from epidemiologic studies. Environ Health Persp 1983; 48: 107-111.

12. Miettinen OS. The "case-control" study: valid selection of subjects. J Chron Dis 1985; 38: 543-548.

13. Miettinen OS. Striving to deconfound the fundamentals of epidemiologic study design. J Clin Epidemiol 1988; 41: 709-713.

14. Miettinen OS, Caro JJ. Principles of nonexperimental assessment of excess risk, with special reference to adverse drug reactions. J Clin Epidemiol 1989; 42: 325-331.

 


 

J Clin Epidemiol Vol. 42, No. 6, pp. 499-502, 1989

 

UNLEARNED LESSONS FROM CLINICAL TRIALS: A DUALITY OF OUTLOOKS

Olli S. Miettinen

Department of Epidemiology and Biostatistics, McGill University, Montreal, Quebec, Canada and Free University, Amsterdam, The Netherlands

 

Dr Feinstein and I, it turns out, have simultaneously and independently addressed the same topic, namely epidemiologists' failure to learn important lessons from the design of clinical trials for application in nonexperimental epidemiologic research [1,2]. Recognizing this, Dr Feinstein, as Editor of this Journal, suggested that we each comment on the other author's piece to enhance the learning. I readily agreed, because our writings on the same topic are so very different that without efforts to bridge the two the reader may end up quite confused.

That our writings are so different is not, I hasten to underscore, inherently a manifestation of major disagreement. For, our frames of reference are quite different [3,4], which is not to say that they are contradictory. In the adoption of a frame of reference, the concern is to find a framework that is most intelligible, or least confusing—conditionally on it being intellectually tenable [5-7].

I shall examine Dr Feinstein's article from the vantage of my own terms of reference [2,4-6], naturally. My focus in this will be on the lessons, or principles, that Dr Feinstein adduces—without any regard for the extent to which they have remained unlearned for the purposes of nonexperimental epidemiologic research. These principles I shall examine in the sequence of Dr Feinstein's presentation.

In an epilogue I shall set forth the essence of the difference in our outlooks regarding the principles—and thus the lessons—embodied in the design of clinical trials.

Introduction

Of the design-dependent virtues of applied medical science, Dr Feinstein sets out to address validity alone—implicitly to the exclusion of issues of efficiency and relevance.

With reference to validity, Dr Feinstein seems to place his focus on the desideratum that randomization is designed to help deliver, namely avoidance of what he terms "susceptibility bias". This is, in my terms, the desideratum of comparability of populations—a component in the broader concern to assure freedom from intractable confounding [4; pp. 29-36].

Admission criteria

Under the heading of "Admission Criteria", Dr Feinstein addresses those criteria which have to do with the domain of study [4; pp. 44-45] but not those which pertain to the actual study population within that domain [4; pp. 48-56]. He emphasizes the narrowness of the domain in terms of presence of a particular indication and absence of all contra-indications in clinical trials; and he takes the view that this embodies a lesson of import to nonexperimental research—a lesson on an important means of pursuing comparability of populations in the absence of randomization.

This lesson is less than compelling to me. The main reason is that, as Dr Feinstein explains, such a restriction of study domain in clinical trials does not spring from the purpose of pursuing comparability of populations. An added reason is Dr Feinstein's failure to distinguish between studies of intended effects on one side and those of unintended effects on the other.

A study of an intended effect (efficacy) of a therapeutic agent must generally be confined to a specific domain of indication, not for reasons of validity of comparison but in order for the result to be of any meaning even conceptually, let alone practically. Absence of contra-indications, by contrast, is not essential conceptually. In nonexperimental studies of efficacy, of necessity within particular domains of indication, details of the indication tend to constitute an intractable problem of confounding—in the particular sense of incomparability of populations [8, 9, 4; p. 40].

By contrast, studies of unintended effects (whether adverse or beneficial) of therapeutics, in order for them to have conceptual meaning, need not necessarily be confined to any restricted domain in terms of either indications or contra-indications. As for validity of comparison, indication—its presence or absence— has no inherent tendency to be a confounder in these studies, whereas proper contra-indications pertaining to the adverse outcome at issue do represent confounders, inherently [9,4; p. 40]. Thus, in studies of unintended effects, indication can commonly be ignored, whereas relevant contra-indications cannot. As for relevant contra-indications, practical interest, together with availability of clinical experience, suggest restriction of the study domain to their absence [10]. With such a restriction, confounding by the contra-indication is always avoided totally —in sharp contrast to what restriction of domain to presence of indication means in studies of efficacy.

Given these principles of mine, I have to disagree with Dr Feinstein regarding his statement that, as a generic example, "case-control studies of the teratogenic (or oncogenic) hazards of pharmaceutical substances administered during pregnancy", when not involving restriction of the domain according to "therapeutic indications for the pharmaceutical treatment", "will regularly be distorted by susceptibility bias". This statement would be correct only if therapeutic indications during pregnancy regularly were, in themselves, indicative of unusual risks for malformations (or cancers).

Dr Feinstein's distinction, in this context, between studies of "therapeutic agents" on one side and those of "etiologic agents" on the other side is confusing to me. For, as is manifest in his writing (see paragraph above), therapeutic (and preventive) medications can be of concern as etiologic agents. Properly, I suggest, the issue here is whether that which is said about studying effects of intervention in clinical medicine extends to studies of cause-effect relations in general. Logically, the answer must be affirmative—insofar as the nonclinical agent has indications or contra-indications.

The issue of "protopathic bias" is a case in point. As defined and illustrated, it is a matter of confounding by a precursor of the outcome, serving as an indication or contra-indication for the exposure (or imposure). Whereas this phenomenon is the epitome of potential confounding in clinical intervention studies of the nonexperimental sort, Dr Feinstein's example illustrates how, in principle, it can mar non-experimental studies on other topics as well; but it does not illustrate a need to restrict the study domain to presence of indication and absence of contra-indications. The "different example" of "protopathic bias" addresses, alas, quite another issue. While confounding is primarily a matter of the study base, the example illustrates the point that the base must not be sampled by the use of cases of another illness such that their histories as to the determinant are distorted by "protopathic" changes. The issues of correct sampling are very subtle [11,4; pp. 79-83], and nothing about them can be learned from the principles of clinical trials.

The "healthy worker effect" is another case in point, primarily an example of confounding by contra-indication. Again, the solution is not inherently a matter of restricting the study domain to that of the absence of all possible contra-indications, to say nothing about presence of indication. Instead, the relevant lesson from clinical trials is, in my view, that an index occupation, representing the presence of the agent at issue, is to be contrasted to a suitable "placebo" occupation free of that agent yet comparable to the index occupation in respect to extraneous determinants of the occurrence of the outcome at issue [12,4; pp. 30-31].

The "third problem" potentially conducive to incomparability of populations in etiologic research on nonmedicinal agents is not of the form of potential confounding by indication or contra-indications in clinical trials: familiar longevity or any given personality type does not serve as an "indication" or a "contraindication" for etiologic exposures, any more than, say, maleness does.

Performance criteria and decision

Under the heading of "Performance Criteria and Decision", Dr Feinstein turns to the lessons that should be learned from the post-randomization phase of a clinical trial. He emphasizes the experimental principle of making analytic contrasts on the basis of the result of randomization, regardless of what intervention actually took place; and he laments that this principle is "almost totally disregarded in statistical analyses for non-randomization studies", as "people are classified and analysed according to what they actually received, which may or may not be what was initially assigned".

He seems to overlook the fact that the experimental principle, having to do with anchoring comparability of populations on randomization, is inherently inapplicable in nonexperimental studies. In his pursuit of the elusive analogy, Dr Feinstein seemingly equates the result of randomization with that of judgement and, thus, unreasoned experimental "intention-to-treat" with nonexperimental "original reasons for choosing a particular agent"—that is, with the very forces that randomization is designed to neutralize.

Whereas I have probably—and regrettably—failed to understand Dr Feinstein's thinking here, my own outlook on contrasts in intervention research is one of focussing on domains ("nodes") of interventive decision-making and contrasting its associated potential strategies (algorithms) of intervention [4; pp. 35-36]; and the reason for this is the pursuit not of comparability but of relevance. And in non-experimental contexts I worry about the reasons for choosing a particular strategy as confounders. For coping with such confounding, the clinical trial suggests the counterpart of blocking (i.e. matching of the study base [4; p. 60]) together with control in the analysis.

Detection bias

Under "Detection Bias" Dr Feinstein addresses the imperative to secure comparability of outcome information between/among the subpopulations representing the compared categories of the determinant in the study.

Transfer bias

Having already read about the principle of intention-to-treat analysis under the heading of "Performance Criteria and Decisions" (see above), it is confusing to me to find, under the subsequent heading of "Transfer Bias", that "a fourth principle in randomized trials is that the results be analysed and reported for the complete cohort of people originally admitted to the study". Regardless, it remains a mystery to me what the connection is between that principle and the possibility of "transfer bias" or whatever other bias in forming the compared subcohorts. Moreover, to me a cohort is a cohort [4; pp. 48-51], and I know of no reason to think of components in it in terms of its "original" members on one hand and those by which the cohort is "augmented" on the other hand; in fact, I do not even understand what this distinction is. "Decrementing" a cohort, in turn, is unthinkable to me, as a cohort is a closed population by definition [4; p. 48].

The first "example" of "transfer bias" is to me an example of confounding by (severity of) indication for hospitalization. As for appreciating the confounding and coping with it, there is again nothing that the nonexperimental researcher needs to learn from clinical trials. Indeed, while the experimentor has placed reliance—often excessive—on randomization, it has been the nonexperimental researcher who has toiled with the essence of confounding, its detection, and the means, beyond randomization, to control it.

The second "example" is not about confounding. It has to do with studying change (or incidence) over a time span that is too long for coverage by following a cohort from its entry to the period of interest throughout its span. The issues of study design in this context are complex, and the approaches are inherently nonexperimental [4; pp. 62-66]. In these matters, again, there is nothing to be learned from the design of clinical trials, which, inherently, address only the earliest segment of such a long period of interest [13,4; p. 63].

High quality basic data

Aiming at "improving quality of data, rather than preventing biased comparisons" (cf. Detection Bias) is the "last scientific principle in randomized trials" according to Dr Feinstein.

Epilogue

Overall, then, a clinical trial has, according to Dr Feinstein, five features illustrative of generally applicable principles of validity in epidemiologic cause-effect research, as follows:

1. Domain definition according to presence of indication and absence of contraindications.

2. Analysis on an intent-to-treat basis.

3. Detection of outcome events in an unbiased way.

4. Analysis on the basis of the entire cohort.

5. Data acquisition with attention to accuracy.

I have presented disagreement with the principle putatively illustrated by the first feature, and with some of the lessons Dr Feinstein draws from the first, second and fourth feature.

With those five features constituting the scientific essence of a clinical trial from the vantage of validity when viewed through the eye of Dr Feinstein's mind, it remains for me to delineate the validity gestalt of a clinical trial as I see it.

Most broadly, I see ends, and I see means. As for ends, I see [4; p. 30] focus on internal validity; I see this treated more as a matter of comparability than of total accuracy; and I see it broken down to comparability of (1) extraneous effects of the treatments (effects other than those of the agents at issue—in explanatory trials [14]), (2) populations (representing the compared treatments), and (3) observations (between/among those populations). And as for means, I see [4; p. 30] the use of (1) placebo treatment (for end no. 1), (2) randomization, commonly augmented by blocking and increasingly also by control of confounders in data analysis (for end no. 2), and (3) blinding (for means no. 1, end no. 1, and end no. 3). Looking at clinical trials in juxtaposition to (proper) nonexperimental epidemiologic studies, still from the vantage of validity, I see absolutely no difference in the ends, by principle; and I see absolutely unavoidable differences in the means, by definition. In the means, the quintessence of the difference is that where the experimenter engages in manipulation, the nonexperimental researcher is left with the substitute of selection [4; pp. 30, 57]—properly in the sense of selective admissibility within the study domain, and not in the still prevalent, misguided sense of sampling of a set "target population" [2,4; pp. 47, 72-73].

The duality of our outlooks aside, Dr Feinstein and I are of one mind in our fundamental concern to promote adherence to proper scientific standards in nonexperimental epidemiologic research; and in this context I owe Dr Feinstein an expression of my appreciation of the standards in his initiative to draw my critique of his paper. I have offered it from the stance of great respect.

 

REFERENCES

1. Feinstein AR. Epidemiologic analyses of causation: the unlearned scientific lessons of randomized trials. J Clin Epidemiol 1989; 42: 481-489.

2. Miettinen OS. The clinical trial as a paradigm in epidemiologic research. J Clin Epidemiol 1989; 42: 491- 496.

3. Feinstein AR. Clinical Epidemiology. The Architecture of Clinical Research. Philadelphia: WB Saunders; 1985.

4. Miettinen OS. Theoretical Epidemiology. Principles of Occurrence Research in Medicine. New York: John Wiley; 1985.

5. Miettinen OS. Design options in epidemiologic research. Scand J Work Environ Health 1982; 8 (Suppl. 1): 7-14.

6. Miettinen OS. Striving to deconfound the fundamentals of epidemiologic study design. J Clin Epidemiol 1988; 41: 709-713.

7. Greenland S, Morgenstern H. Classification schemes for epidemiologic research designs. J Clin Epidemiol 1988; 41: 715-716.

8. Miettinen OS. Efficacy of therapeutic practice: will epidemiology provide the answers? In: Melmon KL, Ed. Drug Therapeutics: Concepts for Physicians. New York: Elsevier-North Holland; 1980: 201-208.

9. Miettinen OS. The need for randomization in the study of intended effects. Stat Med 1983; 2: 267-271.

10. Miettinen OS, Caro JJ. Principles of risk assessment, with special reference to adverse drug reactions. J Clin Epidemiol 1989; 42: 325-331.

11. Miettinen OS. The "case-control" study: valid selection of subjects. J Chron Dis 1985; 38: 543-548.

12. Wang JD, Miettinen OS. Occupational mortality studies: principles of validity. Scand J Work Environ Health 1982; 8: 153-158.

13. Miettinen OS, Ellison RC, Peckham GJ et al. Overall prognosis as the primary criterion of outcome in a clinical trial. Contr Clin Trials 1983; 4: 227-237.

14. Schwartz D, Lellouch J. Explanatory and pragmatic attitudes in therapeutic trials. J Chron Dis 1967; 20: 637-648.

 

Source: Olli S. Miettinen. Selected articles from the Journal of Chronic Diseases and the Journal of Clinical Epidemiology, 1985-1989.