“Problems of Selecting Experts for Delphi Exercises,” Academy of Management Journal (March 1972), Vol. 15:1, pp. 121-124

 

Gordon Welty

Wright State University

Dayton, OH 45435 USA

 

 

[//121] The Delphi technique of aggregating the forecasts of a number of experts on multidisciplinary issues/1/ is a recent development in long-range forecasting. Consensus is generated by polling the experts sequentially, with each poll followed by feedback to the experts of information on the poll just completed. Meanwhile, anonymity is guaranteed in the polling procedures, so that the social-psychological pressures usually present in committees and other face-to-face group approaches to multidisciplinary issues are avoided./2/ The Delphi technique is presumably more efficient than the usual committee, and efficiency considerations presuppose cost-sensitivity in forecasting exercises.
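The polling-and-feedback cycle just described can be sketched as a loop. This is a minimal illustration under stated assumptions, not a reconstruction of any particular exercise. It assumes numerical estimates (for example, the year by which an event is expected). It also assumes that the feedback consists of the median and interquartile range of the previous round, a common choice in Delphi practice but not one specified here. Finally, it uses a hypothetical revision rule in which each anonymous panelist moves halfway toward the reported median. It requires Python 3.8 or later for `statistics.quantiles`.

```python
from statistics import median, quantiles


def round_feedback(estimates):
    """Summarize one anonymous poll as (lower quartile, median, upper quartile).

    Only these statistics are fed back, so no panelist learns who said what.
    """
    q1, _, q3 = quantiles(estimates, n=4)  # default "exclusive" method
    return q1, median(estimates), q3


def revise(estimates, feedback, pull=0.5):
    """Hypothetical revision rule: each panelist moves a fraction `pull`
    of the way toward the fed-back median. Real panelists need not do this."""
    _, med, _ = feedback
    return [e + pull * (med - e) for e in estimates]


def run_delphi(estimates, rounds=3):
    """Poll, feed back, and re-poll; return the feedback from each round."""
    history = []
    for _ in range(rounds):
        feedback = round_feedback(estimates)
        history.append(feedback)
        estimates = revise(estimates, feedback)
    return history


if __name__ == "__main__":
    # Hypothetical first-round estimates of the year some event will occur.
    first_round = [2000, 2010, 2020, 2030, 2060]
    for i, (q1, med, q3) in enumerate(run_delphi(first_round), start=1):
        print(f"round {i}: median {med:.1f}, interquartile range {q3 - q1:.1f}")
```

With these illustrative numbers, the median stays at 2020.0 while the interquartile range falls from 40.0 to 20.0 to 10.0. That narrowing is the convergence the text calls consensus. The exact halving comes only from the assumed revision rule.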

 

A crucial factor for the Delphi exercise is, clearly, the selection of experts: experts require honoraria and the like, which laymen do not. Only in relatively cost-insensitive forecasting exercises would this not be a crucial consideration. The selection problem has been confronted on two levels.

 

On the one hand, there has been an attempt to distinguish greater from lesser subject-matter expertise within a given group of experts. On the other hand, there has been an attempt to determine, for a given forecasting subject matter, whether expertise is relevant to forecasting. We will briefly note some of the literature on the former consideration and then turn to the latter, presenting empirical findings that bear upon it.

 

While a number of the developers of Delphi have discussed the problem of distinguishing among levels of expertise, it is far from resolved. Brown and Helmer (1964), for example, propose that one's degree of expertise might be ascertained by a self-assessment of one's knowledge of the forecast subject matter. More recently, an attempt to test this self-assessment technique, either for selecting experts or for weighting their opinions, [121/122] yielded equivocal findings (Bender et al., 1969, p. 12). Two British investigators were so dissatisfied with the outcome of self-weighting in a Delphic study that they "would not introduce the complication of self-weighting in any future study" (Catling and Rodgers, 1971, p. 144). Clearly, the problem of differentiating among levels of expertise requires more research. Let us now turn to the problem of distinguishing the classes of judgmental forecasting issues for which expertise is required.
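Self-assessment can enter a Delphi exercise in the two ways just mentioned: as a screen for selecting experts, or as a weight on their opinions. The sketch below shows both mechanically. The `Response` type, the 1-to-5 self-rating scale, the threshold, and the panel data are all hypothetical. The sketch takes no position on whether either use improves forecasts, which is the question the cited studies found equivocal.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One panelist's forecast plus a self-rating of subject-matter knowledge.

    Hypothetical structure: a 1 (unfamiliar) to 5 (expert) self-rating is assumed.
    """
    estimate: float
    self_rating: int


def select_by_self_rating(responses, min_rating):
    """Selection: keep only panelists who rate themselves at least `min_rating`."""
    return [r for r in responses if r.self_rating >= min_rating]


def unweighted_mean(responses):
    """Every panelist's estimate counts equally."""
    return sum(r.estimate for r in responses) / len(responses)


def self_weighted_mean(responses):
    """Weighting: each estimate counts in proportion to its self-rating."""
    total_weight = sum(r.self_rating for r in responses)
    return sum(r.self_rating * r.estimate for r in responses) / total_weight


if __name__ == "__main__":
    # Hypothetical four-member panel: (estimate, self-rating).
    panel = [Response(2.0, 5), Response(3.0, 4), Response(4.0, 1), Response(3.0, 2)]
    print(unweighted_mean(panel))
    print(self_weighted_mean(panel))
    print(unweighted_mean(select_by_self_rating(panel, 4)))
```

For this invented panel, the unweighted mean is 3.0, the self-weighted mean is about 2.67, and the self-selected subpanel (ratings of 4 or more) averages 2.5. The aggregate therefore depends on which device is used. It does not show that either device yields a better forecast.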

 

An important early inquiry into the parameters governing judgmental forecasting exercises was "Project Outcomes." As part of this long-term effort, 900 students and 778 legislators in 7 countries were sequentially polled on their anticipations of cold-war outcomes. The principal investigator, Professor Nehnevajsa, noted that 78.3 percent of all the correlations exceeded 0.76 (1962, p. 12). This strongly suggests that anticipations (or predictions of outcomes) do not differ substantially between the students and the legislators, even though the legislators presumably possess expertise on internationally and domestically held values.

 

Further heuristic evidence that expertise in the prediction of values is nonexistent is contained in Stanley Shively's Project Outcomes report, which states: "There are innumerable minor differences of response between the respondents from the various countries, and … this difference was generally greater than that found between the students and legislators" (1962, p. 56 and Tables 2, 3, 4, 20, 21, and 22). Although the Project Outcomes studies resembled Delphic exercises in many ways, they were not expressly Delphi-type exercises and did not explicitly take into account the possibility of interaction effects. Nevertheless, they do pinpoint the problem under consideration.

 

It is widely acknowledged that the forecasting of values is as important as the more conventional forecasting of technological breakthroughs because, for example, technological efforts depend for their funding upon a complex set of value judgments. Most definitions of technological forecasting make reference to "levels of support," which presuppose a set of values. Witness the travails to which NASA has recently been subjected, and consider the effect of this "reshuffling of national priorities" upon "purely technological" forecasts of the likelihood of putting a man on Mars by the year 2000!

 

In a Delphi-like study of the future of American values (Baier and Rescher, 1969, p. 6), Nicholas Rescher polled a selection of future-oriented individuals, focusing on "high-level scientists and science administrators."/3/ In an earlier study, we replicated a substantial portion of Rescher's research. Although the questions posed were identical to those of Rescher's generic Question 2, and comparable procedures were used throughout, the respondents were radically [122/123] different. Instead of high-status scientists, sophomore engineering students were selected, so that the forecasters' discipline was held relatively constant while their level of expertise varied considerably. Our findings are presented elsewhere; suffice it to say here that they gave no evidence that expertise is relevant to the forecasting of values (Welty, 1971).

 

We have just completed a replication of another substantial portion of Rescher's study. The questions here were identical to those of Rescher's generic Question 3, and comparable procedures were used throughout. In this replication, however, the respondents differed from Rescher's not only in level of expertise but also in academic discipline. Forty-three (predominantly junior) sociology majors constituted the group of lay forecasters.

 

For each of Rescher's 17 items (each representing a value of American culture in the year 2000 A.D.), respondents gave their opinion of the probable change in emphasis on a five-point scale ranging from 1 (greatly increased emphasis) to 5 (greatly decreased emphasis) (Rescher, 1969, p. 144). The mean of each item was computed and compared, by the F-ratio, with the corresponding (rescaled) item mean reported by Rescher. (Computational errors in Rescher's report for Item 10 forced us to exclude that item.) Because the covariance structure of Rescher's data was unavailable, a single multivariate F-ratio for the overall comparison could not be computed. Instead, we computed a univariate F-ratio for each of the remaining 16 items, in the fashion of posterior pairwise analysis.
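For two groups, the univariate F-ratio needs only each group's mean, standard deviation, and size. That is why it can be computed against published summaries when the raw data are unavailable. A worked form follows, under our own assumptions. First, s_1 and s_2 are taken to be sample standard deviations. Second, the expert group size n_1 = 52 is inferred, not stated: the df = 1, 93 reported under Table 1 implies n_1 + n_2 - 2 = 93, and 43 of the respondents were students. The Item 3 ("Material") figures are those listed in Table 1.

```latex
\begin{aligned}
s_p^2 &= \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
F = \frac{n_1 n_2}{n_1 + n_2} \cdot \frac{(\bar{x}_1 - \bar{x}_2)^2}{s_p^2},
\qquad \mathrm{df} = (1,\ n_1 + n_2 - 2) \\[4pt]
\text{Item 3: } s_p^2 &= \frac{51\,(0.79)^2 + 42\,(1.23)^2}{93}
  = \frac{31.8291 + 63.5418}{93} \approx 1.0255 \\
F &= \frac{52 \cdot 43}{95} \cdot \frac{(2.14 - 3.00)^2}{1.0255}
  \approx \frac{23.537 \times 0.7396}{1.0255} \approx 16.98
\end{aligned}
```

With one numerator degree of freedom, F is the square of the pooled two-sample t statistic. F ≈ 16.98 on (1, 93) df therefore corresponds to t ≈ 4.12, well beyond the 0.001 level and consistent with the asterisk on Item 3.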

 

For 14 of the 16 items, we found no significant difference at the 0.05 level between the mean responses of Rescher's respondents and those of the students. Rescher's respondents expected significantly more emphasis on "material" values (Item 3) and significantly less emphasis on "spiritual" values (Item 4) than did the students. Table 1 gives the mean and standard deviation of each item for Rescher's experts and for the students.
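The item-by-item comparison can be recomputed from the summary statistics in Table 1. The sketch below computes each item's two-group F-ratio from means, standard deviations, and group sizes. It then obtains a two-sided p-value from the exact Student's t distribution for integer degrees of freedom, using F = t² for one numerator degree of freedom. Several assumptions are ours, not the paper's. The expert group size of 52 is inferred from the reported df = 1, 93 (95 respondents in all) and the 43 students. The s values are treated as sample standard deviations. The expert means are used as printed, that is, already rescaled. The names `TABLE_1`, `f_ratio`, `t_two_sided_p`, `f_p_value`, and `compare_items` are illustrative. Under these assumptions, the sketch flags exactly Items 3 and 4 at the 0.05 level, with p below 0.001 and 0.01 respectively, as the asterisks in Table 1 indicate.

```python
from math import atan, cos, pi, sin, sqrt

# Table 1, Item 10 excluded: value -> (expert m, expert s, student m, student s)
TABLE_1 = {
    "Self-regarding": (2.90, 0.99, 2.81, 1.14),
    "Other-regarding": (2.39, 0.91, 2.19, 0.96),
    "Material": (2.14, 0.79, 3.00, 1.23),
    "Spiritual": (3.04, 0.91, 2.44, 1.30),
    "Aesthetic": (2.35, 0.81, 2.47, 1.03),
    "Religious": (3.87, 0.72, 3.74, 1.58),
    "Personal": (2.67, 1.04, 2.37, 1.09),
    "Social": (2.14, 0.66, 2.16, 0.81),
    "Local": (3.29, 0.87, 2.98, 1.21),
    "International": (2.15, 0.64, 1.93, 0.99),
    "Prowess": (2.96, 0.89, 3.00, 1.23),
    "Intellect": (2.14, 0.82, 2.02, 0.83),
    "Character": (2.83, 0.94, 2.63, 1.16),
    "Self-oriented": (2.87, 0.87, 3.14, 1.19),
    "Parochial": (3.62, 0.69, 3.63, 1.09),
    "Humanitarian": (2.69, 0.96, 2.30, 1.17),
}

N_EXPERTS = 52   # assumption: df = 1, 93 implies 95 respondents; 95 - 43 = 52
N_STUDENTS = 43  # stated in the text


def f_ratio(m1, s1, n1, m2, s2, n2):
    """Two-group one-way ANOVA F-ratio from summary statistics.

    Assumes s1 and s2 are sample standard deviations (n - 1 denominator).
    Returns (F, (df_between, df_within)).
    """
    df_within = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df_within
    between = n1 * n2 / (n1 + n2) * (m1 - m2) ** 2
    return between / pooled_var, (1, df_within)


def t_two_sided_p(t, nu):
    """Two-sided p-value of Student's t for integer df `nu` >= 1, using the
    finite series in cos(theta), theta = atan(t / sqrt(nu))
    (Abramowitz and Stegun, section 26.7)."""
    theta = atan(abs(t) / sqrt(nu))
    c = cos(theta)
    odd = nu % 2 == 1
    start = 1 if odd else 0          # odd df: powers 1, 3, ...; even df: 0, 2, ...
    term, series = c ** start, 0.0
    for power in range(start, nu - 1, 2):   # powers up to nu - 2
        series += term
        term *= (power + 1) / (power + 2) * c * c
    if odd:
        prob_inside = 2 / pi * (theta + sin(theta) * series)
    else:
        prob_inside = sin(theta) * series
    return 1.0 - prob_inside         # P(|T| >= |t|)


def f_p_value(f, df):
    """p-value of an F-ratio with one numerator df, using F = t ** 2."""
    df1, df2 = df
    if df1 != 1:
        raise ValueError("the F = t**2 shortcut needs one numerator df")
    return t_two_sided_p(sqrt(f), df2)


def compare_items(table=TABLE_1, n1=N_EXPERTS, n2=N_STUDENTS, alpha=0.05):
    """Return {value: (F, p, significant_at_alpha)} for every row of the table."""
    results = {}
    for value, (m1, s1, m2, s2) in table.items():
        f, df = f_ratio(m1, s1, n1, m2, s2, n2)
        p = f_p_value(f, df)
        results[value] = (f, p, p < alpha)
    return results


if __name__ == "__main__":
    for value, (f, p, significant) in compare_items().items():
        flag = "  significant" if significant else ""
        print(f"{value:<16} F = {f:6.2f}  p = {p:.4f}{flag}")
```

A single multivariate test would need the covariances among items, which, as noted, were unavailable. For that reason the sketch, like the analysis it illustrates, tests each item separately.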

 

We can therefore conclude that our earlier finding, that expertise is not relevant to the forecasting of values, is sustained. The two items that differed can best be explained as evidence of the "idealism" that is widely prevalent (and fostered) among students./4/ Thus, in cost-sensitive value forecasting exercises, it is highly questionable whether "experts" have any function other than lending prestige to the exercise.

 

Finally, we might remark that Quinn's prescription (1967, pp. 104-105), simply to eschew long-range forecasting of global and distal values such as those illustrated here in favor of short-run projections of proximate goals over a few years, avoids a basic question. While the selection of experts may be less problematic for such short-run forecasting exercises, this [123/124] has not been demonstrated. Further, it remains unclear what parameter specifies the short-run horizon within which the use of experts in judgmental forecasting is warranted. This question clearly requires further empirical study.

 

TABLE 1

Mean (m) and standard deviation (s) of expected change in emphasis on each value by 2000 A.D. (1 = greatly increased emphasis, 5 = greatly decreased emphasis)

                                      Experts         Students
Values                                m      s        m      s
 1. Self-regarding (e.g., prudence)   2.90   0.99     2.81   1.14
 2. Other-regarding                   2.39   0.91     2.19   0.96
 3. Material*                         2.14   0.79     3.00   1.23
 4. Spiritual**                       3.04   0.91     2.44   1.30
 5. Aesthetic                         2.35   0.81     2.47   1.03
 6. Religious                         3.87   0.72     3.74   1.58
 7. Personal                          2.67   1.04     2.37   1.09
 8. Social                            2.14   0.66     2.16   0.81
 9. Local                             3.29   0.87     2.98   1.21
10. National                          -      -        3.72   1.22
11. International                     2.15   0.64     1.93   0.99
12. Prowess                           2.96   0.89     3.00   1.23
13. Intellect                         2.14   0.82     2.02   0.83
14. Character                         2.83   0.94     2.63   1.16
15. Self-oriented (e.g., prestige)    2.87   0.87     3.14   1.19
16. Parochial                         3.62   0.69     3.63   1.09
17. Humanitarian                      2.69   0.96     2.30   1.17

*  Significant, p < 0.001 (univariate F, df = 1, 93)
** Significant, p < 0.01 (univariate F, df = 1, 93)
Item 10: expert entries omitted because of computational errors in Rescher's report (see text).

NOTES

1. Cf. Norman Dalkey, "The Delphi Method," Proceedings of the American Statistical Association (Social Statistics Section), 131st Annual Meeting (Washington, D. C., 1972).

2. See Andrew van de Ven and Andre L. Delbecq's review of the pertinent literature in their "Nominal Versus Interacting Group Processes for Committee Decision Making Effectiveness," Academy of Management Journal (June 1971), pp. 203-212.

3. Rescher gives a justification of his selection of these experts. Cf. Nicholas Rescher, "The Study of Value Change," Journal of Value Inquiry, Vol. 1 (1967), p. 21.

4. Cf. Seymour M. Lipset, "Students and Politics in Comparative Perspective," Daedalus, Vol. 97 (1968), p. 11.

REFERENCES

1. Bender, A. D., et al., "A Delphic Study of the Future of Medicine" (Philadelphia: Smith, Kline and French Laboratory, Research and Development Division, 1969).

2. Brown, Bernice, and Olaf Helmer, "Improving the Reliability of Estimates Obtained from a Consensus of Experts," P-2986 (Santa Monica: RAND Corporation, September 1964).

3. Catling, H., and P. Rodgers, "Forecasting the Textile Scene," R & D Management, Vol. 1 (1971).

4. Nehnevajsa, Jiri, "Anticipations of Cold War Outcomes," presented at the Annual Meetings of the American Association for Public Opinion Research (Lake George, New York: May 18-20, 1962).

5. Quinn, James B., "Technological Forecasting," Harvard Business Review, Vol. 45:2 (1967).

6. Rescher, Nicholas, "A Questionnaire Study of American Values by 2000 A.D." in K. Baier and N. Rescher, eds., Values and the Future (New York: Free Press, 1969).

7. Rescher, Nicholas, "Delphi and Values," P-4182 (Santa Monica: RAND Corporation, September 1969).

8. Shively, Stanley, "Analysis of Cold War Outcomes Which the Major Powers Are Seen as Desiring Most and Least," AFOSR-1851 (January 1962).

9. Welty, Gordon, "A Critique of Some Long-Range Forecasting Developments," Contributed Papers of the 38th Sessions of the International Statistical Institute (Washington, D. C.: August 1971). [124//]