“Problems of
Selecting Experts for Delphi Exercises,” Academy of Management Journal
(March 1972), Vol. 15:1, pp. 121-124
Gordon Welty
Wright State University
Dayton, OH 45435 USA
[//121] The Delphi
technique of aggregating the forecasts of a number of experts on
multidisciplinary issues/1/ is a recent development in long-range forecasting.
By sequentially polling the experts' opinions, interspersed with feedback of
information on the just-previous poll to the experts, consensus is generated.
Meanwhile, anonymity is guaranteed in the polling procedures so that social
psychological pressures usually present in the committee and face-to-face group
approaches to multidisciplinary issues are avoided./2/ The Delphi technique is presumably more
efficient than the usual committee, and efficiency considerations presuppose
cost-sensitivity in forecasting exercises.
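The polling-and-feedback cycle described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the panel's starting estimates are hypothetical, the group median is taken as the feedback statistic, and each panelist is assumed to move a fixed fraction (`pull`) toward that median. Real panelists revise their estimates in less regular ways, and the source does not specify any such rule.

```python
from statistics import median


def delphi_round(estimates, pull=0.5):
    """One anonymous poll-plus-feedback round.

    ASSUMPTION: the feedback is the group median, and every panelist
    moves the fraction `pull` of the way toward it.
    """
    feedback = median(estimates)
    return [e + pull * (feedback - e) for e in estimates]


def spread(estimates):
    """Range of the panel's estimates, used here as a crude consensus measure."""
    return max(estimates) - min(estimates)


# Hypothetical first-round forecasts (e.g. the year an event is expected).
estimates = [1985.0, 1990.0, 2000.0, 2010.0, 2030.0]
history = [estimates]
for _ in range(3):
    estimates = delphi_round(estimates)
    history.append(estimates)

for round_no, panel in enumerate(history, start=1):
    print(f"round {round_no}: median={median(panel):.1f} spread={spread(panel):.3f}")
```

Under this assumed rule the spread shrinks each round while the median stays put, which is the sense in which repeated polling with feedback "generates" consensus.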
A crucial
factor for the Delphi exercise is, clearly, the selection of experts; experts require
honoraria, etc., which laymen do not. Only in relatively cost-insensitive
forecasting exercises would this not be a crucial consideration. The selection
problem has been confronted on two levels.
On one hand,
there has been an attempt to distinguish greater and lesser subject-matter
expertise among a given group of experts. On the other hand, there has been an
attempt to determine, for a given forecasting subject matter, whether expertise
is relevant to forecasting at all. We will briefly note some of the literature on the
former consideration and then turn to the latter, presenting
empirical findings which bear upon it.
While a
number of the developers of Delphi have discussed the problem of distinguishing
among levels of expertise, it is far from resolution. Brown and Helmer, for
example, propose that one's degree of expertise might be ascertained by a
self-assessment of forecast subject-matter knowledgeability (1964). More
recently, an attempt to test this self-assessment technique for selection of
experts or for weighting of expert opinions [121/122] showed equivocal findings
(Bender et al., 1969, p. 12). Two British investigators were so dissatisfied
with the outcome of self-weighting in a Delphic study that they stated they
"would not introduce the complication of self-weighting in any future
study" (Catling and Rodgers, 1971, p. 144). It seems clear that the
problem of differentiating among various levels of expertise requires more
research. Let us now turn to the problem of distinguishing classes of
judgmental forecasting issues for which expertise is required.
An important
early inquiry into the parameters governing judgmental forecasting exercises
was "Project Outcomes." As part of this long-term effort, 900
students and 778 legislators in 7 countries were sequentially polled on their
anticipations of cold-war outcomes. The principal investigator, Professor
Nehnevajsa, noted that the majority of all correlations (78.3 percent) exceeded
0.76 (1962, p. 12). This strongly suggests
that differences in anticipations (or predictions of outcomes) between the
students and legislators are not substantial, even though expertise on
internationally and domestically held values presumably resides with the
legislators.
Further
heuristic documentation that expertise in the domain of prediction of values is
nonexistent is contained in the Project Outcomes report of Stanley Shively,
where he states: “There are innumerable minor differences of response between
the respondents from the various countries, and … this difference was generally
greater than that found between the students and legislators” (1962, p. 56 and
Tables 2, 3, 4, 20, 21, and 22).
Although the Project Outcomes studies were similar to Delphic exercises
in many ways, they were not expressly Delphi-type exercises and did not take
into account explicitly the possibility of interaction effects. However, they
do pinpoint the problem under consideration.
It is widely
acknowledged that the forecasting of values is as important as the more
conventional forecasting of technological breakthroughs, because, for example,
technological efforts are dependent upon a complex set of value judgments for
funding. Most definitions of
technological forecasting make reference to "levels of support,"
which presuppose a set of values. Witness the travails to which NASA has
recently been subjected and consider the effect of this "reshuffling of
national priorities" upon "purely technological" forecasts of
the likelihood of putting a man on Mars
by the year 2000!
In a
Delphi-like study of the future of American values (Baier and Rescher, 1969, p.
6) Nicholas Rescher polled a selection of future-oriented individuals, focusing
on "high-level scientists and science administrators."/3/ In an earlier study, we replicated a
substantial portion of Rescher's research. Although the questions posed were
identical to those of Rescher's generic Question 2, and comparable procedures
were used throughout, the respondents were radically [122/123] different. Instead
of high-status scientists, sophomore engineering students were selected, so the
forecasters' discipline was relatively constant but the level of
expertise of the forecasters varied considerably. Our findings are
presented elsewhere; suffice it to say here that the relevance of expertise to
the forecasting of values was not evidenced (Welty, 1971).
We have just
completed the replication of another substantial portion of Rescher's study.
The questions here were identical to those of Rescher's generic Question 3, and
comparable procedures were used throughout.
However, in this replication not only did the respondents differ in
terms of level of expertise, but also they differed in terms of their academic
disciplines. Forty-three (predominantly junior) sociology majors constituted
the group of lay forecasters.
For each of
Rescher's 17 items (representing a value of American culture in the year 2000
A.D.), an opinion of the probable change in emphasis was elicited on a
five-point scale, ranging from 1 (greatly increased emphasis) to 5 (greatly
decreased emphasis) (Rescher, 1969, p. 144). Each item mean was computed and
compared by the F-ratio with the (rescaled) item means reported by Rescher.
(For Item 10, computational errors in Rescher's report forced us to eliminate
it from consideration.) Because the covariance structure of Rescher's data was
unavailable, it was not possible to compute a single multivariate F-ratio for
the overall comparison. Instead, we have computed a univariate F-ratio for each
of the 16 items in the fashion of posterior pairwise analysis.
For 14 of
the items, we found no significant difference at the 0.05 level between the mean
responses of Rescher's respondents and the student respondents. On the
remaining two items, Rescher's respondents expected significantly more
emphasis on "material" values and significantly less emphasis on
"spiritual" values than did the student respondents. We have included, in
Table 1, the mean values and standard deviations for each item responded to by
Rescher's experts and the students.
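The item-by-item comparison can be checked from the summary statistics in Table 1. The sketch below is not the author's original computation: it assumes a two-group one-way analysis of variance with a pooled error term, and it assumes 52 expert respondents, a figure inferred from the reported degrees of freedom (df = 1, 93) together with the 43 students, not one stated in the text. The names `GroupSummary` and `univariate_f` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class GroupSummary:
    n: int       # number of respondents
    mean: float  # item mean (m in Table 1)
    sd: float    # item standard deviation (s in Table 1)


def univariate_f(a: GroupSummary, b: GroupSummary) -> tuple[float, int]:
    """Two-group one-way ANOVA F-ratio from summary statistics.

    Returns (F, within-groups df); the between-groups df is 1.
    ASSUMPTION: pooled (equal-variance) error term.
    """
    df_within = a.n + b.n - 2
    grand_mean = (a.n * a.mean + b.n * b.mean) / (a.n + b.n)
    # With two groups the between-groups df is 1, so SS_between = MS_between.
    ms_between = (a.n * (a.mean - grand_mean) ** 2
                  + b.n * (b.mean - grand_mean) ** 2)
    ms_within = ((a.n - 1) * a.sd ** 2 + (b.n - 1) * b.sd ** 2) / df_within
    return ms_between / ms_within, df_within


N_STUDENTS = 43  # stated in the text
N_EXPERTS = 52   # ASSUMPTION: inferred from df = 1, 93 (93 + 2 - 43)

# Item 3 (Material) and Item 4 (Spiritual), values from Table 1.
f_material, df_within = univariate_f(
    GroupSummary(N_EXPERTS, 2.14, 0.79), GroupSummary(N_STUDENTS, 3.00, 1.23)
)
f_spiritual, _ = univariate_f(
    GroupSummary(N_EXPERTS, 3.04, 0.91), GroupSummary(N_STUDENTS, 2.44, 1.30)
)
print(f"Item 3 (Material):  F(1, {df_within}) = {f_material:.2f}")
print(f"Item 4 (Spiritual): F(1, {df_within}) = {f_spiritual:.2f}")
```

Under these assumptions both ratios exceed the 0.05 critical value for F(1, 93), which is roughly 3.9, in line with the two asterisked rows of Table 1.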
Thus, we can
conclude that our earlier finding, that expertise is not relevant to
the forecasting of values, is sustained. The two items which differed
could best be explained as evidencing the "idealism" which is widely prevalent
(and fostered) among students./4/ Accordingly, in cost-sensitive value forecasting
exercises it is highly questionable whether "experts" have any
function other than providing prestige to the exercise.
Finally, we
might remark that Quinn's prescription simply to eschew long-range forecasting
of global and distal values, such as those illustrated here, in favor of
short-run projections of proximate goals over a few years, avoids a basic
question (1967, pp. 104-105). While the
selection of experts may be less problematic for such short-run forecasting
exercises, this [123/124] has not been demonstrated. Further, a question still
remains as to the nature of the parameter that specifies the short-run horizon
in which the use of experts in judgmental forecasting is warranted. This
question clearly requires further empirical study.
TABLE 1

| Item | Values | Experts (m) | Experts (s) | Students (m) | Students (s) |
|------|--------|-------------|-------------|--------------|--------------|
| 1. | Self-regarding (e.g. prudence) | 2.90 | 0.99 | 2.81 | 1.14 |
| 2. | Other-regarding | 2.39 | 0.91 | 2.19 | 0.96 |
| 3. | Material* | 2.14 | 0.79 | 3.00 | 1.23 |
| 4. | Spiritual** | 3.04 | 0.91 | 2.44 | 1.30 |
| 5. | Aesthetic | 2.35 | 0.81 | 2.47 | 1.03 |
| 6. | Religious | 3.87 | 0.72 | 3.74 | 1.58 |
| 7. | Personal | 2.67 | 1.04 | 2.37 | 1.09 |
| 8. | Social | 2.14 | 0.66 | 2.16 | 0.81 |
| 9. | Local | 3.29 | 0.87 | 2.98 | 1.21 |
| 10. | National | - | - | 3.72 | 1.22 |
| 11. | International | 2.15 | 0.64 | 1.93 | 0.99 |
| 12. | Prowess | 2.96 | 0.89 | 3.00 | 1.23 |
| 13. | Intellect | 2.14 | 0.82 | 2.02 | 0.83 |
| 14. | Character | 2.83 | 0.94 | 2.63 | 1.16 |
| 15. | Self-oriented (e.g. prestige) | 2.87 | 0.87 | 3.14 | 1.19 |
| 16. | Parochial | 3.62 | 0.69 | 3.63 | 1.09 |
| 17. | Humanitarian | 2.69 | 0.96 | 2.30 | 1.17 |

* Sig. p < 0.001, univariate F, df = 1, 93
** Sig. p < 0.01, univariate F, df = 1, 93

NOTES
1. Cf.
Norman Dalkey, "The Delphi Method," Proceedings of the
American Statistical Association (Social Statistics Section), 131st Annual Meeting
(Washington, D. C., 1972).
2. See Andrew van de Ven and Andre L. Delbecq's
review of the pertinent literature in their "Nominal Versus Interacting
Group Processes for Committee Decision Making Effectiveness," Academy
of Management Journal (June 1971), pp. 203-212.
3. Rescher gives a justification of his
selection of these experts. Cf. Nicholas Rescher, "The Study of Value
Change," Journal of Value Inquiry, Vol. 1 (1967), p. 21.
4. Cf. Seymour M. Lipset, "Students and Politics
in Comparative Perspective," Daedalus, Vol. 97 (1968), p. 11.
REFERENCES
1. Bender,
A. D., et al., "A Delphic Study of the Future of Medicine"
(Philadelphia: Smith, Kline and French Laboratory, Research and Development
Division, 1969).
2. Brown,
Bernice, and Olaf Helmer, "Improving the Reliability of Estimates
Obtained from a Consensus of
Experts," P-2986 (Santa Monica: RAND Corporation, September 1964).
3. Catling,
H., and P. Rodgers, "Forecasting the Textile Scene," R & D
Management, Vol. 1 (1971).
4. Nehnevajsa, Jiri, "Anticipations of
Cold War Outcomes," presented at the Annual Meetings of the American
Association for Public Opinion Research (Lake George, New York: May 18-20,
1962).
5. Quinn,
James B., "Technological Forecasting," Harvard Business Review,
Vol. 45:2 (1967).
6. Rescher,
Nicholas, "A Questionnaire Study of American Values by 2000 A.D." in
K. Baier and N. Rescher, eds., Values and the Future (New York: Free
Press, 1969).
7. Rescher, Nicholas, "Delphi and
Values," P-4182 (Santa Monica: RAND Corporation, September 1969).
8. Shively,
Stanley, "Analysis of Cold War Outcomes Which the Major Powers Are Seen as
Desiring Most and Least," AFOSR-1851 (January 1962).
9. Welty, Gordon, "A Critique of Some
Long-Range Forecasting Developments," Contributed Papers of the 38th
Sessions of the International Statistical Institute (Washington, D. C.:
August 1971). [124//]