Bayesian Approaches to Plant Disease Forecasting
Jonathan Yuen, Department of Ecology and Crop Production Science, Swedish University of Agricultural Sciences, SE 750 07 Uppsala, Sweden
Corresponding author: Jonathan Yuen. jonathan.yuen@evp.slu.se
Abstract
Prediction of disease occurrence is a well-known historical theme, and has
begun to receive new interest due to Internet-based prediction systems. The
evaluation of these systems in a quantitative manner is an important step if
they are to be used in modern agricultural production. Bayes’s theorem is one
way in which the performance of such predictors can be studied. In this way, the
conditional probability of pest occurrence after a positive or negative
prediction can be compared with the unconditional probability of pest
occurrence. Both the specificity and the sensitivity of the predictive system
are needed, along with the unconditional probability of pest occurrence, in
order to make a Bayesian analysis. If there is little information on the prior
probability of disease, most predictors will be useful, but for extremely common
or extremely rare diseases, a Bayesian analysis indicates that a system
predicting disease occurrence or non-occurrence will have limited usefulness.
Introduction
The prediction of pests and diseases in agricultural crops is a problem that we
still strive to solve today. This is an ancient historical theme, and early
references to pest prediction in the Bible (Joseph's prediction for the pharaoh)
indicate the importance that our forefathers placed on pest prediction, though
as modern scientists we may have difficulty justifying such prediction methods
and may question the historical accuracy of this account.
Our view of causality with regards to plant disease would probably rule out
interpretation of dreams (Joseph's method) as a way to predict occurrence of
diseases in crops, but one can ask if modern methods that rely on our knowledge
of the biology and ecology of the crops are better than dreams. We would hope
so, but one would then want objective methods for evaluating such predictive
so, but one would then want objective methods for evaluating such predictive
systems. Such systems have begun to receive increased attention due to
Internet-based implementations, but one should keep in mind that Mills’s rules
for the prediction of apple scab actually predate modern computer-based methods.
Whether a predictive system is a table, a set of printed cards, or an elaborate
Internet-based graphic system, there is still a basic set of decision rules
that should be evaluated in an objective manner.
A key concept to keep in mind is that many of these predictive systems are
fallible, and sometimes give incorrect predictions. How often they are
incorrect, and how this can affect their usefulness, is the theme of this paper.
Materials and Methods
A number of measures of a predictor can be derived by comparing
what the decision rules predict with what
actually happens. If we take a simple case,
the presence and absence of disease are the two possibilities for both
predictions and outcomes, and we thus have a
total of four possibilities. More complicated situations (the timing of
multiple applications during the growing season, for example) lie outside the
scope of this article. This simple example was illustrated
quantitatively by Yuen and Hughes (5) with data borrowed from Jones (3),
who used the incidence of eyespot (caused by Pseudocercosporella
herpotrichoides) at GS 30-31 to predict the need for fungicide treatment
(Table 1). In Table 1, the actual requirements are arranged as columns, and the different
predictions appear as rows.
Table 1. Actual requirement for fungicide compared with a predictor for
eyespot damage using data presented by Jones (3).

                          Actual Requirement
Predictor           Spray    Don’t Spray    Total
Spray                 28         10           38
Don’t Spray           13          7           20
Total                 41         17           58
One important measure is the sensitivity of the predictor. This is the
proportion of correct predictions that the pest will occur among those fields where the
pest actually occurred. This is also referred to as the true positive
proportion. In the data presented in Table 1, this is 28/41 or about 0.68. Another critical measure is the specificity of the predictor. This is the
proportion of fields with the correct prediction that the pest will not occur among
those fields where the pest was actually absent. In our data, this is 7/17.
Another measure that is used is the false positive rate, which is
1 − specificity, or 10/17 with our data. Likewise, a false negative rate can
also be calculated (1 − sensitivity). Changing the sensitivity of a predictor
by varying the decision threshold will affect the specificity, such that
increasing one decreases the other. Simultaneous improvement of both
sensitivity and specificity requires a reformulation of the predictive system,
additional information, or, usually, both.
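As an illustrative sketch (this code and its variable names are ours, not part of the original analysis), the measures defined above can be computed directly from the counts in Table 1:

```python
# Predictor performance measures from the 2x2 table in Table 1 (Jones (3)).
tp = 28  # "spray" predicted, spray actually required
fp = 10  # "spray" predicted, spray not required
fn = 13  # "don't spray" predicted, spray actually required
tn = 7   # "don't spray" predicted, spray not required

sensitivity = tp / (tp + fn)            # true positive proportion, 28/41
specificity = tn / (tn + fp)            # true negative proportion, 7/17
false_positive_rate = 1 - specificity   # 10/17
false_negative_rate = 1 - sensitivity   # 13/41

print(round(sensitivity, 2), round(specificity, 2))  # 0.68 0.41
```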
An ideal predictive system will have a sensitivity and specificity of 1.0,
which would make the analyses proposed in this article superfluous. Our
attention is generally focused on the sensitivity of a predictor, since the
mistakes associated with poor sensitivity (missing an application of a control
measure such as a pesticide) are generally viewed as serious ones by farmers.
The mistakes associated with poor specificity (such as applying pesticides when
they are not needed) have traditionally not been viewed as serious by plant
pathologists, but for environmentalists this may be a valid concern. This
quantity is also more difficult to measure, since an untreated portion has to be
left in a field trial to see if a control measure truly was justified, whereas
imperfections in sensitivity are painfully obvious.
Information about sensitivity and specificity of a predictor is usually not
what is required by a decision maker. A farmer, for example, would most likely
want to know whether or not to apply control measures, and this is most likely
available from the probability of pest occurrence, which is a formalized way of
referring to the experience of the farmer. Most farmers can say whether
they have had a problem before, or if this is something new. Using this
experience to make inferences about the future is of course dependent on
assumptions about unchanging conditions, cultivars, farming practices, pathogen
genotypes, etc. Thus, without any additional information, a farmer (or any
decision maker) would act as though the future would be similar to the past.
An alternate measure can be obtained by examining the rows of Table 1 instead
of the columns. One calculates the proportion of correct decisions among the 38
positive predictions. This measure, known as the positive predictive value
(PPV), is 28/38. A similar measure is available for the negative predictions
(7/20) and is known as the negative predictive value (NPV). These
measures, which are closer to the information needed by a decision maker, are
unfortunately dependent on the prevalence of the disease (i.e., how common the
disease is). As prevalence decreases PPV also decreases but NPV increases.
Conversely, as prevalence increases, PPV increases, but NPV decreases. Due to
their dependence on prevalence, these measures have limited utility.
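The prevalence dependence of PPV and NPV can be sketched with fixed sensitivity and specificity (the function names here are ours, for illustration only):

```python
# PPV and NPV at a given prevalence, holding sensitivity and specificity
# fixed, to show how strongly both depend on how common the disease is.
def ppv(sens, spec, prev):
    """Probability that disease is present given a positive prediction."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens, spec, prev):
    """Probability that disease is absent given a negative prediction."""
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

sens, spec = 28 / 41, 7 / 17   # eyespot predictor from Table 1
for prev in (0.1, 0.5, 0.9):   # illustrative prevalences
    print(prev, round(ppv(sens, spec, prev), 2), round(npv(sens, spec, prev), 2))
```

At the prevalence actually observed in Table 1 (41/58), these formulas return exactly 28/38 and 7/20, the PPV and NPV quoted above.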
In statistical terms, this can be expressed using probabilities. If
disease occurrence is referred to as A, the probability of disease
occurrence can be written Pr(A). In the absence of any predictive system,
this is the prevalence of the disease, and is the probability of disease
occurrence in the absence of any additional information. In Bayesian terms this
is often referred to as a prior probability, since this is the probability of
disease before we use the predictor. Note the connection to the experience of
the farmer. What we would like is the probability of pest occurrence after
a positive prediction. These are called conditional probabilities, and are
written as Pr(A|B), where B represents a positive
prediction. Likewise, there is the probability of pest occurrence after a
negative prediction. This is denoted Pr(A|B′), where B′ represents a
negative prediction.
We can calculate the conditional probabilities by combining the sensitivity
and specificity of a predictor with information on the prior probability of
disease using Bayes’s theorem. The usual form of Bayes’s theorem is presented as
Equation 1, and one can see that it contains both the sensitivity and the
prevalence in the numerator and in the first term of the denominator, while the
false positive rate appears in the second term of the denominator.

Pr(A|B) = Pr(B|A) Pr(A) / [Pr(B|A) Pr(A) + Pr(B|A′) Pr(A′)]   [1]

Thus, the probability of pest presence if the predictor is positive (Pr(A|B))
equals the sensitivity (Pr(B|A)) times the prior probability of the pest being
present (Pr(A)), divided by the sum of the sensitivity (Pr(B|A)) times the
probability of the pest being present (Pr(A)) and the false positive rate
(Pr(B|A′)) times the probability of the pest not being present (Pr(A′)). The
probability of the pest being absent is written as Pr(A′); thus
Pr(A) + Pr(A′) = 1.0.
This form of Bayes’s theorem is difficult to use, and can be simplified by the
use of likelihood ratios (LR) and conversion of probabilities to odds. The LR
for a positive prediction (LR+) is sensitivity/(1 − specificity), and the LR
for a negative prediction (LR−) is (1 − sensitivity)/specificity. Odds can be
calculated from probabilities:
Odds(A) = Pr(A) / (1 − Pr(A))   [2]
Using this form yields a much simpler form of Bayes’s theorem:

Odds_posterior = Odds_prior × LR   [3]
Thus the odds after using the predictor equals the odds before using the
predictor times the likelihood ratio.
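The odds form of Bayes’s theorem can be sketched in a few lines (the helper names below are ours, chosen for illustration):

```python
# Equations 2 and 3 as small helpers.
def prob_to_odds(p):
    return p / (1 - p)                     # Equation 2

def odds_to_prob(o):
    return o / (1 + o)                     # inverse of Equation 2

def posterior_odds(prior_prob, lr):
    return prob_to_odds(prior_prob) * lr   # Equation 3

# Eyespot example below: a prior probability of 0.16 corresponds to
# prior odds of about 0.19, and a posterior odds of 0.15 after a
# negative prediction converts back to a probability of about 0.13.
print(round(prob_to_odds(0.16), 2))   # 0.19
print(round(odds_to_prob(0.15), 2))   # 0.13
```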
Using the data from the eyespot example (with sensitivity 0.68 and
specificity 0.41), the LR+ is 0.68/(1 − 0.41) or 1.15, and the LR− is
(1 − 0.68)/0.41 or 0.78. Thus, if we were to use the predictive system
proposed by Jones (3), a
positive prediction would increase the odds of disease by 15% and a negative
prediction would decrease it by 22%. The prior probabilities would vary from
field to field, but in the absence of any other information, one might use the
overall probability of eyespot. If we borrow information from Cook et al.
(1) we might start with an odds of 0.19 (5). With the
corresponding likelihood ratios and a prior odds, we can now use Equation 3.
Thus after a positive prediction the odds of eyespot would increase from 0.19 to
yield a posterior odds of 0.19 × 1.15 or 0.23. Likewise a negative
prediction would yield posterior odds of 0.19 × 0.78 or 0.15. These odds can
be converted to probabilities using the inverse of Equation 2, which would be
0.19 after a positive prediction for eyespot, and 0.13 after a negative
prediction for eyespot.
The critical value for eyespot incidence in that example was 20% of the
tillers (a fungicide application was justified if it was greater than or equal to
20%). This is an example of what is referred to as a decision threshold (2),
and it could of course vary, with resulting changes in both sensitivity and
specificity (6). A plot of these (usually the true positive rate as a function
of the false positive rate) is referred to as a receiver operating
characteristic (ROC) curve. An example of an ROC curve for the prediction of
plant disease, borrowed from Yuen et al. (6), is presented in Fig. 1. Many
predictive systems for disease prediction have a continuous or semicontinuous
point scale with a decision threshold, and these can easily be presented along
with an ROC curve.

Fig. 1. Receiver operating characteristic (ROC) curve for prediction of Sclerotinia stem rot (6).

Twengström et al. (4) developed a predictor for Sclerotinia stem rot based
on weather, cropping history, and other variables. They presented their results
in part as an ROC curve, and suggested decision thresholds with varying
specificity and sensitivity. Lower thresholds will increase the sensitivity of a
predictive system but will also increase the false positive rate (decrease
specificity). Higher thresholds can reduce the false positive rate (increase
specificity) at the expense of decreased sensitivity. This leads to varying
sensitivity and specificity, and the likelihood ratios for the different
decision thresholds are presented in Table 2.
Table 2. Likelihood ratios for positive and negative prediction of
Sclerotinia stem rot based on varying decision thresholds.
Threshold   Sensitivity   Specificity   LR+     LR−
35          0.90          0.77          3.913   0.130
40          0.77          0.84          4.812   0.274
50          0.35          0.95          7.000   0.684
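The likelihood ratios in Table 2 follow directly from the sensitivity/specificity pairs at each threshold; a short sketch (code ours, values from Table 2):

```python
# Recompute LR+ and LR- for each decision threshold in Table 2:
# LR+ = sensitivity / (1 - specificity), LR- = (1 - sensitivity) / specificity.
thresholds = {35: (0.90, 0.77), 40: (0.77, 0.84), 50: (0.35, 0.95)}
for score, (sens, spec) in sorted(thresholds.items()):
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    print(score, round(lr_pos, 3), round(lr_neg, 3))  # matches Table 2
```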
Although prior probabilities will vary from field to field (or from farmer to
farmer), a rough estimate of them can be obtained by examining the average need
for fungicide sprays from two regions or the single worst year (5). These are shown in Table 3.
Table 3. Prior probabilities and odds for Sclerotinia stem rot based on
20-year averages or the single worst year from two provinces in Sweden.

Region        Time Period          Pr(A)   Odds(A)
Uppland       20-year average      0.16    0.19
Uppland       Single worst year    0.61    1.56
Västmanland   20-year average      0.23    0.30
Västmanland   Single worst year    0.64    1.78
Use of Bayes’s theorem gives both increases (Table 4) and decreases (Table 5)
in the odds of disease occurrence following positive and negative predictions.
Table 4. Increases in the odds or probability of Sclerotinia occurrence after
a positive prediction based on varying decision thresholds. Values in body of
table are odds, probabilities in parentheses, and are calculated using the prior
odds from Table 3 (5).
                                Prior odds
Threshold              LR+      0.19          1.56           0.30          1.78
(risk point score)
35                     3.913    0.75 (0.43)   6.12 (0.86)    1.17 (0.54)   6.96 (0.87)
40                     4.812    0.92 (0.48)   7.53 (0.88)    1.44 (0.59)   8.56 (0.90)
50                     7.000    1.33 (0.57)   10.95 (0.92)   2.09 (0.68)   12.44 (0.93)
Table 5. Decreases in the odds or probability of Sclerotinia occurrence after
a negative prediction based on varying decision thresholds. Values in body of
table are odds, probabilities in parentheses, and are calculated using the prior
odds from Table 3 (5).
                                Prior odds
Threshold              LR−      0.19           1.56          0.30           1.78
(risk point score)
35                     0.130    0.02 (0.024)   0.20 (0.17)   0.04 (0.037)   0.23 (0.19)
40                     0.274    0.05 (0.050)   0.43 (0.30)   0.08 (0.076)   0.49 (0.33)
50                     0.684    0.13 (0.12)    1.07 (0.52)   0.20 (0.17)    1.22 (0.55)
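The entries of Tables 4 and 5 can be reproduced by multiplying each prior odds (from the probabilities in Table 3) by the likelihood ratios in Table 2, then converting back to probabilities; a sketch (code ours):

```python
# Posterior odds and probabilities for each prior and decision threshold.
priors = [0.16, 0.61, 0.23, 0.64]  # Pr(A) values from Table 3
lrs = {35: (3.913, 0.130), 40: (4.812, 0.274), 50: (7.000, 0.684)}
for p in priors:
    prior_odds = p / (1 - p)
    for score, (lr_pos, lr_neg) in sorted(lrs.items()):
        pos = prior_odds * lr_pos   # posterior odds after a positive prediction
        neg = prior_odds * lr_neg   # posterior odds after a negative prediction
        print(round(p, 2), score,
              round(pos, 2), round(pos / (1 + pos), 2),   # Table 4 entry
              round(neg, 2), round(neg / (1 + neg), 2))   # Table 5 entry
```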
Discussion
Sensitivity and specificity can be used to characterize disease prediction
rules, and can be summarized as likelihood ratios (LR) for positive and negative
predictions. Good predictors have high LR’s for positive predictions and low
LR’s for negative predictions. Bayes’s theorem can be used to examine how the
probability of disease occurrence changes after using the predictor.
It can be difficult to perform these calculations for many predictive systems
used for plant pests. This is in part because the sensitivity and specificity
are often not accurately quantified. The fact that the use of a predictive system
does not result in more disease than the common practice (or routinely used
control measures) is related to the sensitivity, but is not a direct measure of
this quantity. Likewise, we can make inferences about the specificity of a
predictor by comparing the frequency of pesticide use while following its
recommendation with the "usual" frequency, but, as with sensitivity, we have not
accurately measured this quantity either.
For extremely common or extremely rare diseases, it may be difficult to
develop predictors of pest occurrence that have sufficient sensitivity or
specificity such that they would be able to change the behavior of the decision
maker. For example, if a decision threshold is selected from Table 2 that
equally weighs specificity and sensitivity (40 points), this gives a value for
LR+ of 4.8. If the prior odds is 0.05, a positive prediction will increase the
odds to only 0.24. A prior odds of 0.01 would increase only to 0.048 after a
positive prediction. A similar argument can be made for extremely common
diseases and negative predictions. Thus, these predictive systems will have
their greatest usefulness when the prior probabilities lie near 50%, i.e. for
diseases that are neither very common nor very rare.
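The rare-disease argument above can be sketched numerically (code ours; LR+ of 4.8 taken from the 40-point threshold in Table 2):

```python
# With LR+ = 4.8, small prior odds remain small after a positive prediction.
lr_pos = 4.8
for prior_odds in (0.05, 0.01):
    post_odds = prior_odds * lr_pos
    post_prob = post_odds / (1 + post_odds)
    print(prior_odds, round(post_odds, 3), round(post_prob, 2))
```

Even after a positive prediction, the posterior probability of disease stays well below the level at which most decision makers would change their behavior.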
For many pests, the goal of the predictive system is to schedule multiple
control measures (such as the frequency of repeated fungicide applications).
Thus the problem is not so much if the pest will occur (which is the focus of
this article) but when the pest will occur, and the validation of these methods
lie outside of this discussion. Many of these systems assume that the pest will
occur (i.e., the prior assumption is that the pest will occur) and then try
to refine the control measures accordingly. The fact that such systems do not
try to predict the occurrence of the pest is therefore entirely consistent with
the Bayesian analyses presented here, since the focus of the problem is not
whether the pest will occur during the growing season, but when.
Even for such diseases that are neither very common nor very rare, with
unconditional probabilities that lie around 50%, the use of a Bayesian analysis
will indicate the magnitude with which disease probabilities will change when
the predictor is used. If the sensitivity and/or specificity is poor (with
resulting values for the LR’s close to 1.0), only small changes in disease
probability will occur when the predictors are used.
Thus, knowledge of sensitivity and specificity of prediction rules would
permit targeting of systems where the prior probabilities would allow success.
If the prior probability of disease is too large or small, and the performance
of the predictive system is poor, one would not expect adoption of the system
due to small changes in the disease probabilities.
Even in the absence of predictive systems, a Bayesian analysis would be
useful. Knowing the prior probabilities of disease occurrence would determine
minimum performance criteria necessary for success.
Literature Cited
1. Cook, R. J., Polley, R. W., and Thomas, M. R. 1991. Disease-induced losses in
winter wheat in England and Wales 1985-89. Crop Prot. 10:504-508.
2. Hughes, G., McRoberts, N., and Burnett, F. J. 1999. Decision-making and
diagnosis in disease management. Plant Pathol. 48:147-153.
3. Jones, D. R. 1994. Evaluation of fungicides for control of eyespot disease
and yield loss relationships in winter wheat. Plant Pathol. 43:83-146.
4. Twengström, E., Sigvald, R., Svensson, C., and Yuen, J. 1998. Forecasting
Sclerotinia stem rot in spring sown oilseed rape. Crop Prot. 17:405-411.
5. Yuen, J. E., and Hughes, G. 2002. Bayesian analysis of plant disease
prediction. Plant Pathol. 51:407-412.
6. Yuen, J. E., Twengström, E., and Sigvald, R. 1996. Calibration and
verification of risk algorithms using logistic regression. Eur. J. Plant
Pathol. 102:847-854.
