© 2009 Plant Management Network.
Accepted for publication 2 February 2009. Published 31 March 2009.


Risk Analysis, Analysis of Variance: Getting More from Our Data


W. M. Clapham and J. M. Fedders, USDA-ARS, Appalachian Farming Systems Research Center, Beaver, WV 25813; and C. D. Teutsch, Virginia Polytechnic Institute and State University, Southern Piedmont Agricultural Research and Extension Center, Blackstone, VA 23824


Corresponding author: W. M. Clapham. William.Clapham@ars.usda.gov


Clapham, W. M., Fedders, J. M., Teutsch, C. D. 2009. Risk analysis, analysis of variance: Getting more from our data. Online. Forage and Grazinglands doi:10.1094/FG-2009-0331-01-RS.


Abstract

Analysis of variance (ANOVA) and regression are common statistical techniques used to analyze agronomic experimental data and determine significant differences among yields due to treatments or other experimental factors. Risk analysis provides an alternative and complementary examination of the same data by determining yield probabilities for each treatment or factor. We generated and analyzed a synthetic data set to illustrate that data with similar means, as determined by ANOVA, can have markedly different probability distributions due to differences in standard deviations. We then applied the techniques to data from a five-year yield trial of twelve Bermudagrass cultivars. ANOVA detected significant year-by-cultivar interactions while risk analysis illustrated differences among the cultivars in yield stability and in the probabilities of achieving specific yield goals. Together, ANOVA and risk analysis provide a more complete view of the data that facilitates technical transfer of experimental results to producers and other end-users.


Introduction

Farmers are gamblers; every time they put seed in the ground, buy livestock, or make a capital investment, they are making a business decision and betting on a positive return. Risk is inherent to any farming system as producers face uncertainty from environmental and market forces (4). Any information or data that can reduce uncertainty in management decisions can have a large impact on net profits. Analysis of variance (ANOVA) is used routinely to estimate treatment means, partition variance among treatment factors and error, and determine significant differences among treatment means.

Risk analysis is an analytical technique used routinely in making business decisions but has not been applied widely to agronomic practices (1). Risk analysis utilizes data frequency distributions to assign probabilities of achieving given outcomes (6). In our application of risk analysis, we use the data collected from a traditional agronomic experiment and a mathematical technique known as bootstrapping to define the corresponding data distribution (2,3). Once defined, the distributions are used to generate cumulative probability curves by integrating the area underneath the frequency distribution. The cumulative probability curves provide users with a metric to predict the probabilities of success or failure of management decisions.

In this paper we describe the process of analyzing agronomic data using risk analysis and compare risk analysis results and interpretation with those from conventional ANOVA. We first demonstrate the technique using a simplified synthetic data set and then apply it to five years of data collected from a Bermudagrass yield trial. The primary objective of this study was to demonstrate the utility and power derived from combining ANOVA and risk analyses, and how risk analysis can facilitate decision making and technology transfer.


ANOVA and Risk Analysis of a Synthetic Data Set

A synthetic data set was created and then analyzed with both ANOVA and risk analysis techniques. Creating synthetic data allowed us to define the characteristics of a data set explicitly. Specifically, we were interested in exploring the results and interpretations of the two analytical techniques when treatment variances are heterogeneous. The data set was synthesized using Microsoft Excel 2003 and @Risk (Version 4.5, Palisade Corp., Ithaca, NY) software to represent the harvest yields of four theoretical treatments. A target mean (μ) and standard deviation (σ) were selected for each treatment. The RiskNormal(μ,σ) function of @Risk was then defined to model a normal yield distribution for each treatment. Forty synthesized yield values for each treatment were produced by Monte Carlo sampling from each distribution. The mean and standard deviation of the synthesized yield values were calculated and compared to the target mean and standard deviation for each treatment (Table 1).


Table 1. Target and actual parameters of four synthetic treatments.

Treatment | Target mean (kg/ha) | Target standard deviation (kg/ha) | N | Actual mean (kg/ha) | Actual standard deviation (kg/ha)
1 | 2600 | 200 | 40 | 2595 a* | 208
2 | 2800 | 200 | 40 | 2798 a | 206
3 | 3000 | 300 | 40 | 2999 bc | 327
4 | 3200 | 800 | 40 | 3213 c | 816

 * Mean actual values followed by a different letter are significantly different (P < 0.05) based on Bonferroni adjusted probability levels.
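
The sampling step described above can be sketched in Python. This is only an illustration: the authors used the RiskNormal function of @Risk in Excel, whereas the sketch below swaps in `random.gauss` from the Python standard library. The function names and the seed are assumptions made for the example, so the simulated values will not reproduce the actual values in Table 1.

```python
import random
import statistics

# Target (mean, standard deviation) in kg/ha for the four synthetic
# treatments, taken from Table 1.
TARGETS = {1: (2600, 200), 2: (2800, 200), 3: (3000, 300), 4: (3200, 800)}
N_PER_TREATMENT = 40


def synthesize(targets, n, seed=None):
    """Draw n normally distributed yields for each treatment."""
    # The seed is an assumption for repeatability; the paper reports none.
    rng = random.Random(seed)
    return {
        trt: [rng.gauss(mu, sigma) for _ in range(n)]
        for trt, (mu, sigma) in targets.items()
    }


def summarize(data):
    """Return {treatment: (n, sample mean, sample standard deviation)}."""
    return {
        trt: (len(y), statistics.mean(y), statistics.stdev(y))
        for trt, y in data.items()
    }


data = synthesize(TARGETS, N_PER_TREATMENT, seed=1)
for trt, (n, mean, sd) in summarize(data).items():
    mu, sigma = TARGETS[trt]
    print(f"Treatment {trt}: target {mu}/{sigma}, "
          f"actual {mean:.0f}/{sd:.0f} (n={n})")
```

With only 40 draws per treatment, the sample mean and standard deviation will differ somewhat from their targets, as they do in Table 1.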


The data set consisting of 160 data points (40 points for each of the four synthetic treatments) was analyzed with ANOVA. Levene’s test within the GLM procedure of SAS confirmed heterogeneous variances among the treatments. The data were, therefore, analyzed using an unequal variance one-way model with the MIXED procedure of SAS (5). Significant differences among least squares treatment means were evaluated using the Bonferroni adjustment to provide error rate protection for multiple tests (7).
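
To make the idea behind Levene's test concrete, the following minimal sketch computes one common form of the statistic, based on absolute deviations from the group means. It is not the SAS implementation, and the option the authors used in the GLM procedure is not stated, so the form chosen here is an assumption. The function name `levene_w` and the example groups are hypothetical.

```python
import statistics


def levene_w(groups):
    """Levene's W statistic from absolute deviations about group means.

    Under equal variances, W is approximately F-distributed with
    (k - 1, N - k) degrees of freedom. No P-value is computed here.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z_ij = |Y_ij - mean of group i|
    z = []
    for g in groups:
        g_mean = statistics.fmean(g)
        z.append([abs(y - g_mean) for y in g])
    z_means = [statistics.fmean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Made-up groups with clearly different spread (not data from the paper).
narrow = [2590, 2610, 2600, 2580, 2620]
wide = [2400, 3900, 3200, 2100, 4400]
print(f"W = {levene_w([narrow, wide]):.2f}")
```

A large W relative to the F distribution indicates that the spread differs among groups, which is the condition that led the authors to fit an unequal variance model.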

Our synthetic data set was also evaluated from the aspect of risk to determine which “treatment” would meet our objective function (goal), a theoretical “yield” of 2500 kg/ha. Probability density and cumulative probability curves were derived from the actual mean and standard deviations of the synthetic treatments based on normal distributions.
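
As a rough sketch of how such curves can be tabulated, the example below evaluates the normal density and cumulative probability for each treatment on a grid of yields, using the actual means and standard deviations from Table 1. It relies on `statistics.NormalDist` from the Python standard library. The grid limits and step are assumptions chosen only for illustration.

```python
from statistics import NormalDist

# Actual mean and standard deviation (kg/ha) of each synthetic treatment,
# from Table 1.
TREATMENTS = {1: (2595, 208), 2: (2798, 206), 3: (2999, 327), 4: (3213, 816)}


def curves(mu, sigma, lo=0, hi=6000, step=250):
    """Tabulate (yield, density, cumulative probability) on a yield grid.

    The grid limits and step are illustrative assumptions.
    """
    dist = NormalDist(mu, sigma)
    return [(y, dist.pdf(y), dist.cdf(y)) for y in range(lo, hi + 1, step)]


for trt, (mu, sigma) in TREATMENTS.items():
    print(f"Treatment {trt}")
    for y, density, cumulative in curves(mu, sigma):
        print(f"  {y:5d} kg/ha  density={density:.6f}  "
              f"P(yield <= y)={cumulative:.3f}")
```

Plotting the density column against yield gives a curve of the kind shown in panel A of Fig. 1, and the cumulative column gives panel B.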


ANOVA and Risk Analysis of Bermudagrass Variety Trial

In June 2001, eleven seeded Bermudagrasses and a hybrid check were established at the Southern Piedmont Agricultural Research and Extension Center, Blackstone, VA (37°05'29"N, 77°57'58"W) (Table 2). The soil series was a Dothan (fine-loamy, kaolinitic, thermic Plinthic Kandiudults)-Norfolk (fine-loamy, kaolinitic, thermic Typic Kandiudults) complex. The experimental design was a randomized complete block with four replications. Plots measured 1.8 × 4.6 m, and a 0.9 m border between plots was maintained with regular applications of glyphosate [N-(phosphonomethyl) glycine] or paraquat (N,N'-Dimethyl-4,4'-bipyridinium dichloride). A conventional seedbed was prepared by plowing, disking, and cultipacking. Plots were seeded using a cultipacker type seeder at a rate of 9 kg PLS/ha. In April 2001, dormant sprigs of Tifton 44 were started in a greenhouse in Styrofoam float trays with 5 × 5 cm cells. In June 2001, established plants were transplanted by hand into the prepared seedbed at a density of 4.8 sprigs/m². Plots received 56 kg N per ha at seeding and 112 kg N per ha at spring green up and after each cutting, except that no N was applied after the last harvest of each year. Phosphorus, K, and lime were applied according to soil test recommendations. Mean monthly temperature and precipitation data for the five years of the study are presented in Table 3 and Table 4.


Table 2. Description of the Bermudagrass varieties and blends used in this study.

Variety | Type | Blend | Seed source | Description
Cd 90160 | seeded | no | DLF Intl. Seeds, Halsey, OR | Improved cold tolerant, forage type adapted to the southern 1/3 of the US.
Cheyenne | seeded | no | Pennington Seed, Madison, GA | Improved high yielding, cold tolerant variety adapted to southern 1/3 of US.
Guymon | seeded | no | OK AES and USDA-ARS | Cold tolerant variety adapted to the upper southern US.
KF-194 | seeded | no | KF Seeds, Inc., Brawley, CA | Improved moderately cold tolerant variety adapted to the southern 1/3 of the US.
Mirage | seeded | no | DLF Intl. Seeds, Halsey, OR | Improved cold tolerant variety designed for the turf market.
Mohawk | seeded | no | Pennington Seed, Madison, GA | Improved moderately cold tolerant variety. Originally developed for turf market.
Pasto Rico | seeded | Giant and Common | KF Seeds, Brawley, CA | Giant, tall and fast growing with poor cold tolerance; Common, ecotype found in AZ and CA, moderate cold tolerance, best adapted to southern US.
Pyramid | seeded | no | DLF Intl. Seeds, Halsey, OR | Improved turf type best adapted to the southern US.
Ranchero Frio | seeded | Cheyenne and Wrangler | Pennington Seed, Madison, GA | Improved blend. See Cheyenne and Wrangler.
Sungrazer | seeded | KF-194 and Wrangler | KF Seeds, Brawley, CA | Improved blend. See KF-194 and Wrangler.
Tifton 44 | hybrid | no | USDA-ARS and GA AES, Tifton, GA | Cold tolerant, high yielding hybrid adapted to transition zone states.
Wrangler | seeded | no | Johnston Seed Co., Enid, OK | Very cold tolerant, adapted to the northern transition zone of the US.

Plots were harvested on 30 August and 10 December 2001 and five times per year from 2002 through 2006 (Table 5) by clipping a 1.2 m wide strip through the center of each plot with a self-propelled sickle bar-type forage harvester. The clipping height was 5 cm above the soil surface. Fresh weight from each plot was determined in the field. A sub-sample of fresh forage was returned to the laboratory and dried in a forced air oven for 5 days at 60°C to determine dry matter (DM) content. All yields are reported on a dry matter basis. The data presented herein consist of the cumulative annual dry matter yields for the five-year period 2002 through 2006.
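
The conversion from plot fresh weight to dry matter yield per hectare is simple arithmetic, sketched below. The strip length is an assumption: the text gives the strip width (1.2 m) and the plot length (4.6 m) but does not state that the strip ran the full length of the plot. All function names and the example weights are hypothetical.

```python
M2_PER_HA = 10_000  # square metres in one hectare


def dm_fraction(subsample_fresh_g, subsample_dry_g):
    """Dry matter fraction of the oven-dried sub-sample."""
    return subsample_dry_g / subsample_fresh_g


def dm_yield_kg_ha(plot_fresh_kg, dm_frac,
                   strip_width_m=1.2, strip_length_m=4.6):
    """Dry matter yield (kg/ha) for one harvest of one plot.

    strip_length_m=4.6 assumes the strip spanned the whole plot length.
    """
    harvested_area_m2 = strip_width_m * strip_length_m
    return plot_fresh_kg * dm_frac / harvested_area_m2 * M2_PER_HA


def cumulative_annual_yield(harvest_yields_kg_ha):
    """Sum the harvests taken within one growing season."""
    return sum(harvest_yields_kg_ha)


# Made-up example: 6.0 kg fresh forage from the strip, and a 200 g
# sub-sample that weighed 60 g after drying.
frac = dm_fraction(200, 60)
print(f"DM fraction: {frac:.2f}")
print(f"DM yield: {dm_yield_kg_ha(6.0, frac):.0f} kg/ha")
```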


Table 5. Bermudagrass cutting schedule for the 2002 through 2006 growing seasons.
(Harvest date by growing season.)

Harvest order | 2002 | 2003 | 2004 | 2005 | 2006
First | 31 May | 19 Jun | 26 May | 9 Jun | 2 Jun
Second | 10 Jul | 14 Jul | 23 Jun | 13 Jul | 30 Jun
Third | 15 Aug | 15 Aug | 30 Jul | 10 Aug | 26 Jul
Fourth | 13 Sep | 17 Sep | 29 Sep | 22 Sep | 24 Aug
Fifth | 22 Oct | 24 Oct | 30 Nov | 14 Nov | 3 Oct

Bermudagrass yields were analyzed using the MIXED procedure of SAS. Levene’s test indicated heterogeneous variances among the varieties (P < 0.001). The analysis utilized a repeated measures design across years, with replicate declared a random effect and an unstructured covariance model.

The first step in the risk analysis was to create separate data sets for each variety by pooling data across replicates and years. Each of the 12 data sets, one for each variety, contained 20 yield values (4 replicates by 5 years). A bootstrap re-sampling procedure was performed on each data set. The procedure involved the random selection of four yield values with replacement from the 20 in each data set. These values were averaged to develop a mean value. This process was repeated through 10,000 iterations to develop a frequency distribution of mean yield values. These yield distributions have shapes similar to the normal distribution, consistent with the central limit theorem (3). Therefore, each yield distribution was modeled as a normal distribution based on its mean and standard deviation. The distributions were plotted as probability density and cumulative probability curves for comparison among varieties.
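
The resampling steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' software: the function names, the seed, and the 20 example yields are hypothetical, and the yields are made-up numbers rather than trial data.

```python
import random
import statistics
from statistics import NormalDist


def bootstrap_means(yields, sample_size=4, iterations=10_000, seed=None):
    """Bootstrap distribution of mean yield.

    Each iteration draws `sample_size` values with replacement from
    `yields` and records their average.
    """
    rng = random.Random(seed)  # seed is an assumption, for repeatability
    return [
        statistics.fmean(rng.choices(yields, k=sample_size))
        for _ in range(iterations)
    ]


def fit_normal(means):
    """Model the bootstrap distribution as a normal distribution."""
    return NormalDist(statistics.fmean(means), statistics.stdev(means))


# Hypothetical yields (kg/ha) for one variety: 4 replicates x 5 years.
# These values are invented for illustration and are NOT trial data.
example_yields = [
    15200, 16100, 14800, 15900,  # year 1
    17400, 18100, 16900, 17800,  # year 2
    16300, 17000, 15800, 16600,  # year 3
    18900, 19400, 18200, 19100,  # year 4
    17700, 18300, 17100, 18000,  # year 5
]

means = bootstrap_means(example_yields, seed=2009)
model = fit_normal(means)
print(f"bootstrap mean = {model.mean:.0f} kg/ha, SD = {model.stdev:.0f} kg/ha")
# Probability that the modeled mean yield meets a hypothetical goal.
goal = 17000
print(f"P(yield >= {goal}) = {1 - model.cdf(goal):.2f}")
```

Because each bootstrap value is the mean of four draws, the fitted standard deviation is roughly half the standard deviation of the 20 pooled yields.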


Comparison of Analytical Techniques for Synthetic Data

Results from the ANOVA indicate that synthetic Treatments 3 and 4 have the highest average yield and Treatment 1 has the lowest average yield (Table 1). The relative shape and position of the probability density curves (Fig. 1) of these treatments provide additional insight into the data not apparent from the ANOVA results alone. Since the area under each probability density curve is equal to unity, relative differences in height and width of the distributions are due to differences in the standard deviations of the modeled yields. Since these probability density curves are normal distributions, the center or peak of a curve represents the mean value of the distribution. The probability density curves of Treatments 1 and 2 have nearly identical shapes due to their similar standard deviations. The Treatment 4 probability density curve suggests that approximately one-quarter of its yields would fall outside the range from 2000 to 4000 kg/ha. In contrast, virtually all of the yields from Treatment 3 would be within the range from 2000 to 4000 kg/ha. Although ANOVA found no difference between the means of Treatments 3 and 4 (Table 1), the probability density curves suggest important differences in variation between these two treatments.
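
As a rough check on the one-quarter figure for Treatment 4, assume the normal model with the actual mean and standard deviation from Table 1 (3213 and 816 kg/ha). Here Y denotes an individual yield and Φ the standard normal cumulative distribution function; both symbols are introduced only for this worked example.

```latex
\begin{aligned}
P(Y < 2000 \ \text{or}\ Y > 4000)
  &= \Phi\!\left(\frac{2000 - 3213}{816}\right)
     + 1 - \Phi\!\left(\frac{4000 - 3213}{816}\right) \\
  &\approx \Phi(-1.49) + 1 - \Phi(0.96) \\
  &\approx 0.07 + 0.17 = 0.24
\end{aligned}
```

The same calculation for Treatment 3 (2999 and 327 kg/ha) places both limits about three standard deviations from the mean, so only a fraction of a percent of its yields would fall outside the range.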



 

Fig. 1. Probability density (A) and cumulative probability (B) curves for yield of four synthetic treatments.

 


Comparison of Analytical Techniques for Bermudagrass Trial

The ANOVA results indicated that Bermudagrass cumulative yields were significantly affected (P < 0.001) by variety, year, and the interaction of year and variety. Figure 2 illustrates the year-to-year variability in average yield for each of the cultivars. The significant interaction was due to yield variation among the cultivars over the five years. The sprigged cultivar, Tifton 44, had the lowest yield (10,380 kg/ha) during 2002, the initial year of the study. Once fully established, however, Tifton 44 yields were among the highest of any of the varieties, reaching 20,940 kg/ha in 2006. Yields of the seeded varieties also varied from year to year. For example, the yield of Pasto Rico in 2002 (21,950 kg/ha) was among the highest of the varieties but its yield in 2003 fell to only 15,000 kg/ha, the lowest of any variety. Averaged across years, Pasto Rico along with Ranchero Frio and Cheyenne had the highest cumulative yields, near 19,000 kg/ha. Wrangler and Guymon yields were the lowest with averages of 16,030 and 16,790 kg/ha, respectively. The other seven varieties had mean cumulative annual yields that ranged between 16,970 (Sungrazer) and 17,620 kg/ha (Mirage).



 

Fig. 2. Annual cumulative yields of Bermudagrass cultivars over the years 2002 through 2006. Error bars indicate the size of the pooled standard errors for each year.

 


The probability density and cumulative probability curves derived from the risk analysis procedure provide more insight into the relationships among the Bermudagrass cultivars (Fig. 3). The slow-to-establish cultivar, Tifton 44, has the broadest and shortest probability density curve among all of the cultivars due to its high standard deviation caused by the gradual increase in productivity across years. Wrangler likewise has a broad probability density curve due to wide variation in year-to-year yields. Observations indicate that Wrangler and Guymon, the most cold-tolerant seeded varieties in the trial, tended to be more sensitive to environmental stresses such as drought that occurred during the 5-year study. This sensitivity likely accounted for the wide variation in seasonal yield and the broad probability density curve (Fig. 3). In contrast, Mirage has a narrow probability density curve due to its relatively consistent production through all years of the study. The results of risk analysis combined with conventional ANOVA provide a richer framework from which to make management decisions.


 

Fig. 3. Probability density (A) and cumulative probability (B) curves for cumulative annual yield of Bermudagrass cultivars.

 

Risk Discussion

Risk analysis and resampling methods developed rapidly with the availability of personal computers. A question that often arises is "How can you estimate the population mean with such a small sample?" In other words, how large does a sample need to be? A small number of samples is adequate if it represents a random sample of the population. When resampling with replacement, a random selection from a sample pool is taken, recorded, and then returned to the pool. The probability of choosing that same sample with the next selection is identical to that of any other member of the pool. When random selection with replacement is repeated thousands of times, the result is an unbiased estimate of the population mean and variance.

Yield data are often presented as a series of means that are separated with a probability level. Data are often interpreted to determine treatments that maximize or minimize a relationship of interest. When experiments are repeated over time and space, mean yields can vary dramatically, as seen in the Bermudagrass trial herein (Fig. 2). Optimum yield includes a discount or premium for risk. An explicit statement of a benchmark yield or goal is rarely mentioned in agronomic trials. Without a benchmark goal, risk cannot be quantified and optimum yield becomes a vague term. Specifying a goal, in other words the objective function, is central to risk analysis, whether it is the forage yield to meet a stocking rate and performance expectations, or a set of goals such as the yield, quality, or timing of a particular crop. Risk analysis is built on the simple idea that we can utilize data distributions to reduce our uncertainty in making management or business decisions. When production is focused on meeting a specific goal (e.g., X kg/ha), alternative management strategies become more apparent.

We defined the objective function for the synthetic data set as 2500 kg/ha (Fig. 1). While the overall mean yield of all four treatments exceeded our goal, the probability of an individual yield exceeding 2500 kg/ha varied widely among the treatments based on the risk analysis. Treatments 2 and 3 had a 93% probability of an individual yield meeting or exceeding 2500 kg/ha. Treatment 1, while not statistically different from Treatment 2 according to ANOVA (Table 1), had the lowest probability (64%) of meeting our goal as indicated by the risk analysis. Treatment 4 was not statistically different from Treatment 3 according to ANOVA, but a large standard deviation resulted in only an 81% probability of an individual yield meeting our defined objective. In a case such as this, the less variable (less risky) but lower yielding Treatment 3 might be recommended over the highly variable Treatment 4. Clearly, recommendations based upon the combination of a producer’s objective function, yield probability distributions, and ANOVA could be different from recommendations based upon ANOVA alone.
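
The comparison against a yield goal can be sketched as follows, assuming each treatment is modeled as a normal distribution with the actual mean and standard deviation from Table 1. The function name is hypothetical, and values computed this way may differ by a few percentage points from the percentages reported above.

```python
from statistics import NormalDist

# Actual mean and standard deviation (kg/ha) from Table 1.
TREATMENTS = {1: (2595, 208), 2: (2798, 206), 3: (2999, 327), 4: (3213, 816)}
GOAL = 2500  # objective function (kg/ha)


def prob_meeting_goal(mu, sigma, goal):
    """Probability that an individual yield meets or exceeds the goal."""
    return 1.0 - NormalDist(mu, sigma).cdf(goal)


# Rank treatments from most to least likely to meet the goal.
ranked = sorted(
    TREATMENTS,
    key=lambda t: prob_meeting_goal(*TREATMENTS[t], GOAL),
    reverse=True,
)
for trt in ranked:
    mu, sigma = TREATMENTS[trt]
    p = prob_meeting_goal(mu, sigma, GOAL)
    print(f"Treatment {trt}: mean {mu} kg/ha, P(yield >= {GOAL}) = {p:.1%}")
```

Changing `GOAL` shows how the ranking of treatments depends on the producer's objective function and not on the means alone.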

Results for the Bermudagrass cultivar trial suggest similar utility for combining risk analysis and ANOVA. Yields of Bermudagrass varied widely among and within the cultivars (Fig. 2). If our objective function for yield is 17,000 kg/ha, all but two cultivars (Wrangler and Guymon) would meet this goal with at least a 50% probability. However, three cultivars (Pasto Rico, Ranchero Frio, and Cheyenne) would meet our goal with a probability of 90% or more. If we raise our yield goal to 18,500 kg/ha, we would anticipate achieving this level of production between 55 and 65% of the time using Pasto Rico, Ranchero Frio, and Cheyenne. The other seeded cultivars would only be expected to reach such high yield levels infrequently. The sprigged cultivar, Tifton 44, was slow to establish and did not reach full production until the 2004 season (Fig. 2), which contributed to the significant cultivar-by-year interaction. However, once established, its high and relatively stable yields were rivaled only by the seeded cultivar, Pasto Rico.


Conclusions

Management decisions should be based on the goals and resources of the producer in conjunction with the best available information in regard to potential outcomes and risks. The utility of ANOVA in formulating recommendations is due to its power to partition variance and detect significant treatment and interaction effects. Risk analysis provides metrics to compare treatments relative to producer defined objective functions and provides probabilities for achieving given yield levels. Risk analysis and ANOVA utilize the same data set but address different questions about the data. Together, they are synergistic tools that can aid our understanding of experimental data and assist in the development of management recommendations.


Literature Cited

1. Clapham, W. M., Fedders, J. M., Abaye, A. O., and Rayburn, E. B. 2008. Forage pasture production, risk analysis and the buffering capacity of triticale. Agron. J. 100:128-135.

2. Davison, A. C., and Hinkley, D. V. 1997. Bootstrap Methods and Their Application. Cambridge Univ. Press, New York, NY.

3. Efron, B., and Tibshirani, R. J. 1998. An Introduction to the Bootstrap. Chapman and Hall/CRC Press, London, UK.

4. Hardaker, J. B., Huirne, R. B. M., Anderson, J. R., and Lien, G. 2004. Coping with Risk in Agriculture. CABI Publ., Cambridge, MA.

5. Littell, R. C., Milliken, G. A., Stroup, W. W., and Wolfinger, R. D. 1996. SAS System for Mixed Models. SAS Institute, Cary, NC.

6. Vose, D. 2000. Risk Analysis: A Quantitative Guide. John Wiley & Sons, Chichester, West Sussex, UK.

7. Westfall, P. H., Tobias, R. D., Rom, D., Wolfinger, R. D., and Hochberg, Y. 1999. Multiple Comparisons and Multiple Tests Using the SAS System. SAS Institute, Cary, NC.