Probability distribution, sampling unit, data transformations and sequential sampling of European vine moth, Lobesia botrana (Lepidoptera: Tortricidae) larval counts from Northern Greece vineyards

Studies were conducted to investigate the distribution of larvae of the European vine moth, Lobesia botrana (Denis & Schiffermüller) (Lepidoptera: Tortricidae), a key vineyard pest of grape cultivars. The data collected were larval densities of the second and third generation of L. botrana on half-vine and entire plants of wine and table cultivars in 2003-2004. No insecticide treatments were applied to plants during the 2-year study. The distribution of L. botrana larvae can be described by a negative binomial. This reveals that the insect aggregates. A common value for the k parameter of the negative binomial distribution, kc = 0.6042, was obtained using maximum likelihood estimation, and the advantages and cases of use of a common k are discussed. The transformations $k^{-1}\sinh^{-1}\left(k\sqrt{x+1/2}\right)$ and $k^{-1}\sinh^{-1}\left(k\sqrt{x+3/8}\right)$ proved to be the best for L. botrana larval counts. An entire vine is recommended as the sampling unit for research purposes, whereas a half-vine, which is suitable for grape vine cultivation in northern Greece, is recommended for practical purposes. We used these findings to develop a fixed precision sequential sampling plan and a sequential sampling program for classifying the pest status of L. botrana larvae.


INTRODUCTION
The European vine moth, Lobesia botrana (Denis & Schiffermüller) (Lepidoptera: Tortricidae) is a key pest of vineyards in Europe, southern Russia, Japan, the Middle East, Near East and northern and western Africa (Venette et al., 2003). In response to differences in climate, the number of generations completed by L. botrana within a season differs geographically. Usually, there are two generations annually in northern areas of Europe, such as Austria, Germany, Switzerland and northern France, whereas three generations occur in southern Europe, including Mediterranean countries (Badenhausser et al., 1999; Venette et al., 2003). In Greece, L. botrana completes 3-4 generations per year, while in northern Greece, where this study was conducted, three distinct generations occur annually (Savopoulou-Soultani et al., 1989). Larvae of the first generation damage the inflorescences of grapes and those of the following generations damage the green and ripe grapes. Damage to grapes is often accompanied by infection with the gray mold fungus Botrytis cinerea Persoon (Savopoulou-Soultani & Tzanakakis, 1988; Fermaud & Giboulot, 1992). In the region of this study, economically important damage is mostly caused by the second and third generation larvae (Savopoulou-Soultani et al., 1999).
The observed dispersion pattern for a particular species is largely determined by its behaviour. A uniform or regular dispersion pattern indicates some degree of repulsion between individuals, which tends to equalize the number of individuals per sample. In a random population, there is an equal probability of an organism occupying any point in space, and the presence of one individual does not influence the distribution of another. In a typical series of samples from an aggregated population, many samples contain few or no individuals of a particular species while some samples may contain a high number of individuals (Davis, 1994). Information on dispersion is used to transform data prior to analysis, to determine the optimal sampling pattern and sample size, and to construct sequential sampling programs.
The first step in evaluating the dispersion of an organism within its habitat is obtaining a knowledge of the probability distribution of the pest population. Observations of populations in natural settings are a staple aspect of ecology. They remain invaluable for describing and identifying the possible causes of the distributions of individuals in natural populations. Moreover, determining the probability distribution of a population is useful for establishing a sampling procedure (Southwood, 1978). Combined with a knowledge of the spatial distribution of the population (i.e. the spatial arrangement of individuals among the units), the probability distribution allows a more accurate estimate of the total injury, and/or damage caused and, therefore, a better prediction of yield loss (Hughes & McKinlay, 1988).
Numerous discrete distributions are used to evaluate dispersion (Davis, 1994; Young & Young, 1998). The most common are the Poisson and negative binomial distributions (Davis, 1994; Young & Young, 1998; Binns et al., 2000). Random distributions are best described by the Poisson distribution, in which the variance equals the mean. Insects are often not distributed randomly; they tend to aggregate. In these cases, the variance tends to be greater than the mean, with the negative binomial distribution adequately describing the observed frequencies (Southwood, 1978).
The negative binomial is a powerful tool for matching the frequencies of a wide variety of pest distributions in the field (Binns et al., 2000). It is described by two parameters, the mean µ and parameter k, which is generally called the exponent or clustering parameter of the distribution. If there are several sets of counts of the same species of insect, at various mean densities, it is possible to determine whether k remains stable and subsequently try to fit a common k. A common negative binomial k has many advantages: it is dependent upon the intrinsic power of the species to reproduce itself, and also it is useful to the development of sampling plans and evaluation of adequate data transformations in experimental designs (Anscombe, 1949;Bliss & Owen, 1958).
For pest management, a sequential sampling program is more practical, fast, accurate and statistically valid. The evaluation of sequential sampling plans is generally based on the operating characteristic (OC) and average sample number (ASN) functions. The OC function is the probability that the null hypothesis (the population mean is below the stated safety level) will be accepted for any given value of the mean. The ASN curve is the average number of plants observed in each simulated sample before taking a decision (the average sample size that is required to satisfy the stopping criterion).
The aim of our study was to identify the frequency (probability) distribution of L. botrana larvae, and determine their aggregation behaviour with emphasis on ecological importance and biological meaning. We focused on how the individuals were arranged within the statistical units. Additionally, an estimate of a common negative binomial parameter, for testing data transformations and a valid sampling unit suitable for grape vine cultivation in northern Greece, was obtained. We used these findings to develop a fixed precision sequential sampling plan and a sequential sampling program for classification of the pest status of L. botrana larvae.

Field plots
Studies were conducted in the commercial vineyards of the American Farm School and Aristotle University of Thessaloniki in Macedonia, northern Greece. The plots for this study covered an area smaller than half a hectare, which is a typical vineyard size in northern Greece. Four plots (A-D) of eleven different wine-cultivars and two plots (E, F) of table-cultivars were used in this study. No insecticides were applied to these plots.

Sampling unit - Data collection methodology
The statistical units used in our study were (a) half-vine plant (as divided by wires in a typical vertical shoot positioning system) and (b) entire plant (vine). Exhaustive counts were performed (every vine was examined) in all plots during the two-year period (2003-2004), and vines were searched for larvae two weeks after the end of each flight of adults. In this particular period, the damage is easily visible and the larvae have not yet abandoned the clusters for pupation (Savopoulou-Soultani et al., 1989). The experimental vineyards consisted of a total of 2,299 vines and 25,041 grape-clusters (mean number per generation per year). Counting all the individuals in a population, called a census, is the most direct and accurate method of determining population density. Every vine was inspected and the number of L. botrana larvae recorded regardless of instar (infestation).

Estimation of negative binomial parameter k
The probability distribution of the number of larvae per sampling unit was analyzed for half-vine and entire plants in different vineyards and years. Negative binomial parameter k was estimated using the maximum likelihood method (Bliss & Fisher, 1953; Davis, 1994; Young & Young, 1998). The maximum likelihood estimator (MLE) is the value that maximizes the probability of the observed data. The MLE of k is obtained by solving the likelihood equation, in which mj is the number of times the count j occurs in a sample of size n.
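One standard way to write the likelihood equation for k, using mj and the sample mean $\bar{x}$, is given here as an illustration; the notation is an assumption and may differ from the form used in the original analysis:

$$\sum_{j=1}^{\infty} m_j \sum_{i=0}^{j-1}\frac{1}{k+i} - n\ln\left(1+\frac{\bar{x}}{k}\right) = 0$$

A minimal Python sketch that solves this equation by bisection is shown below. The counts and the function names `score` and `mle_k` are hypothetical and serve only to illustrate the calculation.

```python
import math

def score(k, counts):
    """Derivative of the negative binomial log-likelihood with respect to k."""
    n = len(counts)
    mean = sum(counts) / n
    # For each observed count x, add 1/k + 1/(k+1) + ... + 1/(k+x-1)
    harmonic = sum(1.0 / (k + i) for x in counts for i in range(x))
    return harmonic - n * math.log(1.0 + mean / k)

def mle_k(counts, lo=1e-6, hi=1e4, iterations=200):
    """Bisection root of score(k) on [lo, hi]; raises if there is no sign change."""
    if score(lo, counts) <= 0 or score(hi, counts) >= 0:
        raise ValueError("no sign change: counts may not be overdispersed")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid, counts) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical larval counts per sampling unit (not data from the study)
counts = [0] * 60 + [1] * 20 + [2] * 10 + [3] * 5 + [5] * 3 + [8] * 2
k_hat = mle_k(counts)
print(f"k = {k_hat:.4f}, score at k = {score(k_hat, counts):.2e}")
```

The search fails on purpose when the counts show no overdispersion, because the likelihood equation then has no finite positive solution.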
The MLE of k has better asymptotic properties and a higher precision (Binns et al., 2000) and is generally considered superior to the other methods of estimating k (Bliss & Owen, 1958; Young & Young, 1998; Gozé et al., 2003). The estimates of k were calculated using code based on MATLAB 6.5 (The MathWorks Inc, 2002) and checked with EcoStat 1.0.2 (Trinity Software, 1999). When k remained constant across different densities, a common value of k was estimated using MLE. The MLE of kc is obtained by solving the corresponding likelihood equation, in which mij is the number of observations in the sample from population i with the value j. The MLE of kc is not used often in biological applications because of the extensive computations involved (Young & Young, 1998). The estimate of kc was calculated using code written in MATLAB 6.5 (The MathWorks Inc, 2002).
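The common-k calculation can be sketched in the same way by summing the likelihood equations of the individual populations, with each population mean estimated by its own sample mean. The Python sketch below rests on that assumption; the data and the names `common_k_score` and `common_k_mle` are hypothetical, and it does not reproduce the MATLAB code used in the study.

```python
import math

def common_k_score(k, samples):
    """Sum over populations of the derivative of the log-likelihood in k."""
    total = 0.0
    for counts in samples:
        n = len(counts)
        mean = sum(counts) / n
        total += sum(1.0 / (k + i) for x in counts for i in range(x))
        total -= n * math.log(1.0 + mean / k)
    return total

def common_k_mle(samples, lo=1e-6, hi=1e4, iterations=200):
    """Bisection root of the pooled score; raises if there is no sign change."""
    if common_k_score(lo, samples) <= 0 or common_k_score(hi, samples) >= 0:
        raise ValueError("no sign change: a finite common k was not bracketed")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if common_k_score(mid, samples) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Two hypothetical plots with different mean densities (not study data)
plot_a = [0] * 60 + [1] * 20 + [2] * 10 + [3] * 5 + [5] * 3 + [8] * 2
plot_b = [0] * 30 + [1] * 25 + [2] * 20 + [3] * 10 + [4] * 8 + [6] * 4 + [10] * 3
print("k for plot A:", round(common_k_mle([plot_a]), 4))
print("k for plot B:", round(common_k_mle([plot_b]), 4))
print("common k:   ", round(common_k_mle([plot_a, plot_b]), 4))
```

Passing a single population returns its individual estimate of k, so the same function can be used to compare the separate and common values.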

Stability of the parameter
In order to assure the stability of the k parameter and increase our confidence in its suitability for larval counts of L. botrana, we used three different procedures for testing this assumption. Initially, the parameter k from the negative binomial distribution was regressed against the mean to determine whether estimates of k and mean were related (Taylor et al., 1979). We used the curve estimation procedure in SPSS 12.0 (SPSS Inc, 2003). This procedure produces curve estimation regression statistics for 11 regression models (linear, logarithmic, inverse, quadratic, cubic, power, compound, S-curve, logistic, growth and exponential). Finally, the homogeneity of k for the populations studied was verified by a maximum likelihood ratio test. The test of homogeneity is a test of the null hypothesis that there is a k common for all the t negative binomial populations. Two tests were developed for this hypothesis; the first is based on the chi-square test and the second on the F-distribution (Young & Young, 1998).

Tests of the fit and stability of the probability distribution
The fit and stability of the distribution were tested using a chi-square test against the negative binomial distribution with the common value of k. A fit to these counts using the Poisson distribution was also tested to check the hypothesis of a nonaggregative probability distribution. We used the chi-square test instead of the Kolmogorov-Smirnov (K-S) test, since the K-S test is exact only when the population distribution is continuous (Young & Young, 1998). The statistical software SYSTAT 11 (SYSTAT Inc, 2004) was used to test the fit of the data to the discrete frequency distributions. Observations with an expected frequency < 5 were pooled until all classes had acceptable expected frequencies (Marques de Sa, 2003).
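The pooling rule and the chi-square statistic can be illustrated with a short Python sketch. It assumes the mean is estimated from the sample while k is fixed at the common value, so one parameter is counted as estimated. The left-to-right pooling order, the function names and the counts are all hypothetical choices, not the SYSTAT procedure.

```python
from collections import Counter

def nb_probabilities(mean, k, max_count):
    """P(X = 0), ..., P(X = max_count - 1), then the tail P(X >= max_count)."""
    p = (1.0 + mean / k) ** (-k)
    probs = [p]
    for x in range(1, max_count):
        # Recurrence P(x) = P(x-1) * (k + x - 1) / x * mean / (mean + k)
        p *= (k + x - 1) / x * mean / (mean + k)
        probs.append(p)
    probs.append(1.0 - sum(probs))
    return probs

def pool_classes(observed, expected, minimum=5.0):
    """Merge adjacent classes until every expected frequency is >= minimum."""
    obs_pooled, exp_pooled = [], []
    o_acc = e_acc = 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= minimum:
            obs_pooled.append(o_acc)
            exp_pooled.append(e_acc)
            o_acc = e_acc = 0.0
    if e_acc > 0:  # leftover tail is too small: merge it into the last class
        if exp_pooled:
            obs_pooled[-1] += o_acc
            exp_pooled[-1] += e_acc
        else:
            obs_pooled.append(o_acc)
            exp_pooled.append(e_acc)
    return obs_pooled, exp_pooled

def chi_square_fit(counts, k, estimated_params=1):
    """Chi-square statistic and degrees of freedom for a fit with fixed k."""
    n = len(counts)
    mean = sum(counts) / n
    top = max(counts)  # assumed to be at least 1
    freq = Counter(counts)
    observed = [freq.get(x, 0) for x in range(top)] + [freq[top]]
    expected = [n * p for p in nb_probabilities(mean, k, top)]
    obs, exp = pool_classes(observed, expected)
    statistic = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    return statistic, len(obs) - 1 - estimated_params, obs, exp

# Hypothetical counts (not study data), tested against the common k of the study
counts = [0] * 60 + [1] * 20 + [2] * 10 + [3] * 5 + [5] * 3 + [8] * 2
statistic, df, obs, exp = chi_square_fit(counts, k=0.6042)
print(f"chi-square = {statistic:.3f} with {df} degrees of freedom")
```

The statistic is then compared with the critical chi-square value for the stated degrees of freedom; a P value is not computed here.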

Sequential sampling
The sequential sampling plan was based on the negative binomial distribution of L. botrana larvae and corresponding formulas from Young & Young (1998).
The minimum sample size and the required sample size to estimate population means were calculated by solving

Data transformations
We used a series of transformations of the L. botrana larval counts, based on an estimated common k, which are proposed in the literature for stabilizing the variance of negative binomial distributions (Bartlett, 1947;Anscombe, 1949;Bliss & Owen, 1958;Johnson & Kotz, 1969;Zar, 1999) (Table 1).
Departures from normality were tested using a Kolmogorov-Smirnov (K-S) test and homogeneity of variances with a Bartlett's test (Zar, 1999). Tukey's test of additivity was used to test for non-additivity (Snedecor, 1956).

Estimation and stability of common negative binomial parameter k
Table 1.
Reference | Transformation
Bartlett, 1947 | $k^{-1}\log\left(\sqrt{1+k^{2}x}+k\sqrt{x}\right)$
Anscombe, 1949 | $\sinh^{-1}\sqrt{(x+c)/(k-2c)}$, with c = 0.375 if k is large and c = 0.2 when k = 2
Anscombe, 1949 | $\log\left(x+\tfrac{1}{2}k\right)$

Mean and variance of counts of L. botrana larvae, the number of vines searched in each plot, the mean number of clusters searched in each vine (or half-vine), and the parameter k, for each sampling unit (half-vine and entire vine), for each generation, are shown in Tables 2 and 3. These data cover a wide range of pest densities, including economic thresholds.
The results did not indicate a trend between k and mean infestation (all regression models not significant, the best logarithmic model having $R^2$ = 0.038 and P = 0.187). Therefore, the common value of k using MLE for all sampling units of the second and third generations of L. botrana infestation data was estimated at kc = 0.6042. The homogeneity of this kc among the populations studied was verified by a chi-square test and an analysis of variance table, according to Young & Young (1998). From the chi-square test ($\chi^2$ = 33.8627, df = 46, P = 0.9076) we concluded that the null hypothesis cannot be rejected and that there is a common k to the t populations. Since the F-test associated with the slope was significant (P < 0.0001) and that for the intercept was not (P = 0.2471), the hypothesis of a common k was not rejected (Bliss & Owen, 1958; Young & Young, 1998).

Tests of fit and stability of distribution
The chi-square tests of fit of the counts to the discrete frequency distributions for second and third generation L. botrana larvae are shown in Table 4. The null hypothesis tests whether the distribution in question is a good fit to the data (the observed value equals the expected value); therefore, a significant test value indicates lack of fit. The negative binomial distribution with parameter kc = 0.6042 is valid for over 80% of the cases studied. On the contrary, the Poisson distribution fits the data only occasionally (about 15%) and only at low L. botrana densities. There was no case where the Poisson fitted the data significantly and the negative binomial distribution did not. To determine the stability of the distribution, we investigated the mean-variance relationship. For a negative binomial distribution, the relationship between mean (µ) and variance ($\sigma^2$) is $\sigma^2 = \mu + \mu^2/k$. A regression analysis using this equation gave a satisfactory fit ($R^2$ = 0.992, P < 0.0001), compatible with a negative binomial distribution. The $\mu^2$ term is significant (P < 0.0001), revealing an over-dispersed (aggregated) distribution and a significant departure from a Poisson distribution (Davis, 1994; Gozé et al., 2003; Beyoa et al., 2004).
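One way to carry out such a mean-variance check is a least-squares fit of the variance on the mean and the squared mean, without an intercept. The Python sketch below assumes that model form; the function names and the mean-variance pairs are hypothetical (the pairs are generated from the equation itself, so the fit recovers k exactly). Under a negative binomial the coefficient of the mean should be close to 1 and the reciprocal of the coefficient of the squared mean estimates k.

```python
def nb_variance(mean, k):
    """Variance of a negative binomial with the given mean and exponent k."""
    return mean + mean ** 2 / k

def fit_mean_variance(means, variances):
    """Least squares for variance = a * mean + b * mean**2 (no intercept)."""
    s11 = sum(m ** 2 for m in means)
    s12 = sum(m ** 3 for m in means)
    s22 = sum(m ** 4 for m in means)
    t1 = sum(m * v for m, v in zip(means, variances))
    t2 = sum(m ** 2 * v for m, v in zip(means, variances))
    det = s11 * s22 - s12 ** 2
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b

# Hypothetical plot means; variances follow the equation with k = 0.6042
means = [0.2, 0.5, 1.0, 2.0, 4.0]
variances = [nb_variance(m, 0.6042) for m in means]
a, b = fit_mean_variance(means, variances)
print(f"a = {a:.4f}, implied k = {1.0 / b:.4f}")
```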

Sequential sampling
The total cumulative number of L. botrana larvae recorded in the field (Tn) was plotted against the number of samples taken, and sampling was continued until Tn exceeded a designated level, which is determined by a stop line for a desired level of precision (Fig. 1).
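A stop line of this kind can be derived for a negative binomial with known k by requiring the standard error of the mean to equal a fraction D of the mean. Setting $(\mu + \mu^2/k)/n = D^2\mu^2$ gives $n = (1/\mu + 1/k)/D^2$, and substituting $\mu = T_n/n$ gives $T_n = 1/(D^2 - 1/(kn))$, which is finite only for $n > 1/(kD^2)$. The Python sketch below assumes this definition of precision, which may differ in detail from the formulas taken from Young & Young (1998); D = 0.25 is an illustrative value, not one reported in the study, and the function names are hypothetical.

```python
import math

def required_sample_size(mean, k, d):
    """Units needed so that SE(mean) / mean equals d at the given density."""
    return (1.0 / mean + 1.0 / k) / d ** 2

def minimum_sample_size(k, d):
    """Sample size below which precision d cannot be reached at any density."""
    return 1.0 / (k * d ** 2)

def stop_line(n, k, d):
    """Cumulative count Tn at which sampling can stop after n units."""
    denominator = d ** 2 - 1.0 / (k * n)
    if denominator <= 0:
        return math.inf  # too few units for this precision
    return 1.0 / denominator

k, d = 0.6042, 0.25  # common k from the study; d is an assumed precision level
print("minimum sample size:", math.ceil(minimum_sample_size(k, d)))
for n in (30, 40, 60, 100):
    print(n, round(stop_line(n, k, d), 1))
```

Because the stop line decreases as n grows, dense populations reach it after few units, whereas sparse populations require many more.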
Example decision lines (upper and lower decision boundaries) for two classification sequential sampling programs are presented in Fig. 2, based on k = 0.6042, µ0 = 0.5, as well as economic thresholds of 4.0 larvae per half-vine for wine grapes and 2.0 larvae per half-vine for table grapes (A.A. Ifoulis, unpubl. data).
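Decision lines of this type can be computed with Wald's sequential probability ratio test for a negative binomial with known k. The Python sketch below uses that standard construction, which is assumed rather than confirmed to be the one behind Fig. 2. The error rates α = 0.2 and β = 0.1 are assumed values, and the function name is hypothetical.

```python
import math

def sprt_lines(k, mu0, mu1, alpha, beta):
    """Slope and intercepts of Wald's decision lines for cumulative counts.

    After n units with cumulative count Tn:
      Tn <= slope * n + lower  -> classify density as mu0 or lower
      Tn >= slope * n + upper  -> classify density as mu1 or higher
    """
    p0, p1 = mu0 / k, mu1 / k
    q0, q1 = 1.0 + p0, 1.0 + p1
    log_ratio = math.log((p1 * q0) / (p0 * q1))
    slope = k * math.log(q1 / q0) / log_ratio
    lower = math.log(beta / (1.0 - alpha)) / log_ratio
    upper = math.log((1.0 - beta) / alpha) / log_ratio
    return slope, lower, upper

k, mu0 = 0.6042, 0.5  # values given in the text
for label, mu1 in (("wine", 4.0), ("table", 2.0)):
    slope, lower, upper = sprt_lines(k, mu0, mu1, alpha=0.2, beta=0.1)
    print(f"{label}: slope = {slope:.3f}, lower = {lower:.3f}, upper = {upper:.3f}")
```

The two lines are parallel, and sampling continues while the cumulative count stays between them.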
The OC curves for these sequential sampling plans are presented in Fig. 3. The OC curves show that the probability of making an incorrect decision is very low when the larval population is below or above the stated levels for both the economic thresholds. The risk of not spraying when the economic threshold µ1 is in fact reached and the risk of spraying when infestation is in fact equal to the tolerance level of population density µ0 is as expected. The sequential test is therefore conservative, meaning that it can be used with full confidence, even at low levels of infestation.
The average sample number curves for the sequential sampling plans are shown in Fig. 4. The average number reaches a reasonable maximum of 3 and 6 half-vines for wine and table grapes, respectively.
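OC and ASN values for a plan of this kind can be approximated by Monte Carlo simulation. The Python sketch below assumes Wald-type decision lines, negative binomial counts generated as a gamma-mixed Poisson, and error rates α = 0.2 and β = 0.1. The number of runs, the truncation at `max_n` units, the seed and the function names are all hypothetical choices, so the output is not expected to reproduce Figs 3 and 4 exactly.

```python
import math
import random

def sprt_lines(k, mu0, mu1, alpha, beta):
    """Slope and intercepts of Wald's decision lines for known k."""
    p0, p1 = mu0 / k, mu1 / k
    q0, q1 = 1.0 + p0, 1.0 + p1
    log_ratio = math.log((p1 * q0) / (p0 * q1))
    slope = k * math.log(q1 / q0) / log_ratio
    lower = math.log(beta / (1.0 - alpha)) / log_ratio
    upper = math.log((1.0 - beta) / alpha) / log_ratio
    return slope, lower, upper

def nb_sample(mean, k, rng):
    """Negative binomial draw: Poisson with a gamma-distributed rate."""
    lam = rng.gammavariate(k, mean / k)  # shape k, scale mean / k
    # Knuth's multiplication method for a Poisson(lam) draw
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate(true_mean, k, mu0, mu1, alpha, beta, runs=2000, max_n=500, seed=1):
    """Return (OC, ASN): share of runs classified as low, mean units sampled."""
    rng = random.Random(seed)
    slope, lower, upper = sprt_lines(k, mu0, mu1, alpha, beta)
    accepted = 0
    total_units = 0
    for _ in range(runs):
        cumulative = 0
        for n in range(1, max_n + 1):
            cumulative += nb_sample(true_mean, k, rng)
            if cumulative <= slope * n + lower:
                accepted += 1  # classified as at or below the tolerance level
                break
            if cumulative >= slope * n + upper:
                break  # classified as at or above the economic threshold
        total_units += n  # runs with no decision are truncated at max_n
    return accepted / runs, total_units / runs

for mean in (0.25, 0.5, 1.0, 2.0, 4.0):
    oc, asn = simulate(mean, k=0.6042, mu0=0.5, mu1=4.0, alpha=0.2, beta=0.1)
    print(f"mean = {mean}: OC = {oc:.2f}, ASN = {asn:.1f}")
```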

Data transformations
All estimates of the parameters in this study were used to calculate valid transformations of L. botrana data for research purposes. We also used all the transformed data in various situations; we compared wine and table cultivars, 2nd and 3rd generation, and the two years using a Student's t test. Apart from these comparisons, the transformed data were used in a real situation: in geostatistical analyses used to construct semivariograms, data were transformed to make the distribution more symmetrical and remove the trend in variance (Ifoulis & Savopoulou-Soultani, 2006a). We proposed $y_1 = k^{-1}\sinh^{-1}\left(k\sqrt{x+1/2}\right)$ and $y_2 = k^{-1}\sinh^{-1}\left(k\sqrt{x+3/8}\right)$ as the most appropriate transformations for counts of L. botrana larvae. These transformations corrected the heteroscedasticity, nonnormality and nonadditivity in most cases (data not shown). Additivity is often deemed the more important assumption when data follow a negative binomial distribution (Bliss & Owen, 1958). We used mainly Tukey's test of additivity, considered a useful tool for indicating whether a transformation was successful (Snedecor, 1956). Characteristically, in all situations where Tukey's test was significant (indicating nonadditivity), with P values ranging between 0.01 and 0.1, the above transformations corrected the data (non-significant Tukey's test). For the proposed value of k and nonnegative values of x these transformations are single-valued functions of $x \geq 0$, hence the inverse transformations $x = [\sinh(k y_1)/k]^2 - 1/2$ or $x = [\sinh(k y_2)/k]^2 - 3/8$ exist and can be used to specify an integer value for x under an appropriate condition of given f1(x) or f2(x).

Fig. 2. Decision boundaries for a sequential sampling plan for L. botrana larvae (α = 0.2, β = 0.1, µ0 = 0.5 larvae per half-vine, µ1 = 4 and 2 larvae per half-vine for wine and table grapes, respectively).

An unbiased estimate of µ would be obtained by adding $s_y^2$ to the $\bar{x}$ derived by untransforming (Zar, 1999), where $s_y^2$ is the variance of the transformed data, which is specified at 0.25 for the negative binomial distribution (Bartlett, 1947).
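The two transformations and their inverses can be written as a pair of small functions. The Python sketch below is illustrative only: the function names are hypothetical, and the value 0.6042 is passed as the k of the formulas purely as an example.

```python
import math

def transform(x, k, c):
    """y = (1/k) * asinh(k * sqrt(x + c)); c = 1/2 or 3/8 in the text."""
    return math.asinh(k * math.sqrt(x + c)) / k

def inverse(y, k, c):
    """x = (sinh(k * y) / k) ** 2 - c, the inverse of transform()."""
    return (math.sinh(k * y) / k) ** 2 - c

k = 0.6042
for x in (0, 1, 5, 20):
    y1 = transform(x, k, 0.5)
    y2 = transform(x, k, 0.375)
    print(x, round(y1, 4), round(y2, 4), round(inverse(y1, k, 0.5), 6))
```

Applying `inverse` to a transformed count returns the original count up to floating-point error.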

Distribution
Probability or frequency distributions often serve as good models of real-world phenomena (Binns et al., 2000). Our results were for a variety of cultivars (table and wine), different sampling unit sizes, different generations and years. Moreover, the distribution may be random at small scales and aggregated at larger scales (Young & Young, 1998). Thus, it is important to examine the distribution using different sampling units.
The frequency distribution of L. botrana larvae follows a negative binomial distribution. This reveals aggregation of larvae within the statistical unit (vine or half-vine). Taylor (1984) summarizes most of the integrated pest management (IPM) literature and indicates it is the most common distribution in insects. It must be emphasized that a negative binomial distribution fits most of these data, even those obtained using large sample sizes. Young & Young (1998) report that for large sample sizes, it is difficult to find a mathematical distribution that fits the data well. It is often reported that a negative binomial distribution fits the data collected from high infestations, and a Poisson from low infestations of the same insect. Furthermore, the results of our work show that, in a few fields where the population densities were low, it was difficult to discern among distributions. This is not contradictory because both distributions are indiscernible when an infestation is very low (Young & Young, 1998; Gozé et al., 2003). In addition, goodness-of-fit tests are notoriously weak for distinguishing discrete distributions and there may be more than one distribution that adequately fits the data.
Our experiments were conducted under natural conditions which are ideal for ecological studies and studying the distribution of L. botrana larvae, in particular. In controlled experiments it is difficult to infer ecological importance (Marsh & Borrell, 2001). Aggregation indicated by our results might be due to adult oviposition behaviour (Davis, 1994;Pedigo, 1999), egg survival, larval mobility and larval mortality.
The results suggest that female L. botrana tend to oviposit in a highly aggregated manner. Oviposition site selection can determine the distribution of eggs and larvae, and as a result, affect population dynamics (Price, 1994). Lepidoptera, like many other insects, use a variety of cues, including visual and olfactory cues, to locate oviposition sites. Gravid L. botrana females are attracted by volatiles but do not land at the source if grapes are not present (Tasin et al., 2005), demonstrating a clear role of volatile compounds in oviposition behaviour and the morphological characteristics of grapes, such as shape or colour, in the selection of an oviposition site (Ifoulis & Savopoulou-Soultani, 2002). Maher & Thiéry (2004) showed that females of L. botrana can perceive chemical information from grape berries and that the intensity of their oviposition response is associated with the strength of the stimulus. The active compounds may originate from the interior tissues of berries and reach the surface via diffusion paths in cuticle layers (Baur et al., 1999). Hence, injury to grapes caused by an earlier infestation of larvae may increase the concentration and vagility of those compounds, resulting in an increase in stimulatory chemical information for ovipositing females. Once berries are infested, other females are attracted, and oviposit around the area of the injury, thereby causing aggregations of eggs laid on particular vines. Another reason for this behaviour may be the ability of females to recognize an existing injury by either responding to living larvae or their excreta.
Summarizing the distribution of L. botrana larvae, our results suggest that females seek particular plants and then tend to lay as many eggs as possible on these plants. They appear to prefer to oviposit on previously infested vines, using the presence of larvae as an indicator of good food (high nutritional value, easy for penetration and suitable for nest establishment), a good site (conditions such as temperature, humidity, wind) and absence of natural enemies.

Common k
Stability in the k parameter would both increase the utility of the negative binomial and confidence in its suitability for describing larval distribution of L. botrana. Negative binomial parameter k was estimated using the maximum likelihood method, which gives a constant near zero bias estimation except for small n (Saha & Sudhir, 2005). The degree of L. botrana larval population aggregation may be stable in northern Greece because a common dispersion parameter k can be used to describe the distribution of larvae in many sets of field data even though insect population densities varied between generations, fields and years.
A stable k suggests that the level of aggregation is a species specific constant. Establishing a common value of k provides information about the intrinsic power of the species to reproduce itself (Anscombe, 1949). Another advantage of a common k is that a generalized linear model (GLM) can be used to model the dependence of a discrete response variable on a set of predictors (Dalthorp, 2004). When data are assumed to follow the negative binomial distribution, the mean and variance of counts can be related to one another via the negative binomial k parameter, which is assumed constant in the negative binomial GLM (Gotway & Stroup, 1997). These discrete GLMs include the simplifying assumption that the value of the negative binomial k is known. Gotway & Stroup (1997) applied the GLM approach to negative binomial weed count data, while Dalthorp (2004) used the negative binomial k parameter in a spatial GLM study of Coleoptera. Hence, the assumption of a constant k is a prerequisite for the use of GLM. Based on published k-values for a variety of taxa, strong levels of aggregation are apparently widespread in weevils, eight species of Drosophila, various other Diptera, Coleoptera, eriophyid mites and nematodes (Atkinson & Shorrocks, 1984; Rosewell et al., 1990; Hall et al., 1991; Peng & Brewer, 1994; Renshaw et al., 1995; Warren et al., 2003; Dalthorp, 2004; Herve et al., 2005). The k-values for these taxa ranged between 0.005 and 7.15, with the most common values ranging from 0.01 to 0.6, which demonstrates high levels of aggregation. For such species, estimates of abundance depend on an accurate estimate of aggregation (k) prior to sampling or application of a density predictive model, which requires an a priori abundance estimate (Holt et al., 2002; Warren et al., 2003).

Data transformations
For the valid application of parametric analyses of variance and related procedures, certain basic assumptions must be met (normality, homogeneity, additivity). There are types of data, like larval counts of L. botrana, for which it is known that the population sampled is not normally distributed. Transformation of the data from their original form to a different form will generally result in data that have acceptable homoscedasticity characteristics (Zar, 1999). For research purposes, the most informative analysis of variance is obtained using counts transformed using an estimated common k (Anscombe, 1949;Bliss & Owen, 1958). We propose two appropriate transformations for data on the distribution of L. botrana larvae.

Sampling unit
A negative binomial distribution fits the aggregation of L. botrana larvae on both entire and half-vines. Moreover, the stability of the common k and its estimation clearly show that kc = 0.6042 is acceptable for both sampling units. Hence, half and entire vines are both valid statistical units.
Vine as a sampling unit is suitable for ecological and insect behavioural studies, in which the number of insects per plant is of interest. In a Monte-Carlo simulation procedure study, in a colder area where two generations of L. botrana occur, Badenhausser et al. (1999) also report that an individual vine is a suitable sampling unit.
For practical purposes and comparative experiments however, we recommend a half-vine as the sampling unit for efficacy or environmental (factorial) studies, for developing sampling plans and estimating larval mean density and economic thresholds. The half-vine term comes from the vertical shoot positioning system (VSP), which is used in most grape-cultivation regions. VSP creates some potential problems and has limitations. The vines are placed in rows, divided by wires into two halves. Usually, the corridor between rows is wide enough for one to move parallel with the rows, easily locate and inspect the east half-vine of one row and west half-vine of the other row, and so reduce total sampling time and cost. A half-vine sampling unit was previously used to determine the fixed sample size used in a nested analysis of variance of a multistage sampling plan (Ifoulis & Savopoulou-Soultani, 2006b). In this study a half-vine sampling unit was also used in the development of a sequential sampling plan.

Sampling program
In this paper, a sequential estimation plan was developed to determine the population density with a particular accuracy. Furthermore, when the focus is on determining whether population density is above or below a stated threshold, classification sequential sampling can significantly improve sampling efficiency.
This plan is suitable for pest management specialists and growers, and will enable them to make more judicious decisions about pesticide application. In the case of L. botrana larvae only a small number of samples are required for a decision when the pest densities differ greatly from economic thresholds.