Most “neutrality tests” assess whether particular data sets depart from the predictions of a standard neutral model without recombination. For Drosophila, where nuclear polymorphism data routinely show evidence of genetic exchange, the assumption of no recombination is often unrealistic. Furthermore, although conservative, this assumption comes at the cost of a large loss of power.
Perhaps as a result, tests of the frequency spectrum based on zero recombination suggest a good fit of the Drosophila polymorphism data to the predictions of the standard neutral model. Here, we analyze the frequency spectrum of a large number of loci in Drosophila melanogaster and D. simulans using two summary statistics. We use an estimate of the population recombination rate based on a laboratory estimate of the rate of crossing over per physical length and an estimate of the effective population size of the species.
In contrast to previous studies, we found that about half of the loci deviate from the predictions of the standard neutral model. The extent of the departure depends on the exact recombination rate, but the overall pattern that emerges is robust. Interestingly, these deviations from neutral expectations are not unidirectional. The large variance in outcomes may be due to a complex demographic history and inconsistent sampling, or to the pervasive action of natural selection.
We use two summary statistics. The first is Tajima's D, which compares two estimates of the neutral mutation parameter, 𝛉 (Tajima 1989a). D is negative when there is an excess of rare mutations, as would be expected after recent population growth (Tajima 1989b) or a selective sweep (Braverman et al. 1995). It is positive when there is an excess of intermediate-frequency variants, as in the presence of a balanced polymorphism or under certain models of population subdivision (Tajima 1989b). Our second summary statistic, Gη, is based on the differences between the observed and expected numbers of mutations in each frequency class (Fu 1996).
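As an illustration of how D responds to the frequency spectrum, the following minimal sketch computes it from per-site allele counts using the standard formulation of Tajima (1989a), in which the two estimates of 𝛉 are the mean number of pairwise differences and S divided by the harmonic sum a1. The function name and the input format (one allele count per segregating site) are assumptions made for this example, not part of the analysis described here.

```python
import math

def tajimas_d(n, allele_counts):
    """Tajima's D for a sample of n sequences.

    allele_counts: for each segregating site, the number of sampled
    sequences carrying one of the two alleles (1 <= count <= n - 1).
    This input format is an assumption made for the sketch.
    """
    S = len(allele_counts)
    if n < 3 or S == 0:
        raise ValueError("need n >= 3 and at least one segregating site")

    # Estimate 1: mean number of pairwise differences (pi).
    n_pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) for k in allele_counts) / n_pairs

    # Estimate 2: S / a1, based on the number of segregating sites.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))

    # Constants for the variance of (pi - S / a1), as in Tajima (1989a).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

# An excess of rare variants gives D < 0; an excess of
# intermediate-frequency variants gives D > 0.
d_rare = tajimas_d(10, [1, 1, 1, 1, 1])
d_intermediate = tajimas_d(10, [5, 5, 5, 5, 5])
```

With ten sequences, five singleton sites yield a negative value and five sites at frequency 5/10 yield a positive one, matching the interpretation given above.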
The simulations assumed an equilibrium Wright-Fisher population, from which random samples were drawn. There was no selection, and each new mutation occurred at a previously unmutated site (the infinite-sites model; Kimura 1969). In this implementation, we generate genealogies and then place the observed number of mutations, S, on the tree (Hudson 1993). This approach departs from standard coalescent simulations, which place mutations at a constant rate (𝛉/2) along each branch. We took this approach because the number of segregating sites is observed, whereas 𝛉 is unknown (cf. Hudson 1993).
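The 'fixed S' idea can be sketched for the simplest case of a single genealogy without recombination: build a neutral coalescent tree, then drop exactly S mutations on its branches, choosing each branch with probability proportional to its length. This is a minimal illustration under those assumptions, not the program used in this study; the function names are hypothetical.

```python
import random

def coalescent_branches(n, rng):
    """Simulate one neutral genealogy for n sequences.

    Returns a list of (branch_length, n_descendants) pairs, one per
    branch. Time is in coalescent units; the scaling does not matter
    here because only relative branch lengths are used.
    """
    # Each active lineage: (number of sampled descendants, birth time).
    active = [(1, 0.0) for _ in range(n)]
    t = 0.0
    branches = []
    while len(active) > 1:
        k = len(active)
        # Waiting time until the next coalescence among k lineages.
        t += rng.expovariate(k * (k - 1) / 2)
        i, j = sorted(rng.sample(range(k), 2))
        dj, tj = active.pop(j)  # pop the larger index first
        di, ti = active.pop(i)
        branches.append((t - ti, di))
        branches.append((t - tj, dj))
        active.append((di + dj, t))
    return branches

def place_fixed_s(branches, S, rng):
    """Place exactly S mutations, each on a branch chosen with
    probability proportional to branch length. Returns, per mutation,
    the number of sampled sequences that carry it."""
    weights = [length for length, _ in branches]
    chosen = rng.choices(branches, weights=weights, k=S)
    return [n_desc for _, n_desc in chosen]

rng = random.Random(1)
tree = coalescent_branches(8, rng)
derived_counts = place_fixed_s(tree, 12, rng)
```

Each entry of `derived_counts` is the frequency class of one mutation, so the list is a simulated frequency spectrum conditional on S = 12. In the standard scheme, by contrast, the number of mutations on each branch would be Poisson with mean 𝛉/2 times the branch length, and S would vary between replicates.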
Simulations have shown that the type I error rates of various statistical tests under this ‘fixed S’ scheme are approximately equal to the nominal rejection probabilities (Kelly 1997; unpublished data). The parameters of the simulations were, for each locus, the sample size, the number of base pairs, and the population recombination rate C = Camp (see below). Although our simulations are unorthodox, we refer to our set of assumptions as the “standard neutral model” (SNM). To implement the coalescent with recombination, we use a modification of a program kindly provided by R. Hudson (see Hudson 1983, 1990).
There is no gene conversion, and the rate of crossing over is constant per base pair. Tracing the sample’s ancestral lineages back in time, there are two possible genealogical events. Lineages can coalesce in the usual way, or a lineage can split in two as a result of a crossover event. A split produces two lineages, one ancestral to the sequence to the left of the crossover point and one to the sequence to the right. These two lineages are then each followed back in time and may themselves coalesce or split. At any given nucleotide site, there is a standard coalescent tree (as in the case of no recombination). However, recombination causes different sites to have trees that differ from each other. These trees are not independent of each other, as they share parts of their genealogical histories.
Effect of population recombination rate
Without recombination, there is only one genealogy at a locus, that is, a single draw from the (highly stochastic) evolutionary process. With recombination, sites very close to each other are likely to have similar genealogies, but sites farther apart will have quite different genealogies. As the rate of recombination increases, so does the number of distinct trees at a locus (Hudson 1983). Summaries of the frequency spectrum at neutral sites (such as D and Gη) reflect aspects of the underlying trees.
As the recombination rate increases, these summaries reflect an average over more trees and thus tend to take values closer to their expectations. In other words, their variance decreases as the recombination rate increases. Wall (1999) showed that a large number of statistical tests (based on both frequency spectrum and linkage disequilibrium summaries) have very little power to detect population structure if the simulations are run with C = 0 when the actual value of C is much higher. This is because the actual variances of the test statistics are smaller than their variances in simulations without recombination.
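The effect on the variance can be illustrated with a deliberately crude sketch that contrasts two extremes under the fixed-S scheme: a locus with a single genealogy (C = 0) and a locus made of many blocks with independent genealogies. The block model is an assumption of this example only; with real recombination, neighboring trees are correlated rather than independent, so this is not a substitute for a coalescent simulation with recombination. All function names are hypothetical.

```python
import math
import random
import statistics

def branches(n, rng):
    """One neutral genealogy as (branch_length, n_descendants) pairs."""
    active = [(1, 0.0) for _ in range(n)]
    t, out = 0.0, []
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2)
        i, j = sorted(rng.sample(range(k), 2))
        dj, tj = active.pop(j)
        di, ti = active.pop(i)
        out += [(t - ti, di), (t - tj, dj)]
        active.append((di + dj, t))
    return out

def fixed_s_counts(n, S, n_blocks, rng):
    """Place S mutations over n_blocks independent genealogies,
    proportionally to branch length across the pooled branches.
    n_blocks = 1 corresponds to no recombination."""
    pool = [b for _ in range(n_blocks) for b in branches(n, rng)]
    picks = rng.choices(pool, weights=[b[0] for b in pool], k=S)
    return [n_desc for _, n_desc in picks]

def tajimas_d(n, counts):
    """Tajima's D from per-site allele counts (Tajima 1989a)."""
    S = len(counts)
    pi = sum(k * (n - k) for k in counts) / (n * (n - 1) / 2)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    c1 = (n + 1) / (3 * (n - 1)) - 1 / a1
    c2 = (2 * (n * n + n + 3) / (9 * n * (n - 1))
          - (n + 2) / (a1 * n) + a2 / a1 ** 2)
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

def d_variance(n, S, n_blocks, reps, seed):
    """Variance of D over simulated replicates."""
    rng = random.Random(seed)
    values = [tajimas_d(n, fixed_s_counts(n, S, n_blocks, rng))
              for _ in range(reps)]
    return statistics.pvariance(values)

var_no_recombination = d_variance(10, 20, 1, 300, seed=1)
var_many_trees = d_variance(10, 20, 50, 300, seed=1)
```

In this sketch the variance of D is markedly smaller when mutations are spread over many genealogies, so critical values obtained with a single tree are too wide for such a locus. This is the sense in which assuming C = 0 is conservative but costly in power.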
However, it was unknown whether, in practice, many loci would show deviations from the null model for C > 0. These results suggest that in Drosophila, population recombination rates are high enough that alternative assumptions about C have a significant effect on one’s conclusions. In particular, with our conservative estimate of the population recombination rate, Camp, we found that the standard neutral model was a poor predictor of the observed frequency spectra at many loci.