Challenges of microsatellite development in Lepidoptera : Euphydryas aurinia ( Nymphalidae ) as a case study

Currently it remains difficult to obtain robust microsatellite markers for Lepidoptera. In an attempt to overcome the problems associated with developing microsatellite markers for this insect order we combined (i) biotin-enrichment protocol, (ii) next generation pyrosequencing (through 454 GS-FLX Titanium technology) and (iii) the use of individuals collected from eight geographically distant European populations representing three subspecies of Euphydryas aurinia. Out of 96 stringently designed primer pairs, 12 polymorphic microsatellite loci amplified without obvious evidence of null alleles in eight individuals from different subspecies. Between five and seven of these loci showed full within population applicability and three revealed to be robust and transferable between populations and sub-species, providing a first step towards the development of a valuable and robust tool for studying conservation issues and evolution in E. aurinia populations. Nevertheless, as in most studies dealing with Lepidoptera microsatellites, null alleles were detected in most of the developed markers. Our results emphasize the need for further research in order to better understand the complex evolution and organization of Lepidopteran genomes. 261 * These authors contributed equally to this work. ptera: Nymphalidae) was selected. This species has a Palaearctic distribution, is found in a wide range of habitats and its larvae feed on at least ten different host species in southwest Europe (Mazel, 1986). Populations of E. aurinia vary in their wing pattern and three subspecies are recognized based on morphological characters (Higgins, 1950, 1955). In spite of its wide distribution and host plant range, this species is declining in abundance and considered as vulnerable in Europe (Van Swaay & Warren, 1999). These characteristics make this species an excellent model for the study of (i) the evolution of insect-host plant adaptation, (ii) the detection of glacial refuges using phylogeographical methods, (iii) issues in conservation biology and (iv), as in this study, to develop robust microsatellite markers for Lepidoptera by combining biotin-enrichment protocol and high throughput sequencing technology. This paper is an attempt to develop “robust” microsatellite markers for Lepidoptera that (i) do not have null alleles, (ii) have perfect repeat motifs (i.e. not interrupted or compound microsatellites), (iii) are PCR multiplexable and (iv) are transferable between geographically distant populations and/or subspecies. Multiplexing a set of loci makes genotyping faster and cost effective, which is important if a large number of samples have to be analysed. Moreover, multiplex genotyping minimizes the possibility of erroneous compilation of genotypes from different individuals. Perfect microsatellites were chosen because they make microsatellites easily usable not only for frequency based analyses but also in a coalescent theory framework for demographic inferences. In this context, 5 Mbp of microsatellite enriched DNA were sequenced using 454 GS-FLX Titanium technology followed by a stringent selection of sequences for primer design. For PCR amplification and genotyping tests of the markers, several E. aurinia samples from geographically and genetically distant and disconnected populations, which cover a wide geographical range and represent three subspecies, were used. Moreover, the three populations screened for microsatellite polymorphism are known to be at Hardy Weinberg equilibrium based on 8–10 allozyme loci (Zimmermann, 2000; Junker & Schmitt, unpubl.). This procedure is likely to discard markers that lack transferability because of null alleles. MATERIAL AND METHODS Individuals from eight European populations of E. aurinia representing the three European sub-species were used for the different steps in the microsatellite marker development (details in Table 1). The pairwise Fst (Weir & Hill, 2002) of the three populations used for population genetics analyses (from France, Portugal and Estonia) ranged from 0.24 to 0.49 based on allozyme markers. The 104 individuals were kept frozen and/or in 96% alcohol until used for DNA extraction. Microsatellites were isolated from four individuals following a biotin-enrichment protocol adapted from Kijas et al. (1994) at Genoscreen (Lille, France) (Table 1). Genomic DNA was extracted from 25 mg of tissue using the DNeasy Tissue Kit (QIAGEN) followed by RNase treatment. The obtained genomic DNA (2 μg) was digested with RsaI (Fermentas) for 1 hour at 37°C, following the manufacturer’s recommendations. DNA was further ligated to standard oligonucleotide adapters as described elsewhere (Dubut et al., 2009). Eight biotin-labelled oligonucleotides, (AG)10, (AC)10, (AAC)8, (AGG)8, (ACG)8, (AAG)8, (ACAT)6 and (ATCT)6, were simultaneously hybridized to the ligated DNA for 10 min after initial denaturation of the ligation products. The choice of screened motifs was based on microsatellite motif frequencies of the published genomes of 12 insect and vertebrate species (Martin et al., 2010). The enrichment step was completed using Dynabeads (Invitrogen) following a standard procedure. The resulting enriched DNA was amplified using primers corresponding to the standard adapters, with 25 cycles (20 s at 95°C, 20 s at 60°C and 1 min 30 s at 72°C) and a final elongation step of 30 min at 72°C. The PCR product was immediately purified using QIAquick PCR Purification Kit (QIAGEN). Then, two libraries were prepared. The first from one individual (collected from Le Puits de Rians) and the second from three individuals (Table 1), each of them labelled with a different MID (Multiplex IDentifier; MID1ACGAGTGCGT, MID2-ACGCTCGACA, MID3-AGACGCA CTC) and then pooled. Each purified enriched library was used in the 454 GS-FLX Titanium library preparation (Roche Applied Science) following the manufacturer’s protocols at Genoscreen (Lille, France). Emulsion PCR was carried out at a ratio of one copy per bead and subsequently disrupted using isopropanol. Beads containing amplified DNA fragments were enriched and recovered for sequencing to finally provide 50,000 to 70,000 enriched beads for each library. The recovered ssDNA beads were packed onto 1/16 region of a 70 × 75 mm Titanium PicoTiter plate and sequenced with 200 cycles. Sample preparation and analytical processing, such as base calling, were performed at Genoscreen (Lille, France) using the manufacturer’s protocol for Titanium series. The selection of sequences for primer design was done with the program QDD (Meglécz et al., 2010). Enrichment adaptors and tags were removed from sequences. Sequences longer than 80 base pairs (bp) and containing perfect microsatellites of at least five repetitions for any motif of 2–6 bp were selected for further analyses. Sequence similarities were detected through an “all against all” BLAST (Altschul et al., 1997; e-value 1E-40) in which microsatellite motifs were soft masked. Sequences for which pairwise similarity along the flanking regions was over 95% were grouped into contigs and a 2/3 majority rule consensus sequence was created from each contig. Sequences that had significant BLAST hit to other sequences with an overall similarity among the flanking region bellow 95% were discarded to avoid potential intra-genomic multicopy sequences. All of the unique sequences (with no BLAST hit to any other “454” reads) or consensus sequences were checked for the presence of short repetitions in the flanking regions. PCR primers were designed using QDD with the following stringent criteria: (i) target microsatellites have at least seven repetitions, (ii) length of PCR product between 100 and 300 bp, (iii) flanking regions did not contain a mononucleotide stretch of more than four bases and any di-hexa base pair motifs of more than two repetitions, (iv) annealing temperatures of primer pairs were optimised to fall between 57°C and 63°C, and (v) microsatellites are not compound or interrupted. Following this procedure, 96 sequences with target microsatellites were retained. Amplifications were initially performed on four individuals (from Estonia and Portugal, see Table 1) for each of the 96 primer pairs in a total volume of 10 μL containing 0.5 μL of 1 : 10 diluted DNA extract using the QIAGEN Multiplex PCR Kit (QIAGEN). Thermocycling was performed on a Mastercycler gradient (Eppendorf) with the following protocol: 95°C for 15 min, followed by 30 cycles (94°C for 1 min, 56°C for 1 min,


INTRODUCTION
Microsatellites are the most widely used molecular markers in ecological genetics, due to their genetic information content, heritability and the repeatability of their results (Selkoe & Toonen, 2006).However, it has so far proved difficult to isolate microsatellite markers for most of the Lepidoptera species assayed (e.g.Antony et al., 2001;Péténian et al., 2005;Fauvelot et al., 2006;van't Hof et al., 2007;Habel et al., 2008).This likely lies in mutually non-exclusive causes including flanking region similarity among loci (Meglécz et al., 2004(Meglécz et al., , 2007)), microsatellite association with transposable elements (Coates et al., 2009;Tay et al., 2010) and high frequencies of null alleles (Keyghobadi et al., 1999;Harper et al., 2003;Meglécz et al., 2004;Zhang, 2004;van't Hof et al., 2005;Mikheyev et al., 2010).As a consequence of high null allele frequencies, the transferability between species of the same genus (Flanagan et al., 2002;Nowak et al., 2009) and also between populations of the same species (Jiggins et al., 2005) is limited.Nevertheless, Daly et al. (2004) suggest that "while problems may be encountered in some species, the microsatellite approach should not be discounted as an inefficient marker strategy in Lepidoptera".In the era of high throughput sequencing technologies, when it is possible to produce tens of thousands of genomic fragments at reasonable cost and time, the question arises: Can a large amount of sequence data counterbalance the low success rate of robust microsatellite isolation in Lepidoptera?In fact, this approach appears promising, since large shotgun genomic libraries have recently proved to be an efficient, cost-effective and time saving tool for isolating microsatellites in other taxa, such as Aves (e.g.Hymenolaimus malacorhynchos; Abdelkrim et al., 2009), Reptilia (e.g.Agkistrodon contortrix; Castoe et al., 2010), Ascidiacea (e.g.Didemnum vexillum; Abbott et al., 2010) and Liliopsida (e.g.Typha minima; Csencsics et al., 2010), especially when combined with a biotinenrichment protocol as in the case of Zingel asper (Actinopterygii) (Dubut et al., 2010).
As an example of a butterfly species, the marsh fritillary, Euphydryas aurinia, (Rottemburg, 1775; Lepido-ptera: Nymphalidae) was selected.This species has a Palaearctic distribution, is found in a wide range of habitats and its larvae feed on at least ten different host species in southwest Europe (Mazel, 1986).Populations of E. aurinia vary in their wing pattern and three subspecies are recognized based on morphological characters (Higgins, 1950(Higgins, , 1955)).In spite of its wide distribution and host plant range, this species is declining in abundance and considered as vulnerable in Europe (Van Swaay & Warren, 1999).These characteristics make this species an excellent model for the study of (i) the evolution of insect-host plant adaptation, (ii) the detection of glacial refuges using phylogeographical methods, (iii) issues in conservation biology and (iv), as in this study, to develop robust microsatellite markers for Lepidoptera by combining biotin-enrichment protocol and high throughput sequencing technology.
This paper is an attempt to develop "robust" microsatellite markers for Lepidoptera that (i) do not have null alleles, (ii) have perfect repeat motifs (i.e.not interrupted or compound microsatellites), (iii) are PCR multiplexable and (iv) are transferable between geographically distant populations and/or subspecies.Multiplexing a set of loci makes genotyping faster and cost effective, which is important if a large number of samples have to be analysed.Moreover, multiplex genotyping minimizes the possibility of erroneous compilation of genotypes from different individuals.Perfect microsatellites were chosen because they make microsatellites easily usable not only for frequency based analyses but also in a coalescent theory framework for demographic inferences.
In this context, 5 Mbp of microsatellite enriched DNA were sequenced using 454 GS-FLX Titanium technology followed by a stringent selection of sequences for primer design.For PCR amplification and genotyping tests of the markers, several E. aurinia samples from geographically and genetically distant and disconnected populations, which cover a wide geographical range and represent three subspecies, were used.Moreover, the three populations screened for microsatellite polymorphism are known to be at Hardy Weinberg equilibrium based on 8-10 allozyme loci (Zimmermann, 2000;Junker & Schmitt, unpubl.).This procedure is likely to discard markers that lack transferability because of null alleles.

MATERIAL AND METHODS
Individuals from eight European populations of E. aurinia representing the three European sub-species were used for the different steps in the microsatellite marker development (details in Table 1).The pairwise Fst (Weir & Hill, 2002) of the three populations used for population genetics analyses (from France, Portugal and Estonia) ranged from 0.24 to 0.49 based on allozyme markers.The 104 individuals were kept frozen and/or in 96% alcohol until used for DNA extraction.
Microsatellites were isolated from four individuals following a biotin-enrichment protocol adapted from Kijas et al. (1994) at Genoscreen (Lille, France) (Table 1).Genomic DNA was extracted from 25 mg of tissue using the DNeasy Tissue Kit (QIAGEN) followed by RNase treatment.The obtained genomic DNA (2 µg) was digested with RsaI (Fermentas) for 1 hour at 37°C, following the manufacturer's recommendations.
DNA was further ligated to standard oligonucleotide adapters as described elsewhere (Dubut et al., 2009).Eight biotin-labelled oligonucleotides, (AG)10, (AC)10, (AAC)8, (AGG)8, (ACG)8, (AAG)8, (ACAT)6 and (ATCT)6, were simultaneously hybridized to the ligated DNA for 10 min after initial denaturation of the ligation products.The choice of screened motifs was based on microsatellite motif frequencies of the published genomes of 12 insect and vertebrate species (Martin et al., 2010).The enrichment step was completed using Dynabeads (Invitrogen) following a standard procedure.The resulting enriched DNA was amplified using primers corresponding to the standard adapters, with 25 cycles (20 s at 95°C, 20 s at 60°C and 1 min 30 s at 72°C) and a final elongation step of 30 min at 72°C.The PCR product was immediately purified using QIAquick PCR Purification Kit (QIAGEN).Then, two libraries were prepared.The first from one individual (collected from Le Puits de Rians) and the second from three individuals (Table 1), each of them labelled with a different MID (Multiplex IDentifier; MID1-ACGAGTGCGT, MID2-ACGCTCGACA, MID3-AGACGCA CTC) and then pooled.Each purified enriched library was used in the 454 GS-FLX Titanium library preparation (Roche Applied Science) following the manufacturer's protocols at Genoscreen (Lille, France).
Emulsion PCR was carried out at a ratio of one copy per bead and subsequently disrupted using isopropanol.Beads containing amplified DNA fragments were enriched and recovered for sequencing to finally provide 50,000 to 70,000 enriched beads for each library.The recovered ssDNA beads were packed onto 1/16 region of a 70 × 75 mm Titanium PicoTiter plate and sequenced with 200 cycles.Sample preparation and analytical processing, such as base calling, were performed at Genoscreen (Lille, France) using the manufacturer's protocol for Titanium series.
The selection of sequences for primer design was done with the program QDD (Meglécz et al., 2010).Enrichment adaptors and tags were removed from sequences.Sequences longer than 80 base pairs (bp) and containing perfect microsatellites of at least five repetitions for any motif of 2-6 bp were selected for further analyses.Sequence similarities were detected through an "all against all" BLAST (Altschul et al., 1997;e-value 1E-40) in which microsatellite motifs were soft masked.Sequences for which pairwise similarity along the flanking regions was over 95% were grouped into contigs and a 2/3 majority rule consensus sequence was created from each contig.Sequences that had significant BLAST hit to other sequences with an overall similarity among the flanking region bellow 95% were discarded to avoid potential intra-genomic multicopy sequences.All of the unique sequences (with no BLAST hit to any other "454" reads) or consensus sequences were checked for the presence of short repetitions in the flanking regions.PCR primers were designed using QDD with the following stringent criteria: (i) target microsatellites have at least seven repetitions, (ii) length of PCR product between 100 and 300 bp, (iii) flanking regions did not contain a mononucleotide stretch of more than four bases and any di-hexa base pair motifs of more than two repetitions, (iv) annealing temperatures of primer pairs were optimised to fall between 57°C and 63°C, and (v) microsatellites are not compound or interrupted.Following this procedure, 96 sequences with target microsatellites were retained.
Amplifications were initially performed on four individuals (from Estonia and Portugal, see Table 1) for each of the 96 primer pairs in a total volume of 10 µL containing 0.5 µL of 1 : 10 diluted DNA extract using the QIAGEN Multiplex PCR Kit (QIAGEN).Thermocycling was performed on a Mastercycler ® gradient (Eppendorf) with the following protocol: 95°C for 15 min, followed by 30 cycles (94°C for 1 min, 56°C for 1 min, 72°C for 1 min), and 60°C for 45 min.Out of the 96 primer pairs, 23 gave clean PCR products on agarose gel electrophoresis, i.e. discrete single bands or at most two when there were large difference in size between alleles.The remaining 73 primer pairs either did not amplify in some of the 4 individuals or produced multiple bands, smears, or severe ladder-like patterns.The reverse primer of Eau52 was redesigned at this step, since it gave clear but weak bands.
All the 23 markers were then tested on eight individuals collected from distant populations and representing three subspecies (E.a. aurinia from Estonia and Greece, E. a. beckeri from Portugal and E. a. debilis from Switzerland; see Table 1).The loci were amplified separately using forward primers labelled with the fluorescent dyes 6-FAM (Eurogentec), PET, NED or VIC (Applied Biosystems).The amplicons were visualized using an ABI 3130XL Genetic Analyzer (Applied Biosystems).Allele sizes were scored against an internal GeneScan-500 LIZ ® Size Standard (Applied Biosystems) and genotypes obtained using GeneMapper ® 3.7 (Applied Biosystems).Twelve markers, which amplified in all eight individuals and showed unambiguous genotype patterns, were kept and combined into two PCR multiplex kits (Table 2).The multiplex kits were tested using the amplification protocol described above on three populations (from Estonia, France and Portugal; see Table 1).These populations were selected based on three criteria: (i) they cover a wide geographical range and represent two subspecies, (ii) they are geographically and genetically distant and are disconnected and (iii) all of them display Hardy-Weinberg (HW) equilibrium based on 8-10 polymorphic allozyme loci (Zimmerman, 2000;Junker & Schmitt, unpubl.).
GENEPOP 4.0 (Rousset, 2008) was used to: (i) test HW equilibrium, (ii) estimate the heterozygosity for all loci and populations and (iii) test linkage disequilibrium (LD) among loci within populations.Micro-Checker ver.2.2.3 (van Oosterhout et al., 2004) was used to analyse the causes of departures from HW equilibrium: real HW disequilibrium, null alleles or scoring errors (often resulting from stuttering).

RESULTS
From the two libraries, 19879 raw sequences of at least 80 bp were obtained, and 3413 were (i) unique sequences (with no BLAST hit to any other "454" reads) or consensus sequences and (ii) contained a perfect microsatellite of at least 5 repetitions.The flanking regions of all of these sequences were checked and 263 with a flanking region of at least 30 bp without homopolymers and short repetitions (i.e.3-5 repeats) were retained.For 206 sequences primer pairs could be designed.Of them, 110 sequences with microsatellites of only 5 and 6 repetitions were discarded (maximizing chance of being polymorphic for target microsatellites).
Out the 96 sequences with a perfect microsatellite motif and at least seven repeats, 23 displayed clean PCR products and amplified all four individuals tested on agarose gels and 12 loci amplified in all eight individuals (representative of three subspecies) that were used for testing the labelled primers.The total number of alleles of these 12 loci (based on three population samples) ranged from 4 to 34 per locus.The expected heterozygosity for the different loci (based on 98 individuals from three populations) ranged from 0.10 to 0.87 (Table 2).
After applying Benjamini & Hochberg's (1995) False Discovery Rate (FDR) controlling procedure, between five and seven markers displayed HW equilibrium within each of the tested E. aurinia populations.Three markers (Eau59, Eau64 and Eau73) displayed HW equilibrium in all three populations and four other markers (Eau21, Eau45, Eau71 and Eau81) were at HW equilibrium in two populations.Regarding the loci displaying a significant heterozygote deficiency, Micro-Checker analyses indicated the presence of null alleles in all cases.Moreover, loci Eau32, Eau52, Eau79 and Eau81 were not amplified in more than four individuals out of the 98 analyzed, which indicates homozygotes with null alleles.
Additionally, no significant LD was detected between any of the twelve loci after applying the FDR controlling procedure.

DISCUSSION
Although over the last decade an increasing number of microsatellite markers have been developed for Lepidoptera, with over 20 species for which there are at least 10 markers, it is still difficult to develop microsatellite markers for this insect order.Currently, high numbers of 1 Number of individuals (n = 4) used for biotin-enrichment and 454 pyrosequencing steps; 2 Number of individuals (n = 4) used to test the PCR amplification of the 96 primer pairs on agarose gel; 3 Number of individuals (n = 8) used for testing genotype profiles of the 23 primer pairs that passed the previous tests; 4 Individuals used for screening polymorphism, testing HW and LD equilibrium at the population level for the 12 loci that passed the previous genotyping tests.
- microsatellites (n > 20) are published only for some economically important species (e.g.Cydia pomonella: Franck et al., 2005;Zhou et al., 2005; Ostrinia nubilalis: Dopman et al., 2004;Coates et al., 2005;Dalecky et al., 2006;Kim et al., 2008;Bombyx mori: Prasad et al., 2005;Nagaraja et al., 2005) or for model species (e.g.Bicyclus anynana: Beldade et al., 2009).Furthermore, apart from the few studies where all the markers were found at Hardy-Weinberg equilibrium (MERPDC et al., 2010;Zhou et al., 2005;Endersby et al., 2005;Torres-Leguizamon et al., 2009), in almost all the other reports of novel microsatellites in Lepidoptera, heterozygote deficiencies and the presence of null alleles are reported.It appears that the major problem with microsatellite development in Lepidopera is the frequent presence of null alleles.It is also noteworthy that next generation pyrosequencing, through 454 GS-FLX Titanium technology, was recently used to develop microsatellite markers from an E. editha transcriptome library (Mikheyev et al., 2010).Nevertheless, the markers displayed HW disequilibrium in many of the populations screened for polymorphism and a non-negligible number of individuals could not be amplified.Most of these markers therefore are non-robust and have limited transferability.Furthermore, markers designed from transcriptome libraries might be subject to selection (by linkage) and could result in biased estimates of population differentiation.
To overcome these problems, a combination of a biotinenrichment protocol and 454 GS-FLX Titanium technology was used to develop robust microsatellite markers for E. aurinia.This method produces 5-10 Mbp genomic sequences more cheaply than the few hundred sequences of 2-700 bp produced by cloning-based methods (ca.1200 EUR for 454 GS-FLX vs. ca.4000 EUR).Thus it is possible to stringently select sequences for primer design and still leave a high number of potential markers that can be tested later, if necessary.In addition, in order to avoid microsatellites with null alleles, individuals from three E. aurinia sub-species collected from eight geographically and genetically distant populations were included in the microsatellite development protocol.In this way, it was expected that the number of markers that are transferable between sub-species and/or distant populations for population analyses, would be maximized.
Although five to seven of the microsatellites developed showed applicability within each population and subspecies tested, only three of the initial 96 loci (Eau59, Eau64 and Eau73) showed evidence of transferability among E. aurinia sub-species and populations (i.e.HW equilibrium in all the three populations tested and therefore no null alleles), and thus appear suitable for analyses of population genetics over the entire European range of the species.Admittedly, more robust microsatellites are needed for carrying out studies on population genetics on E. aurinia.Nevertheless, the combination of 454 pyrosequencing of a microsatellite enriched library and a sequential testing of E. aurinia sub-species and geographically distant populations during the microsatellite development protocol appears to be an effective way of isolating robust microsatellites for Lepidoptera, especially when time and cost are considered.We suggest this strategy should be systematically applied in a population and conservation genetics framework in order to validate microsatellites.markers it is especially important not to test only one population for HW equilibrium, since it is uninformative regarding the transferability of markers.
A high success rate of microsatellite isolation obviously is an important factor for microsatellite users.However, the low success rate reported here emphasizes earlier hypotheses that stress the complexity of genome organisation and evolution in Lepidoptera.The high frequency of null alleles suggest that mutation rate in the flanking regions of most of the microsatellite loci is unusually high in Lepidoptera.A better understanding of the dynamics shaping the genomes of this insect order depends on future genomic studies focusing on this problem.

TABLE 1 .
The samples of E. aurinia used to develop the microsatellites.

TABLE 2 .
Characteristics of the twelve polymorphic microsatellite loci in E. aurinia.Abbreviations: NA -number of alleles; Hoobserved heterozygosity; He -expected heterozygosity; N -number of individuals amplified; P -Probability value of the HW test; * -HW disequilibrium significant at 0.05 level after FDR controlling procedure.