Development of novel microsatellite markers for a specialist species of Lepidoptera , Boloria aquilonaris ( Nymphalidae ) , based on 454 sequences

Microsatellites are the most common markers used in population and conservation genetic studies. However, their isolation is laborious and expensive. In some taxa, such as Lepidoptera, it is particularly difficult to isolate microsatellite markers due to the high similarity of the flanking regions of different loci and the presence of null alleles. Here we isolated microsatellites of the endangered butterfly Boloria aquilonaris using 454 GS-FLX Titanium pyro-sequences of biotin enriched DNA libraries and tested the success of cross-amplification on the sister-species B. eunomia. Fifteen polymorphic microsatellite loci were isolated in B. aquilonaris using initially 101 stringently designed primer pairs. Unlike in many other studies of microsatellite isolation in Lepidoptera, few null alleles were detected and only at very low frequencies. Additionally, the raw data set can still be used for the isolation of other microsatellite loci. None of the selected polymorphic loci for B. aquilonaris gave clear banding patterns for B. eunomia, although about 15 other loci gave promising banding patterns for the latter species. Low intraand inter-specific transferability of developed markers in this study also lends support to the hypothesis that the evolution of the genome of Lepidoptera is dissimilar from that of other organisms. 129 * Corresponding and current address: Laboratory of Molecular Virology, Institut de Biologie et Médecine Moléculaires (IBMM), Université Libre de Bruxelles (ULB), 6041 Gosselies, Belgium; e-mail: svandewoestijne@gmail.com tion genetic studies on this species have increased our understanding of effective connectivity between populations (Vandewoestijne & Baguette, 2002, 2004). These studies were based on both allozyme and RAPD (Random Amplified Polymorphic DNA; Williams et al., 1990) markers. At the landscape scale, both markers confirmed CMR results of high connectivity, and hence low population differentiation. However, incongruences occur at the regional scale, likely due to the very low number of polymorphic allozyme loci and differences in the mutation rate between the two markers (Vandewoestijne & Baguette, 2002). The population structure obtained using the RAPD data set provide a coherent picture of the geographic configuration of the populations within the landscape and the demographic data (Vandewoestijne & Baguette, 2002). However, although dominant markers, such as RAPD and AFLP’s have demonstrated their utility (Meudt & Clarke, 2007), even in population assignment tests (Campbell et al., 2003), they remain of limited use in conservation studies where genotypic data is necessary to guide management practices in relation to inbreeding and hence, where it is necessary to distinguish heterozygous from homozygous individuals. More importantly and especially in the case of endangered species, non-lethal sampling is plausible using microsatellites as their amplification does not necessitate large quantities of DNA (e.g. a small wing or leg fragment suffices). Here we isolate microsatellites in B. aquilonaris using 454 GS-FLX Titanium pyrosequencing of biotin enriched DNA libraries. We also tested success of cross-amplification of the developed microsatellites on the sister-species B. eunomia. MATERIAL AND METHODS Samples were collected from eight populations in the Belgian Ardenne in 1996, including 4 different uplands. Populations were separated by up to 62 km and were significantly differentiated based on RAPD data (Fst = 0.179, (Vandewoestijne & Baguette, 2002). Genomic DNA was extracted using a phenolchloroform-isoamyl alcohol method as described in Vandewoestijne & Baguette (2002), followed by RNase treatment. Microsatellite isolation was carried out by Genoscreen (Lille, France) by combining biotin-enrichment protocol and highthroughput pyrosequencing using the procedure described in Malausa et al. (2011). This protocol prevents the introduction of major bias in terms of the proportions and distribution of microsatellites (Martin et al., 2010) and is useful and efficient in other taxa (e.g. Dubut et al., 2010). Briefly, microsatellites were isolated following a biotin-enrichment protocol adapted from Kijas et al. (1994). Extracted genomic DNA, digested with RsaI (Fermentas), was ligated to standard oligonucleotide adapters. Eight biotinylated oligonucleotide probes were simultaneously hybridized to the ligated DNA for 10 min after initial denaturation following a protocol adapted from Kijas et al. (1994). The enrichment step was accomplished using Dynabeads (Invitrogen) following the standard procedure. The resulting enriched DNA was amplified using standard adapters and the QIAquick PCR Purification Kit (QIAGEN) was used to immediately purify the amplicons. The purified enriched library was used for the GS-FLX 454 Titanium (Roche Applied Science) library preparation following the manufacturer’s protocols. Emulsion PCR was carried out at a ratio of 1 copy per bead and subsequently disrupted using isopropanol. Beads containing amplified DNA fragments were enriched and sequenced with 200 cycles. Sample preparation and analytical processing, such as base calling, were performed using the manufacturer’s Titanium series protocol. The QDD program (Meglecz et al., 2010) was used to select sequences for primer design following the proposed criteria. Enrichment adaptors were removed from sequences longer than 80 base pairs (bp) and microsatellites containing at least seven repetitions of any motif of 2–6 bp were selected for further analyses. Sequence similarities were detected through an “all against all” BLAST (Altschul et al., 1997; e-value 1E-40) in which microsatellite motifs were soft masked. Sequences for which pair wise similarity along the flanking regions was over 95% were grouped into contigs and a 2/3 majority rule consensus sequence was created from each contig. Sequences that had significant BLAST hit to other sequences with an overall similarity among the flanking region below 95% were discarded to avoid potential intra-genomic multi-copy sequences. All of the unique and consensus sequences were checked for the presence of short repetitions in the flanking regions. PCR primers were designed using QDD and the following criteria: (i) target microsatellites have at least seven repetitions, (ii) length of PCR product between 100 and 300 bp, (iii) flanking regions did not contain a mononucleotide stretch of more than four bases and any di-hexa pair motifs of more than two repetitions, (iv) annealing temperatures of primer pairs were optimized to fall between 57°C and 63°C, and (v) microsatellites are not compound or interrupted. 101 sequences with target microsatellites were retained. All retained primer pairs were amplified separately for four B. aquilonaris and three B. eunomia individuals (Table 1, step 2) in a total volume of 10 μL monoplex PCR, with 10 ng of DNA, and final concentrations of 2.5 mM MgCl2, 0.1 mM of each 130 1 50°13 ́N,5°47 ́E Pisserotte 1 50°13 ́N,5°54 ́E Bovigny 1 50°10 ́N,5°47 ́E Sommerain Plateau des Tailles B. eunomia 14 1 1 49°57 ́N,5°19 ́E Libin Plateau de Recogne 1 50°01 ́N,5°25 ́E Plaine Hé Plateau St. Hubert 2 2 1 50°16 ́N,5°44 ́E Crépale Plateau des Tailles 21 1 1 1 50°15 ́N,5°52 ́E Mirenne Plateau des Tailles 19 1 1 1 50°14 ́N,5°46 ́E Grand Fange Plateau des Tailles 1 50°19 ́N,6°41 ́E Eichenbusch Plateau des Hautes Fagnes B. aquilonaris 15 primer pairs 17 primer pairs 101 primer pairs Coordinates Locality Upland Species Step 4 Population genetic analyses Step 3 Labelled primers screening Step 2 Non-labelled primers screening Step 1 454 sequencing Sampled sites Sampled individuals in each of fourstep of microsatellite isolation process TABLE 1. Boloria aquilonaris and B. eunomia sampling scheme used in the isolation of microstallites. Geographic and population origin and number of individuals used in each of the four steps in the isolation of microsatellites. Please refer to text for more information on the methods used in the different steps.


INTRODUCTION
Microsatellites are co-dominant, hyper variable, neutral and reproducible molecular markers (Jarne & Lagoda, 1996).For these reasons, they have become the most widely used molecular markers in population genetic and conservation studies.However, developing these markers through microsatellite enrichment followed by cloning and Sanger sequencing is time consuming and costly.Additionally, genotyping errors occur due to stutter products (Shinde et al., 2003), allele drop out (Miller & Waits, 2003), short allele dominance (Wattier et al., 1998) and the presence of null alleles (due to mutations or deletions in the flanking region, preventing primers from binding and therefore preventing amplification of the corresponding allele, Pemberton et al., 1995).These genotyping errors can cause deviations from Hardy-Weinberg proportions, in particular heterozygote deficiency and hence bias results and subsequent conservation and management decisions.Genotyping errors and difficulties in isolating microsatellite markers are especially predominant in butterflies although Lepidoptera are often considered as an ideal model for conservation biology studies because (i) they respond rapidly to both habitat change and global warming (Warren et al., 2001) and (ii) they are good indicators of habitat quality (New, 1997).The difficulty of isolating microsatellites in Lepidoptera are associated with high similarity in flanking regions between different microsatellites within a species (Meglecz et al., 2004) and / or the lack of conserved flanking regions leading to unrepeatable banding patterns (Sarhan, 2006).Hence, developing reliable microsatellite markers within this insect order is a necessary priority.Next-generation sequencing, using 454 GS-FLX technology (Roche Applied Science), offers the opportunity for more efficient microsatellite isolation (Malausa et al., 2011;Abdelkrim et al., 2009) than was previouslypossible.
Here we applied this new technology to develop microsatellite markers for a butterfly species of major conservation concern (listed as vulnerable in the Red Data Book on European Butterflies).The cranberry fritillary Boloria aquilonaris is a glacial relict located in wet acid peat bogs where their principal larval host plant, Vaccinium oxycoccos, is abundant.While the species is more or less continuously distributed towards the north of its range, western European B. aquilonaris populations are fragmented.Indeed, because of the high water requirements of bogs, large areas of habitats are restricted to uplands where climatic conditions are cold and wet.Over the past century, drainage and afforestation of peat bogs with Norway Spruce Picea abies has not only resulted in the substantial decline of the species [104 recorded populations before 1970, only 25 in 1991; (Baguette et al., 1992)], but has also increased the isolation of the remnant populations.Intensive demographic and ecological work has been carried out on the Cranberry fritillary.Capture-mark-recapture studies in the 1990's and again in the last few years have revealed a relatively high numbers of individuals moving between closely situated populations (Mousson et al., 1999) but also some very long distance dispersal movements (Baguette, 2003).The ecological requirements have also been studied in detail, providing very specific information on the habitat requirements of this species, and its consequences for population size (Turlure et al., 2010a) and significance for conservation under climate change (Turlure et al., 2010b).Popula-tion genetic studies on this species have increased our understanding of effective connectivity between populations (Vandewoestijne & Baguette, 2002Baguette, , 2004)).These studies were based on both allozyme and RAPD (Random Amplified Polymorphic DNA; Williams et al., 1990) markers.At the landscape scale, both markers confirmed CMR results of high connectivity, and hence low population differentiation.However, incongruences occur at the regional scale, likely due to the very low number of polymorphic allozyme loci and differences in the mutation rate between the two markers (Vandewoestijne & Baguette, 2002).The population structure obtained using the RAPD data set provide a coherent picture of the geographic configuration of the populations within the landscape and the demographic data (Vandewoestijne & Baguette, 2002).However, although dominant markers, such as RAPD and AFLP's have demonstrated their utility (Meudt & Clarke, 2007), even in population assignment tests (Campbell et al., 2003), they remain of limited use in conservation studies where genotypic data is necessary to guide management practices in relation to inbreeding and hence, where it is necessary to distinguish heterozygous from homozygous individuals.More importantly and especially in the case of endangered species, non-lethal sampling is plausible using microsatellites as their amplification does not necessitate large quantities of DNA (e.g. a small wing or leg fragment suffices).
Here we isolate microsatellites in B. aquilonaris using 454 GS-FLX Titanium pyrosequencing of biotin enriched DNA libraries.We also tested success of cross-amplification of the developed microsatellites on the sister-species B. eunomia.

MATERIAL AND METHODS
Samples were collected from eight populations in the Belgian Ardenne in 1996, including 4 different uplands.Populations were separated by up to 62 km and were significantly differentiated based on RAPD data (Fst = 0.179, (Vandewoestijne & Baguette, 2002).Genomic DNA was extracted using a phenolchloroform-isoamyl alcohol method as described in Vandewoestijne & Baguette (2002), followed by RNase treatment.
Microsatellite isolation was carried out by Genoscreen (Lille, France) by combining biotin-enrichment protocol and highthroughput pyrosequencing using the procedure described in Malausa et al. (2011).This protocol prevents the introduction of major bias in terms of the proportions and distribution of microsatellites (Martin et al., 2010) and is useful and efficient in other taxa (e.g.Dubut et al., 2010).Briefly, microsatellites were isolated following a biotin-enrichment protocol adapted from Kijas et al. (1994).Extracted genomic DNA, digested with RsaI (Fer-mentas), was ligated to standard oligonucleotide adapters.Eight biotinylated oligonucleotide probes were simultaneously hybridized to the ligated DNA for 10 min after initial denaturation following a protocol adapted from Kijas et al. (1994).The enrichment step was accomplished using Dynabeads (Invitrogen) following the standard procedure.The resulting enriched DNA was amplified using standard adapters and the QIAquick PCR Purification Kit (QIAGEN) was used to immediately purify the amplicons.The purified enriched library was used for the GS-FLX 454 Titanium (Roche Applied Science) library preparation following the manufacturer's protocols.Emulsion PCR was carried out at a ratio of 1 copy per bead and subsequently disrupted using isopropanol.Beads containing amplified DNA fragments were enriched and sequenced with 200 cycles.Sample preparation and analytical processing, such as base calling, were performed using the manufacturer's Titanium series protocol.
The QDD program (Meglecz et al., 2010) was used to select sequences for primer design following the proposed criteria.Enrichment adaptors were removed from sequences longer than 80 base pairs (bp) and microsatellites containing at least seven repetitions of any motif of 2-6 bp were selected for further analyses.Sequence similarities were detected through an "all against all" BLAST (Altschul et al., 1997;e-value 1E-40) in which microsatellite motifs were soft masked.Sequences for which pair wise similarity along the flanking regions was over 95% were grouped into contigs and a 2/3 majority rule consensus sequence was created from each contig.Sequences that had significant BLAST hit to other sequences with an overall similarity among the flanking region below 95% were discarded to avoid potential intra-genomic multi-copy sequences.All of the unique and consensus sequences were checked for the presence of short repetitions in the flanking regions.PCR primers were designed using QDD and the following criteria: (i) target microsatellites have at least seven repetitions, (ii) length of PCR product between 100 and 300 bp, (iii) flanking regions did not contain a mononucleotide stretch of more than four bases and any di-hexa pair motifs of more than two repetitions, (iv) annealing temperatures of primer pairs were optimized to fall between 57°C and 63°C, and (v) microsatellites are not compound or interrupted.101 sequences with target microsatellites were retained.
All retained primer pairs were amplified separately for four B. aquilonaris and three B. eunomia individuals (Table 1 B. eunomia sampling scheme used in the isolation of microstallites.Geographic and population origin and number of individuals used in each of the four steps in the isolation of microsatellites.Please refer to text for more information on the methods used in the different steps.dNTP, 0.5 µM of each primer and 0.6 U Taq.Samples used in these preliminary tests were selected from both non-and highlydivergent populations based on previous results (Vandewoestijne & Baguette, 2002).Thermocycling was performed on a Veriti Thermal Cycler (Applied Biosystems) with the following protocol: 95°C for 4 min, followed by 35 cycles (94°C for 1 min, 56°C for 1 min, 72°C for 1 min), and 60°C for 45 min.In order to select only those primer pairs that gave discrete single bands (or at most two when difference in allele size was large) the PCR products were run on 1% agarose gels.Seventeen primer pairs gave clean PCR products and indicated interpopulation polymorphism.
Multiplex mix reactions were tested on the 17 selected markers (Table 1, step 3).AutoDimer (Vallone & Butler, 2004) was used to determine primer composition within a mix to avoid primer-dimer and hairpin interactions.Loci were amplified using forward primers labelled with the fluorescent dyes 6-FAM, 6-TAMN and HEX (IDTdna).The amplified fragments were visualized using an ABI PRISM 3100 Genetic Analyser (Applied Biosystems).Allele sizes were scored against the internal standard ROX-400SD (Applied Biosystems) using GeneMapper 3.5® (Applied Biosystems).Multiplex amplifications were conducted using QIAGEN Multiplex PCR Kit (QIA-GEN) on Veriti Thermal Cycler (Applied Biosystems) with the following protocol: 95°C for 15 min, followed by 32 cycles (94°C for 30 s, 56°C for 90 s, 72°C for 90 s), and 72°C for 30 min, terminating at 25°C.
Two markers were dropped from the subsequent analyses because of non-amplification in multiplex reactions or lack of polymorphism.Hence, three populations ( characterized by 15 markers (Table 2).For each marker, genotypes were scored automatically using GeneMapper 3.7 (Applied Biosystems) and manually verified and corrected in case of automatic scoring errors.
Costs were limited because the B. aquilonaris microsatellite library occupied only 1/50 of a 454 GS-FLX run and initial primer selection was carried out with non-labelled primers on agarose gels to eliminate primers producing stutter patterns and only retain polymorphic markers.
Hardy-Weinberg (HW) equilibrium was tested for all primer -population combinations and each population.Probability values were estimated using the Markov chain method implemented in GENEPOP 4.0.10 (Rousset, 2008).Linkage disequilibrium was tested for all locus pair -population combinations in GENEPOP using the log likelihood ratio statistic.Significance levels were corrected for false positives (i.e.false discovery rate) following the procedure of Benjamini & Hochberg (1995).In the case of departure from Hardy-Weinberg equilibrium, the data was analyzed with Micro-Checker ver.2.2.0.3 (van Oosterhout et al., 2004).FREENA (Chapuis & Estoup, 2007) was used to calculate null allele frequency whenever detected.

RESULTS
We obtained 22,723 raw sequences of at least 80 bp of which 7,298 were (i) unique (no BLAST hit to other reads) or consensus sequences and (ii) contained a microsatellite of at least four repetitions.For the 3565 unique and consensus sequences with a flanking region of at least 30 bp without homopolymers and/or short repetitions, primer pairs could be designed for 790 loci.In order to maximize polymorphism of target microsatellites, sequences with microsatellites of only 4, 5 and 6 repetitions were discarded and 101 sequences with microsatellites of at least 7 repetitions were retained.
Of the 101 loci amplified, 36 loci showed clear bands and amplified for all four B. aquilonaris individuals.Of these, 17 primers were chosen because inter-individual variation was detected on agarose gel.Cross-species amplification was successful in B. eunomia for 15 of the 101 loci.However, all the loci with visible polymorphism in B. eunomia were different from the 17 selected loci for B. aquilonaris.Hence, microsatellite analyses in B. eunomia were limited to steps 1 (454 sequencing) and 2 (non-labelled primer screening).
For the 15 loci that were tested in B. aquilonaris, the total number of alleles ranged from 2 to 8 and expected heterozygosity ranged from 0.071 to 0.821 (Table 2).
After controlling for false discovery rate (Benjamini & Hochberg, 1995), no primer pair -population combination was in linkage disequilibrium.Primer -population pairs Baq-41 -Mirenne, Baq-41 -Libin and Baq-53 -Libin were the only primer -population combinations that were significantly in HW disequilibrium after applying the controlling false discovery rate procedure (Table 2).
Microchecker analyses indicated the presence of null alleles at loci Baq-38 and Baq-54 in Grande Fange and Baq-44 and Baq-77 in Mirenne.However, it should be stressed that no population -primer pair had null allele frequencies of more than 0.001.Simulations (Dakin & Avise, 2004) showed that the bias induced by null alleles is negligible at frequencies below 0.2 in parentage analyses.

DISCUSSION
Combining a biotin-enrichment protocol and 454 GS-FLX Titanium technology enabled us to isolate 15 polymorphic microsatellite loci in B. aquilonaris.Importantly, few null alleles were detected and only at a very low frequency.Hence, only three out of the 45 population -primer pairs deviated significantly from HW equilibrium contrary to many other reports on microsatellite isolation within Lepidoptera (e.g.Harper et al., 2000;Habel et al., 2008;Mikheyev et al., 2010).Indeed, HW disequilibrium resulting from null alleles can seriously limit the utility of microsatellites by creating substantial biases (Chapuis & Estoup, 2007).Additionally, the large number of genomic sequences produced using the techniques employed in this study leaves the possibility open for further microsatellite marker design if necessary.The technique presented here also appears to have a higher success rate than isolation of microsatellites through transcriptome pyrosequencing (Mikheyev et al., 2010), where many markers displayed HW disequilibrium and nonamplification of a certain number of individuals.This may be due to a higher ratio of loci within non-coding versus coding regions and hence possibly loci under selection when transcriptomic data are used.Differences may also be the result of different selection procedures of loci at various stages of the microsatellite development process.
Microsatellite polymorphism was high considering the spatial scale of the samples analyzed.Hence, these markers open the way for detailed population genetic analyses, including inbreeding signals, effective population size and dispersal estimates of a species that needs to be conserved.Genetic results will be confronted and complemented with demographic and ecological observations in order to provide the most viable conservation management guidelines possible.
Populations tested in this study were chosen because previous genetic marker data sets indicated high genetic differentiation (between Grande Fange and Libin, and Mirenne and Libin, respectively: Vandewoestijne & Baguette, 2002).Baq-44 and Baq-30 had high amplification failure rates in the highly differentiated population of Libin; even though DNA from Libin was included in the initial step (Table 1, step 1) of this microsatellite isolation protocol.Due to difficulty with amplification of markers Baq-30 and Baq-44 in the most differentiated population, their application is best limited to local scale studies of B. aquilonaris.
Although null allele frequencies were below 0.001, a small percentage of individuals did not amplify for certain loci in multiplex reactions.A subset of these individuals amplified successfully for the same loci in monoplex reactions.Further tests are necessary to determine the origin of this discrepancy.Sinama et al. (2011), applying the same protocol, found three loci that were transferable between highly differentiated populations (between sub-species and populations of Euphydryas aurinia).Although samples used for marker development in this study were from a smaller geographic scale they originated from genetically very distinct (Fst = 0.179 based on RAPD data; Vandewoestijne & Baguette, 2002), albeit less differentiated populations than those of E. aurinia.Together these results underline the fact that intra-and inter-specific transferability of developed markers remains difficult within the Lepidoptera.In this study, cross-amplification between sister species was possible for only 15% of the loci tested.However, none of the selected polymorphic loci for B. aquilonaris gave clear banding patterns for B. eunomia.Previously, we have also attempted to adapt primers developed for Lysandra bellargus (Harper et al., 2000) to the sister species P. coridon without success (Vandewoestijne, unpubl. results), obtaining either multiple bands (likely due to the multiplication of microsatellite containing regions) or absence of amplification (probably due to large inter-specific difference in flanking regions; Meglecz et al., 2004).Low intraand inter-specific transferability of developed markers in this study also lends support to hypotheses that the evolution of the genome in Lepidoptera is dissimilar from other organisms, resulting in the low success rate of microsatellite isolation (Neve & Meglecz, 2000).Nevertheless, there are some exceptions: Chen & Dorn (2010) successfully cross-amplified microsatellites developed for Cydia pomonella (Lepidoptera: Tortricidae) to three other species of the same tribe.
In conclusion, 15 robust and polymorphic microsatellite loci were developed in the butterfly B. aquilonaris using a combination of biotin-enrichment and pyro-sequencing.This method was much faster and less costly than the more traditional methods, which involve Sanger sequencing.Additionally, the raw data set can still be exploited for further isolation of microsatellite loci.

TABLE 2 .
Characteristics of the primers used for large-scale genotyping.NA -number of alleles; Ho -observed heterozygosity; He -unbiased expected heterozygosity; N -number of amplified samples; P -probability value of HW test; values in bold indicate significant HW disequilibrium (P < 0.05) after controlling for false rate discovery.