Intronic sequences of the silkworm strains of Bombyx mori (Lepidoptera: Bombycidae): High variability and potential for strain identification

We sequenced nine introns of 25 silkworm (Bombyx mori L.) strains, assuming that introns are particularly prone to mutation. Among the 25 strains, the mean sequence divergence of each intronic sequence ranged from 0.81% (3.8 nucleotides) to 9.15% (85.2 nucleotides), and the maximum sequence divergence ranged from 1.2% (seven nucleotides) to 39.3% (366 nucleotides). Some introns are highly variable, suggesting the potential of using intronic sequences for strain identification. In particular, some introns were highly promising and convenient strain markers due to the presence of large indels (e.g., 403 bp and 329 bp) in only a limited number of strains. Phylogenetic analysis using the individual or the nine concatenated intronic sequences showed no clustering on the basis of known strain characteristics. This may further indicate the potential of the intronic sequences for the identification of silkworm strains.


INTRODUCTION
Due to their great economic value, more than 3000 genetically different silkworm (Bombyx mori) strains, some of which produce silk of different quality and yield, are maintained in Europe and Asia (Nagaraju, 2000). In the Republic of Korea, approximately 300 silkworm strains are maintained at the National Institute of Agricultural Science & Technology (NIAST), and some of these are also kept in other cocoon-producing countries, such as China, Japan and India. These strains are reared annually, and scores from indoor rearing are analyzed for consistent character maintenance.
Silkworm strains are described on the basis of several morphological and physiological characteristics such as origin, voltinism, number of moults or cocoon making. However, sorting one strain from another based on these characteristics is often difficult because of their high variability and environmental dependence. Furthermore, silkworm strains have been selected in order to maximize their commercial and regional suitability. Thus, compared to the diversity that exists within natural populations, the genetic diversity of silkworm strains is very much diminished. Additionally, the general genetic backgrounds of the strains are quite similar, even though some of the characteristics selected for commercial and regional purposes may differ. From a practical perspective, discriminating one strain from another is often necessary because silkworm larvae with similar external morphologies are often reared at the same place at the same time, and cross contamination between strains is possible. Once this occurs, the best procedure is to destroy the contaminated cultures, as it is impossible to guarantee the purity of the remaining larvae. These limitations have prompted some investigators to use molecular methods such as isozymes (Seong, 1997; Sohn et al., 2002), RAPD (random amplified polymorphic DNA; Hwang et al., 1995), RFLP (restriction fragment length polymorphism; Shi et al., 1995), and direct sequencing of mitochondrial DNA (mtDNA; Kim et al., 2000; Hwang et al., 1996, 1998) to identify strains. Most of these techniques resolved the origin-based evolutionary relationships among some silkworm strains, and the relationships between the domestic silkworm and the wild silkworm, B. mandarina, the presumed ancestor of the domestic silkworm, rather than discriminating between silkworm strains. Microsatellite DNA is an exception in this regard, in that some microsatellite DNAs reflected a certain character type in the silkworm strains (i.e. diapause vs. non-diapause) (Reddy et al., 1999).
It is suggested that introns are particularly prone to mutations (Serapion et al., 2004), possibly due to reduced selective pressure (Juszczuk-Kubiak et al., 2004; Ueda et al., 1984, 1985; Martinez et al., 2004). Thus, more variation may be revealed by intronic sequencing. Several genomic sequences of B. mori are available in GenBank. Several intron regions were therefore selected from GenBank and sequenced to determine the variability in the intronic sequences among the strains and to assess their potential for use in strain identification. Some intronic sequences showed substantial variation among the silkworm strains tested.

Silkworm strains
Silkworm strains chosen for the present study represent a diverse range of genetic stocks: different geographic origin, voltinism, moultinism, cocoon colour, and cocoon shape (Table 1).

Genomic DNA extraction
Genomic DNA was isolated from eggs of B. mori strains that are maintained at NIAST, Republic of Korea. In the case of field-collected wild silkworm, B. mandarina (Suwon City, Korea), genomic DNA was extracted from a larval specimen. Approximately 100 eggs or individual larvae were crushed in a glass grinder and genomic DNA was extracted using the Wizard Genomic DNA Purification Kit, in accordance with the manufacturer's instructions (Promega, Madison, WI, USA).

Intron selection
Of the silkworm genes for which complete genomic structures are available, 13 intron regions, approximately 500-700 bp in length, were selected from the GenBank database for laboratory study. These intron regions are described in Table 2. Primers were designed based on the sequence information of the flanking exons (Table 2).

PCR amplification, cloning and sequencing
The polymerase chain reactions were performed using a PCR mix (Bioneer, Seoul, Republic of Korea) with 10 pmol of each primer, approximately 100 ng of genomic DNA, and H2O up to a total volume of 20 µl. The following PCR protocol was used: 5 min at 94°C, followed by 40 cycles of 30 s at 94°C, 40 s at 50-60°C, and 45 s at 72°C, and a subsequent 7 min final extension at 72°C. The amplified PCR product was separated by electrophoresis in a 0.5% agarose gel (Sigma, St. Louis, MO, USA) with ethidium bromide. The amplicons were then cloned in pGEM-T Easy vector (Promega), and the resulting plasmid DNA was isolated using the Wizard Plus SV Minipreps DNA Purification System (Promega). Both strands of the PCR amplicons were cycle-sequenced using the ABI PRISM BigDye Terminator v1.1 Cycle Sequencing Kit and electrophoresed in each direction on an ABI PRISM 310 Genetic Analyzer (PE Applied Biosystems, Foster City, CA, USA). When necessary, an additional internal primer was designed to complete sequences by primer walking.

Sequence analysis and phylogenetic analyses
Each intronic sequence was aligned with the original sequence registered in GenBank using the CLUSTAL X program (Thompson et al., 1997). Sequence divergence and phylogenetic analyses were performed using PAUP* (Phylogenetic Analysis Using Parsimony *and Other Methods) ver. 4.0b10 (Swofford, 2002). For tree construction, the maximum-parsimony (MP) method (Fitch, 1971) was performed with the heuristic search. Branches were collapsed if the maximum branch length was zero. Trees were evaluated using the bootstrap test (Felsenstein, 1985) with 1,000 iterations. To root the tree, the homologous sequence of B. mandarina, which is assumed to be an ancestor of the domestic silkworm (Arunkumar et al., 2006), was utilized.

TABLE 1. General information on the silkworm strains utilized in this study.
M - multi-voltine strain; B - black; W - white; Br - brown; Y - yellow; LYG - light yellow green; F - flesh; LG - light green; LY - light yellow; C - cream; "-" - no rigid cocoon shape.

Nucleotide composition and variability of intronic sequences
Among 13 intron regions, nine provided stable DNA amplicons that could be successfully sequenced. The GenBank accession numbers for the 225 intronic sequences, composed of nine intron regions from 25 silkworm strains, are DQ833532-DQ833750 and DQ852325-DQ852330. The nucleotide composition and genetic variability of each intronic sequence are presented in Table 3. The nucleotide composition of the intron regions was somewhat biased toward adenine and thymine, ranging from 57% (A4 Intron 1) to 70.9% (LCP30 Intron 4). At each intron, the mean sequence divergence among the 25 silkworm strains ranged from 0.81% (3.8 bp) to 9.15% (85.2 bp) and the maximum sequence divergence ranged from 1.2% (7 bp) to 39.3% (366 bp) (Table 3). The nine concatenated intronic sequences, 5,897 bp in total, showed a mean sequence divergence of 4.65% and a maximum sequence divergence of 15.6% (Table 4). In comparison with some previous sequence-based studies of silkworm strains from which comparable divergence estimates can be drawn, the divergence estimate obtained in this study is substantial. For example, the maximum sequence divergence among 11 silkworm strains of different origin is only 0.2% in a 410 bp section of the cytochrome oxidase subunit I (COI) gene of mitochondrial DNA (mtDNA) (Kim et al., 2000). The sequence of a ~1000 nucleotide long single intron of the heavy-chain fibroin gene (H-fib) from five strains of B. mori and five geographic samples of B. mandarina revealed a maximum sequence divergence of only 0.26% (Martinez et al., 2004). Further, an initial sequence analysis of five silkworm strains using ~500 bp of a hypervariable A+T-rich region of mtDNA revealed identical sequences (data not shown). Taking these results into consideration, the degree of sequence divergence of the intron regions in this study is substantial. Specifically, PTTH Intron 3, LCP30 Intron 4, and VDP Intron 4 are highly polymorphic with a maximum of 43.5%, 33.2%, and 39.3% sequence divergence, respectively, including insertions and deletions (indels) (Table 3), suggesting that these intron regions may provide a means of strain discrimination.

* Locations are taken from the gene sequences in the original publications.
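The mean and maximum divergence figures reported above can be illustrated with a small calculation. The following Python sketch computes uncorrected pairwise divergence (the proportion of aligned sites that differ) and summarizes it over all strain pairs. It is only an illustration under stated assumptions: the toy alignment and the function names `p_distance` and `divergence_summary` are hypothetical, gaps are counted as differences here by assumption, and the estimates in this study were obtained with PAUP*, whose settings may differ.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where both sequences carry a gap; a gap opposite a
    # nucleotide is counted as a difference (an assumption of this sketch).
    compared = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not compared:
        raise ValueError("no comparable positions")
    differences = sum(1 for x, y in compared if x != y)
    return differences / len(compared)


def divergence_summary(alignment: dict[str, str]) -> tuple[float, float]:
    """Return (mean, maximum) pairwise divergence in percent."""
    distances = [
        p_distance(alignment[s], alignment[t]) * 100
        for s, t in combinations(sorted(alignment), 2)
    ]
    return sum(distances) / len(distances), max(distances)


# Hypothetical toy alignment, not data from this study.
toy = {
    "strain_A": "ATGCATGCAT",
    "strain_B": "ATGCATGCAT",
    "strain_C": "ATGAATG--T",
}
mean_pct, max_pct = divergence_summary(toy)
print(f"mean = {mean_pct:.1f}%, max = {max_pct:.1f}%")  # mean = 20.0%, max = 30.0%
```

Because a long indel contributes one difference per gapped site under this assumption, a single large deletion can dominate the maximum divergence of an intron.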
Although in theory there is the possibility of heterozygosity, intra-strain variation in the intronic sequences is expected to be minimal in such long-maintained silkworm strains, due to purifying selection. In fact, sequencing of five clones each from A4 Intron 1, A4 [...] ning of this study did not reveal a single mutation. This probably implies that intra-strain variation has been minimized by purifying selection. Nevertheless, an additional analysis of the remaining four intron regions is needed to confirm possible heterozygosity.

Large indels
Some intron regions in some silkworm strains have long indels (Table 5). There is a 178 bp deletion in LCP30 Intron 4 in three strains, numbers 69, 145, and 324 (Fig. 1). Additionally, there is a unique 430 bp insertion in PTTH Intron 3 in strain 33 and a 329 bp deletion in VDP Intron 4 in three strains, 148, 158, and 181 (data not shown). Except for a few runs of T nucleotides in the 178 bp region of LCP30 Intron 4, no repeating sequence was found in either the 430 bp region of PTTH Intron 3 or the 329 bp region of VDP Intron 4. These indels seem to be highly promising and convenient strain markers, which do not require direct sequencing for strain identification, but more clones and strains need to be tested. There are also several 3-5 bp indels shared by only a small number of strains.
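The idea of using a large indel as a marker can be sketched computationally. The Python example below scans an alignment for strains whose sequence contains a long run of gap characters, which is how a large deletion would appear after alignment. This is an illustration only: the function names, the toy sequences and the `min_len` threshold are hypothetical, and in practice such indels would be detected from the size of the PCR product on a gel, as in Fig. 1, rather than from an alignment.

```python
import re


def longest_gap_run(aligned_seq: str) -> int:
    """Length of the longest run of '-' characters in an aligned sequence."""
    runs = re.findall(r"-+", aligned_seq)
    return max((len(run) for run in runs), default=0)


def strains_with_large_deletion(alignment: dict[str, str], min_len: int) -> list[str]:
    """Names of strains whose longest gap run is at least min_len sites."""
    return sorted(
        name for name, seq in alignment.items()
        if longest_gap_run(seq) >= min_len
    )


# Hypothetical toy alignment, not data from this study.
toy = {
    "strain_1": "ACGT" + "-" * 12 + "TTGA",        # carries a 12-site deletion
    "strain_2": "ACGT" + "ACGTACGTACGT" + "TTGA",  # full-length sequence
    "strain_3": "ACG-" + "ACGTACGTACGT" + "TTGA",  # only a 1-site gap
}
carriers = strains_with_large_deletion(toy, min_len=10)
print(carriers)  # ['strain_1']
```

A length threshold separates large, gel-resolvable indels from the short 3-5 bp indels, which would still require sequencing to score.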

Analysis of relationships among strains
To test whether or not the intronic sequences reflect any known morphological characteristic or strain origin, a phylogenetic analysis of the 25 silkworm strains was performed using the individual or the nine concatenated intronic sequences. In this analysis, indel mutations were included in the construction of the phylogenetic tree, following the method suggested by Kawakita et al. (2003), in which long indels or short gaps are highly reliable sources of phylogenetic information, at least at lower taxonomic levels (i.e. among Bombus bumble bee species). In the trees obtained using individual intronic sequences, no group was formed based on any known strain characteristics, such as voltinism, moultinism, egg colour, blood colour, cocoon colour, or cocoon shape (data not shown). Furthermore, the tree obtained using the nine concatenated intronic sequences, comprising 5,897 bp including indels, led to a similar conclusion (Fig. 2). For example, although strains 148 and 158 have different origins (Korea vs. Europe), they formed a strong sister group with the highest bootstrap support (Fig. 2) and a relatively low sequence divergence of 2.2% (128 bp), within an overall range of 0.6% (33 bp) to 15.6% (923 bp) (Table 4). Furthermore, these two strains share an identical sequence in LCP30 Intron 3 (Table 6). However, the two strains share only brown egg colour and yellow blood colour (Fig. 3) and differ in voltinism, moultinism, cocoon colour, cocoon shape, larval morphology and origin (Table 1). Similar examples can be found at many other nodes of the tree. Thus, it is suggested that these intronic sequences do not reflect any known strain characteristics. Instead, the intronic sequences appear to be the product of neutral evolution with respect to gene expression. In fact, previous investigations of the 5' end region of the H-fib intron of B. mori and B. mandarina show no diagnostic difference between the two species. This was explained by the neutrality of the intronic variations with respect to H-fib expression (Ueda et al., 1985; Kusuda et al., 1986). These results suggest that the intronic sequences obtained in this study may have better resolving power for the identification of silkworm strains than for the long-term evolutionary relationships among strains.

Identical sequences among strains
Sequence analysis of the nine intron regions provided varying numbers of identical sequences among the 25 silkworm strains, except for FLC Intron 3 (Table 6), for which substantially high mean and maximum sequence divergences of 5.99% (42.6 bp) and 12.2% (87 bp), respectively, were obtained (Table 3). Some intronic sequences, including LCP30 Intron 3 (identical sequences in 13 strains) and SP1 Intron 3 (identical sequences in 12 strains), were identical among many strains, but other intronic sequences such as PTTH Intron 3 (identical sequences in three strains), LCP30 Intron 4 (identical sequences in two strains in each of three sequence types), and VDP Intron 4 (identical sequences in three strains) were identical in only a few strains. Thus, by using a single intronic sequence such as FLC Intron 3 alone, by concatenating some intronic sequences, or by removing highly conserved sequences (i.e. LCP30 Intron 3 and SP1 Intron 3), it is currently possible to discriminate the 25 strains tested in this study. Further, the strains sharing identical sequences in PTTH Intron 3, LCP30 Intron 4 and VDP Intron 4 are very different. Collectively, concatenation of these intronic sequences can also distinguish all 25 strains tested in this study. Thus, these intronic sequences may be suitable for strain identification after more clones and strains are tested.

Fig. 2. The result of maximum parsimony analysis of 25 silkworm strains using the nine concatenated intronic sequences that comprise 5,897 bp including indels. The tree was obtained using the heuristic search incorporated in PAUP (Swofford, 2002). This single most-parsimonious tree (tree length = 2930, consistency index = 0.658, retention index = 0.703, homoplasy index = 0.342) was obtained from an unweighted parsimony analysis. Numbers at each node indicate the number of times the node was supported in an analysis of 1000 bootstrap replicates. The outgroup selected was the wild silkworm, B. mandarina, which is presumed to be an ancestral species of the domestic silkworm, B. mori.
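The discrimination argument in this section, that introns which each leave some strains identical can still separate all strains when combined, can be sketched as follows. The Python example groups strains by identical sequence for each intron and then checks whether the combination of introns gives every strain a unique profile. The strain names, sequences and function names are hypothetical and serve only to illustrate the logic; they are not data or procedures from this study.

```python
from collections import defaultdict


def sequence_types(seqs: dict) -> list[list[str]]:
    """Group strain names that share an identical sequence (or profile)."""
    groups = defaultdict(list)
    for strain, seq in seqs.items():
        groups[seq].append(strain)
    return sorted(sorted(group) for group in groups.values())


def combine(introns: list[dict[str, str]]) -> dict[str, tuple[str, ...]]:
    """Build a per-strain profile holding its sequence at every intron."""
    strains = introns[0].keys()
    return {s: tuple(intron[s] for intron in introns) for s in strains}


def all_distinct(seqs: dict) -> bool:
    """True if no two strains share the same sequence (or profile)."""
    return len(set(seqs.values())) == len(seqs)


# Hypothetical toy data: each intron alone leaves two strains identical.
intron_x = {"s1": "AAGT", "s2": "AAGT", "s3": "AACT"}
intron_y = {"s1": "GGCA", "s2": "GGTA", "s3": "GGTA"}

print(sequence_types(intron_x))  # [['s1', 's2'], ['s3']]
print(sequence_types(intron_y))  # [['s1'], ['s2', 's3']]

combined = combine([intron_x, intron_y])
print(all_distinct(combined))  # True
```

Profiles are kept as tuples rather than joined strings so that sequences of different lengths at different introns cannot collide after joining.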
More than 3000 silkworm strains are currently kept around the world, including those in Korea. These are always exposed to accidental contamination, but there is

Fig. 1. An example of a large deletion found in the intronic sequence of LCP30 Intron 4: (A) a partial sequence alignment of strains 69, 324 and 145, showing a 187 bp deletion, and of strains 148 and 296, which do not have such a deletion; and (B) PCR products of strains 69, 324 and 145, showing a 187 bp deletion, and of strains 148 and 296, which do not have such a deletion. M, molecular size marker.

TABLE 2. Summary of the silkworm intron regions utilized in this study. Mean nucleotide composition (%) was obtained by averaging the nucleotide composition of each intronic sequence in 25 silkworm strains. Sub. - substitution. S. D. - standard deviation.

TABLE 3. Summary of sequence composition and genetic variability in each intron region among 25 silkworm strains.