Characterization of the complete mitochondrial genome of Spilarctia robusta (Lepidoptera: Noctuoidea: Erebidae) and its phylogenetic implications

The complete mitochondrial genome (mitogenome) of Spilarctia robusta (Lepidoptera: Noctuoidea: Erebidae) was sequenced and analyzed. The circular mitogenome is 15,447 base pairs (bp) long. It contains a set of 37 genes, with the gene complement and order similar to that of other lepidopterans. Twelve of the 13 protein-coding genes (PCGs) have a typical mitochondrial start codon (ATN), whereas the cytochrome c oxidase subunit 1 (cox1) gene unusually uses CGA, as documented for other lepidopteran mitogenomes. Four of the 13 PCGs have incomplete termination codons: cox1, nad4 and nad6 end with a single T, and cox2 with TA. Apart from the A+T-rich region, the mitogenome contains six major intergenic spacers of at least 10 bp. The nucleotide composition of the genome is strongly A+T biased (81.09%), with a negative AT skewness (–0.007), indicating the presence of fewer As than Ts, similar to other Noctuoidea. The A+T-rich region is 343 bp long and contains some conserved regions, including an "ATAGA" motif followed by a 19 bp poly-T stretch, a microsatellite-like (AT)9 and a poly-A element, characteristics shared with other lepidopteran mitogenomes. Phylogenetic analysis based on the 13 PCGs using the Maximum Likelihood method revealed that S. robusta belongs to the superfamily Noctuoidea.

* Corresponding authors; e-mails: zhubaojian@ahau.edu.cn (B.-J. Zhu), clliu@ahau.edu.cn (C.-L. Liu).

Eur. J. Entomol. 113: 558–570, 2016 doi: 10.14411/eje.2016.076


INTRODUCTION
The insect mitochondrial genome is a circular molecule, ranging in size from 15 to 19 kb (Jiang et al., 2009). It contains a set of 37 genes that are typically similar in all insects sequenced to date. On the basis of their physiological functions, they are divided into 13 protein-coding genes (two ATPase genes [atp6 and atp8], seven NADH dehydrogenase genes [nad1-nad6 and nad4L], one cytochrome b gene [cob] and three cytochrome c oxidase genes [cox1-cox3]), 22 transfer RNA genes and two ribosomal RNA genes (rrnL and rrnS) (Shadel & Clayton, 1993; Cameron, 2014). In addition, it has a control region of variable length (A+T-rich region) (Wolstenholme, 1992). The mitochondrial DNA (mtDNA) is extremely conserved and is maternally inherited. Moreover, it is non-recombinant and undergoes reductive evolution. Therefore, the study of the mitogenome is considered to be important for understanding molecular evolution, comparative and evolutionary genomics, phylogenetics and

[...] 2014). Therefore, this taxonomic group has attracted the attention of researchers all over the world, and their main focus is to establish relationships within the group as well as with other taxonomic categories.
Spilarctia robusta (Lepidoptera: Noctuoidea: Erebidae: Arctiinae) is a major arthropod pest of trees. This species largely infests forests, but also damages roadside and garden trees in urban areas. The economic losses are increasing at an alarming rate (Liao et al., 2010). Therefore, considering its economic importance and the lack of information on its evolutionary relationships, we designed the present study to sequence and annotate the complete mitogenome of S. robusta. Moreover, we compared it with other sequenced Lepidoptera in order to highlight their evolution, particularly the phylogenetic relationships of Noctuoidea and Erebidae.

Experimental insects and DNA extraction
Spilarctia robusta (moth) specimens were collected from Anhui Agricultural University (AHAU), Anhui Province, China. These specimens were identified as S. robusta by a taxonomist (Department of Entomology, AHAU). Total DNA was extracted using a Genomic DNA Extraction Kit (Aidlab Co., Beijing, China) according to the manufacturer's instructions.

PCR amplification, cloning and sequencing
We designed twelve pairs of primers from the conserved nucleotide sequences of known mitochondrial genomes of Lepidoptera to determine the sequence characteristics of the S. robusta mitogenome (Liu et al., 2013; Dai et al., 2015). The complete list of successful primers is given in Table 2 (Sangon Biotech Co., Shanghai, China). All amplifications were performed on an Eppendorf Mastercycler and Mastercycler gradient in 50 μL reaction volumes, including 35 μL sterilized distilled water, 5 μL 10× Taq buffer (Mg2+ plus), 4 μL dNTP (25 mM), 1.5 μL extracted DNA as a template, 2 μL each of forward and reverse primers (10 μM) and 0.5 μL (1 unit) Taq (Takara Co., Dalian, China). The PCR amplification conditions were as follows: an initial denaturation at 94°C for 4 min, followed by 38 cycles of 94°C for 30 s and 48-59°C for 1-3 min (depending on the putative length of the fragments), and a final extension at 72°C for 10 min. The PCR products were detected using electrophoresis in agarose gel (1%, w/v), purified using a DNA gel extraction kit (Transgen Co., Beijing, China) and sequenced with the PCR primers.

Sequence assembly and gene annotation
Sequence annotation was performed using the BLAST tools available on NCBI (http://blast.ncbi.nlm.nih.gov/Blast) and the SeqMan II program in the Lasergene software package (DNAStar Inc., Madison, USA). The protein-coding sequences were translated into putative proteins on the basis of the Invertebrate Mitochondrial Genetic Code. The skewness was measured using the method of Junqueira et al. (2004) and the base composition of nucleotide sequences was described as: AT skew = [A − T]/[A + T] and GC skew = [G − C]/[G + C]. The Relative Synonymous Codon Usage (RSCU) values were calculated using MEGA 5.0 (Tamura et al., 2011).
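The skew statistics used throughout the paper can be illustrated with a short Python sketch. It is not the authors' pipeline; the function names `skew` and `base_skews` are hypothetical, and the only inputs taken from the paper are the major-strand base percentages reported in the genome composition section (40.28% A, 40.81% T, 7.57% G, 11.34% C).

```python
from collections import Counter


def skew(x, y):
    """Generic skew of two base counts or percentages: (x - y) / (x + y)."""
    return (x - y) / (x + y)


def base_skews(seq):
    """Return A+T fraction, AT skew and GC skew for a DNA string.

    Only unambiguous A, C, G and T are counted (an assumption of this sketch).
    """
    n = Counter(seq.upper())
    a, t, g, c = n["A"], n["T"], n["G"], n["C"]
    return {
        "at_content": (a + t) / (a + t + g + c),
        "at_skew": skew(a, t),
        "gc_skew": skew(g, c),
    }


# Percentages reported for the major strand of the S. robusta mitogenome.
at_skew = skew(40.28, 40.81)
gc_skew = skew(7.57, 11.34)
print(round(at_skew, 3), round(gc_skew, 3))
```

With these percentages the sketch gives values that round to the AT skewness (-0.007) and GC skewness (-0.199) reported for the whole mitogenome. A negative AT skew means fewer As than Ts, and a negative GC skew fewer Gs than Cs.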
Transfer RNA genes were determined using tRNAscan-SE software (http://lowelab.ucsc.edu/tRNAscan-SE/) (Lowe & Eddy, 1997), or predicted from sequence features, namely the capability of folding into the typical cloverleaf secondary structure with a legitimate anticodon. The tandem repeats in the A+T-rich region were found using the Tandem Repeats Finder program (http://tandem.bu.edu/trf/trf.html) (Benson, 1999).

Phylogenetic analysis
To reconstruct the phylogenetic relationships of Lepidoptera, 38 complete or partially complete mitogenomes were downloaded from the GenBank database (Table 1). The mitogenomes of Drosophila melanogaster (U37541.1) (Lewis et al., 1995) and Locusta migratoria (NC_001712) (Flook et al., 1995) were used as outgroups. The amino acid sequences of each of the 13 mitochondrial PCGs were aligned with Clustal X using default settings and then concatenated (Thompson et al., 1997). The concatenated set of amino acid sequences from the 13 PCGs was used for phylogenetic analyses, which were performed using the Maximum Likelihood (ML) method in MEGA version 5.1 (Tamura et al., 2011). Support for the inferred trees was assessed with 1000 bootstrap replicates.
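The concatenation step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `concatenate_alignments` and the toy taxa are hypothetical, each per-gene alignment is assumed to be already aligned (all sequences of equal length), and a taxon missing a gene is padded with gap characters, a common convention for partially complete mitogenomes that the paper does not describe.

```python
def concatenate_alignments(alignments, taxa):
    """Join per-gene alignments into one supermatrix.

    alignments: list of dicts {taxon: aligned_sequence}, one dict per gene.
    taxa: ordered list of taxon names to include.
    Returns a dict {taxon: concatenated_sequence}.
    """
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: a missing gene is represented by gaps only.
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy amino acid alignments for two hypothetical genes and three taxa.
gene1 = {"taxonA": "MKL-", "taxonB": "MKLV", "taxonC": "MRLV"}
gene2 = {"taxonA": "FFI", "taxonC": "FLI"}  # taxonB lacks this gene
matrix = concatenate_alignments([gene1, gene2], ["taxonA", "taxonB", "taxonC"])
for name, seq in matrix.items():
    print(name, seq)
```

All rows of the resulting matrix have the same length, which is what tree-building programs expect as input.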

Genome structure, organization and composition
The complete mitogenome of S. robusta is a closed circular molecule, 15,447 bp long (Fig. 1). It is deposited in the NCBI GenBank database under accession number KX753670. It contains the entire set of 37 genes (13 PCGs [cox1-cox3, nad1-nad6, nad4L, cob, atp6 and atp8], 22 tRNA genes and two rRNAs [rrnS and rrnL]). In addition, there is one major non-coding A+T-rich region (Table 3). The gene arrangement and orientation is trnM-trnI-trnQ. The nucleotide composition of the major strand is 40.28% A, 40.81% T, 7.57% G and 11.34% C, with a total A+T content of 81.09% (Table 4). The AT skewness and GC skewness are -0.007 and -0.199, respectively.

Protein-coding genes and codon usage
Twelve of the 13 PCGs of S. robusta use ATN (ATT, ATG and ATA) as an initiation codon. ATG is the most frequent initiation codon, as cox2, cox3, atp6, nad4, nad4L, cytb and nad1 begin with it, whereas cox1 has a CGA start codon. Four of the 13 PCGs (cox1, cox2, nad6 and nad4) terminate with incomplete stop codons, either TA or T, and the remainder of the PCGs terminate with the canonical stop codon TAA. The cox1, nad4 and nad6 genes have a single T as a stop codon, while cox2 has TA. We analyzed the codon usage of ten lepidopteran species, of which five belong to Noctuoidea and one each to Geometroidea, Tortricoidea, Papilionoidea, Pyraloidea and Hesperioidea (Fig. 2). The analysis reveals that Asn, Ile, Leu2, Lys, Phe, Tyr and Met are the most frequently utilized amino acids; the hydrophobic Leu2 codon family is the most frequent and the Arg codon family the least frequent. The codon distribution of the five Noctuoidea species is consistent, and the content of each amino acid is similar in the different species (Fig. 3).
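The start and stop codon categories described above can be expressed as a small Python sketch. It is an illustration rather than the annotation procedure used in the study: the function names are hypothetical, the stop codons TAA and TAG are those of the invertebrate mitochondrial genetic code, and an incomplete stop is taken to be a trailing T or TA left over after the last full codon (such partial stops are generally thought to be completed to TAA by polyadenylation of the transcript).

```python
COMPLETE_STOPS = {"TAA", "TAG"}  # stop codons in the invertebrate mitochondrial code


def has_atn_start(cds):
    """True if the coding sequence begins with an ATN codon (ATA, ATC, ATG or ATT)."""
    codon = cds[:3].upper()
    return len(codon) == 3 and codon.startswith("AT") and codon[2] in "ACGT"


def classify_stop(cds):
    """Return 'TAA'/'TAG' for a complete stop, 'T' or 'TA' for an incomplete one,
    or None if the sequence does not end in a recognisable stop."""
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        return last if last in COMPLETE_STOPS else None
    tail = cds[-remainder:]
    return tail if tail in ("T", "TA") else None


# Toy sequences (not taken from the S. robusta annotation).
for toy in ("ATGAAATAA", "CGAAAAT", "ATGAAATA", "ATTAAAGG"):
    print(toy, has_atn_start(toy), classify_stop(toy))
```

In this scheme a cox1-like gene starting with CGA and ending with a single T is reported as a non-ATN start with an incomplete "T" stop.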
The Relative Synonymous Codon Usage (RSCU) in the six lepidopteran superfamilies with known mitogenomes reveals that S. robusta PCGs are relatively similar to those of L. melli, A. formosae, S. charonda kuriyamaensis and O. furnacalis, and different from the species, which lack GCG

Ribosomal and tRNA genes
As in other Lepidoptera, S. robusta has two rRNA genes. The rrnL gene (1421 bp) is located between tRNA Leu (CUN) and tRNA Val, and the rrnS gene (816 bp) between tRNA Val and the A+T-rich region (Table 3); the A+T content of the rRNA genes is 83.81%. This value is well within the range of 80.16% (B. mandarina) to 85.93% (P. atrilineata) recorded for Lepidoptera. Both the AT skewness (-0.014) and GC skewness (-0.362) are negative.
Of the 22 tRNA genes identified, fourteen are on the H-strand and eight on the L-strand. The length of the tRNA genes ranges from 62 bp to 71 bp, which is similar to that of most Lepidoptera sequenced. The tRNA genes are highly A+T biased (81.19%), and exhibit positive AT skewness (0.010) and negative GC skewness (-0.127) (Table 4). All the tRNAs fold into the expected cloverleaf secondary structure, with the exception of tRNA Ser(AGN) (Fig. 5), which forms an unusual secondary structure lacking a stable stem-loop structure in the DHU arm. There are a total of 11 mismatches in the S. robusta tRNA genes. Seven G-U wobble pairs are scattered over six tRNA genes (two in the acceptor stem, four in the DHU stem and one in the TψC stem); in addition, there is one A-A mismatch in the anticodon stem of trnS1 and three U-U mismatches in the acceptor stems of trnA, trnL2 and trnS1 (Fig. 5).

Overlapping and intergenic spacer regions
There are five overlapping regions, with a total of 24 bp, in the mitogenome of S. robusta. On the basis of their location, they are categorized into three types: tRNA and tRNA (trnW and trnC, trnK and trnD), tRNA and protein (nad2 and trnW, trnF and nad5) and protein and protein (atp8 and atp6). The length of these sequences varies from 1 bp to 8 bp. The largest overlapping region (8 bp) is located between trnW and trnC; the overlaps of 6 bp, 2 bp and 1 bp are located between trnF and nad5, nad2 and trnW, and trnK and trnD, respectively (Table 3). In addition, there is a 7 bp overlap located at the junction of atp8-atp6. Further, we recorded the intergenic nucleotides between atp8 and atp6 in ten species of Lepidoptera (Fig. 6).
The intergenic spacers in the mitogenome of S. robusta are spread over 18 regions and range in size from 1 bp to 37 bp, with a total length of 168 bp. Six of these are major intergenic spacers of at least 10 bp in length (Table 3). The largest intergenic spacer (37 bp) is present between trnQ and nad2 and has an extremely high A+T content. Further, we identified a 17 bp intergenic spacer between trnS2 (UCN) and nad1, which contains the "ATACTAA" motif (Fig. 7A).
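Overlaps and intergenic spacers of the kind summarized in this section follow directly from the annotated gene coordinates. The Python sketch below shows one way to derive them. The function name `junction_gaps` and the toy coordinates are hypothetical and are not taken from Table 3; coordinates are assumed to be 1-based, inclusive and sorted by start position, and the wrap-around junction of the circular molecule is ignored.

```python
def junction_gaps(genes):
    """Gap between each pair of adjacent genes.

    genes: list of (name, start, end) tuples, 1-based inclusive, sorted by start.
    Returns a list of (left_gene, right_gene, gap), where gap > 0 is an
    intergenic spacer, gap < 0 an overlap and gap == 0 direct adjacency.
    """
    gaps = []
    for (left, _, left_end), (right, right_start, _) in zip(genes, genes[1:]):
        gaps.append((left, right, right_start - left_end - 1))
    return gaps


# Hypothetical annotation: geneA and geneB overlap by 8 bp,
# geneB and geneC are separated by a 37 bp spacer.
toy = [("geneA", 1, 68), ("geneB", 61, 130), ("geneC", 168, 300)]
gaps = junction_gaps(toy)
overlap_total = sum(-g for _, _, g in gaps if g < 0)
spacer_total = sum(g for _, _, g in gaps if g > 0)
print(gaps, overlap_total, spacer_total)

# The five overlaps reported in the text add up to the stated total of 24 bp.
assert sum([8, 6, 2, 1, 7]) == 24
```

Applied to a full annotation table, the same function would give both the number of spacer regions and their summed length.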

The A+T-rich region
With a length of 344 bp, the A+T-rich region in the mitogenome of S. robusta is located at the junction rrnS-trnM (Table 4). This region has the highest A+T content (95.35%), and the most negative AT skewness (-0.049) and GC skewness (-0.751) (Table 4). We recorded four short repeating sequences located on both sides of the A+T-rich region: the motif "ATAGA" and a 19 bp poly-T stretch downstream from the rrnS gene, and the microsatellite-like element (AT)9 and a poly-A element upstream of the trnM gene (Fig. 7B).
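The conserved elements listed above can be located in a sequence with simple regular expressions, as in the following Python sketch. It is a hedged illustration only: the function name, the minimum-length thresholds and the toy sequence are assumptions, the poly-T stretch is assumed to follow "ATAGA" directly, and the sequence is assumed to be read on the strand on which these motifs appear (Fig. 7B shows the reverse strand).

```python
import re


def find_at_rich_motifs(seq, min_poly_t=10, min_at_repeats=5, min_poly_a=8):
    """Return {motif_name: [(start, end), ...]} with 0-based, end-exclusive spans."""
    seq = seq.upper()
    patterns = {
        # "ATAGA" immediately followed by a run of Ts
        "ATAGA_polyT": "ATAGA" + "T{%d,}" % min_poly_t,
        # microsatellite-like (AT)n element
        "AT_microsatellite": "(?:AT){%d,}" % min_at_repeats,
        # poly-A element
        "polyA": "A{%d,}" % min_poly_a,
    }
    return {
        name: [m.span() for m in re.finditer(pattern, seq)]
        for name, pattern in patterns.items()
    }


# Toy sequence built to contain each element once (not the real A+T-rich region).
toy = "GC" + "ATAGA" + "T" * 19 + "CG" + "AT" * 9 + "GG" + "A" * 10
for name, spans in find_at_rich_motifs(toy).items():
    print(name, spans)
```

The thresholds are deliberately loose so that poly-T stretches or microsatellites of different lengths in other species would also be detected.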

Phylogenetic relationships
We reconstructed the phylogenetic relationships using the ML method based on the concatenated nucleotide sequences of the 13 PCGs of the related lepidopteran superfamilies. The phylogenetic analysis reveals that the superfamilies Noctuoidea, Geometroidea, Bombycoidea, Pyraloidea, Papilionoidea, Tortricoidea, Yponomeutoidea and Hepialoidea are monophyletic (Table 1), and that Noctuoidea is most closely related to the superfamilies Geometroidea and Bombycoidea. Different species of the same family form a single cluster and S. robusta is closely related to L. melli in the Erebidae (Fig. 8).

DISCUSSION
In the present study, the size of the newly sequenced S. robusta mitogenome (15,447 bp) falls within the range of those recorded for other species of Lepidoptera sequenced; Artogeia (15,140 bp) has the shortest and B. mandarina (15,928 bp) the longest. The variation in size is primarily due to differences in the number of repeats in the control regions (Pan et al., 2008; Hong et al., 2009). The gene number and nucleotide composition are similar to those of other Metazoa; however, the gene arrangement and orientation (trnM-trnI-trnQ) differ from the ancestral gene order trnI-trnQ-trnM (Boore, 1999). The AT skewness (-0.007) of the mitogenome studied indicates the presence of fewer As than Ts. This feature is also reported for several arthropod species including A. formosae (-0.027), C. pomonella (-0.004), H. vitta (-0.010) and A. pernyi (-0.021). Interestingly, the GC skewness (−0.362) of the rRNA is much lower than in previously sequenced animals and further reveals that the mitogenome is more biased toward Cs than Gs (Jiang et al., 2009; Liu et al., 2012).
Twelve of the 13 protein-coding genes have the standard ATN (ATT, ATG and ATA) start codon, while cox1 has CGA. Most of the S. robusta PCGs (cox2, cox3, atp6, nad4, nad4L, cytb and nad1) have ATG as the initiation codon (Dai et al., 2016; Liu et al., 2016). Four PCGs (cox1, cox2, nad6 and nad4) have incomplete stop codons, either TA or T, while the remaining PCGs end with TAA. Partial stop codons are reported in many lepidopteran mitogenomes. In Lepidoptera there seems to be a high degree of conservation of incomplete stop codons (Liao et al., 2010; Liu et al., 2013).
The comparative analysis of the different codons in ten species of Lepidoptera (Fig. 2) reveals that Asn, Ile, Leu2, Lys, Phe, Tyr and Met are the most frequent amino acids, of which the Leu2 family (hydrophobic amino acid) is the most frequent. The amino acid composition might be related to the function of the mitochondrion in encoding several transmembrane proteins (Lu et al., 2013). Furthermore, the relative synonymous codon usage (RSCU) recorded for S. robusta is similar to that for L. melli, A. formosae, S. charonda kuriyamaensis and O. furnacalis, but different from that recorded for species that lack the codons GCG and GTG (H. cunea), GCG, CGC and CCG (G. argentata), CGG (P. atrilineata), GCG (C. pomonella) and GCG (H. vitta). It is likely that codons with a high GC content are less frequent, as this feature seems to be conserved in insects (Lu et al., 2013; Dai et al., 2015).
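For readers unfamiliar with RSCU, the quantity can be sketched in Python as the observed count of a codon divided by the mean count of all codons that encode the same amino acid. The code below is an illustration and not the MEGA implementation used in the study. The function name `rscu` is hypothetical, the codon table is built from the invertebrate mitochondrial genetic code (NCBI translation table 5), and codons are grouped by amino acid, so the Leu1/Leu2 and Ser1/Ser2 codon families shown in the figures are not separated here.

```python
from collections import Counter, defaultdict
from itertools import product

# Invertebrate mitochondrial code (NCBI translation table 5), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds_list):
    """RSCU per codon for a list of in-frame coding sequences.

    Amino acids that never occur in the input are omitted from the result.
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy input: four alanine codons, none of them GCC.
values = rscu(["GCTGCTGCAGCG"])
absent = sorted(c for c, v in values.items() if v == 0)
print(values, absent)
```

A codon with an RSCU of 0 within an observed family is one that the genome never uses, which is the sense in which certain GC-rich codons are described as lacking in some species.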
Five overlaps, with a total length of 24 bp, were recorded in the mitogenome of S. robusta. The largest overlap (8 bp) is between trnW and trnC (Table 3), as documented for other species of Lepidoptera, for instance B. mandarina (Li et al., 2010) and B. mori Dazao (Liu et al., 2013). An interesting finding of the present study is that there is an overlapping sequence of 7 bp (ATGATAA) at the junction of the atp8-atp6 genes. This overlap seems to be conserved in the Lepidoptera currently sequenced (Liu et al., 2008; Zhu et al., 2013). The overall organization of the mitogenome of S. robusta is compact, with only 168 bp of intergenic spacers dispersed over 18 regions and ranging in size from 1 to 37 bp (Table 3). The longest intergenic spacer (37 bp) is located at the junction trnQ-nad2 and has an extremely high A+T content, which is frequently recorded in the mitogenomes of Lepidoptera (He et al., 2015). The total length of the intergenic spacers in the mitogenome studied is greater than that in A. selene (137 bp over 13 regions), but less than that in O. lunifer (371 bp over 20 regions) (Salvato et al., 2008; Liu et al., 2012). The 17 bp spacer between trnS2 (UCN) and nad1 contains the "ATACTAA" motif (Fig. 7A), which is a highly conserved region in most insect mtDNAs and is proposed as a possible mitochondrial transcription termination peptide-binding site (mtTERM protein) (Taanman, 1999).
The A+T-rich region in the mitogenomes of arthropods is a non-coding stretch, usually located between the trnI-trnQ-trnM gene cluster and the rrnS gene. The occurrence of different copy numbers of tandemly repeated elements is documented as one of the remarkable features of the A+T-rich region in insects. The A+T-rich region in S. robusta extends over 344 bp (15,447) and is located between rrnS and trnM. The region is highly A+T biased (95.35%) compared to the mitogenome as a whole (81.09%). The first two repeat regions, the motif "ATAGA" and the 19 bp poly-T stretch, are located on the rrnS gene side of the A+T-rich region, while the microsatellite-like (AT)9 and poly-A element are located upstream of the trnM gene (Fig. 7B). Comparison with previously sequenced lepidopterans revealed that the A+T-rich region of the mitogenome studied is longer than that in L. melli (338 bp), G. argentata (340 bp), S. morio (316 bp), H. vitta (255 bp), M. sexta (324 bp) and A. ipsilon (332 bp), but shorter than that in H. cunea (357 bp), A. formosae (482 bp), C. pomonella (351 bp), P. atrilineata (457 bp), B. mandarina (484 bp), A. pernyi (552 bp) and C. suppressalis (348 bp) (Table 4). The length of the poly-T stretch varies from species to species (Lu et al., 2013; Dai et al., 2015) and the ATAGA region is conserved in Lepidoptera (Cameron & Whiting, 2008).
The phylogenetic analyses revealed that the different species from the same family clustered together. These results are consistent with the conclusions of many authors, e.g. Liu et al. (2015) and Lammermann et al. (2016). Results of further analyses strongly support a close relationship between S. robusta and L. melli (Erebidae).

Fig. 1. Map of the mitogenome of S. robusta. The tRNA genes are labelled according to the IUPAC-IUB single-letter amino acid codes; cox1, cox2 and cox3 refer to the cytochrome c oxidase subunits; cob to cytochrome b; nad1-nad6 to NADH dehydrogenase components; rrnL and rrnS to ribosomal RNAs. Genes named above the bar are located on the major strand, while the others are located on the minor strand.

Fig. 2. Comparison of the size of each codon within the mitochondrial genome of different species of Lepidoptera. Lowercase letters (a, b, c, d, e and f) above the species names indicate the superfamily to which the species belong (a - Noctuoidea; b - Geometroidea; c - Tortricoidea; d - Papilionoidea; e - Pyraloidea; f - Hesperioidea).

Fig. 4. The Relative Synonymous Codon Usage (RSCU) of the mitochondrial genome in six superfamilies of Lepidoptera. Codon families are plotted on the x-axis. Codons indicated above the bar are not present in the mitogenomes.

Fig. 5. Putative secondary structures of the 22 tRNAs of the mitogenome of S. robusta.

Fig. 7. (A) Alignment of the intergenic spacer region between trnS2 (UCN) and nad1 in several species of Lepidoptera. The shaded "ATACTAA" motif is conserved in Lepidoptera. (B) Features present in the A+T-rich region in S. robusta. The sequence is shown in the reverse strand. The ATATG motif is shaded. The poly-T stretch is underlined, while the poly-A stretch is double underlined. The microsatellite T/A repeat sequence is indicated by dotted underlining.

Fig. 6. Alignment of the overlapping region between atp8 and atp6 in Lepidoptera and other insects. The numbers on the right refer to the number of intergenic nucleotides.

Table 1. Details of the lepidopteran mitogenomes used in this study.

Table 2. Details of the primers used to amplify the mitogenome of S. robusta.

Table 3. List of the annotated mitochondrial genes of S. robusta.

Table 4. Composition and skewness of different lepidopteran mitogenomes.