Complete mitochondrial genome of Palpita hypohomalia (Lepidoptera: Pyraloidea: Crambidae) and its phylogenetic implications

The complete mitochondrial genome of a pyraloid species, Palpita hypohomalia, was sequenced and analyzed. This mitochondrial genome is circular, 15,280 bp long, and includes 37 typical metazoan mitochondrial genes (13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes) and an A + T-rich region. Nucleotide composition is highly biased toward A + T nucleotides (81.6%). All 13 protein-coding genes (PCGs) initiate with the canonical start codon ATN, except for cox1, which starts with CGA. The typical stop codon TAA occurs in most PCGs, while nad2 and cox2 show TAG and an incomplete termination codon T, respectively. All tRNAs have a typical clover-leaf structure, except for trnS1 (AGN), which lacks the dihydrouridine (DHU) arm. Comparative mitochondrial genome analysis showed that the motif “ATGATAA” between atp8 and atp6, and the motif “ATACTAA” between trnS2 and nad1, are commonly present in lepidopteran mitogenomes. Furthermore, the “ATAG” motif and subsequent poly-T structure, and the A-rich 3’ end, are conserved in the A + T-rich regions of lepidopteran mitogenomes. Phylogenetic analyses based on our dataset of 37 mitochondrial genes yielded an identical topology for the Pyraloidea across methods, which is generally consistent with that recovered by a previous study based on multiple nuclear genes. In a previous study of the Crambidae, the Evergestinae was synonymized with Glaphyriinae; the present study is the first to clarify their close relationship with mitogenome data.


INTRODUCTION
Mitochondrial genes have been extensively employed in phylogenetic investigations since they are characterized by cellular abundance, absence of introns, rapid evolution and a lack of extensive recombination (Curole & Kocher, 1999; Cameron, 2014). In recent years, increasing numbers of mitochondrial genes, and even whole mitochondrial genomes (mitogenomes), have become accessible following the development of sequencing technology, which in parallel has provided effective data for studies on systematics, population genetics and evolutionary biology (Cameron, 2014). A typical animal mitogenome is circular and includes 13 protein-coding genes (PCGs), two ribosomal RNA genes (rRNAs), 22 transfer RNA genes (tRNAs), and an A + T-rich region (Boore, 1999). Mitogenome data have been used to investigate the phylogenetic relationships of multiple animal groups, and great progress has been made (e.g. Jex et al., 2008; Cameron, 2014; Li et al., 2017). The Lepidoptera, with more than 157,000 species recorded worldwide (van Nieukerken et al., 2011), has been intensely investigated (Timmermans et al., 2014; Yang et al., 2015; Wu et al., 2016).
alignments using the MAFFT algorithm within the TranslatorX online platform (Abascal et al., 2010; Katoh et al., 2017). Two rRNA and 22 tRNA genes were aligned with the Q-INS-i algorithm within the MAFFT online platform (Katoh et al., 2017). Bayesian inference (BI) and maximum likelihood (ML) methods were used to construct phylogenetic trees based on the dataset consisting of all sites of 13 PCGs, two rRNAs and 22 tRNAs.
For BI analysis, the best partitioning schemes and substitution models were determined by PartitionFinder version 1.1.1 (Lanfear et al., 2012); results are shown in Table S2. Bayesian inference analysis was performed using MrBayes version 3.1.2 (Ronquist & Huelsenbeck, 2003). Two independent Markov chain Monte Carlo (MCMC) runs were performed for 8,000,000 generations, with sampling every 100 generations. Convergence between the two runs was established with Tracer version 1.6 (effective sample sizes > 200) (Rambaut et al., 2014). After the first 25% of the sampled trees were discarded as burn-in, a 50% majority-rule consensus tree with posterior probabilities was generated from the remaining trees. For ML analysis, the raxmlGUI version 1.539 interface (Silvestro & Michalak, 2012) of RAxML version 7.2.6 (Stamatakis, 2006) was employed under the GTRGAMMAI model, given that GTR + I + G was the best substitution model for most sites of the dataset (Table S2), and node reliability was assessed using the ML + rapid bootstrap algorithm with 1000 replicates.
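As a quick check on the sampling scheme described above, the following sketch works out how many trees each MCMC run contributes to the consensus. It assumes one sample per sampling interval and ignores the initial state that some programs also record; the function name is hypothetical and is not part of MrBayes.

```python
def mcmc_sample_counts(generations, sample_every, burnin_fraction):
    """Return (total, discarded, retained) sampled trees for one MCMC run.

    Assumes exactly one tree is saved every `sample_every` generations
    and that burn-in is a fraction of the saved trees.
    """
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded


# Settings reported in the text: 8,000,000 generations, sampling every 100,
# first 25% discarded as burn-in.
total, discarded, retained = mcmc_sample_counts(8_000_000, 100, 0.25)
print(total, discarded, retained)  # 80000 20000 60000
```

Under these assumptions, each of the two runs retains 60,000 trees after burn-in.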

General features of the P. hypohomalia mitogenome
The mitogenome of P. hypohomalia (GenBank accession number: MH013483) is double-stranded circular with 37 typical mitochondrial genes (13 PCGs, 22 tRNAs and two rRNAs) and an A + T-rich region (Fig. 1, Table 1). The assembled mitogenome is 15,280 bp in size, comparable to other completely sequenced pyraloid mitogenomes, which range from 15,110 bp in Maruca testulalis (Zou et al., 2016) to 15,490 bp in Diatraea saccharalis (Li et al., 2011). Among the typical 37 genes, 23 are encoded on the majority strand (J-strand), and the remaining 14 genes are located on the minority strand (N-strand). This pattern is identical to that of all other sequenced Lepidoptera mitogenomes as well as other insects (e.g. Clary & Wolstenholme, 1985; Jiang et al., 2016; Wu et al., 2016; Yang et al., 2018).
In the P. hypohomalia mitogenome, 49 overlapping sites were detected across 11 gene junctions, each overlap being 1-17 bp in length. Among them, a 7-bp motif "ATGATAA" between atp8 and atp6 was recognized, a character commonly found in Lepidoptera (Fig. 2A) as well as other insects such as Reduvius tenebrosus (Hemiptera) (Jiang et al., 2016) and Taeniopteryx ugola (Plecoptera) (Chen & Du, 2017). In addition to the A + T-rich region, a total of 217 intergenic nucleotides across 17 gene junctions, each spacer being 1-46 bp in length, were identified in the P. hypohomalia mitogenome. It has been reported that the spacer sequence between trnS2 and nad1 is important in mitochondrial transcription (Taanman, 1999) and is commonly found in insect mitogenomes (Wu et al., 2016). In this spacer sequence, a conserved motif "ATACTAA" could be recognized in P. hypohomalia as well as other reported pyraloid mitogenomes (Yang et al., 2018). Moreover, comparative mitogenome analysis showed that this motif was also commonly present in lepidopteran insects, including Thitarodes renzhiensis, a non-
and structure of the A + T-rich region to that of other Lepidoptera representatives. Finally, phylogenetic analyses were performed based on nearly all available mitogenomes of pyraloid species. This study provides a reference for future comparative mitogenome analyses and mitogenome-based phylogenetic investigation of Pyraloidea and related groups.

Sample collection and DNA extraction
Adult specimens of P. hypohomalia were sampled by light traps in Zhoukou, Henan, China. Fresh samples were initially preserved in 100% ethanol and then stored at -80°C until required for genomic DNA extraction. Specimens were identified following the morphological description by Li (2012). Total genomic DNA was extracted from the thorax muscle tissues using a DNeasy tissue kit (Qiagen, Hilden, Germany). Molecular identification was performed by blasting the newly amplified barcode sequence (see below) in GenBank, and a 100% similarity to that of P. hypohomalia provided by L.H. Fan et al. (China Jiliang University, Hangzhou, unpubl.) was found. Voucher specimens are deposited in the Biology Laboratory of Zhoukou Normal University, China.

Mitogenome sequencing
Complete mitogenome sequences of P. hypohomalia were obtained by next-generation sequencing. Briefly, the extracted whole genome DNA was quantified and fragmented to an average size of 400 bases using a Covaris M220 system with the Whole Genome Shotgun method (Covaris, Woburn, MA, USA). Then, a library was constructed using a TruSeq DNA PCR-Free Sample Preparation Kit (Illumina, San Diego, CA, USA). Finally, an Illumina HiSeq 2500 was used for sequencing with a 250-bp paired-end strategy.

Mitogenome annotation and analysis
A total of 2,298,160 raw paired reads were retrieved for P. hypohomalia. The complete mitogenome sequence was annotated using the MITOS webserver with the invertebrate genetic code (Bernt et al., 2013). tRNAscan-SE server version 1.21 (Lowe & Eddy, 1997) was used to re-identify the 22 tRNAs as well as to re-confirm their secondary structures. MEGA version 6.06 (Tamura et al., 2013) was used to re-confirm the gene boundaries by aligning the new mitogenome with other sequenced pyraloid mitogenomes.
Nucleotide sequences of the 13 PCGs were translated with both Primer Premier version 5.00 (Premier Biosoft International, Palo Alto, CA, USA) and MEGA version 6.06 (Tamura et al., 2013) to ensure the correct reading frame. Base composition and relative synonymous codon usage (RSCU) were calculated using MEGA version 6.06 (Tamura et al., 2013). Strand asymmetry was calculated according to the formulas AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C) (Perna & Kocher, 1995). Tandem repeat elements in the A + T-rich region were identified using the Tandem Repeats Finder program (http://tandem.bu.edu/trf/trf.html) (Benson, 1999).
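The base composition and strand asymmetry measures can be illustrated with a minimal sketch. It assumes the standard skew definitions of Perna & Kocher (1995), AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C). The function name and the toy sequence are hypothetical, and this is not the MEGA implementation used in the study.

```python
from collections import Counter


def composition_and_skew(seq):
    """Return A + T content, AT skew and GC skew for a nucleotide sequence.

    Only A, T, G and C are counted; ambiguous bases are ignored.
    The sequence must contain at least one A/T and one G/C,
    otherwise a ZeroDivisionError is raised.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t),
        "gc_skew": (g - c) / (g + c),
    }


# Toy sequence (hypothetical, not taken from the mitogenome): A=2, T=3, G=1, C=2
stats = composition_and_skew("AATTTGCC")
print(stats["at_content"])  # 0.625
print(stats["at_skew"])     # -0.2
```

A negative AT skew indicates more T than A on the analysed strand, and a negative GC skew indicates more C than G.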

Phylogenetic analyses
To investigate the phylogenetic implications of the P. hypohomalia mitogenome in Pyraloidea phylogeny, a total of 47 taxa (Table S1), namely 37 pyraloid species with sequenced mitogenomes and ten outgroup species from Lepidoptera superfamilies Geometroidea, Bombycoidea and Noctuoidea, were sampled for phylogenetic analyses. The homologous PCG sequences of all representatives were aligned by means of codon-based multiple
ditrysian species of Lepidoptera, although with a change in the last base ("ATACTAT") (Fig. 2B). This phenomenon is in accordance with that reported by Cameron & Whiting (2008). Surprisingly, we did not detect this motif in Bombyx mori (GenBank accession number: KM875545), a representative of the Bombycoidea herein. However, in another strain of B. mori (GenBank accession number: AY048187), the same motif "ATACTAA" is identified. Therefore, whether the absence of this motif in B. mori (GenBank accession number: KM875545) can be ascribed to sequencing errors requires further investigation.

Protein-coding genes
The total length of the 13 PCGs is 11,173 bp, accounting for approximately 73% of the whole mitogenome (Tables 1-2). All PCGs start with the conventional ATN codon (ATT for nad2, nad3, nad5, and nad6; ATG for cox2, atp6, cox3, nad4, nad4l, cob, and nad1; ATC for atp8). The exception is cox1, which starts with CGA, although non-canonical start codons for cox1 are common across insects (Fenn et al., 2007; Wu et al., 2016; Yang et al., 2018). All 13 PCGs end with the typical stop codon TAA, except for nad2, which has TAG, and cox2, which has an incomplete termination codon T. Incomplete termination codons are commonly recognized across arthropod mitogenomes, which may be related to post-transcriptional modification during the mRNA maturation process (Ojala et al., 1981). The relative synonymous codon usage (RSCU) values of the P. hypohomalia mitogenome are shown in Fig. 3 and Table S3. There are 3,712 codons excluding the termination codons. Among them, UUA (Leu), AUU (Ile), UUU (Phe), AUA (Met), and AAU (Asn) represent the five most frequently occurring codons (1,758/3,712; 47%). Interestingly, the codon UCG is absent in the P. hypohomalia mitogenome. Further comparative analysis showed that frequently occurring codons and those with higher RSCU values for each amino acid have a higher A + T content, which contributes to the A + T bias of the whole mitogenome.
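For readers unfamiliar with RSCU, the following sketch shows how the value is commonly defined: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The codon table below is a small illustrative subset of the invertebrate mitochondrial code, and the function names and toy sequence are hypothetical; the study itself used MEGA.

```python
from collections import Counter, defaultdict

# Illustrative subset of the invertebrate mitochondrial code (not the full table).
CODON_TO_AA = {
    "UUU": "Phe", "UUC": "Phe",
    "AUU": "Ile", "AUC": "Ile",
    "AUA": "Met", "AUG": "Met",
}


def codon_counts(cds):
    """Count in-frame codons of a coding sequence, written in RNA letters."""
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3  # drop an incomplete trailing codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(counts, codon_to_aa):
    """RSCU = codon count * number of synonymous codons / family total."""
    families = defaultdict(list)
    for codon, aa in codon_to_aa.items():
        families[aa].append(codon)
    values = {}
    for codons in families.values():
        family_total = sum(counts.get(c, 0) for c in codons)
        for c in codons:
            values[c] = (counts.get(c, 0) * len(codons) / family_total
                         if family_total else 0.0)
    return values


# Toy coding sequence (hypothetical): TTT TTT TTC ATA ATG
values = rscu(codon_counts("TTTTTTTTCATAATG"), CODON_TO_AA)
print(round(values["UUU"], 3), round(values["UUC"], 3))  # 1.333 0.667
```

An RSCU above 1 means the codon is used more often than expected under equal usage; a codon that never occurs, like UCG in this mitogenome, has an RSCU of 0.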

tRNAs and rRNAs
The P. hypohomalia mitogenome contains 22 typical tRNAs with lengths ranging from 63 bp (trnR) to 71 bp (trnK) (Table 1). The total length of tRNAs is 1,407 bp, accounting for approximately 9% of the whole mitogenome (Table 2). Among them, eight tRNAs are encoded on the N-strand and the remaining 14 on the J-strand. As shown in Fig. 4, all tRNAs exhibit a typical clover-leaf secondary structure with the exception of trnS1 (AGN), which lacks the DHU arm; this feature is common to all lepidopteran insects, except in the Adoxophyes honmai (Tortricidae) mitogenome, where all tRNAs show a complete clover-leaf structure (Lee et al., 2006). Moreover, the lack of the DHU arm in trnS1 (AGN) is a common feature in metazoan mitogenomes (Garey & Wolstenholme, 1989; Lavrov et al., 2000). In all tRNAs of the P. hypohomalia mitogenome, we recognized 22 unmatched base pairs. Among them, 18 were non-canonical G-U pairs, and the remaining four were mismatched base pairs including three U-U and one G-G pairs. The overrepresented pattern of the non-canonical G-U pairs in tRNAs of the mitogenome is commonly present in other insects (Salvato et al., 2008; Chen et al., 2016; Chen & Du, 2017).
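The distinction drawn above between Watson-Crick pairs, G-U wobble pairs and true mismatches can be sketched as a small classifier. This is an illustration only: the function names and the example stem are hypothetical, and the tRNA structures in the study were predicted with MITOS and tRNAscan-SE.

```python
from collections import Counter

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(base5, base3):
    """Classify one stem base pair; DNA 'T' is treated as RNA 'U'."""
    pair = (base5.upper().replace("T", "U"), base3.upper().replace("T", "U"))
    if pair in WATSON_CRICK:
        return "watson_crick"
    if pair in WOBBLE:
        return "wobble_GU"
    return "mismatch"


def tally_pairs(pairs):
    """Count pair classes over an iterable of (5' base, 3' base) tuples."""
    return Counter(classify_pair(b5, b3) for b5, b3 in pairs)


# Hypothetical stem: two canonical pairs, one G-U wobble, one U-U mismatch
stem = [("G", "C"), ("G", "U"), ("U", "U"), ("A", "T")]
print(dict(tally_pairs(stem)))
# {'watson_crick': 2, 'wobble_GU': 1, 'mismatch': 1}
```

Applied to all stems of the 22 tRNAs, a tally of this kind would separate the 18 G-U pairs from the four mismatches (three U-U and one G-G) reported in the text.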

A + T-rich region
The A + T-rich region of the P. hypohomalia mitogenome is located between the rrnS and trnM genes and is 335 bp in size (Fig. 1, Table 1). This length is comparable to that of other completely sequenced pyraloid mitogenomes, which ranges from 278 bp in Meroptera pravella (Ali et al., 2017) to 596 bp in Plodia interpunctella (GenBank accession number: KT207942) (Tang et al., 2015). As in other reported pyraloid mitogenomes (Yang et al., 2018), several conserved sequence blocks can be recognized in this region. These blocks include (from 5' to 3' end) the motif "ATAG" and subsequent poly-T structure, the motif "ATTTA" and subsequent microsatellite (AT)n element, and an "A"-rich 3' end upstream of the trnM gene (Fig. 5A). It has been reported that these conserved blocks play a key role in replication and transcription of the mitogenome (Zhang & Hewitt, 1997). The motif "ATAG" and subsequent poly-T structure, and the A-rich 3' end upstream of the trnM gene in the A + T-rich region, are common to all lepidopteran mitogenomes (Fig. 5B). The insect A + T-rich region is generally characterized by the presence of multiple tandem repeat elements (Vila & Björklund, 2004). This character has been recognized in several species such as Chilo auricilius and Scirpophaga incertulas (Cao & Du, 2014; Cao et al., 2014). However, in P. hypohomalia and some other reported pyraloid mitogenomes (Liu et al., 2016; Ma et al., 2016; Yang et al., 2018), no tandem repeat elements were identified.
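A simple way to screen a sequence for the conserved blocks described above is a regular-expression scan. The sketch below is illustrative only: the thresholds (a gap of at most 3 bases after "ATAG", a poly-T run of at least 8, and at least 5 AT units) are assumptions chosen for the example, the input must be in the 5' to 3' orientation used in Fig. 5, and the test sequence is synthetic rather than taken from any mitogenome.

```python
import re

# Assumed thresholds (hypothetical, tune for real data):
# "ATAG", up to 3 intervening bases, then a run of at least 8 T's.
ATAG_POLYT = re.compile(r"ATAG[ACGT]{0,3}?(T{8,})")
# Microsatellite-like (AT)n element with n >= 5.
AT_REPEAT = re.compile(r"(?:AT){5,}")


def scan_control_region(seq):
    """Report the first ATAG + poly-T block and the first (AT)n run, if any."""
    seq = seq.upper()
    result = {}
    block = ATAG_POLYT.search(seq)
    if block:
        result["atag_start"] = block.start()           # 0-based position
        result["poly_t_length"] = len(block.group(1))
    repeat = AT_REPEAT.search(seq)
    if repeat:
        result["at_repeat_units"] = len(repeat.group(0)) // 2
    return result


# Synthetic sequence: ATAG + A + 12 T's, then ATTTA + (AT)6 + an A-rich end
synthetic = "CC" + "ATAG" + "A" + "T" * 12 + "GG" + "ATTTA" + "AT" * 6 + "AAAAAAA"
print(scan_control_region(synthetic))
# {'atag_start': 2, 'poly_t_length': 12, 'at_repeat_units': 6}
```

A scan like this only flags candidate blocks; confirming conservation across species still requires an alignment such as the one summarized in Fig. 5B.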

Phylogenetic analyses
In the present study, we reconstructed the Pyraloidea phylogeny based on a dataset including 37 genes (13 PCGs, two rRNAs and 22 tRNAs) of the whole mitogenome. ML and BI analyses generated an identical topology, as summarized in Fig. 6. Relative to multiple outgroup species, the Pyraloidea formed a well-supported monophyletic group with two clades corresponding to the two pyraloid families Pyralidae and Crambidae. According to Regier et al. (2012), the Pyralidae includes five subfamilies, four of which were included in the analysis (Chrysauginae was excluded since it did not have a reported mitogenome). The relationships between them were consistently recovered as Galleriinae + (Phycitinae + (Pyralinae + Epipaschiinae)), confirming the findings of Regier et al. (2012) based on multiple nuclear genes, as well as those of Yang et al. (2018) based on the combined 13 PCGs and two rRNA genes of the mitogenome. Regarding Crambidae, we sampled all seven subfamilies with available mitogenomes out of the ten subfamilies defined by Regier et al. (2012). Two groups in this family were consistently recovered, generally corresponding to the "PS clade" and "non-PS clade" defined by Regier et al. (2012). The well-supported sister groups Pyraustinae and Spilomelinae constituted the "PS clade" in this study, and the other five subfamilies (Glaphyriinae, Schoenobiinae, Scopariinae, Crambinae, and Acentropinae) formed the "non-PS clade". The division of Crambidae into two groups is also supported by other investigations based on 13 PCGs or a combination of 13 PCGs and two rRNAs (Ma et al., 2016; Yang et al., 2018). However, in the relationships among the five subfamilies of the "non-PS clade", variations exist between the present and previous studies. Our ML and BI analyses consistently recovered them as Glaphyriinae + (Scopariinae + (Crambinae + (Schoenobiinae + Acentropinae))), which differs slightly from the result of Regier et al. (2012). In Regier et al. (2012), the sister group Scopariinae + Crambinae was recovered. However, great differences in topology and node supports can be found between the results herein and those of Ma et al. (2016) and Yang et al. (2018), which were based on partial mitogenome sequences. As noted in Cameron (2014), almost all mitogenome-based phylogenetic investigations on insects have included the 13 PCGs, but the inclusion of the two rRNAs and 22 tRNAs has been more variable. Our results show that a dataset including 37 mitochondrial genes can provide more resolved pyraloid relationships than datasets which use 13 PCGs or a combination of 13 PCGs and two rRNAs, as in Ma et al. (2016) and Yang et al. (2018), respectively. This indicates that more phylogenetic information, possibly from datasets including nuclear and whole mitogenome sequences (Cameron, 2014), may further improve the phylogeny of related insect groups. Evergestis junctalis, historically belonging to the Evergestinae, was transferred to the Glaphyriinae due to the synonymy between Evergestinae and Glaphyriinae suggested by Regier et al. (2012). Our results with mitogenome data allowed us to clarify the taxonomic identity of E. junctalis and confirm the previous work of Regier et al. (2012).
Fig. 6. Bipartition tree obtained from ML analysis based on the dataset consisting of all sites of 13 PCGs, two rRNAs and 22 tRNAs. The species with newly sequenced mitogenomes are emphasized in bold. Numbers separated by a slash on the node are the posterior probability for the BI tree and bootstrap value; the dash (-) represents an unrecovered node in BI analysis.
The mitogenome sequenced in this study placed P. hypohomalia within the Spilomelinae of the Crambidae, which is taxonomically in accordance with the morphological study (Li, 2012). The Spilomelinae, with 3,775 species assigned to 318 genera, represents the most speciose subfamily in the Pyraloidea. This large group is currently hypothesized to be polyphyletic or paraphyletic with respect to the Pyraustinae (Solis & Maes, 2002; Regier et al., 2012). However, to date, no phylogenetic investigation has been conducted to support this hypothesis or provide a robust genus-level relationship in Spilomelinae. In this study, 11 spilomeline genera with mitogenome data available were sampled and formed a well-supported group outside the Pyraustinae; the relationships were consistently recovered as Cnaphalocrocis + (((Pycnarmon + Spoladea) + (Haritalodes + Nomophila)) + (((Glyphodes + Dichocrocis) + (Omiodes + Tyspanodes)) + (Palpita + Maruca))). This relationship differs from those of Ma et al. (2016) and Yang et al. (2018), which included seven and ten genera of this subfamily, respectively; the difference could be ascribed to the different datasets used.

CONCLUSIONS
The complete mitogenome of P. hypohomalia was determined using next-generation sequencing. Comparative mitogenome analyses showed that this mitogenome is typical of Lepidoptera mitogenomes in gene content, gene order and gene orientation. The Pyraloidea phylogeny obtained herein based on a dataset including 37 mitochondrial genes corroborates a previous study based on multiple nuclear genes. In the Spilomelinae, the 11 genera involved herein formed a well-supported group and consistently showed an identical topology across analyses, which could provide a reference for future investigations to resolve the limits and definitions of the subfamilies of Spilomelinae.

Fig. 2. A: The overlapping region between atp8 and atp6 in representatives of lepidopteran mitogenomes. Nucleotides colored red indicate the sequence of the overlapping region, nucleotides with green underline indicate partial sequence of the atp8 gene, and nucleotides with blue underline indicate partial sequence of the atp6 gene. B: Spacer sequence between trnS2 and nad1 in representatives of lepidopteran mitogenomes. Nucleotides colored red indicate the "motif" sequence, nucleotides with green underline indicate partial sequence of the trnS2 gene, and nucleotides with blue underline indicate partial sequence of the nad1 gene.

Fig. 4. Putative secondary structures of tRNAs from the Palpita hypohomalia mitogenome. tRNAs are labelled with the abbreviations of their corresponding amino acids. tRNA arms are illustrated as for trnV. Dashes indicate Watson-Crick base pairs, dots indicate wobble GU pairs, and other non-canonical pairs are not marked.

Fig. 5. A: Organization of the A + T-rich region in the Palpita hypohomalia mitogenome. The motif ATAG (marked red) and subsequent poly-T stretch (marked green), the motif ATTTA (marked purple) and subsequent microsatellite-like AT repeat sequences (marked yellow), and the 3' end sequence highly biased toward "A" bases (marked blue), are emphasized. B: Alignment of the A + T-rich regions from representatives of lepidopteran mitogenomes. The conserved motif ATAG (marked red) and subsequent poly-T stretch (marked green), and the 3' end sequence highly biased toward "A" bases (marked orange), are emphasized. Dots indicate omitted sequences, and the number of dots is not proportional to the number of nucleotides omitted.
Note: "J" indicates the majority strand and "N" indicates the minority strand in the strand column.