The draft genome sequence of the Japanese honey bee , Apis cerana japonica ( Hymenoptera : Apidae )

Honey bees are not only important for honey production but also as pollinators of wild and cultivated plants. The Eastern honeybee (Apis cerana) is more resistant to several pathogens than the Western honeybee (Apis mellifera), and the genomes of two strains of the nominotypical subspecies, A. cerana cerana, northern (Korea) and southern (China) strains, have been sequenced. Apis cerana japonica, another subspecies of A. cerana, shows many specifi c features (e.g. mildness, low honey production and frequently absconds) and it is important to study the molecular biological and genetic aspects of these features. To accelerate the genetic research on A. cerana japonica, we sequenced the genome of this subspecies. The draft genome sequence of A. cerana japonica presented here is of high quality in terms of basic genome status (e.g. N50 is 180 kbp, total length is 211 Mbp, and largest contig length is 1.31 Mbp) and BUSCO results. The gene set of A. cerana japonica was predicted using AUGUSTUS software and the set of genes was annotated using Blastp and InterProScan, and GO terms were added to each gene. The number of genes is higher than in A. mellifera and in the two strains of A. cerana cerana sequenced previously. A small number of transposable elements and repetitive regions were found in A. cerana japonica, which are also in the genomes of A. mellifera and the northern and southern strains of A. cerana cerana. Apis cerana is resistant to several pathogens that seriously damage A. mellifera. We searched for 41 orthologs related to the IMD and Toll pathways, which have key roles in the immune reaction to invading pathogens. Some orthologs were not identifi ed in the genome of the northern strain of A. cerana cerana. This indicates that the Toll and IMD pathways function in the same way as in A. mellifera and Drosophila melanogaster. Use of the draft genome sequence of A. cerana japonica provided herein and those of the other Apis (sub)species may help to accelerate comparative research on the genome of honey bees. * K. Yokoi and H. Uchiyama contributed equally to this work. ** Present address: Graduate School of Life Sciences, Tohoku University, 6-3 Aoba, Aramaki-aza, Aoba-ku, Sendai, Miyagi 980-8578, Japan; e-mail: takeshi.wakamiya.q2@dc.tohoku.ac.jp *** Present address: Yamazaki University of Animal Health Technology, 4-7-2 Minami-osawa, Hachioji, Tokyo 192-0364, Japan; e-mail: t_furukawa@yamazakicollege.onmicrosoft.com **** Corresponding author; e-mail: kimura@affrc.go.jp INTRODUCTION Honey bees are not only important for honey production but also as pollinators of wild and cultivated plants. In addition, honey bees are used as a model insect for social insect biology. Because of its importance, the western honey Eur. J. Entomol. 115: 650–657, 2018 doi: 10.14411/eje.2018.064


INTRODUCTION
Honey bees are not only important for honey production but also as pollinators of wild and cultivated plants.In addition, honey bees are used as a model insect for social insect biology.Because of its importance, the western honey strength of social barriers to disease, or a tendency for bees to be attacked by a limited set of highly coevolved pathogens (Evans & Spivak, 2010).
The genomes of the two subspecies of the Asian honey bee A. cerana cerana northern and southern strains have been sequenced (Park et al., 2015;Diao et al., 2018).The general features of the genome of the western honey bee are also conserved in these species.Compared with A. mellifera, A. cerana is known to be more resistant to varroa mites (Peng et al., 1987) and American foulbrood (Chen et al., 2000).On the other hand, it is known that the eastern honey bee is vulnerable to tracheal mites, Acarapis woodi (Rennie, 1921) (Tarsonemoidea: Tarsonemidae) (Sakamoto et al., 2017), and susceptible to Chinese sacbrood (Shan et al., 2017).The Japanese honey bee A. cerana japonica (Radoszkowski, 1877) (Hymenoptera: Apidae), a subspecies of A. cerana, differs in characteristics such as mildness, low honey production, hot defensive bee ball and frequent absconding, from Western honey bees (Matsuura, 1988;Ono et al., 1995;Yoshida, 2000).Even though A. cerana japonica is a relatively well investigated species compared with the other subspecies of A. cerana cerana, there is little genetic or molecular level evidence for these behaviours and features (Ugajin et al., 2012).Genetic variation among subspecies of the Eastern honey bee has been studied (Smith et al., 2000).Apis cerana cerana and A. cerana japonica are considered to be one species because the average genetic distance of 13 mitochondrial proteincoding genes in the two subspecies is 0.006 (Takahashi et al., 2016).To accelerate genetic research on A. cerana japonica, we determined the whole genome sequence of A. cerana japonica.
There are two reasons for generating the whole genome sequence of A. cerana japonica.One is to investigate the immune genes in more detail, as the genome sequences could contribute to further understanding of the characteristic behaviour and features of this system.Furthermore, we assume that the features of A. cerana japonica related to bacterial resistance could be refl ected in differences in the immune-related genes.As described above, the IMD and Toll pathways have key roles.However, several important orthologs consisting of the IMD and Toll pathways have not been identifi ed in the genome of A. cerana cerana from Korea, the northern strain (Pelle, BG4, Kenny and Dredd) (Park et al., 2015).To confi rm whether orthologs related to the IMD and Toll pathways exist in the A. cerana genome, we searched for these orthologs in A. cerana japonica using a local Blast method.
The western honey bee genome contains fewer genes involved in innate immunity, detoxifi cation enzymes and gustatory (taste) receptors, although not surprisingly, it contains more genes for olfactory receptors and novel genes for nectar and pollen utilization (Evans et al., 2006;Honeybee Genome Sequencing Consortium, 2006).The honey bee genome is more similar to that of vertebrates than insects for genes involved in circadian rhythm, as well as biological processes involved in turning genes on or off.These features accord with the ecology and social structure of honey bees.Such reductions appear to be especially pervasive in the immune system.Although A. mellifera has 1 : 1 orthologs related to the signal transduction phase and NF-κB transcription factors in the Toll and IMD pathways, A. mellifera has fewer immune effector gene, such as antimicrobial peptide genes, compared with D. melanogaster, T. castaneum and A. gambiae (Evans et al., 2006).The immune fl exibility in bees may be associated with either the of TEs in closely related species is critical, and a whole genome analysis of A. cerana japonica is most suitable for this purpose.
In this report, we summarize the features of the Japanese honey bee genome by comparing it mainly with the genomes of the two strains of A. cerana cerana and A. mellifera, with particular focus on innate immune genes and TEs.

Bee sample
A drone pupa was collected from a hive in an apiary at Kyoto Sangyo University, Kyoto, Japan in 2012 (35°04´13˝N, 135°45´28˝E).

Extraction of genomic DNA
Genomic DNA (gDNA) was extracted from thoracic muscles using the standard phenol/chloroform method, which is based on Sambrook et al. (1989).To avoid contamination, the sample was washed with 99.5% ethanol and the muscle removed using dissecting scissors and sterilized tweezers.Lysis buffer (100 mM Tris-HCl pH 8.0, 10 mM EDTA, 0.5% SDS), Proteinase K (Qiagen, Hilden, North Rhine-Westphalia, Germany) and RNase A (Qiagen) were used to dissolve the tissue, which was then incubated overnight at 55°C.The solution was then washed with Tris-EDTA-saturated phenol, phenol/chloroform/isoamyl alcohol (25 : 24 : 1) and chloroform/isoamyl alcohol (24 : 1).Finally, gDNA was purifi ed by precipitating in ethanol and re-eluting in Tris-EDTA buffer.

DNA preparation and sequencing
Using the extracted gDNA, we prepared three libraries for the construction of the Apis cerana japonica genome.A fl ow diagram of the procedure is shown in Fig. 1.The three libraries were: a short-read paired-end library using a NEB Ultra DNA Library Preparation Kit (New England BioLabs, Beverly, MA, USA), a synthetic long-read DNA library (Illumina, San Diego, CA, USA) using a Truseq Long-Read Synthetic DNA library preparation Kit, and a long-read library using a rapid sequencing kit (RAD002) (Oxford Nanopore Technologies, Oxford, UK).The short-read paired-end library and the synthetic long-read DNA library were sequenced using Illumina HiSeq 2500 (Illumina), and the long-read library was sequenced using MinION with R9.4 fl ow cells and a MinKnow v1.4 (Oxford Nanopore Technologies).The raw sequence data from the short-read paired-end library, the synthetic long-read DNA library and the long-read library were deposited in the DDBJ sequence read archives (accession numbers DRR095708, DRR095707, and DRR095709, respectively in DRA005890).

Genome assembly, gene prediction and gene annotation
Approximately two-hundred-and-fi fty-thousand synthetic long-reads were constructed using BaseSpace (Illumina).Primary contigs were constructed using Spades ver.3.10.1 software (Bankevich et al., 2012;Bankevich & Pevzner, 2016), with default settings, from the synthetic long-reads and short-read pairedend reads.Finally, the draft genome sequences were constructed from the primary contigs and the long-read sequences using the Finishing module in CLC genomic workbench ver.1.7 (Qiagen) with default settings.The work fl ow for constructing the A. cerana japonica draft genome is shown in Fig. 1.The draft genome sequences were deposited in DDBJ and the accession numbers of each contig are BDUG01000001 to BDUG01003315.The gene set of A. cerana japonica was predicted using AUGUSTUS ver.3.2.3assisted by the A. mellifera library and with default settings (Stanke & Waack, 2003).For validation of the predicted gene set, the predicted amino acid sequence data for A. mellifera (version Amel_4.5), A. cerana cerana northern strain (version ACSNU-2.0), A. cerana cerana southern strain (BioProject Accession number: PRJNA239323) and A. cerana japonica were evaluated using BUSCO.v3with Insecta_odp9 and default settings (Simão et al., 2015: http://busco.ezlab.org/).
The amino acid sequences of the gene set were analyzed using the Blastp method against the NCBI-nr dataset, with default settings except e-value < 1e -3 and the three top descriptions limited.Additionally, InterProScan was used with the gene set using Blast2GO version 5.1.12with default settings (Conesa et al., 2005).Using the Blastp and InterProScan results, the Gene Ontology (GO) terms were added using Blast2GO version 5.1.12with default settings.

Searching for transposable elements and repetitive sequences
The genome sequence of A. cerana japonica was searched for TEs and repetitive sequences using Repeat masker with default settings (version open 4.0.5 with Apis mellifera sequence library in Repbase 20160829 update version with RMBlastn version 2.2.27+:Smit A.F.A., Hubley R. & Green P. RepeatMasker Open 4.0.2013-2015, http://www.repeatmasker.org).

Identifi cation of orthologs related to the Toll and IMD pathways
A Blastp database of amino acid sequences of the predicted gene sets in A. cerana japonica and A. cerana cerana southern strain was compiled (sequence data are available at https:// www.ncbi.nlm.nih.gov/genome/proteins/12051?genome_assem-bly_id=328750).Query amino acid sequences related to the Toll and IMD pathways were extracted from the A. mellifera genome database, BeeBase, and D. melanogaster, FlyBase or NCBI Gen-Bank (each query sequence and accession ID is listed in Supplementary data 6).Almost all A. mellifera sequence IDs in the query sequences, which begin with "GB1….",were extracted from the amel_OGSv1.0data fi le (http://hymenopteragenome. org/drupal/sites/hymenopteragenome.org.beebase/files/data/amel_OGSv1.0_pep.fa.gz).If the Blastp result of each query showed sequences with e-values < 1e -10, we regarded the top hit sequence as an ortholog of the corresponding query gene.

Assembled genome sequence of A. cerana japonica and its predicted gene sets
We used a drone pupa from a single hive as a sample for the whole genome sequencing, because honey bees exhibit haplodiploid reproduction, in which males (drones) are haploid whereas females are diploid.We constructed the draft genome of A. cerana japonica from short-read data, synthetic long-read data and single molecule longread data (Fig. 1).Table 1 shows the basic status of the A. cerana japonica draft genome sequence we constructed.The genome sequence consisted of 3,315 contigs.Of these contigs, 3,295 contigs were over 1 kbp in length (the largest was about 1.31 Mbp).The coverage number was about 291.8, N50 (the shortest contig length required for covering 50% of the genomes, which indicates the quality of the whole genome sequence, was 180,259 bp, and the number of Ns (undetermined or ambiguous nucleotides in genome sequences) per 100 kbp was 3.24.These numbers indicate that the constructed contigs were suffi cient for use as a draft genome sequence of A. cerana japonica.The genome sequences contained 33.1% GC, which was similar to A. mellifera (32.5%), A. cerana cerana northern strain (30.0%) and southern strain (38.0%) (Honeybee Genome Sequencing Consortium, 2006;Park et al., 2015).Using the draft genome sequence, we obtained a predicted gene set.The predicted total number of genes was 13,222.The detailed data on the genes was derived from AUGUSTUS as a gff3 fi le and is available in Supplementary data 1.Data on the DNA and amino acid sequences of the predicted genes in fasta formats based on the gff3 fi les are in Supplementary data 2 and 3, respectively.The assessment of the predicted gene sets of A. cerana japonica and the three Apis species (A.mellifera and the northern and southern strains of A. cerana cerana) using BUSCO software and comparison of the numbers in the three genomes (Table 2) provided additional support for the draft genome.Almost all genes of Apis species extracted using BUSCO belong to the Complete category, in which genes are of high-identity, full-length homologs and conserved in insects.The results indicate that the sequenced genome encoded highly con-served genes and the data on the genome is of good quality and suitable for further analyses.
To annotate the functions of the predicted genes, Blastp and InterProScan were used.Among the 13,222 predicted genes, 11,352 and 13,200 genes were annotated using Blastp and InterProScan, respectively (Fig. 2A).Consequently, using Blastp and InterProScan, GO terms were added to 5,060 genes (raw output results from Blast2GO are available in Supplementary data 4).Level 1 GO terms consisting of Biological process (Bp), Cellular component (Cc) and Molecular function (Mf) accounted for 3,275, 3,046, and 3,706 genes respectively (Fig. 2B).The frequencies of Level 2 GO terms for Bb, Cc and Mf were investigated.Among 29 level 2 Bp terms, "cellular process" and "metabolic process" were added to half of the 3,275 Bp-annotated genes (Fig. 2C)."Biological regulation" and "localization" were annotated to over 10% of the 3,275 genes.The other terms were annotated to less than10% of the genes or no genes.Among 21 level 2 Cc terms, 50% of 3,046 Cc-annotated genes were annotated with "membrane part" and "cell part" (Fig. 2D)."Organelle" "protein-containing complex", "organelle part", and "membrane" were annotated to about 20%, 20%, 15% and 10% of the 3,046 genes, respectively.The other 15 terms were annotated to less than 5% of the genes or no genes.Among the 15 level 2 Mf terms, "binding" and "catalytic activity" were annotated to over 50% of the 3,706 Mf-annotated genes (Fig. 4E).The other level 2 Mf terms were annotated to less than 10% of the 3,706 genes.Seven Mf term were not added.

Transposable elements and repeat contents
We searched for TEs and repetitive regions in the assembled genome of A. cerana japonica using RepeatMasker.A summary of the RepeatMasker results is shown in Table 3, and detailed data on each identifi ed element is available in Supplementary data 5. TEs and repetitive sequences constituted only 5.28% of the genome of A. cerana japonica.
TEs were classifi ed into two main classes, class I retrotransposable elements and class II DNA transposons (Piégu et al., 2015).Cl ass I retrotransposable elements use mRNA as an intermediate and transpose in a "copy and paste" manner.Class I TEs are further classifi ed into two subclasses according to whether or not long terminal repeats (LTR) are present in the TE: non-LTR elements and LTR elements.Non-LTR elements consist of long repeats interspersed with nuclear elements (LINEs) and short repeats interspersed with nuclear elements (SINEs).As shown in Table 3, only 44 class I retrotransposable elements were found in the genome of A. cerana japonica (total length 75,753 bp; 0.04% of the genome).The 44 class I elements were categorized into the LINEs R2/R4/NeS.Class II DNA transposons move in a "cut and paste" manner, and only 149 class II TEs were identifi ed (total length 57,679 bp).The 144 class II TEs were classifi ed in the family Tc1/ Mariner-IS630-Pogo.Tc1/Mariner-IS630-Pogo elements are widely found in multiple species, including humans and Drosophila, and the sequence of the element varies.

Identifi cation of orthologs involved in the Toll and IMD pathways in A. cerana japonica
We conducted searches for Toll and IMD pathway orthologs in A. cerana japonica.First, we selected 41 genes comprising Toll and IMD pathways genes from other species, most of which were from A. mellifera or D. melanogaster.Detailed results are shown in Supplementary data 5, and see the Query sequence description column in these results.These gene sequences were used as queries against the A. cerana japonica amino acid Blastp database we constructed.The Blastp database consisted of the amino acid sequence data of the predicted gene data set described above (Supplementary data 3).In A. cerana japonica, all orthologs except abaecin were found, and e-values of the gene orthologs were < 2e -29 (highest number of the evalues: D. melanogaster DREDD) (Supplementary data 6).In the case of abaecin, we have previously identifi ed and determined the sequence of A. cerana japonica abaecin (Yoshiyama & Kimura, 2010).
We also searched for the 41 orthologs in the genome of the southern strain of A. cerana cerana, and all 41 orthologs were found (data not shown).Taken together, these 41 immune-related genes are conserved in A. cerana japonica and the southern strain of A. cerana cerana.Several identifi ed orthologs for the Toll and IMD pathway genes in A. cerana japonica and the southern strain of A. cerana cerana are summarized in Fig. 3.

DISCUSSION
In this study, we constructed a draft genome for the Japanese honeybee, A. cerana japonica.Compared with the basic genome sequence status of A. mellifera and the southern and northern strains of A. cerana cerana, e.g.contig N50 numbers, total length and BUSCO results (Honeybee Genome Sequencing Consortium, 2006;Park et al., 2015;Diao et al., 2018), the A. cerana japonica draft genome sequence is of higher or similar quality, respectively.For construction of the draft genome sequence, we used three different low-reads, which consisted of Illumina shortreads, Illumina short-reads from a Long-Read Synthetic DNA library, and long-reads from MinION, and utilized different software, as shown in Fig. 1.The N50 value was approximately 180 kbp, which was suffi cient for using the sequences as whole a genome draft sequence, compared with other Apis genome sequences (Honeybee Genome Sequencing Consortium, 2006;Park et al., 2015;Diao et al., 2018).This indicates that the present method is useful for the construction of high-quality draft genomes.
A total of 13,222 genes were predicted in A. cerana japonica using Blastp and InterProScan.Based on the results, 5,060 genes were annotated with GO terms.Although both the ratios of Blastp and InterProScan hits in the predicted gene set were over 80%, the ratio of GO annotated genes was less than 40%.This may be because GO terms are mainly based on human, mouse or the other model species, but not non-model organisms (e.g.insects), implying that the genes that are not conserved in model species are not annotated with GO terms (Ashburner et al., 2000;The Gene Ontology Consortium, 2017).On the other hand, among Level 2 GO term distributions in the three basic GO terms (Bp, Cc, and Mf), the GO terms with many sub terms (e.g.Fig. 3. IMD and Toll pathway-related genes in the genomes of A mellifera, northern and southern strains of A. cerana cerana, and A. cerana japonica.The IMD and Toll pathways and the genes in these pathways are known to be important in the immune reactions of several species of insects.We searched for 41 orthologs of the genes that make up the IMD and Toll pathways in the southern strain of A. cerana cerana and A. cerana japonica.The genes in the fi gure are the orthologs identifi ed in the gene sets of the southern strain of A. cerana cerana and A. cerana japonica; information on these orthologs in A. mellifera and the northern strain of A. cerana cerana is reported in Evans et al. (2006) and Park et al. (2015) respectively.Genes whose names are written in black were identifi ed in all four (sub)species of Apis, whereas those in white in black boxes were identifi ed only in the genomes of A. mellifera, southern strain of A. cerana cerana and A. cerana japonica.
cellular process and metabolic process for Bp, membrane part and cell part for Cc, and binding and catalytic activity for Mf) were annotated to many genes.These results indicate that over half of the genes in A. cerana japonica are not conserved in mammals or other model organisms, whereas other genes functions were conserved between A. cerana japonica, mammals and the model organisms.
Most genes involved in the Toll and IMD pathways, which were found in the genome of A. mellifera, were identifi ed in that of A. cerana japonica.This indicates that the Toll and IMD pathways function in same way as in A. mellifera and D. melanogaster (Evans et al., 2006).Several genes from the intracellular Toll and IMD pathways, specifi cally Pelle, BG4 (FADD), Kenny and Dredd, are not present in the genome of the northern strain of A. cerana cerana (Park et al., 2015).We identifi ed these genes in the genome of A. cerana japonica.This might be because the quality of the genome sequence of the northern strain of A. cerana cerana is lower than that of A. cerana japonica.The genome of the northern strain of A. cerana cerana was constructed from sequence reads obtained using 454 Roche and Illumina sequencers, which could have resulted in incorrect sequence regions or undetermined sequence regions.Pelle, BG4 (FADD), Kenny and Dredd in the genome of the northern strain of A. cerana cerana may be located in such regions and could not be identifi ed.
Analyses showed that repetitive regions in the genome of A. cerana japonica made up about 5% of the whole genome sequence, which is consistent with the 3%, 6%, and 4.2% in A. mellifera, and the northern and southern strains of A. cerana cerana, respectively (Honeybee Genome Sequencing Consortium, 2006;Park et al., 2015;Diao et al., 2018).TE depletion can also be shown in A. cerana japonica.An interesting point is that there are only 149 MLEs in the genome of A. cerana japonica (Table 3), whereas there are 1,130 MLEs, 924 DNA transposons, and no MLEs in A. mellifera and the northern and southern strains of A. cerana cerana, respectively (Honeybee Genome Sequencing Consortium, 2006;Park et al., 2015).We searched for TEs in the genome of A. cerana japonica using Repeat-Masker and the A. mellifera transposon library.The differences could be explained by the differences in qualities of sequences of the genomes of species of Apis and A. cerana japonica.The number of MLEs in the genomes of other species of Apis may be not determined correctly, because the numbers of contig N50 in the genomes of other species of Apis are lower than in A. cerana japonica.When the genome sequence was constructed from only short read data, sequences containing repeat regions could not be determined precisely.On the other hand, even if these estimated numbers may contain false positives, the number of MLEs may differ between closely related species, because among species, the number of MLEs differ by several orders of magnitude.
In this study, we determined the draft genome sequence of a subspecies of A. cerana: A. cerana japonica.Using multiple sequence techniques, a high-quality draft genome sequence was constructed.Moreover, the gene set and re-petitive elements, including TEs, were predicted using the draft genome.Consequently, it was possible to analyze this data and obtain basic and comprehensive genomic insights, which indicate that the genome of A. cerana japonica is similar to that of the other subspecies of A. cerana and A. mellifera.However, there are several differences between these species of Apis (e.g. in TE numbers).Further comparative analyses will accelerate the development of the comparative research on species of Apis.

Fig. 1 .
Fig. 1.Flow diagram of the stages in the construction of the A. cerana japonica draft genome.The A. cerana japonica draft genome was constructed from multiple types of sequence data (square boxes) using several types of software (circular boxes).This fi gure shows the types of data and software used.Numbers in brackets indicate the accession numbers of each sequence data set in DDBJ.

Fig. 2 .
Fig. 2. Analyses of the predicted gene annotations.All annotation processes except Blastp against the NCBI-nr dataset were done using Blast2GO with default settings.(A) Ratio of hits and annotated genes in Blastp, InterProScan and GO terms in the predicted genes.Numbers in brackets in each analysis indicates the actual numbers of hits or annotated genes.(B) Numbers of genes annotated with Level 1 GO terms, Biological process (Bp), Cellular component (Cc) and Molecular function (Mf).(C), (D), (E) Numbers of genes annotated with Level 2 GO terms from Bp (C), Cc (D) and Mf (E).

Table 1 .
Basic status of the draft genome sequence of A. cerana japonica.

Table 2 .
Results of the validation of the predicted gene sets of A. mellifera, A. cerana (Korea) and A. cerana japonica using BUSCO.

Table 3 .
Transposable elements and repetitive regions in the genome of A. cerana japonica.
* Most of the repeats fragmented by insertions or deletions were counted as one element.