Leucine-rich fibroin gene of the Japanese wild silkmoth, Rhodinia fugax (Lepidoptera: Saturniidae)

We cloned and characterized a partial fibroin gene of Rhodinia fugax (Saturniidae). The gene encodes a fibroin consisting mainly of orderly arranged repeats, each of which is divided into a polyalanine and a nonpolyalanine block, similar to the fibroins of Antheraea pernyi and A. yamamai. Three repeat types differ in the sequence of the nonpolyalanine block. In contrast to the Antheraea fibroins, the fibroin of R. fugax is rich in glutamate and leucine residues (about 3% and 5%, respectively) and contains


INTRODUCTION
Many lepidopteran species produce silk to form cocoons. The domesticated silkmoth Bombyx mori produces silk consisting of two major components, fibrous and glue proteins. The glue proteins consist of three kinds of sericin proteins (Takasu et al., 2007). The fibrous proteins consist of fibroin-heavy chain (FHC), fibroin-light chain and P25 (e.g., Inoue et al., 2000), with FHC being the largest. FHC of B. mori is about 350 kDa and consists mainly of repetitive units, in which glycine, alanine and serine dominate, making up about 88% of the protein (Zhou et al., 2000). FHC is encoded by a single gene that consists of two exons separated by an intron of about 1.0 kb. The first exon encodes only 14 amino acid residues; the second is larger (~15.7 kb) and encodes the repetitive units (Zhou et al., 2000).
In contrast to B. mori, wild silkmoths of the family Saturniidae produce silk with only a single type of fibroin (Tanaka & Mizuno, 2001) and sericin glue, although the sericins of Saturniidae are not well characterized. Saturniid fibroins contain repetitive polyalanine blocks and are rich in glycine. The fibroin gene of Antheraea pernyi (Apf), a member of the Saturniidae, was cloned and characterized by Sezutsu & Yukuhiro (2000). The first exon of this gene encodes 14 amino acid residues, as in Bombyx FHC, but the Apf intron is only 120 bp long, much shorter than that of Bombyx FHC. The second exon is about 7.0 kb long and consists mainly of 78 repetitive units, each encoding a polyalanine block followed by a glycine-rich sequence. The repetitive part of Apf shows no similarity to that of Bombyx FHC (Sezutsu & Yukuhiro, 2000). Hwang et al. (2001) reported the genomic structure of the fibroin gene from Antheraea yamamai, which is closely related to A. pernyi. The A. yamamai fibroin (Ayf) is very similar to Apf. Note that many spider silk fibroins include repetitive polyalanine blocks (Xu & Lewis, 1990; Hinman & Lewis, 1992) and that dragline silks (e.g., spidroins) contain repetitive polyalanine arrays and are extremely strong, comparable to steel (Gosline et al., 1999).
Recent progress in transgenic technology has allowed the development of "insect factories" for producing exogenous proteins using B. mori (Tamura et al., 2000). Expression of foreign fibroin genes in the silk gland may be used to produce novel types of silks. To date, Ayf and spider silk proteins have been expressed in this way (I. Kobayashi & K. Kojima, pers. commun.). Since the silk of Rhodinia fugax has an interesting appearance, we decided to identify the fibroin gene of this species.
Rhodinia fugax Butler is a saturniid moth found throughout Japan, except for the Hokkaido and Okinawa islands (Inoue et al., 1982). It produces a cocoon with two holes (Yago & Mitamura, 1999), the upper one for adult emergence and the lower one for draining rainwater. The cocoon silk of R. fugax has some peculiar features: it is difficult to spin and remains green for a relatively long period (compared to that of A. yamamai). Little information exists on the properties of R. fugax fibroin (Rff) at the protein level.
To identify the architecture of Rff, we used an Ayf DNA sequence as a probe and cloned part of the Rff gene from a genomic DNA library. We isolated a clone with a nucleotide sequence corresponding to the 5' flanking region, an exon, an intron and part of a second exon. Although the clone did not include the posterior region of the gene, it contained sufficient repetitive units for a general characterization of Rff. The repetitive units encode polyalanine blocks with variable nonpolyalanine sequences, some of which are leucine-rich. In the part sequenced so far, leucine (Leu) is the fifth most abundant amino acid residue in Rff, whereas the three fibroin genes that have been fully characterized in the superfamily Bombycoidea (Bombyx FHC, Apf and Ayf) contain very few Leu residues. Note that another species of Saturniidae, Saturnia (or Caligula) japonica, also produces silk relatively rich in Leu residues (Kirimura, 1962). We discuss the nonrandom association of different types of repetitive motifs and the evolutionary aspects of R. fugax fibroin.

Samples
Genomic DNA was prepared from a pair of silk glands of a final instar larva of R. fugax using a standard technique (Sambrook et al., 1989). The larvae were supplied by Dr. H. Saito (Kyoto Institute of Technology).

Genomic DNA library preparation and screening
A genomic DNA library of R. fugax was prepared following Sezutsu & Yukuhiro (2000). We isolated clones that hybridized to a 32P-labeled Ayf genomic DNA sequence according to Sezutsu & Yukuhiro (2000). A clone named λ-Rf1 was used in the following analysis because it contained the longest insert. EcoRI digestion of λ-Rf1 resulted in three different fragment sizes (Fig. 1). A 9.0-kb fragment was subcloned into pBluescript SK- (pSK-Rf1E9).

Sequencing strategy and analysis
We prepared a series of deletion derivatives from pSK-Rf1E9 using the Kilo-Sequence Deletion Kit (TaKaRa, Tokyo, Japan). These deletion derivatives were used as templates in the sequencing reaction with T7 and T3 primers. We used the BigDye Terminator ver. 1.0 (ABI, Columbia, MD, USA) in the sequencing reaction, following the manufacturer's instructions, and an ABI Prism Autosequencer 377 to detect signals. Individual sequence data were assembled using Sequence Navigator (ABI). We deduced the amino acid sequence and prepared an amino acid contents table using Genetyx Mac (Genetyx Corp., Tokyo, Japan).
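The last step above (deducing the amino acid sequence and tabulating amino acid contents) was done with Genetyx Mac. As a rough illustration only, the same kind of table can be sketched in Python under stated assumptions: the standard genetic code, translation in frame 0, and a toy coding sequence that is hypothetical and not taken from the R. fugax data. The function names are ours.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 0; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    # Codons with ambiguous bases are reported as 'X'.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def composition(protein: str) -> dict[str, float]:
    """Percentage of each residue, most abundant first; stop symbols are skipped."""
    counts = Counter(aa for aa in protein if aa != "*")
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.most_common()}


# Toy coding sequence (hypothetical, NOT part of the sequenced clone).
toy_cds = "GCTGCTTTAGGT"
toy_protein = translate(toy_cds)
for aa, pct in composition(toy_protein).items():
    print(f"{aa}\t{pct:.1f}")
```

Applied to the translated repetitive region, a table of this kind gives the per-residue percentages that are compared between fibroins in Fig. 5.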

RESULTS AND DISCUSSION
A 4732-bp sequence contained a partial Rf fibroin gene
We determined the nucleotide sequence of a 4732-bp fragment (GenBank accession number: AB437258) of the pSK-Rf1E9 insert. By comparison with the data for A. yamamai (Tamura et al., 1987), the 5' flanking region, an exon, an intron and part of a second exon of the fibroin gene of R. fugax (Rff) were distinguished. The 5' flanking region was 919 bp long, and the first exon, the intron and the partial second exon together made up the remaining 3813 bp (Fig. 1).

Only a small range of sequences was conserved in the 5' flanking region
In general, the 5' flanking regions of genes contain multiple sequence blocks that contribute to the regulation of gene expression. Such blocks therefore tend to be conserved in the homologous regions of different species. However, we found only a few conserved sequence blocks between the Rff and Ayf genes (Tamura et al., 1987). One of the conserved blocks corresponded to the TATA box (Fig. 2). Several sequence domains conserved between A. yamamai and B. mori (Tamura et al., 1987) were barely detectable in the Rff 5' flanking region.
As shown in Fig. 4, the Type 1 repeat consisted of a polyalanine block (PAB) and a nonpolyalanine block (NPAB) composed of 24 amino acid residues, including three Leu residues. Leu is absent in the repetitive parts of Apf and Ayf, and very rare in Bombyx FHC. A variant repeat, Type 1E, which contained a Glu residue (E) at the 19th site, was always followed by a Type 3G repeat (described below). Two other modifications of the Type 1 repeat (Type 1L) had three additional amino acid residues, lysine (K) and a Gly-Leu doublet.
In contrast to Type 1, the Type 2 repeat was rich in serine (S), with 10 S among the 19 amino acid residues of the NPAB (Fig. 4). The Type 2 repeat also carried one E residue.
The Type 3 repeat included an NPAB consisting of 15 amino acid residues (Fig. 4). We classified Type 3 repeats into two subtypes: Type 3E carried three E residues, which were replaced by G in Type 3G. Because Gly is hydrophobic, and Glu is hydrophilic and acidic, this type of amino acid replacement might induce differences in the secondary structure of the repeat. The high content of Glu (Fig. 5) also affects the overall charge of Rff in comparison with Antheraea fibroins.
The two Antheraea fibroins (Apf and Ayf) had four types of repeats. We did not determine the orthologous relationships between Rff and the Antheraea fibroins, except for a comparison of the Type 1 repeats (see below).
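The division of each repeat into a polyalanine block and a nonpolyalanine block can be illustrated with a minimal Python sketch. It rests on assumptions that are ours, not the authors': a polyalanine block is taken to be any run of at least five consecutive Ala residues (the `min_ala_run` threshold is arbitrary), everything up to the next such run is treated as the nonpolyalanine block, and the toy sequence is hypothetical rather than taken from Fig. 3c.

```python
import re


def split_repeats(protein: str, min_ala_run: int = 5) -> list[tuple[str, str]]:
    """Split a fibroin-like sequence into (polyalanine, nonpolyalanine) pairs.

    Short Ala runs (below min_ala_run) stay inside the nonpolyalanine block,
    so single or double Ala residues do not start a new repeat.
    """
    run = "A{%d,}" % min_ala_run
    pattern = re.compile("(%s)(.*?)(?=%s|$)" % (run, run))
    return [(m.group(1), m.group(2)) for m in pattern.finditer(protein)]


def summarize(protein: str) -> list[dict[str, int]]:
    """Per-repeat block lengths and counts of the residues discussed in the text."""
    rows = []
    for number, (pab, npab) in enumerate(split_repeats(protein), start=1):
        rows.append({
            "repeat": number,
            "pab_len": len(pab),
            "npab_len": len(npab),
            "L": npab.count("L"),
            "S": npab.count("S"),
            "E": npab.count("E"),
        })
    return rows


# Toy sequence (hypothetical): two repeats with made-up nonpolyalanine blocks.
toy = "A" * 8 + "GSGLLG" + "A" * 7 + "SSSSE"
for row in summarize(toy):
    print(row)
```

Block length together with the Leu, Ser and Glu counts is the kind of information by which the repeat types are distinguished in Fig. 4; the sketch does not attempt the classification itself.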

Nonrandom repeat distribution
As shown in Fig. 3c, the repetitive region began with a Type 3 repeat, which was followed by five repeat clusters. Each cluster included 1 to 4 Type 1 repeats (one of them was the 1L type in the 2nd and 5th clusters), followed by a sequence consisting of the repeat Types 1E, 3G, 2, and 3E (Fig. 3c).
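The ordered arrangement described above can be written down as a small grammar and checked mechanically. The following Python sketch is only an illustration: the label strings, the single-letter encoding and the function name are ours, the subtype of the leading Type 3 repeat is left open (3E or 3G), a Type 1L repeat is allowed in any cluster, and the example list is a hypothetical order that satisfies the description, not the actual order shown in Fig. 3c.

```python
import re

# One letter per repeat type so the order can be matched with a regular expression.
CODE = {"1": "a", "1L": "b", "1E": "c", "3G": "d", "2": "e", "3E": "f"}


def follows_cluster_pattern(labels: list[str], n_clusters: int = 5) -> bool:
    """True if labels read: Type 3, then n_clusters x (1-4 Type 1/1L, 1E, 3G, 2, 3E)."""
    encoded = "".join(CODE[label] for label in labels)  # KeyError on unknown labels
    cluster = "[ab]{1,4}cdef"
    pattern = "[df](?:%s){%d}" % (cluster, n_clusters)
    return re.fullmatch(pattern, encoded) is not None


# Hypothetical repeat order consistent with the description in the text.
tail = ["1E", "3G", "2", "3E"]
example = (
    ["3E"]
    + ["1", "1"] + tail
    + ["1", "1L", "1"] + tail
    + ["1"] + tail
    + ["1", "1", "1", "1"] + tail
    + ["1L", "1"] + tail
)
print(follows_cluster_pattern(example))
print(follows_cluster_pattern(example[::-1]))
```

A randomly shuffled list of the same repeats would almost never pass such a check, which is one way to make the nonrandom distribution of the repeat types concrete.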
Apf and Ayf had highly ordered but species-specific distributions of the repeats, suggesting shuffling of the repetitive units caused by gene conversion and/or unequal crossing over (Jeffreys et al., 1985). Sezutsu & Yukuhiro (2000) suggested that chi-like sequences played a key role in this type of event in the Apf gene. However, as we found few chi-like sequences in the Rff gene, shuffling of repetitive units independent of chi-like sequences might be occurring in the Rff gene.

Difference in amino acid contents
Fig. 5 shows the amino acid contents of the Rff and Apf repetitive regions. Leu residues were remarkably frequent (5.39%) in Rff, but absent in Apf and Ayf. Leu residues are also very rare in Bombyx FHC (0.13%), which differs greatly in its repetitive sequences from Rff and the Antheraea fibroins.
Leu residues occurred only in the Type 1 repeat of Rff (Fig. 4). The NPABs of the Antheraea fibroins often had single or double Ala residues (Fig. 4), while those of Rff did not. Fig. 6 shows the Type 1 repeat of Rff aligned with the Apf Type 1 NPAB sequence. The two Ala residues in the Apf Type 1 repeat are replaced by Ser and Leu, and an additional Leu doublet is inserted into the Type 1 repeat of Rff. The diversification of Rff from the Antheraea fibroins is apparently associated with a tendency to exclude alanine residues from the NPABs. Among other Saturniidae species, a relatively high content of Leu residues is reported for Saturnia japonica fibroin, whereas Samia cynthia ricini fibroin contains less (Kirimura, 1962).
As described above, the Rff gene is very poor in chi-like sequences. In the Apf gene, the AGG amino acid triplet corresponds to a chi-like sequence. The near lack of chi-like sequences in the Rff gene may be associated with the much lower content of Ala residues in the NPABs.
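The link between an Ala-Gly-Gly triplet and a chi-like sequence can be made concrete with a short Python sketch. Two assumptions are ours and are not stated in the text: the reference motif is taken to be the Escherichia coli Chi octamer 5'-GCTGGTGG-3', and "chi-like" is taken to mean at most one mismatch to it on the strand given. With the codons GCT GGT GGN, Ala-Gly-Gly spells out this octamer exactly; other synonymous codons would not. The example sequences are hypothetical.

```python
def find_motif_sites(dna: str, motif: str = "GCTGGTGG", max_mismatches: int = 1) -> list[int]:
    """Return 0-based start positions where dna matches motif with few mismatches.

    Only the given strand is scanned; the default motif is the E. coli Chi
    octamer, used here as an assumed reference for 'chi-like'.
    """
    dna, motif = dna.upper(), motif.upper()
    sites = []
    for start in range(len(dna) - len(motif) + 1):
        window = dna[start:start + len(motif)]
        mismatches = sum(1 for x, y in zip(window, motif) if x != y)
        if mismatches <= max_mismatches:
            sites.append(start)
    return sites


# Hypothetical examples: Ala-Gly-Gly written with the codons GCT GGT GGT,
# and a variant carrying one substitution inside the octamer.
print(find_motif_sites("GCTGGTGGT", max_mismatches=0))
print(find_motif_sites("AAGCTGGAGGTT", max_mismatches=1))
```

Counting such sites per kilobase in the repetitive regions of two genes would be one simple way to express the difference in chi-like sequence content noted above.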
Fibroins are important candidates for the production of transgenic silkworms that would secrete novel types of silk. It is therefore essential to identify fibroin genes in species that produce attractive silks. The partial primary structure of the Leu-rich fibroin of R. fugax is described here. Incorporating this gene into B. mori may make it possible to generate a series of novel types of silk. The sequence information, coupled with physicochemical research, may also reveal how to modify other fibroin genes in order to obtain silks with different properties. Further research, including the cloning and elucidation of other types of fibroin genes, may result in the development of new types of biomaterials.

Fig. 2. Nucleotide sequence of the 5'-part of the fibroin gene of A. yamamai and R. fugax. The TATA sequence is boxed and transcription is estimated to start at the site marked +1, where the first exon begins. Exons are demarcated by black triangles, the coding sequence is printed in bold and the intronic sequence is in italics. A dash (-) indicates a gap in the alignment and a dot (.) marks nucleotide identity with the fibroin gene of A. yamamai.

Fig. 1. Schematic drawing of the strategy used to clone the Rhodinia fugax fibroin gene. The upper line indicates the λ-Rf1 sequence that contains the R. fugax fibroin gene. The thick line designates the genomic DNA sequence, and the thin line represents the λ DNA sequence. Capital letters stand for the restriction enzyme cutting sites: E for EcoRI, N for NotI, H for HindIII and S for SalI. Numbers on the thick line indicate the sizes of the EcoRI fragments. The middle part shows a subclone named pSK-Rf1E9 that contained a 9.0-kb EcoRI fragment, a part of which was sequenced (4732 bp). The lower part of the figure schematically indicates the structure of the R. fugax fibroin gene as determined.

Fig. 3. Alignment of deduced amino acid sequences of the fibroins of Saturniidae. a - amino acid sequences encoded by the first exon in R. fugax, A. pernyi, A. yamamai, A. mylitta, and B. mori, respectively. Residues identical to those in the fibroin of R. fugax are shown as dots. b - sequences of the nonrepetitive part of R. fugax, A. pernyi, A. yamamai and A. mylitta fibroins that are encoded by the 5'-terminal parts of the second exon. Residues identical to those in the fibroin of R. fugax are shown as dots, and gaps in the alignment as dashes. c - repetitive part of the R. fugax fibroin deduced from the genomic sequence. Individual repeats are numbered from head to tail; the classification of the repeat types (described in Fig. 4) is shown on the right. Black rectangles mark regions that consist of Type 1 repeats and empty rectangles mark regions that are composed of the array of repeat Types 3G, 2, and 3E.

Fig. 4. Fibroin repetitive structures. a - three types of R. fugax fibroin repeats; b - four types of A. pernyi and A. yamamai repeats; c - a partial amino acid sequence of the repetitive region of B. mori FHC.

Fig. 5. Differences in the amino acid content of the repetitive region of Rff and Apf. Ala is the most abundant residue in both fibroins, but less so in Rff than in Apf. By contrast, Leu and Glu are frequent in Rff, although they are extremely infrequent in Apf. In contrast to Leu and Glu, Trp is present in Apf, but not in Rff.

Fig. 6. Comparison of the Type 1 repeat in Apf and Rff. Amino acid replacements are indicated by + and the Leu doublet present in Rff is underlined (dashes mark the gap in Apf).