Structural, functional, and evolutionary analysis of Cry toxins of Bacillus thuringiensis: an in silico study

Bacillus thuringiensis (Bt) is a gram-positive, spore-forming soil bacterium that synthesizes crystalline (Cry) proteins, which are toxic mainly to three insect orders: Coleoptera, Diptera, and Lepidoptera. These crystalline protein inclusions, i.e., δ-endotoxins, are successfully used as bio-control agents against insect pests. A total of 58 Cry proteins active against these 3 insect orders were retrieved from the SwissProt database and categorized into different groups. Structural and functional analyses were performed to understand the functional domain arrangements at the sequence level as well as at the structural level, involving both experimental and predicted 3-dimensional models. In addition, the evolutionary relationships among all 58 Cry proteins at the sequence, domain, and structural levels were analyzed using different bioinformatics tools. Evolutionary analysis revealed that some Cry proteins with toxicity for a specific insect order clustered with proteins targeting a different insect order, which suggests that they might have toxicity for more than one insect order. Three-dimensional (3D) structure analysis of both experimental and predicted models revealed that proteins with toxicity for a specific insect order differ in their structural arrangements; this was observed in Cry proteins belonging to all 3 insect orders. It could be hypothesized that an intramolecular domain shift or domain insertion/deletion might have taken place during the evolutionary process, which consequently caused structural and functional divergence of Bt. The study output may be helpful for understanding the diversity as well as the specificity of the analyzed insecticidal proteins and their application as biopesticides in agriculture.


Background
Bacillus thuringiensis (Bt) is a gram-positive, spore-forming soil bacterium that is pathogenic to insects. Bt strains synthesize Crystal (Cry) and cytolytic (Cyt) toxins (also known as δ-endotoxins) that have a natural insecticidal effect on selected insect orders. δ-endotoxins of Bt are being used successfully as biological control agents against some insect pests (Schnepf 1995). These endotoxins are exclusively active against larval stages of different insect orders such as Lepidoptera (butterflies and moths), Coleoptera (beetles and weevils), and Diptera (flies and mosquitoes) (Raymond et al. 2010). During the sporulation phase, Bt produces these Cry or Cyt toxins, which have a hazardous effect on insects (Bravo et al. 2011). Cry and Cyt toxins are considered parasporal inclusion proteins from Bt that exhibit toxic effects and hemolytic activity, respectively. These two types of toxins belong to a class of pore-forming toxins (PFTs) that are secreted as water-soluble proteins and undergo conformational changes in order to insert into the host membrane. When Cry toxins are ingested by insects, they are solubilized in the midgut, proteolytically activated by midgut proteases, and bind to specific receptors located in the insect cell membrane, leading to cell disruption and cell death (Bravo et al. 2007). Most Cry proteins exist as inactive protoxins that can be converted into active toxins by certain kinds of insect midgut proteases (Höfte and Whiteley 1989). This activation process appears to involve a sequential series of proteolytic cleavages, starting at the C-terminus and proceeding toward the N-terminus until the protease-stable toxin is generated (Choma and Kaplan 1990). The interaction of Cry1 toxins with different proteins present in lepidopteran midgut cells is a complex process involving multiple membrane proteins such as cadherin-like proteins (CADs), aminopeptidase N (APN), and alkaline phosphatase (ALP) (Pigott and Ellar 2007; Soberon et al. 2009). Besides these membrane proteins, other components have been identified through their capacity to interact with 3d-Cry toxins (3-domain Cry toxins), such as glycolipids, intracellular proteins, V-ATPase subunit A, or actin (McNall and Adang 2003; Griffitts et al. 2005; Bayyareddy et al. 2009).
It has been observed that Cry toxins show specificity to different insect orders, which suggests that they may be related to one another at different levels such as sequence, domain, and structure. The present article aimed to retrieve different Cry proteins from various protein databases, followed by phylogenetic analysis, domain characterization, structure prediction, experimental structure analysis, and comparison.

Main text
Sequence retrieval and analysis of Cry proteins
Crystal (Cry) proteins of Bt having specificity to 3 broad insect orders, Lepidoptera, Coleoptera, and Diptera, were retrieved from the UniProtKB protein sequence database (https://www.uniprot.org/). Selections were made only from manually annotated and reviewed sequences belonging to Swiss-Prot, and the sequences were stored in FASTA format for further bioinformatics analysis.
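The retrieved FASTA records can be read and summarized with a few lines of code. The following is a minimal sketch in Python, not the workflow used in the study: the function names and the two toy records are hypothetical and serve only to show how sequences stored in FASTA format could be parsed and how a per-group length range could be derived.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)


def length_range(records: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Return the (shortest, longest) sequence length among the records."""
    lengths = [len(seq) for _, seq in records]
    if not lengths:
        raise ValueError("no sequences supplied")
    return min(lengths), max(lengths)


# Hypothetical toy records, NOT real Cry protein sequences.
EXAMPLE_FASTA = """>toy_cry_A hypothetical record
MKLNSE
EHGTIK
>toy_cry_B hypothetical record
MNPY
"""

records = list(read_fasta(EXAMPLE_FASTA))
shortest, longest = length_range(records)
```

Applied to the real Swiss-Prot download, the same two functions would give the length range for each insect-order group.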

Domain identification, characterization, and functional analysis
Identification and annotation of genetically mobile domains and their architectures in the manually annotated and reviewed protein sequences were carried out using a web-based domain prediction tool, the NCBI Batch CD-Search service (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi). These domains are extensively annotated with respect to phyletic distribution, functional class, tertiary structure, and functionally important residues.

Sequence alignment and phylogenetic tree construction
Divergence analysis among the retrieved Cry proteins and the predicted domains was conducted in Molecular Evolutionary Genetics Analysis X (MEGA X) (Kumar et al. 2018) using the Neighbor-Joining (NJ) method (Saitou and Nei 1987). The predicted phylogenetic tree was evaluated using a bootstrap reliability test with 1000 replicates.
The tree was drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Poisson correction method (Zuckerkandl and Pauling 1965) and are in units of the number of amino acid substitutions per site.
Das et al. Egyptian Journal of Biological Pest Control (2021) 31:44
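The distance computation described above can be illustrated with a short Python sketch. It is not MEGA X's implementation; it only shows the standard Poisson correction, d = -ln(1 - p), where p is the proportion of differing sites, applied after removing every alignment column that contains a gap. The function names and the three-sequence toy alignment are hypothetical, and the Neighbor-Joining and bootstrap steps are not reproduced here.

```python
import math
from typing import Dict


def complete_deletion(alignment: Dict[str, str]) -> Dict[str, str]:
    """Drop every alignment column in which any sequence has a gap ('-')."""
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(len(seqs[0])) if all(s[i] != "-" for s in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance in amino acid substitutions per site."""
    p = p_distance(a, b)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


# Hypothetical toy alignment, NOT real Cry protein data.
toy = {
    "s1": "MKT-LLVA",
    "s2": "MKTALLVA",
    "s3": "MRTALIVA",
}
clean = complete_deletion(toy)          # the gapped fourth column is removed
d12 = poisson_distance(clean["s1"], clean["s2"])
d13 = poisson_distance(clean["s1"], clean["s3"])
```

In the toy alignment, s1 and s3 differ at 2 of the 7 gap-free sites, so p = 2/7 and the corrected distance is -ln(5/7). A pairwise matrix of such distances is the input that a Neighbor-Joining implementation would take.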

Structural diversity analysis and prediction
The selected Cry proteins belonging to the 3 major insect orders Lepidoptera, Coleoptera, and Diptera were searched in the RCSB-PDB (Research Collaboratory for Structural Bioinformatics-Protein Data Bank) (https://www.rcsb.org/) for the availability of experimentally solved 3-dimensional structures, and the PDB files were downloaded.
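Once a PDB file has been downloaded, its backbone can be extracted for later structural comparison. The following Python sketch assumes the fixed-column layout of ATOM records in the legacy PDB format and collects the Cα atoms; the function names and the toy coordinates are hypothetical and do not come from any Cry protein entry.

```python
from typing import List, Tuple

# (chain ID, residue number, residue name, x, y, z)
Atom = Tuple[str, int, str, float, float, float]


def ca_atoms(pdb_text: str) -> List[Atom]:
    """Collect C-alpha atoms from ATOM records of PDB-formatted text."""
    atoms: List[Atom] = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  ") or len(line) < 54:
            continue
        if line[12:16].strip() != "CA":   # atom name, columns 13-16
            continue
        if line[16] not in (" ", "A"):    # keep only the first alternate location
            continue
        atoms.append((
            line[21],                     # chain ID, column 22
            int(line[22:26]),             # residue number, columns 23-26
            line[17:20].strip(),          # residue name, columns 18-20
            float(line[30:38]),           # x, columns 31-38
            float(line[38:46]),           # y, columns 39-46
            float(line[46:54]),           # z, columns 47-54
        ))
    return atoms


def format_atom_line(serial: int, name: str, res_name: str, chain: str,
                     res_seq: int, x: float, y: float, z: float) -> str:
    """Build a minimal fixed-column ATOM line (helper for toy data only)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


# Hypothetical toy coordinates, NOT taken from a real structure.
toy_pdb = "\n".join([
    format_atom_line(1, " N  ", "MET", "A", 1, 1.0, 2.0, 3.0),
    format_atom_line(2, " CA ", "MET", "A", 1, 1.5, 2.5, 3.5),
    format_atom_line(3, " CA ", "LYS", "A", 2, 4.0, 5.0, 6.0),
])
backbone = ca_atoms(toy_pdb)
```

The resulting list holds one Cα position per residue and could serve as input to a structure superposition tool.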

Crystal (Cry) proteins of Bacillus thuringiensis
A total of 58 Cry protein sequences of Bt were retrieved from the UniProtKB database (only annotated and reviewed entries). Of these 58 sequences, 15, 10, and 33 Cry proteins target the insect orders Coleoptera, Diptera, and Lepidoptera, respectively. The primary sequence length of the Cry proteins was 123-1169 amino acids for Coleoptera (Table 1), 643-1180 amino acids for Diptera (Table 2), and mostly more than 1100 amino acids for Lepidoptera (Table 3).

Analysis of Cry proteins structural domains
Primary sequences of the 58 Cry proteins with different specificities for the 3 insect orders were further analyzed to annotate the structural domains involved in various biological processes (Tables 4, 5, and 6). Domain analysis revealed that most of the Cry proteins have 3 structural domains: Endotoxin_N (PF03945), Endotoxin_M (PF00555), and Endotoxin_C (PF03944) (Tables 4, 5, and 6). It was also observed that Endotoxin_N, Endotoxin_M, and Endotoxin_C are located in the N-terminal, middle, and C-terminal regions of the protein, respectively (Figs. 1 and 2). Generally, the N-terminal helical domain is involved in membrane insertion and pore formation, whereas the middle and C-terminal domains have a vital role in receptor binding.
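The domain arrangement described above can be checked programmatically once domain hits are available as (name, start, end) records. The Python sketch below is illustrative only: the `DomainHit` type, the function names, and the residue coordinates are hypothetical and are not taken from the CD-Search output or from Tables 4-6.

```python
from typing import List, NamedTuple


class DomainHit(NamedTuple):
    name: str
    start: int  # first residue of the hit (1-based)
    end: int    # last residue of the hit


# Order reported in the text: N-terminal, middle, C-terminal.
EXPECTED_ORDER = ["Endotoxin_N", "Endotoxin_M", "Endotoxin_C"]


def architecture(hits: List[DomainHit]) -> List[str]:
    """Return domain names ordered by their start position on the sequence."""
    return [hit.name for hit in sorted(hits, key=lambda h: h.start)]


def has_three_domain_layout(hits: List[DomainHit]) -> bool:
    """True if the hits form the N -> M -> C arrangement, and nothing else."""
    return architecture(hits) == EXPECTED_ORDER


# Hypothetical coordinates, listed out of order on purpose.
toy_hits = [
    DomainHit("Endotoxin_C", 470, 610),
    DomainHit("Endotoxin_N", 60, 290),
    DomainHit("Endotoxin_M", 300, 460),
]
toy_partial = [DomainHit("Endotoxin_N", 60, 290)]

full_layout = has_three_domain_layout(toy_hits)
partial_layout = has_three_domain_layout(toy_partial)
```

A protein lacking one or more of the three domains, as in the second toy example, fails the check and can be flagged for closer inspection.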

Evolutionary analysis of Cry proteins and domains
An evolutionary tree was constructed for the 58 Cry proteins using the MEGA X tool after elimination of all positions containing gaps and was further analyzed (Fig. 3). The tree clustered into 3 major groups: clusters I, II, and III. Cluster I (red color) consists of Cry proteins … Interestingly, the lepidopteran-targeting Cry15Aa and the coleopteran-targeting Cry34Ab1 and Cry35Ab1 appeared as an out-group in the whole tree and were distantly related to all the other Cry protein sequences. The finding that Cry proteins for a specific order diverged from their own group and were placed with those of other insect orders indicates that they might have insecticidal properties for more than one insect order. In addition, to understand the insect specificity of these Cry proteins, the functional domain regions were also examined for divergence. Phylogenetic trees of the Endotoxin_N, Endotoxin_M, and Endotoxin_C domains were constructed using MEGA X (Fig. 4).
The Endotoxin_N domains of the Cry protein sequences (55 in number) clustered into 3 major groups. The evolutionary analysis (Fig. 4a) showed that the divergence pattern of these domain sequences was similar to that of the full Cry protein sequences (Fig. 3). Evolutionary analysis of the Endotoxin_M domains (51 in number) present in Cry proteins targeting all 3 insect orders (Fig. 4b) showed a divergence pattern similar to that of the trees predicted for Endotoxin_N and for the full Cry protein sequences. The phylogenetic tree of the Endotoxin_C domains (51 in number) revealed that all Cry1 and Cry9 proteins targeting lepidopterans are grouped into a single cluster, except for Cry1C, Cry1B, Cry1E, and Cry1Ac, which cluster into the second group of Cry proteins, which targets coleopterans. However, all Cry proteins targeting dipterans clustered into the same group, except for Cry8Ba, which targets coleopterans (Fig. 4c).

Analysis of Cry protein structure
Structural analysis of the Cry proteins was performed to better understand their function. Most of the Cry proteins do not have experimentally solved 3-dimensional structures in the RCSB PDB, except for Cry3Aa, Cry34Ab1, Cry35Ab1, and Cry3Bb1 for Coleoptera (Fig. 5a), Cry4Ba and Cry4Aa for Diptera (Fig. 5b), and Cry1Aa, Cry1Ac, Cry1Da, Cry1Fa, and Cry1Be for Lepidoptera (Fig. 6a, b). The 3-dimensional structures of the remaining 47 Cry proteins from the 3 insect orders were predicted with the Phyre2 homology modeling server, i.e., 11 model structures for coleopterans, 8 model structures for dipterans, and 28 model structures for lepidopterans. Structural analysis of the Cry proteins of the coleopteran group revealed that Endotoxin_N, Endotoxin_M, and Endotoxin_C at the sequence level correspond to domain I, domain II, and domain III, respectively, at the structural level. Domain I spans approximately residues 60 to 300 at the N-terminal region of the Cry proteins and consists of α-helices (~8 … amino acid residues, corresponding to the Endotoxin_C domain of the Cry protein at the C-terminal region. Structure-structure alignment of both the experimental and the predicted Cry protein structures of the coleopteran group showed that all 15 three-dimensional structures cluster into two major groups, comprising Cry7Aa, Cry9Da, Cry8Ba, Cry8Ca, Cry7Ab (2 entries), and Cry8Aa in one group and Cry3Aa (3 entries), Cry3Ca, Cry3Ba, and Cry3 in the other, whereas Cry34Ab1 and Cry35Ab1 were observed as out-groups (Fig. 7a) (Table 6). Alignment of all 28 protein structures produced one major cluster, leaving Cry2Ab distantly related to it, whereas Cry15Aa was observed as an out-group (Fig. 7c).
B. thuringiensis (Bt) strains produce a wide variety of proteins with toxicity against diverse insect orders. These toxins are classified into 2 major groups: crystal (Cry) and cytolytic (Cyt). More than 700 cry gene sequences that code for crystal (Cry) proteins have been identified in plasmids by several researchers (Höfte and Whiteley 1989; Schnepf et al. 1998; Van Frankenhuyzen 2009).
Many Cry proteins are reported to have useful insecticidal properties for controlling insect pests in agriculture (Sanchis and Bourguet 2008). However, strong cytocidal activities have also been noticed against vertebrates (Palma et al. 2014). A search of the primary protein sequence database revealed 58 Cry proteins showing specificity towards the 3 major insect orders: Coleoptera, Diptera, and Lepidoptera (Donovan et al. 2016; Sanchis and Bourguet 2008; Naimov et al. 2008). Cry proteins targeting different insect orders differed in sequence length. All Cry1, Cry2, and Cry15 groups showed toxicity towards lepidopteran insects, whereas the Cry9 group showed toxicity towards Lepidoptera and Coleoptera. Similarly, the Cry3, Cry7, Cry8, Cry34, and Cry35 groups were found to have specificity for coleopterans. The Cry4, Cry10, Cry11, Cry19, Cry20, and Cry27 groups exhibit insecticidal activity against insects belonging to the dipteran order. The classification of Cry proteins and their insecticidal activity against specific insect orders have been studied by several researchers (Crickmore 2000; de Maagd et al. 2001). Analysis of the evolutionary relationships revealed that Cry proteins such as Cry1Id, Cry9Ca, Cry1Be, Cry1Ka, Cry1Bd, Cry1Bb, and Cry9Ea, which are toxic to lepidopteran insects, also clustered together with the Cry proteins toxic to coleopterans. This finding is supported by Crickmore (2000). It has also been suggested that these proteins may have toxicity against both insect orders, which was later shown for the Cry1B toxin (López-Pazos et al. 2009). Similarly, Cry8Ca of the coleopteran group and Cry2Ab and Cry9Aa of the lepidopteran group were found to cluster with the dipteran group. The phylogenetic relationship of the whole Cry proteins alone could not reveal how Cry toxins determine insect specificity. For further validation, phylogenetic analysis of the 3 structural domains, domain I (Endotoxin_N), domain II (Endotoxin_M), and domain III (Endotoxin_C), was carried out independently. The divergence patterns observed at the sequence level and at the structural level were similar, with minor fluctuations.
According to the literature, domain swapping among different Cry toxins is likely to be an active evolutionary process that determines insect specificity (de Maagd et al. 2001). Three-dimensional X-ray crystallography structures of several Cry toxins of Bt have been reported in this connection. A total of 11 three-dimensional structures of Cry proteins were retrieved from the RCSB PDB, of which 4 are for Coleoptera, 2 for Diptera, and 5 for Lepidoptera. Homology models for the other 47 Cry proteins were predicted using the web-based Phyre2 tool to study the structural arrangements of the 3 domains. All experimental and predicted models of Cry proteins revealed that domain I is present in the N-terminal region and consists of α-helices, domain II consists of three antiparallel β-sheets, and domain III consists of two twisted antiparallel β-sheets forming a sandwich. These observations are also supported by published reports (Grochulski et al. 1995; de Maagd et al. 2001; Gouet et al. 2003). Structure-structure alignment was also carried out for all 58 Cry proteins (experimental and model 3-dimensional structures) in order to understand the divergence among them. In the case of Coleoptera, the 3-dimensional structures of Cry34Ab1 and Cry35Ab1 were observed as an out-group. In the case of Diptera, the 3-dimensional structure of Cry4Ba was found to have diverged from the other Cry proteins, whereas the Cry2Ab protein of the Lepidoptera group was found to have diverged from the other proteins of its group. However, Cry15Aa was predicted as an out-group in comparison to the other Cry proteins.
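The structure counts reported in the text can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers stated in this article (experimental structures, Phyre2 models, and retrieved sequences per insect order); the variable and function names are illustrative.

```python
# Counts as stated in the text, per insect order.
EXPERIMENTAL = {"Coleoptera": 4, "Diptera": 2, "Lepidoptera": 5}    # RCSB PDB structures
PREDICTED = {"Coleoptera": 11, "Diptera": 8, "Lepidoptera": 28}     # Phyre2 models
RETRIEVED = {"Coleoptera": 15, "Diptera": 10, "Lepidoptera": 33}    # UniProtKB sequences


def structures_per_order() -> dict:
    """Add experimental and predicted structure counts for each order."""
    return {order: EXPERIMENTAL[order] + PREDICTED[order] for order in RETRIEVED}


per_order = structures_per_order()
total_experimental = sum(EXPERIMENTAL.values())
total_predicted = sum(PREDICTED.values())
total_structures = total_experimental + total_predicted
```

The per-order sums (4 + 11, 2 + 8, 5 + 28) reproduce the 15, 10, and 33 retrieved sequences, and the totals (11 experimental plus 47 predicted) give the 58 structures used in the structure-structure alignment.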

Conclusions
Bacillus thuringiensis (Bt) synthesizes various insecticidal proteins and is thus recommended as a potential bio-control agent against various insect pests in agriculture. The evolution and diversification of these Cry proteins have been studied extensively by various researchers to discover important determinants that confer insect specificity, with the aim of improving insecticidal activity. A total of 58 Cry proteins targeting the three major insect orders, Coleoptera, Diptera, and Lepidoptera, were retrieved and analyzed at both the structural and the sequence level. Structural and functional analysis was performed to understand the domain arrangements at the sequence and structural levels, involving both experimental and predicted 3D models. Cry proteins having toxicity for a specific insect order are grouped accordingly. Three-dimensional structure analysis of both experimental and predicted models revealed that Cry proteins with toxicity for a specific insect order differ in their structural arrangements; this was observed in all 3 groups. It could be hypothesized that an intramolecular domain shift or domain insertion/deletion might have taken place during the evolutionary process, which consequently caused structural and functional divergence of Bt. These findings help in understanding the wide diversity of insecticidal proteins and their application as biopesticides in agriculture.