
Data-Mining Approaches Reveal Hidden Families of Proteases in the Genome of Malaria Parasite


Yimin Wu1,4, Xiangyun Wang2, Xia Liu1 and Yufeng Wang3,5

1Department of Protistology, American Type Culture Collection, Manassas, Virginia 20110, USA; 2EST Informatics, Astrazeneca Pharmaceuticals, Wilmington, Delaware 19810, USA; 3Department of Bioinformatics, American Type Culture Collection, Manassas, Virginia 20110, USA


The search for novel antimalarial drug targets is urgent because of the growing resistance of Plasmodium falciparum parasites to available drugs. Proteases are attractive antimalarial targets because of their indispensable roles in parasite infection and development, especially in host erythrocyte rupture/invasion and hemoglobin degradation. However, only a small number of proteases have so far been identified and characterized in Plasmodium species. Using an extensive sequence similarity search, we have identified 92 putative proteases in the P. falciparum genome. A subset of these, including a calpain, a metacaspase, and a signal peptidase I, are implicated as central mediators of essential parasite activities and are only distantly related to their counterparts in the vertebrate host. Moreover, at least 88 of the 92 have been shown to be expressed at the transcriptional level, based on our microarray and RT-PCR results and on publicly available microarray and proteomics data. The present study represents an initial effort to identify a set of expressed, active, and essential proteases as targets for inhibitor-based drug design.

Genome Research, Vol 13, Issue 4, 601-616, April 2003.


Malaria remains one of the most dangerous infectious diseases in the world. It kills 1–2 million people each year and imposes an enormous economic burden on endemic regions. New antimalarial drugs are urgently needed because of the continuing high mortality and morbidity caused by malaria and the increasing prevalence of drug resistance in the pathogenic parasite Plasmodium falciparum.

Malarial proteases have long been considered potential targets for chemotherapy because of their crucial roles in the parasite life cycle and the feasibility of designing specific inhibitors (for reviews, see McKerrow et al. 1993; Rosenthal 1998; Blackman 2000; Rosenthal 2002). Efforts to identify functional proteases through inhibition assays are ongoing. Subtilase-1 and Subtilase-2, two homologous serine proteases, have been shown to be involved in schizont rupture and merozoite invasion (Blackman et al. 1998; Barale et al. 1999; Hackett et al. 1999). Cysteine proteases have also been implicated in the rupture/invasion process (Salmon et al. 2001). A cluster of Serine Repeat Antigens (SERAs) exhibits limited sequence similarity to cysteine proteases, although their proteolytic activity remains undocumented (Delplace et al. 1988; Miller et al. 2002). A zinc-metallo-aminopeptidase has also been shown to possess enzymatic activity (Florent et al. 1998). Meanwhile, three classes of proteases have been identified as participants in hemoglobin degradation: (1) four aspartic proteases (plasmepsin I, II, IV, and HAP) (see Banerjee et al. 2002 for a review); (2) three cysteine proteases (falcipain-1, -2, and -3) (see Rosenthal 2002 for a review); and (3) one metalloprotease (falcilysin; Eggleson et al. 1999). The successful crystallization of plasmepsin II and the expression of recombinant plasmepsin I/II and falcipain-2 represented a significant advance toward a functional understanding of these enzymes and the rational design of their inhibitors (Silva et al. 1996; Bernstein et al. 1999; Tyas et al. 1999; Shenai et al. 2000; Dua et al. 2001).

The recent completion of the P. falciparum genome sequence provides a basis for identifying new proteases. The first-pass annotation predicted 25 proteases belonging to ten families in five catalytic classes (Table 1). Despite this initial progress, direct evidence from protease inhibition assays and independent comparisons with other genomes suggest that, beyond the limited number of characterized and predicted proteases, many important proteolytic enzymes remain uncharacterized (Olaya and Wasserman 1991; Southan 2001). The following six sets of experimental data suggest that unidentified proteases are responsible for additional critical hydrolytic activities: (1) a calpain-type protease that appears to be involved in merozoite invasion of red blood cells (Olaya and Wasserman 1991); (2) an entire group of threonine proteases in the proteasome complex (Gantt et al. 1998); (3) proteases that catalyze the primary processing of Merozoite Surface Protein-1 (MSP-1; David et al. 1984), Apical Merozoite Antigen-1 (AMA-1; Narum and Thomas 1994), and the SERA precursor (Li et al. 2002); (4) the GPI-anchored serine proteases gp76 and gp68, which cleave host erythrocyte surface proteins in P. falciparum and P. chabaudi, respectively (Braun-Breton et al. 1988); (5) a 75-kD merozoite serine protease (Rosenthal et al. 1987); and (6) a neutral aminopeptidase essential for hemoglobin digestion (Curley et al. 1994). Further evidence that most malarial proteases remain unexplored comes from comparing protease counts across organisms. According to the statistics in the protease database Merops, as released on March 18, 2002, all the model organisms possess large numbers of predicted and characterized proteases (human, 493; mouse, 431; Drosophila melanogaster, 529; Caenorhabditis elegans, 360; Arabidopsis thaliana, 568; baker's yeast, 112; Escherichia coli, 127; Bacillus subtilis, 119).
On average, 2.21% of the gene products in 77 completed genomes belong to the protease superfamily. Given this, and the observation that the number of predicted proteases appears to be positively correlated with organismal complexity, one might expect that a considerable number of malaria proteases have yet to be identified in the ~23-Mbp P. falciparum genome, which encodes approximately 5300 gene products.
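The size of the gap implied by these figures can be made concrete with simple arithmetic. The sketch below is an illustrative back-of-envelope calculation, not one reported by the authors: it assumes, hypothetically, that P. falciparum matches the 2.21% cross-genome average, and it uses only numbers quoted in this article (about 5300 gene products, 25 proteases from the first-pass annotation, and the 92 putative proteases reported in this survey). Because protease counts appear to scale with organismal complexity, the cross-genome average may not transfer directly to a parasite, so the result is a rough reference point only.

```python
# Illustrative estimate only (assumption): apply the 2.21% average protease
# fraction across 77 genomes to P. falciparum's ~5300 gene products.
AVG_PROTEASE_FRACTION = 0.0221   # 2.21% of gene products (quoted in the text)
PF_GENE_PRODUCTS = 5300          # approximate P. falciparum gene products
FIRST_PASS_ANNOTATED = 25        # proteases predicted by first-pass annotation
THIS_SURVEY = 92                 # putative proteases identified in this study


def expected_protease_count(gene_products: int, fraction: float) -> int:
    """Expected number of protease genes at a given fraction, rounded."""
    return round(gene_products * fraction)


expected = expected_protease_count(PF_GENE_PRODUCTS, AVG_PROTEASE_FRACTION)

# Compare the annotated and surveyed counts against the naive expectation.
print(f"Expected at the 2.21% average: ~{expected}")
print(f"First-pass annotation: {FIRST_PASS_ANNOTATED / expected:.0%} of that")
print(f"This survey: {THIS_SURVEY / expected:.0%} of that")
```

Run as written, this prints an expectation of about 117 proteases, of which the first-pass annotation accounts for roughly 21% and the 92 putative proteases for roughly 79%.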

Here, we report a complete survey of protease homologs in the predicted and annotated P. falciparum genome (Gardner et al. 2002). Our initial comparative sequence search identified 92 putative malaria proteases, including a potentially interesting calpain, a metacaspase, and a signal peptidase I. Their expression has been evaluated by microarray and RT-PCR assays. This study helps to develop an integrated view of a number of novel malarial proteases within an organismal, evolutionary, and functional context, and offers an intriguing opportunity to target expressed and active proteases for chemotherapy.