The present study focused on sequence, structural and functional analysis of PAGs using bu PAG2 as a model. ProtParam was used to analyze physicochemical properties from the amino acid sequence. The 367-amino-acid bu PAG2 was predicted to have a molecular weight of 40804.7 Daltons and an isoelectric point (pI) of 6.34. A pI slightly below 7 indicates that the protein carries a slight net negative charge at neutral pH, and an instability index of 49.21 (above the ProtParam threshold of 40) suggests an unstable protein. The slightly negative GRAVY index of −0.015 is indicative of a hydrophilic and soluble protein.
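The interpretation of these indices can be sketched in a few lines of Python. This is only an illustration, not ProtParam's implementation: the function names `gravy`, `charge_sign_at_ph` and `is_unstable` are hypothetical, and the toy peptide is not the bu PAG2 sequence. The hydropathy values are those of the Kyte-Doolittle scale, on which GRAVY is based.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def charge_sign_at_ph(pi: float, ph: float = 7.0) -> str:
    """Sign of the net charge: negative above the pI, positive below it."""
    if ph > pi:
        return "negative"
    if ph < pi:
        return "positive"
    return "neutral"


def is_unstable(instability_index: float, threshold: float = 40.0) -> bool:
    """An instability index above 40 is read as 'unstable'."""
    return instability_index > threshold


# Hypothetical toy peptide (NOT the bu PAG2 sequence).
toy_peptide = "MKWLVLLGLVAFSEC"
print("toy GRAVY:", round(gravy(toy_peptide), 3))

# Values reported in the text for bu PAG2.
print("net charge at pH 7:", charge_sign_at_ph(6.34))
print("unstable:", is_unstable(49.21))
```

A negative GRAVY value means that hydrophilic residues outweigh hydrophobic ones on average; a value as close to zero as −0.015 indicates that the balance is narrow.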
Homology modeling of bu PAG2
The 3D model of a protein provides invaluable insight into the structural basis of its function. Homology (comparative) modeling is the most common structure prediction method, and numerous online servers and tools are available for it. A PSI-BLAST search against the Protein Data Bank (PDB) identified 3PSG_A as the best available template for homology modeling of bu PAG2, with 47.59% sequence identity over 96% query coverage. 3PSG_A is a refined X-ray diffraction model of the A chain of porcine pepsinogen at a resolution of 1.65 Å. The query sequence and template structure were then provided as inputs to MODELLER 9v10 to generate the 3D model of bu PAG2.
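How identity and coverage figures of this kind are derived from a pairwise alignment can be illustrated with a minimal sketch. It assumes the convention used in BLAST-style reports, in which identities are counted over all alignment columns (gaps included) and coverage is the fraction of query residues that fall within the alignment. The function names and the toy alignment are hypothetical; the actual values above come from PSI-BLAST.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Identical columns as a percentage of all alignment columns."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("empty alignment")
    identical = sum(
        1
        for q, t in zip(aligned_query, aligned_template)
        if q == t and q != "-"
    )
    return 100.0 * identical / len(aligned_query)


def query_coverage(aligned_query: str, query_length: int) -> float:
    """Query residues present in the alignment, as a percentage of the query."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    aligned_residues = sum(1 for q in aligned_query if q != "-")
    return 100.0 * aligned_residues / query_length


# Hypothetical toy alignment ('-' marks a gap), not the bu PAG2/3PSG_A one.
query_row = "ACD-F"
template_row = "ACEGF"
print(percent_identity(query_row, template_row))
print(query_coverage(query_row, query_length=5))
```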
Energy minimization, quality assessment and visualization
The model generated by MODELLER was subjected to energy minimization, assessed for both geometric and energetic quality using Swiss-Pdb Viewer, and refined using the What If Web Interface. The final model (Figure 1) showed a quality factor of 83.143% in ERRAT. The positions of secondary structural elements were obtained from PDBsum. In all, the predicted model of bu PAG2 was found to contain 7 sheets, 9 beta hairpins, 2 psi loops, 6 beta bulges, 26 strands, 15 helices, 4 helix-helix interactions, 32 beta turns, 6 gamma turns and 2 disulphide linkages (Figure 2).
Several structure assessment methods, including Ramachandran plots and RMSD, were used to check the reliability of the predicted 3D model. Ramachandran plots were obtained from PDBsum for quality assessment. Only 1 (0.3%) of the total 367 residues was present in the disallowed region, whereas another 5 residues were present in the generously allowed regions (Figure 3). G-factors provide a measure of how unusual a stereochemical property is: values below −0.5 indicate an unusual property, whereas values below −1.0 indicate a highly unusual one. The G-factors for dihedral angles and main-chain covalent forces were calculated to be −0.37 and 0.14, respectively. The overall average G-factor for the bu PAG2 model was −0.16. The Ramachandran plot and G-factors indicate that the backbone dihedral angles, phi and psi, in the 3D model of bu PAG2 are well within acceptable limits.
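The thresholds quoted above can be applied mechanically, as in the following sketch. The function name `classify_g_factor` and the category labels are hypothetical; the cut-offs (−0.5 and −1.0) and the reported values are taken from the text.

```python
def classify_g_factor(g: float) -> str:
    """Label a G-factor using the cut-offs quoted in the text."""
    if g < -1.0:
        return "highly unusual"
    if g < -0.5:
        return "unusual"
    return "acceptable"


# G-factors reported for the bu PAG2 model.
reported_g_factors = {
    "dihedral angles": -0.37,
    "main-chain covalent forces": 0.14,
    "overall average": -0.16,
}
for name, value in reported_g_factors.items():
    print(f"{name}: {value} -> {classify_g_factor(value)}")

# Share of residues in the disallowed region, as stated (1 of 367).
disallowed_percent = 100.0 * 1 / 367
print(f"disallowed: {disallowed_percent:.1f}%")
```

All three reported G-factors lie above −0.5, which is why none of them is flagged as unusual.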
The Root Mean Square Deviation (RMSD) indicates the degree to which two 3D structures are similar; the lower the value, the more similar the structures. The template and query structures were superimposed for the calculation of RMSD (Figure 4). The RMSD value obtained from superimposition of bu PAG2 and 3PSG_A, using MUSTANG in YASARA View, was found to be 0.447 Å over a total of 353 aligned residues. The overall quality factor, Ramachandran plot characteristics, G-factors and RMSD value together support the quality of the homology model of bu PAG2. The final protein structure was deposited in PMDB [PMDB: PM0077895].
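For illustration, the RMSD over N paired atoms is the square root of the mean squared distance between corresponding positions. The sketch below assumes that the two coordinate sets have already been superimposed and paired residue by residue (for example, one Cα atom per aligned residue); it does not reproduce the alignment and superposition steps performed by MUSTANG. The function name and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """RMSD (same unit as the input) between two pre-superimposed,
    equally long lists of paired 3D coordinates."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have equal length")
    if not coords_a:
        raise ValueError("no coordinates given")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical toy coordinates in Å, not taken from the bu PAG2 model.
model_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
template_atoms = [(0.0, 0.3, 0.0), (1.5, -0.3, 0.0), (3.0, 0.3, 0.0)]
print(f"RMSD = {rmsd(model_atoms, template_atoms):.3f} Å")
```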
The glycosylation sites were predicted using the NetOGlyc, NetNGlyc and YinOYang tools provided by CBS DTU (Figure 5). NetOGlyc did not detect any O-glycosylation sites; NetNGlyc predicted N-glycosylation sites at residues 48, 68, 251 and 340. One additional N-glycosylation site was predicted with low confidence at position 245. YinOYang predicted 5 O-(beta)-GlcNAc sites at residues 112, 113, 234, 236 and 296. Four other sites were predicted with low confidence at residues 97, 98, 106 and 302. Of the total 9 sites, residues 112 and 302 were also predicted as Yin-Yang sites. Yin-Yang sites are Ser/Thr residues that can be O-(beta)-GlcNAcylated as well as phosphorylated; they are reversibly and dynamically modified by O-GlcNAc or phosphate groups at different times. Butler et al. also reported a large disparity in the glycosylation pattern of PAGs [1]. SignalP recognized the first 12 residues in the sequence as a signal peptide for extracellular secretion of the protein (Figure 6).
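As a rough illustration of where N-glycosylation can occur at all, the sketch below scans a sequence for the classical Asn-X-Ser/Thr sequon (X being any residue except Pro). This is a deliberately simplified stand-in and not the NetNGlyc method, which scores candidate sequons with trained predictors rather than accepting every match. The function name and the toy sequence are hypothetical.

```python
import re
from typing import List

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> List[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy sequence (NOT bu PAG2): N-G-S and N-N-T/N-T-S match,
# whereas N-P-T is excluded because X is Pro.
toy_sequence = "AANGSANPTNNTS"
print(find_sequons(toy_sequence))
```

A sequon is necessary but not sufficient for glycosylation, which is why a predictor may report some matches only with low confidence.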
Functional annotation of bu PAG2
Presently, PAGs are known to be pregnancy-induced proteins expressed from about the 7th day post-fertilization onwards, largely in the pre-placental trophoblast and the post-implantation trophectoderm. In the present study, a systematic workflow consisting of several bioinformatics tools and databases was defined and used for structural and functional annotation of bu PAG2. Three web tools were used to search for conserved domains and the potential function of bu PAG2. Consensus predictions made by Pfam, NCBI-CDD and InterProScan indicate that bu PAG2 belongs to the aspartate protease superfamily and possesses a eukaryotic aspartyl protease domain. Aspartic proteases are a family of protease enzymes that use an aspartate residue for catalysis of their peptide substrates. In general, they have two highly conserved aspartates in the active site and are optimally active at acidic pH.
Eukaryotic aspartic proteases include pepsins, cathepsins, and renins. They have a two-domain structure in which one lobe has probably evolved from the other through an ancestral gene duplication event. Each domain contributes a catalytic Asp residue, with an extended active site cleft localized between the two lobes of the molecule. In modern-day enzymes, although the three-dimensional structures are very similar, the amino acid sequences are more divergent, except for the catalytic site motif, which is highly conserved. The presence and position of disulfide bridges are other conserved features of aspartic peptidases [26, 27].
Most eukaryotic endopeptidases are synthesized with signal and propeptides. The animal pepsin-like endopeptidase propeptides form a distinct family of propeptides, which contain a conserved motif approximately 30 residues long. The propeptide contains two helices that block the active site cleft; in particular, the conserved Asp residue in the protease hydrogen bonds to a conserved Arg residue in the propeptide. This hydrogen bond stabilizes the propeptide conformation and is probably responsible for triggering the conversion of the zymogen to active enzyme under acidic conditions [26, 27].
In our structure, Pfam recognized a 29-amino-acid propeptide sequence at residues 13–41 and a 304-amino-acid eukaryotic aspartyl protease sequence at residues 64–367. Active sites of the protease were recognized at positions 83 and 264. The first 12 residues were recognized by SignalP as the signal peptide. The propeptide was recognized as a member of the A1 Propeptide family (PF07966), whereas the aspartyl protease was recognized as a member of the Asp family (PF00026) under the peptidase clan AA (CL0129). These predictions are similar to those of InterProScan, which recognized peptidase activity within residues 71–91, 212–225, 261–272 and 346–361. The catalytic sites for the protease were predicted by InterProScan within residues 54–219 and 225–367; active sites were predicted within residues 80–91 and 261–272. The peptidase clan AA (CL0129) contains aspartic peptidases, including the pepsins and retropepsins. These enzymes contain a catalytic dyad composed of two aspartates. In the retropepsins, one aspartate is provided by each copy of a homodimeric protein, whereas in the pepsin-like peptidases both aspartates come from a single protein composed of two duplicated domains. This clan contains 12 member families, viz. Asp, Asp protease, Asp protease 2, DUF1758, gag-asp protease, Peptidase A2B, Peptidase A2E, Peptidase A3, RVP, RVP 2, Spuma A9PTase and Zn protease [26–28].
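The residue ranges reported above can be cross-checked with a small sketch that recomputes segment lengths and tests whether the Pfam active-site positions fall inside the Pfam domain and the InterProScan active-site ranges. The `Region` type and all variable names are hypothetical; the coordinates are those given in the text.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    def length(self) -> int:
        return self.end - self.start + 1

    def contains(self, position: int) -> bool:
        return self.start <= position <= self.end


# Segments reported for bu PAG2 (SignalP and Pfam).
signal_peptide = Region("signal peptide", 1, 12)
propeptide = Region("A1 Propeptide (PF07966)", 13, 41)
protease_domain = Region("Asp (PF00026)", 64, 367)

# Active-site positions (Pfam) and active-site ranges (InterProScan).
pfam_active_sites = (83, 264)
interpro_active_ranges = (
    Region("InterProScan active site 1", 80, 91),
    Region("InterProScan active site 2", 261, 272),
)

for region in (signal_peptide, propeptide, protease_domain):
    print(f"{region.name}: {region.start}-{region.end}, {region.length()} aa")

for site, site_range in zip(pfam_active_sites, interpro_active_ranges):
    in_domain = protease_domain.contains(site)
    in_range = site_range.contains(site)
    print(f"Asp {site}: in protease domain={in_domain}, "
          f"within {site_range.name}={in_range}")
```

The recomputed lengths (29 and 304 residues) match the figures reported by Pfam, and both active-site positions fall inside the corresponding InterProScan ranges.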
NCBI-CDD also recognized A1 Propeptide (cl06833) and cellular and retroviral pepsin-like protease (cl11403) superfamily sequences within bu PAG2. This superfamily is further classified into the peptidase families A1 (pepsin A) and A2 (retropepsin). Specifically, bu PAG2 aligned with the superfamily member cd05478, i.e. Pepsin A. The cellular pepsin and pepsin-like enzymes are twice as long as their retroviral counterparts and are found in mammals, plants, fungi and bacteria. These well-known and extensively characterized enzymes include pepsins, chymosin, renin, cathepsins, and fungal aspartic proteases. They contain two domains possessing similar topological features. The N- and C-terminal domains, although structurally related by a 2-fold axis, have only limited sequence homology except in the vicinity of the active site, suggesting that the enzymes evolved by an ancient duplication event. The eukaryotic pepsin-like proteases have two active site Asp residues, with the N- and C-terminal lobes each contributing one residue. While the fungal and mammalian pepsins are bilobal proteins, retropepsins function as dimers, and the monomer resembles the structure of the N- or C-terminal domain of the eukaryotic enzyme. The active site motif (Asp-Thr/Ser-Gly-Ser) is conserved between the retroviral and eukaryotic proteases and between the N- and C-terminal domains of eukaryotic pepsin-like proteases. These endopeptidases specifically cleave bonds in peptides at least six residues in length with hydrophobic residues in both the P1 and P1' positions. The active site is located in the groove formed by the two lobes, with an extended loop projecting over the cleft to form an 11-residue flap, which encloses substrates and inhibitors in the active site. Specificity is determined by nearest-neighbor hydrophobic residues surrounding the catalytic aspartates, and by three residues in the flap. Nearly all known aspartyl proteases are inhibited by pepstatin [26–28].
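A literal search for the active site motif quoted above (Asp-Thr/Ser-Gly-Ser) can be sketched as a pattern match. This is only an illustration of the motif as written in the text, not the profile-based domain search used by NCBI-CDD, and it would miss variants in which any of the four positions is substituted. The function name and the toy sequence are hypothetical.

```python
import re
from typing import List

# Asp, then Thr or Ser, then Gly, then Ser, as the motif is given in the text.
ACTIVE_SITE_MOTIF = re.compile(r"(?=D[TS]GS)")


def find_active_site_motifs(sequence: str) -> List[int]:
    """Return 1-based positions of the Asp that starts each motif match."""
    return [
        match.start() + 1
        for match in ACTIVE_SITE_MOTIF.finditer(sequence.upper())
    ]


# Hypothetical toy sequence with one motif per "lobe" (NOT bu PAG2).
toy_sequence = "AAFDTGSSNLWVDSGSAA"
print(find_active_site_motifs(toy_sequence))
```

In a bilobal pepsin-like enzyme one match would be expected in each lobe, corresponding to the two catalytic aspartates.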
In our model, the inhibitor binding site was predicted by NCBI-CDD to be formed of residues 83, 85, 87, 123, 124, 125 and 169. NCBI-CDD could predict only one active site at residue 83 within a catalytic motif formed by residues 83–85. Additionally, active site flaps were predicted at residues 123–126 and 130–133.
The ProKnow metaserver integrates outputs from PSI-BLAST, PROSITE, DALI/DASEY, DIP and RIGOR to extract similarity of the query sequence with proteins in the ProKnow database. This information is subsequently used to assign a weighted set of functions to the query protein. Consensus results from the ProKnow and PFP servers suggest inhibitory effects of bu PAG2 on proteolysis, immunological response and carbohydrate metabolism. The predictions provide strong evidence for MHC I binding by bu PAG2 and for down-regulation of the complement pathway. Hashizume and co-workers proposed that PAGs may act as immunosuppressants, allowing for the immunological acceptance of the embryo by the dam [4]; such effects may be accounted for in part by the MHC binding and complement inhibiting activity of PAGs. A role of bu PAG2 in regulation of transcription is also predicted at a moderate confidence level, possibly through DNA-dependent and GTP-binding mechanisms. Pregnancy is a complex physiological process requiring adaptations by the dam on many fronts. While down-regulation of the immune response is deemed essential for acceptance of the fetus as a hemi-allograft, down-regulation of proteolysis and carbohydrate metabolism may have nutritional consequences. Alternatively, down-regulation of proteolysis may be an essential prerequisite for the controlled apoptotic processes underlying fetal morphogenesis and/or remodeling of feto-maternal tissues; similar roles for PAGs have been postulated in bovines by Hashizume et al. [4]. Similarly, regulation of transcription may be required for orchestration of a multitude of physiological processes in response to pregnancy. PFP also recognizes bu PAG2 as an inducible, extracellular protein. Successful maintenance and completion of pregnancy requires the dam to produce molecular signals, mainly proteins, which are involved in vital processes such as blockage of PGF2α secretion and endometrial remodeling [29, 30].
A role of PAGs in implantation and placentogenesis has also been proposed by Ishiwata et al. [31]. PAGs have also been shown to possess luteotropic activity [32, 33].
BlastP against microbial and eukaryotic DEG entries did not recognize bu PAG2 or an ortholog as a gene product that is essential for survival of an organism. Likewise, a KEGG search performed via KAAS did not find bu PAG2 to be essentially involved in any metabolic pathway. It is logically unlikely that an inducible gene product with sex-restricted expression in pregnant females would be essential for survival.