ENDscript accepts as query either a four-character PDB identifier (e.g. 2CAH) or an uploaded coordinate file in PDB format; both NMR and crystallographic structures are supported.
The ENDscript automated pipeline combines numerous sequence and structure analysis programs and is divided into three successive phases:
Phase 1:
- The PDB query is processed with SPDB, an in-house program, and the amino acid sequence is extracted.
- A second SPDB output is generated and passed to DSSP, which extracts secondary structure elements, disulfide bridges and per-residue solvent accessibility.
This second SPDB file is also used by CNS to determine non-crystallographic and crystallographic protein:ligand and protein:protein contacts.
- At this point, ENDscript renders a first figure with our ESPript program:
  - Secondary structure elements and residues in alternate conformations are shown above the query sequence.
  - Accessibility and hydropathy scales, intermolecular contacts and possible disulfide bridges are shown below it.
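The sequence-extraction step of Phase 1 can be illustrated with a minimal Python sketch that reads the amino acid sequence from the Cα atoms of the ATOM records of a PDB-format file, using the standard fixed-column layout. This is not SPDB, whose internals are not described here; the function name `extract_sequence`, keeping only the first model of an NMR ensemble and the first alternate conformation of each residue, and writing 'X' for non-standard residue names are all assumptions made for illustration.

```python
from typing import Optional

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def extract_sequence(pdb_text: str, chain: Optional[str] = None) -> str:
    """Return the one-letter sequence of one chain from PDB-format text.

    Hypothetical helper (not SPDB): reads CA atoms of ATOM records, uses the
    first chain met when `chain` is None, keeps only the first model of an
    NMR ensemble and the first alternate conformation of each residue, and
    writes 'X' for residue names missing from THREE_TO_ONE.
    """
    residues = []
    seen = set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # end of the first model of an NMR ensemble
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain_id = line[21:22]
        if chain is None:
            chain = chain_id  # default to the first chain encountered
        if chain_id != chain:
            continue
        res_id = (line[22:26], line[26:27])  # residue number + insertion code
        if res_id in seen:
            continue  # alternate conformation of a residue already read
        seen.add(res_id)
        residues.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(residues)
```

Reading Cα atoms rather than SEQRES records means the sketch returns only residues that are actually modeled in the coordinates, which is what a per-residue structural annotation needs.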
Phase 2:
- A BLAST+ search using the sequence of the PDB query is performed against a chosen sequence database (PDBAA by default) to detect protein homologues.
- The homologous sequences retrieved are then aligned with a user-selected multiple sequence alignment program (Clustal Omega, MAFFT, MSAProbs or MultAlin).
- A second figure is then generated by ESPript:
  - It shows the aligned sequences colored according to their degree of similarity.
  - In addition, each homologous sequence of known 3D structure is annotated with its secondary structure elements extracted by DSSP.
  - Further information is shown below the alignment, as in Phase 1.
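Coloring aligned sequences by their degree of similarity rests on a per-column score. The minimal Python sketch below uses plain identity as a stand-in: the fraction of sequences carrying the most frequent residue of a column. ESPript's own similarity scoring is not reproduced here; the function name `column_scores` and the gap symbols are assumptions.

```python
from collections import Counter
from typing import List

GAPS = "-."  # gap symbols assumed to appear in the alignment


def column_scores(alignment: List[str]) -> List[float]:
    """Per-column identity score: fraction of sequences carrying the most
    frequent residue of the column (gaps are never counted as residues).

    1.0 marks a strictly conserved column; values near 0 mark variable ones.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for i in range(length):
        counts = Counter(seq[i].upper() for seq in alignment if seq[i] not in GAPS)
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / len(alignment))
    return scores
```

For example, `column_scores(["ACDE-", "ACDF-", "GCDEK"])` yields scores of about 0.67, 1.0, 1.0, 0.67 and 0.33. Dividing by the total number of sequences, rather than by the non-gap count, keeps gap-rich columns from looking conserved.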
Phase 3:
- Two PyMOL session files are generated; they can be downloaded and examined interactively with the PyMOL molecular visualization program installed on the user's computer.
- The first PyMOL file is named "Cartoon":
This is a ribbon depiction of the PDB query colored according to the similarity scores calculated from the multiple sequence alignment of Phase 2.
The color ramp, from white (low score) to red (identity), makes it easy to locate areas of weak and strong sequence conservation on the structure of the query.
A solvent-accessible surface can be colored with the same code via the PyMOL control panel.
- The second PyMOL file is named "Sausage":
It shows the Cα trace of the query as a tube of variable thickness.
To build it, all homologous protein structures are superposed onto the PDB query with ProFit, and the tube thickness at each residue is proportional to the mean r.m.s. deviation between equivalent Cα pairs.
The same white-to-red color ramp is used to show where substitutions occur in the sequence.
The user can therefore identify areas of weak and strong structural conservation and correlate them with sequence conservation.
- Where applicable, both PyMOL representations can also display supplementary information compiled by ENDscript:
  - Biological assembly (Cα trace style),
  - Multiple NMR models (pink Cα trace),
  - Disulfide bridges,
  - Nucleic acids / ligands / monatomic elements and their contacting residues (ball-and-stick style),
  - Water molecules,
  - Strictly conserved residues (magenta ball-and-stick),
  - PDB SITES markers (blue mesh).
- All these features can be edited through the PyMOL control panel, and publication-quality images can be rendered quickly.
- For more details on the automated pipeline and user-accessible settings, please consult the User Guide.
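The two Phase 3 representations combine a per-residue score with a white-to-red color ramp and, for "Sausage", a per-residue structural deviation. The Python sketch below shows one reading of that description: a linear RGB ramp, and an r.m.s. deviation computed at each residue over all superposed homologues, scaled by a hypothetical factor to give the tube thickness. It assumes the homologues are already superposed onto the query (ProFit's role in ENDscript) and that residue equivalences are known, with `None` marking a residue that has no equivalent in a given homologue. The function names, the linear ramp and the scale factor are assumptions, not ENDscript's code.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def white_to_red(score: float) -> Tuple[float, float, float]:
    """Linear ramp (assumed): score 0 -> white (1, 1, 1), score 1 -> red (1, 0, 0)."""
    s = min(max(score, 0.0), 1.0)  # clamp to [0, 1]
    return (1.0, 1.0 - s, 1.0 - s)


def per_residue_rmsd(query: Sequence[Coord],
                     homologues: Sequence[Sequence[Optional[Coord]]]) -> List[float]:
    """RMS deviation at each query residue between its CA and the equivalent
    CA of every homologue that has one.

    Assumes the homologues are already superposed onto the query and that
    homologues[j][i] is the CA equivalent to query residue i (None if absent).
    Residues with no equivalent in any homologue get 0.0.
    """
    values = []
    for i, q in enumerate(query):
        squared = [math.dist(q, h[i]) ** 2 for h in homologues if h[i] is not None]
        values.append(math.sqrt(sum(squared) / len(squared)) if squared else 0.0)
    return values


def tube_thickness(rmsd: Sequence[float], scale: float = 0.5) -> List[float]:
    """Thickness proportional to the deviation; `scale` is a hypothetical factor."""
    return [scale * d for d in rmsd]
```

RGB triples in the 0 to 1 range, as returned by `white_to_red`, are the form PyMOL's `cmd.set_color` accepts when defining a named color.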
ENDscript can handle up to 3,000 distinct sequences annotated with their secondary structure elements and render them in the giant 'Tapestry' format (0.8 × 3.3 meters)!