Example 7: Closed, Unliganded HIV-1 BG505 Env SOSIP.664 Trimer

PDB ID 4zmj is a structure of the HIV-1 Env BG505 SOSIP-664 ectodomain trimer in a closed conformation, without any ligands bound. It is one of the earliest X-ray crystal structures solved for this protein in trimeric form.

Pestifer understands how to build a system using any chosen biomolecular assembly available in the structure file. In the case of 4zmj, the asymmetric unit is a single heterodimeric protomer composed of chains G (gp120) and B (gp41). The relevant biological assembly is a C3-symmetric homotrimer of protomers, which is labeled biological assembly 1 in the PDB header. Here we specify that the new chains generated by BIOMT transforms are H and J for chain G and C and D for chain B. Pestifer will also by default undo any engineered mutations (there are three in 4zmj) and add any unresolved or zero-occupancy residues. A build of the 4zmj trimer illustrates these capabilities.

# Author: Cameron F. Abrams, <cfa22@drexel.edu>
#
# pestifer input script
# 
# Simple build of solvated HIV-1 Env SOSIP ectodomain trimer 4zmj
#
# Notes:
#   - Chain G produces daughter chains H and J upon BIOMT transformation to build trimer
#   - Chain B produces daughter chains C and D upon BIOMT transformation to build trimer
#   - Existing chains C and D (these are glycans) are assigned new chainIDs
#   - All glycans are assigned chainIDs that are not shared with any protein
#   - Missing protein residues are built in
#   - Fine-tuning Ramachandran rotations performed at N-terminus of HN1
#     to avoid steric clashes when building in these missing residues
#   - A five-phase NPT equilibration is used to settle the density
#   - A production tarball is generated: prod_4zmj.tgz
#
title: Closed, Unliganded HIV-1 BG505 Env SOSIP-664 Trimer
tasks:
  - fetch:
      sourceID: 4zmj
  - psfgen:
      source:
        biological_assembly: 1
        transform_reserves:
          G: [H,J]
          B: [C,D]
      mods:
        mutations:  # undo the SOSIP mutations
          - G:CYS,501,ALA
          - B:PRO,559,ILE
          - B:CYS,605,THR
        crotations:
          - psi,B,546,568,-180.0
          - phi,B,547,568,-60.0
  - validate:
      tests:
        - attribute_test:
            name: point mutation
            selection: protein and chain G H J and resid 501 and name CA
            attribute: resname
            value: ALA
            value_count: 3
        - attribute_test:
            name: point mutation
            selection: protein and chain B C D and resid 559 and name CA
            attribute: resname
            value: ILE
            value_count: 3
        - attribute_test:
            name: point mutation
            selection: protein and chain B C D and resid 605 and name CA
            attribute: resname
            value: THR
            value_count: 3
  - md:
      ensemble: minimize
  - ligate:
      steer:
        nsteps: 4200
  - validate:
      tests:
        - connection_test:
            name: gp120 gaps ligation
            selection: protein and chain G H J and ((resid 185 and insertion I) or resid 187 410 411)
            connection_type: interresidue
            connection_count: 6
        - connection_test:
            name: gp41 gaps ligation
            selection: protein and chain B C D and resid 568 569
            connection_type: interresidue
            connection_count: 3
  - md:
      ensemble: minimize
  - md:
      cpu-override: True
      ensemble: NVT
      nsteps: 2400
  - solvate:
  - md:
      ensemble: minimize
  - md:
      ensemble: NVT
  - md:
      ensemble: NPT
      nsteps: 200
  - md:
      ensemble: NPT
      nsteps: 400
  - md:
      ensemble: NPT
      nsteps: 800
  - md:
      ensemble: NPT
      nsteps: 1600
  - md:
      ensemble: NPT
      nsteps: 13200
  - mdplot:
      timeseries:
        - density
        - - a_x
          - b_y
          - c_z
      grid: True
      basename: solvated
  - terminate:
      basename: my_4zmj
      artifacts: artifacts
      package:
        basename: prod_4zmj
        namd:
          ensemble: NPT

Pipeline task summary

Step  Task       Details
----  ---------  ------------------------------------------------------------
1     fetch      PDB 4ZMJ
2     psfgen     mutations, crotations
3     validate   3 test(s)
4     md         minimize
5     ligate     ligate chain breaks (steered MD, 4,200 steps)
6     validate   2 test(s)
7     md         minimize → NVT (2,400 steps)
8     solvate    water box
9     md         minimize → NVT (2,000 steps) → NPT (16,200 steps, 5 phases)
10    mdplot     equilibration time-series plots → mdplots/
11    terminate  basename: my_4zmj; package: prod_4zmj
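
The NPT total in the summary follows directly from the script. As a quick arithmetic check, the short Python sketch below tallies the five NPT stages that come after solvation. The list is copied by hand from the nsteps values in the script above, and the variable names are only illustrative.

```python
# nsteps values of the five post-solvation NPT stages, copied from the script
npt_phases = [200, 400, 800, 1600, 13200]

total_npt_steps = sum(npt_phases)

# The first four phases double in length each time before the long final run
doubling = all(b == 2 * a for a, b in zip(npt_phases[:3], npt_phases[1:4]))

print(len(npt_phases), "phases,", total_npt_steps, "steps; doubling ramp:", doubling)
```

The short early phases let the box volume adjust in small increments before the longer final run settles the density.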

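There are several new aspects in this example relative to the first four. First, in the psfgen task, the source directive specifies a biological_assembly together with a transform_reserves subdirective, which names the chain IDs assigned to the copies of each chain generated by the BIOMT transforms. The sequence subdirective is not set explicitly in this script.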

Biological assembly 1 is the trimer, which you can verify through the RCSB web interface or by reading the PDB file header.
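
To make the role of the BIOMT transforms and the reserved chain IDs concrete, here is a minimal Python sketch, not pestifer's implementation. It parses REMARK 350 BIOMT records from a header, applies each transform to a couple of atoms, and relabels the copies using the same mapping as transform_reserves in the script. The header text and coordinates are made up for illustration (an ideal threefold rotation about z, not the actual 4zmj values), the function names are hypothetical, and the sketch assumes the first transform is the identity.

```python
# Illustrative REMARK 350 records for a C3 assembly about the z axis.
# These numbers are made up for the sketch; they are NOT the 4zmj values.
HEADER = """\
REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000
REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000
REMARK 350   BIOMT1   2 -0.500000 -0.866025  0.000000        0.00000
REMARK 350   BIOMT2   2  0.866025 -0.500000  0.000000        0.00000
REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000        0.00000
REMARK 350   BIOMT1   3 -0.500000  0.866025  0.000000        0.00000
REMARK 350   BIOMT2   3 -0.866025 -0.500000  0.000000        0.00000
REMARK 350   BIOMT3   3  0.000000  0.000000  1.000000        0.00000
"""


def parse_biomt(text):
    """Return {transform_index: (rotation_rows, translation)} from BIOMT records."""
    transforms = {}
    for line in text.splitlines():
        fields = line.split()
        if (len(fields) == 8 and fields[:2] == ["REMARK", "350"]
                and fields[2].startswith("BIOMT")):
            row = int(fields[2][5]) - 1          # BIOMT1/2/3 -> matrix row 0/1/2
            index = int(fields[3])               # serial number of the transform
            rot, trans = transforms.setdefault(index, ([None] * 3, [0.0] * 3))
            rot[row] = [float(v) for v in fields[4:7]]
            trans[row] = float(fields[7])
    return transforms


def apply_transform(transform, xyz):
    """Rotate then translate one coordinate triple."""
    rot, trans = transform
    return tuple(sum(r * c for r, c in zip(rot[i], xyz)) + trans[i] for i in range(3))


def build_assembly(atoms, transforms, reserves):
    """atoms is a list of (chain, xyz); the first transform keeps the original IDs."""
    out = []
    for n, index in enumerate(sorted(transforms)):
        for chain, xyz in atoms:
            new_chain = chain if n == 0 else reserves[chain][n - 1]
            out.append((new_chain, apply_transform(transforms[index], xyz)))
    return out


# Same mapping as transform_reserves in the input script
transform_reserves = {"G": ["H", "J"], "B": ["C", "D"]}

# Two hypothetical atoms, one per chain of the asymmetric unit
atoms = [("G", (10.0, 0.0, 5.0)), ("B", (4.0, 0.0, -3.0))]

trimer = build_assembly(atoms, parse_biomt(HEADER), transform_reserves)
for chain, xyz in trimer:
    print(chain, tuple(round(v, 3) for v in xyz))
```

Each of the two non-identity transforms yields one copy of chain G and one of chain B, which is why each parent chain reserves exactly two daughter chain IDs.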

There is also a ligate task. Together, the loops subdirective of the sequence directive in the source (not set explicitly in this script) and the ligate task constitute the method of inserting missing residues, that is, residues listed as missing in the PDB file or given zero occupancy in the mmCIF file. Missing protein loops that are internal to a chain are built in as follows:

  1. Each missing residue is declared via a residue command inside the appropriate segment stanza of the psfgen script. Additionally, a sacrificial glycine residue is added at the C-terminus of any run of missing residues. After all segments are defined, sacrificial glycines are deleted and NTER/CTER patches are explicitly added. When psfgen is run, the guesscoords command builds in these missing residues according to their default internal coordinates; this means they grow in as straight chains where every phi and psi angle is 180 degrees.

  2. Each such loop is put through a declashing procedure in which phi and psi angles are adjusted so that the loop residues do not clash with any other residues. Sometimes, additional manual adjustment of some dihedrals is necessary; that is the case here.

  3. After a minimization, pestifer then runs a ligation task which has two parts. First, a steered MD is run that shrinks the distance between the terminal glycine of each loop and the location on the resolved protein to which it should be attached. Second, another psfgen run is performed to “heal” the gap between the C-terminus of the loop and its attachment point with a peptide bond.
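
As a rough illustration of the bookkeeping behind these steps, the Python sketch below groups missing residue numbers into runs and flags the runs that are internal to a chain, since only those have a resolved attachment point on their C-terminal side. This is a simplified sketch under stated assumptions, not pestifer's code: the function name and the residue numbers are hypothetical, and insertion codes (which do occur in 4zmj) are ignored.

```python
def missing_runs(resolved, missing):
    """Group missing residue numbers into consecutive runs and mark internal ones."""
    runs = []
    for resid in sorted(missing):
        if runs and resid == runs[-1][-1] + 1:
            runs[-1].append(resid)      # extends the current run
        else:
            runs.append([resid])        # starts a new run
    first, last = min(resolved), max(resolved)
    result = []
    for run in runs:
        # Internal runs are flanked by resolved residues on both sides
        internal = first < run[0] and run[-1] < last
        result.append({
            "start": run[0],
            "end": run[-1],
            "internal": internal,
            # The loop's C-terminal end is joined to the next resolved residue
            "attach_to": run[-1] + 1 if internal else None,
        })
    return result


# Hypothetical chain: both termini unresolved, plus one internal gap at 8-9
resolved = [4, 5, 6, 7, 10, 11, 12]
missing = [1, 2, 3, 8, 9, 13, 14, 15]

runs = missing_runs(resolved, missing)
for run in runs:
    print(run)
```

In this toy chain only the 8-9 run would need steering and healing, with residue 10 as its attachment point; the terminal runs have nothing to be ligated to.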

Declashing is done using a Monte Carlo approach in which trial rotations are proposed and accepted only if they reduce the number of steric clashes.
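
The acceptance rule can be sketched in a few lines of Python. The toy below is two-dimensional and is not pestifer's implementation: a chain of beads stands in for the loop, a turn angle at each joint stands in for the phi and psi rotations, and a handful of fixed points stand in for the rest of the protein. The function names, cutoff, step size, and trial budget are all assumptions chosen for illustration.

```python
import math
import random


def build_chain(turns, bond=1.0):
    """Place beads starting from the origin; each turn bends the chain at a joint."""
    x = y = heading = 0.0
    beads = []
    for turn in turns:
        heading += turn
        x += bond * math.cos(heading)
        y += bond * math.sin(heading)
        beads.append((x, y))
    return beads


def count_clashes(beads, obstacles, cutoff=0.8):
    """Number of bead/obstacle pairs closer than the cutoff."""
    return sum(1 for b in beads for o in obstacles if math.dist(b, o) < cutoff)


def declash(turns, obstacles, max_trials=2000, max_step=math.radians(30), seed=0):
    """Propose random single-joint rotations; keep only those that reduce clashes."""
    rng = random.Random(seed)
    turns = list(turns)
    clashes = count_clashes(build_chain(turns), obstacles)
    for _ in range(max_trials):
        if clashes == 0:
            break
        trial = list(turns)
        joint = rng.randrange(len(trial))
        trial[joint] += rng.uniform(-max_step, max_step)
        trial_clashes = count_clashes(build_chain(trial), obstacles)
        if trial_clashes < clashes:      # accept strict improvements only
            turns, clashes = trial, trial_clashes
    return turns, clashes


# Fully extended starting chain, analogous to the straight loops from guesscoords
straight = [0.0] * 8
# Hypothetical obstacles lying on the path of the extended chain
obstacles = [(4.0, 0.0), (5.0, 0.2), (6.0, -0.2)]

before = count_clashes(build_chain(straight), obstacles)
relaxed, after = declash(straight, obstacles)
print("clashes before:", before, "after:", after)
```

Because a trial is kept only when it lowers the clash count, the count can never increase from one accepted move to the next. The price of this greedy rule is that the search can stall when no single rotation helps, which is consistent with the occasional need for the manual crotations used in this example.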

The 4zmj entry contains partially resolved glycans. By default, pestifer will include all resolved glycans. These can be excluded using an excludes list that specifies resnames like NAG, MAN, etc.

The snapshots below illustrate the process by which the loops are grown in. In these snapshots, only backbone protein atoms are shown with bonds drawn as lines. The model-built parts are drawn with thick bonds, and the six chains are colored uniquely.

../../_images/4zmj_step0.png

Structure after first psfgen.

../../_images/4zmj_step1.png

Structure after declashing loops.

../../_images/4zmj_step2-1.png

Early in the steering.

../../_images/4zmj_step2-2.png

Midway through the steering.

../../_images/4zmj_step2-3.png

At the end of the steering.

../../_images/4zmj_step3.png

After healing.