Example 15: Fully Glycosylated, Closed SARS-CoV-2 Omicron BA.2 Variant Spike

This example uses Pestifer to build a fully glycosylated SARS-CoV-2 Spike protein (BA.2 variant) by grafting glycans and cleaving each protomer at its furin cleavage site. The build is based on PDB entry 7xix, which contains a spike protein in the closed conformation. The PDB file contains glycans, but they are not fully resolved, so we graft glycans from prototypical structures.

The glycans taken from prototypical structures are the following:

  • PDB ID 2wah chain C is a “high-mannose” glycan with 9 residues (7 mannoses and 2 N-acetylglucosamines); its full name is alpha-D-mannopyranose-(1-2)-alpha-D-mannopyranose-(1-6)-[alpha-D-mannopyranose-(1-3)]alpha-D-mannopyranose-(1-6)-[alpha-D-mannopyranose-(1-2)-alpha-D-mannopyranose-(1-3)]beta-D-mannopyranose-(1-4)-2-acetamido-2-deoxy-beta-D-glucopyranose-(1-4)-2-acetamido-2-deoxy-beta-D-glucopyranose

  • PDB ID 4b7i chain C is an “intermediate” glycan with 5 mannoses and a fucose; its full name is alpha-D-mannopyranose-(1-3)-[alpha-D-mannopyranose-(1-6)]alpha-D-mannopyranose-(1-6)-[alpha-D-mannopyranose-(1-3)]beta-D-mannopyranose-(1-4)-2-acetamido-2-deoxy-beta-D-glucopyranose-(1-4)-[alpha-L-fucopyranose-(1-6)]2-acetamido-2-deoxy-beta-D-glucopyranose

  • PDB ID 4byh chain C is a “complex” glycan; its full name is N-acetyl-alpha-neuraminic acid-(2-6)-beta-D-galactopyranose-(1-4)-2-acetamido-2-deoxy-beta-D-glucopyranose-(1-2)-alpha-D-mannopyranose-(1-6)-[2-acetamido-2-deoxy-beta-D-glucopyranose-(1-2)-alpha-D-mannopyranose-(1-3)]beta-D-mannopyranose-(1-4)-2-acetamido-2-deoxy-beta-D-glucopyranose-(1-4)-[alpha-L-fucopyranose-(1-6)]2-acetamido-2-deoxy-beta-D-glucopyranose

2wah chain C glycan

High-mannose glycan from PDB ID 2wah chain C. Green circles denote mannoses, either α or β, and blue circles denote N-acetylglucosamines.

4b7i chain C glycan

Intermediate glycan from PDB ID 4b7i chain C. Green circles denote mannoses, either α or β, blue circles denote N-acetylglucosamines, and the red triangle denotes fucose.

4byh chain C glycan

Complex glycan from PDB ID 4byh chain C. Green circles denote mannoses, either α or β, blue circles denote N-acetylglucosamines, the red triangle denotes fucose, the yellow circle denotes galactose, and the purple diamond denotes sialic acid.

The script below uses graft modifications to add the glycans. The glycan assignments (i.e., which asparagines carry high-mannose, intermediate, and complex glycans) are taken from Watanabe et al. (2020). The integer in the trailing comment on a graft directive is the residue number, in the PDB file, of the asparagine to which that glycan is attached.

The cleave task cleaves each protomer at its furin cleavage site, between residues 685 and 686.

# Author: Cameron F. Abrams, <cfa22@drexel.edu>
#
# pestifer input script
# 
# BA.2 SARS-CoV-2 Spike
#
# Notes:
#   - Glycans are grafted from prototypical structures
#     - 2wah chain C is a poorly processed, high-mannose glycan
#     - 4b7i chain C is an intermediately processed glycan
#     - 4byh chain C is a complex glycan
#   - Chains are cleaved at the furin cleavage sites
#
title: BA.2 SARS-CoV-2 Spike 7xix, fully glycosylated using grafts, and cleaved
tasks:
  - fetch:
      sourceID: 7xix
  - psfgen:
      source:
        biological_assembly: 1
        sequence:
          loops:
            declash:
              maxcycles: 20
          glycans:
            declash:
              maxcycles: 500
      mods:
        mutations: # undo the stabilizing proline mutations
          - A:PRO,986,LYS
          - A:PRO,987,VAL
          - B:PRO,986,LYS
          - B:PRO,987,VAL
          - C:PRO,986,LYS
          - C:PRO,987,VAL
        grafts:
          - A_1304:4b7i,C_1-8 # 61
          - B_1304:4b7i,C_1-8 # 61
          - C_1304:4b7i,C_1-8 # 61
          - A_1305:4b7i,C_1-8 # 122
          - B_1305:4b7i,C_1-8 # 122
          - C_1305:4b7i,C_1-8 # 122
          - A_1306:4b7i,C_1-8 # 165
          - B_1306:4b7i,C_1-8 # 165
          - C_1306:4b7i,C_1-8 # 165
          - A_1301:2wah,C_1-9 # 234
          - B_1301:2wah,C_1-9 # 234
          - C_1301:2wah,C_1-9 # 234
          - A_1307:4byh,C_1-10 # 282
          - B_1307:4byh,C_1-10 # 282
          - C_1307:4byh,C_1-10 # 282
          - A_1302:4byh,C_1-10 # 331
          - B_1302:4byh,C_1-10 # 331
          - C_1302:4byh,C_1-10 # 331
          - A_1303:4byh,C_1-10 # 343
          - B_1303:4byh,C_1-10 # 343
          - C_1303:4byh,C_1-10 # 343
          - A_1308:4b7i,C_1-8 # 603
          - B_1308:4b7i,C_1-8 # 603
          - C_1308:4b7i,C_1-8 # 603
          - D_1-2:4byh,C_1#2-10 # 616
          - J_1-2:4byh,C_1#2-10 # 616
          - P_1-2:4byh,C_1#2-10 # 616
          - A_1309:4b7i,C_1-8 # 657
          - B_1309:4b7i,C_1-8 # 657
          - C_1309:4b7i,C_1-8 # 657
          - E_1-2:2wah,C_1#2-9 # 709
          - K_1-2:2wah,C_1#2-9 # 709
          - Q_1-2:2wah,C_1#2-9 # 709
          - F_1-2:4b7i,C_1#2-8 # 717
          - L_1-2:4b7i,C_1#2-8 # 717
          - R_1-2:4b7i,C_1#2-8 # 717
          - G_1-2:2wah,C_1#2-9 # 801
          - M_1-2:2wah,C_1#2-9 # 801
          - S_1-2:2wah,C_1#2-9 # 801
          - A_1310:4b7i,C_1-8 # 1074
          - B_1310:4b7i,C_1-8 # 1074
          - C_1310:4b7i,C_1-8 # 1074
          - H_1-2:4byh,C_1#2-10 # 1098
          - N_1-2:4byh,C_1#2-10 # 1098
          - T_1-2:4byh,C_1#2-10 # 1098
          - I_1-2:2wah,C_1#2-9 # 1134
          - O_1-2:2wah,C_1#2-9 # 1134
          - U_1-2:2wah,C_1#2-9 # 1134
  - validate:
      tests:
        - connection_test:
            name: glycans
            selection: protein and chain A B C and resid 61 122 165 234 282 331 343 603 616 657 709 717 801 1074 1098 1134
            connection_type: glycosylation
            connection_count: 48
        - attribute_test:
            name: point mutation 986
            selection: protein and chain A B C and resid 986 and name CA
            attribute: resname
            value: LYS
            value_count: 3
        - attribute_test:
            name: point mutation 987
            selection: protein and chain A B C and resid 987 and name CA
            attribute: resname
            value: VAL
            value_count: 3
  - md:
      cpu-override: True
      ensemble: minimize
  - ligate:
      steer:
        nsteps: 4000
  - md:
      cpu-override: True
      ensemble: minimize
  - cleave:
      sites:
        - A:685-686
        - B:685-686
        - C:685-686
  - validate:
     tests:
       - connection_test:
           name: cleavages
           selection: protein and chain A B C and resid 685 686
           connection_type: interresidue
           connection_count: 0
  - md:
      cpu-override: True
      ensemble: minimize
  - md:
      cpu-override: True
      ensemble: NVT
  - solvate:
  - md:
      ensemble: minimize
  - md:
      ensemble: NVT
  - md:
      ensemble: NPT
      nsteps: 200
  - md:
      ensemble: NPT
      nsteps: 400
  - md:
      ensemble: NPT
      nsteps: 800
  - md:
      ensemble: NPT
      nsteps: 1600
  - md:
      ensemble: NPT
      nsteps: 13200
  - mdplot:
      timeseries:
        - density
      basename: solvated
      grid: True
  - terminate:
      basename: my_7xix
      artifacts: artifacts
      package:
        basename: prod_7xix
        namd:
          ensemble: NPT
          nsteps: 5000000

Pipeline task summary

Step  Task      Details
1     fetch     PDB 7XIX
2     psfgen    mutations, 48 glycan graft(s)
3     validate  3 test(s)
4     md        minimize
5     ligate    ligate chain breaks (steered MD, 4,000 steps)
6     md        minimize
7     cleave    proteolytic cleavage at 3 site(s)
8     validate  1 test(s)
9     md        minimize → NVT (2,000 steps)
10    solvate   water box
11    md        minimize → NVT (2,000 steps) → NPT (16,200 steps, 5 phases)
12    mdplot    equilibration time-series plots → mdplots/
13    terminate basename: my_7xix; package: prod_7xix
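
The counts quoted in the summary (48 grafts, 16,200 NPT steps) and the selection string used in the first validate task can be cross-checked against the script with a few lines of Python. This is only a sketch: the site lists are transcribed by hand from the trailing comments on the graft directives, and the names SITES, PROTOMERS, and NPT_PHASES are illustrative rather than part of Pestifer.

```python
# Asparagine sites per source glycan, transcribed by hand from the
# trailing comments on the graft directives in the script above.
SITES = {
    "2wah": [234, 709, 801, 1134],                # high-mannose
    "4b7i": [61, 122, 165, 603, 657, 717, 1074],  # intermediate
    "4byh": [282, 331, 343, 616, 1098],           # complex
}
PROTOMERS = ["A", "B", "C"]

# Every site is glycosylated once per protomer.
all_sites = sorted(resid for sites in SITES.values() for resid in sites)
n_grafts = len(all_sites) * len(PROTOMERS)

# Rebuild the atom selection used by the glycosylation connection_test.
selection = "protein and chain {} and resid {}".format(
    " ".join(PROTOMERS), " ".join(str(resid) for resid in all_sites)
)

# nsteps of the five NPT phases that follow solvation.
NPT_PHASES = [200, 400, 800, 1600, 13200]
total_npt = sum(NPT_PHASES)

print(n_grafts)   # 48
print(total_npt)  # 16200
print(selection)
```

The 16 sites times 3 protomers give the connection_count of 48 expected by the validate task, and the five NPT phases sum to the 16,200 steps reported for step 11.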

Note the different syntaxes used in the graft directives. For example:

grafts:
  - A_1304:4b7i,C_1-8 # 61

This indicates that the glycan from PDB ID 4b7i chain C, residues 1 to 8, is grafted onto resid 1304 of chain A of the spike. That resid is not the asparagine at position 61; it is the primary NAG attached to Asn61. Residue 1 of chain C of 4b7i is also a primary NAG, so the graft operation positions the entire glycan by superimposing its primary NAG on the primary NAG already resolved in the spike’s structure. The resolved NAG is then deleted, and the glycan from 4b7i is bonded directly from the C1 atom of its primary NAG to the ND2 atom of Asn61.

grafts:
  - D_1-2:4byh,C_1#2-10 # 616

In contrast, this indicates that the glycan from PDB ID 4byh chain C, residues 1 to 10, is grafted onto resids 1 and 2 of chain D of the spike. Chain D happens to be just the two NAGs at Asn616 on one protomer. The 1#2 notation means that resids 1 and 2 of chain C of 4byh are used together as the alignment basis before grafting.
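
To make the two directive forms concrete, here is a minimal Python sketch of a parser for them. The grammar is inferred only from the two examples above, so it should be read as an assumption and not as Pestifer's own implementation; GraftSpec and parse_graft are hypothetical names. It also illustrates that the # inside 1#2-10 is part of the value, because YAML starts a comment only at a # preceded by whitespace.

```python
import re
from dataclasses import dataclass

# Hypothetical grammar, inferred from the two directives shown above:
#   <target chain>_<resid>[-<resid>]:<pdb id>,<source chain>_<first>[#<align last>]-<last>
_GRAFT_RE = re.compile(
    r"^(?P<tchain>[A-Za-z0-9]+)_(?P<tfirst>\d+)(?:-(?P<tlast>\d+))?"
    r":(?P<pdb>[A-Za-z0-9]{4}),"
    r"(?P<schain>[A-Za-z0-9]+)_(?P<sfirst>\d+)(?:#(?P<salign>\d+))?-(?P<slast>\d+)$"
)


@dataclass
class GraftSpec:
    target_chain: str
    target_resids: tuple   # (first, last) resids on the spike used for alignment
    source_pdb: str
    source_chain: str
    align_resids: tuple    # (first, last) resids of the source used for alignment
    source_last: int       # last resid of the source glycan that is grafted


def parse_graft(directive: str) -> GraftSpec:
    """Parse one graft directive string (illustrative only)."""
    # Drop a trailing YAML comment: '#' opens a comment only after whitespace.
    spec = re.split(r"\s+#", directive, maxsplit=1)[0].strip()
    m = _GRAFT_RE.match(spec)
    if m is None:
        raise ValueError(f"unrecognized graft directive: {directive!r}")
    tfirst = int(m["tfirst"])
    tlast = int(m["tlast"]) if m["tlast"] else tfirst
    sfirst = int(m["sfirst"])
    salign = int(m["salign"]) if m["salign"] else sfirst
    return GraftSpec(
        target_chain=m["tchain"],
        target_resids=(tfirst, tlast),
        source_pdb=m["pdb"],
        source_chain=m["schain"],
        align_resids=(sfirst, salign),
        source_last=int(m["slast"]),
    )


single = parse_graft("A_1304:4b7i,C_1-8 # 61")
double = parse_graft("D_1-2:4byh,C_1#2-10 # 616")
print(single)
print(double)
```

In the first directive the alignment basis is a single residue on each side (resid 1304 of chain A and resid 1 of 4b7i chain C); in the second it is a pair (resids 1 and 2 on both sides).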

A future release of pestifer will allow for more transparent specification of glycans.


Fully glycosylated BA.2 SARS-CoV-2 Spike protein (PDB ID 7xix) in the closed conformation, with glycans shown in white licorice. Protein chains are colored uniquely.

Other prototypical glycans that are present in the same PDB structures but not used here are:

  • PDB ID 2wah chain D; beta-D-mannopyranose-(1-4)-2-acetamido-2-deoxy-beta-D-glucopyranose-(1-4)-2-acetamido-2-deoxy-beta-D-glucopyranose

  • PDB ID 4byh chain D; beta-D-galactopyranose-(1-4)-2-acetamido-2-deoxy-beta-D-glucopyranose-(1-2)-alpha-D-mannopyranose-(1-3)-[beta-D-galactopyranose-(1-4)-2-acetamido-2-deoxy-beta-D-glucopyranose-(1-2)-alpha-D-mannopyranose-(1-6)]beta-D-mannopyranose-(1-4)-2-acetamido-2-deoxy-beta-D-glucopyranose-(1-4)-[alpha-L-fucopyranose-(1-6)]2-acetamido-2-deoxy-beta-D-glucopyranose

2wah chain D glycan

Prototypical glycan from PDB ID 2wah chain D. Green circles denote mannoses, either α or β, and blue circles denote N-acetylglucosamines.

4byh chain D glycan

Prototypical glycan from PDB ID 4byh chain D. Green circles denote mannoses, either α or β, blue circles denote N-acetylglucosamines, the red triangle denotes fucose, and yellow circles denote galactoses.