make-pdb-collection

A PDB collection is a set of representative PDB files for small molecules, such as lipids. Collections are associated with CHARMMFF “streams”, and one more more collections comprise a PDB “repository”. A PDB collection must be a directory whose name is its ID, and whose contents are either standalone PDB files with the naming conventions <RESI>.pdb, or subdirectories named after the RESI, each of which contains a set of PDB files that represent different configurations of the molecule. Pestifer’s built-in PDB repository was constructed from selected residues in the lipid, water, and ion streams of the CHARMM36 force field.

The built-in PDB collection

You can see what RESIs are included in pestifer using the show-resources subcommand:

$ pestifer show-resources --charmmff pdb
---------------------------------------------------------------------------
PDB Collections:
PDBCollection(registered_at=2, streamID=water_ions, path=water_ions.tgz, contains 12 resnames)
BAR,    CAL,    CD2,    CES,    CLA,    LIT,     MG,
POT,    RUB,    SOD,   TIP3,    ZN2
PDBCollection(registered_at=1, streamID=lipid, path=lipid.tgz, contains 130 resnames)
23SM,    ASM,    BSM, C6DHPC, CER160, CER180, CER181,
CER2, CER200, CER220, CER240, CER241,  CER3E,   CHL1,
CHM1,   CHSD,   CHSP,   DAPA,   DAPC,   DAPE,   DAPG,
DAPS,   DCPC,  DDOPC,  DDOPE,  DDOPS,   DDPC,   DEPA,
DEPC,   DEPE,   DEPG,   DEPS,   DGPA,   DGPC,   DGPE,
DGPG,   DGPS,   DIPA,   DLPA,   DLPC,   DLPE,   DLPG,
DLPS,  DLiPC,  DLiPE,   DMPA,   DMPC,   DMPE,   DMPG,
DMPS,   DNPA,   DNPC,   DNPE,   DNPG,   DNPS,   DOPA,
DOPC,   DOPE,   DOPG,  DOPP1,  DOPP2,  DOPP3,   DOPS,
DPPA,   DPPC,   DPPE,   DPPG,   DPPS,   DSPA,   DSPC,
DSPE,   DSPG,   DSPS,   DTPA,   DUPC,   DXPC,   DXPE,
DYPA,   DYPG,   DYPS,    ERG,   LLPA,   LLPC,   LLPE,
LLPS,   LPPC,    LSM,    NSM,    OSM,  PDOPC,  PDOPE,
PLPA,   PLPC,   PLPE,   PLPG,   PLPS,   POPA,   POPC,
POPE,   POPG,  POPP1,  POPP2,  POPP3,   POPS,    PSM,
SAPA,   SAPC,   SAPE,   SAPG,   SAPS,   SDPA,   SDPC,
SDPE,   SDPG,   SDPS,   SITO,   SLPA,   SLPC,   SLPE,
SLPG,   SLPS,   SOPA,   SOPC,   SOPE,   SOPG,   SOPS,
SSM,   STIG,   TIPA,   TSPC
---------------------------------------------------------------------------

This shows that there are two PDB collections in the built-in repository: one for water and ions and another for lipids. The water/ion collection was created manually; these are very simple PDB files. Any of those lipid residues can be referred to in a make_membrane_system task (See make_membrane_system and Example 16: HIV-1 Env MPER-TM Trimer in a DMPC Symmetric Bilayer and Example 17: HIV-1 Env MPER-TM Trimer in an Asymmetric, Model Viral Bilayer).

The lipid collection was created using make-pdb-collection in the following way:

$ pestifer make-pdb-collection --streamID lipid
$ pestifer make-pdb-collection --streamID lipid --substreamID cholesterol
$ pestifer make-pdb-collection --streamID lipid --substreamID cholesterol --resname CHM1 --take-ic-from CHL1
$ pestifer make-pdb-collection --streamID lipid --substreamID sphingo
$ pestifer make-pdb-collection --streamID lipid --substreamID miscellaneous
$ pestifer make-pdb-collection --streamID lipid --substreamID detergent --residueID C6DHPC
$ tar zcf lipid.tgz lipid

The tarball lipid.tgz is the compressed PDB collection that pestifer uses, and it is contained in the resources data directory of the project. The residue CHM1 does not have valid internal coordinates (ICs) because it is just a truncated version of the cholesterol residue CHL1, so we use the --take-ic-from option to copy the ICs from CHL1 to CHM1.

(As instructed in the current CHARMM force field, we use “model 1” for cholesterol.)

Contents of one lipid RESI entry in a PDB collection

Each RESI in a PDB collection is represented by a subdirectory named after the RESI, and that subdirectory contains a set of PDB files that represent different configurations of the molecule. For example, the DOPC RESI in the lipid collection has the following contents:

DOPC
├── DOPC-00.pdb
├── DOPC-01.pdb
├── DOPC-02.pdb
├── DOPC-03.pdb
├── DOPC-04.pdb
├── DOPC-05.pdb
├── DOPC-06.pdb
├── DOPC-07.pdb
├── DOPC-08.pdb
├── DOPC-09.pdb
├── DOPC-init.pdb
├── DOPC-init.psf
├── info.yaml
└── init.tcl

The pdb files DOPC-00.pdb through DOPC-09.pdb are the 10 different configurations of the DOPC molecule. The DOPC-init.pdb and DOPC-init.psf files are the initial coordinates and topology of the molecule, and the init.tcl file is a psfgen script used to generate those two files. The info.yaml file contains metadata about the RESI, such as its long name and measurements of its dimensions that packmol needs:

charge: 0.0
conformers:
- head-tail-length: 27.324
  max-internal-length: 31.501
  pdb: POPC-00.pdb
- head-tail-length: 28.827
  max-internal-length: 32.059
  pdb: POPC-01.pdb
- head-tail-length: 28.67
  max-internal-length: 31.82
  pdb: POPC-02.pdb
- head-tail-length: 28.222
  max-internal-length: 31.135
  pdb: POPC-03.pdb
- head-tail-length: 27.051
  max-internal-length: 31.377
  pdb: POPC-04.pdb
- head-tail-length: 26.786
  max-internal-length: 31.216
  pdb: POPC-05.pdb
- head-tail-length: 27.72
  max-internal-length: 31.825
  pdb: POPC-06.pdb
- head-tail-length: 27.918
  max-internal-length: 31.337
  pdb: POPC-07.pdb
- head-tail-length: 27.752
  max-internal-length: 30.961
  pdb: POPC-08.pdb
- head-tail-length: 27.942
  max-internal-length: 31.738
  pdb: POPC-09.pdb
defined-in: top_all36_lipid.rtf
parameters:
- par_all36m_prot.prm
- par_all36_na.prm
- par_all36_cgenff.prm
- toppar_all36_carb_glycopeptide.str
- par_all36_carb.prm
- toppar_water_ions.str
- toppar_all36_prot_modify_res.str
- par_all36_lipid.prm
reference-atoms:
heads:
- name: N
  serial: 1
tails:
- name: C218
  serial: 88
- name: C316
  serial: 131
synonym: 3-palmitoyl-2-oleoyl-D-glycero-1-Phosphatidylcholine

Building your own PDB collections

Suppose you want to use lipid residues defined in the CHARMMFF stream file toppar_all36_lipid_yeast.str; that is, you want PDBs for all the RESI’s in the yeast substream. These are currently not part of the default PDB collection that comes with pestifer. Consider the following commands:

$ mkdir ~/my_pestifer_project
$ cd ~/my_pestifer_project
$ pestifer make-pdb-collection --streamID lipid --substreamID yeast --output-dir lipid-yeast

This will generated a directory ~/my_pestifer_project/lipid-yeast/ that contains the new PDB collection. Each RESI subdirectory will contain 10 PDB files, each of which represents a different configuration of the molecule, along with an info.yaml file that contains important metadata about the RESI:

lipid-yeast/
├── DYPC
├── DYPE
├── PYPE
├── YOPA
├── YOPC
├── YOPE
└── YOPS

Each of these subdirectories contains the PDB files and metadata for that RESI. For example, the DYPC subdirectory contains:

DYPC/
├── DYPC-00.pdb
├── DYPC-01.pdb
├── DYPC-02.pdb
├── DYPC-03.pdb
├── DYPC-04.pdb
├── DYPC-05.pdb
├── DYPC-06.pdb
├── DYPC-07.pdb
├── DYPC-08.pdb
├── DYPC-09.pdb
├── DYPC-init.pdb
├── DYPC-init.psf
├── info.yaml
└── init.tcl

Suppose you want to use the PDB collection you just created in a make_membrane_system task. You would need include the path in the pdbcollections list under the toplevel charmmff section:

charmmff:
  pdbcollections:
    - ~/my_pestifer_project/lipid-yeast