pestifer.molecule.molecule module

A class for handling molecules

class pestifer.molecule.molecule.Molecule(source: dict = {}, objmanager: ObjManager = None, chainIDmanager: ChainIDManager = None, **kwargs)

Bases: object

A class for handling molecules, including their asymmetric unit and biological assemblies. The class is initialized with a source dictionary that can specify a PDB or mmCIF identifier, a prebuilt structure, or an AlphaFold prediction, and it manages the parsing of that structure into an asymmetric unit and biological assemblies.

Parameters:
  • source (dict) –

    A dictionary containing the source specifications for the molecule. It can include:

    • id: PDB or mmCIF identifier (e.g., {id: 1ABC, file_format: PDB})

    • prebuilt: A dictionary with psf and pdb keys for prebuilt structures (e.g., {prebuilt: {psf: structure.psf, pdb: structure.pdb}})

    • alphafold: A dictionary with AlphaFold specifications (e.g., {alphafold: {model: AF-1234}})

  • objmanager (ObjManager, optional) – An instance of ObjManager to manage objects within the molecule. If not provided, a new ObjManager will be created.

  • chainIDmanager (ChainIDManager, optional) – An instance of ChainIDManager to manage chain IDs within the molecule. If not provided, a new ChainIDManager will be created.

  • reset_counter (bool, optional) – If True, resets the molecule counter to 0. This is useful for testing or reinitialization purposes. Default is False.
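
The three source forms listed above can be written as ordinary Python dictionaries. The sketch below only builds the dictionaries and does not import pestifer; the identifier and file names are the placeholders from the examples above, and the helper source_kind is hypothetical, included only to show which top-level key distinguishes each form.

```python
# Source specifications written as plain Python dicts (placeholder values).
pdb_source = {"id": "1ABC", "file_format": "PDB"}
prebuilt_source = {"prebuilt": {"psf": "structure.psf", "pdb": "structure.pdb"}}
alphafold_source = {"alphafold": {"model": "AF-1234"}}


def source_kind(source: dict) -> str:
    """Hypothetical helper: report which documented top-level key is present."""
    for key in ("id", "prebuilt", "alphafold"):
        if key in source:
            return key
    return "unknown"


for spec in (pdb_source, prebuilt_source, alphafold_source):
    print(source_kind(spec), spec)
```

Any of these dictionaries would then be passed as the source argument of Molecule.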

molid

Unique identifier for the molecule instance, automatically incremented with each new instance.

Type:

int
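
An auto-incremented identifier combined with a reset flag is commonly implemented with a class-level counter. The sketch below is a stand-in that illustrates the pattern, not pestifer's implementation: the class name CountedMolecule, the private attribute _counter, and the choice that the first instance receives molid 0 are all assumptions.

```python
class CountedMolecule:
    """Hypothetical stand-in for a class with an auto-incremented molid."""

    _counter = 0  # shared by all instances (assumed starting value)

    def __init__(self, reset_counter: bool = False):
        if reset_counter:
            # Restart numbering, e.g. between independent tests.
            CountedMolecule._counter = 0
        self.molid = CountedMolecule._counter
        CountedMolecule._counter += 1


a = CountedMolecule(reset_counter=True)
b = CountedMolecule()
c = CountedMolecule(reset_counter=True)
print(a.molid, b.molid, c.molid)
```

Resetting the counter keeps identifiers reproducible when several molecules are built in the same Python session, which is why the option is mainly useful in tests.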

objmanager

An instance of ObjManager that manages objects within the molecule.

Type:

ObjManager

chainIDmanager

An instance of ChainIDManager that manages chain IDs within the molecule.

Type:

ChainIDManager

sourcespecs

A dictionary containing the source specifications for the molecule, such as PDB or mmCIF identifiers, prebuilt structures, or AlphaFold predictions.

Type:

dict

asymmetric_unit

An instance of AsymmetricUnit representing the asymmetric unit of the molecule.

Type:

AsymmetricUnit

biological_assemblies

An instance of BioAssembList containing the biological assemblies derived from the asymmetric unit.

Type:

BioAssembList

parsed_struct

A dictionary containing the parsed structure of the molecule, which may include atoms, residues, and segments parsed from the source specifications.

Type:

dict

rcsb_file_format

The file format of the source specifications, either PDB or mmCIF.

Type:

str

activate_biological_assembly(index)

Activate a biological assembly by its index. This method sets the active biological assembly based on the provided index.

Parameters:

index (int) – The index of the biological assembly to activate. If the index is 0 or if no biological assemblies are specified, the asymmetric unit will be used as the biological assembly.

Returns:

self – Returns the instance of the Molecule with the active biological assembly set.

Return type:

Molecule

Raises:

AssertionError – If the specified biological assembly index is invalid.
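
The selection rule described above (index 0 or no assemblies falls back to the asymmetric unit; an invalid index fails an assertion) can be sketched as follows. This is an illustration under stated assumptions, not pestifer's code: the function pick_assembly is hypothetical, assemblies are held in a plain list, and indices 1..N are assumed to address that list in order.

```python
def pick_assembly(asymmetric_unit, assemblies, index):
    """Hypothetical sketch of the documented fallback rule."""
    if index == 0 or not assemblies:
        # No assembly requested or none available: use the asymmetric unit.
        return asymmetric_unit
    # Assumed 1-based numbering of the stored assemblies.
    assert 1 <= index <= len(assemblies), f"invalid biological assembly index {index}"
    return assemblies[index - 1]


print(pick_assembly("AU", ["BA1", "BA2"], 0))
print(pick_assembly("AU", ["BA1", "BA2"], 2))
print(pick_assembly("AU", [], 3))
```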

cleave_chains(clv_list: CleavageSiteList)

Cleave segments in the asymmetric unit based on a list of cleavage specifications. This method iterates through the list of cleavage specifications, finds the corresponding segments in the asymmetric unit, and performs the cleavage operation. It also updates the chain IDs of disulfide bonds and links that are affected by the cleavage.

Parameters:

clv_list (CleavageSiteList) – A list of cleavage specifications to apply to the asymmetric unit.

get_chainmaps()

Get a mapping of chain IDs in the active biological assembly. This method returns a dictionary where keys are original chain IDs and values are lists of dictionaries containing the transform index and the mapped chain ID.

Returns:

maps – A dictionary mapping original chain IDs to lists of dictionaries with transform index and mapped chain ID. Each entry in the list corresponds to a transformation applied to the asymmetric unit.

Return type:

dict
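
To make the shape of the returned mapping concrete, the sketch below builds a dictionary by hand and looks up the chain ID that an original chain receives under a given transform. The inner key names transform and mappedchain, the sample chain IDs, and the helper mapped_chain are assumptions chosen for illustration; the description above only states that each entry carries a transform index and a mapped chain ID.

```python
# Hand-built example of the documented shape (key names are assumed).
chainmaps = {
    "A": [{"transform": 0, "mappedchain": "A"}, {"transform": 1, "mappedchain": "C"}],
    "B": [{"transform": 0, "mappedchain": "B"}, {"transform": 1, "mappedchain": "D"}],
}


def mapped_chain(maps, orig_chain, transform_index):
    """Hypothetical helper: chain ID of orig_chain under one transform, or None."""
    for entry in maps.get(orig_chain, []):
        if entry["transform"] == transform_index:
            return entry["mappedchain"]
    return None


print(mapped_chain(chainmaps, "A", 1))
```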

loop_counts(min_loop_length=1)

Count the loops (runs of missing residues) in the asymmetric unit that meet a specified minimum length. This method iterates through the segments of the asymmetric unit and counts the loops that are in the MISSING state and have a length greater than or equal to the specified minimum length.

Parameters:

min_loop_length (int, optional) – The minimum length of loops to consider. Default is 1.

Returns:

nloops – A dictionary containing the counts of loops in the asymmetric unit, categorized by segment type. The keys are segment types (e.g., protein, nucleicacid) and the values are the counts of loops. For example, {'protein': 3, 'nucleicacid': 2} indicates that there are 3 loops in protein segments and 2 loops in nucleic acid segments.

Return type:

dict
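
The counting rule can be sketched on stand-in data. The tuples below (segment type, state, length) are a hypothetical simplification of the asymmetric unit's segments, and the state name RESOLVED is an assumption; only MISSING and the segment type names come from the description above.

```python
from collections import Counter

# Stand-in records: (segment type, state, length in residues).
runs = [
    ("protein", "MISSING", 5),
    ("protein", "RESOLVED", 120),
    ("protein", "MISSING", 2),
    ("protein", "MISSING", 8),
    ("nucleicacid", "MISSING", 3),
    ("nucleicacid", "RESOLVED", 20),
    ("nucleicacid", "MISSING", 4),
]


def count_loops(runs, min_loop_length=1):
    """Count MISSING runs of at least min_loop_length, grouped by segment type."""
    counts = Counter()
    for segtype, state, length in runs:
        if state == "MISSING" and length >= min_loop_length:
            counts[segtype] += 1
    return dict(counts)


print(count_loops(runs))                     # {'protein': 3, 'nucleicacid': 2}
print(count_loops(runs, min_loop_length=4))  # {'protein': 2, 'nucleicacid': 1}
```

Raising min_loop_length filters out short gaps, so the same structure can report fewer loops.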

nglycans()

Count the number of glycan segments in the asymmetric unit. This method iterates through the segments of the asymmetric unit and counts the number of segments that are of type glycan.

Returns:

nglycans – The number of glycan segments found in the asymmetric unit.

Return type:

int

num_atoms()

Return the total number of atoms in the asymmetric unit.

Returns:

num_atoms – The total number of atoms present in the asymmetric unit.

Return type:

int

num_images()

Count the number of images in the active biological assembly. This method returns the number of transforms applied to the asymmetric unit in the active biological assembly.

Returns:

num_images – The number of images (transforms) in the active biological assembly.

Return type:

int

num_residues()

Return the total number of residues in the asymmetric unit.

Returns:

num_residues – The total number of residues present in the asymmetric unit.

Return type:

int

num_segments()

Return the total number of segments in the asymmetric unit.

Returns:

num_segments – The total number of segments present in the asymmetric unit.

Return type:

int

set_coords(altcoordsfile)

Set the coordinates of the asymmetric unit from an alternate coordinates file. This method reads the alternate coordinates from a PDB file and updates the asymmetric unit’s coordinates.

Parameters:

altcoordsfile (str) – The path to the alternate coordinates file in PDB format.

Raises:

AssertionError – If the provided file is not in PDB format or if the file does not exist.
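
Because both failure modes surface as AssertionError, a caller may prefer to validate the path first. The sketch below shows one way to do that without importing pestifer. The function check_altcoords is hypothetical, and judging the format by a .pdb file extension is an assumption; how set_coords itself detects the format is not stated here.

```python
import os


def check_altcoords(path: str) -> None:
    """Hypothetical pre-check mirroring the documented conditions."""
    # The file must exist on disk.
    assert os.path.exists(path), f"{path} not found"
    # Assumed format test: the file name ends in .pdb (case-insensitive).
    assert os.path.splitext(path)[1].lower() == ".pdb", f"{path} is not a PDB file"
```

Note that Python strips assert statements when run with the -O flag, so checks written this way are skipped in optimized mode.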

write_connect_patches(writer, min_length=4)

Write Tcl commands to create connect patches (LINKs) for gaps in the asymmetric unit. This method iterates through the segments of the asymmetric unit and generates Tcl commands to create connect patches for gaps that are in the MISSING state and have a length greater than or equal to the specified minimum length.

Parameters:
  • writer (scriptwriter) – An instance of a script writer that will be used to write the Tcl commands.

  • min_length (int, optional) – The minimum length of gaps to consider. Default is 4.

write_gaps(writer, min_length=4)

Write Tcl commands to declare gaps in the asymmetric unit that are in the MISSING state and have a length greater than or equal to the specified minimum length. This method iterates through the segments of the asymmetric unit and generates Tcl commands to declare the gaps.

Parameters:
  • writer (scriptwriter) – An instance of a script writer that will be used to write the Tcl commands.

  • min_length (int, optional) – The minimum length of gaps to consider. Default is 4.