Solute and solvent selections

The solute and solvent are defined by selecting subsets of atoms from the system. These subsets are defined by the AtomSelection data structures.

To construct a AtomSelection data structure, one needs to provide, at least, the (1-based) indices of the atoms that belong to the selection, and either the number of atoms of each molecule or the number of molecules in the selection.

Using the PDBTools package

The PDBTools package helps the construction of the solute and solvent data structures, by providing a convenient selection syntax. Additionally, it sets up the names of the atoms of the system in the data structure, which can be used to retrieve atom and and group contributions to MDDFs and coordination numbers.

For example, here we define a protein of a system as the solute:

julia> using ComplexMixtures, PDBTools

julia> atoms = readPDB(ComplexMixtures.Testing.pdbfile);

julia> protein = select(atoms, "protein");

julia> solute = AtomSelection(protein, nmols=1)
AtomSelection 
    1463 atoms belonging to 1 molecule(s).
    Atoms per molecule: 1463
    Number of groups: 1463 

We need to inform the AtomSelection function about the number of atoms of each molecule (using natomspermol=3, for example), or the number of molecules (using nmols=1000, for example), such that the atoms belonging to each molecule can be determined without ambiguity.

Now, we define the solvent of the system as the water molecules:

julia> water = select(atoms, "water"); 

julia> solvent = AtomSelection(water, natomspermol=3)
AtomSelection 
    58014 atoms belonging to 19338 molecule(s).
    Atoms per molecule: 3
    Number of groups: 3

Using VMD

VMD is a very popular and powerful package for visualization of simulations. It contains a very versatile library to read topologies and trajectory files, and a powerful selection syntax. The PDBTools.jl (v1.0 or greater) package provides a simple wrapper to VMD that allows using the same syntax at it supports.

For example, the solute can be defined with:

using ComplexMixtures, PDBTools

atoms = readPDB("system.pdb")

indices, names = select_with_vmd("./system.pdb", "protein", vmd="/usr/bin/vmd")

protein = atoms[indices]

solute = AtomSelection(protein, nmols=1)

The main advantage here is that all the file types that VMD supports are supported. But VMD needs to be installed and is run in background, and it takes a few seconds to be executed.

The VMDSelect function also accepts an optional keyword parameter srcload, which can be used to load custom scripts within vmd before setting the selection. This allows the definition of tcl scripts with custom selection macros, for instance. The usage would be:

using PDBTools

indices, names = select_with_vmd(
    "file.pdb", 
    "resname MYRES"; 
    srcload = [ "mymacros1.tcl", "mymacros2.tcl" ]
)

Which corresponds to sourceing each of the macro files in VMD before defining the selection with the custom MYRES name.

Warning

VMD uses 0-based indexing and VMDselect adjusts that. However, if a selection is performed by index, as with index 1, VMD will select the second atom, and the output will be [2]. AtomSelections by type, name, segment, residue name, etc, won't be a problem.

Predefinition of atom groups

Importantly, this should be only a concern for the solvation analysis of systems in which individual molecules are very large. This feature was introduced in version 2.0 of the package to support the study of small molecule distribution in virus structures, of millions of atoms.

By default, the contribution of each type of atom to the coordination number counts is stored, to allow the decomposition of the final MDDFs into any group contribution. However, when a structure, like a virus, has millions of atoms, storing the contribution of each atom becomes prohibitive in terms of memory. Thus, one may need to predefine the groups in which the contributions will be analyzed.

Here, we illustrate this feature by presselecting the acidic and basic residues of a protein:

julia> using ComplexMixtures, PDBTools

julia> atoms = readPDB(ComplexMixtures.Testing.pdbfile);

julia> protein = select(atoms, "protein");

julia> acidic_residues = select(atoms, "protein and acidic");

julia> basic_residues = select(atoms, "protein and basic");

julia> solute = AtomSelection(
        protein, 
        nmols=1,
        group_atom_indices = [ index.(acidic_residues), index.(basic_residues) ],
        group_names = [ "acidic residues", "basic residues" ]
    )
AtomSelection 
    1463 atoms belonging to 1 molecule(s).
    Atoms per molecule: 1463
    Number of groups: 2 

In this example, then, the solute AtomSelection has two groups. The indices of the atoms of the groups are stored in the group_atom_indices vector and the group names in the group_names vector. The atom_group auxiliary function is the most practical way to retrive the indices of the atoms of the group.

julia> atom_group(solute, "acidic residues")
162-element Vector{Int64}:
   24
   25
   26
    ⋮
 1436
 1437

With these group selections predefined, the contributions of these groups to the MDDF or coordination numbers can be retrived directly from the result data structure with, for example:

julia> trajectory = Trajectory(trajectory_file, solute, solvent)

julia> result = mddf(trajectory, Options(bulk_range=(8.0, 12.0)));

julia> acidic_residue_contributions = contributions(result, SoluteGroup("acidic residues"))

Reference functions

ComplexMixtures.AtomSelectionType
struct AtomSelection

Structure that contains the information about the solute and solvent molecules.

  • nmols::Int64

  • natomspermol::Int64

  • indices::Vector{Int64}

  • custom_groups::Bool

  • group_atom_indices::Vector{Vector{Int64}}

  • group_names::Vector{String}

ComplexMixtures.AtomSelectionMethod

AtomSelection constructors

The AtomSelection structure carries the information of the molecules that are going to be used to compute the MDDF. The structure can be initialized in different ways:

  1. Initialize the structure providing a vector of PDBTools.Atom(s).
    AtomSelection(
        atoms::AbstractVector{<:PDBTools.Atom}; 
        nmols::Int = 0, 
        natomspermol::Int = 0,
        group_atom_indices::Union{Nothing,Vector{Vector{Int}}} = nothing,
        group_names::Vector{String} = String[]
    ) 

The indices of the atoms will be retrived from the indices of the atoms as defined in the PDB file, thus the PDB file must correspond to the same system as that of the simulation.

Either the number of molecules (nmols) or the number of atoms per molecule (natomspermol) must be provided.

If group_atom_indices is nothing or group_names is empty, the names of the groups will be retrieved from the atom names, and in the coordination numbers of each individual atom will be stored.

Example

julia> using ComplexMixtures, PDBTools

julia> pdbfile = ComplexMixtures.Testing.pdbfile;

julia> atoms = PDBTools.readPDB(pdbfile, "resname TMAO");

julia> atsel = AtomSelection(atoms, natomspermol=14)
AtomSelection 
    2534 atoms belonging to 181 molecule(s).
    Atoms per molecule: 14
    Number of groups: 14 

julia> atom_group_name(atsel, 1)
"N"

julia> atom_group_name(atsel, 5)
"O1"

julia> length(atom_group_names(atsel))
14
  1. Lower level: initialize the structure providing the index of atoms and groups.
    AtomSelection(
        indices::Vector{Int};
        nmols::Int = 0,
        natomspermol::Int = 0,
        group_atom_indices::Union{Nothing,Vector{Vector{Int}}} = nothing,
        group_names::Vector{String} = String[]
    )

Construct an AtomSelection structure from the most low-level information: the index of atoms and groups.

Either the number of molecules (nmols) or the number of atoms per molecule (natomspermol) must be provided.

Groups of atoms can be defined by providing a vector of vectors of atom indices (group_atom_indices), and a vector of group names (group_names). If group_atom_indices is set to nothing, the coordination numbers of each individual atoms wil be stored.

Examples

julia> using ComplexMixtures

julia> AtomSelection([1,2,3], nmols=1)
AtomSelection 
    3 atoms belonging to 1 molecule(s).
    Atoms per molecule: 3
    Number of groups: 3

julia> AtomSelection([1,2,3], natomspermol=1)
AtomSelection 
    3 atoms belonging to 3 molecule(s).
    Atoms per molecule: 1
    Number of groups: 1

julia> AtomSelection([1,2,3], natomspermol=1, group_atom_indices=[[1,2],[3]], group_names=["G1", "G2"])
AtomSelection 
    3 atoms belonging to 3 molecule(s).
    Atoms per molecule: 1
    Number of groups: 2 
ComplexMixtures.SoluteGroupType

SoluteGroup and SolventGroup data structures.

These structures are used to select groups of atoms to extract their contributions from the MDDF results.

Most tipically, the groups are defined from a selection of atoms with the PDBTools package, or by providing directly the indices of teh atoms in the structure.

Alternativelly, if the groups were predefined, the groups can be selected by group index or group name.

The possible constructors are:

SoluteGroup(atoms::Vector{PDBTools.Atom})
SoluteGroup(atom_indices::Vector{Int})
SoluteGroup(atom_names::Vector{String})
SoluteGroup(group_name::String)
SoluteGroup(residue::PDBTools.Residue)
SoluteGroup(atsel::AtomSelection)

above, each constructor can be replaced by SolventGroup. The resulting data structures are used as input parameters for the contributions function:

contributions(results::Result, group::Union{SoluteGroup, SolventGroup}; type=:mddf)

See the contributions help entry for additional information.

Examples

Defining solute groups with different input types:

julia> using ComplexMixtures, PDBTools

julia> atoms = PDBTools.readPDB(ComplexMixtures.Testing.pdbfile, "protein"); 

julia> SoluteGroup(select(atoms, "protein and resname ASP")) # vector of PDBTools.Atom(s)
SoluteGroup defined by:
    atom_indices: [ 24, 25, ..., 1056, 1057 ] - 72 atoms

julia> SoluteGroup(1:100) # atom indices (range or vector)
SoluteGroup defined by:
    atom_indices: [ 1, 2, ..., 99, 100 ] - 100 atoms

julia> SoluteGroup(["N", "CA", "C", "O"]) # vector of atom names
SoluteGroup defined by:
    atom_names: [ N, CA, C, O ] - 4 atom names.
 
julia> SoluteGroup("acidic residues") # predefined group name
SoluteGroup defined by:
    group_name: "acidic residues"

julia> SoluteGroup(1) # predefined group index
SoluteGroup defined by:
    group_index: 1

julia> SoluteGroup(collect(eachresidue(atoms))[2]) # PDBTools.Residue(s)
SoluteGroup defined by:
    atom_indices: [ 13, 14, ..., 22, 23 ] - 11 atoms
ComplexMixtures.SolventGroupType

SoluteGroup and SolventGroup data structures.

These structures are used to select groups of atoms to extract their contributions from the MDDF results.

Most tipically, the groups are defined from a selection of atoms with the PDBTools package, or by providing directly the indices of teh atoms in the structure.

Alternativelly, if the groups were predefined, the groups can be selected by group index or group name.

The possible constructors are:

SoluteGroup(atoms::Vector{PDBTools.Atom})
SoluteGroup(atom_indices::Vector{Int})
SoluteGroup(atom_names::Vector{String})
SoluteGroup(group_name::String)
SoluteGroup(residue::PDBTools.Residue)
SoluteGroup(atsel::AtomSelection)

above, each constructor can be replaced by SolventGroup. The resulting data structures are used as input parameters for the contributions function:

contributions(results::Result, group::Union{SoluteGroup, SolventGroup}; type=:mddf)

See the contributions help entry for additional information.

Examples

Defining solute groups with different input types:

julia> using ComplexMixtures, PDBTools

julia> atoms = PDBTools.readPDB(ComplexMixtures.Testing.pdbfile, "protein"); 

julia> SoluteGroup(select(atoms, "protein and resname ASP")) # vector of PDBTools.Atom(s)
SoluteGroup defined by:
    atom_indices: [ 24, 25, ..., 1056, 1057 ] - 72 atoms

julia> SoluteGroup(1:100) # atom indices (range or vector)
SoluteGroup defined by:
    atom_indices: [ 1, 2, ..., 99, 100 ] - 100 atoms

julia> SoluteGroup(["N", "CA", "C", "O"]) # vector of atom names
SoluteGroup defined by:
    atom_names: [ N, CA, C, O ] - 4 atom names.
 
julia> SoluteGroup("acidic residues") # predefined group name
SoluteGroup defined by:
    group_name: "acidic residues"

julia> SoluteGroup(1) # predefined group index
SoluteGroup defined by:
    group_index: 1

julia> SoluteGroup(collect(eachresidue(atoms))[2]) # PDBTools.Residue(s)
SoluteGroup defined by:
    atom_indices: [ 13, 14, ..., 22, 23 ] - 11 atoms
ComplexMixtures.atom_groupMethod
atom_group(atsel::AtomSelection, i::Int)
atom_group(atsel::AtomSelection, groupname::String)

atom_group(atsel::AtomSelection, i::Int)
atom_group(atsel::AtomSelection, groupname::String)

Return the indices of the atoms that belong to a given group.

Example

julia> using ComplexMixtures

julia> atsel = AtomSelection([1,2,3], natomspermol=1, group_atom_indices=[[1,2],[3]], group_names=["G1", "G2"])
AtomSelection 
    3 atoms belonging to 3 molecule(s).
    Atoms per molecule: 1
    Number of groups: 2

julia> atom_group(atsel, 1)
2-element Vector{Int64}:
 1
 2

julia> atom_group(atsel, "G2")
1-element Vector{Int64}:
 3

julia> atom_group_name(atsel, 1)
"G1"
ComplexMixtures.atom_group_nameMethod
atom_group_name(atsel::AtomSelection, i::Int)
atom_group_names(atsel::AtomSelection)

Return the name of the group of atoms with index i. The atom_group_names function returns a vector with the names of all the groups.

Example

julia> using ComplexMixtures

julia> atsel = AtomSelection([1,2,3], natomspermol=1, group_atom_indices=[[1,2],[3]], group_names=["G1", "G2"])
AtomSelection 
    3 atoms belonging to 3 molecule(s).
    Atoms per molecule: 1
    Number of groups: 2

julia> atom_group_name(atsel, 1)
"G1"

julia> atom_group_names(atsel)
2-element Vector{String}:
 "G1"
 "G2"
ComplexMixtures.atom_group_namesMethod
atom_group_name(atsel::AtomSelection, i::Int)
atom_group_names(atsel::AtomSelection)

Return the name of the group of atoms with index i. The atom_group_names function returns a vector with the names of all the groups.

Example

julia> using ComplexMixtures

julia> atsel = AtomSelection([1,2,3], natomspermol=1, group_atom_indices=[[1,2],[3]], group_names=["G1", "G2"])
AtomSelection 
    3 atoms belonging to 3 molecule(s).
    Atoms per molecule: 1
    Number of groups: 2

julia> atom_group_name(atsel, 1)
"G1"

julia> atom_group_names(atsel)
2-element Vector{String}:
 "G1"
 "G2"