AssigningSecondaryStructure


AssigningSecondaryStructure provides a way to assign loops, helices, and strands to protein backbones using a simplified version of the DSSP algorithm.

The BioStructures.jl and ProteinSecondaryStructures.jl packages both provide interfaces for more sophisticated secondary structure assignment, but both call the DSSP_jll.jl binary under the hood. That requires writing structures to a file, which adds significant overhead.

Installation

The package is registered in the General registry and can be installed from the Julia REPL with ]add AssigningSecondaryStructure.

Usage

The assign_secondary_structure function takes a vector of atom coordinate arrays, one per chain, each of size (3, 3, L), where L is the number of residues in that chain. The first axis holds the x, y, and z coordinates, the second axis the backbone atoms (N, CA, C), and the third axis the residues.
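To make the array layout concrete, here is a minimal sketch that builds one such array by hand. The coordinates are made-up placeholder values for a hypothetical 4-residue chain, not a real structure, and are only meant to show which axis is which.

```julia
# Hypothetical toy chain with L = 4 residues (placeholder coordinates, not real data).
L = 4
coords = zeros(Float64, 3, 3, L)

# Axis 1: x, y, z. Axis 2: N, CA, C. Axis 3: residue index.
for i in 1:L
    x0 = 3.8 * (i - 1)
    coords[:, 1, i] = [x0,       0.0, 0.0]  # N
    coords[:, 2, i] = [x0 + 1.5, 0.0, 0.0]  # CA
    coords[:, 3, i] = [x0 + 2.5, 1.0, 0.0]  # C
end

ca_positions   = coords[:, 2, :]  # 3×L matrix of CA coordinates
second_residue = coords[:, :, 2]  # 3×3 matrix whose columns are N, CA, C

# The function expects a vector with one such array per chain.
coords_vector = [coords]
```

Slicing along the second axis gives all atoms of one type, while slicing along the third axis gives all backbone atoms of one residue.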

julia> using BioStructures

julia> coords_vector = map(collectchains(read("test/data/1ZAK.pdb", PDBFormat))) do chain
        reshape(coordarray(chain, backboneselector), 3, 4, :)[:, 1:3, :] # get N, CA, C atoms only
    end

julia> using AssigningSecondaryStructure

julia> assign_secondary_structure(coords_vector) # 2 chains
2-element Vector{Vector{Int64}}:
 [1, 1, 1, 1, 3, 3, 3, 3, 3, 3  …  2, 2, 2, 2, 2, 2, 2, 1, 1, 1]
 [1, 1, 1, 1, 3, 3, 3, 3, 3, 3  …  2, 2, 2, 2, 2, 2, 2, 1, 1, 1]
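The integer codes can be turned into a more readable string. The sketch below assumes that the codes follow the order in which the classes are listed at the top of this README (1 = loop, 2 = helix, 3 = strand). That mapping is an assumption, and `ss_char` and `ss_string` are hypothetical helper names, not part of the package.

```julia
# Assumed mapping (not confirmed by this README): 1 = loop, 2 = helix, 3 = strand.
ss_char(code::Integer) = ('-', 'H', 'E')[code]

# Convert a vector of integer codes into a DSSP-style string.
ss_string(codes::AbstractVector{<:Integer}) = join(ss_char(c) for c in codes)

example_codes = [1, 1, 3, 3, 3, 2, 2, 2, 1]  # made-up values for illustration
println(ss_string(example_codes))
```

Applied to the output above, `ss_string.(assign_secondary_structure(coords_vector))` would give one string per chain under the same assumption.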

Acknowledgements

This package was originally ported from the PyDSSP package, created by Shintaro Minami. The code has since been rewritten to follow the 1983 paper by Kabsch and Sander more closely, and to be more Julian, understandable, and efficient, at the cost of no longer being differentiable like the PyDSSP version. The time complexity is still quadratic, so assignment may be slow for larger proteins. We plan to make a more efficient version using k-d trees.
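For readers unfamiliar with the 1983 paper, the sketch below shows the electrostatic hydrogen-bond energy that Kabsch and Sander define, with distances in Å and energy in kcal/mol. It is an illustration of the published formula, not the package's internal code. The function names and the test geometry are hypothetical. Evaluating such an energy for every pair of residues is one plausible source of the quadratic cost mentioned above, though that is an assumption about the implementation.

```julia
using LinearAlgebra: norm

# Kabsch & Sander (1983): E = q1*q2*f * (1/r_ON + 1/r_CH - 1/r_OH - 1/r_CN),
# with q1 = 0.42e, q2 = 0.20e, f = 332, distances in Å, E in kcal/mol.
# N and H belong to the donor residue, C and O to the acceptor residue.
function hbond_energy(N, H, C, O)
    return 0.42 * 0.20 * 332 *
           (1 / norm(O - N) + 1 / norm(C - H) - 1 / norm(O - H) - 1 / norm(C - N))
end

# The paper counts a hydrogen bond when E is below -0.5 kcal/mol.
is_hbond(N, H, C, O; cutoff = -0.5) = hbond_energy(N, H, C, O) < cutoff

# Hypothetical collinear geometry C=O ... H-N (made-up coordinates).
C = [0.0, 0.0, 0.0]
O = [1.23, 0.0, 0.0]
H = [3.13, 0.0, 0.0]
N = [4.13, 0.0, 0.0]

println(is_hbond(N, H, C, O))
```

Moving the donor far from the acceptor drives the energy toward zero, so the pair no longer passes the cutoff.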