BioRecordsProcessing

In BioRecordsProcessing, records are processed by a Pipeline built from three parts: a source that produces records, a user-defined function that processes each record, and a sink that stores the output of the processing function. The pipeline can then be run.

In this example, a FASTA file is read from disk, the sequence is extracted from each record, and the sequences are collected in an array:

using BioRecordsProcessing, FASTX, BioSequences

# `filepath` must hold the path to an existing FASTA file.
p = Pipeline(
    Reader(FASTX.FASTA, File(filepath)),
    record -> begin
        sequence(LongDNA{4}, record)
    end,
    Collect(LongDNA{4}),
)
run(p)

# output
2-element Vector{LongSequence{DNAAlphabet{4}}}:
 CTTGGCATACTCAAACTCTT
 CTTGGCATACTCAAACTCTT

By combining different sources and sinks with a user-defined processing function, a pipeline can handle many common cases of biological record processing.

Conventions

  • If the processing function returns nothing, the record is not written to the sink. This makes it possible to filter out records.
  • When writing a file to disk, the sink gets the filename from the source, so the source needs to have a filename in this case.
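
The first convention can be sketched as a length filter. This is a minimal example that only reuses the calls shown earlier on this page (Pipeline, Reader, File, Collect, run); the temporary file, its contents, and the 10-base threshold are made up for illustration.

```julia
using BioRecordsProcessing, FASTX, BioSequences

# Write a small FASTA file so the sketch is self-contained (contents are made up).
filepath = joinpath(mktempdir(), "example.fasta")
write(filepath, ">short\nACGT\n>long\nCTTGGCATACTCAAACTCTT\n")

p = Pipeline(
    Reader(FASTX.FASTA, File(filepath)),
    record -> begin
        seq = sequence(LongDNA{4}, record)
        # Returning `nothing` drops the record, per the convention above.
        # The 10-base threshold is an arbitrary choice for this sketch.
        length(seq) < 10 && return nothing
        seq
    end,
    Collect(LongDNA{4}),
)

# Only sequences that passed the filter should reach the Collect sink.
long_sequences = run(p)
```

Because the sink here is Collect, no output file is written, so the second convention about filenames does not come into play.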

Sources

  • BioRecordsProcessing.Reader
  • BioRecordsProcessing.Buffer

File Providers

Reader can take one of these file providers as an argument:

  • BioRecordsProcessing.File
  • BioRecordsProcessing.Directory

Sinks

  • Writer
  • Collect

Pipeline

Base.run (function)
run(p::Pipeline; max_records = Inf, verbose = true)

Run the pipeline. Processing stops after max_records records have been read. Depending on the sink, run returns either the path to the output file or an array.
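
As a hedged illustration of the keyword arguments in the signature above, the sketch below limits a run to the first record and turns off verbose output. The temporary FASTA file and its contents are made up for the example; everything else reuses calls shown earlier on this page.

```julia
using BioRecordsProcessing, FASTX, BioSequences

# Made-up input file with three records, written to a temporary directory.
filepath = joinpath(mktempdir(), "three_records.fasta")
write(filepath, ">a\nACGT\n>b\nTTGA\n>c\nGGCC\n")

p = Pipeline(
    Reader(FASTX.FASTA, File(filepath)),
    record -> sequence(LongDNA{4}, record),
    Collect(LongDNA{4}),
)

# Stop reading after one record and suppress verbose output.
first_records = run(p; max_records = 1, verbose = false)
```

Capping max_records this way can be a convenient check of a processing function on a large file before committing to a full run.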