Buffer(data::Vector{T}; filename = "")

Use the array data as a source of records. Because a Writer derives the output filename from the source, an optional filename can be provided when a Writer is used as the sink.

Collect(T::DataType; paired=false)

Write the output of the processing function into a vector in memory. The type of the output must be provided. For paired files, the option paired needs to be set to true; the output will then consist of a vector of tuples.

Directory(directory::String, glob_pattern::String; second_in_pair = nothing)

List all files in directory that match glob_pattern (see Glob.jl). For paired files, second_in_pair can be set to a function that takes the filename of the first file in the pair and returns the filename of the second file.

Directory(input_directory, "*.fastq")

File(filename; second_in_pair = nothing)

For paired files, second_in_pair can be set to a function that takes the filename of the first file in the pair and returns the filename of the second file. For example, one can use replace or a dictionary, e.g. second_in_pair = f1 -> replace(f1, "_1" => "_2").
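The two approaches mentioned for second_in_pair can be sketched in plain Julia, without the package. The names by_replace, by_dict and pairs_table, as well as the filenames, are hypothetical. Whether the function receives a bare filename or a full path is not stated here, so the dictionary keys are an assumption.

```julia
# Hypothetical helper: derive the second filename by string substitution.
by_replace = f1 -> replace(f1, "_1" => "_2")

# Hypothetical lookup table for pairs whose names follow no regular pattern.
pairs_table = Dict(
    "sampleA_fwd.fastq" => "sampleA_rev.fastq",
    "sampleB_fwd.fastq" => "sampleB_rev.fastq",
)
by_dict = f1 -> pairs_table[f1]

by_replace("reads_1.fastq")    # "reads_2.fastq"
by_dict("sampleA_fwd.fastq")   # "sampleA_rev.fastq"
```

Note that replace substitutes every occurrence of the pattern, so "sample_10_1.fastq" would become "sample_20_2.fastq". A more specific pattern such as "_1.fastq" => "_2.fastq" avoids this.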

Pipeline(source, processor, sink)
Pipeline(source, sink)

Build a Pipeline. If processor is omitted, it defaults to identity.

Reader(record_module::Module, file_provider::F) where {F <: AbstractFileProvider}

Read a file or a directory on disk and produce records of type record_module.Record. The second argument can be a File or a Directory.

If a string is passed as the second argument, it is treated as a File.

Reader(FASTX.FASTA, "test.fa")
Reader(FASTX.FASTA, File("test.fa"))
Reader(FASTX.FASTQ, Directory("data/", "*.fastq"))

Writer(record_module::Module, output_directory::String;
    suffix = "",
    paired = false,
    second_in_pair = nothing,
    extension = nothing,
    header = nothing)

Write the output of the processing function into a file. The first argument is the module that owns the Record type (e.g. FASTX.FASTA, VCF, ...), and the second is the output directory. The filename is determined by the source, to which an optional suffix can be added. If the output type is different from the input type (e.g. SAM to BAM), the extension (".bam") should be specified. For SAM and BAM, a SAM.Header should be provided.

To avoid overwriting existing files, the pipeline will check that the output file is different from the input file.

run(p::Pipeline; max_records = Inf, verbose = true)

Run the pipeline. Processing stops after max_records records have been read. Depending on the sink, run returns either the path to the output file or an array.
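A minimal end-to-end sketch, assembled only from the signatures listed above, might look as follows. It assumes that BioRecordsProcessing and FASTX are installed and that the names used are exported. The FASTA content and the temporary path are made up for illustration.

```julia
using BioRecordsProcessing, FASTX

# Write a tiny FASTA file so the sketch has something to read (made-up data).
path = joinpath(mktempdir(), "test.fa")
write(path, ">seq1\nACGT\n>seq2\nTTGA\n")

p = Pipeline(
    Reader(FASTX.FASTA, File(path)),   # source: a single FASTA file
    record -> record,                  # processor: pass each record through
    Collect(FASTX.FASTA.Record),       # sink: gather the output in memory
)

# Collect is the sink, so an array is expected back rather than a file path.
records = run(p; max_records = 10, verbose = false)
```

Swapping Collect for a Writer, e.g. Writer(FASTX.FASTA, output_directory; suffix = ".copy") with a hypothetical output_directory, should make run return a path to the written file instead.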