DataDepsGenerators.aggregate - Method
aggregate(metadatas)
Given a collection of Metadata from different sources, combines them to create the most complete and detailed accounting of metadata.
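One way to picture the combining step is a field-by-field merge that keeps the first known value for each field. The sketch below is only an illustration: `SketchMetadata`, its fields, and `aggregate_sketch` are hypothetical names, and the "first non-missing value wins" rule is an assumption rather than the package's documented strategy.

```julia
# Hypothetical stand-in for the package's Metadata type; the real fields may differ.
struct SketchMetadata
    title::Union{String,Missing}
    description::Union{String,Missing}
    license::Union{String,Missing}
end

# Assumed merge rule: for every field, keep the first non-missing value
# found across the sources, in the order the sources were given.
function aggregate_sketch(metadatas)
    fields = fieldnames(SketchMetadata)
    merged = map(fields) do field
        coalesce((getfield(m, field) for m in metadatas)...)
    end
    return SketchMetadata(merged...)
end

# Two partial records, as might come back from two different repositories.
from_repo_a = SketchMetadata("Iris", missing, missing)
from_repo_b = SketchMetadata(missing, "Flower measurements", "CC0")
combined = aggregate_sketch([from_repo_a, from_repo_b])
```

Here `combined` carries the title from the first record and the description and license from the second.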
DataDepsGenerators.citation_text - Method
citation_text(doi)
Uses the DOI formatted citation service to generate citation text for a given DOI. This works for DOIs issued by CrossRef, DataCite, and mEDRA.
See https://citation.crosscite.org/docs.html#sec-4-1
DataDepsGenerators.escape_multiline_string - Method
escape_multiline_string
Like escape_string, but does not escape newlines.
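A minimal sketch of the idea, assuming the intent is "escape everything `Base.escape_string` would, except line breaks". The name `escape_keep_newlines_sketch` is hypothetical and this is not the package's implementation.

```julia
# Hypothetical helper: escape each line on its own with Base.escape_string,
# then rejoin the lines with real newline characters.
function escape_keep_newlines_sketch(str::AbstractString)
    return join((escape_string(line) for line in split(str, '\n')), "\n")
end

escaped = escape_keep_newlines_sketch("say \"hi\"\nbye")
```

Splitting before escaping, rather than escaping first and then un-escaping `\n`, avoids mistaking a literal backslash followed by `n` in the input for a newline.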
DataDepsGenerators.filter_html - Method
filter_html(text)
Strips any HTML tags out of the text, if that is required.
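As a rough illustration of tag stripping, the sketch below removes anything that looks like `<...>` with a regular expression. This is an assumed stand-in, not the package's code (which may well use an HTML parser), and `strip_tags_sketch` is a hypothetical name.

```julia
# Hypothetical regex-based stand-in: drop every `<...>` span.
# Text without tags passes through unchanged.
strip_tags_sketch(text::AbstractString) = replace(text, r"<[^>]*>" => "")

plain = strip_tags_sketch("<p>Hello <b>world</b></p>")
```

A regex like this is adequate for simple description fields but will not cope with comments, scripts, or a `>` inside attribute values.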
DataDepsGenerators.generate - Function
generate([repo/s], url/id, [shortname]; show_failures=false)
Generates a DataDeps code block. The only required parameter is the url/id.
- url/id: the identifier for the dataset.
  - A URL for a landing page is normally best.
  - Other IDs, such as a DOI, also work.
- repo/s: either a single repository/API or a list of such.
  - This takes one of the DataRepo types exported by this package, e.g. CKAN() or Figshare().
  - If not provided, this defaults to checking all of them.
  - If only one repo is provided and it fails, the error will be thrown.
  - If multiple repos are provided, then the metadata from all of them is combined.
- shortname: the name to use in the generated DataDep.
  - If not provided, the dataset's title is used, but titles are often very long.
- show_failures: whether or not to list all the repos that fail, and why.
  - You generally do not want to turn this on unless you are debugging.
  - It is fine and expected for most repos to fail (after all, the data is probably only on one of them).
  - If all repos fail, the failure list will be shown regardless of whether this is set.
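The failure-handling rules listed above can be sketched as follows. Everything here is hypothetical: each "repo" is modelled as a plain function from an id to some metadata that may throw, and `collect_metadata_sketch` is an illustrative name, not part of the package.

```julia
# Hypothetical sketch of the documented failure rules; not the package's code.
function collect_metadata_sketch(repos, id; show_failures=false)
    successes = []
    failures = []
    for repo in repos
        try
            push!(successes, repo(id))
        catch err
            push!(failures, (repo, err))
        end
    end
    # A single repo that fails: surface its error directly.
    if length(repos) == 1 && !isempty(failures)
        throw(failures[1][2])
    end
    # List failures on request, or always when nothing succeeded.
    if show_failures || isempty(successes)
        for (repo, err) in failures
            println("failed: ", repo, " (", err, ")")
        end
    end
    # In the real package the successful results would then be combined.
    return successes
end

# Stand-in "repos": one that finds the dataset and one that does not.
found(id) = "metadata for $id"
not_found(id) = error("dataset not hosted here")

results = collect_metadata_sketch([not_found, found], "abc")
```

With both stand-ins supplied, the failure from `not_found` is swallowed silently, which mirrors the note that most repos are expected to fail.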
DataDepsGenerators.get_dataurls_from_webserver_index - Method
links_from_webserver_index(url)
Extracts all the content links from a web server's directory index page. These pages follow a fairly standard form. So far this has been tested on Apache/2.2.15.
DataDepsGenerators.getfirst - Method
getfirst(dict, keys...)
Returns the element corresponding to the first key that is found. Returns missing if no key is found.
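The lookup described above can be sketched in a few lines. `getfirst_sketch` is a hypothetical name used for illustration and the body is an assumption about the behavior, not the package's source.

```julia
# Hypothetical re-implementation: try the keys in order and return the
# value for the first one present; fall back to `missing`.
function getfirst_sketch(dict, keys...)
    for key in keys
        haskey(dict, key) && return dict[key]
    end
    return missing
end

record = Dict("title" => "Iris", "name" => "iris-data")
label = getfirst_sketch(record, "label", "name", "title")
absent = getfirst_sketch(record, "doi")
```

This pattern is handy when different repositories store the same piece of information under different field names.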
DataDepsGenerators.getpage - Function
getpage(url)
Downloads and parses the page from the URL.
DataDepsGenerators.getpage_raw - Method
getpage_raw(url)
Downloads the page from the URL, returning the raw (unparsed) text of the body.
DataDepsGenerators.indent - Function
indent(str)
Indents each line in a string.
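A minimal sketch of per-line indentation. The four-space width and the name `indent_sketch` are assumptions, since the indent width is not stated here.

```julia
# Hypothetical helper: prefix every line with a fixed indent.
# The four-space default is an assumption, not a documented value.
function indent_sketch(str::AbstractString; prefix="    ")
    return join((prefix * line for line in split(str, '\n')), "\n")
end

indented = indent_sketch("first\nsecond")
```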
DataDepsGenerators.leaf_subtypes - Method
leaf_subtypes(T)
Returns all the non-abstract types descended from T.
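One way to collect such types is a recursive walk with `subtypes` from the InteractiveUtils standard library. The sketch below is an assumption about the approach, and `leaf_subtypes_sketch` and the shape types are hypothetical names for illustration. Returning `T` itself when it is already non-abstract is also an assumption.

```julia
using InteractiveUtils: subtypes

# Hypothetical recursive walk: descend through abstract types and collect
# every non-abstract type reached along the way.
function leaf_subtypes_sketch(T::Type)
    isabstracttype(T) || return Type[T]
    leaves = Type[]
    for S in subtypes(T)
        append!(leaves, leaf_subtypes_sketch(S))
    end
    return leaves
end

# A small illustrative hierarchy.
abstract type Shape end
abstract type Polygon <: Shape end
struct Circle <: Shape end
struct Square <: Polygon end
struct Triangle <: Polygon end

leaves = leaf_subtypes_sketch(Shape)
```

A helper like this is one way to enumerate every concrete repository type when no repo is passed explicitly.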
DataDepsGenerators.lift - Method
lift(func, arg)
Calls func(arg), propagating missing values.
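The propagation rule can be sketched as a one-liner. `lift_sketch` is a hypothetical name, and the body is an assumption that matches the description rather than the package's source.

```julia
# Hypothetical helper: skip the call entirely when the argument is missing.
lift_sketch(func, arg) = ismissing(arg) ? missing : func(arg)

shouted = lift_sketch(uppercase, "doi")
skipped = lift_sketch(uppercase, missing)
```

Calling `uppercase(missing)` directly would raise a MethodError, so the wrapper lets optional metadata fields flow through string processing without explicit checks at each step.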
DataDepsGenerators.match_doi - Method
match_doi(uri::String
DataDepsGenerators.text_only - Method
text_only(doc)
Extracts just the unformatted text (no attributes, etc.) from an HTML document or fragment(s).