Guide

The CIF files used in these examples are provided in the docs directory.

Reading a CIF file

To open CIF file demo.cif, and read _cell.length_a from block saly2_all_aniso:


using CrystalInfoFramework, DataFrames, FilePaths

nc = Cif(p"demo.cif")
my_block = nc["saly2_all_aniso"]  #could also use first(nc).second
my_block["_cell.length_a"]

# output

1-element Array{Union{Missing, Nothing, Dict{String,T}, Array{T,1}, String} where T,1}:
 "11.520(12)"

Note that values are always returned as Array type, with the value for a key value pair being a single-element array. The values in the arrays returned are either String, Missing (CIF ?), Nothing (CIF .) or compound types Dict and Array which are only available for CIF2 syntax files.

Option native=false to Cif will use the cif_api C parser instead of the Julia parser. The cif_api parser is faster and more memory-efficient for scripting use, whereas the native parser is faster if compilation time is less of a consideration (e.g. multiple files are being read in). The cif_api parser is not currently available on Windows systems.

Loops

Individual columns are returned when the data name is requested, as above.

get_loop returns a DataFrame object that can be manipulated using the methods of that package, most obviously, eachrow to iterate over the packets in a loop. To specify the required loop, simply provide any data name that appears in the loop.


l = get_loop(my_block,"_atom_site.label");

for r in eachrow(l)
    println("$(r[Symbol("_atom_site.fract_x")])")
end

# output

.5505(5)
.4009(5)
.2501(5)
.4170(7)
.3145(7)
.2789(8)
.3417(9)
.4445(9)
.4797(8)
.4549(7)

Updating a CIF file

Single key-value pairs are added in the same way as for a normal dictionary.

my_block["_new_item"] = "a fine item"
my_block["_number_item"] = 23

If the dataname belongs to a loop, a two-step process is required to add the values. First the column of values for the new data name is added as above, and then the new dataname can be added to a previously-existing loop. The following call adds _new_loop_item to the loop containing _old_item:

my_block["_new_loop_item"] = [1,2,3,4]
add_to_loop!(my_block,"_old_item","_new_loop_item")

The number of values in the array assigned to _new_loop_item must match the length of the loop it is added to - this is checked.

A completely new loop can be created with create_loop!. The columns corresponding to the data names provided to create_loop! must have previously been added to the data block, just like for add_to_loop!.

Writing CIFs

To write a CIF, open an IO stream and write the contents of the Cif object as MIME type "text/cif":

t = open("newcif.cif","w")
show(t,MIME("text/cif"),mycif)
close(t)

Note that currently no checks are made for correct construction of data names (e.g. leading underscore and characterset restrictions). This will be checked in the future.

Dictionaries and DataSources

Dictionaries

CIF dictionaries are created by passing the dictionary file name to DDLm_Dictionary or DDL2_Dictionary constructors. Note that DDL2 dictionaries are published by the Protein Data Bank (wwPDB) and DDLm dictionaries are used by the IUCr.

d = DDLm_Dictionary(p"cif_core.dic")

DataSources

CIF dictionaries can be used with any DataSource, providing that the datasource recognises the data names defined in the dictionary.

A DataSource is any object returning an array of values when supplied with a string. A CIF Block conforms to this specification, as does a simple Dict{String,Any}. DataSources are defined in submodule CrystalInfoFramework.DataContainer.

A CIF dictionary can be used to obtain data with correct Julia type from a DataSource that uses data names defined in the dictionary by creating a TypedDataSource:

using CrystalInfoFramework.DataContainer
my_dict = DDLm_Dictionary(p"../test/cif_core.dic")
bd = TypedDataSource(my_block,my_dict)
bd["_cell.length_a"]

# output

1-element Array{Float64,1}:
 11.52

Note that the array elements are now Float64 and that the standard uncertainty has been removed. Future improvements may use Measurements.jl to retain standard uncertainties.

Dictionaries also allow alternative names for a data name to be recognised provided these are noted in the dictionary:


l = bd["_cell_length_a"] #no period in name

# output

1-element Array{Float64,1}:
 11.52

where _cell_length_a is the old form of the data name.

Currently transformations from DataSource values to Julia values assume that the DataSource values are either already of the correct type, or are Strings that can be directly parsed by the Julia parse method.

Creating new DataSources

A file format can be used with CIF dictionaries if:

  1. It returns an Array of values when provided with a data name defined in the dictionary
  2. Arrays returned for data names from the same CIF category have corresponding values at the same position in the array - that is, they line up correctly if presented as columns in a table.

At a minimum, the following methods should be defined for the DataSource: getindex, haskey.

If the above are true of your type, then it is sufficient to define DataSource(::MyType) = IsDataSource() to make it available.

If a DataSourcemds can instead be modelled as a collection of DataSources, iterate_blocks should also be defined to iterate over the constituent DataSources. MultiDataSource(mds) will then create a DataSource where values returned for any data names defined in the constituent blocks are automatically aligned. Such MultiDataSource objects can be built to form hierarchies.

Types

A TypedDataSource consists of a DataSource and a CIF dictionary.

Values returned from a TypedDataSource are transformed to the appropriate Julia type as specified by the dictionary if the underlying DataSource returns String values formatted in a way that Julia parse can understand. Otherwise, the DataSource is responsible for returning the appropriate Julia type. Future improvements may add user-defined transformations if that proves necesssary.

A NamespacedTypedDataSource includes data from multiple namespaces. Correctly-typed data for a particular namespace can then be obtained from the object returned by select_namespace(t::NamespacedTypedDataSource,nspace).

Cif Categories from DataSources

A CIF category (a 'Relation' in the relational model) can be constructed from a DataSource, a CIF dictionary, and the CIF name of the category:

as = LoopCategory("atom_site",my_block,my_dict)

# output

Category atom_site Length 10
10×7 DataFrame. Omitted printing of 2 columns
│ Row │ u_iso_or_equiv │ fract_x   │ fract_z   │ adp_type  │ occupancy │
│     │ Cif Value…?    │ Cif Val…? │ Cif Val…? │ Cif Val…? │ Cif Val…? │
├─────┼────────────────┼───────────┼───────────┼───────────┼───────────┤
│ 1   │ .035(3)        │ .5505(5)  │ .1605(11) │ Uani      │ 1.00000   │
│ 2   │ .033(3)        │ .4009(5)  │ .2290(11) │ Uani      │ 1.00000   │
│ 3   │ .043(4)        │ .2501(5)  │ .6014(13) │ Uani      │ 1.00000   │
│ 4   │ .029(4)        │ .4170(7)  │ .4954(15) │ Uani      │ 1.00000   │
│ 5   │ .031(5)        │ .3145(7)  │ .6425(16) │ Uani      │ 1.00000   │
│ 6   │ .040(5)        │ .2789(8)  │ .8378(17) │ Uani      │ 1.00000   │
│ 7   │ .045(6)        │ .3417(9)  │ .8859(18) │ Uani      │ 1.00000   │
│ 8   │ .045(6)        │ .4445(9)  │ .7425(18) │ Uani      │ 1.00000   │
│ 9   │ .038(5)        │ .4797(8)  │ .5487(17) │ Uani      │ 1.00000   │
│ 10  │ .029(4)        │ .4549(7)  │ .2873(16) │ Uani      │ 1.00000   │

where a category is either a LoopCategory, with one or more rows, or a SetCategory, which is restricted to a single row. Alternatively, a TypedDataSource can be used, in which case the dictionary used by the TypedDataSource is also used for category construction.

as = LoopCategory("atom_site",bd)

# output

Category atom_site Length 10
10×7 DataFrame. Omitted printing of 2 columns
│ Row │ u_iso_or_equiv │ fract_x   │ fract_z   │ adp_type  │ occupancy │
│     │ Cif Value…?    │ Cif Val…? │ Cif Val…? │ Cif Val…? │ Cif Val…? │
├─────┼────────────────┼───────────┼───────────┼───────────┼───────────┤
│ 1   │ .035(3)        │ .5505(5)  │ .1605(11) │ Uani      │ 1.00000   │
│ 2   │ .033(3)        │ .4009(5)  │ .2290(11) │ Uani      │ 1.00000   │
│ 3   │ .043(4)        │ .2501(5)  │ .6014(13) │ Uani      │ 1.00000   │
│ 4   │ .029(4)        │ .4170(7)  │ .4954(15) │ Uani      │ 1.00000   │
│ 5   │ .031(5)        │ .3145(7)  │ .6425(16) │ Uani      │ 1.00000   │
│ 6   │ .040(5)        │ .2789(8)  │ .8378(17) │ Uani      │ 1.00000   │
│ 7   │ .045(6)        │ .3417(9)  │ .8859(18) │ Uani      │ 1.00000   │
│ 8   │ .045(6)        │ .4445(9)  │ .7425(18) │ Uani      │ 1.00000   │
│ 9   │ .038(5)        │ .4797(8)  │ .5487(17) │ Uani      │ 1.00000   │
│ 10  │ .029(4)        │ .4549(7)  │ .2873(16) │ Uani      │ 1.00000   │

getindex for CIF categories uses the indexing value as the key value for looking up a row in the category:

one_row = as["o1"]
one_row.fract_x

# output

".5505(5)"

If a category key consists multiple data names, a Dict{Symbol,V} should be provided as the indexing value, where Symbol is the object_id of the particular data name forming part of the key and V is the type of the values.

A category can be iterated over as usual, with the value of each dataname for each row available as a property:

for one_row in as
    println("$(one_row.label) $(one_row.fract_x) $(one_row.fract_y) $(one_row.fract_z)")
end

# output

o1 .5505(5) .6374(5) .1605(11)
o2 .4009(5) .5162(5) .2290(11)
o3 .2501(5) .5707(5) .6014(13)
c1 .4170(7) .6930(8) .4954(15)
c2 .3145(7) .6704(8) .6425(16)
c3 .2789(8) .7488(8) .8378(17)
c4 .3417(9) .8529(8) .8859(18)
c5 .4445(9) .8778(9) .7425(18)
c6 .4797(8) .7975(8) .5487(17)
c7 .4549(7) .6092(7) .2873(16)

If you prefer the DataFrame tools for working with tables, DataFrame(c::CifCategory) creates a DataFrame:

DataFrame(as)

# output

10×7 DataFrame. Omitted printing of 2 columns
│ Row │ u_iso_or_equiv │ fract_x   │ fract_z   │ adp_type  │ occupancy │
│     │ Cif Value…?    │ Cif Val…? │ Cif Val…? │ Cif Val…? │ Cif Val…? │
├─────┼────────────────┼───────────┼───────────┼───────────┼───────────┤
│ 1   │ .035(3)        │ .5505(5)  │ .1605(11) │ Uani      │ 1.00000   │
│ 2   │ .033(3)        │ .4009(5)  │ .2290(11) │ Uani      │ 1.00000   │
│ 3   │ .043(4)        │ .2501(5)  │ .6014(13) │ Uani      │ 1.00000   │
│ 4   │ .029(4)        │ .4170(7)  │ .4954(15) │ Uani      │ 1.00000   │
│ 5   │ .031(5)        │ .3145(7)  │ .6425(16) │ Uani      │ 1.00000   │
│ 6   │ .040(5)        │ .2789(8)  │ .8378(17) │ Uani      │ 1.00000   │
│ 7   │ .045(6)        │ .3417(9)  │ .8859(18) │ Uani      │ 1.00000   │
│ 8   │ .045(6)        │ .4445(9)  │ .7425(18) │ Uani      │ 1.00000   │
│ 9   │ .038(5)        │ .4797(8)  │ .5487(17) │ Uani      │ 1.00000   │
│ 10  │ .029(4)        │ .4549(7)  │ .2873(16) │ Uani      │ 1.00000   │