Parse the CIF file format for Protein Data Bank (PDB) data.

EliotJones/BioCif

Bio CIF

BioCif is a small C# library for parsing the Crystallographic Information File (CIF) format, the standard for information interchange in crystallography. It is designed to be fast and easy to use.

It exposes both tokenization and parsing for CIF versions 1.1 and 2.0, along with convenience wrappers for Protein Data Bank (PDB) data. The PDB hosts protein structure data in a CIF-based format, PDBx/mmCIF (macromolecular CIF).

Usage

To access the raw stream of tokens:

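using System;
using System.IO;
using BioCif.Core.Tokenization;
using BioCif.Core.Tokens;

// File.OpenRead opens the file read-only; File.Open requires a FileMode argument.
using (var fileStream = File.OpenRead(@"C:\path\to\data.cif"))
using (var streamReader = new StreamReader(fileStream))
{
    // Print the type of each token as the tokenizer yields it.
    foreach (Token token in CifTokenizer.Tokenize(streamReader))
    {
        Console.WriteLine(token.TokenType);
    }
}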

To access the parsed CIF structure:

using (var fileStream = File.OpenRead(@"C:\path\to\data.cif"))
{
    Cif cif = CifParser.Parse(fileStream);

    DataBlock block = cif.DataBlocks[0];
    Console.WriteLine($"Block name: {block.Name}");

    foreach (IDataBlockMember member in block.Members)
    {
        // ...
    }
}

To access a parsed PDBx/mmCIF:

Pdbx pdbx = PdbxParser.ParseFile(@"C:\path\to\mypdbx.cif");
PdbxDataBlock block = pdbx.First;
List<AuditAuthor> auditAuthors = block.AuditAuthors;

Notes

Defined terms from the CIF specification:

  • data file: contains information relating to an experiment
  • dictionary file: contains information about data names
  • data name (also called a tag): identifies the content of a data value
  • data value: a string representing a value of any type
  • data item: a data name together with its data value
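
To make the "data item" definition concrete, here is a minimal sketch that splits a single-line data item into its data name and data value. `DataItemSketch` and `ParseLine` are hypothetical names used only for illustration and are not part of BioCif's API. The sketch assumes the simplest case: one name and one value on the same line, with no loops, multi-line text fields, or CIF 2.0 lists and tables.

```csharp
using System;

// Hypothetical helper for illustration only; not part of BioCif.
public static class DataItemSketch
{
    // Splits a single-line "_name value" pair into its two parts.
    public static (string Name, string Value) ParseLine(string line)
    {
        string trimmed = line.Trim();

        // In CIF, a data name (tag) always begins with an underscore.
        if (!trimmed.StartsWith("_", StringComparison.Ordinal))
            throw new FormatException("A data name must start with '_'.");

        // The data name ends at the first whitespace character.
        int split = trimmed.IndexOfAny(new[] { ' ', '\t' });
        if (split < 0)
            throw new FormatException("Expected a value after the data name.");

        string name = trimmed.Substring(0, split);
        string value = trimmed.Substring(split).Trim();

        // Strip one pair of matching single or double quotes, if present.
        if (value.Length >= 2
            && (value[0] == '\'' || value[0] == '"')
            && value[value.Length - 1] == value[0])
        {
            value = value.Substring(1, value.Length - 2);
        }

        return (name, value);
    }
}
```

For example, `DataItemSketch.ParseLine("_cell.length_a 10.5")` yields the name `_cell.length_a` and the value `10.5`. A real parser should work from the token stream rather than raw lines.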

Notes on structures within a CIF file:

data block: highest level of a CIF file
  data_<block name>
  [data items or save frames]

save frame: partitioned collection of data items
  save_<frame code>
  [data items]
  save_   # Terminates the save frame
  ^ only used in dictionary files
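
The structure notes above can be sketched as a small line scanner that reports where data blocks and save frames begin and end. This is an illustrative simplification, not how BioCif is implemented: `CifStructureSketch` and `Outline` are hypothetical names, and the scanner assumes each `data_` or `save_` keyword is the first token on its line and that no such text appears inside a multi-line text field.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of BioCif.
public static class CifStructureSketch
{
    // Returns one label per structural line:
    // "block:<name>", "save-open:<frame code>" or "save-close".
    public static List<string> Outline(string cifText)
    {
        var outline = new List<string>();

        foreach (string raw in cifText.Split('\n'))
        {
            // Only the first whitespace-delimited token of each line is inspected.
            string[] parts = raw.Split(
                new[] { ' ', '\t', '\r' },
                StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length == 0)
                continue;

            string token = parts[0];

            // The data_ and save_ keywords are matched case-insensitively.
            if (token.StartsWith("data_", StringComparison.OrdinalIgnoreCase))
                outline.Add("block:" + token.Substring(5));
            else if (token.Equals("save_", StringComparison.OrdinalIgnoreCase))
                outline.Add("save-close"); // a bare save_ terminates the frame
            else if (token.StartsWith("save_", StringComparison.OrdinalIgnoreCase))
                outline.Add("save-open:" + token.Substring(5));
        }

        return outline;
    }
}
```

Data items are ignored here; only the block and frame boundaries are reported. For real files, prefer the tokenizer shown under Usage, which handles quoting and text fields.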

Useful Links

Status

Early stage/incomplete/unmaintained.