New Fingerprint
John May edited this page Feb 3, 2016 · 15 revisions
Improved API for the representation, storage, scoring, and indexing of fingerprints.
- Unified abstraction for representing both binary and frequency fingerprints
- Read/Write FPS format
- Efficient index implementation(s) allow users to build an index and search small datasets (<2 GB: ~15 million 1024-bit fingerprints).
- Efficient 'feature encoding' utilities allow hashing of selected atom/bond type information from different features. This will allow users to combine or build their own implementations whilst simplifying the defaults. Examples of features include:
  - Path
  - Radial
  - Tree
  - Ring
- Fingerprint Naming/Versioning
- Adapters to use old IFingerprinter implementations whilst in development
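The size estimate in the index item above follows from simple arithmetic: a 1024-bit fingerprint packs into 128 bytes, so 15 million of them need about 1.92 GB before titles or any index overhead. A minimal sketch of that calculation; `FpStorageEstimate` and `rawBytes` are illustrative names, not part of the proposed API.

```java
/** Back-of-envelope storage estimate for packed binary fingerprints (illustrative helper, not part of the API). */
final class FpStorageEstimate {

    /** Bytes needed for {@code count} fingerprints of {@code numBits} bits each, ignoring titles and index overhead. */
    static long rawBytes(long count, int numBits) {
        long bytesPerFp = (numBits + 7) / 8; // round up to whole bytes
        return count * bytesPerFp;
    }

    public static void main(String[] args) {
        long bytes = rawBytes(15_000_000L, 1024);
        System.out.println(bytes + " bytes"); // prints "1920000000 bytes"
    }
}
```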
Fingerprints can be read from and written to the FPS format. The format encodes each binary fingerprint in base 16 (hexadecimal), followed by a tab and a title.
#FPS1
#num_bits=256
#software=RDKit/2009Q3_1
#type=RDKit-Fingerprint/1 minPath=1 maxPath=7 fpSize=256 nBitsPerHash=4 useHs=True
#source=/Users/dalke/databases/Compound_00000001_00025000.sdf.gz
#date=2010-01-27T02:22:26
fffeffbfb7fffedff7beefdbddf7ffffabff76cf6df7fcf6f7fffebf7d7ffd6f 1
fffeffbfb7fffedff7beefdbddf7ffffabff76cf6df7fcf6f7fffebf7d7ffd6f 2
ffffbfdfffffffffbfeffffffffffffffffffffffff77efffffffebfffffffef 3
00c02010002610000080800041100002084000440d100000c055048801224400 4
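To make the record layout concrete, the following sketch decodes one data line into bytes and a title using only the JDK. It is not the proposed `FpsInput`; the class `FpsRecordSketch` and its methods are hypothetical, and it accepts either a tab or spaces as the separator because the example above is rendered with spaces.

```java
/** Minimal FPS data-line parser (hypothetical helper, not the proposed FpsInput). */
final class FpsRecordSketch {
    final byte[] bytes;  // fingerprint bytes in file order
    final String title;  // everything after the separator

    FpsRecordSketch(byte[] bytes, String title) {
        this.bytes = bytes;
        this.title = title;
    }

    /** Parses a line of the form {@code <hex><separator><title>}. */
    static FpsRecordSketch parse(String line) {
        String[] cols = line.trim().split("[\\t ]+", 2);
        if (cols.length != 2 || cols[0].length() % 2 != 0)
            throw new IllegalArgumentException("expected <hex> <title>: " + line);
        String hex = cols[0];
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0)
                throw new IllegalArgumentException("not hexadecimal: " + hex);
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        return new FpsRecordSketch(bytes, cols[1]);
    }

    /** Number of set bits in the fingerprint. */
    int cardinality() {
        int n = 0;
        for (byte b : bytes) n += Integer.bitCount(b & 0xff);
        return n;
    }

    public static void main(String[] args) {
        FpsRecordSketch rec = parse(
            "00c02010002610000080800041100002084000440d100000c055048801224400\t4");
        System.out.println(rec.title + ": " + rec.bytes.length * 8 + " bits");
    }
}
```

A 256-bit fingerprint is 64 hexadecimal characters, which matches the `#num_bits=256` header and the record lengths in the example.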
FPS round-tripping: read each fingerprint from one file and write it to another.
try (FpsInput in = new FpsInput("input.fps");
     FpsOutput out = new FpsOutput("output.fps")) {
    out.writeHeader(in.getHeader()); // copy header
    Fp fp = new BinaryFp(in.getFpLen());
    while (in.read(fp)) {
        out.write(fp);
    }
}
The FPS header contains key-value pairs; the following constants can be used to set the values.
- FpsInput.HeaderNumBits
- FpsInput.HeaderAromaticity
- FpsInput.HeaderType
- FpsInput.HeaderDate
- FpsInput.HeaderSoftware
- FpsInput.HeaderSource
Example of creating a header.
Map<String,String> header = new LinkedHashMap<>();
header.put(FpsInput.HeaderSource, "chembl_20.smi");
header.put(FpsInput.HeaderSoftware, "CDK");
header.put(FpsInput.HeaderNumBits, "1024");
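For illustration, a header map like the one above could be serialised into the `#key=value` lines seen in the FPS example. This is a sketch under assumptions: `FpsHeaderSketch` is a hypothetical class, and the literal keys `source`, `software`, and `num_bits` stand in for the values of the `FpsInput` constants, which this page does not spell out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Formats a header map as FPS header lines (hypothetical helper, not the proposed FpsOutput). */
final class FpsHeaderSketch {

    /** Emits the "#FPS1" magic line followed by one "#key=value" line per entry, in map order. */
    static String format(Map<String, String> header) {
        StringBuilder sb = new StringBuilder("#FPS1\n");
        for (Map.Entry<String, String> e : header.entrySet()) {
            sb.append('#').append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output order is predictable.
        Map<String, String> header = new LinkedHashMap<>();
        header.put("source", "chembl_20.smi");   // assumed value of FpsInput.HeaderSource
        header.put("software", "CDK");           // assumed value of FpsInput.HeaderSoftware
        header.put("num_bits", "1024");          // assumed value of FpsInput.HeaderNumBits
        System.out.print(format(header));
    }
}
```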
Example of encoding features into a fingerprint.
IAtomContainer mol = ...; // molecule to encode
FpEncoder encoder = new FpEncoder(mol);
BinaryFp fp = new BinaryFp(1024);
encoder.encodePath(fp, lo, hi, atype, btype);
encoder.encodeTree(fp, lo, hi, atype, btype);
encoder.encodeRing(fp, lo, hi, atype, btype);
encoder.encodeCirc(fp, lo, hi, atype, btype);
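The encoder calls above hash features into a fixed-length fingerprint. How `FpEncoder` does this is not described on this page; the sketch below only shows the general idea of folding a hashed feature into a bit position. It assumes each feature is represented as a string, and `FeatureHashSketch`, `bitIndex`, and `encode` are hypothetical names.

```java
import java.util.BitSet;

/** Folds hashed features into a fixed-length bit set (illustrative only, not how FpEncoder is implemented). */
final class FeatureHashSketch {

    /** Maps a feature to a bit position in [0, numBits). floorMod keeps negative hash codes in range. */
    static int bitIndex(String feature, int numBits) {
        return Math.floorMod(feature.hashCode(), numBits);
    }

    /** Sets one bit per feature; identical features always set the same bit. */
    static void encode(BitSet fp, int numBits, String... features) {
        for (String feature : features) {
            fp.set(bitIndex(feature, numBits));
        }
    }

    public static void main(String[] args) {
        BitSet fp = new BitSet(1024);
        // Hypothetical string forms of two path features and one repeated feature.
        encode(fp, 1024, "C-C", "C-C=O", "C-C");
        System.out.println("bits set: " + fp.cardinality());
    }
}
```

Because distinct features can hash to the same position, the number of set bits is at most the number of distinct features.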
Example of setting up a similarity index and a query fingerprint.
FpSimIdx idx = ...;
BinaryFp qry = new BinaryFp(1024);
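The page does not say which score `FpSimIdx` uses or how it searches. As a point of reference, the sketch below scores binary fingerprints with the Tanimoto coefficient (shared bits divided by bits set in either) and finds the best match by linear scan. Both choices are assumptions, and `TanimotoSketch` and its methods are hypothetical names built on `java.util.BitSet` rather than `BinaryFp`.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Tanimoto scoring and a brute-force nearest-neighbour scan (assumed scoring, not the FpSimIdx implementation). */
final class TanimotoSketch {

    /** Builds a bit set with the given positions switched on. */
    static BitSet bits(int... on) {
        BitSet bs = new BitSet();
        for (int i : on) bs.set(i);
        return bs;
    }

    /** |a AND b| / |a OR b|; returns 0.0 when both fingerprints are empty (a convention chosen here). */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone(); // clone so the inputs are not modified
        and.and(b);
        int common = and.cardinality();
        int union = a.cardinality() + b.cardinality() - common;
        return union == 0 ? 0.0 : (double) common / union;
    }

    /** Index of the highest-scoring fingerprint in db, or -1 if db is empty. */
    static int nearest(BitSet qry, List<BitSet> db) {
        int best = -1;
        double bestScore = -1.0;
        for (int i = 0; i < db.size(); i++) {
            double score = tanimoto(qry, db.get(i));
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(tanimoto(bits(1, 2, 3), bits(2, 3, 4))); // prints 0.5
        List<BitSet> db = Arrays.asList(bits(4, 5), bits(1, 2, 3), bits(2, 3, 4));
        System.out.println(nearest(bits(1, 2, 3), db)); // prints 1
    }
}
```

A linear scan visits every fingerprint for each query; an index would typically aim to skip candidates that cannot reach a required score.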