Rdfile reader #942

Open
wants to merge 48 commits into
base: main
6831b03
added first draft of RdfileReader, RdfileRecord and a test class Rdfi…
uli-f Nov 9, 2022
a2d0f9b
re-fined implementation of RdfileReader and RdfileRecord; started add…
uli-f Nov 10, 2022
ede9a39
added more tests to RdfileReaderTest
uli-f Nov 14, 2022
c35e013
refactored RdfileReader, added more tests to RdfileReaderTest
uli-f Nov 15, 2022
5ab32d7
removed package-info.java; added missing RDfile used in RdfileReaderTest
uli-f Nov 15, 2022
57654e5
re-worked test method to be able to deal with a list of records
uli-f Nov 17, 2022
28f175f
implemented skip logic for broken records (pun intended) together wit…
uli-f Nov 17, 2022
4e5882e
implemented skip logic for broken records (pun intended) together wit…
uli-f Nov 19, 2022
10c5907
added test cases to BasicMoleculeHashGeneratorTest; added javadoc to …
uli-f Nov 30, 2022
733bef1
amended test cases in RdfileReaderTest
uli-f Nov 30, 2022
bbec01e
Merge pull request #940 from uli-f/hashgenerator_tests
johnmay Nov 30, 2022
3c5f8e8
added first draft of RdfileReader, RdfileRecord and a test class Rdfi…
uli-f Nov 9, 2022
e31b874
re-fined implementation of RdfileReader and RdfileRecord; started add…
uli-f Nov 10, 2022
8c5658d
added more tests to RdfileReaderTest
uli-f Nov 14, 2022
99320a6
refactored RdfileReader, added more tests to RdfileReaderTest
uli-f Nov 15, 2022
7eb2b31
removed package-info.java; added missing RDfile used in RdfileReaderTest
uli-f Nov 15, 2022
6095454
re-worked test method to be able to deal with a list of records
uli-f Nov 17, 2022
643dd72
implemented skip logic for broken records (pun intended) together wit…
uli-f Nov 17, 2022
baabff2
implemented skip logic for broken records (pun intended) together wit…
uli-f Nov 19, 2022
b8e071a
amended test cases in RdfileReaderTest
uli-f Nov 30, 2022
cc9d3f1
Merge remote-tracking branch 'origin/rdfile_reader' into rdfile_reader
uli-f Dec 1, 2022
feeb1cd
fixed spelling of test input file
uli-f Dec 1, 2022
7c8d39e
Support non-sequential atom index in MDL V3000 inputs, fixes #943.
johnmay Dec 1, 2022
bfdecb4
added copyright/license header
uli-f Dec 2, 2022
17cbaf1
added some basic class level documentation to RdfileReader; now checking
uli-f Dec 2, 2022
80a594d
Newer XOM version
egonw Dec 2, 2022
b42e3d8
Removed the Xerces and Xalan dependencies
egonw Dec 2, 2022
639cf83
CMLXOM 4.4
egonw Dec 2, 2022
3bafacf
Merge pull request #945 from egonw/minus/xerces
johnmay Dec 3, 2022
57ca227
Set environment as SonarCloud
johnmay Dec 5, 2022
dfbc3fc
added first draft of RdfileReader, RdfileRecord and a test class Rdfi…
uli-f Nov 9, 2022
707a62a
re-fined implementation of RdfileReader and RdfileRecord; started add…
uli-f Nov 10, 2022
caf5195
added more tests to RdfileReaderTest
uli-f Nov 14, 2022
9d0f9c7
refactored RdfileReader, added more tests to RdfileReaderTest
uli-f Nov 15, 2022
40c2d89
removed package-info.java; added missing RDfile used in RdfileReaderTest
uli-f Nov 15, 2022
b365a35
re-worked test method to be able to deal with a list of records
uli-f Nov 17, 2022
3adf55c
implemented skip logic for broken records (pun intended) together wit…
uli-f Nov 17, 2022
ddc6105
implemented skip logic for broken records (pun intended) together wit…
uli-f Nov 19, 2022
3ad573a
amended test cases in RdfileReaderTest
uli-f Nov 30, 2022
b1855a2
re-based on main
uli-f Nov 9, 2022
718d8af
re-fined implementation of RdfileReader and RdfileRecord; started add…
uli-f Nov 10, 2022
9d276ca
refactored RdfileReader, added more tests to RdfileReaderTest
uli-f Nov 15, 2022
56470e8
removed package-info.java; added missing RDfile used in RdfileReaderTest
uli-f Nov 15, 2022
e47630a
fixed spelling of test input file
uli-f Dec 1, 2022
2a82ea8
added copyright/license header
uli-f Dec 2, 2022
b09b4c8
added some basic class level documentation to RdfileReader; now checking
uli-f Dec 2, 2022
f130642
fixed javadoc error in RdfileReader
uli-f Dec 5, 2022
10eefd7
Merge remote-tracking branch 'origin/rdfile_reader' into rdfile_reader
uli-f Dec 5, 2022
1 change: 1 addition & 0 deletions .github/workflows/maven.yml
@@ -34,6 +34,7 @@ jobs:
build-sonarcloud:
name: Build sonarcloud
runs-on: ubuntu-latest
environment: SonarCloud
steps:
- uses: actions/checkout@v3
with:
10 changes: 10 additions & 0 deletions base/dict/pom.xml
@@ -17,6 +17,16 @@
<dependency>
<groupId>xom</groupId>
<artifactId>xom</artifactId>
<exclusions>
<exclusion>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
</exclusion>
<exclusion>
<groupId>xalan</groupId>
<artifactId>xalan</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>${project.groupId}</groupId>
@@ -27,7 +27,13 @@

/**
* A hash function which generates a single 64-bit hash code for a molecule.
-*
* <p>
* Implementations of this interface are encouraged to be conservative about
* introducing any changes to the generated hash values and to mention any
* such changes in the release notes. However, please take into consideration
* that the hash values generated by this class are <b>not</b> guaranteed to
* remain unaltered over time.
* </p>
* @author John May
* @cdk.module interfaces
* @cdk.githash
12 changes: 11 additions & 1 deletion descriptor/qsarcml/pom.xml
@@ -17,11 +17,21 @@
<dependency>
<groupId>xom</groupId>
<artifactId>xom</artifactId>
<exclusions>
<exclusion>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
</exclusion>
<exclusion>
<groupId>xalan</groupId>
<artifactId>xalan</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.blueobelisk</groupId>
<artifactId>cmlxom</artifactId>
-<version>4.3</version>
<version>4.4</version>
<exclusions>
<exclusion>
<groupId>org.apache.logging.log4j</groupId>
10 changes: 10 additions & 0 deletions descriptor/qsarmolecular/pom.xml
@@ -30,6 +30,16 @@
<dependency>
<groupId>xom</groupId>
<artifactId>xom</artifactId>
<exclusions>
<exclusion>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
</exclusion>
<exclusion>
<groupId>xalan</groupId>
<artifactId>xalan</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
9 changes: 9 additions & 0 deletions doc/refs/cheminf.bibx
@@ -159,6 +159,15 @@
</bibtex:article>
</bibtex:entry>

<bibtex:entry id="Dassault20">
<bibtex:techreport>
<bibtex:title>CTFile Formats Biovia Databases 2020</bibtex:title>
<bibtex:institution>Dassault Systèmes</bibtex:institution>
<bibtex:year>2020</bibtex:year>
<bibtex:url>https://discover.3ds.com/sites/default/files/2020-08/biovia_ctfileformats_2020.pdf</bibtex:url>
</bibtex:techreport>
</bibtex:entry>

<bibtex:entry id="BLE91">
<bibtex:article>
<bibtex:author>Bley, K. and Brandt, J. and Dengler, A. and
2 changes: 1 addition & 1 deletion pom.xml
@@ -518,7 +518,7 @@
<dependency>
<groupId>xom</groupId>
<artifactId>xom</artifactId>
-<version>1.3.7</version>
<version>1.3.8</version>
</dependency>
<dependency>
<groupId>org.apache.felix</groupId>
119 changes: 119 additions & 0 deletions storage/ctab/src/main/java/org/openscience/cdk/io/CharIter.java
@@ -0,0 +1,119 @@
/*
 * Copyright (C) 2022 NextMove Software
 *               2022 John Mayfield
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */
package org.openscience.cdk.io;

/**
 * Helper class that facilitates the parsing of text data.
 * @author John Mayfield
 */
final class CharIter {
    private final String string;
    private int position = 0;

    CharIter(String str) {
        this.string = str;
    }

    static boolean isSpace(char c) {
        return c == ' ';
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    int position() {
        return position;
    }

    char next() {
        return string.charAt(position++);
    }

    char peek() {
        return position < string.length() ? string.charAt(position) : '\0';
    }

    boolean hasNext() {
        return position < string.length();
    }

    String rest() {
        return string.substring(position);
    }

    void skipWhiteSpace() {
        while (hasNext()) {
            if (isSpace(string.charAt(position)))
                position++;
            else
                break;
        }
    }

    int nextUnsignedNumber() {
        if (!hasNext())
            return -1;
        if (!isDigit(peek()))
            return -1;
        int num = next() - '0';
        while (hasNext() && isDigit(peek()))
            num = (10 * num) + (next() - '0');
        return num;
    }

    boolean consume(String substring) {
        if (position + substring.length() > string.length())
            return false;
        int mark = position;
        for (int i = 0; i < substring.length(); i++) {
            if (substring.charAt(i) != string.charAt(position)) {
                position = mark; // reset
                break;
            }
            position++;
        }
        return position - mark == substring.length();
    }

    String substring(int beg, int end) {
        return string.substring(beg, end);
    }

    void seek(int position) {
        this.position = position;
    }

    boolean nextIf(char c) {
        if (peek() == c) {
            next();
            return true;
        }
        return false;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append(string, 0, position);
        sb.append('|');
        sb.append(string.substring(position));
        return sb.toString();
    }
}
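As a rough illustration of how a cursor like `CharIter` might be used, the sketch below reads the leading numeric id from a V3000-style line. `CursorSketch` and `leadingId` are hypothetical names introduced here. The class is a cut-down stand-in written for this example, not the package-private `CharIter` itself, and the `M  V30 ` prefix handling is an assumption about how a caller would drive it.

```java
// Hypothetical stand-in for the package-private CharIter; only the
// methods needed for this illustration are mirrored here.
final class CursorSketch {
    private final String string;
    private int position = 0;

    CursorSketch(String str) {
        this.string = str;
    }

    boolean hasNext() {
        return position < string.length();
    }

    char peek() {
        return hasNext() ? string.charAt(position) : '\0';
    }

    void skipWhiteSpace() {
        while (hasNext() && string.charAt(position) == ' ')
            position++;
    }

    // Returns -1 when the cursor is not on a digit, mirroring CharIter.
    int nextUnsignedNumber() {
        if (!hasNext() || peek() < '0' || peek() > '9')
            return -1;
        int num = 0;
        while (hasNext() && peek() >= '0' && peek() <= '9')
            num = 10 * num + (string.charAt(position++) - '0');
        return num;
    }

    // Advances past the prefix only if it matches completely.
    boolean consume(String prefix) {
        if (!string.startsWith(prefix, position))
            return false;
        position += prefix.length();
        return true;
    }

    // Reads the leading id of a V3000-style line, or -1 if the line
    // does not start with the assumed "M  V30 " prefix and a number.
    static int leadingId(String line) {
        CursorSketch it = new CursorSketch(line);
        if (!it.consume("M  V30 "))
            return -1;
        it.skipWhiteSpace();
        return it.nextUnsignedNumber();
    }

    public static void main(String[] args) {
        System.out.println(leadingId("M  V30 12 C 0 0 0 0"));
        System.out.println(leadingId("$RFMT"));
    }
}
```

Returning `-1` instead of throwing keeps the caller in control of how a malformed line is reported, which seems to match the style of `nextUnsignedNumber` above.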
@@ -52,6 +52,7 @@
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Iterator;
@@ -160,6 +161,38 @@ private static final class ReadState {
boolean chiral;
Map<Integer,Integer> stereoflags = null;
final Map<IAtom,Integer> stereo0d = new HashMap<>();

// atom/bond ids need not be sequential, we could use map but more
// common is the ids will be sequential
IAtom[] atomById = new IAtom[64];
IBond[] bondById = new IBond[64];

<T> T[] grow(T[] arr, int req)
{
int cap = arr.length;
// shift parenthesised: '+' binds tighter than '>>', so cap + cap >> 1 is just cap
return Arrays.copyOf(arr, Math.max(cap + (cap >> 1),
req + 1));
}

void addAtom(int id, IAtom atom) {
if (id >= atomById.length)
atomById = grow(atomById, id);
atomById[id] = atom;
}

void addBond(int id, IBond bond) {
if (id >= bondById.length)
bondById = grow(bondById, id);
bondById[id] = bond;
}

public IAtom getAtom(int i) {
return atomById[i];
}

public IBond getBond(int i) {
return bondById[i];
}
}

public IAtomContainer readMolecule(IChemObjectBuilder builder) throws CDKException {
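The id-indexed arrays in `ReadState` can be sketched in isolation as follows. `IdTableSketch`, `put`, `get`, and `capacity` are hypothetical names, not part of the reader. The sketch assumes a growth factor of about 1.5 was intended and parenthesises the shift, since Java evaluates `cap + cap >> 1` as `(cap + cap) >> 1`. The bounds-checked `get` is also an assumption, added so that unknown ids return `null` instead of throwing.

```java
import java.util.Arrays;

// Hypothetical sparse lookup table keyed by a non-negative id.
final class IdTableSketch {
    private Object[] slots = new Object[4];

    // Grow by roughly 1.5x, but always far enough to hold index req.
    private static Object[] grow(Object[] arr, int req) {
        int cap = arr.length;
        return Arrays.copyOf(arr, Math.max(cap + (cap >> 1), req + 1));
    }

    void put(int id, Object value) {
        if (id < 0)
            throw new IllegalArgumentException("negative id: " + id);
        if (id >= slots.length)
            slots = grow(slots, id);
        slots[id] = value;
    }

    // Unknown or out-of-range ids yield null rather than an exception.
    Object get(int id) {
        return id >= 0 && id < slots.length ? slots[id] : null;
    }

    int capacity() {
        return slots.length;
    }

    public static void main(String[] args) {
        IdTableSketch table = new IdTableSketch();
        // ids need not be sequential
        table.put(1, "C");
        table.put(10, "O");
        table.put(20, "N");
        System.out.println(table.get(10));
        System.out.println(table.get(2));
    }
}
```

An array suits the common case where ids are dense and small. For inputs with very large, sparse ids a `Map<Integer, T>` would use less memory, at the cost of boxing on every lookup.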
@@ -190,9 +223,9 @@ public IAtomContainer readConnectionTable(IChemObjectBuilder builder) throws CDK
} else if ("BEGIN ATOM".equals(command)) {
readAtomBlock(state);
} else if ("BEGIN BOND".equals(command)) {
-readBondBlock(readData);
readBondBlock(state);
} else if ("BEGIN SGROUP".equals(command)) {
-readSGroup(readData);
readSGroup(state);
} else if ("BEGIN COLLECTION".equals(command)) {
readCollection(state);
} else {
@@ -277,9 +310,9 @@ private void finalizeStereochemistry(ReadState state, IAtomContainer readData) {
if (se.getConfigClass() != IStereoElement.TH)
continue;
IAtom focus = (IAtom) se.getFocus();
-int idx = readData.indexOf(focus);
-if (idx < 0)
if (focus.getID() == null)
continue;
int idx = Integer.parseInt(focus.getID());
Integer grpinfo = state.stereoflags.get(idx);
if (grpinfo != null)
se.setGroupInfo(grpinfo);
@@ -372,7 +405,7 @@ private void parseStereoGroup(Map<Integer,Integer> flags, String str, int type)
}
// val-1 since we store atom index instead of atom number
if (val > 0)
-flags.put(val-1, type);
flags.put(val, type);
while (i < len && str.charAt(i) == ' ')
i++;
if (i < len && str.charAt(i) == ')')
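Because the flags are now keyed by atom id rather than by zero-based index, a collection list can be stored without the `- 1` shift. A minimal sketch, assuming the V3000 list form `(count id id ...)`; `StereoGroupSketch`, `parse`, and the integer `type` value are hypothetical, and this is not the reader's own `parseStereoGroup`:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for a V3000-style atom list such as "(3 1 5 9)".
final class StereoGroupSketch {

    // The first number is the entry count (not validated in this sketch);
    // the remaining numbers are atom ids. Each id is stored as-is, not as
    // id - 1, so later lookups must also use the id.
    static Map<Integer, Integer> parse(String list, int type) {
        Map<Integer, Integer> flags = new LinkedHashMap<>();
        String body = list.trim();
        if (!body.startsWith("(") || !body.endsWith(")"))
            throw new IllegalArgumentException("not a list: " + list);
        String[] tokens = body.substring(1, body.length() - 1).trim().split("\\s+");
        for (int i = 1; i < tokens.length; i++) // tokens[0] is the count
            flags.put(Integer.parseInt(tokens[i]), type);
        return flags;
    }

    public static void main(String[] args) {
        // type 2 is an arbitrary placeholder value for this illustration
        System.out.println(parse("(3 1 5 9)", 2));
    }
}
```

Keying by id means the consumer has to recover the same id from the atom, which is what the `Integer.parseInt(focus.getID())` lookup in `finalizeStereochemistry` does.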
@@ -455,6 +488,7 @@ public void readAtomBlock(ReadState state) throws CDKException {

int RGroupCounter = 1;
int Rnumber;
String id;
String[] rGroup;

boolean foundEND = false;
@@ -469,7 +503,7 @@ public void readAtomBlock(ReadState state) throws CDKException {
StringTokenizer tokenizer = new StringTokenizer(command);
// parse the index
try {
-atom.setID(tokenizer.nextToken());
id = tokenizer.nextToken();
} catch (Exception exception) {
String error = "Error while parsing atom index";
logger.error(error);
@@ -607,7 +641,9 @@ public void readAtomBlock(ReadState state) throws CDKException {
}

// store atom
atom.setID(id);
readData.addAtom(atom);
state.addAtom(Integer.parseInt(id), readData.getAtom(readData.getAtomCount()-1));
logger.debug("Added atom: " + atom);
}
}
@@ -616,7 +652,8 @@ public void readAtomBlock(ReadState state) throws CDKException {
/**
* Reads the bond atoms, order and stereo configuration.
*/
-public void readBondBlock(IAtomContainer readData) throws CDKException {
public void readBondBlock(ReadState state) throws CDKException {
IAtomContainer readData = state.mol;
logger.info("Reading BOND block");
boolean foundEND = false;
while (isReady() && !foundEND) {
@@ -657,7 +694,7 @@ public void readBondBlock(IAtomContainer readData) throws CDKException {
try {
String indexAtom1String = tokenizer.nextToken();
int indexAtom1 = Integer.parseInt(indexAtom1String);
-IAtom atom1 = readData.getAtom(indexAtom1 - 1);
IAtom atom1 = state.getAtom(indexAtom1);
bond.setAtom(atom1, 0);
} catch (Exception exception) {
String error = "Error while parsing index atom 1 in bond";
@@ -669,7 +706,7 @@ public void readBondBlock(IAtomContainer readData) throws CDKException {
try {
String indexAtom2String = tokenizer.nextToken();
int indexAtom2 = Integer.parseInt(indexAtom2String);
-IAtom atom2 = readData.getAtom(indexAtom2 - 1);
IAtom atom2 = state.getAtom(indexAtom2);
bond.setAtom(atom2, 1);
} catch (Exception exception) {
String error = "Error while parsing index atom 2 in bond";
@@ -726,6 +763,8 @@ public void readBondBlock(IAtomContainer readData) throws CDKException {

// storing bond
readData.addBond(bond);
state.addBond(Integer.parseInt(bond.getID()),
readData.getBond(readData.getBondCount()-1));

// storing positional variation
if ("ANY".equals(attach)) {
@@ -750,7 +789,8 @@ public void readBondBlock(IAtomContainer readData) throws CDKException {
/**
* Reads labels.
*/
-public void readSGroup(IAtomContainer readData) throws CDKException {
public void readSGroup(ReadState state) throws CDKException {
IAtomContainer readData = state.mol;
boolean foundEND = false;
while (isReady() && !foundEND) {
String command = readCommand(readLine());
@@ -788,13 +828,13 @@ public void readSGroup(IAtomContainer readData) throws CDKException {
StringTokenizer atomsTokenizer = new StringTokenizer(value);
int nExpected = Integer.parseInt(atomsTokenizer.nextToken());
while (atomsTokenizer.hasMoreTokens()) {
-sgroup.addAtom(readData.getAtom(Integer.parseInt(atomsTokenizer.nextToken()) - 1));
sgroup.addAtom(state.getAtom(Integer.parseInt(atomsTokenizer.nextToken())));
}
} else if (key.equals("XBONDS")) {
StringTokenizer xbonds = new StringTokenizer(value);
int nExpected = Integer.parseInt(xbonds.nextToken());
while (xbonds.hasMoreTokens()) {
-sgroup.addBond(readData.getBond(Integer.parseInt(xbonds.nextToken()) - 1));
sgroup.addBond(state.getBond(Integer.parseInt(xbonds.nextToken())));
}
} else if (key.equals("LABEL")) {
label = value;