Duplicate expansion support #419

Merged: 9 commits, May 13, 2024

Commits on Mar 19, 2024

  1. 77860bb
  2. Add default shard number

    ljarosch committed Mar 19, 2024
    e678050

Commits on Mar 20, 2024

  1. Add duplicate chain file support to alignment DB script

    This makes it more straightforward to create an alignment database directly from the flattened RODA downloads
    ljarosch committed Mar 20, 2024
    ee0c5db
  2. Add script for expanding the alignment dir with duplicates

    This adds support for duplicate chain expansion for the alignment dir format. This script can be run on the flattened non-redundant RODA alignments to add explicit directories for all of the duplicate chains in the duplicate_chains file, symlinked to their representative chain alignment directory.
    ljarosch committed Mar 20, 2024
    94819bf
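
The duplicate expansion described in the commit above can be illustrated with a minimal sketch. This is not the script from the PR. It assumes a duplicate_chains file in which each line is a whitespace-separated list of identical chains with the representative listed first, and an alignment dir with one subdirectory per chain ID. The function name `expand_alignment_dir` and both assumptions are hypothetical.

```python
import os
from pathlib import Path


def expand_alignment_dir(alignment_dir, duplicate_chains_file):
    """Symlink each duplicate chain to its representative's alignment directory.

    Assumed (hypothetical) input format: one group per line, chain IDs
    separated by whitespace, representative chain first.
    Returns the list of chain IDs for which a symlink was created.
    """
    alignment_dir = Path(alignment_dir)
    created = []
    with open(duplicate_chains_file) as f:
        for line in f:
            chains = line.split()
            if len(chains) < 2:
                continue  # no duplicates listed for this chain
            representative, duplicates = chains[0], chains[1:]
            if not (alignment_dir / representative).is_dir():
                continue  # no alignments exist for the representative
            for dup in duplicates:
                link = alignment_dir / dup
                if link.exists() or link.is_symlink():
                    continue  # never overwrite an existing entry
                # Relative target, so the directory tree can be moved as a whole.
                os.symlink(representative, link, target_is_directory=True)
                created.append(dup)
    return created
```

Using a relative link target keeps the expanded directory relocatable, and skipping existing entries makes the sketch safe to run more than once on the same directory.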

Commits on May 6, 2024

  1. Add more efficient script to generate all-seqs FASTA

    The previous data_dir_to_fasta.py script is very slow and requires fully reparsing mmCIF files. This new script is much faster and uses the sequence information from the alignment data instead. Note that this will not include chains for which alignments could not be generated, but we can't use those during training anyways.
    ljarosch committed May 6, 2024
    e2479cb
  2. 0b5c949
  3. Slightly improve comment

    ljarosch committed May 6, 2024
    244970b
  4. 78b9706
  5. Improve import formatting

    ljarosch committed May 6, 2024
    04410d5
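
The idea behind the all-seqs FASTA commit above, reading sequences from the alignment data instead of reparsing mmCIF files, might look roughly like the following sketch. It is not the script from the PR. It assumes each chain has its own subdirectory containing at least one `.a3m` file whose first record is the query sequence. The function names and that layout are hypothetical.

```python
from pathlib import Path


def read_query_sequence(a3m_path):
    """Return the sequence of the first record in an A3M file."""
    seq_lines = []
    seen_header = False
    with open(a3m_path) as f:
        for line in f:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # second record reached, query is complete
                seen_header = True
            elif seen_header and line:
                seq_lines.append(line)
    return "".join(seq_lines)


def alignment_dir_to_fasta(alignment_dir, output_fasta):
    """Write one FASTA record per chain directory that has an A3M file.

    Chains without alignments are skipped, matching the caveat in the
    commit message. Returns the number of records written.
    """
    n_written = 0
    with open(output_fasta, "w") as out:
        for chain_dir in sorted(Path(alignment_dir).iterdir()):
            if not chain_dir.is_dir():
                continue
            a3m_files = sorted(chain_dir.glob("*.a3m"))
            if not a3m_files:
                continue  # no alignment available for this chain
            seq = read_query_sequence(a3m_files[0])
            if seq:
                out.write(f">{chain_dir.name}\n{seq}\n")
                n_written += 1
    return n_written
```

Because `Path.is_dir()` follows symlinks, chain directories that are symlinked to a representative would also receive a record under their own chain ID in this sketch.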