Aggregate a plain non-synthetic dataset for Bio sequences #91
Labels: good first issue (Good for newcomers)
ashvardanian added a commit that referenced this issue on Feb 13, 2024:
Requesting more dataset contributions #91
ashvardanian pushed a commit that referenced this issue on Feb 15, 2024:
# [3.1.0](v3.0.0...v3.1.0) (2024-02-15)

### Add

* `sz_isascii` and UTF8 Levenshtein distance ([a0962fb](a0962fb))
* 32-bit support with CPython ([253a3c1](253a3c1))
* Big-endian support ([b126fab](b126fab))
* Levenshtein & NW score for Rust (#89) ([663a633](663a633)), closes [#89](#89)
* Macro SZ_NULL_CHAR, Clang-CL intrinsics. (#88) ([dee90bb](dee90bb)), closes [#88](#88)
* serial clz/ctz for Win32 ([c968337](c968337))

### Docs

* sectioning contribution guide ([cf6ced0](cf6ced0)), closes [#91](#91)

### Fix

* Clamping bounded Levenshtein ([69892fb](69892fb))
* Memory leak in macro ([c88a72a](c88a72a))

### Improve

* Port to `arm32v7` 32-bit arch ([4acf3b7](4acf3b7))

### Make

* `cibuildwheel.overrides` over custom scripts ([6d8c586](6d8c586))
* Clear root directory ([7497c96](7497c96))
* Constrain workflow names ([079f111](079f111))
* Disable all CI versioning ([a55d227](a55d227))
* Drop NumPy dependency ([c56239e](c56239e))
* Fix implicit `malloc` declaration ([f7761be](f7761be))
* Infer big-endian in CMake/setup.py ([72453c6](72453c6))
* Keywords for crates.io ([8d237a6](8d237a6))
* Overwrite packs with same name ([0642318](0642318))
* Packing CIBuildWheels for all archs ([49bee70](49bee70))
* Parallel wheels compilation ([0f5a946](0f5a946))
* Upgrade GitHub CI ([cd424ca](cd424ca))
* Upgrade Python CI ([4f1bf43](4f1bf43))
* Use QEMU for Linux wheels ([ac4556a](ac4556a))
ashvardanian changed the title from "Aggregate a plain non-synthetic dataset for protein sequences" to "Aggregate a plain non-synthetic dataset for Bio sequences" on Apr 27, 2024.
For fair benchmarks of Needleman-Wunsch scoring algorithms, we should find a real-world protein bank and, ideally, export it into a whitespace- or newline-delimited `.txt` file that is easy to parse not only in Python but also in C++. Community contributions are more than welcome 🤗
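A minimal sketch of the export step in Python. It assumes the protein bank is distributed as FASTA (`>` header lines followed by possibly wrapped sequence lines), which is a common format for sequence banks but is not stated in this issue. The helper names `iter_fasta_sequences`, `write_newline_delimited`, and `read_newline_delimited` are hypothetical and not part of any existing project API. The sketch joins wrapped lines, drops headers, and writes one uppercase sequence per line:

```python
"""Hypothetical sketch: flatten a FASTA protein bank into a newline-delimited .txt file.

Assumption: input is FASTA, i.e. '>' header lines followed by sequence lines
that may be wrapped. Records with no residues are skipped.
"""
from typing import Iterable, Iterator, List, TextIO


def iter_fasta_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield each record's sequence with line wraps removed; headers are dropped."""
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if it had residues.
            if chunks:
                yield "".join(chunks)
                chunks = []
        else:
            chunks.append(line.upper())  # normalize residue letters to uppercase
    if chunks:
        yield "".join(chunks)  # flush the last record


def write_newline_delimited(sequences: Iterable[str], out: TextIO) -> int:
    """Write one sequence per line and return how many were written."""
    count = 0
    for seq in sequences:
        out.write(seq + "\n")
        count += 1
    return count


def read_newline_delimited(path: str) -> List[str]:
    """Load the flat file back; splitlines() also tolerates Windows line endings."""
    with open(path, encoding="ascii") as f:
        return [s for s in f.read().splitlines() if s]
```

Dropping the headers keeps the output trivially parseable, since sequences contain no whitespace; if identifiers are needed later, they could be written to a parallel file with the same line order. On the C++ side, such a file can be read line by line with `std::getline`.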