Scrape FIPS algorithm data #276

Open
J08nY opened this issue Oct 26, 2022 · 2 comments
Assignees
Labels
enhancement New feature or request fips Related to FIPS 140 certification

Comments

@J08nY
Member

J08nY commented Oct 26, 2022

Initial description by @J08nY

The data in the FIPS algorithm dataset is not fully mined or used. We can follow the links to the individual algorithm pages and extract more data from them. This can also help with certificate ID cleanup, where we want to get rid of the algorithm references.

Details

Currently, the FIPSAlgorithm object is built from the rows of a pandas DataFrame that is constructed only from the list of algorithms:

df = pd.read_html(html_path)[0]

This table does not include the valuable attributes available on each algorithm's individual page. The proposed enhancement should:

class FIPSAlgorithm(PandasSerializableType, ComplexSerializableType):

Further guidance

The pipeline stage that processes the algorithm dataset can be run in isolation:

from sec_certs.dataset.fips_algorithm import FIPSAlgorithmDataset

alg_dset = FIPSAlgorithmDataset.from_web()
alg_dset.to_json("/path/to/some/file.json")

The PR implementing this enhancement should modify the parse_algorithms_from_html method.
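
As a rough illustration of the first step, the sketch below keeps the link of each table row alongside its cell texts, so that the individual algorithm pages can be visited later. It is not the project's code: the `TableLinkParser` class, the sample HTML, and the column names and URLs in it are all made up for this example, and it uses only the standard library. `pd.read_html` drops link targets by default; recent pandas versions also accept an `extract_links` argument that may serve the same purpose.

```python
from html.parser import HTMLParser


class TableLinkParser(HTMLParser):
    """Collect the cell texts and link targets of every data row in a table."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one dict per <tr> that contains <td> cells
        self._cells = None    # cell texts of the row currently being parsed
        self._links = None    # hrefs found in the row currently being parsed
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._links = [], []
        elif tag == "td" and self._cells is not None:
            self._in_cell = True
            self._cells.append("")
        elif tag == "a" and self._in_cell:
            href = dict(attrs).get("href")
            if href:
                self._links.append(href)

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            # Header rows only contain <th>, so they produce no cells and are skipped.
            if self._cells:
                self.rows.append(
                    {"cells": [c.strip() for c in self._cells], "links": self._links}
                )
            self._cells = self._links = None


# Hypothetical stand-in for the algorithm list page; the real layout may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Vendor</th><th>Implementation</th><th>Validation Number</th></tr>
  <tr><td>Example Vendor</td><td>Example Lib</td><td><a href="/details?validation=1">AES 1</a></td></tr>
  <tr><td>Other Vendor</td><td>Other Lib</td><td><a href="/details?validation=2">SHS 2</a></td></tr>
</table>
"""

parser = TableLinkParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows
```

Each entry in `rows` then carries the link that would have to be followed to scrape the extra attributes from the algorithm's own page.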

@J08nY J08nY added the enhancement New feature or request label Oct 26, 2022
@J08nY J08nY self-assigned this Oct 27, 2022
@J08nY J08nY added the fips Related to FIPS 140 certification label Nov 9, 2022
@adamjanovsky
Collaborator

Just FYI, the current state of #275 is that I've refactored the building of the FIPS algorithm dataset. The certificates only store the algorithm identifiers as strings and are not connected anywhere to the respective objects. So once we improve the algorithm scraping, we could connect these two datasets as well.
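
A minimal sketch of what connecting the two datasets could look like, assuming the identifier strings stored in the certificates can be used directly as lookup keys. The `Algorithm` dataclass, its fields, the identifier format, and both helper functions are hypothetical and do not reflect the actual sec-certs classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Algorithm:
    # Hypothetical fields; the real FIPSAlgorithm object may look different.
    identifier: str
    vendor: str


def build_index(algorithms):
    """Map each identifier string to its algorithm object."""
    return {alg.identifier: alg for alg in algorithms}


def resolve(identifiers, index):
    """Split identifier strings into resolved objects and unknown strings."""
    resolved = [index[i] for i in identifiers if i in index]
    unresolved = [i for i in identifiers if i not in index]
    return resolved, unresolved


# Made-up identifiers, for illustration only.
index = build_index([Algorithm("AES#1", "Example Vendor"), Algorithm("SHS#2", "Other Vendor")])
resolved, unresolved = resolve(["AES#1", "DRBG#9"], index)
```

Identifiers that do not resolve stay visible in `unresolved`, which makes it easy to see how much of the certificate data the scraped algorithm dataset actually covers.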

@adamjanovsky
Collaborator

adamjanovsky commented May 2, 2024

@Julik24 this is the task that we've discussed today. Before attempting to contribute, please be sure to go through https://sec-certs.org/docs/contributing.html, especially the Quality Assurance section. The typical development workflow is described at https://docs.github.com/en/get-started/using-github/github-flow.

Please assign yourself to the issue once you accept the invitation.

Development

No branches or pull requests

3 participants