Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Chunk Operator #1583

Closed
wants to merge 2 commits into from
Closed

Conversation

JasonPaulGithub
Copy link
Contributor

@JasonPaulGithub JasonPaulGithub commented Apr 8, 2024

Fix for Issue : #1487

I have created a Chunk Operator, aka Chop

  • Chop will chunk the file and return the following fields:
    head_chunk_address: ChunkAddress,
    data_map: Chunk,
    file_size: u64,
    data_map_in_chunks: Vec<(XorName, PathBuf)>,
  • Chop will then encrypt data based on the file size. As you will have three separate outcomes requiring the same generic operation, this is handled using basic polymorphism.

  • Chop will remove the need for the boolean flag 'data_map_in_chunks', and instead simply use either: build() or build_with_data_map_in_chunks()

  • Chop code is encapsulated inside its own .rs file and has been removed from the FilesApi code.

  • Chop makes use of the builder design pattern.

@JasonPaulGithub JasonPaulGithub self-assigned this Apr 8, 2024
@JasonPaulGithub JasonPaulGithub added Enhancement New feature or request Help Needed Extra attention is needed Medium Medium sized PR Tests Refactor labels Apr 8, 2024
@JasonPaulGithub JasonPaulGithub marked this pull request as ready for review April 9, 2024 01:17
@JasonPaulGithub JasonPaulGithub removed the Help Needed Extra attention is needed label Apr 9, 2024
@JasonPaulGithub
Copy link
Contributor Author

JasonPaulGithub commented Apr 9, 2024

Probably too late for this PR but if you are working more in this area, is there a chance you can add the file path (not just name as it is now) to the ChunkManager uploaded files tracking? More in this issue #1576

I want to use the ChunkManager and to be able to access the xor address and full path of each uploaded file, not just the filenames.

#1528 (comment)

@RolandSherwin

Comment on lines +24 to +26
file_size: u64,
data_map_chunk: Chunk,
chunk_vec: Vec<(XorName, PathBuf)>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we store the file_size, data_map_chunk and chunk_vec here? During new() they get default values and they're directly copied into ChunkOperator after chunking.

Comment on lines +13 to +18
pub struct ChunkOperator {
pub head_chunk_address: ChunkAddress,
pub data_map: Chunk,
pub file_size: u64,
pub chunk_vec: Vec<(XorName, PathBuf)>,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the result of the BuildChunkOperator? If so, can we state that in the docs? I think the docs is misleading. Maybe ChunkOperatorResult would work here? We use the convention in our codebase.


output_file
.write_all(&self.data_map_chunk.value)
.expect("error writing all");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might have to return proper errors here. And cannot use expect which can panic.

.chunk_dir
.join(hex::encode(*self.data_map_chunk.name()));
let mut output_file =
File::create(data_map_path.clone()).expect("error creating output file");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot use expect here as it can panic. We should reutrn proper error types.

chunks_paths
}

pub fn build(self, file_path: &Path, chunk_dir: &Path) -> ChunkOperator {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code here, I think the rust builder pattern is not used here. We do not configure a lot of things and instead the 3 required args are passed via the build fn and with the use of two build_* fns. Why should we go with builder here instead of having something like ChunkOperator with execute(file_path, chunk_dir) / execute_with_data_map_in_chunks(file_path, chunk_dir)? I

@JasonPaulGithub JasonPaulGithub deleted the chop branch April 18, 2024 12:58
@JasonPaulGithub JasonPaulGithub removed their assignment May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New feature or request Medium Medium sized PR Refactor Tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants