Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to SBC in KFF (incl. HashPlace) #368

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ukengineer925
Copy link
Collaborator

Removes complexity within HashPlace/KffDataObjectHandler to just use the SBC path. Adds additional tests to confirm behaviour and functionality.

@ukengineer925 ukengineer925 added the needs testing This change may necessitate further testing label Dec 12, 2022
@cfkoehler
Copy link
Collaborator

We are going to need to see the performance impact of this at full scale. My gut feeling is that using a SBC here will be wasteful and impact throughput.

@ukengineer925 ukengineer925 force-pushed the remove-byte-array-path-from-hash-place branch from 5187a60 to 4a1693e Compare December 13, 2022 16:06
@jdcove2
Copy link
Collaborator

jdcove2 commented Dec 13, 2022

PR#204 ("Large data support") only calculated the hashes using the SeekableByteChannel. PR#204 spent months being tested on the development system with no observable performance impact by either myself or anyone else on the team. It was not until PR#362 ("KFFHashPlace should prefer byte[] to SeekableByteChannelFactory") that this behavior was changed to calculating hashes using the byte[]. Therefore, I believe the performance impact of exclusively calculating the hashes using the SeekableByteChannel has already been shown to be negligible.

@ukengineer925 ukengineer925 force-pushed the remove-byte-array-path-from-hash-place branch from 4a1693e to c8616a9 Compare December 13, 2022 17:01
@cfkoehler
Copy link
Collaborator

cfkoehler commented Dec 13, 2022

@jdcove2 I hear your points but it has not been tested at full scale. You quote no observable performance differences. Do you have any data to back up those claims?

But as a side note. Emissary is a configuration-driven framework. Why should users of emissary be forced to use SBC if they don't want to?

I will defer to senior devs to make the decisions on this going forward.

Copy link
Collaborator

@drivenflywheel drivenflywheel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple minor changes suggested. None are blockers.

drivenflywheel
drivenflywheel previously approved these changes Dec 13, 2022
@jdcove2
Copy link
Collaborator

jdcove2 commented Dec 13, 2022

Given drivenflywheel's comment in PR#362 that "PRs work great for discussion, and they leave a much better historical record than a verbal discussion", it seems that this is the appropriate place to discuss your above concerns.

Therefore, I have the following questions...

  1. If testing on the non-production system is insufficient to be "tested at full scale", then what is sufficient?
  2. If BDO's can now hold data that is significantly larger than what a Java byte array can hold using a SeekableByteChannel, then why would we want to keep the byte array interface on the BDO knowing that it might return only a small subset of the actual data?

@cfkoehler
Copy link
Collaborator

Thanks for the extra details Dave. As for the full scale I personally would have liked to see a statistical analysis of the performance on a full scale system. That can be interpolated from a smaller system with a different resource profile.

But it appears that the team is ok accepting this how it is. And not allowing for it to be configurable so I will accept that moving forward.

@jdcove2
Copy link
Collaborator

jdcove2 commented Dec 14, 2022

From my questions above...

  1. It sounds like you have specific performance information you would like to see. I can work with you to produce this performance information. Based on what you said above, please let me know what metrics you want collected, what statistical analysis you want performed and what system you would like the metrics gathered on.

  2. In addition to what I said in my question 2 above, every data representation (e.g. byte array, SeekableByteChannel, InputStream, etc.) on the BDO will need to be supported by every place that uses BDO data. That is, if the BDO supports more than one data representation then each place would need to determine which data representation is being used and then process that data representation appropriately. This is why I understand the goal to be staying with one data representation by migrating from byte array to SeekableByteChannel. Eventually the byte array data methods will be removed from the BDO.

  3. The current code in this PR is configurable only because the BDO currently supports both byte array and SeekableByteChannel data representations during the transition from byte array to SeekableByteChannel. All code that uses byte arrays from the BDO will need to be converted to using SeekableByteChannels from the BDO. This PR addresses the code that manages hashes on the BDO and it has code for both data representations. Therefore, this PR is just eliminating the byte array code since the byte array data representation will be eventually removed from the BDO.

@ukengineer925 ukengineer925 force-pushed the remove-byte-array-path-from-hash-place branch 2 times, most recently from c4af020 to 521ae39 Compare December 20, 2022 10:33
@ukengineer925 ukengineer925 force-pushed the remove-byte-array-path-from-hash-place branch from 521ae39 to 7b695da Compare January 3, 2023 15:51
@ukengineer925 ukengineer925 force-pushed the remove-byte-array-path-from-hash-place branch from 7b695da to 9ec9a5f Compare January 18, 2023 19:49
@ukengineer925 ukengineer925 force-pushed the remove-byte-array-path-from-hash-place branch 3 times, most recently from 944d558 to ef575a3 Compare February 3, 2023 17:22
@ukengineer925 ukengineer925 force-pushed the remove-byte-array-path-from-hash-place branch from ef575a3 to ea8a568 Compare March 3, 2023 18:14
@ukengineer925 ukengineer925 force-pushed the remove-byte-array-path-from-hash-place branch from ea8a568 to e25a082 Compare March 6, 2023 18:12
@jdcove2 jdcove2 added tested This change was tested out-of-band and removed needs testing This change may necessitate further testing labels Mar 15, 2023
@cfkoehler
Copy link
Collaborator

@ukengineer925 @jdcove2
I think we need more discussions on this approach. Maybe it will be discussed at our meeting on Friday.
My main concern is the removal of being able to use KFF without sbc.
We should not force all emissary users to use sbc.
I know of use cases where even if you get a large file the hash from a truncated input is useful.
Or we might have flavors of emissary that by design never get large files so why should they have to use sbc in places that don't need them.

This might require some rollback of previous work but a more acceptable and easier to integrate approach would be to have a new KFF hash place or handler that uses sbc.
Then a running server can be configured to use one or the other. Or maybe in the world of testing or a unique use case use both. The framework is very configurable for places and routing. We should leverage it more.

And maybe something changes later in time and it is realized that the old non sbc KFF hash place is never used and not needed then it can go through the deprecation and removal process.

@ukengineer925
Copy link
Collaborator Author

@cfkoehler - I've added a convenience method to allow users to pass in a byte array of data if they really don't want to create their own SBCF (though as you'll see, there's a neat helper method to do that). I don't think maintaining two copies of KffDataObjectHandler, or two paths within one class, is optimal hence this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tested This change was tested out-of-band
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants