Proposal for Unified Pluggable Scanner API #164

prahaladdarkin · 2021-05-23T19:02:12Z

Scope of change:

Introduced the Unified Pluggable Scanner API proposal based on the discussions in the Harbor Scanner Workgroup.

Signed-off-by: prahaladd prahaladd@vmware.com

danielpacak

👋 @prahaladdarkin This is great start. I left a few comments / asks for clarification. Overall I'd focus more on onboarding new tools and keep API structures as close to v1.1 as possible without breaking compatibility.

Adding more examples of a hypothetical adapter that supports both vulnerability scan and license information, including happy path and partial errors would be very helpful to understand the end to end integration.

danielpacak · 2021-06-04T21:20:38Z

proposals/pluggable-image-scanning-api-v2.md

@@ -0,0 +1,177 @@
+
+# Proposal: Unified API for Pluggable Image Scanners


Not sure if this title suggests the ultimate goal of this proposal.

IIRC we wanted to leverage API v1.1 which is already defined as a minimal set of API endpoints and leverages MIME types. IMO the main focus should be around new MIME types and new scanners / security tools to onboard incrementally

@danielpacak The reason I thought about this title is because of the fact that we are presenting unified endpoints for scanning different types of data. The details about the MIME types being supported are still subject to discussion with respect to the names to be used etc. The importance of these MIME types has been highlighted in the section that describes how the client can retrieve the various types of results and also in the section Supporting Multiple Mime Types.
We can elaborate on these MIME types if required

danielpacak · 2021-06-04T21:28:43Z

proposals/pluggable-image-scanning-api-v2.md

+           "repository": "library/mongo",
+           "digest": "sha256:917f5b7f4bef1b35ee90f03033f33a81002511c1e0767fd44276d4bd9cd2fa8e"
+         }
+         "scanTypes": {


If the scanType property is a subset of produced MIME types advertised by the /metadata endpoint maybe we can think about different / consistent naming here. More importantly, I'd use an array of objects instead of a map with MIME types as a key to enable further extensibility:

{ "scanTypes": [ { "mimeType": "application/vnd.security.vulnerability.report; version=1.1", "params": {} }, { "mimeType": "application/vnd.security.bom.report; version=1.0", "params": { "allowGNU": false, "format": "spdx" }, "propertyAddedInTheFuture": {} } ] }

I'd also justify why combining all mime types in a single request is better than multiple scan requests per mime type.

I agree with the MIME type naming part. perhaps we can all agree upon some well defined names. These could also be specified within the OCI spec that @steven-zou started.

Additionally, I am in fact using an array of objects to specify scan types. The value of the mimeType field is used to activate a particular type of scanner. However, the activation can be controlled using the params object that can have scanner specific parameters or settings for fine-tuning.
With respect to the specification of MIME types in a request, it is up to the client to decide the data fetch mechanism. For e.g. the above structure allows the clients to make multiple API invocation each time specifying a single MIME type in the request. Another advanced client can on the other hand specify multiple MIME types and wait for the results of all being available. Hence this provides the required flexibility to the client who can decide the best approach to handle scan requests based on it's capabilities. I will add this part to the proposal

danielpacak · 2021-06-04T21:35:13Z

proposals/pluggable-image-scanning-api-v2.md

+
+The response from the scanner are packaged in an uber-container structure again keyed by the MIME type. The object against each MIME type would contain the following
+   - Internal scan tracking identifier for the composite scan.
+   - The scan id and status for each MIME type. The status would be either one of started, completed, running, aborted, failed


I don't think we need this complexity. It's non backward compatible and requires designing and implementing a workflow to rescan for a particular MIME types in case of partial errors. I believe that most of the time a scanner adapter will focus on one MIME types (e.g. existing vulnerability scanners). For scanners that support multiple MIME types it's probably acceptable to retry all scan types.

Beyond that, scanner adapters will have to maintain composite identifiers to implement state machine for each scan type. We'll have to upgrade all existing adapters as well.

Will give in some more thought to this aspect. Lets discuss this in a meeting.

danielpacak · 2021-06-04T21:43:12Z

proposals/pluggable-image-scanning-api-v2.md

+* Create an uber-container data structure which will contain the scan results keyed by the corresponding MIME types and send the result to client.
+
+### Scan API response handling
+The scanning process would be executing multiple types of scans asynchronously and hence a sophisticated response handling mechanism would required.


In the example above there's one scan request with multiple scanTypes indexed by mime type. On the other hand we'd be tracking multiple scan identifiers embedded in the scan response. Why not sending multiple scan requests and tracking them one by one? Or if we stick to one request with multiple scanTypes we could track all mime types as a whole

Th reason I maintained per scan type based ID is because of the fact that some scans may take longer than expected to complete (with either success or failure). In such cases given that not all scan types finish at the same time we may want the client to start reporting status for each type of scan job separately thereby making it more response as opposed to waiting for all scans to complete. With such a workflow the scan ID would be needed to be per scan type since we would use the existing reporting mechanisms that fetch scan status based on scan identifiers.
If we all agree upon a workflow where-in the client would wait for all the component scans to finish before reporting a success/failure status then we can track it using a single scan Id.

danielpacak · 2021-06-04T21:44:04Z

proposals/pluggable-image-scanning-api-v2.md

+### Scan API response handling
+The scanning process would be executing multiple types of scans asynchronously and hence a sophisticated response handling mechanism would required.
+
+For a given job, the multiplexer component would maintain a mapping between the composite scan Id and the granular status of each of the component scans  as and when received from the scanner. 


For some scanners keeping such granularly and multiple scan identifiers might be overly complex. Especially for existing scanners that support only one mime type.

I will give in some more thoughts to this aspect. Lets discuss it in our meeting

danielpacak · 2021-06-04T21:45:57Z

proposals/pluggable-image-scanning-api-v2.md

+  -  how is the error reported to the caller?
+  -  what is the impact of an error on the overall scan job?
+
+A clean, simple and efficient mechanism to handle errors would be as follows


We could return an array of errors. Each error might have a mimeType or scanType property to indicate which one is failing

That is what exactly I have mentioned below. We will have an array of objects (one per MIME type) and the corresponding results/error data. Did you imply that some other detail was missing?
I will add a response type example for such a situation.

danielpacak · 2021-06-04T21:47:12Z

proposals/pluggable-image-scanning-api-v2.md

+2. Adopts a "what you send is what you receive" model.
+3. Allows clients to control what data they want from the scanner adapter - when the same scanner supports multiple types of scan functionalities
+4. Provides a mechanism to fine tune scanning operation.
+5. Incremental changes on the existing API model exposed by Harbor.


It seems that complex structure of scan identifiers per mime type would required lots of changes in Harbor core / jobservice

Lets discuss this. will also like to know what @steven-zou thinks about this part.
from what I understand - the existing Harbor job service would only have to implement a thin aggregation layer where-in the a mapping of the scan request to it's corresponding scan type jobs are maintained.
The underlying service would still have the same flow. I am however yet to gain some insights in this area and hence any inputs from @steven-zou and @heww would also help clarify

Signed-off-by: prahaladd <prahaladd@vmware.com>

prahaladdarkin

Responded to some review comments with some clarifications. We can discuss in detail in the meeting and then I will make the required changes.

prahaladdarkin · 2021-06-23T06:59:22Z

proposals/pluggable-image-scanning-api-v2.md

@@ -0,0 +1,177 @@
+
+# Proposal: Unified API for Pluggable Image Scanners


@danielpacak The reason I thought about this title is because of the fact that we are presenting unified endpoints for scanning different types of data. The details about the MIME types being supported are still subject to discussion with respect to the names to be used etc. The importance of these MIME types has been highlighted in the section that describes how the client can retrieve the various types of results and also in the section Supporting Multiple Mime Types.
We can elaborate on these MIME types if required

prahaladdarkin · 2021-06-23T07:11:55Z

proposals/pluggable-image-scanning-api-v2.md

+           "repository": "library/mongo",
+           "digest": "sha256:917f5b7f4bef1b35ee90f03033f33a81002511c1e0767fd44276d4bd9cd2fa8e"
+         }
+         "scanTypes": {


I agree with the MIME type naming part. perhaps we can all agree upon some well defined names. These could also be specified within the OCI spec that @steven-zou started.

Additionally, I am in fact using an array of objects to specify scan types. The value of the mimeType field is used to activate a particular type of scanner. However, the activation can be controlled using the params object that can have scanner specific parameters or settings for fine-tuning.
With respect to the specification of MIME types in a request, it is up to the client to decide the data fetch mechanism. For e.g. the above structure allows the clients to make multiple API invocation each time specifying a single MIME type in the request. Another advanced client can on the other hand specify multiple MIME types and wait for the results of all being available. Hence this provides the required flexibility to the client who can decide the best approach to handle scan requests based on it's capabilities. I will add this part to the proposal

prahaladdarkin · 2021-06-23T07:32:25Z

proposals/pluggable-image-scanning-api-v2.md

+* Create an uber-container data structure which will contain the scan results keyed by the corresponding MIME types and send the result to client.
+
+### Scan API response handling
+The scanning process would be executing multiple types of scans asynchronously and hence a sophisticated response handling mechanism would required.


Th reason I maintained per scan type based ID is because of the fact that some scans may take longer than expected to complete (with either success or failure). In such cases given that not all scan types finish at the same time we may want the client to start reporting status for each type of scan job separately thereby making it more response as opposed to waiting for all scans to complete. With such a workflow the scan ID would be needed to be per scan type since we would use the existing reporting mechanisms that fetch scan status based on scan identifiers.
If we all agree upon a workflow where-in the client would wait for all the component scans to finish before reporting a success/failure status then we can track it using a single scan Id.

prahaladdarkin · 2021-06-23T07:32:58Z

proposals/pluggable-image-scanning-api-v2.md

+### Scan API response handling
+The scanning process would be executing multiple types of scans asynchronously and hence a sophisticated response handling mechanism would required.
+
+For a given job, the multiplexer component would maintain a mapping between the composite scan Id and the granular status of each of the component scans  as and when received from the scanner. 


I will give in some more thoughts to this aspect. Lets discuss it in our meeting

prahaladdarkin · 2021-06-23T07:33:19Z

proposals/pluggable-image-scanning-api-v2.md

+
+The response from the scanner are packaged in an uber-container structure again keyed by the MIME type. The object against each MIME type would contain the following
+   - Internal scan tracking identifier for the composite scan.
+   - The scan id and status for each MIME type. The status would be either one of started, completed, running, aborted, failed


Will give in some more thought to this aspect. Lets discuss this in a meeting.

prahaladdarkin · 2021-06-23T07:35:50Z

proposals/pluggable-image-scanning-api-v2.md

+  -  how is the error reported to the caller?
+  -  what is the impact of an error on the overall scan job?
+
+A clean, simple and efficient mechanism to handle errors would be as follows


That is what exactly I have mentioned below. We will have an array of objects (one per MIME type) and the corresponding results/error data. Did you imply that some other detail was missing?
I will add a response type example for such a situation.

prahaladdarkin · 2021-06-23T07:39:26Z

proposals/pluggable-image-scanning-api-v2.md

+2. Adopts a "what you send is what you receive" model.
+3. Allows clients to control what data they want from the scanner adapter - when the same scanner supports multiple types of scan functionalities
+4. Provides a mechanism to fine tune scanning operation.
+5. Incremental changes on the existing API model exposed by Harbor.


Lets discuss this. will also like to know what @steven-zou thinks about this part.
from what I understand - the existing Harbor job service would only have to implement a thin aggregation layer where-in the a mapping of the scan request to it's corresponding scan type jobs are maintained.
The underlying service would still have the same flow. I am however yet to gain some insights in this area and hence any inputs from @steven-zou and @heww would also help clarify

zhill · 2021-07-09T02:17:10Z

proposals/pluggable-image-scanning-api-v2.md

+| Mime Type | Type | Version | Status | 
+| --- | --- | --- | --- |
+| application/vnd.security.vulnerability.report; version=1.1 | OS/Package vulnerability data | 1.1 | Released in Harbor 2.3.0 |
+| application/vnd.security.bom.report; version=1.0 | Bill of Materials (BoM) | 1.0 | Proposed |


There are several near-standard formats that would make sense for SBoM instead of a new proprietary one: SPDX and CycloneDX are the two with the most traction. Those will help with interoperability, and may include enough information for the FS verification type as well. SPDX in particular supports files and digests as long as you are ok with very large manifests.

danielpacak self-requested a review May 28, 2021 13:05

danielpacak reviewed Jun 4, 2021

View reviewed changes

steven-zou added the area/interrogation-service Services like vulnerability scanning and compliance checking etc. label Jun 10, 2021

prahaladdarkin force-pushed the pd_scanner_api_v2 branch from 939e364 to dfa8e01 Compare June 23, 2021 07:46

Proposal for Unified Pluggable Scanner API

56342a9

Signed-off-by: prahaladd <prahaladd@vmware.com>

prahaladdarkin force-pushed the pd_scanner_api_v2 branch from dfa8e01 to 56342a9 Compare June 23, 2021 07:54

prahaladdarkin commented Jun 23, 2021

View reviewed changes

zhill reviewed Jul 9, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal for Unified Pluggable Scanner API #164

Proposal for Unified Pluggable Scanner API #164

prahaladdarkin commented May 23, 2021

danielpacak left a comment •

edited

danielpacak Jun 4, 2021 •

edited

prahaladdarkin Jun 23, 2021

danielpacak Jun 4, 2021 •

edited

prahaladdarkin Jun 23, 2021

danielpacak Jun 4, 2021 •

edited

prahaladdarkin Jun 23, 2021

danielpacak Jun 4, 2021 •

edited

prahaladdarkin Jun 23, 2021

danielpacak Jun 4, 2021 •

edited

prahaladdarkin Jun 23, 2021

danielpacak Jun 4, 2021 •

edited

prahaladdarkin Jun 23, 2021

danielpacak Jun 4, 2021 •

edited

prahaladdarkin Jun 23, 2021

prahaladdarkin left a comment

prahaladdarkin Jun 23, 2021

prahaladdarkin Jun 23, 2021

prahaladdarkin Jun 23, 2021

prahaladdarkin Jun 23, 2021

prahaladdarkin Jun 23, 2021

prahaladdarkin Jun 23, 2021

prahaladdarkin Jun 23, 2021

zhill Jul 9, 2021 •

edited

		@@ -0,0 +1,177 @@

		# Proposal: Unified API for Pluggable Image Scanners

Proposal for Unified Pluggable Scanner API #164

Are you sure you want to change the base?

Proposal for Unified Pluggable Scanner API #164

Conversation

prahaladdarkin commented May 23, 2021

danielpacak left a comment • edited

Choose a reason for hiding this comment

danielpacak Jun 4, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danielpacak Jun 4, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danielpacak Jun 4, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danielpacak Jun 4, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danielpacak Jun 4, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danielpacak Jun 4, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danielpacak Jun 4, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

prahaladdarkin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zhill Jul 9, 2021 • edited

Choose a reason for hiding this comment

danielpacak left a comment •

edited

danielpacak Jun 4, 2021 •

edited

danielpacak Jun 4, 2021 •

edited

danielpacak Jun 4, 2021 •

edited

danielpacak Jun 4, 2021 •

edited

danielpacak Jun 4, 2021 •

edited

danielpacak Jun 4, 2021 •

edited

danielpacak Jun 4, 2021 •

edited

zhill Jul 9, 2021 •

edited