Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do TLCC cc2scc calls require reflection in r/w set of transactions? #361

Open
3 of 6 tasks
g2flyer opened this issue May 19, 2020 · 16 comments
Open
3 of 6 tasks

Do TLCC cc2scc calls require reflection in r/w set of transactions? #361

g2flyer opened this issue May 19, 2020 · 16 comments
Assignees

Comments

@g2flyer
Copy link
Contributor

g2flyer commented May 19, 2020

Description

Currently calls to TLCC from ECC (or potentially in the future for ERCC) are not reflected explicitly in an endorsement as other cc2(s)cc calls are. We should investigate whether this is secure and/or safe from conflicts and otherwise might have to extend our protocols. Probably also requires a bit more in-deptch study how fabric handles that exactly for other cc2scc calls.

Steps

Link to feature or bug

Link to dependent issues

@mbrandenburger
Copy link
Contributor

mbrandenburger commented Jun 12, 2020

With Fabric it is possible to call another chaincode (chaincodeB) using chaincode2chaincode calls while your chaincode (chaincodeA) is executed. The interface is the same as a "normal" chaincode invocation. You send an operation with arguments and as a result it returns a response. If the called chaincode accesses it's state via getState calls, these calls and all accessed keys are presented in the read/write set.

For example:

<TxReadWriteSet>
  <NsReadWriteSet name="chaincodeA">
    <read-set>
      <read key="K1", version="1">
    </read-set>
    <write-set>
      <write key="K1", value="V1"
    </write-set>
  </NsReadWriteSet>
  <NsReadWriteSet name="chaincodeA">
    <read-set>
      <read key="K1", version="1">
    </read-set>
  </NsReadWriteSet>
<TxReadWriteSet>

When doing chaincode2chode calling system chaincode, the principle should be the same. However, it is the called system chaincode's implementation responsability to provide an apropriate read-write set (for normal chaincode, the chaincode's shim does that automatically).

In the case of ecc calling TLCC (enclave), it does not access the state using getState ops but instead calls into the enclave and produces some integrity metadata that allows the FPC chaincode enclave to verify that it received "correct" state when ecc called getState.

Is it bad the the integrity metadata will not show up in the read/writeset? Since the implementation of the integrity metadata is a cmac that requires the corresponding key to verify it, even if we add it to the transaction proposal response, we cannot use to verify as the key is shared only between tlcc_enclave and ecc_enclave. So I guess we dont have an issue here.

@mbrandenburger
Copy link
Contributor

mbrandenburger commented Jun 12, 2020

The other case where we may want to leverage cc2cc calls (to system chaincode) is to let ercc retrieve the chaincode definition, in particular the MRENCLAVE, for a FPC chaincode. As ercc cannot directly access it from the peer when verifying an attestation report, we need to get it from somewhere. There are two options:

  1. client provides it when invoking ERCC. This is not a good idea as every endorsing peer should verify an attestation report using it's own view of the chaincode definition.

  2. let ercc get the chaincode definition through a system chaincode. Need to double check if lifecycle chaincode would return the chaincode definition. Alternatively, since we are implementing tlcc as a system chaincode, we can extend tlcc to retrieve the chaincode definition and return it ercc.

Should the retrieve be visible in the read/writeset of the ercc invocation? This is unclear. However, we want to ensure that all endorsing peers are executing a ercc transaction proposal come to the same resulting transaction proposal response. That is, if peer A and peer B retrieve different chaincode definitions for a chaincode (e.g. due to network latency, peer B is behind and hasnt received yet an update), their transaction proposal response will look different, thus, the transaction should fail.

However, from the read/write set inspection we will not see what went wrong in this situation. We could do two things: a) let ercc write the chaincode definition to its namespace using putState. In particular, we then store for each enclave its public key, the attestation evidence and the chaincode definition (or just a hash of it?). Alternatively, we can write the hash of the chaincode definition to the transaction response to ensure that all transaction proposal responses used the same chaincode definition.

@mbrandenburger
Copy link
Contributor

mbrandenburger commented Jun 12, 2020

Consequently, if we consider cc2cc responses as additional input for a chaincode, it is a good idea to reflect this input in the transaction proposal response. However, this only works for deterministic input that is also useful to verify the proposal response. In the case of the integrity metadata check provided by TLCC this is not the case, as it is unique for every enclave/tlcc pair.

@mbrandenburger
Copy link
Contributor

@g2flyer @bvavala WDYT?

@bvavala
Copy link
Contributor

bvavala commented Jun 12, 2020

The other case where we may want to leverage cc2cc call is to let ercc retrieve the chaincode definition, in particular the MRENCLAVE, for a FPC chaincode. As ercc cannot directly access it from the peer when verifying an attestation report, we need to get it from somewhere. There are two options:

  1. client provides it when invoking ERCC. This is not a good idea as every endorsing peer should verify an attestation report using it's own view of the chaincode definition.
  2. let ercc get the chaincode definition through a system chaincode. Need to double check if lifecycle chaincode would return the chaincode definition. Alternatively, since we are implementing tlcc as a system chaincode, we can extend tlcc to retrieve the chaincode definition and return it ercc.

Would the retrieve be visible in the read/writeset of the ercc invocation? This is unclear. However, we want to ensure that all endorsing peers are executing a ercc transaction proposal come to the same resulting transaction proposal response. That is, if peer A and peer B retrieve different chaincode definitions for a chaincode (e.g. due to network latency, peer B is behind and hasnt received yet an update), their transaction proposal response will look different, thus, the transaction should fail.

However, from the read/write set inspection we will not see what went wrong in this situation. We could do two things: a) let ercc write the chaincode definition to its namespace using putState. In particular, we then store for each enclave its public key, the attestation evidence and the chaincode definition (or just a hash of it?). Alternatively, we can write the hash of the chaincode definition to the transaction response to ensure that all transaction proposal responses used the same chaincode definition.

This is already discussed in the registration diagram (check-consistency-of-credentials part).
The client sends the Credentials (which include the chaincode definition) to ercc, then each ercc checks, stores it in its namespace, and endorses it locally. TLCC is convenient to check it (no enclave required). If a check fails, then the endorsement fails, and you already know what went wrong -- no need to give the client additional info. At commit time, either enough ercc's said ok, or... no registration!

@g2flyer
Copy link
Contributor Author

g2flyer commented Jun 16, 2020

I would argue conceptually system chaincode is no different to chaincode and should from a security perspective ideally reflect it's own state access and modification in the R/W set.

What seems clear is that any state change in the callee should be reflected somewhere in the endorsement or we could end it in inconsistent overall state with unclear effect.
Given that chaincode should be deterministic, one could also argue that the top-level input (in the signedproposal) and the rw-sets should determine all inputs and outputs of "internal" functions and henvce its seems as if internal inputs and outputs could be omitted from a security perspective.
It is also tempting to say that read-sets could be omitted as any inconsistent values read and returned should either result in different behavior reflected in different read/write-sets or final output (which should also be captured in the endorsement signature). However, as in "standard" MVCC, the read-set might also be required from a serializability perspective, e.g., without recording read-sets two transactions could be accepted which would rely on state which could never have existed from a serializability perspective!

Besides the security question, there is of course a usability question that explicit R/W sets can help explain why multiple endorsements disagree even though fed the same inputs, so omitting R/W values would make it hard to figure out why something fails (it is, though, similar to the case that code happens to be non-deterministic?)

Mapping above to the various cc2scc cases --- note, not all of them are yet in github issues, some of them are still in GDocs or pending resolution of some design decisions around e-manager .. --, my take would be that

  1. ecc to tlcc enclave / retrieve state: This should be read-only, so definitely nothing for write-sets. Regarding read-sets, there is the serializability argument. As long as there is a 1-1 corresponds between the state from which actual values are fetched from untrusted ledger and meta-data fetched from tlcc (which should be consistent due to the MAC), the r/w set from ecc should handle this. This also relates to issue TLCC - Peer update synchronization #287 (although only from robustness perspective, not security?)
  2. ecc to tlcc enclaves / extract/validate signed proposal/creator: This op should also be only read-only but will have to access MSP and other channel information. As in this case probably nothing would be in the R/W set of the ecc and nothing would ensure the consistency, we probably should add something to the read-set to handle the serializability aspect (easiest might be just a hash of the used channel definition with the MSP and alike?
  3. ercc to tlcc untrusted / validate/get chaincode definition: This also should be read-only and so only serializability should be an issue. Resolution might be similar to above (add hash of active chaincode definition used to read-set)
  4. others: if we would store sealed state in e_manager this also might be cc2scc call. However, in that case i think everything should be implicitly secure (and from a robustness perspective probably immediately clear at least for MVP/designated peer as there wouldn't be any update of sealed state and so the only failure scenario would be not retrieving a sealed blob and failing due to that?)

@g2flyer
Copy link
Contributor Author

g2flyer commented Jun 16, 2020

@mbrandenburger @bvavala See above my analysis. In particular, note my observation regarding serializability which contradicts our conclusion on the call that we can ignore the read-sets.

PS: i also took the liberty to slightly edit marcus' orginal nice summary but pointing out the difference of the situation for cc2cc vs cc2scc.

@bvavala
Copy link
Contributor

bvavala commented Jun 17, 2020

(The inputs in theory should include the "transient" fields as well, but they don't.)

The gist of the solution is in your point (1): MVCC does the trick, as long as the input data goes through the get_state from the Fabric shim.

Regarding (2), I don't see it different than (1). The signed proposal comes from the Fabric shim and it is hashed in the response. The GetCreator returns a response that tlcc "simply" validates (given the proposal).

Regarding (3), there is a fundamental challenge in our design: keeping the chaincode definitions in ERCC and channel configuration in sync. I see some noteworthy cases here:
(i) ercc accepts a registration but, before the registration tx is committed, the chaincode definition changes.
(ii) the chaincode definition changes between create and register
(iii) the chaincode definition changes before the client queries ercc for the enclave's key
(iv) the chaincode definition changes after querying ercc and before the transaction proposal reaches the chaincode.

In (i), I think the registration succeeds since we don't have commit-time checks -- neither in TLCC, nor in ercc pluging (which will be removed). Result: divergent definitions.

In (ii), ercc rejects registration. Result: nothing to sync.

In (iii), the problem is similar to (i), but we have some options: (a) ercc checks the definition and aborts; (b) the client grabs ercc's definition then queries the peer for the committed definition and aborts. These alternatives are related to FPC client #358 and execution flow #355.

About (iv), I might have spotted a security issue.
The client thinks of invoking CC version X (enclave X), but the proposal reaches CC version Y (enclave Y) (let's assume that the malicious peer lets it through). The proposal will be processed because the client uses the chaincode public encryption key (not the enclave's). In the current endorsement process, the response is signed by enclave Y, and the peer will put together the proposal response with chaincode version Y. Important (to be checked further in <1> and <2> below): the proposal response only contains the proposal payload, which does not include the proposal header with chaincode version X. Now, assuming the chaincode definition has changed (both in ercc and channel configuration), the validation will be successful since it will validate the expected enclave signature (enclave Y) and the correct chaincode version, as taken from the response.
A solution would require that: at invocation time, the enclave checks that its chaincode definition matches the chaincode definition in the proposal; the enclave binds the response to its chaincode definition; at commit time, the validator (additionally) checks that the currently committed chaincode definition matches the one bound to the response. Also, these additional checks do not seem to make ercc's chaincode definition check redundant, as it enforces that only approved enclaves are invoked.
Question: have I spotted an issue and, if so, is that a suitable solution?

<1> proposal.proto
<2> transaction.proto

@g2flyer
Copy link
Contributor Author

g2flyer commented Jun 17, 2020

(The inputs in theory should include the "transient" fields as well, but they don't.)

Ah, interesting. So this could be another source of spurious non-matching responses, similar to non-determinism. That said, we do not really support transient fields, so shouldn't be an issue (at least assuming use of transient fields results in immediate failure or these fields are simply ignored in FPC ...)

The gist of the solution is in your point (1): MVCC does the trick, as long as the input data goes through the get_state from the Fabric shim.

BTW: I guess you are referring by (1), (2), to the different syscc cases from my comment? For clarity reason i correspondingly changed the list in my comment to a numbered list.....

Regarding (2), I don't see it different than (1). The signed proposal comes from the Fabric shim and it is hashed in the response. The GetCreator returns a response that tlcc "simply" validates (given the proposal).

Not really: the difference here is that the sys-cc needs additional ledger info from the channel definition to process and validate GetCreator and that is not automatically reflected in r/w-set (and hence also serialization issues not captured by MVCC unless we reflect in some form or fashion the channel definition in the r/(w)-set ..)

Regarding (3), there is a fundamental challenge in our design: keeping the chaincode definitions in ERCC and channel configuration in sync.

I guess by "channel configuration" you mean the (official) "chaincode definition on the channel"? (When i mentioned "channel definition" i meant the record containing msp etc defining a channel on the ledger, a change of which in this case should be orthogonal?).

I see some noteworthy cases here:
(i) ercc accepts a registration but, before the registration tx is committed, the chaincode definition changes.
(ii) the chaincode definition changes between create and register
(iii) the chaincode definition changes before the client queries ercc for the enclave's key
(iv) the chaincode definition changes after querying ercc and before the transaction proposal reaches the chaincode.

BTW: due #287, there could be also potentially a de-sync between tlcc's view on the ledger and what the peer sees. But your cases of overlapping changes while registring or querying of course adds a few additional wrinkles in the mix (and gives me the gut feeling that ercc should do a sys-cc for other ops such as create and register to ensure serialization of chaincode definition views...)

In (i), I think the registration succeeds since we don't have commit-time checks -- neither in TLCC, nor in ercc pluging (which will be removed). Result: divergent definitions.

In (ii), ercc rejects registration. Result: nothing to sync.

In (iii), the problem is similar to (i), but we have some options: (a) ercc checks the definition and aborts; (b) the client grabs ercc's definition then queries the peer for the committed definition and aborts. These alternatives are related to FPC client #358 and execution flow #355.

About (iv), I might have spotted a security issue.
The client thinks of invoking CC version X (enclave X), but the proposal reaches CC version Y (enclave Y) (let's assume that the malicious peer lets it through). The proposal will be processed because the client uses the chaincode public encryption key (not the enclave's). In the current endorsement process, the response is signed by enclave Y, and the peer will put together the proposal response with chaincode version Y. Important (to be checked further in <1> and <2> below): the proposal response only contains the proposal payload, which does not include the proposal header with chaincode version X. Now, assuming the chaincode definition has changed (both in ercc and channel configuration), the validation will be successful since it will validate the expected enclave signature (enclave Y) and the correct chaincode version, as taken from the response.
A solution would require that: at invocation time, the enclave checks that its chaincode definition matches the chaincode definition in the proposal; the enclave binds the response to its chaincode definition; at commit time, the validator (additionally) checks that the currently committed chaincode definition matches the one bound to the response. Also, these additional checks do not seem to make ercc's chaincode definition check redundant, as it enforces that only approved enclaves are invoked.
Question: have I spotted an issue and, if so, is that a suitable solution?

I have to think a bit more with a clearer head than right now after 12+h work. One question, though: do you think there is a problem even if we reflect chaincode definition reads (and channel definition reads as apropriate) and the txs should be serializable or only without? In the later case, I'm pretty sure there are some odd corner cases and it would seem difficult to enumerate them all, so preferable we can make it properly serializable via proper MVCC/rw-set updates.

<1> proposal.proto
<2> transaction.proto

@g2flyer
Copy link
Contributor Author

g2flyer commented Jun 18, 2020

Given that it seems we do seem to agree there are some issues which require further investigation (and imho almost certainly adding stuff to R/W set in tlcc), nothing of that has to be reflected in UML and hence i check off the Tech Preview items in the description ...

@g2flyer g2flyer modified the milestones: Tech Preview, MVP Jun 18, 2020
@g2flyer
Copy link
Contributor Author

g2flyer commented Jun 19, 2020

btw: looking now at key mgmt protocol, it seems we have a few more tlcc cc2scc calls: e.g., can_endorse (from both ercc and ecc) and check_committed_data (from ecc). Not sure if anything fundamentally changes in the analysis above, but log it here for completeness

@mbrandenburger
Copy link
Contributor

mbrandenburger commented Jun 19, 2020

ERCC can directly fetch a chaincode definition through _lifecycle cc2scc call using the following code:

	logger.Info("Try to get chaincode definition")
	dfFunction := "QueryChaincodeDefinition"
	dfArgs := &lb.QueryChaincodeDefinitionArgs{
		Name: "auction_test",
	}
	dfArgsBytes, err := proto.Marshal(dfArgs)

	resp := stub.InvokeChaincode("_lifecycle", [][]byte{[]byte(dfFunction), dfArgsBytes}, stub.GetChannelID())

	df := &lb.QueryChaincodeDefinitionResult{}
	if err := proto.Unmarshal(resp.Payload, df); err != nil {
		return shim.Error("we stop here ...")
	}
	logger.Infof("Got chaincode definition: %v", df)

This code calles https://github.com/hyperledger/fabric/blob/a282f7132565d509d3ba6d502cd9c07cc5e567f0/core/chaincode/lifecycle/scc.go#L523 and returns a QueryChaincodeDefinitionResult including version, approvals, etc ... all we need :)

I just run this code from ERCC within the register method. I can also confirm that the corresponding readset contains reads to _lifecycle.auction_test.version (and others ...).

With this code working, there is no need to retrieve the chaincode definition through TLCC. As long as cc2scc to _lifecycle is not deprecated it seems to be an easy and safe route to go.

@g2flyer
Copy link
Contributor Author

g2flyer commented Jun 19, 2020

Mea culpa: i missread code (twice, actually): i mistakenly thought stub.GetChannelID() was the name used for the lookup and then also (again mistakenly) read it as the the function to get your own chaincode name rather than the channel name. Another lesson not to try to multi-task ...
PS: to remove clutter in this issue i've deleted the two comments above (my original misreading and marcus reply) as they were unnecessarily cluttering the issue and distracting from the main discussion and as email-issued comments anyway were hard to process. I left though this mea culpa in case somebody is wondering about the deleted comments.

@g2flyer
Copy link
Contributor Author

g2flyer commented Jun 29, 2020

btw: to securely get the caller CC's identity/namespace this fabric-ml article might be helpful ...

@bvavala
Copy link
Contributor

bvavala commented Jun 29, 2020

btw: to securely get the caller CC's identity/namespace this fabric-ml article might be helpful ...

I think that that's not the caller, but rather the callee of a chaincode invocation. For instance, if you invoke the FPC auction chaincode to submit a bid, the proposal will contain the FPC auction chaincode definition.

On a side note, this is the mechanism that should be implemented in the fetch CC parameters on enclave creation. That's why it is unnecessary (and probably brittle) to let the admin send the chaincode definition.

Continuing the example, if the FPC auction makes cc2cc call to ERCC, I suspect that the proposal remains the same (InvokeChaincode does not modify the transaction context; I think the proposal belongs to the context). So ERCC can check that it is not the original callee.

@g2flyer
Copy link
Contributor Author

g2flyer commented Jun 29, 2020

I think that that's not the caller, but rather the callee of a chaincode invocation. For instance, if you invoke the FPC auction chaincode to submit a bid, the proposal will contain the FPC auction chaincode definition.

It's the callee of the top-level chaincode invocation which is the caller from a callee in a cc2cc transaction. (see comment of the referenced code). That said, you also have to be careful that there are not nested cc2cc transactions, so you really get the complete path ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants