This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Allow TP to read by address prefix #34

Open
wants to merge 3 commits into
base: main

Conversation

yoni-wolf
Contributor

RFC to support read by address prefix from transaction processor

Contributor

@agunde406 agunde406 left a comment


This PR should not include commits from your previous RFC, should not delete the old RFC (when this is merged, that would delete the RFC in master), and should not have any merge commits. Instead, this branch should be based on master and contain only the new commits.

Signed-off-by: ywolf <yoni.wolf@intel.com>
@yoni-wolf
Contributor Author

Apologies for the mess; I started again with only the one initial commit.


This RFC proposes an additional API on the transaction context,
context.listAddresses(address_prefix), which takes an address prefix as input
and returns a list of all existing addresses that start with that prefix.


Do we allow the case of address_prefix being empty / none?
If so, the reference implementation lists all addresses with non-empty data under the prefixes that are in the transaction's input address list.

Contributor Author


I think it is better to follow the same logic as get_state: if the TP requests a read of addresses that are not in the 'input' address list, then return a failure.
It is up to the transaction family developer to check the API's return value and to make sure the 'input' address list is valid; since the input list can accept an address prefix, I don't see this as a change of behavior.


Thanks

@yoni-wolf
Contributor Author

Apologies for the mess; I started again with only the one initial commit.

@agunde406 can you please restart your review?

@yoni-wolf yoni-wolf closed this Jan 3, 2019
@yoni-wolf yoni-wolf reopened this Jan 3, 2019
@agunde406 agunde406 self-requested a review January 3, 2019 13:42
@yoni-wolf
Contributor Author

Can we progress with getting this approved?


Get state by prefix is a feature already available for reading via the sawtooth rest-api.
This RFC adds a method that exposes this capability of reading by address prefix
to the transaction processor.
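
For reference, reading state under a prefix through the REST API today looks roughly like the sketch below (purely illustrative: the namespace value, port, and lack of error handling are assumptions, not part of this RFC).

```python
# Minimal sketch: list all state entries under an address prefix via the Sawtooth
# REST API. Assumes a REST API reachable at http://localhost:8008; the namespace
# prefix below is a made-up example value.
import base64
import requests

EXAMPLE_PREFIX = "1cf126"  # hypothetical 6-hex-character family namespace

def state_under_prefix(prefix, rest_api="http://localhost:8008"):
    """Return {address: decoded_bytes} for every leaf whose address starts with prefix."""
    url = "{}/state".format(rest_api)
    params = {"address": prefix}
    entries = {}
    while True:
        response = requests.get(url, params=params)
        response.raise_for_status()
        body = response.json()
        for entry in body["data"]:
            entries[entry["address"]] = base64.b64decode(entry["data"])
        # The REST API pages its results; follow the 'next' link until exhausted.
        next_url = body.get("paging", {}).get("next")
        if not next_url:
            break
        url, params = next_url, None
    return entries
```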
Contributor


This section is very light on content. Please provide concrete examples of why this is needed. Please check the Guide-Level explanation section in the template for other information that should be in this section.

Contributor


@yoni-wolf you have more thoughts here: https://jira.hyperledger.org/browse/STL-1375, like the example of deleting all leaves under a certain node. As a use case, maybe that's deleting all accounts owned by a certain user?


I think the main usage for this feature is this:

  1. You allocate some prefix for a member or usage.
  2. The TP logic has some algorithm for translating data to a suffix.
  3. The TP writes the data to the calculated prefix+suffix.

Now it is very fast to find a certain piece of information: you just calculate the address again and read it. However, it is very hard and inefficient to clean it up!

With 'read by prefix' the TP can support a 'clean transaction' that will:

  1. Use this feature to acquire a list of all used addresses under that prefix.
  2. Make a list of them and call delete once, or call delete for each address in the list.

How would it be implemented without 'read by prefix', assuming you did implement an efficient lookup like this?

One way is to try to read every possible suffix (which might be a huge list and take forever); another is to start keeping track of every suffix you use in another address, which you would have to read, modify, and write for every suffix you add, which will slow you down. Both are much worse, I think.
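
A minimal sketch of that flow in a Python transaction handler, assuming the proposed call is exposed in the Python SDK as context.list_addresses(prefix) (the family namespace, hashing scheme, and helper names below are illustrative, not part of the RFC):

```python
import hashlib

FAMILY_NS = hashlib.sha512(b"myfamily").hexdigest()[:6]  # hypothetical 6-char namespace

def member_prefix(member_id):
    # Step 1: allocate a sub-prefix per member (14 hex characters total here).
    return FAMILY_NS + hashlib.sha512(member_id.encode()).hexdigest()[:8]

def record_address(member_id, record_key):
    # Steps 2-3: deterministic suffix per record, giving fast point lookups.
    return member_prefix(member_id) + hashlib.sha512(record_key.encode()).hexdigest()[:56]

def apply_clean(context, member_id):
    # The 'clean transaction': enumerate everything under the member's prefix
    # and delete it, without maintaining a separate index address.
    addresses = context.list_addresses(member_prefix(member_id))  # proposed API, does not exist today
    if addresses:
        context.delete_state(addresses)
```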

An example implementation can be found here:
Sawtooth-core + Python SDK changes:
https://github.com/arsulegai/sawtooth-core/tree/jira_1375
(commits: 4225dc7 and 2c3a223)
Contributor


Please explain the actual implementation suggested for the validator and the SDKs in this section as well, besides just the links to the commits.

read/write/delete each address in the list.
This has a performance downside: each time the transaction processor needs to
write a new address, it will need to modify the special address containing the
list of all addresses. This address could be huge, when the need to actually
Contributor


This argument - that the address could be huge - is also a flaw of this approach generally, since the returned address list could be that large. This should be mentioned in Drawbacks.

Contributor Author


Added to drawbacks.
The main difference is that access to a huge set of data can be limited to only when you actually need to change all the data. With the 'old' approach of keeping an address that tracks existing addresses, the TP needs to modify this address during almost every transaction.
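
For comparison, a rough sketch of that 'old' index-address workaround, using the Python SDK's existing get_state/set_state calls (the reserved index suffix, cbor encoding, and helper name are illustrative assumptions):

```python
import cbor  # the cbor package used by the intkey example TP

INDEX_SUFFIX = "0" * 56  # hypothetical reserved leaf; assumes a 14-character prefix

def indexed_write(context, prefix, suffix, payload):
    # Read-modify-write the index on (almost) every transaction that adds data.
    index_addr = prefix + INDEX_SUFFIX
    entries = context.get_state([index_addr])
    index = cbor.loads(entries[0].data) if entries else []
    if suffix not in index:
        index.append(suffix)
    # Two state writes per transaction: the data itself plus the ever-growing index.
    context.set_state({
        prefix + suffix: payload,
        index_addr: cbor.dumps(index),
    })
```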

[drawbacks]: #drawbacks

This will need to be implemented in every Sawtooth SDK and will increase
the complexity of the validator slightly.
Contributor


A few other drawbacks should be mentioned here: a) this will perform very poorly when the address list is large; b) this will cause large I/O and CPU on the validator when the address list is large (it is potentially very expensive, unlike the rest of the TP API); c) once implemented it will be hard/difficult to deprecate because of our commitment to backward compatibility; d) this seems like an anti-pattern in TP design, and may encourage others to implement poorly designed TPs.

Contributor


I don't know how this is handled for the rest-api, but one salient difference between the situation for the rest-api and a transaction processor is that only one node is impacted by the query overhead when serving a rest-api request, whereas all nodes are impacted by a transaction using a similar query. If not implemented carefully, a significant performance cost could be exploited as a DoS attack on the network.

the complexity of the validator slightly.

# Rationale and alternatives
[alternatives]: #alternatives
Contributor


Another alternative is to redesign the TP to not need the list of addresses at all.

that contains a list of existing addresses. When it needs to manipulate multiple
addresses, the transaction processor will first read the special address, then
read/write/delete each address in the list.
This has a performance downside: each time the transaction processor needs to
Contributor


There is a performance penalty for maintaining a list like this, but it is probably substantially faster than the approach contained in this RFC.

text/0000-read-by-prefix.md (resolved)
in a separate address.

Having this capability allows a better and smarter address mapping this a
better transaction processor logic.
Contributor


This is not a well-formed sentence.

message TpStateAddressesListRequest {
    // The context id that references a context in the contextmanager
    string context_id = 1;
    repeated string addresses = 2;
Contributor


What if this request was for an address space with an exceptionally large number of addresses? Should they just be returned, or is there an upper limit, where the validator will send an error back with "request denied". If there should be an upper limit, how would we manage it?

Another approach may be a paging approach, where the request specifies the maximum number of addresses to return (with a max page size, perhaps configurable per-validator).
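
One possible shape for such a paged call, sketched as an SDK-side iterator (everything here, including the limit and start parameters, is hypothetical and not part of any existing SDK):

```python
def iter_addresses(context, prefix, page_size=100):
    """Yield addresses under `prefix`, fetching at most `page_size` per request."""
    start = None
    while True:
        # Hypothetical paged request: the validator would cap page_size at a
        # per-validator maximum and return a continuation token.
        page = context.list_addresses(prefix, limit=page_size, start=start)
        for address in page.addresses:
            yield address
        if not page.next_start:
            break
        start = page.next_start
```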

Contributor Author


Same as a regular read: there is no limit on the size of the data at one address; it is up to the TP developer to make sure they don't write too much data to one address.
Since this only requests the address list, without the addresses' data, it won't get too big that easily.
No objection to adding paging and a maximum, only that it is not done for a regular address read; I have added it to the unresolved questions section.

Contributor


It is not the same type of problem, because read/write are relatively efficient. Returning a list of addresses from the state tree will not be efficient as it will require a large number of underlying database reads to accomplish.

Adding an on-chain setting for maximum size of address storage would probably be a good feature too.

message TpStateAddressesListRequest {
    // The context id that references a context in the contextmanager
    string context_id = 1;
    repeated string addresses = 2;
Contributor


Why is this a list? If it remains a list, we should specify in this section how the validator should process the request.

[summary]: #summary

This RFC proposes an additional API on the transaction context,
context.listAddresses(address_prefix), which takes an address prefix as input
Contributor


This should probably return an iterator so not all addresses must be obtained at once.

Contributor


Agreed.

enum Status {
    STATUS_UNSET = 0;
    OK = 1;
    AUTHORIZATION_ERROR = 2;
Contributor


This document should describe whether this occurs based on a comparison to the request, or whether it occurs if the list would return an address outside the allowed range. It should probably be the former.

Contributor

@peterschwarz peterschwarz left a comment


The general concept described here, as I understand it, could be implemented as a mapping of application keys and values, in turn stored at a single address, where that address would be more specific for that particular transaction. In other words, store the list of key-value pairs at an address specific enough for the use case. Transaction processors don't generally operate on arbitrary depths of addresses - things are scoped to a particular user or org or product, etc.
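
A rough sketch of that single-address mapping, using the existing get_state/set_state/delete_state calls (the namespace, cbor encoding, and "accounts per owner" example are illustrative assumptions):

```python
import cbor
import hashlib

FAMILY_NS = hashlib.sha512(b"myfamily").hexdigest()[:6]  # hypothetical namespace

def accounts_address(owner_id):
    # One leaf per owner; all of that owner's accounts are stored inside it.
    return FAMILY_NS + hashlib.sha512(owner_id.encode()).hexdigest()[:64]

def add_account(context, owner_id, account_id, account_data):
    addr = accounts_address(owner_id)
    entries = context.get_state([addr])
    accounts = cbor.loads(entries[0].data) if entries else {}
    accounts[account_id] = account_data
    context.set_state({addr: cbor.dumps(accounts)})

def delete_all_accounts(context, owner_id):
    # 'Delete everything under the prefix' becomes deleting a single leaf.
    context.delete_state([accounts_address(owner_id)])
```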

@dcmiddle referenced a Jira ticket for this; it describes a situation where intkey keys are selected by key length. This presumes a direct, 1-to-1 link between keys and their addresses. Keys are often generated using hashes, which would be a fixed length, regardless of the original key length.

A reply in the issue does describe deleting an entire subtree, which makes sense and probably has a valid set of use cases.

I think a compelling use case needs to be presented, but ultimately I don't think there is one.

# Motivation
[motivation]: #motivation

The motivation came when looking for a way to improve processing transaction
Contributor


The optimization here actually reduces the number of messages between the validator and a transaction processor in exchange for potentially larger messages.

Signed-off-by: ywolf <yoni.wolf@intel.com>
Signed-off-by: ywolf <yoni.wolf@intel.com>
@yoni-wolf
Contributor Author

Added paging mechanism.

@yoni-wolf
Contributor Author

@peterschwarz @vaporos @agunde406 @dcmiddle
Added a paging mechanism to address the performance concerns.
Oron added a use case that helps explain why this feature is required.

Can you please review again?
Thanks

@agunde406
Contributor

There are still quite a few comments that have not been addressed.

@duncanjw

duncanjw commented Feb 21, 2019

I'm delighted to see this RFC proposed, as we hit exactly this problem running a Sawtooth workshop aimed exclusively at developers in Hong Kong in December.

What was intuitive to the developers, when tackling a programming exercise aimed at creating a voting-app TP, was to read the current set of voters using the voters prefix to determine whether someone registering was in fact already registered when handling a register-to-vote transaction.

They were baffled when they discovered they couldn't do this.

@jsmitchell

> I'm delighted to see this RFC proposed, as we hit exactly this problem running a Sawtooth workshop aimed exclusively at developers in Hong Kong in December.
>
> What was intuitive to the developers, when tackling a programming exercise aimed at creating a voting-app TP, was to read the current set of voters using the voters prefix to determine whether someone registering was in fact already registered when handling a register-to-vote transaction.
>
> They were baffled when they discovered they couldn't do this.

This is an application design issue, and does not require the feature described in this RFC. If the application was designed to store voter registration information at a deterministic location based on information describing the voter, then the existence check becomes an O(1) operation, a direct lookup.
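
A minimal sketch of that state design for the voting example (the namespace, sub-prefix, and hashing are illustrative assumptions):

```python
import hashlib

VOTING_NS = hashlib.sha512(b"voting").hexdigest()[:6]  # hypothetical family namespace
VOTERS_PREFIX = VOTING_NS + "00"                       # hypothetical sub-prefix for voter records

def voter_address(voter_id):
    return VOTERS_PREFIX + hashlib.sha512(voter_id.encode()).hexdigest()[:62]

def is_registered(context, voter_id):
    # Direct O(1) lookup; no need to enumerate everything under the voters prefix.
    return bool(context.get_state([voter_address(voter_id)]))
```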


@jsmitchell jsmitchell left a comment


I'm voting no on this. I believe the need for this can be avoided with appropriate state design. If there are specific examples that can't be solved for with an alternate state layout, I'm interested in exploring them. Even if such examples are presented, there are a number of practical challenges with this proposal, foremost the question of how to constrain/page the result set.
