add RPC call Metadata requesting sector IDs #2584

starius · 2017-12-26T14:42:13Z

I addressed comments of @lukechampine in #2321. This pull request adds light version of Metadata RPC which returns just a first sector ID - it is cheap for host (so can be provided for free), but is sufficient to build a stateless client, recovering everything from seed only. (IDs of other sectors can be stored in the first sector.)

DavidVorick · 2018-06-11T02:43:13Z

This is now a high priority pull request.

Finally :P

lukechampine · 2018-06-11T16:27:51Z

modules/negotiate.go

+	NegotiateMetadataMaxSliceSize = build.Select(build.Var{
+		Dev:      uint64(1 << 17),
+		Standard: uint64(1 << 17),
+		Testing:  uint64(1 << 4),


these numbers are kind of magic; better would be SectorSize / crypto.HashSize.

lukechampine · 2018-06-11T17:09:54Z

modules/renter/proto/negotiate.go

-func verifyRecentRevision(conn net.Conn, contract contractHeader, hostVersion string) error {
+// getRecentRevision downloads the current revision from the host
+// and checks signatures.
+func getRecentRevision(conn net.Conn, fcid types.FileContractID, sk crypto.SecretKey, windowStart types.BlockHeight, hostVersion string) (types.FileContractRevision, error) {


I don't understand why it's necessary to change verifyRecentRevision. It doesn't look like you're using lastRevision in GetMetadata, so why do you need to return it?

The caller of GetMetadata can calculate merkle root of sector IDs returned by GetMetadata and compare it with merkle root from lastRevision. This is a way to verify that sector IDs are correct for some revision (host still can return old revision).

lukechampine · 2018-06-11T17:17:44Z

modules/renter.go

@@ -219,6 +219,7 @@ type RenterContract struct {
 	ID            types.FileContractID
 	HostPublicKey types.SiaPublicKey
 	Transaction   types.Transaction
+	SecretKey     crypto.SecretKey


This shouldn't be necessary

It is used in TestIntegrationMetadata to pass it to GetMetadata

lukechampine · 2018-06-11T17:33:04Z

IIUC this only allows you to fetch the first 2^17 = 131072 sector roots, i.e. 500 GiB of contract data. The RPC is also free. I suggest that instead, we allow requesting an arbitrary number of roots, but the renter must pay for them.

One interesting way to implement this would be to reuse almost all of the RPCDownload code, but with sentinel sector roots. Requesting crypto.Hash{0} would return the first 2^17 roots, crypto.Hash{1} would be the next 2^17, etc. (Actually, we'd probably want to use a special prefix, e.g. crypto.Hash{'s','e','c','t','o','r', 0, 1, 2, 3, 4, 5, 6, 7, 8}, where 0-8 is a little-endian uint64.) All the host needs to do is detect these roots and translate them into indices. We could even reuse the Offset and Length fields of modules.DownloadAction to enable requesting arbitrary subsets of roots.

One tricky thing here is that the renter needs to know how many roots there are. But it can get this by looking at the FileContractRevision reported by the host: the number of roots will be revision.NewFilesize / modules.SectorSize.

DavidVorick · 2018-06-11T21:23:35Z

I think I am not in favor of re-using the download rpc to do this, though I am in favor of having the renter pay to download the roots. I'm not sure the best way to mesh them to avoid code duplication, but using special values for the desired merkle root just feels like a bad plan to me.

I am also in favor of limiting the number of roots that can be downloaded at once to 4 MB of roots (131,000 roots), and having the downloader be able to specify an offset and length.

lukechampine · 2018-06-12T15:49:30Z

Some code duplication is probably okay. We don't want to intermingle the metadata RPC with the download RPC too much because we're going to be splitting download+upload into a separate RPC at some point anyway.
I'd be fine with copying the download RPC almost entirely, changing only the DownloadAction to:

type SectorRootsRequest struct {
    StartIndex uint64
    EndIndex   uint64
}

and of course changing the retrieval code to fetch roots rather than data. But everything else should stay: doing repeated requests in a loop, exchanging new host settings on each iteration, etc.

Actually, while we're at it, we could consider tightening up the protocol a little. One thing that stands out to me in particular is that we could eliminate a round trip. The current download protocol is:

Renter initiates request
Enter loop
Host sends settings
Renter sends accept/reject
Renter sends download request and unsigned revision
Host sends accept/reject
Renter sends signature
Host sends accept/reject
Host sends signature and data
Go to 3

We can eliminate a round trip by merging steps 5 and 7: there is no reason for the renter to wait until the host has accepted or rejected the request before sending its signature.

If we want to go further, we could modify the protocol to send only the parts of the revision that changed, instead of the full new revision. Currently the only values that change are the amounts in the proof outputs, and the revision number. So all we really need to send is a single types.Currency value: the amount that the renter is paying the host for the download. The host can use this to calculate the new proof outputs, and increment the revision number by 1. This would save about 500 bytes per iteration. However, we probably don't want to be so restrictive; for example, the renter should be able to set the revision number to math.MaxUint64 if it wants to, so we should explicitly send the revision number instead of always incrementing by 1. There may be other degrees of freedom that we want to permit as well. So I'm only suggesting this optimization tentatively.

starius · 2018-06-12T16:10:50Z

To speed up the protocol, we can go even further:

the renter sends new revision and a partial signature.
Host sends back the full signature + data (IDs or sector data) or sends "reject" and deletes the half-signature.

Just one round-trip - the fastest possible way.

The only concern is that a malicious host can keep the half-signature and not send the data, but it can do exactly this in the current protocol as well.

lukechampine · 2018-06-12T16:50:43Z

I think we're describing the same thing. The new protocol would be:

Renter initiates request
Enter loop
Host sends settings
Renter sends accept/reject
Renter sends download request, revision, and signature
Host sends accept/reject, signature, and data
Go to 3

We can get this down to a single round trip if we remove the settings exchange. Which I think is feasible; host settings probably shouldn't change during the protocol. If the host really wants to change their prices, they can still do so by terminating the protocol and waiting for the renter to "call them back," and then establishing the new settings. So then the protocol is:

Renter initiates request
Host sends settings
Renter sends accept/reject
Enter loop
Renter sends download request, revision, and signature
Host sends accept/reject, signature, and data
Go to 5

starius · 2018-06-12T21:12:10Z

I propose not to make settings exchange in the beginning, because the renter can cache those settings and they change really rare. The only part of settings that matters are prices. If settings change and the renter sends insufficient or excess payment, the host drops the request with the message "please update settings". In this case the renter makes RPCSettings and redoes the request. The latency of such protocol is just one round trip like of HTTP.

lukechampine · 2018-06-13T03:23:16Z

Yeah, after some thought, that sounds okay to me. It's like how you sometimes need to make multiple HTTP requests to figure out what content you want, but once you start receiving data it's a tight single-round-trip loop.
The renter is at risk of overpaying if they don't use the most recent settings, but I don't think that's a huge problem; they can always manually request before each transfer if they want to, and regardless if a price is too high they should just refuse to pay it in the first place.

starius · 2018-06-13T08:47:51Z

To avoid code duplication, I suggest to write the following private function:

type DownloadTarget int
const (
  DOWNLOAD_DATA DownloadTarget = iota
  DOWNLOAD_ID
)

func download(fcid types.FileContractID, target DownloadTarget, sectorID crypto.Hash,
  startIndex, endIndex uint64, paymentRevision types.FileContractRevision) ([]byte, error)

The function returns the payload: sector data or an array of sector IDs (in encoding format, so it can be written directly to conn.

This function can be used by existing RPCDownload and by RPCMetadata. We can also add new RPCDownload2 which will just call this function (like RPCMetadata). As an optimization, new RPCs don't need to pass whole paymentRevision on wire, but can pass just NewRevisionNumber and paymentRevision.NewValidProofOutputs[1] (this data is sufficient to build paymentRevision).

starius · 2018-06-16T17:49:11Z

Some hosts provide downloads for free. In this case we can just keep previous contract version instead of issuing a new one and writing it to disk. For highly loaded hosts it can be a significant optimization (worth providing download for free). Can we take it into account in this new RPC?

DavidVorick · 2018-06-18T19:38:57Z

That's an interesting thought. Perhaps we can.

DavidVorick · 2018-06-18T19:58:02Z

Should we just have 1 RPC between the renter and the host?

There are multiple things you can do:

upload
download
renew
fetch sector ids
maybe something else?

When the renter opens the first communication, the renter can probably send a request that includes the thing that the renter wants (upload/download/etc), a partially signed contract to pay for the request (omitted if the request doesn't require a new signature), and whatever other data is needed for the host to provide everything necessary.

Then the host can respond in a way that completes the request, and allows the renter to make a new request on the next communication using the same connection.

That way basically everything happens in 1 round trip.

The tricky part is if the host rejects whatever proposal the renter makes. If the renter makes a bad proposal (example: host has increased download price and renter doesn't know), the host can return some settings indicating the new settings / requirements and then the renter can try again on the same connection.

The one place this probably doesn't work is creating contracts, because as a renter you definitely want to see the most recent settings before you send a propsal contract, that way you don't risk over-paying.

lukechampine · 2018-06-18T21:30:24Z

I think I agree with starius' argument that it can be the renter's responsibility to fetch up-to-date settings. Especially if the host can return errors like "renter didn't pay enough", it's pretty clear that the renter should respond by fetching the most recent settings and then trying again.

Contract formation/renewal are pretty rare operations and don't need to be highly performant, so I'm less concerned about reducing round trips and such for those actions. I'd probably keep them distinct from the upload/download/metadata RPC, although I don't have super strong feelings about this. I will say that form/renew share a LOT of code, so it might make sense to formally merge them into one RPC.

The other important thing to keep in mind is that actions like upload and download are stateful. Our current loop-based RPCs are a good fit for this because, after initial setup, the renter and host can focus on just exchanging the minimum amount of data as fast as possible. For example, currently the renter has to prove to the host that it can sign with the renter key before any transfers take place -- but this only has to happen once. Similarly, for performance reasons, the host will want to load as much into memory as it can during the loop (such as the contract obligation) so that it doesn't need to fetch that info from disk for each request. If we move to a stateless single request/single response protocol, this stuff becomes harder; the host would have to maintain a LRU cache of obligations or something to that effect.

The loop model isn't perfect though; for small transfers, we could definitely do better. In particular, we want to design the protocol such that users can download small (sub-sector) files as quickly as possible. For sub-sector downloads, latency becomes much more important, to the point where even the TCP connection handshake starts to feel like a burden. David and I have kicked around the idea of a UDP renter-host protocol; I think that, for downloads that can fit in a small number of UDP packets, it could make sense. But my current feeling is that sub-sector downloads have such different requirements from the other RPCs that it may be worth devoting a separate RPC to them, distinct from the standard multi-sector download RPC. The advantage of this is that it allows us to add a sub-sector download RPC later, as an optimization, instead of trying to design a download RPC that's highly optimized for both use cases from the beginning. The standard download RPC can stick with the tried-and-tested TCP-based request/response loop (although of course we'll strive to eliminate as many round-trips as possible). That way, we're free to experiment with a highly-optimized download protocol without blocking progress on the new set of RPCs.

starius · 2018-06-18T23:50:38Z

I also think upload/download RPCs should not be mixed with new/renew RPCs. Actually I am not sure about upload RPCs. To organize upload RPC in one round trip way, a renter needs to send the data with everything else. But if the server rejects the request, it is still bombarded with 4M of uploaded data. On the other hand, upload requests are not as latency sensitive as download requests are, so they won't benefit much from being one round trip. So I think we should not change upload at this point.

Our current loop-based RPCs are a good fit for this because, after initial setup, the renter and host can focus on just exchanging the minimum amount of data as fast as possible. For example, currently the renter has to prove to the host that it can sign with the renter key before any transfers take place -- but this only has to happen once.

If renter does not prove that he can sign with the renter key, then the only extra information that can be sent once in the beginning is contract ID. We can create an RPC that starts with sending contract ID once and then the renter can send download/upload/metadata requests, sending only needed data. The renter doesn't wait after sending contract ID, so it still one round trip even for the first sub-request.

Stateful handlers can keep some state in memory, but they still have to sync with disk each new recent revision.

I like the idea with UDP. I had a slightly different idea of stateful download requests, which can be combined with UDP: pay first, download next. I.e. you paid for downloading of 1G and the host gives you a token, which you can use to download that 1G later (probably with some expiration time). Here you trust the host not to cheat. But you do it now as well, just for smaller size (one sector). If you know that the host works well, you can increase the amount you pay for in advance. Hosts can keep the table with download tokens in memory and sync it in background (if they crash, then some tokens will have slightly more download space then they paid for, which is fine). The same tokens can be reused with UDP: you can send the token and the target sector and coordinates in it in every UDP packet. Hosts should reward renters downloading pieces from the same sector rather than from different sectors. There can be two UDP RPCs: (1) download piece and (2) download piece and keep that sector in memory for some time. Both of them charge less if the sector is already in memory. But I think I am going into details too deep.

sub-sector downloads have such different requirements from the other RPCs that it may be worth devoting a separate RPC to them

In https://godoc.org/github.com/NebulousLabs/Sia/modules#DownloadAction I see Offset and Length parameters. Don't they already provide sub-sector downloads?

Function getRecentRevision downloads recent revision from the host and checks its signatures. It takes the minimal data sufficient for this operation: contract ID, secret key, window start and host version. Function verifyRecentRevision calls getRecentRevision and compares the recent revision from host with locally known recent revision. It takes a private type (contractHeader) used by other code in the package.

This is second attempt of NebulousLabs#2321 The whole list of IDs can be expensive for a host to provide for free, but a short list of sector IDs (up to 2^17) is cheap. At the same time it opens doors for stateless clients, i.e. recovering everything from seed only.

tbenz9 requested a review from lukechampine January 2, 2018 19:25

starius force-pushed the metadata-rpc-3 branch from 75c8e7a to da928fc Compare January 27, 2018 14:53

starius changed the title ~~add RPC call Metadata requesting first sector ID~~ add RPC call Metadata requesting sector IDs Feb 5, 2018

starius force-pushed the metadata-rpc-3 branch from da928fc to 4cf1867 Compare April 21, 2018 12:44

DavidVorick added the v1.3.4 label Jun 11, 2018

lukechampine reviewed Jun 11, 2018

View reviewed changes

MSevey added this to the v1.3.4 milestone Jun 25, 2018

MSevey added v1.4.0 and removed v1.3.4 labels Jun 28, 2018

Boris Nagaev added 3 commits July 15, 2018 12:34

re-add SecretKey back to RenterContract

5b75e1f

starius force-pushed the metadata-rpc-3 branch from 4cf1867 to 7833125 Compare July 15, 2018 10:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add RPC call Metadata requesting sector IDs #2584

add RPC call Metadata requesting sector IDs #2584

starius commented Dec 26, 2017

DavidVorick commented Jun 11, 2018

lukechampine Jun 11, 2018 •

edited

lukechampine Jun 11, 2018

starius Jul 15, 2018

lukechampine Jun 11, 2018

starius Jul 15, 2018

lukechampine commented Jun 11, 2018

DavidVorick commented Jun 11, 2018

lukechampine commented Jun 12, 2018

starius commented Jun 12, 2018

lukechampine commented Jun 12, 2018 •

edited

starius commented Jun 12, 2018

lukechampine commented Jun 13, 2018

starius commented Jun 13, 2018

starius commented Jun 16, 2018

DavidVorick commented Jun 18, 2018

DavidVorick commented Jun 18, 2018

lukechampine commented Jun 18, 2018

starius commented Jun 18, 2018

add RPC call Metadata requesting sector IDs #2584

Are you sure you want to change the base?

add RPC call Metadata requesting sector IDs #2584

Conversation

starius commented Dec 26, 2017

DavidVorick commented Jun 11, 2018

lukechampine Jun 11, 2018 • edited

Choose a reason for hiding this comment

lukechampine Jun 11, 2018

Choose a reason for hiding this comment

starius Jul 15, 2018

Choose a reason for hiding this comment

lukechampine Jun 11, 2018

Choose a reason for hiding this comment

starius Jul 15, 2018

Choose a reason for hiding this comment

lukechampine commented Jun 11, 2018

DavidVorick commented Jun 11, 2018

lukechampine commented Jun 12, 2018

starius commented Jun 12, 2018

lukechampine commented Jun 12, 2018 • edited

starius commented Jun 12, 2018

lukechampine commented Jun 13, 2018

starius commented Jun 13, 2018

starius commented Jun 16, 2018

DavidVorick commented Jun 18, 2018

DavidVorick commented Jun 18, 2018

lukechampine commented Jun 18, 2018

starius commented Jun 18, 2018

lukechampine Jun 11, 2018 •

edited

lukechampine commented Jun 12, 2018 •

edited