Merge pull request #128 from dbaarda/doc/release-v2.0.1
Release v2.0.1.
dbaarda committed Oct 17, 2017
2 parents 59b301a + 05f6a91 commit f1dea6d
Showing 2 changed files with 21 additions and 95 deletions.
12 changes: 8 additions & 4 deletions NEWS.md
@@ -1,9 +1,13 @@
# librsync NEWS

## librsync 2.0.1
## librsync 2.0.2

NOT RELEASED YET

## librsync 2.0.1

Released 2017-10-17

* Extensively reworked Doxygen documentation, now available at
http://librsync.sourcefrog.net/ (Martin Pool)

@@ -74,15 +78,15 @@ NOT RELEASED YET
* Fixed hanging for truncated input files. It will now correctly report an
error indicating an unexpected EOF was encountered. (dbaarda,
https://github.com/librsync/librsync/issues/32)

* Fixed #13 so that faster slack deltas are used for signatures of
empty files. (dbaarda,
https://github.com/librsync/librsync/issues/13)

* Fixed #33 so rs_job_iter() doesn't need calling twice with eof=1.
Also tidied and optimized it a bit. (dbaarda,
https://github.com/librsync/librsync/issues/33)
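
A minimal sketch of the intended usage after this fix (the helper name
and buffer handling are illustrative, not from the patch): supply all
input, set `eof_in = 1`, and iterate once until the job completes. A
truncated input now surfaces as RS_INPUT_ENDED instead of hanging.

```c
#include <stddef.h>
#include <librsync.h>

/* Drive any rs_job_t over in-memory buffers in a single pass. */
rs_result drive_job(rs_job_t *job, const char *in, size_t in_len,
                    char *out, size_t out_len)
{
    rs_buffers_t bufs = {0};
    rs_result res;

    bufs.next_in = (char *)in;
    bufs.avail_in = in_len;
    bufs.eof_in = 1;            /* all input is already present */
    bufs.next_out = out;
    bufs.avail_out = out_len;
    do {
        res = rs_job_iter(job, &bufs);
    } while (res == RS_BLOCKED && bufs.avail_out > 0);
    return res;                 /* RS_DONE, or RS_INPUT_ENDED on a
                                   truncated input (#32) */
}
```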

* Fixed #55: removed excessive rs_fatal() calls, replacing checks for
programming errors with assert statements. Now rs_fatal() will only
be called for rare unrecoverable fatal errors like malloc failures or
104 changes: 13 additions & 91 deletions TODO.md
@@ -1,74 +1,36 @@
* Fix symbol names:

* Rename all symbols that are intended to be private to `rs__`

* Rename those that don't match either prefix.

* We have a few functions to do with reading a netint, stashing
it somewhere, then moving into a different state. Is it worth
writing generic functions for that, or would it be too confusing?
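
A generic version might look roughly like this sketch (every name here
is hypothetical, not librsync's internals): one helper that decodes an
n-byte big-endian integer, stashes it, and switches to the next state.

```c
#include <stddef.h>
#include <stdint.h>

struct job;
typedef int (*state_fn)(struct job *);

struct job {
    const uint8_t *in;     /* buffered input not yet consumed */
    size_t         avail;
    uint64_t       stash;  /* decoded netint lands here */
    state_fn       state;  /* current state function */
};

/* Returns 0 on success, -1 if not enough input is buffered yet. */
static int read_netint_then(struct job *job, int width, state_fn next)
{
    uint64_t v = 0;

    if (job->avail < (size_t)width)
        return -1;               /* come back when more input arrives */
    for (int i = 0; i < width; i++)
        v = (v << 8) | job->in[i];
    job->in += width;
    job->avail -= width;
    job->stash = v;
    job->state = next;           /* the state transition */
    return 0;
}
```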

* Fix up consecutive matches

We often have several consecutive matches, and we can combine them
into a single COPY command. So far so good.

In some inputs, there might be several identical blocks.

When we're matching, we want to prefer to match a block that comes
just after the previous match, so that they'll join up nicely into
a single larger match. rsync does this; librsync doesn't at the
moment. It does cause a measurable problem.

In fact, we could introduce an additional optimization over rsync.
Suppose that the block A occurs twice, once followed by B and once
by C. When we first match it, we'll probably make an arbitrary
choice of which one to use. But if we next observe C, then it
might be better to have given the offset of the A that precedes C,
so that they can be joined into a single copy operation.

This might be a bit complex. You can imagine in fact needing an
arbitrarily deep lookback.
* Duplicate block handling. Currently duplicate blocks are included in
the signature, but we only put the first duplicate block in the
hashtable so the delta only includes references to the first block.
This can result in sub-optimal copy commands, breaking single large
copies with duplicate blocks into multiple copies referencing the
earlier copy of the block. However, this could also make patching use
the disk cache more effectively. This behaviour is probably fine,
particularly given how small copy instructions are, but there may be
better ways to emit copy commands for long runs of duplicate blocks.

As a simpler optimization, we might just try to prefer matching
blocks in the same order that they occur in the input.

But for now we ought to at least check for consecutive blocks (a
sketch of the copy-merging idea follows this discussion).

On the other hand, abo says:

In reality copies are such a huge gain that merging them efficiently
is a bit of a non-issue. Each copy command is only a couple of
bytes... who cares if we output twice as many as we need to... it's
the misses that take up whole blocks of data that people will notice.

I believe we are already outputting consecutive blocks as a single
"copy" command, but have you looked at the "search" code? We have far
more serious problems with the hash-table that need to be fixed first
:-)

We are not getting all the hits that we could due to a limited
hash-table, and this is going to make a much bigger difference than
optimizing the copy commands.
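
For reference, the copy-merging idea itself is small; a hedged sketch
(all names hypothetical) that buffers one pending COPY and grows it
while matches land back-to-back in the old file:

```c
#include <stdint.h>

typedef struct {
    uint64_t pos, len;   /* pending COPY span in the old file */
    int      active;
} copy_acc_t;

void emit_copy(uint64_t pos, uint64_t len);   /* assumed emitter */

static void match_found(copy_acc_t *acc, uint64_t pos, uint64_t len)
{
    if (acc->active && pos == acc->pos + acc->len) {
        acc->len += len;         /* adjacent: grow the pending copy */
        return;
    }
    if (acc->active)
        emit_copy(acc->pos, acc->len);
    acc->pos = pos;              /* start a new pending span */
    acc->len = len;
    acc->active = 1;
}
/* The caller flushes any final pending span when the delta ends. */
```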

* Optimisations and code cleanups:

scoop.c: Scoop needs a major refactor. Perhaps the API needs
tweaking?

rsync.h: rs_buffers_s and rs_buffers_t should be one typedef?
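
That is the usual C idiom of declaring the tag and typedef together;
the field list below matches the librsync 2.0.x header:

```c
#include <stddef.h>

typedef struct rs_buffers_s {
    char   *next_in;    /* next input byte to process */
    size_t  avail_in;   /* bytes of input available */
    int     eof_in;     /* true when no more input will arrive */
    char   *next_out;   /* where to write output */
    size_t  avail_out;  /* output space remaining */
} rs_buffers_t;
```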

* Just how useful is rs_job_drive anyway?

patch.c: rs_patch_s_copying() does alloc, copy, free, when it could
just copy directly into the rs_buffers_t buffer. This _does_ mean the
callback can't allocate its own data, though this can be handled by
checking whether the callback changed the pointer.
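
A sketch of that idea (the wrapper is hypothetical; the rs_copy_cb
signature is librsync's real one): hand the callback the output buffer
directly, and only memcpy if it substituted its own pointer.

```c
#include <string.h>
#include <librsync.h>

static rs_result copy_into_output(rs_copy_cb *cb, void *opaque,
                                  rs_long_t pos, size_t *len,
                                  char *out /* >= *len bytes free */)
{
    void *buf = out;             /* ask for the data in place */
    rs_result res = cb(opaque, pos, len, &buf);

    if (res == RS_DONE && buf != out)
        memcpy(out, buf, *len);  /* callback used its own storage */
    return res;
}
```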

mdfour.c: This code has a different API to the RSA code in libmd
and is coupled with librsync in unhealthy ways (trace?). Recommend
changing to RSA API?

* Don't use the rs_buffers_t structure.

There's something confusing about the existence of this structure.
@@ -121,10 +83,6 @@
Some are more likely to change than others. We need a chart
showing which source files depend on which variable.

* Error handling

* What happens if the user terminates the request?

* Encoding implementation

* Join up signature commands
@@ -181,21 +139,6 @@
current simple rolling-sum mechanism? Could it let us match
variable-length signatures?
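
For context, the classic rsync-style weak checksum and its O(1) roll
by one byte look like this plain-C sketch (not librsync's internal
implementation, which also adds a per-byte character offset):

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t s1, s2; } rollsum_t;

static void rollsum_init(rollsum_t *r, const uint8_t *p, size_t n)
{
    r->s1 = r->s2 = 0;
    for (size_t i = 0; i < n; i++) {
        r->s1 += p[i];
        r->s2 += r->s1;          /* weights older bytes more heavily */
    }
}

/* Slide the n-byte window one byte: drop `old`, take in `new_byte`. */
static void rollsum_rotate(rollsum_t *r, size_t n,
                           uint8_t old, uint8_t new_byte)
{
    r->s1 += new_byte - old;
    r->s2 += r->s1 - (uint32_t)n * old;
}
```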

* Cross-file matches

If the downstream server had many similar URLs, it might be nice
if it could draw on all of them as a basis. At the moment
there's no way to express this, and I think the work of sending
up signatures for all of them may be too hard.

Better just to make sure we choose the best basis if there is
none present. Perhaps this needs to weigh several factors.

One factor might be that larger files are better because they're
more likely to have a match. I'm not sure if that's very strong,
because they'll just bloat the request. Another is that more
recent files might be more useful.

* Support gzip compression of the difference stream. Does this
belong here, or should it be in the client, with librsync just
exposing an interface that lets it cleanly plug in?
@@ -273,36 +216,15 @@
on the network, then it's a security boundary. Make sure that
corrupt input data can't make the program crash or misbehave.
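
One hedged way to exercise that boundary is a libFuzzer harness that
throws arbitrary bytes at the signature loader. The librsync calls
(rs_loadsig_begin, rs_job_iter, rs_free_sumset) are the real 2.0.x
API; the harness itself is only a sketch:

```c
#include <stddef.h>
#include <stdint.h>
#include <librsync.h>

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    rs_signature_t *sig = NULL;
    rs_job_t *job = rs_loadsig_begin(&sig);
    rs_buffers_t bufs = {0};
    char out[64];                /* loadsig emits no real output */
    rs_result res;

    bufs.next_in = (char *)data;
    bufs.avail_in = size;
    bufs.eof_in = 1;
    bufs.next_out = out;
    bufs.avail_out = sizeof(out);
    do {
        res = rs_job_iter(job, &bufs);   /* must never crash on junk */
    } while (res == RS_BLOCKED);
    rs_job_free(job);
    if (sig)
        rs_free_sumset(sig);
    return 0;
}
```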

* Use slprintf not strnprintf, etc.

* Long files

* How do we handle the large signatures required to support large
files? In particular, how do we choose an appropriate block size
when the length is unknown? Perhaps we should allow a way for
the signature to scale up as it grows (see the sketch after this
list).

* What do we need to do to compile in support for this?

* On GNU, defining `_LARGEFILE_SOURCE` as we now do should be
sufficient.

* SCO and similar things on 32-bit platforms may be more
difficult. Some SCO systems have no 64-bit types at all, so
there we will have to do without.

* On larger Unix platforms we hope that large file support will
be the default.
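
One hedged way to let signatures scale, sketched below: choose a block
length near the square root of the file size, so signature size grows
as O(sqrt(n)) rather than O(n). This is illustrative only; librsync
2.0.x itself just takes a caller-supplied or default fixed block length.

```c
#include <math.h>
#include <stddef.h>

#define MIN_BLOCK_LEN   256   /* illustrative floor */
#define BLOCK_LEN_ALIGN  16   /* keep blocks nicely aligned */

static size_t pick_block_len(double file_size)
{
    size_t len = (size_t)sqrt(file_size);

    if (len < MIN_BLOCK_LEN)
        len = MIN_BLOCK_LEN;
    /* round up to a multiple of the alignment */
    return (len + BLOCK_LEN_ALIGN - 1) / BLOCK_LEN_ALIGN * BLOCK_LEN_ALIGN;
}
```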

* Perhaps make extracted signatures still be wrapped in commands.
What would this lead to?

* We'd know how much signature data we expect to read, rather than
requiring it to be terminated by the caller.

* Only use `inline` if the compiler supports it; perhaps allow it to be
disabled or even just let the compiler decide? (A portability sketch
follows below.)

* Fall back from `uint8_t` to probably `unsigned char` if necessary.

* Don't randomly use chars and longs; use rs_byte_t and rs_size_t.
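
A portability sketch covering the last three items (the HAVE_* guards
and the helper function are made up for illustration; a configure-time
check would set them):

```c
#ifdef HAVE_INLINE               /* e.g. probed at configure time */
#  define RS_INLINE inline
#else
#  define RS_INLINE              /* compiler without inline support */
#endif

#ifdef HAVE_STDINT_H
#  include <stdint.h>
typedef uint8_t rs_byte_t;
#else
typedef unsigned char rs_byte_t; /* the suggested fallback */
#endif

static RS_INLINE rs_byte_t rs_low_byte(unsigned long v)
{
    return (rs_byte_t)(v & 0xff);
}
```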
