Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Implement discard for AIO on certain file systems #167

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

robertbreker
Copy link

Discard support is announced to the guest, if each of the following conditions
is true:

  • block-aio is used as the driver
  • the image is stored on a filesystem that supports hole punching.
  • the kernel and file system combination are whitelisted for supporting hole
    punching in tapdisk-util.c:is_hole_punching_supported_for_fd()
  • tapback is not run with the --nodiscard overwrite

Discard requests are translated into the following request on the file system:

  • fallocate(fd, (FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE), offset, length)

This commit has been dev-tested using:

  • v8 Windows PV drivers that include XenDisk and thereby implement discard
  • Linux xen-blkfront that implements discard

Signed-off-by: Robert Breker robert.breker@citrix.com

Robert Breker added 3 commits October 7, 2015 16:59
With this commit, tapdisk is able to understand and pass-through discard
request to tapdisk drivers which support it.

Each discard messages on the xen blkif is handled as follows:
1. xenio_blkif_get_request() gets discard requests from the ring. It decodes
   the request depending on the blkif protocol type and converts
   them into generic blkif_request_discard_t using blkif_get_req_discard.
2. tapdisk_xenblkif_make_vbd_request() iterates the message counter
   blkback_stats.st_ds_req for discards.
3. tapdisk_xenblkif_parse_request_discard() converts the discard request into
   a td_vbd_request with a start sector (sec) and a length
   (discard_nr_sectors).
3. The td_vbd_request is encapsulated into a td_request_t and is sanity
   checked in tapdisk_image_check_td_request, tapdisk_image_check_request and
   the new td_queue_discard method.
4. Ultimately the request is handled in td_queue_discard. If the tapdisk
   driver implements td_queue_discard, the request is passed through to that.
   If not, the request is failed with -EOPNOTSUPP.

This commit has been dev-tested using:
* v8 Windows PV drivers that include XenDisk and thereby implement discard
* Linux xen-blkfront that implements discard

Signed-off-by: Robert Breker <robert.breker@citrix.com>
This commit enables discard for images accessed using block_aio:
- For discard support, the image must be stored on a filesystem that supports
  hole punching. To determine this, there is a whitelist in
  util.c:is_hole_punching_supported_for_fd(), that filters based on the
  kernel version and filesystem.
- Discard request are handled as synchronous hole punches into the image.

This commit has been dev-tested using:
* v8 Windows PV drivers that include XenDisk and thereby implement discard
* Linux xen-blkfront that implements discard

Signed-off-by: Robert Breker <robert.breker@citrix.com>
The discard capability is announced for vbds if the following two conditions
apply:
* The backing tapdisk indicates discard support for the image that is
  attached to the vbd (i.e. block-aio is used in combination with a capable
  filesystem)
* tapback is not run with the --nodiscard overwrite

If discard is enabled, the discard granularity (discard_granularity) is
statically announced as the sector size of the vbd. The alignment
(discard_alignment) is statically announced as 0.
Secure discard (discard_secure) is never guaranteed.

This commit has been dev-tested using:
* v8 Windows PV drivers that include XenDisk and thereby implement discard
* Linux xen-blkfront that implements discard

Signed-off-by: Robert Breker <robert.breker@citrix.com>
@robertbreker
Copy link
Author

I've squeezed the changes and grouped them so they are easier to review.

The elephant in the room is still the potential request reordering of discards, as raised by Felipe:

This is not really queuing, it is punch-holing on the spot. This might be a problem if a guest issues: R, W, R, D, R, expecting "read data", "write new data", "read new data", "trim", "read zeros". If we queue everything else and trim on the spot, we will probably return: "read zeros", "write new data", "read new data", "read new data", which is a problem. So we might need to test whether this approach works or if we need to properly queue a discard operation and execute it later, in order.

@robertbreker
Copy link
Author

Regarding the reordering of requests:

As the next step I'd suggest:

  • Confirm whether guests would issue "READ, DISCARD", or whether they would issue barriers "READ, BARRIER", DISCARD" or wait "READ, WAITFORREPLY, DISCARD" when issuing the described request order.
  • In case we may need to handle "READ, DISCARD"- my current thinking is that we can get away with flushing the local tapdisk queue into the kernel queue using tapdisk_server_submit_tiocbs before calling fallocate64.

This will need some more thoughts both from @malcolmcrossley and myself to be certain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant