
I/O issues using flashcache, broken system files #193

andrey-minsky opened this issue Nov 15, 2014 · 10 comments

@andrey-minsky

Hello, we are using a 2x480GB SSD RAID0 as a cache for our VPS nodes, but with the latest versions we are seeing a lot of issues. Some example log entries:

[ 42.722313] Buffer I/O error on device xvda1, logical block 309394
[ 42.722317] lost page write due to I/O error on xvda1
[ 42.730988] end_request: I/O error, dev xvda1, sector 2475344
[ 42.739014] end_request: I/O error, dev xvda1, sector 2475600

The issue can be worked around by rebooting the node.

Also, Windows VPSes on a node with flashcache end up with broken system files after some time in use and no longer boot correctly - this is a very big problem for us.

@mohans
Contributor

mohans commented Nov 16, 2014

Are there any flashcache messages in the messages file around the time these issues occur?

What version of flashcache are you running? Are you running master, top of the tree?



@andrey-minsky
Author

cat /proc/flashcache/md0+scsi-3600605b0057db8401a4b346b2d4d3838-part2/flashcache_stats
reads=148535851 writes=107025354
read_hits=33246752 read_hit_percent=22 write_hits=77837038 write_hit_percent=72 replacement=22503167 write_replacement=8191551 write_invalidates=95064 read_invalidates=162421 pending_enqueues=152 pending_inval=152 no_room=0 disk_reads=115292063 disk_writes=78626210 ssd_reads=33247169 ssd_writes=219983065 uncached_reads=1543878 uncached_writes=787582 uncached_IO_requeue=0 uncached_sequential_reads=0 uncached_sequential_writes=0 pid_adds=0 pid_dels=0 pid_drops=0 pid_expiry=0

I am referring to the write_invalidates/read_invalidates counters as errors. We are using the master version, flashcache-3.1.1 (/lib/modules/3.16.1/extra/flashcache/flashcache.ko).

@mohans
Contributor

mohans commented Nov 16, 2014

Are there flashcache messages in /var/log/messages that show I/O errors from either the flash or the disk? Flashcache is returning EIO (or at least the application is reporting EIO). Is either the underlying disk or the flash returning EIO?

Can you paste the output from dmsetup status and dmsetup table?

Write Invalidate is not an error.
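
For reference, a minimal sketch of how the requested diagnostics could be collected (the specific flashcache device name is just an example):

# Device-mapper status and table for all mapped devices; a specific device
# can also be named, e.g. "dmsetup status cachedev".
dmsetup status
dmsetup table

# Check whether flashcache or the underlying disk/SSD logged I/O errors
# around the time of the failure.
grep -i flashcache /var/log/messages
dmesg | grep -i 'I/O error'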



@Kaydannik

We had the same errors on a setup whose HDD storage used a 4k block size.
The storage was an 18 TB array with 4k sectors; the SSD used the usual 512-byte sectors. As a result we kept getting errors like
end_request: I/O error, dev sda1, sector XXXXX and data loss.

We tried different settings at cache creation, different block sizes, etc., but the problem was only solved by moving back to 512-byte sectors on the HDD. (A quick way to compare the sector sizes is sketched below.)
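
For anyone checking their own setup, the logical and physical sector sizes of the HDD and the SSD can be compared with something like the following (device names are just examples):

# Logical sector size (--getss) and physical block size (--getpbsz)
blockdev --getss --getpbsz /dev/sda    # backing HDD/array
blockdev --getss --getpbsz /dev/sdb    # caching SSD

# Equivalent checks via sysfs or lsblk
cat /sys/block/sda/queue/logical_block_size /sys/block/sda/queue/physical_block_size
lsblk -o NAME,LOG-SEC,PHY-SEC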

@andrey-minsky
Author

Kaydannik, useful information, thanks, but our node is already in production and we cannot change the block size. Also, in older versions everything worked fine. Could this issue perhaps be fixed in a new version of flashcache? Or could we format the SSDs with the same block size as the HDDs?

@andrey-minsky
Author

We checked once again: our block sizes match and are both 512.

We do not have any flashcache errors in the logs. The requested output is attached:
http://185.4.64.3/dmsetup-status.txt
http://185.4.64.3/dmsetup-table.txt

@JQuags

JQuags commented Mar 3, 2015

I have a similar setup with flashcache running under KVM with Windows templates. With Linux templates the same settings work fine. The issue is that system files randomly become corrupted in write-through mode. Simply turning off cache_all solves the issue. Flashcache reports no errors. I am including the details below.

I will do a more detailed test soon with virtio on/off and different options in KVM/libvirt. But the same KVM settings do work on Linux guests.

version flashcache-3.1.1
kernel 2.6.32-504.8.1.el6.x86_64
mode writethrough

status
cachedev_vz: 0 3730019968 flashcache stats:
reads(123395096), writes(5656486)
read hits(934953), read hit percent(0)
write hits(13189) write hit percent(0)
replacement(11), write replacement(0)
write invalidates(39108), read invalidates(15214)
pending enqueues(30544), pending inval(30496)
no room(0)
disk reads(122466306), disk writes(5603184) ssd reads(934953) ssd writes(351888)
uncached reads(122180911), uncached writes(5589995), uncached IO requeue(6163)
disk read errors(0), disk write errors(0) ssd read errors(0) ssd write errors(0)
uncached sequential reads(118727551), uncached sequential writes(4630111)
pid_adds(0), pid_dels(0), pid_drops(0) pid_expiry(0)
lru hot blocks(26312960), lru warm blocks(26312960)
lru promotions(234809), lru demotions(0)

table
cachedev_vz: 0 3730019968 flashcache conf:
ssd dev (/dev/md131), disk dev (/dev/md127) cache mode(WRITE_THROUGH)
capacity(205570M), associativity(512), data block size(4K)
disk assoc(256K)
skip sequential thresh(64K)
total blocks(52625920), cached blocks(260033), cache percent(0)
nr_queued(0)
Size Hist: 512:23946 1024:125609 1536:39395 2048:43893 2560:36999 3072:121873 3584:17439 4096:128642416

stats
reads=123395136 writes=5656487
read_hits=934983 read_hit_percent=0 write_hits=13189 write_hit_percent=0 replacement=11 write_replacement=0 write_invalidates=39108 read_invalidates=15214 pending_enqueues=30544 pending_inval=30496 no_room=0 disk_reads=122466316 disk_writes=5603185 ssd_reads=934983 ssd_writes=351888 uncached_reads=122180921 uncached_writes=5589996 uncached_IO_requeue=6163 uncached_sequential_reads=118727551 uncached_sequential_writes=4630111 pid_adds=0 pid_dels=0 pid_drops=0 pid_expiry=0

disk_read_errors=0 disk_write_errors=0 ssd_read_errors=0 ssd_write_errors=0 memory_alloc_errors=0

@JQuags

JQuags commented Mar 4, 2015

A little update on my post above.

I can consistently reproduce the errors on Windows with flashcache enabled, but I also appear to have a solution. Setting a password or running Windows updates results in errors like this: http://i.is.cc/1yPQJu2r.png (it goes through on the second or third attempt).

My normal setup for Linux and Windows servers in the libvirt XML is

<driver name='qemu' type='raw' cache='none'/>

If I change this to

<driver name='qemu' type='raw' cache='writeback' threads='native'/>

I no longer see any errors in Windows guests with flashcache.
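
For anyone wanting to try the same change, a rough sketch of applying it with virsh (GUESTNAME is a placeholder for the actual domain):

# Edit the guest definition and change the <driver .../> line of the disk
# backed by the flashcache device to cache='writeback' as shown above.
virsh edit GUESTNAME

# A full stop/start is needed for the new cache mode to take effect.
virsh shutdown GUESTNAME
virsh start GUESTNAME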

@ghost

ghost commented Aug 4, 2015

Thank you for reporting this issue, and we appreciate your patience. We've notified the core team for an update on this issue. We're looking for a response within the next 30 days, or the issue may be closed.

@pubyun
Contributor

pubyun commented Oct 26, 2015

This is the same as issue #133.

It can be reproduced easily.
