dog vdi delete leaves orphan objects (in healthy cluster) #436

Open
ggrandes opened this issue Dec 3, 2018 · 13 comments

@ggrandes

ggrandes commented Dec 3, 2018

Summary:

  • dog vdi delete leaves orphan files in the obj directory.

Environment:

  • Ubuntu 18.04.1 LTS
  • sheepdog 0.8.3-5 amd64 (standard ubuntu package in universe/bionic repo)
  • 3 node cluster, corosync
  • basic test system (no data)

How to reproduce:

# With: /usr/sbin/sheep --upgrade --pidfile /var/run/sheepdog.pid /data/sheepdog
# Do:
# dog cluster format -t             
using backend plain store
# ls -al /data/sheepdog/obj/              
total 20
drwxr-x--- 3 root root 12288 Dec  3 16:05 .
drwxr-x--- 4 root root  4096 Dec  3 16:05 ..
drwxr-x--- 2 root root  4096 Dec  3 16:05 .stale
# dog vdi list                               
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
## ----- preallocate
# dog vdi create -P dog001 16M              
100.0 % [=====] 16 MB / 16 MB      
# dog vdi list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
  dog001       0   16 MB   16 MB  0.0 MB 2018-12-03 16:06   f81c00      3              
# ls -al /data/sheepdog/obj/               
total 20508
drwxr-x--- 3 root root   12288 Dec  3 16:06 .
drwxr-x--- 4 root root    4096 Dec  3 16:05 ..
-rw-r----- 1 root root 4194304 Dec  3 16:06 00f81c0000000000
-rw-r----- 1 root root 4194304 Dec  3 16:06 00f81c0000000001
-rw-r----- 1 root root 4194304 Dec  3 16:06 00f81c0000000002
-rw-r----- 1 root root 4194304 Dec  3 16:06 00f81c0000000003
-rw-r----- 1 root root 4198976 Dec  3 16:06 80f81c0000000000
drwxr-x--- 2 root root    4096 Dec  3 16:05 .stale
# dog vdi delete dog001                        
# dog vdi list                              
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
# ls -al /data/sheepdog/obj/           
total 4124
drwxr-x--- 3 root root   12288 Dec  3 16:06 .
drwxr-x--- 4 root root    4096 Dec  3 16:05 ..
-rw-r----- 1 root root 4198976 Dec  3 16:06 80f81c0000000000 # <-------- !!!!
drwxr-x--- 2 root root    4096 Dec  3 16:05 .stale
## ----- non-preallocate
# dog vdi create dog001 16M               
# dog vdi list            
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
  dog001       0   16 MB  0.0 MB  0.0 MB 2018-12-03 16:07   f81c01      3              
# ls -al /data/sheepdog/obj/               
total 8228
drwxr-x--- 3 root root   12288 Dec  3 16:07 .
drwxr-x--- 4 root root    4096 Dec  3 16:05 ..
-rw-r----- 1 root root 4198976 Dec  3 16:06 80f81c0000000000
-rw-r----- 1 root root 4198976 Dec  3 16:07 80f81c0100000000
drwxr-x--- 2 root root    4096 Dec  3 16:05 .stale
# dog vdi delete dog001                      
# dog vdi list                    
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
# ls -al /data/sheepdog/obj/               
total 8228
drwxr-x--- 3 root root   12288 Dec  3 16:07 .
drwxr-x--- 4 root root    4096 Dec  3 16:05 ..
-rw-r----- 1 root root 4198976 Dec  3 16:06 80f81c0000000000
-rw-r----- 1 root root 4198976 Dec  3 16:08 80f81c0100000000 # <-------- !!!!
drwxr-x--- 2 root root    4096 Dec  3 16:05 .stale
## -----
@vtolstov
Contributor

vtolstov commented Dec 4, 2018

Please try with the latest master.

@ggrandes
Author

Hi, we tried stable v1.0.1 (sheepdog-1.0.1-1_amd64.deb) on a single node (no cluster); the problem remains the same:

# dog cluster format -t -c 1
using backend plain store
# ls -al /var/lib/sheepdog/obj/              
total 12
drwxr-x--- 3 root root 4096 Dec 10 09:12 .
drwxr-x--- 4 root root 4096 Dec 10 09:12 ..
drwxr-x--- 2 root root 4096 Dec 10 09:12 .stale
# dog vdi create -P dog001 16M                 
100.0 % [===] 16 MB / 16 MB      
# ls -al /var/lib/sheepdog/obj/              
total 16408
drwxr-x--- 3 root root     4096 Dec 10 09:12 .
drwxr-x--- 4 root root     4096 Dec 10 09:12 ..
-rw-r----- 1 root root  4194304 Dec 10 09:12 00f81c0000000000
-rw-r----- 1 root root  4194304 Dec 10 09:12 00f81c0000000001
-rw-r----- 1 root root  4194304 Dec 10 09:12 00f81c0000000002
-rw-r----- 1 root root  4194304 Dec 10 09:12 00f81c0000000003
-rw-r----- 1 root root 12587576 Dec 10 09:12 80f81c0000000000
drwxr-x--- 2 root root     4096 Dec 10 09:12 .stale
# dog vdi delete dog001                     
# ls -al /var/lib/sheepdog/obj/           
total 16
drwxr-x--- 3 root root     4096 Dec 10 09:12 .
drwxr-x--- 4 root root     4096 Dec 10 09:12 ..
-rw-r----- 1 root root 12587576 Dec 10 09:12 80f81c0000000000 # <-------- !!!!
drwxr-x--- 2 root root     4096 Dec 10 09:12 .stale

@ggrandes
Author

Compiled from master (git clone, commit 3ebe5ea); same problem as in v1.0.1.

@vatelzh

vatelzh commented Dec 10, 2018

@ggrandes 80**** is the inode object. It is not deleted, by design.
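
If it helps with reading the listings above, this is roughly how the object names map to VDI ids (a sketch only, assuming the usual sheepdog layout of data oid = vid << 32 | index and inode oid = (1 << 63) | (vid << 32)):

```python
# Sketch only, assuming the usual sheepdog object-ID layout:
#   data  oid = vid << 32 | index
#   inode oid = (1 << 63) | (vid << 32)
VDI_SPACE_SHIFT = 32
VDI_BIT = 1 << 63

def decode_oid(name):
    oid = int(name, 16)
    if oid & VDI_BIT:
        vid = (oid & ~VDI_BIT) >> VDI_SPACE_SHIFT
        return f"{name}: inode object of VDI id {vid:06x}"
    vid = oid >> VDI_SPACE_SHIFT
    idx = oid & ((1 << VDI_SPACE_SHIFT) - 1)
    return f"{name}: data object {idx} of VDI id {vid:06x}"

for f in ("00f81c0000000003", "80f81c0000000000"):
    print(decode_oid(f))
# 00f81c0000000003: data object 3 of VDI id f81c00
# 80f81c0000000000: inode object of VDI id f81c00
```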

@vtolstov
Contributor

@vatelzh Why? If no other VDI references objects from this VDI, why is it not deleted?

@vatelzh

vatelzh commented Dec 11, 2018

@vtolstov This is useful in some scenarios. For example, when creating a VDI snapshot, the new vid is selected right next to the origin vid in the VDI bitmap, which records all allocated vids. The inode objects on disk are the way to know which vids were allocated when the cluster was shut down.
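
Roughly, as an illustration only (this is not sheepdog's actual code, just the idea):

```python
# Illustration only, not sheepdog's actual code: a bitmap of allocated VDI ids,
# rebuilt at startup from the inode objects still on disk, is what lets a
# snapshot pick the free vid right next to its origin vid.

class VdiBitmap:
    def __init__(self, nr_vdis=1 << 24):
        self.bits = bytearray(nr_vdis // 8)

    def set(self, vid):
        self.bits[vid // 8] |= 1 << (vid % 8)

    def test(self, vid):
        return bool(self.bits[vid // 8] & (1 << (vid % 8)))

    def next_free(self, start):
        vid = start
        while self.test(vid):
            vid += 1          # wrap-around omitted for brevity
        return vid

# Rebuild from the inode objects found on disk (here only 80f81c0000000000),
# then pick a vid for a snapshot of f81c00:
bitmap = VdiBitmap()
bitmap.set(0xf81c00)
print(hex(bitmap.next_free(0xf81c00)))   # 0xf81c01, right next to the origin
```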

@ggrandes
Author

@vatelzh Thanks for the response. I do not doubt that this is useful in some scenarios, but in others, when a VDI is no longer used, these orphan files are, in the long term, like a "memory leak".

Our future scenario involves many ephemeral machines (500+/day). At 12 MB per VDI, 365 days = 2.19 TB of space lost (at AWS prices, around 400 USD/month of EBS) with v1.0.1. Note also that these orphan files are about 3x larger in v1.0.1 than their v0.8.3 counterparts. That is a lot of space.
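
For reference, a quick check of those figures:

```python
# Quick check of the estimate above (decimal units, 1 TB = 10**12 bytes).
vdis_per_day = 500
orphan_mb = 12        # ~12 MB orphan inode object per deleted VDI in v1.0.1
days = 365
print(vdis_per_day * orphan_mb * days / 1_000_000, "TB per year")  # 2.19 TB per year
```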

Thinking out loud: could we add a garbage collector of some kind?

@vtolstov
Contributor

As far as I remember, sheepdog is unmaintained for such things, but I'm trying to build a sheepdog-compatible storage system (Ceph CRUSH map for object location, but the sheepdog protocol for QEMU).

@AnatolyZimin

root@vmu-14:~# dog node list
  Id   Host:Port         V-Nodes       Zone
   0   172.16.3.14:7000    	189  235081900
   1   172.16.3.15:7000    	194  251859116
   2   172.16.3.16:7000    	194  268636332
   3   172.16.3.17:7000    	194  285413548
   4   172.16.3.18:7000    	194  302190764
   5   172.16.3.19:7000    	194  318967980
   6   172.16.3.20:7000    	194  335745196
root@vmu-14:~# dog node info
Id	Size	Used	Avail	Use%
 0	737 GB	0.0 MB	737 GB	  0%
 1	757 GB	0.0 MB	757 GB	  0%
 2	757 GB	0.0 MB	757 GB	  0%
 3	757 GB	0.0 MB	757 GB	  0%
 4	757 GB	0.0 MB	757 GB	  0%
 5	757 GB	0.0 MB	757 GB	  0%
 6	757 GB	0.0 MB	757 GB	  0%
Total	5.2 TB	0.0 MB	5.2 TB	  0%

Total virtual image size	0.0 MB
root@vmu-14:~# dog vdi list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag   Block Size Shift
root@vmu-14:~# dog cluster format -c 3 -b tree
    __
   ()'`;
   /\|`
  /  |   Caution! The cluster is not empty.
(/_)_|_  Are you sure you want to continue? [yes/no]: yes
using backend tree store
root@vmu-14:~# dog vdi create -P TEST 12M
100.0 % [================================================================================================================================================================================================================================] 12 MB / 12 MB
root@vmu-14:~# for x in {14..20}; do echo $x; ssh 172.16.3.$x "find /sheep/ -type f | grep /sheep/vd | grep -v meta" ;done
14
/sheep/vdi1/25/00902e2500000002
/sheep/vdg1/25/00902e2500000001
15
/sheep/vdf1/25/00902e2500000002
16
/sheep/vdd1/25/00902e2500000000
17
/sheep/vde1/25/00902e2500000000
/sheep/vdi1/25/00902e2500000001
18
/sheep/vdd1/25/00902e2500000001
19
/sheep/vdh1/25/00902e2500000000
/sheep/vdi1/25/00902e2500000002
20
root@vmu-14:~# for x in {14..20}; do echo $x; ssh 172.16.3.$x "find /sheep/ -type f | grep /sheep/vd | grep  meta" ;done
14
/sheep/vdi1/meta/80902e2500000000
15
16
17
/sheep/vdf1/meta/80902e2500000000
18
19
20
/sheep/vdf1/meta/80902e2500000000
root@vmu-14:~# ls -lha /sheep/vdi1/meta/80902e2500000000
-rw-r----- 1 root root 13M Jun  8 03:23 /sheep/vdi1/meta/80902e2500000000
root@vmu-14:~# ls -lha /sheep/vdi1/25/00902e2500000002
-rw-r----- 1 root root 4.0M Jun  8 03:23 /sheep/vdi1/25/00902e2500000002
root@vmu-14:~# dog vdi delete TEST
root@vmu-14:~# for x in {14..20}; do echo $x; ssh 172.16.3.$x "find /sheep/ -type f | grep /sheep/vd | grep -v meta" ;done
14
15
16
17
18
19
20
root@vmu-14:~# for x in {14..20}; do echo $x; ssh 172.16.3.$x "find /sheep/ -type f | grep /sheep/vd | grep  meta" ;done
14
/sheep/vdi1/meta/80902e2500000000
15
16
17
/sheep/vdf1/meta/80902e2500000000
18
19
20
/sheep/vdf1/meta/80902e2500000000

@AnatolyZimin

  1. Do not worry. This is normal.
  2. You can write an external script for periodic inode cleaning (a rough sketch follows below).
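
A rough, report-only sketch of such a script (the `dog vdi list` parsing, the obj-directory layout and the naming assumptions here are mine and should be verified against your setup; it only prints candidates and deletes nothing):

```python
#!/usr/bin/env python3
# Report-only sketch: list inode objects (filenames starting with 80) under
# the obj/ directory whose VDI id no longer shows up in `dog vdi list`.
# The output parsing and directory layout are assumptions; verify them
# against your setup. Nothing is deleted.
import os
import re
import subprocess
import sys

obj_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/sheepdog/obj"

# Active VDI ids: 6 hex digits in the "VDI id" column of `dog vdi list`.
out = subprocess.run(["dog", "vdi", "list"],
                     capture_output=True, text=True, check=True).stdout
active = set(re.findall(r"\b[0-9a-f]{6}\b", out))

for name in sorted(os.listdir(obj_dir)):
    if re.fullmatch(r"80[0-9a-f]{14}", name):                 # inode object
        vid = format((int(name, 16) >> 32) & 0xffffff, "06x")
        if vid not in active:
            print("candidate orphan inode object:",
                  os.path.join(obj_dir, name), "(vid " + vid + ")")
```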

Look here
rgw: Add a command that deletes objects leaked from multipart retries
ceph/ceph#17349

This is a typical problem...

Sheepdog is not as bad as some people think. This is a great project.

@ggrandes
Author

ggrandes commented Jun 9, 2019

Maybe... but I have a simple rule: don't play around with storage; if you touch it, don't be surprised if you break things.
I would never dare touch an Oracle datafile manually, so why should I do it with Sheepdog? ;-) (I'll let the software do its work.)

@AnatolyZimin

AnatolyZimin commented Jun 9, 2019

Then your only choice is to use hardware solutions. Open source does not fit your rule.
It is incorrect to compare paid, expensive software with open source. Linux is an amateur OS. Solaris is dead now, but when it was alive, it was true enterprise.

@ggrandes
Author

Hi @AnatolyZimin,
Maybe the language barrier (I'm a Spanish speaker) kept me from expressing my basic idea: "My rule is not to touch files manually (it's error prone). Sheepdog must be the only piece of software that touches its metadata."
The Oracle reference was only meant as a comparison with other serious things (databases); with PostgreSQL, Cassandra or Solr data files the same concept applies (let's forget Oracle).
But, by the way, "Do not worry. This (leak) is normal" is like saying "this software has a memory leak; don't worry, buy more memory or reboot the server every week" 🙈
Workarounds like this must be temporary. I also develop open-source software (in other fields) and it would never occur to me to leave a bug in workaround mode (I think what I say is reasonable).
If I knew how to fix this problem in sheepdog, I would be very happy to fix it myself and send a pull request; but honestly, I have no idea. 😅
