dog vdi delete leaves orphan objects (in healthy cluster) #436

Open
ggrandes opened this issue Dec 3, 2018 · 13 comments

@ggrandes

ggrandes commented Dec 3, 2018

Summary:

  • dog vdi delete leaves orphan files in the obj directory.

Environment:

  • Ubuntu 18.04.1 LTS
  • sheepdog 0.8.3-5 amd64 (standard ubuntu package in universe/bionic repo)
  • 3 node cluster, corosync
  • basic test system (no data)

How to reproduce:

# With: /usr/sbin/sheep --upgrade --pidfile /var/run/sheepdog.pid /data/sheepdog
# Do:
# dog cluster format -t             
using backend plain store
# ls -al /data/sheepdog/obj/              
total 20
drwxr-x--- 3 root root 12288 Dec  3 16:05 .
drwxr-x--- 4 root root  4096 Dec  3 16:05 ..
drwxr-x--- 2 root root  4096 Dec  3 16:05 .stale
# dog vdi list                               
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
## ----- preallocate
# dog vdi create -P dog001 16M              
100.0 % [=====] 16 MB / 16 MB      
# dog vdi list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
  dog001       0   16 MB   16 MB  0.0 MB 2018-12-03 16:06   f81c00      3              
# ls -al /data/sheepdog/obj/               
total 20508
drwxr-x--- 3 root root   12288 Dec  3 16:06 .
drwxr-x--- 4 root root    4096 Dec  3 16:05 ..
-rw-r----- 1 root root 4194304 Dec  3 16:06 00f81c0000000000
-rw-r----- 1 root root 4194304 Dec  3 16:06 00f81c0000000001
-rw-r----- 1 root root 4194304 Dec  3 16:06 00f81c0000000002
-rw-r----- 1 root root 4194304 Dec  3 16:06 00f81c0000000003
-rw-r----- 1 root root 4198976 Dec  3 16:06 80f81c0000000000
drwxr-x--- 2 root root    4096 Dec  3 16:05 .stale
# dog vdi delete dog001                        
# dog vdi list                              
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
# ls -al /data/sheepdog/obj/           
total 4124
drwxr-x--- 3 root root   12288 Dec  3 16:06 .
drwxr-x--- 4 root root    4096 Dec  3 16:05 ..
-rw-r----- 1 root root 4198976 Dec  3 16:06 80f81c0000000000 # <-------- !!!!
drwxr-x--- 2 root root    4096 Dec  3 16:05 .stale
## ----- non-preallocate
# dog vdi create dog001 16M               
# dog vdi list            
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
  dog001       0   16 MB  0.0 MB  0.0 MB 2018-12-03 16:07   f81c01      3              
# ls -al /data/sheepdog/obj/               
total 8228
drwxr-x--- 3 root root   12288 Dec  3 16:07 .
drwxr-x--- 4 root root    4096 Dec  3 16:05 ..
-rw-r----- 1 root root 4198976 Dec  3 16:06 80f81c0000000000
-rw-r----- 1 root root 4198976 Dec  3 16:07 80f81c0100000000
drwxr-x--- 2 root root    4096 Dec  3 16:05 .stale
# dog vdi delete dog001                      
# dog vdi list                    
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
# ls -al /data/sheepdog/obj/               
total 8228
drwxr-x--- 3 root root   12288 Dec  3 16:07 .
drwxr-x--- 4 root root    4096 Dec  3 16:05 ..
-rw-r----- 1 root root 4198976 Dec  3 16:06 80f81c0000000000
-rw-r----- 1 root root 4198976 Dec  3 16:08 80f81c0100000000 # <-------- !!!!
drwxr-x--- 2 root root    4096 Dec  3 16:05 .stale
## -----
@vtolstov
Contributor

vtolstov commented Dec 4, 2018

Please try with the latest master.

@ggrandes
Author

Hi, we tried stable v1.0.1 (sheepdog-1.0.1-1_amd64.deb) on a single node (no cluster); the problem remains the same:

# dog cluster format -t -c 1
using backend plain store
# ls -al /var/lib/sheepdog/obj/              
total 12
drwxr-x--- 3 root root 4096 Dec 10 09:12 .
drwxr-x--- 4 root root 4096 Dec 10 09:12 ..
drwxr-x--- 2 root root 4096 Dec 10 09:12 .stale
# dog vdi create -P dog001 16M                 
100.0 % [===] 16 MB / 16 MB      
# ls -al /var/lib/sheepdog/obj/              
total 16408
drwxr-x--- 3 root root     4096 Dec 10 09:12 .
drwxr-x--- 4 root root     4096 Dec 10 09:12 ..
-rw-r----- 1 root root  4194304 Dec 10 09:12 00f81c0000000000
-rw-r----- 1 root root  4194304 Dec 10 09:12 00f81c0000000001
-rw-r----- 1 root root  4194304 Dec 10 09:12 00f81c0000000002
-rw-r----- 1 root root  4194304 Dec 10 09:12 00f81c0000000003
-rw-r----- 1 root root 12587576 Dec 10 09:12 80f81c0000000000
drwxr-x--- 2 root root     4096 Dec 10 09:12 .stale
# dog vdi delete dog001                     
# ls -al /var/lib/sheepdog/obj/           
total 16
drwxr-x--- 3 root root     4096 Dec 10 09:12 .
drwxr-x--- 4 root root     4096 Dec 10 09:12 ..
-rw-r----- 1 root root 12587576 Dec 10 09:12 80f81c0000000000 # <-------- !!!!
drwxr-x--- 2 root root     4096 Dec 10 09:12 .stale

@ggrandes
Author

Compiled from master (git clone, commit 3ebe5ea); same problem as in v1.0.1.

@vatelzh

vatelzh commented Dec 10, 2018

@ggrandes 80**** is the inode object. It is not deleted, by design.
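
If it helps with reading the listings above, this is roughly how the object names map to VDI ids (a sketch only, assuming the usual sheepdog layout of data oid = vid << 32 | index and inode oid = (1 << 63) | (vid << 32)):

```python
# Sketch only, assuming the usual sheepdog object-ID layout:
#   data  oid = vid << 32 | index
#   inode oid = (1 << 63) | (vid << 32)
VDI_SPACE_SHIFT = 32
VDI_BIT = 1 << 63

def decode_oid(name):
    oid = int(name, 16)
    if oid & VDI_BIT:
        vid = (oid & ~VDI_BIT) >> VDI_SPACE_SHIFT
        return f"{name}: inode object of VDI id {vid:06x}"
    vid = oid >> VDI_SPACE_SHIFT
    idx = oid & ((1 << VDI_SPACE_SHIFT) - 1)
    return f"{name}: data object {idx} of VDI id {vid:06x}"

for f in ("00f81c0000000003", "80f81c0000000000"):
    print(decode_oid(f))
# 00f81c0000000003: data object 3 of VDI id f81c00
# 80f81c0000000000: inode object of VDI id f81c00
```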

@vtolstov
Contributor

@vatelzh Why? If no other VDI references objects from this VDI, why is it not deleted?

@vatelzh

vatelzh commented Dec 11, 2018

@vtolstov This is useful in some scenarios. For example, when creating a VDI snapshot, the new vid is selected right next to the origin vid in the VDI bitmap, which records all allocated vids. The inode objects on disk are the way to know which vids were allocated when the cluster was shut down.
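
Roughly, as an illustration only (this is not sheepdog's actual code, just the idea):

```python
# Illustration only, not sheepdog's actual code: a bitmap of allocated VDI ids,
# rebuilt at startup from the inode objects still on disk, is what lets a
# snapshot pick the free vid right next to its origin vid.

class VdiBitmap:
    def __init__(self, nr_vdis=1 << 24):
        self.bits = bytearray(nr_vdis // 8)

    def set(self, vid):
        self.bits[vid // 8] |= 1 << (vid % 8)

    def test(self, vid):
        return bool(self.bits[vid // 8] & (1 << (vid % 8)))

    def next_free(self, start):
        vid = start
        while self.test(vid):
            vid += 1          # wrap-around omitted for brevity
        return vid

# Rebuild from the inode objects found on disk (here only 80f81c0000000000),
# then pick a vid for a snapshot of f81c00:
bitmap = VdiBitmap()
bitmap.set(0xf81c00)
print(hex(bitmap.next_free(0xf81c00)))   # 0xf81c01, right next to the origin
```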

@ggrandes
Author

@vatelzh Thanks for the response. I do not doubt that this is useful in some scenarios, but in others, when a VDI is no longer used, these orphan files are, in the long term, like a "memory leak".

Our future scenario involves many ephemeral machines (500+/day). At 12 MB per VDI, 365 days = 2.19 TB of space lost (at AWS prices, around 400 USD/month of EBS) with v1.0.1. Note also that these orphan files are about 3x larger in v1.0.1 than their v0.8.3 counterparts. That is a lot of space.
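
For reference, a quick check of those figures:

```python
# Quick check of the estimate above (decimal units, 1 TB = 10**12 bytes).
vdis_per_day = 500
orphan_mb = 12        # ~12 MB orphan inode object per deleted VDI in v1.0.1
days = 365
print(vdis_per_day * orphan_mb * days / 1_000_000, "TB per year")  # 2.19 TB per year
```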

Thinking out loud: could we add a garbage collector of some kind?

@vtolstov
Contributor

As far as I remember, sheepdog is unmaintained for such things, but I'm trying to build a sheepdog-compatible storage system (Ceph CRUSH map for object location, but the sheepdog protocol for QEMU).

@AnatolyZimin

root@vmu-14:~# dog node list
  Id   Host:Port         V-Nodes       Zone
   0   172.16.3.14:7000    	189  235081900
   1   172.16.3.15:7000    	194  251859116
   2   172.16.3.16:7000    	194  268636332
   3   172.16.3.17:7000    	194  285413548
   4   172.16.3.18:7000    	194  302190764
   5   172.16.3.19:7000    	194  318967980
   6   172.16.3.20:7000    	194  335745196
root@vmu-14:~# dog node info
Id	Size	Used	Avail	Use%
 0	737 GB	0.0 MB	737 GB	  0%
 1	757 GB	0.0 MB	757 GB	  0%
 2	757 GB	0.0 MB	757 GB	  0%
 3	757 GB	0.0 MB	757 GB	  0%
 4	757 GB	0.0 MB	757 GB	  0%
 5	757 GB	0.0 MB	757 GB	  0%
 6	757 GB	0.0 MB	757 GB	  0%
Total	5.2 TB	0.0 MB	5.2 TB	  0%

Total virtual image size	0.0 MB
root@vmu-14:~# dog vdi list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag   Block Size Shift
root@vmu-14:~# dog cluster format -c 3 -b tree
    __
   ()'`;
   /\|`
  /  |   Caution! The cluster is not empty.
(/_)_|_  Are you sure you want to continue? [yes/no]: yes
using backend tree store
root@vmu-14:~# dog vdi create -P TEST 12M
100.0 % [================================================================================================================================================================================================================================] 12 MB / 12 MB
root@vmu-14:~# for x in {14..20}; do echo $x; ssh 172.16.3.$x "find /sheep/ -type f | grep /sheep/vd | grep -v meta" ;done
14
/sheep/vdi1/25/00902e2500000002
/sheep/vdg1/25/00902e2500000001
15
/sheep/vdf1/25/00902e2500000002
16
/sheep/vdd1/25/00902e2500000000
17
/sheep/vde1/25/00902e2500000000
/sheep/vdi1/25/00902e2500000001
18
/sheep/vdd1/25/00902e2500000001
19
/sheep/vdh1/25/00902e2500000000
/sheep/vdi1/25/00902e2500000002
20
root@vmu-14:~# for x in {14..20}; do echo $x; ssh 172.16.3.$x "find /sheep/ -type f | grep /sheep/vd | grep  meta" ;done
14
/sheep/vdi1/meta/80902e2500000000
15
16
17
/sheep/vdf1/meta/80902e2500000000
18
19
20
/sheep/vdf1/meta/80902e2500000000
root@vmu-14:~# ls -lha /sheep/vdi1/meta/80902e2500000000
-rw-r----- 1 root root 13M Jun  8 03:23 /sheep/vdi1/meta/80902e2500000000
root@vmu-14:~# ls -lha /sheep/vdi1/25/00902e2500000002
-rw-r----- 1 root root 4.0M Jun  8 03:23 /sheep/vdi1/25/00902e2500000002
root@vmu-14:~# dog vdi delete TEST
root@vmu-14:~# for x in {14..20}; do echo $x; ssh 172.16.3.$x "find /sheep/ -type f | grep /sheep/vd | grep -v meta" ;done
14
15
16
17
18
19
20
root@vmu-14:~# for x in {14..20}; do echo $x; ssh 172.16.3.$x "find /sheep/ -type f | grep /sheep/vd | grep  meta" ;done
14
/sheep/vdi1/meta/80902e2500000000
15
16
17
/sheep/vdf1/meta/80902e2500000000
18
19
20
/sheep/vdf1/meta/80902e2500000000

@AnatolyZimin

  1. Do not worry. This is normal.
  2. You can write an external script for periodic inode cleaning (a rough sketch follows below).
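
A rough, report-only sketch of such a script (the `dog vdi list` parsing, the obj-directory layout and the naming assumptions here are mine and should be verified against your setup; it only prints candidates and deletes nothing):

```python
#!/usr/bin/env python3
# Report-only sketch: list inode objects (filenames starting with 80) under
# the obj/ directory whose VDI id no longer shows up in `dog vdi list`.
# The output parsing and directory layout are assumptions; verify them
# against your setup. Nothing is deleted.
import os
import re
import subprocess
import sys

obj_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/sheepdog/obj"

# Active VDI ids: 6 hex digits in the "VDI id" column of `dog vdi list`.
out = subprocess.run(["dog", "vdi", "list"],
                     capture_output=True, text=True, check=True).stdout
active = set(re.findall(r"\b[0-9a-f]{6}\b", out))

for name in sorted(os.listdir(obj_dir)):
    if re.fullmatch(r"80[0-9a-f]{14}", name):                 # inode object
        vid = format((int(name, 16) >> 32) & 0xffffff, "06x")
        if vid not in active:
            print("candidate orphan inode object:",
                  os.path.join(obj_dir, name), "(vid " + vid + ")")
```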

Look here
rgw: Add a command that deletes objects leaked from multipart retries
ceph/ceph#17349

This is a typical problem...

Sheepdog is not as bad as some people think. This is a great project.

@ggrandes
Author

ggrandes commented Jun 9, 2019

Maybe... but I have a simple rule: don't play around with storage; if you touch it, don't be surprised if you break things.
I would never dare touch an Oracle datafile manually, so why should I do it with Sheepdog? ;-) (I'll let the software do its work.)

@AnatolyZimin

AnatolyZimin commented Jun 9, 2019

Then your only choice is to use hardware solutions. Open source does not fit your rule.
It is incorrect to compare paid, expensive software with open source. Linux is an amateur OS. Solaris is dead now, but when it was alive, it was true enterprise.

@ggrandes
Author

Hi @AnatolyZimin,
Maybe the language barrier (I'm a Spanish speaker) kept me from expressing my basic idea: "My rule is not to touch files manually (it's error prone). Sheepdog must be the only piece of software that touches its metadata."
The Oracle reference was only meant as a comparison with other serious things (databases); with PostgreSQL, Cassandra or Solr data files the same concept applies (let's forget Oracle).
But, by the way, "Do not worry. This (leak) is normal" is like saying "this software has a memory leak; don't worry, buy more memory or reboot the server every week" 🙈
Workarounds like this must be temporary. I also develop open-source software (in other fields) and it would never occur to me to leave a bug in workaround mode (I think what I say is reasonable).
If I knew how to fix this problem in sheepdog, I would be very happy to fix it myself and send a pull request; but honestly, I have no idea. 😅
