
How do you use LizardFS? #858

Open
zicklag opened this issue Dec 26, 2019 · 18 comments

@zicklag

zicklag commented Dec 26, 2019

I just wanted to open a general topic asking how the people that use LizardFS are using it today. It would be valuable to know things like how big your cluster is ( number of nodes, cluster storage capacity ), what hardware you are deploying to ( VMs, bare metal, SSD, etc. ), and if you are using any deployment tools such as Docker, Ansible, Chef, etc. Possibly another good thing to know would be how you plan to use LizardFS in the future, if at all.

Obviously nobody is obliged to answer, but I think that it would help us get an idea of what improvements to target with LizardFS if we knew how people use it today.


To answer my own questions, I'm not using LizardFS at the moment.

In the past I used my team's Docker image and Docker plugin to deploy LizardFS on Docker Swarm. This was a 3 node cluster where the apps ran on the same servers as LizardFS. The servers were VMs in the cloud with SSDs.

The cluster hosted database storage and the file storage for all of the apps we hosted on those servers. Databases were noticeably slower on LizardFS, but it worked. We were running LizardFS 3.12.

With LizardFS development picking back up, my team and I may well use it in the future, but we don't have a specific use case in mind yet.

@eleaner

eleaner commented Dec 26, 2019

I am using LizardFS in two SOHO setups.
1.
A small three-node VPS cluster running Swarm and your 3.12 images on the host network.
It serves files only for my applications. Total size 300GB with the standard goal of 3.
I run the database as a separate cluster on the same Swarm.
It works great, but I miss proper multi-master/multi-IP HA for LizardFS.
I currently run shadows on other VPSes and have devised my own mechanism to switch context when the master becomes unavailable. Unfortunately, it requires a floating IP pointing to the right server; I currently use ZeroTier One's API to achieve that. Also, failover goes randomly to the next available shadow, with no control over data accuracy.
I raised the feature request #848 for that.
2.
A single-machine cluster running 3.13rc1 in Docker (not Swarm): ten single-HDD chunkservers of various sizes and speeds with ec(4,2), total space 27TB.
Write speeds are around MB/s.
This machine is used only as an on-site backup server. S3QL runs on top of LizardFS to handle the backup, snapshot and deduplication side of things, so I can easily move the data to another filesystem if needed.

I don't use LizardFS as my main home storage, as I could not find a way to boost write speeds with an SSD tier that automatically migrates data to colder HDD storage after a while (as offered by other solutions).

@jbarrigafk

We use LizardFS to store millions of small files, which our web service must keep available to our customers.
Our first LizardFS cluster ran uRaft over VPNs between data centers, as follows:

Datacenter A:
2x - mfsmasters (Master/Shadow)
4x - chunkservers (simple goal of 2 copies using labels: goal A B)
1x - mfsmetalogger

Datacenter B:
1x - mfsmaster (Shadow)
4x - chunkservers (simple goal of 2 copies using labels: goal A B)
1x - mfsmetalogger
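For readers unfamiliar with labeled goals: a "goal A B" like the one above is typically defined in the master's mfsgoals.cfg. A minimal sketch, assuming chunkservers carry labels A and B (the goal id and name here are my own invention):

```
# /etc/mfs/mfsgoals.cfg on the master (sketch)
# <goal id> <goal name> : <labels of required copies>
11 two_dc : A B
```

Each chunkserver would then declare its label (e.g. `LABEL = A`) in its own mfschunkserver.cfg, so one copy lands in each datacenter.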

Initially that worked well for the first months, but then we started having problems with the metadata servers. We had to remove uRaft and later remove LizardFS from our production environment, due to the high RAM consumption on the metadata servers (81GB of 94GB) and problems when starting the services (I have a pending thread here #824). We managed to store 207,000,000 small files, with a total storage of 19TB and 15TB used.

We gave LizardFS another chance, using only servers in Datacenter A and without uRaft. It currently has 77 million small files stored. Our cluster consists of:

2x - metadata servers (Master/Shadow)
6x - chunkservers (simple goal of 2 copies)
1x - metalogger.

We have been running this cluster for approximately 4 months now without any problems.
Maybe LizardFS is not the best way to store millions of small files, given the rapid growth in filesystem objects and the high RAM consumption on the metadata servers, but we will keep watching how it behaves over the coming months.

@nickcoons

We have several clusters all running in our datacenter:

The first is the oldest, and we're migrating everything off of it because of its age:

LFS 3.9.4
3 Physical servers:
1 running Metadata Master
1 running Metadata Shadow
1 running Metadata Logger
All three are Chunkservers

LFS stores the KVM VM images, which all three servers run with live-migration capability. It also stores all of the user data. The VMs are small (usually about 8GB) and run just the OS and any installed applications, mounting various LFS shares for user data storage. The VMs provide services for web, email, VoIP, and a few other hosted services.

The second cluster runs 3.13rc1, with a setup similar to the above, except with uRaft, and Proxmox instead of plain KVM.

Both of those clusters run entirely on SSDs.

The third cluster is for larger slow storage running 3.13rc2 and is using HDDs.

All servers are connected via 10GbE.

@HobbledGrubs

HobbledGrubs commented Feb 6, 2020

We have a cluster which has been running for 3 years now using LFS 3.12.0

The config is as follows:
1 Master on baremetal, 12 cores, 64G RAM
1 Shadow on baremetal, 12 cores, 64G RAM
1 Metadata on vm, 2vCores, 2GB RAM
1 CGI on vm, 2vCores, 1GB RAM
5 x Chunkserver on baremetal 8 cores, 64G RAM

Spread over the chunkservers we have 37 spinning drives, all 3-4TB enterprise grade.
All systems are connected via 10GbE.
All disk filesystems are individual ZFS pools.

In total we have 13TB used of 108TB, in 32,889,057 files and 32,801,350 chunks:

        Goal            Safe       Unsafe    Lost
        3               31327825      -       29
        ec_small        1461734       -       -

We have stopped adding data to the system because the performance is terrible, and we are currently waiting for LizardFS updates or looking for another solution. One of the projects on the system will be leaving soon, so we can remove millions of files; hopefully the system will then be usable enough that we can test performance enhancements without impacting clients.
The system has been mostly stable, although we have had random chunkserver crashes on all chunkservers in the last few days, which is odd.
Hope this helps and sorry for my first post which was rather short :)

@pbeza pbeza added the question label Feb 6, 2020
@przemekpjski przemekpjski pinned this issue Feb 6, 2020
@Piiinky

Piiinky commented Feb 7, 2020

@HobbledGrubs That is odd to me as well; the version you are currently using should be stable. Could you please share more details about your problem with me by mail: pawel.pinkosz@lizardfs.com?
We should find a solution.

@onlyjob
Member

onlyjob commented Feb 7, 2020

Let's keep as much information public as possible please. I'm also interested to know more about the problem and solution...

@HobbledGrubs

OK, here is a summary of what happened.

  1. A warning popped up on our monitoring regarding memory use on a chunkserver; it was using 53/64GB.
  2. I checked the other chunkservers and noticed that they were all very high (~50GB) regardless of uptime (77 days vs 450 days for some). I assumed this was due to the large number of chunks.
  3. I decided to reboot one of the 450-day chunkservers to pick up a new kernel.
  4. After it came up, one of the ZFS pools was not imported automatically.
  5. I imported the missing pool and reloaded the chunkserver in the middle of its disk scan.
  6. The final disk immediately began scanning. A minute or two later, two other chunkservers segfaulted.
  7. A third chunkserver and one of the first two lost connectivity, although the network itself checked out fine.
  8. The system eventually stabilized after another segfault on one of the chunkservers. There were 29 missing chunks and no defective files.
  9. The next morning a fourth, previously unaffected chunkserver segfaulted, along with some of the first ones. I hadn't touched the machines since the day before.
  10. The system had trouble stabilizing; as soon as it was briefly stable, there were 43 missing chunks and 41 defective files.
  11. A file repair reduced this to 10 defective files and chunks; it has been stable since then.
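For reference, a file repair like the one in step 11 is done with the standard client tools. A sketch, with a hypothetical path (verify the exact flags against your LizardFS version):

```shell
lizardfs fileinfo /mnt/lfs/path/to/file    # list chunk copies and their state
lizardfs filerepair /mnt/lfs/path/to/file  # repair the file using the best available copy
```

These commands need a live mount, so treat this only as a pointer to the relevant tooling.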

Here are some logs:

Feb  5 17:37:33 chunk01 mfschunkserver[51189]: connected to Master
Feb  5 17:40:53 chunk01 mfschunkserver[51189]: (forward) write error: Broken pipe
Feb  5 17:46:20 chunk01 kernel: [6680198.907607] mfschunkserver[51230]: segfault at 563440d47680 ip 00007fe71743412a sp 00007fe6fbfec848 error 6 in libc-2.24.so[7fe7173b0000+195000]
Feb  5 17:46:20 chunk01 mfschunkserver: can't find process to terminate
Feb  5 17:46:20 chunk01 mfschunkserver: set gid to 113
Feb  5 17:46:20 chunk01 mfschunkserver: set uid to 109
Feb  5 17:46:20 chunk01 mfschunkserver: changed working directory to: /var/lib/mfs
Feb  5 17:46:20 chunk01 mfschunkserver: lockfile /var/lib/mfs/.mfschunkserver.lock created and locked
Feb  5 17:46:20 chunk01 mfschunkserver: hdd configuration file /etc/mfs/mfshdd.cfg opened
Feb  5 17:46:20 chunk01 mfschunkserver: hdd space manager: path to scan: /lfs/hdd7/
Feb  5 17:46:20 chunk01 mfschunkserver: hdd space manager: path to scan: /lfs/hdd6/
Feb  5 17:46:20 chunk01 mfschunkserver: hdd space manager: path to scan: /lfs/hdd5/
Feb  5 17:46:20 chunk01 mfschunkserver: hdd space manager: path to scan: /lfs/hdd4/
Feb  5 17:46:20 chunk01 mfschunkserver: hdd space manager: path to scan: /lfs/hdd3/
Feb  5 17:46:20 chunk01 mfschunkserver: hdd space manager: path to scan: /lfs/hdd2/
Feb  5 17:46:20 chunk01 mfschunkserver: hdd space manager: path to scan: /lfs/hdd1/
Feb  5 17:46:20 chunk01 mfschunkserver: main server module: listen on *:9422
Feb  5 17:46:20 chunk01 mfschunkserver: connecting to Master
Feb  5 17:46:20 chunk01 mfschunkserver: loaded charts data file from /var/lib/mfs/csstats.mfs
Feb  5 17:46:20 chunk01 mfschunkserver: open files limit: 10000
Feb  5 17:46:20 chunk01 mfschunkserver: mfschunkserver daemon initialized properly
Feb  5 17:46:20 chunk01 mfschunkserver[52290]: connected to Master
Feb  5 17:46:21 chunk01 kernel: [6680199.763352] mfschunkserver[52328]: segfault at 55a6ba68c000 ip 00007f6a6c96bd7d sp 00007f6a55ee08b0 error 4 in libz.so.1.2.8[7f6a6c968000+19000]
Feb  5 17:38:05 chunk02 mfschunkserver: mfschunkserver daemon initialized properly
Feb  5 17:38:05 chunk02 mfschunkserver[13757]: connected to Master
Feb  5 17:46:14 chunk02 kernel: [ 1358.542696] mfschunkserver[13826]: segfault at 55b36609d804 ip 00007f15c3c70608 sp 00007f1577fec850 error 6 in libz.so.1.2.8[7f15c3c6d000+19000]
Feb  5 17:37:43 chunk03 mfschunkserver[849]: replication error: Status 'No such chunk' sent by chunkserver (server 10.100.8.21:9422)
Feb  5 17:37:43 chunk03 mfschunkserver[849]: Received invalid response for chunk get block
Feb  5 17:37:43 chunk03 mfschunkserver[849]: replication error: Status 'No such chunk' sent by chunkserver (server 10.100.8.21:9422)
Feb  5 17:37:45 chunk03 kernel: [37515760.505070] mfschunkserver[887]: segfault at 560d1f083000 ip 00007f12aa439d7d sp 00007f12977eb8b0 error 4 in libz.so.1.2.8[7f12aa436000+19000]

A list of all segfaults

Feb  5 17:37:32 chunk01 kernel: [6679671.442801] mfschunkserver[6394]: segfault at 55c2ba679520 ip 00007f80f5d7b12a sp 00007f80c2fea848 error 6 in libc-2.24.so[7f80f5cf7000+195000]
Feb  5 17:46:20 chunk01 kernel: [6680198.907607] mfschunkserver[51230]: segfault at 563440d47680 ip 00007fe71743412a sp 00007fe6fbfec848 error 6 in libc-2.24.so[7fe7173b0000+195000]
Feb  5 17:46:21 chunk01 kernel: [6680199.763352] mfschunkserver[52328]: segfault at 55a6ba68c000 ip 00007f6a6c96bd7d sp 00007f6a55ee08b0 error 4 in libz.so.1.2.8[7f6a6c968000+19000]
Feb  6 10:15:50 chunk01 kernel: [6739570.197787] mfschunkserver[52939]: segfault at 561d8339c7a0 ip 00007ff205e7e12a sp 00007ff1e2fea848 error 6 in libc-2.24.so[7ff205dfa000+195000]
Feb  5 17:38:04 chunk02 kernel: [  868.766931] mfschunkserver[9646]: segfault at 55638a891a30 ip 00007f93de61812a sp 00007f9396fea848 error 6 in libc-2.24.so[7f93de594000+195000]
Feb  5 17:46:14 chunk02 kernel: [ 1358.542696] mfschunkserver[13826]: segfault at 55b36609d804 ip 00007f15c3c70608 sp 00007f1577fec850 error 6 in libz.so.1.2.8[7f15c3c6d000+19000]
Feb  5 17:37:45 chunk03 kernel: [37515760.505070] mfschunkserver[887]: segfault at 560d1f083000 ip 00007f12aa439d7d sp 00007f12977eb8b0 error 4 in libz.so.1.2.8[7f12aa436000+19000]
Feb  6 09:19:16 chunk03 kernel: [52728.231109] mfschunkserver[9792]: segfault at 561eafbbb000 ip 00007f88b1191d7d sp 00007f8888fe68b0 error 4 in libz.so.1.2.8[7f88b118e000+19000]
Feb  6 10:15:24 chunk03 kernel: [56096.927880] mfschunkserver[59670]: segfault at 55e8494744bc ip 00007fda9273512d sp 00007fda4d7e7878 error 4 in libz.so.1.2.8[7fda92732000+19000]

chunk02 was the server rebooted at Feb 5 17:24:31

@zicklag
Author

zicklag commented Feb 7, 2020

I agree that it would be good to keep the discussion about the problem public, but we should probably create a new issue for it and then link to it from here so that we don't over-clutter this topic.

@wildtv

wildtv commented Feb 12, 2020

We use LizardFS (3.12) for content storage for television delivery.
Bare-metal boxes hold the drives; as we free up hardware from old RAID arrays, we try to integrate whatever hasn't failed, along with some new hardware.

7 servers: 2x5TB, 2x15TB, 2x22TB, 1x44TB, for 127TB total, with ~8TB free. Expected to grow.

A 3-server Proxmox cluster holds the master, a shadow and cgi-serv; this helps in the event of needing to do maintenance or snapshots.

A 10G backbone is planned, but we are still missing the cards.

We use Debian, XFS, and mostly Areca SATA controllers.

Our DAM now supports managing goals dynamically, so changes to the command-line client commands would mean changes for us. Nothing horrible, but knowing about the changes in advance would be helpful.

RMlint is used for deduplication, using extended file attributes for hash storage; our DAM can also use/create this same (md5) hash. Any changes that might break extended file attributes would be unfortunate.

Clients are Debian and OS X; that Windows client, or build instructions for it, would be "handy".

I had some issues with a 2-copy goal ending up with odd missing chunks when losing a drive, so now I try to keep 3 copies and/or parity if the content matters.

Other than that it's been pretty good, and MUCH easier to manage than Ceph.
Not having to rebuild a RAID array in the middle of a work week has been amazing, for everyone involved.

@19wolf
Contributor

19wolf commented Jun 2, 2020

I'm using it on a single node (my home server) with 5 chunkservers (one per HDD) so I can set different profiles for different files: general media gets xor3, my photos get 3 copies, personal files get 2 copies, etc. It works pretty well until I try to read too many files too quickly; then I run out of RAM (12GB total) and LizardFS crashes until I reboot my machine. Strange things happen when LizardFS crashes while mounted...
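Per-file-type profiles like these are set as per-directory goals from the client side. A minimal sketch, assuming hypothetical directory names under a single mount (verify the exact flags against your LizardFS version):

```shell
lizardfs setgoal -r xor3 /mnt/lizardfs/media     # XOR parity, cheaper than full copies
lizardfs setgoal -r 3    /mnt/lizardfs/photos    # 3 full copies
lizardfs setgoal -r 2    /mnt/lizardfs/personal  # 2 full copies
lizardfs getgoal -r /mnt/lizardfs/photos         # verify what was set
```

New files inherit the goal of their parent directory, so one `setgoal -r` per top-level directory is enough to maintain the policy.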

@bash99

bash99 commented Jun 12, 2020

We have three setups.
The first is on our UAT environment: 1 master and 1 shadow (using keepalived for HA, which is not great, although it passes a basic power-off test), 1 metalogger, and 5 chunkservers, on a mix of 3.12/3.11.3. We lost some files due to several power failures affecting all servers.

The second is in an IDC: 1 master, 1 shadow, and 10 chunkservers, 28TB in total, mostly big backup files, plus a few exports to web servers for user-upload storage.
No problems for 4 years now; one remaining task is to update it to 3.12.

All our LizardFS servers run on bare-metal boxes with directly attached drives, but LizardFS is not the only service on them; the boxes also serve as Proxmox nodes.

We also have about 100 million small files to store; for those we chose SeaweedFS, as it is designed for small files, with only a master and volume servers (we save the fid with our application data), and it is stable enough for our needs.

@antonborecki

Hi guys, if anyone would like to share their story with us, please email me: antonio@lizardfs.com. We are looking for any use case of LizardFS. Your input into the project would be much appreciated. E-mail me and I'll tell you in which way it will be appreciated :)

@Zorlin

Zorlin commented Jul 1, 2020

Sorry to say, I don't use LizardFS anymore.

Unfortunately the 3.13 RC1 stuff completely turned me off, especially when combined with the poor performance on my ODROID HC2 units... I went back to MooseFS...

I'd happily explore LizardFS again if there was any hope of it running as fast as MooseFS on my hardware.

@antonborecki

@Zorlin I am sure we will make you change your mind soon :) Thank you for your feedback though.

@creolis

creolis commented Aug 12, 2020

We use LizardFS as a massive cache-distribution layer.

  • ChunkNodes: 120 (15 nodes w/ 3x4TiB HDDs, 105 nodes w/ 1x512GiB SSD) split into 4 switched zones
  • One master
  • 15 MetaLoggers

Our students primarily use VMware Workstation and work on templates provided on a central NFS storage.
The I/O to our NFS storage would be overwhelming if 100 students started doing random reads on the templated VMs. We therefore migrated all our VM templates, read-only, into a LizardFS share and set the replication goal to 120.

This way, all our templates are distributed to all the clients that mount the share. With PREFER LOCAL CHUNKSERVER enabled, each client reads the templates from its own SSD if the chunks in question are available there; if not, it tries to get the chunk from a node on the same switch, and failing that, from ANY other node on the network running LizardFS.

This turned out to work extremely well: performance-, availability- and administration-wise.

@4Dolio

4Dolio commented Oct 31, 2020

Placeholder for a future explanation of a still-in-service 2.6.0-wip (+ Corosync/Pacemaker) deployment. Three LFS clusters provide iSCSI/NFS block stores servicing multiple Xen pools. Eight LFS clusters do mixed native and block stores. A mix of five ~100TB JBOD and five more smaller JBOD/RAID bare-metal chunkservers, plus various SSD/NVMe bare-metal and virtual chunkservers. Masters/shadows are a mix of metal and mostly virtual. Loggers at the chunkservers; daily/weekly snapshots; no dedup (yet), but I just read that someone (#858 (comment)) is doing it, and I was planning on the same tricks. The running metadata sets sum to about ~300GB; the largest is kept under 128GB RAM. Due mostly to network glitches, we often leave Corosync/Pacemaker inactive but still use it to transition and move a floating IP between masters. Standard goals, never used EC, no long-haul replication yet. One day we will get to a 10-100Gbps network.

At home: ODroid/RasPi/VirtualBox and (10+16TB USB)³; one chunkserver is even on a laptop via CIFS+VirtualBox. Mainly a block store of decades' worth of junk. It backs some Sia storage hosts and has been chewing on Ring footage as of late.

@MariuszLizardfs

Our latest configuration: two buildings with 5 servers each, 5x 1080TB per site. Running HA and topology; users download from the nearest location.
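Topology-aware reads like this are configured in the master's mfstopology.cfg, which maps IP ranges to location ids. A minimal sketch with hypothetical IP ranges:

```
# /etc/mfs/mfstopology.cfg on the master (sketch)
# <IP range>    <switch/rack id>
10.1.0.0/16  1   # building A
10.2.0.0/16  2   # building B
```

Clients and chunkservers sharing an id are treated as "near" each other, so reads prefer chunk copies in the same building.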

@tony-travis

tony-travis commented Apr 26, 2023

  1. Computer Angels, Italy:
    Active/Downgraded 6@16TB-node ec(2,2) LizardFS cluster to 2@32TB-node mirror to save electricity.

  2. Minke Informatics, Scotland (UK):
    Active/Downgraded 8@32TB-node ec(2,2) LizardFS cluster to 2@96TB-node mirror to save electricity.

  3. Assam Agricultural University, India:
    Active 4@48TB-node ec(2,2) LizardFS cluster

[Note: all LizardFS clusters are running Ubuntu-MATE 22.04 LTS, with Ubuntu-MATE 20.04 LTS lizardfs*.deb's]
