A FUSE wrapper around MongoDB gridfs using python and pyfuse3.

jmfernandez/py_gridfs_fuse

 
 


python gridfs fuse

Two FUSE wrappers around MongoDB GridFS, written in Python 3 on top of pyfuse3.

This work is based on https://github.com/axiros/py_gridfs_fuse and https://github.com/Liam-Deacon/py_gridfs_fuse developments.

There are two implementations:

  • The naive one, written by me (jmfernandez). It is fully compatible with existing GridFS collections, but some write scenarios are not supported. Directories are simulated through directory separators in the filenames, as cloud-backed filesystems such as s3fs or rclone do.

  • The classical one from axiros and Liam Deacon. It is a full filesystem, with subdirectories, but it is not compatible with existing GridFS collections.
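To illustrate how directories can be simulated on top of flat filenames, here is a minimal sketch. It is not the code of the naive implementation: the helper name list_directory and the choice of "/" as the separator are assumptions made for the example.

```python
def list_directory(filenames, directory=""):
    """Return (subdirs, files) found directly under `directory`.

    `filenames` are flat names such as "logs/2024/jan.log";
    "/" is assumed to be the directory separator.
    """
    prefix = directory.strip("/")
    if prefix:
        prefix += "/"
    subdirs, files = set(), set()
    for name in filenames:
        name = name.lstrip("/")
        if not name.startswith(prefix):
            continue  # entry lives outside the requested directory
        rest = name[len(prefix):]
        if not rest:
            continue
        head, sep, _tail = rest.partition("/")
        if sep:
            subdirs.add(head)  # more path components follow: a simulated folder
        else:
            files.add(head)    # last component: a regular file
    return sorted(subdirs), sorted(files)


names = [
    "readme.txt",
    "logs/current.log",
    "logs/2024/jan.log",
    "logs/2024/feb.log",
]
root_listing = list_directory(names)
logs_listing = list_directory(names, "logs")
print(root_listing)
print(logs_listing)
```

In this model a folder exists only while at least one filename carries its prefix, which is also why permissions and ownership of such folders have nowhere to be stored.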

Usage (naive)

naive_gridfs_fuse --mongodb-uri="mongodb://127.0.0.1:27017" --database="gridfs_fuse" --mount-point="/mnt/gridfs_fuse" # --options=allow_other
naive_gridfs_fuse --mongodb-uri="mongodb://127.0.0.1:27017/" -c specialcoll --database="gridfs_fuse" --mount-point="/mnt/gridfs_fuse" --show-versions # --options=allow_other

fstab example

mongodb://127.0.0.1:27017/gridfs_fuse.fs  /mnt/gridfs_fuse  gridfs_naive  defaults,allow_other  0  0 

Note that this assumes the mount.gridfs_naive program (or mount_gridfs_naive on MacOS X) is symlinked into /sbin/, e.g. sudo ln -s $(which mount.gridfs_naive) /sbin/
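The device field of the fstab entry appears to pack the server URI, the database and the collection into one string. The sketch below shows one way such a string could be split. Reading the path as database.collection is an assumption drawn from the example entry, and split_fstab_device is a hypothetical helper, not a function of this project.

```python
from urllib.parse import urlsplit, urlunsplit


def split_fstab_device(device):
    """Split 'mongodb://host:port/database.collection' into its parts.

    Assumption: the URI path is '<database>.<collection>'. When no
    collection is given, 'fs' (the GridFS default prefix) is used.
    """
    parts = urlsplit(device)
    namespace = parts.path.lstrip("/")
    database, _, collection = namespace.partition(".")
    server_uri = urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
    return server_uri, database, collection or "fs"


device = "mongodb://127.0.0.1:27017/gridfs_fuse.fs"
server_uri, database, collection = split_fstab_device(device)
print(server_uri, database, collection)
```

Under that assumption, the three values correspond to the --mongodb-uri, --database and -c options shown in the command line examples.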

Usage (classical)

gridfs_fuse --mongodb-uri="mongodb://127.0.0.1:27017" --database="gridfs_fuse" --mount-point="/mnt/gridfs_fuse" # --options=allow_other

fstab example

mongodb://127.0.0.1:27017/gridfs_fuse.fs  /mnt/gridfs_fuse  gridfs  defaults,allow_other  0  0 

Note that this assumes the mount.gridfs program (or mount_gridfs on MacOS X) is symlinked into /sbin/, e.g. sudo ln -s $(which mount.gridfs) /sbin/

Requirements

  • pymongo
  • pyfuse3

Install

Ubuntu 16.04:

sudo apt-get install libfuse python3-pip
sudo -H pip3 install git+https://github.com/jmfernandez/py_gridfs_fuse.git@v0.4.0

MacOSX:

brew install osxfuse
sudo -H pip3 install git+https://github.com/jmfernandez/py_gridfs_fuse.git@v0.4.0

Operations supported (naive)

  • create/list/delete directories => folder support (albeit permissions and ownership are not persisted).
  • Show all file versions (through mount flag).
  • read files (any of their versions).
  • delete files (all their versions at once).
  • open and write once (like HDFS).
  • rename
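The version semantics listed above (every write adds a version, any version can be read, a delete removes all of them) can be modelled with a small in-memory class. This is a toy stand-in written for illustration, not the project's code and not the GridFS API; the class and method names are hypothetical. In PyMongo the rough counterparts are GridFS.put, GridFS.get_version (where version -1 is the newest) and GridFS.delete, which removes a single stored file by its id.

```python
class VersionedStore:
    """Toy in-memory stand-in for a GridFS bucket (illustration only)."""

    def __init__(self):
        self._versions = {}  # filename -> list of bytes, oldest first

    def write_once(self, filename, data):
        """Store `data` as a new version; earlier versions are kept."""
        self._versions.setdefault(filename, []).append(bytes(data))
        return len(self._versions[filename]) - 1  # index of the new version

    def read(self, filename, version=-1):
        """Return one version; -1 means the most recent one."""
        return self._versions[filename][version]

    def version_count(self, filename):
        return len(self._versions.get(filename, []))

    def delete(self, filename):
        """Remove every version of `filename` at once."""
        self._versions.pop(filename, None)


store = VersionedStore()
store.write_once("report.txt", b"first draft")
store.write_once("report.txt", b"second draft")
latest = store.read("report.txt")        # newest version
oldest = store.read("report.txt", 0)     # first version ever written
count_before = store.version_count("report.txt")
store.delete("report.txt")
count_after = store.version_count("report.txt")
```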

Operations partially supported (naive)

  • modify an existing file (only when it is opened as O_WRONLY; this creates a new version of the file in GridFS).
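For reference, whether a set of open flags requests write-only access can be tested by masking the access mode bits. This is a generic POSIX-style check, shown as a sketch. How the naive implementation actually inspects the flags is not described here, and is_write_only is a hypothetical helper.

```python
import os


def is_write_only(flags):
    """True when the access mode bits of `flags` are exactly O_WRONLY."""
    return (flags & os.O_ACCMODE) == os.O_WRONLY


# Extra flags such as O_CREAT or O_TRUNC do not change the access mode.
overwrite_flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
update_flags = os.O_RDWR
print(is_write_only(overwrite_flags), is_write_only(update_flags))
```

Under this check, a program that opens an existing file read-write (O_RDWR) to patch it in place would fall outside the supported case, whereas one that rewrites the whole file through a write-only descriptor would not.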

Operations supported (classical)

  • create/list/delete directories => folder support.
  • read files.
  • delete files.
  • open and write once (like HDFS).
  • rename

Operations not supported (naive)

  • resize an existing file.
  • hardlink
  • symlink
  • statfs

Operations not supported (classical)

  • modify an existing file.
  • resize an existing file.
  • hardlink
  • symlink
  • statfs

Performance (classical)

Setup

  • AWS d2.xlarge machine.
    • 4 vCPUs @ 2.40 GHz (E5-2676)
    • 30 GB RAM
  • filesystem: ext4
  • block device: three instance storage disks combined with lvm.
lvcreate -L 3T -n mongo -i 3 -I 4096 ax /dev/xvdb /dev/xvdc /dev/xvdd
  • mongodb 3.0.1
  • mongodb storage engine WiredTiger
  • mongodb compression: snappy
  • mongodb cache size: 10 GB

Results

  • sequential write performance: ~46 MB/s
  • sequential read performance: ~90 MB/s

Write performance was tested by copying 124 files, each 9 gigabytes in size and with different content. The compression factor was about three. Files were copied one by one => no parallel execution.

Read performance was tested by randomly picking 10 files out of the 124. Files were read one by one => no parallel execution.
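To put these figures in perspective, the snippet below estimates how long each test runs at the reported throughput. It assumes decimal units (1 GB = 1000 MB) and a constant rate, neither of which is stated in the results, so the numbers are only indicative: roughly 6.7 hours for the write test and about 17 minutes for the read test.

```python
FILE_SIZE_GB = 9
WRITE_FILES = 124
READ_FILES = 10
WRITE_MB_S = 46   # reported sequential write throughput
READ_MB_S = 90    # reported sequential read throughput
MB_PER_GB = 1000  # assumption: decimal units


def transfer_hours(n_files, size_gb, rate_mb_s):
    """Hours needed to move n_files of size_gb each at a constant rate."""
    total_mb = n_files * size_gb * MB_PER_GB
    return total_mb / rate_mb_s / 3600


write_hours = transfer_hours(WRITE_FILES, FILE_SIZE_GB, WRITE_MB_S)
read_minutes = transfer_hours(READ_FILES, FILE_SIZE_GB, READ_MB_S) * 60
print(f"write test: {write_hours:.1f} h, read test: {read_minutes:.1f} min")
```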

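# Simple illustration of the commands used (not the full script).

# Write: pv streams the source file to stdout, which is redirected into the mount
pv -pr /tmp/big_file${file_number} > /mnt/gridfs_fuse/big_file${file_number}

# Read: stream the file back from the mount and discard it
pv -pr /mnt/gridfs_fuse/big_file${file_number} > /dev/null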
