A metadata indexing framework for large-scale storage systems, such as the HPSS archival storage system.

sandrain/hpssix

HPSSIX: HPSS Metadata Indexing

HPSSIX incrementally indexes the metadata of files in a running HPSS system and lets users quickly locate files of interest by their metadata, including POSIX file attributes (stat metadata) and POSIX extended attributes. HPSSIX also supports full-text search for supported file types, e.g., Microsoft Office documents.
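
For readers unfamiliar with the term, the "stat metadata" mentioned above is the set of per-file attributes reported by the standard `stat` tool. The sketch below only illustrates those attributes on a temporary file; it does not show how HPSSIX collects them. It assumes GNU coreutils `stat` (for the `-c` option), and `show_stat_metadata` is a hypothetical helper name, not part of HPSSIX.

```shell
#!/bin/sh
# Sketch: print a few POSIX stat attributes of a file.
# Assumes GNU coreutils stat; show_stat_metadata is a hypothetical helper.
show_stat_metadata() {
    # %n path, %u owner uid, %g owner gid, %s size in bytes,
    # %a permissions in octal, %Y modification time (seconds since epoch)
    stat -c 'path=%n uid=%u gid=%g size=%s mode=%a mtime=%Y' "$1"
}

# Try it on a throwaway file so the example is self-contained.
tmpfile="$(mktemp)"
printf 'hello' > "$tmpfile"
meta="$(show_stat_metadata "$tmpfile")"
echo "$meta"
rm -f "$tmpfile"
```

Extended attributes are separate name/value pairs attached to a file; the demo session in this README inspects them with `getfattr -d`.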

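The demo session below walks through a typical workflow: copying files into HPSS, tagging them with extended attributes, running the indexer, and then searching by stat attributes, tags, and keywords.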

## copy files to HPSS
$ mkdir /var/hpss/mnt/home/hs2/demo
$ cp -r documents /var/hpss/mnt/home/hs2/demo

## set some extended attributes
$ for f in /var/hpss/mnt/home/hs2/demo/documents/mtg.notes/*; do hpssix-tag set -n projid -v 100 "$f"; done

## show extended attributes
$ for f in /var/hpss/mnt/home/hs2/demo/documents/mtg.notes/*; do getfattr -d "$f"; done

## index -- this takes time (around 90 sec); it also runs automatically every day
$ hpssixd.sh action

## search -- stat, name, path
$ hpssix-search --uid=`id -u` --count-only (--file-only) (--directory-only) (--verbose) (--limit)
$ hpssix-search --uid=`id -u` --name=pptx
$ hpssix-search --uid=`id -u` --path=papers (--file-only) (--directory-only)
$ hpssix-search --uid=`id -u` --name=pptx --date='>20190227-15:00:00'

## tagging
$ hpssix-search --tag="projid:" (--tag="projid:100", --tag="projid:0,200", --tag="projid:>=90.56")
$ hpssix-search --tag="projid:<=100"

## keyword
$ hpssix-search --keyword="parallel file system performance titan" --name='pptx'
$ hpssix-search --keyword='checkpoint burst buffers' --name=pptx

## document display
$ hpssix-meta --meta /var/hpss/mnt/home/hs2/tests/documents/projects/project-poster.pptx
$ hpssix-meta --text /var/hpss/mnt/home/hs2/tests/documents/projects/project-poster.pptx
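
The `--date` filter in the demo above takes a comparison operator followed by a timestamp that appears to be in `YYYYMMDD-HH:MM:SS` form. That format is inferred from the single example shown, not from a documented specification. As a minimal sketch under that assumption, the helper below builds such a string from a more readable timestamp. It assumes GNU `date` (for `-d`), and `make_date_filter` is a hypothetical name that is not part of HPSSIX.

```shell
#!/bin/sh
# Sketch: build a --date filter value such as '>20190227-15:00:00'.
# Assumes GNU date; make_date_filter is a hypothetical helper, and the
# timestamp layout is inferred from the demo, not from HPSSIX documentation.
make_date_filter() {
    op="$1"      # comparison operator; the demo only shows '>'
    when="$2"    # any timestamp string GNU date can parse
    printf '%s%s\n' "$op" "$(date -d "$when" '+%Y%m%d-%H:%M:%S')"
}

filter="$(make_date_filter '>' '2019-02-27 15:00:00')"
echo "$filter"
```

The resulting value could then be passed as `--date="$filter"`, quoted so the shell does not treat `>` as a redirection.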

The architecture is summarized in the following slides.

This work was presented at the following events.

  • Hyogi Sim, "Extracting Metadata from the ORNL HPSS Archive to Improve its Usability," Knowledge Is Power: Unleashing the Potential of Your Archives through Metadata, BoF in ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), Denver, CO, November 2019
  • Hyogi Sim, "Making a Peta-Scale Archival Storage System Searchable," High Performance Storage Systems User Forum 2019 (HUF 2019), Indiana University, Bloomington, IN, October 2019
