bsjerps/qdda

QDDA - The Quick and Dirty Deduplication Analyzer

Description

QDDA checks files, named pipes or block devices for duplicate blocks to estimate deduplication efficiency on dedupe-capable All-Flash Arrays. It also estimates compression ratios.

QDDA uses MD5 hashing and LZ4 or DEFLATE compression, and stores hash values, hash counts and compressed byte counts in an SQLite database.
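
The approach described above can be sketched in a few lines of Python. This is only an illustration, not QDDA's actual code: the function names, the `blocks` table and its columns are made up for the example, and `zlib` (DEFLATE in a zlib wrapper) stands in for the compressor because LZ4 is not in the Python standard library.

```python
import hashlib
import io
import sqlite3
import zlib


def scan(stream, blocksize=16 * 1024):
    """Return (total blocks, zero blocks, {md5: [count, compressed bytes]})."""
    zero = bytes(blocksize)
    total = zero_blocks = 0
    seen = {}
    while True:
        block = stream.read(blocksize)
        if not block:
            break
        total += 1
        block = block.ljust(blocksize, b"\0")  # pad a short final block
        if block == zero:
            zero_blocks += 1  # all-zero blocks count as free space
            continue
        digest = hashlib.md5(block).digest()
        if digest in seen:
            seen[digest][0] += 1  # duplicate block: only bump the counter
        else:
            seen[digest] = [1, len(zlib.compress(block))]
    return total, zero_blocks, seen


def store(db, seen):
    """Save one row per unique hash (hypothetical schema)."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS blocks "
        "(hash BLOB PRIMARY KEY, refs INTEGER, bytes INTEGER)"
    )
    db.executemany(
        "INSERT INTO blocks VALUES (?, ?, ?)",
        ((h, refs, size) for h, (refs, size) in seen.items()),
    )
    db.commit()


# Four 16 KiB blocks: two identical, one different, one all-zero.
bs = 16 * 1024
data = b"A" * bs + b"A" * bs + b"B" * bs + bytes(bs)
total, zero_blocks, seen = scan(io.BytesIO(data), bs)

db = sqlite3.connect(":memory:")
store(db, seen)
used, unique = db.execute("SELECT SUM(refs), COUNT(*) FROM blocks").fetchone()
print(f"total={total} zero={zero_blocks} used={used} unique={unique}")
```

Only one row per unique hash is kept, which is why the database stays small compared to the scanned data.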

Other features:

  • Does not scan just a sample - ALL data is scanned
  • Can be executed as non-root (for security reasons)
  • Non-Linux data (UNIX, Windows, VMware) can be scanned via IP (netcat/pipes)
  • Can scan live environments (such as prod database servers)
  • Blocksize adjustable between 1K and 128K
  • Built-in IO throttle to avoid overloading production systems (default 200MB/s)
  • Can merge (combine) results from different nodes (i.e. distributed storage environments)
  • Scales to datasets of multiple terabytes (tested 3+TB) although it may take a while
  • Can report compression and deduplication histograms
  • Scan speed (observed) up to 7GB/s (multi-threaded)
  • The SQLite database can be queried with standard SQLite tools
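
To illustrate why merging results from several nodes matters, here is a minimal Python sketch. It assumes each node's result can be reduced to a mapping of block hash to count; the hash names and the `merge` helper are hypothetical and do not reflect QDDA's internal format or merge logic.

```python
from collections import Counter


def merge(*node_counts):
    """Combine per-node {hash: count} mappings by adding the counts."""
    merged = Counter()
    for counts in node_counts:
        merged.update(counts)  # counts for a hash seen on several nodes add up
    return merged


node_a = {"h1": 2, "h2": 1}
node_b = {"h2": 3, "h3": 1}

merged = merge(node_a, node_b)
used_blocks = sum(merged.values())  # every non-zero block that was scanned
unique_blocks = len(merged)         # blocks left after deduplication
print(f"dedupe ratio = {used_blocks / unique_blocks:.2f}")
```

Block `h2` is present on both nodes, so the merged set holds fewer unique blocks than the two nodes counted separately. Scanning each node in isolation would miss that cross-node duplicate.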

Wiki page: qdda

Installation: qdda is built and packaged for EL6 (CentOS, RHEL, etc.) and is upward compatible, so it also runs on EL7-based systems. See the wiki page for download instructions.

Download

latest version

Installation

From ZIP file:

  • Download the zipfile and place the binary qdda in $PATH

From YUM repository (RPM/YUM based):

yum install https://yum.dirty-cache.com/dcrepo-release.rpm
yum install qdda

From SOURCE:

See INSTALL

Usage

Run qdda -h for command line options, qdda -m for more extensive help. Further documentation on the wiki page: https://wiki.dirty-cache.com/qdda

Man page

See MAN PAGE. You can also view the embedded man page with qdda -m if your system supports it. If qdda was installed from the RPM or with make install, man qdda works as well.

Example

The example shows QDDA run against four Oracle ASM devices used by an Oracle 12c database. The database on disk holds about 1GB of Swingbench benchmark data plus a 700M empty tablespace. The database is running during the scan.

Example output (Oracle ASM)

qdda 2.0.7 - The Quick & Dirty Dedupe Analyzer
Use for educational purposes only - actual array reduction results may vary

Database info (/home/jail/qdda.db):
database size       = 2.51 MiB
array id            = XtremIO X2
blocksize           = 16 KiB

Overview:
total               =     6144.00 MiB (    393216 blocks)
free (zero)         =     3596.94 MiB (    230204 blocks)
used                =     2547.06 MiB (    163012 blocks)
dedupe savings      =      435.56 MiB (     27876 blocks)
deduped             =     2111.50 MiB (    135136 blocks)
compressed          =      438.74 MiB (     79.22 %)
allocated           =      528.39 MiB (     33817 blocks)

Details:
used                =     2547.06 MiB (    163012 blocks)
compressed raw      =      442.53 MiB (     82.63 %)
unique data         =     1930.66 MiB (    123562 blocks)
non-unique data     =      616.41 MiB (     39450 blocks)

Summary:
percentage used     =       41.46 %
percentage free     =       58.54 %
deduplication ratio =        1.21
compression ratio   =        4.00
thin ratio          =        2.41
combined            =       11.63
raw capacity        =     6144.00 MiB
net capacity        =      528.39 MiB

raw capacity = total scanned disk space

net capacity = required space on an XtremIO X2
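
The Summary values can be recomputed from the Overview lines. The formulas below are inferred from the numbers in the example output, not taken from the QDDA source, so treat them as an assumption; they do match the reported values to two decimals.

```python
# Sizes in MiB, copied from the Overview section of the example output.
total, used, deduped, allocated = 6144.00, 2547.06, 2111.50, 528.39

ratios = {
    "percentage used": 100 * used / total,
    "percentage free": 100 * (total - used) / total,
    "deduplication ratio": used / deduped,      # before vs. after dedupe
    "compression ratio": deduped / allocated,   # before vs. after compression
    "thin ratio": total / used,                 # scanned vs. non-zero data
    "combined": total / allocated,              # product of the three ratios
}

for name, value in ratios.items():
    print(f"{name:<20}= {value:11.2f}")
```

The combined ratio is the product of the thin, deduplication and compression ratios, because the intermediate terms cancel and leave total / allocated.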

Future

  • More storage arrays
  • More special database queries
  • Multiple compression methods

Licensing

QDDA is licensed under GPLv3. See "COPYING" for more info.

Support

Please file bugs and issues on the GitHub issues page.