wodny/ncdu-export

Standalone ncdu export feature and some other tools

ncdu (NCurses Disk Usage) is a great utility with an ncurses interface for browsing directories and checking their disk usage (like the du command). It first walks a directory tree and then lets you browse the cached result.

Newer versions (≥1.9) of ncdu have a feature allowing you to make a JSON export file on a remote machine (-o option) and then browse directories locally (-f option).

Some old machines have an ncdu version without that option, or no ncdu at all. Building ncdu for each of them may be expensive, and static binaries may not be available for a specific platform. The ncdu-export tool is a workaround: it generates an export file compatible with ncdu and requires only Python 2.6 (or newer) on the remote machine.

Below, there is also a script based on the find command only (without using Python).

Note, however, that static binaries for x86, x86_64 and ARM are available directly from ncdu's homepage, so the tools in this repository are useful only if you cannot use those binaries.

Currently the scripts' output is not identical to ncdu's output but should work well enough.
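
To make the export format more concrete, here is a minimal Python 3 sketch of producing an ncdu-style dump. It is not the ncdu-export script itself. The layout (a top-level array holding two version numbers, a metadata map and the root directory, where every directory is an array whose first element describes the directory itself) follows my understanding of ncdu's export format. The names scan and export and the progname value are hypothetical, and st_blocks is assumed to be available (Unix-like systems).

```python
import json
import os
import stat
import time


def scan(path):
    """Return an ncdu-style node: a dict for a file, a list for a directory."""
    st = os.lstat(path)  # lstat: do not follow symlinks
    info = {
        "name": os.path.basename(path),
        "asize": st.st_size,          # apparent size in bytes
        "dsize": st.st_blocks * 512,  # disk usage; st_blocks counts 512-byte units
    }
    if not stat.S_ISDIR(st.st_mode):
        return info
    node = [info]  # a directory is a list: its own info first, then its children
    try:
        names = sorted(os.listdir(path))
    except OSError:
        info["read_error"] = True  # directory could not be listed
        return node
    for name in names:
        node.append(scan(os.path.join(path, name)))
    return node


def export(root):
    """Build the whole export structure for a root directory."""
    root = os.path.abspath(root)
    tree = scan(root)
    tree[0]["name"] = root  # the root entry carries the full path
    header = {
        "progname": "ncdu-export-sketch",  # hypothetical name
        "progver": "0",
        "timestamp": int(time.time()),
    }
    return [1, 0, header, tree]
```

Writing the result with json.dump(export("a-directory"), sys.stdout) should give a file that can be tried with ncdu -f. The sketch keeps the whole tree in memory and recurses once per directory level, so treat it as an illustration only.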

Example:

1. Copy the script to the remote host
$ scp ncdu-export remote-host:

2A. Pipe meta-data via ssh to a local file:
$ ssh remote-host ./ncdu-export -p / > files.json

2B. Collect meta-data on the remote host and then download it:
$ ssh remote-host
$ ./ncdu-export -p / > files.json
^D
$ scp remote-host:files.json .

3. Analyze the data
$ ncdu -f files.json

Other tools

The tools described below handle filenames containing unusual characters such as newlines. They accept - as the FILE argument, so you can use them in pipes.

Flatten/unflatten

Sometimes you need to automatically filter meta-data dumped with ncdu or ncdu-export. Those dumps can be quite big, hundreds of megabytes. You can process them with jq, but:

  • using jq in non-stream mode can consume a lot of RAM,
  • getting a directory name from this kind of dump can be quite complicated (I don't like my own example with walk),
  • I didn't find a way to process ncdu's output in jq's stream mode using methods like fromstream(1|truncate_stream(inputs)); I suppose it's because, contrary to most formats used in jq's usage examples, ncdu's format is not flat (it's an array of arrays of maps).

This set of tools can be used to flatten ncdu's output, make it easy to process using jq and then optionally unflatten it back again. These tools depend on the ijson Python library using the YAJL2 library underneath. Those libraries work on streams and parse JSON incrementally so it's possible to convert huge dumps without consuming all the RAM.
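
The idea of flattening and unflattening can be sketched in a few lines of Python 3. This is an in-memory illustration, not the streaming implementation in flatten.py and unflatten.py. The field names dirs, name and type come from the jq examples in this README. The value "dir" for directories and the use of None as the root's dirs are my assumptions.

```python
def flatten(node, dirs=None):
    """Yield one flat record per entry of a nested ncdu tree, depth first."""
    if isinstance(node, dict):  # a file or other non-directory entry
        yield dict(node, dirs=dirs, type="file")
        return
    info, children = node[0], node[1:]
    yield dict(info, dirs=dirs, type="dir")
    # Children live in this directory: extend the path by its name.
    inner = info["name"] if dirs is None else dirs + "/" + info["name"]
    for child in children:
        yield from flatten(child, inner)


def unflatten(records):
    """Rebuild the nested tree from depth-first flat records.

    Assumes every record's parent directory appears before it.
    """
    root = None
    seen_dirs = {}  # directory path -> its list node
    for rec in records:
        rec = dict(rec)  # copy, so the caller's records stay untouched
        dirs, kind = rec.pop("dirs"), rec.pop("type")
        node = [rec] if kind == "dir" else rec
        if dirs is None:
            root = node
        else:
            seen_dirs[dirs].append(node)
        if kind == "dir":
            path = rec["name"] if dirs is None else dirs + "/" + rec["name"]
            seen_dirs[path] = node
    return root
```

With tree being the fourth element of an export, unflatten(flatten(tree)) gives back the same tree. A filter that drops a directory record but keeps its contents would break this sketch with a KeyError. How the real unflatten.py handles such input is not covered here.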

The yajl2_cffi backend is chosen automatically (if available). It's faster than the pure Python backend. During experiments it reduced the conversion time by as much as 40%.

Example of filtering files modified before 2018-01-01:

$ ./ncdu-export -mp a-directory > files.json
$ ./flatten.py files.json > files-flat.json
$ export ts=$(date -d 2018-01-01 +%s)

Rebrowsing in ncdu:
$ jq -c 'select(.mtime < (env.ts | tonumber))' < files-flat.json > files-flat-before2018.json
$ ./unflatten.py files-flat-before2018.json > files-before2018.json
$ ncdu -f files-before2018.json

Putting files in an archive and removing them:
$ jq -j 'select(.mtime < (env.ts | tonumber) and .type == "file") | .dirs + "/" + .name + "\u0000"' < files-flat.json > files-flat-before2018.txt
$ tar cvzf archive.tgz --null -T files-flat-before2018.txt --remove-files
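
If jq is not at hand, the same selection can be approximated in Python 3. The sketch below assumes the flat dump contains one JSON object per line, which I have not verified against flatten.py, and uses the hypothetical function name old_file_paths. Unlike the jq filter, it skips records that have no mtime field.

```python
import json


def old_file_paths(lines, cutoff):
    """Yield paths of regular files whose mtime is before cutoff (epoch seconds)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("type") != "file" or "mtime" not in rec:
            continue  # directories and records without mtime are skipped
        if rec["mtime"] < cutoff:
            yield rec["dirs"] + "/" + rec["name"]


def nul_separated(paths):
    """Join paths into the NUL-terminated form that tar --null -T expects."""
    return "".join(path + "\0" for path in paths)
```

For example, nul_separated(old_file_paths(open("files-flat.json"), 1514764800)) builds the content of a file list for tar. The value 1514764800 is 2018-01-01 00:00:00 UTC, whereas date -d 2018-01-01 +%s uses the local time zone.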

Find export

There is also a script that produces a meta-data dump using only the find command on a remote host (without ncdu or Python). You then process the dump locally to regenerate the ncdu-compatible JSON format. It relies on find's -printf action (available in GNU find as shipped with most Linux distributions, not in the BusyBox one).

$ ./find.sh a-directory > find-export.txt
$ ./find2flat.py find-export.txt > find-flat-export.json
$ ./unflatten.py find-flat-export.json > find-export.json
$ ncdu -f find-export.json

or

$ ./find.sh ~/projects/ | ./find2flat.py - | ./unflatten.py - | ncdu -f -
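
To illustrate how a find-based dump can be turned into per-entry records, here is a Python 3 sketch. The dump format is purely hypothetical and is not necessarily what find.sh emits: it assumes the output of find a-directory -printf '%y\0%s\0%b\0%T@\0%p\0', that is, five NUL-terminated fields per entry (type, size in bytes, 512-byte blocks, mtime, path). NUL delimiters keep filenames with newlines intact. The function name parse_find_dump is also hypothetical.

```python
import os

# Map find's %y type letters to record types; anything else becomes "other".
KINDS = {"d": "dir", "f": "file"}


def parse_find_dump(data):
    """Yield one record per entry from a NUL-delimited find -printf dump (bytes)."""
    fields = data.split(b"\0")
    # Every entry has 5 fields; the final NUL leaves one empty trailing field.
    for i in range(0, len(fields) - 1, 5):
        kind, size, blocks, mtime, path = fields[i:i + 5]
        # Split the path into containing directory and entry name.
        dirs, _, name = os.fsdecode(path).rpartition("/")
        yield {
            "dirs": dirs,
            "name": name,
            "type": KINDS.get(kind.decode("ascii"), "other"),
            "asize": int(size.decode("ascii")),
            "dsize": int(blocks.decode("ascii")) * 512,
            "mtime": int(float(mtime.decode("ascii"))),  # %T@ has a fractional part
        }
```

The sketch does not special-case the starting directory, so mapping its record to whatever convention the flat format uses for the root is left out.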

Graph of tools

                     .------------.
     .---------------| filesystem |
     |               '------------'
     |                      |
     |                      | ncdu -o / ncdu-export
     |                      v
     |                  .------.         .---------.
     | find.sh          | ncdu | ncdu -f |  ncdu   |
     |                  | JSON |-------->| preview |
     |                  '------'         '---------'
     |                    |  ^
     |         flatten.py |  | unflatten.py
     v                    v  |
.--------.              .------.
|  find  | find2flat.py | flat |<---. jq filtering
| output |------------->| JSON |----'
'--------'              '------'
                            |
                            | jq
                            v
                      .-----------.        .---------.
                      |    tar    | tar -T |   tar   |
                      | file list |------->| archive |
                      '-----------'        '---------'