Utility for syncing training, validation, and evaluation data. #188

markmester · 2019-10-16T23:59:15Z

In most of the datasets I'm putting together, masks and tiles do not always match one-to-one. At the very least there should be a clarification that the trainer needs a directory where all files are in sync. Even better would be to provide a simple pre-processing script for syncing the masks/tiles, or an option in rs_trainer to ignore or remove un-synced masks/tiles.

Currently I just use a simple Python script to sync the directories. It deletes every file that has no counterpart in the other directory:

import argparse
import os


def dir_dict(root: str) -> dict:
    """Map each file's last three path components to its full path."""
    dd = {}

    for subdir, _dirs, files in os.walk(root):
        for file in files:
            path = os.path.join(subdir, file)
            # Key on the last three path components so the two trees can be
            # compared independently of their root directories.
            key = "/".join(os.path.normpath(path).split(os.sep)[-3:])
            dd[key] = path

    return dd


def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("dir1", type=str)
    parser.add_argument("dir2", type=str)
    args = parser.parse_args()

    removed = []

    dir1_dict = dir_dict(args.dir1)
    dir2_dict = dir_dict(args.dir2)

    # Collect files that exist in only one of the two directories.
    for k, v in dir1_dict.items():
        if k not in dir2_dict:
            removed.append(v)

    for k, v in dir2_dict.items():
        if k not in dir1_dict:
            removed.append(v)

    # Destructive step: the unmatched files are deleted from disk.
    for file in removed:
        os.remove(file)

    return len(removed)


if __name__ == "__main__":
    print(f"removed {main()} un-synced files")
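Since the script above deletes files right away, a dry-run variant that only reports the unmatched files may be a safer first step. The following is a rough sketch, not part of the tool: `tile_keys` and `unsynced` are hypothetical names, and it assumes tiles and masks share the same last three path components but may differ in file extension, so the extension is dropped from the key.

```python
import os


def tile_keys(root: str) -> dict:
    """Map the last three path components (extension dropped) to the full path."""
    keys = {}
    for subdir, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(subdir, name)
            parts = os.path.normpath(path).split(os.sep)[-3:]
            # Assumption: a tile and its mask may use different extensions.
            parts[-1] = os.path.splitext(parts[-1])[0]
            keys["/".join(parts)] = path
    return keys


def unsynced(dir1: str, dir2: str) -> list:
    """Return the sorted paths of files present in only one of the two trees."""
    a, b = tile_keys(dir1), tile_keys(dir2)
    only_a = [a[k] for k in a.keys() - b.keys()]
    only_b = [b[k] for k in b.keys() - a.keys()]
    return sorted(only_a + only_b)
```

Printing the result of `unsynced(dir1, dir2)` shows what would be removed; nothing is deleted until the list is passed to `os.remove` explicitly.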


daniel-j-h · 2019-10-24T19:06:02Z

See #93 and #93 (comment)

We should keep the user responsible for preparing the dataset and making sure it's in sync. What we could do in the context of #91 is to go through our assertions and make them easier to understand (and show ways to solve the problem) for our users.

rs train's pre-conditions are a dataset directory with pairs of images and labels.

I agree with you we could make it clear in the readme, though.

Would you be so kind as to open a pull request explaining this? Thanks!
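As an illustration of an easier-to-understand assertion, a pre-condition check could report how many files are unmatched and hint at the fix. This is only a sketch under assumptions: `check_paired` is a hypothetical helper, not an existing function of the tool, and it assumes images and labels mirror each other's relative paths up to the file extension.

```python
import os


def check_paired(images: str, labels: str) -> None:
    """Raise a descriptive error if the two trees do not pair up one-to-one."""

    def keys(root: str) -> set:
        found = set()
        for subdir, _dirs, files in os.walk(root):
            for name in files:
                rel = os.path.relpath(os.path.join(subdir, name), root)
                # Compare by relative path without the file extension.
                found.add(os.path.splitext(rel)[0])
        return found

    image_keys, label_keys = keys(images), keys(labels)
    missing_labels = image_keys - label_keys
    missing_images = label_keys - image_keys

    if missing_labels or missing_images:
        raise ValueError(
            f"dataset not in sync: {len(missing_labels)} images without labels, "
            f"{len(missing_images)} labels without images; "
            "remove or regenerate the unmatched files before training"
        )
```

Calling `check_paired(images_dir, labels_dir)` before training returns silently when every image has a label and vice versa, and otherwise fails with a message that names the problem.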
