DataPipes Testing Requirements
Kevin Tse edited this page Jan 19, 2022 · 3 revisions
Besides functional testing of every DataPipe, tests should include:
- Test if DataPipe resets correctly (#65067)
  - Order of the outputs should be consistent between iterations:

        dp = create_dp()
        list1 = list(dp)
        list2 = list(dp)
        assert list1 == list2  # same elements, in the same order
- Test if DataPipe iterators are independent (this should not be relied on in multiprocessing, but it is a good coding pattern)
- Test if DataPipe is serializable
- Test if DataPipe has `__len__` correctly implemented or throws an expected error
- Test if DataPipe is lazy (serialized size is 'reasonable')
- Test how DataPipe works in a deterministic context
- Test if DataPipe creates a properly linked DataPipe graph
- Test if DataPipe is picklable (using pickle or dill)
  - Test if it is still picklable after the DataPipe has been iterated through.
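
The reset and pickling checks from the list above could be sketched roughly as follows. This is a minimal sketch, not an existing PyTorch test utility: `check_reset` and `check_picklable` are hypothetical helper names, and `range(5)` is only a stand-in so the example runs without torch. A real test would pass a factory that returns the DataPipe under test.

```python
import pickle


def check_reset(create_dp):
    """Iterate the same instance twice; both passes should match in order."""
    dp = create_dp()
    list1 = list(dp)
    list2 = list(dp)
    assert list1 == list2, "output differs between iterations"
    return list1


def check_picklable(create_dp):
    """Round-trip through pickle before and after a full pass over the pipe."""
    dp = create_dp()
    expected = list(create_dp())
    # Pickle a fresh instance that has never been iterated.
    assert list(pickle.loads(pickle.dumps(dp))) == expected
    # Exhaust the pipe, then pickle the same instance again.
    list(dp)
    assert list(pickle.loads(pickle.dumps(dp))) == expected
    return expected


# range(5) is a stand-in (assumption) so the sketch runs without torch;
# a real test would return a DataPipe from the factory instead.
reset_output = check_reset(lambda: range(5))
pickle_output = check_picklable(lambda: range(5))
```

If the DataPipe holds lambdas or other objects that `pickle` cannot handle, the same helper could use `dill.dumps` and `dill.loads` instead.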
Ideally, there are examples and helper functions for each of these requirements.
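
As one possible shape for such helpers, the sketch below covers the `__len__` and laziness requirements. The names `check_len` and `check_lazy`, the `TypeError` default, and the 1024-byte threshold are all assumptions chosen for illustration; `range` and `map` objects stand in for DataPipes so the example runs without torch.

```python
import pickle


def check_len(dp, expected_len=None, expected_error=TypeError):
    """Either len(dp) matches the number of yielded items, or len() raises."""
    if expected_len is not None:
        assert len(dp) == expected_len
        assert len(list(dp)) == expected_len
        return True
    try:
        len(dp)
    except expected_error:
        return True
    raise AssertionError("len() was expected to raise but returned a value")


def check_lazy(dp, max_bytes=1024):
    """A lazy pipe should serialize its recipe rather than its data."""
    size = len(pickle.dumps(dp))
    assert size <= max_bytes, f"serialized size {size} exceeds {max_bytes} bytes"
    return size


# Stand-ins (assumptions): a sized, lazy iterable and an unsized one.
sized = range(100_000)
unsized = map(str, range(3))

len_ok = check_len(sized, expected_len=100_000)
no_len_ok = check_len(unsized)
lazy_size = check_lazy(sized)
```

What counts as a 'reasonable' serialized size depends on the DataPipe, so `max_bytes` would need to be chosen per test.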
TODO: Some of these requirements are only relevant for `IterDataPipe`; we also need to add requirements for `MapDataPipe`.