This is a general repo structure for "data endeavors" - tasks, initiatives, projects, or missions which revolve around data goals. This repo was originally developed by Microsoft as part of the Team Data Science Process ProjectTemplate and has been modified.
❗ Do not include any real-world or production data samples in this repo (including any output).

❗ Do not include plaintext secrets or passwords of any kind in this repo.
Team Data Science Process (TDSP) is an agile, iterative, data science methodology to improve collaboration and team learning. It is supported through a lifecycle definition, standard project structure, artifact templates, and tools for productive data science.
NOTE: In some projects, e.g. short-term proof of concept (PoC) or proof of value (PoV) engagements, it can be relatively time-consuming to create all the recommended documents and artifacts. In that case, at least the Charter and Exit Report should be created and delivered to the customer or client. Organizations may modify certain sections of the documents as necessary, but it is strongly recommended that the content of the documents be maintained, as they provide important information about the project and deliverables.
- Modify this README to reflect the specific endeavor involved
- Ensure .gitignore is set up to ignore any related files which may contain sensitive data or secrets
- Outline business goals and needs
- Explore related data
- Describe related data model and schema
- Create reproducible deployment code
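
The `.gitignore` item in the list above can be spot-checked with a small script. The following is a minimal sketch, not part of the original template: the function name `missing_ignore_patterns` and the entries in `REQUIRED_PATTERNS` are hypothetical and should be replaced with patterns for the files your endeavor actually produces. It only checks that each pattern appears as a literal line in `.gitignore`; it does not evaluate git's pattern-matching rules.

```python
from pathlib import Path

# Hypothetical patterns (assumption); adjust to the sensitive files in your endeavor.
REQUIRED_PATTERNS = [".env", "*.pem", "*.key", "data/"]


def missing_ignore_patterns(gitignore_text, required=REQUIRED_PATTERNS):
    """Return the required patterns that do not appear as a line in .gitignore.

    This is a literal line comparison, not an evaluation of git's matching rules.
    """
    present = {
        line.strip()
        for line in gitignore_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return [pattern for pattern in required if pattern not in present]


if __name__ == "__main__":
    path = Path(".gitignore")
    # Treat a missing .gitignore as empty, so every pattern is reported.
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    missing = missing_ignore_patterns(text)
    if missing:
        print("Missing from .gitignore:", ", ".join(missing))
    else:
        print("All required patterns are present.")
```

Run from the repo root, the script lists any required pattern that is absent. To test whether a specific file is actually ignored, `git check-ignore -v <path>` is the more reliable check.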