Material Library #6

Open
tonyshumlh opened this issue Apr 30, 2024 · 5 comments
Assignees
Labels
admin meeting related

Comments

@tonyshumlh
Collaborator

tonyshumlh commented Apr 30, 2024

This issue serves as a store for all related and useful material for creating the Checklist and Prompt.
Please add a short summary of each item to save other readers the effort.

@tonyshumlh
Collaborator Author

Microsoft Industry Solutions Engineering Team 2024 https://microsoft.github.io/code-with-engineering-playbook/machine-learning/

  • ml-fundamentals-checklist: A complete checklist. The Data Quality and Governance part could be useful.
    ml-fundamentals-checklist.md
  • ml-testing: Provides ideas and examples of what code should be tested, mainly around data.
    ml-testing.md
  • ml-model-checklist: Checklist for ML models in production. Not necessarily useful.
    ml-model-checklist.md

@tonyshumlh
Collaborator Author

tonyshumlh commented Apr 30, 2024

Jeremy Jordan - Effective testing for machine learning systems https://www.jeremyjordan.me/testing-ml/
Group-7

  • Proposes a workflow for including tests (mainly ML pipeline tests) in ML development
  • Introduces the ideas of pre-train tests and post-train tests:
    • Pre-train tests run before the model is trained. They aim to catch bugs early and save time by avoiding wasted training jobs.
    • Post-train tests use the trained model artifact to inspect its behavior in scenarios defined by the tester. They aim to reveal the logic learned during training and to provide a behavioral report of model performance.
      • Invariance tests: apply deliberate changes to the input that should not affect the model's output, and check that the output stays the same.
      • Directional expectation tests: apply deliberate changes to the input that should have a predictable effect on the model's output, and check that the output moves in the expected direction.
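The pre-train and post-train ideas above can be sketched roughly as follows. This is a minimal illustration, not code from the article: `predict_sentiment` is a hypothetical word-counting stand-in for a trained model, and the example sentences and split contents are made up. A real post-train test would load the trained artifact instead.

```python
# Hypothetical stand-in for a trained sentiment model (assumption for illustration).
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "terrible", "hate"}


def predict_sentiment(text: str) -> int:
    """Return positive word count minus negative word count."""
    words = text.lower().replace(".", "").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def test_no_leakage_between_splits():
    # Pre-train test: runs before training and needs no model.
    # Checks that no example appears in both the train and validation splits.
    train_texts = {"The food was good.", "I hate waiting."}
    val_texts = {"The service was terrible."}
    assert train_texts.isdisjoint(val_texts)


def test_invariance_to_name_swap():
    # Post-train invariance test: swapping a person's name
    # should leave the prediction unchanged.
    base = predict_sentiment("Mark was great in the movie.")
    perturbed = predict_sentiment("Cindy was great in the movie.")
    assert base == perturbed


def test_directional_expectation_negative_phrase():
    # Post-train directional expectation test: appending a negative
    # sentence should lower the sentiment score.
    base = predict_sentiment("The food was good.")
    perturbed = predict_sentiment("The food was good. The service was terrible.")
    assert perturbed < base
```

The functions follow the `test_*` naming convention, so a runner such as pytest would collect them if they lived in a test module.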

@tonyshumlh
Collaborator Author

tonyshumlh commented Apr 30, 2024

Studying the Practices of Testing Machine Learning Software in the Wild https://arxiv.org/pdf/2312.12604

  • Research on the testing practices of 10 machine learning benchmark projects. Test examples are included in the paper.
  1. Testing Strategies: Four major categories were identified: Grey-box, White-box, Black-box, and Heuristic-based techniques. Grey-box and White-box techniques were the most commonly used.
  2. ML Properties Tested: 16 ML properties were identified, with functional correctness, consistency, robustness, data validity, and efficiency being the most frequently tested.
  3. Testing Methods: Thirteen different testing methods were identified, with only seven previously included in the Test Pyramid of ML.

JohnShiuMK added the admin meeting related label May 6, 2024
@tonyshumlh
Collaborator Author

tonyshumlh commented May 6, 2024

Retrieval-Augmented Generation

  • A technique for improving the accuracy and reliability of generative AI models by supplying facts fetched from external sources, e.g. a developer-defined database.
  • In our case, the external source could be our ML test checklist and other background information.
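As a rough sketch of the retrieval step described above: the snippet below ranks a few hypothetical checklist snippets by word overlap with a question and places the best match into a prompt. The documents, function names, and prompt wording are all assumptions for illustration. A real setup would more likely use embedding-based search and then pass the prompt to an LLM, which is omitted here.

```python
import re

# Hypothetical checklist snippets standing in for the external knowledge source.
DOCUMENTS = [
    "Data quality: validate schema, ranges, and missing values before training.",
    "Invariance tests: perturb inputs and check that predictions stay the same.",
    "Model evaluation: compare metrics against a baseline on a held-out set.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, documents, k=1):
    """Rank documents by word overlap with the query and return the top k."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Prepend the retrieved context to the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Use the following context to answer.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("Does the code check for missing values in the data?", DOCUMENTS))
```

Word overlap is used only to keep the example dependency-free. The point is the shape of the flow: retrieve relevant text first, then generate with that text in the prompt.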


@JohnShiuMK
Collaborator

https://arxiv.org/pdf/2310.01402

Evaluating the Decency and Consistency of Data Validation Tests Generated by LLMs: An application to Canadian political donations data

By a professor from the University of Toronto

4 participants