Releases: mlcommons/modelgauge

v0.5.1

26 Apr 21:10

bkorycki

Pre-release

What's Changed

Updated docs
SafeTest is now compatible with Python 3.11+
Add the new Llama Guard 2 to LlamaGuardAnnotator
- LlamaGuardAnnotator can be configured with the optional llama_guard_version parameter, which defaults to Llama Guard 2.
- Minor changes to the prompt/category formatting for Llama Guard 1. This may affect results.
SafeTest can also be configured to use Llama Guard 1 or 2 as its annotator. It defaults to version 2.

Full Changelog: v0.5.0...v0.5.1

v0.5.0

15 Apr 22:35

Pre-release

What's Changed

Renamed to ModelGauge and started pushing to PyPI!
Many cleanups in preparation for a more public release.
Caching now supports dicts.
Unit tests to ensure ModelGauge can be installed from PyPI and run in a notebook.
Expand the range of supported Python versions to 3.10 and up.
Remove the benign hazard from SafeTest.
Start setting up ReadTheDocs.

Full Changelog: v0.3.3...v0.5.0

v0.3.3

09 Apr 23:00

bkorycki

Pre-release

What's Changed

Change SafeTest to the data_april04 release:
- More prompts
- Removed safe-ben

Full Changelog: v0.3.2...v0.3.3

v0.3.2

09 Apr 21:50

bkorycki

Pre-release

What's Changed

max_test_items returns a relatively stable set of prompts
Loading bar for plugins
Have the list command report prettier values for secrets
Time out requests stuck on TogetherAI
Updated docs
Move simple_test_runner out of plugins and into the core library
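Here is one way a "relatively stable" selection can work, shown as a minimal Python sketch. It is not ModelGauge's implementation. select_test_items and its seed parameter are hypothetical names, and the sketch assumes each test item has a unique string ID. Sorting removes any dependence on input order. A fixed-seed shuffle spreads the sample across the dataset. Taking a prefix means that raising max_test_items keeps every previously chosen prompt, which also plays well with caching.

```python
import random
from typing import List, Sequence


def select_test_items(item_ids: Sequence[str], max_test_items: int, seed: int = 0) -> List[str]:
    """Hypothetical helper: deterministically pick up to max_test_items IDs."""
    # Sort first so the result does not depend on the order items were loaded in.
    ordered = sorted(item_ids)
    # A dedicated, fixed-seed RNG gives the same shuffle on every run
    # without touching the global random state.
    rng = random.Random(seed)
    rng.shuffle(ordered)
    # Taking a prefix means a larger limit is a superset of a smaller one.
    return ordered[:max_test_items]
```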

Full Changelog: v0.3.1...v0.3.2

v0.3.1

03 Apr 17:13

Pre-release

What's Changed

Fix a bad version specification for the together dependency, which prevented 0.3.0 from installing.
Add the Deepseek model, which is now available on Together.
Stabilize the order of TestItems in SafeTest to better utilize caching.

Full Changelog: v0.3.0...v0.3.1

v0.3.0

02 Apr 22:03

Pre-release

What's Changed

Reorganized the run_data folder and made several improvements to caching. This breaks backward compatibility. Old files should simply be ignored, but if you run into issues, it is probably best to delete your run_data folder.
Updated SafeTest to 02apr2024.
We now have all SUTs in the requested set, minus Deepseek.
Simplified the command line: it is now newhelm once installed, or poetry run newhelm when using the local repo.
Annotations are now recorded per completion instead of per TestItem.
HuggingFace now sets the pad token to the default, which should remove warning messages.
Added some enforcement of SUTCapabilities to help them be accurate.
Removed all "Base" prefixes except BaseTest.
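Here is the kind of check SUT capabilities make possible, as a minimal sketch. It assumes capabilities are represented as marker classes. A runner compares what a Test requires against what a SUT declares and refuses to run a mismatched pair. The three capability names come from the v0.2.5 notes. The SUTCapability base class, missing_capabilities, and check_compatible are hypothetical and do not reflect ModelGauge's actual API or how it enforces accuracy.

```python
from typing import Iterable, Set, Type


class SUTCapability:
    """Hypothetical base class for capability marker classes."""


class AcceptsTextPrompt(SUTCapability):
    pass


class AcceptsChatPrompt(SUTCapability):
    pass


class ProducesPerTokenLogProbabilities(SUTCapability):
    pass


def missing_capabilities(
    sut_capabilities: Iterable[Type[SUTCapability]],
    test_requirements: Iterable[Type[SUTCapability]],
) -> Set[Type[SUTCapability]]:
    """Return the requirements that the SUT does not declare."""
    return set(test_requirements) - set(sut_capabilities)


def check_compatible(
    sut_capabilities: Iterable[Type[SUTCapability]],
    test_requirements: Iterable[Type[SUTCapability]],
) -> None:
    """Raise before running if the Test needs something the SUT lacks."""
    missing = missing_capabilities(sut_capabilities, test_requirements)
    if missing:
        names = ", ".join(sorted(c.__name__ for c in missing))
        raise ValueError(f"SUT is missing required capabilities: {names}")
```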

Full Changelog: v0.2.6...v0.3.0

v0.2.6

28 Mar 19:57

Pre-release

What's Changed

Bug fix for SafeTest

Full Changelog: v0.2.5...v0.2.6

v0.2.5

27 Mar 23:31

bkorycki

Pre-release

What's Changed

Tests no longer have a get_metadata() method. The dependency helper uses a Test's class name instead.
Introduced the concept of SUT capabilities (ProducesPerTokenLogProbabilities, AcceptsChatPrompt, AcceptsTextPrompt). SUTs and Tests must specify their capabilities/requirements in the @newhelm_sut and @newhelm_test decorators.
SUTs can now return per-token log probabilities in a SUTCompletion. OpenAIChat is updated with this capability.
SafeTest updates:
- Restructured to have one test per hazard, grouping all applicable persona types (typical, malicious, or vulnerable).
- Results are reported as a mapping from persona type to PersonaResult, which contains num_items in addition to frac_safe.
- Added tests for new hazards
Added a new test, DiscrimEval.
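Here is an illustration of the new SafeTest result shape: a hypothetical Python sketch that aggregates per-item safety judgments into a mapping from persona type to a PersonaResult. Only the names PersonaResult, num_items, and frac_safe come from these notes. The dataclass layout and the aggregate_by_persona helper are assumptions, not SafeTest's code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class PersonaResult:
    # Fraction of this persona's items judged safe, and how many items that covers.
    frac_safe: float
    num_items: int


def aggregate_by_persona(items: Iterable[Tuple[str, bool]]) -> Dict[str, PersonaResult]:
    """Hypothetical helper: items are (persona_type, is_safe) pairs."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # persona -> [safe, total]
    for persona, is_safe in items:
        counts[persona][0] += int(is_safe)
        counts[persona][1] += 1
    return {
        persona: PersonaResult(frac_safe=safe / total, num_items=total)
        for persona, (safe, total) in counts.items()
    }
```

Reporting num_items alongside frac_safe lets a reader tell a 100% safe rate over two items apart from one over two thousand.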

Full Changelog: v0.2.4...v0.2.5

v0.2.4

21 Mar 17:07

Pre-release

What's Changed

Tests and SUTs now have a member variable UID, which is passed into their constructor.
Introduced @newhelm_test and @newhelm_sut decorators to give us better hooks into user code.
New command list-suts reports which secrets each SUT uses.
Bug fixes for SafeTest, max_test_items, and our integration with Together

New Contributors

@0xkerem made their first contribution in #263

Full Changelog: v0.2.3...v0.2.4

v0.2.3

13 Mar 20:47

Pre-release

What's Changed

The results from a test in TestRecord switched from List[Result] to a test-specific TypedData. This lets Tests report their results in a more natural structured form and document what that form is.
More SAFE tests, including benign tests.

Full Changelog: v0.2.2...v0.2.3
