kitfile overview #70

Merged
merged 3 commits into from Mar 7, 2024
1 change: 0 additions & 1 deletion docs/.vitepress/config.mts
@@ -67,7 +67,6 @@ export default defineConfig({
items: [
{ text: 'Overview', link: '/docs/kitfile/kf-overview' },
{ text: 'Format', link: '/docs/kitfile/format.md' },
{ text: 'Benefits', link: '/docs/kitfile/benefits' },
]
},
{
77 changes: 0 additions & 77 deletions docs/src/docs/cli/usage.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/src/docs/kitfile/benefits.md

This file was deleted.

24 changes: 22 additions & 2 deletions docs/src/docs/kitfile/kf-overview.md
@@ -1,3 +1,23 @@
# Kitfiles
# Kitfile: Your AI/ML Project Blueprint

A Kitfile is a manifest showing what is in
## What is a Kitfile?

At the core of every AI/ML project managed by KitOps lies the Kitfile, a YAML-based manifest designed to streamline the encapsulation and sharing of project artifacts. From code and datasets to models and their metadata, the Kitfile serves as a comprehensive blueprint for your project, ensuring every component is meticulously organized and easily accessible.

## Structured for Clarity

Crafted with simplicity and efficiency in mind, the Kitfile organizes project details into distinct sections:

**Project Metadata:** Offers a snapshot of your project, including its name, version, description, and authors, laying the foundation for collaboration and recognition.

**Code:** Details about the source code powering your AI/ML models, complete with licensing information to uphold software best practices.

**Datasets:** Descriptions and paths to datasets, highlighting preprocessing steps and licenses, to ensure reproducibility and ethical use of data.

**Model Specifications:** Insights into the models themselves, including framework details, training parameters, and validation metrics, to foster understanding and further development.

## Designed for Collaboration

By encapsulating the essence of your AI/ML project into a singular, version-controlled document, the Kitfile not only simplifies the packaging process but also enhances collaborative efforts. Whether you're sharing projects within your team or with the global AI/ML community, the Kitfile ensures that every artifact, from datasets to models, is accurately represented and easily accessible.

Embrace the Kitfile in your AI/ML projects to harness the power of structured packaging, efficient collaboration, and seamless artifact management. As the backbone of the KitOps ecosystem, the Kitfile is your first step towards simplifying AI/ML project management and achieving greater innovation.
23 changes: 23 additions & 0 deletions docs/src/docs/modelkit/ModelKit_chart.svg
2 changes: 2 additions & 0 deletions docs/src/docs/modelkit/intro.md
@@ -1,5 +1,7 @@
# ModelKit Overview

![ModelKit](./ModelKit_chart.svg)

ModelKit revolutionizes the way AI/ML artifacts are shared and managed throughout the lifecycle of AI/ML projects. As an OCI-compliant packaging format, ModelKit encapsulates datasets, code, configurations, and models into a single, standardized unit. This approach not only streamlines the development process but also ensures broad compatibility and integration with a vast array of tools and platforms.

## Key Features of ModelKit:
50 changes: 1 addition & 49 deletions docs/src/docs/modelkit/spec.md
@@ -1,49 +1 @@
# ModelKit Specification v0.1

A **ModelKit** represents a comprehensive bundle of AI/ML artifacts, including models, datasets, and code, along with their associated parameters. These components are crucial at various stages of a model's lifecycle. This specification details the format and organization of these artifacts and parameters, providing guidelines for their creation, management, and use.

## Terminology and Structure



**Artifacts:** The building blocks of a ModelKit. Artifacts can be models, datasets, or code, each stored and addressed individually. This modular approach facilitates direct access via tools. Artifact metadata is encapsulated within the kitfile, ensuring comprehensive documentation of each component.

The artifacts and their media types are
* Serialized Model: `application/vnd.kitops.modelkit.model.v1.tar+gzip`
* Datasets: `application/vnd.kitops.modelkit.dataset.v1.tar+gzip`
* Code: `application/vnd.kitops.modelkit.code.v1.tar+gzip`

**ModelKit File (Kitfile)** Acts as a record detailing the properties, relationships, and intended uses of the included artifacts. The Kitfile is central to understanding the structure and purpose of a ModelKit. It adopts the `application/vnd.kitops.modelkit.config.v1+json` media type for easy access and interpretation by tools.See the seperate kitfile specification on details

**ModelKit Manifest:** This JSON document provides essential information about the model, including creation date, authorship, and a cryptographic hash of each artifact and the Kitfile. The manifest is immutable to preserve the integrity of the ModelKitID, ensuring any modification results in the creation of a new derived ModelKit, rather than altering the existing one.

### Identification and Management:

**ModelKitID:** A unique identifier for each ModelKit, derived from the SHA256 hash of its manifest. For example, `sha256:a9561eb1b190625c9adb5a9513e72c4dedafc1cb2d4c5236c9a6957ec7dfd5a9`.

**Tag:** A tag serves to map a descriptive, user-given name to any single modelKitID. Tag values are limited to the set of characters [a-zA-Z0-9_.-], except they may not start with a . or - character. Tags are limited to 128 characters.

**Repository:** A collection of tags grouped under a common prefix (the name component before :). For example, in a ModelKit tagged with the name myllm:3.1.4, myllm is the Repository component of the name. A repository name is made up of slash-separated name components, optionally prefixed by a DNS hostname. The hostname must comply with standard DNS rules, but may not contain _ characters. If a hostname is present, it may optionally be followed by a port number in the format :8080. Name components may contain lowercase characters, digits, and separators. A separator is defined as a period, one or two underscores, or one or more dashes. A name component may not start or end with a separator.


## ModelKit Manifest Example

Example of a ModelKit manifest with a single serialized model and kitfile.

```JSON
{
"schemaVersion": 2,
"config": {
"mediaType": "application/vnd.jozu.model.config.v1+json",
"digest": "sha256:d5815835051dd97d800a03f641ed8162877920e734d3d705b698912602b8c763",
"size": 301
},
"layers": [
{
"mediaType": "application/vnd.jozu.model.content.v1.tar+gzip",
"digest": "sha256:3f907c1a03bf20f20355fe449e18ff3f9de2e49570ffb536f1a32f20c7179808",
"size": 30327160
}
]
}
```
<!--@include: ../../../../pkg/artifact/spec.md-->
Contributor

FWIW, VitePress includes also support importing just a range of lines, like:

<!--@include: ../../../../pkg/artifact/spec.md{2, 10}-->

That would import only lines 2 through 10. Not sure how helpful that is, but it's good to know.

47 changes: 47 additions & 0 deletions pkg/artifact/spec.md
Contributor

Out of curiosity, is there a reason why this (and the other pkg md's) live outside the docs folder? Is this reused in the GitHub repo like some sort of "readme" or something?

Contributor Author

It is just close to where those specs are implemented. That makes it convenient for anyone who needs to modify or understand those implementations.

Contributor

This document makes no reference to the OCI spec -- do we want to include mention there? Otherwise, it might be confusing to see schemaVersion: 2 in the manifest JSON file.

Contributor Author

Yes, we should refer to the OCI spec for this section.

@@ -0,0 +1,47 @@
# ModelKit Specification v0.1

A **ModelKit** represents a comprehensive bundle of AI/ML artifacts, including models, datasets, and code, along with their associated parameters. These components are crucial at various stages of a model's lifecycle. This specification details the format and organization of these artifacts and parameters, providing guidelines for their creation, management, and use.

## Terminology and Structure

**Artifacts:** The building blocks of a ModelKit. Artifacts can be models, datasets, or code, each stored and addressed individually. This modular approach facilitates direct access via tools. Artifact metadata is encapsulated within the kitfile, ensuring comprehensive documentation of each component.

The artifacts and their media types are
* Serialized Model: `application/vnd.kitops.modelkit.model.v1.tar+gzip`
* Datasets: `application/vnd.kitops.modelkit.dataset.v1.tar+gzip`
* Code: `application/vnd.kitops.modelkit.code.v1.tar+gzip`
Comment on lines +9 to +12
Contributor

We're somewhat overloading the term artifact here. The overall ModelKit is an OCI artifact, and the model, datasets, and code are stored as layers within that artifact (since we're reusing the OCI image manifest spec). Referring to the layers as artifacts as well is a little confusing -- is there another word that works (or is "layers" sufficient)?

Contributor Author

"Layers" is confusing too, because it suggests a ModelKit is built incrementally from layers. I agree that "artifact" is overused in this context, but I could not find a better word to describe them.

Contributor

We could refer to them as packages which would fit with pack and unpack commands. Although that might get confusing since code libraries are often referred to as packages, maybe especially in Python... Could we call them "parcels" - that's also something you can pack/unpack...


**ModelKit File (Kitfile)** Acts as a record detailing the properties, relationships, and intended uses of the included artifacts. The Kitfile is central to understanding the structure and purpose of a ModelKit. It adopts the `application/vnd.kitops.modelkit.config.v1+json` media type for easy access and interpretation by tools.See the seperate kitfile specification on details
Contributor

Suggested change
**ModelKit File (Kitfile)** Acts as a record detailing the properties, relationships, and intended uses of the included artifacts. The Kitfile is central to understanding the structure and purpose of a ModelKit. It adopts the `application/vnd.kitops.modelkit.config.v1+json` media type for easy access and interpretation by tools.See the seperate kitfile specification on details
**ModelKit File (Kitfile)** Acts as a record detailing the properties, relationships, and intended uses of the included artifacts. The Kitfile is central to understanding the structure and purpose of a ModelKit. It adopts the `application/vnd.kitops.modelkit.config.v1+json` media type for easy access and interpretation by tools. See the seperate kitfile specification on details

Also, should we merge the kitfile specification and spec.md into one document? There's a fair bit of overlap.


**ModelKit Manifest:** This JSON document provides essential information about the model, including creation date, authorship, and a cryptographic hash of each artifact and the Kitfile. The manifest is immutable to preserve the integrity of the ModelKitID, ensuring any modification results in the creation of a new derived ModelKit, rather than altering the existing one.

### Identification and Management:

**ModelKitID:** A unique identifier for each ModelKit, derived from the SHA256 hash of its manifest. For example, `sha256:a9561eb1b190625c9adb5a9513e72c4dedafc1cb2d4c5236c9a6957ec7dfd5a9`.
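
A minimal sketch of how such an identifier could be derived, assuming (as with OCI content digests) that the hash covers the exact serialized bytes of the manifest. The function name `modelkit_id` and the sample manifest are hypothetical and only illustrate the idea; they are not taken from the KitOps code.

```python
import hashlib
import json


def modelkit_id(manifest_bytes: bytes) -> str:
    """Return 'sha256:<hex digest>' for the exact bytes of a serialized manifest."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


# Hypothetical manifest content, serialized once. The ID depends on these exact bytes.
manifest = {"schemaVersion": 2, "layers": []}
raw = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
print(modelkit_id(raw))
```

Because the hash covers bytes rather than parsed JSON, re-serializing the same document with different whitespace or key order would produce a different ID, so a tool should hash the stored manifest rather than a re-encoded copy.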

**Tag:** A tag serves to map a descriptive, user-given name to any single modelKitID. Tag values are limited to the set of characters `[a-zA-Z0-9_.-]`, except they may not start with a `.` or `-` character. Tags are limited to 128 characters.
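
The tag rules above can be sketched as a single regular expression. This is an illustrative check only; `TAG_RE` and `is_valid_tag` are hypothetical names, not part of the KitOps implementation.

```python
import re

# First character: letter, digit, or underscore (so not '.' or '-').
# Then up to 127 more characters from [a-zA-Z0-9_.-], for 128 in total.
TAG_RE = re.compile(r"[a-zA-Z0-9_][a-zA-Z0-9_.-]{0,127}")


def is_valid_tag(tag: str) -> bool:
    """Return True if the whole string satisfies the tag rules."""
    return TAG_RE.fullmatch(tag) is not None


print(is_valid_tag("3.1.4"))
print(is_valid_tag(".hidden"))
```

`fullmatch` is used instead of `match` so that trailing characters outside the allowed set cause a rejection.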

**Repository:** A collection of tags grouped under a common prefix (the name component before `:`). For example, in a ModelKit tagged with the name `myllm:3.1.4`, `myllm` is the Repository component of the name. A repository name is made up of slash-separated name components, optionally prefixed by a DNS hostname. The hostname must comply with standard DNS rules, but may not contain `_` characters. If a hostname is present, it may optionally be followed by a port number in the format `:8080`. Name components may contain lowercase characters, digits, and separators. A separator is defined as a period, one or two underscores, or one or more dashes. A name component may not start or end with a separator.
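
A rough sketch of the naming rules above, assuming a reference has the shape `[host[:port]/]component[/component...][:tag]`. The helper names are hypothetical, and hostname validation is left out.

```python
import re
from typing import Optional, Tuple

# A separator is a period, one or two underscores, or one or more dashes.
SEPARATOR = r"(?:\.|_{1,2}|-+)"
# Lowercase alphanumeric runs joined by separators; never starts or ends with one.
COMPONENT_RE = re.compile(rf"[a-z0-9]+(?:{SEPARATOR}[a-z0-9]+)*")


def is_valid_component(component: str) -> bool:
    """Check one slash-separated name component against the rules."""
    return COMPONENT_RE.fullmatch(component) is not None


def split_reference(reference: str) -> Tuple[str, Optional[str]]:
    """Split 'repository:tag' into (repository, tag).

    A ':' that appears before the last '/' belongs to a host port, not a tag.
    """
    last_slash = reference.rfind("/")
    colon = reference.rfind(":")
    if colon > last_slash:
        return reference[:colon], reference[colon + 1:]
    return reference, None


print(split_reference("myllm:3.1.4"))
print(split_reference("localhost:8080/team/myllm"))
```

Comparing the position of the last colon with the last slash is what keeps a port such as `:8080` from being mistaken for a tag.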


## ModelKit Manifest Example

Example of a ModelKit manifest with a single serialized model and kitfile.

```JSON
{
"schemaVersion": 2,
"config": {
"mediaType": "application/vnd.jozu.model.config.v1+json",
"digest": "sha256:d5815835051dd97d800a03f641ed8162877920e734d3d705b698912602b8c763",
"size": 301
},
"layers": [
{
"mediaType": "application/vnd.jozu.model.content.v1.tar+gzip",
"digest": "sha256:3f907c1a03bf20f20355fe449e18ff3f9de2e49570ffb536f1a32f20c7179808",
"size": 30327160
}
]
}
```
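
As a rough illustration of how a tool might read this manifest, the sketch below parses the example and groups layer digests by media type. It assumes only the fields shown in the example (`config`, `layers`, `mediaType`, `digest`, `size`); the helper name `layers_by_media_type` is hypothetical.

```python
import json
from collections import defaultdict

# The example manifest from above, embedded as a string for a self-contained demo.
MANIFEST_JSON = """
{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.jozu.model.config.v1+json",
    "digest": "sha256:d5815835051dd97d800a03f641ed8162877920e734d3d705b698912602b8c763",
    "size": 301
  },
  "layers": [
    {
      "mediaType": "application/vnd.jozu.model.content.v1.tar+gzip",
      "digest": "sha256:3f907c1a03bf20f20355fe449e18ff3f9de2e49570ffb536f1a32f20c7179808",
      "size": 30327160
    }
  ]
}
"""


def layers_by_media_type(manifest: dict) -> dict:
    """Map each layer media type to the list of layer digests that use it."""
    grouped = defaultdict(list)
    for layer in manifest.get("layers", []):
        grouped[layer["mediaType"]].append(layer["digest"])
    return dict(grouped)


manifest = json.loads(MANIFEST_JSON)
total_layer_size = sum(layer["size"] for layer in manifest["layers"])

print(manifest["config"]["mediaType"])
print(layers_by_media_type(manifest))
print(total_layer_size)
```

Grouping by media type lets a consumer pick out only the model, dataset, or code entries it needs without unpacking everything.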