Design Meeting Notes (2023-11-06) #137

Open
DanielRosenwasser opened this issue Nov 10, 2023 · 0 comments

Python and TypeChat

  • Still thinking about Pydantic as a basis.
  • pydantic-core specifies the built-in discriminators for validators.
  • Seems feasible to generate TypeScript from these. Either over JSON schema or directly over data structures.
    • What about custom validators/serializers? Can't handle those without something custom.
  • Would be weird to tell users they have to be on a fixed version of Pydantic.
  • Anecdotally, have had good results with JSON schema (or more specifically, YAML versions of JSON schema).
    • YAML seems to do really well...
    • Does it do as well as TypeScript as a spec language? (check back here)
  • What workflow could we have here?
    • Start with a YAML-authored JSON schema.
    • Have a proof-of-concept using kwalify.
  • The good part of these schemas is that they can specify more than built-in annotations for types.
  • So specify in
  • Three concerns:
    • Spec language for language models (succinct, few tokens, familiar to recent LLMs)
    • Validation expressivity (you can say "it's a zip code" or "it's an email address").
    • Developer UX (end-to-end, you have a pleasant authoring language, type-checking, auto-complete, etc.)
  • Tied to that are the following:
    • What does a developer write?
    • What does an LLM see?
    • What
  • Be aware: there's a distinction between errors committed by an LLM and errors committed by an end-user.
    • If a user says "my zip code is abcdefg", then that's a user error, not a language model error.
  • Another example - TypeChat Programs in Python
    • Top level functions exported.

      def add(x: float, y: float) -> float: ...
      def sub(x: float, y: float) -> float: ...
      # ...
  • Nothing seems to work as well as TypeScript for LLMs.
    • Lightest on tokens, most familiar.
  • Okay, but what's the authoring format? What do you do here? What if you need to generate types on the fly?
  • How will we solve the programmatic case in the TypeScript world?
    • We don't have a perfect solution right now. Maybe rely on libraries like Zod?
    • What's that going to have to look like? You say that something is a string, but then it's generated on the fly.
    • How does that get there?
  • Do these string unions/enums actually matter? Maybe for discriminated unions, but maybe not for items in a database?
  • What are these supposed to look like?
    • It may be best to insert these into comments.
  • So what would we do with Python?
  • We really, really want to see how accuracy compares between the TypeScript and Python forms.
    • If the Python form is less accurate, we need to see if we can convert it into TypeScript.
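
As a rough illustration of the "convert it into TypeScript" idea above, here is a minimal sketch that walks Python type hints and emits TypeScript interface text. It is not TypeChat's implementation. It assumes standard-library dataclasses stand in for Pydantic models, and it handles only a small subset of hints (primitives, `Literal`, `Optional`, `list`, nested dataclasses). The names `to_ts_type`, `to_ts_interface`, `Item`, and `Order` are hypothetical and exist only for this example.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import Literal, Optional, Union, get_args, get_origin, get_type_hints

# Assumed mapping from Python primitives to TypeScript type names.
_PRIMITIVES = {
    str: "string",
    int: "number",
    float: "number",
    bool: "boolean",
    type(None): "null",
}


def to_ts_type(tp) -> str:
    """Translate a small subset of Python type hints into TypeScript type syntax."""
    if tp in _PRIMITIVES:
        return _PRIMITIVES[tp]
    origin = get_origin(tp)
    if origin is Literal:
        # Literal["a", "b"] becomes the string union "a" | "b".
        return " | ".join(json.dumps(value) for value in get_args(tp))
    if origin is Union:
        # Covers Optional[X], which is Union[X, None].
        return " | ".join(to_ts_type(arg) for arg in get_args(tp))
    if origin is list:
        (item,) = get_args(tp)
        inner = to_ts_type(item)
        # Parenthesize unions so that the array suffix applies to the whole type.
        return f"({inner})[]" if " | " in inner else f"{inner}[]"
    if is_dataclass(tp):
        # Refer to nested dataclasses by name; they are emitted separately.
        return tp.__name__
    raise TypeError(f"unsupported type hint: {tp!r}")


def to_ts_interface(cls) -> str:
    """Render one dataclass as a TypeScript interface declaration."""
    hints = get_type_hints(cls)
    lines = [f"interface {cls.__name__} {{"]
    for field in fields(cls):
        lines.append(f"    {field.name}: {to_ts_type(hints[field.name])};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical schema used only to exercise the sketch.
@dataclass
class Item:
    name: str
    quantity: int
    size: Literal["small", "large"]
    notes: Optional[str]


@dataclass
class Order:
    items: list[Item]


print(to_ts_interface(Item))
print(to_ts_interface(Order))
```

A sketch like this only covers the "what does an LLM see?" side. Custom validators (the "it's a zip code" case) have no direct TypeScript equivalent and would probably have to be surfaced some other way, for instance as comments in the generated text.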