
[INTROSPECTION][DISCUSSION] allow to introspect with BaseSchema & BaseSchemaAsync #258

Closed
ecyrbe opened this issue Nov 19, 2023 · 13 comments
Labels: enhancement (New feature or request), priority (This has priority)

@ecyrbe (Contributor) commented Nov 19, 2023

At the moment, when introspecting generic schemas, TypeScript does not know about the existence of the schema type attribute.

Example, with an error at line 6:
[screenshot of the TypeScript error omitted]

Potential Solutions

Add a type attribute to BaseSchema and BaseSchemaAsync:

/**
 * Base schema type.
 */
export type BaseSchema<TInput = any, TOutput = TInput> = {
  /**
   * The schema type, e.g. 'string', 'number', 'object', 'array', 'boolean', etc.
   */
  type: string;

  /**
   * Whether it's async.
   */
  async: false;

  /**
   * Parses unknown input based on its schema.
   *
   * @param input The input to be parsed.
   * @param info The parse info.
   *
   * @returns The parse result.
   *
   * @internal
   */
  _parse(input: unknown, info?: ParseInfo): _ParseResult<TOutput>;
  /**
   * Input and output type.
   *
   * @internal
   */
  _types?: { input: TInput; output: TOutput };
};

/**
 * Base schema async type.
 */
export type BaseSchemaAsync<TInput = any, TOutput = TInput> = {
  /**
   * The schema type, e.g. 'string', 'number', 'object', 'array', 'boolean', etc.
   */
  type: string;

  /**
   * Whether it's async.
   */
  async: true;
  /**
   * Parses unknown input based on its schema.
   *
   * @param input The input to be parsed.
   * @param info The parse info.
   *
   * @returns The parse result.
   *
   * @internal
   */
  _parse(input: unknown, info?: ParseInfo): Promise<_ParseResult<TOutput>>;
  /**
   * Input and output type.
   *
   * @internal
   */
  _types?: { input: TInput; output: TOutput };
};

Add an introspection helper

function getSchemaType<T extends BaseSchema|BaseSchemaAsync>(schema: T): string
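A minimal sketch of what such a helper could look like, assuming a type: string field exists on both base types (the type definitions below are simplified stand-ins for illustration, not Valibot's actual exports):

```typescript
// Simplified stand-ins for Valibot's base types (assumes `type` is added).
type BaseSchema = { type: string; async: false };
type BaseSchemaAsync = { type: string; async: true };

// Hypothetical helper: returns the schema's type discriminant.
function getSchemaType<T extends BaseSchema | BaseSchemaAsync>(
  schema: T
): string {
  return schema.type;
}

const stringSchema: BaseSchema = { type: "string", async: false };
console.log(getSchemaType(stringSchema)); // "string"
```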

Add an introspection visitor

function matchSchema<T extends BaseSchema|BaseSchemaAsync>(schema: T): SchemaVisitorBuilder;

Example:

function introspectSchema<T extends BaseSchema | BaseSchemaAsync>(schema: T) {
  matchSchema(schema)
    .with('string', (stringSchema) => {
      // handle string
    })
    .with('boolean', (booleanSchema) => {
      // handle boolean
    })
    .with('object', (objectSchema) => {
      // handle object
    });
}

Any other ideas?

@Saeris (Contributor) commented Nov 19, 2023

Requesting the same thing in #257 essentially. BaseSchema, BaseValidation, and BaseTransform should all be interfaces that describe the most generic shape of the return values for each of those function types, listing all of the common properties of each.

The essential problem here is that we can't really have type guards or build generic utilities that work with these building blocks without proper type inheritance.

Admittedly, this is partially my fault for not catching this in the PR review, as this was possible before but got taken out along the way somewhere.

@fabian-hiller (Owner) commented:

There is a reason why I did not include type: string in BaseSchema, BaseValidation and BaseTransform. If a function accepts both BaseSchema and a specific schema like ObjectSchema, it is not possible to distinguish between the schemas using .type if BaseSchema contains type: string. Without type: string, however, 'type' in schema can be used to check whether .type is included. The result is that all future checks behave like a discriminated union. This can be seen in the getDefaults function.
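To illustrate the trade-off with a self-contained sketch (the types below are deliberately simplified stand-ins, not Valibot's real definitions): when BaseSchema declares no type property, a 'type' in schema check narrows the union just like a discriminant would:

```typescript
// Stand-in types: BaseSchema deliberately has no `type` property.
type BaseSchema = { async: false };
type ObjectSchema = {
  async: false;
  type: "object";
  entries: Record<string, BaseSchema>;
};

function describeSchema(schema: BaseSchema | ObjectSchema): string {
  if ("type" in schema) {
    // Narrowed to ObjectSchema: `entries` is accessible without casting.
    return `object with ${Object.keys(schema.entries).length} entries`;
  }
  // Narrowed to the plain BaseSchema member.
  return "unknown schema";
}
```

If BaseSchema itself carried type: string, the in check would no longer exclude it, and the true branch would be left with an unnarrowed BaseSchema.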

To make a long story short. My current recommendation is not to use BaseSchema, but to specifically use the schema types that your function supports. This ensures maximum type safety and type casting can be completely avoided. I look forward to your feedback.

@fabian-hiller (Owner) commented:

Issue #191 with #191 (comment) is partially related.

@ecyrbe (Contributor, Author) commented Nov 19, 2023

To make a long story short. My current recommendation is not to use BaseSchema, but to specifically use the schema types that your function supports. This ensures maximum type safety and type casting can be completely avoided. I look forward to your feedback.

For now I'm adding every single schema definition to a union, since TypeScript then narrows the type automatically once you match. But the definition is really ugly.

Maybe a type union of all schema types (like the one I created) could be added to Valibot (it should not impact tree shaking, since it's type-only).

Here is the monster:

type GenericSchemaAsync =
  | AnySchemaAsync
  | ArraySchemaAsync<BaseSchema | BaseSchemaAsync>
  | BigintSchemaAsync
  | BooleanSchemaAsync
  | DateSchemaAsync
  | EnumSchemaAsync<Enum>
  | NullSchemaAsync
  | NumberSchemaAsync
  | NullableSchemaAsync<BaseSchema | BaseSchemaAsync>
  | NullishSchemaAsync<BaseSchema | BaseSchemaAsync>
  | BlobSchemaAsync
  | VoidSchemaAsync
  | NeverSchemaAsync
  | SetSchemaAsync<BaseSchema | BaseSchemaAsync>
  | MapSchemaAsync<BaseSchemaAsync | BaseSchema, BaseSchemaAsync | BaseSchema>
  | ObjectSchemaAsync<
      ObjectEntriesAsync,
      BaseSchema | BaseSchemaAsync | undefined
    >
  | StringSchemaAsync
  | SymbolSchemaAsync
  | UndefinedSchemaAsync
  | UnionSchemaAsync<UnionOptionsAsync>
  | UnknownSchemaAsync
  | LiteralSchemaAsync<Literal>
  | NanSchemaAsync
  | TupleSchemaAsync<TupleItems>
  | RecordSchemaAsync<RecordKeyAsync, BaseSchema | BaseSchemaAsync>
  | SpecialSchemaAsync<any>
  | VariantSchemaAsync<string, VariantOptionsAsync<string>>
  | InstanceSchemaAsync<Class>
  | OptionalSchemaAsync<BaseSchema>
  | PicklistSchemaAsync<PicklistOptions>
  | RecursiveSchemaAsync<() => BaseSchema | BaseSchemaAsync>
  | NonNullishSchemaAsync<BaseSchemaAsync>
  | NonNullableSchemaAsync<BaseSchema | BaseSchemaAsync>
  | NonOptionalSchemaAsync<BaseSchemaAsync | BaseSchema>;

type GenericSchema =
  | AnySchema
  | ArraySchema<BaseSchema>
  | BigintSchema
  | BooleanSchema
  | DateSchema
  | EnumSchema<Enum>
  | NullSchema
  | NumberSchema
  | NullableSchema<BaseSchema>
  | NullishSchema<BaseSchema>
  | BlobSchema
  | VoidSchema
  | NeverSchema
  | SetSchema<BaseSchema>
  | MapSchema<BaseSchema, BaseSchema>
  | ObjectSchema<ObjectEntries, BaseSchema | undefined>
  | StringSchema
  | SymbolSchema
  | UndefinedSchema
  | UnionSchema<UnionOptions>
  | UnknownSchema
  | LiteralSchema<Literal>
  | NanSchema
  | TupleSchema<TupleItems>
  | RecordSchema<RecordKey, BaseSchema>
  | SpecialSchema<any>
  | VariantSchema<string, VariantOptions<string>>
  | InstanceSchema<Class>
  | OptionalSchema<BaseSchema>
  | PicklistSchema<PicklistOptions>
  | RecursiveSchema<() => BaseSchema>
  | NonNullishSchema<BaseSchema>
  | NonNullableSchema<BaseSchema>
  | NonOptionalSchema<BaseSchema>;

function valibotSchemaToJsonSchema<T extends GenericSchema | GenericSchemaAsync>(
  schema: T
): any {
  switch (schema.type) {
    case "string":
      // handle string schema
      break;
    case "number":
      // handle number schema
      break;
    case "boolean":
      // handle boolean schema
      break;
    case "object":
      // handle object schema
      break;
    case "array":
      // handle array schema
      break;
    case "null":
      // handle null schema
      break;
    case "any":
      // handle any schema
      break;
  }
}

@fabian-hiller (Owner) commented:

Thank you for your feedback. Yes, as written in #191 (comment), I plan to investigate this.

@Saeris (Contributor) commented Nov 29, 2023

So, I think I would be okay with a Union export for all of the schema types because it doesn't add any weight to the package since it's just type info.

However, I still think extending from a common root interface is the best way to go: it is generally more maintainable as this library grows, and this sort of abstraction makes assignability simple to understand. Any potential errors are easier to debug when the types themselves are less complex. With a really large union of objects, it can be difficult to debug a type error when one does crop up.

Rather than exporting one swiss-army-knife utility for discriminating between all schemas, as in the switch-statement example above, I think small, individual type predicate functions would be better. The main reason is that you want to tree-shake out as much code as possible, and you can't shake unused cases out of a switch.
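Such per-schema predicates could look like this self-contained sketch (the schema shapes are simplified stand-ins for illustration):

```typescript
// Simplified stand-ins for two concrete schema shapes.
type StringSchema = { type: "string"; async: false };
type NumberSchema = { type: "number"; async: false };
type AnySchemaShape = StringSchema | NumberSchema;

// Small, individually exportable (and therefore tree-shakable) predicates.
function isStringSchema(schema: AnySchemaShape): schema is StringSchema {
  return schema.type === "string";
}

function isNumberSchema(schema: AnySchemaShape): schema is NumberSchema {
  return schema.type === "number";
}
```

A bundler can drop isNumberSchema entirely if only isStringSchema is imported, which is not possible for the individual cases of one large switch.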

I think the changes I've drafted in #259 will cover what you've requested here.


In some sense here, even though Valibot is built with a functional architecture, every schema is essentially a constructor function returning an instance of a schema class (albeit with very few properties and no inherited methods). What we're discussing here boils down to instanceof at the type level.

Here is an example of what I mean. Here we've deduplicated some of the code, and we could go even further, because with this we no longer need a type field at all: we can just compare with instanceof BigintSchema instead. Similarly, async could be hoisted up to a BaseSchema class, and if our default message is always Invalid type, we could declare that as the default there too.

export class BigintSchema<TOutput = bigint> implements BaseSchema<bigint, TOutput> {
  type = 'bigint' as const;
  async = false as const;
  message: ErrorMessage;
  pipe?: Pipe<bigint>;

  constructor(arg1?: ErrorMessage | Pipe<bigint>, arg2?: Pipe<bigint>) {
    const [message = 'Invalid type', pipe] = getDefaultArgs(arg1, arg2);
    this.message = message;
    this.pipe = pipe;
  }

  _parse(input: unknown, info?: ParseInfo): _ParseResult<TOutput> {
    if (typeof input !== 'bigint') {
      return getSchemaIssues(info, 'type', 'bigint', this.message, input);
    }

    return executePipe(
      input,
      this.pipe,
      info,
      'bigint'
    ) as _ParseResult<TOutput>;
  }
}

// Honestly, these aren't even necessary anymore except to avoid the `new` keyword

export function bigint(pipe?: Pipe<bigint>): BigintSchema;
export function bigint(
  message?: ErrorMessage,
  pipe?: Pipe<bigint>
): BigintSchema;

export function bigint(
  arg1?: ErrorMessage | Pipe<bigint>,
  arg2?: Pipe<bigint>
): BigintSchema {
  return new BigintSchema(arg1, arg2);
}

Would this be a good idea? I don't know. I think there are some clear advantages as far as deduplication of code is concerned, but my guess would be that this feels icky as we'd be throwing in some OOP to a mostly functional library.

@fabian-hiller (Owner) commented:

Thanks for your detailed response. Your thoughts are good and I think it might make sense to investigate this approach to discuss the pros and cons in bundle size, performance and developer experience. As for instanceof, we need to consider this issue, which to my knowledge still exists with Vite, as presumably a large portion of users are using Valibot with Vite.

I welcome further feedback from others on this topic.

@Saeris (Contributor) commented Nov 30, 2023

I haven't personally run into that issue with instanceof before. Could Symbol.hasInstance be a possible solution to this? I'm not entirely sure from reading it.
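For reference, Symbol.hasInstance lets a class redefine what instanceof matches, e.g. by duck-typing on a property instead of walking the prototype chain, which sidesteps problems where two bundled copies of a module produce distinct class identities. A sketch (not Valibot code; the shape check is illustrative):

```typescript
class BigintSchema {
  readonly type = "bigint";

  // Match on shape rather than prototype identity, so objects created by a
  // duplicated copy of the module (or another realm) still pass the check.
  static [Symbol.hasInstance](value: unknown): boolean {
    return (
      typeof value === "object" &&
      value !== null &&
      (value as { type?: unknown }).type === "bigint"
    );
  }
}

// An object with the right shape but no shared prototype still matches:
const foreign = { type: "bigint" };
console.log(foreign instanceof BigintSchema); // true
```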

And I'm also not quite sure if this odd behavior with instanceof is more of a concern where inheritance is involved. The way I'm imagining this, there would be the bare minimum of inheritance. At worst, internally, I think you would go as deep as Schema -> BaseSchema / BaseSchemaAsync -> BigintSchema / BigintSchemaAsync, where in the case of the root Schema class, it would purely be a common ancestor for type info with async: boolean instead of async: true or async: false. It would be the runtime equivalent of the generic T extends BaseSchema | BaseSchemaAsync. If this bug doesn't apply in this scenario, then I think we can safely ignore it. Any further subclassing would be in userland.

@fabian-hiller (Owner) commented:

Thank you for your feedback. I will consider this issue in the long run. For this and next week I will focus on #76 and #145.

@pschiffmann commented:

I just started generating ElasticSearch mappings (basically database schema declarations) from my Valibot schemas, and I ran into this issue as well.

I already have schemas defined like this:

import { array, date, object, string, uuid } from "valibot";

const UserSchema = object({
  id: string([uuid()]),
  createdAt: date(),
  name: string(),
  friendIds: array(string([uuid()])),
});

I want to generate an ElasticSearch mapping from this schema. I intend to use that mapping to execute an ElasticSearch migration. It needs to look like this in the end:

import { type MappingTypeMapping } from "@elastic/elasticsearch";

const UserMapping: MappingTypeMapping = {
  dynamic: "strict",
  properties: {
    id: {
      type: "keyword",
    },
    createdAt: {
      type: "date",
    },
    name: {
      type: "text",
    },
    friendIds: {
      type: "keyword",
    },
  },
};

Here is how I approach this conversion so far:

function schemaToMapping(
  schema: BaseSchema | BaseSchemaAsync
): MappingTypeMapping {
  switch (schema.type) {
    case "object":
      return {
        dynamic: "strict",
        properties: Object.fromEntries(
          Object.entries(schema.entries).map(([k, v]) => [
            k,
            schemaToMapping(v),
          ])
        ),
      };
    case "array":
      // ElasticSearch doesn't index arrays:
      // https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html
      return schemaToMapping(schema.item);
    case "string":
      return { type: "text" };
    case "date":
      return { type: "date" };
    default:
      throw new Error("Unimplemented");
  }
}

But right now, the switch (schema.type) gives a TypeScript error, and the type narrowing doesn't work.

To make a long story short. My current recommendation is not to use BaseSchema, but to specifically use the schema types that your function supports. This ensures maximum type safety and type casting can be completely avoided. I look forward to your feedback.

This approach doesn't seem clean to me, because schemaToMapping() recurses into ObjectSchema.entries and ArraySchema.item, which could be any schema type.


There is a reason why I did not include type: string in BaseSchema, BaseValidation and BaseTransform. If a function accepts both BaseSchema and a specific schema like ObjectSchema, it is not possible to distinguish between the schemas using .type if BaseSchema contains type: string.

I don't understand what you mean. Could you give an example function where this leads to an error? I'm not suggesting you're wrong, I just can't follow the argument. 😅

@fabian-hiller (Owner) commented:

Thanks for your message. I am aware that this is not perfect yet, but it is my goal to find a good developer experience in the long run. Please share any ideas and feedback with me.

This is how I would implement it at the moment. If we change the schema types to interface as suggested in #259, you can remove the 'type' in schema check.

import * as v from 'valibot';

type MappingSchema =
  | v.BaseSchema
  | v.ObjectSchema<v.ObjectEntries>
  | v.ArraySchema<v.BaseSchema>
  | v.StringSchema
  | v.DateSchema;

function schemaToMapping(schema: MappingSchema) {
  if ('type' in schema) {
    switch (schema.type) {
      case 'object':
        return {
          dynamic: 'strict',
          properties: Object.fromEntries(
            Object.entries(schema.entries).map(([k, v]) => [
              k,
              schemaToMapping(v),
            ])
          ),
        };
      case 'array':
        // ElasticSearch doesn't index arrays:
        // https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html
        return schemaToMapping(schema.item);
      case 'string':
        return { type: 'text' };
      case 'date':
        return { type: 'date' };
      default:
        throw new Error('Unimplemented');
    }
  }
}

In a perfect world, we could change the schema type as follows. This would make it 100% type safe and also remove the 'type' in schema check. Unfortunately, TypeScript does not allow recursive types as generics.

type MappingSchema =
  | v.ObjectSchema<MappingSchema>
  | v.ArraySchema<MappingSchema>
  | v.StringSchema
  | v.DateSchema;

@pschiffmann commented:

Thanks for the suggestion! You are completely right: my function needs to have a union type parameter; simply adding BaseSchema.type would never have achieved the desired result.

For others on this thread, let me explain why (because it wasn't obvious to me at first either).

What we want to do is to switch on a schema .type, and get TypeScript type narrowing to work:

function f(schema: BaseSchema) {
  switch(schema.type) {
    case "object":
      console.log(schema.entries);
      break;
    case "array":
      console.log(schema.item);
      break;
  }
}

This will never work if the schema is typed as BaseSchema, even if a BaseSchema.type property were added to the type. TypeScript will never narrow the case "object" branch to the type ObjectSchema, because BaseSchema is not a closed type!

What does "closed type" mean? Union types, for example, are closed: the TypeScript compiler knows at compile time all the possible subtypes that are constituents of the union. But BaseSchema is not closed – the TypeScript type checker can't know how many subtypes of BaseSchema exist at compile time.

For example, nothing prevents me from doing this:

interface FooSchema extends BaseSchema {
  type: "object";
}

I just defined a new subtype of BaseSchema. I can pass objects of this type to f(), because FooSchema is assignable to BaseSchema. FooSchema.type matches the case branch, but the object doesn't have an .entries property.

Even if your code doesn't contain a FooSchema, you might be publishing your code as an npm package, and that package might get imported into another project where a FooSchema exists.

As a result, TypeScript will never do the desired type narrowing as long as f() expects a BaseSchema parameter. The only way to get type narrowing is to convert the parameter to a closed type, i.e. a union type. (Or hope that TypeScript adds support for closed class hierarchies one day: microsoft/TypeScript#8306)
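The difference is easy to demonstrate with simplified stand-in types: the same switch narrows correctly once the parameter is a closed union:

```typescript
// A closed union: the compiler knows these are the only possible members.
type ObjectSchema = { type: "object"; entries: Record<string, unknown> };
type ArraySchema = { type: "array"; item: unknown };
type KnownSchema = ObjectSchema | ArraySchema;

function describeKnown(schema: KnownSchema): string {
  switch (schema.type) {
    case "object":
      // Narrowed to ObjectSchema: `.entries` is accessible without casting.
      return `object(${Object.keys(schema.entries).join(",")})`;
    case "array":
      // Narrowed to ArraySchema.
      return "array";
  }
}
```

Because KnownSchema has exactly two members, the switch is exhaustive and each branch gets the fully narrowed type.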


So, I think I would be okay with a Union export for all of the schema types

I agree that this would be a nice quality of life improvement that would be great to have in Valibot – this way, I don't have to type out the union type myself.

@fabian-hiller (Owner) commented:

This should be fixed. The latest version provides isOfKind and isOfType to check the type of an object. Feel free to contact me if you have any questions.
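For readers landing here later: isOfType is essentially a type guard on the type discriminant. A self-contained sketch of how such a guard can be typed (illustrative only, not Valibot's actual implementation):

```typescript
type Schema = { type: string };

// Narrow a schema union to the member(s) whose `type` matches.
function isOfTypeSketch<TType extends string, TSchema extends Schema>(
  type: TType,
  schema: TSchema
): schema is Extract<TSchema, { type: TType }> {
  return schema.type === type;
}

type StringSchema = { type: "string" };
type NumberSchema = { type: "number" };

const schema = { type: "string" } as StringSchema | NumberSchema;
if (isOfTypeSketch("string", schema)) {
  // `schema` is narrowed to StringSchema in this branch.
  console.log("got a string schema");
}
```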
