[Bug]: Save/Get SQLTarget table schema in FeatureSet (differences between 'str' and 'type(str)') #5238

Open
2 tasks done
george0st opened this issue Mar 4, 2024 · 5 comments

Comments

@george0st
Collaborator

MLRun Version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of MLRun CE.

Reproducible Example

import random
import string

import mlrun
import mlrun.feature_store as fstore
import pandas as pd
from mlrun.data_types.data_types import ValueType
from mlrun.datastore.targets import SQLTarget

def mysql_wrong_serialization(project_name):
    mlrun.set_env_from_file("mlrun-nonprod.env")
    project = mlrun.get_or_create_project(project_name, context='./', user_project=False)

    feature_name = "basic-party"
    feature_set_old = fstore.FeatureSet(
        feature_name,
        entities=[
            fstore.Entity("party-id", value_type=ValueType.INT64),
            fstore.Entity("party-idm", value_type=ValueType.INT64),
        ],
        engine="storey",
    )
    feature_set_old.add_feature(fstore.Feature(name="party-type"))


    conn = "mysql+pymysql://testuser:testpwd@localhost:3306/test"

    random_table_suffix = ''.join(random.choices(string.ascii_lowercase + string.digits, k=5))
    feature_set_old.set_targets(
        targets=[
            SQLTarget(
                name="we2",
                db_url=conn,
                table_name=f"my_table_{random_table_suffix}",
                schema={"party-id": int, "party-idm": int, "party-type": str},
                create_table=True,
                primary_key_column="party-id",
            )
        ],
        with_defaults=False,
    )


    feature_set_old.save()

    feature_set_new = fstore.get_feature_set(f"{project_name}/{feature_name}")

    data = {"party-id": [1, 2, 3],
            "party-idm": [10, 20, 30],
            "party-type": ["a1", "b2", "c3"]}
    dataFrm = pd.DataFrame(data)

    # The error occurs when ingesting into feature_set_new: after save() and
    # get_feature_set(), the schema value of the SQLTarget is loaded incorrectly.
    # The difference is 'str' vs 'type(str)', and it breaks table creation in MySQL.

    # NOTE: everything works when ingesting into feature_set_old, because it was never reloaded.

    fstore.ingest(feature_set_new,
                  dataFrm,
                  return_df=False,
                  # overwrite=True,
                  infer_options=mlrun.data_types.data_types.InferOptions.default())

if __name__ == '__main__':
    project="cxxxx2"
    mysql_wrong_serialization(project)

Issue Description

See exception:

  File "C:\Python\test\.venv\lib\site-packages\mlrun\datastore\targets.py", line 1782, in add_writer_step
    self._create_sql_table()
  File "C:\Python\test\.venv\lib\site-packages\mlrun\datastore\targets.py", line 1920, in _create_sql_table
    raise TypeError(f"{col_type} unsupported type")
TypeError: None unsupported type

The error occurs when ingesting into the feature set returned by get_feature_set (feature_set_new), because the schema value of the SQLTarget is loaded incorrectly. The difference is between 'str' and 'type(str)', and it causes the failure when the table is created in MySQL.
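To make the 'str' vs 'type(str)' difference concrete, here is a minimal standalone sketch that does not use MLRun at all. It assumes the save/load round trip stores each Python type as its string form, which is a guess about the cause and not confirmed against the MLRun source. The names TYPE_TO_SQL and restore_schema are hypothetical and exist only for this illustration.

```python
import json

# Hypothetical lookup table keyed by Python type objects
# (an illustration only, not MLRun's actual mapping).
TYPE_TO_SQL = {int: "INTEGER", str: "VARCHAR(255)", float: "FLOAT"}

schema = {"party-id": int, "party-idm": int, "party-type": str}

# Assumption: saving turns each type into a string, so loading returns strings.
stored = json.dumps({col: str(col_type) for col, col_type in schema.items()})
loaded = json.loads(stored)

# Lookups succeed with real type objects and return None with their string forms.
before = [TYPE_TO_SQL.get(col_type) for col_type in schema.values()]
after = [TYPE_TO_SQL.get(col_type) for col_type in loaded.values()]

# One possible repair: map the string forms back to the type objects after loading.
NAME_TO_TYPE = {str(col_type): col_type for col_type in TYPE_TO_SQL}


def restore_schema(loaded_schema):
    """Return a copy of the schema with known string forms replaced by types."""
    return {col: NAME_TO_TYPE.get(value, value) for col, value in loaded_schema.items()}


print(before)
print(after)
print(restore_schema(loaded) == schema)
```

If the real cause matches this assumption, a lookup keyed by type objects would return None for every column of the reloaded schema, which would be consistent with the "None unsupported type" message in the traceback.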

Expected Behavior

The code sample above should create the table in MySQL without raising an exception.

Installation OS

Windows

Installation Method

Docker

Python Version

3.9.10

MLRun Version

1.6.1

Additional Information

BTW: it took a lot of effort to identify this issue.

@george0st
Collaborator Author

@liranbg, little cherry pick.

@assaf758
Member

assaf758 commented Mar 5, 2024

@george0st thanks, we'll check it.

@george0st
Collaborator Author

Hi @assaf758, any news, please? Were you able to reproduce the issue?

@assaf758
Member

Not yet, @george0st. Hopefully we'll get to this in the coming days.

@george0st
Collaborator Author

Errors in TS301 and TS302 generated this issue; see the attached report:
qgt-mlrun-2024-04-12 9212152476112.txt
