Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for duckdb #5349

Open
jerome-f opened this issue Mar 20, 2024 · 3 comments
Open

Support for duckdb #5349

jerome-f opened this issue Mar 20, 2024 · 3 comments
Labels
feature Change that does not break compatibility, but affects the public interfaces.

Comments

@jerome-f
Copy link

Motivation

Support for duckdb as a RDB. Sqlalchemy already supports duckdb with duckdb-engine, it can easily be substituted as a drop in for sqlite. Avoids the database lock issue

Description

support duckdb in optuna.storages

Alternatives (optional)

No response

Additional context (optional)

No response

@jerome-f jerome-f added the feature Change that does not break compatibility, but affects the public interfaces. label Mar 20, 2024
@not522
Copy link
Member

not522 commented Mar 29, 2024

Thank you for your feedback! I'm not familiar with duckdb, but I think it already works using URL (duckdb:///...) as the storage url if the SQLAlchemy support is implemented.

@jerome-f
Copy link
Author

@not522 thanks for looking into it and yes sqlalchemy does support duckdb but I got errors in API call. duckdb is same convention as sqlite but allows for seamless integration with many python libraries (pandas, parquet I/O) and avoids database lock issue and it is more efficient, fast and small disk footprint. I ran the following code

import optuna

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2
storage = optuna.storages.RDBStorage(url=f"duckdb:///test.db")
study = optuna.load_study(study_name="distributed-example", storage=storage)
study.optimize(objective, n_trials=100)

and it generates the following error which is related to the SERIAL keyword in db creation. In DuckDB the keyword INTEGER or BIGINT along with the AUTO_INCREMENT keyword to achieve similar functionality as SERIAL. But I am not familiar with api calls to sqlalchemy so posted this here.

---------------------------------------------------------------------------
CatalogException                          Traceback (most recent call last)
File /mambaforge/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1971, in Connection._exec_single_context(self, dialect, context, statement, parameters)
   1970     if not evt_handled:
-> 1971         self.dialect.do_execute(
   1972             cursor, str_statement, effective_parameters, context
   1973         )
   1975 if self._has_events or self.engine._has_events:

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/engine/default.py:919, in DefaultDialect.do_execute(self, cursor, statement, parameters, context)
    918 def do_execute(self, cursor, statement, parameters, context=None):
--> 919     cursor.execute(statement, parameters)

File /mambaforge/lib/python3.11/site-packages/duckdb_engine/__init__.py:163, in ConnectionWrapper.execute(self, statement, parameters, context)
    162     else:
--> 163         self.__c.execute(statement, parameters)
    164 except RuntimeError as e:

CatalogException: Catalog Error: Type with name SERIAL does not exist!
Did you mean "real"?

The above exception was the direct cause of the following exception:

ProgrammingError                          Traceback (most recent call last)
Cell In[3], line 6
      4     x = trial.suggest_float("x", -10, 10)
      5     return (x - 2) ** 2
----> 6 storage = optuna.storages.RDBStorage(url=f"duckdb:///test.db")
      7 study = optuna.load_study(study_name="distributed-example", storage=storage)
      8 study.optimize(objective, n_trials=100)

File /mambaforge/lib/python3.11/site-packages/optuna/storages/_rdb/storage.py:229, in RDBStorage.__init__(self, url, engine_kwargs, skip_compatibility_check, heartbeat_interval, grace_period, failed_trial_callback, skip_table_creation)
    225 self.scoped_session = sqlalchemy_orm.scoped_session(
    226     sqlalchemy_orm.sessionmaker(bind=self.engine)
    227 )
    228 if not skip_table_creation:
--> 229     models.BaseModel.metadata.create_all(self.engine)
    231 self._version_manager = _VersionManager(self.url, self.engine, self.scoped_session)
    232 if not skip_compatibility_check:

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/sql/schema.py:5825, in MetaData.create_all(self, bind, tables, checkfirst)
   5801 def create_all(
   5802     self,
   5803     bind: _CreateDropBind,
   5804     tables: Optional[_typing_Sequence[Table]] = None,
   5805     checkfirst: bool = True,
   5806 ) -> None:
   5807     """Create all tables stored in this metadata.
   5808 
   5809     Conditional by default, will not attempt to recreate tables already
   (...)
   5823 
   5824     """
-> 5825     bind._run_ddl_visitor(
   5826         ddl.SchemaGenerator, self, checkfirst=checkfirst, tables=tables
   5827     )

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/engine/base.py:3255, in Engine._run_ddl_visitor(self, visitorcallable, element, **kwargs)
   3248 def _run_ddl_visitor(
   3249     self,
   3250     visitorcallable: Type[Union[SchemaGenerator, SchemaDropper]],
   3251     element: SchemaItem,
   3252     **kwargs: Any,
   3253 ) -> None:
   3254     with self.begin() as conn:
-> 3255         conn._run_ddl_visitor(visitorcallable, element, **kwargs)

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/engine/base.py:2461, in Connection._run_ddl_visitor(self, visitorcallable, element, **kwargs)
   2449 def _run_ddl_visitor(
   2450     self,
   2451     visitorcallable: Type[Union[SchemaGenerator, SchemaDropper]],
   2452     element: SchemaItem,
   2453     **kwargs: Any,
   2454 ) -> None:
   2455     """run a DDL visitor.
   2456 
   2457     This method is only here so that the MockConnection can change the
   2458     options given to the visitor so that "checkfirst" is skipped.
   2459 
   2460     """
-> 2461     visitorcallable(self.dialect, self, **kwargs).traverse_single(element)

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/sql/visitors.py:664, in ExternalTraversal.traverse_single(self, obj, **kw)
    662 meth = getattr(v, "visit_%s" % obj.__visit_name__, None)
    663 if meth:
--> 664     return meth(obj, **kw)

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/sql/ddl.py:918, in SchemaGenerator.visit_metadata(self, metadata)
    916 for table, fkcs in collection:
    917     if table is not None:
--> 918         self.traverse_single(
    919             table,
    920             create_ok=True,
    921             include_foreign_key_constraints=fkcs,
    922             _is_metadata_operation=True,
    923         )
    924     else:
    925         for fkc in fkcs:

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/sql/visitors.py:664, in ExternalTraversal.traverse_single(self, obj, **kw)
    662 meth = getattr(v, "visit_%s" % obj.__visit_name__, None)
    663 if meth:
--> 664     return meth(obj, **kw)

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/sql/ddl.py:956, in SchemaGenerator.visit_table(self, table, create_ok, include_foreign_key_constraints, _is_metadata_operation)
    947 if not self.dialect.supports_alter:
    948     # e.g., don't omit any foreign key constraints
    949     include_foreign_key_constraints = None
    951 CreateTable(
    952     table,
    953     include_foreign_key_constraints=(
    954         include_foreign_key_constraints
    955     ),
--> 956 )._invoke_with(self.connection)
    958 if hasattr(table, "indexes"):
    959     for index in table.indexes:

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/sql/ddl.py:314, in ExecutableDDLElement._invoke_with(self, bind)
    312 def _invoke_with(self, bind):
    313     if self._should_execute(self.target, bind):
--> 314         return bind.execute(self)

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1422, in Connection.execute(self, statement, parameters, execution_options)
   1420     raise exc.ObjectNotExecutableError(statement) from err
   1421 else:
-> 1422     return meth(
   1423         self,
   1424         distilled_parameters,
   1425         execution_options or NO_OPTIONS,
   1426     )

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/sql/ddl.py:180, in ExecutableDDLElement._execute_on_connection(self, connection, distilled_params, execution_options)
    177 def _execute_on_connection(
    178     self, connection, distilled_params, execution_options
    179 ):
--> 180     return connection._execute_ddl(
    181         self, distilled_params, execution_options
    182     )

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1533, in Connection._execute_ddl(self, ddl, distilled_parameters, execution_options)
   1528 dialect = self.dialect
   1530 compiled = ddl.compile(
   1531     dialect=dialect, schema_translate_map=schema_translate_map
   1532 )
-> 1533 ret = self._execute_context(
   1534     dialect,
   1535     dialect.execution_ctx_cls._init_ddl,
   1536     compiled,
   1537     None,
   1538     exec_opts,
   1539     compiled,
   1540 )
   1541 if self._has_events or self.engine._has_events:
   1542     self.dispatch.after_execute(
   1543         self,
   1544         ddl,
   (...)
   1548         ret,
   1549     )

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1850, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw)
   1848     return self._exec_insertmany_context(dialect, context)
   1849 else:
-> 1850     return self._exec_single_context(
   1851         dialect, context, statement, parameters
   1852     )

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1990, in Connection._exec_single_context(self, dialect, context, statement, parameters)
   1987     result = context._setup_result_proxy()
   1989 except BaseException as e:
-> 1990     self._handle_dbapi_exception(
   1991         e, str_statement, effective_parameters, cursor, context
   1992     )
   1994 return result

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/engine/base.py:2357, in Connection._handle_dbapi_exception(self, e, statement, parameters, cursor, context, is_sub_exec)
   2355 elif should_wrap:
   2356     assert sqlalchemy_exception is not None
-> 2357     raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
   2358 else:
   2359     assert exc_info[1] is not None

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1971, in Connection._exec_single_context(self, dialect, context, statement, parameters)
   1969                 break
   1970     if not evt_handled:
-> 1971         self.dialect.do_execute(
   1972             cursor, str_statement, effective_parameters, context
   1973         )
   1975 if self._has_events or self.engine._has_events:
   1976     self.dispatch.after_cursor_execute(
   1977         self,
   1978         cursor,
   (...)
   1982         context.executemany,
   1983     )

File /mambaforge/lib/python3.11/site-packages/sqlalchemy/engine/default.py:919, in DefaultDialect.do_execute(self, cursor, statement, parameters, context)
    918 def do_execute(self, cursor, statement, parameters, context=None):
--> 919     cursor.execute(statement, parameters)

File /mambaforge/lib/python3.11/site-packages/duckdb_engine/__init__.py:163, in ConnectionWrapper.execute(self, statement, parameters, context)
    161         self.__c.execute(statement)
    162     else:
--> 163         self.__c.execute(statement, parameters)
    164 except RuntimeError as e:
    165     if e.args[0].startswith("Not implemented Error"):

ProgrammingError: (duckdb.duckdb.CatalogException) Catalog Error: Type with name SERIAL does not exist!
Did you mean "real"?
[SQL: 
CREATE TABLE studies (
	study_id SERIAL NOT NULL, 
	study_name VARCHAR(512) NOT NULL, 
	PRIMARY KEY (study_id)
)

]
(Background on this error at: https://sqlalche.me/e/20/f405)

@not522
Copy link
Member

not522 commented May 14, 2024

Sorry for my late response and thank you for your error report. In my understanding, this issue should be addressed on the duckdb side, and I think it's been discussed recently. ref:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Change that does not break compatibility, but affects the public interfaces.
Projects
None yet
Development

No branches or pull requests

2 participants