BUG: Omnisci string length error, Cast from dictionary-encoded string to none-encoded would be slow #41

Open
datapythonista opened this issue Jul 13, 2021 · 0 comments
Labels
bug Something isn't working

Comments

@datapythonista
Contributor

This issue has been moved from the Ibis repo: ibis-project/ibis#2338

OmniSci is failing in test_string.py::test_string[length] with the following error: omnisci.thrift.ttypes.TOmniSciException: TOmniSciException(error_msg='Exception: Cast from dictionary-encoded string to none-encoded would be slow')

The test is marked as xfail for now so that the CI passes in ibis-project/ibis#2335, but we should fix the underlying bug.

Full error:

________________________ test_string[OmniSciDB-length] _________________________

self =
operation = 'SELECT CHAR_LENGTH("string_col") AS tmp\nFROM functional_alltypes'
parameters = None

    def execute(self, operation, parameters=None):
        """Execute a SQL statement.

        Parameters
        ----------
        operation: str
            A SQL query
        parameters: dict
            Parameters to substitute into ``operation``.

        Returns
        -------
        self : Cursor

        Examples
        --------
        >>> c = conn.cursor()
        >>> c.execute("select symbol, qty from stocks")
        >>> list(c)
        [('RHAT', 100.0), ('IBM', 1000.0), ('MSFT', 1000.0), ('IBM', 500.0)]

        Passing in ``parameters``:

        >>> c.execute("select symbol qty from stocks where qty <= :max_qty",
        ... parameters={"max_qty": 500})
        [('RHAT', 100.0), ('IBM', 500.0)]
        """

        # https://github.com/heavyai/pymapd/issues/263
        operation = operation.strip()

        if parameters is not None:
            operation = str(_bind_parameters(operation, parameters))
        self.rowcount = -1
        try:
            result = self.connection._client.sql_execute(
                self.connection._session,
                operation,
                column_format=True,
                nonce=None,
                first_n=-1,
>               at_most_n=-1,
            )

/usr/share/miniconda/lib/python3.7/site-packages/pymapd/cursor.py:118:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self =
session = 'w5kKtMXo8LL1wKYX8PJXuDtOzgWlRnFf'
query = 'SELECT CHAR_LENGTH("string_col") AS tmp\nFROM functional_alltypes'
column_format = True, nonce = None, first_n = -1, at_most_n = -1

    def sql_execute(self, session, query, column_format, nonce, first_n, at_most_n):
        """
        Parameters:
         - session
         - query
         - column_format
         - nonce
         - first_n
         - at_most_n

        """
        self.send_sql_execute(session, query, column_format, nonce, first_n, at_most_n)
>       return self.recv_sql_execute()

/usr/share/miniconda/lib/python3.7/site-packages/omnisci/thrift/OmniSci.py:1745:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self =

    def recv_sql_execute(self):
        iprot = self._iprot
        (fname, mtype, rseqid) = iprot.readMessageBegin()
        if mtype == TMessageType.EXCEPTION:
            x = TApplicationException()
            x.read(iprot)
            iprot.readMessageEnd()
            raise x
        result = sql_execute_result()
        result.read(iprot)
        iprot.readMessageEnd()
        if result.success is not None:
            return result.success
        if result.e is not None:
>           raise result.e
E           omnisci.thrift.ttypes.TOmniSciException: TOmniSciException(error_msg='Exception: Cast from dictionary-encoded string to none-encoded would be slow')

/usr/share/miniconda/lib/python3.7/site-packages/omnisci/thrift/OmniSci.py:1774: TOmniSciException

The above exception was the direct cause of the following exception:

self =
query = 'SELECT CHAR_LENGTH("string_col") AS tmp\nFROM functional_alltypes'
results = True, ipc = None, gpu_device = None, kwargs = {}
cursor = , params = {}
execute = >

    def _execute(
        self,
        query: str,
        results: bool = True,
        ipc: Optional[bool] = None,
        gpu_device: Optional[int] = None,
        **kwargs,
    ):
        """
        Compile and execute Ibis expression.

        Return result in-memory in the appropriate object type.

        Parameters
        ----------
        query : string
          DML or DDL statement
        results : boolean, default False
          Pass True if the query as a result set
        ipc : bool, optional, default None
          Enable Inter Process Communication (IPC) execution type.
          `ipc` default value (None) when `gpu_device` is None is interpreted
           as False, otherwise it is interpreted as True.
        gpu_device : int, optional, default None
          GPU device ID.

        Returns
        -------
        output : execution type dependent
          If IPC is set as True and no GPU device is set:
            ``pandas.DataFrame``
          If IPC is set as True and GPU device is set: ``cudf.DataFrame``
          If IPC is set as False and no GPU device is set:
            pandas.DataFrame or
            geopandas.GeoDataFrame (if it uses geospatial data)

        Raises
        ------
        Exception
            if execution method fails.
        """
        # time context is not implemented for omniscidb yet
        kwargs.pop('timecontext', None)
        # raise an Exception if kwargs is not empty:
        if kwargs:
            raise com.IbisInputError(
                '"OmniSciDB.execute" method just support the follow parameter:'
                ' "query", "results", "ipc" and "gpu_device". The follow extra'
                ' parameters was given: "{}".'.format(', '.join(kwargs.keys()))
            )

        if isinstance(query, (DDL, DML)):
            query = query.compile()

        if ipc is None and gpu_device is None:
            ipc = self.ipc
            gpu_device = self.gpu_device

        self._check_execution_type(ipc, gpu_device)

        cursor = (
            OmniSciDBGeoCursor
            if FULL_GEO_SUPPORTED
            else OmniSciDBDefaultCursor
        )

        params = {}

        if gpu_device is None and not ipc:
            execute = self.con.cursor().execute
        elif gpu_device is None and ipc:
            execute = self.con.select_ipc
        else:
            params['device_id'] = gpu_device
            execute = self.con.select_ipc_gpu
            cursor = OmniSciDBGPUCursor

        try:
>           result = cursor(execute(query, **params))

ibis/omniscidb/client.py:844:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self =
operation = 'SELECT CHAR_LENGTH("string_col") AS tmp\nFROM functional_alltypes'
parameters = None

    def execute(self, operation, parameters=None):
        """Execute a SQL statement.

        Parameters
        ----------
        operation: str
            A SQL query
        parameters: dict
            Parameters to substitute into ``operation``.

        Returns
        -------
        self : Cursor

        Examples
        --------
        >>> c = conn.cursor()
        >>> c.execute("select symbol, qty from stocks")
        >>> list(c)
        [('RHAT', 100.0), ('IBM', 1000.0), ('MSFT', 1000.0), ('IBM', 500.0)]

        Passing in ``parameters``:

        >>> c.execute("select symbol qty from stocks where qty <= :max_qty",
        ... parameters={"max_qty": 500})
        [('RHAT', 100.0), ('IBM', 500.0)]
        """

        # https://github.com/heavyai/pymapd/issues/263
        operation = operation.strip()

        if parameters is not None:
            operation = str(_bind_parameters(operation, parameters))
        self.rowcount = -1
        try:
            result = self.connection._client.sql_execute(
                self.connection._session,
                operation,
                column_format=True,
                nonce=None,
                first_n=-1,
                at_most_n=-1,
            )
        except T.TOmniSciException as e:
>           raise _translate_exception(e) from e
E           pymapd.exceptions.Error: Exception: Cast from dictionary-encoded string to none-encoded would be slow

/usr/share/miniconda/lib/python3.7/site-packages/pymapd/cursor.py:121: Error

During handling of the above exception, another exception occurred:

backend =
alltypes = DatabaseTable[table]
  name: functional_alltypes
  schema:
    index : int64
    Unnamed__0 : int64
    id : int32
    ... date_string_col : string
    string_col : string
    timestamp_col : timestamp
    year_ : int32
    month_ : int32
df = index Unnamed__0 id ... timestamp_col year_ month_
0 0 0 6690 ... 2010-11-01 00...05:08:13 2010 1
7299 7299 7299 3959 ... 2010-01-31 05:09:13 2010 1

[7300 rows x 15 columns]
result_func = at 0x7fb38c72e170>
expected_func = at 0x7fb38c72e200>

    @pytest.mark.parametrize(
        ('result_func', 'expected_func'),
        [
            param(
                lambda t: t.string_col.contains('6'),
                lambda t: t.string_col.str.contains('6'),
                id='contains',
            ),
            param(
                lambda t: t.string_col.like('6%'),
                lambda t: t.string_col.str.contains('6.*'),
                id='like',
            ),
            param(
                lambda t: t.string_col.like('6^%'),
                lambda t: t.string_col.str.contains('6%'),
                id='complex_like_escape',
            ),
            param(
                lambda t: t.string_col.like('6^%%'),
                lambda t: t.string_col.str.contains('6%.*'),
                id='complex_like_escape_match',
            ),
            param(
                lambda t: t.string_col.ilike('6%'),
                lambda t: t.string_col.str.contains('6.*'),
                id='ilike',
            ),
            param(
                lambda t: t.string_col.re_search(r'[[:digit:]]+'),
                lambda t: t.string_col.str.contains(r'\d+'),
                id='re_search',
                marks=pytest.mark.xfail_backends((Spark, PySpark)),
            ),
            param(
                lambda t: t.string_col.re_extract(r'([[:digit:]]+)', 0),
                lambda t: t.string_col.str.extract(r'(\d+)', expand=False),
                id='re_extract',
                marks=pytest.mark.xfail_backends((Spark, PySpark)),
            ),
            param(
                lambda t: t.string_col.re_replace(r'[[:digit:]]+', 'a'),
                lambda t: t.string_col.str.replace(r'\d+', 'a'),
                id='re_replace',
                marks=pytest.mark.xfail_backends((Spark, PySpark)),
            ),
            param(
                lambda t: t.string_col.re_search(r'\\d+'),
                lambda t: t.string_col.str.contains(r'\d+'),
                id='re_search_spark',
                marks=pytest.mark.xpass_backends((Clickhouse, Impala, Spark)),
            ),
            param(
                lambda t: t.string_col.re_extract(r'(\\d+)', 0),
                lambda t: t.string_col.str.extract(r'(\d+)', expand=False),
                id='re_extract_spark',
                marks=pytest.mark.xpass_backends((Clickhouse, Impala, Spark)),
            ),
            param(
                lambda t: t.string_col.re_replace(r'\\d+', 'a'),
                lambda t: t.string_col.str.replace(r'\d+', 'a'),
                id='re_replace_spark',
                marks=pytest.mark.xpass_backends((Clickhouse, Impala, Spark)),
            ),
            param(
                lambda t: t.string_col.re_search(r'\d+'),
                lambda t: t.string_col.str.contains(r'\d+'),
                id='re_search_spark',
                marks=pytest.mark.xfail_backends((Clickhouse, Impala, Spark)),
            ),
            param(
                lambda t: t.string_col.re_extract(r'(\d+)', 0),
                lambda t: t.string_col.str.extract(r'(\d+)', expand=False),
                id='re_extract_spark',
                marks=pytest.mark.xfail_backends((Clickhouse, Impala, Spark)),
            ),
            param(
                lambda t: t.string_col.re_replace(r'\d+', 'a'),
                lambda t: t.string_col.str.replace(r'\d+', 'a'),
                id='re_replace_spark',
                marks=pytest.mark.xfail_backends((Clickhouse, Impala, Spark)),
            ),
            param(
                lambda t: t.string_col.repeat(2),
                lambda t: t.string_col * 2,
                id='repeat',
            ),
            param(
                lambda t: t.string_col.translate('0', 'a'),
                lambda t: t.string_col.str.translate(str.maketrans('0', 'a')),
                id='translate',
            ),
            param(
                lambda t: t.string_col.find('a'),
                lambda t: t.string_col.str.find('a'),
                id='find',
            ),
            param(
                lambda t: t.string_col.lpad(10, 'a'),
                lambda t: t.string_col.str.pad(10, fillchar='a', side='left'),
                id='lpad',
            ),
            param(
                lambda t: t.string_col.rpad(10, 'a'),
                lambda t: t.string_col.str.pad(10, fillchar='a', side='right'),
                id='rpad',
            ),
            param(
                lambda t: t.string_col.find_in_set(['1']),
                lambda t: t.string_col.str.find('1'),
                id='find_in_set',
            ),
            param(
                lambda t: t.string_col.find_in_set(['a']),
                lambda t: t.string_col.str.find('a'),
                id='find_in_set_all_missing',
            ),
            param(
                lambda t: t.string_col.lower(),
                lambda t: t.string_col.str.lower(),
                id='lower',
            ),
            param(
                lambda t: t.string_col.upper(),
                lambda t: t.string_col.str.upper(),
                id='upper',
            ),
            param(
                lambda t: t.string_col.reverse(),
                lambda t: t.string_col.str[::-1],
                id='reverse',
            ),
            param(
                lambda t: t.string_col.ascii_str(),
                lambda t: t.string_col.map(ord).astype('int32'),
                id='ascii_str',
            ),
            param(
                lambda t: t.string_col.length(),
                lambda t: t.string_col.str.len().astype('int32'),
                id='length',
            ),
            param(
                lambda t: t.string_col.strip(),
                lambda t: t.string_col.str.strip(),
                id='strip',
            ),
            param(
                lambda t: t.string_col.lstrip(),
                lambda t: t.string_col.str.lstrip(),
                id='lstrip',
            ),
            param(
                lambda t: t.string_col.rstrip(),
                lambda t: t.string_col.str.rstrip(),
                id='rstrip',
            ),
            param(
                lambda t: t.string_col.capitalize(),
                lambda t: t.string_col.str.capitalize(),
                id='capitalize',
            ),
            param(
                lambda t: t.date_string_col.substr(2, 3),
                lambda t: t.date_string_col.str[2:5],
                id='substr',
            ),
            param(
                lambda t: t.date_string_col.left(2),
                lambda t: t.date_string_col.str[:2],
                id='left',
            ),
            param(
                lambda t: t.date_string_col.right(2),
                lambda t: t.date_string_col.str[-2:],
                id='right',
            ),
            param(
                lambda t: t.date_string_col[1:3],
                lambda t: t.date_string_col.str[1:3],
                id='slice',
            ),
            param(
                lambda t: t.date_string_col[t.date_string_col.length() - 1 :],
                lambda t: t.date_string_col.str[-1:],
                id='expr_slice_begin',
            ),
            param(
                lambda t: t.date_string_col[: t.date_string_col.length()],
                lambda t: t.date_string_col,
                id='expr_slice_end',
            ),
            param(
                lambda t: t.date_string_col[:],
                lambda t: t.date_string_col,
                id='expr_empty_slice',
            ),
            param(
                lambda t: t.date_string_col[
                    t.date_string_col.length() - 2 : t.date_string_col.length() - 1
                ],
                lambda t: t.date_string_col.str[-2:-1],
                id='expr_slice_begin_end',
            ),
            param(
                lambda t: t.date_string_col.split('/'),
                lambda t: t.date_string_col.str.split('/'),
                id='split',
            ),
            param(
                lambda t: ibis.literal('-').join(['a', t.string_col, 'c']),
                lambda t: 'a-' + t.string_col + '-c',
                id='join',
            ),
        ],
    )
    @pytest.mark.xfail_unsupported
    def test_string(backend, alltypes, df, result_func, expected_func):
        expr = result_func(alltypes)
>       result = expr.execute()

ibis/tests/all/test_string.py:237:
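
The traceback shows a chain of exceptions: the Thrift-level TOmniSciException is re-raised by pymapd with `raise ... from e`. If it helps while triaging, here is a rough sketch of how the root-cause message could be detected by walking that chain, for example to confirm that a failure is this bug and not something else. The exception classes and helper names below are hypothetical stand-ins so that it runs without a database; they are not part of pymapd or Ibis.

```python
# Hypothetical stand-ins for omnisci.thrift.ttypes.TOmniSciException and
# pymapd.exceptions.Error, defined only so the sketch runs on its own.
class FakeThriftError(Exception):
    pass


class FakeDriverError(Exception):
    pass


# Message text copied from the error reported in this issue.
KNOWN_MESSAGE = "Cast from dictionary-encoded string to none-encoded would be slow"


def iter_chain(exc):
    """Yield exc, then each exception linked through __cause__ or __context__."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        yield exc
        exc = exc.__cause__ or exc.__context__


def is_known_cast_error(exc):
    """Return True if any exception in the chain carries the known message."""
    return any(KNOWN_MESSAGE in str(e) for e in iter_chain(exc))


def run_query():
    """Mimic the chaining in the traceback: low-level error re-raised with `from`."""
    try:
        raise FakeThriftError("Exception: " + KNOWN_MESSAGE)
    except FakeThriftError as e:
        raise FakeDriverError(str(e)) from e


try:
    run_query()
except FakeDriverError as err:
    caught = err  # keep a reference; `err` is unbound after the except block

print(is_known_cast_error(caught))
print(is_known_cast_error(ValueError("unrelated")))
```

Matching on the message text is brittle, so this is only meant as a triage aid, not as a replacement for fixing the length operation in the backend.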