My dataframe has more than 120,000 records. When I query it with sqldf and set limit 20000, the program runs without error, but if I remove limit 20000 it fails. The error message is as follows.
source code:
import pandas as pd
from pandasql import sqldf

def pysqldf(q):
    return sqldf(q, globals())

df = pd.read_csv('./dataAnalyse/ht.csv', encoding='utf_8_sig')  # supports file paths with Chinese characters
q = """
select a.[合同编号] as htbh1, b.[合同编号] as htbh2, a.[合同名称] as htmc1, b.[合同名称] as htmc2
from df a left join df b
on a.年度 = b.年度 and a.公司名称 = b.公司名称 and a.[合同编号] <> b.[合同编号]
limit 20000;
"""
df0 = pysqldf(q)
print(df0.info())
The error message is as follows:
PS D:\python\myworks> & C:/python/Python37-32/python.exe d:/python/myworks/DataAnalyse/xsd.py
Traceback (most recent call last):
File "C:\python\Python37-32\lib\site-packages\sqlalchemy\engine\base.py", line 3206, in fetchall
l = self.process_rows(self._fetchall_impl())
File "C:\python\Python37-32\lib\site-packages\sqlalchemy\engine\base.py", line 3173, in _fetchall_impl
return self.cursor.fetchall()
sqlite3.OperationalError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\python\Python37-32\lib\site-packages\pandasql\sqldf.py", line 61, in call
result = read_sql(query, conn)
File "C:\python\Python37-32\lib\site-packages\pandas\io\sql.py", line 438, in read_sql
chunksize=chunksize,
File "C:\python\Python37-32\lib\site-packages\pandas\io\sql.py", line 1231, in read_query
data = result.fetchall()
File "C:\python\Python37-32\lib\site-packages\sqlalchemy\engine\base.py", line 3212, in fetchall
self.cursor, self.context)
File "C:\python\Python37-32\lib\site-packages\sqlalchemy\engine\base.py", line 1843, in _handle_dbapi_exception
from e
sqlalchemy.exc.OperationalError: (OperationalError) None None
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:/python/myworks/DataAnalyse/xsd.py", line 20, in
df0=pysqldf(q)
File "d:/python/myworks/DataAnalyse/xsd.py", line 7, in pysqldf
return sqldf(q,globals())
File "C:\python\Python37-32\lib\site-packages\pandasql\sqldf.py", line 156, in sqldf
return PandaSQL(db_uri)(query, env)
File "C:\python\Python37-32\lib\site-packages\pandasql\sqldf.py", line 63, in call
raise PandaSQLException(ex)
pandasql.sqldf.PandaSQLException: (OperationalError) None None
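One possible workaround, assuming the failure comes from fetching the entire unlimited join result in a single fetchall() (plausible on 32-bit Python with a large self-join): skip pandasql, load the frame into a SQLite database yourself, and stream the result in chunks with pandas' chunksize. This is a sketch under that assumption, using a tiny made-up frame in place of ht.csv:

```python
import sqlite3
import pandas as pd

# Tiny stand-in for ht.csv (same columns as the query in this issue).
df = pd.DataFrame({
    '年度': [2020, 2020, 2021],
    '公司名称': ['x', 'x', 'y'],
    '合同编号': ['A1', 'A2', 'B1'],
    '合同名称': ['c1', 'c2', 'c3'],
})

con = sqlite3.connect(':memory:')  # use a file path instead to spill to disk
df.to_sql('df', con, index=False)

q = """
select a.[合同编号] as htbh1, b.[合同编号] as htbh2
from df a left join df b
on a.年度 = b.年度 and a.公司名称 = b.公司名称 and a.[合同编号] <> b.[合同编号]
"""

rows = 0
# Stream the join result instead of materializing it all at once;
# chunksize is tiny here only for the demo.
for chunk in pd.read_sql_query(q, con, chunksize=2):
    rows += len(chunk)  # replace with real per-chunk processing
print(rows)
```

Each chunk is an ordinary DataFrame, so the per-chunk processing can be whatever the original script did with df0, applied piecewise.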
@dqaudithillstone Part of the performance issue with this package is that it uses SQLite as a backend, which means you lose one of the main benefits of pandas: in-memory computation. To address this, a few months back I created a package called dataframe_sql, which parses the SQL and translates it to native pandas operations. Hope this helps!
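As a rough sketch of what such a translation looks like for the query in this issue: the merge-based rewrite below is my own hand-written version, not dataframe_sql's output, and the tiny DataFrame is made-up sample data. Note it keeps only matched pairs (inner-join behavior), so it differs from the SQL LEFT JOIN for contracts with no partner.

```python
import pandas as pd

# Made-up sample with the columns used in the issue's SQL:
# 合同编号 = contract number, 合同名称 = contract name,
# 年度 = year, 公司名称 = company name.
df = pd.DataFrame({
    '合同编号': ['A1', 'A2', 'B1'],
    '合同名称': ['c1', 'c2', 'c3'],
    '年度': [2020, 2020, 2020],
    '公司名称': ['x', 'x', 'y'],
})

# Self-join on year and company, then drop self-matches,
# mirroring "on a.年度=b.年度 and a.公司名称=b.公司名称 and a.[合同编号]<>b.[合同编号]".
pairs = df.merge(df, on=['年度', '公司名称'], suffixes=('1', '2'))
pairs = pairs[pairs['合同编号1'] != pairs['合同编号2']]

# Select and rename to match the SQL's output columns.
result = pairs[['合同编号1', '合同编号2', '合同名称1', '合同名称2']].rename(
    columns={'合同编号1': 'htbh1', '合同编号2': 'htbh2',
             '合同名称1': 'htmc1', '合同名称2': 'htmc2'})
print(result)
```

Because this stays in pandas, there is no SQLite round-trip and no single fetchall() of the full result set, though a self-join this wide can still be memory-hungry on large frames.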