When a query that contains an ORDER BY clause is read in parallel (with partition_on and partition_num), the results are incorrect. Specifically, using DESC ordering on the id column causes the output to contain zeroed data.
What are the steps to reproduce the behavior?
Execute a query with ordering by any column, such as DESC ordering on id.
Database setup if the error only happens on specific data or data type
Table schema and example data
CREATE TABLE test_table (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
age INT,
email VARCHAR(100)
);
INSERT INTO test_table (name, age, email) VALUES
('Alice', 30, 'alice@example.com'),
('Bob', 25, 'bob@example.com'),
('Charlie', 35, 'charlie@example.com'),
('Diana', 28, 'diana@example.com');
In [52]: cx.read_sql(url, "select * from test_table order by id DESC limit 2", partition_on="id", partition_num=2)
Out[52]:
   id  name  age  email
0   0     0    0      0
1   0     0    0      0

In [53]: cx.read_sql(url, "select * from test_table order by id ASC limit 2", partition_on="id", partition_num=2)
Out[53]:
   id   name  age              email
0   1  Alice   30  alice@example.com
1   2    Bob   25    bob@example.com
Postgres logs:
SELECT min(CXTMPTAB_RANGE.id), max(CXTMPTAB_RANGE.id) FROM (SELECT * FROM test_table LIMIT 2) AS CXTMPTAB_RANGE
SELECT count(*) FROM (SELECT * FROM test_table LIMIT 2) AS CXTMPTAB_COUNT
COPY (SELECT * FROM (SELECT * FROM test_table ORDER BY id DESC LIMIT 2) AS CXTMPTAB_PART WHERE 1 <= CXTMPTAB_PART.id AND CXTMPTAB_PART.id < 2) TO STDOUT WITH BINARY
COPY (SELECT * FROM (SELECT * FROM test_table ORDER BY id DESC LIMIT 2) AS CXTMPTAB_PART WHERE 2 <= CXTMPTAB_PART.id AND CXTMPTAB_PART.id < 3) TO STDOUT WITH BINARY

SELECT min(CXTMPTAB_RANGE.id), max(CXTMPTAB_RANGE.id) FROM (SELECT * FROM test_table LIMIT 2) AS CXTMPTAB_RANGE
SELECT count(*) FROM (SELECT * FROM test_table LIMIT 2) AS CXTMPTAB_COUNT
COPY (SELECT * FROM (SELECT * FROM test_table ORDER BY id ASC LIMIT 2) AS CXTMPTAB_PART WHERE 1 <= CXTMPTAB_PART.id AND CXTMPTAB_PART.id < 2) TO STDOUT WITH BINARY
COPY (SELECT * FROM (SELECT * FROM test_table ORDER BY id ASC LIMIT 2) AS CXTMPTAB_PART WHERE 2 <= CXTMPTAB_PART.id AND CXTMPTAB_PART.id < 3) TO STDOUT WITH BINARY
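In the logged queries, the range and count queries run without the ORDER BY, while the partition queries keep it. The sketch below replays the partition queries against an in-memory SQLite table as a stand-in for PostgreSQL. The row count (2) and the partition bounds [1, 2) and [2, 3) are copied from the logs. The helper name fetch_partitions is hypothetical, and the idea that the missing rows surface as zeros in the DataFrame is an assumption, not something the logs confirm.

```python
import sqlite3

# In-memory SQLite table standing in for the PostgreSQL table in the report.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table ("
    "id INTEGER PRIMARY KEY, name TEXT, age INTEGER, email TEXT)"
)
conn.executemany(
    "INSERT INTO test_table (name, age, email) VALUES (?, ?, ?)",
    [
        ("Alice", 30, "alice@example.com"),
        ("Bob", 25, "bob@example.com"),
        ("Charlie", 35, "charlie@example.com"),
        ("Diana", 28, "diana@example.com"),
    ],
)

# Copied from the logged queries: the count query reports 2 rows and the
# partitions cover the id ranges [1, 2) and [2, 3).
EXPECTED_ROWS = 2
PARTITIONS = [(1, 2), (2, 3)]


def fetch_partitions(order):
    """Replay the logged per-partition queries for the given sort order."""
    inner = f"SELECT * FROM test_table ORDER BY id {order} LIMIT 2"
    rows = []
    for lo, hi in PARTITIONS:
        rows += conn.execute(
            f"SELECT * FROM ({inner}) AS part "
            "WHERE ? <= part.id AND part.id < ?",
            (lo, hi),
        ).fetchall()
    return rows


for order in ("ASC", "DESC"):
    rows = fetch_partitions(order)
    print(order, "expected:", EXPECTED_ROWS, "fetched:", len(rows))
```

With ASC the inner query selects ids 1 and 2, which fall inside the partition bounds. With DESC it selects ids 4 and 3, which fall outside both partitions, so no rows come back even though two were counted.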
Additional Issue: Data Inconsistency with Parallel Reading (without ORDER BY)
When a query is read in parallel, rows can stop matching the filter criteria between the initial count query and the partition queries. For example, suppose the initial count query returns 100 records with id values ranging from 1 to 101. Before the partition [90..100] is fetched, the age of the record with id 99 changes from 25 to 26. If the filter condition is WHERE age < 26, this record no longer matches the filter, and the resulting DataFrame contains zeroed rows such as (0, 0, 0, 0).
It also seems that zeros can appear when the number of matching rows grows during a parallel query (I could not verify this). For example:
filter: WHERE age > 23 AND age < 26
count at the time of the count query: 20
id range: [1...101]
the age of the record with id 56 changes from 23 to 24 before its partition is queried
actual count: 21
In general, I get zeros when reading data in parallel from a database that changes frequently. In most cases, the number of rows matching the specified filter can only increase.
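The race described above can be simulated in a few lines. This is only a sketch: it uses an in-memory SQLite table instead of PostgreSQL, the table name people, the helper fetch_partition, and the data (ids 1 to 100, all with age 25) are hypothetical, and the two partitions are fetched one after the other with the update placed between them to stand in for a concurrent writer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, age INTEGER)")
# Hypothetical data: ids 1..100, every row initially matches the filter.
conn.executemany(
    "INSERT INTO people (id, age) VALUES (?, ?)",
    [(i, 25) for i in range(1, 101)],
)

FILTER = "age < 26"

# Step 1: the count query, run once before any partition is fetched.
expected = conn.execute(
    f"SELECT count(*) FROM people WHERE {FILTER}"
).fetchone()[0]


def fetch_partition(lo, hi):
    """Fetch the rows matching the filter whose id lies in [lo, hi)."""
    return conn.execute(
        f"SELECT id, age FROM people WHERE {FILTER} AND ? <= id AND id < ?",
        (lo, hi),
    ).fetchall()


# Step 2: fetch the first partition.
first = fetch_partition(1, 51)

# A concurrent writer changes a row that belongs to the second partition.
conn.execute("UPDATE people SET age = 26 WHERE id = 99")

# Step 3: fetch the second partition; id 99 no longer matches the filter.
second = fetch_partition(51, 101)

fetched = len(first) + len(second)
print("expected:", expected, "fetched:", fetched)
```

The count taken up front is one higher than the number of rows the partitions actually return. If the reader sizes its output from that count, one row is never filled in.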
What language are you using?
Python.
What version are you using?
0.3.3/0.3.2
What database are you using?
PostgreSQL
What dataframe are you using?
Pandas