[Feature]: When importing Parquet files, you can ignore some built-in index columns. #33197
Comments
Why does this column exist? Did we enable some special configs?
The Purpose
The main purpose of [...]
How It Appears
How to Avoid or Handle It
If you want to avoid having the [...]
Resetting the Index
import pandas as pd
# Create a DataFrame with an index
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}).set_index('A')
# Reset the index so that 'A' becomes a regular column again
df.reset_index(inplace=True)
# Save to a Parquet file
df.to_parquet('example.parquet')
Ignoring the Index on Read
If the Parquet file already includes the [...]
import pandas as pd
# Read the Parquet file, then discard the restored index
# (pd.read_parquet has no index_col parameter)
df = pd.read_parquet('example.parquet').reset_index(drop=True)
By understanding the purpose and appearance of [...]
How can we determine that this is an index column and not a column mistakenly provided by the user? @zhuwenxing
Suppose we have a Parquet file with columns a, b, c, and d, and we want to import a collection with columns a, b, and c. Can we make this import successful? @bigsheeper
I don't think that's feasible. Parquet import does not support importing dynamic field data. Currently, Milvus raises a hint like "column d is not in schema, if it's a dynamic field, please reformat data by bulk_writer". If we made this import succeed, a user who enables the dynamic field might assume that column d has been imported when in reality its data is being ignored.
Agree on that.
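The strict behaviour described in the comment above can be sketched as follows. This is only an illustration and not Milvus's actual import code: `validate_import`, `file_columns`, and `schema_fields` are hypothetical names, and the error text simply mirrors the hint quoted in the thread.

```python
def find_extra_columns(file_columns, schema_fields):
    """Return the file columns that the collection schema does not define."""
    return [name for name in file_columns if name not in schema_fields]


def validate_import(file_columns, schema_fields):
    """Reject the file when it carries columns the schema does not know.

    Hypothetical helper: it fails loudly instead of silently dropping data,
    which is the concern raised in the discussion.
    """
    extra = find_extra_columns(file_columns, schema_fields)
    if extra:
        raise ValueError(
            f"column {extra[0]} is not in schema, if it's a dynamic field, "
            "please reformat data by bulk_writer"
        )


# A file with columns a, b, c, d against a collection with a, b, c is rejected.
try:
    validate_import(["a", "b", "c", "d"], {"a", "b", "c"})
except ValueError as err:
    print(err)
```

Failing on the first unknown column keeps the user from assuming that column d was imported when it was not.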
Is there an existing issue for this?
Is your feature request related to a problem? Please describe.
For Parquet files generated by pandas, in addition to user-defined columns, some index columns are also generated. When there are extra columns, the import process will treat them as data columns as well.
Describe the solution you'd like.
Ignore some built-in index columns.
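One way the request could be approached is to recognise the column names that pyarrow gives to unnamed pandas index levels (`__index_level_0__`, `__index_level_1__`, and so on) and set them aside before matching columns against the schema. The sketch below is a name-based heuristic only: `split_index_columns` is a hypothetical helper, not part of Milvus, and a user column that happens to use the same name would also be treated as an index column.

```python
import re

# pyarrow names unnamed pandas index levels __index_level_<N>__.
PANDAS_INDEX_PATTERN = re.compile(r"^__index_level_\d+__$")


def split_index_columns(column_names):
    """Split column names into (data columns, pandas-style index columns)."""
    data_columns, index_columns = [], []
    for name in column_names:
        if PANDAS_INDEX_PATTERN.match(name):
            index_columns.append(name)
        else:
            data_columns.append(name)
    return data_columns, index_columns


data_columns, index_columns = split_index_columns(["a", "b", "__index_level_0__"])
print(data_columns)   # ['a', 'b']
print(index_columns)  # ['__index_level_0__']
```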
Describe an alternate solution.
No response
Anything else? (Additional Context)
No response