AnalysisException/DATATYPE_MISMATCH error when generating summary dataframe from sources that have a column named "summary" #264

Closed
artruk opened this issue Apr 18, 2024 · 2 comments · Fixed by #274
Labels: bug, Fixed

Comments

artruk commented Apr 18, 2024

Expected Behavior

Current Behavior

Generating a summary DataFrame with DataAnalyzer fails with an AnalysisException (DATATYPE_MISMATCH) whenever the source DataFrame being analyzed has a column named "summary".

Steps to Reproduce (for bugs)

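import dbldatagen as dg

# `spark` is the active SparkSession (predefined in Databricks notebooks)
df = spark.range(10).withColumnRenamed("id", "summary")

# Fails with AnalysisException (DATATYPE_MISMATCH) because of the "summary" column
summary_df = dg.DataAnalyzer(sparkSession=spark, df=df).summarizeToDF()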

Context

Your Environment

  • dbldatagen version used:
  • Databricks Runtime version:
  • Cloud environment used:

ronanstokes-db self-assigned this May 21, 2024
ronanstokes-db added the bug label May 21, 2024

ronanstokes-db (Contributor) commented

We'll add a fix for this in the next hotfix.

In the meantime, you can rename the "summary" column to something else. Avoid names with leading underscores, as these may conflict with internal column names.
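
The rename workaround could be sketched roughly as follows. This is only an illustration, not part of dbldatagen: the helper name `rename_reserved_columns`, the `reserved` tuple, and the `_col` suffix are all hypothetical choices. It relies only on the `columns` attribute and the `withColumnRenamed` method of a PySpark DataFrame, and it appends a suffix rather than a prefix so that the new name does not gain a leading underscore.

```python
def rename_reserved_columns(df, reserved=("summary",), suffix="_col"):
    """Rename columns whose names collide with `reserved` (hypothetical helper).

    Works on any object exposing `columns` and `withColumnRenamed`,
    such as a PySpark DataFrame. Returns the renamed DataFrame and a
    dict mapping each old column name to its new name.
    """
    reserved_lower = {name.lower() for name in reserved}
    # Assumption: names are compared in lowercase because Spark resolves
    # column names case-insensitively by default.
    taken = {name.lower() for name in df.columns}
    mapping = {}
    for name in list(df.columns):
        if name.lower() not in reserved_lower:
            continue
        candidate = name + suffix
        # Keep appending the suffix until the new name is unused.
        while candidate.lower() in taken:
            candidate += suffix
        df = df.withColumnRenamed(name, candidate)
        taken.discard(name.lower())
        taken.add(candidate.lower())
        mapping[name] = candidate
    return df, mapping
```

With the DataFrame from the reproduction steps, one might call `df, renamed = rename_reserved_columns(df)` before `dg.DataAnalyzer(sparkSession=spark, df=df).summarizeToDF()`, and use `renamed` to map the column names back afterwards if needed.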

ronanstokes-db linked a pull request May 21, 2024 that will close this issue

ronanstokes-db (Contributor) commented

Fixed in hotfix as of 05/22/24
