Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Error 8115 Arithmetic overflow error converting nvarchar to data type numeric" when seeding a numeric column with some empty rows and specifying column_types #425

Open
jabortell-nordic opened this issue Jul 18, 2023 · 5 comments

Comments

@jabortell-nordic
Copy link
Collaborator

I'm running the following:

  • dbt-core==1.4.6
  • dbt-core==1.4.3
  • SQL Server 2022

I'm seeing some strange behavior around seeding numeric columns with some null values, where the column type is specified in column_types. Here's a minimum reproducible example:

In seeds/foo.csv:

x
123

456

In seeds/properties.yml

version: 2

seeds:
  - name: foo
    config:
      column_types:
        x: numeric(18, 0)

Run dbt seed

Here's what I've found so far:

  1. As-is, this results in error 8115 as noted in the title.
  2. If there are no empty rows, there is no error.
  3. If all rows are empty, there is no error.
  4. If column_types is not specified, then there is no error, and the inferred data type is int.
  5. When I run select cast('' as numeric(18, 0)) in my SQL Server instance, the error returned is 8114, "Error converting data type varchar to numeric."

I've been digging through macros and agate methods, but I'm at a loss. I'm happy to keep investigating, but I'm in need of pointers in the right direction.

@alison985
Copy link

What happens for select cast(null as numeric(18, 0))? What happens when there are two columns in the CSV?

@jabortell-nordic
Copy link
Collaborator Author

select cast(null as numeric(18, 0)) gives null. With two columns, the error is the same (8115).

@alison985
Copy link

Okay, those were worth a try to rule some things out. Since "" also doesn't work in SQL Server, albeit with a different error code, I don't think this is an issue with the dbt-sqlserver python package. I have 0 authority or leadership related to this repo, but personally I would close this issue.

That said, here's the next steps I'd propose for your troubleshooting:

  • I'd do all tests with at least two columns where the second column always has a value to deal with red herring, annoying cases around csv formatting.
  • Try removing the (18, 0), I've had some weird experiences with that depending on sql dialect and where you're using it in dbt.
  • Try a different datatype name that does the same thing and sql server also recognizes. (float?)
  • Try changing 123 and 456 in your example to 123.0 and 456.0
  • Look at what may have changed with data types in SQL Server 2022. Look at dbt docs pages for this library to see if this repo has been tested against that version. It's really rare for SQL Server people to be on the latest version.
  • Are you supporting multiple sql dialects or declaring the dbt yml which database/server/dialect you're using?
  • Look at the dbt logs of the SQL statement it's trying to run when it errors out. How could you adjust to make that syntax work? Would over-writing a macro do it? Would changing the seed file help? Do you need to specify more config for the seed?
  • How does the pandas python package handle this?
  • How does the dbt-core python package handle this in other languages? I'd refer to the postgres implementation instead of bigquery, redshift, or snowflake implementations.
  • It seems like the behavior you want is for "" to be treated as null, so would something hacky like putting "NULL" instead of "" work?
  • What dbt setting do you have for quoting, etc.?
  • If you're going to treat 0 and null the same anyways, just put in 0's in the seed.
  • If none of that turns up anything then hack it by putting a dummy value in the seed for null and then run some UPDATE or incremental materialization after seeding.

Once you figure it out, if there's something that could change in this package then it's totally worth opening a new issue with a feature request(or bug fix, I could be wrong). I'm curious for myself now, so note you should be able to leave a comment on a closed issue to keep all the knowledge in one place(the closing would just help with triage).

@jabortell-nordic
Copy link
Collaborator Author

Thanks for the list. Those are good ideas to try.

  • Removing (18, 0): same error.
  • Different datatypes: float and real, which SQL Server calls "approximate numerics" are successful. The other exact numeric, decimal, results in the same error.
  • "Try changing 123 and 456 in your example to 123.0 and 456.0": same error.
  • "It seems like the behavior you want is for "" to be treated as null, so would something hacky like putting "NULL" instead of "" work?" : this is the behavior of dbt-core's from_csv(). The agate.TypeTester is configured to treat "" and "null" as null, and does not appear to be dispatched in dbt-sqlserver.

I'll explore the rest. I'm not convinced that this should be closed, because I'm not convinced this is user error. The issue is not yet reproducible in SQL Server, and the behavior changes not from the seed values but from the dbt configuration of SQL Server data types. Seems like something is happening between reading the CSV and the application of the column_types configuration.

@alison985
Copy link

Ah ha! I was wrong. Definitely needs this issue open to address the agate.TypeTester implementation. That's great research that should make this issue faster to address when someone picks it up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants