Feature request: add "no inference mode" to add from URL #120

aborruso · 2019-05-05T08:57:47Z

Hi,
when I add a file from a URL, workbenchdata infers the type of each field. It's a great feature, but it sometimes gives wrong results.

For example, here (https://app.workbenchdata.com/workflows/17120) I import an XLS file and it maps the field "CODISTAT" as a number. That is a problem, because in the source XLS file it's a text field: in workbenchdata the value "001801" becomes "1801".

It would be great to have a "no inference" option in the module that imports all fields as text.

Thank you

aborruso · 2019-05-05T09:17:17Z

Moreover, if I apply the Python function CODISTAT.rjust(6,"0") (it's in tab2), I get the wrong result: once again "1801" and not "001801", because the output field type is a number.
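
To make the reported behaviour concrete, here is a minimal sketch in plain Python. It is not Workbench's actual import code: the sample data, the `NAME` column and both helper functions are hypothetical, and a CSV string stands in for the XLS file. Only the field name "CODISTAT" and the value "001801" come from the report above.

```python
import csv
import io

# Hypothetical sample data; only CODISTAT and "001801" come from the report.
SAMPLE = "CODISTAT,NAME\n001801,Example A\n001802,Example B\n"


def read_with_inference(text):
    """Naive type inference: every all-digit value is converted to int."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        for key, value in row.items():
            if value.isdigit():
                row[key] = int(value)  # leading zeros are lost here
    return rows


def read_without_inference(text):
    """A "no inference" read: every value stays exactly as text."""
    return list(csv.DictReader(io.StringIO(text)))


inferred = read_with_inference(SAMPLE)
as_text = read_without_inference(SAMPLE)

print(inferred[0]["CODISTAT"])  # 1801
print(as_text[0]["CODISTAT"])   # 001801
```

Once the value has been stored as a number, the original width is gone, which is why padding only helps if the result is kept as text.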

pierreconti · 2019-05-06T03:01:45Z

Hi Andrea,

Thanks for reporting the issue; it's on our roadmap. We'll make sure to let you know when it's fixed. You can also report bugs through Intercom if that's easier.

Thank you!

Pierre Conti
Co-founder & CEO, Workbench
@pierreconti

aborruso · 2019-05-06T06:47:50Z

Hi @pierreconti, I think it is more useful to write about feature requests and bugs here, since this avoids duplication.
I look forward to the fix, because in some cases the current behavior is inconvenient.

Thank you

adamhooper · 2019-05-16T21:04:17Z

@aborruso I've seen something similar before. My workaround, using the Python module:

def process(table):
    # Convert the column to text, then left-pad with zeros to a fixed width of 5
    table['Zip'] = table['Zip'].astype(str).str.zfill(5)
    return table
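
Adapted to the column from the original report, the same workaround might look like the sketch below. The column name `CODISTAT` and the width of 6 are taken from the report; the sample DataFrame is made up for illustration, and the sketch assumes the column was imported as whole numbers (a float column would stringify as "1801.0" and need extra handling).

```python
import pandas as pd


def process(table):
    # Convert the numeric column to text, then left-pad with zeros to width 6.
    table['CODISTAT'] = table['CODISTAT'].astype(str).str.zfill(6)
    return table


# Hypothetical input: what the column looks like after it was inferred as a number.
table = pd.DataFrame({'CODISTAT': [1801, 1802]})
result = process(table)

print(result['CODISTAT'].tolist())  # ['001801', '001802']
```

This only restores the zeros when every code is known to have the same width; it cannot recover values whose original lengths differed.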

aborruso · 2019-05-16T21:27:39Z

@adamhooper I will use it while waiting for an official "solution".

Thank you

adamhooper · 2019-12-13T16:47:43Z

I deployed new fetch logic that stores raw files. And our new CSV parser backend has this option ... but we don't expose it to users.

Now, the missing pieces are:

- New XLS/XLSX parsers in https://github.com/CJWorkbench/arrow-tools. I envision a "strict-types mode" in which the only way we interpret a column as Number/Date is if all values are of that type. (pd.read_excel() is a lost cause.)
- As mentioned in "Save raw source data in fetch" (#98), we need a UI so users can set this new option without forcing a fetch.
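
One way to picture the "strict-types mode" rule is the sketch below. It is only an illustration, not the arrow-tools implementation: the function name `strict_column_type`, the representation of cells as Python objects, and the choice to ignore empty cells are all assumptions.

```python
import datetime


def strict_column_type(cells):
    """Return "number" or "date" only if every non-empty cell has that type."""
    # Assumption: empty cells (None) do not influence the decision.
    non_empty = [c for c in cells if c is not None]
    if not non_empty:
        return "text"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(c, (int, float)) and not isinstance(c, bool)
           for c in non_empty):
        return "number"
    # datetime.datetime is a subclass of datetime.date, so one check covers both.
    if all(isinstance(c, datetime.date) for c in non_empty):
        return "date"
    # Any mix of types, or any text cell, makes the whole column text.
    return "text"


print(strict_column_type([1801, 1802.5]))                    # number
print(strict_column_type(["001801", 1802]))                  # text
print(strict_column_type([datetime.date(2019, 5, 5), None])) # date
```

Under this rule a column such as CODISTAT, whose cells are stored as text in the source spreadsheet, would stay text and keep its leading zeros.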

aborruso · 2019-12-13T17:28:14Z

@adamhooper thank you, it's a good thing!

adamhooper mentioned this issue Dec 13, 2019

Save raw source data in fetch #98

Closed
