Automated way to set good column widths for new datasets #27

Open
levyj opened this issue May 8, 2015 · 5 comments

Comments

@levyj
Member

levyj commented May 8, 2015

New datasets often have column widths that are too narrow or too wide. The widths can be read and changed through the views API (e.g., https://data.cityofchicago.org/api/views/ydr8-5enu), which raises the possibility of creating a tool to set widths for a new dataset.
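To make the "get" half concrete, here is a minimal sketch of reading the current widths from the views endpoint above. It assumes the returned JSON has a top-level `columns` list whose entries carry `name` and `width` keys; the function names `fetch_view` and `column_widths` are made up for illustration.

```python
import json
import urllib.request

# Example endpoint from this issue; any view ID should work the same way.
VIEW_URL = "https://data.cityofchicago.org/api/views/ydr8-5enu"


def fetch_view(url=VIEW_URL):
    """Download the view metadata as a dict (performs a network request)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def column_widths(view):
    """Map each column name to its current width.

    Assumes each entry in view["columns"] has a "name" key and,
    optionally, a "width" key; columns without a width map to None.
    """
    return {col["name"]: col.get("width") for col in view.get("columns", [])}
```

Calling `column_widths(fetch_view())` would then give a name-to-width mapping to compare against whatever widths a tool proposes.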

Setting the widths is probably not that hard. The more challenging part of this idea is figuring out what the widths should be. That depends on the column name, the longest value, and some subjective preferences: whether either needs to be fully displayed, or whether showing more columns, with some truncated, is a better use of screen width. The solution need not be perfect, though. If it does a reasonably good job and cuts down on the manual adjustments needed, that is still of value.

Our best (really crude) formula so far is:

Width = (4.7 * Number of Characters in Column Name) + 127
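As a starting point for anyone picking this up, the formula above could be written as a small function. This is only a sketch: the function name is hypothetical, and rounding to a whole number is an assumption on the basis that widths are stored as integers.

```python
def width_from_name(column_name):
    """Crude width estimate from the issue: 4.7 * len(name) + 127.

    Rounding to the nearest integer is an assumption, not part of
    the original formula.
    """
    return round(4.7 * len(column_name) + 127)


# Example: estimate widths for a few illustrative column names.
for name in ["ID", "Date", "Case Number"]:
    print(name, width_from_name(name))
```

Note that this version looks only at the column name; it does not yet account for the longest value in the column.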

We are happy to share more information with anyone who cares to take on this project.

@shua123
Contributor

shua123 commented May 8, 2015

This is a good idea. Thinking out loud a bit about how to do this: finding existing widths is easy through the views API. I am not sure which publishing API or method would let you set the width.

Without digging into the code or functionality, I wonder if a modified version of DataSync's Port Job might be the way to go. I am not sure whether it copies width when it copies schema and metadata.

@levyj
Member Author

levyj commented May 8, 2015

You can modify the JSON and PUT it back to the same endpoint. It's not 100% encouraged but Socrata knows we do it. That is actually how https://data.cityofchicago.org/Community-Economic-Development/Business-Licenses-Current-Active/uupf-x98q and https://data.cityofchicago.org/Service-Requests/Potholes-Patched-Last-Seven-Days/xpdx-8ivx get updated every day.

Note that for changes to datasets (as opposed to the two links above which are views of datasets), you have to PUT to a working copy.
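A rough sketch of that modify-and-PUT round trip follows. Several things here are assumptions rather than confirmed details: that each column entry has `name` and `width` keys, that the endpoint accepts the whole modified JSON back, and that HTTP Basic authentication is what the server expects. The helper names are hypothetical.

```python
import base64
import copy
import json
import urllib.request


def with_new_widths(view, widths):
    """Return a copy of the view JSON with widths replaced by column name.

    `widths` maps column name -> new width; other columns are untouched.
    Assumes each column dict has "name" and "width" keys.
    """
    updated = copy.deepcopy(view)
    for col in updated.get("columns", []):
        if col.get("name") in widths:
            col["width"] = widths[col["name"]]
    return updated


def build_put_request(url, view, username, password):
    """Build (but do not send) a PUT request carrying the modified JSON.

    Basic authentication is an assumption; use whatever credentials
    scheme your account actually requires.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(view).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="PUT",
    )
```

Sending the request would be `urllib.request.urlopen(request)`. For a dataset rather than a view, the URL would need to point at the working copy, as noted above.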

Automation aside, editing the JSON manually can be a much faster way of mass-editing things like column descriptions than using the Web interface.

As for DataSync or some other Socrata tool, that would be great if Socrata wanted to do it. I think Port Jobs now copy the width, but I am not sure.

@shua123
Contributor

shua123 commented May 8, 2015

Interesting. Do you do that through a Kettle job?

@levyj
Member Author

levyj commented May 8, 2015

Yes.

@tomschenkjr
Contributor

Maybe @chrismetcalf has thoughts on this idea.
