Automated way to set good column widths for new datasets #27

levyj · 2015-05-08T20:12:23Z

New datasets often have column widths that are too narrow or too wide. It is possible to get and change the widths, using the views API (e.g., https://data.cityofchicago.org/api/views/ydr8-5enu). This raises the possibility of creating a tool to set widths for a new dataset.

Setting the widths probably is not that hard. The more challenging part of this idea would be figuring out what the widths should be. This is a function both of the column name and the longest value, as well as some subjective preferences about whether either needs to be fully displayed or if more columns, but truncated, are a better use of screen width. However, any solution need not be perfect. If it does a pretty good job so that it cuts the manual adjustments necessary, that is still of value.

Our best (really crude) formula so far is:

Width = (4.7 * Number of Characters in Column Name) + 127

We are happy to share more information with anyone who cares to take on this project.

shua123 · 2015-05-08T21:49:14Z

This is a good idea. Thinking a bit out loud about how to do this... Finding existing widths is easy through the views api. I am not sure what publishing API/method would allow you to set the width.

Without digging into the code or functionality, I wonder if a modified version of DataSync's Port Job might be a path to go. I am not sure if it copies width when it copies schema and metadata.

levyj · 2015-05-08T21:58:10Z

You can modify the JSON and PUT it back to the same endpoint. It's not 100% encouraged but Socrata knows we do it. That is actually how https://data.cityofchicago.org/Community-Economic-Development/Business-Licenses-Current-Active/uupf-x98q and https://data.cityofchicago.org/Service-Requests/Potholes-Patched-Last-Seven-Days/xpdx-8ivx get updated every day.

Note that for changes to datasets (as opposed to the two links above which are views of datasets), you have to PUT to a working copy.

Automation aside, editing the JSON manually can be a much faster way of mass-editing things like column descriptions than using the Web interface.

As far as DataSync or some other Socrata tool, that would be great if Socrata wanted to do it. I think Port Jobs do now copy the width but I am not sure.

shua123 · 2015-05-08T22:01:04Z

Interesting. Do you do that through a Kettle job?

levyj · 2015-05-08T22:02:11Z

Yes.

tomschenkjr · 2015-05-08T22:05:40Z

Maybe @chrismetcalf has thoughts on this idea

levyj added enhancement help wanted automation labels May 8, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Automated way to set good column widths for new datasets #27

Automated way to set good column widths for new datasets #27

levyj commented May 8, 2015

shua123 commented May 8, 2015

levyj commented May 8, 2015

shua123 commented May 8, 2015

levyj commented May 8, 2015

tomschenkjr commented May 8, 2015

Automated way to set good column widths for new datasets #27

Automated way to set good column widths for new datasets #27

Comments

levyj commented May 8, 2015

shua123 commented May 8, 2015

levyj commented May 8, 2015

shua123 commented May 8, 2015

levyj commented May 8, 2015

tomschenkjr commented May 8, 2015