LibrariesHacked/wuthering-hacks

Wuthering Hacks

A data dashboard to display Newcastle libraries open data. Currently published at https://newcastle.librarydata.uk

What is it?

Newcastle public libraries publish as much of their data as possible under a Public Domain licence (https://creativecommons.org/publicdomain/zero/1.0/). Details of existing datasets can be found at Libraries data sets.

They also have a GitHub account at ToonLibraries, and an open data repository within this account at library-open-data.

Dashboard pages

The dashboard splits visualisations into pages, focussing on different areas of the library data provided by Newcastle.

| Page | Description |
| --- | --- |
| Usage | Details of issues, computer use, enquiries, and visits by month and by library |
| Catalogue | Details on the library catalogue, from titles and items data |
| Members | Details on membership by postcode area and date joined/active |

Data provided

The dashboard uses CSVs published by Newcastle libraries under the Public Domain licence.

| Data | Link | Description |
| --- | --- | --- |
| Current Libraries | CSV | Location of current Newcastle City Council Libraries along with number of public access computers and Wi-Fi provision |
| Monthly computer usage | CSV | Monthly computer usage figures by branch for April 2008 to Present |
| Monthly enquiries | CSV | Monthly enquiry figures by branch for April 2008 to Present |
| Monthly issues | CSV | Monthly loan figures (number of items issued) by branch for April 2008 to Present |
| Monthly visits | CSV | Monthly visit figures by branch for April 2008 to Present |
| Members | CSV | Anonymised member data including postcode district, library registered at, date added and last used |
| Catalogue | CSV | Extract from the Library Management System (LMS) catalogue |
| Items | CSV | Items in the Library Management System (LMS) catalogue |

The code does not link directly to these files but uses a copy held within the project. This means that updates to those open data files need to be manually copied into this project. See Build section for instructions.

Data definitions

The data that the dashboard uses is converted from the source data, and put into a format that is most efficient for the code to use.

However, the original data can be copied into this project when it is published. The definitions of the datasets used are included below.

Monthly enquiries

| Field | Description | Example |
| --- | --- | --- |
| Library | The name of the library | Blakelaw |
| 2008-04 | The number of enquiries for the month | 312 |

The columns go on to cover each month in the form of YYYY-MM.

Comments

It would be nice to have this dataset with the month in a row, rather than a column header. For example:

| Field | Description | Example |
| --- | --- | --- |
| Library | The name of the library | Blakelaw |
| Month | The month | 2008-04 |
| Enquiries | The number of enquiries for the month | 312 |

That way the structure would be fixed to three columns and would increase in rows (rather than columns) as new months are added. The same applies to the following datasets on usage.
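As a rough illustration of that reshaping, here is a minimal sketch in Python. It assumes the first column is named `Library` and that every other column header is a `YYYY-MM` month. The function name `wide_to_long` and the second month of sample data are made up for the example.

```python
import csv
import io


def wide_to_long(wide_csv_text, value_name):
    """Turn one row per library (a column per month) into one row per library and month."""
    rows = []
    for record in csv.DictReader(io.StringIO(wide_csv_text)):
        library = record["Library"]
        for column, value in record.items():
            if column == "Library":
                continue  # every other column is assumed to be a YYYY-MM month
            rows.append({"Library": library, "Month": column, value_name: value})
    return rows


# Hypothetical sample: only the Blakelaw 2008-04 figure comes from the table above.
sample = "Library,2008-04,2008-05\nBlakelaw,312,290\n"
for row in wide_to_long(sample, "Enquiries"):
    print(row)
```

The same function could be applied to the issues, visits and computer usage files by changing `value_name`.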

Monthly issues

| Field | Description | Example |
| --- | --- | --- |
| Library | The name of the library | Blakelaw |
| 2008-04 | The number of issues for the month | 1048 |

Monthly visits

| Field | Description | Example |
| --- | --- | --- |
| Library | The name of the library | Blakelaw |
| 2008-04 | The number of visits for the month | 1768 |

Monthly computer usage

| Field | Description | Example |
| --- | --- | --- |
| Library | The name of the library | Blakelaw |
| 2008-04 | The percentage of computer utilisation | 50% |

Online resources usage

| Field | Description | Example |
| --- | --- | --- |
| Online Resource | The type of online resource | 19th Century British Library Newspapers |
| Jan-05 | The usage figure for the month | 300 |

Membership

| Field | Description | Example |
| --- | --- | --- |
| Postcode | The postcode district of the member | AB10 |
| Library Registered At | The library the member is registered at | CITY |
| Date Added | The date the user was added as a member | 04/09/15 or 04/08/2005 |
| Time Added | The time the user was added as a member | 8:45:00 or Empty |
| Last Used Date | The date the member last used services | 04/09/15 |
| Last Used Time | The time the member last used services | 8:45:00 |
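Because the date fields mix two-digit and four-digit years, any script reading them has to try both forms. The sketch below shows one way to do that. It assumes the dates are day-first (UK style), which the examples above do not confirm, and `parse_member_date` is a hypothetical helper name.

```python
from datetime import datetime

# Assumption: dates are day-first, with either a 2-digit or a 4-digit year.
DATE_FORMATS = ("%d/%m/%y", "%d/%m/%Y")


def parse_member_date(text):
    """Return a date for either year style, or None when the field is empty."""
    text = text.strip()
    if not text:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unrecognised date: {text!r}")


print(parse_member_date("04/09/15"))
print(parse_member_date("04/08/2005"))
```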

Catalogue

| Field | Description | Example |
| --- | --- | --- |
| rcn | Unique identifier for the title | 413396703 |
| isbn | The International Standard Book Number of the title record | 9780413396709 |
| publ_y | The year the title was published | 1980 |
| author | Main author of the work | Osborne, Charles |
| title | Main title as on title page or equivalent | W.H. Auden : the life of a poet |
| price | Price of 1 copy | £0.0 |
| langua | Main language of the work. Note: for most works in English the language is not specified. | |
| editio | Edition or version of the work | |
| class | Main classification allocated by library staff or by the supplier for the title | 821AUDE |
| publisher | Name of the publisher | EYRE METH |
| firstcopydate | Date the first copy was added. Note: field rarely used. | |
| acpy | Number of copies in stock for that ISBN | 1 |

Comments

It's probably down to the tool used to create the CSV, but the header is on the second row. The first row includes a timestamp.

__ Mon Sep,19 13:33:20 2016,______

Never mind though, it'll be easy enough to ignore. There seems to be a final column on the end that doesn't have any data in. Will ignore that. Pound (£) signs are included in the price column. I don't think there are any other currencies so will remove these and just have a decimal number. The number of copies sometimes seems to have a pipe character (|), maybe some remnant from a MARC field, so will also remove these.
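The clean-up steps above could look something like the following sketch. It is not the project's actual script. It assumes the file has already been decoded to text, that the trailing empty column has an empty header name, and it uses a cut-down, made-up sample with only four of the catalogue columns. `clean_catalogue` is a hypothetical name.

```python
import csv
import io


def clean_catalogue(raw_text):
    """Return catalogue rows as dicts, applying the clean-up steps described above."""
    reader = csv.reader(io.StringIO(raw_text))
    next(reader)  # first row holds the export timestamp, not the header
    header = [name.strip() for name in next(reader)]
    cleaned = []
    for values in reader:
        if not values:
            continue  # skip blank lines
        record = dict(zip(header, values))
        record.pop("", None)  # drop the trailing column (assumed to have no header name)
        record["price"] = record.get("price", "").replace("£", "").strip()
        record["acpy"] = record.get("acpy", "").replace("|", "").strip()
        cleaned.append(record)
    return cleaned


# Hypothetical sample with a reduced set of columns.
sample = (
    "__ Mon Sep,19 13:33:20 2016,______\n"
    "rcn,isbn,price,acpy,\n"
    "413396703,9780413396709,£0.0,1|,\n"
)
print(clean_catalogue(sample))
```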

Items

| Field | Description | Example |
| --- | --- | --- |
| item | A unique ID for the item. | C203255900 |
| rcn | The unique title record (links to the catalogue title data above). | 573011680 |
| catego | A category ID for the item. | 2 |
| text | Text for the category ID. | ADULT NON FICTION |
| homebr | An ID for the item branch. | 46 |
| name | Name of the item location. | CITY STACK |
| added | Date and time added to the catalogue. | 22/01/2007 14:26 |
| issues current branch | Number of issues at the current branch. | 0 |
| issues previous branch | Number of issues at the previous branch. | 0 |
| renewals current branch | Number of renewals at the current branch. | 0 |
| renewals previous branch | Number of renewals at the previous branch. | 0 |

Combining usage data

To produce a usage file that is efficient for the dashboard to load, it's worth merging the separate usage files: enquiries, issues, visits, and computer usage. Each of these includes the same libraries and months, so keeping them separate duplicates a lot of data.

The goal will be to produce a file to be used by the dashboard that looks like the following.

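| Field | Description | Example |
| --- | --- | --- |
| Library | The name of the library | Blakelaw |
| Month | The month | 2008-04 |
| Enquiries | The number of enquiries for the month | 312 |
| Issues | The number of issues for the month | 1048 |
| Visits | The number of visits for the month | 1768 |
| Computer Usage | The percentage of computer utilisation | 50% |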

The data is created using a Python script. This is included in the scripts directory of this project and produces one file.

  • dashboard_usage.csv

This file is then used in the usage page of the data dashboard.
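For illustration only, the merge could be sketched as below. This is not the script from the scripts directory. It assumes each source file has a `Library` column followed by one column per `YYYY-MM` month, and the names `read_wide` and `combine_usage` are made up for the example.

```python
import csv
import io


def read_wide(csv_text):
    """Map (library, month) to a value for one wide monthly file."""
    values = {}
    for record in csv.DictReader(io.StringIO(csv_text)):
        library = record["Library"]
        for column, value in record.items():
            if column != "Library":
                values[(library, column)] = value
    return values


def combine_usage(sources):
    """sources maps an output column name to the text of a wide CSV file."""
    measures = {name: read_wide(text) for name, text in sources.items()}
    # Every (library, month) pair seen in any of the files, in sorted order.
    keys = sorted(set().union(*(m.keys() for m in measures.values())))
    return [
        {"Library": library, "Month": month,
         **{name: measures[name].get((library, month), "") for name in measures}}
        for library, month in keys
    ]


# Sample built from the example values in the tables above.
sources = {
    "Enquiries": "Library,2008-04\nBlakelaw,312\n",
    "Issues": "Library,2008-04\nBlakelaw,1048\n",
    "Visits": "Library,2008-04\nBlakelaw,1768\n",
    "Computer Usage": "Library,2008-04\nBlakelaw,50%\n",
}
combined = combine_usage(sources)
print(combined)
```

A library or month missing from one file is left as an empty string rather than dropped, so gaps in one measure do not hide the others.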

Combining and aggregating catalogue and items

Both the catalogue and item extracts are fairly large files (29MB and 27MB). Given that this project mainly processes data client-side (in the web browser), those files are too large to expect users to download.

We mainly need aggregated data (e.g. x thousand items, or x thousand items of a particular category). For this purpose I have created a single aggregated dataset for catalogue and items. This is made smaller by using IDs for category and branch. These lookups are then included as separate exports.

| Field | Description | Example |
| --- | --- | --- |
| CategoryId | An integer ID of the category type. | 1 |
| Category | The textual name of the category. | ADULT NON FICTION |

| Field | Description | Example |
| --- | --- | --- |
| BranchId | An integer ID of the branch. | 0 |
| Branch | The textual name of the location. | CITY STACK |

| Field | Description | Example |
| --- | --- | --- |
| CategoryId | Derived from the text field in the item data. | 1 |
| BranchId | Derived from the name field from the item table. | 1 |
| Added | Month the items were added to the catalogue. | 2016-01 |
| Count | A count of the number of items. | 1 |
| Issues | A count of the number of issues | 419757 |
| Renewals | A count of the number of renewals | 605263 |
| Price | Taken from the price field of the title data, in this case a total price for the items. Will be in pounds but with no symbol. | 462969.67 |

So, the above example would show that

The data is created using a Python script. This is included in the scripts directory of this project and produces the following files.

  • dashboard_catalogue.csv
  • dashboard_catalogue_grouped.csv
  • dashboard_catalogue_branches.csv
  • dashboard_catalogue_categories.csv

These files are then used in the catalogue page of the data dashboard.
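The aggregation described in this section can be sketched roughly as follows. This is an illustration under several assumptions, not the project's script: IDs are handed out in order of first appearance starting at 0, issues and renewals are summed across the current and previous branch, and prices are passed in as a lookup from rcn to a number. `aggregate_items` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_items(items, prices):
    """Group items by category, branch and month added; return rows plus both lookups."""
    categories, branches = {}, {}
    totals = defaultdict(lambda: {"Count": 0, "Issues": 0, "Renewals": 0, "Price": 0.0})
    for item in items:
        # Assign a new integer ID the first time a category or branch name is seen.
        category_id = categories.setdefault(item["text"], len(categories))
        branch_id = branches.setdefault(item["name"], len(branches))
        month = datetime.strptime(item["added"], "%d/%m/%Y %H:%M").strftime("%Y-%m")
        group = totals[(category_id, branch_id, month)]
        group["Count"] += 1
        group["Issues"] += int(item["issues current branch"]) + int(item["issues previous branch"])
        group["Renewals"] += int(item["renewals current branch"]) + int(item["renewals previous branch"])
        group["Price"] += prices.get(item["rcn"], 0.0)
    rows = [
        {"CategoryId": c, "BranchId": b, "Added": m, **group, "Price": round(group["Price"], 2)}
        for (c, b, m), group in sorted(totals.items())
    ]
    return rows, categories, branches


# Hypothetical sample: two copies of one title, added in the same month.
sample_items = [
    {"rcn": "573011680", "text": "ADULT NON FICTION", "name": "CITY STACK",
     "added": "22/01/2007 14:26", "issues current branch": "0", "issues previous branch": "0",
     "renewals current branch": "0", "renewals previous branch": "0"},
    {"rcn": "573011680", "text": "ADULT NON FICTION", "name": "CITY STACK",
     "added": "30/01/2007 09:10", "issues current branch": "2", "issues previous branch": "1",
     "renewals current branch": "1", "renewals previous branch": "0"},
]
sample_prices = {"573011680": 5.0}
rows, categories, branches = aggregate_items(sample_items, sample_prices)
print(rows)
```

Grouping by month rather than by exact date keeps the output small enough for the browser, at the cost of day-level detail.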

Technologies used and licences

The following technologies (with licences listed) are used in this project.

| Technology | Used for | Link | Licence |
| --- | --- | --- | --- |
| Bootstrap | To provide the page structure. Currently using version 4 Alpha 6. | Bootstrap | MIT |
| jQuery | Required by Bootstrap and to provide JavaScript code shortcuts | jQuery | MIT |
| DC JS | Dimensional Charting JavaScript library, used for the dynamic charts | dc.js | Apache |
| Crossfilter | Required by DC JS, provides the cross filtering functionality | Crossfilter | Apache |
| D3 | Required by DC JS, provides the data driven graphs | D3JS | BSD |
| Leaflet | JavaScript library for mapping. | LeafletJS | Open Source |
| CartoJS | Specific functions for mapping using data stored in Carto. | CartoJS | Open Source |

Licence

This code is licensed under the MIT Licence.