
bertspaan/tutorial-historical-addresses


Tutorial: Historical Addresses & NYC Space/Time Directory

This tutorial was made for the second meetup in the NYC Space/Time Directory's meetup series, "NYC Maps, Buildings, and Addresses: Using and combining historic data", held on February 1st, 2017.

In this tutorial, we will combine three different crowdsourced datasets from the NYC Space/Time Directory to create a web interface that makes historical addresses searchable and visible.

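Datasets:

  • Building Inspector (building-inspector): building footprints and transcribed house numbers
  • Historical streets (nyc-streets): centerlines of streets traced from historical maps
  • Map Warper (mapwarper): outlines and metadata of historical maps, plus a tile server for map tiles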

Outline

In this tutorial, we will do the following things:

  1. See what data is available via Building Inspector's API
  2. Find out how the NYPL traces the locations and names of streets from historical maps, and turns this into new datasets for everyone to use (and how you can help tracing more maps)
  3. We will use the NYC Space/Time Directory's website to download and use those datasets
  4. Combine Building Inspector and historical street datasets to create a new dataset containing historical addresses
  5. We'll use Leaflet to display Map Warper's historical map tiles
  6. And finally, put everything together and make our new dataset searchable with a simple web interface

Examples from an 1854 New York City Directory:

  • Kelly William E. daguerreotypes, 374 Bowery

Scan of 1854 New York City Directory showing William E. Kelly's address

This address on an 1875 map:

Part of 1875 map showing 374 Bowery in Manhattan

  • Palmer George, painter, 90 Nassau, h. 84½ Fulton, Brooklyn

Scan of 1854 New York City Directory showing George Palmer's address

This address on an 1855 map:

Part of 1855 map showing 84½ Fulton Street in Brooklyn

Goal: web interface for searching historical addresses

Screenshot of web interface described in this tutorial

Data

In this tutorial, we're using data from one of NYPL's crowdsourcing tools (Building Inspector), one crowdsourced dataset (historical streets), and we'll use Map Warper to display historical map tiles. Traditionally, we would have needed to manually download data from the Building Inspector API (more information below), use Shapefiles from the streets dataset, and combine those datasets together ourselves.

Not anymore!

Using the NYC Space/Time Directory, all this data is available in one format, in one place. (Later this year, I will add new search and map interfaces to make finding, visualizing and using all this data easier — like the one we will make in this tutorial, but for all NYPL's geospatial data.)

You can find NYC Space/Time Directory datasets here: spacetime.nypl.org#data.

Screenshot of data section on NYC Space/Time Directory website

Extract, Transform, Load

Data does not magically convert itself to one data model and appear on the NYC Space/Time Directory website. For the project, I have written many extract, transform, load (ETL) modules which take data from one place, transform it, and output Space/Time data.

Diagram showing how data flows through NYC Space/Time Directory: multiple data sources ⟶ data transformation ⟶ Space/Time website

Using data from the command line

Space/Time datasets consist of one or more Newline Delimited JSON (NDJSON) files, and a JSON file with dataset metadata (title, author, license, etc.). NDJSON files contain one JSON object per line, which is convenient when using command line tools, or when doing streaming data processing.

Let's have a look at one line from building-inspector.objects.ndjson:

{"id":"87139-1","type":"st:Address","validSince":1857,"validUntil":1857,"name":"20","data":{"number":"20","sheetId":177,"layerId":859,"mapId":7138},"geometry":{"type":"Point","coordinates":[-73.99559810757634,40.71142649628733]}}

This line contains one address from Building Inspector's API, transcribed with crowdsourcing, and converted to the Space/Time data model. Note that this address contains only a house number, not a street name.

The same object, but on multiple lines and easier to read:

{
   "id":"87139-1",
   "type":"st:Address",
   "validSince":1857,
   "validUntil":1857,
   "name":"20",
   "data":{
      "number":"20",
      "sheetId":177,
      "layerId":859,
      "mapId":7138
   },
   "geometry":{
      "type":"Point",
      "coordinates":[
         -73.99559810757634,
         40.71142649628733
      ]
   }
}
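
Because each line is a complete JSON document, reading such a file takes little more than splitting on newlines. Below is a minimal Node.js sketch that uses the example line above as input. The function name parseNdjson is made up for this example and is not part of any Space/Time tool; for large files a streaming parser would be a better fit.

```javascript
// Parse NDJSON text: one JSON object per non-empty line.
// parseNdjson is a hypothetical helper, not part of ndjson-cli or spacetime-cli.
function parseNdjson (text) {
  return text
    .split('\n')
    .filter(function (line) { return line.trim().length > 0 })
    .map(function (line) { return JSON.parse(line) })
}

// The example line from building-inspector.objects.ndjson shown above
var ndjson = '{"id":"87139-1","type":"st:Address","validSince":1857,"validUntil":1857,"name":"20","data":{"number":"20","sheetId":177,"layerId":859,"mapId":7138},"geometry":{"type":"Point","coordinates":[-73.99559810757634,40.71142649628733]}}\n'

var objects = parseNdjson(ndjson)
console.log(objects[0].id, objects[0].data.mapId)
```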

It's easy to process those files using your command line. The examples below use the following tools:

  • jq: command-line JSON processor, install with brew install jq
  • ndjson-cli: command-line tools for operating on newline-delimited JSON streams, install with npm install -g ndjson-cli
  • spacetime-cli: command-line tools for Space/Time data, install with npm install -g nypl-spacetime/spacetime-cli

Use ndjson-filter to filter Building Inspector data by year, convert to GeoJSON, and save the resulting file to disk:

curl http://s3.amazonaws.com/spacetime-nypl-org/\
datasets/building-inspector/building-inspector.objects.ndjson \
| ndjson-filter 'd.validSince > 1880' | spacetime-to-geojson > \
~/Downloads/building-inspector-1880.geojson
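
To make the pipeline above less of a black box, here is a rough plain-JavaScript equivalent of its two steps: filtering on validSince and wrapping the remaining objects in a GeoJSON FeatureCollection. The toFeature function is an assumption about what spacetime-to-geojson produces, modeled on the GeoJSON example further on in this tutorial. The second sample object is invented for illustration.

```javascript
// Move everything except the geometry into GeoJSON properties.
// Assumption: this mirrors the shape of Space/Time GeoJSON; it is not the tool's actual code.
function toFeature (object) {
  var properties = Object.assign({}, object)
  delete properties.geometry
  return {
    type: 'Feature',
    properties: properties,
    geometry: object.geometry
  }
}

function toFeatureCollection (objects) {
  return {
    type: 'FeatureCollection',
    features: objects.map(toFeature)
  }
}

var objects = [
  {id: '87139-1', type: 'st:Address', validSince: 1857, validUntil: 1857, name: '20',
    geometry: {type: 'Point', coordinates: [-73.99559810757634, 40.71142649628733]}},
  // Hypothetical object, only here so the filter has something to keep
  {id: 'example-2', type: 'st:Address', validSince: 1894, validUntil: 1894, name: '7',
    geometry: {type: 'Point', coordinates: [-73.99, 40.72]}}
]

// Same predicate as ndjson-filter 'd.validSince > 1880'
var recent = objects.filter(function (d) { return d.validSince > 1880 })
var geojson = toFeatureCollection(recent)
console.log(JSON.stringify(geojson, null, 2))
```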

Use ndjson-map to only display the ID and the name, and then grep for Harlem:

curl http://s3.amazonaws.com/spacetime-nypl-org/\
datasets/mapwarper/mapwarper.objects.ndjson \
| ndjson-map '`${d.id} - ${d.name}`' | grep Harlem

For more information and examples, see https://github.com/nypl-spacetime/spacetime-data/

Using GeoJSON files directly

If you're just interested in geospatial data, you can download GeoJSON files directly from spacetime.nypl.org.

These GeoJSON files can be used in any GIS tool, and you can easily display and edit them using geojson.io.

Building Inspector

The Building Inspector dataset contains two types of objects:

  • Buildings: footprints of historical buildings, with year, map layer, and color
  • Addresses: transcribed house numbers (see Building Inspector's Enter Addresses task), with year, map layer and coordinates

(In the dataset's ZIP file you will find building-inspector.relations.ndjson, which contains links between buildings and addresses, but we will not use those in this tutorial.)

You can use QGIS to display GeoJSON files:

Building Inspector data displayed in QGIS

Historical Streets

The historical streets dataset (nyc-streets) contains one type of object:

  • Streets: centerlines of historical streets, with their name, year and map layer

Streets are traced from historic maps manually, in QGIS. You can help us; see https://github.com/nypl-spacetime/qgis-trace-tutorial for details.

Street data displayed in QGIS

Map Warper

In this tutorial, we won't use data from the Space/Time mapwarper dataset (which contains the polygonal outlines and metadata of thousands of NYC maps), but we will use Map Warper's tile server to display historical map tiles.

Building Inspector's addresses contain the ID of the map that was used for address transcription. On the Export tab in Map Warper's map view, you can see that map's tiles URL:

http://maps.nypl.org/warper/maps/tile/30780/{z}/{x}/{y}.png

Screenshot of Map Warper's export screen

You can use this tile URL in many geospatial tools, including Leaflet:

L.tileLayer('http://maps.nypl.org/warper/maps/tile/30780/{z}/{x}/{y}.png').addTo(map)
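
The {z}, {x} and {y} placeholders are filled in by the map library for every tile it requests. As a rough illustration, the sketch below computes them by hand, assuming the tile server follows the common XYZ ("slippy map") scheme that Leaflet's L.tileLayer uses by default. The function names lonLatToTile and fillTileUrl are made up for this example.

```javascript
// Convert longitude/latitude to XYZ tile indices at a zoom level (Web Mercator).
// Assumption: the server uses the standard XYZ scheme, with y counted from the top.
function lonLatToTile (lon, lat, zoom) {
  var n = Math.pow(2, zoom)
  var latRad = lat * Math.PI / 180
  var x = Math.floor((lon + 180) / 360 * n)
  var y = Math.floor((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2 * n)
  return {z: zoom, x: x, y: y}
}

// Fill a {z}/{x}/{y} URL template with tile indices.
function fillTileUrl (template, tile) {
  return template
    .replace('{z}', tile.z)
    .replace('{x}', tile.x)
    .replace('{y}', tile.y)
}

var template = 'http://maps.nypl.org/warper/maps/tile/30780/{z}/{x}/{y}.png'
// Coordinates of the example address shown earlier, rounded
var tile = lonLatToTile(-73.9956, 40.7114, 16)
console.log(fillTileUrl(template, tile))
```

In practice Leaflet does this for you; the sketch only shows where the numbers in a tile request come from.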

Although this tutorial does not use Map Warper data directly, you can still have a look at the dataset's GeoJSON file, or even open it in QGIS:

Map Warper data displayed in QGIS

Finding closest historical street for each Building Inspector address

We have, in two separate datasets, address and street data:

  • building-inspector dataset: house numbers with point geometries
  • nyc-streets dataset: street names with polyline geometries

We need a way to figure out that house number 84½ belongs to Fulton Street on the same map:

Part of 1855 map showing 84½ Fulton Street in Brooklyn

Luckily, buildings with house numbers are usually geographically close to the street they are on, so we can compute the distance between each address and each street, and the closest street should be a pretty good match. Of course, we also want to take the year of both the address and the street into account, so that we do not link 1854 addresses to 1894 streets.

For the NYC Space/Time Directory, I have created an ETL module which uses PostGIS and data from the building-inspector and nyc-streets datasets to create links between those datasets.

Example SQL query from this ETL module, using a 5 year margin for matching addresses and streets:

-- For every address, find the ID of the closest street
SELECT addresses.id, (
  SELECT
    streets.id
  FROM objects streets
  WHERE streets.type = 'st:Street' AND
    -- street and address must be valid in roughly the same period (5 year margin)
    lower(streets.validsince) - interval '5 year' < lower(addresses.validsince) AND
    upper(streets.validuntil) + interval '5 year' > upper(addresses.validuntil) AND
    -- only consider streets within 20 meters of the address
    ST_Distance(Geography(addresses.geometry), Geography(streets.geometry)) < 20
  ORDER BY ST_Distance(addresses.geometry, streets.geometry)
  LIMIT 1
) AS streets
FROM objects addresses
WHERE addresses.type = 'st:Address'

See GitHub for the source code of the ETL module.

See bertspaan.nl/west-village for more details about matching addresses and streets using PostGIS.
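
For readers without a PostGIS setup, the same matching idea can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not the ETL module's code: years are plain numbers, streets are single LineStrings, distances use a flat-earth approximation that is good enough at a scale of tens of meters, and all three streets are invented sample data.

```javascript
var EARTH_RADIUS = 6371000 // meters

// Project [lon, lat] to local meters around a reference latitude (equirectangular approximation).
function toMeters (coordinate, refLat) {
  var rad = Math.PI / 180
  return [
    coordinate[0] * rad * Math.cos(refLat * rad) * EARTH_RADIUS,
    coordinate[1] * rad * EARTH_RADIUS
  ]
}

// Distance from point p to line segment a-b, all in meters.
function pointToSegment (p, a, b) {
  var dx = b[0] - a[0]
  var dy = b[1] - a[1]
  var len2 = dx * dx + dy * dy
  var t = len2 === 0 ? 0 : ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2
  t = Math.max(0, Math.min(1, t))
  return Math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))
}

// Smallest distance in meters from a point to any segment of a polyline.
function distanceToStreet (point, line) {
  var p = toMeters(point, point[1])
  var min = Infinity
  for (var i = 0; i < line.length - 1; i++) {
    var d = pointToSegment(p, toMeters(line[i], point[1]), toMeters(line[i + 1], point[1]))
    min = Math.min(min, d)
  }
  return min
}

// Closest street within maxMeters whose years overlap the address, using the same margin as the SQL query.
function closestStreet (address, streets, maxMeters, marginYears) {
  maxMeters = maxMeters || 20
  marginYears = marginYears || 5
  var best = null
  streets.forEach(function (street) {
    if (street.validSince - marginYears >= address.validSince) return
    if (street.validUntil + marginYears <= address.validUntil) return
    var d = distanceToStreet(address.geometry.coordinates, street.geometry.coordinates)
    if (d < maxMeters && (best === null || d < best.distance)) {
      best = {id: street.id, distance: d}
    }
  })
  return best
}

var address = {id: 'address-1', validSince: 1857, validUntil: 1857,
  geometry: {type: 'Point', coordinates: [-73.9956, 40.7114]}}

// Hypothetical streets: one nearby, one nearby but from 1894, one too far away
var streets = [
  {id: 'street-a', validSince: 1857, validUntil: 1857,
    geometry: {type: 'LineString', coordinates: [[-73.9960, 40.71145], [-73.9950, 40.71145]]}},
  {id: 'street-b', validSince: 1894, validUntil: 1894,
    geometry: {type: 'LineString', coordinates: [[-73.9960, 40.71141], [-73.9950, 40.71141]]}},
  {id: 'street-c', validSince: 1857, validUntil: 1857,
    geometry: {type: 'LineString', coordinates: [[-73.9960, 40.7130], [-73.9950, 40.7130]]}}
]

var match = closestStreet(address, streets)
console.log(match)
```

Comparing every address with every street like this is slow for large datasets, which is one reason to let PostGIS and its spatial indexes do the work, as in the SQL query above.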

The resulting dataset is called building-inspector-nyc-streets and can be found on the Space/Time website.

Example GeoJSON from the resulting dataset, with links between Building Inspector and nyc-streets:

{
  "type": "Feature",
  "properties": {
    "id": "78675-1",
    "name": "119 East 59th Street",
    "type": "st:Address",
    "validSince": 1857,
    "validUntil": 1858,
    "data": {
      "mapId": 7113,
      "number": "119",
      "layerId": 859,
      "sheetId": 152,
      "addressId": "building-inspector/78675-1",
      "streetId": "nyc-streets/859-east-59th-street"
    }
  },
  "geometry": {
    "type": "Point",
    "coordinates": [
      -73.965545,
      40.761191
    ]
  }
}

Open a sample of 100 addresses in geojson.io.

Preparing data for a web interface

The objects NDJSON file of building-inspector-nyc-streets, the dataset described in the previous section, is more than 13MB and contains many fields (like IDs and types) that we do not need in our visualization.

This tutorial contains a small Node.js script which downloads building-inspector-nyc-streets and nyc-streets from the Space/Time website, removes some unneeded fields, and does some data transformation to index streets by their ID. Yet another ETL step, it never ends.

In many of Space/Time's ETL tools I'm using Highland for streaming data processing. Highland is great, you should use it too!

To run this script, first install its dependencies:

npm install

Then, run the script:

node data.js

The resulting addresses JSON file is roughly half the size of the original, at 7.1MB, and around 700KB with HTTP compression.
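
The kind of trimming and indexing described above could look roughly like the sketch below. It is not the tutorial's data.js: the function names and the exact set of fields kept are assumptions, chosen to match the fields the web interface uses further on (address, mapId), and the street geometry is a made-up example.

```javascript
// Keep only the fields a search interface needs.
// Assumption: the array position doubles as ID, so a search result can be looked up directly.
function compactAddress (object, index) {
  return {
    id: index,
    address: object.name,
    mapId: object.data.mapId,
    streetId: object.data.streetId,
    coordinates: object.geometry.coordinates
  }
}

// Turn a list of streets into an object keyed by dataset-prefixed street ID.
// Assumption: street IDs in nyc-streets lack the 'nyc-streets/' prefix used in streetId.
function indexStreets (streets) {
  var byId = {}
  streets.forEach(function (street) {
    byId['nyc-streets/' + street.id] = street.geometry
  })
  return byId
}

// The address from the GeoJSON example above, in Space/Time object form
var object = {
  id: '78675-1',
  name: '119 East 59th Street',
  type: 'st:Address',
  validSince: 1857,
  validUntil: 1858,
  data: {
    mapId: 7113,
    number: '119',
    layerId: 859,
    sheetId: 152,
    addressId: 'building-inspector/78675-1',
    streetId: 'nyc-streets/859-east-59th-street'
  },
  geometry: {type: 'Point', coordinates: [-73.965545, 40.761191]}
}

// Hypothetical street object; the coordinates are invented
var street = {
  id: '859-east-59th-street',
  name: 'East 59th Street',
  geometry: {type: 'LineString', coordinates: [[-73.9665, 40.7616], [-73.9645, 40.7608]]}
}

var addresses = [object].map(compactAddress)
var streetsById = indexStreets([street])
console.log(addresses[0], streetsById[addresses[0].streetId])
```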

Web interface

Now, we have all the data we need:

  • A new dataset with historical addresses (with street names), including their locations
  • For every address, a link to the geometry of its street
  • A way to display historic map tiles from Map Warper

With some HTML, JavaScript and CSS, it's pretty easy to make a website with which you can search those historical addresses.

Some libraries we'll need:

  • D3.js: D3.js is a JavaScript library for manipulating documents based on data. It makes downloading JSON data and then modifying the webpage based on this data very easy.
  • Leaflet: Leaflet is a JavaScript library for mobile-friendly interactive maps. It can display Map Warper's map tiles, as well as GeoJSON data.
  • lunr.js: lunr.js is a simple full text search engine for the browser. We use lunr.js to index and search addresses.

Our search tool consists of three files:

index.html is the page that opens when you point your browser to bertspaan.nl/tutorial-historical-addresses.

Below, I will explain the most important parts of js/historical-addresses.js:

First, initialize a Leaflet map, and add three layers:

  1. An OpenStreetMap base layer
  2. A tile layer for Map Warper tiles
  3. A GeoJSON layer for address and street data

var map = L.map('map', {
  center: [40.8, -73.96],
  zoom: 14,
  maxZoom: 20
})

var baseMapTileUrl = 'http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png'
var baseLayer = L.tileLayer(baseMapTileUrl, {
  attribution: '&copy; <a href="http://www.openstreetmap.org/copyright">OpenStreetMap</a> contributors',
  maxZoom: 20,
  maxNativeZoom: 19
}).addTo(map)

var tileLayer = L.tileLayer('', {
  maxZoom: 20
}).addTo(map)

var geojsonLayer = new L.geoJson(null, {
  style: styles.street,
  pointToLayer: function (feature, latlng) {
    return L.circleMarker(latlng, styles.address)
  },
  onEachFeature: function (feature, layer) {
    var text = feature.properties.name || feature.properties.address
    if (text) {
      layer.bindPopup(text)
    }
  }
}).addTo(map)

Create a new lunr.js index, indexing only the address field and using id as a reference:

var idx = lunr(function () {
  this.field('address')
  this.ref('id')
})

Use D3.js to load the two JSON files, store the data, and index all addresses with lunr.js:

d3.json('data/streets.json', function (json) {
  streets = json
})

d3.json('data/addresses.json', function (json) {
  addresses = json

  addresses.forEach(function (address) {
    idx.add(address)
  })
})

When the user types in the input field, search the lunr.js index, take only the first 75 results, and store them for display:

d3.select('#search')
  .on('input', function () {
    var results = idx.search(this.value)
      .slice(0, 75)
      .map(function (result) {
        return addresses[result.ref]
      })
  })
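
lunr.js takes care of tokenizing and ranking for us. To show the basic index-then-search flow without any dependencies, here is a much simplified stand-in: a plain inverted index that maps each lowercased word to the addresses containing it. This is not how lunr.js is implemented, the function names are made up, and requiring every query word to match is a choice made for this sketch. The sample addresses are taken from the examples earlier in this tutorial.

```javascript
// Split text into lowercase words.
function tokenize (text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean)
}

// Build an inverted index: word -> list of positions in the addresses array.
function buildIndex (addresses) {
  var index = Object.create(null) // no inherited keys such as 'constructor'
  addresses.forEach(function (item, position) {
    tokenize(item.address).forEach(function (word) {
      if (!index[word]) index[word] = []
      if (index[word].indexOf(position) === -1) index[word].push(position)
    })
  })
  return index
}

// Return at most `limit` addresses that contain every word of the query.
function search (index, addresses, query, limit) {
  var words = tokenize(query)
  if (words.length === 0) return []
  var positions = words
    .map(function (word) { return index[word] || [] })
    .reduce(function (a, b) {
      return a.filter(function (position) { return b.indexOf(position) !== -1 })
    })
  return positions.slice(0, limit).map(function (position) {
    return addresses[position]
  })
}

var addresses = [
  {id: 0, address: '374 Bowery'},
  {id: 1, address: '84½ Fulton Street'},
  {id: 2, address: '119 East 59th Street'}
]

var index = buildIndex(addresses)
console.log(search(index, addresses, 'east 59th', 75))
```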

After searching and finding a new address, the map will move to the coordinates of that address. When the map is finished moving, set the tile URL of the tile layer to the correct Map Warper tile URL:

map.on('moveend', function () {
  // Nothing to show until an address has been selected
  if (!selectedAddress) return
  var tileUrl = 'http://maps.nypl.org/warper/maps/tile/' + selectedAddress.mapId + '/{z}/{x}/{y}.png'
  tileLayer.setUrl(tileUrl)
})

Final result: bertspaan.nl/tutorial-historical-addresses

Screenshot of final result