

nasawakeupcalls.data

Chronology of Wakeup Calls converted to data.

About

This project was declared "cool" by NASA astronaut and Max-Q lead singer Tracy Caldwell Dyson in a telephone interview on 12 Feb 2020.

Data extracts and analysis from the NASA wake-up call project. This is a remix project. Credit for the source material goes to Colin Fries of the NASA historical division; the material has been released into the public domain by NASA.

More information about the original work and this remix can be found on the about page at https://nasawakeupcalls.github.io.

Sources

  • JSON. This is our primary data source; the _to_csv.py and _to_blog.py scripts generate all of their output from it.
  • CSV. Exists for easy access (who doesn't like a table?). It's a good resource to hack on and generate visualizations from, but please consider contributing back to the JSON document itself.
  • Website. https://nasawakeupcalls.github.io is one of the outputs of this work, providing a searchable front-end to all of this data. The website is built with Jekyll, and its source can also be found on GitHub.
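The conversion scripts are not reproduced here, but the JSON-to-CSV step can be sketched roughly as follows. This is a minimal illustration, not the project's _to_csv.py: it assumes the JSON file is an array of flat objects, and the column names in FIELDS are hypothetical placeholders for whatever keys the real document uses.

```python
import csv
import io
import json

# Hypothetical column names for illustration; the real JSON schema may use different keys.
FIELDS = ["mission", "date", "song", "artist"]


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of objects into CSV text with a header row.

    Keys not listed in FIELDS are dropped; missing keys become empty cells.
    """
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

To convert a file, read it with open(..., encoding="utf-8") and write the returned string to a .csv file opened with newline="". Using csv.DictWriter rather than joining strings by hand means commas and quotes inside song titles are escaped correctly.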

Process and Requirements

When I have the opportunity, I will walk people through what was involved in creating this dataset.

I had started documenting the process, but as the conversion of the data from PDF to JSON went on, it became more and more laborious, and so it became difficult to maintain a useful audit trail.

Apache Tika helped immensely. Python was my scripting language of choice.
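One lightweight way to keep an audit trail during a multi-step conversion like this is to log a hash of the text before and after every transformation. The sketch below is an assumption about how that could be done, not a description of the process actually used here; record_step and chain_is_consistent are hypothetical helpers, and the input text could be, for example, the plain text that Apache Tika extracts from the PDF.

```python
import hashlib
from datetime import datetime, timezone


def sha256(text: str) -> str:
    """Return the hex SHA-256 digest of a string encoded as UTF-8."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record_step(log: list, step: str, before: str, after: str) -> dict:
    """Append one audit entry describing a single text transformation."""
    entry = {
        "step": step,
        "input_sha256": sha256(before),
        "output_sha256": sha256(after),
        "input_chars": len(before),
        "output_chars": len(after),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    log.append(entry)
    return entry


def chain_is_consistent(log: list) -> bool:
    """Check that each step's input is exactly the previous step's output."""
    return all(
        prev["output_sha256"] == curr["input_sha256"]
        for prev, curr in zip(log, log[1:])
    )
```

Because each entry stores both digests, a gap in the trail (a step whose input does not match the previous step's output) shows up as soon as chain_is_consistent returns False.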

Discogs genre tools

The cleaning tools made by Matt in the IPython notebook are commented, but they are of limited general use and also hideous (Matt's words!). More information on the Discogs API, and on how to get an API token, can be found in the Discogs developer documentation.
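As a rough sketch of the kind of genre lookup involved (not Matt's notebook code), the following queries the Discogs database search endpoint for a song and collects the genre field from the results. It assumes a personal access token sent in the Authorization header and uses only the Python standard library; the User-Agent string and the helper names are hypothetical, and the Discogs developer documentation remains the authority on parameters, rate limits and response fields.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.discogs.com/database/search"


def build_search_request(artist: str, track: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a Discogs database search request for one song."""
    query = urllib.parse.urlencode({"artist": artist, "track": track, "type": "release"})
    return urllib.request.Request(
        f"{SEARCH_URL}?{query}",
        headers={
            # Discogs asks clients to identify themselves; this name is hypothetical.
            "User-Agent": "nasawakeupcalls-genre-sketch/0.1",
            "Authorization": f"Discogs token={token}",
        },
    )


def genres_from_response(payload: dict) -> list:
    """Collect the distinct genres across all search results, in first-seen order."""
    seen = []
    for result in payload.get("results", []):
        for genre in result.get("genre", []):
            if genre not in seen:
                seen.append(genre)
    return seen


def lookup_genres(artist: str, track: str, token: str) -> list:
    """Send the search request and return the genres found (needs network access)."""
    with urllib.request.urlopen(build_search_request(artist, track, token)) as resp:
        return genres_from_response(json.load(resp))
```

Only lookup_genres makes a live request; build_search_request and genres_from_response are kept separate so request construction and response parsing can be checked offline.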

Questions without answers

As someone interested in data integrity, I have found moving this data from something locked away in PDF to something machine-readable "interesting".

  • Has the process been lossless?
  • Has the source material been diluted?
  • Has the transformation of this work been worth it?
  • It was expensive in terms of hours; how expensive was it?
  • How would these costs compare to those in my subject discipline?
  • How does this work compare with current practice in the digital humanities?
  • Is the work sustainable?
  • Would I do it again? (Yes, but, I might plan it differently).
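The first question can at least be tested for one narrow step. The sketch below checks whether records survive a JSON to CSV to JSON round trip and lists the cells that do not; it says nothing about the earlier PDF to JSON conversion. It assumes the JSON is an array of flat objects, and lossy_cells is a hypothetical helper, not part of this repository.

```python
import csv
import io
import json


def lossy_cells(json_text: str) -> list:
    """Return (record_index, field) pairs whose value changes in a JSON -> CSV -> JSON round trip."""
    records = json.loads(json_text)
    # Use the union of keys across all records as the CSV columns.
    fields = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
    buffer.seek(0)
    restored = list(csv.DictReader(buffer))
    problems = []
    for index, (original, back) in enumerate(zip(records, restored)):
        for field in fields:
            # A missing key and an empty string both become an empty CSV cell.
            if original.get(field, "") != back[field]:
                problems.append((index, field))
    return problems
```

CSV has no types, so numbers, nulls and nested values are reported as lossy even when their text looks the same; a missing key and an empty string both become an empty cell, so that distinction is not detected.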

What next

  • Playlists, and greater access to the data.
  • I want to find more tools to explore the data with.
  • VisiData looks good: https://www.visidata.org/
  • Keeping this list up to date: I am trying to keep track of additions in GitHub issues. As the space program ramps up again, we will need a different approach.

Please do

Visit https://nasawakeupcalls.github.io/ and enjoy what's there.

Please let me know your ideas on how it can be expanded upon.