/
README
570 lines (399 loc) · 21 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
This is the README for the YAMZ metadictionary and includes instructions for
deploying on a local machine for testing and on Heroku (heroku.com) for a
scalable production version. These assume a Ubuntu GNU/Linux environment, but
should be easily adaptable to any system; YAMZ is written in Python and uses
only cross-platform packages. This file is formatted by hand and does not
contain markdown.
Authored by Chris Patton. Last updated 5 March 2016 (jak).
YAMZ is formerly known as SeaIce; for this reason, the database tables
and API use names based on "SeaIce".
Contents
========
0. Prerequisites
0.1 Create the deploy_keys branch
1. Configuring a local instance
1.1 Postgres authentication
1.2 Create the database
1.3 Create a role for standard queries
1.4 OAuth credentials and app key
1.5 N2T persistent identifier resolver credentials
1.6 Test the instance
2. Deploying to Heroku
2.1 Heroku-Postgres
2.2 Mailgun
2.3 Heroku-Scheduler
2.4 Making changes
2.5 Exporting the dictionary
3. URL forwarding
4. Building the docs
0. Prerequisites
================
The contents of this directory are as follows:
sea.py . . . . . . . . . . Console utility for scoring and classifying
terms and other things.
ice.py . . . . . . . . . . Web server front end.
digest.py . . . . . . . . Console email notification utility.
requirements.txt . . . . . Heroku package dependencies.
Procfile . . . . . . . . . Heroku configuration.
seaice/ . . . . . . . . . The SeaIce Python module.
html/ . . . . . . . . . . HTML templates, static Javascript and CSS,
including bootstrap.js.
doc/ . . . . . . . . . . . API documentation and tools for building it.
# .seaice/.seaic_auth . . . DB credentials, API keys, app key, etc. Note
.seaice/.seaice_auth . . . DB credentials, API keys, app key, etc. Note
that these files are just templates and don't
contain actual keys.
Before you get started, you need to set up a database and some software
packages. On Mac OS X, this may suffice:
$ pip install psycopg2 Flask configparser flask-login flask-OAuth \
python-dateutil
On Ubuntu, grab the following:
python-flask . . . . . . . Simple HTTP server.
postgresql . . . . . . . . We're using PostgreSQL for database managment.
python-psycopg2 . . . . . Python API for PostgreSQL.
python-pip . . . . . . . . Package manager for additional Python
programs.
and then download a package from pip that handles configuration files nicely:
$ sudo pip install \
configparser flask-login flask-OAuth python-dateutil urlparse
0.1 Create the deploy_keys branch
=================================
The 'master' branch contains all the code to deploy locally or on heroku.
To deploy to heroku, we first need to create a local branch (one time
operation) called "deploy_keys" and then check it out,
$ git branch deploy_keys # create branch
$ git checkout deploy_keys # update working tree
Then edit .seaice and .seaice_auth with actual API and app keys, and push
to heroku with
$ git push heroku deploy_keys:master
See section 2 for more on heroku.
IMPORTANT: NEVER PUSH THIS BRANCH TO GITHUB.
1. Configuring a local instance
===============================
To start, we'll set up a database in postgres. First, we need to do some
configuration. Postgres requires an administrative user called 'postgres'.
It may be a good idea to create a SeaIce user (called "role" in postgres
jargon) with read/write access granted on the tables. This work takes
place in the top-level source directory. On Linux (assuming postgres was
installed using sudo), set postgres' password:
$ sudo -u postgres psql template1
template1=# alter user postgres with encrypted password 'PASS';
template1=# \q [quit]
On a Mac, assuming homebrew installed the software in your home directory
(ie, without sudo), you'll need to initialize postgres first, with
something like these commands:
$ initdb /usr/local/var/postgres
$ cp /usr/local/Cellar/postgresql/9.6.3/homebrew.mxcl.postgresql.plist \
~/Library/LaunchAgents/
$ launchctl load -w ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
That makes postgres run automatically after a reboot. Create user 'postgres'
and set the password to 'PASS':
$ createuser -d postgres
$ createuser -d SeaIce
$ psql -U postgres -c "alter user postgres with encrypted password 'PASS'"
1.1 Postgres authentication
===========================
Now configure the authentication method for postgres and all other users
connecting locally. In /etc/postgresql/9.1/main/pg_hba.conf (on our Mac,
/usr/local/var/postgres/pg_hba.conf), change "peer" in the line that will
become (xxx shouldn't the md5 show below?)
local all postgres peer
to "md5" for the administrative account and local unix domain socket
connections. Next, we want to only be able to connect to the database from
the local machine. In /etc/postgresql/9.1/main/postgresql.conf (on our Mac,
/usr/local/var/postgres/postgresql.conf), uncomment the line
listen_addresses = 'localhost'
After you've done this, you need to restart the postgres server. On Linux,
$ sudo service postgresql restart
or on our homebrew-based Mac,
$ launchctl stop ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
$ launchctl start ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
1.2 Create the database
=======================
Finally, log back in to postgres to create the database,
$ sudo -u postgres psql
postgres=# create database seaice with owner postgres;
or on our Mac,
$ psql -U postgres -c 'create database seaice with owner postgres'
(Using unique, completely random passwords is a good idea here.) Next,
create a configuration file for the database and user account you set up.
Create a file called '.seaice' like:
[default]
dbname = seaice
user = postgres
password = PASS
xxx do this in a separate "local_deploy" dir?
xxx user = reader?
xxx separate section for [contributor] ? eg
xxx [contributor]
xxx dbname = seaice
xxx user = contributor
xxx password = PASSWORD3
IMPORTANT NOTE: A template of this file is provided in the github
repository. Your working version of this file should remain secret
and must not be published. Set the correct file permissions with:
$ chmod 600 .seaice
This file is used by the SeaIce DB connector to grant access to the database.
To initialize the DB schema and tables, type:
# $ ./sea.py --init-db --config=.seaice
$ ./sea.py --init-db --config=local_deploy/.seaice
xxx move this section above preceding, since you may need those roles?
1.3 Create a role for standard queries
======================================
At this point, it's suggested that you set up user standard read/write
permissions on the table (no DROP, CREATE, GRANT, etc.) for most of the
database queries. Note that this isn't applicable in Heroku; the postgres
interface there doesn't allow you to control user views.
postgres=# create user contributor with encrypted password 'PASS';
postgres=# \c seaice;
postgres=# grant usage on schema SI, SI_Notify to contributor;
postgres=# grant select, insert, update, delete on all tables in
schema SI, SI_Notify to contributor;
On our Mac, that would be
$ psql -c "create user contributor with
encrypted password 'PASS'" template1
xxx shouldn't we do the next line too? grant usage on schema ...
$ psql -c "grant usage on schema SI, SI_Notify to contributor" seaice
$ psql -c "grant select, insert, update, delete on all tables
in schema SI, SI_Notify to contributor" seaice
Add the configuration to '.seaice':
[contributor]
dbname = seaice
user = contributor
password = PASS
XXXX move this last few lines after [dev] and [heroku] are set up
xxx and/or add --deploy=dev so that local instance can come up
without having to configure heroku section
The web user interface creates a database connection pool with the
same role. You can specify this on the command line:
# $ ./ice.py --role=contributor --config=.seaice
$ ./ice.py --role=contributor --config=local_deploy/.seaice
'--role' defaults to 'default'.
1.4 OAuth credentials and app key
=================================
YAMZ uses Google for third party authentication (OAuth-2.0) management of
logins. Visit https://console.developers.google.com to set this service up
for your instance. Navigate to something like API Manager -> Credentials
and select whatever lets you create a new OAauth client ID. For local
configuration, supply these answers:
Application type . . . . . . . . . . Web application
Authorized javascript origins . . . http://localhost:5000
Authorized redirect URI . . . . . . http://localhost:5000/authorized
Create another set of credentials for your heroku instance, say yamz-dev:
Application type . . . . . . . . . . Web application
Authorized javascript origins . . . http://yamz-dev.herokuapp.com
Authorized redirect URI . . . . . . http://yamz-dev.herokuapp.com/authorized
In each case, you should obtain a pair of values to put into another
configuration file called '.seaice_auth'. Create or edit this file,
replacing google_client_id with the returned 'Client ID' and replacing
google_client_secret with the returned 'Client secret'.
xxx Where does app_secret come in? does it come from the 'API key'?
Manoj: app_secret only needed for heroku deployment
(See section 2.) XXX Are the instructions there complete, eg, redirect URL?)
XXX ?document this in a 0.x section, since it applies to local and heroku?
Next, create a configuration file called '.seaice_auth' with the appropriate
client IDs and secret keys. For instance, you may have credentials for
'http://localhost:5000', as well as a deployment on heroku:
XXX the google_client_id identifies your client software/(app?) and is
paired with the redirect URL, eg, one for 'http://localhost:5000'
and another for http://yamz.net...
xxx is this correct? each unique post-auth redirection target needs its
own unique google_client_id
XXX to do: allow local dev to take place offline, ie, without contact
with google for Auth or with minters and binders and n2t
xxx to do: let people create test terms that expire in two weeks
[dev]
google_client_id = 000-fella.apps.googleusercontent.com
google_client_secret = SECRET1
app_secret = SECRET2
[heroku]
google_client_id = 000-guy.apps.googleusercontent.com
google_client_secret = SECRET3
app_secret = SECRET4
IMPORTANT NOTE: A template of this file is provided in the github
repository. This file should remain secret and must not be published.
We provide the template, since heroku requires a commited file.
For convenience, this file will also keep the Flask app's secret key. For
this key, enter a long, random string of characters. Finally, set the correct
file permissions with:
$ chmod 600 .seaice_auth
1.5 N2T persistent identifier resolver credentials
==================================================
Whenever a new term is created, YAMZ uses an API to n2t.net (maintained by
the California Digital Library) in order to generate ("mint") a persistent
identifier. The main role of n2t.net is to be a resolver for the public-
facing URLs that persistently identify YAMZ terms. It is necessary to
provide a minter password for API access to this web service. To do so
include a line in ".seaice_auth" for every view:
minter_password = PASS
A password found in the MINTER_PASSWORD environment variable, however, will
be preferred over the file setting. This password is used again in the
API call to store metadata in a YAMZ binder on n2t.net. The main bit of
metadata stored is the redirection target URL that supports resolution of
ARK identifiers for YAMZ terms.
Because real identifiers are meant to be persistent, no local or test
instance of YAMZ should ever set the boolean "prod_mode" parameter in
".seaice_auth". For such instances the generated and updated terms
should just be for identifiers meant to be thrown away. Only on the
real production instance of YAMZ, when you're done testing term creation
and update, should it be set to "enable" (the default is don't enable).
1.6 Test the instance
=====================
First, create the database schema:
$ ./sea --config=.seaice --init-db
Start the local server with:
$ ./ice.py --config=.seaice --deploy=dev
If all goes well, you should be able to navigate to your server by typing
'http://localhost:5000' in the address bar. To verify that you've set up
Google OAuth-2.0 correctly, try logging in. This will create an account.
Try adding a new term, modifying and deleting a term, and commenting on
terms. To classify a term, do:
$ ./sea.py --config=.seaice --classify-terms
2. Deploying to Heroku
======================
The YAMZ prototype is currently hosted at http://yamz.herokuapp.com.
Heroku is a cloud-computing service which allows users to host web-based
software projects. Heroku is scalable for a price; however, we can
still achieve quite a bit without spending money. We have access to a
small Postgres database, can schedule jobs, use a variety of packages
(all we need are available), and deploy easily with Git. Some limitations
of Heroku are that it is impossible to set up DB roles and any local files
cannot be assumed to persist after a reboot.
To begin, you need to setup an account with Heroku and download their software.
(It's nothing major, just some tools for running commands, interacting with
the database, etc.) Visit http://www.heroku.com.
Heroku requires a couple additional configuration files and some small
code changes. The additional files (already set up in the repo) are:
Procfile . . . . . . . specifies the commands that start web server, as
well as periodic jobs.
requirements.txt . . . a list of packages required by our software that
Heroku needs to make available. These are
available via pip.
I used the following tutorial: https://devcenter.heroku.com/articles/python
to set these up. Once you've set up your heroku account, you're ready to
deploy.
The recommended best practice for managing your heroku instance is to set up
a local branch called 'deploy_keys' based on 'master'. In this branch, edit
.seaice and .seaice_auth to contain actual passwords and API and app keys.
NOTE: IT IS CRITICAL THAT YOU DON'T PUSH THIS BRANCH TO GITHUB.
Publishing these secrets compromises the security of the entire app.
Login via the heroku website and create a new app. Let's say we've named it
"fella". Navigate to the directory containing the cloned repository. Create
and checkout the branch 'deploy_keys'.
$ git checkout -b deploy_keys
$ heroku login
$ heroku git:remote -a fella
This creates a "slug" containing our code and its dependencies. To get the web
app running, we'll now need to set up a database and a couple of heroku backend
services.
2.1 Heroku-Postgres
===================
Heroku-Postgres is a scalable DB interface for heroku apps. (See the
python section of devcenter.heroku.com/articles/heroku-postgresql .)
To create a free addon,
$ heroku addons:create heroku-postgresql:hobby-dev
The 'master' branch is set up to use either a local postgres database server
or Heroku-Postgres. The location of the DB in the "cloud" is specified
when you create the heroku addon, and heroku automatically sets the
instance's environment variable DATABASE_URL, which you can query with
$ heroku config
Using 'sea' or 'ice' with '--config=heroku' will force SeaIce to use the
web address found in this variable to connect to the DB. (Note this is the
default.) Heroku-Postgres doesn't allow you to create roles, so '--role'
will be ignored and the default will be used. To create the database schema:
$ heroku run python sea.py --init-db
2.2 Mailgun
===========
YAMZ provides an email notification service for users who opt in. A utility
called 'digest' collects for each user all notifications that haven't
previously been emailed into a single digest. The code uses a heroku backend
app called Mailgun for SMTP service. To set this up, simply type (you may be
asked to verify your heroku account with a credit card, but note your card
should not be charged for the most basic service level)
$ heroku addons:create mailgun
This sets a number of instance environment variables (see "heroku config").
Of them the code uses "MAILGUN_SMTP_LOGIN" and "MAILGUN_SMTP_PASSWORD" to
connect to Mailgun. Normally that happens when notifications are harvested
by the scheduler (below), but to send out notifications manually, type:
$ heroku run python digest.py
2.3 Heroku-Scheduler
====================
There are two periodic jobs that need to be scheduled in YAMZ: the term
classifier and the email digest. To set this up, do:
$ heroku addons:create scheduler
$ heroku addons:open scheduler
The second command will take you to the web interface for the scheduler. Add
the following two jobs:
"python sea.py --classify-terms" . . . . . every 10 minutes
"python digest.py" . . . . . . . . . . . . once per day
2.4 Starting the instance
=========================
Now that your instance is all prepared, you can get it up and running with
$ git push heroku deploy_keys:master
This pushes the secret keys found in the local deploy_keys branch so that
they update the remote master branch on heroku. (xxx see section 1.4 and
??? for setting the secrets)
# xxx app_secret is the (api key?) password from netrc, or "heroku auth:token"?
2.5 Making changes
==================
Deploying changes to heroku is made easy with Git. Suppose we have changes
to 'master' that we want to push to heroku.
$ git checkout deploy_keys
$ git merge master # updates deploy_keys with latest master commits
$ git push heroku deploy_keys:master
The first command checks out the already created local 'deploy_keys' branch.
The second command merges the latest commits from the master branch into it,
and the final command updates the heroku master branch, which also restarts
the instance. This keeps the secrets outside the master branch.
When you next checkout the master branch, however, your keys and secrets in
the .seaice* files will be overwritten, so you may want to save them in
separate files that you can copy back in to the branch when you later deploy
again; just make sure those separate files don't ever become part of any
branch that will show up in the public github repo.
2.6 Exporting the dictionary
============================
The SeaIce API includes queries for importing and exporting database tables
in JSON formatted objects. This could be used to backup the entire database.
Note however that imports must be done in the proper order in order to satisfy
foreign key constraints. To back up the dictionary, do:
$ heroku config | grep DATABASE_URL
DATABASE_URL: <whatever>
$ export DATABASE_URL=<whatever>
$ ./sea.py --config=heroku --export=Terms >terms.json
3. URL forwarding
=================
The current stable implementation of YAMZ is redirected from http://yamz.net.
Setting this up takes a bit of doing. The following instructions are synthsized
from http://lifesforlearning.com/heroku-with-godaddy/ for redirecting a domain
name managed by GoDaddy to a Heroku app.
Launch the "Domains" app on GoDaddy. Under "Forward Domain" for the appropriate
domain (let's call it "fella.org"), add the following settings:
Forward to . . . . . . . . . . . . . . . . . . . . http://www.fella.org
Redirect type . . . . . . . . . . . . . . . . . . 301 (Permanent)
Forward settings . . . . . . . . . . . . . . . . . Forward only
Update nameservers and DNS settings
to support this change . . . . . . . . . yes
Next, under "Manage DNS", remove all entries except for 'A (Host)' and 'NS
(Nameserver)', and add the following under 'CName (Alias)':
Record type . . . . . . . . . . . . . . . . . CNAME (Alias)
Host . . . . . . . . . . . . . . . . . . . . . www
Points to . . . . . . . . . . . . . . . . . . http://fella.herokuapp.com
TTL . . . . . . . . . . . . . . . . . . . . . 1 Hour
Next, change the IP address for entry '@' under 'A (Host)' to 50.63.202.31
(the current IP address of yamz.net).
That's it for DNS configuration. The last thing we need to do is modify the
redirect URLs in the Google OAuth API. Edit the authorized javascript origins
and redirect URI by replacing "fella.herokuapp.com" with "fella.org" and
save.
It can take a couple hours to a day for your DNS settings to propogate. Once
it's done, you can navigate to YAMZ by typing "fella.org" into your browser.
Try logging in to verify that the OAuth settings are also correct.
4. Building the docs
====================
The seaice package (but not this README file) is autodoc'ed using
python-sphinx. To install on Ubuntu:
$ sudo apt-get install python-sphinx
The directory doc/sphinx includes a Makefile for exporting the docs to
various media. For example,
make html
make latex