Releases: gwu-libraries/sfm-ui

Version 3.0.0

24 Apr 14:50
eaf63a1

Bug/security fixes

  • Django upgraded to 3.2.18 (supported until 2024)

Support for Twitter API v.2

See sfm-twitter-harvester

  • Added support for v.2 API credentials, including the bearer token (recommended) and the combination of consumer key/secret and access token/secret
  • Added support (with twarc2) for harvesting and exporting from v.2 endpoints
  • Due to changes in the Twitter API access model, only the v.2 search_recent and user_timeline endpoints (accessible on the new Basic Access tier) are available in production. A new environment variable, TWITTER_COLLECTION_TYPES, specifies which of the supported Twitter API endpoints are available in the app.
  • Twitter v.1.1 endpoints have been disabled, but collections previously created via these endpoints are still available for export.
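As a rough illustration of how a variable like TWITTER_COLLECTION_TYPES could gate the endpoints offered in the app, the sketch below assumes the value is a comma-separated list of collection-type names. The value format, the placeholder names in SUPPORTED_TYPES, and the helper enabled_collection_types are all assumptions made for this example, not SFM's actual implementation; consult the sfm-docker configuration for the real format.

```python
import os

# Placeholder names for the collection types; the real identifiers may differ.
SUPPORTED_TYPES = ("search_recent", "user_timeline")


def enabled_collection_types(environ=os.environ):
    """Return the supported collection types listed in TWITTER_COLLECTION_TYPES.

    Assumes a comma-separated value; unknown or empty entries are ignored.
    """
    raw = environ.get("TWITTER_COLLECTION_TYPES", "")
    requested = [item.strip() for item in raw.split(",") if item.strip()]
    return [name for name in requested if name in SUPPORTED_TYPES]
```

Passing a plain dict instead of os.environ makes the helper easy to try out without changing the real environment.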

Outstanding issues

Streaming API

  • Streaming rules are handled as seeds; because the Streaming API supports multiple rules per request, an SFM stream collection can have multiple seeds. However, the functionality to limit exports to a subset of active/deleted seeds does not work for these collections. (The logic in SFM for seed-based export applies only to user-timeline collections.)
  • During testing, a long-running stream harvest encountered a "Read timed out" error from the Twitter API, after which no further Tweets could be collected until the harvest was voided in the UI and restarted. We consulted the twarc developers; the cause of the error remains unclear, but it may be related to the following:
    • Streaming harvests involve a periodic restart of the twarc.stream() process (every 30 minutes). This logic is designed to prevent excessively large WARC files (since a new WARC is created only at the start of the twarc.stream() process).
    • The twarc developers posit that this regular interruption of the twarc stream could cause problems, since the stream is designed to run continuously. Apparently, the v.2 API is less responsive than the v.1 API, so the API may return a timeout error if the previous connection hasn't fully closed by the time twarc tries to open a new one.
    • If that is the problem – and it's hard to know for sure – then introducing a sleep before restarting could be effective; however, that could result in missed Tweets (a risk already posed by restarting the stream every 30 minutes).
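The idea of sleeping before each restart can be sketched as follows. This is only a minimal illustration: run_segment is a hypothetical callable standing in for one bounded run of the stream (for example, 30 minutes of twarc.stream()), and the pause length is an arbitrary parameter, not a value recommended by the twarc developers.

```python
import time


def run_with_restarts(run_segment, segments, pause_seconds, sleep=time.sleep):
    """Call run_segment() `segments` times, pausing between consecutive runs.

    run_segment is a hypothetical callable that streams for a fixed period
    and returns once it has closed its connection.
    """
    for index in range(segments):
        run_segment()
        # Pause only between segments, giving the previous connection time
        # to close. Tweets posted during the pause are not collected.
        if index < segments - 1:
            sleep(pause_seconds)
```

The sleep function is injectable so the loop can be exercised without actually waiting; the trade-off noted above still applies, since anything posted during the pause is missed.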

Processing container

  • The processing container needs to be upgraded. The image fails to build because of dependency conflicts with the new versions of certain libraries in sfm-utils. We didn't tackle this work during this release because it will probably also involve upgrading the Python and Ubuntu versions used in the image. Since the processing container doesn't directly interact with other components, the 2.5.0 image should be fine to use for now with legacy collections, etc. To use the processing container with collections harvested from the v.2 API, however, an upgrade will be necessary.

Version 2.5.0

01 Nov 19:45

Changes in this release:

  • Upgrades Python version from 3.6 to 3.8 (#1071)
  • Completes configurability of RabbitMQ port (#1086)
  • Fixes display error with harvest stats (thanks, @sebastian-nagel!) (#1089)

Documentation updates:

  • Updates directory ownership options in installation docs (thanks, @sebastian-nagel!) (#1091)
  • Fixes Readthedocs configuration (#1092)

2.4.0...2.5.0

Version 2.4.0

07 Jul 14:00

This release contains required configuration updates for existing SFM instances. It is important to review the sfm-docker release notes carefully before upgrading from versions before 2.4.

Changes in this release:

  • This release introduces support for hosting data volumes on different filesystems, rather than as subdirectories in a single sfm-data directory (#1051). This allows RabbitMQ, Postgres, and SFM data for exports, containers, and collection sets to be separately configured. Thank you, @SvenLieber, for code contributions to add this feature! For existing SFM instances, please read carefully the sfm-docker release notes for required configuration changes.
  • Allows seeds to be deleted or undeleted while the collection is not active. Thanks for reporting this bug, @SvenLieber! (#1052)
  • Upgrades Django to 2.2.24 and updates djangorestframework. (#1043, #1049)
  • Upgrades Twarc version to fix bug with retweet text in CSV exports. (#1042)

Documentation updates:

  • Updates Amazon Web Services EC2 installation instructions. (#1068)
  • Explains new data volume configuration strategy. (#1051)
  • Explains required number of seeds in Twitter search collections. (#1052)
  • Adds directions on using JWAT Tools. (#1078)

Version 2.3.0

04 May 14:00

Changes in this release:

Documentation changes include:

  • Updates to instructions for obtaining credentials (#985, #999, #1008)
  • Referral to WordPad for users of older versions of Windows when opening README and other .txt files. (#1002)

2.2.0...2.3.0

Version 2.2.0

08 Sep 03:15

Changes in this release:

  • Upgraded Python libraries (#986, #973, #975)
  • Added language parameter for Twitter filter collections (#943)
  • Documentation - updated links to Twitter documentation (#981)
  • Bugfixes:
    • Weibo export (#983, #988)
    • Export page seed selection (#905)

2.1.0...2.2.0

Version 2.1.0

03 Jan 14:29

Changes in this release:

  • Upgraded Python libraries (#955 and #959)
  • Improved AWS deployment support:
    • Support for deployment with AWS Elastic Load Balancer through refinements to ALLOWED_HOSTS (#960) (Contributed by @justinlittman)
    • Support for AWS Simple Email Service by allowing a separate mail-from address (SFM_MAIL_FROM) in docker-compose.yml (#967). This is backwards compatible, so it will still work using EMAIL_HOST_USER, even if no SFM_MAIL_FROM is configured. (Contributed by @justinlittman)
  • Added queue length threshold configurations for the SFM UI component and for Twitter REST harvesters (#950)
  • Improved privacy of harvester status in the monitor view (#956)
  • Bugfixes:
    • Fixed Change Log "Fields" column (#952)
    • Fixed credential view erroneously showing as deleted (#949)
    • Fixed serializecollectionset management command (#945)
    • Fixed import of harvest warnings, errors, and info messages from serialized collection (#947)
    • Fixed exports to produce the correct size when 1,000,000 is requested; fixed the export page message for exports of size 100,000 (#957)
    • Fixed unit test for notifications (#948)
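The backwards-compatible mail-from behavior described under the AWS Simple Email Service item can be pictured as a simple fallback. The sketch below treats SFM_MAIL_FROM and EMAIL_HOST_USER as plain environment variables and uses a hypothetical helper, mail_from_address; how SFM actually reads these values in its settings is not shown in these notes, so this is an assumption for illustration only.

```python
import os


def mail_from_address(environ=os.environ):
    """Prefer SFM_MAIL_FROM; fall back to EMAIL_HOST_USER if it is unset or empty."""
    return environ.get("SFM_MAIL_FROM") or environ.get("EMAIL_HOST_USER")
```

With only EMAIL_HOST_USER configured, the helper returns that value, which mirrors the statement that existing configurations keep working.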

2.0.2...2.1.0

Version 2.0.2

13 Aug 21:08

Various minor tweaks:

  • Fixed serialization / deserialization and other management commands.
  • Fixed display issue with credentials on collection detail page.
  • Made SFM UI queue length configurable.

Version 2.0.1

07 Aug 17:57
Compare
Choose a tag to compare

No changes.

Version 2.0.0

25 Jul 16:19
  • Upgraded to Python 3, Django 2, and assorted other libraries.
  • Removed finalware and replaced with management commands.
  • Added management commands for deleting web harvests.
  • Fixed favicon link.

Version 1.12.0

12 Jun 19:10

Significant changes in this release:

  • Support for deactivating credentials.
  • Added paging to REST API.
  • Fixed links to Twitter docs.
  • Added public links field to collection to help with properly citing datasets.
  • Added handling / configuration of automatic seed deletion.
  • Removed handling / configuration of web harvester.
  • Added support for filtering by WARC created date to REST API.
  • Fixed defect in export segment size parameter.
  • Fixed defect in downloading exports on Safari.
  • Removed pinning of transitive dependencies.

Documentation changes include:

  • Fixed links to Twitter docs.
  • Added citation guidance page.
  • Updated processing container docs to reflect changes / additions.
  • Corrected smoke test instructions.
  • Deprecated web harvester and ELK.
  • Updated Twitter data dictionary to reflect change in Twitter export.
  • Updated export documentation to add detail about time zones.