
estela Entrypoint


The package implements a wrapper layer that extracts job data from the environment, prepares the job, and executes it with Scrapy.

Entrypoints

  • estela-crawl: Process job args and settings to run the job with Scrapy.
  • estela-describe-project: Print JSON-encoded project information and image metadata.
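Conceptually, estela-crawl turns the job description into a `scrapy crawl` invocation, forwarding job arguments as spider arguments. A minimal sketch of that step (the helper name is hypothetical, not the package's actual API):

```python
def build_scrapy_argv(job_info: dict) -> list:
    """Compose a `scrapy crawl` argument vector from job data (illustrative only)."""
    argv = ["scrapy", "crawl", job_info["spider"]]
    # Scrapy passes spider arguments with repeated -a name=value flags.
    for name, value in job_info.get("args", {}).items():
        argv += ["-a", f"{name}={value}"]
    return argv

print(build_scrapy_argv({"spider": "quotes", "args": {"page": "1"}}))
# → ['scrapy', 'crawl', 'quotes', '-a', 'page=1']
```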

Installation

$ python setup.py install 

Requirements

$ pip install -r requirements.txt

Environment variables

Job specifications are passed through environment variables:

  • JOB_INFO: Dictionary with the following fields:
    • [Required] key: Job key (job ID, spider ID and project ID).
    • [Required] spider: String spider name.
    • [Required] auth_token: User authentication token.
    • [Required] api_host: API host URL.
    • [Optional] args: Dictionary with job arguments.
    • [Required] collection: Name of the collection where items will be stored.
    • [Optional] unique: String, "True" if the data will be stored in a unique collection, "False" otherwise. Required only for cronjobs.
  • QUEUE_PLATFORM: The queue platform used by estela; see the list of currently supported platforms.
  • QUEUE_PLATFORM_{PARAMETERS}: Refer to the estela-queue-adapter documentation for the variables each platform needs.
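As an illustration, JOB_INFO is a JSON-encoded dictionary read from the environment. A minimal sketch of setting and validating it (the example values are hypothetical; field names mirror the list above):

```python
import json
import os

# Hypothetical JOB_INFO value; in estela this is injected by the platform.
os.environ["JOB_INFO"] = json.dumps({
    "key": "23.4.7",                          # job ID, spider ID and project ID
    "spider": "quotes",
    "auth_token": "<token>",
    "api_host": "https://api.estela.example",
    "args": {"page": "1"},                    # optional job arguments
    "collection": "quotes-23",
})

job_info = json.loads(os.environ["JOB_INFO"])

# Check the required fields listed above before launching the job.
required = {"key", "spider", "auth_token", "api_host", "collection"}
missing = required - job_info.keys()
if missing:
    raise RuntimeError(f"JOB_INFO is missing required fields: {sorted(missing)}")
```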

Testing

$ pytest

Formatting

$ black .