FrancoisChaumont/aws-athena-api-tools_php

Toolkit for AWS Athena API

Introduction

What does it do? It lets you do the following from the command line:

  • create/drop database
  • execute a single query
  • execute multiple queries simultaneously while remaining within your max rate limits
  • create partitions on Hive-formatted or non-Hive-formatted data
  • get the current state of one or more queries
  • stop a running query
  • create a named query
  • list & detail named queries
  • list & detail databases
  • list & detail database tables

Requirements

¹ The SDK should detect the credentials from environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), an AWS credentials INI file in your HOME directory, AWS Identity and Access Management (IAM) instance profile credentials, or credential providers.

Installation

Download a copy of this repository and run the following:

composer install

Configuration

Edit the following variables in the .env file. They provide the default values used when the related options are omitted:

  • PROFILE: AWS profile from ~/.aws/credentials
  • VERSION: AWS webservice version
  • REGION: AWS region to connect to
  • CATALOG: Athena data source catalog
  • WORKGROUP: Athena workgroup
  • QUERY_OUTPUT: S3 bucket for query results
  • AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL1QUERIES: level 1 queries max calls per second ¹
  • AWS_DEFAULT_MAX_BURST_CAPACITY_LEVEL1QUERIES: level 1 queries max burst capacity ¹ ⁰
  • AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL2QUERIES: level 2 queries max calls per second ²
  • AWS_DEFAULT_MAX_BURST_CAPACITY_LEVEL2QUERIES: level 2 queries max burst capacity ² ⁰
  • AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL3QUERIES: level 3 queries max calls per second ³
  • AWS_DEFAULT_MAX_BURST_CAPACITY_LEVEL3QUERIES: level 3 queries max burst capacity ³ ⁰
  • AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL4QUERIES: level 4 queries max calls per second ⁴
  • AWS_DEFAULT_MAX_BURST_CAPACITY_LEVEL4QUERIES: level 4 queries max burst capacity ⁴ ⁰
  • AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL5QUERIES: level 5 queries max calls per second ⁵
  • AWS_DEFAULT_MAX_BURST_CAPACITY_LEVEL5QUERIES: level 5 queries max burst capacity ⁵ ⁰
  • AWS_DEFAULT_SIMULTANEOUS_DDL_QUERIES: max simultaneous DDL queries ⁶
  • AWS_DEFAULT_SIMULTANEOUS_DML_QUERIES: max simultaneous DML queries ⁷

¹ BatchGetNamedQuery, ListNamedQueries, ListQueryExecutions
² CreateNamedQuery, DeleteNamedQuery, GetNamedQuery
³ BatchGetQueryExecution
⁴ StartQueryExecution, StopQueryExecution
⁵ GetQueryExecution, GetQueryResults (a value higher than 2 will exceed the max rate limit)
⁶ create table, create table add partition
⁷ select, create table as (CTAS)

⁰ max burst capacity not yet implemented
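
To illustrate what a "max calls per second" setting implies in practice, here is a minimal sketch of a fixed-interval throttle. It is not the toolkit's actual implementation: the `CallThrottle` class and its methods are hypothetical names, and reading the setting through `getenv()` is an assumption about how the .env values are exposed. Burst capacity is deliberately left out, in line with the note above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the toolkit): spaces calls so that no more
 * than $maxCallsPerSecond are issued per second. No burst handling.
 */
final class CallThrottle
{
    private float $minInterval;      // minimum seconds between two calls
    private float $lastCall = 0.0;   // timestamp of the last recorded call

    public function __construct(float $maxCallsPerSecond)
    {
        if ($maxCallsPerSecond <= 0) {
            throw new InvalidArgumentException('maxCallsPerSecond must be positive');
        }
        $this->minInterval = 1.0 / $maxCallsPerSecond;
    }

    /** Seconds to wait before a call made at time $now is allowed. */
    public function delayFor(float $now): float
    {
        return max(0.0, $this->lastCall + $this->minInterval - $now);
    }

    /** Records that a call was made at time $now. */
    public function record(float $now): void
    {
        $this->lastCall = $now;
    }

    /** Blocks until the next call is allowed, then records it. */
    public function wait(): void
    {
        $delay = $this->delayFor(microtime(true));
        if ($delay > 0) {
            usleep((int) ceil($delay * 1e6));
        }
        $this->record(microtime(true));
    }
}

// Assumption: the setting is visible via getenv(); fall back to 2 calls per second.
$maxCalls = (float) (getenv('AWS_DEFAULT_MAX_CALLS_PER_SECOND_LEVEL5QUERIES') ?: 2);
$throttle = new CallThrottle($maxCalls);

// Placeholder IDs; a real script would call GetQueryExecution inside the loop.
foreach (['id-1', 'id-2', 'id-3'] as $queryExecutionId) {
    $throttle->wait();
    echo "checking $queryExecutionId", PHP_EOL;
}
```

With a limit of 2, consecutive calls are spaced at least 0.5 seconds apart. A fixed interval is the simplest choice here; a token bucket would be the usual way to add burst capacity later.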

Important

  • In query files, double every % that is not part of a parameter placeholder (write %%). Otherwise sprintf will treat it as the start of a format specifier.

Example passing the year and month as parameters to build the table name:

SELECT DATE_FORMAT(FROM_UNIXTIME(1614716423), '%%Y-%%m-%%d %%H:%%i:%%S')
FROM database.table_name_%1$s%2$s
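
The substitution can be reproduced with plain PHP `sprintf`. The sketch below is only an illustration: `buildQuery` is a hypothetical helper, not a function from the toolkit, and the parameter values `2021` and `03` are made up.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: fills the positional placeholders of a query template.
 * Every %% in the template becomes a literal % in the result.
 */
function buildQuery(string $template, string ...$params): string
{
    return sprintf($template, ...$params);
}

// Nowdoc keeps the template verbatim, so $s is not treated as a PHP variable.
$template = <<<'SQL'
SELECT DATE_FORMAT(FROM_UNIXTIME(1614716423), '%%Y-%%m-%%d %%H:%%i:%%S')
FROM database.table_name_%1$s%2$s
SQL;

// Illustrative values for year and month.
$query = buildQuery($template, '2021', '03');

echo $query, PHP_EOL;
```

This prints the query with single percent signs in the date format and `table_name_202103` as the table name.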

The tools

See tools documentation for more details.

Testing

See tests documentation for more details.

AWS documentation

AWS documentation:

TODO

Methods:

  • BatchGetNamedQuery
  • BatchGetQueryExecution
  • CreateDataCatalog
  • CreatePreparedStatement
  • CreateWorkGroup
  • DeleteDataCatalog
  • DeletePreparedStatement
  • DeleteWorkGroup
  • GetDataCatalog
  • GetPreparedStatement
  • GetQueryResults
  • GetWorkGroup
  • ListDataCatalogs
  • ListEngineVersions
  • ListPreparedStatements
  • ListQueryExecutions
  • ListTagsForResource
  • ListWorkGroups
  • TagResource
  • UntagResource
  • UpdateDataCatalog
  • UpdatePreparedStatement
  • UpdateWorkGroup

Others:

  • implement burst capacity?