This repository has been archived by the owner on May 23, 2024. It is now read-only.

Create Glue Crawler is missed #35

Open
alytle opened this issue Mar 26, 2019 · 3 comments

Comments

@alytle

alytle commented Mar 26, 2019

Describe the bug
When creating a Glue Crawler from the console, the call that creates the Crawler itself is not captured.

Related Mapping
glue.CreateCrawler

Related Language
n/a

To Reproduce

  1. Go to https://console.aws.amazon.com/glue/home?region=us-east-1#catalog:tab=crawlers
  2. Click Add Crawler
  3. Fill out required information
  4. Click Add Crawler

Expected behavior
The Glue Crawler should be created in the generated code. Currently, the secondary resources created through the same wizard (Glue Connection, Glue Database) are captured successfully, but the Crawler itself is not.

Screenshots
n/a

Additional context
n/a

@iann0036
Owner

Hi Andrew,

Thanks for raising this. I tried to reproduce it, but the crawler resource was captured successfully on my end. Could you try to reproduce it and check both the main.html and bg.js console logs for any obvious issues?

Cheers,
Ian.

@alytle
Author

alytle commented Apr 1, 2019

OK, I tried again today and was able to narrow down the problem slightly. When I create a Glue Crawler with an S3 input source, it seems to work, but if I create one with a JDBC datastore, it doesn't get captured.

Here are the bg.js logs from the successful S3 crawler creation:

{"actionResponses":[{"action":"com.amazonaws.console.glue.shared.UserPreferenceRequestContext.getUserPreference"}]}  bg.js:5398:13

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.getSecurityConfigurations","data":{"securityConfigurations":[],"nextToken":""}}]}  bg.js:5398:13

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AmazonS3Context.listBuckets","data":[<redacted>]}  bg.js:5398:13

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.getConnections","data":{"connectionList":[{"name":"test","description":"","connectionType":"JDBC","connectionProperties":{"JDBC_ENFORCE_SSL":"false","JDBC_CONNECTION_URL":"jdbc:postgres://something:5432/databasename","USERNAME":"username"},"physicalConnectionRequirements":{"subnetId":"subnet-<redacted>","securityGroupIdList":["sg-<redacted>"],"availabilityZone":"us-east-1b"},"creationTime":1554130760555,"lastUpdatedTime":1554130760555}]}}]}  bg.js:5398:13
{"actionResponses":[{"action":"com.amazonaws.console.glue.shared.IAMRequestContext.listRoles","data":[{"roleName":"AWSGlueServiceRole-Glue","roleId":"<redacted>","arn":"arn:aws:iam::<redacted>:role/service-role/AWSGlueServiceRole-Glue"}]}]}  bg.js:5398:13

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.getCrawler","error":{"message":"{\"service\":\"AWSGlue\",\"statusCode\":400,\"errorCode\":\"EntityNotFoundException\",\"requestId\":\"e7f01c7f-548f-11e9-964b-05ff9084c187\",\"errorMessage\":\"Crawler entry with name s3-crawler does not exist\",\"type\":\"AwsServiceError\"}","code":400}}]}  bg.js:5398:13

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AmazonS3Context.listBuckets","data":[<redacted>]}  bg.js:5398:13

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.getDatabases","data":{"databaseList":[{"name":"default","description":"Default Hive database","locationUri":"hdfs://ip-172-20-79-123.ec2.internal:8020/user/hive/warehouse","createTime":1528479163000},{"name":"sampledb","description":"Sample database","parameters":{"CreatedBy":"Athena","EXTERNAL":"TRUE"},"createTime":1528479057000}]}}]}  bg.js:5398:13

Calling notify  bg.js:2022:5

Type error for parameter options (Property "buttons" is unsupported by Firefox) for notifications.create.  bg.js:2023

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.createCrawler","data":{}}]}  bg.js:5398:13

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.listCrawlers","data":{"crawlerNames":["s3-crawler"]}}]}  bg.js:5398:13

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.getTaggedResources","data":{"paginationToken":"","resourceTagMappingList":[]}}]}  bg.js:5398:13

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.batchGetCrawlers","data":{"crawlers":[{"name":"s3-crawler","role":"service-role/AWSGlueServiceRole-Glue","targets":{"s3Targets":[{"path":"s3://andlytle-test/","exclusions":[]}],"jdbcTargets":[],"dynamoDBTargets":[]},"databaseName":"default","classifiers":[],"schemaChangePolicy":{"updateBehavior":"UPDATE_IN_DATABASE","deleteBehavior":"DEPRECATE_IN_DATABASE"},"state":"READY","crawlElapsedTime":0,"creationTime":1554131284000,"lastUpdated":1554131284000,"version":1}],"crawlersNotFound":[]}}]}  bg.js:5398:13
{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.getCrawlerMetrics","data":{"crawlerMetricsList":[{"crawlerName":"s3-crawler","timeLeftSeconds":0.0,"stillEstimating":false,"lastRuntimeSeconds":0.0,"medianRuntimeSeconds":0.0,"tablesCreated":0,"tablesUpdated":0,"tablesDeleted":0}]}}]}  bg.js:5398:13

and here are the logs from the unsuccessful JDBC creation:

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.getCrawler","error":{"message":"{\"service\":\"AWSGlue\",\"statusCode\":400,\"errorCode\":\"EntityNotFoundException\",\"requestId\":\"8b011b01-5490-11e9-bb13-8bbe5f181416\",\"errorMessage\":\"Crawler entry with name jdbc-crawler does not exist\",\"type\":\"AwsServiceError\"}","code":400}}]}  bg.js:5398:13
{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AmazonS3Context.listBuckets","data":[<redacted>]}  bg.js:5398:13

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.getDatabases","data":{"databaseList":[{"name":"default","description":"Default Hive database","locationUri":"hdfs://ip-172-20-79-123.ec2.internal:8020/user/hive/warehouse","createTime":1528479163000},{"name":"sampledb","description":"Sample database","parameters":{"CreatedBy":"Athena","EXTERNAL":"TRUE"},"createTime":1528479057000}]}}]}  bg.js:5398:13

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.createCrawler","data":{}}]}  bg.js:5398:13

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.listCrawlers","data":{"crawlerNames":["jdbc-crawler","s3-crawler"]}}]}  bg.js:5398:13

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.getTaggedResources","data":{"paginationToken":"","resourceTagMappingList":[]}}]}  bg.js:5398:13

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.batchGetCrawlers","data":{"crawlers":[{"name":"s3-crawler","role":"service-role/AWSGlueServiceRole-Glue","targets":{"s3Targets":[{"path":"s3://andlytle-test/","exclusions":[]}],"jdbcTargets":[],"dynamoDBTargets":[]},"databaseName":"default","classifiers":[],"schemaChangePolicy":{"updateBehavior":"UPDATE_IN_DATABASE","deleteBehavior":"DEPRECATE_IN_DATABASE"},"state":"READY","crawlElapsedTime":0,"creationTime":1554131284000,"lastUpdated":1554131284000,"version":1},{"name":"jdbc-crawler","role":"service-role/AWSGlueServiceRole-Glue","targets":{"s3Targets":[],"jdbcTargets":[{"connectionName":"test","path":"%","exclusions":[]}],"dynamoDBTargets":[]},"databaseName":"default","classifiers":[],"schemaChangePolicy":{"updateBehavior":"UPDATE_IN_DATABASE","deleteBehavior":"DEPRECATE_IN_DATABASE"},"state":"READY","crawlElapsedTime":0,"creationTime":1554131558000,"lastUpdated":1554131558000,"version":1}],"crawlersNotFound":[]}}]}  bg.js:5398:13

{"actionResponses":[{"action":"com.amazonaws.console.glue.awssdk.shared.context.AWSGlueContext.getCrawlerMetrics","data":{"crawlerMetricsList":[{"crawlerName":"jdbc-crawler","timeLeftSeconds":0.0,"stillEstimating":false,"lastRuntimeSeconds":0.0,"medianRuntimeSeconds":0.0,"tablesCreated":0,"tablesUpdated":0,"tablesDeleted":0},{"crawlerName":"s3-crawler","timeLeftSeconds":0.0,"stillEstimating":false,"lastRuntimeSeconds":0.0,"medianRuntimeSeconds":0.0,"tablesCreated":0,"tablesUpdated":0,"tablesDeleted":0}]}}]}  bg.js:5398:13
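Both logs contain a createCrawler response, which suggests the request was sent and intercepted in the JDBC case too, and that the difference lies in what ends up in the generated output. A minimal Python sketch for pulling the action names out of pasted bg.js lines makes that side-by-side comparison quicker. The helper names are hypothetical, and it assumes each line is one JSON object followed by a `bg.js:line:col` location, as in the logs above.

```python
import json


def action_names(log_line: str) -> list:
    """Return the short action names from one bg.js log line.

    raw_decode parses the leading JSON object and ignores the trailing
    location text (e.g. "  bg.js:5398:13"). Lines that are not valid JSON,
    such as "Calling notify" or lines with a <redacted> placeholder, yield [].
    """
    text = log_line.strip()
    if not text:
        return []
    try:
        obj, _ = json.JSONDecoder().raw_decode(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(obj, dict):
        return []
    # "com.amazonaws...AWSGlueContext.createCrawler" -> "createCrawler"
    return [r["action"].rsplit(".", 1)[-1]
            for r in obj.get("actionResponses", []) if "action" in r]


def actions_in_log(log_text: str) -> list:
    """Collect action names from every line of a pasted bg.js log, in order."""
    names = []
    for line in log_text.splitlines():
        names.extend(action_names(line))
    return names
```

Running `actions_in_log` over each pasted log and checking for `"createCrawler"` in the result confirms whether the call was intercepted, independently of whether it appears in the generated code.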

Here is the complete generated output for my two crawlers. Only the S3 crawler's create_crawler call appears:

# pip install boto3

import boto3

glue_client = boto3.client('glue', region_name='us-east-1')

response = glue_client.get_security_configurations()
response = glue_client.get_connections()

s3_client = boto3.client('s3', region_name='us-east-1')

response = s3_client.list_buckets()
response = s3_client.list_buckets()
response = glue_client.get_databases()
response = glue_client.create_crawler(
    Name='s3-crawler',
    Role='arn:aws:iam::<redacted>:role/service-role/AWSGlueServiceRole-Glue',
    DatabaseName='default',
    Classifiers=[],
    Schedule='',
    Configuration='{"Version":1}',
    TablePrefix='',
    SchemaChangePolicy={
        'UpdateBehavior': 'UPDATE_IN_DATABASE'
    },
    Targets={
        'S3Targets': [
            {
                'Path': 's3://andlytle-test/',
                'Exclusions': []
            }
        ],
        'JdbcTargets': [],
        'DynamoDBTargets': []
    }
)
response = glue_client.get_classifiers()
response = glue_client.get_classifiers()
response = glue_client.get_security_configurations()
response = glue_client.get_connections()
response = s3_client.list_buckets()
response = s3_client.list_buckets()
response = glue_client.get_databases()
response = glue_client.get_classifiers()
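For comparison, the call missing from the output above would presumably resemble the sketch below. It is reconstructed by hand from the jdbc-crawler entry in the batchGetCrawlers response (connection `test`, Include Path `%`, database `default`) and trimmed to the fields shown there; it is not output produced by the extension. The helper names are hypothetical, and the account ID is a parameter because it is redacted in the thread.

```python
def jdbc_crawler_params(account_id: str) -> dict:
    """Keyword arguments for Glue create_crawler, mirroring jdbc-crawler.

    Values are copied from the batchGetCrawlers response in the JDBC log;
    account_id is a placeholder because the original ARN is redacted.
    """
    return {
        "Name": "jdbc-crawler",
        "Role": f"arn:aws:iam::{account_id}:role/service-role/AWSGlueServiceRole-Glue",
        "DatabaseName": "default",
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "DEPRECATE_IN_DATABASE",
        },
        "Targets": {
            "S3Targets": [],
            "JdbcTargets": [
                # "test" is the existing JDBC connection; "%" is the Include Path.
                {"ConnectionName": "test", "Path": "%", "Exclusions": []}
            ],
            "DynamoDBTargets": [],
        },
    }


def create_jdbc_crawler(glue_client, account_id: str) -> dict:
    """Call create_crawler on any client exposing it, such as
    boto3.client('glue', region_name='us-east-1')."""
    return glue_client.create_crawler(**jdbc_crawler_params(account_id))
```

The client is passed in rather than created inside the helper, so the parameters can be inspected without calling AWS; with boto3 installed and credentials configured, passing a real Glue client and your account ID would issue the request.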

@iann0036
Owner

iann0036 commented Apr 3, 2019

Hey Andrew,

Thanks for clarifying the difference between the S3 and JDBC Crawlers. It helped me trace what I believe was the issue to a decodeURIComponent call in the initial data processing. That call broke when the payload contained a % symbol, which is likely present in your Include Path for a JDBC connection.
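To illustrate the failure mode: in JavaScript, `decodeURIComponent('%')` throws a `URIError` because a bare `%` is not a valid percent-escape, and the jdbc-crawler's Include Path in the batchGetCrawlers response is exactly `"path":"%"`. The Python sketch below uses hypothetical helper names and is not the extension's actual 0.3.24 fix. It detects such stray `%` characters and shows that percent-encoding a value before decoding it lets a literal `%` round-trip via `%25`.

```python
import re
from urllib.parse import quote, unquote

# A '%' not followed by two hex digits is not a valid percent-escape.
# JavaScript's decodeURIComponent throws a URIError on input like this.
# (It can also throw on well-formed escapes that are not valid UTF-8,
# e.g. "%FF"; this check does not cover that case.)
STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_stray_percent(text: str) -> bool:
    """True if text contains a '%' that does not start a %XX escape."""
    return STRAY_PERCENT.search(text) is not None


def encode_then_decode(text: str) -> str:
    """Percent-encode every character except unreserved ones, then decode.

    Encoding first turns a literal '%' into '%25', so the decode step
    restores the original text instead of misreading it.
    """
    return unquote(quote(text, safe=""))


include_path = "%"  # the JDBC Include Path from the batchGetCrawlers response
print(has_stray_percent(include_path))   # True
print(quote(include_path, safe=""))      # %25
print(encode_then_decode(include_path))  # %
```

The S3 crawler's path (`s3://andlytle-test/`) contains no `%`, which is consistent with only the JDBC crawler being affected.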

I've released a new version, 0.3.24, which should fix the issue.
