This repository has been archived by the owner on Jun 8, 2023. It is now read-only.

Issue with geo_point mapping upon log import into ElasticSearch #11

Open

jgarrettsei opened this issue May 27, 2020 · 8 comments

@jgarrettsei
jgarrettsei commented May 27, 2020

Hello. I'm attempting to leverage these Cloudflare Elastic integration config files and I'm running into an issue that is preventing the logs from being imported. Here is a detailed account of my issue:

I’m attempting to follow the instructions here:
https://developers.cloudflare.com/logs/analytics-integrations/elastic/

I’m not using Elastic Cloud, but my own installation. I’ve got both ElasticSearch and Kibana set up and talking to each other, your ingest pipelines imported, the index template imported, the AWS Lambda function deployed, and logs flowing into S3. The issue appears when the Lambda function tries to send the logs into Elastic.

I see a ton of these errors in my “cluster.log” file on my Elastic EC2 server every time it tries to index logs:

[2020-05-20T12:06:31,803][INFO ][o.e.a.b.TransportShardBulkAction] [logs-node-1] [cloudflare-2020-05-18][0] mapping update rejected by primary
java.lang.IllegalArgumentException: mapper [source.geo.location] of different type, current_type [geo_point], merged_type [ObjectMapper]

Here is what I’m seeing in the AWS CloudWatch logs from the Lambda function:
[screenshot: cloudwatch_errors]

So, this seems to be an issue with the “geo_point” data type.
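For context on the error itself: once a field is mapped as geo_point, Elastic rejects any mapping update that would turn it into a plain object. A minimal way to reproduce that class of conflict (using a hypothetical index name, not the actual Cloudflare setup):

```
# create an index where "location" is explicitly a geo_point
PUT /conflict-demo
{"mappings": {"properties": {"location": {"type": "geo_point"}}}}

# try to re-map the same field as a plain object with float subfields
PUT /conflict-demo/_mapping
{"properties": {"location": {"properties": {"lat": {"type": "float"}, "lon": {"type": "float"}}}}}
```

The second call should be rejected with a similar geo_point vs. ObjectMapper merge conflict, so something in the ingest path must be submitting a mapping for source.geo.location that Elastic interprets as an object.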

Looking in your cloudflare-index-template.json file, I do see this:
[screenshot: index_template]

And I can see this mapping in your “weekly” ingest pipeline:
[screenshot: pipeline]

Doing a quick bit of research, “geoip” seems to be available in Logstash:

https://www.elastic.co/guide/en/logstash/current/plugins-filters-geoip.html

I did not install Logstash since I didn’t think it was needed for this implementation. It looks like “geoip” might be used to derive all of the other properties (timezone, city, etc.) from the “ClientIP” field in the logs. However, I see that this is also available as a default Elastic ingest processor:

https://www.elastic.co/guide/en/elasticsearch/reference/7.7/geoip-processor.html

When I run a quick API call against Elastic to look for available plugins, I do see geoip referenced:

curl --user <user>:<password> -X GET "###.##.###.##:9243/_nodes/plugins" | python -m json.tool | grep geo

{
                    "classname": "org.elasticsearch.ingest.geoip.IngestGeoIpPlugin",
                    "description": "Ingest processor that uses looksup geo data based on ip adresses using the Maxmind geo database",
                    "elasticsearch_version": "7.7.0",
                    "extended_plugins": [],
                    "has_native_controller": false,
                    "java_version": "1.8",
                    "name": "ingest-geoip",
                    "version": "7.7.0"
                }

So, it does seem that I have this installed, as far as I can tell. I ran a quick test to make sure geoip is working properly, using a couple of API calls to create a small pipeline with just the “geoip” processor:

curl --user <user>:<password> -X PUT "###.##.###.##:9243/_ingest/pipeline/testgeoip" -H "Content-Type: application/json" -d '{"description" : "Add geoip info","processors" : [{"geoip" : {"field" : "ip"}}]}'

I then created a small index using that pipeline with just a random IP:

curl --user <user>:<password> -X PUT "###.##.###.##:9243/my_index/_doc/my_id?pipeline=testgeoip" -H "Content-Type: application/json" -d '{"ip":"8.8.8.8"}'

I then fetched the contents of the index:

curl --user <user>:<password> -X GET "###.##.###.##:9243/my_index/_doc/my_id" | python -m json.tool

{
    "_id": "my_id",
    "_index": "my_index",
    "_primary_term": 1,
    "_seq_no": 0,
    "_source": {
        "geoip": {
            "continent_name": "North America",
            "country_iso_code": "US",
            "location": {
                "lat": 37.751,
                "lon": -97.822
            }
        },
        "ip": "8.8.8.8"
    },
    "_type": "_doc",
    "_version": 1,
    "found": true
}

So, it seems that geoip is working. However, the error message targets the “location” field specifically, and here it looks to be an object (lat and lon values). I have also performed a more accurate test:

Create pipeline (pulled from the Cloudflare file):

PUT /_ingest/pipeline/jmggeoip
{
  "description": "Jason Log Pipeline",
  "processors": [
    {
      "geoip": {
        "field": "ClientIP",
        "target_field": "source.geo",
        "properties": [
          "ip",
          "country_name",
          "continent_name",
          "region_iso_code",
          "region_name",
          "city_name",
          "timezone",
          "location"
        ]
      }
    }
  ]
}

Create index template mapping (pulled from the Cloudflare file):

PUT /_template/jmgtemplate
{
   "index_patterns": [
     "jmgindex-*"
   ],
   "mappings": {
      "properties": {
         "source.geo": {
            "properties": {
               "ip": {
                  "type": "ip"
               },
               "postal_code": {
                  "type": "keyword"
               },
               "location": {
                  "type": "geo_point"
               },
               "dma_code": {
                  "type": "long"
               },
               "country_code3": {
                  "type": "keyword"
               },
               "latitude": {
                  "type": "float"
               },
               "longitude": {
                  "type": "float"
               },
               "region_name": {
                  "type": "keyword"
               },
               "city_name": {
                  "type": "keyword"
               },
               "timezone": {
                  "type": "keyword"
               },
               "country_code2": {
                  "type": "keyword"
               },
               "continent_code": {
                  "type": "keyword"
               },
               "country_name": {
                  "type": "keyword"
               },
               "region_code": {
                  "type": "keyword"
               },
               "continent_name": {
                  "type": "keyword"
               },
               "region_iso_code": {
                  "type": "keyword"
              }
            }
         }
      }
   },
   "settings": {
      "index": {
         "number_of_shards": "1",
         "number_of_replicas": "1",
         "mapping.ignore_malformed": true
      }
   }
}

Create index (index pattern matching the template above and using the pipeline created above):

PUT /jmgindex-test/_doc/my_id?pipeline=jmggeoip
{"ClientIP":"8.8.8.8"}

Fetch the index:
GET /jmgindex-test/_doc/my_id

This call returns the following information:

{
  "_index" : "jmgindex-test",
  "_type" : "_doc",
  "_id" : "my_id",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "source" : {
      "geo" : {
        "continent_name" : "North America",
        "timezone" : "America/Chicago",
        "ip" : "8.8.8.8",
        "country_name" : "United States",
        "location" : {
          "lon" : -97.822,
          "lat" : 37.751
        }
      }
    },
    "ClientIP" : "8.8.8.8"
  }
}

So, as you can see, we are still getting latitude and longitude back. Now, let’s look at the field mapping:
[screenshot: field_mapping]
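For reference alongside the screenshot, the same check can be done with a field-mapping query (a sketch against the test index above):

```
GET /jmgindex-test/_mapping/field/source.geo.location
```

If the template applied, this should report "type": "geo_point" rather than a dynamically mapped object with float lat/lon subfields.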

Now, we are properly mapping to “geo_point”. However, while this example works, the ingest process I set up for Cloudflare does not. So, there must be something missing from the setup process. Just to reiterate, here is the error I’m getting when your Lambda function tries to insert a log into ElasticSearch:

[2020-05-20T12:06:31,803][INFO ][o.e.a.b.TransportShardBulkAction] [logs-node-1] [cloudflare-2020-05-18][0] mapping update rejected by primary
java.lang.IllegalArgumentException: mapper [source.geo.location] of different type, current_type [geo_point], merged_type [ObjectMapper]

This is why I’m hitting a wall. Everything “seems” to be set up properly on the Elastic side, and I think the above proves the geo_point mapping and geoip functionality are working fine.
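If it helps narrow this down, the live index named in the error can be checked the same way (a diagnostic sketch, using the index name from the log line above):

```
GET /cloudflare-2020-05-18/_mapping/field/source.geo.location
```

If that comes back as geo_point, the template did apply, and the rejected mapping update must be something the bulk request itself is triggering.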

If this is a bug, please let me know. Otherwise, I would appreciate some assistance in narrowing down this issue. Thanks.

@mr-karan

mr-karan commented Jun 2, 2020

+1 faced the same

@jgarrettsei
Author

Just ran an even more accurate test to try and narrow this down. I created a brand new index using the existing Cloudflare pipeline and Cloudflare index template that I had already submitted to Elastic. I also pulled a single JSON record from one of our edge logs that is being dumped to S3:

PUT /cloudflare-123/_doc/my_ip?pipeline=cloudflare-pipeline-weekly
{"BotScore":76,"BotScoreSrc":"Machine Learning","CacheCacheStatus":"hit","CacheResponseBytes":257510,"CacheResponseStatus":200,"CacheTieredFill":false,"ClientASN":####,"ClientCountry":"us","ClientDeviceType":"desktop","ClientIP":"###.###.###.###","ClientIPClass":"noRecord","ClientRequestBytes":4147,"ClientRequestHost":"www.sample.com","ClientRequestMethod":"GET","ClientRequestPath":"/shop/test","ClientRequestProtocol":"HTTP/2","ClientRequestReferer":"https://www.sample.com/shop/test2","ClientRequestURI":"/shop/test","ClientRequestUserAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36","ClientSSLCipher":"AEAD-AES128-GCM-SHA256","ClientSSLProtocol":"TLSv1.3","ClientSrcPort":52355,"ClientXRequestedWith":"","EdgeColoCode":"BNA","EdgeColoID":115,"EdgeEndTimestamp":"2020-05-20T00:00:06Z","EdgePathingOp":"wl","EdgePathingSrc":"macro","EdgePathingStatus":"nr","EdgeRateLimitAction":"","EdgeRateLimitID":0,"EdgeRequestHost":"www.sample.com","EdgeResponseBytes":61001,"EdgeResponseCompressionRatio":4.29,"EdgeResponseContentType":"text/html","EdgeResponseStatus":200,"EdgeServerIP":"","EdgeStartTimestamp":"2020-05-20T00:00:06Z","FirewallMatchesActions":[],"FirewallMatchesRuleIDs":[],"FirewallMatchesSources":[],"OriginIP":"","OriginResponseBytes":0,"OriginResponseHTTPExpires":"","OriginResponseHTTPLastModified":"","OriginResponseStatus":0,"OriginResponseTime":0,"OriginSSLProtocol":"unknown","ParentRayID":"00","RayID":"####","SecurityLevel":"med","WAFAction":"unknown","WAFFlags":"0","WAFMatchedVar":"","WAFProfile":"unknown","WAFRuleID":"","WAFRuleMessage":"","ZoneID":####}

This created a new index called “cloudflare-2020-05-18”. When I queried the index, it returned a valid result with geo_point information:

GET /cloudflare-2020-05-18/_doc/my_ip

…
"found" : true,
  "_source" : {
    "BotScoreSrc" : "Machine Learning",
    "source" : {
      "geo" : {
        "continent_name" : "North America",
        "region_iso_code" : "US-TN",
        "city_name" : "Murfreesboro",
        "country_iso_code" : "us",
        "timezone" : "America/Chicago",
        "ip" : "###.###.###.###",
        "country_name" : "United States",
        "region_name" : "Tennessee",
        "location" : {
          "lon" : -86.3881,
          "lat" : 35.8437
        }
      },
      "as" : {
        "number" : ####
      },
…

So, everything on the Elastic side seems to be working. I’m suspecting more and more that this is an issue with the Lambda function. It uses a deprecated bulk load method, so perhaps that is a factor? Here is that warning:

WARNING ... [types removal] Specifying types in bulk requests is deprecated."
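I don’t know exactly what the Lambda sends, but that warning suggests its bulk actions still carry a _type, roughly like this (a hypothetical payload, not taken from the actual function):

```
POST /_bulk
{"index": {"_index": "cloudflare-2020-05-18", "_type": "doc"}}
{"ClientIP": "8.8.8.8"}
```

In 7.x the typeless form, with the _type key dropped from the action line, is what is expected, and a typed request could plausibly interact badly with a typeless template mapping.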

@adrwh

adrwh commented Jun 26, 2020

I have exactly the same issue. Any progress on this yet?

@adrwh

adrwh commented Jun 26, 2020

This problem seems to be fixed in https://github.com/cloudflare/cloudflare-elastic/releases/tag/v0.3-7.x in the file named cloudflare-elastic-aws.zip (which is the Lambda function).

@jgarrettsei
Author

jgarrettsei commented Jun 26, 2020 via email

@adrwh

adrwh commented Jun 26, 2020

Yes, it fixed the geo_point mapping issue.

I also had another, similar issue, but I was able to resolve it by adding ignore_missing: true to my pipelines.

@jgarrettsei
Author

jgarrettsei commented Jul 1, 2020

Hi @adrwh . Just getting back to this. I checked the link you provided above (https://github.com/cloudflare/cloudflare-elastic/releases/tag/v0.3-7.x), but I don't see any files called "cloudflare-elastic-aws.zip". The zip I downloaded, cloudflare-elastic-0.3-7.x.zip, only seems to have the source Java and nothing pre-packaged in a ZIP file. Can you please double-check and make sure I'm looking in the correct place?

Unless perhaps gradlew needs to be run to recompile the code? I tried this previously when trying to debug the Java for the Lambda function, but could not get it to run properly.
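For what it’s worth, the usual Gradle flow for building from a source-only release would be something along these lines (a sketch; the exact task and artifact path depend on the project’s build.gradle):

```
# from the unzipped source directory
./gradlew clean build

# the packaged Lambda artifact would then typically land somewhere under build/
ls build/distributions/
```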

@jgarrettsei
Author

@adrwh Jackpot! After recompiling successfully with gradlew and uploading to AWS, it is working now! Thanks a lot for your help!


3 participants