Commit ddd1bd0

metron committed
Added all files and cleaned up dependencies
0 parents  commit ddd1bd0

7 files changed: +292 -0 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
venv/
IP2LOCATION-LITE-DB11.CSV

README.MD

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
# IP2Location Database Wrapper Service
IP2Location databases are commonly used in cyber security, anomaly detection, marketing, and numerous other industries to convert IP addresses into physical locations. IP2Location provides a number of free and paid databases that offer varying degrees of precision and data set features.

On their website they give some examples of how to use their CSV or BIN databases with systems like MySQL. To do this, we must first convert an IPv4 address into its integer (IP number) representation. This technique is extremely common and widely used; the gist is:

```
IP Number = 16777216*w + 65536*x + 256*y + z
where
IP Address = w.x.y.z
```
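
The same conversion can be sketched in Python. This is a minimal illustration written for Python 3 rather than code from this repository; the helper name `ip_to_number` is made up for the example, and the standard library's `ipaddress` module is used only as a cross-check.

```python
import ipaddress


def ip_to_number(ip_address):
    """Convert a dotted-quad IPv4 string (w.x.y.z) to its integer form."""
    w, x, y, z = (int(part) for part in ip_address.split("."))
    return 16777216 * w + 65536 * x + 256 * y + z


# Both lines print 2896691224 for the address used later in this README.
print(ip_to_number("172.168.0.24"))
print(int(ipaddress.IPv4Address("172.168.0.24")))
```

The result falls between the `ip_from` and `ip_to` values shown in the sample response at the end of this README.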

Then we would take that number and perform a range query on the database like so:

```sql
SELECT country_name, region_name, city_name, latitude, longitude, zip FROM IP2LOC WHERE (ip_from <= [SEARCH IP NO]) AND (ip_to >= [SEARCH IP NO]);
```

But as you might expect, this query can take a long time to run, especially if there are a lot of rows. In fact, on my initial test it took 1.04 seconds on average (with indexes) to return a result on a quad-core, 16GB CentOS server. That is ridiculously slow and is not suitable for production.

The code in this repository aims to improve lookup performance so that IP2Location data can be used in real-world systems where high throughput is necessary.

Instead of using a traditional database, this system capitalizes on the nature of the data: the IP ranges are sorted and do not overlap. It builds a single contiguous array of keys (the upper bound of each range) and then performs a binary search over the keys to return a result. On average it takes 22 operations to find the data we are looking for, and it returns this information in as little as 76 microseconds, which is orders of magnitude faster. However, it should be noted that in its current form this service runs **directly** on Flask. This is **not** acceptable for production. I still need to put the whole thing behind a production-strength reverse proxy like NGINX, which I will do at some point in the future. Still, after load testing I was able to get almost 2,000 queries per second through the service without it keeling over and dying.
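
The idea can be sketched with the standard library's `bisect` module. This is an illustration under stated assumptions, not the code in `search_ip_db.py`: the `Row` type is trimmed down, the three sample ranges are made up, and the sketch targets Python 3.

```python
from bisect import bisect_left
from collections import namedtuple

# Hypothetical, trimmed-down row type used only for this illustration.
Row = namedtuple("Row", "ip_from ip_to city_name")

# Made-up sample ranges, sorted by ip_to; real rows would come from the CSV.
ROWS = [
    Row(0, 99, "A"),
    Row(100, 199, "B"),
    Row(200, 299, "C"),
]
UPPER_BOUNDS = [row.ip_to for row in ROWS]  # the contiguous array of keys


def lookup(ip_number):
    """Return the row whose [ip_from, ip_to] range contains ip_number, else None."""
    index = bisect_left(UPPER_BOUNDS, ip_number)  # first key >= ip_number
    if index == len(ROWS):
        return None  # above the largest range
    row = ROWS[index]
    return row if row.ip_from <= ip_number else None


print(lookup(150))
```

A binary search over n keys needs roughly log2(n) comparisons. Since 2^22 is 4,194,304, about 22 steps is what one would expect for a table of up to roughly 4.2 million rows.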

To increase ease of use, this service has been wrapped as a Flask REST service. Unfortunately, adding HTTP causes the service to take quite a performance hit. After adding the REST wrapper, query time slowed to around 12ms per query. That is still much better than the 1.04 seconds of the database query, but considerably slower than the 76 microseconds we were able to get without the interface. Additionally, the entire service is packaged as an RPM using FPM (Effing Package Management) so that a user can simply use Yum to install it. Finally, the service is deployed as a Red Hat/CentOS/Fedora system-level (systemd) service. This means you can bring it up using commands like:

```bash
systemctl start ip2loc.service
```

In order to get the service running, you will need to add a Python virtual environment to the working directory such that

```
IP2LocationService/venv/
```

is a valid path. You will want to install all the requirements listed in requirements.txt, as they are necessary for the service to run properly. Then you will need to download an IP2Location CSV-based database. The free one [available here](http://lite.ip2location.com/database/ip-country-region-city-latitude-longitude-zipcode-timezone) is the easiest to get started with. Simply place the CSV in the working directory and start the service. The steps look like this:

1.) Clone this repo somewhere

2.) Create a virtual environment in `venv/` and use pip to install the included requirements.txt file

3.) Download an IP2Location database and place it in the same directory. Note that you may want to tailor the "named tuple" in ```search_ip_db.py``` to match the features present in your data. However, it should work with the recommended DB out of the box.

Your directory structure should now look something like this:
```
IP2LocationService/
    ip2loc.service
    make_links.sh
    remove_links.sh
    README.MD
    requirements.txt
    search_ip_db.py
    IP2LOCATIONDB.CSV
    /venv/
        all dependencies from requirements.txt
```

4.) Run FPM on the files in the directory with the following command:
```bash
fpm -n ip2locService -s dir -t rpm --prefix /opt/ --directories /opt/IP2LocationService --after-install ./IP2LocationService/make_links.sh --before-remove ./IP2LocationService/remove_links.sh IP2LocationService
```

5.) Install the RPM that was just created by running
```bash
sudo yum localinstall ip2locService-*.rpm
```

6.) Start the service!
```bash
sudo systemctl start ip2loc.service
```

7.) Query the service and get results!
```bash
curl 127.0.0.1:5000/ip2location/getcoor/172.168.0.24

{"type": "success", "result": {"ip_from": 2896690944, "ip_to": 2896692991, "country_code": "US", "country_name": "United States", "region_name": "Virginia", "city_name": "Dulles", "latitude": 38.997708, "longitude": -77.433179, "zip_code": "20166", "time_zone": "-04:00"}}
```
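
For callers that prefer Python over curl, the response from step 7 could be consumed along these lines. This is a sketch written for Python 3; the function names are made up for the example, and the base URL simply mirrors the host and port used in the curl command above.

```python
import json
from urllib.request import urlopen

# Assumed base URL, taken from the curl example above.
BASE_URL = "http://127.0.0.1:5000/ip2location/getcoor/"


def parse_lookup_response(body):
    """Return (latitude, longitude) from a response body, or None on error."""
    payload = json.loads(body)
    if payload.get("type") != "success" or not payload.get("result"):
        return None
    result = payload["result"]
    return result["latitude"], result["longitude"]


def get_coordinates(ip_address, base_url=BASE_URL):
    """Query a running service instance; the service must already be up."""
    with urlopen(base_url + ip_address, timeout=5) as response:
        return parse_lookup_response(response.read().decode("utf-8"))
```

Keeping the parsing separate from the network call makes it possible to check the parsing against a saved response without a running service.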

This software is very much in pre-alpha so if you find any issues please let me know!
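
Step 3 of the setup mentions tailoring the named tuple to the columns in your data. The sketch below shows one way that could look for a database that lacks the `zip_code` and `time_zone` columns. It is written for Python 3, the names `FIELDS`, `CONVERTERS`, and `parse_row` are made up for the example, and the sample CSV line is invented purely for illustration.

```python
import csv
import io
from collections import namedtuple

# Assumed column layout for a database without zip_code and time_zone.
FIELDS = "ip_from ip_to country_code country_name region_name city_name latitude longitude"
CONVERTERS = [int, int, str, str, str, str, float, float]
Row = namedtuple("Row", FIELDS)


def parse_row(raw_row):
    """Convert one CSV row (a list of strings) into a typed Row."""
    return Row(*(convert(value) for convert, value in zip(CONVERTERS, raw_row)))


# Invented sample line; real rows come from the downloaded CSV.
sample_csv = '"16777216","16777471","XX","Example Country","Example Region","Example City","1.5","-2.25"\n'
rows = [parse_row(raw) for raw in csv.reader(io.StringIO(sample_csv))]
print(rows[0])
```

Keeping the field names and converters side by side means a schema change only touches those two definitions.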

ip2loc.service

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
[Unit]
Description=IP2LocationService: Creates an externally facing REST service (running on port 5000) that accepts an IP address to GET /ip2location/getcoor/<ip>
After=network.target

[Service]
Type=forking
ExecStart=/opt/IP2LocationService/venv/bin/python /opt/IP2LocationService/search_ip_db.py

[Install]
WantedBy=default.target

make_links.sh

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
#!/bin/sh
RUN_LOC=/opt/IP2LocationService/
cp $RUN_LOC/ip2loc.service /etc/systemd/system/ip2loc.service
chmod 664 /etc/systemd/system/ip2loc.service
systemctl daemon-reload
systemctl enable ip2loc.service

remove_links.sh

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
#!/bin/sh
if [ -h /usr/bin/ip2loc ]
then rm /usr/bin/ip2loc
fi
if [ -e /etc/systemd/system/ip2loc.service ]
then systemctl disable ip2loc.service && rm /etc/systemd/system/ip2loc.service
fi
# A space is required after '[' for the test to parse
if [ -e /tmp/IP2LOCPID.pid ]
then rm /tmp/IP2LOCPID.pid
fi
systemctl daemon-reload

requirements.txt

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
click==6.6
Flask==0.11.1
ipaddress==1.0.17
itsdangerous==0.24
Jinja2==2.8
MarkupSafe==0.23
numpy==1.11.2
tqdm==4.8.4
Werkzeug==0.11.11

search_ip_db.py

Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
from flask import Flask, request, logging
from logging.handlers import RotatingFileHandler
import csv
from collections import namedtuple, OrderedDict
import sys
import os
from tqdm import tqdm
import numpy as np
import time
import json
import ipaddress


class ip2location_database:

    db = OrderedDict()
    db_keys = None
    ## IN THE ROW BELOW THE '_id' FIELD IS OMITTED BECAUSE NAMED TUPLES CANNOT HAVE FIELDS THAT BEGIN WITH AN UNDERSCORE
    ip2location_db_row = namedtuple('ip2location_db_row','ip_from ip_to country_code country_name region_name city_name latitude longitude zip_code time_zone')
    database_flatfile_path = None
    abs_min = None
    abs_max = None

    def __init__(self, db_name="/opt/IP2LocationService/IP2LOCATION-LITE-DB11.CSV"):
        self.database_flatfile_path = db_name
        print self.database_flatfile_path

    def read_database(self):
        try:
            first_line = True
            with open(self.database_flatfile_path, "rb") as fin:
                reader = csv.reader(fin)
                for row in tqdm(reader):
                    self.db[int(row[1])] = self.ip2location_db_row(ip_from=int(row[0]), ip_to=int(row[1]), country_code=str(row[2]), country_name=str(row[3]), region_name=str(row[4]), city_name=str(row[5]), latitude=float(row[6]), longitude=float(row[7]), zip_code=str(row[8]), time_zone=str(row[9]))
                    if first_line:
                        first_line = False
                        self.abs_min = int(row[0])
                self.db_keys = np.array(self.db.keys())
                self.abs_max = int(self.db_keys[-1])
        except:
            print "Failed to read database. Please make sure that file exists and has a schema matching the one found at http://lite.ip2location.com/database/ip-country-region-city-latitude-longitude-zipcode-timezone"
            print sys.exc_info()[0]
            raise

    def set_database_path(self, db_name):
        self.database_flatfile_path = db_name

    def find_one_ip(self, ip_address_to_query="172.217.3.206"):
        if '.' in ip_address_to_query:
            ip_address_to_query = int(ipaddress.IPv4Address(unicode(ip_address_to_query)))
        else:
            ip_address_to_query = int(ip_address_to_query)

        if ip_address_to_query < self.abs_min or ip_address_to_query > self.abs_max:
            return "UNDEFINED"

        # Binary search for the first range whose upper bound (ip_to) is >= the query.
        # Using low = mid + 1 / high = mid keeps index 0 reachable and guarantees termination.
        low = 0
        high = len(self.db_keys) - 1
        while low < high:
            mid = (low + high) // 2
            if ip_address_to_query > self.db_keys[mid]:
                low = mid + 1
            else:
                high = mid
        row = self.db[self.db_keys[low]]
        # Guard against a gap between consecutive ranges
        if ip_address_to_query >= row.ip_from:
            return row
        return "UNDEFINED"

    def find_many_ips(self, ip_addresses_to_query=["172.217.3.206"]):
        ip_num_to_ip_map = {}
        ip_nums = []
        for ip in ip_addresses_to_query:
            # Accept dotted-quad strings as well as integer (or numeric string) inputs
            if '.' in str(ip):
                ip_num = int(ipaddress.IPv4Address(unicode(ip)))
            else:
                ip_num = int(ip)
            ip_num_to_ip_map[ip_num] = ip
            ip_nums.append(ip_num)
        ip_nums.sort()
        results = {}
        min_index = 0
        # Walk the sorted keys and the sorted queries together
        for item in self.db_keys:
            # Consume every query at or below this range's upper bound
            while min_index < len(ip_nums) and ip_nums[min_index] <= item:
                if ip_nums[min_index] >= self.db[item].ip_from:
                    results[ip_num_to_ip_map[ip_nums[min_index]]] = self.db[item]
                min_index += 1
            if min_index == len(ip_nums):
                break
        return results

def daemonize():
    """
    do the UNIX double-fork magic, see Stevens' "Advanced
    Programming in the UNIX Environment" for details (ISBN 0201563177)
    http://www.erlenstar.demon.co.uk/unix/faq_2.html#SEC16
    """
    stdin = "/dev/null"
    stdout = "/dev/null"
    stderr = "/dev/null"
    try:
        pid = os.fork()
        if pid > 0:
            # exit first parent
            sys.exit(0)
    except OSError, e:
        sys.stderr.write("fork #1 failed: %d (%s)\n" % (e.errno, e.strerror))
        sys.exit(1)

    # decouple from parent environment
    os.chdir("/")
    os.setsid()
    os.umask(0)

    # do second fork
    try:
        pid = os.fork()
        if pid > 0:
            # exit from second parent
            sys.exit(0)
    except OSError, e:
        sys.stderr.write("fork #2 failed: %d (%s)\n" % (e.errno, e.strerror))
        sys.exit(1)

    # redirect standard file descriptors
    sys.stdout.flush()
    sys.stderr.flush()
    si = file(stdin, 'r')
    so = file(stdout, 'a+')
    se = file(stderr, 'a+', 0)
    os.dup2(si.fileno(), sys.stdin.fileno())
    os.dup2(so.fileno(), sys.stdout.fileno())
    os.dup2(se.fileno(), sys.stderr.fileno())

    # write pidfile
    # atexit.register(self.delpid)
    # pid = str(os.getpid())
    # file(self.pidfile,'w+').write("%s\n" % pid)

app = Flask(__name__)
db = ip2location_database()

@app.route("/lookup/", methods=['POST'])
def find_ip():
    result = {}
    try:
        args = request.get_json(force=True)
        result['type'] = "success"
        result['result'] = db.find_one_ip(args['ip'])._asdict()
        result['result']['_id'] = ""  # added here because namedtuples do not support field names that begin with '_'
        return json.dumps(result)
    except Exception:
        # Mirror the GET route: bad input or an address outside the database yields an error payload
        result['type'] = "error"
        result['result'] = None
        return json.dumps(result)

@app.route("/ip2location/getcoor/<ip>", methods=['GET'])
def emulate_reddys_service(ip):
    result = {}
    try:
        result['type'] = "success"
        result['result'] = db.find_one_ip(ip)._asdict()
        result['result']['_id'] = ""  # added here because namedtuples do not support field names that begin with '_'
        return json.dumps(result)
    except Exception:
        result['type'] = "error"
        result['result'] = None
        return json.dumps(result)

if __name__ == "__main__":
    daemonize()
    db.read_database()
    logger = logging.getLogger('werkzeug')
    handler = RotatingFileHandler('access.log', maxBytes=500)
    logger.addHandler(handler)
    app.run(host='0.0.0.0', port=5000)
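
The `/lookup/` route above reads the address from an `ip` key in a JSON body. A client request for it might be built as in the following sketch, which is written for Python 3 and is not part of this commit. The helper names are made up, and the URL assumes the host and port passed to `app.run` above.

```python
import json
from urllib.request import Request, urlopen

# Assumed URL: local host plus the port passed to app.run above.
LOOKUP_URL = "http://127.0.0.1:5000/lookup/"


def build_lookup_request(ip_address, url=LOOKUP_URL):
    """Build a POST request whose JSON body carries the 'ip' key the route reads."""
    body = json.dumps({"ip": ip_address}).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


def post_lookup(ip_address):
    """Send the request to a running service and decode the JSON reply."""
    with urlopen(build_lookup_request(ip_address), timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the route calls `request.get_json(force=True)`, the content type header is not strictly required, but sending it keeps the request self-describing.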

0 commit comments