Merge pull request #84 from fastly/zma/add_journal

Initial commit for build journal. We will use tools/saftw.py to gener…
fastly · Feb 14, 2017 · 7db7c22
2 parents 057af7f + 9cfaaf4
commit 7db7c22
Showing 12 changed files with 355 additions and 44 deletions.
diff --git a/MANIFEST b/MANIFEST
@@ -0,0 +1,12 @@
+# file GENERATED by distutils, do NOT edit
+setup.cfg
+setup.py
+ftw/__init__.py
+ftw/errors.py
+ftw/http.py
+ftw/logchecker.py
+ftw/ruleset.py
+ftw/testrunner.py
+ftw/util.py
+test/test_default.py
+test/test_modsecurityv2.py
diff --git a/README.md b/README.md
@@ -24,8 +24,8 @@ Goals / Use cases include:
 ## Provisioning Apache+Modsecurity+OWASP CRS
 If you require an environment for testing WAF rules, there has been one created with Apache, Modsecurity and version 3.0.0 of the OWASP core ruleset. This can be deployed by:
 
-* Checking out the repository: ``git clone https://github.com/fastly/waf_testbed.git```
-* Typeing ```vagrant up```
+* Checking out the repository: ``git clone https://github.com/fastly/waf_testbed.git``
+* Typing ```vagrant up```
 
 ## Running Tests while overriding destination address in the yaml files to custom domain
 * *start your test web server*

diff --git a/docs/Journaling.md b/docs/Journaling.md
@@ -0,0 +1,106 @@
+===============
+Journaling
+===============
+
+FTW supports the process of creating journal entries for your HTTP tests. The idea behind this stems from the need to decouple the sending of attacks from the testing of responses. This might be better explained with the following use cases:
+
+ 1. A pentester needs to issue attacks against a WAF but does not have access to the logs at the time of the test/series of attacks. A journal of attack requests and responses will help the pentester by correlating a database of FTW requests and responses with customer logs at a later time.
+
+ 2. A security engineer integrating FTW into their WAF environment does not want to check each FTW attack/response pair against a log. This is especially painful when logs are sent to a network service and the tool _beats_ the log service by checking for a log while it is still being sent, indexed, etc. This is not ideal because we cannot know ahead of time how long to wait before checking the log service for the existence of a log. With journaling, the security engineer can fire off an attack and then batch check the logs at a later date, querying a window of time without having to worry about network latency.
+
+Workflow
+==================
+The workflow is twofold. Run the FTW tool `build_journal.py` against a service with a WAF in front of it and collect response data. Once all of the response data is retrieved, run FTW as you would in any other integration scenario, but write a plugin that opens the sqlite database to retrieve logs instead of a file or a network API.
+
+Usage - Build the Journal
+==================
+
+1. `git clone git@github.com:fastly/ftw.git`
+2. `virtualenv ftwenv`
+3. `source ./ftwenv/bin/activate`
+4. `pip install -r requirements.txt`
+5. `./tools/build_journal.py --ruledir=dir`   
+  * This will produce `journal.sqlite`
+  * Check out the options in `build_journal.py` for specifying journal files, table names
+
+Once these steps are complete, you will have a `sqlite` file that you can explore and query by rule-id, time etc. 
+
+
+Usage - Using the Journal 
+==================
+
+Because FTW was built with custom integrations in mind, testers can follow steps similar to those in Step 4 of `ExtendingFTW.md`.
+
+A new API in the testrunner was created to pass in journal files to run FTW against. The testrunner will still need the `logchecker_obj` to call `get_logs()`, since it is correlating sqlite output with log output. Implement a logchecker just like the ones outlined in `ExtendingFTW.md`, and FTW will handle retrieving the correct times from sqlite for you.
+
+We will use `SpiderLabs/OWASP-CRS-regressions` as the example:
+
+```python
+from ftw import ruleset, logchecker, testrunner
+import pytest
+import sys
+import re
+import os
+import ConfigParser
+
+def test_crs(ruleset, test, logchecker_obj, with_journal, tablename):
+    runner = testrunner.TestRunner()
+    for stage in test.stages:
+        runner.run_stage_with_journal(test.ruleset_meta['name'], test, with_journal, tablename, logchecker_obj)
+
+class FooLogChecker(logchecker.LogChecker):
+
+    def reverse_readline(self, filename):
+        with open(filename) as f:
+            f.seek(0, os.SEEK_END)
+            position = f.tell()
+            line = ''
+            while position >= 0:
+                f.seek(position)
+                next_char = f.read(1)
+                if next_char == "\n":
+                    yield line[::-1]
+                    line = ''
+                else:
+                    line += next_char
+                position -= 1
+            yield line[::-1]
+
+    def get_logs(self):
+        import datetime
+        config = ConfigParser.ConfigParser()
+        config.read("settings.ini")
+        log_location = config.get('settings', 'log_location')
+        our_logs = []
+        pattern = re.compile(r"\[([A-Z][a-z]{2} [A-Z][a-z]{2} \d{1,2} \d{1,2}\:\d{1,2}\:\d{1,2}\.\d+? \d{4})\]")
+        for lline in self.reverse_readline(log_location):
+            # Extract dates from each line
+            match = re.match(pattern,lline)
+            if match:
+                log_date = match.group(1)
+                # Convert our date
+                log_date = datetime.datetime.strptime(log_date, "%a %b %d %H:%M:%S.%f %Y")
+                ftw_start = self.start
+                ftw_end = self.end
+                # If we have a log date in range
+                if log_date <= ftw_end and log_date >= ftw_start:
+                    our_logs.append(lline)
+                # If our log is from before FTW started stop
+                if(log_date < ftw_start):
+                    break
+        return our_logs
+
+@pytest.fixture
+def logchecker_obj():
+    return FooLogChecker()
+```
+
+Some notes here:
+  * The `FooLogChecker` inherits from `logchecker.LogChecker` so FTW knows it can call the `get_logs()` method
+  * We define a function decorated with `@pytest.fixture` so we can pass in a `logchecker_obj` when `test_crs` is called
+  * The `test_crs()` function looks similar to most FTW integrations, except it has two extra fixtures: `with_journal` and `tablename`
+  * When running `py.test`, pass in `--with-journal=/path/to/journal` and `--tablename=name` so they can be passed to the testrunner correctly. This ensures FTW queries the correct journal file and table for the FTW response data
+  * Since each stage must be tested and queried, we pass in the `test` fixture and run through each stage with `for stage in test.stages`
+  * `runner.run_stage_with_journal` requires the ruleset name (`test.ruleset_meta['name']`), the test object, the `with_journal` path, the `tablename` and the corresponding `logchecker_obj`
+
+Once you adhere to the new API call for the testrunner, that should be it! FTW will handle querying the sqlite table for the matching test ids, stage numbers and times, and will set those times on your logchecker so that `get_logs()` can return the matching lines from your log file.
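
The journal lookup described in `docs/Journaling.md` can be sketched against a throwaway in-memory database. The query string mirrors the one built by `query_for_stage_results` further down in this diff, and the column order follows the positions that `run_stage_with_journal` reads (start at index 2, end at 3, response at 4, status at 5). Everything else is an assumption made for illustration: only the column names `stage` and `test_id` appear in the diff, the real schema comes from `util.instantiate_database()` (not shown here), and `safe_tablename` and `stage_rows` are hypothetical helpers, not part of FTW.

```python
import re
import sqlite3

# Assumed schema for this sketch. Only `stage` and `test_id` are confirmed by
# the diff; the other column names are placeholders.
SCHEMA = (
    "CREATE TABLE {0} (rule_id TEXT, test_id TEXT, time_start TEXT, "
    "time_end TEXT, response_blob TEXT, status_code INTEGER, stage INTEGER)"
)


def safe_tablename(name):
    """Hypothetical guard: accept only plain identifiers, because a table
    name cannot be bound with a '?' placeholder."""
    if not re.match(r"[A-Za-z_][A-Za-z0-9_]*\Z", name):
        raise ValueError("unexpected table name: %r" % (name,))
    return name


def stage_rows(conn, tablename, test_id, stage):
    """Return every journal row recorded for one stage of one test."""
    query = ("SELECT * FROM %s WHERE stage = ? AND test_id = ?"
             % safe_tablename(tablename))
    return conn.execute(query, [stage, test_id]).fetchall()


# Build a one-row journal in memory and read it back.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA.format(safe_tablename("ftw")))
conn.execute(
    "INSERT INTO ftw VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("example-rules", "example-test-1", "2017-02-14 10:00:00.000001",
     "2017-02-14 10:00:00.250000", "HTTP/1.1 403 Forbidden", 403, 0),
)
rows = stage_rows(conn, "ftw", "example-test-1", 0)
```

Validating the table name before interpolating it is one way to narrow the SQL injection concern that the `query_for_stage_results` docstring acknowledges, while the stage and test id values stay bound through placeholders.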
diff --git a/ftw/pytest_plugin.py b/ftw/pytest_plugin.py
@@ -3,27 +3,6 @@
 import util
 import os
 
-def get_rulesets(ruledir, recurse):
-    """
-    List of ruleset objects extracted from the yaml directory
-    """
-    if os.path.isdir(ruledir) and recurse:
-        yaml_files = []
-        for root, dirs, files in os.walk(ruledir):
-            for name in files:
-                filename, file_extension = os.path.splitext(name)
-                if file_extension == '.yaml':
-                    yaml_files.append(os.path.join(root, name))
-    if os.path.isdir(ruledir) and not recurse:
-        yaml_files = util.get_files(ruledir, 'yaml')
-    elif os.path.isfile(ruledir):
-        yaml_files = [ruledir]
-    extracted_files = util.extract_yaml(yaml_files)
-    rulesets = []
-    for extracted_yaml in extracted_files:
-        rulesets.append(ruleset.Ruleset(extracted_yaml))
-    return rulesets
-
 def get_testdata(rulesets):
     """
     In order to do test-level parametrization (is this a word?), we have to
@@ -65,6 +44,20 @@ def http_serv_obj():
     """
     return HTTPServer(('localhost', 80), SimpleHTTPRequestHandler)
 
+@pytest.fixture
+def with_journal(request):
+    """
+    Return full path of the testing journal
+    """
+    return request.config.getoption('--with-journal')
+
+@pytest.fixture
+def tablename(request):
+    """
+    Set table name for journaling
+    """
+    return request.config.getoption('--tablename')
+
 def pytest_addoption(parser):
     """
     Adds command line options to py.test
@@ -77,6 +70,10 @@ def pytest_addoption(parser):
         help='fully qualified path to one rule')
     parser.addoption('--ruledir_recurse', action='store', default=None,
         help='walk the directory structure finding YAML files')        
+    parser.addoption('--with-journal', action='store', default=None,
+        help='pass in a journal database file to test')
+    parser.addoption('--tablename', action='store', default=None,
+        help='pass in a tablename to parse journal results')
 
 def pytest_generate_tests(metafunc):
     """
@@ -87,11 +84,11 @@ def pytest_generate_tests(metafunc):
     # Check if we have any arguments by creating a list of supplied args we want
     if [i for i in options if i in args and args[i] != None] :
         if metafunc.config.option.ruledir:
-            rulesets = get_rulesets(metafunc.config.option.ruledir, False)
+            rulesets = util.get_rulesets(metafunc.config.option.ruledir, False)
         if metafunc.config.option.ruledir_recurse:
-            rulesets = get_rulesets(metafunc.config.option.ruledir_recurse, True)            
+            rulesets = util.get_rulesets(metafunc.config.option.ruledir_recurse, True)            
         if metafunc.config.option.rule:
-            rulesets = get_rulesets(metafunc.config.option.rule, False)
+            rulesets = util.get_rulesets(metafunc.config.option.rule, False)
         if 'ruleset' in metafunc.fixturenames and 'test' in metafunc.fixturenames:
             metafunc.parametrize('ruleset,test', get_testdata(rulesets),
                 ids=test_id)
diff --git a/ftw/testrunner.py b/ftw/testrunner.py
@@ -1,10 +1,12 @@
 import datetime
+from dateutil import parser
 import errors
 import http
 import pytest
 import ruleset
 import util
 import re
+import sqlite3
 
 class TestRunner(object):
     """
@@ -54,6 +56,101 @@ def test_response(self, response_object, regex):
         else:
             assert False
 
+    def test_response_str(self, response, regex):
+        """
+        Checks if the response string contains a regex specified in the
+        output stage. It will assert that the regex is present.
+        """
+        if regex.search(response):
+            assert True
+        else:
+            assert False
+
+    def query_for_stage_results(self, tablename):
+        """
+        Construct query for sqlite database for a specific stage run from a journal
+        Possible SQL injection here, but since it's sqlite, anyone with control of the python script
+        and the sqlite database can just open the database/modify it without using our program
+        """
+        q = 'SELECT * FROM %s WHERE stage = ? AND test_id = ?' % tablename
+        return q
+
+    def run_stage_with_journal(self, rule_id, test, journal_file, tablename, logger_obj):
+        """
+        Compare entries and responses in a journal file with a logger object
+        This will follow similar logic as run_stage, where a logger_obj.get_logs()
+        MUST be implemented by the user so times can be retrieved and compared
+        against the responses logged in the journal db
+        """
+        assert logger_obj is not None
+        conn = sqlite3.connect(journal_file)
+        conn.text_factory = str
+        cur = conn.cursor()
+        for i, stage in enumerate(test.stages):
+            '''
+            Query DB here for rule_id & test_title
+            Compare against logger_obj
+            '''
+            q = self.query_for_stage_results(tablename)
+            results = cur.execute(q, [i, test.test_title]).fetchall()
+            if len(results) == 0:
+                raise errors.TestError(
+                    'SQL Query did not return results for test',
+                    {
+                        'rule_id': rule_id,
+                        'test': test.test_title,
+                        'query': q,
+                        'stage_num': i,
+                        'function': 'testrunner.TestRunner.run_stage_with_journal'
+                    })
+            result = results[0]
+            start = parser.parse(result[2])
+            end = parser.parse(result[3])
+            response = result[4]
+            status = result[5]
+            if (stage.output.log_contains_str or stage.output.no_log_contains_str):
+                logger_obj.set_times(start, end)
+                lines = logger_obj.get_logs() 
+                if stage.output.log_contains_str:
+                    self.test_log(lines, stage.output.log_contains_str, False)
+                if stage.output.no_log_contains_str:
+                    # The last argument means that we should negate the resp
+                    self.test_log(lines, stage.output.no_log_contains_str, True)
+            if stage.output.response_contains_str:
+                self.test_response_str(response,
+                                   stage.output.response_contains_str)
+            if stage.output.status:
+                self.test_status(stage.output.status, status)
+
+    def run_test_build_journal(self, rule_id, test, journal_file, tablename):
+        """
+        Build journal entries from a test within a specified rule_id
+        Pass in the rule_id, test object, and path to journal_file 
+        DB MUST already be instantiated from util.instantiate_database()
+        """
+        conn = sqlite3.connect(journal_file)
+        conn.text_factory = str
+        cur = conn.cursor()
+        for i, stage in enumerate(test.stages):
+            response = None
+            status = None
+            start = datetime.datetime.now()
+            try:
+                print 'Running test %s from rule file %s' % (test.test_title, rule_id)
+                http_ua = http.HttpUA()
+                http_ua.send_request(stage.input)
+                response = http_ua.response_object.response
+                status = http_ua.response_object.status
+            except errors.TestError as e:
+                print '%s got error. %s' % (test.test_title, str(e))
+                response = str(e)
+                status = -1
+            finally:
+                end = datetime.datetime.now()
+                ins_q = util.get_insert_statement(tablename)
+                cur.execute(ins_q, (rule_id, test.test_title, start, end, response, status, i))
+                conn.commit()
+
     def run_stage(self, stage, logger_obj=None, http_ua=None):
         """
         Runs a stage in a test by building an httpua object with the stage