
CS 121 Automation Tool

This tool was created by Angela Gong for Donnie Pinkston's CS 121 Introduction to Relational Databases course to automate the grading of SQL queries.

You can contact me at anjoola AT anjoola.com if there are any questions or issues regarding this grading tool.

Submission Format

Student submissions must follow a particular format in order for the automation tool to be able to distinguish each problem. The response to each problem must be preceded by the following tag:

-- [Problem 1a]

Here, 1a is the problem "number". It must match the problem number in the written assignment, which in turn must match the specs. There can be no SQL before the first tag; anything before it will not be run as part of the grading process, which may cause the rest of the student's code to fail.

Homework assignments might be split up into multiple files. Students must name their submitted files correctly or the automation tool will be unable to detect them. The file names are specified in the assignment and must match those in the specs.
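
As an illustration, problem tags could be detected with a regular expression along the following lines. This is only a sketch; the actual pattern and grouping used by parse_file in src/iotools.py may differ.

import re

# Hypothetical pattern for a tag such as "-- [Problem 1a]"; the real parser
# in src/iotools.py may use a different pattern.
PROBLEM_TAG = re.compile(r'^\s*--\s*\[Problem\s+(\S+)\]\s*$')

def split_into_problems(lines):
    """Group a submission's lines under the problem tag that precedes them."""
    problems = {}
    current = None
    for line in lines:
        match = PROBLEM_TAG.match(line)
        if match:
            current = match.group(1)        # e.g. "1a"
            problems[current] = []
        elif current is not None:
            problems[current].append(line)  # anything before the first tag is dropped
    return problems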

How it Works

The automation tool works mainly in loops:

for each student:
    for each file:
        for each problem:
            for each test in the problem:
                <do stuff>

It's possible to grade multiple students at a time by running multiple instances of the tool, each using a different database or user (see Usage for how to specify a different user or database for the automation tool).

Parsing Files

Before a student's file can be graded, it must be parsed (see function parse_file in src/iotools.py). The parser first splits the file up into problems, which it can detect based on the -- [Problem #] tags mentioned in Submission Format. Then, the following must be parsed:

  • Comments - Since MySQL accepts different types of comments, they must be handled separately. A single-line comment is easily detected by the leading --, but must be distinguished from a problem tag -- [Problem #]. There is a special case for when a comment is found within SQL code: this is an inline comment and is treated as SQL. A multi-line comment is enclosed by /* and */, and these symbols can appear anywhere on a line. Single-line comments can also appear within a multi-line comment, but since MySQL does not support nested comments, they are ignored and the whole block is treated as a multi-line comment.
  • DELIMITER statements - These are tricky since they are interpreted by the MySQL command-line tool but are not recognized by any connector. Students will generally have them in their code since they should have tested it. Once the parser (via a call to preprocess_sql in src/sqltools.py) finds an instance of DELIMITER !, where ! is the new delimiter, all further instances of the symbol ! are replaced with the standard MySQL delimiter ; until the delimiter is changed back again. This can be troublesome, as students sometimes forget to include the ending DELIMITER statement or include the delimiter in their comments. (A sketch of this substitution appears after this list.)
  • SQL - Anything that does not fall into the above categories is assumed to be SQL.
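
As a simplified sketch of the DELIMITER substitution described above (the real preprocess_sql in src/sqltools.py is more careful, for example about delimiters that appear inside comments):

import re

def replace_delimiters(sql):
    """Rewrite DELIMITER usage so that a connector can execute the SQL."""
    delimiter = ';'
    output = []
    for line in sql.splitlines():
        match = re.match(r'\s*DELIMITER\s+(\S+)', line, re.IGNORECASE)
        if match:
            # Remember the new delimiter and drop the DELIMITER line itself.
            delimiter = match.group(1)
            continue
        if delimiter != ';':
            line = line.replace(delimiter, ';')
        output.append(line)
    return '\n'.join(output)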

Tests

Each problem will have one or more tests associated with it. These aim to test various aspects of the problem. For example, for a CREATE FUNCTION problem, the tests will try to elicit all possible outputs and error cases for the function. For other types of problems, like INSERTs, there is only one test since there is only one possible outcome of the INSERT statement.

For problems with multiple tests, each test is worth a certain number of points. Failing one test may result in partial credit for a problem; failing all of them will give 0 points.

However, the total number of points from a problem's tests may sum up to more than the problem is worth. The test points should be thought of as potential deductions instead of possible points. Deductions cannot result in the score going below 0, but it's possible to get 0 points when failing only some of the tests.
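
As a rough sketch of this scoring rule (not the actual implementation), the score for a problem might be computed as follows:

def problem_score(problem_points, failed_test_points):
    """Treat each failed test's point value as a deduction, never going below 0."""
    return max(problem_points - sum(failed_test_points), 0)

# Example: a 5-point problem with tests worth 3 and 4 points (7 total).
print(problem_score(5, []))       # 5: all tests passed
print(problem_score(5, [4]))      # 1: failed only the 4-point test
print(problem_score(5, [3, 4]))   # 0: failed both; clamped rather than -2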

Problem Types

Each problem in an assignment can be classified into only one problem type (if it can be classified into more than one type, it should be split into two or more problems). Each type has its own file/class in the src/problemtype folder. The following are the problem types and how the automation tool grades them using tests:

  • CREATE FUNCTION - The student's implementation of the function is first loaded into the database. The tests, which should be SELECT statements that call the function, are run, and the resulting output is compared to the expected output (which needs to be manually entered). The expected output type (int, float, string, etc.) must also be specified, so that if a function outputs a float such as 0.33333333 and the student truncates and outputs 0.33333, they will not be marked incorrect.
  • CREATE PROCEDURE - Can only check procedures that modify a table. The test must run the procedure with specified arguments so the output is known. Once the procedure is run, the resulting diff on the table is computed and the grader must manually determine whether the output is correct.
  • CREATE TABLE - Makes sure the CREATE TABLE SQL can run, but that's all it can check.
  • CREATE TRIGGER - Can only check triggers which modify a table and activate when another table is modified. First, the trigger is activated and a diff is generated for the other table. This is rolled back and a query (which corresponds to what the trigger should have done) is run on the other table and a diff is generated. These two diffs are compared.
  • DELETE - Runs the student's DELETE statement and the solution DELETE statement on the table and compares the rows deleted from the table. Uses transactions to ensure that the table is the same before and after each DELETE statement.
  • INSERT - Runs the student's INSERT statement and the solution INSERT statement on the table and compares the rows inserted into the table. Uses transactions to ensure that the table is the same before and after each INSERT statement. (A sketch of this before-and-after pattern appears after this list.)
  • SELECT - Runs the student's SELECT statement and the solution SELECT statement and compares the resulting rows selected.
  • UPDATE - Runs the student's UPDATE statement and the solution UPDATE statement and compares the diffs that result on the table that was updated.
  • manual - There are also some problems that do not fall into any category. These problems generally require a short answer response instead of SQL, or are much simpler to grade manually than to automate (a CREATE INDEX statement, for example). These are denoted as a manual problem type.
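
The DELETE, INSERT, and UPDATE types all follow the same before-and-after pattern inside a transaction. A minimal sketch, assuming a DB-API connection with autocommit disabled (the actual logic lives in the src/problemtype classes and handles column selection, formatting, and error cases):

def diff_after_statement(conn, table, statement):
    """Run a data-modifying statement, capture the table diff, then roll back."""
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM " + table)
    before = set(cursor.fetchall())

    cursor.execute(statement)              # student's or solution's INSERT/DELETE/UPDATE

    cursor.execute("SELECT * FROM " + table)
    after = set(cursor.fetchall())

    conn.rollback()                        # leave the table exactly as it was
    return after - before, before - after  # (rows added, rows removed)

# The diff from the student's statement is then compared against the diff
# from the solution statement.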

Partial Credit

Other than receiving partial credit for failing tests, the automation tool attempts to give partial credit for mistakes that may have resulted from the student not reading the assignment closely. When one of these occurs, it is treated as a QueryError, which is defined in src/errors.py along with the point deductions. The code to check for these errors is found throughout the src/problemtypes/ files.

For these kinds of errors, the student would have gotten the answer completely correct had they done a few more things, such as ordering their columns correctly in their SELECT statement or ordering their results with an ORDER BY. Thus, they are given full points for the problem, minus a minor deduction.

The automation tool checks for some of these errors by seeing whether the student's response would be correct had they done some extra steps. Below is a list of the potential QueryErrors and how they are detected:

  • Column ordering - The student SELECTed columns in the wrong order. If the values within each result tuple in both the answer and the response are sorted, they should be equal if there is nothing else wrong with the SELECT statement.
  • Ordering of rows - The student forgot an ORDER BY clause or used the wrong ordering. This can be checked by sorting all of the rows and seeing if they are the same. (A sketch of both ordering checks appears after this list.)
  • View not updatable - The CREATE VIEW statement is run. Afterwards, the information schema is checked to see if the view is updatable.
  • Renaming computed values - This is done in a fairly naive way: the column names are checked to see if they contain parentheses, which suggests a computed value was never renamed.
  • Wrong number of columns - The student selected the wrong number of columns (detected when the result tuples have fewer or more values than expected). There is no partial credit awarded for this, but the error exists in order to inform the TA during grading.
  • Naming a view incorrectly - The student names a view incorrectly. It is up to the TA to determine how many points to take off for this.
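
A sketch of the two ordering checks, assuming rows are represented as tuples as returned by the connector (the actual checks live in the src/problemtype files):

def wrong_row_order(expected_rows, actual_rows):
    """Same rows, different order: suggests a missing or incorrect ORDER BY."""
    return (expected_rows != actual_rows and
            sorted(expected_rows, key=repr) == sorted(actual_rows, key=repr))

def wrong_column_order(expected_rows, actual_rows):
    """Same values in each row, different column order: suggests the columns
    were SELECTed in the wrong order."""
    normalize = lambda rows: sorted(
        (tuple(sorted(row, key=repr)) for row in rows), key=repr)
    return (expected_rows != actual_rows and
            normalize(expected_rows) == normalize(actual_rows))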

Generating Output

Output is generated as a JSON string, as detailed below. Some fields are optional and are problem type-specific.

{
  "start": <start grade time (string)>,
  "end": <end grade time (string)>,
  "files": <name of files that were graded (list of strings)>,
  "students": [{
    "name": <name of student (string)>,
    "files": <file-specific output (see below)>,
    "got_points": <total number of points the student received (int)>
  }]
}

Output for a file is of the form:

<filename (string)>: {
  "filename": <name of file (string)>,
  "got_points": <total number of points received for this file (int)>,
  "errors": <list of errors for this file (list of strings)>,
  "problems": [
    {
      "num": <problem number (string)>,
      "sql": <student's response to the question (string)>,
      "comments": <student's comments (string)>,
      "got_points": <number of points received for this problem (int)>,
      "errors": <list of errors from this problem (list of strings)>,
      "tests": [
        {
          "deductions": <list of deductions from this test (list of strings)>,
          "errors": <list of errors from this test (list of strings)>,
          "got_points": <number of points received for this test (int)>,
          "success": <result of the test (SuccessType found in src/types.py)>,
          "expected": <expected output (string)> [optional],
          "actual": <actual output (string)> [optional],
          "before": <contents of a table before running the student's query (string)> [optional],
          "after": <contents of a table after running the student's query (string)> [optional]
        }
        .. more tests ..
      ]
    }
    .. more problems ..
  ]
}

The JSON is then formatted into HTML for easier viewing by the format and format_student functions in src/formatter.py: format generates the overall page with links to each student, and format_student generates all the test output for a particular student.

Other resources can be found in the style/ folder, which is copied to each output directory if it doesn't already exist. These files and the formatting functions can be modified to change the way the output looks.

How to Grade

While the automation tool makes it very easy to check for correctness, it is by no means comprehensive. It should be treated more as an aid in grading, and the TA should still read through the student's responses. The points are determined by giving the student full points for each problem, minus the deductions from failing tests as specified in the specs. Obviously, since this is an automated tool, a test can only pass or fail; there is no partial credit within tests (other than those listed above). Therefore, it is up to the TA to use their best judgement to give partial credit.

Below is a list of some of the things the tool will not do perfectly:

  • Poor file formatting - If the student does not format their file correctly, it's possible that the tool cannot find any of their answers and will give them an automatic 0. Or, if they put necessary code before the problem header, then it won't be run and all their other queries could fail.
  • Missing answers - If a student does not answer a problem and another problem depends on it (for example, they do not answer a CREATE FUNCTION problem and another problem asks to use it in a SELECT statement), then they will be marked wrong for both questions even though the second one might be correct.
  • Style issues - The automation tool will indicate style issues, such as having too many lines over 80 characters. However, since it's possible that there was no other way the student could have written their file, the tool does not automatically deduct points, and it is up to the TA to do so.
  • Missing temporary stuff - Some students have queries that use temporary tables or functions. They may have forgotten to include it in their submission, causing all problems that depend on these temporary things to fail.
  • Connection problems - If the grading tool cannot connect to the database server, the connection will time out and the tool will mark the student's responses as incorrect (since they appear to take much longer than usual).
  • Strange characters - Students might not have packaged their files correctly or might not have used a standard encoding. As a result, files may contain strange extra characters that prevent the tool from reading the file or cause it to attempt to execute a corrupt SQL query.

Examining Output

After running the tool, output is generated at the location assignments/<assignment name>/_results/index.html. Here you can navigate each student's output to determine their grade. You can change students by using the left sidebar and change files using the top bar. You are also able to show and hide the student's raw file by clicking the yellow button on the bottom right.

Student Output

You can also generate output for students to view using the --hide flag as described in Usage. Students can view how the automation tool parsed their SQL, and whether or not they passed the tests. Note: Since test output is hidden, students do not get a complete view of their results, so a problem might be marked wrong when it actually isn't, due to limitations of the tool.

Tools and Backend

The automation tool has various elements working in the backend to ensure that everything is running smoothly and quickly.

Cache

The cache, located in cache.py, allows for faster grading by caching query results. This is used most often for SELECT statements. If caching is enabled, then prior to running a query against the database, the tool first checks the cache to see if there is a stored result. If so, an actual query against the database is avoided, saving time.

The cache returns a deep copy of the stored results, since it is possible that the results are modified for formatting purposes.

Results are not cached for other types of queries like DELETE statements, since there aren't actually any results returned by such queries. Caching is disabled by default but can be enabled by specifying cached=True when using the execute_sql function in src/dbtools.py.
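
The idea behind the cache can be sketched as follows; the real implementation in cache.py differs in the details (such as how queries are keyed):

import copy

class QueryCache:
    """Cache query results, returning deep copies so callers can safely modify
    what they get back (for example, for formatting)."""

    def __init__(self):
        self._results = {}

    def get(self, query):
        if query in self._results:
            return copy.deepcopy(self._results[query])
        return None

    def put(self, query, result):
        self._results[query] = copy.deepcopy(result)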

Consistency

Since the automation tool will be grading multiple students in succession, it is critical that the database on which the students are graded remains consistent. There are several measures in place to ensure this:

  • Transactions and Savepoints - For problems that modify the database, transactions and savepoints are used to ensure that the database is not modified after grading a problem, provided the spec enables the rollback option. If rollback is specified, a transaction is started (for example, prior to running the student's DELETE statement) and rolled back once the solution DELETE statement has run. The database is therefore in the same state before and after the problem is graded, as if nothing had happened.
  • Using the information schema - Prior to grading each student, the state of the database (the tables with their foreign keys, plus views, functions, procedures, and triggers) is recorded. After grading the student, the state is determined again, and any new tables, triggers, etc. that have appeared are dropped so that the database returns to the state it was in before grading began. (A sketch of this snapshot-and-cleanup step appears after this list.)
  • Setup and teardown queries - Each grading run, each problem, and each problem's test can specify a set of setup and teardown queries (see the specs for more details). The setup queries are run before grading and the teardown queries afterwards. The setup queries usually create indexes or run other miscellaneous statements that can't be generated automatically; the teardown queries do the same for cleanup.
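
A sketch of the information-schema snapshot and cleanup for tables only (the helper names here are hypothetical; the real tool does the same for views, functions, procedures, and triggers):

def snapshot_tables(cursor, db_name):
    """Record the set of tables currently in the database."""
    cursor.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = %s", (db_name,))
    return set(row[0] for row in cursor.fetchall())

def drop_new_tables(cursor, before, after):
    """Drop any tables that appeared while grading a student."""
    for table in after - before:
        cursor.execute("DROP TABLE IF EXISTS " + table)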

Unruly Queries

Since this is an automated tool, it is essential for it to be protected against unruly queries, such as queries that take too long or those that harm the database.

Malformed or Unexpected SQL

Students often have extra SQL in their responses. For example, for a SELECT question, they might have created a temporary table and SELECTed from that. For a CREATE TABLE statement, they might have random SELECT statements in their response for no reason. The automation tool attempts to fix this by deciding whether or not extra SQL statements are valid and then taking only the valid parts (via find_valid_sql in src/sqltools.py, among other functions in the specific problem-type files).

Timeouts and the Terminator

Queries have a maximum time to execute before they time out; the default value MAX_TIMEOUT is specified in the src/CONFIG.py file, and a specific value for each problem can be assigned as timeout in the spec file. This limit ensures that grading does not take forever. It also serves as an indicator that something might be wrong with the student's query if it takes such a long time to execute (but be sure to check that it isn't a network connection problem).

The src/terminator.py daemon is used to terminate these unruly queries. It does this by maintaining a separate connection with the database. When a query times out, a TimeoutError is raised and caught, and the automation tool notifies the terminator to terminate the thread running the query that caused this error. The automation tool keeps track of this thread by its thread_id.
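
The terminator's core action can be sketched as a KILL QUERY issued over its own connection; the actual daemon in src/terminator.py also manages when and how this happens:

def kill_query(terminator_conn, thread_id):
    """Ask MySQL to abort the statement running on the given connection thread.

    terminator_conn is a separate connection kept open just for this purpose,
    so it remains responsive while the graded query hangs.
    """
    cursor = terminator_conn.cursor()
    cursor.execute("KILL QUERY %d" % thread_id)
    cursor.close()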

Malicious Queries

An example of a malicious query is as follows: instead of an expected SELECT statement, the student may instead write a DROP DATABASE statement for the tool to execute. The automation tool aims to protect against these queries by making sure that prior to running each query on a database, the query being run is expected. So if the tool is grading an INSERT statement, it expects to find INSERT in the SQL, and there can only be one SQL statement. This code can be found in the check_valid_query function in src/sqltools.py. Currently this is turned off because students like to put code before their answer, causing this function to return false negatives. Really, this function needs to be improved.

Of course, if a student really wanted to be evil, the automation tool cannot protect against them, as check_valid_query is a very basic function. Therefore, when setting up the test user on the testing database, make sure that critical permissions (such as the ability to drop the database) are disabled.
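
The kind of check check_valid_query performs can be sketched as follows, under the assumption that a response should contain exactly one statement starting with the expected keyword (this mirrors the intent, not the exact implementation):

def looks_like_expected_query(sql, expected_keyword):
    """Very basic sanity check: exactly one statement, starting with the
    expected keyword (e.g. "INSERT")."""
    statements = [s.strip() for s in sql.split(';') if s.strip()]
    if len(statements) != 1:
        return False
    return statements[0].upper().startswith(expected_keyword.upper())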

Utilities

There are several files that simply collect a bunch of useful helper functions.

Database Tools (src/dbtools.py)

This is the file that contains all database-related things, such as the connection and queries. All other files that interact with the database must go through functions in this file first. The reason it is all in one file is so that if the database library changes, or you switch from MySQL to PostgreSQL, you'll only need to modify one file. Other related functions are here too, such as a function to import SQL files into the database.

IO Tools (src/iotools.py)

All input/output related functions are found in this file. This includes functions to parse assignment specs and student files, get student files from folders, and output results.

SQL Tools (src/sqltools.py)

Functions related to SQL are found here, including processing SQL for DELIMITER statements, checking that a SQL statement is not malicious, finding valid SQL within a large SQL statement, and splitting SQL queries.

Spec Files

Each assignment should have one spec file in the assignment folder. Spec files for the CS 121 assignments can be found here. The spec file specifies how each problem is graded, including the number of points, the tests to run, dependencies, and other properties.

General Format

The overall format of the file is as specified below:

{
  "assignment": <assignment name (string)>,
  "files": <files to grade (list of strings)>,
  "setup": [
    # "setup" can be one of three types, "dependency", "import", or "queries"
    {"type": "dependency", "file": <file containing SQL to run prior to grading (string)>},
    {"type": "import":, "file": <SQL data file to directly import using command-line (string)>},
    {"type": "queries": "queries": <queries to run (list of strings)>}
  ],
  "teardown": <teardown queries (list of strings)>,

  <filename (string)>: [
    {
      "number": <problem number (string)>,
      "points": <point value (int)>,
      "comments": <if comments are required (boolean)>,
      "cached": <whether or not to get the results from the cache (boolean)>,
      "setup": <setup queries (list of strings)>,
      "teardown": <teardown queries (list of strings)>,
      "keywords": <keywords to check for (list of strings)>,
      "type": <problem type, such as 'select', 'function', etc. (string)>,
      "tests": [
        {
          "points": <point value (int)>,
          "desc": <description of test (string)>,
          "query": <test query (string)>,
          "timeout": <how long to run the query for in seconds before giving up (int)>,

          ... fields specific to the problem type ...
        },
      ]
    }
  ]
}

An example of this:

{
  "assignment": "cs121hw9",
  "files": ["file1.sql", "file2.sql"],
  "setup": [
    {"type": "dependency", "file": "make-database.sql"},
    {"type": "queries": "queries": [
      "CREATE INDEX fast_index(tbl1)",
      "DROP VIEW IF EXISTS view1"
    ]}
  ],
  "teardown": [
    "DROP VIEW IF EXISTS view2"
  ],

  "file1.sql": [
    {
      "number": "1a",
      "points": 5,
      "comments": true,
      "setup": ["START TRANSACTION"],
      "teardown": ["DROP FUNCTION IF EXISTS fn1"],
      "type": "select",
      "tests": [
        {
          "points": 3,
          "desc": "Simple test to get what's in the table",
          "query": "SELECT * FROM tbl1"
        },
        {
          "points": 4,
          ...
        }
      ]
    },
    {
      "number": "2",
      ...
    }
  ],

  "file2.sql": [
    ...
  ]
}

Problem-Specific Format

Each problem type has problem-specific fields as well; these are detailed here.

CREATE FUNCTION

"type": <expected output type (one of string, boolean, int, or float)>
"expected": <the expected result (string)>
"run-query": <whether or not to run the student's CREATE FUNCTION query; should only be done for the first test for each CREATE FUNCTION problem (boolean)>
"setup": <setup before running the query (string)>
"teardown": <teardown after running the query (string)>

CREATE PROCEDURE

"run-query": <whether or not to run the CREATE PROCEDURE statement; should be done for the first test for each CREATE PROCEDURE problem (boolean)>
"table": <the table the procedure is modifying (string)>
"setup": <setup before running the query (string)>
"teardown": <teardown after running the query (string)>

CREATE TABLE

"run-query": <whether or not to run the student's CREATE TABLE statement (boolean)>

CREATE TRIGGER

"query": <query to run to trigger the trigger (string)>
"run-query": <whether or not to run the student's CREATE TRIGGER statement (boolean)>
"expected": <the query to generate the expected results of the trigger (string)>
"actual": <the query to check the actual results of the trigger (string)>

CREATE VIEW

"view": <the name of the view (string)>
"select": <the solution SELECT statement to create the view (string)>
"query": <the solution CREATE VIEW statement (string)>
"updatable": <whether or not to check if the view is updatable (boolean)>

DELETE

"table": <the table that is being deleted from (string)>
"columns": <columns to compare from the table being deleted from (list of strings)>
"rollback": <whether or not to rollback the changes after testing (boolean)>

INSERT

"table": <the table that is being inserted into (string)>
"columns": <columns to compare from the table being inserted into (list of strings)>
"rollback": <whether or not to rollback the changes after testing (boolean)>

SELECT

"column-order": <whether or not the order of the columns matter (boolean)>
"ordered": <whether or not the results need to be in a specific order (boolean)>
"rename": <whether or not to check if they've renamed aggregate-function columns (boolean)>

UPDATE

"table": <the table being updated (string)>
"columns": <columns to compare from the table being updated (list of strings)>
"rollback": <whether or not to rollback the changes after testing (boolean)>

manual

"run-query": <whether or not to run the SQL (boolean)>

Stylechecker

A stylechecker is provided for both students and the automation tool. These two must be kept in sync (especially the regular expressions).

For Students

The student version can be found in check.py. In order to run it, execute the following command:

python check.py filename1 [filename2 ...]

It will warn students of any style mistakes and report which lines they are on. Ideally, the student will run this prior to submitting, making the file easier for both the automation tool and the TA to read. Since the automation tool uses the same stylechecker, there is no excuse for the student to have style errors.
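
As an example of the kind of check involved (the real check.py covers more cases and keeps its patterns in sync with the automation tool), a line-length check might look like this:

MAX_LINE_LENGTH = 80  # assumed limit, based on the 80-character guideline mentioned under How to Grade

def check_line_lengths(filename):
    """Return a warning for every line longer than the limit."""
    warnings = []
    with open(filename) as f:
        for number, line in enumerate(f, start=1):
            if len(line.rstrip('\n')) > MAX_LINE_LENGTH:
                warnings.append('%s:%d: line longer than %d characters'
                                % (filename, number, MAX_LINE_LENGTH))
    return warnings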

Automation Tool

The automation tool also uses a stylechecker and suggests point deductions from the student's total points. When parsing the file, a style mistake is handled as a StyleError, as defined in src/errors.py. Deductions for different types of style errors are defined as class constants in the StyleError class.