Skip to content

Latest commit

 

History

History
51 lines (35 loc) · 5.34 KB

File metadata and controls

51 lines (35 loc) · 5.34 KB

SQL Server ML Services - Performance Test Samples

The sample runtest.R script provided in this repository demonstrates the benefits of following various tips for improving performance of R script running in SQL compute context.

The test script helps to evaluate the performance gains that can be reailized when following the various tips outlined in the SQL Server ML Services Performance Tuning document.

To run these tests, the user needs access to SQL Server with ML Services enabled. The user is also assumed to be familiar with some basic concepts of using SQL Server ML Services.

The directory contents are as follows:

Data This folder contains the data files needed before running the scripts. Due to size limit in Github, the files were not uploaded to Git. You need to download them from the following links and store them under Data folder. You should also decompress airline-cleaned-10M.7z file before running the scripts.

  • createdb.sql

    Creates the PerfTuning database if it does not exist already. The logged in user should have the permissions to create the database and tables.

  • creattables.R

    Creates airline and airlineWithIntCol tables. These tables should be created first.

  • createtablewithindex.sql

    Creates the airlineWithIndex table with an index. This script can be run after running creatables.R.

  • createcolumnartable.sql

    Creates an airline columnar table. This script can be run after running creatableswithindex.sql.

  • createpagecompressedtable.sql

    Creates an airline table with page compression enabled. This script can be run after running creatableswithindex.sql.

  • createrowcompressedtable.sql

    Creates an airline table with row compression enabled. This script can be run after running creatableswithindex.sql.

  • createall.cmd

    A script that creates all tables. Create PerfTuning DB before running this script. Also, ensure that RScript and sqlcmd.exe are in your path.

  • createrdatatable.sql

    Creates the rdata table needed to store models. See searialization.R file for more information.

  • runtests.R

    This is the main test driver that uses PerfTuning database. Runs many tests to see how certain steps taken affect the performance.

  • serialization.R

    Contains helper functions to searialize/deserialize models stored in database.

  • scoring.sql

    Performs batch prediction using CRANR predict method. This depends on the model generated by the lm method. Run SaveLMModel test to store the model in DB before running this script.

To run a .R script, use your favorite IDE, source the file, and then call the function(s) defined in that file.

To run a .sql script, ensure that you have sqlcmd.exe in your path and then invoke sqlcmd -i .

There are 6 tables that are needed in the database PerfTuning before the runtests.R can be used for running the tests. The attached scripts help to create them using data from xdf and csv files in the data folder.

Alternatively, the user can download the database files using the following links and skip creating them using the scripts mentioned above.

To create the database, download the 2 files from the links below into some folder. Restore the database from the downloaded location. Assuming they are downloaded to folder D:\sql, the following command can be used to restore them.

RESTORE DATABASE PerfTuning FROM DISK = 'D:\sql\PerfTuning1.bak', DISK = 'D:\sql\PerfTuning2.bak' WITH REPLACE;

https://sqlrperftuning.blob.core.windows.net/perftuningdb/PerfTuning1.bak https://sqlrperftuning.blob.core.windows.net/perftuningdb/PerfTuning2.bak

Steps

  • Ensure that you have access to SQL Server with ML Services enabled. You should also have permission to create or access the PerfTuning database.
  • Create the PerfTuning database and add tables using the attached R and SQL scripts. You can either run createall.cmd or run the other db and table creating scripts one by one. To expedite, you can skip running the scripts by downloading the database using links above and restoring.
  • Install the dependant RODBC package, if not installed. Ensure that it is installed in the right library where Microsoft RevoScaleR package was installed. (If you ran creatall.cmd, it will install this package. Update the file if lib path needs to be specified)
  • Update the runtests.R file to match your connection string and data directories.
  • To store models, the code depends on rdata table. Ensure that it exists in the database. You should also enable power shell on the client machine. See serialization.R for more information.
  • To run the tests, open your R IDE and set the working directory to the one with the scripts. Source the file runtests.R and call runTests(testsToRun, "output", 1, 500000L)