Skip to content

An alternative to the sqlcmd utility of Microsoft. Can load CSV in bulk. Great for CI/CD pipelines that run integration tests.

License

Notifications You must be signed in to change notification settings

jwbargsten/go-mssql-load

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

go-mssql-load - load sql scripts and test data into MSSQL databases

This is a small tool that serves two main purposes

  • help with populating a MSSQL db with schema and test data during integration testing. This can be on your local machine or part of a CI/CD pipeline.
  • provide a simple way for (data) pipelines to set up the schema of a db where the data is exported to.

Use Cases

Integration Testing

The sqlcmd utility is usually the go-to solution for interacting with MSSQL databases. However, it can be difficult to install and use, especially in the context of Apple M1 machines. That can pose a problem to developers, because now two different ways of dealing with the database exist. One in the CI/CD pipeline, that uses sqlcmd and one on the developer's local machine.

To make the DB setup process consistent, go-mssql-load can be used.

Set up a DB schema

Every Java developer will probably scream FLYWAY or similar, but let's assume that your group is more interdisciplinary, consisting of devs and data engineers. Having a simple CLI tool to setup up the DB can help with shared ownership of the schema.

Bulk load test data

You can bulk load CSV data. This allows you to populate a test db with some or many rows. In particular for data engineering purposes it is a good way to load a bigger chunk of data for a more realistic feeling.

Installation

You need to have go installed.

To install it directly:

go get github.com/jwbargsten/go-mssql-load

If you want to install it from the repo, you can run

$ go build

which will create the executable ./go-mssql-load or install it into $GOPATH/bin using

$ go install

Usage

Before you can use go-mssql-load, you have to have a mssql db running. The easiest way would be to use one of the official Microsoft docker containers (Azure SQL edge is mostly compatible with mssql and it runs on M1 machines):

$ docker run --rm -p 1433:1433 \
  -e "ACCEPT_EULA=1" \
  -e "MSSQL_SA_PASSWORD=Passw0rd" \
  mcr.microsoft.com/azure-sql-edge:latest

Basic usage

With docker running, you can start playing:

# print the help/usage
$ go-mssql-load help

# print the DSN
$ go-mssql-load printdsn

# check the connection for localhost:1433
$ go-mssql-load --user sa --pass Passw0rd check

# load some example data
$ go-mssql-load --user sa --pass Passw0rd loadsql sql/init.sql

# try a query
$ echo "select * from pokemon.pokemon" | go-mssql-load --user sa --pass Passw0rd querysql -
# try a query, ignore logging
$ echo "select * from pokemon.pokemon" | go-mssql-load --user sa --pass Passw0rd querysql - 2>/dev/null

CSV loading

You can use this tool to do CSV bulk loading. By default all columns are treated as string, but you can specify a data type as

  • part of the column name or
  • external definition by using the --types parameter

loadcsv doesn't have any parsing magic and uses the csv parser provided by the go std lib. So, if you don't specify it, it won't happen. The spec is as follows:

<name>::<datatype>[!]

with an optional ! indicating a nullable column.

column spec column name data type nullable
name::string! name string yes
name::! name string yes
name name string no
age::int! age int64 yes
height::float! age float64 yes
height::float age float64 no

Internally the golang parse functions are used.

Supported types:

  • int » strconv.ParseInt(v, 10, 64)
  • float » strconv.ParseFloat(v, 64)
  • bool » custom parsing, anything that looks like: TRUE, true, T, t, YES, yes, Y, y, 1 is considered true.
  • string as default

Example: ./sql/pokemon_typed.csv

You can set the null string and the separator via cli flags. The null string is by default the empty string "".

The TAB character is a bit tricky to specify, but you can just supply a quoted TAB to parse TSV files:

$ go-mssql-load \
    --user sa --pass Passw0rd loadcsv --sep "	" \
    pokemon.pokemon sql/pokemon_typed.csv

Only columns that have the nullable flag ! will use the nullstr flag.

As mentioned in the beginning, you can also supply an external types file in JSON format. There are two options supported, as dict or as list.

The dict looks as follows (order is not important):

{
  "colname2": "int",
  "colname1": "string!",
  "colname3": "float!"
}

Or the same as list (order is important):

[
  "string!",
  "int",
  "float!"
]

You can add the types via the --types parameter:

$ go-mssql-load --user sa --pass Passw0rd loadcsv \
    --sep ";" \
    --types sql/pokemon_types.json \
    pokemon.pokemon sql/pokemon.csv

SQL execution

go-mssql-load uses the batch mechanism of the go-mssqldb lib. This means that each file read and split into statements separated by a GO keyword.

Example: ./sql/init.sql.

$ go-mssql-load --user sa --pass Passw0rd loadsql sql/init.sql

SQL querying

Similar to SQL execution, query scripts are split by the keyword GO. This means you can have multiple query statements per file. Each query results in a set of Newline Delimited JSON records, separated by ---.

Example: ./sql/select.sql

$ go-mssql-load --user sa --pass Passw0rd querysql sql/select.sql  2>/dev/null
{"name":"Wartortle"}
---
{"hp":4}

About

An alternative to the sqlcmd utility of Microsoft. Can load CSV in bulk. Great for CI/CD pipelines that run integration tests.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published