ML Studio (classic) setup

Adam edited this page May 13, 2021 · 1 revision

Note: Microsoft now refers to many of the following concepts as ML Studio (classic). This is due to the newer version available through the Azure Portal, known as Azure Machine Learning. You can read about the differences between the two platforms here.

To enable the connector to leverage Azure Machine Learning to predict Work Item Type (IR/SR), Classification/Area, Support Group, and optionally the Impacted Configuration Item(s), you'll need an Azure ML Studio account, the respective experiments, and a web service for the connector to consume. Azure Machine Learning is available in Free and Standard tiers.

  • Free tier: Can only be deployed by signing up directly through the Azure ML Studio home page. If you complete the walkthrough below on the Free tier just to experiment, you can later move to the Standard (paid) tier and copy projects between tiers; this will require models to be retrained. The pricing page mentioned under the Standard tier features an FAQ at the bottom detailing these short migration steps. To skip over the paid configuration, go directly to Step 4 below underneath "Create an Azure Machine Learning Studio Workspace"
  • Standard tier: This is the only other tier available. It comes with an associated cost, seen here, and is deployed through the Azure ARM portal. This walkthrough begins with the deployment of the Standard tier as though you were preparing for a production deployment.

Create an Azure Machine Learning Studio Workspace

  1. Sign into https://portal.azure.com with the credentials/account you'll use to create the service.
  2. Create a New Resource, searching for "Machine Learning Studio Workspace" and choosing Create.

createAML

Next we'll need to provide some information about the Machine Learning Studio workspace we're creating: a workspace name, our subscription, the Resource Group, the Azure data center/location we want this hosted in (South Central is one of the few regions where ML Studio can be deployed), the storage account we'll be keeping our data in, our pricing tier (at the time of this writing, Standard is the only option when deploying ML Studio through the Azure ARM Portal), a web service plan, and finally the pricing tier for the web service we'll eventually deploy and the SMLets Exchange Connector will consume.

If you're new to the idea of Resource Groups, they are fairly analogous to the concept of System Center Management Packs in that you keep "like" things together. For example, you could create a Resource Group called MachineLearningSCSM and in it throw other ML services/engines/projects/experiments.

createAML

  3. Once created, let's head into our new resource and from there click "Launch the Azure Machine Learning Studio" seen within "Additional links"

createAML

  4. With our AML Studio up and running, next let's copy the Experiments over to your own Azure subscription/ML workspace. You have a few ways to do this. Note: If you are looking to predict CIs, you'll need to make some small modifications to the experiments from Option 2. The SQL queries and the modifications can be found here:

    (Option 1) This ML solution is entirely the result of the hard work of Microsoft Principal Software Engineering Manager Shashi Lanka Venkata, as he set up the entire basis of what we'll need to get the SMLets Exchange Connector leveraging our ML results. His original projects can be copied from the Azure AI Gallery below. This provides the most detailed walkthrough and an opportunity to understand how to tune and refine your training models per his original documentation. Once copied, you'll need to change the initial dataset from Azure Blob Storage to the CSV files you upload from your SCSM DW results.

    The sample dataset of 50,000 tickets comes from another Azure Machine Learning Ticket Classification project, which is the result of a collaboration between Microsoft and Endava.

    (Option 2) I've copied Shashi Lanka Venkata's ML Experiments, Endava's sample dataset, and the trained models into a project I've published on my Azure ML account. Given Endava's dataset doesn't have a Support Group column, I used their Priority field to translate into custom L1 through L5 Support Groups: the lowest Priority tickets became Level 1, the next lowest Level 2, and so on. This bundled version includes all the linking done for you so you can quickly see this in action for yourself before uploading your own CSV (which, when ready, you can mark to simply replace the included sample cited above) and enabling it on the SMLets Exchange Connector.

    (Option 3) In order to predict the Impacted Configuration Item(s) in addition to the 3 Work Item data points, you'll need to copy the experiments from Option 2 and make a small series of modifications.

    1. SCSM Email Work Item Classification
    • the "Apply SQL Transformation" step near the top, which follows your Ticket dataset, needs to change to include the new Affected_Config_Items column. This can be done with the following:
      select Affected_Config_Items,Case_CaseType, Queue_Name,Case_Subject, Email_Subject || ' ' || Email_Description as Email_Text from t1;
    • from left to right after the "Feature Hashing" step seen below, the first 3 "Select Columns in Dataset" need to add a new column to exclude which is: Affected_Config_Items
    • a wholly new "branch" if you will (e.g. all of the steps that follow the "Feature Hashing" step) need to be made for Config Items. From left to right, just duplicate the 3rd branch and connect it up. This new branch...
      • "Select Columns in Dataset" needs to exclude Case_Subject, Case_CaseType, Queue_Name
      • "Filter Based Feature Selection" should select "Affected_Config_Items" as its column
      • The two "Split Data" steps should select "Affected_Config_Items" as their column
      • "Tune Model Hyperparameters" should select "Affected_Config_Items" as its column

    configpredict01
  2. SCSM Email Work Item (Create Web service w/ trained models)

    • similar to the first experiment, you'll need a new "Score Model" fed from "Feature Hashing". This connects to a new "Execute R Script"
    • the new "Execute R Script" needs to be modified on Line 16 to update the column names being used to the following: colnames(data.set) <- c("Affected_Config_Items","Affected_Config_Items_Probability")
    • daisy chain the new "Execute R Script" into another "Add Columns" to ultimately distill all 4 branches into the single "Web Service Output"

    configpredict02
  3. Train the Models (ML Experiment 1 of 2)

  4. Publish the Web Service (ML Experiment 2 of 2)

Get data from SCSM DW by running a SQL query and exporting to CSV
We have one of two ways of achieving this:

  1. Connect AML directly to your SCSM data warehouse to perform future uploads with the push of a button. This can be configured via the following Microsoft documentation: https://docs.microsoft.com/en-us/azure/machine-learning/studio/use-data-from-an-on-premises-sql-server and requires a Paid AML account.
  2. Run a query against your SCSM DW, export results as a CSV, and then upload the CSV to Azure. This is the option we'll choose in this tutorial as it's a task you can do on your own and rather quickly.

We'll be going with option 2 for this tutorial, so let's run the following query against our SCSM DW. This will output Closed Incidents and Service Requests in a common format we can export to CSV.

SELECT ir.Title as 'Email_Subject',
ir.Description as 'Email_Description',
ic.EnumTypeId as 'Case_Subject',
tq.EnumTypeId as 'Queue_Name',
'ir' as 'Case_CaseType'
FROM IncidentDim as ir
    inner join IncidentTierQueuesvw as tq on ir.TierQueue_IncidentTierQueuesId = tq.IncidentTierQueuesId
    inner join IncidentClassification as ic on ir.Classification_IncidentClassificationId = ic.IncidentClassificationId
WHERE ir.Source like '%email%'
    and ir.Status like '%Closed%'
AND ir.createddate > '1-1-2019'

UNION ALL

SELECT sr.Title as 'Email_Subject',
sr.Description as 'Email_Description',
sa.EnumTypeId as 'Case_Subject',
sg.EnumTypeId as 'Queue_Name',
'sr' as 'Case_CaseType'
FROM ServiceRequestDim as sr
    inner join ServiceRequestSupportGroup as sg on sr.SupportGroup_ServiceRequestSupportGroupId = sg.ServiceRequestSupportGroupId
    inner join ServiceRequestArea as sa on sr.Area_ServiceRequestAreaId = sa.ServiceRequestAreaId
WHERE sr.Source like '%email%'
    and sr.Status like '%Closed%'
AND sr.createddate > '1-1-2019'

As you can see, we really have the same query twice, just ever so slightly different, to grab Incidents and Service Requests. That means this is your opportunity to tailor the date range or limit the scope of Azure Machine Learning by doing something like:

  • only selecting Work Items for 3 of your 10 Support Groups, thereby ensuring ML could only ever suggest those 3 Support Groups
  • cherry-picking select months of data, as they may represent more accurate data than others

If you are going to predict Configuration Items, you'll need a slightly different set of queries to include CIs in the resulting dataset. If you aren't going to be predicting Config Items, just skip past the next two SQL queries.

Incidents with Impacted Configuration Items

SELECT DISTINCT
    ir.title as 'Email_Subject',
    ir.Description as 'Email_Description',
    ic.EnumTypeId as 'Case_Subject',
    tq.EnumTypeId as 'Queue_Name',
    'ir' as 'Case_CaseType',
    STUFF((SELECT DISTINCT ',' + cast(ci2.BaseManagedEntityId as varchar(38))
            FROM IncidentDim as ir2
                INNER JOIN EntityDim as e2 on ir2.entitydimkey = e2.entitydimkey
                INNER JOIN workitemdim as wi2 on e2.entitydimkey = wi2.entitydimkey
                INNER JOIN workitemaboutconfigitemfactvw as wiaci2 on wi2.workitemdimkey = wiaci2.workitemdimkey
                INNER JOIN configitemdim as ci2 on wiaci2.workitemaboutconfigitem_configitemdimkey = ci2.configitemdimkey
            WHERE ir2.Id = ir.Id
            FOR XML PATH('')), 1, 1, '') as [Affected_Config_Items]
FROM IncidentDim as ir
    INNER JOIN IncidentTierQueuesvw as tq on ir.TierQueue_IncidentTierQueuesId = tq.IncidentTierQueuesId
    INNER JOIN IncidentClassification as ic on ir.Classification_IncidentClassificationId = ic.IncidentClassificationId
    INNER JOIN EntityDim as e on ir.entitydimkey = e.entitydimkey
    INNER JOIN workitemdim as wi on e.entitydimkey = wi.entitydimkey
    INNER JOIN workitemaboutconfigitemfactvw as wiaci on wi.workitemdimkey = wiaci.workitemdimkey
    INNER JOIN configitemdim as ci on wiaci.workitemaboutconfigitem_configitemdimkey = ci.configitemdimkey
WHERE ir.createddate > '1-1-2020'
    and wiaci.DeletedDate is null
    and ir.Source like '%email%'
    and ir.Title is not null
    and ir.Description is not null
    and ir.Status like '%Closed%'

group by ir.Id,
    ir.title,
    ir.Description,
    ic.EnumTypeId,
    tq.EnumTypeId

and then

Service Requests with Impacted Configuration Items

SELECT DISTINCT
    sr.title as 'Email_Subject',
    sr.Description as 'Email_Description',
    sa.EnumTypeId as 'Case_Subject',
    sg.EnumTypeId as 'Queue_Name',
    'sr' as 'Case_CaseType',
    STUFF((SELECT DISTINCT ',' + cast(ci2.BaseManagedEntityId as varchar(38))
            FROM servicerequestdim as sr2
                INNER JOIN EntityDim as e2 on sr2.entitydimkey = e2.entitydimkey
                INNER JOIN workitemdim as wi2 on e2.entitydimkey = wi2.entitydimkey
                INNER JOIN workitemaboutconfigitemfactvw as wiaci2 on wi2.workitemdimkey = wiaci2.workitemdimkey
                INNER JOIN configitemdim as ci2 on wiaci2.workitemaboutconfigitem_configitemdimkey = ci2.configitemdimkey
            WHERE sr2.Id = sr.Id
            FOR XML PATH('')), 1, 1, '') as [Affected_Config_Items]
FROM servicerequestdim as sr
    INNER JOIN ServiceRequestSupportGroup as sg on sr.SupportGroup_ServiceRequestSupportGroupId = sg.ServiceRequestSupportGroupId
    INNER JOIN ServiceRequestArea as sa on sr.Area_ServiceRequestAreaId = sa.ServiceRequestAreaId
    INNER JOIN EntityDim as e on sr.entitydimkey = e.entitydimkey
    INNER JOIN workitemdim as wi on e.entitydimkey = wi.entitydimkey
    INNER JOIN workitemaboutconfigitemfactvw as wiaci on wi.workitemdimkey = wiaci.workitemdimkey
    INNER JOIN configitemdim as ci on wiaci.workitemaboutconfigitem_configitemdimkey = ci.configitemdimkey
WHERE sr.createddate > '1-1-2020'
    and wiaci.DeletedDate is null
    and sr.Source like '%email%'
    and sr.Title is not null
    and sr.Description is not null
    and sr.Status like '%Closed%'

group by sr.Id,
    sr.title,
    sr.Description,
    sa.EnumTypeId,
    sg.EnumTypeId

You'll also notice this query is only looking at Closed Work Items. The rationale here is that a Work Item in some active status may still be going through changes, which could adversely affect the ML training. The Support Group that is returned is the GUID of the Support Group, which means the AML suggestion will return a value that can be immediately set via Set-SCSMObject. However you choose to run the queries, save the results out to a CSV called "Tickets.csv".

Next, you can either run another query to return just Email data, OR you can take the results of the above and simply remove all columns that aren't Email_Subject and Email_Description:

SELECT ir.Title as 'Email_Subject',
ir.Description as 'Email_Description'
FROM IncidentDim as ir
WHERE ir.Source like '%email%'
    and ir.Status like '%Closed%'
AND ir.createddate > '1-1-2019'

UNION ALL

SELECT sr.Title as 'Email_Subject',
sr.Description as 'Email_Description'
FROM ServiceRequestDim as sr
WHERE sr.Source like '%email%'
    and sr.Status like '%Closed%'
AND sr.createddate > '1-1-2019'

This should only be showing us the Work Item Title and Description, which is to say the Email Subject and Body, since these Work Items were created from Email per their Source. Let's save these results out as "Emails.csv". So at this point we should have two CSV files:

  1. Tickets.csv = Which contains 5 or 6 columns
  • Email_Subject
  • Email_Description
  • Case_Subject
  • Queue_Name
  • Case_CaseType
  • Affected_Config_Items (you'll only have this column if you are going to predict Config Items)
  2. Emails.csv = Which only contains 2 columns, identical to the first two columns of Tickets.csv
  • Email_Subject
  • Email_Description
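The two-column Emails.csv can also be derived straight from Tickets.csv rather than running the second query. Here's a minimal PowerShell sketch; the sample row and the temp-folder paths are illustrative stand-ins for your real DW export:

```powershell
# A sketch: derive Emails.csv from Tickets.csv by keeping only the two email columns.
# The sample row below is an illustrative stand-in for your real SCSM DW export.
$ticketsCsv = Join-Path ([System.IO.Path]::GetTempPath()) "Tickets.csv"
$emailsCsv  = Join-Path ([System.IO.Path]::GetTempPath()) "Emails.csv"

# stand-in for your exported query results
[pscustomobject]@{
    Email_Subject     = "vpn down"
    Email_Description = "cannot connect to the vpn"
    Case_Subject      = "00000000-0000-0000-0000-000000000000"
    Queue_Name        = "00000000-0000-0000-0000-000000000000"
    Case_CaseType     = "ir"
} | Export-Csv -Path $ticketsCsv -NoTypeInformation

# keep only the two email columns and save them out as Emails.csv
Import-Csv -Path $ticketsCsv |
    Select-Object Email_Subject, Email_Description |
    Export-Csv -Path $emailsCsv -NoTypeInformation
```

Select-Object keeps only the named columns, so any extras (including Affected_Config_Items, if present) are dropped automatically.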

The one thing you should not do (however tempting it might be) is change the aliases used for the columns. The ML experiments use hardcoded column names, so if you change something here in the queries, training the model will fail unless you perform additional modifications within the Experiments. It's doable, but could lead to some initial deployment frustration.
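Because those column names are hardcoded, a quick header check before uploading can save a failed training run. A hedged sketch (point $csvPath at your real Tickets.csv; the sample file written here just demonstrates the check):

```powershell
# Verify a CSV's header row contains the exact column names the Experiments expect.
$expected = "Email_Subject","Email_Description","Case_Subject","Queue_Name","Case_CaseType"

# placeholder: a sample header file is written here purely for demonstration
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) "HeaderCheck.csv"
'"Email_Subject","Email_Description","Case_Subject","Queue_Name","Case_CaseType"' | Set-Content -Path $csvPath

$actual  = (Get-Content -Path $csvPath -TotalCount 1) -replace '"', '' -split ','
$missing = $expected | Where-Object { $_ -notin $actual }

if ($missing) { Write-Warning ("Missing expected column(s): " + ($missing -join ", ")) }
else         { Write-Output "Header matches the expected column names." }
```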

Now let's upload the CSVs as our new data sets. If you've copied my Project, you'll want to update the Existing Dataset called "Tickets" and "Emails" with the respective CSVs you've saved. If you've copied the original Experiments from Shashi, there isn't anything to overwrite so you can just upload both.

createAML

amlUploadData

Train the Models (ML Experiment 1 of 2)
Here you can see the "Tickets" dataset is used as the initial input for training.

amlTraining

With the sample dataset included from Microsoft/Endava, it should take about 20 minutes to perform the initial training. If you've uploaded something larger than 50,000 Work Items or are using production data, chances are it will take longer. Hit Run and patiently wait for it to finish. Once it's done, we need to update the Models that Experiment 2 is going to use to actually do the prediction. To update the Models, right-click on each of the "Tune Model Hyperparameters" steps and choose Save as Trained Model. From left to right, these represent:

  1. Work Item Type (Incident or Service Request)
  2. Work Item Classification
  3. Work Item Support Group

amlSaveModels

You can also see this for yourself: as you select each of these, the properties pane on the right reveals its true purpose in the Label Column field:

amlModelTuning01

As Shashi explains in his own posts, this first Experiment is really all about...well, experimentation! In the above screenshot where Models are being saved, the left input on "Tune Model Hyperparameters" is a Multiclass Neural Network. This is just one of the many classification engines Azure ML has. This is certainly where things get interesting, as you can mix and match different types of classification algorithms as seen below:

amlModelTuning02

Given the plethora of choices and the rather simple UI, you're probably looking for a quick breakdown of each beyond just trial and error. Fortunately, Microsoft Docs has you covered, offering brief and detailed explanations of each of these over here. Regardless of the classification algorithms you pick, you'll need to re-run your Experiments and re-save your Models every time you upload a new dataset, or link to new Models if you want to keep a running history of past prediction capabilities. Once trained, you can right-click on the "Evaluate Model" step, head to Evaluate results, and then Visualize.

amlModelTuning03

Then you can see how that particular algorithm/path of classification performed. Here you can see how the Support Group training went and what to expect from predictions in terms of accuracy. A vital step in tuning Azure Machine Learning to your SCSM deployment.

amlModelTuning03

Deploy the Web Service (ML Experiment 2 of 2)
With Experiment 1's Models trained, we can move onto Experiment 2, link the Models up (if you've changed them), train the web service, and publish. With the sample dataset included from Microsoft/Endava, it should take about 50 minutes to perform the second experiment.

webService

I've highlighted all of the variables in Experiment 2. You can see the trained Models from Experiment 1 have been linked, and the "Emails" dataset (which only contains the subject and body) is used as the data input. Once again, hit Run and patiently wait for this to finish. Once it has, you can click Deploy Web Service. This new web service generates a unique URL and set of API keys known only to you.

With the web service deployed, head into Web Services, open your web service, and take note of your API key. Then click the "REQUEST/RESPONSE" link, which opens the auto-built documentation for your web service and reveals its URL. It's worth calling out that your URL will probably look something to the effect of

https://ussouthcentral.services.azureml.net/workspaces/UNIQUESTRING/services/MOREUNIQUESTRINGS/execute?api-version=2.0&details=true

But the connector only needs the URL to take the form of:

https://ussouthcentral.services.azureml.net/workspaces/UNIQUESTRING/services/MOREUNIQUESTRINGS/execute?api-version=2.0

That is, the very end that includes "details" and/or "swagger" is what gets removed. With your API key and URL, plug them into the following function to test. These very same values are what you'll ultimately plug into the SMLets Exchange Connector.
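Trimming the copied URL down to the form the connector needs is easy to script. A small sketch (the URL below is the placeholder pattern from above, not a real endpoint):

```powershell
# Strip the trailing "&details=true" (and anything after it) from the copied URL
$rawURL = "https://ussouthcentral.services.azureml.net/workspaces/UNIQUESTRING/services/MOREUNIQUESTRINGS/execute?api-version=2.0&details=true"
$amlURL = $rawURL -replace '&details=true.*$', ''
$amlURL   # ends in ...execute?api-version=2.0
```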

#config variables - the URL of your AML Web Service and your AML API Key
$amlAPIKey = ""
$amlURL = ""

function Get-AMLWorkItemProbability ($EmailSubject, $EmailBody)
{
    #create the header
    $headerTable = @{"Authorization" = "Bearer $amlAPIKey"; "Content-Type" = "application/json"}

    #create the JSON request
    $messagePayload = @"
    {
        "Inputs": {
            "Input1" : {
                "ColumnNames": ["Email_Subject", "Email_Description"],
                "Values": [
                    ["$EmailSubject", "$EmailBody"]
                ]
            }
        },
        "GlobalParameters": {}
    }
"@

    #invoke the Azure Machine Learning web service for predicting Work Item Type, Classification, and Support Group
    $probabilityResponse = Invoke-RestMethod -Uri $amlURL -Method Post -Header $headerTable -Body $messagePayload -ContentType "application/json"

    #return custom probability object
    $probabilityResults = $probabilityResponse.Results.output1.value.Values[0]
    $probabilityMatrix = New-Object -TypeName psobject
    $probabilityMatrix | Add-Member -MemberType NoteProperty -Name WorkItemType -Value $probabilityResults[0]
    $probabilityMatrix | Add-Member -MemberType NoteProperty -Name WorkItemTypeConfidence -Value (($probabilityResults[1] -as [decimal]) * 100)
    $probabilityMatrix | Add-Member -MemberType NoteProperty -Name WorkItemClassification -Value $probabilityResults[2]
    $probabilityMatrix | Add-Member -MemberType NoteProperty -Name WorkItemClassificationConfidence -Value (($probabilityResults[3] -as [decimal]) * 100)
    $probabilityMatrix | Add-Member -MemberType NoteProperty -Name WorkItemSupportGroup -Value $probabilityResults[4]
    $probabilityMatrix | Add-Member -MemberType NoteProperty -Name WorkItemSupportGroupConfidence -Value (($probabilityResults[5] -as [decimal]) * 100)
    $probabilityMatrix | Add-Member -MemberType NoteProperty -Name AffectedConfigItem -Value $probabilityResults[6]
    $probabilityMatrix | Add-Member -MemberType NoteProperty -Name AffectedConfigItemConfidence -Value (($probabilityResults[7] -as [decimal]) * 100)

    #return the percent score
    return ($probabilityMatrix)
}

Get-AMLWorkItemProbability -EmailSubject "account" -EmailBody "i think something is locking my account out. can you see if everything is ok?"

Using the above PowerShell function with the included sample dataset, a custom PowerShell object should be returned that looks similar to the following. Keep in mind, that the Affected Config Item values will be empty if you aren't training AML to predict Impacted Configuration Items.

WorkItemType                     : ir
WorkItemTypeConfidence           : 96.496629714965800
WorkItemClassification           : 4
WorkItemClassificationConfidence : 98.301595449447600
WorkItemSupportGroup             : Level2
WorkItemSupportGroupConfidence   : 95.186740159988400
AffectedConfigItem               : 684e010f-6643-4476-8339-07010da49740
AffectedConfigItemConfidence     : 30.595784434634903
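In practice, you'll likely want to act on a prediction only when its confidence clears a minimum threshold, much like the connector's own minimum-percent settings. A hypothetical sketch ($prediction mimics the object returned above; the 90 threshold and the Set-SCSMObject mention are illustrative assumptions, not connector internals):

```powershell
# Act on a prediction only when its confidence clears a minimum threshold.
# $prediction is a stand-in mimicking the object Get-AMLWorkItemProbability returns.
$minPercentConfidence = 90   # illustrative threshold

$prediction = [pscustomobject]@{
    WorkItemSupportGroup           = "Level2"
    WorkItemSupportGroupConfidence = 95.19
}

if ($prediction.WorkItemSupportGroupConfidence -ge $minPercentConfidence) {
    # Confident enough - in the connector, this is where the suggested value would
    # be applied, e.g. via Set-SCSMObject against the newly created Work Item.
    Write-Output "Setting Support Group to $($prediction.WorkItemSupportGroup)"
}
else {
    Write-Output "Confidence too low; leaving the Support Group for an analyst to set"
}
```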