GitBook: [#56] No subject

zinggAI · Aug 4, 2022 · 21a3478
1 parent 923bf0a
commit 21a3478

Showing 12 changed files with 37 additions and 15 deletions.
diff --git a/docs/dataSourcesAndSinks/connectors.md b/docs/dataSourcesAndSinks/connectors.md
@@ -2,6 +2,7 @@
 title: Data Sources and Sinks
 nav_order: 3
 has_children: true
+description: Data sources and file formats supported by Zingg
 ---
 
 # Data Sources and Sinks

diff --git a/docs/setup/link.md b/docs/setup/link.md
@@ -1,6 +1,10 @@
+---
+description: To match two datasets against each other
+---
+
 # Linking across datasets
 
-In many cases like reference data mastering, enrichment, etc, two individual datasets are duplicates free but they need to be matched against each other. The link phase is used for such scenarios.
+In many cases like reference data mastering, enrichment, etc, two individual datasets are free of duplicates but they need to be matched against each other. The link phase is used for such scenarios.
 
 `./zingg.sh --phase link --conf config.json`
 

diff --git a/docs/setup/match.md b/docs/setup/match.md
@@ -1,17 +1,18 @@
 ---
-layout: default
 title: Find the matches
 parent: Step By Step Guide
 nav_order: 8
+description: Identifying matching records
 ---
 
-### match
-Finds the records which match with each other. 
+# Finding the matches
+
+Finds the records which match with each other.
 
 `./zingg.sh --phase match --conf config.json`
 
-As can be seen in the image below, matching records are given the same z_cluster id. Each record also gets a z_minScore and z_maxScore which shows the least/greatest it matched with other records in the same cluster. 
+As can be seen in the image below, matching records are given the same z\_cluster id. Each record also gets a z\_minScore and z\_maxScore which shows the least/greatest it matched with other records in the same cluster.
 
-![Match results](/assets/match.gif)
+![Match results](../../assets/match.gif)
 
-If records across multiple sources have to be matched, the [link phase](./link.md) should be used.
+If records across multiple sources have to be matched, the [link phase](link.md) should be used.
diff --git a/docs/setup/train.md b/docs/setup/train.md
@@ -1,10 +1,14 @@
 ---
-layout: default
 title: Build and save the model
 parent: Step By Step Guide
 nav_order: 7
+description: Guide to build and save model
 ---
-### train - training and saving the models
+
+# Building and saving the model
+
 Builds up the Zingg models using the training data from the above phases and writes them to the folder zinggDir/modelId as specified in the config.
 
-    ./zingg.sh --phase train --conf config.json
+```
+./zingg.sh --phase train --conf config.json
+```
diff --git a/docs/setup/training/addOwnTrainingData.md b/docs/setup/training/addOwnTrainingData.md
@@ -3,6 +3,7 @@ parent: Creating training data
 nav_order: 3
 title: Using preexisting training data
 grand_parent: Step By Step Guide
+description: Instructions on using existing training data with Zingg
 ---
 
 # Using pre-existing training data

diff --git a/docs/setup/training/createTrainingData.md b/docs/setup/training/createTrainingData.md
@@ -3,8 +3,9 @@ parent: Step By Step Guide
 nav_order: 6
 title: Creating training data
 has_children: true
+description: Guide to working with training data
 ---
 
-# Training data
+# Working With Training Data
 
 Zingg builds models to predict similarity. Training data is needed to build these models. The next sections describe how you can use the Zingg Interactive Labeler to create the training data.
diff --git a/docs/setup/training/exportLabeledData.md b/docs/setup/training/exportLabeledData.md
@@ -3,10 +3,11 @@ parent: Creating training data
 title: Exporting labeled data as csv
 grand_parent: Step By Step Guide
 nav_order: 4
+description: Writing labeled data to CSV for exporting
 ---
 
 # Exporting Labeled Data
 
-If we need to send our labeled data for a subject matter expert to review or if we want to build another model in a new location and [reuse training effort](addOwnTrainingData.md) from earlier, we can write our labeled data to a csv&#x20;
+If we need to send our labeled data for a subject matter expert to review or if we want to build another model in a new location and [reuse training efforts](addOwnTrainingData.md) from earlier, we can write our labeled data to a CSV.
 
 `./scripts/zingg.sh --phase exportModel --conf <path to conf> --location <folder to save the csv>`
diff --git a/docs/setup/training/findAndLabel.md b/docs/setup/training/findAndLabel.md
@@ -3,6 +3,7 @@ parent: Creating training data
 title: Find training data and labelling
 grand_parent: Step By Step Guide
 nav_order: 2
+description: Phase which creates training data
 ---
 
 # Find And Label
@@ -11,4 +12,4 @@ This phase is composed of two phases namely [findTrainingData](findTrainingData.
 
 `./zingg.sh --phase findAndLabel --conf config.json`
 
-As this is phase runs findTrainingData and label together, it should be run only for small datasets where findTrainingData takes a short time to run, else the the user will have to wait long for the console for labeling.&#x20;
+As this phase runs findTrainingData and label together, it should be run only for small datasets where findTrainingData takes a short time to run, else the user will have to wait long for the console for labeling.&#x20;
diff --git a/docs/setup/training/findTrainingData.md b/docs/setup/training/findTrainingData.md
@@ -2,7 +2,7 @@
 parent: Creating training data
 nav_order: 1
 grand_parent: Step By Step Guide
-description: pairs of records that could be similar to train Zingg
+description: Pairs of records that could be similar to train Zingg
 ---
 
 # Finding Records For Training Set Creation

diff --git a/docs/setup/training/label.md b/docs/setup/training/label.md
@@ -15,4 +15,4 @@ The label phase opens an interactive learner where the user can mark the pairs f
 
 Proceed running findTrainingData followed by label phases till you have at least 30-40 positives, or when you see the predictions by Zingg converging with the output you want. At each stage, the user will get different variations of attributes across the records. Zingg performs pretty well with even a small number of training, as the samples to be labeled are chosen by the algorithm itself.
 
-The showConcise flag when passed to the Zingg command line only shows fields which are NOT DONT\_USE
+The showConcise flag when passed to the Zingg command line only shows fields that are NOT DONT\_USE.
diff --git a/docs/stepbystep/configuration/tuning-label-match-and-link-jobs.md b/docs/stepbystep/configuration/tuning-label-match-and-link-jobs.md
@@ -1,3 +1,7 @@
+---
+description: Requirements to optimize the performance
+---
+
 # Tuning Label, Match And Link Jobs
 
 #### numPartitions

diff --git a/docs/updatingLabels.md b/docs/updatingLabels.md
@@ -1,3 +1,7 @@
+---
+description: To update the existing labeled pairs as the data modifies
+---
+
 # Updating Labeled Pairs
 
 **Please note: This is an experimental feature. Please keep a backup copy of your model folder in a separate place before running this**