Commit
fix: nb issues being valid (#655)
JGSweets committed Sep 20, 2022
1 parent ace0e52 commit a432ef9
Showing 1 changed file with 25 additions and 28 deletions.
53 changes: 25 additions & 28 deletions examples/graph_data_demo.ipynb
@@ -2,15 +2,13 @@
  "cells": [
  {
  "cell_type": "markdown",
- "id": "228bb2a6",
  "metadata": {},
  "source": [
  "# Graph Pipeline Demo"
  ]
  },
  {
  "cell_type": "markdown",
- "id": "cab7a569",
  "metadata": {},
  "source": [
  "DataProfiler can also load and profile graph datasets. Similarly to the rest of DataProfiler profilers, this is split into two components:\n",
@@ -142,71 +140,72 @@
  },
  {
  "cell_type": "markdown",
+ "metadata": {},
  "source": [
  "## Saving and Loading a Profile\n",
  "Below you will see an example of how a Graph Profile can be saved and loaded again."
- ],
- "metadata": {}
+ ]
  },
  {
  "cell_type": "code",
  "execution_count": null,
+ "metadata": {},
  "outputs": [],
  "source": [
  "# The default save filepath is profile-<datetime>.pkl\n",
  "profile.save(filepath=\"profile.pkl\")\n",
  "\n",
  "new_profile = dp.GraphProfiler.load(\"profile.pkl\")\n",
  "new_report = new_profile.report()"
- ],
- "metadata": {}
+ ]
  },
  {
  "cell_type": "code",
  "execution_count": null,
+ "metadata": {},
  "outputs": [],
  "source": [
  "pp.pprint(report)"
- ],
- "metadata": {}
+ ]
  },
  {
  "cell_type": "markdown",
+ "metadata": {},
  "source": [
  "## Difference in Data\n",
  "If we wanted to ensure that this new profile was the same as the previous profile that we loaded, we could compare them using the diff functionality."
- ],
- "metadata": {}
+ ]
  },
  {
  "cell_type": "code",
  "execution_count": null,
+ "metadata": {},
  "outputs": [],
  "source": [
  "diff = profile.diff(new_profile)"
- ],
- "metadata": {}
+ ]
  },
  {
  "cell_type": "code",
  "execution_count": null,
+ "metadata": {},
  "outputs": [],
  "source": [
  "pp.pprint(diff)"
- ],
- "metadata": {}
+ ]
  },
  {
  "cell_type": "markdown",
+ "metadata": {},
  "source": [
  "Another use for diff might be to provide differences between training and testing profiles as shown in the cell below.\n",
  "We will use the profile above as the training profile and create a new profile to represent the testing profile"
- ],
- "metadata": {}
+ ]
  },
  {
  "cell_type": "code",
  "execution_count": null,
+ "metadata": {},
  "outputs": [],
  "source": [
  "training_profile = profile\n",
@@ -215,38 +214,37 @@
  "testing_profile = dp.Profiler(testing_data)\n",
  "\n",
  "test_train_diff = training_profile.diff(testing_profile)"
- ],
- "metadata": {}
+ ]
  },
  {
  "cell_type": "markdown",
+ "metadata": {},
  "source": [
  "Below you can observe the difference between the two profiles."
- ],
- "metadata": {}
+ ]
  },
  {
  "cell_type": "code",
  "execution_count": null,
+ "metadata": {},
  "outputs": [],
  "source": [
  "pp.pprint(test_train_diff)"
- ],
- "metadata": {}
+ ]
  },
  {
  "cell_type": "markdown",
+ "metadata": {},
  "source": [
  "## Conclusion"
- ],
- "metadata": {}
+ ]
  },
  {
  "cell_type": "markdown",
+ "metadata": {},
  "source": [
  "We have shown the graph pipeline in the DataProfiler. It works similarly to the current DataProfiler implementation."
- ],
- "metadata": {}
+ ]
  }
  ],
  "metadata": {
@@ -266,8 +264,7 @@
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
  "version": "3.8.13"
- },
- "orig_nbformat": 4
+ }
  },
  "nbformat": 4,
  "nbformat_minor": 2
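The diff drops the per-cell `id` fields and `orig_nbformat`, and moves `metadata` ahead of `source` in each cell, while the notebook stays at `nbformat` 4, `nbformat_minor` 2. A minimal standard-library sketch of the kind of check that would flag the old cells is given below. The helper name `find_nb_issues` and its rules are illustrative assumptions, not the project's tooling and not the official nbformat validator; in particular, it assumes that cell ids only belong to the format from 4.5 onward.

```python
from typing import List


def find_nb_issues(nb: dict) -> List[str]:
    """Return human-readable problems found in a notebook dict (hypothetical helper)."""
    issues = []
    version = (nb.get("nbformat", 4), nb.get("nbformat_minor", 0))
    for i, cell in enumerate(nb.get("cells", [])):
        # Assumption: a cell-level "id" is only expected from nbformat 4.5 onward.
        if "id" in cell and version < (4, 5):
            issues.append(f"cell {i}: unexpected 'id' for nbformat {version[0]}.{version[1]}")
        # Keys every cell in this diff carries after the fix.
        for key in ("cell_type", "metadata", "source"):
            if key not in cell:
                issues.append(f"cell {i}: missing '{key}'")
        # Code cells in this diff additionally carry these two keys.
        if cell.get("cell_type") == "code":
            for key in ("execution_count", "outputs"):
                if key not in cell:
                    issues.append(f"cell {i}: code cell missing '{key}'")
    return issues


# First cell of the notebook as it appears before and after this commit.
before = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "cells": [
        {
            "cell_type": "markdown",
            "id": "228bb2a6",
            "metadata": {},
            "source": ["# Graph Pipeline Demo"],
        }
    ],
}
after = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Graph Pipeline Demo"],
        }
    ],
}

print(find_nb_issues(before))
print(find_nb_issues(after))
```

Key order inside a cell does not matter to a check like this, since JSON objects are unordered; the reordering of `metadata` in the diff only changes how the file is serialized.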
