cloudprivacylabs/lsa-selective-disclosure

Selective Disclosure using the Layered Schema Architecture

Selective disclosure allows a party to share a limited set of information with other parties. In personal data exchange scenarios, selective disclosure is a privacy mechanism that allows an individual to share only the necessary information with others. Layered schema architecture enables selective disclosure by annotating schemas with privacy tags. These privacy tags classify data elements with different privacy levels that the data owner can enable or disable based on the data exchange context.

The following figure illustrates selective disclosure of a JSON document to different parties. For each different party or use case, an overlay is defined (or auto-generated) that marks certain fields as "sensitive". This overlay is combined with a schema to form a "schema variant", which is a schema adjusted and annotated for a specific use case.

Ingesting the JSON document with this schema variant results in a labeled property graph representation of the input, with annotations containing the "sensitive" tags for the selected fields. The semantic pipeline ingests the JSON document with the given schema variant, removes all fields marked as "sensitive", and translates the labeled property graph back into a JSON document, which is shared with the recipient. This real-time filtering decouples the use-case-specific selective disclosure logic from the backend (a database or a wallet).

[Figure: Selective Disclosure]

To demonstrate this operation using the LSA tooling, let's consider a sample user profile data structure containing some demographic information, represented as a JSON schema in profile.schema.json. This schema contains a person's name, address, and phone information.

{
    "definitions": {
        "Profile": {
            "type": "object",
            "properties": {
                "address": {
                    "$ref": "#/definitions/Address"
                },
                "phone": {
                    "type": "array",
                    "items": {
                        "$ref": "#/definitions/Phone"
                    }
                },
                "firstName": {
                    "type": "string"
                },
                ...
            }
        },
        "Address": {
            "type": "object",
            "properties": {
                "street": {
                    "type": "string"
                },
                ...
            }
        },
        "Phone": {
            "type": "object",
            "properties": {
                "number": {
                    "type": "string"
                },
                "type": {
                    "type": "string"
                }
            }
        }
    }
}

The following is a sample JSON document (given in profile.json):

{
    "firstName": "john",
    "lastName": "doe",
    "address": {
        "street": "123 Main St.",
        "city": "Anycity",
        "state": "CO",
        "postalCode": "80000",
        "country": "US"
    },
    "phone": [
        {
            "type": "cell",
            "number": "123-123 1234"
        }
    ]
}

For this example, we will mark phone number, street address, and last name as "sensitive" using the following overlay:

{
    "definitions": {
        "Profile": {
            "properties": {
                "lastName": {
                    "x-ls": {
                        "privacyLevel": "sensitive"
                    }
                }
            }
        },
        "Address": {
            "properties": {
                "street": {
                    "type": "string",
                    "x-ls": {
                        "privacyLevel": "sensitive"
                    }
                }
            }
        },
        "Phone": {
            "properties": {
                "number": {
                    "x-ls": {
                        "privacyLevel": "sensitive"
                    }
                }
            }
        }
    }
}

Note that the overlay matches the JSON schema structure. It adds an x-ls/privacyLevel property to each sensitive field (x-ls is recognized by the LSA tooling).
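Because the overlay mirrors the schema structure, combining the two can be pictured as a recursive merge of nested objects. The following is a minimal Python sketch of that idea, not the LSA tooling's actual composition algorithm; the `compose` function is a hypothetical helper introduced only for illustration.

```python
import copy


def compose(schema: dict, overlay: dict) -> dict:
    """Return a new schema with the overlay merged in recursively.

    Hypothetical helper: illustrates the idea of a schema variant,
    not how the LSA tooling composes layers internally.
    """
    result = copy.deepcopy(schema)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are objects: descend and merge their members.
            result[key] = compose(result[key], value)
        else:
            # Otherwise the overlay value is added or replaces the original.
            result[key] = copy.deepcopy(value)
    return result


schema = {
    "definitions": {
        "Address": {
            "type": "object",
            "properties": {"street": {"type": "string"}},
        }
    }
}
overlay = {
    "definitions": {
        "Address": {
            "properties": {
                "street": {"x-ls": {"privacyLevel": "sensitive"}}
            }
        }
    }
}

variant = compose(schema, overlay)
print(variant["definitions"]["Address"]["properties"]["street"])
# {'type': 'string', 'x-ls': {'privacyLevel': 'sensitive'}}
```

The original schema is left untouched, so several variants can be derived from the same base schema.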

The next step is to combine the schema with this overlay to create a schema variant. This is done with a schema bundle (profile-sensitive.bundle.yaml), shown below:

# Combine the JSON schema and the overlay
# The resulting schema has id http://example.org/ProfileSchema
jsonSchemas:
  - name: profile.schema.json
    id: http://example.org/ProfileSchema
    overlays:
      - profile-sensitive.ovl.json
      
# Declare the data type variants based on the schema
variants:
  # The Profile data type is defined at #/definitions/Profile
  # of the combined JSON schema
  http://example.org/Profile:
    jsonSchema:
      ref: http://example.org/ProfileSchema#/definitions/Profile
      layerId: http://example.org/Profile

This schema bundle reads the JSON schema profile.schema.json, combines it with the overlay profile-sensitive.ovl.json, then defines a data type http://example.org/Profile by pointing to the location in the schema variant where the Profile object is defined.
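The ref value in the bundle has two parts: the schema id before the `#` and a path into that schema after it. A rough Python sketch of how such a reference could be followed is given below. The `resolve_ref` function and the `schemas` lookup table are assumptions made for illustration, and the sketch ignores JSON Pointer escape sequences.

```python
from urllib.parse import urldefrag


def resolve_ref(ref: str, schemas: dict) -> dict:
    """Follow a reference of the form <schema id>#/<path> (hypothetical helper)."""
    schema_id, fragment = urldefrag(ref)
    node = schemas[schema_id]
    # Walk the fragment one path segment at a time.
    for part in fragment.split("/"):
        if part:
            node = node[part]
    return node


# Stand-in for the combined schema, keyed by the id given in the bundle.
schemas = {
    "http://example.org/ProfileSchema": {
        "definitions": {"Profile": {"type": "object"}}
    }
}

profile = resolve_ref(
    "http://example.org/ProfileSchema#/definitions/Profile", schemas
)
print(profile)  # {'type': 'object'}
```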

To process data using this schema, we need a pipeline. The pipeline below (sensitive.pipeline.yaml) first ingests JSON data using the profile-sensitive bundle. The output of this operation is a labeled property graph containing the privacyLevel annotations given in profile-sensitive.ovl.json. This is self-describing data: it carries the schema information together with the data elements. The oc operation runs openCypher expressions on this graph, removing all nodes that are marked with privacyLevel: sensitive. The final step translates the graph back to JSON.

# Ingest a Profile object with the schema using the sensitive overlay
# The output of this stage is a graph
- operation: ingest/json
  params:
    bundle:
      - profile-sensitive.bundle.yaml
    type: http://example.org/Profile

# Remove all graph nodes that are marked sensitive
- operation: oc
  params:
    expr:
      - match (k {`privacyLevel`:"sensitive"}) detach delete k
    
# Convert the graph back to JSON
- operation: export/json
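The effect of the openCypher expression can be pictured on a toy graph: every node whose privacyLevel property is "sensitive" is deleted together with the edges attached to it. The Python sketch below uses plain dictionaries and tuples as a stand-in for a labeled property graph. The node names, the `value` property, and the `detach_delete` function are hypothetical and do not reflect the graph model used by the layers tool.

```python
def detach_delete(nodes: dict, edges: list, predicate):
    """Remove nodes matching the predicate, plus any edges touching them."""
    doomed = {node_id for node_id, props in nodes.items() if predicate(props)}
    kept_nodes = {
        node_id: props for node_id, props in nodes.items() if node_id not in doomed
    }
    kept_edges = [
        (src, dst) for src, dst in edges if src not in doomed and dst not in doomed
    ]
    return kept_nodes, kept_edges


# Toy graph: one object node with two field nodes, one of them sensitive.
nodes = {
    "profile": {"type": "object"},
    "firstName": {"value": "john"},
    "lastName": {"value": "doe", "privacyLevel": "sensitive"},
}
edges = [("profile", "firstName"), ("profile", "lastName")]

kept_nodes, kept_edges = detach_delete(
    nodes, edges, lambda props: props.get("privacyLevel") == "sensitive"
)
print(sorted(kept_nodes))  # ['firstName', 'profile']
print(kept_edges)          # [('profile', 'firstName')]
```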

This pipeline can be run using:

layers pipeline --file sensitive.pipeline.yaml profile.json 

(The layers program can be downloaded from https://github.com/cloudprivacylabs/lsa/releases)

The output is:

{
  "firstName": "john",
  "address": {
    "city": "Anycity",
    "state": "CO",
    "postalCode": "80000",
    "country": "US"
  },
  "phone": [
    {
      "type": "cell"
    }
  ]
}

As you can see, the output does not contain the fields that are marked as sensitive.
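To make the relationship between the annotated schema and this output concrete, here is a Python sketch that walks the document alongside an abbreviated schema variant and drops every field annotated as sensitive. It deliberately skips the graph representation, so it is a simplification of what the pipeline does rather than a reimplementation of it. The `redact` and `deref` helpers are hypothetical, and `deref` assumes every `$ref` points into `definitions`.

```python
def is_sensitive(schema_node: dict) -> bool:
    return schema_node.get("x-ls", {}).get("privacyLevel") == "sensitive"


def deref(schema_node: dict, definitions: dict) -> dict:
    """Follow a local "#/definitions/Name" reference, if present."""
    ref = schema_node.get("$ref")
    return definitions[ref.rsplit("/", 1)[-1]] if ref else schema_node


def redact(value, schema_node: dict, definitions: dict):
    """Copy value, omitting object fields whose schema marks them sensitive."""
    schema_node = deref(schema_node, definitions)
    if isinstance(value, dict):
        props = schema_node.get("properties", {})
        return {
            key: redact(item, props.get(key, {}), definitions)
            for key, item in value.items()
            if not is_sensitive(props.get(key, {}))
        }
    if isinstance(value, list):
        items = schema_node.get("items", {})
        return [redact(item, items, definitions) for item in value]
    return value


sensitive = {"x-ls": {"privacyLevel": "sensitive"}}
# Abbreviated schema variant: only the fields needed for the walk.
definitions = {
    "Profile": {"properties": {
        "address": {"$ref": "#/definitions/Address"},
        "phone": {"type": "array", "items": {"$ref": "#/definitions/Phone"}},
        "firstName": {"type": "string"},
        "lastName": sensitive,
    }},
    "Address": {"properties": {"street": {"type": "string", **sensitive}}},
    "Phone": {"properties": {"number": sensitive, "type": {"type": "string"}}},
}
document = {
    "firstName": "john",
    "lastName": "doe",
    "address": {"street": "123 Main St.", "city": "Anycity", "state": "CO",
                "postalCode": "80000", "country": "US"},
    "phone": [{"type": "cell", "number": "123-123 1234"}],
}

disclosed = redact(document, definitions["Profile"], definitions)
print(disclosed)
```

Fields that the abbreviated schema does not describe, such as city, pass through unchanged in this sketch.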

Now we can create a second overlay to add more sensitive fields. The profile-moresensitive.ovl.json overlay declares the firstName, middleName, and city fields as sensitive.

{
    "definitions": {
        "Profile": {
            "properties": {
                "firstName": {
                    "x-ls": {
                        "privacyLevel": "sensitive"
                    }
                },
                "middleName": {
                    "x-ls": {
                        "privacyLevel": "sensitive"
                    }
                }
            }
        },
        "Address": {
            "properties": {
                "city": {
                    "type": "string",
                    "x-ls": {
                        "privacyLevel": "sensitive"
                    }
                }
            }
        }
    }
}

Then a new schema bundle combines both sensitive data overlays (profile-moresensitive.bundle.yaml):

jsonSchemas:
  - name: profile.schema.json
    id: http://example.org/ProfileSchema
    overlays:
      - profile-sensitive.ovl.json
      - profile-moresensitive.ovl.json
variants:
  http://example.org/Profile:
    jsonSchema:
      ref: http://example.org/ProfileSchema#/definitions/Profile
      layerId: http://example.org/Profile

A new pipeline (moresensitive.pipeline.yaml) uses this bundle:

- operation: ingest/json
  params:
    bundle:
      - profile-moresensitive.bundle.yaml
    type: http://example.org/Profile

- operation: oc
  params:
    expr:
      - match (k {`privacyLevel`:"sensitive"}) detach delete k
    
- operation: export/json

To get the output, run:

layers pipeline --file moresensitive.pipeline.yaml profile.json

This gives:

{
  "address": {
    "state": "CO",
    "postalCode": "80000",
    "country": "US"
  },
  "phone": [
    {
      "type": "cell"
    }
  ]
}
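This result is consistent with the two overlays contributing the union of their sensitive fields. The Python sketch below collects the fields marked by a list of overlays, using abbreviated copies of the two overlays from this example. The `sensitive_fields` function is a hypothetical helper and says nothing about how the LSA tooling resolves conflicting annotations between overlays.

```python
def sensitive_fields(overlays: list) -> set:
    """Collect (definition, field) pairs marked sensitive in any overlay."""
    found = set()
    for overlay in overlays:
        for type_name, definition in overlay.get("definitions", {}).items():
            for field, attrs in definition.get("properties", {}).items():
                if attrs.get("x-ls", {}).get("privacyLevel") == "sensitive":
                    found.add((type_name, field))
    return found


mark = {"x-ls": {"privacyLevel": "sensitive"}}
# Abbreviated versions of profile-sensitive.ovl.json and
# profile-moresensitive.ovl.json.
sensitive_ovl = {"definitions": {
    "Profile": {"properties": {"lastName": mark}},
    "Address": {"properties": {"street": mark}},
    "Phone": {"properties": {"number": mark}},
}}
moresensitive_ovl = {"definitions": {
    "Profile": {"properties": {"firstName": mark, "middleName": mark}},
    "Address": {"properties": {"city": mark}},
}}

combined = sensitive_fields([sensitive_ovl, moresensitive_ovl])
for type_name, field in sorted(combined):
    print(f"{type_name}.{field}")
```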

So we created two overlays, two schema bundles, and two pipelines that are applicable to two separate data exchange scenarios.

This example only illustrates the basics of selective disclosure using the layered schema architecture, which is the foundation for our real-time data filtering engine.
