- Understand the uses of the Query Service
- Connect a client to the Query Service
- Running Queries
- Generating Datasets from Query Results
- Experience Platform UI: https://platform.adobe.com/
- Connect a client to the Query Service
- Examine Datasets
- Running Queries
- Generating Datasets from Query Results
Data scientists and others may need to do more specific lookups and queries on the data that exists in the Experience Platform. This may also include datasets that are stored on the Experience Platform but not used by Unified Profile service. Query Service gives you the ability to use standard SQL to query data in Adobe Experience Platform to support many different use cases. It is a serverless tool which allows you to join any datasets in Experience Data Lake and capture the query results as a new dataset for use in reporting, Data Science Workspace, or for ingestion into Profile Service.
-
For this lab we are going to use PostgreSQL to connect to the query service.
For MacOS users:
Open a terminal window and issue these three commands:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" brew install postgres which psql
You should see the following if it worked correctly:
/usr/local/bin/psql
For Windows users:
Download and install PostgreSQL
After installation you will need to edit your path variable to add two entries:
C:\Program Files\PostgreSQL\11.1\bin C:\Program Files\PostgreSQL\11.1\lib
The version number, 11.1 in the case above, may very depending on what version of PostgreSQL you installed. At the time of writing 11.1 was the latest version.
You need to add the two lines shown that include "Postgres". Save your updates, then open a Command prompt and type this:
psql -V
You should see something like this:
psql (PostgreSQL) 11.1
-
Connect to the platform web site: https://platform.adobe.com
Browse to the "Data" tab:
-
Then click the "Queries" tab:
-
Click on the "Connect BI Tools" page. Then click "copy" for "PSQL Command".
-
Now paste the command string into a terminal/command window and press return. You should see a result like this:
psql (11.1, server 0.1.0) SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off) Type "help" for help. all=>
If you don't see at least version 10.5 of psql then you need to download an updated version.
Windows Users: I'm sorry but before you paste the command into a command window you will need to paste it into a text editor like Notepad or VS Code and remove all the new lines. The windows terminal will not run multi-line commands properly.
-
Seeing available tables with PostgreSQL
\d
vsSHOW TABLES;
\d
gives you the standard PostgreSQL viewSHOW TABLES;
is a custom command we have added that gives you a more detailed view and presents the table as well as the DataSet. -
Check the tables root schema details with PostgreSQL
\d table_name
. The schema presented is just the root fields most of which are complex, structured types as you say in the DataSet schema UI. -
Open https://platform.adobe.com/datasets, click on the
Data
tab and from the Dataset list click on the profile dataset you created and then the schema tab.
-
Now we can run queries against the profile dataset we ingested in chapter 7. Run this SQL statement, and substitute your participant name for {ldap}.
SELECT personalemail FROM {ldap}_profiledataset WHERE personalemail IS NOT NULL LIMIT 10;
-
Notice anything about the result? It produced a flattened object rather than returning a single value. The
personalemail
object contains these six parameters primary, address, label, type, status and statusReason. And when the column is only delcared down to the object it will return the entire object as a string. The XDM schema is more complex than what you may have had experience with before because we need to cover multiple solutions, channels, and use-cases. If you just wanted the ID value you would use:SELECT personalemail.address FROM {ldap}_profiledataset WHERE personalemail IS NOT NULL LIMIT 10;
-
Now go back into the Experience Platform UI, click on the Data tab, then Queries.
-
Find the query you previously ran to find all the email addresses. Hove over the row and click on the Create Dataset button.
-
Enter a dataset name, prepended with your LDAP ID (does not have to be unique or SQL-safe, the system generates a "table name" based on the name given here). Enter a dataset description and click
Run Query
. -
This submits a request to create a dataset:
which then runs:
until completion:
-
Now got back to the Datasets tab and you will see you custom dataset:
-
Click on the name of your dataset and then the Preview button:
-
You'll see the same data that you previously queried in psql.
Now we've learned how to use standard SQL queries to retrieve data from our data lake as well as create new datasets based on our queries.
Previous: | Next: |
---|---|
Chapter 7 - API - Data: Ingesting Data via Batch Ingestion | Chapter 9 - Technical - Streaming: Stream in Data using Launch |
Return Home: Workbook Index