Data Analysis

Azure Analysis Services

PaaS
Integrated with Azure data platform services.
You can mashup and combine data from multiple sources, define metrics, and secure your data in a single, trusted semantic data model.
Handles
- Security
- In-memory cache
- Data modeling
- Lifecycle management
- Business logic & metrics
Compatible with many features already in SQL Server Analysis Services Enterprise Edition
- Supports tabular models at the 1200 and 1400 compatibility levels
- Partitions, row-level security, bi-directional relationships, and translations are all supported.
- In-memory and DirectQuery modes are also available for fast queries over massive and complex datasets.

Data Sources
- Cloud: E.g. SQL Database, Azure Synapse Analytics, Data Lake, HDInsights/Spark…
- On-premises: E.g. SQL Server / Oracle…
Client tools
- Cloud: Power BI
- On-premises: Third-Party. Power BI Desktop. Excel

Client library for SQL to describe model objects for developers.
Exposed in JSON through the Tabular Model Scripting Language (TMSL) and the AMO data definition language.
- TOM is built on AMO.
  - Analysis Management Objects (AMO) is a library of programmatically accessed objects that enables an application to manage an Analysis Services instance.
  - E.g. AMO has data mining classes
Has classes for models, relationship, roles, annotations, cultures etc. to manage SQL analysis objects.
Structured in a tabular form.
Arranges data elements in vertical columns and horizontal rows. Each cell is formed by the intersection of a column and row.

Common use:
1. Create HDInsight
2. Schedule Jobs
3. Delete HDInsight Cluster
Azure distribution of Apache Hadoop components
- Framework for processing and analysis of big data sets on clusters.
- Including Apache Hive, HBase, Spark, Kafka, Storm, R and many others.
  - Apache Spark is an open-source parallel processing framework that supports in-memory processing to boost the performance of big-data analytic applications.
Built on top of Azure Storage

A single, central place for all of an organization's users to contribute their knowledge and build a community and culture of data.
- It includes a crowdsourcing model of metadata and annotations.
  - Descriptive metadata supplements the structural metadata (such as column names and data types) that's registered from the data source.
- The data remains in its existing location, but a copy of its metadata is added to Data Catalog, along with a reference to the data-source location.
- The metadata is also indexed to make each data source easily discoverable via search and understandable to the users who discover it.
Any user (analyst, data scientist, or developer) can discover, understand, and consume data sources.
- Users can contribute to the catalog by tagging, documenting, and annotating data sources that have already been registered.
- They can also register new data sources, which can then be discovered, understood, and consumed by the community of catalog users.