Skip to content

Latest commit

 

History

History
278 lines (231 loc) · 9.9 KB

4.1. Storage - Azure Storage.md

File metadata and controls

278 lines (231 loc) · 9.9 KB

Storage

Storage services

  • Storage account is top-level account for following services:
    • Blob Storage
    • File Storage
    • Table Storage
    • Queue Storage

Blob Storage

  • Object and disk storage
  • Blob storage tiers
  • Azure Search integration
  • Blob Lease for exclusive write access
    • Pass in lease id to API to modify
    • E.g. IaaS VMs lease Page Blob disks to ensure its managed by single VM
  • You can create snapshots on blob level and view snapshots.

Azure Data Lake Storage

  • Uses blob storage to store data
  • Big data analytics
  • Analytics interface and APIs
  • Blob storage APIs
  • Hadoop compatible access to data
  • ❗ GPv2 Storage accounts only

Blob Types

  • Block Blob
    • Composed of 100 MB blocks
    • Optimized for efficient upload
    • Insert, replace, delete, blocks
    • ❗ Up to 4.77TB max file size
    • ❗ 50.000 max blobs
  • Append blob
    • Can only append blocks
    • Ideal for log and audit files
    • ❗ 195GB max file size
  • Page Blob
    • Optimized for frequent read/write operations
    • Good for VM disks and databases
      • Foundation for IaaS disks
      • Stores VHD files.
      • Underlying storage for Azure SQL
    • ❗ Standard (HDD) / Premium (SSD) storage
    • ❗ 8 TB max file size
    • ❗ Only offered in General Purpose account types

Blob Storage Access Tiers

  • Set on blob level.
  • Three tiers:
    1. Hot Tier: Frequent reads
      • Lower data access costs
      • Higher data storage costs
    2. Cool Tier: Accessed less frequently
      • Higher data access costs
      • Lower data storage costs
      • Optimized for data that's stored 30 days
    3. Archive Tier: Take hours to get data available
      • Highest data access cost
      • Lowest data storage cost
      • Optimized for data that's stored 180 days
      • ❗ Only supported for Block Blobs
  • Changing storage tiers incurs charges
  • ❗ Can't change the Storage Tier of a Blob that has snapshots
  • Azure Blob Storage Lifecycle Management Policies
    • E.g. configure a policy to move a blob directly to the archive storage tier X days after it's uploaded
    • In portal: Storage Account → Blob Service → Lifecycle Management
    • Executed daily

WORM: Write Once Read Many

  • Cannot be erased or modify for certain period of time.
  • Set on container level
  • Enable in portal
    • Access Policy → Add Policy → Time-based retention (set retention period) / Legal hold (attach tags) → Lock policy

Soft Delete

  • Saves deleted for a specified period of time
  • In portal: Storage Account → Blob Services → Soft Delete

Static Website Hosting

  • When activated it creates $web container.
  • You need to have default document and error page.
  • You can integrate Azure CDN
    • Azure Content Delivery Network (CDN)
      • Distributed network of cache servers
      • Provide data to user from closest source
      • Offload traffic from origin servers to CDN
        • Typically static data
      • Pricing tiers are Microsoft, Akami, Verizon (Microsoft partners)
      • Supports • HTTPS • large file download optimization• file compression • geo-filtering
      • Azure CDN Core Analytics is collected and can be exported to blob storage, event hubs, Log Analytics.
    • Azure Storage blob becomesorigin server.
    • Azure CDN servers become edge servers
    • CDN can authenticate to Blob Storage with SAS tokens to be able to read private data.
    • Caching rules
      • On blobs you can set CDN caching rules, such as CacheControl:max-age=86400 in blob properties.
    • Two alternatives to set up:
      1. Create CDN and configure against blob service.
      2. Storage account → Blob service → CDN
  • You can have custom domain
  • You can have CORS policies

Azure Search

  • Integrates with Blob Storage
  • You can provide metadata in blobs, they'll be used as fields in search index which helps categorize documents and aid with features like faceted search.
    • You can choose index content, content+metadata or just metadata.
  • Searchable blobs can be • PDF • DOC/DOCs • XLS/XLSX • PPT/PPTx • MSG • HTML • XML • ZIP • EML • RTF • TXT • CSV • JSON
  • Structure
    • Index, Fields, Documents
  • Data Load
    • Push data in yourself
    • Pull data from Azure sources (SQL, Cosmos DB or blob storage)
  • Data Access
    • REST API, Simple Query, Lucene, .NET SDK
  • Features:
    • Fuzzy search handles misspelled words.
    • Suggestions from partial input.
    • Facets for categories.
    • Highlighting search tags for the results.
    • Tune and rank search results
    • Paging
    • Geo-spatial search if index data has latitude and longitude, user can get related data based on proximity
    • Synonyms
  • Lexical analysis done by Analyzers
  • You can combine following cognitive skills in pipelines: OCR, language detection, key phrase extraction, NER, sentiment analysis, merger/split/image analysis/shaper.

File Storage

  • SMB File Shares
  • Attach to Virtual Machines as file shares
  • Integrates with Azure File Sync
    • On-prem to Azure sync with caching strategy

Table Storage

  • NoSQL Data Store
  • Scheme-less design
  • Azure Cosmos DB

Queue Storage

  • Message based
  • For building synchronous applications
  • URL format: e.g. http://storageaccount.blob.core.windows.net

Account Types

Blob Storage Account

  • Supported services: Blob storage
  • Supported blob types: Block blobs, append blobs
  • Supports blob storage access tiers (hot, cool, archive)

General Purpose V1

  • Supported services: Blob storage
  • ❗ Does not support blob storage access tiers (hot, cool, archive)
  • ❗ Classic deployment & ARM
  • ❗ Does not support ZRS (Zone Redundant Storage) replication
  • Slightly cheaper storage transaction costs, can be converted to V2.

General Purpose V2

  • Supports all latest features.
    • Including anything in General Purpose V1 and blob storage access tiers.
  • 💡 Recommended choice when creating storage account.
  • Lower storage costs than V1
  • ❗ Has a changing soft limit (as of now 500 TB)
    • You can contact Azure support and request higher limits (as of now 5 PB). Same for ingress/egress limits to.

Account Replication

  • Impacts SLA
  • Locally Redundant Storage (LRS)
    • Three copies of data in single data center.
    • Data is spread across multiple hardware racks.
  • Zone Redundant Storage (ZRS)
    • Three copies of data in different availability zones in same region.
    • ❗ Only available for GPv2 storage accounts
  • Geo-redundant Storage (GRS)
    • Three copies of data in two different data centers in two different regions.
    • ❗ You don't get to choose second region, they're paired regions decided by Microsoft.
    • ❗ Replication involves a delay.
      • RPO (recovery point objective) is typically lower than 15 minutes.
  • Read-access Geo-redundant Storage (RA-GRS)
    • Same as GRS, but you get read-only access to data in secondary region.

Azure Storage Explorer

  • Cross-platform client application to administer/view storage and Cosmos DB accounts.
    • Can be downloaded with Storage Account → Open in Explorer in Portal.
    • Available in Azure portal as well (preview & simpler)
  • Can manage accounts across multiple subscriptions
  • Allows you to
    • Run storage emulator in local environment.
    • Manage SAS, CORS, access levels, meta data, files in File Share, stored procedures in Cosmos DB
    • Manage soft delete:
      • Enables recycle bin (retention period) for deleted items.
  • Connecting and authentication
    • Admin access with account log-in
    • Limited access with account level SAS

Pricing

  • Data storage cost (capacity)
  • Data operations
  • Outbound data transfer (bandwidth)
  • Geo-replication data transfer

Import and export data to Azure

  • You can use portal, PowerShell, REST API, Azure CLI, or .NET Storage SDKs.
  • You can upload files/folders using Azure Storage Explorer.
  • You can use physical drives
    • ❗ 64 bit only operating systems: Windows 8+ and Windows Server 2008+
    • Preparing the drive
      • ❗ NSTF only.
      • ❗ Drives must be encrypted using BitLocker
    • WAImportExportTool
      • Azure Import/Export tool
      • V1: Blob Storage, Export Jobs, V2: GP v1, GP v2
      • Allows you to copy from on-prem.

AzCopy

  • You can use AzCopy command-line utility tool.
  • No limit to # of files in batch
  • Pattern filters to select files
  • Can continue batch after connection interruption
    • Uses internal journal file to handle it
  • Copy newer/older source files.
  • Throttle # of concurrent connections
  • Modify file name and metadata during upload.
  • Generate log file
  • Authenticate with storage account key or SAS.

Importing data

  • Create import job
    1. Create storage account
    2. Prepare the drives
      • Connect disk drives to the Windows system via SATA connectors
      • Create a single NTFS volume on each drive
      • Prepare data using WAImportExportTool
        • Modify dataset.csv to include files/folders
        • Modify driveset.csv to include disks & encryption settings
        • Copy access key from storage account
    3. In Azure → Create import/export job → Import into Azure → Select container RG → Upload JRN (journal) file created from WAImportExportTool → Choose import destination to the storage account → Fill return shipping info
    4. Ship the drives to the Azure data center & update status with tracking number
  • Costs
    • Charged: fixed price per device, return shipping costs
    • Free: for the data transfer in Azure
  • ❗ No SLAs on shipping
    • Estimated: 7-10 days after arrival

Exporting data

  1. In portal: Azure → Create Import/Export Job → Choose Export from Azure
  2. Select storage account and optionally containers
  3. Type shipping info
  4. Ship blank drives to Azure
  5. Azure encrypts & copies files
    • Provides recovery key for encrypted drive.

Azure Data Box

  • Microsoft ships Data Box storage device
    • Each storage device has a maximum usable storage capacity of 80 TB.
  • It lets you send terabytes of data into Azure in a quick, inexpensive, and reliable way