Azure Table Storage Tips for the RDBMS Developer

Adrian Hills

Published On: October 21, 2020

Categories: Azure Table Storage

Azure Table Storage is a NoSQL key-value PaaS data store that can be a great option for highly scalable, highly available systems. It supports storing petabytes of data and a flexible data schema, meaning different entities in the same table can have different schemas. References to NoSQL databases having "flexible schema" or being "schema-less" can give the impression that database schema design is a thing of the past and that you can bypass it to focus more on the application code. The reality is that even in the NoSQL world, schema design is very important, and if you don't give it due care and attention, it can come back to bite you.

If you have an RDBMS background and are new to Azure Table Storage, it's common to find yourself "thinking in SQL" and trying to solve database modeling requirements with a SQL approach before translating that to a key-value mindset. In this blog post, I'll cover some of the fundamentals of Azure Table Storage and dive into some common questions you might find yourself asking. Where code samples or references are applicable, we'll focus on .NET and the Azure SDK (specifically the Microsoft.Azure.Cosmos.Table NuGet package).

 

Azure Table Storage Fundamentals

First, let's cover some of the fundamental constructs and features of Azure Table Storage, including the main limitations and differences versus what you might be used to as an RDBMS developer.

Tables

  • An individual table can contain entities with different properties—the flexible schema we mentioned above.
  • An entity can contain up to 255 properties (including the PartitionKey, RowKey, and Timestamp system properties).
  • Each property can be up to 64KB in size.
  • Each entity can be up to 1MB in size.
  • You can have as many tables as you want, up to the storage capacity of an Azure Storage Account (500TB).
  • Every entity has a PartitionKey, RowKey, and a Timestamp. The Timestamp property is maintained server side as the time the entity was last modified, and is used to provide optimistic concurrency, so it cannot be directly modified.
  • PartitionKey and RowKey are both strings that can each be up to 1KB in size and the combination of both must be unique.

Indexes

  • Only the PartitionKey and RowKey properties are indexed.
  • Secondary indexes on other properties cannot be created.
  • These values are indexed in ascending order.
  • Having a solid understanding of your query requirements is important, as there are strategies for dealing with the apparent limitation on indexes, which I'll cover in some common scenarios below.

 

Transactions

  • Atomic updates can be performed as long as the entities are all within the same table partition (same PartitionKey).
  • You cannot perform cross-table or cross-partition transactions.
  • A maximum of 100 operations can be performed within the same atomic batch.
  • The total payload of the batch must be no more than 4MB.

High Availability / Data Redundancy

  • Choose your level of replication based on your needs.

Locally-Redundant Storage (LRS)

  • Lower cost
  • 99.9% availability SLA for reads
  • 99.9% availability SLA for writes
  • Least durability (99.999999999% - 11 9s)
  • No availability in the event of a datacenter or region outage

Read-Access Geo-Zone-Redundant Storage (RA-GZRS)

  • Higher cost
  • 99.99% availability SLA for reads (Read-access (RA-*) redundancy options provide read access from a secondary region)
  • 99.9% availability SLA for writes
  • Highest durability (99.99999999999999% - 16 9s)
  • Availability in the event of a datacenter or region outage

 

Cost

  • Azure Table Storage is very cost effective.
  • You are charged based on the amount of storage and the number of storage transactions made against the service (e.g., an individual operation against the storage API).
  • For example, at the time of writing (October 2020), for 1000GB of storage and 100 million storage transactions/month the costs are as follows:
    • LRS redundancy: $48.60/month
    • RA-GRS redundancy: $76.60/month (RA-GZRS was not listed on the calculator)

 

Azure Table Storage FAQs for the RDBMS Developer

 

How do I create custom indexes on an Azure Table Storage table?

As mentioned above, the only index that exists on a table is on the PartitionKey and RowKey properties. Secondary indexes are not supported. The workaround is to store additional copies of your data keyed to suit each query path, as the next question describes.

 

How do I optimize querying by different properties if I can't have secondary indexes?

Storing duplicate copies of the data with different PartitionKey and RowKey values is the way to go. Storage is inexpensive. You can make use of the in-partition batch support to insert/modify the copies in an atomic manner. Just bear in mind the limitations on batches—e.g., 100 operations max, 4MB limit on total payload size—but for most scenarios, this is not a concern.

Example scenario: Efficiently look up an employee by their employee id or by their domain username.

In this scenario, we have the following data in the Employee table:

PartitionKey RowKey FirstName LastName EmployeeId DomainUsername
Employee Id_012345 Joe Bloggs 012345 jbloggs
Employee Uname_jbloggs Joe Bloggs 012345 jbloggs

Here, we're storing two copies of an Employee entity—all the custom properties are the same (FirstName, LastName, etc.) but we use a different RowKey for each to facilitate our query requirements.
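
Throughout the samples that follow, EmployeeEntity can be assumed to be a simple TableEntity subclass along these lines (a minimal sketch; TableEntity, from Microsoft.Azure.Cosmos.Table, supplies the PartitionKey, RowKey, Timestamp, and ETag properties):

public class EmployeeEntity : TableEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string EmployeeId { get; set; }
    public string DomainUsername { get; set; }
}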

A query by Employee Id would be an efficient Point Query (a query for a single entity identified by its PartitionKey and RowKey).

// Equivalent to SQL: SELECT * FROM Employee WHERE PartitionKey = 'Employee' AND RowKey = 'Id_012345'
var query = TableOperation.Retrieve<EmployeeEntity>("Employee", "Id_012345");
var result = await employeeTable.ExecuteAsync(query);

A query by domain username would also be an efficient Point Query. The Retrieve call above is a shortened convenience wrapper for single-entity lookups by PartitionKey and RowKey, but you can also use ExecuteQuerySegmentedAsync (as shown below), which is what you'd use for queries that return multiple entities.

// Equivalent to SQL: SELECT * FROM Employee WHERE PartitionKey = 'Employee' AND RowKey = 'Uname_jbloggs'
var filter =
    TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "Employee"),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, "Uname_jbloggs"));

var query = new TableQuery<EmployeeEntity>().Where(filter);
var querySegment = await employeeTable.ExecuteQuerySegmentedAsync(query, null);

But how would you retrieve a list of all employees? If we retrieved everything in the Employee partition, we'd get duplicates, as we're storing multiple copies of the data. So, we would want to query for entities where the RowKey starts with "Id_" (we could also choose to search for RowKeys starting with "Uname_"). See the "How can I perform a LIKE query for values starting with a given value?" question further down for more information.

 

Can I query by properties other than the PartitionKey and RowKey?

Yes. But just like in an RDBMS, when you query on a field that is not indexed, it will perform a scan. If you're searching on a custom property within a specific partition, it will be a Partition Scan, as sketched below. If you're searching across all partitions, it will be a Table Scan, which is the most inefficient query to perform.
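
Here's a minimal sketch of a Partition Scan, reusing the EmployeeEntity and employeeTable from the examples above and filtering on the custom LastName property (an assumption for this example):

// Partition Scan: fixes the PartitionKey but filters on a non-indexed custom property,
// so the service has to scan every entity in the partition to evaluate the filter.
var filter =
    TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "Employee"),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("LastName", QueryComparisons.Equal, "Bloggs"));

var query = new TableQuery<EmployeeEntity>().Where(filter);

// Scans can span many result pages, so keep querying until the continuation token runs out.
// Omitting the PartitionKey condition entirely would make this a Table Scan.
TableContinuationToken token = null;
do
{
    var segment = await employeeTable.ExecuteQuerySegmentedAsync(query, token);
    token = segment.ContinuationToken;
    // ...process segment.Results...
} while (token != null);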

 

How can I insert/update multiple entities in an atomic operation?

Using the previous employee example, how do you go about creating both of the entities atomically when adding a new employee to the database? Well, as shown in the code below, you can perform multiple operations as part of a TableBatchOperation—just recall the limitations I mentioned in the fundamentals section: a maximum of 100 operations per batch, all entities must have the same PartitionKey, and a maximum batch payload size of 4MB.

var employeeById = new EmployeeEntity {
    PartitionKey = "Employee",
    RowKey = "Id_012345",
    FirstName = "Joe",
    LastName = "Bloggs",
    EmployeeId = "012345",
    DomainUsername = "jbloggs"
};

var employeeByDomainUsername = new EmployeeEntity {
    PartitionKey = "Employee",
    RowKey = "Uname_jbloggs",
    FirstName = "Joe",
    LastName = "Bloggs",
    EmployeeId = "012345",
    DomainUsername = "jbloggs"
};

var batchOperation = new TableBatchOperation();
batchOperation.Insert(employeeById);
batchOperation.Insert(employeeByDomainUsername);

// employeeTable variable is a reference to a CloudTable instance pointing to the Employee table
await employeeTable.ExecuteBatchAsync(batchOperation);

 

How do I store entities with different schemas in the same table?

There's nothing special here—just create a new entity with different properties and insert it into the same table. 

The following example demonstrates this for Order and OrderItem entities, storing the order header and order item entities in the same table.
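
For this sketch, assume OrderEntity and OrderItemEntity are simple TableEntity subclasses whose custom properties mirror those used in the code below:

public class OrderEntity : TableEntity
{
    public string OrderId { get; set; }
    public double Total { get; set; }
    public string OrderDate { get; set; }
}

public class OrderItemEntity : TableEntity
{
    public string ProductCode { get; set; }
    public int Quantity { get; set; }
    public double UnitPrice { get; set; }
}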

var order = new OrderEntity {
    PartitionKey = "Order",
    RowKey = "Id_12345",
    OrderId = "12345",
    Total = 10.99,
    OrderDate = "2020-01-02T03:04:05.678Z"
};

var orderItem = new OrderItemEntity {
    PartitionKey = "Order",
    RowKey = "Item_12345_ABC123",
    ProductCode = "ABC123",
    Quantity = 1,
    UnitPrice = 10.99
};

var batchOperation = new TableBatchOperation();
batchOperation.Insert(order);
batchOperation.Insert(orderItem);

// orderTable variable is a reference to a CloudTable instance pointing to the Order table
await orderTable.ExecuteBatchAsync(batchOperation);

This would result in the following data in the table:

PartitionKey RowKey OrderId Total OrderDate ProductCode Quantity UnitPrice
Order Id_12345 12345 10.99 2020-01-02T03:04:05.678Z - - -
Order Item_12345_ABC123 - - - ABC123 1 10.99

A note about the limit on properties—the 255-property maximum applies per entity (including the system properties), regardless of how many different schemas share the table.

 

How can I perform a LIKE query for values starting with a given value?

You can run a Range Query that is equivalent to a LIKE 'Something%' condition. Range Queries are the second most efficient type of query, filtering on the PartitionKey and a range of RowKey values.

Example scenario: Retrieve a distinct list of employees from the above example.

In this case, we're looking for all rows starting with "Id_".

var rowKeyStartsWith = "Id_";

var partitionKeyFilter =
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "Employee");

// Replace the last character in the starting value with the next character in sequence
// to serve as the end of the range to filter on
var rowKeyRangeEnd =
    rowKeyStartsWith.Substring(0, rowKeyStartsWith.Length - 1) + (char)(rowKeyStartsWith.Last() + 1);

var rowKeyFilter =
    TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThan, rowKeyStartsWith),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThan, rowKeyRangeEnd));

// Equivalent to SQL: SELECT * FROM Employee WHERE PartitionKey = 'Employee' AND RowKey LIKE 'Id[_]%'
var fullFilter = TableQuery.CombineFilters(partitionKeyFilter, TableOperators.And, rowKeyFilter);

var query = new TableQuery<EmployeeEntity>().Where(fullFilter);

The outcome is that we search for RowKeys > 'Id_' and < 'Id`' (the backtick being the character immediately after the underscore).

 

How do I enforce unique constraints?

PartitionKey and RowKey combinations are unique. In a relational database such as SQL Server, you might add a unique constraint or a unique index to fields that must be unique outside of the Primary Key. In Azure Table Storage, you can add an extra copy of the entity, within the same atomic batch, to enforce uniqueness: an Insert fails with a conflict if an entity with the same PartitionKey and RowKey already exists, and because the batch is atomic, the whole batch fails with it.

For example, in the previous employee scenario above, EmployeeId and DomainUsername values would have uniqueness enforced, as we've duplicated the data with separate RowKeys that have those properties built in.
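
To see that in action, here's a minimal sketch reusing the employee batch from earlier (the 409 handling via Microsoft.Azure.Cosmos.Table's StorageException is the key part):

try
{
    var batchOperation = new TableBatchOperation();
    batchOperation.Insert(employeeById);
    batchOperation.Insert(employeeByDomainUsername);
    await employeeTable.ExecuteBatchAsync(batchOperation);
}
catch (StorageException ex) when (ex.RequestInformation.HttpStatusCode == 409)
{
    // 409 Conflict: an entity with the same PartitionKey + RowKey already exists.
    // The batch is atomic, so neither copy was inserted - the username is taken.
}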

 

How can I include special characters in the PartitionKey or RowKey?

A limitation of PartitionKey and RowKey values is that certain characters, including the forward slash (/), backslash (\), number sign (#), and question mark (?), are not allowed. If you try to create an employee with a DomainUsername of "MyDomain\jbloggs", you'd get an error when trying to insert the Uname entity shown above because of the backslash.

One approach is to Base64-encode the value, which yields a safe string except that it can still contain a forward slash, which is not allowed either. You can replace that with a substitute safe character, such as an underscore.
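
Here's a minimal sketch of such a helper (KeySanitizer, EncodeKey, and DecodeKey are hypothetical names, not part of the SDK):

using System;
using System.Text;

public static class KeySanitizer
{
    // Standard Base64 output uses A-Z, a-z, 0-9, '+', '/', and '='; of those,
    // only '/' is disallowed in PartitionKey/RowKey values, so substitute it.
    // '_' never appears in standard Base64 output, so the swap is reversible.
    public static string EncodeKey(string value) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(value)).Replace('/', '_');

    public static string DecodeKey(string encoded) =>
        Encoding.UTF8.GetString(Convert.FromBase64String(encoded.Replace('_', '/')));
}

// Usage: KeySanitizer.EncodeKey("MyDomain\\jbloggs") => "TXlEb21haW5camJsb2dncw=="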

For "MyDomain\jbloggs" username the outcome would be as follows:

PartitionKey RowKey FirstName LastName EmployeeId DomainUsername
Employee Id_012345 Joe Bloggs 012345 MyDomain\jbloggs
Employee Uname_TXlEb21haW5camJsb2dncw== Joe Bloggs 012345 MyDomain\jbloggs

 

How do I perform case-insensitive queries?

Queries are case-sensitive. To perform a case-insensitive search, standardize the case of the value embedded in the RowKey to lowercase or uppercase before you store and query it. If you need to encode the value, as in the previous question, standardize the casing before encoding.

You can still store the original value unchanged inside the entity's properties; it's only the copy embedded in the PartitionKey or RowKey that needs its case standardized for search purposes.
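
As a small sketch of that, building on the hypothetical KeySanitizer helper from the previous question:

var rawUsername = "MyDomain\\JBloggs";

var employeeByDomainUsername = new EmployeeEntity {
    PartitionKey = "Employee",
    // Lowercase first, then encode, so queries that do the same will match.
    RowKey = "Uname_" + KeySanitizer.EncodeKey(rawUsername.ToLowerInvariant()),
    DomainUsername = rawUsername // the original casing is preserved in the property
};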

 

How can I perform an "ORDER BY Date DESC" query?

Rows are stored in ascending key order. There are two parts to supporting queries in datetime order:

  1. Datetimes should be converted to strings in an orderable format (we'll use the ISO8601 format)
  2. For descending order, we need to convert datetimes to values that effectively reverse the ordering (i.e., values that decrease as the datetimes increase)

Example scenario: We want to query for orders, in both ascending and descending order of OrderDate.

Let's take the following example data in an Orders table:

PartitionKey RowKey OrderId Total OrderDate
Order ASC_2020-01-02T03:04:05.678_1 1 10.99 2020-01-02T03:04:05.678Z
Order DESC_7980-12-30T20:55:54.321_1 1 10.99 2020-01-02T03:04:05.678Z

Breaking those RowKeys down ({ASC|DESC}_{DateTime}_{OrderId}):

  • The first part of the RowKey allows us to differentiate between entities stored in ascending and descending order.
  • The second part for ASC rows is the plain order date; for DESC rows, it is a DateTime value calculated by subtracting the order date from DateTime.MaxValue (see the sketch below).
  • The third part is the OrderId, which guarantees uniqueness of each PartitionKey + RowKey combination.

To retrieve a list of orders in ascending OrderDate, we query for entities where the RowKey starts with "ASC_" (as described in the previous LIKE scenario).

To retrieve a list of orders in descending OrderDate, we query for entities where the RowKey starts with "DESC_".

If multiple entities have the same OrderDate, the returned order would then come down to the OrderId part.
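
Here's a minimal sketch of how those RowKeys could be generated (the tick subtraction and the "yyyy-MM-ddTHH:mm:ss.fff" format are assumptions that reproduce the values in the table above):

var orderId = "1";
var orderDate = new DateTime(2020, 1, 2, 3, 4, 5, 678, DateTimeKind.Utc);

// Ascending: an ISO8601-formatted datetime sorts lexicographically in date order.
var ascRowKey = $"ASC_{orderDate:yyyy-MM-ddTHH:mm:ss.fff}_{orderId}";

// Descending: subtract from DateTime.MaxValue so that later dates produce
// lexicographically smaller strings.
var inverted = new DateTime(DateTime.MaxValue.Ticks - orderDate.Ticks);
var descRowKey = $"DESC_{inverted:yyyy-MM-ddTHH:mm:ss.fff}_{orderId}";

// ascRowKey  => "ASC_2020-01-02T03:04:05.678_1"
// descRowKey => "DESC_7980-12-30T20:55:54.321_1"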

 

How do I limit the fields I pull back in a query (SELECT <ColumnList>)?

One way to do this, as shown in the example below, is to specify the fields via the TableQuery.SelectColumns property.

// Equivalent to SQL:
// SELECT PartitionKey, RowKey, DomainUsername FROM Employee WHERE PartitionKey = 'Employee' AND RowKey = 'Id_012345'
var filter =
    TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "Employee"),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, "Id_012345"));

var query = new TableQuery<EmployeeEntity>().Where(filter);
query.SelectColumns = new List<string> { "DomainUsername" };

This will pull PartitionKey, RowKey, and DomainUsername from the table—PartitionKey and RowKey are always returned; however, you can exclude those, too, by providing a TableRequestOptions instance when executing the query, with ProjectSystemProperties=false.
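
As a sketch of that last part (assuming the ExecuteQuerySegmentedAsync overload that accepts a TableRequestOptions):

var options = new TableRequestOptions { ProjectSystemProperties = false };

// Passing the options excludes PartitionKey, RowKey, and Timestamp from the results,
// leaving just the projected DomainUsername property.
var querySegment = await employeeTable.ExecuteQuerySegmentedAsync(query, null, options, null);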

 

What about CosmosDB?

Azure Table Storage has been around for many years, and in that time, CosmosDB (formerly DocumentDB) has come along as another NoSQL PaaS offering from Microsoft. CosmosDB is also a great option that offers some added benefits, such as secondary indexes. These come at a cost, however, which is why Azure Table Storage can be a great option depending on your needs. I still love Azure Table Storage as a non-relational data store for its cost effectiveness and simple pricing model, and when it suits the requirements and use cases, I keep it at the forefront of my mind.

There is also an option to migrate to CosmosDB without having to rewrite your application data access logic. CosmosDB has a Table API, and you can use the same client library to access it as you do for Azure Table Storage—it's just a different connection string. There are some differences to be aware of in the behaviors and limits of Azure Table Storage versus the CosmosDB Table API, but it's certainly a good migration path to know about.

 

Summary

When you first try out a new database that you're not familiar with, there is a learning curve that encompasses everything from how to model data and what query patterns are supported, to which client library to use and how to perform the right operations through it. The natural starting point is to think in terms of what you know. For SQL developers trying out Azure Table Storage, that entails thinking about SELECT statements, clustered indexes, non-clustered indexes, etc., and then trying to work out how to replicate those concepts. There are many excellent resources out there for getting up to speed with Azure Table Storage—the Microsoft documentation in particular being a great resource—but hopefully this blog post has given you some useful tips about how to translate some of those RDBMS concepts over.

Adrian (@AdaTheDev) is a Lead Development Engineer at SentryOne, focusing on the development of the Data DevOps SaaS product portfolio - namely SentryOne Document and SentryOne Test. Based in the UK, he joined SentryOne from Pragmatic Works in 2018, with 18 years' experience in software development that includes a background in developing SaaS platforms in the hospitality and digital marketing industries. Adrian is a fan of good (bad) puns, dad jokes, and proper British biscuits.

