An Introduction to B-Tree and Hash Indexes in PostgreSQL
This article explores the PostgreSQL implementation of the B-Tree (the B stands for Balanced) and hash index data structures. As PostgreSQL grows in popularity as an open-source database system and as a target for migrating Oracle workloads, understanding how PostgreSQL indexes work is essential for database developers and administrators. PostgreSQL has several other index types, such as GIN, GiST, and BRIN indexes; I'll omit them from this article, as they're specialty indexes suited to text search, geospatial data, and other complex data types. And while B-Tree indexes make up roughly 90% of use cases, hash indexes and their concepts are also important to understand.
Understanding and implementing the correct indexes for the workload is the foundation of any well-running relational database system. Adding and adjusting indexes to suit the workload has yielded some of the most significant performance gains over my many years of consulting. However, to add the right indexes, you must first understand them.
In PostgreSQL, all tables are heap tables, so the relational engine maintains no physical order for the data in a table. You can't define a clustered index that keeps the table physically ordered by a key. Because the data in the table is unordered, certain types of scan operations can be less efficient. However, PostgreSQL does provide the CLUSTER command. When CLUSTER is executed against a table, the data is physically reordered based on the keys of a secondary index (more on those in a bit) you provide. The engine doesn't maintain this physical ordering after the CLUSTER command has run; new rows are simply added to the first location in the table where they fit. If you determine your workload's access patterns benefit from such ordering for scan purposes, you'll need to run CLUSTER periodically to reorder the table.
Data and Index Pages
Every table and index in PostgreSQL is made up of an array of pages. A page is a data structure that exists to store table records or index pointers. A database page in PostgreSQL is generally 8KB in size, but this can be changed when compiling the server. Because a table isn’t ordered in PostgreSQL, when a row is to be inserted into a table, the row is inserted into the first page able to hold the row. To do this, PostgreSQL keeps track of how full each data page is through a data structure known as a Free Space Map (FSM). The FSM data structure allocates one byte per page with the purpose of that byte keeping track of how full the page is. If there are no pages in the table with enough free space to store a record of data, a new page is allocated in the table to store it.
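If you want to peek at the FSM yourself, the pg_freespacemap contrib extension exposes it. A quick sketch (using the numbers demo table created later in this article):

```sql
CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

-- One row per page: the block number and its approximate free bytes
SELECT * FROM pg_freespace('numbers');
```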
Every index in PostgreSQL is a secondary index—a data structure stored separately from the heap table structure, with some type of pointer into the heap table. The two types of indexes I’m going to focus on today are the B-Tree index and the hash index. The B-Tree index is a very commonly used database index structure that allows for high-speed searching and sorting of data with minimal storage overhead for the index. Hash indexes are single-column indexes storing the 4-byte results of a hash algorithm of the index key. The hash value maps to a bucket storing a pointer to the row in the heap table. I’ll explain the advantages and drawbacks of the hash index a bit later.
Let’s create a test database for our demos. For the GUI, I’ll be using DBeaver Community Edition—an excellent interface for developing and administering a myriad of different database instances. For the PostgreSQL database, I’ll be using Azure Database for PostgreSQL Flexible Server. Azure makes it easy to spin up a PostgreSQL database server so I can run my demos and then tear down the instance quickly, with minimal costs incurred.
The first step is to create the test database:
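Something along these lines (the database name sqlskills comes from the next step):

```sql
CREATE DATABASE sqlskills;
```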
And then switch to the sqlskills database context and create a table named numbers:
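The original column definitions aren't shown here; a minimal numbers table consistent with the later examples might look like this (numbercol is the key column referenced below, while textcol is a name I've assumed for the random string column):

```sql
CREATE TABLE numbers (
    numbercol integer,  -- the key column indexed in the examples below
    textcol   text      -- hypothetical name for the random string column
);
```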
I'll use the PostgreSQL function generate_series to quickly generate a list of 5,000 numbers and random string values to insert into the numbers table. Notice that I'm generating a random number in the ORDER BY clause so the rows are inserted into the table in random order.
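A sketch of that insert, assuming the text column is named textcol and using md5(random()::text) as one common way to produce random strings (the original listing may differ):

```sql
-- ORDER BY random() shuffles the 5,000 rows before insertion
INSERT INTO numbers (numbercol, textcol)
SELECT n, md5(random()::text)
FROM generate_series(1, 5000) AS n
ORDER BY random();
```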
Now when I run a SELECT against the table, the data is returned in random order:
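For example:

```sql
-- Rows return in whatever order they landed on the table's pages
SELECT * FROM numbers LIMIT 10;
```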
To show the CLUSTER command in action, I must first create a B-Tree index on the table using the CREATE INDEX command. The index can be specified as ASC or DESC, with ASC being the default. This B-Tree index is ordered by the numbercol column in the numbers table:
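The index name idx_numbers_numbercol is referenced later in the article, so the statement is along these lines:

```sql
-- B-Tree is the default access method; ASC is the default sort direction
CREATE INDEX idx_numbers_numbercol ON numbers (numbercol);
```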
And then running the CLUSTER command, giving the index name to use, physically orders the contents of the table:
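For example:

```sql
-- Physically rewrite the table in idx_numbers_numbercol order
CLUSTER numbers USING idx_numbers_numbercol;
```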
Now running a simple SELECT without an ORDER BY shows the data from the table being returned in the same ordering as the index I just created:
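For example:

```sql
-- No ORDER BY, yet the rows now return in numbercol order
SELECT * FROM numbers LIMIT 10;
```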
The index in the previous example is a B-Tree index. B-Tree indexes are the most common and broadly useful index structure across relational database management systems (RDBMSes). A B-Tree index has two objectives. The first and primary goal is to find records rapidly and efficiently instead of having to perform a sequential scan of the table. The second, ancillary goal is to enable quick sorting of data. To achieve both goals, a B-Tree index stores its data in sorted order and maintains a search tree.
The image below is a high-level overview of a B-Tree index. For our purposes, let’s assume this B-Tree is storing the data from the idx_numbers_numbercol index on the numbers table I created in the previous example.
At the top of the index is the “root” page. This is a fixed metadata page containing pointers to other pages based upon the metadata stored. Consider the following query:
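Judging from the traversal described next, the query is an equality lookup on numbercol for the value 2500, along these lines:

```sql
SELECT * FROM numbers
WHERE numbercol = 2500;
```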
To traverse this B-Tree index to find the value 2500, the root page is first interrogated. The root page contains the first clue in the map to find the requested value. The value 2500 is greater than the value 2001, so the root page points the search to the right non-leaf (intermediate) level page in the B-Tree structure, as shown below.
At the non-leaf levels of the B-Tree index are “index pages” containing pointers to either the next non-leaf index level in the tree or to the leaf level of the index. At the non-leaf levels, the index pages are implemented as doubly-linked lists, which maintain the logical ordering of the index. For this search, the value 2500 is between 2001 and 3001, so the next page to go to is the leaf-level page containing 2001 to 3000, as shown below.
Now the search has reached the leaf level of the index. For secondary indexes, the leaf level contains the index key(s), any non-key-included columns defined in the index, and a pointer to the record in the heap table. It’s also worth noting pages at the leaf level are in a doubly-linked list, supporting previous and next page lookups as well as bi-directional sorting.
For the above query, it was only necessary for the PostgreSQL engine to examine three pages to return the data requested. We can easily verify this using the EXPLAIN command. In PostgreSQL, the EXPLAIN command is used to view the execution plan of a SQL statement. Going into detail regarding EXPLAIN is beyond the scope of this article; however, I’ll show you how to view the execution plan for a statement and use the BUFFERS option to see how many shared buffers (in-memory data pages) were touched for a given statement.
To view the execution plan, I’ll type the EXPLAIN keyword before the statement I want to view the execution plan for, as shown below:
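For example:

```sql
EXPLAIN
SELECT * FROM numbers
WHERE numbercol = 2500;
```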
After EXPLAIN, the ANALYZE keyword tells PostgreSQL to actually execute the statement, and the BUFFERS option includes the number of shared buffers touched to return the data. PostgreSQL sets aside a large portion of memory to store data and index pages for queries. Data pages must be brought from disk into this shared memory pool before the data is returned to the end user. As a rule, the fewer shared buffers it takes to return data to the user, the faster the query is.
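Putting it together, with both options added (the query mirrors the earlier lookup for 2500):

```sql
-- ANALYZE executes the statement; BUFFERS reports shared buffers touched
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM numbers
WHERE numbercol = 2500;
```

The "Buffers: shared hit=N" line in the output is the count being discussed here.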
Viewing the query plan below, we can see the idx_numbers_numbercol index was used to return one row, and three shared buffers were touched to return our data. Those three shared buffers are the reading of the root page, the non-leaf page, and the leaf page to return the data. Very cool.
A hash index implements a variation of a hash table data structure: a hash function takes the index key value, produces a 4-byte signed integer (32 bits) representing the key, and stores the hashed value in a structure known as a bucket, along with a pointer to where the record exists in the heap table (essentially a set of tuples). Prior to PostgreSQL version 10, hash indexes didn't perform well under high-concurrency workloads and weren't covered by PostgreSQL's write-ahead logging, meaning they could be corrupted after crash recovery. However, since these fixes were made in version 10, hash indexes are safe to use and, in some cases, can outperform B-Tree indexes.
Before I show some demo code for creating a hash index, let’s first look at how a hash index is logically implemented (this isn’t the actual implementation—merely a logical look). For this, assume I have a FirstName field that stores text values. PostgreSQL has a hashtext function for taking a string value and outputting a deterministic integer value for the string. In this case, my first name maps to the value 403,565,329. In terms of how hash indexes work, think of this hashtext as the first part of the hash algorithm.
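A sketch of that first step (hashtext output is deterministic for a given server build, though the exact value you see may differ):

```sql
-- First part of the hash algorithm: hash the key value
SELECT hashtext('Paul');
```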
The second part of the hash algorithm is to map the output of the hash function to a bucket. This bucket structure maps the hash code values to actual table rows. Buckets are implemented as data pages in PostgreSQL, and depending on the size of the table, there may be several different hash values mapped to the same bucket—a common scenario known as a collision. If there are enough collisions to fill the bucket page entirely, then an overflow page is allocated to store additional hash values and the location in the table for the hashed row value. In this example, I’m using the PostgreSQL mod function to output a modulus of 10 (0 through 9 remainder values) of the hashed value of my first name:
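A sketch of that calculation:

```sql
-- Second part: map the hashed value into one of 10 buckets (0 through 9)
SELECT mod(hashtext('Paul'), 10);
```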
This hash value maps to the bucket of 9.
Let’s look at how to create a hash index. I’ll reuse the table schema I created for B-Tree indexes.
To create the hash index, I must use the “using” clause of the CREATE INDEX statement to specify the index is a hash index and then specify the key column in parentheses. I didn’t need to include the using clause when I created the B-Tree indexes earlier, as the B-Tree index is the default index for the CREATE INDEX syntax.
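A statement along these lines (the index name here is one I've chosen; the original isn't shown):

```sql
-- USING hash selects the hash access method (btree is the default)
CREATE INDEX idx_numbers_numbercol_hash ON numbers USING hash (numbercol);
```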
Next, I’ll query the numbers table and look for the key value 2500. When this query is executed, the hash algorithm is applied to the value 2500 to determine the bucket in the hash index pointing to the row in the table.
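The lookup is a plain equality search; prefixing it with EXPLAIN, as earlier, confirms whether the hash index is chosen:

```sql
SELECT * FROM numbers
WHERE numbercol = 2500;
```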
Limitations of Hash Indexes
In the previous example, I looked at how to create a hash index to find a single record in a table using an equality comparison in the query’s WHERE clause. I had to use a simple example because of how limited hash indexes are in their use cases. Hash indexes only support equality comparisons when performing lookups. This is because of how the data structure is implemented—the hash algorithm operating on the index key produces and stores a single 4-byte integer value. Any time a lookup is done, the same hash algorithm must be executed against the value being searched to retrieve the row pointer from the bucket in the data structure. Because of this, there’s no ordering to the data structure. The only way to search for a range of values is to read through the entire table to find the values, in which case the hash index wouldn’t be used at all. Hash indexes in PostgreSQL also only support a single column key, so having a multi-column index as a hash index isn’t an option. Hash indexes also don’t have the ability to enforce uniqueness constraints.
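You can see the equality-only behavior for yourself. Assuming only the hash index exists on numbercol, a range predicate falls back to a sequential scan, and a unique hash index is rejected outright (this sketch is mine, not from the original demos):

```sql
-- A range predicate can't use a hash index (equality comparisons only)
EXPLAIN SELECT * FROM numbers WHERE numbercol BETWEEN 10 AND 20;

-- Uniqueness can't be enforced either; this fails with an error along
-- the lines of: access method "hash" does not support unique indexes
-- CREATE UNIQUE INDEX idx_numbers_hash_uq ON numbers USING hash (numbercol);
```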
Hash Index Advantages Over B-Tree Indexes
While there are many restrictions on hash indexes, they do have some advantages over B-Tree indexes. The first advantage is the hash index can locate the key values being searched for without the need to traverse any type of tree data structure. This can be advantageous for large tables because of lowered I/O to find the necessary records.
Another advantage hash indexes have over B-Tree indexes is key size. Because B-Tree indexes store the actual key values in the index structure, the index can grow large. B-Tree indexes also have limitations on the length of the key they can store. Hash indexes, on the other hand, do not store the actual key value, only the 4-byte signed hashed value of the key.
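You can compare on-disk sizes directly with pg_relation_size (the hash index name below is a hypothetical one; substitute whatever you named yours):

```sql
SELECT pg_size_pretty(pg_relation_size('idx_numbers_numbercol'))      AS btree_size,
       pg_size_pretty(pg_relation_size('idx_numbers_numbercol_hash')) AS hash_size;
```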
One last advantage hash indexes enjoy over B-Tree indexes is hash index size isn’t affected by how selective the index key value is.
Using B-Tree and Hash Indexes
B-Tree indexes are generally the index of choice for most implementations in PostgreSQL, as they allow for quick searching and sorting of data, have little overhead, and handle range and inequality lookups well. Hash indexes are data structures designed to return single-record lookups with equality searches. Because of the many limitations of hash indexes and the fact they only recently became crash-safe, they aren't widely adopted in most PostgreSQL production databases. However, because they optimize single-record lookups, they can be great in production environments relying heavily on fast single-record searches.
Do you have a production environment use case where a hash index would be a good choice versus a traditional B-Tree index structure? Have you used other types of indexes in PostgreSQL, such as BRIN, GiST, or GIN? If so, please comment below; I’d love to hear from you!
Paul S. Randal is the CEO of SQLskills.com, which he runs with his wife Kimberly L. Tripp. Both Paul and Kimberly are widely-known and respected experts in the SQL Server world, and both are long-time SQL Server MVPs. Paul was a Contributing Editor for TechNet Magazine, where he wrote the bi-monthly SQL Q&A column and feature articles. He also had #1 top-rated workshops and sessions at the PASS Summit and TechEd. Paul is active in the SQL Server community, from user groups to online forums to helping out on Twitter (@PaulRandal – check out the #sqlhelp tag). His popular and widely-referenced blog can be found at https://www.sqlskills.com/blogs/paul/ and he can be reached at firstname.lastname@example.org.