It's not just PURE Storage
December 1, 2014
Pure Storage presented at Storage Field Day 6 and I had the opportunity to visit their headquarters for a second time to discuss their technology. I’ve written about “Pure” before, after they presented at Virtualization Field Day 3 back in February, but that post was based more around their “Forever Flash” services. This time I was more interested in their architecture and found that their company name, “Pure Storage”, may be a bit misleading. Everyone knows that they produce arrays that are all SSD and provide tons of IOPS and low latency, blah blah blah. But these arrays are far from being just a device full of fast storage. A lot of SSD-specific know-how has been put into this array to get more out of it than just fast drives.
All travel expenses and incidentals were paid for by Gestalt IT to attend Storage Field Day 6. This was the only compensation given.
Let’s take a look at their write IO flow to see what makes “Pure Storage” more than just pure storage.
- When blocks are sent by a host to be written to the Pure array, pattern removal at 8-bit granularity is used to reduce the amount of data entering NVRAM. This includes removing zeros, which is especially useful when a virtual disk is being eager-zeroed.
- The data is broken into 32KB chunks which are checksummed and written to at least half of the NVRAM modules to obtain quorum.
- The write is acknowledged to the host and the data reduction process begins.
- Next, inline deduplication at 512-byte granularity takes place. Dedupe uses a hash table to identify potential duplicate blocks, but a binary comparison is done before updating metadata; the hash table is not treated as a trusted source, but rather as a good place to start looking for matches. On the off chance that the controllers are very heavily utilized, this step is skipped to preserve performance.
- Inline compression is done next to compress the deduped blocks in NVRAM before writing them to SSD. The Lempel–Ziv–Oberhumer (LZO) algorithm is used for compression. Not everything is compressed here though; only blocks that would yield moderate or higher space savings are compressed.
- Parity is calculated for the RAID-3D algorithm to protect the data while it resides on SSD.
- The write is flushed from NVRAM to the SSDs.
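The inline steps above can be sketched as a minimal pipeline. This is purely illustrative, assuming a simple in-memory hash index; the chunk size comes from the flow above, but the savings threshold, function names, and use of SHA-256 and zlib (as a stand-in for LZO, which is not in the standard library) are my assumptions, not Pure's implementation:

```python
import hashlib
import zlib  # stand-in for LZO, which requires a third-party library

CHUNK_SIZE = 32 * 1024   # 32KB chunks, per the flow above
MIN_SAVINGS = 0.2        # illustrative threshold for "moderate or higher" savings

dedupe_index = {}        # checksum -> stored chunk (the untrusted hint table)

def remove_patterns(chunk: bytes):
    """Drop chunks that are a single repeating byte pattern (e.g. all zeros)."""
    if len(set(chunk)) == 1:
        return None      # record the pattern in metadata instead of storing data
    return chunk

def process_write(data: bytes) -> list:
    stored = []
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        if remove_patterns(chunk) is None:
            continue                      # pattern removed; nothing enters NVRAM
        checksum = hashlib.sha256(chunk).hexdigest()
        # Dedupe: the hash table is only a hint; verify with a binary comparison
        candidate = dedupe_index.get(checksum)
        if candidate is not None and candidate == chunk:
            continue                      # true duplicate; reference the existing block
        dedupe_index[checksum] = chunk
        # Selective compression: keep the result only if it saves enough space
        packed = zlib.compress(chunk)
        stored.append(packed if len(packed) <= len(chunk) * (1 - MIN_SAVINGS) else chunk)
    return stored
```

An eager-zeroed region falls out at the first step, a rewritten block falls out at the dedupe step, and only worthwhile candidates pay the compression cost before being flushed to SSD.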
From the write IO flow you can see that a lot of effort was put into minimizing the number of writes to the solid-state drives to prolong their life, but not at the expense of performance. Notice that the process may skip over dedupe if there is too much activity.
Pure has a pair of background processes that take care of any blocks that still need attention. A background deduplication process catches any blocks that were skipped during the inline process, and a second, heavier compression algorithm is applied to further reduce the space used.
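A rough sketch of such a background pass might look like the following. The structure, names, and use of lzma as a stand-in for the heavier second-pass compressor are all my assumptions; the point is only that the scan re-checks duplicates (with a binary comparison, not just the hash) and recompresses with a more expensive algorithm:

```python
import hashlib
import lzma  # stand-in for a heavier, slower second-pass compressor

def background_pass(stored_chunks: dict) -> dict:
    """Re-scan stored chunks: dedupe any inline misses, then recompress harder.

    stored_chunks maps a block id to its raw bytes.
    """
    seen = {}    # checksum -> block id of the first copy encountered
    result = {}
    for block_id, chunk in stored_chunks.items():
        digest = hashlib.sha256(chunk).hexdigest()
        original = seen.get(digest)
        # Verify with a binary comparison before trusting the hash
        if original is not None and stored_chunks[original] == chunk:
            result[block_id] = ("ref", original)  # replace data with a reference
            continue
        seen[digest] = block_id
        packed = lzma.compress(chunk)
        # Keep the heavier compression only when it actually saves space
        result[block_id] = ("data", packed if len(packed) < len(chunk) else chunk)
    return result
```

Because this runs out of the IO path, it can afford the CPU time that the inline path deliberately skips under load.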
Another process that happens in the background monitors disk writes. Some solid-state drives behave inconsistently when reads and writes take place simultaneously; the performance of the read, the write, or both can suffer. Pure watches for these scenarios and, to avoid the potential bottleneck, will rebuild the parity on blocks to move the data to different drives. Pretty cool! RAID being used as a function of performance, instead of just being a penalty paid for availability.
Pure Storage is obviously fast, I mean it’s an all-flash array, but this isn’t just a storage appliance with solid-state drives in it. It really was built with SSDs in mind and is more than pure storage; it’s pure storage with intelligence.
Check out some other posts from Storage Field Day 6 about Pure Storage from the other delegates.