SharePoint Storage Optimization: Why Shredded Storage and Remote Blob Storage are Better Together

Post Date: 08/31/2016

Since Microsoft introduced Shredded Storage in SharePoint 2013, several myths have arisen about best practices and about using the feature together with Remote BLOB Storage (RBS). I have even heard that RBS is a synonym for Shredded Storage. I have also encountered two assumptions: that the RBS threshold will never be triggered because of the small chunk size, and that Shredded Storage even makes RBS obsolete.

In this article, I will clarify the truth behind these myths. I will also share best-practice recommendations for configuring Shredded Storage, RBS, and both features together.

As the name states, RBS externalizes Binary Large Objects (BLOBs). BLOBs are usually stored in the SQL content database using the “image” data type, which has a size limit of 2 GB. Since the introduction of External BLOB Storage (EBS) and later RBS, these files are stored in the varbinary format. This format does not strictly require a size limit, but Microsoft continues to enforce the 2 GB limit in SharePoint for performance reasons.

Reducing database size through externalization results in better performance. This was a major goal of RBS, because SQL storage is very expensive and very resource-intensive. It is therefore a good idea to externalize large files to cheaper SAN, NAS, or other storage devices. Before Shredded Storage was introduced, many organizations successfully used RBS to externalize files larger than 1 MB. However, Shredded Storage brought some new and improved features.

Goals and functionality of Shredded Storage

When developing Shredded Storage, Microsoft focused on these main goals:

  1. Reduce Storage
    • With Shredded Storage, SharePoint stores only changes to a file, not the entire file.
  2. Optimize Bandwidth
    • Only the changed parts have to be sent. This enables multi-user functionality such as co-authoring on Office documents: one part is set to read-only while other parts can be changed and quickly synchronized.
  3. Optimize File I/O
    • Smoother I/O Patterns
    • Ensure write costs are proportional to size of change
    • Move I/Ops from the WFE to SQL Server. Only changes create entries in the transaction logs, which even improves backup performance.
  4. Security
    • It is much more difficult to get a file out of the database using scripts.
    • Each chunk is encrypted separately.

But what exactly is Shredded Storage?

In simple terms, Shredded Storage splits a file into several small pieces called “chunks.” The “FileWriteChunkSize” setting defines the size of these pieces when they are written to SQL Server. However, not all chunks are the same size. For example, the chunks holding a Word document’s header and footer might differ in size from the file’s other chunks, because additional information is required to rebuild the entire file from all the small pieces. As soon as the first changes are made to a file with versioning enabled, this larger initial size is offset, because only the changes need to be saved.
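The sketch below shows why saving a new version only needs to write the chunks that differ. It is a minimal Python example built on a simplified model. A file is cut at fixed FileWriteChunkSize offsets, and each chunk is identified by a SHA-256 hash. The real Shredded Storage implementation is more elaborate, since chunk sizes vary as described above. The functions `shred`, `chunk_ids` and `changed_chunks` are hypothetical and exist only for illustration.

```python
import hashlib

def shred(data: bytes, chunk_size: int) -> list[bytes]:
    """Split a file's bytes into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def chunk_ids(chunks: list[bytes]) -> list[str]:
    """Identify each chunk by a SHA-256 digest of its content (simplified model)."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]

def changed_chunks(old: bytes, new: bytes, chunk_size: int) -> list[int]:
    """Indexes of chunks in `new` that differ from `old` and must be written for a new version."""
    old_ids = chunk_ids(shred(old, chunk_size))
    new_ids = chunk_ids(shred(new, chunk_size))
    return [i for i, cid in enumerate(new_ids)
            if i >= len(old_ids) or old_ids[i] != cid]

# A 10-byte edit in the middle of a 5 MB file touches only one 1250 KB chunk.
CHUNK = 1250 * 1024
v1 = bytes(5 * 1024 * 1024)
v2 = v1[:3_000_000] + b"x" * 10 + v1[3_000_010:]
print(len(shred(v1, CHUNK)))          # 5
print(changed_chunks(v1, v2, CHUNK))  # [2]
```

Fixed offsets keep the sketch short, but this simplified model has a limitation. An insertion that shifts the following bytes would mark every later chunk as changed.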

Based on several tests with different file sizes, a FileWriteChunkSize of 1250 KB was identified as a best practice, because it delivers the fastest upload and download performance. Depending on your environment and on the type and size of your data, a larger shred size could also work. For example, Microsoft’s cloud storage service OneDrive for Business uses a 2 MB shred size (see also the independent test results from Microsoft MVP Chris McNulty).
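The two shred sizes can be compared with simple arithmetic. The Python sketch below counts how many chunks a file needs at each size. It assumes the same simplified fixed-size model as above, and `shred_count` is a hypothetical helper.

```python
KB = 1024

def shred_count(file_size_bytes: int, chunk_size_bytes: int) -> int:
    """Number of fixed-size chunks needed to hold a file (integer ceiling division)."""
    return (file_size_bytes + chunk_size_bytes - 1) // chunk_size_bytes

recommended = 1250 * KB   # 1,280,000 bytes, the best-practice value cited above
onedrive = 2 * KB * KB    # 2 MB shred size cited above for OneDrive for Business

for size_mb in (1, 10, 100):
    size = size_mb * KB * KB
    print(f"{size_mb:>4} MB file: {shred_count(size, recommended):>3} chunks at 1250 KB, "
          f"{shred_count(size, onedrive):>3} chunks at 2 MB")
```

A 10 MB file needs 9 chunks at 1250 KB and 5 chunks at 2 MB. A 100 MB file needs 82 and 50 chunks, respectively.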

Goals and functionality of RBS

In terms of storage optimization, what does RBS do for you? The main goal of RBS is to externalize files based on size or other properties. Externalized data still counts toward SharePoint’s limits, for instance the 200 GB limit for content databases. The row entry in the SQL content database table (dbo.AllDocs) remains, while the file itself is removed from the database file and moved to the configured storage.


When you enable the RBS provider for a database, RBS system tables are added to the content database. Externalized data pieces are listed in these tables, especially in the mssqlrbs_resources.rbs_internal_blobs table. When a user accesses a file, the RBS provider retrieves the metadata from the dbo.AllDocs table. It then looks up the externalized file in the RBS tables via a mapped SharePoint Globally Unique Identifier (GUID) and loads the file from the referenced storage location.
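The three-step lookup can be modeled in a few lines of Python. This is a minimal sketch, not the real schema or provider API. Plain dictionaries stand in for dbo.AllDocs, mssqlrbs_resources.rbs_internal_blobs and the external storage. `DocRow`, its `blob_id` field, `open_document` and the storage path are all hypothetical.

```python
import uuid
from dataclasses import dataclass

@dataclass
class DocRow:
    """Simplified stand-in for a dbo.AllDocs row: metadata only, no file content."""
    doc_id: str
    leaf_name: str
    blob_id: uuid.UUID  # hypothetical field: GUID referencing the externalized BLOB

def open_document(doc_id: str,
                  all_docs: dict[str, DocRow],
                  rbs_blobs: dict[uuid.UUID, str],
                  storage: dict[str, bytes]) -> tuple[str, bytes]:
    # Step 1: read the metadata row from the content database.
    row = all_docs[doc_id]
    # Step 2: resolve the GUID through the RBS table to a storage location.
    location = rbs_blobs[row.blob_id]
    # Step 3: load the file content from the external storage.
    return row.leaf_name, storage[location]

# Usage with in-memory stand-ins (all names and paths are hypothetical).
blob = uuid.uuid4()
all_docs = {"doc-1": DocRow("doc-1", "Budget.xlsx", blob)}
rbs_internal_blobs = {blob: "nas01/rbs/0001.blob"}
external_storage = {"nas01/rbs/0001.blob": b"...file bytes..."}
print(open_document("doc-1", all_docs, rbs_internal_blobs, external_storage))
```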

Although externalized files still count toward the database size, you can more easily meet the requirements for extended content databases. For example, Microsoft supports (archive) content databases of up to 4 TB on a disk subsystem with a minimum of 0.25 I/Ops per GB. Because of externalization, there is no real 4 TB database file. There might be only 150 GB, so you can easily satisfy the high subsystem requirements with standard drives (recommended: 2 I/Ops per GB).

Example calculation:

Effective data in the database | Required I/Ops for the disk subsystem (at 0.25 I/Ops per GB)
4 TB (0.25 I/Ops × 1024 × 4) | 1024
150 GB (0.25 I/Ops × 150) | 37.5
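The Python sketch below reproduces the table’s arithmetic. It also applies the recommended 2 I/Ops per GB mentioned above, which shows how large the gap becomes. The helper `required_iops` is hypothetical.

```python
def required_iops(size_gb: float, iops_per_gb: float) -> float:
    """Disk subsystem I/Ops needed for a database file of size_gb."""
    return size_gb * iops_per_gb

MINIMUM = 0.25      # I/Ops per GB, minimum cited above
RECOMMENDED = 2.0   # I/Ops per GB, recommended value cited above

for label, size_gb in (("4 TB", 4 * 1024), ("150 GB", 150)):
    print(f"{label:>6}: minimum {required_iops(size_gb, MINIMUM):g} I/Ops, "
          f"recommended {required_iops(size_gb, RECOMMENDED):g} I/Ops")
```

At the recommended rate, a real 4 TB file would need 8192 I/Ops, while a 150 GB file needs only 300.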

If you want to use RBS to go “virtually” beyond the 200 GB threshold in this way, that is no problem. However, keep in mind that the externalized data may sometimes need to go back into the database, e.g. for a migration. Externalized files may also have to be included in your backup strategy. Solutions like AvePoint’s DocAve software can help in these scenarios.

While externalization is the main goal of RBS, it can also achieve some additional goals:

  1. Performance

The RBS framework also provides a big performance gain, because SQL Server is designed to handle small pieces of data across tables, not large objects. It may seem counterintuitive to load only the metadata from the content database and serve the file itself from a connected storage. Even so, the RBS approach is much faster than loading everything from SQL Server.

The actual gain depends on the disk subsystems of both SQL Server and the RBS storage. Network performance can also influence loading speed if the storage is connected differently than SQL Server. To load the BLOB from the storage and render it for the user, the browser needs to complete the following steps:

  1. DNS Lookup
  2. Initial Connection
  3. Waiting
  4. Receive data
  5. Close connection

The waiting phase is measured as TTFB (Time to First Byte). Values of 100 to 200 ms are generally considered good. However, if you connect a NAS (Network Attached Storage) device as RBS storage, Microsoft requires a TTFB of less than 20 ms.
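To make the five steps measurable, the Python sketch below times each one for a plain HTTP GET using only the standard library. It is a sketch under stated assumptions. It speaks plain HTTP/1.1 without TLS or authentication, and it uses the 20 ms NAS requirement above as the check. `measure_phases`, `meets_nas_ttfb` and the host name in the usage comment are hypothetical.

```python
import socket
import time

NAS_TTFB_LIMIT_MS = 20.0  # TTFB requirement cited above for NAS-based RBS storage

def measure_phases(host: str, port: int, path: str = "/") -> tuple[dict[str, float], bytes]:
    """Time the five phases of a plain HTTP/1.1 GET in milliseconds; also return the raw response."""
    marks = [time.perf_counter()]
    # 1. DNS lookup: resolve the host name to an address.
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    marks.append(time.perf_counter())
    # 2. Initial connection: TCP handshake.
    sock = socket.socket(family, socktype, proto)
    sock.connect(sockaddr)
    marks.append(time.perf_counter())
    # 3. Waiting: send the request and block until the first byte arrives (TTFB).
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    sock.sendall(request.encode("ascii"))
    received = [sock.recv(1)]
    marks.append(time.perf_counter())
    # 4. Receive data: read until the server closes the connection.
    while chunk := sock.recv(65536):
        received.append(chunk)
    marks.append(time.perf_counter())
    # 5. Close connection.
    sock.close()
    marks.append(time.perf_counter())
    names = ["dns_lookup", "initial_connection", "waiting_ttfb", "receive", "close"]
    timings = {n: (end - start) * 1000 for n, start, end in zip(names, marks, marks[1:])}
    return timings, b"".join(received)

def meets_nas_ttfb(timings: dict[str, float]) -> bool:
    """True if the measured TTFB is below the 20 ms NAS requirement."""
    return timings["waiting_ttfb"] < NAS_TTFB_LIMIT_MS

# Usage (hypothetical host):
# timings, _ = measure_phases("rbs-storage.example.local", 80)
# print(timings, meets_nas_ttfb(timings))
```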

  2. Deduplication

Another advantage of RBS is that you can use deduplication on the storage. If you store the same document in several libraries, the deduplication algorithm identifies it and stores only one copy. It also keeps an index of the locations where the file was found, which saves additional space. With Shredded Storage alone, in contrast, a document stored in several libraries is still stored several times in the database.
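The idea of one stored copy plus a location index can be sketched as a content-addressed store in Python. This is a simplified whole-file model for illustration only. Deduplication is implemented by the storage system itself, often at block level. `DedupStore` and the library paths are hypothetical.

```python
import hashlib

class DedupStore:
    """Content-addressed store: one physical copy per unique content plus a location index."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}          # digest -> single stored copy
        self.locations: dict[str, list[str]] = {}  # digest -> libraries referencing it

    def put(self, location: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # store the content only the first time it is seen
        self.locations.setdefault(digest, []).append(location)
        return digest

    def get(self, digest: str) -> bytes:
        return self.blobs[digest]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())

# The same document uploaded to two libraries is stored once.
store = DedupStore()
policy = b"example policy document " * 1000
d1 = store.put("/sites/hr/Shared Documents/Policy.pdf", policy)
d2 = store.put("/sites/it/Shared Documents/Policy.pdf", policy)
print(d1 == d2, store.stored_bytes() == len(policy), len(store.locations[d1]))  # True True 2
```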

Several tests showed the best performance with an RBS threshold of 1024 KB. However, as the independent results from Chris McNulty also point out, this value may vary depending on file size and environment. Keep in mind that the RBS threshold has to be smaller than the FileWriteChunkSize. Otherwise, externalization can never be triggered, because real-time RBS evaluates the individual chunks rather than the entire file. This applies to real-time externalization. However, since real-time processing is very resource-intensive, Microsoft no longer recommends this scenario. Solutions like DocAve Storage Manager are a good alternative. DocAve does not consider the chunk size but the actual size of the entire file. In addition, further filters can be configured, such as externalization based on document version, last access time, custom columns, and document properties.
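The Python sketch below contrasts per-chunk and whole-file evaluation, showing why the threshold must stay below the chunk size. It makes two assumptions. Chunks are treated as fixed-size, and an item is externalized when its size is strictly greater than the threshold. Both functions are hypothetical models, not the RBS provider’s or DocAve’s actual logic.

```python
KB = 1024
MB = 1024 * KB

def chunk_sizes(file_size: int, chunk_size: int) -> list[int]:
    """Sizes of the fixed-size shreds of a file (simplified model)."""
    full, rest = divmod(file_size, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])

def externalized_chunks(file_size: int, chunk_size: int, threshold: int) -> int:
    """Real-time model: each shred is compared against the RBS threshold individually."""
    return sum(1 for size in chunk_sizes(file_size, chunk_size) if size > threshold)

def externalize_whole_file(file_size: int, threshold: int) -> bool:
    """File-level model: the decision uses the size of the entire file."""
    return file_size > threshold

# A 10 MB file shredded at 1250 KB:
print(externalized_chunks(10 * MB, 1250 * KB, 1024 * KB))  # 8 (all full chunks exceed 1024 KB)
print(externalized_chunks(10 * MB, 1250 * KB, 2 * MB))     # 0 (threshold above chunk size)
print(externalize_whole_file(10 * MB, 2 * MB))             # True
```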

Shredded Storage and RBS – Better Together

Now that we’ve identified their differences, how can you combine RBS with Shredded Storage to take full advantage of both features? To answer this question, first determine whether both technologies are required. The following questions will help:

Shredded Storage: Can you answer these questions with Yes?

  1. Are there many (concurrent) transactions?
  2. Do you use versioning?
  3. Do you have many Office files?
  4. Do you want to use co-authoring?
  5. Do you have I/Ops and network issues?

RBS: Can you answer these questions with Yes?

  1. Do you have many large files (> 1 MB)?
  2. Do you use versioning on those files?
  3. Do you have a SQL storage issue (content database larger than 100 GB)?
  4. Do you need to speed up Content database backups?
  5. Do you have the same files uploaded to different libraries and want to use deduplication?
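The two checklists can be turned into a small helper, sketched below in Python. It rests on one assumption that goes beyond the original text: a technology is suggested once at least `min_yes` of its questions are answered Yes. `recommend` and the shortened question keys are hypothetical.

```python
SHREDDED_STORAGE_QUESTIONS = [
    "Many (concurrent) transactions?",
    "Versioning in use?",
    "Many Office files?",
    "Co-authoring wanted?",
    "I/Ops or network issues?",
]
RBS_QUESTIONS = [
    "Many large files (> 1 MB)?",
    "Versioning on those large files?",
    "SQL storage issue (content database > 100 GB)?",
    "Content database backups need to be faster?",
    "Same files in several libraries (deduplication wanted)?",
]

def recommend(answers: dict[str, bool], min_yes: int = 1) -> list[str]:
    """Return the technologies whose checklist has at least min_yes Yes answers (assumed rule)."""
    result = []
    for name, questions in (("Shredded Storage", SHREDDED_STORAGE_QUESTIONS),
                            ("RBS", RBS_QUESTIONS)):
        yes_count = sum(answers.get(q, False) for q in questions)
        if yes_count >= min_yes:
            result.append(name)
    return result

answers = {"Many Office files?": True, "Many large files (> 1 MB)?": True}
print(recommend(answers))  # ['Shredded Storage', 'RBS']
```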

Depending on your answers, you may choose one technology, the other, or both. If your company benefits from both approaches, you gain the advantages of each technology and an additional performance boost. If you configure both with the best practices described above, loading times can be up to 80 percent faster (see the results from Microsoft MVP Chris McNulty).

Is RBS obsolete once you have Shredded Storage in place? NO! However, your SharePoint environment, its problems, and your business goals will determine which combination of technologies brings the biggest benefit for your company.

Robert Mulsow is VP of TSP EMEA at AvePoint and a Microsoft P-TSP. Drawing on his previous experience at Microsoft, he specializes in SharePoint infrastructure and related technologies such as SQL Server, Windows Server, and Active Directory. As a Microsoft MVP and Certified Trainer for Office servers and services, he brings extensive experience in consulting, implementation, and troubleshooting.