Topics In Demand
Notification
New

No notification found.

Blog
Computational Storage: Potential Benefits of Reducing Data Movement

November 26, 2020

150

0

Talk of the Storage town today is Computational Storage.

In the previous blog, we saw the evolution of storage architectures and emerging storage architectures. And one of the widely talked topics is computational storage. If you would have got a chance to attend SNIA SDC USA 2020, you would have seen the inclination of the session and discussions towards this trending storage architecture.

In yet another previous blog, we saw the 5Ws that you need to know to understand everything you need to know about Computational Storage. We saw that Computational Storage is providing higher capacity solutions with lower power consumption, as we use distributed processing. As a result, Computational Storage provides improved efficiency and performance.

In one of the keynote sessions at SNIA SDC USA 2020, JB Baker from ScaleFlux talked about the advantages of reducing data movement in Computational Storage. He categorized the advantages into two – Saving Time and Saving Money.

  1. Saving Time: With growing storage media, interfaces, and networks, and speeding bandwidth, data movement is becoming sluggish. Moving Tera Bytes and Zetta Bytes of data to perform mission-critical tasks such as transactional processing, big data analytics, and machine learning, can become time-consuming and reduce efficiencies.
  2. Saving Money: Supporting infrastructure to handle all this massive data movement requires consistent investment, creating a lot of challenges for all those managing the data.

So, reducing data movement might help reduce processing time and infrastructure costs. This can be a boon for the IT department and data center architects. Further, Baker gave an example of utilizing Computational Storage to reduce data movement. He said that it would be through a Data Filtering Computational Storage Service (CSS). Let’s dig deeper into the example and results explained by Baker.

Let’s take a 12TB data set that represents all the transactions, worldwide purchases that happened over the past several years. Let’s say a data analyst needs to run a query that covers just 4 months in 2016, rounding up to only 100 GB of data relevant to this query, which is <1% of the entire data set. With ordinary storage, we might have to take all 12TB and push it up through the CPU, which is an invitation to the bottleneck up there to do that filtering and then complete the query.

Instead, if we implement data filtering CSS down at the drives, it filters out the relevant data before it even leaves the drive. So we will have to move only that 1% of data relevant to the query across the PCI bus and to the CPU. This will reduce the total data movement by 99% in this case and even reduce the data processing by the CPU to finish the query, resulting in a faster query completion time. This enables more queries to run in parallel and scale more rapidly. Baker also supported the theory with the practical implementation of the above example by measuring the data movement, CPU utilization, and query completion time for ordinary storage and data filtering CSS.

The results were as below:

Ordinary StorageData Filtering CSS
Data MovementHigh bandwidth for a very long period to move massive dataHigh data movement, but for a very short period
CPU UtilizationExperienced bottleneckCPU scaled nicely due to less data movement
Query Completion TimeSlower query completionRapid query completion (2-4 times faster than the usual)

This example clearly explains the potential benefit of using Computational Storage and Data Filtering Service over ordinary storage.


That the contents of third-party articles/blogs published here on the website, and the interpretation of all information in the article/blogs such as data, maps, numbers, opinions etc. displayed in the article/blogs and views or the opinions expressed within the content are solely of the author's; and do not reflect the opinions and beliefs of NASSCOM or its affiliates in any manner. NASSCOM does not take any liability w.r.t. content in any manner and will not be liable in any manner whatsoever for any kind of liability arising out of any act, error or omission. The contents of third-party article/blogs published, are provided solely as convenience; and the presence of these articles/blogs should not, under any circumstances, be considered as an endorsement of the contents by NASSCOM in any manner; and if you chose to access these articles/blogs , you do so at your own risk.


Calsoft is ISV preferred product engineering services partner in Storage, Networking, Virtualization, Cloud, IoT and analytics domains.

© Copyright nasscom. All Rights Reserved.