Cache in a Flash: Making cost-effective use of flash-based caching

Fred Douglis (EMC)

Classic caching algorithms leverage recency, access count, and/or other properties of cached blocks at per-block granularity. However, for media such as flash, which incur performance and wear penalties for small overwrites, it is beneficial to implement cache policies at a larger granularity. Recent research has focused on buffering small blocks and writing them in large units called containers, but it has not explored the ramifications of, and best strategies for, caching compound blocks consisting of logically distinct but physically co-located blocks. Containers may hold highly diverse blocks, with mixtures of frequently accessed, infrequently accessed, and invalidated blocks.
This talk provides a brief overview of two recent projects at EMC. The first is Pannier, a flash cache middleware that provides high performance while extending flash lifespan. Pannier uses three main techniques: (1) leveraging block access counts to manage cache containers, (2) incorporating block liveness as a property to improve flash cache space efficiency, and (3) designing a multi-step feedback controller to ensure that a flash cache does not wear out before the end of its intended lifespan while maintaining performance. Our evaluation shows that Pannier improves flash cache performance and extends lifespan relative to previous per-block and container-aware caching policies.
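To make the container idea concrete, here is a minimal sketch of choosing a container to evict using per-block access counts and liveness together. It is an illustration only, not Pannier's actual policy: the `Container` class, the `score` formula (sum of access counts over live blocks), and `pick_victim` are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A group of co-located blocks written to flash as one unit (hypothetical model)."""
    cid: int
    access_counts: dict = field(default_factory=dict)  # block id -> access count
    invalid: set = field(default_factory=set)          # ids of invalidated blocks

    def live_blocks(self):
        """Blocks that have not been invalidated."""
        return [b for b in self.access_counts if b not in self.invalid]

    def score(self):
        """Assumed utility: total accesses to live blocks only.
        Invalidated blocks contribute nothing, however hot they once were."""
        return sum(self.access_counts[b] for b in self.live_blocks())


def pick_victim(containers):
    """Return the container with the lowest score (ties broken by lowest id)."""
    return min(containers, key=lambda c: (c.score(), c.cid))
```

For example, a container whose only hot block has been invalidated scores lower than a container of moderately used live blocks, so it is evicted first even though its raw access total is higher.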
Second, we show the difficulty of extending Belady's MIN, the optimal offline algorithm normally used for best-case comparisons, to consider both containers and erasures. We describe a set of metrics for evaluating trade-offs between hit rates and erasures, as well as heuristics that approximate the offline optimum. The offline algorithms have various objectives, such as minimizing flash erasures while still providing the maximal cache hit rate.
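For reference, the classic per-block form of Belady's MIN can be sketched as follows: on a miss with a full cache, evict the block whose next reference lies furthest in the future. This sketch is only the textbook baseline. It assumes every missed block is inserted, and it models neither containers nor erasures, which are exactly the extensions the talk describes as difficult. The function name `belady_min` and its return shape are choices made for this example.

```python
def belady_min(trace, capacity):
    """Return (hits, misses) for per-block Belady's MIN on a reference trace.

    Assumes capacity >= 1 and that every missed block is inserted into the cache.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")

    # For each position, precompute the index of the next reference to the
    # same block (infinity if the block is never referenced again).
    inf = float("inf")
    next_use = [inf] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], inf)
        last_seen[trace[i]] = i

    cache = {}  # block -> index of its next reference
    hits = misses = 0
    for i, block in enumerate(trace):
        if block in cache:
            hits += 1
        else:
            misses += 1
            if len(cache) >= capacity:
                # Evict the cached block that is referenced furthest in the future.
                victim = max(cache, key=cache.get)
                del cache[victim]
        cache[block] = next_use[i]
    return hits, misses
```

Running `belady_min([1, 2, 3, 1, 2, 4, 1, 2], 3)` on this short trace gives 4 hits and 4 misses, which serves as the best-case hit count that an online policy with the same capacity can be compared against.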
This is joint work with two former interns, Cheng Li (Rutgers, now at VMware) and Yue Cheng (Virginia Tech), as well as Philip Shilane, Michael Trachtman, and Grant Wallace (EMC), Peter Desnoyers (Northeastern), and Kai Li (Princeton). Pannier was presented at Middleware 2015 and the offline algorithm will be presented at USENIX ATC 2016.

Fred Douglis is in the Advanced Development group of the EMC Core Technologies Division, in the office of the CTD CTO. He works on systems and storage technologies such as flash memory, deduplication, compression, and load balancing. He holds M.S. and Ph.D. degrees in computer science from U.C. Berkeley and a B.S. in computer science from Yale. He has worked in industrial applied research throughout his career, including at Matsushita, AT&T (Bell) Labs, and IBM Research, before joining EMC in 2009. He has also been a visiting professor at VU Amsterdam and Princeton University. He served as editor in chief of IEEE Internet Computing from 2007 to 2010 and has been on its editorial board since 1999. He is a member of the IEEE Computer Society Board of Governors for 2016-2018.