## Research Interests

In his research, Jeff Vitter seeks to exploit the rich interdependence between computing theory and practice, primarily in four key subfields dealing with big data and data science. He is perhaps best known as a founder of the field of **external memory algorithms**, which focuses on alleviating the I/O bottleneck between fast internal memory and slow external storage (such as disks). The goal is to design algorithms that exploit locality of reference and parallelism in order to reduce I/O costs, which is important in a variety of data-intensive applications. His **book on external memory algorithms** serves as a reference for the field. He has developed paradigms in several domains for efficient algorithms using external memory and hierarchical memory. His approach for utilizing parallel disks (in which communication with each disk can occur simultaneously) using the notion of read/write duality has led to state-of-the-art methods for sorting. He has contributed to algorithm engineering via the **TPIE system** (Transparent Parallel I/O programming Environment) developed by a former student.

A second key aspect of big data where Dr. Vitter plays a leadership role is **compressed data structures**. The goal is to operate directly upon compressed representations of data, yet still achieve fast response time. The so-called **wavelet tree** data structure he co-developed (not to confuse with wavelets discussed two paragraphs below) is an elegant structure for coding sequences of characters from a multicharacter alphabet; it has become a key component in modern indexing and compression. Until this century, fast data structures for text indexing (such as suffix trees and suffix arrays) required much more space than the data being indexed! Based upon a recursive decomposition of the suffix array, Dr. Vitter and colleagues invented the **compressed suffix array**, which is substantially smaller — the first fast index provably shown to use only linear space, and then later the first ever whose size per character was provably shown to be asymptotic (i.e., with constant of proportionality 1) to the higher-order entropy of the text. The index can even reconstruct the original text in a random access manner, and thus the original text can be discarded. The net effect is that the text can be completely replaced by an index structure that has the size of compressed text but can be queried quickly.

In a third aspect of big data, Dr. Vitter has done fundamental work on data compression for text, images, and video and is noted for his analytical bent. A provably efficient algorithm for **adaptive Huffman coding** bears his name. With a former student, Dr. Vitter developed and analyzed fast and practical methods for arithmetic coding. They invented the **FELICS** algorithm for lossless image compression; it was subsequently implemented in hardware as part of NASA’s Mars Reconnaissance Orbiter. It introduced a low-cost prediction framework that influenced algorithms ultimately adopted into the Lossless JPEG standard. In video compression, Dr. Vitter and his group proposed the paradigm of minimizing the combined measure of rate plus distortion to significantly improve motion estimation coding; **rate-distortion optimization** has since been incorporated into the H.264/MPEG-4 AVC standard’s reference encoder, used widely in the computing and communications industry.

Fourth, Dr. Vitter and collaborators were the first in the database and systems communities to apply wavelets and compression techniques as key tools for summarizing, approximating, and predicting data. Wavelets have since become heavily used in database optimization, data warehousing, data streams, image processing, and data mining. For his work on wavelets for approximating high-dimensional aggregates, he and his coauthor were the recipients of the 2009 **ACM SIGMOD Test of Time Award**, which recognizes the SIGMOD paper from 10 years earlier that has had the most impact in the following decade in terms of research, products, and methodology. Dr. Vitter has co-developed novel machine learning and prediction mechanisms based upon data compression, using the principle that the more compressible a sequence is, the more predictable it is. His universal prediction algorithms for online prefetching are provably asymptotically optimal (i.e., with constant of proportionality 1). They predict as well as special-purpose methods tuned to the characteristics of the sequence. His learning work includes algorithms for prefetching, caching, data streams, database query optimization, data mining, and power management in mobile computers.

Beginning with his thesis on **coalesced hashing**, a search method used widely in practice, Dr. Vitter has made many contributions to the **analysis of algorithms**, using mathematical analysis and asymptotics to derive precise estimates for resource requirements. He has developed randomized, parallel, and incremental algorithms for a variety of problems in computational geometry, combinatorial optimization, graphics, random sampling, and random variate generation.