At the bottom of the data pyramid are raw sequences, including DNA (genomics or metagenomics), RNA, and proteins, as well as output from mass spectrometry for proteins and metabolites. These observations can be quite voluminous. For example, the Tara Oceans expedition produced 7.2 × 10¹² bases of metagenomic sequence data (Sunagawa et al. 2015), and a metaproteomics study generated 5.9 × 10mass spectra (Georges et al. 2014). Although these data contain the most information, they are of little use unprocessed.