Sunday, April 10, 2011

Symbolic Aggregate Approximation in Python

Back in 2003, several students from the University of California - Riverside introduced a clever method for discretizing near-continuous time-series data: Symbolic Aggregate Approximation (SAX). Although it was developed nearly ten years ago, SAX has become the cornerstone of a number of indexing, clustering, and classification algorithms (418 citations according to Google Scholar).

Recently, I came across a worthy time-series analysis tool on GitHub called nitime (https://github.com/nipy/nitime). To my surprise, there isn't much in the way of symbolic representation algorithms in the library yet, so I slung together some code. You can view the full commit here, but I generalized it into a Gist as follows:



This conveniently outputs: ['deeeedbaaaab', 'eedbaaaabdee'], given that our two datasets are sine and cosine, respectively.
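
For anyone who wants to see the mechanics spelled out, SAX boils down to three steps: z-normalize the series, average it over equal-width segments (piecewise aggregate approximation), and map each segment mean to a letter using breakpoints that split a standard normal curve into equally likely regions. Here is a minimal standard-library sketch of those steps. It is not the Gist or the nitime commit; the function name `sax`, the 120-sample grid over one period, the 12 segments, and the five-letter alphabet are all my assumptions for illustration.

```python
import bisect
import math
from statistics import NormalDist, fmean, pstdev


def sax(series, n_segments=12, alphabet="abcde"):
    """Turn a numeric sequence into a SAX word (illustrative sketch only)."""
    # Step 1: z-normalize so the values are roughly standard normal.
    mu = fmean(series)
    sigma = pstdev(series)
    if sigma == 0:
        raise ValueError("a constant series cannot be z-normalized")
    z = [(x - mu) / sigma for x in series]

    # Step 2: piecewise aggregate approximation (mean of each segment).
    # Assumption: the length divides evenly, which keeps the sketch short.
    if len(z) % n_segments:
        raise ValueError("series length must be a multiple of n_segments")
    width = len(z) // n_segments
    paa = [fmean(z[i * width:(i + 1) * width]) for i in range(n_segments)]

    # Step 3: breakpoints cutting N(0, 1) into len(alphabet) equiprobable bins.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # Each segment mean gets the letter of the bin it falls into.
    return "".join(alphabet[bisect.bisect_right(breakpoints, m)] for m in paa)


# Hypothetical inputs: one full period of sine and cosine, 120 samples each.
n = 120
xs = [2 * math.pi * i / n for i in range(n)]
words = [sax([math.sin(x) for x in xs]), sax([math.cos(x) for x in xs])]
print(words)
```

With those assumed settings, the sketch should print the same two words shown above. Changing the sample count, segment count, or alphabet size will generally change the words.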