Sunday, April 10, 2011

Symbolic Aggregate Approximation in Python

Back in 2003, several students from University of California - Riverside introduce a clever method for discretizing near-continuous time-series data: Symbolic Aggregate Approximation. Although this was developed nearly ten years ago, SAX has been the cornerstone of a number of indexing, clustering, and classification algorithms. (418 cites according to Google Scholar)

Recently, I came across a worthy time-series analysis tool on Github called https://github.com/nipy/nitime. To my surprise, there isn't much in the way of symbolic representation algorithms in the library yet, so I slinged myself some code. You can view the full commit here, but I generalized it into a Gist as follows:



This conveniently outputs: ['deeeedbaaaab', 'eedbaaaabdee'], given that our two datasets are sine and cosine, respectively.

3 comments:

  1. Hi
    How can I set up window size here!

    Also, it will be great if you could re-write the complete SAX matlab(timeseries2symbol.m, mis_dist.m etc.) code in Python!

    Thanks
    Maks

    ReplyDelete
  2. Hey Maks,

    The code shown here doesn't map the SAX algorithm to a sliding window. However, it's pretty dang easy to do so. You can use stride_tricks from numpy.lib to create a multi-dimensional array representing all the subsequences that one would get from running a sliding window through the data. At that point, you can simply map the SAX algorithm to each of those subsequences to extract out the symbolic representation for each of the subsequences. To see this in action, check out:

    https://github.com/slnovak/lcid/blob/master/lcid/distance.py#L105

    (Note that in this example, we're mapping a complexity function instead of the SAX function.)

    ReplyDelete