Codes for correcting editing errors
Madhu Sudan (Harvard)
The ``edit distance'', also known as the Levenstein distance, between two strings X, Y over some alphabet Sigma, is the minimum number of insertions and deletions in X that will transform it to Y. For example if X = 010101010101 and Y = 101010101010, then their Hamming distance is 12 while the edit distance is 2 - since you can delete the first 0 in X and insert a 0 in the last position to get Y. In this talk we will consider the task of correcting editing errors, i.e., the task of recovering X, given a string Y obtained from the transmission of X followed by a "few" editing errors.
Coding theory has traditionally focussed mostly on correcting Hamming errors and till recently the theory of coding for editing errors was far behind. A recent series of works by Haeupler and Shaharasbi (with varying sets of authors) has completely transformed this picture, and brought us to the stage where the results subsume what we know for the Hamming setting, at least for large constant sized alphabets! (Of course the proofs also use the state of the art results in the Hamming setting to get there.)
In this talk, I will use a recent theorem (from a paper with Haeupler and Shahrasbi) as an excuse to talk about these developments. Our theorem shows that for every delta < 1, gamma < infinity and epsilon > 0, there is an constant q and an infinite family of q-ary codes of rate 1 - delta - epsilon which can be list decoded in polynomial time from delta fraction deletions and gamma "fraction" insertions. Depending on availability of time we will speak about the two main ingredients in this result: (1) The corresponding result in the Hamming setting (where we allow delta fraction deletions, but only allow same number of insertions in the same locations) which comes from the Folded-Reed-Solomon-Codes of Guruswami and Rudra and the many followups. (2) Synchronization strings, which are magical ingredients introduced by Haeupler and Shahrasbi that convert codes for Hamming settings into codes for editing errors.