# Suffix Automaton Study Note

This is a review & note for studying suffix automaton.

### What is a Suffix Automaton?

In computer science, a

suffix automatonis the smallest partial deterministic finite automaton that recognizes the set of suffixes of a given string(computer_science).

The above definition is a short excerpt from Wikipedia about suffix automaton. From the definiton of deterministic finite automaton, we let $\mathrm{trans(s,ch)}$ be the state transferred from current state by adding a new character $\mathrm{ch}$; we let $\mathrm{null}$ be a empty state; we let $\mathrm{trans(s,str)}$ be the state transferred from current state by adding a new string $\mathrm{str}$. Then, we know that we can obtain $\mathrm{trans(s,str)}$ by adding $\mathrm{ch_i}\in\mathrm{str}$ one by one into the automaton.

From the definition on Wikipedia, it is obvious that a suffix automaton can recognize all suffixes of a given string. The word recognize is defined as a existence of a path in the automaton that the word formed by the path, by adding the characters on the path, is a suffix. Meanwhile, the suffix automaton can also be used to recognize all substrings.

### Important Properties of Suffix Automaton

A obvious way is to insert all of the suffixes into a Trie tree. But sometime it may take $\mathrm{O(N^2)}$ space. Lets consider how to simplify the states (reduce the number of vertices). We denote $\mathrm{ST(str)}$ the state / vertex $\mathrm{trans(init,str)}$, where $\mathrm{init}$ is the initial state. Obviously, if $\mathrm{str}$ is not a substring of the original string, then $\mathrm{ST(str)} = \mathrm{null}$, since no suffix will be formed by adding character after the vertex.

Shall we build nodes for all substrings? The anwser is also **NO**, since the total number of states will also be $O(N^2)$. Lets denote the set of end position, which is the occurence index of substrings in the original string, for each vertex as $\mathrm{right(v)}$. Then, we can see that if $\mathrm{right(a)} = \mathrm{right(b)}$, then $\mathrm{ST(a)} = \mathrm{ST(b)}$, this simplifies the overall states. Let the length of substrings in the vertex be in range $[\min(\mathrm{v}),\max(\mathrm{v})]$. Consider two different state $\mathrm{a},\mathrm{b}$ and their right set $\mathrm{Ra},\mathrm{Rb}$. Then, the following property holds:

**Property 1**: If $\mathrm{Ra},\mathrm{Rb}$ are intersecting then one set is another’s proper subset.

Lets define a parent tree. where $\mathrm{fa} = \mathrm{Parent(v)}$. Let $\mathrm{right(fa)} = \bigcup_{\mathrm{v}\in \mathrm{child(fa)}}\mathrm{right(v)}$. We also have following properties.

**Property 2**: $\max(\mathrm{Parent(v)}) = \min(\mathrm{v}) – 1$.

**Property 3**: $\mathrm{trans(a,c) = b}$, then an edge from $\mathrm{a}\rightarrow \mathrm{b}$ marked with character $\mathrm{c}$.

**Property 4**: If $\mathrm{ST(s)}\neq \mathrm{null}$, then $\mathrm{s}$ must be a substring of the original string. Otherwise, $s$ is not a substring.

**Property 5**: The string denoted by $\mathrm{Parent(v)}$ is a suffix of the string denoted by $\mathrm{v}$.

**Property 6**: If $\mathrm{trans(v,ch)}\neq \mathrm{null}$, then $\mathrm{trans(Parent(v),ch)} \neq \mathrm{null}$.

**Property 7**: Within a node $\mathrm{v}$, the lengths of strings denoted by the vertex are in range $[\min(\mathrm{v}),\max(\mathrm{v})]$, and all of the lengths of strings denoted by the vertex within such range exist.

To be added on…