The encoding scheme you’re talking about is Huffman Coding, and the ambiguity you’re trying to avoid explicitly occurs when one symbol is the prefix of another. The mechanism to build an optimal prefix(-free) code is called the Huffman Tree, and it uses a greedy algorithm that builds a full binary tree from the bottom up based on the frequencies of the symbols. Leaves are symbols, and the code for a symbol is the sequence of left or right branches you must traverse the reach that symbol.
To get more specific, you add all the symbols to a heap based on their frequency of appearance, lowest-at-the-front. You extract two symbols, foo and bar. These become children of a common parent node, the pseudosymbol baz. baz.freq = foo.freq + bar.freq. Add baz to the heap. Loop until the heap contains one node at the end of this process. That node is the root of the Huffman Tree.
The encoding scheme you’re talking about is Huffman Coding, and the ambiguity you’re trying to avoid explicitly occurs when one symbol is the prefix of another. The mechanism to build an optimal prefix(-free) code is called the Huffman Tree, and it uses a greedy algorithm that builds a full binary tree from the bottom up based on the frequencies of the symbols. Leaves are symbols, and the code for a symbol is the sequence of left or right branches you must traverse the reach that symbol.
To get more specific, you add all the symbols to a heap based on their frequency of appearance, lowest-at-the-front. You extract two symbols, foo and bar. These become children of a common parent node, the pseudosymbol baz. baz.freq = foo.freq + bar.freq. Add baz to the heap. Loop until the heap contains one node at the end of this process. That node is the root of the Huffman Tree.