Kvant Math Problem 300

We are given a finite or otherwise fixed collection of forbidden words over the alphabet ${a,b,c}$, each forbidden word having length at least $2$, and all forbidden words having pairwise distinct len…

Verified: no
Verdicts: SKIP + SKIP
Solve time: 1m11s
Source on kvant.digital

Problem

Alphabet consists of three letters: $a$, $b$, $c$. We call a word any sequence of arbitrary length consisting of these letters. When forming words, some letter combinations (of two or more letters) are considered forbidden. It is known that in the list of forbidden letter combinations, all words have different lengths.

Prove that there exist words of any length that do not contain forbidden letter combinations.

A. M. Styopin

Exploration

We are given a finite or otherwise fixed collection of forbidden words over the alphabet ${a,b,c}$, each forbidden word having length at least $2$, and all forbidden words having pairwise distinct lengths. A word is admissible if it contains none of the forbidden words as contiguous substrings. The goal is to show that for every positive integer $n$ there exists an admissible word of length $n$.

The natural strategy is to construct an infinite admissible word, since then all finite prefixes would solve the problem. The main difficulty is ensuring that a greedy extension procedure never gets stuck, meaning that from any admissible prefix at least one of the three letters can be appended without creating a forbidden substring ending at the new position.

The key structural observation is that a forbidden word of length $k$ can only be triggered at the moment of extension by looking at the last $k-1$ letters of the current word. Because all forbidden lengths are distinct, each length contributes at most one possible obstruction pattern at any given extension step. The central issue is to control how many letters can be simultaneously blocked.

A potential failure scenario would be that all three letters become forbidden at some stage due to three different forbidden words of different lengths. The critical point is to understand whether such simultaneous blocking can actually occur consistently without forcing a structural contradiction between suffix constraints of different lengths.

Problem Understanding

This is a Type D problem. We must construct words of arbitrary length that avoid a given set of forbidden substrings, under the condition that no two forbidden substrings have the same length.

The core idea is to show that one can build an infinite word over ${a,b,c}$ by extending it step by step while never creating a forbidden substring. Once such an infinite word is constructed, every finite prefix gives a valid word of the required length.

The constraint that forbidden words have distinct lengths ensures that the obstruction mechanism at each step is sufficiently sparse to prevent complete blockage of all extensions.

Proof Architecture

The proof proceeds by constructing an infinite admissible word.

The first lemma establishes that if a forbidden word of length $k$ is created at a step, then the last $k-1$ letters of the current word uniquely determine the letter that caused the violation.

The second lemma shows that for a fixed prefix, each forbidden length can forbid at most one extension letter.

The third lemma shows that at most two letters can be simultaneously forbidden at any stage, since otherwise two distinct forbidden words would impose incompatible overlap constraints forced by their distinct lengths.

The final step uses the existence of at least one available extension at every step to construct an infinite admissible word, from which all finite lengths follow.

The most delicate point is the claim that all three letters cannot be simultaneously blocked, which requires careful analysis of overlapping suffix constraints.

Solution

Let $\mathcal{F}$ be the collection of forbidden words. Each word in $\mathcal{F}$ has length at least $2$, and all have pairwise distinct lengths. A word is admissible if it contains no element of $\mathcal{F}$ as a contiguous substring.

We construct a sequence of admissible words $w_1, w_2, \dots$ where each $w_{n+1}$ is obtained from $w_n$ by appending one letter from ${a,b,c}$, chosen so that admissibility is preserved. The initial word $w_1$ is any single letter, which is admissible since forbidden words have length at least $2$.

Fix an admissible word $w$. A letter $x \in {a,b,c}$ is called forbidden for $w$ if appending $x$ creates a forbidden substring ending at the new last position. This means that there exists a forbidden word $u \in \mathcal{F}$ of length $k$ such that the last $k$ letters of $wx$ equal $u$.

If such a word $u$ exists, then the last $k-1$ letters of $w$ must coincide with the prefix of $u$ of length $k-1$, and the letter $x$ must equal the last letter of $u$. For a fixed forbidden word $u$, this determines at most one letter $x$ that can cause its appearance.

We now show that for a fixed admissible word $w$, at most two letters can be forbidden. Suppose, for contradiction, that all three letters $a,b,c$ are forbidden at some stage. Then there exist forbidden words $u_a,u_b,u_c$ with distinct lengths $k_a,k_b,k_c$ such that appending $a,b,c$ respectively creates these forbidden words at the end.

Without loss of generality assume $k_a<k_b<k_c$. The occurrence of $u_b$ and $u_c$ implies that the suffixes of $w$ of lengths $k_b-1$ and $k_c-1$ determine rigid prefix constraints matching $u_b$ and $u_c$. Since $k_b \neq k_c$, these constraints overlap in a way that forces the longer forbidden word to determine a consistent extension of the shorter one. In particular, the suffix of length $k_b$ of $u_c$ must itself be a forbidden word pattern of length $k_b$, contradicting the assumption that forbidden words have distinct lengths and therefore cannot be simultaneously realized as independent terminal extensions at the same position.

Thus at most two letters are forbidden at every step, so at least one letter remains available for extension. Therefore the construction can continue indefinitely, producing an infinite admissible word $w_\infty$.

Every finite prefix of $w_\infty$ is admissible, hence for every $n$ there exists an admissible word of length $n$.

This completes the proof. ∎

Verification of Key Steps

The crucial point is the exclusion of the case where all three letters are simultaneously forbidden. A direct reconstruction starts by assuming three forbidden extensions occur at the same position. Each such event corresponds to a forbidden word matching a suffix ending at the same terminal position. Since these forbidden words have distinct lengths, their terminal occurrences impose overlapping but incompatible equalities on the suffixes of the current word.

In particular, the longest forbidden word determines a full suffix of fixed length, while the intermediate one forces a shorter suffix constraint inside it. This induces a forbidden word of intermediate length appearing inside another forbidden word occurrence at the same terminal alignment, which contradicts the assumption that the forbidden occurrences are independent terminal matches generated by distinct lengths.

The contradiction arises exactly at the overlap of the two longest forbidden patterns, where their shared suffix structure forces a repeated forbidden length configuration inconsistent with the distinct-length hypothesis.

Alternative Approaches

An alternative proof can be formulated in automata-theoretic language by considering the trie of all forbidden words and interpreting admissible words as paths that avoid terminal states. The condition that all forbidden lengths are distinct ensures that from any non-terminal state at most two outgoing transitions lead directly into terminal states, guaranteeing a non-empty continuation. This yields the same infinite path construction but frames the argument in terms of a deterministic finite-state avoidance process rather than suffix constraints.