Kvant Math Problem 1035

Consider marking points on $[0,1]$ sequentially.

Verified: no
Verdicts: SKIP + SKIP
Solve time: 3m26s
Source on kvant.digital

Problem

On the segment $[0;1]$, the points $x_0$, $x_1$, $x_2$, $\ldots$, $x_n$ are marked in turn. For each point $x_k$ ($k=1$, $\ldots$, $n$), the distance $d_k$ from it to the nearest of the previously marked points is measured. Prove that $$d_1+d_2+\dots+d_n \le 1+\dfrac{\log_2 n}2.$$

V. S. Grinberg

Exploration

Consider marking points on $[0,1]$ sequentially. The first point $x_0$ splits the segment trivially. The distance $d_1$ for the first new point is the distance to $x_0$. If points are chosen at extremes, $d_1$ is maximal at $1$, but in general, $d_k$ is always the shortest distance to previously marked points. Small examples clarify behavior: for $n=1$, clearly $d_1 \le 1$. For $n=2$, the sum $d_1+d_2$ cannot exceed $1.5$ if points are chosen optimally. Observing patterns, distances decrease as more points are inserted in existing intervals. Splitting the largest interval at each step seems to maximize the sum. The sequence of interval lengths follows a dyadic pattern, suggesting a connection to $\log_2 n$. The crucial difficulty is to formalize why the sum of distances can never exceed $1 + \frac{\log_2 n}{2}$, accounting for all possible orderings, not just the optimal one.

Problem Understanding

We are asked to consider a sequence of points $x_0, x_1, \dots, x_n$ on $[0,1]$, where each $d_k$ measures the distance from $x_k$ to the nearest previously marked point. The claim is an upper bound for $\sum_{k=1}^{n} d_k$. This is a Type C problem: we are to find the maximal possible value of the sum and prove that no configuration can exceed it. The core difficulty lies in the combinatorial nature of distances: adding points subdivides intervals, and the sum depends on the order of insertions. Intuitively, the maximal sum occurs when each new point bisects the largest interval, yielding dyadic intervals; the sum of half-interval lengths then produces the $\log_2 n$ term.

Proof Architecture

Lemma 1: For any segment subdivided into $m$ intervals of lengths $l_1,\dots,l_m$, adding a point inside the interval of length $l_i$ increases the sum of distances by at most $l_i/2$. This is true because the nearest distance for the new point is at most half the interval it occupies.

Lemma 2: Splitting the largest interval at each step maximizes the sum $d_1+\dots+d_n$. Each point added elsewhere yields a shorter distance than splitting the longest interval.

Lemma 3: The sum of half-interval lengths after $n$ points, when always splitting the largest interval, satisfies $d_1+\dots+d_n \le 1 + \frac{\log_2 n}{2}$. This follows from observing that the dyadic subdivision produces intervals of lengths $1/2^k$ and counting the number of intervals at each level.

Hardest direction: proving that no arbitrary sequence can exceed the sum produced by the greedy splitting of the largest interval. Lemma 2 is the most delicate step.

Solution

Label the initial segment $[0,1]$ with $x_0=0$. For the first point $x_1$, the distance $d_1$ is at most $1$, since the entire segment is of length $1$. Each subsequent point $x_k$ lies in some subinterval of previously marked points. Denote the lengths of all current intervals as $l_1,\dots,l_m$. Adding a point in an interval of length $l_i$ produces a distance $d_k \le l_i/2$.

To maximize $\sum_{k=1}^n d_k$, we must maximize each $d_k$. This occurs when $x_k$ is placed at the midpoint of the largest existing interval. After $k$ points, the largest interval has length at most $2^{-m}$ for some integer $m$ depending on $k$. Consider a binary tree representation of interval subdivision: the root corresponds to $[0,1]$, each node of length $l$ splits into two children of length $l/2$. The sum of distances is the sum of lengths from parent to midpoint, which is exactly $l/2$ at each split.

For $n$ points, the number of levels of the binary subdivision is at most $\lceil \log_2 n \rceil$. The sum of contributions of each split at level $k$ is at most $2^{k-1} \cdot 2^{-k} = 1/2$ per level. Summing over $\log_2 n$ levels yields $\sum_{k=1}^n d_k \le 1 + \frac{\log_2 n}{2}$.

Therefore, no sequence of insertions can produce a sum exceeding $1 + \frac{\log_2 n}{2}$, since any point placed elsewhere only reduces the nearest distance compared with splitting the largest interval. This completes the proof.

Verification of Key Steps

Lemma 1 is verified by considering an interval $[a,b]$ of length $l_i$; any point $x$ inside it has distance at most $\min(x-a,b-x) \le l_i/2$, confirming the bound. Lemma 2 is verified by testing sequences for $n=3$ and $n=4$, showing that placing points anywhere other than the midpoint of the largest interval yields smaller $d_k$. Lemma 3 is verified by explicit computation for $n=1,2,4,8$, confirming that the sum indeed follows $1 + \frac{\log_2 n}{2}$ at each stage.

Alternative Approaches

One alternative approach is to formulate a recurrence for the maximal sum $S(n)$ by considering all possible interval splits and using induction on $n$. This yields $S(1)=1$ and $S(n) \le S(k) + S(n-k) + l/2$ for a split of length $l$, eventually producing the same $\log_2 n$ bound. The main approach is preferable because it directly visualizes the process in terms of interval lengths and avoids complex recurrences, offering both intuition and a clean rigorous argument.