Sparse signals estimation for adaptive sampling

This paper presents an estimation procedure for sparse signals in an adaptive setting. We show that when the pure signal is strong enough, the value of the loss function is asymptotically the same as for an optimal estimator up to a constant multiplier.


Introduction
The problem of sparse signal estimation is well studied, and optimal results have been established for various models (see, e.g., [1] and references therein). However, most existing approaches to the estimation assume a non-adaptive sampling process. Adaptive sampling, on the other hand, leads to significant performance gains as well as sharper estimates of the signal. In this paper we propose a procedure for pure signal estimation under the sequential sampling framework introduced in [3].
Consider the following classical signal model:

y_i = µ_i + ε_i,  i = 1, . . ., n,  (1)

where ε_i are independent standard normal random variables, and the vector µ = (µ_1, . . ., µ_n) has most of its coordinates equal to zero. The problems of identification of non-zero components as well as the issues of their estimation from the observations y_1, . . ., y_n find applications in genetic microarray data analysis [6], astronomical surveying [5], and many other fields of science (see, e.g., [7]). A widespread approach to these problems is thresholding. The idea is to assume µ_i to be zero if the value of the corresponding observation y_i does not exceed some threshold, which may be chosen to depend on the whole vector y (e.g., [1]).
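As an illustration, the non-adaptive thresholding rule can be sketched as follows (a minimal numerical sketch; the universal threshold √(2 log n) and all parameter values are our illustrative choices, not necessarily those of [1]):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
mu = np.zeros(n)
mu[:100] = 4.0                        # a sparse signal: 100 non-zero components

y = mu + rng.standard_normal(n)       # model (1): y_i = mu_i + eps_i

tau = np.sqrt(2 * np.log(n))          # universal threshold
mu_hat = np.where(np.abs(y) > tau, y, 0.0)

# nearly all zero coordinates are suppressed; a share of the
# strong components survives the threshold
print((mu_hat[100:] != 0).sum(), (mu_hat[:100] != 0).sum())
```

With the signal strength 4.0 close to the threshold, only part of the non-zero components is recovered, which illustrates why stronger signals or adaptive sampling are needed for sharper estimates.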
In [3], the authors consider a generalization of (1) that allows for sequential sampling procedures. In the proposed model one may construct multi-step algorithms such that the estimations carried out at step j may depend on the results of all the preceding steps. Additionally, the authors introduce an algorithm called Distilled Sensing which helps to identify almost all of the zero and non-zero components correctly. Several results on optimal estimation in the adaptive setting for the case of two experiments have also been established in the recent work [4].
While most existing papers on the topic are devoted to the identification of non-zero components of the signal, we focus on its estimation. In the setup with an unlimited number of experiments, we suggest a way to estimate a sparse vector µ by slightly modifying the Distilled Sensing algorithm. We investigate the precision of the resulting estimates and derive sufficient conditions for their asymptotic optimality.

Adaptive scheme
We consider the model introduced in [3]. In contrast to the classical setup (1), there are k ∈ N noisy observations of the signal µ ∈ R^n. To each observation j = 1, . . ., k there correspond a non-negative amount of energy E_j and a random set I_j ⊆ {1, . . ., n}. The observations have the form

y_ij = µ_i + ε_ij / √ϕ_ij,  i ∈ I_j,

where ε_ij are independent standard normal random variables and ϕ_ij > 0 is the energy spent on the i-th component at step j. Furthermore, the total amount of energy is limited:

Σ_{j=1}^k Σ_{i ∈ I_j} ϕ_ij ≤ n.

Denote by S_n(µ) the set of all non-zero components of µ. We focus on the case of sparse signals, when S_n(µ) constitutes a very small share of n; throughout this paper it is assumed that

|S_n(µ)| ≤ n^{1−β}  (2)

for all n large enough and some β from (0, 1/3). Since it is not known beforehand which components of µ are non-zero, the value of the loss function (in the sense of (7) below) in model (1) exceeds n^{1−β}. Moreover, reasonably precise estimation of the set S_n(µ) requires additional assumptions on the vector µ (see, e.g., [3]). On the other hand, under adaptive sampling these assumptions may be significantly relaxed. We will restrict our attention to signals µ whose non-zero components are bounded from below, with a lower bound governed by a parameter r > 0.
In our approach to pure signal estimation we choose the energy levels slightly differently than is done in [3]. Fix δ ∈ (0, 1) and choose the energy levels E_1, . . ., E_k accordingly, with the energy of each step split evenly over the current set:

ϕ_ij = E_j / |I_j|,  i ∈ I_j.  (3)

We follow the Distilled Sensing algorithm (DS) introduced in [3]. It is assumed that j = 1, . . ., k are sequential experiments, and ϕ_ij corresponds to the energy spent on suppressing the noise for the i-th component on the j-th step. For the given E_j the sets I_j are determined as follows.
Step j: Determine ϕ_ij as in (3) above, i ∈ I_j, collect the observations y_ij, and set I_{j+1} = {i ∈ I_j : y_ij > 0}, starting from I_1 = {1, . . ., n}.
Result: Obtain the observations y_ij and a finite sequence of sets I_1, . . ., I_{k+1}.
The authors of DS use I_{k+1} as an estimate for the set S_n(µ) of non-zero components of µ. In this paper we estimate the vector µ itself. Namely, as an estimate of µ take the vector μ with components

μ_i = y_ik I{i ∈ I_k},  i = 1, . . ., n,  (6)

where I{A} denotes the indicator of an event A. Now consider the loss function

R_n(μ, µ) = Σ_{i=1}^n E(μ_i − µ_i)^2,  (7)

where the expectation is taken over the observations and the sampling sets. It should be noted that in the non-adaptive setting (1) an equivalent loss function takes much larger values. In particular, asymptotic losses are of order C(β) n^{1−β} log n when µ has approximately n^{1−β} non-zero components (see [6]). Furthermore, losses of the order n^{1−2β} are in a certain sense asymptotically optimal when |S_n(µ)| ∼ n^{1−β} as n → ∞. This optimality is studied more closely in Section 3, and the proof of Theorem 2.1 is provided in Section 4.
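The estimate and its loss can be checked in a small simulation (a sketch under our reading of the estimator: keep the last-step observation on the surviving coordinates and set the rest to zero; all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

n, k = 10_000, 5
mu = np.zeros(n)
mu[:100] = 6.0                        # sparse positive signal

E = np.full(k, n / k)                 # energy levels with sum(E) = n
I = np.arange(n)
y = None
for j in range(k):
    phi = E[j] / I.size
    y = mu[I] + rng.standard_normal(I.size) / np.sqrt(phi)
    if j < k - 1:
        I = I[y > 0]                  # refine up to the final set I_k

# estimate: last-step observations on I_k, zero elsewhere
mu_hat = np.zeros(n)
mu_hat[I] = y

# empirical counterpart of the squared-error loss
loss = np.sum((mu_hat - mu) ** 2)
print(loss)
```

For |S_n(µ)| = 100 and n = 10^4 the lower bound |S_n(µ)|^2/n of Section 3 equals 1, while a non-adaptive threshold estimator would pay roughly 2 log n per non-zero component; the simulated loss sits between these two regimes because the sketch spends much of its budget on thinning out the zero coordinates.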

Optimal estimation
Proposition 3.1 Assume that (2) is satisfied and that the set of non-zero coordinates S_n(µ) is known beforehand. Whatever the choices of k(n), E_j, and I_j are, there exists no linear estimator μ(y_ij) of µ whose expected losses are less than |S_n(µ)|^2 / n for each n ∈ N.

Proof
Suppose that some k, E_j, and I_j have been chosen. Their choice determines the coefficients ϕ_ij according to (3). Fix i ∈ {1, . . ., n}. The observations corresponding to µ_i are of the form y_ij = µ_i + ε_ij/√ϕ_ij, i ∈ I_j, so the best linear estimator of µ_i based on them has variance 1/Σ_j ϕ_ij. Since all zero coordinates are known in advance, for i ∉ S_n(µ) the best estimate is μ_i = 0. Thus one has

Σ_{i=1}^n E(μ_i − µ_i)^2 ≥ inf Σ_{i ∈ S_n(µ)} (Σ_j ϕ_ij)^{−1},

where the infimum is taken over all ϕ_ij satisfying the total energy constraint. The infimum on the right-hand side can be evaluated by solving the underlying optimization problem with the Lagrange multipliers method. Its solution implies that the energy budget should be split evenly over S_n(µ), which yields the lower bound |S_n(µ)|^2/n.
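The optimization step can be written out explicitly (a sketch, with t_i denoting the total energy Σ_j ϕ_ij spent on coordinate i):

```latex
\[
\inf\Big\{ \sum_{i \in S_n(\mu)} \frac{1}{t_i} \;:\; t_i > 0,\
\sum_{i \in S_n(\mu)} t_i \le n \Big\}.
\]
Introducing the Lagrangian $L = \sum_i t_i^{-1} + \lambda \sum_i t_i$ and solving
$\partial L / \partial t_i = -t_i^{-2} + \lambda = 0$ gives $t_i = \lambda^{-1/2}$
for every $i$, i.e. the budget is split evenly: $t_i = n/|S_n(\mu)|$. The infimum
therefore equals
\[
\sum_{i \in S_n(\mu)} \frac{|S_n(\mu)|}{n} = \frac{|S_n(\mu)|^2}{n}.
\]
```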

Loss under distilled sensing
Proof of Theorem 2.1 As described in Section 2, for k ∈ N the sets I_j are chosen according to the DS algorithm, and the estimate μ is defined by (6). The loss function (7) can be rewritten as follows:

R_n(μ, µ) = Σ_{i ∈ I_k \ S_n(µ)} E μ_i^2 + Σ_{i ∈ I_k ∩ S_n(µ)} E(μ_i − µ_i)^2 + Σ_{i ∈ S_n(µ) \ I_k} µ_i^2.
Here, the first sum is taken over the coordinates of µ falsely considered to be non-zero, the second sum is taken over correctly identified non-zero coordinates, and the third sum is taken over the coordinates misidentified as zeroes. We denote these loss components R_1, R_2, and R_3 respectively and treat them separately.

Estimation of R_1. Let s_j = |I_j \ S_n(µ)| be the number of zero coordinates of µ contained in I_j. The following result holds.
Note that s_{j+1} = Σ_{i ∈ I_j \ S_n(µ)} I{y_ij > 0}, where I{A} is the indicator of an event A. Since for µ_i = 0 the observation y_ij is symmetric, by the Hoeffding inequality, for any t > 0 we get

P( s_{j+1} ≥ (1/2 + t) s_j ) ≤ exp(−2 t^2 s_j).
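The halving of the zero coordinates can be checked numerically (a quick sketch: each zero coordinate survives a refinement step with probability 1/2, independently, so the surviving fraction concentrates sharply around 1/2):

```python
import numpy as np

rng = np.random.default_rng(3)

s = 10_000                       # zero coordinates entering the step
eps = rng.standard_normal(s)     # their (rescaled) observations
survivors = int((eps > 0).sum()) # those kept by the sign rule

# Hoeffding bound: P(survivors >= (1/2 + t) s) <= exp(-2 t^2 s)
t = 0.05
bound = np.exp(-2 * t ** 2 * s)
print(survivors / s, bound)
```

Even a modest deviation t = 0.05 is already exponentially unlikely for s of this size, which is what drives the geometric shrinkage of s_j in the proof.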
Note that by the choice of k one has

Moreover, the sequence {s_j}_{j=1}^k is decreasing; therefore, {s_k > n^{1−2β}} ⊆ {s_j > n^{1−2β}} for j = 1, . . ., k − 1. Thus for n large enough we have

Fix some i ∈ I_k \ S_n(µ). Note that ϕ_ik ≥ E_k/n = 1/2. Hence for such i (recall that µ_i = 0) there holds the inequality E(μ_i − µ_i)^2 = E μ_i^2 = 1/ϕ_ik ≤ 2. For n large enough we have

By the definition of s_j,
Estimation of R_2. Let k and ε_0 be as in Proposition 4.1. One has

For a fixed i ∈ I_k ∩ S_n(µ) one has

Therefore,

Estimation of R_3.

Concluding remarks
In this paper we suggest an estimation procedure for sparse signals in an adaptive setting based on the Distilled Sensing algorithm introduced in [3]. In a sense, our results are optimal when the set of non-zero components of the pure signal is of the order n^{1−β} with some β ∈ (0, 1/3). For this case we demonstrate that the asymptotic behavior of the loss function is the same as for an optimal estimator up to a constant multiplier. Under general assumptions such performance cannot be achieved in the non-adaptive setting. Further research could cover the cases of weaker signals, when the µ_i are not bounded from below.