We consider the problem of recovering a binary image consisting of many filaments or linear fragments in the presence of severe binary noise. Our approach exploits beamlets—a dyadically organized, multiscale system of line segments—and associated fast algorithms for beamlet analysis. It considers models based on beamlet-decorated recursive dyadic partitions, treating the image as a Bernoulli random process with spatially varying success probability, which is "high" on the filaments, and recovers the filaments by complexity-penalized model fitting over these partitions. Simulation results demonstrate the effectiveness of the method.
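As a concrete illustration of the data model in this abstract, the sketch below simulates a Bernoulli random image whose success probability is high along a single straight filament and low elsewhere. The probabilities, image size, and filament placement are illustrative choices, not values from the paper, and the beamlet-based recovery itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128
p_filament, p_background = 0.9, 0.1      # illustrative "high" / "low" success probabilities

# success-probability field: high along one straight filament, low elsewhere
prob = np.full((n, n), p_background)
t = np.linspace(0.0, 1.0, 4 * n)
rows = (20 + 90 * t).astype(int)         # a single straight segment, chosen arbitrarily
cols = (10 + 110 * t).astype(int)
prob[rows, cols] = p_filament

noisy_binary_image = rng.random((n, n)) < prob   # one Bernoulli draw per pixel
print("fraction of 'on' pixels:", noisy_binary_image.mean())
```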
Many statistical methods have been proposed in recent years for analyzing the spatial distribution of galaxies. Very few of them, however, can properly handle the border effects of complex observational sample volumes. In this paper, we first show how to calculate the Minkowski Functionals (MF) taking these border effects into account. We then present a multiscale extension of the MF which gives more information about how the galaxies are spatially distributed. A range of examples using Gaussian random fields illustrates the results. Finally, we apply the Multiscale Minkowski Functionals (MMF) to the 2dF Galaxy Redshift Survey data. The MMF clearly indicate an evolution of morphology with scale. We also compare the real 2dF catalog with mock catalogs and find that Lambda-CDM simulations roughly fit the data, except at the finest scale.
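For readers unfamiliar with Minkowski Functionals, the following sketch estimates the three 2D functionals (area, boundary length, Euler characteristic) of excursion sets of a smoothed Gaussian random field by simple pixel counting. It is a toy illustration with conventions chosen here for simplicity; it implements neither the paper's border-effect corrections nor its multiscale extension.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def minkowski_2d(mask):
    """Pixel-counting estimates of the 2D Minkowski functionals of a binary set:
    area (on-pixel count), boundary length (exposed pixel edges), Euler characteristic."""
    mask = mask.astype(bool)
    F = int(mask.sum())                                          # faces = on-pixels
    adj = int((mask[:, :-1] & mask[:, 1:]).sum()
              + (mask[:-1, :] & mask[1:, :]).sum())              # 4-adjacent on-pixel pairs
    E = 4 * F - adj                                              # distinct unit edges covered
    perimeter = 4 * F - 2 * adj                                  # edges touching exactly one on-pixel
    p = np.pad(mask, 1)
    V = int((p[:-1, :-1] | p[:-1, 1:] | p[1:, :-1] | p[1:, 1:]).sum())  # lattice vertices covered
    euler = V - E + F                                            # Euler characteristic of the pixel union
    return F, perimeter, euler

# excursion sets of a smoothed, unit-variance Gaussian random field at several thresholds
rng = np.random.default_rng(0)
field = gaussian_filter(rng.standard_normal((256, 256)), sigma=4)
field /= field.std()
for nu in (-1.0, 0.0, 1.0):
    print(f"threshold {nu:+.1f}:", minkowski_2d(field > nu))
```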
We construct a new orthonormal basis for $L^2({\Bbb R}^2)$, whose elements are angularly integrated ridge functions---{\it orthonormal ridgelets}. The basis elements are smooth and of rapid decay in the spatial domain, and in the frequency domain are localized near angular wedges which, at radius $r = 2^j$, have radial extent $\Delta r \approx 2^j$ and angular extent $\Delta \theta \approx 2\pi/2^{j}$. Orthonormal ridgelet expansions expose an interesting phenomenon in nonlinear approximation: they give very efficient approximations to objects such as $1_{\{ x_1\cos\theta+ x_2\sin\theta > a\}} \ e^{-x^2_1-x^2_2}$ which are smooth away from a discontinuity along a line. The orthonormal ridgelet coefficients of such objects are sparse: they belong to every $\ell^p$, $p > 0$. This implies that simple thresholding in the ridgelet orthobasis is, in a certain sense, a near-ideal nonlinear approximation scheme for such objects. Orthonormal ridgelets may be viewed as $L^2$ substitutes for approximation by sums of ridge functions, and so can perform many of the same tasks as the ridgelet systems constructed by Candès [Ph.D. Thesis, Department of Statistics, Stanford University, Stanford, CA, 1998; Appl. Comput. Harmon. Anal., 6 (1999), pp. 197--218]. Orthonormal ridgelets make available the machinery of orthogonal decompositions, which is not available for ridge functions as they are not in $L^2({\Bbb R}^2)$. The ridgelet orthobasis is constructed as the isometric image of a special wavelet basis for Radon space; as a consequence, ridgelet analysis is equivalent to a special wavelet analysis in the Radon domain. This means that questions of ridgelet analysis of linear singularities can be answered by wavelet analysis of point singularities. At the heart of our nonlinear approximation result is the study of a certain tempered distribution on ${\Bbb R}^2$ defined formally by $S(u,v) = |v|^{-1/2} \sigma(u/|v|)$ with $\sigma$ a certain smooth bounded function; this is singular at $(u,v) = (0,0)$ and $C^\infty$ elsewhere. The key point is that the analysis of this point singularity by tensor Meyer wavelets yields sparse coefficients at high frequencies; this is reflected in the sparsity of the ridgelet coefficients and the good nonlinear approximation properties of the ridgelet basis.
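To spell out why membership of the coefficients in every $\ell^p$, $p > 0$, yields near-ideal nonlinear approximation, here is the standard bound for best $N$-term approximation in an orthonormal basis, a routine consequence of Parseval and rearrangement, included as a reminder rather than as the paper's own argument. Let $(\alpha_k)$ denote the ridgelet coefficients of an object $f$ and $|\alpha|_{(1)} \ge |\alpha|_{(2)} \ge \cdots$ their decreasing rearrangement. If $\|\alpha\|_{\ell^p} \le C_p$ for some $0 < p < 2$, then $|\alpha|_{(k)} \le C_p k^{-1/p}$, and the best $N$-term approximation $f_N$ (keep the $N$ largest coefficients) obeys
\[
\|f - f_N\|_{L^2}^2 \;=\; \sum_{k > N} |\alpha|_{(k)}^2 \;\le\; C_p^2 \sum_{k > N} k^{-2/p} \;\le\; \frac{p}{2-p}\, C_p^2\, N^{1 - 2/p}.
\]
Since the mutilated Gaussian has ridgelet coefficients in every $\ell^p$, $p > 0$, its $N$-term approximation error decays faster than any polynomial rate in $N$.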
The ubiquitous least squares method for systems of linear equations returns solutions which typically have all nonzero entries. However, solutions with the fewest nonzeros allow for greater insight. An exhaustive search for the sparsest solution is intractable (NP-hard). Recently, a great deal of research has shown that linear programming can find the sparsest solution for certain 'typical' systems of equations, provided the solution is sufficiently sparse. In this note we report recent progress in determining the conditions under which the sparsest solution to large systems is available by linear programming. Our work shows that there are sharp thresholds on sparsity below which these methods succeed and above which they fail; it evaluates those thresholds precisely and hints at several interesting applications.
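The linear programming route referred to here is basis pursuit: minimize the $\ell^1$ norm subject to the equality constraints. A minimal sketch using a generic LP solver follows; the problem sizes and the Gaussian matrix are illustrative choices, not the authors' code.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, N, k = 50, 200, 5                               # measurements, dimension, sparsity
A = rng.standard_normal((n, N))
x0 = np.zeros(N)
x0[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x0

# Basis pursuit: min ||x||_1 subject to A x = y, written as an LP with x = u - v, u, v >= 0
c = np.ones(2 * N)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None), method="highs")
x_hat = res.x[:N] - res.x[N:]
print("recovery error:", np.linalg.norm(x_hat - x0))
```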
We derive a formula for optimal hard thresholding of the singular value decomposition in the presence of correlated additive noise; although it nominally involves unobservables, we show how to apply it even where the noise covariance structure is not a priori known or is not independently estimable. The proposed method, which we call ScreeNOT, is a mathematically solid alternative to Cattell's ever-popular but vague Scree Plot heuristic from 1966. ScreeNOT has a surprising oracle property: it typically achieves exactly, in large finite samples, the lowest possible MSE for matrix recovery on each given problem instance; that is, the specific threshold it selects gives exactly the smallest achievable MSE loss among all possible threshold choices for that noisy dataset and that unknown underlying true low-rank model. The method is computationally efficient and robust against perturbations of the underlying covariance structure. Our results depend on the assumption that the singular values of the noise have a limiting empirical distribution of compact support; this model, which is standard in random matrix theory, is satisfied by many models exhibiting either cross-row correlation structure or cross-column correlation structure, and also by many situations where there is inter-element correlation structure. Simulations demonstrate the effectiveness of the method even at moderate matrix sizes. The paper is supplemented by ready-to-use software packages implementing the proposed algorithm: package ScreeNOT in Python (via PyPI) and R (via CRAN).
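The toy sketch below shows the generic operation that ScreeNOT optimizes, namely hard thresholding of singular values, on a rank-one signal in white noise. The threshold here is picked by hand near the noise bulk edge rather than by the ScreeNOT rule, and the published package's actual API is not reproduced; all names and numbers are illustrative.

```python
import numpy as np

def hard_threshold_svd(Y, threshold):
    """Reconstruct a low-rank estimate of Y by keeping only singular values above `threshold`.
    Choosing `threshold` optimally from the data is what ScreeNOT provides; here it is an input."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_kept = np.where(s > threshold, s, 0.0)
    return (U * s_kept) @ Vt

# toy example: rank-one signal plus white noise whose singular-value bulk edge is ~2
rng = np.random.default_rng(1)
n = 300
u, v = rng.standard_normal(n), rng.standard_normal(n)
X = 5.0 * np.outer(u, v) / n                      # signal with one singular value near 5
Y = X + rng.standard_normal((n, n)) / np.sqrt(n)  # noise bulk edge near 2
X_hat = hard_threshold_svd(Y, threshold=2.5)      # hand-picked threshold above the bulk
print("relative error:", np.linalg.norm(X_hat - X) / np.linalg.norm(X))
```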
In important application fields today—genomics and proteomics are examples—selecting a small subset of useful features is crucial for success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection, based on the notion of higher criticism (HC). For $i = 1, 2, \ldots, p$, let $\pi_i$ denote the two-sided P-value associated with the $i$th feature Z-score and $\pi_{(i)}$ denote the $i$th order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective $(i/p - \pi_{(i)})/\sqrt{i/p\,(1 - i/p)}$. We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT.
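A short sketch of HC threshold selection as described above; the restriction of the search to the lower half of the ordered P-values and the toy rare/weak parameters are illustrative conventions, not prescriptions from the paper.

```python
import numpy as np
from scipy.stats import norm

def hc_threshold(z, search_frac=0.5):
    """Higher Criticism threshold for a vector of feature Z-scores (sketch).
    Returns the absolute Z-score whose two-sided P-value maximizes the HC objective
    (i/p - pi_(i)) / sqrt(i/p * (1 - i/p)) over the lowest `search_frac` of ranks."""
    p = len(z)
    pvals = 2 * norm.sf(np.abs(z))            # two-sided P-values
    order = np.argsort(pvals)                 # features ranked by P-value
    kmax = int(search_frac * p)               # search only the smallest P-values
    i = np.arange(1, kmax + 1)
    pi_sorted = pvals[order][:kmax]
    hc = (i / p - pi_sorted) / np.sqrt(i / p * (1 - i / p))
    i_star = int(np.argmax(hc))
    return np.abs(z[order[i_star]])

# toy rare/weak setting: 50 weak useful features among 10,000 nulls (illustrative numbers)
rng = np.random.default_rng(2)
z = rng.standard_normal(10_000)
z[:50] += 2.5
t_hc = hc_threshold(z)
print(f"HC threshold {t_hc:.2f}, features kept: {(np.abs(z) >= t_hc).sum()}")
```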
Compressed sensing posits that, within limits, one can undersample a sparse signal and yet reconstruct it accurately. Knowing the precise limits to such undersampling is important both for theory and practice. We present a formula that characterizes the allowed undersampling of generalized sparse objects. The formula applies to Approximate Message Passing (AMP) algorithms for compressed sensing, which are here generalized to employ denoising operators besides the traditional scalar soft thresholding denoiser. This paper gives several examples, including scalar denoisers not derived from convex penalization -- the firm shrinkage nonlinearity and the minimax nonlinearity -- and also nonscalar denoisers -- block thresholding, monotone regression, and total variation minimization. Let the variables $\epsilon = k/N$ and $\delta = n/N$ denote the generalized sparsity and undersampling fractions for sampling the $k$-generalized-sparse $N$-vector $x_0$ according to $y = Ax_0$. Here $A$ is an $n \times N$ measurement matrix whose entries are iid standard Gaussian. The formula states that the phase transition curve $\delta = \delta(\epsilon)$ separating successful from unsuccessful reconstruction of $x_0$ by AMP is given by $\delta = M(\epsilon \mid \mathrm{Denoiser})$, where $M(\epsilon \mid \mathrm{Denoiser})$ denotes the per-coordinate minimax mean squared error (MSE) of the specified, optimally tuned denoiser in the directly observed problem $y = x + z$. In short, the phase transition of a noiseless undersampling problem is identical to the minimax MSE in a denoising problem.
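For the classical soft-thresholding denoiser, $M(\epsilon \mid \mathrm{Denoiser})$ can be evaluated numerically from the usual least-favorable two-point analysis (mass $\epsilon$ "at infinity", mass $1-\epsilon$ at zero). The sketch below does exactly that; it illustrates the right-hand side of the phase-transition formula for one denoiser only and is not the paper's general construction.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def minimax_soft_mse(eps):
    """M(eps | soft thresholding): minimax MSE of optimally tuned soft thresholding
    for eps-sparse signals in unit Gaussian noise, under the least-favorable mix of
    mass eps 'at infinity' (risk 1 + tau^2) and mass 1 - eps at zero."""
    def worst_case_risk(tau):
        risk_at_zero = 2 * ((1 + tau**2) * norm.cdf(-tau) - tau * norm.pdf(tau))
        return eps * (1 + tau**2) + (1 - eps) * risk_at_zero
    return minimize_scalar(worst_case_risk, bounds=(0.0, 10.0), method="bounded").fun

# delta(eps) = M(eps | soft thresholding): the AMP phase-transition curve for this denoiser
for eps in (0.05, 0.10, 0.20, 0.40):
    print(f"eps = {eps:.2f}  ->  delta(eps) = {minimax_soft_mse(eps):.3f}")
```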