In this work, we apply 3D motion estimation to the problem of motion compensation for video coding. We model the video sequence as the perspective projection of a collection of rigid bodies undergoing roto-translational motion. Motion compensation of the sequence frames can be performed once the shape of the objects and the motion parameters are determined. The motion equations of a rigid body can be formulated as a nonlinear dynamic system whose state is represented by the motion parameters and by the scaled depths of the object feature points. An extended Kalman filter is then used to estimate the motion and the object shape parameters simultaneously. We found that the inclusion of the shape parameters in the estimation procedure is essential for reliable motion estimation. Our experiments show that the proposed approach offers the following advantages: the filter gives more reliable estimates in the presence of measurement noise than motion estimators that compute motion and structure separately; the filter can effectively track abrupt motion changes; the structure imposed by the model makes the reconstructed motion look very natural, as opposed to more common block-based schemes; and the parametrization of the model allows for very efficient coding of the motion information.
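As a rough illustration of the recursion behind this approach, the sketch below shows a single extended Kalman filter predict/update step in Python; the state layout, the transition function f, the perspective-projection measurement h, and their Jacobians are left abstract, and all names are illustrative rather than the paper's implementation.

```python
import numpy as np

def ekf_step(x, P, z, f, h, F_jac, H_jac, Q, R):
    """One extended Kalman filter iteration (illustrative sketch).

    x : state vector (rigid-body motion parameters + scaled feature depths)
    P : state covariance
    z : measurement (observed 2D positions of the tracked feature points)
    f, h : state-transition and perspective-projection functions
    F_jac, H_jac : Jacobians of f and h at the current state estimate
    Q, R : process and measurement noise covariances
    """
    # Predict: propagate the state through the rigid-body motion model.
    x_pred = f(x)
    F = F_jac(x)
    P_pred = F @ P @ F.T + Q

    # Update: correct the prediction with the observed feature positions.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```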
In this paper, we present a technique to record a large set of room impulse responses using a microphone moving along a trajectory. The technique processes the signal recorded by the moving microphone to reconstruct the signals that would have been recorded at all possible spatial positions along the trajectory. The speed of the microphone's movement is shown to be the key factor for the reconstruction. This fast method of recording spatial impulse responses can also be applied to the recording of head-related transfer functions.
Acoustic tomography is a type of inverse problem. The idea of estimating physical quantities that influence sound propagation by measuring the parameters of propagation has proven successful in many practical domains, including temperature and wind estimation in the atmosphere. However, in most of the previous work in this area, the algorithms used have not been proven mathematically to provide the correct solution to the inverse problem. This paper considers the problem of reconstructing 2D temperature and wind fields using acoustic tomography setups. Primarily, it shows that the classical time-of-flight measurements are not sufficient to reconstruct wind fields. As a solution, an additional set of measurements related solely to the parameters of sound propagation, more precisely to the angles of departure/arrival of the sound waves, is suggested. To take full advantage of this additional information, the bent-ray model of sound propagation is introduced. It is also shown that, when a temperature field and a source-free 2D wind field are observed on a bounded domain, complete reconstruction is possible using only time-of-flight measurements. Conversely, the angles of departure/arrival are sufficient to reconstruct a temperature field and a curl-free 2D wind field on a bounded domain. Further, an iterative reconstruction algorithm is proposed, and possible variations of the main scheme are discussed. Finally, numerical simulations confirm the theoretical results, demonstrate fast convergence, and show the advantages of the adopted bent-ray model of sound propagation over the straight-ray model.
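For reference, a standard form of the time-of-flight measurement model in this setting (the notation below is illustrative and not taken from the paper) is
\[
t_{\mathrm{TOF}} \;=\; \int_{\Gamma} \frac{\mathrm{d}s}{c(\mathbf{r}) + \mathbf{v}(\mathbf{r})\cdot\boldsymbol{\tau}(\mathbf{r})},
\]
where $c(\mathbf{r})$ is the local sound speed (determined by the temperature), $\mathbf{v}(\mathbf{r})$ the wind field, and $\boldsymbol{\tau}(\mathbf{r})$ the unit tangent to the propagation path $\Gamma$; in the straight-ray model $\Gamma$ is fixed to the source-receiver segment, whereas in the bent-ray model $\Gamma$ itself depends on $c$ and $\mathbf{v}$.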
We consider the problem of reconstructing superimposed temperature and wind flow fields from acoustic measurements. A new technique based solely on acoustic wave propagation is presented. In contrast to the usual straight-ray assumption, a bent-ray model is considered in order to achieve higher accuracy. We also develop a lab-scale experiment for temperature estimation.
We propose an interpolation technique for head-related transfer functions (HRTFs). To derive the algorithm, we study the dual problem in which sound is emitted from the listener's ear and the generated sound field is recorded along a circular array of microphones around the listener. The proposed interpolation algorithm is based on the observation that the spatial bandwidth of the sound measured along the circular array is, for all practical purposes, limited. Further, we observe that this spatial bandwidth increases linearly with the frequency of the emitted sound. The analysis leads to the conclusion that an angular spacing of about 5 degrees between measured HRTFs is sufficient to reconstruct all HRTFs in the horizontal plane up to 44.1 kHz.
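A back-of-the-envelope check of the quoted figure, assuming a head radius of roughly $r \approx 8.75$ cm, a sound speed of $c \approx 343$ m/s, and a maximum audio frequency of $f_{\max} = 22.05$ kHz (half of 44.1 kHz), uses the rule of thumb that the angular bandwidth of a sound field on a circle of radius $r$ is about $kr$:
\[
kr = \frac{2\pi f_{\max}\, r}{c} \approx \frac{2\pi \cdot 22050 \cdot 0.0875}{343} \approx 35.3,
\qquad
\Delta\varphi \;\lesssim\; \frac{360^\circ}{2\lceil kr \rceil + 1} = \frac{360^\circ}{73} \approx 4.9^\circ,
\]
which is consistent with the spacing of about 5 degrees stated above; the values of $r$ and $c$ are assumptions made only for this illustration.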
Neural architecture search has recently attracted considerable research effort, as it promises to automate the manual design of neural networks. However, it requires a large amount of computing resources; to alleviate this, a performance prediction network has recently been proposed that enables efficient architecture search by forecasting the performance of candidate architectures instead of relying on actual model training. The performance predictor is task-aware, taking as input not only the candidate architecture but also task meta-features, and it has been designed to learn collectively from several tasks. In this work, we introduce a pairwise ranking loss for training a network that ranks candidate architectures for a new, unseen task, conditioned on its task meta-features. We present experimental results showing that the ranking network is more effective in architecture search than the previously proposed performance predictor.
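A minimal sketch of such a pairwise ranking loss in TensorFlow is given below; the way the ranker consumes the architecture encoding and the task meta-features, and all names, are assumptions for illustration rather than the paper's actual design.

```python
import tensorflow as tf

def pairwise_ranking_loss(score_better, score_worse):
    """Logistic pairwise ranking loss: log(1 + exp(-(s_better - s_worse))).

    score_better / score_worse are the ranker's scores for two candidate
    architectures evaluated on the same task, where the first is known
    (from the meta-training data) to reach the higher validation accuracy.
    Minimizing the loss pushes the ranker to score the better one higher.
    """
    margin = score_better - score_worse
    return tf.reduce_mean(tf.math.softplus(-margin))

def score(ranker, arch_encoding, task_meta_features):
    """Illustrative scoring call: condition on the task meta-features by
    concatenating them with the architecture encoding."""
    return ranker(tf.concat([arch_encoding, task_meta_features], axis=-1))
```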
In many applications, the sampling frequency is limited by the physical characteristics of the components: the pixel pitch, the rate of the analog-to-digital (A/D) converter, etc. A low-pass filter is usually applied before the sampling operation to avoid aliasing. However, when multiple copies are available, it is possible to use the information that is inherently present in the aliasing to reconstruct a higher-resolution signal. If the different copies have unknown relative offsets, this is a nonlinear problem in the offsets and the signal coefficients; they are not easily separable in the set of equations describing the super-resolution problem. Thus, we perform joint registration and reconstruction from multiple unregistered sets of samples. We give a mathematical formulation of the problem when there are $M$ sets of $N$ samples of a signal that is described by $L$ expansion coefficients. We prove that the solution of the registration and reconstruction problem is generically unique if $MN \geq L + M - 1$. We describe two subspace-based methods to compute this solution. Their complexity is analyzed, and some heuristic methods are proposed. Finally, numerical simulation results on one- and two-dimensional signals are given to show the performance of these methods.
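The uniqueness condition can be read as a count of unknowns against equations; with illustrative notation, the $m$-th set of samples of a signal with expansion coefficients $c_l$ in the basis $\{\varphi_l\}$ and unknown offset $t_m$ is
\[
y_m[n] \;=\; \sum_{l=0}^{L-1} c_l\, \varphi_l\!\left(nT + t_m\right), \qquad n = 0,\dots,N-1,\;\; m = 1,\dots,M,
\]
so there are $MN$ sample equations against $L$ coefficients plus $M-1$ relative offsets (one offset serves as the reference), which matches the condition $MN \geq L + M - 1$.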
The first step of the coding technique proposed in the MPEG standard is motion compensation. It reduces the residual error energy at the cost of a fraction of the total bit rate used to transmit motion information. Motion compensation is performed using a block matching approach, though the algorithm to compute the motion vectors is not specified in the MPEG standard. Usually, an exhaustive search around the macroblock position is used. This solution (proposed in the test model) gives the lowest error but has the highest complexity. In this work, we propose an algorithm that reduces the complexity of the block matching procedure while achieving performance comparable to the exhaustive search. The proposed solution is particularly attractive for the spatially scalable version of the coder, where both a full-resolution and a spatially downsampled sequence are transmitted. The algorithm uses a multiresolution motion compensation scheme: exhaustive-search block matching is performed on the downsampled sequence, and the computed vector field is used as an estimate of the motion vectors for the full-resolution sequence, so that only a refinement needs to be computed. This allows a considerable reduction of the computation time with respect to exhaustive search at the full resolution, while the residual error energy increases only slightly.
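The sketch below illustrates this kind of two-level scheme in Python/NumPy: an exhaustive search on 2x-decimated frames followed by a small refinement at full resolution. It assumes frame dimensions divisible by the block size and uses the sum of absolute differences as the matching criterion; parameter values and names are illustrative, not those of the proposed coder.

```python
import numpy as np

def sad(block, ref_block):
    """Sum of absolute differences between two equally sized blocks."""
    return np.abs(block.astype(np.int32) - ref_block.astype(np.int32)).sum()

def best_vector(cur, ref, y, x, bs, center, radius):
    """Exhaustive search in a (2*radius+1)^2 window around `center`."""
    block = cur[y:y + bs, x:x + bs]
    best_cost, best_mv = np.inf, (0, 0)
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry <= ref.shape[0] - bs and 0 <= rx <= ref.shape[1] - bs:
                cost = sad(block, ref[ry:ry + bs, rx:rx + bs])
                if cost < best_cost:
                    best_cost, best_mv = cost, (dy, dx)
    return best_mv

def multiresolution_motion_estimation(cur, ref, bs=16,
                                      coarse_radius=8, refine_radius=2):
    """Coarse exhaustive search at half resolution, then refinement."""
    cur_lo, ref_lo = cur[::2, ::2], ref[::2, ::2]   # simple 2x decimation
    vectors = {}
    for y in range(0, cur.shape[0] - bs + 1, bs):
        for x in range(0, cur.shape[1] - bs + 1, bs):
            # Exhaustive search for the co-located low-resolution block.
            mv_lo = best_vector(cur_lo, ref_lo, y // 2, x // 2, bs // 2,
                                (0, 0), coarse_radius)
            # Scale the coarse vector up and refine it at full resolution.
            init = (2 * mv_lo[0], 2 * mv_lo[1])
            vectors[(y, x)] = best_vector(cur, ref, y, x, bs,
                                          init, refine_radius)
    return vectors
```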
Quantum computing-based machine learning mainly focuses on computing hardware that is experimentally challenging to realize because it requires gates operating at very low temperatures. Instead, we demonstrate the existence of a lower-performance and much lower-effort island on the accuracy-vs-qubits graph that may well be experimentally accessible with room-temperature optics. This high-temperature quantum computing toy model is nevertheless interesting to study, as it allows rather accessible explanations of key concepts in quantum computing, in particular interference, entanglement, and the measurement process.
We specifically study the problem of classifying an example from the MNIST and Fashion-MNIST datasets, subject to the constraint that a prediction must be made after the detection of the very first photon that has passed through a coherently illuminated filter displaying the example. Whereas a classical setup in which the photon is detected after falling on one of the $28\times 28$ image pixels is limited to a (maximum-likelihood estimation) accuracy of $21.27\%$ for MNIST and $18.27\%$ for Fashion-MNIST, we show that the theoretically achievable accuracy when exploiting inference by optically transforming the state of the photon is at least $41.27\%$ for MNIST and $36.14\%$ for Fashion-MNIST.
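The sketch below shows roughly how such a classical pixel-click baseline could be evaluated in Python/NumPy: each image defines a probability distribution over its pixels, the maximum-likelihood decision per pixel is learned from the class-conditional click distributions of the training set, and the expected accuracy averages over the photon's click position. This is an illustration of the general idea only; the paper's exact protocol and numbers are not reproduced here.

```python
import numpy as np

def classical_single_photon_accuracy(train_x, train_y, test_x, test_y,
                                     n_classes=10):
    """Expected accuracy of a single-photon pixel-click MLE baseline.

    train_x, test_x : float arrays of shape (num_images, 28, 28) with
                      nonnegative pixel intensities
    train_y, test_y : integer class labels
    """
    # Class-conditional click distributions estimated from the training set.
    class_dist = np.stack([
        train_x[train_y == c].reshape(-1, 28 * 28).sum(axis=0)
        for c in range(n_classes)
    ])
    class_dist /= class_dist.sum(axis=1, keepdims=True)
    decision = class_dist.argmax(axis=0)        # MLE decision for each pixel

    # Each test image, normalized, is the click distribution of its photon.
    click_probs = test_x.reshape(-1, 28 * 28)
    click_probs = click_probs / click_probs.sum(axis=1, keepdims=True)

    # Probability that the photon lands on a pixel whose decision is correct.
    correct = (decision[None, :] == np.asarray(test_y)[:, None])
    return float((click_probs * correct).sum(axis=1).mean())
```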
We show in detail how to train the corresponding transformation with TensorFlow and also explain how this example can serve as a teaching tool for the measurement process in quantum mechanics.
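As a hedged sketch of what such a training loop might look like (the Cayley parametrization of the unitary, the grouping of output modes into classes, and all names below are assumptions made for illustration, not the construction used in the paper):

```python
import tensorflow as tf

D, C = 28 * 28, 10        # input optical modes (pixels) and classes

# Trainable real parameters; A = W - W^H is skew-Hermitian, so the Cayley
# transform U = (I + A)^{-1}(I - A) is exactly unitary.
W_re = tf.Variable(tf.random.normal([D, D], stddev=0.01))
W_im = tf.Variable(tf.random.normal([D, D], stddev=0.01))

# Assumed read-out: output mode m is assigned to class m % 10.
ASSIGN = tf.constant([[1.0 if m % C == c else 0.0 for c in range(C)]
                      for m in range(D)])

def unitary():
    W = tf.complex(W_re, W_im)
    A = W - tf.linalg.adjoint(W)
    I = tf.eye(D, dtype=tf.complex64)
    return tf.linalg.solve(I + A, I - A)

def class_probs(images):
    """Detection probability per class for one photon. images: float [B, 784]."""
    # Small epsilon keeps the sqrt gradient finite at zero-intensity pixels.
    amps = tf.sqrt(images / tf.reduce_sum(images, axis=1, keepdims=True) + 1e-12)
    psi = tf.complex(amps, tf.zeros_like(amps))         # photon input state
    out = tf.matmul(psi, unitary(), transpose_b=True)   # optical transform
    p = tf.abs(out) ** 2                                # Born-rule probabilities
    return tf.matmul(p, ASSIGN)                         # sum modes per class

optimizer = tf.keras.optimizers.Adam(1e-3)

def train_step(images, labels):
    with tf.GradientTape() as tape:
        p = class_probs(images)
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(labels, p))
    grads = tape.gradient(loss, [W_re, W_im])
    optimizer.apply_gradients(zip(grads, [W_re, W_im]))
    return loss
```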