In a collaborative research project, several monaural and binaural noise reduction algorithms have been comprehensively evaluated. In this article, eight selected noise reduction algorithms were assessed using instrumental measures, with a focus on the instrumental evaluation of speech intelligibility. Four distinct, reverberant scenarios were created to reflect everyday listening situations: a stationary speech-shaped noise, a multitalker babble noise, a single interfering talker, and a realistic cafeteria noise. Three instrumental measures were employed to assess predicted speech intelligibility and predicted sound quality: the intelligibility-weighted signal-to-noise ratio, the short-time objective intelligibility measure, and the perceptual evaluation of speech quality. The results show substantial improvements in predicted speech intelligibility as well as sound quality for the evaluated algorithms. The coherence-based noise reduction algorithm was able to provide improvements in predicted audio signal quality. For the tested single-channel noise reduction algorithm, improvements in intelligibility-weighted signal-to-noise ratio were observed in all but the nonstationary cafeteria ambient noise scenario. Binaural minimum variance distortionless response beamforming algorithms performed particularly well in all noise scenarios.
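As a rough sketch of the first of these measures, the snippet below computes an intelligibility-weighted signal-to-noise ratio by averaging per-band SNRs with band-importance weights. It assumes that the speech and noise components at the algorithm output are separately available; the octave-band edges and the importance weights used here are illustrative placeholders, not the exact values used in the evaluation.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def band_snr_db(speech, noise, fs, f_lo, f_hi):
    """SNR (dB) of speech vs. noise within one band, via band-pass filtering."""
    sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
    s_b, n_b = sosfilt(sos, speech), sosfilt(sos, noise)
    return 10.0 * np.log10(np.sum(s_b**2) / (np.sum(n_b**2) + 1e-12))

def intelligibility_weighted_snr(speech, noise, fs, bands, weights):
    """Weighted sum of per-band SNRs; the weights should sum to one."""
    snrs = np.array([band_snr_db(speech, noise, fs, lo, hi) for lo, hi in bands])
    return float(np.dot(weights, snrs))

# Illustrative band edges and band-importance weights (placeholders).
bands = [(125, 250), (250, 500), (500, 1000), (1000, 2000), (2000, 4000), (4000, 7500)]
weights = np.array([0.05, 0.13, 0.20, 0.25, 0.22, 0.15])
weights = weights / weights.sum()

fs = 16000
rng = np.random.default_rng(0)
speech = rng.standard_normal(fs)        # stand-ins for the separated speech
noise = 0.5 * rng.standard_normal(fs)   # and noise components at the output
print(intelligibility_weighted_snr(speech, noise, fs, bands, weights))
```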
With the advancement of technology, both assisted listening devices and speech communication devices are becoming more portable and more frequently used. As a consequence, users of devices such as hearing aids, cochlear implants, and mobile telephones expect their devices to work robustly anywhere and at any time. This holds in particular for challenging noisy environments like a cafeteria, a restaurant, a subway, a factory, or traffic. One way of making assisted listening devices robust to noise is to apply speech enhancement algorithms. The corrupted speech can be improved either by exploiting spatial diversity through a constructive combination of microphone signals (so-called beamforming), or by exploiting the different spectro-temporal properties of speech and noise. Here, we focus on single-channel speech enhancement algorithms, which rely on spectro-temporal properties. On the one hand, these algorithms can be employed when the miniaturization of devices only allows for using a single microphone. On the other hand, when multiple microphones are available, single-channel algorithms can be employed as a postprocessor at the output of a beamformer. To exploit the short-term stationarity of natural sounds, many of these approaches process the signal in a time-frequency representation, most frequently the short-time discrete Fourier transform (STFT) domain. In this domain, the coefficients of the signal are complex-valued and can therefore be represented by their absolute value (referred to in the literature both as STFT magnitude and STFT amplitude) and their phase. While the modeling and processing of the STFT magnitude have been the center of interest in the past three decades, the phase has been largely ignored.
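To make the magnitude/phase decomposition concrete, the following sketch uses SciPy's STFT to apply a spectral gain to the magnitude while simply reusing the noisy phase for reconstruction, which is exactly the phase-blind processing pattern described above. The gain rule and the random input are placeholders, not one of the estimators discussed here.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
rng = np.random.default_rng(1)
noisy = rng.standard_normal(fs)  # stand-in for a noisy speech signal

# Analysis: complex STFT coefficients Y = |Y| * exp(j * phase)
f, t, Y = stft(noisy, fs=fs, nperseg=512, noverlap=384)
magnitude = np.abs(Y)
phase = np.angle(Y)

# Phase-blind processing: modify only the magnitude (placeholder gain),
# then recombine with the unmodified noisy phase.
gain = np.clip(1.0 - 0.1 / (magnitude + 1e-8), 0.1, 1.0)  # arbitrary example gain
enhanced_magnitude = gain * magnitude
S_hat = enhanced_magnitude * np.exp(1j * phase)

# Synthesis: back to the time domain.
_, enhanced = istft(S_hat, fs=fs, nperseg=512, noverlap=384)
```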
This report presents our audio event detection system submitted for Task 2, "Detection of rare sound events", of the DCASE 2017 challenge. The proposed system is based on convolutional neural networks (CNNs) and deep neural networks (DNNs) coupled with novel weighted and multi-task loss functions and state-of-the-art phase-aware signal enhancement. The loss functions are tailored for audio event detection in audio streams. The weighted loss is designed to tackle the common issue of imbalanced data in background/foreground classification, while the multi-task loss enables the networks to simultaneously model the class distribution and the temporal structures of the target events for recognition. Our proposed systems significantly outperform the challenge baseline, improving the F1-score from 72.7% to 90.0% and reducing the detection error rate from 0.53 to 0.18 on average on the development data. On the evaluation data, our submission obtains an average F1-score of 88.3% and an error rate of 0.22, which are significantly better than those obtained by the DCASE baseline (i.e., an F1-score of 64.1% and an error rate of 0.64).
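As a rough sketch of the weighted-loss idea (not the exact loss of the submitted system), the snippet below implements a binary cross-entropy in which the rare foreground frames receive a larger weight than the dominant background frames, counteracting the class imbalance; the weight value and the toy labels are illustrative assumptions.

```python
import numpy as np

def weighted_binary_cross_entropy(probs, labels, foreground_weight=5.0):
    """Binary cross-entropy with an extra weight on the (rare) foreground frames.

    probs  : predicted foreground probabilities per frame, shape (T,)
    labels : ground-truth frame labels, 1 = event (foreground), 0 = background
    """
    probs = np.clip(probs, 1e-7, 1.0 - 1e-7)
    weights = np.where(labels == 1, foreground_weight, 1.0)
    loss = -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))
    return float(np.mean(weights * loss))

# Toy example: 100 frames, only 5 of which belong to the rare target event.
rng = np.random.default_rng(2)
labels = np.zeros(100)
labels[40:45] = 1.0
probs = np.clip(labels + 0.1 * rng.standard_normal(100), 0.0, 1.0)
print(weighted_binary_cross_entropy(probs, labels))
```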
Among the most commonly used single-channel approaches for the enhancement of noise-corrupted speech are Bayesian estimators of clean speech coefficients in the short-time Fourier transform domain. However, the vast majority of these approaches effectively modify only the spectral amplitude and do not consider any information about the clean speech spectral phase. More recently, clean speech estimators that can utilize prior phase information have been proposed and shown to lead to improvements over the traditional phase-blind approaches. In this work, we revisit phase-aware estimators of clean speech amplitudes and complex coefficients. To complete the existing set of estimators, we first derive a novel amplitude estimator given uncertain prior phase information. Second, we derive a closed-form solution for complex coefficients when the prior phase information is completely uncertain or not available. We put the novel estimators into the context of existing estimators and discuss their advantages and disadvantages.
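In general terms, and as a sketch of the setup rather than the specific closed-form solutions derived in the work, phase-blind and phase-aware MMSE estimation differ only in what the conditional expectation is conditioned on; the notation below is assumed for illustration.

```latex
% Noisy STFT coefficient Y = S + N, clean coefficient S = A e^{j\phi_S}.
\begin{align}
  \hat{A}_{\text{phase-blind}} &= \operatorname{E}\!\left[ A \mid Y \right], \\
  \hat{A}_{\text{phase-aware}} &= \operatorname{E}\!\left[ A \mid Y, \tilde{\phi}_S \right], \\
  \hat{S}_{\text{phase-aware}} &= \operatorname{E}\!\left[ S \mid Y, \tilde{\phi}_S \right],
\end{align}
% where $\tilde{\phi}_S$ denotes a (possibly uncertain) prior estimate of the
% clean spectral phase.
```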
For the enhancement of single-channel speech corrupted by acoustic noise, short-time Fourier transform domain clean speech estimators have recently been proposed that incorporate prior information about the clean speech spectral phase. Instrumental measures predict quality improvements for the phase-aware estimators over their conventional phase-blind counterparts. In this letter, these predictions are verified by means of listening experiments. The phase-aware amplitude estimator on average achieves a stronger noise reduction and is significantly preferred over its phase-blind counterpart in a pairwise comparison, even when the clean spectral phase is estimated blindly from the noisy signal.
Many well-known and frequently employed Bayesian clean speech estimators have been derived under the assumption that the true power spectral densities (PSDs) of speech and noise are exactly known. In practice, however, only PSD estimates are available. Simply neglecting PSD estimation errors and treating the estimates as true values leads to speech estimation errors that cause musical noise and an undesired suppression of speech. In this paper, the uncertainty of the available speech PSD estimates is addressed. The main contributions are the following. First, we summarize and examine ways to model and incorporate the uncertainty of PSD estimates for a more robust speech enhancement performance. Second, a novel nonlinear clean speech estimator is derived that takes into account prior knowledge about the absolute value of typical speech PSDs. Third, we show that the derived statistical framework provides uncertainty-aware counterparts to a number of well-known conventional clean speech estimators, such as the Wiener filter and Ephraim and Malah's amplitude estimators. Fourth, we show how modern PSD estimators can be incorporated into the theoretical framework and propose to employ frequency-dependent priors. Finally, the effects and benefits of considering the uncertainty of speech PSD estimates are analyzed, discussed, and evaluated via instrumental measures and a listening experiment.
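As context for why PSD uncertainty matters, the sketch below shows the conventional, uncertainty-blind pipeline: estimated noise and speech PSDs are plugged into the a priori SNR and a Wiener gain as if they were exact. The decision-directed smoothing constant and the toy signals are illustrative choices, not those of the paper.

```python
import numpy as np

def wiener_gain_uncertainty_blind(noisy_power, noise_psd_est, prev_clean_power, alpha=0.98):
    """Conventional Wiener gain that treats PSD estimates as exact values.

    noisy_power      : |Y(k, l)|^2 for the current frame, shape (K,)
    noise_psd_est    : estimated noise PSD, shape (K,)
    prev_clean_power : |S_hat(k, l-1)|^2 from the previous frame, shape (K,)
    """
    gamma = noisy_power / (noise_psd_est + 1e-12)            # a posteriori SNR
    xi = (alpha * prev_clean_power / (noise_psd_est + 1e-12)
          + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))    # decision-directed a priori SNR
    return xi / (1.0 + xi)                                   # Wiener gain

# Toy noise-only frame: the estimated noise PSD deviates from the true one,
# and the resulting gain errors are what cause musical noise and an
# undesired suppression of speech.
K = 257
rng = np.random.default_rng(3)
true_noise_psd = np.ones(K)
noise_psd_est = true_noise_psd * rng.uniform(0.5, 2.0, K)    # imperfect PSD estimate
noisy_power = true_noise_psd * rng.exponential(1.0, K)
gain = wiener_gain_uncertainty_blind(noisy_power, noise_psd_est, prev_clean_power=np.zeros(K))
```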
For the reduction of additive acoustic noise, various methods and clean speech estimators are available, each with specific strengths and weaknesses. In order to combine the strengths of two such approaches, we derive a minimum mean squared error (MMSE)-optimal estimator of the clean speech given two independent initial clean speech estimates. As an example, we present a specific combination that results in a weighted mixture of the Wiener filter and a simple, low-cost harmonic speech model. The proposed estimator benefits from the additional information provided by the harmonic model, leading to a better protection of the harmonic components of voiced speech as compared to the traditional Wiener filter. Instrumental measures predict improvements in speech quality and speech intelligibility for the proposed combination over each individual estimator.
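The combination principle can be illustrated with a simple special case: for two independent, unbiased estimates with Gaussian errors of known variance, the MMSE combination reduces to inverse-error-variance weighting. The sketch below shows this toy case only; the variances and estimates are stand-ins, not the quantities derived in the paper.

```python
import numpy as np

def combine_mmse(est1, var1, est2, var2):
    """MMSE combination of two independent, unbiased estimates with known
    error variances: inverse-variance weighting (simple Gaussian special case)."""
    w1 = var2 / (var1 + var2)
    w2 = var1 / (var1 + var2)
    combined = w1 * est1 + w2 * est2
    combined_var = (var1 * var2) / (var1 + var2)
    return combined, combined_var

# Toy example: one "Wiener-like" and one "harmonic-model-like" estimate of the
# same clean STFT coefficient, with different error variances.
s_true = 1.0 + 0.5j
est_wiener = s_true + 0.3 * (1 + 1j)    # less reliable in this time-frequency bin
est_harmonic = s_true + 0.1 * (1 - 1j)  # more reliable in this time-frequency bin
s_hat, v_hat = combine_mmse(est_wiener, 0.2, est_harmonic, 0.05)
```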
Future industrial control systems face the need to be highly adaptive, productive, and efficient, while providing a high level of safety towards operating staff, environment, and machinery. These demands call for the joint consideration of resilience and mixed criticality in order to exploit previously untapped redundancy potentials. Here, resilience combines the detection of, decision-making about, adaptation to, and recovery from unforeseeable or malicious events in an autonomous manner. By enabling the consideration of functionalities with different criticalities, mixed criticality allows safety-relevant functions to be prioritized over uncritical ones. While each concept on its own constitutes a large research branch across various engineering disciplines, the synergies between the two paradigms in a multi-disciplinary context are commonly overlooked. In industrial control, consolidating these mechanisms while preserving functional safety requirements under limited resources is a significant challenge. In this contribution, we provide a multi-disciplinary perspective on the concepts and mechanisms that enable criticality-aware resilience, in particular with respect to system design, communication, control, and security. Thereby, we envision a highly flexible, autonomous, and scalable paradigm for industrial control systems, identify potentials across the different domains, and outline future research directions. Our results indicate that jointly employing mixed criticality and resilience has the potential to increase the overall system's efficiency, reliability, and flexibility, even under unanticipated or malicious events. Thus, for future industrial systems, mixed-criticality-aware resilience is a crucial factor towards autonomy and increased overall system performance.
Conventional statistical clean speech estimators, like the Wiener filter, are frequently used for the spectro-temporal enhancement of noise-corrupted speech. Most of these approaches estimate the clean speech independently for each time-frequency point, neglecting the structure of the underlying speech sound. In this work, we derive a statistical estimator that explicitly takes into account information about the characteristic structure of voiced speech by means of a harmonic signal model. To this end, we also present a way to estimate a harmonic model-based clean speech representation and the corresponding error variance directly in the short-time Fourier transform domain. The resulting estimator is optimal in the minimum mean squared error sense and can conveniently be formulated in terms of a multichannel Wiener filter. The proposed estimator outperforms several reference algorithms in terms of speech quality and intelligibility as predicted by instrumental measures.
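As a sketch of the multichannel Wiener filter formulation in its generic form, the MMSE-optimal linear weights solve w = Phi_yy^{-1} phi_ys, where Phi_yy is the covariance matrix of the stacked observations and phi_ys their cross-correlation with the clean coefficient. The snippet below applies this to a toy two-"channel" case (noisy STFT coefficient plus a harmonic-model-based representation); all covariance values are stand-ins, not the quantities estimated in the paper.

```python
import numpy as np

def multichannel_wiener_filter(Phi_yy, phi_ys):
    """MMSE-optimal linear weights: w = Phi_yy^{-1} phi_ys."""
    return np.linalg.solve(Phi_yy, phi_ys)

# Toy two-"channel" setup per time-frequency point: observation 1 is the noisy
# STFT coefficient (clean speech plus noise), observation 2 a harmonic
# model-based representation of the clean speech (clean speech plus model error).
sigma_s2 = 1.0   # clean speech power (assumed known here)
sigma_n2 = 0.5   # noise power in the noisy coefficient
sigma_e2 = 0.2   # error variance of the harmonic-model representation

Phi_yy = np.array([[sigma_s2 + sigma_n2, sigma_s2],
                   [sigma_s2, sigma_s2 + sigma_e2]])  # observation covariance
phi_ys = np.array([sigma_s2, sigma_s2])               # cross-correlation with clean speech

w = multichannel_wiener_filter(Phi_yy, phi_ys)
# Estimate: s_hat = w[0] * noisy_coefficient + w[1] * harmonic_model_coefficient
```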