Sven Ahlbäck

Queen Mary University of London

Co-Authors

Simon Dixon

Queen Mary University of London

Emir Demirel

Queen Mary University of London

Emmanouil Benetos

Queen Mary University of London

Carlos Lordelo

Queen Mary University of London

Susanne Rosenberg

Royal College of Music in Stockholm

Olof Misgeld

Royal College of Music in Stockholm

Bob L. Sturm

KTH Royal Institute of Technology

Bror Brodén

Karolinska Institutet

Bo Holmström

Danderyds sjukhus

Thomas Ihre

Stockholm South General Hospital

Cooperative Institutions

Queen Mary University of London

Karolinska Institutet

KTH Royal Institute of Technology

Saint Göran Hospital

Danderyds sjukhus

Stockholm South General Hospital

Perfect Harmony Health

Institute of Electrical and Electronics Engineers

Malmö University

Lund University

Research Field

The fiddler's tune book as a "hub" for musical creation today: how the written material of the fiddlers' tune books can serve as a source of inspiration for today's folk musicians and singers (Swedish original: Spelmansboken som en ”hub” för musikaliskt skapande idag)

Susanne Rosenberg, Sven Ahlbäck, Olof Misgeld, Mikaël Marin

Today's folk musicians have much in common with the "town musician" (Stads-musikant) and village fiddler of earlier times, those to whom the "fiddlers' tune books" (Spelmansböckerna) are attributed (Gustafsson, 2016). Just like them, today's folk musicians (and folk ...

Citations (0)

Polyphonic pitch detection with convolutional recurrent neural networks

arXiv (Cornell University) (2022)

Carl Thomé, Sven Ahlbäck

Recent directions in automatic speech recognition (ASR) research have shown that applying deep learning models from image recognition challenges in computer vision is beneficial. As automatic music transcription (AMT) is superficially similar to ASR, in the sense that methods often rely on transforming spectrograms to symbolic sequences of events (e.g. words or notes), deep learning should benefit AMT as well. In this work, we outline an online polyphonic pitch detection system that streams audio to MIDI using ConvLSTMs. Our system achieves state-of-the-art results on the 2007 MIREX multi-F0 development set, with an F-measure of 83% on the bassoon, clarinet, flute, horn and oboe ensemble recording, without requiring any musical language modelling or assumptions about instrument timbre.

MIDI

Timbre

Polyphony

Oboe

Spectrogram

10.48550/arxiv.2202.02115

Citations (1)

Computational Pronunciation Analysis in Sung Utterances

arXiv (Cornell University) (2021)

Emir Demirel, Sven Ahlbäck, Simon Dixon

Recent automatic lyrics transcription (ALT) approaches focus on building stronger acoustic models or in-domain language models, while the pronunciation aspect is seldom touched upon. This paper applies a novel computational analysis to the pronunciation variances in sung utterances and proposes a new pronunciation model adapted for singing. The singing-adapted model is tested on multiple public datasets via word recognition experiments. It performs better than the standard speech dictionary in all settings, reporting the best results on ALT in a cappella recordings using n-gram language models. For reproducibility, we share the sentence-level annotations used in testing, providing a new benchmark evaluation set for ALT.

Pronunciation

Benchmark (surveying)

Transcription

Lyrics

10.48550/arxiv.2106.10977

Citations (0)

Osteonecrosis of the knee

Calcified Tissue Research (1968)

Sven Ahlbäck

10.1007/bf02065218

Citations (22)

Disturbances in the defecation mechanism with special reference to intussusception of the rectum (internal procidentia)

Diseases of the Colon & Rectum (1985)

Claes Johansson, Thomas Ihre, Sven Ahlbäck

Anorectal disorders that disturb normal defecation are described, especially intussusception of the rectum (internal procidentia). A review of 190 patients, half of whom were treated operatively and the other half conservatively, is presented. Diagnostic procedures, symptoms, and indications for operations are evaluated. We believe that intussusception of the rectum is a relatively common cause of difficult emptying of the rectum and, when the correct diagnosis is established, operation presents a fair chance for improvement.

Rectal prolapse

Colorectal Surgery

Defecography

Obstructed defecation

10.1007/bf02554307

Citations (87)

Investigating kernel shapes and skip connections for deep learning-based harmonic-percussive separation

arXiv (Cornell University) (2019)

Carlos Lordelo, Emmanouil Benetos, Simon Dixon, Sven Ahlbäck

In this paper we propose an efficient deep learning encoder-decoder network for performing Harmonic-Percussive Source Separation (HPSS). We show that the number of trainable model parameters can be greatly reduced by using a dense arrangement of skip connections between the model layers. We also explore the use of different kernel sizes for the 2D filters of the convolutional layers, with the objective of allowing the network to learn the different time-frequency patterns associated with percussive and harmonic sources more efficiently. Training and evaluation of the separation were carried out using the training and test sets of the MUSDB18 dataset. Results show that the proposed deep network achieves automatic learning of high-level features and maintains HPSS performance at a state-of-the-art level while reducing the number of parameters and the training time.

Kernel (algebra)

Harmonic

Source Separation

Separation (statistics)

Autoencoder

10.48550/arxiv.1905.01899

Citations (0)

Automatic Lyrics Transcription using Dilated Convolutional Neural Networks with Self-Attention

arXiv (Cornell University) (2020)

Emir Demirel, Sven Ahlbäck, Simon Dixon

Speech recognition is a well-developed research field, and current state-of-the-art systems are used in many applications in the software industry; yet, as of today, no similarly robust system exists for recognizing words and sentences from the singing voice. This paper proposes a complete pipeline for this task, commonly referred to as automatic lyrics transcription (ALT). We trained convolutional time-delay neural networks with self-attention on monophonic karaoke recordings, using a sequence classification objective to build the acoustic model. The dataset used in this study, DAMP - Sing! 300x30x2 [1], is filtered to contain only songs with English lyrics. Different language models are tested, including MaxEnt and recurrent neural network based methods, which are trained on the lyrics of English pop songs. An in-depth analysis of the self-attention mechanism is carried out while tuning its context width and the number of attention heads. Using the best settings, our system achieves a notable improvement over the state of the art in ALT and provides a new baseline for the task.

Lyrics

Transcription

10.48550/arxiv.2007.06486

Citations (0)

Pitch-Informed Instrument Assignment using a Deep Convolutional Network with Multiple Kernel Shapes

Zenodo (CERN European Organization for Nuclear Research) (2021)

Carlos Lordelo, Emmanouil Benetos, Simon Dixon, Sven Ahlbäck

Kernel (algebra)

10.5281/zenodo.5625682

Citations (0)

Low Resource Audio-to-Lyrics Alignment From Polyphonic Music Recordings

arXiv (Cornell University) (2021)

Emir Demirel, Sven Ahlbäck, Simon Dixon

Lyrics alignment in long music recordings can be memory-intensive when performed in a single pass. In this study, we present a novel method that performs audio-to-lyrics alignment with a low memory footprint regardless of the duration of the music recording. The proposed system first spots the anchoring words within the audio signal. With respect to these anchors, the recording is then segmented and a second-pass alignment is performed to obtain the word timings. We show that our audio-to-lyrics alignment system performs competitively with the state of the art while requiring far fewer computational resources. In addition, we use our lyrics alignment system to segment the music recordings into sentence-level chunks. Notably, on the segmented recordings, we report the lyrics transcription scores on a number of benchmark test sets. Finally, our experiments highlight the importance of the source separation step for good performance on the transcription and alignment tasks. For reproducibility, we publicly share our code with the research community.

Lyrics

Transcription

Polyphony

Benchmark (surveying)

10.48550/arxiv.2102.09202

Citations (0)

MSTRE-Net: Multistreaming Acoustic Modeling for Automatic Lyrics Transcription

arXiv (Cornell University) (2021)

Emir Demirel, Sven Ahlbäck, Simon Dixon

This paper makes several contributions to automatic lyrics transcription (ALT) research. Our main contribution is a novel variant of the Multistreaming Time-Delay Neural Network (MTDNN) architecture, called MSTRE-Net, which processes the temporal information using multiple streams in parallel with varying resolutions, keeping the network more compact and thus giving faster inference and an improved recognition rate compared with identical TDNN streams. In addition, two novel preprocessing steps prior to training the acoustic model are proposed. First, we suggest using recordings from both monophonic and polyphonic domains when training the acoustic model. Second, we tag monophonic and polyphonic recordings with distinct labels for discriminating non-vocal silence and music instances during alignment. Moreover, we present a new test set that is considerably larger and has higher musical variability than the existing datasets used in the ALT literature, while maintaining the gender balance of the singers. Our best performing model sets the state of the art in lyrics transcription by a large margin. For reproducibility, we publicly share the identifiers to retrieve the data used in this paper.

Lyrics

Polyphony

Transcription

Music Information Retrieval

10.48550/arxiv.2108.02625

Citations (3)