Advancing representation learning in specialized fields like medicine remains challenging due to the scarcity of expert annotations for text and images. To tackle this issue, we present a novel two-stage framework designed to extract high-quality factual statements from free-text radiology reports in order to improve the representations of text encoders and, consequently, their performance on various downstream tasks. In the first stage, we propose a \textit{Fact Extractor} that leverages large language models (LLMs) to identify factual statements from well-curated domain-specific datasets. In the second stage, we introduce a \textit{Fact Encoder} (CXRFE) based on a BERT model fine-tuned with objective functions designed to improve its representations using the extracted factual data. Our framework also includes a new embedding-based metric (CXRFEScore) for evaluating chest X-ray text generation systems, leveraging both stages of our approach. Extensive evaluations show that our fact extractor and encoder outperform current state-of-the-art methods in tasks such as sentence ranking, natural language inference, and label extraction from radiology reports. Additionally, our metric proves to be more robust and effective than existing metrics commonly used in the radiology report generation literature. The code of this project is available at \url{https://github.com/PabloMessina/CXR-Fact-Encoder}.
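To make the metric idea concrete, below is a minimal sketch of how an embedding-based fact-matching score in the spirit of CXRFEScore could be computed: facts are extracted from the generated and reference reports, embedded, and matched via soft precision and recall. The `extract_facts` and `encode` callables are hypothetical placeholders for the paper's Fact Extractor and Fact Encoder; the actual implementation lives in the linked repository.

```python
# Minimal sketch of an embedding-based fact-matching score in the spirit of
# CXRFEScore. `extract_facts` and `encode` are hypothetical placeholders for
# the paper's Fact Extractor and Fact Encoder (see the linked repository for
# the actual implementation).
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def fact_score(gen_report, ref_report, extract_facts, encode) -> float:
    gen = [encode(f) for f in extract_facts(gen_report)]
    ref = [encode(f) for f in extract_facts(ref_report)]
    if not gen or not ref:
        return 0.0
    # Soft precision: each generated fact matched to its closest reference fact.
    p = float(np.mean([max(cosine(g, r) for r in ref) for g in gen]))
    # Soft recall: each reference fact matched to its closest generated fact.
    r = float(np.mean([max(cosine(rf, g) for g in gen) for rf in ref]))
    return 2 * p * r / (p + r + 1e-8)
```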
The 27th ACM International Conference on Hypertext and Social Media will be held in Halifax, Canada, from July 10 to 13, 2016. It will be co-located with UMAP 2016, the 24th International Conference on User Modeling, Adaptation and Personalization. This newsletter article briefly introduces the conference and its venue.
The COVID-19 pandemic has underlined the need for reliable information for clinical decision-making and public health policies. As such, evidence-based medicine (EBM) is essential in identifying and evaluating scientific documents pertinent to novel diseases, and the accurate classification of biomedical text is integral to this process. Given this context, we introduce a comprehensive, curated dataset composed of COVID-19-related documents. This dataset includes 20,047 labeled documents that were meticulously classified into five distinct categories: systematic reviews (SR), primary study randomized controlled trials (PS-RCT), primary study non-randomized controlled trials (PS-NRCT), broad synthesis (BS), and excluded (EXC). The documents, labeled by collaborators from the Epistemonikos Foundation, incorporate information such as document type, title, abstract, and metadata, including PubMed ID, authors, journal, and publication date. Uniquely, this dataset has been curated by the Epistemonikos Foundation and is not readily accessible through conventional web-scraping methods, thereby attesting to its distinctive value in this field of research. In addition, the dataset includes a vast evidence repository comprising 427,870 non-COVID-19 documents, also categorized into SR, PS-RCT, PS-NRCT, BS, and EXC. This additional collection can serve as a valuable benchmark for subsequent research. The comprehensive nature of this open-access dataset and its accompanying resources is poised to significantly advance evidence-based medicine and facilitate further research in the domain.
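As a sketch of how the dataset might be used, the baseline below trains a simple TF-IDF and logistic-regression classifier over the five categories. The file name and column names (`title`, `abstract`, `label`) are assumptions about the dataset's layout, not its actual schema.

```python
# Hypothetical baseline for the five-way document classification task
# (SR, PS-RCT, PS-NRCT, BS, EXC). File and column names are assumed.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("covid19_documents.csv")  # assumed file name
texts = (df["title"].fillna("") + " " + df["abstract"].fillna("")).tolist()
X_train, X_test, y_train, y_test = train_test_split(
    texts, df["label"], test_size=0.2, stratify=df["label"], random_state=42)

vectorizer = TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)
print(classification_report(y_test, clf.predict(vectorizer.transform(X_test))))
```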
Background: Radiologists face an increasing demand for image-based diagnosis from patients every year, and computer-aided diagnosis systems seem like a promising way to alleviate their workload. Many authors have proposed deep learning models to generate reports from medical images. However, they mainly focus on improving Natural Language Processing (NLP) metrics, which may not be suitable for measuring clinical correctness. Some metrics based on clinical correctness have been proposed, such as CheXpert and MIRQI, but no analysis has been carried out to assess their robustness and compare them. Furthermore, there is only a preliminary understanding of the relationship between NLP metrics and clinical correctness metrics. Methods: In this work, we challenge the state-of-the-art models and evaluations in the task of report generation from chest X-rays. We provide further evidence that traditional NLP metrics are insufficient to evaluate this task. We also conduct behavioral tests to analyze and compare the robustness of the clinical correctness metrics MIRQI and CheXpert. Moreover, analyses independent of the text generation method helped us understand the performance of NLP metrics under different scenarios. Results: We show that NLP metrics cannot discriminate between sentences with opposite clinical meanings. We also show that MIRQI is not robust enough to be used as a clinical correctness metric, unlike CheXpert. Finally, we show that CIDEr-D performs slightly better than the other NLP metrics at detecting the presence of abnormalities, but it fails, as do BLEU and ROUGE, in all other cases (absence, uncertainty, and no mention).
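The failure mode described in the results is easy to reproduce with a toy example: an n-gram metric such as BLEU assigns a high score to a candidate sentence whose clinical meaning is the opposite of the reference, simply because the two share most of their n-grams. The sentences below are invented for illustration.

```python
# Toy illustration of why n-gram metrics can miss clinical meaning: a candidate
# that negates the reference still shares most of its n-grams with it.
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

smooth = SmoothingFunction().method1
ref = "there is no evidence of pneumothorax or pleural effusion".split()
opposite = "there is evidence of pneumothorax or pleural effusion".split()
unrelated = "the cardiac silhouette is within normal limits".split()

# The clinically opposite sentence scores far higher than the unrelated one.
print(sentence_bleu([ref], opposite, smoothing_function=smooth))
print(sentence_bleu([ref], unrelated, smoothing_function=smooth))
```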
Recent research has unveiled the importance of online social networks for improving the quality of recommender systems and encouraged the research community to investigate better ways of exploiting social information for recommendations. While most of the research has focused on enhancing a traditional source of data (e.g., ratings, implicit feedback, or tags) with some type of social information, little is known about how different sources of social data can be combined with other types of information relevant for recommendation. To contribute to this sparse field of research, in this paper we exploit users' interactions along three dimensions of relevance (social, transactional, and location) to assess their performance in a barely studied domain: recommending items to people in an online marketplace environment. To that end, we defined sets of user similarity measures for each dimension of relevance and studied them in isolation and in combination via hybrid recommender approaches, to assess which one provides the best recommendation performance. Interestingly, in our experiments conducted on a rich dataset collected from SecondLife, a popular online virtual world, we found that recommenders relying on similarity measures obtained from the social network yielded better results than those inferred directly from the marketplace data.
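As an illustration of the combination step, the sketch below mixes per-dimension user similarities into a single hybrid similarity and uses it in a simple neighborhood-based score. The weighting scheme and function names are assumptions for exposition, not the paper's exact formulation.

```python
# Illustrative hybrid of per-dimension user similarities (social,
# transactional, location); weights and names are assumptions.
from typing import Callable, Dict, Set

Sim = Callable[[str, str], float]

def hybrid_similarity(u: str, v: str, sims: Dict[str, Sim],
                      weights: Dict[str, float]) -> float:
    # Weighted linear combination of the per-dimension similarities.
    return sum(weights[d] * sims[d](u, v) for d in sims)

def score(u: str, item: str, users: Set[str],
          purchases: Dict[str, Set[str]],
          sims: Dict[str, Sim], weights: Dict[str, float]) -> float:
    # Neighborhood score: neighbors who bought the item, weighted by
    # their hybrid similarity to the target user u.
    return sum(hybrid_similarity(u, v, sims, weights)
               for v in users if v != u and item in purchases.get(v, set()))
```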
Every year, physicians face an increasing demand for image-based diagnosis from patients, a problem that can be addressed with recent artificial intelligence methods. In this context, we survey works in the area of automatic report generation from medical images, with emphasis on methods using deep neural networks, with respect to: (1) Datasets, (2) Architecture Design, (3) Explainability, and (4) Evaluation Metrics. Our survey identifies interesting developments, but also remaining challenges. Among them, the current evaluation of generated reports is especially weak, since it mostly relies on traditional Natural Language Processing (NLP) metrics, which do not accurately capture medical correctness.
In this paper we present a system that recommends online comments written by teachers (suggestions from teachers to their peers) about their experience conducting educational activities in an online educational community called Kelluwen. In Kelluwen, teachers build, use, and share collaborative didactic designs whose educational activities are based on Social Web tools. To generate the recommendations, we propose a hybrid peer-based recommender system that combines collaborative and content-based filtering, enriched with contextual information. The results of a quantitative evaluation and a survey support the utility of the recommendation method for improving teachers' work.
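A minimal sketch of the hybrid scoring idea follows: a convex combination of collaborative and content-based scores, re-weighted by contextual relevance. The parameter names and the multiplicative use of context are illustrative assumptions, not Kelluwen's actual formulas.

```python
# Illustrative hybrid score: convex mix of collaborative and content-based
# scores, modulated by contextual relevance. All inputs assumed in [0, 1].
def hybrid_score(collab: float, content: float, context: float,
                 alpha: float = 0.5) -> float:
    base = alpha * collab + (1.0 - alpha) * content
    return base * context  # contextual information re-weights the base score
```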
With the growth of collected data and available computational power, modern recommender systems continually face new challenges. While complex models are developed in academia, industry practice seems to focus on relatively simple techniques that can handle the magnitude of the data and the need to distribute the computation. The Workshop on Large-Scale Recommender Systems (LSRS) is a meeting place for industry and academia to discuss the current and future challenges of applied large-scale recommender systems.
Deep learning, one of the fastest-growing branches of artificial intelligence, has become one of the most relevant research and development areas of recent years, especially since 2012, when a neural network surpassed the most advanced image classification techniques of the time. This spectacular development has not been alien to the world of the arts, as recent advances in generative networks have made possible the artificial creation of high-quality content such as images, movies, or music. We believe that these novel generative models pose a great challenge to our current understanding of computational creativity. If a robot can now create music that an expert cannot distinguish from music composed by a human, create novel musical entities that were not known at training time, or exhibit conceptual leaps, does that mean the machine is creative? We believe that the emergence of these generative models clearly signals that much more research needs to be done in this area. We would like to contribute to this debate with two case studies of our own: TimbreNet, a variational autoencoder network trained to generate audio-based musical chords, and StyleGAN Pianorolls, a generative adversarial network capable of creating short musical excerpts, despite the fact that it was trained with images and not musical data. We discuss and assess these generative models in terms of their creativity, show that they are in practice capable of learning musical concepts that are not obvious from the training data, and hypothesize that, based on our current understanding of creativity in robots and machines, these deep models can in fact be considered creative.