A new synthetic dataset (training/validation) for end-to-end Relation Extraction of relationships between Organisms and Natural Products. The dataset was generated with Vicuna-13b-v1.5, which is derived from LLaMA 2; like the model, the generated synthetic data are subject to the license of the model used for generation (see the original LLaMA 2 license). The dataset was created from the top-1000 (per biological kingdom) LOTUS literature references extracted with the GME-sampler, using the same protocol as described in the article. It contains 10,405 items in the training set and 547 items in the validation set.
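As an illustration only, a training/validation split like the one above could be loaded with the Hugging Face `datasets` library as sketched below; the file names and any record fields are hypothetical placeholders, not the dataset's actual layout.

```python
# Minimal loading sketch for a JSONL train/validation split using the Hugging Face
# `datasets` library. File names below are hypothetical placeholders.
from datasets import load_dataset

dataset = load_dataset(
    "json",
    data_files={"train": "train.jsonl", "validation": "valid.jsonl"},
)

print(len(dataset["train"]))       # expected: 10,405 items
print(len(dataset["validation"]))  # expected: 547 items
```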
Thanks to their linguistic capabilities, LLMs offer an opportunity to bridge the gap between informal mathematics and formal languages through autoformalization. However, it is still unclear how well LLMs generalize to sophisticated and naturally occurring mathematical statements. To address this gap, we investigate the task of autoformalizing real-world mathematical definitions -- a critical component of mathematical discourse. Specifically, we introduce two novel resources for autoformalization, collecting definitions from Wikipedia (Def_Wiki) and arXiv papers (Def_ArXiv). We then systematically evaluate a range of LLMs, analyzing their ability to formalize definitions into Isabelle/HOL. Furthermore, we investigate strategies to enhance LLMs' performance, including refinement through external feedback from Proof Assistants, and formal definition grounding, where we guide LLMs through relevant contextual elements from formal mathematical libraries. Our findings reveal that definitions present a greater challenge than existing benchmarks such as miniF2F. In particular, we found that LLMs still struggle with self-correction and with alignment to relevant mathematical libraries. At the same time, structured refinement methods and definition grounding strategies yield notable improvements of up to 16% in self-correction capabilities and 43% in the reduction of undefined errors, highlighting promising directions for enhancing LLM-based autoformalization in real-world scenarios.
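As a rough sketch of the refinement strategy described above (generate a formalization, check it with the proof assistant, and feed the error messages back to the model), the loop below is illustrative only; `formalize` and `check_with_proof_assistant` are hypothetical stubs, not the paper's actual pipeline or interface.

```python
# Illustrative refinement loop: a candidate formalization is checked by the proof
# assistant and repaired using the returned error messages. Both helpers below are
# hypothetical stubs, not the paper's actual interface.
from typing import Optional, Tuple

def formalize(definition: str, feedback: Optional[str] = None) -> str:
    """Stub for an LLM call mapping an informal definition to Isabelle/HOL."""
    raise NotImplementedError

def check_with_proof_assistant(candidate: str) -> Tuple[bool, str]:
    """Stub that submits the candidate to Isabelle/HOL and returns (success, errors)."""
    raise NotImplementedError

def refine(definition: str, max_rounds: int = 3) -> str:
    candidate = formalize(definition)
    for _ in range(max_rounds):
        ok, errors = check_with_proof_assistant(candidate)
        if ok:
            break
        # External feedback: the proof assistant's errors are fed back to the model.
        candidate = formalize(definition, feedback=errors)
    return candidate
```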
Existing accounts of explanation emphasise the role of prior experience in solving new problems. However, most contemporary models for multi-hop textual inference construct explanations considering each test case in isolation. This paradigm is known to suffer from semantic drift, which causes the construction of spurious explanations leading to wrong conclusions. In contrast, we investigate an abductive framework for explainable multi-hop inference that adopts the retrieve-reuse-revise paradigm largely studied in case-based reasoning. Specifically, we present a novel framework that addresses and explains unseen inference problems by retrieving and adapting prior natural language explanations from similar training examples. We empirically evaluate the case-based abductive framework on downstream commonsense and scientific reasoning tasks. Our experiments demonstrate that the proposed framework can be effectively integrated with sparse and dense pre-trained encoding mechanisms or downstream transformers, achieving strong performance when compared to existing explainable approaches. Moreover, we study the impact of the retrieve-reuse-revise paradigm on explainability and semantic drift, showing that it boosts the quality of the constructed explanations, resulting in improved downstream inference performance.
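For illustration, the retrieve-reuse portion of this paradigm could look like the sketch below, which retrieves training cases similar to an unseen problem and reuses their explanations as a starting point for revision; the questions, explanations, and TF-IDF retrieval are hypothetical simplifications, not the paper's actual encoding mechanisms.

```python
# Illustrative retrieve-reuse sketch: retrieve training cases similar to an unseen
# problem and reuse their explanations. Data and retrieval are hypothetical simplifications.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

train_questions = [
    "why does a metal spoon feel colder than a wooden spoon",
    "what causes the seasons on earth",
]
train_explanations = [
    ["metal is a thermal conductor", "conductors transfer heat quickly"],
    ["the earth's axis is tilted", "the tilt changes the angle of sunlight"],
]

vectorizer = TfidfVectorizer()
train_matrix = vectorizer.fit_transform(train_questions)

def retrieve_and_reuse(test_question: str, k: int = 1) -> list[str]:
    # Retrieve: rank training cases by similarity to the new problem.
    sims = cosine_similarity(vectorizer.transform([test_question]), train_matrix)[0]
    top_k = sims.argsort()[::-1][:k]
    # Reuse: collect the explanations of the most similar solved cases;
    # a revise step would then adapt these facts to the new problem.
    return [fact for i in top_k for fact in train_explanations[i]]

print(retrieve_and_reuse("why are there seasons on earth"))
```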
Complex systems, such as Artificial Intelligence (AI) systems, comprise many interrelated components. To represent these systems, conveying the relations between components is essential. Perhaps because of this, diagrams, as icons of relation, are a prevalent medium for signifying complex systems. Diagrams used to communicate AI system architectures are currently extremely varied. The diversity in diagrammatic conceptual modelling choices provides an opportunity to gain insight into which aspects are being prioritised for communication. In this philosophical exploration of AI systems diagrams, we integrate theories of conceptual models, communication theory, and semiotics. We discuss the consequences of standardised diagrammatic languages for AI systems, concluding that while we expect engineers implementing systems to benefit from standards, researchers would benefit more from guidelines.
Background: Inferring over and extracting information from Large Language Models (LLMs) trained on a large corpus of scientific literature can potentially drive a new era in biomedical research, reducing the barriers to accessing existing medical evidence. This work examines the potential of LLMs for dialoguing with biomedical background knowledge, using the context of antibiotic discovery as an exemplar motivational scenario. Biomedical discovery from natural products entails understanding the relational evidence between an organism (e.g. a fungus such as Albifimbria verrucaria), an associated chemical (Verrucarin A) and its antibiotic properties (presence of antibiotic activity). Results: This work provides a systematic assessment of the ability of LLMs to encode and express these relations, verifying the fluency, prompt alignment, semantic coherence, factual knowledge and specificity of generated responses. The systematic analysis is applied to nine state-of-the-art models, from models specialised on biomedical scientific corpora to general models such as ChatGPT and GPT-4, in two prompting-based tasks: chemical compound definition generation and chemical compound-fungus relation determination. Results show that while recent models have improved in fluency, factual accuracy is still low and models are biased towards over-represented entities. The ability of LLMs to serve as biomedical knowledge bases is questioned, and the need for additional systematic evaluation frameworks is highlighted. The best performing model, GPT-4, produced factual definitions for 70% of chemical compounds and factual relations to fungi for 43.6%, whereas the best open-source model, BioGPT-large, achieved 30% of the compounds and 30% of the relations for the best-performing prompt. Conclusions: The results show that while LLMs are currently not fit for purpose as biomedical factual knowledge bases, there is a promising emerging property in the direction of factuality as models become domain-specialised and scale up in size and level of human feedback.
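For illustration, the two prompting-based tasks could take a form similar to the templates below; the wording is hypothetical and not the study's actual prompts, while the compound and fungus names are the examples given in the abstract.

```python
# Illustrative prompt templates for the two tasks: definition generation and
# compound-fungus relation determination. The wording is hypothetical.

def definition_prompt(compound: str) -> str:
    return f"Give a short scientific definition of the chemical compound {compound}."

def relation_prompt(compound: str, fungus: str) -> str:
    return (
        f"Is the chemical compound {compound} associated with the fungus {fungus}? "
        "Answer yes or no."
    )

print(definition_prompt("Verrucarin A"))
print(relation_prompt("Verrucarin A", "Albifimbria verrucaria"))
```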
Probing strategies have been shown to detect the presence of various linguistic features in large language models; in particular, semantic features intermediate to the "natural logic" fragment of the Natural Language Inference (NLI) task. In the case of natural logic, the relation between the intermediate features and the entailment label is explicitly known: as such, this provides a ripe setting for interventional studies on the NLI models' representations, allowing for stronger causal conjectures and a deeper critical analysis of interventional probing methods. In this work, we carry out new and existing representation-level interventions to investigate the effect of these semantic features on NLI classification: we perform amnesic probing (which removes features as directed by learned linear probes) and introduce the mnestic probing variation (which forgets all dimensions except the probe-selected ones). Furthermore, we delve into the limitations of these methods and outline some pitfalls that have been obscuring the effectiveness of interventional probing studies.
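As a simplified sketch of the contrast between the two interventions (the actual procedures iterate this step over many trained probes), amnesic probing projects representations onto the nullspace of the probe's directions, while mnestic probing keeps only the probe-selected directions; the data and shapes below are hypothetical.

```python
# Simplified contrast between the two interventions on a matrix of hidden
# representations H (n_examples x d). W holds the weight vectors of a trained
# linear probe (k x d). Data and shapes are hypothetical; the actual procedures
# iterate this projection step over many probes.
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(8, 16))   # hidden representations
W = rng.normal(size=(2, 16))   # probe-selected directions

# Orthonormal basis of the probe's row space.
Q, _ = np.linalg.qr(W.T)       # shape (d, k)
P_keep = Q @ Q.T               # projection onto probe-selected directions

H_mnestic = H @ P_keep                  # forget all dimensions except the probe-selected ones
H_amnesic = H @ (np.eye(16) - P_keep)   # remove the probe-selected directions
```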
This paper presents an abductive framework for multi-hop and interpretable textual inference. The reasoning process is guided by the notions of unification power and plausibility of an explanation, computed through the interaction of two major architectural components: (a) An analogical reasoning model that ranks explanatory facts by leveraging unification patterns in a corpus of explanations; (b) An abductive reasoning model that performs a search for the best explanation, realised via conceptual abstraction and subsequent unification. We demonstrate that the Step-wise Conceptual Unification can be effective for unsupervised question answering, and as an explanation extractor in combination with state-of-the-art Transformers. An empirical evaluation on the Worldtree corpus and the ARC Challenge resulted in the following conclusions: (1) The question answering model outperforms competitive neural and multi-hop baselines without requiring any explicit training on answer prediction; (2) When used as an explanation extractor, the proposed model significantly improves the performance of Transformers, leading to state-of-the-art results on the Worldtree corpus; (3) Analogical and abductive reasoning are highly complementary for achieving sound explanatory inference, a feature that demonstrates the impact of the unification patterns on performance and interpretability.
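For illustration, the interaction of the two components could be composed as in the sketch below, where candidate facts are ranked by combining an analogical unification score with an abductive plausibility score; both scoring functions are hypothetical stubs, not the paper's actual models.

```python
# Illustrative composition of the two components: candidate facts are ranked by
# combining an analogical unification score with an abductive plausibility score.
# Both scoring functions are hypothetical stubs.

def unification_score(fact: str, question: str) -> float:
    """Stub: how strongly the fact recurs in explanations of similar questions."""
    raise NotImplementedError

def plausibility_score(fact: str, question: str, answer: str) -> float:
    """Stub: how well the fact supports the candidate answer."""
    raise NotImplementedError

def best_explanation(facts: list[str], question: str, answer: str, k: int = 5) -> list[str]:
    ranked = sorted(
        facts,
        key=lambda f: unification_score(f, question) * plausibility_score(f, question, answer),
        reverse=True,
    )
    return ranked[:k]
```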
Introduction: Patients with persistent chest discomfort or other symptoms suggestive of ischaemia and ST segment elevation in two contiguous leads on electrocardiography should be promptly managed towards revascularization, and emergent angiography for percutaneous intervention within two hours is the preferred reperfusion strategy. Purpose: Our aim is to show the importance of differential diagnosis in a patient with an initial diagnosis of ST segment elevation myocardial infarction (STEMI). Clinical case: We present the case of a 67-year-old woman with a past medical history of dyslipidemia and polymyalgia rheumatica, treated with rosuvastatin 10 mg id and prednisolone 5 mg id. The patient was admitted to the emergency department complaining of chest pain of 3 hours' duration that started after a period of nausea and vomiting. Physical examination showed slight tachypnea at 22 breaths per minute, blood pressure 93/40 mmHg, heart rate 110 beats per minute, oxygen saturation of 90% in room air, heart sounds with a systolic murmur II/VI and lung crackles in the inferior lobes, with no peripheral oedema. Electrocardiography showed sinus rhythm and ST segment elevation in DI, DII and V2-6. The patient was treated with aspirin 300 mg, ticagrelor 180 mg, furosemide 40 mg and oxygen therapy, and was scheduled for emergent coronary angiography. This procedure revealed no significant coronary lesions, and ventriculography identified apical ballooning, leading to a diagnosis of takotsubo cardiomyopathy. The clinical condition then started to deteriorate, and echocardiography identified akinetic apical and midventricular segments and hyperkinetic basal segments with systolic anterior motion of the mitral valve, significant mitral regurgitation and left ventricular outflow tract obstruction (LVOTO) with an intraventricular gradient above 60 mmHg. Adequate hemodynamic monitoring and heart rate control allowed a substantial clinical improvement. Two days later, cardiac magnetic resonance was performed, which confirmed the diagnosis and identified an apical thrombus. The patient was later discharged in stable condition on oral anticoagulation with a vitamin K antagonist. Discussion and Conclusion: Takotsubo cardiomyopathy is a unique cardiac syndrome characterized by transient systolic dysfunction which often mimics acute coronary syndromes (ACS). After exclusion of an ACS, echocardiography is of primordial importance in the assessment of these patients. Left heart failure with pulmonary oedema, mitral regurgitation, LVOTO and thrombus formation were all complications present in this clinical case and established the indication for appropriate therapeutic measures.