SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023)

Xinyuan Lu Liangming Pan Qian Liu Preslav Nakov Min‐Yen Kan

Abstract:

Current scientific fact-checking benchmarks exhibit several shortcomings, such as biases arising from crowd-sourced claims and an over-reliance on text-based evidence. We present SCITAB, a challenging evaluation dataset consisting of 1.2K expert-verified scientific claims that 1) originate from authentic scientific publications and 2) require compositional reasoning for verification. The claims are paired with evidence-containing scientific tables annotated with labels. Through extensive evaluations, we demonstrate that SCITAB poses a significant challenge to state-of-the-art models, including table-based pretraining models and large language models. All models except GPT-4 achieved performance barely above random guessing. Popular prompting techniques, such as Chain-of-Thought, do not yield substantial performance gains on SCITAB. Our analysis uncovers several unique challenges posed by SCITAB, including table grounding, claim ambiguity, and compositional reasoning. Our code and data are publicly available at https://github.com/XinyuanLu00/SciTab.

Keywords:

Benchmark (surveying)

Table (database)

Scientific reasoning

Topics:

Topic Modeling

Data Quality and Management

Natural Language Processing Techniques

10.18653/v1/2023.emnlp-main.483

On the Treatment of Intentional Ambiguity and Unintentional Ambiguity in English

Journal of Chongqing Institute of Technology (2005)

Tang Chang-ping

This paper divides ambiguity in English into intentional and unintentional ambiguity in terms of practical effect, systematically analyzes their causes, and proposes avoiding ambiguity in practical communication while making full use of intentional ambiguity to achieve communicative goals.

Citations (0)

Finding and Using Ambiguity to Search for Innovation Opportunities

Design Management Journal (2018)

Linus Tan Thomas Kvan

This article shows the importance and value of ambiguity to reveal opportunities hidden in problems and the manner in which ambiguity is removed from applications of design thinking. It describes the value of introducing, sustaining, and using ambiguity and explains the different types of ambiguity. It follows up by describing the events when a designer encounters ambiguity. This article proposes that an understanding of ambiguity is needed to harness its capabilities in finding innovative opportunities. To do so, design practitioners should consider (1) identifying the type of ambiguity needed to expand the scope of opportunity exploration and (2) becoming aware of and managing one's ability to work with ambiguity. Finally, it identifies the lack of literature on the impact of independent and collective experience on using ambiguity in design.

Scope (computer science)

Value (mathematics)

10.1111/dmj.12045

Citations (1)

Study on the Pragmatic Value of English Intentional Ambiguity

Liu Don

Ambiguity is a common linguistic phenomenon that has attracted the attention of experts and scholars. In terms of communicative effect, ambiguity can be divided into intentional ambiguity and unintentional ambiguity. Unintentional ambiguity is a form of pragmatic failure that interferes with normal communication and needs to be disambiguated. Intentional ambiguity, however, is a communication strategy used for particular communicative purposes; it need not be disambiguated and has a certain pragmatic value. The paper focuses on the positive use of intentional ambiguity in advertising, literary works, daily life, political speeches, etc.

Phenomenon

Value (mathematics)

Citations (0)

The Phenomenon of Ambiguity in Communication

Journal of Shaanxi Institute of Technology (2004)

Xiaoyan Li

This paper analyzes the phenomenon of ambiguity from the perspective of communication effects. It divides ambiguity into intentional ambiguity and unintentional ambiguity, gives examples drawn from communication teaching, and puts forward methods to avoid and eliminate ambiguity so as to help students clear obstacles to communication.

Phenomenon

Citations (0)

Types of Ambiguity

Palgrave Macmillan UK eBooks (2006)

David Wilkinson

10.1057/9780230597891_3

Citations (3)

Support Structure Performance Benchmark

Light Engineering für die Praxis (2023)

Katharina Bartsch

Benchmark (surveying)

10.1007/978-3-031-22956-5_7

Citations (0)

Exploring disk performance benchmarks

Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE (2017)

Kazimierz Krosman J. Sosnowski

In this paper we discuss the problem of evaluating disk performance with benchmarks. In particular, we concentrate on assessing benchmark properties. For this purpose we have developed a benchmark management platform that allows us to enhance the benchmark execution process with the monitoring of performance counters. The developed methodology and tool do not require additional benchmark instrumentation and have a negligible impact on benchmark execution. The usefulness of this approach is illustrated with experimental results covering a representative set of benchmark programs.

Benchmark (surveying)

Instrumentation

Benchmarking

10.1117/12.2280711

Citations (0)

Decision‐making for others: Ambiguity attitudes

Managerial and Decision Economics (2024)

Antonio Carlos Mercer Ângela Cristiane Santos Póvoa Wesley Pech

We investigated decision‐making for others in ambiguous settings. In an online survey, subjects were asked to make decisions for themselves and for other people. In Experiment 1, ambiguity was conveyed in numerical ranges. In Experiment 2, we used verbal probability expressions to convey uncertainty. Decisions encompassed three degrees of ambiguity (low, moderate, and high). Consistent with previous findings in the literature, our results showed no significant differences between self‐other decision‐making under ambiguity. We build on the existing literature on ambiguity attitudes, emphasizing the use of verbal probability expressions to measure ambiguity, and provide novel evidence on decision‐making for others.

10.1002/mde.4321

Citations (0)

Might ambiguity exist when none seems to exist?

Edward Elgar Publishing eBooks (2023)

Mina Mahmoudi Mark Pingle Rattaphon Wuthisatian

It is standard to assume people making an uncertain choice experience no ambiguity when they know the probabilities that actually apply to the possible outcomes. However, bounded rationality, among other possible factors, may effectively create ambiguity in such cases. This chapter examines data from an experiment that allows us to compare decision behaviour under total ambiguity with that under 'no ambiguity'. The experimental evidence indicates people experience ambiguity even when none seems to be present, and the ambiguity biases decision behaviour in a systematic way. In particular, it makes prospects with an intermediate variance level particularly attractive, and it makes high variance prospects more attractive than they otherwise would be.

Bounded rationality

10.4337/9781839107948.00037

Citations (0)

Theoretical Analysis of the Benchmark for Choosing Manipulative Instruments of Monetary Policies

Yuejiang Academic Journal (2009)

Zhu En-tao

When implementing monetary policy, the Central Bank uses manipulative instruments to reach its policy goals, so it should choose those instruments according to certain benchmarks in order to reach its goals more effectively. The benchmarks for choosing instruments of monetary policy include theoretical benchmarks and empirical benchmarks. The theoretical benchmarks consist of external and internal benchmarks. Internal benchmarks are the most important for daily operation and include the initiative benchmark, the fine-tuning benchmark, the signal-functioning benchmark, the timeliness benchmark, and the operability benchmark.

Benchmark (surveying)

Citations (0)