    SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables
    Citations: 2 | References: 43 | Related papers: 10
    Abstract:
    Current scientific fact-checking benchmarks exhibit several shortcomings, such as biases arising from crowd-sourced claims and an over-reliance on text-based evidence. We present SCITAB, a challenging evaluation dataset consisting of 1.2K expert-verified scientific claims that 1) originate from authentic scientific publications and 2) require compositional reasoning for verification. The claims are paired with evidence-containing scientific tables annotated with labels. Through extensive evaluations, we demonstrate that SCITAB poses a significant challenge to state-of-the-art models, including table-based pretraining models and large language models. All models except GPT-4 achieved performance barely above random guessing. Popular prompting techniques, such as Chain-of-Thought, do not yield substantial performance gains on SCITAB. Our analysis uncovers several unique challenges posed by SCITAB, including table grounding, claim ambiguity, and compositional reasoning. Our code and data are publicly available at https://github.com/XinyuanLu00/SciTab.
    Keywords:
    Benchmark (surveying)
    Table (database)
    Scientific reasoning
    This paper divides ambiguity in English into intentional and unintentional ambiguity in terms of practical effect, systematically analyzes their causes, and proposes avoiding ambiguity in practical communication and making full use of intentional ambiguity to achieve communicative goals.
    Citations (0)
    This article shows the importance and value of ambiguity to reveal opportunities hidden in problems and the manner in which ambiguity is removed from applications of design thinking. It describes the value of introducing, sustaining, and using ambiguity and explains the different types of ambiguity. It follows up by describing the events when a designer encounters ambiguity. This article proposes that an understanding of ambiguity is needed to harness its capabilities in finding innovative opportunities. To do so, design practitioners should consider (1) identifying the type of ambiguity needed to expand the scope of opportunity exploration and (2) becoming aware of and managing one's ability to work with ambiguity. Finally, it identifies the lack of literature on the impact of independent and collective experience on using ambiguity in design.
    Scope (computer science)
    Value (mathematics)
    Citations (1)
    Ambiguity is a common linguistic phenomenon and has attracted the attention of experts and scholars. In terms of communicative effect, ambiguity can be divided into intentional ambiguity and unintentional ambiguity. Unintentional ambiguity is a form of pragmatic failure, which disrupts normal communication and needs to be disambiguated. Intentional ambiguity, however, is a communication strategy used for particular communicative purposes; it does not need to be disambiguated and has a certain pragmatic value. The paper focuses on the positive use of intentional ambiguity in advertising, literary works, daily life, political speeches, etc.
    Phenomenon
    Value (mathematics)
    Citations (0)
    This paper attempts to analyze the phenomenon of ambiguity from the perspective of communication effects. It divides ambiguity into intentional and unintentional ambiguity, gives examples drawn from communication teaching, and puts forward methods to avoid and eliminate ambiguity so as to help students clear obstacles to communication.
    Phenomenon
    Citations (0)
    In this paper we discuss the problem of evaluating disc performance with benchmarks. In particular, we concentrate on assessing benchmark properties. For this purpose we have developed a benchmark management platform that allows us to enhance the benchmark execution process by monitoring performance counters. The developed methodology and tool do not need additional benchmark instrumentation and have a negligible impact on benchmark execution. The usefulness of this approach is illustrated with experimental results covering a representative set of benchmark programs.
    Benchmark (surveying)
    Instrumentation
    Benchmarking
    Citations (0)
    We investigated decision‐making for others in ambiguous settings. In an online survey, subjects were asked to make decisions for themselves and for other people. In Experiment 1, ambiguity was conveyed in numerical ranges. In Experiment 2, we used verbal probability expressions to convey uncertainty. Decisions encompassed three degrees of ambiguity (low, moderate, and high). Consistent with previous findings in the literature, our results showed no significant differences between self and other decision‐making under ambiguity. We build on the existing literature on ambiguity attitudes, emphasizing the use of verbal probability expressions to measure ambiguity, and provide novel evidence on decision‐making for others.
    Citations (0)
    It is standard to assume people making an uncertain choice experience no ambiguity when they know the probabilities that actually apply to the possible outcomes. However, bounded rationality, among other possible factors, may effectively create ambiguity in such cases. This chapter examines data from an experiment that allows us to compare decision behaviour under total ambiguity with that under 'no ambiguity'. The experimental evidence indicates people experience ambiguity even when none seems to be present, and the ambiguity biases decision behaviour in a systematic way. In particular, it makes prospects with an intermediate variance level particularly attractive, and it makes high variance prospects more attractive than they otherwise would be.
    Bounded rationality
    The Central Bank implements monetary policy by using policy instruments to reach its policy goals, so it should choose those instruments according to certain benchmarks in order to reach its goals more effectively. The benchmarks for choosing instruments of monetary policy include theoretical benchmarks and empirical benchmarks. The theoretical benchmarks consist of external and internal benchmarks. Internal benchmarks are the most important for daily operation; they include the initiative benchmark, fine-tuning benchmark, signal-functioning benchmark, timeliness benchmark, and operability benchmark.
    Benchmark (surveying)
    Citations (0)