The advent of Large Language Models (LLMs) has shown their potential to improve relevance and provide direct answers in web search. However, challenges arise in validating the reliability of generated results and the credibility of contributing sources, owing to the limitations of traditional information retrieval algorithms and the LLM hallucination problem. Aiming to create a "PageRank" for the LLM era, we strive to transform LLMs into relevant, responsible, and trustworthy searchers. We propose a novel generative retrieval framework that leverages the knowledge of LLMs to foster a direct link between queries and online sources. This framework consists of three core modules: a Generator, a Validator, and an Optimizer, which generate trustworthy online sources, verify source reliability, and refine unreliable sources, respectively. Extensive experiments and evaluations highlight our method's superior relevance, responsibility, and trustworthiness against various SOTA methods.
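A minimal sketch of how such a generate-validate-optimize loop could be wired together is shown below; the `llm` callable, prompts, and module interfaces are hypothetical placeholders, not the paper's actual API.

```python
# Hypothetical sketch of a generate-validate-optimize retrieval loop.
# The llm() helper and all prompts are illustrative placeholders.

def generate_sources(llm, query, k=5):
    """Ask the LLM to propose candidate online sources for a query."""
    prompt = f"List {k} authoritative web sources that answer: {query}"
    return [s for s in llm(prompt).splitlines() if s.strip()]

def validate_source(llm, query, source):
    """Ask the LLM to judge whether a source reliably supports the query."""
    prompt = f"Does the source {source} reliably answer '{query}'? Reply YES or NO."
    return llm(prompt).strip().upper().startswith("YES")

def optimize_source(llm, query, source):
    """Ask the LLM to replace an unreliable source with a better one."""
    prompt = f"{source} is unreliable for '{query}'. Suggest a more trustworthy source."
    return llm(prompt).strip()

def retrieve(llm, query, max_rounds=3):
    """Generate candidate sources, then iteratively validate and refine them."""
    sources = generate_sources(llm, query)
    for _ in range(max_rounds):
        sources = [s if validate_source(llm, query, s)
                   else optimize_source(llm, query, s) for s in sources]
    return sources
```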
Recent neural language models have taken a significant step forward in producing remarkably controllable, fluent, and grammatical text. Although studies have found that crowd-sourcing workers cannot reliably distinguish AI-generated text from human-written text, AI-generated text still contains errors that are even subtler and harder to spot. We primarily focus on the scenario in which a scientific AI writing assistant is deeply involved. First, we construct a feature description framework, grounded in human evaluation, to distinguish AI-generated text from human-written text at the levels of syntax, semantics, and pragmatics. Then we use the features from the proposed framework, i.e., writing style, coherence, consistency, and argument logic, to analyze the two types of content. Finally, we adopt several publicly available AI-generated scientific text detection models to investigate the gap between AI-generated and human-written scientific text. The results suggest that while AI has the potential to generate scientific content that is as accurate as human-written content, there is still a gap in depth and overall quality, and AI-generated scientific content is more likely to contain factual errors. We also find a "writing style" gap between AI-generated and human-written scientific text. Based on these analyses, we summarize a series of model-agnostic and distribution-agnostic features for detection tasks in other domains. The findings in this paper help guide the optimization of AI models to produce high-quality content and address related ethical and security concerns.
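As a rough illustration of feature-based detection, the sketch below trains a simple classifier on a few surface proxies for writing style; these features and the scikit-learn setup are stand-ins chosen for illustration, not the framework's actual feature set.

```python
# Illustrative feature-based detector: the features below are crude proxies for
# style/coherence signals, not the paper's actual feature description framework.
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_features(text: str) -> np.ndarray:
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.split()
    avg_sent_len = len(words) / max(len(sentences), 1)                # style proxy
    type_token_ratio = len(set(w.lower() for w in words)) / max(len(words), 1)
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    return np.array([avg_sent_len, type_token_ratio, avg_word_len])

def train_detector(texts, labels):
    """texts: list[str]; labels: 1 = human-written, 0 = AI-generated."""
    X = np.vstack([extract_features(t) for t in texts])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```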
Deep learning (DL) has shown superior performance in many areas, making the quality assurance of DL-based software particularly important. Adversarial examples are generated by deliberately adding subtle perturbations to input samples and can easily attack less reliable DL models. Most existing works use only a single metric to evaluate the generated adversarial examples, such as attack success rate or structural similarity, so they can neither rule out extreme testing situations nor provide multifaceted evaluation results. This paper presents MetaA, a multi-dimensional evaluation framework for the testing ability of adversarial examples in deep learning, where evaluating testing ability means measuring testing performance in order to make improvements. Specifically, MetaA validates generated adversarial examples comprehensively along two horizontal and five vertical dimensions. We design MetaA according to the definition of adversarial examples and the issue raised in [1] of how to enrich evaluation dimensions rather than merely quantify the improvement of DL software. We conduct several vertical and horizontal analyses and comparative experiments to evaluate the reliability and effectiveness of MetaA. The experimental results show that MetaA avoids one-sided speculation and reconciles different indicators when they give inconsistent results. The detailed and comprehensive analysis of evaluation results can further guide the optimization of adversarial examples and the quality assurance of DL-based software.
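To make the idea of multi-dimensional evaluation concrete, the sketch below reports several metrics for a batch of adversarial examples side by side instead of a single score; the specific metrics and the classifier's `predict` interface are illustrative assumptions, not MetaA's actual horizontal and vertical dimensions.

```python
# Illustrative multi-metric evaluation of adversarial examples; attack success
# rate, mean L2 perturbation, and SSIM are stand-in dimensions for illustration.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def evaluate_adversarial(model, clean, adv, labels):
    """clean, adv: float arrays of shape (n, H, W); labels: true class labels."""
    preds = np.asarray(model.predict(adv))
    report = {
        "attack_success_rate": float(np.mean(preds != np.asarray(labels))),
        "mean_l2_perturbation": float(np.mean(
            np.linalg.norm((adv - clean).reshape(len(adv), -1), axis=1))),
        "mean_ssim": float(np.mean(
            [ssim(c, a, data_range=c.max() - c.min()) for c, a in zip(clean, adv)])),
    }
    return report  # several dimensions reported together rather than one number
```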
Large-scale language models (LLMs) have demonstrated impressive performance, but their deployment is challenging due to their significant memory usage. This issue can be alleviated through quantization. In this paper, we identify that the difficulty of quantizing activations in LLMs arises from varying ranges across channels, rather than solely from the presence of outliers. To address this challenge, we introduce RPTQ, a reorder-based quantization method. By rearranging the channels and quantizing them in clusters, RPTQ effectively mitigates the impact of range differences between channels. To minimize the overhead of the reorder operation, we fuse it into the layer norm operation and the weights of linear layers. In our experiments, RPTQ achieved a significant breakthrough by utilizing 3-bit activations in LLMs for the first time, resulting in a substantial reduction in memory usage. For instance, quantizing OPT-175B can reduce memory consumption by up to 80%.
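A minimal sketch of the per-cluster idea follows: channels are grouped by their activation ranges and each group gets its own scale and zero-point. The use of KMeans on per-channel (min, max) pairs is an assumption for illustration, and the sketch omits fusing the reorder into layer norm and linear weights.

```python
# Minimal sketch of reorder-based activation quantization: cluster channels by
# their (min, max) ranges, then quantize each cluster with its own scale/zero-point.
import numpy as np
from sklearn.cluster import KMeans

def reorder_quantize(acts: np.ndarray, n_clusters: int = 4, n_bits: int = 3):
    """acts: activations of shape (tokens, channels). Returns dequantized acts."""
    ranges = np.stack([acts.min(axis=0), acts.max(axis=0)], axis=1)  # per-channel (min, max)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(ranges)
    out = np.empty_like(acts, dtype=np.float64)
    qmax = 2 ** n_bits - 1
    for c in range(n_clusters):
        cols = labels == c
        lo, hi = acts[:, cols].min(), acts[:, cols].max()
        scale = (hi - lo) / qmax if hi > lo else 1.0
        q = np.clip(np.round((acts[:, cols] - lo) / scale), 0, qmax)
        out[:, cols] = q * scale + lo          # dequantize to inspect quantization error
    return out, labels                         # labels: cluster assignment used for reordering
```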
This research is mainly concerned with establishing the vocabulary learning needs and goals of Engineering students from Southeast Asia studying at British universities. The research was motivated by the need to enhance the reading skills of these students. Subtechnical and technical vocabulary are the focus of this investigation. The research is based on data derived from a 536,051-word corpus of text from recommended Engineering textbooks. The relative frequency and range of lexis within the corpus were found to be good criteria for identifying subtechnical and technical vocabulary. The students proved to have a better receptive knowledge of subtechnical than of technical vocabulary. The research suggests that there is a need for collaborative work between ESP teachers and subject teachers to help the students with technical vocabulary.
The thesis is divided into nine chapters. Chapter One reviews the literature relevant to the research: it clarifies various definitions and concepts, describes the research approach, and provides a framework for the thesis. Chapter Two investigates my subjects' overall vocabulary knowledge. Chapter Three introduces preliminary data that contrasts with received opinions in ESP regarding technical and subtechnical vocabulary. For further investigation of these two types of vocabulary, Chapter Four describes the data on which the empirical studies are based. Chapter Five analyses the data. Chapter Six presents the empirical studies and concludes that the students' receptive knowledge of subtechnical vocabulary is better than their knowledge of technical vocabulary. Chapter Seven examines the reasons why technical vocabulary was problematic. Chapter Eight summarises the research findings and proposes pedagogical implications for the teaching of subtechnical and technical vocabulary to the specified group of learners. Chapter Nine draws conclusions, discusses the limitations of the research, and makes recommendations for future research.
Retrieval-Augmented Generation (RAG) is applied to mitigate the hallucination problems and real-time constraints of large language models, but it also introduces vulnerabilities to retrieval corruption attacks. Existing research mainly explores the unreliability of RAG in white-box settings and closed-domain QA tasks. In this paper, we aim to reveal the vulnerabilities of RAG models when faced with black-box attacks for opinion manipulation, and we explore the impact of such attacks on user cognition and decision-making, providing new insights for enhancing the reliability and security of RAG models. We manipulate the ranking results of the retrieval model in RAG with instructions and use these results as data to train a surrogate model. By applying adversarial retrieval attack methods to the surrogate model, we then realize black-box transfer attacks on RAG. Experiments conducted on opinion datasets across multiple topics show that the proposed attack strategy can significantly alter the opinion polarity of the content generated by RAG. This demonstrates the model's vulnerability and, more importantly, reveals the potential negative impact on user cognition and decision-making, making it easier to mislead users into accepting incorrect or biased information.
Chlorogenic acid (CGA), a dietary natural phenolic acid, has been widely reported to regulate glucose and lipid metabolism. However, the protective effects and underlying mechanisms of CGA against glucagon-induced hepatic glucose production remain largely uncharacterized. Herein, we investigated the efficacy of CGA on hepatic gluconeogenesis both in vivo and in vitro. The elevated endogenous glucose production induced by infusion of glucagon or pyruvate was lowered in mice administered CGA. Furthermore, chronic CGA treatment ameliorated the accumulation of glucose and ceramide in high-fat diet (HFD)-fed mice. CGA also attenuated the HFD-induced inflammatory response. The protective effect of CGA on glucose production was further confirmed in primary mouse hepatocytes, where it inhibited ceramide accumulation and p38 MAPK expression. Moreover, CGA administration in HFD-fed mice restored the decreased phosphorylation of Akt in the liver, resulting in the inhibition of FoxO1 activation and, ultimately, of hepatic gluconeogenesis. However, these protective effects were significantly attenuated by the addition of C2 ceramide. These results suggest that CGA inhibits ceramide accumulation to restrain the hepatic glucagon response.