OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety
Chuang Liu, Linhao Yu, Jiaxuan Li, Renren Jin, Yufei Huang, Ling Shi, Junhui Zhang, Xinmeng Ji, Tingting Cui, Tao Liu, Jinwang Song, Hongying Zan, Li Sun, Deyi Xiong
Related papers: 10
Abstract:
The rapid development of Chinese large language models (LLMs) poses major challenges for efficient LLM evaluation. While current initiatives have introduced new benchmarks or evaluation platforms for assessing Chinese LLMs, many of these focus primarily on capabilities and usually overlook potential alignment and safety issues. To address this gap, we introduce OpenEval, an evaluation testbed that benchmarks Chinese LLMs across capability, alignment and safety. For capability assessment, we include 12 benchmark datasets to evaluate Chinese LLMs along 4 sub-dimensions: NLP tasks, disciplinary knowledge, commonsense reasoning and mathematical reasoning. For alignment assessment, OpenEval contains 7 datasets that examine the bias, offensiveness and illegality in the outputs yielded by Chinese LLMs. To evaluate safety, especially anticipated risks (e.g., power-seeking, self-awareness) of advanced LLMs, we include 6 datasets. In addition to these benchmarks, we have implemented a phased public evaluation and benchmark update strategy to ensure that OpenEval keeps pace with the development of Chinese LLMs, or is even able to provide cutting-edge benchmark datasets to guide that development. In our first public evaluation, we tested a range of Chinese LLMs, spanning from 7B to 72B parameters and including both open-source and proprietary models. Evaluation results indicate that while Chinese LLMs have shown impressive performance in certain tasks, more attention should be directed towards broader aspects such as commonsense reasoning, alignment, and safety.
Keywords: Benchmarking
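The three-dimension layout described in the abstract can be pictured as a small data structure. The following is only an illustrative sketch: the class name `Dimension`, the constant `OPENEVAL_DIMENSIONS`, and the helper `total_datasets` are hypothetical and are not part of OpenEval itself. Only the dataset counts and the dimension names come from the abstract, and the safety entries are the abstract's examples rather than a complete list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """One evaluation dimension and the number of datasets it covers (hypothetical type)."""
    name: str
    num_datasets: int
    aspects: tuple[str, ...] = ()


# Counts and names are taken from the abstract; the structure is an assumption.
OPENEVAL_DIMENSIONS = (
    Dimension(
        "capability",
        12,
        ("NLP tasks", "disciplinary knowledge",
         "commonsense reasoning", "mathematical reasoning"),
    ),
    Dimension("alignment", 7, ("bias", "offensiveness", "illegality")),
    # The abstract names these two risks only as examples ("e.g.").
    Dimension("safety", 6, ("power-seeking", "self-awareness")),
)


def total_datasets(dimensions: tuple[Dimension, ...]) -> int:
    """Sum the dataset counts across all dimensions."""
    return sum(d.num_datasets for d in dimensions)


if __name__ == "__main__":
    for dim in OPENEVAL_DIMENSIONS:
        print(f"{dim.name}: {dim.num_datasets} datasets")
    print("total:", total_datasets(OPENEVAL_DIMENSIONS))
```

Summing the counts stated in the abstract gives 12 + 7 + 6 = 25 datasets in total.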
This book will teach you how benchmarking is used to improve performance, set quality objectives, and identify and adapt to best processes. Most of us routinely use benchmarking (to measure, match, compare, evaluate), all to establish a standard of what we believe is best. However, the critical elements of our customers' expectations and requirements are often missing. This book adds those elements into the process and demonstrates six other types of benchmarking, while helping you decide which method will suit your needs. Coverage includes:
- To give a historical perspective on benchmarking
- To provide reasons for benchmarking
- To discuss the benchmarking process
- To give tips for successful benchmarking
Benchmarking
Best practice
Citations (24)
Provides an overview of the results achieved so far by the Open University's research into the processes of benchmarking, and reports on a survey showing levels of benchmarking activity and its distribution amongst different industry sectors; presents their reasons for carrying out benchmarking, and summarizes the perceived benefits. Relates benchmarking to other quality management practices, and suggests there is a need to place more emphasis on the processes people are undertaking during a benchmarking exercise.
Benchmarking
Best practice
Citations (15)
Benchmarking, the process of measuring a company's current business operations and comparing them to those of best-practices companies, has emerged in recent years as an important tool in total quality management (TQM) programs across the United States. A 1991 survey found that benchmarking is increasing and that this trend is expected to continue. It has been estimated that three quarters of the Fortune 500 are engaged in the practice. Still, most companies have not instituted benchmarking and do not know how to go about it. This article explores the benchmarking process and its benefits and outlines the steps that companies can take if they wish to establish benchmarking in their own firms. The specific experiences of three companies that have recently embraced benchmarking (Life Technologies Inc., ICI Films, and Caterpillar Inc.) are discussed. It is hoped that their enthusiastic commitment will illustrate the value of benchmarking to the many firms that have yet to adopt the process.
Benchmarking
Best practice
Total Quality Management
World class
Citations (49)
There is a growing expectation for staff to participate in benchmarking activities. If benchmarking projects are to be successful, managers and clinicians need to be aware of the steps involved. In this article, we identify key aspects of benchmarking and consider how clinicians and managers can respond to and meet contemporary requirements for the development of sound benchmarking relationships. Practicalities and issues that must be considered by benchmarking teams are also outlined. Before commencing a benchmarking project, ground rules and benchmarking agreements must be developed and ratified. An understandable benchmarking framework is required: one that is sufficiently robust for clinicians to engage in benchmarking activities and convince others that benchmarking has taken place. There is a need to build the capacity of clinicians in relation to benchmarking.
Benchmarking
Best practice
Citations (2)
A review of the benchmarking literature shows that, as the field has developed, different types of benchmarking and different benchmarking models have emerged. There are universal models, but also models developed specifically for particular benchmarking types. The models vary in the number of phases and steps involved, their application, etc. The research focuses on one of the most popular benchmarking types: best practice benchmarking. Best practice benchmarking describes the comparison of performance data obtained by studying similar processes or activities and identifying, adapting, and implementing the practices that produced the best results. The research intends to propose a best practice benchmarking model after reviewing existing benchmarking models in the literature. Implementing a successful benchmarking project requires more than adherence to step-wise models. Factors that have an impact on the adoption of best practice benchmarking models will be highlighted, because many companies are involved in benchmarking, but adoption of best practices is not as high as might be expected. Key words: benchmarking model, best practice benchmarking, benchmarking cycle.
Benchmarking
Best practice
Citations (8)
Fierce competition, globalization and the development of new information and communication technologies have forced organizations to continuously search for and adopt new configurations and practices in order to survive. Benchmarking is a practical approach that has proved very effective in helping individual companies evaluate their competitive position vis-a-vis other best performers. This paper provides an overview of developments and trends in benchmarking and includes references to various types of benchmarking. It attempts to provide a simple benchmarking process model with which an individual organization can carry out an organized search for best practices. The paper lists the benefits of benchmarking and discusses selected benchmarking success stories. The paper concludes that benchmarking is a powerful approach for bringing improvements to any organizational area and transforming it into a world-class organization.
Benchmarking
Best practice
World class
Citations (0)
Contents: Linking Strategic Planning with Benchmarking; Understanding the Essentials of Process Benchmarking; Applying Benchmarking Results for Maximum Utility; Doing an Internal Benchmarking Study; Conducting a Competitive Benchmarking Study; Performing a Functional Benchmarking Study; Developing a Generic Benchmarking Study; Expanding Benchmarking for Broader Applications; Creating a Benchmarking Capability; Chapter Notes; Appendices; Index.
Benchmarking
Best practice
Citations (298)
This article analyzes the following issues: the benchmarking concept, types of benchmarking, and benchmarking methodology. Benchmarking is used to improve performance by understanding the methods and practices required to achieve world-class performance levels. Benchmarking comes in many different types: performance benchmarking, process benchmarking, functional benchmarking, internal benchmarking, external benchmarking, and international benchmarking. Benchmarking proceeds in phases: planning, data collection, analysis, and adaptation and implementation of good practices.
Benchmarking
World class
Citations (2)
Argues that benchmarking is much more fundamental to strategic thinking than other tools relied upon thus far by senior managers in their decision‐making processes. Argues that for benchmarking to be effective it has to be closely linked to total quality management programmes in place. Strongly recommends that the focal point has to be on understanding process and behaviour (i.e. the means/enablers) before asking questions about results (i.e. the outcomes). Unless such a discipline is established, results can only be considered as absolutes and as such are not useful in telling us why differences take place. Benchmarking works inwards by helping organizations set desired goals and objectives and set about achieving them through continuous improvement activities. Proposes an implementation strategy for benchmarking based on 16 steps. In addition highlights some factors which are considered to be critical if the practice of benchmarking is to lead to any results. Provides some guidelines on ensuring that benchmarking remains a powerful strategic tool.
Benchmarking
Best practice
Total Quality Management
Citations (109)
Comparing ourselves: using benchmarking techniques to measure performance between academic libraries
Report of the LIRG seminar: The Effective academic library held on Tuesday 12th June 2001 at Staffordshire University
We can learn a lot from others. Benchmarking provides a structural framework for making comparisons with other organisations. The techniques enable us to learn from one another by looking at why there are differences in performance outcomes between organisations undertaking similar functions. This seminar concentrated on:
- Importance of benchmarking / benchmarking techniques
- Establishment of benchmarking consortia
- Utilising statistics and performance indicators
- Practical examples of how academic libraries have evaluated and improved their services through benchmarking
Benchmarking
Academic library
Citations (1)