Large language models (LLMs) have shown impressive capabilities across various natural language tasks. However, evaluating their alignment with human preferences remains a challenge. To this end, we propose a comprehensive human evaluation framework to assess LLMs' proficiency in following instructions on diverse real-world tasks. We construct a hierarchical task tree encompassing 7 major areas, over 200 categories, and over 800 tasks, covering diverse capabilities such as question answering, reasoning, multi-turn dialogue, and text generation, so that LLMs can be evaluated comprehensively and in depth. We also design detailed evaluation standards and processes to facilitate consistent, unbiased judgments from human evaluators. A test set of over 3,000 instances is released, spanning different difficulty levels and knowledge domains. Our work provides a standardized methodology to evaluate human alignment in LLMs for both English and Chinese. We also analyze the feasibility of automating parts of the evaluation with a strong LLM (GPT-4). Our framework supports a thorough assessment of LLMs as they are integrated into real-world applications. We have publicly released the task tree, the TencentLLMEval dataset, and the evaluation methodology, which have proven effective in assessing the performance of Tencent Hunyuan LLMs. By doing so, we aim to facilitate the benchmarking of advances in the development of safe and human-aligned LLMs.
With the rising consumption of oil resources, major oil companies around the world have increasingly engaged in offshore oil exploration and development, and offshore oil resources account for an increasing proportion of the total. Offshore oil engineering projects are capital intensive, and developing offshore oil fields is difficult, especially in a period of low oil prices. A comprehensive evaluation model is therefore needed to assess economic benefits and give operators and investors the information required to make sound decisions. This study proposes a realistic, integrated evaluation model for offshore oil development based on actual historical project data. The model incorporates modules from the underwater system to the platform system and processes from oil reservoir extraction to oil, gas and water treatment. Uncertain parameters in the evaluation process are handled by sensitivity analysis and Monte Carlo simulation. The proposed model is applied to a typical offshore oil development project in Bohai Bay, China. The results reveal that the recovery factor and oil price have the greatest impact on the economic benefits. In the deterministic analysis, the breakeven oil price of the project is 40.59 USD/bbl. When the uncertainty of project parameters is taken into account, the probability that the net present value (NPV) is positive increases with the oil price. When the oil price is higher than 70 USD/bbl, the probability of NPV > 0 is as high as 97.39%, even with uncertain project parameters.
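The probability of NPV > 0 under uncertain parameters can be illustrated with a minimal Monte Carlo sketch. Everything in it is an assumption made for illustration and is not taken from the study: the toy cash-flow model, the triangular distributions, the parameter ranges, and the function names `npv`, `project_npv` and `prob_positive_npv` are all hypothetical.

```python
import random


def npv(cash_flows, discount_rate):
    """Net present value of yearly cash flows; year 0 is undiscounted."""
    return sum(cf / (1.0 + discount_rate) ** t for t, cf in enumerate(cash_flows))


def project_npv(oil_price, recovery_factor, capex, opex_per_bbl,
                oil_in_place=100e6, years=10, discount_rate=0.10):
    """NPV (USD) of a toy project that produces equal volumes each year.

    All default values are illustrative assumptions, not project data.
    """
    yearly_bbl = oil_in_place * recovery_factor / years
    yearly_cash = yearly_bbl * (oil_price - opex_per_bbl)
    return npv([-capex] + [yearly_cash] * years, discount_rate)


def prob_positive_npv(oil_price, n_trials=5000, seed=0):
    """Estimate P(NPV > 0) at a fixed oil price by sampling uncertain inputs."""
    rng = random.Random(seed)
    positive = 0
    for _ in range(n_trials):
        # random.triangular takes (low, high, mode); ranges are hypothetical.
        recovery_factor = rng.triangular(0.20, 0.40, 0.30)
        capex = rng.triangular(0.6e9, 1.0e9, 0.8e9)
        opex_per_bbl = rng.triangular(10.0, 20.0, 15.0)
        if project_npv(oil_price, recovery_factor, capex, opex_per_bbl) > 0:
            positive += 1
    return positive / n_trials
```

Calling `prob_positive_npv(price)` for a range of prices gives a probability curve of the kind the abstract describes. Because the same seed is reused for every price, each trial's NPV rises with the price, so the estimated probability never decreases as the price increases.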
Detecting a small number of suspicious network events in a large amount of network traffic data is a very challenging task. We extract time series features from the network log data and use models such as LightGBM and stacked CNN-LSTM deep neural networks to predict whether the investigated alerts are suspicious. We apply feature alignment and covariate shift adaptation to overcome the covariate shift between training data and test data. In the IEEE Big Data 2019 Cup: Suspicious Network Event Recognition Competition, our model placed first on the public board and fourth on the final board.
Digital images of pipeline welds are an important basis for the reliability management of pipeline welds. However, the error rate of manual interpretation is high. To increase the accuracy of defect identification in digital images of pipeline welds, we applied several image processing methods (e.g. multiple edge detection, detection channel and threshold segmentation) to the defects in these images. A defect characteristic database for digital images of pipeline welds was then constructed, including grayscale difference, equivalent area (S/C), circularity, entropy, correlation and other parameters. Furthermore, an SVM-based multi-classifier model was established, which made it possible to classify and evaluate the defects in the digital images. Finally, automatic defect identification software for digital images of pipeline welds was developed and verified on site. The following research results were obtained. First, after image processing, the edge detection results obtained by Canny and other algorithms are satisfactory when there is no noise. When noise is present, however, pseudo-edges emerge in the detection results. In this case, an automatic threshold selection method should be used for edge detection so that a reasonable threshold is obtained. Second, the defect characteristic database contains 14 parameters, including shape characteristics, lamination characteristics and image length in pixels. Third, with the SVM classification model, the shape characteristics of each type of defect can be clarified and the defect types can be identified, such as crack, slag inclusion, air hole, incomplete penetration, lack of fusion and strip defects. Field application led to the following results. First, the automatic defect identification technology is applicable to quality identification and evaluation of various defects in pipeline welds. Second, its identification accuracy is higher than 90%. Third, the technology enables automatic defect identification and evaluation of digital images of pipeline welds. In conclusion, these research results help to ensure the safe operation of pipelines.
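Two of the steps mentioned above, automatic threshold selection and a shape feature such as circularity, can be sketched briefly. The abstract does not name the threshold selection method or define its features, so this sketch makes two assumptions: Otsu's method stands in as one common automatic threshold, and circularity is taken as 4πA/P², a common definition. The function names are hypothetical.

```python
import math


def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    `pixels` is a flat list of integer gray levels in [0, levels).
    Pixels <= t form one class and pixels > t form the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0  # pixel count and gray-level sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # lower class still empty
        if w1 == 0:
            break  # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def circularity(area, perimeter):
    """Shape feature 4*pi*A/P^2: 1.0 for a circle, smaller for elongated shapes."""
    return 4.0 * math.pi * area / perimeter ** 2
```

In a pipeline of this kind, the threshold would separate candidate defect pixels from the background, and features such as circularity would then be computed per segmented region and passed to the classifier.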
Underground hydrogen storage represents an innovative approach to energy storage. To ensure the safe operation of underground hydrogen storage strings, a computational fluid dynamics (CFD) method was used to build an erosion assessment model for high-velocity conditions. The study investigated the erosion and abrasion behavior of these storage strings under high-speed gas flow. It further examined the effects of gas velocity, particle size, pipe material, and pipe wall corrosion defects on flow patterns and erosion wear rates along the string. The results revealed several trends. The maximum pressure in the flow field increased with fluid velocity and decreased with increasing pipe diameter and particle size. P110 pipe material exhibited a higher maximum pressure than N80. Centrifugal force caused the pressure to rise from the inner side to the outer side of the string. On the outer wall of the curved pipe section, frequent high-angle collisions led to elevated erosion wear rates over time. The highest erosion rate was observed in curved pipes with three corrosion defects, which is attributed to backflow in the erosion pits.
Data mining is a statistical approach that extracts useful information from extremely large sets of raw data. Data mining methods are therefore under active development and are commonly used in artificial intelligence fields such as image processing and robotics. There have also been recent applications of data mining in the electric power industry, such as classification, clustering and forecasting. In this work, clustering techniques are adopted to identify the phase connectivity in power systems. Phase identification is studied using smart meter data obtained from end-users on a low-voltage (LV) feeder. Firstly, the LV network is modeled using the simulation tool OpenDSS. Secondly, the phase identification algorithm for the LV network is developed in Matlab using K-means clustering as well as Gaussian Mixture Model (GMM) clustering. Finally, the IEEE European Low Voltage Test Feeder is used to verify the proposed method. Results indicate that both methods achieve the goal of phase identification, namely to accurately identify the active loads and the phase to which each load is connected.
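The clustering idea can be sketched as follows: meters on the same phase see similar voltage time series, so grouping the series into three clusters recovers the phase groups. This is a minimal illustration in plain Python, not the paper's Matlab implementation. The synthetic voltage data, the noise levels, the farthest-point initialization and all function names are assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Plain K-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def make_synthetic_meters(n_per_phase=10, n_steps=96, seed=1):
    """Hypothetical data: one voltage profile per phase plus per-meter noise."""
    rng = random.Random(seed)
    readings, true_phase = [], []
    for phase in range(3):
        profile = [230.0 + rng.gauss(0.0, 2.0) for _ in range(n_steps)]
        for _ in range(n_per_phase):
            readings.append([v + rng.gauss(0.0, 0.3) for v in profile])
            true_phase.append(phase)
    return readings, true_phase


readings, true_phase = make_synthetic_meters()
labels = kmeans(readings, k=3)
```

Cluster labels are arbitrary, so the result identifies which meters share a phase but not which cluster is phase A, B or C. Mapping clusters to named phases needs a reference measurement, for example at the substation.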
Gas reservoir-type underground gas storage (UGS) plays a critical role in China’s natural gas reserves and peak shaving, serving as an essential component of the energy security system. Its unique cyclic injection and production operations not only stabilize the natural gas supply but also impose stringent requirements on the safety and integrity of geological structures, wellbores, and surface facilities. Weaknesses in current practices can cause accidents, directly threatening energy security. Therefore, continuously improving integrity management is the key to mitigating energy risks. Currently, the integrity management of gas storage faces challenges such as an abundance of standards and the complexity of management elements, which affect both operational safety and management efficiency. To address these issues, this study systematically analyzes domestic and international standards related to gas storage and establishes a technical system based on “three-in-one” integrity management (geological structure, wellbore, and surface facilities). Key elements of integrity management are identified and optimized, and recommended execution standards for critical factors are proposed to provide a theoretical basis and decision-making support for the safe operation of gas storage. This study not only offers a reference for optimizing and implementing integrity management standards but also has significant practical implications for enhancing energy security and reducing energy risks, ensuring the smooth execution of China’s natural gas reserve and peak shaving initiatives.