Recent information retrieval (IR) models are pre-trained and instruction-tuned on massive datasets and tasks, enabling them to perform well on a wide range of tasks and potentially generalize to unseen tasks with instructions. However, existing IR benchmarks focus on a limited scope of tasks, making them insufficient for evaluating the latest IR models. In this paper, we propose MAIR (Massive Instructed Retrieval Benchmark), a heterogeneous IR benchmark that includes 126 distinct IR tasks across 6 domains, collected from existing datasets. We benchmark state-of-the-art instruction-tuned text embedding models and re-ranking models. Our experiments reveal that instruction-tuned models generally achieve superior performance compared to non-instruction-tuned models on MAIR. Additionally, our results suggest that current instruction-tuned text embedding models and re-ranking models still lack effectiveness in specific long-tail tasks. MAIR is publicly available at https://github.com/sunnweiwei/Mair.
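To make the evaluation protocol concrete, the following is a minimal sketch of how an instruction-tuned embedding model could be scored on one MAIR-style task with nDCG@10; the `embed` stub, the instruction format, and the linear-gain nDCG variant are assumptions for illustration, not MAIR's official harness.

```python
# Minimal sketch of scoring an instruction-tuned embedding model on one retrieval task.
# `embed` is a stand-in for any text-embedding model; the instruction handling and the
# linear-gain nDCG@10 metric are illustrative assumptions about the evaluation protocol.
import math

def embed(text: str) -> list[float]:
    return [float(len(text)), 1.0]  # stub; replace with a real embedding model

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def ndcg_at_10(query, instruction, docs, qrels):
    # Instruction-tuned retrieval: the task instruction is prepended to the query before encoding.
    q_vec = embed(f"{instruction} {query}")
    ranked = sorted(docs, key=lambda d: cosine(q_vec, embed(d["text"])), reverse=True)[:10]
    dcg = sum(qrels.get(d["id"], 0) / math.log2(i + 2) for i, d in enumerate(ranked))
    ideal = sum(rel / math.log2(i + 2) for i, rel in enumerate(sorted(qrels.values(), reverse=True)[:10]))
    return dcg / ideal if ideal > 0 else 0.0
```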
Evaluating open-domain dialogue systems is challenging for reasons such as the one-to-many problem, i.e., there are many appropriate responses beyond the single golden response. Current automatic evaluation methods still lack consistency with human judgments, while reliable human evaluation can be time- and cost-intensive. To this end, we propose the Reference-Assisted Dialogue Evaluation (RADE) approach under a multi-task learning framework, which leverages a pre-created utterance as a reference, in addition to the gold response, to relieve the one-to-many problem. Specifically, RADE explicitly compares the reference and the candidate response to predict an overall score. Moreover, an auxiliary response generation task enhances prediction via a shared encoder. To support RADE, we extend three datasets with additional human-annotated rated responses beyond the single golden response. Experiments on our three datasets and two existing benchmarks demonstrate the effectiveness of our method, whose Pearson, Spearman, and Kendall correlations with human evaluation outperform state-of-the-art baselines.
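As a rough illustration of the multi-task setup described above, the sketch below pairs a score-prediction head with an auxiliary generation head on one shared encoder; the module names, dimensions, and loss weighting are assumptions, not the authors' released implementation.

```python
# Minimal multi-task sketch of a reference-assisted dialogue evaluator.
# Names (RADEScorer), sizes, and the loss weight alpha are illustrative assumptions.
import torch
import torch.nn as nn

class RADEScorer(nn.Module):
    def __init__(self, vocab_size=32000, d_model=768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=6)  # shared encoder
        self.score_head = nn.Linear(d_model, 1)          # metric task: overall quality score
        self.gen_head = nn.Linear(d_model, vocab_size)   # auxiliary task: response generation

    def forward(self, ctx_ref_cand_ids):
        # Input: dialogue context, reference, and candidate concatenated into one token sequence.
        h = self.encoder(self.embed(ctx_ref_cand_ids))
        score = self.score_head(h.mean(dim=1)).squeeze(-1)  # pooled representation -> scalar score
        gen_logits = self.gen_head(h)                        # per-token logits for the generation loss
        return score, gen_logits

def multitask_loss(score, gen_logits, human_score, target_ids, alpha=0.5):
    # Joint objective: regression toward the human rating plus an auxiliary LM loss
    # that flows back through the shared encoder.
    mse = nn.functional.mse_loss(score, human_score)
    ce = nn.functional.cross_entropy(gen_logits.transpose(1, 2), target_ids)
    return mse + alpha * ce
```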
Many studies have shown that long noncoding RNAs (lncRNAs) are closely related to the stimulation of osteogenic differentiation of adipose-derived stem cells (ADSCs) and the prevention of osteoporosis. The current study aimed to identify novel lncRNAs and to explore the function and molecular mechanism of the LINC00314/miR-129-5p/GRM5 axis in regulating osteogenic differentiation of ADSCs. LncRNA and miRNA sequencing was performed in normal and osteogenic differentiation-induced ADSCs (osteogenic group). Aberrantly expressed lncRNAs and miRNAs were identified with R software, and the relative expression of LINC00314, miR-129-5p, and GRM5 during osteogenic induction was measured by RT-PCR. ADSCs were then transfected with pcDNA3.1-sh-LINC00314 and agomiR-129-5p. Alizarin red staining (ARS) and alkaline phosphatase (ALP) staining were performed to clarify the mechanism of the LINC00314/miR-129-5p/GRM5 axis in regulating osteogenic differentiation of ADSCs. LINC00314 was significantly upregulated in osteogenic-induced ADSCs. LINC00314 and GRM5 mimics increased both early and late markers of osteogenic differentiation, as shown by markedly increased ALP activity and higher calcium deposition, whereas the miR-129-5p mimic had the opposite effects. Luciferase reporter assays showed that LINC00314 directly targets miR-129-5p, and miR-129-5p suppressed GRM5 expression. Moreover, the LINC00314/miR-129-5p/GRM5 regulatory axis activated the Wnt/β-catenin signaling pathway. LINC00314 thus contributes to the osteogenic differentiation of ADSCs, and the LINC00314/miR-129-5p/GRM5 axis may represent a novel mechanism in osteogenesis-related disease.
Large language model agents have demonstrated remarkable advancements across various complex tasks. Recent works focus on optimizing the agent team or employing self-reflection to iteratively solve complex tasks. Since these agents are all based on the same LLM, merely conducting self-evaluation or removing underperforming agents does not substantively enhance their capability. We argue that comprehensive evaluation and accumulating experience from evaluation feedback is an effective approach to improving system performance. In this paper, we propose Reusable Experience Accumulation with 360° Assessment (360°REA), a hierarchical multi-agent framework inspired by corporate organizational practices. The framework employs a novel 360° performance assessment method for fine-grained, multi-perspective performance evaluation. To enhance the agents' ability to address complex tasks, we introduce a dual-level experience pool in which agents accumulate experience through fine-grained assessment. Extensive experiments on complex task datasets demonstrate the effectiveness of 360°REA.
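A minimal sketch of how a hierarchical loop with 360° peer assessment and dual-level experience pools might be wired together is given below; the `call_llm` stub, the prompt wording, and the round structure are assumptions for illustration rather than the paper's code.

```python
# Illustrative sketch of a multi-agent loop with 360-degree peer assessment and
# dual-level (agent-level and team-level) experience pools. All prompts and the
# call_llm stub are assumptions, not the 360°REA implementation.
def call_llm(prompt: str) -> str:
    return "draft/assessment text for: " + prompt[:40]  # replace with a real chat-completion call

def solve_with_360_rea(task: str, agents: list[str], rounds: int = 3) -> str:
    local_pools = {a: [] for a in agents}   # agent-level experience (fine-grained lessons)
    global_pool: list[str] = []             # team-level experience shared by all agents
    answer = ""
    for _ in range(rounds):
        drafts = {}
        for a in agents:
            experience = "\n".join(local_pools[a] + global_pool)
            drafts[a] = call_llm(f"Role: {a}\nPast experience:\n{experience}\nTask: {task}")
        # 360-degree assessment: every agent rates every other agent's draft.
        for a in agents:
            peers = [p for p in agents if p != a]
            feedback = [call_llm(f"As {p}, assess {a}'s draft:\n{drafts[a]}") for p in peers]
            local_pools[a].append("Peer feedback: " + " | ".join(feedback))
        # A leader-style aggregation step produces the round's answer and a team-level lesson.
        answer = call_llm("Aggregate these drafts into one solution:\n" + "\n".join(drafts.values()))
        global_pool.append("Team lesson from this round: " + answer[:200])
    return answer
```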
Objective To investigate the effect of recombinant human erythropoietin (rh-EPO) on nerve regeneration of the adult rat sciatic nerve. Methods Thirty-six healthy male Wistar rats were used in a left sciatic nerve repair model. The rats were divided randomly into two groups of 18 rats each: the EPO group and the control group. After operation, rh-EPO 3 000 U/kg was injected intraperitoneally daily in the EPO group, and normal saline was injected intraperitoneally daily in the control group. At 4 and 8 weeks after operation, the following items were determined: sciatic function index (SFI), biomechanical examination, histological observation, electrophysiological examination, and myelinated fiber density and cross-sectional area. Results At 4 weeks after operation, the SFI of the EPO group and the control group was -65.26 ± 3.42 and -70.83 ± 4.12, respectively; the maximum tensile resistance was (3.86 ± 0.29) N/mm2 and (3.38 ± 0.21) N/mm2; the delayed ratio of motor nerve latency was 2.34 ± 0.23 and 2.78 ± 0.29; and the recovery ratio of wave amplitude was 0.23 ± 0.05 and 0.14 ± 0.03. At 8 weeks after operation, the SFI of the EPO group and the control group was -51.34 ± 2.98 and -57.23 ± 4.86, respectively; the maximum tensile resistance was (4.67 ± 0.36) N/mm2 and (4.13 ± 0.32) N/mm2; the delayed ratio of motor nerve latency was 1.32 ± 0.15 and 1.62 ± 0.21; the recovery ratio of wave amplitude was 0.41 ± 0.09 and 0.26 ± 0.07; the nerve fiber crossing ratio was 0.57 ± 0.05 and 0.38 ± 0.03; and the recovery ratio of the cross-sectional area of myelinated fibers was 0.81 ± 0.06 and 0.58 ± 0.03. All of these measures were significantly better in the EPO group than in the control group (P < 0.05). Conclusion rh-EPO can promote regeneration of the injured nerve and improve the recovery of its function.
Key words: Erythropoietin; Sciatic nerve; Regeneration
The task of related work generation aims to automatically generate a comprehensive survey of related research topics, saving authors time and effort. Existing methods simplify this task by relying on human-annotated references in a large-scale scientific corpus as information sources, which is time- and cost-intensive. To this end, we propose a Unified Reference Retrieval and Related Work Generation Model (UR3WG), which combines reference retrieval and related work generation in a unified framework based on a large language model (LLM). Specifically, UR3WG first leverages the world knowledge of the LLM to extend the abstract and generate the query for the subsequent retrieval stage. Then a lexicon-enhanced dense retrieval is proposed to search for relevant references, in which an importance-aware representation of the lexicon is introduced. We also propose multi-granularity contrastive learning to optimize our retriever. Since this task is not simply a matter of summarizing the main points of the references, the model must also analyze the complex relationships between them and present them logically. We therefore propose an instruction-tuning method that leverages the LLM to generate the related work. Extensive experiments on two widely used datasets demonstrate that our model outperforms state-of-the-art baselines on both generation and retrieval metrics.
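The retrieve-then-generate pipeline can be pictured with the short sketch below; the scoring blend, the helper stubs (`embed`, `llm`), and the prompt wording are assumptions, not UR3WG's actual retriever or prompts.

```python
# Hedged sketch of a retrieve-then-generate pipeline in the spirit of UR3WG.
# The score formula, stubs, and prompts are illustrative assumptions only.
import numpy as np

def llm(prompt: str) -> str:
    return "stub output"  # replace with a real LLM call

def embed(text: str) -> np.ndarray:
    return np.array([float(len(text)), 1.0])  # stand-in for a dense text encoder

def lexicon_enhanced_score(q_vec, d_vec, q_terms, d_term_weights, alpha=0.7):
    """Blend dense similarity with an importance-weighted lexical overlap."""
    dense = float(np.dot(q_vec, d_vec) / (np.linalg.norm(q_vec) * np.linalg.norm(d_vec)))
    lexical = sum(d_term_weights.get(t, 0.0) for t in q_terms)  # importance-aware term matching
    return alpha * dense + (1 - alpha) * lexical

def related_work_pipeline(abstract: str, corpus: list[dict]) -> str:
    # Stage 1: use the LLM's world knowledge to expand the abstract into a retrieval query.
    query = llm(f"Write a search query for papers related to this abstract:\n{abstract}")
    q_vec, q_terms = embed(query), query.lower().split()
    # Stage 2: lexicon-enhanced dense retrieval over the candidate references.
    ranked = sorted(
        corpus,
        key=lambda d: lexicon_enhanced_score(q_vec, embed(d["text"]), q_terms, d["term_weights"]),
        reverse=True,
    )[:10]
    # Stage 3: instruction-style generation conditioned on the retrieved references.
    refs = "\n\n".join(d["text"] for d in ranked)
    return llm(f"Using these references, write a related work section for:\n{abstract}\n\n{refs}")
```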
The title compound was synthesized in 68% yield by reaction of a Vilsmeier reagent with 4-tert-butylpropiophenone, which was obtained by reaction of propionyl chloride with tert-butylbenzene, itself prepared by reaction of benzene with tert-butyl alcohol.
Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extending their capability. Although some works employ open-source LLMs for the tool-learning task, most of them are trained in a controlled environment in which LLMs only learn to execute human-provided tools. However, selecting the proper tools from a large toolset is also a crucial ability for a tool-learning model deployed in real-world applications. Existing methods usually apply self-instruction directly to train the model, which ignores differences in tool complexity. In this paper, we propose Confucius, a novel tool-learning framework that trains LLMs to use complicated tools in real-world scenarios. It contains two main phases: (1) we first propose a multi-stage learning method that teaches the LLM to use various tools following an easy-to-difficult curriculum; (2) we then propose Iterative Self-instruct from Introspective Feedback (ISIF) to dynamically construct the dataset and improve the model's ability to use complicated tools. Extensive experiments in both controlled and real-world settings demonstrate the superiority of our tool-learning framework over both tuning-free (e.g., ChatGPT, Claude) and tuning-based baselines (e.g., GPT4Tools) in the real-world application scenario.
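To illustrate the ISIF phase, the sketch below runs a generate-critique-revise loop and retrains on the growing dataset; the `llm` and `finetune` stubs and the prompt wording are placeholders assumed for illustration, not the Confucius implementation.

```python
# Illustrative sketch of an Iterative Self-instruct from Introspective Feedback (ISIF) loop.
# The llm/finetune stubs and all prompts are assumptions for illustration only.
def llm(prompt: str) -> str:
    return "stub output"  # replace with a real model call

def finetune(dataset: list[str]) -> None:
    pass  # replace with actual supervised fine-tuning on the tool-use instances

def isif(tools: list[str], seed_dataset: list[str], iterations: int = 3) -> list[str]:
    dataset = list(seed_dataset)
    for _ in range(iterations):
        new_instances = []
        for tool in tools:
            # 1. Self-instruct: synthesize a tool-use training instance for this tool.
            instance = llm(f"Write an example of solving a user request with the tool: {tool}")
            # 2. Introspection: the model critiques its own instance
            #    (wrong tool choice, malformed arguments, missing steps, ...).
            critique = llm(f"Critique this tool-use example and list any problems:\n{instance}")
            # 3. Revision: repair the instance according to the critique before keeping it.
            new_instances.append(llm(f"Revise the example to fix these problems:\n{critique}\n{instance}"))
        dataset.extend(new_instances)   # the dataset grows toward harder tools over iterations
        finetune(dataset)
    return dataset
```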