This paper presents a three-phase approach to finding the Target Language (TL) correspondence of a fragment of a Source Language (SL) sentence in a lexicalized EBMT system. To be practical, it exploits surface information as much as possible instead of relying on parsers. Experiments show that, although not perfect, the approach is robust and effective. The three phases are as follows. First, align the sentence pair at the word level to provide anchors for phrase alignment. Second, based on the aligned anchors, find all TL fragments that may correspond to an SL fragment. Finally, using a score function, select the best TL fragment as the correspondence of the SL fragment.
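The second and third phases can be illustrated with a minimal sketch. It assumes that phase one has already produced word-level anchors as (SL index, TL index) pairs. The function names, the consistency rule for candidate spans, and the length-based score are all hypothetical choices made for illustration; the abstract does not specify the actual score function.

```python
from typing import List, Optional, Tuple

Anchor = Tuple[int, int]  # (SL word index, TL word index) from phase one
Span = Tuple[int, int]    # inclusive (start, end) word indices


def candidate_spans(anchors: List[Anchor], sl_span: Span, tl_len: int) -> List[Span]:
    """Phase two (assumed rule): list TL spans consistent with the anchors."""
    lo, hi = sl_span
    inside = [t for s, t in anchors if lo <= s <= hi]
    outside = {t for s, t in anchors if not lo <= s <= hi}
    if not inside:
        return []
    core_lo, core_hi = min(inside), max(inside)
    # Reject the fragment if a TL word inside the core span is anchored
    # to an SL word outside the SL fragment.
    if any(core_lo <= t <= core_hi for t in outside):
        return []
    # Grow the core span over neighbouring unaligned TL words.
    left = core_lo
    while left - 1 >= 0 and (left - 1) not in outside:
        left -= 1
    right = core_hi
    while right + 1 < tl_len and (right + 1) not in outside:
        right += 1
    return [(a, b)
            for a in range(left, core_lo + 1)
            for b in range(core_hi, right + 1)]


def score(tl_span: Span, sl_span: Span) -> int:
    """Phase three (hypothetical score): prefer spans of similar length."""
    tl_words = tl_span[1] - tl_span[0] + 1
    sl_words = sl_span[1] - sl_span[0] + 1
    return -abs(tl_words - sl_words)


def best_span(anchors: List[Anchor], sl_span: Span, tl_len: int) -> Optional[Span]:
    """Select the highest-scoring candidate, or None if there is none."""
    candidates = candidate_spans(anchors, sl_span, tl_len)
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(c, sl_span))
```

For example, with anchors [(0, 0), (1, 2), (3, 3)], a five-word TL sentence, and the SL fragment (1, 2), the candidates are the TL spans (1, 2) and (2, 2), and the length-based score selects (1, 2).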
To build a domain ontology, this paper proposes a methodology that first builds a core ontology and then enriches it with the concepts and relations in a domain thesaurus. First, the top-level concept taxonomy of the core ontology is built using a domain dictionary and a general domain thesaurus. Then, the concepts of the domain thesaurus are classified into top-level concepts in the core ontology, and the broader term (BT) - narrower term (NT) and related term (RT) relations are classified into the semantic relations defined for the core ontology. To classify concepts, a two-step approach is adopted in which a frequency-based approach is complemented with a similarity-based approach. To classify relations, two techniques are applied: (i) when training data are insufficient, a rule-based module identifies isa relations among the candidates, and a pattern-based approach classifies the remaining non-isa relations into non-taxonomic semantic relations; (ii) when training data are sufficient, a maximum-entropy model is adopted for feature-based classification, with a k-NN approach used to filter noise from the training data. A series of experiments shows that the performance of the proposed systems is promising and comparable to the judgments of human experts.
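The k-NN noise filtering mentioned in (ii) could look roughly like the sketch below. It is not the paper's procedure: the Euclidean distance over numeric feature vectors, the majority-vote rule, the function name knn_filter, and the example relation labels are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, relation label)


def knn_filter(examples: List[Example], k: int = 3) -> List[Example]:
    """Keep an example only if its label matches the majority label
    of its k nearest neighbours (assumed filtering rule)."""
    kept = []
    for i, (x, y) in enumerate(examples):
        # Sort every other example by Euclidean distance to x.
        neighbours = sorted(
            (ex for j, ex in enumerate(examples) if j != i),
            key=lambda ex: math.dist(x, ex[0]),
        )[:k]
        if not neighbours:
            # A lone example has no neighbours to contradict it.
            kept.append((x, y))
            continue
        majority, _ = Counter(label for _, label in neighbours).most_common(1)[0]
        if majority == y:
            kept.append((x, y))
    return kept
```

The filtered list would then serve as training data for the classifier. Choosing an odd k avoids ties when only two labels are involved.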
Feature selection is very important for feature-based relation classification tasks. While most existing work on feature selection relies on linguistic information acquired using parsers, this letter proposes new features, including probabilistic and semantic relatedness features, that explicitly capture the relatedness between patterns and certain relation types. The impact of each feature set is evaluated using both a chi-square estimator and a performance evaluation. The experiments show that relatedness features have a greater impact than existing well-known linguistic features, and that their contribution cannot be substituted by other commonly used linguistic feature sets.
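As a rough illustration of chi-square feature evaluation, the sketch below computes the standard chi-square statistic of a 2x2 contingency table between a binary feature and a target relation type. The data layout and function names are assumptions, and the letter's actual estimator may differ in detail.

```python
from typing import Iterable, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 contingency table:
    a = feature present, target class;  b = feature present, other class;
    c = feature absent, target class;   d = feature absent, other class."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero marginal means the feature or class never varies.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def feature_class_chi2(samples: Iterable[Tuple[bool, str]], target: str) -> float:
    """Count the four cells from (feature_present, label) pairs and score them."""
    a = b = c = d = 0
    for present, label in samples:
        if present and label == target:
            a += 1
        elif present:
            b += 1
        elif label == target:
            c += 1
        else:
            d += 1
    return chi_square(a, b, c, d)
```

A feature that is independent of the relation type scores 0, and the score grows as the feature becomes more strongly associated with the target type, so features can be ranked by this value.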
We introduce a statistical framework for Translation Memory Systems that unifies the different phases of such a system by letting them constrain each other, and that gives the system a statistical qualification. Compared with traditional Translation Memory Systems, our model operates at a fine-grained sub-sentential level, which improves coverage. Compared with other approaches that exploit sub-sentential benefits, it unifies source string segmentation, best example selection, and generation by making them constrain each other via the statistical confidence of each step. We implemented this framework as a prototype system. Compared with an existing Translation Memory System product, our system performs clearly better on the assistant quality metric and improves the translation efficiency metric by 26.3% to 55.1%.
As automated essay scoring (AES) has progressed from handcrafted techniques to deep learning, holistic scoring capabilities have merged. However, specific trait assessment remains a challenge because of the limited depth of earlier methods in modeling dual assessments for holistic and multi‐trait tasks. To overcome this challenge, we explore providing comprehensive feedback while modeling the interconnections between holistic and trait representations. We introduce the DualBERT‐Trans‐CNN model, which combines transformer‐based representations with a novel dual‐scale bidirectional encoder representations from transformers (BERT) encoding approach at the document‐level. By explicitly leveraging multi‐trait representations in a multi‐task learning (MTL) framework, our DualBERT‐Trans‐CNN emphasizes the interrelation between holistic and trait‐based score predictions, aiming for improved accuracy. For validation, we conducted extensive tests on the ASAP++ and TOEFL11 datasets. Against models of the same MTL setting, ours showed a 2.0% increase in its holistic score. Additionally, compared with single‐task learning (STL) models, ours demonstrated a 3.6% enhancement in average multi‐trait performance on the ASAP++ dataset.
Objective To investigate the distribution and antibiotic resistance of pathogens in patients with lower respiratory infections in our hospital. Methods Bacterial culture was performed on sputum specimens from patients diagnosed with lower respiratory infections in our hospital. Bacterial identification and antimicrobial susceptibility testing of positive specimens were performed with the US-made MicroScan A/S-4 semi-automatic bacterial identification and drug susceptibility analyzer and its supporting identification and susceptibility plates. Results Gram-negative bacilli were the predominant pathogens. The top five pathogens were Pseudomonas aeruginosa, Klebsiella pneumoniae, Acinetobacter baumannii/haemolyticus, Xanthomonas maltophilia and Escherichia coli. Drug susceptibility tests showed that the resistance rate of Gram-negative bacteria to broad-spectrum antibiotics had increased. The resistance rates of Pseudomonas aeruginosa to ceftriaxone, and of Klebsiella pneumoniae and Escherichia coli to penicillins, cephalosporins and aztreonam, were above 60%, 80% and 50%, respectively. The resistance rate of Acinetobacter baumannii/haemolyticus to most antibiotics was above 70%, and that of Xanthomonas maltophilia to cephalosporins, quinolones and aminoglycosides was above 90%. Coagulase-negative staphylococci were the main Gram-positive bacteria, with resistance rates to penicillin and cephalosporins above 85%. The resistance rate of staphylococci to most antibiotics was comparatively low. Conclusions Clinically, more attention should be paid to the distribution and diversity of bacteria in lower respiratory infections and to the rational use of antibiotics.
Key words:
Respiratory infection; Pathogenic bacteria; Antibacterial
Word alignment between parallel corpora relies on the similar characteristics shared by two aligned words in two languages. Whereas previous work relies heavily on statistical information, whose limits we discuss, we investigate word similarity measures based on linguistic knowledge. This knowledge is acquired from a linguistic comparison of all layers of the two languages, which in this paper are Chinese and Korean.