    Refactoring Google’s N-gram frequency norms for psycholinguistic studies
0 Citations · 0 References · 10 Related Papers
    Keywords:
    Code refactoring
    Gram
    n-gram
This dissertation is situated in the field of automatic Machine Translation, in human-machine interaction for people with hearing impairments, using the language of the Deaf: Greek Sign Language (GSL). In this work we present a prototype rule-based machine translation system aimed at creating large, robust parallel written corpora of Greek text and Greek Sign Language, using the Short Transcription of Greek Sign Language (text glosses). The corpora are then used as training data for building n-gram language models, and also as training data for the MOSES statistical machine translation system. It should be noted that the whole process is robust and flexible, as it does not require deep knowledge of GSL grammar. We report timing measurements for the creation of the language resources, evaluate the GSL language models via perplexity, and finally, using the BiLingual Evaluation Understudy (BLEU) score for machine translation evaluation, our prototype MT system achieves promising performance: an average score of 60.53%, and 85.1% / 65.5% / 53.8% / 44.8% for 1-gram / 2-gram / 3-gram / 4-gram.
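The per-n BLEU figures quoted above (85.1% for 1-grams down to 44.8% for 4-grams) are clipped n-gram precisions. A minimal Python sketch of that computation (function names are my own, not from the system described):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision as used in BLEU: each candidate n-gram
    counts at most as often as it appears in the reference."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    clipped = sum(min(count, ref[g]) for g, count in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0
```

Full BLEU additionally combines these precisions geometrically and applies a brevity penalty; the sketch covers only the per-n scores reported here.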
    Gram
    n-gram
    Citations (0)
In this paper, we present a new method for the diagnosis of stochastic discrete event systems. The method is based on anomaly detection for sequences, and we call it sequence profiling (SP). SP requires neither a system model nor system-specific knowledge; the only information it needs is event logs from the target system. Using event logs recorded while the system operates normally, N-gram models are learned, where the N-gram model serves as an approximation of the system's behavior. Based on the N-gram model, the diagnoser estimates what kind of fault has occurred in the system, or may conclude that no fault has occurred. The effectiveness of the proposed method is demonstrated by applying it to the diagnosis of a multi-processor system.
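The core idea of sequence profiling can be illustrated in a few lines: learn the n-grams that occur in normal logs, then flag sequences containing unseen n-grams. This is a simplified sketch under my own assumptions (the paper's actual scoring may differ):

```python
from collections import Counter

def learn_ngram_model(logs, n=2):
    """Count the event n-grams observed in normal event logs."""
    model = Counter()
    for seq in logs:
        for i in range(len(seq) - n + 1):
            model[tuple(seq[i:i + n])] += 1
    return model

def anomaly_score(seq, model, n=2):
    """Fraction of n-grams in seq never seen in the normal model.
    0.0 means the sequence is fully explained by normal behaviour."""
    grams = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
    if not grams:
        return 0.0
    unseen = sum(1 for g in grams if g not in model)
    return unseen / len(grams)
```

A high score suggests the log contains event transitions that never occurred during normal operation, which is the signal the diagnoser builds on.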
    Gram
    n-gram
    Profiling (computer programming)
ngram extracts n-gram variables containing counts of how often n-grams occur in a given text. An n-gram is an n-word-long sequence of words. For example, a single word is a unigram (1-gram), a two-word sequence is a bigram (2-gram), and "the black sheep is happy" is a 5-gram. This is useful for text mining applications.
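The counting described above can be sketched in Python (this is an illustrative stand-in, not the ngram command's implementation):

```python
from collections import Counter

def extract_ngram_counts(text, n):
    """Count each word n-gram in the text, mirroring the idea of one
    count variable per observed n-gram."""
    tokens = text.lower().split()
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)
```

For instance, `extract_ngram_counts("the black sheep is happy", 5)` yields a single 5-gram with count 1.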
    Bigram
    n-gram
    Gram
    Extractor
    Feature (linguistics)
    Citations (0)
The N-gram indexing method is the most popular algorithm for Japanese full-text search systems, where each index entry consists of a run of N consecutive characters. Japanese full-text search in particular usually takes a character 2-gram index as its base in order to limit the size of the index file. Although additional higher-gram indices are expected to improve search performance, no experimental evaluation of such indices has been available. This paper presents an evaluation of the improvement in text search performance obtained with additional higher-gram indices, using a Search Term Intensive Approach that selects the terms for the higher-gram indices according to how frequently they appear as search terms in application programs. In the concrete evaluation, one to two hundred thousand paper articles are searched, and a simulation for additional indices of 5 or more grams is applied in addition to the evaluation of 3- and 4-gram additional indices.
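The base 2-gram index works by mapping each character bigram to the documents containing it; a query is answered by intersecting posting lists and then filtering false positives. A minimal sketch under my own naming (the evaluated system's structures are surely more elaborate):

```python
from collections import defaultdict

def build_index(docs, n=2):
    """Character n-gram inverted index: each n-gram maps to the set of
    document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for i in range(len(text) - n + 1):
            index[text[i:i + n]].add(doc_id)
    return index

def search(index, docs, term, n=2):
    """Candidates must contain every n-gram of the term; a final scan
    removes false positives where the n-grams are not adjacent."""
    grams = [term[i:i + n] for i in range(len(term) - n + 1)]
    if not grams:
        return {d for d, t in docs.items() if term in t}
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    return {d for d in candidates if term in docs[d]}
```

Higher-gram indices for frequent search terms shorten this intersection step, which is the performance gain the paper measures.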
    Gram
    n-gram
    Inverted index
    Citations (0)
The addition of support for genericity to mainstream programming languages has a notable influence on refactoring tools. This also applies to the JAVA programming language: versions of the language specification prior to JAVA 5 did not include support for generics. Refactoring tools therefore had to evolve, modifying their refactoring implementations according to the new language characteristics in order to guarantee correct results when transforming code that contains generic definitions or uses generic instantiations. This paper presents an evaluation of the behaviour of refactoring tools on source code that defines or uses generics. We compare the behaviour of five refactoring tools on a well-known refactoring, Extract Method, and its implementation for the JAVA language. We distill the lessons learned from our evaluation into requirements that refactoring tools must take into account in order to fully support this new language feature.
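To make the Extract Method / generics interaction concrete, here is a Python stand-in for the JAVA scenario (the paper's examples are JAVA; the names and the scenario below are mine). The point is that a correct tool must propagate the type parameter into the extracted method's signature:

```python
from typing import TypeVar, List, Optional

T = TypeVar("T")

# Before the refactoring: the scan logic sits inline in a generic function.
def first_duplicate_inline(items: List[T]) -> Optional[T]:
    seen = set()
    for x in items:
        if x in seen:
            return x
        seen.add(x)
    return None

# After Extract Method: the scan is pulled into a helper. Note that the
# extracted signature must carry the type parameter T; dropping it would
# silently weaken the typing of the transformed code.
def _scan_for_repeat(items: List[T]) -> Optional[T]:
    seen = set()
    for x in items:
        if x in seen:
            return x
        seen.add(x)
    return None

def first_duplicate(items: List[T]) -> Optional[T]:
    return _scan_for_repeat(items)
```

In JAVA the analogous failure mode is an extracted method whose parameters are typed with raw types or `Object` instead of the enclosing method's type variables.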
    Code refactoring
    Implementation
    Code (set theory)
    Citations (3)
The n-gram-based inverted index is widely used for information retrieval in some Asian languages and for approximate string matching of protein and DNA sequences, owing to its language-neutral and error-tolerant properties. However, the n-gram-based inverted index suffers from a large index size and long query processing time. In this paper, we propose a two-level n-gram inverted index (called the n-gram/2L index for short) that retains the advantages of the n-gram-based inverted index while reducing index size and improving query processing performance. The n-gram/2L index eliminates the redundancy of positional information present in the n-gram-based inverted index. To do so, it extracts m-subsequences of length m from documents, then extracts n-grams from those m-subsequences, constructing the inverted index in two levels. This two-level construction is theoretically equivalent to eliminating redundancy by normalizing a relation in which a nontrivial multivalued dependency holds, which we prove formally in the paper. The n-gram/2L index has the desirable properties that, as data size grows, its index size shrinks relative to the n-gram index and its query processing performance improves, and that query processing time barely increases with query string length. Experiments on 1 GByte of data show that the n-gram/2L index is up to 1.9–2.7 times smaller than the n-gram-based inverted index, while at the same time improving query processing performance by up to 13.1 times for queries of length 3–18.
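The two-level decomposition described above can be sketched as follows (a simplified illustration under my own assumptions; the paper's index stores full positional postings and chooses m carefully):

```python
from collections import defaultdict

def build_2l_index(docs, m=4, n=2):
    """Two-level n-gram index sketch: documents are split into
    m-subsequences that overlap by n-1 characters; the back index maps
    n-grams to m-subsequences, and the front index maps m-subsequences
    to (doc_id, offset) postings, removing positional redundancy."""
    back = defaultdict(set)    # n-gram -> m-subsequences containing it
    front = defaultdict(list)  # m-subsequence -> (doc_id, offset)
    for doc_id, text in docs.items():
        for off in range(0, len(text), m - n + 1):
            sub = text[off:off + m]
            if len(sub) < n:
                continue
            front[sub].append((doc_id, off))
            for i in range(len(sub) - n + 1):
                back[sub[i:i + n]].add(sub)
    return back, front

def lookup(back, front, gram):
    """Documents containing a single n-gram, resolved through both levels."""
    return {doc for sub in back.get(gram, ()) for doc, _ in front[sub]}
```

Because each n-gram's positions are recorded once per m-subsequence rather than once per document occurrence, repeated subsequences are shared, which is the source of the size reduction reported in the experiments.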
    Gram
    n-gram
    Citations (0)