Method for identifying Tibetan web pages and their encoding

2007 
The invention relates to a method for identifying Tibetan web pages and the encoding they use. The method comprises the following steps: first, for each Tibetan encoding, the codes of a set of characteristic strings are given, the characteristic strings being the syllable delimiter (syllable dot) and/or selected high-frequency syllables; the character stream of the web page is then scanned and searched using the characteristic-string codes as keywords; a counter records how often characters matching the characteristic-string codes appear; finally, the counter results determine whether the page is a Tibetan web page and which Tibetan encoding it uses. The invention makes full use of the syllable structure of Tibetan and the statistical properties of Tibetan words, and applies separate identification criteria to the different encodings, so that Tibetan and non-Tibetan web pages can be distinguished efficiently and correctly, and the Tibetan encoding used by a page can also be identified.
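The scan-and-count steps described above can be sketched roughly as follows. This is a minimal illustration, not the patented procedure: it assumes the syllable delimiter is the Tibetan tsheg (U+0F0B), considers only two Unicode encodings (UTF-8 and UTF-16LE), and uses a hypothetical feature list and a hypothetical threshold `min_ratio`. The abstract does not specify the actual encodings, syllable list, or per-encoding criteria.

```python
# Assumption: the "syllable delimiter" is the Tibetan tsheg, U+0F0B.
TSHEG = "\u0f0b"

# Hypothetical feature list: the tsheg plus two common syllables chosen
# only for illustration (not taken from the source).
FEATURES = [TSHEG, "\u0f51\u0f44", "\u0f53\u0f72"]

# Hypothetical candidate encodings; the source does not name them.
CANDIDATE_ENCODINGS = ["utf-8", "utf-16-le"]

# Step 1: give the code of each characteristic string in each encoding.
PATTERNS = {
    enc: [feature.encode(enc) for feature in FEATURES]
    for enc in CANDIDATE_ENCODINGS
}


def count_hits(data: bytes, patterns: list) -> int:
    """Steps 2 and 3: scan the byte stream and count keyword matches."""
    return sum(data.count(pattern) for pattern in patterns)


def identify(data: bytes, min_ratio: float = 0.01):
    """Step 4: decide from the counters.

    Returns the name of the best-matching encoding, or None when no
    encoding reaches min_ratio hits per byte (treated as non-Tibetan).
    """
    if not data:
        return None
    best_encoding, best_ratio = None, 0.0
    for enc, patterns in PATTERNS.items():
        ratio = count_hits(data, patterns) / len(data)
        if ratio > best_ratio:
            best_encoding, best_ratio = enc, ratio
    return best_encoding if best_ratio >= min_ratio else None


if __name__ == "__main__":
    sample = "\u0f56\u0f7c\u0f51\u0f0b\u0f61\u0f72\u0f42\u0f0b\u0f51\u0f44\u0f0b"
    print(identify(sample.encode("utf-8")))
    print(identify(sample.encode("utf-16-le")))
    print(identify(b"Hello world, plain English page."))
```

A single shared threshold is used here for brevity; the abstract indicates that each encoding has its own identification criterion, which could be modelled by replacing `min_ratio` with a per-encoding table.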