Results of automatic mining of individual fields of the register of personal data operators

2021 
The work presents the results of mining the records contained in two fields of the register of personal data operators, "list of actions with personal data" and "period or condition of termination of personal data processing", and assesses their compliance with the requirements of personal data legislation. Higher educational institutions were chosen as the operator community under study, which makes it possible to take into account shared features of personal data processing when forming expert assessments and performing intelligent data analysis. For the purposes of the study, a corpus of texts was compiled that can be used to analyze data mining methods on topics of information processing and protection. The following libraries were used for text mining: Scikit-learn, Gensim, PyMystem3, FuzzyWuzzy. Search queries took into account synonym dictionaries and fuzzy word positions. To find stable keyword combinations, the TF-IDF weight function was calculated. Word lemmatization methods were compared for the purposes of the study. The results confirm the validity of the expert assessments of how the register fields are filled in: the largest cluster identified by the mining analysis corresponds to the expert template. The results of automatic mining require verification by an expert in personal data processing and protection. Data mining methods can significantly increase the efficiency of experts working with the large volumes of information contained in the register of personal data operators. The work is aimed at forming individual sections of recommendations for developing a sectoral code of conduct (for higher education and science) on protecting the rights of personal data subjects, in order to increase the level of security of such information.
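As an illustration of the TF-IDF step mentioned above, the following is a minimal sketch that weights word pairs (bigrams) across a few register-like records. It is not the authors' code: the sample records, the function names `bigrams` and `tfidf_bigrams`, and the choice of the textbook formula tf × log(N / df) are all assumptions made for this example, and only the Python standard library is used rather than the libraries named in the abstract.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return adjacent word pairs, each joined by a single space."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def tfidf_bigrams(records):
    """Compute textbook TF-IDF weights of word bigrams for each record.

    records: list of strings (hypothetical, assumed already lemmatized).
    Returns one dict per record, mapping bigram -> weight.
    """
    docs = [bigrams(r.lower().split()) for r in records]
    n_docs = len(docs)
    # Document frequency: the number of records in which each bigram occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against records with fewer than two words
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical field values, invented for illustration only.
records = [
    "collection recording storage destruction",
    "collection recording transfer destruction",
    "collection recording storage blocking",
]
weights = tfidf_bigrams(records)
for record, w in zip(records, weights):
    print(record, "->", sorted(w.items(), key=lambda kv: -kv[1]))
```

Under this formula a bigram that occurs in every record receives a weight of zero, so the weights highlight combinations that distinguish records from one another, while the document frequencies show which combinations recur. Scikit-learn's `TfidfVectorizer` applies a smoothed inverse document frequency by default, so its values would differ from this sketch.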