Algorithms and models for collaborative filtering from large information corpora

2008 
In this thesis we propose novel collaborative filtering approaches for large data sets. We also demonstrate how these approaches can be used to create item recommendations for users, based on the preferences for items that users demonstrated in the past. We propose a framework, called collaborative partitioning (CP), that focuses on finding a partition of a given set of items that maximizes the number of partition-satisfied users. Two theoretical models for evaluating the quality of partitions are proposed. Both are introduced as bicriteria optimization problems, with the percentage of satisfied users and the level of user satisfaction as the two optimization criteria. As both of these bicriteria optimization problems are NP-hard, we propose Hierarchical Agglomerative Clustering-based heuristics to compute approximate solutions. Results obtained by running the heuristics on a real data set show that the proposed approaches for CP find item partitions that are very close to a human-based genre partition of the same set, where a genre partition is a partition of items according to a human-created classification. The results also show that the proposed heuristics are a very good starting point for creating top-k recommendation algorithms. The second part of this thesis proposes a collaborative filtering framework for finding seminal and seminally affected work for sets of items. The concept of seminal work for a set of items is used to mark items released in the past that are highly correlated, in terms of user preferences, with some later sets of items. Similarly, seminally affected work is used in this thesis to mark items that are highly correlated, in terms of user preferences, with some previously released (older) items. In this approach, we translate item-item correlation into a correlation directed acyclic graph (DAG). 
Edge direction in the DAG is determined by the chronological ordering of items. We demonstrate and validate the proposed approach by applying it in a web-based system called MovieTrack. This system uses seminal and seminally affected work in movies to give movie recommendations to users. It is built by applying the proposed approach to a real data set of movie reviews released by Netflix.
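The correlation DAG described above can be illustrated with a minimal sketch. It is not the thesis implementation: the use of Pearson correlation, the 0.5 threshold, the rule that items released in the same year get no edge, and all names (`pearson`, `build_correlation_dag`, the toy `ratings` and `release_year` data) are assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the users who rated both items (assumed measure)."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    xs = [a[u] for u in common]
    ys = [b[u] for u in common]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def build_correlation_dag(ratings, release_year, threshold=0.5):
    """Return {older_item: [(newer_item, correlation), ...]}.

    Edges always point from the older item to the newer one, so the
    graph cannot contain a cycle.
    """
    dag = {item: [] for item in ratings}
    for i, j in combinations(ratings, 2):
        if release_year[i] == release_year[j]:
            continue  # no chronological order, so no edge (assumption)
        r = pearson(ratings[i], ratings[j])
        if r >= threshold:  # hypothetical cut-off for "highly correlated"
            older, newer = (i, j) if release_year[i] < release_year[j] else (j, i)
            dag[older].append((newer, r))
    return dag


# Toy data: item -> {user: rating}; years are made up for the example.
ratings = {
    "A": {"u1": 5, "u2": 1, "u3": 4},
    "B": {"u1": 4, "u2": 2, "u3": 5},
    "C": {"u1": 1, "u2": 5, "u3": 2},
}
release_year = {"A": 1977, "B": 1999, "C": 2005}

dag = build_correlation_dag(ratings, release_year)
```

In this sketch, an item's outgoing edges lead to newer items it is correlated with, so the source of an edge is a candidate for seminal work, and the target is a candidate for seminally affected work.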