Pedigree tracking in the face of ancillary content

Eugene R. Creswick, Emi Fujioka, Terrance Goan

2008

The accurate tracking and retrieval of content pedigree is a rapidly growing requirement as our ability to create information assets increases exponentially. Plagiarism detection, accurate accreditation, and classification tasks all rely on the ability to determine where content is being used and where it originated. We present an approach to document pedigree tracking that is based on an efficient disk-based data structure and the use of two contrasting collections of historical text. These collections allow two types of content (or degrees of importance) to be defined and accounted for when locating documents with overlapping content. The approach is resilient in the face of substantial ancillary content and paraphrasing, two common sources of error in existing content tracking techniques.
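
The abstract does not specify how the two collections are used, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes word trigrams as the unit of overlap and a simple smoothed ratio as the weight; the in-memory counters stand in for the disk-based structure, and every name (`shingles`, `shingle_counts`, `weighted_overlap`, `primary`, `ancillary`) is hypothetical.

```python
from collections import Counter


def shingles(text, n=3):
    """Return the set of word n-grams in text (assumed unit of overlap)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shingle_counts(collection, n=3):
    """Count how many documents in a collection contain each n-gram."""
    counts = Counter()
    for doc in collection:
        counts.update(shingles(doc, n))
    return counts


def weighted_overlap(query, candidate, primary, ancillary, n=3):
    """Score shared n-grams, discounting those typical of ancillary text.

    `primary` and `ancillary` are the two contrasting collections
    (hypothetical names). The weight is an add-one smoothed estimate of
    how strongly an n-gram is associated with the primary collection.
    """
    p = shingle_counts(primary, n)
    a = shingle_counts(ancillary, n)
    shared = shingles(query, n) & shingles(candidate, n)
    return sum((p[s] + 1) / (p[s] + a[s] + 2) for s in shared)
```

Under these assumptions, a candidate that shares substantive text with the query scores higher than one that shares only boilerplate such as a confidentiality footer, even when the raw overlap counts are similar.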

Keywords:

Computer science
Accreditation
Data structure
Information retrieval
Data mining
Asset (computer security)
Plagiarism detection
Sources of error
