Text Mining for Quality Control of Court Records

2014 
Attorneys across the United States use government-provided electronic databases to submit docket entries and associated case files for processing and archival in public judicial records. Data entry errors in these repositories, while rare, can disrupt the court process, confuse the public record, or breach privacy and confidentiality. Docket quality assurance is thus a high priority for the courts, but manual review remains resource-intensive. We have developed a prototype application of text mining and human language technologies to partially automate quality assurance review of electronic court documents. This solution uses document classification and named entity recognition to extract metadata directly from documents. Discrepancies between the extracted metadata and the user-provided metadata indicate a possible data entry error. On two independent samples of publicly available court documents, we find that for a small number of classes with a sufficient number of training documents, the document class can be automatically classified with greater than 94% accuracy in one case, but only 81% in the other. Our attempts to extract case numbers and the names of parties from documents via a conditional random field model met with less success. Future work with more extensive training data is necessary to more accurately evaluate both applications.
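The discrepancy check described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `DocketEntry` fields, the `find_discrepancies` helper, and the case-number pattern are all hypothetical, and a simple regular expression stands in for the conditional random field extractor. The predicted document class is assumed to come from some separately trained classifier.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for case numbers shaped like "1:13-cv-01234".
# Real formats vary by court; this is an assumption for illustration only,
# and it replaces (does not reproduce) the CRF model mentioned in the abstract.
CASE_NUMBER_RE = re.compile(r"\b\d:\d{2}-[a-z]{2}-\d{3,5}\b", re.IGNORECASE)


@dataclass
class DocketEntry:
    """User-provided metadata plus the text of the filed document."""
    case_number: str
    doc_class: str
    text: str


def extract_case_numbers(text: str) -> set[str]:
    """Return every case-number-like string found in the document text."""
    return {match.lower() for match in CASE_NUMBER_RE.findall(text)}


def find_discrepancies(entry: DocketEntry, predicted_class: str) -> list[str]:
    """Compare extracted metadata with user-provided metadata.

    `predicted_class` is assumed to be the output of a document classifier.
    Each returned string flags a possible data entry error for human review.
    """
    issues = []
    if predicted_class != entry.doc_class:
        issues.append(
            f"class: filed as {entry.doc_class!r}, text suggests {predicted_class!r}"
        )
    found = extract_case_numbers(entry.text)
    # Only flag a mismatch when the text actually contains a case number.
    if found and entry.case_number.lower() not in found:
        issues.append(
            f"case number: filed under {entry.case_number!r}, text mentions {sorted(found)}"
        )
    return issues


if __name__ == "__main__":
    entry = DocketEntry(
        case_number="1:13-cv-01234",
        doc_class="motion",
        text="Case No. 1:13-cv-04321. ORDER granting the request.",
    )
    for issue in find_discrepancies(entry, predicted_class="order"):
        print(issue)
```

A flagged entry is only a candidate error; in a workflow like the one the abstract describes, it would be routed to a human reviewer rather than corrected automatically.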