We describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. From training samples, the method induces solutions in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable solutions. This rule-based decision model can be extended to search efficiently for similar cases prior to approximating function values. Experimental results on real-world data demonstrate that the new techniques are competitive with existing machine learning and statistical methods and can sometimes yield superior regression performance.
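To make the representation concrete, the following is a minimal sketch of how an ordered rule list of this kind can be applied at prediction time. The `predict` function, the rules, thresholds, and predicted values are hypothetical stand-ins for what an induction procedure would learn, not output of the actual method.

```python
# A hypothetical ordered rule list: each rule is a conjunction of
# conditions over the input variables paired with a predicted value.
# The first rule whose conditions all hold determines the prediction;
# together the ordered rules act like an ordered DNF expression.

def predict(x, rules, default):
    for conditions, value in rules:
        if all(cond(x) for cond in conditions):
            return value
    return default  # fall-through prediction when no rule fires

# Toy rules for a function of two inputs x = (x1, x2); the thresholds
# and values are illustrative, not learned from real data.
rules = [
    ([lambda x: x[0] > 5.0, lambda x: x[1] <= 2.0], 10.3),
    ([lambda x: x[0] > 5.0], 7.1),
    ([lambda x: x[1] > 4.5], 3.8),
]

print(predict((6.2, 1.4), rules, default=5.0))  # first rule fires -> 10.3
```

Because only the first matching rule fires, later rules can be read as "otherwise" clauses, which is what keeps such solutions compact and easy to interpret.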
The Handbook of Natural Language Processing is a revised edition of an earlier handbook (Dale, Moisl, and Somers 2000). This second edition was prepared by Nitin Indurkhya, a researcher at the University of New South Wales, and the late text processing pioneer Fred J. Damerau of the IBM T. J. Watson Research Center (d. 27 January 2009), whose 1964 paper introduced a version of what is now known as the Damerau-Levenshtein distance, a metric of the similarity between two strings, along with a dynamic programming algorithm to compute it efficiently (Damerau 1964). Damerau also invented automatic hyphenation (Damerau 1970) and worked on early question-answering systems. Indurkhya, who is also affiliated with a consulting company, Data-Miner Pty Ltd., maintains a companion wiki for the book.

The book has three parts, totaling 26 chapters. The first part, Classical Approaches, essentially covers techniques that were known prior to the statistical revolution, that is, before mainstream natural language processing researchers embraced techniques that speech engineers had already been using successfully for a while. The second part, Empirical and Statistical Approaches, covers state-of-the-art data-driven models. The third part, Applications, presents techniques that sit closer to applications. To a computational linguist, information extraction is an application; to business people, it is a general technology area from which many application products and services can be built.

The handbook “aims to cater to the needs of NLP practitioners and language-engineering professionals in academia as well as in industry. . . . The prototypical reader is interested in the practical aspects of building NLP systems and may also be interested in working with languages other than English” (p. xxii). Hence it would have been nice to include descriptions of actual revenue-generating products (even if this meant that this particular part of the handbook would become outdated more quickly) in order to demonstrate how the NLP components are embedded in non-NLP technology, and how these products are embedded in the businesses that use them. For example, the application chapter “Information Retrieval” does not describe how the topics in other parts were applied in Web search engines or enterprise search products, as one might have expected. Instead, it is essentially another technical chapter, and its probabilistic IR material could just as well have been presented in part two (statistical techniques).
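For readers unfamiliar with the metric named after Damerau, here is a minimal sketch of the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance, which counts insertions, deletions, substitutions, and transpositions of adjacent characters. It illustrates the dynamic programming idea only; it is not a reproduction of Damerau's original 1964 formulation.

```python
def dl_distance(a: str, b: str) -> int:
    """Optimal-string-alignment variant of Damerau-Levenshtein distance:
    minimum number of insertions, deletions, substitutions, and adjacent
    transpositions needed to turn a into b."""
    m, n = len(a), len(b)
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

print(dl_distance("ca", "ac"))  # 1: a single adjacent transposition
```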
We describe a lightweight learning method that induces an ensemble of decision-rule solutions for regression problems. Instead of predicting a continuous output variable directly, the method discretizes the variable by k-means clustering and solves the resulting classification problem. Predictions on new examples are made by averaging the mean values of the classes whose vote counts are close to that of the most likely class. We provide experimental evidence that this indirect approach often yields strong results across applications, generally outperforming direct approaches such as regression trees and rivaling bagged regression trees.
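As an illustration of the indirect pipeline only (not of the authors' rule-induction ensemble), the sketch below discretizes the target with one-dimensional k-means, trains a stand-in classifier, and averages the mean target values of all classes whose predicted vote shares come close to the winner's. The functions `fit` and `predict`, the number of clusters `k`, the `margin` parameter, and the use of a decision tree with `predict_proba` as a proxy for rule votes are all assumptions made for the sake of the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def fit(X, y, k=8):
    # Discretize the continuous target into k pseudo-classes by 1-D k-means.
    km = KMeans(n_clusters=k, n_init=10).fit(y.reshape(-1, 1))
    labels = km.labels_
    # Mean target value of each pseudo-class, used later for prediction.
    class_means = np.array([y[labels == c].mean() for c in range(k)])
    clf = DecisionTreeClassifier().fit(X, labels)  # stand-in for the rule ensemble
    return clf, class_means

def predict(clf, class_means, X, margin=0.1):
    # Average the class means of every class whose vote share lies within
    # `margin` of the winning class (one reading of "close in number").
    proba = clf.predict_proba(X)
    preds = []
    for p in proba:
        close = p >= p.max() - margin
        preds.append(class_means[clf.classes_[close]].mean())
    return np.array(preds)

# Tiny synthetic demonstration.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
clf, means = fit(X, y)
print(predict(clf, means, X[:3]))
```

Averaging over near-winning classes, rather than taking only the single most likely class mean, smooths the quantization error introduced by the discretization step.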
Item descriptions on an e-commerce site such as eBay consist of item-specific information along with generic information such as shipping and return policies, requests for feedback, and contact information. Extracting these textual segments from the item descriptions is non-trivial, as the descriptions contain HTML markup, advertisements, templates, and navigational elements. Since sellers have considerable editorial freedom in how they describe their items, many of the descriptions lack homogeneity and compactness. Very often, the relevant information has to be extracted from incomplete, ill-formed discourse units, adding to the challenge of finding coherent segments. In this paper we describe an approach that identifies item-specific text segments in eBay descriptions. The approach uses a bootstrapping technique to learn high-quality semantic lexicons for item-agnostic text segments. We first extract useful text by removing HTML markup with a boilerplate-removal technique that preserves markup information and captures visual segmentation. Each segment is further processed to extract discourse units that play the same role as sentences. This is followed by a clustering technique that identifies thematic breaks in order to extract coherent segments. We evaluate our approach on a diverse set of descriptions and show that it outperforms a commonly used approach that relies only on the title keywords.
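The abstract does not spell out the clustering step, so as a rough illustration of detecting thematic breaks between discourse units, the TextTiling-style sketch below marks a boundary wherever lexical similarity between adjacent units drops below a threshold. The function `thematic_breaks`, the `threshold` value, and the toy description text are hypothetical, and TF-IDF cosine similarity is used here only as a simple stand-in for the paper's technique.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def thematic_breaks(units, threshold=0.15):
    """Mark a break between adjacent discourse units whose lexical
    similarity falls below `threshold` (TextTiling-style heuristic)."""
    vec = TfidfVectorizer().fit_transform(units)
    sims = [cosine_similarity(vec[i], vec[i + 1])[0, 0]
            for i in range(len(units) - 1)]
    return [i + 1 for i, s in enumerate(sims) if s < threshold]

# Toy discourse units: item-specific text followed by generic policy text.
units = [
    "Brand new 8GB mp3 player, plays video and FM radio.",
    "Battery lasts up to 20 hours of continuous playback.",
    "We ship within 2 business days via USPS.",
    "Returns accepted within 30 days, buyer pays return shipping.",
]
print(thematic_breaks(units))  # indices of units that start a new segment
```

A boundary between the product-description units and the shipping/returns units is the kind of break that separates item-specific from item-agnostic segments in this setting.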