Everyday use of Internet-based services involves hundreds of different tasks performed by many users. A single task typically involves a sequence of Web URL invocations. We study the problem of pattern detection in Web logs to identify the tasks users perform, and we analyze task trends over time using a grammar-based framework. We demonstrate our results on a corporate Intranet portal application with 7000 users over a 6-week period and show compelling business value from this high-level task analysis.
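As a rough illustration of treating tasks as patterns over URL sequences, the sketch below maps each URL to a symbol and expresses each task as a regular expression over those symbols. The URL paths, symbol alphabet, and task names are all hypothetical, and a regular expression is only the simplest kind of grammar; the abstract does not specify the grammar formalism actually used.

```python
import re
from collections import Counter

# Hypothetical mapping from URL paths to one-letter symbols (assumption).
URL_SYMBOLS = {
    "/login": "L",
    "/search": "S",
    "/results": "R",
    "/doc": "D",
    "/expense/new": "E",
    "/expense/submit": "X",
}

# Hypothetical task definitions: each task is a regular pattern over symbols.
TASK_PATTERNS = {
    "find_document": re.compile(r"SR+D"),  # search, one or more result pages, open a document
    "file_expense": re.compile(r"EX"),     # open the expense form, then submit it
}


def detect_tasks(urls):
    """Count non-overlapping task occurrences in one user's ordered URL sequence."""
    # Unknown URLs become "?" so they break any pattern that spans them.
    symbols = "".join(URL_SYMBOLS.get(u, "?") for u in urls)
    counts = Counter()
    for task, pattern in TASK_PATTERNS.items():
        counts[task] = len(pattern.findall(symbols))
    return counts


session = [
    "/login", "/search", "/results", "/results", "/doc",
    "/expense/new", "/expense/submit",
    "/search", "/results", "/doc",
]
task_counts = detect_tasks(session)
```

Summing such counts per day or per week across sessions would give a simple view of task trends over time.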
Today, XML is increasingly becoming a standard for representing semi-structured information such as documents that combine content and metadata. Typical document management applications include document representation, authoring, validation, and document routing in support of a business process. We propose a framework for intelligent document routing that exploits and extends XML technologies to automate dynamic document routing and real-time update of business routing logic. The document-routing logic is stored in a secure repository and executed by a business rules engine. During rule execution, the input parameters of each business rule are bound to data from each inbound XML document. This document routing framework is validated in a real-world implementation that showed reduced development cost, an accelerated rule update cycle, and simplified administration.
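The binding of rule parameters to data in an inbound XML document can be sketched as follows. This is a minimal illustration, not the framework described above: the rule format, element names, and queue names are hypothetical, and a plain Python list stands in for the secure rule repository and the rules engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set (assumption). Each rule names its input parameters,
# the document path each parameter is bound to, a condition, and a destination.
RULES = [
    {
        "name": "high_value_invoice",
        "params": {"doc_type": "./meta/type", "amount": "./content/amount"},
        "condition": lambda p: p["doc_type"] == "invoice" and float(p["amount"]) > 10000,
        "route_to": "finance-review",
    },
    {
        "name": "any_invoice",
        "params": {"doc_type": "./meta/type"},
        "condition": lambda p: p["doc_type"] == "invoice",
        "route_to": "accounts-payable",
    },
]


def route(xml_text, rules=RULES, default="manual-triage"):
    """Return the destination of the first rule whose condition holds."""
    root = ET.fromstring(xml_text)
    for rule in rules:
        # Bind each rule parameter to the text of the matching element.
        params = {name: root.findtext(path) for name, path in rule["params"].items()}
        if None in params.values():
            continue  # the document lacks data this rule needs
        if rule["condition"](params):
            return rule["route_to"]
    return default


big_invoice = (
    "<document><meta><type>invoice</type></meta>"
    "<content><amount>25000</amount></content></document>"
)
small_invoice = big_invoice.replace("25000", "500")
memo = "<document><meta><type>memo</type></meta><content/></document>"
```

Because the rules are data rather than code, replacing the list changes routing behavior without touching `route`, which mirrors the idea of updating routing logic at run time.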
We organized a workshop at SIGIR'01 to explore the area of information retrieval techniques for speech applications. Here we summarize the results of that workshop.
To produce and characterize bioactive metabolites from piezotolerant marine fungus Nigrospora sp. in submerged fermentation. A distinct marine strain, Nigrospora sp. NIOT, has been isolated from a depth of 800 m in the Arabian Sea. The 18S rRNA and internal transcribed spacer (ITS) analysis demonstrates its close association with the genus Nigrospora. The effects of pH, temperature, salinity, carbon source and amino acids were studied to optimize the fermentation conditions. Optimal mycelial growth and secondary metabolite production were observed at 6·0-8·0 pH, 20-30°C temperature, 7·5% salinity, sucrose as carbon source and tryptophan as amino acid source. The extracellular secondary metabolites exhibited high antimicrobial activities against both gram-positive and gram-negative pathogenic bacteria with minimal inhibitory concentration (MIC) values lower than 30 μg ml⁻¹. Strong cytotoxicity was observed in all cell lines tested; GI50 (growth inhibition by 50%) was calculated to be 1·35, 3·2, 0·13 and 0·35 μg ml⁻¹ against U937, MCF-7, A673 and Jurkat, respectively. Fourier transform infrared spectroscopy (FTIR) and gas chromatography-mass spectrometry (GC-MS) analyses of secondary metabolites confirmed the production of antimicrobial and anticancer substances. A piezotolerant fungus Nigrospora sp. NIOT isolated from a deep sea environment was successfully cultured under submerged fermentation. The secondary metabolites produced from this organism showed potent antimicrobial and anticancer activities with immediate application to cosmetics and pharmaceutical industries. This is the first study exploring Nigrospora sp. from 800 m in a marine environment. This deep sea fungus under optimized culture conditions effectively produced bioactive secondary metabolites such as griseofulvin, spirobenzofuran and pyrone derivatives at higher concentrations.
Document management applications typically comprise a set of loosely coupled subsystems that provide capture, index, search, workflow, fulfillment and archival features. However, there exists no standard model for composing these elements to instantiate a complete application. Therefore, every application invariably incorporates custom application code to provide the linkages between these loosely coupled subsystems. This paper proposes a model-based approach to instantiating document management applications. An Eclipse Modeling Framework (EMF) based model is used to formalize the variable elements in document management applications. The modeling tool supports the instantiation of an EMF model for every new application and the generation of runtime artifacts, including code, XML configurations, scripts and business logic. This approach to creating new instances of document management applications with a formal EMF model has been validated with a real-world document management application.
The role of audio in the context of multimedia applications involving video is becoming increasingly important. Many efforts in this area focus on audio data that contains some built-in semantic information structure, such as broadcast news, or on classification of audio that contains a single type of sound, such as clear speech or clear music only. In the CueVideo system, we detect and classify mixed audio, i.e. combinations of speech and music together with other types of background sounds. Segmentation of mixed audio has applications in detection of story boundaries in video, spoken document retrieval systems, and audio retrieval systems. We modify and combine audio features known to be effective in distinguishing speech from music, and examine their behavior on mixed audio. Our preliminary experimental results show that we can achieve a classification accuracy of over 80% for such mixed audio. Our study also provides several helpful insights related to analyzing mixed audio in the context of real applications.
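The abstract does not name the audio features it modifies and combines. As an assumption, the sketch below computes two features widely used for speech/music discrimination, zero-crossing rate and short-time energy, over overlapping frames. The frame length and hop size are illustrative defaults, and no classifier is included.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def frame_features(samples, frame_len=256, hop=128):
    """Return one (zero-crossing rate, energy) pair per overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        feats.append((zero_crossing_rate(frame), short_time_energy(frame)))
    return feats
```

Statistics of these per-frame values over a longer window, such as their variance, are a common input to a speech/music classifier; how well they separate mixed audio is exactly the kind of question the study examines.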
Combined word-based and phonetic indexes have been used to improve the performance of spoken document retrieval systems, primarily by addressing the out-of-vocabulary retrieval problem. However, a known problem with phonetic recognition is its limited accuracy in comparison with word-level recognition. We propose a novel method for phonetic retrieval in the CueVideo system based on a probabilistic formulation of term weighting using phone confusion data in a Bayesian framework. We evaluate this method of spoken document retrieval against word-based retrieval for the search levels identified in a realistic video-based distributed learning setting. Using our test data, we achieved an average recall of 0.88 with an average precision of 0.69 for retrieval of out-of-vocabulary words on phonetic transcripts with 35% word error rate. For in-vocabulary words, we achieved a 17% improvement in recall over word-based retrieval with a 17% loss in precision for word error rates ranging from 35 to 65%.
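To illustrate how phone confusion data can feed a probabilistic match score, the sketch below slides a query phone sequence over a recognized phone transcript and scores each window by the log-likelihood of the recognizer producing those phones given the query. It is a simplified stand-in, not the term-weighting formulation proposed above: the confusion probabilities are invented, only substitutions are modeled (no insertions or deletions), and no prior is applied.

```python
import math

# Hypothetical confusion probabilities P(recognized phone | spoken phone).
CONFUSION = {
    ("k", "k"): 0.80, ("k", "g"): 0.15,
    ("ae", "ae"): 0.70, ("ae", "eh"): 0.20,
    ("t", "t"): 0.75, ("t", "d"): 0.20,
}
FLOOR = 0.01  # assumed probability for phone pairs missing from the table


def log_likelihood(query, window):
    """Log P(window | query) under independent per-phone substitutions."""
    return sum(math.log(CONFUSION.get((q, r), FLOOR)) for q, r in zip(query, window))


def best_match(query, transcript):
    """Return (score, position) of the best-scoring window in the transcript."""
    n = len(query)
    best_score, best_pos = float("-inf"), -1
    for i in range(len(transcript) - n + 1):
        score = log_likelihood(query, transcript[i:i + n])
        if score > best_score:
            best_score, best_pos = score, i
    return best_score, best_pos


# The query "k ae t" is misrecognized as "g eh t" inside the transcript.
score, position = best_match(["k", "ae", "t"], ["dh", "ah", "g", "eh", "t", "s"])
```

An exact-match phonetic search would miss this occurrence entirely; the confusion-weighted score still ranks the misrecognized window first, which is the intuition behind using confusion data for out-of-vocabulary retrieval.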