A large fraction of an XML document typically consists of text data. The XPath query language allows text search via the equal, contains, and starts-with predicates. Such predicates can be implemented efficiently using a compressed self-index of the document's text nodes. Most queries, however, combine conditions on the text content with conditions on the tree structure. It is therefore a challenge to choose an evaluation order for a given query that optimally leverages the execution speeds of the text and tree indexes. Here the SXSI system is introduced; it stores the tree structure of an XML document as a bit array of opening and closing brackets, and stores the text nodes of the document using a global compressed self-index. On top of these indexes sits an XPath query engine based on tree automata. The engine uses fast counting queries on the text index to dynamically determine whether to evaluate top-down or bottom-up with respect to the tree structure. The resulting system has several advantages over existing systems: (1) on pure tree queries (without text search), such as the XPathMark queries, SXSI performs on par with or better than the fastest known systems, MonetDB and Qizx; (2) on queries that use text search, SXSI outperforms the existing systems by 1--3 orders of magnitude (depending on the size of the result set); and (3) with respect to memory consumption, SXSI outperforms all other systems for counting-only queries.
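The tree index rests on the classic balanced-parentheses encoding mentioned above: the document tree is serialized in depth-first order as a bit array of opening and closing brackets. The following is a minimal sketch of that idea (not SXSI's actual engineered structure); succinct indexes answer these operations in constant time, whereas the sketch uses plain linear scans.

```python
def find_close(bits, i):
    """Index of the closing bracket matching the opening bracket at i.
    (Linear scan for illustration; succinct indexes do this in O(1).)"""
    depth = 0
    for j in range(i, len(bits)):
        depth += 1 if bits[j] else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced bracket sequence")

def subtree_size(bits, i):
    """Number of nodes in the subtree whose opening bracket is at i."""
    return (find_close(bits, i) - i + 1) // 2

# The tree a(b, c(d)) serializes to ( ( ) ( ( ) ) ),
# with 1 = opening bracket and 0 = closing bracket:
bits = [1, 1, 0, 1, 1, 0, 0, 0]
```

For example, `subtree_size(bits, 0)` counts all four nodes of the tree, while `subtree_size(bits, 3)` counts the two nodes of the subtree rooted at c.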
We introduce the first grammar-compressed representation of a sequence that supports searches in time that depends only logarithmically on the size of the grammar. Given a text $T[1..u]$ that is represented by a (context-free) grammar of $n$ (terminal and nonterminal) symbols and size $N$ (measured as the sum of the lengths of the right-hand sides of the rules), a basic grammar-based representation of $T$ takes $N\lg n$ bits of space. Our representation requires $2N\lg n + N\lg u + \epsilon\, n\lg n + o(N\lg n)$ bits of space, for any $0<\epsilon \le 1$. It can find the positions of the $occ$ occurrences of a pattern of length $m$ in $T$ in $O((m^2/\epsilon)\lg (\frac{\lg u}{\lg n}) +occ\lg n)$ time, and extract any substring of length $\ell$ of $T$ in time $O(\ell+h\lg(N/h))$, where $h$ is the height of the grammar tree.
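The extraction side of such a representation can be illustrated with a hypothetical toy straight-line program (a hedged sketch only; the paper's structure adds succinct machinery to reach the stated bounds). Reading $T[i]$ amounts to walking down the grammar tree, so the cost is proportional to the grammar height rather than to $|T|$.

```python
def rule_lengths(rules, start):
    """Length of the string derived from each nonterminal (memoized DFS)."""
    lengths = {}
    def expand(sym):
        if sym not in rules:          # terminal: one character
            return 1
        if sym not in lengths:
            lengths[sym] = sum(expand(s) for s in rules[sym])
        return lengths[sym]
    expand(start)
    return lengths

def extract(rules, lengths, start, i):
    """Return T[i] (0-based) by descending the grammar tree: at each
    rule, skip children wholly to the left of position i."""
    sym = start
    while sym in rules:
        for child in rules[sym]:
            n = lengths[child] if child in rules else 1
            if i < n:
                sym = child
                break
            i -= n
    return sym

# Toy grammar: S -> A A b, A -> a b, so T = "ababb".
rules = {"S": ["A", "A", "b"], "A": ["a", "b"]}
lengths = rule_lengths(rules, "S")
```

Here `extract(rules, lengths, "S", 2)` yields the third character of $T$ without ever materializing the full text.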
The {\em wavelet tree} is a flexible data structure that permits representing sequences $S[1,n]$ of symbols over an alphabet of size $\sigma$, within compressed space and supporting a wide range of operations on $S$. When $\sigma$ is significant compared to $n$, current wavelet tree representations incur noticeable space or time overheads. In this article we introduce the {\em wavelet matrix}, an alternative representation for large alphabets that retains all the properties of wavelet trees but is significantly faster. We also show how the wavelet matrix can be compressed up to the zero-order entropy of the sequence while not only preserving but actually improving its time performance. Our experimental results show that the wavelet matrix outperforms all the wavelet tree variants along the space/time tradeoff map.
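The core mechanism can be sketched in a few lines (a simplified illustration, assuming symbols in $[0, 2^{\mathit{levels}})$, not the paper's engineered implementation). At each level the sequence is stably partitioned by one bit of the symbols, zeros to the left and ones to the right; only one bit vector and the count of zeros per level are kept, and operations such as access follow a single position down the levels via rank queries.

```python
def build(seq, levels):
    """One bit vector per level plus the number of zeros at that level."""
    bits, zeros = [], []
    cur = list(seq)
    for lvl in range(levels):
        shift = levels - 1 - lvl
        b = [(x >> shift) & 1 for x in cur]
        bits.append(b)
        zeros.append(b.count(0))
        # Stable partition: symbols with bit 0 first, then bit 1.
        cur = ([x for x in cur if not ((x >> shift) & 1)] +
               [x for x in cur if (x >> shift) & 1])
    return bits, zeros

def access(bits, zeros, i):
    """Recover S[i] by tracking where position i moves at each level."""
    sym = 0
    for b, z in zip(bits, zeros):
        bit = b[i]
        sym = (sym << 1) | bit
        if bit == 0:
            i = b[:i].count(0)        # rank0(i): new position on the left
        else:
            i = z + b[:i].count(1)    # z + rank1(i): position on the right
    return sym

seq = [4, 7, 6, 5, 3, 2, 1, 0, 2, 1]
bits, zeros = build(seq, 3)
```

The linear `count` calls stand in for constant-time rank on compressed bit vectors, which is where the real structure gets both its speed and its space bound.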
In this paper we focus on representing Web and social graphs. Our work is motivated by the need to mine information from these graphs; thus our representations aim not only at compressing the graphs, but also at supporting efficient navigation. This allows us to process bigger graphs in main memory, avoiding the slowdown caused by resorting to external memory. We first show that by simply partitioning the graph, exploiting the fact that most links are intra-domain, and combining two existing techniques for Web graph compression, k2-trees [Brisaboa, Ladra and Navarro, SPIRE 2009] and RePair-Graph [Claude and Navarro, TWEB 2010], we obtain the best time/space trade-off for direct and reverse navigation compared to the state of the art. In social networks, splitting the graph to achieve a good decomposition is not easy. For this case, we explore a new proposal for indexing MPK linearizations [Maserrat and Pei, KDD 2010], which have proven to be an effective way of representing social networks in little space by exploiting common dense subgraphs. Our proposal offers better worst-case bounds in space and time, and is also a competitive alternative in practice.