Chapter 11: Web-based Tools—VO Region Inventory Service
2007
As the size and number of datasets available through the VO grow, it becomes increasingly
critical to have services that aid in locating and characterizing data pertinent
to a particular scientific problem. At the same time, this growth makes
that goal increasingly difficult to achieve. With a small number of datasets, it is
feasible to simply retrieve the data itself (as the NVO DataScope service does). At
intermediate scales, “count” DBMS searches (searches of the actual datasets which
return record counts rather than full data subsets) sent to each data provider will
work. However, neither of these approaches scales as the number of datasets expands
into the hundreds or thousands.
Dealing with the same problem internally, IRSA developed a compact and extremely
fast scheme for determining source counts for positional catalogs (and in
some cases image metadata) over arbitrarily large regions for multiple catalogs in a
fraction of a second. To show applicability to the VO in general, this service has
been extended with indices for all 4000+ catalogs in CDS Vizier (essentially all published
catalogs and source tables).
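The text does not spell out IRSA's index structure, so the following is only a minimal sketch of the general idea, assuming a simple fixed RA/Dec grid of 1-degree cells (a hypothetical stand-in, not IRSA's actual scheme). Source counts are precomputed per cell once, offline. A region query then sums the cells the region overlaps, so its cost depends on the number of cells covered rather than on catalog size. The names `cell_of`, `build_count_index`, `count_in_box` and `inventory` are illustrative only.

```python
from collections import defaultdict

# Hypothetical grid: fixed 1-degree cells in RA and Dec (not IRSA's scheme).
CELL_DEG = 1.0
N_RA = int(360 / CELL_DEG)
N_DEC = int(180 / CELL_DEG)


def cell_of(ra, dec):
    """Map a position in degrees to an integer (ra_index, dec_index) cell."""
    i = int(ra // CELL_DEG) % N_RA
    j = min(int((dec + 90.0) // CELL_DEG), N_DEC - 1)  # Dec = +90 joins top row
    return i, j


def build_count_index(sources):
    """Precompute per-cell source counts for one catalog (done once, offline)."""
    counts = defaultdict(int)
    for ra, dec in sources:
        counts[cell_of(ra, dec)] += 1
    return dict(counts)


def count_in_box(index, ra_min, ra_max, dec_min, dec_max):
    """Sum the counts of every cell overlapping an RA/Dec box.

    Assumes 0 <= ra_min <= ra_max < 360 (no wraparound) and dec_min <= dec_max.
    Partially covered edge cells are counted in full, so the result is an
    upper bound on the true number of sources inside the box.
    """
    i0, j0 = cell_of(ra_min, dec_min)
    i1, j1 = cell_of(ra_max, dec_max)
    return sum(index.get((i, j), 0)
               for i in range(i0, i1 + 1)
               for j in range(j0, j1 + 1))


def inventory(indices, box):
    """Count sources in the same region for every catalog index."""
    return {name: count_in_box(idx, *box) for name, idx in indices.items()}


indices = {
    "catA": build_count_index([(10.2, 5.1), (10.7, 5.9), (50.0, -20.0)]),
    "catB": build_count_index([(11.5, 5.5)]),
}
print(inventory(indices, (10.0, 11.0, 5.0, 6.0)))  # {'catA': 2, 'catB': 1}
```

The trade-off is exactness for speed. The catB source at RA 11.5 lies outside the box but falls in an overlapped edge cell, so it is still counted. Finer cells, or a hierarchical tessellation, would tighten that bound.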
In this chapter, we briefly describe the architecture of this service and then
describe how it can be used in a distributed system to retrieve rapid inventories of
all VO holdings while placing an insignificant load on any data supplier. Further,
we show how this tool can be used in conjunction with VO Registries and catalog
services to zero in on those datasets that are appropriate to the user’s needs.
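One way to picture the distributed use is a minimal sketch that assumes several hypothetical index "shards", each answering region queries from its own count index. A client sends the same region to all shards in parallel and merges the per-catalog counts. It then combines those counts with registry metadata to keep only relevant, non-empty catalogs. The shard callables, the `{'waveband': ...}` registry records, and all catalog names and counts below are invented for illustration. Only the index services are queried, never the data providers' own databases.

```python
from concurrent.futures import ThreadPoolExecutor


def query_shards(shards, region):
    """Send one region query to every index shard in parallel and merge counts.

    Each shard is a callable region -> {catalog: count}. Summing assumes the
    shards partition the data (by catalog or by sky area) without overlap.
    """
    merged = {}
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        for partial in pool.map(lambda shard: shard(region), shards):
            for name, n in partial.items():
                merged[name] = merged.get(name, 0) + n
    return merged


def select_catalogs(counts, registry, waveband):
    """Keep catalogs with sources in the region whose (hypothetical) registry
    record matches the requested waveband."""
    return sorted(name for name, n in counts.items()
                  if n > 0 and registry.get(name, {}).get("waveband") == waveband)


# Stand-ins for remote index services; names and counts are invented.
def shard_a(region):
    return {"catA": 12, "catB": 0}


def shard_b(region):
    return {"catC": 3}


registry = {"catA": {"waveband": "infrared"},
            "catB": {"waveband": "infrared"},
            "catC": {"waveband": "optical"}}

counts = query_shards([shard_a, shard_b], region=(10.0, 11.0, 5.0, 6.0))
print(counts)                                         # {'catA': 12, 'catB': 0, 'catC': 3}
print(select_catalogs(counts, registry, "infrared"))  # ['catA']
```

In a real deployment each shard would be a remote service call. Because every shard answers from a small precomputed index, adding shards spreads the work rather than increasing load on any archive.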
The initial implementation of this service consolidates custom binary index file
structures (external to any DBMS and therefore portable) at a single site to minimize
search times and implements the search interface as a simple CGI program. However,
the architecture is amenable to distribution. The next phase of development will focus
on metadata harvesting from data archives through a standard program interface and
distribution of the search processing across multiple service providers for redundancy
and parallelization.
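To illustrate what "custom binary index file structures (external to any DBMS and therefore portable)" can mean in practice, here is a minimal sketch of a hypothetical on-disk layout. It is not IRSA's actual format. Each record is a sorted, fixed-size pair: an unsigned 64-bit cell id and an unsigned 32-bit source count, written in explicit little-endian byte order so any platform reads the same bytes. Lookups binary-search the records directly, with no database engine involved. The names `write_index` and `CountIndex` are illustrative.

```python
import os
import struct
import tempfile

# Hypothetical record layout: little-endian uint64 cell id + uint32 count.
RECORD = struct.Struct("<QI")  # 12 bytes per record, no padding


def write_index(path, counts):
    """Write {cell_id: count} as fixed-size binary records sorted by cell id."""
    with open(path, "wb") as f:
        for cell_id in sorted(counts):
            f.write(RECORD.pack(cell_id, counts[cell_id]))


class CountIndex:
    """Read-only view of an index file; lookups binary-search the records."""

    def __init__(self, path):
        with open(path, "rb") as f:
            self.data = f.read()
        self.n = len(self.data) // RECORD.size

    def _cell_id(self, k):
        return RECORD.unpack_from(self.data, k * RECORD.size)[0]

    def count(self, cell_id):
        """Return the stored count for cell_id, or 0 if the cell is absent."""
        lo, hi = 0, self.n
        while lo < hi:  # find the first record with id >= cell_id
            mid = (lo + hi) // 2
            if self._cell_id(mid) < cell_id:
                lo = mid + 1
            else:
                hi = mid
        if lo < self.n:
            cid, n = RECORD.unpack_from(self.data, lo * RECORD.size)
            if cid == cell_id:
                return n
        return 0


path = os.path.join(tempfile.mkdtemp(), "catA.idx")
write_index(path, {42: 7, 3: 1, 1000: 25})
idx = CountIndex(path)
print(idx.count(42), idx.count(5))  # 7 0
```

Because such a file is just sorted bytes, it can be copied between sites unchanged. That property is what makes distributing the search processing across providers straightforward.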