Microsoft TerraServer stores aerial, satellite, and topographic images of the earth in a SQL database that has been available via the Internet since June 1998. It is a popular online atlas, combining twenty-two terabytes of image data from the United States Geological Survey (USGS). Initially the system demonstrated the scalability of PC hardware and of the Windows and SQL Server software on a single, mainframe-class processor [Barclay98]. Later, we focused on high availability by migrating to an active/passive cluster connected to an 18-terabyte Storage Area Network (SAN) provided by Compaq Computer Corporation [Barclay04]. In November 2003, we replaced the SAN cluster with a duplexed set of "white-box" PCs containing arrays of large, low-cost Serial ATA disks, which we dub TerraServer Bricks. Our goal is to operate the popular TerraServer web site with the same or higher availability than the TerraServer SAN at a fraction of the system and operations cost. This paper describes the hardware and software components of the TerraServer Bricks and our experience in configuring and operating this environment for the first year.
In this paper, we show how to use a Relational Database Management System in support of Finite Element Analysis (FEA). We believe this is a new way of thinking about data management in well-understood applications, preparing them for two major challenges: size and integration (globalization). Neither extreme size nor integration (with other applications over the Web) was a design concern 30 years ago, when the paradigm for FEA implementation was first formed. Database technology, on the other hand, has come a long way since its inception, and it is past time to highlight its usefulness to scientific computing and computer-based engineering. This series aims to widen the range of applications considered by database designers, and to help FEA users and application developers reap some of the benefits of database development.
This is Part II of a three-article series on using databases for Finite Element Analysis (FEA). It discusses (1) database design, (2) data loading, (3) typical use cases during grid building, (4) typical use cases during simulation (get and put), and (5) typical use cases during analysis (also covered in Part III), along with performance measures for these cases. It argues that using a database is simpler to implement than custom data schemas, has better performance because it can use data parallelism, and better supports FEA modularity and tool evolution because of database schema evolution, data independence, and self-defining data.
Wireless sensor networks can revolutionise soil ecology by providing measurements at temporal and spatial granularities previously impossible. This paper presents our first steps towards fulfilling that goal by developing and deploying two experimental soil monitoring networks at urban forests in Baltimore, MD. The nodes of these networks periodically measure soil moisture and temperature and store the measurements in local memory. Raw measurements are incrementally retrieved by a sensor gateway and persistently stored in a database. The database also stores calibrated versions of the collected data. The measurement database is available to third-party applications through various Web Services interfaces. At a high level, the deployments were successful in exposing high-level variations of soil factors. However, we have encountered a number of challenging technical problems: need for low-level programming at multiple levels, calibration across space and time, and sensor faults. These problems must be addressed before sensor networks can fulfil their potential as high-quality instruments that can be deployed by scientists without major effort or cost.
NAND flash densities have been doubling each year since 1996. Samsung announced that its 32-gigabit NAND flash chips would be available in 2007. This is consistent with Chang-gyu Hwang's flash memory growth model, which predicted that NAND flash densities would double each year until 2010. Hwang recently extended that 2003 prediction to 2012, suggesting 64 times the current density, or about 250 GB per chip. This is hard to credit, but Hwang and Samsung have delivered a 16-fold increase since his 2003 article, when 2-gigabit chips were just emerging. So we should be prepared for the day when a flash drive holds a terabyte(!). As Hwang points out in his article, mobile and consumer applications, rather than the PC ecosystem, are pushing this technology.
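The numbers above can be checked with a little arithmetic. A minimal sketch, assuming one doubling per year from the 2-gigabit chips cited for 2003 (the function name and baseline constants are illustrative, not from the original):

```python
# Hwang's growth model: per-chip NAND flash density doubles every year.
# Baseline: 2-Gbit chips were just emerging in 2003.
BASE_YEAR, BASE_GBIT = 2003, 2

def density_gbit(year):
    """Projected per-chip NAND density in gigabits, one doubling per year."""
    return BASE_GBIT * 2 ** (year - BASE_YEAR)

# 2003 -> 2007 is four doublings, a 16-fold increase:
print(density_gbit(2007))           # 32 (the 32-Gbit chips Samsung announced)

# 64x that density is 2048 Gbit, i.e. 256 GB per chip,
# in line with the "about 250 GB per chip" projection:
print(64 * density_gbit(2007) / 8)  # 256.0
```

This also makes clear why the prediction is aggressive: each further year of validity requires the entire installed fabrication capability to keep pace with another doubling.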
The Sloan Digital Sky Survey (SDSS) science database describes over 140 million objects and is over 1.5 TB in size. The SDSS Catalog Archive Server (CAS) provides several levels of query interface to the SDSS data via the SkyServer website. Most queries execute in seconds or minutes. However, some queries can take hours or days, either because they require non-index scans of the largest tables, because they request very large result sets, or because they represent very complex aggregations of the data. These "monster queries" not only take a long time, they also affect response times for everyone else; one or more of them can clog the entire system. To ameliorate this problem, we developed a multi-server, multi-queue batch job submission and tracking system for the CAS called CasJobs. The transfer of very large result sets from queries over the network is another serious problem. Statistics suggested that much of this data transfer is unnecessary; users would prefer to store results locally in order to allow further joins and filtering. To allow local analysis, we developed a system that gives users their own personal databases (MyDB) at the server side. Users may transfer data to their MyDB and perform further analysis before extracting it to their own machine. MyDB tables also provide a convenient way to share results of queries with collaborators without downloading them. CasJobs is built using SOAP XML Web services and has been in operation since May 2004.
Jim Gray is a Microsoft Distinguished Engineer. He is part of Microsoft's research group and is Manager of the Microsoft Bay Area Research Center. Over many years his work has focused on databases and transaction processing, and he was awarded the ACM Turing Award for his work on transaction processing. He has also been active in building online databases such as http://terraService.Net and http://skyserver.sdss.org. In this discussion, Dr. Gray talks about his view of the Grid, as seen through a Microsoft lens. He describes how 'the Grid' is composed of multiple communities and interests; the challenges that face each; Web Services and OGSA, placing these in context; and the commercial dimension.
Most scientific data will never be directly examined by scientists; rather it will be put into online databases where it will be analyzed and summarized by computer programs. Scientists increasingly see their instruments through online scientific archives and analysis tools, rather than examining the raw data. Today this analysis is primarily driven by scientists asking queries, but scientific archives are becoming active databases that self-organize and recognize interesting and anomalous facts as data arrives. In some fields, data from many different archives can be cross-correlated to produce new insights. Astronomy presents an excellent example of these trends; and, federating Astronomy archives presents interesting challenges for computer scientists.
This paper describes the Fourth Data Release of the Sloan Digital Sky Survey (SDSS), including all survey-quality data taken through 2004 June. The data release includes five-band photometric data for 180 million objects selected over 6670 deg² and 673,280 spectra of galaxies, quasars, and stars selected from 4783 deg² of those imaging data using the standard SDSS target selection algorithms. These numbers represent a roughly 27% increment over those of the Third Data Release; all the data from previous data releases are included in the present release. The Fourth Data Release also includes an additional 131,840 spectra of objects selected using a variety of alternative algorithms, to address scientific issues ranging from the kinematics of stars in the Milky Way thick disk to populations of faint galaxies and quasars.