Drill Down: Putting Data to Work
Unlike love, pioneering new technology almost always means having to say you’re sorry. With great fanfare in late June, Microsoft announced Microsoft TerraServer, which it claims is the world’s largest database on the Web (www.terraserver.microsoft.com). The site contains more than a terabyte of compressed aerial and satellite photographs of Earth (representing 3.5 terabytes of uncompressed image data), making up the world’s largest and most detailed atlas. According to Microsoft, TerraServer holds more information than all the HTML pages already on the Web, and it should double in size over the next year.
The images came from the United States Geological Survey (USGS) and a branch of the Russian Space Agency. The USGS photographs cover about 30 percent of the U.S. The Russian pictures are of Europe, Asia, Africa and the Americas. Users can zoom in as close as 1.0-meter resolution, which lets them identify buildings and cars but not people. The objective of the site is to demonstrate the scalability of Microsoft SQL Server 7.0 Enterprise Edition and Windows NT Enterprise Edition, as well as how customers can combine numeric, text, and image data in the same database and then deliver the information seamlessly over the Web.
Well, maybe. When I visited the site about two weeks after the announcement, the first message I got was an apology. It seems that the folks at Microsoft did not anticipate how many people would want to see an aerial view of their homes, businesses or favorite vacation spots. Users were assured that additional servers were being added to cut down on delays in accessing information.
Though I was stuck with a dial-up connection, I decided to soldier on, pointing and clicking on various maps until I thought I could take a peek at my home. On the fourth click, I found myself looking at an aerial map with pretty good resolution (it was built using 27 database queries), but I didn’t recognize anything. So I clicked a button called "Image Info" and learned that I was looking at Bombay Hook Island. Never heard of the place.
So I tried again. This time I wound up at Salem. Then, I entered my address in the "Find a Spot on Earth" section. Once again, I didn’t recognize the place. But why should I? My neighborhood has changed a lot since 1987, when the photograph was taken. In fact, it didn’t exist in 1987.
Don’t get me wrong. I think Microsoft’s TerraServer is an intriguing, impressive demonstration project. The idea that people can access 3.5 terabytes of complex data from anywhere on the Web in seconds is mind-boggling. It is just, as they say, not quite ready for prime time.
Aside from the usability and timeliness-of-data issues, the TerraServer project illustrates another important aspect of large-scale data distribution projects. Though Microsoft took the lead in developing the project, it partnered with Compaq Computer, Legato Systems and Storage Technology (StorageTek). Compaq supplied the hardware backbone with its 64-bit AlphaServer 8400 (with eight 400 MHz Alpha processors) and StorageWorks Enterprise Storage Array 10000 subsystem. Legato’s NetWorker for Microsoft Windows NT Server provides the mission-critical data protection and disaster recovery support. And StorageTek’s 9710 and 9714 digital linear tape libraries form the storage infrastructure. Then there were the content suppliers. This was a complex, collaborative effort.
Why the attention on this application? One of the primary targets for Microsoft’s SQL Server 7.0 Enterprise Edition is data warehouse applications, and data warehousing and related technologies will be the focus of this column in the months to come. Aficionados will know that the term data warehouse was first coined in 1990 by William Inmon, who, among other accomplishments, was a founder of Prism Solutions. He described a data warehouse as a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision needs. Put more simply, a data warehouse is a database built from data captured from different sources and used to support decision-making processes.
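To make three of those four properties concrete, here is a minimal sketch in Python, assuming a toy in-memory store rather than any real warehouse product. Rows are organized around a business subject (here, a hypothetical customer ID), every row carries the date of the load that produced it (time-variant), and the class offers no update or delete methods (nonvolatile). The class, field and attribute names are invented for illustration; integrating data from multiple sources is left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Row:
    subject_id: str   # the business subject the row describes (hypothetical customer ID)
    attribute: str    # e.g. "credit_limit" (illustrative name)
    value: float
    loaded_on: date   # time-variant: each row records which snapshot it came from


class Warehouse:
    """Toy append-only store: periodic loads add rows; nothing is updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, loaded_on, records):
        # records: iterable of (subject_id, attribute, value) from one extraction run
        for subject_id, attribute, value in records:
            self._rows.append(Row(subject_id, attribute, value, loaded_on))

    def as_of(self, subject_id, attribute, when):
        """Return the most recent value loaded on or before `when`, or None."""
        candidates = [r for r in self._rows
                      if r.subject_id == subject_id
                      and r.attribute == attribute
                      and r.loaded_on <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.loaded_on).value

    def history(self, subject_id, attribute):
        """All loaded values for one subject and attribute, oldest first."""
        rows = [r for r in self._rows
                if r.subject_id == subject_id and r.attribute == attribute]
        return [(r.loaded_on, r.value) for r in sorted(rows, key=lambda r: r.loaded_on)]
```

Loading a January snapshot and a June snapshot for the same customer leaves both rows in place, so `as_of` can still report what the value was in March. An operational system that overwrote the record in place could not answer that question.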
Over the last several years, the notion of data warehousing has become more complex, as related ideas such as data marts, data mining, and online analytical processing (OLAP) have emerged. Taken together, capitalizing on data warehousing and its associated concepts has become one of the most significant challenges facing enterprise information managers. The ability to effectively store, process and learn from corporate-wide data is becoming key for companies looking for a competitive edge.
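One core OLAP operation is the roll-up: summing a measure such as sales over the dimensions an analyst chooses to ignore, so that detailed facts collapse into coarser summaries by region, year and so on. The Python sketch below is a toy illustration only, assuming an in-memory list of fact rows; the regions, products and dollar figures are invented. Real OLAP servers typically precompute and index such aggregates rather than scanning every row on each request.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, year, sales in dollars)
FACTS = [
    ("East", "Widgets", 1997, 120.0),
    ("East", "Gadgets", 1997, 80.0),
    ("West", "Widgets", 1997, 200.0),
    ("East", "Widgets", 1998, 150.0),
    ("West", "Gadgets", 1998, 90.0),
]
DIMENSIONS = ("region", "product", "year")


def roll_up(facts, keep):
    """Sum the sales measure over every dimension not named in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in positions)  # the retained dimension values
        totals[key] += row[-1]                  # sales is the last column
    return dict(totals)
```

For example, `roll_up(FACTS, ("region",))` gives `{("East",): 350.0, ("West",): 290.0}`, and passing an empty tuple collapses everything into a single grand total.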
The force driving the interest in data warehousing is this: management has realized that it has a treasure trove of information stored on hard disks and tapes throughout the enterprise. If managers can access and analyze it, they can improve their business processes and decision making.
Consolidating data from different sources, structuring it in appropriate ways, and then analyzing and distributing the data on an as-needed basis is no trivial task. It touches on every element of enterprise information technology, including software for design and modeling, data movement and transportation, data storage, access and management, and data analysis, as well as computer hardware and storage issues. Not surprisingly, the market for the tools needed to realize the promise of data warehousing is expanding as well. According to a recent Frost & Sullivan study, the U.S. data warehouse software applications and tools market grew 23.7 percent in 1997 to $2.1 billion.
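The consolidation step is where much of the difficulty lies, because source systems rarely agree on names, units or codes. As a minimal sketch, assuming two hypothetical source systems (a billing system that stores amounts in cents and two-letter state codes, and a sales system that stores dollars, spelled-out state names and lowercase IDs), the Python below maps each onto one common schema before totaling. All record layouts and field names are invented for illustration.

```python
# Hypothetical source 1: amounts in cents, state as a two-letter code
BILLING = [
    {"cust": "C001", "amount_cents": 125050, "state": "MD"},
    {"cust": "C002", "amount_cents": 9900, "state": "VA"},
]
# Hypothetical source 2: amounts in dollars, state spelled out, different key name
SALES = [
    {"customer_id": "c001", "amount": 300.0, "state": "Maryland"},
]

STATE_CODES = {"Maryland": "MD", "Virginia": "VA"}


def from_billing(rec):
    # Normalize the billing layout onto the common schema
    return {"customer_id": rec["cust"].upper(),
            "amount_usd": rec["amount_cents"] / 100,
            "state": rec["state"]}


def from_sales(rec):
    # Normalize the sales layout; unknown state names pass through unchanged
    return {"customer_id": rec["customer_id"].upper(),
            "amount_usd": rec["amount"],
            "state": STATE_CODES.get(rec["state"], rec["state"])}


def consolidate(billing, sales):
    """Map both sources onto one schema, then total amounts per customer and state."""
    unified = [from_billing(r) for r in billing] + [from_sales(r) for r in sales]
    totals = {}
    for row in unified:
        key = (row["customer_id"], row["state"])
        totals[key] = totals.get(key, 0.0) + row["amount_usd"]
    return totals
```

Without the normalization functions, "C001" in billing and "c001" in sales would be counted as two different customers; in this sketch they merge into a single total of 1550.5 dollars.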
In this column, I will explore the many aspects of data warehousing and its related concepts. I will report on new technologies, case studies, best practices, and the companies supplying tools you can use to implement solutions. Please feel free to contact me with your observations and with tips on whatever has caught your attention.
Bill Gates often says that the network is like the digital nervous system of the enterprise. That may be. But information is its lifeblood. And data warehousing represents the next step in how information is created and applied. The successful deployment of this technology will help companies compete more effectively. This column is directed towards that goal.
ABOUT THE AUTHOR:
Dr. Elliot King is an Assistant Professor of Communication and Director of the New Media Center at Loyola College in Maryland. His research interests are in distributed information systems, new communications technology and the diffusion of innovation. He has written five books and hundreds of magazine and journal articles about the use of new information technology. He can be reached at (410) 356-3943, via fax at (410) 356-5217, or by e-mail at firstname.lastname@example.org.