In-Depth
        
        USDA-Approved Data Integration
        The USDA needed to access disparate databases scattered in regional agencies across the country. With a middleware solution and minimal programming, data now flows from an IBM mainframe into Microsoft SQL Server and then to users' browsers.
        
        
        What if you had a ton of data residing on a chain of islands stretching         from Alaska to Hawaii and no way to connect to it? And what if there was         no transportation—no ships, no planes, not even a phone line connecting         this string of islands? And perhaps each island has its own language.         And what if one day the VP of sales stopped by your desk and asked for         yesterday's sales figures for four different items on each of these islands?
 
Such was the situation at the United States Department of Agriculture         Rural Development. The agency is actually a collection of separate agencies         that provide loans, grants and technical assistance to rural communities         for business development, housing programs, community facilities such         as fire stations, fire trucks, jails, courthouses, hospitals, clinics         and schools, water and wastewater treatment systems, and solid waste disposal         programs. Congressional and executive representatives frequently come         to the USDA for data on which projects have been funded in their districts         or what areas haven't received adequate assistance.
      Answering such questions, however, was anything but a snap. With 800         local offices around the country and a host of programs in thousands of         communities, a lot of data resided in relatively inaccessible and unlinked         systems. All the management information systems were located in Kansas         City, including the rural community facilities tracking system (RCFTS),         and other stovepipe IT formats such as AMAS and ACQD. 
      Financial data on dollar amounts being issued for the various projects         was on a different system, the program loan accounting system (PLAS),         and couldn't be combined with other accounting systems using GLAS and         GLS formats. Although the accounting systems all resided on an IBM mainframe,         integration and analytical efforts were hampered by the need for skilled         programmers to enter and extract the data. As a result, reports were limited         to the most critical areas, were wide open to human error and were often         out of date by the time they were compiled. 
      Bridging the Islands
        J. Norman Reid, associate deputy administrator at the USDA's Office of         Community Development, one of the agencies within Rural Development, found         himself assuming the role of project manager for the USDA Rural Development         endeavor to integrate the various information systems for employees to         query on an intranet. Reid says OCD looked at a number of products, but         quickly decided on iWay Software's Enterprise Integration Suite as the         middleware needed to provide integration for the disparate data sources.         "There were two major technological advances that opened up this         solution for us. One was the Web, which we knew was an excellent way to         make a wide variety of information easily available to people. The second         factor was the iWay software, which let us take different databases and         platforms and connect to them. Once we'd done that, we didn't care any         longer what [system] they were on because we were just dealing with data,"         says Reid. 
      When Reid and his group started the data integration project, the initial         difficulty was getting somebody excited about it. "We knew what the         vision was, and what we could do," Reid says, "but telling that         to a politician was difficult because they can't see it in their minds."         Luckily, they were able to find someone at USDA-RD who was enough of a         believer and who was willing to go out on a limb and give them some money—initially         $100,000. 
      Once OCD had the first version of the product ready, which took approximately         three months, the agency did a demonstration with a data run that took         10 seconds to compute and load. Mike Grisby, the product manager on this         project, apologized to the chief budget officer at the demo for the slow         speed. The budget officer's response was a stare of disbelief—apparently,         they were accustomed to waiting months for that sort of information. After         that demo, OCD had no trouble getting additional support for the project.       
      Other problems encountered were neither technical nor software-related,         but rather inconsistencies with the data itself. Since OCD was integrating         incompatible data within the stovepipe systems, the group encountered         some data inconsistencies. The strange codes had to be either filtered         out or recoded.
      Another unexpected issue: Inadequate browsers in field offices. In some         instances, it took several years to get browsers updated in some offices,         which limited the use of certain features by users. Although part of the         problem was hardware limitations, most of the blame lay with slow communications—many         field offices had slow X.25 communication links. To minimize the amount         of data that needed to pass through the Web, OCD decided on HTML screens         using a minimum of Java code.
      Even in the computer world, knowledge is power. Reid frankly acknowledges         that the biggest problems were political. Workers at various levels felt         they owned a particular data set and were unwilling to share it. "We         fought that and that probably took us as much time as anything, and we're         still fighting that. We have one data set that we've been trying since         last November to get hold of, and only now is it looking like we might         get it."
                                                                                                                                   | Product                         Information |                                                                 |                           iWay Enterprise Integration                           Suite                         Components start at $6,000                         Application Server starts at $30,000                                                 iWay ETL                         Pricing starts at $10,100                                                iWay Software (an Information Builders                           Company)                         New York, N.Y.                         (212) 330-1700 phone                         (212) 564-1726 fax                          [email protected]                         www.iwaysoftware.com                                                 WebFOCUS                         Pricing starts at $10,000                                                 Information Builders Inc. (IBI)                         New York, N.Y.                         (212) 736-4433 phone                         (800) 969-4636 phone                         (212) 967-6406 fax                         [email protected]                         www.informationbuilders.com                         |                                                          |                                                        |         
      
      Daily Mainframe Data Access
        Using the iWay Enterprise Integration Suite, Reid's group set up its Web         Reporting System, which performs a daily read on the mainframe data, then         pulls it down to an Intel server running Microsoft SQL 7.0. From there,         it's posted to the Web with Information Builder's WebFOCUS. Using a standard,         Java-enabled Web browser, employees can select the agency, program, time         period and locations for which they want data. The data is then pulled         from the SQL datacube and integrated at the desktop level on the fly.         The results can be displayed onscreen in HTML or saved to an Excel spreadsheet.         For additional details, users can drill down to an OLAP table and add         or subtract variables to modify it, and can also change the way the results         are presented. This means information on specific programs is up-to-date         as of close of business the previous day. It's also available to field         staff and policymakers all over the country—and all without mainframe         reprogramming.
      Once the iWay software connection is made to the mainframe, the extraction,         transformation and load tool, iWay ETL, which also schedules the draws,         performs data cleansing on the data as it's brought down to the SQL server.         Invalid codes in the original data are filtered or recoded as the data         is delivered, resulting in a clean data set. By setting up the reads to         pull down only the variables needed, rather than the entire file, there         is a minimum of overlapping data—just what is needed to match the         files. "One of the advantages of the ETL product," says Reid,         "is that we can bring down a fairly slim file, do the joins between         the different tables we're pulling down, feed it to the SQL server and         it's ready to go." 
      Avoiding any modifications to the mainframe systems was a huge benefit,         but equally important was the fact that virtually no special training         was required for employees to access the information. Although Reid has         no hard-dollar figure for the project, he says it wasn't hugely expensive,         costing only about $200,000. That figure includes a server, the software         itself, which was "dirt cheap," and the labor to put it together.         Labor comprised the major cost, and mainly focused around setting up the         datacube in SQL Server, developing Web screens, and running down data         errors detected in the iWay ETL tool. Because the data mainly came from         existing files, very little data input was necessary, and only a small         amount of mainframe programmer involvement was needed to connect to the         mainframe databases. "It was very cost-effective, didn't require         re-architecting the systems or any particular training and didn't require         client software—it was a beautiful solution, and it still seems to         me that there isn't any better," Reid says.
      Implementation went quickly as well. The first system was built in about         two months, but unhappy with it, OCD rebuilt it over the next couple of         months. The agency continued to modify this preliminary version for the         next five to six months until settling on the present version. "Our         approach has been to take an idea and develop the hell out of it, and         if it doesn't work, then we change it. Our philosophy is that if we had         to do all the planning that a lot of organizations do, we'd never get         started, and we think, frankly, [this approach has] been a lot more efficient         for us." [See "Lessons Learned" on page 56 for additional         tips from Reid. —Ed.]
                                                                                                                                   | Lessons                         Learned  |                                                                 |                           J. Norman Reid, associate deputy administrator at the                           USDA's Office of Community Development, offers these                           tips for successful handling of a large legacy data                           integration project.                                                   - Don't muck around with excessive planning. Figure                             out what you need and do rapid prototyping.
                           -  Get a product out quickly, even if it's a very                             limited version. People can't envision what you're                             talking about until you can show it. If you get the                             initial product out, it'll buy you time to build what                             you really want.
                           -  Realize that a data integration project is going                             to involve partnerships, because in a large project,                             no one owns all the data. You need to be able to connect                             with those who own different pieces of the data, and                             make sure they're willing to share their pieces of                             the pie.
                           -  Some people try to buy prepackaged total solutions;                             consultants may tell you that there's one system that                             will do everything. Keep in mind that prepackaged                             solutions can be expensive and klutzy, and can require                             an extensive amount of training to use, as well as                             lots of programming to make it work in the first place.                             Instead, look for solutions that go through the Web                             browser. Basically, let users decide what they want                             to do without lots of training or understanding [of                             the underlying systems].
                           -  If you're creating a system for an executive manager,                             the Mapping Analysis Program has enormous potential                             for a lot of different agencies, departments, and                             even businesses, because there's a geographic component                             to what most people are doing. Understanding what's                             happening geographically can be extremely helpful                             and useful. 
                           -  Keep it simple and make it easily available. Use                             the Web browser approach—don't try to buy something                             that pretends to offer a total solution if it requires                             you to buy their software.
                                                  —S.J.B.                         |                                                          |                                                        |         
      
      Tip of the Iceberg
        OCD is now working on a system called the Mapping Analysis Program (MAP),         which is similar to the Web Reporting System, except that in addition         to displaying information to queries in tables, it accommodates the way         most people think—in visual terms—by displaying that data in         a map. This data comes from the same datacube populated by the mainframe         data brought down by the iWay software into the SQL database. Then, instead         of using WebFOCUS to post the data to the Web, it's run through a program         called ESRI ArcIMS that builds maps and posts them to the Web. 
                                                                                                                                   | Product                         Information |                                                                 |                           ESRI ArcIMS                         Pricing starts at $7,500                                                ESRI Inc.                         Redlands, Calif.                         (909) 793-2853 phone                         (800) 447-9778 phone                         (909) 793-5953 fax                          [email protected]                                                    www.esri.com                                                 Tele                           Atlas MultiNet (USA)                         Call for pricing                                                 Tele Atlas North America, Inc. (formerly                           Etak Inc.)                         Menlo Park, Calif.                         (650) 328-3825 phone                         (650) 328-3148 fax                          [email protected]                         www.teleatlas.com                                                    |                                                          |                                                        |         
      
      With this system in place, you can discover how much has been spent in         the U.S. for a particular program by going to the Web site and designating         parameters for the data you want. The program displays a U.S. map with         each state colored according to the amount of money spent on the program         you've chosen. The display will include a data table from WebFOCUS with         the actual dollar amounts state-by-state. From there, you can drill down         further by state or by county. The first phase of MAP was built in about         three months and was released last September. 
      Future plans center around adding value to the MAP system. They've just         acquired Tele Atlas MultiNet Software (formerly EtakMap Premium Data,         ESRI ShapeFile) that will allow them to make comparisons with map overlays.         So, for example, they'll be able to overlay high poverty areas with particular         projects in exact locations. As Reid says, "It becomes very intuitive         to do analyses, and you don't have to be a statistician to see what's         going on." As the USDA telecommunications are upgraded, OCD will         probably use more Java code, as it gives users greater drilldown and zoom-in         capabilities.
      Of course, there's always more data to be added. OCD is currently working         on entering housing data, which it hopes to have completed by this fall.         The agency also wants to be able to plug into disaster and environmental         information that's maintained by other departments, such as the Environmental         Protection Agency, the Federal Emergency Management Agency, the Natural         Resources Conservation Service, and others. "Because we take a comprehensive         view of what's happening in rural places, we need hundreds of different         kinds of information—some of which I can't even anticipate,"         Reid says.
      The plans don't stop there. As Reid says, "I can also see the possibility         of this being an entry point to other kinds of information. For example,         say you drill down in Illinois to a particular project in a particular         county. Why shouldn't that project also be hypertext-linked to a whole         array of information, correspondence, legal rulings and other kinds of         things relating to that project?" 
      Indeed, why not?