USDA-Approved Data Integration

The USDA needed to access disparate databases scattered in regional agencies across the country. With a middleware solution and minimal programming, data now flows from an IBM mainframe into Microsoft SQL Server and then to users' browsers.

What if you had a ton of data residing on a chain of islands stretching from Alaska to Hawaii and no way to connect to it? And what if there was no transportation—no ships, no planes, not even a phone line connecting this string of islands? And perhaps each island has its own language. And what if one day the VP of sales stopped by your desk and asked for yesterday's sales figures for four different items on each of these islands?

Such was the situation at the United States Department of Agriculture Rural Development. The agency is actually a collection of separate agencies that provide loans, grants and technical assistance to rural communities for business development, housing programs, community facilities such as fire stations, fire trucks, jails, courthouses, hospitals, clinics and schools, water and wastewater treatment systems, and solid waste disposal programs. Congressional and executive representatives frequently come to the USDA for data on which projects have been funded in their districts or what areas haven't received adequate assistance.

Answering such questions, however, was anything but a snap. With 800 local offices around the country and a host of programs in thousands of communities, a lot of data resided in relatively inaccessible and unlinked systems. All the management information systems were located in Kansas City, including the rural community facilities tracking system (RCFTS), and other stovepipe IT formats such as AMAS and ACQD.

Financial data on dollar amounts being issued for the various projects was on a different system, the program loan accounting system (PLAS), and couldn't be combined with other accounting systems using GLAS and GLS formats. Although the accounting systems all resided on an IBM mainframe, integration and analytical efforts were hampered by the need for skilled programmers to enter and extract the data. As a result, reports were limited to the most critical areas, were wide open to human error and were often out of date by the time they were compiled.

Bridging the Islands
J. Norman Reid, associate deputy administrator at the USDA's Office of Community Development, one of the agencies within Rural Development, found himself assuming the role of project manager for the USDA Rural Development endeavor to integrate the various information systems for employees to query on an intranet. Reid says OCD looked at a number of products, but quickly decided on iWay Software's Enterprise Integration Suite as the middleware needed to provide integration for the disparate data sources. "There were two major technological advances that opened up this solution for us. One was the Web, which we knew was an excellent way to make a wide variety of information easily available to people. The second factor was the iWay software, which let us take different databases and platforms and connect to them. Once we'd done that, we didn't care any longer what [system] they were on because we were just dealing with data," says Reid.

When Reid and his group started the data integration project, the initial difficulty was getting somebody excited about it. "We knew what the vision was, and what we could do," Reid says, "but telling that to a politician was difficult because they can't see it in their minds." Luckily, they were able to find someone at USDA-RD who was enough of a believer and who was willing to go out on a limb and give them some money—initially $100,000.

Once OCD had the first version of the product ready, which took approximately three months, the agency did a demonstration with a data run that took 10 seconds to compute and load. Mike Grisby, the product manager on this project, apologized to the chief budget officer at the demo for the slow speed. The budget officer's response was a stare of disbelief—apparently, they were accustomed to waiting months for that sort of information. After that demo, OCD had no trouble getting additional support for the project.

Other problems encountered were neither technical nor software-related, but rather inconsistencies with the data itself. Since OCD was integrating incompatible data within the stovepipe systems, the group encountered some data inconsistencies. The strange codes had to be either filtered out or recoded.

Another unexpected issue: Inadequate browsers in field offices. In some instances, it took several years to get browsers updated in some offices, which limited the use of certain features by users. Although part of the problem was hardware limitations, most of the blame lay with slow communications—many field offices had slow X.25 communication links. To minimize the amount of data that needed to pass through the Web, OCD decided on HTML screens using a minimum of Java code.

Even in the computer world, knowledge is power. Reid frankly acknowledges that the biggest problems were political. Workers at various levels felt they owned a particular data set and were unwilling to share it. "We fought that and that probably took us as much time as anything, and we're still fighting that. We have one data set that we've been trying since last November to get hold of, and only now is it looking like we might get it."

Product Information

iWay Enterprise Integration Suite

Components start at $6,000

Application Server starts at $30,000

iWay ETL

Pricing starts at $10,100

iWay Software (an Information Builders Company)

New York, N.Y.

(212) 330-1700 phone

(212) 564-1726 fax

info@iwaysoftware.com

www.iwaysoftware.com

WebFOCUS

Pricing starts at $10,000

Information Builders Inc. (IBI)

New York, N.Y.

(212) 736-4433 phone

(800) 969-4636 phone

(212) 967-6406 fax

askinfo@ibi.com

www.informationbuilders.com

Daily Mainframe Data Access
Using the iWay Enterprise Integration Suite, Reid's group set up its Web Reporting System, which performs a daily read on the mainframe data, then pulls it down to an Intel server running Microsoft SQL 7.0. From there, it's posted to the Web with Information Builder's WebFOCUS. Using a standard, Java-enabled Web browser, employees can select the agency, program, time period and locations for which they want data. The data is then pulled from the SQL datacube and integrated at the desktop level on the fly. The results can be displayed onscreen in HTML or saved to an Excel spreadsheet. For additional details, users can drill down to an OLAP table and add or subtract variables to modify it, and can also change the way the results are presented. This means information on specific programs is up-to-date as of close of business the previous day. It's also available to field staff and policymakers all over the country—and all without mainframe reprogramming.

Once the iWay software connection is made to the mainframe, the extraction, transformation and load tool, iWay ETL, which also schedules the draws, performs data cleansing on the data as it's brought down to the SQL server. Invalid codes in the original data are filtered or recoded as the data is delivered, resulting in a clean data set. By setting up the reads to pull down only the variables needed, rather than the entire file, there is a minimum of overlapping data—just what is needed to match the files. "One of the advantages of the ETL product," says Reid, "is that we can bring down a fairly slim file, do the joins between the different tables we're pulling down, feed it to the SQL server and it's ready to go."

Avoiding any modifications to the mainframe systems was a huge benefit, but equally important was the fact that virtually no special training was required for employees to access the information. Although Reid has no hard-dollar figure for the project, he says it wasn't hugely expensive, costing only about $200,000. That figure includes a server, the software itself, which was "dirt cheap," and the labor to put it together. Labor comprised the major cost, and mainly focused around setting up the datacube in SQL Server, developing Web screens, and running down data errors detected in the iWay ETL tool. Because the data mainly came from existing files, very little data input was necessary, and only a small amount of mainframe programmer involvement was needed to connect to the mainframe databases. "It was very cost-effective, didn't require re-architecting the systems or any particular training and didn't require client software—it was a beautiful solution, and it still seems to me that there isn't any better," Reid says.

Implementation went quickly as well. The first system was built in about two months, but unhappy with it, OCD rebuilt it over the next couple of months. The agency continued to modify this preliminary version for the next five to six months until settling on the present version. "Our approach has been to take an idea and develop the hell out of it, and if it doesn't work, then we change it. Our philosophy is that if we had to do all the planning that a lot of organizations do, we'd never get started, and we think, frankly, [this approach has] been a lot more efficient for us." [See "Lessons Learned" on page 56 for additional tips from Reid. —Ed.]

Lessons Learned

J. Norman Reid, associate deputy administrator at the USDA's Office of Community Development, offers these tips for successful handling of a large legacy data integration project.

  • Don't muck around with excessive planning. Figure out what you need and do rapid prototyping.
  • Get a product out quickly, even if it's a very limited version. People can't envision what you're talking about until you can show it. If you get the initial product out, it'll buy you time to build what you really want.
  • Realize that a data integration project is going to involve partnerships, because in a large project, no one owns all the data. You need to be able to connect with those who own different pieces of the data, and make sure they're willing to share their pieces of the pie.
  • Some people try to buy prepackaged total solutions; consultants may tell you that there's one system that will do everything. Keep in mind that prepackaged solutions can be expensive and klutzy, and can require an extensive amount of training to use, as well as lots of programming to make it work in the first place. Instead, look for solutions that go through the Web browser. Basically, let users decide what they want to do without lots of training or understanding [of the underlying systems].
  • If you're creating a system for an executive manager, the Mapping Analysis Program has enormous potential for a lot of different agencies, departments, and even businesses, because there's a geographic component to what most people are doing. Understanding what's happening geographically can be extremely helpful and useful.
  • Keep it simple and make it easily available. Use the Web browser approach—don't try to buy something that pretends to offer a total solution if it requires you to buy their software.

—S.J.B.

Tip of the Iceberg
OCD is now working on a system called the Mapping Analysis Program (MAP), which is similar to the Web Reporting System, except that in addition to displaying information to queries in tables, it accommodates the way most people think—in visual terms—by displaying that data in a map. This data comes from the same datacube populated by the mainframe data brought down by the iWay software into the SQL database. Then, instead of using WebFOCUS to post the data to the Web, it's run through a program called ESRI ArcIMS that builds maps and posts them to the Web.

Product Information

ESRI ArcIMS

Pricing starts at $7,500

ESRI Inc.

Redlands, Calif.

(909) 793-2853 phone

(800) 447-9778 phone

(909) 793-5953 fax

info@esri.com

www.esri.com

Tele Atlas MultiNet (USA)

Call for pricing

Tele Atlas North America, Inc. (formerly Etak Inc.)

Menlo Park, Calif.

(650) 328-3825 phone

(650) 328-3148 fax

info@na.teleatlas.com

www.teleatlas.com

With this system in place, you can discover how much has been spent in the U.S. for a particular program by going to the Web site and designating parameters for the data you want. The program displays a U.S. map with each state colored according to the amount of money spent on the program you've chosen. The display will include a data table from WebFOCUS with the actual dollar amounts state-by-state. From there, you can drill down further by state or by county. The first phase of MAP was built in about three months and was released last September.

Future plans center around adding value to the MAP system. They've just acquired Tele Atlas MultiNet Software (formerly EtakMap Premium Data, ESRI ShapeFile) that will allow them to make comparisons with map overlays. So, for example, they'll be able to overlay high poverty areas with particular projects in exact locations. As Reid says, "It becomes very intuitive to do analyses, and you don't have to be a statistician to see what's going on." As the USDA telecommunications are upgraded, OCD will probably use more Java code, as it gives users greater drilldown and zoom-in capabilities.

Of course, there's always more data to be added. OCD is currently working on entering housing data, which it hopes to have completed by this fall. The agency also wants to be able to plug into disaster and environmental information that's maintained by other departments, such as the Environmental Protection Agency, the Federal Emergency Management Agency, the Natural Resources Conservation Service, and others. "Because we take a comprehensive view of what's happening in rural places, we need hundreds of different kinds of information—some of which I can't even anticipate," Reid says.

The plans don't stop there. As Reid says, "I can also see the possibility of this being an entry point to other kinds of information. For example, say you drill down in Illinois to a particular project in a particular county. Why shouldn't that project also be hypertext-linked to a whole array of information, correspondence, legal rulings and other kinds of things relating to that project?"

Indeed, why not?