In-Depth

"Web" in the House

Web-Enabling a Data Warehouse for Complex Querying and Reporting

At first glance, the advantages of Web-enabling data warehouses appear overwhelming. Users can easily perform ad hoc and other queries against the database via a Web browser, eliminating the headaches associated with synchronizing distributed applications in client/server environments. Business and supply chain partners can easily retrieve eyes-only information based on their domain or IP address. The costs of scalability are minimal. The intuitive browser interface reduces training. And IT staff can focus on more strategic applications rather than satisfying user reporting requests.

Web-enabling a data warehouse is not a technical challenge anymore. For straightforward data marts or data warehouses, Java applets, ActiveX controls or even basic HTML can be used to dynamically download applications and data. For more complex applications, data requests can be easily converted into CGI scripts for database queries. Almost all the major 4GL applications used to create complex data queries and build client/server applications are now Web-compatible, either by generating results in HTML or by creating Java-based clients.
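To make the CGI route concrete, here is a minimal sketch of the idea, not any particular vendor's gateway: a small Java program that answers a query over JDBC and writes an HTML table to standard output, the way a CGI program returns results to the Web server. The connection URL, credentials and SALES_SUMMARY table are hypothetical placeholders.

    // Minimal CGI-style sketch: run a SQL query via JDBC and write an HTML
    // result table to standard output, as a CGI gateway program would.
    // The JDBC URL, credentials and SALES_SUMMARY table are hypothetical.
    import java.sql.*;

    public class QueryToHtml {
        public static void main(String[] args) throws SQLException {
            // CGI output: a header block, a blank line, then the document body.
            System.out.println("Content-Type: text/html\n");
            System.out.println("<html><body><table border=\"1\">");
            try (Connection con = DriverManager.getConnection(
                     "jdbc:odbc:warehouse", "report_user", "secret");
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT region, SUM(revenue) FROM SALES_SUMMARY GROUP BY region")) {
                while (rs.next()) {
                    // Emit one HTML row per result row.
                    System.out.println("<tr><td>" + rs.getString(1)
                            + "</td><td>" + rs.getBigDecimal(2) + "</td></tr>");
                }
            }
            System.out.println("</table></body></html>");
        }
    }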

But, as savvy managers know all too well, slam-dunk solutions are rare in IT. Web access to databases via "naked browsers" implies constant connectivity, limiting the applications' usefulness in mobile environments and placing additional workloads on servers. Database access and report generation can devour bandwidth, placing severe demands on a network infrastructure. ODBC and other interfaces must be created and maintained when there are multiple databases on multiple platforms.

Depending on the environment, a Web-enabled data warehouse can involve interweaving numerous multi-vendor point solutions, such as OLAP tools and report writers. And Internet access always raises the specter of maintaining security, especially if the Web-access tool is not integrated with the underlying security system of the data warehouse.

Develop Information in Context

While the technical concerns are important, the strategic issues raised by Web-enabled data warehouses are much more critical. For example, even universal access to simple data is no longer enough in an era of frictionless and borderless competition. What's now required is the next level of data retrieval: information in context. Information in context does more than answer basic "how much" or "how often" queries; instead, it places the answers within the framework of larger strategic, supply chain or competitive concerns.

This information in context should be accessible in a variety of ways across the organization, whether by the Web or client/server implementations. Ideally, this universal access can be combined with the capability for almost every manager to analyze the data from a variety of perspectives. For example, transactional or summarized data can be downloaded into various enterprise applications for comparative or other analysis. It can also generate the foundation for analyzing program, departmental or even organizational effectiveness, such as productivity or progress toward goals. Or it can support business intelligence activities for determining optimal marketing, competitive or other strategies.

Achieving these capabilities requires a fresh, strategic look at what data warehouses can deliver to the organization, how they can be integrated with the Web and/or the existing architecture, and what types of analyses and reporting best benefit the organization and perhaps even the supply chain. Of course, these issues must be looked at from the perspective of immediate and continuing IT support costs and capabilities. As a result, interweaving data warehousing and the Web means closely analyzing three key areas: database goals and capabilities, Web-related issues and IT support.

Optimize the Warehouse

Key database goals should include:

  • A robust, integrated data warehouse
  • OLAP with multidimensional data views
  • Support for industry protocols and other standards
  • Scalability
  • Access to either summarized or transactional data
  • Rapid responsiveness
  • Extensive ad hoc and other enterprise reporting capabilities, including "publish-and-subscribe" functionality
  • Metadata management
  • Structured and unstructured data support

Users should be able to drill down, online or offline, through the summarized data to the actual operational data, if necessary.

While both Web and client/server access should be transparent to the data warehouse, Web-related issues include selective partitioning to apportion data processing among servers and remote clients, security and "disconnected" support: limited database replication combined with the ability to support analysis even without a direct connection to the Internet.

Disconnected support is critical to overcome the Achilles heel of Web-enabled thin clients, which have to send a request to the server for even the simplest transaction. This complicates analysis and presentation when throughput can be choked by limited bandwidth or server workloads. Remote sales forces need disconnected support since they may have to conduct analysis at a client site or other location where Internet access is not available.
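As a rough illustration of what disconnected support involves (the class and file names below are invented for the example), a client can persist a replicated slice of summary data locally while connected, then answer analysis requests from that replica when no Internet connection is available:

    // A minimal sketch of "disconnected" support: replicate a summary result
    // set to a local file while connected, then serve analysis requests from
    // that replica when offline. Names and the file format are illustrative.
    import java.io.*;
    import java.util.*;

    public class DisconnectedCache {
        private static final File REPLICA = new File("sales_replica.ser");

        // Called while online: persist the fetched rows for later offline use.
        public static void replicate(List<String[]> rows) throws IOException {
            try (ObjectOutputStream out =
                     new ObjectOutputStream(new FileOutputStream(REPLICA))) {
                out.writeObject(new ArrayList<>(rows));
            }
        }

        // Called when offline: serve the locally replicated rows instead of
        // issuing a request to the server.
        @SuppressWarnings("unchecked")
        public static List<String[]> loadReplica()
                throws IOException, ClassNotFoundException {
            try (ObjectInputStream in =
                     new ObjectInputStream(new FileInputStream(REPLICA))) {
                return (List<String[]>) in.readObject();
            }
        }
    }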

Finally, IT issues that must be addressed include ease of development, 32-bit application support, potential integration through APIs, ODBC or other protocols, and Java and ActiveX support. Enabling disconnected support also requires limited electronic software distribution capability to keep applications and replicated databases current. Budgetary and administrative issues also include levels of support, training and infrastructure requirements.

The Warehouse Measures Up

Building a data warehouse means wrestling with many difficult issues, ranging from executive buy-in to data cleansing. It's important to remember that even the best data access tool will be ineffective if the back-end infrastructure and/or data structure is inadequate. But, eventually, every data warehouse will be judged on the two critical issues affecting users: analysis and reporting.

OLAP was once the heavy artillery of data analysis, reserved for the technically skilled business analysts who make up about 5 to 15 percent of all database users. But recent offerings from Microsoft, Oracle, IBM and other vendors underscore the trend toward what's been termed "OLAP for the masses": multidimensional data analysis without the technical hurdles.

Data tools, even those enabling Web access, should offer two forms of OLAP analysis: ad hoc and turnkey. Ad hoc OLAP refers to the ability to conduct analysis of multiple, "one-time" variables, such as dates, age groups or collections of states or other groupings, sometimes in highly complex configurations. Turnkey OLAP also provides multidimensional analysis, but against highly defined data marts or warehouses. Of course, the effectiveness and speed of both types of analysis depend on whether they are working against transactional or summarized data.
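The following toy Java sketch illustrates the ad hoc flavor, with invented record fields: transactional records are rolled up along whatever combination of dimensions the analyst picks at query time.

    // A toy illustration of ad hoc OLAP: aggregate transactional records
    // along a caller-chosen, "one-time" combination of dimensions.
    // The Txn fields (state, ageGroup, amount) are invented for the example.
    import java.util.*;
    import java.util.function.Function;

    public class AdHocRollup {
        record Txn(String state, String ageGroup, double amount) {}

        // Group by an arbitrary set of dimensions and sum the amounts.
        static Map<String, Double> rollup(List<Txn> txns,
                                          List<Function<Txn, String>> dims) {
            Map<String, Double> totals = new TreeMap<>();
            for (Txn t : txns) {
                StringBuilder key = new StringBuilder();
                for (Function<Txn, String> d : dims) key.append(d.apply(t)).append('|');
                totals.merge(key.toString(), t.amount(), Double::sum);
            }
            return totals;
        }

        public static void main(String[] args) {
            List<Txn> txns = List.of(
                new Txn("NY", "18-34", 120.0), new Txn("NY", "35-54", 80.0),
                new Txn("CA", "18-34", 200.0));
            // "One-time" grouping chosen ad hoc: state by age group.
            System.out.println(rollup(txns, List.of(Txn::state, Txn::ageGroup)));
        }
    }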

Reporting for both client/server and Web-enabled data tools is another area where functionality should be exceptional, since, after all, reports are at the nexus of usage. Reporting capabilities come in two flavors.

First are the standardized reports that provide a periscope on the operations of every enterprise (inventory levels, sales, commissions and so on) and are produced based on a regular schedule or event. Often, organizations build their own standardized reports based on executive requirements, but "out-of-the-box" pre-formatted reports for the Web or client/server are available.

By comparison, ad hoc reports are one-time summaries compiled in response to specific queries. When required, ad hoc reports should easily convert into standardized reports without complex programming. Comprehensive data tools can distribute both types of reports automatically via e-mail, printers and faxes. Alternatively, they can be posted on a Web site for "publish-and-subscribe" or user viewing/downloading.
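A bare-bones sketch of the publish-and-subscribe pattern, with illustrative names rather than any product's actual API, registers delivery channels per report and pushes each new edition to all of them:

    // A bare-bones publish-and-subscribe sketch: users subscribe to a report
    // by name, and each new edition is pushed to every registered delivery
    // channel (e-mail, printer, fax, Web posting). Names are illustrative.
    import java.util.*;
    import java.util.function.Consumer;

    public class ReportPublisher {
        private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

        // Register a delivery channel for a named report.
        public void subscribe(String reportName, Consumer<String> channel) {
            subscribers.computeIfAbsent(reportName, k -> new ArrayList<>()).add(channel);
        }

        // Push a freshly generated report body to every subscriber.
        public void publish(String reportName, String reportBody) {
            for (Consumer<String> channel :
                    subscribers.getOrDefault(reportName, List.of())) {
                channel.accept(reportBody);
            }
        }

        public static void main(String[] args) {
            ReportPublisher pub = new ReportPublisher();
            pub.subscribe("inventory", body -> System.out.println("e-mail: " + body));
            pub.subscribe("inventory", body -> System.out.println("Web post: " + body));
            pub.publish("inventory", "Widget stock: 4,200 units");
        }
    }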

Reporting as a Strategic Tool

The ability to publish on the Web or distribute via e-mail opens up new opportunities to integrate business partners into a supply chain. Managers can define which customers/suppliers receive which reports. Thresholds can be set so that, for example, reports are only sent if inventory levels rise above or fall below set levels, or if the timing of deliveries is outside contractual agreements.
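A threshold check of this kind can be as simple as the following Java sketch, where the water marks and the supplier alert are invented for illustration:

    // A minimal sketch of threshold-driven distribution: a report is sent to
    // a partner only when inventory moves outside an agreed band.
    // The bounds and the alert text are invented for the example.
    public class ThresholdTrigger {
        static final int LOW_WATER = 500, HIGH_WATER = 10_000;

        static boolean shouldSendReport(int inventoryLevel) {
            // Outside the contractual band -> notify the supply chain partner.
            return inventoryLevel < LOW_WATER || inventoryLevel > HIGH_WATER;
        }

        public static void main(String[] args) {
            int current = 430;
            if (shouldSendReport(current)) {
                System.out.println("Sending inventory alert report to supplier: "
                        + current + " units on hand");
            }
        }
    }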

Managed reports combine the advantages of both types of reports. IT managers or analysts can create a number of parameter-driven components that are available through the Web. These components are labeled using common business terms and grouped according to different classes of users, which may depend on rank or security authorization. Even novice users can easily assemble these components to create detailed custom reports without requiring advanced analyst skills.
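In Java terms, a managed-report catalog might look like the following sketch, with invented component names and user classes: each component carries a business label and the classes of users allowed to assemble it.

    // A sketch of "managed reports": parameter-driven components labeled in
    // business terms and filtered by user class, which a novice can assemble
    // into a custom report. All names and fragments here are illustrative.
    import java.util.*;

    public class ManagedReport {
        record Component(String businessLabel, String sqlFragment,
                         Set<String> allowedClasses) {}

        static final List<Component> CATALOG = List.of(
            new Component("Sales by Region", "SELECT region, SUM(revenue) ...",
                          Set.of("manager", "analyst")),
            new Component("Commission Detail", "SELECT rep, commission ...",
                          Set.of("analyst")));

        // Return only the components this class of user may assemble.
        static List<Component> visibleTo(String userClass) {
            return CATALOG.stream()
                    .filter(c -> c.allowedClasses().contains(userClass))
                    .toList();
        }

        public static void main(String[] args) {
            // A manager sees "Sales by Region" but not "Commission Detail".
            visibleTo("manager").forEach(c -> System.out.println(c.businessLabel()));
        }
    }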

Report management is critical to avoid taxing system resources and user time. One report management technique is to use the first report as a launch pad for future reports. When the initial report comes back, built-in OLAP navigation lets users change report parameters and drill down on summary data for additional detail without having to create a new report. Another report management technique, common in client/server systems but rare in Web-based ones, is called on-demand paging. This allows users to download only the specific pages required instead of the entire report. Users can navigate via hyperlinks instead of through scrolling.
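A minimal sketch of on-demand paging, with an illustrative page size, shows the idea: the server keeps the full report and returns only the page the user's hyperlink requests.

    // A minimal on-demand paging sketch: the server holds the full report and
    // hands back only the page a user asks for, so the client never downloads
    // the whole document. Page size and report contents are illustrative.
    import java.util.List;

    public class OnDemandPager {
        static final int PAGE_SIZE = 3;

        // Return just the requested page (1-based) of report lines.
        static List<String> page(List<String> reportLines, int pageNumber) {
            int from = Math.min((pageNumber - 1) * PAGE_SIZE, reportLines.size());
            int to = Math.min(from + PAGE_SIZE, reportLines.size());
            return reportLines.subList(from, to);
        }

        public static void main(String[] args) {
            List<String> report = List.of("row1", "row2", "row3", "row4", "row5");
            // A hyperlink to "page 2" fetches only [row4, row5].
            System.out.println(page(report, 2));
        }
    }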

Data Analysis on the Cheap?

Despite the ease and advantages of adding Web-enabled or other decision-support capabilities to the organization, some companies still refuse to see a need for the required investment. They argue that adequate decision-support capabilities can be provided by ERP systems. Or they try to stretch the functionality of a report writer.

But these arguments have a downside. What kind of competitive distinction lies in being able to develop the same analysis and reporting capabilities as other companies with similar systems? At the same time, point solutions have claimed broader capabilities in a bid to expand market share, but they lack the functionality required for in-depth analysis or flexible distribution, or can be difficult to integrate effectively.

Web-enabling a data mart or data warehouse has a seductive lure. It's relatively easy, and it avoids the cost-of-ownership issues that turned out to be the albatross around the neck of client/server. But as with most seemingly simple solutions, there's more here than meets the eye. The data warehouse must be infrastructurally sound, regardless of whether it's Web-enabled or not. Web-enabling a data warehouse can place tremendous demands on a network, and force users to remain connected to conduct even the simplest analysis.

Many "naked" Web-enabling data tools can't do the industrial-strength OLAP orother analysis that users demand. Or they don't have the flexible reporting capabilitiesthat can be tailored to the needs of the corporation or infrastructural resources. As aresult, companies should look for the best tools that not only offer complete Webcapabilities, but also offer the advanced functionality that can place information incontext, permit the development of structured enterprise analysis and enhance supply chainintegration.

About the Author:

David Cook is General Manager of the Desktop Technologies Business Unit at Information Builders, Inc.


Scrape It, Don't Scrap It

By Steve Gimnicher

Like a pot of gold sitting at the end of a rainbow, your treasured legacy applications provide reliable business processing for your corporation. But, driven by the unending need to improve business-processing efficiency and expand your customer base, you are struggling to respond quickly.

Ask yourself these questions:

  • How can you integrate your applications in real-time with your call center?
  • How will you migrate your legacy applications to your new packaged applications?
  • How can you consolidate multiple back-end legacy applications?
  • How will you respond to the new requirements based on deregulation in your industry?

If you think the only way to address these issues is to get rid of your legacy systems and start over, think again. The answer lies in scraping, not scrapping.

If you've tried screen scraping before, you probably wrote some complicated HLLAPI programs, or perhaps you built an administrative and communication infrastructure to deploy screen-scraping applications to your clients. And when the screen definitions changed, your applications broke.

The issue here is not with screen scraping, but rather with the approach. You initially chose screen scraping because you understood the inherent elegance and value of being able to access your legacy applications via a session, eliminating the need to change your business applications, data or infrastructure. Only via a session can you fully maintain the business rules and transactional integrity enforced by your application code and access the unique data integrated into the presentation screens: values that would be lost if the data were simply accessed directly.

New-generation screen scrapers are arriving that comprise a unique combination of a graphical development studio, middleware and an application server. They solve the problems of past screen scrapers and deliver all the benefits. Now you can extend and integrate the 20-plus years of existing business logic without making any changes to the applications or the system on which they run. In addition, you can do these tasks in less time: a typical project from start to finish takes about two months.

New screen scrapers go beyond traditional screen scraping in two fundamental ways. First, they are server-based, an approach that is vital to any large-scale screen scraping deployment. Instead of having to manage each client individually, newer screen scrapers provide centralized administration. In addition, as you apply a screen scraper to a legacy system, practically any environment can then reuse the screens from that system. This includes the Web, Windows and UNIX, as well as other vendors' offerings, including call center applications, workflow managers, transaction monitors, component brokers and client/server development tools.

A server-based approach also means optimized performance. These solutions can allow many applications to be accessed concurrently, synchronously or asynchronously, thereby enabling integration. For example, the information from several legacy application screens can be scraped and integrated into a single HTML form, eliminating much of the interaction between the application and the application user. Moreover, since these services can be shared, the infrastructure required to support concurrent host access can be minimized.

Integration can be extended beyond screen applications as well. For instance, you may wish to authenticate your users against a relational database prior to allowing them to connect to the host.
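As a sketch of that pre-connection check (the JDBC URL and USERS table are hypothetical, and production code would hash the password), a screen-scraping server might run a parameterized lookup before opening the host session:

    // A sketch of pre-connection authentication: check the user against a
    // relational database before opening the host session. The JDBC URL and
    // USERS table are hypothetical; production code would hash passwords.
    import java.sql.*;

    public class HostGate {
        static boolean authenticate(String user, String password) throws SQLException {
            String sql = "SELECT 1 FROM USERS WHERE name = ? AND password = ?";
            try (Connection con = DriverManager.getConnection("jdbc:odbc:authdb");
                 PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setString(1, user);
                ps.setString(2, password);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next(); // a matching row means the user may connect
                }
            }
        }
    }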

Second, screen information is stored externally to the compiled code. This means that when the screen changes (as it invariably does), the screen can be redefined and re-deployed into production without any disruption, in minutes instead of days or weeks. In addition, GUI tools are provided to make this process even simpler. When changes are made to a screen, they are often quite subtle, perhaps involving a shift of a field by one or two screen positions. Because this can be very difficult to see, a graphical tool for comparing before and after images is useful.
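One way to externalize screen knowledge, sketched below with invented file and key names, is to keep each field's row, column and length in a properties file and read the field out of the 80-column screen buffer at runtime:

    // A sketch of externalized screen definitions: field positions live in a
    // properties file, not in compiled code, so a shifted field means editing
    // one line and re-deploying in minutes. File and key names are illustrative.
    import java.io.*;
    import java.util.Properties;

    public class ScreenMap {
        private final Properties fields = new Properties();

        ScreenMap(String definitionFile) throws IOException {
            try (FileInputStream in = new FileInputStream(definitionFile)) {
                fields.load(in); // e.g. "confirmation=14,32,8" -> row,col,length
            }
        }

        // Pull a named field out of an 80-column terminal screen buffer.
        String extract(String fieldName, String screenBuffer) {
            String[] pos = fields.getProperty(fieldName).split(",");
            int row = Integer.parseInt(pos[0]), col = Integer.parseInt(pos[1]),
                len = Integer.parseInt(pos[2]);
            int start = row * 80 + col;
            return screenBuffer.substring(start, start + len).trim();
        }
    }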

As an example, consider the process a customer service representative at a travel agency follows to book a trip. The applications he must access include:

  • A 3270 application on an IBM MVS system for airline reservations
  • A 5250 application on an IBM AS/400 for hotel reservations
  • A VT220 application on a Digital VAX for car reservations

Completing the booking requires interfacing with each application through its native terminal interface. Can you imagine the knowledge required and the amount of time needed to complete one business transaction? Remember that often the customer is in line, waiting for an immediate response.

Screen scraping provides the ability to create a single, content-rich GUI that can be used in any Web browser, transparently interacting with multiple legacy applications behind the scenes. The representative enters flight, hotel and car rental information into the browser form. When this form is submitted, the screen-scraping server receives the information from the Web server. It then simultaneously manipulates the screens for each of the three applications, extracts confirmation numbers for each of the reservations, integrates them into a single HTML document and sends that document back to the Web browser via the Web server.
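That flow might be sketched in Java as follows, with simulated bookX methods standing in for the real 3270, 5250 and VT220 screen navigation: the three sessions run concurrently and the confirmations are merged into one HTML document.

    // A sketch of the travel-agency flow: three host sessions are driven in
    // parallel, each yielding a confirmation number, and the results are
    // merged into one HTML page. The bookX methods are simulations.
    import java.util.concurrent.*;

    public class TripBooker {
        static String bookFlight() { return "FL-8841"; }   // 3270 session (simulated)
        static String bookHotel()  { return "HT-2290"; }   // 5250 session (simulated)
        static String bookCar()    { return "CR-5517"; }   // VT220 session (simulated)

        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(3);
            // Drive all three legacy applications concurrently.
            Future<String> flight = pool.submit(TripBooker::bookFlight);
            Future<String> hotel  = pool.submit(TripBooker::bookHotel);
            Future<String> car    = pool.submit(TripBooker::bookCar);

            // Merge the three confirmations into a single HTML document.
            String html = "<html><body><ul>"
                    + "<li>Flight: " + flight.get() + "</li>"
                    + "<li>Hotel: "  + hotel.get()  + "</li>"
                    + "<li>Car: "    + car.get()    + "</li>"
                    + "</ul></body></html>";
            pool.shutdown();
            System.out.println(html);
        }
    }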

This is what screen scraping is all about: accessing legacy applications without the need to change your business rules or transactional integrity. The result is an environment where the IT organization can stay ahead of end-user demands, and end users can spend their time focused on their customers, not their systems.

About the Author:

Steve Gimnicher is Vice President of the Reengineering Business Unit for the CNT Enterprise Integration Solutions Group.
