The Sources Of Production
Divining Data And Building Corporate Intelligence.
You need to move data securely all over the enterprise, crossing languages, operating systems, databases and applications, while still providing "friendly data" that people can and will use. Many pieces of data warehousing, previously jury-rigged, are now being "packaged" by collaborating vendors.
Immediately after itemizing the requirements of their industry and business environment, IT managers generally hit one technological hurdle. "There are hundreds of different sources of information a company may have to feed reporting applications," explains Bill Seagrave, President, Data 2 Knowledge, Inc. (D2K; San Jose, Calif.), which specializes in data transforms between mainframe and other platforms for data mart and data warehouse provisioning. "IT managers are faced with multiple operating systems, database technologies and languages, not to mention how complex the reporting applications are themselves."
With so many potential data sources in an enterprise network, coordinating data access is both a complex problem and a ripe marketplace. Libraries long ago understood that finding the information you need takes more than rifling through thousands of books. That's why they created card catalogs and, now, online catalogs. The equivalent information about data stored in databases across a company network is called metadata. Metadata is required to help the many front-end user applications find the necessary information on the back end, whether it's in a client-server or legacy system.
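The card-catalog analogy can be made concrete with a small sketch. The example below is illustrative only: the class names, system labels, tables and columns are hypothetical and do not come from any vendor's metadata product. It shows a front-end application asking a catalog where a business term physically lives instead of searching every back-end system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One 'catalog card': where a business term physically lives."""
    business_name: str  # the name a report author would search for
    system: str         # hypothetical source-system label
    table: str
    column: str
    data_type: str


class MetadataCatalog:
    """A toy metadata layer keyed by case-insensitive business name."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.business_name.lower()] = entry

    def locate(self, business_name):
        """Return the entry for a business term, or None if it is uncatalogued."""
        return self._entries.get(business_name.lower())


# Hypothetical entries: one legacy mainframe source, one client-server source.
catalog = MetadataCatalog()
catalog.register(CatalogEntry("Customer Revenue", "mainframe_billing",
                              "BILL01", "CUST_REV_AMT", "DECIMAL(12,2)"))
catalog.register(CatalogEntry("Order Date", "orders_db",
                              "orders", "order_dt", "DATE"))

hit = catalog.locate("customer revenue")
print(hit.system, hit.table, hit.column)
```

A production metadata layer would also carry lineage, refresh schedules and transformation rules; this sketch covers only the lookup step.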
Data Compass
Metadata also provides navigation information for applications and coordinates database updates and general data flow. Some vendors, such as Hyperion (Sunnyvale, Calif.) and Oracle (Redwood Shores, Calif.), provide their metadata layer, along with tools and utilities for data extraction, scrubbing, transformation and movement, in the software. But start-ups such as D2K recognize that at a given client site there will be few, if any, people who understand all the variables that will ultimately impact or contribute to the success (or failure) of a data warehouse.
Hence, companies are compelled to call on the services of vendors and consultants who've developed and practiced this special expertise. These new providers package standalone metadata software for use in many environments.
Brightening the picture and making greater legacy integration feasible are vendor-initiated data transfer standards that are gaining momentum. The Meta Data Coalition (Austin, Texas; www.mdcinfo.com) is an alliance of software vendors and users who've defined the Open Information Model specifications, which facilitate sharing and re-use in data resource domains and are based on the Unified Modeling Language. These standards streamline translation, staging, cleansing and QA of the data.
Metadata Matters
Metadata is a must for high-performance data warehouses in large environments. It's closing one critical gap in data warehouse implementations: enabling databases and distributed client-server applications to cooperate and make use of the metadata generated by each. But again, whether to use homegrown or packaged apps depends on your particular needs.
Providing large amounts of corporate information to everyone across the network is a balancing act. Companies need to be sure they're sending data to the people who really need it and who are who they say they are. Consequently, security has become an important topic in the data warehousing community and is generating eminently workable solutions.
Many data warehouses require that all usernames, passwords and returned information be encrypted. Users receive a public key issued by a certificate authority and stored in the Web browser. A private key at the data warehouse matches the public key at the time of any query. Once the keys match, the user is in and can search for needed information.
Certificates are another valuable way to protect a data warehouse. They're much more difficult to sniff out than passwords, making it harder for someone to pretend to be someone else. Encrypting the returned information ensures that no one other than the user can view a query's results. When companies are dealing with sensitive information about customer profitability, product plans, or rollover, for example, these security solutions are very important.
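One common way to combine encrypted traffic with certificate-based identity today is TLS with client certificates. The article does not name a protocol, so that choice is an assumption of this sketch, as are the function name and the `ca_file` parameter. It uses only Python's standard `ssl` module to configure the server side of a connection so that it encrypts traffic and refuses clients that do not present a certificate.

```python
import ssl


def build_warehouse_tls_context(ca_file=None):
    """Server-side TLS context that encrypts traffic and demands a client certificate.

    `ca_file` is a hypothetical path to the certificate authority bundle used
    to verify client certificates. A real server would also call
    ctx.load_cert_chain(...) with its own certificate and private key.
    """
    # Purpose.CLIENT_AUTH yields a context for server-side sockets,
    # that is, sockets that authenticate connecting clients.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Reject any client that does not present a certificate the CA vouches for.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file is not None:
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


ctx = build_warehouse_tls_context()
print(ctx.verify_mode)
```

With this configuration the handshake fails unless the browser or client tool presents a certificate signed by a trusted authority, and query results travel over the encrypted channel.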
Integration of the data warehouse into Enterprise Resource Planning (ERP) and Supply Chain Management (SCM) systems is becoming considerably tighter. HP application engineers continue to work closely with vendors who are developing end-to-end solutions, such as SAP's Business Warehouse and i2's RHYTHM suite, to optimize data warehouse methodologies, technologies and services for HP equipment. These developments have already improved companies' track records in quick-start implementation, high availability and highly automated backup and recovery.
It's the open nature of OLAP applications that links them to data warehouse applications. OLAP provides for the essential sharing of sets of user and functional requirements that cannot be met by traditional query or personal-productivity tools working directly against historical data maintained in the data warehouse relational database. An OLAP server provides functionality and performance that leverage the data warehouse for reporting, analysis, modeling and planning requirements. These processes mandate that the organization look not only at past performance but, more importantly, at the expected future performance of the business.
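The kind of analysis an OLAP server performs can be illustrated with a roll-up, which summarizes a measure along chosen dimensions. The sketch below is a minimal illustration, not any OLAP product's API: the fact rows, dimension names and `roll_up` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, revenue)
FACTS = [
    ("West", "Printers", "Q1", 120.0),
    ("West", "Printers", "Q2", 135.0),
    ("West", "Servers", "Q1", 300.0),
    ("East", "Printers", "Q1", 90.0),
    ("East", "Servers", "Q2", 410.0),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, keep):
    """Sum revenue over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the last field is the revenue measure
    return dict(totals)


by_region = roll_up(FACTS, ["region"])
by_region_quarter = roll_up(FACTS, ["region", "quarter"])
print(by_region)
print(by_region_quarter)
```

The same facts answer questions at different grains without a new query being written for each one, which is the capability that simple query tools working directly against the relational tables tend to lack.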
Break The Law and Fall Up
There is a bumper sticker that reads: "Gravity, it's not just the law, it's a good idea." You can have fun with the same logic applied to the data warehouse. It's not just essential, but plainly a good idea, to create operational scenarios shaped by the past and that include planned and potential changes that could impact tomorrow's corporate performance.
Many companies that have watched data warehousing projects from a distance have learned important lessons from the first decade of implementations. Because proper project scope is so important and a speedy return on investment so critical, many organizations are now opting for smaller, phased rollouts of their data warehouse strategy.
Putting together bite-sized pieces (300-800GB), often called data marts, gives IT more opportunity to test and debug technology and strategy (and gain management approval) before installing other segments. Because data marts are generally department-specific and run on homogeneous hardware and software, they're much less complex to implement.
In addition, IT can choose the neediest departments and address those needs quickly. If Finance needs to keep track of general ledger, accounts, or budgeting, for instance, IT can build a data mart with relevant data so that financial planners won't have to wait for the entire project to be complete before they get the analytical tools they need.
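The Finance example can be sketched with Python's built-in `sqlite3` module. The table names, columns and figures here are hypothetical, and an in-memory database stands in for real source systems. The point is only that a data mart can be carved out as a department-specific, pre-summarized subset of the wider data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical enterprise-wide source table.
    CREATE TABLE ledger_entries (
        entry_id   INTEGER PRIMARY KEY,
        department TEXT NOT NULL,
        account    TEXT NOT NULL,
        amount     REAL NOT NULL
    );
    INSERT INTO ledger_entries (department, account, amount) VALUES
        ('Finance',   'General Ledger', 1200.0),
        ('Finance',   'Budget',          800.0),
        ('Marketing', 'Campaigns',       450.0),
        ('Finance',   'General Ledger',  300.0);

    -- The "data mart": only Finance rows, summarized by account.
    CREATE TABLE finance_mart AS
        SELECT account,
               SUM(amount) AS total_amount,
               COUNT(*)    AS entry_count
        FROM ledger_entries
        WHERE department = 'Finance'
        GROUP BY account;
""")

rows = conn.execute(
    "SELECT account, total_amount, entry_count FROM finance_mart ORDER BY account"
).fetchall()
print(rows)
```

Financial planners can query `finance_mart` right away, while the rest of the warehouse is still being built.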
If budget dollars have to be justified to upper management, IT can also use a data mart to prove the value of a data warehouse in microcosm. A data mart is significantly less expensive, provides faster ROI and can serve as the prototype for a larger data warehouse project. IT can string together multiple data marts into a cohesive data warehouse later.
Island Hopping
Successful data mart implementations have always been part of a larger data warehousing plan. The very things that make data marts successful in departmental solutions are the things that must be carefully considered in a data warehouse. To prevent data marts from becoming islands of information, however, the hardware they run on should be compatible with everything else in the enterprise, either through hardware standards or software integration across multiple platforms.
The data stored in the data marts should be compatible with a larger scheme for data in the warehouse as well. Because data marts are often designed for applications that are more department-specific than the entire enterprise will need, the data can be stored in a way that is immediately convenient for the application.
This does not necessarily guarantee that the information is stored in a way that makes sense for a data warehouse-to-be. Successful architects roll out data marts as part of a total data schema plan, even if the data warehouse will not be complete for several years. This ensures that the data marts will integrate into the larger whole without a major reworking.
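One way to keep data marts aligned with a total data schema plan is to check each mart's column definitions against the shared plan before rollout. The sketch below assumes a deliberately simple representation, a mapping from column name to type string. The schema contents, mart definitions and `conformance_issues` function are hypothetical.

```python
# Hypothetical enterprise-wide schema plan: column name -> agreed data type.
ENTERPRISE_SCHEMA = {
    "customer_id": "INTEGER",
    "order_date": "DATE",
    "amount": "DECIMAL(12,2)",
    "region_code": "CHAR(2)",
}


def conformance_issues(mart_name, mart_columns, enterprise=ENTERPRISE_SCHEMA):
    """List the ways a data mart's columns diverge from the enterprise plan."""
    issues = []
    for column, data_type in mart_columns.items():
        if column not in enterprise:
            issues.append(
                f"{mart_name}.{column}: not defined in the enterprise schema"
            )
        elif enterprise[column] != data_type:
            issues.append(
                f"{mart_name}.{column}: type {data_type} differs from "
                f"enterprise type {enterprise[column]}"
            )
    return issues


# A mart built for local convenience, and one built to the plan.
finance_mart = {"customer_id": "INTEGER", "amount": "FLOAT", "cost_center": "TEXT"}
sales_mart = {"customer_id": "INTEGER", "order_date": "DATE"}

for issue in conformance_issues("finance", finance_mart):
    print(issue)
print(conformance_issues("sales", sales_mart))
```

A mart that reports no issues can later be folded into the warehouse without rework. One that reports issues either needs changing or signals that the plan itself needs a new agreed column.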