I’ve never met a datum that I particularly cared for. I’m a software developer -- data is for users, content developers, executives and clerks. But when it comes to data that describes other data -- metadata -- I’ve never met a metadata I didn’t like. If not liked, at least endured, so I could create some nifty application. Metadata is the glue that holds relational databases together. It’s the stuff that prods class diagrams to consume entire hallways, saving development companies the cost of buying hotel art for the walls. It’s the stuff that is conspicuously absent or incomplete in most object-oriented programming languages. It’s also one of the most valuable resources of a development organization, and it’s a little tricky to track.
Metadata comes in many flavors, describing both the technical and business aspects of your enterprise. While the business metadata of a company is important for the end users, I’m more concerned with describing technical metadata. Billions of dollars have been spent in the pursuit of developing reusable code, while significantly less has been spent on creating an infrastructure that encourages code reuse. It’s a good bet developers are going to spend a limited amount of time searching for existing implementations that meet their needs before they decide to code a solution from scratch. Better access to technical metadata increases the possibility of reuse.
Managing technical metadata is by no means a cut-and-dried problem. Many factors -- size of the enterprise, budget, commitment to reuse, expertise of developers and strength of management -- can influence the technique you choose. There are many choices, and they vary in ease of implementation, required commitment and payback. I tend to classify these choices as model-oriented, top-down and bottom-up. As products mature and add features, the distinctions become less clear.
In model-oriented management, creating visual models drives software construction. Computer-aided software engineering (CASE) tools that support entity-relationship diagrams and the Unified Modeling Language (UML) exemplify this style of metadata management. A model of the software is designed using a CASE tool and a popular methodology. Some tools can even generate boilerplate code from the models. Most modeling tools can also reverse engineer existing code, but once the model is created, metadata work is typically driven from the model, not from the code.
UML modeling tools typically have excellent support for describing relationships between objects. Subtle and important details -- cardinality, ownership, whether relationships are bidirectional -- can be described easily in a modeling tool. A problem inherent in the model-oriented approach is that developers can’t be productive unless they are experts at using the modeling tool, working with the supported methodology and writing code. When crunch time hits, code diverges from the model until the model is no longer accurate. Modeling tools such as Rational’s Rational Rose (www.rational.com) can provide significant value to the development process, but they can be cumbersome to use for metadata management throughout the development lifecycle.
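To make the relationship details mentioned above concrete, here is a minimal Python sketch of the kind of record a modeling tool might keep for one association. The `Relationship` class, its field names and the `Order`/`OrderLine` example are hypothetical illustrations, not the format of any particular CASE product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """Hypothetical record of one association between two modeled classes."""
    source: str
    target: str
    cardinality: str     # UML-style multiplicity, e.g. "1..*"
    owns_target: bool    # True if the source controls the target's lifetime
    bidirectional: bool  # True if the target can navigate back to the source

    def describe(self):
        arrow = "<->" if self.bidirectional else "->"
        kind = "owns" if self.owns_target else "references"
        return f"{self.source} {arrow} {self.target} [{self.cardinality}, {kind}]"


# Illustrative example: an order owns one or more order lines.
order_lines = Relationship("Order", "OrderLine", "1..*", True, False)
print(order_lines.describe())
```

None of these three facts would be visible from a plain member declaration such as a list of `OrderLine` objects, which is why a model can say more than the code it describes.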
Component repositories tend to manage metadata in a top-down fashion. Unlike modeling tools, repositories typically store the actual object -- or a reference to the object -- that they represent. Depending on the tool, the object may be stored as source or as compiled code. Once a component is added to the repository, it can be decorated with additional information beyond what the source language can provide. Decorating an entry allows the repository to go beyond a development tool’s metadata limitations.
I call this model top-down because developers can pull components directly out of this repository and into their own code, enforcing consistency from the top. Some drawbacks of a repository approach include managing the repository and ensuring components get checked into the repository consistently. Microsoft’s Repository (http://msdn.microsoft.com/repository) is an example of the top-down approach.
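The idea of a decorated repository entry can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RepositoryEntry`, `ComponentRepository`, the decoration keys and the file path are all hypothetical, and nothing here reflects the actual interface of Microsoft's Repository or any other product.

```python
from dataclasses import dataclass, field


@dataclass
class RepositoryEntry:
    """Hypothetical entry: a reference to a component plus extra metadata."""
    name: str
    location: str  # reference to the stored object (path is made up)
    form: str      # "source" or "compiled"
    decorations: dict = field(default_factory=dict)

    def decorate(self, key, value):
        # Attach information the source language itself cannot express.
        self.decorations[key] = value


class ComponentRepository:
    """Hypothetical in-memory repository keyed by component name."""

    def __init__(self):
        self._entries = {}

    def check_in(self, entry):
        self._entries[entry.name] = entry

    def find(self, **criteria):
        # Return entries whose decorations match every given key/value pair.
        return [
            entry for entry in self._entries.values()
            if all(entry.decorations.get(k) == v for k, v in criteria.items())
        ]


repo = ComponentRepository()
entry = RepositoryEntry("TaxCalculator", "components/tax_calculator.py", "source")
entry.decorate("owner", "billing team")
entry.decorate("status", "approved")
repo.check_in(entry)

matches = repo.find(status="approved")
print([e.name for e in matches])
```

Searching on decorations such as `status` or `owner` is what lets developers pull approved components from the top rather than hunting through source trees. It also shows the drawback noted above: the search is only as good as the discipline with which entries are checked in and decorated.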
Managing metadata by extracting it from source code is what I call bottom-up management. In the bottom-up scenario, the source code is the master model for the enterprise. Bottom-up metadata is attractive because it requires almost no change to existing methods. Genitor’s Surveyor product (www.genitor.com), for example, will analyze your code and produce an object web: a representation of your enterprise’s code that can be browsed, including class hierarchy, methods and related documentation. Bottom-up approaches are typically bound by the semantics of the languages or database they support, meaning they lack the richness of a modeling tool or a decorated class repository.
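As a rough illustration of the bottom-up idea, the following Python sketch uses the standard `inspect` module to pull class hierarchy, methods and documentation straight out of code. It is not how Surveyor works. The `extract_class_metadata` and `build_object_web` helpers and the `Account` classes are hypothetical names invented for this example.

```python
import inspect


def extract_class_metadata(cls):
    """Describe a class using only what the code itself exposes."""
    methods = {}
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private and special methods for brevity
        methods[name] = {
            "signature": str(inspect.signature(member)),
            "doc": inspect.getdoc(member) or "",
        }
    return {
        "name": cls.__name__,
        "bases": [base.__name__ for base in cls.__bases__],
        "doc": inspect.getdoc(cls) or "",
        "methods": methods,
    }


def build_object_web(classes):
    """Map each class name to its extracted metadata."""
    return {cls.__name__: extract_class_metadata(cls) for cls in classes}


# Hypothetical classes standing in for an enterprise code base.
class Account:
    """A customer account."""

    def deposit(self, amount):
        """Add funds to the account."""


class SavingsAccount(Account):
    """An account that accrues interest."""

    def add_interest(self, rate):
        """Apply one period of interest."""


web = build_object_web([Account, SavingsAccount])
print(web["SavingsAccount"]["bases"])
print(sorted(web["SavingsAccount"]["methods"]))
```

The sketch also shows the limitation described above. Everything it reports comes from what the language exposes, so it can list base classes and method signatures but has no way to recover cardinality, ownership or other details that were never written into the code.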
Development organizations should have a strategy to track and share metadata. The techniques I’ve described are listed in descending order of management commitment required for success. The payoff, however, for each technique depends on the makeup of your company. The key is not which technique you choose but knowing why you chose it and what you realistically expect to gain from your choice.
--Eric Binary Anderson is a development manager at PeopleSoft's PeopleTools division (Pleasanton, Calif.) and has his own consulting business, Binary Solutions. Contact him at firstname.lastname@example.org.