In Depth: The Non-Traditional Data Conundrum

Business users are going to consume non-traditional data with or without IT's approval. Experts say data management groups must strive to make consumption of this information predictable and governable.

Information workers are no longer content to use traditional, structured data -- the easy-to-categorize data fields any database can store in a record. Organizations are upping their consumption of non-traditional -- or "supplementary" -- information sources, including Web services (such as real-time pricing feeds from suppliers or competitors), subscription services, or other publicly available -- and frequently free -- data sources.

This has triggered a debate about how users are consuming non-traditional data. Such data has advantages, to be sure; it can add depth and accuracy to BI analysis, for example. These sources also carry risk, both in the form of over-consumption (i.e., information inundation and overload) and information pollution (i.e., what happens when a well-tuned DM infrastructure gets overwhelmed by the information consumption habits of its users).

There's a reason, after all, that much of this information is and always has been "supplemental." Data managers tend to prefer nice, clean, well-defined feeds, typically from nice, clean, well-defined sources -- such as in-house relational databases or operational systems. In the past, data managers have tended to put up roadblocks when business users have clamored for access to additional data sources, not to mention extra data feeds. In some cases, they've drawn lines in the sand when the line-of-business has asked for access to (largely unprofiled) external sources.

Such was (and to a degree still is) the status quo. Changes are, however, clearly taking place, as line-of-business requirements trump data management (DM) concerns. Over the last half-decade, business users have been in the driver's seat; what the line-of-business wants, the line-of-business increasingly gets. One upshot of this, industry watchers say, is that DM groups are beginning to accommodate -- albeit under duress -- the non-traditional requirements of their business user consumers.

There's a certain logic to this development, too: as any IT professional will tell you, frustrated business users tend to "find a way" to get what they want or need. If they're thwarted by their DM groups, they'll acquire the data they need by other means -- e.g., by sharing high-level passwords (for access to off-limits relational databases), copying and pasting data into spreadsheets, or subscribing to external feeds themselves and dumping their contents into department-specific Access or higher-level data stores.

Again, the issue boils down to just how they're acquiring it. The good news -- for DM practitioners and business users alike -- is that the industry is positively brimming with products that address the needs and concerns of both groups. Indeed, business intelligence (BI) vendors seem to have a collective case of supplementary-data-on-the-brain. From data warehouse (DW) appliance specialists such as Vertica Inc. (which positions its most recent DW-on-demand service as a painless way for organizations to consume supplementary data) and start-up players QL2 Software Inc. and ChartSearch Inc. to established players such as Composite Software Inc., Informatica Corp., and SAS Institute Inc., BI vendors are making a lot of noise about supplementary data and its business user discontents. The selling points of such products, advocates argue, are that they let DM groups accommodate business users even as they bring order and form to what has, to date, been a mostly ad hoc process.

"There has always been this sampling of people [inside any organization], just grabbing information and throwing it in a spreadsheet. It's one of those informal processes that nobody talks about … [but that] just happens all the time," observes Scot Madill, director of product management for supplementary data specialist QL2. "This [supplementary data] is all stuff that [users] want -- stuff they actually need -- but IT just doesn't want to put it into the data warehouse, so they find a way around [IT] and do it on their own. What this does is create these kind of rogue data [sets] in the organization, where [users are] using this [rogue data] along with the information they're getting from the data warehouse."

How they consume the data may be kludgy, according to Madill. It sometimes involves copying and pasting information from a Web browser into an Excel spreadsheet. More commonly, he says, business users are grabbing data from Web services feeds -- which are becoming ubiquitous thanks to Web services interfaces -- or subscription feeds.

Companies such as QL2, ChartSearch, and others promise to bring order to the chaos -- a form of data governance, if you will. In any case, it's the kind of proposition that, advocates argue, pretty much sells itself.

Even established heavyweights are coming around. Consider SAS, which -- along with IBM Corp., Oracle Corp., and SAP AG (the steward of the Business Objects Data Integrator product family) -- is one of the industry's biggest data integration (DI) players. SAS DI guru Ken Hausmann -- who (to his credit) isn't afraid to don a vendor-agnostic hat -- says non-traditional or supplementary data consumption is a phenomenon that SAS and other players are going to need to accommodate.

Hausmann acknowledges the interest among customers. Over time, he suggests, non-traditional sources will probably become traditional -- in a certain very definite context, that is.

"We're seeing a sort of general level interest [in] this kind of data. It's being used like unstructured data. What [customers are asking for is] an ability to access all of [their] unstructured data. What they want to do with it, a lot of times, is to add color. They're putting it in reports, for example, to add color -- to enhance [i.e., flesh out] reports with these extra details," Hausmann explains.

"I like to think of it as sort of adding another dimension to your data to give you a more complete picture of what's going on."

QL2 and other firms like to portray the situation as a non-IT-instigated initiative: i.e., business users themselves are agitating for (what Hausmann calls) "color." Hausmann, for his part, agrees -- although he stresses that IT isn't quite as behind the curve as some of the upstart players like to claim.

"What [IT is hearing] from users is 'Give me the bigger picture. Fill in some gaps for me.' That's why [users] say they need this [supplementary] data," he acknowledges. "I think IT sees that. [IT is conceding that] there are these situations where maybe it's not so obvious that this data has a direct connection. [But the user can] see the connection, [the user] knows there's a connection, even if he can't define what it is yet. I think that's another thing that's driving this."

Veteran data warehouse architect Mark Madsen, a principal with consultancy Third Nature Inc. -- and author of several data warehousing tomes -- agrees.

The foundational force that's driving the shift to accommodate supplementary data, Madsen argues, is one that has the potential to revolutionize data warehousing like few other trends before it.

Simply put, Madsen maintains, IT (and DM groups in particular) is in the midst of an economic reevaluation-of-all-values. DM groups once made good-faith efforts to prioritize and address user requirements -- e.g., access to X, Y, and Z data sources is of paramount concern to users, while access to other sources (especially for one-off reporting or analysis requirements) is less important. Therefore, these latter sources get grouped into the nice-to-have (not the need-to-have) queue. Now, however, DM groups are more accountable than ever to the line of business.

There are a couple of sources of pressure, Madsen points out: there's the post-dot-com-crash ascendance of the line-of-business and (more recently) a bona-fide explosion of non-traditional deployment scenarios -- many of which (e.g., software-as-a-service players) are taking their pitches to the line of business decision-makers directly.

Madsen says this means DM groups are under pressure to deliver many of the nice-to-have features that they might well have consigned to the dustbin of memory. If they don't, he points out, business users might go out-of-band -- i.e., go around IT -- to seek redress.

As a result of this pressure, the data warehousing status quo is beginning to crack, according to Madsen.

"The beginning of the end for data warehousing is right now. The centralized model, especially without good data management around it, it's just prone to failure. It's monolithic," he claims. That sets the scene for the rise of cloud computing (e.g., On Demand BI or DW services that make it possible for IT organizations or lines of business to rapidly create new information services) and other hugely transformative models.

"What are the new models?" he asks. "Centralized? Cloud? Web 2.0? Cloud [for example] is an implementation decision. If I stick it in Oracle [on premises], if I stick it in Oracle-on-the-cloud, really, what's the difference? It's still data in a database, so maybe it comes down to a cost issue."

In any case, Madsen points out, the proliferation of deployment alternatives drastically reorders the economic calculus for DM groups -- and the business users they're notionally charged with supporting.

"A lot of times [in the past] you had the business saying, 'We want this data. We need access to this data,' but IT was saying, 'It's one-off' or 'It's seasonal.' In either case, [this requirement] wasn't that easy [that is, inexpensive] to address with [the then-current] BI tools.

"Unless it was something that was repetitive, there was never enough ROI for [the DM group] to get around to it," Madsen says. "If you look at [the DM group's] priorities, [these one-off or seasonal requirements are] always number 4 or 5 on a list, so numbers 1 to 3 get done, but numbers 4 and 5 never get done because the ROI just isn't there."

That was then. With today's raft of deployment models, Madsen suggests, that position is fast becoming untenable, if not downright incoherent.

"If you look at what the BI tools [address], it's really the 80 percent [of user requirements]. The remainder, that 20 percent, it's a whole lot of one-off needs, or incremental needs, or just interesting projects that are valuable only to one department or a few users. These [projects] never get done," he points out.

That explains why Madsen is far from sanguine about the prospects of the data warehousing status quo. "[With respect to] the enterprise data warehouse, if my answer is, 'I can get that for you in three to four months, after I source it, remap my data warehouse, and do the ETL,' and the [line of business's] response is 'I need it in two months, or it's not of any value' -- that's creating this impetus for change. The fact is that [the line of business] can now go around [the data management group] and do it themselves."

That's precisely the need that QL2, ChartSearch, and other upstarts say they're addressing. "Among users, there is an expectation of being able to access and search for information, but the traditional BI paradigm falls short. The model has always been the traditional reporting-centric model, the push model of information, whereby an analyst would design dashboards and design reports," says ChartSearch CEO Chris Modzelewski.

In the old paradigm, Modzelewski continues, it frequently came down to hard-boiled economics. Not so anymore.

"Because of the economics of producing business intelligence reports, and because of the complexity of [parsing] market research data, it hasn't been feasible for companies to load some of this data into their data warehouses, regardless of how many users actually consume that data," he points out. "Today, users have to send off emails to analytical teams to research those reports, or pay market research firms to do it."

More to the point, Modzelewski and other proponents argue, there's a very good reason not to resort to such out-of-band kludges: they're difficult (and in some cases impossible) to govern.

"How can you ensure the credibility of that information?" he asks, protesting -- at the same time -- that neither he nor his competitors are engaging in promoting fear, uncertainty, and doubt. "[Credibility] is a real issue. You have users copying and pasting [data] into spreadsheets, doing things like that. They're doing it because it's their only alternative. There's no governance there," he points out.

"What we're doing is providing a structured, governable way to give [users] this information, information that [in the past was] either unavailable or just not economical to make available [from the data warehouse]," Modzelewski concludes.

Upheaval Ahead?

Few industry watchers anticipate out-and-out upheaval in the traditionally stodgy data warehousing space. Madsen, for example, thinks it's an issue of enterprise DM groups adapting -- grudgingly perhaps, but unavoidably -- to a new economic reality. It isn't a question of sudden adaptation, Madsen stresses, but of phased accommodation (albeit at an accelerated rate) to a changing status quo. Regardless, he argues, it's going to happen.

SAS' Hausmann, for his part, seems sanguine in a different way.

"Whether it's [as a means] to add that layer of color to reports or dashboards to make them that much more interesting, or -- and we're seeing this, too -- to supply missing pieces of information that directly impact the [information] request itself, it's a good thing," he observes.

There's a degree of inertia in data integration as it's practiced today, says Hausmann, who sees a line-of-business rebellion that forces DM groups to shift from a reactive to a very slightly proactive posture as an unalloyed good.

"The real value of all of that [unstructured] data will not be realized until that [unstructured data] is given structure -- so it can be queried [and] integrated into mainstream reports. As an industry, we [i.e., data integration practitioners] have been slow to address this [need]. But thankfully that's starting to change," Hausmann points out.

The benefits of bringing order and structure to ad hoc information consumption processes are manifold, he argues: they safeguard against rogue or unsanctioned DM processes and let a DM group fulfill one of its titular (and all-too-frequently neglected) roles: that of a guarantor and protector of data.

Inasmuch as users need data, they also need to be protected from data -- i.e., from consuming a surfeit of non-essential information. In other words, users need to be shielded from data over-consumption: the line of business doesn't need access to any or every possible information source out there -- and to the degree that DM groups are willing or able to accommodate business users on most accounts, when DM really does dig its heels in (if, for example, managers believe some information sources are superfluous), it has a better chance of being heeded.

That's what's at stake.