Nonlinear Data Warehousing

Data warehousing is poised for a transformation. The immediate occasion is the set of economic dynamics that are forcing a makeover of large segments of our marketing values, consumer values, and social values. The good news is that the transformation will be driven by future-oriented considerations that put human values, community relations, and innovation at the center of productive efforts to reduce suffering and enhance the quality of life in growth-oriented ways. The less good news is that people are, and will still be, people. That is, people within enterprises and society as a whole will remain territorial, greedy in a dysfunctional way, and often willing to sacrifice the common good to short-term advantage. The challenge of taking the lessons of a difficult past and putting them back in the past will continue to confront organizations committed to rewriting their future in ways that create new possibilities for human well-being as well as classic profit and revenue. Data warehousing will be there and, increasingly, on the critical path.
Data warehousing was invented to gain a perspective on the enterprise as a whole. The celebrated “single version of the truth” has given way to diverse departments that want the same data, each from a different point of view. The data warehouse was invented because the finance, marketing, inventory, customer service, sales, and other departments all want the same data about the basic transactions the enterprise has completed; but they want it represented according to different dimensions, varying key performance indicators, and diverse master data frameworks. Charts of accounts, promotions, inventory turns, and customer complaints, along with such basics as product hierarchy and customer references, all turn raw data into information that addresses the questions of the respective business customers.

The transformation of data warehousing will be in the direction of nonlinear data warehousing. What does that mean? Instead of predicting the past, it will actually provide insight into the future. Nonlinear data warehousing is the next high concept. Elements of it have been around forever; but a critical mass of hardware and software improvements is causing a convergence. These include: data integrity, metadata to represent context, correlation between the data warehouse system and near real-time business environment, and master data as the framework for business meaning and results. Given the alignment of these factors of production, the nonlinear data warehouse becomes a possibility engine for generating meaning, competitive advantage and innovations for the business.

Data integrity is the foundation of nonlinear data warehousing. It is now commonplace to say “without integrity, nothing works.” Here, “integrity” is not a moral judgment about right or wrong; it is a pragmatic statement about the accuracy of the data in the warehouse. Of course, data integrity is a process, not an end point. The business environment around data warehousing is changing constantly, and that means the data is the target of continuous add, change, and delete operations. However, even more fundamentally, data integrity leads to yet another uncomfortable truth. When the leaders, managers, and staff of an enterprise are casual about honoring their word, then the data warehouse reflects that casualness. The data warehouse correlates nearly perfectly with the level of integrity of the enterprise. Once again, this is not a judgment of moral worth, but rather of pragmatic workability. Nonlinear data warehousing “gets” this correlation and uses it to raise the workability and transparency of the enterprise as a whole.

Metadata is the backbone of nonlinear data warehousing. Metadata correlates to data just as knowledge correlates to information. In other words, metadata implies a commitment. Metadata captures the business context. It represents the business environment. Data says the customer bought ten widgets for ten dollars. Metadata says this event occurred on a given date (January 10, 2009), represented in a certain way (YYYYMMDD), at the web store with such-and-such a timestamp. The currency was represented as dollars (not Euros or Pesos) with a ten-position customer identifier, itself referring to billing and delivery information stored in such-and-such a master data structure. When you think about it, most of the data of interest to the information technology (IT) department is metadata. Most of the data of interest to the business is master data. Of course, if the master data lacks integrity, then the IT department hears about it from the business, and IT gets very, very interested in the master data. However, by that time, a breakdown has already occurred, and systems occur as needing “fixing.” In many enterprises, this is the constant and recurring state of IT. The business’s own lack of commitment to integrity (see above on “workability,” not moral worth) is often unsaid, but in the background. Once again, the data warehouse can readily become a lightning rod for surfacing and resolving such issues and challenges.
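
The widget example can be made concrete with a small sketch in Python. It is only an illustration of the distinction between data and metadata, not a description of any particular warehouse product. The class names, field names, and the sample customer identifier are all assumptions introduced here; only the values (ten widgets, ten dollars, January 10, 2009, YYYYMMDD, the web store, dollars, a ten-position identifier) come from the example above.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class SaleData:
    """The bare facts: what was bought and for how much."""
    quantity: int
    amount: int  # whole currency units; the unit itself lives in the metadata


@dataclass(frozen=True)
class SaleMetadata:
    """The business context needed to interpret the bare facts."""
    event_date: str      # stored as text, e.g. "20090110"
    date_format: str     # how event_date is represented (YYYYMMDD)
    channel: str         # where the event occurred
    currency: str        # dollars, not Euros or Pesos
    customer_id: str     # hypothetical ten-position key into master data

    def parsed_date(self) -> date:
        # Without the declared format, the text "20090110" is ambiguous.
        return datetime.strptime(self.event_date, self.date_format).date()

    def has_valid_customer_id(self) -> bool:
        # A ten-position identifier must be exactly ten characters long.
        return len(self.customer_id) == 10


sale = SaleData(quantity=10, amount=10)
context = SaleMetadata(
    event_date="20090110",
    date_format="%Y%m%d",
    channel="web store",
    currency="USD",
    customer_id="C000012345",  # made-up value for illustration only
)
```

Taken alone, `sale` says almost nothing. It is `context` that turns "10 and 10" into a dated, priced, attributable business event.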

Correlation is the key process parameter of nonlinear data warehousing. An example will be useful, starting with what correlation is not. A large batch update is an example of a linear update to the data warehouse. The batch causes a global change of state to the database based on a global commit point. Of course, this is workable and useful from the limited perspective of capturing the past at a daily, weekly, or monthly latency. Causation is not bad; even better is correlation. Correlation is having a customer enter his or her own delivery information and capturing it in an automated process that updates the data warehouse. Note that the process is self-correcting. If the customer enters incorrect delivery data, then the product or service will not be delivered. The customer will be strongly motivated to get in touch and correct whatever inaccuracy has occurred. More than likely, any inaccuracy will be detected before he or she presses the “enter” button, since people want to get their stuff. Of course, this is an over-simplification, albeit a useful one. Like all IT systems, data warehouses are a representation of the business reality in which the enterprise is operating. There are many points of contact at which data is captured or used to push back on that business reality in the form of promotions, forecasts, and business initiatives. Since most systems were not designed top down in a greenfield, but rather piecemeal in response to the crisis or opportunity of the moment, correlation is first of all a principle of design and operation.
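
The contrast between the batch update and the customer-entered update can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real warehouse interface: the `DeliveryWarehouse` class, its method names, and the sample customers and addresses are all hypothetical. The "warehouse" is just an in-memory dictionary of delivery addresses.

```python
from typing import Dict, Iterable, Tuple


class DeliveryWarehouse:
    """Toy in-memory stand-in for a warehouse table of delivery addresses."""

    def __init__(self) -> None:
        self.addresses: Dict[str, str] = {}
        self.commits = 0  # number of commit points so far

    def load_batch(self, rows: Iterable[Tuple[str, str]]) -> None:
        # Linear update: stage every row, then change state once,
        # at a single global commit point.
        staged = dict(self.addresses)
        for customer_id, address in rows:
            staged[customer_id] = address
        self.addresses = staged
        self.commits += 1

    def capture_entry(self, customer_id: str, address: str) -> bool:
        # Correlated update: each customer entry is checked and applied
        # as it happens. A later entry from the same customer overwrites
        # the earlier one, which is the self-correcting step.
        cleaned = address.strip()
        if not cleaned:
            return False  # reject empty input before it reaches the warehouse
        self.addresses[customer_id] = cleaned
        self.commits += 1
        return True


warehouse = DeliveryWarehouse()
# Nightly batch: two customers arrive together in one commit.
warehouse.load_batch([("C1", "1 Main St"), ("C2", "2 Oak Ave")])
# Customer C1 notices a wrong house number and corrects it directly.
warehouse.capture_entry("C1", "10 Main St")
```

The batch path records whatever the source system held at load time. The entry path lets the person with the strongest interest in accuracy correct the record at the point of contact.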

As might be expected, data warehousing, like most IT systems, is trying to hit a moving target. The business environment is changing as fast as systems can be updated to correlate to it and, in many contexts, even faster. Federal regulations requiring the archiving and retrieval of email and related documents for legal discovery are driving a convergence of unstructured and transactional business data, even if the convergence will remain incomplete. Initiatives in healthcare around a clean, consistent, unified electronic patient record (representation) are front and center. Corporate transparency, compliance, and fraud detection in the market are an increasing priority, even prior to additional enabling legislation. Individually and collectively, these are all areas where metadata, master data, and correlation of data warehousing information are on the critical path to transforming data warehousing into a nonlinear approach to business intelligence.

By Lou Agosta

Lou Agosta is an independent industry analyst, specializing in data warehousing, data mining and data quality. A former industry analyst at Giga Information Group, Agosta has published extensively on industry trends in data warehousing, business and information technology. He can be reached at

Copyright 2004 — 2009. Powell Media, LLC. All rights reserved.



Copyright 2008-2009 Daily IT News