Monday, February 10, 2014

Document Information Requirements Graphically With BDM Diagrams

BI teams often struggle to keep the business engaged, especially during requirements analysis. This post looks at a graphical technique for documenting information requirements -- one that business people will read and respond to.

Keeping the business engaged is one of the keys to a successful BI program. One technique I have found to be very helpful on this front is Laura Reeves's Business Dimensional Model (BDM).

The BDM is a technique for documenting information requirements. Before I explain the BDM, a few words on the requirements themselves.

Information Requirements

Before you can design a dimensional model, you need to capture the business requirements that it will support. The most successful projects capture business requirements by working directly with people in the business, often through interviews or requirements sessions.

In my book, I suggest that you organize your information requirements by business function. You then state them in simple form: as a group of metrics and their associated dimensionality.

For example, a set of interviews about taking orders might boil down to a requirements statement such as:
  • Order Information by order date, order line, salesperson, customer and product.
The metrics that comprise the group are then fully documented.  For example, "Order Information" is further supported with documentation of:
  • Order dollars
  • Order quantity
  • Cost dollars
  • Gross margin dollars
  • Gross margin rate
Relevant hierarchies in the dimensions should also be specified. For example, "Product" might be described as:
  • All Products → Category → Brand → Product
Finally, the major dimensions are cross-referenced to the metric groups in a conformance matrix.
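
To make the idea of a conformance matrix concrete, here is a minimal sketch in Python. It is an illustration only, not the format used in the book. The MetricGroup class and the conformance_matrix function are names I made up for this example, and the "Shipment Information" group is hypothetical, added only so the matrix has more than one column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricGroup:
    """A group of metrics plus the dimensions they are analyzed by."""
    name: str
    metrics: tuple
    dimensions: tuple


def conformance_matrix(groups):
    """Cross-reference every dimension against every metric group.

    Returns {dimension: {group name: True or False}}.
    """
    all_dimensions = sorted({d for g in groups for d in g.dimensions})
    return {
        dim: {g.name: dim in g.dimensions for g in groups}
        for dim in all_dimensions
    }


# The metric group from the requirements statement above.
order_info = MetricGroup(
    name="Order Information",
    metrics=("Order dollars", "Order quantity", "Cost dollars",
             "Gross margin dollars", "Gross margin rate"),
    dimensions=("Order Date", "Order Line", "Salesperson", "Customer", "Product"),
)

# Hypothetical second group, invented only to make the matrix interesting.
shipment_info = MetricGroup(
    name="Shipment Information",
    metrics=("Quantity shipped",),
    dimensions=("Ship Date", "Customer", "Product"),
)

matrix = conformance_matrix([order_info, shipment_info])

# Print one row per dimension, with an X where the group uses it.
for dim, row in matrix.items():
    marks = "  ".join("X" if used else "-" for used in row.values())
    print(f"{dim:<12} {marks}")
```

Dimensions marked for more than one metric group (Customer and Product in this sketch) are the ones that must conform across subject areas.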

These information requirements then drive solution modeling. The next step is to develop a top level dimensional model, and then a detailed database design.

(For more on developing and documenting requirements, including a fully fleshed out example, see my book -- it's listed at the end of this post.)

Getting People to Read It

When it comes to information requirements, you must ensure that the business stakeholders review and respond. (Better still is to involve the business in the identification and documentation process.)

In the book A Manager's Guide to Data Warehousing, Laura Reeves provides a graphical technique that helps keep the business's attention. She calls it the "Business Dimensional Model (BDM)."

This technique integrates nicely with the approach I've outlined above.

Each group of metrics is depicted in a simple diagram, with the metric group in the center and the major dimensions arrayed around it in circles.

For example, the Order Information metric group above might be documented thusly:


Within each circle, the underlined text identifies a dimension. Beneath the dimension, the level of detail applicable in the metric group is listed.

Additional illustrations document the dimension hierarchies. For example, the product dimension from the picture above might be documented like this:



The most detailed level of the dimension is shaded darkly. The arrows indicate hierarchies, going from summarized to detailed. Elements that will drive Type 2 slow changes have a shadow. Separate symbols (not shown) are used for junk dimensions, other derived elements, and future attributes.

People Like Pictures

I've found that using BDM diagrams dramatically increases the participation of business stakeholders. People look at BDM diagrams, understand them, and react to them -- often with great enthusiasm. That's a powerful aid in refining and validating your requirements.

These diagrams are also easy to produce using the built-in drawing tools that come with basic productivity software. This means you can often get business stakeholders to participate in their creation. For example, the pictures above were created in Microsoft PowerPoint using basic shapes and Smart Shapes.

Lastly, the ability to produce these diagrams using basic productivity software means they are easy to incorporate in the best format for this kind of documentation: the presentation.  I find the presentation format is far more likely to be reviewed than a word processing document. (More on this topic in a future post.)

Further Reading

As I said back in 2009, I am a big fan of Laura Reeves's approach to requirements and design. As you can see, there is a natural affinity between the BDM and the techniques I've talked about in the past.  I encourage readers to check out her book (see below).

More info about requirements and documentation can be found on this blog. Have a look at these posts:
You can read more about the process of identifying information requirements in these books:
  • The examples in this post are drawn from my book, Star Schema: The Complete Reference (McGraw-Hill, 2010). A more fleshed-out explanation of tasks and deliverables, with examples, can be found in Chapter 18, "How To Design and Document a Dimensional Model." The examples from this post come from Figure 18-4 (which in turn builds on the star in Figure 3-3 and the hierarchies in Figure 7-3).
You can help support this blog by using the links above to purchase these books from Amazon.com.

[Edited 2/13/14 - Corrected the links, thank you for the emails.]





Thursday, November 14, 2013

Facebook's Ken Rudin on Analytics

If you are interested in how business analytics impact your BI program, carve out forty-five minutes of time to watch Ken Rudin's recent TDWI keynote: "Big Data, Bigger Impact." The video is embedded below.

Rudin is the director of analytics at Facebook. In his presentation, he discusses several topics that are of interest to readers of this blog. Among them:
  • Big data technology should be used to extend your traditional BI solution, not replace it. Facebook has realized this, and is working to bring in relational technology to answer traditional business questions.
  • Successful analytics programs bring together centrally managed core data metrics with a variety of data that is not centrally managed. Rudin shares different ways he has been able to make this happen.
  • A similar balance can be attained with your organizational structure. Use of "embedded analysts" provides the business benefits of decentralization, while maintaining the efficiencies and scale advantages of a centralized program.
These are just a few of the points made during his talk. If you don't have the time to watch it now, bookmark this page for later.

You'll also want to check out Wayne Eckerson's latest book, Secrets of Analytical Leaders. (Details below.)

Big Data, Bigger Impact
Ken Rudin
TDWI World Conference, Chicago 5/6/2013




Recommended Reading

Wayne Eckerson's excellent book, Secrets of Analytical Leaders, features more insights from Ken Rudin and others.

I highly recommend this book if you are interested in analytics.

Get it from Amazon.com in paperback or Kindle editions.


Wednesday, September 25, 2013

Optimizing warehouse data for business analytics

Business analytics often integrate information from your data warehouse with other sources of data. This post looks at the best practices of warehouse design that make this possible.

I receive a lot of questions regarding the best way to structure warehouse data to support an analytics program. The answer is simple: follow the same best practices you've already learned.

I'll cover these practices from a dimensional modeling perspective. Keep in mind that they apply in any data warehouse, including those modeled in third normal form.


Store Granular Facts

Analytic modelers often choose sources external to the data warehouse, even when the warehouse seems to contain relevant data. The number one reason for this is insufficient detail. The warehouse contains summarized data; the analytic model requires detail.

In this situation, the analytic modeler has no choice but to look elsewhere. Worse, she may be forced to build redundant processes to transform source data and compile history. Luckily, this is not a failure of warehouse design principles; it's a failure to follow standard best practices.

Best practices of dimensional design dictate that we set the grain of base fact tables at the lowest level of detail possible. Need a daily summary of sales? Store the individual order lines. Asked to track the cost of trips? Store detail about each leg.

Dimensional solutions can contain summarized data. This takes the form of cubes, aggregates, or derived schemas. But these summaries should be derived exclusively from detailed data that also lives in the warehouse.
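
As a rough illustration, the sketch below derives a daily sales summary from order-line detail. It is written in Python rather than SQL, and the rows, column names, and the daily_sales function are all assumptions made up for this example. The point is only the direction of derivation: the summary comes from the detail, never the other way around.

```python
from collections import defaultdict
from datetime import date

# Hypothetical base fact rows, at the grain of one row per order line.
order_lines = [
    {"order_id": 1001, "line": 1, "order_date": date(2013, 9, 2), "order_dollars": 120.0},
    {"order_id": 1001, "line": 2, "order_date": date(2013, 9, 2), "order_dollars": 30.0},
    {"order_id": 1002, "line": 1, "order_date": date(2013, 9, 3), "order_dollars": 75.0},
]


def daily_sales(lines):
    """Derive a daily summary (order dollars by day) from order-line detail."""
    totals = defaultdict(float)
    for row in lines:
        totals[row["order_date"]] += row["order_dollars"]
    return dict(totals)


summary = daily_sales(order_lines)

for day, dollars in sorted(summary.items()):
    print(day.isoformat(), dollars)
```

Because the order lines are retained, an analytic modeler can still get at the individual transactions when the daily totals are not enough.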

Like all rules, this rule has exceptions. There are times when the cost/benefit calculus is such that it doesn't make sense to house highly granular data indefinitely. But more often than not, summary data is stored simply because basic best practices were not followed.

Track Changes to Reference Data and Use Effective Dating

When reference data changes, too many dimensional models default to updating corresponding dimensions, because it is easier.

For example, suppose your company re-brands a product. It's still the same product, but with a new name. You may be tempted to simply update the reference data in your data warehouse. This is easier than tracking changes.  It may even seem to make business sense, because 90% of your reports require this-year-versus-last comparison by product name.

Unfortunately, some very important analysis may require understanding how consumer behavior correlates with the product name. You've lost this in your data set. Best practices help avoid these problems.

Dimensional models should track the change history of reference data. In dimensional speak, this means applying type 2 slow changes as a rule. This preserves the historic context of every fact recorded in the fact table.

In addition, every row in a dimension table should track "effective" and "expiration" dates, as well as a flag that identifies rows that are current. This enables the delivery of type 1 behavior (the current value) even as we store type 2 behavior. From an analytic perspective, it also enables useful "what if" analysis.
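
Here is a minimal sketch of how a type 2 response with effective dating might work, using the product re-branding example. It is one possible approach under stated assumptions, not a prescribed implementation: the column names, the HIGH_DATE convention for unexpired rows, and the apply_type2_change and current_name functions are all hypothetical.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # assumed convention for "not yet expired"


def apply_type2_change(dimension, natural_key, new_name, change_date):
    """Expire the current row for a product and append a new current row."""
    for row in dimension:
        if row["product_id"] == natural_key and row["is_current"]:
            row["expiration_date"] = change_date - timedelta(days=1)
            row["is_current"] = False
    dimension.append({
        # Assign the next surrogate key; the natural key stays the same.
        "product_key": max((r["product_key"] for r in dimension), default=0) + 1,
        "product_id": natural_key,
        "product_name": new_name,
        "effective_date": change_date,
        "expiration_date": HIGH_DATE,
        "is_current": True,
    })


def current_name(dimension, natural_key):
    """Type 1 behavior: the name in effect now, regardless of history."""
    return next(r["product_name"] for r in dimension
                if r["product_id"] == natural_key and r["is_current"])


product_dim = [{
    "product_key": 1, "product_id": "P-100", "product_name": "Old Name",
    "effective_date": date(2012, 1, 1), "expiration_date": HIGH_DATE,
    "is_current": True,
}]

# The product is re-branded on September 1, 2013.
apply_type2_change(product_dim, "P-100", "New Name", date(2013, 9, 1))
```

After the change, the dimension holds two rows for product P-100. Facts recorded under the old row keep the old name, while current_name returns the new name for reports that want everything restated under today's branding.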

As with all rules, again there are exceptions. In some cases, there may be good reason not to respond to changes in reference data by tracking history. But more often than not, type 1 responses are chosen for the wrong reason: because they are easier to implement.

Record Identifying Information

Good dimensional models allow us to trace back to the original source data. To do this, include transaction identifiers (real or manufactured) in fact tables, and maintain identifiers from source systems in dimension tables (these are called "natural keys").

Some of this is just plain necessary in order to get a dimensional schema loaded. For example, if we are tracking changes to a product name in a dimension, we may have multiple rows for a given product. The product's identifier is not a unique identifier, but we must have access to it. If we don't, it would become impossible to load a fact into the fact table.
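
The sketch below shows why the load process needs the natural key. It is an assumption-laden illustration: the column names, the date-range lookup, and the lookup_product_key function are made up for this example, and many load processes simply look up the current row instead.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed convention for "not yet expired"

# Two rows for the same product: the natural key (product_id) repeats,
# while the surrogate key (product_key) is unique.
product_dim = [
    {"product_key": 1, "product_id": "P-100", "product_name": "Old Name",
     "effective_date": date(2012, 1, 1), "expiration_date": date(2013, 8, 31)},
    {"product_key": 2, "product_id": "P-100", "product_name": "New Name",
     "effective_date": date(2013, 9, 1), "expiration_date": HIGH_DATE},
]


def lookup_product_key(dimension, product_id, transaction_date):
    """Find the surrogate key of the row in effect on the transaction date."""
    for row in dimension:
        if (row["product_id"] == product_id
                and row["effective_date"] <= transaction_date <= row["expiration_date"]):
            return row["product_key"]
    raise LookupError(f"no dimension row for {product_id} on {transaction_date}")


# Hypothetical order line arriving from the source system.
source_order_line = {"order_id": 1001, "line": 1, "product_id": "P-100",
                     "order_date": date(2013, 7, 15), "order_dollars": 120.0}

fact_row = {
    "product_key": lookup_product_key(product_dim,
                                      source_order_line["product_id"],
                                      source_order_line["order_date"]),
    "order_id": source_order_line["order_id"],  # transaction identifier kept in the fact
    "order_line": source_order_line["line"],
    "order_dollars": source_order_line["order_dollars"],
}
```

Without product_id in the dimension, there would be no way to match the incoming order line to either row.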

Identifying information is also essential for business analytics. Data from the warehouse is likely to be combined with data that comes from other places. These identifiers are the connectors that allow analytic modelers to do this.  Without them, it may become necessary to bypass the warehouse.
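
As a simple illustration of identifiers acting as connectors, the sketch below pairs warehouse data with an external data set by way of a shared customer identifier. Everything in it is hypothetical: the rows, the survey scores, and the combine function exist only for this example.

```python
# Warehouse extract: order dollars with the customer's natural key retained.
warehouse_rows = [
    {"customer_id": "C-1", "order_dollars": 500.0},
    {"customer_id": "C-2", "order_dollars": 125.0},
    {"customer_id": "C-1", "order_dollars": 250.0},
]

# Hypothetical external data set keyed on the same customer identifier.
survey_scores = {"C-1": 9, "C-3": 4}


def combine(rows, external):
    """Total order dollars per customer, paired with the external score if any."""
    totals = {}
    for row in rows:
        cid = row["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + row["order_dollars"]
    return {cid: {"order_dollars": dollars, "survey_score": external.get(cid)}
            for cid, dollars in totals.items()}


combined = combine(warehouse_rows, survey_scores)
```

If the warehouse had kept only surrogate keys, there would be nothing to match against the external data.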

Summary

If you've been following the best practices of dimensional modeling, you've produced an asset that maximizes value for analytic modelers:

  • You have granular, detailed data.  
  • You are tracking and time-stamping changes to reference data. 
  • You've got transaction identifiers and business keys.  

It also goes without saying that conformed dimensions are crucial if you hope to sustain a program of business analytics.

Of course, there are other considerations that may cause an analytic modeler to turn her back on the data warehouse. Latency issues, for example, may steer her to operational solutions. Accessibility and procedural issues, too, may get in the way of the analytic process.

But from a database design perspective, the message is simple: follow those best practices!

Further Reading

You can also read more in prior posts.  For example:
You can also read more in my book, Star Schema: The Complete Reference.  If you use the links on this page to pick up a copy on Amazon, you will be helping support this blog.  

It covers the best practices of dimensional design in depth. For example:

  • Grain, identifiers, keys and basic slow change techniques are covered in Chapter 3, "Stars and Cubes"
  • The place of summary data is covered in Chapter 14, "Derived Schemas" and Chapter 15, "Aggregates"
  • Conformance is covered in Chapter 5, "Conformed Dimensions"
  • Advanced slow change techniques are explored in Chapter 8, "More Slow Change Techniques"