Design Structure Matrix


At my job, my coworkers and I grapple with a lot of unintended complexity. We created much of it ourselves because we didn’t know how to avoid it, and once it has had time to grow and coalesce in a system, it slows us down in ways we might not even notice. Unfortunately, it’s hard to identify where the bottlenecks are, and even harder to know how to fix them, because our team doesn’t have the tools or the knowledge. That’s part of the reason I signed up for a series of MITx courses on Model-Based Systems Engineering (MBSE), the first of which finished a few weeks ago.

One of the tools we were introduced to during the course is the Design Structure Matrix (DSM). DSMs represent system architectures, organizations, and processes by showing the connections between elements as entries in a square N×N matrix. Computer scientists and mathematicians will recognize this as an adjacency matrix; it’s the same basic concept. For example, here’s a DSM I created for my class:

This is a basic DSM for a made-up system representing a public transportation bus. Each row and column represents one of the imaginary bus’s major subsystems, and the connections between them appear as entries in the off-diagonal cells. Rows represent inputs: a mark in row A, column B means that A consumes an input from B. Conversely, it also means that B outputs something to A. The flow can be mass (like fuel or air), energy (electrical or mechanical), or information (control signals). In most product architecture DSMs like this one, the marks are symmetric about the diagonal, reflecting the fact that in many physical systems a connection goes both ways [1].
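To make the row and column convention concrete, here is a minimal Python sketch of a DSM stored as a list of lists. The subsystem names and flows are hypothetical and are not taken from the bus DSM above; the helper names (`build_dsm`, `inputs_of`, `outputs_of`, `is_symmetric`) are likewise made up for illustration.

```python
# Hypothetical subsystems and flows, chosen only for illustration.
elements = ["battery", "motor", "cables", "doors"]

# Each flow is (consumer, provider): the consumer takes an input from the provider.
flows = [
    ("motor", "cables"),    # motor draws electrical energy through the cables
    ("cables", "battery"),  # cables carry energy supplied by the battery
    ("doors", "cables"),    # door actuators receive power and control signals
]

def build_dsm(elements, flows):
    """Return an N x N matrix with dsm[row][col] = 1 when row consumes from col."""
    index = {name: i for i, name in enumerate(elements)}
    dsm = [[0] * len(elements) for _ in elements]
    for consumer, provider in flows:
        dsm[index[consumer]][index[provider]] = 1
    return dsm

def inputs_of(dsm, elements, name):
    """Read along a row: everything `name` consumes from."""
    row = dsm[elements.index(name)]
    return [elements[j] for j, mark in enumerate(row) if mark]

def outputs_of(dsm, elements, name):
    """Read down a column: everything that consumes from `name`."""
    col = elements.index(name)
    return [elements[i] for i, row in enumerate(dsm) if row[col]]

def is_symmetric(dsm):
    """True when every mark has a mirror image across the diagonal."""
    n = len(dsm)
    return all(dsm[i][j] == dsm[j][i] for i in range(n) for j in range(n))

dsm = build_dsm(elements, flows)
for name, row in zip(elements, dsm):
    print(f"{name:8}", row)
```

This toy example is deliberately one-directional, so `is_symmetric(dsm)` is false. A physical product DSM like the one described above would typically carry the mirrored mark for each flow as well.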

As you can see, there are some regions with a relatively high number of marks and some regions with a low number of them. If we reorganize the matrix to try to group marks together, we can reveal some underlying structures in the system:

This process of grouping subsystems is called modularization. Its goal is to maximize the number of connections within each group, or module, and to minimize the number of connections between modules. Some modules, like the battery and power transmission module here, are relatively independent of the others. Others, such as the electrical cables, connect broadly across many modules and form the backbone infrastructure of the whole product.
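One way to compare candidate groupings is to count how many marks fall inside a module and how many cross module boundaries. The sketch below does that for a small, made-up symmetric DSM. The function names `score_modules` and `reorder` are hypothetical, and this is only a scoring helper under those assumptions, not a clustering algorithm.

```python
def score_modules(dsm, modules):
    """Count off-diagonal marks inside modules and between modules.

    `modules` is a list of lists of element indices, one list per module.
    """
    module_of = {i: m for m, members in enumerate(modules) for i in members}
    inside = outside = 0
    n = len(dsm)
    for i in range(n):
        for j in range(n):
            if i != j and dsm[i][j]:
                if module_of[i] == module_of[j]:
                    inside += 1
                else:
                    outside += 1
    return inside, outside

def reorder(dsm, order):
    """Permute rows and columns together so that module members sit side by side."""
    return [[dsm[i][j] for j in order] for i in order]

# Made-up symmetric DSM with connections 0-2, 1-3, and 2-3.
dsm = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

print(score_modules(dsm, [[0, 1], [2, 3]]))  # grouping by original order
print(score_modules(dsm, [[0, 2], [1, 3]]))  # grouping that follows the connections

# Reordering by the second grouping pulls the marks toward the diagonal blocks.
for row in reorder(dsm, [0, 2, 1, 3]):
    print(row)
```

The second grouping keeps four marks inside modules and leaves two outside, while the first does the opposite, so by this measure the second is the better modularization.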

When these connecting components change, they induce changes in many other subsystems, radiating out along the dependency lines. The scope of a change to a given component is easily visible in the DSM from its degree of connectedness. Components with low connectivity are easier to change in isolation and require less engineering effort to modify, while highly connected subsystems such as communications buses often induce changes in distant reaches of the assembly.
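Both ideas in this paragraph can be read straight off the matrix. The sketch below computes a component's degree of connectedness and, as a worst-case estimate, how far a change could radiate if it propagated along every mark. That propagation rule is an assumption made for illustration, as are the function names and the example matrix.

```python
from collections import deque

def connectivity(dsm, k):
    """Number of distinct elements that k exchanges anything with."""
    n = len(dsm)
    return sum(1 for i in range(n) if i != k and (dsm[i][k] or dsm[k][i]))

def change_reach(dsm, start):
    """Breadth-first walk from `start` to every element that consumes,
    directly or indirectly, something it provides.

    Returns {element index: number of hops from `start`}.
    """
    n = len(dsm)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        provider = queue.popleft()
        for consumer in range(n):
            # A mark in row `consumer`, column `provider` means consumer depends on provider.
            if dsm[consumer][provider] and consumer not in dist:
                dist[consumer] = dist[provider] + 1
                queue.append(consumer)
    del dist[start]
    return dist

# Made-up symmetric DSM: element 0 is a hub (think cabling), element 4 hangs off element 3.
dsm = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]

print(connectivity(dsm, 0), change_reach(dsm, 0))
print(connectivity(dsm, 4), change_reach(dsm, 4))
```

In a fully connected symmetric DSM like this one, every element is eventually reachable from every other, so the hop count is the more useful signal: the hub touches three neighbors directly, while the leaf element touches only one.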

A design which minimizes the number of components that need to respond to a change in other parts is more flexible, more scalable, and more maintainable. It is often at the boundary layers between components where the hardest engineering work is done and where the strongest guarantees are required. In software, we often specify these interfaces using language constructs such as type signatures and visibility annotations. Some languages also allow you to specify the guarantees and contracts that you provide to the callers either by encoding them in the type system (Haskell, Agda, Coq) or by providing explicit contracts (Eiffel, SPARK Ada). These techniques can help to minimize the exposed surface area of your subsystem, and make the change propagation predictable and safe.

Now that I’ve had some time to play around with DSMs in software, I’m finding more and more uses for them. Encoding data like risk measures, latency ceilings, or component ownership in a DSM can reveal a wealth of information about your system and help you identify ways to make it more modular. I highly recommend that any engineer interested in analyzing their product’s architecture give them a try.
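As a rough sketch of what encoding extra data might look like, the example below stores a sparse DSM as a dictionary keyed by (consumer, provider) and attaches a risk score and a latency ceiling to each mark. The service names, team names, numbers, and field names are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    risk: float        # hypothetical 0..1 risk score for the interface
    latency_ms: float  # hypothetical latency ceiling for calls across it

# Hypothetical component ownership.
owners = {"api": "team-a", "auth": "team-b", "db": "team-b"}

# Sparse DSM: a key (consumer, provider) is a mark in the consumer's row.
links = {
    ("api", "auth"): Link(risk=0.2, latency_ms=50.0),
    ("api", "db"): Link(risk=0.7, latency_ms=20.0),
    ("auth", "db"): Link(risk=0.1, latency_ms=5.0),
}

def cross_team_links(links, owners):
    """Marks whose two ends are owned by different teams."""
    return sorted(pair for pair in links if owners[pair[0]] != owners[pair[1]])

def risky_links(links, threshold):
    """Marks whose risk score is at or above the threshold."""
    return sorted(pair for pair, link in links.items() if link.risk >= threshold)

print(cross_team_links(links, owners))
print(risky_links(links, 0.5))
```

A mark that shows up in both lists, here the hypothetical `api` to `db` dependency, is a high-risk interface that also crosses a team boundary, which may make it a reasonable place to start looking.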

  1. This is generally not the case in software, where visibility is often unidirectional across function invocations.