Has Anyone Seen My Data?
How data requirements and data analysis impact the success of your project
The computing business used to be called “data processing” for a reason: all software applications create, consume, manipulate, or delete data. We can think of data as the glue that connects all the other requirement types; alternatively, functionality exists to process data. Both perspectives underscore the importance of exploring data considerations during requirements elicitation. Practice #8 in the book Software Requirements Essentials by Karl Wiegers and Candase Hokanson is to assess data concepts and relationships.
Understanding Data Objects and Their Relationships
Data elicitation, analysis, and management are not small tasks. However, the business analyst (BA) needs to understand all the data objects in their problem and solution spaces to be able to specify the correct set of functional and nonfunctional requirements. To gain—and communicate—that understanding, the BA will create multiple views of the data over time and for different audiences.
Begin your data exploration by acquiring a full list of data objects: the logical representation of system information. After identifying likely data objects (entities), you can create a data model to show the logical connections between them (relationships). The entity relationship diagram or ERD is a popular convention for drawing data models.
An ERD for a restaurant online ordering system might look like Figure 1. The entities appear in boxes. The lines show logical links between data objects, and the labels on the lines characterize the relationship.
Figure 1. An entity relationship diagram depicts all the data objects in a problem or solution space and their logical connections.
The numerical nature of each entity pair’s relationship, its cardinality, is shown on the line that connects the entities. Several ERD notations can show relationship cardinalities; this example uses crow’s foot notation. In crow’s foot notation, a circle means zero, a bar means one, and a three-pronged “crow’s foot” means many. These symbols pair up at each end of a line to express the common cardinalities: zero or one, exactly one, zero or more, and one or more.
In Figure 1, a customer account places zero or more orders, an order must contain one or more menu items, and a menu item can belong to zero or more orders. The BA could use this ERD during an elicitation review session to ask questions such as “Must an order always contain at least one item?” and “Under what conditions could a customer account have no delivery address?” Walk systematically through the data objects in the model to identify all their logical connections and verify the relationship cardinalities.
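To make these cardinalities concrete, here is a minimal Python sketch of three of the Figure 1 entities. The class and attribute names (`CustomerAccount`, `Order`, `MenuItem`, `price_cents`, `place_order`) are hypothetical, and the sketch enforces only the "one or more menu items per order" rule. A real design would capture every relationship the ERD shows.

```python
from dataclasses import dataclass, field


@dataclass
class MenuItem:
    name: str
    price_cents: int  # hypothetical attribute, not taken from Figure 1


@dataclass
class Order:
    order_id: str
    # Many-to-many: the same MenuItem object may appear in many orders.
    items: list[MenuItem] = field(default_factory=list)

    def validate(self) -> None:
        # ERD rule: an order must contain one or more menu items.
        if not self.items:
            raise ValueError(f"Order {self.order_id} must contain at least one menu item")


@dataclass
class CustomerAccount:
    account_id: str
    # ERD rule: an account places zero or more orders, so an empty list is valid.
    orders: list[Order] = field(default_factory=list)

    def place_order(self, order: Order) -> None:
        order.validate()  # reject orders that break the cardinality rule
        self.orders.append(order)


burger = MenuItem("Burger", 1099)
account = CustomerAccount("C-1")
account.place_order(Order("O-1", [burger]))
```

The menu item's "zero or more orders" side needs no code here: the same `MenuItem` object can appear in many orders' `items` lists, or in none.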
Refining the Data Understanding
Once you have the data objects in hand, look for functionality associated with them. A helpful acronym is CRUD: determine how an instance of each data object is Created, Read, Updated, or Deleted within the solution. Also look for ways that data objects are copied, listed, used, moved, and transformed, which leads to the much more amusing acronym CRUDCLUMT.
Make sure the necessary operations for each data object appear in process flows or use case descriptions. Look for data objects that are created but never used or stored, and for objects that are used by processes but are never explicitly read or created. Understand where each piece of data comes from and how the system inputs it. This analysis can uncover additional data requirements and possibly additional processes or use cases.
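One way to run this check is to record a CRUD matrix and scan it for gaps. The sketch below assumes a hypothetical matrix that maps each data object to the processes that touch it and the operations each process performs. The process names and the Promo Code and Loyalty Balance objects are invented for illustration, as are the gaps they reveal.

```python
# Hypothetical CRUD matrix: data object -> {process -> operations it performs}.
CRUD_MATRIX: dict[str, dict[str, set[str]]] = {
    "Customer Account": {"Register": {"C"}, "Place Order": {"R"}, "Edit Profile": {"R", "U"}},
    "Order": {"Place Order": {"C"}, "Track Order": {"R"}, "Cancel Order": {"U"}},
    "Promo Code": {"Create Promotion": {"C"}},   # hypothetical gap: created but never read
    "Loyalty Balance": {"Place Order": {"R"}},   # hypothetical gap: read but never created
}


def all_operations(processes: dict[str, set[str]]) -> set[str]:
    """Union of the operations that any process performs on one data object."""
    ops: set[str] = set()
    for process_ops in processes.values():
        ops |= process_ops
    return ops


def find_crud_gaps(matrix: dict[str, dict[str, set[str]]]) -> dict[str, list[str]]:
    """Flag data objects whose set of operations looks incomplete."""
    gaps: dict[str, list[str]] = {"created_never_read": [], "read_never_created": []}
    for obj, processes in matrix.items():
        ops = all_operations(processes)
        if "C" in ops and "R" not in ops:
            gaps["created_never_read"].append(obj)
        if "R" in ops and "C" not in ops:
            gaps["read_never_created"].append(obj)
    return gaps


print(find_crud_gaps(CRUD_MATRIX))
# {'created_never_read': ['Promo Code'], 'read_never_created': ['Loyalty Balance']}
```

Each flagged object is a prompt for an elicitation question rather than proof of an error. The "missing" operation might belong to another system, or a process or use case might simply be missing from the model.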
After you understand the data objects and their relationships, consider creating a data dictionary for each system in your solution. The ERD provides a high-level view of the data; the data dictionary supplies the details. Figure 2 shows a fragment of a data dictionary for two data objects from the restaurant online ordering system, Customer Account and Delivery Address. A data dictionary shows all the fields or attributes of each data object, along with various metadata about each field. Common metadata include data type, length, business rules, valid values, whether a field is required, and whether a field must have unique values.
Figure 2. A partial data dictionary for the restaurant online ordering site shows the attributes of the Customer Account and Delivery Address data objects.
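A data dictionary can also be kept in machine-readable form so that the same metadata drives validation. The sketch below does not reproduce Figure 2: the `FieldSpec` structure, the field names, the lengths, and the deliberately simplified email pattern are all hypothetical. It covers the metadata types mentioned above (data type, length, valid values, required, unique).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """Metadata for one attribute, mirroring common data dictionary columns."""
    name: str
    data_type: type
    max_length: Optional[int] = None
    required: bool = False
    unique: bool = False
    pattern: Optional[str] = None  # valid values / business rule as a regular expression


# Hypothetical Customer Account entries; Figure 2's actual fields may differ.
CUSTOMER_ACCOUNT = [
    FieldSpec("account_id", str, max_length=10, required=True, unique=True),
    FieldSpec("email", str, max_length=254, required=True, unique=True,
              pattern=r"[^@\s]+@[^@\s]+"),  # simplified email check, not RFC-compliant
    FieldSpec("phone", str, max_length=15),
]


def validate_record(record: dict, spec: list[FieldSpec]) -> list[str]:
    """Check one record against the dictionary; return human-readable errors."""
    errors = []
    for f in spec:
        value = record.get(f.name)
        if value is None:
            if f.required:
                errors.append(f"{f.name}: required")
            continue
        if not isinstance(value, f.data_type):
            errors.append(f"{f.name}: expected {f.data_type.__name__}")
            continue
        # Length and pattern rules apply only to string values.
        if isinstance(value, str):
            if f.max_length is not None and len(value) > f.max_length:
                errors.append(f"{f.name}: longer than {f.max_length}")
            if f.pattern is not None and not re.fullmatch(f.pattern, value):
                errors.append(f"{f.name}: invalid format")
    return errors


def find_duplicates(records: list[dict], spec: list[FieldSpec]) -> list[tuple[str, object]]:
    """Report values that repeat in fields the dictionary marks as unique."""
    duplicates = []
    for f in (s for s in spec if s.unique):
        seen = set()
        for record in records:
            value = record.get(f.name)
            if value is None:
                continue
            if value in seen:
                duplicates.append((f.name, value))
            seen.add(value)
    return duplicates
```

Keeping the rules in one structure means a change to, say, a field length is made once and shows up in both the documentation and the validation.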
Data dictionaries help align data requirements between systems. Carefully study the data types and field lengths for data items that systems exchange, and decide how to handle any type conversions and length mismatches. For example, if there is a length mismatch, determine whether the originating or the receiving system should truncate the data or add pad characters to fit, and at which end characters should be cut or added. Such details can mean the difference between interfaces that work and those that cause data corruption or loss.
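The truncate-or-pad decision can be captured in one small function so that both sides of an interface apply the same rule. This is a hypothetical helper (`fit_to_length` and the `End` enum are invented names) that assumes fixed-length string fields. The caller states explicitly which end characters are cut from or added to.

```python
from enum import Enum


class End(Enum):
    LEFT = "left"
    RIGHT = "right"


def fit_to_length(value: str, length: int, end: End = End.RIGHT, pad_char: str = " ") -> str:
    """Return `value` truncated or padded to exactly `length` characters.

    `end` says where characters are cut (if too long) or added (if too short).
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if len(pad_char) != 1:
        raise ValueError("pad_char must be a single character")
    if len(value) > length:
        # Too long: drop the excess from the chosen end.
        return value[:length] if end is End.RIGHT else value[len(value) - length:]
    # Too short (or exact): pad at the chosen end; str.ljust pads on the right.
    return value.ljust(length, pad_char) if end is End.RIGHT else value.rjust(length, pad_char)


print(fit_to_length("Springfield", 6))                 # Spring
print(fit_to_length("42", 5, End.LEFT, pad_char="0"))  # 00042
```

Silent truncation still loses data, so some interfaces reject over-length values instead. Choosing between the two is a requirements decision to make with stakeholders, not a coding detail.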
Use the data dictionary to drive elicitation for functional requirements, external interface requirements, and quality attributes. Also, for each data object, determine whether updates to the data must be captured in real time (transaction-based) or whether a batch-based update is sufficient. A daily update is likely acceptable for your credit score but not for your bank account balance. Each decision will demand different processes and functionality to enable it.
Once external systems are sending data into your system, be wary of introducing new required fields, especially on agile projects where data interfaces are built incrementally. Each new field will demand changes to the systems sending in data, either mapping into the newly required field or defaulting its value.
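A receiving system can soften the impact of a new required field by mapping the sender's field names and applying an agreed default before enforcing required fields. The field names, the `FIELD_MAP` mapping, and the default value below are hypothetical, and whether any default is acceptable for a given field is a business decision.

```python
# Hypothetical: sender field names -> our field names.
FIELD_MAP = {"cust_id": "customer_id"}
# Hypothetical default for a newly required field the sender does not supply yet.
DEFAULTS = {"preferred_contact": "email"}
REQUIRED = ("order_id", "customer_id", "preferred_contact")


def normalize_inbound(payload: dict) -> dict:
    """Map sender field names, apply agreed defaults, then enforce required fields."""
    record = {FIELD_MAP.get(key, key): value for key, value in payload.items()}
    for name, default in DEFAULTS.items():
        record.setdefault(name, default)  # only fills fields the sender omitted
    missing = [name for name in REQUIRED if name not in record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return record


print(normalize_inbound({"order_id": "O-1", "cust_id": "C-1"}))
# {'order_id': 'O-1', 'customer_id': 'C-1', 'preferred_contact': 'email'}
```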
Finding Data Requirements Wherever They Are Hiding
Look for behind-the-scenes integrity and security requirements that impact your data objects. Your system’s end users won’t tell you about those types of requirements. You’ll need to get them from other stakeholders, such as a corporate data governance group. Regulatory and legal standards, such as Sarbanes-Oxley Act compliance and laws relating to protection of personally identifiable information, dictate many data security, retention, and audit logging requirements.
Solution or database architects may specify requirements to support data access and performance goals. As an example, Karl’s bank provides online access to monthly statements, but he must request statements older than two years from an offline archive. Obviously, the system includes some functionality to archive data periodically and to let users request and access older statements. No bank customer would ever present a requirement like this; they just want to see their statements.
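The archiving behavior in Karl's example implies a rule that decides where a requested statement lives. The sketch below assumes a simple two-year cutoff measured back from today's date. The bank's actual rule and implementation are unknown, and the names `ONLINE_YEARS`, `years_before`, and `statement_source` are hypothetical.

```python
from datetime import date

ONLINE_YEARS = 2  # assumed retention window for online statements


def years_before(ref: date, years: int) -> date:
    """Same calendar date `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return ref.replace(year=ref.year - years)
    except ValueError:
        return ref.replace(year=ref.year - years, day=28)


def statement_source(statement_date: date, today: date) -> str:
    """Decide whether a statement is served online or must be requested from the archive."""
    cutoff = years_before(today, ONLINE_YEARS)
    return "online" if statement_date >= cutoff else "archive_request"


print(statement_source(date(2022, 1, 31), today=date(2023, 5, 9)))   # online
print(statement_source(date(2020, 12, 31), today=date(2023, 5, 9)))  # archive_request
```

Even this tiny rule raises questions a BA would need answered, such as whether the cutoff is measured from the statement date or the end of its billing period.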
Some people may see data requirements and data management as a technical activity best left to architects and engineers. However, the BA must understand the data objects in their problem space and the conceived solution, the relationships between those objects, and the corresponding data flows to be able to elicit the proper functional, quality, and external interface requirements. Without careful data elicitation and analysis, you may face—as Candase once did—fixing data length mismatches after the system’s launch. It wasn’t fun.
This article is adapted from Software Requirements Essentials: Core Practices for Successful Business Analysis by Karl Wiegers and Candase Hokanson and was published in Analyst’s Corner on Medium on May 9, 2023. Karl is the author of numerous books on software development and other topics, including Software Requirements (with Joy Beatty), Software Development Pearls, and The Thoughtless Design of Everyday Things. Candase is a business architect at ArgonDigital. She has written numerous articles on best practices in requirements management and agile product ownership.
When exploring requirements, it's easy to neglect data needs in favor of functionality. But all software applications create, consume, manipulate, or delete data. Practice #8 in the book "Software Requirements Essentials" is to assess data concepts and relationships. This article describes some approaches and techniques for eliciting and analyzing data requirements. You can download a checklist of questions for identifying data requirements from https://www.softwarereqs.com.