Electronic Health Records and the Need for Data Governance

The 2009 Health Information Technology for Economic and Clinical Health Act (also known as the HITECH Act) established the “meaningful use of interoperable electronic health records (EHRs) throughout the United States health care delivery system as a critical national goal.”[1] The term “meaningful” is intended to motivate the electronification of healthcare data, and is qualified in terms of improving the quality of healthcare delivery, engaging individuals to take an active part in managing their healthcare, improving the coordination of care, and generally improving public health.[2]

Modeling Reality. Really… What I Implemented on my Summer Vacation

Dear Teacher:

Sorry my assignment was late. The dog ate my flash drive. Anyway, here it is.  (The assignment, not the drive.  Don’t be gross.)

-Joe, Grade 9

The essay assignment homework was to “Compare and contrast” the following:

  • Working software is the primary measure of progress.
  • Good data quality should take a back seat to no one.


Where Government Affects Data Governance. Financial Data Standards and Data Governance: A Dodd-Frank Example

A good example of the definition of a financial data standard appears directly in the Dodd-Frank Act (section 154(b)(2)(A)), which requires the creation of a database listing financial companies. Of course, without a naming strategy, the unique identification of these companies would be unwieldy (if not impossible), and in 2010 a policy statement[1] issued by the OFR described the creation of a “Legal Entity Identifier” (LEI) as a “universal standard for identifying parties to financial contracts that is established and implemented by private industry and other relevant stakeholders through a consensus process.”

One might believe that a legal entity identifier would benefit all financial institutions. Accessing data about any financial company is easier when there is a means for unique identification, especially when it comes to unique counterparty identification. But we must also recognize the inherent challenge of attempting to standardize an identifier that has probably been implemented in as many ways as there are financial organizations. Essentially, each firm has had to create its own methods for managing counterparty data, and some even have more than one way of representing entity data and store those representations in a number of different databases and systems.

There is certainly a value proposition in the adoption of the LEI, especially when the reduction in duplicated counterparty data sets can improve operational efficiency (at the very least) and reduce lost transactions, failed transactions, unsettled trades due to the inability to match counterparties, and ultimately provide much more accuracy and precision when it comes to risk assessment and evaluation of aggregate exposure (at the very best).
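As a rough illustration of why a shared identifier helps with duplicated counterparty data, the sketch below groups records from several internal systems by LEI. Everything in it is hypothetical: the system names, local IDs, company names, and the `EXAMPLE-LEI-…` placeholders (which do not follow the real LEI format) are assumptions made only for this example, not a description of how any firm stores its data.

```python
from collections import defaultdict

# Hypothetical records: each internal system names the same counterparty differently.
# The "lei" values are placeholders, not real or correctly formatted LEIs.
records = [
    {"system": "trading", "local_id": "CP-0042", "name": "Acme Holdings Inc.", "lei": "EXAMPLE-LEI-A"},
    {"system": "settlement", "local_id": "77815", "name": "ACME HLDGS", "lei": "EXAMPLE-LEI-A"},
    {"system": "risk", "local_id": "R-9", "name": "Birch Capital LLC", "lei": "EXAMPLE-LEI-B"},
    {"system": "trading", "local_id": "CP-0107", "name": "Birch Capital", "lei": None},
]


def group_by_lei(rows):
    """Split rows into groups keyed by LEI and a list of rows with no LEI."""
    grouped = defaultdict(list)
    unmatched = []
    for row in rows:
        if row["lei"]:
            grouped[row["lei"]].append(row)
        else:
            unmatched.append(row)
    return dict(grouped), unmatched


grouped, unmatched = group_by_lei(records)

# Any LEI that maps to more than one record marks a duplicated counterparty.
duplicates = {lei: rows for lei, rows in grouped.items() if len(rows) > 1}
```

Records that share an LEI can be matched without comparing names, while records with no LEI still need manual or fuzzy matching, which is where the operational cost remains.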

Yet even with the continued progress in defining and socializing the standard for a legal entity identifier, there may be some persistent challenges that, in the absence of operational data oversight and data governance, will continue to dog adoption efforts, such as:

  • Management of corporate lifetime duration for identifiers in the presence of mergers, acquisitions, and other corporate actions;
  • Maintaining coherence across the varying identifiers that exist both within a single institution as well as those shared across multiple financial institutions;
  • The creation and funding of an external party to issue and manage legal entity identifiers;
  • Expanding the set of standards beyond legal entities;
  • Adoption globally across the financial industry; and
  • Managing the plans and resources necessary internally for each institution to correctly transition to the use of the LEI.

If the LEI is an indicator of things to come down the pike, it would be wise for organizations to ramp up their internal data governance programs, since the increase in expectations for compliance with on-demand data calls or externally-defined standards will certainly drive increased respect internally for auditable and practical best practices in data management that are guided through data governance.

-By David Loshin. Note: Embarcadero Technologies, Inc. has contributed resources toward the development of this content.


[1] “Statement on Legal Entity Identification for Financial Contracts,” accessed via http://www.treasury.gov/initiatives/Documents/OFR-LEI_Policy_Statement-FINAL.PDF

Where Government Affects Data Governance. Dodd-Frank, Financial Stability, and the Use of Information

To round out our perspective on the regulatory demands for formal data governance (along with Basel and Solvency II), it is worthwhile to draw some examples from the laws passed in the wake of the global economic contagion that erupted with the credit crunch in 2008-2009. The US Department of the Treasury has expressed concern about the financial industry’s reliance on quality data and information, having stated that “policymakers and investors lacked sufficient data to anticipate emerging threats to financial stability or assess how shocks to one financial firm could impact the system as a whole.”[1]

As a result, the US Congress passed the Dodd-Frank Wall Street Reform and Consumer Protection Act (commonly referred to as “Dodd-Frank”), which incorporated reforms and directives to establish a means for early detection of financial issues and risks that could lead to a similar financial instability before those issues spiral out of control the way things did in the 2008-2009 time frame. One notable aspect of Dodd-Frank was the mandate for creating a new agency called the Office of Financial Research (OFR) to support a Financial Stability Oversight Council (FSOC), directing that agency to “improve the quality of financial data available to policymakers and facilitate more robust and sophisticated analysis of the financial system.”

The FSOC’s job is to monitor, investigate, and consider the severity of any potential risks to the US financial system, while the OFR is essentially tasked with overseeing the definitions of data standards and concept hierarchies, publishing reference data sets, and conducting a broad range of financial analyses. To enable this last item, the OFR is also directed to collect and analyze information from financial companies (banks, bank holding companies, nonbank financial companies). Consequently, the director of the OFR retains significant authority for data collection, and has subpoena power to request from any financial institution “any data needed to carry out the functions of the office.”

This is a powerful combination of two significant aspects of data governance. First, the desire for data collection for the purposes of analysis must rely on the ability to collect and blend data from different sources (i.e., companies) in a way that can make sense. This need for standardized data collected at the federal level implies standardized reporting at the firm level.  In order to harmonize data from the many different financial institutions, there must be policies and practices for representing, storing, and aggregating and reporting data about counterparties, financial products, and financial transactions.

This cannot be done without governing the data at the federal level, and it probably cannot be done without a set of data governance policies to which each firm must subscribe and commit. Only through the introduction of well-defined protocols for data sharing can the OFR ensure oversight of the standards for collecting data and submitting reports on the wide diversity of financial products that are subject to scrutiny, analysis, and proper risk assessment.

- By David Loshin. Note: Embarcadero Technologies, Inc. has contributed resources toward the development of this content.


Where Government Affects Data Governance. Solvency II: Data Governance and Data Quality.

The quality characteristics of the data used for compliant Solvency II reporting are not just spelled out in article 35 (as you may recall from my previous post). In fact, data quality is a recurring theme in the regulation. Consider these examples:

  • Article 38 specifies a condition that supervisory authorities have effective access to data (“accessibility”);
  • Article 48 notes that the insurance companies must have an effective actuarial function to assess the sufficiency and quality of data used in calculation of technical provisions (general quality);
  • Article 76 specifies that the calculation of technical provisions be consistent with generally available data (“consistency” or “accuracy”);
  • Article 84 refers to the “adequacy of the underlying statistical data used.”

In actuality, these data quality requirements are more fundamentally addressed within the regulation in terms of oversight and governance of information. This is clearly stated in Article 82:

“Member States shall ensure that insurance and reinsurance undertakings have internal processes and procedures in place to ensure the appropriateness, completeness and accuracy of the data used in the calculation of their technical provisions.”

The directive here goes beyond the quality of the data sets themselves. Rather, it focuses on the processes and procedures used to ensure that the quality of the data is sufficient to meet the specific needs associated with calculating the minimum capital requirements (along with the other technical provisions). This is then echoed in section (f) of article 86, directing the Commission to adopt measures for implementation, specifically including “the standards to be met with respect to ensuring the appropriateness, completeness and accuracy of the data used in the calculation of technical provisions…”

In retrospect, the need for data governance transcends the expectations for data quality, even if data quality is a recurring theme. Of course, governance is the key theme of the second pillar, so the need for data governance as part of that pillar should not be a surprise. However, the need for data governance also reflects the key themes of the first and third pillars. When it comes to quantification of risk, the technical provisions cannot be effectively or believably calculated without ensuring some control over the accessibility of, and level of trust in the accuracy of, the data used. And disclosure fundamentally hinges on lineage and assurance of trust in what is being reported. The conclusion is that it would be challenging to comply with Solvency II without instituting sound data management practices overseen within a data governance program.

-By David Loshin. Note: Embarcadero Technologies, Inc. has contributed resources toward the development of this content.

Where Government Affects Data Governance. Solvency II and Data Governance – A Match Made in Brussels?

In the insurance industry, the mandates of Solvency II regulations imposed by the European Union echo those directed by the Basel accords for the financial services industry. Solvency II frames an approach for understanding and managing risk, and focuses on the determination of capital requirements. Like Basel, Solvency II relies on three “pillars,” one centered on quantification of risk (including minimum capital requirements and the requirements to ensure that the organization can absorb unforeseen losses at a very high level of confidence), one of governance (and in particular, approaches for internally-managed and governed risk and solvency assessment), and the third for disclosure (reflected in the requirements for disclosure, reporting, and transparency).

And like Basel (as we will explore), Solvency II demands good data management practices encompassed by a sound data governance framework.

The provision of high quality, accessible information is a key concept in the regulation. For example, article 35 starts with “Member States shall require insurance and reinsurance undertakings to submit to the supervisory authorities the information which is necessary for the purposes of supervision.” This article subsequently qualifies what is meant by this demand for information by describing additional expectations on behalf of the insurance companies. It specifies that the supervisory authorities may determine the “nature, scope, and format” of requested information, which can comprise “qualitative or quantitative elements,” “historic, current or prospective elements,” as well as “data from internal or external sources.” Lastly, article 35 states that the information “must reflect the nature, scale and complexity of the business,” “must be accessible, complete in all material respects, comparable and consistent over time,” and “must be relevant, reliable and comprehensible.”

Proper observance of this aspect of the regulation means that the company must be able to materialize essentially any type of information, whether historical or future (!) data elements, do so in a format specified by the requesting supervisory authority, and ensure the quality characteristics of the information, especially with respect to consistency, reliability, and availability. Clearly, data governance is destined to play a well-defined role when it comes to Solvency II compliance.

- By David Loshin. Note: Embarcadero Technologies, Inc. has contributed resources toward the development of this content.

Register for GREAT Data Governance Talk — Starts in about an hour

David Loshin just finished a talk about Data Governance and is doing another one in about an hour. The content is fantastic, and you won’t want to miss it. Register here

He is also blogging about Data Governance on this page every week and is giving talks like this one throughout 2012.

Check it out.

Kamille

Modeling Reality. Really. Homonyms, Identifiers, and Half-Baked In-Progress Models

Calibrate your irony meter: Everyone acknowledges that a high-fidelity data model is the result of hard work, yet most “best practices” of data modeling apply only to models that are already quite good. Shouldn’t best practices apply to models that suck?

More generally (and less dramatically), shouldn’t best practices acknowledge the early and intermediate stages of creating a data model? Isn’t that when a modeler really needs to be at his or her best—while working on a model that is poor and needs plenty of improvement? More than at any other time, I benefit from best practices when the wheels are falling off the bus: when I am confused, the users are irritated, and the in-progress model seems only to highlight the absence of a consensus about the data phenomenon the users purportedly agree upon.

In an earlier post, I alluded to this with these words: “Best practices in the kitchen focus on what happens at the stove, not merely how things are arranged on the plate. Why should modeling be any different?”

Alas, I see any number of best practices that ignore the nitty-gritty reality of producing a model with users. Today I’ll focus on one in particular: the oft-cited canard that a conceptual model does not require identifiers. And I do mean oft-cited; David Hay repeated this rule just last month in a discussion in another data-modeling forum:

“…adding an identifier composed of attributes and/or relationships can be done, but is not necessary in a conceptual model.”

I wish I lived in such a well-behaved universe, but I don’t and neither do you. Identifiers must be part of conceptual modeling for several reasons. First, identifiers are part of the user experience of information. Users employ identifiers to distinguish category members from each other. Many software professionals believe otherwise, namely that identifiers are artifacts of software exclusively. That couldn’t possibly be true, because identifiers predate computers. Think of license plates. Come to think of it, license plates not only predate computers, they predate automobiles; the first license plates were used for bicycles.

As if that weren’t enough—and it ought to be because during conceptual data modeling, honoring the user experience is more important than everything else—identifiers can uncover and remedy the homonym problem, which arises frequently during the early and intermediate stages of conceptual modeling. This is not merely a pleasant side effect of identifiers; it helps to establish consensus about the meaning of business terms, which is one of the primary goals of conceptual data modeling.

The homonym problem occurs when two different categories are given the same name. An example:

  • “Which flights have profit margins above three percent?”
  • “Which flights were cancelled because of Hurricane Galinda?”

With these two questions, users are employing one word (“flight”) to refer to two categories:

  •  The category whose members include these two:
    • Flight 877, daily from Tokyo to San Francisco
    • Flight 295, weekdays from Boston to Paris
  • The category whose members include these three:
    • Flight 877 on Tuesday 06 March 2012
    • Flight 877 on Wednesday 07 March 2012
    • Flight 295 on Tuesday 06 March 2012

In this case, the word “flight” is a homonym: a single word with multiple meanings. This is a very common linguistic phenomenon, and it almost always sows confusion. Including identifiers as a fundamental, non-optional part of conceptual models goes a long way to uncovering and remedying the problem. For example, the two instances of the word flight would have separate identifiers that would make the differences between the categories manifestly obvious. If the two categories are already evident on the draft model as two entities either of which could be named “flight,” the use of semantically meaningful identifiers would clarify. If the draft data model shows only one entity named with the overloaded word “flight,” the discussion about candidate identifiers can reveal the homonym problem and the attendant need for two separate entities.
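To make the flight example concrete, here is a minimal sketch of the two categories as separate entities with different identifiers. The entity names `ScheduledFlight` and `FlightDeparture`, and the attribute names, are my own illustrative choices rather than anything the users said; a real modeling session would settle those names with the users.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class ScheduledFlight:
    """A recurring service; the flight number alone identifies it."""
    flight_number: int
    origin: str
    destination: str


@dataclass(frozen=True)
class FlightDeparture:
    """One dated occurrence; identified by flight number plus date."""
    scheduled_flight: ScheduledFlight
    departure_date: date

    @property
    def identifier(self) -> Tuple[int, date]:
        return (self.scheduled_flight.flight_number, self.departure_date)


flight_877 = ScheduledFlight(877, "Tokyo", "San Francisco")
flight_295 = ScheduledFlight(295, "Boston", "Paris")

departures = [
    FlightDeparture(flight_877, date(2012, 3, 6)),
    FlightDeparture(flight_877, date(2012, 3, 7)),
    FlightDeparture(flight_295, date(2012, 3, 6)),
]

# One scheduled flight relates to many dated departures.
departures_of_877 = [d for d in departures if d.scheduled_flight == flight_877]
```

Once the identifiers are written down, "Flight 877" and "Flight 877 on Tuesday 06 March 2012" can no longer be mistaken for members of the same category: the first needs one value to pick it out, the second needs two.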

The typical rhetoric disputing the need for conceptual identifiers is easily refuted:

Claim: You don’t need conceptual identifiers because they are artifacts of software.

  • Rebuttal: Identifiers are an information phenomenon, not a technology phenomenon.

Claim: You don’t need identifiers because users, when forced to be explicit, will avoid the homonym problem all by themselves.

  • Rebuttal: There are many counterexamples of this, including the U.S. Supreme Court case Gutierrez v. Ada, in which a homonym problem involving multiple meanings of the word “election” caused ambiguity in the vote-counting process, even though that process was carefully designed and formally expressed in legislation that sought to eliminate ambiguity.

Claim: You don’t need identifiers because natural linguistic context will resolve ambiguities.

  • Rebuttal: Context can resolve ambiguities between widely disparate concepts, such as flight (of stairs) and flight (of an airplane) and flight (of a fugitive). But on a data model, most pairs of homonym-candidate entities are not merely close, but adjacent: separated by a one-many relationship. This occurs especially often for planned vs. actual phenomena and type vs. instance phenomena.

One other claim I sometimes hear: You don’t need identifiers until it becomes obvious that you do need them. After the homonym problem arises, start insisting on identifiers. That is a risky bit of business. Identifiers don’t merely help to remedy the homonym problem, they help detect it. Without identifiers, many instances of the homonym problem will not be obvious until it is too late and you find yourself subject to the vagaries of post-deployment data-integration programs or even (shudder) the judicial branch.

-Joe Maguire

Co-author, Mastering Data Modeling: A User-Driven Approach

 

Some disclosure: Blog posts here will be written by me or my colleague Peter O’Kelly. Although Embarcadero will compensate us for these posts, we are solely responsible for their content. (Proof: We are unconstrained. The best practices offered here might or might not align with what you’ll find elsewhere on the ER/Studio site, in ER/Studio documentation, or in Embarcadero-sponsored whitepapers.)