The AI Productivity Paradox: Immediate Gains vs. Long-Term Risks

AI tools are delivering real efficiency wins, but they’re also quietly reshaping how workers think, what skills atrophy, and where quality unexpectedly breaks down. Here’s what every business leader needs to understand before going all-in.

The AI Productivity Paradox: a framework for understanding short-term efficiency gains alongside emerging cognitive and organizational risks.

There’s a quiet tension building inside AI-adopting organizations. On one side: real, measurable productivity gains that no serious executive should dismiss. On the other: a set of slower-moving, harder-to-see risks that, left unmanaged, could erode the very capabilities organizations are counting on AI to amplify.

This tension is what researchers and strategists are calling the AI Productivity Paradox, and it plays out across three interconnected domains: economic and labor dynamics, cognitive and quality shifts, and the governance frameworks organizations need to navigate both.

The Economic Picture: Real Gains, but Not Instant

Field trials across writing, customer support, and software development consistently show reductions in task completion time of 15% to 50% compared to standard workflows. That’s not marginal. For organizations handling high volumes of routine knowledge work, the compounding effect is substantial.

But those gains don’t show up immediately on the macro balance sheet. The Productivity J-Curve explains why: in the short term, organizations must absorb the costs of training, workflow redesign, and integration before realizing broader economic returns. Leaders who expect instant ROI are often disappointed, and sometimes abandon AI initiatives right before the curve bends upward.

At a glance:

  • 15–50% task efficiency gains: observed across writing, support, and coding workflows in field trials.
  • Delayed macro growth (the J-Curve): a short-term investment dip precedes the longer-term productivity payoff.
  • Reallocation, not mass displacement: labor markets show skill compression and task reallocation, not widespread job loss.

The labor story is similarly nuanced. Rather than triggering the mass displacement many feared, current market data points to task reallocation and skill compression: workers shifting away from routine production tasks and toward higher-order judgment, verification, and integration work. The jobs aren’t disappearing; they’re changing shape.

The Cognitive Risks Nobody Is Talking About Enough

The second domain is where the paradox gets genuinely uncomfortable. Even as AI accelerates output, it may be slowly degrading the underlying human capabilities organizations depend on.

“EEG studies are detecting weakened brain connectivity and reduced cognitive engagement in regular LLM users, a phenomenon researchers are calling ‘cognitive debt.’”

The mechanism is straightforward: when AI handles the heavy cognitive lifting, such as drafting, reasoning, and synthesis, users engage less deeply with the material. Over time, the neural pathways for critical analysis and creative problem-solving get less exercise. This isn’t theoretical. It’s showing up in neurological data.

There’s also a troubling dynamic around confidence. Research shows that high confidence in AI output actually reduces critical reflection; users who trust the tool most are the ones who check it least. Paradoxically, workers with stronger domain expertise and higher self-confidence engage more critically with AI outputs, applying greater scrutiny and effort to verification. The implication: organizations may want to invest in building genuine expertise rather than assuming AI can substitute for it.

The Jagged Frontier: Where AI Succeeds and Where It Fails

One of the most practically important insights for teams deploying AI is the Jagged Technological Frontier, as researchers call it. AI doesn’t fail gradually or predictably; it excels at surprisingly complex tasks, then fails unpredictably on seemingly simple ones.

A system that can draft a sophisticated legal brief may stumble on a straightforward date calculation. A coding assistant that generates elegant architecture may introduce subtle bugs in basic conditional logic. This irregularity makes AI harder to supervise than traditional software, because failure modes don’t follow intuitive patterns. Effective oversight requires humans who understand both the domain and the tool’s specific failure landscape.

Key Terms: A Working Glossary

Cognitive Debt: The gradual erosion of critical thinking and analytical capability that occurs when workers habitually offload complex reasoning to AI. Identified through EEG studies showing reduced brain connectivity in regular LLM users.

The Productivity J-Curve: The pattern where AI adoption initially appears to slow macro productivity growth due to training, integration, and redesign costs before generating compounding returns as workflows mature.

The Jagged Technological Frontier: The uneven capability profile of AI systems, which perform exceptionally well on some complex tasks while failing unpredictably on seemingly simpler ones. Makes AI harder to supervise than traditional tools.

Task Stewardship: The emerging human role in AI-augmented workflows: shifting from direct material production to critical verification, quality integration, and strategic oversight of AI-generated outputs.

Skill Compression: The narrowing of human skill sets observed as AI absorbs routine tasks. Workers increasingly perform a smaller range of higher-level functions, with implications for long-term workforce capability and adaptability.

LLM (Large Language Model): The class of AI systems underlying tools like ChatGPT, Claude, and Gemini. Trained on vast text datasets to generate, analyze, and transform language, LLMs are the engine powering most current enterprise AI productivity tools.

Pre-Generation Setup: The first step in the 3-Step Validation System: defining output specifications and providing sufficient context before prompting AI, to reduce hallucinations and anchor outputs to accurate information.

Context Window: The amount of text an AI model can “see” and process at once. Providing rich context within this window, such as background documents, specifications, and examples, directly improves output quality and reduces error rates.

A Framework for Sustainable AI Use

The infographic’s 3-Step Validation System offers a practical governance structure that addresses both the quality risks and the cognitive risks simultaneously:

Step 1: Pre-Generation Setup

Define output specifications clearly and load the AI’s context window with grounding information before generating anything. This step dramatically reduces hallucinations and misalignments, and it requires the human to engage meaningfully with the task requirements, counteracting cognitive disengagement.

Step 2: Real-Time & Post-Analysis

Use iterative prompting rather than accepting first outputs, and verify all deliverables against objective criteria or domain expertise. This is where task stewardship happens in practice, and where critical reflection must be deliberately preserved against the pull of over-reliance.

Step 3: Performance Monitoring

Track downstream outcomes, brand impact, SEO performance, error rates, and customer responses to close the feedback loop and continuously refine prompting and verification processes. Organizations that treat AI outputs as the end of the workflow, rather than an input to be refined and measured, will accumulate quality debt they won’t see until it’s costly.
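
To make the three steps concrete, here is a minimal Python sketch of how such a validation loop could be wired together. It is an illustration, not the infographic’s own implementation: the generate function is a stand-in for whatever model call an organization actually uses, and the specification fields, checks, and metrics are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for an actual model call; not a real library API.
def generate(prompt: str) -> str:
    raise NotImplementedError("Wire this to your own model or provider.")

@dataclass
class TaskSpec:
    """Step 1: Pre-Generation Setup -- explicit spec plus grounding context."""
    objective: str                       # what the output must accomplish
    context_docs: List[str]              # background material loaded into the prompt
    checks: List[Callable[[str], bool]]  # objective verification criteria for Step 2

def build_prompt(spec: TaskSpec) -> str:
    context = "\n\n".join(spec.context_docs)
    return f"Context:\n{context}\n\nTask:\n{spec.objective}"

def generate_and_verify(spec: TaskSpec, max_rounds: int = 3) -> Dict:
    """Step 2: iterative prompting plus verification against the spec's checks."""
    prompt = build_prompt(spec)
    draft = ""
    for round_num in range(1, max_rounds + 1):
        draft = generate(prompt)
        failures = [check.__name__ for check in spec.checks if not check(draft)]
        if not failures:
            return {"output": draft, "rounds": round_num, "passed": True}
        # Feed the failed criteria back instead of accepting the first output.
        prompt += f"\n\nRevise the draft. It failed these checks: {failures}"
    return {"output": draft, "rounds": max_rounds, "passed": False}

def log_outcome(result: Dict, metrics_store: List[Dict]) -> None:
    """Step 3: Performance Monitoring -- record downstream signals for review."""
    metrics_store.append({"rounds": result["rounds"], "passed": result["passed"]})
```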

“The organizations that will win with AI aren’t those who use it most; they’re those who’ve built the governance, expertise, and culture to use it best.”

The AI Productivity Paradox isn’t an argument against adopting AI tools. The efficiency gains are real, and the competitive pressure to act is legitimate. It’s an argument for how to adopt them: with clear-eyed awareness of the cognitive and quality risks, deliberate governance frameworks, and sustained investment in the human expertise that makes AI outputs actually valuable.

Organizations that manage this balance well will compound both the AI gains and their human capital. Those who don’t will find themselves more efficient at the surface while quietly hollowing out the judgment capabilities they need for anything genuinely difficult.




AI’s $2 Trillion Moment—and the Hidden Costs We’re Ignoring

Spending on artificial intelligence is expected to cross the $2 trillion mark by 2026. This massive investment signals that AI is no longer a peripheral experiment but a central part of how global businesses function. Companies are quickly moving past basic chatbots toward agentic systems that can plan and execute complex tasks with very little human help. About 62% of organizations are already testing these autonomous assistants to see how they can improve efficiency. While many people worry about robots taking their jobs, the data suggests a more complicated story. The World Economic Forum predicts that while 92 million roles might disappear by 2030, technology will help create 170 million new ones. This results in a net growth of 78 million jobs, though the transition will likely be quite messy.

For the people actually doing the work, the day-to-day is changing in a major way. We are seeing a shift where knowledge workers move from being creators of content to being stewards of AI systems. This means spending less time on basic execution and more time on verifying and integrating what the AI produces. However, this comes with a strange productivity paradox. Some developers finish their tasks 26% faster with AI, but others actually take 19% longer because they spend so much time fixing mistakes the software made. There is also a real danger of producing what experts call workslop: content that looks good at first glance but lacks any real substance. About 40% of employees have already received this kind of low quality work from colleagues, and it usually takes about two hours to fix each instance.

There are also deeper concerns about what this does to our mental sharpness. A study from the MIT Media Lab suggests that relying too much on AI can lead to cognitive debt, where our brain connectivity actually weakens because we are offloading our thinking. This is particularly true for younger workers, who are seeing a 16% decline in hiring for entry level roles as AI takes over basic tasks. Beyond the human element, businesses are also struggling with a confusing maze of global rules. The EU AI Act and different state laws in the US often conflict with one another, making it a nightmare for international companies to stay compliant.

Finally, the environmental cost of all this computing power is becoming impossible to ignore. Training just one large model can produce as much carbon as several cars do over their entire lifetimes. This is leading to a new push for Green AI, focusing on energy-efficient hardware such as neuromorphic chips that mimic the human brain. As we head into 2026, the real winners will not be the companies with the most AI, but the ones who can balance speed with high-quality human judgment.





From Connectomes to Digital Twins: Forecasting the Brain in Real Time

Mapping the Living Mind: From Wiring Diagrams to Neural Forecasting

Scientists have spent years trying to figure out how the biological brain works by looking at it from two different angles. One group has focused on connectomics, which is basically mapping the physical wiring of the brain. The other group has looked at functional imaging, or watching neurons fire in real time. We are now seeing these two fields merge through advanced AI to create what researchers call a digital twin of the brain. This move goes beyond just taking high-resolution pictures. It is about building models that can actually predict what a brain will do next.

Building the Physical Maps

The foundation of this work is the wiring diagram. We recently saw a massive milestone with the completion of the central brain connectome for the adult fruit fly, Drosophila melanogaster. This map includes more than 125,000 neurons and 50 million synaptic connections. While a fly brain is small, the data is incredibly complex. A single neuron might connect to hundreds of others, making it very difficult to understand how these paths lead to specific behaviors.

We are seeing similar progress in humans too. Researchers recently reconstructed a tiny fragment of the human cerebral cortex. Even though it was only one cubic millimeter in size, it required over a petabyte of data to map at a nanoscale resolution. These physical maps have shown us things we never knew existed, like neurons that form unusual triangular shapes. However, as many experts have pointed out, a connectome is just a map. It does not tell us how the “traffic” of neural activity moves through those wires.

Predicting the Traffic of the Brain

To solve this, researchers are turning to neural forecasting. One of the most important tools in this area is the Zebrafish Activity Prediction Benchmark, or ZAPBench. It uses light sheet microscopy to record the activity of over 70,000 neurons in larval zebrafish. This is currently the only vertebrate where we can see the whole brain active at once at such a high resolution.

By using models originally built for weather forecasting, like those in WeatherBench, scientists are testing how well AI can predict the next 30 seconds of a brain’s activity based on just a few seconds of history. This is a massive shift in how we study neuroscience. Instead of just describing what happened, we are trying to forecast what will happen.
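
As a toy illustration of that forecasting framing (not the actual ZAPBench or weather-derived models), the sketch below fits a simple ridge-regularized linear predictor in Python: given the last few frames of activity for a small synthetic population of neurons, predict the next frame. The array sizes, the synthetic data, and the model choice are all assumptions made purely for illustration.

```python
import numpy as np

# Toy illustration of neural forecasting: predict the next frame of population
# activity from a short history window. Shapes and model are illustrative only.
rng = np.random.default_rng(0)
T, N = 500, 64            # time points, neurons (tiny compared to ZAPBench's ~70k)
activity = rng.standard_normal((T, N)).cumsum(axis=0)  # stand-in for recorded traces

history = 4               # how many past frames the model sees

# Build (history window -> next frame) training pairs.
X = np.stack([activity[t - history:t].ravel() for t in range(history, T - 1)])
y = np.stack([activity[t] for t in range(history, T - 1)])

# Ridge-regularized linear forecaster: solve (X^T X + lam*I) w = X^T y.
lam = 1e-2
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Forecast the frame after the last training window and measure the error.
last_window = activity[T - 1 - history:T - 1].ravel()
prediction = last_window @ w
mae = np.abs(prediction - activity[T - 1]).mean()
print(f"mean absolute error on the held-out frame: {mae:.3f}")
```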

Several new techniques are making this possible:

  • Volumetric Video Models: Instead of just looking at individual neuron signals, new models like 4D UNets look at the raw 3D video over time. This helps the AI understand the spatial relationships between neurons that other methods might miss.
  • Foundation Models: Just like the models that power modern chat tools, new foundation models of the mouse visual cortex are being trained on huge amounts of data. These models can be applied to new animals they have never seen before, successfully predicting how their neurons will react to new videos.
  • Classification Strategies: New architectures like QuantFormer are changing the way we think about brain signals. Instead of trying to predict a continuous wave of activity, they treat neural spikes like a classification problem. This has proven much more effective at capturing the quick, sparse bursts of energy that define how neurons communicate.

Why Global Brain States Matter

One of the biggest hurdles in this research is that a single neuron does not act alone. Its behavior is often influenced by the global state of the brain, such as whether an animal is alert or performing a specific task. A model called POCO, which stands for Population Conditioned forecaster, handles this by looking at local neuron dynamics while also considering the overall state of the entire population. This helps the model understand how shared brain structures influence individual cells.

Future Applications and Interventions

The goal of this research is not just to understand the brain but to interact with it. If we can forecast neural activity in real time, we can develop systems that intervene before something goes wrong. Some models can now run in as little as 3.5 milliseconds. This speed could allow for closed-loop optogenetic interventions, where light is used to stimulate neurons to stop a seizure or a specific craving before the person even realizes it is happening.

We are moving into an era where we can see inside ourselves with the same clarity that we see the world around us. While managing petabytes of data is a major challenge, combining physical maps with AI forecasting brings us much closer to a true mechanistic understanding of intelligence.


This post was written with the help of AI for analysis, using the NotebookLM shared resource here: https://notebooklm.google.com/notebook/74dc7f14-54cb-481b-9ee8-8347a6f5cba1





Comparing Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL)

Introduction

Artificial intelligence (AI), machine learning (ML), and deep learning (DL) are terms that are commonly used in the technology industry. While these terms are often used interchangeably, they are not the same thing. Each technology has unique features, advantages, and disadvantages. In this blog post, we will explain the differences between AI, ML, and DL and provide supporting citations from authoritative sources.

Artificial Intelligence (AI) AI refers to the ability of machines to perform tasks that typically require human intelligence. AI is divided into two categories: narrow or weak AI and general or strong AI. Narrow AI is designed to perform specific tasks, such as speech recognition, image recognition, and natural language processing. On the other hand, general AI is designed to perform any intellectual task that a human can do. General AI systems can learn from experience and adapt to new situations. However, as of now, there is no truly general AI in existence, and most AI applications are narrow AI systems.

Machine Learning (ML) ML is a subset of AI that involves training machines to learn from data without being explicitly programmed. In other words, ML algorithms can automatically learn and improve from experience without human intervention. ML algorithms are designed to identify patterns in data and make predictions based on those patterns. The process of training an ML model involves providing it with a large dataset, and then the algorithm will learn to recognize patterns in the data and make predictions based on those patterns.

Deep Learning (DL) DL is a subset of ML that involves training deep neural networks. Neural networks are computing systems inspired by the structure and function of the human brain. These networks consist of layers of interconnected nodes, each of which performs a mathematical operation on the input data. Deep neural networks have multiple layers, which allows them to learn more complex representations of the input data. The process of training a deep neural network involves providing it with a large dataset and adjusting the weights of the nodes to minimize the error between the predicted output and the actual output.
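
As a concrete, deliberately tiny illustration of that weight-adjustment idea, the sketch below trains a one-hidden-layer network on the XOR problem with plain gradient descent in NumPy. Real deep learning frameworks automate these gradient computations and scale them to millions of parameters; the learning rate, layer size, and task here are chosen only for readability.

```python
import numpy as np

# Tiny illustration of "adjust the weights to minimize the error":
# a one-hidden-layer network trained by gradient descent on the XOR problem.
rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.standard_normal((2, 8)), np.zeros(8)   # input -> hidden weights
W2, b2 = rng.standard_normal((8, 1)), np.zeros(1)   # hidden -> output weights
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    # Forward pass: compute the prediction from the current weights.
    h = sigmoid(X @ W1 + b1)
    pred = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error to get a gradient for each weight.
    err = pred - y
    grad_out = err * pred * (1 - pred)
    grad_hidden = (grad_out @ W2.T) * h * (1 - h)

    # Update: nudge every weight in the direction that reduces the error.
    W2 -= lr * h.T @ grad_out
    b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * X.T @ grad_hidden
    b1 -= lr * grad_hidden.sum(axis=0)

print(np.round(pred.ravel(), 2))  # should approach [0, 1, 1, 0]
```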

Differences between AI, ML, and DL

Now that we have a basic understanding of AI, ML, and DL, let’s take a closer look at the differences between these technologies.

  1. Complexity of Tasks

    • AI systems are designed to perform tasks that typically require human intelligence, such as speech recognition, image recognition, and natural language processing. ML algorithms are designed to identify patterns in data and make predictions based on those patterns. DL algorithms, on the other hand, are designed to learn from large datasets and can perform tasks that are too complex for traditional ML algorithms. For example, DL algorithms can be used to detect fraud in financial transactions, diagnose medical conditions, and even play complex games such as Go.

  2. Type of Learning (see the code sketch after this list)

    • While both ML and DL involve training machines to learn from data, they differ in the type of learning. ML algorithms use supervised, unsupervised, or semi-supervised learning, depending on the problem they are trying to solve. In supervised learning, the algorithm is provided with labeled data, and it learns to make predictions based on that data. In unsupervised learning, the algorithm is provided with unlabeled data, and it learns to identify patterns in the data. In semi-supervised learning, the algorithm is provided with both labeled and unlabeled data. DL algorithms, on the other hand, use a technique called backpropagation to adjust the weights of the nodes in the neural network. This technique involves computing the error between the predicted output and the actual output and then adjusting the weights to minimize that error.

  3. Training Data Size

    • The size of the training data also differs between AI, ML, and DL. AI systems typically require a large amount of data to learn and perform well. ML algorithms can be trained with smaller datasets than AI systems, but still require a significant amount of data. DL algorithms, however, require large datasets to train deep neural networks. The larger the dataset, the better the performance of the DL algorithm.

  4. Hardware Requirements

    • DL algorithms require significant computational power and memory to train deep neural networks. As a result, DL algorithms require specialized hardware such as graphics processing units (GPUs) and tensor processing units (TPUs) to achieve high performance. ML algorithms, on the other hand, can be trained on a standard computer.

  5. Interpretability

    • Another key difference between AI, ML, and DL is interpretability. AI systems are typically rule-based, and the rules can be easily understood by humans. ML algorithms can be more difficult to interpret, as they learn from data and do not necessarily follow explicit rules. DL algorithms are even more difficult to interpret, as deep neural networks can have millions of parameters and can learn complex relationships between the input and output data.
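
To make the learning-type distinction above tangible, here is a small scikit-learn sketch (assuming scikit-learn is available) that trains a supervised classifier on labeled data and an unsupervised clustering model on the same features without labels. The dataset and models are chosen purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Supervised learning: the model sees labels and learns to predict them.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("supervised accuracy:", round(clf.score(X_test, y_test), 3))

# Unsupervised learning: the model sees only the features and finds structure.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)
print("cluster sizes:", [int((km.labels_ == k).sum()) for k in range(3)])
```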

Conclusion

In conclusion, AI, ML, and DL are all related but distinct technologies with unique features, advantages, and disadvantages. AI refers to machines that can perform tasks that typically require human intelligence, while ML involves training machines to learn from data without being explicitly programmed. DL is a subset of ML that involves training deep neural networks. The key differences between these technologies are the complexity of tasks they can perform, the type of learning they use, the size of the training data, the hardware requirements, and the interpretability of the results. As AI, ML, and DL continue to evolve, they will play an increasingly important role in many aspects of our lives, from healthcare to finance to entertainment.

References:

  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
  • Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255-260.
  • Kelleher, J. D., & Tierney, B. (2018). Data science: An introduction. CRC Press.
  • LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
  • McCarthy, J., Minsky, M. L., Rochester, N., & Shannon, C. E. (2006). A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI magazine, 27(4), 12-14.



Digital Twin – exploring the basics

The concept of digital twins is not new; it builds on ideas that have been explored for the last couple of decades. The technology (compute power, data management and analytics, etc.) and the thinking (increasing regulatory and community acceptance of digital approaches to science) have finally hit an inflection point that makes in silico modeling attainable in a cost-effective manner.

What this now unlocks is a new opportunity set in the form of machine-accessible data, as well as integration of data sets and ontologies across the target systems and interactions. The need for a standardized mechanism to make these data available is tied to the FAIR Data work, and it is an important dimension of digital twins.

Digital twins vs. simulations
Although simulations and digital twins both utilize digital models to replicate a system’s various processes, a digital twin is actually a virtual environment, which makes it considerably richer for study. The difference between digital twin and simulation is largely a matter of scale: While a simulation typically studies one particular process, a digital twin can itself run any number of useful simulations in order to study multiple processes.

Source: IBM, What is a Digital Twin

At its heart, the idea of a digital twin is to reproduce a system in a “runnable” computer model. This oversimplifies the idea, but it is a useful construct for thinking about the problem space and the opportunity it presents. If you can take a scientific instrument and fully model it in silico, you can then run data sets through it virtually. This assumes that both the inbound and outbound data are available in a machine-usable format, something that is tied to this work.
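
As a deliberately simplified sketch of that “runnable model” idea, the Python class below stands in for an instrument model that can be run on data in silico and recalibrated against real measurements. The class name, the linear calibration step, and the numbers are illustrative assumptions, not a reference to any particular product or method.

```python
import numpy as np

class InstrumentTwin:
    """Toy digital twin: a runnable, recalibratable model of a measuring instrument."""

    def __init__(self, gain: float = 1.0, offset: float = 0.0):
        self.gain = gain
        self.offset = offset

    def run(self, inputs: np.ndarray) -> np.ndarray:
        """Run data through the virtual instrument instead of the physical one."""
        return self.gain * inputs + self.offset

    def calibrate(self, inputs: np.ndarray, measured: np.ndarray) -> None:
        """Feedback loop: fit the twin to real measurements so it improves over time."""
        self.gain, self.offset = np.polyfit(inputs, measured, deg=1)

# Usage: compare the twin's behavior to real readings, then recalibrate.
twin = InstrumentTwin()
samples = np.linspace(0.0, 10.0, 50)
real_readings = 2.05 * samples + 0.4           # stand-in for physical measurements
twin.calibrate(samples, real_readings)
print(twin.run(np.array([1.0, 5.0, 9.0])))     # virtual runs after calibration
```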

Digital twin is an interdisciplinary research field which includes engineering, computer science, automation and control, and so on. But due to the multidisciplinary nature of the field, it also touches on materials science, communication, operations management, robotics, medicine and other disciplines. A keyword analysis indicates that digital twin, ‘smart manufacturing’, ‘big data’, ‘cyber-physical system’, and ‘digital economy’ are closely related fields.

Source: “Innovations in digital twin research” from Nature Portfolio

The article on nature.com is an interesting piece in that it ties together the many dimensions of this field of research. We can’t think of “Digital Twin” as a single, isolated opportunity; rather, to fully realize the potential, we need to look at it as part of an emerging “virtual capability ecosystem” with applications back to the real world. The value is realized in lower long-term costs and increased innovation, driven by reduced cost and cycle times, accompanied by increased application of AI/ML on these models to gain targeted insights that more sharply focus the bench work.

Track the past and help predict the future of any connected environment

Source: Azure Digital Twins

The ability to create learning models for these Digital Twins will improve the accuracy and usefulness of the models over time, and that feedback loop will be a critical part of design. While the industry is maturing, we are seeing more vendors coming to the table with solutions in this space. One of the interesting things to watch is how we as an industry continue to drive open standards in support of these ideas to avoid the traps of “vendor lock in” that were so prevalent in the past.




Algorithms for decision making: Free book download from MIT

MIT Press has provided a free book, Algorithms for Decision Making. You can download it from MIT Press here, or alternatively it is available from this site if the original link fails.

From the data science website:

The book takes an agent-based approach
An agent is an entity that acts based on observations of its environment. Agents may be physical entities, like humans or robots, or they may be nonphysical entities, such as decision support systems that are implemented entirely in software. The interaction between the agent and the environment follows an observe-act cycle or loop.

  • The agent at time t receives an observation of the environment.
  • Observations are often incomplete or noisy.
  • Based on these observations, the agent then chooses an action a_t through some decision process.
  • This action, such as sounding an alert, may have a nondeterministic effect on the environment.
  • The book focuses on agents that interact intelligently to achieve their objectives over time.
  • Given the past sequence of observations and knowledge about the environment, the agent must choose an action a_t that best achieves its objectives in the presence of various sources of uncertainty, including:
  1. outcome uncertainty, where the effects of our actions are uncertain,
  2. model uncertainty, where our model of the problem is uncertain,
  3. state uncertainty, where the true state of the environment is uncertain, and
  4. interaction uncertainty, where the behavior of the other agents interacting in the environment is uncertain.

The book is organized around these four sources of uncertainty.
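
A minimal sketch of that observe-act loop, with made-up dynamics, might look like the following. The threshold policy, noise model, and action effects are illustrative assumptions, not an algorithm taken from the book.

```python
import random

# Minimal observe-act loop: noisy observations, a simple decision rule,
# and nondeterministic action effects. All dynamics here are made up.
random.seed(1)
true_state = 0.0          # hidden state of the environment (e.g., a temperature drift)

def observe(state: float) -> float:
    """Observations are incomplete and noisy."""
    return state + random.gauss(0.0, 0.5)

def choose_action(observation: float) -> str:
    """A simple policy: act when the observation crosses a threshold."""
    return "sound_alert" if observation > 2.0 else "wait"

def step_environment(state: float, action: str) -> float:
    """Actions have uncertain (nondeterministic) effects on the environment."""
    drift = random.gauss(0.3, 0.1)
    correction = -1.0 if action == "sound_alert" and random.random() < 0.8 else 0.0
    return state + drift + correction

for t in range(10):
    obs = observe(true_state)
    action = choose_action(obs)
    true_state = step_environment(true_state, action)
    print(f"t={t} observation={obs:+.2f} action={action}")
```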

Making decisions in the presence of uncertainty is central to the field of artificial intelligence




Ontologies as applied to FAIR data

(The I in FAIR)

Ontologies provide a common structure to bring disparate data together. For this post I will refer to the definition of ontology from Tom Gruber below (emphasis added by me). Note the last highlighted statement as a critical bit with significant implications for the implementation of systems in support of scientific processes. Having led and survived many data and systems integration efforts over the years, I can say that one of the most challenging aspects is hidden in this last statement. Changing data formats, naming, etc. at the source is often met with almost religious fervor, because changes have wide-ranging implications for linked analysis, and multiple stakeholders have disparate needs or views of the data in question. The idea of an abstraction layer to bring these data together is nothing new, and this approach is a natural evolution in my mind. We are recognizing as an industry that isolated data is useful in context, but far more powerful when shared. To attain that goal, we need a common vocabulary and structure – enter the domain ontologies we can map to.

In the context of computer and information sciences, an ontology defines a set of representational primitives with which to model a domain of knowledge or discourse. The representational primitives are typically classes (or sets), attributes (or properties), and relationships (or relations among class members). The definitions of the representational primitives include information about their meaning and constraints on their logically consistent application. In the context of database systems, ontology can be viewed as a level of abstraction of data models, analogous to hierarchical and relational models, but intended for modeling knowledge about individuals, their attributes, and their relationships to other individuals. Ontologies are typically specified in languages that allow abstraction away from data structures and implementation strategies; in practice, the languages of ontologies are closer in expressive power to first-order logic than languages used to model databases. For this reason, ontologies are said to be at the “semantic” level, whereas database schema are models of data at the “logical” or “physical” level. Due to their independence from lower level data models, ontologies are used for integrating heterogeneous databases, enabling interoperability among disparate systems, and specifying interfaces to independent, knowledge-based services.

https://tomgruber.org/writing/definition-of-ontology.pdf

The ontology provides a navigable structure for the data relationships that is consistent across all sources in scope. This is the critical bit for deriving value from the data: moving it from isolated to interoperable and supporting the rest of the FAIR principles. Access control is another important consideration when joining or sharing data, especially anything that can be used to form a conclusion that may be subject to challenge or reinterpretation absent context. Ontology-based access can be used to support these access controls given the proper structure. While outside the scope of this surface-level post, you can read more from MIT Press Direct here on that topic.

http://www.semantic-web-journal.net/system/files/swj2523.pdf

Mapping these ontologies and related data sets to a graph database, and unlocking the power of the relationship hierarchy inferred through the ontology mapping (secured through the same), provides a rich foundation on which to build a query and interaction layer. There are challenges to be solved throughout this process, and this post only scratches the surface, but it provides some context and links and helps frame a jumping-off point for these ideas, along with connections to papers and resources with “the rest of the story,” as Paul Harvey would say.
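
As a small illustration of that mapping, the sketch below uses the rdflib Python library (assuming it is installed) to express a toy ontology fragment, attach a couple of data records to it, and query across them with SPARQL. The classes, properties, and identifiers are invented for the example rather than drawn from any published domain ontology.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/lab/")
g = Graph()

# Toy ontology fragment: two classes and a relationship between them.
g.add((EX.Sample, RDF.type, RDFS.Class))
g.add((EX.Assay, RDF.type, RDFS.Class))
g.add((EX.measuredBy, RDFS.domain, EX.Sample))
g.add((EX.measuredBy, RDFS.range, EX.Assay))

# Data from two "systems", both mapped to the same ontology terms.
g.add((EX.sample42, RDF.type, EX.Sample))
g.add((EX.sample42, RDFS.label, Literal("Blood sample 42")))
g.add((EX.sample42, EX.measuredBy, EX.assay7))
g.add((EX.assay7, RDF.type, EX.Assay))
g.add((EX.assay7, RDFS.label, Literal("Protein binding assay 7")))

# One query now spans both sources because they share the ontology's vocabulary.
query = """
SELECT ?sampleLabel ?assayLabel WHERE {
    ?s a ex:Sample ; rdfs:label ?sampleLabel ; ex:measuredBy ?a .
    ?a rdfs:label ?assayLabel .
}
"""
for row in g.query(query, initNs={"ex": EX, "rdfs": RDFS}):
    print(row.sampleLabel, "->", row.assayLabel)
```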

References: 

  1. Tom Gruber (2008), Ontology. Entry in the Encyclopedia of Database Systems, Ling Liu and M. Tamer Özsu (Eds.), Springer-Verlag, 2009. https://tomgruber.org/writing/definition-of-ontology
  2. Giancarlo Guizzardi; Ontology, Ontologies and the “I” of FAIR. Data Intelligence 2020; 2 (1-2): 181–191. doi: https://doi.org/10.1162/dint_a_00040
  3. Poveda-Villalón, María & Espinoza-Arias, Paola & Garijo, Daniel & Corcho, Oscar. (2020). Coming to Terms with FAIR Ontologies. https://www.researchgate.net/publication/344042645_Coming_to_Terms_with_FAIR_Ontologies
    1. Direct link in case above link fails
  4. Francesco Beretta, 06/30/2020. A challenge for historical research: making data FAIR using a collaborative ontology management environment (OntoME) http://www.semantic-web-journal.net/content/challenge-historical-research-making-data-fair-using-collaborative-ontology-management-0
    1. Direct link to paper in case above link fails
  5. Christopher Brewster, Barry Nouwt, Stephan Raaijmakers, Jack Verhoosel; Ontology-based Access Control for FAIR Data. Data Intelligence 2020; 2 (1-2): 66–77. doi: https://doi.org/10.1162/dint_a_00029
  6. Tim Berners-Lee (2006, last updated 2009). Linked Data (personal design-issues note). https://www.w3.org/DesignIssues/LinkedData.html



Is your Scientific Data FAIR

For many years, we have seen the proliferation of data as we increasingly instrument our scientific processes. We have developed a diverse landscape of tools and processes, making significant leaps from paper-based documentation, but we have also created a new nightmare of integration and complex analysis. The FAIR principles are a framework to reduce that complexity by making data machine-readable across sources. This unlocks the data from proprietary structures and system walls, and offers a foundation to build interconnected analysis and insights.

Reference this excerpt from the abstract here that summarizes quite nicely what the objective is:

There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals.

https://www.nature.com/articles/sdata201618#Abs1

The FAIR Guiding Principles

To be Findable:

F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource

To be Accessible:

A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available

To be Interoperable:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data

To be Reusable:

R1. meta(data) are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards
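
To show what these principles can look like in practice, here is a small, hypothetical metadata record expressed as a Python dictionary, with comments mapping each field back to the F, A, I, and R items above. The field names and identifiers are invented for illustration; real implementations would follow a community metadata standard.

```python
import json

# Hypothetical dataset metadata record, annotated with the FAIR item it supports.
metadata = {
    "identifier": "https://doi.org/10.1234/example-dataset",     # F1: persistent identifier
    "title": "Example assay results, 2023 batch",                 # F2: rich metadata
    "describes": "https://data.example.org/datasets/assay-2023",  # F3: identifier of the data described
    "registered_in": "https://repository.example.org/search",     # F4: indexed in a searchable resource
    "access_protocol": "https (open, standardized)",              # A1: retrievable by identifier
    "vocabulary": "http://example.org/ontology/assay#",           # I1/I2: shared knowledge representation
    "related_datasets": ["https://doi.org/10.1234/earlier-run"],  # I3: qualified references
    "license": "CC-BY-4.0",                                       # R1.1: clear usage license
    "provenance": {"generated_by": "Assay pipeline v2.1",
                   "date": "2023-06-01"},                         # R1.2: detailed provenance
}
print(json.dumps(metadata, indent=2))
```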

There are an increasing number of resources targeted at supporting the movement to FAIR data, a couple of which are included here to get you started. There is much to cover on this topic, but these links and materials are a start on the conversation.

https://www.pistoiaalliance.org/projects/current-projects/fair-implementation/





Scrum Roles vs Job Titles – Learning in a Digital / Agile Transformation

As we have been taking our organization through an Agile Transformation as part of an overall Digital Transformation, we have been leading extensive training across our business lines and technology teams. One of the most common questions as we discuss Scrum is around job titles and roles. In scrum, there are only a few key roles, and job titles do not apply! There is a good article on the Atlassian site, linked here, that I encourage you to read through. I hit the highlights here, but the article offers a deeper read.

“the essence of Scrum is empiricism, self-organization, and continuous improvement, the three roles give a minimum definition of responsibilities and accountability to allow teams to effectively deliver work.”

The idea of scrum is to keep things simple, contained to a self-organized team, and optimized for continuous delivery. To accomplish this, there are three defined roles, and it is at the intersection of these roles and classic titles or thinking that I see the most questions.

Development team

The development team does not have to consist of software developers. According to the Scrum Guide, the development team can be composed of all kinds of people, including designers, writers, programmers, etc. Titles here have been a source of confusion, as involved leaders ask why a business analyst or business user is called a developer – I hear arguments like “I’m not an IT person” or “I don’t write code, why am I a developer?”. The key point for the developer role is that the person is a team member, focused on delivering value in one or more capacities.

Ideally, in a long-running scrum team, the members will cross-train sufficiently to blur the lines in roles, even within the development team! Unfortunately, in my industry and area, most teams come together for a project and then disband, leading to a higher-than-ideal rate of churn on this topic.

CSM / Scrum Master:

The scrum master is the “glue of the team,” and the challenges in role versus title that I see most often are those that come from traditional PM thinking. I get the question: so who is the PM on this agile project? Or I see the CSM trying to hand out tasks against a project schedule instead of following the collaborative scrum approach. This learning curve is an interesting one to watch, especially in long-time PMI/PMP-trained professionals!

Product Owner

“The product owner should not only understand the customer, but also have a vision for the value the scrum team is delivering to the customer. The product owner also balances the needs of other stakeholders in the organization.”

I see business owners who initially struggle with the idea of the product owner role. They are used to the paradigm of meeting with a business analyst, providing requirements, and waiting for something to be delivered that they then need to spend more money on, because it does not match what was in their head or did not keep pace with changing business needs.

The product owner role is a far more hands-on role with expanded responsibility. The key to communicating and managing this role is establishing real clarity on the value of that high engagement. Regular feedback cycles, direct ownership, and participation in sprint reviews mean change is always an option and surprises are minimal. This means less rework, but to get there, a product owner must understand that they are not the “hands-off sponsor” or business lead who gets a “white glove” engagement. The product owner role is a hands-dirty, learn-as-you-go role, and that is where the most value comes from. This is a real paradigm shift for many, but it pays off in amazing ways.




GDPR compliance steps, and impacts on MDM thinking

The impact of the General Data Protection Regulation (GDPR) is tremendous and cannot be overstated for any business that deals in personal or private data. In large enterprises, the subject of Master Data Management is often a third rail, considered a sure path to career death due to the complexity of the topic. Done properly, the benefits are tremendous, but ownership of information at the functional level is so often a barrier to success that it has become generally understood that these efforts are rarely fully successful.

Now we enter the world of GDPR. The regulation in principle is simple. It can be generalised to say: If you collect any personal information (broadly defined) you must make the information owner aware of what you collect, what you do with it and how to get that information back or purge it.

While it sounds simple, in application the complexity of the average enterprise information ecosystem makes this horrendously challenging. In the pharmaceutical industry where I currently work, patient information is collected in the course of a number of activities. As a part of that, various systems and reports are impacted, and the process gets exponentially more complicated when biospecimens (samples of tissue, blood, etc.) are concerned. All samples must have consent for use documented, and traceability through the full lifecycle. When data is “blinded” or anonymized, this adds even more complexity, as the ramifications of “unblinding” data are significant to the integrity of the overall trial process. Beyond patient data, there is data on clinicians, research organizations, collaborators and more.

The recommendations I have made, and that I suggest to clients, include the following:

  1. A comprehensive audit of all Personally Identifiable Information collected or generated in the course of business. This should result in a list of data attributes, source of collection, systems and processes impacted
  2. For all data collected, where mastering or simplification of process is feasible, a plan must be defined and should be mapped into an overall value stream analysis for inclusion in a portfolio of work focused on GDPR compliance.
  3. All data, once mapped, must be included in a reference that allows comprehensive aggregation of the record, unifying the person into a single view (a minimal sketch of such a record follows this list). This requirement is where the value of mastering rises to the top. Without the ability to view a person as a single record, the next step is exponentially more challenging.
  4. The Data Privacy Officer for the company and if applicable, related functions, must be pulled in to evaluate the plans and approach, supplementing the data record or approach as needed.
  5. The next step in compliance once the data is clearly mapped and understood is an evaluation of the impact of “providing and purging” as each data owner has the right to request at any time a comprehensive export of all data collected about them. Correspondingly, they can also request that all personal data be purged. This becomes particularly challenging when data has cross dependencies and is a part of a larger analysis.
  6. The evaluation of the provide and purge process must result in a documented program and process that answers three questions: what is the impact of a request, how will we mitigate the risk, and what needs to change about our processes to accommodate these needs? This starts with the evaluation, moves to a documented plan, and then must be accompanied by, at a minimum, a tabletop exercise of the multiple scenarios identified.
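
As a minimal sketch of the kind of unified record that steps 1 and 3 imply, the Python snippet below shows one illustrative way to represent PII attributes and the systems they touch, and to answer a “provide” or “purge” request from that single view. The field names, systems, and methods are invented for illustration, not a prescribed GDPR schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PIIAttribute:
    """One personally identifiable attribute traced through the enterprise."""
    name: str                 # e.g., "date_of_birth"
    collected_via: str        # source of collection (form, device, trial site, ...)
    systems: List[str]        # every system or report that stores or uses it

@dataclass
class PersonRecord:
    """Unified view of a person across systems, supporting provide/purge requests."""
    master_id: str
    attributes: List[PIIAttribute] = field(default_factory=list)

    def provide(self) -> dict:
        """Export everything held about the person (the 'provide' obligation)."""
        return {a.name: {"collected_via": a.collected_via, "systems": a.systems}
                for a in self.attributes}

    def purge_targets(self) -> List[str]:
        """List every system that must act on a deletion request."""
        return sorted({s for a in self.attributes for s in a.systems})

record = PersonRecord("patient-00042", [
    PIIAttribute("date_of_birth", "trial enrollment form", ["CTMS", "safety DB"]),
    PIIAttribute("blood_sample_id", "clinic visit", ["LIMS", "biobank registry"]),
])
print(record.provide())
print(record.purge_targets())
```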

This is a costly effort in both time and resources. Enterprise leaders must weigh the risk of non-compliance against the cost of compliance.