AI Monitoring Needs are Evolving. It’s Time for Our Tools to Catch Up

April 9, 2026

AI monitoring is in the midst of a paradigm shift. As systems become more complex, networked and agentic, it becomes increasingly important to stress test them in real-world contexts, not just in controlled environments. Recent high-profile incidents underscore the stakes of insufficient post-deployment monitoring.

But despite strong appetite across government, academia and industry, the field of AI post-deployment monitoring remains fragmented. Illustratively, a recent review found that just nine percent of FDA-approved AI healthcare tools had a plan in place for post-deployment monitoring.

Why hasn’t AI post-deployment monitoring in practice kept pace with demand?

A new report from the National Institute of Standards and Technology (NIST) offers insights into what holds teams back from undertaking post-deployment monitoring, offering a crucial blueprint for how industry can fill the gaps. 

The Case for Post-Deployment Monitoring

Monitoring during the AI model and application development stage is a widely accepted prerequisite. It enables developers to evaluate a model’s capabilities, operational shortcomings and risks, allowing for product improvements and fine-tuning before release. It also shapes a system’s AI Bill of Materials (AIBOM), a growing compliance expectation.

Though crucial, pre-deployment monitoring may not sufficiently capture how AI systems behave in practice. Pre-release testing is typically done under controlled conditions that do not perfectly mirror real-world environments, even in the most sophisticated simulations. Moreover, models are non-deterministic: they can produce different results even when the same set of inputs is applied. And their behavior can evolve over time as input data, context and relationships change, producing model drift and performance degradation.

Post-deployment monitoring can be an effective complement to model-stage testing. Post-market activities like field tests and continuous monitoring are able to detect drift, trace unwanted agent behavior, and uncover other deviations from results derived in air-gapped, in silico testing environments. This fosters continuous iteration, helping systems improve across everything from operational efficiency and output quality to data and agent security.
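The drift detection described above can be sketched with a simple statistical comparison between pre-deployment data and live traffic. The example below is a minimal, hypothetical illustration using only the Python standard library; the feature values, the mean-shift test, and the 4-sigma threshold are assumptions for demonstration, not methodology from the NIST report.

```python
# Minimal sketch of continuous drift monitoring on one numeric model
# input: compare live traffic against a pre-deployment baseline using a
# two-sample z-test on the mean. The threshold is an illustrative choice.
import math
import random
import statistics

def mean_shift_sigma(baseline: list[float], live: list[float]) -> float:
    """Standardized difference between baseline and live sample means."""
    standard_error = math.sqrt(
        statistics.variance(baseline) / len(baseline)
        + statistics.variance(live) / len(live)
    )
    return abs(statistics.mean(live) - statistics.mean(baseline)) / standard_error

def drifted(baseline: list[float], live: list[float], threshold: float = 4.0) -> bool:
    """Flag drift when the live mean departs from baseline by > threshold sigma."""
    return mean_shift_sigma(baseline, live) > threshold

rng = random.Random(0)
baseline = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # pre-release test data
stable = [rng.gauss(0.0, 1.0) for _ in range(5000)]    # live traffic, unchanged
shifted = [rng.gauss(0.5, 1.0) for _ in range(5000)]   # live traffic after drift

print(drifted(baseline, stable))   # False: same underlying distribution
print(drifted(baseline, shifted))  # True: the input distribution has moved
```

In a production setting this check would run continuously over sliding windows of traffic, per feature, with the alert threshold tuned to the tolerance for false alarms.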

If the AI systems of today and tomorrow present a clear need for post-deployment monitoring, why is so much of AI evaluation still concentrated in the pre-deployment stage? 

To explore this issue, NIST recently convened a series of workshops with government experts, members of its AI Consortium and other stakeholders, and consulted the existing literature. The resulting report surfaces a number of key factors inhibiting post-deployment monitoring.

A few key barriers and takeaways stand out:

Barrier: System Complexity

Today’s AI systems are distributed, complex and malleable, making post-deployment monitoring especially difficult to undertake. NIST’s participants highlighted complicating factors that inhibit post-market assessment, including: 

  • The ability for users to fine-tune open-weight models downstream from the application layer. 
  • A distributed architecture that makes it difficult to aggregate monitoring data and logs.
  • The growing proliferation of agentic AI, particularly those lacking agent identifiers.
  • The use of non-sanctioned AI tools in organizational contexts (Shadow AI). 
  • Fast-evolving technological developments and changes in use cases that require frequent recalibration of monitoring tools. 
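Two of the complicating factors above, distributed architectures that scatter monitoring logs and agents that lack identifiers, can be partially addressed by having every component emit events in one shared structured schema. The sketch below is a hypothetical illustration of that idea; the field names and event types are invented for this example, not drawn from the NIST report or any standard.

```python
# Hypothetical shared schema for monitoring events emitted by the
# distributed components of an AI system (model servers, tool-calling
# agents, retrieval services), so logs can be aggregated centrally and
# traced back to a specific agent. All field names are illustrative.
import json
from dataclasses import dataclass, asdict

@dataclass
class MonitoringEvent:
    timestamp: str   # ISO 8601, UTC
    component: str   # e.g. "model-server", "tool-runner"
    agent_id: str    # stable identifier for the acting agent
    event_type: str  # e.g. "inference", "tool_call", "guardrail_block"
    detail: dict     # free-form, component-specific payload

def to_log_line(event: MonitoringEvent) -> str:
    """Serialize one event as a JSON line for a central aggregator."""
    return json.dumps(asdict(event), sort_keys=True)

def events_for_agent(log_lines: list[str], agent_id: str) -> list[dict]:
    """Reassemble one agent's activity from mixed component logs."""
    parsed = (json.loads(line) for line in log_lines)
    return [event for event in parsed if event["agent_id"] == agent_id]

log = [
    to_log_line(MonitoringEvent("2026-04-09T12:00:00Z", "model-server",
                                "agent-7", "inference", {"latency_ms": 84})),
    to_log_line(MonitoringEvent("2026-04-09T12:00:01Z", "tool-runner",
                                "agent-7", "tool_call", {"tool": "search"})),
    to_log_line(MonitoringEvent("2026-04-09T12:00:02Z", "model-server",
                                "agent-3", "inference", {"latency_ms": 91})),
]
print(len(events_for_agent(log, "agent-7")))  # 2
```

The design choice here is simply that aggregation and agent attribution become trivial once every component shares one schema; without an `agent_id`-style field, an agent's actions cannot be reconstructed from distributed logs at all.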

As NIST workshop participants note, this complexity isn’t just logistically challenging; it also has significant cost implications across compute, personnel, training and management. 

Industry Takeaway

Modern AI systems may increasingly be built with post-deployment monitoring specifically in mind. The NIST report envisions a sort of “monitorability tax” requiring more considered approaches to model development that enable longer-term monitoring. This could widen the market space for continuous monitoring tools and field studies undertaken post-deployment.

Barrier: Lack of Standardization and Interoperable Tools

As compliance obligations for model developers expand under regulations like the EU AI Act, pre-deployment monitoring has evolved into a structured, standards-based practice. Post-deployment monitoring does not yet enjoy the same structural backing. NIST participants highlighted a dearth of appropriate standards for post-deployment monitoring and conflicting advice in the standards that do exist (the report cites, for example, the differences between the ISO and EU definitions of “AI system”).

Participants raised questions about what to monitor, what measurements to use, and what incidents to report, particularly as purpose-built agentic AI expands and use cases become more narrow. They acknowledged that many third-party and open source aftermarket tools for post-deployment monitoring exist, but expressed caution about how these tools are vetted and against what criteria they are assessed.

Industry Takeaway

Expect the post-deployment monitoring standards gap to close. NIST convening key stakeholders around this topic is a promising first step, and the report is already driving important public discourse. Combined with NIST’s recent focus on agentic AI and references to post-market assessments in the AI Risk Management Framework (AI RMF), it is clear that post-deployment monitoring is an emerging priority for the standards body. 

It is likely that further guidance from NIST will follow, whether via updates within the AI RMF or as separate standards. Industry would do well to engage NIST in the coming months in anticipation of such a push to shape standards around shared language and key methodologies. Conforming to the resulting NIST guidance will serve as an important trust marker for firms moving forward. 

Barrier: Availability-Needs Mismatch

The NIST report coded workshop comments into six monitoring categories (Functionality, Operational, Human Factors, Security, Compliance and Large-Scale Impacts) and compared the distribution of comments to each topic’s prevalence in the existing literature. This allowed the authors to uncover differences between research saturation and real-world priorities.

Security monitoring was the most heavily covered category in the literature, whereas human factors drew the largest share of participant quotes in the workshops. Workshop discussants placed far more emphasis on monitoring for functionality and human factors than on security and compliance.

Industry Takeaway

The differences in academic versus practitioner priorities suggest that there is room for further collaboration between public and private research bodies to better align with industry needs. Beyond this, the results indicate that practitioners seek AI monitoring tools for reasons beyond just security and compliance. Organizations in this field should emphasize their ability to provide a holistic picture of full system operations, including human-AI feedback loops and much harder-to-parse user characteristics like intent and perception. Effective monitoring tools will gauge human experiences without generating user fatigue or friction.

As NIST’s new report indicates, there is strong consensus that post-deployment monitoring is essential as our AI systems expand and advance, but formidable hurdles remain. Strong partnerships between industry and standards bodies like NIST can help narrow the gaps and ensure that AI monitoring tools are fit for the changing AI landscape. 

Interested in engaging NIST on this topic alongside like-minded organizations? We invite you to get in touch and explore opportunities for engagement as part of the Open Policy coalition.

Don’t just watch policy happen.

Understand it. Act on it. Build with it.

Request a Demo