Merck and Mastercard are seeing real agentic AI results. Both say the plumbing came first.
Source: VentureBeat
Merck’s AI‑Driven Acceleration in Drug Discovery & Marketing
Merck is leveraging AI agents to cut drug‑discovery cycles by one‑third and to ship compliant marketing materials up to 80 % faster. According to VP of Digital Platforms Sean Finnerty, the success hinges on building the right infrastructure first.
Early Results
- Marketing compliance
  - AI‑generated drafts are ≈ 99 % correct on compliance.
  - Review cycles have shrunk from months to days.
  - Delivery speed is up 70 %–80 %.
- Drug discovery
  - An AI‑assisted discovery cycle was reduced by 33 %.
Key Takeaway
“If we do one‑offs, we’re gonna end up with thousands and thousands of things that are ultimately just gonna be debt that we’ll have to deal with later. And that’s gonna be a drag on any further innovation.”
— Sean Finnerty, VP of Digital Platforms, speaking at the AI Impact Series event.
Bottom Line
- Infrastructure first: Building the underlying “plumbing” is essential for scalable, sustainable AI adoption.
- Avoid one‑off projects: Consolidated, reusable platforms prevent technical debt and keep innovation moving forward.
Starting with the Plumbing
Merck’s plumbing‑first strategy stems from lessons learned during the early days of cloud in the 2010s—“when nobody knew what the heck was going on,” Finnerty recalled.
Why the Infrastructure Matters
- Scale – The foundation now supports:
  - 2,500 AWS accounts
  - Numerous Microsoft Azure subscriptions
  - New Google Cloud Platform (GCP) integrations
- Future‑proofing – “AI is gonna be the same exact thing,” Finnerty warned. “We’ll have thousands and thousands of agents.”
This raises critical questions:
- How do we register them?
- How do we secure them?
- How do we ensure they’re connected to the right tools, have access to the right data, and receive the right context?
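One way to picture the registration and access questions above is a central registry that records each agent and answers "may this agent use this tool or read this dataset?" The sketch below is only an illustration: `AgentRecord`, `AgentRegistry`, and every field and agent name are hypothetical and are not taken from Merck's platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRecord:
    """Hypothetical registry entry; all field names are illustrative."""
    agent_id: str
    owner: str
    allowed_tools: frozenset = frozenset()
    allowed_datasets: frozenset = frozenset()


class AgentRegistry:
    """In-memory stand-in for a central agent registry (assumption)."""

    def __init__(self):
        self._agents = {}

    def register(self, record):
        # Refuse duplicate IDs so each agent has exactly one record.
        if record.agent_id in self._agents:
            raise ValueError(f"agent {record.agent_id!r} is already registered")
        self._agents[record.agent_id] = record

    def can_use_tool(self, agent_id, tool):
        # Unregistered agents are denied by default.
        record = self._agents.get(agent_id)
        return record is not None and tool in record.allowed_tools

    def can_read(self, agent_id, dataset):
        record = self._agents.get(agent_id)
        return record is not None and dataset in record.allowed_datasets


registry = AgentRegistry()
registry.register(
    AgentRecord(
        agent_id="compliance-drafter",        # made-up agent name
        owner="marketing-platform",           # made-up owning team
        allowed_tools=frozenset({"doc_search"}),
        allowed_datasets=frozenset({"approved_claims"}),
    )
)
print(registry.can_use_tool("compliance-drafter", "doc_search"))
print(registry.can_use_tool("compliance-drafter", "deploy"))
```

A deny-by-default lookup keeps the security question simple: an agent that was never registered cannot reach any tool or dataset.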
Context Delivery at Scale
- Merck works with three hyperscalers, operates 47 edge locations, and manages hundreds of databases.
- “Many, many petabytes” of structured and unstructured data reside in:
  - Oracle databases
  - SQL databases
  - Excel spreadsheets
  - Phone transcripts
  - Other repositories
Building the Scaffolding
Finnerty’s team is creating a flexible scaffolding to deliver meaningful context in various situations. The data pipeline must be organized and ingested into the appropriate platform because “there’s no one solution to solve every single problem.”
| Situation | Platform(s) Used |
|---|---|
| General analytics | Databricks |
| Data warehousing | Amazon Redshift |
| Other needs | Four additional solutions (unspecified) |
The End Goal
“Let’s make that easy and frictionless for people to do, secure it, and ensure it’s well integrated with MCP (Model Context Protocol), A2A (Agent‑to‑Agent), and upstream compute,” Finnerty said.
“If you want to run stuff on GCP or AWS, we’ve got the plumbing in place so you can run your adjacent workloads wherever you want.”
Key takeaways:
- Infrastructure first enables rapid, secure AI expansion.
- A modular, multi‑cloud approach ensures workloads can run wherever they’re most efficient.
- Context delivery is a continuous, data‑driven effort that must adapt to diverse tools and data sources.
How Merck Is Using Agents
Merck is experimenting with AI agents across three main areas:
- Regulated enterprise operations
- Scientific‑discovery workflows
- Application modernization
Accelerating Drug Discovery
- Current challenge: Scientists must evaluate molecular structures and disease states to decide if a condition is druggable. Even when a target is known, developing a drug can take years.
- AI impact:
  - Teams are seeing “very promising things,” such as cutting a research cycle by one‑third.
  - “That’s a year off of the life of the discovery cycle,” says Finnerty. “Theoretically, we can get it to a patient who needs that therapy a year faster.”
Regulated Marketing & Compliance
- Once a product is approved, all marketing materials must be clearly and explicitly articulated for each market, country, state, or region.
- Historically, humans performed due‑diligence reviews, leading to multiple iteration cycles and long delays.
- AI‑driven workflow:
  - Shifts from “human‑in‑the‑loop” to a human‑as‑governor model.
  - Generates a first draft in days or a week that is ~99 % complete.
  - Enables teams to ship compliant materials up to 80 % faster.
App Modernization
AI agents can now:
- Discover architecture and document data interactions, APIs, network paths, authentication, and authorization.
- Write infrastructure‑as‑code (e.g., Terraform) for deployment.
- Refactor code (e.g., convert JavaScript to Python).
“Where the company would have previously spent weeks, months, and hundreds of thousands of dollars to update one application, agents are now handling the work through prompts.” – Finnerty
Bottom line: By embedding AI agents throughout its pipeline, Merck is shortening drug‑discovery timelines, streamlining regulated communications, and dramatically reducing the cost and time required for application modernization.
Running into “Wackiness”
Finnerty acknowledges that his team has run into significant challenges, especially with automated code and scenario testing. The AI has sometimes fabricated scenarios, whether because of incorrect context, infrastructure issues, or simply “getting creative,” for example by suggesting tests for functions that don’t exist in the codebase.
“That surprised me a little bit because I thought we were further past some of the hallucination challenges in these later models,” he said.
Mitigation Strategies
To curb hallucinations, the team has implemented guardrails that essentially use AI‑to‑AI supervision and confidence scoring:
- Initial Generation – Claude produces the first output.
- Cross‑Check – Microsoft Copilot evaluates Claude’s result.
- Iterative Confirmation – The same query is asked a third time; confidence scores increase with each pass, reducing early‑run “garbage.”
“So if you ask something once, have AI check it, then ask it a third time, the confidence increases every time, and it minimizes some of the garbage that gets created in the early runs,” Finnerty explained.
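The generate, cross-check, and re-ask pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's actual pipeline: `generator` and `reviewer` are hypothetical callables standing in for the two models, exact string equality is a crude stand-in for "the answers agree," and the confidence value is simply the share of checks that passed.

```python
from typing import Callable, Tuple


def supervised_answer(
    prompt: str,
    generator: Callable[[str], str],
    reviewer: Callable[[str, str], bool],
    repeats: int = 1,
) -> Tuple[str, float]:
    """Return the first answer plus the fraction of checks it passed."""
    first = generator(prompt)
    # Pass 2: a second model reviews the first model's answer.
    checks = [reviewer(prompt, first)]
    # Pass 3 onward: ask again and see whether the answer is stable.
    for _ in range(repeats):
        checks.append(generator(prompt) == first)
    return first, sum(checks) / len(checks)


# Stub models so the sketch runs without any external service.
def stub_generator(prompt: str) -> str:
    return "4"


def stub_reviewer(prompt: str, answer: str) -> bool:
    return answer.isdigit()


answer, confidence = supervised_answer("What is 2 + 2?", stub_generator, stub_reviewer)
print(answer, confidence)  # 4 1.0
```

In practice the comparison step would need something more forgiving than string equality, and a low score would route the output to a human rather than silently discard it.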
Use Cases for Agentic AI in Financial Services
Context: Mastercard’s Chargeback & Dispute Workflow
Mastercard’s Chief Data Officer, Andrew Reiskind, and his team are experimenting with agentic AI to streamline the highly orchestrated transaction‑and‑dispute process. A chargeback or fraud dispute is not a single event; it triggers a cascade of back‑office activities that are traditionally labor‑intensive.
“When a consumer disputes a charge (typically online), that kicks off an entire other process on the back‑end that tends to be very labor‑intensive.” – Andrew Reiskind
Key Steps in the Current Workflow
- Consumer initiates dispute – often via an online portal.
- Mastercard collects dispute specifics (e.g., reason code, transaction details).
- Merchant conducts its own investigation:
  - Was the card reported lost or stolen?
  - Does the consumer have a history of frequent disputes?
- Network (Mastercard) applies its rules for timing, data submission, and escalation.
“You have each and every one of these steps, many of which are unstructured, but there are also structured data elements to this.” – Reiskind
- Structured data – e.g., “card reported lost/stolen.”
- Unstructured data – e.g., free‑form consumer complaint, which may be of “questionable reliability.”
Consequently, the decision‑making system must handle both deterministic (rule‑based) and probabilistic (AI‑driven) decisions.
Why Agentic AI?
| Challenge | How Agentic AI Helps |
|---|---|
| Task allocation – deciding which steps to automate vs. keep human‑in‑the‑loop | AI agents can be programmed to take on routine, high‑volume tasks (e.g., data extraction, initial triage). |
| Human hand‑off timing – determining when an agent should defer to a human rep | Agents can monitor confidence scores and trigger escalation when uncertainty exceeds a threshold. |
| Scalability & cost – managing the number of agents and associated expenses | Dynamic provisioning of agents based on workload reduces idle capacity and operational cost. |
| Reputational risk – avoiding false accusations (e.g., calling a truthful consumer a liar) | Probabilistic models provide calibrated risk scores, enabling more nuanced, evidence‑based decisions. |
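
The rows above combine a deterministic rule, a confidence threshold for hand-off, and a safeguard against wrongly accusing a consumer. The sketch below shows one way those three ideas could fit into a single routing function. Everything in it is hypothetical: the `Dispute` fields, the 0.9 threshold, and the route labels are illustrative assumptions, not Mastercard's rules.

```python
from dataclasses import dataclass


@dataclass
class Dispute:
    card_reported_stolen: bool      # structured signal
    model_confidence: float         # hypothetical model confidence in [0, 1]
    model_recommends_denial: bool   # hypothetical model recommendation


def route(dispute, min_confidence=0.9):
    """Decide who handles a dispute: a fixed rule, an agent, or a human."""
    # Deterministic step: structured data settles the path without a model.
    if dispute.card_reported_stolen:
        return "rule_based"
    # Probabilistic step: hand off when the model is too uncertain.
    if dispute.model_confidence < min_confidence:
        return "human"
    # Reputational safeguard: an agent never rejects a consumer on its own.
    if dispute.model_recommends_denial:
        return "human"
    return "agent"


print(route(Dispute(True, 0.20, False)))    # rule_based
print(route(Dispute(False, 0.50, False)))   # human
print(route(Dispute(False, 0.95, True)))    # human
print(route(Dispute(False, 0.95, False)))   # agent
```

Ordering the checks this way means the agent only acts alone on cases that are both high-confidence and favorable to the consumer, which is the low-risk end of the workload.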
Core Questions to Address
- Task delegation: Which specific tasks are handed off to AI agents?
- Escalation criteria: When should agents return the case to a human representative?
- Agent fleet size: How many agents are needed to meet service‑level expectations?
- Cost implications: What are the operational cost savings versus the expense of deploying agents?
- Reputational safeguards: How do we ensure the system does not unfairly label honest consumers as deceptive?
“It’s an exact problem where you want to, as a bank, maintain trust with your consumer,” Reiskind said. “But you also wanna make this efficient and take costs out of the system.”
Takeaway
Agentic AI offers a promising path to accelerate dispute resolution, reduce manual effort, and preserve consumer trust—provided that the design carefully balances automation with human oversight and cost considerations.
The PB&J versus Turkey Mistake: Determining Acceptable Risks
There’s always going to be risk with AI, and enterprises should assess it from the beginning of product design, says Reiskind. The key question is: what level of risk is acceptable?
Illustrative Example
- Minor inconvenience: Serving a customer a peanut butter and jelly sandwich instead of a turkey sandwich.
- Serious consequence: Serving gluten to someone with celiac disease.
“Is it an acceptable risk if one percent of the time it makes the mistake? If it is, let’s move to the next stage of how you’re mitigating that risk,” Reiskind explains.
How Leaders Should Approach the Problem
- Perform a cost‑benefit analysis.
- Break problems down into their constituent pieces.
- Calculate the cost for each piece.
“These are estimates; it’s near‑impossible to forecast real usage,” Reiskind notes.
“It is not a simple process to get to the cost, but it is doable.”
By systematically evaluating both the likelihood and impact of AI errors, organizations can decide which risks are tolerable and how best to mitigate them.
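The "break it into pieces and cost each piece" advice can be turned into a rough calculation: for each step, add the running cost to the expected cost of mistakes (error rate times impact). The sketch below is only an illustration. The `Step` fields are hypothetical and every number is made up, chosen to echo the sandwich example, where the same one percent error rate carries very different consequences.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    cost_per_run: float     # estimated cost of running the step once
    error_rate: float       # estimated probability of a mistake per run
    cost_per_error: float   # estimated impact of a single mistake


def expected_cost(step, runs):
    """Running cost plus expected error cost for one step over `runs` runs."""
    return runs * (step.cost_per_run + step.error_rate * step.cost_per_error)


def total_expected_cost(steps, runs):
    """Sum of the per-step expected costs."""
    return sum(expected_cost(step, runs) for step in steps)


# Made-up figures: same 1% error rate, very different impact per mistake.
steps = [
    Step("wrong_sandwich", cost_per_run=0.02, error_rate=0.01, cost_per_error=5.0),
    Step("allergen_served", cost_per_run=0.05, error_rate=0.01, cost_per_error=50.0),
]

for step in steps:
    print(step.name, round(expected_cost(step, runs=1000), 2))
print("total", round(total_expected_cost(steps, runs=1000), 2))
```

Because real usage is hard to forecast, the inputs are estimates; the value of the breakdown is in showing which step dominates the risk, not in the exact total.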