The numbers are staggering. Tech giants Microsoft, Meta, Amazon, and Alphabet are on track to spend nearly $400 billion on AI-focused capital expenditures in 2025, with Morgan Stanley projecting an additional $2.9 trillion in AI-related investment between 2025 and 2028. Yet beneath this unprecedented spending lies a troubling reality: most AI implementations are struggling to deliver results, and investment banking—one of the industry’s most coveted automation targets—has proven more resistant to disruption than expected.
The Great AI Implementation Challenge
While tech companies build massive data centers at breakneck speed, a recent MIT study found that approximately 95% of generative AI pilot programs struggle to achieve rapid revenue acceleration, with many delivering little to no measurable impact on profit and loss. The research, based on 150 interviews with leaders, a survey of 350 employees, and analysis of 300 public AI deployments, reveals a reality that contrasts sharply with the hype.
The core issue isn’t the quality of AI models but a “learning gap”: companies simply don’t know how to integrate AI tools effectively into their existing workflows. While executives often blame regulation or model performance, the MIT researchers found that the real problem lies in flawed enterprise integration.
Investment Banking: AI’s Toughest Challenge
The financial services industry presents unique obstacles that make it particularly difficult to automate. Despite OpenAI assembling a team of over 100 former investment bankers under Project Mercury, paying them $150 per hour to train AI on building financial models for transactions including IPOs and restructurings, significant barriers persist.
The core challenge: investment banking work consists of long chains of interdependent steps, each requiring validation before proceeding to the next. Humans naturally check their work after each step—verifying assumptions, cross-referencing numbers, and catching errors before they propagate. Current AI systems struggle with this critical self-verification capability, which can allow small mistakes to compound into significant problems across multi-step workflows.
The Compounding Error Problem: Why AI Struggles to “Check Its Work”
Banking remains “a zero-defect game where spotting errors in a deck before a client meeting is a prized skill,” with senior bankers having no tolerance for defects in client materials. This is where AI’s current limitations become particularly challenging.
The critical difference isn’t just that AI makes mistakes—it’s that AI typically doesn’t check its work after each step the way humans do. When a junior analyst builds a financial model, they constantly validate their work: Does this revenue growth assumption make sense? Do the numbers tie out? Is this discount rate appropriate for this industry? Each step gets verified before moving to the next.
Current AI systems, by contrast, tend to execute entire multi-step workflows without this critical self-checking mechanism. A misplaced decimal in step 2 can flow unchecked into step 3, where it produces an incorrect valuation multiple. That flawed multiple feeds into step 4’s comparable company analysis. By step 10, the financial model may have diverged significantly from reality—but the AI continued confidently through all steps without pausing to verify.
This compounding error dynamic is especially problematic because generative AI outputs must often be checked for hallucinations—confident fabrications that cannot be grounded in real-world data. In investment banking, where even standard large language models sometimes hallucinate when handling financial tasks, a single unchecked error at the beginning of a workflow can cascade into serious problems by the end.
Consider a DCF model: If the AI miscalculates free cash flow in year one and doesn’t catch it, that error propagates through the entire forecast period, potentially producing an incorrect enterprise value. A human analyst would likely notice the cash flow number seems off and double-check the calculation before proceeding. Current AI systems typically just keep computing, compounding the error with each subsequent step.
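A small sketch makes the propagation concrete. The Python below, with invented figures purely for illustration, shows how a 10% slip in year-one free cash flow flows through every discounted year and the terminal value:

    # Minimal DCF sketch (invented figures): a single bad year-1 free cash flow
    # silently propagates through every later year and the terminal value.

    def dcf_enterprise_value(fcf_year1, growth=0.05, wacc=0.10,
                             years=5, terminal_growth=0.02):
        pv, fcf = 0.0, fcf_year1
        for t in range(1, years + 1):
            pv += fcf / (1 + wacc) ** t       # present value of year-t cash flow
            if t < years:
                fcf *= 1 + growth             # each year builds on the prior one
        # Gordon-growth terminal value, discounted back to today
        terminal = fcf * (1 + terminal_growth) / (wacc - terminal_growth)
        return pv + terminal / (1 + wacc) ** years

    correct = dcf_enterprise_value(100.0)     # the intended year-1 FCF
    flawed = dcf_enterprise_value(110.0)      # a 10% slip in step one, never caught
    print(f"correct EV: {correct:,.1f}")
    print(f"flawed EV:  {flawed:,.1f} ({flawed / correct - 1:.0%} off)")

Because every later year is derived from year one, the enterprise value ends up off by the same 10%, and nothing in the calculation itself ever flags the mistake.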
The Non-Deterministic Output Challenge
Unlike traditional software that produces consistent results, generative AI can produce different outputs each time you run it—a significant challenge for financial work that demands precision and reproducibility. Investment bankers need to produce identical valuations when running the same analysis, create standardized documentation that adheres to strict regulatory requirements, and maintain audit trails that can be verified years later.
This non-deterministic nature creates difficulties for high-stakes financial work where consistency and accuracy are paramount. Building reliable financial recommendations requires systems that produce the same results given the same inputs—a characteristic that current generative AI architectures don’t guarantee.
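The effect is easy to demonstrate even without a real model. In the toy sampler below, a fixed probability table stands in for an LLM, which behaves analogously when sampling with a temperature above zero: repeated runs yield different completions, while greedy decoding at temperature zero is reproducible.

    import random

    # Toy next-token sampler: a fixed probability table stands in for an LLM.
    # With temperature > 0, outputs differ run to run; at temperature 0,
    # greedy decoding makes them reproducible.

    NEXT = {"revenue": [("grows", 0.5), ("declines", 0.3), ("stalls", 0.2)]}

    def sample_next(word, temperature):
        tokens, probs = zip(*NEXT[word])
        if temperature == 0:
            return tokens[probs.index(max(probs))]    # deterministic argmax
        weights = [p ** (1 / temperature) for p in probs]
        return random.choices(tokens, weights=weights)[0]

    print([sample_next("revenue", 1.0) for _ in range(5)])  # varies across runs
    print([sample_next("revenue", 0) for _ in range(5)])    # identical every run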
The Data Annotation Industry: A Billion-Dollar Solution Attempt
The AI industry’s response to these challenges reveals the scale of the difficulty. Companies like Surge AI have built substantial businesses by recruiting specialized domain experts—medical doctors earning $200-500 per hour, lawyers at $150-400 per hour, software engineers at $100-300 per hour—to annotate training data.
By 2024, Surge had built a network of over 50,000 domain experts, making it one of the most successful data labeling companies in the world with revenue approaching $1.2 billion. The fact that AI companies are willing to pay such premium rates for human expertise underscores a key limitation: current AI systems require thousands of carefully labeled examples to learn what human analysts often grasp from a single mistake.
Human Learning vs. Machine Learning: The Verification Advantage
The contrast between how humans and machines work reveals why investment banking has proven resistant to automation. It’s not just about learning from fewer examples—it’s about how humans approach multi-step tasks fundamentally differently.
A first-year analyst who miscalculates a discount rate learns immediately from the correction and applies that lesson across all future valuations. But more importantly, they develop a natural verification instinct: they pause after each calculation to ask “Does this make sense?” They tie out their numbers. They cross-check assumptions against market data. They catch errors before they cascade.
This step-by-step validation happens naturally in human cognition. When building a leveraged buyout model, an analyst doesn’t just execute 47 sequential calculations—they verify after steps 1, 5, 10, 15, and constantly spot-check throughout. If something looks wrong at step 12, they stop and fix it before it affects steps 13 through 47.
Current AI systems don’t yet have this verification mechanism built in. While few-shot learning methods have been developed to address data scarcity, allowing models to learn from fewer examples, today’s AI architectures haven’t fully solved the step-verification challenge. Traditional supervised learning requires thousands of labeled examples and still produces systems that tend to execute workflows start-to-finish without the natural checkpoints that prevent error compounding.
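What such step-level verification might look like in software is straightforward to sketch. The hypothetical pipeline below (names and thresholds are illustrative, not drawn from any real system) validates each step’s output before the next step is allowed to run, which is exactly the checkpoint behavior the analyst applies by instinct:

    # Sketch of the checkpoint pattern: each step's output is validated before
    # the next step may consume it. Names and thresholds are hypothetical.

    def forecast_revenue(state):
        state["revenue"] = state["base_revenue"] * (1 + state["growth"])
        return state

    def check_revenue(state):
        # the human-style pause: does this growth assumption make sense?
        assert -0.5 < state["growth"] < 0.5, f"implausible growth: {state['growth']:.0%}"

    def apply_multiple(state):
        state["valuation"] = state["revenue"] * 8.0   # illustrative revenue multiple
        return state

    def check_valuation(state):
        assert state["valuation"] > 0, "valuation must be positive"

    def run_pipeline(state, steps):
        for step, check in steps:
            state = step(state)
            check(state)          # halt here, before an error can reach later steps
        return state

    print(run_pipeline(
        {"base_revenue": 500.0, "growth": 0.08},
        [(forecast_revenue, check_revenue), (apply_multiple, check_valuation)],
    ))

The point of the pattern is that a failure stops the chain at the step where it occurred, rather than surfacing ten steps later as a plausible-looking but wrong number.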
The challenge is substantial: companies like Surge AI have built billion-dollar businesses paying domain experts $100-500 per hour to create training data, yet all that expensive human input hasn’t yet taught AI to do what humans do instinctively—pause, verify, and catch mistakes before they multiply.
The Economics Don’t Add Up
A $400 billion capital expenditure depreciated over a 10-year lifespan implies $40 billion in annual depreciation, against AI revenue estimates of just $20 to $40 billion for 2025. Justifying investment at this scale would require roughly $2 trillion in annual revenue by 2030, yet current AI revenues stand at only about $20 billion, a 100-fold gap.
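A quick back-of-the-envelope check of those figures:

    # Back-of-the-envelope check of the figures above.
    capex = 400e9                                   # ~$400B capital expenditure
    annual_depreciation = capex / 10                # straight-line over 10 years
    print(f"${annual_depreciation / 1e9:.0f}B/year")             # $40B/year

    required_2030, current = 2e12, 20e9             # $2T target vs ~$20B today
    print(f"{required_2030 / current:.0f}x increase required")   # 100x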
For investment banking specifically, the ROI proposition becomes even more challenging. While more than half of generative AI budgets are devoted to sales and marketing tools, MIT found the biggest ROI in back-office automation—eliminating business process outsourcing, cutting external agency costs, and streamlining operations. Yet even in these simpler use cases, success rates remain disappointingly low.
Why Leading AI Labs Still Struggle With Basic Banker Tasks
Despite Project Mercury’s efforts, Bloomberg columnist Matt Levine noted the project’s interesting dynamic: most junior bankers leave after two years, worn down by 100-hour weeks and demanding work, making them “perfectly happy to train a robot to replace junior bankers”.
Yet the productivity improvements projected for investment banking, averaging 34%, depend on automating tasks such as generating initial deal structures, due diligence, compliance, and valuation—all areas where generative AI’s hallucination tendencies create significant challenges.
The work that AI can more safely automate—basic formatting, simple data entry, routine document assembly—represents only a portion of what junior analysts do. The judgment calls, pattern recognition, error detection, and client relationship skills that distinguish competent bankers remain largely in human territory for now.
The Bottom Line
Companies aggressively growing their balance sheets underperformed conservative peers by 8.4% annually from 1963 to 2025, a pattern known as the “asset-growth anomaly”. Current AI spending already exceeds the internet boom’s peak relative to GDP, yet the practical results in high-value professional services remain mixed.
For investment bankers, this suggests several things:
Near-term job security may be stronger than headlines suggest. The key challenge isn’t that AI can’t do individual tasks—it’s that current AI systems struggle to check their work after each step the way humans do. Until AI systems develop more reliable self-verification mechanisms that prevent error compounding in multi-step workflows, wholesale replacement of banking analysts faces significant hurdles.
The checking matters as much as the doing. As the industry recognizes this, the value may shift toward verification, quality control, and error detection—skills that remain solidly in human territory. Junior bankers who excel at catching mistakes and verifying assumptions could become more valuable in an AI-augmented environment.
Skill mix is evolving. As economist Shawn DuBravac noted, “I’m not convinced we get rid of entry-level workers anytime soon. But the skill set we need those workers to have is different”. The focus appears to be shifting toward judgment, quality control capabilities, and the instinct to verify before proceeding—capabilities where current AI systems face limitations.
Expectations may be moderating. The MIT finding that 95% of AI pilots struggle to deliver results contributed to stock market volatility, with shares of many tech companies declining. As the challenges AI faces with self-verification in professional services become clearer, expectations may adjust.
Investment banking requires not just precision but continuous verification—catching errors after each step before they compound into larger problems. Current AI architecture faces significant challenges in delivering this capability reliably. While AI will likely change aspects of banking work—automating some routine tasks and augmenting human capabilities in specific areas—the wholesale replacement scenario requires solving verification challenges that leading AI labs are still working on despite billions in investment.
The $400 billion spending spree may be encountering practical constraints: building systems that can reliably verify their own work at each step appears considerably harder than simply executing workflows. Your ability to catch mistakes and verify assumptions may remain a valuable skill for the foreseeable future.
