Cointime

The Trust Dilemma: Overcoming LLM Hallucinations in Financial Services

From Chainlink, by Laurence Moroney

This is a guest post from Laurence Moroney, Chainlink Advisor and former AI Lead at Google.

In recent years, large language models (LLMs) have become synonymous with artificial intelligence (AI), spurring massive investment and interest. However, the impressive capabilities of LLMs come with a severe caveat: the tendency to ‘hallucinate’, or generate false and misleading information. This phenomenon poses significant trust challenges in high-stakes domains like financial services, where accuracy and reliability are paramount, and techniques to overcome it present a massive opportunity. The intersection of AI with technologies like blockchain, where trust and integrity are baked into the platform, could be the solution.

First, let’s examine the problem of hallucinations and then explore why this new industry initiative with Chainlink, Euroclear, Swift, and six financial institutions is so transformative.

Understanding LLM Hallucinations

A large language model is a predictor of tokens—fundamental units of text or data. Trained on massive amounts of text, using a transformer architecture that learns sequence-to-sequence patterns, LLMs like OpenAI’s GPT, Google’s Gemini, and Anthropic’s Claude have proven to be excellent models for artificially understanding and generating text. But, given their artificial nature, they don’t truly understand their outputs and instead predict the next statistically relevant token for an output.

Consider the phrase from a popular children’s song: If you are happy and you know it.

In your brain, you have learned what comes next. It’s likely the words “clap,” “your,” and “hands.” The transformer architecture mimics this. 

In some cultures, however, the next word is not “clap,” but in fact, “you,” and they sing, “If you are happy and you know it, you clap your hands”. So, if a model predicts the next token from a training corpus in which most instances omit “you” but some include it, it would assign a high likelihood to “clap,” a lower likelihood to “you,” and very low likelihoods to all other words.

And this is for a well-known phrase. Now, consider what happens if a model trained on text like this is asked to predict the next token for something it has never seen before, such as a news story or a corporate action that has only just been written, for example “Company X today announced a stock split of…”. How would the LLM predict the next token? From its corpus, it has likely seen very similar phrases many times, but they would be followed by many different tokens, such as “twenty to one,” “ten to one,” or “one to ten.”

The LLM would take the most common continuation in its training set as the likely next token and output that (just like “clap” instead of “you” in the children’s song). For example, it might output a phrase like “Company X today announced a stock split of ten to one.”

If the reality is that Company X is factually doing a six-to-one split, we now have a hallucination!
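The stock split example can be sketched with a toy frequency model. This is a deliberate simplification: real LLMs use learned weights over sub-word tokens rather than raw phrase counts, and the corpus contents, function names, and counts below are all hypothetical, chosen only to mirror the example above.

```python
from collections import Counter

# Hypothetical continuations seen after "announced a stock split of"
# in a toy training corpus. Note that "six to one" never appears.
observed_continuations = [
    "ten to one", "ten to one", "ten to one",
    "twenty to one", "twenty to one",
    "one to ten",
]


def next_token_distribution(continuations):
    """Turn raw counts into relative frequencies (a stand-in for model probabilities)."""
    counts = Counter(continuations)
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}


def greedy_prediction(continuations):
    """Pick the single most frequent continuation, as in the article's simplification."""
    distribution = next_token_distribution(continuations)
    return max(distribution, key=distribution.get)


distribution = next_token_distribution(observed_continuations)
prediction = greedy_prediction(observed_continuations)

actual_split = "six to one"  # the true value from the announcement
is_hallucination = prediction != actual_split

print(distribution)
print(f"predicted: {prediction!r}, actual: {actual_split!r}, hallucination: {is_hallucination}")
```

The model can only rank continuations it has seen, so the most frequent one wins regardless of what Company X actually announced.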

In our scenario, the core use of an LLM is not to generate content like this but to parse existing content, such as reading a PDF of a corporate action in which the stock split is mentioned. We can have it artificially understand the contents on our behalf so we can question it. It is important to note that the underlying hallucination issue *still* applies. The text of the PDF might say that the split is six-to-one, but the LLM could hallucinate ten-to-one based on its statistical next-token analysis. When you ask about the PDF, the model still generates its answer token by token from its best guesses.

The Peril of Hallucinations in Financial Services

Trusting an LLM blindly is a big mistake for the reasons demonstrated above. For financial services, the consequences of this could be:

  • Misinformed Decision Making: Inaccurate data could lead to flawed risk assessments, suboptimal investment strategies, and inefficient capital allocation
  • Regulatory and Reporting Issues: False information could lead to unintentional violations of regulatory and reporting requirements
  • Erosion of Trust: Clients or stakeholders discovering that any institution relies on unreliable, AI-generated information could severely damage trust and reputation
  • Financial Losses: Hallucinated data leading to bad advice or forecasting could lead to significant monetary losses

Thus, fully embracing LLMs for financial operations is fraught with risk. The need for accuracy and reliability in financial data and advice makes the current state of LLM technology challenging to integrate safely into many core processes.

Blockchain: A Path to Trust and Verifiability

While LLMs present challenges in accuracy and reliability, blockchain technology, with its core attributes of trust and verifiability, may be the key to a solution. Blockchain’s decentralized and immutable ledger system provides a framework for recording and verifying information that could be leveraged to help mitigate the risks associated with LLM hallucinations. Let’s explore how that might work, beginning with the idea of consensus.

Consensus: A Method for Trust

The scientific process begins with a theory, which is then supported with experimental evidence. Trusted peers review that evidence and work toward a consensus. Opinions may vary, but when most peers agree that the experimental evidence underpinning the theory is valid, the scientific discovery is validated and becomes the current ground truth.

Inspired by this process, Chainlink implemented a novel technique to overcome the risks of hallucination. 

They had several LLMs artificially understand the contents of a corporate action and output them in machine-readable JSON format. Instead of trusting a single prompt to a single LLM, the idea was to have a swarm of LLM-prompt combinations produce a range of results.

The consensus could then be measured. If they all produced the same result, we could begin to trust it, and it could be placed on the blockchain as a unified golden record. A unified golden record is a verifiable, persistent, updateable, and interoperable data container that is synchronized across blockchains.

Of course, if consensus is not attained, a manual process could be used to establish the ground truth and then publish it as a unified golden record.
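A minimal sketch of this consensus check follows. It is not Chainlink's implementation: the function names, the JSON field names, and the `threshold` parameter are assumptions made for illustration, and the model outputs are hard-coded strings rather than real LLM responses. With the default threshold of 1.0, a record is accepted only when every output agrees, and anything less is returned as `None` to signal that manual review is needed.

```python
import json
from collections import Counter


def canonical(record):
    """Serialize with sorted keys so equivalent records compare equal as strings."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


def find_consensus(raw_outputs, threshold=1.0):
    """Return (record, agreement). record is None when agreement is below threshold."""
    parsed = []
    for raw in raw_outputs:
        try:
            parsed.append(canonical(json.loads(raw)))
        except json.JSONDecodeError:
            parsed.append(None)  # malformed output counts as a dissenting vote

    votes = Counter(p for p in parsed if p is not None)
    if not votes:
        return None, 0.0

    winner, count = votes.most_common(1)[0]
    agreement = count / len(raw_outputs)
    if agreement >= threshold:
        return json.loads(winner), agreement
    return None, agreement


# Hypothetical outputs from three LLM-prompt combinations (key order may differ).
unanimous = [
    '{"event": "stock_split", "ratio": "6:1"}',
    '{"ratio": "6:1", "event": "stock_split"}',
    '{"event": "stock_split", "ratio": "6:1"}',
]
# One combination hallucinates a ten-to-one split.
disputed = [
    '{"event": "stock_split", "ratio": "6:1"}',
    '{"event": "stock_split", "ratio": "10:1"}',
    '{"event": "stock_split", "ratio": "6:1"}',
]

record, agreement = find_consensus(unanimous)
fallback, fallback_agreement = find_consensus(disputed)

print(record, agreement)             # accepted: all three agree
print(fallback, fallback_agreement)  # None: route to manual review
```

Comparing canonical serializations rather than raw strings keeps harmless differences, such as key order or whitespace, from being counted as disagreement.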

This process greatly lowered the risk of hallucination, which increases trust in automating the workflow and, in turn, reduces costs. The publication of the findings on-chain means that all parties can trust the data going forward.

Thus, an end-to-end system for converting unstructured data to highly trusted unified golden records is attainable. Much of this system could be automated, increasing trust and reducing the costs and risks associated with using LLMs in financial services.

Chainlink used this process in an industry initiative conducted alongside Euroclear, Swift, and six major financial institutions. This project demonstrated the automation of taking unstructured financial data, artificially understanding it with LLMs to produce on-chain golden records, and avoiding the risks of LLM hallucination. 

Given a lack of standardization in reporting processes for corporate actions, significant human capital is needed to read diverse document types to understand data for these events. 75% of firms have to revalidate this data manually, and the inefficient processes cost businesses many millions of dollars to overcome. 

Transforming Asset Servicing With AI, Oracles, and Blockchains

Chainlink’s approach to solving this problem can be found in Transforming Asset Servicing With AI, Oracles, and Blockchains. It shows very encouraging results at the prototype stage with:

  • Data Extraction and Structuring: It establishes a novel data extraction and structuring process that leverages unstructured data from public company sources and turns this into structured data that adheres to regulatory frameworks such as SMPG
  • Consensus Framework: It successfully demonstrated an LLM consensus framework for financial data comparing the outputs of multiple LLMs, greatly enhancing the reliability of their outputs and mitigating the hallucination risks
  • Near Real-Time Data Distribution: Once the consensus data was established, Chainlink’s industry initiative propagated it across multiple blockchain ecosystems and stored it as unified golden records in smart contracts. This makes it accessible to market participants and provides a framework for them to build new applications on top of.
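To make the idea of a structured, verifiable record more concrete, here is a rough sketch of how an extracted stock split might be held in a typed record and given a content fingerprint. Everything in it is an assumption for illustration: the `StockSplitRecord` fields are not drawn from any real market standard, the date is a placeholder, and the SHA-256 fingerprint simply stands in for whatever integrity mechanism an on-chain record would actually use.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StockSplitRecord:
    """Hypothetical structured form of a stock split announcement."""
    issuer: str
    new_shares: int      # shares received...
    old_shares: int      # ...for each block of this many shares held
    effective_date: str  # ISO 8601 date string

    def __post_init__(self):
        # Reject values that cannot describe a real split ratio.
        if self.new_shares <= 0 or self.old_shares <= 0:
            raise ValueError("share counts must be positive")


def fingerprint(record):
    """Hash a canonical JSON form so identical records share one digest."""
    payload = json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Placeholder values: a six-to-one split and a hallucinated ten-to-one split.
correct = StockSplitRecord("Company X", 6, 1, "2024-01-01")
hallucinated = StockSplitRecord("Company X", 10, 1, "2024-01-01")

print(fingerprint(correct))
print(fingerprint(correct) == fingerprint(hallucinated))  # the digests differ
```

Any party holding the same structured record can recompute the digest and compare it with a published one, so a single changed field is detectable.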

Conclusion

We are only at the beginning of the AI revolution. It can be compared to the Internet at the dial-up stage. As novel solutions to existing problems arise, the opportunity to build more and better solutions becomes clearer. 

In this study, a novel combination of data extraction validated by consensus and data publication on a trusted, verifiable blockchain showed how the power of AI and LLMs, held back by the risk of hallucination, could be unleashed. As LLMs evolve and hopefully improve, the underlying technique of driving consensus and publishing established consensus on-chain as a golden record will continue to show value.

Chainlink’s industry initiative is a very early prototype of what could be a powerful solution that opens many new opportunities for AI, blockchain, and financial services to build better, together.
