Dennis Verstappen
2025-08-07

Serving AI with Data Infrastructure Fit for Web3

Web3 technology is perfectly positioned to ensure AI operates on trustworthy data while making AI accountable, transparent, and interconnected.

Blockchain can verify data through the network, guaranteeing that the inputs to and outputs from AI models are reliable. While the current Web3 landscape centers largely on financial data, blockchain technology has the potential to extend far beyond it, encompassing personal information, scientific data, and government records.

Currently, developers of AI, AI agents, bots, and ML models in Web3 are working to determine the data infrastructure and data they need for training, inference, monitoring, and retraining their models.

A few challenges exist in the current Web3 environment when working with this data.


We will now take a look at the processes needed to put AI into production and at how the unique data infrastructure from The Indexing Company can serve builders and AI in Web3.

Training

To train models, a vast amount of data is needed. Training often happens in a local environment with easy access to the data, which is fed to the models in batches so they can learn from those inputs. Historical on-chain data has to be fetched and can come from multiple chains. Ideally this data is transformed into a unified data schema, regardless of chain (EVM or non-EVM), and enriched with off-chain data like contract labels. Since the data pipelines built by The Indexing Company are chain agnostic and allow custom transformations, the data can be put into a unified schema before it hits the training database or data lake. Because the pipelines are highly configurable, data like contract labels or pricing data can be added to ensure a more complete feature set. The parallel processing network utilized by The Indexing Company ensures that this historical data is backfilled quickly into the target data infrastructure.
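
As a simplified sketch (not our production configuration; every name and field below is illustrative), a transform into a unified schema with off-chain enrichment could look like this:

```python
from dataclasses import dataclass

# Hypothetical unified schema; field names are illustrative.
@dataclass
class UnifiedTransfer:
    chain: str
    block_number: int
    tx_hash: str
    sender: str
    receiver: str
    amount: float
    contract_label: str | None  # filled from off-chain enrichment

# Example off-chain enrichment source: contract address -> human label.
CONTRACT_LABELS = {"0xusdc": "USDC"}

def from_evm_log(chain: str, log: dict) -> UnifiedTransfer:
    """Map a raw EVM ERC-20 Transfer log into the unified schema."""
    return UnifiedTransfer(
        chain=chain,
        block_number=log["blockNumber"],
        tx_hash=log["transactionHash"],
        sender=log["topics"][1],
        receiver=log["topics"][2],
        amount=int(log["data"], 16) / 1e6,  # assumes a 6-decimal token
        contract_label=CONTRACT_LABELS.get(log["address"]),
    )

def from_non_evm_event(event: dict) -> UnifiedTransfer:
    """Map a (hypothetical) non-EVM transfer event into the same schema."""
    return UnifiedTransfer(
        chain=event["chain"],
        block_number=event["slot"],
        tx_hash=event["signature"],
        sender=event["source"],
        receiver=event["destination"],
        amount=event["amount"],
        contract_label=CONTRACT_LABELS.get(event.get("token", "")),
    )
```

Whatever the source chain, the training store only ever sees UnifiedTransfer rows, so feature engineering is written once.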

Inference

Inference is the process by which a trained AI model makes predictions and decisions based on new incoming data. Ideally this data reaches the model in the same schema and with the same features as in the training stage. The data needs to be updated frequently for the AI to serve the user or act on its own. Data can be streamed in real time to a database, which can trigger the AI based on certain thresholds. If the AI needs to pull data, it can query the database or call an API hosted on top of the database. Since the pipelines from The Indexing Company can be configured so that it does not matter whether the data is historical or real time, the same infrastructure can be used both to train the AI and to serve the data for inference. In other words, setting up these pipelines for historical data means the data pipelines for inference are already in place too. These pipelines can furthermore be optimized for low latency so the AI can act as fast as possible after blocks are confirmed on the blockchain.
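
A minimal sketch of threshold-based triggering, with an in-memory queue standing in for the real-time database (the threshold, field names, and model call are all assumptions):

```python
from collections import deque

# Simulated stream of indexed rows in the unified schema; in practice
# these would arrive from the real-time pipeline's target database.
INCOMING = deque([
    {"block_number": 101, "amount": 250.0, "sender": "0xabc"},
    {"block_number": 102, "amount": 12_500.0, "sender": "0xdef"},
])

THRESHOLD = 10_000.0  # hypothetical trigger value

def predict(row: dict) -> str:
    """Stand-in for the trained model's inference call."""
    return "act" if row["amount"] >= THRESHOLD else "ignore"

last_block = 0  # cursor for the next poll of the database
while INCOMING:
    row = INCOMING.popleft()
    last_block = max(last_block, row["block_number"])
    # Only wake the model when the threshold is crossed; a pull-based
    # agent would instead query the database or an API on top of it.
    if row["amount"] >= THRESHOLD:
        print(f"block {row['block_number']}: {predict(row)}")
```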

Monitoring

Once an AI or a swarm of agents begins transacting on the blockchain, it should be monitored to ensure performance. The data resulting from the agents' actions can also be indexed and used for real-time alerts, monitoring, and analytics, giving users the ability to disable or reconfigure an agent in real time. We designed our infrastructure to be responsive (versus a static approach to configurations): it automatically indexes new data based on the data coming in and/or on reconfigured logic, whether triggered by events emitted on the blockchain or by a trigger sent to the pipelines. This ensures that every new action by a bot, and every new bot added to the swarm, gets monitored.

One example of this responsive data infrastructure is Just In Time Indexing (JITI). In a previous article, we described how Just In Time Indexing can continuously backfill and index new transactions from new addresses. For example, when a new agent is registered to the network, it would do so via a Factory Contract. JITI would then be triggered to monitor this new address and all transactions related to it. This process ensures data completeness without manual intervention by developers.
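
A toy illustration of the JITI pattern (the event shape and function names are ours, not JITI's actual interface):

```python
# Addresses currently being indexed.
WATCHED_ADDRESSES: set[str] = set()

def backfill(address: str, from_block: int) -> None:
    """Stand-in for backfilling the full history of `address`."""
    print(f"backfilling {address} from block {from_block}")

def on_factory_event(event: dict) -> None:
    """React to a (hypothetical) AgentRegistered event from the factory."""
    new_agent = event["agent_address"]
    if new_agent not in WATCHED_ADDRESSES:
        WATCHED_ADDRESSES.add(new_agent)
        # From here on, the pipeline tracks the new agent automatically.
        backfill(new_agent, from_block=event["block_number"])

# Simulated factory event: a new agent joins the swarm.
on_factory_event({"agent_address": "0x9a7e", "block_number": 200})
```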

Retraining

Models need to be retrained frequently to stay up to date with changes in the environment, to improve performance, or to add new chains the bots need to be active on. With new types of data coming in, the chance is high that this data arrives in a different schema and requires new transformations. This is true both when new protocols are added and when new chains are added, since the smart contract or event structure might differ. Luckily, since we designed our data pipelines to be highly configurable, these transformations can happen before the data hits the target data infrastructure. Even if data comes from different sources or chains (EVM vs. non-EVM), the resulting data schema can be unified. This unification ensures continuity in the schemas needed to calculate the features, reducing the additional data engineering needed to integrate new data.
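
Continuing the earlier sketch, onboarding a new chain then reduces to registering one more transform into the same unified schema, leaving the downstream feature pipeline untouched (names again illustrative):

```python
from typing import Callable

# Registry mapping each chain to a transform that emits the unified
# schema; adding a chain means adding one entry, not reworking features.
TRANSFORMS: dict[str, Callable[[dict], dict]] = {}

def register_chain(chain: str, transform: Callable[[dict], dict]) -> None:
    TRANSFORMS[chain] = transform

# Hypothetical transform for a newly added non-EVM chain.
register_chain("newchain", lambda ev: {
    "chain": "newchain",
    "block_number": ev["height"],
    "tx_hash": ev["hash"],
    "amount": ev["amount"],
})
```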

Conclusion

We welcome the opportunity AI brings to Web3. The potential to improve UX for users and to automate tasks with settlement on a blockchain is promising. The data infrastructure The Indexing Company provides is ready to help developers in AI and Web3 build the next generation of products. With fast and complete historical data, real-time data streaming, and responsive data pipelines, any type of model or AI can be (re)trained, served, and monitored.

We are happy to spar with developers and businesses on their data needs. If you want to chat or need support, reach out to us.
