Indexing Mirror.xyz

9/30/2022 - originally posted on mirror.xyz

If you read content online, you’ve probably at least heard of publishing services like Medium and Substack. These are centralized, web2 companies that make money from views: subscriptions, ads, etc. Thankfully, as we transition into the web3 space, we are already seeing some promising alternatives. The largest of these web3 publishers is Mirror.

The beauty of protocols like Mirror is that they don’t own any of the data. They still have a login, a clean text editor, and shareable links just like Medium or Substack. But the content that flows through Mirror is stored on a decentralized network they don’t control: Arweave. We’ve spoken briefly about Arweave previously, but the gist is that any content stored on it is stored forever thanks to unique incentive models built into the network. This has two important ramifications:

  1. There are no paywalls on Mirror. You can still choose to support individual creators, and Mirror helps facilitate this, but it’s entirely optional.

  2. You don’t even have to use Mirror to participate in the broader ecosystem.

That second point is what we’re going to focus on in this post. Because the data is freely available forever, we can use it in any way we want. For instance, this very blog post is available on Mirror, but it’s also available on our company’s website. Any updates to this post are instantly available on both sites because they both read from the exact same data source, Arweave. Let’s unpack this a bit:

  1. Content is written in an editor such as Mirror

  2. That content is added to a transaction on Arweave

  3. A user visits a related link on either mirror.xyz or indexing.co

  4. The site fetches the content from Arweave, formats it, and displays it to the end user

  5. Content is shared from creator to reader
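
Steps 3 and 4 above can be sketched in a few lines. This is a minimal illustration, not how either site is actually built: gatewayUrl and toDisplayPost are hypothetical helper names, and the sketch assumes that the arweave.net gateway serves a transaction's data at https://arweave.net/<transaction id> and that the stored JSON has a content object with title and body strings (the shape described later in this post).

```typescript
// Hypothetical, minimal shape of what a site needs in order to display a post.
type DisplayPost = { title: string; body: string };

// Step 3: work out where the content lives. Assumes the gateway serves a
// transaction's data at https://<host>/<transaction id>.
function gatewayUrl(txId: string, host: string = "arweave.net"): string {
  return `https://${host}/${encodeURIComponent(txId)}`;
}

// Step 4: turn the raw JSON string stored on Arweave into something displayable.
// Throws if the payload is not valid JSON or lacks the fields we expect.
function toDisplayPost(rawJson: string): DisplayPost {
  const parsed = JSON.parse(rawJson);
  const title = parsed?.content?.title;
  const body = parsed?.content?.body;
  if (typeof title !== "string" || typeof body !== "string") {
    throw new Error("Unexpected post shape: missing content.title or content.body");
  }
  return { title, body };
}
```

Whatever does the actual fetching (a browser, a server, a build step) only needs the URL from the first helper and the raw response text for the second.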

The Nitty Gritty

Now that we have an overview of the steps required to load an individual post, let’s look at the technical details for indexing all of a given user’s posts from Mirror. For this, we’re going to use TypeScript alongside the arweave package on npm.

Mirror helps structure the content stored on Arweave with what are known as tags. These are pretty much what you might expect: key/value pairs of arbitrary strings tied to a piece of content. For instance, these are the tags for our post on web3 storage options:

{
  'Content-Type': 'application/json',
  'App-Name': 'MirrorXYZ',
  Contributor: '0x0317d91C89396C65De570c6A1A5FF8d5485c58DC',
  'Content-Digest': 'B1ytOURSn75aACoOHmVHrV31bl0tL4ffWHEtl4JeGUE',
  'Original-Content-Digest': 'FDyv8i8c15ATs_KIpAtEdeMP20WZ00FfssPYOj3EZRY'
}

For our purposes we’re most interested in the App-Name and Contributor tags. We’ll use the combination of these two to pull all of the content published on Mirror by our given writer.
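
To make that filter concrete, here is a small sketch of matching on those two tags once they have been decoded into a plain object. isMirrorPostBy is a hypothetical helper name, and the comparison is an exact string match, which assumes you pass the address exactly as it appears in the Contributor tag.

```typescript
type Tags = Record<string, string>;

// Returns true only when the transaction was published through Mirror
// by the given contributor address (exact, case-sensitive match).
function isMirrorPostBy(tags: Tags, contributor: string): boolean {
  return tags["App-Name"] === "MirrorXYZ" && tags["Contributor"] === contributor;
}

// The example tags shown above.
const exampleTags: Tags = {
  "Content-Type": "application/json",
  "App-Name": "MirrorXYZ",
  Contributor: "0x0317d91C89396C65De570c6A1A5FF8d5485c58DC",
  "Content-Digest": "B1ytOURSn75aACoOHmVHrV31bl0tL4ffWHEtl4JeGUE",
  "Original-Content-Digest": "FDyv8i8c15ATs_KIpAtEdeMP20WZ00FfssPYOj3EZRY",
};

const matches = isMirrorPostBy(
  exampleTags,
  "0x0317d91C89396C65De570c6A1A5FF8d5485c58DC"
);
```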

Alright, time for some code. First, we create our arweave instance and point it at the publicly hosted arweave.net provider. If you want to run your own node, check out their docs here.

import Arweave from "arweave";

const arweave = Arweave.init({
  host: "arweave.net",
  port: 443,
  protocol: "https",
});

Since we’re using TypeScript, we can define our Post structure. This roughly reflects what we’ll get from Arweave directly, with the addition of the originalDigest key. That originalDigest will be pulled from the Original-Content-Digest tag and is important because it’s what Mirror uses in their URLs (which is why you can edit a post without having to share a new link).

type Post = {
  authorship: {
    contributor: string;
  };
  content: {
    body: string;
    timestamp: string;
    title: string;
  };
  digest: string;
  originalDigest: string;
};
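
A TypeScript type is erased at runtime, so it can't confirm that parsed JSON really has this shape. As a minimal sketch of the kind of check you might add, here is a type guard for Post. isRecord and isPost are hypothetical helpers, and the Post type is repeated so the snippet stands alone. The guard checks exactly what the type declares; if the data you get back differs (for example, a numeric timestamp), adjust the type and the guard together.

```typescript
type Post = {
  authorship: { contributor: string };
  content: { body: string; timestamp: string; title: string };
  digest: string;
  originalDigest: string;
};

// Narrow an unknown value to a non-null object we can index into.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Runtime check that a value has every field the Post type declares.
function isPost(value: unknown): value is Post {
  if (!isRecord(value)) return false;
  const { authorship, content, digest, originalDigest } = value;
  return (
    isRecord(authorship) &&
    typeof authorship.contributor === "string" &&
    isRecord(content) &&
    typeof content.body === "string" &&
    typeof content.timestamp === "string" &&
    typeof content.title === "string" &&
    typeof digest === "string" &&
    typeof originalDigest === "string"
  );
}
```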

Finally, we get to the meat of this whole shebang. We first query Arweave for the transactions matching our given Contributor tag and then fetch the full transaction for each identifier, including its data. Since the current arweave package only allows us to search by one tag, we filter by the App-Name: MirrorXYZ piece further down after we parse out the tags.

Once we’ve filtered down to only those transactions that match our Contributor and App-Name, we can pull out the data and turn it into a Post. Mirror stores all of its content as structured JSON strings, so we can readily parse that out and cast it to our Post type. Of course, null checks and error handling would be welcome additions as well.

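async function getPostsForContributor(address: string): Promise<Post[]> {
  // Find the IDs of every transaction tagged with this Contributor address.
  const arweaveTransactionIds = await arweave.transactions.search(
    "Contributor",
    address
  );

  // Fetch each full transaction, including its tags and data.
  const arweaveTransactions = await Promise.all(
    arweaveTransactionIds.map((txId) => arweave.transactions.get(txId))
  );

  // Keep one post per Original-Content-Digest so edits don't appear as duplicates.
  // Note: this keeps whichever revision comes first in the search results.
  const postsByOriginalDigest: Record<string, Post> = {};

  for (const transaction of arweaveTransactions) {
    // Decode the tag names and values into a plain object.
    const tags: Record<string, string> = {};

    for (const tag of transaction.tags) {
      const name = tag.get("name", { decode: true, string: true });
      const value = tag.get("value", { decode: true, string: true });
      tags[name] = value;
    }

    // Skip anything that was not published through Mirror.
    const appName = tags["App-Name"];
    if (appName !== "MirrorXYZ") {
      continue;
    }

    // Skip transactions with no original digest, and posts we've already stored.
    const originalDigest = tags["Original-Content-Digest"];
    if (!originalDigest || postsByOriginalDigest[originalDigest]) {
      continue;
    }

    // The transaction data is a JSON string; parse it and attach the original digest.
    const rawData = transaction.get("data", { decode: true, string: true });
    postsByOriginalDigest[originalDigest] = {
      ...JSON.parse(rawData),
      originalDigest,
    };
  }

  return Object.values(postsByOriginalDigest);
}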
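
As an alternative to filtering by App-Name after the fact, Arweave gateways such as arweave.net also expose a GraphQL endpoint (at /graphql) that accepts several tag filters in one query. This is a different technique from the arweave.transactions.search call above, and the sketch below only builds the request body. buildMirrorQuery is a hypothetical helper, and the field names reflect our understanding of the gateway schema, so check them against the gateway you use.

```typescript
type TagFilter = { name: string; values: string[] };

type GraphQLRequest = {
  query: string;
  variables: { tags: TagFilter[]; first: number; after?: string };
};

// Query text, assuming the gateway's `transactions` field accepts tag filters
// and cursor-based pagination.
const MIRROR_TRANSACTIONS_QUERY = `
  query MirrorTransactions($tags: [TagFilter!], $first: Int, $after: String) {
    transactions(tags: $tags, first: $first, after: $after) {
      pageInfo { hasNextPage }
      edges {
        cursor
        node { id tags { name value } }
      }
    }
  }
`;

// Build a request body that asks for transactions carrying BOTH tags at once.
function buildMirrorQuery(
  address: string,
  first: number = 100,
  after?: string
): GraphQLRequest {
  const variables: GraphQLRequest["variables"] = {
    tags: [
      { name: "App-Name", values: ["MirrorXYZ"] },
      { name: "Contributor", values: [address] },
    ],
    first,
  };
  // Only include the cursor when paging past the first batch of results.
  if (after !== undefined) {
    variables.after = after;
  }
  return { query: MIRROR_TRANSACTIONS_QUERY, variables };
}
```

The result can be sent as the JSON body of a POST request to the gateway's /graphql endpoint. Passing the last edge's cursor back in as after pages through the remaining results.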

And that’s really all there is to it! You can view all of the code above together in this gist. It’s worth noting that the current arweave package does not support subscriptions, so we have to check for new Arweave transactions ourselves. This can be done via polling or, in the case of indexing.co, simply at request time.
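
If you go the polling route, the core bookkeeping is just remembering which transaction IDs you've already seen. Here is a minimal sketch, where takeNewIds is a hypothetical helper and the list of IDs is assumed to come from a search like the one above:

```typescript
// Returns the IDs in `latestIds` that have not been seen before, in order,
// and records them in `seen` so the next call skips them.
function takeNewIds(seen: Set<string>, latestIds: string[]): string[] {
  const fresh: string[] = [];
  for (const id of latestIds) {
    if (!seen.has(id)) {
      seen.add(id);
      fresh.push(id);
    }
  }
  return fresh;
}

// Example: the first poll finds two new transactions, the second finds one.
const seenIds = new Set<string>(["tx-a"]);
const firstPoll = takeNewIds(seenIds, ["tx-a", "tx-b", "tx-c"]);
const secondPoll = takeNewIds(seenIds, ["tx-c", "tx-d"]);
```

Calling this on a timer with the IDs from each search leaves you with only the transactions that still need to be fetched and parsed.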

Lastly, if you want to render a given post, the Post.content.body field is stored as markdown and can be roughly converted to HTML using a package like markdown-it.

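import MarkdownIt from "markdown-it";

// Create the renderer once rather than on every render.
const md = new MarkdownIt();

// React passes props as a single object, so the post arrives as a `post` prop.
function PostView({ post }: { post: Post }) {
  // markdown-it leaves raw HTML disabled by default, so HTML inside the markdown is escaped.
  return (
    <div dangerouslySetInnerHTML={{ __html: md.render(post.content.body) }} />
  );
}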

Happy indexing!