logoContent Collections

Transform

The transform function of a collection can be used to perform various tasks on the document before it is saved to the collection. It receives two arguments: the document, which matches the schema's shape, and a context object. The result of the transform function defines the TypeScript type of the document and should return an object that can be serialized to JSON. The function can be synchronous or asynchronous.

Content

In Content Collections, the content of a document is just a string. One common use case for the transform function is to transform this content, such as converting markdown to HTML. While the content can be used directly with libraries like react-markdown, transforming the content during build time is often beneficial.

Various libraries available in the Node.js ecosystem can be used for transformation, with the only requirement being that the result is serializable. However, there are specific helper packages designed for Content Collections:

Another option is to use Content Collections solely for metadata (e.g., frontmatter) while employing a different tool to render the content (e.g., next/mdx). This approach is particularly beneficial for larger sites using mdx. If the bundler that compiles the rest of the application also includes the mdx content files, it can optimize the resulting bundle. This way, imports in mdx files are resolved by the bundler and function like those in the rest of the application. Using an external bundler, as we do with @content-collections/mdx, may lead to larger bundles because the external bundler must re-bundle every used component, even if it is already utilized elsewhere in the application. To implement this approach, the frontmatter-only parser can be used alongside Static Import. For a complete sample with Next.js, refer to the next-mdx-static-import sample.

Caching

The transform function is invoked for every document during each build or document change. While this behavior may change in the future, it currently holds true. Therefore, ensuring fast transformations is crucial. For slower operations like markdown to HTML conversion, the context object provides a cache function to cache costly operations.

Example:

transform: async (doc, { cache }) => {
  const html = await cache(doc.content, async (content) => {
    return markdownToHtml(content);
  });
 
  return {
    ...doc,
    html,
  };
};

In this example, the markdownToHtml function is only called if the content has changed.

Note: Caching the compilation steps of @content-collections/markdown or @content-collections/mdx is unnecessary as they already utilize the same caching mechanism.

Access sibling documents

Since version 0.7.0, it is possible to access other documents of the same collection by using the documents function of the collection object, which is part of the context object. The function is asynchronous, requires no parameters, and returns an array of all documents of the collection. The documents are not transformed; they have the shape as defined in the schema of the collection.

Example:

const posts = defineCollection({
  // ...
  transform: async (doc, { collection }) => {
    const docs = await collection.documents();
    const idx = docs.findIndex((d) => doc._meta.filePath === d._meta.filePath);
    return {
      ...doc,
      prev: idx > 0 ? docs[idx - 1] : null,
      next: idx < docs.length - 1 ? docs[idx + 1] : null,
    };
  },
});

Access other collections

The transform function can access other collections using the documents function of the context object. The function requires a collection reference as parameter and returns an array of documents for that collection. But keep in mind the returned document are not transformed, they have the shape as defined in the schema of the referenced collection.

Example:

const authors = defineCollection({
  // ...
  schema: (z) => ({
    ref: z.string(),
    displayName: z.string()
  }),
});
 
const posts = defineCollection({
  // ...
  transform: async (doc, { documents }) => {
    const author = await documents(authors).find(
      (a) => a.ref === doc.author
    );
    return {
      ...doc,
      author: author.displayName
    };
  },

For a complete example have a look at the Join collections example.

It is not possible to access documents of the same collection with the documents function. Use the collection.documents function instead. Please refer to Access sibling documents for more information.

Static Imports

Since version 0.8.0, the transform function can return a reference to a static import. This reference can be used to import a JavaScript module, which then becomes part of the generated document.

Example:

transform: async (doc) => {
  const component = createDefaultImport<React.ComponentType>(`@/components/${doc.component}`);
  return {
    ...doc,
    component,
  };
},

The static import is very useful when working with MDX files, and the application's bundler allows for their import.

Example:

transform: async (doc) => {
  const mdxContent = createDefaultImport<MDXContent>(`@/content/posts/${doc._meta.filePath}`);
  return {
    ...doc,
    mdxContent,
  };
},

The mdxContent can the be used in the application:

const Post = ({ post }: Props) => {
  const MDXContent = post.mdxContent;
  return <MDXContent />;
};

To ensure type safety, a generic for the imported module must be passed to the createDefaultImport.

Examples

Here are some common use cases of the transform function:

Generate a slug

transform: (doc) => {
  return {
    ...doc,
    slug: doc.title.toLowerCase().replace(/ /g, "-"),
  };
};

Fetch external data

transform: async (author, { cache }) => {
  const bio: string = await cache(
    `https://api.authors.com/bio/${author.username}`,
    async (url) => {
      const response = await fetch(url);
      const user = await response.json();
      return user.bio;
    },
  );
 
  return {
    ...user,
    bio,
  };
};

Get last modification date from Git

transform: async (doc, { cache }) => {
  const lastModified = await cache(doc._meta.filePath, async (filePath) => {
    const { stdout } = await exec(`git log -1 --format=%ai -- ${filePath}`);
    if (stdout) {
      return new Date(stdout.trim()).toISOString();
    }
    return new Date().toISOString();
  });
 
  return {
    ...doc,
    lastModified,
  };
};

Join collections

import { defineCollection, defineConfig } from "@content-collections/core";
 
const authors = defineCollection({
  name: "authors",
  directory: "content/authors",
  include: "*.md",
  schema: (z) => ({
    ref: z.string(),
    displayName: z.string(),
    email: z.string().email(),
  }),
});
 
const posts = defineCollection({
  name: "posts",
  directory: "content/posts",
  include: "*.md",
  schema: (z) => ({
    title: z.string(),
    author: z.string(),
  }),
  transform: async (document, context) => {
    const author = await context
      .documents(authors)
      .find((a) => a.ref === document.author);
    return {
      ...document,
      author,
    };
  },
});
 
export default defineConfig({
  collections: [authors, posts],
});

On this page