description: LangChain Record Manager Nodes

Record Managers


Record Managers keep track of your indexed documents, preventing duplicate vector embeddings in the Vector Store.

When document chunks are being upserted, each chunk is hashed using the SHA-1 algorithm, and these hashes are stored in the Record Manager. If a chunk's hash already exists, the embedding and upserting process is skipped for that chunk.
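A minimal sketch of this skip-if-hashed behaviour in Python. It assumes each chunk is a `(text, metadata)` pair and that text and metadata are serialized together as JSON before hashing; the function names `chunk_hash` and `upsert` are hypothetical (not part of Flowise or LangChain), and a plain `set` stands in for the Record Manager.

```python
import hashlib
import json


def chunk_hash(text: str, metadata: dict) -> str:
    """SHA-1 hex digest of a chunk (assumption: text and metadata hashed together)."""
    payload = json.dumps({"text": text, "metadata": metadata}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def upsert(records: set, chunks: list) -> list:
    """Return the texts that still need embedding; `records` holds known hashes."""
    to_embed = []
    for text, metadata in chunks:
        h = chunk_hash(text, metadata)
        if h in records:
            continue  # hash already recorded: skip embedding and upserting
        records.add(h)  # a real system would record the hash after a successful upsert
        to_embed.append(text)
    return to_embed


records = set()
print(upsert(records, [("Cat", {"source": "cat"}), ("Dog", {"source": "dog"})]))  # ['Cat', 'Dog']
print(upsert(records, [("Cat", {"source": "cat"}), ("Cow", {"source": "cow"})]))  # ['Cow']
```

On the second call only the new `Cow` chunk is returned for embedding, because the hash for `Cat` is already recorded.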

In some cases, you might want to delete existing documents that are derived from the same sources as the new documents being indexed. For that, the Record Manager offers 3 cleanup modes:

{% tabs %} {% tab title="Incremental" %} When you are upserting multiple documents and want to keep existing documents whose sources are not part of the current upsert, use Incremental Cleanup mode. Only outdated documents from the same sources as the new batch are deleted.

  1. Let's have a Record Manager with Incremental Cleanup and `source` as the SourceId Key.
  2. Add the following 2 documents:

     | Text | Metadata |
     | ---- | -------- |
     | Cat | `{source:"cat"}` |
     | Dog | `{source:"dog"}` |

  3. After an upsert, both documents are upserted into the Vector Store.
  4. Now, if we delete the Dog document, update Cat to Cats, and upsert again, we see the following:
  • The original Cat document is deleted
  • A new document with Cats is added
  • The Dog document is left untouched
  • The remaining vector embeddings in the Vector Store are Cats and Dog
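The incremental walkthrough can be modelled in Python with a dict standing in for both the Vector Store and the Record Manager. This is a hypothetical model, not Flowise's implementation: `upsert_incremental`, the hash-to-record layout, and hashing text plus metadata together are all assumptions. As in the example, `source` is the SourceId Key.

```python
import hashlib
import json


def chunk_hash(text: str, metadata: dict) -> str:
    # Assumption: text and metadata are hashed together with SHA-1
    payload = json.dumps({"text": text, "metadata": metadata}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def upsert_incremental(store: dict, docs: list, source_key: str = "source") -> None:
    """store maps hash -> (text, source id); docs is a list of (text, metadata)."""
    batch = {chunk_hash(t, m): (t, m[source_key]) for t, m in docs}
    for h, record in batch.items():
        if h not in store:  # unchanged chunks are skipped
            store[h] = record
    # Incremental cleanup: only sources present in this batch are cleaned up
    batch_sources = {source for _, source in batch.values()}
    for h in list(store):
        if store[h][1] in batch_sources and h not in batch:
            del store[h]


store = {}
upsert_incremental(store, [("Cat", {"source": "cat"}), ("Dog", {"source": "dog"})])
upsert_incremental(store, [("Cats", {"source": "cat"})])  # Dog removed from batch, Cat -> Cats
print(sorted(text for text, _ in store.values()))  # ['Cats', 'Dog']
```

Because the second batch only contains the `cat` source, the stale `Cat` record is deleted while `Dog` survives.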
{% endtab %}

{% tab title="Full" %} When you are upserting multiple documents, Full Cleanup mode automatically deletes any vector embeddings that are not part of the current upsert.

  1. Let's have a Record Manager with Full Cleanup. A SourceId Key is not needed for Full Cleanup mode.
  2. Add the following 2 documents:

     | Text | Metadata |
     | ---- | -------- |
     | Cat | `{source:"cat"}` |
     | Dog | `{source:"dog"}` |

  3. After an upsert, both documents are upserted into the Vector Store.
  4. Now, if we delete the Dog document, update Cat to Cats, and upsert again, we see the following:
  • The original Cat document is deleted
  • A new document with Cats is added
  • The Dog document is deleted
  • The only remaining vector embedding in the Vector Store is Cats
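The Full Cleanup walkthrough can be modelled the same way in Python. This is a hypothetical sketch, not Flowise's implementation: `upsert_full`, the in-memory dict, and hashing text plus metadata together are assumptions.

```python
import hashlib
import json


def chunk_hash(text: str, metadata: dict) -> str:
    # Assumption: text and metadata are hashed together with SHA-1
    payload = json.dumps({"text": text, "metadata": metadata}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def upsert_full(store: dict, docs: list) -> None:
    """store maps hash -> text; docs is a list of (text, metadata)."""
    batch = {chunk_hash(t, m): t for t, m in docs}
    for h, text in batch.items():
        if h not in store:  # unchanged chunks are skipped
            store[h] = text
    # Full cleanup: delete every record not produced by this batch
    for h in list(store):
        if h not in batch:
            del store[h]


store = {}
upsert_full(store, [("Cat", {"source": "cat"}), ("Dog", {"source": "dog"})])
upsert_full(store, [("Cats", {"source": "cat"})])
print(sorted(store.values()))  # ['Cats']
```

The cleanup step never looks at the source, which is why Full mode does not need a SourceId Key.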
{% endtab %}

{% tab title="None" %} No cleanup is performed: existing documents are never deleted. {% endtab %} {% endtabs %}

Currently available Record Manager nodes are:

  • SQLite
  • MySQL
  • PostgreSQL
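To illustrate what a database-backed Record Manager persists, here is a hypothetical SQLite layout using Python's built-in `sqlite3` module. The `records` table, its columns, and the helpers `record_keys` and `existing_keys` are assumptions for illustration, not the schema Flowise or LangChain actually creates. The point is only that hashes survive between upserts once they live in a database rather than in memory.

```python
import sqlite3
import time

# Hypothetical schema: one row per chunk hash, scoped by a namespace
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE records (
        namespace TEXT NOT NULL,
        key TEXT NOT NULL,          -- SHA-1 hash of the chunk
        group_id TEXT,              -- SourceId value, used by incremental cleanup
        updated_at REAL NOT NULL,
        PRIMARY KEY (namespace, key)
    )"""
)


def record_keys(namespace: str, keys_with_groups: list) -> None:
    """Insert new hashes or refresh the row of existing ones."""
    now = time.time()
    conn.executemany(
        "INSERT OR REPLACE INTO records (namespace, key, group_id, updated_at) "
        "VALUES (?, ?, ?, ?)",
        [(namespace, key, group, now) for key, group in keys_with_groups],
    )
    conn.commit()


def existing_keys(namespace: str, keys: list) -> set:
    """Return the subset of `keys` already recorded in `namespace`."""
    found = set()
    for key in keys:
        row = conn.execute(
            "SELECT 1 FROM records WHERE namespace = ? AND key = ?", (namespace, key)
        ).fetchone()
        if row is not None:
            found.add(key)
    return found


record_keys("docs", [("hash-cat", "cat"), ("hash-dog", "dog")])
print(existing_keys("docs", ["hash-cat", "hash-cow"]))  # {'hash-cat'}
```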

Resources