description: >- Split documents recursively by different characters - starting with "\n\n", then "\n", then " ".
Recursive Character Text Splitter
Figure: Recursive Character Text Splitter Node
The RecursiveCharacterTextSplitter node divides large text documents into smaller chunks. It tries a list of separators in order, by default "\n\n", then "\n", then " ", and moves on to the next separator only when a piece is still larger than the chunk size. This makes it useful for long documents that must be broken into manageable pieces for analysis, summarization, or other natural language processing tasks.
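To make the recursive idea concrete, here is a minimal, self-contained sketch. It is not the library's implementation: the function name `recursiveSplit` is hypothetical, chunk overlap is left out, and separators are dropped at chunk boundaries rather than preserved.

```typescript
// Simplified sketch of recursive splitting (an illustration, not the library code).
// Assumptions: no chunk overlap, and separators are not kept at chunk boundaries.
function recursiveSplit(
  text: string,
  chunkSize: number,
  separators: string[] = ["\n\n", "\n", " "]
): string[] {
  if (text.length <= chunkSize) {
    return text.length > 0 ? [text] : [];
  }
  if (separators.length === 0) {
    // No separators left: fall back to fixed-width slices.
    const slices: string[] = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      slices.push(text.slice(i, i + chunkSize));
    }
    return slices;
  }

  const separator = separators[0];
  const rest = separators.slice(1);
  const chunks: string[] = [];
  let current = "";

  for (const piece of text.split(separator)) {
    if (piece.length === 0) continue;
    // Try to merge the piece into the chunk being built.
    const candidate = current.length > 0 ? current + separator + piece : piece;
    if (candidate.length <= chunkSize) {
      current = candidate;
      continue;
    }
    // The merged chunk would be too large: flush what we have so far.
    if (current.length > 0) {
      chunks.push(current);
      current = "";
    }
    if (piece.length <= chunkSize) {
      current = piece;
    } else {
      // The piece alone is too large: retry it with the next separator.
      chunks.push(...recursiveSplit(piece, chunkSize, rest));
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

const sampleText =
  "First paragraph.\n\nSecond paragraph is a bit longer.\nIt has two lines.";
const sampleChunks = recursiveSplit(sampleText, 32);
console.log(sampleChunks);
```

With a chunk size of 32, the first paragraph fits as a single chunk, while the second is too long even after splitting on "\n", so its first line is split again on spaces.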
Parameters
- Chunk Size
  - Label: Chunk Size
  - Name: chunkSize
  - Type: number
  - Description: Maximum number of characters in each chunk
  - Default: 1000
  - Optional: Yes
- Chunk Overlap
  - Label: Chunk Overlap
  - Name: chunkOverlap
  - Type: number
  - Description: Number of characters to overlap between chunks
  - Default: 200
  - Optional: Yes
- Custom Separators
  - Label: Custom Separators
  - Name: separators
  - Type: string
  - Description: Array of custom separators that determine where to split the text. If provided, it overrides the default separators.
  - Placeholder: ["|", "##", ">", "-"]
  - Optional: Yes
  - Additional: This is an advanced parameter
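The interaction between chunk size and chunk overlap can be illustrated with a small sketch. The helper `windowWithOverlap` below is hypothetical and uses fixed-width windows only; the actual node splits on separators, so real chunk boundaries and overlaps depend on where separators occur. The validation rules in the helper are also assumptions made for this example.

```typescript
// Hypothetical helper for illustration only: fixed-width windows with overlap.
// Defaults mirror the parameter defaults listed above (1000 and 200).
function windowWithOverlap(
  text: string,
  chunkSize: number = 1000,
  chunkOverlap: number = 200
): string[] {
  // Assumed validation for this sketch; otherwise the loop could never advance.
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be non-negative and smaller than chunkSize");
  }

  // Each new window starts (chunkSize - chunkOverlap) characters after the previous one.
  const step = chunkSize - chunkOverlap;
  const windows: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    windows.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // the last window reached the end
  }
  return windows;
}

const overlapDemo = windowWithOverlap("abcdefghijklmnopqrstuvwxyz", 10, 4);
console.log(overlapDemo);
// → ["abcdefghij", "ghijklmnop", "mnopqrstuv", "stuvwxyz"]
```

Each window repeats the last four characters of the previous one, which is the role chunk overlap plays: context at a boundary appears in both neighboring chunks.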
Input
The node takes the following inputs:
- Text document(s) to be split (handled internally by the system)
- Configuration parameters as described above
Output
The node outputs a RecursiveCharacterTextSplitter object, which can be used to split text documents into chunks based on the specified parameters.
Usage
This node is typically used in document processing pipelines where large texts need to be broken down into smaller, more manageable pieces. It's particularly useful for:
- Preparing text for large language models with context limitations
- Breaking down documents for summarization tasks
- Splitting text for parallel processing
- Creating more granular sections for information retrieval or question-answering systems
Implementation Details
- The node uses the `RecursiveCharacterTextSplitter` class from the 'langchain/text_splitter' library.
- It supports dynamic configuration of chunk size, overlap, and custom separators.
- The `init` method creates and returns a configured `RecursiveCharacterTextSplitter` object based on the input parameters.
- Custom separators, if provided, are parsed from a JSON string to an array.
Error Handling
The node includes error handling for parsing custom separators. If the separators string cannot be parsed as valid JSON, it will throw an error.
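The parsing step might look roughly like the sketch below. It is not the node's source code: `SplitterInputs`, `SplitterOptions`, and `buildSplitterOptions` are hypothetical names, and the check that the parsed value is an array of strings is an extra assumption beyond what is described above. The resulting options object has the shape one would pass on when constructing the splitter.

```typescript
// Hypothetical input shape; field names follow the parameter names on this page.
interface SplitterInputs {
  chunkSize?: number | string;
  chunkOverlap?: number | string;
  separators?: string; // JSON string, e.g. '["|", "##", ">", "-"]'
}

// Hypothetical resolved options for configuring a splitter.
interface SplitterOptions {
  chunkSize: number;
  chunkOverlap: number;
  separators?: string[];
}

function buildSplitterOptions(inputs: SplitterInputs): SplitterOptions {
  const options: SplitterOptions = {
    // Fall back to the documented defaults when a value is not provided.
    chunkSize: inputs.chunkSize !== undefined ? Number(inputs.chunkSize) : 1000,
    chunkOverlap: inputs.chunkOverlap !== undefined ? Number(inputs.chunkOverlap) : 200,
  };

  if (inputs.separators !== undefined && inputs.separators.trim() !== "") {
    let parsed: unknown;
    try {
      parsed = JSON.parse(inputs.separators);
    } catch (error) {
      // Invalid JSON: surface an error instead of silently using the defaults.
      throw new Error(`Invalid separators JSON: ${(error as Error).message}`);
    }
    // Assumption for this sketch: also reject valid JSON that is not a string array.
    if (!Array.isArray(parsed) || !parsed.every((s) => typeof s === "string")) {
      throw new Error("Separators must be a JSON array of strings");
    }
    options.separators = parsed as string[];
  }
  return options;
}

const resolvedOptions = buildSplitterOptions({
  chunkSize: "500",
  separators: '["|", "##", ">", "-"]',
});
console.log(resolvedOptions);
```

Failing loudly on malformed input keeps a typo in the separators field from silently falling back to the default separators.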
{% hint style="info" %} This section is a work in progress. We appreciate any help you can provide in completing this section. Please check our Contribution Guide to get started. {% endhint %}