Every document and search query that enters lunr is passed through a text processing pipeline. The pipeline is simply a stack of functions that perform some processing on the text. Pipeline functions act on the text one token at a time, and what they return is passed to the next function in the pipeline.
By default, lunr adds a stop word filter and a stemmer to the pipeline. You can add your own processors or remove the default ones, depending on your requirements. The default stemmer is an English language stemmer; it can be replaced with a stemmer for another language if required, and other processors, such as a Metaphone processor, can be added.
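var index = lunr(function () {
  // append a function to the end of the pipeline
  this.pipeline.add(function (token, tokenIndex, tokens) {
    // text processing in here
    return token
  })

  // insert a function immediately after the stop word filter
  this.pipeline.after(lunr.stopWordFilter, function (token, tokenIndex, tokens) {
    // text processing in here
    return token
  })
})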
Functions in the pipeline are called with three arguments: the current token being processed, the index of that token in the array of tokens, and the whole list of tokens that are part of the document being processed. This enables simple unigram processing of tokens as well as more sophisticated n-gram processing.
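As a rough illustration of how the second and third arguments make n-gram processing possible, here is a minimal sketch of a pipeline-style function that pairs each token with the one before it. The name `withPreviousToken` is hypothetical, and the sketch assumes tokens are plain strings. It is exercised with `Array.prototype.map`, which happens to pass the same three arguments (item, index, array), so no lunr index is needed to try it.

```javascript
// Hypothetical pipeline function (not part of lunr): joins each token
// with the token that precedes it, producing a simple bigram.
// Assumes tokens are plain strings.
function withPreviousToken(token, tokenIndex, tokens) {
  // the first token has no predecessor, so pass it through unchanged
  if (tokenIndex === 0) return token
  return tokens[tokenIndex - 1] + ' ' + token
}

// Array.prototype.map calls its callback with (item, index, array),
// the same shape of arguments a pipeline function receives.
var tokens = ['quick', 'brown', 'fox']
var bigrams = tokens.map(withPreviousToken)
// bigrams is ['quick', 'quick brown', 'brown fox']
```

A function of this shape could be registered with `this.pipeline.add`, although whether bigram tokens are useful in an index depends on how queries are processed.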
The function should return the processed version of the text, which will in turn be passed to the next function in the pipeline. Returning undefined will prevent any further processing of the token, and that token will not make it to the index.
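The behaviour described above can be sketched without lunr itself. The `runPipeline` helper below is a simplified stand-in written for this example, not lunr's own implementation, and `lowerCase` and `dropShortTokens` are hypothetical processors. It assumes tokens are plain strings and shows both the chaining of return values and the effect of returning undefined.

```javascript
// Simplified stand-in for a pipeline runner (an assumption for this
// example, not lunr's code): each token is passed through every
// function in order, and a return value of undefined drops the token.
function runPipeline(tokens, fns) {
  var out = []
  tokens.forEach(function (token, tokenIndex) {
    var result = token
    for (var i = 0; i < fns.length; i++) {
      result = fns[i](result, tokenIndex, tokens)
      // stop processing this token; it never reaches the output
      if (result === undefined) return
    }
    out.push(result)
  })
  return out
}

// Hypothetical processor: normalises case and passes the token on
function lowerCase(token) {
  return token.toLowerCase()
}

// Hypothetical processor: keeps tokens longer than two characters
function dropShortTokens(token) {
  if (token.length > 2) return token
  // falling through returns undefined, which drops the token
}

var processed = runPipeline(['The', 'Ox', 'Ran'], [lowerCase, dropShortTokens])
// processed is ['the', 'ran']
```

Here 'Ox' is lower-cased by the first function and then dropped by the second, so it never appears in the result.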