libstemmer

SnowballStemmerException

class SnowballStemmerException: object.Exception;

SnowballStemmer

struct SnowballStemmer;

Encapsulates a stemmer, providing a safe interface to it.

This struct is non-copyable. If this makes you unhappy, allocate it with new or safeRefCounted.

Encoding

enum Encoding: string;
- utf8
- iso8859_1
- iso8859_2
- koi8r
algorithms

static pure nothrow @nogc @trusted immutable(string)[] algorithms();

Get a list of supported stemming algorithms (i.e., languages).

Only the canonical name of each algorithm is returned: "english\0" is there, but "en\0" is not. See modules.txt to get an impression of what this list may look like.
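
As a rough illustration of the description above, the sketch below prints every canonical algorithm name. It assumes the module can be imported as libstemmer (taken from this page's title) and that each returned name carries a trailing "\0" as part of the string; withoutTerminator is a hypothetical helper, not part of the library.

```d
import std.stdio : writeln;
import libstemmer; // assumption: module name taken from this page's title

/// Hypothetical helper: drop a single trailing '\0', if there is one.
string withoutTerminator(string name) pure nothrow @nogc @safe
{
    return name.length && name[$ - 1] == '\0' ? name[0 .. $ - 1] : name;
}

void main()
{
    // algorithms() is static, so no stemmer instance is needed.
    foreach (name; SnowballStemmer.algorithms)
        writeln(withoutTerminator(name));
}
```

The helper checks for the terminator before slicing, so it behaves the same whether or not the terminator counts toward the string length.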
this

pure @safe this(scope const(char)[] algorithm, scope Encoding encoding = Encoding.utf8) scope;

Construct a stemmer with the specified algorithm and input encoding.

algorithm and encoding are case-sensitive and must be zero-terminated (e.g., you have to pass "en\0", not "en"); that is asserted.

This constructor is unavailable in betterC mode.

Throws

SnowballStemmerException on an unknown algorithm or encoding or an unsupported combination of those.
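
A minimal sketch of handling the exception described above. The import name libstemmer is an assumption, canCreate is a hypothetical helper, and "klingon\0" is a made-up algorithm name that is presumably unsupported.

```d
import std.stdio : writeln;
import libstemmer; // assumption: module name taken from this page's title

/// Hypothetical helper: report whether a stemmer can be constructed.
/// `algorithm` must be zero-terminated, as the constructor requires.
bool canCreate(scope const(char)[] algorithm)
{
    try
    {
        auto st = SnowballStemmer(algorithm);
        return true;
    }
    catch (SnowballStemmerException)
    {
        return false;
    }
}

void main()
{
    writeln(canCreate("english\0")); // canonical name, expected to be supported
    writeln(canCreate("klingon\0")); // made-up name, expected to be rejected
}
```

When exceptions are undesirable, reset reports the same condition through its return value instead.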
reset

pure nothrow @nogc @trusted _Bool reset(scope const(char)[] algorithm, scope Encoding encoding = Encoding.utf8) scope;

Try to change the algorithm and encoding used by this stemmer.

algorithm and encoding are case-sensitive and must be zero-terminated (e.g., you have to pass "en\0", not "en"); that is asserted.

The return type of this method implicitly converts to bool. If algorithm or encoding is unknown, or their combination is unsupported, then false is returned and no changes are made.
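
The following sketch shows one way reset might be used to switch languages without exceptions. The import name libstemmer is an assumption, and "klingon\0" is a made-up name that is presumably unsupported.

```d
import std.stdio : writeln;
import libstemmer; // assumption: module name taken from this page's title

void main()
{
    auto st = SnowballStemmer("english\0");

    // The result converts implicitly to bool, as described above.
    bool switched = st.reset("german\0");
    writeln(switched ? "now stemming German" : "still stemming English");

    // An unknown name is rejected, and the stemmer keeps its previous setup.
    bool rejected = st.reset("klingon\0");
    assert(!rejected);
}
```

Storing the result in a bool variable relies only on the documented implicit conversion.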
isNull

pure nothrow @nogc @safe bool isNull() const scope;
this

pure nothrow @nogc @system this(sb_stemmer* handle) scope;

Acquire ownership over a low-level stemmer.

It will be deleted automatically, hence @system.
handle

pure nothrow @nogc @system inout(sb_stemmer)* handle() inout scope;

Get the low-level stemmer.

Manipulating it directly may interfere with SnowballStemmer, hence @system.
release

pure nothrow @nogc @trusted sb_stemmer* release() scope;

Extract the low-level stemmer.

From now on, you are responsible for deleting it.
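
A small sketch of moving ownership between two wrappers, based only on the signatures listed here. The import name libstemmer is an assumption. The handle-taking constructor is @system, so the calling code must be @system or @trusted.

```d
import libstemmer; // assumption: module name taken from this page's title

void main() // non-template functions are @system by default
{
    auto first = SnowballStemmer("english\0");

    // After release(), `first` is no longer responsible for the stemmer.
    auto raw = first.release();

    // `second` takes ownership and will delete the stemmer automatically.
    auto second = SnowballStemmer(raw);
    assert(second.stemUtf8("minify") == "minifi");
}
```

Handing the pointer straight to another SnowballStemmer avoids deleting it manually.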

stemUtf8

@safe auto stemUtf8(alias callback)(ref scope SnowballStemmer st, scope const(char)[] word);
@safe auto stem(alias callback)(ref scope SnowballStemmer st, scope const(ubyte)[] word);

Determine the stem of the given word.

The stem is passed to callback and must not escape from it. (If you compile with -dip1000, the compiler will enforce that.) Also, callback has to be @safe or @trusted. Whatever it returns will be passed back to the caller.

During callback invocation, you cannot stem another word with the same stemmer. (Doing so will result in an assertion failure.)

stemUtf8 does not actually require the stemmer to be created with Encoding.utf8; it is merely a convenience function that inserts char[] <-> ubyte[] casts. It can be used interchangeably with stem; but there is a convention in the D community that char[] contains UTF-8 and ubyte[] holds arbitrary binary data.

Note these are not member functions (to avoid deprecations about dual context). Thanks to UFCS, most of the time there is no difference.

Examples

auto st = SnowballStemmer("en\0");
st.stemUtf8!((stem) {
    assert(stem == "minifi");
})("minify");
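
Since the callback's return value is passed back to the caller, a result can be computed from the stem without keeping the slice itself. The sketch below assumes the module is importable as libstemmer; it reuses the word and stem from the example above.

```d
import std.stdio : writeln;
import libstemmer; // assumption: module name taken from this page's title

void main()
{
    auto st = SnowballStemmer("en\0");

    // Return a plain value derived from the stem; nothing escapes.
    size_t len = st.stemUtf8!(stem => stem.length)("minify");

    // If the stem must outlive the callback, return a copy instead.
    string owned = st.stemUtf8!(stem => stem.idup)("minify");

    assert(owned == "minifi");
    assert(len == owned.length);
    writeln(owned);
}
```

Copying inside the callback allocates, which is roughly what the allocating overload does on your behalf.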

    

stemUtf8

pure nothrow @safe string stemUtf8(ref scope SnowballStemmer st, scope const(char)[] word);
pure nothrow @safe immutable(ubyte)[] stem(ref scope SnowballStemmer st, scope const(ubyte)[] word);

Determine the stem of the given word; allocate from the GC heap.

Only provided for convenience. When possible, you are encouraged to use the other overload, which does not allocate; please refer to it for detailed documentation.

Examples

auto st = SnowballStemmer("en\0");
assert(st.stemUtf8("minify") == "minifi");
