libstemmer
class SnowballStemmerException: object.Exception;
struct SnowballStemmer;
Encapsulates a stemmer, providing a safe interface to it.
This struct is non-copyable. If this makes you unhappy, allocate it with new or safeRefCounted.
enum Encoding: string;
utf8
iso8859_1
iso8859_2
koi8r
static pure nothrow @nogc @trusted immutable(string)[] algorithms();
Get a list of supported stemming algorithms (i.e., languages).
Only the canonical name of each algorithm is returned: "english\0" is there, but "en\0" is not. See modules.txt to get an impression of what this list may look like.
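Because each returned name keeps its terminating zero, it can be passed directly to the constructor or to reset, but it needs trimming before it is shown to a user. The following is a minimal sketch, assuming the binding is importable as module libstemmer (taken from this page's title); algorithmNames is a hypothetical helper, not part of the library.

```d
import std.stdio : writeln;
import libstemmer; // assumption: module name taken from this page's title

/// Hypothetical helper: canonical algorithm names with the trailing '\0' removed.
string[] algorithmNames() @safe
{
    string[] names;
    foreach (name; SnowballStemmer.algorithms())
    {
        // Entries look like "english\0"; strip the terminator for display.
        if (name.length > 0 && name[$ - 1] == '\0')
            names ~= name[0 .. $ - 1];
        else
            names ~= name;
    }
    return names;
}

void main() @safe
{
    foreach (name; algorithmNames())
        writeln(name);
}
```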
-
pure @safe this(scope const(char)[] algorithm, scope Encoding encoding = Encoding.utf8) scope;
Construct a stemmer with the specified algorithm and input encoding.
algorithm and encoding are case-sensitive and must be zero-terminated (e.g., you have to pass "en\0", not "en"); this is checked with an assertion.
This constructor is unavailable in betterC mode.
Throws
SnowballStemmerException on an unknown algorithm or encoding, or on an unsupported combination of the two.
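Since the constructor reports failure by throwing, code that only wants to probe whether a name is accepted can catch SnowballStemmerException. A minimal sketch, assuming the binding is importable as module libstemmer; canConstruct is a hypothetical helper, and "klingon" is used only because it is certainly not a Snowball algorithm.

```d
import libstemmer; // assumption: module name taken from this page's title

/// Hypothetical helper: true if a stemmer can be built for this algorithm/encoding.
bool canConstruct(scope const(char)[] algorithm, Encoding encoding = Encoding.utf8) @safe
{
    try
    {
        // algorithm must be zero-terminated; the constructor asserts that.
        auto st = SnowballStemmer(algorithm, encoding);
        return true;
    }
    catch (SnowballStemmerException e)
    {
        // Unknown algorithm or encoding, or an unsupported combination.
        return false;
    }
}

void main() @safe
{
    assert(canConstruct("en\0"));
    assert(!canConstruct("klingon\0"));
}
```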
pure nothrow @nogc @trusted _Bool reset(scope const(char)[] algorithm, scope Encoding encoding = Encoding.utf8) scope;
Try to change the algorithm and encoding used by this stemmer.
algorithm and encoding are case-sensitive and must be zero-terminated (e.g., you have to pass "en\0", not "en"); this is checked with an assertion.
The return type of this method implicitly converts to bool. If algorithm or encoding is unknown, or their combination is unsupported, false is returned and no changes are made.
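The no-changes-on-failure guarantee means a failed reset leaves the stemmer fully usable with its previous configuration. A minimal sketch, assuming the binding is importable as module libstemmer; both functions are hypothetical demos, and the expected stem "run" for "running" is the standard English Snowball result.

```d
import libstemmer; // assumption: module name taken from this page's title

/// Hypothetical demo: a failed reset must leave the stemmer unchanged.
bool failedResetKeepsEnglish() @safe
{
    auto st = SnowballStemmer("en\0");
    const bool switched = st.reset("klingon\0"); // not a Snowball algorithm
    // No changes were made, so the stemmer still behaves like English.
    return !switched && st.stemUtf8("running") == "run";
}

/// Hypothetical demo: resetting to another supported algorithm succeeds.
bool resetToGermanWorks() @safe
{
    auto st = SnowballStemmer("en\0");
    return st.reset("german\0"); // the result implicitly converts to bool
}

void main() @safe
{
    assert(failedResetKeepsEnglish());
    assert(resetToGermanWorks());
}
```

Unlike the constructor, reset is nothrow and @nogc, so this style of error handling does not involve exceptions.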
pure nothrow @nogc @safe bool isNull() const scope;
pure nothrow @nogc @system this(sb_stemmer* handle) scope;
Acquire ownership of a low-level stemmer.
It will be deleted automatically, so passing a handle that is still owned elsewhere would delete it twice; hence @system.
pure nothrow @nogc @system inout(sb_stemmer)* handle() inout scope;
Get the low-level stemmer.
Manipulating it directly may interfere with SnowballStemmer, hence @system.
pure nothrow @nogc @trusted sb_stemmer* release() scope;
Extract the low-level stemmer.
From now on, you are responsible for deleting it.
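Together, release and the low-level constructor let ownership of the underlying sb_stemmer move from one wrapper to another, so the caller never has to delete the handle by hand. A minimal sketch, assuming the binding is importable as module libstemmer; transfer is a hypothetical helper, and because it calls the @system constructor it is @system itself.

```d
import libstemmer; // assumption: module name taken from this page's title

/// Hypothetical helper: move the low-level stemmer out of `src` into a new wrapper.
SnowballStemmer transfer(ref SnowballStemmer src) @system
{
    auto raw = src.release();    // src gives up ownership; we now own raw
    return SnowballStemmer(raw); // the new wrapper takes ownership and will delete raw
}

void main() @system
{
    auto a = SnowballStemmer("en\0");
    auto b = transfer(a);
    assert(b.stemUtf8("minify") == "minifi");
}
```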
@safe auto stemUtf8(alias callback)(ref scope SnowballStemmer st, scope const(char)[] word);
@safe auto stem(alias callback)(ref scope SnowballStemmer st, scope const(ubyte)[] word);
Determine the stem of the given word.
The stem is passed to callback, which must not escape it. (If you compile with -dip1000, the compiler will enforce that.) Also, callback has to be @safe or @trusted. Whatever it returns is passed back to the caller.
During the callback invocation, you cannot stem another word with the same stemmer. (Doing so results in an assertion failure.)
stemUtf8 does not actually require the stemmer to be created with Encoding.utf8; it is merely a convenience function that inserts char[] <-> ubyte[] casts. It can be used interchangeably with stem, but there is a convention in the D community that char[] contains UTF-8 and ubyte[] holds arbitrary binary data.
Note that these are not member functions (to avoid deprecation warnings about dual context). Thanks to UFCS, most of the time there is no difference.
Examples
auto st = SnowballStemmer("en\0");
st.stemUtf8!((stem) { assert(stem == "minifi"); })("minify");
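The two callback rules interact when comparing the stems of two words: the first stem cannot be kept as a slice, and the second word cannot be stemmed from inside the first callback. One way around this, sketched below under the assumption that the binding is importable as module libstemmer, is to return a GC copy from the first callback and a plain bool from the second. sameStem is a hypothetical helper.

```d
import libstemmer; // assumption: module name taken from this page's title

/// Hypothetical helper: do words `a` and `b` reduce to the same stem?
bool sameStem(ref SnowballStemmer st, scope const(char)[] a, scope const(char)[] b) @safe
{
    // The first stem must not escape its callback, so keep a GC copy of it.
    // Stemming `b` from inside that callback would trip an assertion instead.
    const string first = st.stemUtf8!((stem) => stem.idup)(a);
    // Whatever the callback returns (here a bool) is passed back to us.
    return st.stemUtf8!((stem) => stem == first)(b);
}

void main() @safe
{
    auto st = SnowballStemmer("en\0");
    assert(sameStem(st, "running", "runs"));
    assert(!sameStem(st, "running", "minify"));
}
```

The second call allocates nothing; only the copy of the first stem touches the GC heap.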
pure nothrow @safe string stemUtf8(ref scope SnowballStemmer st, scope const(char)[] word);
pure nothrow @safe immutable(ubyte)[] stem(ref scope SnowballStemmer st, scope const(ubyte)[] word);
Determine the stem of the given word; allocate from the GC heap.
Only provided for convenience. When possible, you are encouraged to use the other overload, which does not allocate; please refer to it for detailed documentation.
Examples
auto st = SnowballStemmer("en\0");
assert(st.stemUtf8("minify") == "minifi");
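The ubyte[] overload fits input that is not UTF-8, such as text in an 8-bit encoding. A minimal sketch, assuming the binding is importable as module libstemmer and that the English algorithm is available with Encoding.iso8859_1 (as in the standard libstemmer module list); stemEnglishLatin1 is a hypothetical helper. std.string.representation views a string as immutable(ubyte)[] without copying.

```d
import std.string : representation;
import libstemmer; // assumption: module name taken from this page's title

/// Hypothetical helper: stem ISO-8859-1 bytes with the English algorithm.
immutable(ubyte)[] stemEnglishLatin1(scope const(ubyte)[] word) @safe
{
    auto st = SnowballStemmer("english\0", Encoding.iso8859_1);
    return st.stem(word); // GC-allocated copy of the stem, safe to return
}

void main() @safe
{
    // ASCII text is byte-for-byte identical in ISO-8859-1.
    assert(stemEnglishLatin1("running".representation) == "run".representation);
}
```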