libstemmer
-
class SnowballStemmerException: object.Exception;
-
struct SnowballStemmer;
Encapsulates a stemmer, providing a safe interface to it.
This struct is non-copyable. If this makes you unhappy, allocate it with new or safeRefCounted.
-
enum Encoding: string;
-
utf8
iso8859_1
iso8859_2
koi8r
-
-
static pure nothrow @nogc @trusted immutable(string)[] algorithms();
Get a list of supported stemming algorithms (i.e., languages).
Only the canonical name of each algorithm is returned: "english\0" is there, but "en\0" is not. See modules.txt to get an impression of what this list may look like.
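
The canonical-name rule above can be checked with a short sketch. It assumes the package is imported as `libstemmer` and that `algorithms` is a static member of `SnowballStemmer`; the helper name `isCanonical` is hypothetical.

```d
import std.algorithm.searching : canFind;
import libstemmer; // assumption: module name taken from the page title

/// Hypothetical helper: true if `name` (zero-terminated) is a canonical algorithm name.
bool isCanonical(string name)
{
    return SnowballStemmer.algorithms().canFind(name);
}

void main()
{
    assert(isCanonical("english\0")); // canonical name is listed
    assert(!isCanonical("en\0"));     // alias is accepted elsewhere but not listed
}
```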
-
pure @safe this(scope const(char)[] algorithm, scope Encoding encoding = Encoding.utf8) scope;
Construct a stemmer with the specified algorithm and input encoding.
algorithm and encoding are case-sensitive and must be zero-terminated (e.g., you have to pass "en\0", not "en"); that is asserted.
This constructor is unavailable in betterC mode.
Throws SnowballStemmerException on an unknown algorithm or encoding or an unsupported combination of those.
-
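
A minimal sketch of handling the constructor's failure mode, assuming the module is imported as `libstemmer`; the helper name `canConstruct` is hypothetical. Note that a missing `\0` terminator trips an assertion rather than throwing, so the caller must still pass zero-terminated strings.

```d
import libstemmer; // assumption: module name taken from the page title

/// Hypothetical helper: true if a stemmer for `algorithm` (zero-terminated)
/// can be constructed with the default encoding.
bool canConstruct(string algorithm)
{
    try
    {
        auto st = SnowballStemmer(algorithm);
        return true;
    }
    catch (SnowballStemmerException e)
    {
        // Unknown algorithm, or unsupported algorithm/encoding combination.
        return false;
    }
}

void main()
{
    assert(canConstruct("en\0"));
    assert(!canConstruct("no-such-language\0"));
}
```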
pure nothrow @nogc @trusted _Bool reset(scope const(char)[] algorithm, scope Encoding encoding = Encoding.utf8) scope;
Try to change the algorithm and encoding used by this stemmer.
algorithm and encoding are case-sensitive and must be zero-terminated (e.g., you have to pass "en\0", not "en"); that is asserted.
The return type of this method implicitly converts to bool. If algorithm or encoding is unknown or their combination is unsupported, then false is returned and no changes are made.
-
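
The "no changes are made" guarantee can be illustrated with a sketch, again assuming the module is imported as `libstemmer`; the helper name `stemAfterFailedReset` is hypothetical.

```d
import libstemmer; // assumption: module name taken from the page title

/// Hypothetical helper: attempts an invalid reset, then stems `word`
/// with the stemmer's original (English) configuration.
string stemAfterFailedReset(string word)
{
    auto st = SnowballStemmer("en\0");
    // The result converts implicitly to bool, as documented above.
    bool changed = st.reset("no-such-language\0");
    assert(!changed); // unknown algorithm: the stemmer is left as it was
    return st.stemUtf8(word);
}

void main()
{
    assert(stemAfterFailedReset("minify") == "minifi");
}
```

Unlike the constructor, `reset` is `nothrow` and `@nogc`, so it reports failure through its return value instead of an exception.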
pure nothrow @nogc @safe bool isNull() const scope;
-
pure nothrow @nogc @system this(sb_stemmer* handle) scope;
Acquire ownership over a low-level stemmer.
It will be deleted automatically, hence @system.
-
pure nothrow @nogc @system inout(sb_stemmer)* handle() inout scope;
Get the low-level stemmer.
Manipulating it directly may interfere with SnowballStemmer, hence @system.
-
pure nothrow @nogc @trusted sb_stemmer* release() scope;
Extract the low-level stemmer.
From now on, you are responsible for deleting it.
-
-
@safe auto stemUtf8(alias callback)(ref scope SnowballStemmer st, scope const(char)[] word);
@safe auto stem(alias callback)(ref scope SnowballStemmer st, scope const(ubyte)[] word);
Determine the stem of the given word.
The stem is passed to callback, which must not escape it. (If you compile with -dip1000, the compiler will enforce that.) Also, callback has to be @safe or @trusted. Whatever it returns will be passed back to the caller.
During callback invocation, you cannot stem another word with the same stemmer. (Doing so will result in an assertion failure.)
stemUtf8 does not actually require the stemmer to be created with Encoding.utf8; it is merely a convenience function that inserts char[] <-> ubyte[] casts. It can be used interchangeably with stem, but there is a convention in the D community that char[] contains UTF-8 and ubyte[] holds arbitrary binary data.
Note that these are not member functions (to avoid deprecations about dual context). Thanks to UFCS, most of the time there is no difference.
Examples
auto st = SnowballStemmer("en\0");
st.stemUtf8!((stem) { assert(stem == "minifi"); })("minify");
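
Because the callback's return value is passed back to the caller, a result can be derived from the stem without copying it. The following is a sketch under the assumption that the module is imported as `libstemmer`; the helper name `stemLength` is hypothetical.

```d
import libstemmer; // assumption: module name taken from the page title

/// Hypothetical helper: length of the English stem of `word`, computed without
/// allocating; only the length leaves the callback, never the stem itself.
size_t stemLength(string word)
{
    auto st = SnowballStemmer("en\0");
    return st.stemUtf8!((stem) => stem.length)(word);
}

void main()
{
    assert(stemLength("minify") == 6); // the stem is "minifi"
}
```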
-
pure nothrow @safe string stemUtf8(ref scope SnowballStemmer st, scope const(char)[] word);
pure nothrow @safe immutable(ubyte)[] stem(ref scope SnowballStemmer st, scope const(ubyte)[] word);
Determine the stem of the given word; allocate from the GC heap.
Only provided for convenience. When possible, you are encouraged to use the other overload, which does not allocate; please refer to it for detailed documentation.
Examples
auto st = SnowballStemmer("en\0");
assert(st.stemUtf8("minify") == "minifi");