public class NgramExtractor
extends java.lang.Object
| Modifier and Type | Field and Description |
|---|---|
| private @Nullable NgramFilter | filter |
| private @NotNull java.util.List<java.lang.Integer> | gramLengths |
| private @Nullable java.lang.Character | textPadding |
| Modifier | Constructor and Description |
|---|---|
| private | NgramExtractor(@NotNull java.util.List<java.lang.Integer> gramLengths, @Nullable NgramFilter filter, @Nullable java.lang.Character textPadding) |
| Modifier and Type | Method and Description |
|---|---|
| private void | _extractCounted(java.lang.CharSequence text, int gramLength, int len, java.util.Map<java.lang.String,java.lang.Integer> grams) |
| private java.lang.CharSequence | applyPadding(java.lang.CharSequence text) |
| @NotNull java.util.Map<java.lang.String,java.lang.Integer> | extractCountedGrams(@NotNull java.lang.CharSequence text) |
| @NotNull java.util.List<java.lang.String> | extractGrams(@NotNull java.lang.CharSequence text): Creates the n-grams for a given text in the order they occur. |
| NgramExtractor | filter(NgramFilter filter) |
| java.util.List<java.lang.Integer> | getGramLengths() |
| static NgramExtractor | gramLength(int gramLength) |
| static NgramExtractor | gramLengths(java.lang.Integer... gramLength) |
| private static int | guessNumDistinctiveGrams(int textLength, int gramLength): This is trying to be smart. |
| NgramExtractor | textPadding(char textPadding): To ensure having border grams, this character is added to the left and right of the text. |
@NotNull private final java.util.List<java.lang.Integer> gramLengths
@Nullable private final NgramFilter filter
@Nullable private final java.lang.Character textPadding
private NgramExtractor(@NotNull java.util.List<java.lang.Integer> gramLengths,
@Nullable NgramFilter filter,
@Nullable java.lang.Character textPadding)
public static NgramExtractor gramLength(int gramLength)
public static NgramExtractor gramLengths(java.lang.Integer... gramLength)
public NgramExtractor filter(NgramFilter filter)
public NgramExtractor textPadding(char textPadding)
Example: when textPadding is a space ' ', the input "foo" becomes " foo ", which ensures that border n-grams such as " f" are created.
If the text already has the padding character at a border (for example, it starts with it), the character is not added on that side.
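The padding rule described above can be sketched as follows. This is a minimal stand-alone illustration, not the library's actual `applyPadding` implementation: the class name `PaddingSketch`, the extra `pad` parameter, and the handling of empty input are assumptions made for the example.

```java
// Hypothetical stand-alone sketch of the documented padding rule.
final class PaddingSketch {

    // Adds `pad` to the left and right of `text`, skipping any side
    // that already starts or ends with that character.
    static CharSequence applyPadding(CharSequence text, char pad) {
        if (text.length() == 0) {
            // Assumption: empty input is returned unchanged; the original
            // documentation does not say what happens here.
            return text;
        }
        boolean padLeft = text.charAt(0) != pad;
        boolean padRight = text.charAt(text.length() - 1) != pad;
        if (!padLeft && !padRight) {
            return text; // both borders already carry the padding character
        }
        StringBuilder sb = new StringBuilder(text.length() + 2);
        if (padLeft) {
            sb.append(pad);
        }
        sb.append(text);
        if (padRight) {
            sb.append(pad);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Brackets make the added spaces visible in the output.
        System.out.println("[" + applyPadding("foo", ' ') + "]");
        System.out.println("[" + applyPadding(" foo", ' ') + "]");
    }
}
```

Both calls in `main` yield the same padded text, since the second input already starts with a space and only needs padding on the right.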
textPadding - for example a space ' '.
public java.util.List<java.lang.Integer> getGramLengths()
@NotNull
public java.util.List<java.lang.String> extractGrams(@NotNull java.lang.CharSequence text)
Example: with a gram length of 2, extractGrams("Foo bar") => ["Fo", "oo", "o ", " b", "ba", "ar"]
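The example above corresponds to a plain sliding window over the text. The sketch below shows that idea for a single gram length. It is an illustration under assumptions, not the class's real code: `GramSketch` and the explicit `gramLength` parameter are hypothetical, and filtering, padding, and the ordering across multiple gram lengths are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch of in-order n-gram extraction.
final class GramSketch {

    // Returns every substring of length `gramLength`, in the order it occurs.
    static List<String> extractGrams(CharSequence text, int gramLength) {
        if (gramLength < 1) {
            throw new IllegalArgumentException("gramLength must be >= 1");
        }
        List<String> grams = new ArrayList<>();
        // Slide a window of gramLength characters across the text.
        for (int i = 0; i + gramLength <= text.length(); i++) {
            grams.add(text.subSequence(i, i + gramLength).toString());
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(extractGrams("Foo bar", 2));
    }
}
```

For "Foo bar" and a gram length of 2 this produces the six grams listed in the example, including the two that contain the inner space. A text shorter than the gram length produces an empty list.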
text - the text to extract n-grams from
@NotNull
public java.util.Map<java.lang.String,java.lang.Integer> extractCountedGrams(@NotNull java.lang.CharSequence text)
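This method has no description here, but its name and return type suggest a map from each n-gram to the number of times it occurs. A minimal sketch under that assumption is shown below. `CountedGramSketch` and the explicit `gramLength` parameter are hypothetical, and the real method may differ, for example in how it applies the filter, the padding, or several gram lengths.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch of counting n-gram occurrences.
final class CountedGramSketch {

    // Maps each n-gram of length `gramLength` to its number of occurrences.
    static Map<String, Integer> extractCountedGrams(CharSequence text, int gramLength) {
        if (gramLength < 1) {
            throw new IllegalArgumentException("gramLength must be >= 1");
        }
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + gramLength <= text.length(); i++) {
            String gram = text.subSequence(i, i + gramLength).toString();
            // Start at 1 for a new gram, otherwise add 1 to the stored count.
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // HashMap iteration order is unspecified, so the printed order may vary.
        System.out.println(extractCountedGrams("banana", 2));
    }
}
```

For "banana" and a gram length of 2, the map holds three entries: "ba" once, and "an" and "na" twice each.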
private void _extractCounted(java.lang.CharSequence text,
int gramLength,
int len,
java.util.Map<java.lang.String,java.lang.Integer> grams)
private static int guessNumDistinctiveGrams(int textLength,
int gramLength)
private java.lang.CharSequence applyPadding(java.lang.CharSequence text)