API Reference
compress(wv [;kwargs...])

Compresses wv::WordVectors by using array quantization.

Keyword arguments

  • sampling_ratio::AbstractFloat specifies the percentage of vectors to use for quantization codebook creation

  • k::Int number of quantization values for a codebook
  • m::Int number of codebooks to use
  • method::Symbol specifies the array quantization method
  • distance::PreMetric the distance used for quantization

Other keyword arguments specific to the quantization methods can also be provided.
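
To make the roles of k and m concrete, here is a minimal, self-contained sketch of one array quantization scheme (product quantization). It is not the implementation behind compress: the names toy_quantize and toy_reconstruct are hypothetical, the codebooks are simply k randomly sampled sub-vectors rather than learned ones, and squared Euclidean distance is assumed.

```julia
using Random

# Toy product quantization (illustrative only). Each column of X is one vector.
# Every vector is split into m sub-vectors; each sub-vector is replaced by the
# index of its nearest entry in a codebook of k prototypes.
function toy_quantize(X::AbstractMatrix{<:Real}; k::Int=4, m::Int=2,
                      rng=MersenneTwister(0))
    d, n = size(X)
    d % m == 0 || throw(ArgumentError("m must divide the vector dimension"))
    k <= n || throw(ArgumentError("k cannot exceed the number of vectors"))
    step = d ÷ m
    codebooks = Vector{Matrix{Float64}}(undef, m)
    codes = Matrix{Int}(undef, m, n)
    for j in 1:m
        rows = ((j - 1) * step + 1):(j * step)
        block = Float64.(X[rows, :])
        # Assumption: prototypes are k randomly sampled sub-vectors.
        codebooks[j] = block[:, randperm(rng, n)[1:k]]
        for i in 1:n
            dists = [sum(abs2, block[:, i] .- codebooks[j][:, c]) for c in 1:k]
            codes[j, i] = argmin(dists)
        end
    end
    return codebooks, codes
end

# Rebuild approximate vectors by looking up each code in its codebook.
function toy_reconstruct(codebooks, codes)
    m = size(codes, 1)
    return reduce(vcat, [codebooks[j][:, codes[j, :]] for j in 1:m])
end
```

With this layout only the m small codebooks and an m × n matrix of integer codes need to be stored, which is where the compression comes from.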

compressedwordvectors(filename [,type=Float64][; kind=:text])

Generate a CompressedWordVectors type object from a file.

Arguments

  • filename::AbstractString the embeddings file name
  • type::Type type of the embedding vector elements; default Float64

Keyword arguments

  • kind::Symbol specifies whether the embeddings file is textual (:text) or binary (:binary); default :text

conceptnet2wv(cptnet, language)

Converts a ConceptNet object, cptnet, to a WordVectors object. The language of the word embeddings has to be specified explicitly as a Symbol or Languages.Language, since ConceptNet embeddings can be multilingual.

cosine_vec(wv::WordVectors, wordvector, n=10 [;vocab=nothing])

Compute the cosine similarities between wordvector and the word vectors from wv, and return the positions and similarity values of the n best matches. A vocabulary mask vocab can be specified to consider only a subset of the word vectors.
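
The computation described above can be sketched on a plain matrix as follows. This is an illustration only, not the package's code: top_cosine is a hypothetical name, word vectors are assumed to be stored one per column, and the mask is assumed to be a collection of column indices.

```julia
using LinearAlgebra

# Cosine similarity of `v` against the columns of `vectors`, returning the
# positions and similarity values of the `n` best matches.
# `vocab` optionally restricts the search to a subset of column indices
# (an assumption about how a vocabulary mask could be represented).
function top_cosine(vectors::AbstractMatrix{<:Real}, v::AbstractVector{<:Real},
                    n::Int=10; vocab=nothing)
    candidates = vocab === nothing ? collect(1:size(vectors, 2)) : collect(vocab)
    sims = [dot(view(vectors, :, i), v) / (norm(view(vectors, :, i)) * norm(v))
            for i in candidates]
    order = sortperm(sims; rev=true)[1:min(n, length(sims))]
    return candidates[order], sims[order]
end
```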

pca_reduction(wv::WordVectors, rdim=7, outdim=size(wv.vectors,1); [do_pca=true])

Post-processes the word embeddings wv by removing the first rdim PCA components from the word vectors. If do_pca=true, the dimensionality is then reduced to outdim through a subsequent PCA transform.

Arguments

  • wv::WordVectors the word embeddings
  • rdim::Int the number of PCA components to remove from the data (default 7)
  • outdim::Int the output dimensionality of the data after the PCA dimensionality reduction, which is performed only if do_pca=true; defaults to the dimensionality of the input embeddings, i.e. no reduction

Keyword arguments

  • do_pca::Bool whether to perform a PCA transform of the post-processed data (default true)
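
The two steps can be sketched with the standard library alone. The function names below are hypothetical, vectors are assumed to be stored one per column, and the package's actual procedure may differ in details such as centering.

```julia
using LinearAlgebra, Statistics

# Remove the projections onto the first `rdim` principal directions from the
# mean-centered columns of X.
function remove_top_components(X::AbstractMatrix{<:Real}, rdim::Int)
    Xc = X .- mean(X; dims=2)        # center every dimension
    U = svd(Xc).U                    # principal directions, strongest first
    top = U[:, 1:rdim]
    return Xc .- top * (top' * Xc)
end

# Optional follow-up PCA transform: keep the `outdim` strongest directions.
function pca_project(X::AbstractMatrix{<:Real}, outdim::Int)
    Xc = X .- mean(X; dims=2)
    U = svd(Xc).U
    return U[:, 1:outdim]' * Xc
end
```

In this sketch, calling remove_top_components and then pca_project corresponds to do_pca=true; skipping the second call corresponds to do_pca=false.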


similarity_order(wv::WordVectors, alpha=-0.65)

Post-processes the word embeddings wv through a linear transformation that adjusts the similarity order of the model, so that the embeddings capture more information than is directly apparent. The function returns a new WordVectors object containing the processed embeddings.

Arguments

  • wv::WordVectors the word embeddings
  • alpha::AbstractFloat the α parameter of the algorithm (default -0.65)
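
As an illustration of how α could enter such a linear transformation, the sketch below assumes a particular form: with the word vectors as columns of V and the eigendecomposition V·Vᵀ = Q·Λ·Qᵀ, the transformed vectors are Λ^α·Qᵀ·V. Both this form and the name similarity_order_sketch are assumptions and are not confirmed by this reference.

```julia
using LinearAlgebra

# Assumed transformation: V*V' = Q*Λ*Q', output = Λ^alpha * Q' * V.
# Requires strictly positive eigenvalues when alpha is negative.
function similarity_order_sketch(V::AbstractMatrix{<:Real}, alpha::Real=-0.65)
    F = eigen(Symmetric(V * V'))
    W = F.vectors * Diagonal(F.values .^ alpha)
    return W' * V
end
```

Under this assumed form, alpha = 0 only rotates the vectors, so all pairwise dot products are unchanged; other values rescale the principal directions.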


vocab_reduction(wv::WordVectors, seed, nn)

Produces a reduced vocabulary version of wv by removing all but the nn nearest neighbors of each word present in the vocabulary seed.
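
The selection step can be sketched as follows. reduced_indices is a hypothetical helper working on column indices rather than words; cosine similarity is assumed as the neighbor criterion, and the seed words themselves are assumed to be kept.

```julia
# Indices of the columns to keep: for every seed index, its `nn` most
# cosine-similar columns, plus the seed column itself (an assumption).
function reduced_indices(vectors::AbstractMatrix{<:Real}, seed_idx, nn::Int)
    unit = vectors ./ sqrt.(sum(abs2, vectors; dims=1))   # normalize columns
    keep = Set{Int}()
    for s in seed_idx
        sims = unit' * unit[:, s]
        sims[s] = -Inf               # a word is not its own neighbor
        order = sortperm(sims; rev=true)
        union!(keep, order[1:min(nn, length(order) - 1)])
        push!(keep, s)
    end
    return sort!(collect(keep))
end
```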

write2disk(filename::AbstractString, wv::CompressedWordVectors [;kind=:binary])

Writes compressed embeddings to disk.

Arguments

  • filename::AbstractString the embeddings file name
  • wv::CompressedWordVectors the embeddings

Keyword arguments

  • kind::Symbol specifies whether the embeddings file is textual (:text) or binary (:binary); default :binary

write2disk(filename::AbstractString, wv::WordVectors [;kind=:binary])

Writes embeddings to disk.

Arguments

  • filename::AbstractString the embeddings file name
  • wv::WordVectors the embeddings

Keyword arguments

  • kind::Symbol specifies whether the embeddings file is textual (:text) or binary (:binary); default :binary

analogy_words(cwv, pos, neg, n=5)

Return the top n words computed by analogy similarity between the positive words pos and the negative words neg from the CompressedWordVectors cwv.

get_vector(cwv, word)

Return the vector representation of word from the CompressedWordVectors cwv.

vocabulary(cwv)

Return the vocabulary as a vector of words of the CompressedWordVectors cwv.

size(cwv)

Return the word vector length and the number of words as a tuple.

analogy(cwv, pos, neg, n=5)

Compute the analogy similarity between two lists of words. The positions and the similarity values of the top n similar words will be returned. For example, king - man + woman = queen is expressed as pos=["king", "woman"], neg=["man"].
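
The king/queen example can be reproduced on a toy vocabulary with the sketch below. toy_analogy is a hypothetical name; the scoring (dot product of unit vectors with the sum of positive minus negative vectors) and the exclusion of the query words from the results are assumptions, not a description of the package's internals.

```julia
# Toy analogy search. `vocab[i]` names column i of `vectors`.
function toy_analogy(vectors, vocab, pos, neg, n::Int=5)
    unit = vectors ./ sqrt.(sum(abs2, vectors; dims=1))   # normalize columns
    idx(w) = findfirst(==(w), vocab)
    query = zeros(size(unit, 1))
    for w in pos
        query .+= unit[:, idx(w)]
    end
    for w in neg
        query .-= unit[:, idx(w)]
    end
    sims = unit' * query
    exclude = Set(idx.(vcat(pos, neg)))   # assumption: query words are skipped
    order = [i for i in sortperm(sims; rev=true) if !(i in exclude)]
    top = order[1:min(n, length(order))]
    return top, sims[top]
end

# Columns: king, man, woman, queen, apple (hand-made toy vectors).
vocab = ["king", "man", "woman", "queen", "apple"]
vectors = [1.0 0 0 1 0;
           1   1 0 0 0;
           0   0 1 1 0;
           0   0 0 0 1]
positions, scores = toy_analogy(vectors, vocab, ["king", "woman"], ["man"], 1)
```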

cosine(cwv, word, n=10)

Return the positions of the n (by default n = 10) nearest neighbors of word and their cosine similarities.

cosine_similar_words(cwv, word, n=10)

Return the top n (by default n = 10) most similar words to word from the CompressedWordVectors cwv.

in_vocabulary(cwv, word)

Return true if word is part of the vocabulary of the CompressedWordVectors cwv and false otherwise.

index(cwv, word)

Return the index of word from the CompressedWordVectors cwv.

similarity(cwv, word1, word2)

Return the cosine similarity value between two words word1 and word2.