API Reference

Base.size
EmbeddingsAnalysis.analogy
EmbeddingsAnalysis.compress
EmbeddingsAnalysis.compressedwordvectors
EmbeddingsAnalysis.conceptnet2wv
EmbeddingsAnalysis.cosine
EmbeddingsAnalysis.cosine_similar_words
EmbeddingsAnalysis.cosine_vec
EmbeddingsAnalysis.in_vocabulary
EmbeddingsAnalysis.index
EmbeddingsAnalysis.pca_reduction
EmbeddingsAnalysis.similarity
EmbeddingsAnalysis.similarity_order
EmbeddingsAnalysis.vocab_reduction
EmbeddingsAnalysis.write2disk
EmbeddingsAnalysis.write2disk
Word2Vec.analogy_words
Word2Vec.get_vector
Word2Vec.vocabulary

EmbeddingsAnalysis.compress — Method.

compress(wv [;kwargs...])

Compresses wv::WordVectors by using array quantization.

Keyword arguments

sampling_ratio::AbstractFloat specifies the percentage of vectors to use for quantization codebook creation

k::Int number of quantization values for a codebook
m::Int number of codebooks to use
method::Symbol specifies the array quantization method
distance::PreMetric the distance measure used for quantization

Other keyword arguments specific to the quantization methods can also be provided.
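To make the roles of k and m concrete, here is a minimal sketch of codebook-based quantization. It is not the package's implementation: the helper names quantize_sketch and reconstruct are hypothetical, the embeddings are assumed to be stored one per column, and the codewords are simply copied from the first k columns, whereas a real quantization method would learn them from a sample of the data.

```julia
# Hypothetical sketch: m codebooks with k codewords each.
# Assumption: one embedding per column; codewords are taken from the
# first k columns instead of being learned from sampled vectors.
function quantize_sketch(X::AbstractMatrix, k::Int, m::Int)
    d, n = size(X)
    @assert d % m == 0 "dimension must be divisible by the number of codebooks"
    step = d ÷ m
    # One codebook per block of rows, each of size step×k
    codebooks = [X[(j-1)*step+1:j*step, 1:k] for j in 1:m]
    codes = Matrix{Int}(undef, m, n)
    for j in 1:m, i in 1:n
        chunk = X[(j-1)*step+1:j*step, i]
        # Index of the nearest codeword (squared Euclidean distance)
        codes[j, i] = argmin([sum(abs2, chunk - c) for c in eachcol(codebooks[j])])
    end
    return codebooks, codes
end

# Rebuild an approximate vector by concatenating the selected codewords
reconstruct(codebooks, codes, i) =
    reduce(vcat, [codebooks[j][:, codes[j, i]] for j in eachindex(codebooks)])

X = Float64[1 2 3 1.1; 4 5 6 4.1; 7 8 9 7.2; 10 11 12 10.3]
codebooks, codes = quantize_sketch(X, 3, 2)
approx = reconstruct(codebooks, codes, 4)   # approximation of the 4th vector
```

Each vector is stored as m small integers instead of its full set of floating-point values, which is where the compression comes from.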

EmbeddingsAnalysis.compressedwordvectors — Method.

compressedwordvectors(filename [,type=Float64][; kind=:text])

Generate a CompressedWordVectors type object from a file.

Arguments

filename::AbstractString the embeddings file name
type::Type type of the embedding vector elements; default Float64

Keyword arguments

kind::Symbol specifies whether the embeddings file is textual (:text) or binary (:binary); default :text

EmbeddingsAnalysis.conceptnet2wv — Method.

conceptnet2wv(cptnet, language)

Converts a ConceptNet object cptnet to a WordVectors object. The language of the word embeddings has to be specified explicitly as a Symbol or Languages.Language, since ConceptNet embeddings can be multilingual.

EmbeddingsAnalysis.cosine_vec — Function.

cosine_vec(wv::WordVectors, wordvector, n=10 [;vocab=nothing])

Compute the cosine similarities between wordvector and the word vectors from wv, and return the positions and similarity values of the n best matches. A vocabulary mask vocab can be specified to consider only a subset of word vectors.
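The computation described above can be sketched with the standard library alone. The helper top_cosine and its mask keyword are hypothetical names used for illustration, and the embeddings are assumed to be stored one per column; this is not the package's implementation.

```julia
using LinearAlgebra

# Hypothetical helper, not the package's implementation.
# Assumption: `vectors` holds one embedding per column.
function top_cosine(vectors::AbstractMatrix, v::AbstractVector, n::Int=10; mask=nothing)
    # Cosine similarity between `v` and every column
    sims = [dot(col, v) / (norm(col) * norm(v)) for col in eachcol(vectors)]
    # Candidate positions: all columns, or only those allowed by the mask
    candidates = mask === nothing ? collect(1:size(vectors, 2)) : findall(mask)
    # Rank the candidates by decreasing similarity and keep the best n
    order = sortperm(sims[candidates], rev=true)
    best = candidates[order[1:min(n, length(order))]]
    return best, sims[best]
end

vectors = [1.0 0.0 1.0 -1.0;
           0.0 1.0 1.0  0.0]
positions, scores = top_cosine(vectors, [1.0, 0.0], 2)
masked_positions, _ = top_cosine(vectors, [1.0, 0.0], 2; mask=[false, true, true, true])
```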

EmbeddingsAnalysis.pca_reduction — Method.

pca_reduction(wv::WordVectors, rdim=7, outdim=size(wv.vectors,1); [do_pca=true])

Post-processes the word embeddings wv by removing the first rdim PCA components from the word vectors and, if do_pca=true, reducing the dimensionality to outdim through a subsequent PCA transform.

Arguments

wv::WordVectors the word embeddings
rdim::Int the number of PCA components to remove from the data (default 7)
outdim::Int the output dimensionality of the data after the PCA dimensionality reduction; it is performed only if do_pca=true and the default value is the same as that of the input embeddings i.e. no reduction

Keyword arguments

do_pca::Bool whether to perform a PCA transform of the post-processed data (default true)

References:

Vikas Raunak "Simple and effective dimensionality reduction for word embeddings", NIPS 2017 Workshop
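A rough sketch of the two steps described above, using only the standard library. The function names remove_top_components and pca_project are hypothetical, the data is assumed to hold one embedding per column, and the package's actual implementation may differ in details such as centering.

```julia
using LinearAlgebra, Statistics

# Hypothetical sketch; assumption: `X` holds one embedding per column.
function remove_top_components(X::AbstractMatrix, rdim::Int)
    Xc = X .- mean(X, dims=2)        # center every dimension
    U = svd(Xc).U                    # columns of U are the principal directions
    top = U[:, 1:rdim]               # the rdim dominant directions
    return Xc - top * (top' * Xc)    # subtract the projections onto them
end

# Optional second step: project onto the leading outdim principal directions
function pca_project(X::AbstractMatrix, outdim::Int)
    Xc = X .- mean(X, dims=2)
    U = svd(Xc).U
    return U[:, 1:outdim]' * Xc
end

X = [sin(i * j) for i in 1:5, j in 1:50]   # toy data: 5 dimensions, 50 vectors
Y = remove_top_components(X, 2)            # same size as X, 2 components removed
Z = pca_project(Y, 2)                      # reduced to 2 dimensions
```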

EmbeddingsAnalysis.similarity_order — Method.

similarity_order(wv::WordVectors, alpha=-0.65)

Post-processes the word embeddings wv through a linear transformation that adjusts the similarity order of the model, so that the embeddings capture more information than is directly apparent. The function returns a new WordVectors object containing the processed embeddings.

Arguments

wv::WordVectors the word embeddings
alpha::AbstractFloat the α parameter of the algorithm (default -0.65)

References:

Artetxe et al. "Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation", 2018
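The linear transformation can be sketched as follows, based on the formulation in the referenced paper: with the eigendecomposition X'X = QΛQ', the embeddings are multiplied by W = QΛ^α. The function name similarity_order_transform is hypothetical, X is assumed here to hold one embedding per row and to have full column rank (so all eigenvalues are positive), and the package's implementation may differ.

```julia
using LinearAlgebra

# Hypothetical sketch following the formulation in Artetxe et al. (2018).
# Assumptions: one embedding per row of `X`, and X has full column rank.
function similarity_order_transform(X::AbstractMatrix, alpha::Real)
    F = eigen(Symmetric(X' * X))                  # X'X = Q Λ Q'
    W = F.vectors * Diagonal(F.values .^ alpha)   # W = Q Λ^α
    return X * W
end

X = [1.0 0 0; 0 2 0; 0 0 3; 1 1 1]                # 4 words, 3 dimensions
Y = similarity_order_transform(X, -0.65)
```

Two special cases help check the intuition: α = 0 leaves all pairwise dot products unchanged, and α = 0.5 turns the similarity matrix XX' into its square, i.e. second-order similarity.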

EmbeddingsAnalysis.vocab_reduction — Method.

vocab_reduction(wv::WordVectors, seed, nn)

Produces a reduced vocabulary version of wv by removing all but the nn nearest neighbors of each word present in the vocabulary seed.
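As an illustration of the selection step, the sketch below collects the positions of the nn nearest neighbors (by cosine similarity) of each seed word. The helper neighbors_to_keep is a hypothetical name, the embeddings are assumed to be stored one per column, and the sketch assumes that a word does not count as its own neighbor; whether the package also keeps the seed words themselves is not stated here.

```julia
using LinearAlgebra

# Hypothetical helper: returns the positions of the words that would be kept.
# Assumption: `vectors` holds one embedding per column, aligned with `vocab`.
function neighbors_to_keep(vectors::AbstractMatrix, vocab::AbstractVector{<:AbstractString},
                           seed::AbstractVector{<:AbstractString}, nn::Int)
    # Normalize the columns so that dot products equal cosine similarities
    norms = [norm(c) for c in eachcol(vectors)]
    unit = vectors ./ norms'
    keep = Set{Int}()
    for word in seed
        i = findfirst(==(word), vocab)
        i === nothing && continue        # skip seed words missing from the vocabulary
        sims = unit' * unit[:, i]
        sims[i] = -Inf                   # assumption: a word is not its own neighbor
        k = min(nn, length(sims) - 1)
        union!(keep, partialsortperm(sims, 1:k, rev=true))
    end
    return sort!(collect(keep))
end

vectors = [1.0 0.9 0.0 -1.0;
           0.0 0.1 1.0  0.0]
vocab = ["cat", "kitten", "sky", "dog"]
kept_one = neighbors_to_keep(vectors, vocab, ["cat"], 1)
kept_two = neighbors_to_keep(vectors, vocab, ["cat"], 2)
```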

EmbeddingsAnalysis.write2disk — Method.

write2disk(filename::AbstractString, wv::CompressedWordVectors [;kind=:binary])

Writes compressed embeddings to disk.

Arguments

filename::AbstractString the embeddings file name
wv::CompressedWordVectors the embeddings

Keyword arguments

kind::Symbol specifies whether the embeddings file is textual (:text) or binary (:binary); default :binary

EmbeddingsAnalysis.write2disk — Method.

write2disk(filename::AbstractString, wv::WordVectors [;kind=:binary])

Writes embeddings to disk.

Arguments

filename::AbstractString the embeddings file name
wv::WordVectors the embeddings

Keyword arguments

kind::Symbol specifies whether the embeddings file is textual (:text) or binary (:binary); default :binary

Word2Vec.analogy_words — Function.

analogy_words(cwv, pos, neg, n=5)

Return the top n words computed by analogy similarity between positive words pos and negative words neg from the CompressedWordVectors cwv.

Word2Vec.get_vector — Method.

get_vector(cwv, word)

Return the vector representation of word from the CompressedWordVectors cwv.

Word2Vec.vocabulary — Method.

vocabulary(cwv)

Return the vocabulary of the CompressedWordVectors cwv as a vector of words.

Base.size — Method.

size(cwv)

Return the word vector length and the number of words as a tuple.

EmbeddingsAnalysis.analogy — Method.

analogy(cwv, pos, neg, n=5)

Compute the analogy similarity between two lists of words. The positions and the similarity values of the top n similar words will be returned. For example, king - man + woman = queen corresponds to pos=["king", "woman"], neg=["man"].
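The king/queen example can be reproduced on toy data with a minimal sketch. The function analogy_sketch is a hypothetical name, the embeddings are assumed to be stored one per column, and excluding the query words from the ranking is an assumption of this sketch rather than documented behavior.

```julia
using LinearAlgebra

# Hypothetical sketch of an analogy query; not the package's implementation.
# Assumption: `vectors` holds one embedding per column, aligned with `vocab`.
function analogy_sketch(vectors::AbstractMatrix, vocab::AbstractVector{<:AbstractString},
                        pos::AbstractVector{<:AbstractString},
                        neg::AbstractVector{<:AbstractString}, n::Int=5)
    lookup = Dict(w => i for (i, w) in enumerate(vocab))
    # Add the positive word vectors and subtract the negative ones
    query = zeros(size(vectors, 1))
    for w in pos
        query .+= vectors[:, lookup[w]]
    end
    for w in neg
        query .-= vectors[:, lookup[w]]
    end
    # Cosine similarity between the query and every word vector
    sims = [dot(c, query) / (norm(c) * norm(query)) for c in eachcol(vectors)]
    # Assumption: the query words themselves are excluded from the ranking
    for w in vcat(pos, neg)
        sims[lookup[w]] = -Inf
    end
    count = min(n, length(sims) - length(pos) - length(neg))
    best = sortperm(sims, rev=true)[1:count]
    return best, sims[best]
end

# Toy 2-dimensional embeddings: first row "royalty", second row "gender"
vocab = ["king", "queen", "man", "woman", "apple"]
vectors = [1.0  1.0 0.0  0.0 -1.0;
           1.0 -1.0 1.0 -1.0  0.0]
best, scores = analogy_sketch(vectors, vocab, ["king", "woman"], ["man"], 1)
answer = vocab[best[1]]
```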

EmbeddingsAnalysis.cosine — Function.

cosine(cwv, word, n=10)

Return the positions of the n (by default n = 10) nearest neighbors of word and their cosine similarities.

EmbeddingsAnalysis.cosine_similar_words — Function.

cosine_similar_words(cwv, word, n=10)

Return the top n (by default n = 10) most similar words to word from the CompressedWordVectors cwv.

EmbeddingsAnalysis.in_vocabulary — Method.

in_vocabulary(cwv, word)

Return true if word is part of the vocabulary of the CompressedWordVectors cwv and false otherwise.

EmbeddingsAnalysis.index — Method.

index(cwv, word)

Return the index of word from the CompressedWordVectors cwv.

EmbeddingsAnalysis.similarity — Method.

similarity(cwv, word1, word2)

Return the cosine similarity value between two words word1 and word2.