SoFunction
Updated on 2024-11-14

Python implementation of the word2vec model: process analysis

This article walks through building a word2vec model in Python with gensim. The sample code is explained step by step and should be a useful reference for study or work.

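import logging
import os

import gensim
import nltk

# Log training progress so the slow step below shows what it is doing
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

# Tokenized sentences of the Brown corpus; run nltk.download('brown') once beforehand
corpus = nltk.corpus.brown.sents()

fname = 'brown_skipgram.model'
if os.path.exists(fname):
    # load the file if it has already been trained, to save repeating the slow training step below
    model = gensim.models.Word2Vec.load(fname)
else:
    # can take a few minutes, grab a cuppa
    # gensim 4.x argument names; releases before 4.0 call these size and iter
    # sg is left at its default of 0 (CBOW) despite the file name; pass sg=1 for skip-gram
    model = gensim.models.Word2Vec(corpus, vector_size=100, min_count=5, workers=2, epochs=50)
    model.save(fname)

# Pairwise cosine similarity for a handful of words
words = "woman women man girl boy green blue".split()
for w1 in words:
    for w2 in words:
        print(w1, w2, model.wv.similarity(w1, w2))

# Nearest neighbour of a query word, then the similarity of one specific pair
print(model.wv.most_similar(positive=['woman'], topn=1))
print(model.wv.similarity('woman', 'girl'))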

word2vec, the model proposed in 2013, is already packaged in the gensim module, so we can start building the model directly.

While the model is being built, gensim logs its progress. When a log message about saving the Word2Vec model appears at the end, the model has been built and saved successfully.

Querying with the keywords government and news returned the word administration, with a similarity score of 0.508.
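A query with several positive words can be pictured as averaging the length-normalised vectors of the query words and then ranking every other word by cosine similarity to that average. The sketch below illustrates the idea with made-up 2-dimensional vectors; `toy_vectors`, `unit`, and `most_similar_sketch` are hypothetical names invented for this example, and the numbers have nothing to do with the Brown corpus result quoted here.

```python
import math

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def most_similar_sketch(vectors, positive, topn=1):
    """Rank words by cosine similarity to the mean of the query vectors."""
    query_units = [unit(vectors[w]) for w in positive]
    dim = len(query_units[0])
    mean = unit([sum(q[i] for q in query_units) / len(query_units) for i in range(dim)])
    scores = []
    for word, vec in vectors.items():
        if word in positive:  # the query words themselves are skipped
            continue
        # Dot product of two unit vectors is their cosine similarity
        scores.append((word, sum(a * b for a, b in zip(mean, unit(vec)))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# Hypothetical toy vectors, for illustration only.
toy_vectors = {
    "government": [1.0, 0.2],
    "news": [0.6, 0.8],
    "administration": [0.9, 0.5],
    "green": [-0.7, 0.6],
}

print(most_similar_sketch(toy_vectors, positive=["government", "news"], topn=1))
```

With these toy values the top-ranked word is "administration", because its vector points in nearly the same direction as the average of the two query vectors.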

When I enter women and man, the similarity comes out at 0.638, which is already quite high.
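The number reported for a word pair is the cosine similarity of the two word vectors, which ranges from -1 to 1. Here is a minimal sketch of that calculation, using made-up 3-dimensional vectors rather than anything learned from the Brown corpus. The `toy_vectors` values and the `cosine` helper are illustrative assumptions, not gensim's own code.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors for illustration only; the trained model uses 100 dimensions.
toy_vectors = {
    "women": [0.9, 0.1, 0.3],
    "man": [0.7, 0.4, 0.2],
    "green": [-0.2, 0.1, 0.9],
}

# Same nested loop as the article's listing, but over the toy vectors
for w1 in toy_vectors:
    for w2 in toy_vectors:
        print(w1, w2, round(cosine(toy_vectors[w1], toy_vectors[w2]), 3))
```

Each word scores 1.0 against itself, and pairs whose vectors point in similar directions score higher than unrelated pairs.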

For what it's worth, the corpus I used is the Brown corpus, taken directly from nltk. It includes news text among other categories.

If you're interested, you can build your own model and pass in different corpora to calculate the similarity between other terms.
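gensim's Word2Vec takes its corpus as an iterable of sentences, each one a list of string tokens, which is the shape the Brown corpus sentences from nltk come in. Below is a rough sketch of getting plain text into that shape using only the standard library. The `sentences_from_text` helper and its naive splitting rules are assumptions made for illustration; a real project would likely use a proper tokenizer.

```python
import re

def sentences_from_text(text):
    """Split raw text into lowercase token lists, one list per sentence."""
    sentences = []
    # Naive split: treat '.', '!' and '?' as sentence boundaries
    for raw in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z0-9']+", raw.lower())
        if tokens:  # skip empty fragments left over from the split
            sentences.append(tokens)
    return sentences

# Hypothetical sample text, for illustration only.
sample = "The council met on Friday. Reporters asked about the budget!"
my_corpus = sentences_from_text(sample)
print(my_corpus)
```

The resulting list could then be passed as the first argument to `gensim.models.Word2Vec` in place of the Brown sentences. With a corpus this small, `min_count` would need lowering, since words that occur fewer times than that are dropped from the vocabulary.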

That is all for this article.