word2vec

What is word2vec?

  1. Word2vec is an approach for creating word embeddings.
  2. A word embedding is a representation of a word as a numeric vector.
  3. Besides word2vec, there are other methods for creating word embeddings, such as fastText, GloVe, ELMo, BERT, GPT-2, etc.
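To make point 2 concrete, here is a toy illustration of words as numeric vectors (the vectors below are made up, not trained; in practice they come from a trained model):

```python
# Toy word embeddings: each word maps to a fixed-size numeric vector.
# The numbers are invented for illustration only.
embeddings = {
    "king":  [0.52, 0.91, -0.30, 0.11],
    "queen": [0.48, 0.93, -0.28, 0.45],
    "apple": [-0.70, 0.02, 0.64, -0.12],
}

def cosine_similarity(u, v):
    """Cosine similarity: a standard way to compare word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

# With trained embeddings, semantically close words score higher.
sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
```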

Image: Word2Vec Overview. Source: Stanford CS224N Notes

Model Architecture

Word2vec is based on the idea that a word’s meaning is defined by its context, where the context is the set of surrounding words.

Image: A word and its context. Image by Author

There are two word2vec architectures proposed in the paper:

  1. CBOW (Continuous Bag-of-Words) — predicts the center word from its surrounding context words.
  2. Skip-Gram — predicts the surrounding context words from the center word.

Both CBOW and Skip-Gram models are multi-class classification models by definition.

Image: CBOW Model: High-level Overview.

Image: Skip-Gram Model: High-level Overview.
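The sliding-window extraction of training pairs for both architectures can be sketched as follows (the example sentence and window size are my own illustrative choices, not from the paper):

```python
# Build training pairs from a sliding context window.
sentence = ["data", "analysis", "automates", "analytical", "model", "building"]
window = 2  # how many words to take on each side of the center word

cbow_pairs = []      # (context words, center word) — CBOW predicts the center
skipgram_pairs = []  # (center word, context word) — Skip-Gram predicts the context

for i, center in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    cbow_pairs.append((context, center))
    for ctx in context:
        skipgram_pairs.append((center, ctx))
```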

The word2vec model is very simple and has only two layers:

  1. An embedding layer, which maps a word ID to its vector representation.
  2. A linear (dense) layer with softmax activation, which produces a probability distribution over the vocabulary.

Image: CBOW Model: Architecture in Details.

Image: Skip-Gram Model: Architecture in Details.
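A minimal sketch of these two layers for the Skip-Gram case, assuming randomly initialized input/output embedding matrices and a full softmax over the vocabulary (sizes are arbitrary toy values):

```python
import math
import random

random.seed(0)
vocab_size, embed_dim = 6, 4

# Layer 1: input embeddings (center-word vectors), vocab_size x embed_dim.
W_in = [[random.uniform(-0.5, 0.5) for _ in range(embed_dim)]
        for _ in range(vocab_size)]
# Layer 2: output embeddings (context-word vectors), vocab_size x embed_dim.
W_out = [[random.uniform(-0.5, 0.5) for _ in range(embed_dim)]
         for _ in range(vocab_size)]

def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def skipgram_forward(center_id):
    """Embedding lookup, then a dot product with every output vector."""
    v_c = W_in[center_id]  # "hidden layer" is just one row of W_in
    scores = [sum(u * v for u, v in zip(W_out[w], v_c))
              for w in range(vocab_size)]
    return softmax(scores)  # P(o | c) over the whole vocabulary

probs = skipgram_forward(2)
```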

Data Preparation

The first step is to create a vocabulary, i.e., a mapping from each unique word in the corpus to an integer ID:

vocab = {
     "a": 1,
     "analysis": 2,
     "analytical": 3,
     "automates": 4,
     "building": 5,
     "data": 6,
     ...
}

Image: How to create Vocabulary from a text corpus.

Image: How to Encode words with Vocabulary IDs.
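Both steps can be sketched in a few lines (the corpus string is a made-up fragment; IDs are assigned alphabetically starting at 1, matching the vocabulary shown above):

```python
# Build a vocabulary from a text corpus, then encode the text as IDs.
corpus = "data analysis automates analytical model building a data analysis"

# Assign IDs in sorted order, starting at 1.
vocab = {word: idx
         for idx, word in enumerate(sorted(set(corpus.split())), start=1)}

# Encode every word of the corpus with its vocabulary ID.
encoded = [vocab[word] for word in corpus.split()]
```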

Objective Function

For each position $t = 1, \ldots, T$, predict context words within a window of fixed size $m$, given center word $w_t$. Data likelihood:

$$
\text{Likelihood} = L(\theta) = \prod_{t=1}^{T} \prod_{\substack{-m \le j \le m \\ j \ne 0}} P(w_{t+j} \mid w_t; \theta)
$$

The objective function is the average negative log likelihood:

$$
J(\theta) = -\frac{1}{T} \log L(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \ne 0}} \log P(w_{t+j} \mid w_t; \theta)
$$

Minimizing the objective function $\Leftrightarrow$ maximizing predictive accuracy.

For a center word $c$ and a context word $o$:

$$
P(o \mid c) = \frac{\exp(u_o^{\top} v_c)}{\sum_{w \in V} \exp(u_w^{\top} v_c)}
$$
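This probability can be computed directly from the definition (the vectors below are tiny made-up examples, not trained embeddings):

```python
import math

def p_context_given_center(o, c, U, V):
    """P(o|c) = exp(u_o . v_c) / sum_w exp(u_w . v_c).

    U: output ("context") vectors u_w, one per vocabulary word.
    V: input ("center") vectors v_w, one per vocabulary word.
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    numerator = math.exp(dot(U[o], V[c]))
    denominator = sum(math.exp(dot(U[w], V[c])) for w in range(len(U)))
    return numerator / denominator

# Tiny hand-made vectors for a 3-word vocabulary (illustrative only).
U = [[0.1, 0.2], [0.4, -0.3], [-0.2, 0.5]]
V = [[0.3, 0.1], [0.0, 0.2], [0.5, -0.4]]

prob = p_context_given_center(o=1, c=0, U=U, V=V)
total = sum(p_context_given_center(o, 0, U, V) for o in range(3))
```

Note that the denominator sums over the entire vocabulary, which is why a full softmax is expensive for large vocabularies.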

Training Details

Word2vec is trained as a multi-class classification model using Cross-Entropy loss.
TODO: add details on the dataset prep, optimizer choice, ...

Notes:

References