[TensorFlow 2.0] Word Embeddings — Part 1

Machine learning models take vectors (arrays of numbers) as input. When working with text, the first thing we must do is come up with a strategy to convert strings to numbers (or to "vectorize" the text) before feeding it to the model. There are several ways to do this:

  • Segment text into words, and transform each word into a vector.
  • Segment text into characters, and transform each character into a vector.
  • Extract n-grams of words or characters, and transform each n-gram into a vector. N-grams are overlapping groups of multiple consecutive words or characters.
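The three strategies above can be sketched with plain Python, using only the standard library (no particular tokenizer is assumed; the `ngrams` helper is illustrative, not part of any library):

```python
text = "The cat sat on the mat."

# 1. Segment the text into words.
words = text.split()
# → ['The', 'cat', 'sat', 'on', 'the', 'mat.']

# 2. Segment the text into characters.
chars = list(text)

# 3. Extract word-level n-grams: overlapping groups of n consecutive words.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

bigrams = ngrams(words, 2)
# → [('The', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat.')]
```

Whichever unit we choose (word, character, or n-gram), the next step is the same: map each unit to a vector.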

One-hot encodings

As a first idea, we might one-hot encode each word in our vocabulary. Consider these two sentences, already segmented into words:

['The', 'cat', 'sat', 'on', 'the', 'mat.']
['The', 'dog', 'ate', 'my', 'homework.']
Listed token by token, the two sentences contain: The, cat, sat, on, the, mat., The, dog, ate, my, homework. Each distinct token gets its own index in the vocabulary, and a word is then represented by a vector that is all zeros except for a single 1 at that word's index.
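A minimal sketch of word-level one-hot encoding for the two example sentences (the `token_index` and `one_hot` names are illustrative, not from any library):

```python
import numpy as np

samples = [['The', 'cat', 'sat', 'on', 'the', 'mat.'],
           ['The', 'dog', 'ate', 'my', 'homework.']]

# Build the vocabulary: each distinct token gets an index.
token_index = {}
for sentence in samples:
    for token in sentence:
        if token not in token_index:
            token_index[token] = len(token_index)

vocab_size = len(token_index)  # 10 here ('The' and 'the' count as different tokens)

# Encode a sentence as a (num_tokens, vocab_size) matrix of one-hot rows.
def one_hot(sentence):
    matrix = np.zeros((len(sentence), vocab_size))
    for i, token in enumerate(sentence):
        matrix[i, token_index[token]] = 1.0
    return matrix

encoded = one_hot(samples[0])  # shape (6, 10)
```

Note how sparse this representation is: every row is zeros except for one 1, so the storage grows with the vocabulary size — the inefficiency that embeddings, covered later, are designed to avoid.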
