How To Make Word Cloud In Python
Word cloud
A word cloud shows which words are most common in a text, for example when analyzing which words are frequently used in a conversation or by a certain author. Word clouds are used a lot in natural language processing. Before making a word cloud, it is important to remove stopwords, which are common English words such as “the”, “is”, “that” and “are”. In this post, we shall look at how to make word clouds using Python. A video explanation can be found here
How it works
In word clouds, the most frequent words take on the largest size. We can also use different colours to easily distinguish words.
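To make the sizing idea concrete, here is a minimal sketch that counts words and scales a font size linearly with frequency. It uses only the standard library. The function name `relative_font_sizes` and the size range are illustrative assumptions; the wordcloud package applies its own, more involved scaling and layout.

```python
from collections import Counter


def relative_font_sizes(text, min_size=10, max_size=100):
    """Map each word to a font size proportional to its frequency.

    Illustrative only: the wordcloud package applies its own scaling.
    """
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; the others scale down
    # linearly, but never below min_size.
    return {
        word: max(min_size, round(max_size * n / top))
        for word, n in counts.items()
    }


sizes = relative_font_sizes("music music music dance dance sing")
print(sizes)
```

A word that appears three times as often as another ends up roughly three times as large, which is the effect we see in a word cloud.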
Ways to make word clouds
We can:
- Make word clouds of different shapes
- Make word clouds with different colours
- Make word clouds with a limited number of words to avoid clutter
Implementation
Import Packages
These are the packages we need to make our word cloud. If you don’t have them installed, install them before proceeding. The wordcloud package includes STOPWORDS, a set of the most common English words. We import it so we don’t have to remove those words manually.
import matplotlib.pyplot as plt
from wordcloud import WordCloud,STOPWORDS
import pandas as pd
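If you are not sure whether everything is installed, a small check like the one below can help. This is only a sketch: the helper name `missing_packages` is made up for illustration, and the list uses import names (Pillow is imported as `PIL`). Missing packages can usually be installed with pip, for example `pip install wordcloud matplotlib pandas numpy pillow`.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the packages used in this post (Pillow imports as PIL).
required = ["matplotlib", "wordcloud", "pandas", "numpy", "PIL"]
missing = missing_packages(required)
if missing:
    print("Install these before proceeding:", ", ".join(missing))
else:
    print("All packages are available.")
```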
Get data
We use data in text format. Since my data is in a CSV file, I read it into a DataFrame and joined the contents of one column into a single string using the join function. I convert the words to lowercase for consistency, since the same word can be written in different cases and Python is case sensitive. You should also remove nulls at this step if you have them in your dataset.
# reading the csv file as a DataFrame
df = pd.read_csv("data.csv")
# make a single string: drop nulls, join the rows with a space, lowercase
words = ' '.join(df.Content.dropna().astype(str)).lower()
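The cleaning step can be sketched in plain Python to show why the details matter. The helper `rows_to_text` and the sample rows are hypothetical. With pandas, dropping nulls is typically done with `dropna()` on the column before joining.

```python
import math


def rows_to_text(rows):
    """Join row values into one lowercase string, skipping missing values."""
    kept = []
    for value in rows:
        # Skip None and float NaN, the usual markers for missing cells.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        kept.append(str(value))
    # Join with a space so the last word of one row does not fuse
    # with the first word of the next row.
    return " ".join(kept).lower()


rows = ["Great Music", None, "music QUALITY", float("nan")]
text = rows_to_text(rows)
print(text)
```

Joining with an empty string instead of a space would turn the end of one row and the start of the next into a single made-up word such as "musicmusic", which would then be counted in the word cloud.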
Make word cloud
To create a word cloud we use the WordCloud class, which takes a number of parameters. The ones we are using are:
- stopwords - the words we don’t want to consider
- max_words - the maximum number of words to show in the word cloud
- min_font_size - the smallest font size to use
- background_color - the colour of the word cloud background
- width - the width of the word cloud
- height - the height of the word cloud
After passing in our parameters, we generate the word cloud using generate, a method that takes in the text we are interested in.
# defining the stop words
stopwords = set(STOPWORDS)
# making the word cloud
wordcloud = WordCloud(stopwords = stopwords,
                      max_words = 20,
                      min_font_size = 10,
                      background_color = 'white',
                      width = 800,
                      height = 800,
                      ).generate(words)
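As a rough picture of what stopwords and max_words do, the sketch below filters out stopwords and keeps only the most frequent remaining words. It is an approximation written for this post, not the library's actual code: the function `top_words`, the tiny stopword set and the sample text are all hypothetical, and the real package also handles punctuation and, by default, can group pairs of words that occur together.

```python
from collections import Counter


def top_words(text, stopwords, max_words):
    """Return the max_words most frequent words that are not stopwords."""
    counts = Counter(
        word for word in text.lower().split() if word not in stopwords
    )
    return counts.most_common(max_words)


# A tiny hypothetical stopword set; STOPWORDS from wordcloud is much larger.
demo_stopwords = {"the", "is", "and"}
demo_text = "the music is loud and the music is good and the music is live and loud"
print(top_words(demo_text, demo_stopwords, max_words=2))
```

Without the stopword filter, "the" and "is" would take the top spots and push the interesting words out of the cloud.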
We use matplotlib’s imshow to visualize the word cloud. We can also add other functionality, such as removing the axis, saving the image, adding a title and so on.
plt.figure(figsize = (8, 8), facecolor = None)
plt.imshow(wordcloud, interpolation = "bilinear")
plt.axis("off")
plt.tight_layout(pad = 0)
#plt.savefig("wordcloud.png")
plt.show()

Add words to leave out (stopwords)
We can also update the stopwords to remove words we don’t want to visualize. We use the update method and pass in a list of the words we want to leave out. In the example below, I decided to leave out “quality”, “pwd” and “amp”.
# start from the built-in stop words
stopwords = set(STOPWORDS)
# add the words we want to leave out
stopwords.update(["quality", "pwd", "amp"])
# Generate a word cloud image
wordcloud = WordCloud(stopwords=stopwords,
                      max_words=20,
                      min_font_size=10,
                      width=800,
                      height=800,
                      background_color="white").generate(words)
# Display the generated image:
# the matplotlib way:
plt.figure(figsize = (8, 8), facecolor = None)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
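One thing to watch out for when updating a set of stopwords is the difference between passing a list and passing a single string. The short sketch below uses small hypothetical sets in place of STOPWORDS to show it with plain Python sets.

```python
# A small hypothetical stopword set standing in for wordcloud's STOPWORDS.
demo_stopwords = {"the", "is", "that"}

# update() with a list adds each list element as one stopword.
demo_stopwords.update(["quality", "pwd", "amp"])

# Pitfall: update() with a bare string adds its individual characters.
wrong = {"the"}
wrong.update("amp")

# add() is the safe way to insert a single word.
right = {"the"}
right.add("amp")

print(sorted(demo_stopwords))
print(sorted(wrong))
print(sorted(right))
```

So for a single word, either wrap it in a list when calling update or use add.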

Add image mask
We can change the shape of the word cloud using images of different shapes. It is best to use an image with a white background to get the desired shape. Here we use two more libraries: Pillow to read the image and NumPy to turn it into an array that matplotlib can display.
from PIL import Image
import numpy as np
from matplotlib.pyplot import imshow
#display mask we are going to use
im =Image.open('image.png')
imshow(np.asarray(im))
plt.axis("off")

Create mask
We create a mask from the image, then pass the mask into WordCloud so that the word cloud takes the shape of our image.
mask = np.array(Image.open('image.png'))
# Generate a word cloud image
wordcloud = WordCloud(stopwords=stopwords,
                      mask=mask,
                      max_words=20,
                      min_font_size=10,
                      # width and height are ignored when a mask is given;
                      # the shape of the mask is used instead
                      width=800,
                      height=800,
                      background_color="white").generate(words)
# Display the generated image:
# the matplotlib way:
plt.figure(figsize = (8, 8), facecolor = None)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
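If you don’t have a suitable image, a mask can also be built directly with NumPy. The sketch below makes a circular mask. It relies on the behaviour described in the wordcloud documentation, where white (255) entries are masked out and words are drawn on the remaining area. The function name `circular_mask` and the sizes are assumptions for illustration.

```python
import numpy as np


def circular_mask(size=300, radius=130):
    """Build a square mask: 255 outside the circle, 0 inside."""
    centre = size // 2
    y, x = np.ogrid[:size, :size]
    outside = (x - centre) ** 2 + (y - centre) ** 2 > radius ** 2
    # White (255) pixels are masked out; words are drawn on the rest.
    return (255 * outside).astype(np.uint8)


circle = circular_mask()
print(circle.shape, circle[150, 150], circle[0, 0])
```

The resulting array can be passed as the mask argument of WordCloud in the same way as the array loaded from an image.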

Manipulate color
We can also manipulate the colour of the word cloud. In this section, we look at one way of doing it: making the word cloud take on the colours of our image. We do this by creating an ImageColorGenerator from the image and passing it to WordCloud through color_func.
im =Image.open('music.png')
imshow(np.asarray(im))
plt.axis("off")

from wordcloud import ImageColorGenerator
mask = np.array(Image.open('music.png'))
mask_color = ImageColorGenerator(mask)
# Generate a word cloud image
wordcloud = WordCloud(stopwords=stopwords,
                      mask=mask,
                      max_words=20,
                      min_font_size=10,
                      width=800,
                      height=800,
                      color_func=mask_color,
                      background_color="white").generate(words)
# Display the generated image:
# the matplotlib way:
plt.figure(figsize = (8, 8), facecolor = None)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
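Another possible way to control colour, sketched here as an alternative to the image-based approach, is to write your own colour function. As far as the wordcloud documentation describes, color_func accepts any callable with the signature below that returns a colour string Pillow understands. The function `shade_by_size` and its colour scheme are made up for illustration.

```python
def shade_by_size(word, font_size, position, orientation,
                  random_state=None, **kwargs):
    """Return a darker blue for larger words (hypothetical colour scheme)."""
    # Clamp lightness between 25% (largest words) and 70% (smallest words).
    lightness = max(25, min(70, 80 - font_size // 2))
    return f"hsl(210, 80%, {lightness}%)"


# Calling the function by hand shows the colours it would produce.
print(shade_by_size("music", 100, (0, 0), None))
print(shade_by_size("sing", 10, (0, 0), None))
```

To use it, pass color_func=shade_by_size when creating the WordCloud instead of the ImageColorGenerator.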
