WordCloud.jl

A word cloud (also called a tag cloud or wordle) is a novelty visual representation of text data in which the importance of each word is shown by its font size or color. Our generator has the following highlights:

Flexible: Any mask, any color, any angle, adjustable density. You can specify the initial positions of some words, or pin some words and adjust the rest, and so on.
Fast: Written 100% in Julia, with an efficient implementation based on quadtrees and gradient optimization (see Stuffing.jl). The advantage is most noticeable when generating large images.
Exact: Words with the same weight have exactly the same size. The algorithm never shrinks a word to fit it into a gap.

Run showexample(:juliadoc) to see how the banner is generated.

Installation

import Pkg; Pkg.add("WordCloud")

Basic Usage

using WordCloud
# opening lines of the Thousand Character Classic (千字文)
words = "天地玄黄宇宙洪荒日月盈昃辰宿列张寒来暑往秋收冬藏闰余成岁律吕调阳云腾致雨露结为霜金生丽水玉出昆冈剑号巨阙珠称夜光果珍李柰菜重芥姜海咸河淡鳞潜羽翔龙师火帝鸟官人皇始制文字乃服衣裳推位让国有虞陶唐吊民伐罪周发殷汤坐朝问道垂拱平章"
words = [string(c) for c in words]                # every character becomes one word
weights = rand(length(words)) .^ 2 .* 100 .+ 30   # random weights in [30, 130)
wc = wordcloud(words, weights)
generate!(wc)                                     # place the words
paint(wc, "qianziwen.svg")

Run the command runexample(:qianziwen) or showexample(:qianziwen) to get the result.

More Complex Usage

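using WordCloud
wc = wordcloud(
    # word counts from alice.txt, excluding English stopwords and "said"
    processtext(open(pkgdir(WordCloud)*"/res/alice.txt"), stopwords=WordCloud.stopwords_en ∪ ["said"]),
    mask = loadmask(pkgdir(WordCloud)*"/res/alice_mask.png", color="#faeef8"),  # shape of the cloud
    colors = :Set1_5,   # color scheme
    angles = (0, 90),   # words are horizontal or vertical
    density = 0.55) |> generate!
# output scaled by 0.5, with a purple outline of the mask as the background
paint(wc, "alice.png", ratio=0.5, background=outline(wc.mask, color="purple", linewidth=1))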

Run the command runexample(:alice) or showexample(:alice) to get the result.

More Examples

Training animation

Run the command runexample(:animation) or showexample(:animation) to get the result.

Gathering style

Run the command runexample(:gathering) or showexample(:gathering) to get the result.

Recolor

Run the command runexample(:recolor) or showexample(:recolor) to get the result.

Comparison

Run the command runexample(:compare) or showexample(:compare) to get the result.

The variable WordCloud.examples holds all available examples. You can also see more examples or try it online.

Algorithm Description

Unlike most other implementations, WordCloud.jl is based on local grayscale gradient optimization. It is a non-greedy algorithm: words can still be moved after they have been placed. As a result, words never need to be shrunk, and their sizes stay unchanged during adjustment. The algorithm also allows words to start at any initial position, whether or not they overlap, which gives the program maximum flexibility.
First, the raster mask of each word is represented as an AbstractStackQtree. An AbstractStackQtree forms a pyramid: its layers hold the original mask at different pixel densities, and each layer is generated by reducing the pixel density of the layer below it. The AbstractStackQtree can therefore be viewed as a set of hierarchical bounding boxes. Each pixel of each layer (a node of the AbstractStackQtree) is FULL, EMPTY or MIX.
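To make the pyramid concrete, here is a minimal Julia sketch. It is not Stuffing.jl's AbstractStackQtree: the names build_pyramid and merge4 and the UInt8 encoding of EMPTY/MIX/FULL are hypothetical, and the sketch assumes a square mask whose side is a power of two. A parent node is FULL or EMPTY only when all four children are; otherwise it is MIX.

```julia
# Hypothetical node encoding (not Stuffing.jl's actual representation)
const EMPTY = 0x00
const MIX   = 0x01
const FULL  = 0x02

# A parent keeps its children's value only if all four agree; otherwise it is MIX.
merge4(a, b, c, d) = (a == b == c == d) ? a : MIX

# Build the pyramid: levels[1] is the original mask, each later level halves
# the resolution, and levels[end] is a single 1×1 node.
function build_pyramid(mask::AbstractMatrix{Bool})
    n = size(mask, 1)
    @assert size(mask, 2) == n && ispow2(n) "sketch assumes a square, power-of-two mask"
    levels = [map(x -> x ? FULL : EMPTY, mask)]
    while size(levels[end], 1) > 1
        below = levels[end]
        m = size(below, 1) ÷ 2
        above = Matrix{UInt8}(undef, m, m)
        for i in 1:m, j in 1:m
            above[i, j] = merge4(below[2i-1, 2j-1], below[2i-1, 2j],
                                 below[2i,   2j-1], below[2i,   2j])
        end
        push!(levels, above)
    end
    return levels
end
```

For example, a 4×4 mask whose top-left 2×2 block is set yields three levels: the 2×2 level is FULL in one corner and EMPTY elsewhere, and the single top node is MIX.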
Second, a top-down method (collision_randbfs) detects collisions between two AbstractStackQtrees. At level 𝑙 and coordinates (𝑎,𝑏), if one tree's node is FULL and the other's is not EMPTY, the two trees collide at (𝑙,𝑎,𝑏). Detecting collisions among many trees/objects pairwise would be time-consuming, however. So we first locate the objects in hierarchical sub-areas (implemented as a linked quadtree), and then detect collisions between objects within each sub-area, and between objects in a sub-area and those in its ancestor areas (see batchcollision_qtree and locate_core!).
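The pairwise test can be sketched as a breadth-first descent from the top node. This is a hypothetical illustration, not collision_randbfs itself (whose name suggests a randomized order; this sketch uses a plain FIFO queue). It assumes two equally sized pyramids already aligned on the same canvas, stored as vectors of UInt8 matrices with p[1] at full resolution.

```julia
# Hypothetical node encoding (not Stuffing.jl's actual representation)
const EMPTY = 0x00
const MIX   = 0x01
const FULL  = 0x02

# Return the first colliding node (level, a, b), or nothing if the masks do not overlap.
function collide(p::Vector{Matrix{UInt8}}, q::Vector{Matrix{UInt8}})
    queue = [(length(p), 1, 1)]                  # start at the single top node
    while !isempty(queue)
        l, a, b = popfirst!(queue)
        x, y = p[l][a, b], q[l][a, b]
        (x == EMPTY || y == EMPTY) && continue   # nothing here can overlap
        if x == FULL || y == FULL                # FULL vs. non-EMPTY: collision
            return (l, a, b)
        end
        # both nodes are MIX (never true at level 1, which holds only FULL/EMPTY):
        # refine into the four children one level down
        for i in (2a-1):(2a), j in (2b-1):(2b)
            push!(queue, (l - 1, i, j))
        end
    end
    return nothing
end
```

Because whole EMPTY subtrees are skipped and a FULL node ends the search immediately, most node pairs are never visited at full resolution.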
Finally, each word in a colliding pair is moved according to the gray gradient (computed by whitesum) near the collision point (𝑙,𝑎,𝑏), that is, away from the EMPTY region. This enlarges the space between the two words. Note that at least one of the eight pixels around the collision point is EMPTY or MIX; otherwise the collision would have occurred at layer 𝑙−1. After the words are moved, their AbstractStackQtrees must be rebuilt for the next round of collision detection.
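A minimal sketch of the move direction, assuming a "whiteness" score of 1 for EMPTY, 0.5 for MIX and 0 for FULL, with out-of-range neighbours treated as EMPTY. The names whiteness and move_direction are hypothetical and this is not the actual whitesum. The word steps one unit against the whiteness gradient of its own layer in the 3×3 neighbourhood of the collision point, so that its blank side slides onto that point.

```julia
# Hypothetical node encoding (not Stuffing.jl's actual representation)
const EMPTY = 0x00
const MIX   = 0x01
const FULL  = 0x02

# Whiteness of one node; cells outside the layer count as blank (white).
function whiteness(M::AbstractMatrix{UInt8}, i::Int, j::Int)
    checkbounds(Bool, M, i, j) || return 1.0
    v = M[i, j]
    return v == EMPTY ? 1.0 : v == MIX ? 0.5 : 0.0
end

# Unit step (Δrow, Δcol) for a word whose layer is M, after a collision at (a, b):
# the opposite sign of the whiteness gradient over the 3×3 neighbourhood.
function move_direction(M::AbstractMatrix{UInt8}, a::Int, b::Int)
    gr = sum(whiteness(M, a + 1, b + d) - whiteness(M, a - 1, b + d) for d in -1:1)
    gc = sum(whiteness(M, a + d, b + 1) - whiteness(M, a + d, b - 1) for d in -1:1)
    return (-Int(sign(gr)), -Int(sign(gc)))
end
```

For a word whose left column is EMPTY and whose collision point sits on its left edge, the sketch returns (0, 1): the word moves right, so the EMPTY column now covers the collision point.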

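Sorting & pre-placement
Quadtree-based collision detection
Position adjustment based on the local grayscale gradient (training iterations)
Momentum to accelerate training
Generational adjustment to improve performance
Batch collision detection with a locating tree (≈O(n))
LRU caching to improve performance
Strategies for controlling font size and fill density
Strategies for re-placement and rescaling
Word colors and orientations
Parallel computing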
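The list above mentions using momentum to accelerate training. A minimal sketch of a classic momentum update for one word's position follows; the Mover type, step! and the default values of lr and β are hypothetical and not WordCloud.jl's actual code. Each step blends the new displacement into a running velocity, so consecutive moves in the same direction build up speed.

```julia
# Hypothetical per-word state: position and velocity
mutable struct Mover
    pos::Vector{Float64}
    vel::Vector{Float64}
end
Mover(pos) = Mover(float.(pos), zeros(length(pos)))

# One training step with momentum: vel ← β·vel + lr·dir, then pos ← pos + vel.
function step!(m::Mover, dir::AbstractVector; lr = 1.0, β = 0.5)
    @. m.vel = β * m.vel + lr * dir
    m.pos .+= m.vel
    return m
end
```

With β = 0.5, two identical steps of [1, 0] move the word 2.5 units instead of 2, because half of the first step's velocity carries over into the second.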

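Adding Chinese fonts on Linux

# create the font directory first; otherwise mv would rename the file to ~/.fonts
mkdir -p ~/.fonts
mv wqy-microhei.ttc ~/.fonts/
fc-cache -vf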

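Configuring the ffmpeg environment

# Julia: expose ffmpeg's libraries and binaries to processes started from this session
lib = "/path/to/ffmpeg-4.2.1/lib"
ENV["LD_LIBRARY_PATH"] = haskey(ENV, "LD_LIBRARY_PATH") ? lib * ":" * ENV["LD_LIBRARY_PATH"] : lib
ENV["PATH"] = "/path/to/ffmpeg-4.2.1/bin:" * ENV["PATH"]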

Other word cloud generators

word_cloud
d3-cloud
wordcloud