WordCloud.jl
A word cloud (also known as a tag cloud or wordle) is a novelty visual representation of text data, in which the importance of each word is shown by its font size or color. Our generator has the following highlights:
- Flexible: Any mask, any color, any angle, adjustable density. You can specify the initial position of some words, or pin some words and adjust the others.
- Fast: Written 100% in Julia, with an efficient implementation based on quadtrees and gradient optimization (see Stuffing.jl). The advantage is more obvious when generating large images.
- Exact: Words with the same weight have exactly the same size. The algorithm never scales a word to fit a blank space.
Run showexample(:juliadoc) to see how the banner is generated.
Installation
import Pkg; Pkg.add("WordCloud")
Basic Usage
using WordCloud
using Random
words = [randstring(rand(1:8)) for i in 1:300]
weights = randexp(length(words)) .* 1000 .+ rand(10:100, length(words))
wc1 = wordcloud(words, weights)
generate!(wc1)
paint(wc1, "random.svg")
# Or it could be
wc2 = wordcloud("It's easy to generate word clouds") |> generate!
wc3 = wordcloud(open(pkgdir(WordCloud)*"/res/alice.txt")) |> generate!
More Advanced Usage
using WordCloud
wc = wordcloud(
processtext(open(pkgdir(WordCloud)*"/res/alice.txt"), stopwords=WordCloud.stopwords_en ∪ ["said"]),
mask = loadmask(pkgdir(WordCloud)*"/res/alice_mask.png", color="#faeef8"),
colors = :Set1_5,
angles = (0, 90),
density = 0.55) |> generate!
paint(wc, "alice.png", ratio=0.5, background=outline(wc.mask, color="purple", linewidth=2))
Run the command runexample(:alice) or showexample(:alice) to get the result.
More Examples
Training animation
Run the command runexample(:animation) or showexample(:animation) to get the result.
Gathering style
Run the command runexample(:gathering) or showexample(:gathering) to get the result.
Recolor
Run the command runexample(:recolor) or showexample(:recolor) to get the result.
Comparison
Run the command runexample(:compare) or showexample(:compare) to get the result.
The variable WordCloud.examples holds all available examples.
You can also see more examples or try it online.
Algorithm Description
Unlike most other implementations, WordCloud.jl is based on local grayscale gradient optimization. It is a non-greedy algorithm in which words can still be moved after they have been positioned. Shrinking words to make them fit is therefore unnecessary, and the word size stays unchanged during adjustment. In addition, words can be assigned to any initial position, whether or not this causes an overlap. This gives the program maximum flexibility.
First, the raster masks of words are represented as AbstractStackQtrees. An AbstractStackQtree forms a pyramid structure: it consists of layers that hold the original mask at different pixel densities, and each layer is generated by reducing the pixel density of the layer below it. In this way, the AbstractStackQtree can be seen as a set of hierarchical bounding boxes. The value of each pixel of each layer (a node of the AbstractStackQtree) can be FULL, EMPTY or MIX.
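The pyramid idea can be sketched in a few lines of plain Julia. This is only an illustration under stated assumptions, not the package's AbstractStackQtree implementation: the function names build_pyramid and merge_nodes, the numeric encoding of FULL, EMPTY and MIX, the restriction to square masks with a power-of-two side, and the convention that layer 1 is the finest layer are all choices made for this sketch.

```julia
# Node states, named after the values in the text.
# The numeric encoding is an assumption of this sketch.
const EMPTY, FULL, MIX = 0x01, 0x02, 0x03

# Merge a 2x2 block of child nodes into one parent node.
function merge_nodes(children)
    all(==(FULL), children) && return FULL
    all(==(EMPTY), children) && return EMPTY
    return MIX
end

# Build the pyramid from a square Bool mask whose side is a power of two.
# layers[1] is the full-resolution layer; layers[end] is a single root node.
function build_pyramid(mask::AbstractMatrix{Bool})
    n = size(mask, 1)
    @assert size(mask, 2) == n && ispow2(n) "mask must be square with a power-of-two side"
    layers = [[x ? FULL : EMPTY for x in mask]]
    while size(layers[end], 1) > 1
        below = layers[end]
        m = size(below, 1) ÷ 2
        above = Matrix{UInt8}(undef, m, m)
        for b in 1:m, a in 1:m
            above[a, b] = merge_nodes(view(below, 2a-1:2a, 2b-1:2b))
        end
        push!(layers, above)
    end
    return layers
end

# Usage: a 4x4 mask whose top-left quarter is filled.
mask = falses(4, 4)
mask[1:2, 1:2] .= true
layers = build_pyramid(mask)
println(length(layers))   # number of layers in the pyramid
```

Each coarser node summarizes a 2x2 block of the layer below, which is what lets the structure act as a set of nested bounding boxes.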
Second, we use a top-down method (collision_randbfs) to detect collisions between two AbstractStackQtrees. At level 𝑙 and coordinates (𝑎,𝑏), if one tree's node is FULL and the other's is not EMPTY, then the two trees collide at (𝑙,𝑎,𝑏). Detecting collisions among many trees/objects pairwise, however, would be time-consuming. So we first locate the objects in hierarchical sub-areas (implemented as a linked quadtree), and then detect collisions between the objects within each sub-area, and between the objects in a sub-area and those in its ancestral areas (see batchcollision_qtree and locate_core!).
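A minimal sketch of the top-down test between two trees is given here. It is not the package's collision_randbfs: it uses a plain (non-random) breadth-first search, assumes both pyramids are the same size and already aligned in one coordinate frame, and does not cover the batch detection over sub-areas. The names build_pyramid and find_collision, the node encoding, and the convention that level 1 is the finest layer are assumptions of this sketch.

```julia
# Node states; the numeric encoding is an assumption of this sketch.
const EMPTY, FULL, MIX = 0x01, 0x02, 0x03

# Compact pyramid builder for a square Bool mask with a power-of-two side.
# layers[1] is the finest layer; layers[end] is the single root node.
function build_pyramid(mask::AbstractMatrix{Bool})
    layers = [[x ? FULL : EMPTY for x in mask]]
    while size(layers[end], 1) > 1
        below = layers[end]
        m = size(below, 1) ÷ 2
        above = Matrix{UInt8}(undef, m, m)
        for b in 1:m, a in 1:m
            block = view(below, 2a-1:2a, 2b-1:2b)
            above[a, b] = all(==(FULL), block) ? FULL :
                          all(==(EMPTY), block) ? EMPTY : MIX
        end
        push!(layers, above)
    end
    return layers
end

# Top-down search: returns (level, a, b) of the first collision found, or nothing.
function find_collision(p1, p2)
    @assert length(p1) == length(p2) "pyramids must have the same depth"
    queue = [(length(p1), 1, 1)]                        # start at the root node
    while !isempty(queue)
        l, a, b = popfirst!(queue)
        n1, n2 = p1[l][a, b], p2[l][a, b]
        (n1 == EMPTY || n2 == EMPTY) && continue        # nothing can overlap here
        (n1 == FULL || n2 == FULL) && return (l, a, b)  # FULL vs. not EMPTY: collision
        # Both nodes are MIX: descend to the four children on the finer layer.
        for db in 0:1, da in 0:1
            push!(queue, (l - 1, 2a - 1 + da, 2b - 1 + db))
        end
    end
    return nothing
end

# Usage: A and B occupy opposite corners; C overlaps A in a single pixel.
A = falses(4, 4); A[1:2, 1:2] .= true
B = falses(4, 4); B[3:4, 3:4] .= true
C = falses(4, 4); C[2, 2] = true
println(find_collision(build_pyramid(A), build_pyramid(B)))
println(find_collision(build_pyramid(A), build_pyramid(C)))
```

Because a branch is dropped as soon as either node is EMPTY, most of the image is never inspected at full resolution.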
Finally, each word in a colliding pair is moved according to the gray gradient (calculated by whitesum) near the collision point (𝑙,𝑎,𝑏), that is, the word is moved away from the EMPTY region. This enlarges the space between the two words. Note that at least one of the eight pixels around the collision point is EMPTY or MIX; otherwise the collision would occur at layer 𝑙−1. After the words are moved, the AbstractStackQtrees must be rebuilt for the next round of collision detection.
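The move direction can be illustrated with a simple neighborhood estimate. This sketch is not the package's whitesum: the whiteness weights (EMPTY = 1, MIX = 0.5, FULL = 0), the function name move_direction, the node encoding, and the treatment of out-of-bounds neighbors as EMPTY are all assumptions made here to show the idea of moving away from the EMPTY side.

```julia
# Node states; the numeric encoding is an assumption of this sketch.
const EMPTY, FULL, MIX = 0x01, 0x02, 0x03

# Hypothetical weights: how "white" (unoccupied) a node is.
whiteness(node) = node == EMPTY ? 1.0 : node == MIX ? 0.5 : 0.0

# Estimate a move direction at (a, b) on one layer of a word's pyramid:
# sum the offsets to the eight neighbors weighted by whiteness, then negate
# the sum so that it points away from the white (EMPTY) side.
function move_direction(layer::AbstractMatrix, a::Integer, b::Integer)
    ga, gb = 0.0, 0.0
    for db in -1:1, da in -1:1
        (da == 0 && db == 0) && continue
        i, j = a + da, b + db
        # Neighbors outside the layer are treated as EMPTY.
        node = checkbounds(Bool, layer, i, j) ? layer[i, j] : EMPTY
        w = whiteness(node)
        ga += w * da
        gb += w * db
    end
    return (-ga, -gb)
end

# Usage: the word fills the two left columns; the right column is EMPTY.
layer = [FULL FULL EMPTY;
         FULL FULL EMPTY;
         FULL FULL EMPTY]
println(move_direction(layer, 2, 2))   # second component is negative: move left
```

A real implementation would also scale the step and apply it to the word's position before rebuilding its tree; that part is omitted here.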
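- Sorting & pre-placement
- Quadtree-based collision detection
- Position adjustment based on the local grayscale gradient (training iterations)
- Momentum to accelerate training
- Generational adjustment to optimize performance
- Batch collision detection with a locating tree (≈O(n))
- LRU to optimize performance
- Strategy for controlling font size and fill density
- Strategy for re-placement and rescaling
- Word colors and orientations
- Parallel computation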
Adding Chinese fonts on Linux
mv wqy-microhei.ttc ~/.fonts
fc-cache -vf
Configuring the ffmpeg environment
add /path/to/ffmpeg-4.2.1/lib to ENV["LD_LIBRARY_PATH"]
add /path/to/ffmpeg-4.2.1/bin to ENV["PATH"]
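From within a Julia session, the two variables could be set roughly as follows. The helper prepend_env! is a hypothetical name introduced for this sketch, and /path/to/ffmpeg-4.2.1 is the placeholder from the lines above, to be replaced with the real install location. Note that the dynamic loader typically reads LD_LIBRARY_PATH when a process starts, so setting it here mainly affects child processes launched from Julia; exporting it in the shell before starting Julia may be needed instead.

```julia
# Placeholder install location; replace with the real path on your machine.
ffmpeg_root = "/path/to/ffmpeg-4.2.1"

# Hypothetical helper: prepend a directory to a colon-separated
# environment variable, creating the variable if it is unset.
function prepend_env!(name::AbstractString, dir::AbstractString)
    old = get(ENV, name, "")
    ENV[name] = isempty(old) ? dir : dir * ":" * old
    return ENV[name]
end

prepend_env!("LD_LIBRARY_PATH", joinpath(ffmpeg_root, "lib"))
prepend_env!("PATH", joinpath(ffmpeg_root, "bin"))
```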