Visualising NLP Internals
The cds.nlp.viz module renders the three things a learner most wants to
see when reading about transformers — attention, embeddings, and the
training-loss curve — as ASCII, so you need no plotting backend. Every
renderer returns a str, which means they compose under print(), log
cleanly, and are trivially testable.
1. The training-loss curve
from cds.nlp import render_training_curve
losses = [3.5, 3.1, 2.8, 2.55, 2.3, 2.1, 1.95, 1.82, 1.7, 1.6]
print(render_training_curve(losses, width=50, height=10))
The curve is min-max normalised to the canvas. A single point or an
all-equal series is handled safely (no divide-by-zero); narrow widths
(width < 10) are also safe — the x-axis step label right-aligns rather
than raising a format error.
2. The attention heatmap
from cds.nlp import render_attention_heatmap
attn = [[0.7, 0.2, 0.1],
[0.1, 0.8, 0.1],
[0.05, 0.15, 0.8]]
tokens = ["the", "cat", "sat"]
print(render_attention_heatmap(attn, tokens, tokens))
Weights are normalised across the whole matrix per render, then mapped to
nine ASCII shades (' ' → '#'). The diagonal of a causal or
identity-style attention pattern lights up as the darkest cells. Shape
mismatches between the matrix, the row tokens, and the column tokens each
raise a labelled ValueError so a wrong call is easy to spot.
3. The embedding projection
from cds.nlp import render_embedding_projection
# Six 3-D embedding vectors; imagine six vocabulary entries.
emb = [[1, 0, 0], [0, 1, 0], [0, 0, 1],
[1, 1, 0], [0, 1, 1], [1, 0, 1]]
labels = ["a", "b", "c", "ab", "bc", "ac"]
print(render_embedding_projection(emb, labels=labels, top_n=6))
The projection is real PCA, not a placeholder: the covariance matrix of
the embeddings is built in pure Python, then
cds.math_utils.linalg.power_iteration
recovers the top-2 eigenvectors (the second via deflation). This is the
project's signature "slow but honest" trade-off — the math is exactly what
sklearn.decomposition.PCA does, just without the BLAS.
top_n keeps large vocabularies readable by plotting only the
highest-variance points along PC1 (pass top_n=0 or a negative value to
render every point).
Why ASCII?
Three reasons, in priority order:
- Zero-dependency. No matplotlib import on the default path keeps
cdsinstallable with nothing but the standard library. - Composability. A
strreturn value logs, prints, and diffs in a test exactly like any other value. - Teaching clarity. The renderer source is short enough to read in
one sitting, which is the point of the whole
cds.nlptrack.
For publication-quality plots, export the matrices
(attn_weights, the _pca_2d result, losses) and plot them with your
own toolchain — the data path is the same. The runnable demo at
examples/nlp_viz_demo.py
wires all three renderers to a tiny live tokenizer + embedding pass.