In Part 5 we extend TypeGym’s content system with a new option: Markov Chain text generation.
Instead of picking random words from a list, we build a small statistical model from a text file (e.g. a book) and generate new practice text that resembles the original source but isn’t a direct copy.
This is a perfect fit for a typing trainer: the text stays varied but still feels “language-like”.
What We Covered
- Added a new `TextSource::MarkovChain(path)` variant
- Implemented a `markov` module with a `MarkovChain` struct:
  - a transition table: `(word1, word2) -> [next_words...]`
  - a list of valid starter pairs
- Built the chain by splitting the text into sentences and extracting word triplets
- Generated new text by walking transitions, chunking output into terminal-friendly lines
- Added a lightweight cleanup step so quotes/dashes behave nicely in ASCII terminals
- Wired everything into `get_text()` so the app can generate Markov text each session
Design Insights
Markov Chains are “procedural text” without the AI baggage
A Markov chain is simple:
- look at the previous N words
- randomly pick the next word based on what you saw in the training data
In our implementation we use bigrams (2-word keys), which is the sweet spot for a tutorial:
- still easy to implement and explain
- produces surprisingly coherent local structure
- doesn’t require any external ML libraries
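As a toy illustration (names and corpus are made up, this is not TypeGym code), here is what a bigram table looks like for a two-sentence corpus:

```rust
use std::collections::HashMap;

fn main() {
    // Toy bigram table for the corpus "the cat sat. the cat ran."
    let mut transitions: HashMap<(&str, &str), Vec<&str>> = HashMap::new();
    transitions.entry(("the", "cat")).or_default().push("sat");
    transitions.entry(("the", "cat")).or_default().push("ran");

    // After "the cat", the next word is a uniform pick over ["sat", "ran"].
    // If "sat" had appeared twice in training, it would appear twice in
    // the Vec and be picked twice as often.
    println!("{:?}", transitions[&("the", "cat")]); // ["sat", "ran"]
}
```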
Sentence splitting matters more than you’d think
If you build transitions across punctuation blindly, the chain tends to drift into nonsense faster and your starters become low quality.
By splitting into sentences first we:
- get better starter pairs
- avoid weird cross-sentence transitions
- make the generated output feel more like real phrases
Implementation Highlights
1) Data structures: starters + transitions
We represent each state as a pair of words:
```rust
type Key = (String, String);
```

- `starters` stores the first two words of each training sentence
- `transitions` stores all possible “next word” choices for a given `(prev, current)` pair
We deliberately store a `Vec` of choices, not a frequency map, because repeated values naturally encode weighting.
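A minimal sketch of the struct built around that key (field names follow the bullets above; the exact definitions in the repo may differ):

```rust
use std::collections::HashMap;

type Key = (String, String);

struct MarkovChain {
    /// First two words of each training sentence, used to seed generation.
    starters: Vec<Key>,
    /// Every "next word" observed after a (prev, current) pair.
    /// Duplicates are kept on purpose: a uniform random pick over the
    /// Vec is then automatically frequency-weighted.
    transitions: HashMap<Key, Vec<String>>,
}
```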
2) Building the chain from word triplets
For each sentence:
- split into words
- clean up punctuation / weird quotes
- keep only non-empty words
- push the first two as a starter
- slide a window of 3 words and insert transitions
Conceptually:
- `(w0, w1) -> w2`
- `(w1, w2) -> w3`
- `(w2, w3) -> w4`
- …and so on.
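A sketch of that build step, continuing the struct above (the sentence splitting and the inline cleanup here are simplified assumptions; the fuller cleanup is covered in section 4):

```rust
impl MarkovChain {
    fn build(text: &str) -> Self {
        let mut chain = MarkovChain {
            starters: Vec::new(),
            transitions: HashMap::new(),
        };
        // Naive sentence split on terminal punctuation.
        for sentence in text.split(|c: char| matches!(c, '.' | '!' | '?')) {
            let words: Vec<String> = sentence
                .split_whitespace()
                // Simplified cleanup; see section 4 for the real version.
                .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_string())
                .filter(|w| !w.is_empty())
                .collect();
            if words.len() < 3 {
                continue; // need at least one full triplet
            }
            chain.starters.push((words[0].clone(), words[1].clone()));
            // Slide a 3-word window: (w[i], w[i+1]) -> w[i+2]
            for w in words.windows(3) {
                chain
                    .transitions
                    .entry((w[0].clone(), w[1].clone()))
                    .or_default()
                    .push(w[2].clone());
            }
        }
        chain
    }
}
```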
3) Generating text by walking transitions
Generation is:
- pick a random starter `(prev, cur)`
- repeatedly sample `next` from `transitions[(prev, cur)]`
- shift the window: `prev = cur, cur = next`
- stop when there is no transition
- repeat until we have enough words
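A sketch of that loop, continuing the sketches above and using the `rand` crate’s 0.8-style API (which RNG the project actually uses is an assumption):

```rust
use rand::seq::SliceRandom;

impl MarkovChain {
    fn generate(&self, target_words: usize) -> Vec<String> {
        let mut rng = rand::thread_rng();
        let mut out: Vec<String> = Vec::new();
        while out.len() < target_words {
            // (Re)seed the walk from a random starter pair.
            let Some((mut prev, mut cur)) = self.starters.choose(&mut rng).cloned() else {
                break; // empty chain: nothing to generate
            };
            out.push(prev.clone());
            out.push(cur.clone());
            // Walk transitions until a dead end or until we have enough words.
            while out.len() < target_words {
                let Some(next) = self
                    .transitions
                    .get(&(prev.clone(), cur.clone()))
                    .and_then(|choices| choices.choose(&mut rng))
                else {
                    break; // no transition: restart from a new starter
                };
                out.push(next.clone());
                prev = cur;
                cur = next.clone();
            }
        }
        out
    }
}
```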
Then we format it for terminal typing by chunking:
- 10 words per line
- newline between lines
This keeps the output readable and consistent for the UI.
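The chunking itself is a small helper (the function name is mine):

```rust
/// Join generated words into lines of `words_per_line` words each,
/// e.g. chunk_lines(&words, 10).
fn chunk_lines(words: &[String], words_per_line: usize) -> String {
    words
        .chunks(words_per_line)
        .map(|line| line.join(" "))
        .collect::<Vec<_>>()
        .join("\n")
}
```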
4) Cleaning up quotes/dashes for terminal output
We convert a few common Unicode punctuation characters into ASCII equivalents:
- curly quotes → `"` / `'`
- em dash → `-`
Then we trim common wrappers (`"`, `'`, `_`, `(`, `)`, `[`, `]`) around tokens.
This prevents “invisible weird Unicode punctuation” from showing up in practice text and keeps the experience consistent across terminals.
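A sketch of that cleanup (the function name and exact character set are assumptions based on the list above):

```rust
/// Normalize common Unicode punctuation to ASCII, then strip wrapper
/// characters, so practice text renders consistently in any terminal.
fn clean_word(word: &str) -> String {
    let ascii: String = word
        .chars()
        .map(|c| match c {
            '\u{2018}' | '\u{2019}' => '\'', // curly single quotes -> '
            '\u{201C}' | '\u{201D}' => '"',  // curly double quotes -> "
            '\u{2013}' | '\u{2014}' => '-',  // en/em dash -> -
            other => other,
        })
        .collect();
    // Trim common wrappers: " ' _ ( ) [ ]
    ascii
        .trim_matches(|c: char| matches!(c, '"' | '\'' | '_' | '(' | ')' | '[' | ']'))
        .to_string()
}
```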
5) Wiring it into `TextSource`
Finally, we connect it all to the existing text pipeline:
```rust
MarkovChain(path) => generate_markov_chain(path),
```
Where `generate_markov_chain` reads the input file, builds the chain, and returns the generated text.
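Putting it together with the sketches above (the enum here is trimmed to the new variant, the real `TextSource` has more, and the 100-word / 10-per-line counts are placeholders):

```rust
use std::path::{Path, PathBuf};

enum TextSource {
    MarkovChain(PathBuf),
    // ...other variants elided
}

fn get_text(source: &TextSource) -> String {
    match source {
        TextSource::MarkovChain(path) => generate_markov_chain(path),
    }
}

/// Read the training file, build the chain, and return formatted text.
fn generate_markov_chain(path: &Path) -> String {
    let text = std::fs::read_to_string(path).expect("failed to read training text");
    let chain = MarkovChain::build(&text);
    chunk_lines(&chain.generate(100), 10)
}
```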
External Resources
- Markov Chain - the Wikipedia entry on Markov chains
- Markov Chain Text Generation - an article explaining the approach; we used its transition-diagram image as a visual aid
What’s Next?
Now that TypeGym can generate natural-ish text, there is just one more step to implement:
- add CLI options to configure input files for the different modes and to shape the output (height / width / word count / etc.)
Project Code
You will find the complete source code here: typegym