In computational linguistics, an n-gram is a contiguous sequence of n items taken from a corpus of text or speech.
The items can be letters, phonemes, syllables, words and so on. Studying n-grams helps show how language is structured and how it is used in everyday situations.
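A minimal sketch of extracting n-grams in Python. It assumes the text has already been split into items, such as words or single letters. The function name `ngrams` is hypothetical and is not taken from any particular library. The function slides a window of n items along the sequence and returns each window as a tuple.

```python
def ngrams(items, n):
    """Return the contiguous n-grams of a sequence as a list of tuples (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Each window starts at position i and covers n consecutive items.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

# Word level: split a phrase into words, then take bigrams (n = 2).
words = "fill out a form".split()
print(ngrams(words, 2))   # [('fill', 'out'), ('out', 'a'), ('a', 'form')]

# Letter level: a string is already a sequence of letters, so the same function works.
print(ngrams("gram", 2))  # [('g', 'r'), ('r', 'a'), ('a', 'm')]
```

If n is larger than the sequence, the function returns an empty list, because no window of that size fits.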
Google Books offers an online n-gram search, the Google Books Ngram Viewer, which lets users see how often a word or phrase is used. It can search several corpora, including:
- American English (155 billion words)
- British English (34 billion words)
- Fiction (91 billion words)
Searches can also be restricted to books published in particular decades or periods.
The results typically show the number of occurrences of a search string. For example, searching the American English corpus for two alternative phrases gives:
- fill in a form – 1,012 examples
- fill out a form – 5,298 examples
Thus, in this corpus, "fill out a form" appears about five times as often as "fill in a form".
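The counting behind figures like these can be sketched in a few lines of Python. This is an illustration only. It is not how Google Books computes its results. The sample text is made up, and `count_phrase` is a hypothetical helper. The sketch lowercases the text, strips punctuation, and checks each word n-gram against the search phrase. The last step reproduces the ratio from the American English counts quoted above.

```python
import re

def count_phrase(text, phrase):
    """Count occurrences of a word sequence in a text, ignoring case and punctuation (hypothetical helper)."""
    words = re.findall(r"[a-z']+", text.lower())
    target = tuple(phrase.lower().split())
    n = len(target)
    # Slide a window of n words along the text and compare each n-gram with the target.
    return sum(1 for i in range(len(words) - n + 1) if tuple(words[i:i + n]) == target)

# A tiny made-up text, for illustration only.
text = "Please fill out a form. You can also fill in a form, or fill out a form online."
print(count_phrase(text, "fill out a form"))  # 2
print(count_phrase(text, "fill in a form"))   # 1

# Ratio of the American English counts quoted above.
fill_in, fill_out = 1012, 5298
print(round(fill_out / fill_in, 2))  # 5.24
```

A ratio of about 5.24 is what justifies saying "about five times as often". It compares two raw counts from the same corpus, so it reflects usage in those books rather than in speech generally.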
Google Books also lets you create simple graphs that show how usage has changed over time and that compare different terms. For example, the following graph shows how usage of TEFL, TESOL and TESL has changed from 1950 to the present:
In the term n-gram, the n (usually written in italics) stands for any whole number from one upward, hence the letter n. The gram part goes back to Ancient Greek and means letter.
Literally, then, n-gram means "one or more letters".
Google Books n-grams home page