The High Frequency Word List - A Slightly Humorous Survey

Computers allow us to count words fast, and that means people have figured out the most frequently used words in Spanish and 50 other languages. Guess what? The most frequently used Spanish words do not line up with the words most frequently taught in most Spanish classes. And the same applies to other languages. Looking at the count results helps us understand why the beginner TPRS stories so often follow the adventures of someone wanting something and going several places to find it! (Here I look at Spanish word frequencies, but the conclusions I draw will apply to teaching any second language.)

A most interesting Spanish word frequency list was generated by Matthias Buchmeier and can be found on Wiktionary(1). Buchmeier analyzed 27.4 million words of TV and movie subtitle scripts. That pile of documents is called a corpus. From the corpus, Buchmeier computer-generated a list of the top 1000 most frequently used Spanish words.
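
For the curious, here is a minimal sketch in Python of how a frequency list can be built from a corpus. It is not Buchmeier's actual procedure: the tiny sample text and the function name frequency_ppm are made up for illustration. It counts each word form and reports the count in parts per million (ppm), the unit used in the rest of this article.

```python
import re
from collections import Counter

def frequency_ppm(text):
    """Return (word, ppm) pairs sorted from most to least frequent.

    Hypothetical helper for illustration; not Buchmeier's actual method.
    """
    # Lowercase the text and pull out runs of letters (accented ones included).
    words = re.findall(r"[^\W\d_]+", text.lower())
    total = len(words)
    counts = Counter(words)
    # ppm = occurrences per million words of corpus.
    return [(word, count / total * 1_000_000)
            for word, count in counts.most_common()]

# A made-up mini "corpus"; a real one would hold millions of words.
sample = "Es mi amigo. Mi amigo es feliz. ¿Quieres un café?"
for word, ppm in frequency_ppm(sample)[:3]:
    print(f"{word}: {ppm:,.0f} ppm")
```

Note that this counts each word form separately (es and era would be two entries), which is also how the list discussed here treats verbs.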

Bryce Hedstrom, a TPRS teacher, took Buchmeier’s Wiktionary list and truncated it at 400 words(2). Hedstrom’s reasoning?  The top 300 words account for 65% of all reading material.  It is common for people to use only 500-800 different words in a day.  So a Spanish student will be a fairly strong speaker if he or she has mastered the 400 most common words.
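
The idea of "coverage" behind this reasoning can be sketched in a few lines of Python. The counts below are invented for illustration and are not taken from either list; the hypothetical function coverage simply reports what share of all the running words is accounted for by the top n entries.

```python
def coverage(counts, top_n):
    """Share of all word occurrences covered by the top_n most frequent words."""
    ordered = sorted(counts, reverse=True)
    return sum(ordered[:top_n]) / sum(ordered)

# Invented counts for a toy six-word vocabulary (not real data).
toy_counts = [50, 20, 15, 10, 3, 2]
print(f"Top 2 words cover {coverage(toy_counts, 2):.0%} of the text")
```

Run on a real frequency list, the same calculation is what produces a statement like "the top 300 words account for 65%".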

I took Bryce’s list and focussed on verbs.  Here is what I found, in table form:

   Most Frequently Used Spanish Verbs and Where They Appear in a Word Frequency List (*translation below)

This table is not perfect, but it gives you a general idea of what is going on with word frequency. Here is how to read the table: The fact that ser is in the first column means that some form of ser (it happens to be es) is in the top 12 most frequent words and has an actual frequency of more than 10,000 ppm in this corpus. Other forms of ser crop up fairly soon in the list (for example, era is #73), but only the first form, es, is recorded in my table. Another example: the most frequently occurring form of ir (it happens to be vamos) is between #13 and #50 on the frequency list and occurs with a frequency between 2200 and 10,000 ppm.

So, for those of us who grew up studying or teaching Spanish verb conjugation, consider this.  We probably studied ‘Regular -Ar Verb Conjugation’ in Chapter 3, right after Ch 1. ‘Ser’, Ch 2a. ‘Estar’, and Ch 2b. ‘Ser Vs. Estar’.  In Chapter 3 we conjugated hablar, mirar, practicar, preparar, estudiar. But LOOK! The 14 most frequently used Spanish verbs are all irregular in some way! So maybe we should study all of those irregulars early on, just as we study ser and estar early on in the year.

                                                                     Using the high frequency verbs!

The regular verbs appear late in the table, and with much, much lower frequencies than the common irregulars. The first regular verb is mira. According to Buchmeier’s list, it’s actually #148, and with a frequency of 825 ppm, mira is used only about 1/3 as often as tengo (2349 ppm). Entrar, in column 6, has a frequency of 261 ppm, only 1/9 the frequency of tengo. In fact, of the top 41 verbs used in the Spanish language, only about 14 conjugate with perfect regularity. The average frequency of use of the top 9 (irregular) verb words is 3400 ppm. The top 9 perfectly regular verb words average only about 1/6 of that, or 600 ppm.
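
If you want to check those fractions yourself, here is a quick sketch in Python. It uses only the ppm figures quoted in the paragraph above; the helper name one_in is my own invention.

```python
# Frequencies in ppm, as quoted in the paragraph above.
tengo, mira, entrar = 2349, 825, 261
irregular_avg, regular_avg = 3400, 600

def one_in(smaller, larger):
    """How many times more frequent `larger` is than `smaller`."""
    return larger / smaller

print(f"mira is about 1/{one_in(mira, tengo):.1f} as frequent as tengo")
print(f"entrar is about 1/{one_in(entrar, tengo):.1f} as frequent as tengo")
print(f"regular verbs average about 1/{one_in(regular_avg, irregular_avg):.1f} of the irregulars")
```

The ratios come out near 2.8, 9.0 and 5.7, so the 1/3 and 1/6 in the text are rounded figures, while 1/9 is exact.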

As a Spanish teacher I have to laugh.  In the list of top 400 words only one of my go-to model teaching verbs appears - hablar.  Thankfully, a few of my favorite words for practicing conjugation do make the top 1000: comer is #442, vivir is #477 and buscar is #533.  But escribir, preparar, estudiar and beber do not even make the top 1000!

So again, based on word frequency, when I teach I should consider leading off with the irregular verbs, and stay with them for a long time!


Brain Break! Weird and fun points to ponder about those first 400 words:

1. I am happy to see that amor (279) and cariño (267) appeared, and odio did not.

2. Amigo is #168, and enemigo is only #991.

3. Lots of family words appear: familia (261), madre, padre, mamá, papá, hermano. Most of the rest - tío, abuelo, hija - make the top 1000.

4. Some common word pairs survived the cut intact: chico/chica, arriba/abajo, día/noche, feliz/triste, nuevo/viejo, antes/después, hombre/mujer, bueno/malo.

5. Some words appeared without their partner: esposa, grande, cerca.

6. Uno, dos, and tres made the cut, but after tres the numbers stop.

7. I am pleased to see the word dios got a spot, #87.

8. And some polite words floated through to warm a parent’s heart: hola (84), gracias (64), and por (12) favor (93).


By the way, there are other frequency lists, and each differs slightly. For example, one by Mark Davies, A Frequency Dictionary of Spanish(3), uses a corpus that is 1/3 spoken, 1/3 fiction and 1/3 nonfiction works. Davies groups all the forms of each verb together in one count. But for our purposes - deciding what to teach students - the results are similar.

So, we have perused the Spanish word frequency list, paying special attention to verbs,  and summarized the findings in table form.  Based on the words in the first four columns of the table, we can conclude:

If we want to teach high frequency Spanish then we need to talk about people or creatures who are good and/or bad, who are happy or sad, who are some place, and who have stuff, want stuff, can or can’t do stuff, and know, make and believe stuff. These individuals are interesting and dynamic. They talk, look, need, like, hope and feel.  

         "¿Quieres un s'more?  ¡Te hago uno!"

That’s our foundation - be, want, have, need, like, hope, feel, believe, talk. Build that strong foundation and our students will be on their way to fluency. Of course, we don’t need to ignore eating, drinking and studying - or flying, singing, vomiting or kissing - we just gently fold those words into the foundation we have built.

Looking at these frequency lists really surprised me and encouraged me to rethink how much time I spend on different aspects of the Spanish language in my classes. Even after moving to TPRS, away from a traditional textbook and teaching method, my tendency was still to focus on the body of regular verbs. Seeing the difference in frequencies encourages me to slow down - to stay with, and come back to, quiere, tiene and puede for as long as necessary to solidify those structures in my students’ brains. For any language teacher, these frequency lists can be helpful input when choosing the subjects and focus of your class.

NOTES:

Most Frequently Used Spanish Verbs and Where They Appear in a Word Frequency List (translated)*