Miguel -SpanishInput
Community Tutor
2 Great Ways To Check How Common A Spanish Word Is

I've just begun to watch the TV series "Soy Luna" on Netflix. I'd recommend it for students because:

-It's family friendly.

-It has both Mexican and Argentine leads. Mexico and Argentina have a huge cultural influence on the rest of Latin America, so it's important to get used to both varieties of Spanish.

-It has almost 100% matching Spanish subtitles (a huge plus for language learners, as you can check what you hear).

In the first episode, Luna falls into a pool. A rich Argentine woman asks her about her accident in the "pileta". Luna, who is Mexican, says: "I don't know about any pileta, but I almost drowned in the alberca."

In Spain, Ecuador and other places, we don't use pileta or alberca to refer to a swimming pool. We only use "piscina". So, how do you know which word is the most common and the most internationally understood? For beginners, it's especially important to apply the "Pareto Principle": learn the most common words first and regional variations later. Asking a native speaker won't give you an impartial answer, because everybody thinks their own variety is "the standard". So, what do you do? You have two options:

A frequency dictionary

Frequency dictionaries are your friend. Since I'm a Spanish tutor, I have the Kindle version of Routledge's Frequency Dictionary of Spanish, 2018 edition. It's based on a corpus of 2 billion words drawn from spoken Spanish, blogs, fiction, news, and more, and the authors tried to balance content from all across the Spanish-speaking world. One of my students here on italki purchased the paperback version and is going through the frequency index to fill vocabulary gaps. (And no, they don't pay me to promote it. I just think it's a great resource.) So I checked the alphabetical index in this dictionary.

Piscina is ranked as word #4604. Neither alberca nor pileta is included in this dictionary, which, sadly, is limited to the top 5,000 Spanish words (lemma-based).

Free frequency lists

Another option is to check a free frequency list. For example, the Subtlex movement has produced free subtitle-based frequency lists for several languages. Subtitle-based frequency lists are more likely to reflect actual spoken Spanish than lists based on books and newspapers. These lists, however, haven't been lemmatized, so they are not suitable as study material. They are only good for checking how common an exact word form is. For example, a frequency dictionary would list the verb "amar" as one entry. A frequency list would instead show the different word forms of "amar" as separate entries ("amo", "amé", "amabas", "amamos", etc.) without grouping them.
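To make the difference concrete, here is a minimal Python sketch of what lemmatizing a frequency list amounts to: adding up the counts of all word forms that belong to the same dictionary entry. The counts and the form-to-lemma table below are invented for illustration only. A real table would have to come from a lemmatizer or a dictionary, and it would also have to deal with ambiguous forms (for example, "amo" can be a form of "amar" or the noun "amo").

```python
from collections import Counter

# Hypothetical counts for individual word forms (invented numbers).
form_counts = {
    "amo": 120, "amé": 15, "amabas": 3, "amamos": 22,
    "piscina": 40, "piscinas": 9,
}

# Hypothetical form -> lemma table (assumption: supplied by some lemmatizer).
lemma_of = {
    "amo": "amar", "amé": "amar", "amabas": "amar", "amamos": "amar",
    "piscina": "piscina", "piscinas": "piscina",
}

def group_by_lemma(form_counts, lemma_of):
    """Sum the counts of all word forms that share a lemma."""
    totals = Counter()
    for form, count in form_counts.items():
        # Forms missing from the table are kept as their own entry.
        totals[lemma_of.get(form, form)] += count
    return totals

lemma_counts = group_by_lemma(form_counts, lemma_of)
print(lemma_counts.most_common())
```

In the unlemmatized list, "amabas" looks like a rare word on its own. Once grouped, its count simply becomes part of the single entry "amar".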

There's a free subtitle-based tool to check how common a particular Spanish word form is. Go to EasyPronunciation.com > More Tools > Word Frequency Counters > Spanish. You can paste any text into the window, and the tool will show you how common the words in that text are by coloring them by frequency. If you're a beginner, don't even bother with texts that have a high percentage of words outside the top 5,000.

So, again, checking all three words with this tool, piscina is in the top 3,000, thus colored yellow, while alberca and pileta are not in the top 5,000 and thus are not colored. If you want to know where those two words rank, search for: "Subtlex Spanish" and download the complete list.
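If you'd like to try the same idea on your own computer, here is a rough Python sketch of it. It is not how the EasyPronunciation tool works internally (I don't know that). It only assumes you have a dictionary mapping word forms to their rank, for example built from a downloaded frequency list. The band limits and the sample ranks below are made up for illustration.

```python
import re

def band(rank):
    """Map a rank to a frequency band (hypothetical band limits)."""
    if rank is None or rank > 5000:
        return "beyond 5,000"
    if rank <= 1000:
        return "top 1,000"
    if rank <= 3000:
        return "top 3,000"
    return "top 5,000"

def coverage_report(text, ranks):
    """Label each word with its band and compute the share of running
    words that fall outside the top 5,000."""
    # [^\W\d_]+ matches runs of letters, including á, é, ñ, ü.
    words = re.findall(r"[^\W\d_]+", text.lower())
    bands = {word: band(ranks.get(word)) for word in words}
    beyond = sum(1 for word in words if bands[word] == "beyond 5,000")
    share = beyond / len(words) if words else 0.0
    return {"bands": bands, "share_beyond": share}

# Invented ranks, only for this example.
sample_ranks = {"la": 1, "y": 3, "una": 8, "tiene": 90, "casa": 150, "piscina": 2800}

report = coverage_report("La casa tiene una piscina y una alberca", sample_ranks)
print(report["bands"])
print(f"{report['share_beyond']:.1%} of the words are outside the top 5,000")
```

A text where that last percentage is high is probably too hard for a beginner, which is the same rule of thumb as above.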

Aug 7, 2018 5:00 PM
Comments · 5

Thank you Miguel and Susan for the great tips! So far I've just used Google to tell me whether a word is popular or not:

Piscina: 345 000 000

Alberca: 115 000 000

Pileta: 18 700 000

However, this is by far inferior to your methods, and if a word is shared across multiple languages or is part of the name of a popular location, company or something else, the results will be even less reliable.

August 8, 2018

Miguel,

This site, by the Real Academia Española, has word lists by frequency in Spanish: http://corpus.rae.es/lfrecuencias.html

I took the long list and cut and pasted it into an Excel spreadsheet, with the frequency in the left column and the word in the right column. I then sorted it alphabetically and saved it. Now if I am uncertain about how common a word is, I can look it up easily (and without cost). I can use the "find" box to make it fast.

According to this list by the RAE, alberca is number 32854, pileta is number 28416, and piscina is number 6321. The list is very long because different forms of a word are counted as different words (piscinas is number 18224). I used the list when I was doing flashcards, to avoid spending time memorizing words that were less frequent than the 30,000th, for example. However, I still prefer just asking my tutors if they think a word is common or uncommon, because they can often do as you have done here and tell me the versions they are aware of that are used in other countries. Occasionally, like now, I will look something up just out of curiosity.
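For anyone who would rather script this lookup than use a spreadsheet, here is a minimal Python sketch of the same idea. It assumes the list has been saved as tab-separated text with the rank in the first column and the word in the second. That layout is an assumption made for this example, not a description of the actual RAE file, so the column handling may need adjusting. The sample rows reuse the four ranks quoted above, and `is_worth_a_flashcard` is a made-up helper name.

```python
import csv
import io

def load_ranks(lines):
    """Read 'rank<TAB>word' rows into a dict of word -> rank."""
    ranks = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        rank_text, word = row[0].strip(), row[1].strip().lower()
        if not rank_text.isdigit():
            continue  # skip a header row, if there is one
        # Keep the first (best) rank if a word appears more than once.
        ranks.setdefault(word, int(rank_text))
    return ranks

def is_worth_a_flashcard(word, ranks, cutoff=30000):
    """True if the word is listed and ranked at or above the cutoff."""
    rank = ranks.get(word.lower())
    return rank is not None and rank <= cutoff

# Stand-in for the saved file; with a real file, pass open(path, encoding=...)
# using whatever encoding the file was saved in.
sample = io.StringIO(
    "6321\tpiscina\n18224\tpiscinas\n28416\tpileta\n32854\talberca\n"
)
ranks = load_ranks(sample)

for word in ("piscina", "pileta", "alberca"):
    print(word, ranks.get(word), is_worth_a_flashcard(word, ranks))
```

With a cutoff of 30,000, piscina and pileta would make it into the flashcard deck and alberca would not.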

August 8, 2018

Personally, I like to use ngrams to show me how word frequencies change over time.

Here's an example:

goo.gl/mF4trN

However, that doesn't help in finding regional differences. For languages that I am familiar with, I usually try to find regional corpora.

For instance, for American English I use this corpus: https://corpus.byu.edu/coca/

And for British English I use this corpus: https://corpus.byu.edu/bnc/

And for Canadian English I use this corpus: https://corpus.byu.edu/can/

August 8, 2018
There's no "Soy Luna" on Netflix in the United States.
August 7, 2018

Thanks, everyone! Wow! So much collective wisdom!


@Susan: Thanks a lot for that list. I wonder how they compiled it, however. It seems to be based only on a written Spanish corpus (not including spoken Spanish), because "ustedes" is #1399 and "vosotros" is #5364, while both words are in the top 1,000 according to subtitle-based frequencies.

EDIT: I've just found the explanation of how it was compiled. They did include spoken-language transcriptions from radio and TV:

http://corpus.rae.es/creanet.html

And here you can actually look up how many instances of a word there were per country, see in which fields it was used, and even retrieve snippets of the texts where it was used:

http://www.rae.es/recursos/banco-de-datos/crea

@Timo: I've also used Google a lot for this purpose while learning other languages!

@Chris: Thanks a lot! That's also great!

August 8, 2018