jakemclelland/frequenSee

Where language analysis meets cultural insights


Why

Many of the "most common words" lists I've seen for language learners were based largely on works such as the Brown Corpus, compiled in the 1960s at Brown University in Providence, Rhode Island. But English has changed a lot since then. Many parts of the world also have their own culturally or regionally distinct flavor of English, which makes most lists of common words not only antiquated, but sometimes useless or even wasteful.

For example, a common list of the 800 supposedly most common words in Armenian translates the English word "parcel" to the Armenian word "ծանրոց", a word rarely used in real daily life. A similar list for Hindi translates the English word "carriage" as "गाडी", but this word really means "vehicle". In daily life in India, someone talking about a carriage would more likely say "rickshaw" or simply use the English word "car".

Lists like this develop because most "common words" lists start in the learner's own language instead of the target language. Translators are then forced to approximate each concept in the target language as best they can. Not only is this process highly subjective, but a word that is common in the learner's language doesn't necessarily express a concept that is common in the target language.

The bigger and more real challenge for a non-native language learner is that the vocabulary used at home (informal, often intimate, familial, or even colloquial, like "hey hun, pass the butter!") is often vastly different from the vocabulary used at the workplace (more formal and professional, like "John, the web server just went down again, could you reboot it?"). That, in turn, is very different from the vocabulary you hear on the news ("Man caught smuggling drugs through airport security"), on TV ("Set phase pistols to stun!"), or in books ("Oh mister Darcy!"). In short, language is extremely context-sensitive. The 100 most common words used at home might not get you very far at work or in a public setting. You may not care about learning the vocabulary for every body part unless you plan on working in the medical field, for example, and you may never hear important workplace vocabulary while listening to the news. So why not target your vocabulary learning to the specific setting that YOU are most likely to find yourself in today?

Even more to the point, imagine the most practical list of words, one suited perfectly to you. That list will almost certainly change over time as you become more fluent and as native speakers start expecting more vocabulary from you. In other words, even the most modern generic list of 1000 words will likely contain only a fraction of the words you really need long term, and yet still be filled with words you may never realistically need.

The purpose of this project is to help you find the vocabulary you need to work on today. The more closely the text you collect and enter matches the setting you're interested in, be it a household, professional, technical, or public-forum setting, the more relevant the resulting statistics will be to the words you need to learn. This lets you adapt your vocabulary to your current context and subject of interest.

How to run it

Download the code and run it in Visual Studio (Community Edition or higher). A small UI lets you copy and paste text into the program, which then produces a word frequency chart and a letter frequency chart.
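This README does not show the project's source, so the following is only a minimal Python sketch of the kind of counting behind the two charts, not frequenSee's actual implementation. The function names (`tokenize`, `word_frequencies`, `letter_frequencies`) are hypothetical, and the tokenizing rule is an assumption: a word is a run of Unicode letters and combining marks, with apostrophes allowed inside it, so Armenian and Devanagari text is counted as whole words instead of being split at vowel signs.

```python
# Hypothetical sketch, not frequenSee's actual code.
from __future__ import annotations

import unicodedata
from collections import Counter


def is_letter_or_mark(ch: str) -> bool:
    """True for Unicode letters (L*) and combining marks (M*).

    Marks are included (an assumption) so that vowel signs in scripts such as
    Devanagari stay attached to their consonants instead of splitting words.
    """
    return unicodedata.category(ch)[0] in ("L", "M")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words made of letters, marks, and inner apostrophes."""
    words: list[str] = []
    current: list[str] = []
    for ch in text.lower():
        # An apostrophe joins a word only after the word has started ("don't").
        if is_letter_or_mark(ch) or (ch == "'" and current):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    # Words always start with a letter or mark, so stripping trailing
    # apostrophes (e.g. "dogs'") never leaves an empty string.
    return [w.rstrip("'") for w in words]


def word_frequencies(text: str) -> Counter[str]:
    """Count how often each word occurs in the pasted text."""
    return Counter(tokenize(text))


def letter_frequencies(text: str) -> Counter[str]:
    """Count letters and combining marks, ignoring case, digits, spaces, and punctuation."""
    return Counter(ch for ch in text.lower() if is_letter_or_mark(ch))


if __name__ == "__main__":
    sample = "Hey hun, pass the butter! The butter is on the table."
    for word, count in word_frequencies(sample).most_common(5):
        print(f"{word:10} {count}")
    print(letter_frequencies(sample).most_common(5))
```

Running the same functions on a household conversation and on a workplace email would give two different top-word lists, which is the context sensitivity described in the "Why" section. Charting is left out here; `most_common(n)` returns the (item, count) pairs a chart would plot.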