ilanschnell/wfc

Word Frequency Count

This small C program counts the frequency of all words in a text file. Usage (you might have to adjust PYTHON_PREFIX in build.sh before running this):

$ make
$ ./wfc <somefile>

Using Python as a C library

Ilan Schnell - May 2019

In this article, we want to show something unusual: how to write a simple C program that uses only the Python C library.

The Python/C API is very well documented and offers access to all of Python's data types and their functionality, such as list, dict, etc. Python provides this C API so that users can write C extension modules, which behave like Python modules but are written in C rather than in Python.

These C extensions are very important when interfacing Python with existing C libraries and when writing speed- or memory-critical libraries. The Python standard library itself contains a large number of C extension modules, which are an excellent resource for exploring how to write C extensions. When I wrote bitarray in 2008, I found studying the array module in the standard library extremely useful.

Now we want to write a pure C program that uses some of the functionality exposed in the Python/C API, such as Python's data structures and algorithms, but is not a Python C extension. Instead, this C program has its own main() function and only uses the Python dictionary implementation to count word frequencies in a text file. The task of reading a text file and recognizing words separated by whitespace is not too hard to accomplish in C. Whenever a word is read, it is added to a dictionary, which maps each word to its frequency count. This function looks like this:

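void add_word(PyObject *dict, const char *str)
{
    PyObject *value, *new_value;
    long cnt;

    /* borrowed reference; NULL if the word is not in the dict yet */
    value = PyDict_GetItemString(dict, str);
    cnt = (value == NULL) ? 0 : PyLong_AsLong(value);
    cnt++;
    /* PyDict_SetItemString does not steal the reference, so release it */
    new_value = PyLong_FromLong(cnt);
    PyDict_SetItemString(dict, str, new_value);
    Py_DECREF(new_value);
}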

We need to include Python.h and link against libpython to make this work. The entire program, and a build script that works on Linux and macOS, can be found here.
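The word-reading side mentioned earlier is not shown in this article. As a rough sketch of how it might look (this is not the code from the repository), the function below pulls one whitespace-delimited word at a time from a file. The name next_word and the choice to split overly long words are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: read the next whitespace-delimited word from fp
   into buf, which has room for size bytes (size >= 2).
   Returns 1 if a word was stored, 0 at end of input.
   Words longer than size - 1 characters are split into several pieces. */
int next_word(FILE *fp, char *buf, size_t size)
{
    size_t len = 0;
    int c;

    assert(fp != NULL && buf != NULL && size >= 2);

    /* skip leading whitespace */
    while ((c = fgetc(fp)) != EOF && isspace(c))
        ;

    /* collect characters until whitespace, EOF, or a full buffer */
    while (c != EOF && !isspace(c)) {
        buf[len++] = (char) c;
        if (len == size - 1)
            break;
        c = fgetc(fp);
    }
    buf[len] = '\0';
    return len > 0;
}
```

A caller would loop with while (next_word(fp, buf, sizeof buf)) and hand each buf to add_word(dict, buf). Keeping the tokenizer free of any Python calls means it can be tested without linking libpython.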

Obviously, we could have easily written a program for this particular task in pure Python without having to worry about any C code at all. But this is not the point of this exercise. The point is to show how the Python C library can be used directly within a C program (without writing an entire C extension).

I hope you've enjoyed this article, and maybe learned something new and useful.
