Why Python use GIL
Python uses reference count for memory management. Each objects in python has a reference count variables that keep track of the number of reference to the object. When the reference count goes back to 0, Python would release this object from memory.
However, in multi threading scenario, multiple thread might access the same object, and object reference count could be changed incorrectly in race conditions. Then objects that should be released could still stay in memory and worst case, objects that should not be release are incorrectly released.
To solve this problem, python introduced GIL, which is a global lock in python interpreter level. The rules is that, any python code has to acquire this lock to be executed. You might ask why not add one lock to each objects? This could result in deadlock.
In this way, python code guarantees that only one thread would be able to change the object reference count.
Problem of GIL
The GIL solution, however has problems in that Python code would not be able to utilize multi CPUs. If your code is CPU intensive, python multi-thread would not help you at all. However, if you program is not CPU intensive, but I/O intensive, for example, network application, Python thread is still a good choice.
Solution to this problem?
Are there solutions to the problem? Yes, there are. Python community has tries many times to solve this problem. Python GIL is not really tie to python language it self, it ties to Python interpreter it self. So, as long as we change the underlying python interpreter, python could support multithread. For example, Jython, is implemented in Java.
However, another important reason why Python GIL is not removed is that python has many extended libraries that are writing in C. Those libraries works well with Python in that they don’t need to worry about multi-thread models, the GIL model is really easy to integrate. Moving those libraries to other interpreters are hard works.
Another solution is to use multiprocess
instead of multithread
. Python has good designed libraries that supports multiprocess. However, process management would have more overhead than thread management for operating system, which means the performance of multiprocess programs are worse than multithreads.