Why does python use both reference counting and mark-and-sweep for gc? Why does python use both reference counting and mark-and-sweep for gc? python python

Why does python use both reference counting and mark-and-sweep for gc?


Python (the language) doesn't say which form of garbage collection it uses. The main implementation (often known as CPython) acts as you describe. Other versions such as Jython or IronPython use a purely garbage collected system.

Yes, there is a benefit of earlier collection with reference counting, but the main reason CPython uses it is historical. Originally there was no garbage collection for cyclic objects so cycles led to memory leaks. The C APIs and data structures are based heavily around the principle of reference counting. When real garbage collection was added it wasn't an option to break the existing binary APIs and all the libraries that depended on them so the reference counting had to remain.


Reference counting deallocates objects sooner than garbage collection.

But as reference counting can't handle reference cycles between unreachable objects, Python uses a garbage collector (really just a cycle collector) to collect those cycles when they exist.


My initial guess is that using reference counting can easily remove non-cyclic referenced objects, this may somewhat speed up mark-and-sweep and gain memory immediately. Don't know if my guess is right?

Yes. As soon as the refcount goes to zero and object can be removed. This won't happen in a cyclic referenced object. AFAIK, mark and sweep is a costly operation and the simplest way to implement it requires you to "stop the world" while objects are marked. When all of the objects are traversed, andy object not marked (as reachable) is released.