There are dozens, if not hundreds, of open source licenses out there. Broadly speaking, these licenses fall into two groups: permissive licenses and copyleft licenses.

In the permissive group, the MIT license is perhaps one of the most common.

A “permissive” license permits the code to be incorporated into a program that is distributed under some other license. For example, open source software released under a permissive license can be incorporated into proprietary software, and the resulting software may be distributed without disclosing its source code.

On the other side, the GPL is one of the most important copyleft licenses.

“Copyleft” licenses allow developers to guarantee unlimited open-source access to their work. The core requirement of copyleft licenses is that any derivative work must be distributed under the same license as the original. Because of this requirement, “copyleft” licenses are sometimes called “restrictive” licenses.

Which group is the most commonly used in open source projects?

Although the question is easy to pose, it is not necessarily easy to answer, because analyzing license usage in open source projects at scale comes with several hidden pitfalls.

  • First, there is no single code hosting website that hosts all open source projects. Although GitHub is one of the most common choices, many open source projects are hosted in a community's own git repository or in other repositories such as the Debian package archive and the GNU Project FTP archive.
  • Second, to understand open source license usage at scale, we have to rely on tools that infer the license used. This process is not always precise because developers state open source licenses in different ways. For example, some developers declare the license in a comment in the header of every source code file, while others cite the license for the whole project in the README file. Moreover, some developers copy and paste the full text of the license, whereas others only mention its name.
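To make the second point more concrete, here is a minimal sketch of what license inference could look like. It is not how any particular tool works; it only assumes that a file may either carry an `SPDX-License-Identifier` tag in a header comment or mention a license by name somewhere in its text. The `KNOWN_NAMES` table and the `guess_license` function are hypothetical names introduced for this illustration.

```python
import re

# Hypothetical, deliberately tiny name table (an assumption for this sketch);
# real license detectors cover far more licenses and spelling variants.
KNOWN_NAMES = {
    "MIT License": "MIT",
    "Apache License, Version 2.0": "Apache-2.0",
    "GNU General Public License": "GPL",
}

# Matches header tags such as "SPDX-License-Identifier: MIT".
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def guess_license(text):
    """Return a best-effort license label for a file's text, or None."""
    # Case 1: the license is declared with an explicit tag in a header comment.
    match = SPDX_TAG.search(text)
    if match:
        return match.group(1)

    # Case 2: the license is only mentioned by name (for example in a README).
    lowered = text.lower()
    for name, label in KNOWN_NAMES.items():
        if name.lower() in lowered:
            return label

    # Nothing recognizable was found; this is where real tools lose precision.
    return None
```

Even this toy version shows why the results are imprecise: a file that words the license name slightly differently, or that only links to a license page, falls straight through to `None`.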

Given these limitations, we can still try to infer open source license usage in practice by, for instance, using the GitHub data stored on Libraries.io (a service that gathers metadata from open source projects hosted in several package managers). Libraries.io provides its data through a Zenodo repository. Feel free to go there and play with it a bit.

Using this data, I plotted the following figure.

Open source license usage using Libraries.io data

As one can see, permissive licenses are by far the most used. Indeed, MIT is the most used license, appearing in more than 812k open source projects. Apache 2.0 comes next, appearing in 465k projects. BSD-3, also a permissive license, follows, licensing 71k projects. These three licenses (MIT, Apache 2.0, and BSD-3) are used in about 70% of the open source projects stored in the Libraries.io dataset.

At the other end of the spectrum, one can see that the GPL family of licenses (GPL-2, GPL-3, AGPL-3, LGPL-3) accounts for only around 5% of the overall license usage.
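If you want to reproduce this kind of breakdown yourself, the tallying step could be sketched roughly as follows. This is an illustration under stated assumptions, not the script behind the figure: the `GROUPS` mapping, the `group_shares` function, and the sample list are all made up for this example, and the sample is not the Libraries.io data.

```python
from collections import Counter

# Assumed grouping for illustration; it only lists the licenses named in the
# text and treats the whole GPL family as copyleft.
GROUPS = {
    "MIT": "permissive",
    "Apache-2.0": "permissive",
    "BSD-3-Clause": "permissive",
    "GPL-2.0": "copyleft",
    "GPL-3.0": "copyleft",
    "AGPL-3.0": "copyleft",
    "LGPL-3.0": "copyleft",
}


def group_shares(licenses):
    """Return each group's share of projects as a fraction of the total."""
    # Licenses missing from the mapping are counted as "other".
    counts = Counter(GROUPS.get(name, "other") for name in licenses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: count / total for group, count in counts.items()}


# Made-up sample with one license per project; NOT the Libraries.io data.
sample = ["MIT", "MIT", "Apache-2.0", "GPL-3.0", "Unlicense"]
shares = group_shares(sample)
```

With real data, the list passed to `group_shares` would hold one inferred license per project, so the quality of the shares depends entirely on the quality of the license inference.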

Why is this happening?

According to a 2016 survey with about 3,400 participants, 67% of the surveyed companies actively encourage developers to engage in and contribute to open source software. This shows a clear commercial interest in open source projects, and permissive licenses are particularly well suited to that interest.

Permissive licenses impose minimal requirements on how the code can be redistributed. They are used not only for licensing open source software but also as the basis for proprietary applications. This is possible because permissive licenses do not place restrictions on source code distribution. That is, a program built on permissively licensed code can “close” its source and redistribute only the binaries. Although this might prevent open source developers from reading and learning from that source code, it gives companies a way to use open source programs in their business.

In contrast, some researchers believe that the GPL is not the most appropriate choice for a business that relies on open source. Even though software licensed under the GPL can be used (and also modified) in corporate environments, software companies should be aware of its characteristics. In particular, the key feature of the GPL is that it restricts the terms under which derived works are distributed. If a software company incorporates GPL-licensed source code into a product it distributes, it must license that product under the GPL as well.

The GPL, created by the Free Software Foundation (FSF), is the principal copyleft license. Its ultimate goal is to make software 100% free for everyone. Going 100% free is a real challenge for a business that may not be willing to provide 100% of its code to the public (for instance, have you ever seen the actual Google implementation of its search engine?). As a consequence, some businesses might not be comfortable using a license that is so closely aligned with such goals.

But what could happen if a company misuses an open source license? Many things can happen, and I cover many examples in this book. The issue is so serious that some companies have decided to remove all GPL-licensed code from their codebases.

For now, consider the case of Acme Inc, a startup that was urgently looking to be acquired. Acme received an acquisition offer from Shockwave, a bigger company in the same segment. During the acquirer's inspection, Shockwave noticed that the Acme development team was misusing open source licenses. Shockwave ultimately backed out of the deal, and the Acme technology was put on the shelf without a financial return to its employees or investors.

If you want to know more about open source licensing and some hidden problems behind it, consider buying a copy of my ebook on Open Source Licensing 101.
