A vendored dependency is an aggregation of code (such as a package, module or library) that is included in source form as part of a larger aggregation (usually an application) but which is also available separately as a standalone component (such as a dynamic library). A typical example is an application (e.g. MariaDB) that distributes the source for a separate library (e.g. readline) within its source tree. This can be done for a number of reasons, such as licensing issues that prevent using the module separately, custom code changes made for the application's use that are not or cannot be upstreamed, and ease of compilation for the developer. Git submodules are one mechanism developers use to do this (in Mageia we always start from a source tarball and never directly from a source code control system).
There are many downsides to this approach. Probably the biggest one is that when a standalone dynamic library is updated to fix a security bug, the vendored versions included in other applications are not automatically updated. These applications must be updated separately and recompiled, and the upstream developer may not immediately (or even ever) include the security fixes in the vendored copy, leaving the application vulnerable to security issues.
For these reasons, vendored libraries are discouraged in Mageia (TBD: point to the policy)
A closely related issue is using dependencies that are downloaded at compile time. These are common in languages such as Go (go install), Rust (cargo) and JavaScript (npm), and using them leads to problems similar to those of vendoring. The result can be even worse because it may be not only difficult to determine which dependencies have been used, but downright impossible to determine the version numbers actually used at compile time. Without dependency names and version numbers, it becomes impossible to tell when a package is affected by a security issue in a dependency. When a security issue is reported in a dependency, many application packages may need to be recompiled instead of a single one. It is also difficult to ensure that the licenses of all downloaded packages match those allowed by the distribution. And when packages have licenses like the GPL that require source code be supplied with the binary, it becomes mandatory to make a copy of the downloaded source available to users.
For these reasons, files downloaded at compile time are disallowed in Mageia (TBD: point to the policy)
Static linking is another practice that effectively results in the same problems as the above. Some languages (e.g. Go, Rust) statically link their dependencies, so a security issue in one of those dependencies means rebuilding every application that uses it against the fixed package.
For these reasons, static linking is discouraged in Mageia (TBD: point to the policy)
These are really three separate topics but since the effects of all three are very similar, they are discussed here as a block.
Problems with disallowing vendoring
When vendored (bundled) and downloaded packages are disallowed, they must instead be packaged separately. This means extra work for packagers, since adding a single new complicated application can require individually packaging dozens or even hundreds of new dependency packages. This costs build time, disk space and that especially rare commodity, packager time. It is simply not scalable and effectively means that new applications written in some of the languages particularly prone to this way of working just aren't available to Mageia users.
The landscape
Two languages becoming more popular these days, Go and Rust, particularly suffer from the issues described and supporting applications using them in Mageia is difficult due to policies designed for the C and C++ applications that were the most popular ones in the past. If we want to support programs in these languages, we need to ease the burden on packagers.
The main reasons for policies against vendoring dependencies are:
- to easily identify which packages need to be updated to fix security issues
- to ensure that a known security issue does not unknowingly go unfixed in the distribution
- to reduce the work in updating those packages when it becomes necessary
- to reduce time, bandwidth and disk space for users
- to ensure source code is always available to users to fulfill licensing obligations
If we can find a way to satisfy those requirements to a reasonable degree while still allowing vendoring and downloading of modules at compile time, we can ease the burden on our packagers and infrastructure.
Language landscape
To give a rough idea of which languages might benefit from easing restrictions on vendoring, here are the languages with the largest number of modules in Mageia as of this writing (in approximate decreasing order):
- C/C++
- Rust
- Perl*
- Python*
- Java
- Go
- Ruby*
- OCaml
- Javascript* (nodejs)
- PHP*
- Erlang*
- other compiled languages like C# (mono), BASIC (FreeBASIC), FORTRAN (gfortran), Pascal (lazarus)
- other interpreted languages like Forth*, Lua*, Scheme*, CLisp
* Interpreted languages that don't statically link dependencies into applications or modules
C/C++
These languages support dynamic linking and the developer culture does not generally encourage either a huge number of small dependencies or vendored dependencies. There are also no popular cross-platform build mechanisms that many upstream developers use to download and build dependencies alongside their applications (although Conda, Conan and vcpkg are changing this) that could be utilized to save packager time. No proposal is therefore currently being made to ease the vendoring restrictions in C or C++ applications.
Erlang, Nodejs, Perl, PHP, Python, Ruby
These are all interpreted languages that either have no concept of bundling dependencies into submodules (a static linking equivalent) or whose developers don't generally use it. They generally do, however, have means to automatically obtain dependent modules at build time (e.g. pip, cpan, npm) that Mageia's current policies forbid. Generally, interpreted languages rely on modules being installed in the system when they are executed and do not compile/bind/link them into an independent blob like compiled languages are forced to do. Vendoring can still be useful to greatly reduce the number of packages that must be created for a new application.
Go, Java, OCaml, Rust
These languages result in statically-linked (or equivalent) binaries. If a dependency has a security issue, every application using that dependency must be recompiled.
A way forward
The following proposal satisfies the reasons for the anti-vendoring policies in the introduction, while allowing applications to be packaged without separately needing to package each dependency. It handles vendored dependencies, dependencies downloaded at build time, as well as statically-linked applications.
Overview:
1. The developer builds a package SRPM containing all application source code as well as any unpackaged dependency source code needed by the application (i.e. vendoring it), including an SBOM (Software Bill of Materials) for those dependencies
2. The build system uses only locally-available source to build (as always) and adds a reference to the main source(s) to the SBOM, for completeness
3. For interpreted languages, the build system puts any vendored code into a filesystem location specific to the application in the final RPM
4. The build system stores the SBOM at the end of the build into a central repository
5. A security scanner periodically scans all SBOMs to look for dependencies that have reported security vulnerabilities
6. If a security vulnerability is found, it outputs a list of packages that need to be updated and rebuilt and opens one or more bugs
7. Each package needing a rebuild goes back to step 1 (if a local patch to fix a vulnerability has been added, it is noted in the SBOM)
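The scanning and reporting steps in the overview can be sketched as follows. This is only an illustration under simplifying assumptions: the `Advisory` type, the in-memory SBOM mapping and all package names are hypothetical, and versions are assumed to be plain dotted numbers (real Go or Rust versions need a proper comparison routine).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """Hypothetical advisory record; not the schema of any real database."""
    ident: str        # advisory identifier
    dependency: str   # name of the affected dependency
    fixed_in: tuple   # first fixed version, as a tuple of ints


def parse_version(text):
    """Turn '1.2.3' or 'v1.2.3' into (1, 2, 3). Assumes dotted numeric versions."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def packages_to_rebuild(sboms, advisories):
    """Map each RPM name to the advisories that affect its recorded dependencies.

    sboms: {rpm_name: {dependency_name: version_string}}
    """
    hits = {}
    for rpm_name, deps in sboms.items():
        for adv in advisories:
            version = deps.get(adv.dependency)
            # A dependency is affected when it is older than the first fixed version.
            if version is not None and parse_version(version) < adv.fixed_in:
                hits.setdefault(rpm_name, []).append(adv.ident)
    return hits
```

The returned mapping is the list of packages for which bugs would be opened; packages whose SBOM already records a fixed version are left out.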
SBOMs will be stored in the SPDX format.
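To make the format concrete, here is a minimal sketch that writes an SPDX 2.3 tag-value document for a list of dependencies. The function names, the creator string and the choice of `NOASSERTION` for the download location are illustrative assumptions; a real SBOM would normally come from a dedicated tool and carry more fields (licenses, checksums, relationships).

```python
import re
from datetime import datetime, timezone


def spdx_id(name, version):
    """Build an SPDX identifier; only letters, digits, '.' and '-' are allowed."""
    return "SPDXRef-Package-" + re.sub(r"[^A-Za-z0-9.-]", "-", f"{name}-{version}")


def spdx_tag_value(doc_name, namespace, deps, created=None):
    """Return a minimal SPDX 2.3 tag-value document for (name, version) pairs."""
    created = created or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        "SPDXVersion: SPDX-2.3",
        "DataLicense: CC0-1.0",
        "SPDXID: SPDXRef-DOCUMENT",
        f"DocumentName: {doc_name}",
        f"DocumentNamespace: {namespace}",
        "Creator: Tool: example-sbom-sketch",  # hypothetical tool name
        f"Created: {created}",
    ]
    for name, version in sorted(deps):
        lines += [
            "",
            f"PackageName: {name}",
            f"SPDXID: {spdx_id(name, version)}",
            f"PackageVersion: {version}",
            "PackageDownloadLocation: NOASSERTION",
        ]
    return "\n".join(lines) + "\n"
```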
Security updates are assumed to consist of upgrading to a new upstream release. Updates that require patching a dependency complicate this flow, since the same patch must then be applied to each vendored instance of that dependency. If an unpackaged dependency needs a local patch instead of an upgrade, then we could implement a policy that the dependency must first be packaged before rebuilds are performed, with that new package added as a dependency to any application that needs it before rebuilding. That avoids carrying the identical patch around in many packages.
A script will be created to take care of the bulk of step 1 for the developer. It would scan the application source code to find out what dependencies are needed, then exclude any dependencies already supplied by packages in BuildRequires:, leaving a list of outstanding ones. These would be downloaded using the language's normal package download mechanism and installed into a private temporary location. All these would then be archived into a compressed tarball, along with an SBOM containing the names and versions of all the archived dependencies, and stored in the SOURCES/ directory under a standard name (maybe dependencies.tar.xz, but see other historic precedent below). This file would then be added to sha1.lst and uploaded to binrepo. This could all be integrated into a mgarepo subcommand. TODO: who is responsible for ensuring that the licenses of all the dependencies are allowed, compatible and that the License: line in the .spec file matches?
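A rough sketch of the core of such a script is shown below. It covers only the filtering and archiving parts; dependency discovery and downloading are language-specific and omitted. The function names, the `dependencies/` prefix inside the archive and the `sbom.spdx` file name are all hypothetical choices made for illustration.

```python
import io
import tarfile


def outstanding(required, provided_by_buildrequires):
    """Drop dependencies that packages listed in BuildRequires: already supply.

    required: {dependency_name: version}
    provided_by_buildrequires: set of dependency names already packaged
    """
    return {
        name: version
        for name, version in required.items()
        if name not in provided_by_buildrequires
    }


def write_archive(download_dir, sbom_text, dest):
    """Pack the downloaded sources plus their SBOM into one xz-compressed tarball."""
    with tarfile.open(dest, "w:xz") as tar:
        # Store everything under a common prefix so extraction is predictable.
        tar.add(str(download_dir), arcname="dependencies")
        data = sbom_text.encode("utf-8")
        info = tarfile.TarInfo("dependencies/sbom.spdx")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
```

Keeping the SBOM inside the same archive as the sources means the two cannot drift apart between the packager's machine and the build system.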
For step 2, the various RPM build macros would be updated to handle any dependencies stored in dependencies.tar.xz. They would be extracted into a temporary location in BUILDROOT/ and the compile command extended to look for missing dependencies in this location.
For interpreted languages (step 3), the dependencies would instead be installed in the RPM in an appropriate location in /usr/share/ (that doesn't conflict with other dependencies), and the application's launch command extended to find these dependencies (since they are private to that one application and won't be found in the normal search paths). TODO: how to handle locally patching these dependencies? Should patching happen before or after storing them in dependencies.tar.xz?
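One way the launch command could be extended is through the interpreter's search-path environment variable (PYTHONPATH for Python, PERL5LIB for Perl, NODE_PATH for Node.js, RUBYLIB for Ruby). The sketch below builds such an environment; the `/usr/share/<app>/vendor` layout and the function name are assumptions for illustration, not an agreed location.

```python
import os


def launch_environment(app, base_env, var="PYTHONPATH", root="/usr/share"):
    """Return a copy of base_env with the app's private dependency dir prepended.

    The private directory layout (<root>/<app>/vendor) is hypothetical.
    """
    private_dir = os.path.join(root, app, "vendor")
    env = dict(base_env)
    existing = env.get(var)
    # Prepend so the private copies win over same-named system modules.
    env[var] = private_dir if not existing else private_dir + os.pathsep + existing
    return env
```

A small wrapper script installed in place of the application's entry point could pass this environment to `os.execve` when starting the real program.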
C/C++
While C/C++ programs occasionally vendor in dependencies, a bigger problem is those packages that statically link to libc. The build nodes should scan for statically-linked binaries and highlight those in the SBOM alongside their dependencies to ensure the static binaries can be rebuilt to include the security fixes when those dependencies (mostly glibc) are patched. It's also possible for a binary to be partially statically linked and these cases should also be detected (likely with difficulty), either by detecting static libraries in the dependency lists, or by instrumenting the linker.
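As a starting point for detecting fully static binaries, a build node could check whether an executable requests a dynamic loader. The sketch below reads the ELF program headers and reports a file as static when it has no PT_INTERP entry. This is a heuristic under stated assumptions: it should only be applied to executables (shared libraries normally lack PT_INTERP too), and it cannot detect partially static linking, which needs one of the other approaches mentioned above.

```python
import struct

PT_DYNAMIC, PT_INTERP = 2, 3  # program header types from the ELF specification


def program_header_types(data):
    """Return the p_type value of every program header in an ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little, 2 = big endian
    if is64:
        (phoff,) = struct.unpack_from(endian + "Q", data, 32)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 54)
    else:
        (phoff,) = struct.unpack_from(endian + "I", data, 28)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 42)
    # p_type is the first 32-bit field of each entry in both ELF classes.
    return [
        struct.unpack_from(endian + "I", data, phoff + i * phentsize)[0]
        for i in range(phnum)
    ]


def looks_static(data):
    """Heuristic: an executable without PT_INTERP does not use a dynamic loader."""
    return PT_INTERP not in program_header_types(data)
```

In use, `looks_static(open(path, "rb").read())` would be called for each executable found under the build root and the result recorded in the SBOM.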
Go
The go list -json command can be used to generate the list of dependencies needed by an application (step 1).
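As an illustration, the sketch below parses module-level output such as that produced by `go list -m -json all` (the `-m` and `all` arguments are an assumption here; the text above names only `go list -json`). The output is a stream of concatenated JSON objects rather than a single array, so it is decoded object by object. The function name is hypothetical.

```python
import json


def parse_go_list(stream_text):
    """Return (module path, version) pairs from concatenated JSON objects.

    The main module is skipped; for replaced modules the replacement is
    reported, since that is the code actually compiled in.
    """
    decoder = json.JSONDecoder()
    modules, pos = [], 0
    while True:
        # Skip whitespace between objects; raw_decode does not accept it.
        while pos < len(stream_text) and stream_text[pos].isspace():
            pos += 1
        if pos >= len(stream_text):
            break
        obj, pos = decoder.raw_decode(stream_text, pos)
        if obj.get("Main"):
            continue
        effective = obj.get("Replace") or obj
        modules.append((effective["Path"], effective.get("Version", "")))
    return modules
```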
The Go Vulnerability Database can be used on an ongoing basis to find security issues in Go packages from the SBOMs of those packages.
Some Go packages already have vendored dependencies (e.g. fzf) stored in a file called SOURCES/vendor.tar.xz (likely created through some variant of go mod download; go mod verify; go mod vendor; tar caf vendor.tar.xz vendor). These may also have Provides: bundled(golang(...)) lines that list the vendored packages included, but there seems to be no mechanism to keep those in sync and they don't contain version numbers, so they can't be used as a replacement for a proper SBOM.
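A check for this kind of drift could compare the bundled() lines in the .spec file against vendor/modules.txt, the manifest that go mod vendor writes into the vendor directory. The sketch below is one possible approach; the function names are hypothetical, and it reports differences only, leaving the question of how to express Go versions in RPM metadata open.

```python
import re

# Matches lines such as: Provides: bundled(golang(github.com/a/b))
PROVIDES_RE = re.compile(r"^Provides:\s*bundled\(golang\(([^)]+)\)\)", re.MULTILINE)


def vendored_modules(modules_txt):
    """Parse vendor/modules.txt into {module path: version or None}."""
    mods = {}
    for line in modules_txt.splitlines():
        # Module lines start with "# "; "## explicit" annotations are skipped.
        if line.startswith("# "):
            parts = line[2:].split()
            if not parts:
                continue
            has_version = len(parts) > 1 and parts[1].startswith("v")
            mods.setdefault(parts[0], parts[1] if has_version else None)
    return mods


def provides_drift(spec_text, modules_txt):
    """Return (modules missing from the spec, spec entries no longer vendored)."""
    declared = set(PROVIDES_RE.findall(spec_text))
    vendored = set(vendored_modules(modules_txt))
    return sorted(vendored - declared), sorted(declared - vendored)
```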
A possible workflow:
- After the RPM %install step, run the following:
syft scan --output spdx-tag-value="%{NAME}-%{VERSION}.%{RELEASE}.%{ARCH}.spdx" dir:%{buildroot}
- syft scans the installed binaries and generates an SBOM including the Go dependencies embedded therein (including Go's stdlib version). The resulting SBOM file can be stored in a permanent location for later scans.
- Periodically, scan all the SBOM files to see if any of them show dependencies that have reported vulnerabilities by running on each file:
grype --output json sbom:"%{NAME}-%{VERSION}.%{RELEASE}.%{ARCH}.spdx"
- If any new vulnerabilities are found, open a bug so the package can be rebuilt.
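The last step needs some way to tell new findings from ones already reported. The sketch below reads a grype JSON report and returns only findings not seen before. It assumes the report has a top-level `matches` list whose entries carry `vulnerability.id`, `vulnerability.severity`, `artifact.name` and `artifact.version`; that layout should be checked against the grype version in use. The function names and the idea of keeping a set of already-reported findings are hypothetical.

```python
import json


def grype_findings(report_text):
    """Extract (vulnerability id, severity, artifact name, version) tuples."""
    report = json.loads(report_text)
    findings = set()
    for match in report.get("matches", []):
        vuln = match.get("vulnerability") or {}
        artifact = match.get("artifact") or {}
        findings.add((
            vuln.get("id") or "",
            vuln.get("severity") or "",
            artifact.get("name") or "",
            artifact.get("version") or "",
        ))
    return findings


def new_findings(report_text, already_reported):
    """Return findings that have not been reported in a bug yet, sorted."""
    return sorted(grype_findings(report_text) - set(already_reported))
```

A bug would then be opened only when `new_findings` returns a non-empty list, which avoids filing duplicates on every periodic scan.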
Rust
Some Rust packages in Mageia already include vendored dependencies. These are stored in the tree in a binrepo file called SOURCES/<packagename>-vendor.tar.xz. The macro %cargo_prep -v vendor in the %prep section takes care of extracting them into the right place before a build. This archive is created with the cargo vendor command. Some means of extracting a list of those vendored packages into an SPDX file needs to be determined.
See Also
- Packages carrying bundled copies of system libraries
- Security Updates
- Fedora proposing allowing vendored Go packages
- Fedora policy on vendored Rust dependencies
- Rust, RPMs, and the Fine Art of Dependency Bundling
- Thread on packages with many components/modules/subpackages
- OBS method of vendoring Go dependencies
- Go Vulnerability Database
- GUAC SBOM management tool
- Trustify SBOM management tool
- grype, tool that can look up security issues from an SPDX SBOM
- Trivy, tool that can look up security issues from an SPDX SBOM