============================
Design of The Invirtibuilder
============================

Introduction
============

The Invirtibuilder is an automated Debian package builder, APT
repository manager, and Git repository hosting tool. It is intended
for projects that consist of a series of Debian packages each tracked
as a separate Git repository, and designed to keep the Git and APT
repositories in sync with each other. The Invirtibuilder supports
having multiple threads, or "pockets" of development, and can enforce
different access control and repository consistency rules for each
pocket.

Background and Goals
====================

The Invirtibuilder was originally developed for Invirt_, a project of
the MIT SIPB_. When we went to develop a tool for managing our APT and
Git repositories, we had several goals, each of which informed the
design of the Invirtibuilder:

* One Git repository per Debian package.

  Because of how Git tracks history, it's better suited for tracking a
  series of small repositories, as opposed to one large one
  [#]_. Furthermore, most pre-existing tools and techniques for
  dealing with Debian packages in Git repositories (such as
  git-buildpackage_ or `VCS location information`_) are designed
  exclusively for this case.

* Synchronization between Git and APT repositories.

  In our previous development models, we would frequently merge
  development into trunk without necessarily being ready to deploy it
  to our APT repository (and by extension, our servers) yet. However,
  once the changes had been merged in, it was no longer possible to
  see the current state of the APT repository purely from inspection
  of the source control repository.

* Support for multiple *pockets* of development.

  For the Invirt_ project, we maintain separate production and
  development environments. Initially, they each shared the same APT
  repository. To test changes, we had to install them into the APT
  repository and install the update on our development cluster, and
  simply wait to take the update on our production cluster until
  testing was completed. When designing the Invirtibuilder, we wanted
  the set of packages available to our development cluster to be
  separate from the packages in the production cluster.

* Different ACLs for different pockets.

  Access to our development cluster is relatively unrestricted; we
  freely grant access to interested developers to encourage
  contributions to the project. Our production cluster, on the other
  hand, has a much higher standard of security, and access is limited
  to the core maintainers of the service. The Invirtibuilder needed to
  support that separation of privilege.

* Tool-enforced version number restrictions.

  Keeping our packages in APT repositories adds a few restrictions to
  the version numbers of packages. First, version numbers in the APT
  repository must be unique. That is, you cannot have two different
  packages of the same name and version number. Second, version
  numbers are expected to be monotonically increasing. If a newer
  version of a package had a lower version number than the older
  version, dpkg would consider this a downgrade. Downgrades are not
  supported by dpkg, and will not even be attempted by APT.

  In order to avoid proliferation of version numbers used only for
  testing purposes, we opted to bend the latter rule for our
  development pocket.

* Tool-enforced consistent history.

  In order for the Git history to be meaningful, we chose to require
  that each version of a package that is uploaded into the APT
  repository be a fast-forward of the previous version.

  Again, to simplify and encourage testing, we bend this rule for the
  development pocket as well.

Design
======

Configuration
-------------

For the Invirt_ project's use of the Invirtibuilder, we adapted our
existing configuration mechanism. Our configuration consists of a
single YAML_ file. Here is the snippet of configuration we use for our
build configuration::

  build:
    pockets:
      prod:
        acl: system:xvm-root
        apt: stable
      dev:
        acl: system:xvm-dev
        apt: unstable
        allow_backtracking: yes
    tagger:
      name: Invirt Build Server
      email: invirt@mit.edu

The Invirtibuilder allows naming Invirtibuilder pockets separately
from their corresponding Git branches or APT components. However, if
either the ``git`` or ``apt`` properties of the pocket are
unspecified, they are assumed to be the same as the name of the
pocket.

The ``acl`` attributes for each pocket are interpreted within our
authorization modules to determine who is allowed to request builds on
a given pocket. ``system:xvm-root`` and ``system:xvm-dev`` are the
names of AFS groups, which we use for authorization.

The ``tagger`` attribute indicates the name and e-mail address to be
used whenever the Invirtibuilder generates new Git repository objects,
such as commits or tags.

Finally, it was mentioned in `Background and Goals`_ that we wanted
the ability to not force version number consistency or Git
fast-forwards for our development pocket. The ``allow_backtracking``
attribute was introduced to indicate that preference. When it is set
to ``yes`` (i.e. YAML's "true" value), then neither fast-forwards nor
increasing version numbers are enforced when validating builds. The
attribute is assumed to be false if undefined.
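
The defaulting rules above can be illustrated with a short Python
sketch. The function name ``resolve_pocket`` and the shape of its
return value are assumptions made for illustration and are not taken
from the Invirtibuilder's source; the ``POCKETS`` dictionary stands in
for what a YAML parser would produce from the ``pockets`` section of
the snippet.

```python
# Illustrative sketch only: resolve_pocket is a hypothetical helper,
# not part of the Invirtibuilder's actual code.

def resolve_pocket(name, pockets):
    """Return the effective settings for one pocket, applying defaults."""
    raw = pockets[name]
    return {
        "acl": raw["acl"],
        # Unspecified git/apt properties fall back to the pocket's name.
        "git": raw.get("git", name),
        "apt": raw.get("apt", name),
        # Backtracking is assumed to be off unless explicitly enabled.
        "allow_backtracking": bool(raw.get("allow_backtracking", False)),
    }


# The "pockets" mapping from the configuration snippet, as parsed data
# (YAML's "yes" loads as a boolean true value).
POCKETS = {
    "prod": {"acl": "system:xvm-root", "apt": "stable"},
    "dev": {"acl": "system:xvm-dev", "apt": "unstable",
            "allow_backtracking": True},
}

prod = resolve_pocket("prod", POCKETS)
dev = resolve_pocket("dev", POCKETS)
```

With this configuration, the ``prod`` pocket maps to a Git branch named
``prod`` and the APT component ``stable``, with backtracking disabled.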

Git Repositories
----------------

In order to make it easy to check out all packages at once, and for
version controlling the state of the APT repository, we create a
"superproject" using Git submodules [#]_.

There is one Git branch in the superproject corresponding to each
pocket of development. Each branch contains a submodule for each
package in the corresponding component of the APT repository, and the
submodule commit referred to by the head of the Git branch matches the
revision of the package currently in the corresponding component of
the APT repository. Thus, the heads of the Git superproject match the
state of the components in the APT repository.

Each of the submodules also has a branch for each pocket. The head of
that branch points to the revision of the package that is currently in
the corresponding component of the APT repository. This provides a
convenient branching point for new development. Additionally, there is
a Git tag for every version of the package that has ever been uploaded
to the APT repository.

Because the Invirtibuilder and its associated infrastructure are
responsible for keeping the superproject in sync with the state of the
APT repository, an update hook disallows all pushes to the
superproject.

Pushes to the submodules, on the other hand, are almost entirely
unrestricted. As with the superproject, the Git branches for each
pocket and the Git tags are maintained by the build infrastructure, so
pushes to them are disallowed. Outside of that, we make no
restrictions on the creation or deletion of branches, nor are pushes
required to be fast-forwards.

The Build Queue
---------------

We considered several ways to trigger builds of new package versions
using Git directly. However, we realized that what we actually wanted
was a separate build queue where each build request was handled and
processed independently of any requests before or after it. It's not
possible to have these semantics using Git as a signalling mechanism
without breaking standard assumptions about how remote Git
repositories work.

In order to trigger builds, then, we needed a side-channel. Since it
was already widely used in the Invirt_ project, we chose to use
remctl_, a GSSAPI-authenticated RPC protocol with per-command ACLs.

To trigger a new build, a developer calls remctl against the build
server with a pocket, a package, and a commit ID from that package's
Git repository. The remctl daemon then calls a script which validates
the build and adds it to the build queue. Because of the structure of
remctl's ACLs, we are able to have different ACLs depending on which
pocket the build is destined for. This allows us to fulfil our design
goal of having different ACLs for different pockets.

For simplicity, the queue itself is maintained as a directory of
files, where each file is a queue entry. To maintain order in the
queue, the file names for queue entries are of the form
``YYYYMMDDHHMMSS_XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX``, where ``X``
indicates a random hexadecimal digit. Each file contains the
parameters passed in over remctl (pocket, package, and commit ID to
build), as well as the Kerberos principal of the user that requested
the build, for logging.
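
A directory-based queue of this kind can be sketched in a few lines of
Python. The function names and the one-field-per-line file layout below
are assumptions made for illustration; only the file name format comes
from the description above. The random part is generated with
``uuid.uuid4()``, whose string form has the same 8-4-4-4-12 hexadecimal
shape.

```python
import datetime
import os
import uuid


def queue_entry_name(now=None):
    """Build a name of the form YYYYMMDDHHMMSS_<random UUID>."""
    now = now or datetime.datetime.now()
    return "%s_%s" % (now.strftime("%Y%m%d%H%M%S"), uuid.uuid4())


def enqueue(queue_dir, pocket, package, commit, principal):
    """Write one queue entry and return its path.

    The one-field-per-line layout is a hypothetical choice for this sketch.
    """
    path = os.path.join(queue_dir, queue_entry_name())
    # Mode "x" fails if the file already exists, so an entry is never
    # silently overwritten.
    with open(path, "x") as entry:
        entry.write("\n".join([pocket, package, commit, principal]) + "\n")
    return path


def pending(queue_dir):
    """List queue entries oldest first; the timestamp prefix sorts lexically."""
    return sorted(os.listdir(queue_dir))
```

Because the timestamp has one-second resolution, entries created within
the same second are ordered by their random suffix rather than by
arrival time.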

The Build Daemon
----------------

To actually execute builds, we run a separate daemon to monitor for
new build requests in the build queue. The daemon uses inotify so that
it's triggered whenever a new item is added to the build
queue. Whenever an item in the build queue triggers the build daemon,
the daemon first validates the build, then executes the build, and
finally updates both the APT repository and Git superproject with the
results of the build. The results of all attempted builds are recorded
in a database table for future reference.

Build Validation
````````````````

The first stage of processing a new build request is validating the
build. First, the build daemon checks the version number of the
requested package in each pocket of the repository. If the package is
present in any other pocket with the same version number, but the Git
commit for the package is different, the build errors out, because it
is not possible for an APT repository to contain two different
packages with the same name and version number.

Next, the build daemon checks to make sure that the version number of
the new package is a higher version number than the version currently
in the APT repository, as version numbers must be monotonically
increasing.

Finally, we require new packages to be fast-forwards in Git of the
previous version of the package. This is verified as well.

As mentioned above, the ``allow_backtracking`` attribute can be set
for a pocket to bypass the latter two checks in development
environments.

When the same package with the same version is inserted into multiple
places in the same APT repository, the MD5 hash of the package is used
to validate that it hasn't changed. Because rebuilding the same
package causes the MD5 hash to change, when a version of a package
identical to a version already in the APT repository is added to
another pocket, we need to copy it directly. Since the validation
stage already has all of the necessary information to detect this
case, if the same version of a package is already present in another
pocket, the validation stage returns this information.
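
The three checks and the copy-detection result can be summarized in a
Python sketch. Everything here is an illustrative assumption rather than
the Invirtibuilder's actual code: the function name, the ``published``
mapping, and the two injected callables. In a real deployment,
``version_gt`` could plausibly wrap ``dpkg --compare-versions A gt B``
and ``is_ancestor`` could wrap ``git merge-base --is-ancestor OLD NEW``.

```python
# Illustrative sketch; validate_build and its parameters are hypothetical.

def validate_build(pocket, version, commit, published, allow_backtracking,
                   version_gt, is_ancestor):
    """Validate one build request for a single package.

    published maps pocket name -> (version, commit) for the package as it
    currently exists in each pocket of the APT repository.

    Returns the name of another pocket that already holds this exact
    version (so the built package can be copied), or None if a fresh
    build is needed. Raises ValueError if validation fails.
    """
    copy_from = None

    # Check 1: a version number may not be reused for a different commit.
    for other, (other_version, other_commit) in sorted(published.items()):
        if other == pocket or other_version != version:
            continue
        if other_commit != commit:
            raise ValueError("version %s already exists in pocket %s "
                             "with a different commit" % (version, other))
        copy_from = other

    # Checks 2 and 3 are skipped for pockets that allow backtracking.
    if not allow_backtracking and pocket in published:
        current_version, current_commit = published[pocket]
        if not version_gt(version, current_version):
            raise ValueError("version %s is not newer than %s"
                             % (version, current_version))
        if not is_ancestor(current_commit, commit):
            raise ValueError("commit %s is not a fast-forward of %s"
                             % (commit, current_commit))

    return copy_from
```

Passing the comparison functions in as arguments keeps the sketch
independent of dpkg and Git, which also makes the rules easy to exercise
in isolation.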

Build Execution
```````````````

Once the build has been validated, it can be executed. The requested
version of the package is exported from Git, and then a Debian source
package is generated. Next, the package itself is built using sbuild.

sbuild creates an ephemeral build chroot for each build that has only
essential build packages and the build dependencies for the package
being built installed. We use sbuild for building packages for several
reasons. First, it helps us verify that all necessary build
dependencies have been included in our packages. Second, it helps us
ensure that configuration files haven't been modified from their
upstream defaults (which could cause problems for packages using
config-package-dev_).

The build daemon keeps the build logs from all attempted builds on the
filesystem for later inspection.

Repository Updates
``````````````````

Once the build has been successfully completed, the APT and Git
repositories are updated to match the new state. First, a new tag is
added to the package's Git repository for the current version
[#]_. Next, the pocket tracking branch in the submodule is also
updated with the new version of the package. Then a new commit is
created on the superproject which updates the package's submodule to
point to the new version of the package. Finally, the new version of
the package is included in the appropriate component of the APT
repository.

Because the Git superproject, the Git submodules, and the APT
repository are all updated together to reflect the new package
version, the Git repositories and the APT repository always stay in
sync.

Build Failures
``````````````

If any of the above stages of executing a build fail, that failure is
trapped and recorded along with the build record in the database for
later inspection. Regardless of success or failure, the
build daemon runs any scripts in a hook directory. The hook directory
could contain scripts to publish the results of the build in whatever
way is deemed useful by the developers.
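
A hook runner along these lines might look like the following Python
sketch. The function name and the use of environment variables to pass
build details to hooks are assumptions for illustration; the text above
does not specify how hooks receive information about the build.

```python
import os
import subprocess


def run_hooks(hook_dir, build_info):
    """Run every executable file in hook_dir and collect exit statuses.

    build_info is a dict of extra environment variables describing the
    build (a hypothetical interface chosen for this sketch).
    Returns a dict mapping hook name -> exit status, or None if the
    hook could not be started.
    """
    env = dict(os.environ)
    env.update(build_info)
    results = {}
    for name in sorted(os.listdir(hook_dir)):
        path = os.path.join(hook_dir, name)
        # Skip directories and files without the executable bit.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue
        try:
            results[name] = subprocess.call([path], env=env)
        except OSError:
            # A hook that cannot be started must not stop the others.
            results[name] = None
    return results
```

Calling such a function from a ``finally`` clause around the build steps
would be one way to make sure hooks run whether the build succeeded or
failed.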

Security
========

As noted above, our intent was for a single instance of the
Invirtibuilder to be used for both our trusted production environment
and our untrusted development environment. In order to be trusted for
the production environment, the Invirtibuilder needs to run in the
production environment as well. However, it would be disastrous if
access to the development environment allowed a developer to insert
malicious packages into the production APT repository.

In terms of policy, we enforce this distinction using the remctl ACL
mechanism described in `The Build Queue`_. But is that mechanism on
its own actually secure?

Only mostly, it turns out.

While actual package builds run unprivileged (with the help of the
fakeroot_ tool), packages can declare arbitrary build dependencies
that must be installed for the package build to run. Packages'
maintainer scripts (post-install, pre-install, pre-removal, and
post-removal scripts) run as root. This means that by uploading a
malicious package that another package build-depends on, then
triggering a build of the second package, it is possible to gain root
privileges. Since breaking out of the build chroot as root is trivial
[#]_, it is theoretically possible for developers with any level of
access to the APT repositories to root the build server.

One minor protection from this problem is the Invirtibuilder's
reporting mechanism. A single independent malicious build can't
compromise the build server on its own. Even if a second build
compromises the build server, the first build will have already been
reported through the hook mechanism described in `Build Failures`_. We
encourage users of the Invirtibuilder to include hooks that send
notifications of builds over e-mail or some other mechanism such that
there are off-site records. The server will still be compromised, but
there will be an audit trail.

Such a vulnerability will always be a concern so long as builds are
isolated using chroots. It is possible to protect against this sort of
attack by strengthening the chroot mechanism (e.g. with grsecurity_)
or by using a more isolated build mechanism
(e.g. qemubuilder_). However, we decided that the security risk didn't
justify the additional implementation effort or runtime overhead.

Future Directions
=================

While the Invirtibuilder was written as a tool for the Invirt_
project, taking advantage of infrastructure specific to Invirt, it was
designed with the hope that it could one day be expanded to be useful
outside of our infrastructure. Here we outline what we believe the
next steps for development of the Invirtibuilder are.

One deficiency that affects Invirt_ development already is the
assumption that all packages are Debian-native [#]_. Even for packages
which have a non-native version number, the Invirtibuilder will create
a Debian-native source package when the package is exported from Git
as part of the `Build Execution`_. Correcting this requires a means to
find and extract the upstream tarball from the Git repository. This
could probably be done by involving the pristine-tar_ tool.

The Invirtibuilder is currently tied to the configuration framework
developed for the Invirt_ project. To be useful outside of Invirt, the
Invirtibuilder needs its own, separate mechanism for providing and
parsing configuration. It should not be difficult to use a separate
configuration file but a similar YAML configuration mechanism for the
Invirtibuilder. And of course, as part of that process, filesystem
paths and the like that are currently hard-coded should be replaced
with configuration options.

The Invirtibuilder additionally relies on the authentication and
authorization mechanisms used for Invirt_. Our RPC protocol of choice,
remctl_, requires a functional Kerberos environment for
authentication, limiting its usefulness for one-off projects not
associated with an already existing Kerberos realm. We would like to
provide support for some alternative RPC mechanism, possibly
ssh. Additionally, there needs to be some way to expand the build ACLs
for each pocket that isn't tied to Invirt's authorization
framework. One option would be providing an executable in the
configuration that, when passed a pocket as a command-line argument,
prints out all of the principals that should have access to that
pocket.
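
The executable-based ACL idea could be consumed roughly as in the
following Python sketch. This describes a proposal rather than an
existing feature, so every name here is hypothetical, as is the
assumption that the executable prints one principal per line.

```python
import subprocess


def principals_for_pocket(acl_command, pocket):
    """Run the configured ACL executable and return the allowed principals.

    acl_command is the executable (plus any fixed arguments) as a list;
    the pocket name is appended as the final command-line argument.
    Assumes one principal per line of output.
    """
    output = subprocess.check_output(list(acl_command) + [pocket],
                                     universal_newlines=True)
    return {line.strip() for line in output.splitlines() if line.strip()}


def may_build(acl_command, pocket, principal):
    """Return True if principal is allowed to request builds on pocket."""
    return principal in principals_for_pocket(acl_command, pocket)
```

Because ``check_output`` raises an exception when the executable exits
with a non-zero status, a broken ACL script results in an error instead
of an empty, and possibly misleading, list of principals.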

.. _config-package-dev: http://debathena.mit.edu/config-packages
.. _fakeroot: http://fakeroot.alioth.debian.org/
.. _git-buildpackage: https://honk.sigxcpu.org/piki/projects/git-buildpackage/
.. _grsecurity: http://www.grsecurity.net/
.. _Invirt: http://invirt.mit.edu
.. _pristine-tar: http://joey.kitenet.net/code/pristine-tar/
.. _qemubuilder: http://wiki.debian.org/qemubuilder
.. _remctl: http://www.eyrie.org/~eagle/software/remctl/
.. _SIPB: http://sipb.mit.edu
.. _VCS location information: http://www.debian.org/doc/developers-reference/best-pkging-practices.html#bpp-vcs
.. _YAML: http://yaml.org/

.. [#] http://lwn.net/Articles/246381/
.. [#] A Git submodule is a second Git repository embedded at a
       particular path within the superproject and fixed at a
       particular commit.
.. [#] Because we don't force any sort of version consistency for
       pockets with ``allow_backtracking`` set to ``True``, we don't
       create new tags for builds on pockets with
       ``allow_backtracking`` set to ``True`` either.
.. [#] http://kerneltrap.org/Linux/Abusing_chroot
.. [#] http://people.debian.org/~mpalmer/debian-mentors_FAQ.html#native_vs_non_native