.. Source: trunk/packages/invirt-dev/README.invirtibuilder @ 2950
   (last changed in r2870, checked in by broder)

============================
Design of The Invirtibuilder
============================

Introduction
============

The Invirtibuilder is an automated Debian package builder, APT
repository manager, and Git repository hosting tool. It is intended
for projects that consist of a series of Debian packages each tracked
as a separate Git repository, and designed to keep the Git and APT
repositories in sync with each other. The Invirtibuilder supports
having multiple threads, or "pockets" of development, and can enforce
different access control and repository consistency rules for each
pocket.

Background and Goals
====================

The Invirtibuilder was originally developed for Invirt_, a project of
the MIT SIPB_. When we went to develop a tool for managing our APT and
Git repositories, we had several goals, each of which informed the
design of the Invirtibuilder:

* One Git repository per Debian package.

  Because of how Git tracks history, it's better suited for tracking a
  series of small repositories, as opposed to one large one
  [#]_. Furthermore, most preexisting tools and techniques for dealing
  with Debian packages in Git repositories (such as git-buildpackage_
  or `VCS location information`_) are designed exclusively for this
  case.

* Synchronization between Git and APT repositories.

  In our previous development models, we would frequently merge
  development into trunk without necessarily being ready to deploy it
  to our APT repository (and by extension, our servers) yet. However,
  once the changes had been merged in, it was no longer possible to
  see the current state of the APT repository purely from inspection
  of the source control repository.

* Support for multiple *pockets* of development.

  For the Invirt_ project, we maintain separate production and
  development environments. Initially, they each shared the same APT
  repository. To test changes, we had to install them into the APT
  repository and install the update on our development cluster, and
  simply wait to take the update on our production cluster until
  testing was completed. When designing the Invirtibuilder, we wanted
  the set of packages available to our development cluster to be
  separate from the packages in the production cluster.

* Different ACLs for different pockets.

  Access to our development cluster is relatively unrestricted: we
  freely grant access to interested developers to encourage
  contributions to the project. Our production cluster, on the other
  hand, has a much higher standard of security, and access is limited
  to the core maintainers of the service. The Invirtibuilder needed to
  support that separation of privilege.

* Tool-enforced version number restrictions.

  Keeping our packages in APT repositories adds a few restrictions to
  the version numbers of packages. First, version numbers in the APT
  repository must be unique. That is, you cannot have two different
  packages of the same name and version number. Second, version
  numbers are expected to be monotonically increasing. If a newer
  version of a package had a lower version number than the older
  version, dpkg would consider this a downgrade. Downgrades are not
  supported by dpkg, and will not even be attempted by APT.

  In order to avoid proliferation of version numbers used only for
  testing purposes, we opted to bend the latter rule for our
  development pocket.

* Tool-enforced consistent history.

  In order for the Git history to be meaningful, we chose to require
  that each version of a package that is uploaded into the APT
  repository be a fast-forward of the previous version.

  Again, to simplify and encourage testing, we bend this rule for the
  development pocket as well.

Design
======

Configuration
-------------

For the Invirt_ project's use of the Invirtibuilder, we adapted our
existing configuration mechanism. Our configuration file consists of a
single YAML_ file. Here is the snippet of configuration we use for our
build configuration::

 build:
  pockets:
   prod:
    acl: system:xvm-root
    apt: stable
   dev:
    acl: system:xvm-dev
    apt: unstable
    allow_backtracking: yes
  tagger:
   name: Invirt Build Server
   email: invirt@mit.edu

The Invirtibuilder allows naming Invirtibuilder pockets separately
from their corresponding Git branches or APT components. However, if
either the ``git`` or ``apt`` properties of the pocket are
unspecified, they are assumed to be the same as the name of the
pocket.

The ``acl`` attributes for each pocket are interpreted within our
authorization modules to determine who is allowed to request builds on
a given pocket. ``system:xvm-root`` and ``system:xvm-dev`` are the
names of AFS groups, which we use for authorization.

The ``tagger`` attribute indicates the name and e-mail address to be
used whenever the Invirtibuilder generates new Git repository objects,
such as commits or tags.

Finally, it was mentioned in `Background and Goals`_ that we wanted
the ability to not force version number consistency or Git
fast-forwards for our development pocket. The ``allow_backtracking``
attribute was introduced to indicate that preference. When it is set
to ``yes`` (i.e. YAML's "true" value), then neither fast-forwards nor
increasing-version-numbers are enforced when validating builds. The
attribute is assumed to be false if undefined.
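
The defaulting rules described above can be sketched in a few lines of
Python. This is only an illustration, not the Invirtibuilder's actual
code: ``resolve_pocket`` and ``BUILD_CONFIG`` are hypothetical names,
and the dictionary simply mirrors the YAML snippet as it would look
after parsing.

```python
# Hypothetical, already-parsed form of the "build" section shown above.
BUILD_CONFIG = {
    "pockets": {
        "prod": {"acl": "system:xvm-root", "apt": "stable"},
        "dev": {
            "acl": "system:xvm-dev",
            "apt": "unstable",
            "allow_backtracking": True,  # YAML "yes" parses to true
        },
    },
    "tagger": {"name": "Invirt Build Server", "email": "invirt@mit.edu"},
}


def resolve_pocket(build_config, pocket):
    """Return the effective settings for one pocket, with defaults applied."""
    settings = build_config["pockets"][pocket]
    return {
        "acl": settings["acl"],
        # Git branch and APT component default to the pocket's own name.
        "git": settings.get("git", pocket),
        "apt": settings.get("apt", pocket),
        # Treated as false when the attribute is absent.
        "allow_backtracking": bool(settings.get("allow_backtracking", False)),
    }
```

With this sketch, ``resolve_pocket(BUILD_CONFIG, "prod")`` yields the
Git branch ``prod``, the APT component ``stable``, and backtracking
disabled.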

Git Repositories
----------------

In order to make it easy to check out all packages at once, and for
version controlling the state of the APT repository, we create a
"superproject" using Git submodules [#]_.

There is one Git branch in the superproject corresponding to each
pocket of development. Each branch contains a submodule for each
package in the corresponding component of the APT repository, and the
submodule commit referred to by the head of the Git branch matches the
revision of the package currently in the corresponding component of
the APT repository. Thus, the heads of the Git superproject match the
state of the components in the APT repository.

Each of the submodules also has a branch for each pocket. The head of
that branch points to the revision of the package that is currently in
the corresponding component of the APT repository. This provides a
convenient branching point for new development. Additionally, there is
a Git tag for every version of the package that has ever been uploaded
to the APT repository.

Because the Invirtibuilder and its associated infrastructure are
responsible for keeping the superproject in sync with the state of the
APT repository, an update hook disallows all pushes to the
superproject.

Pushes to the submodules, on the other hand, are almost entirely
unrestricted. As with the superproject, the Git branches for each
pocket and Git tags are maintained by the build infrastructure, so
pushes to them are disallowed. Outside of that, we make no
restrictions on the creation or deletion of branches, nor are pushes
required to be fast-forwards.
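
The push policy for a package repository can be illustrated with a
small predicate. Git invokes an ``update`` hook once per pushed ref,
passing the ref name as the first argument, and rejects the push when
the hook exits non-zero. The function below is a hypothetical sketch of
the decision such a hook would make, not the hook the Invirtibuilder
ships; ``submodule_push_allowed`` is an invented name.

```python
def submodule_push_allowed(refname, pocket_branches):
    """Decide whether a push to `refname` in a package repository is allowed.

    `pocket_branches` is the set of Git branch names that track pockets.
    """
    # Tags are created only by the build infrastructure.
    if refname.startswith("refs/tags/"):
        return False
    # Pocket tracking branches are also maintained by the builder.
    prefix = "refs/heads/"
    if refname.startswith(prefix) and refname[len(prefix):] in pocket_branches:
        return False
    # Every other ref may be created, deleted, or rewound freely.
    return True
```

A hook for the superproject would be simpler still, since it rejects
every push unconditionally.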

The Build Queue
---------------

We considered several ways to trigger builds of new package versions
using Git directly. However, we realized that what we actually wanted
was a separate build queue where each build request was handled and
processed independently of any requests before or after it. It's not
possible to have these semantics using Git as a signaling mechanism
without breaking standard assumptions about how remote Git
repositories work.

In order to trigger builds, then, we needed a side-channel. Since it
was already widely used in the Invirt_ project, we chose to use
remctl_, a GSSAPI-authenticated RPC protocol with per-command ACLs.

To trigger a new build, a developer calls remctl against the build
server with a pocket, a package, and a commit ID from that package's
Git repository. The remctl daemon then calls a script which validates
the build and adds it to the build queue. Because of the structure of
remctl's ACLs, we are able to have different ACLs depending on which
pocket the build is destined for. This allows us to fulfill our design
goal of having different ACLs for different pockets.

For simplicity, the queue itself is maintained as a directory of
files, where each file is a queue entry. To maintain order in the
queue, the file names for queue entries are of the form
``YYYYMMDDHHMMSS_XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX``, where ``X``
indicates a random hexadecimal digit. Each file contains the
parameters passed in over remctl (pocket, package, and commit ID to
build), as well as the Kerberos principal of the user that requested
the build, for logging.
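
One way to produce file names of this shape is to join a timestamp
with a random UUID, whose text form has the same
``8-4-4-4-12`` hexadecimal layout. The sketch below is an assumption
about how such names could be generated, not a description of the
actual script; the function names and the choice of UTC are
hypothetical.

```python
import uuid
from datetime import datetime, timezone


def queue_entry_name(now=None):
    """Build a queue file name: a 14-digit timestamp, '_', then a random UUID."""
    if now is None:
        now = datetime.now(timezone.utc)  # time zone is an assumption
    return "%s_%s" % (now.strftime("%Y%m%d%H%M%S"), uuid.uuid4())


def in_queue_order(names):
    """Order entries for processing; the timestamp prefix sorts oldest first."""
    return sorted(names)
```

Because the timestamp comes first, a plain lexicographic sort of the
directory listing recovers submission order. Two requests that arrive
within the same second are ordered by their random suffix.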

The Build Daemon
----------------

To actually execute builds, we run a separate daemon to monitor for
new build requests in the build queue. The daemon uses inotify so that
it's triggered whenever a new item is added to the build
queue. Whenever an item in the build queue triggers the build daemon,
the daemon first validates the build, then executes the build, and
finally updates both the APT repository and Git superproject with the
results of the build. The results of all attempted builds are recorded
in a database table for future reference.

Build Validation
````````````````

The first stage of processing a new build request is validating the
build. First, the build daemon checks the version number of the
requested package in each pocket of the repository. If the package is
present in any other pocket with the same version number, but the Git
commit for the package is different, the build errors out, because it
is not possible for an APT repository to contain two different
packages with the same name and version number.

Next, the build daemon checks to make sure that the version number of
the new package is a higher version number than the version currently
in the APT repository, as version numbers must be monotonically
increasing.

Finally, we require new packages to be fast-forwards in Git of the
previous version of the package. This is verified as well.

As mentioned above, the ``allow_backtracking`` attribute can be set
for a pocket to bypass the latter two checks in development
environments.

When the same package with the same version is inserted into multiple
places in the same APT repository, the MD5 hash of the package is used
to validate that it hasn't changed. Because rebuilding the same
package causes the MD5 hash to change, when a version of a package
identical to a version already in the APT repository is added to
another pocket, we need to copy it directly. Since the validation
stage already has all of the necessary information to detect this
case, if the same version of a package is already present in another
pocket, the validation stage returns this information.
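
The three checks and the copy detection can be summarized in one
function. This is a sketch under stated assumptions rather than the
daemon's real code: ``validate_build`` and ``InvalidBuild`` are
hypothetical names, the repository state is passed in as a plain
dictionary, and the Debian version comparison and the Git ancestry test
are supplied by the caller as ``version_newer`` and
``is_fast_forward``.

```python
class InvalidBuild(Exception):
    """Raised when a build request fails validation."""


def validate_build(pocket, version, commit, published, allow_backtracking,
                   version_newer, is_fast_forward):
    """Validate one request for a single package.

    `published` maps pocket name -> (version, commit) currently in APT.
    Returns the name of a pocket that already holds this exact build
    (so it can be copied rather than rebuilt), or None.
    """
    copy_from = None
    for other, (other_version, other_commit) in published.items():
        if other == pocket or other_version != version:
            continue
        if other_commit != commit:
            # Same name and version, different contents: not allowed.
            raise InvalidBuild("version %s already used in %s" % (version, other))
        copy_from = other

    current = published.get(pocket)
    if current is not None and not allow_backtracking:
        current_version, current_commit = current
        if not version_newer(version, current_version):
            raise InvalidBuild("version number does not increase")
        if not is_fast_forward(current_commit, commit):
            raise InvalidBuild("commit is not a fast-forward")
    return copy_from
```

In practice the two callbacks could be backed by
``dpkg --compare-versions`` and ``git merge-base --is-ancestor``
respectively, though the source does not say which tools the daemon
uses for these checks.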

Build Execution
```````````````

Once the build has been validated, it can be executed. The requested
version of the package is exported from Git, and then a Debian source
package is generated. Next, the package itself is built using sbuild.

sbuild creates an ephemeral build chroot for each build that has only
essential build packages and the build dependencies for the package
being built installed. We use sbuild for building packages for several
reasons. First, it helps us verify that all necessary build
dependencies have been included in our packages. Second, it helps us
ensure that configuration files haven't been modified from their
upstream defaults (which could cause problems for packages using
config-package-dev_).

The build daemon keeps the build logs from all attempted builds on the
filesystem for later inspection.

Repository Updates
``````````````````

Once the build has been successfully completed, the APT and Git
repositories are updated to match the new state. First, a new tag is
added to the package's Git repository for the current version
[#]_. Next, the pocket tracking branch in the submodule is also
updated with the new version of the package. Then a new commit is
created on the superproject which updates the package's submodule to
point to the new version of the package. Finally, the new version of
the package is included in the appropriate component of the APT
repository.

Because the Git superproject, the Git submodules, and the APT
repository are all updated simultaneously to reflect the new package
version, the Git repositories and the APT repository always stay in
sync.
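
The order of these updates, including the tagging exception for
pockets that allow backtracking, can be written down as a simple plan.
The function below is purely illustrative: ``plan_repository_updates``
and the step labels are hypothetical, and for brevity it uses the
pocket name where the real system would look up the configured Git
branch and APT component.

```python
def plan_repository_updates(package, version, commit, pocket, allow_backtracking):
    """List the repository updates for a successful build, in order."""
    steps = []
    if not allow_backtracking:
        # Tags are only created where version consistency is enforced.
        steps.append(("tag", package, version, commit))
    # Move the pocket tracking branch in the package's repository.
    steps.append(("update-pocket-branch", package, pocket, commit))
    # Commit the new submodule revision on the superproject branch.
    steps.append(("commit-superproject", pocket, package, commit))
    # Publish the built package in the APT repository.
    steps.append(("include-in-apt", pocket, package, version))
    return steps
```

Keeping the plan as data makes the ordering easy to inspect before any
repository is touched.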

Build Failures
``````````````

If any of the above stages of executing a build fails, that failure is
trapped and recorded along with the build record in the database for
later inspection. Regardless of success or failure, the build daemon
runs any scripts in a hook directory. The hook directory could contain
scripts to publish the results of the build in whatever way is deemed
useful by the developers.
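
The control flow described in this section, where failures are trapped,
the outcome is recorded, and hooks run either way, might look roughly
like the following. It is a minimal sketch with hypothetical names
(``process_request`` and its callback parameters); the real daemon's
structure may differ.

```python
def process_request(request, validate, build, update_repos, record, run_hooks):
    """Run one build request through every stage, trapping failures."""
    result = {"request": request, "succeeded": False,
              "failed_stage": None, "error": None}
    stage = None
    try:
        for stage, step in (("validate", validate),
                            ("build", build),
                            ("update", update_repos)):
            step(request)
        result["succeeded"] = True
    except Exception as exc:
        # Remember which stage failed instead of crashing the daemon.
        result["failed_stage"] = stage
        result["error"] = str(exc)
    # The outcome is stored and hooks run regardless of success.
    record(result)
    run_hooks(result)
    return result
```

A hook that mails ``result`` to an off-site address would run for
successful and failed builds alike.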

Security
========

As noted above, our intent was for a single instance of the
Invirtibuilder to be used for both our trusted production environment
and our untrusted development environment. In order to be trusted for
the production environment, the Invirtibuilder needs to run in the
production environment as well. However, it would be disastrous if
access to the development environment allowed a developer to insert
malicious packages into the production APT repository.

In terms of policy, we enforce this distinction using the remctl ACL
mechanism described in `The Build Queue`_. But is that mechanism on
its own actually secure?

Only mostly, it turns out.

While actual package builds run unprivileged (with the help of the
fakeroot_ tool), packages can declare arbitrary build dependencies
that must be installed for the package build to run. Packages'
maintainer scripts (post-install, pre-install, pre-removal, and
post-removal scripts) run as root. This means that by uploading a
malicious package that another package build-depends on, then
triggering a build of the second package, it is possible to gain root
privileges. Since breaking out of the build chroot as root is trivial
[#]_, it is theoretically possible for developers with any level of
access to the APT repositories to root the build server.

One minor protection from this problem is the Invirtibuilder's
reporting mechanism. A single independent malicious build can't
compromise the build server on its own. Even if a second build
compromises the build server, the first build will have already been
reported through the hook mechanism described in `Build Failures`_. We
encourage users of the Invirtibuilder to include hooks that send
notifications of builds over e-mail or some other mechanism such that
there are off-site records. The server will still be compromised, but
there will be an audit trail.

Such a vulnerability will always be a concern so long as builds are
isolated using chroots. It is possible to protect against this sort of
attack by strengthening the chroot mechanism (e.g. with grsecurity_)
or by using a more isolated build mechanism
(e.g. qemubuilder_). However, we decided that the security risk didn't
justify the additional implementation effort or runtime overhead.

Future Directions
=================

While the Invirtibuilder was written as a tool for the Invirt_
project, taking advantage of infrastructure specific to Invirt, it was
designed with the hope that it could one day be expanded to be useful
outside of our infrastructure. Here we outline what we believe the
next steps for development of the Invirtibuilder are.

One deficiency that affects Invirt_ development already is the
assumption that all packages are Debian-native [#]_. Even for packages
which have a non-native version number, the Invirtibuilder will create
a Debian-native source package when the package is exported from Git
as part of the `Build Execution`_. Correcting this requires a means to
find and extract the upstream tarball from the Git repository. This
could probably be done by involving the pristine-tar_ tool.

The Invirtibuilder is currently tied to the configuration framework
developed for the Invirt_ project. To be useful outside of Invirt, the
Invirtibuilder needs its own, separate mechanism for providing and
parsing configuration. It should not be difficult to use a separate
configuration file but a similar YAML configuration mechanism for the
Invirtibuilder. And of course, as part of that process, filesystem
paths and the like that are currently hard-coded should be replaced
with configuration options.

The Invirtibuilder additionally relies on the authentication and
authorization mechanisms used for Invirt_. Our RPC protocol of choice,
remctl_, requires a functional Kerberos environment for
authentication, limiting its usefulness for one-off projects not
associated with an already existing Kerberos realm. We would like to
provide support for some alternative RPC mechanism, possibly
ssh. Additionally, there needs to be some way to expand the build ACLs
for each pocket that isn't tied to Invirt's authorization
framework. One option would be providing an executable in the
configuration that, when passed a pocket as a command-line argument,
prints out all of the principals that should have access to that
pocket.
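
To make the last idea concrete, such an ACL executable could be as
small as the sketch below. Nothing like it exists in the
Invirtibuilder today; the table, the principal names, and the function
names are all hypothetical, and a real deployment would consult
whatever authorization source the site already uses.

```python
import sys

# Hypothetical static mapping from pocket to authorized principals.
POCKET_PRINCIPALS = {
    "prod": ["alice@EXAMPLE.COM"],
    "dev": ["alice@EXAMPLE.COM", "bob@EXAMPLE.COM"],
}


def principals_for(pocket):
    """Return the principals allowed to request builds on `pocket`."""
    return POCKET_PRINCIPALS.get(pocket, [])


def main(argv):
    """Print one principal per line for the pocket named in argv[1]."""
    if len(argv) != 2:
        print("usage: %s POCKET" % argv[0], file=sys.stderr)
        return 2
    for principal in principals_for(argv[1]):
        print(principal)
    return 0
```

A script entry point would call ``sys.exit(main(sys.argv))``; an
unknown pocket prints nothing, which grants access to no one.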

.. _config-package-dev: http://debathena.mit.edu/config-packages
.. _fakeroot: http://fakeroot.alioth.debian.org/
.. _git-buildpackage: https://honk.sigxcpu.org/piki/projects/git-buildpackage/
.. _grsecurity: http://www.grsecurity.net/
.. _Invirt: http://invirt.mit.edu
.. _pristine-tar: http://joey.kitenet.net/code/pristine-tar/
.. _qemubuilder: http://wiki.debian.org/qemubuilder
.. _remctl: http://www.eyrie.org/~eagle/software/remctl/
.. _SIPB: http://sipb.mit.edu
.. _VCS location information: http://www.debian.org/doc/developers-reference/best-pkging-practices.html#bpp-vcs
.. _YAML: http://yaml.org/

.. [#] http://lwn.net/Articles/246381/
.. [#] A Git submodule is a second Git repository embedded at a
       particular path within the superproject and fixed at a
       particular commit.
.. [#] Because we don't force any sort of version consistency for
       pockets with ``allow_backtracking`` enabled, we don't create
       new tags for builds on those pockets either.
.. [#] http://kerneltrap.org/Linux/Abusing_chroot
.. [#] http://people.debian.org/~mpalmer/debian-mentors_FAQ.html#native_vs_non_native