============================
Design of The Invirtibuilder
============================

Introduction
============

The Invirtibuilder is an automated Debian package builder, APT
repository manager, and Git repository hosting tool. It is intended
for projects that consist of a series of Debian packages each tracked
as a separate Git repository, and designed to keep the Git and APT
repositories in sync with each other. The Invirtibuilder supports
having multiple threads, or "pockets" of development, and can enforce
different access control and repository consistency rules for each
pocket.

Background and Goals
====================

The Invirtibuilder was originally developed for Invirt_, a project of
the MIT SIPB_. When we went to develop a tool for managing our APT and
Git repositories, we had several goals, each of which informed the
design of the Invirtibuilder:

* One Git repository per Debian package.

  Because of how Git tracks history, it's better suited for tracking a
  series of small repositories, as opposed to one large one
  [#]_. Furthermore, most pre-existing tools and techniques for
  dealing with Debian packages in Git repositories (such as
  git-buildpackage_ or `VCS location information`_) are designed
  exclusively for this case.

* Synchronization between Git and APT repositories.

  In our previous development models, we would frequently merge
  development into trunk without necessarily being ready to deploy it
  to our APT repository (and by extension, our servers) yet. However,
  once the changes had been merged in, it was no longer possible to
  see the current state of the APT repository purely from inspection
  of the source control repository.

* Support for multiple *pockets* of development.

  For the Invirt_ project, we maintain separate production and
  development environments. Initially, they shared a single APT
  repository. To test changes, we had to upload them to that APT
  repository, install the update on our development cluster, and
  simply hold off on taking the update on our production cluster until
  testing was complete. When designing the Invirtibuilder, we wanted
  the set of packages available to our development cluster to be
  separate from the packages in the production cluster.

* Different ACLs for different pockets.

  Access to our development cluster is relatively unrestricted: we
  freely grant access to interested developers to encourage
  contributions to the project. Our production cluster, on the other
  hand, has a much higher standard of security, and access is limited
  to the core maintainers of the service. The Invirtibuilder needed to
  support that separation of privilege.

* Tool-enforced version number restrictions.

  Keeping our packages in APT repositories adds a few restrictions to
  the version numbers of packages. First, version numbers in the APT
  repository must be unique. That is, you cannot have two different
  packages of the same name and version number. Second, version
  numbers are expected to be monotonically increasing. If a newer
  version of a package had a lower version number than the older
  version, dpkg would consider this a downgrade. Downgrades are not
  supported by dpkg, and will not even be attempted by APT.

  In order to avoid proliferation of version numbers used only for
  testing purposes, we opted to bend the latter rule for our
  development pocket.

* Tool-enforced consistent history.

  In order for the Git history to be meaningful, we chose to require
  that each version of a package that is uploaded into the APT
  repository be a fast-forward of the previous version.

  Again, to simplify and encourage testing, we bend this rule for the
  development pocket as well.

Design
======

Configuration
-------------

For the Invirt_ project's use of the Invirtibuilder, we adapted our
existing configuration mechanism. Our configuration consists of a
single YAML_ file. Here is the snippet of configuration we use for our
build configuration::

  build:
    pockets:
      prod:
        acl: system:xvm-root
        apt: stable
      dev:
        acl: system:xvm-dev
        apt: unstable
        allow_backtracking: yes
    tagger:
      name: Invirt Build Server
      email: invirt@mit.edu

The Invirtibuilder allows naming Invirtibuilder pockets separately
from their corresponding Git branches or APT components. However, if
either the ``git`` or ``apt`` properties of the pocket are
unspecified, they are assumed to be the same as the name of the
pocket.

The ``acl`` attributes for each pocket are interpreted within our
authorization modules to determine who is allowed to request builds on
a given pocket. ``system:xvm-root`` and ``system:xvm-dev`` are the
names of AFS groups, which we use for authorization.

The ``tagger`` attribute indicates the name and e-mail address to be
used whenever the Invirtibuilder generates new Git repository objects,
such as commits or tags.

Finally, as mentioned in `Background and Goals`_, we wanted the
ability to not force version number consistency or Git fast-forwards
for our development pocket. The ``allow_backtracking`` attribute was
introduced to indicate that preference. When it is set to ``yes``
(i.e. YAML's "true" value), neither fast-forwards nor increasing
version numbers are enforced when validating builds. The attribute is
assumed to be false if undefined.
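
The defaulting rules above can be sketched in Python. This is an
illustration only, not the Invirtibuilder's own code:
``resolve_pocket`` is a hypothetical helper, and the ``pockets``
dictionary is assumed to be what a YAML parser would hand back for the
snippet shown earlier, with ``yes`` already read as a boolean true.

```python
def resolve_pocket(name, settings):
    """Return a pocket's effective settings (hypothetical helper).

    ``git`` and ``apt`` fall back to the pocket's own name, and
    ``allow_backtracking`` falls back to False, as described above.
    """
    settings = settings or {}
    return {
        "acl": settings.get("acl"),
        "git": settings.get("git", name),
        "apt": settings.get("apt", name),
        "allow_backtracking": bool(settings.get("allow_backtracking", False)),
    }


# Assumed result of parsing the ``pockets`` section of the configuration.
pockets = {
    "prod": {"acl": "system:xvm-root", "apt": "stable"},
    "dev": {
        "acl": "system:xvm-dev",
        "apt": "unstable",
        "allow_backtracking": True,
    },
}

resolved = {name: resolve_pocket(name, cfg) for name, cfg in pockets.items()}
```

With this configuration, both pockets use a Git branch named after the
pocket (``prod`` and ``dev``), since neither sets ``git`` explicitly,
while their APT components are ``stable`` and ``unstable``.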

Git Repositories
----------------

In order to make it easy to check out all packages at once, and for
version controlling the state of the APT repository, we create a
"superproject" using Git submodules [#]_.

There is one Git branch in the superproject corresponding to each
pocket of development. Each branch contains a submodule for each
package in the corresponding component of the APT repository, and the
submodule commit referred to by the head of the Git branch matches the
revision of the package currently in the corresponding component of
the APT repository. Thus, the heads of the Git superproject match the
state of the components in the APT repository.

Each of the submodules also has a branch for each pocket. The head of
that branch points to the revision of the package that is currently in
the corresponding component of the APT repository. This provides a
convenient branching point for new development. Additionally, there is
a Git tag for every version of the package that has ever been uploaded
to the APT repository.

Because the Invirtibuilder and its associated infrastructure are
responsible for keeping the superproject in sync with the state of the
APT repository, an update hook disallows all pushes to the
superproject.

Pushes to the submodules, on the other hand, are almost entirely
unrestricted. As with the superproject, the Git branches for each
pocket and the Git tags are maintained by the build infrastructure, so
pushes to them are disallowed. Outside of that, we make no
restrictions on the creation or deletion of branches, nor are pushes
required to be fast-forwards.

The Build Queue
---------------

We considered several ways to trigger builds of new package versions
using Git directly. However, we realized that what we actually wanted
was a separate build queue where each build request was handled and
processed independently of any requests before or after it. It's not
possible to have these semantics using Git as a signalling mechanism
without breaking standard assumptions about how remote Git
repositories work.

In order to trigger builds, then, we needed a side-channel. Since it
was already widely used in the Invirt_ project, we chose to use
remctl_, a GSSAPI-authenticated RPC protocol with per-command ACLs.

To trigger a new build, a developer calls remctl against the build
server with a pocket, a package, and a commit ID from that package's
Git repository. The remctl daemon then calls a script which validates
the build and adds it to the build queue. Because of the structure of
remctl's ACLs, we are able to have different ACLs depending on which
pocket the build is destined for. This allows us to fulfil our design
goal of having different ACLs for different pockets.

For simplicity, the queue itself is maintained as a directory of
files, where each file is a queue entry. To maintain order in the
queue, the file names for queue entries are of the form
``YYYYMMDDHHMMSS_XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX``, where ``X``
indicates a random hexadecimal digit. Each file contains the
parameters passed in over remctl (pocket, package, and commit ID to
build), as well as the Kerberos principal of the user that requested
the build, for logging.
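
As a rough sketch of the naming scheme described above, the following
Python generates a file name of that shape and shows why a plain
lexicographic sort recovers submission order. The function names are
hypothetical, and both the use of UTC and of a version 4 UUID for the
random suffix are assumptions, not details taken from the
Invirtibuilder itself.

```python
import uuid
from datetime import datetime, timezone


def queue_entry_name(now=None):
    """Build a name like YYYYMMDDHHMMSS_<random 8-4-4-4-12 hex groups>."""
    # Assumption: timestamps are taken in UTC.
    now = now or datetime.now(timezone.utc)
    # str(uuid.uuid4()) has the 8-4-4-4-12 hexadecimal layout.
    return "%s_%s" % (now.strftime("%Y%m%d%H%M%S"), uuid.uuid4())


def in_queue_order(names):
    """Sort entries oldest first.

    The timestamp prefix has a fixed width, so sorting the names as
    strings orders them by submission time; the random suffix only
    breaks ties within the same second.
    """
    return sorted(names)
```

A consumer would then process ``in_queue_order(os.listdir(queue_dir))``
front to back, where ``queue_dir`` stands for wherever the queue
directory lives.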

The Build Daemon
----------------

To actually execute builds, we run a separate daemon to monitor for
new build requests in the build queue. The daemon uses inotify so that
it's triggered whenever a new item is added to the build
queue. Whenever an item in the build queue triggers the build daemon,
the daemon first validates the build, then executes the build, and
finally updates both the APT repository and Git superproject with the
results of the build. The results of all attempted builds are recorded
in a database table for future reference.

Build Validation
````````````````

The first stage of processing a new build request is validating the
build. First, the build daemon checks the version number of the
requested package in each pocket of the repository. If the package is
present in any other pocket with the same version number, but the Git
commit for the package is different, the build errors out, because it
is not possible for an APT repository to contain two different
packages with the same name and version number.

Next, the build daemon checks to make sure that the version number of
the new package is a higher version number than the version currently
in the APT repository, as version numbers must be monotonically
increasing.

Finally, we require new packages to be fast-forwards in Git of the
previous version of the package. This is verified as well.

As mentioned above, the ``allow_backtracking`` attribute can be set
for a pocket to bypass the latter two checks in development
environments.

When the same package with the same version is inserted into multiple
places in the same APT repository, the MD5 hash of the package is used
to validate that it hasn't changed. Because rebuilding the same
package causes the MD5 hash to change, when a version of a package
identical to a version already in the APT repository is added to
another pocket, we need to copy it directly. Since the validation
stage already has all of the necessary information to detect this
case, if the same version of a package is already present in another
pocket, the validation stage returns this information.
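
The checks above can be summarized in a short Python sketch. It is a
simplified model rather than the daemon's actual code: the function
name, the ``published`` mapping, and the ``InvalidBuild`` exception are
hypothetical. Version comparison and the fast-forward test are passed
in as callables; in practice these might wrap ``dpkg
--compare-versions`` and ``git merge-base --is-ancestor``, but that
choice is an assumption as well.

```python
class InvalidBuild(Exception):
    """Raised when a build request fails validation (hypothetical)."""


def validate_build(pocket, version, commit, published,
                   allow_backtracking, version_newer, is_fast_forward):
    """Validate one build request for a single package.

    ``published`` maps pocket name -> (version, commit) for the package
    as it currently sits in the APT repository. Returns the name of
    another pocket that already holds this exact version, so the built
    package can be copied instead of rebuilt, or None.
    """
    copy_from = None
    for other, (other_version, other_commit) in published.items():
        if other == pocket or other_version != version:
            continue
        # Same name and version elsewhere: it must be the same commit.
        if other_commit != commit:
            raise InvalidBuild("version %s already exists in %s with a "
                               "different commit" % (version, other))
        copy_from = other

    current = published.get(pocket)
    if current is not None and not allow_backtracking:
        current_version, current_commit = current
        if not version_newer(version, current_version):
            raise InvalidBuild("version number must increase")
        if not is_fast_forward(current_commit, commit):
            raise InvalidBuild("commit is not a fast-forward")
    return copy_from
```

Returning the source pocket from the same pass keeps the copy-versus-rebuild
decision next to the uniqueness check that discovers it.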

Build Execution
```````````````

Once the build has been validated, it can be executed. The requested
version of the package is exported from Git, and then a Debian source
package is generated. Next, the package itself is built using sbuild.

sbuild creates an ephemeral build chroot for each build that has only
essential build packages and the build dependencies for the package
being built installed. We use sbuild for building packages for several
reasons. First, it helps us verify that all necessary build
dependencies have been included in our packages. Second, it helps us
ensure that configuration files haven't been modified from their
upstream defaults (which could cause problems for packages using
config-package-dev_).

The build daemon keeps the build logs from all attempted builds on the
filesystem for later inspection.

Repository Updates
``````````````````

Once the build has been successfully completed, the APT and Git
repositories are updated to match the new state. First, a new tag is
added to the package's Git repository for the current version
[#]_. Next, the pocket tracking branch in the submodule is also
updated with the new version of the package. Then a new commit is
created on the superproject which updates the package's submodule to
point to the new version of the package. Finally, the new version of
the package is included in the appropriate component of the APT
repository.

Because the Git superproject, the Git submodules, and the APT
repository are all updated simultaneously to reflect the new package
version, the Git repositories and the APT repository always stay in
sync.

Build Failures
``````````````

If any of the above stages of executing a build fails, that failure is
trapped and recorded along with the build record in the database for
later inspection. Regardless of success or failure, the build daemon
runs any scripts in a hook directory. The hook directory could contain
scripts to publish the results of the build in whatever way is deemed
useful by the developers.
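
One way such a hook directory could be driven is sketched below. The
function name and the idea of passing build details as command-line
arguments are assumptions made for illustration; the text above does
not say how the Invirtibuilder invokes its hooks.

```python
import os
import subprocess


def run_hooks(hook_dir, args):
    """Run every executable file in hook_dir, in name order.

    ``args`` is a hypothetical list of strings describing the build
    (for example pocket, package, and outcome). Returns a mapping of
    hook name -> exit status so that one failing hook does not stop
    the others from running.
    """
    results = {}
    for name in sorted(os.listdir(hook_dir)):
        path = os.path.join(hook_dir, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            results[name] = subprocess.call([path] + list(args))
    return results
```

Because the hooks run whether the build succeeded or failed, the caller
would invoke this from a ``finally`` clause or an equivalent construct.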

Security
========

As noted above, our intent was for a single instance of the
Invirtibuilder to be used for both our trusted production environment
and our untrusted development environment. In order to be trusted for
the production environment, the Invirtibuilder needs to run in the
production environment as well. However, it would be disastrous if
access to the development environment allowed a developer to insert
malicious packages into the production APT repository.

In terms of policy, we enforce this distinction using the remctl ACL
mechanism described in `The Build Queue`_. But is that mechanism on
its own actually secure?

Only mostly, it turns out.

While actual package builds run unprivileged (with the help of the
fakeroot_ tool), packages can declare arbitrary build dependencies
that must be installed for the package build to run. Packages'
maintainer scripts (post-install, pre-install, pre-removal, and
post-removal scripts) run as root. This means that by uploading a
malicious package that another package build-depends on, then
triggering a build of the second package, it is possible to gain root
privileges. Since breaking out of the build chroot as root is trivial
[#]_, it is theoretically possible for developers with any level of
access to the APT repositories to root the build server.

One minor protection against this problem is the Invirtibuilder's
reporting mechanism. A single independent malicious build can't
compromise the build server on its own. Even if a second build
compromises the build server, the first build will have already been
reported through the hook mechanism described in `Build Failures`_. We
encourage users of the Invirtibuilder to include hooks that send
notifications of builds over e-mail or some other mechanism such that
there are off-site records. The server will still be compromised, but
there will be an audit trail.

Such a vulnerability will always be a concern so long as builds are
isolated using chroots. It is possible to protect against this sort of
attack by strengthening the chroot mechanism (e.g. with grsecurity_)
or by using a more isolated build mechanism
(e.g. qemubuilder_). However, we decided that the security risk didn't
justify the additional implementation effort or runtime overhead.

Future Directions
=================

While the Invirtibuilder was written as a tool for the Invirt_
project, taking advantage of infrastructure specific to Invirt, it was
designed with the hope that it could one day be expanded to be useful
outside of our infrastructure. Here we outline what we believe the
next steps for development of the Invirtibuilder are.

One deficiency that affects Invirt_ development already is the
assumption that all packages are Debian-native [#]_. Even for packages
which have a non-native version number, the Invirtibuilder will create
a Debian-native source package when the package is exported from Git
as part of the `Build Execution`_. Correcting this requires a means to
find and extract the upstream tarball from the Git repository. This
could probably be done by involving the pristine-tar_ tool.

The Invirtibuilder is currently tied to the configuration framework
developed for the Invirt_ project. To be useful outside of Invirt, the
Invirtibuilder needs its own, separate mechanism for providing and
parsing configuration. It should not be difficult to keep a similar
YAML configuration mechanism while reading it from a separate
configuration file. And of course, as part of that process, filesystem
paths and the like that are currently hard-coded should be replaced
with configuration options.

The Invirtibuilder additionally relies on the authentication and
authorization mechanisms used for Invirt_. Our RPC protocol of choice,
remctl_, requires a functional Kerberos environment for
authentication, limiting its usefulness for one-off projects not
associated with an already existing Kerberos realm. We would like to
provide support for some alternative RPC mechanism, possibly
ssh. Additionally, there needs to be some way to expand the build ACLs
for each pocket that isn't tied to Invirt's authorization
framework. One option would be providing an executable in the
configuration that, when passed a pocket as a command-line argument,
prints out all of the principals that should have access to that
pocket.

.. _config-package-dev: http://debathena.mit.edu/config-packages
.. _fakeroot: http://fakeroot.alioth.debian.org/
.. _git-buildpackage: https://honk.sigxcpu.org/piki/projects/git-buildpackage/
.. _grsecurity: http://www.grsecurity.net/
.. _Invirt: http://invirt.mit.edu
.. _pristine-tar: http://joey.kitenet.net/code/pristine-tar/
.. _qemubuilder: http://wiki.debian.org/qemubuilder
.. _remctl: http://www.eyrie.org/~eagle/software/remctl/
.. _SIPB: http://sipb.mit.edu
.. _VCS location information: http://www.debian.org/doc/developers-reference/best-pkging-practices.html#bpp-vcs
.. _YAML: http://yaml.org/

.. [#] http://lwn.net/Articles/246381/
.. [#] A Git submodule is a second Git repository embedded at a
       particular path within the superproject and fixed at a
       particular commit.
.. [#] Because we don't force any sort of version consistency for
       pockets with ``allow_backtracking`` set to ``True``, we don't
       create new tags for builds on pockets with
       ``allow_backtracking`` set to ``True`` either.
.. [#] http://kerneltrap.org/Linux/Abusing_chroot
.. [#] http://people.debian.org/~mpalmer/debian-mentors_FAQ.html#native_vs_non_native