Opened 16 years ago

Closed 15 years ago

#64 closed defect (fixed)

Booting 32-bit guest kernel

Reported by: broder
Owned by: broder
Priority: minor
Milestone: Public Beta
Component: xen
Version:
Keywords:
Cc:

Description (last modified by broder)

A 32-bit guest kernel, even with PAE enabled, panics very early in the boot process. This makes it really annoying to convert a 32-bit install to a ParaVM without a wheelbarrow full of --try-harders.

I've asked a few times in ##xen on irc.freenode.net, and so far nobody has had any good ideas.

Attachments (1)

update-grub.diff (422 bytes) - added by broder 16 years ago.
Patch to cause update-grub to put both Xen and non-Xen kernels in /boot/grub/menu.lst

Change History (11)

comment:1 Changed 16 years ago by broder

  • Milestone set to Public Beta

comment:2 Changed 16 years ago by broder

I started looking through some of the pygrub source code. I can't tell everything that it's doing, but I can follow it through reading the partition table, finding the boot partition, and looking for the grub config file.

But...it doesn't seem to be finding the grub config file. Once it finds a partition, it calls out to a C library (libfsimage), which detects and parses the filesystem and returns the contents of a file given its path. As best as I can tell, the library is broken. At the very least, it doesn't seem to be dealing well with the image for d_broder-test, whose first partition is /.
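
For anyone following along, the first step described above (reading the partition table to find where a partition starts) can be sketched in a few lines of Python. This is not pygrub's actual code, only a minimal illustration of the standard MBR layout; the names parse_mbr, Partition and make_example_sector are made up for this example, and the sample sector is synthetic.

```python
import struct
from typing import List, NamedTuple

SECTOR_SIZE = 512    # classic sector size; assumed here
TABLE_OFFSET = 446   # the MBR partition table starts at byte 446
ENTRY_SIZE = 16      # four 16-byte entries follow


class Partition(NamedTuple):
    index: int          # 1-based slot in the table
    bootable: bool      # status byte is 0x80
    type_id: int        # e.g. 0x83 for a Linux filesystem
    start_offset: int   # byte offset of the partition in the image
    size_bytes: int


def parse_mbr(sector: bytes) -> List[Partition]:
    """Return the non-empty primary partitions found in an MBR sector."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature; not an MBR sector")
    partitions = []
    for slot in range(4):
        start = TABLE_OFFSET + slot * ENTRY_SIZE
        raw = sector[start:start + ENTRY_SIZE]
        # status, CHS start, type, CHS end, LBA start, sector count
        status, _chs_first, type_id, _chs_last, lba_start, count = struct.unpack(
            "<B3sB3sII", raw
        )
        if type_id == 0:  # type 0 marks an unused slot
            continue
        partitions.append(
            Partition(slot + 1, status == 0x80, type_id,
                      lba_start * SECTOR_SIZE, count * SECTOR_SIZE)
        )
    return partitions


def make_example_sector() -> bytes:
    """Build a synthetic MBR with one bootable Linux partition at LBA 63."""
    sector = bytearray(SECTOR_SIZE)
    sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = struct.pack(
        "<B3sB3sII", 0x80, b"\x00\x00\x00", 0x83, b"\x00\x00\x00", 63, 2048
    )
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


if __name__ == "__main__":
    for part in parse_mbr(make_example_sector()):
        print(part)
```

Running something like this against the first 512 bytes of a guest image would at least show whether the partition offsets look sane before blaming the filesystem library.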

comment:3 Changed 16 years ago by broder

Ok - I took a VM whose partition table pygrub could read, and I changed it into a ParaVM by installing the linux-image-2.6.22-14-xen package and manually frobbing /boot/grub/menu.lst so that the Xen kernel was the default. When I then try to boot the VM with a serial console, pygrub accurately reads /boot/grub/menu.lst and prompts me for which kernel I want, and I select the Xen one. I then get this:

Started domain d_fink
[72282.114945] PCI: Fatal: No config space access function found
[72282.238629] i8042.c: No controller found.
[72282.239719] Buffer I/O error on device hda, logical block 0
[72282.239745] BUG: unable to handle kernel paging request at virtual address a33e9cc4
[72282.239751]  printing eip:
[72282.239757] 0152c000 -> *pde = 00000000:00000000
[72282.239761] Oops: 0000 [#1]
[72282.239763] SMP 
[72282.239767] Modules linked in:
[72282.239772] CPU:    0
[72282.239773] EIP:    0061:[<c026e783>]    Not tainted VLI
[72282.239774] EFLAGS: 00010887   (2.6.22-14-xen #1)
[72282.239784] EIP is at blkif_int+0x63/0x1c0
[72282.239788] eax: d4009c00   ebx: c1076ba0   ecx: c03cf140   edx: 03000100
[72282.239792] esi: cf3e0000   edi: 00000000   ebp: c10ff0ac   esp: c03efec0
[72282.239796] ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: e021
[72282.239800] Process swapper (pid: 0, ti=c03ee000 task=c03bb1c0 task.ti=c03ee000)
[72282.239803] Stack: 00000205 c02398af 00000000 00000001 00000002 00000000 03000100 00000001 
[72282.239813]        c1076ba0 00000000 00000000 00000105 c014e384 c03e1680 00000105 00000000 
[72282.239822]        c03e16ac c014fc33 00008280 c03e1680 00000000 00000105 c0106e1b 0775369b 
[72282.239832] Call Trace:
[72282.239837]  [<c02398af>] add_timer_randomness+0x16f/0x190
[72282.239844]  [<c014e384>] handle_IRQ_event+0x24/0x80
[72282.239851]  [<c014fc33>] handle_level_irq+0x83/0x110
[72282.239856]  [<c0106e1b>] do_IRQ+0x3b/0x70
[72282.239861]  [<c02599f6>] evtchn_do_upcall+0xb6/0xf0
[72282.239867]  [<c01057a6>] hypervisor_callback+0x46/0x4e
[72282.239873]  [<c0108120>] xen_safe_halt+0xa0/0xf0
[72282.239878]  [<c0104361>] xen_idle+0x31/0x60
[72282.239883]  [<c01033f8>] cpu_idle+0x68/0xc0
[72282.239887]  [<c03f3ab5>] start_kernel+0x335/0x3d0
[72282.239893]  [<c03f31f0>] unknown_bootoption+0x0/0x260
[72282.239899]  =======================
[72282.239902] Code: 0c 0f 84 03 01 00 00 89 44 24 1c 8b 46 24 83 e8 01 23 44 24 1c 6b c0 6c 8d 68 40 03 6e 28 8b 55 00 69 c2 9c 00 00 00 89 54 24 18 <8b> 94 30 c4 00 00 00 8d 5c 30 58 89 54 24 08 80 7b 01 00 74 1a 
[72282.239947] EIP: [<c026e783>] blkif_int+0x63/0x1c0 SS:ESP e021:c03efec0
[72282.239954] Kernel panic - not syncing: Fatal exception in interrupt

Any of the kernel hackers have brilliant ideas?

comment:4 Changed 16 years ago by broder

Oh - so removing the quiet option from the kernel command line gives the longer but equally uninformative:

Started domain d_fink
[    0.000000] Linux version 2.6.22-14-xen (buildd@terranova) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Tue Feb 12 09:27:26 UTC 2008 (Unofficial)
[    0.000000] Reserving virtual address space above 0xf5800000
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 0000000010800000 (usable)
[    0.000000] 0MB HIGHMEM available.
[    0.000000] 264MB LOWMEM available.
[    0.000000] NX (Execute Disable) protection: active
[74014.471164] Zone PFN ranges:
[74014.471166]   DMA             0 ->    67584
[74014.471168]   Normal      67584 ->    67584
[74014.471170]   HighMem     67584 ->    67584
[74014.471171] early_node_map[1] active PFN ranges
[74014.471173]     0:        0 ->    67584
[74014.476979] ACPI in unprivileged domain disabled
[74014.477536] Allocating PCI resources starting at 20000000 (gap: 10800000:ef800000)
[74014.477594] Built 1 zonelists.  Total pages: 67056
[74014.477598] Kernel command line: root=/dev/mapper/fink-root ro
[74014.477790] Enabling fast FPU save and restore... done.
[74014.477797] Enabling unmasked SIMD FPU exception support... done.
[74014.477800] Initializing CPU#0
[74014.478030] PID hash table entries: 2048 (order: 11, 8192 bytes)
[74014.478060] Xen reported: 2393.640 MHz processor.
[74014.478087] Console: colour dummy device 80x25
[74014.478462] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[74014.478683] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[74014.478736] Software IO TLB disabled
[74014.478743] vmalloc area: d1000000-f53fe000, maxmem 2d800000
[74014.481532] Memory: 238080k/270336k available (2071k kernel code, 23788k reserved, 925k data, 200k init, 0k highmem)
[74014.481557] virtual kernel memory layout:
[74014.481558]     fixmap  : 0xf568f000 - 0xf57ff000   (1472 kB)
[74014.481559]     pkmap   : 0xf5400000 - 0xf5600000   (2048 kB)
[74014.481560]     vmalloc : 0xd1000000 - 0xf53fe000   ( 579 MB)
[74014.481561]     lowmem  : 0xc0000000 - 0xd0800000   ( 264 MB)
[74014.481563]       .init : 0xc03f3000 - 0xc0425000   ( 200 kB)
[74014.481564]       .data : 0xc0305dfe - 0xc03ed544   ( 925 kB)
[74014.481565]       .text : 0xc0100000 - 0xc0305dfe   (2071 kB)
[74014.481584] Checking if this processor honours the WP bit even in supervisor mode... Ok.
[74014.481634] SLUB: Genslabs=22, HWalign=64, Order=0-1, MinObjects=4, CPUs=1, Nodes=1
[74014.546346] Calibrating delay using timer specific routine.. 4807.05 BogoMIPS (lpj=9614119)
[74014.546400] Security Framework v1.0.0 initialized
[74014.546409] SELinux:  Disabled at boot.
[74014.546424] Mount-cache hash table entries: 512
[74014.546556] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[74014.546563] CPU: L2 Cache: 1024K (64 bytes/line)
[74014.546576] Compat vDSO mapped to f57fe000.
[74014.546586] Checking 'hlt' instruction... OK.
[74014.546749] SMP alternatives: switching to UP code
[74014.546927] Freeing SMP alternatives: 11k freed
[74014.547021] Early unpacking initramfs... done
[74014.562610] Brought up 1 CPUs
[74014.563161] NET: Registered protocol family 16
[74014.564180] Brought up 1 CPUs
[74014.564203] PCI: Fatal: No config space access function found
[74014.564208] PCI: setting up Xen PCI frontend stub
[74014.564711] ACPI: Interpreter disabled.
[74014.564778] Linux Plug and Play Support v0.97 (c) Adam Belay
[74014.564862] pnp: PnP ACPI: disabled
[74014.565200] xen_mem: Initialising balloon driver.
[74014.574492] Setting mem allocation to 262144 kiB
[74014.574629] PCI: System does not support PCI
[74014.574635] PCI: System does not support PCI
[74014.574664] NET: Registered protocol family 8
[74014.574668] NET: Registered protocol family 20
[74014.574928] NET: Registered protocol family 2
[74014.577962] Time: xen clocksource has been installed.
[74014.606007] IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
[74014.606061] TCP established hash table entries: 16384 (order: 5, 196608 bytes)
[74014.606197] TCP bind hash table entries: 16384 (order: 5, 131072 bytes)
[74014.606295] TCP: Hash tables configured (established 16384 bind 16384)
[74014.606301] TCP reno registered
[74014.618050] checking if image is initramfs... it is
[74014.641107] Freeing initrd memory: 16548k freed
[74014.641398] audit: initializing netlink socket (disabled)
[74014.641415] audit(1207149973.975:1): initialized
[74014.642684] VFS: Disk quotas dquot_6.5.1
[74014.642723] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[74014.642825] io scheduler noop registered
[74014.642831] io scheduler anticipatory registered
[74014.642835] io scheduler deadline registered
[74014.642848] io scheduler cfq registered (default)
[74014.660321] Real Time Clock Driver v1.12ac
[74014.660732] RAMDISK driver initialized: 16 RAM disks of 65536K size 1024 blocksize
[74014.660864] input: Macintosh mouse button emulation as /class/input/input0
[74014.660913] xencons_init: Initializing xen vfb; pass xencons=tty to prevent this
[74014.661011] Xen virtual console successfully installed as xvc0
[74014.661061] Event-channel device installed.
[74014.677656] netfront: Initialising virtual ethernet driver.
[74014.687546] PNP: No PS/2 controller found. Probing ports directly.
[74014.688374] i8042.c: No controller found.
[74014.688460] mice: PS/2 mouse device common for all mice
[74014.688580] TCP cubic registered
[74014.688596] NET: Registered protocol family 1
[74014.688622] Using IPI No-Shortcut mode
[74014.689117] xen-vbd: registered block device major 3
[74014.689469]  hda:end_request: I/O error, dev hda, sector 0
[74014.689579] Buffer I/O error on device hda, logical block 0
[74014.689607] BUG: unable to handle kernel paging request at virtual address a33e7cc4
[74014.689614]  printing eip:
[74014.689617] c026e783
[74014.689623] 0152c000 -> *pde = 00000000:00000000
[74014.689628] Oops: 0000 [#1]
[74014.689631] SMP 
[74014.689635] Modules linked in:
[74014.689641] CPU:    0
[74014.689641] EIP:    0061:[<c026e783>]    Not tainted VLI
[74014.689643] EFLAGS: 00010887   (2.6.22-14-xen #1)
[74014.689655] EIP is at blkif_int+0x63/0x1c0
[74014.689659] eax: d4009c00   ebx: c107eba0   ecx: c03cf140   edx: 03000100
[74014.689664] esi: cf3de000   edi: 00000000   ebp: c14a30ac   esp: c03efec0
[74014.689669] ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: e021
[74014.689674] Process swapper (pid: 0, ti=c03ee000 task=c03bb1c0 task.ti=c03ee000)
[74014.689678] Stack: 00000205 c02398af 00000000 00000001 00000002 00000000 03000100 00000001 
[74014.689689]        c107eba0 00000000 00000000 00000105 c014e384 c03e1680 00000105 00000000 
[74014.689699]        c03e16ac c014fc33 00008280 c03e1680 00000000 00000105 c0106e1b 07773069 
[74014.689710] Call Trace:
[74014.689715]  [<c02398af>] add_timer_randomness+0x16f/0x190
[74014.689722]  [<c014e384>] handle_IRQ_event+0x24/0x80
[74014.689731]  [<c014fc33>] handle_level_irq+0x83/0x110
[74014.689737]  [<c0106e1b>] do_IRQ+0x3b/0x70
[74014.689742]  [<c02599f6>] evtchn_do_upcall+0xb6/0xf0
[74014.689747]  [<c01057a6>] hypervisor_callback+0x46/0x4e
[74014.689753]  [<c0108120>] xen_safe_halt+0xa0/0xf0
[74014.689758]  [<c0104361>] xen_idle+0x31/0x60
[74014.689763]  [<c01033f8>] cpu_idle+0x68/0xc0
[74014.689767]  [<c03f3ab5>] start_kernel+0x335/0x3d0
[74014.689773]  [<c03f31f0>] unknown_bootoption+0x0/0x260
[74014.689779]  =======================
[74014.689781] Code: 0c 0f 84 03 01 00 00 89 44 24 1c 8b 46 24 83 e8 01 23 44 24 1c 6b c0 6c 8d 68 40 03 6e 28 8b 55 00 69 c2 9c 00 00 00 89 54 24 18 <8b> 94 30 c4 00 00 00 8d 5c 30 58 89 54 24 08 80 7b 01 00 74 1a 
[74014.689830] EIP: [<c026e783>] blkif_int+0x63/0x1c0 SS:ESP e021:c03efec0
[74014.689838] Kernel panic - not syncing: Fatal exception in interrupt
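
The oops is not completely uninformative: the register dump and the Code: line are enough to reproduce the faulting address by hand. The sketch below assumes a hand decoding of two instructions from the Code: bytes, which is worth double-checking with a disassembler: `69 c2 9c 00 00 00` read as `imul eax, edx, 0x9c`, and the faulting `<8b> 94 30 c4 00 00 00` read as `mov edx, [eax+esi+0xc4]`. The register values are copied from the two oopses in this ticket; the helper names are made up for this example.

```python
MASK32 = 0xFFFFFFFF  # registers on a 32-bit kernel wrap at 2**32


def imul32(a: int, b: int) -> int:
    """32-bit truncating multiply, as the assumed imul would produce."""
    return (a * b) & MASK32


def effective_address(base: int, index: int, disp: int) -> int:
    """base + index + displacement, wrapped to 32 bits."""
    return (base + index + disp) & MASK32


# Register values and faulting addresses copied from the two oopses above.
OOPSES = [
    {"eax": 0xD4009C00, "esi": 0xCF3E0000, "edx": 0x03000100, "fault": 0xA33E9CC4},
    {"eax": 0xD4009C00, "esi": 0xCF3DE000, "edx": 0x03000100, "fault": 0xA33E7CC4},
]


def explains_fault(oops: dict) -> bool:
    """True if the assumed decoding reproduces eax and the faulting address."""
    eax_ok = imul32(oops["edx"], 0x9C) == oops["eax"]
    addr_ok = effective_address(oops["eax"], oops["esi"], 0xC4) == oops["fault"]
    return eax_ok and addr_ok


if __name__ == "__main__":
    for oops in OOPSES:
        print(hex(oops["fault"]), explains_fault(oops))
```

Under that decoding, both panics come from the same thing: blkif_int multiplies the value in edx (0x03000100 both times) by 0x9c and uses the result as an offset from esi. A value that large does not look like a sensible array index, so one possible reading is that the handler picked up a garbage index from whatever it read just before. That is only a guess from the register dump; nothing in this ticket confirms the cause.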

comment:5 Changed 16 years ago by broder

Ok - It seems that this kernel panic only happens with a 32-bit kernel. ParaVMs will boot fine with pygrub if they're running a 64-bit kernel - even with a 32-bit userland.

I'm really hitting the wall of my knowledge, though - it'd be great if someone else could help debug this.

comment:6 Changed 16 years ago by andersk

  • Owner hartmans deleted
  • Status changed from new to assigned

comment:7 Changed 16 years ago by broder

  • Owner set to broder

Changed 16 years ago by broder

Patch to cause update-grub to put both Xen and non-Xen kernels in /boot/grub/menu.lst

comment:8 Changed 16 years ago by broder

  • Priority changed from minor to trivial

An Ubuntu machine needs the following packages installed from the amd64 repository:

  • linux-headers-2.6.22-14
  • linux-headers-2.6.22-14-xen
  • linux-image-2.6.22-14-xen
  • linux-restricted-modules-2.6.22-14-xen
  • linux-restricted-modules-common
  • linux-ubuntu-modules-2.6.22-14-xen

(Or equivalents for other versions of Ubuntu)

The packages needed for Debian are similar, but probably don't have "ubuntu" in their names.

For a 32-bit machine, download the packages from packages.ubuntu.com and install them with dpkg -i --force-all.
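
The package list and the install step above can be written down as a small Python sketch. It assumes that other Ubuntu releases follow the same naming pattern with a different kernel ABI string, which this ticket has only checked for 2.6.22-14; the function names and the .deb paths are hypothetical. The script only builds the dpkg command line and prints it, it does not run anything.

```python
from typing import List


def xen_kernel_packages(abi: str) -> List[str]:
    """Package names from the list above, parameterised on the kernel ABI.

    Assumption: other releases use the same names with a different ABI string.
    """
    return [
        f"linux-headers-{abi}",
        f"linux-headers-{abi}-xen",
        f"linux-image-{abi}-xen",
        f"linux-restricted-modules-{abi}-xen",
        "linux-restricted-modules-common",
        f"linux-ubuntu-modules-{abi}-xen",
    ]


def dpkg_force_install_command(deb_paths: List[str]) -> List[str]:
    """Argument vector for the forced install described for 32-bit machines."""
    if not deb_paths:
        raise ValueError("no .deb files given")
    return ["dpkg", "-i", "--force-all", *deb_paths]


if __name__ == "__main__":
    names = xen_kernel_packages("2.6.22-14")
    print("\n".join(names))
    # Hypothetical local paths; the real file names carry version and arch suffixes.
    debs = [f"./{name}.deb" for name in names]
    print(" ".join(dpkg_force_install_command(debs)))
```

Keeping the command as a list rather than a string makes it easy to hand to subprocess.run later without shell quoting problems.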

The stock version of update-grub will only configure Xen kernels if currently booted in ParaVM mode, and only non-Xen kernels otherwise. The attached patch will alter update-grub to include all kernels. You should check and make sure the right kernel is selected by default for your application.
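
Checking which kernel ends up as the default can be done by eye, but a quick script helps when converting several VMs. The sketch below parses a GRUB legacy menu.lst just far enough to find the entry that the default line points at. The sample file contents are hypothetical (modelled on the kernel names in this ticket), and the parser deliberately ignores everything except default, title and kernel lines.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical menu.lst with both a non-Xen and a Xen kernel entry.
SAMPLE_MENU_LST = """\
# sample only, not taken from a real machine
default 1
timeout 3

title Ubuntu, kernel 2.6.22-14-generic
root (hd0,0)
kernel /boot/vmlinuz-2.6.22-14-generic root=/dev/mapper/fink-root ro quiet
initrd /boot/initrd.img-2.6.22-14-generic

title Ubuntu, kernel 2.6.22-14-xen
root (hd0,0)
kernel /boot/vmlinuz-2.6.22-14-xen root=/dev/mapper/fink-root ro
initrd /boot/initrd.img-2.6.22-14-xen
"""


def parse_menu_lst(text: str) -> Tuple[Optional[int], List[Dict[str, str]]]:
    """Return (default index or None, list of entries with title and kernel)."""
    default: Optional[int] = 0  # GRUB legacy boots entry 0 if no default is set
    entries: List[Dict[str, str]] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Commands are separated from their argument by whitespace or '='.
        parts = re.split(r"[ \t=]+", stripped, maxsplit=1)
        key = parts[0].lower()
        value = parts[1] if len(parts) > 1 else ""
        if key == "default":
            # "default saved" is not a number; report it as unknown.
            default = int(value) if value.isdigit() else None
        elif key == "title":
            entries.append({"title": value, "kernel": ""})
        elif key == "kernel" and entries:
            entries[-1]["kernel"] = value
    return default, entries


def default_entry(text: str) -> Optional[Dict[str, str]]:
    """Return the entry selected by the default line, if it can be resolved."""
    default, entries = parse_menu_lst(text)
    if default is None or default >= len(entries):
        return None
    return entries[default]


if __name__ == "__main__":
    entry = default_entry(SAMPLE_MENU_LST)
    print(entry["title"] if entry else "default entry could not be resolved")
```

For a ParaVM, the kernel line of the resolved entry should name the -xen image; for an HVM guest it should not.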

Additionally, if you have a 32-bit machine and you force the 64-bit packages onto it, module-assistant won't work, because the mismatched architectures confuse it. Fortunately, the amd64 Xen packages for OpenAFS from Debathena seem to work, and I don't depend on anything else that would need module-assistant.

comment:9 Changed 16 years ago by broder

  • Description modified (diff)
  • Priority changed from trivial to minor
  • Summary changed from Converting HVM to ParaVM to Booting 32-bit guest kernel
  • Type changed from enhancement to defect

comment:10 Changed 15 years ago by broder

  • Resolution set to fixed
  • Status changed from assigned to closed

This apparently magically works on the new prod cluster
