This part describes the GRUB internals so that developers can understand the implementation and start to hack GRUB. Of course, the source code has the complete information, so refer to it when you are not satisfied with this documentation.
GRUB is broken into 2 distinct components, or stages, which are loaded at different times in the boot process. The Stage 1 has to know where to find Stage 2, and the Stage 2 has to know where to find its configuration file (if Stage 2 doesn't have a configuration file, it drops into the command line interface and waits for a user command).
Here is the memory map of the various components (1):
See the file `stage2/shared.h' for more information.
GRUB's stage1 and stage2 have embedded variables whose locations are well-defined, so that the installation can patch the binary file directly without recompilation of the modules.
In stage1, these are defined (the number in parentheses for each entry is an offset):
jmp command to the starting address of the component loaded by the stage1. A stage1.5 should be loaded at address 0x2000, and a stage2 should be loaded at address 0x8000. Both use a CS of 0.
In the first sector of stage1.5 and stage2, the blocklists are recorded between firstlist (0x200) and lastlist (determined when assembling the file `stage2/start.S').
The trick here is that the blocklists are actually read backward. The first 8-byte blocklist is not read at firstlist itself: the pointer is first decremented by 8 bytes and the blocklist at that address is read, then the pointer is decremented again, the next blocklist is read, and so on until the list is finished. The terminating condition is that the number of sectors to be read in the next blocklist is 0.
The format of a blocklist can be seen from the example in the code just before the firstlist label. Note that it is always from the beginning of the disk, and not relative to the partition boundaries.
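The backward walk described above can be sketched in C. This is only an illustration: the entry layout (a 4-byte start sector, a 2-byte sector count and a 2-byte load segment) and the names `blocklist_entry` and `blocklist_total_sectors` are assumptions made for this example, so check `stage2/start.S' for the real layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one 8-byte blocklist entry (an assumption). */
struct blocklist_entry {
    uint32_t start; /* start sector, counted from the beginning of the disk */
    uint16_t len;   /* number of sectors to read; 0 terminates the list */
    uint16_t seg;   /* segment at which the sectors are loaded */
};

static_assert(sizeof(struct blocklist_entry) == 8, "entry must be 8 bytes");

/* Walk the list backward from `firstlist` and return the total number
   of sectors it describes.  `firstlist` points just past the first
   entry, so the pointer is decremented before every read. */
static uint32_t blocklist_total_sectors(const struct blocklist_entry *firstlist)
{
    const struct blocklist_entry *p = firstlist;
    uint32_t total = 0;

    for (;;) {
        p--;             /* step back 8 bytes, then read */
        if (p->len == 0) /* a zero sector count ends the walk */
            break;
        total += p->len;
    }
    return total;
}
```

A real loader would read `len` sectors starting at `start` on each step instead of only summing the counts.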
In stage1.5 and stage2 (these are all defined at the beginning of `shared_src/asm.S'):
device:filename (`:' is not present actually). device is an unsigned long like install_partition, and filename is an absolute filename or a blocklist. If device is disabled, that is, the drive number is 0xff, then stage1.5 uses the boot drive and the install partition instead.
For any particular partition, it is presumed that only one of the normal filesystems such as FAT, FFS, or ext2fs can be used, so there is a switch table managed by the functions in `disk_io.c'. The convention is that only one filesystem can be mounted at a time.
The blocklist filesystem has a special place in the system. In addition to the normal filesystem (or even without one mounted), you can access disk blocks directly (in the indicated partition) via the blocklist notation. Using the blocklist filesystem doesn't affect any other filesystem mounts.
The variables which can be read by the filesystem backend are:
current_drive
current_partition
current_slice
saved_drive
saved_partition
part_start
part_length
print_possibilities
True when the dir function should print the possible completions of a file, and false when it should try to actually open a file of that name.
FSYS_BUF
The variables which need to be written by a filesystem backend are:
filepos
filemax
disk_read_func
NULL at all other times (it will be NULL by default). If this isn't done correctly, then the testload and install commands won't work correctly.
The functions expected to be used by the filesystem backend are:
devread
grub_read
grub_read can be used, after setting block_file to 1.
The functions expected to be defined by the filesystem backend are described at least moderately in the file `filesys.h'. Their usage is fairly evident from their use in the functions in `disk_io.c'; look for the use of the fsys_table array.
Caution: The semantics are such that, when `mount'ing the filesystem, you must presume the filesystem buffer FSYS_BUF is corrupted, and (re-)load all important contents. When opening and reading a file, presume that the data from the `mount' is available and doesn't get corrupted by the open/read (i.e. multiple opens and/or reads will be done with only one mount if in the same filesystem).
The disk space that can be used for a boot loader is very restricted, because an MBR (see section The structure of Master Boot Record) is only 512 bytes and it also contains a partition table (see section The format of partition table) and a BPB. So the question is how to make the boot loader code small enough to fit in an MBR.
However, GRUB is a very large program, so we break GRUB into 2 (or 3) distinct components, stage1 and stage2 (and optionally stage1.5). See section The memory map of various components, for more information.
We embed stage1 in an MBR or in the boot sector of a partition, and place stage2 in a filesystem. The optional stage1.5 can be installed in a filesystem, in the boot loader area of an FFS, or in the sectors right after an MBR, because stage1.5 is small enough and the sectors right after an MBR are normally an unused region. The size of this region is the number of sectors per track minus 1.
Thus, all stage1 must do is load a stage2 or a stage1.5. But even though stage1 need not support the user interface or the filesystem interface, it is impossible to make stage1 smaller than 400 bytes, because GRUB should support both the CHS mode and the LBA mode (see section INT 13H disk I/O interrupts).
The solution used by GRUB is that stage1 loads only the first sector of a stage2 (or a stage1.5) and stage2 itself loads the rest. The flow of stage1 is:
The flow of stage2 (and stage1.5) is:
Note that stage2 (or stage1.5) does not probe the geometry or the accessing mode of the loading drive, since stage1 has already probed them.
In the PC world, BIOS cannot detect if a hard disk drive is SCSI or IDE, generally speaking. Thus, it is not trivial to know which BIOS drive corresponds to an OS device. So the Multiboot Specification describes some techniques on how to guess mappings (see section `BIOS device mapping techniques' in The Multiboot Specification).
However, the techniques described are unreliable or difficult to implement, so GRUB uses a different one: INT 13H tracking. More precisely, it runs the INT 13 call (see section INT 13H disk I/O interrupts) in single-step mode, just like a debugger, and parses the instructions.
To execute the call one instruction at a time, set the TF (trap flag) bit in the FLAGS register. The CPU then generates a single-step trap after each instruction is executed and calls INT 1. On entry to the interrupt handler, the stack holds the interrupted code's FLAGS and the far pointer to the next instruction to be executed, so we can know which instruction will be executed next and the current contents of all the registers. If the next instruction is an I/O operation, the interrupt handler adds the I/O port to the I/O map.
When the INT 13 handler returns, the TF flag is cleared automatically by the iret instruction, and the I/O map is then output on the screen.
See the source code for the command ioprobe (see section Command-line and menu entry commands) for more information.
There are three BIOS calls which return information about installed RAM. GRUB uses these calls to detect all installed RAM and to determine how each address range should be treated by operating systems.
Real mode only.
This call returns a memory map of all the installed RAM, and of physical memory ranges reserved by the BIOS. The address map is returned by making successive calls to this API, each returning one "run" of physical address information. Each run has a type which dictates how this run of physical address range should be treated by the operating system.
If the information returned from INT 15h, AX=E820h in some way differs from INT 15h, AX=E801h (see section INT 15H, AX=E801h interrupt call) or INT 15h AH=88h (see section INT 15H, AX=88h interrupt call), then the information returned from E820h supersedes what is returned from these older interfaces. This allows the BIOS to return whatever information it wishes to for compatibility reasons.
Input:
EAX | Function Code | E820h |
EBX | Continuation | Contains the continuation value to get the next run of physical memory. This is the value returned by a previous call to this routine. If this is the first call, EBX must contain zero. |
ES:DI | Buffer Pointer | Pointer to an Address Range Descriptor structure which the BIOS is to fill in. |
ECX | Buffer Size | The length in bytes of the structure passed to the BIOS. The BIOS will fill in at most ECX bytes of the structure. |
EDX | Signature | `SMAP' - Used by the BIOS to verify the caller is requesting the system map information to be returned in ES:DI. |
Output:
CF | Carry Flag | Non-Carry - indicates no error |
EAX | Signature | `SMAP' - Signature to verify correct BIOS revision. |
ES:DI | Buffer Pointer | Returned Address Range Descriptor pointer. Same value as on input. |
ECX | Buffer Size | Number of bytes returned by the BIOS in the address range descriptor. The minimum size structure returned by the BIOS is 20 bytes. |
EBX | Continuation | Contains the continuation value to get the next address descriptor. The actual significance of the continuation value is up to the discretion of the BIOS. The caller must pass the continuation value unchanged as input to the next iteration of the E820h call in order to get the next Address Range Descriptor. A return value of zero means that this is the last descriptor. Note that the BIOS indicates that the last valid descriptor has been returned by either returning a zero as the continuation value, or by returning carry. |
The Address Range Descriptor Structure is:
Offset in Bytes | Name | Description |
0 | BaseAddrLow | Low 32 Bits of Base Address |
4 | BaseAddrHigh | High 32 Bits of Base Address |
8 | LengthLow | Low 32 Bits of Length in Bytes |
12 | LengthHigh | High 32 Bits of Length in Bytes |
16 | Type | Address type of this range |
The BaseAddrLow and BaseAddrHigh together are the 64 bit BaseAddress of this range. The BaseAddress is the physical address of the start of the range being specified.
The LengthLow and LengthHigh together are the 64 bit Length of this range. The Length is the physical contiguous length in bytes of a range being specified.
The Type field describes the usage of the described address range as defined in the table below:
Value | Mnemonic | Description |
1 | AddressRangeMemory | This run is available RAM usable by the operating system. |
2 | AddressRangeReserved | This run of addresses is in use or reserved by the system, and must not be used by the operating system. |
Other | Undefined | Undefined - Reserved for future use. Any range of this type must be treated by the OS as if the type returned was AddressRangeReserved. |
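As an illustration, the 20-byte descriptor and its type codes can be written as a C structure. This is a minimal sketch, not GRUB's own definition: the structure, the enum and the helper names `ard_base`, `ard_length` and `ard_usable` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Type codes from the table above. */
enum {
    ADDRESS_RANGE_MEMORY = 1,
    ADDRESS_RANGE_RESERVED = 2
};

/* Address Range Descriptor as laid out in the structure table. */
struct address_range_descriptor {
    uint32_t base_addr_low;  /* offset 0 */
    uint32_t base_addr_high; /* offset 4 */
    uint32_t length_low;     /* offset 8 */
    uint32_t length_high;    /* offset 12 */
    uint32_t type;           /* offset 16 */
};

static_assert(sizeof(struct address_range_descriptor) == 20,
              "descriptor must be 20 bytes");

/* Combine the low and high halves into the 64-bit BaseAddress. */
static uint64_t ard_base(const struct address_range_descriptor *d)
{
    return ((uint64_t)d->base_addr_high << 32) | d->base_addr_low;
}

/* Combine the low and high halves into the 64-bit Length. */
static uint64_t ard_length(const struct address_range_descriptor *d)
{
    return ((uint64_t)d->length_high << 32) | d->length_low;
}

/* Only type 1 is usable RAM; every other value, including undefined
   ones, is treated as reserved. */
static bool ard_usable(const struct address_range_descriptor *d)
{
    return d->type == ADDRESS_RANGE_MEMORY;
}
```

Treating every type other than 1 as reserved follows the rule given for undefined values in the table.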
The BIOS can use the AddressRangeReserved address range type to block out various addresses as not suitable for use by a programmable device.
Some of the reasons a BIOS would do this are:
Here is the list of assumptions and limitations:
Here we explain an example address map. This sample address map describes a machine which has 128 MB RAM, 640K of base memory and 127 MB extended. The base memory has 639K available for the user and 1K for an extended BIOS data area. There is a 4 MB Linear Frame Buffer (LFB) based at 12 MB. The memory hole created by the chipset is from 8 M to 16 M. There are memory mapped APIC devices in the system. The IO Unit is at FEC00000 and the Local Unit is at FEE00000. The system BIOS is remapped to 4G - 64K.
Note that the 639K endpoint of the first memory range is also the base memory size reported in the BIOS data segment at 40:13.
Key to types: ARM is AddressRangeMemory, ARR is AddressRangeReserved.
Base (Hex) | Length | Type | Description |
0000 0000 | 639K | ARM | Available Base memory - typically the same value as is returned via the INT 12 function. |
0009 FC00 | 1K | ARR | Memory reserved for use by the BIOS(s). This area typically includes the Extended BIOS data area. |
000F 0000 | 64K | ARR | System BIOS. |
0010 0000 | 7M | ARM | Extended memory, this is not limited to the 64MB address range. |
0080 0000 | 8M | ARR | Chipset memory hole required to support the LFB mapping at 12 MB. |
0100 0000 | 120M | ARM | Base board RAM relocated above a chipset memory hole. |
FEC0 0000 | 4K | ARR | IO APIC memory mapped I/O at FEC00000. Note the range of addresses required for an APIC device may vary from one motherboard manufacturer to another. |
FEE0 0000 | 4K | ARR | Local APIC memory mapped I/O at FEE00000. |
FFFF 0000 | 64K | ARR | Remapped System BIOS at end of address space. |
The following code segment is intended to describe the algorithm needed when calling the Query System Address Map function. It is an implementation example and uses non-standard mechanisms.
E820Present = FALSE;
Regs.ebx = 0;
do
  {
    Regs.eax = 0xE820;
    Regs.es = SEGMENT (&Descriptor);
    Regs.di = OFFSET (&Descriptor);
    Regs.ecx = sizeof (Descriptor);
    Regs.edx = 'SMAP';

    _int (0x15, Regs);

    if ((Regs.eflags & EFLAGS_CARRY) || Regs.eax != 'SMAP')
      {
        break;
      }

    if (Regs.ecx < 20 || Regs.ecx > sizeof (Descriptor))
      {
        /* bug in bios - all returned descriptors must be
           at least 20 bytes long, and can not be larger
           than the input buffer. */
        break;
      }

    E820Present = TRUE;
    .
    .
    .
    Add address range Descriptor.BaseAddress through
    Descriptor.BaseAddress + Descriptor.Length
    as type Descriptor.Type
    .
    .
    .
  }
while (Regs.ebx != 0);

if (! E820Present)
  {
    .
    .
    .
    call INT 15H, AX E801h and/or INT 15H, AH=88h to obtain old style
    memory information
    .
    .
    .
  }
Real mode only.
Originally defined for EISA servers, this interface is capable of reporting up to 4 GB of RAM. While not nearly as flexible as E820h, it is present in many more systems.
Input:
AX | Function Code | E801h. |
Output:
CF | Carry Flag | Non-Carry - indicates no error. |
AX | Extended 1 | Number of contiguous KB between 1 and 16 MB, maximum 0x3C00 = 15 MB. |
BX | Extended 2 | Number of contiguous 64KB blocks between 16 MB and 4 GB. |
CX | Configured 1 | Number of contiguous KB between 1 and 16 MB, maximum 0x3C00 = 15 MB. |
DX | Configured 2 | Number of contiguous 64KB blocks between 16 MB and 4 GB. |
It is unclear what the difference between the Extended and Configured numbers is, but they appear to be identical as reported by the BIOS.
It is possible for a machine using this interface to report a memory hole just under 16 MB (the first count, in AX or CX, is less than 15 MB, but the second count, in BX or DX, is non-zero).
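The two counts use different units, so combining them takes a little arithmetic. The following sketch shows one way to turn the AX and BX values into a byte count and to spot the memory hole mentioned above; the function names are hypothetical and are not taken from the GRUB source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert the E801h counts to bytes of memory above 1 MB.
   kb_below_16m is AX (units of 1 KB); blocks_above_16m is BX
   (units of 64 KB). */
static uint64_t e801_extended_bytes(uint16_t kb_below_16m,
                                    uint16_t blocks_above_16m)
{
    return (uint64_t)kb_below_16m * 1024u
         + (uint64_t)blocks_above_16m * 65536u;
}

/* A hole just under 16 MB shows up as a first count below its 15 MB
   maximum (0x3C00 KB) together with a non-zero second count. */
static bool e801_has_hole_below_16m(uint16_t kb_below_16m,
                                    uint16_t blocks_above_16m)
{
    return kb_below_16m < 0x3C00 && blocks_above_16m != 0;
}
```

For a machine with 15 MB below 16 MB and 112 MB above it, AX would be 0x3C00 and BX would be 0x0700, giving 127 MB of extended memory.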
Real mode only.
This interface is quite primitive. It returns a single value for contiguous memory above 1 MB. The biggest limitation is that the value returned is a 16-bit value, in KB, so it has a maximum saturation of just under 64 MB even presuming it returns as much as it can. On some systems, it won't return anything above the 16 MB boundary.
The one useful point is that it works on every PC available.
Input:
AH | Function Code | 88h |
Output:
CF | Carry Flag | Non-Carry - indicates no error. |
AX | Memory Count | Number of contiguous KB above 1 MB. |
In the PC world, living with the BIOS disk interface is definitely a nightmare. This section documents how awful the chaos is and how GRUB deals with the BIOS disks.
CHS -- Cylinder/Head/Sector -- is the traditional way to address sectors on a disk. There are at least two types of CHS addressing; the CHS that is used at the INT 13H interface and the CHS that is used at the ATA device interface. In the MFM/RLL/ESDI and early ATA days the CHS used at the INT 13H interface was the same as the CHS used at the device interface.
Today we have CHS translating BIOS types that can use one CHS at the INT 13H interface and a different CHS at the device interface. These two types of CHS will be called the logical CHS or L-CHS and the physical CHS or P-CHS in this section. L-CHS is the CHS used at the INT 13H interface and P-CHS is the CHS used at the device interface.
The L-CHS used at the INT 13 interface allows up to 256 heads, up to 1024 cylinders and up to 63 sectors. This allows support of up to 8GB drives. This scheme started with either ESDI or SCSI adapters many years ago.
The P-CHS used at the device interface allows up to 16 heads, up to 65535 cylinders, and up to 63 sectors. This allows access to about 2^26 sectors (32GB) on an ATA device. When a P-CHS is used at the INT 13H interface it is limited to 1024 cylinders, 16 heads and 63 sectors. This is where the old 528MB limit originated.
LBA -- Logical Block Address -- is another way of addressing sectors that uses a simple numbering scheme starting with zero as the address of the first sector on a device. The ATA standard requires that cylinder 0, head 0, sector 1 address the same sector as addressed by LBA 0. LBA addressing can be used at the ATA interface if the ATA device supports it. LBA addressing is also used at the INT 13H interface by the AH=4xH read/write calls.
ATA devices may also support LBA at the device interface. LBA allows access to approximately 2^28 sectors (137GB) on an ATA device.
A SCSI host adapter can convert an L-CHS directly to an LBA used in the SCSI read/write commands. On a PC today, SCSI is also limited to 8GB when CHS addressing is used at the INT 13H interface.
First, all OS's that want to be co-resident with another OS (and that is all of the PC based OS's that we know of) must use INT 13H to determine the capacity of a hard disk. And that capacity information must be determined in L-CHS mode. Why is this? Because:
During the boot processing, all of the disk read accesses are done in L-CHS mode via INT 13H and this includes loading the first of the OS's kernel code or boot manager's code.
Second, because there can be multiple BIOS types in any one system, each drive may be under the control of a different type of BIOS. For example, drive 80H (the first hard drive) could be controlled by the original system BIOS, drive 81H (the second drive) could be controlled by an option ROM BIOS and drive 82H (the third drive) could be controlled by a software driver. Also, be aware that each drive could be a different type, for example, drive 80H could be an MFM drive, drive 81H could be an ATA drive, drive 82H could be a SCSI drive.
Third, not all OS's understand or use BIOS drive numbers greater than 81H. Even if there is INT 13H support for drives 82H or greater, the OS may not use that support.
Fourth, the BIOS INT 13H configuration calls are:
An ATA disk must implement both CHS and LBA addressing and must at any given time support only one P-CHS at the device interface. And, the drive must maintain a strict relationship between the sector addressing in CHS mode and LBA mode. Quoting the ATA-2 document:
LBA = ( (cylinder * heads_per_cylinder + heads ) * sectors_per_track ) + sector - 1
where heads_per_cylinder and sectors_per_track are the current translation mode values.
This algorithm can also be used by a BIOS or an OS to convert an L-CHS to an LBA.
This algorithm can be reversed such that an LBA can be converted to a CHS:
cylinder = LBA / (heads_per_cylinder * sectors_per_track)
temp = LBA % (heads_per_cylinder * sectors_per_track)
head = temp / sectors_per_track
sector = temp % sectors_per_track + 1
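The two conversions above translate directly into C. The sketch below is an illustration of the ATA-2 formulas only; the `chs` structure and the function names are made up for this example.

```c
#include <stdint.h>

/* A cylinder/head/sector address; sectors are numbered from 1. */
struct chs {
    uint32_t cylinder;
    uint32_t head;
    uint32_t sector;
};

/* LBA = ((cylinder * heads_per_cylinder + head) * sectors_per_track)
         + sector - 1 */
static uint32_t chs_to_lba(struct chs a, uint32_t heads_per_cylinder,
                           uint32_t sectors_per_track)
{
    return (a.cylinder * heads_per_cylinder + a.head) * sectors_per_track
           + a.sector - 1;
}

/* The reverse conversion, using the same translation mode values. */
static struct chs lba_to_chs(uint32_t lba, uint32_t heads_per_cylinder,
                             uint32_t sectors_per_track)
{
    struct chs a;
    uint32_t temp = lba % (heads_per_cylinder * sectors_per_track);

    a.cylinder = lba / (heads_per_cylinder * sectors_per_track);
    a.head = temp / sectors_per_track;
    a.sector = temp % sectors_per_track + 1;
    return a;
}
```

With 16 heads and 63 sectors per track, cylinder 0, head 0, sector 1 maps to LBA 0, and LBA 63 maps back to cylinder 0, head 1, sector 1.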
While most OS's compute disk addresses in an LBA scheme, an OS like DOS must convert that LBA to a CHS in order to call INT 13H.
The basic problem is that there is no requirement that a CHS translating BIOS follow these rules. There are many other algorithms that can be implemented to perform a similar function. Today, there are at least two popular implementations: the Phoenix implementation (described above) and the non-Phoenix implementations. This is a problem because a protected mode OS that does not want to use INT 13H must implement the same CHS translation algorithm. If it doesn't, your data gets scrambled.
In the perfect world of tomorrow, maybe only LBA will be used. But today we are faced with the following problems:
These are difficult problems to overcome in today's industry environment. The result: chaos.
Real mode only. These functions are the traditional CHS mode disk interface. GRUB calls them only if LBA mode is not available.
INT 13H, AH=02h reads sectors into memory.
Input:
AH | 02h |
AL | The number of sectors to read (must be non-zero). |
CH | Low 8 bits of cylinder number. |
CL | Sector number in bits 0-5, and high 2 bits of cylinder number in bits 6-7. |
DH | Head number. |
DL | Drive number (bit 7 set for hard disk). |
ES:BX | Data buffer. |
Output:
CF | Set on error. |
AH | Status. |
AL | The number of sectors transferred (only valid if CF set for some BIOSes). |
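The split of the 10-bit cylinder number between CH and CL is easy to get wrong, so here is a small sketch of the packing. The `int13_chs_regs` structure and the `int13_pack_chs` function are hypothetical names for this example; a real caller would load the values into the actual registers before issuing INT 13H.

```c
#include <assert.h>
#include <stdint.h>

/* Register values for the CHS part of an INT 13H, AH=02h/03h call. */
struct int13_chs_regs {
    uint8_t ch; /* low 8 bits of the cylinder number */
    uint8_t cl; /* sector in bits 0-5, high 2 cylinder bits in bits 6-7 */
    uint8_t dh; /* head number */
};

static struct int13_chs_regs int13_pack_chs(uint16_t cylinder, uint8_t head,
                                            uint8_t sector)
{
    struct int13_chs_regs r;

    assert(cylinder < 1024);             /* only 10 cylinder bits exist */
    assert(sector >= 1 && sector <= 63); /* sectors are numbered from 1 */

    r.ch = (uint8_t)(cylinder & 0xFF);
    r.cl = (uint8_t)((((cylinder >> 8) & 0x03) << 6) | (sector & 0x3F));
    r.dh = head;
    return r;
}
```

For example, cylinder 660, head 14, sector 62 packs to CH=94h, CL=BEh and DH=0Eh.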
INT 13H, AH=03h writes disk sectors.
Input:
AH | 03h |
AL | The number of sectors to write (must be non-zero). |
CH | Low 8 bits of cylinder number. |
CL | Sector number in bits 0-5, and high 2 bits of cylinder number in bits 6-7. |
DH | Head number. |
DL | Drive number (bit 7 set for hard disk). |
ES:BX | Data buffer. |
Output:
CF | Set on error. |
AH | Status. |
AL | The number of sectors transferred (only valid if CF set for some BIOSes). |
INT 13H, AH=08h returns drive parameters. For systems predating the IBM PC/AT, this call is only valid for hard disks.
Input:
AH | 08h |
DL | Drive number (bit 7 set for hard disk). |
Output:
CF | Set on error. |
AH | 0. |
AL | 0 on at least some BIOSes. |
BL | Drive type (AT/PS2 floppies only). |
CH | Low 8 bits of maximum cylinder number. |
CL | Maximum sector number in bits 0-5, and high 2 bits of maximum cylinder number in bits 6-7. |
DH | Maximum head number. |
DL | The number of drives. |
ES:DI | Drive parameter table (floppies only). |
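Because AH=08h reports maximum values rather than counts, turning its output into a geometry takes one more step. The sketch below assumes the usual numbering (cylinders and heads start at 0, sectors start at 1); the `chs_geometry` structure and both function names are hypothetical.

```c
#include <stdint.h>

/* Drive geometry expressed as counts rather than maximum values. */
struct chs_geometry {
    uint32_t cylinders;
    uint32_t heads;
    uint32_t sectors; /* sectors per track */
};

/* Decode the CH, CL and DH values returned by INT 13H, AH=08h. */
static struct chs_geometry int13_decode_geometry(uint8_t ch, uint8_t cl,
                                                 uint8_t dh)
{
    struct chs_geometry g;

    /* Cylinders and heads are numbered from 0, so count = maximum + 1. */
    g.cylinders = ((((uint32_t)cl >> 6) << 8) | ch) + 1;
    g.heads = (uint32_t)dh + 1;
    /* Sectors are numbered from 1, so the maximum is already the count. */
    g.sectors = cl & 0x3F;
    return g;
}

/* Total number of sectors addressable through this geometry. */
static uint32_t chs_total_sectors(struct chs_geometry g)
{
    return g.cylinders * g.heads * g.sectors;
}
```

A BIOS reporting CH=FFh, CL=FFh and DH=FEh describes 1024 cylinders, 255 heads and 63 sectors per track.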
Real mode only. These functions are the IBM/MS INT 13 Extensions, which support LBA mode. GRUB uses them if available so that it can read and write beyond the 8GB boundary.
INT 13, AH=41h checks if LBA is supported.
Input:
AH | 41h. |
BX | 55AAh. |
DL | Drive number. |
Output:
CF | Set on error. |
AH | Major version of extensions (10h for 1.x, 20h for 2.0 / EDD-1.0, 21h for 2.1 / EDD-1.1 and 30h for EDD-3.0) if successful, otherwise 01h (the error code of invalid function). |
BX | AA55h if installed. |
AL | Internal use. |
CX | API subset support bitmap (see below). |
DH | Extension version. |
The bitfields for the API subset support bitmap are(2):
Bit(s) | Description |
0 | Extended disk access functions (AH=42h-44h, 47h, 48h) supported. |
1 | Removable drive controller functions (AH=45h, 46h, 48h, 49h, INT 15H, AH=52h) supported. |
2 | Enhanced disk drive (EDD) functions (AH=48h, 4Eh) supported. |
3-15 | Reserved (0). |
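Putting the AH=41h outputs together, a caller might decide whether LBA reads and writes are usable roughly as follows. This is a sketch under stated assumptions, not the check GRUB itself performs: the function name and the idea of passing the register values as plain arguments are both made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the API subset support bitmap returned in CX. */
#define EDD_SUBSET_EXTENDED_ACCESS 0x0001u /* AH=42h-44h, 47h, 48h */
#define EDD_SUBSET_REMOVABLE       0x0002u /* AH=45h, 46h, 48h, 49h */
#define EDD_SUBSET_EDD             0x0004u /* AH=48h, 4Eh */

/* Decide from the outputs of INT 13H, AH=41h whether the extended
   read/write calls can be used.  `carry` is the CF output. */
static bool lba_access_supported(bool carry, uint16_t bx, uint16_t cx)
{
    if (carry)        /* CF set: the call failed */
        return false;
    if (bx != 0xAA55) /* signature missing: extensions not installed */
        return false;
    return (cx & EDD_SUBSET_EXTENDED_ACCESS) != 0;
}
```

If the check fails, the caller falls back to the CHS calls (AH=02h and AH=03h).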
INT 13, AH=42h reads sectors into memory.
Input:
AH | 42h. |
DL | Drive number. |
DS:SI | Disk Address Packet (see below). |
Output:
CF | Set on error. |
AH | 0 if successful, otherwise error code. |
The format of Disk Address Packet is:
Offset (hex) | Size (byte) | Description |
00 | 1 | 10h (The size of packet). |
01 | 1 | Reserved (0). |
02 | 2 | The number of blocks to transfer (max 007F for Phoenix EDD). |
04 | 4 | Transfer buffer (SEGMENT:OFFSET). |
08 | 8 | Starting absolute block number. |
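The packet layout maps naturally onto a C structure. The sketch below is only an illustration: the structure and the `make_dap` helper are hypothetical names, and it assumes the transfer buffer field is a real-mode far pointer stored with the offset in the low word and the segment in the high word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Disk Address Packet for INT 13H, AH=42h/43h. */
struct disk_address_packet {
    uint8_t  size;     /* 00: size of the packet, 10h */
    uint8_t  reserved; /* 01: must be 0 */
    uint16_t blocks;   /* 02: number of blocks to transfer */
    uint32_t buffer;   /* 04: transfer buffer, SEGMENT:OFFSET */
    uint64_t block;    /* 08: starting absolute block number */
};

static_assert(sizeof(struct disk_address_packet) == 0x10,
              "packet must be 16 bytes");
static_assert(offsetof(struct disk_address_packet, buffer) == 0x04,
              "buffer must be at offset 04h");
static_assert(offsetof(struct disk_address_packet, block) == 0x08,
              "block number must be at offset 08h");

/* Fill in a packet.  The far pointer keeps the offset in its low
   16 bits and the segment in its high 16 bits (an assumption, see
   the text above). */
static struct disk_address_packet make_dap(uint16_t segment, uint16_t offset,
                                           uint16_t blocks, uint64_t lba)
{
    struct disk_address_packet dap;

    dap.size = (uint8_t)sizeof dap;
    dap.reserved = 0;
    dap.blocks = blocks;
    dap.buffer = ((uint32_t)segment << 16) | offset;
    dap.block = lba;
    return dap;
}
```

Callers that must work with Phoenix EDD would keep `blocks` at or below 007Fh, as noted in the table.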
INT 13, AH=43h writes disk sectors.
Input:
AH | 43h. |
AL | Write flags (In version 1.0 and 2.0, bit 0 is the flag for verify write and other bits are reserved (0). In version 2.1, 00h and 01h indicate write without verify, and 02h indicates write with verify). |
DL | Drive number. |
DS:SI | Disk Address Packet (see above). |
Output:
CF | Set on error. |
AH | 0 if successful, otherwise error code. |
INT 13, AH=48h returns drive parameters. GRUB only makes use of the total number of sectors, and ignores the CHS information, because only L-CHS makes sense. See section CHS addressing and LBA addressing, for more information.
Input:
AH | 48h. |
DL | Drive number. |
DS:SI | Buffer for drive parameters (see below). |
Output:
CF | Set on error. |
AH | 0 if successful, otherwise error code. |
The format of drive parameters is:
Offset (hex) | Size (byte) | Description |
00 | 2 | The size of buffer. Before calling this function, set to the maximum buffer size, at least 1Ah. The size actually filled is returned (1Ah for version 1.0, 1Eh for 2.x and 42h for 3.0). |
02 | 2 | Information flags (see below). |
04 | 4 | The number of physical cylinders. |
08 | 4 | The number of physical heads. |
0C | 4 | The number of physical sectors per track. |
10 | 8 | The total number of sectors. |
18 | 2 | The bytes per sector. |
v2.0 and later | | |
1A | 4 | EDD configuration parameters. |
v3.0 | | |
1E | 2 | Signature BEDD to indicate presence of Device Path information. |
20 | 1 | The length of Device Path information, including signature and this byte (24h for version 3.0). |
21 | 3 | Reserved (0). |
24 | 4 | ASCIZ name of host bus (`ISA' or `PCI'). |
28 | 8 | ASCIZ name of interface type (`ATA', `ATAPI', `SCSI', `USB', `1394' or `FIBRE'). |
30 | 8 | Interface Path. |
38 | 8 | Device Path. |
40 | 1 | Reserved (0). |
41 | 1 | Checksum of bytes 1Eh-40h (2's complement of sum, which makes the 8 bit sum of bytes 1Eh-41h equal to 00h). |
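The checksum rule in the last row can be illustrated with a short sketch. It assumes the caller holds the whole 42h-byte version 3.0 buffer as a byte array; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the value expected at offset 41h: the 2's complement of the
   8-bit sum of bytes 1Eh-40h.  `params` must hold at least 42h bytes. */
static uint8_t edd_device_path_checksum(const uint8_t *params)
{
    uint8_t sum = 0;

    for (size_t i = 0x1E; i <= 0x40; i++)
        sum = (uint8_t)(sum + params[i]);
    return (uint8_t)(0u - sum);
}

/* The Device Path information is consistent when the 8-bit sum of
   bytes 1Eh-41h is 00h. */
static bool edd_device_path_valid(const uint8_t *params)
{
    uint8_t sum = 0;

    for (size_t i = 0x1E; i <= 0x41; i++)
        sum = (uint8_t)(sum + params[i]);
    return sum == 0;
}
```

A caller would normally run the validity check only when the size returned at offset 00 is 42h and the BEDD signature is present.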
The information flags are:
Bit(s) | Description |
0 | DMA boundary errors handled transparently. |
1 | CHS information is valid. |
2 | Removable drive. |
3 | Write with verify supported. |
4 | Drive has change-line support (required if drive is removable). |
5 | Drive can be locked (required if drive is removable). |
6 | CHS information set to maximum supported values, not current media. |
7-15 | Reserved (0). |
A Master Boot Record (MBR) is the sector at cylinder 0, head 0, sector 1 of a hard disk. An MBR-like structure must be created in each partition by the FDISK program.
At the completion of your system's Power On Self Test (POST), INT 19H is called. Usually INT 19 tries to read a boot sector from the first floppy drive(3). If a boot sector is found on the floppy disk, that boot sector is read into memory at location 0000:7C00 and INT 19H jumps to memory location 0000:7C00. However, if no boot sector is found on the first floppy drive, INT 19H tries to read the MBR from the first hard drive. If an MBR is found it is read into memory at location 0000:7C00 and INT 19H jumps to memory location 0000:7C00. The small program in the MBR will attempt to locate an active (bootable) partition in its partition table(4). The small program in the boot sector must locate the first part of the operating system's kernel loader program (or perhaps the kernel itself or perhaps a boot manager program) and read that into memory.
INT 19H is also called when the CTRL-ALT-DEL keys are used. On most systems, CTRL-ALT-DEL causes a short version of the POST to be executed before INT 19H is called.
The contents of the sector are:
However, the first 62 bytes of a boot sector are known as the BIOS Parameter Block (BPB), so GRUB cannot use these bytes for its own purpose.
If an active partition is found, that partition's boot record is read into 0000:7C00 and the MBR code jumps to 0000:7C00 with SI pointing to the partition table entry that describes the partition being booted. The boot record program uses this data to determine the drive being booted from and the location of the partition on the disk.
The first byte of an active partition table entry is 80H. This byte is loaded into the DL register before INT 13H is called to read the boot sector. When INT 13H is called, DL is the BIOS device number. Because of this, the boot sector read by this MBR program can only be read from BIOS device number 80H (the first hard disk). This is one of the reasons why it is usually not possible to boot from any other hard disk.
FDISK creates all partition records (sectors). The primary purpose of a partition record is to hold a partition table. The rules for how FDISK works are unwritten but so far most FDISK programs seem to follow the same basic idea.
First, all partition table records (sectors) have the same format. This includes the partition table record at cylinder 0, head 0, sector 1 -- what is known as the Master Boot Record (MBR). The last 66 bytes of a partition table record contain a partition table and a 2 byte signature. The first 446 bytes of these sectors usually contain a program but only the program in the MBR is ever executed (so extended partition table records could contain something other than a program in the first 446 bytes). For more information, see section The structure of Master Boot Record.
Second, extended partitions are nested inside one another and extended partition table records form a linked list. We will attempt to show this in a diagram at section The format of the table entry.
Each partition table entry is 16 bytes and contains things like the start and end location of a partition in CHS, the start in LBA, the size in sectors, the partition type and the active flag. Older versions of FDISK may compute incorrect LBA or size values. And when your computer boots itself, only the CHS fields of the partition table entries are used (another reason LBA doesn't solve the >528MB problem). The CHS fields in the partition tables are in L-CHS format, see section CHS addressing and LBA addressing.
There is no central clearing house to assign the codes used in the one byte type field. But codes are assigned (or used) to define most every type of file system that anyone has ever implemented on the x86 PC: 12-bit FAT, 16-bit FAT, HPFS, NTFS, etc. Plus, an extended partition also has a unique type code.
In the FDISK program `sfdisk', the following list is assumed:
The 16 bytes of a partition table entry are used as follows:
    +--- Bit 7 is the active partition flag, bits 6-0 are zero.
    |
    |    +--- Starting CHS in INT 13 call format.
    |    |
    |    |      +--- Partition type byte.
    |    |      |
    |    |      |    +--- Ending CHS in INT 13 call format.
    |    |      |    |
    |    |      |    |        +-- Starting LBA.
    |    |      |    |        |
    |    |      |    |        |       +-- Size in sectors.
    |    |      |    |        |       |
    v <--+--->  v <--+-->     v       v

   0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
   DL DH CL CH TB DH CL CH LBA..... SIZE....
   80 01 01 00 06 0e be 94 3e000000 0c610900  1st entry
   00 00 81 95 05 0e fe 7d 4a610900 724e0300  2nd entry
   00 00 00 00 00 00 00 00 00000000 00000000  3rd entry
   00 00 00 00 00 00 00 00 00000000 00000000  4th entry
Bytes 0-3 are used by the small program in the Master Boot Record to read the first sector of an active partition into memory. The DH, DL, CH and CL above show which x86 register is loaded when the MBR program calls INT 13H AH=02h to read the active partition's boot sector. For more information, see section The structure of Master Boot Record.
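As an illustration, a 16-byte entry can be decoded as follows. This is a minimal sketch, not code from GRUB: all names are hypothetical, and it assumes that byte 0 is the flag loaded into DL, that each CHS triple is stored as head, then sector plus high cylinder bits, then low cylinder bits (the DH, CL and CH values of an INT 13H, AH=02h call), and that the LBA and size fields are little-endian.

```c
#include <stdbool.h>
#include <stdint.h>

struct chs_address {
    uint16_t cylinder;
    uint8_t head;
    uint8_t sector;
};

struct partition_entry {
    bool active;              /* bit 7 of byte 0 */
    uint8_t type;             /* partition type byte */
    struct chs_address start; /* starting CHS */
    struct chs_address end;   /* ending CHS */
    uint32_t start_lba;       /* starting LBA */
    uint32_t size;            /* size in sectors */
};

/* Read a 32-bit little-endian value. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode a 3-byte CHS field stored as head, CL-style byte, CH-style byte. */
static struct chs_address decode_chs(const uint8_t *p)
{
    struct chs_address a;

    a.head = p[0];
    a.sector = p[1] & 0x3F;
    a.cylinder = (uint16_t)(((p[1] >> 6) << 8) | p[2]);
    return a;
}

static struct partition_entry parse_partition_entry(const uint8_t raw[16])
{
    struct partition_entry e;

    e.active = (raw[0] & 0x80) != 0;
    e.start = decode_chs(raw + 1);
    e.type = raw[4];
    e.end = decode_chs(raw + 5);
    e.start_lba = read_le32(raw + 8);
    e.size = read_le32(raw + 12);
    return e;
}
```

Applied to the first example entry, this yields an active partition of type 06 that starts at LBA 62, spans 614668 sectors and ends at cylinder 660, head 14, sector 62.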
These entries define the following partitions:
Keep in mind that there are no written rules and no industry standards on how FDISK should work but here are some basic rules that seem to be followed by most versions of FDISK:
There are no written rules as to how an OS scans the partition table entries so each OS can have a different method. For DOS, this means that different versions could assign different drive letters to the same FAT file system partitions.
This document was generated on 13 February 2001 using texi2html 1.56k.