The FR-V FDPIC ABI
				   
                        Kevin Buettner
                        Alexandre Oliva
                       Richard Henderson

                         Red Hat, Inc.
                         April 9, 2004
                          Version 1.0

Introduction
------------

This document describes extensions (and some minor changes) to the
existing FR-V EABI required to support the implementation of shared
libraries on a system whose OS (and hardware) require that processes
share a common address space.  This document will also attempt to
explore the motivations behind and the implications of these
extensions.

One of the primary goals in using shared libraries is to reduce the
memory requirements of the overall system.  Thus, if two processes use
the same library, the hope is that at least some of the memory pages
will be shared between the two processes resulting in an overall
savings.  To realize these savings, tools used to build a program and
library must identify which sections may be shared and which must not
be shared.  The shared sections, when grouped together, are commonly
referred to as the "text segment" whereas the non-shared (grouped)
sections are commonly referred to as the "data segment".  The text
segment is read-only and is usually comprised of executable code and
read-only data.  The data segment must be writable and it is this fact
which makes it non-sharable.

Systems which utilize disjoint address spaces for their processes are
free to group the text and data segments in such a way that they
may always be loaded with fixed relative positions of the text
and data segments.  I.e., for a given load object, the offset from
the start of the text segment to the start of the data segment is
constant.  This property greatly simplifies the design of the
shared library machinery.

The design of the shared library mechanism described in this document
does not (and cannot) have this property.  Due to the fact that all
processes share a common address space, the text and data segments
will be placed at arbitrary locations relative to each other and will
therefore need a mechanism whereby executable code will always be able
to find its corresponding data.  One of the CPU's registers is
typically dedicated to hold the base address of the data segment. 
This register will be called the "FDPIC register" in this document. 
Such a register is sometimes used in systems with disjoint address
spaces too, but this is for efficiency rather than necessity.

The fact that the locations of the text and data segments are at
non-constant offsets with respect to each other also complicates
function pointer representation.  As noted above, executable code
must be able to find its corresponding data segment.  When making an
indirect function call, it is therefore important that both the
address of the function and the base address of the data segment are
available.  This means that a function pointer needs to be represented as
the address of a "function descriptor" which contains the address of
the actual code to execute as well as the corresponding data (FDPIC
register) address.


FDPIC Register
--------------

The FDPIC register is used as a base register for accessing the global
offset table (GOT) and function descriptors.  Since both code and data
are relocatable, executable code may not contain any instruction
sequences which directly encode a pointer's value.  Instead, pointers
to global data are indirectly referenced via the global offset table. 
At load time, pointers contained in the global offset table are
relocated by the dynamic linker to point at the correct locations.

Note: The FR-V EABI [2] specifies GR17 as the PIC register.  GR17
plays no role in the shared library ABI described in this document
apart from being a callee saved register.

Upon entry to a function, the caller saved register GR15 is the FDPIC
register.  As described above, it contains the GOT address for that
function.  GR15 obtains its value in one of three ways:

    1) By being inherited from the calling function in the case
       of a direct call to a function within the same load module.

    2) By being set either in a PLT entry or in inlined PLT code.

    3) By being set from a function descriptor as part of an
       indirect call.

The specifics associated with each of these cases are covered in
greater detail in "Procedure Linkage Table (PLT)" and "Function
Calls", below.

The prologue code of a non-leaf function should save GR15 either on
the stack or in one of the callee-saved registers.  After each
function call, GR15 must be restored if it is needed later on in the
function.  Direct calls to functions in the same load module and
direct calls which are routed through a PLT entry require that GR15 be
restored.  Calls which use inlined PLT code and indirect calls may be
able to avoid using GR15; such calls will need to use some other
register in which the GOT address has been saved, however.  A leaf
function makes no calls and need not save GR15.

Note that once a function has moved GR15 to one of its callee-saved
registers, the function is then free to use that register as the FDPIC
register for accessing data.  This is why the sections describing
relocations are careful to specify FDPIC-relative references instead
of GR15-relative references.

The location of the data segment must be chosen in such a way so that
the GOT address (i.e., FDPIC register value) has double word (64-bit)
alignment.  Note: This makes it possible to load the resolver's
descriptor stored in the dynamic linker reserve area (see below) with
a single doubleword load instruction.  Also, it's envisioned (though
not mandated) that the GOT entries are located at positive FDPIC-based
offsets and that function descriptors are found at negative offsets
to FDPIC.


GR14 Considerations
-------------------

GR14, a caller saved register, plays a role in effecting transfer of
control for some function calls.  A PLT entry (or inlined PLT code)
loads a function descriptor into GR14 and GR15 via a single 64-bit
load instruction.  After such a load, GR14 will contain the code
address to which control should be transferred.  (GR15 will contain
the GOT address.)  The address loaded into GR14 will either be the
entry point of the function itself or the address of the lazy PLT
fragment corresponding to the function to call.  See "Lazy Procedure
Linkage" below.  In either case, the PLT entry (or inlined PLT code)
will branch to the address contained in GR14.

Using the GR14/GR15 pair in this way makes PLT entries very compact. 
They are so compact, in fact, that expanding a PLT entry inline only
adds one instruction (best case) to the call site.  In the worst case,
three extra instructions are required.  Also, assuming that the FDPIC
value has been saved in some other callee-saved register in the
function prologue, the use of an inlined PLT entry may obviate the
need for restoring GR15 after the function call which precedes a call
using an inlined PLT entry.  This means that functions using inlined
PLT entries require only a few extra instructions.  In addition to
being faster for the obvious reason of executing fewer instructions,
inlining PLT entries offers greater opportunities to schedule
instructions at the call site.

Note: Upon entry to a function, GR14 should not be relied upon to
contain the entry point address of the function.  It is possible
that the function was called directly, i.e., via a call instruction.
Also, after (lazy) resolution, there's no requirement for the resolver
to set GR14 in this manner.


GR16/GR17 Usage
---------------

The FR-V EABI [2] specifies that GR16 may be used as the base register
for small data references.  When GR16 is not used for this purpose, it
is a callee saved register.  The EABI also specifies that GR17 may be
used as the PIC register for position independent code.  When GR17 is
not used for this purpose, it is a callee saved register.  Either
register was traditionally initialized to the value of the _gp symbol,
which used to be located next to the .got and .sdata sections.

For the FR-V Shared Library ABI, the _gp symbol is defined in the text
segment, making it unsuitable for referencing small data or GOT
entries.  It is suitable, however, for referencing read-only data,
because _gp is defined within the .rodata section.  The _gp address
can be obtained with the following instruction:

	ldi	@(gr15, #got12(_gp)), gr#


The PIC register may not be initialized nor used in the manner
described in the FR-V EABI [2].  The code sequence suggested therein
for initializing the PIC register in function prologues will not work
for shared library support which relocates text and data segments by
different amounts.  (If the text and data segments are always
relocated by the same amount, then it works fine.)  It has at least
two problems for the type of shared library system described in this
document:

    1) The sethi / setlo relocations in the suggested code sequence
       would need to be load time relocations in the text segment. 
       This is unacceptable because it conflicts with the goal of
       being able to place and execute text segments in read-only
       memory.

    2) A PC-relative offset to the GOT can't possibly be correct
       for more than one process, because the location of the data
       segment will vary from one process to another while the
       text segment will remain at the same address.


Function Descriptors
--------------------

A number of programs assume that pointers to functions are as wide as
pointers to data, even though programming languages don't require
this.  However, two words are needed to represent a function pointer
meaningfully:  not only is the function's entry point required, but
also some context information that enables the function to find the
corresponding data segment in the current process.  Such context
information is given in the form of a pointer to the GOT in FDPIC
(which is GR15 upon entry to a function).

In order to keep pointers to functions as 32-bit values, while adding
context information to them, we introduce function descriptors, such
that, when the address of a function is taken, the address of its
descriptor is obtained.  As shown below, the descriptor contains
pointers to both the function's entry point and its GOT.  A load
module will also likely contain a number of private function
descriptors which are used in conjunction with a corresponding PLT
entry (or inlined PLT code) for calling a function.

A function descriptor consists of two 4-byte words:

    1) The "entry point" at offset 0 contains the text address of the
       function.  This is the address at which to start executing
       the function.
    
    2) The "GOT address" at offset 4 contains the value to which the FDPIC
       register must be set when executing the function.

Each direct function call requiring a PLT entry (or which uses inlined
PLT code) requires a function descriptor stored in the data segment. 
These descriptors should ideally be located near enough to the address
specified by the FDPIC register to allow these two words to be accessed
with a single LDDI instruction.

Each private function descriptor needs to be initialized using a
64-bit relocation which fills in both the function entry point and GOT
address.  The R_FRV_FUNCDESC_VALUE relocation is used for this
purpose.
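
As a rough illustration, the two-word layout described above can be
written down as a C structure.  This is only a sketch: the type and
function names (frv_funcdesc, frv_funcdesc_make) are hypothetical and
are not defined by this ABI; only the offsets and sizes come from the
text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; the ABI fixes the layout, not a C type. */
struct frv_funcdesc {
    uint32_t entry;  /* offset 0: text address at which execution starts */
    uint32_t got;    /* offset 4: FDPIC register value for the callee */
};

/* Two 4-byte words, 8 bytes in total, as stated in the text. */
_Static_assert(sizeof(struct frv_funcdesc) == 8, "descriptor is two words");
_Static_assert(offsetof(struct frv_funcdesc, entry) == 0, "entry point first");
_Static_assert(offsetof(struct frv_funcdesc, got) == 4, "GOT address second");

/* Models the result of an R_FRV_FUNCDESC_VALUE fix-up: both words set. */
static struct frv_funcdesc frv_funcdesc_make(uint32_t entry, uint32_t got)
{
    /* A GOT address (FDPIC value) must be doubleword aligned. */
    assert((got & 7u) == 0);
    struct frv_funcdesc fd = { entry, got };
    return fd;
}
```

Because the structure is exactly eight bytes, a doubleword-aligned
descriptor can be fetched into a register pair with one 64-bit load,
which is what the PLT sequences rely on.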


Function Addresses
------------------

When a function address is required, the address of an "official" (or
canonical) function descriptor is used.  Descriptors corresponding to
static, non-overridable functions are allocated by the link editor
and are initialized at load time via the R_FRV_FUNCDESC_VALUE relocation.
The dynamic linker is responsible for allocating and initializing all
other "official" function descriptors.

As described above, a function's address is actually the address of a
function descriptor, not that of the function's entry point.  As is
the case with other kinds of pointers, executable code obtains the
values of pointer constants via the global offset table.  The
R_FRV_FUNCDESC relocation (see below) is used in global offset table
entries and initialized data to obtain the addresses of function
descriptors used for representing function addresses.

Note: This document borrows many of the concepts and terminology
related to function addresses and their descriptors from the IA-64
System V ABI [5, 6].


Procedure Linkage Table (PLT)
-----------------------------

In order to make direct calls to a function external to a given load
module, the CALL instruction's target is a PLT entry.  (Calls to
internal, but overridable functions also need PLT entries.)  The PLT
entry contains instructions for fetching the function's start address
and global pointer value from a function descriptor associated with
the function in question.  The function descriptor will be located at
a fixed offset from the address specified by the FDPIC register.  The
instructions in a PLT entry look like this:

        plt(foo):       lddi  @(gr15, gotofffuncdesc12(foo)), gr14
                        jmpl  @(gr14, gr0)

Due to the limited range of the LDDI instruction, one or two
additional instructions may be needed to access function descriptors
that are out of its range.  A "worst case" PLT entry is as follows:

        plt(foo):       sethi #gotofffuncdeschi(foo), gr14
                        setlo #gotofffuncdesclo(foo), gr14
                        ldd   @(gr14, gr15), gr14
                        jmpl  @(gr14, gr0)

When the function descriptor is out of range of ``lddi'' but is within
the address range afforded by 16 bits of offset, the ``sethi'' instruction
in the above "worst case" sequence may be eliminated and the ``setlo''
instruction may be replaced by a ``setlos'' instruction.  Such a PLT
entry is only three instructions long.
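
The choice between the two, three and four instruction forms depends
only on the FDPIC-relative offset of the descriptor.  The sketch below
models that decision.  It assumes that ``lddi'' takes a signed 12-bit
displacement and that ``setlos'' takes a signed 16-bit immediate; the
function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of instructions in a PLT entry whose function descriptor lies
   `off` bytes from the FDPIC value.  The ranges are assumptions: signed
   12 bits for lddi, signed 16 bits for setlos. */
static int frv_plt_entry_insns(int32_t off)
{
    /* FDPIC and every descriptor are doubleword aligned, so the
       offset between them is a multiple of 8. */
    assert((off & 7) == 0);

    if (off >= -2048 && off <= 2047)
        return 2;               /* lddi + jmpl */
    if (off >= -32768 && off <= 32767)
        return 3;               /* setlos + ldd + jmpl */
    return 4;                   /* sethi + setlo + ldd + jmpl */
}
```

A link editor could use a test of this kind when laying out the GOT, so
that the most frequently used descriptors land within the 12-bit range.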

The "load double" instructions in the PLT entries load the address of
the function's entry point into GR14 and the new GOT address into
GR15.

Note that despite the "l" in its name, a ``jmpl'' instruction doesn't
actually set the link register (LR).  The value of the link register
is not changed by a PLT entry.

In order to accomplish "lazy dynamic linking" (see below), GR14 must be
set to the entry point address found in the function descriptor.

Since PLT entries are so short, the compiler may choose to inline them
directly into the call site.  The resultant code should be speedier,
both due to the fact that a branch instruction is eliminated, and due to
the fact that it may be possible to move the LDDI instruction earlier
in the instruction stream.  However, calling functions within the same
translation unit may often be done with a single call instruction, so
it's not always advantageous to do the inlining.


Dynamic Linker Reserve Area
---------------------------

The linker reserves three words starting at the location pointed to by
the FDPIC register for use by the dynamic linker.  The first two words
comprise a function descriptor for invoking the resolver used in lazy
dynamic linking.  The third (at GR15+8) is used by the dynamic linker
and the debugger to obtain access to information regarding the loaded
module and the amount that each segment has been relocated by.
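
The three reserved words can be pictured as the following C structure.
This is an illustrative sketch only; the structure, field and function
names are hypothetical, and only the three-word layout and the GR15+8
offset are taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the three words found at the FDPIC address. */
struct frv_reserve_area {
    uint32_t resolver_entry; /* FDPIC+0: entry point of the lazy resolver */
    uint32_t resolver_got;   /* FDPIC+4: GOT address for the resolver */
    uint32_t module_info;    /* FDPIC+8: used by dynamic linker and debugger */
};

_Static_assert(sizeof(struct frv_reserve_area) == 12, "three 4-byte words");
_Static_assert(offsetof(struct frv_reserve_area, module_info) == 8,
               "third word at GR15+8");

/* The FDPIC value must be doubleword aligned so that the resolver
   descriptor can be fetched with a single 64-bit load. */
static int frv_fdpic_is_aligned(uint32_t fdpic)
{
    return (fdpic & 7u) == 0;
}

/* Address of the third reserved word for a given FDPIC value. */
static uint32_t frv_module_info_addr(uint32_t fdpic)
{
    assert(frv_fdpic_is_aligned(fdpic));
    return fdpic + (uint32_t)offsetof(struct frv_reserve_area, module_info);
}
```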


Lazy Procedure Linkage
----------------------

Lazy procedure linkage requires an additional PLT fragment for each
dynamic function that requires a local descriptor in the module. 
These entries are not large, but their aggregate will increase the
size of the text segment.  For this reason, the use of lazy dynamic
linking is optional.  (Implementation of lazy dynamic linking in the
dynamic linker is mandatory, however.)

A lazy PLT fragment looks like this:

                        .word   funcdesc_value_reloc_offset(foo)
        lazy_plt(foo):  bra     resolverStub

The code for ``resolverStub'' looks like this:

        resolverStub:   lddi    @(gr15, 0), gr4
                        jmpl    @(gr4, gr0)

The link editor adds as many ``resolverStub'' fragments as necessary
to ensure that the branch in each lazy PLT fragment is within range.

It is also possible to inline the resolverStub instructions as
follows:

                        .word   funcdesc_value_reloc_offset(foo)
        lazy_plt(foo):  lddi    @(gr15, 0), gr4
                        jmpl    @(gr4, gr0)

Lazy PLT fragments have word (32-bit) alignment.

Function descriptors residing in the GOT segment are initialized so
that the entry point is the address of the corresponding lazy PLT
entry.  The function descriptor's GOT address is initialized to the
GOT address for the load module itself.  These initializations occur
as the result of the dynamic linker performing R_FRV_FUNCDESC_VALUE
relocations (located in the .rel.plt section) at load time.

Thus a function call to an unresolved function will go through the
lazy PLT fragment for that function as a result of picking up the lazy
PLT entry point from the function descriptor.  The lazy PLT fragment
immediately branches to ``resolverStub'', a special PLT entry which
uses the dynamic linker reserve area (see above) to cause execution to
be transferred to the actual resolver without disturbing either GR14
or GR15.

Upon entry to the actual (lazy) resolver, the following register
values are important:

    GR4     -- the address of the resolver itself
    GR5     -- the GOT address (FDPIC value) for the resolver's GOT
    GR14    -- the address of the lazy PLT entry being resolved
    GR15    -- the GOT address for the caller's GOT

The resolver must take care not to modify the argument registers or
the callee-saved registers, or if it does, to restore them to their
original state when it's done.

The resolver uses the word at GR14 - 4 (that is, @(gr14, -4)), which is
an offset to a R_FRV_FUNCDESC_VALUE relocation.  This offset is
relative to the value (address) associated with the DT_JMPREL tag in
the dynamic section.  (Tags related to DT_JMPREL are DT_PLTRELSZ and
DT_PLTREL.  The value associated with DT_PLTRELSZ provides the size of
this section.  The value associated with DT_PLTREL must be set to
DT_REL indicating that Elf32_Rel structs are used to hold the
relocation information.)  The R_FRV_FUNCDESC_VALUE relocation provides
the offset to the function descriptor to update and the symbol table
index of the function to resolve.

Assuming the resolver completes successfully, it will perform the
following actions prior to transferring control to the entry point of
the resolved function:

    1) Fill in the function descriptor in the caller's GOT so that
       the entry point and GOT address are correct for the next call
       of the resolved function.  These values must be written
       in such a way so as to avoid the possibility of a race
       condition between both words getting written and some other
       thread attempting to read them.  One way to achieve this is
       to write the words using a single 64-bit store instruction.
    2) Set GR15 to the GOT address of the resolvee's GOT.
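
The two data-handling steps of the resolver (locating the relocation
and publishing the resolved descriptor) can be sketched in C as
follows.  This is a host-side model under stated assumptions, not the
resolver's actual code: ``struct rel32'' is a local stand-in for
Elf32_Rel, the symbol/type split of ``r_info'' follows the usual ELF32
convention, and the 64-bit packing assumes the entry point occupies the
high half of the doubleword (as it would on a big-endian target).  All
function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for Elf32_Rel so the sketch needs no <elf.h>. */
struct rel32 {
    uint32_t r_offset;  /* location of the function descriptor */
    uint32_t r_info;    /* symbol table index and relocation type */
};

#define REL32_SYM(info)       ((info) >> 8)
#define REL32_TYPE(info)      ((info) & 0xffu)
#define R_FRV_FUNCDESC_VALUE  18u

/* Fetch the relocation named by the word stored at GR14 - 4.
   `jmprel` models the DT_JMPREL table, `pltrelsz` the DT_PLTRELSZ value. */
static struct rel32 lazy_find_reloc(const unsigned char *jmprel,
                                    uint32_t pltrelsz, uint32_t reloc_offset)
{
    struct rel32 rel;
    assert(reloc_offset + sizeof rel <= pltrelsz);
    memcpy(&rel, jmprel + reloc_offset, sizeof rel);
    assert(REL32_TYPE(rel.r_info) == R_FRV_FUNCDESC_VALUE);
    return rel;
}

/* Publish both descriptor words with one 64-bit store, so that another
   thread sees either the old pair or the new pair, never a mixture. */
static void lazy_publish(_Atomic uint64_t *desc, uint32_t entry, uint32_t got)
{
    atomic_store(desc, ((uint64_t)entry << 32) | got);
}

static uint32_t desc_entry(uint64_t d) { return (uint32_t)(d >> 32); }
static uint32_t desc_got(uint64_t d)   { return (uint32_t)d; }
```

Symbol lookup and the translation of ``r_offset'' into a run-time
address are deliberately left out of the sketch.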


Function Calls
--------------

Direct function calls are performed as follows:

                "set up arguments as mandated by FR-V EABI"
                call foo
                "restore any needed ``caller saves'' registers"

The ``call foo'' pseudo-instruction will either transfer control
directly to foo's entry point or will transfer control to foo's PLT
entry if one is needed.

Since PLT entries reference GR15, a function must ensure that GR15
is set correctly prior to making a function call.

Inlined PLT code may be able to make use of the FDPIC value stored in
another register - thus avoiding the need for setting GR15.  A direct
call with an inlined PLT entry looks like this:

                "set up arguments as mandated by FR-V EABI"
                lddi  @(fdpic, gotofffuncdesc12(foo)), gr14
                calll @(gr14, gr0)
                "restore any needed ``caller saves'' registers"

In the sequence above, ``fdpic'' refers to either GR15 or some
other register containing the GOT address for the current load
module.  Note that an opportunity exists for scheduling the lddi
instruction at an earlier point in order to avoid a stall between
the lddi and the call.

Indirect calls are performed by loading -- via a 64-bit load
instruction -- the entry point and GOT address from the function
descriptor into GR14 and GR15, respectively.  Control is transferred
via a CALLL instruction to the function's entry point.  The call site
for an indirect function call might look like this:

                "set up arguments as mandated by FR-V EABI"
                "load entry point and GOT address from function descriptor
                 into GR14 and GR15"
                calll    @(gr14, gr0)
                "restore any needed `caller saves' registers"

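To see why an indirect call needs both words of the descriptor, the
following host-side C model may help.  It is only an analogy: the GOT
address is passed as an explicit argument standing in for GR15, and all
names are hypothetical.  The same code is entered through two
descriptors that differ only in their GOT address, and it finds
different data each time.

```c
#include <assert.h>
#include <stddef.h>

/* On FR-V the GOT address would arrive in GR15; here it is an argument. */
typedef int (*entry_fn)(const int *const *got, int arg);

/* Host-side stand-in for a function descriptor. */
struct funcdesc_model {
    entry_fn entry;          /* models the entry point word */
    const int *const *got;   /* models the GOT address word */
};

/* Models the call site: load both words, then transfer control. */
static int call_indirect(const struct funcdesc_model *fd, int arg)
{
    assert(fd != NULL && fd->entry != NULL);
    return fd->entry(fd->got, arg);
}

/* Shared "text": reaches its global only through the GOT it was given. */
static int add_global(const int *const *got, int arg)
{
    return *got[0] + arg;
}
```

For example, given two GOTs whose first slots point at the values 10
and 20, calling add_global with the argument 1 through the respective
descriptors yields 11 and 21 even though the code address is identical.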

Global Data and the Global Offset Table (GOT)
---------------------------------------------

As noted earlier, position independent code must not contain any
instruction sequences which directly encode a reference to global
data.  If they did so, load time relocations would be necessary to
adjust these addresses.  Also, any reference to an address in a
non-shared segment would force the executable segment in question to
be non-sharable.

The global offset table (GOT) contains words which hold the
addresses of global data.  In order to access these global data,
position independent code must first use an FDPIC-relative load
instruction to fetch the data address from the GOT.
The data structure is then accessed as necessary using the address
obtained from the GOT.  It is envisioned that the various GOT
related structures might look something like this:

                +-----------------------+ <--------------------\
                |          .            |                      |
                           .                                   |
                |          .            |                      |
                +-----------------------+                      |
                |                       |                      |
                +-    Func Descr #2    -+                      |
                |                       |                      |
                +-----------------------+                      |
                |                       |                      |
                +-    Func Descr #1    -+                      |
                |                       |                      |
                +-----------------------+ <---\                |
   FDPIC -----> |                       |     |                |
                +- Resolver Descriptor -+   Dynamic Linker     |
                |                       |   Reserve Area       |
                +-----------------------+     |                |
                |   link_map pointer    |     |                |
                +-----------------------+ <---/             Global
                | Global Data Addr #1   |                   Offset
                +-----------------------+                   Table
                | Global Data Addr #2   |                   (GOT)
                +-----------------------+                      |
                | Global Data Addr #3   |                      |
                +-----------------------+                      |
                |          .            |                      |
                           .                                   |
                |          .            |                      |
                +-----------------------+ <--------------------/

The link-editor is responsible for determining the precise layout
of the GOT.  The only hard requirements are the following:

    (a) FDPIC must point at the first word of the dynamic linker
        reserve area.
    (b) The dynamic linker reserve area needs to start on a
        doubleword (64-bit) aligned word.
    (c) Each function descriptor must be doubleword (64-bit)
        aligned.
    (d) The global offset table must reside in a non-shared segment.
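
Requirements (a) through (c) are simple alignment conditions and could
be checked mechanically, for instance by a link editor self-test.  The
sketch below is one hypothetical way to express them; the function name
and its parameters are not part of this ABI.  Requirement (d) concerns
segment placement and is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the proposed layout meets the alignment requirements.
   `fdpic` is the address of the first word of the dynamic linker
   reserve area; `desc_addrs` lists the function descriptor addresses. */
static int frv_got_layout_ok(uint32_t fdpic,
                             const uint32_t *desc_addrs, size_t n)
{
    assert(desc_addrs != NULL || n == 0);

    if (fdpic & 7u)                 /* (a), (b): reserve area alignment */
        return 0;
    for (size_t i = 0; i < n; i++)
        if (desc_addrs[i] & 7u)     /* (c): descriptor alignment */
            return 0;
    return 1;
}
```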

In the picture above, function descriptors are placed at negative
offsets relative to GR15 and the GOT data address entries are placed
at positive offsets relative to GR15.  The link editor is free to
place either the function descriptors at positive offsets (subject to
alignment constraints) or the data address entries at negative
offsets.  It may wish to do so in order to maximize the number of
instructions which access the GOT via 12-bit offsets, or via 16-bit
offsets once the 12-bit offset slots are used up.  Also, note that
there is no requirement that the function descriptors or data address
entries have any particular grouping.

GOT initialization is performed at load time by the dynamic linker. 
In order to accomplish these initializations, the dynamic linker uses
R_FRV_32 relocations that have been placed in the object file by the
link editor.  R_FRV_32 relocations may cause addresses of other
global data in other load modules to be resolved or the relocation
may refer to data within the same load module.  See the description
of R_FRV_32 in "New Relocations" below.  (For function descriptors,
the R_FRV_FUNCDESC_VALUE relocation is used.  This relocation is
described in greater detail below.)

Each load module has a symbol _GLOBAL_OFFSET_TABLE_ which resolves to
the GOT address for that load module.  The DT_PLTGOT dynamic section
entry in each load module contains the GOT address also.

Computing the address of a data object can be done in several
different ways.  The simplest one is:

        sethi   #gothi(bar), gr#
        setlo   #gotlo(bar), gr#
        ld      @(gr15, gr#), gr#

or, for -fpic:

        ldi     @(gr15, #got12(bar)), gr#
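
The relationship between the hi/lo pair and the 12-bit form can be
modelled in C.  The sketch assumes that ``sethi'' replaces only the
upper 16 bits of the register, that ``setlo'' replaces only the lower
16 bits (so no carry adjustment is needed between the halves), and that
the 12-bit immediate is signed.  The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Halves that the hi and lo relocations would place in sethi/setlo. */
static uint16_t frv_hi16(uint32_t v) { return (uint16_t)(v >> 16); }
static uint16_t frv_lo16(uint32_t v) { return (uint16_t)(v & 0xffffu); }

/* Model of sethi followed by setlo on the same register. */
static uint32_t frv_sethi_setlo(uint16_t hi, uint16_t lo)
{
    uint32_t reg = 0;
    reg = (reg & 0x0000ffffu) | ((uint32_t)hi << 16);  /* sethi */
    reg = (reg & 0xffff0000u) | lo;                    /* setlo */
    return reg;
}

/* Round trip: the pair rebuilds exactly the original 32-bit offset. */
static uint32_t frv_materialize(uint32_t v)
{
    uint32_t reg = frv_sethi_setlo(frv_hi16(v), frv_lo16(v));
    assert(reg == v);
    return reg;
}

/* True if the offset can be encoded directly in a 12-bit immediate. */
static int frv_fits_imm12(int32_t off)
{
    return off >= -2048 && off <= 2047;
}
```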

If data symbol bar is known to be local to the translation unit, or to
have internal, hidden or protected (but not global) visibility,
different sequences can be used that assume the symbol to be located
at a fixed offset within the text or data segments.  If the symbol
is known to be in the .data section, the following sequence computes
the address of bar:

        sethi   #gotoffhi(bar), gr#
        setlo   #gotofflo(bar), gr#
        add     gr15, gr#, gr#

If the symbol is known to be in the .rodata section (that is mapped to
the text segment), the following sequence has to be used instead:

        sethi   #gprelhi(bar), gr#
        setlo   #gprello(bar), gr#
        add     gr16?, gr#, gr#

gr16 (or any other register) must have been previously initialized
with the gprel base address, as described in the GR16/GR17 Usage
section.

The possibility of using gotoff12 or gprel12 is not affected by -fpic,
since -fpic causes the GOT section to be assumed small, but not
offsets from the GOT to other arbitrary sections.  If bar is known to
be mapped to a small data section, however, narrower offsets using
gotoff12 or gprel12 relocations can be used.

However, since there are no guarantees about _GLOBAL_OFFSET_TABLE_ or
_gp being close enough to small data sections, a reasonable approach
in some cases is to initialize a base register with the address of
some local variable, then use this base register plus the offset
between the base variable and other local variables defined in the
same translation unit to reference other such variables throughout the
function.  For example, if gr18 is initialized in the beginning of a
function or before a loop with the address of such a base variable,
one can then use an instruction such as:

	ldi	@(gr18, other_var - base_var), gr#

to access other_var.  This only works for symbols that are both
defined in the same section in the same translation unit, and known to
be non-overridable.


Taking the address of a function can be accomplished with the
following sequences:

        sethi   #gotfuncdeschi(foo), gr#
        setlo   #gotfuncdesclo(foo), gr#
        ld      @(gr15, gr#), gr#

or, in case it can be assumed that the GOT is small:

        ldi     @(gr15, #gotfuncdesc12(foo)), gr#

If the function is local to a translation unit, or is known to have
internal or hidden (but not protected or global) visibility, the
canonical function descriptor of the function will be in the module,
so it is possible to avoid the need for a GOT entry containing the
address of the function descriptor, by using code sequences like:

        sethi   #gotofffuncdeschi(foo), gr#
        setlo   #gotofffuncdesclo(foo), gr#
        add     gr15, gr#, gr#

or, for -fpic:

        addi    gr15, #gotofffuncdesc12(foo), gr#


A global-scope variable initialized with a pointer to a function causes
code like this to be generated:

bar:    .picptr #funcdesc(foo)

Variables initialized with pointers (to data or code) must not be
assigned to read-only segments.


Preexisting Relocation Types
----------------------------

In the course of researching this document, the authors noticed that
the relocation numbers listed in the FR-V EABI [2] do not match those
used by existing tools.  The ABI documented here will break from
the FR-V EABI and use the numbers already in use by existing tools.
They are as follows:

    Name                    Value       Value in FR-V EABI
    ----                    -----       ------------------
    R_FRV_NONE               0          same
    R_FRV_32                 1          same
    R_FRV_LABEL16            2          same
    R_FRV_LABEL24            3          same
    R_FRV_LO16               4          same
    R_FRV_HI16               5          same
    R_FRV_GPREL12            6          same
    R_FRV_GPRELU12           7          missing
    R_FRV_GPREL32            8          7
    R_FRV_GPRELHI            9          8
    R_FRV_GPRELLO           10          9

    R_FRV_GNU_VTINHERIT     200         missing
    R_FRV_GNU_VTENTRY       201         missing


New Relocations
---------------

The following are new relocation types for supporting position independent
code.

    Name                    Value  Meaning
    ----                    -----  -------
    R_FRV_GOT12             11     Used with immediate instructions for 
                                   FDPIC-relative references to GOT entries
    R_FRV_GOTHI             12     Used with sethi for FDPIC-relative
                                   references to GOT entries
    R_FRV_GOTLO             13     Used with setlo for FDPIC-relative
                                   references to GOT entries

    R_FRV_FUNCDESC          14     Used to obtain the address of an
                                   "official" function descriptor

    R_FRV_FUNCDESC_GOT12    15     Used with immediate instructions for
                                   FDPIC-relative references to GOT entries
                                   containing the address of an "official"
                                   function descriptor
    R_FRV_FUNCDESC_GOTHI    16     Used with sethi for FDPIC-relative
                                   references to GOT entries containing
                                   the address of an "official" function
                                   descriptor
    R_FRV_FUNCDESC_GOTLO    17     Used with setlo for FDPIC-relative
                                   references to GOT entries containing
                                   the address of an "official" function
                                   descriptor

    R_FRV_FUNCDESC_VALUE    18     Used to fill in function entry point
                                   and GOT address in private function
                                   descriptors

    R_FRV_FUNCDESC_GOTOFF12 19     Used with immediate instructions for
                                   FDPIC-relative references to private
                                   function descriptors, i.e., those used by
                                   inlined PLT code
    R_FRV_FUNCDESC_GOTOFFHI 20     Used with sethi for FDPIC-relative
                                   references to private function descriptors
    R_FRV_FUNCDESC_GOTOFFLO 21     Used with setlo for FDPIC-relative
                                   references to private function descriptors

    R_FRV_GOTOFF12          22     Used with immediate instructions for
                                   FDPIC-relative references to small data
    R_FRV_GOTOFFHI          23     Used with sethi for FDPIC-relative
                                   references to small data
    R_FRV_GOTOFFLO          24     Used with setlo for FDPIC-relative
                                   references to small data

The dynamic loader needs to adjust or "fix up" portions of the data
segment due to it being dynamically located.  The various dynamic
relocation entries tell the dynamic loader how to do this.  The text
segment is dynamically located too, but it is read-only and must not
have any relocation entries associated with it.

Dynamic relocations have the following types: R_FRV_32,
R_FRV_FUNCDESC, and R_FRV_FUNCDESC_VALUE.  The precise interpretation
given to these relocation types by the dynamic linker is described in
the following paragraphs.

  R_FRV_32
  --------
    The R_FRV_32 relocation is used to initialize pointer values in
    the global offset table and in initialized data.  The ``r_offset''
    field in the Elf32_Rel relocation struct contains the location to
    which the relocation should be applied.  The ``r_info'' field
    encodes a symbol table index (as well as the R_FRV_32 relocation
    type).
    
    When the symbol table index refers to a section (in which case the
    symbol type is STT_SECTION), the relocation value is computed by
    adding the base address of that section to the offset stored in
    the relocation location.

    Otherwise, the symbol table index refers to a symbol which is
    defined in some other load module.  The symbol's address is
    determined and is added to the addend at the location given by
    ``r_offset''.

  R_FRV_FUNCDESC
  --------------
    The R_FRV_FUNCDESC relocation is used to obtain the address of an
    "official" function descriptor from the dynamic linker.  The
    ``r_offset'' field contains the location (offset) of the word
    which must receive this address.  The ``r_info'' field contains an
    encoding of the symbol table index corresponding to the function
    to resolve.  The dynamic linker resolves the function and
    determines the address of the corresponding official descriptor,
    allocating and initializing it as necessary.  (It is the dynamic
    linker's responsibility to allocate and initialize all official
    descriptors.)  The address of the official descriptor is written to
    the location specified by ``r_offset''.

    Note: This relocation is always expected to reference symbols for
    which the dynamic linker is expected to create an "official
    descriptor".  References to descriptors which are allocated and
    initialized by the link editor are handled via the R_FRV_32
    relocation.

  R_FRV_FUNCDESC_VALUE
  --------------------
    The R_FRV_FUNCDESC_VALUE relocation is used to initialize
    both words of a function descriptor.  The ``r_offset'' member (in
    an Elf32_Rel struct) specifies the location of the descriptor to
    initialize.  The ``r_info'' member encodes both the number
    associated with the R_FRV_FUNCDESC_VALUE type and a symbol table
    index.

    Support for lazy binding is accomplished by R_FRV_FUNCDESC_VALUE
    relocations residing in the .rel.plt section.  The symbol index
    encoded in ``r_info'' corresponds to the symbol to resolve.  In
    the descriptor itself, the link editor sets the low word to the
    address of the lazy PLT entry which, when executed, will ultimately
    resolve the symbol.  The high word is set to the index of the
    segment containing the lazy PLT code.  Relocations in .rel.plt are
    potentially processed twice, once at load time to fix up the
    offset so that the function descriptor really points at the lazy
    PLT entry, and possibly later on, as a result of the code in the
    lazy PLT entry being run, forcing actual binding to be done. 
    Note:  The environment variable ``LD_BIND_NOW'' may be set to a
    non-null value to force binding to occur at load time.  When
    ``LD_BIND_NOW'' is used for this purpose, the descriptor's
    contents are ignored, and the relocations are only processed
    once.

    R_FRV_FUNCDESC_VALUE relocations found outside of .rel.plt are
    used either for non-lazy binding support (forced at compile/link
    time) or for static function descriptor initializations.  These
    cases will be considered separately.
    
    Relocations used for resolving external functions (in a non-lazy
    manner) have the symbol index encoded in ``r_info'' set to
    correspond to the symbol to resolve.  The descriptor contents are
    irrelevant and are ignored.  The function corresponding to the
    symbol index is resolved and the entry point and GOT address
    for that function are written to the descriptor.

    The R_FRV_FUNCDESC_VALUE relocation is also used to initialize
    function descriptors used as addresses for static, non-overridable
    functions.  When used for this purpose, the ``r_info'' member encodes
    the symbol table index for the section in which the function is
    found.  The low word of the descriptor contains the offset to the
    function and the high word contains the segment index.

    The segment index can be used to speed up the computation of the
    address of the symbol, if the dynamic linker maintains internally
    an array that maps a segment number to the offset by which it was
    relocated.  Such a map is not required, though, and the dynamic
    linker is free to ignore segment index information.
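
    The handling of the three dynamic relocation types can be
    summarized with a small C model.  This is only a sketch under
    stated assumptions: target words are modeled as uint32_t so the
    code can run on a host, ``struct resolved'' and
    ``apply_dynamic_reloc'' are hypothetical names, symbol lookup is
    assumed to have been done already, the numeric value used for
    R_FRV_32 is an assumption, and lazy binding through .rel.plt is
    not modeled.

```c
#include <stdint.h>

/* Standard ELF32 r_info decoding.  */
#define ELF32_R_SYM(info)   ((info) >> 8)
#define ELF32_R_TYPE(info)  ((info) & 0xff)

#define R_FRV_32              1   /* Assumed value.  */
#define R_FRV_FUNCDESC        14
#define R_FRV_FUNCDESC_VALUE  18

/* Hypothetical result of resolving the symbol named by r_info.  */
struct resolved {
  uint32_t addr;         /* Run-time address of the symbol, or of the
                            section base for STT_SECTION symbols.  */
  uint32_t got;          /* GOT address of the defining module.  */
  uint32_t official_fd;  /* Address of the official descriptor.  */
  int is_section;        /* Nonzero if the symbol is STT_SECTION.  */
};

/* Apply one dynamic relocation at WHERE.  Returns 1 if the type is
   one of the three dynamic relocation types, 0 otherwise.  */
static int
apply_dynamic_reloc (uint32_t *where, uint32_t r_info,
                     const struct resolved *r)
{
  switch (ELF32_R_TYPE (r_info))
    {
    case R_FRV_32:
      /* Add the resolved address to the in-place addend.  */
      where[0] += r->addr;
      return 1;

    case R_FRV_FUNCDESC:
      /* Store the address of the official descriptor.  */
      where[0] = r->official_fd;
      return 1;

    case R_FRV_FUNCDESC_VALUE:
      /* Low word: entry point.  For section-relative (static)
         descriptors the old low word is the offset to the function;
         otherwise the old contents are ignored.  */
      where[0] = r->is_section ? r->addr + where[0] : r->addr;
      /* High word: GOT address (replaces the segment index).  */
      where[1] = r->got;
      return 1;

    default:
      return 0;
    }
}
```

    Note that the segment index in the high word is simply
    overwritten here, which corresponds to a dynamic linker that
    ignores segment index information.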


Assembler pseudo-functions
--------------------------

Below is a list of additional pseudo-functions for writing assembly code:

    Name                Corresponding relocation
    ----                ------------------------
    got12               R_FRV_GOT12
    gotlo               R_FRV_GOTLO
    gothi               R_FRV_GOTHI

    gotfuncdesc12       R_FRV_FUNCDESC_GOT12
    gotfuncdeschi       R_FRV_FUNCDESC_GOTHI
    gotfuncdesclo       R_FRV_FUNCDESC_GOTLO

    funcdesc            R_FRV_FUNCDESC

    gotofffuncdesc12    R_FRV_FUNCDESC_GOTOFF12
    gotofffuncdeschi    R_FRV_FUNCDESC_GOTOFFHI
    gotofffuncdesclo    R_FRV_FUNCDESC_GOTOFFLO

    gotoff12            R_FRV_GOTOFF12
    gotoffhi            R_FRV_GOTOFFHI
    gotofflo            R_FRV_GOTOFFLO


ELF Header
----------

The FR-V processor specific flag for the ``e_flags'' field in the ELF
header which indicates the use of the FR-V shared library ABI is
EF_FRV_FDPIC.  The value for this flag is 0x00008000.

When both EF_FRV_FDPIC and EF_FRV_PIC are set, it means each segment
of the binary can be loaded at an arbitrary address, which means
sharing of text segments is possible.  If EF_FRV_FDPIC is set but
EF_FRV_PIC is clear, all segments must be relocated by the same
amount.  The linker should warn and clear EF_FRV_PIC when linking
FDPIC binaries if it finds any inter-segment relocation, and set it
otherwise.  Examples of inter-segment relocations are a GPREL
relocation referencing a symbol that is not in the text segment, or a
GOTOFF relocation referencing a symbol that is not in the data
segment.
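
A loader could classify a binary from these two flags roughly as
follows.  This is a sketch only: the enum and function names are
hypothetical, and the value used for EF_FRV_PIC is an assumption since
this document only states the value of EF_FRV_FDPIC.

```c
#include <stdint.h>

#define EF_FRV_FDPIC  0x00008000u
#define EF_FRV_PIC    0x00000100u  /* Assumed value.  */

/* Hypothetical classification of how a binary may be loaded.  */
enum fdpic_mode {
  FDPIC_NONE,         /* Not an FDPIC binary.  */
  FDPIC_AS_UNIT,      /* All segments relocated by the same amount.  */
  FDPIC_PER_SEGMENT   /* Each segment may be loaded independently.  */
};

static enum fdpic_mode
fdpic_mode_from_flags (uint32_t e_flags)
{
  if (!(e_flags & EF_FRV_FDPIC))
    return FDPIC_NONE;
  return (e_flags & EF_FRV_PIC) ? FDPIC_PER_SEGMENT : FDPIC_AS_UNIT;
}
```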


Start up
--------

At the program's entry point, the stack pointer must be set to an
address close to the end of the stack segment.  The size of the stack
segment is specified by the PT_GNU_STACK program header, and is
derived from the value of the symbol __stacksize, that can be defined
to an absolute value when linking a program.  The default stack size
is 128KB.  Starting at the address pointed to by sp, the program
should be able to find its arguments, environment variables, and
auxiliary vector table and load maps.  Here's what the stack looks like:

  sp:		argc
  sp+4:		argv[0]
  ...
  sp+4*argc:	argv[argc-1]
  sp+4+4*argc:	NULL
  sp+8+4*argc:	envp[0]
  ...
  		NULL

The NULL terminator of envp is immediately followed by the Auxiliary
Vector Table.  Each entry is a pair of words, the first being an entry
type, the second being either an integer value or a pointer.  An entry
type of value zero (AT_NULL) marks the end of the auxiliary vector.
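
The layout above can be walked as in the following C sketch.  It is a
host-side model, not start-up code: stack words are represented as
uint32_t values, and ``struct start_info'', ``parse_initial_stack''
and ``auxv_lookup'' are hypothetical names introduced for
illustration.

```c
#include <stdint.h>

#define AT_NULL 0  /* End of the auxiliary vector.  */

/* Hypothetical summary of what is found on the initial stack.  */
struct start_info {
  uint32_t argc;
  const uint32_t *argv;  /* argc entries followed by NULL.  */
  const uint32_t *envp;  /* NULL-terminated.  */
  const uint32_t *auxv;  /* (type, value) pairs ending in AT_NULL.  */
};

/* Locate argv, envp and the auxiliary vector starting from SP.  */
static void
parse_initial_stack (const uint32_t *sp, struct start_info *info)
{
  const uint32_t *p;

  info->argc = sp[0];
  info->argv = sp + 1;
  /* envp starts after the NULL that terminates argv.  */
  info->envp = info->argv + info->argc + 1;
  /* The auxiliary vector starts after the NULL terminating envp.  */
  for (p = info->envp; *p != 0; p++)
    ;
  info->auxv = p + 1;
}

/* Look up an entry of the given TYPE; returns 1 and stores its value
   in *VALUE if found, 0 otherwise.  */
static int
auxv_lookup (const uint32_t *auxv, uint32_t type, uint32_t *value)
{
  for (; auxv[0] != AT_NULL; auxv += 2)
    if (auxv[0] == type)
      {
        *value = auxv[1];
        return 1;
      }
  return 0;
}
```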

Load maps will often, but not necessarily, follow the auxiliary
vector.  They use the following data structure:

struct elf32_fdpic_loadmap {
  /* Protocol version number, must be zero.  */
  Elf32_Half version;
  /* Number of segments in this map.  */
  Elf32_Half nsegs;
  /* The actual memory map.  */
  struct elf32_fdpic_loadseg segs[/*nsegs*/];
};

/* This data structure represents a PT_LOAD segment.  */
struct elf32_fdpic_loadseg
{
  /* Core address to which the segment is mapped.  */
  Elf32_Addr addr;
  /* VMA recorded in the program header.  */
  Elf32_Addr p_vaddr;
  /* Size of this segment in memory.  */
  Elf32_Word p_memsz;
};
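
The main use of a load map is to translate an address as recorded in
the ELF file into the address at which it was actually mapped.  A
minimal sketch of that translation follows; the helper name
``fdpic_reloc_addr'' is hypothetical, and the ELF types are defined
locally with fixed-width integers so that the example is
self-contained.

```c
#include <stdint.h>

typedef uint32_t Elf32_Addr;
typedef uint32_t Elf32_Word;
typedef uint16_t Elf32_Half;

struct elf32_fdpic_loadseg {
  Elf32_Addr addr;     /* Core address to which the segment is mapped.  */
  Elf32_Addr p_vaddr;  /* VMA recorded in the program header.  */
  Elf32_Word p_memsz;  /* Size of this segment in memory.  */
};

struct elf32_fdpic_loadmap {
  Elf32_Half version;  /* Protocol version number, must be zero.  */
  Elf32_Half nsegs;    /* Number of segments in this map.  */
  struct elf32_fdpic_loadseg segs[];
};

/* Translate link-time address VMA into a run-time address using MAP.
   Returns 1 and stores the result in *OUT if VMA falls inside one of
   the segments, 0 otherwise (including an unknown map version).  */
static int
fdpic_reloc_addr (const struct elf32_fdpic_loadmap *map,
                  Elf32_Addr vma, Elf32_Addr *out)
{
  Elf32_Half i;

  if (map->version != 0)
    return 0;

  for (i = 0; i < map->nsegs; i++)
    {
      const struct elf32_fdpic_loadseg *seg = &map->segs[i];

      if (vma >= seg->p_vaddr && vma - seg->p_vaddr < seg->p_memsz)
        {
          /* Each segment has its own displacement.  */
          *out = seg->addr + (vma - seg->p_vaddr);
          return 1;
        }
    }
  return 0;
}
```

Because each segment carries its own displacement, two addresses that
are adjacent in the file may end up arbitrarily far apart in memory.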

At program start-up, register GR16 should hold a pointer to a struct
elf32_fdpic_loadmap that describes where the kernel mapped each of the
PT_LOAD segments of the executable.  At start-up of an interpreter for
another program (e.g., ld.so), GR17 will be set to the load map of the
interpreter, and GR18 will be set to a pointer to the PT_DYNAMIC
section of the interpreter, if it was mapped as part of any loadable
segment, or 0 otherwise.  In the absence of an interpreter, GR17 will
be 0, and GR18 will be the main program's PT_DYNAMIC address.  All
other callee-saved registers (GR19, GR21-GR27 and GR29) are supposed
to be initialized to 0 by the kernel before it transfers control to
userland, but applications shouldn't rely on this (except for GR20,
see below) since future extensions of the ABI may assign other
meanings to these registers.  Caller-saved registers have
indeterminate value.

Both static and dynamic executables are responsible for
self-relocating and initializing the PIC register.  Self-relocation is
accomplished by adjusting, according to the load map stored in GR16,
every pointer in the range [__ROFIXUP_LIST__,__ROFIXUP_END__-4).  The
addresses of __ROFIXUP_LIST__ and __ROFIXUP_END__ can be computed by
means of GP/PC-relative addressing, since they are known to be in the
text segment, as in the code below:

	call	.Lcall
.Lcall:
	movsg	lr, gr4
	sethi.p	#gprelhi(.Lcall), gr5
	setlo	#gprello(.Lcall), gr5
	sub.p	gr4, gr5, gr4
	/* gr4 now holds the _gp address.  */
	
	mov	gr16, gr8
	sethi.p #gprelhi(__ROFIXUP_LIST__), gr9
	sethi	#gprelhi(__ROFIXUP_END__), gr10
	setlo.p #gprello(__ROFIXUP_LIST__), gr9
	setlo	#gprello(__ROFIXUP_END__), gr10
	add.p	gr9, gr4, gr9
	add	gr10, gr4, gr10

Note that, unlike EABI, the pointers in the .rofixup section are
created by the linker; FDPIC object files should not contain .rofixup
sections.  The linker emits rofixup entries in static or dynamic
executables that are not linked with -pie wherever it would emit a
dynamic relocation in PIEs or dynamic libraries.

The linker also emits, as the last entry of the .rofixup section, the
value of the _GLOBAL_OFFSET_TABLE_ symbol.  The code that performs
self-relocation should not dereference this last entry to relocate its
contents; instead, it should simply compute the relocated value of the
entry itself, thus obtaining the PIC register value without using any
non-PIC or inter-segment relocation, that would force the executable
to relocate as a unit.
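
The self-relocation procedure can be modeled in C as shown below.
This is a host-side sketch under several assumptions, not actual
start-up code: target memory is simulated by an array of words in
which the word at run-time byte address A is mem[A / 4], the load map
is simplified to an array of hypothetical ``struct seg'' entries, and
entries that do not fall within any segment are silently skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified load map entry (same fields as elf32_fdpic_loadseg).  */
struct seg { uint32_t addr, p_vaddr, p_memsz; };

/* Translate a link-time address; returns 1 on success.  */
static int
reloc (const struct seg *segs, size_t nsegs, uint32_t vma, uint32_t *out)
{
  size_t i;

  for (i = 0; i < nsegs; i++)
    if (vma >= segs[i].p_vaddr && vma - segs[i].p_vaddr < segs[i].p_memsz)
      {
        *out = segs[i].addr + (vma - segs[i].p_vaddr);
        return 1;
      }
  return 0;
}

/* Process the rofixup entries in [fixup_list, fixup_end).  Returns
   the relocated GOT address (the PIC register value), or 0 if the
   list is empty or its last entry cannot be relocated.  */
static uint32_t
self_reloc (const struct seg *segs, size_t nsegs, uint32_t *mem,
            const uint32_t *fixup_list, const uint32_t *fixup_end)
{
  const uint32_t *p;
  uint32_t where, value, got = 0;

  /* Every entry but the last names a word holding a pointer: find
     the word, relocate the pointer stored in it, and store it back.  */
  for (p = fixup_list; fixup_end - p > 1; p++)
    if (reloc (segs, nsegs, *p, &where)
        && reloc (segs, nsegs, mem[where / 4], &value))
      mem[where / 4] = value;

  /* The last entry is the GOT address itself: relocate the entry's
     value, but do not dereference it.  */
  if (p < fixup_end)
    reloc (segs, nsegs, *p, &got);
  return got;
}
```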

In case a dynamic loader is used, it may set GR20 to the address of a
function descriptor that represents a function to be called at program
termination time.  The dynamic loader, however, must not depend on
this function being called for proper termination.

The dynamic loader may change the stack pointer such that it is not
aligned to a double-word boundary, but rather to a single-word
boundary.  It is recommended that every program's start up code
adjusts the stack pointer after obtaining the program arguments from
the top of the stack.
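
The adjustment itself amounts to rounding the stack pointer down to a
multiple of 8.  The sketch below models it on a plain integer; the
function name is hypothetical, and a downward-growing stack is
assumed, so that rounding down never overlaps the argument area
located at and above the original stack pointer.

```c
#include <stdint.h>

/* Round SP down to a double-word (8-byte) boundary.  */
static uint32_t
align_sp_doubleword (uint32_t sp)
{
  return sp & ~(uint32_t) 7;
}
```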

Chunks of code inserted in .init and .fini sections (_init and _fini
functions, respectively) must not assume gr15 to hold the value of the
PIC register.  _init and _fini prologues are expected to save the
initial gr15 at @(fp,4), and the initial lr at @(fp,8).

Debugger Support - Overview
---------------------------

Debugger support is substantially different from what is normally done
on GNU/Linux for the following reasons:

    1) The usual method for finding the dynamic linker data structures
       won't work since the text and data area for the main program
       itself are dynamically located.  Normally, the debugger is able
       to find the address of the executable's sections by looking in
       the executable itself.  This, in turn allows the debugger to
       find the dynamic section in which it looks for the value of the
       DT_DEBUG tag.  The DT_DEBUG value provides the debugger with
       the address of the r_debug struct which, in turn, provides
       access to the necessary relocation information for shared
       objects.  But, since none of this will work, an alternate
       method must be found for locating the dynamic linker data
       structures.

    2) The debugger must relocate different sections by different
       amounts due to the fact that the text and data areas (and
       perhaps other sections too) are relocated independently.
       The dynamic linker's debug interface must allow the debugger
       to find out how much each section has been relocated by.

    3) It must be possible for the debugger to attach to a process at
       an arbitrary point of its execution.

    4) Text areas are truly shared among processes which means there
       must be some sort of kernel level support for breakpoints.

Debugger Support - Locating the Dynamic Linker's Data Structures
----------------------------------------------------------------

In a given process, for all possible values of FDPIC (which is in GR15
at function entry time), the word at FDPIC+8 - which is in the dynamic
linker reserve area - contains a pointer to the dynamic linker's data
structures.  This means that each data area for a shared library or
the main executable in a given process contains a pointer to dynamic
linker data structures describing the various load objects and their
relocations.

Unfortunately, GR15 may not keep its value throughout the execution of
a function.  It may be overwritten and used for any other computation.
If it's needed again, it can be copied to another register or to a
stack slot.  It might be possible for the debugger to locate the PIC
value at such alternate locations by using call-frame debug
information, but to do so, it would need the PC value as in the
executable, not the relocated PC value in the memory location the
kernel chose to map the text segment of the executable, or of any of
the shared libraries it may have been linked with.

To enable a debugger to find where an executable is located in memory,
the initial load maps that the kernel passes to the program in GR16
and GR17 are made available with ptrace calls, as described below:

#define PTRACE_GETFDPIC  31 /* get the ELF fdpic loadmap address */

#define PTRACE_GETFDPIC_EXEC ((void*)0) /* [addr] request the executable loadmap */
#define PTRACE_GETFDPIC_INTERP ((void*)1) /* [addr] request the interpreter loadmap */

struct elf32_fdpic_loadmap *x;
ptrace (PTRACE_GETFDPIC, pid, PTRACE_GETFDPIC_EXEC /* or _INTERP */, &x);

With these maps plus the executable (and/or interpreter) symbol table,
the debugger can locate the program's GOT in memory, and thus obtain
the link_map doubly-linked list (see below), from which it can obtain
the loadmaps of all loaded modules.

Obtaining r_debug requires the dynamic loader's link map and symbol
tables only, to locate the _dl_debug_addr symbol defined in the
dynamic loader.  If there is no dynamic loader, or if it hasn't got to
the point at which it sets up the main program's GOT reserve area,
r_debug won't be available.


Debugger Support - Data structures
----------------------------------

The word at GR15+8 is a pointer to a struct of the following form:

  struct link_map {
    /* These first few members are part of the protocol with the debugger.
       This is the same format used in SVR4.  */

    struct elf32_fdpic_loadaddr l_addr;
    char *l_name;		/* Absolute file name object was found in.  */
    ElfW(Dyn) *l_ld;		/* Dynamic section of the shared object.  */
    struct link_map *l_next, *l_prev; /* Chain of loaded objects.  */
  };

Where l_addr's type definition is:

  struct elf32_fdpic_loadaddr {
    struct elf32_fdpic_loadmap *map;
    void *got_value;
  };

(struct elf32_fdpic_loadaddr is the type of field dlpi_addr in struct
 dl_phdr_info as well)

_dl_debug_addr (a global symbol defined in the dynamic loader) is a
pointer to the following type:

  struct r_debug {
    int r_version;		/* Version number for this protocol.  */

    struct link_map *r_map;	/* Head of the chain of loaded objects.  */

    /* This is the address of a function internal to the run-time linker,
       that will always be called when the linker begins to map in a
       library or unmap it, and again when the mapping change is complete.
       The debugger can set a breakpoint at this address if it wants to
       notice shared object mapping changes.  Being a pointer to a
       function, it is actually a pointer to a function descriptor.  */
    ElfW(Addr) r_brk;
    enum
      {
	/* This state value describes the mapping change taking place when
	   the `r_brk' address is called.  */
	RT_CONSISTENT,		/* Mapping change is complete.  */
	RT_ADD,			/* Beginning to add a new object.  */
	RT_DELETE		/* Beginning to remove an object mapping.  */
      } r_state;

    ElfW(Addr) r_ldbase;	/* GOT pointer of the dynamic loader.  */
  };

The version number for this protocol will be 1.


Debugger Support - Finding GOT Addresses
----------------------------------------

The field ``got_value'' in the link_map struct provides the debugger
with the GOT address for all functions in the load module described by
that link_map entry.
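
For example, given a run-time PC value, the GOT address in effect for
the containing function could be found by walking the link_map chain
as sketched below.  This is an in-process model: a real debugger
would have to read these structures from the debuggee's memory rather
than follow pointers directly.  The function name
``find_got_for_pc'' is hypothetical, and the l_ld member is declared
as a plain ``void *'' to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Elf32_Addr;
typedef uint32_t Elf32_Word;
typedef uint16_t Elf32_Half;

struct elf32_fdpic_loadseg {
  Elf32_Addr addr;
  Elf32_Addr p_vaddr;
  Elf32_Word p_memsz;
};

struct elf32_fdpic_loadmap {
  Elf32_Half version;
  Elf32_Half nsegs;
  struct elf32_fdpic_loadseg segs[];
};

struct elf32_fdpic_loadaddr {
  struct elf32_fdpic_loadmap *map;
  void *got_value;
};

struct link_map {
  struct elf32_fdpic_loadaddr l_addr;
  char *l_name;
  void *l_ld;                        /* ElfW(Dyn) * in the real struct.  */
  struct link_map *l_next, *l_prev;
};

/* Return the got_value of the module that has a segment mapped at
   run-time address PC, or NULL if no loaded module contains PC.  */
static void *
find_got_for_pc (const struct link_map *head, Elf32_Addr pc)
{
  const struct link_map *lm;
  Elf32_Half i;

  for (lm = head; lm != NULL; lm = lm->l_next)
    {
      const struct elf32_fdpic_loadmap *map = lm->l_addr.map;

      if (map == NULL)
        continue;
      for (i = 0; i < map->nsegs; i++)
        if (pc >= map->segs[i].addr
            && pc - map->segs[i].addr < map->segs[i].p_memsz)
          return lm->l_addr.got_value;
    }
  return NULL;
}
```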


Debugger Support - Finding "Official" Function Descriptor Addresses
-------------------------------------------------------------------

We might want to add some means for the debugger to obtain a function
descriptor for a function at a certain address, like
_dl_funcdesc_for(void *entry_point, void *got_value), that is defined
in the dynamic loader but is static.

However, since the debugger has to make do without it for static
executables, it can probably make do without it for dynamic
executables as well.  For global functions, it could look for dynamic
R_FRV_FUNCDESC relocations pointing to the function's symbol when it
needs the same pointer that the application would use.  For local
functions, R_FRV_FUNCDESC_VALUEs within the GOT of the module that
defines the function would do.  If it can't find a function
descriptor, it has to allocate memory and initialize it with a
descriptor.

There is a risk that a dlopen()ed module may trigger the creation of a
canonical function descriptor for a function that previously didn't
need one, in which case the debugger will have created a different
function descriptor for the function and they won't compare equal.
This is the only case in which _dl_funcdesc_for would come in handy.
But is any of this worth all the complexity and duplication of
functionality?


Debugger Support - Breakpoint Considerations
--------------------------------------------

Debugger applications implement software breakpoints by causing a trap
instruction to be written at the address at which a breakpoint is
desired.  (The debugger will first fetch the contents of the location
under consideration so that it may be restored when the breakpoint is
removed.)

In order to implement software breakpoints, the text sections for the
process being debugged must reside in writable memory.  It is okay for
the text section of non-debugged processes to reside in read-only
memory, but some provision must be made to run a process being
debugged in read/write memory.  Furthermore, this determination must
be made at the time the process is started.  (Trying to migrate a
running process from read-only to read/write memory would involve
attempting to fix text section pointers on the stack, which is an
impossible task without type information about each stack slot.)

The solution we suggest the kernel implement on non-MMU systems is
the following: when a process that is being ptrace()d exec()s,
the kernel will not share the text segment of the newly-exec()ed
program, nor those of an interpreter it might require.  Also, the
mmap() system call will not share text segments used by libraries of
such a process, which it would normally do in response to the presence
of MAP_EXECUTABLE and MAP_DENYWRITE in the flags passed to mmap().

This arrangement will not make processes that the debugger attaches to
after they are mapped in look like they have independent sets of
breakpoints; they may just crash instead if they reach a breakpoint
instruction set with ptrace for another process.  Enabling independent
breakpoints in this case would require the kernel to monitor
breakpoint installation with POKETEXT and arrange for such changes to
code sections to only be visible while the affected process is
running.  This was regarded as a sufficiently uncommon case that we
have decided to not penalize every context switch with the additional
verifications that would have been needed to implement this solution.
It remains as an optional feature of the kernel, but it is no longer
mandated by the ABI.


FR-V EABI vs. FR-V Shared Library ABI Differences
-------------------------------------------------

The FR-V shared library ABI uses the same parameter passing
conventions established by the FR-V EABI, but it is a different ABI
due to the following differences:

    *  The representation of function pointers is different.  In
       the FR-V EABI, a function pointer is merely the address of
       the function in question.  In the FR-V shared library ABI,
       a function pointer is the address of a descriptor containing
       the function's entry point and GOT address.

    *  The FR-V EABI assumes that any text and data segment
       load time relocations will cause both segments to be relocated
       by the same amount.  The FR-V shared library ABI assumes that
       these segments will be relocated by different amounts.

    *  Calling conventions are different (even though parameter
       passing conventions are the same).  The FR-V shared library
       ABI requires that GR15 be set to the GOT address upon
       function entry.  The FR-V EABI has no such requirement.

    *  The mechanisms used for accessing global data are different
       (and incompatible) between the FR-V EABI and the FR-V shared
       library ABI.

    *  The numbers associated with some of the relocation types
       differ between the ABIs.


FR-V EABI vs. FR-V Shared Library ABI Linking Limitations
---------------------------------------------------------

As a consequence of the differences noted in the previous section, the
following limitations exist when attempting to link a library using
the FR-V EABI with code using the FR-V shared library ABI:

    1) Function pointers may not be passed to, nor returned from
       functions in the EABI library.  This includes not only
       function pointers passed (or returned) directly, but those
       appearing in struct or union members as well.

    2) All segments comprising the EABI library will be relocated
       together.  This means that there will be no sharing of any of
       the text sections from such a library.  The EABI library must
       use position independent code to make load-time relocation
       possible.

       In order to implement this behavior, a custom linker script is
       required for such libraries which doesn't add a page boundary
       in between the text and data segments, such that they end up in
       the same segment.

       Alternately (to avoid the need for a custom linker script), an
       EABI library may be linked into a static executable.

    3) Calls to functions external to the EABI library must occur
       through glue code which is responsible for fetching the GOT
       address and entry point from a local function descriptor.  The
       latter half of the glue code is very much like a PLT entry:

          glue_plt(foo):     
                        movsg lr, gr4
                        call  .LCF0
          .LCF0:        movsg lr, gr14
                        movgs gr4, lr
                        sethi #gprelhi(.LCF0), gr5
                        setlo #gprello(.LCF0), gr5
                        sub gr14, gr5, gr14
                        ldi   @(gr14, gprel12(_GLOBAL_OFFSET_TABLE_)), gr15
                        sethi #gotofffuncdeschi(foo), gr14
                        setlo #gotofffuncdesclo(foo), gr14
                        ldd   @(gr14, gr15), gr14
                        jmpl  @(gr14, gr0)

       [ Note: The above glue code is an example only.  It is quite
         likely that more efficient sequences will be possible.  ]

       Calls from code using the FR-V shared library ABI to the EABI
       library will work the same as calls to other shared library ABI
       functions.  A function descriptor and possibly a PLT entry will
       have been created, and they are used as normal.  Whether the
       called function actually uses the FDPIC register (GR15) is
       up to the function itself.  If it's an EABI function, it will
       set up the PIC register and FDPIC (GR15) will be irrelevant.

    4) Any global data accessed by the EABI library must be local to the
       EABI library.  Global data accessed by code using the FR-V
       shared library ABI must not be in an FR-V EABI library.

    5) Care must be taken to ensure that the numbers associated with
       relocation types are consistent across libraries.


Provisioning for Native POSIX Thread Library
--------------------------------------------

The Native POSIX Thread Library (NPTL) requires a register to be used
as the thread context pointer.  Register GR29 is reserved for this
purpose.  This requires the kernel to actually preserve the value of
this register, a requirement that is not present in the EABI.


Syscall Argument Passing Conventions
------------------------------------

The following argument passing conventions are used for syscalls:

    REG     ENTRY           EXIT
    ----    -----           ----
    GR7     syscall no.     preserved
    GR8     arg 1           return value / error
    GR9     arg 2           preserved
    GR10    arg 3           preserved
    GR11    arg 4           preserved
    GR12    arg 5           preserved
    GR13    arg 6           preserved

Note that, with the exception of GR8, the kernel preserves the values
of each of these registers.  All other registers (with the possible
exception of GR28, GR30 and GR31) are preserved too.

The syscall is made via:

        TIRA    GR0,#0


Page size
---------

The page size is fixed at 16 kilobytes, for compatibility with MMU
Linux.  The mmap2 system call will take offsets right-shifted by 12
bits, like other ports, but it will reject offsets that do not
represent multiples of the page size.  Programs must not, however,
assume the result of mmap to be aligned to 16-kilobyte boundaries, nor
that the amount of space obtained from mmap is rounded up to a
multiple of the page size, since uClinux does not offer such
guarantees.
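
The relationship between a byte offset and the value passed to mmap2
can be sketched as follows.  The macro and function names are
hypothetical; the check mirrors the rejection described above, but is
performed here on the caller's side for illustration.

```c
#include <stdint.h>

#define FDPIC_PAGE_SIZE  16384u  /* 16 kilobytes.  */
#define MMAP2_SHIFT      12      /* mmap2 offsets are in 4096-byte units.  */

/* Convert BYTE_OFFSET to an mmap2 offset argument.  Returns 0 and
   stores the result in *PGOFF on success, or -1 if the offset is not
   a multiple of the page size or does not fit in 32 bits.  */
static int
mmap2_page_offset (uint64_t byte_offset, uint32_t *pgoff)
{
  if (byte_offset % FDPIC_PAGE_SIZE != 0)
    return -1;
  if ((byte_offset >> MMAP2_SHIFT) > UINT32_MAX)
    return -1;
  *pgoff = (uint32_t) (byte_offset >> MMAP2_SHIFT);
  return 0;
}
```

Note that a valid mmap2 offset is therefore always a multiple of 4,
since one 16-kilobyte page spans four 4096-byte units.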

We could have defined a larger maximum page size, to enable MMU Linux
to use larger page sizes, but this would impact binary sizes and
memory use even on uClinux, since the linker would have to generate
binaries that could work with the maximum ABI-defined page size.


Revision History
----------------

Revision 1.0:

- Document layout of arguments, environment and auxiliary vector in
the stack.

- Permit single-word stack alignment at the program entry point, such
that using glibc's ld.so as the main program doesn't require copying
arguments, environment and auxvec if it happens to skip an odd number
of arguments.

- Recommend GR20 as dynamic loader finish function.

- Document location of gr15 within _init/_fini.

- Fix description of entry-point value of GR18.

Revision 0.9.9:

- Document page size definition.

Revision 0.9.8:

- Dropped the requirement of separate per-process software breakpoint
sets; it's now optional.  Mandatory behavior now is to not share text
segments of processes being ptrace()d.

- Reserved GR29 for use as the NPTL base register.  Syscalls are no
longer allowed to clobber it.

- Renumber PTRACE_GETFDPIC to a safer range.

Revision 0.9.7:

- Split load map and got value from struct link_map into new struct
elf32_fdpic_loadaddr, the type of dl_phdr_info::dlpi_addr.

- Clarify that _dl_debug_addr is a symbol defined in the dynamic
loader.

Revision 0.9.6:

- _gp is now in the text segment, next to .rodata and .rodata1.
  Recommend its use to reference symbols in the text segment.

- Explain why it's not always profitable to inline PLT entries.

- Change type of _GLOBAL_OFFSET_TABLE_ + 8 to struct link_map *.

- Added examples of accessing small data.

- Lazy FUNCDESC_VALUE in-place value changed.  Added notes about the
  purpose of the segment index.

- The linker clears EF_FRV_PIC to force relocation as a unit when
  there are inter-segment relocations.

- New section on `Start up'.  New loadmap data structure.  Document
  changes regarding .rofixup.

- Introduce PTRACE_GETFDPIC and struct r_debug _dl_debug_addr.  Update
  debugger data structures to match implementation.

- Removed special considerations for static executables.

- Fixed typo in glue_plt.

- Removed Miscellanea.

- Added Revision History.

Revision 0.9.5:

- Add comment to the "GR16/GR17 Usage" section.

- Add code snippets to the section "Global Data and the Global Offset
  Table (GOT)" which show how to compute data addresses and function
  addresses.

- Revise section "Static Executables".

- Add section "Syscall Argument Passing Conventions" from David Howells.

- Add section "Miscellanea".

References
----------

[1] "Linkers & Loaders", John R. Levine, Morgan Kaufmann Publishers, 2000.

[2] "FR-V EABI (Embedded Application Binary Interface)" version 1.0
    release 8/28/2001, Fujitsu Limited, 2001

[3] "FR-V Architecture Specification, Vol 1" version 1.3, Fujitsu Laboratories
    Ltd, 1999.

[4] "GNUPro Toolkit User's Guide for Fujitsu FR-V Processors", Red Hat,
    2001, pp. 21 thru 27.

[5] "IA-64 Software Conventions and Runtime Architecture Guide", Intel, 2000,
    pp. 8-1 thru 8-4.

[6] "Unix System V Application Binary Interface" (for IA-64), Intel, 2000,
    pp. 5-4 thru 5-9.