Additional Assembler Examples
The examples within the explasm directory show various techniques which may
be useful when writing assembler code.
Example utoa1.s shows integer to string conversion, and also shows the use
of the stack and a recursive function. This example also uses the files
udiv10.s and utoatest.c.
Example divc.c generates assembler code for dividing by a constant. One
sample of this generated code is used in testing the above utoa1.s example.
Example random.s shows an assembler routine for generating pseudo-random
numbers. This example also uses the file randtest.c to test the example.
Example bytedemo.c shows how byte order reversal can be achieved in just
four instructions.
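As an illustration of the byte reversal claim, the following C sketch mirrors
one well-known four-operation sequence (an exclusive-OR with a rotated copy,
a bit clear, a rotate, and a final exclusive-OR). Whether bytedemo.c uses
exactly this sequence is an assumption; the names ror32 and byte_reverse are
hypothetical and do not come from the example files.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (0 < n < 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* Reverse the byte order of a 32-bit word using four data operations.
 * Each statement corresponds to one ARM instruction with a shifted or
 * rotated operand; the mnemonics in the comments are illustrative. */
uint32_t byte_reverse(uint32_t a)
{
    uint32_t t;

    t = a ^ ror32(a, 16);   /* EOR t, a, a, ROR #16 */
    t &= ~0x00FF0000u;      /* BIC t, t, #0xFF0000  */
    a = ror32(a, 8);        /* MOV a, a, ROR #8     */
    a ^= t >> 8;            /* EOR a, a, t, LSR #8  */
    return a;
}
```

For a word whose bytes are A B C D (most significant first), the result is
D C B A.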
1 utoa1.s - Integer to String Conversion
This example shows how to:
* convert an integer to a string in ARM assembly language
* use a stack in an ARM assembly language program
* write a recursive function in ARM assembly language
Its dtoa entry point converts a signed integer to a string of decimal
digits (possibly with a leading '-'); its utoa entry point converts an
unsigned integer to a string of decimal digits.
1.1 Algorithm
To convert a signed integer to a decimal string, generate a '-' and negate
the number if it is negative; then convert the remaining unsigned value.
To convert a given unsigned integer to a decimal string, divide it by 10,
yielding a quotient and a remainder. The remainder is in the range 0-9 and
is used to create the last digit of the decimal representation. If the
quotient is non-zero it is dealt with in the same way as the original
number, creating the leading digits of the decimal representation;
otherwise the process has finished.
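The algorithm can be sketched in C as follows. This is only an illustration
of the recursion, not a transcription of utoa1.s: the names utoa_sketch and
dtoa_sketch are hypothetical, the C division operators stand in for udiv10,
and the sketch assumes the caller appends any terminating NUL.

```c
#include <stdint.h>

/* Write the decimal digits of n starting at buf and return a pointer
 * to the location immediately after the last digit written. */
char *utoa_sketch(char *buf, uint32_t n)
{
    uint32_t q = n / 10u;               /* quotient: the leading digits */
    uint32_t r = n % 10u;               /* remainder: the last digit, 0-9 */

    if (q != 0u)
        buf = utoa_sketch(buf, q);      /* emit the leading digits first */
    *buf++ = (char)('0' + r);           /* then plant the last digit */
    return buf;
}

/* Signed entry point: emit '-' and negate if negative, then convert
 * the remaining unsigned value. */
char *dtoa_sketch(char *buf, int32_t n)
{
    uint32_t u = (uint32_t)n;

    if (n < 0) {
        *buf++ = '-';
        u = 0u - u;     /* unsigned negation, also safe for INT32_MIN */
    }
    return utoa_sketch(buf, u);
}
```

A caller needs a buffer of at least 12 bytes: a sign, ten digits and a
terminator.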
1.2 Explanation
On entry, a2 contains the unsigned integer to be converted and a1
addresses a buffer to hold the character representation of it.
On exit, a1 points immediately after the last digit written.
Both the buffer pointer and the original number have to be saved across
the call to udiv10. This could be done by saving the values to memory.
However, it turns out to be more efficient to use two 'variable' registers,
v1 and v2 (which, in turn, have to be saved to memory).
Because utoa calls other functions, it must save its return link address
passed in lr. The function therefore begins by stacking v1, v2 and lr
using STMFD sp!, {v1,v2,lr}.
In the next block of code, a1 and a2 are saved (across the call to udiv10)
in v1 and v2 respectively and the given number (a2) is moved to the first
argument register (a1) before calling udiv10 with a BL (Branch with Link)
instruction.
On return from udiv10, 10 times the quotient is subtracted from the
original number (preserved in v2) by two SUB instructions. The remainder
(in v2) is ready to be converted to character form (by adding ASCII '0')
and to be stored into the output buffer.
But first, utoa has to be called to convert the quotient, unless that is
zero. The next four instructions do this, comparing the quotient (in a1)
with 0, moving the quotient to the second argument register (a2) if not
zero, moving the buffer pointer to the first argument/result register
(a1), and calling utoa if the quotient is not zero.
Note that the buffer pointer is moved to a1 unconditionally: if utoa is
called recursively, a1 will be updated but will still identify the next
free buffer location; if utoa is not called recursively, the next free
buffer location is still needed in a1 by the following code which plants
the remainder digit and returns the updated buffer location (via a1).
The remainder (in v2) is converted to character form by adding '0' and is
then stored in the location addressed by a1. A post-incrementing STRB is
used which stores the character and increments the buffer pointer in a
single instruction, leaving the result value in a1.
Finally, the function is exited by restoring the saved values of v1 and
v2 from the stack, loading the stacked link address into pc and popping
the stack using a single multiple-load instruction:
    LDMFD sp!, {v1,v2,pc}
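The remainder step described above, in which 10 times the quotient is
subtracted by two SUB instructions, works because 10*q = 8*q + 2*q, and each
term is a single shifted operand. The C sketch below shows this, together
with a shift-and-add division by 10 of the kind that divc.c might generate.
The body of udiv10_sketch is a well-known textbook sequence and is an
assumption; it is not copied from udiv10.s. Both function names are
hypothetical.

```c
#include <stdint.h>

/* Divide by 10 using only shifts, adds and one correction step.
 * Stand-in for udiv10; the real routine may differ. */
uint32_t udiv10_sketch(uint32_t n)
{
    uint32_t q, r;

    q = (n >> 1) + (n >> 2);            /* q is roughly n * 0.75 */
    q += q >> 4;
    q += q >> 8;
    q += q >> 16;                       /* q is roughly n * 0.8 */
    q >>= 3;                            /* n / 10, possibly one too small */
    r = n - ((q << 3) + (q << 1));      /* n - 10*q */
    return q + (r > 9u);                /* correct the estimate if needed */
}

/* Remainder as utoa computes it: subtract 8*q, then 2*q. */
uint32_t rem10_sketch(uint32_t n, uint32_t q)
{
    uint32_t r = n;

    r -= q << 3;    /* first SUB: operand shifted left by 3 (8*q) */
    r -= q << 1;    /* second SUB: operand shifted left by 1 (2*q) */
    return r;
}
```

For example, with n = 12345 the quotient is 1234 and the two subtractions
leave 12345 - 9872 - 2468 = 5.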
1.3 Creating a runnable example
You can run the utoa routine described here under armsd. To do this, you
must assemble the example and the udiv10 function, compile a simple test
harness written in C, and link the resulting objects together to create a
runnable program.
Copy utoa1.s, udiv10.s and utoatest.c from directory examples/explasm to
your current working directory. Then issue the following commands to build
and run the program:
    armasm utoa1.s -o utoa1.o
    armasm udiv10.s -o udiv10.o
    armcc -c utoatest.c
    armlink utoa1.o udiv10.o utoatest.o -o utoatest
    armsd utoatest
The two armasm commands assemble the utoa function and the udiv10
function, creating the object files utoa1.o and udiv10.o.
The armcc command compiles the test harness. The -c flag tells armcc to compile
only (not to link).
The armlink command links the three objects with the ARM C library to create
an executable (here called utoatest), which can be run under armsd.
Alternatively, use the CodeWarrior project file (utoa.mcp) provided.
1.4 Further Notes - Stacks in assembly language
In this example, three words are pushed on to the stack on entry to utoa
and popped off again on exit. By convention, ARM software uses r13,
usually called sp, as a stack pointer pointing to the last-used word of
a downward growing stack (a so-called 'full, descending' stack). However,
this is only a convention and the ARM instruction set supports equally all
four stacking possibilities: FD, FA, ED, EA.
The instruction used to push values on the stack was:
    STMFD sp!, {v1, v2, lr}
The action of this instruction is as follows:
* Subtract 4 * number-of-registers from sp.
* Store the registers named in {...} in ascending register number
  order to memory at [sp], [sp,#4], [sp,#8] ...
The matching pop instruction was:
    LDMFD sp!, {v1, v2, pc}
Its action is:
* Load the registers named in {...} in ascending register number
  order from memory at [sp], [sp,#4], [sp,#8] ...
* Add 4 * number-of-registers to sp.
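The push and pop actions listed above can be modelled in C as a toy
full-descending stack. This is a sketch for illustration only: memory is
modelled as an array of words, sp is a word index rather than a byte address
(so "4 * number-of-registers" becomes simply n), and the names fd_stack,
fd_init, fd_push and fd_pop are hypothetical. Like the assembly language
example, it does no stack-limit checking.

```c
#include <stddef.h>
#include <stdint.h>

#define FD_WORDS 64

typedef struct {
    uint32_t mem[FD_WORDS];
    size_t sp;      /* index of the last-used word (a 'full' stack) */
} fd_stack;

/* An empty full-descending stack has sp just past the highest word. */
void fd_init(fd_stack *s)
{
    s->sp = FD_WORDS;
}

/* Models STMFD sp!, {...}: lower sp first, then store the registers
 * in ascending order at [sp], [sp+1], [sp+2] ... */
void fd_push(fd_stack *s, const uint32_t *regs, size_t n)
{
    size_t i;

    s->sp -= n;
    for (i = 0; i < n; i++)
        s->mem[s->sp + i] = regs[i];
}

/* Models LDMFD sp!, {...}: load the registers in ascending order
 * from [sp], [sp+1], [sp+2] ..., then raise sp. */
void fd_pop(fd_stack *s, uint32_t *regs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        regs[i] = s->mem[s->sp + i];
    s->sp += n;
}
```

Pushing three values and popping three values returns them in the same
register order and leaves sp where it started.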
Many, if not most, register-save requirements in simple assembly language
programs can be met using this approach to stacks.
A more complete treatment of run-time stacks requires a discussion of:
* stack-limit checking (and extension)
* local variables and stack frames
In the utoa program, you must assume the stack is big enough to deal with
the maximum depth of recursion, and in practice this assumption will be
valid. The biggest 32-bit unsigned integer is about four billion, or ten
decimal digits. This means that at most 10 x 3 registers = 30 words = 120
bytes have to be stacked. Because the ARM Procedure Call Standard guarantees that
there are at least 256 bytes of stack available when a function is called,
and because we can guess (or know) that udiv10 uses no stack space, we can
be confident that utoa is quite safe if called by an APCS-conforming
caller such as a compiled C test harness.
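The stack budget worked out above can be checked with a few lines of C. This
is a sketch under the assumptions stated in the text (three registers of four
bytes each stacked per call, and udiv10 using no stack); the function names
are hypothetical.

```c
#include <stdint.h>

/* Number of decimal digits in n, which equals the number of nested
 * utoa activations needed to convert it. */
unsigned decimal_digits(uint32_t n)
{
    unsigned d = 1;

    while (n >= 10u) {
        n /= 10u;
        d++;
    }
    return d;
}

/* Worst-case bytes stacked by utoa: v1, v2 and lr (3 words of 4 bytes)
 * for each level of recursion. */
unsigned utoa_worst_case_stack_bytes(void)
{
    return decimal_digits(UINT32_MAX) * 3u * 4u;
}
```

The worst case is ten levels of recursion, which stays well inside the
256 bytes mentioned above.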
The stacking technique illustrated here conforms to the ARM Procedure Call
Standard only if the function using it makes no function calls. Since utoa
calls both udiv10 and itself, it really ought to establish a proper stack
frame. If you really want to write functions that can 'plug and play' together
you will have to follow the APCS exactly.
However, when writing a whole program in assembly language you often know
much more than when writing a program fragment for general, robust
service. This allows you to gently break the APCS in the following way:
* Any chain of function/subroutine calls can be considered compatible with
the APCS provided it uses less than 256 bytes of stack space.
So the utoa example is APCS compatible, even though it is not APCS
