Porting »C« Code to the Geos Platform
Originally published in Handheld Systems, Vol 5.5
Whenever your programs have to deal with existing data formats or commonly used algorithms, you are faced with the choice between »reimplementing the wheel« or trying to use code that other people (or yourself) have written before. The second option is especially tempting when you think of the huge amount of freely available source code in the Public Domain, promising to reduce debugging and learning time. The following article tries to prepare you for a few of the things that may cross your path when converting existing C code to the Geos platform. It should provide a guideline for approaching this kind of project as well as a checklist for assessing the effort involved.
The language: GOC versus Plain C
The first thing to look at when porting code to another platform is the language itself: is the code written in a language for which there is a suitable compiler available on the target platform?
The only language for which this question can be whole-heartedly answered with »yes« under Geos is »ordinary« C. (In this article, I won't differentiate between different flavors of C, like K&R or ANSI, because normally the Geos-specific aspects of porting become much more important than issues raised by various versions of the C language.)
Even though Geos-applications are generally developed in an object-oriented C dialect called »GOC«, it is worth remembering that actual compiling takes place by first pre-processing the GOC source file into plain C which is then fed into a standard DOS compiler to translate it into a linkable object module. In case of the PC-based SDK, this compiler is usually Borland C (Version 3.0 and up).
Generally, all statements specific to the GOC language are prefixed by an »@« (ASCII 64) character, so there is little potential for conflicts caused by reserved words in the language extension. As a result, all parts of the code which are not using GOC are essentially compiled by Borland C.
Files to be pre-processed by the GOC stage are marked by assigning them a .GOC extension. However, if one or more of your source code modules (that is, individual C files) do not make use of any GOC features, you can also leave them with the .C extension. In this case, they will only be compiled by Borland C without any intervention of the GOC pre-processor.
In some cases, the intervention of the pre-processor may still have undesirable side-effects because of bugs in its parser implementation.
- Even though C++ style »//« comments are normally recognized by Borland C, they confuse older versions of the GOC compiler (the one in the Nokia SDK 1.0 still has this problem) and cause improper error messages. To avoid this, comments either have to be rewritten to /*...*/ notation (this can often be done by using a simple editor macro), or the code has to be »routed around« the GOC stage by renaming the source file to have a .C extension. In this case, no GOC features or include files can be used.
- A similar effect can be caused by a »#« symbol in a character literal which is occasionally misread as starting a pre-processor directive in the middle of a line. To avoid this, it should be replaced by a »\x23« representation if an error is reported in such a line.
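The two workarounds above can be combined in a short sketch. The routine name count_hashes is made up for illustration; the point is only that the file uses block comments throughout and spells the hash character as a hex escape, so it should pass through the GOC stage unharmed:

```c
#include <stddef.h>

/* Count the hash signs in a string.
 * Only block comments are used, and the character literal for the
 * hash sign is written as a hex escape (0x23 is its ASCII code), so
 * no line contains a hash sign anywhere except in a real directive.
 * count_hashes is a hypothetical name used for illustration only. */
size_t count_hashes(const char *s)
{
    size_t n = 0;

    while (*s != '\0') {
        if (*s == '\x23') {   /* same value as the plain literal */
            n++;
        }
        s++;
    }
    return n;
}
```

Calling count_hashes with a string containing two hash signs returns 2, exactly as a version using the plain literal would.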
Function pointers
One prominent feature of the C language which is not directly supported under Geos is the function pointer: as most code segments in Geos are potentially movable, storing their address in a variable for later use will generally not work (the only exception being fixed segments). Actually, whenever a function's address is stored in a variable, Geos will fix up the code at runtime in such a way that it stores a so-called »virtual pointer«, which contains the handle of the code segment rather than the current location of the segment in memory.
When dereferencing a function pointer, most DOS compilers will simply use the function's stored address for an assembly language indirect CALL statement. As a virtual pointer does not contain a physical address, this will of course lead to crashes at runtime. Unfortunately, unless the original code uses a special naming convention for marking function pointers, there is no general method for finding all the occurrences of this problem in your code except for checking every routine that is suspected of using function pointers.
The proper way to use a function pointer in Geos at runtime is to pass the pointer to a routine of the ProcCallFixedOrMovable (»PCFM«) family which makes sure that the code segment is properly locked down, so the handle can be converted to an absolute address.
PCFM routines are a bit tricky to use because they come in two »flavors«: one for routines with Pascal calling convention (ProcCallFixedOrMovable_pascal) and one for routines called C-style (ProcCallFixedOrMovable_cdecl). Each of these routines receives the function pointer as an argument, together with all the other arguments to be passed to the function. The only visible difference between the two PCFM versions is that the _pascal version requires the function pointer to be the last one in the argument list, while _cdecl expects it to precede all the other arguments.
To make this a little clearer and to avoid compiler warnings, it is a good idea to define custom data types for the argument lists PCFM has to take with the various types of function pointers you are going to use and cast the routine to whatever type you need in the current call. This way, you can still enjoy most of the benefits of type checking.
Take, for example, the following snippets of code:
typedef int (*MyFuncPtr)(int foo, int bar);
MyFuncPtr func;
/*...*/
ret = func(1, 2);
In Geos, the same call could be expressed as:
typedef int (*MyFuncPtr)(int foo, int bar);
typedef int (*pcfm_MyFuncPtr)(MyFuncPtr, int foo, int bar);
MyFuncPtr func;
/*...*/
ret = ((pcfm_MyFuncPtr)ProcCallFixedOrMovable_cdecl)(func, 1, 2);
Predefined #define Macros
In some cases, the pre-processor approach in implementing GOC can be turned into an advantage rather directly when the code to be ported explicitly contains conditional compilation directives for generating a Borland C version: In this case, it is usually most reasonable to start porting from the Borland C (or, for lack of that, MS-DOS) branch of the code, because these will already take care of a number of the limitations also present under Geos, like the int type being 16-bit in size, or restrictions imposed by Real Mode memory management.
When compiling under Geos, the following important #define macros are already pre-defined by the compiler or by the makefile (notice the double underscores):
__BORLANDC__ __GEOS__ __LARGE__ __MSDOS__ __TURBOC__
The __LARGE__ macro indicates that compilation uses the »large« memory model. I will discuss some implications of this further down in the text.
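As one possible use of the predefined __GEOS__ symbol, the PCFM call from the previous section can be confined to a single macro, so that the rest of the code stays identical on all platforms. This is only a sketch: CALL_MYFUNC, add_values and call_through_pointer are made-up names, and under Geos the prototype of ProcCallFixedOrMovable_cdecl is assumed to come from the SDK headers.

```c
/* Function pointer type as used in the earlier example. */
typedef int (*MyFuncPtr)(int foo, int bar);

#ifdef __GEOS__
/* Geos branch: route the call through the PCFM routine. The cast
 * gives PCFM the argument list needed for this pointer type. */
typedef int (*pcfm_MyFuncPtr)(MyFuncPtr, int foo, int bar);
#define CALL_MYFUNC(f, a, b) \
    (((pcfm_MyFuncPtr)ProcCallFixedOrMovable_cdecl)((f), (a), (b)))
#else
/* Every other platform: an ordinary indirect call. */
#define CALL_MYFUNC(f, a, b) ((f)((a), (b)))
#endif

/* Hypothetical target routine, deliberately not declared static. */
int add_values(int foo, int bar)
{
    return foo + bar;
}

/* The pointer is kept in a static variable initialized at load time. */
static MyFuncPtr addPtr = add_values;

int call_through_pointer(int a, int b)
{
    return CALL_MYFUNC(addPtr, a, b);
}
```

The advantage of this layout is that every indirect call goes through one macro name, which also makes the remaining direct uses of function pointers easier to find with a text search.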
If you are porting code which already contains #ifdef conditionals for a number of different environments and you want to stick with that practice, you can distinguish between Geos-specific modifications and the original code by surrounding the code with the following type of conditionals:
#ifdef __MSDOS__
#ifdef __GEOS__
/* ... your Geos modifications ... */
#else
/* ... original MS-DOS code ... */
#endif
#else
/* ... other environments ... */
#endif
Object-Oriented Programming: a caveat
Even though it is beyond the scope of this article to cover porting strategies for object oriented code in C++ to the Geos platform, it may be worth mentioning that mapping C++ objects directly onto Geos object classes will often result in less-than-optimum results.
The reason for this is that the Geos object model is not particularly well suited for maintaining large numbers of relatively small and low-level objects - creating and destroying individual objects and operating on them by sending messages are generally more costly than their C++ counterparts.
In contrast, the Geos object model usually lends itself well to entities which are either more complex (for example, entire documents) or which have an immediate visual representation and would therefore require a large part of VisClass/VisCompClass functionality anyway.
Linking
It may seem unusual to devote a section of a »porting« article specifically to the question of linking, but Glue, the linker, introduces a few additional constraints on the source code which are more restrictive than those found in other environments.
As Glue also assembles the symbol file for the Swat debugger, it is in a good position to check the consistency of symbolic names between the various source files. This has the advantage of sometimes uncovering mishaps like including the wrong version of a header file in one of the source modules, but there are especially two checks which have turned out to be somewhat irritating when porting code to Geos:
A name used as a value for an enumerated type in one file may not be used by a different enumerated type in another source file. For example, the following use of enumerated types
SOURCE1.C: enum { ONE, TWO, THREE } ValueOne;
SOURCE2.C: enum { THREE, FOUR, FIVE } ValueTwo;
will cause an error message because the value THREE is used by more than one type. This error will even strike when these types are only used internally by their respective modules. Depending on the structure of the program, commonplace words like NONE, YES, or NO are good candidates for triggering this check, especially as most other environments do not enforce such a restriction.
If the types are only used internally within each of the files (and not used as parameters in public routines), they can be made unique by adding a prefix (or postfix) to the ambiguous names, for example the name of the file in which they are defined. If the same type is used across files with partly different declarations (usually by having a different set of include files in some of the source modules) and passed as an argument to functions, it has to be split into multiple types which are typecast-compatible (that is, values with an identical meaning have to be represented by the same value when the type is cast to an integer).
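Both remedies can be sketched as follows. For brevity, the declarations that would live in separate source files are shown in one block; the S1_/S2_ prefixes and the ModeA/ModeB types are arbitrary examples, not names from any real code base.

```c
/* Case 1: types private to their files, made unique by a prefix
 * derived from the file name (SOURCE1.C and SOURCE2.C). */
enum { S1_ONE, S1_TWO, S1_THREE };
enum { S2_THREE, S2_FOUR, S2_FIVE };

/* Case 2: one logical type that used to be declared differently in
 * different modules. It is split into two types whose shared values
 * keep the same integer representation, so a cast preserves meaning. */
typedef enum { MODE_A_NONE = 0, MODE_A_READ = 1 } ModeA;
typedef enum { MODE_B_NONE = 0, MODE_B_READ = 1, MODE_B_WRITE = 2 } ModeB;

/* Routine written against the larger declaration. */
int mode_b_is_read(ModeB m)
{
    return m == MODE_B_READ;
}

/* Caller that only knows the smaller declaration: the cast is safe
 * because identical meanings map to identical integer values. */
int check_from_a(ModeA m)
{
    return mode_b_is_read((ModeB)m);
}
```

Assigning the values explicitly in the second case documents the compatibility requirement in the source itself, instead of relying on the order of the enumerators.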
A similar verification is performed on structures which have the same name, but different definitions in multiple files. Again, if the structures are private to their source files, making them unique by using specific pre- or postfixes in each of the files is sufficient.
Another cause can be the use of »opaque« data structures which are only defined as dummy placeholders in the public include files, while code operating on the internals of the data types includes a different version with a detailed declaration. In this case, a solution is to define an »internal« version of the type with all the details in it (and a different name), and then to cast pointers to the »opaque« structure whenever its internals have to be accessed. If this would result in a large number of typecasts, using a macro encapsulating the typecasted reference to the pointer or a second pointer variable which is assigned once in the beginning of a routine may help.
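A minimal sketch of this technique, assuming a hypothetical »Stream« type: the public header only sees the placeholder, while the implementation uses a differently named internal structure and reaches it either through a cast macro or through a second pointer variable.

```c
#include <stdlib.h>

/* Public view: a dummy placeholder, as found in the public header. */
typedef struct Stream { char opaque; } Stream;

/* Internal view: full details under a different name, so the linker
 * never sees two different definitions of the same structure. */
typedef struct StreamInternal {
    int position;
    int length;
} StreamInternal;

/* Macro encapsulating the typecast for occasional accesses. */
#define STREAM_INT(s) ((StreamInternal *)(void *)(s))

Stream *stream_create(int length)
{
    StreamInternal *si = (StreamInternal *)malloc(sizeof(StreamInternal));

    if (si == NULL) {
        return NULL;
    }
    si->position = 0;
    si->length = length;
    return (Stream *)(void *)si;
}

void stream_advance(Stream *s, int n)
{
    STREAM_INT(s)->position += n;       /* variant 1: cast macro */
}

int stream_remaining(Stream *s)
{
    StreamInternal *si = STREAM_INT(s); /* variant 2: assign once */

    return si->length - si->position;
}

void stream_destroy(Stream *s)
{
    free(s);
}
```

The second-pointer variant tends to read better in routines that touch many fields, while the macro is sufficient for one-off accesses.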
Another linker error which can be fairly hard to track down is the »unhandled object record type b6« message. It is usually caused by forward references or function pointer assignments with functions declared as being static. The easiest solution is to avoid static functions at all, unless this is causing function name conflicts between modules (in this case, functions have to be either renamed or moved around until the error disappears).
Library Geodes
If the code you are porting has already been designed for use as a library, it often makes sense to create a standalone library geode from it which can be used by more than one program.
The first step in the creation of a library is to specify its interface to its callers. This will usually consist of a number of header files containing function prototypes and data types, together with a list of »exported« functions which can be called from other programs. To make these functions available from the outside, their names have to be listed in the .GP file for the library with the export keyword:
name mylib.lib
type library, single
# :
# :
export PASCAL_FUNCTION_1
export Cdecl_function_2
Please note that the names of functions declared with the _pascal keyword have to be specified in upper case, while functions using the C calling convention must be written in mixed case.
It may be a little cumbersome to get a full list of all the exported function names of a large library project. If a Windows DLL version of the library can also be compiled from the sources, look for the associated .DEF file which should also contain a full list of exported functions (under Windows, it serves a similar purpose as a GP file in Geos).
After compiling the library, the directory containing the .GEO file will also contain a file with the same name and the extension .LDF. This file has to be placed into the INCLUDE\LDF subdirectory of either your local development tree (if you are using one) or the main SDK tree in order for other applications to use the library. In addition, you will have to add a line in the form of
library mylib
to the GP file of any application that wants to make calls to your new library.
When compiling a library, it has to be kept in mind that the situation the code runs in is somewhat special as its exported routines are normally called from another program with a different data segment and also have no stack of their own. As C code normally expects its global variables and string constants to reside in the current data segment, special care has to be taken to set it properly whenever the library code is entered.
To do this in a Geos application, there are two possible ways: The first is to set a special flag when calling the Borland C compiler which tells it to generate code that can run as a Windows DLL. As the problems faced by DLL code are almost exactly the same, the resulting code will also be suitable for Geos library usage. This can be done by adding the following local.mk file to your library project directory (if you already have one, just add the XCCOMFLAGS line or append the -WDE switch to the list of compiler options):
#include <$(SYSMAKEFILE)>
XCCOMFLAGS = -WDE
In addition, all the exported functions of your library have to be marked with the _export keyword in the code (you can omit this keyword in the header prototypes if you want because it does not influence the way the routine will be called). If you forget either of the two changes, the resulting crashes may be fairly hard to track down, because their reason is not obvious from the C source code's point of view.
As an alternative, you can do without these changes and add these two lines at the beginning of each exported function:
asm{push ds}
GeodeLoadDGroup(GeodeGetCodeProcessHandle());
Also, you have to add the following line at the end of each function and in front of every return statement in the middle:
asm{pop ds}
This is the way that is recommended by Geoworks. It has the advantage of being less dependent on the compiler version and of not introducing the additional uncertainty of what other side effects the -WDE switch may have (I don't know of any). On the other hand, the GeodeGetCodeProcessHandle() call is relatively expensive and may take a considerable amount of processing time for frequently called routines.
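Because the pop has to appear in front of every return statement, it can help to restructure exported routines so that they have a single exit point. The following sketch shows the idea; the lines inside the __GEOS__ conditionals are the ones quoted above, while the routine name MyLibClamp, its logic and the variable callCount are purely illustrative. Under Geos, the SDK headers declaring the two Geode routines are assumed to be included.

```c
/* Global data of the library: only reachable when DS has been set
 * to the library's own data segment. */
static int callCount = 0;

int MyLibClamp(int value, int limit)
{
    int result;

#ifdef __GEOS__
    asm{push ds}
    GeodeLoadDGroup(GeodeGetCodeProcessHandle());
#endif

    callCount++;
    if (value > limit) {
        result = limit;     /* no early return: fall through to exit */
    } else {
        result = value;
    }

#ifdef __GEOS__
    asm{pop ds}
#endif
    return result;
}
```

With only one return statement, there is exactly one place where the pop can be forgotten, and the routine still compiles unchanged on other platforms.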
When using the first method, you may run into another unusual Swat error message: »unsupported fixup type FLT_LDRRES_OFF«. It is usually caused by immediate assignments to function pointers. The cure is to make a static variable pointing to the function and transfer its value to the variable you want to assign the pointer to:
int Function(int foo, int bar);
static MyFuncPtr func = Function;
/*...*/
MyFuncPtr localFunctionPtr;
localFunctionPtr = func;
If you look at the comments for the FSFILTER sample in the SDK, you will notice that this is the recommended way of dealing with function pointer assignments anyway, because some compiler versions may not create code suitable for use under Geos if you assign the address of Function() directly to localFunctionPtr.
Runtime Environment
The following list shows the routines of the C runtime library which are available to a Geos application (actually, some of these are not true routines, but macros):
abs (m)       acos (m)      acosh (m)     asin (m)
asinh (m)     atan (m)      atanh (m)     atof (m)
atoi          bsearch       cabs (m)      calloc
cos (m)       cosh (m)      fclose        fdclose
fdopen        feof          fflush        fgetc
fgets         fopen         fread         free
fseek         ftell         fwrite        hypot (m)
isalnum       isalpha       iscntrl       isdigit
isgraph       islower       isprint       ispunct
isspace       isupper       isxdigit      itoa (g)
labs (m)      malloc        memchr        memcmp
memcpy        memmove       memset        qsort
realloc       rename        sin (m)       sinh (m)
sprintf       sqrt (m)      strcat        strchr
strcmp        strcmpi (g)   strcoll       strcpy
strcspn       strlen        strlwr (g)    strncat
strncmp       strncpy       strpbrk       strpos (g)
strrchr       strrev (g)    strrpos (g)   strspn
strstr        strupr (g)    tan (m)       tanh (m)
tolower       toupper       va_arg (B)    va_end (B)
va_start (B)  vsprintf
Include Files and Library Versions
If not marked otherwise, all these routines are provided by the ansic library and require the inclusion of the header files in the INCLUDE\ANSI subdirectory of the SDK (that is, stdio.h, stdlib.h, string.h, ctype.h). Routines marked with (m) are provided by the math library and require inclusion of math.h. Routines marked with (g) require the inclusion of geomisc.h, while (B) indicates routines provided by the standard include files of Borland C that are automatically loaded from stdio.h.
Most of these routines are not covered in the official SDK documentation, but as usual, you can learn quite a lot about them from the comments in their respective header files.
If you want to compile a C file without sending it through the GOC pre-processor first, you should at least include the geos.h file before using any of the more specific header files, because most other include files rely on the definitions made there. If you are not including object.h at the same time, you will get a warning that the structure _ClassStruct is undefined, but you can safely ignore that...
Because the C runtime library routines did not receive too much testing in the early days of Geos, some of the routines contain bugs in older versions of the ansic library. For many of these routines, the information about the earliest »safe« library release for the respective call has been marked in the LDF files for the library that come with more recent versions of the SDK.
When a geode is linked in the presence of such an updated LDF file, the minimum required version of the library will be stored in the executable file, yielding the infamous »not compatible with this version of the system software« error message when the user tries to start it on an older version of the OS. In some cases, it is possible to avoid this by including the appropriate platform statement in the GP file for the application. For example, the statement
platform zoomer
will try to include some bugfixing routines in your code to make sure that the program will run on the Zoomer as well.
Because of this, it is often a good idea to make sure that you use the LDF files from the most recent version of a Geos SDK (currently, the one for the Nokia 9000), because it will be likely to contain the most information about bugs on earlier platforms.
Memory management
GOC generally uses the so called »large« memory model. The »large« model on x86-style CPUs specifies that code and data can be stored in multiple segments of up to 64k each in size, requiring all pointers to be »far« pointers, that is, to consist of a segment as well as an offset part. In contrast to the »huge« model, all pointer operations and arrays are limited to a range of up to 65536 bytes each. In addition, no single memory allocation may ever request more than 64k. In reality, even an allocation of that size may already reduce free heap space dramatically, especially when the free space on the heap is fragmented to some extent.
It is thus important that the code to be ported already addresses this issue - many sources designed for multiple platforms already do this regularly as part of their MS-DOS support if you enable the proper #define statements for conditional compilation. Otherwise, modifying code that expects to run within a flat (non-segmented) environment will require a good deal of understanding of the program's inner workings.
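To give an impression of what such a modification can look like, here is a sketch of a table of long values that is spread over several blocks, each well below the 64k limit. All names (ChunkedArray, chunked_create and so on) and the 32k block size are assumptions made for this example; the sketch only addresses the size limit of single allocations, not the question of where the memory comes from.

```c
#include <stdlib.h>

#define MAX_BLOCK_BYTES 32768UL   /* stay well below 64k per block */

typedef struct {
    unsigned int perBlock;    /* number of records in each block */
    unsigned int numBlocks;   /* number of allocated blocks */
    long **blocks;            /* table of block pointers */
} ChunkedArray;

void chunked_free(ChunkedArray *a)
{
    unsigned int i;

    if (a->blocks == NULL) {
        return;
    }
    for (i = 0; i < a->numBlocks; i++) {
        free(a->blocks[i]);   /* free(NULL) is harmless */
    }
    free(a->blocks);
    a->blocks = NULL;
}

/* Returns 1 on success, 0 if memory ran out. */
int chunked_create(ChunkedArray *a, unsigned long count)
{
    unsigned int i;

    a->perBlock = (unsigned int)(MAX_BLOCK_BYTES / sizeof(long));
    a->numBlocks = (unsigned int)((count + a->perBlock - 1) / a->perBlock);
    a->blocks = (long **)malloc((a->numBlocks + 1) * sizeof(long *));
    if (a->blocks == NULL) {
        return 0;
    }
    for (i = 0; i < a->numBlocks; i++) {
        a->blocks[i] = NULL;
    }
    for (i = 0; i < a->numBlocks; i++) {
        a->blocks[i] = (long *)calloc(a->perBlock, sizeof(long));
        if (a->blocks[i] == NULL) {
            chunked_free(a);
            return 0;
        }
    }
    return 1;
}

/* Replaces the plain array[index] access of the flat version. */
long *chunked_at(ChunkedArray *a, unsigned long index)
{
    return &a->blocks[index / a->perBlock][index % a->perBlock];
}
```

The cost of this approach is that every array access in the original code has to be rewritten to go through chunked_at(), which is why understanding the program's inner workings matters.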
Geos is fundamentally a real mode operating system. Because the so-called »real mode« of x86 CPUs does not support any hardware memory management, virtual memory pools outside the primary 1 MB arena (which at runtime is usually reduced to a few hundred K or less by system libraries, ROM code and memory mapped hardware, varying strongly from device to device) are accessible to applications only by using Geos-specific memory management functions: data blocks have to be »locked« to make sure that they are in accessible (»conventional«) memory before use. Afterwards they should be »unlocked« as soon as possible to indicate that they may be swapped out to the backing store again.
This procedure, however, is not required when using the Geos emulations of the C standard functions malloc(), calloc(), and free(): they operate with pointers instead of memory handles, so all allocated memory is ready for use immediately. In addition, malloc()/calloc() operate their own system of heaps to satisfy small allocations (that is, a few hundred bytes) without adding the overhead of a global memory block each time. This frees the user from having to worry about creating LMem heaps for storing individual items of data.
The price paid for this convenience is that all data allocated by malloc()/calloc() is stored in »fixed« memory. This means that it cannot be swapped out if conventional memory becomes crowded. For this reason, the total amount of malloc'able memory is strictly limited by the amount of free space on the conventional heap (as opposed to the amount of swap space, which is usually higher). Depending on the memory layout of the device and the degree of heap fragmentation, this may not be more than 200k (or even less), even if an application is the only one contributing to the size of the fixed memory pool.
As an added complication, by the time malloc() fails due to lack of memory, it may already have allocated so much space that the ability of the system to swap code resources into memory as needed is severely reduced. In an extreme case, this may mean that a large block of code cannot be swapped in at all, leading to the dreaded »Conventional memory below 640k is full« error message.
Generally, any application that allocates more than about 100k of malloc() memory should be expected to require very careful attention to the way it uses conventional memory as opposed to swap space. In this case, you should consider rewriting the memory management to use the native allocate/lock/unlock scheme for memory access. If only a few data structures make up a large part of the memory allocations, it may be an option to reimplement only them using Huge Arrays or Database Library functions in a temporary VM file.
Additionally, such applications will especially depend on a proper heapspace value specified in the GP file when operating on »transparent detach« devices like the Nokia or the OmniGo which try to keep applications in memory as long as possible, even if they are not currently visible to the user.
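To find out on which side of the 100k figure a given program falls, its allocations can be counted on the original platform before any porting starts. The wrappers below are a sketch of such an instrumentation and not part of any SDK; track_malloc, track_free and the other names are invented here, and the code is meant to run on the development host rather than under Geos.

```c
#include <stdlib.h>

/* Header stored in front of each block to remember its size. The
 * union members only serve to keep the user data suitably aligned. */
typedef union {
    unsigned long size;
    long double   alignA;
    void         *alignB;
} TrackHeader;

static unsigned long currentBytes = 0;   /* bytes allocated right now */
static unsigned long peakBytes = 0;      /* highest value seen so far */

void *track_malloc(unsigned long size)
{
    TrackHeader *h = (TrackHeader *)malloc(sizeof(TrackHeader) + (size_t)size);

    if (h == NULL) {
        return NULL;
    }
    h->size = size;
    currentBytes += size;
    if (currentBytes > peakBytes) {
        peakBytes = currentBytes;
    }
    return (void *)(h + 1);              /* user data follows the header */
}

void track_free(void *p)
{
    TrackHeader *h;

    if (p == NULL) {
        return;
    }
    h = (TrackHeader *)p - 1;            /* step back to the header */
    currentBytes -= h->size;
    free(h);
}

unsigned long track_current(void)
{
    return currentBytes;
}

unsigned long track_peak(void)
{
    return peakBytes;
}
```

Redirecting malloc() and free() to these wrappers with two #define lines and printing track_peak() at the end of a typical run gives a first estimate of the fixed memory the ported code would need.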
The one exception to the rule that non-Geos code does not make use of Geos virtual memory is of course the automatic swapping in and out of code or movable data segments whenever they are used. Code is allocated to segments based on a simple rule: one source file makes one segment. Because existing code often does not take this into account, a rather simple optimization may consist of »workset tuning« in the form of moving routines between the source files to make sure that infrequently used routines (like initialization code) or routines which are rarely used together (like compression and decompression) go into their own segments so they can be discarded from memory when they have done their job. This gives the data memory management valuable extra space in which to move its own blocks around.
Stack space may become an issue if the program you are porting assumes that it has plenty of it, while Geos by default only assigns 2048 bytes to each thread. As stack checking is only performed whenever a system call is made in the EC (error checking) version of the operating system, you may run low on stack for quite some time without noticing it. You can increase the stack space available to an application (libraries do not have their own stack) by specifying the number of bytes in the »stack« statement of the GP file. However, the size of the stack also contributes to the amount of fixed memory, so this should be used with some care to avoid reducing available memory even further.
File I/O
Except for the routines fgets() and feof(), which have only been introduced on the OmniGo and later releases, all file functions listed in the table above are available in all versions starting from Ensemble (that is, PC/Geos 2.0). However, earlier versions of the ansic library (pre-OmniGo, library protocol before 1.5) have a bug in the fgetc() function which makes it read garbage every 4096 bytes (thus rendering it virtually unusable). Again, if you are using an updated LDF file from a newer SDK, Glue will make sure that your application cannot be started on platforms which have an older version of the ansic library.
There are two Geos-specifics worth mentioning with regards to the flags passed to the fopen() call (for information like this, look directly into the INCLUDE\ANSI\STDIO.H file):
- There are no »binary« and »text« mode flags, usually 'b' and 't' in other implementations. Using one of these flags will even result in a fatal error in the EC version of the operating system. The reason is that currently all files are opened in binary mode. This means that any handling of Ctrl-Z as end-of-file character and Carriage Return/Line Feed translation has to be performed by the application if required.
- There is an additional 'V' flag, which makes sure that a new file (if opened in »write« mode) is created as a »native« file, meaning that it doesn't get a Geos-file header and cannot use features like long filenames and extended attributes. If you are porting an application which writes files to be read by other (non-Geos) programs, it is usually a good idea to use this flag.
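Both points can be hidden behind a few definitions, as in the following sketch. The macro and function names are invented for this example, and the exact spelling of the Geos write mode ("wV", with the flag simply appended) is an assumption that should be checked against STDIO.H. The helper text_getc() performs a simplified text translation by dropping every Carriage Return and treating Ctrl-Z as end of file; it reads through fread() so that it does not depend on fgetc().

```c
#include <stdio.h>

#ifdef __GEOS__
#define MODE_READ   "r"    /* no 'b' or 't' flags under Geos */
#define MODE_WRITE  "wV"   /* assumed spelling: create a native file */
#else
#define MODE_READ   "rb"   /* binary everywhere, for identical results */
#define MODE_WRITE  "wb"
#endif

#define CTRL_Z 0x1A        /* DOS end-of-file marker */

/* Read one character with DOS text-mode semantics.
 * Simplification: every CR is dropped, not only those before LF. */
int text_getc(FILE *fp)
{
    unsigned char ch;

    for (;;) {
        if (fread(&ch, 1, 1, fp) != 1) {
            return EOF;            /* physical end of file or error */
        }
        if (ch == CTRL_Z) {
            return EOF;            /* logical end of file */
        }
        if (ch != '\r') {
            return (int)ch;
        }
    }
}
```

Opening all files in binary mode on the other platforms as well keeps the behaviour of the translation code the same everywhere, which makes it possible to test it before moving to the target.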
You may notice that only functions for buffered file input/output are included in the list (obviously, all the functions using stdin and stdout are missing because they would only make sense with a console emulation which is not offered by Geos), while low-level functions to access a file by handle in unbuffered mode are missing. Anyway, functions like open(), read(), or lseek() can be easily mapped to the native Geos functions FileOpen(), FileRead(), FilePos() etc., only substituting the int file handle type by FileHandle.
Error handling
There is no equivalent in Geos for concepts like Signals, Exceptions or the setjmp()/longjmp() mechanism for »short-cutting« out of a deeply nested call tree in case of an unexpected error. For code which heavily relies on one of these features (or which just tries to abort after giving an error message by calling exit() in the middle of the code), there is hardly another option than to go all the way and implement a »proper« error recovery scheme using return codes and conditionals...
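As a small illustration of such a scheme, the following sketch parses a decimal number and reports problems through a status code that every caller passes upwards, instead of calling exit() or longjmp() at the point of failure. The PortStatus type and the routine names are made up for this example; the limit of 32767 reflects the 16-bit int type.

```c
typedef enum {
    PORT_OK = 0,
    PORT_ERR_BAD_INPUT,
    PORT_ERR_OVERFLOW
} PortStatus;

/* Innermost routine: where the original might have called exit(). */
PortStatus port_parse_digit(char c, int *out)
{
    if (c < '0' || c > '9') {
        return PORT_ERR_BAD_INPUT;
    }
    *out = c - '0';
    return PORT_OK;
}

/* Caller: checks each result and hands errors to its own caller. */
PortStatus port_parse_number(const char *s, int *out)
{
    int value = 0;
    int digit;
    PortStatus st;

    if (*s == '\0') {
        return PORT_ERR_BAD_INPUT;
    }
    while (*s != '\0') {
        st = port_parse_digit(*s, &digit);
        if (st != PORT_OK) {
            return st;                     /* propagate, do not jump */
        }
        if (value > (32767 - digit) / 10) {
            return PORT_ERR_OVERFLOW;      /* would exceed 16 bits */
        }
        value = value * 10 + digit;
        s++;
    }
    *out = value;
    return PORT_OK;
}
```

The result is only written to *out on success, so a caller that forgets to check the status at least does not continue with a half-computed value.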
On the other hand, if the code makes use of internal »sanity checks« either by means of the assert() macro or by employing its own techniques, this can be nicely converted to an error checking version of your Geode: conditional compilation directives depending on some custom »debug« flag (often called DEBUG) only have to be replaced by the following statements:
#ifdef DO_ERROR_CHECKING
/*...special debug code...*/
#endif
Simple in-line checks for a certain assertion made by the code being true can be reformulated using the EC_ERROR_IF macro defined in ec.h, which takes a boolean expression as the first parameter and an error code as the second. If the expression evaluates to »true«, a fatal error is raised (note that this is just the reverse of how conventional assert() macros work). If you just want the code to fail in the debugger and plan to inspect the location where it failed anyway, it is often sufficient to specify a value of -1 as the error code. If you want Swat to give a clearer error message, you can define an enumerated type called FatalErrors in a common include file used by all your source modules which lists the error messages as names of enumeration values:
typedef enum {
SYSTEM_ERROR_CODES, /* reserved error codes */
ERROR_MESSAGE_1,
ERROR_MESSAGE_2,
/*...and so on...*/
} FatalErrors;
After that, you can pass one of the error message values to EC_ERROR_IF (or the FatalError() function), and it will be displayed by Swat in case of an error.
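Existing assert() calls can be kept in the source by mapping them onto EC_ERROR_IF through a wrapper macro. This is a sketch only: PORT_ASSERT and safe_divide are hypothetical names, and the Geos branch assumes that geos.h and ec.h are all that is needed to make EC_ERROR_IF available.

```c
#ifdef __GEOS__
#include <geos.h>
#include <ec.h>
/* EC_ERROR_IF fires when its expression is TRUE, so the assert()
 * condition has to be negated; -1 is the generic error code. */
#define PORT_ASSERT(cond) EC_ERROR_IF(!(cond), -1)
#else
#include <assert.h>
#define PORT_ASSERT(cond) assert(cond)
#endif

/* Example of a sanity check that survives the port unchanged. */
int safe_divide(int a, int b)
{
    PORT_ASSERT(b != 0);
    return a / b;
}
```

Replacing -1 by a value from the FatalErrors type described above would give a more specific message in Swat, at the price of a second macro parameter.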
Floating point code
There isn't too much to say about using C floating point code (float and double data types) in Geos, except for »Beware«: most of the functionality expected from the C standard library is offered through support for the Borland C floating point emulation (this is what the borlandc library does) and routines of the math library.
Again, this has to be taken with a grain of salt in pre-OmniGo releases: older versions of borlandc give incorrect results for operations like:
float a,b,c,d, ret;
ret = (a-b)/(c-d);
What happens is that the order of operands to the »/« operator is reversed in cases where the operands are more complex than simple variables. Needless to say that this creates hard to find problems in perfectly fine code.
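Since the problem is described as affecting only operands that are more complex than simple variables, one conceivable workaround is to compute both operands into temporaries first. The sketch below shows the rewritten expression; it follows from the description above and has not been verified against the affected library versions, and the function name is made up.

```c
/* Compute (a - b) / (c - d) with both operands of the division
 * reduced to simple variables before the division takes place. */
float ratio_of_differences(float a, float b, float c, float d)
{
    float numerator = a - b;
    float denominator = c - d;

    return numerator / denominator;
}
```

On any conforming compiler the result is identical to the one-line version, so the rewrite does no harm on platforms that never had the problem.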
Another thing to keep in mind is that the implementation of the floating point emulator is not particularly fast, compared with native Geos routines operating on FloatNum 80-bit values (equal to the long double data type in C) or the extremely fast WWFixed fixed-precision format, so it is often a worthwhile optimization to replace at least some core operations by native calls (for more details, see the article »Optimizing Geos Applications« in PDA Developers issue 4.6).
If you want to convert floating point numbers to ASCII, a call like
sprintf(buf, "%f", myFloatVar);
will not work because sprintf() does not support floating point values. Instead, the Knowledge Base on www.geoworks.com recommends using the native FloatFloatToAscii() call (or one of the functions derived from it, which are more restricted in scope, but easier to use).
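Where a fixed number of decimals is all that is needed, a different technique from the recommended native call is to scale the value to an integer and print that. The following sketch formats a value with two decimals using only integer and string conversions. It assumes that the %s and %ld conversions of the Geos sprintf() behave as in standard C, which should be checked in the header comments; format_fixed2 is a made-up name, and the usable range is limited by the long type.

```c
#include <stdio.h>

/* Write value into buf with exactly two decimals, without %f.
 * buf must be large enough for the sign, the digits and the dot. */
void format_fixed2(char *buf, double value)
{
    const char *sign = "";
    long scaled;

    if (value < 0) {
        sign = "-";
        value = -value;
    }
    scaled = (long)(value * 100.0 + 0.5);   /* round to hundredths */
    sprintf(buf, "%s%ld.%02ld", sign, scaled / 100, scaled % 100);
}
```

The floating point arithmetic is reduced to one multiplication and one addition, which also keeps the cost of the emulator low.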
Closing Notes
Some more tips and tools for porting command-line C applications can be found in David Challener's online book »Handy Tips and Techniques for Geos programming«, which is available from Breadbox Computer.
I won't even try to touch the subject of porting highly-graphical applications written for other environments (namely, Windows) to Geos. As the paradigm for implementing user-interfaces in Geos is quite different from things found on other platforms (even though there is a lot of overlap with class-library based UI design) the most reasonable approach will usually be to carefully separate the user interaction part from the »intelligence« of the code first. After that, only the components containing the core algorithms are ported, while most of the UI layer probably has to be rewritten to take advantage of the features offered by Geos.