ChaOS Diary Jan 2010 - Mar 2010

ChaOS Diary - monoblog and links to reference documents

Many golden nuggets lie herein.

31/3/2010 NJOB over ChaOS End of the financial year, and statement day. Added a tweak to the accenquiry transaction amount editor, often used when a transaction amount has been incorrectly or provisionally entered. When a journal is entered, the incorrect amount is written to between two and four records due to the double entry, and posts to sales or purchase ledger control. This has always been a bother, requiring manual intervention on all the affected accounts. I like to enter provisional accruals into the accounts, such as estimates for electricity costs, and these have to be corrected when the actual bill arrives in the following month. Just added a rewrite parameter to writetransfer(), to be called by accenquiry. If a journal amount is altered on any account, all corresponding double entries are located and rewritten to keep the system in balance.
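A minimal sketch of the rewrite idea, not the actual writetransfer() code: the POSTING record, its fields and both function names are assumptions made for illustration. Every posting that belongs to the journal and carries the old amount (as a debit or a credit) is rewritten with the new amount, keeping its sign, so the journal still sums to zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transaction record: one row per ledger posting. */
typedef struct {
    long journal;   /* journal number shared by all postings of one entry */
    int  account;   /* account the posting belongs to */
    long pence;     /* signed amount in pence: debit > 0, credit < 0 */
} POSTING;

/* Rewrite every posting of `journal` that carries old_pence (either sign)
   to new_pence, keeping the original sign. Returns the number of records
   rewritten. */
size_t rewrite_journal(POSTING *p, size_t n, long journal,
                       long old_pence, long new_pence)
{
    size_t i, changed = 0;
    assert(p != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (p[i].journal != journal)
            continue;
        if (p[i].pence == old_pence) {
            p[i].pence = new_pence;
            changed++;
        } else if (p[i].pence == -old_pence) {
            p[i].pence = -new_pence;
            changed++;
        }
    }
    return changed;
}

/* Sum of all postings for one journal; zero when the books balance. */
long journal_balance(const POSTING *p, size_t n, long journal)
{
    size_t i;
    long sum = 0;
    for (i = 0; i < n; i++)
        if (p[i].journal == journal)
            sum += p[i].pence;
    return sum;
}
```

Checking journal_balance() before and after the rewrite is a cheap way to confirm that no half of a double entry was missed.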

30/3/2010 D10 over ChaOS/CC ChaOS compiler A rewarding but infuriating day, testing final D10 prototype in readiness to engrave a full roller. Added COM port emergency stop (interrupt generated by button press on an old serial mouse with the ball removed), this took me about a week to write 10 years ago under DOS4GW, took an hour today to port this to ChaOS. Ran test engrave with left and right autotees, perfect. Then began engraving full roller, 140mm right hand tee burn OK, then main ribbon started, running time 7 hours with autoshutdown set. After an hour or so I noticed the main ribbon seemed to be stretched in the X direction, which seemed impossible given that the same program had just engraved a test perfectly.

Stopped the burn, discovered that the design was scaled by 188%, which is a really strange number. I use an integral ratio of 18838/18820 to make a tiny adjustment for the empirical difference between rollers engraved on my rig and those engraved by my old roller supplier. Had one of these integers somehow found its way into the xstepsperrev calculation?

Noticing a stray value of 2.5399999999 on the co-processor immediately points to a compiler bug. The co-processor stack should always be empty at the end of an expression, unless returning a value from a function. This is obviously an inch/centimetre ratio, usually introduced into a program as the constant 2.54. I had spotted a couple of unruly floating point results whilst porting D10 to ChaOS, which occurred when using a (DB) override on a floating point constant. Because the ChaOS compiler promotes integers to floating point when combined with floating point constants in expressions, (DB) casts are not needed in expressions such as int n=(DB)y/2.54. The error in the compiler was an extra fetch() generated in cast_expression(), when the (DB) should be ignored because the carrying value of the expression is already a floating point value. Infuriating that an innocuous expression such as xstepspercm=xstepsperinch/(DB)2.54 should spoil a whole roller, but rewarding that such a destructive bug was so easy to locate and fix.

29/3/2010 D10 over ChaOS Further testing of D10 prototype, including machine ancillaries (compressor, extractor, water pump etc) and autoshutdown. More complex engraving functions added such as engrave with offset, engrave with autotees, smart horizontal mirror, on-the-fly laser shouldering using lookahead column buffer. No major issues, all being well this version will engrave a full roller tomorrow.

28/3/2010 D10 over ChaOS Tested revised D10 prototype, laser works fine.

27/3/2010 D10 over ChaOS Decided to go the whole hog and see if a D10 port running over ChaOS can drive the laser and cut rollers. First prototype took 6 hours, motor acceleration ramps sound normal but the Y motor stalled as the laser started. At first I thought this was a timing problem caused by slightly different processor branching in the main xyx decision loop, but extra instructions added to balance processor clocks through if and else branches made no difference. Then discovered that I had imported a bit mask from G2 for the Z-axis (Laser pulse) which was equal to YMOTORSTEP. Of course this completely explains why the Y motor immediately stalls as the laser engraving starts! Will run the revised prototype tomorrow.

26/3/2010 G2 over ChaOS Tried G2 on laser CNC, just to test stepper acceleration/deceleration ramps. Initially ran too fast, (100Hz timer interrupt frequency!). Adjusted loopspersecond by multiplying by 100, instead of 182/10, and motors run OK.
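The calibration arithmetic can be sketched as follows. This assumes loopspersecond is derived from a count of delay-loop iterations measured over one timer tick; the function and parameter names are hypothetical. Under that assumption, scaling a 100Hz tick count by 18.2 understates the loops per second by a factor of about 5.5, so every timed delay comes out short and the motors run too fast.

```c
#include <assert.h>

/* Scale a delay-loop count measured over one timer tick to one second.
   hz_x10 is the tick rate times ten: 182 for the DOS 18.2Hz timer,
   1000 for the ChaOS 100Hz timer. */
unsigned long loops_per_second(unsigned long loops_per_tick, unsigned hz_x10)
{
    assert(hz_x10 > 0);
    return loops_per_tick * hz_x10 / 10UL;
}
```

With 1000 loops counted per tick, the DOS scaling gives 18200 loops per second and the ChaOS scaling gives 100000.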

25/3/2010 G2 over ChaOS Started porting the G2 CNC program to ChaOS, which we use to run a small paper re-rolling machine. This is a hacked-down version of the D10 project used to run the CO2 laser CNC. Once again most of the time is spent modifying display code - ChaOS times and dates are nothing like the old DOS functions I had to work with back in 1998. The stepper motor code passes through the ChaOS compiler with little modification - the ChaOS timer interrupt runs at 100Hz rather than 18.2Hz, so real-time calibration is a bit different. About 6 hours to produce a prototype of G2 to test on the machine tomorrow. If this goes well I will consider porting D10 too.

One of the great hurdles in developing the D10 CNC was managing a machine with processor interrupts disabled - this provides the smoothest possible timing for the stepper motors. If the D10 port goes well, I plan to upgrade the laser computer to a dual-core processor, probably an Intel Atom 330. It will then be possible to drive the motors using one core with interrupts disabled, whilst leaving ChaOS running on the BSP to provide display, networking and machine supervision. Given that ChaOS has TCP/IP, the CNC could even be accessed over the internet without disturbing the stepper timing loops.

24/3/2010 NJOB over ChaOS One or two teething problems going live with NJOB over ChaOS today, some small mistakes in the code rewritten to avoid using screen memory to build order and invoice headers before printing. I could have used the old code, but I am just making sure the NJOB displays are ready to run in VESA modes, as well as the old CGA 25-line and VGA 50-line text modes. Once again the indexed databases ran flawlessly.

23/3/2010 NJOB over ChaOS Added ink shop database support to NJOB, just enough to run live next time the Cobden Chadwick runs, i.e. recursive mixture database: input, edit and quick recipe, print job database: input and edit, plus print recipe sheets and shade sheets. Although they are used extensively in NJOB, the ChaOS compiler does not handle multi-dimensional arrays passed as function arguments. Without checking the C and C++ language specifications, I seem to remember passing arrays as function arguments and as return values is a feature of C++, not C. Obviously a large array could absorb lots of stack space, so I imagine a C++ compiler would allocate memory and pass a pointer instead, but this leads to garbage collection problems. Since the arrays I am passing around are of a known size, I decided to embed these inside a structure to work around this limitation of my compiler, whilst invoking stronger type-checking.
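A minimal sketch of the struct workaround in standard C; the GRID type, its dimensions and the function names are illustrative assumptions, not NJOB code. For reference, a bare array argument in standard C (and C++) decays to a pointer to its first element, whereas a struct containing the array is a distinct type that can be passed by pointer or by value and can also be returned.

```c
#include <assert.h>

#define ROWS 3
#define COLS 4

/* Wrapping a fixed-size 2-D array in a struct gives it a distinct type,
   so the compiler checks the dimensions at every call. */
typedef struct {
    int cell[ROWS][COLS];
} GRID;   /* hypothetical name */

/* Sum every cell; the callee sees the full dimensions via the struct type. */
long grid_sum(const GRID *g)
{
    long total = 0;
    int r, c;
    assert(g != 0);
    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++)
            total += g->cell[r][c];
    return total;
}

/* Returning by value is legal for a struct, unlike a bare array. */
GRID grid_fill(int value)
{
    GRID g;
    int r, c;
    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++)
            g.cell[r][c] = value;
    return g;
}
```

Passing a pointer to the struct, as grid_sum() does, avoids copying the whole array on to the stack.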

21/3/2010 NJOB over ChaOS Wading through the thousands of lines of NJOB code to add more functionality to the ChaOS-NJOB prototype. Most of the revision involves replacing C++ function invocations with C, e.g. W->trn->START(KEYGE,0) becomes START(W->trn,KEYGE,0). Surprisingly the indexed database engine seems to be flawless. Now working: account database: input, view, edit, list, delete and transaction enquiry; job database: input, view, edit, delete, list; order database: input, edit, delivery with auto-invoice and print; invoice database: edit, view, print; transaction database: manual input, sales ledger, purchase ledger, nominal ledger, opening balances, trial balance, sales daybook, purchase daybook, cashbook, auto-nominal daybook postings, aged debtor, aged creditor, sales account statement, purchase account remittance advice. Added PCL soft font support to download fonts into the printer (previously I used an old proprietary program to do this). The fonts, along with the macro templates for order forms and invoices, now load rapidly using the new print spooler PRNSRV.

This project has exposed a couple of minor compiler bugs, but the processor fault trap in the ChaOS debugger exposed these as they occurred. Some might think that running absolutely no memory or task protection on a processor would result in a simple bug taking down the entire system - and they would be right. However this is a useful feature - the ugly crashes occur soon after the mistake has been made, and the ChaOS processor fault trap usually displays the offending source code as the system locks up. This forces a more defensive approach to programming, knowing that slack coding will be fatal. One general observation is that rogue code (null pointers, corrupt opcode streams etc) nearly always interferes with the real-mode IVT at linear address zero. Setting a debug memory write trap on the first few bytes of linear memory often catches the bug red-handed.

19/3/2010 PRNSRV/NJOB over ChaOS Moving on at speed with the porting of NJOB to ChaOS, tackled the job of adding a simple background print spooler to buffer Laserjet macro loading and print jobs. Totally amazed at how easy this was, using the ChaOS SRV template developed for HTPSRV, FTPSRV and TELSRV.

Realistic comparisons can now be made between NJOB over ChaOS and NJOB over WATCOM DOS4GW. I am heartened by the fact that the ChaOS version loads and indexes all the CTPP databases in two seconds, three times faster than DOS4GW. I had always dreamed that an unsegmented operating system would be faster than the conventional model, but this is the first evidence that this is true, despite my rudimentary compiler.

18/3/2010 NJOB over ChaOS Modify-on-load and modify-on-store now implemented in ChaOS compiler to support SLV1/SLV2 and SLV3 as inbuilt types. However transferring simple integers to the FPU creates a small error which gives rise to rounding complications when storing the results of computations back to integral types. Since almost all SLVs used in NJOB represent money (two decimal places to encode pence) I suddenly realised that the easiest way to handle virtual decimal points in integers is to ignore them! A global replace of all SLV2s in NJOB by SL types (signed 32-bit integer) and the accounting database works perfectly. This is because double entry book-keeping uses only addition and subtraction, never multiplication or division when combining transactions. Decimal points when representing money only exist when displaying a transaction value on screen.

Where money needs to be multiplied or divided (such as invoicing and tax calculation) the modify-on-load/modify-on-store method is accurate enough to generate the transaction value stored to the accounting database. By using integers, rather than floating point numbers in the accounting database, rounding errors can never compromise the double-entry book-keeping principle.
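A minimal sketch of the integer-pence approach; SL here is a stand-in typedef and both function names are assumptions. Balances are combined with plain integer addition, and the decimal point only appears when a value is formatted for display.

```c
#include <assert.h>
#include <stdio.h>

typedef long SL;   /* stand-in for the diary's SL; assumed at least 32 bits */

/* Ledger arithmetic only adds and subtracts, so pence stay exact integers. */
SL post(SL balance_pence, SL amount_pence)
{
    return balance_pence + amount_pence;
}

/* The decimal point appears only when formatting for display.
   Returns the number of characters written, as snprintf does. */
int format_pence(char *buf, size_t len, SL pence)
{
    SL a = pence < 0 ? -pence : pence;
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s%ld.%02ld",
                    pence < 0 ? "-" : "", a / 100, a % 100);
}
```

Posting 1999 and then -2500 to an empty account leaves -501, which displays as -5.01.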

16/3/2010 NJOB over ChaOS Account database management now ported, moving on to accenquiry which manages transaction history by account. Motorola types are handled by the compiler - they are all over the place in the database indexes, to provide correct byte-ordering when used as part of a key structure. But the old COBOL virtual decimal integers need a little more work. These are very efficient for financial records, storing values with decimal places as plain integers with an implied decimal point. Loading and storing these types is easy enough, but combining them in calculations is more of a challenge, not because the arithmetic is difficult, but because of the sheer number of different combinations of types and operators which the compiler must handle.

C++ uses operator overloads to handle these combinations, but I have been there before. Each new arithmetic type added to a system needs 16 operator overload functions to combine with inbuilt rvalues. 16 more overloads are needed for lvalue combinations, which then allows the new type to be freely mixed in arithmetic expressions. An additional type needs these 32 overloads, plus 32 more to combine with the first new type. So to add the three common COBOL virtual decimals as inbuilt types requires 160 different type/operator combinations to be covered.

The job can be simplified by loading these special integers into the floating point registers, and dividing by 10 raised to the number of decimal places. Subsequent arithmetic is performed on double floating point temporaries, and storing to memory is preceded by a multiplication to preserve the decimal precision. SLV2 type has been implemented in this way, but still needs some pass-throughs* to fall through the expression analyser as a proper inbuilt type. Compiler modifications such as this provide a generic solution to the problem, which C++ cannot do.

A similar solution can be achieved on an as-need basis, by calling a modify-on-load function to access a database field, performing arithmetic using local variables of inbuilt type, then calling a modify-on-store function to update the database field.
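The as-needed variant might look like the sketch below. The names slv_load and slv_store are hypothetical, and the store rounds to the nearest integer rather than relying on any particular FPU rounding mode, which is an assumption about how the result should be written back.

```c
#include <assert.h>

typedef long   SL;   /* signed integer field as stored in the database (assumed) */
typedef double DB;   /* double floating point, as in the diary's (DB) casts */

/* 10 raised to `places`, for the small place counts used by SLV1..SLV3. */
static DB scale_for(int places)
{
    DB s = 1.0;
    assert(places >= 0 && places <= 9);
    while (places-- > 0)
        s *= 10.0;
    return s;
}

/* Modify-on-load: turn a stored integer with an implied decimal point
   into an ordinary floating point value. */
DB slv_load(SL stored, int places)
{
    return (DB)stored / scale_for(places);
}

/* Modify-on-store: scale back up and round to the nearest integer so that
   binary representation error does not drop a unit. */
SL slv_store(DB value, int places)
{
    DB scaled = value * scale_for(places);
    return (SL)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
}
```

For example, loading 10000 as an SLV2 (100.00), multiplying by 0.175 and storing with two places gives 1750, i.e. 17.50.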

Update: *pass-throughs had already been implemented, but were malfunctioning due to a typo: if(!typ[0]&tUSER) should have been if(!(typ[0]&tUSER)). Added SLV1 and SLV3 types, almost all done, but for the fact that I have the floating point core programmed to round down when doing an integer store (this avoids calling _CHOP every time an integer result is stored by the FPU). This demands a revision of the code in the ChaOS compiler which generates double floating point constants. Typically, DB d=0.175 will load .1749999970 into the FPU, which has always been near enough, but no good when multiplying by 1000 then storing the result to an integer (we expect 175 but get 174).
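Both problems in this update can be reproduced in standard C. The value of tUSER below is a made-up flag bit, since the real value is not given. The constant .1749999970 happens to match what a single-precision encoding of 0.175 produces; whether that is the actual cause in the ChaOS compiler is an assumption.

```c
#include <assert.h>

#define tUSER 0x80u   /* hypothetical flag bit; the real value is not given */

/* As typed: ! binds tighter than &, so this computes (!t) & tUSER.
   !t is 0 or 1, so with any flag other than bit 0 the result is always 0. */
int not_user_buggy(unsigned t) { return !t & tUSER; }

/* As intended: test the flag first, then negate. */
int not_user_fixed(unsigned t) { return !(t & tUSER); }

/* Scale by 1000 and chop towards zero, as a round-down integer store does. */
long scale_truncate(float f)
{
    assert(f >= 0.0f);
    return (long)((double)f * 1000.0);
}

/* Scale by 1000 and round to nearest (valid for non-negative input only). */
long scale_round(float f)
{
    assert(f >= 0.0f);
    return (long)((double)f * 1000.0 + 0.5);
}
```

With a single-precision 0.175, scale_truncate() yields 174 while scale_round() yields 175.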

15/3/2010 NJOB over ChaOS One way to ensure the COBFILE indexed database code is bomb-proof is to port NJOB to ChaOS. NJOB uses up to a dozen indexed databases simultaneously, each with up to three indexes. Because COBFILE was written in C++, I have always hesitated about porting this to ChaOS, as I believed my C compiler would not be able to cope. However I grasped the nettle this weekend and now have a working prototype of NJOB, with databases all indexed and working as far as I can tell, ready to port the bread and butter code.

Sounds like a reverse step, porting code back from C++ to C, but actually it is a step forward. C++ is designed for programmers who wish to collude and hide their work from others, either their colleagues or their customers, but in a vertical system like ChaOS there is nothing to hide so C++ gets in the way. Switching back to C struct instead of C++ class is not as much of a wrench as I expected. And many of the C++ features I used in NJOB, such as operator overloading to handle arithmetic involving Motorola data types, have been superseded by adding extra inbuilt types to the ChaOS compiler.

On reworking the NJOB C++ classes, I decided to change the names of the main structures, as these may soon become embedded in the CFS file system. So class COBFILE is now struct DBASE, class COBFILESHELL becomes DBSHELL.

Another C++ class used extensively in NJOB is OUTPUTSTREAM, used to generate reports either on-screen, to a file or to a printer. OUTPUTSTREAM was partly ported to C when I bootstrapped the ChaOS compiler, (for the linker map file, assembly and compiler debug listings), but was not ported with printer output in mind.

So to implement printer support for NJOB over ChaOS, I have created a quick loadable PAR.DRV to handle the parallel port, and struct OUTPUTSTR, which can accept output for screen, file or printer. All this in a little over two hours, just tweaking code written 15 years ago to make it pass through the ChaOS compiler.

12/3/2010 CFS ChaOS file system/COBFILE indexed database A better solution to the linear file search time problem in CFS is to attach a binary index, which brings me back to my COBFILE indexed databases written back in the 1980s. These were written to replace COBOL indexed databases, which were slow and very prone to error in the index files. This caused me to write fully memory-resident databases (using the extra memory over DOS allowed by HIMEM.SYS then by DOS4GW), with indexes built from scratch each time the system started. These have worked well over the years, such that I have never needed to port my order-processing and accounting software NJOB from MSDOS/DOS4GW.

The COBFILE indexes are a simple table of integers, giving the physical position of fixed length records in the main database, created and maintained using a very simple binary comparison algorithm. They are simpler than hash tables, and not burdened by the memory allocation overhead of hash table buckets. Inserting records into the database does mean shifting thousands of index records to make space, but since the index is a memory array of integers, this is achieved by a simple memmove() which gets faster as hardware improves*. The time penalty of this approach is only evident when doing the initial index build. If this ever becomes tiresome, the index can be downloaded to disk on system shutdown and reloaded when needed.
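The index scheme described above can be sketched roughly as follows. This is not the COBFILE code: the record length, key length and function names are assumptions, and the key is taken to be the first KEYLEN bytes of each fixed-length record.

```c
#include <assert.h>
#include <string.h>

#define RECLEN 16   /* hypothetical fixed record length */
#define KEYLEN 4    /* hypothetical key length; key is the start of the record */

/* Binary search: find the first index slot whose record key is >= key. */
size_t index_find(const int *idx, size_t n, const unsigned char *db,
                  const unsigned char *key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (memcmp(db + (size_t)idx[mid] * RECLEN, key, KEYLEN) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Insert record number `rec` into the sorted index, shifting the tail
   up by one slot with a single memmove. Returns the new entry count. */
size_t index_insert(int *idx, size_t n, size_t cap,
                    const unsigned char *db, int rec)
{
    size_t pos;
    assert(n < cap);
    pos = index_find(idx, n, db, db + (size_t)rec * RECLEN);
    memmove(idx + pos + 1, idx + pos, (n - pos) * sizeof idx[0]);
    idx[pos] = rec;
    return n + 1;
}
```

The records themselves never move; only the array of record numbers is shuffled, which is why the cost of an insert is a single block move.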

For the binary comparisons to work, numeric value in keys must be stored in Motorola (Big-endian order), but since these integer types are built-in to the ChaOS compiler this is easy.
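A small illustration of why the byte order matters, written in standard C rather than with the ChaOS inbuilt Motorola types; the helper names are assumptions. A byte-wise comparison only agrees with numeric order when the most significant byte comes first.

```c
#include <assert.h>
#include <string.h>

typedef unsigned long UL;   /* stand-in for the diary's UL; low 32 bits used */

/* Store the low 32 bits of v most significant byte first (Motorola order). */
void put_be32(unsigned char *out, UL v)
{
    assert(out != NULL);
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Store the same value least significant byte first (Intel order). */
void put_le32(unsigned char *out, UL v)
{
    assert(out != NULL);
    out[0] = (unsigned char)v;
    out[1] = (unsigned char)(v >> 8);
    out[2] = (unsigned char)(v >> 16);
    out[3] = (unsigned char)(v >> 24);
}

/* Byte-wise comparison of two 4-byte keys, as a binary index would do. */
int key_cmp(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, 4);
}
```

Comparing 256 with 1 gives the right answer in Motorola order (00 00 01 00 sorts after 00 00 00 01) and the wrong one in Intel order (00 01 00 00 sorts before 01 00 00 00).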

The virtual directory system in CFS uses a nametable index to describe the subdirectory branch strings. So all that is needed to use the CHAOSFILE path structure as a binary key to the whole filesystem is to store the subdirectory branch indexes in Motorola integer order. There is a slight problem in that subdirectory branch strings in the nametable are stored in order of creation. A further small binary index into the nametable can make these appear in alphabetical order.

Thus I have the design now for a filesystem which orders files according to the conventional tree-based subdirectory layout, without the overhead of maintaining subdirectories or inodes. In effect, every file has its full path compacted into an integer array in a one-sector file header, eliminating the risk of data loss caused by corruption of file allocation tables, inodes or subdirectory sectors. In addition, files added to this filesystem immediately appear in alphabetical order.

Index build times (single core 32-bit 2.667GHz) from a quick test program using code ported from COBFILE are: 8000 records=0.03 sec; 16000 records 0.11sec; 32000 records=0.45sec; 64000 records 1.79sec; 133000 records=7.23sec; as expected times quadruple each time the database doubles in size. I will now combine this index design with the anchor records described on 10/3/2010, to enable the file headers to be rapidly loaded into memory on startup, ready for indexing. If startup time ever becomes an issue, the file database can be stored to disk in key order on system shutdown. This database can be loaded on restart without the need for re-indexing.

Update: *133000 records=2.46sec, using 32-bit memmove() to shuffle indexes;

10/3/2010 CFS ChaOS file system Because FAT16 with long filenames is so messy, I am dusting off my old ChaOS filesystem (CFS) from vintage year 2000. Using case-sensitive long filenames by default, and 64-bit LBAs, CFS was a bit ahead of the hardware I was working with back then. Having suffered plenty of file losses due to faulty hard disks, and mistakes during the development of filesystem software, I was aware that file losses were worst when directory sectors or file allocation tables became corrupted or unreadable. So CFS was a bit radical in that there are no directories on a CFS partition, the directories are virtual, as all files are stored with their full pathname in a branch[] array, and directories exist only in the sense that files with exactly matching branch[] arrays can be grouped together. To make all this work, each file on disk is preceded by a one-sector header containing a CFS signature, rather like a memory heap node, organised as a doubly-linked chain across the disk. Files are always stored contiguously, so even if the chain is broken, a disk scan for the CFS signature sectors can easily rebuild the chain. Deleted nodes are always blanked to ensure they do not interfere. Deleted files can be marked as free, and will be used to store any file within the original size, or marked as archive, in which case they remain as duplicate versions of the same filepath (useful to undo file changes!). New files can be added to the end of the chain until the disk is full, thus spreading wear and tear across the entire media surface. At this point (which I have never reached), a utility can compact the filesystem by eliminating free nodes and duplicates.
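The recovery scan can be sketched over an in-memory disk image. The signature bytes, the sector size and the function name are all assumptions, as the real CFS header layout is not described here. A real scan would also need to avoid mistaking file data for a header, for instance by checking further header fields; this sketch only matches the signature.

```c
#include <assert.h>
#include <string.h>

#define SECTOR 512
static const char CFS_SIG[4] = {'C', 'F', 'S', '!'};   /* hypothetical signature */

/* Scan a disk image sector by sector and record the sector number of every
   header carrying the signature. Returns the number of headers found;
   the list is in disk order, which is also chain order for a linear layout. */
size_t cfs_scan(const unsigned char *disk, size_t sectors,
                size_t *found, size_t max_found)
{
    size_t s, n = 0;
    assert(disk != NULL && found != NULL);
    for (s = 0; s < sectors && n < max_found; s++)
        if (memcmp(disk + s * SECTOR, CFS_SIG, sizeof CFS_SIG) == 0)
            found[n++] = s;
    return n;
}
```

Because blanked nodes no longer carry the signature, they simply drop out of the rebuilt list.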

The down side to this linear approach to storage is the time penalty of a linear search over a binary search when locating a particular file. In 2000 I overcame this by caching the CFS nodes into memory. This approach now leads to a delay when logging a CFS drive, about 4 seconds per 10000 files, but this seems comparable to the time delays seen when Windows XP starts. The solution in hand will be to create larger anchor nodes, which will contain a copy of all the nodes for the next 128 files. These anchor nodes form a coarser chain which can be accessed quickly to log the drive about 100 times faster.

8/3/2010 BIOS Boot Devices Added the logic used by partition manager PART to match ChaOS hard disk devices to BIOS boot devices (by matching partition table images) to ChaOS kernel. This allows the exact ChaOS boot device and boot partition to be identified on startup, which then becomes the current LDRIVE for the ChaOS session. Very handy, especially since I am carrying 19 partitions around on my removable development drive. Building on this theme, added a read/write mass storage device driver for all boot devices, along with BIOS device info, extended BIOS device info and BIOS CD emulation info. If no mass storage devices are located after device scan (i.e. no IDE device driver!) ChaOS scans all BIOS boot devices for logical drives, and finds the boot partition anyway. Much more powerful than the present bbd: driver (used to find the device drivers), as all the hard disks are available with read/write access through BIOS whether or not device drivers exist for mass storage. This is great fun, as ChaOS can easily be installed on any computer, regardless of drivers. As a test, just partitioned, formatted and installed ChaOS on the SCSI RAID of my IBM Quad-Xeon eServer 440, even though I have not the foggiest idea how the SCSI RAID works.

3/3/2010 COBDEN The first major revision of COBDEN last year was an object lesson in how ChaOS copes with escalation to a new platform, years after a deployment. There have been a few issues, understandably, mainly to do with the fact that the older computers have poor PnP mechanisms, and need to be told which drivers to load. COBDEN still runs all machine-control IO on ISA cards, so has to be one of those special cases. COBDEN still runs on the old 10baseT network, communicating with nodes running Netware Lite over MSDOS, but this network now has access to the latest ChaOS TCP/IP network through a network switch. I just discovered that network packets from the old 10Mbit NE200 cards which are larger than 1024 bytes seem to be blocked by standard network switches, but reducing file transfers to 1 sector per packet fixes this problem. For the first time in years I am seeing reliable transfers of large files across the old thin ethernet around the factory.

The latest version of ChaOS hung on COBDEN at the first attempt, breaking into the debugger on an invalid opcode. The problem is immediately obvious: the RDTSC Intel instruction does not exist on pre-Pentium processors. I use this now as part of the high-precision timer support in ChaOS. The fix? Write different code for the old timer, and branch according to processor type? No. With ChaOS anything expedient is possible. I just add a case statement to the exception 6 invalid opcode handler for ANY processor which faults the RDTSC instruction, skipping over the two offending opcode bytes, and clearing registers EDX,EAX to zero. Result: old processors run the new 64-bit timer code unchanged, returning the old low-speed 100Hz resolution.
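The handler logic can be simulated in ordinary C. The FRAME structure and the function name are assumptions, since the real ChaOS exception frame is not shown; the RDTSC encoding (bytes 0F 31) is the documented Intel opcode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical saved register frame; the real ChaOS layout is not shown. */
typedef struct {
    unsigned long eax, edx;
    unsigned long eip;      /* offset of the faulting instruction in `code` */
} FRAME;

/* Exception 6 handler body: if the faulting bytes are RDTSC (0F 31),
   return a zero timestamp in EDX:EAX and step over the instruction.
   Returns 1 when handled, 0 when the fault is some other invalid opcode. */
int skip_rdtsc(FRAME *f, const unsigned char *code, size_t len)
{
    assert(f != NULL && code != NULL);
    if (f->eip + 2 > len)
        return 0;
    if (code[f->eip] != 0x0F || code[f->eip + 1] != 0x31)
        return 0;
    f->eax = 0;
    f->edx = 0;
    f->eip += 2;
    return 1;
}
```

Any other invalid opcode is left untouched, so it can still drop into the debugger as before.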

1/3/2010 PART Partition manager Dusted off a partition manager PART, originally started in 2003, to wrap up my disk-partitioning ideas into a utility program. With the advent of BIOS BBS popups, it makes sense to determine the BIOS identities of all mass storage devices as ChaOS starts. So I have added an extra field to struct HDG{} which is set to BIOS id when the partition table on the mass storage device matches the sector returned by BIOS int 13h, read sector 1, head 0, cyl 0. At the moment I just read from BIOS devices 0x80 to 0x87, i.e. up to 8 BIOS boot devices. Dad has been struggling to install Debian Linux on a removable hard disk on his Windows XP box, and looking at his problems it is becoming clear that the GRUB Debian bootstrap cannot automatically handle the re-ordering of BIOS identities performed by BIOS BBS. Add this to the fact that Microsoft is determined to dominate MBR bootable partition 1 and refuses to accept multiple bootable partitions in the MBR, and it is small wonder that few people can be bothered with the hassle of configuring a multi-OS system. This small change now means ChaOS correctly identifies the hard disk AND the partition number from which it booted, and sets this to be the current drive on startup. ChaOS netname, already embedded in the partition table, now displayed in top left of screen banners. These changes vastly reduce the confusion when viewing multiple ChaOS computers on a network, especially when switching between systems on a KVM switch.
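The matching step might be sketched like this, working on sector images already read into memory. The function name and the flat array of BIOS sector images are assumptions; the partition table position (64 bytes at offset 0x1BE of the first sector) is the standard MBR layout.

```c
#include <assert.h>
#include <string.h>

#define SECTOR      512
#define PTABLE_OFF  0x1BE   /* four 16-byte entries, just before the 55 AA bytes */
#define PTABLE_LEN  64

/* `bios` holds `count` first-sector images back to back, one per BIOS device
   starting at 0x80. Return the BIOS drive number whose partition table
   matches the one in `mbr`, or -1 if none does. */
int match_bios_device(const unsigned char *mbr,
                      const unsigned char *bios, int count)
{
    int i;
    assert(mbr != NULL && bios != NULL && count >= 0 && count <= 8);
    for (i = 0; i < count; i++)
        if (memcmp(mbr + PTABLE_OFF,
                   bios + (size_t)i * SECTOR + PTABLE_OFF, PTABLE_LEN) == 0)
            return 0x80 + i;
    return -1;
}
```

Two disks with identical partition tables (two blank disks, for instance) cannot be told apart this way, so the first match wins in this sketch.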

25/2/2010 Hard disk partitions Rationalising all my disks around the factory causes me to revisit FAT16 partitioning. I have long been aware that ChaOS FAT16s are slightly different to MSDOS FAT16, particularly as I add an extra blank sector to each FAT. This means DOS cannot read ChaOS partitions, but of course ChaOS reads MSDOS partitions just fine. But the extra FAT sector limits the maximum size of the FAT16, so today I added an option to create a DOS-compatible format, in order to mirror my various old MSDOS/DOS4GW partitions on to my main development drive for archiving and re-organisation. I've still not found a way to make MSDOS accept two MSDOS primary partitions (type 0x06) in the MBR, but I have spent some time re-implementing DOS extended partitions, and am adding code now to allow ChaOS to create extended partitions which DOS can read.

25/2/2010 ChaOS/Linux FTP Ok so I have spent a couple of weeks getting to understand Debian Linux, and bouncing IP packets to and from the ChaOS experimental servers. Every FTP client I have tried is different, so ChaOS FTPSRV has to gradually become all things to all clients. But I never cease to be amazed at the narrowness of some of the coding in top-line systems. For instance, when passing back a data port address to the Debian desktop in response to a PASV command, the Debian client hangs when spaces are embedded in the A1,A2,A3,A4,a1,a2 port string. This is in contrast to industry standard Pure-FTP, which does not mind white space after the commas.
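A minimal sketch of a PASV reply that keeps the strictest client happy; this is not the FTPSRV code and the function names are assumptions. The reply uses the usual "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)" form from RFC 959, with the port split into high and low bytes and no spaces after the commas.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format an FTP 227 reply for the given IPv4 address bytes and port.
   Returns the number of characters written, as snprintf does. */
int pasv_reply(char *buf, size_t len, const unsigned char *ip, unsigned port)
{
    assert(buf != NULL && ip != NULL && port <= 0xFFFFu);
    return snprintf(buf, len,
                    "227 Entering Passive Mode (%u,%u,%u,%u,%u,%u)\r\n",
                    (unsigned)ip[0], (unsigned)ip[1],
                    (unsigned)ip[2], (unsigned)ip[3],
                    port >> 8, port & 0xFFu);
}

/* Return 1 if the host-port field between '(' and ')' contains no spaces. */
int hostport_is_compact(const char *reply)
{
    const char *open = strchr(reply, '(');
    const char *close = open ? strchr(open, ')') : NULL;
    const char *p;
    if (open == NULL || close == NULL)
        return 0;
    for (p = open; p < close; p++)
        if (*p == ' ')
            return 0;
    return 1;
}
```

For address 192.168.0.10 and port 5001 the host-port field is (192,168,0,10,19,137), since 19*256+137 = 5001.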

6/2/2010 ChaOS FTPSRV/ChaOS FTP/Debian FTP/Pure-FTP With the Debian box on the LAN, I am now improving FTPSRV, to make it more amenable to Linux clients. Also improving FTP, by bouncing packets off the Pure-FTP server. I would never have worked out how to configure Pure-FTP without browsing the ChaOS network packet buffer, and spotting the raw packets sent over the FTP control connection, detailing the exact source of errors in the server configuration files.

6/2/2010 Debian Linux Installed Debian Linux on one of the spare hard disks in the i7-920 machine, using the Debian business-card ISO rather than one of the multi-CD ISO releases. I need to use a Linux box as client and server to develop some secure internet protocols for ChaOS. I have tried a few Linux distros over the years, Red Hat 3, Suse Professional, and been unimpressed. Quite apart from the complexity, the main drawback for me has always been that Linux does not cope well being carried around on a removable hard disk. Most recently I tried Suse Enterprise on my quad Xeon IBM eServer x440. It works, but does not seem stable. After two restarts of the Apache server, the eServer goes into slow-motion and needs a full reboot (yawn). This latest Debian release installs in about 20 minutes, then boots and reboots faster than any Linux distro I have seen to date, so I think I can work with it. Whilst I still think Linux is overly complex, I did manage to download, install and configure a Pure-FTP server, with virtual user accounts and PureDB authentication.

21/1/2010 AP debugger using IPI Following on from fun to be had sending inter-processor interrupts to the APs, it follows that the same mechanism could be used to single-step an AP. Briefly: Set AP vector 1 to a handler which saves its stack pointer in a known memory location, sends IPI to BSP to signal that the AP is executing a single-step interrupt, then HLT before IRET. When BSP receives this interrupt, read saved stack pointer as IREGS*. IREGS->ebx (see below) contains cpu id of AP, and address of the process it is running. Call debug_kernel with this IREGS* to examine AP single step state. On return from debug_kernel, send AP NMI, to clear the HLT. Any changes made to the AP machine state by debug_kernel (e.g. clear trap flag) will be written to the AP as it exits the vector 1 handler. To cause AP to break into the debugger at any time, a similar IPI can be sent from the BSP to the AP. This time the handler need only set the trap flag on the AP, and return from interrupt. The mechanism above kicks in, and the system debugger gains control of the AP.

The AP interrupt handlers to do this are thus very simple. A real-mode interrupt handler to do this is a little more complicated (segment registers need to be passed). Such a handler would allow single-step debugging of an AP from almost its first instruction. Single-stepping an AP in real mode (or 64-bit mode!!!!), under the control of the BSP in protected mode will be very useful indeed. Reversing the roles, an AP could act as a debugger for the BSP, to single-step the bootstrap process. Rapidly thinking ahead, if the ChaOS bootstrap was remapped, to place the partition table at address 0x1000 instead of 0x500, the bootstrap would be entered by BSP or AP on receiving SIPI for 4k page 1. (i.e. *(UL*)0xfee00310=cpu<<24; *(UL*)0xfee00300=0x4601;). A beautifully simple way of demystifying the boot process on today's modern, multicore processors.

18/1/2010 MP/AP boot thunk Added argc and argv[] to AP context structure, filled in with copies of MP command line arguments located at the top of the AP stack. Stack pointer passed to AP adjusted to leave this area untouched when the AP boots. Now command line arguments can be passed to AP process being launched, e.g. MP 4 TEST arg1 arg2 arg3 launches TEST.XEC on cpu #4, and passes the command line TEST arg1 arg2 arg3 to the AP process main() function. Now I have a simple method of launching test programs on the APs. Obviously semaphores are now needed to stop multiple CPUs banging into each other, but these can be added as needed. I should have semaphores around functions such as addprocess(), destroyprocess(), but those can come later. My gut feeling is that just two semaphores, one to serialise access to the system memory allocator and another to serialise access to the filesystem may be all that are needed to launch the compiler on an AP. There will be a slight holdup if one processor tries to load a file whilst another is reading or saving a large file. But this is much easier than trying to share access to the disks at a lower level, as the hard disk driver cannot know which disk accesses might trash the filesystems!
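A coarse lock of the kind described could be sketched with C11 atomics. This is an assumption-laden stand-in: the ChaOS semaphore mechanism itself is not shown, the SEMA type and function names are hypothetical, and <stdatomic.h> is a standard C11 facility rather than anything the ChaOS compiler is known to provide. One such lock would guard the memory allocator and another the filesystem.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* A minimal spinlock standing in for the diary's semaphores. */
typedef struct { atomic_flag busy; } SEMA;   /* hypothetical name */

#define SEMA_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns 1 if the lock was acquired, 0 if another CPU holds it. */
int sema_try(SEMA *s)
{
    assert(s != NULL);
    return !atomic_flag_test_and_set_explicit(&s->busy, memory_order_acquire);
}

/* Spin until the lock is acquired. */
void sema_wait(SEMA *s)
{
    while (!sema_try(s)) {
        /* busy-wait; a real kernel might pause or yield here */
    }
}

/* Release the lock so another CPU can take it. */
void sema_signal(SEMA *s)
{
    assert(s != NULL);
    atomic_flag_clear_explicit(&s->busy, memory_order_release);
}
```

A caller would bracket each allocator or filesystem call with sema_wait() and sema_signal() on the matching lock.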

Further fun with APs: Tried SIPI targeted at the current AP, to see if an AP can reboot itself. *(UL*)0xfee00310=cpu<<24; *(UL*)0xfee00300=0x4618; does nothing; but if INIT is sent first, i.e. *(UL*)0xfee00300=0x500; the AP restarts at 4k page 0x18 (linear address 0x18000)! I half-expected the INIT to kill the AP dead, but clearly the APIC hardware queues the two signals and the AP wakes up again. Self-suicidal reincarnation. Wacky!
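The magic numbers written to the local APIC above can be decoded as follows. This is a sketch that only builds the register values, so it can run anywhere; the helper names are made up for illustration, and UL is assumed to be a 32-bit unsigned type. On real hardware the kernel would write icr_destination(cpu) to 0xfee00310 first, then the command word to 0xfee00300.

```c
#include <stdint.h>

typedef uint32_t UL;   /* assumption: UL is a 32-bit unsigned type */

/* Fields of the low dword of the local APIC interrupt command register. */
#define ICR_MODE_INIT     (5u << 8)    /* delivery mode 101b: INIT */
#define ICR_MODE_STARTUP  (6u << 8)    /* delivery mode 110b: start-up (SIPI) */
#define ICR_LEVEL_ASSERT  (1u << 14)

/* High dword: destination APIC id in bits 24..31. */
static UL icr_destination(unsigned cpu)
{
    return (UL)cpu << 24;
}

/* Low dword for INIT, exactly as written in the diary (0x500). */
static UL icr_init(void)
{
    return ICR_MODE_INIT;
}

/* Low dword for a SIPI whose vector is the 4k page index to start at. */
static UL icr_sipi(unsigned page)
{
    return ICR_LEVEL_ASSERT | ICR_MODE_STARTUP | (page & 0xffu);
}

/* Linear address at which the AP starts executing for a given page. */
static UL sipi_entry_address(unsigned page)
{
    return (UL)(page & 0xffu) << 12;
}
```

So icr_sipi(0x18) gives the 0x4618 used above, and the matching entry point is page 0x18 shifted left by 12, i.e. 0x18000.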

17/1/2010 MP/AP boot thunk Having developed a simple semaphore mechanism, I am now improving MP and AP1 boot thunk towards an embedded command-line launcher for ChaOS .XECs to be run on the APs. AP1 is the launch code which runs on the AP when it receives a SIPI, now modified to switch straight into 32-bit protected mode after patching the code stream with one far jump into the 4Gb linear code segment. Code to do this is always messy, as the assembler struggles to generate correct relocations for an inline processor mode switch (moving from 16-bit real mode CS base to 32-bit zero-based addressing). I load a 16-bit register with a 16-bit relocation target, load a 32-bit register with the AP 4k page index shifted left by 12, then OR the two registers together to create the 32-bit linear address. This kludge is only needed twice as the AP boots, once to patch the code stream, and once to load an AP register with the address of a context structure describing the process being launched on the AP (stack pointer, entry point etc). From the context pointer, any amount of setup information can be passed to the AP before the call to main() of a standard ChaOS .XEC program.
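The register arithmetic behind that kludge, written out in C rather than assembler. The function name is hypothetical, and the sketch assumes the relocation target is an offset below 0x1000 within the AP's 4k page, so that the OR cannot collide with the page bits.

```c
#include <stdint.h>

/* Combine the AP's 4k page index with a 16-bit offset inside that page.
   Assumes reloc_target < 0x1000, in which case OR and ADD are equivalent. */
static uint32_t ap_linear_address(uint32_t page_index, uint16_t reloc_target)
{
    return (page_index << 12) | reloc_target;
}
```

For an AP started at page 0x18, an offset of 0x0123 in the thunk becomes linear address 0x18123.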

Cleaning up is easy: if destroyprocess() is called for a non-zero cpu id, an INIT is sent by the local APIC of the BSP to kill the AP, and all memory blocks allocated by the AP are freed.

For fun, tried calling term() after AP exits main(). AP thread can be debugged step-by-step right up until the instruction which writes INIT to the AP local APIC. AP stops, and control falls back to cli() on BSP. So a processor can kill itself with an INIT, but another one will need to clean up afterwards!

15/1/2010 ChaOS 64-bit As always, development is incremental, but moving steadily towards a native 64-bit system. ChaOS64 will require a minimum of 2 cores, to allow BIOS calls to be used without a mode change on the BSP. This is not a problem as 64 bit processors have more than one core. I am less sure about the development path for the compiler to generate 64-bit code. I will try a new compiler for 64 bit data types and functions, a 32-bit function modifier in the existing compiler to generate 64 bit stack frames, and an initial far call to switch to a 64-bit code segment. This has to be via a 128-byte aligned 80-bit memory pointer. Once in 64-bit mode, 64-bit subfunctions can be developed.

15/1/2010 MP Semaphores Added simple semaphore to MP TEST program, to investigate serialization of access to screen memory. Without any scheduling algorithms, two simple steps result in adequate time sharing with up to 7 Nehalem APs contending for the same semaphore. First, CMPXCHG needs a LOCK prefix for the semaphore to be correctly acquired. Second, after writing zero to free the semaphore, a WBINVD instruction resynchronises the processor caches. CPU id 1 is favoured slightly by this random algorithm.

Lower 4 bits of EBX (process ID) now carry the CPU ID of the core running the process, which will usually be the value written when a semaphore is acquired.
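A portable sketch of this kind of semaphore using C11 atomics, not the ChaOS implementation. On x86, compilers typically turn atomic_compare_exchange_strong() into a LOCK CMPXCHG. The type and function names are invented for the example, the CPU id is assumed to be non-zero (zero marks the semaphore as free), and the privileged WBINVD step mentioned above is left out because it cannot run in user mode.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* owner == 0 means free; otherwise it holds the id of the owning CPU. */
typedef struct {
    _Atomic uint32_t owner;
} mp_semaphore;

/* Try once to take the semaphore: swap 0 -> cpu_id atomically.
   cpu_id must be non-zero in this sketch. */
static bool mp_try_acquire(mp_semaphore *s, uint32_t cpu_id)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(&s->owner, &expected, cpu_id);
}

/* Spin until the semaphore is ours; no scheduling, as in the diary. */
static void mp_acquire(mp_semaphore *s, uint32_t cpu_id)
{
    while (!mp_try_acquire(s, cpu_id))
        ;   /* busy-wait */
}

/* Writing zero frees the semaphore for the next contender. */
static void mp_release(mp_semaphore *s)
{
    atomic_store(&s->owner, 0);
}
```

A core would bracket its screen writes with mp_acquire(&sem, cpu) and mp_release(&sem); while it holds the semaphore, the owner field shows which CPU that is.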

APs can be stopped by a HLT and started by NMI as required, rather than try to implement complex scheduling algorithms. This will keep code simpler, and power consumption to a minimum.

15/1/2010 Local Heaps LHEAP EBX register has always been reserved in ChaOS; just refined ChaOS kernel to use EBX register to carry the current thread ID at all times. This means operating system functions can be written which act on the current process without reference to global data. Very useful. Added a simple local heap system (LHEAP), using a pointer in the process header referenced by EBX. LHEAP contains a block chain pointer, to allow allocation of extra space when the local heap is full. This is an advance on the present ChaOS main heap, which has to be a contiguous block (limiting TOM to 3Gb). By widening LHEAP pointers to 64 bits, this code can be recycled to run the main heap for ChaOS64.
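One way a chained local heap of this kind could look, as a rough sketch under stated assumptions. The block size, structure layouts and function names are all hypothetical, the process header is passed explicitly instead of arriving in EBX, and malloc() stands in for whatever the kernel uses to obtain a fresh block.

```c
#include <stddef.h>
#include <stdlib.h>

#define LHEAP_BLOCK_SIZE 4096u   /* hypothetical size of one heap block */

typedef struct lheap_block {
    struct lheap_block *next;    /* block chain pointer */
    size_t used;                 /* bytes already handed out from data[] */
    unsigned char data[LHEAP_BLOCK_SIZE];
} lheap_block;

/* Stand-in for the process header that EBX would reference. */
typedef struct {
    lheap_block *lheap;          /* newest block of this process's local heap */
} process_header;

/* Bump-allocate from the newest block; chain a new block when it is full. */
static void *lheap_alloc(process_header *p, size_t size)
{
    if (size == 0 || size > LHEAP_BLOCK_SIZE)
        return NULL;
    size = (size + 15u) & ~(size_t)15u;      /* round up to 16 bytes */

    lheap_block *b = p->lheap;
    if (b == NULL || b->used + size > LHEAP_BLOCK_SIZE) {
        lheap_block *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;
        n->next = b;                         /* extend the chain */
        n->used = 0;
        p->lheap = n;
        b = n;
    }
    void *result = b->data + b->used;
    b->used += size;
    return result;
}

/* Number of blocks currently chained to this process. */
static size_t lheap_block_count(const process_header *p)
{
    size_t n = 0;
    for (const lheap_block *b = p->lheap; b != NULL; b = b->next)
        n++;
    return n;
}

/* Discard the whole local heap in one pass when the process dies. */
static void lheap_destroy(process_header *p)
{
    while (p->lheap != NULL) {
        lheap_block *next = p->lheap->next;
        free(p->lheap);
        p->lheap = next;
    }
}
```

Because the blocks are linked rather than contiguous, the heap can grow without needing one large unbroken region, which is the property the entry contrasts with the main heap.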

8/1/2010 IBM eServer 440/SCSI RAID Read/write support for BIOS drives, added a few weeks ago to get ChaOS running on an Atom N270 netbook via USB, has yielded an unexpected bonus - the BIOS hard disk driver will read and write hard disks all over the place - including the SCSI RAID on the old eServer. The main problem with using BIOS for disk support has always been processor mode switching between real and protected mode, which is awkward when lots of protected mode interrupts are happening (e.g. USB/network/mouse activity etc). Linux uses a VMM for BIOS calls which is even more complicated. It is usually OK to ignore mouse and network interrupts, but USB stalls if interrupt service is delayed. This problem can now be resolved by using a second processor core, running in real mode, to provide a BIOS call service to the other cores. Mixed mode interrupt handlers will become a thing of the past.

3/1/2010 NCC NCL NED NEX Continuing the current theme of a single self-modifying OS image, OS1 v1.02.25944 contains NCC (inbuilt C compiler), NCL (inbuilt linker), NED (inbuilt editor) and NEX (inbuilt system browser and source file extractor). 117 compilation units, many of which are shared with concurrent projects OS, CC, CL, ED and EX. Building these as a monolith takes longer than linking them into 5 separate projects, but has to be weighed against the time that can be wasted when the compilers and operating system are disconnected by a phase error.

2/1/2010 Longfilenames now permanently ON ChaOS has long had longfilename support, but doesn't write longfilenames unless a flag (LFN) is set in the LDRIVE. Spent today bomb-testing the compilers with LFN switched on, surprisingly few issues came up - mainly problems caused by switching LFN off, rather than on (leaving orphan LFN directory entries after deleting an 8:3 directory entry). COPYTREE works fine, even when copying from a non-LFN drive. I will run for a while using a batch file to switch LFN on as ChaOS starts, if no serious problems arise I will lock LFN on when LDRIVES are initialised.

This change had to come, since HTPSRV and FTPSRV need long filenames.

1/1/2010 ChaOS v1.02.nnnnn with embedded compiler/linker ChaOS sub-version advanced to 1.02, to mark the increase in user-defined types to 8192 per process. This limit can now be easily increased if necessary. Some stats: First build of ChaOS with compiler and linker as inbuilt commands resulted in 4238 user-defined types, just a little over the previous 4096 limitation. 108 compilation units generated 1122 embedded source records, 109 header files and 905 duplicates. Linker generated 16248 symbols, 30258 relocations (which are condensed to 15774 for the loader). The resulting OS image, including all these system tables is 8.5Mb. Code is 800316 bytes, Data 673600. System table:program size ratio is 5:1.

The combined OS/compiler throws up a few issues, mainly global variables which needed to be reset to run the compiler a second time. A slightly bigger problem is that the compiler allocates thousands of blocks on the system heap, which need to be garbaged at the end of the compilation unit. This is far more time-consuming than discarding the process heap for a separate compiler process. Initially the inbuilt compiler was 40 times slower than CC.XEC. By modifying the system garbage collector freeprocessheapitems(), speed was doubled by merging adjacent heap items belonging to the process being killed, and calling free() only once for each merged block. This is important as tens of thousands of heap nodes can be created during compilation. By adding a new global variable firstfree to the system malloc() - to hold the address of the main free memory block - memory allocation speed increases by more than an order of magnitude. Inbuilt compiler is now only 10% slower than CC.XEC, but has access to 3Gb of system memory.
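The merging idea can be sketched as below. This is not freeprocessheapitems() itself: the heap_item layout and the function name are assumptions, and the heap is modelled as an array already sorted by address. Each entry written to out[] corresponds to one free() call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap item: address range plus owning process id. */
typedef struct {
    uintptr_t start;
    size_t    size;
    int       owner;
} heap_item;

/* Walk items sorted by address and coalesce runs of adjacent items that
   belong to pid. Each merged run is written to out[], which must have
   room for count entries. Returns the number of merged blocks, i.e. the
   number of free() calls that would be needed. */
static size_t merge_process_items(const heap_item *items, size_t count,
                                  int pid, heap_item *out)
{
    size_t merged = 0;
    size_t i = 0;
    while (i < count) {
        if (items[i].owner != pid) {
            i++;
            continue;
        }
        heap_item run = items[i++];
        /* Absorb following items while they are ours and contiguous. */
        while (i < count && items[i].owner == pid &&
               items[i].start == run.start + run.size) {
            run.size += items[i].size;
            i++;
        }
        out[merged++] = run;
    }
    return merged;
}
```

When a dying process owns long runs of neighbouring nodes, as a compiler tends to, the number of frees drops from one per node to one per run.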

Version 1.02.25913 just passed the acid test, and recompiled and rebooted itself; one bootable OS file produced v1.02.25914, on New Year's Day 2010.