KS Microcode Version 124

This document describes the changes made to the KS10 Microcode version
123 to create version 124 for TOPS-10 7.03.

All of the external changes are controlled by conditional assemblies, so
it is possible to generate an externally identical microcode. The result
of assembling with all of the new conditionals off will not be
binary-identical to version 123, because part of the BLT instruction
setup was moved to a subroutine.

There is no performance penalty if no new features are used. Performance
changes associated with each feature are discussed below.

Feature   Description
-------   -----------------------------------------------------------------
INHCST    Enables code which prevents the CST from being written if the
          CST base address is specified as zero. It allows the CST to be
          written if a non-zero base address is specified. Costs several
          CSB=0 tests on a page refill.

NOCST     Suppresses all code to update the CST. Avoids the CSB=0 tests,
          but would not allow TOPS-20 to run from this microcode.

          Any time that the CST isn't written, memory references are
          saved. 7.03 requires either INHCST or NOCST.

KIPAGE    Enables code to support KI paging. A KI paging microcode is
          still needed for diagnostics.

KLPAGE    Enables code to support KL paging. KL paging is now used by
          both TOPS-10 and TOPS-20.

          Turning off either KI or KL paging provides the microcode
          space needed for UBABLT, without requiring changing the DROM
          chips.

UBABLT    Enables code to support the new BLTBU and BLTUB instructions.
          These instructions are discussed below in detail.

----------

All of the new assemblies will cause the APRID word's microcode options
field to be non-zero. Bits are defined as follows:

Bit   Meaning
---   ---------------------------------------------------------
0     Inhibit CST update code is included in this Ucode.
1     No CST update code is included in this Ucode.
2     This microcode is "non-standard". Same bit as KL APRID.
3     UBABLT instructions are included in this Ucode.
4     KI Paging is present in this Ucode.
5     KL Paging is present in this Ucode.

Note that if bits 4 and 5 are both zero, both KI and KL paging are
defined to be present, for compatibility with previous microcodes.

Turning off either KI or KL paging saves the internal tests made for
every page-fail or UUO to determine which format the UPT is in.

UBABLT instructions.

The KS10 spends a large amount of time re-formatting data from 36-bit to
Unibus format. Previous studies have shown that the time spent doing
this is a limiting factor in network (ANF/DECnet) bandwidth. Although
the KDP currently used for a network interface has other limits, the DMR
and DEUNA do not. 7.03 is expected to include a DEUNA driver for the
KS10, to support both DECnet and LAT.

The best known software implementation of the required byte swapping
requires on the order of 20 memory references per 36-bit word
transformed, plus loop setup. The proposed instructions should make
effectively two memory references per word.

Although the microcode is currently believed to be complete, performance
characterization has not yet been done.

Instruction formats:

BLTUB
     __________________________________________________________
     |     717     |  AC   | I |   X   |          Y           |
     ----------------------------------------------------------
      0           8 9    12 13  14   17 18                  35

The BLTUB instruction will transform data in Unibus (MACY11) format into
PDP-10 byte-string format. The source data is required to be
left-aligned in a PDP-10 word. The destination data will be a
left-aligned, zero-offset PDP-10 byte stream.

BLTBU
     __________________________________________________________
     |     716     |  AC   | I |   X   |          Y           |
     ----------------------------------------------------------
      0           8 9    12 13  14   17 18                  35

The BLTBU instruction will transform data in PDP-10 byte-string format
into Unibus (MACY11) format. The source data is required to be a
left-aligned, zero-offset PDP-10 byte string. The destination data will
be left-aligned on a PDP-10 word.
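As a rough illustration of what the two instructions do to a single
word, the following Python sketch models the transformation on one
36-bit value. It assumes the field layout given under "Format
definitions" in this document, and it zero-fills the undefined bits as
this implementation happens to do. The function names are hypothetical,
and the sketch models only the data transformation, not BLT addressing,
interruption, or page-fail handling.

```python
MASK8 = 0xFF  # one 8-bit byte


def pack_bytes(b1, b2, b3, b4):
    """Build a PDP-10 byte-string word: BYTE (8) b1,b2,b3,b4 (4)0."""
    return (b1 << 28) | (b2 << 20) | (b3 << 12) | (b4 << 4)


def bltbu_word(word):
    """Model of BLTBU on one word: PDP-10 byte string -> Unibus format."""
    b1 = (word >> 28) & MASK8  # PDP-10 bits 0-7
    b2 = (word >> 20) & MASK8  # PDP-10 bits 8-15
    b3 = (word >> 12) & MASK8  # PDP-10 bits 16-23
    b4 = (word >> 4) & MASK8   # PDP-10 bits 24-31 (bits 32-35 are dropped)
    # Result layout: BYTE (2)0 (8)b2,b1 (2)0 (8)b4,b3
    return (b2 << 26) | (b1 << 18) | (b4 << 8) | b3


def bltub_word(word):
    """Model of BLTUB on one word: Unibus format -> PDP-10 byte string."""
    b2 = (word >> 26) & MASK8  # PDP-10 bits 2-9
    b1 = (word >> 18) & MASK8  # PDP-10 bits 10-17
    b4 = (word >> 8) & MASK8   # PDP-10 bits 20-27
    b3 = word & MASK8          # PDP-10 bits 28-35
    return pack_bytes(b1, b2, b3, b4)
```

In this model each 18-bit half of the Unibus-format word holds one
16-bit PDP-11 word with the earlier byte of the string in its low-order
8 bits, and applying bltub_word to the result of bltbu_word returns the
original four bytes.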
Both instructions are processed exactly as a BLT instruction is, are
interruptible between words, and return the same result in AC. The only
difference is that a BLT moves the data intact, while the UBABLT
instructions transform the data during the move. Also, there is no
"special" "clear core" case. Of course, page fails work correctly, and
PXCT works exactly as on a BLT (e.g., one can BLT to/from user buffers'
virtual addresses).

Format definitions:

PDP-10 byte string:                BYTE (8) 1,2,3,4 (4)undefined
Equivalent UBA (MACY11) string:    BYTE (2)undef (8)2,1 (2)undef (8)4,3

The undefined bits will end up as zero in this implementation, but
probably shouldn't be spec'ed as such.

Performance observations:

The current MACRO code to do BLTBU looks something like:

        MOVNI   CNT,3(CNT)
        ASH     CNT,-2
        MOVNS   CNT
        HRRI    CNT,SOURCE
LOOP:   MOVE    T,(CNT)
        LSH     T,-4
        DPB     T,BYTABL+4
        LSH     T,-8
        DPB     T,BYTABL+3
        LSH     T,-8
        DPB     T,BYTABL+2
        LSH     T,-8
        DPB     T,BYTABL+1
        AOBJN   CNT,LOOP

Where source and destination are assumed to be the same, and BYTABL is a
table of byte-pointers indexed by CNT.

This is 19 memory references per word in the LOOP
(3 refs/DPB x 4 bytes + 4 LSH + 2 MOVE + AOBJN). BLTBU would make 2.

Further, DPB does shifting and masking that BLTBU does not. BLTBU shifts
2 bytes at a time, for a total of 19 shifts (1 bit each) for the entire
word.

In addition, in practice DECnet often copies a whole message into a
UBA-mapped buffer, then swaps it in place. UBABLTs can further reduce
the cost by replacing the MOVSLJ / BLT to get the data to the buffer.
Obviously in the (more rare) case where multiple MSDs are chained, the
transformation will have to remain in place, as UBABLTs can't start in
the middle of a string.

Other Notes:

1. Why not use EXTEND?

   The KS would have had to reserve many words (16) of CRAM just to
   dispatch. The performance would not have been as high, since
   microcoding byte refs (rather than the current 36-bit word) would
   have been slower.
   Optimization is possible, but would have required huge amounts of
   microcode. Anyhow, for the intended use, the "word-aligned"
   restriction seems a small price to pay for the blinding speed.

2. Other uses

   I'm told that the KLIPA/KLNIA drivers would like this, but I couldn't
   find un-EXTENDed opcodes that KL DRAM could decode directly.

   FILEX, MACY11, MAC36, TKB36, FAL, NFT, and others who import/export
   PDP-11 files could benefit from these instructions.

   Other KS devices could use this; if LPTSPL (for example) would use
   8-bit bytes on the KS (likely due to 8-bit ASCII soon anyhow), the
   monitor would not have to swap user LPT data.

3. Why these opcodes?

   I needed a block of IO instructions that I could use the AC field of.
   I wanted them close to KL DRAM, so if the KL chose to, it could
   implement the instructions. All the other opcodes were either
   assigned, or their dispatch is hard-coded in ROM in the KS DROM
   (e.g., "reserved MUUOs").

4. Why not restrict to IO-Legal use?

   I could be persuaded. It would cost a microword. But it would prevent
   users from making use of the instructions. What with "integration", I
   expect more and more data exchange with PDP-11s, large or small. Any
   performance gain seems worthwhile. If this is not done, the
   documentation will want to flag these instructions as KS-only. If it
   is, they get documented as KS system operations.
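
Returning to the APRID microcode options bits defined earlier: software
that wants to know which features a given microcode provides has to
apply the compatibility rule for bits 4 and 5 as well as test the
individual bits. A minimal sketch in Python, assuming the bit numbers in
the table are PDP-10 bit positions in the 36-bit APRID word (bit 0 being
the most significant). The names and the function are illustrative only,
not taken from any monitor source.

```python
# Bit number -> feature name, following the options table in this document.
OPTION_BITS = {
    0: "INHCST",
    1: "NOCST",
    2: "NONSTANDARD",
    3: "UBABLT",
    4: "KIPAGE",
    5: "KLPAGE",
}


def aprid_bit(word, n):
    """Return PDP-10 bit n of a 36-bit word (assumption: bit 0 is the MSB)."""
    return (word >> (35 - n)) & 1


def decode_options(aprid):
    """Return the set of features reported by an APRID word."""
    features = {name for n, name in OPTION_BITS.items() if aprid_bit(aprid, n)}
    # Compatibility rule: if neither paging bit is set, both KI and KL
    # paging are defined to be present (as in previous microcodes).
    if not aprid_bit(aprid, 4) and not aprid_bit(aprid, 5):
        features |= {"KIPAGE", "KLPAGE"}
    return features
```

With this model an options field of all zeros, as reported by earlier
microcodes, decodes as having both KI and KL paging present.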