Keith Packard 1
By taking advantage of Microsoft's NT architecture, WinCenterPro directly converts NT graphics calls into X protocol [Scheifler92]. The resulting multi-user NT application server can provide a complete Windows application environment simultaneously to many X desktop devices, with performance similar to modern accelerated native NT graphics devices. The overall architecture of the NT window system, WinCenterPro, and some of the X-related tricks will be presented.
While Microsoft doesn't call the Client Server Runtime Subsystem (CSRSS) a "window system server", it performs the same functions as an X window system server. CSRSS listens for connections on a well known port and services window system requests from client applications. The IPC rendezvous uses the NT Local Procedure Call (LPC) system: a simple duplex byte stream with some kernel primitives optimized for RPC to reduce latency (e.g. send and await reply).
Unfortunately, this mechanism isn't very fast. After the LPC rendezvous, the window system server and client application therefore create a shared memory section between themselves to transfer data. This shared memory section is 64K long, which limits the amount of data that can be transferred in one context switch.
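To make the 64K limit concrete, the following C sketch counts how many section-sized transfers (and hence context switches) a request of a given size needs, and splits a payload into section-sized pieces. It models only the arithmetic; the names SECTION_BYTES, transfers_needed and split_payload are illustrative and are not part of any NT interface.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the client/server shared memory section (64K). */
#define SECTION_BYTES ((size_t)64 * 1024)

/* Number of section-sized transfers needed to move `len` bytes; each one
 * implies at least one client/server context switch. */
static size_t transfers_needed(size_t len)
{
    return (len + SECTION_BYTES - 1) / SECTION_BYTES;
}

/* Split a `len`-byte payload into section-sized pieces, storing the length
 * of each piece in `pieces`. Returns the number of pieces written. */
static size_t split_payload(size_t len, size_t *pieces, size_t max_pieces)
{
    size_t n = 0;
    while (len > 0) {
        size_t chunk = len < SECTION_BYTES ? len : SECTION_BYTES;
        assert(n < max_pieces); /* caller sizes `pieces` with transfers_needed() */
        pieces[n++] = chunk;
        len -= chunk;
    }
    return n;
}
```

For example, a 150,000-byte image needs three transfers: two full 64K sections and a final 18,928-byte piece.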
For each client application which connects to the window system server, a separate thread is created to process requests for that client. This is similar to the MT X server architecture developed several years ago.
Because of the many display cards available for the PC architecture, the NT window system splits the display driver out into a separate shared library, which the window system server loads dynamically at initialization time. This is similar in intent to the X server DIX/DDX split, but the interface is somewhat lower level: it operates below the concept of windows and deals instead with clip lists.
A kernel-mode miniport driver provides information about mapping I/O ports and physical address ranges into the window system server process for use by the user mode display driver. The miniport driver also handles switches to full-screen mode and other low-performance, highly card-specific operations.
When an NT Win32 application starts, it connects to the window system server via the well known LPC port "\Windows\ApiPort". This name is not configurable, so on a regular NT system, there can be only one window system server. The NT system provided with WinCenterPro has been augmented to provide an inherited process attribute which indicates the window system server to connect to. Attempts to connect to "\Windows\ApiPort" are redirected to the correct server at the kernel level.
WinCenterPro adds two new services to the basic NT architecture: a WinCenterPro listener task and a fully functional Unix-style RSH daemon. The RSH daemon allows existing systems to start WinCenterPro sessions and other NT applications without any custom software.
The WinCenterPro listener process is responsible for responding to requests for new sessions. It supports XDMCP [Packard89], which can be used from an X terminal to directly access the NT system. When started this way, WinCenterPro creates a full-screen window without any decoration, which makes the X terminal appear to be running NT itself. The addition of a local floppy disk can turn an NCD X terminal into a low cost NT desktop.
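XDMCP runs over UDP (port 177), and every packet begins with a 6-byte header of three big-endian CARD16 fields: protocol version, opcode, and length of the remaining data. The sketch below parses that header and recognizes the Query and BroadcastQuery packets an X terminal sends to find a manager. It is illustrative only and is not taken from the WinCenterPro listener; the type and function names are hypothetical, and authentication and the rest of the handshake (Request, Accept, Manage) are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XDMCP_VERSION 1

/* XDMCP opcodes used here (subset of the protocol's list). */
enum { XDMCP_BROADCAST_QUERY = 1, XDMCP_QUERY = 2 };

typedef struct { uint16_t version, opcode, length; } XdmcpHeader;

/* XDMCP CARD16 fields are sent most significant byte first. */
static uint16_t read_card16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Parse the header of one UDP datagram. Returns 0 if the header is well
 * formed and its length field matches the datagram, -1 otherwise. */
static int xdmcp_parse_header(const uint8_t *pkt, size_t n, XdmcpHeader *h)
{
    if (n < 6)
        return -1;
    h->version = read_card16(pkt);
    h->opcode = read_card16(pkt + 2);
    h->length = read_card16(pkt + 4);
    if (h->version != XDMCP_VERSION || (size_t)h->length != n - 6)
        return -1;
    return 0;
}

/* Query and BroadcastQuery are answered with Willing (or, for a direct
 * Query, possibly Unwilling). */
static int is_session_query(const XdmcpHeader *h)
{
    return h->opcode == XDMCP_QUERY || h->opcode == XDMCP_BROADCAST_QUERY;
}
```

A Query carrying no authentication names has a one-byte body (a zero count), so its full datagram is the seven bytes 00 01 00 02 00 01 00.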
The RSH service recognizes "wincenter" as a special command and passes it to the WinCenterPro listener service. This command accepts arguments to configure the geometry and color model. The listener starts a session which appears as a single X application window, managed by the regular X window manager. All NT applications appear within this single window, which can be resized, moved or iconified. Because the NT display surface is not resizable, scroll bars appear for navigating around the NT desktop when the WinCenterPro window is shrunk.
In a regular NT environment, the window system server is started as a part of the boot sequence by the login process. The window system server opens each available video device, looking for something it can communicate with. When a suitable device is found, the window system server loads the appropriate user-mode display driver and then the login process can put up the "Press Ctrl-Alt-Del to login" message.
WinCenterPro operates similarly. For each configured "WinStation", a login process is created which then starts the window system server. Instead of opening each available video device, the window system server waits in the initialization code for the WinCenterPro listener service to start a new session.
When a request for a session is processed by the listener, it attempts to create a network connection to the indicated X display device. If this is successful, the listener passes the network connection over to one of the waiting window system server processes (using the DuplicateHandle NT API call) and then wakes up the window system server.
The window system server then loads the X specific display driver. This is a regular NT display driver with one difference: there isn't any underlying kernel mode driver. With that exception, the entire interaction between the window system server and the user mode display driver is identical to other NT display drivers.
The WinCenterPro display driver is nothing more than a regular NT display driver which sends X protocol over the network instead of communicating with the registers, I/O ports and DMA regions associated with a local graphics device. Because WinCenterPro uses a regular NT display driver, there are no operational differences compared with running NT on the local display hardware. Any NT Windows application that works on the system console will work on an attached X display. WinCenterPro required no substantive changes in the NT window system, except to allow more than one window system server to run at the same time. This architecture ensures 100% application compatibility.
The other alternative was to change the window system server to map top level NT application windows into separate top level X windows. Attempting to mix the X and NT window models runs into many compatibility problems with existing Windows/NT applications. Because the NT window system is based on the availability of fast synchronous communication between client and server, many operations which in X have been made asynchronous for improved performance are left synchronous in the NT world. Reconciling these differences is difficult and best left for a future project.
The NT window system server permits the display driver to do much, but requires little of it. Any functionality which may be implemented in the display driver can instead be emulated by the window system server, with some reduction in performance. For a display driver implementor, this allows a fully functional system to be up and running in a short amount of time. This is similar to the X server architecture, which emulates much rendering functionality in the MI layer, but the NT approach is even more pervasive: the NT display driver can refuse to implement BitBlt for any raster operation other than a simple copy, and the NT window system will emulate the rest.
In the X window system, there aren't many graphics objects, just drawables, graphics contexts and fonts. NT provides a wealth of different objects: brushes, clip lists, fonts, paths, strings, surfaces, windows, transformations, and translations. There is no overall rendering context in NT. Each rendering primitive in the display driver is given all of the appropriate objects needed for the rendering operation.
Brushes provide color information to the rendering primitives; they can be either solid, patterned, or dithered. The window system server passes RGB values to the driver which is allowed to create an appropriate dither pattern from whatever fixed palette of colors it has available, usually the sixteen or twenty default Windows colors.
Clip lists are generally accessed as a list of rectangles although the internal representation is opaque to the display driver. They are used both to clip rendering operations, and to fill more than one rectangle in a single driver call.
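A sketch of the second use, filling through a clip list: each clip rectangle is intersected with the target, and the non-empty pieces are collected into one array suitable for a single PolyFillRectangle request. The Rect type is a hypothetical stand-in for the driver's rectangle type, assumed to have exclusive right and bottom edges; FillRect mirrors the x, y, width, height layout of an X rectangle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver rectangle; right and bottom are exclusive. */
typedef struct { long left, top, right, bottom; } Rect;

/* Same fields as an X rectangle in a PolyFillRectangle request. */
typedef struct { short x, y; unsigned short width, height; } FillRect;

/* Intersect `target` with each clip rectangle; write the non-empty
 * results to `out` (room for nclip entries) and return how many. */
static size_t clip_fill(Rect target, const Rect *clip, size_t nclip, FillRect *out)
{
    size_t n = 0;
    for (size_t i = 0; i < nclip; i++) {
        long l = target.left   > clip[i].left   ? target.left   : clip[i].left;
        long t = target.top    > clip[i].top    ? target.top    : clip[i].top;
        long r = target.right  < clip[i].right  ? target.right  : clip[i].right;
        long b = target.bottom < clip[i].bottom ? target.bottom : clip[i].bottom;
        if (l < r && t < b) {
            out[n].x = (short)l;
            out[n].y = (short)t;
            out[n].width = (unsigned short)(r - l);
            out[n].height = (unsigned short)(b - t);
            n++;
        }
    }
    return n;
}
```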
The display screen is represented by a single surface object. Each off-screen pixmap is another surface. Because the screen is represented as a single surface, all windowing operations are hidden from the display driver. This is a simplification which is welcome most of the time, but does make it difficult to support multiple depths or "visuals" in the X sense. With the addition of OpenGL to NT, windows are now sort-of available to the display driver. Applications can select the desired color resolution per window, but not with the ease X applications are used to.
The NT rendering model provides scaling and rotation in the API. Most of the time this is irrelevant to the display driver; the display driver is provided 28.4 fixed point coordinates (28 integer bits and 4 fraction bits) in device space for all rendering operations. However, the driver is given the transformation to tune the appearance of some objects.
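A 28.4 value stores a coordinate multiplied by 16, so 88 (0x58) represents 5.5 pixels. The helpers below convert to and from this format. The round-half-up convention in fix_round is an assumption made for illustration; the rules NT uses to decide exactly which pixels a primitive touches are more involved.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t Fix28_4;   /* 28 integer bits, 4 fraction bits */
#define FIX_ONE 16         /* 1.0 in 28.4 */

static Fix28_4 fix_from_int(int32_t v) { return v * FIX_ONE; }

/* Largest integer <= f/16, written without relying on the behavior of
 * right-shifting negative values. */
static int32_t fix_floor(Fix28_4 f)
{
    return f >= 0 ? f / FIX_ONE : -((-f + FIX_ONE - 1) / FIX_ONE);
}

/* Nearest integer, with halves rounded up (assumed convention). */
static int32_t fix_round(Fix28_4 f) { return fix_floor(f + FIX_ONE / 2); }
```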
At the application level, NT provides a virtual palette for each window and pixmap. On the screen, RGB values are dynamically mapped to pixel values which are allocated in priority order, the active application allocating pixel values first, then inactive ones following along. If too few pixel values are available to satisfy every request, some are mapped to the nearest available color. When copying data from one window or pixmap to another, the window system server automatically computes a mapping to convert the pixel values in the source into the nearest available destination values. The dynamic mapping between pixel values is presented to the display driver in a translation object. A translation object is either a simple pixel to pixel value mapping, or an opaque structure which is passed to XLATEOBJ_iXlate along with the source pixel value to generate the appropriate destination pixel value.
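The simple pixel-to-pixel form of a translation object amounts to a lookup table. The sketch below builds such a table by matching each source palette entry to the nearest destination entry, then translates a pixel with one array access. Squared Euclidean distance in RGB is an assumed metric, since the paper does not say how NT measures "nearest"; the RGB type and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } RGB;

/* Squared Euclidean distance between two colors. */
static unsigned long dist2(RGB a, RGB b)
{
    long dr = (long)a.r - b.r, dg = (long)a.g - b.g, db = (long)a.b - b.b;
    return (unsigned long)(dr * dr + dg * dg + db * db);
}

/* xlate[i] = index of the destination entry nearest to src[i]. */
static void build_xlate(const RGB *src, size_t nsrc,
                        const RGB *dst, size_t ndst, uint32_t *xlate)
{
    assert(ndst > 0);
    for (size_t i = 0; i < nsrc; i++) {
        size_t best = 0;
        unsigned long best_d = dist2(src[i], dst[0]);
        for (size_t j = 1; j < ndst; j++) {
            unsigned long d = dist2(src[i], dst[j]);
            if (d < best_d) {
                best_d = d;
                best = j;
            }
        }
        xlate[i] = (uint32_t)best;
    }
}

/* Translate one source pixel value to a destination pixel value. */
static uint32_t xlate_pixel(const uint32_t *xlate, uint32_t pixel)
{
    return xlate[pixel];
}
```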
To implement each of the NT rendering functions, the WinCenterPro display driver directly generates X protocol. In each function, the common cases are handled by generating the obvious X protocol. For infrequently used operations, WinCenterPro has the window system server emulate the rendering operation with a sequence of simpler operations.
DrvBitBlt is complicated because it handles many different image formats and raster operations. It must be able to convert between any of the standard NT image formats and the device image format. It is given color translations and has four operands to the final blt: source, destination, pattern and mask. Depending on the raster operation, any operand other than the destination may be absent. WinCenterPro handles each complicating factor separately.
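Which operands a raster operation actually reads can be determined from its 8-bit ternary code, a truth table in which the pattern, source and destination contribute the bit patterns 0xF0, 0xCC and 0xAA. The sketch below tests each operand, then sorts a request into the paths described in this and the following paragraphs. The enum names and classify_blt are hypothetical, and the has_mask flag stands in for however the driver learns that a mask operand is present.

```c
#include <assert.h>
#include <stdint.h>

enum { USES_DST = 1, USES_SRC = 2, USES_PAT = 4 };

/* Bit i of a rop3 is the result for P = (i >> 2) & 1, S = (i >> 1) & 1,
 * D = i & 1. An operand is read iff flipping it changes some entry. */
static unsigned rop3_operands(uint8_t rop)
{
    unsigned uses = 0;
    if (((rop >> 1) ^ rop) & 0x55) uses |= USES_DST;
    if (((rop >> 2) ^ rop) & 0x33) uses |= USES_SRC;
    if (((rop >> 4) ^ rop) & 0x0f) uses |= USES_PAT;
    return uses;
}

typedef enum { BLT_DIRECT, BLT_THREE_OPERAND, BLT_FALLBACK } BltPath;

static BltPath classify_blt(uint8_t rop, int has_mask)
{
    if (has_mask)
        return BLT_FALLBACK;        /* masked blts go to the window system server */
    unsigned uses = rop3_operands(rop);
    if ((uses & USES_SRC) && (uses & USES_PAT))
        return BLT_THREE_OPERAND;   /* table-driven sequence, or fallback */
    return BLT_DIRECT;              /* maps onto a single X raster operation */
}
```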
When the operation is too complex to manage easily with X operations, WinCenterPro falls back to the window system server which simply accumulates each of the operands locally in temporary pixmaps, performs the operation and replaces the destination with the result. At first, there were many such operations; some analysis of existing Windows applications was done to identify which operations needed to be managed without all of the wire image traffic.
While NT supports four operand rasterops, the fourth operand is a mask used as a region of interest: areas where the mask is set are operated on, and other areas are left alone. This matches the X notion of a clip mask. However, since no application has yet been seen to use it, WinCenterPro lets such requests fall back to the window system server.
The other three operands are usually used as two operand rasterops in one of two ways: either to paint a rectangle with a pattern (dest = dest rop pattern) or to copy bits around (dest = dest rop source). Either of these two cases can be handled trivially by converting the resulting rop into an operation directly supported by the X protocol. Occasionally, however, an operation is given which does use all three operands. In this case, a table is used which converts the common three operand raster operations into sequences of binary operations, for those operations which don't need an intermediate temporary pixmap. For instance, the PATPAINT operation computes dest = dest OR pattern OR NOT source, which can be performed without a temporary as two binary steps: dest = dest OR NOT source (a CopyArea with function GXorInverted), followed by dest = dest OR pattern (a pattern fill with function GXor).
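For the two-operand cases, the conversion can be done mechanically: evaluate the ternary code with the unused operand held at zero, and re-encode the resulting two-input truth table in X's function numbering, where GXcopy is 3, GXxor is 6 and GXinvert is 10. The following is a minimal sketch of that idea rather than WinCenterPro's code; rop3_to_x is a hypothetical name, and the GX constants are redeclared with the values X.h gives them so the block stands alone.

```c
#include <assert.h>
#include <stdint.h>

/* X GC function codes, with the values X.h assigns them. */
enum {
    GXclear = 0x0, GXand = 0x1, GXcopy = 0x3, GXxor = 0x6,
    GXor = 0x7, GXinvert = 0xa, GXorInverted = 0xd, GXset = 0xf
};

/* Look up a ternary rop for pattern bit p, source bit s, dest bit d. */
static int rop3_eval(uint8_t rop, int p, int s, int d)
{
    return (rop >> ((p << 2) | (s << 1) | d)) & 1;
}

/* Convert a ternary rop that ignores one operand into an X function.
 * If use_pattern is nonzero, the pattern plays the X source role (for a
 * pattern fill) and the rop must not read the source; otherwise the
 * source is used (for CopyArea) and the rop must not read the pattern.
 * Returns the X function, or -1 if the rop reads all three operands. */
static int rop3_to_x(uint8_t rop, int use_pattern)
{
    int reads_other = use_pattern ? ((rop >> 2) ^ rop) & 0x33
                                  : ((rop >> 4) ^ rop) & 0x0f;
    if (reads_other)
        return -1;

    unsigned fn = 0;
    for (int v = 0; v < 2; v++) {
        for (int d = 0; d < 2; d++) {
            int bit = use_pattern ? rop3_eval(rop, v, 0, d)
                                  : rop3_eval(rop, 0, v, d);
            /* X stores the (src, dst) entry at bit 2*(1-src) + (1-dst) */
            if (bit)
                fn |= 1u << (((1 - v) << 1) | (1 - d));
        }
    }
    return (int)fn;
}
```

For example, PATINVERT (0x5A) becomes GXxor with the pattern as the X source, while PATPAINT (0xFB) is rejected because it reads all three operands and needs a sequence instead.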
Out of 256 three operand raster operations, 152 can be performed with similar sequences. The other 104 operations would need an intermediate X pixmap but are so rare that WinCenterPro lets them fall back to the window system server.
The final function, DrvTextOut, is implemented either by loading the NT font into the X server (described in detail below) and using the regular X text rendering requests, or by scan converting the glyphs and filling them with PolyFillRectangle. While the glyphs can be positioned arbitrarily, the most common case uses the default character spacing defined in the font and a horizontal baseline. This case is directly supported by PolyText8 or PolyText16; when the horizontal positions don't quite match up, WinCenterPro uses the ability of PolyText8 to shift runs of glyphs by small amounts. Text drawn along a non-horizontal baseline is always rendered by scan converting the glyphs and filling them with PolyFillRectangle. This results in reasonable performance and avoids generating a large number of fonts for which only a few glyphs are ever used.
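A sketch of how glyph positions might be packed into PolyText8 text items: glyphs stay in the current item while each one lands exactly where the default advance predicts, and a new item with a small delta starts wherever one does not. TextItem8 and build_text_items are hypothetical; the 254-glyph item limit and the signed 8-bit delta come from the X protocol encoding of a text item. Adjustments outside -128..127 are simply reported as failure in this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one text item: a signed x adjustment
 * followed by a run of glyphs taken from the input. */
typedef struct {
    int8_t delta;   /* added to the pen x before this run is drawn */
    size_t start;   /* index of the first glyph in the run */
    size_t count;   /* number of glyphs in the run */
} TextItem8;

#define MAX_ITEM_GLYPHS 254  /* a string length of 255 marks a font change */

/* glyphs[i] is drawn with its origin at x[i]; advance[] gives each glyph's
 * default width; origin_x is the x of the PolyText8 request. Returns the
 * number of items, or 0 if a delta does not fit or `items` is too small. */
static size_t build_text_items(const uint8_t *glyphs, const int *x, size_t n,
                               const int *advance, int origin_x,
                               TextItem8 *items, size_t max_items)
{
    size_t nitems = 0;
    int pen = origin_x;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        int delta = x[i] - pen;
        if (i == 0 || delta != 0 || items[nitems - 1].count == MAX_ITEM_GLYPHS) {
            if (delta < INT8_MIN || delta > INT8_MAX || nitems == max_items)
                return 0;   /* caller must use some other strategy */
            items[nitems].delta = (int8_t)delta;
            items[nitems].start = i;
            items[nitems].count = 0;
            nitems++;
        }
        items[nitems - 1].count++;
        /* The server advances the pen by the glyph's default width. */
        pen = x[i] + advance[glyphs[i]];
    }
    return nitems;
}
```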
During initialization, WinCenterPro generates a list of visuals which support each color model. The user specifies which model is desired; WinCenterPro then chooses a visual which supports that model if one is available, and otherwise chooses the best available model.
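A sketch of that selection under stated assumptions. The rules for the 16-color model follow the description of that model (a pseudo color visual with at least sixteen entries, or a static color visual with exactly sixteen); the monochrome rule (a 1-bit static grey visual) and the 256-color rule (a pseudo color visual with at least 256 entries) are assumptions, as is the meaning of "best": here, the most capable model any visual supports. The visual class values match X.h; VisualDesc, ColorModel and the functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Visual classes, numbered as in X.h. */
enum { StaticGray = 0, GrayScale = 1, StaticColor = 2,
       PseudoColor = 3, TrueColor = 4, DirectColor = 5 };

typedef struct { int cls; int depth; int colormap_size; } VisualDesc;

/* Hypothetical color models, ordered from least to most capable. */
typedef enum { MODEL_MONO, MODEL_16, MODEL_256, MODEL_COUNT } ColorModel;

static int visual_supports(const VisualDesc *v, ColorModel m)
{
    switch (m) {
    case MODEL_MONO:
        return v->cls == StaticGray && v->depth == 1;
    case MODEL_16:
        return (v->cls == PseudoColor && v->colormap_size >= 16) ||
               (v->cls == StaticColor && v->colormap_size == 16);
    case MODEL_256:
        return v->cls == PseudoColor && v->colormap_size >= 256;
    default:
        return 0;
    }
}

/* Returns the index of the chosen visual and stores the model actually
 * used in *chosen; returns -1 if no visual supports any model. */
static int choose_visual(const VisualDesc *vis, size_t n,
                         ColorModel wanted, ColorModel *chosen)
{
    for (size_t i = 0; i < n; i++)
        if (visual_supports(&vis[i], wanted)) {
            *chosen = wanted;
            return (int)i;
        }
    for (int m = MODEL_COUNT - 1; m >= 0; m--)
        for (size_t i = 0; i < n; i++)
            if (visual_supports(&vis[i], (ColorModel)m)) {
                *chosen = (ColorModel)m;
                return (int)i;
            }
    return -1;
}
```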
This model requires that the black pixel be zero (and white be one). If the X server does not provide a static grey visual for which this is the case, WinCenterPro modifies all raster operations so that the result will appear correct on the screen. This modification is straightforward and can be used in any situation where the pixel values must be inverted from their true sense on the screen. For binary X raster operations, there are two cases:
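A minimal sketch of such a modification, in C, assuming the distinction between the cases is whether the source pixels are stored inverted along with the destination (a copy from the screen) or supplied in their true sense. If every stored bit is the complement of the intended bit, a new X function g can be built so that applying g to the stored bits leaves the complement of the intended result. The function names are hypothetical; the GX values are those from X.h.

```c
#include <assert.h>

/* X GC function codes, with the values X.h assigns them. */
enum {
    GXclear = 0x0, GXand = 0x1, GXcopy = 0x3, GXxor = 0x6, GXor = 0x7,
    GXequiv = 0x9, GXcopyInverted = 0xc, GXset = 0xf
};

/* Bit position of the (s, d) entry in an X function's truth table. */
static unsigned gx_bit(int s, int d) { return (unsigned)(((1 - s) << 1) | (1 - d)); }

/* Evaluate X function fn for one source bit and one destination bit. */
static int gx_eval(unsigned fn, int s, int d) { return (int)((fn >> gx_bit(s, d)) & 1u); }

/* Function to use when destination pixels are stored inverted;
 * src_inverted says whether the source bits are stored inverted too. */
static unsigned gx_for_inverted_screen(unsigned fn, int src_inverted)
{
    unsigned g = 0;
    for (int s = 0; s < 2; s++) {
        for (int d = 0; d < 2; d++) {
            int true_s = src_inverted ? !s : s;
            int true_d = !d;
            /* store the complement of the intended result */
            if (!gx_eval(fn, true_s, true_d))
                g |= 1u << gx_bit(s, d);
        }
    }
    return g;
}
```

For instance, GXand with both operands inverted becomes GXor (by De Morgan's law), and a plain GXcopy of true-sense source data becomes GXcopyInverted.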
In this model, the sixteen colors are conventionally the stock sixteen colors found on a VGA adapter. In any case, they are not modifiable. WinCenterPro supports this model with either a pseudo color visual with at least sixteen colormap entries, or with a static color visual with precisely sixteen colormap entries. In the latter case, WinCenterPro assumes that the colors in the colormap correspond closely to the expected values. The 16 pixel values must be allocated so that raster operations will have the expected results. For a pseudo color visual under X that's easily done by allocating four planes to generate 16 unique pixel values with the correct logical relationship.
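XAllocColorCells with four planes and one pixel returns a base pixel and four plane masks, and every combination of the base with a subset of the masks is an allocated cell. Mapping bit i of a logical color index to plane mask i then makes AND and OR of pixel values agree with AND and OR of the logical indices. XOR-style operations would also clear the base bits, so the sketch assumes the GC plane mask is limited to the four allocated planes; the paper does not say how WinCenterPro handles this, and the helper names are hypothetical.

```c
#include <assert.h>

/* Pixel for logical color index 0..15, given the base pixel and the four
 * plane masks as returned by a call such as
 * XAllocColorCells(dpy, cmap, False, planes, 4, &base, 1). */
static unsigned long vga_pixel(unsigned long base, const unsigned long planes[4],
                               unsigned index)
{
    unsigned long pixel = base;
    assert(index < 16);
    for (unsigned bit = 0; bit < 4; bit++)
        if (index & (1u << bit))
            pixel |= planes[bit];
    return pixel;
}

/* GC plane mask covering only the allocated planes. */
static unsigned long vga_plane_mask(const unsigned long planes[4])
{
    return planes[0] | planes[1] | planes[2] | planes[3];
}

/* Result of GXxor when the GC plane mask is in effect: bits outside
 * the mask keep their destination values. */
static unsigned long xor_with_plane_mask(unsigned long src, unsigned long dst,
                                         unsigned long plane_mask)
{
    return (dst & ~plane_mask) | ((src ^ dst) & plane_mask);
}
```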
Both of these are directly supported by X; in either case, if a visual is found which exactly matches the required model, it can be used directly. WinCenterPro does create a new colormap for 256 color mode, so unless the display hardware supports multiple simultaneous colormaps, the screen will flash when the WinCenterPro window is activated.
WinCenterPro includes a unique solution to providing high performance text rendering in the X environment. It dynamically generates X fonts from NT fonts, which are delivered to the X server via the X Font Service Protocol [Fulton91].
At the display device driver level, there is no easy way to identify which NT font is being used. All that the display device is given is a pointer to the font object containing little more than the glyphs. These are identified by 16 bit values which may have no relationship with the encoding of the source text.
Instead of attempting to identify the font and access it from the original source, WinCenterPro directly uses the font as provided to the rendering function. The NT font is converted into an X font and passed (via shared memory) to another process which functions as a limited X font server. A name is generated for this font which is designed to be unique across all WinCenterPro sessions as well as not match any existing X fonts:
The font server handles only names in this form. It further avoids confusion by returning empty lists for any ListFonts or ListFontsWithXInfo requests. By restricting the scope of the font server to precisely what was needed, the WinCenterPro font server contains a fraction of the code needed for a complete X font server.
As the original font encoding is unknown to the display driver, WinCenterPro encodes the glyphs sequentially as found in the font. While the order is not specified to concur with the font encoding, in fact it most often does. For fonts which encode the Windows character set, WinCenterPro ends up with an encoding which is 32 less than the original glyph encodings. While this cannot be relied upon, it does provide a useful optimization.
Because NT applications use many fonts, and because WinCenterPro cannot load them until the first rendering request using them is made, a significant delay may occur while the font is loaded. To minimize this, and also to minimize the amount of memory consumed in the X server by fonts, WinCenterPro loads only the portion of the font currently in use. The MaxChar field in the font name identifies the number of characters to load out of the font. The WinCenterPro font server recognizes this and automatically generates a subset of the font to provide to the X server. Most Windows TrueType fonts contain nearly 600 characters for a wide variety of potential encodings; however, they end up encoding ASCII in the first 95 and the Windows character set in the first 191. WinCenterPro saves a significant amount of time and memory by reducing most fonts loaded over the network into the X server to one of these subsets. When a text rendering request is made which doesn't fit the currently loaded subset, WinCenterPro automatically loads the larger set.
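A sketch of the subset choice: the smaller of the 95-glyph and 191-glyph subsets that covers the highest glyph index a request uses, or the whole font otherwise, with a reload whenever a request goes beyond what is loaded. The subset sizes come from the text; the function names are hypothetical, and treating MaxChar as a plain glyph count is an assumption.

```c
#include <assert.h>

#define SUBSET_ASCII   95   /* glyphs covering ASCII */
#define SUBSET_WINDOWS 191  /* glyphs covering the Windows character set */

/* Number of glyphs (the MaxChar value) to load so that glyph index
 * max_glyph is included; total is the number of glyphs in the NT font. */
static unsigned subset_for(unsigned max_glyph, unsigned total)
{
    assert(max_glyph < total);
    if (max_glyph < SUBSET_ASCII && SUBSET_ASCII <= total)
        return SUBSET_ASCII;
    if (max_glyph < SUBSET_WINDOWS && SUBSET_WINDOWS <= total)
        return SUBSET_WINDOWS;
    return total;
}

/* A request needs a larger subset if it uses a glyph beyond those loaded. */
static int needs_reload(unsigned loaded, unsigned max_glyph)
{
    return max_glyph >= loaded;
}
```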
The only access to TCP/IP available in stock NT is via the WinSock API. This API does not mesh well with the NT kernel I/O model and as a result suffers badly in performance and functionality. Because WinCenterPro directly generates X protocol which may well be interpreted by a significantly faster machine, in many environments the speed with which NT can generate network traffic is the performance-limiting factor. To solve this problem, WinCenterPro includes a custom kernel driver which communicates directly with the NT TCP/IP stack and exports I/O in the regular NT style: asynchronous, overlapped DMA I/O requests with delayed acknowledgment. With this driver, WinCenterPro can reach peak network utilization of over 80%, even on a relatively slow Ethernet adapter.
As WinCenterPro is a multi-user system, performance must include both the peak benchmark speed as well as the number of users supportable from a particular machine: how much memory, network utilization and CPU each remote user consumes.
1. Keith Packard is the chief architect of WinCenterPro and has been a member of the Network Computing Devices engineering staff since 1992. Prior to that, he was the senior staff member at the X Consortium. He has worked with and participated in the design of much of the X Window System since the early days of the X Consortium.