Performance Tuning Windows NT

22 April 2003

Written by Scott B. Suhy, Consultant with Microsoft Consulting Services, responsible for enterprise architecture, design, and optimization for Fortune 500 companies. Email Scottsu@microsoft.com

Baseline

Introduction

Would it not be nice if there were no traffic bottlenecks during your everyday task of going to work? No traffic lights, fender benders, car problems, detours, people pulling out in front of you, people in the left-hand lane going less than the speed limit, four-lane highways narrowing down to two lanes.... This is rather unrealistic. Likewise, with a computer system it is unrealistic to expect that there will never be a limit to the amount of memory, CPU, or I/O that internal or external processes can consume.

You might also say that it might be nice to know how long it was going to take you to get to work in the morning (with some expected normal variation). Users of a computer system have the same expectation. They expect their jobs to finish in an acceptable amount of time without bottlenecks in the system slowing them down.

If there were bottlenecks on your way to work each day, I suppose you could optimize or tune the trip (reduce bottlenecks) by possibly finding an alternate route, car pooling, taking advantage of a car pool lane, taking a bus, or even changing your working hours (possibly to the evening when there is no traffic and the only thing keeping you from getting to work any faster is the speed limit and possibly the size of your engine). Computer systems have the same optimizations (run jobs during off-peak hours, and so on). As with transportation systems, there is also the same lack of environmental control with a computer system. For example, it is not realistic to think that there will always be the same amount of traffic (on the road or in your computer system), nor is it realistic to think that you have control over that traffic. Problems always occur (a rain storm causing increased slowdowns on the road, or one user consuming a great deal of the server's memory, CPU, or I/O bandwidth). The key is to expect the problems, manage them, and know what to do when they occur.

Once you feel you have the trip optimized, you might also think about taking some statistics, daily, weekly, or monthly, such as the amount of time it takes you to arrive at the office, number of red lights you got rather than green, and so on. This type of information will allow you to make future decisions on such things as "If I stop to get gas in the morning, how much earlier will I have to leave the house?" Of course you would also have to know how much time it would take you at the gas station (another set of statistics). The same thing goes for your computer system. It's called Capacity Planning.

The following information provides you with tips on areas of the Microsoft® Windows NT™ operating system to which you should pay attention (What to Watch). It also gives you a few rules/guidelines to use to optimize the system (What You Can Do). Once you take each of these areas into consideration, your system should be optimized. Once you feel your system is optimized, it is time to gather data on current capacity. The data will allow you to do the following:

Project how much the workload at the memory, CPU, I/O, and bandwidth levels will increase in response to business growth and new Microsoft BackOffice applications.
Diagnose problems by comparing subsequent measurements.

This information is rather technical in nature and assumes that you already know a great deal about Microsoft Windows NT™ Workstation and Microsoft Windows NT Server operating systems. However, it only touches the surface of optimization. Many books could be written on the subject. Consequently, this paper neglects to explain many details and assumes you know where to get information about the hardware and software concepts mentioned. If you stumble upon a concept that is not explained in detail, you may want to refer to the Microsoft Windows NT Resource Kit, Server Message Block specification (which can be obtained from Microsoft), Microsoft TechNet, or any book that details network architecture (such as the book Local Area Networks by James Martin or LAN Times Encyclopedia of Networking by Tom Sheldon).

Definitions

Before diving into any Performance Tuning, it is necessary to go over some definitions and terms.

Task

For the purpose of this paper, I refer to the word task as a series of computer instructions, the execution of which involves work to be performed by one or more computer components or resources (for example, CPU, memory, hard disk, and network adapters).

The amount of time it takes to complete a task can be divided up among the several resources that are involved in the task's execution: some resources will be responsible for small amounts of the total time, others will be responsible for larger amounts.

Bottleneck

The single resource that consumes the most time during a task's execution is that task's bottleneck. Bottlenecks can occur because resources are not being used efficiently, resources are not being used fairly, or a resource is too slow or too small. Let me try to elaborate on this point with the following example.

Example. If a task takes 2.2 seconds to complete, with 0.2 seconds spent executing instructions in the CPU and 2 seconds retrieving data from the disk (assuming the two do not overlap in time), the disk is the bottleneck in the task. If the CPU were replaced with one twice as fast, task execution time would drop from 2.2 to 2.1 seconds, a reduction of approximately 4.5%. However, if the disk controller were replaced with one twice as fast, the disk access time would drop from 2 seconds to 1 second, and the total execution time from 2.2 to 1.2 seconds, a reduction of approximately 45%.

It would be easy if the previous example were on a workstation running the Microsoft MS-DOS® operating system, but we are dealing with a multitasking OS. One thing to always keep in mind, especially in a multitasking OS, is that resolving one bottleneck will always lead to the next one.

Windows NT System Tuning

The goal in tuning Windows NT is to determine which hardware resource is experiencing the greatest demand (the bottleneck), and then to adjust the operation to relieve that demand and maximize total throughput. A system should be structured so that its resources are used efficiently and distributed fairly among the users. This is not as difficult as it sounds, assuming you use a few good rules/guidelines and have a thorough understanding of the computing environment.

For example, in a file and print server environment, most of the activity at the server is in support of file and print services. This tends to cause high disk utilization because of the large number of files being opened and closed. It also causes the network interface card(s) to endure a heavy load because of the large amount of data that is being transferred. Memory typically does not get a heavy load in this environment (memory usage, however, can be heavy due to the large amount of system memory that may be allocated to file system cache). Processor utilization is also typically low in this environment.

In contrast, a server application environment (for example, other Microsoft BackOffice products such as Microsoft SQL Server™ database server for PC networks, Microsoft Mail electronic mail system, Microsoft Systems Management Server centralized management for distributed systems, and Microsoft SNA Server) is much more processor and memory bound than a typical file and print server environment because much more actual processing is taking place at the server. The disk and network tend to be less utilized, due to a smaller amount of data being sent over the wire and to the disk.

Understanding these generalizations is not enough; the only way to get an idea of the utilization of the resources is to monitor them, and one of the most powerful tools that you can use is the Windows NT Performance Monitor.

Performance Monitor is a graphical tool for measuring the performance of your own Windows NT-based computer or other Windows NT-based computers on a network. It is located in the Administrative Tools group of both the Windows NT Workstation and Windows NT Server products. On each computer, you can view the behavior of objects such as processors, memory, cache, threads, and processes. Each of these objects has an associated set of counters that provide information on such things as device usage, queue lengths, and delays, as well as information used for throughput and internal congestion measurements. It provides charting, alerting, and reporting capabilities that reflect current activity along with ongoing logging. You can also open log files at a later time for browsing and charting as if they were reflecting current activity.

Before spending money to add more hardware or replace existing hardware with faster hardware, it's best to use Performance Monitor to first tune the system to make the most efficient use of existing resources. Here are a couple of examples of where the tool may be useful:

Example. If we find that the CPU is 100% utilized, before replacing it with a faster CPU or adding another one, we should identify and analyze the process that is utilizing the bulk of the CPU time. We may find that the processor cycles are being consumed by a disk controller requiring PIO. In this case a DMA disk controller will then reduce processor utilization.

Example. If we determine the hard disk is full, before adding additional disk drives, we should identify how much of the page file is being utilized. We may find that the system page file size is initialized at 100 MB, but there is never more than 40 MB of it being used. Instead of purchasing another disk, we could reduce the size of the page file.

Typical Questions

If you talk to our product support engineers or our consultants in the field and ask them about the tuning questions they most frequently hear, you may find the following:

1. How do I determine how well an application is performing?

2. How can I support my environment in a proactive manner?

3. How do I know what component of my system is the most limiting (the bottleneck)?

4. How can I ensure my system is performing the best it possibly can perform?

5. How do I determine what size system I need based on the following criteria?

6. How do I know when to upgrade?

All of these questions play some part in performance tuning. We are going to focus mostly on answering questions 2, 3, and 4, primarily by exploring each of the primary components of a computer system: the memory, the processor, and the I/O subsystem (for example, disks and networks). From this standpoint, performance tuning means ensuring that every user gets a fair share of available resources of the entire system. Once you feel you have 2, 3, and 4 under control, you can start focusing on 5 and 6, which are more capacity planning issues. Once you have 5 and 6 under control, you will be able to answer number 1, and more importantly, do "What If" analysis.

Tuning for "Memory" Performance

Lack of memory is by far the most common cause of serious performance problems in computer systems. If you read no further in this document, you could simply answer "Memory!" whenever anyone asks you how to improve the performance of a system.

Memory contention arises when the memory requirements of the active processes exceed the physical memory available on the system; at this point, the system is out of memory. To handle this lack of memory the system starts paging (moving portions of active processes to disk in order to reclaim physical memory). At this point, performance decreases dramatically. Consider the following example. If the average instruction in a computer takes approximately 100 nanoseconds to execute and disk access takes somewhere on the order of 10s of milliseconds, how many times slower would the machine run, if there were 1 paging operation per instruction? If you answered 100,000 you would be correct! Let's hope things don't get that bad....

To optimize overall performance, steps must be taken to ensure that main memory is used as efficiently as possible and thus paging is held to a minimum. As you will see in the next section, you can tell how loaded system memory is by watching how the system pages.

What to Watch

The Performance Monitor counter "Memory Pages/sec" is the number of pages read from the disk or written to the disk to resolve memory references to pages that were not in memory at the time of the reference. As a rule, you can assume that if the average of this counter is consistently greater than 5, then memory is probably becoming a bottleneck in the system. Once this counter starts to average consistently at 10 or above, performance is significantly degraded and disk thrashing is probably occurring.
If the actual size of the page file is greater than its initial size (typically physical RAM + 12 MB), time is being spent growing the page file and dealing with page file fragmentation. It is best that the page file not be required to grow during the operation of the system because it adds time to the paging processes (additional disk access to allocate the needed sectors, update any allocation, and free sector tables used by the various file systems). Another result of this behavior is fragmentation, causing the file to exist on many areas of the disk (the initial page file is created using contiguous disk space).
A quick way to tell if your system is struggling for memory is to call up WINMSD.EXE (located in %SystemRoot%\system32) and look at the Memory dialog.

It details the total memory in your system, the current available memory ready for allocation to applications you may start, available space within your page file, and the Memory Load Index. The Memory Load Index specifies a number between 0 and 100 that gives a general idea of current memory utilization, in which 0 indicates no memory use and 100 indicates full memory use. This dialog is built with a call to the Microsoft Win32® application programming interface GlobalMemoryStatus() in the SDK.
The counter "Memory Available Bytes" displays the amount of free physical memory. If this counter stays consistently below 1 MB on servers and 4 MB on workstations, paging is occurring and performance is less than optimal.
"Memory Committed Bytes" displays the size of virtual memory (in bytes) that has been committed (as opposed to simply reserved). If this counter is greater than the amount of main memory, it indicates that main memory MAY not be large enough to accommodate all functions of all currently active processes-some paging MAY be inevitable. However, before making such an assumption, you should check "Memory Pages/sec" and "Memory Page Faults/sec." If the "Memory Pages/sec" is greater than 10 (10 is a reasonable guideline, but varies with disk hardware) and "Memory Page Faults/sec" is greater than "Memory Cache Faults/sec" then you are paging too much.
If you are trying to determine if adding more memory to your system will benefit your Microsoft SQL Server system, then you may want to monitor the "SQLServer Cache Hit Ratio" while the system is under a typical load. If the hit ratio is relatively high (over 90%), adding more memory will usually not be beneficial. This is because additional memory can mainly be used for additional Microsoft SQL Server data cache, thereby increasing the hit ratio. In this case, the hit ratio is already high, and the maximum available improvement quite small. If the hit ratio is consistently lower than this, adding more memory may improve the hit ratio and thereby performance, if the locality of reference is such that it can be "bracketed" by economically or technically feasible amounts of memory.
When "Memory Committed Bytes" approaches the "Memory Commit Limit" and the page file has already reached its maximum size, there are simply no more pages available, in main memory or in the page file. The "Memory Commit Limit" is the amount of virtual memory that can be committed without extending the page file. If this occurs on a server running Windows NT Server, you may see three errors in the Event Log (EVENTVWR.EXE is located in the Administrative Tools group). They are from the source SRV:
2020: The server was unable to allocate from the system paged pool because the pool was empty.
2001: The server was unable to perform an operation due to a shortage of available resources.
2016: The server was unable to allocate virtual memory.
If this occurs, it is generally related to a memory leak in another process. To determine the process at fault, you can monitor each process's "Page File Bytes" or "Working Set" counter.
Another condition you may want to be aware of is the following nonpaged pool error in the server's Event Log:
2019: The server was unable to allocate from the system nonpaged pool because the pool was empty.
Nonpaged pool pages cannot be paged out to the paging file, but instead remain in main memory as long as they are allocated. NonPagedPoolSize is calculated using complex algorithms based on physical memory size. However, you can use the following formulas to 'approximate' these values for an X86-based computer.

MinimumNonPagedPoolSize = 256K
MinAdditionNonPagedPoolPerMb = 32K
DefaultMaximumNonPagedPool = 1 MB
MaxAdditionNonPagedPoolPerMb = 400K
PAGE_SIZE=4096

NonPagedPoolSize = MinimumNonPagedPoolSize +
((Physical MB - 4) * MinAdditionNonPagedPoolPerMB)

Example. On a 32 MB x86-based computer:

MinimumNonPagedPoolSize = 256K
NonPagedPoolSize = 256K + ((32 - 4) * 32K) = 1.2 MB

MaximumNonPagedPoolSize = DefaultMaximumNonPagedPool +
((Physical MB - 4) * MaxAdditionNonPagedPoolPerMB)

If MaximumNonPagedPoolSize < (NonPagedPoolSize + PAGE_SIZE * 16),
then MaximumNonPagedPoolSize = (NonPagedPoolSize + PAGE_SIZE *16)

Example. On a 32 MB x86-based computer:

MaximumNonPagedPoolSize = 1 MB + ((32 - 4) * 400K) = 12.5 MB

You can monitor the system's nonpaged pool allocation with the "Memory Pool Non Paged Bytes" counter. If there is a shortage of nonpaged pool, you may also see the following error on a remote system or even the local system:
Not enough storage available to process this command.
If this occurs, start looking at each process's nonpaged pool allocation. This is generally caused by an application incorrectly making system calls and using up all allocated nonpaged pool.
If you are concerned that one application is consuming a great deal of memory (paged or nonpaged) then you may want to use a utility such as the Win32 Software Development Kit utility PMON.EXE (this is also included in the Windows NT Resource Kit volume 3 utilities) to monitor its load on the system. At the top of the PMON display you see some system global statistics: memory size and available bytes, the virtual memory commitment, and pool sizes. Then, for each process, PMON shows processor usage during the last update interval. The next column is total processor time. The third column is how many pages each process is using, and then the change since the last update. PMON also shows how many Page Faults have occurred in the process and the change since the last update. Next is the virtual memory commitment charge, and then the pool usage estimates for the process. Finally you see process priority and the number of threads. There's nothing here that is not in Performance Monitor (you could get the same information by looking at such counters as "Process Page Faults/sec"), but it is a very handy overview and is quicker to start up, as well as being "preconfigured" to show you the system at a glance. Here is how it looks:

[Figure: PMON display (img00002.gif)]

What You Can Do

Schedule memory-intensive applications during off-peak hours. You can use the AT scheduler that ships with Windows NT.
Distribute memory-intensive applications/processes across multiple machines.
Add more memory. To determine ABOUT how much memory to add, use the following formula:
"Paging File % Usage MAX" * Page file size = number of bytes used

Add together the bytes used for all page files. This is the amount of memory that would need to be added to allow all of the applications to perform their operations with minimum paging. For example, if your page file is 100 MB and the % Usage MAX is 20%, then you would need 20 MB additional RAM to have a system that does minimal paging. The reason this formula only gives you an idea ABOUT how much memory to add is that a) not all page file "in use" code is accessed all of the time; and b) the formula ignores the requirements for code and mapped files not backed by the paging file. Therefore this estimate is neither an upper bound nor a lower bound; it is only an "indication." The truth is that there is no good way to know how much memory to add at this time. A more accurate way to measure the amount of memory an application would require is to run the application on a very large machine and measure the needs under some slight memory pressure. (There is a tool in the Windows NT Resource Kit volume 3 utilities called Response Probe that can aid in this area.)

Gotcha. Adding memory without upgrading the secondary cache size sometimes degrades processor performance. This is because the secondary cache now has to map the larger memory space, usually resulting in lowered hit rates in the cache. This slows down processor-bound programs because they are scattered more widely in memory after memory has been added. (Secondary cache refers to the physical cache memory chip(s) usually located on the motherboard, as opposed to within the processor itself. In the future, processors will be built with secondary cache on the same substrate as the processor chip, or even within the processor chip itself.)
If you determine that a great deal of memory is being consumed by an application for which you have the source code, you may want to investigate tuning the application to be less memory intensive. Good tools to use to profile your applications' memory allocation are the Working Set Tuner and the VADUMP tools in the Win32 SDK.
Spreading paging files across multiple disk drives and controllers generally improves performance as multiple disks can process I/O requests concurrently. After all, you can have up to 16 separate page files. Also, since Windows NT has several system files that are frequently accessed, you may want to experiment with locating the paging file on one disk and the Windows NT system files on another. You should also locate the page file(s) on separate disk(s) from application files to allow for page file I/O and application file I/O to occur concurrently. This will only work if the disk driver(s) and controller(s) used can accommodate asynchronous I/O requests. Keep in mind that most IBM-compatible "non-super servers" have an ATDISK as the default and the ATDISK driver can have only one I/O request pending at a time. If your system mixes high-speed disks and low-speed disks, use the fastest disks for all your paging.
Use the Control Panel | System | Virtual Memory and set the page file size such that extension of it will rarely occur.
Use the Control Panel | Services to turn off unnecessary Windows NT services, and Control Panel | Network to uninstall any unnecessary Windows NT device drivers. This can free up both CPU and memory.
User accounts are stored in a registry hive, which means each account consumes paged pool on a Primary Domain Controller or Backup Domain Controller. Therefore the limit on the number of user accounts depends on the amount of memory and swap file space in your PDC and BDCs. User accounts take about 1K each, so 10,000 is about 10 MB. You may want to consider a second domain (possibly a different domain model) if you have more than 15,000 user accounts. However, the only answer may be to add more memory.
Some machines provide the ROM BIOS shadowing option. While this feature provides an advantage with MS-DOS, it is NOT an advantage with Microsoft Windows NT. ROM BIOS shadowing is the process of copying the BIOS from ROM into RAM and using either hardware or 386 enhanced mode to remap the RAM into the normal address space of the BIOS. Because reading RAM is much faster than reading ROM, BIOS-intensive operations are substantially faster. For example, MS-DOS uses the BIOS to write to the screen; therefore, with ROM BIOS shadowing, directory listings run more quickly. Windows NT does not use the BIOS (except during startup); therefore, no performance is gained by shadowing. If ROM BIOS shadowing is not used, more RAM is available. With Windows NT, there is an advantage to disabling the ROM BIOS shadowing option. This applies to other BIOS shadowing schemes as well. Typically the CMOS settings allow the system to shadow any BIOS. This includes the following: System BIOS, Video BIOS, Other adapters ROM BIOS (in a given select range).

Tuning for "Processor" Performance

A processor (running at a given clock speed) can execute a set number of instructions per second. Therefore, if a processor is switched among multiple threads that all have work to do, a given thread will take x (x being the number of simultaneously executing threads) times longer to complete a given task.

There are times when a thread has no work to do, such as when waiting for user input, or when waiting for another thread to finish a related operation. As long as the thread is in this waiting state, it will not be scheduled for execution and, thus, does not take up any CPU time. Since most Microsoft Windows®-type applications spend a considerable amount of time with their threads in this waiting state, there may be little performance degradation when running multiple Windows-based applications.

Some applications are considered CPU intensive. A CPU-intensive application almost always has work to do and spends very little, if any, time in the waiting state. For example, the following C program consumes 100% of the CPU. When additional applications are started, their performance, and that of the CPU-intensive application, will be less than optimal since all must share the processor's time. This is an example of how NOT to write an application; a better approach would be to create an event or wait on a semaphore.

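/* Busy loop: the thread never waits, so it consumes all available CPU time. */
int main(void)
{
    while (1) {
        /* spin */
    }
}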

The figure below shows the application's utilization of the CPU.

[Figure: Performance Monitor chart of the application's CPU utilization (img00003.gif)]

What to Watch

If the "Processor % Processor Time" counter consistently registers at or near 100%, the processor may be the bottleneck. ("System % Total Processor Time" can be viewed for multiprocessor systems.) If this occurs, you need to determine WHO or WHAT is consuming the CPU. To determine which process is using up most of the CPU's time, monitor "Process % Processor Time" for all of the process instances (as in the previous figure).

You can tell if the CPU activity is due to applications or to servicing hardware interrupts by monitoring "Processor Interrupts/sec." This is the number of device interrupts the processor is experiencing. A value over 1000 should cause you to look at the efficiency of hardware I/O devices such as the disk controllers and network cards.
You can also monitor "System System Calls/sec." System Calls/sec is the frequency of calls to Windows NT system service routines. These routines perform all of the basic scheduling and synchronization of activities on the computer and provide access to nongraphical devices, memory management, and name space management. If there are many more interrupts per second than system calls, it could indicate that a hardware device is generating an excessive number of interrupts.
Monitor the "System Context Switches/sec" as well. Too frequent context switching can be caused if semaphores or critical sections (see the Windows NT SDK for more information) are placed at too low a level in order to attain high concurrency. The only way to solve this problem is to re-evaluate where these synchronization objects are placed in the source code.

What You Can Do

Schedule CPU-intensive applications during off-peak hours. You can use the AT scheduler that ships with Windows NT.
If you have control over the application source, you may want to investigate tuning the application to be less CPU intensive. There are a number of tools available with the Windows NT SDK that allow you to do this, such as WAP (Windows API Profiler), CAP (Call Attributed Profiler), FIOSAP (File I/O and Synchronization Win32 API Profiler), and Win32 API Logger.
Distribute applications and processes across multiple machines.
Upgrade the processor if possible. Keep in mind that Windows NT runs on MIPS and Digital Alpha AXP machines as well as Intel processors (386, 486, and Pentium). Most servers are either file servers or application servers. Even though they use the same operating system, each uses the machine's resources in a different way. A file server generally maximizes system bus utilization and under-utilizes the processor. A 486 clock doubler chip in this machine would not provide a big performance enhancement over a typical 486 chip. An application server (such as a database server running Microsoft SQL Server and Systems Management Server), however, utilizes the processor subsystem significantly more than a file server does. You will find that this is the environment where a more powerful CPU chip will pay off.
If you are in a situation where you are trying to determine if moving to a RISC processor will increase performance, you should look at the counter "System Context Switches/sec." This is the rate of switches from one thread to another. Moving to a RISC machine will only be a good idea if the Context Switch rate is NOT dominating processor activity.
Add more processors assuming there is more than 1 thread capable of asynchronous execution. If you have a multiple processor computer, Windows NT will assign separate threads to different processors (interrupts are also distributed). The thread execution load is then distributed across the multiple processors. For example, if a CPU-intensive thread is executing on processor A, processor B will be free to process other threads.
Upgrade the secondary cache. In this same regard, you may consider upgrading the CPU to a chip with a 16K First Level cache such as a 486 DX4/100 (Unified Instruction and data cache) or a Pentium (8K data cache and 8K instruction cache).
Assuming you have at least a 486, if you are in a server environment, part of your problem may be the network or disk adapter cards you have chosen. 8-bit cards use more processor time than 16-bit or 32-bit cards. The number of bits here refers to the amount of data moved to memory from the adapter on each transfer. The most efficient cards use 32-bit transfers to adapter memory or direct memory access (DMA) to move their data. Adapters that don't use memory-mapped buffers or DMA must use processor instructions to move data, and that makes the processor busy. DMA uses the memory bus, and that can slow the processor down but it is still more efficient than individual instructions. There is more information on this topic in the "Tuning for Disk Performance" section of this document. Keep in mind while reading this section and the "Disk Performance" section that replacing PIO devices will almost always reduce processor bottlenecks.
In a resource-sharing environment, a greater improvement can be found by upgrading to a faster processor rather than increasing the number of CPUs. In a client-server environment, the addition of another CPU will typically give a better performance increase than upgrading to a faster or more advanced processor because of the multithreaded design of all Microsoft BackOffice products.
Each application (as well as each thread) in the system has a set priority. You can control the priority system-wide by changing the following in Control Panel | System | Tasking.

[Figure: Tasking dialog box]

Use this dialog box to change the relative responsiveness of applications that are running at the same time. When more than one application is running in Windows NT, by default the foreground application receives more processor time, and so responds better, than applications running in the background. (You can also use the Windows NT SDK utility PVIEW to set individual application priorities.)

You may also use the START command to alter the priority of a program as it is started. This command can take /low, /normal, /high, and /realtime switches to start programs with varying levels of priority.

Gotcha. Never start processor-bound applications at real-time priority.

Considerations for 16-Bit Applications

You can monitor the performance of 16-bit MS-DOS-based applications; however, they are difficult to identify as instances because the program name does not appear. This is because each MS-DOS-based application runs in its own Virtual DOS Machine (NTVDM). You would have to look at the individual threads (that is, "Thread Processor Time") for the NTVDM.EXE application. An easy way to identify the thread associated with the application you want to monitor is to stop all other 16-bit MS-DOS-based applications and choose the remaining thread. Another way to identify the application is to copy NTVDM.EXE to another name and edit the following path in the Registry:
SYSTEM\CurrentControlSet\Control\WOW\cmdline

16-bit Windows-based applications execute in one NTVDM by default, but can be started in separate NTVDMs.
If you are not satisfied with the performance of your MS-DOS-based applications running on Windows NT Workstation, try full-screen mode. In full-screen mode, most applications can run with native performance directly on the installed video adapter. Windows maps VGA memory to the appropriate place in the VDM and maps the relevant registers from the application to the video adapter. To get in and out of full-screen mode, press ALT+ENTER.
When running MS-DOS-based or Windows 3.1-based serial communications applications that directly access serial port hardware, you may be able to improve their performance by using software handshaking (XON/XOFF) instead of hardware handshaking (CTS/RTS). Because hardware must be virtualized under Windows NT, checking the CTS/RTS signals directly incurs an unavoidable performance penalty. XON/XOFF handshaking avoids this problem because it does not require accessing the serial port hardware directly.

Tuning for "Disk" Performance

As you might have guessed, disk performance is the single most important aspect of I/O performance. It affects many other aspects of system performance. Good disk performance enhances virtual memory performance and reduces the elapsed time required to load programs that perform a great deal of I/O, and so on.

If you discover a disk bottleneck, the first thing you need to determine is whether it's really more memory that you need. If you are short on memory, you will see the lost performance reflected as a disk bottleneck.

Gotcha. Because disk counters can increase disk access time by approximately 1.5% on a 386/20, Windows NT does not automatically activate these counters at system startup. To activate disk counters, type diskperf -y at the command prompt and restart the computer. On a 486 or better system, the hit is not apparent.

What to Watch

If the Physical Disk object's "% Disk Time" counter consistently registers at or near 100%, the physical disk is the bottleneck. This counter is the percentage of elapsed time that the selected disk drive is busy servicing read or write requests, including time waiting in the disk driver queue.
If the Physical Disk object's "Disk Queue Length" counter (pending disk I/O requests) is greater than 2, it generally indicates significant disk congestion. (Note: This same rule applies to almost all I/O devices.)
Determine the portion of the disk I/O used for paging with the following formula: "% disk time used for paging = 100 * ('Memory Pages/sec' * 'PhysicalDisk Avg. Disk sec/Transfer')". If this is more than 10% of the total disk activity, then paging is excessive. Avg. Disk sec/Transfer is the time in seconds of the average disk transfer. This formula does not account for paging over the network.
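The paging formula above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample readings are hypothetical, and the two inputs are assumed to be values you have read by hand from the "Memory Pages/sec" and "PhysicalDisk Avg. Disk sec/Transfer" counters in Performance Monitor.

```python
def paging_share_of_disk_time(pages_per_sec, avg_disk_sec_per_transfer):
    """Estimate the percentage of disk time used for paging.

    Implements: 100 * (Memory Pages/sec * PhysicalDisk Avg. Disk sec/Transfer).
    """
    return 100.0 * pages_per_sec * avg_disk_sec_per_transfer


def paging_is_excessive(pages_per_sec, avg_disk_sec_per_transfer,
                        threshold_percent=10.0):
    """Apply the 10% rule of thumb from the text (threshold is adjustable)."""
    share = paging_share_of_disk_time(pages_per_sec, avg_disk_sec_per_transfer)
    return share > threshold_percent


if __name__ == "__main__":
    # Hypothetical readings: 8 pages/sec at 20 ms per transfer.
    share = paging_share_of_disk_time(8, 0.020)
    print(f"Paging uses about {share:.1f}% of disk time")
    print("Excessive" if paging_is_excessive(8, 0.020) else "Acceptable")
```

With these sample readings the estimate is well above the 10% guideline, which would point toward a memory shortage rather than a slow disk.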

What You Can Do

Install a faster disk and/or controller. Determine whether the controller card does 8-bit, 16-bit, or 32-bit transfers. The more bits in the transfer operation, the faster the controller moves data. You may also want to choose a different drive technology. IDE (Integrated Drive Electronics) has a throughput of 2.5 MB/sec, ESDI 3 MB/sec, SCSI-2 5 MB/sec, and Fast SCSI-2 10 MB/sec.
Create mirrored data sets. The I/O system can issue concurrent reads to 2 partitions. The first portion of the read will be to partition A, while the next portion of the read will be to partition B. (Assuming the disk driver and controller can handle asynchronous I/O).
Create striped data sets. Multiple disks (between 3 and 32) can process I/O requests concurrently (assuming the disk driver and controller can handle asynchronous I/O).
Add memory (RAM) to increase file cache size.
Change to a different I/O bus architecture. EISA, MCA, and local bus (VESA or PCI) buses transfer data at a much higher rate than ISA buses. PCI is fast because it transfers data at 33 MHz, a double word (4 bytes) at a time (33 MHz * 4 bytes = 132 MB/sec), whereas ISA maxes out at about 5 MB/sec and EISA at about 32 MB/sec (EISA transfers at 8 MHz * 4 bytes). There has been talk about raising the PCI clock rate to 66 MHz (to get a 264 MB/sec transfer rate), but most manufacturers are resisting the idea (at about 50 MHz or so, getting past FCC class B certification is a nightmare).
When choosing an I/O device such as a disk adapter, consider the architecture of the card. Here are some of the points to consider about each architecture:
PIO: PIO (programmed I/O) requires intervention by the CPU. For example, the Adaptec 1522 is a PIO device and can do either 16-bit PIO or 32-bit PIO. However, CPU usage is quite intensive (30-40%), and it will slow down your system during a large transfer or a CD-ROM access. For this reason, most high-performance systems don't use PIO devices, because they adversely impact system throughput. BYTE magazine did a comparison of the Adaptec 2940 (PCI) against a Future Domain adapter (PIO). While the Future Domain and the Adaptec 2940 provide almost identical benchmark results, the Future Domain consumes a hefty 40% of CPU time whereas the 2940 does not. However, PIO devices are much cheaper to manufacture: the Future Domain is about half the price of the 2940. Another thing to keep in mind is that the standard ATDISK disk (most IDE drives) does PIO.
DMA: ISA DMA has only 24 address lines, so it can physically address only 16 MB. However, if you happen to have 32 MB of RAM, the OS can see all of the memory. Therefore, if the OS wants to transfer a block of memory located above 16 MB (which an ISA DMA card, such as the Adaptec 1542C, cannot physically see), it has to copy that block down to an area in the 0-15 MB range (which the Adaptec 1542C can see) so the 1542C can initiate the DMA transfer (double buffering). This copying down to the 0-15 MB range and copying back up (16 MB and above) takes quite a bit of time (using the Intel REP MOVSB, REP MOVSW, and REP MOVSD instructions), which explains the slowdown. However, you don't have that problem with VL, PCI, or EISA, as they all have 32-bit DMA address lines and can physically see up to 4 GB. PIO devices can see all of the memory, including memory above 16 MB; the only problem is that the processor has to perform every data transfer. The last thing to keep in mind is that some devices do both PIO and DMA. Unless your system is an ISA computer with more than 16 MB of RAM, you should always run the controller in DMA mode.
Bus Master: Bus master devices have their own intelligence and offload this work from the CPU. The CPU can resume doing its own thing while the bus-master device is doing all the I/O. When it's done, it hands the result to the CPU. These cards are by far the best solution.
Gotcha. Make sure that you check the Windows NT Hardware Compatibility List before you purchase a controller. This will tell you if the controller is supported by Microsoft and has a certified driver.
On a 2 SCSI disk daisy-chained system, the SCSI controller has more of an impact on your total performance than your disk drive. You would be better off buying a slower, cheaper disk and investing in a better SCSI controller.
Adding more physical drives in a RAID 5 configuration can result in significant performance improvements when the disk subsystem is the bottleneck. However, adding more controllers usually does not significantly improve performance. When using high-performance disk controllers, the physical drive access times are usually the performance limiting factor for the disk subsystem.
Choose a disk with a low seek time (the time required to move the disk drive's heads from one track of data to another). The ratio of time spent seeking to time spent transferring data is usually 10 to 1, and often much higher.
Distribute the workload as evenly as possible among different disk drives. This will allow you to take full advantage of the system's I/O bandwidth. For example, if you have one user population that does a great deal of reads and writes to directory \\server\ExcelData and another user population that does a great deal of reads and writes to a directory \\server\WordData then you may want to consider putting the ExcelData directory on a different disk and/or controller than the WordData directory. You can take advantage of the auditing facility of Windows NT and the NTFS file system to track how certain network files are being used. User Manager lets you enable file access auditing, and File Manager lets you specify the users and files whose access you want to record.
If you choose a FAT file system, with time it tends to become fragmented. As the file system becomes full, pieces of files tend to be scattered over the disk; the system cannot find enough contiguous blocks to store a new file in one place, so it must fit the file in empty spaces between other files. As files are added, deleted, truncated, and expanded, the file system becomes increasingly disorderly. Performance suffers because the disk drive cannot read a file with a sequential group of operations. Instead, it must constantly seek for different pieces of the file. To reduce fragmentation, use a defragmentation utility, such as Executive Software's Diskeeper, to rearrange the pieces of each file into a contiguous sequence.
NTFS is best for use on volumes of about 400 MB or more. This is because performance does not degrade with larger volume sizes under NTFS as it does under FAT; as the size of the volume increases, performance with FAT quickly decreases. Files also take more disk space under FAT than under NTFS. The FAT file system allocates disk space in clusters, the smallest allocation units the file system can assign to a file. For example, a 1-byte file is allocated 1 whole cluster, wasting all of the unused space in it. When a large number of small files are stored on a FAT partition, a large amount of disk space can be wasted this way. The cluster size depends on the size of the logical drive. FAT can track a maximum of 64K clusters, since there are 64K entries in the File Allocation Table, so the cluster size must increase on large drives in order to address the whole drive. The maximum cluster size is 64K, which makes the largest logical drive 4 gigabytes (64K clusters * 64K bytes). NTFS also has a limit; however, it is 2^64.
Disabling short name generation on an NTFS partition will greatly increase directory enumeration performance, especially where individual directories contain a large number of files or directories with non-8.3 filenames. To disable short name generation, use REGEDT32.EXE to set a registry DWORD value of 1 in the following Registry location:
SYSTEM\CurrentControlSet\Control\Filesystem\NtfsDisable8dot3NameCreation

Gotcha. This may cause compatibility problems with 16-bit MS-DOS- and Windows-based applications.
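The FAT cluster arithmetic described earlier in this list can be checked with a short Python sketch. It is illustrative only: the function names are hypothetical, and it assumes cluster sizes are powers of two starting from a 512-byte sector, which the text does not state.

```python
FAT_MAX_CLUSTERS = 64 * 1024        # 64K entries in the File Allocation Table
FAT_MAX_CLUSTER_BYTES = 64 * 1024   # maximum cluster size of 64K


def fat_max_volume_bytes():
    """Largest logical drive: maximum clusters times maximum cluster size."""
    return FAT_MAX_CLUSTERS * FAT_MAX_CLUSTER_BYTES


def fat_cluster_bytes_needed(volume_bytes, sector_bytes=512):
    """Smallest cluster size that lets 64K clusters cover the whole volume.

    Assumption: cluster sizes double from one 512-byte sector upward.
    """
    if volume_bytes > fat_max_volume_bytes():
        raise ValueError("volume is larger than FAT can address")
    cluster = sector_bytes
    while cluster * FAT_MAX_CLUSTERS < volume_bytes:
        cluster *= 2
    return cluster


def wasted_bytes(file_bytes, cluster_bytes):
    """Slack space left in the last cluster allocated to a file."""
    remainder = file_bytes % cluster_bytes
    return 0 if remainder == 0 else cluster_bytes - remainder


if __name__ == "__main__":
    mb = 1024 * 1024
    cluster = fat_cluster_bytes_needed(400 * mb)
    print("Largest FAT volume:", fat_max_volume_bytes() // (1024 * mb), "GB")
    print("Cluster size on a 400 MB volume:", cluster, "bytes")
    print("Space wasted by a 1-byte file:", wasted_bytes(1, cluster), "bytes")
```

Under these assumptions, a 1-byte file on a 400 MB FAT volume occupies a full 8K cluster, which is the kind of waste the text has in mind when it recommends NTFS for volumes of about 400 MB or more.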

Tuning for "Network" Performance

Network performance problems basically take three forms, each of which forces the network protocol to transmit each block of data many times (or to error out), degrading performance.

A server can be overloaded
The server is being asked to do more than it can based on an inadequate resource, possibly from a lack of another resource such as memory.
A network can be overloaded
The amount of data that needs to be transferred is greater than the capacity of the physical medium.
A network can lose data integrity
The network is faulty and intermittently transfers data incorrectly.
We can examine each of these problems from the perspective of the OSI (Open Systems Interconnect) networking model. From the Application layer's point of view, there are the Server service and Workstation (redirector) service components as well as other application layer support entities such as Netlogon, Replicator, and other services. From the Transport layer's point of view, there are the transport components such as TCP/IP, NetBEUI, NWLink, and so on. From the Datalink/Physical layer's point of view, there are the adapter cards and NDIS drivers.
Gotcha. The following section details many registry entry changes. Note that the "out of the box" settings within the registry should give you a well-balanced system. If you alter a setting, it may actually reduce the bottleneck; however, it may also create another problem. Set parameters with care. If you do have a problem, use the "Last Known Good" option during system initialization to revert to an unchanged registry.

Server

The Windows NT Server service's responsibility is to establish sessions with remote stations and receive SMB (Server Message Block) request messages from those stations. (SMB requests are typically used to request that the Server service perform I/O, such as open, read, or write, on a device or file located on the Windows NT Server station.)

You can configure the Windows NT Server service's resource allocation (and associated nonpaged memory pool usage) by using the Control Panel Network application. When you use the Control Panel Network application to configure the Windows NT Server service software, you are presented with the following Server Optimization Level dialog:

You may want to consider a specific setting, depending on factors such as how many users will be accessing the system and the amount of memory in the system. The amount of memory allocated to the Windows NT Server service (for such resources as InitWorkItems, MaxWorkItems, RawWorkItems, MaxPagedMemory, MaxNonPagedMem, ThreadCountAdd, BlockingThreads, MinFreeConnections, and MaxFreeConnection) differs dramatically based on your choice.
The "Minimize Memory Used" level is meant to accommodate up to 10 remote users simultaneously using Windows NT Server.
The "Balance" option is for up to 64 remote users.
The "Maximize Throughput for File Sharing" is for 64 or more remote users. With this option set, file cache access has priority over user application access to memory (the value of LargeSystemCache in the registry changes to 0x1). Use this option if you are using Windows NT Server for file server capabilities. This is the default setting!
SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargeSystemCache
The "Maximize Throughput for Network Applications" is for 64 or more remote users. However, with this option set, users' application access has priority over file cache access to memory (the value of LargeSystemCache in the registry changes to 0x0).
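The four optimization levels above amount to a small decision rule, sketched here in Python. The function name is hypothetical, and the choice to treat exactly 64 users as "Balance" is an assumption, since the text lists 64 under both "Balance" and the two "Maximize Throughput" options. LargeSystemCache values are included only for the two levels where the text states them.

```python
MAXIMIZE_FILE_SHARING = "Maximize Throughput for File Sharing"
MAXIMIZE_NETWORK_APPS = "Maximize Throughput for Network Applications"

# LargeSystemCache values quoted in the text for the two throughput levels.
LARGE_SYSTEM_CACHE = {
    MAXIMIZE_FILE_SHARING: 1,   # file cache has priority over applications
    MAXIMIZE_NETWORK_APPS: 0,   # applications have priority over file cache
}


def suggest_optimization_level(remote_users, mainly_file_server=True):
    """Map an expected number of simultaneous remote users to a level."""
    if remote_users < 0:
        raise ValueError("remote_users must not be negative")
    if remote_users <= 10:
        return "Minimize Memory Used"
    if remote_users <= 64:  # assumption: the boundary value goes to Balance
        return "Balance"
    if mainly_file_server:
        return MAXIMIZE_FILE_SHARING
    return MAXIMIZE_NETWORK_APPS


if __name__ == "__main__":
    for users, file_server in [(8, True), (40, True), (200, True), (200, False)]:
        level = suggest_optimization_level(users, file_server)
        cache = LARGE_SYSTEM_CACHE.get(level, "not stated in the text")
        print(f"{users:>4} users -> {level} (LargeSystemCache: {cache})")
```

Treat the result as a starting point only; the amount of memory in the system matters as much as the user count.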
If the Windows NT Server service runs out of a resource due to one of these settings, you will see the following error in the Windows NT Event Log:
2009: Server could not expand a table because the table reached the maximum size.
Not enough server storage is available to process this command.
If the "Server Work Item Shortages" or "Server Pool Paged/Nonpaged Failures" are consistently increasing, or if "Server Context Block queue Time" (the average time, in milliseconds, a work context block sat in the server's queue waiting for the server service to act on the request) consistently averages greater than about 50 (ms), the server service is acting as a bottleneck for all tasks, on remote stations, that are issuing remote I/O requests to the server. This may be the fault of the Windows NT Server service optimization level, or it may be the fault of other bottlenecked resources (disk, CPU, and memory) on which the Windows NT Server service depends. A WorkItem is the location where the server stores an SMB. The amount of WorkItems that are available fluctuates between a minimum value (InitWorkItems) and a maximum value (MaxWorkItems). The initial value and maximum value are configured based on Server Optimization level and the amount of memory in the machine. If WorkItem shortages are occurring, it may be caused by an overloaded server. You may want to consider identifying and off-loading some of the server's "resource-consuming" tasks.
Monitor the "Server Pool paged failures" and "Server Pool nonpaged failures." If they are occurring then the server is running out of the paged/nonpaged pool it originally allocated. If this occurs you may want to consider increasing the resource using the following parameters:
SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\MaxNonPagedMemoryUsage

and

SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\MaxPagedMemoryUsage

You will also experience one of the following errors in the system Event Log:
2017: The server was unable to allocate from the system nonpaged pool because the server reached the configured limit for nonpaged pool allocations.
2018: The server was unable to allocate from the system paged pool because the server reached the configured limit for paged pool allocations.
2019: The server was unable to allocate from the system nonpaged pool because the pool was empty.
2020: The server was unable to allocate from the system paged pool because the pool was empty.
This is more than likely being caused by lack of memory in the system. If this occurs you should refer to the "Memory" section of this paper.

There are similar paged/nonpaged values for the Macintosh file server service. The "MacFile PagedMemLimit" specifies the maximum amount of paged memory that the Macintosh file server can use. Performance of the Macintosh file service increases with an increase in this value. However, the value should not be set lower than 1000K. It is especially important that you are well acquainted with memory issues before changing this resource parameter. You cannot change this value from Server Manager.

SYSTEM\CurrentControlSet\Services\MacFile\Parameters\PagedMemLimit (default = 20000 decimal REG_DWORD)

The "MacFile NonPagedMemLimit" specifies the maximum amount of RAM that is available to the file server for Macintosh. Increasing this value helps performance of the file server but decreases performance of other system resources.

SYSTEM\CurrentControlSet\Services\MacFile\Parameters\NonPagedMemLimit (default = 4000 decimal REG_DWORD)
If other (non-server service) processes are competing with the server for processor time, you may want to consider increasing the priority of the server's worker threads.
SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\ThreadPriority (default = 1 REG_DWORD)

The server threads by default run at "foreground process priority." Other threads in the system service run at "foreground process priority + 1," such as the XACTSRV threads (the service responsible for supporting remote API requests from Microsoft LAN Manager local area network software version 2.x stations). Since XACTSRV is used to process printing requests, a file server that is also a print server may suffer from server thread starvation because the server threads are at a lower priority than the XACTSRV threads. In this case it makes sense to increase the server's ThreadPriority to 2.

Gotcha. Do not increase the priority beyond 2, or the system may not respond normally to other activity.

Another alternative is to drop the priority of the Spooler (it runs at 9 by default on NT 3.5 Server). You can do this with the PriorityClass parameter in the registry. It is located in the following location:

SYSTEM\CurrentControlSet\Control\Print\PriorityClass (default = 0 REG_DWORD)

You can verify the priority with the PVIEWER.EXE application in the Windows NT Resource Kit. The figure below details the priority with the default setting. If you change the value in the registry and do a 'net stop spooler' then a 'net start spooler' at the command line the priority will update.
If you see the event "2001: The server was unable to perform an operation due to a shortage of available resources..." in the System log, with 000c0000 005c0001 included in the hex information, increase the following:
SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\MinFreeConnections

No matter how few connections are actually established, the server will make sure that there are at least "MinFreeConnections" preinitialized, unused connection blocks ready to be used for a new connection. This value is 2 if you set the server to "Minimize Memory Used" or "Balance" and 4 if you select "Maximize Throughput...."
If you are limited on hardware resources and want to limit the number of users that can be simultaneously logged on to a server you can manipulate:
SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\Users
Since each server connection does take up some amount of memory, you may want to consider tuning the Autodisconnect parameter. This parameter sets the time interval after which inactive connections are terminated if no open files on the connection exist. This will free up a small amount of the server's resources to accommodate active users.
SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\Autodisconnect (default = 15 min.)

Workstation (Redirector)

When applications or users issue Connect, Open, Read, or Write requests on path-names that reference a redirected drive (net use z: \\server\share), the request is forwarded to the local Windows NT redirector. The redirector then packages up the request and forwards it down to the transport (TCP/IP, NBF, or NWLINK) and out onto the wire to be picked up by a server. So, as you can see, a great deal of the redirector's network performance is tied directly to how well the server responds to its requests. However, there are a few issues to be aware of on the redirector side.

"Redirector Current Commands" counts the number of requests to the Redirector that are currently queued for service. If this number is much larger than the number of network adapter cards installed in the computer, then the network(s) and/or the server(s) being accessed are seriously bottlenecked. To try to compensate for the problem locally, you could increase the maximum allowed pending network commands if the redirector application I/O request queue is backed up by increasing:
SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters\MaxCmds (default = 5)
If you see "Redirector Network Errors/sec" then SMB requests are timing out, forcing the redirector to disconnect, reconnect, and recover. If this is occurring, you may need to increase the following:
SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters\SessTimeout (default = 45 sec REG_DWORD)

This specifies the maximum amount of time that the redirector allows an operation that is not long-term to be outstanding.
Increase the redirector's thread count if the redirector can't accommodate overlapped I/O requests. For example, the WriteFileEx() WIN32 function may fail, returning the messages ERROR_INVALID_USER_BUFFER or ERROR_NOT_ENOUGH_MEMORY if there are too many outstanding asynchronous I/O requests.
SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters\MaxThreads (default = 17)
If you have more than 1 redirector loaded on your Windows NT Workstation (for example, Client Services for NetWare, and so on), consider the order of providers. When a WNet API is called, it routes the call to the first provider DLL (dynamic-link library) in the "ProviderOrder" and then waits for this provider to return before submitting it to the next provider. You can see the provider order by looking in the Network Control Panel and pressing the Network button

or, if you are interested in the value on a remote machine, you can use the registry editor (REGEDT32.EXE) and view the following registry entry:

SYSTEM\CurrentControlSet\Control\NetworkProvider\Order:ProviderOrder
There is a new SMB that is now supported under Windows NT called NtTransact_NotifyDirectoryChange. This allows an application to know when a directory structure has been updated on the server. If an application causes one of these SMBs to be submitted, RAW SMB I/O cannot be accomplished. (Note: RAW I/O is much faster than CORE I/O. However, it must have the session's full attention. Since there is an outstanding request on the session, RAW cannot be accomplished.) Windows NT File Manager causes one of these SMBs to be submitted if you are focused on a redirected drive. This can cause a slowdown on large reads and writes from other applications. You can shut this feature off for File Manager in the registry by adding the following value:
SOFTWARE\Microsoft\File Manager\Settings\ChangeNotifyTime (default = 0 REG_SZ)

Netlogon

One of the primary jobs of Netlogon is to keep the user account database in sync on all of the backup domain controllers with the primary domain controller.

Increase Netlogon service update notice periods on your Primary Domain Controllers, as well as the server announcement period if you are concerned with the amount of maintenance traffic the Windows NT Server is creating and the load on the primary domain controller.
Value Name         Default Value      Minimum Value     Maximum Value
PulseConcurrency   20                 1                 500
Pulse              300 (5 minutes)    60 (1 minute)     3600 (1 hour)
Randomize          1 (1 second)       0 (0 seconds)     120 (2 minutes)
Pulse defines the typical pulse frequency (in seconds). All User/Security account database changes made within this time are collected together. After this time, a pulse is sent to each BDC needing the changes. No pulse is sent to a BDC that is up-to-date.
Randomize specifies the BDC back off period (in seconds). When the BDC receives a pulse, it will back off between zero and Randomize seconds before calling the PDC.
PulseConcurrency defines the maximum number of simultaneous pulses the PDC will send to BDCs.
Netlogon sends pulses to individual BDCs. The BDCs respond by asking for any database changes. To control the maximum load these responses place on the PDC, the PDC will only have PulseConcurrency pulses "pending" at once. The PDC should be sufficiently powerful to support this many concurrent replication RPC calls (related directly to server service tuning as well as the amount of memory in the machine). Increasing PulseConcurrency increases the load on the PDC. Decreasing PulseConcurrency increases the time it takes for a domain with a large number of BDCs to get a user account database change to all of the BDCs. Consider that the time to replicate a database change to all the BDCs in a domain will be greater than:
((Randomize/2) * NumberOfBdcsInDomain) / PulseConcurrency
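
The lower bound above is easy to evaluate for a given domain. The following Python sketch does so; the function name and the sample domain sizes are hypothetical, and the range checks simply reuse the minimum and maximum values listed for Randomize and PulseConcurrency.

```python
def min_replication_seconds(randomize_sec, bdc_count, pulse_concurrency):
    """Lower bound on the time to replicate a change to every BDC.

    Implements ((Randomize / 2) * NumberOfBdcsInDomain) / PulseConcurrency.
    """
    if not 0 <= randomize_sec <= 120:
        raise ValueError("Randomize must be between 0 and 120 seconds")
    if not 1 <= pulse_concurrency <= 500:
        raise ValueError("PulseConcurrency must be between 1 and 500")
    if bdc_count < 0:
        raise ValueError("bdc_count must not be negative")
    return (randomize_sec / 2) * bdc_count / pulse_concurrency


if __name__ == "__main__":
    # Default settings (Randomize = 1, PulseConcurrency = 20), 30 BDCs.
    print(min_replication_seconds(1, 30, 20), "seconds at the defaults")
    # Same domain with a longer back-off and fewer concurrent pulses.
    print(min_replication_seconds(60, 30, 5), "seconds after tuning")
```

Comparing the two results shows the trade-off described above: settings that lighten the load on the PDC lengthen the time a change takes to reach every BDC.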

Transport (NBF, TCP/IP, NWLink, and so on)

The transport driver's function is to transport network data submitted by applications (such as the redirector, e-mail, Microsoft SQL Server, and so on) to other network stations. Windows NT ships with a variety of transport drivers such as TCP/IP, NBF (NetBEUI), and NWLink. All of these transports export a TDI interface on top and an NDIS (Network Driver Interface Specification) interface on the bottom. (Windows NT also ships with AppleTalk and DLC; however, these do not have a TDI interface.)

If the protocol used on most stations that you will connect to is first in the bindings list, average connection time decreases. This is because when you request a connection to shared resources on a remote station, the local workstation redirector submits a TDI connect request to all transports simultaneously, and when any one of the transport drivers completes the request successfully, the redirector waits until all higher priority transports return. In the following figure you will see that the NetBEUI / Intel Ether Express binding has the highest priority.
Each transport has its own way of doing windowing (typically the amount of packets sent before an acknowledgment is required). By increasing the window size, you can send more packets to the other side before you have to wait for an acknowledgment. This can have a slight increase in performance (less packets = less I/O), however, it can also increase the risk of retransmission. This is NOT a recommended practice.
For NBF you can modify:
SYSTEM\CurrentControlSet\Services\NBF\Parameters\LLCMaxWindowSize (default = 10)

This is how many LLC I-frames NBF can send before it must stop and wait for an acknowledgment.
For TCP/IP you can modify:
SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize (default = 8192)

This is the amount of data that can be accepted in a single transaction.
For NWLINK you can modify 3 entries:
SYSTEM\CurrentControlSet\Services\NWNBLink\Parameters\AckWindow (default = 2)

This specifies the number of frames to receive before sending an acknowledgment.

SYSTEM\CurrentControlSet\Services\NWNBLink\Parameters\RcvWindowMax (default = 4)

This specifies the maximum number of frames the receiver can receive at one time.

SYSTEM\CurrentControlSet\Services\NWLink\Parameters\WindowSize (default = 4)

This specifies the window to use in the SPX packets.
If you are on an NBF network and have a server on a very slow link, you may want to consider increasing the following:
SYSTEM\CurrentControlSet\Services\NBF\Parameters\DefaultT1Timeout (default = 600 ms; grows dynamically to 10 sec)

The T1 value controls the time that NBF waits for a response after sending a logical link control (LLC) poll packet before resending it. The default value you specify here is only used upon link establishment. It is then dynamically changed every 30 seconds.
If you are on an NBF network and have a server on a very busy link you may want to consider increasing the following:
SYSTEM\CurrentControlSet\Services\NBF\Parameters\LLCRetries (default = 8)

This value specifies the number of times that NBF will retry polling a remote workstation after receiving a T1 timeout. After this many retries, NBF closes the link.
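
To see why the window sizes discussed in this section matter mostly on slow or long-delay links, it can help to estimate the ceiling a window places on throughput. The Python sketch below uses the general sliding-window relation (at most one window of data per round trip); that relation is not taken from this paper, and the function name and round-trip times are hypothetical.

```python
def window_limited_bytes_per_sec(window_bytes, round_trip_sec):
    """Upper bound on throughput when one window is sent per round trip.

    Assumption: the sender must wait one full round trip for the
    acknowledgment before sending the next window.
    """
    if window_bytes <= 0 or round_trip_sec <= 0:
        raise ValueError("window and round-trip time must be positive")
    return window_bytes / round_trip_sec


if __name__ == "__main__":
    tcp_window = 8192  # TcpWindowSize default quoted above
    for rtt_ms in (2, 20, 100):  # hypothetical LAN, campus, and WAN delays
        ceiling = window_limited_bytes_per_sec(tcp_window, rtt_ms / 1000)
        print(f"RTT {rtt_ms:>3} ms -> at most {ceiling:,.0f} bytes/sec")
```

On a local segment with a short round trip the default window is rarely the limiting factor, which is consistent with the advice above not to change it routinely.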

Physical (Network Adapter)

If "Server Bytes Total/sec" (the number of bytes the server has sent to and received from the network) is roughly equivalent to the maximum transfer rate of your network, you may need to segment your network. On a 10-megabit Ethernet segment this value is ~1.2 megabytes per sec., once you include the overhead of the network.
An Ethernet segment is shared by every user of every system on the network. Therefore, it is a relatively limited resource with many users. This situation can be alleviated somewhat by adding sub-networks, but no matter how complex the network's topology, a network basically consists of many systems communicating through a single piece of wire. If one user is accessing a very large file across the network, that user may be slowing down the network for all users.
Match the adapter to the system bus. If you have a 16-bit bus, use a 16-bit network adapter; if you have a 32-bit bus, use a 32-bit network adapter.
Avoid sending from fast adapters to slow adapters.
If you need to transfer huge amounts of data between different computer systems, Ethernet may not be the appropriate medium to use; the basic Ethernet cable is limited to 10 megabits per second (considerably less when you include network overhead). Other media are now available that offer significantly higher sustained transfer rates (FDDI, and so on).
The Network Monitor (provided with Systems Management Server) is a very good tool to use to monitor the general network performance. It offers additional Performance Monitor counters as well as a few unique statistics from within the application such as:
% Network Utilization represents what percentage of the network bandwidth is being used.
Frames per Second is the number of frames being transmitted on the network per second.
Bytes Per Second is the number of bytes being transmitted on the network per second.
Broadcasts per Second represents the number of broadcast frames on the network per second.
Multicasts per Second represents the number of multicast frames on the network per second.
Network Card (MAC) Statistics represents the cumulative total number of frames, bytes, broadcasts, and multicasts seen on the network by the network card since the capture has begun.
Network Card (MAC) Error Statistics indicates the cumulative errors seen from the network card. These include CRC Errors and frames dropped because of no buffer space as well as frames dropped because of hardware constraints.
By sorting the Network Monitor Broadcasts Multicasts column in the Station Statistics pane (bottom pane), you can find the source(s) of a broadcast storm to see which machine(s) is/are sending the most Broadcast frames.
An increase in the number of Broadcasts/Multicasts per second can relate directly to machine performance. Each broadcast/multicast causes every card on the net to generate an interrupt to allow the packet to be passed up to the transport. This can cause serious CPU utilization problems. As a general rule, a broadcast/multicast rate of over 100/sec should cause you to investigate a cause as well as a cure. The cure may be as easy as identifying a jabbering network card or configuring a router not to forward UDP ports 137 and 138. Note: NBF is not a routable transport.
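If you export the station statistics to a file, the same sort-and-threshold check can be scripted. The following is a minimal sketch: the station names, the dictionary layout, and the choice to add broadcasts and multicasts together before comparing against 100/sec are all assumptions made for illustration, not Network Monitor's export format.

```python
def top_broadcasters(station_stats, limit=3):
    """Return station names sorted by broadcast + multicast frames, highest first."""
    ranked = sorted(
        station_stats,
        key=lambda s: s["broadcasts"] + s["multicasts"],
        reverse=True,
    )
    return [s["station"] for s in ranked[:limit]]


def storm_suspected(broadcasts, multicasts, capture_seconds,
                    threshold_per_sec=100):
    """True if the combined broadcast/multicast rate exceeds the rule of thumb."""
    if capture_seconds <= 0:
        raise ValueError("capture_seconds must be positive")
    return (broadcasts + multicasts) / capture_seconds > threshold_per_sec


# Hypothetical capture totals for three stations over a 60-second capture.
stations = [
    {"station": "WKSTA1", "broadcasts": 120, "multicasts": 4},
    {"station": "PRINTSRV", "broadcasts": 9800, "multicasts": 35},
    {"station": "WKSTA2", "broadcasts": 310, "multicasts": 0},
]
```

Calling top_broadcasters(stations, 1) on this sample data points at PRINTSRV, which is where you would start looking for a jabbering card.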
% Network Utilization should be considered when things start slowing down to the point they are no longer acceptable. Some say that this point is around 40-50%.
Gotcha. In Windows NT 3.5, the counter "Network Segment % Network Utilization" in Performance Monitor must be monitored at 1-second intervals. This will be fixed in Windows NT 3.51.
Collisions occur when your system starts sending data at the same time as another system on the network. When your system detects a collision, it waits a random amount of time and retransmits the packet. Collisions are normal events and don't indicate hardware problems. However, the probability of two hosts transmitting at the same time increases as the network is more heavily utilized, so collisions are an extremely good indicator of network load. The number of collisions should be, at most, 15% of the total number of output packets. The only solution for this problem is to rearrange the network in a way that reduces traffic. Ethernet networks start to have significant collisions at about 66.67% utilization, or 833,375 bytes per second. You can measure collisions with a tool such as a Network General Sniffer. Note that Version 1.0 of Network Monitor does not report collisions.
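The arithmetic behind these two rules of thumb is easy to check. The sketch below is illustrative only (the function names are made up for this example); it assumes a 10 megabit per second Ethernet and collision and output-packet counts taken from whatever analyzer you use.

```python
# Raw capacity of 10 Mbit/s Ethernet in bytes per second.
ETHERNET_BYTES_PER_SEC = 10_000_000 // 8  # 1,250,000


def bytes_per_sec_at(utilization):
    """Convert a utilization fraction (0.0 to 1.0) to bytes per second."""
    return utilization * ETHERNET_BYTES_PER_SEC


def collision_ratio(collisions, output_packets):
    """Collisions as a fraction of output packets (0.0 if nothing was sent)."""
    if output_packets <= 0:
        return 0.0
    return collisions / output_packets


def collisions_excessive(collisions, output_packets, limit=0.15):
    """True when collisions exceed the 15% guideline."""
    return collision_ratio(collisions, output_packets) > limit
```

Here bytes_per_sec_at(0.6667) comes to roughly 833,375 bytes per second, which matches the figure quoted above, and 200 collisions against 1,000 output packets (20%) would be flagged as excessive.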

Capacity Planning

Now that you have your system optimized to where you are very comfortable with its performance (today), it's time to start collecting data that will help you in the future. The following counters are a good starting point for resource capacity planning:

Object           Counter(s)

Processor        % Processor Time, Interrupts/sec

Memory           Pages/sec, Cache Faults/sec, Available Pages, Commit Limit, Committed Bytes

Paging File      Usage Peak

Physical Disk    % Disk Time, Avg. Disk Seconds/Transfer

Logical Disk     % Free Space

Redirector       Bytes Total/sec, Current Commands

Server           Bytes Total/sec, Server Sessions, Pool Paged Peak, Pool Nonpaged Peak, Work Item Shortages

There is a new service included in the Windows NT Resource Kit called DATALOG.EXE. It allows you to capture the data and forward it to a data store, where it can be gathered up later and used for trend analysis.
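Once the data is in a store, trend analysis can start as simply as fitting a straight line through the samples and seeing when it crosses a limit you care about. The sketch below assumes you have already loaded the logged values as (day, value) pairs; it says nothing about the format DATALOG.EXE writes, and a straight-line fit is only one of several ways to project growth.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for a list of (day, value) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def day_threshold_reached(samples, threshold):
    """Project the day on which the fitted line reaches `threshold`.

    Returns None when the counter is flat or falling.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


# Hypothetical daily readings of a percentage counter.
history = [(0, 40.0), (1, 42.0), (2, 44.0), (3, 46.0)]
```

With the sample history above, which grows by two points a day from 40, the fitted line reaches 80 on day 20.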

Once you have identified your system's thresholds based on the data, you will probably want to set up Performance Monitor Alerts. For example, you may want to set an alert on "Logical Disk Free Megabytes" for your file server's logical drives if it drops below a certain threshold, on "Paging File % Usage" if it hits 80 or 90%, and on "Redirector Network Errors/sec."
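If you post-process logged data yourself rather than relying on Performance Monitor, the same alert logic can be expressed as a small rule table. This is a sketch under stated assumptions: the counter labels are just dictionary keys chosen for this example, and only the 80% paging file figure comes from the text above; the other two limits are placeholders you would replace with your own thresholds.

```python
import operator

# (counter label, comparison, limit). Labels and most limits are illustrative.
ALERT_RULES = [
    ("Logical Disk: Free Megabytes", operator.lt, 100),    # placeholder limit
    ("Paging File: % Usage", operator.ge, 80),             # 80% from the text
    ("Redirector: Network Errors/sec", operator.gt, 0),    # placeholder limit
]


def triggered_alerts(sample, rules=ALERT_RULES):
    """Return the labels of all rules that the sample violates.

    `sample` maps counter labels to their latest values; counters missing
    from the sample are skipped.
    """
    alerts = []
    for counter, compare, limit in rules:
        value = sample.get(counter)
        if value is not None and compare(value, limit):
            alerts.append(counter)
    return alerts
```

Keeping the rules in a table rather than in code makes it easy to adjust the limits as your baseline changes.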

There is a great deal more information about Capacity Planning, and the issues surrounding it, in the Windows NT Resource Kit volume 3 by Russ Blake. It details issues relating to Log concentration and Archiving as well as other important details.

Summary

The motivation behind system tuning is to get the most you can out of the hardware you already own. If you decide that an upgrade is your only solution, you will find that your investment in performance tuning pays off. Your work will show you how the system should be upgraded. If you have done your homework, you will know whether you need more memory, faster disks, or a completely new processor. And if you recorded your system's performance history, you have not only done your homework but studied enough to pass the test, because you can now speak to latent demand and system growth.

References

Software Companies You May Want to Investigate

BGS Systems, Inc.
(617) 891-0000
E-mail best1@bgs.com
Datametrics
(800) 869-3282
E-mail experts@datametrics.com
Metron
Candle Software
Legent
SPEC
BAPCO
Computer Capacity Management (ICCM) in Phoenix, Arizona.
StonyBrook Services, Inc., in Bohemia, New York
Intrak, Inc., San Diego TrendTrak
Network General Corp., Menlo Park, California, "Reporter"

Literature

Optimizing Windows NT - Windows NT Resource Kit Volume 3 by Russ Blake (ISBN 1-55615-619-7)
Windows NT Advanced Server Concepts and Planning Guide
Capacity Management Review (602-997-7374, $195.00)
Computer Measurement Group (newsletter)
414 Plaza Drive, Suite 209
Westmont, IL 60559
High Performance Computing - O'Reilly & Associates

Thanks

Dan Perry (Microsoft World Wide Training)
Russ Blake (Microsoft development)
Reza Baghai (Microsoft development)
Barry Hicks (JCPenney Capacity Planning)
Chad, T. Ray, Glenn, Dennis, Rick, Darrel, and Mustafa

THESE MATERIALS ARE PROVIDED "AS-IS," FOR INFORMATIONAL PURPOSES ONLY.

NEITHER MICROSOFT NOR ITS SUPPLIERS MAKE ANY WARRANTY, EXPRESS OR IMPLIED, WITH RESPECT TO THE CONTENT OF THESE MATERIALS OR THE ACCURACY OF ANY INFORMATION CONTAINED HEREIN, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. BECAUSE SOME STATES/JURISDICTIONS DO NOT ALLOW EXCLUSIONS OF IMPLIED WARRANTIES, THE ABOVE LIMITATION MAY NOT APPLY TO YOU.

NEITHER MICROSOFT NOR ITS SUPPLIERS SHALL HAVE ANY LIABILITY FOR ANY DAMAGES WHATSOEVER INCLUDING CONSEQUENTIAL, INCIDENTAL, DIRECT, INDIRECT, SPECIAL, AND LOSS OF PROFITS. BECAUSE SOME STATES/JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF CONSEQUENTIAL OR INCIDENTAL DAMAGES, THE ABOVE LIMITATION MAY NOT APPLY TO YOU. IN ANY EVENT, MICROSOFT'S AND ITS SUPPLIERS' ENTIRE LIABILITY IN ANY MANNER ARISING OUT OF THESE MATERIALS, WHETHER BY TORT, CONTRACT, OR OTHERWISE, SHALL NOT EXCEED THE SUGGESTED RETAIL PRICE OF THESE MATERIALS.

Owner: Ilse Vinson & Desktop-Admin