Performance Tuning Windows NT
22 April 2003
Written by Scott B. Suhy, Consultant with Microsoft Consulting Services, responsible
for enterprise architecture, design, and optimization for Fortune 500 companies.
Email Scottsu@microsoft.com
Baseline
Introduction
Would it not be nice if there were no traffic bottlenecks during your
everyday task of going to work? No traffic lights, fender benders, car
problems, detours, people pulling out in front of you, people in the left hand lane going
less than the speed limit, four lane highways narrowing down to two lanes.... This is
rather unrealistic. Likewise, with a computer system it is unrealistic to expect that
there will never be a limit to the amount of memory, CPU, or I/O that internal or
external processes can consume.
You might also say that it would be nice to know how long it was going to take you to
get to work in the morning (with some expected normal variation). Users of a
computer system have the same expectation. They expect their jobs to finish in an
acceptable amount of time without bottlenecks in the system slowing them down.
If there were bottlenecks on your way to work each day, I suppose you could optimize
or tune the trip (reduce bottlenecks) by possibly finding an alternate route, car
pooling, taking advantage of a car pool lane, taking a bus, or even changing your working
hours (possibly to the evening when there is no traffic and the only thing keeping you
from getting to work any faster is the speed limit and possibly the size of your engine).
Computer systems have the same optimizations (run jobs during off peak hours, etc.). As
with transportation systems, there is also the same lack of environmental control with a
computer system. For example, it is not realistic to think that there will always be the
same amount of traffic (on the road or in your computer system), nor is it realistic
to think that you have control over that traffic.
Problems always occur (a rain storm causing increased slowdowns on the road or one user
consuming a great deal of the server's memory, CPU, or I/O bandwidth). The key is to
expect and manage these problems, and to know what to do when they occur.
Once you feel you have the trip optimized, you might also think about taking some
statistics, daily, weekly, or monthly, such as the amount of time it takes you to arrive
at the office, number of red lights you got rather than green, and so on. This type
of information will allow you to make future decisions on such things as "If I stop
to get gas in the morning, how much earlier will I have to leave the house?" Of
course you would also have to know how much time it would take you at the gas station
(another set of statistics). The same thing goes for your computer system. It's called
Capacity Planning.
The following information provides you with tips on areas of the Microsoft® Windows
NT operating system to which you should pay attention (What to Watch). It also gives
you a few rules/guidelines to use to optimize the system (What You Can Do). Once you take
each of these areas into consideration, your system should be optimized. Once you feel
your system is optimized it is then time to gather data on current capacity. The data will
allow you to do the following:
- Project how much the workload at the memory, CPU, I/O, and bandwidth levels will
increase in response to business growth and new Microsoft BackOffice applications.
- Diagnose problems by comparing subsequent measurements.
This information is rather technical in nature and assumes that you already know a
great deal about Microsoft Windows NT Workstation and Microsoft Windows NT Server
operating systems. However, it only touches the surface of optimization. Many books could
be written on the subject. Consequently, this paper neglects to explain many details and
assumes you know where to get information about the hardware and software concepts
mentioned. If you stumble upon a concept that is not explained in detail, you may want to
refer to the Microsoft Windows NT Resource Kit, Server Message Block specification (which
can be obtained from Microsoft), Microsoft TechNet, or any book that details network
architecture (such as the book Local Area Networks by James Martin or LAN Times
Encyclopedia of Networking by Tom Sheldon).
Definitions
Before diving into any Performance Tuning, it is necessary to go over some definitions
and terms.
Task
For the purpose of this paper, I use the word task to mean a series of computer
instructions, the execution of which involves work to be performed by one or more computer
components or resources (for example, CPU, memory, hard disk, and network adapters).
The amount of time it takes to complete a task can be divided up among the several
resources that are involved in the task's execution; some resources will be responsible for
small amounts of the total time, others will be responsible for larger amounts.
Bottleneck
The single resource that consumes the most time during a task's execution is that
task's bottleneck. Bottlenecks can occur because resources are not being used
efficiently, resources are not being used fairly, or a resource is too slow or too small.
Let me try to elaborate on this point with the following example.
Example. If a task takes 2.2 seconds to complete, with 0.2 seconds spent
executing instructions in the CPU and 2 seconds retrieving data from the disk (assuming
the two do not overlap in time), the disk is the bottleneck in the task. If the CPU were
replaced with one twice as fast, task execution time would drop from 2.2 to 2.1 seconds,
a reduction of approximately 4.5%. However, if the disk
controller were replaced with one twice as fast, disk access time would drop from 2
seconds to 1 second, and total execution time from 2.2 to 1.2 seconds, a reduction of
approximately 45%.
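The arithmetic in this example can be checked with a short C sketch. It assumes, as the example does, that the time spent in each resource does not overlap, so the total is a simple sum. The function names (`total_time`, `bottleneck_index`, `percent_saved`) are illustrative and not part of any Windows NT tool.

```c
#include <stddef.h>

/* Total task time when the per-resource times do not overlap. */
double total_time(const double *seconds, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += seconds[i];
    return sum;
}

/* Index of the resource that consumes the most time (the bottleneck).
   Assumes count >= 1. */
size_t bottleneck_index(const double *seconds, size_t count)
{
    size_t worst = 0;
    for (size_t i = 1; i < count; i++)
        if (seconds[i] > seconds[worst])
            worst = i;
    return worst;
}

/* Percent reduction in total time if resource `which` becomes
   `factor` times faster. Assumes factor > 0 and a nonzero total. */
double percent_saved(const double *seconds, size_t count,
                     size_t which, double factor)
{
    double before = total_time(seconds, count);
    double after = before - seconds[which] + seconds[which] / factor;
    return 100.0 * (before - after) / before;
}
```

With the times `{0.2, 2.0}` (CPU, disk), `bottleneck_index` selects the disk, doubling the CPU speed saves roughly 4.5% of the total, and doubling the disk speed saves roughly 45%.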
It would be easy if the previous example were on a workstation running the Microsoft
MS-DOS® operating system, but we are dealing with a multitasking OS. One thing to always
keep in mind, especially in a multitasking OS, is that resolving one bottleneck will
always lead to the next one.
Windows NT System Tuning
The goal in tuning Windows NT is to determine which hardware resource is experiencing
the greatest demand (the bottleneck), and then to adjust the operation to relieve that demand
and maximize total throughput. A system should be structured so that its resources are
used efficiently and distributed fairly among the users. This is not as difficult as it
sounds, assuming you use a few good rules/guidelines and have a thorough understanding of
the computing environment. For example, in a file and print server environment, most of
the activity at the server is in support of file and print services. This tends to cause
high disk utilization because of the large number of files being opened and closed. It
also causes the network interface card(s) to endure a heavy load because of the large
amount of data that is being transferred. Memory typically does not get a heavy load in
this environment (memory usage however can be heavy due to the large amount of system
memory that may be allocated to file system cache). Processor utilization is also
typically low in this environment. In contrast, a server application environment (for
example, other Microsoft BackOffice products such as Microsoft SQL Server database
server for PC networks, Microsoft Mail electronic mail system, Microsoft Systems
Management Server centralized management for distributed systems, and Microsoft SNA
Server) is much more processor and memory bound than a typical file and print server
environment because much more actual processing is taking place at the server. The disk
and network tend to be less utilized, due to a smaller amount of data being sent over the
wire and to the disk. Understanding these generalizations is not enough; the only way to
get an idea of the utilization of the resources is to monitor them, and one of the most
powerful tools that you can use is the Windows NT Performance Monitor.
Performance Monitor is a graphical tool for measuring the performance of your own
Windows NT-based computer or other Windows NT-based computers on a network. It is located
in the Administrative Tools group of both the Windows NT Workstation and Windows NT Server
products. On each computer, you can view the behavior of objects such as processors,
memory, cache, threads, and processes. Each of these objects has an associated set of counters
that provide information on such things as device usage, queue lengths, and delays, as
well as information used for throughput and internal congestion measurements. It provides charting,
alerting, and reporting capabilities that reflect current activity along with
ongoing logging. You can also open log files at a later time for browsing and
charting as if they were reflecting current activity.
Before spending money to add more hardware or replace existing hardware with faster
hardware, it's best to use Performance Monitor first to tune the system to make the most efficient
use of existing resources. Here are a couple of examples of where the tool may be useful:
Example. If we find that the CPU is 100% utilized, before replacing it with a
faster CPU or adding another one, we should identify and analyze the process that is
utilizing the bulk of the CPU time. We may find that the processor cycles are being
consumed by a disk controller that requires programmed I/O (PIO). In this case, replacing it
with a DMA disk controller will reduce processor utilization.
Example. If we determine the hard disk is full, before adding additional disk
drives, we should identify how much of the page file is being utilized. We may find that the system
page file size is initialized at 100 MB, but there is never more than 40 MB of it being
used. Instead of purchasing another disk, we could reduce the size of the page file.
Typical Questions
If you talk to our product support engineers or our consultants in the field and ask
them about the tuning questions they most frequently hear, you may find the following:
1. How do I determine how well an application is performing?
2. How can I support my environment in a proactive manner?
3. How do I know what component of my system is the most limiting (the bottleneck)?
4. How can I ensure my system is performing the best it possibly can perform?
5. How do I determine what size system I need based on the following criteria?
6. How do I know when to upgrade?
All of these questions play some part in performance tuning. We are going to focus
mostly on answering questions 2, 3, and 4, primarily by focusing our attention on
exploring each of the primary components of a computer system: the memory, processor, and
the I/O subsystem (e.g., disks and networks). From this standpoint, performance tuning
means ensuring that every user gets a fair share of available resources of the entire
system. Once you feel you have 2, 3, and 4 under control, you can start focusing on 5 and
6, which are more capacity planning issues. Once you have 5 and 6 under control, you will
be able to answer number 1 and, more importantly, do "What If" analysis.
Tuning for "Memory" Performance
Lack of memory is by far the most common cause of serious performance problems in
computer systems. If you read no further in this document, you could simply answer
"Memory!" whenever anyone asks you how to improve the performance of a system.
Memory contention arises when the memory requirements of the active processes exceed
the physical memory available on the system; at this point, the system is out of memory.
To handle this lack of memory the system starts paging (moving portions of active
processes to disk in order to reclaim physical memory). At this point, performance
decreases dramatically. Consider the following example. If the average instruction in a
computer takes approximately 100 nanoseconds to execute and a disk access takes on
the order of tens of milliseconds (say, 10 milliseconds), how many times slower would the machine run if there
were 1 paging operation per instruction? If you answered 100,000 you would be correct!
Let's hope things don't get that bad....
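As a rough check of that figure, the sketch below models each instruction as taking its normal execution time plus the disk time for whatever fraction of instructions cause a paging operation. The function name `paging_slowdown` and the use of exactly 10 milliseconds per disk access are assumptions made for illustration.

```c
/* Factor by which execution slows down when a fraction of instructions
   (pages_per_instr) must each wait for one disk access.
   instr_ns: average instruction time in nanoseconds (must be > 0).
   disk_ms:  average disk access time in milliseconds. */
double paging_slowdown(double instr_ns, double disk_ms, double pages_per_instr)
{
    double disk_ns = disk_ms * 1.0e6;  /* milliseconds to nanoseconds */
    return (instr_ns + pages_per_instr * disk_ns) / instr_ns;
}
```

With 100 nanoseconds per instruction, 10 milliseconds per disk access, and one paging operation per instruction, the factor is about 100,000. Even one paging operation per thousand instructions still slows execution by a factor of about 100 under this model.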
To optimize overall performance, steps must be taken to ensure that main memory is used
as efficiently as possible and thus paging is held to a minimum. As you will see in the
next section, you can tell how loaded system memory is by watching how the system pages.
What to Watch
- The Performance Monitor counter "Memory Pages/sec" is the number of
pages read from the disk or written to the disk to resolve memory references to pages that
were not in memory at the time of the reference. As a rule, you can assume that if the
average of this counter is consistently greater than 5, then memory is probably becoming a
bottleneck in the system. Once this counter starts to average consistently at 10 or above,
performance is significantly degraded and disk thrashing is probably occurring.
- If the actual size of the page file is greater than its initial size (typically physical
RAM + 12 MB), time is being spent growing the page file and dealing with page file
fragmentation. It is best that the page file not be required to grow during the operation
of the system because it adds time to the paging processes (additional disk access to
allocate the needed sectors, update any allocation, and free sector tables used by the
various file systems). Another result of this behavior is fragmentation, causing the file
to exist on many areas of the disk (the initial page file is created using contiguous disk
space).
- A quick way to tell if your system is struggling for memory is to call up WINMSD.EXE
(located in %SystemRoot%\system32) and look at the Memory dialog.
It details the total memory in your system, the current available memory ready for
allocation to applications you may start, available space within your page file, and the
Memory Load Index. The Memory Load Index specifies a number between 0 and 100 that gives a
general idea of current memory utilization, in which 0 indicates no memory use and 100
indicates full memory use. This dialog is built with a call to GlobalMemoryStatus(), part of the
Microsoft Win32® application programming interface documented in the SDK.
- The counter "Memory Available Bytes" displays the amount of free
physical memory. If this counter stays consistently below 1 MB on servers and 4 MB on
workstations, paging is occurring and performance is less than optimal.
- "Memory Committed Bytes" displays the size of virtual memory (in bytes)
that has been committed (as opposed to simply reserved). If this counter is greater than
the amount of main memory, it indicates that main memory MAY not be large enough to
accommodate all functions of all currently active processes; some paging MAY be inevitable.
However, before making such an assumption, you should check "Memory Pages/sec"
and "Memory Page Faults/sec." If the "Memory Pages/sec"
is greater than 10 (10 is a reasonable guideline, but varies with disk hardware) and
"Memory Page Faults/sec" is greater than "Memory Cache Faults/sec"
then you are paging too much.
- If you are trying to determine if adding more memory to your system will benefit your
Microsoft SQL Server system, then you may want to monitor the "SQLServer Cache Hit
Ratio" while the system is under a typical load. If the hit ratio is relatively
high (over 90%), adding more memory will usually not be beneficial. This is because
additional memory can mainly be used for additional Microsoft SQL Server data cache,
thereby increasing the hit ratio. In this case, the hit ratio is already high, and the
maximum available improvement quite small. If the hit ratio is consistently lower than
this, adding more memory may improve the hit ratio and thereby performance, if the
locality of reference is such that it can be "bracketed" by economically or
technically feasible amounts of memory.
- When "Memory Committed Bytes" approaches the "Memory Commit
Limit" and the page file has already reached its maximum size, there are
simply no more pages available, in main memory or in the page file. The "Memory Commit
Limit" is the amount of virtual memory that can be committed without extending
the page file. If this occurs on a server running Windows NT Server, you may see three
errors in the Event Log (EVENTVWR.EXE is located in the Administrative Tools group). They
are from the source SRV:
- 2020: The server was unable to allocate from the system paged pool because the pool was
empty.
- 2001: The server was unable to perform an operation due to a shortage of available
resources.
- 2016: The server was unable to allocate virtual memory.
If this occurs, it is generally related to a memory leak in another process. To determine
the process at fault you can monitor each process's Page File bytes or Working Set.
- Another condition you may want to be aware of is the following nonpaged pool
error in the server's Event Log:
- 2019: The server was unable to allocate from the system nonpaged pool because the pool
was empty.
Nonpaged pool pages cannot be paged out to the paging file, but instead
remain in main memory as long as they are allocated. NonPagedPoolSize is calculated using
complex algorithms based on physical memory size. However, you can use the following
formulas to 'approximate' these values for an X86-based computer.
MinimumNonPagedPoolSize = 256K
MinAdditionNonPagedPoolPerMB = 32K
DefaultMaximumNonPagedPool = 1 MB
MaxAdditionNonPagedPoolPerMB = 400K
PAGE_SIZE = 4096
NonPagedPoolSize = MinimumNonPagedPoolSize +
((Physical MB - 4) * MinAdditionNonPagedPoolPerMB)
Example. On a 32 MB x86-based computer:
MinimumNonPagedPoolSize = 256K
NonPagedPoolSize = 256K + ((32 - 4) * 32K) = 1,152K (about 1.1 MB)
MaximumNonPagedPoolSize = DefaultMaximumNonPagedPool +
((Physical MB - 4) * MaxAdditionNonPagedPoolPerMB)
If MaximumNonPagedPoolSize < (NonPagedPoolSize + PAGE_SIZE * 16),
then MaximumNonPagedPoolSize = (NonPagedPoolSize + PAGE_SIZE *16)
Example. On a 32 MB x86-based computer:
MaximumNonPagedPoolSize = 1 MB + ((32 - 4) * 400K) = 12,224K (about 11.9 MB)
You can monitor the system's nonpaged pool allocation with the "Memory Pool Nonpaged
Bytes" counter. If there is a shortage of nonpaged pool, you may also see
the following error on a remote system or even the local system:
- Not enough storage available to process this command.
If this occurs, start looking at each process's nonpaged pool allocation. This is generally
caused by an application incorrectly making system calls and using up all allocated
nonpaged pool.
- If you are concerned that one application is consuming a great deal of memory (paged or
nonpaged) then you may want to use a utility such as the Win32 Software Development Kit
utility PMON.EXE (this is also included in the Windows NT Resource Kit volume 3 utilities)
to monitor its load on the system. At the top of the PMON display you see some system
global statistics: memory size and available bytes, the virtual memory commitment, and
pool sizes. Then, for each process, PMON shows processor usage during the last update
interval. The next column is total processor time. The third column is how many pages each
process is using, and then the change since the last update. PMON also shows how many Page
Faults have occurred in the process and the change since the last update. Next is the
virtual memory commitment charge, and then the pool usage estimates for the process.
Finally you see process priority and the number of threads. There's nothing here that is
not in Performance Monitor (you could get the same information by looking at such counters
as "Process Page Faults/sec"), but it is a very handy overview and is
quicker to start up, as well as being "preconfigured" to show you the system at
a glance. Here is how it looks:
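Separately from the PMON display, the nonpaged pool approximation given earlier in this section is easy to check in code. The sketch below works in whole kilobytes and takes 1 MB as 1024K (an assumption; the text does not say which convention it uses). The function names are illustrative, and treating machines with less than 4 MB as having no additional allowance is also an assumption. This only reproduces the approximation in the text, not the actual algorithm Windows NT uses.

```c
/* Constants from the approximation in the text, expressed in KB. */
#define MIN_NONPAGED_POOL_KB          256UL
#define MIN_ADD_NONPAGED_PER_MB_KB     32UL
#define DEFAULT_MAX_NONPAGED_POOL_KB 1024UL  /* 1 MB, assuming 1 MB = 1024K */
#define MAX_ADD_NONPAGED_PER_MB_KB    400UL
#define PAGE_SIZE_KB                    4UL  /* PAGE_SIZE = 4096 bytes */

/* Megabytes of physical memory above the first 4 MB (assumed 0 below 4 MB). */
static unsigned long extra_mb(unsigned long physical_mb)
{
    return physical_mb > 4UL ? physical_mb - 4UL : 0UL;
}

/* Approximate NonPagedPoolSize in KB. */
unsigned long nonpaged_pool_kb(unsigned long physical_mb)
{
    return MIN_NONPAGED_POOL_KB
         + extra_mb(physical_mb) * MIN_ADD_NONPAGED_PER_MB_KB;
}

/* Approximate MaximumNonPagedPoolSize in KB, including the rule that the
   maximum is at least NonPagedPoolSize + PAGE_SIZE * 16. */
unsigned long max_nonpaged_pool_kb(unsigned long physical_mb)
{
    unsigned long maximum = DEFAULT_MAX_NONPAGED_POOL_KB
                          + extra_mb(physical_mb) * MAX_ADD_NONPAGED_PER_MB_KB;
    unsigned long floor_kb = nonpaged_pool_kb(physical_mb) + PAGE_SIZE_KB * 16UL;
    return maximum < floor_kb ? floor_kb : maximum;
}
```

For a 32 MB machine, `nonpaged_pool_kb(32)` gives 1,152K and `max_nonpaged_pool_kb(32)` gives 12,224K. Comparing these estimates with the "Memory Pool Nonpaged Bytes" counter gives a rough idea of how close the system is to exhausting nonpaged pool.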
What You Can Do
- Schedule memory-intensive applications during off-peak hours. You can use the AT
scheduler that ships with Windows NT.
- Distribute memory-intensive applications/processes across multiple machines.
- Add more memory. To determine ABOUT how much memory to add, use the following formula:
"Paging File % Usage MAX" * Page file size = number of bytes used
Add together the bytes used for all page files. This is the amount of memory that would
need to be added to allow all of the applications to perform their operations with minimum
paging. For example, if your page file is 100 MB and the % Usage MAX is 20%, then you
would need 20 MB additional RAM to have a system that does minimal paging. The reason this
formula only gives you an idea ABOUT how much memory to add is that a) not all page file
"in use" code is accessed all of the time; and b) the formula ignores the
requirements for code and mapped files not backed by the paging file. Therefore this
estimate is neither an upper bound nor a lower bound; it is only an
"indication." The truth is that there is no good way to know how much memory to
add at this time. A more accurate way to measure the amount of memory an application would
require is to run the application on a very large machine and measure the needs under some
slight memory pressure. (There is a tool in the Windows NT Resource Kit volume 3 utilities
called Response Probe that can aid in this area.)
Gotcha. Adding memory without upgrading the secondary cache size sometimes
degrades processor performance. This is because the secondary cache now has to map the
larger memory space, usually resulting in lowered hit rates in the cache. This slows down
processor-bound programs because they are scattered more widely in memory after memory has
been added. (Secondary cache refers to the physical cache memory chip(s) usually located
on the motherboard, as opposed to within the processor itself. In the future, processors
will be built with secondary cache on the same substrate as the processor chip, or even
within the processor chip itself.)
- If you determine that a great deal of memory is being consumed by an application for
which you have the source code, you may want to investigate tuning the application to be
less memory intensive. Good tools to use to profile your applications' memory allocation
are the Working Set Tuner and the VADUMP tools in the Win32 SDK.
- Spreading paging files across multiple disk drives and controllers generally improves
performance as multiple disks can process I/O requests concurrently. After all, you can
have up to 16 separate page files. Also, since Windows NT has several system files that
are frequently accessed, you may want to experiment with locating the paging file on one
disk and the Windows NT system files on another. You should also locate the page file(s)
on separate disk(s) from application files to allow for page file I/O and application file
I/O to occur concurrently. This will only work if the disk driver(s) and controller(s)
used can accommodate asynchronous I/O requests. Keep in mind that most IBM-compatible
"non-super servers" have an ATDISK as the default and the ATDISK driver can have
only one I/O request pending at a time. If your system mixes high-speed disks and
low-speed disks, use the fastest disks for all your paging.
- Use the Control Panel | System | Virtual Memory and set the page file size such that
extension of it will rarely occur.
- Use the Control Panel | Services to turn off unnecessary Windows NT services, and
Control Panel | Network to uninstall any unnecessary Windows NT device drivers. This can
free up both CPU and memory.
- User accounts are stored in a registry hive, which means each account consumes paged
pool on a Primary Domain Controller or Backup Domain Controller. Therefore the limit on
the number of user accounts depends on the amount of memory and swap file space in your
PDC and BDCs. User accounts take about 1K each, so 10,000 accounts consume about 10 MB. You may want to
consider a second domain (possibly a different domain model) if you have more than 15,000
user accounts. However, the only answer may be to add more memory.
- Some machines provide the ROM BIOS shadowing option. While this feature provides an
advantage with MS-DOS, it is NOT an advantage with Microsoft Windows NT. ROM BIOS
shadowing is the process of copying the BIOS from ROM into RAM and using either hardware
or 386 enhanced mode to remap the RAM into the normal address space of the BIOS. Because
reading RAM is much faster than reading ROM, BIOS-intensive operations are substantially
faster. For example, MS-DOS uses the BIOS to write to the screen; therefore, with ROM BIOS
shadowing, directory listings run more quickly. Windows NT does not use the BIOS (except
during startup); therefore, no performance is gained by shadowing. If ROM BIOS shadowing
is not used, more RAM is available. With Windows NT, there is an advantage to disabling
the ROM BIOS shadowing option. This applies to other BIOS shadowing schemes as well.
Typically the CMOS settings allow the system to shadow any BIOS. This includes the
following: system BIOS, video BIOS, and other adapters' ROM BIOS (in a given select range).
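The page file estimate described under "Add more memory" earlier in this list can be written as a small C sketch. It multiplies each page file's size by its "% Usage MAX" value and adds the results. The `struct page_file` type and the function name are hypothetical, and the result carries the same caveat as the formula itself: it is an indication, not a bound.

```c
#include <stddef.h>

/* One page file: its size and the peak percentage of it that was in use. */
struct page_file {
    double size_mb;            /* current page file size in MB */
    double usage_max_percent;  /* "Paging File % Usage MAX", 0 to 100 */
};

/* Rough estimate, in MB, of the RAM to add so that the system pages
   minimally: the sum of the peak bytes in use across all page files. */
double estimated_extra_ram_mb(const struct page_file *files, size_t count)
{
    double total_mb = 0.0;
    for (size_t i = 0; i < count; i++)
        total_mb += files[i].size_mb * files[i].usage_max_percent / 100.0;
    return total_mb;
}
```

For the example in the text, a single 100 MB page file with a peak usage of 20% yields an estimate of about 20 MB of additional RAM.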
Tuning for "Processor" Performance
A processor (running at a given clock speed) can execute a set number of instructions
per second. Therefore, if a processor is switched among multiple threads that all have
work to do, a given thread will take roughly x times longer to complete a given task,
where x is the number of simultaneously executing threads.
There are times when a thread has no work to do, such as when waiting for user input,
or when waiting for another thread to finish a related operation. As long as the thread is
in this waiting state, it will not be scheduled for execution and, thus, does not take up
any CPU time. Since most Microsoft Windows®-type applications spend a considerable amount
of time with their threads in this waiting state, there may be little performance
degradation when running multiple Windows-based applications.
Some applications are considered CPU intensive. A CPU-intensive application almost
always has work to do and spends very little, if any, time in the waiting state. For
example, the following C program consumes 100% of the CPU. When additional applications
are started, their performance, and that of the CPU-intensive application, will be less
than optimal since all must share the processor's time. This is an example of how NOT to
write an application; a better approach would be to create an event or wait on a
semaphore.
/* Spins forever without ever waiting, so the thread is always ready to run. */
int main(void)
{
    while (1) {}
}
The figure below shows the application's utilization of the CPU.
What to Watch
- If the "Processor % Processor Time" counter consistently registers at
or near 100%, the processor may be the bottleneck. ("System % Total Processor
Time" can be viewed for multiprocessor systems.) If this occurs you need to
determine WHO or WHAT is consuming the CPU. To determine which process is using up most of
the CPU's time, monitor "Process % Processor Time" for all of
the process instances (as in the previous figure).
- You can tell if the CPU activity is due to applications or to servicing hardware
interrupts by monitoring "Processor Interrupts/sec." This is the number
of device interrupts the processor is experiencing. A value over 1000 should cause you to
look at the efficiency of hardware I/O devices such as the disk controllers and network
cards.
- You can also monitor "System System Calls/sec." System Calls/sec is
the frequency of calls to Windows NT system service routines. These routines perform all
of the basic scheduling and synchronization of activities on the computer and provide
access to nongraphical devices, memory management, and name space management. If there are
many more interrupts per second than system calls, it could indicate that a hardware
device is generating an excessive number of interrupts.
- Monitor the "System Context Switches/sec" as well. Too frequent context
switching can be caused if semaphores or critical sections (see the Windows NT SDK for
more information) are placed at too low a level in order to attain high concurrency. The
only way to solve this problem is to re-evaluate where these synchronization objects are placed in the source code.
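The first three guidelines in this list can be combined into a simple triage routine, sketched below in C. The 1000 interrupts per second figure comes from the text. The 95% cutoff for "at or near 100%" and the factor of 10 for "many more interrupts than system calls" are assumptions chosen for illustration, as are all the type and function names. The values would be read from Performance Monitor; this sketch does not collect them.

```c
/* One sample of the counters discussed above. */
struct cpu_sample {
    double processor_time_percent;  /* "Processor % Processor Time" */
    double interrupts_per_sec;      /* "Processor Interrupts/sec" */
    double system_calls_per_sec;    /* "System System Calls/sec" */
};

enum cpu_hint {
    CPU_OK,               /* processor is not near saturation */
    CPU_CHECK_PROCESSES,  /* find which process is consuming the CPU */
    CPU_CHECK_HARDWARE    /* look at disk controllers and network cards */
};

#define NEAR_FULL_PERCENT           95.0    /* assumed meaning of "near 100%" */
#define INTERRUPT_LIMIT             1000.0  /* guideline from the text */
#define INTERRUPT_TO_SYSCALL_FACTOR 10.0    /* assumed meaning of "many more" */

enum cpu_hint classify_cpu_sample(const struct cpu_sample *s)
{
    if (s->processor_time_percent < NEAR_FULL_PERCENT)
        return CPU_OK;

    /* The CPU is busy: decide whether hardware interrupts or
       application work is the more likely cause. */
    if (s->interrupts_per_sec > INTERRUPT_LIMIT ||
        s->interrupts_per_sec >
            INTERRUPT_TO_SYSCALL_FACTOR * s->system_calls_per_sec)
        return CPU_CHECK_HARDWARE;

    return CPU_CHECK_PROCESSES;
}
```

A single sample proves little. As with the other guidelines in this paper, the classification is only meaningful when the counters stay in the same range consistently.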
What You Can Do
- Schedule CPU-intensive applications during off-peak hours. You can use the AT scheduler
that ships with Windows NT.
- If you have control over the application source, you may want to investigate tuning the
application to be less CPU intensive. There are a number of tools available with the
Windows NT SDK that allow you to do this, such as WAP (Windows API Profiler), CAP (Call
Attributed Profiler), FIOSAP (File I/O and Synchronization Win32 API Profiler), and Win32
API Logger.
- Distribute applications and processes across multiple machines.
- Upgrade the processor if possible. Keep in mind that Windows NT runs on MIPS and Digital
Alpha AXP machines as well as on Intel (386, 486, and Pentium) machines. Most servers are either
file servers or application servers. Even though they use the same operating system each
uses the machine's resources in a different way. A file server generally maximizes system
bus utilization and under-utilizes the processor. A 486 clock doubler chip in this machine
would not provide a big performance enhancement over a typical 486 chip. An application
server (such as a database server running Microsoft SQL Server and Systems Management
Server), however, utilizes the processor subsystem significantly more than the file
servers. You will find that this is the environment where a more powerful CPU chip will
pay off.
- If you are in a situation where you are trying to determine if moving to a RISC
processor will increase performance, you should look at the counter "System Context
Switches/sec." This is the rate of switches from one thread to another. Moving to
a RISC machine will only be a good idea if the Context Switch rate is NOT
dominating processor activity.
- Add more processors assuming there is more than 1 thread capable of asynchronous
execution. If you have a multiple processor computer, Windows NT will assign separate
threads to different processors (interrupts are also distributed). The thread execution
load is then distributed across the multiple processors. For example, if a CPU-intensive
thread is executing on processor A, processor B will be free to process other threads.
- Upgrade the secondary cache. In this same regard, you may consider upgrading the CPU to
a chip with a 16K First Level cache such as a 486 DX4/100 (Unified Instruction and data
cache) or a Pentium (8K data cache and 8K instruction cache).
- Assuming you have at least a 486, if you are in a server environment, part of your
problem may be the network or disk adapter cards you have chosen. 8-bit cards use more
processor time than 16-bit or 32-bit cards. The number of bits here refers to the amount
of data moved to memory from the adapter on each transfer. The most efficient cards use
32-bit transfers to adapter memory or direct memory access (DMA) to move their data.
Adapters that don't use memory-mapped buffers or DMA must use processor instructions to
move data, and that makes the processor busy. DMA uses the memory bus, and that can slow
the processor down but it is still more efficient than individual instructions. There is
more information on this topic in the "Tuning for Disk Performance" section of
this document. Keep in mind while reading this section and the "Disk
Performance" section that replacing PIO devices will almost always reduce processor
bottlenecks.
- In a resource-sharing environment, a greater improvement can be found by upgrading to a
faster processor rather than increasing the number of CPUs. In a client-server
environment, the addition of another CPU will typically give a better performance increase
than upgrading to a faster or more advanced processor because of the multithreaded design
of all Microsoft BackOffice products.
- Each application (as well as each thread) in the system has a set priority. You can
control the priority system-wide by changing the following in Control Panel | System |
Tasking.
Use this dialog box to change the relative responsiveness of applications that are
running at the same time. When more than one application is running in Windows NT, by
default the foreground application receives more processor time, and so responds better,
than applications running in the background. (You can also use the Windows NT SDK utility
PVIEW to set individual application priorities.)
You may also use the START command to alter the priority of a program as it is started.
This command can take /low, /normal, /high, and /realtime switches to start programs with
varying levels of priority.
Gotcha. Never start processor-bound applications at real-time priority.
Considerations for 16-Bit Applications
- You can monitor the performance of 16-bit MS-DOS-based applications; however, they are
difficult to identify as instances because the program name does not appear. This is
because each MS-DOS-based application shows up in its own Virtual DOS Machine (NTVDM). You
would have to look at the individual threads (that is, "Thread Processor Time")
for the NTVDM.EXE application. An easy way to identify the thread associated with the
application you want to monitor is to stop all other 16-bit MS-DOS-based applications and
choose the remaining thread. Another way to identify the application is to copy
NTVDM.EXE to another name and edit the following entry in the Registry:
SYSTEM\CurrentControlSet\
Control\WOW\
cmdline
16-bit Windows-based applications execute in one NTVDM by default, but can be started
in separate NTVDMs.
- If you are not satisfied with the performance of your MS-DOS-based applications running
on Windows NT Workstation, try full-screen mode. In full-screen mode, most applications
can run with native performance directly on the installed video adapter. Windows maps VGA
memory to the appropriate place in the VDM and maps the relevant registers from the
application to the video adapter. To get in and out of full-screen mode, press ALT+ENTER.
- When running MS-DOS or Windows version 3.1 serial communications applications that
directly access serial port hardware, you may be able to improve their performance by
using software handshaking (XON/XOFF) instead of hardware handshaking (CTS/RTS). Because
hardware must be virtualized under Windows NT, checking the CTS/RTS signals directly
incurs an unavoidable performance penalty. XON/XOFF handshaking avoids this problem
because it does not require accessing the serial port hardware directly.
Tuning for "Disk" Performance
As you might have guessed, disk performance is the single most important aspect of I/O
performance. It affects many other aspects of system performance. Good disk performance
enhances virtual memory performance and reduces the elapsed time required to load programs
that perform a great deal of I/O, and so on.
If you discover a disk bottleneck, the first thing you need to determine is whether
it's really more memory that you need. If you are short on memory, you will see the lost
performance reflected as a disk bottleneck.
Gotcha. Because disk counters can increase disk access time by
approximately 1.5% on a 386/20, Windows NT does not automatically activate these counters
at system startup. To activate disk counters, type diskperf -y at the command
prompt and restart the computer. On a 486 or better system, the hit is not apparent.
What to Watch
- If the "Physical Disk object's % Disk Time" counter consistently
registers at or near 100%, the physical disk is the bottleneck. This counter is the
percentage of elapsed time that the selected disk drive is busy servicing read or write
requests, including time waiting in the disk driver queue.
- If "Physical Disk Disk Queue Length" (pending disk I/O requests) is
greater than 2, it generally indicates significant disk congestion. (Note: This same
rule applies to almost all I/O devices.)
- Determine the portion of the disk I/O used for paging with the following formula:
"% disk time used for paging = 100 * ('Memory Pages/sec' * 'PhysicalDisk Avg. Disk sec/Transfer')".
If the result is more than 10% of the total disk activity, paging is excessive. Avg. Disk
sec/Transfer is the time in seconds of the average disk transfer. This formula does
not cover the case where you may be paging over the network.
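The paging formula above is simple arithmetic, so it can be checked by hand or with a few lines of code. The following is only an illustrative sketch in Python: the function names and the sample counter readings are hypothetical, and treating the 10% rule as a plain threshold on the computed percentage is an interpretation of the text rather than something Performance Monitor does for you.

```python
def paging_disk_time_percent(pages_per_sec, avg_disk_sec_per_transfer):
    """Estimate the percentage of disk time spent on paging.

    pages_per_sec             -- reading of the Memory: Pages/sec counter
    avg_disk_sec_per_transfer -- reading of PhysicalDisk: Avg. Disk sec/Transfer
    """
    return 100.0 * pages_per_sec * avg_disk_sec_per_transfer


def paging_is_excessive(pages_per_sec, avg_disk_sec_per_transfer,
                        threshold_percent=10.0):
    """Apply the 10% rule of thumb (assumed here to be a simple threshold)."""
    percent = paging_disk_time_percent(pages_per_sec, avg_disk_sec_per_transfer)
    return percent > threshold_percent


# Hypothetical readings: 20 pages/sec at 20 ms per disk transfer.
sample_percent = paging_disk_time_percent(20, 0.020)
print(f"Paging uses about {sample_percent:.0f}% of disk time")
print("Excessive paging:", paging_is_excessive(20, 0.020))
```

With these made-up readings the disk spends roughly 0.4 seconds of every second servicing page transfers, which is well past the 10% guideline and points to a memory shortage rather than a slow disk.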
What You Can Do
- Install a faster disk and/or controller. Determine if the controller card does 8-bit,
16-bit, or 32-bit transfers. The more bits in the transfer operation, the faster the
controller moves data. You may also want to choose a different drive technology. IDE
(Integrated Drive Electronics) has a throughput of 2.5 MB/sec, ESDI 3 MB/sec, SCSI-2
5 MB/sec, and Fast SCSI-2 10 MB/sec.
- Create mirrored data sets. The I/O system can issue concurrent reads to the two
partitions: the first portion of the read goes to partition A while the next portion
goes to partition B (assuming the disk driver and controller can handle asynchronous
I/O).
- Create striped data sets. Multiple disks (between 3 and 32) can process I/O requests
concurrently (assuming the disk driver and controller can handle asynchronous I/O).
- Add memory (RAM) to increase file cache size.
- Change to a different I/O bus architecture. EISA, MCA, and local bus (VESA or PCI) buses
transfer data at a much higher rate than ISA buses. PCI is fast because it transfers data
at 33 MHz, a double word (4 bytes) at a time (33 MHz * 4 bytes = 132 MB/sec), whereas ISA
maxes out at about 5 MB/sec and EISA at about 32 MB/sec (EISA transfers at 8 MHz * 4
bytes). There has been talk about raising the PCI clock rate to 66 MHz (to get a 264
MB/sec transfer rate), but most manufacturers are resisting the idea (at about 50 MHz or
so, getting past FCC Class B certification is a nightmare).
- When choosing an I/O device such as a disk adapter, consider the architecture of the
card. Here are some points to consider about each architecture:
- PIO: PIO (programmed I/O) requires intervention by the CPU. For example, the
Adaptec 1522 is a PIO device and can do either 16-bit PIO or 32-bit PIO. However,
CPU usage is quite high (30-40%), and the device will slow down your system during a large
transfer or a CD-ROM access. For this reason, most high-performance systems don't use PIO
devices, because they adversely affect system throughput. BYTE magazine compared the
Adaptec 2940 (PCI) against a Future Domain adapter (PIO). While the Future Domain and
Adaptec 2940 provide almost identical benchmark results, the Future Domain consumes a
hefty 40% of CPU time whereas the 2940 does not. However, PIO devices are much cheaper
to manufacture: the Future Domain is about half the price of the 2940. Another thing to
keep in mind is that the standard ATDISK driver (used by most IDE drives) does PIO.
- DMA: ISA DMA has only 24 address lines, so it can physically address only 16 MB.
However, if you happen to have 32 MB of RAM, the OS can see all of the memory. Therefore,
if the OS wants to transfer a block of memory located above 16 MB (which an ISA DMA card
such as the Adaptec 1542C cannot physically see), it has to copy that block down to an
area in the 0-15 MB range (which the 1542C can see) so that the 1542C can initiate the DMA
transfer (double buffering). Copying down to the 0-15 MB range and copying back up (16 MB
and above) takes quite a bit of time (it is done with the Intel REP MOVSB, REP MOVSW, and
REP MOVSD instructions), and that explains the slowdown. You don't have this problem with
VL, PCI, or EISA, because they all have 32 DMA address lines and can physically see up to
4 GB. PIO devices can see all of the memory, including memory above 16 MB; the only
problem is that the processor must perform every data transfer. The last thing to keep in
mind is that some devices do both PIO and DMA. Unless your system is an ISA computer with
more than 16 MB of RAM, you should always run the controller in DMA mode.
- Bus Master: Bus master devices have their own intelligence and offload this work
from the CPU. The CPU can resume doing its own thing while the bus-master device is doing
all the I/O. When it's done, it hands the result to the CPU. These cards are by far the
best solution.
- Gotcha. Make sure that you check the Windows NT Hardware Compatibility List
before you purchase a controller. This will tell you if the controller is supported by
Microsoft and has a certified driver.
- On a system with two daisy-chained SCSI disks, the SCSI controller has more of an impact
on your total performance than your disk drive. You would be better off buying a slower,
cheaper disk and investing in a better SCSI controller.
- Adding more physical drives in a RAID 5 configuration can result in significant
performance improvements when the disk subsystem is the bottleneck. However, adding more
controllers usually does not significantly improve performance. When using
high-performance disk controllers, the physical drive access times are usually the
performance limiting factor for the disk subsystem.
- Choose a disk with a low seek time (the time required to move the disk drive's heads
from one track of data to another). The ratio of time spent seeking to time spent
transferring data is usually 10 to 1, and often much higher.
- Distribute the workload as evenly as possible among different disk drives. This will
allow you to take full advantage of the system's I/O bandwidth. For example, if you have
one user population that does a great deal of reads and writes to directory
\\server\ExcelData and another user population that does a great deal of reads and writes
to a directory \\server\WordData then you may want to consider putting the ExcelData
directory on a different disk and/or controller than the WordData directory. You can take
advantage of the auditing facility of Windows NT and the NTFS file system to track how
certain network files are being used. User Manager lets you enable file access auditing,
and File Manager lets you specify the users and files whose access you want to record.
- If you choose a FAT file system, with time it tends to become fragmented. As the file
system becomes full, pieces of files tend to be scattered over the disk; the system cannot
find enough contiguous blocks to store a new file in one place, so it must fit the file in
empty spaces between other files. As files are added, deleted, truncated, and expanded,
the file system becomes increasingly disorderly. Performance suffers because the disk
drive cannot read a file with a sequential group of operations. Instead, it must
constantly seek for different pieces of the file. To reduce fragmentation, use a
defragmentation utility, such as Executive Software's Diskeeper, to rewrite files into
contiguous blocks.
- NTFS is best for use on volumes of about 400 MB or more. This is because performance
does not degrade with larger volume sizes under NTFS as it does under FAT. As the size of
the volume increases, performance with FAT quickly decreases. Files also take more disk
space under FAT than under NTFS. The FAT file system allocates disk space for files in
clusters, which are the smallest allocation units the file system uses. For example, a
1-byte file is allocated 1 cluster, and the rest of that cluster is wasted. When a large
number of small files are stored on a FAT partition, a large cluster size can waste a
large amount of disk space. The cluster size depends on the size of the logical drive. FAT
can track a maximum of 64K clusters, since there are 64K entries in the File Allocation
Table, so the cluster size must increase on large drives in order to address the whole
drive. The maximum cluster size is 64 KB, which makes the largest logical drive 4
gigabytes (64K clusters * 64 KB). NTFS also has a limit; however, it is 2^64 bytes.
- Disabling short name generation on an NTFS partition will greatly increase directory
enumeration performance especially in the case where individual directories contain a
large number of files/directories with non-8.3 filenames. To disable short name
generation, use REGEDT32.EXE to set a registry DWORD value of 1 in the following Registry
location:
SYSTEM\CurrentControlSet\
Control\Filesystem\
NtfsDisable8dot3NameCreation
Gotcha. This may cause compatibility problems with 16-bit MS-DOS- and
Windows-based applications.
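The FAT cluster arithmetic described earlier in this list (at most 64K clusters, cluster sizes up to 64 KB) can be worked through in a short Python sketch. This is an illustration under stated assumptions, not the logic of the actual FORMAT utility: the function names are hypothetical, cluster sizes are assumed to be powers of two starting at one 512-byte sector, and a real format tool may pick a different size for a given volume.

```python
MAX_FAT_CLUSTERS = 64 * 1024    # 64K entries in the File Allocation Table
MAX_CLUSTER_BYTES = 64 * 1024   # 64 KB maximum cluster size
SECTOR_BYTES = 512              # assumed smallest cluster: one sector


def min_cluster_size(volume_bytes):
    """Smallest power-of-two cluster that covers the volume in 64K clusters."""
    size = SECTOR_BYTES
    while size * MAX_FAT_CLUSTERS < volume_bytes:
        size *= 2
    if size > MAX_CLUSTER_BYTES:
        raise ValueError("volume is larger than FAT can address")
    return size


def slack_bytes(file_bytes, cluster_bytes):
    """Space wasted by a non-empty file, which always occupies whole clusters."""
    if file_bytes < 1:
        raise ValueError("file_bytes must be at least 1")
    clusters = -(-file_bytes // cluster_bytes)  # ceiling division
    return clusters * cluster_bytes - file_bytes


MB = 1024 ** 2
GB = 1024 ** 3
print("Largest FAT volume:", MAX_FAT_CLUSTERS * MAX_CLUSTER_BYTES // GB, "GB")
print("Cluster size on a 400 MB volume:", min_cluster_size(400 * MB), "bytes")
print("Waste for a 1-byte file on a 4 GB volume:",
      slack_bytes(1, min_cluster_size(4 * GB)), "bytes")
```

The last line shows why many small files are expensive on a large FAT volume: under these assumptions a 1-byte file on a 4 GB drive ties up an entire 64 KB cluster.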
Tuning for "Network" Performance
Network performance problems take three basic forms, each of which causes the network
protocol to transmit each block of data many times (or to fail with an error), degrading
performance.
- A server can be overloaded
The server is being asked to do more than it can because a resource is inadequate,
possibly as a side effect of a shortage of another resource such as memory.
- A network can be overloaded
The amount of data that needs to be transferred is greater than the capacity of the
physical medium.
- A network can lose data integrity
The network is faulty and intermittently transfers data incorrectly.
- We can examine each of these problems from the perspective of the OSI (Open Systems
Interconnect) networking model. From the application layer's point of view, there are the Server
service and Workstation (redirector) service components as well as other
application layer support entities such as Netlogon, Replicator, and other
services. From the Transport layer's point of view, there are the transport components
such as TCP/IP, NetBEUI, NWLink, and so on. From the Datalink/Physical layer's point
of view, there are the adapter cards and NDIS drivers.
Gotcha. The following section details many registry entry changes. Note that the "out of
the box" settings within the registry should give you a well-balanced system. Altering a
setting may reduce the bottleneck; however, it may also create another problem. Set
parameters with care. If you do have a problem, use the "Last Known Good" option during
system initialization to revert to an unchanged registry.
Server
The Windows NT Server service's responsibility is to establish sessions with remote
stations and receive SMB (Server Message Block) request messages from those stations. (SMB
requests are typically used to request that the Server service perform I/O, such as open,
read, or write, on a device or file located on the Windows NT Server station.)
- You can configure the Windows NT Server service's resource allocation (and associated nonpaged
memory pool usage) by using the Control Panel Network application. When you use the
Control Panel Network application to configure the Windows NT Server service software, you
are presented with the following Server Optimization Level dialog:
You may want to consider a specific setting, depending on factors such as how many
users will be accessing the system and the amount of memory in the system. The amount of
memory allocated to the Windows NT Server service (for such resources as InitWorkItems,
MaxWorkItems, RawWorkItems, MaxPagedMemory, MaxNonPagedMem, ThreadCountAdd,
BlockingThreads, MinFreeConnections, and MaxFreeConnections) differs dramatically based on
your choice.
- The "Minimize Memory Used" level is meant to accommodate up to 10
remote users simultaneously using Windows NT Server.
- The "Balance" option is for up to 64 remote users.
- The "Maximize Throughput for File Sharing" is for 64 or more remote
users. With this option set, file cache access has priority over user application access
to memory (the value of LargeSystemCache in the registry changes to 0x1). Use this
option if you are using Windows NT Server for file server capabilities. This is the
default setting!
SYSTEM\CurrentControlSet\Control\
Session Manager\Memory Management\
LargeSystemCache
- The "Maximize Throughput for Network Applications" is for 64 or more
remote users. However, with this option set, users' application access has priority over
file cache access to memory (the value of LargeSystemCache in the registry changes
to 0x0).
If the Windows NT Server service runs out of a resource due to one of these
settings, you will see one of the following errors in the Windows NT Event Log:
- 2009: Server could not expand a table because the table reached the maximum size.
- Not enough server storage is available to process this command.
- If the "Server Work Item Shortages" or "Server Pool
Paged/Nonpaged Failures" are consistently increasing, or if "Server
Context Block queue Time" (the average time, in milliseconds, a work context
block sat in the server's queue waiting for the server service to act on the request)
consistently averages greater than about 50 (ms), the server service is acting as a
bottleneck for all tasks, on remote stations, that are issuing remote I/O requests to the
server. This may be the fault of the Windows NT Server service optimization level, or it
may be the fault of other bottlenecked resources (disk, CPU, and memory) on which the
Windows NT Server service depends. A WorkItem is the location where the server stores an
SMB. The number of WorkItems available fluctuates between a minimum value
(InitWorkItems) and a maximum value (MaxWorkItems). Both values are configured based on
the Server Optimization level and the amount of memory in the machine. WorkItem shortages
may be caused by an overloaded server. You may want to consider identifying and
off-loading some of the server's "resource-consuming" tasks.
- Monitor the "Server Pool paged failures" and "Server Pool
nonpaged failures." If they are occurring then the server is running out of the
paged/nonpaged pool it originally allocated. If this occurs you may want to
consider increasing the resource using the following parameters:
SYSTEM\CurrentControlSet\Services\
LanmanServer\Parameters\
MaxNonPagedMemoryUsage
and
SYSTEM\CurrentControlSet\Services\
LanmanServer\Parameters\
MaxPagedMemoryUsage
You will also experience one of the following errors in the system Event Log:
- 2017: The server was unable to allocate from the system nonpaged pool because the server
reached the configured limit for nonpaged pool allocations.
- 2018: The server was unable to allocate from the system paged pool because the server
reached the configured limit for paged pool allocations.
- 2019: The server was unable to allocate from the system nonpaged pool because the pool
was empty.
- 2020: The server was unable to allocate from the system paged pool because the pool was
empty.
This is more than likely being caused by lack of memory in the system. If this
occurs you should refer to the "Memory" section of this paper.
There are similar paged/nonpaged values for the Macintosh file server service.
The "MacFile PagedMemLimit" specifies the maximum amount of paged memory
that the Macintosh file server can use. Performance of the Macintosh file service
increases with an increase in this value. However, the value should not be set lower than
1000K. It is especially important that you are well acquainted with memory issues before
changing this resource parameter. You cannot change this value from Server Manager.
SYSTEM\CurrentControlSet\Services\
MacFile\Parameters\
PagedMemLimit (default = 20000 decimal REG_DWORD)
The "MacFile NonPagedMemLimit" specifies the maximum amount of RAM
that is available to the file server for Macintosh. Increasing this value helps
performance of the file server but decreases performance of other system resources.
SYSTEM\CurrentControlSet\Services\
MacFile\Parameters\
NonPagedMemLimit (default = 4000 decimal REG_DWORD)
- If other (non-Server service) processes are competing with the server for processor time,
you may want to consider increasing the priority of the server's worker threads.
SYSTEM\CurrentControlSet\Services\
LanmanServer\Parameters\
ThreadPriority (default =1 REG_DWORD)
The server threads by default run at "foreground process priority." Other service
threads in the system run at "foreground process priority + 1," such as
the XACTSRV threads (the service responsible for supporting remote API requests from
Microsoft LAN Manager local area network software version 2.x stations). Since XACTSRV
is used to process printing requests, a file server that is also a print server may suffer
from server thread starvation because the server threads are at a lower priority than the
XACTSRV threads. In this case it makes sense to increase the server's ThreadPriority to 2.
Gotcha. Do not increase the priority beyond 2, or the system may not respond
normally to other activity.
Another alternative is to drop the priority of the Spooler (it runs at 9 by default on
NT 3.5 Server). You can do this with the PriorityClass parameter in the registry. It is
located in the following location:
SYSTEM\CurrentControlSet\
Control\Print\
PriorityClass (default=0 REG_DWORD)
You can verify the priority with the PVIEWER.EXE application in the Windows NT Resource
Kit. The figure below details the priority with the default setting. If you change the
value in the registry and do a 'net stop spooler' then a 'net start spooler' at the
command line the priority will update.
- If you see the following event in the System log: "2001: The server was unable
to perform an operation due to a shortage of available resources..." with the following
included in the hex information: 000c0000 005c0001, increase the following:
SYSTEM\CurrentControlSet\Services\
LanmanServer\Parameters\
MinFreeConnections
No matter how few connections are actually established, the server will make sure that
there are at least "MinFreeConnections" preinitialized, unused connection blocks ready to
be used for a new connection. This value is 2 if you set the server to "Minimize Memory
Used" or "Balance" and 4 if you select "Maximize Throughput...."
- If you are limited on hardware resources and want to limit the number of users that can
be simultaneously logged on to a server you can manipulate:
SYSTEM\CurrentControlSet\Services\
LanmanServer\Parameters\
Users
- Since each server connection does take up some amount of memory, you may want to
consider tuning the Autodisconnect parameter. This parameter sets the time interval after
which inactive connections are terminated if no open files on the connection exist. This
will free up a small amount of the server's resources to accommodate active users.
SYSTEM\CurrentControlSet\Services\
LanmanServer\Parameters\
Autodisconnect (default=15 min.)
Workstation (Redirector)
When applications or users issue Connect, Open, Read, or Write requests on path-names
that reference a redirected drive (net use z: \\server\share), the request is forwarded to
the local Windows NT redirector. The redirector then packages up the request and forwards
it down to the transport (TCP/IP, NBF, or NWLINK) and out onto the wire to be picked up by
a server. So, as you can see, a great deal of the redirector's network performance is tied
directly to how well the server responds to its requests. However, there are a few issues
to be aware of on the redirector side.
- "Redirector Current Commands" counts the number of requests to the
Redirector that are currently queued for service. If this number is much larger than the
number of network adapter cards installed in the computer, then the network(s) and/or the
server(s) being accessed are seriously bottlenecked. If the redirector's application I/O
request queue is backed up, you can try to compensate locally by increasing the maximum
number of pending network commands allowed:
SYSTEM\CurrentControlSet\Services\
LanmanWorkstation\Parameters\
MaxCmds (default = 5)
- If you see "Redirector Network Errors/sec" then SMB requests are timing
out, forcing the redirector to disconnect, reconnect, and recover. If this is occurring,
you may need to increase:
SYSTEM\CurrentControlSet\Services\
LanmanWorkstation\Parameters\
SessTimeout (default = 45 sec REG_DWORD)
This specifies the maximum amount of time that the redirector allows an operation that
is not long-term to be outstanding.
- Increase the redirector's thread count if the redirector can't accommodate overlapped
I/O requests. For example, the WriteFileEx() Win32 function may fail with the error
codes ERROR_INVALID_USER_BUFFER or ERROR_NOT_ENOUGH_MEMORY if there are too many
outstanding asynchronous I/O requests.
SYSTEM\CurrentControlSet\Services\
LanmanWorkstation\Parameters\
MaxThreads (default=17)
- If you have more than 1 redirector loaded on your Windows NT Workstation (for example,
Client Services for NetWare, and so on), consider the order of providers. When a WNet API
is called, it routes the call to the first provider DLL (dynamic-link library) in the
"ProviderOrder" and then waits for this provider to return before submitting it
to the next provider. You can see the provider order by looking in the Network Control
Panel and pressing the Network button or, if you are interested in the value on a remote
machine, you can use the registry editor (REGEDT32.EXE) and view the following registry
entry:
SYSTEM\CurrentControlSet\Control\
NetworkProvider\
Order:ProviderOrder
- There is a new SMB that is now supported under Windows NT called NtTransact_NotifyDirectoryChange.
This allows an application to know when a directory structure has been updated on the
server. If an application causes one of these SMBs to be submitted, RAW SMB I/O cannot be
accomplished. (Note: RAW I/O is much faster than CORE I/O. However, it must have the
session's full attention. Since there is an outstanding request on the session, RAW cannot
be accomplished.) Windows NT File Manager causes one of these SMBs to be submitted if you
are focused on a redirected drive. This can cause a slowdown on large reads and writes
from other applications. You can shut this feature off for File Manager in the registry by
adding the following value:
SOFTWARE\Microsoft\File Manager\
Settings\
ChangeNotifyTime (default=0 REG_SZ)
Netlogon
One of the primary jobs of Netlogon is to keep the user account database in sync on all
of the backup domain controllers with the primary domain controller.
- Increase Netlogon service update notice periods on your Primary Domain Controllers, as
well as the server announcement period if you are concerned with the amount of maintenance
traffic the Windows NT Server is creating and the load on the primary domain controller.
Value Name         Default Value     Minimum Value    Maximum Value
PulseConcurrency   20                1                500
Pulse              300 (5 minutes)   60 (1 minute)    3600 (1 hour)
Randomize          1 (1 second)      0 (0 seconds)    120 (2 minutes)
- Pulse defines the typical pulse frequency (in seconds). All User/Security account
database changes made within this time are collected together. After this time, a pulse is
sent to each BDC needing the changes. No pulse is sent to a BDC that is up-to-date.
- Randomize specifies the BDC back off period (in seconds). When the BDC receives a
pulse, it will back off between zero and Randomize seconds before calling the PDC.
- PulseConcurrency defines the maximum number of simultaneous pulses the PDC will
send to BDCs.
- Netlogon sends pulses to individual BDCs. The BDCs respond by asking for any database
changes. To control the maximum load these responses place on the PDC, the PDC will only
have PulseConcurrency pulses "pending" at once. The PDC should be sufficiently
powerful to support this many concurrent replication RPC calls (related directly to server
service tuning as well as the amount of memory in the machine). Increasing
PulseConcurrency increases the load on the PDC. Decreasing PulseConcurrency increases the
time it takes for a domain with a large number of BDCs to get a user account database
change to all of the BDCs. Consider that the time to replicate a database change to all
the BDCs in a domain will be greater than:
((Randomize/2) * NumberOfBdcsInDomain) /
PulseConcurrency
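The lower bound above is easy to evaluate for a planned configuration. The sketch below is a hypothetical Python helper, not part of Netlogon: the function name and the BDC count in the example are assumptions, and the range checks simply mirror the minimum and maximum values listed in the table in this section.

```python
def min_replication_seconds(randomize, number_of_bdcs, pulse_concurrency):
    """Lower bound on the time to replicate one change to every BDC.

    Implements ((Randomize / 2) * NumberOfBdcsInDomain) / PulseConcurrency.
    """
    if not 0 <= randomize <= 120:
        raise ValueError("Randomize must be between 0 and 120 seconds")
    if not 1 <= pulse_concurrency <= 500:
        raise ValueError("PulseConcurrency must be between 1 and 500")
    if number_of_bdcs < 0:
        raise ValueError("number_of_bdcs cannot be negative")
    return ((randomize / 2) * number_of_bdcs) / pulse_concurrency


# Hypothetical domain with 200 BDCs and the default Randomize of 1 second.
for concurrency in (20, 5):
    seconds = min_replication_seconds(1, 200, concurrency)
    print(f"PulseConcurrency={concurrency}: more than {seconds} seconds")
```

In this example, dropping PulseConcurrency from the default of 20 to 5 lightens the load on the PDC but quadruples the minimum replication time, which is the trade-off the text describes.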
Transport (NBF, TCP/IP, NWLink, and so on)
The transport drivers' function is to transport network data submitted by applications
(such as the redirector, e-mail, Microsoft SQL Server, and so on) to other network
stations. Windows NT ships with a variety of transport drivers such as TCP/IP, NBF
(NetBEUI), and NWLink. All of these transports export a TDI interface on top and an NDIS
(Network Driver Interface Specification) on the bottom. (Windows NT also ships with
AppleTalk and DLC; however, these do not have a TDI interface.)
- If the protocol used on most stations that you will connect to is first in the bindings
list, average connection time decreases. This is because when you request a connection to
shared resources on a remote station, the local workstation redirector submits a TDI
connect request to all transports simultaneously, and when any one of the transport
drivers completes the request successfully, the redirector waits until all higher priority
transports return. In the following figure you will see that the NetBEUI / Intel Ether
Express binding has the highest priority.
- Each transport has its own way of doing windowing (typically the number of packets sent
before an acknowledgment is required). By increasing the window size, you can send more
packets to the other side before you have to wait for an acknowledgment. This can yield a
slight increase in performance (fewer packets = less I/O); however, it can also increase
the risk of retransmission. This is NOT a recommended practice.
- For NBF you can modify:
SYSTEM\CurrentControlSet\Services\
NBF\Parameters\
LLCMaxWindowSize (default = 10)
This is how many LLC I-frames NBF can send before it must stop and wait for an
acknowledgment.
- For TCP/IP you can modify:
SYSTEM\CurrentControlSet\Services\
Tcpip\Parameters\
TcpWindowSize (default = 8192)
This is the amount of data that can be accepted in a single transaction.
- For NWLINK you can modify 3 entries:
SYSTEM\CurrentControlSet\Services\
NWNBLink\Parameters\
AckWindow (default = 2)
This specifies the number of frames to receive before sending an acknowledgment.
SYSTEM\CurrentControlSet\Services\
NWNBLink\Parameters\
RcvWindowMax (default = 4)
This specifies the maximum number of frames the receiver can receive at one time.
SYSTEM\CurrentControlSet\Services\
NWLink\Parameters\
WindowSize (default = 4)
This specifies the window to use in the SPX packets.
- If you are on an NBF network and have a server on a very slow link, you may want to
consider increasing the following:
SYSTEM\CurrentControlSet\Services\
NBF\Parameters\
DefaultT1Timeout (default = 600 ms; grows dynamically to 10 sec)
The T1 value controls the time that NBF waits for a response after sending a logical
link control (LLC) poll packet before resending it. The default value you specify here is
only used upon link establishment. It is then dynamically changed every 30 seconds.
- If you are on an NBF network and have a server on a very busy link you may want to
consider increasing the following:
SYSTEM\CurrentControlSet\Services\
NBF\Parameters\
LLCRetries (default = 8)
This value specifies the number of times that NBF will retry polling a remote
workstation after receiving a T1 timeout. After this many retries, NBF closes the link.
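To see why the window sizes discussed in this section matter mostly on slow or long links, it helps to work through the arithmetic. The Python sketch below assumes a deliberately simplified model that is not taken from the Windows NT transports themselves: a sender may have at most one window of unacknowledged data outstanding per round trip, and retransmissions and protocol overhead are ignored. The function name and the round-trip times are hypothetical.

```python
def window_limited_throughput(window_bytes, round_trip_ms):
    """Upper bound in bytes/sec when one window is sent per round trip."""
    if round_trip_ms <= 0:
        raise ValueError("round_trip_ms must be positive")
    return window_bytes * 1000 / round_trip_ms


DEFAULT_TCP_WINDOW = 8192  # TcpWindowSize default quoted in the text

# Hypothetical round-trip times: a local segment, a routed LAN, a slow WAN link.
for rtt_ms in (1, 10, 100):
    limit = window_limited_throughput(DEFAULT_TCP_WINDOW, rtt_ms)
    print(f"RTT {rtt_ms:>3} ms: at most {limit:,.0f} bytes/sec")
```

Under this model an 8192-byte window is not a constraint on a fast local segment, but on a 100 ms link it caps a single connection at roughly 80 KB/sec. A larger window raises that cap at the cost of more data to resend after a loss, which is why the text advises caution.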
Physical (Network Adapter)
- If "Server Bytes Total/sec" (the number of bytes the server has sent to and
received from the network) is roughly equal to the maximum transfer rate of your network,
you may need to segment your network. On an Ethernet segment this value is about 1.2
megabytes per second once you include the overhead of the network (10 megabits per second
is 1.25 megabytes per second before overhead).
- An Ethernet segment is shared by every user of every system on the network. Therefore,
it is a relatively limited resource with many users. This situation can be alleviated
somewhat by adding sub-networks, but no matter how complex the network's topology, a
network basically consists of many systems communicating through a single piece of wire.
If one user is accessing a very large file across the network, that user may be slowing
down the network for all users.
- Match the adapter to the system bus. If you have a 16-bit bus, use a 16-bit network adapter;
if you have a 32-bit bus, use a 32-bit network adapter.
- Avoid sending from fast adapters to slow adapters.
- If you need to transfer huge amounts of data between different computer systems,
Ethernet may not be the appropriate medium to use; the basic Ethernet cable is limited to
10 megabits per second (considerably less when you include network overhead). Other media
are now available that offer significantly higher sustained transfer rates (FDDI, and so
on).
- The Network Monitor (provided with Systems Management Server) is a very good tool to use
to monitor the general network performance. It offers additional Performance Monitor
counters as well as a few unique statistics from within the application such as:
- % Network Utilization represents what percentage of the network bandwidth is
being used.
- Frames per Second is the number of frames being transmitted on the network per
second.
- Bytes Per Second is the number of bytes being transmitted on the network per
second.
- Broadcasts per Second represents the number of broadcast frames on the network
per second.
- Multicasts per Second represents the number of multicast frames on the network
per second.
- Network Card (MAC) Statistics represents the cumulative total number of frames,
bytes, broadcasts, and multicasts seen on the network by the network card since the
capture has begun.
- Network Card (MAC) Error Statistics indicates the cumulative errors seen from the
network card. These include CRC Errors and frames dropped because of no buffer space as
well as frames dropped because of hardware constraints.
- By sorting the Broadcasts and Multicasts columns in the Network Monitor Station
Statistics pane (bottom pane), you can find the source of a broadcast storm by seeing
which machines are sending the most broadcast frames.
- An increase in the amount of Broadcasts/Multicasts per second can relate directly to
machine performance. Each broadcast/multicast causes every card on the net to generate an
interrupt to allow the packet to be passed up to the transport. This can cause serious CPU
utilization problems. As a general rule, a broadcast/multicast rate of over 100/sec should
cause you to investigate a cause as well as a cure. The cure may be as easy as identifying
a jabbering network card or configuring a router not to forward UDP ports 137 and 138.
Note: NBF is not a routable transport.
- Check % Network Utilization when response times slow to the point that they are no
longer acceptable. A common rule of thumb puts this point at around 40-50% utilization.
- Gotcha. In Windows NT 3.5, the counter "Network Segment % Network
Utilization" in Performance Monitor must be monitored at 1-second intervals. This
will be fixed in Windows NT 3.51.
- Collisions occur when your system starts sending data at the same time as another system
on the network. When your system detects a collision, it waits a random amount of time and
retransmits the packet. Collisions are normal events and don't indicate hardware problems.
However, the probability of two hosts transmitting at the same time increases as the
network is more heavily utilized, so collisions are an extremely good indicator of network
load. The number of collisions should be, at most, 15% of the total number of output
packets. The only solution for this problem is to rearrange the network in a way that
reduces traffic. Ethernet networks start to have significant collisions at about 66.67%
utilization, or 833375 bytes per second. You can measure collisions with a tool such as a
Network General Sniffer. Note that Version 1.0 of Network Monitor does not report
collisions.
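The rules of thumb in this section can be gathered into a small check. The following is a minimal Python sketch, not part of any Microsoft tool: the function names are hypothetical, and the inputs are assumed to be numbers you have already collected with Performance Monitor, Network Monitor, or a sniffer. The thresholds (40% utilization, 15% collisions, 100 broadcasts/multicasts per second) are the ones quoted above, and the link speed is the basic 10 megabits per second Ethernet rate.

```python
ETHERNET_BITS_PER_SEC = 10_000_000  # basic 10 Mbit/s Ethernet, as in the text


def utilization_percent(bytes_per_sec, link_bits_per_sec=ETHERNET_BITS_PER_SEC):
    """Share of raw link bandwidth in use, ignoring protocol overhead."""
    return bytes_per_sec * 8 / link_bits_per_sec * 100


def collision_percent(collisions, output_packets):
    """Collisions as a percentage of output packets (0 if nothing was sent)."""
    if output_packets == 0:
        return 0.0
    return collisions / output_packets * 100


def network_warnings(bytes_per_sec, collisions, output_packets, broadcasts_per_sec):
    """Return rule-of-thumb warnings; thresholds are taken from the text."""
    warnings = []
    if utilization_percent(bytes_per_sec) >= 40:
        warnings.append("utilization at or above 40%")
    if collision_percent(collisions, output_packets) > 15:
        warnings.append("collisions above 15% of output packets")
    if broadcasts_per_sec > 100:
        warnings.append("broadcast/multicast rate above 100/sec")
    return warnings


if __name__ == "__main__":
    # 833375 bytes/sec is the figure quoted for 66.67% utilization.
    print(f"{utilization_percent(833_375):.2f}%")
    # Hypothetical sample values for a busy segment.
    for warning in network_warnings(833_375, 200, 1000, 150):
        print("WARNING:", warning)
```

The first line printed confirms the arithmetic behind the 66.67% figure: 833375 bytes per second is 6,667,000 bits per second, or about two thirds of a 10 megabit link.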
Capacity Planning
Now that you have your system optimized to where you are very comfortable with its
performance (today), it's time to start collecting data that will help you in the future.
The following counters are a good starting point for resource capacity planning:
Object > Counter(s)
Processor > % Processor Time, Interrupts/sec
Memory > Pages/sec, Cache Faults/sec, Available Pages, Commit Limit, Committed Bytes
Paging File > Usage Peak
Physical Disk > % Disk Time, Avg. Disk Seconds/Transfer
Logical Disk > % Free Space
Redirector > Bytes Total/sec, Current Commands
Server > Bytes Total/sec, Server Sessions, Pool Paged Peak, Pool Nonpaged Peak, Work Item
Shortages
The Windows NT Resource Kit includes a new service called DATALOG.EXE. It captures
the data and forwards it to a data store, where it can be gathered up later and used
for trend analysis.
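As one illustration of what trend analysis on the collected data might look like, the following minimal Python sketch fits a straight line to a series of samples and estimates when a counter will reach a limit. The least-squares method, the function names, and the sample figures are all assumptions made for the example. They are not something DATALOG.EXE or Performance Monitor does for you, and real growth is rarely perfectly linear.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for a list of (x, y) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("all samples share the same x value")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def time_until(samples, limit):
    """x value at which the fitted line reaches `limit`, or None if not growing."""
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    return (limit - intercept) / slope


if __name__ == "__main__":
    # Hypothetical weekly readings of Committed Bytes, in megabytes.
    weekly_committed_mb = [(0, 40), (1, 44), (2, 48), (3, 52)]
    # Hypothetical Commit Limit of 64 MB.
    print(time_until(weekly_committed_mb, 64))  # 6.0, i.e. week 6
```

With these made-up readings the counter grows by 4 MB a week, so the fitted line reaches the 64 MB limit at week 6, three weeks after the last sample.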
Once you have identified your system's thresholds based on the data, you will probably
want to set up Performance Monitor Alerts. For example, you may want to set an alert on
"Logical Disk Free Megabytes" for your file server's logical drives if it falls below
a certain threshold, on "Paging File % Usage" if it reaches 80 or 90%, and on
"Redirector Network Errors/sec."
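Performance Monitor evaluates alerts for you, but it can help to write the rules down first. The following minimal Python sketch expresses the three example alerts as data and checks one sample against them. The names AlertRule, RULES, and triggered are hypothetical. The 80% paging file figure comes from the text, while the 100 MB free-space limit and the "any network error" rule are assumptions chosen only for illustration.

```python
from typing import Callable, Dict, List, NamedTuple


class AlertRule(NamedTuple):
    counter: str                    # label for the counter being watched
    trips: Callable[[float], bool]  # returns True when the alert should fire
    message: str


RULES = [
    # 100 MB is a hypothetical threshold; pick one that suits your server.
    AlertRule("Free Megabytes", lambda v: v < 100, "free disk space is low"),
    # 80% is the lower of the two figures suggested in the text.
    AlertRule("Paging File % Usage", lambda v: v >= 80, "paging file is nearly full"),
    # Assumption: any sustained redirector error rate is worth a look.
    AlertRule("Network Errors/sec", lambda v: v > 0, "redirector reports network errors"),
]


def triggered(sample: Dict[str, float], rules: List[AlertRule] = RULES) -> List[str]:
    """Return the messages of every rule tripped by one sample of counter values."""
    return [
        f"{rule.counter}: {rule.message}"
        for rule in rules
        if rule.counter in sample and rule.trips(sample[rule.counter])
    ]


if __name__ == "__main__":
    # Hypothetical readings from a file server.
    sample = {"Free Megabytes": 42, "Paging File % Usage": 85, "Network Errors/sec": 0}
    for line in triggered(sample):
        print("ALERT:", line)
```

Keeping the thresholds in one list makes it easy to revisit them as your baseline data grows.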
There is a great deal more information about Capacity Planning, and the issues
surrounding it, in the Windows NT Resource Kit volume 3 by Russ Blake. It details issues
relating to Log concentration and Archiving as well as other important details.
Summary
The motivation behind system tuning is to get the most you can out of the hardware you
already own. If you decide that an upgrade is your only solution, you will find that your
investment in performance tuning pays off. Your work will show you how the system should
be upgraded. If you have done your homework, you will know whether you need more memory,
faster disks, or a completely new processor. And if you have recorded your system's
performance history, you have not only done your homework but studied enough to pass
the test, because now you can speak to latent demand and system growth.
References
Software Companies You May Want to Investigate
- BGS Systems, Inc.
(617) 891-0000
E-mail best1@bgs.com
- Datametrics
(800) 869-3282
E-mail experts@datametrics.com
- Metron
- Candle Software
- Legent
- SPEC
- BAPCO
- Institute for Computer Capacity Management (ICCM) in Phoenix, Arizona
- StonyBrook Services, Inc., in Bohemia, New York
- Intrak, Inc., San Diego, California, "TrendTrak"
- Network General Corp., Menlo Park, California, "Reporter"
Literature
- Optimizing Windows NT - Windows NT Resource Kit Volume 3 by Russ Blake (ISBN
1-55615-619-7)
- Windows NT Advanced Server Concepts and Planning Guide
- Capacity Management Review (602-997-7374, $195.00)
- Computer Measurement Group (newsletter)
414 Plaza Drive, Suite 209
Westmont, IL 60559
- High Performance Computing - O'Reilly & Associates
Thanks
- Dan Perry (Microsoft World Wide Training)
- Russ Blake (Microsoft development)
- Reza Baghai (Microsoft development)
- Barry Hicks (JCPenney Capacity Planning)
- Chad, T. Ray, Glenn, Dennis, Rick, Darrel, and Mustafa
© 1995 Microsoft Corporation.
THESE MATERIALS ARE PROVIDED "AS-IS," FOR INFORMATIONAL
PURPOSES ONLY.
NEITHER MICROSOFT NOR ITS SUPPLIERS MAKE ANY WARRANTY, EXPRESS OR
IMPLIED, WITH RESPECT TO THE CONTENT OF THESE MATERIALS OR THE ACCURACY OF ANY INFORMATION
CONTAINED HEREIN, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY
OR FITNESS FOR A PARTICULAR PURPOSE. BECAUSE SOME STATES/JURISDICTIONS DO NOT ALLOW
EXCLUSIONS OF IMPLIED WARRANTIES, THE ABOVE LIMITATION MAY NOT APPLY TO YOU.
NEITHER MICROSOFT NOR ITS SUPPLIERS SHALL HAVE ANY LIABILITY FOR
ANY DAMAGES WHATSOEVER INCLUDING CONSEQUENTIAL, INCIDENTAL, DIRECT, INDIRECT, SPECIAL, AND
LOSS OF PROFITS. BECAUSE SOME STATES/JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF
CONSEQUENTIAL OR INCIDENTAL DAMAGES, THE ABOVE LIMITATION MAY NOT APPLY TO YOU. IN ANY
EVENT, MICROSOFT'S AND ITS SUPPLIERS' ENTIRE LIABILITY IN ANY MANNER ARISING OUT OF THESE
MATERIALS, WHETHER BY TORT, CONTRACT, OR OTHERWISE, SHALL NOT EXCEED THE SUGGESTED RETAIL
PRICE OF THESE MATERIALS.
Owner: Ilse Vinson & Desktop-Admin