Configuration and Tuning

Chern Lee Written by Mike Smith Based on a tutorial written by Matt Dillon Also based on tuning(7) written by Configuration and Tuning Synopsis system configuration/optimization Configuring a system correctly can substantially reduce the amount of work involved in maintaining and upgrading it in the future. This chapter describes some of the aspects of administrative configuration of FreeBSD systems. This chapter will also describe some of the parameters that can be set to tune a FreeBSD system for optimum performance. After reading this chapter, you will know: Why and how to efficiently size, layout, and place filesystems and swap partitions on your hard drive. The basics of the rc.conf configuration and /usr/local/etc/rc.d startup systems. How to configure virtual hosts on your network devices. How to use the various configuration files in /etc. How to tune FreeBSD using sysctl variables. How to tune disk performance and modify kernel limitations. Before reading this chapter, you should: Understand the basics of Unix and FreeBSD (). Be familiar with keeping FreeBSD sources up to date (), and the basics of kernel configuration/compilation (). Initial Configuration Partition Layout Partition layout /etc /var /usr Base Partitions When laying out your filesystem with &man.disklabel.8; or &man.sysinstall.8;, it is important to remember that hard drives can transfer data at a faster rate from the outer tracks than the inner. Knowing this, you should place your smaller, heavily-accessed filesystems, such as root and swap, closer to the outside of the drive, while placing larger partitions, such as /usr, towards the inner. To do so, it is a good idea to create partitions in a similar order: root, swap, /var, /usr. The size of your /var partition reflects the intended use of your machine. /var is primarily used to hold mailboxes, log files, and printer spools. Mailboxes and log files, in particular, can grow to unexpected sizes based upon how many users are on your system and how long your log files are kept. If you intend to run a mail server, a /var partition of over a gigabyte can be suitable. Additionally, /var/tmp must be large enough to contain any packages you may wish to add. The /usr partition holds the bulk of the files required to support the system and a subdirectory within it called /usr/local holds the bulk of the files installed from the &man.ports.7; hierarchy. If you do not use ports all that much and do not intend to keep system source (/usr/src) on the machine, you can get away with a 1 gigabyte /usr partition. However, if you install a lot of ports (especially window managers and Linux binaries), we recommend at least a two gigabyte /usr and if you also intend to keep system source on the machine, we recommend a three gigabyte /usr. Do not underestimate the amount of space you will need in this partition, it can creep up and surprise you! When sizing your partitions, keep in mind the space requirements for your system to grow. Running out of space in one partition while having plenty in another can lead to much frustration. Some users who have used &man.sysinstall.8;'s Auto-defaults partition sizer have found either their root or /var partitions too small later on. Partition wisely and generously. Swap Partition swap sizing swap partition As a rule of thumb, your swap space should typically be double the amount of main memory. For example, if the machine has 128 megabytes of memory, the swap file should be 256 megabytes. Systems with lesser memory may perform better with a lot more swap. It is not recommended that you configure any less than 256 megabytes of swap on a system and you should keep in mind future memory expansion when sizing the swap partition. The kernel's VM paging algorithms are tuned to perform best when the swap partition is at least two times the size of main memory. Configuring too little swap can lead to inefficiencies in the VM page scanning code as well as create issues later on if you add more memory to your machine. Finally, on larger systems with multiple SCSI disks (or multiple IDE disks operating on different controllers), it is strongly recommend that you configure swap on each drive (up to four drives). The swap partitions on the drives should be approximately the same size. The kernel can handle arbitrary sizes but internal data structures scale to 4 times the largest swap partition. Keeping the swap partitions near the same size will allow the kernel to optimally stripe swap space across the disks. Do not worry about overdoing it a little, swap space is the saving grace of Unix. Even if you do not normally use much swap, it can give you more time to recover from a runaway program before being forced to reboot. Why Partition? Why partition at all? Why not create one big root partition and be done with it? Then I do not have to worry about undersizing things! There are several reasons this is not a good idea. First, each partition has different operational characteristics and separating them allows the filesystem to tune itself to those characteristics. For example, the root and /usr partitions are read-mostly, with very little writing, while a lot of reading and writing could occur in /var and /var/tmp. By properly partitioning your system, fragmentation introduced in the smaller more heavily write-loaded partitions will not bleed over into the mostly-read partitions. Additionally, keeping the write-loaded partitions closer to the edge of the disk, for example before the really big partition instead of after in the partition table, will increase I/O performance in the partitions where you need it the most. Now it is true that you might also need I/O performance in the larger partitions, but they are so large that shifting them more towards the edge of the disk will not lead to a significant performance improvement whereas moving /var to the edge can have a huge impact. Finally, there are safety concerns. Having a small, neat root partition that is essentially read-only gives it a greater chance of surviving a bad crash intact. Core Configuration rc files rc.conf The principal location for system configuration information is within /etc/rc.conf. This file contains a wide range of configuration information, principally used at system startup to configure the system. Its name directly implies this; it is configuration information for the rc* files. An administrator should make entries in the rc.conf file to override the default settings from /etc/defaults/rc.conf. The defaults file should not be copied verbatim to /etc - it contains default values, not examples. All system-specific changes should be made in the rc.conf file itself. A number of strategies may be applied in clustered applications to separate site-wide configuration from system-specific configuration in order to keep administration overhead down. The recommended approach is to place site-wide configuration into another file, such as /etc/rc.conf.site, and then include this file into /etc/rc.conf, which will contain only system-specific information. As rc.conf is read by &man.sh.1; it is trivial to achieve this. For example: rc.conf: . rc.conf.site hostname="node15.example.com" network_interfaces="fxp0 lo0" ifconfig_fxp0="inet 10.1.1.1" rc.conf.site: defaultrouter="10.1.1.254" saver="daemon" blanktime="100" The rc.conf.site file can then be distributed to every system using rsync or a similar program, while the rc.conf file remains unique. Upgrading the system using &man.sysinstall.8; or make world will not overwrite the rc.conf file, so system configuration information will not be lost. Application Configuration Typically, installed applications have their own configuration files, with their own syntax, etc. It is important that these files be kept separate from the base system, so that they may be easily located and managed by the package management tools. /usr/local/etc Typically, these files are installed in /usr/local/etc. In the case where an application has a large number of configuration files, a subdirectory will be created to hold them. Normally, when a port or package is installed, sample configuration files are also installed. These are usually identified with a .default suffix. If there are no existing configuration files for the application, they will be created by copying the .default files. For example, consider the contents of the directory /usr/local/etc/apache: -rw-r--r-- 1 root wheel 2184 May 20 1998 access.conf -rw-r--r-- 1 root wheel 2184 May 20 1998 access.conf.default -rw-r--r-- 1 root wheel 9555 May 20 1998 httpd.conf -rw-r--r-- 1 root wheel 9555 May 20 1998 httpd.conf.default -rw-r--r-- 1 root wheel 12205 May 20 1998 magic -rw-r--r-- 1 root wheel 12205 May 20 1998 magic.default -rw-r--r-- 1 root wheel 2700 May 20 1998 mime.types -rw-r--r-- 1 root wheel 2700 May 20 1998 mime.types.default -rw-r--r-- 1 root wheel 7980 May 20 1998 srm.conf -rw-r--r-- 1 root wheel 7933 May 20 1998 srm.conf.default The filesize difference shows that only the srm.conf file has been changed. A later update of the apache port would not overwrite this changed file. Starting Services services It is common for a system to host a number of services. These may be started in several different fashions, each having different advantages. /usr/local/etc/rc.d Software installed from a port or the packages collection will often place a script in /usr/local/etc/rc.d which is invoked at system startup with a argument, and at system shutdown with a argument. This is the recommended way for starting system-wide services that are to be run as root, or that expect to be started as root. These scripts are registered as part of the installation of the package, and will be removed when the package is removed. A generic startup script in /usr/local/etc/rc.d looks like: #!/bin/sh echo -n ' FooBar' case "$1" in start) /usr/local/bin/foobar ;; stop) kill -9 `cat /var/run/foobar.pid` ;; *) echo "Usage: `basename $0` {start|stop}" >&2 exit 64 ;; esac exit 0 This script is called with at startup, and the at shutdown to allow it to carry out its purpose. Some services expect to be invoked by &man.inetd.8; when a connection is received on a suitable port. This is common for mail reader servers (POP and IMAP, etc.). These services are enabled by editing the file /etc/inetd.conf. See &man.inetd.8; for details on editing this file. Some additional system services may not be covered by the toggles in /etc/rc.conf. These are traditionally enabled by placing the command(s) to invoke them in /etc/rc.local. As of FreeBSD 3.1 there is no default /etc/rc.local; if it is created by the administrator it will however be honored in the normal fashion. Note that rc.local is generally regarded as the location of last resort; if there is a better place to start a service, do it there. Do not place any commands in /etc/rc.conf. To start daemons, or run any commands at boot time, place a script in /usr/local/etc/rc.d instead. It is also possible to use the &man.cron.8; daemon to start system services. This approach has a number of advantages, not least being that because &man.cron.8; runs these processes as the owner of the crontab, services may be started and maintained by non-root users. This takes advantage of a feature of &man.cron.8;: the time specification may be replaced by @reboot, which will cause the job to be run when &man.cron.8; is started shortly after system boot. Virtual Hosts virtual hosts ip aliases A very common use of FreeBSD is virtual site hosting, where one server appears to the network as many servers. This is achieved by assigning multiple network addresses to a single interface. A given network interface has one real address, and may have any number of alias addresses. These aliases are normally added by placing alias entries in /etc/rc.conf. An alias entry for the interface fxp0 looks like: ifconfig_fxp0_alias0="inet xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx" Note that alias entries must start with alias0 and proceed upwards in order, (for example, _alias1, _alias2, and so on). The configuration process will stop at the first missing number. The calculation of alias netmasks is important, but fortunately quite simple. For a given interface, there must be one address which correctly represents the network's netmask. Any other addresses which fall within this network must have a netmask of all 1's. For example, consider the case where the fxp0 interface is connected to two networks, the 10.1.1.0 network with a netmask of 255.255.255.0 and the 202.0.75.16 network with a netmask of 255.255.255.240. We want the system to appear at 10.1.1.1 through 10.1.1.5 and at 202.0.75.17 through 202.0.75.20. The following entries configure the adapter correctly for this arrangement: ifconfig_fxp0="inet 10.1.1.1 netmask 255.255.255.0" ifconfig_fxp0_alias0="inet 10.1.1.2 netmask 255.255.255.255" ifconfig_fxp0_alias1="inet 10.1.1.3 netmask 255.255.255.255" ifconfig_fxp0_alias2="inet 10.1.1.4 netmask 255.255.255.255" ifconfig_fxp0_alias3="inet 10.1.1.5 netmask 255.255.255.255" ifconfig_fxp0_alias4="inet 202.0.75.17 netmask 255.255.255.240" ifconfig_fxp0_alias5="inet 202.0.75.18 netmask 255.255.255.255" ifconfig_fxp0_alias6="inet 202.0.75.19 netmask 255.255.255.255" ifconfig_fxp0_alias7="inet 202.0.75.20 netmask 255.255.255.255" Configuration Files <filename>/etc</filename> Layout There are a number of directories in which configuration information is kept. These include: /etc Generic system configuration information; data here is system-specific. /etc/defaults Default versions of system configuration files. /etc/mail Extra &man.sendmail.8; configuration, other MTA configuration files. /etc/ppp Configuration for both user- and kernel-ppp programs. /etc/namedb Default location for &man.named.8; data. Normally the boot file is located here, and contains a directive to refer to other data in /var/db. /usr/local/etc Configuration files for installed applications. May contain per-application subdirectories. /usr/local/etc/rc.d Start/stop scripts for installed applications. /var/db Persistent system-specific data files, such as &man.named.8; zone files, database files, and so on. Hostnames hostname DNS <filename>/etc/resolv.conf</filename> resolv.conf /etc/resolv.conf dictates how FreeBSD's resolver accesses the Internet Domain Name System (DNS). The most common entries to resolv.conf are: nameserver The IP address of a name server the resolver should query. The servers are queried in the order listed with a maximum of three. search Search list for hostname lookup. This is normally determined by the domain of the local hostname. domain The local domain name. A typical resolv.conf: search example.com nameserver 147.11.1.11 nameserver 147.11.100.30 Only one of the search and domain options should be used. If you are using DHCP, &man.dhclient.8; usually rewrites resolv.conf with information received from the DHCP server. <filename>/etc/hosts</filename> hosts /etc/hosts is a simple text database reminiscent of the old Internet. It works in conjunction with DNS and NIS providing name to IP address mappings. Local computers connected via a LAN can be placed in here for simplistic naming purposes instead of setting up a &man.named.8; server. Additionally, /etc/hosts can be used to provide a local record of Internet names, reducing the need to query externally for commonly accessed names. # $FreeBSD$ # # Host Database # This file should contain the addresses and aliases # for local hosts that share this file. # In the presence of the domain name service or NIS, this file may # not be consulted at all; see /etc/nsswitch.conf for the resolution order. # # ::1 localhost localhost.my.domain myname.my.domain 127.0.0.1 localhost localhost.my.domain myname.my.domain # # Imaginary network. #10.0.0.2 myname.my.domain myname #10.0.0.3 myfriend.my.domain myfriend # # According to RFC 1918, you can use the following IP networks for # private nets which will never be connected to the Internet: # # 10.0.0.0 - 10.255.255.255 # 172.16.0.0 - 172.31.255.255 # 192.168.0.0 - 192.168.255.255 # # In case you want to be able to connect to the Internet, you need # real official assigned numbers. PLEASE PLEASE PLEASE do not try # to invent your own network numbers but instead get one from your # network provider (if any) or from the Internet Registry (ftp to # rs.internic.net, directory `/templates'). # /etc/hosts takes on the simple format of: [Internet address] [official hostname] [alias1] [alias2] ... For example: 10.0.0.1 myRealHostname.example.com myRealHostname foobar1 foobar2 Consult &man.hosts.5; for more information. Log File Configuration log files <filename>syslog.conf</filename> syslog.conf syslog.conf is the configuration file for the &man.syslogd.8; program. It indicates which types of syslog messages are logged to particular log files. # $FreeBSD$ # # Spaces ARE valid field separators in this file. However, # other *nix-like systems still insist on using tabs as field # separators. If you are sharing this file between systems, you # may want to use only tabs as field separators here. # Consult the syslog.conf(5) manual page. *.err;kern.debug;auth.notice;mail.crit /dev/console *.notice;kern.debug;lpr.info;mail.crit;news.err /var/log/messages security.* /var/log/security mail.info /var/log/maillog lpr.info /var/log/lpd-errs cron.* /var/log/cron *.err root *.notice;news.err root *.alert root *.emerg * # uncomment this to log all writes to /dev/console to /var/log/console.log #console.info /var/log/console.log # uncomment this to enable logging of all log messages to /var/log/all.log #*.* /var/log/all.log # uncomment this to enable logging to a remote log host named loghost #*.* @loghost # uncomment these if you're running inn # news.crit /var/log/news/news.crit # news.err /var/log/news/news.err # news.notice /var/log/news/news.notice !startslip *.* /var/log/slip.log !ppp *.* /var/log/ppp.log Consult the &man.syslog.conf.5; manual page for more information. <filename>newsyslog.conf</filename> newsyslog.conf newsyslog.conf is the configuration file for &man.newsyslog.8;, a program that is normally scheduled to run by &man.cron.8;. &man.newsyslog.8; determines when log files require archiving or rearranging. logfile is moved to logfile.0, logfile.0 is moved to logfile.1, and so on. Alternatively, the log files may be archived in &man.gzip.1; format causing them to be named: logfile.0.gz, logfile.1.gz, and so on. newsyslog.conf indicates which log files are to be managed, how many are to be kept, and when they are to be touched. Log files can be rearranged and/or archived when they have either reached a certain size, or at a certain periodic time/date. # configuration file for newsyslog # $FreeBSD$ # # filename [owner:group] mode count size when [ZB] [/pid_file] [sig_num] /var/log/cron 600 3 100 * Z /var/log/amd.log 644 7 100 * Z /var/log/kerberos.log 644 7 100 * Z /var/log/lpd-errs 644 7 100 * Z /var/log/maillog 644 7 * @T00 Z /var/log/sendmail.st 644 10 * 168 B /var/log/messages 644 5 100 * Z /var/log/all.log 600 7 * @T00 Z /var/log/slip.log 600 3 100 * Z /var/log/ppp.log 600 3 100 * Z /var/log/security 600 10 100 * Z /var/log/wtmp 644 3 * @01T05 B /var/log/daily.log 640 7 * @T00 Z /var/log/weekly.log 640 5 1 $W6D0 Z /var/log/monthly.log 640 12 * $M1D0 Z /var/log/console.log 640 5 100 * Z Consult the &man.newsyslog.8; manual page for more information. <filename>sysctl.conf</filename> sysctl.conf sysctl sysctl.conf looks much like rc.conf. Values are set in a variable=value form. The specified values are set after the system goes into multi-user mode. Not all variables are settable in this mode. A sample sysctl.conf turning off logging of fatal signal exits and letting Linux programs know they are really running under FreeBSD. kern.logsigexit=0 # Do not log fatal signal exits (e.g. sig 11) compat.linux.osname=FreeBSD compat.linux.osrelease=4.3-STABLE Tuning with sysctl sysctl Tuning with sysctl &man.sysctl.8; is an interface that allows you to make changes to a running FreeBSD system. This includes many advanced options of the TCP/IP stack and virtual memory system that can dramatically improve performance for an experienced system administrator. Over five hundred system variables can be read and set using &man.sysctl.8;. At its core, &man.sysctl.8; serves two functions: to read and to modify system settings. To view all readable variables: &prompt.user; sysctl -a To read a particular variable, for example, kern.maxproc: &prompt.user; sysctl kern.maxproc kern.maxproc: 1044 To set a particular variable, use the intuitive variable=value syntax: &prompt.root; sysctl kern.maxfiles=5000 kern.maxfiles: 2088 -> 5000 Settings of sysctl variables are usually either strings, numbers, or booleans (a boolean being 1 for yes or a 0 for no). Tuning Disks Sysctl Variables <varname>vfs.vmiodirenable</varname> vfs.vmiodirenable The vfs.vmiodirenable sysctl variable may be set to either 0 (off) or 1 (on); it is 1 by default. This variable controls how directories are cached by the system. Most directories are small, using just a single fragment (typically 1K) in the filesystem and less (typically 512 bytes) in the buffer cache. However, when operating in the default mode the buffer cache will only cache a fixed number of directories even if you have a huge amount of memory. Turning on this sysctl allows the buffer cache to use the VM Page Cache to cache the directories, making all the memory available for caching directories. However, the minimum in-core memory used to cache a directory is the physical page size (typically 4K) rather than 512 bytes. We recommend turning this option on if you are running any services which manipulate large numbers of files. Such services can include web caches, large mail systems, and news systems. Turning on this option will generally not reduce performance even with the wasted memory but you should experiment to find out. <varname>hw.ata.wc</varname> hw.ata.wc FreeBSD 4.3 flirted with turning off IDE write caching. This reduced write bandwidth to IDE disks but was considered necessary due to serious data consistency issues introduced by hard drive vendors. The problem is that IDE drives lie about when a write completes. With IDE write caching turned on, IDE hard drives not only write data to disk out of order, but will sometimes delay writing some blocks indefinitely when under heavy disk loads. A crash or power failure may cause serious filesystem corruption. FreeBSD's default was changed to be safe. Unfortunately, the result was such a huge performance loss that we changed write caching back to on by default after the release. You should check the default on your system by observing the hw.ata.wc sysctl variable. If IDE write caching is turned off, you can turn it back on by setting the kernel variable back to 1. This must be done from the boot loader at boot time. Attempting to do it after the kernel boots will have no effect. For more information, please see &man.ata.4;. Soft Updates Soft Updates tunefs The &man.tunefs.8; program can be used to fine-tune a filesystem. This program has many different options, but for now we are only concerned with toggling Soft Updates on and off, which is done by: &prompt.root; tunefs -n enable /filesystem &prompt.root; tunefs -n disable /filesystem A filesystem cannot be modified with &man.tunefs.8; while it is mounted. A good time to enable Soft Updates is before any partitions have been mounted, in single-user mode. As of FreeBSD 4.5, it is possible to enable Soft Updates at filesystem creation time, through use of the -U option to &man.newfs.8;. Soft Updates drastically improves meta-data performance, mainly file creation and deletion, through the use of a memory cache. We recommend turning Soft Updates on on all of your filesystems. There are two downsides to Soft Updates that you should be aware of: First, Soft Updates guarantees filesystem consistency in the case of a crash but could very easily be several seconds (even a minute!) behind updating the physical disk. If your system crashes you may lose more work than otherwise. Secondly, Soft Updates delays the freeing of filesystem blocks. If you have a filesystem (such as the root filesystem) which is almost full, performing a major update, such as make installworld, can cause the filesystem to run out of space and the update to fail. More details about Soft Updates Soft Updates (Details) There are two traditional approaches to writing a filesystem's meta-data back to disk. (Meta-data updates are updates to non-content data like inodes or directories.) Historically, the default behaviour was to write out meta-data updates synchronously. If a directory had been changed, the system waited until the change was actually written to disk. The file data buffers (file contents) were passed through the buffer cache and backed up to disk later on asynchronously. The advantage of this implementation is that it operates safely. If there is a failure during an update, the meta-data are always in a consistent state. A file is either created completely or not at all. If the data blocks of a file did not find their way out of the buffer cache onto the disk by the time of the crash, &man.fsck.8; is able to recognize this and repair the filesystem by setting the file length to 0). Additionally, the implementation is clear and simple. The disadvantage is that meta-data changes are slow. An rm -r, for instance, touches all the files in a directory sequentially, but each directory change (deletion of a file) will be written synchronously to the disk. This includes updates to the directory itself, to the i-node table, and possibly to indirect blocks allocated by the file. Similar considerations apply for unrolling large hierarchies (tar -x). The second case is asynchronous meta-data updates. This is the default for Linux/ext2fs and mount -o async for *BSD ufs. All meta-data updates are simply being passed through the buffer cache too, that is, they will be intermixed with the updates of the file content data. The advantage of this implementation is there is no need to wait until each meta-data update has been written to disk, so all operations which cause huge amounts of meta-data updates work much faster than in the synchronous case. Also, the implementation is still clear and simple, so there is a low risk for bugs creeping into the code. The disadvantage is that there is no guarantee at all for a consistent state of the filesystem. If there is a failure during an operation that updated large amounts of meta-data (like a power failure, or someone pressing the reset button), the file system will be left in an unpredictable state. There is no opportunity to examine the state of the file system when the system comes up again; the data blocks of a file could already have been written to the disk while the updates of the i-node table or the associated directory were not. It is actually impossible to implement a fsck which is able to clean up the resulting chaos (because the necessary information is not available on the disk). If the filesystem has been damaged beyond repair, the only choice is to newfs it and restore it from backup. The usual solution for this problem was to implement dirty region logging, which is also referred to as journaling, although that term is not used consistently and is occasionally applied to other forms of transaction logging as well. Meta-data updates are still written synchronously, but only into a small region of the disk. Later on they will be moved to their proper location. Because the logging area is a small, contiguous region on the disk, there are no long distances for the disk heads to move, even during heavy operations, so these operations are quicker than synchronous updates. Additionally the complexity of the implementation is fairly limited, so the risk of bugs being present is low. A disadvatage is that all meta-data are written twice (once into the logging region and once to the proper location) so for normal work, a performance pessimization might result. On the other hand, in case of a crash, all pending meta-data operations can be quickly either rolled-back or completed from the logging area after the system comes up again, resulting in a fast filesystem startup. Kirk McKusick, the developer of Berkeley FFS, solved this problem with Soft Updates: all pending meta-data updates are kept in memory and written out to disk in a sorted sequence (ordered meta-data updates). This has the effect that, in case of heavy meta-data operations, later updates to an item catch the earlier ones if the earlier ones are still in memory and have not already been written to disk. So all operations on, say, a directory are generally performed in memory before the update is written to disk (the data blocks are sorted according to their position so that they will not be on the disk ahead of their meta-data). If the system crashes, this causes an implicit log rewind: all operations which did not find their way to the disk appear as if they had never happened. A consistent filesystem state is maintained that appears to be the one of 30 to 60 seconds earlier. The algorithm used guarantees that all resources in use are marked as such in their appropriate bitmaps: blocks and inodes. After a crash, the only resource allocation error that occurs is that resources are marked as used which are actually free. &man.fsck.8; recognizes this situation, and frees the resources that are no longer used. It is safe to ignore the dirty state of the filesystem after a crash by forcibly mounting it with mount -f. In order to free resources that may be unused, &man.fsck.8; needs to be run at a later time. This is the idea behind the background fsck: at system startup time, only a snapshot of the filesystem is recorded. The fsck can be run later on. All filesystems can then be mounted dirty, so the system startup proceeds in multiuser mode. Then, background fscks will be scheduled for all filesystems where this is required, to free resources that may be unused. (Filesystems that do not use Soft Updates still need the usual foreground fsck though.) The advantage is that meta-data operations are nearly as fast as asynchronous updates (i.e. faster than with logging, which has to write the meta-data twice). The disadvantages are the complexity of the code (implying a higher risk for bugs in an area that is highly sensitive regarding loss of user data), and a higher memory consumption. Additionally there are some idiosyncrasies one has to get used to. After a crash, the state of the filesystem appears to be somewhat older. In situations where the standard synchronous approach would have caused some zero-length files to remain after the fsck, these files do not exist at all with a Soft Updates filesystem because neither the meta-data nor the file contents have ever been written to disk. Disk space is not released until the updates have been written to disk, which may take place some time after running rm. This may cause problems when installing large amounts of data on a filesystem that does not have enough free space to hold all the files twice. Tuning Kernel Limits Tuning kernel limits File/Process Limits <varname>kern.maxfiles</varname> kern.maxfiles kern.maxfiles can be raised or lowered based upon your system requirements. This variable indicates the maximum number of file descriptors on your system. When the file descriptor table is full, file: table is full will show up repeatedly in the system message buffer, which can be viewed with the dmesg command. Each open file, socket, or fifo uses one file descriptor. A large-scale production server may easily require many thousands of file descriptors, depending on the kind and number of services running concurrently. kern.maxfile's default value is dictated by the option in your kernel configuration file. kern.maxfiles grows proportionally to the value of . When compiling a custom kernel, it is a good idea to set this kernel configuration option according to the uses of your system. From this number, the kernel is given most of its pre-defined limits. Even though a production machine may not actually have 256 users connected as once, the resources needed may be similar to a high-scale webserver. As of FreeBSD 4.5, setting to 0 in your kernel configuration file will choose a reasonable default value based on the amount of RAM present in your system. Network Limits The kernel configuration option dictates the amount of network mbufs available to the system. A heavily-trafficked server with a low number of MBUFs will hinder FreeBSD's ability. Each cluster represents approximately 2K of memory, so a value of 1024 represents 2 megabytes of kernel memory reserved for network buffers. A simple calculation can be done to figure out how many are needed. If you have a web server which maxes out at 1000 simultaneous connections, and each connection eats a 16K receive and 16K send buffer, you need approximately 32MB worth of network buffers to cover the webserver. A good rule of thumb is to multiply by 2, so 32MBx2 = 64MB/2K = 32768. Adding Swap Space No matter how well you plan, sometimes a system doesn't run as you expect. If you find you need more swap space, it's simple enough to add. You have three ways to increase swap space: adding a new hard drive, enabling swap over NFS, and creating a swap file on an existing partition. Swap on a New Hard Drive The best way to add swap, of course, is to use this as an excuse to add another hard drive. You can always use another hard drive, after all. If you can do this, go reread the discussion of swap space from the Initial Configuration section of the Handbook for some suggestions on how to best arrange your swap. Swapping over NFS Swapping over NFS is only recommended if you do not have a local hard disk to swap to. Swapping over NFS is slow and inefficient in versions of FreeBSD prior to 4.x. It is reasonably fast and efficient in 4.0-RELEASE and newer. Even with newer versions of FreeBSD, NFS swapping will be limited by the available network bandwidth and puts an additional burden on the NFS server. Swapfiles You can create a file of a specified size to use as a swap file. In our example here we will use a 64Mb file called /usr/swap0. You can use any name you want, of course. Creating a Swapfile Be certain that your kernel configuration includes the vnode driver. It is not in recent versions of GENERIC. pseudo-device vn 1 #Vnode driver (turns a file into a device) create a vn-device: &prompt.root; cd /dev &prompt.root; sh MAKEDEV vn0 create a swapfile (/usr/swap0): &prompt.root; dd if=/dev/zero of=/usr/swap0 bs=1024k count=64 set proper permissions on (/usr/swap0): &prompt.root; chmod 0600 /usr/swap0 enable the swap file in /etc/rc.conf: swapfile="/usr/swap0" # Set to name of swapfile if aux swapfile desired. Reboot the machine or to enable the swap file immediately, type: &prompt.root; vnconfig -e /dev/vn0b /usr/swap0 swap