Linux Kernel Tuning

FAQ 2005/06/27 18:20
How to set kernel tunables
The easiest way to set kernel parameters is to write to the /proc filesystem (and hence to the kernel directly) using echo "value" > /proc/kernel/parameter. Changes take effect immediately but are lost at reboot, so they must be reapplied at every boot. To automate this at boot time, put the echo commands in /etc/rc.d/rc.local on non-Red Hat distributions; on Red Hat-derived distributions, modify the /etc/sysctl.conf configuration file instead.
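As a rough sketch of how a /proc setting maps to an /etc/sysctl.conf entry: the key is the path below /proc/sys with the slashes turned into dots. The helper names proc_to_sysctl and sysctl_conf_line are made up for this illustration, and fs.file-max with a value of 4096 is only an example.

```shell
#!/bin/sh
# Hypothetical helper: turn a /proc/sys path into the key used in
# /etc/sysctl.conf (strip the /proc/sys/ prefix, replace "/" with ".").
proc_to_sysctl() {
    printf '%s\n' "${1#/proc/sys/}" | tr '/' '.'
}

# Print the line you would add to /etc/sysctl.conf for a given setting.
sysctl_conf_line() {
    printf '%s = %s\n' "$(proc_to_sysctl "$1")" "$2"
}

# Example only: the path and value are illustrative.
sysctl_conf_line /proc/sys/fs/file-max 4096
```

After adding such a line to /etc/sysctl.conf, running sysctl -p as root should load the file without a reboot.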

Increasing System Limits
Increasing the Maximum number of file handles and the inode cache
NOTE:
On all current versions of Linux up to and including 2.2.10 and 2.3.9, inode caches DO NOT SHRINK like the file and dentry caches do when your applications need lots of RAM.
This means if you set a really large inode cache, then you can lose a significant amount of RAM over time. On a server machine, this is expected, normal and desired. On a workstation machine, doing a kernel compile when your inode-max is set at a very large number will probably give far too much memory to the inode cache.
Empirical evidence suggests that 40960 entries in the inode cache will use up to 10 megabytes of RAM. Your mileage may vary, and more data is necessary to confirm this number.
Linux 2.0.x - file-max defaults to 1024, so increase the value of /proc/sys/kernel/file-max to something reasonable, like 256 for every 4M of RAM you have; e.g. for a 64M machine, set it to 4096.
The canonical command to change anything in the /proc hierarchy is (as root) echo "newvalue" >/proc/file/that/you/want/to/change, so for this item the command line is
echo "4096" >/proc/sys/kernel/file-max
Also increase /proc/sys/kernel/inode-max to a value roughly 3 to 4 times the number of open files. This is because the number of inodes open is at least one per open file, and often much larger than that for large files.
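The rule of thumb above can be written out as a small calculation. This is only a sketch: the function names are invented here, and the inode-max multiplier of 3 is the low end of the "3 to 4 times" range given in the text.

```shell
#!/bin/sh
# Rule of thumb from the text: 256 file handles per 4 MB of RAM.
suggest_file_max() {
    # $1: RAM in megabytes
    echo $(( $1 / 4 * 256 ))
}

# inode-max at roughly 3 times file-max (the text says 3 to 4 times).
suggest_inode_max() {
    echo $(( $(suggest_file_max "$1") * 3 ))
}

ram_mb=64   # example machine size in megabytes
echo "file-max:  $(suggest_file_max "$ram_mb")"
echo "inode-max: $(suggest_inode_max "$ram_mb")"
```

The printed numbers are suggestions to echo into the file-max and inode-max entries by hand; the script itself changes nothing.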
(the following was written by Tani Hosokawa)
Note: if you increase this beyond 1024, you may also have to edit include/linux/posix_types.h and increase this line:

#define __FD_SETSIZE 1024

That allows for a select to handle 1024 file descriptors. More than that, and stuff may break.

Linux 2.2.x/2.3.x - increase the value of /proc/sys/fs/file-max to something reasonable, like 256 for every 4M of RAM you have; e.g. for a 64M machine, set it to 4096. As above, also increase /proc/sys/fs/inode-max.

Long Answer:
You can use the above technique or modify the constants in the kernel sources. Modifying the sources is usually not the right answer, because the change will not survive a new kernel source tree. One of the best techniques is to add the above commands to /etc/rc.d/rc.local.

The exact number will vary from the above formula based on what you are actually doing with the machine. A file server or web server needs a lot of open files, for instance, but a compute server does not.
Very large memory systems, especially 512 Megabytes or larger, probably should not have more than 50,000 open files and 150,000 open inodes. Of course if you are Mindcraft, this is a cheap and effective way to waste kernel memory.

Linux 2.4.x - ?

Here is another method of increasing these limits from www.linuxraid.org:

Aim: Increase the number of files that may be open simultaneously
Changes to include/linux/fs.h:
increase NR_FILE from 4096 to 65536
increase NR_RESERVED_FILES from 10 to 128
Changes to fs/inode.c:
increase MAX_INODE from 16384 to 262144
Note: MAX_INODE must be at least three times larger than NR_FILE.

Increasing the number of processes/tasks allowed
Linux 2.0.x - The default maximum is 512 tasks, half of which can be used by any single
user. Here's an excerpt from /usr/src/linux/include/linux/tasks.h
#define NR_TASKS 512 /* On x86 Max 4092, or 4090 w/APM configured. */

#define MAX_TASKS_PER_USER (NR_TASKS/2)
#define MIN_TASKS_LEFT_FOR_ROOT 4

Just change the 512 to something higher. You can change MAX_TASKS_PER_USER to
something else as well, although it's a nice precaution against simple process
table attacks. Properly managed systems shouldn't be vulnerable to that
though (you do set your MaxClients and whatnot, don't you?). Don't try to go
above the maximums. Your machine will just keep rebooting and rebooting.
(the preceding was written by Tani Hosokawa)

Linux 2.2.x/2.3.x - Edit /usr/src/linux/include/linux/tasks.h, modify the "NR_TASKS" value and then rebuild and install the kernel. (One person recommended changing NR_TASKS from 512 to 2048, and changing MIN_TASKS_LEFT_FOR_ROOT to 24.)

Linux 2.4.x - ?

Decrease the time before disposing of unused TCP keepalive requests (from linuxraid.org)
Changes to include/net/tcp.h:
decrease TCP_KEEPALIVE_TIME from 2 hours to 5 minutes
Download: http://www.linuxraid.org/tcp-keepalive.patch
Increase the number of TCP/UDP ports that may be used simultaneously (from linuxraid.org)
On 2.2 and 2.4 kernels, the local port range can be changed via sysctl
echo 1024 25000 > /proc/sys/net/ipv4/ip_local_port_range
This makes more local ports available. It is generally not an issue, but in a benchmarking scenario you often need more ports available. A common example is clients running `ab` or `http_load` or similar software.
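To see how many ports a given range actually provides, something like the following sketch may help. The function name count_local_ports is invented for this example, and the optional file argument exists only so the function can be tried on a scratch copy rather than on the live setting.

```shell
#!/bin/sh
# Count how many local ports a "low high" range provides.
# $1 (optional): file holding the range; defaults to the live kernel setting.
count_local_ports() {
    range_file=${1:-/proc/sys/net/ipv4/ip_local_port_range}
    read -r low high < "$range_file"
    echo $(( $high - $low + 1 ))
}

# Example with the range used above, written to a scratch file.
tmp=$(mktemp)
echo "1024 25000" > "$tmp"
count_local_ports "$tmp"
rm -f "$tmp"
```

Called without an argument, the function reads the current range from /proc instead.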
Increasing the amount of memory associated with socket buffers
Increasing the amount of memory associated with socket buffers can often improve performance. NFS in particular, or Apache setups with large buffers configured, can benefit from this.
echo 262143 > /proc/sys/net/core/rmem_max
echo 262143 > /proc/sys/net/core/rmem_default
This will increase the amount of memory available for socket input queues. The "wmem_*" values do the same for output queues.
Note: With 2.4.x kernels, these values are supposed to "autotune" fairly well, and some people suggest changing only the values in the following files instead:

/proc/sys/net/ipv4/tcp_rmem
/proc/sys/net/ipv4/tcp_wmem
There are three values here, "min default max".
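A small sketch of how the three fields can be picked apart in a script. The function name describe_tcp_mem and the sample numbers are illustrative only; read the real values from /proc/sys/net/ipv4/tcp_rmem on your own system.

```shell
#!/bin/sh
# Label the three fields of a tcp_rmem / tcp_wmem style setting.
describe_tcp_mem() {
    # $1: a string such as the contents of /proc/sys/net/ipv4/tcp_rmem.
    # Left unquoted on purpose so the shell splits it into three words.
    set -- $1
    echo "min=$1 default=$2 max=$3"
}

# Example values only, not a recommendation.
describe_tcp_mem "4096 87380 174760"

# On a live system (no root needed to read):
#   describe_tcp_mem "$(cat /proc/sys/net/ipv4/tcp_rmem)"
```

New values are written back the same way as any other /proc setting, as one line with all three numbers.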
Turning off tcp_sack and tcp_timestamps
These reduce the amount of work the TCP stack has to do:
echo 0 > /proc/sys/net/ipv4/tcp_sack
echo 0 > /proc/sys/net/ipv4/tcp_timestamps
This disables "RFC2018 TCP Selective Acknowledgements", and "RFC1323 TCP timestamps"
Increasing shared memory and ipc limits
Some applications, databases in particular, sometimes need large amounts of SHM segments and semaphores. The default limit for the number of shm segments is 128 for 2.2.
This limit is set in a couple of places in the kernel, and requires a modification of the kernel source and a recompile to increase them.

A sample diff to bump them up:

--- linux/include/linux/sem.h.save Wed Apr 12 20:28:37 2000
+++ linux/include/linux/sem.h Wed Apr 12 20:29:03 2000
@@ -60,7 +60,7 @@
int semaem;
};

-#define SEMMNI 128 /* ? max # of semaphore identifiers */
+#define SEMMNI 512 /* ? max # of semaphore identifiers */
#define SEMMSL 250 /* <= 512 max num of semaphores per id */
#define SEMMNS (SEMMNI*SEMMSL) /* ? max # of semaphores in system */
#define SEMOPM 32 /* ~ 100 max num of ops per semop call */
--- linux/include/asm-i386/shmparam.h.save Wed Apr 12 20:18:34 2000
+++ linux/include/asm-i386/shmparam.h Wed Apr 12 20:28:11 2000
@@ -21,7 +21,7 @@
* Keep _SHM_ID_BITS as low as possible since SHMMNI depends on it and
* there is a static array of size SHMMNI.
*/
-#define _SHM_ID_BITS 7
+#define _SHM_ID_BITS 10
#define SHM_ID_MASK ((1<<_SHM_ID_BITS)-1)

#define SHM_IDX_SHIFT (_SHM_ID_BITS)


Theoretically, the _SHM_ID_BITS can go as high as 11. The rule is that _SHM_ID_BITS + _SHM_IDX_BITS must be <= 24 on x86.
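The effect of the bit count can be worked out with shell arithmetic. This sketch assumes that the segment limit is 1 << _SHM_ID_BITS, which matches the default of 128 segments with 7 bits mentioned above; the function names are invented for illustration.

```shell
#!/bin/sh
# Assumption: the number of shm segments is 1 << _SHM_ID_BITS.
shm_segments() {
    echo $(( 1 << $1 ))
}

# Rule from the text: _SHM_ID_BITS + _SHM_IDX_BITS must be <= 24 on x86.
# $1: _SHM_ID_BITS, $2: _SHM_IDX_BITS; succeeds if the pair is allowed.
shm_bits_ok() {
    [ $(( $1 + $2 )) -le 24 ]
}

for bits in 7 10 11; do
    echo "_SHM_ID_BITS=$bits -> $(shm_segments "$bits") segments"
done
```

Each bit added to _SHM_ID_BITS has to come out of _SHM_IDX_BITS to stay within the limit of 24, which is why the value cannot be raised freely.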
In addition to the number of shared memory segments, you can control the maximum size of a single shared memory segment at run time via the /proc interface. /proc/sys/kernel/shmmax shows the current maximum in bytes. Echo a new value to it to increase it.

echo "67108864" > /proc/sys/kernel/shmmax
This doubles the default value.
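Because shmmax is given in bytes, it can be easier to compute the number than to type it. A minimal sketch, with mb_to_bytes as a made-up helper name:

```shell
#!/bin/sh
# Convert megabytes to the byte count that /proc/sys/kernel/shmmax expects.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# 64 MB gives the value used in the example above.
mb_to_bytes 64

# As root, the result could then be applied with:
#   mb_to_bytes 64 > /proc/sys/kernel/shmmax
```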
A good resource on this is Tuning The Linux Kernel's Memory. Linux Maximus: How to Get Maximum Performance from Linux and Oracle also includes some useful information about tuning shm for Oracle, amongst other things.

The best way to see the current values is to issue the command:

ipcs -l
Ptys and ttys
The number of ptys and ttys on a box can sometimes be a limiting factor for things like login servers and database servers.
On Red Hat Linux 7.x, the default limit on ptys is set to 2048 for i686 and athlon kernels. Standard i386 and similar kernels default to 256 ptys.

The config directive CONFIG_UNIX98_PTY_COUNT defaults to 256, but can be set as high as 2048. For 2048 ptys to be supported, the value of UNIX98_PTY_MAJOR_COUNT needs to be set to 8 in include/linux/major.h

With the current device number scheme and allocations, the maximum number of ptys is 2048.

Increasing Thread Limits
Limitations on threads are tightly tied to both file descriptor limits, and process limits.
Under Linux, threads are counted as processes, so any limits on the number of processes also apply to threads. In a heavily threaded app like a threaded TCP engine, or a Java server, you can quickly run out of threads.

The first step to increasing the possible number of threads is to make sure you have boosted any process limits as mentioned before.

There are a few things that can limit the number of threads, including process limits, memory limits, mutex/semaphore/shm/ipc limits, and compiled-in thread limits. In most cases, the process limit is the first one you run into, then the compiled-in thread limits, then the memory limits.

To increase the limits, you have to recompile glibc. Oh fun! And the patch is essentially two lines! Woohoo!


--- ./linuxthreads/sysdeps/unix/sysv/linux/bits/local_lim.h.akl Mon Sep 4 19:37:42 2000
+++ ./linuxthreads/sysdeps/unix/sysv/linux/bits/local_lim.h Mon Sep 4 19:37:56 2000
@@ -64,7 +64,7 @@
/* The number of threads per process. */
#define _POSIX_THREAD_THREADS_MAX 64
/* This is the value this implementation supports. */
-#define PTHREAD_THREADS_MAX 1024
+#define PTHREAD_THREADS_MAX 8192

/* Maximum amount by which a process can descrease its asynchronous I/O
priority level. */
--- ./linuxthreads/internals.h.akl Mon Sep 4 19:36:58 2000
+++ ./linuxthreads/internals.h Mon Sep 4 19:37:23 2000
@@ -330,7 +330,7 @@
THREAD_SELF implementation is used, this must be a power of two and
a multiple of PAGE_SIZE. */
#ifndef STACK_SIZE
-#define STACK_SIZE (2 * 1024 * 1024)
+#define STACK_SIZE (64 * PAGE_SIZE)
#endif

/* The initial size of the thread stack. Must be a multiple of PAGE_SIZE.
* */
Now just patch glibc, rebuild, and install it. ;-> If you have a package-based system, I seriously suggest making a new package and using it.
Two references on how to do this are Jlinux.org and Volano. Both describe how to increase the number of threads so Java apps can use them.
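One way to read the two changes in the patch together is to compare the address space needed for the maximum number of thread stacks before and after. The sketch below assumes a PAGE_SIZE of 4096 bytes (typical for x86; check your platform), and the function name is invented for illustration.

```shell
#!/bin/sh
# Address space needed for thread stacks = threads * stack size.
# Assumption: PAGE_SIZE is 4096 bytes, as on x86.
PAGE_SIZE=4096

stack_space_mb() {
    # $1: number of threads, $2: stack size in bytes
    echo $(( $1 * ($2 / 1024) / 1024 ))
}

# Before the patch: 1024 threads with 2 MB stacks.
echo "before: $(stack_space_mb 1024 $(( 2 * 1024 * 1024 ))) MB"
# After the patch: 8192 threads with 64-page stacks.
echo "after:  $(stack_space_mb 8192 $(( 64 * $PAGE_SIZE ))) MB"
```

Under that assumption both cases come to the same total, so the patch trades stack size per thread for a higher thread count rather than asking for more address space.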

Increasing ulimits or shell limits
OK, so this isn't kernel tuning, but it may be an issue that you have to deal with. Here is how you set up the shell security limits for your application:
"In bash and similar shells you can use these three
commands:
ulimit -a
ulimit -Ha
ulimit -s unlimited
that will respectively print soft limits, hard limits
and remove the stack limit."

The source for this information came from this Usenet article.
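Since the earlier sections raise the kernel-wide file handle limits, the per-process limit on open files is often the next thing to check. A minimal sketch that prints the soft and hard values for the current shell (show_fd_limits is a made-up name):

```shell
#!/bin/sh
# Print the soft and hard limits on open file descriptors for this shell.
show_fd_limits() {
    echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}

show_fd_limits
```

A process can raise its soft limit up to the hard limit with ulimit -n; raising the hard limit itself normally requires root.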

Large File Support
Large file support - support for files greater than 2 GB - is a kernel AND user space issue, meaning that not just the kernel has to be able to support files larger than 2 GB but also the C library (libc or for GNU/Linux glibc) and all file accessing utilities have to support it as well. See the LFS section in the links page for links with detailed information.


Improving System Performance

Tuning (delaying) filesystem cache synchronization (flushing)
Increasing the interval between the kernel's flushes of cached writes to disk reduces the amount of I/O done, at the cost of losing more data if the system were to crash. See the following link from the linuxdoc.org site, from Securing and Optimizing Linux: Red Hat Edition.

Tuning virtual memory system to use less memory on servers with *lots* of memory on 2.0.x or 2.2.x (from Tani Hosokawa)
Memory shortages (even though you've got tons) - for Linux 2.0.x/2.2.x
Sometimes, you'll end up with a situation where the kernel can't seem to find
enough memory to load a program for you, even though you've got tons of
memory. This may be caused by the filesystem buffers using up the extra, and
not having enough memory immediately available. You can often fix this by
modifying the contents of /proc/sys/vm/freepages (the three values are
min_free_pages, free_pages_low, and free_pages_high in case you care -- check
the source for more details). "256 512 768" is common, but often not enough.
I use "1024 2048 3072" usually. That's almost definitely enough memory to load
anything, and with 384 megs of RAM, it's not going to hurt performance by
reducing the amount available for caching.
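To get a feel for how much memory those page counts represent, they can be converted to kilobytes. This is a sketch assuming 4096-byte pages (the usual x86 value); pages_to_kb is an invented helper name.

```shell
#!/bin/sh
# Convert a page count to kilobytes.
# $1: number of pages, $2 (optional): page size in bytes, default 4096.
pages_to_kb() {
    echo $(( $1 * ${2:-4096} / 1024 ))
}

# The larger set of values quoted above.
for pages in 1024 2048 3072; do
    echo "$pages pages = $(pages_to_kb "$pages") KB"
done
```

With 4 KB pages the three thresholds come to 4, 8 and 12 megabytes, which is small next to 384 megs of RAM.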
2005/06/27 18:20
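Every computer's clock runs a little differently.

Even if you set the time again, after a while

it will have drifted slightly.

Checking and correcting it by hand is a chore for the administrator.

You no longer need to set the time by hand. On Linux you can fetch the time from another server

and set the clock from it, as follows.

Open the file /etc/crontab

and insert the following single line (it runs every Monday at midnight):

0 0 * * 1 root rdate -s time.bora.net && clock -w

Here time.bora.net is the server the time is fetched from.

You can also do this from the console:

rdate -s time.bora.net && clock -w <-- type this and press Enter.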



[root@inet /etc]# date
Thu Nov 23 18:45:30 KST 2000
[root@inet /etc]# rdate -s time.bora.net && clock -w
[root@inet /etc]# date
Thu Sep 23 18:45:16 KST 2003
2005/06/27 18:19
When administering a system, you sometimes need to stop it from responding to ping, for security, server load, or other reasons.

In that case, you can disable ping responses, or enable them again, with the following settings.

First, to stop responding to ping, set the value of /proc/sys/net/ipv4/icmp_echo_ignore_all to 1:
ex) echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all

Second, to respond to ping again, set the value of /proc/sys/net/ipv4/icmp_echo_ignore_all to 0:
ex) echo 0 > /proc/sys/net/ipv4/icmp_echo_ignore_all
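The two commands can be wrapped in a small function that rejects anything other than 0 or 1. This is only a sketch: set_ping_ignore is a made-up name, and the optional second argument exists so the function can be tried on a scratch file instead of the live /proc entry, which requires root.

```shell
#!/bin/sh
# Turn ping replies off (1) or back on (0).
# $1: 0 or 1, $2 (optional): target file, defaults to the live /proc entry.
set_ping_ignore() {
    target=${2:-/proc/sys/net/ipv4/icmp_echo_ignore_all}
    case $1 in
        0|1) echo "$1" > "$target" ;;
        *)   echo "usage: set_ping_ignore 0|1 [file]" >&2
             return 1 ;;
    esac
}

# Demonstration on a scratch file rather than the real setting.
tmp=$(mktemp)
set_ping_ignore 1 "$tmp"
cat "$tmp"
rm -f "$tmp"
```

To keep the setting across reboots, the same value can go in /etc/sysctl.conf under the key net.ipv4.icmp_echo_ignore_all.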
2005/06/27 18:18