Automatically splitting (rotating) nginx access logs

admin · Posted on 1/5/2015 8:53:57 PM

The web access log (access_log) records every request that external clients make to the web server, including important information such as the client IP, the access date, the URL requested, and the HTTP status code returned by the server.
A typical web access log entry looks like this:

112.97.37.90 - - [14/Sep/2013:14:37:39 +0800] "GET / HTTP/1.1" 301 5 "-" "Mozilla/5.0 (Linux; U; Android 2.3.6; zh-cn; Lenovo A326 Build/GRK39F) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 MicroMessenger/4.5.1.259" -

Planning:

1. The problem to solve:

When a website gets a lot of traffic, it produces a lot of log data, and if everything is written to a single log file, that file keeps growing. Large files are slow to work with: once a log reaches several hundred megabytes, writing to it can slow things down, and downloading or opening it to look at the access records is slow as well. There is a free third-party log analysis tool, Log Bao, that accepts uploaded log files from nginx, apache, and iis and helps analyze the security of a website; a specialized tool does this better than doing it by hand. Log Bao also limits the size of uploaded files to 50 MB.

2. nginx has no built-in mechanism for automatically splitting its log into separate files, so you need to write your own script to do it.

The shell script file nginx_log_division.sh has the following contents:

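#!/bin/bash

# Directory that holds the nginx logs.
logs_path="/data/wwwlogs/"

# The previous log file.
log_name="xxx.log"

pid_path="/usr/local/nginx/logs/nginx.pid"

# Archive the log under last week's date (needs GNU date).
mv "${logs_path}${log_name}" "${logs_path}${log_name}_$(date --date="LAST WEEK" +"%Y%m%d").log"

# Ask the nginx master process to reopen its log files.
kill -USR1 `cat ${pid_path}`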

The script first moves and renames the previous log file; the purpose is to keep it as a backup.

The backup is named after the date one week earlier, that is, the previous Monday: if the script runs on "2013-09-16", the generated file name is "xxx.log_20130909.log".
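To check what name the script would produce without waiting for Monday, the date arithmetic can be tried against a fixed day. This is a small sketch that assumes GNU date (the same --date option the script relies on) and the compact %Y%m%d stamp seen in the file name above; run_day and archive_name are names made up for the illustration.

```shell
# Hypothetical check: pretend the script runs on 2013-09-16.
run_day="2013-09-16"

# GNU date accepts a fixed date followed by a relative offset.
stamp=$(date --date="${run_day} last week" +"%Y%m%d")

archive_name="xxx.log_${stamp}.log"
echo "${archive_name}"   # prints xxx.log_20130909.log
```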

Even though the mv command has already renamed the file by the time kill -USR1 `cat ${pid_path}` is executed, nginx keeps writing log data to the renamed file "xxx.log_20130909.log" as usual until it receives the signal. The reason is that on Linux the kernel locates an open file through its file descriptor, not through its path.

---------------- understanding of Linux file descriptors

A file descriptor is an integer identifier that the Linux kernel assigns to each open file.

The Linux kernel maintains a "file descriptor table" for every process. This table records the files that the process has opened.

In this setup, nginx is a running process that has already opened the log file, so the file is recorded in its file descriptor table.

Even if the path of the log file changes, nginx can still reach the file, because it is located through the file descriptor table rather than by name.
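The effect is easy to reproduce in a shell, which has a file descriptor table of its own. The sketch below is only an illustration with made-up file names in a temporary directory: it opens a file on descriptor 3, renames the file, and keeps writing through the same descriptor.

```shell
demo_dir=$(mktemp -d)                 # scratch directory, not a real log path

exec 3>>"${demo_dir}/xxx.log"         # open the log for appending on descriptor 3
echo "written before mv" >&3

mv "${demo_dir}/xxx.log" "${demo_dir}/xxx.log_old"
echo "written after mv" >&3           # same descriptor, so the same file

exec 3>&-                             # close descriptor 3
renamed_content=$(cat "${demo_dir}/xxx.log_old")
echo "${renamed_content}"

# Nothing recreated the original path.
if [ -e "${demo_dir}/xxx.log" ]; then recreated=yes; else recreated=no; fi
rm -rf "${demo_dir}"
```

Both lines end up in xxx.log_old and no new xxx.log appears, which is why nginx has to be told to reopen its log after the rename.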

----------------------------------------------

In the command "kill -USR1 `cat ${pid_path}`", the nginx.pid file simply contains a number (open it and take a look; mine is 894). nginx writes the pid (process ID) of its master process to nginx.pid, so the cat command returns the master process ID and the signal can be sent straight to that process.

kill -USR1 `cat ${pid_path}` is equivalent to

kill -USR1 894 # send the USR1 signal to the process with this pid
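If the pid file is missing or stale, the backtick form hands kill an empty or wrong argument. A slightly more defensive variant might look like the sketch below. read_live_pid is a hypothetical helper name, not something nginx provides, and kill -0 only tests whether the process can be signalled without actually sending a signal.

```shell
# Hypothetical helper: print the pid stored in a pid file, but only if it
# is a plain number and the current user can signal a process with that pid.
read_live_pid() {
    pid_file=$1
    [ -r "${pid_file}" ] || return 1
    pid=$(tr -d '[:space:]' < "${pid_file}")
    case "${pid}" in
        ''|*[!0-9]*) return 1 ;;      # empty, or not purely digits
    esac
    kill -0 "${pid}" 2>/dev/null || return 1
    echo "${pid}"
}

# Usage in the rotation script (left commented out, it needs a running nginx):
# pid=$(read_live_pid /usr/local/nginx/logs/nginx.pid) && kill -USR1 "${pid}"
```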

On Linux, signals are a way to communicate with running processes. There are many predefined signals, such as SIGHUP. USR1 is a user-defined signal: the system attaches no fixed meaning to it, so the author of a program decides what the process does when it receives one. nginx has its own handler that reopens the log files when it receives a USR1 signal. It works as follows:

1. The nginx master process receives the USR1 signal and reopens the log file (using the name from the nginx configuration file, that is, the value of the access_log directive; if the file does not exist, a new xxx.log is created automatically).

2. It then changes the owner of the log file to the user that the worker processes run as, so that the workers have read and write permission on it (master and worker usually run as different users, so the owner needs to be changed).

3. The master process closes the old log file (the one that was just renamed to xxx.log_20130909.log with the mv command) and tells the worker processes to use the newly opened one (the xxx.log that the master has just opened). In more detail, the master sends the USR1 signal to each worker, and on receiving it the worker reopens the log file (that is, the xxx.log named in the configuration file).
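The sequence above can be imitated with a small bash script standing in for nginx. This is only a toy model under stated assumptions (a scratch directory, a single process, descriptor 3 as the log), not nginx's actual code, and it leaves out the ownership change in step 2. It installs a USR1 handler that reopens the log by path, then goes through rename, signal, and reopen.

```shell
#!/bin/bash
work_dir=$(mktemp -d)                 # hypothetical scratch directory
log_file="${work_dir}/xxx.log"

# Handler: reopen the log by its configured path on descriptor 3.
# Opening for append creates the file if it no longer exists.
reopen_log() {
    exec 3>>"${log_file}"
}
trap reopen_log USR1

reopen_log
echo "request 1" >&3

mv "${log_file}" "${work_dir}/xxx.log_20130909.log"
echo "request 2" >&3                  # still goes to the renamed file

kill -USR1 $$                         # signal this shell; the handler runs next
echo "request 3" >&3                  # goes to the freshly created xxx.log

archived_lines=$(grep -c "" "${work_dir}/xxx.log_20130909.log")
current_lines=$(grep -c "" "${log_file}")
echo "archived: ${archived_lines} lines, current: ${current_lines} lines"

trap - USR1
exec 3>&-
rm -rf "${work_dir}"
```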

=================================== Execute scripts at regular intervals

Add the shell script above to the scheduled tasks. crontab is the scheduled-task facility on Linux. Its daemon starts when the system boots and checks its task list at regular intervals to see whether any task is due.

crontab -e

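0 4 * * 1 /data/wwwlogs/nginx_log_division.sh

Running crontab -e opens a file; add the line above to it.

The format is "minute hour day-of-month month day-of-week path of the shell file to execute". * can be understood as "every": every minute, every hour, every month, and so on. The minute field is set to 0 so that the script runs only once; with * in that position it would run every minute between 4:00 and 4:59.

I set the script nginx_log_division.sh to run at 4 a.m. every Monday; what the script does is start a fresh log file.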
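For servers that are set up by script rather than by hand, the entry can also be added without opening an editor. The sketch below assumes the job is meant to fire once at 04:00 on Mondays (minute field 0); with_rotation_job is a hypothetical helper name. It only transforms crontab text on stdin, so running it changes nothing until its output is piped into crontab.

```shell
# Assumed schedule: minute 0, hour 4, any day of month, any month, Monday.
cron_line='0 4 * * 1 /data/wwwlogs/nginx_log_division.sh'

# Hypothetical helper: read an existing crontab on stdin and print it
# with cron_line present exactly once, at the end.
with_rotation_job() {
    # Drop any identical existing line; grep exits non-zero when it
    # prints nothing, which is fine here.
    grep -vxF -e "${cron_line}" || true
    echo "${cron_line}"
}

# Usage (left commented out because it rewrites the current user's crontab):
# crontab -l 2>/dev/null | with_rotation_job | crontab -
```

Because the helper removes an identical line before appending, running it twice does not duplicate the job.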

Appendix: how nginx logging is configured

log_format site '$remote_addr - $remote_user [$time_local] "$request" '

'$status $body_bytes_sent "$http_referer" '

'"$http_user_agent" $http_x_forwarded_for';

access_log /data/wwwlogs/xxxx.com.log site;

# The second parameter selects the log format. Each log format is given a name, and "site" corresponds to the name defined in log_format.
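With this format, the fields of a log line can be picked out with standard tools. The sketch below runs awk over the sample line from the top of this post; it assumes $remote_user contains no spaces, since the field positions would shift otherwise.

```shell
# The sample line from the top of this post, in the "site" format.
line='112.97.37.90 - - [14/Sep/2013:14:37:39 +0800] "GET / HTTP/1.1" 301 5 "-" "Mozilla/5.0 (Linux; U; Android 2.3.6; zh-cn; Lenovo A326 Build/GRK39F) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 MicroMessenger/4.5.1.259" -'

# Whitespace-separated fields: 1 = $remote_addr, 7 = request path,
# 9 = $status, 10 = $body_bytes_sent.
summary=$(printf '%s\n' "${line}" | awk '{print $1, $7, $9, $10}')
echo "${summary}"   # prints 112.97.37.90 / 301 5
```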

The steps above rely on the crontab scheduled-task manager.

Some parts of this post may be incompletely understood or mistaken. I hope to update it in the future.

johnyoung · Posted on 1/6/2015 12:04:30 AM

Oh, can reply make money?

admin · Posted on 1/6/2015 12:06:16 AM

johnyoung posted on 2015-1-6 00:04
Oh, can reply make money?

It can have prestige

johnyoung · Posted on 1/6/2015 12:13:44 AM

admin Posted on 2015-1-6 00:06
It can have prestige

What is prestige for?
