2017-05-23

awk

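Contents: basic awk command format, printing line by line, field separators, built-in variables, processing several files at once, BEGIN & END, variables, conditions, regular expressions, awk scripts, splitting files, finding the longest line, counting, worked examples…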

1. Basic format of the awk command

$ awk [options] 'program' file
- options: optional command-line arguments
- program: the awk program (script code) to run
- file: the file to process
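
As a minimal sketch of how the three parts fit together, the following runs a one-line program with the -F option against a small temporary file. The sample data and the variable names are made up for illustration.

```shell
# Hypothetical sample data written to a temporary file.
tmp=$(mktemp)
printf 'alice:90\nbob:85\n' > "$tmp"

# options = -F ':'    program = '{print $1}'    file = "$tmp"
names=$(awk -F ':' '{print $1}' "$tmp")
echo "$names"

rm -f "$tmp"
```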

2. Printing line by line

#1.
$ awk '{print $0}' score.txt

#2.
$ awk '{print}' score.txt

#3.
$ awk '//{print $0}' score.txt

3. Field separators

3.1 Use ':' as the field separator (by default awk splits fields on spaces and tabs)

$ awk -F ':' '{print $2}' /etc/passwd

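3.2 Use several separator characters at once with a bracket expression such as '[:()]'

$ awk -F '[:()]' '{print $1,$2}' score.txt

# -F also accepts alternation. For example, splitting text like  "id":"123456", "title":"hello world"....  on the words id and title leaves the id value (with its surrounding punctuation) in $2
$ awk -F 'id|title' '{print $2}' menu.txt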
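
A small sketch of how a bracket-expression separator splits a line. The input line is invented for illustration. Because every single separator character ends a field, two separators in a row produce an empty field; adding `+` makes a run of separators count as one.

```shell
line='Amit(Math):80'

# ")" directly followed by ":" leaves an empty $3, so the score lands in $4.
a=$(echo "$line" | awk -F '[:()]' '{print $1, $2, $4}')

# With "+", the run "):" is a single separator and the score is $3.
b=$(echo "$line" | awk -F '[:()]+' '{print $1, $2, $3}')

echo "$a"
echo "$b"
```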

3.3 Print the first, second, … field of each line

awk '{print $1,$2,$3}' score.txt

3.4 Insert separators into the output

$ awk '{print $1,"\t","\t",$2,$3,"\n"}' score.txt

3.5 Set the output field separator to "**"

awk -F ':' -v OFS='**' '{print $1,$2,$3,$4}' score.txt
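
A short sketch, with made-up input, of when OFS actually takes effect: it is inserted between the comma-separated arguments of print, or when awk rebuilds the line after a field assignment. A plain print of an unmodified line ignores it.

```shell
# OFS is placed between comma-separated print arguments.
a=$(echo 'a:b:c' | awk -F ':' -v OFS='**' '{print $1, $2, $3}')

# Assigning to a field (even $1=$1) rebuilds $0 using OFS.
b=$(echo 'a:b:c' | awk -F ':' -v OFS='**' '{$1=$1; print}')

# Without a field assignment the original line is printed unchanged.
c=$(echo 'a:b:c' | awk -F ':' -v OFS='**' '{print}')

echo "$a"; echo "$b"; echo "$c"
```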

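3.6 Print every field except the first:

#1. Blank out $1; the rebuilt line keeps a leading separator
$ awk '{$1="";print}' test 
#2. Delete the first match of $1; note that $1 is used as a regular expression here
$ awk 'sub($1,"")' test
#3. Loop over fields 2..NF, separated by a space
$ awk '{for(i=2;i<=NF;i++) {if(i<NF){printf "%s ", $i} else{print $i}}}' test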
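
The approaches differ in whitespace handling. The sketch below, on an invented input line, shows that blanking $1 leaves a leading space, and two ways to avoid it.

```shell
# Blanking $1 rebuilds the line as "" OFS "y" OFS "z", so a leading space remains.
a=$(echo 'x y z' | awk '{$1=""; print}')

# Strip that single leading space after the rebuild.
b=$(echo 'x y z' | awk '{$1=""; sub(/^ /, ""); print}')

# Or print fields 2..NF explicitly, ending the last one with the record separator.
c=$(echo 'x y z' | awk '{for (i = 2; i <= NF; i++) printf "%s%s", $i, (i < NF ? OFS : ORS)}')

echo "[$a]"; echo "[$b]"; echo "[$c]"
```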

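4. Using awk's built-in variables

$0 is the current line (record) being processed

$1 is the 1st field of the line after it has been split

$2 is the 2nd field of the line after it has been split

$3 is the 3rd field of the line after it has been split

$n is the n-th field of the line after it has been split

NR is the number of the current line (record); it keeps counting across all input files

NF is the number of fields in the current line, much like the number of columns in a row of a MySQL table

FS is the input field separator; it defaults to spaces and tabs and can be set to something else

OFS is the output field separator; it defaults to a single space and can also be changed

FILENAME is the name of the file currently being read, also when several files are processed in one run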
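
To see several of these variables together, here is a minimal sketch on invented input that sets FS and OFS in a BEGIN block and prints NR, NF, the first field and the last field of each line.

```shell
out=$(printf 'a:b:c\nd:e\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1, $NF }')
echo "$out"
```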

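Print line numbers with the built-in variable NR

$ awk '{print NR , $1}' score.txt

The built-in variable NF holds the number of fields in each line

$ awk '{print NF}' score.txt

Using NF to print the last field, and NR to select a range of lines

$ awk '{print $NF}' score.txt
$ awk '{if(NR>90 && NR <108)print $0}' LinuxStructuredCommands.md > ttt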

5. Processing several files at once

awk '{print $0}' score.txt students.txt
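
When several files are given, NR keeps counting across them. The sketch below also uses FNR, a standard awk variable not covered in these notes, which restarts at 1 for each file. The file names and contents are made up.

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n'    > "$dir/two.txt"

# FILENAME = current file, FNR = line number within it, NR = overall line number.
out=$(cd "$dir" && awk '{print FILENAME, FNR, NR, $0}' one.txt two.txt)
echo "$out"

rm -rf "$dir"
```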

6. The BEGIN keyword

The block after BEGIN runs exactly once, before awk reads the first line of input

awk 'BEGIN {print "hello world!\n"} {print $1,$2}' score.txt

7. The END keyword

END is the counterpart of BEGIN: its block runs only after awk has read and processed every line of input

$ awk 'BEGIN {print "hello world!\n"} END {print "\nGood bye....."} {print $1,$2}' score.txt
$ awk 'BEGIN {print "hello world!\n"} {print $1,$2} END{print "\nGood bye....."}' score.txt
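
The position of the blocks inside the program does not change when they run. A small sketch with invented input, where the blocks are deliberately written in the "wrong" order:

```shell
# BEGIN still runs first and END last, whatever order they are written in.
out=$(printf 'x\ny\n' | awk 'END {print "end", NR} {print "line", $0} BEGIN {print "begin"}')
echo "$out"
```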

8. Using variables in awk; note the ';' that separates statements

awk '{msg="hello world!";print msg}' score.txt
awk 'BEGIN{msg="hello world!";print msg}' score.txt
awk 'BEGIN{msg="hello world!"} {print msg}' score.txt
awk 'BEGIN{a=1;b=2} {print a,"+",b,"=",a+b}' score.txt
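
Variables can also be handed in from the shell with the standard -v option instead of being assigned inside the program. A minimal sketch; the variable names and the data are assumptions for illustration only.

```shell
threshold=80

# -v min=... defines the awk variable min before the program starts.
out=$(printf 'Amit 78\nRahul 87\n' | awk -v min="$threshold" '$2 > min {print $1}')
echo "$out"
```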

9. Using conditions in awk

$ awk '$5>80 {print $0}' score.txt

$ awk '$5==80 {print $0}' score.txt

$ awk '{if($5>80) print $0}' score.txt

$ awk '$5>80 && $4=="History" {print $0}' score.txt

$ awk '{printf"%20s ：%-15s %-10s\n",FILENAME,$4,$5}' score.txt
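
An if can also take an else branch. The following sketch, on invented two-column data, labels each line instead of filtering it.

```shell
out=$(printf 'Amit 78\nRahul 87\n' | awk '{
    if ($2 >= 80)
        grade = "pass"
    else
        grade = "retry"
    print $1, grade
}')
echo "$out"
```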

9.1 Using a for loop in awk, for example to print every field of the lines containing source, one field per line

$ awk '/source/{for(i=1;i<=NF;i++)print $i}' linux_command

10. Using regular expressions in awk

# Print the lines that match Maths
$ awk '/Maths/{print}' score.txt
$ awk '$0~/Maths/{print}' score.txt
$ awk '{if($0~/Maths/) print $0}' score.txt

# Print the lines that do not contain Maths
$ awk '$0!~/Maths/{print}' score.txt
$ awk '{if($0!~/Maths/) print $0}' score.txt
$ grep -v 'Maths' score.txt
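
Two variations that are easy to get wrong, sketched on invented data: matching against a single field rather than the whole line, and taking the pattern from a variable. A bare word after ~ is read as a variable name, so the pattern has to be written as /.../, as a quoted string, or be stored in a variable first.

```shell
data=$(printf 'Amit Maths 80\nRahul History 90\nMathsFan Physics 70\n')

# Anchored match on the second field only.
a=$(echo "$data" | awk '$2 ~ /^Maths$/ {print $1}')

# Pattern passed in as the (hypothetical) variable pat and used as a dynamic regex.
b=$(echo "$data" | awk -v pat='Maths' '$0 !~ pat {print $1}')

echo "$a"; echo "$b"
```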

13. Running an awk script

$ awk -f calculate.awk score.txt
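
A minimal sketch of the same mechanism with a tiny script file. The script name sum.awk and its contents are hypothetical and exist only for this illustration.

```shell
dir=$(mktemp -d)

# Write a small awk program to a file.
cat > "$dir/sum.awk" <<'EOF'
# sum.awk (hypothetical): add up the second column
{ total += $2 }
END { print "total:", total }
EOF

# -f reads the program from the file instead of the command line.
out=$(printf 'a 1\nb 2\n' | awk -f "$dir/sum.awk")
echo "$out"

rm -rf "$dir"
```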

14. Splitting a file (with output redirection): for example, write each student's scores to a separate file named after the student

$ awk '{print >$1}' score.txt
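
When the input contains many distinct names, some awk implementations may hit a limit on simultaneously open files. One possible way around that, sketched here on invented data with a made-up .out suffix, is to append with >> and close each file after writing.

```shell
dir=$(mktemp -d)
printf 'Amit 80\nRahul 90\nAmit 78\n' > "$dir/score.txt"

(
  cd "$dir" || exit 1
  # Build the file name in a variable, append the line, then release the file.
  awk '{ out = $1 ".out"; print >> out; close(out) }' score.txt
)

amit=$(cat "$dir/Amit.out")
rahul=$(cat "$dir/Rahul.out")
echo "$amit"; echo "$rahul"

rm -rf "$dir"
```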

15. Find the lines in a file that are longer than 100 characters

$ awk 'length>100' calculate.awk
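
A related task is finding the single longest line. A minimal sketch on invented input, which remembers the longest line seen so far and reports it at the end:

```shell
out=$(printf 'ab\nabcdef\nabc\n' | awk 'length($0) > max { max = length($0); line = $0 } END { print max, line }')
echo "$out"
```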

16. Counting lines

$ awk '{sum+=1} END {print sum}' score.txt
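
The same accumulate-then-report pattern works for sums and averages. In this sketch on invented data, NR already holds the line count in the END block, so no separate counter is needed.

```shell
# Sum the second column, then print the line count and the average.
out=$(printf 'Amit 80\nRahul 90\nShyam 85\n' | awk '{ sum += $2 } END { if (NR > 0) printf "%d %.2f\n", NR, sum / NR }')
echo "$out"
```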

17. awk example: totalling and printing student scores

# Original contents of score.txt:
Amit 1001 80 90 78 100
Rahul 1002 90 89 87 89
Shyam 1003 87 78 96 94
Kedar 1004 85 85 89 83
Hari 1005 69 94 78 90


# The calculate.awk script:
#!/usr/bin/awk -f
#before calculate
BEGIN{
	math=0;
	english=0;
	computer=0;
	history=0;
	printf"----------------------------------------------------------------------------\n";
	printf("%-10s %10s %10s %10s %10s %10s %10s\n","Name","NO.","Math","English","Computer","History","Total")
	printf"----------------------------------------------------------------------------\n"

}
#calculating
{
	math+=$3
	english+=$4
	computer+=$5
	history+=$6
	printf("%-10s %10s %10d %10d %10d %10d %10d\n",$1,$2,$3,$4,$5,$6,$3+$4+$5+$6)

}

#after calculating
END{
	printf"----------------------------------------------------------------------------\n"
	printf("%-21s %10d %10d %10d %10d\n","Total",math,english,computer,history)	
	printf("%-21s %10.2f %10.2f %10.2f %10.2f\n","Average",math/NR,english/NR,computer/NR,history/NR)	
	
}

Output:

$ awk -f calculate.awk score.txt 
---------------------------------------------------------------------------
Name              NO.       Math    English   Computer    History      Total
----------------------------------------------------------------------------
Amit             1001         80         90         78        100        348
Rahul            1002         90         89         87         89        355
Shyam            1003         87         78         96         94        355
Kedar            1004         85         85         89         83        342
Hari             1005         69         94         78         90        331
----------------------------------------------------------------------------
Total                        411        436        428        456
Average                    82.20      87.20      85.60      91.20

Write each student's scores to a separate file:

$ awk '{print > $1}' score.txt 
$ ls
Amit  calculate.awk  command  command.md  Hari  Kedar  Rahul  score.txt  Shyam  students.txt
$ cat Amit 
Amit 1001 80 90 78 100
$ cat Hari 
Hari 1005 69 94 78 90
$ cat Kedar 
Kedar 1004 85 85 89 83
$ cat Rahul 
Rahul 1002 90 89 87 89
$ cat Shyam 
Shyam 1003 87 78 96 94

18. Find the largest file under a directory

ls -Rl | awk 'BEGIN{max=0;s=""}{if($5-max>0){s=$0;max=$5}}END{print s}'

19. Print the first 100 characters of each line

awk '{print substr($0,1,100)}'

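20. awk built-in functions substr(), split(), length(), index()

substr(str,start,n): returns the n characters of str that begin at position start (positions count from 1)

# Print characters 5-7 of each line
$ echo "hello world" | awk '{print substr($0,5,3)}'
o w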

split(str, array, fieldSeparator): splits str into array; if the third argument is omitted, the current value of FS is used

time="12:34:56:78:90"
echo "$time" | awk '{split($0,s,":");print s[1],s[2],s[3],s[4],s[5]}'
12 34 56 78 90
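
split() also returns the number of pieces, which avoids hard-coding the indices. A minimal sketch on invented input; the array name parts is arbitrary.

```shell
# n receives the element count; array indices start at 1.
out=$(echo '12:34:56' | awk '{ n = split($0, parts, ":"); for (i = 1; i <= n; i++) printf "%d=%s\n", i, parts[i] }')
echo "$out"
```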

length(str): the length of the string str

$ echo "hello world" | awk '{print length($0)}'
11

index(s,t): returns the position of the first occurrence of t in s, or 0 if t does not occur

$ echo "hello world!" | awk '{print index($0,"wo")}'

# From data.log, take the 'filter item' log lines (each one carries the complete item), extract the publishTime value, and print the lines whose value matches "2018-12-1."
$ grep --color 'filter item' data.log |awk '{s=index($0,"publishTime");pt=substr($0,s+12,20); if(match(pt,"2018-12-1.")) print $0}'
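
The index() plus substr() combination can be tried on a single line first. The log line below is invented, and its key=value layout is an assumption; the real data.log format is not shown in these notes. Checking for 0 guards against lines that lack the key.

```shell
line='filter item id=42 publishTime=2018-12-15 08:00:00 ok'

# "publishTime" is 11 characters, so s+11 is the "=" and s+12 the first digit.
pt=$(echo "$line" | awk '{ s = index($0, "publishTime"); if (s > 0) print substr($0, s + 12, 10) }')

# index() returns 0 when the substring is absent.
missing=$(echo 'no timestamp here' | awk '{ print index($0, "publishTime") }')

echo "$pt"; echo "$missing"
```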

gsub:
asorti:
gensub:
patsplit:
tolower:
toupper:

21. Arrays

order.txt:

Java 10
Python 13
Java 11
Golang 2
C 10
Python 12
Shell 11
C 12
Golang 11
C 12
C++ 23

Sum the values for each key:

$ awk '{map[$1]+=$2}END{for(k in map) print k,map[k]}' order.txt 
C 34
Golang 13
Shell 11
C++ 23
Python 25
Java 21
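
The order in which for (k in map) visits the keys is not specified, so the listing can come out in any order. If a stable order matters, one simple option is to pipe the result through sort, sketched here on a shortened, invented sample.

```shell
# Sum per key, then sort the report by key for a deterministic order.
out=$(printf 'Java 10\nPython 13\nJava 11\nC 10\n' \
  | awk '{ map[$1] += $2 } END { for (k in map) print k, map[k] }' \
  | LC_ALL=C sort)
echo "$out"
```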