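The text used for keyword extraction is the first ten chapters of the novel 完美世界 (Perfect World).
I merged the ten chapters into a single file beforehand,
then called the keyword-extraction function directly: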
import sys
sys.path.append('../')  # lets Python find modules in the parent directory

import jieba
import jieba.analyse  # keyword-extraction module

data_path = "C:\\Users\\wangyuguang\\Desktop\\work_data\\profect_world\\"
topK = 10
withWeight = False

# he.txt holds the first ten chapters merged into one file
file_path = data_path + "he" + ".txt"
# Read raw bytes; jieba decodes them itself (UTF-8 first, then GBK)
with open(file_path, 'rb') as f:
    content = f.read()
# print(content)

tags = jieba.analyse.extract_tags(content, topK=topK, withWeight=withWeight)  # call it directly

if withWeight is True:
    # with weights, each tag is a (word, weight) pair
    for tag in tags:
        print("tag: %s\t\t weight: %f" % (tag[0], tag[1]))
else:
    print(",".join(tags))
Keyword results:
Building prefix dict from the default dictionary ...
Loading model from cache c:\users\wangyuguang\appdata\local\temp\jieba.cache
Loading model cost 0.386 seconds.
Prefix dict has been built succesfully.
小不点,孩子,族长,石云峰,石村,凶禽,青鳞鹰,凶兽,一群,石昊
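
The merge step mentioned at the start (combining the ten chapters into one file before extraction) is not shown in the post. A minimal sketch of how it might be done follows. The chapter file names (chapter_01.txt and so on) and the helper name merge_chapters are assumptions made for illustration; only the output name he.txt comes from the script above.

```python
from pathlib import Path


def merge_chapters(chapter_dir, output_name="he.txt", pattern="chapter_*.txt"):
    """Concatenate chapter files into one file and return its path.

    The 'chapter_*.txt' naming scheme is hypothetical; adjust the
    pattern to match however the chapters are actually stored.
    """
    chapter_dir = Path(chapter_dir)
    output_path = chapter_dir / output_name
    # Sort so chapters are joined in a stable order; zero-padded names
    # (chapter_01 ... chapter_10) sort correctly as plain strings.
    chapter_files = sorted(
        p for p in chapter_dir.glob(pattern) if p != output_path
    )
    # Work on raw bytes so no assumption is made about the text encoding.
    parts = [p.read_bytes() for p in chapter_files]
    output_path.write_bytes(b"\n".join(parts))
    return output_path
```

Calling merge_chapters on the folder that holds the chapter files would write he.txt next to them, which the extraction script can then read as before. Joining with a newline keeps the last line of one chapter from running into the first line of the next.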