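Construction of an environmental management domain lexicon based on new word discovery and its empirical application
Cite this article: WANG Zhijun, CHANG Miao, ZHOU Li, GUO Peikun, GU Meifeng. Construction of an environmental management domain lexicon based on new word discovery and its empirical application[J]. Journal of Environmental Engineering Technology, 2021, 11(2): 385-392. doi: 10.12153/j.issn.1674-991X.20200127
Authors: WANG Zhijun  CHANG Miao  ZHOU Li  GUO Peikun  GU Meifeng
Affiliations: 1. School of Environment, Tsinghua University; 2. Service Center of Environmental Information and Technology Assessment, Panzhihua Bureau of Ecology and Environment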
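Abstract: As the number of environmental policies and regulations in China keeps growing, collating, summarizing, analyzing, and interpreting them purely by hand has become increasingly difficult. Using computer techniques such as text mining to support information extraction, content analysis, and intelligent management of environmental policies and regulations is therefore of great value. Accurate word segmentation is a prerequisite for every text mining function. To improve the segmentation of policy and regulation texts, environmental policy and regulation texts published on the official websites of ecological and environmental departments at all levels in China were used as the corpus, and an environmental management domain lexicon was built by combining a new word discovery algorithm with manual supplementation and correction. The empirical results show that adding the domain lexicon raises the segmentation accuracy of policy and regulation texts from 72.6% to 94.1% and lowers the misclassification rate of support vector machine based automatic classification of these texts by 22.7%. In addition, word frequency statistics and keyword extraction performed with the lexicon provide more comprehensive and more timely statistical information for the analysis of environmental policies and regulations.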

Keywords: new word discovery; environmental policy; domain lexicon; text mining
Received: 2020-05-22

Development of environmental management lexicon based on new word discovery and its empirical application
WANG Zhijun, CHANG Miao, ZHOU Li, GUO Peikun, GU Meifeng. Development of environmental management lexicon based on new word discovery and its empirical application[J]. Journal of Environmental Engineering Technology, 2021, 11(2): 385-392. doi: 10.12153/j.issn.1674-991X.20200127
Authors:WANG Zhijun  CHANG Miao  ZHOU Li  GUO Peikun  GU Meifeng
Affiliation: 1. School of Environment, Tsinghua University; 2. Service Center of Environmental Information and Technology Assessment, Panzhihua Bureau of Ecology and Environment
Abstract: With the rapid growth in the number of environmental policies in China, collating, summarizing, analyzing, and interpreting a large body of policies and regulations purely by hand has become increasingly difficult. It is therefore of great value to use computer technologies such as text mining to support intelligent environmental policy management and environmental policy analysis, including information extraction and text analysis. Accurate word segmentation, or tokenization, is the basis of all text mining functions. To improve the segmentation of policy texts, the environmental policies published on the official websites of China's ecological and environmental departments at all levels were collected and used as the corpus. A new word discovery algorithm, combined with manual supplementation and correction, was used to build a professional lexicon for environmental management. The empirical results showed that adding the environmental lexicon improved the accuracy of environmental policy segmentation from 72.6% to 94.1% and reduced the misclassification rate of automatic policy classification based on a support vector machine by 22.7%. In addition, word frequency statistics and keyword extraction performed with the lexicon provided more comprehensive and more timely statistical information for environmental policy analysis.
Keywords:new word discovery  environmental policy  lexicon  text mining
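
The effect of a domain lexicon on segmentation can be illustrated with a minimal sketch. The abstract does not say which segmenter the authors used, so the example below substitutes forward maximum matching, a simple dictionary-based method, purely for illustration. The function name `segment`, the two small lexicons, and the sample phrase are all hypothetical and are not taken from the paper.

```python
def segment(text, lexicon, max_len=8):
    """Forward maximum matching (an illustrative stand-in, not the paper's method).

    At each position, take the longest substring found in the lexicon;
    fall back to a single character when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical general-purpose lexicon without the domain term.
general_lexicon = {"排污", "许可", "管理", "条例"}
# Hypothetical domain lexicon: the same entries plus one environmental term.
domain_lexicon = general_lexicon | {"排污许可"}

text = "排污许可管理条例"
base_tokens = segment(text, general_lexicon)
domain_tokens = segment(text, domain_lexicon)

print(base_tokens)
print(domain_tokens)
```

With only the general lexicon, the domain term "排污许可" is split into two generic words; once the term is in the lexicon it is kept as one token, which is the kind of change that downstream word frequency statistics and classification features depend on.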