
NLP model Bert: from introduction to mastery (2)

2020-11-06 01:22:30 Elementary school students in IT field

Named entity recognition

First, install the corresponding bert-base module:

pip install bert-base==0.0.9 -i https://pypi.python.org/simple

Alternatively, you can install it manually by following the instructions on the official website.
What the package currently supports:
1. Training named entity recognition models
2. A client/server (C/S) service for named entity recognition
3. All the BERT services of the excellent open-source project bert_as_service (by hanxiao)
4. A text classification service
More features will be added over time.

Training a named entity recognition model from the command line:

Installing bert-base generates two command-line tools. One of them, bert-base-ner-train, trains named entity recognition models: you only need to specify the training data directory and the locations of the BERT parameter files. The tool's built-in help lists all of its parameters.

An example training command:

bert-base-ner-train \
    -data_dir {your dataset dir} \
    -output_dir {training output dir} \
    -init_checkpoint {Google BERT model dir} \
    -bert_config_file {bert_config.json under the Google BERT model dir} \
    -vocab_file {vocab.txt under the Google BERT model dir}
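If you launch training from a Python script instead of typing the command by hand, you can assemble the same arguments programmatically. The sketch below uses a hypothetical helper, build_ner_train_cmd, which is not part of bert-base. It assumes, as the example command does, that bert_config.json and vocab.txt sit directly inside the Google BERT model directory, and it only builds the argument list without running anything. The directory names in the demo are placeholders.

```python
from pathlib import Path


def build_ner_train_cmd(data_dir, output_dir, bert_model_dir):
    """Build the argument list for bert-base-ner-train (hypothetical helper).

    Assumption: bert_config.json and vocab.txt live directly inside
    bert_model_dir, matching the example command in the article.
    """
    model_dir = Path(bert_model_dir)
    return [
        "bert-base-ner-train",
        "-data_dir", str(data_dir),
        "-output_dir", str(output_dir),
        "-init_checkpoint", str(model_dir),
        "-bert_config_file", str(model_dir / "bert_config.json"),
        "-vocab_file", str(model_dir / "vocab.txt"),
    ]


if __name__ == "__main__":
    # Placeholder paths; replace them with your own directories.
    cmd = build_ner_train_cmd("data", "output", "chinese_L-12_H-768_A-12")
    print(" ".join(cmd))
```

You can pass the returned list to subprocess.run(cmd, check=True) to start training. A list is used rather than a single string so that paths containing spaces need no shell quoting.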

Parameter description:
data_dir: the directory containing your data. The training, validation and test files must be named train.txt, dev.txt and test.txt respectively. Please name your files in this format; otherwise an error will be reported.
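A misnamed file only shows up as an error once training starts, so it can help to check data_dir beforehand. Below is a minimal sketch that relies only on the naming rule stated above. missing_data_files is a hypothetical helper, not a bert-base function.

```python
from pathlib import Path

# File names that bert-base-ner-train expects inside data_dir.
REQUIRED_FILES = ("train.txt", "dev.txt", "test.txt")


def missing_data_files(data_dir):
    """Return the required file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    import sys

    # Directory to check: first command-line argument, or the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_data_files(target)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("data_dir contains train.txt, dev.txt and test.txt")
```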
The training data format is as follows:

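海 O
钓 O
比 O
赛 O
地 O
点 O
在 O
厦 B-LOC
门 I-LOC
与 O
金 B-LOC
门 I-LOC
之 O
间 O
的 O
海 O
域 O
。 O

(The sample sentence is 海钓比赛地点在厦门与金门之间的海域。, roughly "The sea-fishing competition is held in the waters between Xiamen and Kinmen". 厦门 (Xiamen) and 金门 (Kinmen) are the two location entities.)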

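Each line contains one character (token) followed by its label, separated by a single space (' '). Please make sure to use a space. Sentences are separated by blank lines. The program reads your data automatically. The labels follow the BIO scheme: B-LOC marks the first character of a location entity, I-LOC marks each following character of that entity, and O marks characters outside any entity.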
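The format is simple enough to read back for sanity checks. You can confirm that every line splits into exactly two fields and see which entities the labels encode. The sketch below is a hypothetical reader, not bert-base's own data loader. It assumes UTF-8 files and the B-/I-/O labels shown in the sample. It also ignores a stray I-X that does not continue an X entity, which is one common convention among several.

```python
def parse_ner_lines(lines):
    """Group "<token> <label>" lines into (tokens, labels) sentences.

    Blank lines separate sentences. A line that does not split into exactly
    two fields on a single space raises ValueError.
    """
    sentences, tokens, labels = [], [], []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if tokens:
                sentences.append((tokens, labels))
                tokens, labels = [], []
            continue
        parts = line.split(" ")
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected '<token> <label>', got {line!r}")
        tokens.append(parts[0])
        labels.append(parts[1])
    if tokens:  # the file may not end with a blank line
        sentences.append((tokens, labels))
    return sentences


def read_ner_file(path):
    """Read a train.txt / dev.txt / test.txt file (assumed to be UTF-8)."""
    with open(path, encoding="utf-8") as f:
        return parse_ner_lines(f)


def bio_to_entities(tokens, labels):
    """Return (entity_text, entity_type) pairs encoded by BIO labels.

    Characters are joined without spaces, which suits Chinese text.
    A stray I-X that does not continue an X entity is ignored.
    """
    entities, current, current_type = [], [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new entity starts; flush the previous one first.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [token], label[2:]
        elif label.startswith("I-") and current and label[2:] == current_type:
            current.append(token)
        else:
            # O label (or stray I-X): close any open entity.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities
```

For example, parse_ner_lines(["厦 B-LOC", "门 I-LOC", "与 O", "金 B-LOC", "门 I-LOC"]) yields a single sentence. Calling bio_to_entities on that sentence returns [("厦门", "LOC"), ("金门", "LOC")].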

output_dir: the output path for training. Model checkpoints and some label mapping tables are saved here. When serving the model, this path can be specified as -ner_model_dir.
init_checkpoint: the downloaded Google BERT model
bert_config_file: bert_config.json under the Google BERT model directory
vocab_file: vocab.txt under the Google BERT model directory
After training, you can check the results in the output_dir you specified.

For more usage details, see:
https://blog.csdn.net/macanv/article/details/85684284

There is also another wrapper around the BERT model:

https://www.jianshu.com/p/1d6689851622
https://cloud.tencent.com/developer/article/1470051
https://www.h3399.cn/201908/714454.html


Copyright notice
This article was written by [Elementary school students in IT field]. Please include a link to the original article when reposting. Thank you.