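Introduction
This page collects update information for the Japanese polarity classification API. The main announcements concern model updates and accuracy improvements. I have also put together a simple performance check site that shows sample tweets alongside the predictions returned by the polarity classification API, so please take a look. The performance check for the current version of the polarity classification API is available here:
Demo site
Samples
Source code
Update log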
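2017/5/12 update
Neutral tweets were augmented using Distant Supervision. The model was retrained from scratch, in a single batch, on all of the training data collected so far. Because the data comes from Distant Supervision, the performance figures (the results on the evaluation data) should be treated as a rough guide only.
Training data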
positive | negative | neutral |
---|---|---|
18,491 | 97,954 | 75,921 |
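As a rough illustration of what Distant Supervision means here, the sketch below assigns the neutral label from a weak signal (the posting account) instead of from human annotation. The account names, the `(screen_name, text)` tweet format, and the rule itself are assumptions made for illustration; the source does not describe the rule that was actually used.

```python
from typing import Iterable, List, Tuple

# Hypothetical account names, for illustration only. The idea is that
# tweets from such sources are assumed (not verified) to be neutral.
NEUTRAL_SOURCES = {"example_news", "example_weather"}


def label_neutral_by_source(
    tweets: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return (text, "neutral") pairs for tweets posted by assumed-neutral accounts.

    `tweets` yields (screen_name, text) pairs. No human checks the labels,
    so some of them will be wrong.
    """
    return [
        (text, "neutral")
        for screen_name, text in tweets
        if screen_name in NEUTRAL_SOURCES
    ]


sample = [
    ("example_news", "The train schedule changes on Monday."),
    ("some_user", "This movie was wonderful!"),
]
labeled = label_neutral_by_source(sample)
print(labeled)
```

Because labels produced this way are noisy, metrics measured on such data are better read as a trend than as exact accuracy.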
Evaluation data
positive | negative | neutral |
---|---|---|
9,629 | 33,573 | 25,921 |
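Experimental results
In the output below, labels 0, 1, and 2 correspond to positive, negative, and neutral; the row totals match the evaluation data table.
Before (November 21, 2016 version)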
Examples labeled as 0 classified by model as 0: 4076 times
Examples labeled as 0 classified by model as 1: 2218 times
Examples labeled as 0 classified by model as 2: 3335 times
Examples labeled as 1 classified by model as 0: 1327 times
Examples labeled as 1 classified by model as 1: 28105 times
Examples labeled as 1 classified by model as 2: 4141 times
Examples labeled as 2 classified by model as 0: 11196 times
Examples labeled as 2 classified by model as 1: 9941 times
Examples labeled as 2 classified by model as 2: 4784 times
Accuracy: 0.5348
Precision: 0.4446
Recall: 0.4817
F1 Score: 0.4624
After (May 12, 2017 version)
Examples labeled as 0 classified by model as 0: 7048 times
Examples labeled as 0 classified by model as 1: 759 times
Examples labeled as 0 classified by model as 2: 1822 times
Examples labeled as 1 classified by model as 0: 2090 times
Examples labeled as 1 classified by model as 1: 28438 times
Examples labeled as 1 classified by model as 2: 3045 times
Examples labeled as 2 classified by model as 0: 1101 times
Examples labeled as 2 classified by model as 1: 1015 times
Examples labeled as 2 classified by model as 2: 23805 times
Accuracy: 0.8578
Precision: 0.82
Recall: 0.8325
F1 Score: 0.8262
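The summary figures above can be recomputed from the confusion matrix. The sketch below assumes that precision and recall are macro-averaged over the classes and that F1 is the harmonic mean of those two averages. That definition is inferred from the numbers and is not stated in the source. The function and variable names are illustrative.

```python
from typing import List, Tuple


def macro_metrics(cm: List[List[int]]) -> Tuple[float, float, float, float]:
    """Return (accuracy, macro precision, macro recall, F1).

    cm[i][j] is the number of examples labeled i that the model classified as j.
    Assumes every class is predicted at least once and has at least one example,
    so no column or row sum is zero.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    # Precision of class i: correct predictions over everything predicted as i.
    precisions = [cm[i][i] / sum(cm[r][i] for r in range(n)) for i in range(n)]
    # Recall of class i: correct predictions over everything labeled i.
    recalls = [cm[i][i] / sum(cm[i]) for i in range(n)]
    precision = sum(precisions) / n
    recall = sum(recalls) / n
    # Assumed definition: harmonic mean of the macro-averaged values.
    f1 = 2 * precision * recall / (precision + recall)
    return correct / total, precision, recall, f1


# "After (May 12, 2017 version)" matrix; rows are labels 0, 1, 2.
AFTER_2017_05_12 = [
    [7048, 759, 1822],
    [2090, 28438, 3045],
    [1101, 1015, 23805],
]

acc, p, r, f1 = macro_metrics(AFTER_2017_05_12)
print(f"Accuracy: {acc:.4f}  Precision: {p:.4f}  Recall: {r:.4f}  F1: {f1:.4f}")
```

Under these assumptions the result agrees with the reported Accuracy, Precision, Recall, and F1 Score up to rounding. The same function can be applied to the "Before" matrix.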
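2016/11/21 update
Trained on the annotation results collected in Social Sentiment to date. The API now officially outputs three classes: Positive, Negative, and Neutral. Accuracy has not been evaluated, because there is too little training data to set any aside for testing.
2016/11/12 update 1
Augmented the training data.
Training data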
positive | negative |
---|---|
16,323 | 121,816 |
Evaluation data for Early Stopping
positive | negative |
---|---|
884 | 6,436 |
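The role of a separate Early Stopping evaluation set can be shown with a generic patience-based loop. This is a minimal sketch, not the configuration used for this model: `train_one_epoch`, `validation_score`, `max_epochs`, and `patience` are hypothetical names, and the scores in the usage example are made up.

```python
from typing import Callable, Tuple


def train_with_early_stopping(
    train_one_epoch: Callable[[], None],
    validation_score: Callable[[], float],
    max_epochs: int = 100,
    patience: int = 5,
) -> Tuple[int, float]:
    """Train until the validation score has not improved for `patience` epochs.

    Returns (best_epoch, best_score). Higher scores are treated as better.
    """
    best_score = float("-inf")
    best_epoch = -1
    for epoch in range(max_epochs):
        train_one_epoch()
        score = validation_score()  # measured on the held-out Early Stopping data
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_score


# Usage with made-up validation scores standing in for a real model.
fake_scores = iter([0.70, 0.80, 0.85, 0.84, 0.83, 0.82, 0.90])
result = train_with_early_stopping(
    train_one_epoch=lambda: None,
    validation_score=lambda: next(fake_scores),
    patience=2,
)
print(result)
```

Keeping this set separate from the performance evaluation data matters because the stopping epoch is chosen using it, so scores measured on it would be optimistic.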
Performance evaluation data
positive | negative |
---|---|
500 | 500 |
Examples labeled as 0 classified by model as 0: 376 times
Examples labeled as 0 classified by model as 1: 124 times
Examples labeled as 1 classified by model as 0: 4 times
Examples labeled as 1 classified by model as 1: 496 times
Accuracy: 0.872
Precision: 0.8947
Recall: 0.872
F1 Score: 0.8832
2016/11/12 update 2
Added online learning from a polarity dictionary on top of update 1. The technique is explained in detail here.
Dictionary data
positive | negative |
---|---|
5,215 | 7,505 |
Evaluation data (examples the current method got wrong, corrected by hand)
positive | negative |
---|---|
607 | 76 |
Before
Examples labeled as 0 classified by model as 0: 123 times
Examples labeled as 0 classified by model as 1: 484 times
Examples labeled as 1 classified by model as 0: 20 times
Examples labeled as 1 classified by model as 1: 56 times
Accuracy: 0.2621
Precision: 0.4819
Recall: 0.4697
F1 Score: 0.4758
After
Examples labeled as 0 classified by model as 0: 303 times
Examples labeled as 0 classified by model as 1: 304 times
Examples labeled as 1 classified by model as 0: 32 times
Examples labeled as 1 classified by model as 1: 44 times
Accuracy: 0.5081
Precision: 0.5155
Recall: 0.5391
F1 Score: 0.527
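Because this evaluation set is heavily skewed (607 positive, 76 negative), per-class recall shows more clearly what changed between Before and After. The sketch below computes it directly from the two confusion matrices above; the function name is illustrative.

```python
from typing import List


def per_class_recall(cm: List[List[int]]) -> List[float]:
    """Recall for each class: correct predictions divided by the row total.

    cm[i][j] is the number of examples labeled i that the model classified as j.
    """
    return [row[i] / sum(row) for i, row in enumerate(cm)]


# Rows: label 0 (positive, 607 examples), label 1 (negative, 76 examples).
BEFORE = [[123, 484], [20, 56]]
AFTER = [[303, 304], [32, 44]]

for name, cm in (("Before", BEFORE), ("After", AFTER)):
    positive, negative = per_class_recall(cm)
    print(f"{name}: positive recall {positive:.4f}, negative recall {negative:.4f}")
```

On these matrices, positive recall rises from about 0.20 to about 0.50, while negative recall falls from about 0.74 to about 0.58. The gain in accuracy therefore comes from the majority positive class, at some cost to the negative class.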
2016/11/6 update
Minor bug fixes.
Examples labeled as 0 classified by model as 0: 108 times
Examples labeled as 0 classified by model as 1: 92 times
Examples labeled as 1 classified by model as 0: 3 times
Examples labeled as 1 classified by model as 1: 797 times
Accuracy: 0.905
Precision: 0.9347
Recall: 0.7681
F1 Score: 0.8433
2016/11/1 update
Expanded the training data.
Training data
positive | negative |
---|---|
10,780 | 76,442 |
Evaluation data for Early Stopping
positive | negative |
---|---|
996 | 7,692 |
Performance evaluation data
positive | negative |
---|---|
200 | 800 |
Examples labeled as 0 classified by model as 0: 107 times
Examples labeled as 0 classified by model as 1: 93 times
Examples labeled as 1 classified by model as 0: 6 times
Examples labeled as 1 classified by model as 1: 794 times
Accuracy: 0.901
Precision: 0.921
Recall: 0.7638
F1 Score: 0.835
2016/10/17 initial release
Training data
positive | negative |
---|---|
6,000 | 40,900 |
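Performance evaluation data (these counts appear to be rounded; the confusion matrix rows below total 167 and 1,094)
positive | negative |
---|---|
160 | 1,100 |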
Examples labeled as 0 classified by model as 0: 136 times
Examples labeled as 0 classified by model as 1: 31 times
Examples labeled as 1 classified by model as 0: 96 times
Examples labeled as 1 classified by model as 1: 998 times
Accuracy: 0.8993
Precision: 0.778
Recall: 0.8633
F1 Score: 0.8185
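Related information
- Building a Word2Vec model of Japanese Wikipedia with deeplearning4j
- The limits of deeplearning4j's word2vec and how to use it well
- Trying polarity classification with deeplearning4j's doc2vec
- Injecting a Wikipedia word2vec model into deeplearning4j's doc2vec
- Polarity classification with an NN on word2vec vector data in deeplearning4j
- Polarity classification with an RNN on word2vec vector data in deeplearning4j
- [API] Japanese polarity classification API released
- Demo site for the Japanese polarity classification technology now available
- Building polarity classification with deeplearning4j's RNN: Early Stop edition
- Trying online learning for polarity classification with deeplearning4j
- Augmenting the neutral data for polarity classification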