2007-11-15

TF-IDF

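Suppose the set of documents under test is D = {d1, d2, ..., dn} and the set of all keywords is K = {k1, k2, ..., km}. Then:
  • kxi is the number of times ki occurs in dx
  • TSum(dx) = kx1 + kx2 + ... + kxm
  • TF(dx) = (wx1, wx2, ..., wxm), where wxi = kxi / TSum(dx)
  • DF(D, ki) = the number of documents in D that contain ki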
  • IDF(D, ki) = log2(n / DF(D, ki)), where n is the number of documents in D
  • TF-IDF(D, dx) = (w1, w2, ..., wm), where wi = wxi * IDF(D, ki)
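
The definitions above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: it assumes each document is already a list of tokens, and the function names (tf, idf, tf_idf) and the sample data are made up for the example. Returning 0 when a keyword appears in no document, or when a document contains no keywords, is also an assumption, since the definitions do not cover those cases.

```python
import math
from collections import Counter


def tf(doc, keywords):
    """TF(dx): keyword counts in doc divided by TSum(dx)."""
    counts = Counter(doc)
    k = [counts[kw] for kw in keywords]   # kx1 ... kxm
    total = sum(k)                        # TSum(dx)
    if total == 0:
        # Assumption: a document with no keywords gets an all-zero vector.
        return [0.0] * len(keywords)
    return [c / total for c in k]


def idf(docs, keyword):
    """IDF(D, ki) = log2(n / DF(D, ki))."""
    df = sum(1 for d in docs if keyword in d)   # DF(D, ki)
    if df == 0:
        # Assumption: a keyword found in no document contributes 0.
        return 0.0
    return math.log2(len(docs) / df)


def tf_idf(docs, keywords):
    """One TF-IDF vector per document, one component per keyword."""
    idfs = [idf(docs, kw) for kw in keywords]
    return [[w * i for w, i in zip(tf(d, keywords), idfs)] for d in docs]


# Hypothetical sample data: two tokenized documents, three keywords.
docs = [["apple", "banana", "apple"], ["banana", "cherry"]]
keywords = ["apple", "banana", "cherry"]
vectors = tf_idf(docs, keywords)
```

In this sample, "banana" occurs in both documents, so its IDF is log2(2/2) = 0 and it drops out of every vector. "apple" and "cherry" each occur in one document only, so their IDF is log2(2/1) = 1 and their components equal the plain term frequency.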