• 2021-04-14
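    China University MOOC: The following program vectorizes a set of documents, with no stop-word filtering applied:

        from sklearn.feature_extraction.text import CountVectorizer
        corpus = [
            'Jobs was the chairman of Apple Inc., and he was very famous',
            'I like to use apple computer',
            'And I also like to eat apple',
        ]
        vectorizer = CountVectorizer()
        X = vectorizer.fit_transform(corpus)  # fit first: vocabulary_ only exists after fitting
        print(vectorizer.vocabulary_)
        print(X.todense())  # convert to a full (dense) feature matrix

    The output of print(vectorizer.vocabulary_) is known to be: {u'and': 1, u'jobs': 9, u'apple': 2, u'very': 15, u'famous': 6, u'computer': 4, u'eat': 5, u'he': 7, u'use': 14, u'like': 10, u'to': 13, u'of': 11, u'also': 0, u'chairman': 3, u'the': 12, u'inc': 8, u'was': 16}. In the output of the last print statement, what is the vector for document D1, i.e. "Jobs was the chairman of Apple Inc., and he was very famous"?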
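    The D1 vector can be worked out by hand from the vocabulary indices. The sketch below reproduces that counting with the standard library only. It is not scikit-learn's implementation: it assumes CountVectorizer's default behaviour (lowercasing, keeping only tokens of two or more word characters, and assigning indices in alphabetical order), and the helper names `tokenize` and `count_vector` are made up for illustration.

```python
import re
from collections import Counter

corpus = [
    "Jobs was the chairman of Apple Inc., and he was very famous",
    "I like to use apple computer",
    "And I also like to eat apple",
]

# Assumption: mirrors CountVectorizer's default token pattern, which keeps
# only tokens of two or more word characters (so the single letter "I" is dropped).
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def tokenize(doc):
    """Lowercase the document and split it into tokens."""
    return TOKEN_PATTERN.findall(doc.lower())


# Assumption: feature indices follow the alphabetical order of distinct tokens,
# which matches the vocabulary_ mapping quoted in the question.
terms = sorted({token for doc in corpus for token in tokenize(doc)})
vocabulary = {term: index for index, term in enumerate(terms)}


def count_vector(doc):
    """Return the term counts of one document, ordered by feature index."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in terms]


d1_vector = count_vector(corpus[0])
print(d1_vector)
# → [0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 2]
```

    Reading the result against the vocabulary: "was" (index 16) occurs twice in D1, the eleven other words of D1 occur once each, and "also", "computer", "eat", "like", "to" and "use" do not occur, so their entries are 0.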