phfaustini/TfidfVectorizer

C++ TfidfVectorizer

Convert raw documents to a matrix of TF-IDF features.

Requirements:

  • Armadillo, g++, and Boost. On Debian/Ubuntu:
sudo apt install g++ libboost-all-dev libarmadillo-dev

Compile and run the example in main.cc:

g++ main.cc src/tfidf_vectorizer.cc -larmadillo -std=c++11 && ./a.out

Features:

Notes:

  • Features (terms) are in rows; documents (objects) are in columns.
  • This is the transpose of the layout normally used in Python, where documents are in rows, but it is the default in C++ libraries such as mlpack.
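
To make the row/column convention concrete, here is a minimal standalone sketch that builds a small TF-IDF matrix with terms in rows and documents in columns. It uses only the standard library and is not this repository's implementation: the names `tokenize`, `TfidfResult` and `tfidf_sketch` are hypothetical, the tokenizer splits on whitespace only, and the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` is an assumption that may differ from what the library computes. No normalization is applied.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split on whitespace only (no lowercasing, no punctuation handling).
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    for (std::string tok; in >> tok;) tokens.push_back(tok);
    return tokens;
}

// matrix[t][d] is the weight of vocabulary term t in document d.
struct TfidfResult {
    std::vector<std::string> vocabulary;      // sorted terms, one per row
    std::vector<std::vector<double>> matrix;  // rows = terms, columns = documents
};

// Illustrative sketch only; not the API of this repository.
TfidfResult tfidf_sketch(const std::vector<std::string>& documents) {
    assert(!documents.empty());
    const std::size_t n_docs = documents.size();

    // Raw term counts per document; std::map keeps the terms sorted.
    std::map<std::string, std::vector<double>> counts;
    for (std::size_t d = 0; d < n_docs; ++d) {
        for (const std::string& tok : tokenize(documents[d])) {
            auto it = counts.find(tok);
            if (it == counts.end())
                it = counts.emplace(tok, std::vector<double>(n_docs, 0.0)).first;
            it->second[d] += 1.0;
        }
    }

    TfidfResult result;
    for (const auto& entry : counts) {
        // Document frequency: number of documents containing the term.
        std::size_t df = 0;
        for (double c : entry.second)
            if (c > 0.0) ++df;

        // Assumed smoothed IDF (natural log); the library may use another variant.
        const double idf = std::log((1.0 + n_docs) / (1.0 + df)) + 1.0;

        std::vector<double> row(entry.second);
        for (double& w : row) w *= idf;  // tf * idf

        result.vocabulary.push_back(entry.first);
        result.matrix.push_back(row);
    }
    return result;
}
```

Calling `tfidf_sketch` on the two documents "a b" and "a c c" gives a matrix with three rows (the terms a, b, c) and two columns (the documents). In the Python convention the same data would be stored as two rows by three columns.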

Optional: unit tests

  • Install Catch2
git clone https://github.com/catchorg/Catch2.git # clone outside this repository
cd Catch2
cmake -Bbuild -H. -DBUILD_TESTING=OFF
sudo cmake --build build/ --target install 
  • Run tests
cd tests/
g++ t1.cc -larmadillo -std=c++11 -o tests
./tests
