
C++ TfidfVectorizer

Convert raw documents to a matrix of TF-IDF features.
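The README does not say which TF-IDF weighting the library uses, so what follows is only a sketch under stated assumptions. The hypothetical function `tfidf_sketch` is not part of this library's API, and it uses only the standard library rather than Armadillo. It assumes whitespace tokenization, raw term counts as TF, scikit-learn-style smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and L2-normalized document columns. Its output keeps features in rows and documents in columns, following the layout described in the Notes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type (not the library's API).
struct TfidfResult {
    std::vector<std::string> vocabulary;      // row index -> term (sorted)
    std::vector<std::vector<double>> matrix;  // matrix[term][document]
};

// Sketch only. Assumes whitespace tokenization, raw-count TF,
// smoothed IDF: ln((1 + n) / (1 + df)) + 1, and L2-normalized columns.
TfidfResult tfidf_sketch(const std::vector<std::string>& docs) {
    assert(!docs.empty());
    const std::size_t n_docs = docs.size();

    // Term counts per document, plus document frequency per term.
    std::vector<std::map<std::string, double>> counts(n_docs);
    std::map<std::string, std::size_t> doc_freq;
    for (std::size_t d = 0; d < n_docs; ++d) {
        std::istringstream in(docs[d]);
        std::string tok;
        while (in >> tok) counts[d][tok] += 1.0;
        for (const auto& kv : counts[d]) doc_freq[kv.first] += 1;
    }

    // std::map iterates in sorted order, so the vocabulary is sorted.
    TfidfResult result;
    for (const auto& kv : doc_freq) result.vocabulary.push_back(kv.first);
    result.matrix.assign(result.vocabulary.size(),
                         std::vector<double>(n_docs, 0.0));

    // Features in rows, documents in columns: matrix[t][d] = tf * idf.
    for (std::size_t t = 0; t < result.vocabulary.size(); ++t) {
        const std::string& term = result.vocabulary[t];
        const double idf =
            std::log((1.0 + n_docs) / (1.0 + doc_freq[term])) + 1.0;
        for (std::size_t d = 0; d < n_docs; ++d) {
            auto it = counts[d].find(term);
            if (it != counts[d].end()) result.matrix[t][d] = it->second * idf;
        }
    }

    // L2-normalize each document (column).
    for (std::size_t d = 0; d < n_docs; ++d) {
        double norm = 0.0;
        for (const auto& row : result.matrix) norm += row[d] * row[d];
        norm = std::sqrt(norm);
        if (norm > 0.0)
            for (auto& row : result.matrix) row[d] /= norm;
    }
    return result;
}
```

For example, `tfidf_sketch({"a b", "a c"})` yields a 3-by-2 matrix whose rows correspond to the sorted terms a, b, c. Within each document, the shared term a gets a lower weight than the term that appears in only one document.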

Requirements:

  • Armadillo, g++, Boost
sudo apt install g++ libboost-all-dev libarmadillo-dev

Compile and run the example in main.cc:

g++ main.cc src/tfidf_vectorizer.cc -larmadillo -std=c++11 && ./a.out

Features:

Notes:

  • Features are in rows, documents (objects) are in columns.
  • This is the opposite of the usual convention in Python (e.g., scikit-learn, where documents are rows), but it is the default in C++ libraries such as MLPack.
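If code that expects the Python-style layout (documents in rows) needs the result, the matrix only has to be transposed. The helper below is a hypothetical standalone sketch on `std::vector` and is not part of this library. For an Armadillo `arma::mat`, the built-in `X.t()` performs the same conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: converts features-in-rows / documents-in-columns
// (the layout used here) into documents-in-rows / features-in-columns.
std::vector<std::vector<double>> to_documents_in_rows(
    const std::vector<std::vector<double>>& features_by_docs) {
    if (features_by_docs.empty()) return {};
    const std::size_t n_features = features_by_docs.size();
    const std::size_t n_docs = features_by_docs[0].size();
    std::vector<std::vector<double>> docs_by_features(
        n_docs, std::vector<double>(n_features, 0.0));
    for (std::size_t f = 0; f < n_features; ++f) {
        assert(features_by_docs[f].size() == n_docs);  // rectangular input
        for (std::size_t d = 0; d < n_docs; ++d)
            docs_by_features[d][f] = features_by_docs[f][d];
    }
    return docs_by_features;
}
```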

Optional: unit tests

  • Install Catch2
git clone https://github.com/catchorg/Catch2.git # clone outside this repository
cd Catch2
cmake -Bbuild -H. -DBUILD_TESTING=OFF
sudo cmake --build build/ --target install
  • Run tests
cd tests/
g++ t1.cc -larmadillo -std=c++11 -o tests
./tests