similarity-search


A general framework for similarity search.
Run
script
git clone https://github.com/xinyandai/similarity-search.git
cd similarity-search
sh srp.sh
command
git clone https://github.com/xinyandai/similarity-search.git
cd similarity-search/src
mkdir build
cd build
cmake ..
make srp
./srp -h
./srp -t ${train_data_file} -b ${base_data_file} -q ${query_data_file} -g ${ground_truth_file}
Define the shell variables ${train_data_file}, ${base_data_file}, ${query_data_file} and ${ground_truth_file} before running the last command.
process
- choose the index structure and the query method
- load the training data, base data and query data; ground truth is also needed to compute recall
- train the index on the training data
- store the base data in the index
- for each query, probe high-priority items/buckets using the chosen query method
- for each query, re-rank the probed items
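
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's C++ implementation: it assumes sign random projection as the index and Hamming ranking as the query method, and every function name (`train_srp`, `encode`, `build_index`, `probe`, `rerank`, `recall`) is hypothetical. Because sign random projection is data-independent, the "training" step here only draws random hyperplanes, and the ground truth is computed by brute force instead of being loaded from a file.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def train_srp(dim, n_bits, rng):
    # SRP is data-independent: "training" only draws one random hyperplane per bit.
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(planes, v):
    # Bit i is 1 when v lies on the non-negative side of hyperplane i.
    code = 0
    for i, plane in enumerate(planes):
        if dot(plane, v) >= 0:
            code |= 1 << i
    return code


def build_index(planes, base):
    # Store base item ids in buckets keyed by their hash code.
    buckets = {}
    for item_id, v in enumerate(base):
        buckets.setdefault(encode(planes, v), []).append(item_id)
    return buckets


def probe(buckets, query_code, n_probe):
    # Hamming ranking: visit buckets in ascending Hamming distance to the query code
    # until at least n_probe candidates have been collected.
    ranked = sorted(buckets, key=lambda code: bin(code ^ query_code).count("1"))
    candidates = []
    for code in ranked:
        if len(candidates) >= n_probe:
            break
        candidates.extend(buckets[code])
    return candidates


def rerank(base, query, candidates, k):
    # Re-rank the probed items by exact cosine similarity and keep the top k.
    return sorted(candidates, key=lambda i: -cosine(base[i], query))[:k]


def recall(found, truth):
    return len(set(found) & set(truth)) / len(truth)


rng = random.Random(0)
dim, n_bits, k = 16, 8, 5
base = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(500)]
query = [rng.gauss(0.0, 1.0) for _ in range(dim)]

planes = train_srp(dim, n_bits, rng)
buckets = build_index(planes, base)
candidates = probe(buckets, encode(planes, query), n_probe=50)
found = rerank(base, query, candidates, k)
truth = rerank(base, query, range(len(base)), k)  # brute-force ground truth
print("recall@%d = %.2f" % (k, recall(found, truth)))
```

Raising `n_probe` makes the query examine more candidates, which trades query time for recall; probing every item reproduces the brute-force result.
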
index type
- map based index
- bit index
  - sign random projection
  - ITQ
  - PCAH (to be developed)
- int index
- cluster index
- graph based index (to be developed)
- transform based method (for maximum inner product search)
  - Simple-LSH
  - Norm-Range LSH
  - L2-ALSH
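
To illustrate what a transform based method does, here is a small sketch of the Simple-LSH transform as it is usually described: base vectors are scaled by the largest norm in the dataset and padded with one extra coordinate so that they all have unit length, while the query is normalized and padded with a zero. After the transform, the inner product of a transformed pair is the original inner product divided by a per-query constant, so a maximum inner product search becomes an angular search that a bit index can handle. The function names are hypothetical and are not the project's API; the sketch assumes that the base set contains at least one non-zero vector and that the query is non-zero.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def transform_base(base):
    # Scale by the largest norm, then append sqrt(1 - ||x||^2) so every
    # transformed base vector has unit length.
    max_norm = max(norm(v) for v in base)
    transformed = []
    for v in base:
        scaled = [x / max_norm for x in v]
        residual = max(0.0, 1.0 - dot(scaled, scaled))  # guard against rounding
        transformed.append(scaled + [math.sqrt(residual)])
    return transformed, max_norm


def transform_query(q):
    # Normalize the query and append 0, so the extra coordinate of the
    # base vectors does not contribute to the inner product.
    n = norm(q)
    return [x / n for x in q] + [0.0]


base = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
query = [1.0, 2.0]
t_base, max_norm = transform_base(base)
t_query = transform_query(query)

for original, transformed in zip(base, t_base):
    # The two printed values agree up to rounding.
    print(dot(original, query) / (max_norm * norm(query)), dot(transformed, t_query))
```

Since the scaling factor is the same for every base vector, ranking the transformed vectors by cosine similarity to the transformed query gives the same order as ranking the originals by inner product.
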
query method
- int ranking (for int index)
- Hamming ranking (for bit index)
- cluster ranker (for cluster based methods)
- inverted multi-index (for PQ only)
- norm-range (for norm-range index only)
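
As an example of a query method, the sketch below shows one plausible reading of a cluster ranker; it is an assumption about the general technique, not a description of the project's code. Each base item is stored under the id of its nearest centroid, and at query time the clusters are probed in ascending order of the distance between their centroid and the query. The centroids are fixed by hand here instead of being learned, and all names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(base, centroids):
    # Int index: each item is stored in the list of its nearest centroid.
    lists = {c: [] for c in range(len(centroids))}
    for item_id, v in enumerate(base):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        lists[nearest].append(item_id)
    return lists


def cluster_rank(centroids, query):
    # Rank cluster ids by the distance from their centroid to the query.
    return sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))


def probe_clusters(lists, order, n_probe):
    # Collect items cluster by cluster until at least n_probe candidates are found.
    candidates = []
    for c in order:
        if len(candidates) >= n_probe:
            break
        candidates.extend(lists[c])
    return candidates


centroids = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]  # fixed by hand for the example
base = [[0.5, 0.5], [9.0, 0.0], [10.0, 1.0], [1.0, 9.0]]
query = [9.0, 1.0]

lists = assign(base, centroids)
order = cluster_rank(centroids, query)
print(order, probe_clusters(lists, order, n_probe=2))
```

The probed candidates would then be re-ranked by exact distance, as in the last step of the process described earlier.
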
style check
pip install cpplint
cpplint --recursive --filter=-whitespace,-runtime/indentation_namespace src/include/*
Acknowledgement
This project is built on GQR.
Reference
PQ based method for similarity search
Norm-Ranging LSH for Maximum Inner Product Search