
feat: add c language support #167

Open
chinaux wants to merge 4 commits into main from feat/add_c_api


chinaux (Collaborator) commented Feb 25, 2026

Summary

Add C language bindings.

Changes

src/c_api/c_api.cc

  • Implement C API wrapper in C++ to expose functionality to C

src/include/zvec/c_api.h

  • Add C-compatible header file defining the public C API

examples/c_api/

  • Add example programs demonstrating usage of the C API

Testing

  • Add C API unit tests in tests/c_api/c_api_test.c

Cuiyus (Collaborator) commented Feb 26, 2026

@greptile

greptile-apps bot commented Feb 26, 2026

Greptile Summary

This PR adds comprehensive C language bindings to the zvec project, including a C API wrapper (c_api.cc), public C header (c_api.h), examples, and test suite.

Key additions:

  • Complete C API covering collections, documents, schemas, queries, and configuration
  • 6 example programs demonstrating API usage
  • Comprehensive test suite with 2,350+ lines of test code
  • Proper CMake integration with BUILD_EXAMPLES option

Critical issue:

  • Memory management bug throughout c_api.cc: functions allocate memory with new[] but document/use free() for deallocation, causing undefined behavior. This affects copy_string(), zvec_string_create(), zvec_doc_get_pk_copy(), and all error message handling.

The implementation is otherwise well-structured with thread-safe error handling, proper API versioning, and comprehensive functionality coverage.

Confidence Score: 1/5

  • This PR contains critical memory management bugs that will cause undefined behavior and potential crashes in production
  • Score reflects the severity of the allocation/deallocation mismatches (mixing new[]/delete[] with malloc/free) in multiple core functions throughout the C API implementation. These mismatches are undefined behavior on every platform, can corrupt the heap or crash in practice, and must be fixed before merging
  • src/c_api/c_api.cc requires immediate attention to fix memory management issues. All functions that allocate strings need to be updated to use consistent allocation/deallocation methods

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/c_api/c_api.cc | Implements the C API wrapper, with critical memory-management bugs: mixing new[]/delete[] with malloc/free in the string allocation functions causes undefined behavior |
| src/include/zvec/c_api.h | Comprehensive C API header with proper structure, though its documentation is inconsistent about how memory should be deallocated |
| CMakeLists.txt | Adds the BUILD_EXAMPLES option and the examples directory; straightforward and safe changes |
| examples/c_api/basic_example.c | Demonstrates C API usage but has a deallocation bug at line 227 (calls free() on memory allocated with new[]) |
| tests/c_api/c_api_test.c | Comprehensive test suite covering the major C API functionality |

Class Diagram

%%{init: {'theme': 'neutral'}}%%
classDiagram
    class ZVec_C_API {
        +zvec_initialize()
        +zvec_shutdown()
        +zvec_get_version()
    }
    
    class Collection_API {
        +zvec_collection_create()
        +zvec_collection_open()
        +zvec_collection_insert()
        +zvec_collection_query()
        +zvec_collection_flush()
    }
    
    class Schema_API {
        +zvec_collection_schema_create()
        +zvec_field_schema_create()
        +zvec_field_schema_set_hnsw_index()
        +zvec_field_schema_set_invert_index()
    }
    
    class Document_API {
        +zvec_doc_create()
        +zvec_doc_set_pk()
        +zvec_doc_add_field()
        +zvec_doc_get_pk_copy()
    }
    
    class Index_API {
        +zvec_index_params_hnsw_create()
        +zvec_index_params_invert_create()
    }
    
    class Config_API {
        +zvec_config_data_create()
        +zvec_config_log_create()
    }
    
    class Memory_Management {
        +zvec_free()
        +zvec_free_string()
        +zvec_free_str()
        +zvec_docs_free()
    }
    
    ZVec_C_API --> Collection_API
    ZVec_C_API --> Config_API
    Collection_API --> Schema_API
    Collection_API --> Document_API
    Schema_API --> Index_API
    Document_API --> Memory_Management
    Collection_API --> Memory_Management

Last reviewed commit: 67ff247

greptile-apps bot left a comment

19 files reviewed, 4 comments


Comment on lines +2906 to +2908
char *result = new char[pk.length() + 1];
strcpy(result, pk.c_str());
return result;

Memory is allocated with new[], but c_api.h (line 1966) documents that it should be freed with free(). Mixing new[]/delete[] with malloc/free is undefined behavior.

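Suggested change, replacing:
char *result = new char[pk.length() + 1];
strcpy(result, pk.c_str());
return result;
with:
char *result = static_cast<char *>(malloc(pk.length() + 1));
if (!result) return nullptr;  // malloc returns NULL on failure instead of throwing
strcpy(result, pk.c_str());
return result;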
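A minimal sketch of the pairing rule behind this comment, assuming a helper that hands strings to C callers: memory a caller will pass to free() must come from malloc/calloc/realloc, while new[] must be paired with delete[]. The names dup_for_c_caller and lib_free are hypothetical; in zvec the exported deallocator would be one of the functions listed under Memory_Management in the class diagram (for example zvec_free()), whose exact signature is not shown here.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper (not from the PR): copies a std::string into a
// malloc'd, NUL-terminated buffer that a C caller may release with free().
char *dup_for_c_caller(const std::string &s) {
  char *out = static_cast<char *>(std::malloc(s.size() + 1));
  if (!out) return nullptr;  // malloc reports failure with NULL; it does not throw
  std::memcpy(out, s.c_str(), s.size() + 1);  // includes the terminating '\0'
  return out;
}

// Hypothetical matching deallocator exported by the library, so the pointer is
// always released by the same allocator family that created it.
void lib_free(void *p) { std::free(p); }
```

Exporting the deallocator from the library, rather than telling callers to use their own free(), also keeps allocation and release in the same C runtime, which can matter on platforms where a shared library and its caller link different runtimes.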

Comment on lines +138 to +174
ZVecString *zvec_string_create(const char *str) {
  if (!str) {
    set_last_error_details(ZVEC_ERROR_INVALID_ARGUMENT,
                           "String pointer cannot be null", __FILE__, __LINE__,
                           __FUNCTION__);
    return nullptr;
  }

  ZVecString *zstr = nullptr;
  char *data_buffer = nullptr;

  try {
    size_t len = strlen(str);
    zstr = new ZVecString();
    data_buffer = new char[len + 1];
    strcpy(const_cast<char *>(data_buffer), str);

    zstr->data = data_buffer;
    zstr->length = len;
    zstr->capacity = len + 1;

    return zstr;

  } catch (const std::exception &e) {
    if (data_buffer) {
      delete[] data_buffer;
    }
    if (zstr) {
      delete zstr;
    }

    set_last_error_details(ZVEC_ERROR_INTERNAL_ERROR,
                           std::string("String creation failed: ") + e.what(),
                           __FILE__, __LINE__, __FUNCTION__);
    return nullptr;
  }
}

Memory is allocated with new[] at line 152, so it must be freed with delete[]. zvec_free_string uses delete[], so that path is correct. However, zvec_string_create is also called from zvec_config_file_log_create, where the memory is later released with zvec_free_str (which calls free()) instead of delete[].

Suggested change (replaces the function quoted above):
ZVecString *zvec_string_create(const char *str) {
  if (!str) {
    set_last_error_details(ZVEC_ERROR_INVALID_ARGUMENT,
                           "String pointer cannot be null", __FILE__, __LINE__,
                           __FUNCTION__);
    return nullptr;
  }
  ZVecString *zstr = nullptr;
  char *data_buffer = nullptr;
  try {
    size_t len = strlen(str);
    zstr = new ZVecString();
    data_buffer = static_cast<char *>(malloc(len + 1));
    if (!data_buffer) {
      // malloc does not throw, so allocation failure must be handled here
      delete zstr;
      set_last_error_details(ZVEC_ERROR_INTERNAL_ERROR,
                             "String creation failed: out of memory", __FILE__,
                             __LINE__, __FUNCTION__);
      return nullptr;
    }
    strcpy(data_buffer, str);
    zstr->data = data_buffer;
    zstr->length = len;
    zstr->capacity = len + 1;
    return zstr;
  } catch (const std::exception &e) {
    free(data_buffer);  // free(NULL) is a no-op
    delete zstr;        // delete on nullptr is a no-op
    set_last_error_details(ZVEC_ERROR_INTERNAL_ERROR,
                           std::string("String creation failed: ") + e.what(),
                           __FILE__, __LINE__, __FUNCTION__);
    return nullptr;
  }
}

feihongxu0824 requested a review from egolearner on March 4, 2026 02:21
feihongxu0824 self-requested a review on March 10, 2026 08:19
This was referenced Mar 16, 2026
kdy1 mentioned this pull request on Mar 17, 2026
chinaux and others added 4 commits March 19, 2026 10:12
format c_api.h

fix some code

format c_api code

build rpm

fix some code

fix some code

remove rpm build

Add GitHub Actions release workflow for C API

add release linux-arm64

remove c api version

fix some code

fix some code

fix some code

refact some code

format some code

format some code

format some code

add api reference
// Version Information
// =============================================================================

/** @brief Major version number */
Collaborator

Prefer a version derived from Git tags over a manually maintained one.

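In C projects, generating the version number automatically from Git tags is the recommended best practice. It keeps the code version in sync with the Git tags and avoids editing the header file by hand.

Implementing this usually takes three steps:

  1. Create a version header template (version.h.in).
  2. In CMakeLists.txt, write logic that gets the Git tag and parses it.
  3. Generate the header: CMake fills in the template before compilation.

The complete implementation follows: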


1. Create the header template version.h.in

Create a file named version.h.in in the project directory, using @VAR@ syntax for the placeholders:

#ifndef ZVEC_VERSION_H
#define ZVEC_VERSION_H

/** @brief Major version number */
#define ZVEC_VERSION_MAJOR @ZVEC_VERSION_MAJOR@

/** @brief Minor version number */
#define ZVEC_VERSION_MINOR @ZVEC_VERSION_MINOR@

/** @brief Patch version number */
#define ZVEC_VERSION_PATCH @ZVEC_VERSION_PATCH@

/** @brief Full version string */
#define ZVEC_VERSION_STRING "@ZVEC_VERSION_STRING@"

#endif // ZVEC_VERSION_H

2. Get the Git information in CMakeLists.txt

Write logic that runs git describe and parses the version number.

cmake_minimum_required(VERSION 3.10)
project(zvec LANGUAGES C)

# --- 1. Get Git version information ---
# Default version (used when not in a git repository or no tag is found)
set(ZVEC_VERSION_MAJOR 0)
set(ZVEC_VERSION_MINOR 0)
set(ZVEC_VERSION_PATCH 0)
set(ZVEC_VERSION_STRING "0.0.0-unknown")

# Try running git describe
find_package(Git QUIET)
if(GIT_FOUND AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/.git")
    execute_process(
        COMMAND ${GIT_EXECUTABLE} describe --tags --always --dirty
        WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
        OUTPUT_VARIABLE GIT_TAG
        OUTPUT_STRIP_TRAILING_WHITESPACE
    )

    # Assume tags look like v1.2.3 or 1.2.3
    # Parse with the regex: ^v?([0-9]+)\.([0-9]+)\.([0-9]+)
    if(GIT_TAG MATCHES "^v?([0-9]+)\\.([0-9]+)\\.([0-9]+)")
        set(ZVEC_VERSION_MAJOR ${CMAKE_MATCH_1})
        set(ZVEC_VERSION_MINOR ${CMAKE_MATCH_2})
        set(ZVEC_VERSION_PATCH ${CMAKE_MATCH_3})
        set(ZVEC_VERSION_STRING "${GIT_TAG}")
    elseif(NOT "${GIT_TAG}" STREQUAL "")
        # The tag is not in semantic-version format: use the whole output as the
        # version string. Skipped when git printed nothing, so the default is kept.
        set(ZVEC_VERSION_STRING "${GIT_TAG}")
    endif()
endif()

# --- 2. Generate the version header ---
# Write the filled-in template into the build directory to keep the source tree clean
configure_file(
    "${CMAKE_CURRENT_SOURCE_DIR}/version.h.in"
    "${CMAKE_CURRENT_BINARY_DIR}/generated/zvec_version.h"
)

# --- 3. Configure the project ---
add_library(zvec src/zvec.c)

# The generated directory must be added to the include path
target_include_directories(zvec PUBLIC
    "${CMAKE_CURRENT_BINARY_DIR}/generated"
)

3. Use it in C code

In your code, just include the generated header:

#include "zvec_version.h"
#include <stdio.h>

void print_version(void) {
    printf("ZVec Version: %s\n", ZVEC_VERSION_STRING);
    printf("Major: %d, Minor: %d, Patch: %d\n", 
           ZVEC_VERSION_MAJOR, ZVEC_VERSION_MINOR, ZVEC_VERSION_PATCH);
}

4. Workflow

  1. Tag: create a Git tag in vX.Y.Z format:
    git tag -a v0.3.0 -m "Release version 0.3.0"
  2. Run CMake:
    mkdir build && cd build
    cmake ..
    CMake runs git describe, gets v0.3.0, parses out 0, 3, 0, and generates zvec_version.h.
  3. Development versions are handled automatically:
    If you make 5 more commits after v0.3.0, git describe --tags --dirty prints a string like v0.3.0-5-gabc123, which automatically becomes your ZVEC_VERSION_STRING.
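The CMake regex above can be checked against sample git describe outputs. This is a hedged C++17 sketch of the same parse; SemVer and parse_describe are hypothetical names, not part of the proposed CMake or of zvec. Like CMake's MATCHES, std::regex_search with a leading ^ only needs the prefix to match, so suffixes such as -5-gabc123 or -dirty are ignored, and a bare commit hash (what --always prints when no tag exists) does not match.

```cpp
#include <optional>
#include <regex>
#include <string>

struct SemVer {
  int major, minor, patch;
};

// Mirrors ^v?([0-9]+)\.([0-9]+)\.([0-9]+): parse the leading vX.Y.Z of a
// git describe string; returns nullopt when the output is not version-shaped.
std::optional<SemVer> parse_describe(const std::string &tag) {
  static const std::regex re(R"(^v?([0-9]+)\.([0-9]+)\.([0-9]+))");
  std::smatch m;
  if (!std::regex_search(tag, m, re)) return std::nullopt;
  return SemVer{std::stoi(m[1].str()), std::stoi(m[2].str()),
                std::stoi(m[3].str())};
}
```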

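Advanced: updating the version on every build

The approach above updates the version number only when CMake's configure step runs. If you want the version checked on every make (even without re-running cmake), you usually need add_custom_command to invoke a separate CMake script that generates the header.

For most projects, though, the version number only needs to change when a new release is tagged, so the configure_file approach above is sufficient and performs better. If you change the tag, just run cmake .. once by hand.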


// RAII guard for array allocated with new[]
template <typename T>
struct DeleteArrayGuard {
Collaborator

These guards appear to be unused; can they be removed?
Also, you could use std::unique_ptr instead of DeleteGuard/DeleteArrayGuard.
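A sketch of what the unique_ptr replacement could look like, assuming the guards only exist to release a buffer on error paths. FreeDeleter, MallocBuffer, make_c_string and make_scratch are hypothetical names. A custom deleter is needed for malloc'd memory because unique_ptr's default deleter calls delete; for new[] allocations, std::unique_ptr<T[]> already calls delete[].

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Deleter for malloc'd buffers, so unique_ptr calls free() instead of delete.
struct FreeDeleter {
  void operator()(void *p) const noexcept { std::free(p); }
};
using MallocBuffer = std::unique_ptr<char, FreeDeleter>;

// Hypothetical example of a guard-free allocation path.
char *make_c_string(const char *src) {
  if (!src) return nullptr;
  std::size_t len = std::strlen(src);
  MallocBuffer buf(static_cast<char *>(std::malloc(len + 1)));
  if (!buf) return nullptr;
  std::memcpy(buf.get(), src, len + 1);
  // Any early return or exception before this point frees buf automatically.
  return buf.release();  // success: ownership passes to the C caller
}

// For new[] allocations, unique_ptr<T[]> replaces DeleteArrayGuard directly.
std::unique_ptr<char[]> make_scratch(std::size_t n) {
  return std::make_unique<char[]>(n);  // value-initialized (zeroed) buffer
}
```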


bool zvec_check_version(int major, int minor, int patch) {
if (major < 0 || minor < 0 || patch < 0) {
set_last_error_details(ZVEC_ERROR_INVALID_ARGUMENT,
Collaborator

Define a macro to simplify this code:

// No trailing semicolon: callers write SET_LAST_ERROR(...); like a normal call.
#define SET_LAST_ERROR(code, msg) \
  set_last_error_details(code, msg, __FILE__, __LINE__, __FUNCTION__)
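A runnable sketch of the macro in use. set_last_error_details here is a hypothetical stand-in with the same argument order as the PR's calls (code, message, file, line, function); the real implementation is not shown, though Greptile's summary describes the PR's error handling as thread-safe, which the thread_local record imitates. Leaving the trailing semicolon out of the macro lets callers write SET_LAST_ERROR(...); as a single statement, so it also works inside an unbraced if/else.

```cpp
#include <string>

// Hypothetical per-thread error record standing in for the PR's error state.
struct LastError {
  int code;
  std::string message;
  std::string file;
  std::string function;
  int line;
};
thread_local LastError g_last_error;

// Hypothetical stand-in with the argument order used in c_api.cc.
void set_last_error_details(int code, const std::string &msg, const char *file,
                            int line, const char *function) {
  g_last_error = LastError{code, msg, file, function, line};
}

// Expands to one expression; the caller supplies the semicolon.
#define SET_LAST_ERROR(code, msg) \
  set_last_error_details((code), (msg), __FILE__, __LINE__, __FUNCTION__)
```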

/**
* @brief Mutable string structure (owns memory)
*/
typedef struct {
Collaborator

Use a C99 flexible array member to reduce the number of mallocs from two to one:

struct A {
    int a;
    char data[];   // flexible array member
};
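Standard C++ has no flexible array members (GCC and Clang accept them as an extension), and turning data into an array would also change the public struct layout. A hedged alternative sketch that still needs only one allocation: place the characters directly after the header in the same malloc block and keep the existing data pointer. ZVecStringSketch and both functions are hypothetical; the field names mirror those used in zvec_string_create.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical header struct with the same fields as ZVecString.
struct ZVecStringSketch {
  const char *data;  // points just past the header, inside the same block
  std::size_t length;
  std::size_t capacity;
};

// One malloc for the header plus the characters (two allocations become one).
ZVecStringSketch *string_create_single_alloc(const char *str) {
  if (!str) return nullptr;
  std::size_t len = std::strlen(str);
  void *block = std::malloc(sizeof(ZVecStringSketch) + len + 1);
  if (!block) return nullptr;
  auto *zstr = new (block) ZVecStringSketch;  // construct the header in place
  char *chars = static_cast<char *>(block) + sizeof(ZVecStringSketch);
  std::memcpy(chars, str, len + 1);  // copy including the terminating '\0'
  zstr->data = chars;
  zstr->length = len;
  zstr->capacity = len + 1;
  return zstr;
}

// The header is trivially destructible, so one free() releases everything.
void string_destroy_single_alloc(ZVecStringSketch *zstr) { std::free(zstr); }
```

With this layout a single free() releases both the header and the characters, so the library's string destructor would have to free the struct itself instead of freeing data separately.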

set_last_error(status.message());
return ZVEC_ERROR_INTERNAL_ERROR;
}
} g_initialized.store(true);
Collaborator

Missing newline: the closing } and g_initialized.store(true); should be on separate lines.

cpp_config.log_config = log_config;
}
// Initialize global configuration
auto status = zvec::GlobalConfig::Instance().Initialize(cpp_config);
Collaborator

The Initialize call and its error check are common to both branches and could be moved out of the if/else.
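A sketch of the suggested restructuring, with hypothetical stand-in types (Status, LogConfig, Config, initialize_global) in place of zvec::GlobalConfig, whose exact interface is not shown in this excerpt. Each branch only fills in the config; the initialization call and its error check are written once.

```cpp
#include <string>

// Hypothetical stand-ins; these are NOT zvec's GlobalConfig / Status types.
struct Status {
  bool ok;
  std::string message;
};
struct LogConfig {
  std::string kind;
  std::string dir;
};
struct Config {
  LogConfig log_config;
};

Status initialize_global(const Config &cfg) {
  if (cfg.log_config.kind.empty()) return {false, "log config not set"};
  return {true, ""};
}

// The branches differ only in how the config is built; the shared
// initialize-and-check step follows the if/else exactly once.
int init_with_log(bool to_file, const std::string &dir, std::string *error) {
  Config cpp_config;
  if (to_file) {
    cpp_config.log_config = LogConfig{"file", dir};
  } else {
    cpp_config.log_config = LogConfig{"console", ""};
  }
  Status status = initialize_global(cpp_config);
  if (!status.ok) {
    if (error) *error = status.message;
    return -1;
  }
  return 0;
}
```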

return ZVEC_OK;)
}

ZVecErrorCode zvec_is_initialized(bool *initialized) {
Collaborator

Update to bool zvec_is_initialized() to simplify usage?
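A sketch of the proposed signature, assuming the state is a std::atomic<bool> g_initialized like the one the PR sets with g_initialized.store(true). Reading an atomic flag cannot fail, so an error code plus out-parameter adds nothing. This is the reviewer's suggested shape, not the merged API.

```cpp
#include <atomic>

// Assumed to mirror the PR's flag, which is set after successful initialization.
static std::atomic<bool> g_initialized{false};

// Proposed shape: return the state directly instead of an error code
// plus a bool out-parameter.
extern "C" bool zvec_is_initialized(void) { return g_initialized.load(); }
```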


void zvec_config_file_log_destroy(ZVecFileLogConfig *config) {
if (config) {
if (config->dir.data) free((void *)config->dir.data);
Collaborator

The cast is unnecessary; object pointers convert to void * implicitly.

}

void zvec_config_console_log_destroy(ZVecConsoleLogConfig *config) {
if (config) {
Collaborator

It's unnecessary: free() accepts NULL and does nothing with it.
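The two comments above can be combined into one sketch of a destroy function, using a hypothetical struct layout and assuming the config and its strings were all allocated with malloc. Two caveats: the config pointer itself is still checked, because it is dereferenced; and dropping the (void *) cast only compiles in C++ when the field is a non-const char *. If data is declared const char *, C++ still needs an explicit conversion such as const_cast<char *>.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical layout; the real ZVecFileLogConfig is defined in c_api.h.
struct OwnedStr {
  char *data;  // non-const, so it converts to void * without a cast
  std::size_t length;
};
struct FileLogConfigSketch {
  OwnedStr dir;
  OwnedStr basename;  // hypothetical second field
};

// Assumes the config and its strings were all obtained from malloc.
void file_log_config_destroy(FileLogConfigSketch *config) {
  if (!config) return;          // config is dereferenced below, so keep this check
  std::free(config->dir.data);  // free(NULL) is a no-op: no per-field check needed
  std::free(config->basename.data);
  std::free(config);
}
```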

static char *copy_string(const std::string &str) {
if (str.empty()) return nullptr;
char *copy = static_cast<char *>(malloc(str.length() + 1));
strcpy(copy, str.c_str());
Collaborator

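To support binary string input, copy str.size() bytes with memcpy rather than strcpy (strncpy would not help here: like strcpy, it stops at the first NUL byte).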
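A hedged sketch of a binary-safe copy (copy_bytes is a hypothetical name): memcpy copies exactly str.size() bytes, embedded NULs included, and the length is returned separately because strlen() on the result would stop at the first NUL. Unlike the PR's copy_string, this sketch returns an empty string rather than nullptr for empty input, and it checks for malloc failure.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Binary-safe copy: the caller gets the true length via out_len, since the
// buffer may contain '\0' bytes before its end.
static char *copy_bytes(const std::string &str, std::size_t *out_len) {
  char *copy = static_cast<char *>(std::malloc(str.size() + 1));
  if (!copy) return nullptr;
  std::memcpy(copy, str.data(), str.size());
  copy[str.size()] = '\0';  // still NUL-terminated for text callers
  if (out_len) *out_len = str.size();
  return copy;
}
```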

for (size_t i = 0; i < *result_count; ++i) {
const std::string pk = i < pks.size() ? pks[i] : std::string();
const std::string message = statuses[i].message();
(*results)[i].pk = copy_string(pk);
Collaborator

Could the results be returned in the same order as the input, so that each result matches its pk by index and the pk does not need to be stored in the result?
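One reading of the suggestion, sketched with hypothetical names (WriteResult, write_all): the result array has the same length and order as the input, so result i belongs to input document i and the caller already knows its pk. The per-document status logic is a placeholder.

```cpp
#include <string>
#include <vector>

// Hypothetical per-document result with no pk field: position carries identity.
struct WriteResult {
  int code;
  std::string message;
};

std::vector<WriteResult> write_all(const std::vector<std::string> &pks) {
  std::vector<WriteResult> results;
  results.reserve(pks.size());
  for (const auto &pk : pks) {
    // Placeholder status logic; a real write would report the engine's status.
    results.push_back(pk.empty() ? WriteResult{1, "empty pk"} : WriteResult{0, ""});
  }
  return results;  // same length and order as pks
}
```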

ZVecDoc *zvec_doc_create(void) {
ZVEC_TRY_RETURN_NULL("Failed to create document", {
auto doc_ptr =
new std::shared_ptr<zvec::Doc>(std::make_shared<zvec::Doc>());
Collaborator

Could we just use plain Doc?

auto* doc = new Doc();
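A sketch of the plain-Doc version, with a hypothetical Doc stand-in for zvec::Doc. The opaque handle wraps the document by value, so there is one heap allocation and one indirection instead of a heap-allocated shared_ptr that owns a second heap object. This only works if nothing on the C++ side needs shared ownership of the document after the C caller destroys the handle, which this excerpt does not show.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for zvec::Doc.
struct Doc {
  std::unordered_map<std::string, std::string> fields;
};

// Defined only in the .cc file; C code sees an incomplete type.
struct ZVecDocSketch {
  Doc doc;  // held by value: a single allocation per handle
};

ZVecDocSketch *doc_create() { return new ZVecDocSketch(); }
void doc_destroy(ZVecDocSketch *d) { delete d; }  // delete on nullptr is a no-op
```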

set_last_error("Invalid value size for vector_fp64 type");
return error_code;
}
(*doc_ptr)->set(name, vec);
Collaborator

Use std::move(vec) here to avoid an extra copy of the vector.

reinterpret_cast<const int8_t *>(field->value.vector_value.data),
reinterpret_cast<const int8_t *>(field->value.vector_value.data) +
field->value.vector_value.length);
(*doc_ptr)->set(name, vec);
Collaborator

Use std::move(vec) here to avoid an extra copy of the vector.
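A sketch of the suggestion for the int8 vector path quoted above, with a hypothetical VecDoc in place of zvec::Doc. std::move only saves the copy if set() can take ownership (a by-value or rvalue-reference parameter); if zvec::Doc::set takes a const reference, the move has no effect. The copy from the caller's C buffer into the vector is still required.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical Doc whose set() takes its arguments by value and moves them in.
struct VecDoc {
  std::string name;
  std::vector<std::int8_t> value;
  void set(std::string n, std::vector<std::int8_t> v) {
    name = std::move(n);
    value = std::move(v);
  }
};

void add_int8_field(VecDoc &doc, const std::string &name, const void *data,
                    std::size_t length) {
  const auto *p = static_cast<const std::int8_t *>(data);
  std::vector<std::int8_t> vec(p, p + length);  // one copy from the C buffer
  doc.set(name, std::move(vec));                // moved in, not copied again
}
```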


Labels: None yet

Projects: None yet

4 participants