please adapt download_embeddings to support 'file://' url scheme #54

@EricDeveaud

Hello,

Could you please adapt download_embeddings to also support the file:// URL scheme?

This would allow us to provide centralized embeddings files on our cluster and avoid the download time.

Something in this spirit:

    log.info("Downloading embeddings from %s → %s", url, dest)

    if url.startswith("file://"):
        # Local file: copy it to dest instead of going through requests
        from urllib.request import urlopen
        with urlopen(url) as resp:
            content = resp.read()
            with open(dest, "wb") as f:
                f.write(content)
        return dest
    else:
        # Otherwise download from the given URL
        with requests.get(url, stream=True, timeout=timeout) as resp:
            resp.raise_for_status()
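
For illustration, the file:// branch above can be written as a small standalone function. This is only a sketch: `copy_local_embeddings` is a hypothetical name, not an existing function in the project, and it assumes the URL points at a regular file readable on the local filesystem. It uses `urllib.request.url2pathname` so that percent-escapes in the URL path are decoded.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def copy_local_embeddings(url: str, dest: str) -> str:
    """Copy the file behind a file:// URL to dest and return dest.

    Hypothetical helper, shown only to illustrate the file:// branch.
    """
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"expected a file:// URL, got {url!r}")

    # url2pathname decodes percent-escapes (e.g. %20) in the URL path.
    src = Path(url2pathname(parsed.path))
    if not src.is_file():
        raise FileNotFoundError(src)

    # Make sure the destination directory exists, then copy the bytes.
    Path(dest).parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return dest
```

Like the snippet above, this still writes a full copy of the file to dest.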

The problem is that this will duplicate the file for each user / base_directory.

I would prefer main.py to handle the embeddings data file directly.

Something like:

    # New: get the file name from the URL
    # (needs "import os" and "import urllib.parse" at module level)
    url = conf["embeddings_url"]
    if url.startswith("file://"):
        # Local file: use it in place, without copying it to embeddings_dir
        tar_path = urllib.parse.urlparse(url).path
        if not os.path.exists(tar_path):
            raise FileNotFoundError(f"Embeddings file not found: {tar_path}")
    else:
        filename = os.path.basename(urllib.parse.urlparse(url).path)
        tar_path = os.path.join(embeddings_dir, filename)

        logger.info(f"Downloading reference embeddings to {tar_path}...")
        download_embeddings(url, tar_path)

    logger.info("Loading embeddings into the database...")
    load_dump_to_db(tar_path, conf)
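
The path selection in this second approach could also be isolated in a helper, which makes it easy to test on its own. The following is a sketch under assumptions: `resolve_embeddings_path` is a hypothetical name that does not exist in the project, and it only decides where the archive lives and whether a download is still needed.

```python
import os
from typing import Tuple
from urllib.parse import urlparse
from urllib.request import url2pathname


def resolve_embeddings_path(url: str, embeddings_dir: str) -> Tuple[str, bool]:
    """Return (tar_path, needs_download) for an embeddings URL.

    Hypothetical helper: file:// URLs are used in place, any other URL
    maps to a file name inside embeddings_dir that must be downloaded.
    """
    parsed = urlparse(url)
    if parsed.scheme == "file":
        # Shared local file: no copy, no download.
        path = url2pathname(parsed.path)
        if not os.path.isfile(path):
            raise FileNotFoundError(f"embeddings file not found: {path}")
        return path, False

    # Remote URL: keep the file name and store it under embeddings_dir.
    filename = os.path.basename(parsed.path)
    return os.path.join(embeddings_dir, filename), True
```

With such a helper, main.py would call download_embeddings only when the second value is True, and pass the returned path to load_dump_to_db in both cases.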

Maybe there is something I missed regarding the embeddings file needed in base_directory.

Let me know if this sounds acceptable and which method you prefer. I will then propose a PR.

See you later!

Eric
