Hello,
Can you please adapt download_embeddings to also support the file:// URL scheme?
This would allow us to provide centralized embeddings files on our cluster and avoid the download time.
Something in this spirit:
log.info("Downloading embeddings from %s → %s", url, dest)
if url.startswith("file://"):
    # Local or shared file: urllib handles the file:// scheme natively
    import shutil
    from urllib.request import urlopen
    with urlopen(url) as resp, open(dest, "wb") as f:
        shutil.copyfileobj(resp, f)  # stream instead of reading it all into memory
    return dest
else:
    # Otherwise download from the given URL
    with requests.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()
        # ... rest of the existing download code
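A minimal, self-contained sketch of the file:// branch on its own, so it can be tried outside the project. The name copy_from_file_url is hypothetical (not an existing function); it only relies on the standard library, where urlopen() resolves file:// URLs through urllib's built-in file handler.

```python
import shutil
from urllib.parse import urlparse
from urllib.request import urlopen


def copy_from_file_url(url: str, dest: str) -> str:
    """Copy the file behind a file:// URL to dest (hypothetical helper).

    Mirrors the file:// branch proposed for download_embeddings.
    """
    if urlparse(url).scheme != "file":
        raise ValueError(f"not a file:// URL: {url}")
    # urlopen() handles file:// URLs with urllib's default FileHandler
    with urlopen(url) as resp, open(dest, "wb") as f:
        # Copy in chunks so a large embeddings archive never sits fully in memory
        shutil.copyfileobj(resp, f)
    return dest
```

Usage: copy_from_file_url(pathlib.Path("/shared/embeddings.tar").as_uri(), dest), where the /shared path is only an example.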
Problem: this would still duplicate the file for each user, since every user's base_directory gets its own copy.
I would prefer main.py to handle the embeddings data file directly,
something like:
import os
import urllib.parse

url = conf["embeddings_url"]
if url.startswith("file://"):
    # Shared file on the cluster: load it in place, no copy into base_directory
    file_path = urllib.parse.urlparse(url).path
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Embeddings file not found: {file_path}")
    tar_path = file_path
else:
    # New: get the file name from the URL
    filename = os.path.basename(urllib.parse.urlparse(url).path)
    tar_path = os.path.join(embeddings_dir, filename)
    logger.info(f"Downloading reference embeddings to {tar_path}...")
    download_embeddings(url, tar_path)
logger.info("Loading embeddings into the database...")
load_dump_to_db(tar_path, conf)
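The path-resolution part of this logic can be pulled into a small, testable helper. This is a sketch under assumptions: resolve_embeddings_path is a hypothetical name, and it uses url2pathname so that percent-escapes in a file:// URL (for example %20 for a space) are decoded, which urlparse(...).path alone does not do.

```python
import os
from urllib.parse import urlparse
from urllib.request import url2pathname


def resolve_embeddings_path(url: str, embeddings_dir: str) -> tuple[str, bool]:
    """Return (tar_path, needs_download) for an embeddings URL (hypothetical helper).

    file:// URLs point at a shared file that is used in place;
    any other URL maps to a per-user path inside embeddings_dir.
    """
    parsed = urlparse(url)
    if parsed.scheme == "file":
        # url2pathname decodes percent-escapes such as %20 in the path
        file_path = url2pathname(parsed.path)
        if not os.path.exists(file_path):
            raise FileNotFoundError(f"Embeddings file not found: {file_path}")
        return file_path, False
    # Remote URL: name the local copy after the last path component
    filename = os.path.basename(parsed.path)
    return os.path.join(embeddings_dir, filename), True
```

When needs_download is True, main.py would call download_embeddings(url, tar_path) before load_dump_to_db(tar_path, conf); otherwise the shared file is loaded directly and nothing is copied into base_directory.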
Maybe I missed a reason why the embeddings file needs to be in base_directory.
Let me know if this sounds acceptable and which approach you prefer; I will then propose a PR.
See you later!
Eric