Discussion: add HTTP API for looper #433

@simeoncarstens

This issue is coauthored by @zz1874.

looper is a CLI tool that often runs on the front node of an HPC cluster, so that jobs can be submitted to Slurm / SGE / other job schedulers.
@nsheff expressed a desire for an HTTP API that wraps looper. This would allow him and other users to run looper on the front node and send HTTP requests to that API from a different machine through a reverse SSH tunnel.
Advantages of this would be:

  • use of looper functionality from any machine without manually copying code to the front node,
  • potential for a graphical user interface (GUI) that builds upon that API.

An earlier attempt at this was caravel (https://github.com/pepkit/caravel). @nsheff tells us that it had issues, possibly due to the synchronous nature of the Flask framework. caravel appears to be a Python 2.7 code base that is converted to Python 3 on the fly during installation, using 2to3 through setuptools' use_2to3 option. This now makes caravel hard to run, for several reasons: setuptools no longer supports use_2to3, the Docker image can no longer be built, the Debian index URLs are out of date, Python 3.6-specific typing imports are used, ...
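One common way to avoid the blocking problem suspected in caravel is to not keep the HTTP request open while a long-running command executes: start the command in the background, return a job ID immediately, and let the client poll for the result. The following is a minimal asyncio sketch of that pattern, assuming the API shells out to the looper CLI; Job, JobRunner and the status strings are hypothetical names, and a Python one-liner stands in for a real looper invocation.

```python
from __future__ import annotations

import asyncio
import itertools
import sys
from dataclasses import dataclass


@dataclass
class Job:
    """State of one background command (hypothetical structure)."""

    argv: list[str]
    status: str = "pending"  # pending -> running -> finished | failed
    returncode: int | None = None
    stdout: str = ""
    error: str = ""


class JobRunner:
    """Starts commands in the background and lets callers poll them by job ID."""

    def __init__(self) -> None:
        self._jobs: dict[int, Job] = {}
        self._ids = itertools.count(1)
        self._tasks: set[asyncio.Task] = set()

    def submit(self, argv: list[str]) -> int:
        """Register and schedule a job, then return its ID without waiting.

        Must be called while an event loop is running (e.g. inside a request handler).
        """
        job_id = next(self._ids)
        self._jobs[job_id] = Job(argv=argv)
        task = asyncio.create_task(self._run(job_id))
        # Keep a strong reference so the task is not garbage-collected early.
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return job_id

    async def _run(self, job_id: int) -> None:
        job = self._jobs[job_id]
        job.status = "running"
        try:
            proc = await asyncio.create_subprocess_exec(
                *job.argv, stdout=asyncio.subprocess.PIPE
            )
        except OSError as exc:  # e.g. executable not found
            job.status, job.error = "failed", str(exc)
            return
        out, _ = await proc.communicate()
        job.stdout = out.decode()
        job.returncode = proc.returncode
        job.status = "finished"

    def status(self, job_id: int) -> Job:
        return self._jobs[job_id]


async def demo() -> None:
    runner = JobRunner()
    # Stand-in for something like ["looper", "run", ...].
    job_id = runner.submit([sys.executable, "-c", "print('done')"])
    print(runner.status(job_id).status)  # "pending": submit() did not wait
    while runner.status(job_id).status in ("pending", "running"):
        await asyncio.sleep(0.05)
    print(runner.status(job_id).stdout.strip())


if __name__ == "__main__":
    asyncio.run(demo())
```

An HTTP handler built on this would call submit() for a POST and status() for a later GET, so no request ever waits for a pipeline to finish.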

After browsing the looper and caravel code, we identified the following possibilities:

  1. Revive caravel, i.e. bring it up to date with recent Python versions and make it compatible with recent looper versions,
  2. Write a new HTTP API from scratch, which leaves us at least three possibilities:
    1. Figure out a way of automagically creating both CLI and HTTP API from a single definition of commands / options.
      This would likely be the most sustainable idea, as it removes the need to keep the CLI and HTTP API in sync when commands / options are added or removed in the future. But it would likely be a larger undertaking, with the risk of being only partially finished in the limited time we can work on it. It could also make a nice separate, reusable library!
    2. Implement only the most important top-level commands and their options as HTTP API endpoints, but design the implementation so that it transfers easily to other commands, and document the development process. That way, a subset of the looper commands / options could likely be made available via the HTTP API in the limited development time we have. But this also means an increased maintenance burden: whenever a CLI command / option is added, the HTTP API and its documentation have to be adapted accordingly.
    3. Implement only top-level commands and allow setting flags / options only via a project configuration file that is POSTed to the API. This would be the easiest and quickest solution, but it limits the use cases of the API. A similarly easy and inflexible approach would be POSTing a string of command-line arguments that is then parsed by looper's existing argparse parser.
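The single-definition idea of option 2.1 (and, almost for free, the string-based variant of option 2.3) can be sketched with the standard library alone: one table of commands and options generates the argparse CLI, and a JSON request body is translated into an argv list that goes through the very same parser. Option, COMMANDS, cli_flag, build_parser and body_to_argv are hypothetical names, and the options shown (dry_run, limit, itemized) are illustrative placeholders, not looper's actual option definitions.

```python
from __future__ import annotations

import argparse
import shlex
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One option, defined once and shared by the CLI and the HTTP API."""

    name: str  # Python-style name, e.g. "dry_run" -> CLI flag "--dry-run"
    help: str
    is_flag: bool = False  # True: boolean switch; False: option that takes a value
    convert: Callable[[str], object] | None = None  # e.g. int for numeric options


# Hypothetical command table for illustration only.
COMMANDS: dict[str, list[Option]] = {
    "run": [
        Option("dry_run", "Do not submit any jobs.", is_flag=True),
        Option("limit", "Process at most this many samples.", convert=int),
    ],
    "check": [
        Option("itemized", "Show the status of each sample.", is_flag=True),
    ],
}


def cli_flag(opt: Option) -> str:
    return "--" + opt.name.replace("_", "-")


def build_parser() -> argparse.ArgumentParser:
    """Generate the argparse CLI from the single COMMANDS definition."""
    parser = argparse.ArgumentParser(prog="looper")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for command, options in COMMANDS.items():
        sub = subparsers.add_parser(command)
        for opt in options:
            if opt.is_flag:
                sub.add_argument(cli_flag(opt), action="store_true", help=opt.help)
            else:
                sub.add_argument(cli_flag(opt), type=opt.convert, help=opt.help)
    return parser


def body_to_argv(command: str, body: dict) -> list[str]:
    """Translate a JSON request body into argv for the same parser."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    known = {opt.name: opt for opt in COMMANDS[command]}
    argv = [command]
    for key, value in body.items():
        opt = known.get(key)
        if opt is None:
            raise ValueError(f"unknown option for {command!r}: {key!r}")
        if opt.is_flag:
            if value:  # emit the switch only when the JSON value is truthy
                argv.append(cli_flag(opt))
        else:
            argv += [cli_flag(opt), str(value)]
    return argv


if __name__ == "__main__":
    parser = build_parser()
    # Option 2.1: structured JSON body -> argv -> the parser the CLI also uses.
    print(parser.parse_args(body_to_argv("run", {"dry_run": True, "limit": 5})))
    # Option 2.3: a raw command-line string is split and parsed as-is.
    print(parser.parse_args(shlex.split("check --itemized")))
```

Because HTTP input ends up in the same parser, type conversion and defaults cannot drift between CLI and API. Note that argparse calls sys.exit() on invalid input by default, so a server would have to catch SystemExit (or report parse errors another way) instead of letting it end the process.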

Important questions that would need to be answered:

  • Which version of looper should we develop against? looper is currently at v1.5.1, but there is a PR open for v1.6.0, and in fact we could only get the hello_looper example working with the upcoming v1.6.0 of looper. A similar question holds for pipestat, if it is required for developing the HTTP API. Answer: v1.6.0 for looper and v0.6.0 for pipestat, as both new versions have now been released.
  • What were the exact issues you faced with caravel? Knowing them would help us make a more informed decision on whether to revive caravel or to redevelop from scratch, avoiding mistakes made in caravel. Answer: see the corresponding comment in #433.
  • If we were to decide to implement only a subset of the top-level looper commands: which commands have the highest priority and should thus be implemented first as HTTP API calls? Answer: looper run, looper runp, looper check, looper report (see the corresponding comment in #433).

And finally, of course:

  • Which of options 1 and 2.1 to 2.3 should we pursue? We should discuss this question together with @nsheff and add the answer in a comment. Answer: in a call with @nsheff, we decided to go with option 2.1. Details are in a comment below.
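For the four commands prioritized above (looper run, runp, check and report), a minimal HTTP front end could look like the following standard-library sketch. The URL scheme (POST /<command>), the {"args": [...]} request body and the dry-run behavior of echoing the argv instead of executing it are assumptions for illustration, not caravel's or looper's actual design. Binding to 127.0.0.1 matches the SSH-tunnel setup described at the top: the API is reachable from another machine only through the tunnel.

```python
from __future__ import annotations

import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# The four commands prioritized in this discussion; anything else gets a 404.
ALLOWED_COMMANDS = {"run", "runp", "check", "report"}


class LooperHandler(BaseHTTPRequestHandler):
    """Maps POST /<command> to a looper argv. Dry run: the argv is only echoed back."""

    def do_POST(self) -> None:
        command = self.path.strip("/")
        if command not in ALLOWED_COMMANDS:
            self._send(404, {"error": f"unknown command {command!r}"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self._send(400, {"error": "body must be JSON"})
            return
        # Hypothetical request schema: {"args": [<str>, ...]} appended to the command.
        args = body.get("args", []) if isinstance(body, dict) else None
        if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
            self._send(400, {"error": "'args' must be a list of strings"})
            return
        argv = ["looper", command, *args]
        # A real server would hand argv to a subprocess or job queue here.
        self._send(200, {"argv": argv})

    def _send(self, status: int, payload: dict) -> None:
        data = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000) -> ThreadingHTTPServer:
    """Start the server in a background thread, bound to localhost only."""
    server = ThreadingHTTPServer(("127.0.0.1", port), LooperHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = serve(0)  # port 0: let the OS pick a free port for this demo
    url = f"http://127.0.0.1:{server.server_address[1]}/run"
    request = urllib.request.Request(
        url, data=json.dumps({"args": ["project_config.yaml"]}).encode(), method="POST"
    )
    with urllib.request.urlopen(request) as response:
        print(json.loads(response.read()))  # {'argv': ['looper', 'run', 'project_config.yaml']}
    server.shutdown()
```

The explicit allowlist keeps the endpoint from becoming a generic remote shell; under option 2.1 it would instead be derived from the shared command definition.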
