Build a Python SDK with Streaming
This tutorial includes the following SDK languages and versions:
| TypeScript v1 | TypeScript v2 | Java | Python v1 | Python v2 | C# | Go | PHP |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
Streaming data is a common pattern when calling AI Large Language Models (LLMs) like ChatGPT, Claude, Llama, or Mistral.
In this post we will explore how to seamlessly create an SDK for your favorite LLM using the liblab SDK generator.
The example in this tutorial uses Ollama to host an LLM locally on your computer; however, you can apply the same principles to any LLM API that provides an OpenAPI file.
The OpenAPI file describes the API: its paths, parameters, security schemes, and other details.
Prerequisites
- A liblab account
- The liblab CLI installed and you are logged in
- Python ≥ 3.9
- Ollama installed and running
- A Large Language Model, such as Llama 3.1
Steps
- Setting up Ollama and installing Llama 3.1
- Setting up the liblab CLI
- Generating the SDK
- Using the SDK
- How to enable streaming endpoints
- Conclusion
1. Setting up an example Llama API
First, go to the Ollama homepage to download and install the latest version.
Then, once Ollama is installed and running, execute the following command on a console to download the latest version of Llama 3.1:
ollama pull llama3.1
To verify that Llama 3.1 is installed, run the following command:
ollama run llama3.1
If all is well, you will be able to send prompts to the model and receive responses. For example:
ollama run llama3.1
>>> tell me a joke
Here's one:
What do you call a fake noodle?
(wait for it...)
An impasta!
Hope that made you laugh! Do you want to hear another?
>>>
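Under the hood, ollama run talks to a local REST API on port 11434, and that API is what our generated SDK will call. To see the raw streaming behavior before we generate anything, here is a minimal Python sketch (assuming the requests package is installed); when streaming, Ollama returns one JSON object per line:
import json
import requests  # assumption: installed via `pip install requests`

# Send a prompt to the local Ollama server (default port 11434).
# With "stream": True (Ollama's default), the response body is a stream
# of newline-delimited JSON objects, each carrying a piece of the answer.
with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1", "prompt": "tell me a joke", "stream": True},
    stream=True,
) as response:
    for line in response.iter_lines():
        if line:
            chunk = json.loads(line)
            print(chunk["response"], end="", flush=True)
This is exactly the request and response shape we will describe in the OpenAPI spec below.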
2. Setting up the liblab CLI
First, ensure you have the liblab CLI installed. If not, you can install it via npm:
npm install -g @liblab/cli
Once installed, you need to log in to your liblab account. Run the following command and follow the prompts to log in:
liblab login
After logging in, you can configure the CLI for your project. We want a new directory for the SDK, so let's create one called streaming:
mkdir -p streaming
cd streaming
liblab init
This will generate a liblab.config.json file.
Before we edit the generated config file, let's create the API spec for which we will generate this SDK.
ℹ️ Usually we don't need to create the API spec ourselves, since most APIs provide one. However, we are running Ollama locally, and it does not provide an OpenAPI spec.
Create a new file called ollama-open-api.yaml and paste the following content into it:
openapi: 3.0.0
info:
  title: Ollama API
  description: This is an OpenAPI spec for Ollama, created internally by liblab. This is not an official API spec.
  version: 1.0.0
paths:
  /api/generate:
    post:
      description: Send a prompt to an LLM.
      operationId: generate
      x-liblab-streaming: true
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/GenerateRequest'
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/GenerateResponse'
components:
  schemas:
    GenerateRequest:
      type: object
      required:
        - model
        - prompt
      properties:
        model:
          type: string
        prompt:
          type: string
        stream:
          type: boolean
    GenerateResponse:
      type: object
      required:
        - model
        - created_at
        - response
      properties:
        model:
          type: string
        created_at:
          type: string
        response:
          type: string
        done:
          type: boolean
        done_reason:
          type: string
        context:
          type: array
          items:
            type: integer
        total_duration:
          type: integer
        load_duration:
          type: integer
        prompt_eval_count:
          type: integer
        prompt_eval_duration:
          type: integer
        eval_count:
          type: integer
        eval_duration:
          type: integer
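Note the x-liblab-streaming: true extension on the /api/generate operation; this flags the operation as streaming for the SDK generator (the config below sets the same thing via endpointCustomizations). Before moving on, it's worth checking that the file is valid YAML. A quick optional sketch, assuming PyYAML is installed:
import yaml  # assumption: installed via `pip install pyyaml`

# Parse the spec and print a couple of fields as a sanity check.
with open("ollama-open-api.yaml") as f:
    spec = yaml.safe_load(f)

print(spec["info"]["title"])       # Ollama API
print(list(spec["paths"].keys()))  # ['/api/generate']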
Now let's update the liblab.config.json file to use our new spec. Copy and paste the following JSON to overwrite the liblab.config.json file:
{
  "sdkName": "ollama-sdk",
  "apiVersion": "1.0.0",
  "apiName": "ollama-api",
  "specFilePath": "./ollama-open-api.yaml",
  "languages": ["python"],
  "auth": [],
  "customizations": {
    "baseURL": "http://localhost:11434",
    "includeOptionalSnippetParameters": true,
    "devContainer": false,
    "generateEnv": true,
    "inferServiceNames": false,
    "injectedModels": [],
    "license": {
      "type": "MIT"
    },
    "responseHeaders": false,
    "retry": {
      "enabled": true,
      "maxAttempts": 3,
      "retryDelay": 150
    },
    "endpointCustomizations": {
      "/api/generate": {
        "post": {
          "streaming": "true"
        }
      }
    }
  },
  "languageOptions": {
    "python": {
      "alwaysInitializeOptionals": false,
      "pypiPackageName": "",
      "githubRepoName": "",
      "ignoreFiles": [],
      "sdkVersion": "1.0.0",
      "liblabVersion": "2"
    }
  },
  "publishing": {
    "githubOrg": ""
  }
}
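Notice the endpointCustomizations block: it marks the POST /api/generate operation as streaming, matching the x-liblab-streaming flag we set in the spec. Before generating, you can optionally confirm that the config parses and that Ollama is reachable at the configured baseURL. A small sketch, assuming Ollama's root endpoint replies with the text "Ollama is running" (its usual health response):
import json
import urllib.request

# Load the generator config and read the base URL we pointed at Ollama.
with open("liblab.config.json") as f:
    config = json.load(f)

base_url = config["customizations"]["baseURL"]
print("Spec file:", config["specFilePath"])

# Hit the server root; a running Ollama instance answers with a short text body.
with urllib.request.urlopen(base_url) as resp:
    print(resp.read().decode())  # expected: "Ollama is running"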