In a previous article, I described how to write a Python script that allows a continuing conversation with OpenAI's GPT models. Still missing from that script is the ability to have the models use functions. Functions give the models a way to access external information or take external actions, analogous to what plugins do for ChatGPT. This is done by adding an extra parameter, tools, when calling the chat completion endpoint (see the article linked above).
Classic examples are retrieving the current weather, sending an email, booking a flight, and so on. For this article, I'm going to provide a function and implement it directly in the Python script, instead of connecting to a web service (as you would probably do for, e.g., a weather forecast).
Providing a Function
My example gives the model the ability to play a message out loud in Morse code. This is something that it definitely cannot do without external support.
The model can translate text to Morse code, as you might expect. If I give the model the following task: “Translate the following message to Morse code: morse code demo”, using my chat script, I get the following reply:
-- --- .-. ... . / -.-. --- -.. . / -.. . -- ---
This is the correct translation. If I ask “Play the following message out loud in Morse code: morse code demo”, the results vary. A few responses were:
-... ..- .-. ... . / -.-. --- -.. . / -.. . -- --- ... --- ..-. --- .-.
and
.-. -- --- .-. ... .-.-. .-.-.- -.. . -- ---
The first one translates as "burse code demosofor", and the second one as "rmors??dmo" (with question marks for two non-existing codes). Sometimes it also simply produces the correct Morse code for "morse code demo", but it is clear that the model doesn't know how to fulfill the request.
We can change that by giving the model access to an external function, and explaining to it, in plain English, what that function does. The model will then decide by itself whether it needs to call it, and with which parameter(s). The crucial parts are the description of the function, for which I've simply used "Create audio from a message in Morse code", and the description of the (one) parameter of the function, for which I've taken "The message in Morse code". You then have to formulate that definition as a JSON Schema object to be able to add it to the call to the chat completion endpoint, as follows:
{
    "type": "function",
    "function": {
        "name": "create_audio",
        "description": "Create audio from a message in Morse code",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {
                    "type": "string",
                    "description": "The message in Morse code"
                }
            }
        }
    }
}
This looks a bit complicated, but that's just syntactic sugar. It boils down to a function that you could declare as create_audio(message: string), with the instruction to the model to call it when it needs to create Morse code as audio, passing a string parameter that contains the message, already converted to Morse code.
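To make concrete what the script will later have to handle, here is a simulated example of the function part of a tool call as it could appear in a model response. The payload values are made up for illustration, but the shape (a function name plus a JSON-encoded arguments string) matches what the chat completion endpoint returns:

```python
from json import loads

# A simulated tool call as it could appear in a model response.
# Note that 'arguments' is a JSON-encoded string, not a dict.
function_info = {
    'name': 'create_audio',
    'arguments': '{"message": "-- --- .-. ... . / -.-. --- -.. . / -.. . -- ---"}',
}

# The arguments string must be decoded before use.
arguments = loads(function_info['arguments'])
print(arguments['message'])  # The Morse code string the function receives.
```

This is exactly the decoding step that the script below performs with loads() before calling the actual Python function.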
Python Code
Below is the script from "ChatGPT" from the Command Line, with the definition and handling of the create_audio() function added. The definition of the Morse class follows later.
import openai
from os import environ
from json import loads
from morse import Morse


class Chat:
    def __init__(self, model, temperature):
        self.model = model
        self.temperature = temperature
        self.messages = [{
            'role': 'system',
            'content': 'You are a helpful assistant.'}]
        self.tools = [{
            'type': 'function',
            'function': {
                'name': 'create_audio',
                'description': 'Create audio from a message in Morse code',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'message': {
                            'type': 'string',
                            'description': 'The message in Morse code'}}}}}]
        self.morse = Morse(44100)

    def run(self):
        print('Leave an empty message (press return) to end this chat.')
        while True:
            message = input('you> ')
            if message == '':
                return
            self.messages.append(dict(role='user', content=message))
            response = openai.ChatCompletion.create(
                model=self.model,
                messages=self.messages,
                tools=self.tools,
                temperature=self.temperature)
            message = response.choices[0].message
            if message.get('tool_calls'):
                self.__handle_function_call(
                    message['tool_calls'][0]['function'])
                continue
            answer = response.choices[0].message['content']
            print('gpt>', answer)
            self.messages.append(dict(role='assistant', content=answer))

    def __handle_function_call(self, function_info):
        function_name = function_info['name']
        arguments = loads(function_info['arguments'])
        if function_name == 'create_audio' and 'message' in arguments:
            print('Model called create_audio("{}").'.format(
                arguments['message']))
            self.morse.play(arguments['message'])
        else:
            print('Unknown function request received.')


openai.api_key = environ['OPENAI_API_KEY']
chat = Chat('gpt-3.5-turbo', 1)
chat.run()
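One design choice worth noting: after handling the tool call, the script simply continues with the next user input, and never reports the result back to the model. The API also supports sending the function's outcome back as a message with role 'tool', so that the model can formulate a reply to the user. A minimal sketch of how the conversation list would then be extended (the tool_call_id value here is a hypothetical placeholder; in practice you would copy it from the model's response):

```python
from json import dumps

messages = [
    {'role': 'user',
     'content': 'Play "morse code demo" out loud in Morse code.'},
]

# The assistant's tool call, as it would be appended from the response
# (the id is a made-up placeholder for illustration).
tool_call_id = 'call_abc123'
messages.append({
    'role': 'assistant',
    'content': None,
    'tool_calls': [{
        'id': tool_call_id,
        'type': 'function',
        'function': {
            'name': 'create_audio',
            'arguments': dumps({'message': '-- --- .-. ... .'})}}]})

# The function's outcome, sent back so the model can confirm to the user.
messages.append({
    'role': 'tool',
    'tool_call_id': tool_call_id,
    'content': 'Audio written to morse.wav.'})
```

With this in the conversation, a follow-up call to the chat completion endpoint would let the model tell the user that the audio was created.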
I do need to point out that this Morse code example is not something that you would use an OpenAI language model for in practice. However, I didn’t want to write the nth example of a weather forecast or stock price…
Below is the Morse class that is used in the above script. I save the audio in a .wav file instead of actually playing it from the script, to avoid the sometimes troublesome installation of audio support in Python, should you want to try the script in practice. The .wav file should be playable on any audio player that you have installed on your computer.
# filename: morse.py
import numpy as np
from scipy.io.wavfile import write
from scipy.signal.windows import tukey


class Morse:
    def __init__(self, rate):
        self.rate = rate    # [Hz]
        self.freq = 1000    # [Hz]
        self.audio = np.array([])

    def play(self, message):
        for element in message:
            self.__add(element)
        write('morse.wav', self.rate,
              np.round(self.audio * 32767).astype(np.int16))
        self.audio = np.array([])

    def __add(self, element):
        dit = self.rate // 10  # 100 ms.
        beep_len = 0
        pause_len = 1
        if element == '.':
            beep_len = 1
        elif element == '-':
            beep_len = 3
        elif element == ' ':
            pause_len = 2  # Total pause between letters is 3.
        elif element == '/':
            pause_len = 2  # Total pause between words is 7.
        else:
            pass  # Ignore unknown characters.
        t = np.linspace(
            0, beep_len * dit / self.rate, beep_len * dit, endpoint=False)
        self.audio = np.append(
            self.audio,
            np.sin(2 * np.pi * self.freq * t) * tukey(beep_len * dit, 0.2))
        self.audio = np.append(self.audio, np.zeros(pause_len * dit))
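As a sanity check on the timing logic, the sample counts can be recomputed by hand: at 44100 Hz a dit is 4410 samples, a '.' contributes one dit of tone plus one dit of pause, a '-' three dits of tone plus one of pause, and ' ' and '/' each contribute two dits of silence. The following small sketch (my own recomputation, not part of the original script) mirrors those rules without needing NumPy or SciPy:

```python
RATE = 44100
DIT = RATE // 10  # 4410 samples, i.e. 100 ms per dit.

def sample_count(message):
    """Recompute the number of samples Morse.play() would generate."""
    total = 0
    for element in message:
        beep_len, pause_len = 0, 1
        if element == '.':
            beep_len = 1
        elif element == '-':
            beep_len = 3
        elif element in (' ', '/'):
            pause_len = 2
        total += (beep_len + pause_len) * DIT
    return total

# '.' gives one dit of tone plus one of pause; the following ' ' adds
# two more dits, so '. ' totals 4 dits = 17640 samples (0.4 s).
print(sample_count('. '))  # 17640
```

Note how the word separator works out: the sequence ' / ' between two words contributes 2 + 2 + 2 dits of silence, which together with the single dit of pause after the last symbol of the preceding word gives the standard 7-dit word gap.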
This script, with the same message as before, “Play the following message out loud in Morse code: morse code demo”, will print the message:
Model called create_audio("-- --- .-. ... . / -.-. --- -.. . / -.. . -- ---").
This indicates that the model has correctly understood that it needs to call the function to create the audio. The script then calls the actual function that creates the .wav file. Funnily enough, sometimes the model calls the function like this:
Model called create_audio("morse code demo").
If that happens, you can ask “Could you try this again, but already convert the message to Morse code before you call the function?”, and then it will actually do that! :-) This could maybe be improved by adapting the description of the message parameter.
Since you’ve read all the way to the end, a quiz question (unrelated to the OpenAI API): what am I trying to achieve with the call to tukey() in the Morse class?