Ecoute: An OpenAI GPT-3.5 Powered Real-time Communication Transcription Tool

Name: Omar C. Williams

Updated on 6/4/2023

Unraveling the Magic Behind Ecoute

Ecoute is more than a live transcription tool. It transcribes both the user's microphone input and the speaker output in real time, so both sides of a conversation are available as text. It also uses OpenAI's GPT-3.5 to generate contextually relevant responses based on the live transcript, which is the feature that sets it apart.

For instance, imagine you're having a complex technical discussion with a colleague. Ecoute transcribes your dialogue and provides potential responses to facilitate your conversation. This feature can significantly boost efficiency, especially in intricate debates where crafting suitable responses may require extra time and effort.

Visit the Ecoute GitHub page: https://github.com/SevaSk/ecoute

Ecoute Setup: The Pre-requisites

Before setting up Ecoute on your local machine, make sure you have the following prerequisites:

Python >=3.8.0
An OpenAI API key
Windows OS (Not tested on others)
FFmpeg

If FFmpeg isn't already installed on your system, you can install it using Chocolatey, a package manager for Windows.

Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
choco install ffmpeg

Please remember to run these commands in a PowerShell window with administrator privileges.
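Before moving on, the Python and FFmpeg prerequisites can be checked with a short script. This is a minimal sketch rather than part of Ecoute: the function name check_prerequisites is made up for illustration, and the script only confirms that an ffmpeg executable is on PATH, not that it works. It does not check the API key or the operating system.

```python
import shutil
import sys

MIN_PYTHON = (3, 8, 0)  # minimum version stated in the prerequisites list


def check_prerequisites(min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:3] < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        needed = ".".join(map(str, min_python))
        problems.append(f"Python {needed} or newer is required, found {found}")
    # shutil.which returns None when the executable is not on PATH
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("Missing:", issue)
    if not issues:
        print("Python and FFmpeg checks passed")
```

If the script reports that ffmpeg is missing right after installation, opening a new terminal window usually refreshes PATH.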

Navigating the Ecoute Installation Process

Once the prerequisites are met, follow these steps to install and run Ecoute:

Clone the repository using the command: git clone https://github.com/SevaSk/ecoute
Navigate to the ecoute folder with: cd ecoute
Install the required packages via: pip install -r requirements.txt

Next, you need to create a keys.py file in the Ecoute directory and add your OpenAI API key. Here are two methods to accomplish this:

Method 1: Utilize Command Prompt

Run the following command, replacing "API KEY" with your actual OpenAI API key:

python -c "with open('keys.py', 'w', encoding='utf-8') as f: f.write('OPENAI_API_KEY=\"API KEY\"')"

Method 2: Manually Create the File

Open a text editor and enter the following content:

OPENAI_API_KEY="API KEY"

Replace "API KEY" with your actual OpenAI API key. Save this file as keys.py within the Ecoute directory.
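Whichever method you use, it can help to confirm that keys.py loads and that the placeholder was actually replaced. The sketch below is an illustrative check, not something shipped with Ecoute: the helper name read_api_key is an assumption, and it only inspects the file locally without contacting OpenAI, so it cannot tell whether the key is valid.

```python
import runpy
from pathlib import Path


def read_api_key(path="keys.py"):
    """Load keys.py and return OPENAI_API_KEY, raising ValueError if it looks unset."""
    key_file = Path(path)
    if not key_file.is_file():
        raise ValueError(f"{key_file} does not exist")
    # run_path executes the file and returns its module-level names as a dict
    namespace = runpy.run_path(str(key_file))
    key = namespace.get("OPENAI_API_KEY")
    if not isinstance(key, str) or not key.strip() or key == "API KEY":
        raise ValueError("OPENAI_API_KEY is missing or still the placeholder")
    return key
```

Calling read_api_key() from the Ecoute directory returns the key on success and raises a ValueError with a short explanation otherwise.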

Launching Ecoute

You can run Ecoute by executing the main script: python main.py.

For faster, more accurate transcription that supports most languages, use: python main.py --api

This command uses the Whisper API for transcription, which improves speed and accuracy. Note that the system may take a few seconds to warm up before transcription becomes real-time.
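To make the difference between the two launch commands concrete, here is one way a script could handle an optional --api switch. This is a hypothetical sketch and not Ecoute's actual main.py: the function names and the backend labels are assumptions chosen for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse a single optional --api switch, mirroring the two launch commands."""
    parser = argparse.ArgumentParser(description="Hypothetical launcher sketch")
    parser.add_argument(
        "--api",
        action="store_true",
        help="use the hosted Whisper API instead of the local model",
    )
    return parser.parse_args(argv)


def choose_backend(use_api):
    """Return a label for the transcription backend (labels are illustrative only)."""
    return "whisper-api" if use_api else "local-whisper-tiny"


# Explicit argument lists stand in for "python main.py" and "python main.py --api"
print(choose_backend(parse_args([]).api))
print(choose_backend(parse_args(["--api"]).api))
```

Using store_true keeps the local model as the default, so the flag is only needed when you want the hosted API.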

Key Considerations: Limitations and Future Prospects

While Ecoute offers real-time transcription and response suggestions, certain limitations are worth noting:

Default Mic and Speaker: Ecoute listens only to the default microphone and speaker in your system. To use a different microphone or speaker, set it as the default device in your system settings.
Whisper Model: Without the --api flag, Ecoute utilizes the 'tiny' version of the Whisper ASR model due to its low resource consumption and fast response times. However, this model might not transcribe certain types of speech as accurately as the larger models.
Language: Without the --api flag, the Whisper model used is set to English. It may not accurately transcribe non-English languages or dialects.
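The model and language limitations are linked through the naming of the open-source Whisper checkpoints: the model comes in five sizes, and every size except "large" also has an English-only variant with a ".en" suffix. The helper below sketches that convention. The function name is made up for illustration, and whether Ecoute exposes a setting for switching models is not covered here.

```python
# Model sizes published for the open-source Whisper release, smallest to largest
WHISPER_SIZES = ("tiny", "base", "small", "medium", "large")


def whisper_model_name(size="tiny", english_only=True):
    """Build a Whisper checkpoint name such as 'tiny.en' or 'small'."""
    if size not in WHISPER_SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {WHISPER_SIZES}")
    # English-only checkpoints carry a ".en" suffix; "large" has no such variant
    if english_only and size != "large":
        return f"{size}.en"
    return size


print(whisper_model_name())                # tiny.en
print(whisper_model_name("small", False))  # small
```

Larger sizes generally trade higher resource use and latency for better accuracy, which is the trade-off behind the choice of the 'tiny' model.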

Work is ongoing to address these limitations and add multi-language support in future versions.

Conclusion

Ecoute is an innovative tool with the potential to revolutionize communication. Its live transcription feature coupled with response suggestion makes it an invaluable asset for personal and professional communication. Despite its limitations, the Ecoute project is an exciting step forward, hinting at the limitless possibilities that AI offers for the future of communication.