GLaDOS Voice Generator (weekend project)

Unknownbolt | 2020-03-02

I wanted some GLaDOS-like sounds for my home automation system. I found this project, which doesn’t work anymore as of writing, as the synthesis server is offline.

Thus, my weekend project turned into a week-long evening project. But that’s fine, as the result is cool:

The new GLaDOS Voice Generator is here! Enjoy.

14 Comments

Montgomery Newcom says:

2020-04-25 at 22:35

Awesome! Thank you so much!

- bolt says:
  
  2020-06-09 at 19:40
  
  Have fun 🙂
  
Glados says:

2020-06-01 at 11:17

Awesome stuff! How did you achieve such a nice replica of the original voice? Which sound synthesizer did you use?

- bolt says:
  
  2020-06-09 at 19:35
  
  The input voice generator is a commercial one I’ve developed for work, so I’m afraid I’m not at liberty to distribute it. I simply used it because it was available to me, and it sounded good after running the GLaDOS script from https://github.com/EtiennePerot/gladosvoicegen, which I also altered so it wouldn’t get stuck as often.
  
Han says:

2020-07-11 at 18:26

This is amazing. Did you have to use Windows XP in order to run it? Or is there a way to run this on Windows 10?

- bolt says:
  
  2020-07-11 at 18:56
  
  I had enough issues making the script run reliably, so I did not bother trying to get it to run on Windows 10. It was supposed to be a weekend project, after all.
  
  I happened to have an old XP license, so I simply ran that in a virtual machine. I see no reason why XP would be required. However, since parts of the script rely on pixel-perfect image recognition, any blurring or transparency effects are virtually guaranteed to mess up the algorithm.
  
Andrew Lee says:

2020-10-27 at 10:20

Wow, this is legitimately amazing. Big thanks for this tool. Would it not be possible to exponentially speed it up by using Audacity scripts and some custom audio manipulation code? I would think that Audacity could do most of it.

- bolt says:
  
  2020-10-27 at 10:50
  
  Yes, Audacity could probably do the same thing. I went with Melodyne, both because Etienne’s original Python script used it, and because that’s what was allegedly used to create the actual GLaDOS voice in Portal.
  
  If you’re interested in creating the same effect in Audacity, I could send you a sample of the unprocessed TTS voice that you could play around with. If it sounds just as good (or better), I’d be happy to replace the process with your script (with your permission) to speed it up.
  
Can says:

2022-03-05 at 23:23

Very nice!
I have a question: instead of annoying other people by adding too many sentences, which takes a while, is there a way to do it locally? Like something to download?
It sounds pretty cheeky (xD) of me to say this, but if you read this, maybe answer me via email… Thanks!

- bolt says:
  
  2022-10-02 at 13:11
  
  Unfortunately, this is not a regular TTS voice. It’s a script that runs ordinary, human-sounding audio generated by a TTS through a program called Melodyne to mimic the effects applied to the actual human voice of GLaDOS. I am not at liberty to distribute either the voice or Melodyne, as both have licenses attached to them.
  
redonkuless says:

2022-10-02 at 00:13

Bolt,

Could I get a copy of that TTS voice? I’m in the process of upgrading an older 3D-printed GLaDOS ceiling lamp, and it would be pretty cool to have it interact with whoever is in the room. I’ve been considering setting up a Jetson Nano for object recognition and voice commands with responses, along with sensors so it automatically looks in the direction you’re moving.

- bolt says:
  
  2022-10-02 at 13:10
  
  Unfortunately, this is not a regular TTS voice. It’s a script that runs ordinary, human-sounding audio generated by a TTS through a program called Melodyne to mimic the effects applied to the actual human voice of GLaDOS. I am not at liberty to distribute either the voice or Melodyne, as both have licenses attached to them.
  
  Your project sounds very cool, and you are of course free to use this voice for it, but unfortunately you’d have to do so using the API. This works quite well if you make some good decisions about which sounds to generate. For instance, for reading the temperature, getting one sample to say “The outside temperature is”, and then extracting the numbers from 0 to 100, “negative”, “point”, and “degrees”, would then yield instant results from the API when it’s asked for the same samples at a later date. Or you could cache them locally.
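
A minimal sketch of the "cache them locally" idea, under stated assumptions: each phrase is requested as its own sample, samples are stored as `.wav` files, and `fetch` is a hypothetical callable standing in for whatever code actually talks to the API (its interface is not described in this thread). The helper names `temperature_phrases` and `cached_sample` are illustrative only.

```python
import hashlib
from pathlib import Path
from typing import Callable


def temperature_phrases(temp_c: float) -> list[str]:
    """Split a reading such as -3.5 into a small set of reusable phrases."""
    tenths_total = round(abs(temp_c) * 10)
    whole, tenths = divmod(tenths_total, 10)
    phrases = ["The outside temperature is"]
    if temp_c < 0 and tenths_total > 0:  # avoid saying "negative 0"
        phrases.append("negative")
    phrases.append(str(whole))
    if tenths:
        phrases += ["point", str(tenths)]
    phrases.append("degrees")
    return phrases


def cached_sample(phrase: str, cache_dir: Path,
                  fetch: Callable[[str], bytes]) -> Path:
    """Return the local file for a phrase, calling fetch() only on a miss."""
    key = hashlib.sha256(phrase.encode("utf-8")).hexdigest()[:16]
    path = cache_dir / f"{key}.wav"  # the .wav suffix is an assumption
    if not path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(phrase))  # hypothetical API call
    return path
```

With this layout, a reading of -3.5 needs six short samples, and every later reading reuses whatever is already on disk, so the generator is only hit for phrases it has never produced before.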
  
  - redonkuless says:
    
    2022-10-02 at 19:56
    
    I appreciate that. I was thinking, if you’re OK with it, of just taking the English dictionary and running it through the API, but I want to be mindful of usage and temp files on your system. Do you have an automatic cleanup of the temp audio clips generated by the API? Is there a rate limit you would like me to set? I haven’t started any of the programming for using the API yet, so I’m completely flexible about whatever conditions you need me to set. I also need to figure out on my end how to structure the downloaded files and how I want to store them for later use. I’d like the system to work offline, which is one of the reasons I’m considering this approach.
    
    - bolt says:
      
      2022-10-02 at 20:12
      
      The API has limits, yes. You’ll probably find them. As long as you stay within the limits, I have no objections. However, running through a dictionary is probably not feasible.
      
      Firstly, the sentences are not going to flow well, as each word will be pronounced as if it were the beginning of a sentence.
      
      Secondly, with the setup taking about a minute and twenty seconds to generate any given sample, mostly regardless of length, processing even just the 171,472 words in common use in the English language would take you almost half a year. And that’s with no variants of each word, i.e. “go”, but not “going” or “gone”.
      
      Thirdly, some words are pronounced differently based on their meaning in a sentence, such as “live” in “I live in Scotland” vs “I went to a live concert”. This will be pronounced correctly in most cases when provided as a sentence to the API. With a single word, there’s no way for the API to know which one you want. It will prefer pronouncing “live” as in “to live somewhere”, by the way. I just tested 🙂
      
      Additionally, you’re probably only going to use something like 1 percent of those words with an automated assistant. The rest is a waste of time and disk space.
      
      But you do you, I guess…
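
The "almost half a year" estimate above can be checked with a few lines of arithmetic. This is a rough sketch that assumes samples are generated strictly one at a time, around the clock, at exactly 80 seconds each; the variable names are illustrative.

```python
WORDS_IN_COMMON_USE = 171_472   # figure quoted in the comment above
SECONDS_PER_SAMPLE = 80         # "about a minute twenty"

total_seconds = WORDS_IN_COMMON_USE * SECONDS_PER_SAMPLE
total_days = total_seconds / 86_400          # seconds per day

# The "1 percent" subset mentioned above, for comparison.
subset_words = round(WORDS_IN_COMMON_USE * 0.01)
subset_hours = subset_words * SECONDS_PER_SAMPLE / 3_600

print(f"Full dictionary: {total_days:.0f} days")
print(f"1 percent subset ({subset_words} words): {subset_hours:.0f} hours")
```

That works out to roughly 159 days for the full word list, against about a day and a half for the 1 percent subset.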
      