(and the animation studio that derived from it)

The goal of this project was (conceptually) simple: to have a model of a human face mimic the motions a person uses while forming words (aka I wanted my models to look like they were really talking). In both pre-rendered and real-time animation, a character with realistic facial movements contributes to the realism of the scene, helps people be more comfortable with what they are seeing, and can give the character a distinct personality as well. In this project, the emphasis was on real-time animation and rendering of the facial deformations.

The English language contains approximately 44 different phonemes, of which about 10 distinct "groups" are enough to achieve a pseudo-realistic representation. Vowels are voiced within the throat, while consonants are constructed more with tongue and teeth positioning (to see a full list of English-language phonemes, click here).

7 Major steps taken for completion of this project:
  1. Construct a keyframe editor for ease of use in creating the phoneme representations.
  2. Determine which keyframes need to be created, and in what positions.
  3. Creation of each of these keyframes.
  4. Creating an interpreter for text-based input --text to speech, essentially.
  5. Smooth interpolations between each phoneme, regardless of the combination of consonants/vowels.
  6. Tweaking of the motion and interpolation to make everything less "perfect".
  7. Impress your friends at a party! (ok, not really)

1. Keyframe Editor:
And by Keyframe Editor, I really mean the beginnings of an Animation Suite and a Processing UI toolkit. UI elements are a bit lacking in Processing, and while there are libraries to download, they seemed to be built off AWT/Swing and were not up to my visual standards. I wanted something a bit more lightweight and more visually appealing. So what does one do in this case? If you said spend a week and a half constructing one, then you, ma'am, are correct.

The following classes were created for the UI Toolkit:
GUIButton Provides an interface for an extensible, clickable button. Supports images for normal, mouse-over and mouse-click states. Each button can be assigned an action name which then gets handled upon clicking. Additionally, the cursor changes whenever the mouse is over a button. Button text gets centered onto each button behind the scenes, so there is no need to worry about spacing. Width, height and position are fully adjustable.
GUIMenu Allows multiple buttons to be added together into a single menu component. Menus can be either vertical or horizontal --supporting an optional header item as well. Contains visible/hidden flags so you can toggle visibility on a right-click action. Width, height and position are fully adjustable.
GUITextInput Allows the input of text into a well-recognized text-box format. Simply supports handling textual input and backspace for deleting characters. Width, height and position are fully adjustable.
GUITextPrompt Gives the ability to prompt the user for input. While the prompt is active, the rest of the screen is grayed out to help draw focus to the prompt. Uses a GUITextInput for input and 2 GUIButtons (OK and Cancel) for handling actions. Also supports the ENTER key press in lieu of the OK button. Width, height and position are fully adjustable.
GUIManager Encases all GUI components under one manager component. Use this to create and initialize all GUI elements. This is also where you handle Processing's built-in events and pass each one down to the GUI elements.
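To make the button behavior concrete, here is a minimal sketch of the hit-testing and action-dispatch logic a GUIButton needs. The GUIButton class name and the action-name idea come from the description above; the field and method names are my own guesses, and the drawing/image handling is omitted since it depends on Processing's API.

```java
// Minimal sketch of GUIButton's click logic (names beyond GUIButton
// and the action-name concept are illustrative).
public class GUIButton {
    float x, y, w, h;       // position and size, fully adjustable
    String label;           // text, centered when drawn
    String actionName;      // handed to the manager when clicked

    public GUIButton(float x, float y, float w, float h,
                     String label, String actionName) {
        this.x = x; this.y = y; this.w = w; this.h = h;
        this.label = label; this.actionName = actionName;
    }

    // True when the cursor is over the button (used for both the
    // mouse-over image swap and the cursor change).
    public boolean contains(float mx, float my) {
        return mx >= x && mx <= x + w && my >= y && my <= y + h;
    }

    // Returns the action name on a hit, or null so the manager can
    // try the next component.
    public String handleClick(float mx, float my) {
        return contains(mx, my) ? actionName : null;
    }
}
```

A GUIManager would then forward Processing's mouse events to each button and act on whichever action name comes back non-null.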

A full tutorial is available here.

2. Keyframes to Construct:
With this Animation Suite implemented, constructing each phoneme keyframe became much easier (and after some tweaking and adjustments, fairly quick and straightforward). I implemented 10 different phoneme keyframes, and with these was able to mimic a good approximation of facial deformations.
Keyframe Name: Represents:
  AandI: a, i
  E: e
  U: u, oo, ou
  O: o, aw
  Consonants: c, d, g, k, n, r, s, th, y, z
  FandV: f, ph, gh, v
  L: l, ll, h
  MBP: m, b, p
  WandQ: w, q
  T: t
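The table above is essentially a symbol-to-group lookup, which can be sketched as a simple map. The group and symbol names are taken from the table; the class and method names here are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Lookup from individual phoneme symbols to the ten keyframe groups
// listed in the table above.
public class PhonemeGroups {
    static final Map<String, String> GROUP = new HashMap<>();
    static {
        put("AandI", "a", "i");
        put("E", "e");
        put("U", "u", "oo", "ou");
        put("O", "o", "aw");
        put("Consonants", "c", "d", "g", "k", "n", "r", "s", "th", "y", "z");
        put("FandV", "f", "ph", "gh", "v");
        put("L", "l", "ll", "h");
        put("MBP", "m", "b", "p");
        put("WandQ", "w", "q");
        put("T", "t");
    }
    static void put(String group, String... symbols) {
        for (String s : symbols) GROUP.put(s, group);
    }
    public static String groupOf(String symbol) {
        return GROUP.get(symbol);  // null if the symbol is unknown
    }
}
```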
In addition to my own mouth and a trusty mirror, I used the following two pages for reference:
1. Preston Blair's Phoneme Series.
2. Lip-Syncing For Animation: Basic Phonemes.

3. Constructing Said Keyframes:

I should mention that before creating any of these keyframes, I first created a base "face" from which to work. This contained all the different facial features I needed to accurately reflect a facial model.

2 examples of my faces:
Once I had a face created, I would then use it as my rest position. Clicking the 'Add Keyframe' button at the top of the screen duplicates everything in the current editor window to a new keyframe, so I could just modify this rest position to create each of the facial positions. This involved some tweaking, additional editing and re-visiting of each keyframe (for each model), but overall it was --at least for me-- an enjoyable experience, and I think not one that would be terribly hard for an end-user.

4. Creating an Interpreter:
I wanted to create a pseudo-generic interpreter which would run behind the scenes, read in all the text input, and convert it to keyframe data and interpolation commands. While I was only partially successful at generalizing it, the actual process of interpreting the data worked out very well.

Bottom up layout of interpreter class:

Phoneme (Symbol) -- stores all the basic information for each symbol in our system.
Name The name of this phoneme ('a', 'e', 'th', etc..)
GroupName Which group/keyframe this belongs to ('AandI', 'E', etc..)
IsVowel Needed to do some special interpolation for vowels
IsLong Whether or not this symbol should be a bit longer than the average
IsEnding Whether or not this symbol ends a word/sentence.
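The Phoneme record above translates almost directly into code. The field names and their meanings are from the list above; the constructor shape is my own choice.

```java
// Direct transcription of the Phoneme (Symbol) record described above.
public class Phoneme {
    final String name;       // 'a', 'e', 'th', ...
    final String groupName;  // which group/keyframe: 'AandI', 'E', ...
    final boolean isVowel;   // vowels get special interpolation
    final boolean isLong;    // held a bit longer than average
    final boolean isEnding;  // ends a word/sentence

    public Phoneme(String name, String groupName,
                   boolean isVowel, boolean isLong, boolean isEnding) {
        this.name = name;
        this.groupName = groupName;
        this.isVowel = isVowel;
        this.isLong = isLong;
        this.isEnding = isEnding;
    }
}
```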

SymbolLibrary -- collection of all symbols and methods for finding them based on input
List Array of all symbols
NumSymbols Number of items this contains
FindSymbol() Takes in a symbol to search for, and returns the largest matching symbol in the library.

Interpreter -- accepts user input; parses it all into symbols then starts the proper deformations
ParseInput() Grabs the user entered data from the text box and proceeds to find all largest matching symbols in the library. Uses a look-ahead approach to finding the largest symbol, so if searching for "the" it will match "t", then "th" and then return the correct Keyframe for "th" and not for "t" and "h".
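The look-ahead scan ParseInput() performs can be sketched as a greedy longest-match loop over plain strings. The "the" → "th" + "e" behavior matches the description above; the class name, the skip-unknown-characters policy, and the use of a plain string set in place of the SymbolLibrary are my own simplifications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Greedy longest-match scan: at each position, prefer the longest
// symbol in the library, so "the" yields ["th", "e"] rather than
// ["t", "h", "e"].
public class LongestMatch {
    public static List<String> parse(String input, Set<String> library) {
        int maxLen = 0;
        for (String s : library) maxLen = Math.max(maxLen, s.length());

        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            String best = null;
            // Try the longest candidate first, shrinking until a hit.
            for (int len = Math.min(maxLen, input.length() - i); len >= 1; len--) {
                String cand = input.substring(i, i + len);
                if (library.contains(cand)) { best = cand; break; }
            }
            if (best == null) { i++; continue; }  // skip unknown characters
            out.add(best);
            i += best.length();
        }
        return out;
    }
}
```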

Custom Rules:
  • If a word ends in a single vowel phoneme, then this vowel's keyframe is NOT displayed in our motion.
  • The endings of words are shortened slightly to achieve proper transitions between words.
  • If two adjacent phonemes come from the same "group", then only one of them is displayed, but in a slightly elongated manner. Having 2 of the same keyframes in a row produces a very unnatural-feeling motion.
  • Vowels are sped up when in the middle of words. Very few people articulate each and every letter in a word, and vowels are usually the first to get truncated while talking.
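As one concrete example, the same-group rule above can be sketched as a pass over the keyframe sequence. The rule itself is from the list; the Step class and the 1.5x elongation factor are illustrative stand-ins for however the real modifiers are chosen.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "same group" rule: collapse adjacent phonemes that map
// to the same keyframe group into one slightly elongated keyframe.
public class SameGroupRule {
    public static class Step {
        final String group;
        final double lengthModifier;  // 1.0 = normal, >1.0 = elongated
        public Step(String group, double lengthModifier) {
            this.group = group;
            this.lengthModifier = lengthModifier;
        }
    }

    public static List<Step> collapse(List<String> groups) {
        List<Step> out = new ArrayList<>();
        for (String g : groups) {
            if (!out.isEmpty() && out.get(out.size() - 1).group.equals(g)) {
                // Replace the duplicate with a single, longer keyframe.
                out.set(out.size() - 1, new Step(g, 1.5));
            } else {
                out.add(new Step(g, 1.0));
            }
        }
        return out;
    }
}
```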

5 & 6. Smoother Interpolation:
Creating smooth and correct-looking interpolations took a lot of work and trial and error. The basic function uses a sine function with a parameter of (PI * TimeStep), and then adjusts this to be within the [0,1] range. As mentioned above, certain phonemes need to be elongated or shortened, so each keyframe gets a floating-point modifier that adjusts how quickly time is "stepped". A normal step has a modifier of 1.0; phonemes which need to go faster have a larger time-step modifier (currently 3.0 for vowels and 2.0 for word endings -- tStep += (FPS_STEP * modifier)). Elongated phonemes have a modifier less than 1 to increase the number of frames that are interpolated.
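The stepping and easing described above can be sketched as follows. The tStep += (FPS_STEP * modifier) update and the sin(PI * TimeStep) form are from the text; the 30 FPS value for FPS_STEP, the clamping choices, and the class/method names are my own assumptions.

```java
// Sketch of the per-frame time stepping and sine easing described
// above. FPS_STEP at 30 FPS is an assumed value.
public class PhonemeInterpolator {
    static final double FPS_STEP = 1.0 / 30.0;

    // Advance the normalized time for one frame; modifiers above 1.0
    // shorten a phoneme's hold, values below 1.0 elongate it.
    public static double step(double tStep, double modifier) {
        return Math.min(1.0, tStep + FPS_STEP * modifier);
    }

    // Blend weight: sin(PI * t) rises to 1 at the midpoint of the
    // transition and falls back, clamped so it stays within [0, 1].
    public static double weight(double tStep) {
        return Math.max(0.0, Math.sin(Math.PI * tStep));
    }
}
```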