Bob Carpenter's Projects

My preferred style is to collaborate with an interdisciplinary team to engineer general, large-scale frameworks for language technology research and development. I'm keen to study specific applications in order to develop portable and scalable frameworks.

The most important thing I learned at SpeechWorks is how to write production-quality code in a team environment; I can no longer survive without CVS, automated builds and extensive unit tests. I prefer emacs and a shell to any IDE I've used, and although I'm a native Unix speaker, I can survive on Windows with emacs and cygwin.

My favorite tool is Java. As of the 1.4 JDK server mode, it's very fast. I'd be happy to never see C, C++ or Perl again, though I respect C's proximity to the bits. I use Java's Swing toolkit for native GUIs and Servlets for web-based GUIs. I love the Ant build framework. I can't even write a hello world program anymore without creating a CVS repository and the first unit test. I'm a compulsive javadoc-er. I automate everything I can, which calls for the occasional script. When necessary, I run Quantify or JProfiler to tune. I've used XML extensively, particularly through the DOM and SAX APIs. I'm comfortable in distributed and concurrent environments; I always think about threading, and can use sockets and databases.

LingPipe

2003. Open Source. LingPipe is a Java-based framework for tokenization, named-entity detection, within-document coreference, and cross-document coreference. It works on XML, HTML or plain text. It contains a fairly tight statistical decoder implementation for named entity extraction. It includes named-entity models for English newswire and English genomics, and has been trained on Hindi, Spanish, and Dutch sources. Everything runs through XML with SAX filters, and it's extensively documented. You can download it from the LingPipe Home Page. It's also the basis for Alias-i's ThreatTracker interface. We've recently been applying our techniques to bio-informatics, including the recent BioCreative evaluation of named entities in genomics.

Compression by Language Modeling

2002. Open Source. When researching language identification, I became fascinated with the prediction by partial matching approach to text compression, which builds n-gram models of characters, and with the arithmetic coding that allows you to pack the outcomes below the bit level. Because I like to learn by doing and there were no Java implementations, I wrote my own version of PPM and Arithmetic Coding implemented as a stream and with interfaces and adapters designed to be extended. It's open source and illustrates how I like to set up projects.
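
To make the link between character n-gram prediction and compression concrete, here is a minimal sketch. It is not the PPM implementation described above: it assumes a single fixed-order adaptive model with add-one smoothing over bytes (no escape to shorter contexts), and it reports only the ideal code length, the sum of -log2 p over the symbols, which an arithmetic coder can approach. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a fixed-order adaptive character model.
// This is NOT PPM (no escapes, no context blending); it only shows how
// better predictions translate into fewer bits.
final class AdaptiveCharModel {
    private static final int ALPHABET = 256; // assumption: byte-valued symbols
    private final Map<String, int[]> counts = new HashMap<>();

    // P(symbol | context) with add-one smoothing over the byte alphabet.
    double prob(String context, int symbol) {
        int[] c = counts.get(context);
        if (c == null) return 1.0 / ALPHABET;
        return (c[symbol] + 1.0) / (c[ALPHABET] + ALPHABET);
    }

    // Record one observation; the last slot of the array holds the context total.
    void update(String context, int symbol) {
        int[] c = counts.computeIfAbsent(context, k -> new int[ALPHABET + 1]);
        c[symbol]++;
        c[ALPHABET]++;
    }

    // Ideal code length in bits when each symbol is predicted from the
    // preceding `order` symbols and the model is updated after every symbol.
    static double codeLengthBits(byte[] data, int order) {
        AdaptiveCharModel model = new AdaptiveCharModel();
        StringBuilder ctx = new StringBuilder();
        double bits = 0.0;
        for (byte b : data) {
            int sym = b & 0xFF;
            String context = ctx.toString();
            bits -= Math.log(model.prob(context, sym)) / Math.log(2.0);
            model.update(context, sym);
            ctx.append((char) sym);
            if (ctx.length() > order) ctx.deleteCharAt(0);
        }
        return bits;
    }
}
```

Because the decoder can rebuild the same counts from the symbols it has already decoded, an adaptive model like this needs no separately transmitted table; repetitive input drives the per-symbol cost below 8 bits.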

Email Parsing

2002. Production. Co-designed customer-facing API and built runtime software with Paul Duchnowski. After using third-party software to parse a message according to the MIME standards, the email parser classifies lines of message bodies into types and then combines the types to parse the email body into quoted regions, signatures, lists, graphics, etc. It's another HMM decoder, this time unfolded to optimize for a trigram prior model and allow reference-counting garbage collection in the C implementation. I wrote a Java Swing GUI to automate the construction of the training corpus and to train models. I supervised a summer student, JongHo Shin, who used the first version to build a VoiceXML-based demo that accessed the API through JNI.
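
The line-labeling step can be pictured as standard HMM decoding over line types. The sketch below is a plain first-order Viterbi decoder in log space; it assumes a bigram transition model rather than the unfolded trigram prior described above, and the class name, state numbering, and array layout are hypothetical.

```java
// Hypothetical illustration: first-order Viterbi decoding of one label per line.
// The production decoder described in the text used a trigram prior; this uses bigrams.
final class LineTypeViterbi {
    // logInit[s]        : log P(first line has type s)
    // logTrans[p][s]    : log P(type s follows type p)
    // logEmit[t][s]     : log P(observed line t | type s)
    // Returns the most probable type sequence, one entry per line.
    static int[] decode(double[] logInit, double[][] logTrans, double[][] logEmit) {
        int numLines = logEmit.length;
        if (numLines == 0) return new int[0];
        int numTypes = logInit.length;
        double[][] best = new double[numLines][numTypes];
        int[][] back = new int[numLines][numTypes];

        for (int s = 0; s < numTypes; s++) {
            best[0][s] = logInit[s] + logEmit[0][s];
        }
        for (int t = 1; t < numLines; t++) {
            for (int s = 0; s < numTypes; s++) {
                double max = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < numTypes; p++) {
                    double score = best[t - 1][p] + logTrans[p][s];
                    if (score > max) { max = score; arg = p; }
                }
                best[t][s] = max + logEmit[t][s];
                back[t][s] = arg;
            }
        }
        // Pick the best final state, then follow back-pointers to recover the path.
        int last = 0;
        for (int s = 1; s < numTypes; s++) {
            if (best[numLines - 1][s] > best[numLines - 1][last]) last = s;
        }
        int[] path = new int[numLines];
        path[numLines - 1] = last;
        for (int t = numLines - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }
}
```

With sticky transitions, a single line that looks weakly like a quote between two lines of ordinary text is still labeled as text, which is the smoothing effect that makes sequence decoding preferable to labeling each line independently.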

Language Identification

2002. Production. Also worked with Paul Duchnowski. Designed the statistical models and wrote the production decoders. I extended language ID to multiple character sets, to rejection with thresholds set on held-out data, and to segment a stream of characters into languages. It uses interpolated n-gram models up to order 9, which are pruned to three sizes for different application environments from server to embedded. Model values are quantized to 16-bit integers, and all run-time arithmetic is integer-based. I designed a novel fixed-memory HMM decoder for language segmentation. It's synchronized for concurrent read and single write. I also wrote a servlet-driven demo using the Apache Web Server and Tomcat Servlet Container. The product is available for beta testing from SpeechWorks as of September 2002.
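
As a rough illustration of quantized, integer-only scoring, the sketch below trains one character n-gram table per language, stores each -log2 probability as a 16-bit fixed-point cost, and classifies by summing costs with integer arithmetic. It makes several simplifying assumptions that differ from the product described above: a single fixed order with add-one smoothing instead of interpolated models up to order 9, no pruning, no rejection threshold, and a hypothetical scale of 1/256 bit per unit. All names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of 16-bit quantized n-gram costs with integer scoring.
final class QuantizedLangId {
    private static final int SCALE = 256; // assumption: one unit = 1/256 of a bit
    private final int n;
    private final Map<String, Map<String, Short>> models = new TreeMap<>();
    private final Map<String, Short> unseenCost = new HashMap<>();

    QuantizedLangId(int n) { this.n = n; }

    // Convert a probability to a fixed-point cost in bits, clamped to a short.
    static short quantize(double p) {
        long cost = Math.round(-Math.log(p) / Math.log(2.0) * SCALE);
        return (short) Math.min(cost, (long) Short.MAX_VALUE);
    }

    // Floating point is used only at training time; the stored model is all shorts.
    void train(String lang, String text) {
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        for (int i = 0; i + n <= text.length(); i++) {
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
            total++;
        }
        // Add-one smoothing with one extra slot shared by all unseen n-grams.
        double denom = total + counts.size() + 1.0;
        Map<String, Short> table = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            table.put(e.getKey(), quantize((e.getValue() + 1.0) / denom));
        }
        models.put(lang, table);
        unseenCost.put(lang, quantize(1.0 / denom));
    }

    // Run-time scoring uses integer additions only; the lowest total cost wins.
    String classify(String text) {
        String best = null;
        int bestCost = Integer.MAX_VALUE;
        for (Map.Entry<String, Map<String, Short>> m : models.entrySet()) {
            short miss = unseenCost.get(m.getKey()).shortValue();
            int cost = 0;
            for (int i = 0; i + n <= text.length(); i++) {
                Short c = m.getValue().get(text.substring(i, i + n));
                cost += (c != null) ? c.shortValue() : miss;
            }
            if (cost < bestCost) { bestCost = cost; best = m.getKey(); }
        }
        return best;
    }
}
```

Keeping the run-time path free of floating point is what makes a model like this practical on embedded targets without an FPU; the price is a small, bounded rounding error per n-gram.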

Letter to Sound Modeling: Labeling and Tagging

2002. Open Source. A supervised learning approach to letter-to-sound rules has never been tested because there is no corpus with letters and phonemes aligned. I decided to write a GUI to support alignment, using the tag-a-little, learn-a-little interactive paradigm introduced at MITRE with the Alembic Workbench. Yet another HMM decoder, this one for generating tags for letter-to-sound rules. Uses active learning to select training instances for maximum expected uncertainty reduction. Far and away the most complex GUI I've built, with a nifty new component of my own design that works like a slider in one of those 15 squares in 16 spaces puzzles. I've aligned 5K of the CMU dictionary so far.
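
The selection step in a tag-a-little, learn-a-little loop can be sketched as follows. This uses plain uncertainty sampling (pick the unlabeled items whose predicted tag distribution has the highest entropy), which is a simpler stand-in for the expected-uncertainty-reduction criterion mentioned above; the class name and the posterior-array input format are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration: choose which unlabeled items to show the annotator next.
final class UncertaintySampler {
    // Shannon entropy in bits of a discrete distribution; zero-probability terms contribute 0.
    static double entropyBits(double[] p) {
        double h = 0.0;
        for (double x : p) {
            if (x > 0.0) h -= x * Math.log(x) / Math.log(2.0);
        }
        return h;
    }

    // posteriors[i] is the current model's distribution over tags for item i.
    // Returns the indices of the k most uncertain items, most uncertain first.
    static List<Integer> selectMostUncertain(double[][] posteriors, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < posteriors.length; i++) indices.add(i);
        indices.sort(Comparator.comparingDouble((Integer i) -> -entropyBits(posteriors[i])));
        return new ArrayList<>(indices.subList(0, Math.min(k, indices.size())));
    }
}
```

After the annotator corrects the selected items, the model is retrained and the remaining pool is re-scored, so labeling effort concentrates where the current model is least sure.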

Semantic Interpretation for Spoken Grammars

2000-2001. Production. My first project at SpeechWorks was to co-design with Roberto Pieraccini and then implement the semantic interpretation format for the grammar underlying SpeechWorks OpenSpeech Recognizer. The result is roughly a YACC with JavaScript for weighted ambiguous grammars. This involved heavy optimization of garbage collection and resource sharing across threads, as well as transformations for backward compatibility.

2002. Production. A year later, I briefly returned to write a grammar normalizer that transformed the XML format grammars into ones with abbreviations and other "non-standard" words expanded at the lexical level. Integrated the Speech Synthesis front-end processor for the job with the Xerces SAX Parser. Scheduled for release in Q4 2002 as part of SpeechWorks OpenSpeech Recognizer 2.0.

Dialog Application Framework Generation

2001. Prototype. A program generator that converts a dialog call-flow specification in XML into a ready-to-launch web archive (WAR). I designed the XML specification for generating web-based dialog applications with embedded Java code. Using the Xerces DOM parser, I generated a complete WAR that could be one-touch deployed on Apache, including the web configuration files, Ant build files, and all of the Java source code.
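
The core of such a generator, XML in and Java source out, can be sketched in a few lines. The element and attribute names here (callflow, state, id, prompt) are invented for illustration and are not the actual specification format; the sketch uses the JDK's standard JAXP DOM API and emits only a single class, leaving out the configuration and build files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration: generate Java source from a made-up call-flow schema.
final class CallFlowGenerator {
    // Returns the source text of one class with one method per <state> element.
    // Note: attribute values are copied verbatim; a real generator would escape
    // them and validate that they are legal Java identifiers and string contents.
    static String generate(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            StringBuilder src = new StringBuilder();
            src.append("public class ").append(root.getAttribute("name")).append("Dialog {\n");
            NodeList states = root.getElementsByTagName("state");
            for (int i = 0; i < states.getLength(); i++) {
                Element state = (Element) states.item(i);
                src.append("    public String ").append(state.getAttribute("id"))
                   .append("() { return \"").append(state.getAttribute("prompt"))
                   .append("\"; }\n");
            }
            src.append("}\n");
            return src.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse call-flow XML", e);
        }
    }
}
```

For example, the input `<callflow name="Pizza"><state id="greeting" prompt="Welcome"/></callflow>` yields a class named PizzaDialog with a greeting() method that returns the prompt text.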

Multi-Modal Application Framework and MapQuest Application

2001. Prototype. Worked on DARPA-sponsored research project to create and demonstrate a framework for multi-modal application design and a prototype implementation. I designed the user interface and built the framework and the application by myself, working from existing dialog and communication components. The demonstration application involved accessing MapQuest information with a combination voice/stylus interface. The tricky part was synchronizing spoken and gestural dialogs with feedback. The implementation involved the SpeechWorks 6.5 recognizer, SpeechWorks Speechify synthesizer, SpeechWorks OpenSpeech Application Framework, MIT Galaxy Communicator Framework for socket communication, all powered by a Microsoft SQL server database supplied by MapQuest.

Probabilistic Non-deterministic Tagging and Parsing

1999. Prototype. Built HMM-style syntactic category taggers that take probabilistic word graphs as input and produce probabilistic tag graphs or n-best lists as output. Built generic probabilistic CFG parser operating on probabilistic tag graphs and producing parse forests or n-best lists. Implemented tag-a-little, learn-a-little training for PCFG using outside probabilities and an intuitive GUI. Bell Labs project. Implemented the Collins Parser, training from the Penn Treebank.

Natural Language Call Routing

1997-1998. Field Tested. This joint project with Jennifer Chu-Carroll, backed up by the speech recognition research group and business communications systems unit at Bell Labs, focuses on an information retrieval approach to routing callers using natural spoken dialogues. You can read about the automatically generated dialogue component, the content-based language modeling component, or its robustness for speech. This work has been tested on live customer data and shown to perform as well as human operators in complex call centers, with higher customer acceptance than touch-tone systems. For the system, I built a manual call router GUI in Tcl/Tk that we used to label the training corpus. Field Trial Completed for customer, USAA Bank.
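
To show what treating routing as information retrieval can look like, here is a minimal sketch in which each destination is a "document" built from its training calls and a caller's request is the "query". It assumes raw term counts and cosine similarity, a common retrieval technique chosen here for illustration; the term weighting, dialogue handling, and other details of the fielded system are not reproduced, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: route a request to the destination with the most similar vocabulary.
final class VectorRouter {
    private final Map<String, Map<String, Double>> destinations = new TreeMap<>();

    // Bag-of-words term counts over lowercase alphabetic tokens.
    static Map<String, Double> termVector(String text) {
        Map<String, Double> v = new HashMap<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!tok.isEmpty()) v.merge(tok, 1.0, Double::sum);
        }
        return v;
    }

    // Cosine of the angle between two sparse vectors; 0 if either is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w;
        }
        for (double w : b.values()) normB += w * w;
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    void addDestination(String name, String trainingText) {
        destinations.put(name, termVector(trainingText));
    }

    // Returns the best-matching destination, or null when nothing overlaps
    // (the point at which a dialogue system would ask a follow-up question).
    String route(String request) {
        Map<String, Double> query = termVector(request);
        String best = null;
        double bestScore = 0.0;
        for (Map.Entry<String, Map<String, Double>> d : destinations.entrySet()) {
            double score = cosine(query, d.getValue());
            if (score > bestScore) { bestScore = score; best = d.getKey(); }
        }
        return best;
    }
}
```

A request that matches no destination returns null rather than a guess, which leaves room for a clarification prompt instead of a misroute.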

2002. Field Tested. I've recently been able to continue my work on call routing at SpeechWorks, where I've been working on extending the approach we developed at Bell Labs to hierarchical classification and integrating it with SpeechWorks user interface design methodologies. Philip Clarkson did all the hard classification work this time, and Jon Bloom designed the application interface, call flow and prompts. Co-designed a Wizard-of-Oz simulation for the application with Jon Bloom. Application Field Trial Completed for undisclosed customer.

ALE: The Attribute-Logic Engine

1990-1995. Open Source. The Attribute-Logic Engine (ALE) is a long-term project surrounding the use of attribute-value logics to model constraint-based linguistic theories. The theory behind the system led to my first book, The Logic of Typed Feature Structures. The implementation is based on an abstract machine architecture. The system includes constraint resolution, definite clause logic programming, bottom-up chart parsing and head-driven generation in any combination. It is supplied with example grammars in constraint-based morpho-phonology and a full implementation of head-driven phrase structure grammar. ALE is being used as a research and development tool by over a hundred industrial and academic sites worldwide for projects ranging from phonology to dialogue for a wide variety of languages. It was also integrated with the HTK speech recognizer to parse candidate word graphs in the context of an English-to-German Translator that synchronized with Dominic Massaro's Baldi facial synthesizer. More recently, my student and collaborator, Gerald Penn, was awarded the Beth Dissertation Prize in Logic, Language and Information for his work on feature structure compilation and transformation, most of which was tested in the ALE environment.

Type-Logical Semantics

1985++. Research. I am currently working on compositional semantics in the form of type-logical grammar. This work began in my Ph.D. thesis, Phrase Meaning and Categorial Grammar. An introduction to the theory along with a very detailed semantic and syntactic grammar of English can be found in my latest book, Type-Logical Semantics. The examples were generated automatically, and you can try out the program online with the Type-Logical Grammar Theorem Prover, which includes a Prolog CGI I wrote in 1994. Most recently, I have been working on multimodal grammars (in the modal logic sense, not the sensory mode sense), including German clause structure and word order.

Intelligent Characters for Interactive Cinema

1999-2001. Production. Worked with Toni Dove, an artist working on responsive and immersive narrative environments for both interactive installations and theatrical performance. I consulted on her most recent project to expand the varieties of interaction from the motion and gesture detection used in Toni's previous work, Artificial Changelings, into spoken language understanding, and to expand the depth and variety of interaction using agent technology. I helped out with dialog design, speech recognition integration with her Macintosh environment, and control of a facial animation accomplished through carefully edited phonemically-synchronized pixilation. An early prototype dialog, Sally Rand, the Fan Dancer, was developed in VoiceXML and launched on Tellme Studio in 2001. There is now an Interactive CD ROM available.

Games

2001. Open Source. I am very pleased with my offensive/defensive simultaneous dice rolling solution for my at-bat baseball simulation, Little Professor Baseball. I wrote Java programs to extract the statistics from spreadsheets I found online. You can recreate the 1970 World Series or the entire 2000 season.

2000. Open Source. I've also been stunned at the hundreds of thousands of calls that SpeechSmuggler has received; all those hours listening to my friend Dan Klein's voice talent. Available toll-free through Tellme Extensions.



Copyright ©1999-2003. All rights reserved.   Contact: webmaster@colloquial.com   Updated: 13 December 2003.