| | Sample Dialogs | |
This is an early sample dialog, meant to get across the feel for what you could do with this application. Many elements have changed and this section will have to be brought in line with the changes.
| Scenario | Commentary |
| A street corner in lower Manhattan. Dave steps out of a cab with a heavy laptop case in one hand and an iPaq in the other. The iPaq has a cable from it which ends in an earbud in Dave’s ear, and a microphone built to hang in front of Dave’s chest as he holds the iPaq in front of himself in order to see the screen. | “One hand free” may be a different scenario than “hands free” – can hold iPaq to see it, but can’t enter data w/other hand at all. (Don’t put down your laptop in NYC!) |
| Closeup of the iPaq screen in Dave’s hand. It shows a PDA menu of “Calendar/Address Book/Notepad/…”. We see Dave’s thumb depress a button on the side of the iPaq, holding it down continuously. The original screen shrinks to ¾ size, with a new window in the bottom quarter. A cartoon head turns in the left third of this window, with an exaggerated ear facing outward. | Visual feedback that microphone is live, other status markers |
| iPAQ: [bing!] Listening. | Audio feedback that mic is live. Earcon will always sound; audio “Listening” is added if the mic hasn’t been activated in the last 15 minutes. |
| Medium shot of Dave looking at the PDA. His thumb is still squeezed against the iPaq. | |
| Dave: New messages, please. | |
| iPAQ: You have 3 new voicemails, 1 urgent, and 17 new emails. Shall I play the urgent voicemail? | Audio output active because audio input is active. |
| Dave: Yes. | |
| Closeup of the iPaq screen in Dave’s hand, thumb still depressed. Screen shows “Voicemail #1, Received: 10:12 a.m. Today, Caller ID: 617-444-1795.” | Parallel presentation of different views on same data is more efficient; i.e., with voicemail header information on-screen, there’s no need to voice it. |
| Female VoiceOver: Dave, it’s Cathy. Acme postponed your meeting until 2:30! Sorry you rushed to get there early. Well, try and enjoy your free hour in New York! | |
| Long shot of a woman in a business suit walking up to Dave. | |
| Joanne: Dave Simmons? | |
| Dave: [looking at PDA, chagrined] Delete it. [Releases thumb from side of iPaq, drops hand down halfway, looks up at Joanne. Blank stare, then he recognizes her] Joanne! I didn’t know Acme was inviting us both to the meeting! | Situations benefiting from multiple modes are likely to be situations in which multiple things are happening. So apps will need help understanding which audio is meant for them, and users will have trouble predicting the mode that will be optimal in the next instant; hence, push-to-talk. |
| Joanne: Well, our companies are both key players. Shall we go in? | |
| Dave: Sure, except I just got a message that the meeting won’t be starting till 2:30 now. I was just thinking that I might grab a cup of coffee and a donut somewhere. | |
| Joanne: Did you know that Krispy Kreme Donuts has opened up a bunch of chains in New York? | |
| Dave: You’re kidding! I love Krispy Kreme! Where? | |
| Joanne: I’m not sure, but maybe I’ll join you. I actually have some preparation to do, though, so I can’t be going all over town…. | |
| Dave: Well, hold on a second. [raises iPaq back up & depresses thumb] | |
| iPAQ: [bing!] | Frequent reactivation of mic signaled only by earcon: speaking “Listening!” too often would get tedious. |
| Dave: Switch to “MapQuest”. | Dave was in his messaging system, but can shortcut out to any other app. Speech shortcuts to numerous apps are much easier to provide in bulk than screen real estate would allow visually. |
| Joanne: What’s that? | |
| Dave: [releases thumb] It’s my new gadget. Hold on a second. [depresses thumb] | Flipping modes w/o confusing iPaq. |
| iPAQ: [bing!] | Gee, after a while, even the earcon might get tedious…. |
| Closeup of iPaq screen in Dave’s hand with thumb depressed. Again, ¾ of the screen is devoted to MapQuest; the bottom portion is feedback on the interaction. The screen shows 3 radio buttons labeled “Maps”, “Directions”, and “Biz Locator”. | |
| iPAQ: [harp strings!] MapQuest! Would you like Maps, Directions, or the Business Locator? | Audio output was generated in response to nothing but the user activating push-to-talk (i.e., no dialog state change or other outside event). Audio matches visuals because (?) we’re at a command point, not an info-provision point (as when hearing voicemail), so (?) there’s no info bottleneck via either mode. The “harp strings!” are the earcon indicating a change of application. |
| Dave: Business Locator. | |
| Closeup of iPaq screen in Dave’s hand with thumb depressed. The screen shows 4 entry fields with an “OR” between the first two: the first is labeled “Business Name”, and the second, “Business Category”. The last two fields below are labeled “City” and “State”. | |
| iPAQ: Business locator. Please describe the business you are trying to locate by filling in the on-screen form. | Note that audio helps relieve the visual presentation’s real-estate crunch. |
| Dave: Business name is ‘Krispy Kreme’; City is ‘New York’; state is ‘New York’. | “Speech Graffiti” |
| As Dave speaks, screen shows “???” in “Business Name” field, then ‘New York’ appears in City field, then in ‘State’ field. Finally, “Business Name” field resolves to “Christy’s”, but there is a pulldown arrow next to it; the field is flashing. | Grammar should allow developer to specify the order in which fields are resolved; in this case, the graffiti-labeled 2nd and 3rd fields are resolved first, in order to constrain the first – but speaker needn’t change order. Using visuals to confirm multiple keys in the input may be less confusing than using audio. |
| iPAQ: Say ‘yes’ to confirm business name of ‘Christy’s’, or select an alternative. | Slot-based confidence scoring; fallback out of voice input when voice input has already been flagged low-confidence. |
| Dave puts down laptop on sidewalk, pulls stylus out of iPaq. In doing so, his thumb wavers on the push-to-talk button and the bottom-portion “feedback” area of the screen flickers in and out. With thumb depressed, he taps on down-arrow to reveal scrolling list of 20 entries; the tap results in an immediate graying out of feedback area, indicating mic is dead. He selects “Krispy Kreme” with stylus. Long shot of a nervous Dave hurriedly putting back his stylus and picking up his laptop case before someone can steal it. Closeup of iPaq screen (thumb depressed) shows “downloading maps”, with bottom portion indicating audio still isn’t being listened for (thumb is depressed, but ‘downloading’ nixes input). Screen changes to show a map of downtown Manhattan with a green circled dot for where Dave is, and 4 red circled dots, blinking. (This is downtown, where the streets are more complicated than a simple grid.) There are 2 radio buttons: one is labeled “Directions” and the other is a pulldown list with the words “Other Options” displayed. Since his thumb is still depressed, the feedback area is active again now that a new turn has begun. | Should there be “persistence” of push-to-talk? Should iPaq immediately silence itself if push-to-talk button is released during prompt playout? |
| iPAQ: There are 4 Krispy Kremes in New York, New York. The nearest one is 0.3 miles away. You can say ‘Directions’, or, you can also say ‘hours of operation’, or – | Some redundant info, some semi-redundant (location expressed two different ways), some new info (longer list of “shortcut” options than what real estate can show). If audio channel wasn’t active, the real estate for the feedback icons could be used to present more info visually. |
| Dave: [releases thumb, speaks to Joanne] It’s about 6 blocks from here. | Audio output cuts off when audio input is deactivated (in this mode). |
| Joanne: Okay, I’ll have to pass then. See you in the meeting? | |
| Dave: Sure. Bye! | |
| Joanne: Bye! [exits into building] | |
| Dave looks around. The sidewalk is teeming with people. A bicycle messenger jumps up onto the sidewalk right next to Dave to get around a parked cab, almost knocking Dave over. Closeup of iPaq as Dave depresses thumb button again. | |
| iPAQ: [bing!] | Are the earcons annoying yet? :-( |
| Dave: Switch to “voice-only mode.” | Explicit control of mode-switching. |
| iPAQ: “Voice-only mode” active. Your screen is still active, but I’ll assume you can’t look at it, until you touch the screen anywhere, or until you say ‘switch to “mixed mode”’. You can say ‘Directions’, or, you can also say ‘hours of operation’, or – | Screen is still active (map is displayed), but in “voice only” mode, no reference to the screen is made. |
| Dave: [releases thumb, slips iPaq into breast pocket] Directions. | |
| iPAQ: Directions to which location? | |
| Dave: Nearest Krispy Kreme. | ??? Some visuals have no good audio equivalent. This input might be realistic, might not. What would the grammar be if there were three alternative Gucci shirts being displayed in a shopping app? How is such a grammar generated? In true voice-only mode, we should never require visual input (e.g., can’t ask automobile driver to choose from items on a screen). But, e.g., how to voice-choose between MapQuest’s 3 different hypotheses of what you mean when you say “12 Main Street” (as sometimes happens)? Do we just tell the user “you have to look at the screen now”? |
| iPAQ: Go west on North Moore Street 0.05 miles. Say ‘repeat’ to hear that again, ‘where am I?’ to hear how you’re progressing, or, when you’re ready, say ‘continue’ for next direction. | |
| Dave looks around, gets oriented, and starts to walk through sidewalk crowd. | |
| Dave: [clears throat as he walks] | |
| iPAQ: [earcon: off-key “bonk!”] Excuse me? | In voice-only mode, assumption is that user is doing something else with eyes & hands, but mic is always active. This is different from phone conversations where it’s OK to assume complete attention to the conversation. Need a low-cost (i.e., short and non-intrusive) way to communicate non-recognition of possible input. |
| Dave: Never mind. | “Nothing”, no response, etc., would’ve also been acceptable to the iPaq. |
| Dave continues to walk. He gets to end of block. | |
| Dave: Continue. | |
| iPAQ: Continuing. Turn left onto Varick Street and proceed south for 0.1 miles. You can say ‘repeat’, ‘where am I?’, or ‘continue’. | |
| Dave continues to walk. | |
| Dave: Switch to “voicepad”. | Will multimodal devices be multitasking? How do we switch back and forth between applications? Safe to assume no context when we switch? |
| iPAQ: [harp strings!] Voicepad. Recording. | Unique earcon to represent that we’re switching apps. This app has only one mode. Note that switching apps does not switch modes; modes are cross-application and based on situation, not the app. |
| Dave: Note to self. When you get back to the office, look up Joanne’s last name from today’s meeting. And write it down this time! [pauses] End recording. | |
| iPAQ: Saving Voicepad Note Number Seven. [harp strings!] MapQuest! You can say ‘repeat’, ‘where am I?’, or ‘continue’. | Only context retained was last instruction reminder. |
| Dave continues to walk. He’s gone 3 blocks, and is now at a confusing 5-way intersection. There are no street signs apparent. | No timeouts? The app prompted, but Dave hasn’t said anything in 90 seconds…. |
| Dave: Continue. | |
| iPAQ: Continuing. Bear right onto Leonard Street. | |
| Dave looks around, confused. | |
| Dave: Which one is Leonard Street? | |
| iPAQ: [bonk!] Excuse me? | Trade-off for non-obtrusive speech confirmation means less helpful (and shorter) reprompts. |
| Dave: I can’t tell which street is which. | |
| iPAQ: [bonk!] | High-repeat feedback fades back to just earcon, which could conceivably play dozens of times if user is having a side conversation while in voice-only mode!! So helpful reprompts never get invoked? And we don’t have timeouts either (see above). Don’t know the right solution to balance between keeping the conversation on track, and not intruding too much into other activities. |
| Dave: [realizing he’s out of vocabulary (OOV)] Where am I? | How else could he recover? If he had to fall back to the screen while driving, there could be an accident. |
| iPAQ: You are currently more than half-way from your starting point to your destination. Your next turn – | |
| Dave pulls iPaq out of breast pocket. Closeup shows “listening” icon on bottom-portion (always listening in voice-only mode) and same map of the city with 4 red circled dots as before; the green dot indicating Dave’s position has moved. Dave puts down his laptop case beside him (far fewer people here), pulls out stylus, and taps at the map. The thumb button is not depressed, so this kills the audio output. Bottom-portion icons change to show the transition from always-listening voice-only mode, back to not-listening mixed mode. | This is also one way you could cut off a runaway talking app when something else takes your attention. By tapping the screen, Dave is indicating he wants mixed-mode, which only provides output when button is depressed. (But what about the persistence we asked about in case his thumb wavers?) |
| Dave selects “Other Options” pull-down and chooses “Perspective View”. The map switches to a 3-D rendering from his perspective of the intersection he’s at; one street is labeled “Leonard” and there’s a red arrow pointing down it. Dave nods to himself, and depresses the thumb button. | |
| iPAQ: [bing!] | |
| Dave: Switch to “voice-only” mode. | |
| iPAQ: “Voice-only mode” active. Your screen is still active, but I’ll assume you can’t look at it, until you touch the screen anywhere, or until you say ‘switch to “mixed mode”’. You can say ‘Repeat’, ‘where am I’, or ‘continue’. | Describing modes to user is very cumbersome, and probably confusing! |
| Dave smiles, slips iPaq back into breast pocket, and keeps walking. In the distance, we see a “Hot Doughnuts Now!” sign begin to flash. Fade out. | |
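
The mode rules scattered through the commentary can be pulled together in a small sketch, which may help when this section is brought in line with later changes. It is only an illustration under stated assumptions, not a proposed implementation. The class `Device`, its method names, the `APP_VOCAB` table, and the reply strings are all hypothetical. The rules it encodes come from the scenario:

- In mixed mode the mic is live only while the button is held; in voice-only mode it is always live.
- The earcon always plays on a press, and “Listening” is added after 15 idle minutes.
- Switching apps leaves the mode unchanged.
- Touching the screen returns the device to mixed mode.
- Repeated non-recognition fades from “[bonk!] Excuse me?” to the earcon alone.

The open questions (push-to-talk persistence, timeouts, the mic going dead after a tap or during a download) are deliberately not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

LISTENING_GAP_S = 15 * 60  # "Listening" is re-spoken after 15 idle minutes
SWITCH = "switch to "

# Hypothetical per-app vocabularies, loosely based on the prompts in the dialog.
APP_VOCAB = {
    "mapquest": {"directions", "repeat", "where am i", "continue"},
    "voicepad": {"end recording"},
}


class Mode(Enum):
    MIXED = "mixed"            # mic is live only while the button is held
    VOICE_ONLY = "voice-only"  # mic is always live


@dataclass
class Device:
    mode: Mode = Mode.MIXED
    app: str = "mapquest"
    button_down: bool = False
    last_press: Optional[float] = None
    misses: int = 0  # consecutive unrecognized utterances

    def mic_live(self) -> bool:
        return self.mode is Mode.VOICE_ONLY or self.button_down

    def press(self, now: float) -> List[str]:
        """Push-to-talk pressed at time `now` (seconds); returns audio cues."""
        cues = ["[bing!]"]
        if self.last_press is None or now - self.last_press >= LISTENING_GAP_S:
            cues.append("Listening.")
        self.button_down, self.last_press = True, now
        return cues

    def release(self) -> None:
        self.button_down = False

    def tap_screen(self) -> None:
        """Touching the screen drops voice-only mode back to mixed mode."""
        self.mode = Mode.MIXED

    def hear(self, utterance: str) -> List[str]:
        """Speech reaching the device; returns its audio response."""
        if not self.mic_live():
            return []  # side conversation: mic is dead, nothing happens
        text = utterance.lower().strip(" .!?")
        reply: Optional[List[str]] = None
        if text == SWITCH + "voice-only mode":
            self.mode, reply = Mode.VOICE_ONLY, ["voice-only mode active"]
        elif text == SWITCH + "mixed mode":
            self.mode, reply = Mode.MIXED, ["mixed mode active"]
        elif text.startswith(SWITCH) and text[len(SWITCH):] in APP_VOCAB:
            self.app = text[len(SWITCH):]  # the mode is left unchanged
            reply = ["[harp strings!]", self.app]
        elif text in APP_VOCAB[self.app]:
            reply = ["ok: " + text]  # placeholder for the app's real answer
        if reply is None:
            self.misses += 1
            # First miss gets a spoken reprompt; repeats fade to the earcon.
            return ["[bonk!]", "Excuse me?"] if self.misses == 1 else ["[bonk!]"]
        self.misses = 0
        return reply
```

Replaying Dave's confusion at the five-way intersection against this sketch gives the same fade as the dialog: the first out-of-vocabulary question gets the earcon plus “Excuse me?”, the second gets the earcon alone, and a recognized “Where am I?” resets the counter. The sketch also makes the commentary's worry concrete: with no timeout and no escalation, a long side conversation in voice-only mode would yield nothing but a stream of earcons.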