Dont Mode Me In

Oh, give me keys, lots of keys on the keyboard in my cube,
Don't mode me in.
Let me work without fear that my strokes are misconstrued,
Don't mode me in.
Let's just keep things simple when I'm bangin' on my keys,
Pasting and inserting and replacing with ease,
Make me use MS Word but I ask you please,
Don't mode me in.

Apologies to Cole Porter.

More to the point: http://www.byte.com/art/9608/sec4/art3.htm

Exploring DontModeMeIn: If it were a high-priority OS Design Criterion

Applications are modes that capture all user input and prevent it from going to the OS or other applications; they 'mode me in' to the application interface. A philosophy towards DontModeMeIn implies a philosophy supporting NoApplication.

FileSystems imply modes: a FileSystem implies that you're working with one 'file' at a time, though some Applications get around this. DontModeMeIn would reject a FileSystem and simply make all files available for interaction at all times, e.g. via a ZoomableInterface or ObjectBrowser. These would still need to organize and store the objects, of course, but they'd do so in a manner that allows natural interactions between them.

'Running' vs. 'non-Running' programs is a mode. DontModeMeIn would suggest that programs run 'continuously', which in turn would suggest that programs (which shouldn't be 'applications') adopt a different approach towards normal operation (e.g. EventDrivenProgramming rather than ProceduralProgramming in order to drop the permanent (and oft expensive) polling loops). Processes become 'objects' that are inactive between processing inputs.
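As a rough illustration of processes becoming 'objects' that are inactive between inputs, here is a minimal Python sketch. The names ProcessObject, Environment, on, deliver, post and drain are all hypothetical, invented for this example; nothing here describes an existing system. The point is only that no object runs a polling loop of its own: code executes when an event arrives and at no other time.

```python
from collections import deque


class ProcessObject:
    """Hypothetical 'process as object': holds state and runs code
    only when an event is delivered to it."""

    def __init__(self, name):
        self.name = name
        self.handlers = {}  # event kind -> callable
        self.log = []       # results of handled events, for illustration

    def on(self, kind, handler):
        """Register interest in one kind of event."""
        self.handlers[kind] = handler

    def deliver(self, kind, payload):
        """Run the matching handler, if any; otherwise stay idle."""
        handler = self.handlers.get(kind)
        if handler is None:
            return False
        self.log.append(handler(payload))
        return True


class Environment:
    """Hypothetical environment: queues events and wakes only the
    objects they are addressed to."""

    def __init__(self):
        self.queue = deque()

    def post(self, target, kind, payload):
        self.queue.append((target, kind, payload))

    def drain(self):
        """Deliver every queued event; return how many were handled."""
        handled = 0
        while self.queue:
            target, kind, payload = self.queue.popleft()
            if target.deliver(kind, payload):
                handled += 1
        return handled


counter = ProcessObject("word-counter")
counter.on("text", lambda s: len(s.split()))

env = Environment()
env.post(counter, "text", "don't mode me in")
env.post(counter, "resize", (80, 24))  # no handler registered: ignored
handled = env.drain()
```

Between calls to drain, the counter object consumes nothing; it is neither 'running' nor 'not running' in the usual sense, just available.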

If DontModeMeIn were fully embraced as a design criterion, it would lead to an entirely different sort of OperatingSystem than any available for use today. (Perhaps it would lead to some NewOsFeatures.)

There exists a (surmountable) problem with DontModeMeIn. Suppose the user provides input (via mouse, keyboard, camera (video and gestures and eyes and facial microexpressions), microphone (sound and speech), pedals, joystick, sketchpad, wiimote, etc.), there are no modes at all, and there are a lot of process-objects that could accept input in a big, unified zoomable interface/object browser (but no 'applications'). How is the computer to even begin to comprehend what that input is supposed to mean? Where is it to go? How ought it be transformed to be acceptable at its destination?

The best answer I can think of is context. If you DontModeMeIn, you must shoulder the tasks that 'modes' would have handled, and do so by inferring the semantic context. In a ZoomableInterface, part of the context is well implied by the perception and apparent focus of the user (based on where the zoom is, focus of eyes, and focus of other pointer-inputs). However, there is also temporal context based in the memory of the user: the objects recently browsed, the commands recently given. (Further, one might also apply a predictive-learning success/failure context associated with the user, allowing the program to learn the user's habits and intuit intent based on success and failure in predictions, what the user does and undoes, etc.) Of course, context-based inference is also imperfect and subject to error, but it doesn't mode me in. Fortunately, if extra confirmation is needed due to lack of confidence in a decision (or the predicted cost/permanence of performing it; undoable actions never need confirmation), DontModeMeIn allows for a request for confirmation input, so long as one isn't "moded in" just to provide confirmation.

[For inferring facts about the user's memory and intention see also ExplicitUserModel]
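To make the context idea a little more concrete, here is a small Python sketch of routing an input to the most plausible object. Everything in it is an assumption made for illustration: the Candidate fields, the weights, the one-minute recency scale and the confidence margin are invented numbers, not anything measured. It combines apparent focus with recency of use, and asks for confirmation only when the choice is ambiguous and the resulting action could not be undone.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    distance_from_focus: float  # 0.0 = directly under the user's focus
    seconds_since_used: float   # temporal context
    undoable: bool              # could the resulting action be undone?


def score(c, focus_weight=0.7, recency_weight=0.3):
    """Blend focus and recency into one plausibility score (weights assumed)."""
    focus = 1.0 / (1.0 + c.distance_from_focus)
    recency = 1.0 / (1.0 + c.seconds_since_used / 60.0)
    return focus_weight * focus + recency_weight * recency


def route(candidates, confidence_margin=0.15):
    """Return (best candidate, whether to ask the user to confirm)."""
    ranked = sorted(candidates, key=score, reverse=True)
    best = ranked[0]
    runner_up = score(ranked[1]) if len(ranked) > 1 else 0.0
    ambiguous = (score(best) - runner_up) < confidence_margin
    # Undoable actions never need confirmation: the user can simply undo.
    return best, (ambiguous and not best.undoable)


note = Candidate("note", 0.0, 5.0, undoable=True)
shredder = Candidate("shredder", 0.2, 10.0, undoable=False)
target, ask = route([note, shredder])
```

In this example the two candidates score closely, but the winner is undoable, so no confirmation is requested. Swap the focus and recency values of the two objects and the shredder wins by the same narrow margin, at which point the sketch would ask first.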

Transforming user inputs into something computationally and semantically acceptable to an object, and inferring the intended semantics of input... now that's a VERY tricky problem (certainly more difficult than inferring from context which objects the input was to be applied to). It would work (SecondLife style) to simply make a single set of inputs that all objects can choose to accept. However, this is somewhat limiting: most objects won't handle all inputs, so one becomes 'moded in' by the objects; further, it limits the utility of new input devices. So, for the input interpretation layer, a possible approach would be to provide semantic classifications for the process-objects (as to which forms of input they're expecting to receive) and provide the environment with a mapping layer between common input-devices and the objects (e.g. "left-click on object means ABC for objects of semantic type XYZ" or "objects of type DEF should attempt to interpret speech from ambient user-audio"). The environment (zoomable interface or object browser) would also (necessarily) ALWAYS be open to commands if one is to not be "moded in" (e.g. "Ctrl+Z means UNDO no matter what the context or who 'did' it" or "when browsing pron, the environment is to identify my boss and promptly eject me to a random work location"). This mapping could have some flexibility (allowing for 'learning' by the OS).
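The mapping layer described above could be sketched as a pair of lookup tables. This is only a toy under stated assumptions: the device names, gesture names, semantic types ("Document", "Button", "Dictation") and action names are hypothetical placeholders standing in for the ABC/XYZ/DEF of the text. Environment-level commands are checked first so that they stay available no matter which object has the user's attention.

```python
# Commands the environment itself always accepts, whatever the context.
GLOBAL_COMMANDS = {
    ("keyboard", "Ctrl+Z"): "UNDO",
}

# (device, gesture, semantic type of the target object) -> meaning.
# All entries are hypothetical examples.
BINDINGS = {
    ("mouse", "left-click", "Document"): "place-cursor",
    ("mouse", "left-click", "Button"): "activate",
    ("microphone", "speech", "Dictation"): "transcribe",
}


def interpret(device, gesture, target_type):
    """Translate a raw input into (recipient, meaning), or None."""
    # Environment-level commands win regardless of the target object.
    command = GLOBAL_COMMANDS.get((device, gesture))
    if command is not None:
        return ("environment", command)
    # Otherwise the meaning depends on the target's semantic type.
    action = BINDINGS.get((device, gesture, target_type))
    if action is not None:
        return (target_type, action)
    return None  # no mapping known: the input is left uninterpreted


undo = interpret("keyboard", "Ctrl+Z", "Document")
click = interpret("mouse", "left-click", "Button")
unknown = interpret("pedal", "press", "Button")
```

Because the tables are plain data, the 'learning' mentioned above would amount to adding or adjusting entries at run time, and a new input device would need new entries rather than changes to every object.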

I expect that if the context inference were 'good enough', I wouldn't really need QuasiModes. Modifier keys would apply, but 'modified' operations would apply just like everything else.

Despite all the above, I expect it'd be better to have at least a couple major user-input modes for the 'system as a whole', largely to provide for better debugging and better tech-support. E.g. hit Ctrl+Alt+F1 and get a semi-transparent command shell across the 'main' monitor that completely hooks the keyboard you used to press Ctrl+Alt+F1... offering access to a debugger and access to the configuration information for the object-browser (and ability to create a fresh object browser or reconfigure the current one). It'd be rather hellish to be called for support and walk in on someone's customized-til-it-broke environment and be unable to provide meaningful input or fixes. This is akin to the Ctrl+Alt+Delete for Windows. It's a good thing.

All this is a pipe dream at the moment. :-) However, it's all possible (physically, computationally, theoretically... just not practically). See the discussion in MindControlWithDerrenBrown.

I think there are actually two separate concerns here which can be teased apart: a modeless data/computation model, and a modeless UserInterface. The first does not necessarily imply the second, and possibly shouldn't.

I'm not certain how they could be divided. A modeless data model in the context of a UI would result in, at least, one modeless aspect for the UI: the query results for display in the ZoomableUserInterface could be from anywhere, not restricted to a single computer. And if one is to have a modeless UI, one would benefit greatly from having a modeless data model underlying every operation as opposed to fighting to create the illusion thereof with every distinct operation (e.g. when making objects interact).

I envisage an interface perhaps similar to today's GUI, with multiple windows (or tabs or pages; the exact physical metaphor not being entirely important, as the GUI widgets should be interconvertible). Each 'window' would be a view of the underlying DataModel or ObjectSystem from a particular perspective.

I.e. displays, each of which constitutes an automatic layout of objects in an ObjectBrowser in response to a query. Check out KillerUserInterface.
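A tiny Python sketch of 'each window is a view of the underlying model in response to a query' might look like the following. ObjectStore, View, and the attribute names kind and tag are hypothetical, chosen only for the example. A view stores a query rather than a copy of the data, so two windows over the same store stay consistent without any notion of which 'application' owns an object.

```python
class ObjectStore:
    """Hypothetical shared object system: a flat collection of objects."""

    def __init__(self):
        self.objects = []

    def add(self, **attrs):
        obj = dict(attrs)
        self.objects.append(obj)
        return obj


class View:
    """A 'window': a live query over the store, not a copy of its data."""

    def __init__(self, store, predicate):
        self.store = store
        self.predicate = predicate

    def contents(self):
        # Re-evaluated on every call, so the view never goes stale.
        return [o for o in self.store.objects if self.predicate(o)]


store = ObjectStore()
store.add(kind="note", tag="work")
store.add(kind="image", tag="home")

work_view = View(store, lambda o: o["tag"] == "work")
notes_view = View(store, lambda o: o["kind"] == "note")

# An object added later appears in every view whose query it matches.
store.add(kind="note", tag="home")
work_names = [o["kind"] for o in work_view.contents()]
note_count = len(notes_view.contents())
```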

That way, you can separate the modeful GUI aspects of 'which window is the keyboard currently bound to' from the modeless aspects of 'each window is just another instance of the same NoApplication'.

Another implication of this is that it ought to be possible to distribute one's 'desktop' or 'browser session' equivalent across a set of multiple workstations, in a similar way to how XwindowProtocol works. Evolve this a bit and one might get a whole networked collaboration environment, somewhat like OnLineSystem. Or LotusNotes if we're unlucky.