Conference Abstracts - Alex Hill

KHARMA: An Open KML/HTML Architecture for Mobile Augmented Reality Applications

Alex Hill
Augmented Environments Laboratory, Georgia Institute of Technology


The recent advent of GPS and orientation sensors on mobile devices has led to the development of numerous mobile augmented reality (AR) applications and broader public awareness and use of these applications. To date, the most flexible AR content authoring platforms, such as the Layar and Wikitude browsers, rely on proprietary protocols, closed-source clients and data formats that severely limit client-side functionality. In order to see broad adoption, authoring tools for AR need to use existing standards and protocols when possible [1]. We have developed a combined KML/HTML Augmented Reality Mobile Architecture (KHARMA) that leverages existing protocols and content delivery pipelines.

At the heart of this architecture is an extension to KML [2] called KARML that allows HTML content to be authored, positioned in the surroundings and manipulated dynamically using the same JavaScript, dynamic CSS and AJAX techniques used to create Web 2.0 content. The architecture seeks to create lightweight and easily authored AR content by decoupling resources such as representations of physical infrastructure and sources of tracking data from both the authoring pipeline and runtime content delivery. This approach results in an architecture with three main components: channel servers delivering multiple individual channels of AR content, tracking servers providing content related to location, and infrastructure servers delivering information about the physical environment.

We have implemented the KARML extensions on a client for the iPhone called Argon and released it in the iTunes store [4]. A number of demonstration examples including Twitter and Google Local search illustrate the many different AR applications that can be rapidly built using the KHARMA platform.


The basic properties of most AR applications can be summarized by the familiar refrain, "What?, Where? and How?". Modern web standards for content delivery and client-side interactivity provide a means to author content that addresses What? and How? We felt that the significant penetration of KML into everyday applications such as Google Maps (GM), Yahoo Maps and various web services made it a strong candidate for answering the question of Where? Although many consider AR to be a fundamentally 3D medium, we feel that 2D HTML combined with recent CSS3 standards provides a rich palette for authors. The KML specification already has limited support for HTML content through feature point descriptions displayed in callouts called balloons in the Google Earth (GE) application.
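As an illustration of the existing KML support for HTML mentioned above, the following minimal sketch builds a standard KML 2.2 Placemark whose description carries an HTML fragment. It uses only standard KML elements; the KARML extension elements are not shown here, and the function name `html_placemark` and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of standard KML 2.2 (no KARML extensions are used in this sketch).
KML_NS = "http://www.opengis.net/kml/2.2"


def html_placemark(name, html, lon, lat, alt=0.0):
    """Return a KML document string with one Placemark whose description holds HTML.

    Hypothetical helper for illustration only.
    """
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    # ElementTree entity-escapes the markup; KML accepts escaped HTML or CDATA.
    ET.SubElement(placemark, f"{{{KML_NS}}}description").text = html
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML coordinate order is longitude,latitude,altitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},{alt}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    print(html_placemark("Cafe", "<b>Open</b> until 9pm", -84.3963, 33.7756))
```

A KML consumer that renders descriptions, such as a balloon in GE, would display the HTML fragment at the given point.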

In developing the KARML extensions, we attempted to re-conceive the language in the context of AR browsers and avoid introducing elements whose function can already be accommodated by existing elements. One result was the repurposing of the KMLCamera node, which normally indicates a location to which the GE camera should "fly to". Because the user directly controls the viewpoint in an AR context, we use the KMLCamera node to indicate the presence of surveyed locations called GeoSpots. The KARML extension also adds a modifier to KML style elements indicating the HTML content stored in a feature description should be rendered without decoration. This allows the authored HTML content to be seamlessly integrated into the background scene. One drawback to using KML in the service of AR authoring is the lack of a notion of relative positioning; all points and even the vertices of geometry elements in KML are defined in terms of absolute longitude, latitude and altitude. KARML adds a mode that establishes the associated content in either a fixed geospatial coordinate system or relative to another KML feature. We have leveraged these relative frames of reference to facilitate connecting HTML content to typical AR fiducial markers [3].
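The idea of positioning content relative to another feature can be sketched numerically. The example below resolves an east/north/up offset in meters against a parent feature's absolute longitude, latitude and altitude. It is a rough illustration under stated assumptions, not the KARML implementation: it uses a spherical Earth and a small-offset approximation, and the name `resolve_relative` and the offset convention are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation (assumption)


def resolve_relative(parent, east_m, north_m, up_m=0.0):
    """Convert a local east/north/up offset (meters) into absolute coordinates.

    parent is a (longitude, latitude, altitude) tuple in degrees and meters,
    matching KML coordinate order. Valid only for small offsets away from the poles.
    """
    lon, lat, alt = parent
    # One radian of latitude spans EARTH_RADIUS_M meters along a meridian.
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    # A circle of latitude shrinks by cos(latitude).
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return (lon + dlon, lat + dlat, alt + up_m)


if __name__ == "__main__":
    marker = (-84.3963, 33.7756, 300.0)  # hypothetical parent feature
    print(resolve_relative(marker, east_m=2.0, north_m=0.0, up_m=1.5))
```

When the parent is a tracked fiducial marker rather than a surveyed point, the same kind of offset would be applied in the marker's frame instead of a geospatial one.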

The question of How? also extends to the client-side tools the AR author has for manipulating content. In the current implementation of GE, each content balloon has a separate namespace un-addressable by other balloons, even those created by the same source. Removing this restriction significantly increases the interactivity that can be achieved between different feature points in the same channel. Dynamic interaction between different channels can be accomplished using the same authentication, sessions and AJAX tools that let content in one desktop browser window affect content in another.


Civilian-grade GPS is only accurate to within tens of meters and can easily lead an AR browser to render a nearby business in front of the user when it is actually behind them. An alternative source of tracking information involves providing nearby surveyed locations to users along with descriptive information about how to find them. Users can indicate their presence at these locations, called GeoSpots, and effectively increase the positional accuracy of the browser. We have deployed a GeoSpot tracking server and database with associated HTTP/SOAP protocols to deliver surveyed locations based on geospatial query.
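A geospatial query of this kind can be sketched as a radius search around the reported GPS fix. The example below is only an assumption about how such a lookup might work: the record layout (`name`, `lat`, `lon`), the function names and the sample data are hypothetical, and the deployed tracking server's actual interface is not described here.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def nearby_geospots(spots, lat, lon, max_distance_m):
    """Return (distance_m, spot) pairs within max_distance_m, nearest first."""
    hits = []
    for spot in spots:
        d = haversine_m(lat, lon, spot["lat"], spot["lon"])
        if d <= max_distance_m:
            hits.append((d, spot))
    hits.sort(key=lambda pair: pair[0])
    return hits


if __name__ == "__main__":
    # Hypothetical surveyed locations.
    spots = [
        {"name": "fountain", "lat": 33.7766, "lon": -84.3963},
        {"name": "library steps", "lat": 33.7756, "lon": -84.3963},
    ]
    for dist, spot in nearby_geospots(spots, 33.7756, -84.3963, 200.0):
        print(f"{spot['name']}: {dist:.0f} m")
```

Because the GPS fix itself may be off by tens of meters, a client would likely use a search radius comfortably larger than that error and let the user pick the GeoSpot they are actually standing on.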

Since the subjects of most current augmentations are static, we also provide an optional panoramic image at surveyed locations that can be used to replace the live video. By using the phone orientation sensor to display the appropriate subset of the panorama, orientation accuracy can be effectively increased and augmentations tightly registered with the background. Tracking server information can be used offline to give AR authors information about the context in which their augmentations will be viewed. Knowing the location of nearby trackable surfaces or surveyed GeoSpots can influence the nature of the AR content the author creates. During runtime, changing tracking accuracy can also affect the types of augmentations the author presents. When only GPS is available, using labels that can be expanded to full screen may be appropriate. When a panoramic background is in use, the author may try to register content more tightly with the scene.
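Selecting the subset of a panorama from the orientation sensor can be sketched as follows. This is a minimal illustration that assumes a full 360-degree panorama whose pixel columns map linearly to compass heading, with column 0 at heading 0; the function name and parameters are hypothetical, and vertical pitch is ignored.

```python
def panorama_columns(heading_deg, fov_deg, pano_width_px):
    """Return half-open (start, end) column ranges covering the current view.

    Assumes 0 < fov_deg < 360. Two ranges are returned when the view
    wraps around the seam of the panorama.
    """
    px_per_deg = pano_width_px / 360.0
    view_px = round(fov_deg * px_per_deg)
    # Left edge of the view, normalized into [0, 360) degrees.
    left_deg = (heading_deg - fov_deg / 2.0) % 360.0
    left = round(left_deg * px_per_deg) % pano_width_px
    right = left + view_px
    if right <= pano_width_px:
        return [(left, right)]
    # The view crosses the seam: split it into a tail and a head segment.
    return [(left, pano_width_px), (0, right - pano_width_px)]


if __name__ == "__main__":
    print(panorama_columns(heading_deg=90.0, fov_deg=60.0, pano_width_px=3600))
    print(panorama_columns(heading_deg=0.0, fov_deg=60.0, pano_width_px=3600))
```

Since the panorama was captured at a known surveyed location, augmentations can be placed against its pixel columns once, and they stay registered however the user turns.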



  1. B. MacIntyre, M. Gandy, J. Bolter, S. Dow, B. Hannigan, DART: The Designer's Augmented Reality Toolkit, Proceedings of 2nd IEEE and ACM International Symposium on Mixed and Augmented Reality, October 07-10, 2003
  3. Kato, H., Billinghurst, M., Marker Tracking and HMD Calibration for a video-based Augmented Reality Conferencing System, Proc. of the 2nd Int. Workshop on Augmented Reality, San Francisco, 1999
  4. and