WO2006052390A2 - System and method for managing communication and/or storage of image data - Google Patents


Info

Publication number
WO2006052390A2
Authority
WO
WIPO (PCT)
Prior art keywords
image
tiles
lods
resolution
images
Prior art date
Application number
PCT/US2005/037226
Other languages
French (fr)
Other versions
WO2006052390A9 (en)
WO2006052390A3 (en)
Inventor
Blaise Aguera Y Arcas
Julian Walker
Ian Gilman
Original Assignee
Seadragon Software, Inc.
Priority date
Filing date
Publication date
Priority claimed from US 11/141,958 (US7546419B2)
Application filed by Seadragon Software, Inc.
Priority to JP2007536990A (JP4831071B2)
Priority to EP05851225.2A (EP1810249A4)
Priority to CN2005800430579A (CN101147174B)
Publication of WO2006052390A2
Publication of WO2006052390A3
Publication of WO2006052390A9

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0483 Interaction with page-structured environments, e.g. book metaphor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4038 Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/63 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • H04N19/64 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets characterised by ordering of coefficients or of bits for transmission
    • H04N19/647 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets characterised by ordering of coefficients or of bits for transmission using significance based coding, e.g. Embedded Zerotrees of Wavelets [EZW] or Set Partitioning in Hierarchical Trees [SPIHT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 Indexing scheme for image generation or computer graphics
    • G06T2210/08 Bandwidth reduction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 Indexing scheme for image generation or computer graphics
    • G06T2210/36 Level of detail
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20 Indexing scheme for editing of 3D models
    • G06T2219/2016 Rotation, translation, scaling

Definitions

  • the present invention provides a method that may include establishing communication between a first computer and a second computer over a communication link, the second computer having an image collection stored therein in the form of compressed image data; selecting a plurality of images in the collection for communication to said first computer; and transmitting low-resolution image data for all of the selected images from the second computer to the first computer before transmitting full-resolution image data for any of the selected images.
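  • The transmission order just described can be made concrete with a short sketch. The following Python fragment is a minimal illustration only, not the patented implementation; the function names and two-pass structure are assumptions for exposition.

```python
def transmit(images, send):
    """Send low-resolution data for every selected image before
    sending full-resolution data for any of them."""
    # First pass: a coarse rendition of each image, so the whole
    # selection becomes viewable as early as possible.
    for img in images:
        send(img, "low")
    # Second pass: refine each image to full resolution.
    for img in images:
        send(img, "full")

if __name__ == "__main__":
    transmit(["a.jp2", "b.jp2"],
             lambda img, res: print(f"send {img} at {res} resolution"))
```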
  • FIG. 1 is a block diagram of a system that may be connected to enable communication of image data between a plurality of computers in accordance with one or more embodiments of the present invention
  • FIG. 2 is a block diagram of an image having at least two regions of interest therein in accordance with one or more embodiments of the present invention
  • FIG. 3 is a block diagram of a "virtual book” that employs aspects of the technology disclosed herein in accordance with one or more embodiments of the present invention
  • FIG. 4 is an illustration of a three-dimensional version of the virtual book of FIG. 3 in accordance with one or more embodiments of the present invention
  • FIG. 5 is a block diagram of a system for managing image data communication between one or more portable devices and one or more other computers in accordance with one or more embodiments of the present invention
  • FIG. 6A illustrates the results of an incomplete image data download employing an existing approach
  • FIG. 6B illustrates the results of an incomplete image data download in accordance with one or more embodiments of the present invention
  • FIG. 7 is a block diagram of a "common space" that may include a physical display (screen) and two virtual displays in accordance with one or more embodiments of the present invention
  • FIG. 8 illustrates a collection of over one thousand images (a collection of digitized maps of various sizes) packed into a montage in accordance with one or more embodiments of the present invention
  • FIG. 9 illustrates a snapshot of about three thousand images that have been dynamically re-arranged into a random configuration in accordance with one or more embodiments of the present invention.
  • FIG. 10 is a block diagram of a computer system that may be adaptable for use with one or more embodiments of the present invention.
  • FIG. 1 is a block diagram of a system 100 that may be connected to enable communication of image data between a plurality of computers in accordance with one or more embodiments of the present invention.
  • System 100 preferably includes client computer 102 which is connected to display 104 and data storage device 106.
  • System 100 preferably also includes server computer 108 which may be connected to data storage device 110.
  • Server computer 108 may also be connected to the Internet 112.
  • image data may be communicated between a plurality of computers 102, 108 so as to enable viewing of large collections of potentially large images using a relatively low-bandwidth connection therebetween.
  • desirable viewing and navigation of images stored at server computer 108 may be accomplished by transmitting selected portions of the image data stored at server computer 108 at controllable levels of resolution.
  • the selection of image data 114 may be fine-grained enough to encompass a particular image at high resolution, or even a selected portion of a particular image at high resolution.
  • various embodiments are discussed that include varying the types of devices used as client computer 102 and server 108, the types of image data 114 transmitted therebetween, and various applications of the ability to transmit selected image data at specified levels of resolution.
  • FIG. 2 is a block diagram of an image 200 having at least two regions of interest 202, 204 therein in accordance with one or more embodiments of the present invention.
  • Image 200 could be a subset of image data 114.
  • image data 114 could represent a subset of image 200, depending upon what image data is requested by client computer 102.
  • image 200 may be stored in compressed form on server computer 108, or within storage device 110.
  • data for a plurality of resolution levels for various regions of image 200 may be stored and may be requested for downloading by client computer 102.
  • the resolution level at which a particular image or region of an image is stored on client computer 102 may be readily increased or decreased. Where a prior download results in storage of a region or image at a first resolution level (which may be less-than-full resolution), this first resolution level may be increased by adding data representing the next higher level of resolution, preferably without having to discard the data representing the first resolution, thereby avoiding redundancy and increasing the efficiency of the image data communication contemplated herein. Conversely, the resolution level of a region or image stored at client 102 may be decreased by discarding the highest level of resolution stored therein, without losing data corresponding to the lower levels of resolution for the same region or image. Such resolution reduction may be practiced at client 102 to clear data storage space needed for one or more regions or images other than the one for which data is being discarded.
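  • A minimal sketch of this add-a-level/drop-a-level behavior follows. The ImageLevels class and its method names are hypothetical, chosen only to show that refinement appends data for the next level while reduction discards only the finest level already stored.

```python
class ImageLevels:
    """Per-image store of resolution levels, coarsest first. Level k
    holds only the incremental data needed to go from level k-1 to
    level k, so refinement never re-downloads coarser levels."""

    def __init__(self):
        self.levels = []          # levels[k] = bytes for level k

    def refine(self, level_data):
        # Increase resolution: append the next-higher level without
        # touching the coarser levels already stored.
        self.levels.append(level_data)

    def reduce(self):
        # Decrease resolution to free storage: discard only the finest
        # stored level; all coarser levels remain usable for display.
        if self.levels:
            self.levels.pop()

    def bytes_stored(self):
        return sum(len(d) for d in self.levels)
```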
  • the pertinent image compression may be provided by, for instance, the use of JPEG2000 or another discrete wavelet transform-based image compression scheme.
  • the present invention is not limited to the use of any particular compression format or image data representation.
  • Other formats may be employed, including image formats whose sizes in bytes are not substantially smaller than the uncompressed image data. It is merely preferable that the selected image format be susceptible to multiscale representation and storage of image data.
  • client computer 102 may seek to download one or more regions of image 200, where such regions may be portions of image 200.
  • the one or more regions of interest 202, 204 may be the only ones that client computer 102 seeks to download.
  • client computer (client) 102 may merely seek to download one or more selected regions at higher resolution than the resolution at which the remainder of image 200 is downloaded. In either case, client 102 may request a download by identifying both a specified region of image 200 to download and a resolution level at which this specified region will be provided by server computer (server) 108.
  • client 102 preferably requests a download of all of image 200 at low resolution. (The exact resolution level at which the bulk of image 200 is downloaded is not pertinent to this discussion.) However, client 102 seeks to download region of interest 1 202 at a higher resolution, or even at full resolution. Accordingly, client 102 preferably specifies the coordinates and the desired resolution level of region of interest 1 202 to server 108. Thus, in addition to downloading the bulk (including that portion external to region of interest 1 202) of image 200 at low resolution, client 102 preferably downloads region of interest 1 202 at the specified higher resolution. In other situations, client 102 could seek to download only the region(s) of interest and omit a download of the remainder of image 200.
  • a user of client computer 102 may view region of interest 1 202 at high resolution without having to download the entirety of image 200 at this high resolution.
  • a relatively low-bandwidth data communication link between client 102 and server 108 could nevertheless transmit the entirety of image 200 while providing a region of particular interest (in this case, region of interest 1 202) at high resolution. This provides the viewer with the same viewing experience, with respect to the region of interest, that would have occurred had client 102 downloaded the entirety of image 200 at the high resolution; the latter option, however, demands considerably more download time and data storage space at client computer 102 or data storage device 106.
  • a user of client computer 102 may wish to pan across image 200. Normally, panning from one region of interest 202 to another 204 would involve having both regions downloaded at client 102 at the level of resolution at which the regions will be viewed. Moreover, generally, all image territory in between region of interest 1 202 and region of interest 2 204 would be stored at client computer 102 to enable the described panning to occur. As described in the following, in one or more embodiments of the present invention, viewing of such regions of interest 202, 204 may be accomplished by downloading much less data and using less storage space at client computer 102 than in the approach described above.
  • client 102 may shift from a high resolution view of region of interest 1 202 to region of interest 2 204.
  • image data corresponding to a low-resolution representation of region of interest 2 204 is already present in client computer 102 from the download of image 200, discussed above. In this case, all that is needed is to supplement the existing image data for region of interest 2 204 with additional image data describing the pertinent higher levels of resolution to arrive at a high-resolution rendition of region of interest 2 204 at client computer 102.
  • image data representing the higher resolution levels of region of interest 1 202 may be discarded or overwritten to make space in data storage device 106 or other data storage space for the additional image data to be downloaded for region of interest 2 204.
  • the shift in view from region of interest 1 202 to region of interest 2 204 may be accomplished gradually, to provide a viewer of display 104 with a viewing experience that may closely simulate that available on a computer that has the entirety of image 200 downloaded at high resolution.
  • the level of resolution at which region of interest 1 202 is displayed may be reduced gradually to the resolution level at which most of image 200 is represented.
  • the view on display 104 may present a gradual pan across the low-resolution territory in between region of interest 1 202 and region of interest 2 204.
  • FIG. 3 is a block diagram of a "virtual book" 300 that employs aspects of the technology disclosed herein in accordance with one or more embodiments of the present invention.
  • Virtual book 300 may include display 302, backward cache 304, and forward cache 306. While the caches 304, 306 are each shown having two pages stored therein, any number of pages may be stored in either of caches 304 and 306.
  • virtual book 300 employs the ability, described above, to present selected image data at controllable levels of resolution.
  • each image may be a page within display 302 of virtual book 300.
  • Display 302 may correspond to display 104 of FIG. 1 or may be a special purpose display that accommodates the specific features of virtual book 300.
  • Virtual book 300 may correspond to client computer 102 of FIG. 1, or may be a special purpose computer that is substantially limited to communicating, storing, and displaying pages of books.
  • virtual book 300 may include only one page that is stored and/or displayed at full resolution, with other pages, both earlier and later in the sequence of pages displayed, at a variety of other resolutions.
  • the page currently displayed on display 302 (i.e. the active page) is preferably the page stored at full resolution, while other pages may be stored and displayed at progressively lower resolutions with increasing distance in pages from the active page.
  • the resolution at which each page is stored may equal the resolution of the active page being displayed in display 302 divided by 2 raised to a power equal to the number of pages between each stored page and the active page.
  • page 11 (in forward cache 306) and page 9 (in backward cache 304) may each occupy one half the amount of data storage space occupied by the active page in display 302.
  • page 12 (in forward cache 306) and page 8 (in backward cache 304) may each occupy one quarter the amount of data storage space occupied by the active page in display 302.
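  • Under this factor-of-two embodiment, the storage allotted to each cached page can be computed directly from its distance to the active page. The sketch below is illustrative only; the function name and byte figures are assumptions.

```python
def page_budget(active_page, page, active_bytes):
    """Storage allotted to `page`: the active page's allotment halved
    once per page of distance from the active page."""
    distance = abs(page - active_page)
    return active_bytes // (2 ** distance)

# With page 10 active and (say) 1 MB allotted to it, pages 9 and 11
# each get 512 KB, pages 8 and 12 each get 256 KB, and so on.
for p in range(8, 13):
    print(p, page_budget(10, p, 1_000_000))
```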
  • a new active page may be selected in place of page 10 which is shown displayed in FIG. 3.
  • the new selected page may, but need not, be a page immediately adjacent page 10 (either page 9 or page 11). That is, any page from 1 to the last page in the pertinent book (or any other type of publication with discrete pages) may be the new active page.
  • a transition between the currently active page and the new active page is preferably conducted.
  • This transition to a new active page may include acquiring additional image data for the new active page to enable the new active page to be stored and/or displayed at full resolution. If the new active page is "page 11", and the "factor-of-two" embodiment, discussed above, is employed, the amount of data storage space allocated to page 11 will preferably double. Continuing with an application of the "factor-of-two" embodiment, the data storage space allocated to page 10 will preferably be halved as part of the transition away from page 10 and toward page 11 as the active page.
  • the data for the active version of page 10 that is not included in the post-transition page 10 may be discarded (which may include overwriting thereof).
  • this "surplus" data for page 10 may be stored in another cache. Such caching of the page-10 surplus data may provide efficiency if a transition to page 10 occurs soon after (i.e. within a reasonable number of page transitions) the transition away therefrom.
  • the transition from page 10 to page 11 may include a gradual fade-out from page 10 and gradual fade-in of page 11, to provide an experience that is visually pleasing and/or reminiscent of a physical page transition to the user of virtual book 300.
  • a sequence of images showing the folding and turning of the old active page may be provided to make the virtual page transition look still more reminiscent of a physical turn of a page.
  • FIG. 4 is an illustration of a three-dimensional version of the virtual book of FIG. 3 in accordance with one or more embodiments of the present invention.
  • the embodiment of FIG. 4 illustrates a method in which an alpha channel, for partial transparency (the rough edges), may be stored as image information in addition to the red, green, and blue color components.
  • One or more embodiments of the present invention may provide a method which may include providing a collection of digital images or other visual objects on a server; establishing communication between a client and said server; and enabling efficient multi-scale navigation by the client of collections of visual objects residing on the server.
  • digital image data may include digital photographs, digital images, visual documents or other forms of visual content.
  • image generally corresponds to the term “digital image,” and either of these terms may correspond to a "digital photograph.”
  • client generally corresponds to the term “client side” and to the term “client device”.
  • portable device generally refers to digital image capturing devices and/or digital image storage devices.
  • a “digital image capturing device” may include but is not limited to a digital camera, a camera-enabled mobile phone (which may be referred to as a camera-enabled cell phone), a personal digital assistant, and/or a digital video recorder able to record digital still images.
  • a “digital image capturing device” may include devices that are capable of receiving image data by directly optically receiving and recording such data (such as with a standard digital camera) and may also include devices that are able to receive image data via a wired or wireless Internet or other network connection.
  • One or more embodiments of the methods described herein may use a multi-resolution approach to address the problems of storing, synchronizing, browsing, and organizing collections of digital image data, which may be visual documents.
  • One or more of the methods described herein may also apply to visual data objects other than images, such as the roadmap or other vector data of Applicant reference document
  • Digital photographs or other digital image data stored in a camera or other portable device are generally downloaded periodically to a desktop or notebook computer, cleared from the camera's memory to allow more pictures to be taken, and organized and/or viewed on the desktop or notebook computer. Thereafter, digital photographs may be shared with friends by posting a selection of digital photographs to one or more Internet sites.
  • a mobile device, which may be a digital camera or other digital image data capturing device, takes pictures. Then, potentially after some culling of the pictures, the pictures are downloaded to the camera user's PC (personal computer).
  • the camera device's local storage may be limited and, in this conventional approach, only holds images transiently, until they are safely stored on the PC.
  • the PC may permanently retain in its memory (e.g. hard disk drive or other non-volatile storage) any subset of the digital photos.
  • the user may in turn upload some further culled subset of those images to a web server which may be owned by a web photo publishing service, typically at reduced resolution.
  • the images uploaded may be made publicly viewable by any third party using a web browser on a PC or other device, or by some subset of those users with restricted access.
  • existing camera devices impose other limitations. In existing camera devices, navigation of images stored on the camera device is generally awkward and difficult. In existing camera devices, there is a lack of a unified visual interface to image collections which would give users a consistent experience either on the camera device or on a PC. Existing camera devices tend to impose very restrictive limits on the number of photos that can be stored thereon before downloading becomes necessary. Thus, when employing existing approaches, a lengthy series of steps is generally involved in making images available to a third party.
  • FIG. 5 is a block diagram of a system 500 for managing image data communication between one or more portable devices 512, 522 and one or more other computers in accordance with one or more embodiments of the present invention.
  • System 500 may include a client side 510 and a server side 520.
  • client and server statuses of the groupings of devices shown in FIG. 5 may be reversed.
  • system 500 may include portable device 1 512, portable device 2 522, personal computer 102 (which may be substantially the same as client computer 102 of FIG. 1), server 108 (which may be substantially the same as server computer 108 of FIG. 1) and/or additional computers 524.
  • Preferably, each of devices 512, 522 and computers 102, 108, and 524 has memory and one or more displays included therewith.
  • the devices and computers of FIG. 5 could be in communication with memories and/or displays.
  • FIG. 5 illustrates various possible data paths useable in accordance with one or more embodiments of the present invention.
  • One or more embodiments may use less than all of the data paths shown in FIG. 5.
  • the available data paths shown in FIG. 5 may have one or more of the following features in common: 1) The data paths may involve server side 520 (the originator of the image data) and a client side 510 (the recipient of the image data); 2) bi-directional data paths (which are illustrated with lines having arrows at both ends) indicate that the devices pointed to by these arrows are capable of serving in either a client or a server capacity; 3) the connections may employ a hard-wired network (e.g.
  • both the client side 510 and the server side 520 may include one or more digital computing and/or storage devices including but not limited to: camera devices, personal computers, and personal digital assistants.
  • a client device may have one or more displays. The client may browse a collection of documents residing on the server using one or more of the efficient multi-resolution browsing methods described in Applicant reference document 489/15P (U.S. Provisional Application Serial No.
  • one of the properties of this navigation method is that the display contents may gradually come into focus as information is sent from the server to the client.
  • the rate at which this information comes into focus may be governed by the ratio of connection bandwidth to display pixels.
  • a client's “display” need not necessarily be physical, or visible to an end-user.
  • this display can be a "virtual display", i.e. an abstract model of a display with a specified resolution.
  • Such a “virtual display” might be represented as an array of pixel values in the client's memory, irrespective of whether those pixel values are rendered to a screen.
  • a virtual display may include wavelet data that at least partially describes one or more images. The wavelet data is preferably able to represent an image at a range of possible resolutions. In one or more embodiments, the wavelet data may correspond to that employed using JPEG2000.
  • a virtual display may include enough wavelet data to completely describe one or more images. For example, if it were desirable for a device to acquire thumbnails of all of the images in a collection at a specified resolution, then this device could create a "virtual display" of the appropriate size, establish a connection with a server, and request a view of the entire collection. The full set of thumbnails could then be transmitted to and rendered on this "virtual display". If transmission were interrupted before all of the relevant data were sent from the server to the client, then the client's virtual display would not yet have all of the thumbnail images in perfectly focused condition. However, all of the requested thumbnail images would preferably be stored within the client's virtual display with sufficient resolution to enable rendering visible versions of these images on a screen. The images rendered in the manner described would generally be of lower visual quality than if the transmission of the image had concluded without interruption. Thus, some image degradation may be present in the images rendered using data from an incomplete, interrupted transmission.
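  • The notion of a virtual display can be sketched as below: an ordinary sample buffer that server updates refine incrementally, so that an interrupted transfer leaves every requested thumbnail coarse but renderable. The class and its methods are hypothetical illustrations, not the patent's data structures.

```python
class VirtualDisplay:
    """An off-screen 'display': a resolution plus a sample buffer.
    Nothing here is ever required to reach a physical screen."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        # One value per pixel; a real client might instead hold
        # wavelet data (e.g. JPEG2000 coefficients) for each image.
        self.samples = [[0] * width for _ in range(height)]

    def apply(self, update):
        # Each server update refines some pixels. If the connection
        # drops mid-transfer, the buffer still holds a complete,
        # lower-quality version of every requested thumbnail.
        for x, y, value in update:
            self.samples[y][x] = value

pen = VirtualDisplay(256, 256)
pen.apply([(0, 0, 128), (1, 0, 200)])   # a fragment of a coarse pass
```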
  • FIG. 6A illustrates the results of an incomplete image data download employing an existing approach
  • FIG. 6B illustrates the results of an incomplete image data download in accordance with one or more embodiments of the present invention.
  • FIG. 6A shows a prior art scenario in which all of the data for three thumbnails (shown with square shapes) have been received, and in which the remaining nine thumbnails (shown with X's) have not been received at all.
  • FIG. 6B illustrates a situation that may arise employing one or more embodiments of the present invention, in which all twelve thumbnails (shown as cross-hatched square shapes) have been received at some level of resolution, which is preferably acceptable for viewing, but which is likely below the resolution that would be obtained after conclusion of a complete and uninterrupted data transmission.
  • a client may have a client-side cache that caches recently viewed visual content.
  • a standard MRU (most-recently-used) cache may be employed for the caching needs for one or more embodiments of the present invention.
  • a cache disclosed in U.S. Patent Application Serial No. 11/141,958 (client reference document 489/10NP), entitled “Efficient Data Cache”, which is incorporated herein by reference, may be beneficially employed to enable more sophisticated client-side caching. In either case, a given amount of client-side memory may be devoted to the cache.
  • navigation back to a recently viewed image may permit using image data stored in the cache, rather than requiring that this image data be re-sent from the server.
  • a client may have multiple displays.
  • a given display may be physical or virtual.
  • a given display may be driven directly by user input, or it may be driven programmatically by software within a client computer such as computer 102.
  • the total size in pixels of all of the displays may be fixed or bounded by some limit, and this limit may define a minimum amount of client-side memory needed for visual content.
  • This client-side memory is preferably separate from storage space allocated to the cache memory.
  • a physical display within a client device is visible to the user and allows zooming and panning navigation through, and rearrangement of, a collection of digitally stored images.
  • the user may also select one or more images from the collection and send them to a "holding pen" which may serve as a place for storing user-selected images.
  • the holding pen may be visualized in some way on the physical display. Adding an image to the holding pen preferably causes the image to be placed on the virtual display, which may be invisible to the user. As images are added to the holding pen, the virtual display representing the holding pen gradually fills up.
  • This virtual display may increase in size (as measured in number of pixels) up to some limit, after which its size may remain fixed at this limit.
  • the virtual display may be too small to display all of the images in the holding pen at full resolution.
  • the data storage space needed for the images resident in the virtual display is preferably reduced as needed to fit the images into the virtual display.
  • an off-screen view (the virtual display) preferably gets supplemented with images as the user puts viewable images into the holding pen. This supplementing of the off-screen view may occur invisibly to the user.
  • a method for browsing is disclosed in U.S. Patent Application Serial No. 10/790,253 (Applicant reference document 489/2NP) , entitled “System and Method for Exact Rendering in a Zooming User Interface", which is incorporated by reference herein.
  • the method disclosed in that document for determining the order in which information is sent from the server to the client based on the client' s view may be modified for a multiple display scenario.
  • the 489/2NP document discloses that visual information may be broken up into tiles, with each tile covering a region in space at a given resolution. Low-resolution tiles may then occupy large physical areas, while high-resolution tiles may occupy smaller physical areas, such that the amount of information in each tile may be substantially the same.
  • the 489/2NP document discloses methods for ordering tiles using criteria discussed in the following.
  • One criterion may be tile resolution and tile position on the display. Sorting of tiles could be lexicographic, such that lower-resolution tiles always precede higher-resolution tiles, with spatial position only playing a role in resolving order within a resolution.
  • Lexicographic sorting is referenced here in the generalized tuple sense: for example, the lexicographic sort of the set of triplets {(1,2,3), (0,3,1), (4,0,0), (0,0,1), (0,3,2)} would be (0,0,1), (0,3,1), (0,3,2), (1,2,3), (4,0,0).
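  • In Python, for example, tuple comparison is already lexicographic in this generalized sense, so the ordering above falls out of a plain sort:

```python
tiles = [(1, 2, 3), (0, 3, 1), (4, 0, 0), (0, 0, 1), (0, 3, 2)]
print(sorted(tiles))
# [(0, 0, 1), (0, 3, 1), (0, 3, 2), (1, 2, 3), (4, 0, 0)]

# With tiles keyed as (resolution_level, position...), this sends every
# lower-resolution tile before any higher-resolution tile, position
# only resolving order within a resolution.
```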
  • non-lexicographic sorting criteria may be employed.
  • For instance, a linear combination of a plurality of properties could be used to sort tiles.
  • properties may include but are not limited to: resolution (which could be expressed in logarithmic units) and distance of the tile from the center of the display.
  • sort key corresponds to the term “sorting criterion.”
  • lower-resolution tiles may be sent in preference to higher-resolution tiles, and tiles near the center of the display may be sent in preference to tiles near the periphery, but these properties can trade off against each other.
  • display number can be added as an extra lexicographic sort key.
  • a first display might refine completely (in accordance with the other sort keys) before any tiles are sent relevant to a second display.
  • display number can be an additional variable for inclusion in a linear combination, allowing display number to trade off in some fashion against resolution and proximity to the center of the display.
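  • A sketch of such a linear-combination sort key appears below; the weights and tile encoding are illustrative assumptions, not values from the disclosure.

```python
import math

def priority(tile, w_res=1.0, w_dist=0.25, w_disp=0.5):
    """Smaller key = sent earlier. Resolution (in logarithmic units,
    i.e. the level number), distance from the display center, and
    display number all trade off against one another."""
    level, dx, dy, display = tile   # (resolution level, offset x/y, display no.)
    return (w_res * level
            + w_dist * math.hypot(dx, dy)
            + w_disp * display)

tiles = [(0, 10, 0, 1),   # coarse, peripheral, second display
         (1, 0, 0, 0),    # finer, centered, first display
         (0, 0, 0, 0)]    # coarse, centered, first display
print(sorted(tiles, key=priority))
```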
  • the displays can coexist in an imaginary "common space", and the resolution and proximity-to-center sort keys can be used as before.
  • the "common space” is a notional space establishing an imaginary spatial relationship between multiple displays, as if they were regions of a single, larger display. Defining this imaginary spatial relationship determines all parameters needed for prioritizing tiles in the multiple displays.
  • FIG. 7 is a block diagram of a "common space” 700 that may include a physical display (screen) 702 and two virtual displays 704, 706 in accordance with one or more embodiments of the present invention.
  • the physical display 702 is preferably in the center of "common space” 700 at normal size.
  • Virtual displays V1 704 and V2 706 are preferably off to the side, and V2 is preferably scaled down, so that its pixels are preferably half the linear size of the physical display's pixels. This means that, assuming purely lexicographic tile sort order, the contents of each resolution level in V1 704 will preferably be sent from the server to the client after the corresponding resolution for the physical display (since V1 is farther from the center of the space than any point on the physical display).
  • Resolutions in V2 706 may be sent after all the tiles at a resolution twice as fine have been sent for both the physical display 702 and V1 704. It is noted that it isn't necessary for the "common space" 700 to correspond to any real larger display or memory address space.
  • the "common space” 700 is merely a conceptual convenience for establishing the relationships among tile priorities across different displays. Clearly many tradeoffs are possible. These tradeoffs can have the consequence, as in the lexicographic example above, of giving refinement of the physical display 702 the highest priority, while using any excess time and bandwidth not required for bringing the physical display into focus to continue refining the virtual display(s) 704, 706.
  • the tradeoffs may alternatively begin refining the virtual display(s) after the physical display has largely, but not completely, come into focus. After the physical display 702 has largely come into focus, the physical and virtual displays 704, 706 can share bandwidth resources to refine in concert.
  • any subset of the data for a given image can itself comprise a JPEG2000 image file.
  • the client may progressively download image data from the server, thereby supplementing the quality of the client's subset of the image and giving the client the ability to create a JPEG2000 file that is an increasingly accurate approximation of the full image.
  • this approach can be extended to multi-resolution document types other than JPEG2000 as well. If the client never zoomed in beyond a given resolution, then no information would be available regarding the image content beyond that given resolution. In this case, the version of the JPEG2000 image which may be created and/or stored by the client may have a lower overall resolution than the original version of that image.
  • the camera or camera-enabled mobile device may operate as the server, and a PC may operate as the client.
  • the PC can rapidly browse through the complete set of images available on the camera. During navigation, a group of images can be selected and put in the holding pen. Note that if all images on the camera are to be downloaded to the PC in their entirety, then the total time needed to accomplish the transfer remains the same as in the prior art.
  • this method can provide a number of advantages over the conventional serial download of images which are listed and discussed below. The present invention is not limited to the features listed below.
  • Image download and user navigation of the full image set on the camera or other mobile device may be concurrent and cooperative in their use of bandwidth (in effect, navigation merely influences the order in which tiles are sent from server to client) .
  • if the PC's display is larger than the mobile device's display, then better choices can be made about which images to download, which to leave on the mobile device, and which to discard, without incurring the delay of downloading the entire set before deciding.
  • the experiences of browsing on the PC and on the mobile device (assuming that it also has a display), respectively, are preferably simple and experientially similar, thereby increasing usability.
  • if lower-resolution versions of the images in the holding pen are desired, it's preferably straightforward to suitably limit the detail of downloaded data by reducing the size of the item on the virtual display. It is noted that reducing the image size in this manner may both speed up downloading by a large factor (i.e. by a factor of 4 per resolution level discarded) and require less space on the PC.
  • the amount of memory allocated to photos on the PC can be bounded. Also, different constraints can be placed on different photos, and hence space can be allocated based on recency or one or more other criteria.
  • premature loss of connectivity results in a degradation in the quality of some or all of the images to be downloaded, instead of completely removing some images from the download operation.
  • the bulk of the data volume for an image is very high-resolution detail, some of which is camera noise, and all of which is less critical for ordinary viewing than the coarser image structure.
  • Hybrid prioritization of the image data is also possible, for example, favoring the complete download of a subset of the photos before proceeding to refine a second set beyond thumbnail detail.
  • one or more methods disclosed herein are resilient to intermittent connectivity, since any JPEG2000 object can continue to be augmented at any time with additional information while still allowing browsing and interaction with whatever visual data has already been received.
  • typical home users may not want to discard any of their images (after an initial culling of such images) . If such users continue to add sufficient storage to their PC, then of course it should not be necessary to discard any content.
  • the addition of storage can in itself increase the virtual display maximum size.
  • Features of (a) and (b) above can therefore be omitted if a sufficiently large virtual display size can be created (i.e., if there is enough available client-side storage) .
  • checkmarks or green dots can appear next to images as they finish downloading. When all images in the "holding pen” include green dots, the connection can be broken without loss.
  • Operations such as requesting that the camera discard some of its images using the client computer may benefit from some additional communication from the client to the server beyond that contemplated in Applicant reference document 489/15P.
  • the client side could also instruct the server side (which may be a mobile device such as a digital camera or mobile phone) to launch its own client side, and create its own view to receive content from the PC.
  • the PC can render the camera/mobile phone's "view” of content on the PC, thus (for example) displaying the green completion dots described above for images uploaded from the PC to the camera.
  • Each of the reciprocal arrows of FIG. 5 can be implemented using either a “push” or “pull” arrangement. Specifically, the viewport setting, the arrangement, and other navigation settings may be controlled from either the client side 510 ("pull") or from the server side 520 (“push”) .
  • a user interacting with one device can be connected reciprocally to another device, thereby enabling both "pulling” and “pushing” to occur simultaneously.
  • a mobile device 512 which may be a camera or camera- enabled mobile phone may serve content to a user's PC (personal computer) 102.
  • This connection might typically take place over a USB cable or a Bluetooth ad-hoc wireless network.
  • the benefits are described above.
  • the PC 102 may serve content back to the mobile device 512. This can be useful for the following applications, among others.
  • Wallet photos can be sent from the PC to the camera or mobile phone, even if those photos weren't taken by the mobile device.
  • the PC may be a home appliance without a display, and the mobile device may then be used as a primary visual interface to the archived visual material.
  • the mobile device in this context may be a digital camera, a camera-enabled cell phone, a PDA, or a mobile tablet PC with a display.
  • a first mobile device can be connected directly to, or form an ad-hoc network with, another mobile device (the "guest”) .
  • the two mobile devices can then view and share each other's photos.
  • the PC could upload images (via push) to a remote server.
  • the server may be a photo sharing service, and may therefore implement the kind of space constraints envisioned in the above processes of reducing the size of the item on the virtual display and bounding the amount of memory allocated to photos on the PC.
  • the remote server could then serve its collection to one or more additional PCs. Typically this would occur over a broadband connection. However, other connection types could be employed.
  • the remote server could also serve collections to mobile device users. Typically this would occur over a mobile wireless wide-area network.
  • Mobile devices could upload their images via "push" (that is, under control of the mobile devices) to a remote server.
  • the upload may be automatic, allowing the mobile device to transparently extend its apparent storage space by transferring content freely to a server and deleting it locally when transfers are complete.
  • local caching on the mobile device 512 may allow the mobile device 512 to support browsing through very large thumbnail collections using only local storage, even if the local storage is limited. Zooming in on details of recently viewed images may also be possible, if the relevant information is still in the mobile device's local cache.
  • zooming in on images whose details are only available on a remote server could result in a blurry and un-detailed image. If the mobile device is on a network that includes the remote server 108, however, the blurry image can become progressively more refined as more and more detailed image data is downloaded to the mobile device 512. If the mobile device is not connected to a network that can supply additional image data, the image may not be presented with any greater detail than is available in the initial thumbnail image.
  • One or more embodiments of the present invention may define precomputed steps and interactive rendering algorithms which can be used in a variety of configurations to implement downloading of selected images and/or image regions at controllable levels of resolution for various applications. Many of these applications (such as focusing on regions of interest, the virtual book, etc.) may involve user interaction with a "universe" of images.
  • the starting point for precomputation may therefore be a list of the filenames, URLs, or other strings referencing the individual images.
  • a montage may be a mosaic or collage of all of the images, rendered at low resolution and packed efficiently into a rectangular area, as shown in FIG. 8.
  • Auxiliary metadata, which can be embedded in the montage image file or stored separately, may associate rectangular regions on the montage image with particular image files.
  • the montage image itself can be navigated using a zooming and panning interface. When the user zooms in far enough to exhaust the resolution available in the montage version of one or more images within the montage, the metadata for that image may refer the client to one or more individual image files, and the client may use imagery from these image files to render the images at higher resolution.
  • the overall size of the montage in pixels may be chosen such that its resolution is only exhausted when zooming in to a stage where only a small number of images, which may be referred to herein as a "set" of images, are visible simultaneously. Therefore, access to more than this small number of images at high resolution is preferably not needed at any given time.
  • image streams may be opened and closed as needed to limit the number of high resolution images that are open at any given time.
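  • A minimal sketch of the montage metadata and its use follows. The record layout, file names, and intersection test are hypothetical; the point is only that a view rectangle selects the few individual image files whose streams need to be open.

```python
# Hypothetical metadata: each entry ties a rectangle in montage pixel
# coordinates (x, y, width, height) to the individual image it shows.
MONTAGE_REGIONS = [
    {"rect": (0, 0, 640, 480), "file": "maps/map_0001.jp2"},
    {"rect": (640, 0, 512, 512), "file": "maps/map_0002.jp2"},
]

def files_for_view(view, regions=MONTAGE_REGIONS):
    """Return the image files whose montage rectangles intersect the
    current view; only these high-resolution streams need opening."""
    vx, vy, vw, vh = view
    hits = []
    for r in regions:
        x, y, w, h = r["rect"]
        if x < vx + vw and vx < x + w and y < vy + vh and vy < y + h:
            hits.append(r["file"])
    return hits

# Zoomed in on the upper-left corner: only map_0001 must be opened.
print(files_for_view((100, 100, 300, 200)))
```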
  • the montage layout is preferably designed for packing efficiency, but the user may want a different arrangement of the images onscreen. Moreover, the user may want to be able to dynamically rearrange the layout of images on the screen.
  • Texture mapping may be implemented in software but is, in general, hardware-accelerated on modern personal computers. Texture mapping allows a portion of a "texture", or source image, to be drawn on the display, optionally rescaling the image, rotating it, and/or performing a three-dimensional perspective transform. Other hardware-accelerated transformations are often supported, including color correction or alteration, full or partial transparency, lighting, occlusion, and coordinate remapping.
  • a low-resolution version of the montage can be used as a "texture", so that when the user is zoomed out, the individual images within the montage can be dynamically remapped in any way, as in FIG. 9.
  • each texture map may be a montage containing a subset of the images. Transitions between arrangements may or may not be animated. It is noted that rearrangement can take place while the user is zoomed in, but because the rearrangement might result in a new zoomed-in view of an image which was previously off-screen, the new image may initially be very blurry.
  • the texture mapping technique may be used only during dynamic rearrangement of images.
  • software compositing can be used to assemble all or part of a higher-definition rearranged montage on-screen.
  • This software compositing method is especially valuable in combination with the multiresolution rendering techniques described in US Patent application Serial No. 10/790,253 (Applicant reference document 489/2NP), identified in detail earlier in this disclosure. This method may in effect create a new "display montage" by rearranging the imagery of the original montage.
  • Texture mapping may also be used to display high resolution images, but in this case, rather than using textures containing montages of multiple images, textures are used that contain tiles of individual images. This technique is also described in US Patent application Serial No. 10/790,253 (Applicant reference document 489/2NP).
  • montage rearrangement may be used to support reorganization of the images without recourse to texture mapping.
  • FIG. 10 is a block diagram of a computing system 1000 adaptable for use with one or more embodiments of the present invention.
  • central processing unit (CPU) 1002 may be coupled to bus 1004.
  • bus 1004 may be coupled to random access memory (RAM) 1006, read only memory (ROM) 1008, input/output (I/O) adapter 1010, communications adapter 1022, user interface adapter 1016, and display adapter 1018.
  • RAM 1006 and/or ROM 1008 may hold user data, system data, and/or programs.
  • I/O adapter 1010 may connect storage devices, such as hard drive 1012, a CD-ROM (not shown), or other mass storage device to computing system 1000.
  • Communications adapter 1022 may couple computing system 1000 to a local, wide-area, or Internet network 1024.
  • User interface adapter 1016 may couple user input devices, such as keyboard 1026 and/or pointing device 1014, to computing system 1000.
  • display adapter 1018 may be driven by CPU 1002 to control the display on display device 1020.
  • CPU 1002 may be any general purpose CPU.
  • the present invention relates generally to graphical zooming user interfaces for computers. More specifically, the invention is a system and method for progressively rendering zoomable visual content in a manner which is both computationally efficient, resulting in good user responsiveness and high frame rates, and exact, in the sense that the ultimate image quality is not degraded relative to the highest possible quality rendition in an ordinary GUI.
  • visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out.
  • the desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets.
  • some applications include zoomable components, such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc.
  • these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally.
  • while continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom continuously is almost invariably absent.
  • ideally, any kind of visual content could be zoomed, and zooming would be as much a part of the user's experience as panning.
  • Ideas along these lines made appearances as futuristic computer user interfaces in many movies even as early as the 1960s [1]; recent movies continue the trend [2].
  • a number of continuously zooming interfaces have been conceived and/or developed, from the 1970s through the present [3]. In 1991, some of these ideas were formalized in U.S. Patent 5,341,466 by Kenneth Perlin and Jacob Schwartz at New York University ("Fractal Computer User Centerface with Zooming Capability").
  • the prototype zooming user interface developed by Perlin and co-workers, Pad, and its successor, Pad++, have undergone some development since [4].
  • however, no major application based on a full ZUI (Zooming User Interface) has yet been developed; the technology described here is based on the Voss zooming user interface framework.
  • This patent is specifically about one of the innovations in Voss's approach to object tiling and rendition for non-photographic content.
  • a multiresolution visual object is normally rendered from a discrete set of sampled images at different resolutions or levels of detail (an "image pyramid").
  • when rendering from an image pyramid, two adjacent levels of detail which bracket the desired level of detail are blended together to render each frame, because it is not normally the case that the desired level of detail is exactly one of those represented by the discrete set.
  • Such techniques are sometimes referred to as trilinear filtering or mipmapping.
  • the resulting interpolated renditions are usually satisfactory for photographic content, but not satisfactory for content defined in terms of geometric primitives, such as text, graphs, drawings, and in short most of the visual content with which users interact outside gaming or multimedia applications. This is because blending levels of detail necessarily introduces blurring and aliasing effects.
  • the present invention involves a hybrid strategy, in which an image pyramid- based approach with a discrete number of levels of detail is typically used during rapid zooming and panning, but when the view stabilizes sufficiently, an "exact view” is rendered and blended in over several frames. Because the human visual system is insensitive to fine detail in the visual content while it is still in motion, this hybrid strategy can produce the illusion of continuous "perfect rendering” with a fraction of the computational burden.
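  • The hybrid strategy can be sketched as follows. The control flow and names are assumptions for illustration; the stabilization criterion a real renderer would use is not specified here.

```python
import math

def render_frame(zoom, view_is_stable, render_exact, blend_levels):
    """While the view moves, blend the two pyramid levels bracketing
    the desired level of detail (trilinear-style); once the view
    stabilizes, blend in an exact rendition over several frames."""
    level = math.log2(zoom)                  # fractional level of detail
    lo, hi = math.floor(level), math.ceil(level)
    if view_is_stable:
        render_exact()
    else:
        blend_levels(lo, hi, level - lo)     # interpolation weight in [0, 1]

render_frame(zoom=2.8, view_is_stable=False,
             render_exact=lambda: print("exact render"),
             blend_levels=lambda lo, hi, t: print(f"blend L{lo}/L{hi}, t={t:.2f}"))
```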
  • An objective of the present invention is to allow text, plots, charts, drawings, maps, and any other vector-based content (also referred to here as vectorial content) to be rendered in a zooming user interface without degradation in ultimate image quality relative to the highest possible quality rendition in an ordinary GUI.
  • a further objective of the present invention is to allow arbitrarily large or complex vector-based content to be viewed in a zooming user interface.
  • a further objective of the present invention is to enable near-immediate viewing of arbitrarily complex vector-based visual content, even if this content is ultimately represented using a very large amount of data, and even if these data are stored at a remote location and shared over a low-bandwidth network.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far in on vectorial content while maintaining a crisp, unblurred view of the content and maintaining interactive frame rates.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex vectorial content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
  • a further objective of the present invention is to minimize the user's perception of transitions between levels of detail or rendition qualities during interaction.
  • a further objective of the present invention is to allow the graceful degradation of image quality by blurring when exact renditions of certain parts of the vectorial content cannot yet be made either because the information needed to render them is unavailable, or because an exact rendition is still in progress.
  • a further objective of the present invention is to gracefully increase image quality by sharpening when exact renditions of certain parts of the vectorial content first become available.
  • zooming user interfaces are a generalization of the usual concepts underlying visual computing, allowing a number of limitations inherent in the classical user/computer/document interaction model to be overcome.
  • One such limitation is on the size of a document that can be "opened” from a computer application, as traditionally the entirety of such a document must be “loaded” before viewing or editing can begin.
  • Even when a document fits in short-term memory (random access memory, or RAM), this limitation is felt, because all of the document information must be transferred to short-term memory from some repository (e.g. from a hard disk, or across a network) during opening; limited bandwidth can thus make the delay between issuing an "open" command and being able to begin viewing or editing unacceptably long.
  • Still digital images provide both an excellent example of this problem and an illustration of how the computer science community has moved beyond the standard model for visual computing in overcoming it.
  • Table 1 shows download times at different bandwidths for typical compressed sizes of a variety of different image types, from the smallest useful images (thumbnails, which are sometimes used as icons) to the largest in common use today. Shaded boxes indicate image sizes for which interactive browsing is difficult or impossible at a particular connection speed.
  • Modern image compression standards, such as JPEG2000 [5], are designed to address precisely this problem. Rather than storing the image contents in a linear fashion (that is, in a single pass over the pixels, normally from top to bottom and left to right), they are based on a multiresolution decomposition.
  • the image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1. Obviously the fine details are only captured at the higher resolutions, while the broad strokes are captured — using a much smaller amount of information — at the low resolutions.
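  • A minimal Python sketch of this pyramid construction, assuming the Pillow imaging library and a granularity of two; the function name and the use of Lanczos resampling are illustrative, not prescribed by the invention:

    from PIL import Image

    def build_pyramid(image):
        # Finest level first; halve each dimension (granularity g = 2)
        # until the whole image fits in a single 1x1 pixel.
        levels = [image]
        while max(levels[-1].size) > 1:
            w, h = levels[-1].size
            levels.append(levels[-1].resize((max(1, w // 2), max(1, h // 2)),
                                            Image.LANCZOS))
        return levels

    # A 512x512 source yields sides 512, 256, 128, 64, 32, 16, 8, 4, 2, 1.
    sides = [lod.size[0] for lod in build_pyramid(Image.new("RGB", (512, 512)))]
    print(sides)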
  • Previous models of document access are by nature serial, meaning that the entirety of an information object is transmitted in linear order.
  • This model, by contrast, is random-access, meaning that only selected parts of the information object are requested, and these requests may be made in any order and over an extended period of time, i.e. over the course of a viewing session.
  • the computer and the repository now engage in an extended dialogue, paralleling the user's "dialogue" with the document as viewed on the display.
  • each level of detail is in turn broken into tiles, which form the basic unit of transmission.
  • the size in pixels of each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile.
  • the resulting tiled image pyramid is shown in Figure 2. Note that the "tip" of the pyramid, where the downscaled image is smaller than a single tile, looks like the untiled image pyramid of Figure 1.
  • the JPEG2000 image format includes all of the features just described for representing tiled, multiresolution and random-access images.
  • Hence we will refer to the production of a finished, fully drawn tile in response to a "tile drawing request" as tile rendition, with the understanding that this may be a slow process. Whether it is slow because the required data are substantial and must be downloaded over a slow connection, or because the rendition process is itself computationally intensive, is irrelevant.
  • a complete zooming user interface combines these ideas in such a way that the user is able to view a large and possibly dynamic composite document, whose sub-documents are usually spatially non-overlapping. These sub-documents may in turn contain (usually non-overlapping) sub-sub-documents, and so on.
  • Documents thus form a tree: a structure in which each document has pointers to a collection of sub-documents, or children, each of which is contained within the spatial boundary of the parent document.
  • We call each such document a node, borrowing from programming terminology for trees.
  • Some nodes may be static images which can be edited using painting-like commands, others may be editable text, and still others may be Web pages designed for viewing and clicking. All of these can coexist within a common large spatial environment — a "supernode" — which can be navigated by zooming and panning.
  • Because zooming is an intrinsic aspect of navigation, content of any kind can be viewed at an appropriate spatial scale.
  • The variable f refers to the sampling density of a tile relative to the display.
  • Tiling granularity, which we will write as the variable g, is defined as the ratio of the linear tiling grid size at a higher LOD to the linear tiling grid size at the next lower LOD.
  • Granularity 2 is by far the most common in similar applications, but in the present context g may take other values.
  • the level of detail scheme described thus far involves a fixed, discrete set of LODs at different scales separated by factors of the granularity g.
  • the image drawn on any region of the display is then usually a weighted blend between two levels of detail, one of which is somewhat finer than the display resolution (f > 1) and one of which is somewhat coarser (f < 1) (although more generally, the present invention also applies if an image region is transiently a single undersampled LOD, or a blend between more than two LODs).
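  • As a concrete illustration of this bracketing, the following Python sketch picks the two LODs and a blend weight for a given display sampling density. It assumes LOD 0 is the finest level, granularity g = 2, and a simple linear weighting in log scale; names and the exact weighting function are illustrative:

    import math

    def bracketing_lods(screen_px_per_image_px, g=2.0):
        # LOD k is downscaled g**k-fold, so its sampling density relative
        # to the display is f_k = 1 / (screen_px_per_image_px * g**k).
        d = math.log(1.0 / screen_px_per_image_px, g)   # ideal continuous level
        finer = max(0, math.floor(d))                   # f >= 1 at this level
        coarser = finer + 1                             # f < 1 at this level
        weight_coarser = min(1.0, max(0.0, d - finer))  # linear blend weight
        return finer, coarser, weight_coarser

    # Viewing zoomed out 3x (one third of a screen pixel per image pixel):
    print(bracketing_lods(1.0 / 3.0))   # (1, 2, ~0.585)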
  • This scheme, unmodified, produces visually compelling results for content defined by sampled images, such as digital photographs or video.
  • Figure 3 (a) is an example of text rendered in pure black and white (without antialiasing); (b) is the same text rendered with antialiasing; (c) is a pattern of closely spaced lines; (d) is a checkerboard fill pattern of alternating black and white pixels.
  • the bottom row of images shows the LOD blending effects on the exact images in the top row.
  • the edge blurring of the blended text in (a) is inferior to the result of pixel-precise antialiasing of the top image in (b).
  • LOD blending further blurs the already antialiased text of (b), again resulting in a suboptimal image.
  • Cases (a) and (b) do not produce terrible results, but the exact version is clearly better in each case.
  • the other two images produce more serious errors, including spurious Moire patterns, shimmer, speckling, and blurring.
  • Simply calling this method at every frame refresh during a zoom or pan would in general be far too slow; the exact drawing method may easily take several frames to execute, and in some cases could be much slower still.
  • the targets of the "ordinary” drawing method are tiles which remain relevant over an entire range of pans and zooms, making it possible to implement queueing and resolution fallback schemes which allow smooth navigation and graceful image quality degradation even if tile rendition is slow and asynchronous.
  • exact tiles are of perfect quality, but are specific to a particular view.
  • exact tiles are a final stage of display refinement.
  • Exact tiles are "throwaways", in the sense that they become unusable when the user pans or zooms, since it is unlikely that the user will pan or zoom back to precisely the old view.
  • the overall effect of the present invention is that navigation performance remains unchanged when panning or zooming over large volumes of text or other vector graphics; during such navigation, the rendered image is less than ideal, but because the image is in motion, in the vast majority of cases the degradation is not noticeable.
  • exact tiles are requested and blended in foveally as they arrive, resulting in a sharpening of the image, beginning near the center of the display and spreading outward.
  • Spatial and temporal blending can generally make the progress of this sharpening difficult to detect by eye, but the resulting image is of perfect quality, i.e. unaffected by the blurring and blending operations that allow a ZUI architecture based on the continuous interpolation between discrete levels of detail to operate.
  • the present invention relates generally to zooming user interfaces (ZUIs) for computers. More specifically, the invention is a system and method for progressively rendering arbitrarily large or complex visual content in a zooming environment while maintaining good user responsiveness and high frame rates. Although it is necessary in some situations to temporarily degrade the quality of the rendition to meet these goals, the present invention largely masks this degradation by exploiting well-known properties of the human visual system.
  • Graphical computer user interfaces (GUIs) have traditionally given visual components a fixed spatial scale on the display; it was recognized early, however, that visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale, but can be zoomed in or out.
  • the desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets.
  • Some existing applications include zoomable components, such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc.
  • these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally.
  • While continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom continuously is almost invariably absent.
  • In a fully zoomable interface, any kind of visual content could be zoomed, and zooming would be as much a part of the user's experience as panning.
  • Ideas along these lines made appearances as futuristic computer user interfaces in many movies even as early as the 1960s [1]; recent movies continue the trend [2].
  • a number of continuously zooming interfaces have been conceived and/or developed, from the 1970s through the present [3]. In 1991, some of these ideas were formalized in U.S. Patent 5,341,466 by Kenneth Perlin and Jacob Schwartz at New York University ("Fractal Computer User Centerface with Zooming Capability").
  • the prototype zooming user interface developed by Perlin and co-workers, Pad, and its successor, Pad++, have undergone some development since [4].
  • The present work is embodied in a zooming user interface framework called Voss.
  • This patent is specifically about Voss's approach to object tiling, level-of-detail blending, and render queueing.
  • a multiresolution visual object is normally rendered from a discrete set of sampled images at different resolutions or levels of detail (an image pyramid).
  • In some technological contexts where continuous zooming is used, such as 3D gaming, two adjacent levels of detail of the image pyramid which bracket the desired level of detail are blended together to render each frame, because it is not normally the case that the desired level of detail is exactly one of those represented by the discrete set.
  • Such techniques are sometimes referred to as trilinear filtering or mipmapping.
  • In such contexts, mipmapped image pyramids are premade and kept in short-term memory (i.e. RAM) continuously during the zooming operation; thus any required level of detail is always available.
  • In some advanced 3D rendering scenarios, by contrast, the image pyramid must itself be rendered or retrieved incrementally, so that at any given moment only part of the pyramid may be available.
  • the present invention involves both strategies for prioritizing the (potentially slow) rendition of the parts of the image pyramid relevant to the current display, and strategies for presenting the user with a smooth, continuous perception of the rendered content based on partial information, i.e. only the currently available subset of the image pyramid.
  • these strategies make near-optimal use of the available computing power or bandwidth, while masking, to the extent possible, any image degradation resulting from incomplete image pyramids. Spatial and temporal blending are exploited to avoid discontinuities or sudden changes in image sharpness.
  • An objective of the present invention is to allow sampled (i.e. "pixellated") visual content to be rendered in a zooming user interface without degradation in ultimate image quality relative to conventional trilinear interpolation.
  • a further objective of the present invention is to allow arbitrarily large or complex visual content to be viewed in a zooming user interface.
  • a further objective of the present invention is to enable near-immediate viewing of arbitrarily complex visual content, even if this content is ultimately represented using a very large amount of data, and even if these data are stored at a remote location and shared over a low-bandwidth network.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far in on visual content while maintaining interactive frame rates.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex visual content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
  • a further objective of the present invention is to minimize the user's perception of transitions between levels of detail or rendition qualities during interaction.
  • a further objective of the present invention is to allow the graceful degradation of image quality by continuous blurring when detailed visual content is as yet unavailable, either because the information needed to render it is unavailable, or because rendition is still in progress.
  • a further objective of the present invention is to gracefully increase image quality by gradual sharpening when renditions of certain parts of the visual content first become available.
  • zooming user interfaces are a generalization of the usual concepts underlying visual computing, allowing a number of limitations inherent in the classical user/computer/document interaction model to be overcome.
  • One such limitation is on the size of a document that can be "opened” from a computer application, as traditionally the entirety of such a document must be “loaded” before viewing or editing can begin.
  • Even when a document fits in short-term memory (random access memory, or RAM), this limitation is felt, because all of the document information must be transferred to short-term memory from some repository (e.g. from a hard disk, or across a network) during opening; limited bandwidth can thus make the delay between issuing an "open" command and being able to begin viewing or editing unacceptably long.
  • Still digital images provide both an excellent example of this problem and an illustration of how the computer science community has moved beyond the standard model for visual computing in overcoming it.
  • Table 1 shows download times at different bandwidths for typical compressed sizes of a variety of different image types, from the smallest useful images (thumbnails, which are sometimes used as icons) to the largest in common use today. Shaded boxes indicate image sizes for which interactive browsing is difficult or impossible at a particular connection speed.
  • Modern image compression standards, such as JPEG2000 [5], are designed to address precisely this problem. Rather than storing the image contents in a linear fashion (that is, in a single pass over the pixels, normally from top to bottom and left to right), they are based on a multiresolution decomposition.
  • the image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1.
  • the fine details are only captured at the higher resolutions, while the broad strokes are captured — using a much smaller amount of information — at the low resolutions. This is why the differently-sized images are often called levels of detail, or LODs for short.
  • each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile.
  • the resulting tiled image pyramid is shown in Figure 2. Note that the "tip" of the pyramid, where the downscaled image is smaller than a single tile, looks like the untiled image pyramid of Figure 1.
  • the JPEG2000 image format includes all of the features just described for representing tiled, multiresolution and random-access images.
  • We will refer to the production of a finished, fully drawn tile in response to a tile drawing request as tile rendition, with the understanding that this may be a slow process. Whether it is slow because the required data are substantial and must be downloaded over a slow connection, or because the rendition process is itself computationally intensive, is irrelevant.
  • a complete zooming user interface combines these ideas in such a way that the user is able to view a large and possibly dynamic composite document, whose sub-documents are usually spatially non-overlapping. These sub-documents may in turn contain (usually non-overlapping) sub-sub-documents, and so on.
  • Documents thus form a tree: a structure in which each document has pointers to a collection of sub-documents, or children, each of which is contained within the spatial boundary of the parent document.
  • We call each such document a node, borrowing from programming terminology for trees.
  • Some nodes may be static images which can be edited using painting-like commands, others may be editable text, and still others may be Web pages designed for viewing and clicking. All of these can coexist within a common large spatial environment — a "supernode" — which can be navigated by zooming and panning.
  • Because zooming is an intrinsic aspect of navigation, content of any kind can be viewed at an appropriate spatial scale.
  • The vision-impaired can easily navigate the same content as normally sighted people, simply by zooming in farther.
  • The variable f, defined in #1, refers to the sampling density of a tile relative to the display.
  • Tiling granularity, which we will write as the variable g, is defined as the ratio of the linear tiling grid size at some LOD to the linear tiling grid size at the next lower LOD. This is in general presumed to be constant over the image pyramid.
  • a zooming user interface must address the problem of how to request tiles during navigation. In many situations, it is unrealistic to assume that all such requests will be met in a timely manner, or even that they will be met at all during the period when the information is relevant (i.e. before the user has zoomed or panned elsewhere.) It is therefore desirable to prioritize tile requests intelligently.
  • the "outermost” rule for tile request queuing is increasing level of detail relative to the display.
  • This rule ensures that, if a region of the display is undersampled (i.e. f < 1 for the best level of detail yet available there), the client's first priority will be to fill in this "resolution hole". If more than one level of detail is missing in the hole, then requests for all levels of detail with f < 1, plus the next higher level of detail (to allow LOD blending — see #5), are queued in increasing order. At first glance, one might suppose that this introduces unnecessary overhead, because only the finest of these levels of detail is strictly required to render the current view; the coarser levels of detail are redundant, in that they define a lower-resolution image on the display. However, these coarser levels cover a larger area — in general, an area considerably larger than the display.
  • the coarsest level of detail for any node in fact includes only a single tile by construction, so a client rendering any view of a node will invariably queue this "outermost" tile first.
  • Foveated tile request queuing: within a relative level of detail, tile requests are queued by increasing distance to the center of the screen, as shown in Figure 3.
  • This technology is inspired by the human eye, which has a central region — the fovea — specialized for high resolution. Because zooming is usually associated with interest in the central region of the display, foveated tile request queuing usually reflects the user's implicit prioritization for visual information during inward zooms. Furthermore, because the user's eye generally spends more time looking at regions near the center of the display than the edge, residual blurriness at the display edge is less noticeable than near the center.
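  • The two queuing rules can be combined in a single priority key, as in the following Python sketch. The Tile record, the level numbering (0 = coarsest, used here as a stand-in for the relative level of detail), and the sample data are hypothetical simplifications:

    import heapq
    from collections import namedtuple

    Tile = namedtuple("Tile", "level cx cy")  # level 0 = coarsest

    def priority(tile, view_cx, view_cy):
        # Outer key: increasing level of detail; inner key: foveation,
        # i.e. squared distance of the tile center from the display center.
        d2 = (tile.cx - view_cx) ** 2 + (tile.cy - view_cy) ** 2
        return (tile.level, d2)

    def request_order(missing_tiles, view_cx, view_cy):
        heap = [(priority(t, view_cx, view_cy), t) for t in missing_tiles]
        heapq.heapify(heap)
        while heap:
            yield heapq.heappop(heap)[1]

    tiles = [Tile(2, 100, 100), Tile(1, 400, 300), Tile(2, 390, 290)]
    # The single coarse tile is requested first; then, within the finer
    # level, the tile nearest the display center.
    print(list(request_order(tiles, 400, 300)))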
  • the figure shows two alternate "navigation paths": in the top row, the user remains stationary while viewing a single document (or node) occupying about two thirds of the display, which we assume can be displayed at very high resolution.
  • the blending function may be linear (i.e. the opacity of the new tile is a linear function of time since the tile became available, so that halfway through the fixed blend- in interval the new tile is 50% opaque), exponential, or follow any other interpolating function.
  • In an exponential blend, every small constant interval of time corresponds to a constant percentage change in the remaining opacity gap; for example, the new tile may close 20% of the remaining gap at every frame, which results in the sequence of opacities over consecutive frames 20%, 36%, 49%, 59%, 67%, 74%, 79%, 83%, 87%, 89%, 91%, 93%, etc.
  • Strictly speaking, the exponential never reaches 100%, but in practice the opacity becomes indistinguishable from 100% after a short interval.
  • An exponential blend has the advantage that the greatest increase in opacity occurs near the beginning of the blending-in, which makes the new information visible to the user quickly while still preserving acceptable temporal continuity.
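  • The following Python sketch reproduces this exponential blend, closing a fixed fraction of the remaining opacity gap at each frame; the 20% rate matches the sequence quoted above, and the function name is illustrative:

    def exponential_blend(rate=0.20, frames=12):
        opacity = 0.0
        series = []
        for _ in range(frames):
            opacity += rate * (1.0 - opacity)   # close 20% of the remaining gap
            series.append(round(opacity * 100))
        return series

    print(exponential_blend())  # [20, 36, 49, 59, 67, 74, 79, 83, 87, 89, 91, 93]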
  • the illusion created is that regions of the display come smoothly into focus as the necessary information becomes available.
  • Figure 5 shows our simplest reference implementation for how each tile can be decomposed into rectangles and triangles, called tile shards, such that opacity changes continuously over each tile shard.
  • Tile X, bounded by the square aceg, has neighboring tiles L, R, T and B on the left, right, top and bottom, each sharing an edge. It also has neighbors TL, TR, BL and BR sharing a single corner. Assume that tile X is present. Its "inner square", iiii, is then fully opaque. (Note that repeated lowercase letters indicate identical vertex opacity values.) However, the opacity of the surrounding rectangular frame is determined by whether the neighboring tiles are present (and fully opaque). Hence if tile TL is absent, then point g will be fully transparent; if L is absent, then points h will be fully transparent, etc. We term the border region of the tile (the part of X outside iiii) the blending flaps.
  • Figure 6 illustrates the reference method used to interpolate opacity over a shard.
  • Part (a) shows a constant opacity rectangle.
  • Part (b) is a rectangle in which the opacities of two opposing edges are different; then the opacity over the interior is simply a linear interpolation based on the shortest distance of each interior point from the two edges.
  • Part (c) shows a method for interpolating opacity over a triangle, when the opacities of all three corners a, b and c may be different.
  • Every interior point p subdivides the triangle into three sub-triangles as shown, with areas A, B and C. The opacity at p is then taken to be the area-weighted average of the three corner opacities.
  • this method ensures that opacity will vary smoothly over the entire tiled surface.
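  • A Python sketch of this triangle interpolation, assuming the opacity at an interior point is the area-weighted (barycentric) average of the corner opacities, with the sub-triangle opposite each corner weighting that corner; coordinates and names are illustrative:

    def tri_area(p, q, r):
        # Twice the signed area, halved and made positive.
        return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0

    def opacity_at(p, a, b, c, oa, ob, oc):
        A = tri_area(p, b, c)  # sub-triangle opposite corner a
        B = tri_area(p, c, a)  # sub-triangle opposite corner b
        C = tri_area(p, a, b)  # sub-triangle opposite corner c
        return (oa * A + ob * B + oc * C) / (A + B + C)

    # At the centroid, the result is the average of the three corner opacities:
    print(opacity_at((1, 1), (0, 0), (3, 0), (0, 3), 1.0, 0.0, 0.5))  # 0.5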
  • this strategy causes the relative level of detail visible to the user to be a continuous function, both over the display area and in time. Both spatial seams and temporal discontinuities are thereby avoided, presenting the user with a visual experience reminiscent of an optical instrument bringing a scene continuously into focus.
  • the speed with which the scene comes into focus is a function of the bandwidth of the connection to the repository, or the speed of tile rendition, whichever is slower.
  • the continuous level of detail is biased in such a way that the central area of the display is brought into focus first.
  • For the finest LOD drawn, the target opacity is decreased linearly (or using any other monotonic function) with the degree of oversampling, such that it goes to zero when the oversampling is g-fold.
  • this causes continuous blending over a zoom operation, ensuring that the perceived level of detail never changes suddenly.
  • the number of blended levels of detail in this scheme can be one, two, or more.
  • a number larger than two is transient, and caused by tiles at more than one level of detail not having been fully blended in temporally yet.
  • a single level is also usually transient, in that it normally occurs when a lower-than-ideal LOD is "standing in” at 100% opacity for higher LODs which have yet to be downloaded or constructed and blended in.
  • the simplest reference implementation for rendering the set of tile shards for a node is to use the so-called "painter's algorithm": all tile shards are rendered in back-to-front order, that is, from coarsest (lowest LOD) to finest (highest LOD which oversamples the display less than g-fold).
  • the target opacities of all but the highest LOD are 100%, though they may transiently be rendered at lower opacity if their temporal blending is incomplete.
  • the highest LOD has variable opacity, depending on how much it oversamples the display, as discussed above.
  • this reference implementation is not optimal, in that it may render shards which are then fully obscured by subsequently rendered shards. More efficient implementations are possible through the use of data structures and algorithms analogous to those used for hidden surface removal in 3D graphics.
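  • A Python sketch of the naive painter's pass just described, assuming each shard records its LOD and its current (temporally blended) opacity; Shard and draw_shard are illustrative stand-ins, not part of the invention's interfaces:

    from dataclasses import dataclass

    @dataclass
    class Shard:
        lod: int                # 0 = coarsest level of detail
        current_opacity: float  # may lag its target while temporal blending completes
        geometry: str           # stand-in for the shard's vertices

    def render_node(shards, draw_shard):
        # Painter's algorithm: draw all shards back to front, i.e. from the
        # coarsest LOD to the finest, so finer detail paints over coarser
        # stand-ins wherever it is available.
        for shard in sorted(shards, key=lambda s: s.lod):
            draw_shard(shard.geometry, shard.current_opacity)

    render_node([Shard(1, 0.4, "fine shard"), Shard(0, 1.0, "coarse shard")],
                lambda geometry, opacity: print(geometry, opacity))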
  • Title: SYSTEM AND METHOD FOR THE EFFICIENT, DYNAMIC AND
  • the present invention relates generally to multiresolution imagery. More specifically, the invention is a system and method for efficiently blending together visual representations of content at different resolutions or levels of detail in real time. The method ensures perceptual continuity even in highly dynamic contexts, in which the data being visualized may be changing, and only partial data may be available at any given time.
  • the invention has applications in a number of fields, including (but not limited to) zooming user interfaces (ZUIs) for computers.
  • ZUIs zooming user interfaces
  • Multiresolution methods are also used in mathematical and physical simulations, in situations where a possibly lengthy calculation can be performed more "coarsely” or more “finely”; this invention also applies to such simulations, and to other situations in which multiresolution visual data may be generated interactively. Further, the invention applies in situations in which visual data can be obtained “on the fly” at different levels of detail, for example, from a camera with machine-controllable pan and zoom.
  • the present invention is a general approach to the dynamic display of such multiresolution visual data on one or more 2D displays (such as CRTs or LCD screens).
  • Consider, for example, the wavelet decomposition of a large digital image (e.g. as used in the JPEG2000 image format).
  • This decomposition takes as its starting point the original pixel data, normally an array of samples on a regular rectangular grid. Each sample usually represents a color or luminance measured at a point in space corresponding to its grid coordinates.
  • the grid may be very large, e.g. tens of thousands of samples (pixels) on a side, or more. This large size can present considerable difficulties for interactive display, especially when such images are to be browsed remotely, in environments where the server (where the image is stored) is connected to the client (where the image is to be viewed) by a low-bandwidth connection.
  • Modern image compression standards, such as JPEG2000 [1], are designed to address precisely this problem. Rather than storing the image contents in a linear fashion (that is, in a single pass over the pixels, normally from top to bottom and left to right), they are based on a multiresolution decomposition.
  • the image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1.
  • The ratio between adjacent resolution scales is the granularity, which we represent by the variable g.
  • the granularity may change at different scales, but here, for example and without limitation, we will assume that g is constant over the "image pyramid".
  • the JPEG2000 image format includes the features just described for representing tiled, multiresolution and random-access images.
  • If a detail of a large, tiled JPEG2000 image is being viewed interactively by a client on a 2D display of limited size and resolution, then some particular set of adjacent tiles, at a certain level of detail, is needed to produce an accurate rendition. In a dynamic context, however, these may not all be available. Tiles at coarser levels of detail often will be available, however, particularly if the user began with a broad overview of the image. Since tiles at coarser levels of detail span a much wider area spatially, it is likely that the entire area of interest is covered by some combination of available tiles. This implies that the image resolution available will not be constant over the display area.
  • the present invention resolves these issues, while preserving all the advantages of the painter's algorithm.
  • One of these advantages is the ability to deal with any kind of LOD tiling, including non-rectangular or irregular tilings, as well as irrational grid tilings, for which I am filing a separate provisional patent application.
  • Tilings generally consist of a subdivision, or tessellation, of the area containing the visual content into polygons.
  • the multiplicative factor by which their sizes differ is the granularity g, which we will assume (but without limitation) to be a constant.
  • an irrational but rectangular tiling grid will be used to describe the improved algorithm. Generalizations to other tiling schemes should be evident to anyone skilled in the art.
  • the improved algorithm consists of four stages.
  • In the first stage, a composite grid is constructed in the image's reference frame from the superposition of the visible parts of all of the tile grids in all of the levels of detail to be drawn.
  • this results in an irregular composite grid, shown schematically in Figure 1.
  • the grid is further augmented by grid lines corresponding to the x- and y-values which would be needed to draw the tile "blending flaps" at each level of detail (not shown in Figure 1, because the resulting grid would be too dense and visually confusing).
  • Let there be n grid lines parallel to the x-axis and m grid lines parallel to the y-axis in this composite grid. We then construct a two-dimensional n × m table, with entries corresponding to the squares of the grid. Each grid entry has two fields: an opacity, which is initialized to zero, and a list of references to specific tiles, which is initially empty.
  • the second stage is to walk through the tiles, sorted by decreasing level of detail (opposite to the naive implementation). Each tile covers an integral number of composite grid squares. For each of these squares, we check to see if its table entry has an opacity less than 100%, and if so, we add the current tile to its list and increase the opacity accordingly.
  • the per-tile opacity used in this step is stored in the tile data structure.
  • After the second stage, the composite grid will contain entries corresponding to the correct pieces of tiles to draw in each grid square, along with the opacities with which to draw these "tile shards". Normally these opacities will sum to one. Low-resolution tiles which are entirely obscured will not be referenced anywhere in this table, while partly obscured tiles will be referenced only in the tile shards where they are partly visible.
  • the third stage of the algorithm is a traversal of the composite grid in which tile shard opacities at the composite grid vertices are adjusted by averaging with neighboring vertices at the same level of detail, followed by readjustment of the vertex opacities to preserve the summed opacity at each vertex (normally 100%). This implements a refined version of the vertex-opacity smoothing of innovation #4.
  • the composite grid is in general denser than the 3x3 grid per tile defined in innovation #4, especially for low-resolution tiles. (At the highest LOD, by construction, the composite gridding will be at least as fine as necessary.) This allows the averaging technique to achieve greater smoothness in apparent level of detail, in effect by creating smoother blending flaps consisting of a larger number of tile shards.
  • In the fourth stage, the composite grid is again traversed, and the tile shards are actually drawn.
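  • The following Python sketch illustrates the crucial second stage under simplified assumptions: each tile already knows which composite-grid squares it covers, and a higher lod value means a finer level. The data layout is illustrative, not prescribed by the invention:

    def assign_shards(tiles, n, m):
        # tiles: iterable of (lod, tile_opacity, squares), where squares lists
        # the (i, j) composite-grid squares the tile covers.
        opacity = [[0.0] * m for _ in range(n)]
        shards = [[[] for _ in range(m)] for _ in range(n)]
        # Walk the tiles from finest to coarsest, claiming remaining opacity.
        for lod, tile_opacity, squares in sorted(tiles, key=lambda t: -t[0]):
            for i, j in squares:
                room = 1.0 - opacity[i][j]       # opacity not yet claimed
                if room > 0.0:
                    draw_op = min(tile_opacity, room)
                    shards[i][j].append((lod, draw_op))  # a tile shard to draw
                    opacity[i][j] += draw_op
        return shards

    # A finest-level tile at 60% (still blending in temporally) leaves 40%
    # for the coarser tile beneath; a fully obscured tile is never listed.
    print(assign_shards([(1, 0.6, [(0, 0)]), (0, 1.0, [(0, 0)])], 1, 1)[0][0])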
  • Although this algorithm involves multiple passes over the data and a certain amount of bookkeeping, it results in far better performance than the naïve algorithm, because much less drawing must take place in the end; every tile shard rendered is visible to the user, though sometimes at low opacity. Some tiles may not be drawn at all. This contrasts with the naïve algorithm, which draws every tile intersecting with the displayed area in its entirety.
  • An additional advantage of this algorithm is that it allows partially transparent nodes to be drawn, simply by changing the total opacity target from 100% to some lower value. This is not possible with the naïve algorithm, because every level of detail except the most detailed must be drawn at full opacity in order to completely "paint over" any underlying, still lower resolution tiles.
  • If the view is rotated relative to the image, the composite grid can be constructed in the usual manner; it may be larger than the grid would have been for the unrotated case, as larger coordinate ranges are visible along a diagonal.
  • the composite grid squares outside the viewing area need not be updated during the traversal in the second or third stages, or drawn in the fourth stage.
  • One exemplary optimization is that each level of detail can be drawn immediately as its pass is completed, with the correct opacity, thus requiring only the storage of a single tile identity per shard at any one time.
  • Another exemplary optimization is to keep track of the total opacity rendering left to do, expressed in terms of (area) × (remaining opacity), so that the algorithm can quit early if everything has already been drawn; then low levels of detail need not be "visited" at all if they are not needed.
  • the algorithm can be generalized to arbitrary polygonal tiling patterns by using a constrained Delaunay triangulation instead of a grid to store vertex opacities and tile shard identifiers.
  • This data structure efficiently creates a triangulation whose edges contain every edge in all of the original LOD grids; accessing a particular triangle or vertex is an efficient operation, which can take place in of order n·log(n) time (where n is the number of vertices or triangles added).
  • the present invention relates generally to zooming user interfaces (ZUIs) for computers. More specifically, the invention is a system and method for efficiently representing and navigating through zoomable content using hierarchical data structures which allow the content to have effectively infinite-precision spatial positioning and size. This enables zoomable environments of unlimited scale or depth.
  • Graphical computer user interfaces (GUIs) have traditionally given visual components a fixed spatial scale on the display; it was recognized early, however, that visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale, but can be zoomed in or out.
  • the desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets.
  • Some existing applications include zoomable components, such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc.
  • these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally.
  • While continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom continuously is almost invariably absent.
  • In a fully zoomable interface, any kind of visual content could be zoomed, and zooming would be as much a part of the user's experience as panning.
  • Ideas along these lines made appearances as futuristic computer user interfaces in many movies even as early as the 1960s [1]; recent movies continue the trend [2].
  • a number of continuously zooming interfaces have been conceived and/or developed, from the 1970s through the present [3]. In 1991, some of these ideas were formalized in U.S. Patent 5,341,466 by Kenneth Perlin and Jacob Schwartz at New York University ("Fractal Computer User Centerface with Zooming Capability").
  • the prototype zooming user interface developed by Perlin and co-workers, Pad, and its successor, Pad++, have undergone some development since [4].
  • The present work is embodied in a zooming user interface framework called Voss.
  • This patent is specifically about Voss's approach to coordinate systems and navigation.
  • Computer graphics conventionally uses a 2D coordinate system, which defines a point in two-dimensional (2D) space as a pair of numbers, usually termed the x- and y-coordinates (x,y), representing horizontal and vertical displacements from the origin, (0,0).
  • 2D points are also represented using non-Cartesian coordinate systems, such as polar coordinates; the substantive aspects of the following discussion apply equally to any such coordinate systems.
  • In three dimensions, a 3D coordinate system defined by a triplet of numbers (x,y,z) is normally used to represent points in space; again, these may or may not be Cartesian coordinates.
  • Because displays are normally two-dimensional, view-dependent mathematical transformations are required to reduce three-dimensional world coordinates to two-dimensional screen coordinates.
  • the coordinates being manipulated are normally represented using a numerical data type native to the computer, either integer or floating-point.
  • Such data types normally use between 16 and 64 bits (binary digits) of memory. Because of their limited representational size, these numbers have limited precision — that is, their decimal expansions are only defined out to some limited number of significant places. In the case of 64-bit floating point numbers, this is about 15 decimal places.
  • When each 2D coordinate pair (x,y) corresponds to a fixed point on the display surface, this precision is more than adequate.
  • In a zooming user interface, however, the user is easily able to zoom in, causing the area which previously covered a single pixel to fill the entire display, or zoom out, causing the contents of the entire display to shrink to the size of a single pixel.
  • Each such zoom may effectively multiply or divide the (x,y) coordinates by a factor of approximately 1,000.
  • Several such zooms in or out therefore exhaust the precision of any standard internal floating-point representation. (Five such zooming operations, for example, would completely exhaust the precision of a 64-bit floating-point number.)
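  • A small Python sketch of this precision exhaustion, assuming a 1000-pixel-wide display and a world coordinate system spanning [0, 1]; the specific numbers are illustrative:

    center = 0.3                 # world x-coordinate of the view center
    width = 1.0                  # world-space width of the visible region
    for _ in range(5):
        width /= 1000.0          # each zoom-in magnifies roughly 1000-fold
    pixel = width / 1000.0       # world-space size of one display pixel
    print(pixel)                 # about 1e-18
    # A one-pixel pan is smaller than the spacing between adjacent 64-bit
    # floats near 0.3, so it is lost entirely:
    print(center + pixel == center)  # True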
  • the present invention solves this problem by dispensing entirely with world coordinates. Instead, all zooming and panning operations are conducted in a tree (or more generally, a directed graph) of local coordinate systems which collectively define the zoomable "universe" of content.
  • Content comprises a collection of nodes, which are themselves defined using a local 2D coordinate system of machine-precision coordinates. If standard 64-bit floating point numbers are used, a single node is thus limited to having about 15 decimal places of precision per coordinate, or, in terms of pixels, being at most about 10^14 pixels on a side. However, a node may be the parent of a number of child nodes, each of which is geometrically contained within the boundary of the parent.
  • the child's size and position relative to the parent can be specified in the parent's local coordinate system, and thus fit into machine-precision numbers; the child may have its own local coordinate system, however, which allows it in turn to have (for example) resolution up to 10^14 pixels on a side.
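  • The following Python sketch illustrates the idea of chained local coordinate systems under simplified assumptions: each node spans the unit square of its own frame, and a position is identified by a path of (node, child) steps rather than by one global coordinate. Names are illustrative:

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        # Child placements in this node's own machine-precision coordinates:
        # name -> (x, y, width, height).
        children: dict = field(default_factory=dict)

    def to_ancestor(path, px, py):
        # path lists (node, child_name) steps from the ancestor down to the
        # frame containing (px, py); each step is one machine-precision
        # transform, so no single number ever needs more precision than
        # the native float type provides.
        for node, name in reversed(path):
            x, y, w, h = node.children[name]
            px, py = x + px * w, y + py * h
        return px, py

    root, a = Node(), Node()
    root.children["a"] = (0.25, 0.25, 0.5, 0.5)
    a.children["b"] = (0.1, 0.1, 0.001, 0.001)
    print(to_ancestor([(root, "a"), (a, "b")], 0.5, 0.5))  # (0.30025, 0.30025)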
  • An objective of the present invention is to allow a pannable and zoomable 2D space of finite "physical size", but arbitrarily high complexity or resolution, to be embedded into a well-defined area of a larger pannable and zoomable 2D space.
  • a further objective of the present invention is to allow geometric trees or directed graphs of visual objects to be constructed by the above embedding procedure, allowing such trees or graphs to become arbitrarily large and complex while retaining the ability to pan and zoom through the resulting space.
  • An objective of the present invention is therefore to allow fluid zooming and panning in a virtual 2D universe of potentially unlimited visual complexity and detail on ordinary, present-day computer architectures.
  • a further objective of the present invention is to mimic the behavior of infinite- precision arithmetic on the coordinates of a pannable and zoomable 2D space while still retaining the computational speed of coordinate calculations performed on native machine precision numbers.
  • the software package Mathematica™ (© Wolfram Research) provides exemplary implementations of data structures and algorithms for infinite-precision arithmetic (which, however, do not satisfy these same criteria).
  • a further objective of the present invention is to mimic the behavior of infinite-precision arithmetic on the coordinates of a pannable and zoomable 2D space while avoiding the large memory consumption of infinite-precision numbers.
  • a further objective of the present invention is to allow the embedding of reusable visual content into a zoomable and pannable 2D space by reference, i.e. without needing to update any coordinates or other data structures in the content to be embedded. Because this allows a 2D space to be embedded within another without a traversal of the new child's coordinate system tree, this capability enables the embedding of any 2D space, regardless of complexity.
  • a further objective of the present invention is to allow infinite nesting in zoomable and pannable content due to circular references: a node with content B may be a child of a node with content A (i.e. B appears geometrically in the interior of A), and node B may in turn contain a node with content A as a child.
  • this type of recurrence can occur very easily. This generalizes the concept of a tree of nodes with associated coordinate systems to that of a directed graph of nodes with associated coordinate systems.
  • a further objective of the present invention is to allow zooming out after a deep zoom-in to behave like the "back" button of a web browser, letting the user retrace his or her steps through a visual navigation.
  • a further objective of the present invention is to allow zooming in immediately after zooming out to behave analogously to the "forward" button of a web browser, letting the user precisely undo the effects of an arbitrarily long zoom-out.
  • a further objective of the present invention is to allow a node to have a very large number of child nodes (for example, up to 10^28).
  • a further objective of the present invention is to allow a node to generate its own children programmatically on the fly, enabling content to be defined, created or modified dynamically during navigation.
  • a further objective of the present invention is to enable near-immediate viewing of arbitrarily complex visual content, even if this content is ultimately represented using a very large amount of data, and even if these data are stored at a remote location and shared over a low-bandwidth network.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far in on visual content while maintaining interactive frame rates.
  • a further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex visual content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
  • Data structures (sometimes known as abstract data types, or ADTs) will be introduced using the word Structure followed by material in curly braces { ... }.
  • Within the curly braces will be listed the fields, or data elements, that comprise the structure, in the format DataType variableName, where DataType is either a previously defined structure or a primitive type, and variableName is the field's name. Note that data types and functions will always begin with a capital letter, and variable names or field names will always begin with lowercase.
  • the primitive types used will be Boolean (which can take on values true or false), Double (a 64-bit floating point number corresponding to C language's double type), Integer (a 64-bit signed integer data type), and String (a character string).
  • the names of structures and variables, as well as details of the data types and formats used, are exemplary; alternate implementations of the present invention may alter any of these details, include any number of additional fields, or use different structures or internal representations.
  • We also assume a generic container type, Collection<T>, which stores an arbitrary number of elements of type T.
  • For a node to be visible, it must be associated with a rendering method, which is able to draw it in whole or in part on some area of the display.
  • Each node is endowed with a local coordinate system of finite precision. For illustrative purposes we will assume that a node is rectangular and represents its local coordinates using the Point2D and Rectangle data structures as defined above. Thus a Rectangle will define the boundaries of the local coordinate system.
  • In general, however, a node may be non-rectangular and/or use a different coordinate system.
  • In its simplest form, then:

    Structure Node {
        Rectangle coordSystem;
    }

A node's rendering method takes a rectangle in the node's coordinate system, and a rectangle onDisplay in display (or "screen") coordinates; onDisplay should be within a rectangle defining the boundaries of the display in display coordinates.
  • Each node may have zero or more child nodes, which it addresses by reference. This means that a node need not, and generally does not, contain all the information of each child node, but instead only an address providing the information necessary to obtain the child node.
  • a URL is an example of such an address, but addresses may take other forms, e.g. a pointer in memory, a globally unique identifier, a hardware port, etc.
  • Each child reference therefore contains a field Address address, and we assume a function Function Node Dereference(Address address) which returns the node at the given address.
  • In addition to the child node's address, a reference to a child must specify the child's size and location in the parent node's local coordinate system.
  • Different nodes may share some or all of their children, but perhaps in a different spatial arrangement, allowing the possibility of different views of the same information.
  • Beyond a certain depth, the error introduced by truncated rendition may become negligible, because any nodes not drawn are too small to affect the overall appearance of the display.
  • Note that a node may be its own descendant.
  • the directed graph of nodes defined by the "is a child of" relationship may have cycles (making it no longer a tree in the graph-theoretic sense). If children occupy a substantial portion of their parents' area, and a graph cycle is small (i.e. A->B->A or A->B->C->A), then this results in a hall-of-mirrors effect.
  • User interaction with a node, such as typing text into it, normally requires that the node be visible.
  • a number of different models might be used for selecting the node with which an interaction is to take place: for example, the tab key may cycle through nodes, or the node under the mouse pointer may be the target.
  • the number of nodes that are candidates for user interaction is of the same order as the number of nodes rendered, and thus finite.
  • Methods analogous to the rendering function described above can be used to pass user interaction messages to nodes, which may affect their future behavior or appearance. This architecture is therefore sufficient to enable a node to be a complete software application, and not merely a static visual object.
  • zooming in means progressively enlarging a part of the content visible on the display so that it occupies more area on the display; a smaller physical area will then be visible, but in greater detail.
  • Zooming out is the converse operation. Because we have assumed that the physical dimensions of the universe are bounded, zooming out is a bounded operation: once the entire universe is visible, further zooming out cannot bring any new content into view, but simply shrinks the universe to an area smaller than the entire display. It is therefore natural to define a root node as encompassing the entire universe; it has child nodes which are visible when fully zoomed out, these children have their own child nodes, etc.
  • Child nodes are in general smaller than parent nodes.
  • Each node, however, has its own local coordinate system, so this construction allows for a cascade of ever-finer coordinate systems, and thus for a universe of potentially infinite spatial resolution. This means that zooming in is not a bounded operation: if the node graph has cycles, one may zoom forever in a "content loop"; or, more interestingly, the node graph may have a very large or even infinite number of nodes, allowing one to zoom in indefinitely while viewing new content all the time.
  • the expanded Node definition is:

    Structure Node {
        Rectangle coordSystem;
        Collection<ChildReference> children;
        Rectangle view;
    }
  • the new view field represents, in node coordinates, the visible area of the node. This rectangle may only partially overlap the node's area as defined by coordSystem, as when the node is partially off-screen.
  • the stack structure is defined thus:

    Stack<Address> viewStack;

where this stack is a global variable of the client (the computer connected to the display). For exemplary purposes we assume that navigation begins with an overview of a universe of content, defined by a root node; the root node's view field is then initialized to its entire coordinate system, and the root node is pushed onto the viewStack:

    rootNode.view = rootNode.coordSystem;
    Push(viewStack, rootNode);
  • Nodes remain on the viewStack only while they exceed some minimum size on the display, minimumArea.
  • the last element's view field does not, however, specify the user's viewpoint absolutely. The view field of the root node does specify where in the universe the user is looking, although due to roundoff and discretization error it is probable that the root node's view.lo and view.hi have collapsed to a point, and this point will only be a finite-precision approximation to the real view position. Nodes closer to the "fine end" of the viewStack thus specify the view position with increasing precision, but relative to progressively smaller local coordinate systems.
  • The first step is to alter the view field of this last node; this altered view is taken to be the correct, authoritative one. The second step is to propagate the new view "upward" toward the root node, which entails making progressively smaller and smaller changes to the view fields of nodes earlier in the viewStack.
  • At some point, the alteration to the view may be so small that it ceases to be accurately representable; upward propagation stops at this node.
  • the change is also propagated downward to other visible nodes, using the approach of the unmodified RenderNode pseudo-code, beginning with the last node's parent's view.
  • a panning operation may move the last node far enough away that it no longer belongs in the viewStack; conversely, zooming in might enlarge a child of the last node until that child itself belongs at the end of the viewStack.
  • Truncating the viewStack is only necessary if the user then pans. Although a long outward zoom will cause the view fields of deeply zoomed nodes to grow very large (and therefore numerically inaccurate), a separate viewCenter field is unaffected: zooming without panning therefore does not alter the viewCenter field of any node.
  • This construction allows zooming far outward to be followed immediately by zooming back in. Because the viewStack has been left intact, the user can then return to the deeply zoomed view from which the outward zoom began.
  • The Zeno cache is a system for increasing the effectiveness of most-recently-used (MRU) caching for variably compressible data objects.
  • MRU caching, where MRU stands for "most recently used", is a well-known concept for implementing a client-side memory in a client-server system. It is assumed that the server has access to, and can serve to a client, a large number of data objects, which in the aggregate may occupy a large amount of memory. The available bandwidth between client and server is limited, however, so client requests for data objects to be sent from the server take time. If access to data objects is reasonably "coherent", meaning that objects which the client needed recently are likely to be needed again in the near future, then MRU caching is a way to greatly increase the efficiency of the client-server system.
  • Objects in the cache are conceptually arranged from the LRU ("least recently used") end to the MRU end; the linear LRU-MRU arrangement is merely a conceptual convenience.
  • the cache is first examined to see if the object is cached. If it is, then the cached representation is used, obviating the need for an expensive server request; usually, making use of a cached representation also "promotes" that object to the MRU end of the cache.
  • the performance advantages of this scheme are obvious.
  • Caching can confer performance advantages in any situation in which a client request for one object affects the probability distribution of the likelihoods of requesting other objects in the near future.
  • Straightforward MRU caching is optimized for the case in which this alteration is simply modeled as an increased likelihood of requesting the same object again; but generalizations exist, and the present invention can be extended to them.
  • The Zeno caching concept is as follows.
  • the data objects being served are assumed to be compressible, which for our purposes will mean amenable to lossy data compression techniques.
  • Lossy data compression is characterized by the ability to represent a data object with fewer bytes than the full representation; higher compression ratios (meaning more compression) correspond to higher distortion, or lower quality.
  • the nature of the data and associated compression algorithm should have the following characteristics:
  • compressed versions of the data should be suitable as stand-ins for the uncompressed data. Below a certain level of distortion, the compressed representations may be fully adequate, and above a certain level of distortion, they may be adequate as a stopgap while the client waits for uncompressed, lossless or higher- quality versions of the data.
  • Ideally, lower-quality representations are subsets of higher-quality representations, meaning that improving the representation quality at the client side using additional information available on the server side only requires sending the missing data or difference, not re-sending an entirely new version of the data. This avoids redundancy and hence substantially increases efficiency.
  • the enhanced embodiment above also usually implies that lowering the representation quality of an object merely involves throwing away some of the data, not re-compressing. This property is also important for efficiency.
  • Ideally, too, the compression technique scales from lossy to lossless (i.e. a perfect, or zero-distortion, representation). Combined with the above enhanced embodiments, this allows a perfect representation of a data object to be built up in steps, from highly lossy to lossless, at little or no extra total cost relative to sending across a lossless version initially.
• Memory is discrete, so that, for example, it is usually meaningless in practice to compress an object to a representation smaller than one bit.
• the number of objects in the cache will in practice be finite, but by using the Zeno cache concept this number may be much larger than would be possible with conventional MRU caching. Further, cached objects will have the property that if recently used, they will be represented in the cache at high quality, and this quality will deteriorate progressively if the object has not been accessed recently. In this sense, Zeno caching behaves much like human memory.
  • cached representations will be subject to a maximum compression ratio.
  • the absolute maximum number of objects that could be stored in the cache is thus the cache size divided by this minimum compressed size, assuming that the objects are all of equal size (of course, they need not be).
• Randomized algorithms can be used to improve the basic algorithm in a number of ways. The chief disadvantage of the 2*1/4, 8*1/16 etc. series above arises from its strength: its small number of assumed values. Random choice can also be used to "squeeze" a random subset of the cache elements in a weighted fashion until some target amount of space is freed. This works because exact position in the cache becomes decreasingly important as the number of cached objects grows. The amount of squeezing can also be somewhat random. Using random approaches like these can eliminate obvious discontinuities or thresholds in object quality.
  • caching can also involve intelligent guessing about which objects might be needed next; thus objects less likely to be needed can be "squeezed" before objects with a higher likelihood of being needed in the future. This can be combined with a random algorithm.
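• A minimal Python sketch of such a weighted random squeeze (the entry structure, weighting distribution, and halving step are illustrative assumptions, not taken from the application):

```python
import random

def squeeze(entries, bytes_needed, max_ratio=64):
    """Free space by compressing ("squeezing") cached objects further,
    rather than evicting them. entries is ordered from LRU (index 0)
    to MRU; the random choice is weighted toward the LRU end."""
    freed = 0
    while freed < bytes_needed:
        candidates = [e for e in entries if e['ratio'] < max_ratio]
        if not candidates:
            break  # everything maximally compressed; fall back to eviction
        # exponentially weighted index: LRU-end entries squeezed most often
        i = min(int(random.expovariate(1.0) * len(candidates) / 4),
                len(candidates) - 1)
        entry = candidates[i]
        old_size = entry['size']
        entry['size'] = old_size // 2   # discard half of the representation
        entry['ratio'] *= 2             # compression ratio doubles
        freed += old_size - entry['size']
    return freed
```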
  • An MRU/LRU caching system substantially as described.
• image compression standards such as JPEG2000/JPIP have been introduced to meet a demanding engineering goal: to enable very large images (i.e. gigapixels in size) to be delivered incrementally or selectively from a server to a client over a low-bandwidth communication channel.
• if images are being viewed at full resolution, only a limited region can fit on a client's graphical display at any given time; the new standards are geared toward selectively accessing such regions and sending across the communication channel only data relevant to the region. If this "region of interest" or ROI changes continuously, then a continuous dialogue between a client and server over a low-bandwidth channel can keep the client's representation of the area inside the ROI accurate.
  • the present invention relates to an extension of these selectively decompressable image compression and transmission technologies to textual or other non-image data.
• a large text, e.g. the book Ulysses by James Joyce.
• We can format this text by putting each chapter in its own column, with columns for sequential chapters arranged left-to-right. Columns are assumed to have a maximum width in characters, e.g. 100.
  • Figure 2 shows the entire text of Ulysses encoded as an image in this fashion, with each textual character corresponding to a single pixel.
  • the pixel intensity value in Figure 1 is simply the ASCII code of the corresponding character.
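• A minimal Python/NumPy sketch of this encoding; for brevity it wraps the text into fixed-width rows rather than the column-per-chapter layout described above:

```python
import numpy as np

def text_to_image(text, width=100):
    """Encode a text as a grayscale image with one character per pixel;
    the pixel intensity is simply the character's ASCII code. Non-ASCII
    characters are clamped here for simplicity."""
    rows = [text[i:i + width] for i in range(0, len(text), width)]
    img = np.zeros((len(rows), width), dtype=np.uint8)
    for r, line in enumerate(rows):
        img[r, :len(line)] = [min(ord(c), 255) for c in line]
    return img

def image_to_text(img):
    """Recover the text: code 0 is empty space (padding), so nonzero
    intensities map back to characters. The image compression used on
    top of this must be lossless for the round trip to succeed."""
    return ''.join(chr(v) for v in img.flatten() if v != 0)
```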
  • JPEG2000 is used as a lossy compression format, meaning that the decoded image bytes are not necessarily identical to the original bytes. Clearly if the image bytes represent text, lossy compression is not acceptable.
  • One of the design goals of JPEG2000 was, however, to support lossless compression efficiently, as this is important in certain sectors of the imaging community (e.g. medical and scientific). Lossless compression ratios for photographic images are typically only around 2:1, as compared with visually acceptable lossy images, which can usually easily be compressed by 24:1.
• Image compression, both lossy and lossless, operates best on images that have good spatial continuity, meaning that the differences between the intensity values of adjacent pixels are minimized.
  • the raw ASCII encoding is clearly not optimal from this perspective.
• One very simple way to improve the encoding is to reorder characters by frequency in the text or simply in the English language, from highest to lowest: code 0 remains empty space, code 1 becomes the space character, and codes 2 onward are e, t, a, o, i, n, s, r, h, l, etc.
  • Figures 2 and 3 compare text-images with ASCII encoding and with this kind of character frequency encoding.
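• A sketch of building such a frequency-based code table in Python; it is built here from the text itself, though building it from English-wide statistics would work similarly:

```python
from collections import Counter

def frequency_code_table(text):
    """Character -> intensity code, ordered by frequency: code 0 stays
    empty space, code 1 is the space character, and codes 2 onward go
    to the remaining characters from most to least frequent."""
    freq = Counter(c for c in text if c != ' ')
    table = {' ': 1}
    for code, (ch, _count) in enumerate(freq.most_common(), start=2):
        table[ch] = code
    return table

# frequent characters get small codes, so adjacent pixels tend to have
# small intensity differences, improving compressibility:
table = frequency_code_table("the quick brown fox jumps over the lazy dog")
encoded = [table[c] for c in "the the"]
```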
  • the file size is 1.6MB, barely larger than the raw ASCII text file (1.5MB) and 37% smaller than the ASCII encoded text-image.
  • the compressed file size can drop well below the ASCII text file size.
• the further optimizations can include, but are not limited to: using letter transition probabilities (Markov-1) to develop the encoding, instead of just frequencies (Markov-0); and encoding as pixels the delta or difference between one character and the next, rather than the characters themselves.
  • the new invention is discussed here in the context of JPEG2000/JPIP as a selective image decompression technology, but nothing about the invention limits it to that particular format or protocol.
• LizardTech's MrSID format and protocol have similar properties, and would also work.
• a method of spatially encoding large texts and the like comprising encoding the ASCII value of each of a plurality of characters into an intensity level.
  • This invention relates to a novel method for accessing visual data, usually images, by computer. It is applicable to any situation in which visual content consists of a series of objects viewed one or a few at a time, in some established order.
• Some popular image browsing applications, e.g. ACDSee™ by ACD Systems, implement "read-ahead" and "read-behind" strategies to avoid flicker or lack of responsiveness during virtual image slideshows. This involves loading and decompressing the next and previous image files in a presentation in addition to the current picture.
  • a timer expires or some other event signals a change of image
  • the image being displayed is immediately replaced with the next image which has been "waiting in the wings”, and the following image is read and decoded to prepare for the next transition.
• the old previous image, now two images behind, is normally erased from memory, keeping the number of images in memory at three.
  • the present invention extends the concept of read-ahead/read-behind in conjunction with multiresolution imagery.
  • Multiresolution imagery can be decompressed at a ladder of resolutions, e.g. full size, half size (on each side), quarter size, eighth size, etc.
  • the time required to decompress an image at 1/8 size should be about 1/8 of the time required to decompress it at full resolution; and, of course, 1/8 of the memory is required to hold the image at 1/8 size.
  • This multiscale representation of images must be coupled with a multiresolution rendering scheme to allow the client or viewer to respond instantly to user requests to switch images.
  • a rendering scheme simply interpolates or "upsamples" lower-resolution representations of an image for display on a high-resolution screen.
  • the display must then refine dynamically to reflect the new higher-quality data. This refinement may be instantaneous, or it may be accomplished using gradual blending or other techniques that mask the transition from low to high visual quality.
• interpolating from very low-resolution image data results in a blurry appearance. If high-resolution imagery replaces lower-resolution interpolated imagery, blending in over time, the perceptual effect is for the image to appear to "come into focus".
• upon transition to a different image, the viewer or client must request additional data from the server, or load additional data from files, to improve the quality both of the new current image and of surrounding images (normally the new next images, if advancing, or the new previous images, if stepping backward). Unneeded high-resolution data may also be discarded when the current image changes, to preserve the total memory footprint.
• This scheme has many advantages over traditional look-ahead/look-behind, including: the user can skip any number of images forward or backward at a time (larger skips simply result in a blurrier initial appearance of the new image); the memory footprint may be no bigger than with traditional methods, and may even be smaller (if deemed necessary) by making the function r(i), the resolution fraction at which the image i steps away from the current image is held, drop off more sharply, e.g. ..., 1/64, 1/16, 1/4, 1, 1/4, 1/16, 1/64, .... A sketch of such a function follows below.
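• A sketch of one possible r(i), where i is the signed distance in images from the current image (the falloff constant is an illustrative assumption):

```python
def r(i, falloff=4):
    """Resolution fraction at which to hold the image i steps away from
    the current image, e.g. ..., 1/16, 1/4, 1, 1/4, 1/16, ... for
    falloff=4. A sharper falloff shrinks the memory footprint."""
    return 1.0 / (falloff ** abs(i))

# total memory relative to one full-resolution image, for a window of
# three read-ahead and three read-behind images:
footprint = sum(r(i) for i in range(-3, 4))
```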
  • the invention as described above applies to linear sequences of images, but it may be extended to "graphs" of images, i.e. collections of images in which one image may (potentially by user choice) be followed by or preceded by more than one possible next or previous image.
  • the original function r(i) may be applied to all images which might follow the current image via one transition, two transitions, etc.; or particular "paths" through the set of images may be weighted preferentially; or constraints can be used to allocate some fixed amount of memory among all possible preceding or successive images according to some allocation algorithm.
  • a method comprising caching an image, said step of caching being nonlinear.
• Scenario #1a: A user keeps her entire collection of digital photos (5 megapixels each) on the hard drive of her notebook computer. She is an avid photographer, and after several years, this collection totals 25,000 images. She uses the present invention to organize the entire collection, and is able to rearrange the photos dynamically to sort them by date, size, color, or other properties, and extract subsets. When viewing the entire collection, she can zoom out smoothly and continuously until all of the photos are in view, zoom in to view a detail of a single photo, or zoom to any intermediate view.
• Scenario #1b: The user of scenario #1a can configure her home computer as a server, and then navigate the entire photo collection from a remote client computer just as in scenario #1a.
• Scenario #2a: An art museum has invested in high-resolution scanning of all of its paintings (100 megapixels and up), and puts together an online exhibit featuring dozens or hundreds of these, organized spatially with descriptive captions. Using the present invention, not
• Scenario #2b: The art museum creates a virtual three-dimensional space representing the museum building, with high-resolution scans of all of the artworks in their "geographically correct" positions within the 3D model. Alternatively, three-dimensional virtual museum spaces can be created with no physical counterpart. These 3D models can be navigated in a manner similar to the 2D version of scenario #2a, either locally or remotely.
• the analogue to the two-dimensional operation of zooming in is moving closer to an image surface, and zooming out is analogous to moving away from the image surface.
• Scenario #2c: The museum also scans its 14th-century Book of Hours at very high resolution, yielding hundreds of images of >100 megapixels. These are assembled into a "virtual book", a high-quality surrogate for the original, available online. The book can be navigated locally or remotely, with turnable pages in three dimensions.
• the properties of JPEG2000/JPIP relevant to enabling the current invention are: a multiscale image representation, making it efficient to decompress an image file at a ladder of resolutions lower than full resolution.
• these resolutions are downsampled from the original by powers of two, e.g. if the original is 512x512 pixels, then 256x256, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2 and 1x1 representations will also be available.
• the 1x1 version is just a single pixel value corresponding to the average color of the entire image; progressively higher resolutions add progressively more detail. For some images, the lowest resolutions (e.g. 4x4, 2x2 and 1x1) may not be available.
  • the ability to selectively decompress only a portion of an image (called a "region of interest" or ROI) at a given resolution, e.g., a 32x32 pixel section from the 256x256 resolution of a 512x512 image.
  • the amount of information sent should be approximately proportional to the size of the ROI.
• any image compression format/protocol satisfying these requirements would be equally suitable.
• we refer to the image format simply as "multiscale", with the understanding that it could be wavelet-based, like JPEG2000, or based on some other technology.
  • the present invention defines precomputed steps and interactive rendering algorithms which can be used in a variety of configurations to implement the scenarios listed above. All of these scenarios involve user interaction with a "universe" of images; the starting point for precomputation is therefore a list of the filenames, URLs, or other strings referencing the individual images. When the user is zoomed out far enough to view all of these images at once, it is impractical for either the client or the server to traverse all of the image files, as there may be a very large number of them.
• a montage is a mosaic or collage of all of the images, rendered at low resolution and packed efficiently into a rectangular area, as shown in Figure 1.
• Auxiliary metadata, which can be embedded in the montage image file or stored separately, associates rectangular regions of the montage image with particular image files.
  • Figure 1 More than 1000 images (a collection of digitized maps of various sizes) packed into a montage.
  • the montage image itself can be navigated using a zooming and panning interface.
  • the metadata refers the client to individual image files, and the client uses imagery from these to render at higher resolution.
  • the overall montage size in pixels is chosen such that its resolution is only exhausted during a zoom-in at a stage where only a few images are visible simultaneously; therefore it is never necessary to access more than a few images at a time.
  • image streams are opened and closed as necessary to limit the number open at any given time.
  • montage layout is designed for packing efficiency, but the user may want a different visual arrangement onscreen. Moreover, the user may want to be able to dynamically rearrange the layout of images on the screen.
• texture mapping: a graphics rendering technique which may be implemented in software but is in general hardware-accelerated on modern personal computers. Texture mapping allows a portion of a "texture", or source image, to be drawn on the display, optionally rescaling the image, rotating it, and performing a three-dimensional perspective transform.
  • a low-resolution version of the montage can be used as a "texture", so that when the user is zoomed out, the individual images within the montage can be dynamically remapped in any way, as in Figure 2. More than one texture map may be used, in which case each texture map may be a montage containing a subset of the images.
  • the texture mapping technique (which is generally only applicable to low-resolution renditions of the montage image or images) can be used only during dynamic rearrangement; when the image arrangement is static, software compositing can be used to assemble all or part of a higher-definition rearranged montage on-screen.
  • This software compositing method is especially valuable in combination with the lazy multiresolution rendering techniques described in US Patent application number 10/790,253, a copy of which is provided herewith as Ex. A. This method in effect creates a new "display montage" by rearranging the imagery of the original montage.
  • Texture mapping, software rendering, or any combination of the two can be used to render imagery in three dimensions instead of on the plane. Dynamic rearrangement in three dimensions is also possible. Three-dimensional applications include virtual galleries or other walk-through environments as well as virtual books, especially when used in combination with the invention described in a copending provisional application filed by the applicant concurrently herewith, and attached hereto as Exhibit B.
• the virtual book application is illustrated in Figure 3. This example also illustrates an extension of the method in which an alpha channel for partial transparency (the rough edges) is stored as image information in addition to the red, green and blue color components. Most implementations of hardware-accelerated texture mapping support an alpha channel.
• Another extension applicable in either 2D or 3D is dynamic deformation of images, e.g. bending the pages of this book as they turn.
  • the invention can also be extended to support visual objects other than static images, such as the output of a visual calculation, or an application or applet.
  • a method comprising performing texture mapping during dynamic rearrangement and ceasing to do so when such dynamic rearrangement ceases.
• image compression standards such as JPEG2000/JPIP have been introduced to meet a demanding engineering goal: to enable very large images (i.e. gigapixels in size) to be delivered incrementally or selectively from a server to a client over a low-bandwidth communication channel.
• if images are being viewed at full resolution, only a limited region can fit on a client's graphical display at any given time; the new standards are geared toward selectively accessing such regions and sending across the communication channel only data relevant to the region. If this "region of interest" or ROI changes continuously, then a continuous dialogue between a client and server over a low-bandwidth channel can keep the client's representation of the area inside the ROI accurate.
• the present invention relates to an extension of these selectively decompressable image compression and transmission technologies to geospatial or schematic data. It combines and extends methods introduced in previous applications: (1) Method for spatially encoding large texts, metadata, and other coherently accessed non-image data, attached as Exhibit A, and (2) METHODS AND APPARATUS FOR NAVIGATING AN IMAGE, attached as Exhibit B.
• in (2), the concept of continuous multiscale roadmap rendering was introduced.
  • the basis for the invention of (2) is a pre-rendered "stack" of images of a roadmap or other vector-based diagram at different resolutions, in which categories of visual elements (e.g. classes of road, including national highway, state highway, and local road) are rendered with different visual weights at different resolutions.
• corresponding areas of more than one of these images are downloaded, and the client's display shows a blended combination of these areas; the blending coefficients and the choice of image resolutions vary with the zoom level of the view.
  • Pre-rendering at higher detail is not desirable for several reasons: first, because the file sizes on the server side become prohibitive (a single Universal Transverse Mercator zone image at 15 meters/pixel may already be in the gigapixel range); second, because a pre-rendered image is an inefficient representation for the kind of very sparse black-and-white data normally associated with high-resolution map rendering; and third, because the client may require the "real" vector data for performing computational tasks beyond static visual presentation. For example, a route guidance system may highlight a road or change its color; this can be done on the client side only if the client has access to vector data, as opposed to a pre-rendered image alone.
  • Vector data may also include street names, addresses, and other information which the client must have the flexibility to lay out and render selectively. Pre-rendering street name labels into the map image stack is clearly undesirable, as these labels must be drawn in different places and sizes depending on the precise location and scale of the client view; different label renditions should not blend into one another as the user zooms. Pre-rendering such data would also eliminate any flexibility with regard to font.
• vector data (where we use the term generically to refer both to geometric and other information, such as place names) is both important to the client in its own right, and a more efficient representation of the information than pre-rendered imagery, when the desired rendering resolution is high. Note, however, that if a large area is to be rendered at low resolution, the vector data may become prohibitively large and complex, making the pre-rendered image the more efficient representation. Even at low resolution, however, some subset of the vector data is necessary, such as the names of major highways.
  • the present invention extends the methods introduced in (1) to allow spatial vector data to be encoded and transmitted selectively and incrementally to the client, possibly in conjunction with the pre-rendered imagery of (2).
  • this would be accomplished using a geospatial database.
  • the database would need to include all relevant vector data, indexed spatially.
  • Such databases present many implementation challenges.
  • the prerendered layer is a precomputed literal rendition of the roadmap, as per (2);
• the pointer layer consists of 2x2 pixel blocks positioned at or very near the roadmap features to which they refer, typically intersections;
• the data layer consists of n x m pixel blocks centered on or positioned near the 2x2 pointers which refer to them.
  • Figures 2-3 show the prerendered layer alone, for comparison and orientation. The region shown is King County, in Washington state, which includes Seattle and many of its suburbs.
• Figures 3a and 3b are closeups from suburban and urban areas of the map, respectively.
  • FIG. 3a Closeup of suburban area of King County.
  • FIG. 3b Closeup of urban area of King County.
  • the client will request from the server the relevant portions of all three image layers, as shown.
  • the prerendered layer (shown in yellow) is the only one of the three displayed on the screen as is.
  • the other two specify the vector data.
  • the pointer image consists of 2x2 pixel blocks aligned on a 2x2 pixel grid, each of which specifies an (x,y) vector offset (with the x and y components of the vector each comprising a 16-bit integer, hence two pixels each) from its own location to the beginning (top left corner) of a corresponding data block in the data layer.
  • the corresponding data block begins with two 16-bit values (four pixels) specifying the data block width and height.
  • the width is specified first, and is constrained to be at least 2, hence avoiding ambiguities in reading the width and height.
  • the remainder of the data block can be treated as binary data which may contain any combination of vectors, text, or other information.
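• A Python sketch of how a client might decode this layout, assuming 8-bit grayscale pixel arrays, big-endian byte order within each 16-bit value, and signed offsets; the exact pixel ordering inside the 2x2 pointer block is an assumption, since only the 16-bit components themselves are specified above:

```python
import numpy as np

def read_u16(img, y, x):
    """A 16-bit value stored as two adjacent pixels (big-endian byte
    order is an assumption)."""
    return (int(img[y, x]) << 8) | int(img[y, x + 1])

def follow_pointer(pointer_img, data_img, py, px):
    """Decode the 2x2 pointer block with top-left pixel (py, px): its
    four pixels hold a signed 16-bit (x, y) offset from the block's own
    location to the top-left corner of its data block."""
    dx = read_u16(pointer_img, py, px)       # row 0: x component
    dy = read_u16(pointer_img, py + 1, px)   # row 1: y component
    if dx >= 0x8000: dx -= 0x10000           # interpret as signed
    if dy >= 0x8000: dy -= 0x10000
    ty, tx = py + dy, px + dx

    w = read_u16(data_img, ty, tx)           # width is always the first
                                             # two pixels, since w >= 2
    def pixel(n):                            # n-th pixel, row-major
        return int(data_img[ty + n // w, tx + n % w])
    h = (pixel(2) << 8) | pixel(3)           # height: pixels 2 and 3
    payload = bytes(pixel(n) for n in range(4, w * h))
    return payload                           # vectors, text, etc.
```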
  • data blocks contain streetmap information including street names, address ranges, and vector representations.
  • the pointer and data layers are precomputed, just as the prerendered layer is.
  • Precomputation for the pointer and data layers consists of encoding all of the relevant vector data into data blocks, and packing both the pointers and data blocks as efficiently as possible into their respective images.
• in suburban areas (see Figure 3a), features tend to be well-separated, resulting in large empty areas in the pointer and data images. Where pointers do occur, they fall precisely on the feature to which they refer, and their corresponding data blocks are in turn often centered precisely on the pointer.
• in dense urban areas, however (see Figure 3b), features are often too close together for the pointers and data blocks to all fit.
  • Efficient rectangle packing is a computationally difficult problem; however, there are numerous approximate algorithms for solving it in the computational geometry literature, and the present invention does not stipulate any particular one of these.
• This algorithm always succeeds in ultimately placing a rectangle provided that somewhere in the image an empty space of at least the right dimensions exists. It is "greedy" in the sense that it places a single rectangle at a time; it does not attempt to solve the holistic problem of packing n rectangles as efficiently as possible. (A holistic algorithm would require defining an explicit measure of packing efficiency, specifying the desired tradeoff between minimizing wasted space and minimizing distance between rectangles and their "target points".)
  • Figure 4 demonstrates the output of the basic packing algorithm for three cases. In each case, the algorithm sequentially placed a number of rectangles as near as possible to a common point. This solution to the rectangle packing problem is provided by way of example only.
  • Figure 4 Test output of the greedy rectangle packing algorithm. On the left, predominantly small, skinny rectangles; in the center, large, square rectangles; and on the right, a mixture.
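• A minimal Python sketch of one possible greedy placement step, using a boolean occupancy image and scanning outward from the target point; as noted above, the application does not stipulate a particular packing algorithm:

```python
import numpy as np

def place_near(occupancy, h, w, ty, tx, max_radius=256):
    """Greedily place one h x w rectangle as near as possible to the
    target point (ty, tx): scan candidate top-left corners in rings of
    increasing distance and take the first spot that is entirely
    empty."""
    H, W = occupancy.shape
    for r in range(max_radius):
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if max(abs(dy), abs(dx)) != r:
                    continue                 # only the ring at radius r
                y, x = ty + dy, tx + dx
                if (0 <= y <= H - h and 0 <= x <= W - w
                        and not occupancy[y:y + h, x:x + w].any()):
                    occupancy[y:y + h, x:x + w] = True  # mark as used
                    return y, x
    return None                              # no free space within range

# usage: pack rectangles one at a time into a 512x512 image
occ = np.zeros((512, 512), dtype=bool)
for (h, w, ty, tx) in [(2, 2, 100, 100), (6, 4, 100, 100)]:
    place_near(occ, h, w, ty, tx)
```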
  • Pointers are always 2x2 (our notation is rows x columns); however, for data blocks, there is freedom in selecting an aspect ratio: the required block area in square pixels is determined by the amount of data which must fit in the block, but this area can fit into rectangles of many different shapes.
  • a 24 byte data block (including 4 bytes of width and height information, and 20 bytes of arbitrary data) can be represented exactly as 1x24, 2x12, 3x8, 4x6, 6x4, 8x3, or 12x2. (24x1 is disqualified, as the block width must be at least 2 for the 2-byte width to be decoded before the block dimensions are known on the client side, as described above.)
  • the block can also be represented, with one byte left over, as 5x5.
  • Each of the three map layers — prerendered roads, pointers and data — is stored as a JPEG2000 or similar spatially-accessible representation.
  • the prerendered road layer need not be lossless; it is only necessary for it to have reasonable perceptual accuracy when displayed.
  • the pointer and data layers must be represented losslessly, as they contain data which the client must be able to reconstruct exactly. Lossless compression is not normally very efficient; typical digital imagery, for example, is not usually compressible losslessly by more than a factor of about two at best.
• 16-bit unsigned values, such as the width or height of the data block, would normally be encoded using a high-order byte and a low-order byte.
• for small values, the high-order byte would be zero. Frequent zero high-order bytes followed by significant low-order bytes account for much of the 2-pixel periodicity apparent in parts of Figure 5a.
  • the left eight columns represent the first pixel of the pair, previously the high-order byte; the rightmost eight columns represent the second pixel, previously the low-order byte.
  • Similar techniques apply to 32-bit or larger integer values. These techniques are also extensible to signed quantities. For variables in which the sign changes frequently, as occurs for differential coding of a road vector, a sign bit can be assigned to position 0, and the absolute value encoded in alternating bytes as above. Note that to be drawn convincingly, road vector data must often be represented at greater than pixel precision. Arbitrary units smaller than a pixel can instead be used, or equivalently, subpixel precision can be implemented using fixed point in combination with the above techniques. In our exemplary embodiment, 4 subpixel bits are used, for 1/16 pixel precision.
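• A Python sketch of the byte-splitting, sign-bit, and fixed-point subpixel encodings described above (the exact bit packing is an illustrative assumption):

```python
def encode_u16(value):
    """A 16-bit unsigned value as a high-order and a low-order byte,
    one pixel each."""
    return [(value >> 8) & 0xFF, value & 0xFF]

def encode_signed_subpixel(value_pixels, subpixel_bits=4):
    """A signed coordinate delta at 1/16 pixel precision for
    subpixel_bits=4: the sign bit is kept in position 0 and the
    absolute value, in fixed-point subpixel units, fills the higher
    bit positions."""
    fixed = int(round(abs(value_pixels) * (1 << subpixel_bits)))
    sign = 1 if value_pixels < 0 else 0
    return encode_u16((fixed << 1) | sign)

# e.g. a road-vector delta of -3.25 pixels:
print(encode_signed_subpixel(-3.25))   # [0, 105]: (52 << 1) | 1
```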
• the JPEG2000 representation (including the lossy pre-rendered roadmap image, lossless pointer layer, and lossless data layer) is actually smaller than the compressed ZIP file representing the original data as tabulated text.
  • This file is part of the United States Census Bureau's 2002 TIGER/Line database.
  • the new representation is ready to serve interactively to a client, with efficient support for continuously pannable and zoomable spatial access.
• the original prerendered multiscale map invention introduced in (2) included not a single prerendered image, but a stack of such images, rendered at progressively coarser resolutions and with rescaled weights for lines (or other visible features). Although no features are omitted from any of these prerenditions, some features are de-emphasized enough to be clearly visible only in an aggregate sense, e.g. the local roads of a city become a faint grey blur at the nationwide level.
  • the present invention can be extended to include pointer and data images corresponding to the coarser prerendered roadmap images, in which only a subset of the original vector objects are represented.
• statewide pointer and data images, which are at much lower resolution than those of Figures 1-3, might only include data for state and national highways, excluding all local roads. These coarser data may also be "abstracts", for example specifying only road names, not vectors. Images at different resolutions might include varying mixtures or subsets of the original data, or abstracted versions. This technique both allows all of the relevant data to fit into the smaller coarse images, and provides the client with the subset of the vector information relevant for navigation at that scale.
  • the prerendered images may also be in color. Further, the prerendered images may be displayed by the client in color even if they are single-channel images, since the vector data can be used to draw important roads in different colors than the prerendered material. Finally, the prerendered images may omit certain features or roads present in the vectorial data, relying on the client to composite the image and vectorial material appropriately.
• CLAIM:
• A method of encoding images using rectangle packing and a JPEG representation.
  • the present invention relates generally to graphical zooming user interfaces (ZUI) for computers. More specifically, the invention is a system and method for progressively rendering zoomable visual content in a manner that is both computationally efficient, resulting in good user responsiveness and interactive frame rates, and exact, in the sense that vector drawings, text, and other non-photographic content is ultimately drawn without the resampling which would normally lead to degradation in image quality, and without interpolation of other images, which would also lead to degradation.
• ZUI: graphical zooming user interface.
• GUIs: graphical computer user interfaces.
  • visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out.
  • the desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets.
• Even when viewing ordinary documents, such as spreadsheets and reports, it is often useful to be able to glance at a document overview, and then zoom in on an area of interest.
• zoomable components exist in applications such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc.
  • these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally.
  • continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom and pan continuously in a user-friendly manner is absent from prior art systems.
  • a display is the device or devices used to output rendered imagery to the user.
  • a frame buffer is used to dynamically represent the contents of at least a portion of the display.
  • Display refresh rate is the rate at which the physical display, or portion thereof, is refreshed using the contents of the frame buffer.
  • a frame buffer's frame rate is the rate at which the frame buffer is updated.
• the display refresh rate is 60-90 Hz.
• Most digital video, for example, has a frame rate of 24-30 Hz.
  • each frame of digital video will actually be displayed at least twice as the display is refreshed.
  • Plural frame buffers may be utilized at different frame rates and thus be displayed substantially simultaneously on the same display. This would occur, for example, when two digital videos with different frame rates were being played on the same display, in different windows.
• ZUI: zooming user interface.
  • LOD pyramid The complete set of LODs, organized conceptually as a stack of images of decreasing resolution, is termed the LOD pyramid — see Fig. 1.
  • the system interpolates between the LODs and displays a resulting image at a desired resolution. While this approach solves the computational issue, it displays a final compromised image that is often blurred and unrealistic, and often involves loss of information due to the fact that it represents interpolation of different LODs. These interpolation errors are especially noticeable when the user stops zooming and has the opportunity to view a still image at a chosen resolution which does not precisely match the resolution of any of the LODs.
  • a further object of the present invention is to allow the user to zoom arbitrarily far in on vector content while maintaining a crisp, unblurred view of the content and maintaining interactive frame rates.
  • a further object of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex vectorial content, while both preserving the overall appearance of the content and maintaining interactive frame rates.
  • a further object of the present invention is to diminish the user's perception of transitions between LODs or rendition qualities during interaction.
  • a further object of the present invention is to allow the graceful degradation of image quality by blurring when information ordinarily needed to render portions of the image is as yet incomplete.
  • a further object of the present invention is to gradually increase image quality by bringing it into sharper focus as more complete information needed to render portions of the image becomes available.
  • the final image is then displayed by preferably first displaying an intermediate final image.
  • the intermediate final image is the first image displayed at the desired resolution before that image is refined as described hereafter.
  • the intermediate final image may correspond to the image that would be displayed at the desired resolution using the prior art.
  • the transition from the intermediate final image to the final image may be gradual, as explained in more detail below.
  • the present invention allows LODs to be spaced in any resolution increments, including irrational increments (i.e. magnification or minification factors between consecutive LODs which cannot be expressed as the ratio of two integers), as explained in more detail below.
  • portions of the image at each different LOD are denoted tiles, and such tiles are rendered in an order that minimizes any perceived imperfections to a viewer.
  • the displayed visual content is made up of plural LODs (potentially a superset of the surrounding LODs as described above), each of which is displayed in the proper proportion and location in order to cause the display to gradually fade into the final image in a manner that conceals imperfections.
  • the present invention involves a hybrid strategy, in which an image is displayed using predefined LODs during rapid zooming and panning, but when the view stabilizes sufficiently, an exact LOD is rendered and displayed.
  • the exact LOD is rendered and displayed at the precise resolution chosen by the user, which is normally different from the predefined LODs. Because the human visual system is insensitive to fine detail in the visual content while it is still in motion, this hybrid strategy can produce the illusion of continuous "perfect rendering" with far less computation.
• Figure 1 depicts an LOD pyramid (in this case the base of the pyramid, representing the highest-resolution representation, is a 512x512 sample image, and successive minifications of this image are shown in factors of 2);
• Figure 2 depicts a flow chart for use in an exemplary embodiment of the invention;
• Figure 3 is another flow chart that shows how the system displays the final image after zooming;
• Figure 4 is the LOD pyramid of Figure 1 with grid lines added showing the subdivision of each LOD into rectangular tiles of equal size in samples;
• Figure 5 is another flow chart, for use in connection with the present invention, depicting a process for displaying rendered tiles on a display;
• Figure 6 shows a concept termed irrational tiling, explained in more detail herein;
• Figure 7 depicts a composite tile and the tiles that make up the composite tile, as explained more fully below.
  • Figure 2 shows a flowchart of a basic technique for implementation of the present invention.
  • the flowchart of Figure 2 represents an exemplary embodiment of the invention and would begin executing when an image is displayed at an initial resolution.
  • the invention may be used in the client/server model, but that the client and server may be on the same or different machines.
  • the actual hardware platform and system utilized are not critical to the present invention.
  • the flowchart is entered at start block 201 with an initial view of an image at a particular resolution.
  • the image is taken to be static.
  • the image is displayed at block 202.
  • a user may navigate that image by moving, for example, a computer mouse.
  • the initial view displayed at block 202 will change when the user navigates the image.
  • the underlying image may itself be dynamic, such as in the case of motion video, however, for purposes of this example, the image itself is treated as static.
  • any image to be displayed may also have textual or other vector data and/or nonvector data such as photographs and other images.
  • the present invention, and the entire discussion below, is applicable regardless of whether the image comprises vector or nonvector data, or both.
  • the method transfers control to decision point 203 at which navigation input may be detected. If such input is not detected, the method loops back to block 202 and continues displaying the stationary visual content. If a navigation input is detected, control will be transferred to block 204 as shown.
  • Decision point 203 may be implemented by a continuous loop in software looking for a particular signal that detects movement, an interrupt system in hardware, or any other desired methodology.
  • the particular technique utilized to detect and analyze the navigation request is not critical to the present invention. Regardless of the methodology used, the system can detect the request, thus indicating a desire to navigate the image.
  • the techniques are applicable to zooming, panning, or otherwise navigating. Indeed, the techniques described herein are applicable to any type of dynamic transformation or change in perspective on the image. Such transformations may include, for example, three dimensional translation and rotation, application of an image filter, local stretching, dynamic spatial distortion applied to selected areas of the image, or any other kind of distortion that might reveal more information.
  • block 204 will then render and display a new view of the image, which may be, for example, at a different resolution from the prior displayed view.
  • One straightforward prior art technique of displaying the new view is based upon interpolating LODs as the user zooms in or out.
• the selected LODs may be those two LODs that "surround" the desired resolution, i.e., the resolution of the new view.
• the interpolation, in prior systems, occurs constantly as the user zooms and is thus often implemented directly in hardware to achieve speed.
  • the combination of detection of movement in decision point 205 and a substantially immediate display of an appropriate interpolated image at block 204 results in the image appearing to zoom continuously as the user navigates.
• an interpolated image is sufficient to look realistic and clear. Any interpolation error is only minimally detectable by the human visual system, as such errors are disguised by the constantly changing view of the image.
  • the system tests whether or not the movement has substantially ceased. This can be accomplished using a variety of techniques, including, for example, measuring the rate of change of one or more parameters of the view. That is, the methodology ascertains whether or not the user has arrived at the point where he has finished zooming. Upon such stabilization at decision point 205, control is transferred to block 206, where an exact image is rendered, after which control returns to block 203. Thus, at any desired resolution, the system will eventually display an exact LOD.
• the display is not simply rendered and displayed by an interpolation of two predefined LODs, but may be rendered and displayed by re-rendering vector data using the original algorithm used to render the text or other vector data when the initial view was displayed at block 202.
  • Nonvector data may also be resampled for rendering and displayed at the exact required LOD.
• the required re-rendering or resampling may be performed not only at the precise resolution required for display at the desired resolution, but also on a sampling grid corresponding precisely to the correct positions of the display pixels relative to the underlying content, as calculated based on the desired view.
• panning by, e.g., 1.61 pixels in the display plane does not change the required resolution, but it does alter the sampling grid, and therefore requires re-rendering or resampling of the exact LOD.
  • the foregoing system of Fig. 2 represents a hybrid approach in which interpolation based upon predefined LODs is utilized while the view is changing (e.g. navigation is occurring) but an exact view is rendered and displayed when the view becomes substantially stationary.
• the term render refers to the generation by the computer of a tile at a specific LOD based upon vector or nonvector data. Nonvector data may be rerendered at an arbitrary resolution by resampling an original image at higher or lower resolution.
• once the intermediate image is displayed, block 304 is entered, which causes the image to begin to gradually fade towards an exact rendition of the image, which we term the final image.
  • the final image differs from the intermediate image in that the final image may not involve interpolation of any predefined LODs. Instead, the final image, or portions thereof, may comprise newly rendered tiles. In the case of photographic data, the newly rendered tiles may result from resampling the original data, and in the case of vector data, the newly rendered tiles may result from rasterization at the desired resolution.
  • step 304 is executed so the changeover from the intermediate final image to the final image is done gradually and smoothly.
• This gradual fading, sometimes called blending, causes the image to come into focus gradually when navigation ceases, producing an effect similar to automatic focusing in cameras or other optical instruments.
  • the illusion of physicality created by this effect is an important aspect of the present invention.
  • a first LOD may take a 1 inch by 1 inch area of a viewable object and generate a single 32 by 32 sample tile.
• the information may also be rendered by taking the same 1 inch by 1 inch area and representing it as a tile that is 64 by 64 samples, and therefore at a higher resolution.
• Tiling granularity, which we will write as the variable g, is defined as the ratio of the linear tiling grid size at a higher-resolution LOD to the linear tiling grid size at the next lower-resolution LOD.
  • each LOD is subdivided into a grid of square or rectangular tiles containing a constant number of samples (except, as required, at the edges of the visual content).
• g = 2
  • Figure 6(b) illustrates the advantage gained by irrational tiling granularity.
  • Figure 6 shows cross-sections through several LODs of the visual content; each bar represents a cross-section of a rectangular tile.
• the curves 601, drawn from top to bottom, represent the bounds of the visible area of the visual content at the relevant LOD during a zooming operation: as the resolution is increased (zooming in to reveal more detail), the area under examination decreases.
• darker bars, e.g. 602, represent tiles which have already been rendered over the course of the zoom. Lighter bars have not yet been rendered, and so cannot be displayed.
• Another benefit of irrational tiling granularity is that it allows finer control of g, since there are a great many more irrational numbers than integers, particularly over the useful range where g is not too large. This additional freedom can be useful for tuning the zooming performance of certain applications. If g is set to the irrational square root of an integer (such as sqrt(2), sqrt(5) or sqrt(8)), then in the embodiment described above the grid lines of alternate LODs would align exactly; if g is an irrational cube root, then every third LOD would align exactly; and so on. This confers an additional benefit with respect to limiting the complexity of a composite tiling, as defined below.
  • An important aspect of the invention is the order in which the tiles are rendered. More particularly, the various tiles of the various LODs are optimally rendered such that all visible tiles are rendered first. Nonvisible tiles may not be rendered at all. Within the set of visible tiles, rendition proceeds in order of increasing resolution, so that tiles within low-resolution LODs are rendered first. Within any particular LOD, tiles are rendered in order of increasing distance from the center of the display, which we refer to as foveated rendering. To sort such tiles in the described order, numerous sorting algorithms such as heapsort, quicksort, or others may be used.
• a lexicographic key may be used for sorting "requests" to render tiles, such that the outer subkey is visibility, the middle subkey is resolution in samples per physical unit, and the inner subkey is distance to the center of the display.
  • Other methods for ordering tile rendering requests may also be used.
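• As a concrete illustration of the lexicographic ordering above, a small Python sketch (the TileRequest fields are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class TileRequest:
    visible: bool            # does the tile intersect the current view?
    samples_per_unit: float  # LOD resolution in samples per physical unit
    dist_to_center: float    # distance from the center of the display

def request_key(req: TileRequest):
    """Lexicographic sort key: outer subkey visibility (visible tiles
    first), middle subkey resolution (low resolution first), inner
    subkey distance to display center ("foveated rendering")."""
    return (not req.visible, req.samples_per_unit, req.dist_to_center)

requests = [TileRequest(True, 4.0, 10.0), TileRequest(True, 1.0, 50.0),
            TileRequest(False, 1.0, 0.0)]
requests.sort(key=request_key)  # any standard sort algorithm works
```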
  • the actual rendering of the tiles optimally takes place as a parallel process with the navigation and display described herein. When rendering and navigation/display proceed as parallel processes, user responsiveness may remain high even when tiles are slow to render.
• if a tile represents vector data, such as alphabetic typography in a stroke-based font, rendering of the tile would involve running the algorithm to rasterize the alphabetic data and possibly transmitting that data to a client from a server.
  • the data fed to the rasterization algorithm could be sent to the client, and the client could run the algorithm to rasterize the tile.
  • rendering of a tile involving digitally sampled photographic data could involve resampling of that data to generate the tile at the appropriate LOD. For discrete LODs that are prestored, rendering may involve no more than simply transmitting the tile to a client computer for subsequent display.
  • the actual display may comprise different mixes of different tiles from different LODs.
• any portion of the display could contain, for example, 20% from LOD 1, 40% from LOD 2, and 40% from LOD 3.
  • the algorithm attempts to render tiles from the various LODs in a priority order best suited to supply the rendered tiles for display as they are most needed.
  • the actual display of the rendered tiles will be explained in more detail later with reference to Figure 5.
• a composite tile area, or simply a composite tile.
  • To define a composite tile we consider all of the LODs stacked on top of each other. Each LOD has its own tile grid. The composite grid is then formed by the projection of all of the grids from all of the LODs onto a single plane. The composite grid is then made up of various composite tiles of different sizes, defined by the boundaries of tiles from all of the different LODs. This is shown conceptually in Fig. 7. Fig. 7 depicts the tiles from three different LODs, 701 through 703, all representing the same image. One can imagine the LODs 701 through 703 being stacked up on top of each other.
  • Fig. 5 depicts a flow chart of an algorithm for updating the frame buffer as tiles are rendered.
  • the arrangement of Fig. 5 is intended to operate on every composite tile in the displayed image each time the frame buffer is updated. Thus, for example, if a frame duration is 1/20 of a second, each of the composite tiles on the entire screen would preferably be examined and updated during each 1/20 of a second.
  • the composite tile may lack the relevant tiles in one or more LODs.
• the process attempts to display each composite tile as a weighted average of all the available superimposed tiles within which the composite tile lies. Note that composite tiles are defined in such a way that they fall within exactly one tile at any given LOD; hence the weighted average can be expressed as a relative proportion of each LOD. The process attempts to determine the appropriate weights for each LOD within the composite tile, and to vary those weights gradually over space and time to cause the image to gradually fade towards the final image discussed above.
• the composite grid includes plural vertices, which are defined to be any intersection or corner of gridlines in the composite grid. These are termed composite grid vertices.
  • the opacity can be expressed as a weight between 0.0 and 1.0, and the sum of all the LOD weights at each vertex should therefore be 1.0 if the desired result is for the image to be totally opaque.
  • the algorithm walks through the composite tiling once for each relevant LOD, beginning with the highest-resolution LOD.
  • the algorithm maintains the following variables: levelOpacityGrid and opacityGrid. Both of these variables are again numbers between 0.0 and 1.0, and are maintained for each vertex in the composite tiling.
• the algorithm walks through each LOD in turn, in order from highest-resolution to lowest, performing the following operations. First, 0.0 is assigned to levelOpacityGrid at all vertices. Then, for each rendered tile at that LOD (which may be a subset of the set of tiles at that LOD, if some have not yet been rendered), the algorithm updates the parts of the levelOpacityGrid touching that tile based on the tile's centerOpacity, cornerOpacity and edgeOpacity values:
• if the vertex is entirely in the interior of the tile, then it gets updated using centerOpacity.
• if the vertex is, e.g., on the tile's left edge, it gets updated with the left edgeOpacity.
• if the vertex is, e.g., on the top right corner, it gets updated with the top right cornerOpacity.
• Updating means the following: if the pre-existing levelOpacityGrid value is greater than 0.0, then set the new value to the minimum of the present value and the value it's being updated with. If the pre-existing value is zero (i.e. this vertex hasn't been touched yet) then just set the levelOpacityGrid value to the value it's being updated with. The end result is that the levelOpacityGrid at each vertex position gets set to the minimum nonzero value with which it gets updated.
  • the algorithm then walks through the levelOpacityGrid and sets to 0.0 any vertices that touch a tile which has not yet been rendered, termed a hole. This ensures spatial continuity of blending: wherever a composite tile falls within a hole, at the current LOD, drawing opacity should fade to zero at all vertices abutting that hole.
  • the algorithm can then relax all levelOpacityGrid values to further improve spatial continuity of LOD blending.
• the situation as described thus far can be visualized as follows: every vertex is like a tentpole, and the levelOpacityGrid value at that point is the tentpole's height.
  • the algorithm has thus far ensured that at all points bordering on a hole, the tentpoles have zero height; and in the interior of tiles that have been rendered, the tentpoles are set to some (probably) nonzero value. In the extreme case, perhaps all the values inside a rendered tile are set to 1.0. Assume for purposes of illustration that the rendered tile has no rendered neighbors yet, so the border values are 0.0. We have not specified how narrow the "margin" is between a 0.0 border tentpole and one of the 1.0 internal tentpoles. If this margin is too small, then even though the blending is technically continuous, the transition may be too sharp when measured as an opacity derivative over space.
  • the relax operation smoothes out the tent, always preserving values of 0.0, but possibly lowering other tentpoles to make the function defined by the tent surface smoother, i.e. limiting its maximum spatial derivative. It is immaterial to the invention which of a variety of methods are used to implement this operation; one approach, for example, is to use selective low-pass filtering, locally replacing every nonzero value with a weighted average of its neighbors while leaving zeroes intact. Other methods will also be apparent to those skilled in the art.
  • the algorithm then walks over all composite grid vertices, considering corresponding values of levelOpacityGrid and opacityGrid at each vertex: if levelOpacityGrid is greater than 1.0-opacityGrid, then levelOpacityGrid is set to 1.0- opacityGrid. Then, again for each vertex, corresponding values of levelOpacityGrid are added to opacityGrid. Due to the previous step, this can never bring opacityGrid above 1.0. These steps in the algorithm ensure that as much opacity as possible is contributed by higher-resolution LODs when they are available, allowing lower-resolution LODs to "show through" only where there are holes.
• the final step in the traversal of the current LOD is to actually draw the composite tiles at the current LOD, using levelOpacityGrid as the per-vertex opacity values.
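• The per-vertex interplay of levelOpacityGrid and opacityGrid can be illustrated with a small Python sketch (a single vertex only; grid traversal, hole zeroing, and relaxation are omitted):

```python
def accumulate_opacity(level_opacities):
    """Per-vertex LOD blending: walk LODs from highest resolution to
    lowest, letting each contribute at most the opacity still missing,
    so lower-resolution LODs only "show through" where holes remain.
    level_opacities: levelOpacityGrid values at one vertex, ordered
    from highest-resolution LOD to lowest."""
    opacity = 0.0                 # the running opacityGrid value
    drawn = []
    for level_opacity in level_opacities:
        contribution = min(level_opacity, 1.0 - opacity)
        opacity += contribution   # can never exceed 1.0
        drawn.append(contribution)
    return drawn                  # per-LOD drawing weights

# e.g. a partial hole in the top LOD lets the next one contribute:
print(accumulate_opacity([0.6, 1.0, 1.0]))  # [0.6, 0.4, 0.0]
```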
  • levelOpacityGrid can be multiplied by a scalar overallOpacity variable in the range 0.0 to 1.0 just before drawing; this allows the entire image to be drawn with partial transparency given by the overallOpacity.
  • drawing an image-containing polygon, such as a rectangle, with different opacities at each vertex is a standard procedure. It can be accomplished, for example, using industry- standard texture mapping functions using the OpenGL or Direct3D graphics libraries.
  • the drawn opacity within the interior of each such polygon is spatially interpolated, resulting in a smooth change in opacity over the polygon.
  • tiles maintain not only their current values of centerOpacity, cornerOpacity and edgeOpacity (called the current values), but also a parallel set of values called targetCenterOpacity, targetCornerOpacity and targetEdgeOpacity (called the target values).
• the current values are all set to 0.0 when a tile is first rendered, but the target values are all set to 1.0. Then, after each frame, the current values are adjusted to new values closer to the target values.
  • newValue = oldValue*(1 - b) + targetValue*b, where b is a rate greater than 0.0 and less than 1.0 (a sketch of this update also follows the list).
  • a value of b close to 0.0 will result in a very slow transition toward the target value, and a value of b close to 1.0 will result in a very rapid transition toward the target value.
  • This method of updating opacities results in exponential convergence toward the target, and results in a visually pleasing impression of temporal continuity.
  • Other formulae can achieve the same result.
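By way of illustration only, the relax operation described in the list above can be sketched as a selective low-pass filter. The following Python sketch is not taken from the patent: the grid representation, neighbor weighting, and number of passes are all assumptions.

```python
import numpy as np

def relax(level_opacity_grid, passes=2):
    """Smooth the per-vertex opacity "tent", limiting its maximum
    spatial derivative, while leaving zero-valued vertices (those
    bordering holes) pinned exactly at zero."""
    g = np.asarray(level_opacity_grid, dtype=float).copy()
    for _ in range(passes):
        p = np.pad(g, 1, mode="edge")
        # Average of the four axis-aligned neighbors of each vertex.
        neighbors = (p[:-2, 1:-1] + p[2:, 1:-1] +
                     p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
        # Blend each vertex toward its neighborhood average...
        smoothed = 0.5 * g + 0.5 * neighbors
        # ...but keep hole-boundary tentpoles at 0.0.
        g = np.where(g == 0.0, 0.0, smoothed)
    return g
```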
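The per-frame opacity update, newValue = oldValue*(1 - b) + targetValue*b, can likewise be sketched directly; the rate b = 0.2 below is only an example value.

```python
def step_opacity(current, target, b=0.2):
    """One frame of exponential convergence toward a target opacity;
    b must lie strictly between 0.0 and 1.0."""
    return current * (1.0 - b) + target * b

# Fading in a freshly rendered tile (current 0.0, target 1.0): each
# frame closes a fixed fraction of the remaining gap, giving
# 0.2, 0.36, 0.488, 0.5904, ... and converging smoothly toward 1.0.
opacity = 0.0
for _ in range(4):
    opacity = step_opacity(opacity, 1.0)
```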

Abstract

A system and method are disclosed that may include establishing communication between a first computer and a second computer over a communication link, the second computer having an image collection stored therein in the form of compressed image data; selecting a plurality of images in the collection for communication to said first computer; and transmitting low-resolution image data for all of the selected images from the second computer to the first computer before transmitting full-resolution image data for any of the selected images.

Description

SYSTEM AND METHOD FOR MANAGING COMMUNICATION AND/OR STORAGE OF IMAGE DATA
BACKGROUND ART
Recently developed image compression and transmission standards such as JPEG2000/JPIP have enabled the interactive display of large images (i.e. gigapixels in size) over narrow bandwidth communication channels. However, these emerging standards and technologies do not provide means for achieving a more ambitious goal: to allow flexible visual interaction with a very large number of images simultaneously, each of which may also potentially be very large. Accordingly, there is a need in the art for an improved system and method for transmitting and/or storing image data.
SUMMARY OF THE INVENTION
According to one aspect, the present invention provides a method that may include establishing communication between a first computer and a second computer over a communication link, the second computer having an image collection stored therein in the form of compressed image data; selecting a plurality of images in the collection for communication to said first computer; and transmitting low-resolution image data for all of the selected images from the second computer to the first computer before transmitting full-resolution image data for any of the selected images.
Other aspects, features, advantages, etc. will become apparent to one skilled in the art when the description of the preferred embodiments of the invention herein is taken in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
For the purposes of illustrating the various aspects of the invention, there are shown in the drawings forms that are presently preferred, it being understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.
FIG. 1 is a block diagram of a system that may be connected to enable communication of image data between a plurality of computers in accordance with one or more embodiments of the present invention;
FIG. 2 is a block diagram of an image having at least two regions of interest therein in accordance with one or more embodiments of the present invention;
FIG. 3 is a block diagram of a "virtual book" that employs aspects of the technology disclosed herein in accordance with one or more embodiments of the present invention;
FIG. 4 is an illustration of a three-dimensional version of the virtual book of FIG. 3 in accordance with one or more embodiments of the present invention;
FIG. 5 is a block diagram of a system for managing image data communication between one or more portable devices and one or more other computers in accordance with one or more embodiments of the present invention;
FIG. 6A illustrates the results of an incomplete image data download employing an existing approach;
FIG. 6B illustrates the results of an incomplete image data download in accordance with one or more embodiments of the present invention;
FIG. 7 is a block diagram of a "common space" that may include a physical display (screen) and two virtual displays in accordance with one or more embodiments of the present invention;
FIG. 8 illustrates a collection of over one thousand images (a collection of digitized maps of various sizes) packed into a montage in accordance with one or more embodiments of the present invention;
FIG. 9 illustrates a snapshot of about three thousand images that have been dynamically re-arranged into a random configuration in accordance with one or more embodiments of the present invention; and
FIG. 10 is a block diagram of a computer system that may be adaptable for use with one or more embodiments of the present invention.
BEST MODE OF CARRYING OUT THE INVENTION
FIG. 1 is a block diagram of a system 100 that may be connected to enable communication of image data between a plurality of computers in accordance with one or more embodiments of the present invention. System 100 preferably includes client computer 102 which is connected to display 104 and data storage device 106. System 100 preferably also includes server computer 108 which may be connected to data storage device 110. Server computer 108 may also be connected to the Internet 112.
In one or more embodiments, image data may be communicated between a plurality of computers 102, 108 so as to enable viewing of large collections of potentially large images using a relatively low-bandwidth connection therebetween. For instance, desirable viewing and navigation of images stored at server computer 108 may be accomplished by transmitting selected portions of the image data stored at server computer 108 at controllable levels of resolution. The selectivity of image data 114 may be such as to select a particular image at high resolution, or even a selected portion of a particular image at high resolution.
Herein, various embodiments are discussed that include varying the types of devices used as client computer 102 and server 108, the types of image data 114 transmitted therebetween, and various applications of the ability to transmit selected image data at specified levels of resolution.
FIG. 2 is a block diagram of an image 200 having at least two regions of interest 202, 204 therein in accordance with one or more embodiments of the present invention. Image 200 could be a subset of image data 114. Alternatively, image data 114 could represent a subset of image 200, depending upon what image data is requested by client computer 102. In one or more embodiments, image 200 may be stored in compressed form on server computer 108, or within storage device 110. Preferably, when stored in this manner, data for a plurality of resolution levels for various regions of image 200 may be stored and may be requested for downloading by client computer 102.
In one or more embodiments, the resolution level at which a particular image or region of an image is stored on client computer 102 may be readily increased or decreased. Where a prior download results in storage of a region or image at a first resolution level (which may be less than full resolution), this first resolution level may be increased by adding data representing the next higher level of resolution, preferably without having to discard the data representing the first resolution, thereby avoiding redundancy and increasing the efficiency of the image data communication contemplated herein. Conversely, the resolution level of a region or image stored at client 102 may be decreased by discarding the highest level of resolution stored therein, without losing data corresponding to the lower levels of resolution for the same region or image. Such resolution reduction may be practiced at client 102 to clear data storage space needed for one or more regions or images other than the one for which data is being discarded.
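As a concrete illustration of this incremental raising and lowering of resolution, the sketch below models a region's image data as a stack of detail levels; the class and method names are hypothetical and are not taken from the patent.

```python
class RegionPyramid:
    """Resolution levels held by the client for one region or image.
    Level 0 is coarsest; each entry adds detail atop the levels below
    it, as in a wavelet decomposition, so raising or lowering the
    resolution never re-downloads or discards the coarse data."""

    def __init__(self):
        self.levels = []  # levels[i] holds the detail data for level i

    def raise_resolution(self, fetch_level):
        # Download only the next-higher detail band; nothing already
        # stored is discarded or re-transmitted.
        self.levels.append(fetch_level(len(self.levels)))

    def lower_resolution(self):
        # Free storage by discarding only the finest detail band,
        # leaving all lower resolution levels intact.
        if self.levels:
            self.levels.pop()
```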
The pertinent image compression may be provided by, for instance, the use of JPEG2000 or another discrete wavelet transform-based image compression scheme. However, the present invention is not limited to the use of any particular compression format or image data representation. Other formats may be employed, including image formats whose sizes in bytes are not substantially smaller than the uncompressed image data. It is merely preferable that the selected image format be susceptible to multiscale representation and storage of image data.
In one or more embodiments, client computer 102 may seek to download one or more regions of image 200, where such regions may be portions of image 200. The one or more regions of interest 202, 204 may be the only ones that client computer 102 seeks to download. Alternatively, client computer (client) 102 may merely seek to download one or more selected regions at higher resolution than the resolution at which the remainder of image 200 is downloaded. In either case, client 102 may request a download by identifying both a specified region of image 200 to download and a resolution level at which this specified region will be provided by server computer (server) 108.
In the example of FIG. 2, client 102 preferably requests a download of all of image 200 at low resolution. (The exact resolution level at which the bulk of image 200 is downloaded is not pertinent to this discussion.) However, client 102 seeks to download region of interest 1 202 at a higher resolution, or even at full resolution. Accordingly, client 102 preferably specifies the coordinates and the desired resolution level of region of interest 1 202 to server 108. Thus, in addition to downloading the bulk (including that portion external to region of interest 1 202) of image 200 at low resolution, client 102 preferably downloads region of interest 1 202 at the specified higher resolution. In other situations, client 102 could seek to download only the region(s) of interest and omit a download of the remainder of image 200. In this manner, a user of client computer 102 may view region of interest 1 202 at high resolution without having to download the entirety of image 200 at this high resolution. Thus, a relatively low-bandwidth data communication link between client 102 and server 108 could nevertheless transmit the entirety of image 200, while providing a region of particular interest (in this case, region of interest 1 202) at high resolution, thereby providing the viewer with the same viewing experience with respect to the region of interest that would have occurred had client 102 downloaded the entirety of image 200 at the high resolution, an option that would demand considerably more download time and data storage space at client computer 102 or data storage device 106.
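A request of the kind just described might look like the following; the patent does not specify a wire format, so the field names and coordinate conventions here are illustrative only.

```python
# Whole image at a coarse level, plus region of interest 1 at a finer
# level; regions are (x, y, width, height) in image pixels.
request = [
    {"image": "image_200", "region": (0, 0, 8192, 8192),      "level": 2},
    {"image": "image_200", "region": (1024, 512, 2048, 2048), "level": 6},
]
```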
Shifting the Region of Interest
In one or more embodiments, a user of client computer 102 may wish to pan across image 200. Normally, panning from one region of interest 202 to another 204 would involve having both regions downloaded at client 102 at the level of resolution at which the regions will be viewed. Moreover, generally, all image territory in between region of interest 1 202 and region of interest 2 204 would be stored at client computer 102 to enable the described panning to occur. As described in the following, in one or more embodiments of the present invention, viewing of such regions of interest 202, 204 may be accomplished by downloading much less data and using less storage space at client computer 102 than in the approach described above.
In one or more embodiments, client 102 may shift from a high resolution view of region of interest 1 202 to region of interest 2 204. Preferably, image data corresponding to a low-resolution representation of region of interest 2 204 is already present in client computer 102 from the download of image 200, discussed above. In this case, all that is needed is to supplement the existing image data for region of interest 2 204 with additional image data describing the pertinent higher levels of resolution to arrive at a high-resolution rendition of region of interest 2 204 at client computer 102. If needed, image data representing the higher resolution levels of region of interest 1 202 may be discarded or overwritten to make space in data storage device 106 or other data storage space for the additional image data to be downloaded for region of interest 2 204.
In one or more embodiments, the shift in view from region of interest 1 202 to region of interest 2 204 may be accomplished gradually, to provide a viewer of display 104 with a viewing experience that may closely simulate that available on a computer that has the entirety of image 200 downloaded at high resolution. Specifically, the level of resolution at which region of interest 1 202 is displayed may be reduced gradually to the resolution level at which most of image 200 is represented. Thereafter, the view on display 104 may present a gradual pan across the low-resolution territory in between region of interest 1 202 and region of interest 2 204. Finally, upon arriving at region of interest 2 204, the view on display 104 may increase toward a high-resolution rendition of region of interest 2 204 either after completing the panning across image 200 or concurrently with the latter portion of this panning operation. Preferably, at the conclusion of the described process, region of interest 2 204 may be stored in client computer 102 at high resolution and may be displayed on display 104 at this high resolution.
FIG. 3 is a block diagram of a "virtual book" 300 that employs aspects of the technology disclosed herein in accordance with one or more embodiments of the present invention. Virtual book 300 may include display 302, backward cache 304, and forward cache 306. While the caches 304, 306 are each shown having two pages stored therein, any number of pages may be stored in either of caches 304 and 306.
In one or more embodiments, virtual book 300 employs the ability to present selected image data at controllable levels of resolution. In virtual book 300, each image may be a page within display 302 of virtual book 300. Display 302 may correspond to display 104 of FIG. 1 or may be a special purpose display that accommodates the specific features of virtual book 300. Virtual book 300 may correspond to client computer 102 of FIG. 1, or may be a special purpose computer that is substantially limited to communicating, storing, and displaying pages of books.
In one or more embodiments, virtual book 300 may include only one page that is stored and/or displayed at full resolution, with other pages, both earlier and later in the sequence of pages displayed, at a variety of other resolutions.
In one or more embodiments, the page currently displayed on display 302, i.e. the active page, is displayed at full resolution, which is "page 10" in FIG. 3. In such embodiments, other pages may be displayed at progressively lower resolutions with increasing distance in pages from the active page. More specifically, the resolution at which each page is stored may equal the resolution of the active page being displayed in display 302 divided by the quantity 2 raised to a power equal to the number of pages between each stored page and the active page. Thus, applying this approach, page 11 (in forward cache 306) and page 9 (in backward cache 304) may each occupy one half the amount of data storage space occupied by the active page in display 302. Continuing with this approach, page 12 (in forward cache 306) and page 8 (in backward cache 304) may each occupy one quarter the amount of data storage space occupied by the active page in display 302.
While in the above discussion, the amount of data storage space allocated to each page differs by a factor of two with respect to its immediately neighboring page, it will be appreciated by those of skill in the art that a value greater than or less than two may be employed as the division factor. Moreover, arithmetic formulae other than division of the data storage space of the active page by a constant may be employed to determine the allocation of data storage space to a succession of pages stored in caches 304 and 306.
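Under this falloff scheme, the storage allocated to each cached page can be computed directly; the sketch below generalizes the division factor, as the text suggests, and the function name and budget values are illustrative.

```python
def page_budget(page, active_page, active_budget, factor=2.0):
    """Storage allocated to a cached page: the active page's budget
    divided by factor raised to the number of pages between the
    cached page and the active page. factor need not equal 2."""
    return active_budget / factor ** abs(page - active_page)

# With page 10 active on a hypothetical 4.0 MB budget: pages 9 and 11
# each get 2.0 MB, pages 8 and 12 each get 1.0 MB, and so on outward.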
In one or more embodiments, a new active page may be selected in place of page 10 which is shown displayed in FIG. 3. The new selected page may, but need not, be a page immediately adjacent page 10 (either page 9 or page 11). That is, any page from 1 to the last page in the pertinent book (or any other type of publication with discrete pages) may be the new active page.
In one or more embodiments, upon selection of the new active page, a transition between the currently active page and the new active page is preferably conducted. This transition to a new active page may include acquiring additional image data for the new active page to enable the new active page to be stored and/or displayed at full resolution. If the new active page is "page 11", and the "factor-of-two" embodiment, discussed above, is employed, the amount of data storage space allocated to page 11 will preferably double. Continuing with an application of the "factor-of-two" embodiment, the data storage space allocated to page 10 will preferably be halved as part of the transition away from page 10 and toward page 11 as the active page. The data for the active version of page 10 that is not included in the post-transition page 10 may be discarded (which may include overwriting thereof). Alternatively, however, this "surplus" data for page 10 may be stored in another cache. Such caching of the page-10 surplus data may provide efficiency if a transition to page 10 occurs soon after (i.e. within a reasonable number of page transitions) the transition away therefrom.
In one or more embodiments, the transition from page 10 to page 11 (or other new active page) may include a gradual fade-out from page 10 and gradual fade-in of page 11, to provide an experience that is visually pleasing and/or reminiscent of a physical page transition to the user of virtual book 300. Optionally, a sequence of images showing the folding and turning of the old active page may be provided to make the virtual page transition look still more reminiscent of a physical turn of a page.
FIG. 4 is an illustration of a three-dimensional version of the virtual book of FIG. 3 in accordance with one or more embodiments of the present invention. The embodiment of FIG. 4 illustrates a method in which an alpha channel, for partial transparency (the rough edges), may be stored as image information in addition to the red, green and blue color components. While color components are discussed above, for the sake of convenience, only a black and white rendition of the image of FIG. 4 is provided herein. In one or more embodiments, hardware-accelerated texture mapping may be employed to support an alpha channel. Another feature that may be practiced in connection with either two-dimensional or three-dimensional embodiments of the virtual book is dynamic deformation of images, e.g. bending the pages of this book as they turn, as illustrated in FIG. 4.
Managing Image Data in One or More Portable Devices
In this section, a number of mechanisms for storing and interacting with digital images, based on progressive and interactive visual collection transmission, are described. In one or more embodiments of the present invention, variations on the methods disclosed herein allow near-instantaneous viewing, on a desktop computer, a mobile device, or other devices, of a large collection of images stored on a second mobile device; the use of remote storage to augment a mobile device's local memory for the purpose of viewing images; and browsing of large image collections from a mobile device. The various permutations enabled by one or more embodiments of the present invention may rely on a common client/server imaging and collection representation architecture.
One or more embodiments of the present invention may provide a method which may include providing a collection of digital images or other visual objects on a server; establishing communication between a client and said server; and enabling efficient multi-scale navigation by the client of collections of visual objects residing on the server.
In this disclosure, the term "digital image data" may include digital photographs, digital images, visual documents or other forms of visual content. Herein, the term "image" generally corresponds to the term "digital image," and either of these terms may correspond to a "digital photograph." Herein, the term "client" generally corresponds to the term "client side" and to the term "client device". Herein, the terms "portable device", "portable camera device", and "camera device" generally refer to digital image capturing devices and/or digital image storage devices. Herein, a "digital image capturing device" may include but is not limited to a digital camera, a camera-enabled mobile phone (which may be referred to as a camera-enabled cell phone), a personal digital assistant, and/or a digital video recorder able to record digital still images. A "digital image capturing device" may include devices that are capable of receiving image data by directly optically receiving and recording such data (such as with a standard digital camera) and may also include devices that are able to receive image data via a wired or wireless Internet or other network connection.
One or more embodiments of the methods described herein may use a multi-resolution approach to address the problems of storing, synchronizing, browsing, and organizing collections of digital image data, which may be visual documents. Digital photos, which may be represented as arrays of color pixels at a certain resolution (e.g. 1024x768 pixels = 0.75 megapixels, 2592x1944 pixels = about 5 megapixels, etc.), are a common visual document type that end-users may create in large numbers using digital cameras, camera-enabled mobile phones, and digital video recorders, among other devices.
One or more of the methods described herein may also apply to visual data objects other than images, such as the roadmap or other vector data of Applicant reference document 489/17NP (U.S. Patent Application Serial No. 11/082,556) or the textual data of Applicant reference document 489/13 (U.S. Provisional Patent Application Serial No. 60/617,485). (Both documents are identified in greater detail at the beginning of this document, and both documents are incorporated by reference herein.)
A problem facing users of existing systems is that camera devices can quickly create large numbers of potentially large visual documents. However, these devices typically don't have sufficient memory or visual browsing facilities to allow satisfactory archiving, viewing, or organization of these documents.
Digital photographs or other digital image data stored in a camera or other portable device are generally downloaded periodically to a desktop or notebook computer, cleared from the camera's memory to allow more pictures to be taken, and organized and/or viewed on the desktop or notebook computer. Thereafter, digital photographs may be shared with friends by posting a selection of digital photographs to one or more Internet sites.
Traditional Approach to Managing Image Data on Portable Devices
The following steps may be followed when using a conventional approach to managing image data on portable devices. First, a mobile device, which may be a digital camera or other digital image data capturing device, takes pictures. Then, potentially after some culling of the pictures, the pictures are downloaded to the camera user's PC (personal computer) and deleted from the camera device. The camera device's local storage may be limited and, in this conventional approach, only holds images transiently, until they are safely stored on the PC.
The PC may permanently retain in its memory (e.g. hard disk drive or other non-volatile storage) any subset of the digital photos. The user may in turn upload some further culled subset of those images to a web server which may be owned by a web photo publishing service, typically at reduced resolution. The images uploaded may be made publicly viewable by any third party using a web browser on a PC or other device, or by some subset of those users with restricted access.
Limitations of the existing approach may include lengthy download times from the camera device to the PC. There is also usually poor management of persistent storage on the camera device. Camera devices typically have small color displays on which viewers could theoretically view persistently stored images of the same type people commonly carry in their wallets (such as of family and pets) and photos associated with callers or other contacts on a PDA (Personal Digital Assistant). However, the limitations on persistent storage in existing camera devices make the above task difficult to achieve.
Moreover, existing camera devices impose other limitations. In existing camera devices, navigation of images stored on the camera device is generally awkward and difficult. In existing camera devices, there is a lack of a unified visual interface to image collections which would give users a consistent experience either on the camera device or on a PC. Existing camera devices tend to impose very restrictive limits on the number of photos that can be stored thereon before downloading becomes necessary. Thus, when employing existing approaches, a lengthy series of steps is generally involved in making images available to a third party.
Managing Image Data According to One or More Embodiments of the Invention
FIG. 5 is a block diagram of a system 500 for managing image data communication between one or more portable devices 512, 522 and one or more other computers in accordance with one or more embodiments of the present invention. System 500 may include a client side 510 and a server side 520. However, in alternative embodiments, the client and server statuses of the groupings of devices shown in FIG. 5 may be reversed.
In one or more embodiments, system 500 may include portable device 1 512, portable device 2 522, personal computer 102 (which may be substantially the same as client computer 102 of FIG. 1), server 108 (which may be substantially the same as server computer 108 of FIG. 1) and/or additional computers 524. Preferably, each of devices 512, 522 and computers 102, 108, and 524 has memory and one or more displays included therewith. Alternatively or additionally, the devices and computers of FIG. 5 could be in communication with memories and/or displays.
FIG. 5 illustrates various possible data paths useable in accordance with one or more embodiments of the present invention. One or more embodiments may use less than all of the data paths shown in FIG. 5. The available data paths shown in FIG. 5 may have one or more of the following features in common: 1) the data paths may involve a server side 520 (the originator of the image data) and a client side 510 (the recipient of the image data); 2) bi-directional data paths (which are illustrated with lines having arrows at both ends) indicate that the devices pointed to by these arrows are capable of serving in either a client or a server capacity; 3) the connections may employ a hard-wired network (e.g. Universal Serial Bus (USB), Firewire or Ethernet) or a wireless network (e.g., for nearby devices, Bluetooth, and for more remote connections, WiFi or a wireless wide-area networking protocol); and/or 4) the illustrated connections may or may not be ad-hoc.
In one or more embodiments, both the client side 510 and the server side 520 may include one or more digital computing and/or storage devices including but not limited to: camera devices, personal computers, and personal digital assistants. In one or more embodiments, a client device (client) may have one or more displays. The client may browse a collection of documents residing on the server using one or more of the efficient multi-resolution browsing methods described in Applicant reference document 489/15P (U.S. Provisional Application Serial No. 60/619,118, entitled "Method for Efficiently Interacting with Dynamic, Remote Photo Albums with Large Numbers of Potentially Large Images", which is incorporated herein by reference). These methods allow large collections of large images or other visual documents to be navigated efficiently over low-bandwidth connections. Zooming, panning, and dynamic rearrangement of such image collections are described in the referenced document.
In one or more embodiments, one of the properties of this navigation method is that the display contents may gradually come into focus as information is sent from the server to the client. The rate at which this information comes into focus may be governed by the ratio of connection bandwidth to display pixels. When the user zooms, pans, or rearranges the documents on the client side 510 such that new content becomes visible, this content again appears blurred, then comes into focus.
Virtual Display
In one or more embodiments, a client's "display" need not necessarily be physical, or visible to an end-user. In one or more embodiments, this display can be a "virtual display", i.e. an abstract model of a display with a specified resolution. Such a "virtual display" might be represented as an array of pixel values in the client's memory, irrespective of whether those pixel values are rendered to a screen. A virtual display may include wavelet data that at least partially describes one or more images. The wavelet data is preferably able to represent an image at a range of possible resolutions. In one or more embodiments, the wavelet data may correspond to that employed using JPEG2000. In one or more embodiments, a virtual display may include enough wavelet data to completely describe one or more images. For example, if it were desirable for a device to acquire thumbnails of all of the images in a collection at a specified resolution, then this device could create a "virtual display" of the appropriate size, establish a connection with a server, and request a view of the entire collection. The full set of thumbnails could then be transmitted to and rendered on this "virtual display". If transmission were interrupted before all of the relevant data were sent from the server to the client, then the client's virtual display would not yet have all of the thumbnail images in perfectly focused condition. However, all of the requested thumbnail images would preferably be stored within the client's virtual display with sufficient resolution to enable rendering visible versions of these images on a screen. The images rendered in the manner described would generally be of lower visual quality than if the transmission of the image had concluded without interruption. Thus, some image degradation may be present in the images rendered using data from an incomplete, interrupted transmission.
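A virtual display can be modeled very simply. The sketch below uses a raw pixel array for clarity, whereas, as noted above, an implementation would more likely hold wavelet data (e.g. JPEG2000 coefficients) so that each image remains renderable at a range of resolutions; all names here are illustrative.

```python
import numpy as np

class VirtualDisplay:
    """An off-screen "display": an array of pixel values at a chosen
    resolution, never necessarily rendered to a screen."""

    def __init__(self, width, height):
        self.pixels = np.zeros((height, width, 3), dtype=np.uint8)

    def receive_tile(self, tile, x, y):
        # Each arriving tile refines its region in place, so an
        # interrupted transfer still leaves every thumbnail present,
        # merely blurrier than the finished version would be.
        h, w = tile.shape[:2]
        self.pixels[y:y + h, x:x + w] = tile
```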
Nevertheless, the described degradation is preferable to the prior art approach to sending a set of thumbnails across a network, in which the complete image of each thumbnail is transmitted in turn. Under this prior art approach, a premature interruption of connectivity would result in some thumbnails being available in their entirety (i.e. at full resolution) and would result in other thumbnails being completely unavailable. FIG. 6 illustrates this difference.
FIG. 6A illustrates the results of an incomplete image data download employing an existing approach; and FIG. 6B illustrates the results of an incomplete image data download in accordance with one or more embodiments of the present invention.
FIG. 6A shows a prior art scenario in which all of the data for three thumbnails (shown with square shapes) has been received, and in which the remaining nine thumbnails (shown with X's) have not been received at all. FIG. 6B illustrates a situation that may arise employing one or more embodiments of the present invention, in which all twelve thumbnails (shown as cross-hatched square shapes) have been received at some level of resolution, which is preferably acceptable for viewing, but which is likely below the resolution that would be obtained after conclusion of a complete and uninterrupted data transmission.
In one or more embodiments, a client may have a client-side cache that caches recently viewed visual content. A standard MRU (most-recently-used) cache may be employed for the caching needs of one or more embodiments of the present invention. However, a cache disclosed in U.S. Patent Application Serial No. 11/141,958 (client reference document 489/10NP), entitled "Efficient Data Cache", which is incorporated herein by reference, may be beneficially employed to enable more sophisticated client-side caching. In either case, a given amount of client-side memory may be devoted to the cache. Thus, navigation back to a recently viewed image may permit using image data stored in the cache, rather than requiring that this image data be re-sent from the server.
A client may have multiple displays. A given display may be physical or virtual. A given display may be driven directly by user input, or it may be driven programmatically by software within a client computer such as computer 102. The total size in pixels of all of the displays may be fixed or bounded by some limit, and this limit may define a minimum amount of client-side memory needed for visual content. This client-side memory is preferably separate from storage space allocated to the cache memory.
An embodiment involving both a physical display and a virtual display is described in the following. Preferably, a physical display within a client device is visible to the user and allows zooming and panning navigation through, and rearrangement of, a collection of digitally stored images. The user may also select one or more images from the collection and send them to a "holding pen" which may serve as a place for storing user-selected images. The holding pen may be visualized in some way on the physical display. Adding an image to the holding pen preferably causes the image to be placed on the virtual display, which may be invisible to the user. As images are added to the holding pen, the virtual display representing the holding pen gradually fills up.
This virtual display may increase in size (as measured in number of pixels) up to some limit, after which its size may remain fixed at this limit. The virtual display may be too small to display all of the images in the holding pen at full resolution. In this case, the data storage space needed for the images resident in the virtual display is preferably reduced as needed to fit the images into the virtual display. Hence, an off-screen view (the virtual display) preferably gets supplemented with images as the user puts viewable images into the holding pen. This supplementing of the off-screen view may occur invisibly to the user.
A method for browsing is disclosed in U.S. Patent Application Serial No. 10/790,253 (Applicant reference document 489/2NP), entitled "System and Method for Exact Rendering in a Zooming User Interface", which is incorporated by reference herein. The method disclosed in that document for determining the order in which information is sent from the server to the client based on the client's view may be modified for a multiple display scenario. The 489/2NP document discloses that visual information may be broken up into tiles, with each tile covering a region in space at a given resolution. Low-resolution tiles may then occupy large physical areas, while high-resolution tiles may occupy smaller physical areas, such that the amount of information in each tile may be substantially the same.
The 489/2NP document discloses methods for ordering tiles using criteria discussed in the following. One criterion may be tile resolution and tile position on the display. Sorting of tiles could be lexicographic, such that lower-resolution tiles always precede higher-resolution tiles, with spatial position only playing a role in resolving order within a resolution. (Lexicographic sorting is referenced here in the generalized tuple sense — for example, the lexicographic sort of the set of triplets {(1,2,3), (0,3,1), (4,0,0), (0,0,1), (0,3,2)} would be (0,0,1), (0,3,1), (0,3,2), (1,2,3), (4,0,0).) Alternatively, non-lexicographic sorting criteria may be employed. For instance, a linear combination of a plurality of properties could be used to sort tiles. Such properties may include but are not limited to: resolution (which could be expressed in logarithmic units) and distance of the tile from the center of the display. Herein, the term "sort key" corresponds to the term "sorting criterion."
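Both sorting schemes can be shown in a few lines. Each pending tile is represented below as a (resolution, distance-from-center) pair; this representation and the weights in the linear combination are assumptions for illustration.

```python
# Pending tiles as (resolution_level, distance_from_display_center).
tiles = [(1, 2.5), (0, 3.0), (2, 0.5), (0, 1.0)]

# Lexicographic: lower-resolution tiles always precede higher ones;
# position matters only within a resolution. Python's tuple
# comparison is itself lexicographic, so a plain sort suffices.
lexicographic_order = sorted(tiles)
# -> [(0, 1.0), (0, 3.0), (1, 2.5), (2, 0.5)]

# Linear combination: resolution and centrality trade off against
# each other instead of one strictly dominating the other.
def priority(tile, w_res=1.0, w_dist=0.3):
    level, dist = tile
    return w_res * level + w_dist * dist

blended_order = sorted(tiles, key=priority)
```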
In this embodiment, lower-resolution tiles may be sent in preference to higher-resolution tiles, and tiles near the center of the display may be sent in preference to tiles near the periphery, but these properties can trade off against each other.
Preferably, minor changes may be implemented to adapt the above scheme for a multiple display scenario. In one embodiment, display number can be added as an extra lexicographic sort key. Thus, a first display might refine completely (in accordance with the other sort keys) before any tiles are sent relevant to a second display. In another embodiment, display number can be an additional variable for inclusion in a linear combination, allowing display number to trade off in some fashion against resolution and proximity to the center of the display. In yet another embodiment, the displays can coexist in an imaginary "common space", and the resolution and proximity-to-center sort keys can be used as before. The "common space" is a notional space establishing an imaginary spatial relationship between multiple displays, as if they were regions of a single, larger display. Defining this imaginary spatial relationship determines all parameters needed for prioritizing tiles in the multiple displays.
FIG. 7 is a block diagram of a "common space" 700 that may include a physical display (screen) 702 and two virtual displays 704, 706 in accordance with one or more embodiments of the present invention. The physical display 702 is preferably in the center of "common space" 700 at normal size. Virtual displays V1 704 and V2 706 are preferably off to the side, and V2 is preferably scaled down, so that its pixels are preferably half the linear size of the physical display's pixels. This means that, assuming purely lexicographic tile sort order, the contents of each resolution level in V1 704 will preferably be sent from the server to the client after the corresponding resolution for the physical display (since V1 is farther from the center of the space than any point on the physical display). Resolutions in V2 706 may be sent after all the tiles at a resolution twice as fine have been sent for both the physical display 702 and V1 704. It is noted that it isn't necessary for the "common space" 700 to correspond to any real larger display or memory address space. The "common space" 700 is merely a conceptual convenience for establishing the relationships among tile priorities across different displays. Clearly many tradeoffs are possible. These tradeoffs can have the consequence, as in the lexicographic example above, of giving refinement of the physical display 702 the highest priority, while using any excess time and bandwidth not required for bringing the physical display into focus to continue refining the virtual display(s) 704, 706. The tradeoffs may alternatively begin refining the virtual display(s) after the physical display has largely, but not completely, come into focus. After the physical display 702 has largely come into focus, the physical and virtual displays 704, 706 can share bandwidth resources to refine in concert.
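The "common space" arrangement of FIG. 7 can be expressed by giving each display an offset and a pixel scale in the shared space and then reusing the ordinary resolution and proximity sort keys. All placements and numbers below are illustrative, not taken from the patent.

```python
import math

# Each display's placement in the notional common space: an offset
# for its origin and a scale for its pixels (V2's pixels are half
# the linear size of the physical display's).
displays = {
    "physical": {"offset": (0.0, 0.0),    "scale": 1.0},
    "V1":       {"offset": (3000.0, 0.0), "scale": 1.0},
    "V2":       {"offset": (6000.0, 0.0), "scale": 0.5},
}

def tile_sort_key(display, tile_xy, tile_level):
    d = displays[display]
    # Map the tile into the common space and measure its distance
    # from the physical display's center (the space's origin here).
    x = d["offset"][0] + tile_xy[0] * d["scale"]
    y = d["offset"][1] + tile_xy[1] * d["scale"]
    dist = math.hypot(x, y)
    # Halving a display's pixel size makes its tiles one resolution
    # level "finer", so they sort after the corresponding level on
    # the physical display, as described above for V2.
    effective_level = tile_level - math.log2(d["scale"])
    return (effective_level, dist)
```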
If the images in a collection are JPEG2000 images, then any subset of the data for a given image can itself comprise a JPEG2000 image file. During navigation of an image, the client may progressively download image data from the server, thereby supplementing the quality of the client's subset of the image and giving the client the ability to create a JPEG2000 file that is an increasingly accurate approximation of the full image.
If a client has navigated everywhere in an image, or has viewed the entire image at full resolution for long enough that all of the image data has been transmitted, then the client can recreate the entire original JPEG2000 file for that image. If a client has zoomed in closely on only a part of a large image, then the client could still create a JPEG2000 file, but it would lack detail everywhere except where the client zoomed in. This property of JPEG2000 can be extended to other multi-resolution document types as well. If the client never zoomed in beyond a given resolution, then no information would be available regarding the image content beyond that given resolution. In this case, the version of the JPEG2000 image which may be created and/or stored by the client may have a lower overall resolution than the original version of that image.
One application of the virtual display scenario described above is to ameliorate the problem of long download times for images from a camera. In one or more embodiments, the camera or camera-enabled mobile device may operate as the server, and a PC may operate as the client.
In one or more embodiments, rather than initiating a time-consuming batch download of all images to the PC, when the camera and PC are connected, the PC can rapidly browse through the complete set of images available on the camera. During navigation, a group of images can be selected and put in the holding pen. Note that if all images on the camera are to be downloaded to the PC in their entirety, then the total time needed to accomplish the transfer remains the same as in the prior art. However, as with the closely related problem of thumbnail transmission, this method can provide a number of advantages over the conventional serial download of images which are listed and discussed below. The present invention is not limited to the features listed below.
Image download and user navigation of the full image set on the camera or other mobile device may be concurrent and cooperative in their use of bandwidth (in effect, navigation merely influences the order in which tiles are sent from server to client).
If the PC's display is larger than the mobile device's display, then better choices can be made about which images to download, which to leave on the mobile device, and which to discard, without incurring the delay of downloading the entire set before deciding. The experiences of browsing on the PC and on the mobile device (assuming that it also has a display), respectively, are preferably simple and experientially similar, thereby increasing usability. If lower-resolution versions of the images in the holding pen are desired, it's preferably straightforward to suitably limit the detail of downloaded data by reducing the size of the item on the virtual display. It is noted that reducing the image size in this manner may both speed up downloading by a large factor (i.e. by a factor of 4 per resolution level discarded) and require less space on the PC.
By limiting the size of the virtual display and reducing the number of images therein as desired, the amount of memory allocated to photos on the PC can be bounded. Also, different constraints can be placed on different photos, and hence space can be allocated based on recency or one or more other criteria.
In one or more embodiments, premature loss of connectivity results in a degradation in the quality of some or all of the images to be downloaded, instead of completely removing some images from the download operation. (Note that the bulk of the data volume for an image is very high-resolution detail, some of which is camera noise, and all of which is less critical for ordinary viewing than the coarser image structure. Hence, it is preferable to conduct the transmission of the high-resolution image data for all images after the lower-resolution image data for all of the images has been fully transmitted.) Hybrid prioritization of the image data is also possible, for example, favoring the complete download of a subset of the photos before proceeding to refine a second set beyond thumbnail detail.
In one or more embodiments, one or more methods disclosed herein are resilient to intermittent connectivity, since any JPEG2000 object can continue to be augmented at any time with additional information while still allowing browsing and interaction with whatever visual data has already been received. With regard to the above references to a) reducing the size of the item on the physical display and b) bounding the amount of memory allocated to photos on the PC, it is noted that typical home users may not want to discard any of their images (after an initial culling of such images). If such users continue to add sufficient storage to their PC, then of course it should not be necessary to discard any content. The addition of storage can in itself increase the virtual display maximum size. Features of (a) and (b) above can therefore be omitted if a sufficiently large virtual display size can be created (i.e., if there is enough available client-side storage).
Because it may be unclear to the user on the client-side when the "holding pen" images are finished downloading, some form of visual indication for completion is desirable. As an example, checkmarks or green dots can appear next to images as they finish downloading. When all images in the "holding pen" include green dots, the connection can be broken without loss.
Operations such as requesting that the camera discard some of its images using the client computer (which may be a PC) may benefit from some additional communication from the client to the server beyond that contemplated in Applicant reference document 489/15P. In one or more other embodiments, the client side could also instruct the server side (which may be a mobile device such as a digital camera or mobile phone) to launch its own client side, and create its own view to receive content from the PC.
This is similar to "push" methods developed in the context of the World Wide Web. The PC can render the camera/mobile phone's "view" of content on the PC, thus (for example) displaying the green completion dots described above for images uploaded from the PC to the camera. Each of the reciprocal arrows of FIG. 5 can be implemented using either a "push" or "pull" arrangement. Specifically, the viewport setting, the arrangement, and other navigation settings may be controlled from either the client side 510 ("pull") or from the server side 520 ("push") . A user interacting with one device can be connected reciprocally to another device, thereby enabling both "pulling" and "pushing" to occur simultaneously.
We will now enumerate the potential client-server connections shown in FIG. 5, and describe briefly how they can be used and why they are useful. A mobile device 512 which may be a camera or camera-enabled mobile phone may serve content to a user's PC (personal computer) 102. This connection might typically take place over a USB cable or a Bluetooth ad-hoc wireless network. The benefits are described above. The PC 102 may serve content back to the mobile device 512. This can be useful for the following applications, among others.
"Wallet photos" can be sent from the PC to the camera or mobile phone, even if those photos weren't taken by the mobile device.
The PC may be a home appliance without a display, and the mobile device may then be used as a primary visual interface to the archived visual material. The mobile device in this context may be a digital camera, a camera-enabled cell phone, a PDA, or a mobile tablet PC with a display.
A first mobile device can be connected directly to, or form an ad-hoc network with, another mobile device (the "guest"). The two mobile devices can then view and share each other's photos.
The PC could upload images (via push) to a remote server.
The server may be a photo sharing service, and may therefore implement the kind of space constraints envisioned in the above processes of reducing the size of the item on the physical display and bounding the amount of memory allocated to photos on the PC. The remote server could then serve its collection to one or more additional PCs. Typically this would be a broadband connection. However, other connection types could be employed.
The remote server could also serve collections to mobile device users. Typically this would be a mobile wireless wide-area network. Mobile devices could upload their images via "push" (that is, under control of the mobile devices) to a remote server. In one or more embodiments, the upload may be automatic, allowing the mobile device to transparently extend its apparent storage space by transferring content freely to a server and deleting it locally when transfers are complete.
In connection with the last two items above, it is noted that local caching on the mobile device 512 may allow the mobile device 512 to support browsing through very large thumbnail collections using only local storage, even if the local storage is limited. Zooming in on details of recently viewed images may also be possible, if the relevant information is still in the mobile device's local cache.
Zooming in on images whose details are only available on a remote server could result in a blurry and un-detailed image. If the mobile device is on a network that includes the remote server 108, however, the blurry image can become progressively more refined as more and more detailed image data is downloaded to the mobile device 512. If the mobile device is not connected to a network that can supply additional image data, the image may not be presented with any greater detail than is available in the initial thumbnail image.
Montage of Low-Resolution Images
One or more embodiments of the present invention may define precomputed steps and interactive rendering algorithms which can be used in a variety of configurations to implement downloading of selected images and/or image regions at controllable levels of resolution for various applications. Many of these applications (such as focusing on regions of interest, the virtual book, etc.) may involve user interaction with a "universe" of images.
In one or more embodiments, the starting point for precomputation may therefore be a list of the filenames, URLs, or other strings referencing the individual images. When a user is zoomed out far enough to view all of these images at once, it is impractical for either the client or the server to traverse all of the image files, as there may be a very large number of them. For example, in the regime where individual images occupy 2x2=4 pixels onscreen, tens of thousands or hundreds of thousands of images may be in view. Even if these images support efficient low-resolution access, merely opening and closing 100,000 files involves a large overhead and could be impractical to accomplish on an interactive timescale. It may therefore be desirable to use a cached representation of low-resolution versions of these images, referred to herein as a "montage".
In one or more embodiments, a montage may be a mosaic or collage of all of the images, rendered at low resolution and packed efficiently into a rectangular area, as shown in FIG. 8. Auxiliary metadata, which can be embedded in the montage image file or stored separately, may identify rectangular regions on the montage image with a particular image file. In one embodiment, the montage image itself can be navigated using a zooming and panning interface. When the user zooms in far enough to exhaust the resolution available in the montage version of one or more images within the montage, the metadata for that image may refer the client to one or more individual image files, and the client may use imagery from these image files to render the images at higher resolution.
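The montage metadata might be represented as below: one rectangle per source image, consulted when zooming exhausts the montage's detail. The dictionary layout, file names, and threshold test are assumptions for illustration.

```python
# Rectangle each image occupies within the packed montage, in
# montage pixels: (x, y, width, height).
montage_meta = {
    "map_0001.jp2": (0, 0, 180, 120),
    "map_0002.jp2": (180, 0, 140, 96),
}

def pixel_source(image_id, onscreen_w, onscreen_h):
    """Read from the montage while it still has enough resolution for
    the requested on-screen size; switch to the individual image file
    once the montage's detail is exhausted."""
    x, y, w, h = montage_meta[image_id]
    if onscreen_w <= w and onscreen_h <= h:
        return ("montage", (x, y, w, h))
    return ("file", image_id)
```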
In one or more embodiments, the overall size of the montage in pixels may be chosen such that its resolution is only exhausted when zooming in to a stage where only a small number of images, which may be referred to herein as a "set" of images, are visible simultaneously. Therefore, access to more than this small number of images at high resolution is preferably not needed at any given time. During subsequent zooming and panning, image streams may be opened and closed as needed to limit the number of high resolution images that are open at any given time.
The above approach to navigating many images of high resolution incurs a limitation: the montage layout is preferably designed for packing efficiency, but the user may want a different arrangement of the images onscreen. Moreover, the user may want to be able to dynamically rearrange the layout of images on the screen.
In one or more embodiments, to enable such rearrangement, we can make use of a graphics rendering technique known as "texture mapping", which may be implemented in software but is, in general, hardware-accelerated on modern personal computers. Texture mapping allows a portion of a "texture", or source image, to be drawn on the display, optionally rescaling the image, rotating it, and/or performing a three-dimensional perspective transform. Other hardware-accelerated transformations are often supported, including color correction or alteration, full or partial transparency, lighting, occlusion, and coordinate remapping. A low-resolution version of the montage can be used as a "texture", so that when the user is zoomed out, the individual images within the montage can be dynamically remapped in any way, as in FIG. 9. More than one texture map may be used, in which case each texture map may be a montage containing a subset of the images. Transitions between arrangements may or may not be animated. It is noted that rearrangement can take place while the user is zoomed in, but because the rearrangement might result in a new zoomed-in view of an image which was previously off-screen, the new image may initially be very blurry.
In another embodiment, the texture mapping technique may be used only during dynamic rearrangement of images. When the image arrangement is static, software compositing can be used to assemble all or part of a higher-definition rearranged montage on-screen. This software compositing method is especially valuable in combination with the multiresolution rendering techniques described in US Patent Application Serial No. 10/790,253 (Applicant reference document 489/2NP), identified in detail earlier in this disclosure. This method may in effect create a new "display montage" by rearranging the imagery of the original montage.
Texture mapping may also be used to display high resolution images, but in this case, rather than using textures containing montages of multiple images, textures are used that contain tiles of individual images. This technique is also described in US Patent Application Serial No. 10/790,253 (Applicant reference document 489/2NP).
In one or more embodiments, montage rearrangement may be used to support reorganization of the images without recourse to texture mapping.
In one or more other embodiments, texture mapping, software rendering, or any combination of the two may be used to render imagery in three dimensions instead of on a two-dimensional plane. Dynamic rearrangement in three dimensions is also possible. Three-dimensional applications may include virtual galleries or other walk-through environments as well as virtual books. Virtual books are described herein and still further in Provisional Patent Application Serial No. 60/619,053 which is incorporated herein by reference.
FIG. 10 is a block diagram of a computing system 1000 adaptable for use with one or more embodiments of the present invention. In one or more embodiments, central processing unit (CPU) 1002 may be coupled to bus 1004. In addition, bus 1004 may be coupled to random access memory (RAM) 1006, read only memory (ROM) 1008, input/output (I/O) adapter 1010, communications adapter 1022, user interface adapter 1016, and display adapter 1018.
In one or more embodiments, RAM 1006 and/or ROM 1008 may hold user data, system data, and/or programs. I/O adapter 1010 may connect storage devices, such as hard drive 1012, a CD-ROM (not shown), or other mass storage device to computing system 1000. Communications adapter 1022 may couple computing system 1000 to a local, wide-area, or Internet network 1024. User interface adapter 1016 may couple user input devices, such as keyboard 1026 and/or pointing device 1014, to computing system 1000. Moreover, display adapter 1018 may be driven by CPU 1002 to control the display on display device 1020. CPU 1002 may be any general purpose CPU. It is noted that the methods and apparatus described thus far and/or described later in this document may be achieved utilizing any of the known technologies, such as standard digital circuitry, analog circuitry, any of the known processors that are operable to execute software and/or firmware programs, programmable digital devices or systems, programmable array logic devices, or any combination of the above. One or more embodiments of the invention may also be embodied in a software program for storage in a suitable storage medium and execution by a processing unit.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims.
APPENDIX
Title: SYSTEM AND METHOD FOR EXACT RENDERING IN A
ZOOMING USER INTERFACE
Inventor: BLAISE HILARY AGUERA Y ARCAS
Field of the invention
The present invention relates generally to graphical zooming user interfaces for computers. More specifically, the invention is a system and method for progressively rendering zoomable visual content in a manner which is both computationally efficient, resulting in good user responsiveness and high frame rates, and exact, in the sense that vector drawings, text, and other non-photographic content are ultimately drawn without the resampling which would normally lead to degradation in image quality.
Background of the invention
Most present-day graphical computer user interfaces (GUIs) are designed using visual components of fixed spatial scale. However, it was recognized from the birth of the field of computer graphics that visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out. The desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets. Even when viewing ordinary documents, such as spreadsheets and reports, it is often useful to be able to glance at a document overview, then zoom in on an area of interest. Many modern computer applications include zoomable components, such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc. In most cases, these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally. Although continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom continuously is almost invariably absent. In a more generalized zooming framework, any kind of visual content could be zoomed, and zooming would be as much a part of the user's experience as panning. Ideas along these lines made appearances as futuristic computer user interfaces in many movies even as early as the 1960s[1]; recent movies continue the trend[2]. A number of continuously zooming interfaces have been conceived and/or developed, from the 1970s through the present.[3] In 1991, some of these ideas were formalized in U.S. Patent 5,341,466 by Kenneth Perlin and Jacob Schwartz at New York University ("Fractal Computer User Centerface with Zooming Capability"). The prototype zooming user interface developed by Perlin and co-workers, Pad, and its successor, Pad++, have undergone some development since[4]. To my knowledge, however, no major application based on a full ZUI (Zooming User Interface) has yet appeared on the mass market, due in part to a number of technical shortfalls, one of which is addressed in the present invention.
1 e.g. Stanley Kubrick's 2001: A Space Odyssey, Turner Entertainment Company, a Time Warner company (1968).
2 e.g. Steven Spielberg's Minority Report, 20th Century Fox and Dreamworks Pictures (2002).
3 An early appearance is W.C. Donelson, Spatial Management of Information, Proceedings of Computer Graphics SIGGRAPH (1978), ACM Press, p. 203-9. A recent example is Zanvas.com, which launched in the summer of 2002.
4 Perlin describes subsequent developments at http://mrl.nyu.edu/projects/zui/.
Summary of the invention
The present invention embodies a novel idea on which a newly developed zooming user interface framework (hereafter referred to by its working name, Voss) is based. Voss is more powerful, more responsive, more visually compelling and of more general utility than its predecessors due to a number of innovations in its software architecture. This patent is specifically about one of the innovations in Voss's approach to object tiling and rendition for non-photographic content.
A multiresolution visual object is normally rendered from a discrete set of sampled images at different resolutions or levels of detail (an "image pyramid"). In some technological contexts where continuous zooming is used, such as 3D gaming, two adjacent levels of detail which bracket the desired level of detail are blended together to render each frame, because it is not normally the case that the desired level of detail is exactly one of those represented by the discrete set. Such techniques are sometimes referred to as trilinear filtering or mipmapping. The resulting interpolated renditions are usually satisfactory for photographic content, but not satisfactory for content defined in terms of geometric primitives, such as text, graphs, drawings, and in short most of the visual content with which users interact outside gaming or multimedia applications. This is because blending levels of detail necessarily introduces blurring and aliasing effects.
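The level-of-detail blending at issue can be sketched as follows. This is an illustrative fragment only; the function names are hypothetical, and the spatial interpolation within each LOD is abstracted away:

    import math

    def blend_lods(level, sample):
        # level: the fractional level of detail the current view calls for;
        # sample(k) returns the (already spatially interpolated) pixel value
        # drawn from discrete LOD k. Linearly mixing the two bracketing LODs
        # is the blending step that blurs exact content such as text.
        lo, hi = math.floor(level), math.ceil(level)
        t = level - lo                   # 0 at LOD lo, 1 at LOD hi
        return (1 - t) * sample(lo) + t * sample(hi)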
The ideal solution to this problem would be to render each frame's view exactly in real time, without relying on a discrete set of pre-existing resolutions. While in principle this would allow perfect rendition of each frame, it may not be practical, as too much time is often needed to render each frame at high quality from scratch. Accordingly, the frame rate would be so greatly reduced that this method would become unattractive for interactive applications.
The present invention involves a hybrid strategy, in which an image-pyramid-based approach with a discrete number of levels of detail is typically used during rapid zooming and panning, but when the view stabilizes sufficiently, an "exact view" is rendered and blended in over several frames. Because the human visual system is insensitive to fine detail in the visual content while it is still in motion, this hybrid strategy can produce the illusion of continuous "perfect rendering" with a fraction of the computational burden.
An objective of the present invention is to allow text, plots, charts, drawings, maps, and any other vector-based content (also referred to here as vectorial content) to be rendered in a zooming user interface without degradation in ultimate image quality relative to the highest possible quality rendition in an ordinary GUI.
A further objective of the present invention is to allow arbitrarily large or complex vector-based content to be viewed in a zooming user interface.
A further objective of the present invention is to enable near-immediate viewing of arbitrarily complex vector-based visual content, even if this content is ultimately represented using a very large amount of data, and even if these data are stored at a remote location and shared over a low-bandwidth network.
A further objective of the present invention is to allow the user to zoom arbitrarily far in on vectorial content while maintaining a crisp, unblurred view of the content and maintaining interactive frame rates.
A further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex vectorial content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
A further objective of the present invention is to minimize the user's perception of transitions between levels of detail or rendition qualities during interaction.
A further objective of the present invention is to allow the graceful degradation of image quality by blurring when exact renditions of certain parts of the vectorial content cannot yet be made either because the information needed to render them is unavailable, or because an exact rendition is still in progress.
A further objective of the present invention is to gracefully increase image quality by sharpening when exact renditions of certain parts of the vectorial content first become available.
These and other objectives of the present invention will become apparent to those skilled in the art from a review of the specification that follows.
Prior art: multiresolution imagery and zooming user interfaces
From a technical perspective, zooming user interfaces are a generalization of the usual concepts underlying visual computing, allowing a number of limitations inherent in the classical user/computer/document interaction model to be overcome. One such limitation is on the size of a document that can be "opened" from a computer application, as traditionally the entirety of such a document must be "loaded" before viewing or editing can begin. Even when the amount of short-term memory (normally RAM) available to a particular computer is large, this limitation is felt, because all of the document information must be transferred to short-term memory from some repository (e.g. from a hard disk, or across a network) during opening; limited bandwidth can thus make the delay between issuing an "open" command and being able to begin viewing or editing unacceptably long.
Still digital images both provide an excellent example of this problem, and an illustration of how the computer science community has moved beyond the standard model for visual computing in overcoming the problem. Table 1 below shows download times at different bandwidths for typical compressed sizes of a variety of different image types, from the smallest useful images (thumbnails, which are sometimes used as icons) to the largest in common use today. Shaded boxes indicate image sizes for which interactive browsing is difficult or impossible at a particular connection speed.
Table 1.
                                                   time to download
image description                   typical size       LAN        DSL        wireless/modem**
                                    (MB, compressed*)  (10Mbit)   (500Kbit)  (40Kbit)
thumbnail image                     0.001              < 1 msec   0.02 sec   0.2 sec
web-resolution snapshot             0.025              0.02 sec   0.4 sec    5 sec
medium-resolution image             0.1                0.08 sec   1.6 sec    20 sec
photo-quality image                 0.5                0.4 sec    8 sec      1.7 min
full-page magazine img.             2.5                2 sec      40 sec     8.3 min
fine art or map scan                10                 8 sec      2.7 min    33.3 min
road atlas of Wash., DC             40                 32 sec     10.7 min   2.2 hr
small aerial photomontage           100                1.3 min    26.7 min   5.6 hr
large aerial photomontage           1000               13.3 min   4.4 hr     2.3 days
night sky, 6" telescope resolution  10000              2.2 hr     1.9 days   23.1 days
*Note that these figures represent realistic compressed sizes at intermediate quality, not raw image data. Specifically, we assume 1 bit/pixel for the sizes up to 40MB, and 0.25 bits/pixel for the larger images, which are generally more compressible.
**Local wireless networks may be considerably faster; this figure refers to wireless wide- area networks of the type often used for wireless PDAs.
Nearly every image on the Web at present is under 100K (0.1MB), because most users are connected to the Web at DSL or lower bandwidth, and larger images would take too long to download. Even in a local setting, on a typical user's hard drive, it is unusual to encounter images larger than 500K (0.5MB). That larger (that is, more detailed) images would often be useful is attested to by the fact that illustrated books, atlases, maps, newspapers and artworks in the average home include a great many images which, if digitized at full resolution, would easily be tens of megabytes in size.
Several years ago the dearth of large images was largely due to a shortage of storage space in repositories, but advances in hard drive technology, the ease of burning CDROMs, and the increasing prevalence of large networked servers have made repository space no longer the limiting factor. The main bottleneck now is bandwidth, followed by short-term memory (i.e. RAM) space. The problem is in reality much worse than suggested by the table above, because in most contexts the user is interested not only in viewing a single image, but an entire collection of images; if the images are larger than some modest size, then it becomes impractical to wait while one image downloads after another.
Modern image compression standards, such as JPEG2000[5], are designed to address precisely this problem. Rather than storing the image contents in a linear fashion (that is, in a single pass over the pixels, normally from top to bottom and left to right), they are based on a multiresolution decomposition. The image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1. Obviously the fine details are only captured at the higher resolutions, while the broad strokes are captured — using a much smaller amount of information — at the low resolutions. This is why the differently-sized images are often called levels of detail, or LODs for short. At first glance it may seem as if the storage requirements for this series of differently-sized images might be greater than for the high-resolution image alone, but in fact this is not the case: a low-resolution image serves as a "predictor" for the next higher resolution. This allows the entire image hierarchy to be encoded very efficiently — more efficiently, in fact, than would usually be possible with a non-hierarchical representation of the high-resolution image alone.
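The decomposition just described can be illustrated with a short sketch (assuming the Pillow imaging library; the function name is hypothetical) that repeatedly halves an image to produce its levels of detail:

    from PIL import Image

    def build_pyramid(image):
        # Repeatedly halve the image down to 1x1; each entry is one LOD.
        levels = [image]
        while levels[-1].size != (1, 1):
            w, h = levels[-1].size
            levels.append(levels[-1].resize((max(1, w // 2), max(1, h // 2))))
        return levels   # levels[0]: full resolution ... levels[-1]: 1x1

Note that each level contains roughly one quarter the pixels of the next, so the entire pyramid holds only about 4/3 as many raw pixels as the full-resolution image alone.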
If one imagines that the sequence of multiresolution versions of the image is stored in order of increasing size in the repository, then a natural consequence is that as the image is transferred across the data link to the cache, the user can obtain a low-resolution overview of the entire image very rapidly; finer and finer details will then "fill in" as the transmission progresses. This is known as "incremental" or "progressive" transmission. Properly implemented, it has the property that any image at all — no matter how large — can be viewed in its spatial entirety (though not in its full detail) almost immediately, even if the bandwidth of the connection to the repository is very modest. Although the ultimate amount of time needed to download the image in full detail remains the same, the order in which this information is sent has been changed such that the large-scale features of an image are transmitted first; this is much more helpful to the user than transmitting pixel information at full detail and in "reading order", from top to bottom and left to right.
5 http://www.jpeg.org/JPEG2000.html
Hidden in this advance is a new concept of what it means to "open" an image which does not fit into the classical application model described in the previous section. We are now imagining that the user is able to view an image as it downloads, a concept whose usefulness arises from the fact that the broad strokes of the image are available very soon after download begins, and perhaps well before downloading is finished. It therefore makes no sense for the application to force the user to wait while downloading finishes; the application should instead display what it can of the document immediately, and not cause delays or unnecessarily interrupt its interaction with the user while it continues downloading the details "in the background". This requires that the application do more than one task at once, which is termed multithreading. Note that most modern web browsers use multithreading in a slightly different capacity: to simultaneously download images on a web page, while displaying the web page's textual layout and remaining responsive to the user in the meantime. In this case we can think about the embedded images themselves as being additional levels of detail, which enhance the basic level of detail comprised of the web page's bare-bones text layout. This analogy will prove important later.
Clearly hierarchical image representation and progressive transmission of the image document are an advance over linear representation and transmission. However, a further advance becomes important when an image, at its highest level of detail, has more information (i.e. more pixels) than the user's display can show at once. With current display technology, this is always the case for the bottom four kinds of images in the Table 1, but smaller displays (such as PDA screens) may not be able to show even the bottom eight. This makes a zooming feature imperative for large images: it is useless to view an image larger than the display if it is not possible to zoom in to discover the additional detail.
When a large image begins to download, presumably the user is viewing it in its entirety. The first levels of detail are often so coarse that the displayed image will appear either blocky or blurry, depending on the kind of "interpolation" used to spread the small amount of information available over a large display area. The image will then refine progressively, but at a certain point it will "saturate" the display with information, making any additional detail downloaded have no visible effect. It therefore makes no sense to continue the download beyond this point at all. Suppose, however, that the user decides to zoom in to see a particular area in much more detail, making the effective projected size of the image substantially larger than the physical screen. Then, in the downloading model described in the previous section, higher levels of detail would need to be downloaded, in increasing order. The difficulty is that every level of detail contains approximately four times the information of the previous level of detail; as the user zooms in, the downloading process will inevitably lag behind. Worse, most of the information being downloaded is wasted, as it consists of high-resolution detail outside the viewing area. Clearly, what is needed is the ability to download only selected parts of certain levels of detail — that is, only the detail which is visible should be downloaded. With this alteration, an image browsing system can be made that is not only capable of viewing images of arbitrarily large size, but is also capable of navigating (i.e. zooming and panning) through such images efficiently at any level of detail.
Previous models of document access are by nature serial, meaning that the entirety of an information object is transmitted in linear order. This model, by contrast, is random-access, meaning that only selected parts of the information object are requested, and these requests may be made in any order and over an extended period of time, i.e. over the course of a viewing session. The computer and the repository now engage in an extended dialogue, paralleling the user's "dialogue" with the document as viewed on the display.
To make random access efficient, it is convenient (though not absolutely required) to subdivide each level of detail into a grid, such that a grid square, or tile, is the basic unit of transmission. The size in pixels of each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile. The resulting tiled image pyramid is shown in Figure 2. Note that the "tip" of the pyramid, where the downscaled image is smaller than a single tile, looks like the untiled image pyramid of Figure 1. The JPEG2000 image format includes all of the features just described for representing tiled, multiresolution and random-access images.
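The random-access tiling described above amounts to intersecting the view rectangle with the tile grid of the chosen level of detail: only overlapping tiles need to be requested. A minimal sketch (parameter names hypothetical):

    def visible_tiles(view_x, view_y, view_w, view_h, tile_size=256):
        # The view rectangle is given in the pixel coordinates of the chosen
        # LOD; returns the (column, row) index of every tile overlapping it.
        first_col, first_row = view_x // tile_size, view_y // tile_size
        last_col = (view_x + view_w - 1) // tile_size
        last_row = (view_y + view_h - 1) // tile_size
        return [(col, row)
                for row in range(first_row, last_row + 1)
                for col in range(first_col, last_col + 1)]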
Thus far we have considered only the case of static images, but the same techniques, with application-specific modifications, can be applied to nearly any type of visual document. This includes (but is not limited to) large texts, maps or other vector graphics, spreadsheets, video, and mixed documents such as web pages. Our discussion thus far has also implicitly considered a viewing-only application, i.e. one in which only the actions or methods corresponding to opening and drawing need be defined. Clearly other methods may be desirable, such as the editing commands implemented by paint programs for static images, the editing commands implemented by word processors for texts, etc. Yet consider the problem of editing a text: the usual actions, such as inserting typed input, are only relevant over a certain range of spatial scales relative to the underlying document. If we have zoomed out so far that the text is no longer legible, then interactive editing is no longer possible. It can also be argued that interactive editing is no longer possible if we have zoomed so far in that a single letter fills the entire screen. Hence a zooming user interface may also restrict the action of certain methods to their relevant levels of detail.
When a visual document is not represented internally as an image, but as more abstract data — such as text, spreadsheet entries, or vector graphics — it is necessary to generalize the tiling concept introduced in the previous section. For still images, the process of rendering a tile, once obtained, is trivial, since the information (once decompressed) is precisely the pixel-by-pixel contents of the tile. The speed bottleneck, moreover, is normally the transfer of compressed data to the computer (e.g. downloading). However, in some cases the speed bottleneck is in the rendition of tiles; the information used to make the rendition may already be in the cache, or may be very compact, so that downloading no longer causes delay. Hence we will refer to the production of a finished, fully drawn tile in response to a "tile drawing request" as tile rendition, with the understanding that this may be a slow process. Whether it is slow because the required data are substantial and must be downloaded over a slow connection or because the rendition process is itself computationally intensive is irrelevant.
A complete zooming user interface combines these ideas in such a way that the user is able to view a large and possibly dynamic composite document, whose sub-documents are usually spatially non-overlapping. These sub-documents may in turn contain (usually non-overlapping) sub-sub-documents, and so on. Hence documents form a tree, a structure in which each document has pointers to a collection of sub-documents, or children, each of which is contained within the spatial boundary of the parent document. We call each such document a node, borrowing from programming terminology for trees. Although drawing methods are defined for all nodes at all levels of detail, other methods corresponding to application-specific functionality may be defined only for certain nodes, and their action may be restricted only to certain levels of detail. Hence some nodes may be static images which can be edited using painting-like commands, while other nodes may be editable text, while other nodes may be Web pages designed for viewing and clicking. All of these can coexist within a common large spatial environment — a "supernode" — which can be navigated by zooming and panning.
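The node tree can be sketched as a simple data structure; the class and field names below are hypothetical and serve only to illustrate spatial containment and the restriction of application-specific methods to relevant levels of detail:

    class Node:
        def __init__(self, bounds, min_lod=None, max_lod=None):
            self.bounds = bounds        # (x, y, w, h) within the parent node
            self.children = []          # usually non-overlapping sub-documents
            self.min_lod, self.max_lod = min_lod, max_lod

        def add_child(self, child):
            self.children.append(child)

        def active_at(self, lod):
            # Application-specific methods (editing, clicking, etc.) act only
            # when the current level of detail falls in the node's relevant range.
            return ((self.min_lod is None or lod >= self.min_lod) and
                    (self.max_lod is None or lod <= self.max_lod))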
There are a number of immediate consequences for a well-implemented zooming user interface, including:
- - It is able to browse very large documents without downloading them in their entirety from the repository; thus even documents larger than the available short-term memory, or whose size would otherwise be prohibitive, can be viewed without limitation.
- - Content is only downloaded as needed during navigation, resulting in optimally efficient use of the available bandwidth.
- - Zooming and panning are spatially intuitive operations, allowing large amounts of information to be organized in an easily understood way.
- - Since "screen space" is essentially unlimited, it is not necessary to minimize windows, use multiple desktops, or hide windows behind each other to work on multiple documents or views at once. Instead, documents can be arranged as desired, and the user can zoom out for an overview of all of them, or in on particular ones. This does not preclude the possibility of rearranging the positions (or even scales) of such documents to allow any combination of them to be visible at a useful scale on the screen at the same time. Neither does it necessarily preclude combining zooming with more traditional approaches.
- - Because zooming is an intrinsic aspect of navigation, content of any kind can be viewed at an appropriate spatial scale.
- - High-resolution displays no longer imply shrinking text and images to small (sometimes illegible) sizes; depending on the level of zooming, they either allow more content to be viewed at once, or they allow content to be viewed at normal size and higher fidelity.
- - The vision impaired can easily navigate the same content as normally sighted people, simply by zooming in farther.
These benefits are particularly valuable in the wake of the explosion in the amount of information available to ordinary computers connected to the Web. A decade ago, the kinds of very large documents which a ZUI enables one to view were rare, and moreover such documents would have taken up so much space that very few would have fit on the repositories available to most computers (e.g., a 40MB hard disk). Today, however, we face a very different situation: servers can easily store vast documents and document hierarchies, and make this information available to any client connected to the Web. Yet the bandwidth of the connection between these potentially vast repositories and the ordinary user is far lower than the bandwidth of the connection to a local hard disk. This is precisely the scenario in which the ZUI confers its greatest advantages over conventional graphical user interfaces.
Detailed description of the invention
In the following we use two variable names, f and g. f refers to the sampling density of a tile relative to the display. This sampling density or "relative level of detail", which is zoom-dependent, is given by f = (linear tile size in tile pixels)/(projected tile length on the screen measured in screen pixels). If f = 1, then tile pixels are 1:1 with screen pixels; if f = 10, then the information in the tile is far more detailed than the display can show (10x10=100 tile pixels fit inside a single screen pixel); and if f = 0.1 then the tile is coarse relative to the display (every tile pixel must be "stretched", or interpolated, to cover 10x10=100 display pixels). Tiling granularity, which we will write as the variable g, is defined as the ratio of the linear tiling grid size at a higher LOD to the linear tiling grid size at the next lower LOD. In the JPEG2000 example considered in the previous section, g=2: conceptually, each tile "breaks up" into 2x2=4 tiles at the next higher LOD. Granularity 2 is by far the most common in similar applications, but in the present context g may take other values.
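Using these definitions, computing f and locating the two LODs that bracket a view is straightforward. The following sketch (function names hypothetical) assumes LOD k is downsampled from full resolution by a factor of g**k:

    import math

    def sampling_density(tile_pixels, projected_screen_pixels):
        # The quantity f defined above: tile pixels per projected screen pixel.
        return tile_pixels / projected_screen_pixels

    def bracketing_lods(view_scale, g=2.0):
        # view_scale: content pixels per screen pixel at the current zoom.
        # Returns (finer LOD, coarser LOD) = (floor(k), ceil(k)); the finer
        # LOD has f >= 1 and the coarser f <= 1. In practice k would also be
        # clamped to the range of levels actually available.
        k = math.log(view_scale, g)
        return math.floor(k), math.ceil(k)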
Note that the level of detail scheme described thus far involves a fixed, discrete set of LODs at different scales separated by factors of the granularity g. The image drawn on any region of the display is then usually a weighted blend between two levels of detail, one of which is somewhat finer than the display resolution (f > 1) and one of which is somewhat coarser (f < 1) (although more generally, the present invention also applies if an image region is transiently a single undersampled LOD, or a blend between more than two LODs). This scheme, unmodified, produces visually compelling results for content defined by sampled images, such as digital photographs or video. However, much of the content users interact with regularly is instead defined vectorially; this includes text, as well as combinations of lines, rectangles, circles, and other vector primitives. What is special about vectorial graphic elements is that they involve mathematically exact edges; control over the values of single display pixels is then generally necessary to produce an accurate result. The same is true of digital fill patterns, such as a checkerboard of black and white pixels. This kind of visual content is not well reproduced by the blending methods described thus far. Examples of the resulting visual artifacts are shown in Figure 3. These artifacts include blurriness, unexpected changes in blurriness with changing scale, and Moire patterns that shift during zooms. The images shown all have small pixel dimensions, and are blown up to show clearly what happens at the pixel level. Figure 3(a) is an example of text rendered in pure black and white (without antialiasing); (b) is the same text rendered with antialiasing; (c) is a pattern of closely spaced lines; (d) is a checkerboard fill pattern of alternating black and white pixels. The bottom row of images shows the LOD blending effects on the exact images in the top row. Clearly the edge blurring of the blended text in (a) is inferior to the result of pixel-precise antialiasing of the top image in (b). If, on the other hand, the text begins properly antialiased, LOD blending further blurs it, again resulting in a suboptimal image. Hence (a) and (b) do not produce terrible results, but the exact version is clearly better in each case. The other two images produce more serious errors, including spurious Moire patterns, shimmer, speckling, and blurring.
The present invention defines a programmatic tile-drawing method for nodes with this kind of content, allowing them to render special exact tiles that map precisely to display pixels, i.e. with f = 1 exactly. Simply calling this method at every frame refresh during a zoom or pan would in general be far too slow; the exact drawing method may easily take several frames to execute, and in some cases could be much slower still. This is not specific to the exact drawing method; the "ordinary" drawing method may also be slow, in particular as it may involve downloading information at low bandwidth, or carrying out an extended calculation. However, the targets of the "ordinary" drawing method are tiles which remain relevant over an entire range of pans and zooms, making it possible to implement queueing and resolution fallback schemes which allow smooth navigation and graceful image quality degradation even if tile rendition is slow and asynchronous. By contrast, exact tiles are of perfect quality, but are specific to a particular view. We therefore adopt a hybrid approach, in which the strengths of the discrete LOD representation are leveraged to enable responsive navigation even under unfavorable circumstances (i.e. low bandwidth or, more generally, slow tile-drawing methods), while the exact drawing method is used for visual accuracy. This is done by requesting exact tiles using the exact drawing method when the user is at a standstill; these requests are queued after all relevant fixed levels of detail. Thus exact tiles are a final stage of display refinement. Exact tiles are "throwaways", in the sense that they become unusable when the user pans or zooms, since it is unlikely that the user will pan or zoom back to precisely the old view. Note that not only zooming, but also panning, invalidates exact tiles, because for a tile to be exact it is not only necessary for the scale f to be exactly one; it is also necessary for the tile pixels to align exactly with the display pixels. Exact alignment is therefore lost during a pan, unless the pan is by an integral number of display pixels. Panning and zooming therefore discard any cached exact tiles; only when the view comes to a standstill are requests for new exact tiles queued. When exact tiles become available, they are blended into the display "on top of" (i.e. obscuring) the underlying "inexact" tiles. The blending occurs over time, avoiding sudden changes in sharpness.
The overall effect of the present invention is that navigation performance remains unchanged when panning or zooming over large volumes of text or other vector graphics; during such navigation, the rendered image is less than ideal, but because the image is in motion, in the vast majority of cases the degradation is not noticeable. On coming to a standstill, exact tiles are requested and blended in foveally as they arrive, resulting in a sharpening of the image, beginning near the center of the display and spreading outward. Spatial and temporal blending can generally make the progress of this sharpening difficult to detect by eye, but the resulting image is of perfect quality, i.e. unaffected by the blurring and blending operations that allow a ZUI architecture based on the continuous interpolation between discrete levels of detail to operate.
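The exact-tile lifecycle just described (discard on motion, queue at standstill, fade in over several frames) can be sketched as follows; all names are hypothetical and the actual drawing and queueing machinery is abstracted away:

    class ExactTileManager:
        def __init__(self, render_exact, fade_frames=10):
            self.render_exact = render_exact   # slow, view-specific drawing method
            self.fade_frames = fade_frames
            self.exact_tiles = {}              # tile key -> (image, opacity)

        def on_view_changed(self):
            # Any pan or zoom breaks pixel alignment, invalidating exact tiles.
            self.exact_tiles.clear()

        def on_standstill(self, visible_keys):
            # In practice these renditions would be queued behind all relevant
            # fixed-LOD requests; they are produced synchronously here for brevity.
            for key in visible_keys:
                if key not in self.exact_tiles:
                    self.exact_tiles[key] = (self.render_exact(key), 0.0)

        def per_frame(self, draw_blend):
            for key, (img, opacity) in list(self.exact_tiles.items()):
                opacity = min(1.0, opacity + 1.0 / self.fade_frames)  # gradual sharpening
                self.exact_tiles[key] = (img, opacity)
                draw_blend(img, opacity)       # composite over the inexact tiles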
Title: SYSTEM AND METHOD FOR FOVEATED, SEAMLESS,
PROGRESSIVE RENDERING IN A ZOOMING USER INTERFACE
Inventor: BLAISE HILARY AGUERA Y ARCAS
Field of the Invention
The present invention relates generally to zooming user interfaces (ZUIs) for computers. More specifically, the invention is a system and method for progressively rendering arbitrarily large or complex visual content in a zooming environment while maintaining good user responsiveness and high frame rates. Although it is necessary in some situations to temporarily degrade the quality of the rendition to meet these goals, the present invention largely masks this degradation by exploiting well-known properties of the human visual system.
Background of the invention
Most present-day graphical computer user interfaces (GUIs) are designed using visual components of fixed spatial scale. However, it was recognized from the birth of the field of computer graphics that visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out. The desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets. Even when viewing ordinary documents, such as spreadsheets and reports, it is often useful to be able to glance at a document overview, then zoom in on an area of interest. Many modern computer applications include zoomable components, such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc. In most cases, these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally. Although continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom continuously is almost invariably absent. In a more generalized zooming framework, any kind of visual content could be zoomed, and zooming would be as much a part of the user's experience as panning. Ideas along these lines made appearances as futuristic computer user interfaces in many movies even as early as the 1960s[1]; recent movies continue the trend[2]. A number of continuously zooming interfaces have been conceived and/or developed, from the 1970s through the present.[3] In 1991, some of these ideas were formalized in U.S. Patent 5,341,466 by Kenneth Perlin and Jacob Schwartz at New York University ("Fractal Computer User Centerface with Zooming Capability"). The prototype zooming user interface developed by Perlin and co-workers, Pad, and its successor, Pad++, have undergone some development since[4]. To my knowledge, however, no major application based on a full ZUI (Zooming User Interface) has yet appeared on the mass market, due in part to a number of technical shortfalls, one of which is addressed in the present invention.
1 e.g. Stanley Kubrick's 2001: A Space Odyssey, Turner Entertainment Company, a Time Warner company (1968).
2 e.g. Steven Spielberg's Minority Report, 20th Century Fox and Dreamworks Pictures (2002).
3 An early appearance is W.C. Donelson, Spatial Management of Information, Proceedings of Computer Graphics SIGGRAPH (1978), ACM Press, p. 203-9. A recent example is Zanvas.com, which launched in the summer of 2002.
Summary of the invention
The present invention embodies a novel idea on which a newly developed zooming user interface framework (hereafter referred to by its working name, Voss) is based. Voss is more powerful, more responsive, more visually compelling and of more general utility than its predecessors due to a number of innovations in its software architecture. This patent is specifically about Voss's approach to object tiling, level-of-detail blending, and render queueing.
A multiresolution visual object is normally rendered from a discrete set of sampled images at different resolutions or levels of detail (an image pyramid). In some technological contexts where continuous zooming is used, such as 3D gaming, two adjacent levels of detail which bracket the desired level of detail are blended together to render each frame, because it is not normally the case that the desired level of detail is exactly one of those represented by the discrete set. Such techniques are sometimes referred to as trilinear filtering or mipmapping. In most cases, mipmapped image pyramids are premade, and kept in short-term memory (i.e. RAM) continuously during the zooming operation; thus any required level of detail is always available. In some advanced 3D rendering scenarios, the image pyramid must itself be rendered within an animation loop; however, in these cases the complexity of this first rendering pass must be carefully controlled, so that overall frame rate does not suffer.
4 Perlin describes subsequent developments at http://mrl.nyu.edu/projects/zui/.
In the present context, it is desirable to be able to navigate continuously by zooming and panning through an unlimited amount of content of arbitrary visual complexity. This content may not render quickly, and moreover it may not be available immediately, but may need to be downloaded from a remote location over a low-bandwidth connection. It is thus not always possible to render levels of detail (first pass) at a frame rate comparable to the desired display frame rate (second pass). Moreover it is not in general possible to keep pre-made image pyramids in memory for all content; image pyramids must be rendered or re-rendered as needed, and this rendering may be slow compared to the desired frame rate.
The present invention involves both strategies for prioritizing the (potentially slow) rendition of the parts of the image pyramid relevant to the current display, and strategies for presenting the user with a smooth, continuous perception of the rendered content based on partial information, i.e. only the currently available subset of the image pyramid. In combination, these strategies make near-optimal use of the available computing power or bandwidth, while masking, to the extent possible, any image degradation resulting from incomplete image pyramids. Spatial and temporal blending are exploited to avoid discontinuities or sudden changes in image sharpness.
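One plausible prioritization consistent with the strategy above is to serve coarse levels of detail first and, within a level, the tiles nearest the center of the view, so that imagery sharpens foveally as renditions complete. A sketch (all names hypothetical):

    import heapq

    def queue_tile_requests(tiles, view_center):
        # tiles: iterable of (lod, center_x, center_y, key); lower lod = coarser.
        # Coarser LODs drain first; within an LOD, tiles closest to the view
        # center come first.
        heap = []
        for lod, cx, cy, key in tiles:
            dist = (cx - view_center[0]) ** 2 + (cy - view_center[1]) ** 2
            heapq.heappush(heap, (lod, dist, key))
        return heap

    def next_request(heap):
        # Pop the highest-priority pending tile rendition, if any.
        return heapq.heappop(heap)[2] if heap else None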
An objective of the present invention is to allow sampled (i.e. "pixellated") visual content to be rendered in a zooming user interface without degradation in ultimate image quality relative to conventional trilinear interpolation.
A further objective of the present invention is to allow arbitrarily large or complex visual content to be viewed in a zooming user interface.
A further objective of the present invention is to enable near-immediate viewing of arbitrarily complex visual content, even if this content is ultimately represented using a very large amount of data, and even if these data are stored at a remote location and shared over a low-bandwidth network.
A further objective of the present invention is to allow the user to zoom arbitrarily far in on visual content while maintaining interactive frame rates.
A further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex visual content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
A further objective of the present invention is to minimize the user's perception of transitions between levels of detail or rendition qualities during interaction.
A further objective of the present invention is to allow the graceful degradation of image quality by continuous blurring when detailed visual content is as yet unavailable, either because the information needed to render it is unavailable, or because rendition is still in progress.
A further objective of the present invention is to gracefully increase image quality by gradual sharpening when renditions of certain parts of the visual content first become available.
These and other objectives of the present invention will become apparent to those skilled in the art from a review of the specification that follows.
Prior art: multiresolution imagery and zooming user interfaces
From a technical perspective, zooming user interfaces are a generalization of the usual concepts underlying visual computing, allowing a number of limitations inherent in the classical user/computer/document interaction model to be overcome. One such limitation is on the size of a document that can be "opened" from a computer application, as traditionally the entirety of such a document must be "loaded" before viewing or editing can begin. Even when the amount of short-term memory (normally RAM) available to a particular computer is large, this limitation is felt, because all of the document information must be transferred to short-term memory from some repository (e.g. from a hard disk, or across a network) during opening; limited bandwidth can thus make the delay between issuing an "open" command and being able to begin viewing or editing unacceptably long.
Still digital images both provide an excellent example of this problem, and an illustration of how the computer science community has moved beyond the standard model for visual computing in overcoming the problem. Table 1 below shows download times at different bandwidths for typical compressed sizes of a variety of different image types, from the smallest useful images (thumbnails, which are sometimes used as icons) to the largest in common use today. Shaded boxes indicate image sizes for which interactive browsing is difficult or impossible at a particular connection speed.
Table 1.
                                                   time to download
image description                   typical size       LAN        DSL        wireless/modem**
                                    (MB, compressed*)  (10Mbit)   (500Kbit)  (40Kbit)
thumbnail image                     0.001              < 1 msec   0.02 sec   0.2 sec
web-resolution snapshot             0.025              0.02 sec   0.4 sec    5 sec
medium-resolution image             0.1                0.08 sec   1.6 sec    20 sec
photo-quality image                 0.5                0.4 sec    8 sec      1.7 min
full-page magazine img.             2.5                2 sec      40 sec     8.3 min
fine art or map scan                10                 8 sec      2.7 min    33.3 min
road atlas of Wash., DC             40                 32 sec     10.7 min   2.2 hr
small aerial photomontage           100                1.3 min    26.7 min   5.6 hr
large aerial photomontage           1000               13.3 min   4.4 hr     2.3 days
night sky, 6" telescope resolution  10000              2.2 hr     1.9 days   23.1 days
*Note that these figures represent realistic compressed sizes at intermediate quality, not raw image data. Specifically, we assume 1 bit/pixel for the sizes up to 40MB, and 0.25 bits/pixel for the larger images, which are generally more compressible.
**Local wireless networks may be considerably faster; this figure refers to wireless wide- area networks of the type often used for wireless PDAs.
Nearly every image on the Web at present is under 100K (0.1MB), because most users are connected to the Web at DSL or lower bandwidth, and larger images would take too long to download. Even in a local setting, on a typical user's hard drive, it is unusual to encounter images larger than 500K (0.5MB). That larger (that is, more detailed) images would often be useful is attested to by the fact that illustrated books, atlases, maps, newspapers and artworks in the average home include a great many images which, if digitized at full resolution, would easily be tens of megabytes in size.
Several years ago the dearth of large images was largely due to a shortage of storage space in repositories, but advances in hard drive technology, the ease of burning CDROMs, and the increasing prevalence of large networked servers have made repository space no longer the limiting factor. The main bottleneck now is bandwidth, followed by short-term memory (i.e. RAM) space. The problem is in reality much worse than suggested by the table above, because in most contexts the user is interested not only in viewing a single image, but an entire collection of images; if the images are larger than some modest size, then it becomes impractical to wait while one image downloads after another.
Modern image compression standards, such as JPEG2000[5], are designed to address precisely this problem. Rather than storing the image contents in a linear fashion (that is, in a single pass over the pixels, normally from top to bottom and left to right), they are based on a multiresolution decomposition. The image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1. Obviously the fine details are only captured at the higher resolutions, while the broad strokes are captured — using a much smaller amount of information — at the low resolutions. This is why the differently-sized images are often called levels of detail, or LODs for short. At first glance it may seem as if the storage requirements for this series of differently-sized images might be greater than for the high-resolution image alone, but in fact this is not the case: a low-resolution image serves as a "predictor" for the next higher resolution. This allows the entire image hierarchy to be encoded very efficiently — more efficiently, in fact, than would usually be possible with a non-hierarchical representation of the high-resolution image alone.
5 http://www.jpeg.org/JPEG2000.html
If one imagines that the sequence of multiresolution versions of the image is stored in order of increasing size in the repository, then a natural consequence is that as the image is transferred across the data link to the cache, the user can obtain a low- resolution overview of the entire image very rapidly; finer and finer details will then "fill in" as the transmission progresses. This is known as incremental or progressive transmission. Properly implemented, it has the property that any image at all — no matter how large — can be viewed in its spatial entirety (though not in its full detail) almost immediately, even if the bandwidth of the connection to the repository is very modest. Although the ultimate amount of time needed to download the image in full detail remains the same, the order in which this information is sent has been changed such that the large-scale features of an image are transmitted first; this is much more helpful to the user than transmitting pixel information at full detail and in "reading order", from top to bottom and left to right.
Hidden in this advance is a new concept of what it means to "open" an image which does not fit into the classical application model described in the previous section. We are now imagining that the user is able to view an image as it downloads, a concept whose usefulness arises from the fact that the broad strokes of the image are available very soon after download begins, and perhaps well before downloading is finished. It therefore makes no sense for the application to force the user to wait while downloading finishes; the application should instead display what it can of the document immediately, and not cause delays or unnecessarily interrupt its interaction with the user while it continues downloading the details "in the background". This requires that the application do more than one task at once, which is termed multithreading. Note that most modern web browsers use multithreading in a slightly different capacity: to simultaneously download images on a web page, while displaying the web page's textual layout and remaining responsive to the user in the meantime. In this case we can think about the embedded images themselves as being additional levels of detail, which enhance the basic level of detail comprised of the web page's bare-bones text layout. This analogy will prove important later.
Clearly hierarchical image representation and progressive transmission of the image document are an advance over linear representation and transmission. However, a further advance becomes important when an image, at its highest level of detail, has more information (i.e. more pixels) than the user's display can show at once. With current display technology, this is always the case for the bottom four kinds of images in the Table 1, but smaller displays (such as PDA screens) may not be able to show even the bottom eight. This makes a zooming feature imperative for large images: it is useless to view an image larger than the display if it is not possible to zoom in to discover the additional detail.
When a large image begins to download, presumably the user is viewing it in its entirety. The first levels of detail are often so coarse that the displayed image will appear either blocky or blurry, depending on the kind of interpolation used to spread the small amount of information available over a large display area. The image will then refine progressively, but at a certain point it will "saturate" the display with information, making any additional detail downloaded have no visible effect. It therefore makes no sense to continue the download beyond this point at all. Suppose, however, that the user decides to zoom in to see a particular area in much more detail, making the effective projected size of the image substantially larger than the physical screen. Then, in the downloading model described in the previous section, higher levels of detail would need to be downloaded, in increasing order. The difficulty is that every level of detail contains approximately four times the information of the previous level of detail; as the user zooms in, the downloading process will inevitably lag behind. Worse, most of the information being downloaded is wasted, as it consists of high-resolution detail outside the viewing area. Clearly, what is needed is the ability to download only selected parts of certain levels of detail — that is, only the detail which is visible should be downloaded. With this alteration, an image browsing system can be made that is not only capable of viewing images of arbitrarily large size, but is also capable of navigating (i.e. zooming and panning) through such images efficiently at any level of detail.
Previous models of document access are by nature serial, meaning that the entirety of an information object is transmitted in linear order. This model, by contrast, is random-access, meaning that only selected parts of the information object are requested, and these requests may be made in any order and over an extended period of time, i.e. over the course of a viewing session. The computer and the repository now engage in an extended dialogue, paralleling the user's "dialogue" with the document as viewed on the display.
To make random access efficient, it is convenient (though not absolutely required) to subdivide each level of detail into a grid, such that a grid square, or tile, is the basic unit of transmission. The size in pixels of each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile. The resulting tiled image pyramid is shown in Figure 2. Note that the "tip" of the pyramid, where the downscaled image is smaller than a single tile, looks like the untiled image pyramid of Figure 1. The JPEG2000 image format includes all of the features just described for representing tiled, multiresolution and random-access images.
Thus far we have considered only the case of static images, but the same techniques, with application-specific modifications, can be applied to nearly any type of visual document. This includes (but is not limited to) large texts, maps or other vector graphics, spreadsheets, video, and mixed documents such as web pages. Our discussion thus far has also implicitly considered a viewing-only application, i.e. one in which only the actions or methods corresponding to opening and drawing need be defined. Clearly other methods may be desirable, such as the editing commands implemented by paint programs for static images, the editing commands implemented by word processors for texts, etc. Yet consider the problem of editing a text: the usual actions, such as inserting typed input, are only relevant over a certain range of spatial scales relative to the underlying document. If we have zoomed out so far that the text is no longer legible, then interactive editing is no longer possible. It can also be argued that interactive editing is no longer possible if we have zoomed so far in that a single letter fills the entire screen. Hence a zooming user interface may also restrict the action of certain methods to their relevant levels of detail.
When a visual document is not represented internally as an image, but as more abstract data — such as text, spreadsheet entries, or vector graphics — it is necessary to generalize the tiling concept introduced in the previous section. For still images, the process of rendering a tile, once obtained, is trivial, since the information (once decompressed) is precisely the pixel-by-pixel contents of the tile. The speed bottleneck, moreover, is normally the transfer of compressed data to the computer (e.g. downloading). However, in some cases the speed bottleneck is in the rendition of tiles; the information used to make the rendition may already be stored locally, or may be very compact, so that downloading no longer causes delay. Hence we will refer to the production of a finished, fully drawn tile in response to a tile drawing request as tile rendition, with the understanding that this may be a slow process. Whether it is slow because the required data are substantial and must be downloaded over a slow connection or because the rendition process is itself computationally intensive is irrelevant.
A complete zooming user interface combines these ideas in such a way that the user is able to view a large and possibly dynamic composite document, whose sub-documents are usually spatially non-overlapping. These sub-documents may in turn contain (usually non-overlapping) sub-sub-documents, and so on. Hence documents form a tree, a structure in which each document has pointers to a collection of sub-documents, or children, each of which is contained within the spatial boundary of the parent document. We call each such document a node, borrowing from programming terminology for trees. Although drawing methods are defined for all nodes at all levels of detail, other methods corresponding to application-specific functionality may be defined only for certain nodes, and their action may be restricted only to certain levels of detail. Hence some nodes may be static images which can be edited using painting-like commands, while other nodes may be editable text, while other nodes may be Web pages designed for viewing and clicking. All of these can coexist within a common large spatial environment — a "supernode" — which can be navigated by zooming and panning.
There are a number of immediate consequences for a well-implemented zooming user interface, including:
- It is able to browse very large documents without downloading them in their entirety from the repository; thus even documents larger than the available short-term memory, or whose size would otherwise be prohibitive, can be viewed without limitation.
- Content is only downloaded as needed during navigation, resulting in optimally efficient use of the available bandwidth.
- Zooming and panning are spatially intuitive operations, allowing large amounts of information to be organized in an easily understood way.
- Since "screen space" is essentially unlimited, it is not necessary to minimize windows, use multiple desktops, or hide windows behind each other to work on multiple documents or views at once. Instead, documents can be arranged as desired, and the user can zoom out for an overview of all of them, or in on particular ones. This does not preclude the possibility of rearranging the positions (or even scales) of such documents to allow any combination of them to be visible at a useful scale on the screen at the same time. Neither does it necessarily preclude combining zooming with more traditional approaches.
- Because zooming is an intrinsic aspect of navigation, content of any kind can be viewed at an appropriate spatial scale.
- High-resolution displays no longer imply shrinking text and images to small (sometimes illegible) sizes; depending on the level of zooming, they either allow more content to be viewed at once, or they allow content to be viewed at normal size and higher fidelity.
- The vision impaired can easily navigate the same content as normally sighted people, simply by zooming in farther.
These benefits are particularly valuable in the wake of the explosion in the amount of information available to ordinary computers connected to the Web. A decade ago, the kinds of very large documents which a ZUI enables one to view were rare, and moreover such documents would have taken up so much space that very few would have fit on the repositories available to most computers (e.g., a 40MB hard disk). Today, however, we face a very different situation: servers can easily store vast documents and document hierarchies, and make this information available to any client connected to the Web. Yet the bandwidth of the connection between these potentially vast repositories and the ordinary user is far lower than the bandwidth of the connection to a local hard disk. This is precisely the scenario in which the ZUI confers its greatest advantages over conventional graphical user interfaces.

Detailed description of the invention
For a particular view of a node at a certain desired resolution, there is some set of tiles, at a certain LOD, which would need to be drawn for the rendition to include at least one sample per screen pixel. Note that views do not normally fall precisely at the resolution of one of the node's LODs, but rather at an intermediate resolution between two of them. Hence, ideally, in a zooming environment the client generates the set of visible tiles at both of these LODs — just below and just above the actual resolution — and uses some interpolation to render the pixels on the display based on this information. The most common scenario is linear interpolation, both spatially and between levels of detail; in the graphics literature, this is usually referred to as trilinear interpolation. Closely related techniques are commonly used in 3D graphics architectures for texturing.⁶
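For concreteness, the selection of the two bracketing LODs and the blend fraction between them can be sketched as follows. This is an illustrative Python fragment, not part of the reference implementation; it assumes view_scale is the view's magnification relative to the coarsest LOD and g is the granularity defined below.

import math

def bracketing_lods(view_scale, g=2.0):
    level = math.log(view_scale, g)   # fractional LOD of the view
    lower = math.floor(level)         # coarser bracketing LOD
    upper = math.ceil(level)          # finer bracketing LOD
    t = level - lower                 # blend weight for the finer LOD (0..1)
    return lower, upper, t

# A view at 2.8x magnification with g = 2 falls between LODs 1 and 2;
# the finer level is blended in with weight ~0.49.
print(bracketing_lods(2.8))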
Unfortunately, downloading (or programmatically rendering) tiles is often slow, and especially during rapid navigation, not all the necessary tiles will be available at all times. The innovations in this patent therefore focus on a combination of strategies for presenting the viewer with a spatially and temporally continuous and coherent image that approximates this ideal image, in an environment where tile download or creation is happening slowly and asynchronously.
In the following we use two variable names, f and g. f refers to the sampling density of a tile relative to the display, defined in #1. Tiling granularity, which we will write as the variable g, is defined as the ratio of the linear tiling grid size at some LOD to the linear tiling grid size at the next lower LOD. This is in general presumed to be constant over different levels of detail for a given node, although none of the innovations presented here rely on constant g. In the JPEG2000 example considered in the previous section, g=2: conceptually, each tile "breaks up" into 2x2=4 tiles at the next higher LOD. Granularity 2 is by far the most common in similar applications, but in the present context g may take other values.

6 S. L. Tanimoto and T. Pavlidis, A hierarchical data structure for picture processing, Computer Graphics and Image Processing, Vol. 4, p. 104-119 (1975); Lance Williams, Pyramidal Parametrics, ACM SIGGRAPH Conference Proceedings (1982).
1. Level of detail tile request queuing. We first introduce a system and method for queuing tile requests that allows the client to bring a composite image gradually "into focus", by analogy with optical instruments.
Faced with the problem of an erratic, possibly low-bandwidth connection to an information repository containing hierarchically tiled nodes, a zooming user interface must address the problem of how to request tiles during navigation. In many situations, it is unrealistic to assume that all such requests will be met in a timely manner, or even that they will be met at all during the period when the information is relevant (i.e. before the user has zoomed or panned elsewhere). It is therefore desirable to prioritize tile requests intelligently.
The "outermost" rule for tile request queuing is increasing level of detail relative to the display. This "relative level of detail", which is zoom-dependent, is given by the number/= (linear tile size in tile pixels)/(projected tile length on the screen measured in screen pixels). If/= 1, then tile pixels are 1 : 1 with screen pixels; if/=10, then the information in the tile is far more detailed than the display can show (10*10=100 tile pixels fit inside a single screen pixel); and if/=0.1 then the tile is coarse relative to the display (every tile pixel must be "stretched", or interpolated, to cover 10*10=100 display pixels). This rule ensures that, if a region of the display is undersampled (i.e. only coarsely defined) relative to the rest of the display, the client's first priority will be to fill in this "resolution hole". If more than one level of detail is missing in the hole, then requests for all levels of detail with/< 1, plus the next higher level of detail (to allow LOD blending — see #5), are queued in increasing order. At first glance, one might suppose that this introduces unnecessary overhead, because only the finest of these levels of detail is strictly required to render the current view; the coarser levels of detail are redundant, in that they define a lower-resolution image on the display. However, these coarser levels cover a larger area — in general, an area considerably larger than the display. The coarsest level of detail for any node in fact includes only a single tile by construction, so a client rendering any view of a node will invariably queue this "outermost" tile first.
This is an important point for viewing robustness. By robustness we mean that the client is never "at a loss" regarding what to display in response to a user's panning and zooming, even if there is a large backlog of tile requests waiting to be filled. The client simply displays the best (i.e. highest resolution) image available for every region on the display. At worst, this will be the outermost tile, which is the first tile ever requested in connection with the node. Therefore, every spatial part of the node will always be renderable based on the first tile request alone; all subsequent tile requests can be considered incremental refinements.
Falling back on lower-resolution tiles creates the impression of blurring the image; hence the overall effect is that the display may appear blurry after a sizeable pan or zoom. Then, as tile requests are filled, the image sharpens. A simple calculation shows that the overhead created by requesting "redundant" lower-resolution tiles is in fact minor — in particular, it is a small price to pay for the robustness of having the node image well-defined everywhere from the start.
2. Foveated tile request queuing. Within a relative level of detail, tile requests are queued by increasing distance to the center of the screen, as shown in Figure 3. This technology is inspired by the human eye, which has a central region — the fovea — specialized for high resolution. Because zooming is usually associated with interest in the central region of the display, foveated tile request queuing usually reflects the user's implicit prioritization for visual information during inward zooms. Furthermore, because the user's eye generally spends more time looking at regions near the center of the display than the edge, residual blurriness at the display edge is less noticeable than near the center.
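Combining this rule with the level-of-detail ordering of #1, pending tile requests can be kept in a priority queue keyed first on f and then on distance to the screen center. The sketch below is illustrative only: it assumes each pending tile carries its relative level of detail f, its projected center in screen coordinates, and an integer id, none of which are field names from the original disclosure.

import heapq, math

def order_tile_requests(tiles, screen_center):
    # Coarser-than-display tiles are served first (increasing f); within a
    # level of detail, tiles nearer the screen center are served first.
    heap = []
    for t in tiles:
        dx = t['center'][0] - screen_center[0]
        dy = t['center'][1] - screen_center[1]
        heapq.heappush(heap, (t['f'], math.hypot(dx, dy), t['id']))
    return [heapq.heappop(heap) for _ in range(len(heap))]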
The transient, relative increase in sharpness near the center of the display produced by zooming in using foveal tile request order also mirrors the natural consequences of zooming out — see Figure 4. The figure shows two alternate "navigation paths": in the top row, the user remains stationary while viewing a single document (or node) occupying about two thirds of the display, which we assume can be displayed at very high resolution. Initially the node contents are represented by a single, low-resolution tile; then tiles at the next LOD become available, making the node contents visible at twice the resolution with four (=2x2) tiles; 4x4=16 and 8x8=64 tile versions follow. In the second row, we follow what happens if the user were to zoom in on the shaded square before the image displayed in the top row is fully refined. Tiles at higher levels of detail are again queued, but in this case only those that are partially or fully visible. Refinement progresses to a point comparable to that of the top row (in terms of number of visible tiles on the display). The third row shows what is available if the user then zooms out again, and how the missing detail is filled in. Although all levels of detail are shown, note that in practice the very fine levels would probably be omitted from the displays on the bottom row, since they represent finer details than the display can convey.
Note that zooming out normally leaves the center of the display filled with more detailed tiles than the periphery. Hence this ordering of tile requests consistently prioritizes the sharpness of the central area of the display during all navigation.
3. Temporal LOD blending. Without further refinements, when a tile needed for the current display is downloaded or constructed and drawn for the first time, it will immediately obscure part of an underlying, coarser tile presumably representing the same content; the user experiences this transition as a sudden change in blurriness in some region of the display. Such sudden transitions are unsightly, and unnecessarily draw the user's attention to details of the software's implementation. Our general approach to ZUI design is to create a seamless visual experience for the user, which does not draw attention to the existence of tiles or other aspects of the software which should remain "under the hood". Therefore, when tiles first become available, they are not displayed immediately, but blended in over a number of frames — typically over roughly one second. The blending function may be linear (i.e. the opacity of the new tile is a linear function of time since the tile became available, so that halfway through the fixed blend-in interval the new tile is 50% opaque), exponential, or follow any other interpolating function. In an exponential blend, every small constant interval of time corresponds to a constant percent change in the opacity; for example, the new tile may become 20% more opaque at every frame, which results in the sequence of opacities over consecutive frames 20%, 36%, 49%, 59%, 67%, 74%, 79%, 83%, 87%, 89%, 91%, 93%, etc. Mathematically, the exponential never reaches 100%, but in practice, the opacity becomes indistinguishable from 100% after a short interval. An exponential blend has the advantage that the greatest increase in opacity occurs near the beginning of the blending-in, which makes the new information visible to the user quickly while still preserving acceptable temporal continuity. In our reference implementation, the illusion created is that regions of the display come smoothly into focus as the necessary information becomes available.
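The exponential variant can be stated precisely: each frame closes a fixed fraction of the remaining gap to full opacity. The sketch below reproduces the sequence of opacities quoted above; the 20% rate and the frame count are the example values from the text, not fixed constants of the method.

def exponential_blend(rate=0.20, frames=12):
    # Each frame, the incoming tile's opacity closes `rate` of the remaining
    # distance to 100%, so the largest steps come first.
    opacity, seq = 0.0, []
    for _ in range(frames):
        opacity += rate * (1.0 - opacity)
        seq.append(round(opacity, 2))
    return seq

print(exponential_blend())
# -> [0.2, 0.36, 0.49, 0.59, 0.67, 0.74, 0.79, 0.83, 0.87, 0.89, 0.91, 0.93]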
4. Continuous LOD. In a situation in which tile download or creation is lagging behind the user's navigation, adjacent regions of the display may have different levels of detail. Although the previous innovation (#3) addresses the problem of temporal discontinuity in level of detail, a separate innovation is needed to address the problem of spatial discontinuity in level of detail. If uncorrected, these spatial discontinuities are visible to the user as seams in the image, with visual content drawn more sharply to one side of the seam. We resolve this problem by allowing the opacity of each tile to be variable over the tile area; in particular, this opacity is made to go to zero at a tile edge if this edge abuts a region on the display with a lower relative level of detail. It is also important in some situations to make the opacity at each corner of the tile go to zero if the corner touches a region of lower relative level of detail.
Figure 5 shows our simplest reference implementation for how each tile can be decomposed into rectangles and triangles, called tile shards, such that opacity changes continuously over each tile shard. Tile X, bounded by the square aceg, has neighboring tiles L, R, T and B on the left, right, top and bottom, each sharing an edge. It also has neighbors TL, TR, BL and BR sharing a single corner. Assume that tile X is present. Its "inner square", iiii, is then fully opaque. (Note that repeated lowercase letters indicate identical vertex opacity values.) However, the opacity of the surrounding rectangular frame is determined by whether the neighboring tiles are present (and fully opaque). Hence if tile TL is absent, then point g will be fully transparent; if L is absent, then points h will be fully transparent, etc. We term the border region of the tile (X outside iiii) the blending flaps.
Figure 6 illustrates the reference method used to interpolate opacity over a shard. Part (a) shows a constant opacity rectangle. Part (b) is a rectangle in which the opacities of two opposing edges are different; then the opacity over the interior is simply a linear interpolation based on the shortest distance of each interior point from the two edges. Part (c) shows a barycentric method for interpolating opacity over a triangle, when the opacities of all three corners abc may be different. Conceptually, every interior point p subdivides the triangle into three sub-triangles as shown, with areas A, B and C. The opacity at p is then simply a weighted sum of the opacities at the corners, where the weights are the fractional areas of the three sub-triangles (i.e. A, B and C divided by the total triangle area A+B+C). It is easily verified that this formula identically gives the opacity at a vertex when p moves to that vertex, and that if p is on the triangle edge then its opacity is a linear interpolation between the two connected vertices.
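This area-weighted rule is standard barycentric interpolation, and is compact enough to state in full (an illustrative sketch; points are coordinate pairs, and the names are ours):

def area(p, q, r):
    # Unsigned area of triangle pqr via the cross product.
    return abs((q[0]-p[0]) * (r[1]-p[1]) - (r[0]-p[0]) * (q[1]-p[1])) / 2.0

def opacity_at(p, a, b, c, oa, ob, oc):
    # Weight each corner opacity by the fractional area of the sub-triangle
    # opposite that corner, exactly as described above.
    A, B, C = area(p, b, c), area(p, c, a), area(p, a, b)
    return (A*oa + B*ob + C*oc) / (A + B + C)

# At a vertex the formula returns that vertex's opacity exactly:
print(opacity_at((0, 0), (0, 0), (1, 0), (0, 1), 0.0, 1.0, 1.0))   # -> 0.0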
Since the opacity within a shard is determined entirely by the opacities at its vertices, and neighboring shards always share vertices (i.e. there are no T-junctions), this method ensures that opacity will vary smoothly over the entire tiled surface. In combination with the temporal LOD blending of #3, this strategy causes the relative level of detail visible to the user to be a continuous function, both over the display area and in time. Both spatial seams and temporal discontinuities are thereby avoided, presenting the user with a visual experience reminiscent of an optical instrument bringing a scene continuously into focus. For navigating large documents, the speed with which the scene comes into focus is a function of the bandwidth of the connection to the repository, or the speed of tile rendition, whichever is slower. Finally, in combination with the foveated prioritization of innovation #2, the continuous level of detail is biased in such a way that the central area of the display is brought into focus first.
5. Generalized linear-mipmap-linear LOD blending. We have discussed strategies and reference implementations for ensuring spatial and temporal smoothness in apparent LOD over a node. We have not yet addressed, however, the manner in which levels of detail are blended during a continuous zooming operation. The method used is a generalization of trilinear interpolation, in which adjacent levels of detail are blended linearly over the intermediate range of scales. At each level of detail, each tile shard has an opacity as drawn, which has been spatially averaged with neighboring tile shards at the same level of detail for spatial smoothness, and temporally averaged for smoothness over time. The target opacity is 100% if the level of detail undersamples the display, i.e. f < 1 (see #1). However, if it oversamples the display, then the target opacity is decreased linearly (or using any other monotonic function) such that it goes to zero if the oversampling is g-fold. Like trilinear interpolation, this causes continuous blending over a zoom operation, ensuring that the perceived level of detail never changes suddenly. However, unlike conventional trilinear interpolation — which always involves a blend of two levels of detail — the number of blended levels of detail in this scheme can be one, two, or more. A number larger than two is transient, and caused by tiles at more than one level of detail not having been fully blended in temporally yet. A single level is also usually transient, in that it normally occurs when a lower-than-ideal LOD is "standing in" at 100% opacity for higher LODs which have yet to be downloaded or constructed and blended in.
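The target-opacity rule can be made concrete; the sketch below uses a linear falloff in f between 1 and g, which is one admissible choice among the monotonic functions mentioned above:

def target_opacity(f, g=2.0):
    # f <= 1: the LOD undersamples the display and is fully opaque.
    # f >= g: the LOD oversamples the display g-fold and is fully transparent.
    if f <= 1.0:
        return 1.0
    if f >= g:
        return 0.0
    return (g - f) / (g - 1.0)   # linear fade in between

print(target_opacity(1.5))   # -> 0.5, halfway between f = 1 and f = g = 2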
The simplest reference implementation for rendering the set of tile shards for a node is to use the so-called "painter's algorithm": all tile shards are rendered in back-to-front order, that is, from coarsest (lowest LOD) to finest (highest LOD which oversamples the display less than g-fold). The target opacities of all but the highest LOD are 100%, though they may transiently be rendered at lower opacity if their temporal blending is incomplete. The highest LOD has variable opacity, depending on how much it oversamples the display, as discussed above. Clearly this reference implementation is not optimal, in that it may render shards which are then fully obscured by subsequently rendered shards. More optimal implementations are possible through the use of data structures and algorithms analogous to those used for hidden surface removal in 3D graphics.
6. Motion anticipation. During rapid zooming or panning, it is especially difficult for tile requests to keep up with demand. Yet during these rapid navigation patterns, the zooming or panning motion tends to be locally well-predicted by linear extrapolation (i.e. it is difficult to make sudden reversals or changes in direction). Thus we exploit this temporal motion coherence to generate tile requests slightly ahead of time, thus improving visual quality. This is accomplished by making tile requests using a virtual viewport which elongates, dilates or contracts in the direction of motion when panning or zooming, thus pre-empting requests for additional tiles. When navigation ceases, the virtual viewport relaxes over a brief interval of time back to the real viewport.
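A minimal sketch of the virtual viewport for the panning case follows; rectangles are given as (x0, y0, x1, y1), and the lead-time constant and relaxation policy are illustrative assumptions rather than prescribed values.

def virtual_viewport(view, pan_velocity, lead_time=0.25):
    # Elongate the real viewport in the direction of panning motion, so that
    # tiles just beyond the leading edge are requested slightly ahead of time.
    x0, y0, x1, y1 = view
    dx = pan_velocity[0] * lead_time
    dy = pan_velocity[1] * lead_time
    return (x0 + min(dx, 0.0), y0 + min(dy, 0.0),
            x1 + max(dx, 0.0), y1 + max(dy, 0.0))

# Panning right at 400 units/s extends the right edge by 100 units:
print(virtual_viewport((0.0, 0.0, 1024.0, 768.0), (400.0, 0.0)))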
Note that none of the above innovations are restricted to rectangular tilings; they generalize in an obvious fashion to any tiling pattern which can be defined on a grid, such as triangular or hexagonal tiling, or heterogeneous tilings consisting of mixtures of such shapes, or entirely arbitrary tilings. The only explicit change which needs to be made to accommodate such alternate tilings is to define triangulations of the tile shapes analogous to those of Figure 5, such that the opacities of the edges and the interior can all be controlled independently.
[Figure 1: construction of composite node grid from superimposed irrational LOD tilings — (a) finest LOD; (b) next-finest LOD, g = sqrt(3); (c) composite grid. Legend: fine (a) tile available; partially obscured coarse (b) tile; unobscured coarse (b) tile; fine tile unavailable in (a).]

Title: SYSTEM AND METHOD FOR THE EFFICIENT, DYNAMIC AND CONTINUOUS DISPLAY OF MULTIRESOLUTION VISUAL DATA
Inventor: BLAISE HILARY AGÜERA Y ARCAS
Field of the Invention
The present invention relates generally to multiresolution imagery. More specifically, the invention is a system and method for efficiently blending together visual representations of content at different resolutions or levels of detail in real time. The method ensures perceptual continuity even in highly dynamic contexts, in which the data being visualized may be changing, and only partial data may be available at any given time. The invention has applications in a number of fields, including (but not limited to) zooming user interfaces (ZUIs) for computers.
Background of the invention
In many situations involving the display of complex visual data, these data are stored or computed hierarchically, as a collection of representations at different levels of detail (LODs). Many multiresolution methods and representations have been devised for different kinds of data, including (for example, and without limitation) wavelets for digital images, and progressive meshes for 3D models. Multiresolution methods are also used in mathematical and physical simulations, in situations where a possibly lengthy calculation can be performed more "coarsely" or more "finely"; this invention also applies to such simulations, and to other situations in which multiresolution visual data may be generated interactively. Further, the invention applies in situations in which visual data can be obtained "on the fly" at different levels of detail, for example, from a camera with machine-controllable pan and zoom. The present invention is a general approach to the dynamic display of such multiresolution visual data on one or more 2D displays (such as CRTs or LCD screens).
In explaining the invention we will use as our main example the wavelet decomposition of a large digital image (e.g. as used in the JPEG2000 image format). This decomposition takes as its starting point the original pixel data, normally an array of samples on a regular rectangular grid. Each sample usually represents a color or luminance measured at a point in space corresponding to its grid coordinates. In some applications the grid may be very large, e.g. tens of thousands of samples (pixels) on a side, or more. This large size can present considerable difficulties for interactive display, especially when such images are to be browsed remotely, in environments where the server (where the image is stored) is connected to the client (where the image is to be viewed) by a low-bandwidth connection. If the image data are sent from the server to the client in simple raster order, then all the data must be transmitted before the client can generate an overview of the entire image. This may take a long time. Generating such an overview may also be computationally expensive, perhaps, for example, requiring downsampling a 20,000x20,000 pixel image to 500x500 pixels. Not only are such operations too slow to allow for interactivity, but they also require that the client have sufficient memory to store the full image data, which in the case just cited is 1.2 gigabytes (GB) for an 8-bit RGB color image (= 3 × 20,000^2 bytes). Nearly every image on the Web at present is under 100K (0.1 MB), because most users are connected to the Web at DSL or lower bandwidth, and larger images would take too long to download. Even in a local setting, on a typical user's hard drive, it is unusual to encounter images larger than 500K (0.5 MB). That larger (that is, more detailed) images would often be useful is attested to by the fact that illustrated books, atlases, maps, newspapers and artworks in the average home include a great many images which, if digitized at full resolution, would easily be tens of megabytes in size.
Several years ago the dearth of large images was largely due to a shortage of non-volatile storage space (repository space), but advances in hard drive technology, the ease of burning CDROMs, and the increasing prevalence of large networked servers have made repository space no longer the limiting factor. The main bottlenecks now are bandwidth, followed by short-term memory (i.e. RAM) space.
Modern image compression standards, such as JPEG2000¹, are designed to address precisely this problem. Rather than storing the image contents in a linear fashion (that is, in a single pass over the pixels, normally from top to bottom and left to right), they are based on a multiresolution decomposition. The image is first resized to a hierarchy of resolution scales, usually in factors of two; for example, a 512x512 pixel image is resized to be 256x256 pixels, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, and 1x1. We refer to the factor by which each resolution differs in size from the next higher — here 2 — as the granularity, which we represent by the variable g. The granularity may change at different scales, but here, for example and without limitation, we will assume that g is constant over the "image pyramid". Obviously the fine details are only captured at the higher resolutions, while the broad strokes are captured — using a much smaller amount of information — at the low resolutions. This is why the differently-sized images or scales are often called levels of detail, or LODs for short. At first glance it may seem as if the storage requirements for this series of differently-sized images might be greater than for the high-resolution image alone, but in fact this is not the case: a low-resolution image serves as a "predictor" for the next higher resolution. This allows the entire image hierarchy to be encoded very efficiently — more efficiently, in fact, than would usually be possible with a non-hierarchical representation of the high-resolution image alone.

1 http://www.jpeg.org/JPEG2000.html
If one imagines that the sequence of multiresolution versions of the image is stored in order of increasing size in a server's repository, then a natural consequence is that if the image is transferred from a server to a client, the client can obtain a low-resolution overview of the entire image very rapidly; finer and finer details will then "fill in" as the transmission progresses. This is known as incremental or progressive transmission, and is one of the major advantages of multiresolution representations. When progressive transmission is properly implemented, any image at all — no matter how large — can be viewed by a client in its spatial entirety (though not in its full detail) almost immediately, even if the bandwidth of the connection to the server is very modest. Although the ultimate amount of time needed to download the image in full detail remains the same, the order in which this information is sent has been changed such that the large-scale features of an image are transmitted first; this is much more helpful to the client than transmitting pixel information at full detail and in "reading order", from top to bottom and left to right. To make random access efficient in a dynamic and interactive context, it is convenient (though not absolutely required) to subdivide each level of detail into a grid, such that a grid square, or tile, is the basic unit of transmission. The size in pixels of each tile can be kept at or below a constant size, so that each increasing level of detail contains about four times as many tiles as the previous level of detail. Small tiles may occur at the edges of the image, as its dimensions may not be an exact multiple of the nominal tile size; also, at the lowest levels of detail, the entire image will be smaller than a single nominal tile. Hence if we assume 64x64 pixel tiles, the 512x512 pixel image considered earlier has 8x8 tiles at its highest level of detail, 4x4 at the 256x256 level, 2x2 at the 128x128 level, and a single tile at the remaining levels of detail. The JPEG2000 image format includes the features just described for representing tiled, multiresolution and random-access images.
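The tile counts in this example follow directly from the pyramid construction, as the following sketch shows (a square image is assumed for brevity; names are illustrative):

import math

def tiles_per_lod(image_size=512, tile_size=64, g=2):
    # Number of tiles along one side at each LOD, finest to coarsest.
    sizes = []
    size = image_size
    while size >= 1:
        sizes.append(math.ceil(size / tile_size))
        size //= g
    return sizes

print(tiles_per_lod())   # -> [8, 4, 2, 1, 1, 1, 1, 1, 1, 1]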
If a detail of a large, tiled JPEG2000 image is being viewed interactively by a client on a 2D display of limited size and resolution, then some particular set of adjacent tiles, at a certain level of detail, are needed to produce an accurate rendition. In a dynamic context, however, these may not all be available. Tiles at coarser levels of detail often will be available, however, particularly if the user began with a broad overview of the image. Since tiles at coarser levels of detail span a much wider area spatially, it is likely that the entire area of interest is covered by some combination of available tiles. This implies that the image resolution available will not be constant over the display area.
In a previously filed provisional patent application, I have proposed methods for "fading out" the edges of tiles where they abut a blank space at the same level of detail; this avoids the abrupt visual discontinuity in sharpness that would otherwise result when the "coverage" of a fine level of detail is incomplete. The edge regions of tiles reserved for blending are referred to as blending flaps. The simple reference implementation for displaying a finished composite image is a "painter's algorithm": all relevant tiles (that is, tiles overlapping the display area) in the coarsest level of detail are drawn first, followed by all relevant tiles in progressively finer levels of detail. At each level of detail blending was applied at the edges of incomplete areas as described. The result, as desired, is that coarser levels of detail "show through" only in places where they are not obscured by finer levels of detail.
Although this simple algorithm works, it has several drawbacks: first, it is wasteful of processor time, as tiles are drawn even when they will ultimately be partly or even completely obscured. In particular, a simple calculation shows that each display pixel will often be (re)drawn log2(f) times, where f is the magnification factor of the display relative to the lowest level of detail. Second, this technique relies on compositing in the framebuffer — meaning that, at intermediate points during the drawing operation, the regions drawn do not have their final appearance; this makes it necessary to use double-buffering or related methods and perform the compositing off-screen to avoid the appearance of flickering resolution. Third, unless an additional compositing operation is applied, this technique can only be used for an opaque rendition — it is not possible, for example, to ensure that the final rendition has 50% opacity everywhere, allowing other content to "show through". This is because the painter's algorithm relies precisely on the effect of one "layer of paint" (i.e. level of detail) fully obscuring the one underneath; it is not known in advance where a level of detail will be obscured, and where not.

The Invention
The present invention resolves these issues, while preserving all the advantages of the painter's algorithm. One of these advantages is the ability to deal with any kind of LOD tiling, including non-rectangular or irregular tilings, as well as irrational grid tilings, for which I am filing a separate provisional patent application. Tilings generally consist of a subdivision, or tessellation, of the area containing the visual content into polygons. For a tiling to be useful in a multiresolution context it is generally desirable that the areas of tiles at lower levels of detail be larger than the areas of tiles at higher levels of detail; the multiplicative factor by which their sizes differ is the granularity g, which we will assume (but without limitation) to be a constant. In the following, an irrational but rectangular tiling grid will be used to describe the improved algorithm. Generalizations to other tiling schemes should be evident to anyone skilled in the art.
The improved algorithm consists of four stages. In the first stage, a composite grid is constructed in the image's reference frame from the superposition of the visible parts of all of the tile grids in all of the levels of detail to be drawn. When the irrational tiling innovation (detailed in a separate provisional patent application) is used, this results in an irregular composite grid, shown schematically in Figure 1. The grid is further augmented by grid lines corresponding to the x- and y-values which would be needed to draw the tile "blending flaps" at each level of detail (not shown in Figure 1, because the resulting grid would be too dense and visually confusing). This composite grid, which can be defined by a sorted list of x- and y-values for the grid lines, has the property that the vertices of all of the rectangles and triangles that would be needed to draw all visible tiles (including their blending flaps) lie at the intersection of an x and a y grid line. Let there be n grid lines parallel to the x-axis and m grid lines parallel to the y-axis. We then construct a two-dimensional n × m table, with entries corresponding to the squares of the grid. Each grid entry has two fields: an opacity, which is initialized to zero, and a list of references to specific tiles, which is initially empty.
The second stage is to walk through the tiles, sorted by decreasing level of detail (opposite to the naive implementation). Each tile covers an integral number of composite grid squares. For each of these squares, we check to see if its table entry has an opacity less than 100%, and if so, we add the current tile to its list and increase the opacity accordingly. The per-tile opacity used in this step is stored in the tile data structure. When this second stage is complete, the composite grid will contain entries corresponding to the correct pieces of tiles to draw in each grid square, along with the opacities with which to draw these "tile shards". Normally these opacities will sum to one. Low-resolution tiles which are entirely obscured will not be referenced anywhere in this table, while partly obscured tiles will be referenced only in tile shards where they are partly visible.
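A sketch of this second stage follows. The data layout — a mapping from LOD to (tile, covered squares, opacity) triples — is our illustration, not the disclosed data structure; the point is the finest-first walk and the opacity cap at 100%.

def assign_tile_shards(tiles_by_lod, grid_squares):
    table = {sq: {'opacity': 0.0, 'tiles': []} for sq in grid_squares}
    for lod in sorted(tiles_by_lod, reverse=True):          # finest LOD first
        for tile, squares, opacity in tiles_by_lod[lod]:
            for sq in squares:
                entry = table[sq]
                if entry['opacity'] < 1.0:
                    share = min(opacity, 1.0 - entry['opacity'])
                    entry['tiles'].append((tile, share))
                    entry['opacity'] += share
    return table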
The third stage of the algorithm is a traversal of the composite grid in which tile shard opacities at the composite grid vertices are adjusted by averaging with neighboring vertices at the same level of detail, followed by readjustment of the vertex opacities to preserve the summed opacity at each vertex (normally 100%). This implements a refined version of the spatial smoothing of scale described in a separate provisional patent application. The refinement comes from the fact that the composite grid is in general denser than the 3x3 grid per tile defined in innovation #4, especially for low-resolution tiles. (At the highest LOD, by construction, the composite gridding will be at least as fine as necessary.) This allows the averaging technique to achieve greater smoothness in apparent level of detail, in effect by creating smoother blending flaps consisting of a larger number of tile shards.
Finally, in the fourth stage the composite grid is again traversed, and the tile shards are actually drawn. Although this algorithm involves multiple passes over the data and a certain amount of bookkeeping, it results in far better performance than the naïve algorithm, because much less drawing must take place in the end; every tile shard rendered is visible to the user, though sometimes at low opacity. Some tiles may not be drawn at all. This contrasts with the naïve algorithm, which draws every tile intersecting with the displayed area in its entirety.
An additional advantage of this algorithm is that it allows partially transparent nodes to be drawn, simply by changing the total opacity target from 100% to some lower value. This is not possible with the naïve algorithm, because every level of detail except the most detailed must be drawn at full opacity in order to completely "paint over" any underlying, still lower resolution tiles.
When the view is rotated in the x-y plane relative to the node, some minor changes need to be made for efficiency. The composite grid can be constructed in the usual manner; it may be larger than the grid would have been for the unrotated case, as larger coordinate ranges are visible along a diagonal. However, when walking through tiles, we need only consider tiles that are visible (by the simple intersecting polygon criterion). Also, composite grid squares outside the viewing area need not be updated during the traversal in the second or third stages, or drawn in the fourth stage. Note that a number of other implementation details can be modified to optimize performance; the algorithm is presented here in a form that makes its operation and essential features easiest to understand. A graphics programmer skilled in the art can easily add the optimizing implementation details. For example, it is not necessary to keep a list of tiles per tile shard; instead, each level of detail can be drawn immediately as it is completed, with the correct opacity, thus requiring only the storage of a single tile identity per shard at any one time. Another exemplary optimization is that the total opacity rendering left to do, expressed in terms of (area) × (remaining opacity), can be tracked, so that the algorithm can quit early if everything has already been drawn; then low levels of detail need not be "visited" at all if they are not needed.
The algorithm can be generalized to arbitrary polygonal tiling patterns by using a constrained Delaunay triangulation instead of a grid to store vertex opacities and tile shard identifiers. This data structure efficiently creates a triangulation whose edges contain every edge in all of the original LOD grids; accessing a particular triangle or vertex is an efficient operation, which can take place in O(n log n) time (where n is the number of vertices or triangles added). The resulting triangles are moreover the basic primitive used for graphics rendering on most graphics platforms.
Title: SYSTEM AND METHOD FOR INFINITE PRECISION COORDINATES IN A ZOOMING USER INTERFACE
Inventor: BLAISE HILARY AGUERA Y ARCAS
Field of the Invention
The present invention relates generally to zooming user interfaces (ZUIs) for computers. More specifically, the invention is a system and method for efficiently representing and navigating through zoomable content using hierarchical data structures which allow the content to have effectively infinite precision spatial positioning and size. This enables zoomable environments of unlimited scale or depth.
Background of the invention
Most present-day graphical computer user interfaces (GUIs) are designed using visual components of fixed spatial scale. However, it was recognized from the birth of the field of computer graphics that visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out. The desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets. Even when viewing ordinary documents, such as spreadsheets and reports, it is often useful to be able to glance at a document overview, then zoom in on an area of interest. Many modern computer applications include zoomable components, such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc. In most cases, these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally. Although continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom continuously is almost invariably absent. In a more generalized zooming framework, any kind of visual content could be zoomed, and zooming would be as much a part of the user's experience as panning. Ideas along these lines made appearances as futuristic computer user interfaces in many movies even as early as the 1960s¹; recent movies continue the trend². A number of continuously zooming interfaces have been conceived and/or developed, from the 1970s through the present³. In 1991, some of these ideas were formalized in U.S. Patent 5,341,466 by Kenneth Perlin and Jacob Schwartz at New York University ("Fractal Computer User Centerface with Zooming Capability"). The prototype zooming user interface developed by Perlin and co-workers, Pad, and its successor, Pad++, have undergone some development since⁴. To my knowledge, however, no major application based on a full ZUI (Zooming User Interface) has yet appeared on the mass market, due in part to a number of technical shortfalls, one of which is addressed in the present invention.
1 e.g. Stanley Kubrick's 2001: A Space Odyssey, Turner Entertainment Company, a Time Warner company (1968).
2 e.g. Steven Spielberg's Minority Report, 20th Century Fox and DreamWorks Pictures (2002).
3 An early appearance is W. G. Donelson, Spatial Management of Information, Proceedings of Computer Graphics SIGGRAPH (1978), ACM Press, p. 203-9. A recent example is Zanvas.com, which launched in the summer of 2002.
4 Perlin describes subsequent developments at http://mrl.nyu.edu/projects/zui/.

Summary of the invention
The present invention embodies a novel idea on which a newly developed zooming user interface framework (hereafter referred to by its working name, Voss) is based. Voss is more powerful, more responsive, more visually compelling and of more general utility than its predecessors due to a number of innovations in its software architecture. This patent is specifically about Voss's approach to coordinate systems and navigation.
Most graphics architectures take as their point of departure a 2D coordinate system, which defines a point in two dimensional (2D) space as a pair of numbers, usually termed the x- and y-coordinates (x,y), representing horizontal and vertical displacements from the origin, (0,0). Occasionally 2D points are also represented using non-Cartesian coordinate systems, such as polar coordinates; the substantive aspects of the following discussion apply equally to any such coordinate systems. In the field of three dimensional (3D) graphics, a 3D coordinate system defined by a triplet of numbers (x,y,z) is normally used to represent points in space; again, these may or may not be Cartesian coordinates. Because displays are normally two dimensional, view-dependent mathematical transformations are required to reduce three dimensional world coordinates to two dimensional screen coordinates. In either case, the coordinates being manipulated are normally represented using a numerical data type native to the computer, either integer or floating-point. Such data types normally use between 16 and 64 bits (binary digits) of memory. Because of their limited representational size, these numbers have limited precision — that is, their decimal expansions are only defined out to some limited number of significant places. In the case of 64-bit floating point numbers, this is about 15 decimal places.
When the coordinate system is "locked" to the display, i.e. each 2D coordinate pair (x,y) corresponds to a fixed point on the display surface, this precision is more than adequate. However, in the context of a zooming user interface, the user is easily able to zoom in, causing the area which previously covered a single pixel to fill the entire display, or zoom out, causing the contents of the entire display to shrink to the size of a single pixel. Each such zoom may effectively multiply or divide the (x,y) coordinates by a factor of approximately 1,000. Several such zooms in or out therefore exhaust the precision of any standard internal floating-point representation. (Five such zooming operations, for example, would completely exhaust the precision of a 64-bit floating point number. In this context, visual artifacts resulting from discretization or "roundoff error" would begin to be noticeable after three such zooms.) Yet in a zooming user interface it may be desirable to zoom in or out much farther. This implies that global or world coordinates cannot be stored in standard machine formats without severely restricting the scope for zooming.
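The arithmetic behind the five-zoom figure can be checked directly (an illustrative calculation, not part of the original text):

import math

digits = 53 * math.log10(2)       # ~15.95 significant decimal digits in a
                                  # 64-bit double (53-bit significand)
per_zoom = math.log10(1000)       # 3 digits consumed by each 1,000x zoom
print(digits, digits / per_zoom)  # ~15.95 digits, exhausted after ~5.3 zooms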
The present invention solves this problem by dispensing entirely with world coordinates. Instead, all zooming and panning operations are conducted in a tree (or more generally, a directed graph) of local coordinate systems which collectively define the zoomable "universe" of content. Content comprises a collection of nodes, which are themselves defined using a local 2D coordinate system of machine-precision coordinates. If standard 64-bit floating point numbers are used, a single node is thus limited to having about 15 decimal places of precision per coordinate, or, in terms of pixels, being at most about 10^14 pixels on a side. However, a node may be the parent of a number of child nodes, each of which is geometrically contained within the boundary of the parent. The child's size and position relative to the parent can be specified in the parent's local coordinate system, and thus fit into machine-precision numbers; the child may have its own local coordinate system, however, which allows it in turn to have (for example) resolution up to 10^14 pixels on a side. By thus embedding child nodes within child nodes, we have a construction which allows infinitely deep nesting of visual content, while still storing, manipulating, and performing geometric calculations only using machine-precision numbers. The main body of the patent further elucidates this scheme, outlining exemplary implementations of panning and zooming operations using the data structure just described.
An objective of the present invention is to allow a pannable and zoomable 2D space of finite "physical size", but arbitrarily high complexity or resolution, to be embedded into a well-defined area of a larger pannable and zoomable 2D space.
A further objective of the present invention is to allow geometric trees or directed graphs of visual objects to be constructed by the above embedding procedure, allowing such trees or graphs to become arbitrarily large and complex while retaining the ability to pan and zoom through the resulting space.
An objective of the present invention is therefore to allow fluid zooming and panning in a virtual 2D universe of potentially unlimited visual complexity and detail on ordinary, present-day computer architectures.
A further objective of the present invention is to mimic the behavior of infinite-precision arithmetic on the coordinates of a pannable and zoomable 2D space while still retaining the computational speed of coordinate calculations performed on native machine precision numbers. The software package Mathematica™ (© Wolfram Research) provides exemplary implementations of data structures and algorithms for infinite precision arithmetic (which however do not satisfy these same criteria).
A further objective of the present invention is to mimic the behavior of infinite-precision arithmetic on the coordinates of a pannable and zoomable 2D space while avoiding the large memory consumption of infinite-precision numbers.
A further objective of the present invention is to allow the embedding of reusable visual content into a zoomable and pannable 2D space by reference, i.e. without needing to update any coordinates or other data structures in the content to be embedded. Because this allows a 2D space to be embedded within another without a traversal of the new child's coordinate system tree, this capability enables the embedding of any 2D space, regardless of complexity.
A further objective of the present invention is to allow infinite nesting in zoomable and pannable content due to circular references: a node with content B may be a child of a node with content A (i.e. B appears geometrically in the interior of A), and node B may in turn contain a node with content A as a child. In a complex zoomable environment where visual content can be reused and included by reference, this type of recurrence can occur very easily. This generalizes the concept of a tree of nodes with associated coordinate systems to that of a directed graph of nodes with associated coordinate systems. A further objective of the present invention is to allow zooming out after a deep zoom-in to behave like the "back" button of a web browser, letting the user retrace his or her steps through a visual navigation.
A further objective of the present invention is to allow zooming in immediately after zooming out to behave analogously to the "forward" button of a web browser, letting the user precisely undo the effects of an arbitrarily long zoom-out.
A further objective of the present invention is to allow a node to have a very large number of child nodes (for example, up to 10^28).
A further objective of the present invention is to allow a node to generate its own children programmatically on the fly, enabling content to be defined, created or modified dynamically during navigation.
A further objective of the present invention is to enable near-immediate viewing of arbitrarily complex visual content, even if this content is ultimately represented using a very large amount of data, and even if these data are stored at a remote location and shared over a low-bandwidth network.
A further objective of the present invention is to allow the user to zoom arbitrarily far in on visual content while maintaining interactive frame rates.
A further objective of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex visual content, in the process both preserving the overall appearance of the content and maintaining interactive frame rates.
These and other objectives of the present invention will become apparent to those skilled in the art from a review of the specification that follows. Conventions
In what follows we will use several pseudo-code conventions. Data structures (sometimes known as abstract data types, or ADTs) will be introduced using the word Structure followed by material in curly braces {...}. Within the curly braces will be listed the fields, or data elements that comprise the structure, in the format DataType variableName where DataType is either a previously defined structure or a primitive type, and variableName is the field's name. Note that data types and functions will always begin with a capital letter, and variable names or field names will always begin with lowercase. The primitive types used will be Boolean (which can take on values true or false), Double (a 64-bit floating point number corresponding to C language's double type), Integer (a 64-bit signed integer data type), and String (a character string). The names of structures and variables, as well as details of the data types and formats used, are exemplary; alternate implementations of the present invention may alter any of these details, include any number of additional fields, or use different structures or internal representations.
For convenience we will immediately define the following structures, for storing (respectively) the positions of points on the 2D Cartesian plane and axis-aligned rectangles on the plane:

Structure Point2D {
    Double x;
    Double y;
}

Structure Rectangle {
    Point2D lo;
    Point2D hi;
}

We will assume (without restriction) a mathematical coordinate system, with the x-axis horizontal and increasing to the right, and y-axis vertical and increasing upward. The points lo and hi in Rectangle represent the lower-left and upper-right corners of a rectangular area. To reference fields of structures we will use the period ("."), as in the following pseudocode:

Boolean Function IsPointInBox(Point2D p, Rectangle r) {
    if p.x > r.lo.x & p.x < r.hi.x & p.y > r.lo.y & p.y < r.hi.y {
        return true;
    } else {
        return false;
    }
}

This function determines whether a point is within a rectangle. Statements such as return are equivalent to their C language counterparts. The syntax used should be understandable to anyone skilled in the art. Italics will be used within pseudo-code to indicate abstract or complex actions which are most easily described in English.
Finally, we define two container data types: Collection&lt;T&gt;, which stores an unordered set of objects of type T, and Stack&lt;T&gt;, which stores objects of type T on a last-in, first-out (LIFO) basis. To iterate over a collection, we will use the syntax

for all x in collection { ... do something with x ... }

where collection is of type Collection&lt;T&gt; and x stands in for each object of type T in the container; for each such object the code in the curly braces is executed. It is assumed that the order in which the objects are processed does not matter. For stacks, we define the following functions:

Function Push(Stack&lt;T&gt; stack, T t)
Function T Pop(Stack&lt;T&gt; stack)
Function Integer Count(Stack&lt;T&gt; stack)
Function T Element(Stack&lt;T&gt; stack, Integer n)

The Push function appends object t to stack, and the Pop function removes the last element pushed, returning it. Count returns the number of objects in the stack (an integer greater than or equal to zero), and Element looks up an element in the stack by index, returning the element but leaving the stack unaltered. Following the C convention, valid indices begin at zero for the first element and run up through Count(stack) - 1.
Detailed description of the invention
We assume a user interface metaphor in which the display is a camera, through which the user can view part of a two-dimensional surface, or 2D universe. For convenience, although it is not necessary to do so, we ascribe physical dimensions to this universe, so that it may be, for example, one meter square.
The universe in turn contains 2D objects, or nodes, which have a visual representation, and may also be dynamic or interactive (i.e. video clips, applications, editable text documents, CAD drawings, or still images). For a node to be visible it must be associated with a rendering method, which is able to draw it in whole or in part on some area of the display. Each node is endowed with a local coordinate system of finite precision. For illustrative purposes we will assume that a node is rectangular and represents its local coordinates using the Point2D and Rectangle data structures as defined above. Thus a Rectangle will define the boundaries of the local coordinate system. More generally, a node may be non-rectangular and/or use a different coordinate system. Thus we define

Structure Node {
    Rectangle coordSystem;
    ...
}

where the ellipsis (...) indicates that Node is to have other fields as well, which we will specify later. A minimal rendering method for a node (which we will give the data type Node) might thus require the following arguments:

Function RenderNode(Node node, Rectangle onDisplay, Rectangle onNode)

This exemplary function would render the part of node defined by the onNode rectangle (in the node's coordinate system) to the rectangle on the display defined by onDisplay (in display or "screen" coordinates). The rectangle onNode should in general be within node.coordSystem. For the rendition to be visible the rectangle onDisplay should be within a rectangle defining the boundaries of the display in display coordinates.
Each node may have zero or more child nodes, which it addresses by reference. This means that a node need not, and generally does not, contain all the information of each child node, but instead only an address providing the information necessary to obtain the child node. A URL (http://...) is an example of such an address, but addresses may take other forms, e.g. a pointer in memory, a globally unique identifier, a hardware port, etc. We define the abstract data type Address to represent an address generally, and a function

Function Node Dereference(Address address)

that returns a reference to a node, given its address.
In addition to the child node's address, a reference to a child must specify the child's size and location in the parent node's local coordinate system. Thus we have

Structure ChildReference {
    Address address;
    Rectangle placement;
}

where once again placement should be within the parent, i.e. inside the parent's coordSystem. Thus we expand the definition of a node:

Structure Node {
    Rectangle coordSystem;
    Collection&lt;ChildReference&gt; children;
}

Different nodes may share some or all of their children, but perhaps in a different spatial arrangement, allowing the possibility of different views of the same information.
We are now in a position to define in somewhat more detail the generic behavior of the node rendering method:

Function RenderNode(Node node, Rectangle onDisplay, Rectangle onNode) {
    ... draw node ...
    for all childRef in node.children {
        Rectangle childOnDisplay = CalcRectangle(onDisplay, childRef.placement);
        if Area(childOnDisplay) >= minimumArea {
            Node child = Dereference(childRef.address);
            RenderNode(child, childOnDisplay, child.coordSystem);
        }
    }
}

This function will now render a node and, recursively, its children. The CalcRectangle function computes the display rectangle occupied by a child given the parent's display rectangle and the child's placement within the parent. This rendering method is considerably simplified, assuming, for example, that if the parent is visible, all of its children are also visible. It satisfies important design criteria, however (a runnable sketch follows the list below):
1. Global coordinates are not used.
2. Assuming that child nodes are smaller than their parent nodes, the function terminates in finite time, because eventually child nodes will be smaller than
minimumArea. This is true even if there are an infinite number of nodes in the
tree.
3. If minimumArea is made sufficiently small, then the visual effect of the
truncated rendition may become negligible, because any nodes not drawn are too small to affect the overall appearance of the display.
4. Recurrence is allowed: a node may be its own descendant. Thus the directed graph of nodes defined by the "is a child of" relationship may have cycles (making it no longer a tree in the graph-theoretic sense). If children occupy a substantial portion of their parents' area, and a graph cycle is small (e.g. A->B->A or A->B->C->A), then this results in a hall-of-mirrors effect.
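By way of illustration only, the following minimal Python sketch captures this recursive rendering scheme and its termination criterion. The Rect and Node types, the unit-square placement convention, and all names here are our own simplifications, not part of the pseudocode above.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Rect:
    x: float; y: float; w: float; h: float
    def area(self): return self.w * self.h

@dataclass
class ChildRef:
    node: "Node"      # stands in for Dereference(childRef.address)
    placement: Rect   # child's rectangle in parent unit coordinates (0..1)

@dataclass
class Node:
    name: str
    children: List[ChildRef] = field(default_factory=list)

MINIMUM_AREA = 4.0    # square pixels; the truncation criterion

def calc_rectangle(parent_disp: Rect, placement: Rect) -> Rect:
    # Map a child's placement (in parent unit coordinates) to the display
    # rectangle it occupies, given the parent's display rectangle.
    return Rect(parent_disp.x + placement.x * parent_disp.w,
                parent_disp.y + placement.y * parent_disp.h,
                placement.w * parent_disp.w,
                placement.h * parent_disp.h)

def render_node(node: Node, on_display: Rect):
    print(f"draw {node.name} at {on_display}")    # "... draw node ..."
    for ref in node.children:
        child_disp = calc_rectangle(on_display, ref.placement)
        if child_disp.area() >= MINIMUM_AREA:
            render_node(ref.node, child_disp)

# Even a cyclic graph (the hall-of-mirrors case) terminates, because each
# level shrinks the display rectangle until it falls below MINIMUM_AREA:
a = Node("A")
a.children.append(ChildRef(a, Rect(0.25, 0.25, 0.5, 0.5)))
render_node(a, Rect(0.0, 0.0, 1024.0, 768.0))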
User interaction with a node, such as typing text into it, normally requires that the node be visible. A number of different models might be used for selecting the node with which an interaction is to take place: for example, the tab key may cycle through nodes, or the node under the mouse pointer may be the target. In any case, the number of nodes that are candidates for user interaction is of the same order as the number of nodes rendered, and thus finite. Methods analogous to the rendering function described above can be used to pass user interaction messages to nodes, which may affect their future behavior or appearance. This architecture is therefore sufficient to enable a node to be a complete software application, and not merely a static visual object.
In addition to viewing and interacting with nodes, the user may navigate using continuous zoom and pan operations. Zooming in means progressively enlarging a part of the content visible on the display so that it occupies more area on the display; a smaller physical area will then be visible, but in greater detail. Zooming out is the converse operation. Because we have assumed that the physical dimensions of the universe are bounded, zooming out is a bounded operation: once the entire universe is visible, further zooming out cannot bring any new content into view, but simply shrinks the universe to an area smaller than the entire display. It is therefore natural to define a root node as encompassing the entire universe; it has child nodes which are visible when fully zoomed out, these children have their own child nodes, etc. Because a child must be geometrically within the parent's boundary, child nodes are in general smaller than parent nodes. Each node, however, has its own local coordinate system, so this construction allows for a cascade of ever-finer coordinate systems, and thus for a universe of potentially infinite spatial resolution. This means that zooming in is not a bounded operation: if the node graph has cycles, one may zoom forever in a "content loop"; or more interestingly, the node graph may have a very large or even infinite number of nodes, allowing one to zoom in indefinitely while viewing new content all the time.
For the architecture to truly allow infinite resolution, it is necessary to be able to pan, zoom and render the display efficiently, without at any time traversing the (potentially infinite) node graph. We have seen that rendition of the area occupied by a node visible on the display will occur in finite time if we initially call the RenderNode
function on this node. We must now address the question of how the visible nodes can in general be found during dynamic zooming and panning.
This may be accomplished with the addition of a field to the node structure and an additional address stack data structure. The expanded Node definition is:

Structure Node {
    Rectangle coordSystem;
    Collection<ChildReference> children;
    Rectangle view;
}

The new view field represents, in node coordinates, the visible area of the node, that is, the image of the display rectangle in node coordinates. This rectangle may only partially overlap the node's area as defined by coordSystem, as when the node is partially off-screen. Clearly the view field cannot always be kept updated for every node, as we cannot necessarily traverse the entire directed graph of nodes in finite time. The stack structure is defined thus:

Stack<Address> viewStack;

where this stack is a global variable of the client (the computer connected to the display). For exemplary purposes we assume that navigation begins with an overview of a universe of content, defined by a root node; then this root node is pushed onto the viewStack,
and the root node's view field might be initialized to be the entire area of the root node, i.e.

rootNode.view = rootNode.coordSystem;
Push (viewStack, rootNode);

Schematically, the viewStack will specify the addresses of a sequence of nodes
"pierced" by a point relative to the display, which we will take in our exemplary implementation to be the center of the display. This sequence must begin with the root node, but may be infinite. We must therefore truncate the sequence, and we do so using the same criterion used in RenderNode: the sequence stops when the nodes "pierced"
become smaller than some minimum size, defined above as minimumArea. The
current view is then represented by the view fields of all of the nodes in the
viewStack, each of which specifies the current view in terms of the node's local
coordinate system. If a user has zoomed very deeply into a universe, then the detailed location of the display will be given most precisely by the view field of the last node in
the stack. The last element's view field does not, however, specify the user's viewpoint
relative to the entire universe, but only relative to its local coordinates. The view field of the root node, on the other hand, does specify where in the universe the user is looking, although due to roundoff and discretization error it is probable that the root node's view.lo and view.hi have collapsed to a point, and this point will only be a finite-precision approximation to the real view position. Nodes closer to the "fine end" of the viewStack thus specify the view position with increasing precision, but relative to
progressively smaller areas in the universe.
In this context, note that the naïve implementation of the RenderNode function as defined earlier is flawed, in that the CalcRectangle function calculates the display's overlap with each node beginning from the root node, using progressively the
placement field of each node traversed and the onDisplay argument, which is
passed down recursively. In a deep zoom, progressive loss of precision will make this calculation fail to give correct results. In a corrected version, the CalcRectangle
function is simply replaced by the node's view field. The problem then reduces to the
following: the views (i.e. view fields) of all visible nodes must be kept synchronized as
the user navigates through the universe, panning and zooming. Failure to keep them synchronized would result in the appearance of nodes moving on the display independently of each other, rather than behaving as a cohesive and physically consistent 2D surface.
Changing the view during any navigation operation proceeds as follows. Because the last node in the viewStack has the most precise representation of the view, the first
step is to alter the view field of this last node; this altered view is taken to be the correct
new view, and any other visible nodes must follow along. The second step is to propagate the new view "upward" toward the root node, which entails making progressively smaller and smaller changes to the view fields of nodes earlier in the
stack. If the user is deeply zoomed, then at some point in the upward propagation the alteration to the view may be so small that it ceases to be accurately representable; upward propagation stops at this node. At each stage of the upward propagation, the change is also propagated downward to other visible nodes, using the approach of the unmodified RenderNode pseudo-code. Hence, first, the last node's parent's view is
modified; then, in the downward propagation, the last node's "siblings". The next upward propagation modifies the grandparent's view, and the second downward propagation modifies first uncles, then first cousins. The downward propagation is halted, as before, when the areas of "cousin nodes" become smaller than minimumArea, or when a node falls entirely offscreen.
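The upward half of this propagation can be sketched as follows. This is a hypothetical simplification in Python, assuming each node's coordSystem is the unit square and representing each viewStack entry as a (view, placement-in-parent) pair, with rectangles as (x, y, w, h) tuples.

def child_view_to_parent(view, placement):
    # Map a view rectangle in child unit coordinates into parent
    # coordinates through the child's placement rectangle.
    x, y, w, h = view
    px, py, pw, ph = placement
    return (px + x * pw, py + y * ph, w * pw, h * ph)

def propagate_upward(stack, new_last_view):
    # stack: list of (view, placement_in_parent) pairs, root first; the
    # root's placement entry is unused and may be None.
    stack[-1] = (new_last_view, stack[-1][1])
    view = new_last_view
    for i in range(len(stack) - 2, -1, -1):
        parent_view = child_view_to_parent(view, stack[i + 1][1])
        if parent_view == stack[i][0]:
            break    # the change is no longer representable; stop here
        stack[i] = (parent_view, stack[i][1])
        view = parent_view
    return stack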
A panning operation may move the last node far enough away that it no longer belongs in the viewStack. Alternatively, zooming in might enlarge a child of the last
node above minimumArea, requiring a lengthening of the viewStack, or zooming
out might bring the last node's area below minimumArea, requiring a truncation of the
viewStack. In all of these cases the identity of the last node changes. These situations
are detected during the downward propagation, which may alter the viewStack
accordingly, potentially leaving it longer or shorter.
An extension of the idea is to avoid truncating the viewStack immediately in
response to a long outward zoom. Truncating the viewStack is only necessary if the user then pans. Although a long outward zoom will cause the view fields of deeply zoomed nodes to grow very large (and therefore numerically inaccurate), a field
Point2D viewCenter;
can be added to the Node structure, representing the central point of the view rectangle;
zooming without panning therefore does not alter the viewCenter field of any node. This construction allows zooming far outward to be followed immediately by zooming back in. Because the viewStack has been left intact, the user can then return to
precisely the starting view. This behavior is analogous to the "back" and "forward" buttons of a web browser: "back" is analogous to zooming out, and "forward" is analogous to zooming back in. In a web browser, if a user uses "back" to return to a previous web page, but then follows an alternate link, it is at this point that "forward" ceases to work. Following an alternate link is thus analogous to panning after zooming out.
The Zeno cache: a system for increasing the effectiveness of most-recently-used (MRU) caching for variably compressible data objects.
Introduction.
The infinite sum of the series y(n) = n^-p, n going from 1 to infinity, with p > 1 is finite. Similarly, the sum of y = 1/b^n is finite for b > 1. (For example, in the latter case, if b = 2, the sum is exactly 2.) The concept of convergent series like these can be used to implement a highly efficient form of data caching which we term the Zeno cache, named after the famous Zeno paradox. Zeno is a runner who is so quick that in one step (which, for the sake of argument, we could say he makes every second) he covers half the distance to the end of any racetrack; the paradox, of course, is that he never finishes the course, in spite of moving forward with every step. This paradox is easily related to the 1/b^n series above with b = 2, and summing from n = 2 to infinity.
Prior art.
"MRU caching", where MRU stands for "most recently used", is a well-known concept for implementing a client-side memory in a client-server system. It is assumed that the server has access to and can serve to a client a large number of data objects, which in the aggregate may occupy a large amount of memory. The available bandwidth between client and server is limited, however, so client requests for data objects to be sent from the server take time. If access to data objects is reasonably "coherent", meaning that objects which the client needed recently are likely to be needed again in the near future, then MRU caching is a way to greatly increase the efficiency of the client-server system.1 The idea is that the client sets aside some limited amount of memory (generally much less than would be needed to store all of the objects on the server), and stores in this memory as many of the most recently requested objects as will fit. When a new object is sent from the server to the client and the client's cache space has run out, the least recently used (LRU) object is erased from the cache to make room. "We will refer to the LRU and MRU ends of the cache; objects are always added at the MRU end, and erased from the LRU end. (Note that the physical layout of objects in memory need not correspond to the LRU-MRU layout; the architecture simply requires that it be possible for the client to find, insert and erase objects in the manner described here. The linear LRU-MRU arrangement is merely a conceptual convenience.) When the client needs a data object, then, the cache is first examined to see if the object is cached. If it is, then the cached representation is used, obviating the need for an expensive server request; usually, making use of a cached representation also "promotes" that object to the MRU end of the cache. The performance advantages of this scheme are obvious.
1 Generalizations of the concept of coherence as described above will be understood by one skilled in the art. Caching can confer performance advantages in any situation in which a client request for one object affects the probability distribution of the likelihoods of requesting other objects in the near future. Straightforward MRU caching is optimized for the case in which this alteration is simply modeled as an increased likelihood of requesting the same object again; but generalizations exist and the present invention can be extended to them.

Zeno caching concept.
In some situations, the data objects being served are compressible, which for our purposes will mean amenable to lossy data compression techniques. Lossy data compression is characterized by the ability to represent a data object with fewer bytes than the full representation; higher compression ratios (meaning more compression) correspond to higher distortion, or lower quality. For Zeno caching, the nature of the data and associated compression algorithm should have the following characteristics:
Required: compressed versions of the data should be suitable as stand-ins for the uncompressed data. Below a certain level of distortion, the compressed representations may be fully adequate, and above a certain level of distortion, they may be adequate as a stopgap while the client waits for uncompressed, lossless or higher-quality versions of the data.
In an enhanced embodiment, lower-quality representations are subsets of higher-quality representations, meaning that improving the representation quality at the client side using additional information available on the server side only requires sending the missing data or difference, not re-sending an entirely new version of the data. This avoids redundancy and hence substantially increases efficiency.
The enhanced embodiment above also usually implies that lowering the representation quality of an object merely involves throwing away some of the data, not re-compressing. This property is also important for efficiency.
In an enhanced embodiment, the compression technique scales from lossy to lossless (i.e. a perfect, or zero distortion representation). Combined with the above enhanced embodiments, this allows a perfect representation of a data object to be built up in steps, from highly lossy to lossless, at little or no extra total cost relative to sending across a lossless version initially.
An example of a data type and compression technology satisfying all of the above is wavelet compression of images, as exemplified by the JPEG2000 standard.
Given these properties, if memory were "continuous" (i.e. not discretized into bytes) then it would be possible in theory to cache an infinite number of data objects in a finite amount of memory, merely by enforcing the constraint that the compressed sizes of the objects follow a convergent series like those given at the beginning of the disclosure. Most recently used objects are represented at low distortion, and less recently used objects are compressed progressively more. The sum of the sizes of all objects can still be any finite number, as shown below.
[The two figures here restate the convergent bounds on total cache size: with object sizes following y(n) = n^-p (p > 1) or y(n) = 1/b^n (b > 1), the sum over all n remains finite.]
In practice, this scheme must be modified for two reasons:
- Memory is discrete, so that, for example, it is usually meaningless in practice to compress an object to a representation smaller than one bit.
- Enforcing the continuous curve of compression ratios described by one of the convergent formulas above would require visiting every object in the cache and reducing its representation size every time some space needs to be freed; this is a theoretically infinite number of operations (and in practice, could be a large number of operations).
Practical version.
The number of objects in the cache will in practice be finite, but by using the Zeno cache concept this number may be much larger than would be possible with conventional MRU caching. Further, cached objects will have the property that if recently used, they will be represented in the cache at high quality, and this quality will deteriorate progressively if the object has not been accessed recently. In this sense, Zeno caching behaves much like human memory.
Because memory is discrete and there is a minimum compressed size below which a compressed representation has no value to the user, cached representations will be subject to a maximum compression ratio. The absolute maximum number of objects that could be stored in the cache is thus the cache size divided by this minimum compressed size, assuming that the objects are all of equal size (of course, they need not be).
There are many ways to design a series which is bounded above by one of the theoretical equations given earlier (or any other convergent sum), and which therefore also has a finite sum. An additional constraint can also be introduced: that the likelihood of any given value repeating sequentially increases at higher n, in such a way that a fairly small number of discrete values of y are used. An example of such a series is
1, 1/4, 1/4, 1/16, 1/16, 1/16, 1/16, 1/64, 1/64, 1/64, 1/64, 1/64, 1/64, 1/64, 1/64, 1/256,
Clearly the sum of 1, two quarters, four sixteenths, eight sixty-fourths, etc. is 2, just like y = 1/2^n, but if we take the series out to n = 16000, only about log2(16000), or 14, values of y will be used. At n = 1 million, only about 20 values are used. This implies that when space has to be freed in the cache, only a small number of operations need to take place to keep the current residents of the cache "in line" with their quota: the great majority will already be compressed to the right size.
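This quota schedule is easy to generate. The following sketch (names illustrative) assigns position i in the cache, counted from the MRU end, the quota 1/4^k in runs of length 2^k, and checks the two properties claimed above.

def zeno_quota(i):
    # Positions 0; 1-2; 3-6; 7-14; ... share quotas 1, 1/4, 1/16, 1/64, ...
    level = (i + 1).bit_length() - 1
    return 1.0 / (4 ** level)

quotas = [zeno_quota(i) for i in range(16000)]
print(sum(quotas))        # approaches 2, like 1 + 1/2 + 1/4 + ...
print(len(set(quotas)))   # 14 distinct quota values at n = 16000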
Many other sequences also satisfy the necessary criteria. Additionally, it is possible to use series which are not theoretically convergent (i.e. whose sums are infinite), since in practice a finite number of terms will be summed in any case.

Generalizations.
Stochastic generalizations. Random algorithms can be used to improve the basic algorithm in a number of ways. The chief disadvantage of the 1, 2 x 1/4, 4 x 1/16, etc. series above arises from its strength: its small number of assumed values. Random choice can also be used to "squeeze" a random subset of the cache elements in a weighted fashion until some target amount of space is freed. This works because exact position in the cache becomes decreasingly important at large n. The amount of squeezing can also be (somewhat) random. Using random approaches like these can eliminate obvious discontinuities or thresholds in object quality.
Rather than just MRU/LRU, caching can also involve intelligent guessing about which objects might be needed next; thus objects less likely to be needed can be "squeezed" before objects with a higher likelihood of being needed in the future. This can be combined with a random algorithm.
Claim:
An MRU/LRU caching system substantially as described.
METHOD FOR SPATIALLY ENCODING LARGE TEXTS, METADATA, AND OTHER COHERENTLY ACCESSED NON-IMAGE DATA
Recently, image compression standards such as JPEG2000/JPIP1 have been introduced to meet a demanding engineering goal: to enable very large images (i.e. gigapixels in size) to be delivered incrementally or selectively from a server to a client over a low-bandwidth communication channel. When such images are being viewed at full resolution, only a limited region can fit on a client's graphical display at any given time; the new standards are geared toward selectively accessing such regions and sending across the communication channel only data relevant to the region. If this "region of interest" or ROI changes continuously, then a continuous dialogue between a client and server over a low-bandwidth channel can continue to keep the client's representation of the area inside the ROI accurate.
The present invention relates to an extension of these selectively decompressable image compression and transmission technologies to textual or other non-image data. In the simplest instantiation, imagine a large text, e.g. the book Ulysses, by James Joyce. We can format this text by putting each chapter in its own column, with columns for sequential chapters arranged left-to-right. Columns are assumed to have a maximum width in characters, e.g. 100. Figure 1 shows the entire text of Ulysses encoded as an image in this fashion, with each textual character corresponding to a single pixel. The pixel intensity value in Figure 1 is simply the ASCII code of the corresponding character. Because greyscale pixels and ASCII characters both fit in 8 bits (values 0-255), the correspondence between a pixel value and a character is quite natural. The full text of Ulysses in ordinary ASCII representation (i.e. as a .txt file) is 1.5MB, which may be too large to transmit in its entirety over a narrowband communication channel. The ASCII text-image of Figure 1, encoded as a lossless JPEG2000, is 2.2MB. This size would be somewhat reduced if the chapters of the book were more equal in length, resulting in less empty space (encoded as zeros) in the text-image. Much more important than the overall file size, however, is the ability of an ordinary JPIP server to serve this file to a client selectively and incrementally. Any client viewing the text at a zoom level sufficient to read the characters (this requires well over 1 pixel/character on the client-side display) can use JPIP to request only the relevant portion of the text. This operation is efficient, and adequate performance could be achieved for a reader of the text even with a very low bandwidth connection to the server, under conditions that would make it prohibitive to download the entire text.

1 See e.g. David Taubman's implementation of Kakadu, www.kakadusoftware.com. Taubman was on the JPEG2000 ISO standards committee.
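A rough sketch of the encoding step, with assumed details (column width, zero padding for unused pixels, the use of numpy), is:

import numpy as np

def text_to_image(chapters, col_width=100):
    # Each chapter becomes one column, col_width characters per row, with
    # chapter columns arranged left to right; unused pixels stay 0.
    wrapped = [[ch[i:i + col_width] for i in range(0, len(ch), col_width)]
               for ch in chapters]
    height = max(len(rows) for rows in wrapped)
    img = np.zeros((height, col_width * len(wrapped)), dtype=np.uint8)
    for c, rows in enumerate(wrapped):
        for r, line in enumerate(rows):
            data = line.encode("ascii", "replace")
            img[r, c * col_width : c * col_width + len(data)] = \
                np.frombuffer(data, dtype=np.uint8)
    return img   # save losslessly (e.g. as JPEG2000) with any JP2 encoder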
Note that similar effects could be achieved using a client/server technology specifically designed for selective access to large texts, but the text-image approach (as we will call it) has a number of advantages over conventional implementations:
- it uses existing technology and protocols designed for very large data volume;
- it easily scales up to texts many gigabytes in size, or more, without any degradation of performance;
- because access is inherently two-dimensional, in many situations (for example, when text is to be viewed in columns as in the Ulysses case) this approach is much more efficient than approaches that deal with text as a one-dimensional string;
- because wavelets are used in JPEG2000, the text is subject to a multiresolution representation; although the text cannot be read at other than the final (most detailed) resolution, the broader spatial support of lower-resolution wavelets naturally provides an intelligent client-side cache for text near the region of interest; moving the ROI, as during scrolling, is thus highly efficient.
Extending this approach to deal with formatted text, Unicode, or other metadata is straightforward, as all such data can be represented using ASCII text strings, possibly with embedded escape sequences.
In many applications, JPEG2000 is used as a lossy compression format, meaning that the decoded image bytes are not necessarily identical to the original bytes. Clearly if the image bytes represent text, lossy compression is not acceptable. One of the design goals of JPEG2000 was, however, to support lossless compression efficiently, as this is important in certain sectors of the imaging community (e.g. medical and scientific). Lossless compression ratios for photographic images are typically only around 2:1, as compared with visually acceptable lossy images, which can usually easily be compressed by 24:1.
Image compression, both lossy and lossless, can operate best on images that have good spatial continuity, meaning that the differences between the intensity values of adjacent pixels are minimized. The raw ASCII encoding is clearly not optimal from this perspective. One very simple way to improve the encoding is to reorder characters by frequency in the text or simply in the English language, from highest to lowest: code 0 remains empty space, code 1 becomes the space character, and codes 2 onward are e, t, a, o, i, n, s, r, h, l, etc. Figures 2 and 3 compare text-images with ASCII encoding and with this kind of character frequency encoding. Clearly pixel values tend to cluster near zero; at least as importantly, the difference between one letter and the next tends to be substantially decreased. When this frequency-encoded text-image is compressed losslessly as a JPEG2000, the file size is 1.6MB, barely larger than the raw ASCII text file (1.5MB) and 37% smaller than the ASCII-encoded text-image. With further optimizations of the letter encoding, the compressed file size can drop well below the ASCII text file size. The further optimizations can include, but are not limited to:
- using letter transition probabilities (Markov-1) to develop the encoding, instead of just frequencies (Markov-0);
- encoding as pixels the delta or difference between one character and the next, rather than the characters themselves.
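The basic frequency remapping can be sketched as follows; the sample text and helper names are ours, not part of the disclosure.

from collections import Counter

def frequency_mapping(text):
    # Code 0 stays reserved for empty space, code 1 is the space
    # character, and codes 2 onward go to characters by falling frequency.
    ranked = [ch for ch, _ in
              Counter(c for c in text if c != " ").most_common()]
    mapping = {" ": 1}
    mapping.update({ch: code for code, ch in enumerate(ranked, start=2)})
    return mapping

sample = "stately plump buck mulligan"     # any corpus works
codes = frequency_mapping(sample)
print([codes[c] for c in sample])          # frequent letters get small codes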
With these added optimizations, we add to the advantages listed earlier that on the server side, text ready to be served in this fashion is actually compressed relative to the raw ASCII.
The new invention is discussed here in the context of JPEG2000/JPIP as a selective image decompression technology, but nothing about the invention limits it to that particular format or protocol. For example, LizardTech's MrSID format and protocol have similar properties, and would also work.
Figure 1. Full Ulysses text-image, raw ASCII encoding (white=0, black=255).
Figure 2. Text-image of first five chapters of Ulysses (truncated). Raw ASCII encoding; white=0, black=255.
Figure 3. Text-image of first five chapters of Ulysses (truncated) encoded by frequency (simplest remapping).

CLAIM:
A method of spatially encoding large texts and the like comprising encoding the ASCII value of each of a plurality of characters into an intensity level.
NONLINEAR CACHING FOR VIRTUAL BOOKS, WIZARDS OR SLIDESHOWS
This invention relates to a novel method for accessing visual data, usually images, by computer. It is applicable to any situation in which visual content consists of a series of objects viewed one or a few at a time, in some established order.
The infinite sum of the series y(n) = n^-p, n going from 1 to infinity, with p > 1 is finite. Similarly, the sum of y = 1/b^n is finite for b > 1. (For example, in the latter case, if b = 2, the sum is exactly 2.) The concept of convergent series like these can be used to implement a highly efficient form of data caching, such as that described in the attached Exhibit A, a provisional application we previously filed. It is particularly applicable to virtual books (or "e-books"), "wizards" (in the graphical user interface sense, a term denoting a linear progression of interaction windows for executing a multistep process), virtual slideshows, or other similar displays based on a temporal progression of visual content.
Prior art.
Some popular image browsing applications, e.g. ACDSee™ by ACD Systems, implement "read-ahead" and "read-behind" strategies to avoid flicker or lack of responsiveness during virtual image slideshows. This involves loading and decompressing the next and previous image files in a presentation in addition to the current picture. When the user presses a key, a timer expires or some other event signals a change of image, the image being displayed is immediately replaced with the next image which has been "waiting in the wings", and the following image is read and decoded to prepare for the next transition. The old previous image, now two images behind, is normally erased from memory, keeping the number of images in memory at 3. "Read-behind" conversely allows instant replacement of the image on-screen with the previous image. In effect, this strategy makes the computer always "ready" to display the next or previous image (unless the user's rate of frame advance outstrips the computer's image decoding rate). Without read-ahead, when the user requests the next image, the computer must in general either delay response until the next image has been read and decoded, or update the display incrementally as the next image is decoded, often resulting in flashing, flickering, or otherwise distracting transitions. Both of these compromises are annoying to the user.
The downsides of read-ahead/read-behind are:
- the user cannot skip more than one image forward or back without eliminating the benefit, reintroducing delays, lack of responsiveness or flickering;
- if the user moves either forward or in reverse through the images more rapidly than the rate at which they can be fully decompressed, then the benefit is again eliminated;
- memory use is three times that required to hold the current image (assuming all images are the same size).
Details of the invention.
The present invention extends the concept of read-ahead/read-behind in conjunction with multiresolution imagery. Multiresolution imagery can be decompressed at a ladder of resolutions, e.g. full size, half size (on each side), quarter size, eighth size, etc. In general, the time required to decompress an image at 1/8 size should be about 1/8 of the time required to decompress it at full resolution; and, of course, 1/8 of the memory is required to hold the image at 1/8 size.
In its simplest instantiation, the invention involves maintaining a complete representation in memory of the current image, a half-size representation of the next and previous images, a quarter-size representation of the images before the last and after the next, etc. It is easily verified that the two-sided infinite sum of image sizes is ... + 1/16 + 1/8 + 1/4 + 1/2 + 1 + 1/2 + 1/4 + 1/8 + 1/16 + ... = 3. Hence the storage requirement is identical to that of ordinary read-ahead/read-behind. However, in theory, ALL of the images in any presentation (or mathematically, an infinite number of images) are represented at some resolution, albeit perhaps very low. In practice, finite image size and discrete memory implies that only a finite number of images will fit, but this number may be quite large. In some applications, it may be desirable to cache only the next images, or only the previous images; these cases sum to 2 instead of 3. The resolution progression can be defined as a function r(i), where the integer i = ..., -3, -2, -1, 0, 1, 2, 3, ... denotes position in the image queue relative to the current image i = 0. (i = 1 is then the next image, and i = -1 is the previous image.) Other resolution progressions with the general property that r(0) = 1, r(i) <= 1 if i is not equal to 0, and r(i) does not increase as the absolute value |i| increases may also be suitable. Even progressions for which the sum of r(i) does not converge to a finite number may be suitable, as in practice there will not usually be an infinite number of images.
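The memory accounting is easy to check; a small sketch, with the window size and falloff chosen purely for illustration:

def r(i):
    return 1.0 / (2 ** abs(i))   # current image has size 1

# The two-sided sum approaches 3 = 1 + 2 * (1/2 + 1/4 + ...):
print(sum(r(i) for i in range(-20, 21)))

# A sharper falloff such as r(i) = 1/4^|i| lowers the footprint to ~5/3:
print(sum(1.0 / (4 ** abs(i)) for i in range(-20, 21)))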
This multiscale representation of images must be coupled with a multiresolution rendering scheme to allow the client or viewer to respond instantly to user requests to switch images. Such a rendering scheme simply interpolates or "upsamples" lower-resolution representations of an image for display on a high-resolution screen. As additional, higher-resolution image data become available, the display must then refine dynamically to reflect the new higher-quality data. This refinement may be instantaneous, or it may be accomplished using gradual blending or other techniques that mask the transition from low to high visual quality. In general, interpolating from very low-resolution image data results in a blurry appearance. If high-resolution imagery replaces lower-resolution interpolated imagery, blending in over time, the perceptual effect is for the image to appear to "come into focus".
Upon transition to a different image, the viewer or client must request additional data from the server, or load additional data from files, to improve the quality both of the new current image and of surrounding images (normally the new next images, if advancing images, or the new previous images, if stepping backward). Unneeded high-resolution data may also be discarded when the current image changes, to preserve the total memory footprint.
This scheme has many advantages over traditional read-ahead/read-behind, including:
- the user can skip any number of images forward or backward at a time; larger skips simply result in a blurrier initial appearance of the new image;
- the memory footprint may be no bigger than traditional methods, and may even be smaller (if deemed necessary) by making the function r(i) drop off more sharply, e.g. ..., 1/64, 1/16, 1/4, 1, 1/4, 1/16, 1/64, ...;
- the rate at which the user can "flip through" images is unlimited; rapid flipping may simply result in a blurry appearance during flipping. Due to psychophysical limits on the perception of fine detail in moving visual stimuli, some or all of this blurriness during flipping may even remain invisible to the user.

Extensions.
Although the preceding discussion refers exclusively to varying image resolution, note that other progressive decompression schemes exist that can represent an image using more or less data, with dynamic improvement in image quality as additional data become available. For example, in transform-based coding, additional transform coefficients can progressively improve quality. The methods described above extend naturally to these other progressive (though not necessarily multiresolution) schemes.
The invention as described above applies to linear sequences of images, but it may be extended to "graphs" of images, i.e. collections of images in which one image may (potentially by user choice) be followed by or preceded by more than one possible next or previous image. In this case, the original function r(i) may be applied to all images which might follow the current image via one transition, two transitions, etc.; or particular "paths" through the set of images may be weighted preferentially; or constraints can be used to allocate some fixed amount of memory among all possible preceding or successive images according to some allocation algorithm.
Finally, although the preceding discussion has assumed that images are static and previously compressed, all of the techniques described are equally applicable in situations where image content is generated dynamically, representing the output of a computation or the visual interface of a program or applet. In this case, the computation, program or applet must be able to render itself to a large or small "virtual display", that is, at different resolutions (or to render itself coarsely or finely in varying degrees, using some non-pixel representation). As with images, it must be the case that the time a program takes to render itself at quarter size is approximately a quarter of the time it would take to render at full size.
CLAIM:
A method comprising caching an image, said step of caching being nonlinear.
METHOD FOR EFFICIENTLY INTERACTING WITH DYNAMIC, REMOTE PHOTO ALBUMS WITH LARGE NUMBERS OF POTENTIALLY LARGE IMAGES
Recently developed image compression and transmission standards such as JPEG2000/JPIP1 have enabled the interactive display of large images (i.e. gigapixels in size) over narrow bandwidth communication channels. However, these emerging standards and technologies do not provide any obvious means for achieving a more ambitious goal: to allow flexible visual interaction with a very large number of images simultaneously, each of which may also potentially be very large. The present invention enables such interaction. The following scenarios both make concrete the system's technical capabilities and describe some of the applications enabled by the technology.
Scenario #1a. A user keeps her entire collection of digital photos (5 megapixels each) on the hard drive of her notebook computer. She is an avid photographer, and after several years, this collection totals 25,000 images. She uses the present invention to organize the entire collection, and is able to rearrange the photos dynamically to sort them by date, size, color, or other properties, and extract subsets. When viewing the entire collection, she can zoom out smoothly and continuously until all of the photos are in view,2 zoom in to view a detail of a single photo, or zoom to any intermediate view.
Scenario #1b. The user of scenario #1a can configure her home computer as a server, and then navigate the entire photo collection from a remote client computer just as in scenario #1a.
Scenario #2a. An art museum has invested in high-resolution scanning of all of its paintings (100 megapixels and up), and puts together an online exhibit featuring dozens or hundreds of these, organized spatially with descriptive captions. Using the present invention, not only can this exhibit be accessed locally from within the museum, but even a remote user browsing the collection through a low-bandwidth connection can pan and zoom in or out to navigate the material, just as in scenarios #1a and #1b.

1 See e.g. David Taubman's implementation of Kakadu, www.kakadusoftware.com. Taubman was on the JPEG2000 ISO standards committee.

2 Assuming that the display is high-resolution (1920x1200 = 2.3 megapixels), viewing the entire collection of 25,000 images simultaneously gives about 92 square pixels per image, so each "thumbnail" is about sqrt(92) = 9.6 pixels on a side. Surprisingly, even these very small thumbnails can often suggest the character of the image, and at a minimum images from a series with a similar color gamut or composition will be clearly identifiable.
Scenario #2b. The art museum creates a virtual three-dimensional space representing the museum building, with high-resolution scans of all of the artworks in their "geographically correct" positions within the 3D model. Alternatively, three-dimensional virtual museum spaces can be created with no physical counterpart. These 3D models can be navigated in a manner similar to the 2D version of scenario #2a, either locally or remotely. The analogue to the two-dimensional operation of zooming in is moving closer to an image surface, and zooming out is analogous to moving away from the image surface.
Scenario #2c. The museum also scans its 14th century Book of Hours at very high resolution, yielding hundreds of images of > 100 megapixels. These are assembled into a "virtual book", a high-quality surrogate for the original, available online. The book can be navigated locally or remotely, with turnable pages in three dimensions.
The key features of JPEG2000/JPIP relevant to enabling the current invention are:
- a multiscale image representation, making it efficient to decompress an image file at a ladder of resolutions lower than full resolution. In most cases, these resolutions are downsampled from the original by powers of two, e.g. if the original is 512x512 pixels, then 256x256, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2 and 1x1 representations will also be available. The 1x1 version is just a single pixel value corresponding to the average color of the entire image; progressively higher resolutions add progressively more details. For some images, the lowest resolutions (e.g. 4x4, 2x2 and 1x1) may not be available;
- the ability to selectively decompress only a portion of an image (called a "region of interest" or ROI) at a given resolution, e.g., a 32x32 pixel section from the 256x256 resolution of a 512x512 image;
- the ability to perform this multiscale selective decompression efficiently on a server (i.e. without parsing the entire image file), and serve to a remote client only the limited amount of information necessary to reconstruct the region and resolution of interest. The amount of information sent should be approximately proportional to the size of the ROI.
Any other image compression format/protocol satisfying these requirements would be equally suitable. We will refer to the image format simply as "multiscale", with the understanding that it could be wavelet-based, like JPEG2000, or based on some other technology.
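For concreteness, the power-of-two resolution ladder assumed above can be enumerated as follows (dimensions illustrative):

def resolution_ladder(w, h):
    ladder = [(w, h)]
    while w > 1 or h > 1:
        w, h = max(1, w // 2), max(1, h // 2)
        ladder.append((w, h))
    return ladder

print(resolution_ladder(512, 512))
# [(512, 512), (256, 256), ..., (2, 2), (1, 1)]; the 1x1 rung is the
# average color of the entire image.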
The present invention defines precomputed steps and interactive rendering algorithms which can be used in a variety of configurations to implement the scenarios listed above. All of these scenarios involve user interaction with a "universe" of images; the starting point for precomputation is therefore a list of the filenames, URLs, or other strings referencing the individual images. When the user is zoomed out far enough to view all of these images at once, it is impractical for either the client or the server to traverse all of the image files, as there may be a very large number of them. For example, in the regime where individual images occupy 2x2=4 pixels onscreen, tens or hundreds of thousands of images may be in view; even if these images support efficient low-resolution access, merely opening and closing 100,000 files involves a large overhead and cannot be accomplished on an interactive timescale. It is therefore necessary to use a cached representation of low-resolution versions of these images, called a "montage". The montage is a mosaic or collage of all of the images, rendered at low resolution and packed efficiently into a rectangular area, as shown in Figure 1. Auxiliary metadata, which can be embedded in the montage image file or stored separately, identifies rectangular regions on the montage image with a particular image file.
Figure 1. More than 1000 images (a collection of digitized maps of various sizes) packed into a montage.
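The montage metadata admits a very simple form. The following sketch (structure and names assumed, not taken from any actual file format) shows the lookup the client performs when its view exhausts the montage resolution.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MontageEntry:
    rect: Tuple[int, int, int, int]   # x, y, w, h on the montage, in pixels
    source: str                       # filename or URL of the full image

def images_in_view(entries: List[MontageEntry], view):
    vx, vy, vw, vh = view
    def overlaps(r):
        x, y, w, h = r
        return x < vx + vw and vx < x + w and y < vy + vh and vy < y + h
    return [e.source for e in entries if overlaps(e.rect)]

# Only the few sources returned here need to be opened at full resolution.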
In the simplest implementation of the present invention, the montage image itself can be navigated using a zooming and panning interface. When the user zooms in far enough to exhaust the resolution available in the montage image, the metadata refers the client to individual image files, and the client uses imagery from these to render at higher resolution. The overall montage size in pixels is chosen such that its resolution is only exhausted during a zoom-in at a stage where only a few images are visible simultaneously; therefore it is never necessary to access more than a few images at a time. During subsequent zooming and panning, image streams are opened and closed as necessary to limit the number open at any given time.
This simplest approach to navigating many images of high resolution suffers a major drawback: the montage layout is designed for packing efficiency, but the user may want a different visual arrangement onscreen. Moreover, the user may want to be able to dynamically rearrange the layout of images on the screen. To enable this, we can make use of a graphics rendering technique known as "texture mapping", which may be implemented in software but is in general hardware-accelerated on modern personal computers. Texture mapping allows a portion of a "texture", or source image, to be drawn on the display, optionally rescaling the image, rotating it, and performing a three-dimensional perspective transform. Other hardware-accelerated transformations are often supported, including color correction or alteration, full or partial transparency, lighting, occlusion, and coordinate remapping. A low-resolution version of the montage can be used as a "texture", so that when the user is zoomed out, the individual images within the montage can be dynamically remapped in any way, as in Figure 2. More than one texture map may be used, in which case each texture map may be a montage containing a subset of the images.
Figure 2. Snapshot of approximately 3000 images rearranging themselves dynamically into a random configuration.
In another enhanced embodiment, the texture mapping technique (which is generally only applicable to low-resolution renditions of the montage image or images) can be used only during dynamic rearrangement; when the image arrangement is static, software compositing can be used to assemble all or part of a higher-definition rearranged montage on-screen. This software compositing method is especially valuable in combination with the lazy multiresolution rendering techniques described in US Patent application number 10/790,253, a copy of which is provided herewith as Ex. A. This method in effect creates a new "display montage" by rearranging the imagery of the original montage.
It is possible to use montage rearrangement of this kind to support reorganization of the images without recourse to texture mapping. In this case transitions between arrangements may or may not be animated.

Further extensions.
Texture mapping, software rendering, or any combination of the two can be used to render imagery in three dimensions instead of on the plane. Dynamic rearrangement in three dimensions is also possible. Three-dimensional applications include virtual galleries or other walk-through environments as well as virtual books, especially when used in combination with the invention described in a copending provisional application filed by the applicant concurrently herewith, and attached hereto as Exhibit B. The virtual book application is illustrated in Figure 3. This example also illustrates an extension of the method in which an alpha channel, for partial transparency (the rough edges), is stored as image information in addition to the red, green and blue color components. Most implementations of hardware-accelerated texture mapping support an alpha channel. Another extension applicable in either 2D or 3D is dynamic deformation of images, e.g. bending the pages of the book as they turn.
Figure 3. 3D book.
The invention can also be extended to support visual objects other than static images, such as the output of a visual calculation, or an application or applet.
CLAIM:
A method comprising performing texture mapping during dynamic rearrangement and ceasing to do so when such dynamic rearrangement ceases.
METHOD FOR ENCODING AND SERVING GEOSPATIAL OR OTHER VECTOR DATA AS IMAGES
Recently, image compression standards such as JPEG2000/JPIP1 have been introduced to meet a demanding engineering goal: to enable very large images (i.e. gigapixels in size) to be delivered incrementally or selectively from a server to a client over a low-bandwidth communication channel. When such images are being viewed at full resolution, only a limited region can fit on a client's graphical display at any given time; the new standards are geared toward selectively accessing such regions and sending across the communication channel only data relevant to the region. If this "region of interest" or ROI changes continuously, then a continuous dialogue between a client and server over a low-bandwidth channel can continue to keep the client's representation of the area inside the ROI accurate.
The present invention relates to an extension of these selectively decompressable image compression and transmission technologies to geospatial or schematic data. It combines and extends methods introduced in previous applications (1) Method for spatially encoding large texts, metadata, and other coherently accessed non-image data, attached as exhibit A, and (2) METHODS AND APPARATUS FOR NAVIGATING AN IMAGE, attached as exhibit B. In (2), the concept of continuous multiscale roadmap rendering was introduced. The basis for the invention of (2) is a pre-rendered "stack" of images of a roadmap or other vector-based diagram at different resolutions, in which categories of visual elements (e.g. classes of road, including national highway, state highway, and local road) are rendered with different visual weights at different resolutions. During client/server interaction, corresponding areas of more than one of these images are downloaded, and the client's display shows a blended combination of these areas; the blending coefficients and the choice of image resolutions to be blended depend on zoom scale. The net result is that a user on the client side can navigate through a large map (e.g. all roads in the United States), zooming and panning continuously, without experiencing any visual discontinuities, such as categories of roads appearing or disappearing as the zoom scale is changed. Rather, at every scale, the most relevant categories are accentuated; for example, when zoomed out to view the entire country, the largest highways are strongly weighted, making them stand out clearly, while at the state level, secondary highways are also weighted strongly enough to be clear. When the user zooms in to the point where the most detailed pre-rendered image is being used, all roads are clearly visible, and in the preferred embodiment for geospatial data, all elements are shown at close to their physically correct scale. A maximum reasonable resolution for these most detailed pre-rendered images may be about 15 meters/pixel; however, it is desirable from the user's standpoint to be able to zoom in farther. Pre-rendering at higher detail is not desirable for several reasons: first, because the file sizes on the server side become prohibitive (a single Universal Transverse Mercator zone image at 15 meters/pixel may already be in the gigapixel range); second, because a pre-rendered image is an inefficient representation for the kind of very sparse black-and-white data normally associated with high-resolution map rendering; and third, because the client may require the "real" vector data for performing computational tasks beyond static visual presentation. For example, a route guidance system may highlight a road or change its color; this can be done on the client side only if the client has access to vector data, as opposed to a pre-rendered image alone. Vector data may also include street names, addresses, and other information which the client must have the flexibility to lay out and render selectively. Pre-rendering street name labels into the map image stack is clearly undesirable, as these labels must be drawn in different places and sizes depending on the precise location and scale of the client view; different label renditions should not blend into one another as the user zooms. Pre-rendering such data would also eliminate any flexibility with regard to font.

1 See e.g. David Taubman's implementation of Kakadu, www.kakadusoftware.com. Taubman was on the JPEG2000 ISO standards committee.
To summarize, vector data (where we use the term generically to refer both to geometric and other information, such as place names) is both important to the client in its own right, and a more efficient representation of the information than pre-rendered imagery, when the desired rendering resolution is high. Note, however, that if a large area is to be rendered at low resolution, the vector data may become prohibitively large and complex, making the pre-rendered image the more efficient representation. Even at low resolution, however, some subset of the vector data is necessary, such as the names of major highways.
The present invention extends the methods introduced in (1) to allow spatial vector data to be encoded and transmitted selectively and incrementally to the client, possibly in conjunction with the pre-rendered imagery of (2). Using prior art, this would be accomplished using a geospatial database. The database would need to include all relevant vector data, indexed spatially. Such databases present many implementation challenges. Here, instead of using conventional databases, we use spatially addressable images, such as those supported by JPEG2000/JPIP, to encode and serve the vector data.
In our exemplary embodiment, three images or channels are used for representing the map data, each with 8-bit depth:
- the prerendered layer is a precomputed literal rendition of the roadmap, as per (2);
- the pointer layer consists of 2x2 pixel blocks positioned at or very near the roadmap features to which they refer, typically intersections;
- the data layer consists of n x m pixel blocks centered on or positioned near the 2x2 pointers which refer to them.
Because these three channels are of equal size and in registration with each other, they can be overlaid in different colors (red, green, blue on a computer display, or cyan, magenta, yellow for print media) to produce a single color image. Such images are reproduced in Figures 2-3. Figure 1 shows the prerendered layer alone, for comparison and orientation. The region shown is King County, in Washington state, which includes Seattle and many of its suburbs. Figures 3a and 3b are closeups from suburban and urban areas of the map, respectively.
Figure 1. Prerendered roadmap of King County, WA.
Figure 2. Color version showing prerendered roads (yellow), pointers (magenta) and data (cyan).
Figure 3a. Closeup of suburban area of King County.
Figure 3b. Closeup of urban area of King County.
If the user navigates to the view of the map shown in Figure 3a, then the client will request from the server the relevant portions of all three image layers, as shown. The prerendered layer (shown in yellow) is the only one of the three displayed on the screen as is. The other two specify the vector data. The pointer image consists of 2x2 pixel blocks aligned on a 2x2 pixel grid, each of which specifies an (x,y) vector offset (with the x and y components of the vector each comprising a 16-bit integer, hence two pixels each) from its own location to the beginning (top left corner) of a corresponding data block in the data layer. The corresponding data block, in turn, begins with two 16-bit values (four pixels) specifying the data block width and height. The width is specified first, and is constrained to be at least 2, hence avoiding ambiguities in reading the width and height. The remainder of the data block can be treated as binary data which may contain any combination of vectors, text, or other information. In the examples of Figures 2-3, data blocks contain streetmap information including street names, address ranges, and vector representations.
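Decoding a pointer and its data block header can be sketched as below. The byte order within a 16-bit value, the row layout of the 2x2 pointer block, and the assumption that data blocks are at least 4 pixels wide are ours, not stipulated by the text.

def read_u16(layer, y, x):
    # layer is a 2D numpy uint8 array of 8-bit pixels; two adjacent
    # pixels form one 16-bit value (big-endian assumed).
    return int(layer[y, x]) << 8 | int(layer[y, x + 1])

def to_signed16(v):
    return v - 0x10000 if v >= 0x8000 else v

def follow_pointer(pointer_layer, data_layer, y, x):
    # (y, x) is the top-left pixel of a 2x2 pointer block; row 0 is taken
    # to hold the x offset and row 1 the y offset.
    dx = to_signed16(read_u16(pointer_layer, y, x))
    dy = to_signed16(read_u16(pointer_layer, y + 1, x))
    ty, tx = y + dy, x + dx                    # top left of the data block
    width = read_u16(data_layer, ty, tx)       # width first, >= 2 by design
    height = read_u16(data_layer, ty, tx + 2)  # assumes width >= 4 here
    block = data_layer[ty:ty + height, tx:tx + width]
    payload = bytes(block.reshape(-1))[4:]     # row-major, past the header
    return width, height, payload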
The pointer and data layers are precomputed, just as the prerendered layer is. Precomputation for the pointer and data layers consists of encoding all of the relevant vector data into data blocks, and packing both the pointers and data blocks as efficiently as possible into their respective images. In rural or sparse suburban areas (see Figure 3a), features tend to be well-separated, resulting in large empty areas in the pointer and data images. Where pointers do occur, they fall precisely on the feature to which they refer, and their corresponding data blocks are in turn often centered precisely on the pointer. In dense urban areas, however (see Figure 3b), features are often too close together for the pointers and data blocks to all fit. It is therefore necessary to use a rectangle packing algorithm to attempt to place pointers and data blocks as close to their desired positions as possible without any overlaps. The results are evident in Figure 3b: the lakes and coasts near Seattle are filled with data blocks corresponding to features on the land. Because all urban areas are surrounded by sparser areas (suburbs, mountains, or bodies of water), it is always possible to place urban data blocks somewhere on the map; in general, even within a dense city there are enough empty spaces that this "spillover" is not overly severe. The higher the rate of spillover, the less well-localized the map vector data becomes. Spillover decreases drastically as the resolution of the data layer image is increased, and it is always possible to find a resolution at which efficiency and non-locality are appropriately balanced. In North America, 15m/pixel appears to be a good choice. It is "overkill" in rural areas, but near cities, it limits spillover as shown in Figures 2 and 3b.
Efficient rectangle packing is a computationally difficult problem; however, there are numerous approximate algorithms for solving it in the computational geometry literature, and the present invention does not stipulate any particular one of these. The algorithm actually used involves a hierarchical "rectangle tree", which makes the following operations fast: testing whether a given rectangle intersects any rectangle already in the tree; inserting a non-overlapping rectangle; and finding the complete set of "empty corners" (i.e. corners abutting already-inserted rectangles that border on empty space) in a ring of radius r0 <= r < r1 around a target point p.
The "greedy algorithm" used to insert a new rectangle as close as possible to a target point then proceeds as follows:
Attempt to insert the rectangle centered on the target point. If this succeeds, algorithm ends.
Otherwise, define radius rO to be half the minimum of the length or width of the rectangle, and rl = rθ*2.
Find all "empty corners" between rO and rl, and sort by increasing radius.
Attempt to place the rectangle at each of these corners in sequence, and on success, algorithm ends.
If none of the attempted insertions succeeds, set rO to rl, set rl to 2*rO, and goto step 3.
This algorithm always succeeds in ultimately placing a rectangle provided that somewhere in the image an empty space of at least the right dimensions exists. It is "greedy" in the sense that it places a single rectangle at a time; it does not attempt to solve the holistic problem of packing n rectangles as efficiently as possible. (A holistic algorithm would require defining an explicit measure of packing efficiency, specifying the desired tradeoff between minimizing wasted space and minimizing distance between rectangles and their "target points". The greedy algorithm does not require explicitly specifying this tradeoff, as is clear from the algorithm description above.)
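A minimal sketch of this loop, assuming a rectangle-tree helper whose fits(), insert() and empty_corners() methods stand in for the three fast operations listed earlier (the names, signatures, and the max_radius cutoff are hypothetical):

```python
def greedy_insert(tree, width, height, target, max_radius):
    """Place one width x height rectangle as close as possible to `target`."""
    tx, ty = target
    rect = (tx - width / 2.0, ty - height / 2.0, width, height)
    if tree.fits(rect):                                  # step 1: try centered placement
        tree.insert(rect)
        return rect
    r0 = 0.5 * min(width, height)                        # step 2: initial ring radii
    r1 = 2.0 * r0
    while r0 < max_radius:                               # step 5 loops back here
        corners = sorted(tree.empty_corners(target, r0, r1),   # step 3: ring query
                         key=lambda c: (c[0] - tx) ** 2 + (c[1] - ty) ** 2)
        for cx, cy in corners:                           # step 4: try each corner
            candidate = (cx, cy, width, height)
            if tree.fits(candidate):
                tree.insert(candidate)
                return candidate
        r0, r1 = r1, 2.0 * r1                            # widen the ring and retry
    return None                                          # no empty space found
```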
Figure 4 demonstrates the output of the basic packing algorithm for three cases. In each case, the algorithm sequentially placed a number of rectangles as near as possible to a common point. This solution to the rectangle packing problem is provided by way of example only.
Figure 4. Test output of the greedy rectangle packing algorithm. On the left, predominantly small, skinny rectangles; in the center, large, square rectangles; and on the right, a mixture.
For the greedy packing algorithm not to give placement preference to any specific areas of the map, it is desirable to randomize the order of rectangle insertion. In a preferred embodiment, pointer/data block pairs are thus inserted in random order. Other orderings may further improve packing efficiency in certain circumstances; for example, inserting large blocks before small ones may minimize wasted space.
Pointers are always 2x2 (our notation is rows x columns); however, for data blocks, there is freedom in selecting an aspect ratio: the required block area in square pixels is determined by the amount of data which must fit in the block, but this area can fit into rectangles of many different shapes. For example, a 24 byte data block (including 4 bytes of width and height information, and 20 bytes of arbitrary data) can be represented exactly as 1x24, 2x12, 3x8, 4x6, 6x4, 8x3, or 12x2. (24x1 is disqualified, as the block width must be at least 2 for the 2-byte width to be decoded before the block dimensions are known on the client side, as described above.) The block can also be represented, with one byte left over, as 5x5. We refer to the set of all factorizations listed above, in addition to the approximate factorization 5x5, as "ceiling factorizations". The requirements for a valid ceiling factorization are that its area be at least the required area, and that no row or column be entirely wasted; for example, 7x4 or 3x9 are invalid ceiling factorizations, as they can be reduced to 6x4 and 3x8 respectively. In the simplest implementation, block dimensions may be selected based only on a ceiling factorization of the data length; in general, "squarer" blocks (such as 4x6) pack better than oblique ones (such as 2x12). The simplest data block sizing algorithm would thus select either 4x6 or 5x5, depending on how it trades off "squareness" against wasted bytes. More sophisticated block size selection algorithms may pick block dimensions adaptively, as part of the search for empty space near the target point. In one implementation, steps 1 and 4 of the algorithm above are then modified as follows:
Step 1 (modified): Sort the ceiling factorizations of the required data length by desirability, with preference for squarer factorizations and possibly a penalty for wasted bytes. Attempt to place rectangles of dimensions given by each ceiling factorization in turn at the target point p. If any of these insertions succeeds, the algorithm ends.
Step 4 (modified): For each "empty corner" c in turn, attempt to place rectangles of dimensions given by each of the ceiling factorizations in turn at c. On success, the algorithm ends.
Further refinements of this algorithm involve specification of a scoring function for insertions which, as with a holistic optimization function, trades off wasted space, non-square aspect ratio, and distance from the target point.
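For illustration, the enumeration of valid ceiling factorizations can be written as follows; the desirability weighting in the sort key (squareness versus wasted bytes) is an arbitrary illustrative choice:

```python
import math

def ceiling_factorizations(area, min_width=2):
    """Enumerate (rows, cols) with rows*cols >= area, cols >= min_width, and no
    entirely wasted row or column: (rows-1)*cols < area and rows*(cols-1) < area."""
    results = []
    for rows in range(1, area + 1):
        cols = math.ceil(area / rows)          # smallest width that holds the data
        if cols < min_width:
            break                              # e.g. 24x1 is disqualified
        if (rows - 1) * cols < area and rows * (cols - 1) < area:
            results.append((rows, cols))
    # Prefer squarer shapes, with a small penalty per wasted byte (weights illustrative).
    results.sort(key=lambda rc: max(rc) / min(rc) + 0.1 * (rc[0] * rc[1] - area))
    return results

# ceiling_factorizations(24) yields 1x24, 2x12, 3x8, 4x6, 5x5, 6x4, 8x3 and 12x2,
# with 5x5 and 4x6 ranked first under the example weighting.
```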
Each of the three map layers — prerendered roads, pointers and data — is stored as a JPEG2000 or similar spatially-accessible representation. However, the storage requirements for the three layers differ. The prerendered road layer need not be lossless; it is only necessary for it to have reasonable perceptual accuracy when displayed. At 15m/pixel, we have found 0.5 bit/pixel lossy wavelet compression to be fully adequate. The pointer and data layers, however, must be represented losslessly, as they contain data which the client must be able to reconstruct exactly. Lossless compression is not normally very efficient; typical digital imagery, for example, is not usually compressible losslessly by more than a factor of about two at best.
For most forms of either lossy or lossless compression, performance can be optimized by making the image function small in magnitude, hence occupying fewer significant bits. Therefore, in enhanced embodiments, special coding techniques are used to "flatten" the original data. The outcome of these techniques is apparent in Figure 5, which shows the same densely populated region of a data image before and after "flattening". Note that before flattening, the data image has full 8-bit dynamic range, and exhibits high frequencies and structured patterns that make it compress very poorly (in fact, a lossless JPEG2000 of this image is no smaller than the original raw size). After "flattening", most of the structure is gone, and a great majority of the pixels have values < 8, hence fitting in 3 bits. The corresponding JPEG2000 has better than 3:1 compression. "Flattening" can consist of a number of simple data transformations, including the following (this is the complete list of transformations applied in the example of Figure 5):
16-bit unsigned values, such as the width or height of the data block, would normally be encoded using a high-order byte and a low-order byte. We may require 16 bits because values occasionally exceed 255 (the 8-bit limit) by some unspecified amount, yet in the majority of cases values are small. For a value that fits in 8 bits, the high-order byte would be zero. Frequent zero high-order bytes followed by significant low-order bytes account for much of the 2-pixel periodicity apparent in parts of Figure 5a. We can remap the 16 bits as follows:
[Table: bit-position remapping, redistributing the 16 bits of the value across the two 8-bit pixels]
The left eight columns represent the first pixel of the pair, previously the high-order byte; the rightmost eight columns represent the second pixel, previously the low-order byte. By redistributing the bits in this way, the range of accessible values (0-65535) remains unchanged, but the two bytes become much more symmetric. For example, for all 16-bit values 0-255, the two bytes each assume values < 16.
Similar techniques apply to 32-bit or larger integer values. These techniques are also extensible to signed quantities. For variables in which the sign changes frequently, as occurs for differential coding of a road vector, a sign bit can be assigned to position 0, and the absolute value encoded in alternating bytes as above. Note that to be drawn convincingly, road vector data must often be represented at greater than pixel precision. Arbitrary units smaller than a pixel can instead be used, or equivalently, subpixel precision can be implemented using fixed point in combination with the above techniques. In our exemplary embodiment, 4 subpixel bits are used, for 1/16 pixel precision.
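The spirit of the remapping can be illustrated with an alternating-bit split. The exact permutation in the table above is not reproduced here, but the mapping below has the stated properties: the 0-65535 range is preserved, and every value 0-255 yields two pixels each less than 16:

```python
def flatten_u16(value):
    """Redistribute a 16-bit value across two pixels by alternating bits."""
    a = b = 0
    for i in range(8):
        b |= ((value >> (2 * i)) & 1) << i        # even-position bits -> second pixel
        a |= ((value >> (2 * i + 1)) & 1) << i    # odd-position bits  -> first pixel
    return a, b

def unflatten_u16(a, b):
    """Inverse mapping, reassembling the original 16-bit value."""
    value = 0
    for i in range(8):
        value |= ((b >> i) & 1) << (2 * i)
        value |= ((a >> i) & 1) << (2 * i + 1)
    return value

# flatten_u16(255) == (15, 15): both pixels stay small, unlike the (0, 255) of a
# plain high/low byte split, so the image function stays "flat" for typical values.
```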
When numbers are encoded as described above, it is desirable to make the numbers as small as possible. Sometimes context suggests an obvious way to do this; for example, since the width of any data block must be at least 2, we subtract 2 from data width before encoding. More significantly, both pointers and any position vectors encoded in a data block are specified in pixels relative to the pointer position, rather than absolute coordinates. This not only decreases the magnitude of the numbers to encode greatly; it also allows a portion of the data image to be decoded and rendered vectorially in a local coordinate system without regard for the absolute position of this portion.
For vector rendering of a sequence of points defining a curve (for example, of a road), only the first point need be specified relative to the original pointer position; subsequent points can be encoded as "deltas", or step vectors from the previous point. After the second such point, subsequent points can be encoded as the second derivative, or the difference between the current and previous delta. Encoding using the second derivative is generally efficient for such structures as roads, since they tend to be discretizations of curves with continuity of the derivative — that is, they change their direction gradually.
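A sketch of this coding for a point sequence follows; coordinates are assumed to be already scaled to fixed-point subpixel units (e.g. 16 units per pixel), and only the layout of first point, first delta, and second differences is shown:

```python
def encode_polyline(points, origin):
    """First point relative to the pointer position, then one delta,
    then second differences (small for gradually curving roads)."""
    codes = [(points[0][0] - origin[0], points[0][1] - origin[1])]
    prev = None
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d = (x1 - x0, y1 - y0)
        codes.append(d if prev is None else (d[0] - prev[0], d[1] - prev[1]))
        prev = d
    return codes

def decode_polyline(codes, origin):
    """Integrate the differences back into absolute points."""
    x, y = origin[0] + codes[0][0], origin[1] + codes[0][1]
    points, dx, dy = [(x, y)], 0, 0
    for i, (cx, cy) in enumerate(codes[1:]):
        dx, dy = (cx, cy) if i == 0 else (dx + cx, dy + cy)
        x, y = x + dx, y + dy
        points.append((x, y))
    return points
```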
Another "flattening" technique is described in [1] for use with textual data, which would normally be encoded as ASCII, with a single character per byte. In the application described in [1], English text is being encoded, and hence the letters are remapped based on decreasing frequency of letter occurrence in a representative sample of English. The same technique can be used in this context, although the text encoded in a map, consisting mostly of street names, has quite different statistics from ordinary English. Numerals and capital letters, for example, are far more prominent.
Note that the particular methods for encoding of pointers or data as presented above are exemplary; many other encodings are also possible. "Good" encodings generally result in images which are smooth and/or have low dynamic range.
Figure 5. The same binary 8-bit data (taken from a dense region of the roadmap data image of the Virgin Islands) before (above, a) and after (below, b) "flattening".
Using the techniques above, the King County roadmap at 15m/pixel compresses as follows:
[Table: compressed sizes of the prerendered, pointer, and data layers]
Surprisingly, the JPEG2000 representation (including lossy pre-rendered roadmap image, lossless pointer layer, and lossless data layer) is actually smaller than the compressed ZIP file representing the original data as tabulated text. (This file is part of the United States Census Bureau's 2002 TIGER/Line database.) Unlike the original ZIP, however, the new representation is ready to serve interactively to a client, with efficient support for continuously pannable and zoomable spatial access.
The original prerendered multiscale map invention introduced in [2] included not a single prerendered image, but a stack of such images, rendered at progressively coarser resolutions and with rescaled weights for lines (or other visible features). Although no features are omitted from any of these prerenditions, some features are de-emphasized enough to be clearly visible only in an aggregate sense, e.g. the local roads of a city become a faint grey blur at the statewide level. The present invention can be extended to include pointer and data images corresponding to the coarser prerendered roadmap images, in which only a subset of the original vector objects are represented. For example, statewide pointer and data images, which are at much lower resolution than those of Figures 1-3, might only include data for state and national highways, excluding all local roads. These coarser data may also be "abstracts", for example specifying only road names, not vectors. Images at different resolutions might include varying mixtures or subsets of the original data, or abstracted versions. This technique both allows all of the relevant data to fit into the smaller coarse images, and provides the client with the subset of the vector information relevant for navigation at that scale.
Although the implementation outlined above suggests an 8-bit greyscale prerendered map image at every resolution, the prerendered images may also be in color. Further, the prerendered images may be displayed by the client in color even if they are single-channel images, since the vector data can be used to draw important roads in different colors than the prerendered material. Finally, the prerendered images may omit certain features or roads present in the vectorial data, relying on the client to composite the image and vectorial material appropriately.

CLAIM:
A method of encoding images using rectangle packing and a JPEG representation.
APPLICATION FOR LETTERS PATENT
FOR
SYSTEM AND METHOD FOR EXACT RENDERING IN A ZOOMING USER INTERFACE
BY
BLAISE HILARY AGUERA Y ARCAS
Kaplan & Gilman, LLP
Attorney No. 489/2

SYSTEM AND METHOD FOR EXACT RENDERING IN A ZOOMING USER INTERFACE
RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional No. 60/452,075, filed on March 5, 2003, U.S. Provisional No. 60/453,897, filed on March 12, 2003, U.S. Provisional No. 60/475,897, filed on June 5, 2003, and U.S. Provisional No. 60/474,313, filed on May 30, 2003.
FIELD OF THE INVENTION
[0002] The present invention relates generally to graphical zooming user interfaces (ZUI) for computers. More specifically, the invention is a system and method for progressively rendering zoomable visual content in a manner that is both computationally efficient, resulting in good user responsiveness and interactive frame rates, and exact, in the sense that vector drawings, text, and other non-photographic content is ultimately drawn without the resampling which would normally lead to degradation in image quality, and without interpolation of other images, which would also lead to degradation.
BACKGROUND OF THE INVENTION
[0003] Most present-day graphical computer user interfaces (GUIs) are designed using visual components of a fixed spatial scale. However, it was recognized from the birth of the field of computer graphics that visual components could be represented and manipulated in such a way that they do not have a fixed spatial scale on the display, but can be zoomed in or out. The desirability of zoomable components is obvious in many application domains; to name only a few: viewing maps, browsing through large heterogeneous text layouts such as newspapers, viewing albums of digital photographs, and working with visualizations of large data sets. Even when viewing ordinary documents, such as spreadsheets and reports, it is often useful to be able to glance at a document overview, and then zoom in on an area of interest. Many modern computer applications include zoomable components, such as Microsoft® Word® and other Office® products (Zoom under the View menu), Adobe® Photoshop®, Adobe® Acrobat®, QuarkXPress®, etc. In most cases, these applications allow zooming in and out of documents, but not necessarily zooming in and out of the visual components of the applications themselves. Further, zooming is normally a peripheral aspect of the user's interaction with the software, and the zoom setting is only modified occasionally. Although continuous panning over a document is standard (i.e., using scrollbars or the cursor to translate the viewed document left, right, up or down), the ability to zoom and pan continuously in a user-friendly manner is absent from prior art systems.
[0004] First, we set forth several definitions. A display is the device or devices used to output rendered imagery to the user. A frame buffer is used to dynamically represent the contents of at least a portion of the display. Display refresh rate is the rate at which the physical display, or portion thereof, is refreshed using the contents of the frame buffer. A frame buffer's frame rate is the rate at which the frame buffer is updated.
[0005] For example, in a typical personal computer, the display refresh rate is 60- 90 Hz. Most digital video, for example, has a frame rate of 24-30 Hz. Thus, each frame of digital video will actually be displayed at least twice as the display is refreshed. Plural frame buffers may be utilized at different frame rates and thus be displayed substantially simultaneously on the same display. This would occur, for example, when two digital videos with different frame rates were being played on the same display, in different windows.
[0006] One problem with zooming user interfaces (ZUIs) is that the visual content has to be displayed at different resolutions as the user zooms. The ideal solution to this problem would be to display, in every consecutive frame, an exact and newly computed image based on the underlying visual content. The problem with such an approach is that the exact recalculation of each resolution of the visual content in real time as the user zooms is computationally impractical if the underlying visual content is complex.
[0007] As a result of the foregoing, many prior art ZUI systems use a plurality of precomputed images, each being a representation of the same visual content but at different resolutions. We term each of those different precomputed images a Level of Detail (LOD). The complete set of LODs, organized conceptually as a stack of images of decreasing resolution, is termed the LOD pyramid — see Fig. 1. In such prior systems, as zooming occurs, the system interpolates between the LODs and displays a resulting image at a desired resolution. While this approach solves the computational issue, it displays a final compromised image that is often blurred and unrealistic, and often involves loss of information due to the fact that it represents interpolation of different LODs. These interpolation errors are especially noticeable when the user stops zooming and has the opportunity to view a still image at a chosen resolution which does not precisely match the resolution of any of the LODs.
[0008] Another problem with interpolating between precomputed LODs is that this approach typically treats vector data in the same way as photographic or image data. Vector data, such as blueprints or line drawings, are displayed by processing a set of abstract instructions using a rendering algorithm, which can render lines, curves and other primitive shapes at any desired resolution. Text rendered using scalable fonts is an important special case of vector data. Image or photographic data (including text rendered using bitmapped fonts) are not so generated, but must be displayed either by interpolation between precomputed LODs or by resampling an original image. We refer to the latter herein as nonvector data.
[0009] Prior art systems that use rendering algorithms to redisplay vector data at a new resolution for each frame during a zoom sequence must restrict themselves to simple vector drawings only in order to achieve interactive frame rates. On the other hand, prior art systems that precompute LODs for vector data and interpolate between them, as for nonvector data, suffer from markedly degraded visual quality, as the sharp edges inherent in most vector data renditions are particularly sensitive to interpolation error. This degradation is usually unacceptable for textual content, which is a special case of vector data.
[0010] It is an object of the invention to create a ZUI that replicates the zooming effect a user would see if he or she actually had viewed a physical object and moved it closer to himself or herself.
[0011] It is an object of the invention to create a ZUI that displays images at an appropriate resolution but which avoids or diminishes the interpolation errors in the final displayed image. A further object of the present invention is to allow the user to zoom arbitrarily far in on vector content while maintaining a crisp, unblurred view of the content and maintaining interactive frame rates.

[0012] A further object of the present invention is to allow the user to zoom arbitrarily far out to get an overview of complex vectorial content, while both preserving the overall appearance of the content and maintaining interactive frame rates.
[0013] A further object of the present invention is to diminish the user's perception of transitions between LODs or rendition qualities during interaction.
[0014] A further object of the present invention is to allow the graceful degradation of image quality by blurring when information ordinarily needed to render portions of the image is as yet incomplete.
[0015] A further object of the present invention is to gradually increase image quality by bringing it into sharper focus as more complete information needed to render portions of the image becomes available.
[0016] It is an object of the invention to optimally and independently render both vector and nonvector data.
[0017] These and other objects of the present invention will become apparent to those skilled in the art from a review of the specification that follows.
SUMMARY OF THE INVENTION
[0018] The above and other problems of the prior art are overcome in accordance with the present invention, which relates to a hybrid strategy for implementing a ZUI allowing an image to be displayed at a dynamically varying resolution as a user zooms in or out, rotates, pans, or otherwise changes his or her view of an image. Any such change in view is termed navigation. Zooming of the image to a resolution not equal to that of any of the predefined LODs is accomplished by displaying the image at a new resolution that is interpolated from predefined LODs that "surround" the desired resolution. By "surrounding LODs" we mean the LOD of lowest resolution which is greater than the desired resolution and the LOD of highest resolution which is less than the desired resolution. If the desired resolution is either greater than the resolution of the LOD with the highest available resolution or less than the resolution of the LOD with the lowest resolution, then there will be only a single "surrounding LOD". The dynamic interpolation of an image at a desired resolution based on a set of precomputed LODs is termed in the literature mipmapping or trilinear interpolation. The latter term further indicates that bilinear sampling is used to resample the surrounding LODs, followed by linear interpolation between these resampled LODs (hence trilinear). See, e.g., Lance Williams, "Pyramidal Parametrics," Computer Graphics (Proc. SIGGRAPH '83) 17(3):1-11 (1983). The foregoing document is incorporated herein by reference in its entirety. Obvious modifications of or extensions to the mipmapping technique introduced by Williams use nonlinear resampling and/or interpolation of the surrounding LODs. In the present invention it is immaterial whether the resampling and interpolation operations are zeroth-order (nearest-neighbor), linear, higher-order, or more generally nonlinear.
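As an illustrative sketch, selecting the surrounding LODs and blending two already-resampled values might look as follows; linear interpolation weights are shown, though as just noted the operations may equally be nonlinear:

```python
def surrounding_lods(desired, lod_resolutions):
    """Return the one or two LOD resolutions that surround `desired`.
    `lod_resolutions` holds each LOD's resolution in samples per unit length."""
    lower = max((r for r in lod_resolutions if r <= desired), default=None)
    upper = min((r for r in lod_resolutions if r > desired), default=None)
    if lower is None:
        return [upper]      # coarser than the coarsest LOD: one surrounding LOD
    if upper is None:
        return [lower]      # finer than the finest LOD: one surrounding LOD
    return [lower, upper]

def trilinear_blend(sample_lower, sample_upper, lower_res, upper_res, desired):
    """Blend two bilinearly resampled values taken from the surrounding LODs."""
    t = (desired - lower_res) / (upper_res - lower_res)
    return sample_lower * (1.0 - t) + sample_upper * t
```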
[0019] In accordance with the invention described herein, when the user defines an exact desired resolution, which is almost never the resolution of one of the predefined LODs, the final image is then displayed by preferably first displaying an intermediate final image. The intermediate final image is the first image displayed at the desired resolution before that image is refined as described hereafter. The intermediate final image may correspond to the image that would be displayed at the desired resolution using the prior art.
[0020] In a preferred embodiment, the transition from the intermediate final image to the final image may be gradual, as explained in more detail below.
[0021] In an enhanced embodiment, the present invention allows LODs to be spaced in any resolution increments, including irrational increments (i.e. magnification or minification factors between consecutive LODs which cannot be expressed as the ratio of two integers), as explained in more detail below.
[0022] In another enhanced embodiment, portions of the image at each different LOD are denoted tiles, and such tiles are rendered in an order that minimizes any perceived imperfections to a viewer. In other embodiments, the displayed visual content is made up of plural LODs (potentially a superset of the surrounding LODs as described above), each of which is displayed in the proper proportion and location in order to cause the display to gradually fade into the final image in a manner that conceals imperfections.
[0023] The rendition of various tiles in plural LODs is accomplished in an order that optimizes the appearance of the visual content while staying within acceptable levels of computational complexity so that the system can run on standard computers with typical clock speeds available in most laptop and desktop personal computers.
[0024] The present invention involves a hybrid strategy, in which an image is displayed using predefined LODs during rapid zooming and panning, but when the view stabilizes sufficiently, an exact LOD is rendered and displayed. The exact LOD is rendered and displayed at the precise resolution chosen by the user, which is normally different from the predefined LODs. Because the human visual system is insensitive to fine detail in the visual content while it is still in motion, this hybrid strategy can produce the illusion of continuous "perfect rendering" with far less computation.
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] Figure 1 depicts an LOD pyramid (in this case the base of the pyramid, representing the highest-resolution representation, is a 512x512 sample image, and successive minifications of this image are shown in factors of 2);
[0026] Figure 2 depicts a flow chart for use in an exemplary embodiment of the invention;
[0027] Figure 3 is another flow chart that shows how the system displays the final image after zooming;
[0028] Figure 4 is the LOD pyramid of Figure 1 with grid lines added showing the subdivision of each LOD into rectangular tiles of equal size in samples;
[0029] Figure 5 is another flow chart, for use in connection with the present invention, and it depicts a process for displaying rendered tiles on a display;
[0030] Figure 6 shows a concept termed irrational tiling, explained in more detail herein; and
[0031] Figure 7 depicts a composite tile and the tiles that make up the composite tile, as explained more fully below.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
[0032] Figure 2 shows a flowchart of a basic technique for implementation of the present invention. The flowchart of Figure 2 represents an exemplary embodiment of the invention and would begin executing when an image is displayed at an initial resolution. It is noted that the invention may be used in the client/server model, but that the client and server may be on the same or different machines. Thus, for example, there could be a set of discrete LODs stored remotely at a host computer, and the user can be connected to said host through a local PC. The actual hardware platform and system utilized are not critical to the present invention.
[0033] The flowchart is entered at start block 201 with an initial view of an image at a particular resolution. In this example, the image is taken to be static. The image is displayed at block 202. A user may navigate that image by moving, for example, a computer mouse. The initial view displayed at block 202 will change when the user navigates the image. It is noted that the underlying image may itself be dynamic, such as in the case of motion video, however, for purposes of this example, the image itself is treated as static. As explained above, any image to be displayed may also have textual or other vector data and/or nonvector data such as photographs and other images. The present invention, and the entire discussion below, is applicable regardless of whether the image comprises vector or nonvector data, or both.
[0034] Regardless of the type of visual content displayed in block 202, the method transfers control to decision point 203 at which navigation input may be detected. If such input is not detected, the method loops back to block 202 and continues displaying the stationary visual content. If a navigation input is detected, control will be transferred to block 204 as shown.
[0035] Decision point 203 may be implemented by a continuous loop in software looking for a particular signal that detects movement, an interrupt system in hardware, or any other desired methodology. The particular technique utilized to detect and analyze the navigation request is not critical to the present invention. Regardless of the methodology used, the system can detect the request, thus indicating a desire to navigate the image. Although much of the discussion herein relates to zooming, it is noted that the techniques are applicable to zooming, panning, or otherwise navigating. Indeed, the techniques described herein are applicable to any type of dynamic transformation or change in perspective on the image. Such transformations may include, for example, three-dimensional translation and rotation, application of an image filter, local stretching, dynamic spatial distortion applied to selected areas of the image, or any other kind of distortion that might reveal more information. Another example would be a virtual magnifying glass that can be moved over the image and magnifies the parts of the image under it. When decision point 203 detects that a user is initiating navigation, block 204 will then render and display a new view of the image, which may be, for example, at a different resolution from the prior displayed view.
[0036] One straightforward prior art technique of displaying the new view is based upon interpolating LODs as the user zooms in or out. The selected LODs may be those two LODs that "surround" the desired resolution, i.e., the resolution of the new view. The interpolation, in prior systems, constantly occurs as the user zooms and is thus often implemented directly in the hardware to achieve speed. The combination of detection of movement in decision point 205 and a substantially immediate display of an appropriate interpolated image at block 204 results in the image appearing to zoom continuously as the user navigates. During zooming in or out, since the image is moving, an interpolated image is sufficient to look realistic and clear. Any interpolation error is only minimally detectable by the human visual system, as such errors are disguised by the constantly changing view of the image.
[0037] At decision point 205, the system tests whether or not the movement has substantially ceased. This can be accomplished using a variety of techniques, including, for example, measuring the rate of change of one or more parameters of the view. That is, the methodology ascertains whether or not the user has arrived at the point where he has finished zooming. Upon such stabilization at decision point 205, control is transferred to block 206, where an exact image is rendered, after which control returns to block 203. Thus, at any desired resolution, the system will eventually display an exact LOD.
[0038] Notably, the display is not simply rendered and displayed by an interpolation of two predefined LODs, but may be rendered and displayed by re-rendering vector data using the original algorithm used to render the text or other vector data when the initial view was displayed at block 202. Nonvector data may also be resampled for rendering and displayed at the exact required LOD. The required re-rendering or resampling may be performed not only at the precise resolution required for display at the desired resolution, but also on a sampling grid corresponding precisely to the correct positions of the display pixels relative to the underlying content, as calculated based on the desired view. As an example, translation of the image on the display by ½ pixel in the display plane does not change the required resolution, but it does alter the sampling grid, and therefore requires re-rendering or resampling of the exact LOD.
[0039] The foregoing system of Fig. 2 represents a hybrid approach in which interpolation based upon predefined LODs is utilized while the view is changing (e.g. navigation is occurring) but an exact view is rendered and displayed when the view becomes substantially stationary.
[0040] For purposes of explanation herein, the term render refers to the generation by the computer of a tile at a specific LOD based upon vector or nonvector data. With respect to nonvector data, tiles may be rerendered at an arbitrary resolution by resampling an original image at a higher or lower resolution.
[0041] We turn now to the methodology of rendering and displaying the different portions of the visual content needed to achieve an exact final image as represented by block 206 of Fig. 2. With reference to Fig. 3, when it is determined that navigation has ceased, control is transferred to block 303 and an interpolated image is immediately displayed, just as is the case during zooming. We call this interpolated image that may be temporarily displayed after the navigation ceases the intermediate final image, or simply an intermediate image. This image is generated from an interpolation of the surrounding LODs. In some cases, as explained in more detail below, the intermediate image may be interpolated from more than two discrete LODs, or from two discrete LODs other than the ones that surround the desired resolution.
[0042] Once the intermediate image is displayed, block 304 is entered, which causes the image to begin to gradually fade towards an exact rendition of the image, which we term the final image. The final image differs from the intermediate image in that the final image may not involve interpolation of any predefined LODs. Instead, the final image, or portions thereof, may comprise newly rendered tiles. In the case of photographic data, the newly rendered tiles may result from resampling the original data, and in the case of vector data, the newly rendered tiles may result from rasterization at the desired resolution.
[0043] It is also noted that it is possible to skip directly from block 303 to 305, immediately replacing the interpolated image with a final and exact image. However, in the preferred embodiment, step 304 is executed so the changeover from the intermediate final image to the final image is done gradually and smoothly. This gradual fading, sometimes called blending, causes the image to come into focus gradually when navigation ceases, producing an effect similar to automatic focusing in cameras or other optical instruments. The illusion of physicality created by this effect is an important aspect of the present invention.
[0044] Following is a discussion of the manner in which this fading or blending may take place in order to minimize perceived irregularities, sudden changes, seams, and other imperfections in the image. It is understood however that the particular technique of fading is not critical to the present invention, and that many variations will be apparent to those of skill in the art.
[0045] Different LODs differ in the number of samples per physical area of the underlying visual content. Thus, a first LOD may take a 1 inch by 1 inch area of a viewable object and generate a single 32 by 32 sample tile. However, the information may also be rendered by taking the same 1 inch by 1 inch area and representing it as a tile that is 64 by 64 samples, and therefore at a higher resolution.

[0046] We define a concept called irrational tiling. Tiling granularity, which we will write as the variable g, is defined as the ratio of the linear tiling grid size at a higher-resolution LOD to the linear tiling grid size at the next lower-resolution LOD. In the Williams paper introducing trilinear interpolation, g = 2. This same value of g has been used in other prior art. Although LODs may be subdivided into tiles in any fashion, in an exemplary embodiment each LOD is subdivided into a grid of square or rectangular tiles containing a constant number of samples (except, as required, at the edges of the visual content). Conceptually, when g = 2, each tile at a certain LOD "breaks up" into 2x2=4 tiles at the next higher-resolution LOD (again, except potentially at the edges), as shown in Figure 4.
[0047] There are fundamental shortcomings in tilings of granularity 2. Usually, if a user zooms in on a random point in a tile, every g-fold increase in zoom will require the rendition of a single additional tile corresponding to the next higher-resolution LOD near the point toward which the user is zooming. However, if a user is zooming in on a grid line in the tiling grid, then two new tiles need to be rendered, one on either side of the line. Finally, if a user is zooming in on the intersection of two grid lines, then four new tiles need to be rendered. If these events — requests for 1, 2 or 4 new tiles with each g-fold zoom — are interspersed randomly throughout an extended zooming sequence, then overall performance will be consistent. However, a grid line in any integral-granularity tiling (i.e. where g is a whole number) remains a grid line for every higher-resolution LOD.
[0048] Consider, for example, zooming in on the center of a very large image tiled with granularity 2. We will write the (x,y) coordinates of this point as (½, ½), adopting the convention that the visual content falls within a square with corners (0,0), (0,1), (1,0) and (1,1). Because the center is at the intersection of two grid lines, as the user reaches each higher-resolution LOD, four new tiles need to be rendered every time; this will result in slow performance and inefficiency for zooming on this particular point. Suppose, on the other hand, that the user zooms in on an irrational point — meaning a point (x,y) such that x and y cannot be expressed as the ratios of two whole numbers. Examples of such numbers are pi (=3.14159...) and the square root of 2 (=1.414213...). Then, it can easily be demonstrated that the sequence of 1's, 2's and 4's given by the number of tiles that need to be rendered for every g-fold zoom is quasi-random, i.e. follows no periodic pattern. This kind of quasi-random sequence is clearly more desirable from the point of view of performance; then there are no distinguished points for zooming from a performance standpoint.
[0049] Irrational tiling resolves this issue: g itself is taken to be an irrational number, typically the square root of 3, 5 or 12. Although this means that on average 3, 5 or 12 tiles (correspondingly) at a given LOD are contained within a single tile at the next lower-resolution LOD, note that the tiling grids at consecutive LODs no longer "agree" on any grid lines in this scheme (except potentially at the leading edges of the visual content, x=0 and y=0, or at some other preselected single grid line along each axis). If g is chosen such that it is not the nth root of any integer (pi is such a number), then no LODs will share any grid lines (again, potentially except x=0 and y=0). Hence it can be shown that each tile may randomly overlap 1, 2, or 4 tiles at the next lower LOD, whereas with g=2 this number is always 1.

[0050] With irrational tiling granularity, zooming in on any point will therefore produce a quasi-random stream of requests for 1, 2 or 4 tiles, and performance will be on average uniform when zooming in everywhere. Perhaps the greatest benefit of irrational tiling emerges in connection with panning after a deep zoom. When the user pans the image after having zoomed in deeply, at some point a grid line will be moved onto the display. It will usually be the case that the region on the other side of this grid line will correspond to a lower-resolution LOD than the rest of the display; it is desirable, however, for the difference between these resolutions to be as small as possible. With integral g, however, the difference will often be extremely large, because grid lines can overlap over many consecutive LODs. This creates "deep cracks" in resolution over the panned area, as shown in Figure 6(a).
[0051] On the other hand, because grid lines in an irrational tiling never overlap those of an adjacent LOD (again with the possible exception of one grid line in each direction, which may be at one corner of the image), discontinuities in resolution of more than one LOD do not occur. This increased smoothness in relative resolution allows the illusion of spatial continuity to be much more convincing.
[0052] Figure 6(b) illustrates the advantage gained by irrational tiling granularity. Figure 6 shows cross-sections through several LODs of the visual content; each bar represents a cross-section of a rectangular tile. Hence the second level from the top, in which there are two bars, might be a 2x2=4 tile LOD. The curves 601, drawn from top to bottom, represent the bounds of the visible area of the visual content at the relevant LOD during a zooming operation: as the resolution is increased (zooming in to reveal more detail), the area under examination decreases. Darker bars (e.g., 602) represent tiles which have already been rendered over the course of the zoom. Lighter bars have not yet been rendered, so cannot be displayed. Note that when the tiling is integral as in Figure 6(a), abrupt changes in resolution over space are common; if the user were to pan right after the zoom, then at the spatial boundary indicated by the arrow, four LODs would "end" abruptly. The resulting image would look sharp to the left of this boundary, and extremely blurry to the right. The same visual content represented using an irrational tiling granularity lacks such resolution "cracks": adjacent LODs do not share tile boundaries, except as shown at the left edge. Mathematically, this shared boundary may occur at most in one position on the x-axis and at one position on the y-axis. In the embodiment shown these shared boundaries are positioned at y=0 and x=0, but, if present, they may also be placed at any other position.
[0053] Another benefit of irrational tiling granularity is that it allows finer control of g, since there are a great many more irrational numbers than integers, particularly over the useful range where g is not too large. This additional freedom can be useful for tuning the zooming performance of certain applications. If g is set to the irrational square root of an integer (such as sqrt(2), sqrt(5) or sqrt(8)), then in the embodiment described above the grid lines of alternate LODs would align exactly; if g is an irrational cube root, then every third LOD would align exactly; and so on. This confers an additional benefit with respect to limiting the complexity of a composite tiling, as defined below.
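The distinguished-point effect, and its disappearance under irrational granularity, can be checked numerically. The sketch below counts the tiles at each level crossed by a view window a fixed fraction of a tile wide, centered on the zoom point; the fraction 0.5 is an illustrative modeling choice standing in for the view size at the moment each LOD becomes relevant:

```python
import math

def new_tiles_2d(p, level, g, view_frac=0.5):
    """Tiles at `level` crossed by a square view window `view_frac` of a tile
    wide centered on zoom point (p, p); tile size shrinks by g per level."""
    s = g ** -level                              # tile size at this level
    half = view_frac * s / 2.0
    per_axis = int((p + half) // s) - int((p - half) // s) + 1
    return per_axis * per_axis

# Zooming toward the image center (1/2, 1/2): with g = 2 the point lies on grid
# lines at every LOD, so 4 new tiles are requested at every level; with the
# irrational g = sqrt(5) the per-level count varies instead of staying pinned at 4.
for level in range(1, 12):
    print(level, new_tiles_2d(0.5, level, 2.0), new_tiles_2d(0.5, level, math.sqrt(5)))
```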
[0054] An important aspect of the invention is the order in which the tiles are rendered. More particularly, the various tiles of the various LODs are optimally rendered such that all visible tiles are rendered first. Nonvisible tiles may not be rendered at all. Within the set of visible tiles, rendition proceeds in order of increasing resolution, so that tiles within low-resolution LODs are rendered first. Within any particular LOD, tiles are rendered in order of increasing distance from the center of the display, which we refer to as foveated rendering. To sort such tiles in the described order, numerous sorting algorithms such as heapsort, quicksort, or others may be used. To implement this ordering, a lexicographic key may be used for sorting "requests" to render tiles, such that the outer subkey is visibility, the middle subkey is resolution in samples per physical unit, and the inner subkey is distance to the center of the display. Other methods for ordering tile rendering requests may also be used. The actual rendering of the tiles optimally takes place as a parallel process with the navigation and display described herein. When rendering and navigation/display proceed as parallel processes, user responsiveness may remain high even when tiles are slow to render.
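A sketch of such a prioritized queue of render requests; the tile and view attributes used in the key are hypothetical stand-ins for whatever the implementation tracks:

```python
import heapq

def request_key(tile, view):
    """Lexicographic sort key: visible tiles first, then lower-resolution LODs,
    then smaller distance from the display center (foveated order)."""
    visible = 0 if view.intersects(tile.bounds) else 1     # hypothetical attributes
    cx, cy = view.center
    tx, ty = tile.center
    return (visible, tile.samples_per_unit, (tx - cx) ** 2 + (ty - cy) ** 2)

def render_requests(tiles, view):
    """Yield tiles in rendering priority order; a heap lets the renderer,
    running in parallel with navigation, pop requests incrementally."""
    heap = [(request_key(t, view), i, t) for i, t in enumerate(tiles)]
    heapq.heapify(heap)
    while heap:
        _, _, tile = heapq.heappop(heap)
        yield tile
```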
[0055] We now describe the process of rendering a tile in an exemplary embodiment. If a tile represents vector data, such as alphabetic typography in a stroke-based font, then rendering of the tile would involve running the algorithm to rasterize the alphabetic data and possibly transmitting that data to a client from a server. Alternatively, the data fed to the rasterization algorithm could be sent to the client, and the client could run the algorithm to rasterize the tile. In another example, rendering of a tile involving digitally sampled photographic data could involve resampling of that data to generate the tile at the appropriate LOD. For discrete LODs that are prestored, rendering may involve no more than simply transmitting the tile to a client computer for subsequent display. For tiles that fall between discrete LODs, such as tiles in the final image, some further calculation as described above may be required.

[0056] At any given time, when the tiles are rendered and the image begins to fade toward the exact image, the actual display may comprise different mixes of different tiles from different LODs. Thus, any portion of the display could contain, for example, 20% from LOD 1, 40% from LOD 2, and 40% from LOD 3. Regardless of the tiles displayed, the algorithm attempts to render tiles from the various LODs in a priority order best suited to supply the rendered tiles for display as they are most needed. The actual display of the rendered tiles will be explained in more detail later with reference to Figure 5.
[0057] In what follows we describe a method for drawing the plural LODs using an algorithm which can guarantee spatial and temporal continuity of image detail. The algorithm is designed to make the best use of all rendered tiles, using high-resolution tiles in preference to lower-resolution tiles covering the same display area, yet using spatial blending to avoid sharp boundaries between LODs, and temporally graduated blending weights to blend in higher detail if and when it becomes available (i.e. when higher-resolution tiles have been rendered). Unlike the prior art, this algorithm and variants thereof can result in more than two LODs being blended together at a given point on the display; it can also result in blending coefficients that vary smoothly over the display area; and it can result in blending coefficients that evolve in time even after the user has stopped navigating. In this exemplary embodiment it is nonetheless computationally efficient, and can be used to render imagery as partially transparent, or with an overall transparency that varies over the image area, as will become apparent.
[0058] We define herein a composite tile area, or simply a composite tile. To define a composite tile we consider all of the LODs stacked on top of each other. Each LOD has its own tile grid. The composite grid is then formed by the projection of all of the grids from all of the LODs onto a single plane. The composite grid is then made up of various composite tiles of different sizes, defined by the boundaries of tiles from all of the different LODs. This is shown conceptually in Fig. 7. Fig. 7 depicts the tiles from three different LODs, 701 through 703, all representing the same image. One can imagine the LODs 701 through 703 being stacked up on top of each other. In such a case, if one lined up corner 750 from each of these LODs and stacked them on top of each other, an area represented by 740 would be inside the area represented by 730, and the areas represented by 730 and 740, would be inside the area represented by 720. Area 710 of Fig. 7 shows that there would be a single "composite tile" 710. Each of the composite tiles is examined during each frame, wherein the frame rate may be typically greater than ten frames per second. Note that, as explained above, this frame rate is not necessarily the display refresh rate.
[0059] Fig. 5 depicts a flow chart of an algorithm for updating the frame buffer as tiles are rendered. The arrangement of Fig. 5 is intended to operate on every composite tile in the displayed image each time the frame buffer is updated. Thus, for example, if a frame duration is 1/20 of a second, each of the composite tiles on the entire screen would preferably be examined and updated during each 1/20 of a second. When a composite tile is operated upon by the process of Fig. 5, the composite tile may lack the relevant tiles in one or more LODs. The process of Fig. 5 attempts to display each composite tile as a weighted average of all the available superimposed tiles within which the composite tile lies. Note that composite tiles are defined in such a way that they fall within exactly one tile at any given LOD; hence the weighted average can be expressed as a relative proportion of each LOD. The process attempts to determine the appropriate weights for each LOD within the composite tile, and to vary those weights gradually over space and time to cause the image to gradually fade towards the final image discussed above.
[0060] The composite grid includes plural vertices, which are defined to be any intersection or corner of gridlines in the composite grid. These are termed composite grid vertices. We define an opacity for each LOD at each composite grid vertex. The opacity can be expressed as a weight between 0.0 and 1.0, and the sum of all the LOD weights at each vertex should therefore be 1.0 if the desired result is for the image to be totally opaque. The current weights at any particular time for each LOD at each vertex are maintained in memory.
[0061] The algorithm for updating vertex weights proceeds as described below.
[0062] The following variables, which are taken to be numbers between 0.0 and 1.0, are kept in memory for each tile: centerOpacity, cornerOpacity for each corner (4 if the tiling is a rectangular grid), and edgeOpacity for each edge (4 if the tiling is a rectangular grid). When a tile is first rendered, all of its opacities as just listed are normally set to 1.0.
[0063] During a drawing pass, the algorithm walks through the composite tiling once for each relevant LOD, beginning with the highest-resolution LOD. In addition to the per-tile variables, the algorithm maintains the following variables: levelOpacityGrid and opacityGrid. Both of these variables are again numbers between 0.0 and 1.0, and are maintained for each vertex in the composite tiling.
[0064] The algorithm walks through each LOD in turn, in order from highest-resolution to lowest, performing the following operations. First, 0.0 is assigned to levelOpacityGrid at all vertices. Then, for each rendered tile at that LOD (which may be a subset of the set of tiles at that LOD, if some have not yet been rendered), the algorithm updates the parts of the levelOpacityGrid touching that tile based on the tile's centerOpacity, cornerOpacity and edgeOpacity values:
[0065] If the vertex is entirely in the interior of the tile, then it gets updated using centerOpacity.
[0066] If the vertex is e.g. on the tile's left edge, it gets updated with the left edgeOpacity.
[0067] If the vertex is e.g. on the top right corner, it gets updated with the top right cornerOpacity.
[0068] "Updating" means the following: if the pre-existing levelOpacityGrid value is greater than 0.0, then set the new value to the minimum of the present value, or the value it's being updated with. If the pre-existing value is zero (i.e. this vertex hasn't been touched yet) then just set the levelOpacityGrid value to the value it's being updated with. The end result is that the levelOpacityGrid at each vertex position gets set to the minimum nonzero value with which it gets updated.
[0069] The algorithm then walks through the levelOpacityGrid and sets to 0.0 any vertices that touch a tile which has not yet been rendered, termed a hole. This ensures spatial continuity of blending: wherever a composite tile falls within a hole, at the current LOD, drawing opacity should fade to zero at all vertices abutting that hole.
[0070] In an enhanced embodiment, the algorithm can then relax all levelOpacityGrid values to further improve spatial continuity of LOD blending. The situation as described thus far can be visualized as follows: every vertex is like a tentpole, where the levelOpacityGrid value at that point is the tentpole's height. The algorithm has thus far ensured that at all points bordering on a hole, the tentpoles have zero height; and in the interior of tiles that have been rendered, the tentpoles are set to some (probably) nonzero value. In the extreme case, perhaps all the values inside a rendered tile are set to 1.0. Assume for purposes of illustration that the rendered tile has no rendered neighbors yet, so the border values are 0.0. We have not specified how narrow the "margin" is between a 0.0 border tentpole and one of the 1.0 internal tentpoles. If this margin is too small, then even though the blending is technically continuous, the transition may be too sharp when measured as an opacity derivative over space. The relax operation smoothes out the tent, always preserving values of 0.0, but possibly lowering other tentpoles to make the function defined by the tent surface smoother, i.e. limiting its maximum spatial derivative. It is immaterial to the invention which of a variety of methods are used to implement this operation; one approach, for example, is to use selective low-pass filtering, locally replacing every nonzero value with a weighted average of its neighbors while leaving zeroes intact. Other methods will also be apparent to those skilled in the art.
[0071] The algorithm then walks over all composite grid vertices, considering corresponding values of levelOpacityGrid and opacityGrid at each vertex: if levelOpacityGrid is greater than 1.0-opacityGrid, then levelOpacityGrid is set to 1.0-opacityGrid. Then, again for each vertex, corresponding values of levelOpacityGrid are added to opacityGrid. Due to the previous step, this can never bring opacityGrid above 1.0. These steps in the algorithm ensure that as much opacity as possible is contributed by higher-resolution LODs when they are available, allowing lower-resolution LODs to "show through" only where there are holes.
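A sketch of this top-down accumulation, with level_opacity_for and draw_level as hypothetical stand-ins for the per-LOD grid computation described above (holes zeroed, optionally relaxed) and for the per-vertex-opacity drawing step described next:

```python
def drawing_pass(lods_high_to_low, vertices, level_opacity_for, draw_level):
    """One frame's traversal over the LODs, highest resolution first."""
    opacity_grid = {v: 0.0 for v in vertices}       # opacity already claimed per vertex
    for lod in lods_high_to_low:
        level_grid = level_opacity_for(lod)         # this LOD's levelOpacityGrid
        for v in vertices:
            remaining = 1.0 - opacity_grid[v]
            if level_grid[v] > remaining:           # clamp so totals never exceed 1.0
                level_grid[v] = remaining
            opacity_grid[v] += level_grid[v]
        draw_level(lod, level_grid)                 # coarser LODs show through holes
```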
[0072] The final step in the traversal of the current LOD is to actually draw the composite tiles at the current LOD, using levelOpacityGrid as the per-vertex opacity values. In an enhanced embodiment, levelOpacityGrid can be multiplied by a scalar overallOpacity variable in the range 0.0 to 1.0 just before drawing; this allows the entire image to be drawn with partial transparency given by the overallOpacity. Note that drawing an image-containing polygon, such as a rectangle, with different opacities at each vertex is a standard procedure. It can be accomplished, for example, using industry-standard texture mapping functions using the OpenGL or Direct3D graphics libraries. In practice, the drawn opacity within the interior of each such polygon is spatially interpolated, resulting in a smooth change in opacity over the polygon.
[0073] In another enhanced embodiment of the algorithm described above, tiles maintain not only their current values of centerOpacity, cornerOpacity and edgeOpacity (called the current values), but also a parallel set of values called targetCenterOpacity, targetCornerOpacity and targetEdgeOpacity (called the target values). In this enhanced embodiment, the current values are all set to 0.0 when a tile is first rendered, but the target values are all set to 1.0. Then, after each frame, the current values are adjusted to new values closer to the target values. This may be implemented using a number of mathematical formulae, but as an example, it can be done in the following way: newValue = oldValue*(1-b) + targetValue*b, where b is a rate greater than 0.0 and less than 1.0. A value of b close to 0.0 will result in a very slow transition toward the target value, and a value of b close to 1.0 will result in a very rapid transition toward the target value. This method of updating opacities results in exponential convergence toward the target, and results in a visually pleasing impression of temporal continuity. Other formulae can achieve the same result.
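A sketch of this per-frame update:

```python
def step_opacity(current, target, b):
    """One frame's blend step; b in (0.0, 1.0) sets the convergence rate."""
    return current * (1.0 - b) + target * b

# A newly rendered tile fades in: current opacity starts at 0.0, target at 1.0.
value = 0.0
for frame in range(4):
    value = step_opacity(value, 1.0, 0.3)   # 0.3, 0.51, 0.657, 0.7599, ...
```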
[0074] The foregoing describes the preferred embodiment of the present invention. The invention is not limited to such preferred embodiment, and various modifications consistent with the appended claims are included within the invention as well.

Claims

1. A method of displaying visual content, said method comprising: generating a plurality of different levels of detail (LODs) of the visual content; displaying the visual content as an interpolation of said LODs while the visual content is navigated; and displaying a final image including at least a portion not as an interpolation of said LODs when said navigation substantially ceases.
2. The method of claim 1 wherein said navigation comprises one or more of the following: two- or three-dimensional translation, rotation, image filtering, local stretching, dynamic spatial distortion, magnification or minification.
3. The method of claim 1 wherein prior to said final image being displayed, an intermediate final image is generated by interpolation from said plurality of said LODs.
4. The method of claim 3 wherein said intermediate final image gradually changes to said final image.
5. The method of claim 4 wherein said final image or said intermediate final image is rendered on a tile-by-tile basis.
6. The method of claims 3, 4 or 5 wherein each LOD is comprised of tiles and said final image or said intermediate final image is displayed by using tiles from several LODs displayed as composite tiles.
7. The method of claim 6 wherein the tiles of each LOD are made available for entry into a frame buffer in an order that depends at least in part upon the LOD to which the tile belongs, whether the tile is presently viewable, or the degree of foveation of such tile.
8. The method of claim 7 wherein viewable tiles are rendered first, and within said viewable tiles, tiles are rendered in order of increasing resolution, and within tiles of a similar resolution, tiles are rendered in foveated order.
9. The method of claim 8 further comprising implementing irrational tiling.
10. The method of claim 6 wherein said visual content comprises vector and non-vector data.
11. The method of claim 10 wherein said plurality of LODs are generated at a remote terminal, and said intermediate and final images are generated at a locally viewable terminal.
12. A method of displaying visual content that is being navigated comprising displaying said visual content as an interpolation of plural LODs during said navigation and displaying said visual content at least in part not as an interpolation of plural LODs when said navigation is substantially over.
13. The method of claim 12 wherein displaying said visual content as an interpolation of plural LODs gradually fades to displaying said visual content not as an interpolation of plural LODs.
14. A method of representing visual content by generating a set of LODs, each LOD comprising tiles, a number of tiles in a first LOD and a number of tiles in a second LOD not forming a ratio of integers for at least one subset of first and second LODs in said set.
15. The method of claim 14 comprising rendering composite tiles as a combination of parts of tiles from various LODs stacked on top of one another.
16. The method of claim 15 further comprising rendering tiles from each of a plurality of LODs in an order sorted first by viewability, within said viewable tiles by LOD, and within each LOD by level of foveation.
17. A method of displaying visual content comprising combining plural LODs representing visual content, and gradually altering a contribution attributable to at least one of said LODs so that said displayed visual content gradually changes toward a better displayed image in response to information to render said better displayed image becoming available.
18. The method of claim 17 wherein said contribution is altered gradually by assigning at least one weight to plural tiles within plural LODs, and then altering said weights.
19. The method of claim 18 wherein said assigning assigns plural weights to each of said tiles in at least one LOD.
20. The method of claim 19 wherein said plural weights include opacities at each of plural corners of said tile, opacities at each of plural edges of said tile, and an opacity at a point within each of said tiles.
21. The method of claim 18 further comprising calculating a levelOpacityGrid set of variables for plural locations in an LOD, said levelOpacityGrid set of variables being calculated by utilizing at least some of the weights of claim 20 for all tiles in said LOD tangent to a vertex at which said levelOpacityGrid is to be calculated.
22. The method of claim 21 further comprising spatially filtering said levelOpacityGrid set of variables for at least one LOD.
23. The method of claim 17 wherein said combination is made such that contributions from higher resolution LODs near the resolution of the display are emphasized over contributions from lower resolution LODs when such higher resolution LODs are available.
24. A method of representing visual content comprising combining a first LOD with a second LOD, each of said LODs being comprised of plural tiles, the tiles being arranged so that edges of said tiles in said LODs do not align throughout substantially all the visual content.
25. The method of claim 24 applied to three or more such LODs having increasing resolution, wherein when the LODs are arranged in order of increasing resolution, no two consecutive LODs differ in resolution by a rational multiple.
26. A method of combining plural LODs to display visual content, the method comprising weighting each of the LODs with an associated contribution, and varying the contribution provided by each LOD over time and space.
27. The method of claim 26 wherein the weighting is an opacity level.
28. The method of claim 27 wherein the total opacity of the combined LODs is less than one hundred percent.
29. The method of claim 28 wherein said varying over time results in asymptotic convergence toward a target value.
30. The method of claim 29 wherein for each LOD, an opacity level is calculated for each of a plurality of vertices.
31. The method of claim 30 wherein said varying over time and space is designed to diminish viewable discontinuities.
32. The method of claim 31 wherein values representing the weighting are low pass filtered.
33. A method comprising displaying an intermediate final image and then displaying a final image, the final image and intermediate final image being comprised of tiles rendered in foveated order, a transition from intermediate final image to final image being displayed occurring upon detection of navigation substantially ceasing.
34. The method of claim 33 wherein lower resolution tiles are displayed prior to higher resolution tiles.
35. The method of claim 34 wherein said transition is gradual.
36. Apparatus for displaying a final image, after navigation substantially ceases, comprising means for displaying interpolated images while navigation occurs, means for detecting when navigation has substantially ceased, and means for rerendering vector data and non-vector data using separate algorithms to display said final image.
37. The apparatus of claim 36 wherein said final image has tiles that contain both vector and non-vector data, and wherein at least one such tile is rendered using two different algorithms to accomplish this rendering.
38. The apparatus of claim 37 further comprising a processor to implement software to fade from an intermediate image to a final image.
39. A method of displaying visual content comprising combining plural LODs representing visual content, and gradually altering a contribution attributable to at least three of said LODs so that said displayed visual content gradually changes.
40. The method of claim 39 wherein said contribution is altered gradually by assigning at least one weight to plural tiles within plural LODs, and then altering said weights.
41. The method of claim 40 wherein said assigning assigns plural weights to each of said tiles in at least one LOD.
42. The method of claim 41 wherein said plural weights include opacities at each of plural corners of said tile, opacities at each of plural edges of said tile, and an opacity at a point within each of said tiles.
43. The method of claim 42 further comprising calculating a levelOpacityGrid set of variables for plural locations in an LOD, said levelOpacityGrid set of variables being calculated by utilizing at least some of the weights of claim 42 for all tiles in said LOD tangent to a vertex at which said levelOpacityGrid is to be calculated.
44. The method of claim 43 further comprising spatially filtering said levelOpacityGrid set of variables for at least one LOD.
45. The method of claim 39 wherein said combination is made such that contributions from higher resolution LODs are increased over contributions from lower resolution LODs.
PCT/US2005/037226 2004-10-15 2005-10-17 System and method for managing communication and/or storage of image data WO2006052390A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2007536990A JP4831071B2 (en) 2004-10-15 2005-10-17 System and method for managing communication and / or storage of image data
EP05851225.2A EP1810249A4 (en) 2004-10-15 2005-10-17 System and method for managing communication and/or storage of image data
CN2005800430579A CN101147174B (en) 2004-10-15 2005-10-17 System and method for managing communication and/or storage of image data

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US61905304P 2004-10-15 2004-10-15
US61907004P 2004-10-15 2004-10-15
US61911804P 2004-10-15 2004-10-15
US60/619,070 2004-10-15
US60/619,118 2004-10-15
US60/619,053 2004-10-15
US11/141,958 US7546419B2 (en) 2004-06-01 2005-06-01 Efficient data cache
US11/141,958 2005-06-01

Publications (3)

Publication Number Publication Date
WO2006052390A2 true WO2006052390A2 (en) 2006-05-18
WO2006052390A3 WO2006052390A3 (en) 2007-03-01
WO2006052390A9 WO2006052390A9 (en) 2007-07-12

Family

ID=36336931

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/037226 WO2006052390A2 (en) 2004-10-15 2005-10-17 System and method for managing communication and/or storage of image data

Country Status (4)

Country Link
EP (1) EP1810249A4 (en)
JP (1) JP4831071B2 (en)
CN (1) CN101147174B (en)
WO (1) WO2006052390A2 (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2462589B (en) * 2008-08-04 2013-02-20 Sony Comp Entertainment Europe Apparatus and method of viewing electronic documents
CN102411763A (en) * 2010-09-20 2012-04-11 湖南科创信息技术股份有限公司 Mobile automobile danger survey method and system based on third generation (3G) network
JP6329766B2 (en) 2010-11-05 2018-05-23 コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. Cache controller, image client, image server, workstation, imaging apparatus, cache method, and computer program
US20130166390A1 (en) * 2011-12-27 2013-06-27 Anthony T. BLOW Crowd-determined file size uploading methods, devices and systems
CN103337090B (en) * 2013-06-17 2016-07-13 清华大学 Moon model remote interaction browses method for visualizing, client and system
US9247379B2 (en) * 2014-01-27 2016-01-26 Qualcomm Incorporated Method and apparatus for hierarchical map tiling
RU2683628C1 (en) * 2014-02-21 2019-03-29 Сони Корпорейшн Transmission device, transmission method, reception device and reception method
US9978126B2 (en) 2014-04-30 2018-05-22 Empire Technology Development Llc Image resolution modification
CN106062781A (en) * 2014-04-30 2016-10-26 英派尔科技开发有限公司 Image resolution modification
US9740631B2 (en) * 2014-10-07 2017-08-22 Google Inc. Hardware-assisted memory compression management using page filter and system MMU
US9661321B2 (en) * 2014-10-15 2017-05-23 Nucleushealth, Llc Remote viewing of large image files
CN105653496B (en) * 2016-03-18 2018-08-31 联想(北京)有限公司 Electronic equipment and its data transmission method
US10460704B2 (en) * 2016-04-01 2019-10-29 Movidius Limited Systems and methods for head-mounted display adapted to human visual mechanism
CN107331222B (en) * 2016-04-29 2019-11-15 北京学而思教育科技有限公司 A kind of image processing method and device
CN106060382A (en) * 2016-05-27 2016-10-26 北京金山安全软件有限公司 Image processing method and device and electronic equipment
US10235612B2 (en) * 2016-07-29 2019-03-19 Canon Kabushiki Kaisha Information processing apparatus, information processing method, storage medium, and image forming apparatus for converting drawing data of a transparent object that does not overlap another drawing object into drawing data of a drawing object that does not have an alpha channel as color information
EP3282588B1 (en) * 2016-08-09 2019-09-25 Siemens Aktiengesellschaft Method, system and program product for data transmission with a reduced data volume
CN108573513B (en) * 2017-03-14 2021-08-03 腾讯科技(深圳)有限公司 Random element generation method and random element generation device
US10949947B2 (en) 2017-12-29 2021-03-16 Intel Corporation Foveated image rendering for head-mounted display devices
CN109345629A (en) * 2018-08-08 2019-02-15 安徽慧软科技有限公司 A kind of 3 d medical images are fuzzy to highlight display methods
CN112416347B (en) * 2020-11-25 2022-07-12 中睿信数字技术有限公司 Webpage layout method based on grids and dynamic magnetic stickers
CN113673405B (en) * 2021-08-14 2024-03-29 深圳市快易典教育科技有限公司 Problem correction method and system based on problem recognition and intelligent home education learning machine
CN117627176B (en) * 2024-01-25 2024-03-26 华南理工大学 3D space printing method for large-scale three-dimensional lattice structure

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5341466A (en) * 1991-05-09 1994-08-23 New York University Fractal computer user centerface with zooming capability
US6348921B1 (en) * 1996-04-12 2002-02-19 Ze Hong Zhao System and method for displaying different portions of an object in different levels of detail
US6496607B1 (en) * 1998-06-26 2002-12-17 Sarnoff Corporation Method and apparatus for region-based allocation of processing resources and control of input image formation
SE513353C2 (en) * 1998-10-21 2000-08-28 Ericsson Telefon Ab L M Partial image retrieval in the compressed domain
GB9926131D0 (en) * 1999-11-05 2000-01-12 Superscape Limited Image enhancement
US6453330B1 (en) * 1999-11-24 2002-09-17 Ati International Srl High-precision bilinear interpolation
JP2002330951A (en) * 2001-05-11 2002-11-19 Canon Inc Image encoding/decoding device and method, computer program and storage medium
DE10300048B4 (en) * 2002-01-05 2005-05-12 Samsung Electronics Co., Ltd., Suwon Image coding method for motion picture expert groups, involves image quantizing data in accordance with quantization parameter, and coding entropy of quantized image data using entropy coding unit
US6885939B2 (en) * 2002-12-31 2005-04-26 Robert Bosch Gmbh System and method for advanced 3D visualization for mobile navigation units

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6182114B1 (en) 1998-01-09 2001-01-30 New York University Apparatus and method for realtime visualization using user-defined dynamic, multi-foveated images
WO2002095608A1 (en) 2001-05-23 2002-11-28 New York University Method and system for distributing foveated data in a network
US7075535B2 (en) 2003-03-05 2006-07-11 Sand Codex System and method for exact rendering in a zooming user interface
US7254271B2 (en) 2003-03-05 2007-08-07 Seadragon Software, Inc. Method for encoding and serving geospatial or other vector data as images
US7546419B2 (en) 2004-06-01 2009-06-09 Aguera Y Arcas Blaise Efficient data cache

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1810249A4

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1901189A2 (en) * 2006-09-14 2008-03-19 Ricoh Company, Ltd. Parts managing system, parts managing method, computer program and computer readable storage medium
EP1901189A3 (en) * 2006-09-14 2015-04-22 Ricoh Company, Ltd. Parts managing system, parts managing method, computer program and computer readable storage medium
US9569400B2 (en) 2012-11-21 2017-02-14 International Business Machines Corporation RDMA-optimized high-performance distributed cache
US9575927B2 (en) 2012-11-21 2017-02-21 International Business Machines Corporation RDMA-optimized high-performance distributed cache
WO2014137379A1 (en) * 2013-03-06 2014-09-12 Dell Products, L.P. System and method for managing storage system snapshots
US10346079B2 (en) 2013-03-06 2019-07-09 Dell Products, L.P. System and method for managing storage system snapshots
EP3005343A4 (en) * 2013-05-31 2017-02-01 Freedom Scientific Inc. Vector-based customizable pointing indicia
US9628747B2 (en) 2014-05-09 2017-04-18 Lyve Minds, Inc. Image scrolling on a photo sharing device display
WO2017153642A1 (en) * 2016-03-07 2017-09-14 Datexim Sas System for remote display of a medical image
FR3048524A1 (en) * 2016-03-07 2017-09-08 Datexim REMOTE DISPLAY SYSTEM OF MEDICAL IMAGE
US11544818B2 (en) * 2018-03-01 2023-01-03 Nvidia Corporation Enhancing high-resolution images with data from low-resolution images
WO2020007460A1 (en) * 2018-07-04 2020-01-09 Telefonaktiebolaget Lm Ericsson (Publ) Wireless device, computer server node, and methods thereof
WO2022026966A1 (en) * 2020-07-29 2022-02-03 Google Llc Oversmoothing progressive images
CN112445727A (en) * 2020-11-27 2021-03-05 鹏城实验室 Edge cache replacement method and device based on viewport characteristics
CN112445727B (en) * 2020-11-27 2023-08-25 鹏城实验室 Edge cache replacement method and device based on viewport characteristics
CN113031896A (en) * 2021-03-31 2021-06-25 卡莱特云科技股份有限公司 Text circulating rolling playing method, playing control device and computer equipment
CN113031896B (en) * 2021-03-31 2023-01-20 卡莱特云科技股份有限公司 Text circulating rolling playing method, playing control device and computer equipment
CN113568996A (en) * 2021-07-29 2021-10-29 西安恒歌数码科技有限责任公司 Multi-layer drop frame optimization method and system based on osgEarth
CN113568996B (en) * 2021-07-29 2023-05-16 西安恒歌数码科技有限责任公司 Multi-layer frame dropping optimization method and system based on osgEarth

Also Published As

Publication number Publication date
CN101147174B (en) 2011-06-08
EP1810249A4 (en) 2014-09-10
WO2006052390A9 (en) 2007-07-12
EP1810249A2 (en) 2007-07-25
WO2006052390A3 (en) 2007-03-01
JP2008517540A (en) 2008-05-22
JP4831071B2 (en) 2011-12-07
CN101147174A (en) 2008-03-19

Similar Documents

Publication Publication Date Title
WO2006052390A2 (en) System and method for managing communication and/or storage of image data
AU2006230233B2 (en) System and method for transferring web page data
CA2812008C (en) Methods and apparatus for navigating an image
WO2005089434A2 (en) Method for encoding and serving geospatial or other vector data as images
US11023094B2 (en) Collaborative, multi-user system for viewing, rendering, and editing 3D assets
US6631240B1 (en) Multiresolution video
US7554543B2 (en) System and method for exact rendering in a zooming user interface
Rusinkiewicz et al. Streaming QSplat: A viewer for networked visualization of large, dense models
JP4964386B2 (en) System and method for generating visual representations of graphical data and digital document processing
US7224361B2 (en) System and method for multiple node display
US7930434B2 (en) System and method for managing communication and/or storage of image data
US6618053B1 (en) Asynchronous multilevel texture pipeline
Potmesil Maps alive: viewing geospatial information on the WWW
US20110148886A1 (en) Method and system for receiving an indexed look-up table and viewing a vector animation sequence
US8234558B2 (en) Adaptive artwork for bandwidth- and/or memory-limited devices
CN101501664A (en) System and method for transferring web page data
CA2558833C (en) Methods and apparatus for navigating an image
Kooima et al. Planetary-scale terrain composition
JP2008535098A (en) System and method for transferring web page data
Rodríguez et al. Scalable exploration of highly detailed and annotated 3D models.
Triantafyllos et al. A VRML Terrain Visualization Approach
Bing et al. A DYNAMIC MULTI-RESOLUTION MODEL AND IT’S APPLICATION TO TERRAIN RENDERING

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200580043057.9

Country of ref document: CN

AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2005851225

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2007536990

Country of ref document: JP

NENP Non-entry into the national phase in:

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 3251/DELNP/2007

Country of ref document: IN

WWP Wipo information: published in national office

Ref document number: 2005851225

Country of ref document: EP