Before we can even start speculating about what kinds of neat things HoloLens will enable us to do, we need to know how exactly the device works and what its technical capabilities and limitations are. After doing a bit of research, gathering information from Microsoft's official statements, from hands-on reports and from images of the hardware, I realized that we already know quite a lot about it, so here's my summary. If you have any additional information, please comment!
1. HOW ARE THE "HOLOGRAPHIC" IMAGES PRODUCED?
This is probably the most fascinating, and also the most mysterious, part of the HoloLens hardware.
Microsoft has described very vividly, but also only very vaguely, how the virtual images are produced: a so-called "light engine", inside which light particles bounce around millions of times, emits light towards two separate lenses (one for each eye), which consist of three layers of glass in the three primary colors (blue, green, red) and have "micro-thin" corrugated grooves on them. The light hits those layers and finally enters the eye at specific angles, intensities and colors, producing an image on the eye's retina.
Some journalists guess that this is basically a technology that uses interference to reconstruct light fields, meaning that the light enters the eye at the same angles it would if it originated from an actual object in space (in that sense the term "holography" is not even that inappropriate here, since actual holography is based on a similar physical principle).
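The "corrugated grooves" sound like diffraction gratings, and standard grating physics would also explain why there are three separate color layers: the angle at which a grating redirects light depends on the wavelength, so each layer's groove spacing could be tuned to one primary color. Here's the textbook grating equation (d·sin θ = m·λ) with a purely illustrative groove pitch - the actual spacing in HoloLens is not public:

```python
import math

def exit_angle_deg(pitch_nm, wavelength_nm, order=1):
    """First-order diffraction angle from the grating equation d*sin(theta) = m*lambda."""
    s = order * wavelength_nm / pitch_nm
    if abs(s) > 1:
        return None  # no propagating diffraction order at this pitch/wavelength
    return math.degrees(math.asin(s))

# Illustrative pitch only -- the real groove spacing is unknown.
pitch = 1000  # nm
for color, wl in [("blue", 450), ("green", 530), ("red", 630)]:
    print(color, round(exit_angle_deg(pitch, wl), 1), "degrees")
```

Note how the same grating sends red light out at a steeper angle than blue - which is exactly why a single grating layer can't steer all three colors identically, and stacked per-color layers make sense.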
If that's how HoloLens works, it's significantly different from most 3D systems - truly "beyond screens and pixels", as Microsoft put it. Most 3D head-mounted displays, such as the Oculus Rift, send 2D images from 2D screens into the eyes at an angle that simulates an infinite distance between the image and the eye, with the 3D effect relying solely on the different perspectives portrayed on the 2D images for the left and the right eye, a.k.a. stereoscopy. If HoloLens actually recreates light fields, that would elevate the 3D experience to a whole new level by recreating not only the stereoscopic effect of real-life 3D vision but also the different optical focuses of light originating from different distances. That would make for a much more natural visual experience, and it would explain why those who got to try HoloLens described the virtual objects as almost indistinguishable from real ones.
Try this: hold your finger a few inches from your eyes and look at it. Then look at the wall behind it. You will notice that, as you focus on the wall in the distance, your finger not only doubles but also becomes blurry - that's because the focal distance for the finger is different from that for the wall. If you were viewing the same scene through an Oculus Rift or on a 3D TV, the finger would double, but it would stay just as sharp as the rest of the image. That unnatural, ubiquitous sharpness, combined with the unusual sensation of performing convergence without accommodation (adjusting the position of the eyes without adjusting the focus of the eyes' lenses), results in an unnatural visual experience - one that would appear even more jarring when combined with real-world vision in an augmented reality system such as HoloLens: real objects would be out of focus whenever you focus on virtual objects, and vice versa.
Regarding the quality of the images, reports say that the image is clear, but not "4k-like", a little "more grainy" than portrayed in the promotional videos. Microsoft mentioned that the lenses are "HD", which is interesting because with other devices such as the Oculus Rift it's not the lenses that determine the resolution, but the display. Again, this seems to indicate that HoloLens does not use conventional screen technology.
What do those lenses look like? In images of the device you can see two somewhat rectangular lenses behind the big dark visor, one for each eye (they basically look like normal frameless glasses). I assume those are the lenses that produce the image; their size matches the reported field of view. The "light engine" must be located somewhere above them. The big dark visor covering those lenses and almost the whole field of view probably makes the virtual objects appear more opaque by simply reducing the brightness of the real-world image, preventing bright real objects from shining through virtual images (according to testers, the virtual image is not perfectly opaque - though I don't think the prototypes used in those demos had a shaded visor). The special camera rig used during the on-stage demo had some sort of glasses mounted on its front; I guess those served the same purpose as the visor on the headset, though I could be wrong. There are buttons on the side of the device for manual "contrast" adjustment; I think those actually adjust the intensity of the virtual images to different lighting conditions, so the image doesn't look transparent in daylight or too bright in low light.
There are a bunch of other companies besides Microsoft that seem to be working on similar display technologies, one of which (Osterhout Design Group) Microsoft bought patents from last year. I couldn't find any information on how those other systems work, so if you know more about this, please comment!
2. WHAT SENSORS DOES THE DEVICE USE?
The device has 18 sensors built in.
It's safe to assume that some of those sensors are similar to those used in VR headsets and phones (things like accelerometer, gyroscope, magnetometer).
The device uses depth cameras that are based on Kinect technology, but with a wider field of sensing (120x120 degrees) and using "a fraction of the energy used by Kinect". I read somewhere that there are multiple cameras at the front and sides of the device, but on images of the device I can only see two pairs of cameras on the front side (probably two visual cameras and two depth cameras).
Speaking of visual cameras, they can be used to record and share the user's visual experience from a first person perspective (as demonstrated in the Skype application). Since there seem to be two cameras, roughly at interpupillary (eye-to-eye) distance, I wonder if HoloLens is capable of recording stereoscopic photos/videos.
Other sensors that we know of are: at least one microphone, and some sort of sensor that automatically measures the interpupillary distance (and maybe also the position of the eyes relative to the lenses) so the system can render the images appropriately. I wonder how that sensor works, and if it also allows for eye tracking functionality so the system knows where the user is looking.
3. HOW DOES IT TRACK THE USER'S POSITION?
HoloLens provides full positional and rotational tracking without any external markers or cameras. I assume the depth cameras (and/or visual cameras) detect movement relative to static objects surrounding the user, while other sensors such as the accelerometer, gyroscope and magnetometer provide additional information to improve the tracking. Any slight imperfection or lag in the tracking would ruin the merging of real world and virtual objects, and according to those who used the prototypes, tracking works flawlessly.
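A common way to combine those two kinds of information is a complementary filter: the gyroscope is smooth and fast but drifts over time, while a camera-derived estimate is drift-free but noisier and slower, so you blend them. This is only my sketch of the general idea, not Microsoft's actual algorithm:

```python
# Minimal complementary-filter sketch (a guess at the general sensor-fusion
# idea, not HoloLens' actual algorithm). Angles in degrees, rates in deg/s.
ALPHA = 0.98  # trust the gyro short-term, the camera long-term

def fuse(prev_angle, gyro_rate, camera_angle, dt):
    gyro_estimate = prev_angle + gyro_rate * dt  # integrate the gyro
    return ALPHA * gyro_estimate + (1 - ALPHA) * camera_angle

# Simulate: true heading is 0 deg, but the gyro has a constant +0.5 deg/s bias.
angle = 0.0
for _ in range(1000):  # 10 seconds at 100 Hz
    angle = fuse(angle, gyro_rate=0.5, camera_angle=0.0, dt=0.01)
print(round(angle, 2))  # the camera term keeps the bias from accumulating
```

With the camera correction the estimate settles near a small constant offset; integrating the gyro alone for the same 10 seconds would have drifted a full 5 degrees.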
4. HOW DOES IT RECOGNIZE THE SPATIAL ENVIRONMENT?
To me this is the second most fascinating thing about the device after the visual system - how does it scan the environment so accurately and intelligently that virtual objects fit so perfectly into the real environment? According to hands-on reports, virtual objects can sit perfectly flat on furniture, and the device correctly recognizes the position and shape of real-world objects that sit "in front of" virtual objects, even as the user moves around, changing the visual perspective rapidly.
I think this is only possible with a combination of very high definition, very low latency depth cameras, and software that is intelligent enough to translate the depth information into a complete spatial map and to recognize shapes and objects.
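To give a feel for the simplest step of such a pipeline: one classic trick for finding a dominant horizontal surface (a table top, the floor) in a depth point cloud is to histogram point heights and pick the fullest bin. This toy example is purely illustrative - certainly not HoloLens' actual spatial-mapping pipeline:

```python
import random
from collections import Counter

# Toy surface detection: histogram the heights (z) of depth points and pick
# the dominant bin. Purely illustrative, not HoloLens' actual pipeline.
random.seed(0)
# A flat table top at z ~= 0.76 m (with a little sensor noise)...
points = [(random.uniform(0, 2), random.uniform(0, 2),
           0.76 + random.gauss(0, 0.005)) for _ in range(500)]
# ...plus scattered background clutter at random heights.
points += [(random.uniform(0, 2), random.uniform(0, 2),
            random.uniform(0, 2)) for _ in range(200)]

BIN = 0.02  # 2 cm height bins
histogram = Counter(round(z / BIN) for _, _, z in points)
best_bin, count = histogram.most_common(1)[0]
print(f"dominant surface at z ~= {best_bin * BIN:.2f} m ({count} points)")
```

Even this naive approach recovers the table height to within a couple of centimeters; the real system additionally has to find walls at arbitrary orientations, segment objects, and update everything in real time as the user moves.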
5. WHAT KIND OF AUDIO SYSTEM DOES IT HAVE?
There are speakers above both ears. There's a good reason the device doesn't use earphones: earphones would block out real-world sounds, which is not how augmented reality is supposed to work. It's not clear whether the speakers are traditional ones or ones that transmit sound through the skull (which would prevent other people from hearing what the user is hearing). There's an adjustable red element on each side of the device; I wonder if those might be for the speakers or for other ergonomic adjustments.
The device has "spatial sound", which means that sound is adjusted for the left and the right ear so that sounds appear to originate from certain directions. There are different kinds of 3D sound solutions out there - simple ones only allow for horizontal localization, while more advanced ones can make things sound as if they come from above or below you. There's no information yet on what kind of system HoloLens uses.
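The main cue for horizontal localization is the interaural time difference: sound from the side reaches the far ear a fraction of a millisecond later than the near ear. Woodworth's formula is the standard textbook approximation for this - again, just general acoustics, nothing confirmed about HoloLens' implementation:

```python
import math

# Woodworth's interaural-time-difference approximation for a distant source.
# Standard textbook acoustics -- not a confirmed detail of HoloLens' audio.
HEAD_RADIUS = 0.0875    # meters, average adult head
SPEED_OF_SOUND = 343.0  # m/s in air

def itd_microseconds(azimuth_deg):
    """Extra travel time to the far ear for a source at the given azimuth."""
    a = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (a + math.sin(a)) * 1e6

for az in (0, 45, 90):
    print(f"{az:>2} deg off-center: ITD ~ {itd_microseconds(az):.0f} microseconds")
```

The maximum delay (source directly to the side) is only about 0.65 ms, yet our hearing resolves it easily - which is why delaying one ear's signal suffices for left/right placement, while up/down localization needs the more elaborate spectral filtering (HRTFs) mentioned above.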
There are volume buttons on the side of the device.
6. WHAT KIND OF COMPUTER IS DRIVING THE DEVICE?
The device is basically a wearable Windows 10 computer with three "high end" processors built in - a CPU, a GPU, and a separate processor which, according to Microsoft, handles all the information coming from the various sensors and uses that data to generate information about position, the spatial environment, voice input, gestures, and so on. (Even just the fact that this device has three powerful processors inside - besides the futuristic visual system - makes pricing in the Oculus Rift range very unlikely. This device is definitely going to be quite expensive.)
7. WHAT DO WE KNOW ABOUT ERGONOMICS?
According to Microsoft, what was demoed on stage is the actual industrial design and will weigh about 400g.
Heat is vented through the sides of the headset so the device doesn't feel hot on the head.
8. WHAT KINDS OF COOL THINGS ARE PROVEN TO BE POSSIBLE WITH HOLOLENS?
• Putting virtual objects onto real objects (on tables, walls, …)
• Changing the texture, appearance and lighting of real objects
• Making virtual holes into real objects or making real objects disappear
• Putting virtual screens into the real world
• Moving a cursor or virtual objects by head movement
• Initiating actions by doing hand gestures or voice commands
• Replacing the complete environment with a virtual one
• Integrating real objects into a virtual environment
• Moving a cursor from a real screen into the virtual world using a mouse
• Seeing avatars of other users within a real or virtual environment (body tracking presumably requires external hardware such as Kinect)
• Letting other people annotate your real environment
Sources (I can't post links yet, so please google them):
Microsoft's presentation and on-stage demo from January 21st;
Hands-on reports from Wired, The Verge, Neowin, Ars Technica and Windows Central;
Article on Wired titled "Microsoft in the age of Satya Nadella";
Article on TechCrunch on Microsoft's acquisition of patents from the Osterhout Design Group