It indicates that most of the processing work - identifying people and other objects in view, and deciding what information to show about them - would likely be carried out by remote computer servers in order to keep the equipment slimline.
BBC: Microsoft patent