Neural Network Models of Human Gloss Perception
The human visual system can identify materials and their properties at a quick glance. One property that contributes to a material’s appearance is gloss – the physical tendency of a surface to reflect light in a single direction, causing the appearance of sharp, distinct reflections. The glossiness of a material allows us to infer further material ... properties, such as the freshness of food, the cleanliness of a surface, and whether the floor is wet and slippery. Despite its importance, human perception of gloss remains poorly understood. Artificial neural networks have been very successful over the last decade as powerful tools in computer vision and have garnered substantial interest from human vision science. Using a convolutional architecture and machine learning, these models can extract useful features from large datasets containing millions of images in order to recognize objects, materials, and faces, and to perform many other computer vision tasks. However, despite their power and versatility, or perhaps because of these properties, the application and interpretation of neural networks in human vision science remain challenging and complex, and new methods are still emerging. This thesis has two aims: to investigate human perception of gloss, and to investigate the application and usefulness of artificial neural networks in human vision science. The first study explores different feed-forward architectures of convolutional neural networks (CNNs) to replicate human responses in discriminating between high-gloss and low-gloss textured materials. We found that CNNs of different depths may correlate well with human responses, but that networks with 3-5 layers most often respond similarly to humans. We also trained Deep Convolutional Generative Adversarial Networks (DCGANs) of different depths to recreate images showing low- and high-gloss materials and showed these to human observers.
Observers were able to tell low-gloss from high-gloss materials apart in images created by DCGANs with two or more layers, while DCGANs with three layers produced images that were as discriminable as renderings. Our findings show that CNNs of relatively shallow depths can replicate successes and failures of gloss classification that are typical for humans and can generate images that convincingly depict glossy materials. In the second study, we investigated human perception of gloss highlights – sharp and bright reflections on a glossy surface. Humans classified individual pixels in grayscale images of textured glossy surfaces as containing a highlight or surface texture. We trained a neural network to identify pixels containing highlights in a large set of such images. In a second fitting stage, we pruned the network to find a subnetwork that responds more like humans. We found that we can indeed find pruned configurations that explain the human data better than the full network. Further analysis suggests that representations in the network mostly resemble simple, directly image-computable features, while only some show similarity to certain complex geometric features. The network appears to be only weakly sensitive to violations of photo-geometric constraints. Taken together, our findings support a view of gloss perception as a process of mid-level vision. We find that relatively shallow neural network architectures of 3-5 layers are sufficient to model human gloss perception. We also find that our model for highlight detection learned features and representations that resemble both complex geometric predictors and simpler, directly image-computable features. In both projects, our most human-like models appear to be only weakly sensitive to photo-geometric constraints.