The project was implemented as part of the Machine Perception and Learning course. Its goal is to develop a neural network that predicts points of interest on diagrams under task-specific conditions. The model architecture is based on the Task-Driven Webpage Saliency paper [1].
Multiple datasets are used to train the model.
The task-free part of the model is trained on the joint dataset of What Makes a Visualization Memorable? [2] and Beyond Memorability: Visualization Recognition and Recall [3].
The task-specific part of the model is trained on the dataset from Exploring Visual Attention and Saliency Modeling for Task-Based Visual Analysis [4].
The model uses the fully convolutional network (FCN) encoder-decoder architecture for semantic image segmentation [5] in both the task-free and task-specific branches.
An attempt was also made to combine the Xception [6] encoder with the decoder from Predicting Visual Importance Across Graphic Design Types [7]. This approach produced good results for task-free saliency prediction, but it still needs to be adapted for the task-specific part.
A comparison between the ground truth (left image) and the model prediction (right image) is presented below. As a similarity measure, the Pearson correlation coefficient between the two images is computed.
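The Pearson correlation coefficient between two saliency maps can be computed as sketched below (a minimal pure-Python illustration; the function name and the flattened-list representation of the maps are ours, not part of the project code):

```python
import math

def pearson_cc(pred, gt):
    """Pearson correlation coefficient between two saliency maps,
    given as equal-length flattened sequences of pixel intensities."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gt) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gt))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    std_g = math.sqrt(sum((g - mean_g) ** 2 for g in gt))
    return cov / (std_p * std_g)

# Identical maps correlate perfectly:
a = [0.1, 0.5, 0.9, 0.3]
print(round(pearson_cc(a, a), 6))  # → 1.0
```

In practice the maps would be NumPy arrays and the same quantity can be obtained with `scipy.stats.pearsonr` on the flattened images; a value near 1 indicates the predicted map closely matches the ground-truth attention distribution.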
[1] Zheng, Quanlong & Jiao, Jianbo & Cao, Ying & Lau, Rynson. (2018). Task-Driven Webpage Saliency. In: Computer Vision – ECCV 2018 (15th European Conference, Munich, Germany, September 8–14, 2018), Proceedings, Part XIV. 10.1007/978-3-030-01264-9_18.
[2] Borkin, Michelle & Vo, Azalea & Bylinskii, Zoya & Isola, Phillip & Sunkavalli, Shashank & Oliva, Aude & Pfister, Hanspeter. (2013). What Makes a Visualization Memorable? IEEE Transactions on Visualization and Computer Graphics. 19. 2306–2315. 10.1109/TVCG.2013.234.
[3] Borkin, Michelle & Bylinskii, Zoya & Kim, Nam & Bainbridge, Constance & Yeh, Chelsea & Borkin, Daniel & Pfister, Hanspeter & Oliva, Aude. (2015). Beyond Memorability: Visualization Recognition and Recall. IEEE Transactions on Visualization and Computer Graphics. 22. 10.1109/TVCG.2015.2467732.
[4] Polatsek, Patrik & Waldner, Manuela & Viola, Ivan & Kapec, Peter & Benesova, Wanda. (2018). Exploring Visual Attention and Saliency Modeling for Task-Based Visual Analysis. Computers & Graphics. 72. 10.1016/j.cag.2018.01.010.
[5] Long, Jonathan & Shelhamer, Evan & Darrell, Trevor. (2014). Fully Convolutional Networks for Semantic Segmentation. arXiv preprint.
[6] Chollet, Francois. (2017). Xception: Deep Learning with Depthwise Separable Convolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1800–1807. 10.1109/CVPR.2017.195.
[7] Fosco, Camilo & Casser, Vincent & Bedi, Amish & O'Donovan, Peter & Hertzmann, Aaron & Bylinskii, Zoya. (2020). Predicting Visual Importance Across Graphic Design Types.