Notes of Ken Lin: Paper Notes: Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning

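Preface

The "Paper Notes" series is just my attempt to organize the ideas in a paper, or other people's takes on it, to help me digest the material and review it later. Sources are always cited; if anything infringes or is wrong, please let me know by email: jason.eed05@gmail.com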

Paper link

https://arxiv.org/pdf/1803.05268.pdf

Abstract

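Task: VQA

Given an image and a question about it, produce the answer.

Example: "What color is the cube to the right of the large metal sphere?" Here the model has to identify which object is the large metal sphere, understand what "right" means, find a cube within the region to its right, and return that cube's color.

This example shows that, to answer reasoning questions of arbitrary length, the model's behavior has to be compositional: it must be able to break a question down into basic sub-questions.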
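To make the decomposition concrete, here is a minimal symbolic sketch of the example question as a chain of basic steps. It is only a toy analogue: the scene, the `x` coordinate, and the helper names (`attend`, `relate_right`, `query`) are hypothetical and do not come from the paper, whose modules operate on image features rather than object lists.

```python
# Toy scene: each object is a dict; "x" is a hypothetical horizontal position.
scene = [
    {"shape": "sphere", "size": "large", "material": "metal", "color": "gray", "x": 2},
    {"shape": "cube", "size": "small", "material": "rubber", "color": "red", "x": 5},
    {"shape": "cylinder", "size": "small", "material": "metal", "color": "blue", "x": 1},
]


def attend(objects, key, value):
    """Keep only the objects whose attribute `key` equals `value`."""
    return [o for o in objects if o[key] == value]


def relate_right(objects, anchors):
    """Keep the objects lying to the right of every anchor object."""
    return [o for o in objects if all(o["x"] > a["x"] for a in anchors)]


def query(objects, key):
    """Return attribute `key` of the single remaining object."""
    assert len(objects) == 1, "the program should isolate exactly one object"
    return objects[0][key]


def answer(objects):
    # "the large metal sphere": three attribute filters applied in sequence
    anchor = attend(attend(attend(objects, "size", "large"), "material", "metal"), "shape", "sphere")
    # "to the right of ...": spatial relation relative to the anchor
    right = relate_right(objects, anchor)
    # "the cube ...": one more attribute filter
    cubes = attend(right, "shape", "cube")
    # "What color is ...?": read out the requested attribute
    return query(cubes, "color")


print(answer(scene))
```

A longer question just chains more of the same basic steps, which is the sense of "compositional" used above.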

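Current state

To make it easy to analyze how a model performs on each basic sub-question, models usually adopt a modular network, which provides a certain degree of "transparency" (making the results interpretable), but performance often drops on complex visual reasoning. Conversely, earlier work that focuses on raising performance loses model transparency.

This paper aims to close the gap between performance and interpretability in visual reasoning.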

Transparent

Here, transparency refers to the ability to examine the intermediate outputs of each module and understand their behavior at a high level.

Related Work

Johnson et al, ICCV 2017

Performance is good, but their modules are not easily interpretable, because they process high-dimensional features throughout their entire network.

In addition, their attention is visualized with a gradient-based method. Because gradients flow backward through the network, the visual features being attended to (visualized) are influenced by downstream modules rather than each decomposed "basic question" being independent, so the attention masks are unreliable.

Transparent by Design

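Instead of producing high-dimensional feature maps, TbD-net passes only one-dimensional (single-channel, $1 \times H \times W$) attention masks between its modules.

Figure: the types of basic-question modules and how they operate. Most modules use the "stem" (image features) directly and on their own, which makes them less affected by other modules.

The stem is obtained by extracting image features with ResNet-101 and then passing them through convolution.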
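The idea of passing $1 \times H \times W$ masks between modules can be sketched as follows. This is a simplified illustration in plain Python, not the paper's implementation: the attention module is collapsed into a single 1x1 convolution plus sigmoid (an assumption made for brevity), the weights are random, and the minimum/maximum combination in `and_module` / `or_module` is my recollection of how such masks are usually intersected and united rather than something stated in these notes.

```python
import math
import random


def attention_module(stem, attn_in, weights, bias):
    """Map stem features (C x H x W) and an input mask (H x W) to a new H x W mask.

    The input mask gates the stem, a per-pixel weighted sum over channels
    (a 1x1 convolution) scores each location, and a sigmoid maps it to (0, 1).
    """
    channels, height, width = len(stem), len(stem[0]), len(stem[0][0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            score = bias + sum(
                weights[c] * stem[c][y][x] * attn_in[y][x] for c in range(channels)
            )
            row.append(1.0 / (1.0 + math.exp(-score)))
        out.append(row)
    return out


def and_module(mask_a, mask_b):
    """Intersect two masks with an elementwise minimum (assumed combination rule)."""
    return [[min(p, q) for p, q in zip(ra, rb)] for ra, rb in zip(mask_a, mask_b)]


def or_module(mask_a, mask_b):
    """Unite two masks with an elementwise maximum (assumed combination rule)."""
    return [[max(p, q) for p, q in zip(ra, rb)] for ra, rb in zip(mask_a, mask_b)]


rng = random.Random(0)
C, H, W = 4, 3, 3  # hypothetical sizes, far smaller than real stem features
stem = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
ones = [[1.0] * W for _ in range(H)]  # first module in a chain attends everywhere
w_metal = [rng.uniform(-1, 1) for _ in range(C)]
w_large = [rng.uniform(-1, 1) for _ in range(C)]

metal_mask = attention_module(stem, ones, w_metal, 0.0)
large_metal_mask = attention_module(stem, metal_mask, w_large, 0.0)
print(len(large_metal_mask), len(large_metal_mask[0]))  # every mask stays H x W
```

Because each intermediate result is a single-channel map over the image, it can be displayed directly as a heat map, which is what makes the chain inspectable step by step.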

Performance Improvements

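The paper says that an initial model built straight from the simple idea above already worked very well, and that further tuning pushed it to state-of-the-art:

Regularization:

Applying regularization to penalize the attention mask outputs makes the resulting masks "stricter".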
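A penalty on the mask outputs can be sketched as an extra loss term. The notes above do not name the regularizer, so this assumes an L1 penalty (sum of absolute mask values); the `strength` value and the function names are hypothetical.

```python
def l1_attention_penalty(masks, strength):
    """Sum of absolute values over all intermediate H x W masks, scaled by `strength`."""
    total = sum(abs(v) for mask in masks for row in mask for v in row)
    return strength * total


def regularized_loss(task_loss, masks, strength=1e-3):
    """Task loss plus the assumed L1 penalty on the attention masks."""
    return task_loss + l1_attention_penalty(masks, strength)


# Two masks carrying the same peak location: one diffuse, one sharp.
diffuse = [[[0.5, 0.5], [0.5, 0.5]]]
sharp = [[[1.0, 0.0], [0.0, 0.0]]]

# The diffuse mask pays a larger penalty, so training is nudged toward
# masks that put weight only where it is needed.
print(l1_attention_penalty(diffuse, 0.1) > l1_attention_penalty(sharp, 0.1))
```

Under this assumption, "stricter" means that background regions are pushed toward zero and only the relevant objects keep high attention.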

Higher spatial resolution

Going from 14x14 to 28x28 improves the results.
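As a rough illustration of what the change means, the side length of a feature map is the input side divided by the network's total stride. The 224-pixel input and the strides of 16 and 8 below are assumptions chosen to reproduce the 14x14 and 28x28 figures; they are not stated in these notes.

```python
def feature_map_side(input_side, total_stride):
    """Side length of the feature map for a square input and a given total stride."""
    if input_side % total_stride != 0:
        raise ValueError("input side must be divisible by the total stride")
    return input_side // total_stride


for stride in (16, 8):  # hypothetical strides of two feature-extraction depths
    side = feature_map_side(224, stride)
    print(f"stride {stride}: {side}x{side} map, {side * side} spatial cells")
```

Under these assumptions the finer map has four times as many spatial cells, so each attention value covers a smaller patch of the image.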

CLEVR CoGenT

CoGenT is a sub-task of CLEVR; its full name is Compositional Generalization Test.

See https://bigdatafinance.tw/index.php/data-visualization/low-frequency-stock/79-2015-06-24-14-05-24/tech/558-cvpr-2018-mit

The conclusion is that TbD-net performs well on this test of "generalization".

Finally, it is worth rereading https://bigdatafinance.tw/index.php/data-visualization/low-frequency-stock/79-2015-06-24-14-05-24/tech/558-cvpr-2018-mit for a better understanding of the ideas in this paper.
