Your "Flamingo" is My "Bird": Fine-Grained, or Not

Dongliang Chang1      Kaiyue Pang2      Yixiao Zheng1      Zhanyu Ma1*      Yi-Zhe Song2      Jun Guo1

1Beijing University of Posts and Telecommunications, CN       2SketchX, CVSSP, University of Surrey, UK

CVPR 2021

arXiv
     
Code


Figure 1. Definition of what is fine-grained is subjective. Your “flamingo” is my “bird”.


Introduction

Whether what you see in Figure 1 is a "flamingo" or a "bird" is the question we ask in this paper. While fine-grained visual classification (FGVC) strives to arrive at the former, for the majority of us non-experts just "bird" would probably suffice. The real question is therefore: how can we cater for different definitions of fine-grained under divergent levels of expertise? To that end, we re-envisage the traditional setting of FGVC, moving from single-label classification to top-down traversal of a pre-defined coarse-to-fine label hierarchy, so that our answer becomes "bird"-->"Phoenicopteriformes"-->"Phoenicopteridae"-->"flamingo". To approach this new problem, we first conduct a comprehensive human study, which confirms that most participants prefer multi-granularity labels, regardless of whether they consider themselves experts. We then discover the key intuition that coarse-level label prediction harms fine-grained feature learning, yet fine-level features improve the learning of the coarse-level classifier. This discovery enables us to design a very simple yet surprisingly effective solution to our new problem, where we (i) leverage level-specific classification heads to disentangle coarse-level features from fine-grained ones, and (ii) allow finer-grained features to participate in coarser-grained label predictions, which in turn helps with better disentanglement. Experiments show that our method achieves superior performance in the new FGVC setting, and also performs better than the state of the art on the traditional single-label FGVC problem. Thanks to its simplicity, our method can be easily implemented on top of any existing FGVC framework and is parameter-free.
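The top-down traversal described above can be illustrated with a small lookup table. This is only a sketch: the `PARENT` table and the helper names `label_path` and `answer_for` are hypothetical, and only the flamingo path itself is taken from the text.

```python
# Hypothetical child -> parent table. Only the flamingo path comes from the text.
PARENT = {
    "flamingo": "Phoenicopteridae",
    "Phoenicopteridae": "Phoenicopteriformes",
    "Phoenicopteriformes": "bird",
}


def label_path(fine_label, parent=PARENT):
    """Return the coarse-to-fine label path that ends at fine_label."""
    path = [fine_label]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]  # reverse so the coarsest label comes first


def answer_for(fine_label, depth):
    """Return the label shown to a user who wants `depth` levels of detail."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    path = label_path(fine_label)
    # A depth beyond the hierarchy simply returns the finest label.
    return path[min(depth, len(path)) - 1]


if __name__ == "__main__":
    print(" --> ".join(label_path("flamingo")))
    print(answer_for("flamingo", 1))  # coarsest answer
    print(answer_for("flamingo", 4))  # finest answer
```

A non-expert would stop at depth 1, while an expert would traverse the full path.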

Human Study


Figure 2. Human study on the CUB-200-2011 bird dataset. Order, family, and species form a three-level coarse-to-fine label hierarchy for a bird image. A higher group id represents a group of people with better domain knowledge of birds, with group 5 interpreted as domain experts. (a) Human preference between single and multiple labels. (b) Impact of human familiarity with birds on single-label choice. (c) Impact of human familiarity with birds on multi-label choice.

Cooperation or Confrontation?

To explore the transfer effect in the joint learning of multi-granularity labels, we design an image classification task for predicting two labels at different granularities.


Figure 3. Joint learning of two-granularity labels under different weighting strategies on the CUB-200-2011 bird dataset. (a) x-axis: β value that controls the relative importance of the fine-grained classifier; y-axis: performance of the coarse-grained classifier. (b) x-axis: α value that controls the relative importance of the coarse-grained classifier; y-axis: performance of the fine-grained classifier.
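One way to read the weighting in Figure 3 is as a weighted sum of two classification losses. The exact objective is not given on this page, so the form below (α times the coarse loss plus β times the fine loss, with softmax cross-entropy for both) is an assumption, and `cross_entropy` and `joint_loss` are illustrative names.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def joint_loss(coarse_logits, coarse_target, fine_logits, fine_target,
               alpha=1.0, beta=1.0):
    """Assumed objective: alpha * coarse loss + beta * fine loss.

    alpha weights the coarse-grained classifier, beta the fine-grained one.
    """
    coarse = cross_entropy(coarse_logits, coarse_target)
    fine = cross_entropy(fine_logits, fine_target)
    return alpha * coarse + beta * fine


if __name__ == "__main__":
    coarse_logits = [2.0, 0.5]        # e.g. two orders
    fine_logits = [0.1, 1.5, 0.3]     # e.g. three species
    # Sweep beta as in panel (a) while keeping the coarse weight fixed.
    for beta in (0.0, 0.5, 1.0):
        print(beta, joint_loss(coarse_logits, 0, fine_logits, 1, beta=beta))
```

Setting `beta=0` reduces the objective to coarse-only training, and `alpha=0` to fine-only training, which gives the two ends of each sweep.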

Our Solution


Figure 4. A schematic illustration of our FGVC model with multi-granularity label output. BP: backpropagation.
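Figure 4 is not reproduced here, so the following is only a rough sketch of the routing implied by points (i) and (ii) in the Introduction: each level has its own head, and the head for a coarser level also receives the finer-level features. Splitting a single feature vector into consecutive chunks, the chunk sizes, and the names `split_features` and `head_inputs` are all assumptions made for illustration.

```python
def split_features(features, sizes):
    """Split one feature vector into consecutive level-specific chunks.

    `sizes` lists the chunk lengths from the coarsest to the finest level.
    """
    if sum(sizes) != len(features):
        raise ValueError("chunk sizes must add up to the feature length")
    chunks, start = [], 0
    for size in sizes:
        chunks.append(features[start:start + size])
        start += size
    return chunks


def head_inputs(chunks):
    """Build the input of every level-specific head.

    `chunks` is ordered coarse to fine. Each head sees its own chunk plus
    all finer chunks; the finest head sees only its own chunk.
    """
    inputs = []
    for level, own in enumerate(chunks):
        finer = [x for chunk in chunks[level + 1:] for x in chunk]
        # Assumption: in an autograd framework, `finer` would be detached
        # here so that coarse-level losses do not backpropagate (BP) into
        # the finer-level features.
        inputs.append(own + finer)
    return inputs


if __name__ == "__main__":
    features = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    order, family, species = head_inputs(split_features(features, [2, 2, 2]))
    print(len(order), len(family), len(species))
```

The routing itself adds no learnable weights beyond the classification heads, which is consistent with the parameter-free claim above, though the actual implementation may differ in detail.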

Results


Table 1. Comparisons with different baselines on the FGVC task under the multi-granularity label setting.


Table 2. Performance comparisons on the traditional FGVC setting with single fine-grained label output.

Visualization


Figure 5. We highlight the supporting visual regions for classifiers at different granularities for the two compared models. Order, Family, and Species denote the three coarse-to-fine classifiers trained on the CUB-200-2011 bird dataset.

Bibtex

If this work is useful for you, please cite it:
@inproceedings{dongliang2021flamingo,
    title={Your "Flamingo" is My "Bird": Fine-Grained, or Not},
    author={Dongliang Chang and Kaiyue Pang and Yixiao Zheng and Zhanyu Ma and Yi-Zhe Song and Jun Guo},
    booktitle={CVPR},
    year={2021}
}

Proudly created by Dongliang Chang @ BUPT
2021.6