쉬엄쉬엄블로그

Boostcamp AI Tech 4th Cohort

(Deep Learning) Computer Vision Applications

쉬엄쉬엄블로그 2023. 5. 31. 12:05

Text shown in this color in the original post is a side note and can be ignored.

Semantic Segmentation

Semantic Segmentation is the task of classifying every pixel of an image into an object class (pixel-wise classification).
The approaches below were developed for the Semantic Segmentation task.

Fully Convolutional Network

Number of parameters
- Left (ordinary CNN) : 4 x 4 x 16 x 10 = 2,560
- Right (fully convolutional network) : 4 x 4 x 16 x 10 = 2,560
Converting an ordinary CNN into a fully convolutional network is called convolutionalization. The parameter count stays the same; only the layer type changes.
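The equal parameter counts can be checked with a small sketch. It assumes the shapes from the count above (a 4 x 4 feature map with 16 channels and 10 outputs), ignores biases, and uses random weights that are purely illustrative. The names `dense_head` and `conv_head` are hypothetical helpers, not part of any library.

```python
import random

# Sizes from the parameter count above: a 4x4 feature map with 16 channels
# feeding 10 outputs. Biases are ignored, as in the count above.
H, W, C_IN, N_OUT = 4, 4, 16, 10

fc_params = (H * W * C_IN) * N_OUT      # dense layer on the flattened map
conv_params = H * W * C_IN * N_OUT      # N_OUT filters of shape C_IN x H x W

random.seed(0)
# Hypothetical random weights, indexed [output][channel][row][col].
weights = [[[[random.uniform(-1, 1) for _ in range(W)]
             for _ in range(H)] for _ in range(C_IN)] for _ in range(N_OUT)]
feature = [[[random.uniform(-1, 1) for _ in range(W)]
            for _ in range(H)] for _ in range(C_IN)]


def flatten(volume):
    """Flatten a [channel][row][col] volume in channel-major order."""
    return [v for channel in volume for row in channel for v in row]


def dense_head(x, w):
    """Fully connected layer: one dot product per output unit."""
    flat_x = flatten(x)
    return [sum(a * b for a, b in zip(flat_x, flatten(unit))) for unit in w]


def conv_head(x, w):
    """'Valid' convolution with an H x W kernel on an H x W map.

    The kernel fits at exactly one position, so each filter yields one value.
    """
    return [sum(x[c][i][j] * filt[c][i][j]
                for c in range(C_IN) for i in range(H) for j in range(W))
            for filt in w]


dense_out = dense_head(feature, weights)
conv_out = conv_head(feature, weights)
print(fc_params, conv_params)  # 2560 2560
print(max(abs(a - b) for a, b in zip(dense_out, conv_out)) < 1e-9)  # True
```

Both heads reuse the same weights, so on a 4 x 4 input they produce the same 10 values; the convolutional form simply does not require the input to be exactly 4 x 4.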
Transforming fully connected layers into convolution layers enables a classification net to output a heat map.
While an FCN can run with inputs of any size, the output dimensions are typically reduced by subsampling.
So we need a way to connect the coarse output to the dense pixels.
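A minimal single-channel sketch of these two points, under stated assumptions: the 4 x 4 kernel of ones stands in for a converted fully connected layer, and the stride of 2 stands in for subsampling. `conv2d_valid` is a hypothetical helper written for this illustration.

```python
def conv2d_valid(image, kernel, stride=1):
    """Single-channel 'valid' cross-correlation with the given stride."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(image) - kh + 1, stride):
        row = []
        for j in range(0, len(image[0]) - kw + 1, stride):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out


# Illustrative weights and inputs (assumptions, not values from the post).
kernel = [[1.0] * 4 for _ in range(4)]   # plays the role of the converted FC layer
small = [[1.0] * 4 for _ in range(4)]    # 4 x 4 input: the kernel fits once
large = [[1.0] * 10 for _ in range(8)]   # 8 x 10 input: the kernel slides

score = conv2d_valid(small, kernel)
heat_map = conv2d_valid(large, kernel, stride=2)

print(len(score), len(score[0]))        # 1 1
print(len(heat_map), len(heat_map[0]))  # 3 4
```

The larger input yields a 3 x 4 grid of scores instead of a single score, but that grid is much coarser than the 8 x 10 input, which is why an upsampling step is needed afterwards.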

Deconvolution (conv transpose)


Figure: Convolution vs. Deconvolution
Source: https://github.com/vdumoulin/conv_arithmetic
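Deconvolution is not the strict mathematical inverse of Convolution, but it has the same number of parameters and maps a Convolution's output shape back to its input shape, so in terms of shapes it can be thought of as the reverse operation.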
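The shape relationship can be sketched with the standard output-size formulas for a convolution and a transposed convolution (no dilation or output padding assumed). The 1D `conv_transpose_1d` helper below is a hypothetical illustration of the "scatter and add" view of deconvolution, not code from the lecture.

```python
def conv_out_size(n, k, stride=1, pad=0):
    """Output length of a convolution over an input of length n."""
    return (n + 2 * pad - k) // stride + 1


def deconv_out_size(n, k, stride=1, pad=0):
    """Output length of a transposed convolution over an input of length n."""
    return (n - 1) * stride - 2 * pad + k


def conv_transpose_1d(x, kernel, stride=1):
    """1D transposed convolution (no padding): each input value scatters
    a scaled copy of the kernel into the output, and overlaps are summed."""
    out = [0.0] * deconv_out_size(len(x), len(kernel), stride)
    for i, v in enumerate(x):
        for j, w in enumerate(kernel):
            out[i * stride + j] += v * w
    return out


print(conv_out_size(4, 3))                      # 2
print(deconv_out_size(2, 3))                    # 4
print(conv_out_size(5, 3, stride=2, pad=1))     # 3
print(deconv_out_size(3, 3, stride=2, pad=1))   # 5
print(conv_transpose_1d([1.0, 2.0], [1.0, 1.0, 1.0]))  # [1.0, 3.0, 3.0, 2.0]
```

The kernel has the same size in both directions, which is why the parameter count matches. The values, however, are not recovered: the transposed convolution restores the shape only.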

Results

Detection

R-CNN

How R-CNN works

(1) Takes an input image.
(2) Extracts around 2,000 region proposals (using Selective Search).
(3) Computes features for each proposal (using AlexNet).
(4) Classifies each proposal with linear SVMs.

Source: https://arxiv.org/pdf/1311.2524.pdf

SPPNet

In R-CNN, the number of crops/warps is usually over 2,000, meaning that the CNN must run more than 2,000 times per image (59 s/image on CPU).
In SPPNet, however, the CNN runs only once per image, which makes it faster.
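The difference in the number of CNN runs can be illustrated with a toy sketch. Everything here is a stand-in: `CountingBackbone` is a hypothetical identity "CNN" that only counts forward passes, and the proposals are made-up boxes. In a real SPPNet the boxes would also have to be mapped from image coordinates to feature-map coordinates, which this sketch skips.

```python
class CountingBackbone:
    """Stand-in for a CNN: returns its input and counts forward passes."""

    def __init__(self):
        self.calls = 0

    def __call__(self, image):
        self.calls += 1
        return image  # identity "feature map", for illustration only


def crop(grid, box):
    """Crop a 2D list with box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in grid[y0:y1]]


def rcnn_style(image, proposals, backbone):
    """Crop every proposal from the image, then run the CNN on each crop."""
    return [backbone(crop(image, box)) for box in proposals]


def sppnet_style(image, proposals, backbone):
    """Run the CNN once, then crop every proposal from the feature map."""
    feature_map = backbone(image)
    return [crop(feature_map, box) for box in proposals]


image = [[r * 100 + c for c in range(64)] for r in range(64)]
# 2,000 made-up proposals of size 16 x 16.
proposals = [(i % 48, (i * 7) % 48, i % 48 + 16, (i * 7) % 48 + 16)
             for i in range(2000)]

rcnn_backbone, spp_backbone = CountingBackbone(), CountingBackbone()
rcnn_features = rcnn_style(image, proposals, rcnn_backbone)
spp_features = sppnet_style(image, proposals, spp_backbone)

print(rcnn_backbone.calls, spp_backbone.calls)  # 2000 1
```

With the identity backbone both variants return the same crops; the point of the sketch is only the call count, which is where the speed difference comes from.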

Fast R-CNN

Takes an input image and a set of bounding boxes.
Generates a convolutional feature map.
For each region, gets a fixed-length feature from ROI pooling.
Produces two outputs: a class and a bounding-box regressor.
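The ROI pooling step can be sketched as max pooling over a fixed grid of bins, so that regions of different sizes yield feature vectors of the same length. This is a simplified single-channel sketch; the bin boundaries (floor for the start, ceiling for the end) and the 2 x 2 output size are assumptions made for illustration.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the region roi = (x0, y0, x1, y1), end-exclusive, into an
    out_h x out_w grid and return it as a flat, fixed-length list."""
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        r_start = y0 + (i * h) // out_h
        r_end = y0 + -(-((i + 1) * h) // out_h)   # ceiling division
        for j in range(out_w):
            c_start = x0 + (j * w) // out_w
            c_end = x0 + -(-((j + 1) * w) // out_w)
            pooled.append(max(feature_map[r][c]
                              for r in range(r_start, r_end)
                              for c in range(c_start, c_end)))
    return pooled


# Toy 5 x 6 feature map whose value encodes its position (row * 10 + col).
feature_map = [[r * 10 + c for c in range(6)] for r in range(5)]

square_roi = roi_max_pool(feature_map, (0, 0, 4, 4))   # 4 x 4 region
wide_roi = roi_max_pool(feature_map, (1, 1, 6, 4))     # 3 x 5 region

print(square_roi)  # [11, 13, 31, 33]
print(wide_roi)    # [23, 25, 33, 35]
```

Both regions produce four values even though their sizes differ, which is what lets the following fully connected layers accept any region.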

Faster R-CNN

Faster R-CNN = Region Proposal Network + Fast R-CNN

Region Proposal Network

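The Region Proposal Network classifies whether a given region is meaningful as a bounding box.
- It finds regions that are likely to contain an object.
- It refines those regions so that objects are localized more accurately.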

9 : Three different region sizes (128, 256, 512) with three different ratios (1:1, 1:2, 2:1)
4 : four bounding box regression parameters
2 : box classification (whether to use it or not)
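The numbers above can be combined into a short sketch that enumerates the 9 anchor shapes and the number of values predicted per location. Two assumptions are made here: each ratio is read as width : height, and each anchor keeps the area of a square with the given side length.

```python
import math

SIZES = (128, 256, 512)
RATIOS = ((1, 1), (1, 2), (2, 1))   # assumed to mean width : height


def make_anchors(sizes=SIZES, ratios=RATIOS):
    """Return (width, height) pairs, one per size/ratio combination.

    Each anchor keeps an area of size * size (an assumption of this sketch).
    """
    anchors = []
    for size in sizes:
        for rw, rh in ratios:
            width = size * math.sqrt(rw / rh)
            height = size * math.sqrt(rh / rw)
            anchors.append((width, height))
    return anchors


anchors = make_anchors()
# Per location: 9 anchors x (4 box regression values + 2 classification scores).
outputs_per_location = len(anchors) * (4 + 2)

print(len(anchors))           # 9
print(outputs_per_location)   # 54
```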

YOLO (You Only Look Once)

YOLO (v1) is an extremely fast object detection algorithm.
- baseline : 45 fps / smaller version : 155 fps
It simultaneously predicts multiple bounding boxes and class probabilities.
- No explicit bounding box sampling (compared with Faster R-CNN)
  - Because there is no separate Region Proposal Network step to generate bounding boxes, it is faster than Faster R-CNN.
Given an image, YOLO divides it into an SxS grid.
  Source: https://arxiv.org/abs/1506.02640
- If the center of an object falls into a grid cell, that grid cell is responsible for detecting the object.
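A minimal sketch of how the responsible cell could be found from an object's center. The grid size S=7 and the 448 x 448 image size are example values chosen for illustration, and `responsible_cell` is a hypothetical helper.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell containing the center (cx, cy).

    S=7 is an example value; centers on the right or bottom edge are
    clamped into the last cell.
    """
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


print(responsible_cell(200, 100, 448, 448))  # (1, 3)
print(responsible_cell(448, 448, 448, 448))  # (6, 6)
```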
Each cell predicts B bounding boxes (B=5 in this example; the YOLO v1 paper uses B=2 on PASCAL VOC).
- Each bounding box predicts
  - box refinement (x / y / w / h)
  - confidence (of objectness)
Each cell predicts C class probabilities.
In total, the prediction becomes a tensor of size SxSx(B*5+C).
- SxS : Number of cells of the grid
- B*5 : B bounding boxes with offsets (x,y,w,h) and confidence
- C : Number of classes
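The tensor size can be worked through with a few lines. The values S=7 and C=20 are example values (they match the PASCAL VOC setting of the YOLO v1 paper), and the per-cell layout used in `split_cell` (all boxes first, then class probabilities) is an assumption of this sketch rather than something stated above.

```python
def yolo_output_shape(S, B, C):
    """Shape of the prediction tensor: S x S cells, each holding B boxes of
    5 values (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)


def split_cell(cell, B, C):
    """Split one cell's vector into B boxes and C class probabilities.

    Assumed layout: [box_0 (5 values), ..., box_{B-1}, class probabilities].
    """
    assert len(cell) == B * 5 + C
    boxes = [cell[i * 5:(i + 1) * 5] for i in range(B)]
    class_probs = cell[B * 5:]
    return boxes, class_probs


print(yolo_output_shape(7, 5, 20))  # (7, 7, 45)
print(yolo_output_shape(7, 2, 20))  # (7, 7, 30)

boxes, class_probs = split_cell(list(range(45)), B=5, C=20)
print(len(boxes), len(class_probs))  # 5 20
```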

Because YOLO finds bounding boxes and predicts their classes at the same time, it achieves high speed.

Source: Boostcamp AI Tech 4th Cohort (NAVER Connect Foundation)
