<html>
	<head>
		<style type="text/css">
			body{font-family: sans-serif;}
			p{display: inline-block;}
			img{display: block;}
			.container{width: 90%;position absolute;margin: auto;}
			.title{position: relative;width: 90%;margin: auto;text-align: center;font-weight: bold;font-size: 18px;padding: 1%;}
			.section{position: relative;width: 90%;margin: auto;padding: 2%;}
			.subsection{position: relative; width: 98%;text-align: justify;padding: 10px;}
			.heading{position: relative; width: 98%;text-align: left;font-size: 14px;font-weight: bold;}
			.text{width: 95%;font-size: 12px;text-align: justify;padding: 10px 0px 10px 0px;}
			.authors{position: relative;width: 80%;margin: auto;padding: 2%;font-style: italic;text-align: center;font-size: 12px;}
			.image{width: 95%;font-size: 12px;text-align: left;}
		</style>
	</head>
	<body>
		<div class="container">
			<div class="title">Feature Detection and Extraction using SIFT</div>

			<div class="authors">

				<!-- Start edit here  -->
				<p>Roshan Dash, Roll No.: 150108045, Branch: EEE</p>; &nbsp; &nbsp;
				<p>Sarvesh Jhunjhunwala, Roll No.: 150108032, Branch: EEE</p>; &nbsp; &nbsp;
				<p>Akash Shetkar, Roll No.: 150108049, Branch: EEE</p>; &nbsp; &nbsp;
				<p>Ajeyo Dey, Roll No.: 150102004, Branch: ECE</p>; &nbsp; &nbsp;
				<!-- Stop edit here -->

			</div>


			<div class="section">
				<div class="heading">Abstract</div>
				<div class="text">

					<!-- Start edit here  -->
					The scale-invariant feature transform (SIFT) is an algorithm in computer vision to detect and describe local features in images. The algorithm was patented in Canada by the University of British Columbia and published by David Lowe in 1999.

					Applications include object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, video tracking, individual identification of wildlife and match moving.
					<!-- Stop edit here -->

				</div>
			</div>

			<div class="section">
				<div class="heading">1. Introduction</div>
				<div class="text">

					<!-- Start edit here  -->
					Matching features across different images is a common problem in computer vision. When all images are similar in nature (same scale, orientation, etc.), simple corner detectors can work. But when the images differ in scale and rotation, you need a scale- and rotation-invariant method such as the Scale Invariant Feature Transform.
					<img src="Pictures/example.png" width="500px"/>

					<!-- Stop edit here -->

				</div>

				<div class="subsection">
					<div class="heading">1.1 Introduction to Problem</div>
					<div class="text">

						<!-- Start edit here  -->
						SIFT isn't just scale invariant. You can change the following and still get good results (for illumination and viewpoint, within moderate limits):
							<br>
							1. Scale
							<br>
							2. Rotation
							<br>
							3. Illumination
							<br>
							4. Viewpoint
							<br>
							<br>
							<br>
							Here's an example. We're looking for these:
							<img src="Pictures/sift-objects.jpg" alt="">
							And we want to find these objects in this scene:
							<img src="Pictures/sift-scene.jpg" alt="">
						<!-- Stop edit here -->

					</div>
				</div>

				<div class="subsection">
					<div class="heading">1.2 Figure</div>
					<div class="image">

						<!-- Start edit here  -->
						<br>
						Here's the result:
						<img src="Pictures/sift-result.jpg" alt="">
						<!-- Stop edit here -->

					</div>
				</div>
				<div class="subsection">
					<div class="heading">1.3 Literature Review</div>
					<div class="text">

						<!-- Start edit here  -->
						Object recognition is widely used in the machine vision industry
						for the purposes of inspection, registration, and manipulation.
						However, current commercial systems for object
						recognition depend almost exclusively on correlation-based
						template matching. While very effective for certain engineered
						environments, where object pose and illumination
						are tightly controlled, template matching becomes computationally
						infeasible when object rotation, scale, illumination,
						and 3D pose are allowed to vary, and even more so when
						dealing with partial visibility and large model databases.
						<!-- Stop edit here -->

					</div>
				</div>
				<div class="subsection">
					<div class="heading">1.4 Proposed Approach</div>
					<div class="text">

						<!-- Start edit here  -->
						SIFT is quite an involved algorithm. It has a lot going on and can become confusing.
						<br>
						Here's an outline of what happens in SIFT.
						<br>
						1. Constructing a scale space: This is the initial preparation. You create internal representations of the original image to ensure scale invariance. This is done by generating a "scale space".
						<br>
						2. LoG approximation: The Laplacian of Gaussian (LoG) is great for finding interesting points (or keypoints) in an image, but it is computationally expensive. So we cheat and approximate it with the Difference of Gaussians (DoG), computed from the representation created earlier.
						<br>
						3. Finding keypoints: With this fast approximation, we now try to find keypoints. These are the maxima and minima in the Difference of Gaussian images calculated in step 2.
						<br>
						4. Assigning an orientation to the keypoints: An orientation is calculated for each keypoint. Any further calculations are done relative to this orientation. This effectively cancels out the effect of orientation, making the keypoint rotation invariant.
						<br>
						5. Generating SIFT features: Finally, with scale and rotation invariance in place, one more representation is generated. This helps uniquely identify features. Let's say you have 50,000 features. With this representation, you can easily identify the feature you're looking for (say, a particular eye, or a sign board).
<br>
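						Step 4 of the outline can be illustrated with a small sketch. The report does not say how the orientation is computed; the version below assumes the usual approach of building a histogram of gradient directions around the keypoint, weighted by gradient magnitude, and taking the strongest bin. The function name, the window radius and the 36-bin histogram are illustrative choices, and refinements such as Gaussian weighting of the window are left out.

```python
import math

def dominant_orientation(image, row, col, radius=4, bins=36):
    """Return the centre angle (degrees) of the strongest gradient-direction bin.

    `image` is a list of rows of grey values. Angles follow image coordinates
    (x to the right, y downwards). `radius` and `bins` are assumed values.
    """
    histogram = [0.0] * bins
    height, width = len(image), len(image[0])
    # Stay one pixel away from the border so central differences are defined.
    for r in range(max(1, row - radius), min(height - 1, row + radius + 1)):
        for c in range(max(1, col - radius), min(width - 1, col + radius + 1)):
            dx = image[r][c + 1] - image[r][c - 1]
            dy = image[r + 1][c] - image[r - 1][c]
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            # Each pixel votes for its direction bin, weighted by magnitude.
            histogram[int(angle * bins / 360.0) % bins] += magnitude
    peak = max(range(bins), key=lambda b: histogram[b])
    return (peak + 0.5) * 360.0 / bins
```

						Describing the neighbourhood relative to the returned angle is what makes later steps insensitive to image rotation.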
						<!-- Stop edit here -->

					</div>
				</div>
				<!-- <div class="subsection">
					<div class="heading">1.5 Report Organization</div>
					<div class="text">


					</div>
				</div> -->
			</div>

			<div class="section">
				<div class="heading">2. Proposed Approach</div>
				<div class="text">

					<!-- Start edit here  -->
					A single fixed-size window cannot detect keypoints at different scales. It works for small corners, but detecting larger corners requires larger windows. For this, scale-space filtering is used: the Laplacian of Gaussian (LoG) is computed for the image at various σ values. LoG acts as a blob detector that responds to blobs of different sizes as σ changes. In short, σ acts as a scaling parameter.
But LoG is somewhat costly to compute, so the SIFT algorithm uses the Difference of Gaussians (DoG), which approximates LoG. A DoG image is the difference between two Gaussian blurs of the same image with two different σ values, say σ and kσ. This process is repeated for each octave of the image in the Gaussian pyramid, as shown in the image below:
<img src="Pictures/sift_dog.jpg" alt="">
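The Difference of Gaussians step can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used for the experiments in this report: the function names, the 3σ kernel radius and the border handling (edge pixels are repeated) are all assumptions, and a real implementation would normally use an image-processing library.

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel covering about 3 sigma per side."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(row, kernel):
    """Filter one row with the kernel, repeating the edge pixels at the borders."""
    radius = len(kernel) // 2
    last = len(row) - 1
    return [
        sum(k * row[min(max(i + j - radius, 0), last)]
            for j, k in enumerate(kernel))
        for i in range(len(row))
    ]

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter the rows first, then the columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def difference_of_gaussians(image, sigma, k):
    """Subtract the sigma blur from the k * sigma blur, pixel by pixel."""
    fine = gaussian_blur(image, sigma)
    coarse = gaussian_blur(image, k * sigma)
    return [[c - f for c, f in zip(coarse_row, fine_row)]
            for coarse_row, fine_row in zip(coarse, fine)]
```

Calling `difference_of_gaussians(image, sigma, k)` for a sequence of increasing σ values gives the stack of DoG images for one octave; halving the image size and repeating gives the next octave.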
Once the DoG images are found, they are searched for local extrema over scale and space. For example, each pixel is compared with its 8 neighbours in the same image, as well as the 9 pixels in the next scale and the 9 pixels in the previous scale. If it is a local extremum, it is a potential keypoint, which basically means that the keypoint is best represented at that scale. This is shown in the image below:
<img src="Pictures/sift_local_extrema.jpg" alt="">
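The 26-neighbour comparison described above can be written as a short sketch. It assumes the DoG images of one octave are given as a list of equally sized 2-D lists, ordered by scale; the function names are illustrative, and ties are rejected (the pixel must be strictly larger or strictly smaller than every neighbour), which is one possible convention.

```python
def is_local_extremum(below, current, above, row, col):
    """True if current[row][col] is a strict maximum or minimum of its 26 neighbours."""
    center = current[row][col]
    neighbours = [
        layer[row + dr][col + dc]
        for layer in (below, current, above)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if not (layer is current and dr == 0 and dc == 0)
    ]
    is_max = all(center > n for n in neighbours)
    is_min = all(n > center for n in neighbours)
    return is_max or is_min

def find_extrema(dog_stack):
    """Return (scale, row, col) for every interior extremum in a stack of DoG images."""
    keypoints = []
    # The first and last scales have no neighbour on one side, so skip them.
    for s in range(1, len(dog_stack) - 1):
        below, current, above = dog_stack[s - 1], dog_stack[s], dog_stack[s + 1]
        for row in range(1, len(current) - 1):
            for col in range(1, len(current[0]) - 1):
                if is_local_extremum(below, current, above, row, col):
                    keypoints.append((s, row, col))
    return keypoints
```

The points returned here are only candidates; they correspond to the "keypoints before elimination" stage shown in the results.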
					<!-- Stop edit here -->

				</div>
			</div>

			<div class="section">
				<div class="heading">3. Experiments &amp; Results</div>
				<div class="subsection">
					<div class="heading">3.1 Discussion</div>
					<div class="text">

						<!-- Start edit here  -->
						1. Original image:
						<br>
						<img src="Pictures/1_original image.jpg" width="500px">
						<br>
						2. Scale space and DoG, level 1:
						<img src="Pictures/2_scale space and DoG level 1.jpg" width="500px">
						<br>
						3. Scale space and DoG, level 2:
						<img src="Pictures/3_scale space and DoG level 2.jpg" width="500px">
						<br>
						4. Scale space and DoG, level 3:
						<img src="Pictures/4_scale space and DoG level 3.jpg" width="500px">
						<br>
						5. Scale space and DoG, level 4:
						<img src="Pictures/5_scale space and DoG level 4.jpg" width="500px">
						<br>
						6. Keypoints before elimination:
						<img src="Pictures/6_kepoints_before_elimination.jpg" width="500px">
						<br>
						7. Keypoints after elimination:
						<img src="Pictures/7_kepoints_after_elimination.jpg" width="500px">
						<br>
						<!-- Stop edit here -->

					</div>
				</div>
			</div>

			<div class="section">
				<div class="heading">4. Conclusions</div>
				<div class="subsection">
					<div class="heading">4.1 Summary</div>
					<div class="text">

						<!-- Start edit here  -->

						SIFT was one of the pathbreaking developments in computer vision because its features are scale invariant, and it proved to be a stepping stone for further research in the field. Another algorithm, SURF, was later developed on similar foundations but with a much lower computation time. Nowadays there are several open-source alternatives such as ORB, but they are generally not considered as robust as SIFT and SURF.
						<br>
						Moreover, the learning curve was quite steep, and we learned a great deal about topics in computer vision and its signal processing applications.
						<!-- Stop edit here -->

					</div>
				</div>
				<div class="subsection">
					<div class="heading">4.2 Future Extensions</div>
					<div class="text">

						<!-- Start edit here  -->
						These algorithms can be used for applications such as object tracking in videos, image segmentation, panoramic stitching, 3D scene modeling, and object recognition.
						<!-- Stop edit here -->

					</div>
				</div>
			</div>

		</div>
	</body>
</html>