Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
167 changes: 6 additions & 161 deletions annotations/image-annotation.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,98 +2,9 @@

An **Image Annotation** is a reference to an image along with collected annotations about that image. Right now, only object-detection-style annotations are supported, though others (such as image-level labels or relationship labels) will be added in the future.

## Features
## Droplets

### Object Detection

Object detection annotations are organized around on _classes_. Each class is a single category of object that you wish to detect; for example, `person` or `car`. For each class being annotated, an image annotation may provide a list of _instances_, or singlular example of that object class in the image, and/or a list of _multi-instances_ or groups of that object in the image that were labeled once as an entire group. Each instance and multi-instance may have associated geometric information, and instances may additionally have deeper _attribute_ information.

## Types

### Droplets

#### Bounding Box

A `BoundingBox` is a tight axis-aligned rectangle containing a given detection. It may also have an assigned confidence, which would primarily be used to represent model output.

```ts
interface BoundingBox {
rectangle: Rectangle;
confidence?: number;
}
```

#### Segmentation

A `Segmentation` is a tight mask containing a given detection. It may also have an assigned confidence, which would primarily be used to represent model output.

```ts
interface Segmentation {
mask: Mask;
confidence?: number;
}
```

#### Keypoint

A `Keypoint` is an appearance of a point of interest for a particular class. For example, an annotation might provide the position of the "left shoulder" keypoint for a "person," or the "back right tire" of a "car." It may also have an assigned confidence, which would primarily be used to represent model output.

```ts
interface Keypoint {
point: Point;
occluded?: boolean;
confidence?: number;
}
```

#### Instance

An `Instance` is an appearance of a single example of a particular class within an image. An instance may have any subset of the following fields:

- `boundingBox`: a bounding box for the instance
- `segmentation`: a segmentation for the instance
- `keypoints`: a mapping from keypoint node names to their positions in the image (or `null` if the keypoint is not in-frame for this instance)
- `attributes`: a mapping from attribute names to their values

```ts
interface Instance {
boundingBox?: BoundingBox;
segmentation?: Segmentation;
keypoints?: Record<string, Keypoint | null>;
attributes?: Record<string, string>;
}
```

#### Multi-Instance

A `MultiInstance` is an appearance of a group of examples of a particular class within an image. (Some datasets, such as COCO, call this a _crowd_.) Note that there is not a strict definition for when a group of examples transitions from being many instances to being a single multi-instance. It is only recommended that this be used for operating on premade datasets.

A multi-instance may have any subset of the following fields:

- `boundingBox`: a bounding box for the multi-instance
- `segmentation`: a segmentation for the multi-instance
- `count`: the number of true instances in the multi-instance

```ts
interface MultiInstance {
boundingBox?: Rectangle;
segmentation?: Mask;
count?: number;
}
```

#### Class Annotation

A `ClassAnnotation` is the collection of all of the instances and multi-instances of a particular class within an image.

```ts
interface ClassAnnotation {
instances?: Instance[];
multiInstances?: MultiInstance[];
}
```

#### Image
### Image

An `Image` is a single picture taken by a camera. An image is defined by one or more URIs (paths) at which it may be located. When loading an image, implementers should use the first path that is able to be loaded successfully. (Earlier paths may fail due to access controls or due to an unacceptable protocol.)

Expand All @@ -103,9 +14,9 @@ interface Image {
}
```

#### Image Annotation
### Image Annotation

An `ImageAnnotation` is a collection of annotations on a given image.
An `ImageAnnotation` is a collection of annotations on a given image. See [Object Detection](../common/object-detection.md) for a description of the `ClassAnnotation` type.

```ts
interface ImageAnnotation {
Expand All @@ -115,75 +26,9 @@ interface ImageAnnotation {
}
```

### Templates

#### Instance Template

An `InstanceTemplate` describes what fields should be present for instances of a particular class. This may consist of any subset of the following:

- `boundingBox`: if true, then instance annotations must provide bounding boxes; if omitted, treated as `false`
- `segmentation`: if true, then instance annotations must provide segmentations; if omitted, treated as `false`
- `keypoints`: if present, then instance annotations must provide a keypoint annotation for every listed keypoint; if omitted, treated as `[]` (no keypoints)
- `attributes`: if present, then instance annotations must provide an attribute annotation for every listed attribute, and it must come from the given set of values; if omitted, treated as `{}` (no attributes)

```ts
interface InstanceTemplate {
boundingBox?: boolean;
segmentation?: boolean;
keypoints?: string[];
attributes?: Record<string, string[]>;
}
```

For `InstanceTemplate`, A ≤ B if and only if:

- `A.boundingBox` ⇒ `B.boundingBox`
- `A.segmentation` ⇒ `B.segmentation`
- `A.keypoints` is a subset of `B.keypoints`
- For each attribute in `A.attributes`, the same attribute with exactly the same set of values is present in `B.attributes`

#### Multi-Instance Template

A `MultiInstanceTemplate` describes what fields should be present for multi-instances of a particular class. This may consist of any subset of the following:

- `boundingBox`: if true, then instance annotations must provide bounding boxes; if omitted, treated as `false`
- `segmentation`: if true, then instance annotations must provide segmentations; if omitted, treated as `false`
- `count`: if true, then multi-instance annotations must provide a count; if omitted, treated as `false`

```ts
interface MultiInstanceTemplate {
boundingBox?: boolean;
segmentation?: boolean;
count?: boolean;
}
```

For `MultiInstanceTemplate`, A ≤ B if and only if:

- `A.boundingBox` ⇒ `B.boundingBox`
- `A.segmentation` ⇒ `B.segmentation`
- `A.count` ⇒ `B.count`

#### Class Annotation Template

A `ClassAnnotationTemplate` describes what fields should be present on annotations of a particular class. This may consist of any subset of the following:

- `instances`: an instance template; if omitted, treated as the most permissive template (i.e., the `InstanceTemplate` with all fields omitted)
- `multiInstances`: a multi-instance template; if omitted, treated as the most permissive template (i.e., the `MultiInstanceTemplate` with all fields omitted)

```ts
interface ClassAnnotationTemplate {
instances?: InstanceTemplate;
multiInstances?: MultiInstanceTemplate;
}
```

For `ClassAnnotationTemplate`, A ≤ B if and only if:

- `A.instances` ≤ `B.instances`
- `A.multiInstances` ≤ `B.multiInstances`
## Templates

#### Image Annotation Template
### Image Annotation Template

An `ImageAnnotationTemplate` describes what fields should be present in an `ImageAnnotation`. It simply consists of a mapping of classes to `ClassAnnotationTemplate`s. (Note that the `image` field is not included in the template since it is always required.)

Expand Down
69 changes: 69 additions & 0 deletions annotations/video-annotation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
# Video Annotation

A **Video Annotation** is a reference to a video along with collected annotations about that video. Right now, only object-detection-style annotations are supported, though others (such as video-segment labels or relationship labels) will be added in the future.

## Droplets

### FrameAnnotation

A `FrameAnnotation` is a single frame of a video. It is identical to an `ImageAnnotation`, except that it does not define the `image` and `kind` fields.

```ts
interface FrameAnnotation {
classes?: Record<string, ClassAnnotation>;
}
```

### Video

A `Video` is a reference to a video. A video is described by one or both of the following:

- One or more URIs (paths) at which the video may be located.
- A sequence of `Image`s that are the frames of the video, in order.

If both are provided, it is assumed that the video at the provided paths is the same as the provided images, and a reader may choose to consume either one.

```ts
interface Video {
paths?: string[];
frames?: Image[];
}
```

### Video Annotation

A `VideoAnnotation` is a collection of annotations on a given video.

```ts
interface VideoAnnotation {
kind: "VideoAnnotation";
video: Video;
frames: FrameAnnotation[];
}
```

## Templates

### Frame Annotation Template

A `FrameAnnotationTemplate` describes what fields should be present in a `FrameAnnotation`. It simply consists of a mapping of classes to `ClassAnnotationTemplate`s. Note that a `FrameAnnotationTemplate` is directly compatible with `ImageAnnotationTemplate`.

```ts
interface FrameAnnotationTemplate {
classes?: Record<string, ClassAnnotationTemplate>;
}
```

For `FrameAnnotationTemplate`, A ≤ B if and only if:

- For each class `C` in `A.classes`, `C` is present in `B.classes` and `A.classes[C]` ≤ `B.classes[C]`

### Video Annotation Template

A `VideoAnnotationTemplate` describes what fields should be present in a `VideoAnnotation`. It currently only consists of a `FrameAnnotationTemplate`, which describes the structure of each frame. (Note that the `video` field is not included in the template since it is always required.)

```ts
interface VideoAnnotationTemplate {
frames: FrameAnnotationTemplate;
}
```
2 changes: 1 addition & 1 deletion common/geometry.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Geometry

In order to represent labels on parts of images, the Droplet format defines a number of types for representing geometry. In droplet, the coordinate space for an image is **always** 0-to-1 on both axes. This was chosen so that stored geometries are independent of the resolution of the image.
In order to represent labels on parts of images, the Droplet format defines a number of types for representing geometry. In droplet, the coordinate space for an image is **always** 0-to-1 on both axes. This was chosen so that stored geometries are independent of the resolution of the media being annotated.

Since Droplet is a JSON-based serialization format, here we provide TypeScript-style type definitions for all of the serialized types. There are also language-specific bindings for operating on droplets; see the language-specific API documentation for more information.

Expand Down
Loading