Perceptions

I joined the Perceptions team in September 2024 to work on 2D & 3D LiDAR object detection. Starting May 2024, I was appointed as the stream-4 Perceptions Lead :)

Intro to Perceptions Slide Deck

here

LiDAR Object Detection

TIP

See Watonomous Cheatsheet for useful cmds.
See the WATO Stack for a high level infra overview.

PR branch here has demos of the LiDAR node in action.

Perceptions Efforts

High level overview of WATO workflow

Here are my ideas on how we can make Perceptions CRACKED + things we should work on this term (Bold is for integration, the rest is for R&D)

Downstream sensor fusion for tracked objects
Pedestrian behaviour prediction
Turn signal detection (I like this)
Road marking detection
Shared YOLO backbones
Batched inference for YOLO models
Build sensor mount

Questions for Eddy (Lead onboarding meeting)

Leadership style?
Where do priorities come from?
Incoming co-op to perceptions questions
Extra responsibilities as lead (meetings, keys, etc)
Goal of S24
- Sensor rack + integration stuff?

Corresponding Notes from meeting

Self-driving means lots of things ;)

Milestones from last term
- Camera, LiDAR, MPC done (we got things done that directly touch the car)

New milestones
- Split perceptions into R&D (new ideas etc.) & core (integration-side we need to get shit done)
- Once things that touch the car are done we want to do this:
- Main objective: Tracking (Object Tracks)
  - Rather than just detecting objects in space and noting occupied positions, we want to track objects: associate bounding boxes with persistent objects and estimate velocity in addition to position
  - We want to create tracked trajectories of objects from detections to send downstream to WM
  - We want to use each sensor's strengths to create these tracks
    - LiDAR: Spatial awareness
    - Camera: Semantics
    - Radar: Velocity
  - Get the majority of tracks from the camera, use the rest of the sensors for redundancy
  - Sensor fusion is for building the tracks
    - Look into AB3DMOT
- Sensor Rack
  - Build this out!!!
  - We should be involved since classical CV perceptions algorithms depend heavily on sensor extrinsics and intrinsics
  - Support VP where needed
- Ideas
  - Annotate shit (nope)
  - Some type of simulation to get annotated data (sim2real problem)
  - NeRF?
  - We want to tune both configurations and models
    - Both ROS stuff (things are overlapping well, timestamps have integrity) and ML models (fine tune models)
- AV Stack generalized: Perceptions, World Modelling, Action
  - Perceptions: Senses
  - World Modelling: Brain understanding the world and trying to predict the future
  - Action (MPC): What should you do based on prediction
- Output of perception is tracked objects for world modelling
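
To make the tracking objective in the notes above concrete (associate detections with persistent objects and get velocity, not just occupied positions), here is a minimal sketch. It is not AB3DMOT, which uses a 3D Kalman filter with Hungarian matching; this swaps in greedy nearest-neighbour association with a constant-velocity prediction on 2D bird's-eye positions. The names `Track`, `NearestNeighbourTracker`, `max_dist`, and `max_misses` are hypothetical and not taken from the WATO codebase.

```python
import math
from dataclasses import dataclass
from itertools import count


@dataclass
class Track:
    track_id: int      # persistent identity of the object
    x: float           # last known position (metres, assumed frame)
    y: float
    vx: float = 0.0    # velocity estimated from consecutive matches
    vy: float = 0.0
    misses: int = 0    # consecutive frames without a matched detection


class NearestNeighbourTracker:
    """Greedy nearest-neighbour tracker (illustrative, not AB3DMOT)."""

    def __init__(self, max_dist=2.0, max_misses=3):
        self.max_dist = max_dist      # assumed gating distance in metres
        self.max_misses = max_misses  # drop a track after this many misses
        self.tracks = []
        self._ids = count()

    def update(self, detections, dt):
        """detections: list of (x, y) centres; dt: seconds since last frame (> 0)."""
        # Constant-velocity prediction of where each track should be now.
        predicted = [(t.x + t.vx * dt, t.y + t.vy * dt) for t in self.tracks]

        # All (distance, track index, detection index) pairs, closest first.
        pairs = sorted(
            (math.hypot(px - dx, py - dy), ti, di)
            for ti, (px, py) in enumerate(predicted)
            for di, (dx, dy) in enumerate(detections)
        )

        used_tracks, used_dets = set(), set()
        for dist, ti, di in pairs:
            if dist > self.max_dist:
                break  # remaining pairs are even farther apart
            if ti in used_tracks or di in used_dets:
                continue
            used_tracks.add(ti)
            used_dets.add(di)
            track = self.tracks[ti]
            dx, dy = detections[di]
            track.vx, track.vy = (dx - track.x) / dt, (dy - track.y) / dt
            track.x, track.y = dx, dy
            track.misses = 0

        # Unmatched tracks coast on their prediction and accumulate misses.
        for ti, track in enumerate(self.tracks):
            if ti not in used_tracks:
                track.x, track.y = predicted[ti]
                track.misses += 1
        self.tracks = [t for t in self.tracks if t.misses <= self.max_misses]

        # Unmatched detections start new tracks.
        for di, (dx, dy) in enumerate(detections):
            if di not in used_dets:
                self.tracks.append(Track(next(self._ids), dx, dy))
        return self.tracks


tracker = NearestNeighbourTracker()
tracker.update([(0.0, 0.0), (10.0, 0.0)], dt=0.1)
tracks = tracker.update([(0.1, 0.0), (10.0, 0.2)], dt=0.1)
```

After the second frame both objects keep their IDs and each track carries a velocity estimate, which is the kind of output the notes describe sending to world modelling. A real version would presumably fuse camera, LiDAR, and radar measurements instead of taking a single list of centres.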

Core Perceptions Stack

Perceptions Term Projects

Monocular Depth Estimation
Radar Velocity Detection
3D MOT
Unifying Camera Detection
Segformer Semantic Segmentation
(LOW PRIORITY) Batched YOLOv8 inference
(LOW PRIORITY) LiDAR Velocity Estimation
See Perceptions Backlog

Double check that the feed coming into the node is 30fps (for Lucas’s semantics)
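
A quick way to sanity-check the 30fps assumption is to compute the rate from message timestamps. The sketch below assumes you have already collected header stamps (in seconds) from the incoming feed; `estimate_fps`, `is_near_target`, and the 1 fps tolerance are hypothetical choices, not existing WATO helpers. If the stack runs on ROS 2, `ros2 topic hz <topic>` gives a similar number from the command line.

```python
def estimate_fps(stamps_sec):
    """Average frame rate from a list of increasing timestamps in seconds."""
    if len(stamps_sec) < 2:
        raise ValueError("need at least two timestamps")
    span = stamps_sec[-1] - stamps_sec[0]
    if span <= 0:
        raise ValueError("timestamps must be increasing")
    # N timestamps delimit N - 1 frame intervals.
    return (len(stamps_sec) - 1) / span


def is_near_target(fps, target=30.0, tolerance=1.0):
    """True if the measured rate is within `tolerance` fps of `target`."""
    return abs(fps - target) <= tolerance


# Synthetic stamps for one second of a perfectly regular 30fps feed.
stamps = [i / 30 for i in range(31)]
fps = estimate_fps(stamps)
ok = is_near_target(fps)
```

This only measures the average rate; dropped or bursty frames would need a look at the individual intervals.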

🤖 Dan Huynh
