Recorded on February 16th, 2024
HOSTS
AB – Andrew Ballard
Spatial AI Specialist at Leidos.
Robotics & AI defence research.
Creator of SPAITIAL
Helena Merschdorf
Marketing/branding at Tales Consulting.
Undertaking her PhD in Geoinformatics & GIScience.
Mirek Burkon
CEO at Phantom Cybernetics.
Creator of Augmented Robotality AR-OS.
Violet Whitney
Adj. Prof. at U.Mich
Spatial AI insights on Medium.
Co-founder of Spatial Pixel.
William Martin
Director of AI at Consensys
Adj. Prof. at Columbia.
Co-founder of Spatial Pixel.
FAST FIVE – Spatial AI News of the Week
From Violet:
Open Interpreter – one of the first open-source LLM-powered assistants to transcend its own sandbox
You’re likely well aware of text-based LLMs that pour out large blocks of code in response to your requests – but most of those LLMs exist within their own sandbox – a walled garden from which you copy’n’paste those blocks of code to turn them into real applications.
Well, no longer. Open Interpreter crosses the streams between a virtual [code] assistant and your actual machine, offering to help you work *across* your current applications, for a wider range of tasks – on the fly. Granted, every command or code block it wants to run is met with a prompt asking: “are you sure you want me to do this?”. That’s both a sensible way to keep a human in the loop in the interim – AND a natural interaction point for letting cross-application scripting happen in the foreground. A minimal usage sketch follows after the links below.
Demo here: https://github.com/KillianLucas/open-interpreter/#demo
Product page here: https://openinterpreter.com/
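If you want to poke at it yourself, here’s a minimal sketch based on the project’s documented Python API around the time of recording (the CLI is simply `interpreter` after a pip install); import paths and flags may differ between versions, so treat this as illustrative rather than gospel:

```python
# pip install open-interpreter
from interpreter import interpreter

# Ask for a task in plain English; the model writes code (Python, shell, etc.),
# shows it to you, and only executes it once you approve at the confirmation prompt.
interpreter.chat("Plot this machine's CPU usage over the next 60 seconds.")

# If you trust it, auto_run skips the "are you sure?" confirmations entirely.
# interpreter.auto_run = True
```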
From AB:
Depth Anything: Unleashing the power of large-scale unlabeled data
I’m calling it: this model will be plug’n’play in a robot/car/phone/device near you by the end of the year.
This computer vision model – not just a concept, but code. And a demo. And a paper. AND a freely downloadable model! – takes any image as input and returns a depth map – a greyscale/colour-scale mask of how near or far each part of the scene is from the viewpoint of the camera.
In essence, it does what a fleet of discrete RGB and/or LiDAR sensors would previously have been needed for – estimating a scene in 3D space – but with a single camera as the input.
The model is so temporally stable that it’s been shown to process the still frames of a video, in sequence, and keep consistent depth masks across the flow of time. Amazing. A quick-start sketch follows after the link below.
https://depth-anything.github.io/
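For the curious: the weights also ship as Hugging Face checkpoints, so a few lines of Python get you a depth map. A minimal sketch, assuming the transformers-hosted variant (the model ID and output format below are as documented around release and may change):

```python
# pip install transformers torch pillow
from PIL import Image
from transformers import pipeline

# One of the released Depth Anything checkpoints on the Hugging Face Hub.
depth_estimator = pipeline(
    task="depth-estimation",
    model="LiheYoung/depth-anything-small-hf",
)

image = Image.open("street_scene.jpg")   # any ordinary RGB photo
result = depth_estimator(image)

# "depth" is a greyscale PIL image of relative depth (for these models,
# brighter generally means closer); "predicted_depth" is the raw tensor.
result["depth"].save("street_scene_depth.png")
```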
From William:
Boston Dynamics’ Atlas now has fingers – and a real job!
In this YouTube short from Boston Dynamics, we get to see their Atlas humanoid robot begin to have working hands and fingers, AND we get our first view from Atlas’ own internal cameras, as the robot overlays a digital twin of the car part that it’s currently handling over the actual part – showing the improvements in real-time spatial understanding. A fascinating watch.
https://www.youtube.com/shorts/SFKM-Rxiqzg
From Mirek:
1X – slightly-uncanny humanoid robots… coming to an office near you…
A must-watch! 20+ android-like robots – on wheels – all performing various tasks around an office scenario, and all controlled by a central neural network.
Unless Captain Disillusion debunks this one in the coming weeks, we have to assume that it’s real – and if so – well *dang*. Low-cost (ish), but genuinely-human-replaceable… for some tasks, at least.
For bonus points: see how they plug AND unplug themselves from power points when they’re low on juice!
https://www.youtube.com/watch?v=iHXuU3nTXfQ
Deep Dive Links
Luxonis – OAK LiDAR/RGB cameras for robotic applications: https://shop.luxonis.com/collections/oak-cameras-1
Bring me a Spoon – the thought experiment that is the litmus test for robotics & AI: https://bringmeaspoon.org/
To absent friends.