This week is a free-flowing conversation between AB and Arseniy Sergeyev – a true ‘young gun’ of the spatial computing world – and the co-founder of a startup working on Adaptive User Interfaces (AUI).
AUI aims to create interfaces that adapt to the user’s context, needs, and preferences, providing more relevant and personalized experiences.
Arseniy and AB discuss the current limitations of generalized interfaces and the potential of AI, specifically large language models, to understand user context and generate tailored interfaces. Arseniy shares his vision of a future where the world becomes the interface, with smart objects and environments adapting to individual users. He outlines his startup’s approach, starting with specific use cases and then building a foundational system for broader applications.
Heads up for a massive opportunity: as Arseniy heads to San Francisco this week for meetings and networking, do note that he is looking for a technical co-founder – details in the show notes below!
https://www.linkedin.com/in/arseniysergeyev
We’re also publishing this episode on YouTube, if you’d like to watch along in full living colour: https://youtu.be/VDGRFUI_Khc
Chapters
03:07 Current Limitations and the Potential of AI for Adaptive User Interfaces
Arseniy discusses the current limitations of generalized interfaces, which are fixed and do not adapt to individual user journeys and contexts. He highlights the potential of AI, specifically large language models, to understand user context and generate tailored interfaces based on free-text prompts and user data.
06:02 A Vision for the Future
Arseniy shares his vision of a future where the world becomes the interface, with smart objects and environments adapting to individual users. He envisions a system where interfaces are generated based on user requests, and physical objects and digital interfaces seamlessly blend, providing personalized experiences.
25:11 Startup Approach and Use Cases
Arseniy outlines his startup’s approach, starting with specific use cases like food and then building a foundational system for broader applications. He discusses the importance of understanding the user’s context, preferences, and available options to provide optimal solutions. Arseniy mentions potential use cases like choosing food, conference matching, and wine selection.
38:17 Data Privacy and Decentralization
Arseniy emphasizes the importance of decentralized data processing and user privacy, aiming to create a user-centric system where individuals control their data. He discusses the potential risks of centralized data collection and the need for a more secure and privacy-focused approach.
57:50 Partnerships and Timeline
Arseniy discusses potential partnerships with technology providers and the need for guidance and mentorship from experts in personal AI systems and human-computer interaction. He estimates a timeline of 5-10 years for significant progress in realizing the vision of adaptive user interfaces and personalized experiences.
Transcript and Links
AB
Well, g’day and welcome to SPAITIAL. This is Episode 23 – coming to you from deepest, darkest – emphasis on darkest – in my part of the world. It’s a bit late at night. Why is it late at night, I hear you ask?
That’s because I’m talking to Arseniy Sergeyev, who is in Riga in Latvia. So in your part of the world it’s just turned over midday. Now I’m basing this timing on the fact that we’ve coordinated this and the Tour de France starts in a little while.
So I’ve sort of got my time zones really good, but Arseniy, welcome. Great to chat to you. Oh, look, I’ve got to say, and I’d better spill some beans. This is our second chat.
We were chatting actually at the start of the year, even before we were chatting about doing a serious chat, we were literally having a good open conversation about what is Spatial AI, where’s it, “what are you doing, what are YOU doing? Oh my God, that’s so cool.” The offer was made, I think, way back then.
And it’s great to be able to have time to circle back and actually find out what you’re actually up to. What you are up to is an acronym that is new to me. The words themselves aren’t new, but the way it’s put together is. The acronym is AUI, Adaptive User Interfaces. And I’m mad keen to find out as much as I possibly can about it. What’s the one-liner? What’s the Twitter post? What’s the hashtag? How would you explain it to a grandmother?
Arseniy
Well, if we’re talking just about adaptive user interfaces, AUI, it’s basically interfaces that adapt to what you want to do, what you need to do. Right? So they understand your context, and they give you more relevant things to do, rather than just a general, universal interface.
AB
That was the idea. So it’s somewhat the concept of the computer as a general purpose device, where the screen changes for what we need it to do – and ironically, inputs and outputs were left with mouse and keyboard, mouse and trackpad, as probably the only physical remnants of ‘here, have a keyboard; here, have a way to move a two-dimensional thing’.
But in that same way – to abstract, and keep on abstracting, even beyond what we’re playing with today – is it a function of ‘I haven’t got enough screen space, so I had to make it adaptive’? Or why is AUI, why is this way of being adaptive and context aware –
why is this the future? What’s the real gem behind doing this?
Arseniy
Well, the issue with the current interfaces – and I’m talking about interfaces in quite general terms, but of course we are most used to smartphones, we are most used to computers. Any screen that we have is our most common interface, but of course there are more physical interfaces: knobs, switches that you have on the surfaces of objects.
You have objects, you know – you’re putting a teapot on there, it’s also an interface. But in this case, in this scenario, we’re talking about adaptive user interfaces in the context of present times, so it’s mostly digital interfaces, right – the screens, websites, apps that we have.
And so the issue with the current interfaces is that they are quite generalized, so they are quite standard and quite fixed in what you can do with them as a user. And the issue is that different users obviously have different user journeys towards what they want to do, and how they want to do it, right?
Some people want to learn more information along the way, some people want to just instantly do something and not worry about it, but the current state of interfaces doesn’t allow that. And the idea is to enable the interfaces to adapt more to your needs and to your individual context, so that you don’t have the hassle of going through someone else’s user journey when your user journey is different.
With AI, of course, this is finally more possible, and specifically with LLMs as well, because we can get a better understanding of the context of the user just from free-text prompts, right?
And that allows us to get even more information – beyond just static metrics – and adapt based on that.
AB 04:56
Wow. So being able to harness the interactions that are going on between the user and any system as part of that natural interaction, and also being able to bend some of the, you know, de-emphasize, focus, highlight, deprecate – but also morph and change to be context switching.
That’s pretty phenomenal. Is this the holy grail? Is this a pipe dream? Is this being done in small elements right now? Or is this a new thrust of a design mode that everyone’s hoping will one day come out, but, you know, we’re in alpha mode right now?
Where is it on its own journey of context switching?
Arseniy
Well, I feel like that’s the logical next step for the computing that we have. Because there’s quite a big trend towards the computer disappearing – the computer being all around us, the world being our interface. That’s actually also the future as I see it.
And the current state of things is that it’s just very early, in the beginning. One example is Brain AI – they’re doing really magical work, it’s really beautiful, in how you go from a prompt of what you want to do towards the options you’re presented with to fulfil a task.
So if you want to eat, say you want something for dinner, they have this demo, and it just generates an interface for you – it’s a generative interface with them. And so it generates you an interface with all the products that you need, with a recipe, and you can buy through that.
So it’s not a static interface anymore. It’s already a generative interface which generates based upon your request. And that’s already happening. They recently launched together with T-Mobile, I think in Germany – or maybe I’m mistaken and it’s a global company, but I think it was in Europe.
But the idea is that they made a phone, specifically a phone without apps. It’s an interface that adapts to your request. And so there are some projects that are already working towards that. And there are a couple more that basically prepare the grounds for the data that is necessary to understand the context of the user and then adapt interfaces to them.
So now that the grounds are being set, I would say, with AI – with all the generative AI specifically – it now allows us to really get into the context, and not only get into the context, but also do something with adapting to it.
But otherwise, Facebook, Google, Amazon – they’ve been talking about ambient computing, ubiquitous computing for quite a long time. And Google’s Advanced Technology and Projects group already started doing some work years ago. They’ve been experimenting with sensors, with radars, with how you can mix interaction from the physical world with digital interfaces.
And they are adapted to your context, right? So they’re not just fixed.
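To make the generative-interface idea above concrete, here is a minimal Python sketch – not Brain AI’s actual API or the T-Mobile product, and every function and field name is an illustrative assumption – of how a free-text request plus known user context might be turned into an interface specification by an LLM, with the client rendering whatever spec comes back instead of launching a fixed app screen.

```python
import json

def build_prompt(request: str, context: dict) -> str:
    """Combine the user's free-text request with whatever context is known about them."""
    return (
        "You are a UI generator. Return JSON with a 'components' list "
        "(e.g. recipe_card, shopping_list, timer, checkout_button) that best "
        f"serves this request.\nRequest: {request}\nContext: {json.dumps(context)}"
    )

def generate_interface(request: str, context: dict, llm) -> dict:
    """Ask any text-completion callable for an interface spec; fall back to a generic layout."""
    raw = llm(build_prompt(request, context))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"components": [{"type": "search_box", "hint": request}]}

# Example (with any LLM wrapper of your choosing):
# spec = generate_interface("something for dinner",
#                           {"diet": "vegetarian", "time_budget_min": 30}, my_llm)
# The renderer then lays out the returned components instead of opening a fixed app.
```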
AB 08:08
I’ll come back to the hardware sensor stuff – I know that’s something I’ll maybe throw dozens of questions at you about – but just rolling it back a fraction: one way of looking at it, potentially, is it the purposeful opposite of muscle memory?
I’m thinking of – we’ve gone through the era of icons everywhere, dropdown menus everywhere. I’m now thinking of the Photoshop-esque suite of palettes everywhere. And if it could please stop moving my palettes around, that’d be great.
But muscle memory is probably the thing that saves you. You know that’s there – if your fingers don’t have the muscle memory for the keyboard shortcut, at the very least your eyes can latch onto the icon, or you know that it’s the third thing down in the fifth menu, or it’s tucked away under here.
All that’s great. And it’s great to be fast and proficient, but muscle memory is hard to get. And when they change things based on you every alternate season just because, yeah, that’s quite tiring.
Is this trying to be, ‘don’t try and learn something, I’m trying to be one step ahead of you’? So in the example of baking lasagna, not having a predefined ‘oh, they’re doing a recipe, quick, bring up the recipe interface’, but knowing they’re doing something which will require a list over here, a timer here and, you know, pointers on the screen to say, put that with that.
Is it that level of low-level fundamental building blocks that rise up to be smart, or is it actually, today, that mid level of: we have a recipe tool, we have a scroll-list tool, we have a…, and I’ll just mash together the mid-level components until the user is satisfied?
Where does it sit between the two – lots of little good bits, or just on the fly, it comes out of nowhere and, you know, lets you just do your job?
Arseniy 10:03
Right. Well, there’s a fine balance to strike, so as not to make people learn something new just because we invented something new and now you need to learn how to interact with it. That’s how many devices and interfaces are being created.
And it’s suboptimal. The idea is, first of all, to reduce the number of actions that you need to do to get to your desired result. Because all those are micro-frictions, right? So you know where the icon is, you know where the buttons are, but you do need to do those micro-steps, right?
Press this button to proceed to the next step, right? What do you want to eat? All right, what do I want to eat? I select the meal. All right, what do I need to buy? The next steps in the journey: you know, choose the groceries, add them to cart, pay.
So those are all micro-steps that cause micro-frictions. And yeah, removing those – this muscle memory, not having to do it at all – is already a good thing, in the sense that, you know, we’re not saving that much just by reducing those steps alone.
But in a combination of different use cases in day-to-day life, we are already much less stuck in the devices that we’re on. We can raise our head, raise our eyes from the device, already a bit more.
That’s one thing. But on the other hand –
AB 11:08
Wiggle your mouse to find your cursor. Where did I leave it? All those milliseconds not only add up, but they make you weary. Yeah.
Arseniy
Exactly. That’s why voice assistants are sometimes very useful – you don’t have to take the device into your hands, you just say it, if you have it already formulated in your mind. But that’s a different topic.
But then, when we do keep some actions – where you do need to press some buttons or show some gestures to make a choice, make a decision – or when we add some new ones, we do want to rely on the existing muscle memory. And that’s important, so that we don’t, by seemingly simplifying the mental effort, create the additional mental effort of learning something new.
AB
Gotcha, so fallback to conventions where that actually is the fastest way to get a part of the job done. But sometimes the whole might be radically different in different circumstances. Even if you do the same thing three times, you might get a different interface every single time.
Arseniy
And the question here is, how far back do we go – what is the level of actions and gestures that we’re used to that we go back to? Maybe we don’t need to remember how to click on the icons of the apps anymore, figuratively speaking.
I’m imagining that we do need to go back to the most natural ways of how we interact with things. You know, we’re waving or we’re sliding – basically, all right, those are already kind of screen-native interactions.
But it’s just so they’re more natural. Especially as, well, we’re gaining a dimension.
AB 12:58
Because with 2D screens and flat surfaces, the mouse and trackpad are an elegant solution: a 2D input for a 2D device that you’re trying to control. Perfect in many, many ways. Hence why it’s survived 40-something years.
Thank you Xerox (and I won’t get into the Apple debate), but thank you Xerox – Xerox PARC. But once you remove yourself from having to be a two-dimensional flat thing – however you rotate and pinch-to-zoom and twist your fingers on a flat screen, there are good conventions.
But what if you could be released from that? How do you see behind something? How do you grab? So is that part of the mix of already thinking outside the literal rectangles that we’re bound by?
Arseniy 13:49
I would say so, because we can still use those same interaction modalities that we are using on the 2D screens and interfaces – the same pinch and zoom and sliding and scrolling. We can use them more in the 3D world. But in spatial computing, as I’m seeing it, it doesn’t make sense to take the same screens that we have on 2D surfaces and put them up in space, as Apple did right now for many apps.
It just makes much more sense for more natural and seamless interaction to have 3D modalities for interaction. You have a table here, you touch the table, you don’t touch a window that says, you know, a table.
For example, you interact with the 3D objects; you don’t keep interacting with 2D screens in the 3D environment, because it doesn’t make sense.
AB
Yeah, correct. And it makes sense as a transition from 2D to 3D to have lots of floating screens around – but it makes your shoulders tired, pinching to zoom and doing everything up here. I had some excellent play with the HoloLens when they came out, version one and version two.
And they were good – if you’re on the YouTube version you can see what I mean – they were natural in front of you, but you couldn’t go too far left or right. You basically had this. It was fun to watch different people sort of try and control things up here and stay within the field of view of the sensors in a normal resting pose.
It’s lovely to see the Apple Vision Pro have a bit more range. You can be on the couch and just go sort of click, click, and it should be able to pick those sorts of things up. But to not have your shoulders raised to manage multiple flat screens in 3D space around you, versus ‘I just want to have this screen where I would normally have it’.
I’m drawn to the – I don’t have it here in front of me, it’s probably a little bit dead – but I am the owner of a beautiful big Wacom tablet. One of the big ones. It’s awesome. It takes up as much space as this desk does, but it’s glorious, because with one big move you could go from right in front of you to the whole screen just becoming your desk, and you could be drawing and shading and painting and doing awesome stuff.
But there weren’t many times I went from upright to desk mode. It basically stayed in the one position for most of any given month till it was unwieldy. But that different level of thinking is hard to get into.
And once you’re in it, hard to break out of.
Arseniy
Exactly, exactly. And then come the new best practices for good user experience. You know, you’re thinking about already more physical aspects of the interaction, which is a super important thing of interface design.
AB
How did you get into this? How did you start this journey? Was it the classical – was it from design first, or human need first, or from ‘I’m a coder and this is broken, I need to fix it’? What sort of avenue did you come into this realm from?
Arseniy 16:52
The quick answer is a combination. One thing is that throughout my life I always wanted to create something of my own, something to improve the world. Every Apple event was like a special happening to me, watching all those keynotes – ‘one more thing’.
AB
Yeah, so we’re both wearing the black turtlenecks. Well, not quite, but still – yep, definitely a design inspiration for multitudes. Yeah. But how did you get into this specific field then, or how did you get to see that this is the thing that ‘I desperately want to do’?
Arseniy
Yeah, and so basically, I’ve been inspired and, you know, I want to create something as beautiful and as functional as this, and present it to the world in a very powerful way, so it really changes how we do things.
So we do things better. Then, at university, I was in a retail specialization where we got into the area of the psychology of people, and it was super interesting. It was about, you know, Kahneman, you know, the classics – nudging, different biases, different heuristics, behavioral things that can subconsciously nudge people towards one choice or another.
And it was super exciting, because in the context of retail, you could make life easier for the users by showing them the choice faster, and better for the businesses, because they just have a better chance of conversion.
Yeah, right. And that’s – all right, I want to do something with that, something with understanding the user better and improving the experience for them. Then I also took some additional courses, and then I, you know, worked in e-commerce with user experience optimization.
So all that kind of came together, building up some skill in that direction. But on a personal level, I think that was the biggest part – that’s one part of this inspiration. But then another part is just being crazily frustrated by how imperfect our interactions with the world are right now.
We still have to think about how I can do this – like, how to do things. And even when we’ve thought about how to do it best, considered different options, taken everything into account that has to be taken into account to do it optimally –
we still need to do micro-actions, either in an app or comparing options across apps. Super frustrating. It really does take focus away from the real important things. And that’s when it hit me that it just must not be that way.
And the world should be – it’s already going that way, with all the big tech and smaller companies as well, startups. But it needs to get there faster. So that’s how I got into it. And the ideas on how to do that have been in my mind, you know, for, I would say, two years, a year and a half.
And yeah, in the end, I understood the right way to approach it. It had been too abstract, too ambitious, too big. But then we started with something smaller, of course, working towards that big ambition. That’s the story of how.
AB 19:55
Well, you currently have on your LinkedIn bio, in that classic mode, a startup where you are the co-founder. The company name is – well, you haven’t named it yet, and you’re welcome to reveal it, or if you want to keep it in stealth mode, that’s perfectly fine too.
So in classic startup stealth mode, are you building, are you prototyping? Can you envisage something that is really there? To have this idea in your head for two years sounds like – I don’t know… There’s a thing called the ‘idea maze’: a creator or a founder comes up with lots of ideas, and ideas are free, they’re cheap. You shouldn’t get people to sign NDAs, because an idea is worthless unless you go through this process of testing it against yourself for as long as you can.
And the good ideas fall out of that maze. They literally last the test of time and you’re still thinking about them, tuning them over, or they still are that bright light that you run back to even after multiple years.
So this is obviously something that in your brain you’ve probably got tired of, dreamt of, put down for a little while, come back to it. And this is now your sole focus. What is the, what’s this journey?
What’s the startup stealth to non-stealth journey that you can foresee and you’re willing to tell?
Arseniy 21:29
Right. Well, it’s indeed in stealth mode. I would say it’s like semi-stealth, because we do participate in some programs, and we do mention what we do and the name of the company as well. But it’s mostly stealth – so as not to be out too early, before we have something solid.
But basically, the idea is to build out the system that knows the user, and then provides output for adapting the interface to them. And so at least sharing the area already gets me connected with the right people with the right competence, who are also passionate about the field of, you know, making interfaces more personalized, more optimal in terms of interaction.
Okay, and so the path of that is – I’ve been on it now for, all right, a year and a half. It started in my mind, quite abstract, and I started putting it on paper, conceptualizing what it entails, and how we get to implementing it.
And because I’ve been involved in commercializing science – I’ve been involved in a platform that helps bring science from labs to market, mostly by connecting it with entrepreneurs to create deep-tech startups –
I’ve been exposed to the process of how you identify the science, its level, the core competence of the research team, and how you bring that to the market. What am I missing? Yeah. And from there, a big, good tool is being able to decompose the technology into its core components: which is your proprietary part, which is where you could cooperate with something that’s open and available, and which is the secret sauce where the real gem of the idea sits. And then you can understand where you protect your intellectual property, where you focus your efforts, what you license or sell, et cetera.
Arseniy 23:21
And with that approach, I decomposed the idea, the concept, into the core blocks of the technologies. We have 5G there, we have haptics there, we have, you know, some meta-materials, smart materials that make objects smart.
I’m talking now about the ultimate vision of making the world our interface, right? Yeah, but I had to start with something, because it was just blowing my mind – it was too big. So that’s why I started decomposing.
AB
“Boil the ocean”. How do you tackle something that you can demonstrate, that gets the idea through, but doesn’t involve buying everything all at once and solving everyone else’s problems before you can solve the core problem?
Arseniy 24:02
Exactly. And that helped to narrow it down a bit and to understand what to start with. And what I came to is that spatial computing gives a huge opportunity – let’s say, I think it’s 700 billion dollars in 10 years or something – but basically spatial computing is about interacting with the world as an interface, as I’m seeing it, in different versions: either it’s AR, VR, or it’s, you know, smart materials, et cetera, smart objects, IoT. But the idea is that it must learn the user first. And that’s where we are right now. When I decomposed it, I understood that the core thing to start with is the system that understands the user: understands their preferences, their past choices, their current context on the individual level, and understands the options that are available to them.
And it processes that, it knows what information it needs for any given task in the user’s day-to-day, and it provides the best options based on that. That’s basically what we’re starting with. Right now we’re testing the core algorithm, building it up.
But the idea is that we need to find the core use cases. We have tested a couple of use cases. So we’re starting with day-to-day things like food – choosing food, and, you know, what do you need to prepare it?
Arseniy 25:25
The whole user journey. It knows what’s in your fridge in real time, knows what you like for sure, and knows what’s available around you. And it puts those blocks together to get you the optimal version, the optimal options.
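As a rough illustration of that ‘puts those blocks together’ step, here is a minimal sketch, assuming hypothetical data sources and a deliberately simple scoring rule (not the startup’s actual algorithm): combine learned preferences, what is in the fridge, and the available options, then rank them.

```python
from dataclasses import dataclass

@dataclass
class MealOption:
    name: str
    ingredients: set     # ingredient names
    source: str          # "cook at home" or a nearby vendor

def rank_meals(options, fridge: set, preferences: dict) -> list:
    """Prefer meals the user likes that need few ingredients missing from the fridge."""
    def score(meal: MealOption) -> float:
        liked = preferences.get(meal.name, 0.0)      # learned preference weight
        missing = len(meal.ingredients - fridge)     # items not currently on hand
        return liked - 0.5 * missing
    return sorted(options, key=score, reverse=True)

# Example:
# rank_meals([MealOption("lasagna", {"pasta", "cheese", "tomato"}, "cook at home")],
#            fridge={"cheese", "tomato"}, preferences={"lasagna": 2.0})
```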
And that’s just one use case that we’re starting with. We also did some conference matching – matching people properly based on their profiles. Wine choosing as well, as a wine lover – I’m not a wine professional, but I’m a wine lover for sure.
But it just overwhelms you with all the choice when you want to experiment – and you don’t want to be drinking all the time. So, a couple of tries. But right now, the next step towards going, as we said, out of stealth, or into some real progress, is really building a strong team to bring this forward.
And also why it was great to use this opportunity to talk to you because you do have exposure to really bright minds of people who are interested in human -computer interaction in, you know, in different forms and ways.
Right now we’re looking for a core team member – a co-founder, basically – who would help us architect the solution: basically a solution architect, probably with a background in machine learning, AI, and then probably back-end, full-stack.
AB
So basically, someone to code this with, to build this with – but also a user experience designer, to make this user-aware of where it’s going to lead to, and not just build an API and say ‘done my job’. Yeah, gotcha. So they also need to have the context of what’s coming down the line, and a hundred percent buy into the fact that this is going to get gnarlier quickly. So – exactly, exactly.
Arseniy 27:01
Because we do have quite an early, small team right now. We just need to move faster, and in a more profound way, to build the core of the system right now. And also, of course, guidance – people who are a bit further down the road in personal AI systems and adaptive user interfaces, or in the human-computer interaction field. Right now it’s mostly about software, but later on hardware as well. So basically we need guidance and mentorship from those people.
Those are basically the two main components for it to move forward really, really well, as I’m seeing it.
AB
I’m glad to hear – oh, well, it’s good to hear that you’ve ruled out half the known world’s problems by not trying to solve hardware first; you’re just sticking with software. It’s cheaper, it’s easier. It requires ones and zeros and a battery, or 240 volts or 220 volts or 120 volts – however many volts you want. Five volts, this one.
In this early stage, are you also going to try and limit the use cases down to, let’s say, adaptive user interfaces but just for a smaller set of problems? Or are you going to try and do a foundational piece that tries to be that personal context tool, with limited outputs for the first few phases of how it can help? Where’s the focus on the software – going strong in a vertical, in marketing terms, or going horizontal on the foundational piece, so you’ve got your own secret sauce in how you’ve filled up the knowledge of a person, irrespective of wine, coffee, lasagna choices?
Arseniy 28:44
Yeah. Well, we do start with something, with a certain use case. But in parallel, we do want to do the horizontal part. Because the horizontal part will be something that will be really impactful and enable much more seamless interaction with the world.
And that’s the vision of what to build. We’re starting with the food use case – basically taking the whole user journey, making it all super seamless and making it ultra-personalized and optimal for the user. But then, in parallel, we want to build the core system that will be adaptable to different use cases.
Right. So this one will basically help us learn, you know, what the key things are. And then, based on that, we’ll be able to develop something, you know, already more foundational. And there’s also the understanding of, you know, what type of strengths and what type of competence the team would need, because a lot of research should be done there, right, for developing something foundational.
So that’s why we’re starting with something more specific – to be more informed and more structured when we’re doing something more foundational.
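A minimal sketch of that ‘vertical first, horizontal core’ split – all names here are illustrative assumptions rather than the startup’s design: a use-case-agnostic core holds the shared user model, and each vertical (food today, wine or conference matching later) plugs in as an adapter.

```python
from typing import Protocol

class UseCaseAdapter(Protocol):
    """Anything a vertical (food, wine, conference matching) must provide to the core."""
    def gather_options(self, context: dict) -> list: ...

class PersonalCore:
    """Use-case-agnostic core: holds the user model and ranks whatever a vertical supplies."""
    def __init__(self, user_model: dict):
        self.user_model = user_model   # learned preferences, shared across verticals
        self.adapters = {}

    def register(self, name: str, adapter: UseCaseAdapter) -> None:
        self.adapters[name] = adapter

    def recommend(self, use_case: str, context: dict) -> list:
        options = self.adapters[use_case].gather_options(context)
        # Every vertical benefits from the same shared learning about the user.
        return sorted(options,
                      key=lambda o: self.user_model.get(o.get("name", ""), 0.0),
                      reverse=True)
```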
AB 29:53
That will let you, for instance, in food, learn food preferences, likes, dislikes, what you have on hand, what you don’t have on hand. If you’re in a certain part of the world, you will never have X, but you may have Y.
But that may give you deep insight into whether you need the personal context engine, the personal AI, to be, you know, this size, mega size, or humongous size. It might give you the first glimpses into whether you can get away with a small little nugget or, you know, gem of an idea, or whether you need to go big quickly. And perhaps it’s the same as the large language model lessons learned in the last few years: yes, language models going from small to medium size is good, but at some point there is a break point where they just get good – they do things that you can’t quite think of beforehand. So you might be looking for where your break points are, where it’s under-baked (to put the food metaphor back on the table), and where it’s, yeah, spot on, so it can actually start to give you the guidance to transition to more and more personal knowledge.
Arseniy 31:01
The ultimate vision is to cover all the day-to-day things, and why food also is because it allows us to quickly test with people – people eat every day!
AB
It’s a rather common theme amongst, yes, people in this world : ) That makes sense. And yeah, everyone’s got a bit of a preference. It’s nice to have that level of ‘what have I got on hand? What’s ahead? What did I have yesterday? Can I have, you know, cheese and biscuits for the ninth day in a row? Yeah, yeah – no, it’s time to mix it up.’ This seems like the dream interface of really bad early apps where you’d take a photo of your fridge and it would come up with: given these ingredients, you can make X and Y and Z.
It seems like the ultimate of that kind of tool.
Arseniy 31:50
I would think so, yeah. Because it is already possible – you can take a photo, I think even with ChatGPT, and it throws some recipes at you. There’s the Samsung smart fridge that does the analysis of the fridge contents with cameras and reminds you to buy something.
It’s kind of already heading in that direction. But the Samsung fridge can analyze up to 30 or so products at the same time, and they also, ideally, need to be laid out very accurately for the system to recognize them.
So it’s the precision. And I think, going back to our first conversation – it started, actually, it was connected with – I reached out to you, I think, after your podcast about local AI. So there was the statement that there are those large models, but what we’re really lacking are more specialized models, based upon them, that really know much better how to analyze things.
In this case, it would really help if we had a model that can much better distinguish between the products, even if they’re piled up, not just artificially laid out. But in this sense, there’s a lack of connection between all of that, and there’s a lack of universality in terms of your scenarios.
So you might be sitting at home, you want to cook something or you want to order something – it’s not all connected. You have Uber Eats separately. You have your smart fridge or whatever separately. But the idea is to have this user journey of eating optimized for that user in a centralized manner.
So it’s probably a single interface, or if the person uses some service, it adapts based on that user profile, already ticking the boxes for you as well. So that’s the ultimate, indeed.
AB 33:44
Now – you sent me some links, which I’ve done my homework on. So thank you for that. First time to give me some homework. Cheers! I saw some excellent TED talks and some lovely other great concepts.
One here is called pAI-OS, Personal Artificial Intelligence Operating System. Good to hear an Aussie voice on the end of it. It’s definitely thin and starting, but obviously I could see why you sent me the links to these sorts of things.
I was having this chat with some of my teams today and we’ve gone from being a data scientist where you’d soup up your PC with the GPU and water cool it – and maybe overnight you might be able to bake a model and hopefully not, you know, use all the power in your street.
Then you went to the cloud and grabbed an A100 for 30 minutes and, you know, paid your $7 to one of the cloud providers and did the same thing, just more quickly. But now we’re almost leaving that up to the large language models – the big foundation models in every data domain are the place you go for image, for language, for multimodal, for coding.
And those large language models have no idea about you or me. I could ask it to draw two people having a podcast conversation, facing their cameras, you know, with a turtle tank in the background and yourself with black and white squares in the background.
I couldn’t ask it to draw ‘AB and Arseniy’. It would have no idea what that concept is. So being a general model means it just, you know, knows general things and can make ‘general stuff’ up.
It must keep the people who are trying to recreate the Will Smith spaghetti videos busy these days, because you can’t just type in ‘Will Smith eating spaghetti’. I believe you have to basically prompt him into existence?!
So in that realm – we’ve gone from baking our own models to do our own personalized things: “I don’t just have pictures of cats and dogs, I have pictures of my cats and my dogs.”
We’ve now foregone that for the power of these large foundational models, with the asterisk that we’ve lost that personalized sense of “yes, but it doesn’t know where I am”.
It doesn’t know what my local street offers. It doesn’t know where my closest X or Y or Z is. It has no idea even what the town is that I would name. It just goes, okay. I know the name of that town.
I can tell you facts, but it can’t tell you anything about it specifically. It’s just regurgitating general knowledge here.
Arseniy 36:33
So yeah – or you just prompt it all the time with all the data. You say, what do I do in this town? I like this, I like this, I like this – but you have to write it all down.
So you just might as well give up, because it’s too long. You just figure out yourself what to do there.
AB
Gotcha. This is the untold story of what happens behind the scenes in a lot of demos. So to try and replay it back to you: AUI, adaptive user interfaces, is not just the interface having smaller blocks – perhaps conventions and components where needed, where that is an elegant solution – but, on the fly, being able to use gen AI to say, ‘I need something that does vaguely this, draw me one’.
And if the user swipes left or swipes right – ‘no, that doesn’t solve the problem’ – you could iterate it again. But that’s the visual, not tactile – that’s the bit the user plays with. The important mirror to all of that is having the context under the hood: being able to know what they want, try and guess what they might need, infer their location, infer their tasking, infer routines, infer the person. That’s drawing good amounts of data about a user. Is the challenge then to try and come up with a standard or a syntax, or is that someone else’s problem? Is that personal AI context your core problem to solve, or can you outsource that to someone else?
Arseniy 38:03
Well, here, the approach that we want to take is not to store the data ourselves, not to be able to look into it for each of the people. Because with the data, the key question for everyone is, you know, who gets to see my data, et cetera.
And the key here is to keep it – if we’re talking popular keywords, buzzwords – in a decentralized manner. I was at an interesting meetup in London recently on the decentralized AI topic.
It was blending AI and blockchain a bit, but it was an interesting perspective on having AI on devices. So the idea there is to have at least the bigger part of the processing and storing of the data on the edge – either on the user’s device or on the device that is, you know, processing whatever they’re doing.
And it doesn’t go to the cloud – or if it goes to the cloud, it goes with, let’s say, the Apple-style approach, where it’s, you know, ‘private cloud’, as they call it. So the idea is that there is no one central entity that gets access to this data.
Because imagine if you have all the behavioral, preferences and past-choices data for all the day-to-day things for all the people in the world. This is the new empire. With that, I would say, you can conquer the world.
Yeah, you can conquer the world with that. And here the idea is to take a more user-centric approach, so that they control who they give the data to, and they have the control over it.
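A minimal sketch of the edge-first approach described above, under the assumption of a hypothetical local store and field names: raw behavioral events stay on the user’s device, and only a small derived summary would ever be shared.

```python
import json
from pathlib import Path

LOCAL_STORE = Path.home() / ".aui" / "profile.json"   # never leaves the user's device

def record_event(event: dict) -> None:
    """Append a raw behavioral event to local, user-controlled storage only."""
    LOCAL_STORE.parent.mkdir(parents=True, exist_ok=True)
    events = json.loads(LOCAL_STORE.read_text()) if LOCAL_STORE.exists() else []
    events.append(event)
    LOCAL_STORE.write_text(json.dumps(events))

def derive_shareable_context() -> dict:
    """Compute, on device, the minimal summary a remote service would need."""
    events = json.loads(LOCAL_STORE.read_text()) if LOCAL_STORE.exists() else []
    cuisines = [e["cuisine"] for e in events if e.get("kind") == "meal" and "cuisine" in e]
    top = max(set(cuisines), key=cuisines.count) if cuisines else None
    # Only this coarse summary would be sent; the raw event log stays local.
    return {"preferred_cuisine": top, "event_count": len(events)}
```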
AB
Nice. Now, as you were saying that – before, you said AI with Apple Intelligence, Apple’s new rebranding of AI. Clever, also painful. But then again, they did have the ‘i’ for 20 years, so we’re all used to the ‘i’ – everything with an ‘i’ has rolled out so many times.
This does sound like a great fit. I’m thinking, when you said that – privacy and security in that regard – my first thought was the Secure Enclave on your devices. That is the classic Apple portion of your phone, your personal computer, watches – has it reached earbuds yet? Actually, no, but I’ve seen some patents. It’s early and may never happen, but I’ve seen some patents where a future version of AirPods – your buds – may one day be sensing your blood, your temperature, your other stuff going into your health data.
Arseniy
Maybe it’s coming. I think I saw something. Maybe
AB 40:45
But you’re absolutely right, this does lend itself to these conversations with those sorts of San Francisco, Silicon Valley-based companies who have privacy basically everywhere. And to have that distributed intelligence – it’s still pretty radical how many AI processing chips are on our phones, you know, in the ratio of CPUs, GPUs, and TPUs or AI processing units.
I think most things are AI now; that is where the lion’s share of the processing power goes. When you take a photo, you’re not actually taking a photo – you’re keeping a shutter open for a very long time.
And it can tell what you’re taking a photo of and adjust things so that, you know, the sky is blue, people vaguely have rosy cheeks and they don’t look like they’re bright green. You know, there’s a lot of work going on when you think you’re taking a photo; there’s no single frame of film being exposed.
That level of local processing first; where that is not possible, go up to a cloud, a private cloud, and it’ll anonymize the request, so it has less of a – you know, it can be farmed less. This does suggest to me that there is a certain company you need to be talking to at some stage – feel free to deny that they’re on your hit list.
Arseniy 42:04
Yeah, absolutely. There will be a lot of integrations, first of all, to be able to provide those options to the user and do them smoothly. Right. So there will be a lot of technologies that we will need to partner with or integrate with to make it possible.
So it’s not just service providers, but also the providers of the technology who do that – for example, the security bit, right? Yeah, we’ll work with the best in class to provide the security and the optimization there.
And in a similar fashion with other parts of the system as well. But I’m just seeing that the distribution is super important. And Apple is doing quite well there, you know – they’re the biggest, or one of the biggest, in terms of devices and user base.
So if we can get onto the devices, of course, that will be a win. But, you know, everyone wants to get that access to the user base.
AB 43:00
…the hardware, the software – and they’ve got a real big focus on the experience and making it seamless. So it definitely leads in that direction quite well. But irrespective of that, being able to tap into any phone – I dare say a phone or a flat screen is going to be the interface for a while – looking through a phone lens in AR style, you know, panning it around the kitchen and having overlaid on that screen what you wish to inform the user they can do with things.
That’s the interim or long-term technology until we all get smart glasses again and…
Arseniy
I’d say even brain implants – but that would be further away, though not that far. It’s just a super direct way to interact with things; you almost don’t need an intermediary thing. You don’t need it.
But it’s a bit further out. It’s a bit of a different topic. But I would say even with AR glasses, VR glasses, there are a lot of things that can be much more predictive towards the user, much more proactive towards the user, if the system knows them.
And also all such devices as the Rabbit R1 – imagine if it already knew your context. It could at this point already connect to at least some services, which it couldn’t, unfortunately.
But I am hoping for a very cool over-the-air software update in several months that will make it magic. Right now, it’s a bit not that useful. But imagine if it really held information about your profile, or could connect with it – that would be a whole different story.
You wouldn’t have to describe all the context for what you wanted it to do. You would just say, ‘order me food’. It already knows what you want. Because surely you want it.
AB
I’m thinking your context doesn’t switch every alternate 10 seconds? You’d probably find cases where it does, but in general: for the next hour, I’m hungry, I’m in food mode. After that, I’ve got to do X or Y.
You don’t need to poll something every millisecond – what are you doing? what are you doing? what are you doing? You could probably say: I last checked context five minutes ago, I don’t think that’s changed.
We’re still in exercise mode or commuting mode or focused-on-work mode.
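A minimal sketch of that cadence, with illustrative names and a five-minute default: context inference runs only when the cached value is stale or an explicit mode-change signal forces a refresh.

```python
import time

class ContextCache:
    """Cache the inferred user context and only re-run inference when it goes stale."""
    def __init__(self, infer_fn, ttl_seconds: float = 300.0):
        self.infer_fn = infer_fn      # the expensive part, e.g. an on-device model
        self.ttl = ttl_seconds
        self._value = None
        self._stamp = 0.0

    def get(self, force: bool = False) -> dict:
        if force or self._value is None or time.time() - self._stamp > self.ttl:
            self._value = self.infer_fn()
            self._stamp = time.time()
        return self._value

# cache = ContextCache(lambda: {"mode": "commuting"}, ttl_seconds=300)
# cache.get()            # runs inference once
# cache.get()            # reuses the result for the next five minutes
# cache.get(force=True)  # an obvious mode change (workout started) can force a refresh
```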
Arseniy 45:17
And then you have all the health data that’s coming in. It’s like all the different types of data – more and more, they’re coming into the system, they’re being processed. And then you create almost, you know, this ‘digital twin’ term – basically it will be a digital twin of you that will be adapting the world to you. That’s the whole computer, in a way.
And you have the objects, with all the information and actions that they have – for example, the table may have some information about the groceries in your fridge, or maybe it’s a work desk and it has your email or some notification like that.
Basically, it already adapts. It shows what you need to see as a person. Another person comes in, they see different groceries because they’re vegan. They come to the work desk, they don’t see anything, because they don’t have privacy access, or it’s very secure.
So in that sense, that’s the big vision. That’s where I’m thinking the world is going – it should be going, and it should be going at a faster pace. And that’s what I hope to contribute to.
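As a toy illustration of objects adapting per user – the object names, dietary tags, and access grants are all assumptions for the example – the same table or desk renders differently depending on who is looking and what they are allowed to see.

```python
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    name: str
    dietary: str                                  # e.g. "vegan"
    grants: set = field(default_factory=set)      # e.g. {"work_desk"}

def render_kitchen_table(user: UserProfile, fridge_by_diet: dict) -> list:
    """The same table surfaces only the groceries relevant to this user's diet."""
    return fridge_by_diet.get(user.dietary, fridge_by_diet.get("default", []))

def render_work_desk(user: UserProfile, notifications: list) -> list:
    """The desk shows nothing unless this user has been granted access to it."""
    return notifications if "work_desk" in user.grants else []

# A vegan guest sees the vegan shelf; without a "work_desk" grant, the desk stays blank.
```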
AB 46:16
What’s your prediction? Can I get you to do the Bill Gates “the world only needs five computers” kind of stupid prediction that we will hang you on a few years later? Something like this is in its infancy – what sort of timeline do you think before it really hits its stride? One year, three years, five years, or is this a 10-year kind of journey?
Arseniy
I’m thinking it’s in the range of five to 10 years. Because right now, with all the base systems and solutions being built, the grounds are being set for being able to get the needed data, process it, understand the context better, and then do something with it.
For example, I’m a big follower of the Archetype AI project. So the same guys from Google ATAP – they went away, they built their own stuff. And it is super profound in the way it brings the context from the physical world into the digital one.
Arseniy
So you do have, you process what the sensors see. Basically, time series data, they process it, and understand what happens either from the cameras, from the sensors. And then their system makes sense out of what happens in the real world.
And then in the digital realm, you can do some calculations or some predictions, et cetera. So that really sets the base of where do we get the data? How do we make sense of the world? But then next step is what do we do with it?
And they are already doing some stuff, so it’s really moving fast. So I would say five to 10 years – maybe that’s already too long. So five years or so.
AB 47:54
Yeah, so between five and 10 – but if you were able to be positive, close to five; if there are a lot of unknowns in there, then okay, close to 10. That’s wild. A problem I would like you to solve for me, if you are talking to a certain One Infinite Loop kind of company: the Apple Watch. Here’s a real-life problem that I hope your future adaptive interface will solve.
If you’re going for a walk and you don’t set things, it says, ‘ah, it seems like you’re going for a walk. Would you like to record?’ Yes, yes – and it does that after many hundreds of metres, because it figures out pretty confidently that you’ve been going for a walk for a while.
We live in Australia, where we live on the quarter-acre block, and every weekend in spring and summer you’ve got to mow the lawn, don’t you? So the great Australian dream is mowing that backyard and doing laps up and down, mowing the silly thing.
I’m getting up a sweat, I’m breathing heavily – my watch should know that I’m exercising, but I haven’t left my wifi and I haven’t actually left a radius. So it never, ever, ever suggests, ‘hey, it seems like you’re doing some medium-level exercise, would you like to record this?’
For me, that’s a cracker of a use case for context. If it could know that it’s a weekend – in winter you don’t have to, and in summer you don’t have to, in Australia the grass is dead – but in spring and autumn (fall), goodness me, the sound of lawnmowers on the weekend… a shrimp on the barbie and lawnmowers is pretty much the classic Australian weekend.
That level of context awareness – it shouldn’t be hard to figure out that it’s a Saturday or a Sunday and I haven’t left X or Y, but my heart rate’s up and I’m doing something for an hour, and, and, and – obviously that level of context is easy.
If I can speak that context, you can probably put a rule somewhere and someone can find a way to action that with real code. That’s not AI, that’s fine – real code works beautifully many times. But is that the kind of level of nuance you would love to be able to capture, on a personalized level, for different people in different scenarios – knowing routines, actions, spotting the patterns in daily life?
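Taking AB at his word that ‘real code works beautifully’, here is a minimal rule-of-thumb sketch – the thresholds and season months are illustrative assumptions, not Apple’s detection logic – for flagging ‘probably mowing the lawn’ from signals a watch already has, without requiring GPS displacement.

```python
from datetime import datetime

def looks_like_yard_work(now: datetime,
                         avg_heart_rate: float,
                         minutes_elevated: float,
                         left_home_wifi: bool) -> bool:
    """Weekend + sustained elevated heart rate + still at home => suggest recording a workout."""
    is_weekend = now.weekday() >= 5                        # Saturday = 5, Sunday = 6
    mowing_season = now.month in (9, 10, 11, 3, 4, 5)      # roughly AU spring and autumn
    sustained_effort = avg_heart_rate > 100 and minutes_elevated >= 15
    return is_weekend and mowing_season and sustained_effort and not left_home_wifi

# looks_like_yard_work(datetime(2024, 10, 5, 10, 0), 115, 25, left_home_wifi=False)  # -> True
```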
Arseniy 50:18
Yeah, absolutely. I would say in our case, it will be more about – if you’re going for a walk, it reminds you to get something. It knows that you’re going for a walk, so, for example, it reminds you to grab some water or something.
So basically it provides you with additional things from the real world that you need to remember or do. In the sense of it already knowing what you’re doing, what’s happening – there are already systems like that, for example the same Archetype AI: you have a camera or some sensors that you’re walking around, and it understands what’s happening – you’re walking, probably sweating because it’s hot outside, and you’re doing a lot of walking. So in our case, it’s more about when you have a certain thing you want to do, it already predicts what you need – all the things you need to take into account or do to achieve that in the optimal way.
It’s more about when you do want to have something. But also, yeah, when it realizes that something is happening, it will most probably remind you: hey, your heart rate is high, you’re sweating, grab some water or something.
AB
So less task-focused – like ‘I am going exercising’ or ‘I am going to make a meal’ – and more delighting in the moment of, you know, you’re in the kitchen, let’s get stuff out and let’s get going; or putting together parts of your daily routine that even you had forgotten about, but it’s able to suggest on the fly, build a way to show that to you, and then back away.
Arseniy 52:09
Yeah, that’s exactly where interfaces are going. Right now, in our case specifically – but it’s the same in general – first of all, it’s task-focused. Right now it’s prompt-focused: you have to write everything in the prompt.
All right, so it will simplify how we get to the task results in the optimal manner – that’s our current focus. But then, for all the other interfaces and for us also, the logical evolution will be that, already knowing how you do tasks, when you usually do or don’t do those tasks and how you do them, it proactively suggests: hey, I’m feeling you’re hungry.
Actually, I just finished the book I sent to you on interface design – it just came out, I straight away ordered it – by Guillem Couche. It’s a French name, so I might be wrong.
https://www.amazon.com/Interface-design-Creating-interactions-successful
Yeah, I might be wrong in pronouncing it. But basically, the final chapter is about the future. And that’s exactly the proactivity of the interface – being able to already know. There’s a specific example from the book: you’re basically looking at a banana.
All right, it already orders you the food that you like. So you don’t have to think about it; it already comes to you. And that’s when, you know, the human really blends with the machine. Of course, there are different sides to that.
How do we know that it’s our decision, right? That we processed it, rather than the AI or the algorithm acting on some kind of really subconscious thoughts, without us managing to make that decision before it’s already delivered?
But it’s an interesting thought experiment to think how far we will go. Also, you know, there’s already been writing about the ultimate, ultimate vision where we won’t be communicating with our systems – they’ll be communicating for us on all the decisions.
And then, for example, even the romantic partners will be fine with that. So we can go super crazy far into the future. But yeah, that’s roughly the direction.
AB 54:05
Yeah, you’re going to be doing the duck: paddling furiously underwater but looking serene on top. There’s going to be a hell of a lot of furious paddling and connections to be made at the back end. But the goal is to be as interface-free, context-switching, context-aware as possible.
And, as you said, reading your mind without having to read your mind – that’s an admirable goal.
Arseniy
And you know why? To live life, not technology. Because we’re too stuck in the screens that we have right now, and in devices overall. To do something – when we just want to do something – we should spend a little time thinking about it and doing it, and really enjoy the results of that.
AB
So make it humane. Everyone has something which, even though it hurts, kind of feels good doing it the long slow way because they’re in control. But gee, being able to give up more and more stuff – to be able to just, you know, wave at it and have it done – is, you know, glorious.
And we can do that these days because, you know, CPU power to energy has climbed beyond belief. I was talking about systems today where, you know, if you had a data set not long ago, you would throw a few models at it and you would get good results.
Now you can kick it off before you go to lunch – tell your meta AI (not your Meta AI, but your mega AI) to process it all the ways: literally process it with all the known models from this list. And even though most of those are going to be stupid for what your data set is, for what the model is, there’s no harm.
One day you might be pleasantly surprised: there was one odd idea that, you know, was a delight you hadn’t thought about, because it was outside your knowledge base. And sometimes it just, you know, justifies that what you thought was great. But there’s far less penalty now for doing things aspirationally beforehand, having agents running in the background.
Yeah, our cents per watt per unit of compute per cycle has just, you know, improved so astronomically that the penalty is reduced. You’re now in control of the next level of interface, and that’s what I am looking forward to.
Lastly, can we find out the name of your startup? Can we reveal or links in show notes? I guess the question zero in this section is, how do we get in contact with you if not LinkedIn? What’s the way to reach out?
Arseniy
Yeah, I think I’ll leave my email for you, so we, you know, just keep it as a contact email. Yeah, I think that’s the best way – just write to me. I might also leave the number, but I don’t think it’s necessary, especially given that I’m in Europe right now – so a bit too expensive to call.
AB
All good. Makes perfect sense. Can we do a follow-up? Are you doing an excursion to Silicon Valley – an extended stay, couch-surfing like the best of them – to see how long you can last, to find funding and find partners?
Or when can we check back in with you, to see how your co-founder search has gone, but also how your meetings and conversations have gone? When’s a good time to check back in?
Arseniy 57:33
I think in a month or so, maybe towards September, because I’m thinking of going in a couple of weeks’ time. There is a Techstars hackathon on physical and personal AI – on this personal AI operating system.
So I’m hoping to meet some really exciting people there, connect with people, and also to meet with some of the guys already creating stuff on the cutting edge – Archetype AI, perhaps Brain AI as well.
So whoever is in Silicon Valley, we’ll try to get a hold of them. But I’ll be there for a couple of weeks, I think.
AB 58:09
Legendary. Thank you so much. Absolute joy to chat to you now formally with the record button going. We’ve had a lot of this conversation, but before we even got into the detail here, many moons ago, but great to have you back on, to be able to talk to you at length.
From all your links, I will put as many of those links in the show notes as I can possibly find. There might be a brain dump there, so apologies – they might just be in alphabetical order, but that’s about it.
There might need to be some context switching there, too, to make that happen. Your details will be there 100 times, so it should be easy to catch hold of you. And by all means, we really are looking forward to seeing what this future brings.
The spatial computing world is probably one of the biggest doors that needs to be opened to solve the final problems. There’s been a lot of incremental and evolutionary growth in these fields, but it definitely does take a revolutionary way to do it.
I’m definitely rapt that you’re in the software space, that you’re trying to do the human problems first, and figuring out where your niche is to play in. I’m glad you’re not trying to do a Humane Pin or a Rabbit.
Notwithstanding that the world has shifted that way – and maybe it’s shifted back to ‘yeah, but that rectangle in our pocket probably is the best bang for our buck, let’s use that for a bit longer’ – I’m just super thrilled to watch this journey, and I encourage you: hurry up, we want what you’re thinking about.
So yeah, please make one, that’d be great.
Arseniy
We will, we will. And that’s why it’s my pleasure being here. Thanks for having me – I’m realizing that I’m actually one of the several brightest minds that have visited this podcast, I would say.
So it’s great to be here. Thank you. It’s been a great conversation. And yeah, I’m looking forward to getting in touch with people who could help us bring this forward faster, as you said. Thank you.
AB
We will get the word out as far as we possibly can. This will be worth it. You’re probably either listening to this or watching this on YouTube, and you’ve scrubbed right to the end to go straight to the whodunnit.
From us though, we’ll leave Episode 23 there, and we’ll catch you next week on SPAITIAL. Arseniy, thank you so much. And to all, we’ll catch you next time. Cheers all.
HOSTS
AB – Andrew Ballard
Spatial AI Specialist at Leidos.
Robotics & AI defence research.
Creator of SPAITIAL
To absent friends.