We can learn a lot about the structure of existing voice assistants or RPA solutions by looking at the APIs available, or at open source variants (e.g. MyCroft). Check out my earlier blog post for links to developer documentation for a range of automation software.
Looking at voice assistants like Siri, Cortana, Alexa, Google Assistant, and MyCroft, it’s clear we would need to extend the design of these intent-based APIs to:
a) automatically deduce intent from raw data inputs; and
b) operate in a non-event-driven manner – intents will increase and decrease in confidence over time, instead of being fired by a single triggered action
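To make the second point concrete, here is a minimal Python sketch of an intent whose confidence is continuously updated from incoming signals rather than set by a one-off trigger. The class, names, and decay constant are all illustrative assumptions, not taken from any of these assistants' APIs:

```python
from dataclasses import dataclass

@dataclass
class Intent:
    """Hypothetical intent whose confidence drifts over time
    instead of being set by a one-off trigger event."""
    name: str
    confidence: float = 0.0

    def observe(self, evidence: float, decay: float = 0.9) -> None:
        # Exponential moving average: confidence rises while supporting
        # evidence keeps arriving, and decays toward zero when it stops.
        self.confidence = decay * self.confidence + (1 - decay) * evidence

intent = Intent("compose_email")
for signal in [0.2, 0.6, 0.9, 0.9]:  # simulated evidence from raw inputs
    intent.observe(signal)
print(intent.name, intent.confidence)
```

A downstream consumer would then act whenever confidence crosses a threshold, rather than subscribing to a discrete event.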
RPA platforms like Automation Anywhere, UiPath, Blue Prism etc. do provide some insight on how to handle b) above – using visually composed conditional flows to convert a time-variant continuous signal into triggers which can then feed into an event-driven architecture. This is a good first approximation which could be used to validate parts of the system; however, the final solution will need to derive these conditional flows from raw data, instead of relying on explicit programming. Our final solution will need to be a superset of these designs, operating on a general principle of predicting a desired goal based on the current state (S -> G), and the best next action to apply given the current state and goal ((S, G) -> A).
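A minimal sketch of that general principle, with hypothetical desktop states and hand-written rules standing in for what would ultimately be learned components:

```python
from typing import Optional

State = dict   # e.g. {"active_app": "email", "draft_open": True}
Goal = dict
Action = str

def predict_goal(state: State) -> Goal:
    """S -> G: infer the desired end state from the current state.
    In the real system this mapping would be learned from raw data;
    here it is a hand-written rule for illustration."""
    if state.get("active_app") == "email" and state.get("draft_open"):
        return {"draft_sent": True}
    return {}

def next_action(state: State, goal: Goal) -> Optional[Action]:
    """(S, G) -> A: pick the best next action toward the goal."""
    if goal.get("draft_sent") and not state.get("draft_sent"):
        return "click_send"
    return None

state = {"active_app": "email", "draft_open": True}
print(next_action(state, predict_goal(state)))  # click_send
```

The point of the two-function split is that the same action-selection machinery can serve any goal, while goal prediction is free to change as the raw-data models improve.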
What we need is a cognitive architecture (CA): something which handles perception, memory, planning, reasoning and acting. A design which, at its highest level, derives state from raw input signals and performs intelligent, complex actions to mimic human-like behavior. Most cognitive architectures aim to replicate the entire human process, from the senses through to muscle control, in order to navigate a three-dimensional world. We’re looking to simplify this environment down to the desktop OS instead of the entire world, so Cyte’s needs do not overlap completely with those of cognitive architectures. Using a computer is only a small portion of the human skillset.
There aren’t too many cognitive architectures around. Many are confined to the realm of university research, with limited practical application. Those that do exist are understandably limited in capability, and constantly evolving in an ongoing effort to converge on a solution which can mimic human-level behavior.
TinyCog employs Scene Based Reasoning. After pulling raw data from sensors, it builds an internal representation of the observed environment (its state) and stores this as a 3D scene graph. Observed scenes are stored in episodic memory, and are connected by actions to form Plans. Actions define effects that cause transitions in environment state, and chained actions are used to move from the current state to a desired goal state. Progress toward a goal during plan execution is tracked in episodic memory and in the attention mechanism, which serves as an active scope tied to current goals.
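As a rough illustration of scenes chained by actions (the types and names below are invented for this sketch, not TinyCog's actual classes), a plan can be modelled as a sequence of actions whose effects transform one scene into the next, with each intermediate scene recorded in episodic memory:

```python
from dataclasses import dataclass

@dataclass
class SceneAction:
    """An action carries the effect it has on the environment state."""
    name: str
    effect: dict  # keys set on the scene when the action is applied

def apply_action(scene: dict, action: SceneAction) -> dict:
    # A new scene is the old scene with the action's effects overlaid.
    return {**scene, **action.effect}

def execute_plan(start: dict, plan: list) -> list:
    """Chain actions from the starting scene, recording every
    intermediate scene in episodic memory."""
    episodic_memory = [start]
    for action in plan:
        episodic_memory.append(apply_action(episodic_memory[-1], action))
    return episodic_memory

def goal_reached(scene: dict, goal: dict) -> bool:
    return all(scene.get(k) == v for k, v in goal.items())
```

Planning then amounts to searching for a chain of actions for which `goal_reached` holds on the final scene.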
Its primary focus seems to revolve around physical environments and robotics, a bit more complex than the scenarios for Cyte. Although the primary difference should theoretically be operating in 2D instead of 3D, there would be cascading changes required which are not exemplified in the currently available code. Also, the source code is licensed under the LGPL – while not a showstopper if the library would save significant time, MIT or BSD are the preferred third-party library licenses for Cyte.
SOAR is one of the original AI research projects and has been developed over a long period of time primarily by the University of Michigan. It’s the largest project of the three, with features such as Reinforcement Learning built in, and provides numerous small yet usable examples. It is licensed under BSD, which is a positive for Cyte.
OpenCog is the brainchild of Ben Goertzel, the chief scientist at Hanson Robotics, the company that developed Sophia the robot. He also founded SingularityNET, a decentralized model marketplace. Portions of the OpenCog source code have been reused by the team in systems built for large enterprise customers.
The core of OpenCog is the AtomSpace, a knowledge graph database which can handle arbitrarily complex information hierarchies, perform graph rewriting to acquire knowledge and adapt over time, and perform forward or backward chaining as part of reasoning. There is a persistence layer built on Postgres, and Scheme language bindings for mutating the database.
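A toy forward chainer over a tuple-based knowledge store can illustrate the chaining idea. This is not the AtomSpace API – just a sketch of applying a single transitivity rule until a fixed point is reached:

```python
def forward_chain(atoms: set) -> set:
    """Repeatedly apply a transitivity rule over 'inherits' links
    until no new atoms can be derived (a fixed point)."""
    derived = set(atoms)
    changed = True
    while changed:
        changed = False
        for (rel1, a, b) in list(derived):
            for (rel2, c, d) in list(derived):
                # If a inherits from b, and b inherits from d,
                # then a inherits from d.
                if rel1 == rel2 == "inherits" and b == c:
                    new = ("inherits", a, d)
                    if new not in derived:
                        derived.add(new)
                        changed = True
    return derived

kb = {("inherits", "cat", "mammal"), ("inherits", "mammal", "animal")}
print(("inherits", "cat", "animal") in forward_chain(kb))  # True
```

The real AtomSpace generalizes this far beyond one rule: atoms are typed, rules are themselves graphs, and backward chaining works from a query toward supporting facts.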
There are several other modules built around the AtomSpace for functions such as natural language processing, attention, visual processing and much more. The primary example of system integration is a chatbot in the same vein as Sophia, built on ROS. A great positive for the platform is the level of verification it has undergone as part of use in real world projects for enterprise clients. Sadly, this has also created a large list of improvements to be made to core systems to support scalability and fix some incorrect initial assumptions. This means interfaces are likely to change in unpredictable and compatibility breaking ways, which may require significant rework. The system as a whole is also aiming to do much more (full AGI) than is required for Cyte, and it is also licensed under LGPL.
Although this list is by no means exhaustive – there are other popular models such as ACT-R – these are some of the most popular platforms around in 2019. All of these platforms have a very real chance of undergoing breaking changes which would require significant rework to Cyte, which is a concern. They also aim to solve target problems which are a superset of the problem Cyte aims to address, and are therefore more complex than strictly required. At this stage, integrating a platform appears to require more work than it would save over writing a minimal architecture specifically for Cyte, which is the likely approach we will take in later phases, unless something changes by that point.