Cyte is not going to be a simple tool. It will require innovative solutions, and in some ways it is a bet on near-term technological progress – a prediction that there will be sufficient advancement during development to make some of the more challenging aspects of the problem solvable.
At its core, Cyte is a machine learning problem – which means that tracking the primary factors driving machine learning progress is important for understanding ongoing confidence in the project. Here are 5 factors driving major progress in machine learning right now.
1. GPU Acceleration
Not only are more powerful GPUs available through economies of scale and the ongoing semiconductor R&D at vendors like NVIDIA and AMD, but demand keeps growing, driven by game developers pushing the limits of rendering and by all the hype surrounding blockchain and its suitability for GPUs. Each GPU offers more operations per second, more GPUs are in circulation, and furthermore vendors are developing software and dedicated pipelines suited to tensor processing – the bread and butter of neural network inference and gradient descent training.
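To see why tensor throughput matters, it helps to count the arithmetic in even a single dense layer: a forward pass is essentially a matrix multiplication costing roughly 2·m·n·k floating-point operations. A back-of-the-envelope sketch (the layer and batch sizes here are illustrative, not from any particular network):

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for multiplying an (m x k) matrix by a (k x n) matrix:
    each of the m*n outputs needs k multiplies and k adds."""
    return 2 * m * n * k

# Illustrative example: a batch of 64 inputs through a 4096 -> 4096 dense layer.
flops = matmul_flops(64, 4096, 4096)

# A GPU sustaining ~10 TFLOP/s on tensor math gets through this in well
# under a millisecond; a network stacks many such layers per step.
seconds = flops / 10e12
print(flops, seconds)
```

Multiply that by the layer count, the training-step count, and the dataset size, and the appeal of ever-faster tensor hardware is obvious.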
2. Efficiency improving toolsets
I still remember my first project using neural networks. Back then, visualization almost always had to be custom-implemented, and debugging matrix multiplications on a forward or backward pass often involved logging massive quantities of data. Over time, the community has converged on good practices and design patterns, which has in turn incentivized the creation of tools and environments that support those patterns – for example, the developers over at lobe.ai built a great visual tool for constructing network architectures, visualizing activations and trained weights, and automating common data preparation and pipeline activities like normalization, k-fold validation etc. Conversion tools for moving between computational graph formats like TensorFlow, CoreML and PyTorch allow developers to learn their toolset of choice and still leverage progress made by developers using other formats. These tools greatly reduce the time spent experimenting, and in turn allow us to deliver valuable results more quickly.
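The k-fold validation mentioned above is a good example of a pattern that tooling now automates; it is worth remembering how simple the underlying mechanism is. A minimal hand-rolled sketch of the split itself (indices only, no framework):

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train, validation) index lists for k-fold cross-validation.
    Each fold serves as the validation set exactly once, so every sample
    is validated on once and trained on k-1 times."""
    # Distribute any remainder across the first few folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

for train, val in k_fold_indices(6, 3):
    print(val)  # [0, 1] then [2, 3] then [4, 5]
```

In practice a library routine (e.g. scikit-learn's `KFold`) adds shuffling and stratification on top of exactly this idea.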
3. Cloud Tensor Processing Units
Gradient descent is notoriously expensive in compute. When architecting networks, we usually weigh the compute and time available for training to ensure we can test and iterate to deliver projects on time. Recently, neural networks have proven able to drive economic value at massive scale. This has incentivized the organizations that stand to profit most to invest in and develop specialized hardware for performing tensor math, much more efficient than general-purpose CPUs and GPUs. Notably, Google developed the TPU, and then made it available for rent through its cloud offerings. These cost-effective solutions for training models make larger and more complex models viable.
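For context, the gradient descent loop itself is trivial – the expense comes entirely from evaluating the gradient, which for a real network spans millions of parameters and a large batch of data on every step. A toy sketch minimizing a one-dimensional quadratic makes the loop explicit:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient. Every iteration re-evaluates
    grad(x); in a real network that evaluation is a full forward and
    backward pass over a batch, which is where the compute cost lives."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 4))  # converges towards 3.0
```

Swap the toy gradient for a backward pass over a deep network and billions of training examples, and the case for specialized tensor hardware writes itself.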
4. Availability of large data sets
Storage capacity increases, and cost per megabyte decreases, massively over time. Along with the proliferation of cloud storage services, this makes data storage more accessible than ever. Not only that, but the amount of data we create is increasing rapidly. These two factors, along with a growing appreciation for the value of this data (we know the quantity and quality of the data used to train a model is crucial to its performance), have resulted in the archiving and distribution of massive, high-quality datasets in well-understood, compatible formats (for example, TFRecords for TensorFlow, or even pre-trained weights for use in transfer learning). Since ImageNet was first introduced, we've seen numerous similarly valuable datasets appear for many other types of tasks. Leveraging these datasets allows us to concentrate on novel developments, instead of reproducing a baseline every time.
5. Neural network inference co-processors on edge devices
Similar to cloud-based TPUs for training models, there has been a trend towards developing and integrating specialized tensor co-processors alongside CPUs and GPUs on devices, specifically for the purpose of running neural network inference. Some examples of this are the Apple A11 and Google's Coral (https://coral.withgoogle.com/products/dev-board/). Traditionally, when developing models for use in production software, we'd need to consider the impact on the CPU and GPU, including total capacity and resource sharing in a multitasking environment. Dedicated processors allow more freedom in deploying complex networks, with well-defined baseline performance targets and much more total throughput.
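These co-processors typically favor low-precision arithmetic, so deploying a model to them usually involves quantizing its weights – for example, mapping 32-bit floats down to 8-bit integers. A simplified sketch of symmetric int8 quantization (real schemes vary by hardware and toolchain; this is illustrative only):

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats in [-max_abs, max_abs]
    onto integers in [-127, 127], returning the integers plus the scale
    needed to recover approximate float values (w ~= q * scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)

# Dequantized values are close to the originals, at a quarter of the storage.
dequantized = [qi * scale for qi in q]
```

The small accuracy loss is usually a worthwhile trade for the throughput and power savings these dedicated processors offer.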
As these progressions continue, they will allow us to shatter previously set expectations, unlocking use cases once considered infeasible. It's simply a matter of time.