Scientists from Google Research and Loon, a company spun off from Google, have used reinforcement learning to train an AI that autonomously keeps a stratospheric balloon within a fixed area. The AI makes the balloon rise or descend so that it can exploit high-altitude winds blowing in different directions and stay within a 50-kilometer radius of its reference point.
Loon emerged from the Google X experimental laboratory. The company uses stratospheric balloons to provide Internet access to regions with weak network coverage. The new controller is more efficient than the old, manually optimized method. Marc Bellemare and his colleagues now describe the technical details of the method in the journal Nature.
The balloons use solar-powered pumps to pump air in or out, and so rise or fall. Since the winds blow steadily in the stratosphere, the balloons can be controlled quite reliably: if a balloon drifts too far from its reference point, the controller uses meteorological data and wind forecasts to search for an altitude at which a suitable wind is blowing. The previously used “Station Seeker” algorithm was quite conservative. It preferred winds that blew at the smallest possible angle to the target and at the lowest possible speed.
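The preference described above can be illustrated with a small sketch. This is not Loon's actual Station Seeker code, which the article does not spell out; the `WindLayer` type, the `pick_layer` function, and the sample wind values are all hypothetical, and the tie-breaking order (angle first, then speed) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class WindLayer:
    altitude_km: float
    heading_deg: float  # direction the wind blows toward, clockwise from north
    speed_ms: float


def angle_between(a_deg, b_deg):
    """Smallest absolute angle between two headings, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def pick_layer(layers, bearing_to_target_deg):
    """Assumed heuristic: smallest angle to the target first, then lowest speed."""
    return min(
        layers,
        key=lambda layer: (
            angle_between(layer.heading_deg, bearing_to_target_deg),
            layer.speed_ms,
        ),
    )


# Made-up wind layers, purely for illustration.
layers = [
    WindLayer(altitude_km=16.0, heading_deg=80.0, speed_ms=12.0),
    WindLayer(altitude_km=17.5, heading_deg=200.0, speed_ms=6.0),
    WindLayer(altitude_km=19.0, heading_deg=260.0, speed_ms=9.0),
]
best = pick_layer(layers, bearing_to_target_deg=250.0)
print(best.altitude_km)
```

With these sample values the layer at 19 kilometers is chosen, because its wind deviates from the bearing to the target by only 10 degrees.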
However, this strategy is not always optimal in terms of energy. The Google researchers therefore wanted to test whether reinforcement learning would be a better solution to the problem. In this approach, the algorithm tries out a large number of different actions in a simulation in order to learn which strategy leads to the greatest possible success. The method has recently been used with great success, especially in computer games. In contrast to game scenarios, however, the AI has only limited and inaccurate data for this problem. Bellemare and colleagues filled these gaps by adding randomly generated noise to the wind data.
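A rough idea of what adding random noise to wind data can look like is sketched below. The article does not say what kind of noise the researchers used, so the Gaussian noise, the `perturb_winds` function, the `sigma_ms` parameter, and the forecast format (altitude mapped to east/north wind components) are all assumptions made for illustration.

```python
import random


def perturb_winds(forecast, sigma_ms=2.0, seed=None):
    """Return a noisy copy of a forecast given as {altitude_km: (u_ms, v_ms)}.

    Gaussian noise is an assumption; the source only mentions
    "randomly generated noise".
    """
    rng = random.Random(seed)
    noisy = {}
    for altitude_km, (u_ms, v_ms) in forecast.items():
        noisy[altitude_km] = (
            u_ms + rng.gauss(0.0, sigma_ms),
            v_ms + rng.gauss(0.0, sigma_ms),
        )
    return noisy


# Hypothetical forecast: east (u) and north (v) wind components in m/s.
forecast = {16.0: (5.0, -1.0), 18.0: (-3.0, 2.0)}

# Each simulated training flight sees a slightly different wind field.
episodes = [perturb_winds(forecast, seed=i) for i in range(3)]
```

Training on many such perturbed wind fields keeps the controller from relying on any single forecast being exactly right.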
The neural network trained in this way was indeed able to master the task better than the previous algorithm: the balloon stayed within a 50-kilometer radius of the reference point around 55 percent of the time, while Station Seeker managed only 40 percent. That does not sound particularly impressive at first. In absolute terms, however, the balloon spent around 3.5 more hours in the target area per 24 hours.
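The conversion from percentages to hours can be checked with a few lines. The calculation uses the rounded figures quoted here (55 and 40 percent), so it is only an approximation of the researchers' own number; the variable names are illustrative.

```python
HOURS_PER_DAY = 24

rl_percent = 55       # share of time in range with the learned controller
seeker_percent = 40   # share of time in range with Station Seeker

# Extra time in the target area per day, in hours.
extra_hours = HOURS_PER_DAY * (rl_percent - seeker_percent) / 100
print(round(extra_hours, 1))
```

With the rounded percentages this comes to about 3.6 hours per day, which is in line with the roughly 3.5 hours reported.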
In addition, the bar for this problem is quite high. Because of the limitations of the navigation method and the incomplete information about the wind system, the achievable time within the 50-kilometer radius is capped: the researchers calculated an upper limit of 68 percent. More, they write, could not be achieved “even with perfect knowledge of the wind system”, since there are situations in which the prevailing winds would simply make a solution impossible. With incomplete information about the winds, the researchers arrive at an upper limit of 55 percent. Because more information would improve the controller, the balloons will collect data during flight in the future. However, Bellemare writes that letting the neural network continue learning during operation would be of little use because it already has “millions of flights in the simulation” behind it.
The researchers write that the method is not limited to networking balloons. High-altitude balloons that remain in one place could also be used for a variety of other tasks, such as environmental measurements or forest fire observation. “The primary use for Loon is to provide connectivity to underserved populations,” writes Bellemare. However, the ability to quickly train a controller for other purposes opens up a number of opportunities for the team to pursue.