

Artificial intelligence bests humans at classic arcade games


There has been some buzz recently around a few articles that demonstrate machine learning in the video game domain.

Here's one writeup:

Artificial intelligence bests humans at classic arcade games

For the academically inclined, I would recommend:
Playing Atari with Deep Reinforcement Learning

Which talks in detail about the methods used.

The use of the term "deep" seems to me to be mostly about coming up with a catchy term, one that has gone viral and is being hyped like mad with little innovation behind it. Still, the new wave of practitioners using neural networks for large-scale problems are getting undeniably impressive results.

Again, getting back to the video game results:

There is nothing particularly novel in the approach, but the domain is wonderful, and the basic focus on using the same architecture and parameters to tackle a large collection of learning problems, all from high-dimensional raw input, is great.  And the results are impressive.  Again, in my mind this is more a story of the new wave of practitioners who are getting very good at leveraging fairly standard neural network techniques on larger and larger problems.

Having said that, this line of work offers little qualitative improvement on the hard problems in AI -- on serious multiscale hierarchical planning, scene recognition, etc.  For that we are still waiting for some paradigm shifts.

My enthusiasm for AI often exceeds my editing, but here's a couple of thoughts.

Certainly "Q or Reinforcement Learning" feels like a bit of partly complicating the obvious. From what little I know of chess programming, they can't guarantee the best move, so they "Monte Carlo simulate it", i.e. run scads of iterated tests until the program "tends to notice that a certain move X tends to lose or win". Sometimes the other side escapes, but it's that "tends" that matters.
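To make the "tends to win or lose" idea concrete, here is a minimal sketch of Monte Carlo move evaluation. It is not taken from any chess engine or from the Atari paper. The toy game (one pile of stones, take 1 to 3 per turn, whoever takes the last stone wins) and every function name are assumptions chosen only for illustration.

```python
import random

def random_playout(pile, my_turn, rng):
    """Finish the game with uniformly random moves; True if 'we' take the last stone."""
    while pile > 0:
        pile -= rng.randint(1, min(3, pile))
        if pile == 0:
            return my_turn          # whoever just moved took the last stone
        my_turn = not my_turn
    return not my_turn              # pile was already empty: the previous mover won

def estimate_win_rates(pile, playouts=2000, rng=None):
    """For each legal first move, estimate how often it 'tends' to win."""
    rng = rng or random.Random(0)   # fixed seed so the sketch is repeatable
    rates = {}
    for take in range(1, min(3, pile) + 1):
        wins = 0
        for _ in range(playouts):
            # After our move it is the opponent's turn.
            if random_playout(pile - take, my_turn=False, rng=rng):
                wins += 1
        rates[take] = wins / playouts
    return rates

def best_move(pile, playouts=2000, rng=None):
    """Pick the first move with the highest estimated win rate."""
    rates = estimate_win_rates(pile, playouts, rng)
    return max(rates, key=rates.get)
```

With a pile of 5, taking one stone should come out ahead: against random play it wins roughly two times in three, versus about half the time for the other two moves. No single playout proves anything; the preference only shows up in the averages.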

So in Space Invaders, if you get stuck on the side, you "tend to get trapped" because you're missing half your movement range. In certain conceptual ways, that feels like "sorta easy" programming to me.

What I don't see is any interaction with "precursor tutorials" such as if your friend comes over and hangs out with pizza for an hour to show you stuff. You still have to play the game, but it sounds like the games tested were "easy to play with clever middle level tricks". So unlike hardcoded strategies, you make your friend's suggestions "a hypothesis" - that's how he always played, so the computer looks there first with at least a baseline. Then some of the friend's suggestions turn out to be sub-optimal. (I think that was called H:0 and H:A in statistics. Yes?) PacMan sounds like it would be a good test here.
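One way to picture the "friend's suggestion as a starting hypothesis" idea is a three-choice toy problem where the advice seeds the learner's initial estimates and experience later overrides it. This is only a sketch under made-up assumptions: the payoffs, the size of the prior bonus, and all names are hypothetical, and it is not how any of the linked systems work.

```python
import random

TRUE_REWARD = [0.2, 0.5, 0.8]   # hypothetical payoffs; option 2 is actually best
EXPERT_ARM = 1                  # the friend's advice: decent, but sub-optimal

def learn_with_advice(steps=2000, epsilon=0.1, prior_bonus=1.0, seed=0):
    """Epsilon-greedy learner whose estimates start out biased toward the advice."""
    rng = random.Random(seed)
    # Treat the advice as one optimistic pseudo-observation for the expert's option.
    estimate = [prior_bonus if a == EXPERT_ARM else 0.0 for a in range(3)]
    count = [1, 1, 1]           # the prior counts as one observation per option
    first_choice = max(range(3), key=lambda a: estimate[a])
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(3)                            # occasional exploration
        else:
            arm = max(range(3), key=lambda a: estimate[a])    # follow current belief
        count[arm] += 1
        # Running average of the prior plus all observed rewards for this option.
        estimate[arm] += (TRUE_REWARD[arm] - estimate[arm]) / count[arm]
    final_choice = max(range(3), key=lambda a: estimate[a])
    return first_choice, final_choice
```

The learner looks where the friend pointed first (option 1), but after enough exploration its estimates should favor option 2, which is the "some of the suggestions turn out to be sub-optimal" part.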

Take a game where the human says "I don't know what I am doing" regarding gameplay and I bet the computer will get stuck.

I don't mean to sound harsh, and please don't take offense, but I don't think it's helpful saying things like "Q or Reinforcement Learning feels like a bit of partly complicating the obvious" without understanding the math and foundation for these algorithms.  Q-learning and other reinforcement learning techniques are elegant, efficient, and based on very sound principles.  They aren't the holy grail of human-level intelligence but they are very elegant algorithms. There are great books on this stuff for those who want to learn about it.  The now classic book on reinforcement learning is by Sutton and Barto (here), which I recommend.  
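For anyone who hasn't seen it, the core of tabular Q-learning fits in a few lines. The sketch below uses a made-up five-cell corridor with a reward at the right end. The environment, the constants, and the uniformly random exploration are all assumptions for illustration (a practical agent would usually explore epsilon-greedily), and this is the textbook table-based version, not the neural-network variant from the Atari paper.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right

def step(state, action_index):
    """Move one cell (walls clamp the position); reward 1.0 on reaching the goal."""
    nxt = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Q-learning is off-policy, so it can learn from random behaviour.
            a = rng.randrange(2)
            nxt, reward, done = step(state, a)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

def greedy_policy(q):
    """Best action index for each non-goal cell according to the learned table."""
    return [max(range(2), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

After training, `greedy_policy(q_learning())` should pick "right" in every cell, even though the agent only ever wandered randomly and was never told where the goal is.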

ps. Your idea to use an expert to initialize training and start as a baseline is an area of active research in current AI -- and in fact was part of the early days of AI.

Someone wrote a neural network for Super Mario World and explains a bit about how it works.

MarI/O - Machine Learning for Video Games


And the follow up:

Machine learning AIs are starting to get better, and can now defeat humans in 5v5 Dota 2 matches!

Our team of five neural networks, OpenAI Five, has started to defeat amateur human teams at Dota 2. While today we play with restrictions, we aim to beat a team of top professionals at The International in August subject only to a limited set of heroes. We may not succeed: Dota 2 is one of the most popular and complex esports games in the world, with creative and motivated professionals who train year-round to earn part of Dota’s annual $40M prize pool (the largest of any esports game).

OpenAI Five plays 180 years worth of games against itself every day, learning via self-play. It trains using a scaled-up version of Proximal Policy Optimization running on 256 GPUs and 128,000 CPU cores — a larger-scale version of the system we built to play the much-simpler solo variant of the game last year. Using a separate LSTM for each hero and no human data, it learns recognizable strategies. This indicates that reinforcement learning can yield long-term planning with large but achievable scale — without fundamental advances, contrary to our own expectations upon starting the project.-
--- End quote ---
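For the curious, the heart of Proximal Policy Optimization is a "clipped" objective that stops any single update from moving the policy too far. Below is a sketch of that term for one sample, as it appears in the standard description of PPO. The function and parameter names are my own, and nothing here reflects how OpenAI Five's scaled-up version is actually implemented.

```python
def ppo_clipped_term(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate objective for a single sample.

    ratio     -- new-policy probability of the action divided by the old-policy one
    advantage -- how much better the action was than expected
    clip_eps  -- how far the ratio may drift from 1 before gains stop counting
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum makes this a pessimistic bound on the improvement.
    return min(ratio * advantage, clipped_ratio * advantage)
```

For a good action (advantage 1.0) whose probability has already grown by 50%, the term is capped at about 1.2 rather than 1.5, so there is no incentive to push that action any further in the same update.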

A really interesting (and long) read.

