ESR, like most desktop-recording and screencasting tools, records audio through one of your audio recording devices. By default, that means it expects to record your voice from a microphone.
The only way to record the audio playing through a browser is to have a virtual audio recording device that is named something like "what you hear" or "stereo mix". If you have such a device, select it as your audio input source in ESR and it should work.
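If you are not sure whether you have such a device, a rough sketch like the following can help you scan a list of recording-device names. It is only an illustration: the function name `find_loopback_devices` and the `LOOPBACK_HINTS` tuple are made up for this example, the "what u hear" spelling is an assumed variant, and the device names are typed in by hand rather than read from the system.

```python
# Name fragments that usually indicate a "record what is playing" input.
# "stereo mix" and "what you hear" are the common names; "what u hear" is an
# assumed spelling variant. Adjust the tuple to match your own driver's naming.
LOOPBACK_HINTS = ("stereo mix", "what you hear", "what u hear")


def find_loopback_devices(device_names):
    """Return the device names that look like a loopback-style input."""
    return [
        name
        for name in device_names
        if any(hint in name.lower() for hint in LOOPBACK_HINTS)
    ]


if __name__ == "__main__":
    # Hypothetical example list; replace it with the names shown in your
    # system's list of recording devices.
    names = ["Microphone (USB Audio)", "Stereo Mix (Realtek Audio)", "Line In"]
    print(find_loopback_devices(names))  # ['Stereo Mix (Realtek Audio)']
```

If the result is empty, your system probably does not expose such a device under those names, and there is nothing matching to pick as the input source in ESR.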