UPDATE: This now works perfectly on Intel GPUs too; the earlier problem was Intel GPUs' poor precision for uniform float variables.
Also, use Aseprite to generate the greyscale mask: Photoshop alters the palette on save.
After playing a bit with Godot, I found what may be the best solution to this problem that avoids an if for each color or a loop:
https://gfycat.com/UnderstatedExemplaryAuklet
This solution only requires a greyscale mask for each frame plus the palette textures, which, saved in the correct format, add minimal memory overhead.
First, you need a mask that marks the areas that change color so they can be mapped onto the palette. In my example the masks look like this:

They are greyscale images without alpha, where each color is represented by a sequential number, and that number is the color's index in the palette.
On the example image:
Mega Man's black is 0,0,0 in the mask, which is index 0 in the palette
Mega Man's blue is 1,1,1 in the mask, which is index 1 in the palette
Mega Man's cyan is 2,2,2 in the mask, which is index 2 in the palette
White (255,255,255) is ignored: those pixels keep the sprite's original color
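To make the encoding concrete, here is a minimal sketch (not the author's tool; the post uses Aseprite) that turns a sprite's pixels into such a mask. The helper `build_mask` and the RGB values in `COLOR_TO_INDEX` are hypothetical placeholders, not Mega Man's real colors.

```python
# Hypothetical sketch (not from the post): turn a sprite's pixels into a
# greyscale index mask. The RGB values below are placeholders, not real colors.

WHITE = 255  # mask value that means "keep the sprite's own color"


def build_mask(pixels, color_to_index):
    """Map each RGB pixel to (i, i, i) if its color has palette index i, else white."""
    mask = []
    for row in pixels:
        mask_row = []
        for rgb in row:
            index = color_to_index.get(rgb)
            if index is None:
                mask_row.append((WHITE, WHITE, WHITE))  # unmapped: left alone
            else:
                mask_row.append((index, index, index))
        mask.append(mask_row)
    return mask


# Placeholder colors standing in for Mega Man's black, blue and cyan
COLOR_TO_INDEX = {
    (10, 10, 10): 0,
    (0, 90, 240): 1,
    (0, 230, 250): 2,
}

sprite = [
    [(10, 10, 10), (0, 90, 240)],
    [(0, 230, 250), (250, 200, 160)],  # last pixel is not in the table
]

mask = build_mask(sprite, COLOR_TO_INDEX)
print(mask)
```

Because 255 is reserved for "ignore", palette indices must stay below 255 in this scheme.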
And then the palettes: because I only need 3 colors, my palette is a 4x1-pixel image; the more colors you map, the bigger the palette.
Then the shader, where the magic happens:
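```glsl
// Godot 2.x canvas item shader
uniform texture palette;     // current palette, N x 1 pixels
uniform float palette_size;  // palette width in pixels (set from the script)
uniform texture mask;        // greyscale index mask for the current frame

// Fragment code
vec4 mask_color = tex(mask, UV);
vec4 out_color = tex(TEXTURE, UV);
// White in the mask (r == 1.0) keeps the sprite's own color
if (mask_color.r != 1.0)
{
	// mask value * 255 = palette index; dividing by the width gives UV.x
	out_color = tex(palette, vec2((mask_color.r * 255.0) / (palette_size - 0.001), 0.0));
}
COLOR.rgba = out_color;
```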
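As a sanity check, the per-pixel logic can be emulated on the CPU. This is a sketch under two assumptions: the palette texture is sampled with nearest filtering (filtering disabled on import), and `sample_nearest`, `shade` and the placeholder palette colors are hypothetical names and values, not part of the demo.

```python
import math

# Hypothetical CPU emulation of the shader above, for one pixel.
# Assumes the palette texture is sampled with nearest filtering.


def sample_nearest(row, u):
    """Nearest-neighbour lookup in a 1-pixel-high texture, clamped to its edges."""
    width = len(row)
    x = int(math.floor(u * width))
    return row[min(max(x, 0), width - 1)]


def shade(sprite_color, mask_r, palette, palette_size):
    """Same branch and UV.x formula as the shader; mask_r is normalized (0.0-1.0)."""
    if mask_r != 1.0:
        u = (mask_r * 255.0) / (palette_size - 0.001)
        return sample_nearest(palette, u)
    return sprite_color


# Placeholder 4x1 palette (normalized RGB); only the first three entries are used
palette = [(0.0, 0.0, 0.0), (0.0, 0.35, 0.94), (0.0, 0.9, 0.98), (1.0, 0.0, 1.0)]
skin = (0.98, 0.78, 0.63)

print(shade(skin, 2 / 255.0, palette, len(palette)))  # mask value 2 -> palette[2]
print(shade(skin, 1.0, palette, len(palette)))        # white mask -> sprite color
```

If the palette were filtered (bilinear), the lookup would blend neighbouring entries, which is why nearest sampling is assumed here.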
This shader receives the current palette, the total size of the palette (its width in pixels), and the mask for the current frame.
Then, for each pixel that is not white (255,255,255) in the mask, it applies the mask-to-palette translation.
Note: I only use the red channel for my calculations because in a greyscale image the three channels have the same value.
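Since only the red channel is read, a mask whose channels drift apart (for example after an editor re-saves and alters it) would silently pick wrong colors. A small hypothetical checker, assuming the mask is available as rows of `(r, g, b)` tuples with 0-255 values:

```python
# Hypothetical mask checker (not part of the demo project).


def check_mask(mask, palette_size):
    """Return (x, y, problem) for every pixel that is not a valid index or white."""
    problems = []
    for y, row in enumerate(mask):
        for x, (r, g, b) in enumerate(row):
            if not (r == g == b):
                problems.append((x, y, "not greyscale"))
            elif r != 255 and r >= palette_size:
                problems.append((x, y, "index out of palette range"))
    return problems


good = [[(0, 0, 0), (1, 1, 1)], [(2, 2, 2), (255, 255, 255)]]
print(check_mask(good, 4))
print(check_mask([[(3, 2, 2), (9, 9, 9)]], 4))
```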
As for the translation, the formula mask_color.r/(palette_size/255), which is the shader's (mask_color.r*255.0)/palette_size apart from the small -0.001 offset, transforms the 0.0-1.0 mask value into a palette UV.x value; remember that UV values are also between 0.0 and 1.0.
For example, with the current mask color = 2,2,2:
mask_color.r = 0.007843137254902 (remember that Godot shaders return normalized 0.0-1.0 colors; this is the "2" of the mask color, normalized automatically by Godot as 2/255)
palette_size = 4 (this value has to be divided by 255 to match Godot's normalized 0.0-1.0 color values; that is the /255 in the formula)
So the formula gives:
0.007843137254902/(4/255) = 0.5000000000000007 ≈ 0.5, which selects color 2 of our palette (cyan, in the image above).
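The worked example can be checked for every index. Note that the text formula lands exactly on the left edge of texel i (0.5 is where texel 2 starts in a 4-pixel palette), so a tiny rounding error could pick texel i-1; the shader's -0.001 pushes the result slightly to the right, which is presumably why it is there (my reading, not stated in the post). The third function is a common alternative that is NOT the author's formula: sampling the center of texel i, (i + 0.5)/palette_size, which leaves half a texel of margin on each side. All function names are hypothetical, and nearest filtering is assumed.

```python
import math

# Hypothetical sketch: compare three ways of turning a mask value into UV.x.


def uv_from_text(mask_r, palette_size):
    # As written in the post: mask_color.r / (palette_size / 255)
    return mask_r / (palette_size / 255.0)


def uv_from_shader(mask_r, palette_size):
    # As in the shader: (mask_color.r * 255.0) / (palette_size - 0.001)
    return (mask_r * 255.0) / (palette_size - 0.001)


def uv_texel_center(mask_r, palette_size):
    # Alternative, NOT from the post: aim at the middle of texel i
    index = round(mask_r * 255.0)
    return (index + 0.5) / palette_size


def texel(u, palette_size):
    # Which texel a nearest-filtered lookup at u falls in
    return min(int(math.floor(u * palette_size)), palette_size - 1)


palette_size = 4
for i in range(palette_size):
    r = i / 255.0
    print(i, uv_from_text(r, palette_size),
          texel(uv_from_shader(r, palette_size), palette_size),
          texel(uv_texel_center(r, palette_size), palette_size))
```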
All this looks good, but what about the cost of changing the shader's textures on the fly from disk, for example when an AnimationPlayer changes them directly?
For that, I wrote a simple script that automatically updates the mask along with the animation frame, using a SpriteFrames resource. It's the same process as loading the frames for an AnimatedSprite; the requirement is that both match, so if one of your animations has 5 frames, you need the same 5 frames masked. The script does the same for palettes. This way all the required textures are already loaded, so there are no more disk reads.
Here is the piece of code that does that:
```gdscript
extends AnimatedSprite

# Godot 2.x: keeps the shader's mask and palette in sync with this sprite.
export(SpriteFrames) var color_mask = null  # one mask per animation frame, same order
export(SpriteFrames) var palettes = null    # one palette texture per entry
export(int) var palette_index = 0 setget _set_palette_index
var palette_size = 0 setget _set_palette_size

# Connected to this node's "frame_changed" signal:
# when the frame changes, change the mask reference on the shader too.
func _on_AnimatedSprite_frame_changed():
	if color_mask != null and get_frame() < color_mask.get_frame_count():
		get_material().set_shader_param("mask", color_mask.get_frame(get_frame()))

# If the palette index changes, send the new palette to the shader and update its width.
func _set_palette_index(value):
	# Validate the requested index (value), not the current one
	if palettes != null and value >= 0 and value < palettes.get_frame_count() and palette_index != value:
		palette_index = value
		var palette_tex = palettes.get_frame(palette_index)
		get_material().set_shader_param("palette", palette_tex)
		self.palette_size = palette_tex.get_width()  # "self." so the setter runs

# Update the palette width uniform only when it actually changes.
func _set_palette_size(value):
	if palette_size != value:
		palette_size = value
		get_material().set_shader_param("palette_size", palette_size)
```
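The script pairs animation frame N with mask frame N, and if a mask is missing the frame-changed handler simply skips the update, so the previous frame's mask stays on the shader. A hypothetical pre-flight check can catch mismatches early. Checking sizes as well as counts is my assumption: the shader reads the mask with the same UV as the sprite, so a mask of a different size would be stretched over the frame.

```python
# Hypothetical sanity check (not part of the demo): frames and masks must line up.


def check_frames_match(frame_sizes, mask_sizes):
    """Both arguments are lists of (width, height), one entry per frame.
    Returns None if they match, otherwise a description of the first problem."""
    if len(frame_sizes) != len(mask_sizes):
        return "frame count mismatch: %d frames, %d masks" % (
            len(frame_sizes), len(mask_sizes))
    for i, (frame, mask) in enumerate(zip(frame_sizes, mask_sizes)):
        if frame != mask:
            return "frame %d: sprite is %dx%d but mask is %dx%d" % (
                i, frame[0], frame[1], mask[0], mask[1])
    return None


walk = [(24, 24)] * 5  # hypothetical 5-frame animation
print(check_frames_match(walk, [(24, 24)] * 5))  # None
print(check_frames_match(walk, [(24, 24)] * 4))
```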
An image that summarizes how the algorithm works:

And the demo project with the concepts explained here:
https://dl.dropboxusercontent.com/s/8prbvi22q5493mk/pallete_swap_fixed.zip?dl=0
If anybody has an optimization, feel free to post it here; as far as I can tell, this may be the fastest solution performance-wise, with minimal memory overhead, on GLES 2.0.