Doing this mostly boils down to segmenting your world into chunks, so you can "stream" it from disk and load only the parts around where the player is standing. A simple way to do that is to load and free nodes dynamically rather than switching the whole scene with change_scene, for example.
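To make the chunk idea a bit more concrete, here is a rough, engine-agnostic sketch of the bookkeeping involved. It is written in Python rather than GDScript, and the chunk size, load radius and function names are all made up for illustration, so treat it as one possible starting point rather than the way to do it.

```python
import math

CHUNK_SIZE = 64.0   # world units per chunk side (assumed value)
LOAD_RADIUS = 1     # how many chunks to keep loaded around the player (assumed value)


def chunk_of(x, z):
    """Return the integer chunk coordinate containing world position (x, z)."""
    # floor() rather than int() so negative positions map to negative chunks
    return (math.floor(x / CHUNK_SIZE), math.floor(z / CHUNK_SIZE))


def wanted_chunks(center):
    """Return the set of chunk coordinates within LOAD_RADIUS of center."""
    cx, cz = center
    return {
        (cx + dx, cz + dz)
        for dx in range(-LOAD_RADIUS, LOAD_RADIUS + 1)
        for dz in range(-LOAD_RADIUS, LOAD_RADIUS + 1)
    }


def plan_streaming(loaded, player_x, player_z):
    """Return (to_load, to_unload) given the set of currently loaded chunks."""
    wanted = wanted_chunks(chunk_of(player_x, player_z))
    return wanted - loaded, loaded - wanted
```

In an engine you would call something like plan_streaming whenever the player crosses a chunk border, then add a node for every chunk in to_load and free the node of every chunk in to_unload, ideally spreading the loading over several frames or a background thread to avoid stutter.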
How you do that differs a bit depending on whether the world is randomly generated (the world-building code then ships with the game) or made by hand. In the second case you may have to build some tools to make that easier, because Godot and most mainstream game engines (or even 3D software) work with single, non-streamed scenes.
I've been interested in this for a very long time but never really got far in implementing it (I made a couple of Minecraft clones along the way). Doing it is non-trivial and depends a lot on what you actually want to support. There can be a lot of work touching many areas: graphics, terrain, physics, floating-point precision handling, persistence, networking, and probably sound too.