+1 vote

I want to do web scraping with Godot.
As an example, I found this video:
https://www.youtube.com/watch?v=_N83PpMQsfM&t=354s
How do I get only the text part of the output ("User ramazan")?

My code

var data : PoolStringArray
var url := "https://godotengine.org/qa/user/ramazan"

func _ready():
    $HTTPRequest.request(url)

func _on_HTTPRequest_request_completed(_result, _response_code, _headers, body):
    var response = body.get_string_from_utf8()
    if 'godotengine' in url:
        # Keep everything after the page-title div, then cut at the next "div".
        data = response.split('<div class="page-title">')
        var price = str(data[1]).split('div')
        price = str(price[0])
        print(price)

Output:

<h1>          - erase
User ramazan  - I only want this
</h1>         - erase
</            - erase
in Engine by (755 points)

1 Answer

+2 votes

Godot has an XMLParser class that you can use to parse HTML. Keep in mind that HTML and XML have some subtle differences, but for most scenarios you should be able to get the result you expect. I really don't recommend manually splitting, searching, or regex-matching the text.

func _ready():
    $HTTPRequest.connect("request_completed", self, "_on_request_completed")
    $HTTPRequest.request("https://godotengine.org/qa/user/ramazan")

func _on_request_completed(_result, _response_code, _headers, body):
    var parser: XMLParser = XMLParser.new()
    parser.open_buffer(body)

    while parser.read() != ERR_FILE_EOF:
        # Only element nodes have a name and attributes.
        if parser.get_node_type() != XMLParser.NODE_ELEMENT:
            continue
        if parser.get_node_name() == "form" and parser.has_attribute("method") and parser.has_attribute("action"):
            # Look the attribute up by name instead of relying on its position.
            var action: String = parser.get_named_attribute_value("action")
            if action.begins_with("../user/"):
                print(action) # ../user/ramazan
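
To get the heading text the question asks for ("User ramazan") rather than the link, the same parser can read the text node inside the page-title block. This is only a sketch for Godot 3.x: extract_page_title is a hypothetical helper name, and it assumes the markup is <div class="page-title"> followed by an <h1>, as the output in the question suggests, and that XMLParser can get through the page's HTML.

```gdscript
extends Node

# Hypothetical helper: returns the text of the first <h1> found after
# <div class="page-title">, or "" if nothing matches.
func extract_page_title(body: PoolByteArray) -> String:
    var parser := XMLParser.new()
    if parser.open_buffer(body) != OK:
        return ""

    var in_title_div := false
    var in_h1 := false
    while parser.read() == OK:
        var type: int = parser.get_node_type()
        if type == XMLParser.NODE_ELEMENT:
            var tag: String = parser.get_node_name()
            # Assumption: the class attribute is exactly "page-title".
            if tag == "div" and parser.get_named_attribute_value_safe("class") == "page-title":
                in_title_div = true
            elif tag == "h1" and in_title_div:
                in_h1 = true
        elif type == XMLParser.NODE_TEXT and in_h1:
            # Trim the newlines and spaces around the heading text.
            var text: String = parser.get_node_data().strip_edges()
            if text != "":
                return text
        elif type == XMLParser.NODE_ELEMENT_END and parser.get_node_name() == "h1":
            in_h1 = false
    return ""
```

Calling print(extract_page_title(body)) from the request_completed callback would then print only the heading text, without the surrounding tags.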

That said, you can get the same text just by parsing the URL.
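
A minimal sketch of the URL approach, assuming the user name is always the last path segment of the profile URL (username_from_url is a hypothetical helper name, not part of Godot):

```gdscript
extends Node

# Hypothetical helper: returns the last path segment of a URL,
# for example the user name at the end of ".../qa/user/<name>".
func username_from_url(url: String) -> String:
    # Drop a trailing slash, then take everything after the last "/".
    return url.rstrip("/").get_file()
```

With the url variable from the question, print("User " + username_from_url(url)) would give the wanted text without downloading the page at all.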

by (255 points)

Thank you very much. I tried to do it with Python, but I learned everything in Godot.
