SoFunction
Updated on 2025-04-11

Using puppeteer from PHP to scrape page content rendered by JS

Recently I needed to crawl web page content that is rendered by JavaScript, so I looked into the ways to do it. The approach relies mainly on puppeteer, which is a Node library. To use it from PHP, you also need spatie/browsershot.

Environment requirements

Environment      Requirement
Node             >=7.6.0
PHP              >=7.1
PHP extensions   php_sockets, php_exif

puppeteer

Puppeteer is a Node library. I installed it with npm directly inside the PHP project and then call it through spatie/browsershot. Alternatively, you can create a separate Node project, install the library there, and expose a port so that a URL can be passed in and the rendered HTML returned through an HTTP interface.

npm i puppeteer --save
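
For the second option (a separate Node project that exposes a port), a minimal sketch of such a service might look like the following. It is only an illustration and not code from my project: the /render path, the url query parameter, port 3000, and the function names parseTargetUrl, renderHtml and createRenderServer are all assumptions made for this example. It uses only basic puppeteer calls (launch, newPage, goto, content, close).

```javascript
const http = require('http');
const { URL } = require('url');

// Extract and validate the ?url= parameter from a request path.
// Returns the normalized target URL, or null if it is missing or not http(s).
function parseTargetUrl(requestPath) {
  const query = new URL(requestPath, 'http://localhost').searchParams;
  const target = query.get('url');
  if (!target) return null;
  try {
    const parsed = new URL(target);
    return ['http:', 'https:'].includes(parsed.protocol) ? parsed.href : null;
  } catch (err) {
    return null; // not a parseable absolute URL
  }
}

// Render a page in headless Chromium and return its HTML after JS has run.
async function renderHtml(targetUrl) {
  const puppeteer = require('puppeteer'); // loaded lazily; must be installed via npm
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // 'networkidle0' waits until there are no open network connections.
    await page.goto(targetUrl, { waitUntil: 'networkidle0' });
    return await page.content();
  } finally {
    await browser.close();
  }
}

// Build the HTTP server. `render` is injectable so it can be replaced.
function createRenderServer(render = renderHtml) {
  return http.createServer(async (req, res) => {
    const target = parseTargetUrl(req.url);
    if (!target) {
      res.writeHead(400, { 'Content-Type': 'text/plain; charset=utf-8' });
      res.end('Missing or invalid "url" query parameter');
      return;
    }
    try {
      const html = await render(target);
      res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
      res.end(html);
    } catch (err) {
      res.writeHead(502, { 'Content-Type': 'text/plain; charset=utf-8' });
      res.end('Render failed: ' + err.message);
    }
  });
}

// To start it (port 3000 is an arbitrary choice for this sketch):
// createRenderServer().listen(3000);
```

With something like this running, PHP could fetch the rendered HTML with any HTTP client by requesting /render?url=... on that port. Launching a new browser per request keeps the sketch short; a real service would probably reuse one browser instance.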

Install Chromium offline

Installing puppeteer also downloads Chromium. Because that download may fail for well-known network reasons, an offline download method is described below.

Skip the Chromium download

If the previous command is already running and is downloading Chromium, press Ctrl+C to stop it. If you have not run it yet, install with the following command instead.

npm i puppeteer --ignore-scripts

Get the Chromium revision number to download

In /node_modules/puppeteer/, search for chromium_revision to find the corresponding revision number:

"puppeteer": {
    "chromium_revision": "756035",
    "firefox_revision": "latest"
}
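
If you would rather not look the number up by hand, the lookup can be sketched in a few lines of Node. This is an illustration under assumptions: the function names are made up for this example, and readInstalledRevision assumes the entry shown above lives in a package.json file inside node_modules/puppeteer/, which may not hold for every puppeteer version.

```javascript
const fs = require('fs');
const path = require('path');

// Pull the pinned Chromium revision out of a manifest's JSON text.
// Returns the revision as a string, or null when the key is absent.
function extractChromiumRevision(manifestText) {
  const manifest = JSON.parse(manifestText);
  const section = manifest.puppeteer || {};
  return section.chromium_revision || null;
}

// Assumption: the manifest is package.json inside node_modules/puppeteer/.
function readInstalledRevision(projectRoot) {
  const manifestPath = path.join(projectRoot, 'node_modules', 'puppeteer', 'package.json');
  return extractChromiumRevision(fs.readFileSync(manifestPath, 'utf8'));
}
```

Calling readInstalledRevision('.') from the project root would then return the same number found by searching manually.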

Download the corresponding Chromium build

Replace the placeholder in braces below with the revision number found above. For example, I am on Windows x64 locally, so my download address is /chromium-browser-snapshots/Win_x64/756035/

macOS download address:
/chromium-browser-snapshots/Mac/{chromiumVersion}/

Windows 64-bit download address:
/chromium-browser-snapshots/Win_x64/{chromiumVersion}/

Windows 32-bit download address:
/chromium-browser-snapshots/Win/{chromiumVersion}/

Linux x86 download address:
/chromium-browser-snapshots/Linux/{chromiumVersion}/

Linux x64 download address:
/chromium-browser-snapshots/Linux_x64/{chromiumVersion}/
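
The substitution in the list above can be sketched as a small helper. The folder names come from the addresses listed; the short platform keys (mac, win64, win32, linux32, linux64) and the function names are hypothetical and exist only in this example. The helper returns just the path portion, exactly as the addresses are written above.

```javascript
// Folder names taken from the download addresses listed above.
// The keys on the left are labels invented for this sketch.
const SNAPSHOT_FOLDERS = {
  mac: 'Mac',
  win64: 'Win_x64',
  win32: 'Win',
  linux32: 'Linux',
  linux64: 'Linux_x64',
};

// Build the snapshot path for a platform key and a numeric revision.
function snapshotPath(platformKey, revision) {
  if (!Object.prototype.hasOwnProperty.call(SNAPSHOT_FOLDERS, platformKey)) {
    throw new Error('Unknown platform: ' + platformKey);
  }
  if (!/^\d+$/.test(String(revision))) {
    throw new Error('Revision must be numeric: ' + revision);
  }
  return '/chromium-browser-snapshots/' + SNAPSHOT_FOLDERS[platformKey] + '/' + revision + '/';
}

// Map Node's process.platform / process.arch values to one of the keys above.
function detectPlatformKey(platform = process.platform, arch = process.arch) {
  if (platform === 'darwin') return 'mac';
  if (platform === 'win32' && arch === 'x64') return 'win64';
  if (platform === 'win32' && arch === 'ia32') return 'win32';
  if (platform === 'linux' && arch === 'x64') return 'linux64';
  if (platform === 'linux' && arch === 'ia32') return 'linux32';
  throw new Error('No snapshot folder known for ' + platform + '/' + arch);
}
```

For my Windows x64 machine, snapshotPath(detectPlatformKey(), '756035') gives the same path as the example earlier in this section.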

Unzip

Unzip the downloaded Chromium package into the .local_chromium/win64-{chromium revision number}/ directory inside the puppeteer directory. In my case that gives /node_modules/puppeteer/.local_chromium/win64-756035/chrome-win/. Done!
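
To double check that the files ended up in the right place, a quick sanity check along these lines may help. It is a sketch that covers only the win64 layout from my example above; the function names are made up for this illustration, and other platforms would need their own folder names.

```javascript
const fs = require('fs');
const path = require('path');

// Expected unzip target for the win64 layout described above.
function localChromiumDir(projectRoot, revision) {
  return path.join(
    projectRoot, 'node_modules', 'puppeteer',
    '.local_chromium', 'win64-' + revision, 'chrome-win'
  );
}

// True when the target directory exists and contains at least one entry.
function isChromiumUnpacked(projectRoot, revision) {
  try {
    return fs.readdirSync(localChromiumDir(projectRoot, revision)).length > 0;
  } catch (err) {
    return false; // directory is missing or unreadable
  }
}
```

Running isChromiumUnpacked('.', '756035') from the project root should return true once the archive has been extracted into that directory.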

spatie/browsershot

browsershot is a composer package. I have used spatie/laravel-permission before; both are produced by the same team.

composer require spatie/browsershot

Usage

In fact, the hard part is finding and installing the right tools; using them is very simple. Here is a minimal example. For more methods, see the official documentation.

<?php

use Spatie\Browsershot\Browsershot;

class Spider
{
    /**
     * Get the HTML of the page body after JavaScript has rendered it.
     *
     * @param string $url Address of the page to render
     * @return string
     */
    public static function getBodyHtml(string $url): string
    {
        return Browsershot::url($url)->bodyHtml();
    }
}
