How to get only the text content of an article in PHP

2022-11-30 10:00:53

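How to get only an article's text content in PHP: 1. Create a sample PHP file; 2. Define a function "function curl_request($url, $post = '', $cookie = '', $returnCookie = 0) {...}" that fetches the web page, keeps only its text content, and strips out the HTML tags.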


Environment used for this tutorial: Windows 7, PHP 8.1, Dell G3 computer.

How can PHP get only the text content of an article?

Fetch only the text inside a web page's body with PHP and filter out the HTML tags

The goal: fetch only a page's text content in PHP and strip its tags. Let's get started!
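The core of the approach does not need the network at all. Here is a minimal offline sketch, assuming the HTML is already available as a UTF-8 string. It isolates <body> with the same regex, drops the tags with strip_tags(), and deletes the same whitespace characters. The name body_text_from_html is hypothetical.

```php
<?php
// Offline sketch of the idea: keep only the <body> part of an HTML string,
// strip the tags, then delete whitespace.
// body_text_from_html is a hypothetical name used only for illustration.
function body_text_from_html(string $html): string
{
    // Take what sits between <body ...> and </body>; fall back to the whole string.
    $body = preg_match('/<body.*?>(.*?)<\/body>/is', $html, $m) ? $m[1] : $html;
    // strip_tags() removes the markup but keeps the text between the tags.
    $text = strip_tags($body);
    // Delete half-width spaces, full-width spaces (U+3000), line breaks and tabs.
    return str_replace([' ', "\u{3000}", "\n", "\r", "\t"], '', $text);
}

echo body_text_from_html('<html><head><title>T</title></head><body><p>Hello</p> <b>world</b></body></html>');
// prints "Helloworld"
```

Deleting every space works well for Chinese text, which has no spaces between words. For English, however, it glues words together, as the "Helloworld" output shows.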

The code is as follows:

<?php
/**
 * Fetch a URL with cURL.
 * - $post: array of form fields; when non-empty, the request is sent as POST.
 * - $cookie: raw Cookie header value, e.g. "a=1; b=2".
 * - $returnCookie: when truthy, return ['cookie' => ..., 'content' => ...].
 * Otherwise return only the text inside <body>, with tags and whitespace removed.
 */
function curl_request($url, $post = '', $cookie = '', $returnCookie = 0)
{
    // Reuse the visiting browser's User-Agent when there is one (web server);
    // fall back to a fixed string otherwise (e.g. when run from the CLI).
    $ua = $_SERVER['HTTP_USER_AGENT'] ?? 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)';
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, $ua);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($curl, CURLOPT_AUTOREFERER, 1);
    curl_setopt($curl, CURLOPT_REFERER, "https://www.baidu.com");
    if ($post) {
        curl_setopt($curl, CURLOPT_POST, 1);
        curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($post));
    }
    if ($cookie) {
        curl_setopt($curl, CURLOPT_COOKIE, $cookie);
    }
    // TLS certificate checks are disabled: acceptable for a quick test,
    // unsafe in production.
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($curl, CURLOPT_HEADER, $returnCookie);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($curl);
    if (curl_errno($curl)) {
        $error = curl_error($curl);
        curl_close($curl);
        return $error;
    }
    curl_close($curl);
    if ($returnCookie) {
        // With CURLOPT_HEADER enabled, the response headers precede the body.
        list($header, $body) = explode("\r\n\r\n", $data, 2);
        preg_match_all("/Set\-Cookie:([^;]*);/", $header, $matches);
        $info = [];
        $info['cookie'] = isset($matches[1][0]) ? substr($matches[1][0], 1) : '';
        $info['content'] = $body;
        return $info;
    }
    // Normalize the page to UTF-8, detecting among UTF-8, GBK, GB2312 and BIG5.
    $data = mb_convert_encoding($data, 'UTF-8', 'UTF-8,GBK,GB2312,BIG5');
    // Keep only what lies between <body ...> and </body>.
    preg_match("/<body.*?>(.*?)<\/body>/is", $data, $match);
    $str = trim($match[1] ?? '');
    // strip_tags() keeps the code inside <script>/<style>, so drop those blocks first.
    $str = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $str);
    $html = strip_tags($str);
    // Remove half-width spaces, full-width spaces (U+3000), line breaks and tabs.
    $search = array(" ", "\u{3000}", "\n", "\r", "\t");
    return str_replace($search, '', $html);
}

// Placeholder URL: replace it with the page you want to read.
$url = 'https://example.com/';
echo curl_request($url);
?>
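Regexes over HTML are fragile: a stray "<body" inside a comment or script, or unclosed tags, can throw them off. An alternative not used in this article is PHP's bundled DOM extension, which parses the page and reads the body's text directly. This sketch assumes the dom and mbstring extensions are enabled and that the input is already UTF-8 (for example, $data after the mb_convert_encoding call above). The name body_text_via_dom is hypothetical.

```php
<?php
// Alternative sketch, not the article's method: parse the HTML with PHP's DOM
// extension, drop <script>/<style>, and read the body's text.
// body_text_via_dom is a hypothetical name; the input is assumed to be UTF-8.
function body_text_via_dom(string $html): string
{
    if (trim($html) === '') {
        return ''; // loadHTML() rejects an empty string
    }
    // Turn every non-ASCII character into a numeric entity so libxml cannot
    // misread the UTF-8 bytes as ISO-8859-1.
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real pages are rarely valid HTML
    $doc->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    // Collect first, then remove, so the live node lists are not changed mid-loop.
    $remove = [];
    foreach (['script', 'style'] as $tag) {
        foreach ($body->getElementsByTagName($tag) as $node) {
            $remove[] = $node;
        }
    }
    foreach ($remove as $node) {
        $node->parentNode->removeChild($node);
    }
    // Collapse ASCII whitespace, no-break spaces and full-width spaces to one space.
    return trim(preg_replace('/[\s\x{00A0}\x{3000}]+/u', ' ', $body->textContent));
}

echo body_text_via_dom('<html><body><p>Hello <b>world</b></p><script>var x = 1;</script></body></html>');
```

Unlike the regex version, this collapses whitespace instead of deleting it, so English words stay separated. It also never returns the contents of scripts or style sheets.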


That's all on how to get only the text content of an article in PHP. For more, see the other related articles on TW511.COM!