emmanuelroecker/php-simply-html

php-simply-html

Add, delete, modify, and read HTML tags using CSS selectors.

Extract all text, links, and headings (a summary) from an HTML file.

It relies on the PHP DOM extension and the Symfony CssSelector component.
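
To give a rough idea of what combining these two pieces looks like, here is a minimal sketch that uses only the PHP DOM extension. It does not show the library's actual internals. Symfony CssSelector can translate a CSS selector into an XPath expression; the XPath string below is a hand-written stand-in assumed to be equivalent to the selector `head style`, and the sample markup is invented for illustration.

```php
<?php
// Sketch only: plain PHP DOM, no GlHtml and no Symfony CssSelector.
// The XPath string is an assumed hand-written equivalent of the
// CSS selector "head style".

$html = '<html><head><style>p{color:red}</style><title>Demo</title></head>'
      . '<body><p>Hello</p></body></html>';

$doc = new DOMDocument();
// Suppress warnings that libxml may raise on imperfect markup.
@$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$cssEquivalent = 'descendant-or-self::head/descendant-or-self::*/style';

// Copy the matches into an array before removing them from the tree.
$matches = iterator_to_array($xpath->query($cssEquivalent));
foreach ($matches as $node) {
    $node->parentNode->removeChild($node);
}

$result = $doc->saveHTML();
echo $result;
```

With GlHtml the same removal is written as `$dom->delete('head style')`, so the XPath layer stays hidden behind the CSS selector.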

Installation

This library is available on Packagist.

The recommended way to install it is through Composer.

Edit your composer.json and add:

{
    "require": {
       "glicer/simply-html": "dev-master"
    }
}

Install the dependencies:

php composer.phar install

How to modify HTML?

// Must point to composer's autoload file.
require 'vendor/autoload.php';

use GlHtml\GlHtml;

//read index.html contents
$html = file_get_contents("index.html");

$dom = new GlHtml($html);

//delete all style tags inside head
$dom->delete('head style');

//prepare a new stylesheet link tag
$style = '<link href="solver.css" type="text/css" rel="stylesheet"></link>';

//add the new link tag to the first head element
$dom->get("head")[0]->add($style);

//replace the first span element with an empty h1
$dom->get("span")[0]->replaceMe("<h1></h1>");

//write the result to a new html file
file_put_contents("result.html", $dom->html());

How to get all text inside HTML?

// Must point to composer's autoload file.
require 'vendor/autoload.php';

use GlHtml\GlHtml;

//read index.html contents
$html = file_get_contents("index.html");

$dom = new GlHtml($html);

//array of sentences (strings)
$sentences = $dom->getSentences();

print_r($sentences);

How to get all links inside HTML?

// Must point to composer's autoload file.
require 'vendor/autoload.php';

use GlHtml\GlHtml;

//read index.html contents
$html = file_get_contents("index.html");

$dom = new GlHtml($html);

//array of URLs (strings)
$links = $dom->getLinks();

print_r($links);
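
For comparison, a link extraction of this kind can be sketched with the DOM extension alone. This is not GlHtml's implementation: `collectLinks()` is a hypothetical helper, the sample markup is invented, and the sketch assumes only `href` attributes on `a` elements are of interest, which may differ from what `getLinks()` actually collects.

```php
<?php
// Sketch only: collects href values with the DOM extension alone.
// collectLinks() is a hypothetical helper, not part of GlHtml.

function collectLinks(string $html): array
{
    $doc = new DOMDocument();
    // Suppress warnings that libxml may raise on imperfect markup.
    @$doc->loadHTML($html);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Skip anchors that carry no href (for example named anchors).
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

$html = '<html><body><a href="http://dev.glicer.com">Blog</a>'
      . '<a name="top">No href</a><a href="/about">About</a></body></html>';

$links = collectLinks($html);
print_r($links);
```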

How to extract HTML headings (h1, h2, ..., h6)?

<?php
// Must point to composer's autoload file.
require 'vendor/autoload.php';

use GlHtml\GlHtml;

//read index.html contents
$html = file_get_contents("index.html");

$dom = new GlHtml($html);

//array of GlHtmlSummary objects
$summary = $dom->getSummary();

echo $summary[0]->getNode()->getText() . ' ' . $summary[0]->getLevel();

/* 
  extract html headings tree
*/
$summaryTree = $dom->getSummaryTree();
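
To illustrate what a flat list of headings with levels might contain, here is a minimal sketch built on the DOM extension alone. It is not the library's code: `collectHeadings()` is a hypothetical helper, the array shape (`level`, `text`) is an assumption made for this example, and the sample markup is invented.

```php
<?php
// Sketch only: plain DOM + XPath; collectHeadings() is a hypothetical
// helper and does not reproduce GlHtml's getSummary().

function collectHeadings(string $html): array
{
    $doc = new DOMDocument();
    // Suppress warnings that libxml may raise on imperfect markup.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $query = '//*[self::h1 or self::h2 or self::h3 '
           . 'or self::h4 or self::h5 or self::h6]';

    $headings = [];
    foreach ($xpath->query($query) as $node) {
        $headings[] = [
            // "h2" -> 2: the level is the digit after the leading "h".
            'level' => (int) substr($node->nodeName, 1),
            'text'  => trim($node->textContent),
        ];
    }
    return $headings;
}

$html = '<html><body><h1>Title</h1><p>x</p><h2>Part A</h2>'
      . '<h3>Detail</h3><h2>Part B</h2></body></html>';

$headings = collectHeadings($html);
print_r($headings);
```

A tree such as the one `getSummaryTree()` returns could then be derived from this flat list by nesting each heading under the closest preceding heading with a lower level.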

Running Tests

Launch from the command line (on Linux or macOS, use vendor/bin/phpunit instead):

vendor\bin\phpunit

License MIT

Contact

Authors: Emmanuel ROECKER & Rym BOUCHAGOUR

Web Development Blog - http://dev.glicer.com