Ability to get meta information for non webpages #112

zyuhel · 2015-10-26T11:54:43Z

If we use Essence to parse links submitted by end users, they sometimes add links that point directly to files or videos. Of course, getting metadata from those is almost impossible. But if we could check the response headers, we could build imageUrl, title (from the filename), and providerName/providerUrl (guessed from the domain). If the file is small, we could also try to read its embedded metadata. Checking the headers of the requested URL would also help when there is a default provider and someone passes a link to a Debian ISO :) We could tell that it is a big file, mark it as such in the resulting data, and close the request.
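A minimal sketch of the header-based guessing described above, assuming the headers have already been fetched and are passed in as an array. The function name `guessFileMetadata`, the result keys and the 50 MB threshold are hypothetical and not part of Essence:

```php
<?php
// Hypothetical helper, not part of Essence: builds basic metadata for a
// direct file link from its URL and response headers only.
function guessFileMetadata(string $url, array $headers, int $maxBytes = 52428800): array
{
    // Lower-case the header names; accept either "name => value"
    // or "name => [values]" and keep the first value.
    $normalized = [];
    foreach ($headers as $name => $value) {
        $normalized[strtolower($name)] = is_array($value)
            ? (string) reset($value)
            : (string) $value;
    }

    $scheme = (string) parse_url($url, PHP_URL_SCHEME);
    $host   = (string) parse_url($url, PHP_URL_HOST);
    $path   = (string) parse_url($url, PHP_URL_PATH);
    $type   = $normalized['content-type'] ?? '';
    $length = isset($normalized['content-length'])
        ? (int) $normalized['content-length']
        : null;

    return [
        // Title guessed from the file name, without its last extension.
        'title'         => rawurldecode(pathinfo($path, PATHINFO_FILENAME)),
        // Provider guessed from the domain.
        'providerName'  => preg_replace('/^www\./', '', $host),
        'providerUrl'   => $scheme . '://' . $host,
        // An image link can serve as its own imageUrl.
        'imageUrl'      => strpos($type, 'image/') === 0 ? $url : null,
        'contentType'   => $type,
        'contentLength' => $length,
        // Flag big files so the caller can close the request early.
        'isLargeFile'   => $length !== null && $length > $maxBytes,
    ];
}
```

Because the function takes the headers as an argument, it does not care which HTTP client made the request.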

zyuhel · 2015-10-27T01:31:54Z

I tried writing a compatible HTTP client using Guzzle. I extended the interface with a couple of methods that separate body and header retrieval, so that for large files we can retrieve the headers only.
It seems to work.

I used it like this:

$client=new GuzzleClient();
$response=$client->processUrl($url);
print_r($response->getHeaders());
echo $response->getBody();

What do you think about adding an extended interface, so that more complex providers can be written? It seems like a reasonable solution: extended providers can be built on top of it, and it shouldn't break compatibility.
Of course this is just a draft; exception catching/throwing still has to be added. But I would like to hear your response to the idea itself.

interface ClientExtended extends \Essence\Http\Client {

    /**
     *  Retrieves headers of current request
     *  @return array headers
     *  
     */
    public function getHeaders();

    /**
     *  Retrieves body of current request
     *  
     *  @return string The contents.
     */
    public function getBody();

    /**
     *  Opens request to url
     *
     *  @param string $url The URL to fetch contents from.
     *  @return self
     *  @throws Essence\Http\Exception
     */

    public function processUrl($url);

}

and

use GuzzleHttp\Client;

class GuzzleClient implements ClientExtended
{
    protected $_response;
    protected $_user_agent='Katmark';
    public function getHeaders()
    {
        return $this->_response->getHeaders();
    }
    public function getBody()
    {
        // Read whatever remains of the (possibly streamed) body into a string.
        return $this->_response->getBody()->getContents();
    }
    public function processUrl($url)
    {

        // 'stream' => true defers the body download, so headers can be inspected first.
        $client = new \GuzzleHttp\Client(['headers' => ['User-Agent' => $this->_user_agent], 'stream' => true]);
        $this->_response = $client->request('GET', $url);
        return $this;

    }
    public function get($url)
    {
        $client = new \GuzzleHttp\Client(['headers' => ['User-Agent' => $this->_user_agent]]);
        $response = $client->request('GET', $url);

        // Guzzle returns a stream object; cast it so get() returns the contents as a string.
        return (string) $response->getBody();
    }
    public function setUserAgent($agent){
        $this->_user_agent=$agent;
    }
}

felixgirault · 2015-10-27T14:07:02Z

The ability to extract metadata from files could be a cool thing :)
Does it mean that essence should make a headers request before doing anything?

zyuhel · 2015-10-27T14:56:34Z

Essence doesn't need to do anything. There is a filter that matches URLs to providers, so fetching the metadata and building the output can be left to the providers. Essence just selects a provider and forwards the URL to it. :)

The default provider will of course need to make a headers request, because nobody knows what the URL will point to. For other providers it is optional, I think.
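To illustrate, here is a rough sketch of how a default provider could check the headers first and read the body only for small HTML pages. Everything in it is hypothetical: `HeaderClient` is a cut-down stand-in for the draft interface, `DefaultProviderSketch` and `FakeClient` are made-up names, the 1 MB limit is arbitrary, and headers are assumed to arrive as `name => [values]`, as Guzzle returns them:

```php
<?php
// Hypothetical stand-in for the draft extended client interface.
interface HeaderClient
{
    public function processUrl($url);
    public function getHeaders();
    public function getBody();
}

// Hypothetical default provider: decides from the headers alone
// whether the body is worth reading.
final class DefaultProviderSketch
{
    private $client;
    private $maxBytes;

    public function __construct(HeaderClient $client, $maxBytes = 1048576)
    {
        $this->client = $client;
        $this->maxBytes = $maxBytes;
    }

    public function inspect($url)
    {
        $response = $this->client->processUrl($url);
        $headers = array_change_key_case($response->getHeaders(), CASE_LOWER);

        $type = isset($headers['content-type'][0]) ? $headers['content-type'][0] : '';
        $length = isset($headers['content-length'][0])
            ? (int) $headers['content-length'][0]
            : null;

        // Big file: mark it in the result and never touch the body.
        if ($length !== null && $length > $this->maxBytes) {
            return ['url' => $url, 'kind' => 'large-file', 'body' => null];
        }

        // HTML page: the body is needed to extract metadata from it.
        if (strpos($type, 'text/html') === 0) {
            return ['url' => $url, 'kind' => 'page', 'body' => $response->getBody()];
        }

        // Any other file: the headers are all that is used.
        return ['url' => $url, 'kind' => 'file', 'body' => null];
    }
}

// In-memory fake used to exercise the sketch without a network connection.
final class FakeClient implements HeaderClient
{
    public $bodyReads = 0;
    private $headers;
    private $body;

    public function __construct(array $headers, $body = '')
    {
        $this->headers = $headers;
        $this->body = $body;
    }

    public function processUrl($url)
    {
        return $this;
    }

    public function getHeaders()
    {
        return $this->headers;
    }

    public function getBody()
    {
        $this->bodyReads++;
        return $this->body;
    }
}
```

Other providers would be free to skip the header check and call `get()` directly, so existing behaviour stays unchanged.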

felixgirault added the enhancement label Oct 27, 2015

felixgirault self-assigned this Oct 27, 2015

felixgirault mentioned this issue Nov 12, 2015

Class 'Parkour\Transform' not found in /var/www/html/essence/lib/Essence/Di/Container.php on line 76 #115

Closed
