Ability to get meta information for non webpages #112

Open
zyuhel opened this issue Oct 26, 2015 · 3 comments

@zyuhel
Contributor

zyuhel commented Oct 26, 2015

If we use Essence as a parser for links submitted by end users, they sometimes add links directly to files or videos. Getting real metadata for those is almost impossible, but if we check the response headers we can still build imageUrl, title (from the file name) and providerName/providerUrl (guessed from the domain). If the file is small, we can also try to read its embedded metadata. Checking the headers of the requested URL would also help if there is a default provider and someone passes a link to a Debian ISO :) We can tell that it is a big file, mark it as such in the resulting data, and close the request.
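A minimal sketch of the fallback described above, assuming PHP 7+ and that only the URL and the response's `Content-Type` are known. The function `fileMetadata()` and its output keys (`type`, `title`, `providerName`, `providerUrl`, `thumbnailUrl`) are hypothetical, not part of Essence's API.

```php
<?php

/**
 * Hypothetical helper: builds basic metadata for a URL that points
 * directly to a file, using only the URL and its Content-Type header.
 */
function fileMetadata(string $url, string $contentType): array
{
    $parts = parse_url($url) ?: [];
    $scheme = $parts['scheme'] ?? 'http';
    $host = $parts['host'] ?? '';
    $path = $parts['path'] ?? '';

    // Title: the decoded file name without its extension.
    $title = pathinfo(rawurldecode($path), PATHINFO_FILENAME);

    // Provider: guessed from the domain, dropping a leading "www.".
    $providerName = preg_replace('/^www\./i', '', $host);

    $metadata = [
        'type' => 'file',
        'title' => $title,
        'providerName' => $providerName,
        'providerUrl' => $scheme . '://' . $host,
    ];

    // A direct link to an image can serve as its own preview image.
    if (stripos($contentType, 'image/') === 0) {
        $metadata['thumbnailUrl'] = $url;
    }

    return $metadata;
}
```

For example, `fileMetadata('https://www.example.com/files/debian-netinst.iso', 'application/octet-stream')` yields the title `debian-netinst` and the provider name `example.com`, without reading any of the file's contents.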

@zyuhel
Contributor Author

zyuhel commented Oct 27, 2015

I tried to write a compatible HTTP client using Guzzle. I extended the interface with a couple of methods that separate body and header retrieval, so that we can retrieve only the headers for large files.
It seems to work.

I used it like this:

$client = new GuzzleClient();
$response = $client->processUrl($url);
print_r($response->getHeaders());
echo $response->getBody();
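The point of separating headers from body is that a caller can inspect the headers first. A minimal sketch of that check, assuming PHP 7.1+ and headers in the PSR-7 format that Guzzle's `getHeaders()` returns (header name => list of values, with the name's case as sent by the server). `headerValue()` and `shouldFetchBody()` are hypothetical helpers, not Essence or Guzzle API.

```php
<?php

/**
 * Hypothetical helper: looks up a header in a PSR-7 style array
 * (name => list of values), ignoring the case of the header name.
 */
function headerValue(array $headers, string $name): ?string
{
    foreach ($headers as $key => $values) {
        if (strcasecmp((string) $key, $name) === 0) {
            return implode(', ', (array) $values);
        }
    }
    return null;
}

/**
 * Hypothetical policy: download the body unless the server announced
 * a Content-Length larger than $maxBytes.
 */
function shouldFetchBody(array $headers, int $maxBytes): bool
{
    $length = headerValue($headers, 'Content-Length');

    // No usable Content-Length (e.g. a chunked response): size is unknown.
    if ($length === null || !preg_match('/^\d+$/', $length)) {
        return true;
    }

    return (int) $length <= $maxBytes;
}
```

With the usage shown above, a caller could check `shouldFetchBody($response->getHeaders(), 1024 * 1024)` before calling `getBody()`. Fetching when `Content-Length` is absent is a deliberate choice here, since chunked HTML responses often omit it; a stricter version could instead read at most `$maxBytes` from the stream and stop.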

What do you think about adding an extended interface, so that more complex providers could be written? It seems like a reasonable solution: extended providers could be built on top of it without breaking compatibility.
Of course it is just a draft; exception catching/throwing still needs to be added. But I would like to hear your opinion on the idea itself.

interface ClientExtended extends \Essence\Http\Client {

    /**
     *  Retrieves the headers of the current response.
     *
     *  @return array Headers, as returned by the underlying HTTP library.
     */
    public function getHeaders();

    /**
     *  Retrieves the body of the current response.
     *
     *  @return string The contents.
     */
    public function getBody();

    /**
     *  Opens a request to the given URL.
     *
     *  @param string $url The URL to fetch contents from.
     *  @return self
     *  @throws Essence\Http\Exception
     */
    public function processUrl($url);

}

and

use GuzzleHttp\Client;

/**
 * Draft Essence HTTP client backed by Guzzle 6.
 */
class GuzzleClient implements ClientExtended
{
    /** @var \Psr\Http\Message\ResponseInterface|null Response of the last processUrl() call. */
    protected $_response;
    protected $_user_agent = 'Katmark';

    public function getHeaders()
    {
        return $this->_response->getHeaders();
    }

    public function getBody()
    {
        // Read the streamed body in 1 KB chunks until EOF.
        $stream = $this->_response->getBody();
        $contents = '';
        while (!$stream->eof()) {
            $contents .= $stream->read(1024);
        }
        return $contents;
    }

    public function processUrl($url)
    {
        // With 'stream' => true, the body is not downloaded up front: it is
        // read only when getBody() is called, so the headers can be inspected
        // first (e.g. to skip large files).
        $client = new Client([
            'headers' => ['User-Agent' => $this->_user_agent],
            'stream' => true,
        ]);
        $this->_response = $client->request('GET', $url);
        return $this;
    }

    public function get($url)
    {
        $client = new Client(['headers' => ['User-Agent' => $this->_user_agent]]);
        $response = $client->request('GET', $url);

        // getBody() returns a PSR-7 stream; cast it so callers get a string.
        return (string) $response->getBody();
    }

    public function setUserAgent($agent)
    {
        $this->_user_agent = $agent;
    }
}

@felixgirault
Member

The ability to extract metadata from files could be a cool thing :)
Does that mean Essence would have to make a headers request before doing anything else?

@zyuhel
Contributor Author

zyuhel commented Oct 27, 2015

Essence itself doesn't need to do anything. There is already a filter that matches URLs to providers, so fetching the metadata and building the output can be left to the providers: Essence just selects a provider and forwards the URL to it. :)

The default provider will of course need to make a headers request, because nobody knows in advance what the URL points to. For other providers it is optional, I think.
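A sketch of the first step such a default provider could take, assuming PHP 7.1+: after a headers-only request, the media type in `Content-Type` decides whether to parse the URL as a web page or to build file-level metadata. `defaultProviderStrategy()` and its `'page'`/`'file'` results are hypothetical names; treating a missing `Content-Type` as a page (so that parsing is still attempted) is a judgment call.

```php
<?php

/**
 * Hypothetical routing step for a default provider: returns 'page' when
 * the response looks like an HTML document, 'file' otherwise.
 */
function defaultProviderStrategy(?string $contentType): string
{
    // No Content-Type: still try to parse the URL as a page.
    if ($contentType === null || trim($contentType) === '') {
        return 'page';
    }

    // Strip parameters such as "; charset=utf-8" and normalize case.
    $mediaType = strtolower(trim(explode(';', $contentType)[0]));

    $htmlTypes = ['text/html', 'application/xhtml+xml'];
    return in_array($mediaType, $htmlTypes, true) ? 'page' : 'file';
}
```

A `'file'` result would then lead to header-based metadata (title from the file name, provider from the domain) and the request being closed without downloading the body.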

2 participants