Weird characters in loaded HTML #187

Open
marcselman opened this issue Apr 28, 2015 · 1 comment

@marcselman

Hi,

I noticed some weird characters popping up in the HTML when using CQ.CreateFromUrl.
Here is an example:

var c = CQ.CreateFromUrl("http://www.cswonen.nl/sint-willebrord-monseigneur-van-hooydonkstraat-NLH00452695006");
c.Document.Body.OuterHTML.Dump();

When you execute the example above (in LINQPad, for instance), you'll notice this in the output:

<img src="http://public���������������������������������������������������������������������������������������������������������������������������������������������.parariusoffice.nl/45/photos/export/2695006.1429611799-844.jpg" alt="Foto van">

I have no idea where the weird characters come from. I don't see them in the HTML source when loading it in the browser or in Sublime Text. If I load the page into a string in C# and then load that string into a CQ object, it works without problems.

Do you have any idea what this could be?
Thanks.

@rufanov
Contributor

rufanov commented May 20, 2015

It's a bug. There were unwanted nulls after the first packet from the web server if its size was less than 4096.

Here is a slightly messy test that illustrates the bug.

using CsQuery.HtmlParser;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using NUnit.Framework;
using System;
using System.IO;
using System.Text;
using Assert = NUnit.Framework.Assert;

namespace CsQuery.Tests.Issues
{
    [TestFixture, TestClass]
    public class Issue187 : CsQueryTest
    {
        [Test, TestMethod]
        public void Issue187Test()
        {
            using (var mockStream = new Issue187MockStream())
            {
                var factory = new ElementFactory();
                var dom = factory.Parse(mockStream, Encoding.UTF8);

                Assert.AreEqual(Issue187MockStream.HTML, dom.FirstChild.OuterHTML);
            }
        }
    }
    public class Issue187MockStream : Stream
    {
        public const string HTML = @"<html><head></head><body><a href=""http://test.example.com"">Test</a></body></html>";

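        public override int Read(byte[] buffer, int offset, int count)
        {
            // Serve the document in two chunks to simulate a web server whose
            // first packet is shorter than the buffer the parser reads into.
            // Note: count is assumed to be at least as large as each chunk.
            byte[] bytes = Encoding.UTF8.GetBytes(HTML);

            int splitPosition = bytes.Length / 2;
            int length;

            if (Position == 0)
            {
                // First read: return only the first half.
                length = splitPosition;
                Array.Copy(bytes, 0, buffer, offset, length);
            }
            else if (Position == splitPosition)
            {
                // Second read: return the remainder.
                length = bytes.Length - splitPosition;
                Array.Copy(bytes, splitPosition, buffer, offset, length);
            }
            else
            {
                // Everything has been served: signal end of stream.
                length = 0;
            }

            Position += length;
            return length;
        }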


        public override bool CanRead { get { return true; } }
        public override bool CanSeek { get { return false; } }
        public override bool CanWrite { get { return false; } }

        public override long Position { get; set; }
        public override void Flush() { return; }

        public override long Length { get { throw new NotImplementedException(); } }
        public override long Seek(long offset, SeekOrigin origin) { throw new NotImplementedException(); }
        public override void SetLength(long value) { throw new NotImplementedException(); }
        public override void Write(byte[] buffer, int offset, int count) { throw new NotImplementedException(); }
    }
}
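
The general failure mode can be sketched without CsQuery. This is only an illustration and an assumption about the kind of mistake involved, not the library's actual code: when `Stream.Read` returns fewer bytes than the buffer holds, copying the whole buffer instead of the returned count carries the unused zero bytes into the output. The names `ShortReadDemo`, `ChunkedStream`, `ReadIgnoringCount` and `ReadHonoringCount` are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class ShortReadDemo
{
    // Hypothetical stream that returns at most `chunk` bytes per Read call,
    // the way a network stream may return fewer bytes than requested.
    public sealed class ChunkedStream : MemoryStream
    {
        private readonly int chunk;

        public ChunkedStream(byte[] data, int chunk) : base(data)
        {
            this.chunk = chunk;
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            return base.Read(buffer, offset, Math.Min(count, chunk));
        }
    }

    // Faulty pattern: treats the entire buffer as valid after every read.
    public static string ReadIgnoringCount(Stream input, int bufferSize)
    {
        var buffer = new byte[bufferSize];
        var output = new MemoryStream();
        while (input.Read(buffer, 0, buffer.Length) > 0)
        {
            // Copies zero or stale bytes beyond what Read actually filled.
            output.Write(buffer, 0, buffer.Length);
        }
        return Encoding.UTF8.GetString(output.ToArray());
    }

    // Correct pattern: copies only the number of bytes Read reported.
    public static string ReadHonoringCount(Stream input, int bufferSize)
    {
        var buffer = new byte[bufferSize];
        var output = new MemoryStream();
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
        }
        return Encoding.UTF8.GetString(output.ToArray());
    }
}
```

Feeding both methods a `ChunkedStream` whose chunk size is smaller than the buffer size shows the difference: `ReadHonoringCount` returns the original text, while `ReadIgnoringCount` returns text with NUL characters after the first chunk.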

jamietre added a commit that referenced this issue May 22, 2015