Security issues because of serving html files #514

Closed
Otto-AA opened this issue Mar 22, 2023 · 18 comments

Comments

@Otto-AA

Otto-AA commented Mar 22, 2023

Currently Solid pods serve html files which get rendered in the browser. This breaks one base assumption of web security, namely that html files from the same origin are equally trustworthy, and leads to various potential security issues. From my point of view this should be addressed in the specification, to ensure the pods have the same behaviour in this regard.

Security issue

In web security it is assumed that apps on the same origin are equally trustworthy. Many security mechanisms block cross-origin communication and access, but within the same origin apps are less limited. Citing MDN: "The same-origin policy is a critical security mechanism that restricts how a document or script loaded by one origin can interact with a resource from another origin."

Solid pods break the assumption that apps on the same origin are equally trustworthy. I could give someone access to /music/index.html and someone else to /banking/index.html. I do not want someone with access to /music to be able to access my banking details, hence I trust them in different ways. However, they are on the same origin and thus the browser does not prevent access between these apps. Frankly, we throw away the "critical security mechanism".
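To make the point concrete, here is a minimal sketch of the comparison a browser effectively performs. The pod URLs and the helper name `isSameOrigin` are hypothetical and only serve as illustration; the sketch relies on the standard `URL` API.

```typescript
// Hypothetical pod URLs, used only for illustration.
const musicApp = "https://pod.example.org/music/index.html";
const bankingApp = "https://pod.example.org/banking/index.html";
const otherPod = "https://other.example.org/banking/index.html";

// Two URLs are same-origin when scheme, host and port all match.
// The path plays no role in the comparison.
function isSameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

console.log(isSameOrigin(musicApp, bankingApp)); // true
console.log(isSameOrigin(musicApp, otherPod)); // false
```

Because the folder (`/music` vs. `/banking`) is part of the path and not of the origin, access control on folders does not translate into a browser-level boundary between the two apps.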

EDIT: this does not only apply to html files. For instance opening malicious svg files in the browser can execute javascript on the same origin.

Example issues

The following issues arise from the fact that html files on the same pod share the same origin. The first ones require that the user already uses one app (e.g. /banking/index.html) and someone with access to /drawing/index.html creates a malicious app:

  1. LocalStorage: this storage is shared within the same origin. The drawing app would be able to access stored data from the banking app.
  2. Cookies: By default, cookies use SameSite=Lax, which means they are not sent on most cross-site requests but are sent within the same origin. Requests from the drawing app would include cookies from the banking app. A real case is this issue where the drawing app would be able to use the NSS session cookies.
  3. Iframes: The drawing app can include the banking app as an iframe. As it is same origin, it can access its contents. A real case is this issue where Mashlib accidentally gives html files it displays access to its window.
  4. Url history: the url is modifiable as long as the origin stays the same. Thus the drawing app could use window.history.pushState('page2', 'Title', '/banking/index.html'); to trick the user into believing they are looking at the banking app.

The following example does not require any previously used app on the pod, ie it is a security issue even when there are currently no html files on the pod:

  1. Service Workers: Service Workers are installed client side and then have control over all the network traffic within their scope. The scope is everything within the same folder or subfolders (eg /foo/serviceWorker.js can intercept requests to /foo/**/*). This can circumvent access control and other pod restrictions.
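The scope rule for Service Workers can be sketched as follows. This is a simplified model, assuming the default scope (the directory of the worker script) and the prefix matching that browsers apply to page URLs; the function names and URLs are hypothetical.

```typescript
// Default scope of a service worker: the directory that contains its script.
function defaultScope(scriptUrl: string): string {
  return new URL("./", scriptUrl).href;
}

// A page is controlled by the worker when its URL starts with the scope URL.
function controlsPage(scriptUrl: string, pageUrl: string): boolean {
  return new URL(pageUrl).href.startsWith(defaultScope(scriptUrl));
}

const worker = "https://pod.example.org/foo/serviceWorker.js";

console.log(defaultScope(worker)); // "https://pod.example.org/foo/"
console.log(controlsPage(worker, "https://pod.example.org/foo/bar/page.html")); // true
console.log(controlsPage(worker, "https://pod.example.org/banking/index.html")); // false
```

Note that a worker script stored at the pod root would, under this rule, have a scope covering every page on the pod.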

I’m pretty sure there are other cases, where it could be troublesome that apps of different trust-level are hosted on the same origin.

Example usages

So far I have heard from @josephguillaume that he uses custom html files to add custom extensions to SolidOS (see this issue). He also mentioned other possible usages in this thread: direct use of web components, local add-ons as per the other thread, hosting apps yourself, not having to run a separate server.

There are probably more unknown usages that rely on pods serving html files.

Solution ideas

The following potential solutions have been mentioned so far. Note that they are mostly intended as starting ideas, not fully thought-through proposals.

  1. Do not serve html files at all (ie nothing with content-type: text/html)
  2. Sandbox html files, for instance with the CSP sandbox header
  3. Sanitize html, eg remove javascript (on save or when serving)
  4. "placing additional restrictions on where public resources can be stored and how clients are allowed to interact with pods, etc." (by @joachimvh; not sure what this could look like)
  5. (EDITED) Add the Content-Disposition: attachment header for all files, to ensure they are downloaded and not rendered in the browser

One thing to note is that this is not only about html files, but about anything that is served as text/html. For instance, Community Solid Server (CSS) can serve markdown files as html, with the same security implications.
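Ideas 2 and 5 both come down to adding a response header. A minimal sketch of what a server might attach is shown below; the function name `hardenedHeaders` and the `Strategy` type are hypothetical, and the extra `X-Content-Type-Options: nosniff` header is an assumption added here, not something proposed in this issue.

```typescript
type Strategy = "sandbox" | "attachment";

// Build response headers for a user-created file.
// "sandbox" lets the browser render the file, but in an isolated origin with scripts disabled.
// "attachment" asks the browser to download the file instead of rendering it.
function hardenedHeaders(contentType: string, strategy: Strategy): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": contentType,
    // Assumption, not from the issue: stop the browser from guessing another type.
    "X-Content-Type-Options": "nosniff",
  };
  if (strategy === "sandbox") {
    headers["Content-Security-Policy"] = "sandbox";
  } else {
    headers["Content-Disposition"] = "attachment";
  }
  return headers;
}

console.log(hardenedHeaders("text/html", "sandbox"));
console.log(hardenedHeaders("image/svg+xml", "attachment"));
```

Both strategies keep the original Content-Type, so the information that a file is html is preserved for clients that fetch it programmatically.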

Related links

@sandervd

This looks like a same origin policy issue?
Perhaps a wildcard DNS entry on the domain serving the HTML could offer a solution?
Then use the hierarchy in reverse order on the domain, e.g.
banking.example.org/index.html
drawing.example.org/index.html
etc.
My guess is this is why github pages etc. create a subdomain for each organization.
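A rough sketch of such a mapping, assuming only the top-level folder is moved into the host name (the comment above suggests reversing the whole hierarchy, which this sketch does not attempt). The function name `toSubdomainUrl` is hypothetical, and the folder name is assumed to be a valid DNS label.

```typescript
// Move the first path segment of a URL into a subdomain, so that each
// top-level folder ends up on its own origin.
function toSubdomainUrl(pathStyleUrl: string): string {
  const url = new URL(pathStyleUrl);
  const segments = url.pathname.split("/").filter((s) => s.length > 0);
  // Files directly at the root have no folder to move; leave them unchanged.
  if (segments.length < 2) {
    return url.href;
  }
  const [folder, ...rest] = segments;
  url.hostname = `${folder}.${url.hostname}`;
  url.pathname = "/" + rest.join("/");
  return url.href;
}

console.log(toSubdomainUrl("https://example.org/banking/index.html"));
// "https://banking.example.org/index.html"
console.log(toSubdomainUrl("https://example.org/drawing/index.html"));
// "https://drawing.example.org/index.html"
```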

@Otto-AA
Author

Otto-AA commented Mar 22, 2023

This looks like a same origin policy issue?

Exactly. This issue is about how we handle this with html files hosted on a pod, which thus share the domain.

For instance:

As they are on the same pod, we cannot give them different domains. One current example is that NSS serves SolidOS on the pod domain. Any html file on the same pod will run under the same origin, resulting in security issues.

@elf-pavlik
Member

As they are on the same pod, we cannot give them different domains.

People should be able to use any number of solid storage instances easily. I have a somewhat related issue, #377.
For example, if people self-host solid apps (SPA / PWA), they should not allow anyone they don't trust to host their apps in the same storage. I would most likely have dedicated storage to host my apps, this storage type would allow it, for data I would use multiple storage instances which would not allow hosting apps. I don't think 'one size fits all' will provide a clean solution for all the use cases.

BTW, in all the implementation work that I do on SAI, everything gets tested with data where a mix of individuals and organizations each have multiple storage instances.

@Otto-AA
Author

Otto-AA commented Mar 22, 2023

For example, if people self-host solid apps (SPA / PWA), they should not allow anyone they don't trust to host their apps in the same storage. I would most likely have dedicated storage to host my apps, this storage type would allow it, for data I would use multiple storage instances which would not allow hosting apps.

This goes into the direction of separate origins for different levels/types of trust. One storage for "I would give you my password" level of trust that is on a different origin than a storage for "I give you access to my music folder" level of trust.

I guess for the pod implementations this would mean, they need to have a safe mode, where creating apps is prevented (sandboxing, etc). And another mode, or different implementations, would allow creation of apps. If we have such app-pods, then it needs to be clear to users that anyone with access to this pod could get access to anyone using these apps and all their pods.

However, I'm not really sure if these app-pods are worth it. First off, if we reduce it to "I completely trust someone" and "I don't completely trust someone", most of the access controls don't make sense any more. To be Solid compliant the server could implement them, but in the end they are pretty ineffective if someone can create applications on the same origin, and they likely only give false expectations. A simple example for this: if I upload SolidOS to an app pod and log in to it with the same web ID, anyone who can modify this SolidOS app can act as if they were me. So imo these app pods won't be truly solid compliant in the first place, so maybe we don't need to allow them explicitly in the specification?

And secondly, the main benefits of publishing apps on pods I see are that (1) I can access the applications source code with multiple apps and (2) it is easy to publish. Regarding (1), imo using git with offline tools achieves the same + the benefits of version control. Source code is standardized well enough and established source control systems allow online exchange. And regarding (2), we can use repl.it, glitch or any other tool and sync them with github (and likely gitlab and anything else) and automatically publish them (I haven't tested this in some years, but I guess it's still possible).

@elf-pavlik
Member

elf-pavlik commented Mar 23, 2023

Makes sense, anyone who would want to self-host apps can most likely do it using a setup not based on solid.

👍 to prioritize solutions for the regular use of solid storage by a regular user

@Otto-AA
Author

Otto-AA commented Mar 31, 2023

I've just learned that svg files can also execute javascript. For instance, if the following file were public on a pod and the user navigated to pod.example.org/malicious.svg, the script would execute:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg width="6cm" height="5cm" viewBox="0 0 600 500" xmlns="http://www.w3.org/2000/svg" version="1.1">
  <script type="application/ecmascript"> <![CDATA[
    alert('Hi from: ' + document.domain)
  ]]> </script>
</svg>

Thus the security issues come not only from serving text/html files, but at least anything that can execute javascript. I don't know which mime types can include javascript.

Also I don't know if javascript is the only issue when executed in the browser. Other rendered files could also cause security issues. As an example of what I mean: with CSS styles you can leak potentially sensitive data out of the website in which it is rendered, even when javascript is disabled.

So when fixing this issue, this needs to be taken into account. Maybe adding the CSP sandbox header to all files could work, I'm not sure if it applies to non-html files as well.
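For illustration, a server-side check for script-capable media types could look like the sketch below. The list is an assumption and deliberately short (html, xhtml and svg are known to run scripts when rendered); it is not exhaustive, and the helper names are hypothetical.

```typescript
// Assumption: a non-exhaustive list of media types that browsers render
// as documents able to run scripts.
const SCRIPT_CAPABLE_TYPES = new Set<string>([
  "text/html",
  "application/xhtml+xml",
  "image/svg+xml",
]);

// Strip parameters such as "; charset=utf-8" and normalise case.
function mediaTypeEssence(contentType: string): string {
  return contentType.split(";")[0].trim().toLowerCase();
}

function isScriptCapable(contentType: string): boolean {
  return SCRIPT_CAPABLE_TYPES.has(mediaTypeEssence(contentType));
}

console.log(isScriptCapable("text/html; charset=utf-8")); // true
console.log(isScriptCapable("image/svg+xml")); // true
console.log(isScriptCapable("text/turtle")); // false
```

A deny-list like this is necessarily incomplete, which is one argument for applying a header such as the CSP sandbox to every response instead of only to selected types.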

@elf-pavlik
Member

For the record, disabling the execution of scripts in HTML might impact accessing protected HTML documents. Since navigating to that HTML document does not set the required Authorization header, one approach relies on including a solid app in the body of the 401 response which would allow the user to authenticate. Related issue:

I think in general there might not be enough hands-on experience with accessing protected HTML documents hosted in solid storage.

@josephguillaume

Could you clarify how the disabling of execution of scripts in protected HTML documents might impact accessing that document? Or is that not what was meant?

In my mind, the "databrowser hack" is quite separate from serving of protected HTML documents - the code for the databrowser is not stored in the solid storage and is in some sense privileged. When accessing a protected HTML document in the web browser, the databrowser has to be loaded, makes an authenticated request to retrieve it and is then responsible for displaying it in a safe way.

This is already how things work with Community Solid Server, given that it does not support authentication using cookies. As a result, we have plenty of experience with accessing protected HTML documents hosted in solid storage. It would be fair to say, however, that we don't have much hands-on experience in safely displaying the resulting HTML. The experience we do have has gaps, e.g. svg above, or has usability issues, e.g. disabling how HTML usually works in non-intuitive ways without presenting clear alternatives.

Accessing unprotected HTML is actually a more serious issue because the databrowser is not loaded and cannot provide security mitigations. If the unprotected HTML is untrusted, then the page has same origin permissions it can exploit. (While noting that it won't be authenticated and therefore would need an exploit to access solid storage without user permission). Mitigating protected vs unprotected HTML documents is therefore a different ball game.

@elf-pavlik
Member

This is already how things work with Community Solid Server, given that it does not support authentication using cookies. As a result, we have plenty of experience with accessing protected HTML documents hosted in solid storage.

I asked about that during one of the recent Solid CG meetings but no one seemed to have any references. Do you know if what you describe is documented somewhere, with details on how the data browser gets served after an unauthorized request, how it redirects the user to Solid-OIDC Provider to sign in, and later after getting redirected back displays the protected HTML page? I don't want to sidetrack this issue, we could follow up on it on the main matrix channel if it's ok with you. Thank you!

@josephguillaume

I may not be the right person to provide authoritative information, but here's an attempt.

Community Solid Server is configured to provide the databrowser as default text/html representation here:
https://github.com/CommunitySolidServer/Recipes/blob/eb479fbb19eca68f001ed0b1133266dfc07ecc58/mashlib/config-mashlib.json#L49

Actual code does what you would expect, i.e. provides the databrowser instead of the content.
https://github.com/CommunitySolidServer/CommunitySolidServer/blob/fbe60371b9643961200c6b7aff736c6269a8331f/src/storage/conversion/ConstantConverter.ts#L116

As far as I understand, this gets triggered for http errors too, i.e. the databrowser gets returned.

So far from a security perspective, it is clear that the databrowser file itself is a server side file that is not editable by the user. While the code would return text/html content if it exists, if a token hasn't been provided then at this point what we have is a http error, not the original text/html.

The databrowser attempts to load the resource and displays an error.
https://github.com/SolidOS/solid-panes/blob/1b1ad666de01a80a3aa97c375b0d84d25c3f6850/src/outline/manager.js#L2109

If the user logs in, then the databrowser fetch succeeds. I'll skip over how the databrowser works out how to display the actual text/html page, but ultimately it ends up doing an authenticated fetch and setting the resulting blob as src of an IFrame
https://github.com/SolidOS/solid-panes/blob/1b1ad666de01a80a3aa97c375b0d84d25c3f6850/src/humanReadablePane.js#L107

This replaces previous simpler code that simply set the src uri. That used to work because NSS allowed cookie-based authentication.

SolidOS/solid-panes@e09aa9b#diff-169430e512cdad60ff6381cb8655365a6b0cc210df6805d6d3fc48b3a6a0abe2L107

From a security point of view, same origin cookie based attacks to access protected resources are therefore no longer possible because a token is needed. This is obviously a key reason we've moved to the token approach in the first place, and that the data browser has been modified rather than getting community solid server to implement cookie based access.

It has been noted that this is not developer friendly, but it is what it is given the spec's security model.

CommunitySolidServer/CommunitySolidServer#1392

I don't really use matrix and I'm not sure which is the main channel, but happy to be taught if needed.

@elf-pavlik
Member

Thank you @josephguillaume, much appreciated 🙏

I don't really use matrix and I'm not sure which is the main channel, but happy to be taught if needed.

This is a generic matrix.to link to the mentioned conversation. Many of us used Gitter before and now use the Element (matrix web client) provided by Gitter as part of our transition to Matrix (same link just on app.gitter.im). If you look at this conversation I also linked there to an old issue where I was experimenting with using authenticated requests from Service Worker without needing an iframe solid/solid#143

The information you provided is really helpful and I plan to look at where it could be documented and made more broadly available. At the same time, I'm worried that I might have us drifting from the original issue too far. You already addressed the relevant concern I had, which I think is great to have captured here. Diving further into details of serving protected HTML documents will be better served in a separate issue or discussion. Thank you once again!

@josephguillaume

It sounds like you might be interested in streamlining access to a protected HTML page, in which case there is an additional security implication.

On load, the databrowser follows a standard session restore flow to receive a token via a redirect from the identity provider.

https://github.com/SolidOS/solid-logic/blob/24c693ce69749c851559942ba2811e0d642cad8a/src/authn/SolidAuthnLogic.ts#L53
https://docs.inrupt.com/developer-tools/javascript/client-libraries/tutorial/restore-session-browser-refresh/

If the databrowser is registered and logged in on the identity provider, this is a silent authentication, i.e. just pasting the URL to a protected HTML page will result in that page being displayed by the databrowser without user intervention.

The data browser plays a critical security role here - it ensures that the (possibly untrusted) protected HTML page never has access to the token or an authenticated fetch - neither in receiving the redirect nor when the HTML page is displayed (see footnote 1).

Any attempt to streamline the process would need some other component of the system to take on this security role.

Footnotes

  1. Note that this depends on the implementation - it may be possible for an IFrame to have access to its parent's authenticated fetch.

@josephguillaume

Related:
Physhing risk when hosting HTML files
SolidOS/solidos#137

Strategy was to Render HTML files as text and not web application
CommunitySolidServer/CommunitySolidServer#1226

@Otto-AA
Author

Otto-AA commented Aug 16, 2023

I think I'd prefer to add the Content-Disposition: attachment header (as suggested in the initial issue). If we return text/plain, then we'd lose some information and some applications will probably overwrite it as text/plain on editing as well (ie, we'd completely lose the information that it is a html file). In such a scenario, it seems pretty similar to simply blocking text/html, as we can't use the content type information anyway.

(The same applies to the svg content type, as this could also execute javascript)

@Otto-AA Otto-AA changed the title from "Security considerations for serving html files" to "Security issues because of serving html files" on Aug 16, 2023
@Otto-AA
Author

Otto-AA commented Nov 22, 2023

I've created a PR (#598) that would add the following security consideration:

Servers are encouraged to apply security measures when serving user-created files. Multiple agents can create files on the same server, which could render same-origin security boundaries useless. As an example countermeasure, servers could add a Content-Security-Policy: sandbox header to artificially enable same-origin security policies for files on the same origin.

@michielbdejong
Contributor

michielbdejong commented Jan 15, 2024

The PR for this issue is on the agenda for the 17 January CG Call

@elf-pavlik
Member

Relevant security considerations are being worked on in https://solid.github.io/security-considerations/

Can this issue be closed?

@Otto-AA
Author

Otto-AA commented May 30, 2024

Yes, can be closed

@Otto-AA Otto-AA closed this as completed May 30, 2024