Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a way to gracefully and securely handle non-percent-encoded query string values #1685

Open
kgriffs opened this issue Feb 25, 2020 · 15 comments

Comments

@kgriffs
Copy link
Member

kgriffs commented Feb 25, 2020

When the web server allows a non-RFC-compliant query string (i.e., one that includes non-ASCII byte values that were not percent-encoded) to be passed to the app, we should provide a way for the app to at least detect such values. We might also provide an option for automatically rejecting them or perhaps attempting to decode them as UTF8.

Should we also add support for automatically percent-encoding query params when using the testing.simulate_* methods?

@kgriffs
Copy link
Member Author

kgriffs commented Feb 25, 2020

@vytas7:

I noticed that Nginx blocks such requests outright with a 400, does not propagate them further to the app server.

@vytas7
Copy link
Member

vytas7 commented Feb 25, 2020

Re Nginx, it's at least obvious if there is invalid percent coding: https://github.com/nginx/nginx/blob/37984f0be1a5d608015517a64927d498888fb7ca/src/http/ngx_http_parse.c#L1532
But I've tested non-ASCII quickly and it seems to yield a 400 as well.

@vytas7
Copy link
Member

vytas7 commented Feb 25, 2020

Talking this issue in general, I'm admittedly not completely convinced it's something a Python 3 framework should/is able to handle at all as per PEP-3333 mechanics.
Could we craft a simple example demonstrating where this is a practical problem under Python 3, be it using the Falcon test client, or a real server?

On Python 2, this hearkens back to #1031 which was IIRC quite hairy to properly address.

But regardless of Python version, I OTOH agree it might be worth reviewing how our test client handles path encoding edge cases.

@sirfz
Copy link

sirfz commented Jul 18, 2023

Not sure if this is the issue this addresses but here's an example:

/search?offset=0&query=películas%20de%20terror&limit=10&version=

request.params returns

{'offset': ['0'], 'query': ['pelÃ-culas de terror'], 'limit': ['10']}

urllib.parse.parse_qs handles it correctly:

from urllib.parse import parse_qs

parse_qs("offset=0&query=películas%20de%20terror&limit=10&version=")

returns

{'offset': ['0'], 'query': ['películas de terror'], 'limit': ['10']}

@vytas7
Copy link
Member

vytas7 commented Jul 18, 2023

Hi @sirfz,
yes this is potentially an example.

Just checking, which WSGI (or ASGI?) app server are you using? And which HTTP client?

@sirfz
Copy link

sirfz commented Jul 18, 2023

it's WSGI hosted with the latest uWSGI (testing with curl).

update: some extra context about this issue. The encoding of the query_string itself seems to be messed up so can't even work around it by parsing the raw string (same applies for get_param as well as uri and relative_uri parameters). Not sure if it's a falcon or uwsgi issue?

@vytas7
Copy link
Member

vytas7 commented Jul 18, 2023

I don't think it is a Falcon or uWSGI issue per se, but we filed this to track scenarios like yours, and discuss whether we as a framework can do anything sensible about it.

If, as mentioned, you run it behind Nginx, non-ASCII URLs are rejected outright with an HTTP 400, so probably it is less of an issue in a typical production environment.

In PEP-3333 (WSGI), strings are tunneled as Latin-1, and if curl sends a UTF-8 string, things can get messed up completely, as you observed.

@rokclimb15
Copy link

Example of an error I get in production. This isn't a valid URL path in my app, so I can block these upstream, but some option to handle decoding gracefully and route would be ideal.

[ERROR] UnicodeEncodeError: 'latin-1' codec can't encode character '\u30fc' in position 1: ordinal not in range(256)
Traceback (most recent call last):
  File "/var/task/apig_wsgi/__init__.py", line 97, in handler
    result = wsgi_app(environ, response.start_response)
  File "falcon/app.py", line 304, in falcon.app.App.__call__
    req = self._request_type(env, options=self.req_options)
  File "falcon/request.py", line 479, in falcon.request.Request.__init__
    path = path.encode('iso-8859-1').decode('utf-8', 'replace')

@vytas7
Copy link
Member

vytas7 commented Sep 25, 2024

Hi @rokclimb15 ,
and thanks for reporting this.

Just to understand this better, which WSGI server are you using? Do you have any reverse proxy or API gateway in front of it? And if possible to reveal it to us, how does the whole URL path look like?

@rokclimb15
Copy link

rokclimb15 commented Sep 25, 2024

It's an AWS HTTP Gateway, similar to API Gateway. CloudFront is in front of that. I'm going to run a trace to capture one of these bad URLs. It's possible it's an interaction with the WSGI wrapper for the API environment, which does the following:

"PATH_INFO": urllib.parse.unquote(event["rawPath"], encoding="iso-8859-1")

@vytas7
Copy link
Member

vytas7 commented Sep 26, 2024

@rokclimb15 looking at this again, maybe we should file a separate issue for your case.
Because as I understand, this one talks about query string values, not the path itself.

The path that arrives to the application does not comply with the WSGI spec, which says that it must be UTF-8 bytes tunneled as Latin-1 (so no character above 0xFF). We can still see if we can provide a better way to handle such path.

Would automatically raising an HTTP 400 fit your use case, or would you like to be able to intercept it regardless?

@rokclimb15
Copy link

@vytas7 I had failed to notice that this ticket has to do with query strings, thanks for pointing that out! If the path arriving doesn't comply with WSGI then the bug is actually in https://github.com/adamchainz/apig-wsgi. I'll work on gathering more details and filing/fixing it there.

If Falcon caught this exception and raised a 400 that would be helpful I think.

@vytas7
Copy link
Member

vytas7 commented Sep 26, 2024

Cool beans. I'll file a new issue for rendering an HTTP 400 in the case the path contains characters that are not representable in the ISO-8859-1 encoding.

@vytas7
Copy link
Member

vytas7 commented Sep 26, 2024

@rokclimb15 I filed this as #2340 so that we can make a decision on this. According to the spec, Falcon actually is not doing anything wrong by exploding with an unhandled exception in this case; however, we could still provide a more helpful error, be it an HTTP client error, or a Python exception with a more helpful error message.

As to https://github.com/adamchainz/apig-wsgi, you can quote this part of the spec (quoted below) in your bug report:

On Python platforms where the str or StringType type is in fact Unicode-based (e.g. Jython, IronPython, Python 3, etc.), all “strings� referred to in this specification must contain only code points representable in ISO-8859-1 encoding (\u0000 through \u00FF, inclusive). It is a fatal error for an application to supply strings containing any other Unicode character or code point. Similarly, servers and gateways must not supply strings to an application containing any other Unicode characters.

@CaselIT
Copy link
Member

CaselIT commented Sep 26, 2024

I think a 500 is a better code if we decide to handle this error

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants