Python Chinese Encoding

## Python Chinese Encoding

In previous tutorials, we learned how to output **"Hello, World!"** using Python. Printing English characters works out of the box, but printing Chinese characters (such as **"你好，世界"**) can fail with an encoding error, depending on your Python version and environment configuration.

This tutorial explains why these encoding errors occur and how to resolve them in both Python 2 and Python 3.

---

## The Root Cause of Encoding Errors

If a Python source file contains non-ASCII characters (such as Chinese characters) and does not explicitly declare its encoding, the interpreter may fail to parse the file and raise a syntax error.

Consider the following script (`test.py`):

```python
#!/usr/bin/python
print("你好，世界")
```

If run under an interpreter that defaults to ASCII (such as Python 2), execution fails with the following error:

```text
  File "test.py", line 2
SyntaxError: Non-ASCII character '\xe4' in file test.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
```

### Why does this happen?

* **Python 2.x** assumes source files are **ASCII** by default. Because ASCII cannot represent Chinese characters, the interpreter raises a `SyntaxError` when it encounters bytes outside the standard ASCII range (0 to 127).
* **Python 3.x** assumes source files are **UTF-8** by default. Python 3 can therefore parse Chinese characters in a UTF-8 file without any encoding declaration.

---

## Solution: Declaring Source Code Encoding

To resolve this issue in environments that do not default to UTF-8 (such as Python 2), you must declare the file encoding at the very top of your Python script.
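Before turning to the declaration syntax, it may help to see where the `'\xe4'` in the error message comes from. The following is a minimal sketch, assuming Python 3 and a source file saved as UTF-8; the variable names are illustrative only. It shows that the first byte of "你" in UTF-8 is `0xe4`, which lies outside the ASCII range, so an ASCII-only reader cannot interpret it.

```python
# Assumes Python 3 and that this file is saved as UTF-8.
text = "你好，世界"

# Encode the string into the bytes a UTF-8 source file would contain.
utf8_bytes = text.encode("utf-8")

# The first character occupies three bytes in UTF-8.
print(utf8_bytes[:3])  # b'\xe4\xbd\xa0'

# The first byte is the one named in the Python 2 error message.
first_byte = utf8_bytes[0]
print(hex(first_byte), first_byte > 127)  # 0xe4 True

# Decoding the same bytes as ASCII fails, because ASCII stops at 127.
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    print("ASCII cannot decode these bytes:", exc.reason)
```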
### Syntax Options

You can use either of the following magic comments as the first or second line of your script:

**Option 1 (Standard PEP 263 style):**

```python
# -*- coding: UTF-8 -*-
```

**Option 2 (Simplified style):**

```python
# coding=utf-8
```

> **Note:** If you use the simplified `coding=utf-8` syntax, do not put a space before the equals (`=`) sign: `# coding = utf-8` is not recognized as an encoding declaration. The safest form has **no spaces** around `=` at all.

---

## Code Examples

### Python 2.x Compatible Example

With the encoding declaration at the top of the file, the Python interpreter parses the file successfully and prints the Chinese characters.

```python
#!/usr/bin/python
# -*- coding: UTF-8 -*-

print("你好，世界")
```

**Output:**

```text
你好，世界
```

---

## Important Considerations

### 1. Python 3.x Behavior

Python 3.x source files default to UTF-8 encoding. If you are using Python 3 and your files are saved as UTF-8, you can write and execute code containing Chinese characters without adding an encoding declaration at the top of your files.

### 2. Editor File Encoding Alignment

Even if you declare `# -*- coding: UTF-8 -*-` in your code, you must ensure that your text editor or IDE actually saves the file on disk in **UTF-8**. If there is a mismatch (for example, your editor saves the file as GBK/ANSI but the code declares UTF-8), you will encounter decoding errors such as:

```text
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xc4 in position 0: invalid continuation byte
```

#### How to Configure UTF-8 in PyCharm:

1. Go to **File > Settings** (or **PyCharm > Preferences** on macOS).
2. Search for **encoding** in the settings search bar.
3. Navigate to **Editor > File Encodings**.
4. Set both **Global Encoding** (IDE Encoding) and **Project Encoding** to **UTF-8**.

![PyCharm File Encodings settings](https://www.runoob.com/wp-content/uploads/2014/12/pycharm-utf8.jpg)
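Outside an IDE, one way to check that a file's bytes agree with its declared encoding is sketched below. This is a rough illustration, assuming Python 3; `matches_declared_encoding` is a hypothetical helper name, not a standard function. It relies on the standard library's `tokenize.detect_encoding` to read the declaration from the first two lines, then tries to decode the raw bytes with that encoding.

```python
import io
import tokenize


def matches_declared_encoding(source_bytes):
    """Hypothetical helper: return (declared_encoding, decodes_cleanly)."""
    # detect_encoding reads at most the first two lines and falls back
    # to 'utf-8' when no declaration is present.
    declared, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    try:
        source_bytes.decode(declared)
    except UnicodeDecodeError:
        return declared, False
    return declared, True


source = '# -*- coding: UTF-8 -*-\nprint("你好，世界")\n'

# Simulate an editor saving the same text in two different encodings.
saved_as_utf8 = source.encode("utf-8")
saved_as_gbk = source.encode("gbk")

print(matches_declared_encoding(saved_as_utf8))  # ('utf-8', True)
print(matches_declared_encoding(saved_as_gbk))   # ('utf-8', False)
```

To check a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same helper. In the GBK case the first Chinese character begins with the byte `0xc4`, the same byte reported in the error message above.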
