## Python Chinese Encoding
In previous tutorials, we learned how to output **"Hello, World!"** using Python. While printing English characters works seamlessly out of the box, you may encounter encoding issues when attempting to print Chinese characters (such as **"你好，世界"**) depending on your Python version and environment configuration.
This tutorial explains why these encoding errors occur and how to resolve them in both Python 2 and Python 3.
---
## The Root Cause of Encoding Errors
If a Python source file contains non-ASCII characters (such as Chinese characters) and does not explicitly declare its encoding, the interpreter may fail to parse the file and throw a syntax error.
Consider the following script (`test.py`):
```python
#!/usr/bin/python
print("你好，世界")
```
If run under an environment that defaults to ASCII (such as Python 2), the execution will fail with the following error:
```text
File "test.py", line 2
SyntaxError: Non-ASCII character '\xe4' in file test.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
```
### Why does this happen?
* **Python 2.x** defaults to the **ASCII** encoding format. Because ASCII cannot represent Chinese characters, the interpreter throws a `SyntaxError` when it encounters bytes outside the standard ASCII range (0–127).
* **Python 3.x** defaults to **UTF-8** encoding. Therefore, Python 3 can parse Chinese characters natively without requiring any special encoding declarations.
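
The byte `'\xe4'` quoted in the error message can be reproduced directly. The following is a minimal Python 3 sketch (the helper name `can_encode_as_ascii` is illustrative, not part of any library) showing that the first UTF-8 byte of the string is `0xe4`, which lies outside the ASCII range:

```python
text = "你好，世界"

# The UTF-8 encoding of the first character starts with the byte 0xe4,
# the same byte reported in the SyntaxError message.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes[:3])  # b'\xe4\xbd\xa0'


def can_encode_as_ascii(s: str) -> bool:
    """Return True if every character of s fits in the ASCII range."""
    try:
        s.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(can_encode_as_ascii("Hello, World!"))  # True
print(can_encode_as_ascii(text))             # False
```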
---
## Solution: Declaring Source Code Encoding
To resolve this issue in environments that do not default to UTF-8 (such as Python 2), you must declare the file encoding at the very top of your Python script.
### Syntax Options
You can use either of the following magic comments as the first or second line of your script:
**Option 1 (Standard PEP 263 style):**
```python
# -*- coding: UTF-8 -*-
```
**Option 2 (Simplified style):**
```python
# coding=utf-8
```
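> **Note:** If you use the simplified `coding=utf-8` syntax, do not put a space between `coding` and the equals (`=`) sign. PEP 263 recognizes the declaration only when `coding` is immediately followed by `:` or `=`, so `# coding = utf-8` is not recognized. Whitespace after the `=` is permitted, but omitting it keeps the declaration unambiguous.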
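
To check how a given declaration is interpreted, one option is the standard library's `tokenize.detect_encoding`, which applies the PEP 263 rules to the first two lines of a file. The sketch below assumes Python 3, where the fallback is UTF-8 rather than ASCII. `declared_encoding` is a hypothetical helper name, and `gbk` is used only so that a recognized declaration can be told apart from the default:

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding detected from the first two lines of source."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Both documented styles are recognized.
print(declared_encoding(b"# -*- coding: gbk -*-\n"))  # gbk
print(declared_encoding(b"# coding=gbk\n"))           # gbk

# A space before "=" breaks the match, so the default applies.
print(declared_encoding(b"# coding = gbk\n"))         # utf-8

# The declaration may sit on the second line, after a shebang ...
print(declared_encoding(b"#!/usr/bin/python\n# coding=gbk\n"))    # gbk

# ... but it is ignored from the third line onward.
print(declared_encoding(b"#!/usr/bin/python\n\n# coding=gbk\n"))  # utf-8
```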
---
## Code Examples
### Python 2.x Compatible Example
By adding the encoding declaration at the top of the file, the Python interpreter will successfully parse and print the Chinese characters.
```python
#!/usr/bin/python
# -*- coding: UTF-8 -*-
print("你好，世界")
```
**Output:**
```text
你好，世界
```
---
## Important Considerations
### 1. Python 3.x Behavior
Python 3.x source files default to UTF-8 encoding. If you are using Python 3, you can safely write and execute code containing Chinese characters without adding any encoding declarations at the top of your files.
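
As a quick check of this default, the sketch below (Python 3 only) compiles UTF-8 encoded source bytes that carry no encoding declaration. The file name `test.py` is only a label passed to `compile`, and `message` is an illustrative variable name:

```python
# Source code as raw UTF-8 bytes, with no "coding" declaration at the top.
source = "message = '你好，世界'\n".encode("utf-8")

# compile() accepts bytes and, absent a declaration, decodes them as UTF-8.
code = compile(source, "test.py", "exec")

namespace = {}
exec(code, namespace)

print(namespace["message"] == "你好，世界")  # True
```

Adding a declaration to a Python 3 file is still harmless, which can be useful when the same file must also run under Python 2.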
### 2. Editor File Encoding Alignment
Even if you declare `# -*- coding: UTF-8 -*-` in your code, you must ensure that your text editor or IDE actually saves the physical file on disk using the **UTF-8** encoding format.
If there is a mismatch (for example, your editor saves the file as GBK/ANSI but the code declares UTF-8), you will encounter decoding errors such as:
```text
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xc4 in position 0: invalid continuation byte
```
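
The mismatch can be reproduced without an editor. The following Python 3 sketch encodes a short string as GBK, as an editor configured for GBK would, and then tries to decode those bytes as UTF-8. The variable names are illustrative:

```python
text = "你好"

# What an editor configured for GBK would write to disk.
gbk_bytes = text.encode("gbk")
print(gbk_bytes)  # b'\xc4\xe3\xba\xc3'

# Reading those bytes as UTF-8 fails on the very first byte.
error_message = ""
try:
    gbk_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)

# Decoding with the encoding that was actually used recovers the text.
print(gbk_bytes.decode("gbk") == text)  # True
```

The fix is therefore either to re-save the file as UTF-8 or to change the declaration to match the encoding the file is really stored in.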
#### How to Configure UTF-8 in PyCharm:
1. Go to **File > Settings** (or **PyCharm > Preferences** on macOS).
2. Search for **encoding** in the settings search bar.
3. Navigate to **Editor > File Encodings**.
4. Set both **Global Encoding** (IDE Encoding) and **Project Encoding** to **UTF-8**.
![PyCharm File Encodings settings](https://www.runoob.com/wp-content/uploads/2014/12/pycharm-utf8.jpg)