Tool use

Note: the code below targets Python 3.

  • Install the Python library:
pip3 install -U pangu
  • Based on the author's script, write the following Python code:
import re

import pangu


def add_space_between_content(original_text):
    def add_space(match):
        # Returns the match unchanged, so the two substitutions below
        # leave tokens such as "abc123" or "123abc" exactly as they are.
        return match.group(1)

    # Let pangu insert spaces between CJK characters and half-width alphanumerics.
    processed_text = pangu.spacing(original_text)
    pattern = r'([a-zA-Z]+\d+)'
    processed_text = re.sub(pattern, add_space, processed_text)
    pattern = r'(\d+[a-zA-Z]+)'
    processed_text = re.sub(pattern, add_space, processed_text)
    # Collapse runs of blank or whitespace-only lines into a single line break.
    pattern = r"\n\s*\n"
    processed_text = re.sub(pattern, "\n", processed_text)
    return processed_text


def replace_punctuation(text):
    # Map half-width (English) punctuation to full-width (Chinese) punctuation.
    punctuation_map = {
        '.': '。',
        ',': ',',
        '!': '!',
        '?': '?',
        # ':': ':',
        ';': ';',
        '(': '(',
        ')': ')',
    }
    for eng_punct, chi_punct in punctuation_map.items():
        text = text.replace(eng_punct, chi_punct)
    return text


def change_file(source, target):
    # Read the whole source file, skipping bytes that are not valid UTF-8.
    with open(source, encoding='utf-8', errors='ignore') as source_file:
        original_text = source_file.read()
    modified_string = replace_punctuation(original_text)
    modified_string = add_space_between_content(modified_string)
    with open(target, 'w', encoding='utf-8', errors='ignore') as target_file:
        target_file.write(modified_string)


file_source = r'source.txt'  # source file
file_target = r'target.txt'  # target file
change_file(file_source, file_target)
  • Rename the file you want to process to source.txt and run the script. The result looks like this:

[Image: image-20240711211837429]
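
To make the punctuation step easier to try on its own, here is a minimal sketch of the same mapping as replace_punctuation, done in a single pass with str.translate instead of repeated str.replace calls. It uses only the standard library, so it runs without pangu. The names PUNCTUATION_TABLE and to_chinese_punctuation are illustrative and not part of the original script. The full-width targets are written as Unicode escapes so they cannot be confused with their half-width look-alikes.

```python
# Single-pass variant of replace_punctuation, standard library only.
# PUNCTUATION_TABLE and to_chinese_punctuation are hypothetical names.
PUNCTUATION_TABLE = str.maketrans({
    ".": "\u3002",  # ideographic full stop
    ",": "\uff0c",  # full-width comma
    "!": "\uff01",  # full-width exclamation mark
    "?": "\uff1f",  # full-width question mark
    ";": "\uff1b",  # full-width semicolon
    "(": "\uff08",  # full-width left parenthesis
    ")": "\uff09",  # full-width right parenthesis
})


def to_chinese_punctuation(text):
    """Replace each mapped half-width mark with its full-width counterpart."""
    return text.translate(PUNCTUATION_TABLE)


sample = "Hello, world! (test)"
print(to_chinese_punctuation(sample))
```

As in the script, the colon is left untouched. Both versions also convert every '.', so decimals such as 3.14 and file names such as main.py are affected too. Check the output if the text contains them.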

  • You can also create a .bat file so that you do not have to retype the command every time:
@echo off
setlocal
cd /d "%~dp0"
python3 main.py
endlocal
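
As an alternative to renaming every input to source.txt, the file handling can be sketched as a helper that takes paths and a text transform. This is only an illustration under assumptions: convert_file, convert_many, and the ".out" suffix are hypothetical and do not appear in the original script. The transform is passed in as a parameter so the sketch runs without pangu installed.

```python
from pathlib import Path


def convert_file(source, target, transform):
    """Read source as UTF-8, apply transform to its text, write the result to target."""
    text = Path(source).read_text(encoding="utf-8", errors="ignore")
    Path(target).write_text(transform(text), encoding="utf-8")


def convert_many(paths, transform, suffix=".out"):
    """Convert each path, writing the result beside it with suffix appended.

    The naming rule (notes.txt -> notes.txt.out) is an assumption made for
    this sketch, not something the original script does.
    """
    written = []
    for source in paths:
        target = Path(str(source) + suffix)
        convert_file(source, target, transform)
        written.append(target)
    return written
```

In main.py, the transform could be lambda t: add_space_between_content(replace_punctuation(t)), with the paths taken from sys.argv[1:]. The batch file would then pass the file names on the python3 main.py line.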