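- Recently I have been borrowing (copying) content from books on assembly language, but after Ctrl+C and Ctrl+V the English letters and digits end up jammed right against the Chinese characters.
- As an extreme perfectionist I find this really annoying, and fixing it one character at a time wastes a lot of time. I found a script online and am borrowing it here.
- This article is based on:
Note: the Python version used is Python 3.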
```python
import re

import pangu  # third-party package (PyPI name: pangu)


def add_space_between_content(original_text):
    def add_space(match):
        # Returns the matched text unchanged.
        return match.group(1)

    # pangu.spacing inserts spaces between Chinese and English/digits.
    processed_text = pangu.spacing(original_text)
    # As written, add_space returns each match as-is, so these two
    # substitutions leave letter+digit runs (e.g. "eax1", "16h") untouched.
    pattern = r'([a-zA-Z]+\d+)'
    processed_text = re.sub(pattern, add_space, processed_text)
    pattern = r'(\d+[a-zA-Z]+)'
    processed_text = re.sub(pattern, add_space, processed_text)
    # Collapse blank lines into a single line break.
    pattern = r"\n\s*\n"
    processed_text = re.sub(pattern, "\n", processed_text)
    return processed_text


def replace_punctuation(text):
    # ASCII punctuation -> Chinese (full-width) punctuation.
    # Note: every '.' is replaced, including those in decimals and file names.
    punctuation_map = {
        '.': '。',
        ',': ',',
        '!': '!',
        '?': '?',
        ';': ';',
        '(': '(',
        ')': ')',
    }
    for eng_punct, chi_punct in punctuation_map.items():
        text = text.replace(eng_punct, chi_punct)
    return text


def change_file(source, target):
    # Undecodable bytes are silently dropped (errors='ignore').
    with open(source, encoding='utf-8', errors='ignore') as source_file:
        original_text = source_file.read()
    modified_string = replace_punctuation(original_text)
    modified_string = add_space_between_content(modified_string)
    with open(target, 'w', encoding='utf-8', errors='ignore') as target_file:
        target_file.write(modified_string)


file_source = r'source.txt'
file_target = r'target.txt'
change_file(file_source, file_target)
```
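To make the spacing step less of a black box, here is a minimal sketch of the idea using only the standard library. It is not how pangu is implemented: the name `simple_spacing` is hypothetical, and it assumes that "Chinese" means only the basic CJK Unified Ideographs range (U+4E00 to U+9FFF) and that "English/digits" means ASCII letters and digits. The real library covers many more cases.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block counts as Chinese.
CJK = r'\u4e00-\u9fff'
# Assumption: only ASCII letters and digits count as English/digits.
LATIN = r'A-Za-z0-9'


def simple_spacing(text):
    """Insert one space at each boundary between a CJK ideograph
    and an ASCII letter or digit (hypothetical helper, not pangu)."""
    # Chinese followed by a letter/digit: "用P" -> "用 P"
    text = re.sub(rf'([{CJK}])([{LATIN}])', r'\1 \2', text)
    # Letter/digit followed by Chinese: "3处" -> "3 处"
    text = re.sub(rf'([{LATIN}])([{CJK}])', r'\1 \2', text)
    return text


print(simple_spacing('使用Python3处理文本'))  # 使用 Python3 处理文本
```

Text that is already spaced, or that contains no Chinese at all, passes through unchanged, so running it twice is harmless.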
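- Just rename the file you want to fix to source.txt and run the script; the output is written to target.txt. The result is as follows:
- Of course, you can also create a bat file next to the script (saved as main.py), so you don't have to keep typing the command: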
```bat
@echo off
setlocal
cd /d "%~dp0"
python3 main.py
endlocal
```
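One caveat about the punctuation step in the script: a plain `str.replace` rewrites every ASCII period and comma, including the ones in `3.14` or `main.py`. A possible refinement, sketched below, converts a mark only when it directly follows a Chinese character. The function name `replace_punctuation_after_cjk` is hypothetical, the CJK range is the same simplifying assumption (U+4E00 to U+9FFF), and parentheses are left out because they need context on both sides.

```python
import re

# ASCII punctuation -> Chinese punctuation (parentheses omitted in this sketch).
PUNCTUATION_MAP = {'.': '。', ',': ',', '!': '!', '?': '?', ';': ';'}

# One ASCII punctuation mark that directly follows a CJK ideograph.
# Assumption: only U+4E00..U+9FFF counts as Chinese.
PATTERN = re.compile(r'(?<=[\u4e00-\u9fff])[.,!?;]')


def replace_punctuation_after_cjk(text):
    """Convert punctuation only when the preceding character is Chinese."""
    return PATTERN.sub(lambda m: PUNCTUATION_MAP[m.group(0)], text)


print(replace_punctuation_after_cjk('你好,世界.版本是3.14'))  # 你好,世界。版本是3.14
```

Punctuation that follows a digit or a letter is left alone, so a sentence ending in an English word keeps its ASCII period; whether that trade-off is acceptable depends on the text being cleaned.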