Here is information on the format conversion tools available on the Debian system and related tips.
Standards-based tools are in very good shape, but support for proprietary data formats is limited.
The following are text data conversion tools.
Table 11.1. List of text data conversion tools
package | popcon | size | keyword | description |
---|---|---|---|---|
libc6 | V:928, I:998 | 10670 | charset | text encoding converter between locales by iconv(1) (fundamental) |
recode | V:5, I:36 | 608 | charset+eol | text encoding converter between locales (versatile, more aliases and features) |
konwert | V:2, I:59 | 122 | charset | text encoding converter between locales (fancy) |
nkf | V:1, I:11 | 346 | charset | character set translator for Japanese |
tcs | V:0, I:0 | 544 | charset | character set translator |
unaccent | V:0, I:0 | 76 | charset | replace accented letters by their unaccented equivalent |
tofrodos | V:3, I:36 | 50 | eol | text format converter between DOS and Unix: fromdos(1) and todos(1) |
macutils | V:0, I:1 | 320 | eol | text format converter between Macintosh and Unix: frommac(1) and tomac(1) |
Tip: iconv(1) is provided as part of the libc6 package.
You can convert the encoding of a text file with iconv(1) by the following.
$ iconv -f encoding1 -t encoding2 input.txt >output.txt
Encoding values are case insensitive and ignore "-" and "_" for matching. Supported encodings can be checked by the "iconv -l" command.
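As a minimal sketch of the conversion above, the following script writes a hypothetical sample file, input.txt, containing "café" in ISO-8859-1 and converts it to UTF-8. It assumes the glibc iconv(1) and od(1) from coreutils. The byte values follow from how ISO-8859-1 and UTF-8 each define "é".

```shell
#!/bin/sh
# Hypothetical sample: "café" in ISO-8859-1, where "é" is the single byte 0xE9 (octal 351).
printf 'caf\351\n' > input.txt

# Convert ISO-8859-1 to UTF-8. Encoding names are matched case-insensitively,
# so "iso-8859-1" and "ISO-8859-1" select the same encoding.
iconv -f iso-8859-1 -t UTF-8 input.txt > output.txt

# Dump the bytes as hex: "é" is now the two bytes 0xC3 0xA9.
od -An -tx1 output.txt
```

The od(1) dump shows 63 61 66 c3 a9 0a. The ASCII bytes are unchanged, while "é" grows from one byte to two.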
Table 11.2. List of encoding values and their usage
encoding value | usage |
---|---|
ASCII | American Standard Code for Information Interchange, 7 bit code w/o accented characters |
UTF-8 | current multilingual standard for all modern OSs |
ISO-8859-1 | old standard for western European languages, ASCII + accented characters |
ISO-8859-2 | old standard for eastern European languages, ASCII + accented characters |
ISO-8859-15 | old standard for western European languages, ISO-8859-1 with euro sign |
CP850 | code page 850, Microsoft DOS characters with graphics for western European languages, ISO-8859-1 variant |
CP932 | code page 932, Microsoft Windows style Shift-JIS variant for Japanese |
CP936 | code page 936, Microsoft Windows style GB2312, GBK or GB18030 variant for Simplified Chinese |
CP949 | code page 949, Microsoft Windows style EUC-KR or Unified Hangul Code variant for Korean |
CP950 | code page 950, Microsoft Windows style Big5 variant for Traditional Chinese |
CP1251 | code page 1251, Microsoft Windows style encoding for the Cyrillic alphabet |
CP1252 | code page 1252, Microsoft Windows style ISO-8859-15 variant for western European languages |
KOI8-R | old Russian UNIX standard for the Cyrillic alphabet |
ISO-2022-JP | standard encoding for Japanese email which uses only 7 bit codes |
eucJP | old Japanese UNIX standard 8 bit code and completely different from Shift-JIS |
Shift-JIS | JIS X 0208 Appendix 1 standard for Japanese (see CP932) |
Note: Some encodings are supported only for data conversion and are not used as locale values (see Section 8.3.1, "Basics of encoding").
For character sets that fit in a single byte, such as the ASCII and ISO-8859 character sets, the character encoding means almost the same thing as the character set.
For character sets with many characters, such as JIS X 0213 for Japanese or the Universal Character Set (UCS, Unicode, ISO-10646-1) for practically all languages, there are many encoding schemes to fit them into a sequence of bytes:
- EUC and ISO/IEC 2022 (also known as JIS X 0202) for Japanese
- UTF-8, UTF-16/UCS-2 and UTF-32/UCS-4 for Unicode
For these, the character set and the character encoding are clearly distinct.
The term "code page" is used as a synonym for the character encoding tables of some vendor-specific encodings.
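To show how one Unicode character maps to different byte sequences under different encoding schemes, the following sketch prints the bytes of "é" (U+00E9) in UTF-8, UTF-16BE and UTF-32BE. It assumes the glibc iconv(1). The big-endian "BE" names are used so that no byte order mark is added, since plain "UTF-16" and "UTF-32" may prepend one. The helper name enc_bytes is hypothetical.

```shell
#!/bin/sh
# enc_bytes ENCODING: print the bytes of "é" (U+00E9) in ENCODING as hex.
# The input is the UTF-8 form of "é" (0xC3 0xA9, octal 303 251).
enc_bytes() {
    printf '\303\251' | iconv -f UTF-8 -t "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

# Same character, three encoding schemes, three different byte sequences.
for enc in UTF-8 UTF-16BE UTF-32BE; do
    printf '%-9s %s\n' "$enc" "$(enc_bytes "$enc")"
done
```

This prints c3a9, 00e9 and 000000e9 respectively: one character set (Unicode), three encodings.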
Note: Most encoding systems share the same codes as ASCII for the 7-bit characters, but there are some exceptions. If you are converting old Japanese C programs and URL data from the casually called shift-JIS encoding format to UTF-8 format, use "
Tip: recode(1) may be used too and offers more than the combined functionality of iconv(1), fromdos(1), todos(1), frommac(1), and tomac(1). For more, see "
You can check whether a text file is encoded in UTF-8 with iconv(1) by the following.
$ iconv -f utf8 -t utf8 input.txt >/dev/null || echo "non-UTF-8 found"
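The same check can be wrapped in a small shell function to test several files. This sketch relies on iconv(1) exiting with a non-zero status on invalid input. The function name is_utf8 and the files good.txt and bad.txt are hypothetical.

```shell
#!/bin/sh
# is_utf8 FILE: exit status 0 if FILE is valid UTF-8, non-zero otherwise.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

printf 'caf\303\251\n' > good.txt   # "café" in UTF-8
printf 'caf\351\n'     > bad.txt    # "café" in ISO-8859-1 (invalid as UTF-8)

for f in good.txt bad.txt; do
    if is_utf8 "$f"; then
        echo "$f: UTF-8"
    else
        echo "$f: non-UTF-8 found"
    fi
done
```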
Tip: Use "
Here is an example script to convert the encoding of file names in a single directory from the one used under an older OS to modern UTF-8.
#!/bin/sh
ENCDN=iso-8859-1
for x in *; do
  # convert the name; skip mv(1) when it does not change
  y="$(printf '%s' "$x" | iconv -f "$ENCDN" -t utf-8)"
  [ "$x" = "$y" ] || mv -- "$x" "$y"
done
The "$ENCDN" variable specifies the original encoding used for file names under the older OS, as listed in Table 11.2, "List of encoding values and their usage".
For more complicated cases, mount the filesystem (e.g. a partition on a disk drive) containing such file names with the proper encoding given as a mount(8) option (see Section 8.3.6, "Filename encoding"). Then copy its entire contents to another filesystem mounted as UTF-8 with the "cp -a" command.
The text file format, specifically the end-of-line (EOL) code, is dependent on the platform.
Table 11.3. List of EOL styles for different platforms
platform | EOL code | control | decimal | hexadecimal |
---|---|---|---|---|
Debian (unix) | LF | ^J | 10 | 0A |
MSDOS and Windows | CR-LF | ^M^J | 13 10 | 0D 0A |
Apple's Macintosh (classic Mac OS) | CR | ^M | 13 | 0D |
The EOL format conversion programs, fromdos(1), todos(1), frommac(1), and tomac(1), are quite handy. recode(1) is also useful.
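When tofrodos or macutils are not installed, the same conversions can be sketched with standard tools. This assumes tr(1) from coreutils and GNU sed(1), which understands "\r" in the replacement. Note that "tr -d '\r'" removes every CR byte, not only those at line ends. The file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample file with DOS/Windows CR-LF line endings.
printf 'line1\r\nline2\r\n' > dos.txt

# DOS -> Unix (like fromdos): delete CR bytes.
tr -d '\r' < dos.txt > unix.txt

# Unix -> DOS (like todos): add a CR before each LF (GNU sed).
sed 's/$/\r/' unix.txt > dos2.txt

# Classic Macintosh -> Unix (like frommac): turn each CR into LF.
printf 'a\rb\r' | tr '\r' '\n' > mac2unix.txt

# Inspect the results as hex bytes.
for f in unix.txt dos2.txt mac2unix.txt; do
    printf '%s:' "$f"; od -An -tx1 "$f"
done
```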
Note: Some data on the Debian system, such as the wiki page data for the
Note: Most editors (e.g.
Tip: The use of "
There are a few specialized tools to convert TAB codes.
Table 11.4. List of TAB conversion commands from bsdmainutils and coreutils packages
function | bsdmainutils | coreutils |
---|---|---|
expand TAB to spaces | "col -x" | expand |
convert spaces to TAB | "col -h" | unexpand |
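The following is a minimal sketch of the coreutils commands from the table. It assumes the default tab stops every 8 columns, and the file names are hypothetical. By default unexpand(1) converts only leading blanks, so "-a" is given to convert all of them.

```shell
#!/bin/sh
# Hypothetical sample: "a", one TAB, then "b".
printf 'a\tb\n' > tab.txt

# TAB -> spaces: "a" is followed by 7 spaces so that "b" starts at the next tab stop.
expand tab.txt > spaces.txt

# Spaces -> TAB: -a converts every blank run that reaches a tab stop, not just leading ones.
unexpand -a spaces.txt > tab2.txt

# Show the characters, with TAB displayed as \t.
od -An -c spaces.txt
od -An -c tab2.txt
```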
indent(1) from the indent package completely reformats whitespace in C programs.
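Editor programs such as vim and emacs can also be used to expand TAB. For example, with vim you can expand TAB with the ":set expandtab" and ":%retab" command sequence. You can revert this with the ":set noexpandtab" and ":%retab!" command sequence.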
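Intelligent modern editors such as vim are quite smart and cope well with any encoding system and any file format. You should use these editors under a UTF-8 locale in a UTF-8-capable console for the best compatibility.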
An old western European Unix text file, "u-file.txt", stored in the latin1 (iso-8859-1) encoding can be edited simply with vim by the following.
$ vim u-file.txt
This is possible because vim's file encoding auto-detection tries the UTF-8 encoding first and, if that fails, assumes the file to be latin1.
An old Polish Unix text file, "pu-file.txt", stored in the latin2 (iso-8859-2) encoding can be edited with vim by the following.
$ vim '+e ++enc=latin2 pu-file.txt'
An old Japanese Unix text file, "ju-file.txt", stored in the eucJP encoding can be edited with vim by the following.
$ vim '+e ++enc=eucJP ju-file.txt'
An old Japanese MS-Windows text file, "jw-file.txt", stored in the so-called shift-JIS encoding (more precisely: CP932) can be edited with vim by the following.
$ vim '+e ++enc=CP932 ++ff=dos jw-file.txt'
When a file is opened with the "++enc" and "++ff" options, ":w" in the Vim command line stores it in the original format and overwrites the original file. You can also specify the saving format and the file name in the Vim command line, e.g., ":w ++enc=utf8 new.txt".
Please refer to mbyte.txt, "multi-byte text support", in the vim on-line help, and to Table 11.2, "List of encoding values and their usage", for the values used with "++enc".
The emacs family of programs can perform the equivalent functions.
The following reads a web page into a text file. This is very useful when copying configurations off the Web or applying basic Unix text tools such as grep(1) on the web page.
$ w3m -dump http://www.remote-site.com/help-info.html >textfile
Similarly, you can extract plain text data from other formats using the following.
Table 11.5. List of tools to extract plain text data
package | popcon | size | keyword | description |
---|---|---|---|---|
w3m | V:275, I:835 | 2292 | html→text | HTML to text converter with the "w3m -dump" command |
html2text | V:28, I:85 | 229 | html→text | advanced HTML to text converter (ISO 8859-1) |
lynx | V:37, I:107 | 1901 | html→text | HTML to text converter with the "lynx -dump" command |
elinks | V:18, I:34 | 1587 | html→text | HTML to text converter with the "elinks -dump" command |
links | V:21, I:47 | 2135 | html→text | HTML to text converter with the "links -dump" command |
links2 | V:3, I:18 | 5403 | html→text | HTML to text converter with the "links2 -dump" command |
antiword | V:7, I:15 | 614 | MSWord→text,ps | convert MSWord files to plain text or ps |
catdoc | V:24, I:38 | 666 | MSWord→text,TeX | convert MSWord files to plain text or TeX |
pstotext | V:4, I:6 | 127 | ps/pdf→text | extract text from PostScript and PDF files |
unhtml | V:0, I:0 | 66 | html→text | remove the markup tags from an HTML file |
odt2txt | V:3, I:6 | 53 | odt→text | converter from OpenDocument Text to text |
You can highlight and format plain text data with the following.
Table 11.6. List of tools to highlight plain text data
package | popcon | size | keyword | description |
---|---|---|---|---|
vim-runtime | V:20, I:431 | 27567 | highlight | Vim MACRO to convert source code to HTML with ":source $VIMRUNTIME/syntax/html.vim" |
cxref | V:0, I:0 | 1157 | c→html | converter for the C program to LaTeX and HTML (C language) |
src2tex | V:0, I:0 | 612 | highlight | convert many source codes to TeX (C language) |
source-highlight | V:1, I:7 | 2008 | highlight | convert many source codes to HTML, XHTML, LaTeX, Texinfo, ANSI color escape sequences and DocBook files with highlight (C++) |
highlight | V:1, I:16 | 943 | highlight | convert many source codes to HTML, XHTML, RTF, LaTeX, TeX or XSL-FO files with highlight (C++) |
grc | V:0, I:2 | 60 | text→color | generic colouriser for everything (Python) |
txt2html | V:0, I:4 | 296 | text→html | text to HTML converter (Perl) |
markdown | V:0, I:6 | 56 | text→html | markdown text document formatter to (X)HTML (Perl) |
asciidoc | V:1, I:14 | 2442 | text→any | AsciiDoc text document formatter to XML/HTML (Python) |
pandoc | V:3, I:23 | 69422 | text→any | general markup converter (Haskell) |
python-docutils | V:35, I:554 | 1653 | text→any | reStructuredText document formatter to XML (Python) |
txt2tags | V:0, I:1 | 951 | text→any | document conversion from text to HTML, SGML, LaTeX, man page, MoinMoin, Magic Point and PageMaker (Python) |
udo | V:0, I:0 | 548 | text→any | universal document - text processing utility (C language) |
stx2any | V:0, I:0 | 264 | text→any | document converter from structured plain text to other formats (m4) |
rest2web | V:0, I:0 | 526 | text→html | document converter from reStructuredText to html (Python) |
aft | V:0, I:0 | 235 | text→any | "free form" document preparation system (Perl) |
yodl | V:0, I:0 | 522 | text→any | pre-document language and tools to process it (C language) |
sdf | V:0, I:0 | 1445 | text→any | simple document parser (Perl) |
sisu | V:0, I:0 | 5338 | text→any | document structuring, publishing and search framework (Ruby) |
The Extensible Markup Language (XML) is a markup language for documents containing structured information.
See introductory information at XML.COM.
XML text looks somewhat like HTML. It enables us to manage multiple formats of output for a document. One easy XML system is the docbook-xsl package, which is used here.
Each XML file starts with standard XML declaration as the following.
<?xml version="1.0" encoding="UTF-8"?>
The basic syntax for one XML element is marked up as the following.
<name attribute="value">content</name>
XML element with empty content is marked up in the following short form.
<name attribute="value"/>
The "attribute="value"" in the above examples is optional.
The comment section in XML is marked up as the following.
<!-- comment -->
Other than adding markup, XML requires minor conversion of the content, using predefined entities for the following characters.
Table 11.7. List of predefined entities for XML
predefined entity | character to be converted into |
---|---|
&quot; | " : quote |
&apos; | ' : apostrophe |
&lt; | < : less-than |
&gt; | > : greater-than |
&amp; | & : ampersand |
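As a minimal sketch of this conversion, the hypothetical xml_escape shell function below uses sed(1) to replace the five characters with the predefined entities. "&" is replaced first. Otherwise the "&" introduced by the other entities would be escaped a second time.

```shell
#!/bin/sh
# xml_escape: read text on stdin and write it with XML predefined entities.
# In a sed replacement a bare "&" means "the matched text", so it is written as "\&".
xml_escape() {
    sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g' \
        -e "s/'/\&apos;/g"
}

printf '%s\n' "a<b & \"c\" > 'd'" | xml_escape
```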
Note: When SGML style user defined entities, e.g. "
Note: As long as the XML markup is done consistently with a certain set of tag names (either some data as content or attribute value), conversion to another XML is a trivial task using Extensible Stylesheet Language Transformations (XSLT).
There are many tools available to process XML files, such as the Extensible Stylesheet Language (XSL).
Basically, once you create a well-formed XML file, you can convert it to any format using Extensible Stylesheet Language Transformations (XSLT).
The Extensible Stylesheet Language for Formatting Objects (XSL-FO) is supposed to be the solution for formatting. The fop package is new to the Debian main archive due to its dependence on the Java programming language. So LaTeX code is usually generated from XML using XSLT, and the LaTeX system is used to create printable files such as DVI, PostScript, and PDF.
Table 11.8. List of XML tools
package | popcon | size | keyword | description |
---|---|---|---|---|
docbook-xml | I:533 | 2131 | xml | XML document type definition (DTD) for DocBook |
xsltproc | V:14, I:123 | 148 | xslt | XSLT command line processor (XML→ XML, HTML, plain text, etc.) |
docbook-xsl | V:15, I:233 | 14998 | xml/xslt | XSL stylesheets for processing DocBook XML to various output formats with XSLT |
xmlto | V:3, I:37 | 121 | xml/xslt | XML-to-any converter with XSLT |
dbtoepub | V:0, I:1 | 71 | xml/xslt | DocBook XML to .epub converter |
dblatex | V:5, I:25 | 4639 | xml/xslt | convert Docbook files to DVI, PostScript, PDF documents with XSLT |
fop | V:3, I:53 | 64 | xml/xsl-fo | convert Docbook XML files to PDF |
Since XML is a subset of the Standard Generalized Markup Language (SGML), it can be processed by the extensive tools available for SGML, such as the Document Style Semantics and Specification Language (DSSSL).
Table 11.9. List of DSSSL tools
package | popcon | size | keyword | description |
---|---|---|---|---|
openjade | V:3, I:34 | 921 | dsssl | ISO/IEC 10179:1996 standard DSSSL processor (latest) |
openjade1.3 | V:0, I:0 | 2199 | dsssl | ISO/IEC 10179:1996 standard DSSSL processor (1.3.x series) |
jade | V:0, I:12 | 825 | dsssl | James Clark's original DSSSL processor (1.2.x series) |
docbook-dsssl | V:2, I:39 | 2604 | xml/dsssl | DSSSL stylesheets for processing DocBook XML to various output formats with DSSSL |
docbook-utils | V:2, I:26 | 281 | xml/dsssl | utilities for DocBook files including conversion to other formats (HTML, RTF, PS, man, PDF) with docbook2* commands with DSSSL |
sgml2x | V:0, I:0 | 90 | SGML/dsssl | converter from SGML and XML using DSSSL stylesheets |
You can extract HTML or XML data from other formats using the following.
Table 11.10. List of XML data extraction tools
package | popcon | size | keyword | description |
---|---|---|---|---|
wv | V:6, I:9 | 713 | MSWord→any | document converter from Microsoft Word to HTML, LaTeX, etc. |
texi2html | V:0, I:11 | 1832 | texi→html | converter from Texinfo to HTML |
man2html | V:0, I:3 | 133 | manpage→html | converter from manpage to HTML (CGI support) |
tex4ht | V:1, I:24 | 36 | tex↔html | converter between (La)TeX and HTML |
unrtf | V:2, I:4 | 137 | rtf→html | document converter from RTF to HTML, etc |
info2www | V:3, I:4 | 156 | info→html | converter from GNU info to HTML (CGI support) |
ooo2dbk | V:0, I:1 | 217 | sxw→xml | converter from OpenOffice.org SXW documents to DocBook XML |
wp2x | V:0, I:0 | 215 | WordPerfect→any | WordPerfect 5.0 and 5.1 files to TeX, LaTeX, troff, GML and HTML |
doclifter | V:0, I:0 | 457 | troff→xml | converter from troff to DocBook XML |
For non-XML HTML files, you can convert them to XHTML, which is an instance of well-formed XML. XHTML can be processed by XML tools.
Table 11.11. List of XML pretty print tools
package | popcon | size | keyword | description |
---|---|---|---|---|
libxml2-utils | V:25, I:322 | 177 | xml↔html↔xhtml | command line XML tool with xmllint(1) (syntax check, reformat, lint, …) |
tidy | V:2, I:17 | 83 | xml↔html↔xhtml | HTML syntax checker and reformatter |
Once proper XML is generated, you can use XSLT technology to extract data based on the mark-up context etc.
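The Unix troff program, originally developed by AT&T, can be used for simple typesetting. It is now used to create manual pages.
TeX, created by Donald Knuth, is a very powerful typesetting tool and the de facto standard. LaTeX, originally written by Leslie Lamport, makes the power of TeX more convenient to use.
Traditionally, roff is the main Unix text processing system. See roff(7), groff(7), groff(1), grotty(1), troff(1), groff_mdoc(7), groff_man(7), groff_ms(7), groff_me(7), groff_mm(7), and "info groff".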
You can read or print a good tutorial and reference on the "-me" macro in "/usr/share/doc/groff/" by installing the groff package.
Tip: To remove "^H" and "_" from a text file generated by
The TeX Live software distribution offers a complete TeX system. The texlive metapackage provides a decent selection of the TeX Live packages, which should suffice for the most common tasks.
The following references are available for TeX and LaTeX.
- tex(1)
- latex(1)
- texdoc(1)
- texdoctk(1)
- "The TeXbook", by Donald E. Knuth (Addison-Wesley)
- "LaTeX - A Document Preparation System", by Leslie Lamport (Addison-Wesley)
- "The LaTeX Companion", by Goossens, Mittelbach, Samarin (Addison-Wesley)
This is the most powerful typesetting environment. Many SGML processors use it as their back-end text processor. LyX, provided by the lyx package, and GNU TeXmacs, provided by the texmacs package, offer a nice WYSIWYG editing environment for LaTeX, while many use Emacs and Vim as their source editor of choice.
There are many online resources available.
- The TeX Live Guide - TeX Live 2007 ("/usr/share/doc/texlive-doc-base/english/texlive-en/live.html") (texlive-doc-base package)
When documents become bigger, TeX may sometimes cause errors. To fix this, you must increase the pool size in "/etc/texmf/texmf.cnf" (or, more appropriately, edit "/etc/texmf/texmf.d/95NonPath" and run update-texmf(8)).
Note: The TeX source of "The TeXbook" is available at http://tug.ctan.org/tex-archive/systems/knuth/dist/tex/texbook.tex. This file contains most of the required macros. I heard that you can process this document with tex(1) after commenting lines 7 to 10 and adding "
You can print a manual page nicely on a printer with either of the following commands.
$ man -Tps some_manpage | lpr
$ man -Tps some_manpage | mpage -2 | lpr
The second example prints 2 pages on one sheet.
Printable data is expressed in the PostScript format on the Debian system. Common Unix Printing System (CUPS) uses Ghostscript as its rasterizer backend program for non-PostScript printers.
The core of printable data manipulation is the Ghostscript PostScript (PS) interpreter, which generates raster images.
The latest upstream Ghostscript from Artifex was relicensed from AFPL to GPL and, with the 8.60 release, merged all the latest changes from the ESP version, such as the CUPS-related ones, into a unified release.
Table 11.14. List of Ghostscript PostScript interpreters
package | popcon | size | description |
---|---|---|---|
ghostscript | V:160, I:691 | 224 | GPL Ghostscript PostScript/PDF interpreter |
ghostscript-x | V:32, I:77 | 210 | GPL Ghostscript PostScript/PDF interpreter - X display support |
libpoppler64 | V:19, I:53 | 3214 | PDF rendering library forked from the xpdf PDF viewer |
libpoppler-glib8 | V:239, I:526 | 435 | PDF rendering library (GLib-based shared library) |
poppler-data | V:103, I:669 | 12123 | CMaps for PDF rendering library (for CJK support: Adobe-*) |
You can merge two PostScript (PS) or Portable Document Format (PDF) files using gs(1) of Ghostscript.
$ gs -q -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=bla.ps -f foo1.ps foo2.ps
$ gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=bla.pdf -f foo1.pdf foo2.pdf
Tip: For the command line, psmerge(1) and
Here is a list of tools for printable data.
Table 11.15. List of printable data utilities
package | popcon | size | keyword | description |
---|---|---|---|---|
poppler-utils | V:52, I:492 | 526 | pdf→ps,text,… | PDF utilities: pdftops, pdfinfo, pdfimages, pdftotext, pdffonts |
psutils | V:12, I:221 | 219 | ps→ps | PostScript document conversion tools |
poster | V:0, I:8 | 49 | ps→ps | create large posters out of PostScript pages |
enscript | V:3, I:28 | 2111 | text→ps, html, rtf | convert ASCII text to PostScript, HTML, RTF or Pretty-Print |
a2ps | V:2, I:31 | 3624 | text→ps | 'Anything to PostScript' converter and pretty-printer |
pdftk | V:9, I:56 | 2959 | pdf→pdf | PDF document conversion tool: pdftk |
mpage | V:0, I:5 | 141 | text,ps→ps | print multiple pages per sheet |
html2ps | V:0, I:6 | 320 | html→ps | converter from HTML to PostScript |
gnuhtml2latex | V:0, I:1 | 53 | html→latex | converter from HTML to LaTeX |
latex2rtf | V:0, I:7 | 438 | latex→rtf | convert documents from LaTeX to RTF, which can be read by Microsoft Word |
ps2eps | V:8, I:114 | 94 | ps→eps | converter from PostScript to EPS (Encapsulated PostScript) |
e2ps | V:0, I:0 | 112 | text→ps | Text to PostScript converter with Japanese encoding support |
impose+ | V:0, I:1 | 180 | ps→ps | PostScript utilities |
trueprint | V:0, I:0 | 138 | text→ps | pretty print many source codes (C, C++, Java, Pascal, Perl, Pike, Sh, and Verilog) to PostScript (C language) |
pdf2svg | V:0, I:5 | 50 | ps→svg | converter from PDF to Scalable Vector Graphics format |
pdftoipe | V:0, I:0 | 63 | ps→ipe | converter from PDF to IPE's XML format |
Both the lp(1) and lpr(1) commands offered by the Common Unix Printing System (CUPS) provide options for customized printing of the data.
You can print 3 collated copies of a file using either of the following commands.
$ lp -n 3 -o Collate=True filename
$ lpr -#3 -o Collate=True filename
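You can further customize printer operation by using printer options such as "-o number-up=2", "-o page-set=even", "-o page-set=odd", "-o scaling=200", and "-o natural-scaling=200". See Command-Line Printing and Options for detailed documentation.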
The following packages for mail data conversion caught my eye.
Table 11.16. List of packages to help mail data conversion
package | popcon | size | keyword | description |
---|---|---|---|---|
sharutils | V:9, I:123 | 1352 | shar(1), unshar(1), uuencode(1), uudecode(1) | |
mpack | V:2, I:26 | 91 | MIME | encoding and decoding of MIME messages: mpack(1) and munpack(1) |
tnef | V:7, I:11 | 98 | ms-tnef | unpacking MIME attachments of type "application/ms-tnef" which is a Microsoft only format |
uudeview | V:0, I:6 | 97 | encoder and decoder for the following formats: uuencode, xxencode, BASE64, quoted printable, and BinHex | |
readpst | I:1 | 21 | PST | convert Microsoft Outlook PST files to mbox format |
Tip: The Internet Message Access Protocol version 4 (IMAP4) server (see Section 6.7, "POP3/IMAP4 server") may be used to move mail out of proprietary mail systems if the mail client software can be configured to use an IMAP4 server too.
Mail (SMTP) data should be limited to a series of 7-bit data. So binary data and 8-bit text data are encoded into 7-bit format with the Multipurpose Internet Mail Extensions (MIME) and the selection of the charset (see Section 8.3.1, "Basics of encoding").
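For example, Base64 (one of the MIME content transfer encodings) turns arbitrary 8-bit bytes into 7-bit ASCII text. The sketch below assumes base64(1) from coreutils and encodes the UTF-8 text "café", whose "é" takes two non-ASCII bytes. The file names are hypothetical.

```shell
#!/bin/sh
# "café" plus a newline in UTF-8 (bytes 63 61 66 c3 a9 0a).
printf 'caf\303\251\n' > utf8.txt

# Encode to Base64: the result uses only 7-bit ASCII characters.
base64 utf8.txt > utf8.b64
cat utf8.b64

# Decode it again and show that the original bytes come back.
base64 -d utf8.b64 | od -An -tx1
```

The encoded form printed by cat is Y2Fmw6kK, plain ASCII that can travel safely through a 7-bit mail path.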
The standard mail storage format is mbox, formatted according to RFC2822 (which updates RFC822). See mbox(5) (provided by the mutt package).
For European languages, "Content-Transfer-Encoding: quoted-printable" with the ISO-8859-1 charset is usually used for mail, since there are not many 8-bit characters. If European text is encoded in UTF-8, "Content-Transfer-Encoding: quoted-printable" is likely to be used, since it is mostly 7-bit data.
For Japanese, "Content-Type: text/plain; charset=ISO-2022-JP" is traditionally used for mail to keep text in 7 bits. But older Microsoft systems may send mail data in Shift-JIS without a proper declaration. If Japanese text is encoded in UTF-8, Base64 is likely to be used, since it contains a lot of 8-bit data. The situation for other Asian languages is similar.
Note: If your non-Unix mail data is accessible by non-Debian client software that can talk to an IMAP4 server, you may be able to move it out by running your own IMAP4 server (see Section 6.7, "POP3/IMAP4 server").
Note: If you use other mail storage formats, moving them to mbox format is a good first step. A versatile client program such as mutt(1) may be handy for this.
You can split mailbox contents into individual messages using procmail(1) and formail(1).
Each mail message can be unpacked using munpack(1) from the mpack package (or other specialized tools) to obtain the MIME encoded contents.
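Where procmail and formail are not at hand, the splitting step can be approximated with awk(1). This is a swapped-in technique, not the formail(1) method described above. The hypothetical split_mbox function assumes every message starts with a line beginning with "From " (the mbox(5) separator). It writes the messages to files named msg.001, msg.002, and so on.

```shell
#!/bin/sh
# split_mbox FILE: write each message of the mbox FILE to msg.NNN.
split_mbox() {
    awk '
        /^From / {                      # a "From " line starts a new message
            if (out != "") close(out)   # close the previous file to limit open files
            n++
            out = sprintf("msg.%03d", n)
        }
        out != "" { print > out }       # copy lines once the first message has begun
    ' "$1"
}

# Hypothetical two-message mailbox.
printf 'From a@example.org Mon Jan  1 00:00:00 2024\nSubject: one\n\nbody1\n\n' > test.mbox
printf 'From b@example.org Mon Jan  1 00:00:01 2024\nSubject: two\n\nbody2\n' >> test.mbox

split_mbox test.mbox
ls msg.*
```

Each resulting file can then be passed to munpack(1) as described above.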
Here are packages for graphic data conversion, editing, and organization.
Table 11.17. List of graphic data tools
package | popcon | size | keyword | description |
---|---|---|---|---|
gimp | V:97, I:509 | 16255 | image(bitmap) | GNU Image Manipulation Program |
imagemagick | V:154, I:544 | 191 | image(bitmap) | image manipulation programs |
graphicsmagick | V:7, I:14 | 4820 | image(bitmap) | image manipulation programs (fork of imagemagick) |
xsane | V:24, I:193 | 913 | image(bitmap) | GTK+-based X11 frontend for SANE (Scanner Access Now Easy) |
netpbm | V:32, I:547 | 4230 | image(bitmap) | graphics conversion tools |
icoutils | V:8, I:72 | 192 | png↔ico(bitmap) | convert MS Windows icons and cursors to and from PNG formats (favicon.ico) |
scribus | V:14, I:28 | 19136 | ps/pdf/SVG/… | Scribus DTP editor |
libreoffice-draw | V:344, I:479 | 8995 | image(vector) | LibreOffice office suite - drawing |
inkscape | V:145, I:360 | 102751 | image(vector) | SVG (Scalable Vector Graphics) editor |
dia-gnome | V:6, I:11 | 20 | image(vector) | diagram editor (GNOME) |
dia | V:25, I:41 | 3880 | image(vector) | diagram editor (Gtk) |
xfig | V:13, I:19 | 1783 | image(vector) | Facility for Interactive Generation of figures under X11 |
pstoedit | V:15, I:358 | 667 | ps/pdf→image(vector) | converter from PostScript and PDF files to editable vector graphics (SVG) |
libwmf-bin | V:14, I:365 | 104 | Windows/image(vector) | Windows metafile (vector graphic data) conversion tools |
fig2sxd | V:0, I:0 | 142 | fig→sxd(vector) | convert XFig files to OpenOffice.org Draw format |
unpaper | V:2, I:15 | 447 | image→image | post-processing tool for scanned pages for OCR |
tesseract-ocr | V:4, I:27 | 558 | image→text | free OCR software based on HP's commercial OCR engine |
tesseract-ocr-eng | I:28 | 37486 | image→text | OCR engine data: tesseract-ocr language files for English text |
gocr | V:2, I:25 | 494 | image→text | free OCR software |
ocrad | V:1, I:7 | 310 | image→text | free OCR software |
eog | V:101, I:337 | 10581 | image(Exif) | Eye of GNOME graphics viewer program |
gthumb | V:15, I:27 | 3238 | image(Exif) | image viewer and browser (GNOME) |
geeqie | V:17, I:25 | 1535 | image(Exif) | image viewer using GTK+ |
shotwell | V:17, I:140 | 5754 | image(Exif) | digital photo organizer (GNOME) |
gtkam | V:0, I:7 | 965 | image(Exif) | application for retrieving media from digital cameras (GTK+) |
gphoto2 | V:1, I:14 | 969 | image(Exif) | gphoto2 digital camera command-line client |
gwenview | V:33, I:104 | 4508 | image(Exif) | image viewer (KDE) |
kamera | V:4, I:103 | 230 | image(Exif) | digital camera support for KDE applications |
digikam | V:3, I:17 | 1760 | image(Exif) | digital photo management application for the KDE desktop environment |
exiv2 | V:5, I:77 | 242 | image(Exif) | EXIF/IPTC metadata manipulation tool |
exiftran | V:2, I:26 | 67 | image(Exif) | transform digital camera jpeg images |
jhead | V:1, I:13 | 105 | image(Exif) | manipulate the non-image part of Exif compliant JPEG (digital camera photo) files |
exif | V:1, I:10 | 370 | image(Exif) | command-line utility to show EXIF information in JPEG files |
exiftags | V:0, I:3 | 205 | image(Exif) | utility to read Exif tags from a digital camera JPEG file |
exifprobe | V:0, I:3 | 482 | image(Exif) | read metadata from digital pictures |
dcraw | V:3, I:25 | 358 | image(Raw)→ppm | decode raw digital camera images |
findimagedupes | V:0, I:1 | 79 | image→fingerprint | find visually similar or duplicate images |
ale | V:0, I:0 | 766 | image→image | merge images to increase fidelity or create mosaics |
imageindex | V:0, I:0 | 144 | image(Exif)→html | generate static HTML galleries from images |
outguess | V:0, I:0 | 217 | jpeg,png | universal Steganographic tool |
librecad | V:12, I:18 | 7762 | DXF | CAD data editor (KDE) |
blender | V:4, I:29 | 101399 | blend, TIFF, VRML, … | 3D content editor for animation etc |
mm3d | V:0, I:0 | 4668 | ms3d, obj, dxf, … | OpenGL based 3D model editor |
open-font-design-toolkit | I:0 | 28 | ttf, ps, … | metapackage for open font design |
fontforge | V:1, I:10 | 91 | ttf, ps, … | font editor for PS, TrueType and OpenType fonts |
xgridfit | V:0, I:0 | 898 | ttf | program for gridfitting and hinting TrueType fonts |
Tip: In aptitude(8) (see Section 2.2.6, "Search method options with aptitude"), use the regular expression "
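Although GUI programs such as gimp(1) are very powerful, command line tools such as imagemagick(1) are quite useful for automating image manipulation with scripts.
The de facto image file format of digital cameras is the Exchangeable Image File Format (EXIF), which is the JPEG image file format with additional metadata tags. It can hold information such as the date, time, and camera settings.
The Lempel-Ziv-Welch (LZW) lossless data compression patent has expired. Graphics Interchange Format (GIF) utilities that use the LZW compression method are now freely available on the Debian system.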
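Tip: Any digital camera or scanner with a removable recording medium works with Linux through USB storage readers, since it follows the Design rule for Camera Filesystem and uses the FAT filesystem. See Section 10.1.7, "Removable storage device".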
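There are many other programs for converting data. Use the regular expression "~Guse::converting" in aptitude(8) (see Section 2.2.6, "Search method options with aptitude") to find the following packages.
You can extract data from RPM format packages with the following command.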
$ rpm2cpio file.src.rpm | cpio --extract