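PHPWord is a library written in pure PHP that provides a set of classes for writing to and reading from different document file formats. The current version of PHPWord supports Microsoft Office Open XML (OOXML or OpenXML), the OASIS Open Document Format for Office Applications (OpenDocument or ODF), Rich Text Format (RTF), HTML, and PDF.
I. Installing PHPWord with Composer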
composer require phpoffice/phpword
Composer package page: https://packagist.org/packages/phpoffice/phpword

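II. Reading a docx document with PHPWord (note: the file must be docx; the old doc format does not work here)
If your file is in doc format, simply open it and save it as docx. If you have many doc files, you can download a batch conversion tool: http://www.batchwork.com/en/doc2doc/download.htm
If you have not set up autoloading yet, do that first: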
require './vendor/autoload.php';
Load the document:
$dir = str_replace('\\', '/', __DIR__) . '/';
$source = $dir . 'test.docx';
$phpWord = \PhpOffice\PhpWord\IOFactory::load($source);
III. Key points
1) Alignment: \PhpOffice\PhpWord\Style\Paragraph -> getAlignment()
2) Font name: \PhpOffice\PhpWord\Style\Font -> getName()
3) Font size: \PhpOffice\PhpWord\Style\Font -> getSize()
4) Bold or not: \PhpOffice\PhpWord\Style\Font -> isBold()
5) Reading an image: \PhpOffice\PhpWord\Element\Image -> getImageStringData()
6) Saving base64 image data as an image file: file_put_contents($imageSrc, base64_decode($imageData))
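The value returned by getAlignment() is a Word justification keyword, and not all of them are valid CSS (for example `both` means justified text). A minimal sketch of a mapping to CSS `text-align` follows. The function name `alignmentToCss` is hypothetical, the keyword list is not exhaustive, and mapping `start`/`end` to `left`/`right` assumes left-to-right text.

```php
<?php
// Hypothetical helper (not part of PHPWord): translate a Word justification
// keyword into a CSS text-align value. Unknown or empty values yield ''.
function alignmentToCss(?string $alignment): string
{
    $map = [
        'start'      => 'left',    // assumes left-to-right text
        'left'       => 'left',
        'center'     => 'center',
        'end'        => 'right',   // assumes left-to-right text
        'right'      => 'right',
        'both'       => 'justify',
        'distribute' => 'justify', // closest CSS equivalent
    ];

    return $map[$alignment ?? ''] ?? '';
}
```

It could be called as `alignmentToCss($paragraphStyle->getAlignment())`, emitting the `text-align` rule only when the result is not empty.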
IV. Complete code
require './vendor/autoload.php';

function docx2html($source)
{
    $phpWord = \PhpOffice\PhpWord\IOFactory::load($source);
    // Make sure the image output directory exists before writing to it.
    if (!is_dir('images')) {
        mkdir('images', 0777, true);
    }
    $html = '';
    foreach ($phpWord->getSections() as $section) {
        foreach ($section->getElements() as $ele1) {
            // Not every element has a paragraph style (a Table does not), and the
            // style may be a style name (string) rather than a Paragraph object.
            $paragraphStyle = method_exists($ele1, 'getParagraphStyle') ? $ele1->getParagraphStyle() : null;
            if ($paragraphStyle instanceof \PhpOffice\PhpWord\Style\Paragraph) {
                // Note: some Word alignment values (e.g. "both") are not valid CSS.
                $html .= '<p style="text-align:' . $paragraphStyle->getAlignment() . ';text-indent:20px;">';
            } else {
                $html .= '<p>';
            }
            if ($ele1 instanceof \PhpOffice\PhpWord\Element\TextRun) {
                foreach ($ele1->getElements() as $ele2) {
                    if ($ele2 instanceof \PhpOffice\PhpWord\Element\Text) {
                        $style = $ele2->getFontStyle();
                        $styleString = '';
                        // The font style can also be a style name, so check the type first.
                        if ($style instanceof \PhpOffice\PhpWord\Style\Font) {
                            $fontFamily = $style->getName();
                            $fontSize = $style->getSize();
                            $isBold = $style->isBold();
                            $fontFamily && $styleString .= "font-family:{$fontFamily};";
                            // Word font sizes are in points, so emit pt rather than px.
                            $fontSize && $styleString .= "font-size:{$fontSize}pt;";
                            $isBold && $styleString .= "font-weight:bold;";
                        }
                        // PHPWord returns UTF-8 text, so no charset conversion is needed;
                        // escape it so that characters such as < and & stay literal.
                        $html .= sprintf('<span style="%s">%s</span>',
                            $styleString,
                            htmlspecialchars((string) $ele2->getText())
                        );
                    } elseif ($ele2 instanceof \PhpOffice\PhpWord\Element\Image) {
                        $imageSrc = 'images/' . md5($ele2->getSource()) . '.' . $ele2->getImageExtension();
                        $imageData = $ele2->getImageStringData(true);
                        // To inline the image as a data URI instead of saving it:
                        // $imageData = 'data:' . $ele2->getImageType() . ';base64,' . $imageData;
                        file_put_contents($imageSrc, base64_decode($imageData));
                        $html .= '<img src="' . $imageSrc . '" style="width:100%;height:auto">';
                    }
                }
            }
            $html .= '</p>';
        }
    }
    return $html;
}

$dir = str_replace('\\', '/', __DIR__) . '/';
$source = $dir . '1.docx';
echo docx2html($source);
The result is shown in the figure:
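This is a bare-bones example of reading a Word document. It reads only the paragraph alignment, the font name, size, and bold flag of the text, and the images; everything else, such as text color and line height, is ignored. If you need more, look through the PHPWord source and see which getter methods the classes under \PhpOffice\PhpWord\Style\ and \PhpOffice\PhpWord\Element\ provide.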
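As one illustration of picking up more properties, here is a small sketch that turns values read from a font style into inline CSS, including italics and text color. The function name `fontStyleToCss` is hypothetical. It assumes the Font style exposes isItalic() and getColor(), and that the color comes back as a six-digit hex string without a leading `#`; verify both against the PHPWord version you have installed.

```php
<?php
// Hypothetical helper (not part of PHPWord): build an inline CSS string from
// values read off a Font style. Parameters are untyped on purpose because the
// getters may return null when a property is not set.
function fontStyleToCss($name, $size, $bold, $italic, $color): string
{
    $css = '';
    if ($name) {
        $css .= "font-family:'{$name}';";
    }
    if ($size) {
        // Word font sizes are expressed in points.
        $css .= "font-size:{$size}pt;";
    }
    if ($bold) {
        $css .= 'font-weight:bold;';
    }
    if ($italic) {
        $css .= 'font-style:italic;';
    }
    // Assumption: colors are hex strings such as "FF0000"; anything else is skipped.
    if (is_string($color) && preg_match('/^[0-9a-fA-F]{6}$/', $color)) {
        $css .= "color:#{$color};";
    }

    return $css;
}
```

Inside the loop it would be used as `fontStyleToCss($style->getName(), $style->getSize(), $style->isBold(), $style->isItalic(), $style->getColor())`.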
V. Getting the complete HTML directly
$phpWord = \PhpOffice\PhpWord\IOFactory::load('xxx.docx');
$xmlWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, "HTML");
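$html = $xmlWriter->getContent();
Note: the HTML includes the head section. If you only need the style and body, you have to extract them yourself. Images are embedded as base64; to save them as files you also have to handle that yourself, and the code above shows how to write base64 data to an image file.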
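One way to keep only the style rules and the body is to slice them out of the generated page. The sketch below is an assumption about how you might do it, not something PHPWord provides; the function name `extractStyleAndBody` is hypothetical. It uses plain string searches for the body so that large base64 images do not run into regular-expression limits.

```php
<?php
// Hypothetical helper (not part of PHPWord): return the CSS inside <style>
// tags and the markup inside <body> from a complete HTML page.
function extractStyleAndBody(string $html): array
{
    // Collect the contents of every <style> block.
    $style = '';
    if (preg_match_all('~<style[^>]*>(.*?)</style>~is', $html, $m)) {
        $style = trim(implode("\n", $m[1]));
    }

    // Locate the body with string functions; fall back to the whole input.
    $body = $html;
    $start = stripos($html, '<body');
    $end = strripos($html, '</body>');
    if ($start !== false && $end !== false) {
        $open = strpos($html, '>', $start); // end of the opening <body ...> tag
        if ($open !== false && $open < $end) {
            $body = trim(substr($html, $open + 1, $end - $open - 1));
        }
    }

    return ['style' => $style, 'body' => $body];
}
```

Called as `extractStyleAndBody($html)`, it returns an array with the keys `style` and `body`.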
If you only want the contents of the body, you can follow the write method of \PhpOffice\PhpWord\Writer\HTML\Part\Body:
$phpWord = \PhpOffice\PhpWord\IOFactory::load('xxxx.docx');
$htmlWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, "HTML");
$content = '';
foreach ($phpWord->getSections() as $section) {
    $writer = new \PhpOffice\PhpWord\Writer\HTML\Element\Container($htmlWriter, $section);
    $content .= $writer->write();
}
echo $content;
exit;
As for images, I have not yet found a good way to handle them without modifying the library source. If you do modify it, the relevant code is in \PhpOffice\PhpWord\Writer\HTML\Element\Image:
public function write()
{
    if (!$this->element instanceof ImageElement) {
        return '';
    }
    $content = '';
    $imageData = $this->element->getImageStringData(true);
    if ($imageData !== null) {
        $styleWriter = new ImageStyleWriter($this->element->getStyle());
        $style = $styleWriter->write();
        // Original behavior, which inlines the image as a data URI:
        // $imageData = 'data:' . $this->element->getImageType() . ';base64,' . $imageData;
        $imageSrc = 'images/' . md5($this->element->getSource()) . '.' . $this->element->getImageExtension();
        // Handle the image however you like here, e.g. upload it to OSS.
        file_put_contents($imageSrc, base64_decode($imageData));
        $content .= $this->writeOpening();
        $content .= "<img border=\"0\" style=\"{$style}\" src=\"{$imageSrc}\"/>";
        $content .= $this->writeClosing();
    }
    return $content;
}
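A possible alternative that leaves the vendor code untouched is to post-process the generated HTML and swap each embedded image for a saved file. This is only a sketch: the function name `saveEmbeddedImages` is hypothetical, and it assumes the writer emits images as `src="data:image/<type>;base64,<data>"`, which is what the unmodified method above suggests. Files are named by the md5 of their content, which is a choice made here for illustration.

```php
<?php
// Hypothetical helper (not part of PHPWord): write every base64 data-URI
// image found in $html to $dir and point the src attribute at the file.
function saveEmbeddedImages(string $html, string $dir): string
{
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }

    $result = preg_replace_callback(
        '~src="data:image/([a-zA-Z0-9]+);base64,([^"]+)"~',
        function (array $m) use ($dir) {
            // Strip line breaks that may have been inserted into the base64 text.
            $data = base64_decode(preg_replace('/\s+/', '', $m[2]), true);
            if ($data === false || $data === '') {
                return $m[0]; // not valid base64: leave the attribute untouched
            }
            $ext = strtolower($m[1]) === 'jpeg' ? 'jpg' : strtolower($m[1]);
            $path = rtrim($dir, '/') . '/' . md5($data) . '.' . $ext;
            file_put_contents($path, $data);

            return 'src="' . $path . '"';
        },
        $html
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $html;
}
```

For example, `saveEmbeddedImages($content, 'images')` returns the HTML with file paths in place of the data URIs. The place where the file is written is also where an upload to object storage could go.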
Downloading a doc document:
$html ='
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style>
body{font-size:14px;color:#000000;padding:0;margin:0;font-family:"宋体"}
dd,table,td,p{padding:0;margin:0}
h2{font-size:16px;}
.rs_content{width:880px;margin:0 auto;}
.rs_content h2{height:30px;text-align:center;font-size:16px;line-height:30px}
.rs_content .info{overflow:hidden;}
.rs_content .notice,.information{margin-top:30px;}
.rs_content .notice p{line-height:30px;}
.rs_content table{width:100%;border-collapse:collapse;}
.rs_content table tr.tb{font-weight:bold;}
.rs_content table td{height:30px;line-height:30px;}
.rs_content .info table td{border:#000 1px solid;text-align:center;}
</style>
</head>
<body>
<div class="rs_content">
<h2>Download doc document</h2>
<div class="info">
<table>
<tr class="tb">
<td>Total</td>
<td>123</td>
<td>¥xxxxxx</td>
<td>456</td>
<td>¥xxxxxx</td>
<td>¥xxxxxx</td>
</tr>
<tr class="tb">
<td>Amount in words (RMB):</td>
<td colspan="5">xxxxxx</td>
</tr>
</table>
</div>
<div class="notice">
<p>Note (1): sample remark text.<br />
</p>
</div>
</div>
</body>
</html>
';
$file_name = "downDoc";
header('Content-type: application/msword');
header("Content-Disposition: attachment; filename=\"$file_name.doc\"");
echo $html;
die;
This is an original article by Cui Kai (崔凯). You may repost it without contacting me, but please credit 冷暖自知一抹茶ck: http://www.cksite.cn