I need to use C# to programmatically append several preexisting DOCX files into a single, long DOCX file, including special markup such as bullets and images. Header and footer information will be stripped out, so it won't be around to cause any problems.
I can find plenty of information about manipulating an individual DOCX file with .NET Framework 3, but nothing easy or obvious about how you would merge files. There is also a third-party program (Acronis.Words) that will do it, but it is prohibitively expensive.
[edit] Automating through Word has been suggested, but my code is going to be running on ASP.NET on an IIS web server, so going out to Word is not an option for me. Sorry for not mentioning that in the first place. [/edit]
Thank you for any and all help!
-
Try automating Word - not the best solution, but should work.
-
I wrote a little test app a while ago to do this. My test app worked with Word 2003 documents (.doc) not .docx, but I imagine the process is the same - I should think all you'd have to change is to use a newer version of the Primary Interop Assembly. This code would look a lot neater with the new C# 4.0 features...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.Office.Interop.Word;
using Microsoft.Office.Core;
using System.Runtime.InteropServices;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            new Program().Start();
        }

        private void Start()
        {
            object fileName = Path.Combine(Environment.CurrentDirectory, @"NewDocument.doc");
            File.Delete(fileName.ToString());
            try
            {
                WordApplication = new ApplicationClass();
                var doc = WordApplication.Documents.Add(ref missing, ref missing, ref missing, ref missing);
                try
                {
                    doc.Activate();
                    AddDocument(@"D:\Projects\WordTests\ConsoleApplication1\Documents\Doc1.doc", doc, false);
                    AddDocument(@"D:\Projects\WordTests\ConsoleApplication1\Documents\Doc2.doc", doc, true);
                    doc.SaveAs(ref fileName, ref missing, ref missing, ref missing,
                        ref missing, ref missing, ref missing, ref missing,
                        ref missing, ref missing, ref missing, ref missing,
                        ref missing, ref missing, ref missing, ref missing);
                }
                finally
                {
                    doc.Close(ref missing, ref missing, ref missing);
                }
            }
            finally
            {
                WordApplication.Quit(ref missing, ref missing, ref missing);
            }
        }

        private void AddDocument(string path, Document doc, bool lastDocument)
        {
            object subDocPath = path;
            var subDoc = WordApplication.Documents.Open(ref subDocPath, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing);
            try
            {
                object docStart = doc.Content.End - 1;
                object docEnd = doc.Content.End;
                object start = subDoc.Content.Start;
                object end = subDoc.Content.End;
                Range rng = doc.Range(ref docStart, ref docEnd);
                rng.FormattedText = subDoc.Range(ref start, ref end);
                if (!lastDocument)
                {
                    InsertPageBreak(doc);
                }
            }
            finally
            {
                subDoc.Close(ref missing, ref missing, ref missing);
            }
        }

        private static void InsertPageBreak(Document doc)
        {
            object docStart = doc.Content.End - 1;
            object docEnd = doc.Content.End;
            Range rng = doc.Range(ref docStart, ref docEnd);
            object pageBreak = WdBreakType.wdPageBreak;
            rng.InsertBreak(ref pageBreak);
        }

        private ApplicationClass WordApplication { get; set; }
        private object missing = Type.Missing;
    }
}
MadBoy : Works great, but it seems to miss headers and footers. Do you know a way to make it merge all headers and footers as well?
-
Although automation through Word is a good solution (and Terence's code is a great example of it), my code is going to be running on top of ASP.NET on an IIS web server, so automating through Word is not an option (Microsoft explains why here).
My apologies for not mentioning that in the question writeup.
So, I need to be able to programmatically append DOCX documents without using Word.
-
You don't need to use automation. DOCX files are based on the OpenXML Formats. They are just zip files with a bunch of XML and binary parts (think files) inside. You can open them with the Packaging API (System.IO.Packaging in WindowsBase.dll) and manipulate them with any of the XML classes in the Framework.
Check out OpenXMLDeveloper.org for details.
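To see what "a zip file with parts inside" means in practice, here is a minimal sketch that lists the parts of a package with System.IO.Packaging. The class and method names (DocxParts, ListParts) are made up for illustration; only the Package API calls are real.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;

static class DocxParts
{
    // Hypothetical helper: returns "part URI -> content type" for every part
    // in an OPC package (a .docx file is one such package).
    public static IDictionary<string, string> ListParts(Stream docx)
    {
        var result = new SortedDictionary<string, string>();
        using (Package package = Package.Open(docx, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
            {
                result[part.Uri.OriginalString] = part.ContentType;
            }
        }
        return result;
    }
}
```

Calling it on a FileStream opened with File.OpenRead on a real document should show entries such as /word/document.xml, plus any media parts for embedded images.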
Dave Markle : Automation is from Satan. Good answer, Rob.
-
It's quite complex, so the code is outside the scope of a forum post (I'd be writing your app for you), but to sum up:
- Open both documents as Packages
- Loop through the second document's parts looking for images and embedded content
- Add these parts to the first package, remembering the new relationship IDs (this involves a lot of stream work)
- Open the document.xml part in the second document and replace all the old relationship IDs with the new ones
- Append all the child nodes, but not the root node, of the second document.xml to the first document.xml
- Save all the XmlDocuments and flush the Package
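The steps above can be sketched roughly as follows. This is an illustration under assumptions, not a complete merger: the names (PackageMergeSketch, Append, GetMainPart) are invented, it uses XDocument rather than XmlDocument, and it copies only image parts. Styles, numbering definitions (which bullets depend on), hyperlinks and other related parts are not handled, which is a large part of why the full job is complex.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Xml.Linq;

static class PackageMergeSketch
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    const string MainRel = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
    const string ImageRel = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image";

    // Finds the main document part (usually /word/document.xml) via the package relationship.
    static PackagePart GetMainPart(Package package)
    {
        PackageRelationship rel = package.GetRelationshipsByType(MainRel).First();
        return package.GetPart(PackUriHelper.ResolvePartUri(new Uri("/", UriKind.Relative), rel.TargetUri));
    }

    // Appends the body content of 'source' to 'target' (target must be opened read/write).
    public static void Append(Package target, Package source)
    {
        PackagePart targetMain = GetMainPart(target);
        PackagePart sourceMain = GetMainPart(source);

        XDocument targetXml, sourceXml;
        using (Stream s = targetMain.GetStream(FileMode.Open, FileAccess.Read)) targetXml = XDocument.Load(s);
        using (Stream s = sourceMain.GetStream(FileMode.Open, FileAccess.Read)) sourceXml = XDocument.Load(s);

        // Copy each image part into the target and remember old -> new relationship IDs.
        var idMap = new Dictionary<string, string>();
        foreach (PackageRelationship rel in sourceMain.GetRelationshipsByType(ImageRel))
        {
            if (rel.TargetMode != TargetMode.Internal) continue;
            Uri sourceUri = PackUriHelper.ResolvePartUri(sourceMain.Uri, rel.TargetUri);
            PackagePart sourcePart = source.GetPart(sourceUri);
            string name = "/word/media/merged_" + Guid.NewGuid().ToString("N")
                          + Path.GetExtension(sourceUri.OriginalString);
            Uri newUri = PackUriHelper.CreatePartUri(new Uri(name, UriKind.Relative));
            PackagePart newPart = target.CreatePart(newUri, sourcePart.ContentType);
            using (Stream from = sourcePart.GetStream(FileMode.Open, FileAccess.Read))
            using (Stream to = newPart.GetStream(FileMode.Create, FileAccess.Write))
            {
                from.CopyTo(to);
            }
            idMap[rel.Id] = targetMain.CreateRelationship(newUri, TargetMode.Internal, ImageRel).Id;
        }

        // Rewrite relationship references (attributes in the r: namespace) in the source XML.
        foreach (XAttribute attr in sourceXml.Descendants().Attributes()
                     .Where(a => a.Name.Namespace == R).ToList())
        {
            if (idMap.TryGetValue(attr.Value, out string newId)) attr.Value = newId;
        }

        // Append the source body's children (not the body itself, and not its section
        // properties) before the target's trailing sectPr, if there is one.
        XElement targetBody = targetXml.Root.Element(W + "body");
        List<XElement> content = sourceXml.Root.Element(W + "body").Elements()
            .Where(e => e.Name != W + "sectPr").ToList();
        XElement sectPr = targetBody.Elements(W + "sectPr").LastOrDefault();
        if (sectPr != null) sectPr.AddBeforeSelf(content); else targetBody.Add(content);

        using (Stream s = targetMain.GetStream(FileMode.Create, FileAccess.Write)) targetXml.Save(s);
        target.Flush();
    }
}
```

Dropping the source's sectPr is a simplification chosen here so the merged body keeps a single trailing section; it means the appended content takes on the first document's page setup.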
-
You want to use AltChunks and the OpenXml SDK (1.0 at a minimum, 2.0 if you can). Check out Eric White's blog for more details; it is a great resource in general. Here is a code sample that should get you started, if not work immediately.
// Note: altChunkPartType, altChunk, relId, body and paragraph are not defined in this snippet.
// They are presumably AlternativeFormatImportPartType.WordprocessingML and the XNames for
// w:altChunk, r:id, w:body and w:p. GetXDocument() and PutXDocument() are not SDK methods;
// they appear to be helper extension methods of the kind published on Eric White's blog.
public void AddAltChunkPart(Stream parentStream, Stream altStream, string altChunkId)
{
    // make sure we are at the start of the streams
    parentStream.Position = 0;
    altStream.Position = 0;

    // push the parentStream into a WordprocessingDocument
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(parentStream, true))
    {
        // get the main document part
        MainDocumentPart mainPart = wordDoc.MainDocumentPart;

        // create an altChunk part by adding a part to the main document part
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(altChunkPartType, altChunkId);

        // feed the altChunk stream into the chunk part
        chunk.FeedData(altStream);

        // create an XElement to represent the new chunk in the document
        XElement newChunk = new XElement(altChunk, new XAttribute(relId, altChunkId));

        // add the chunk to the end of the document (find the last paragraph in the body and add after it)
        wordDoc.MainDocumentPart.GetXDocument().Root.Element(body).Elements(paragraph).Last().AddAfterSelf(newChunk);

        // finally, save the document
        wordDoc.MainDocumentPart.PutXDocument();
    }

    // reset position of parent stream
    parentStream.Position = 0;
}
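For comparison, here is a sketch of the same altChunk idea using the strongly typed classes from the Open XML SDK 2.0 or later, so no XName constants or XDocument helpers are needed. The class and method names (AltChunkMerge, AppendAsAltChunk) are invented for illustration, and it works on file paths rather than streams. Keep in mind that an altChunk is only a reference to the embedded file: Word merges the content when it opens and saves the document, so other consumers may not show it.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class AltChunkMerge
{
    // Hypothetical helper: embeds the file at sourcePath into the document at
    // targetPath as an altChunk placed at the end of the body content.
    // altChunkId must be unique within the target, e.g. "AltChunkId1".
    public static void AppendAsAltChunk(string targetPath, string sourcePath, string altChunkId)
    {
        using (WordprocessingDocument target = WordprocessingDocument.Open(targetPath, true))
        {
            MainDocumentPart mainPart = target.MainDocumentPart;

            // Store the whole source .docx as an alternative-format import part.
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            using (FileStream source = File.OpenRead(sourcePath))
            {
                chunk.FeedData(source);
            }

            // Reference the part from the body. The section properties, if present,
            // must remain the last child of the body, so insert before them.
            var altChunk = new AltChunk { Id = altChunkId };
            Body body = mainPart.Document.Body;
            SectionProperties sectPr = body.Elements<SectionProperties>().LastOrDefault();
            if (sectPr != null)
            {
                body.InsertBefore(altChunk, sectPr);
            }
            else
            {
                body.AppendChild(altChunk);
            }

            mainPart.Document.Save();
        }
    }
}
```

To append several files, call it once per source document with a different ID each time, for example "AltChunkId1", "AltChunkId2" and so on.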
-
Hi,
I made an application in C# to merge RTF files into one document. I am hopeful it will work for DOC and DOCX files as well.
// Fragment: outputFileName, defaultWordDocumentTemplate, pageCounter and filestoDelete
// are defined elsewhere. "Word" is presumably an alias for Microsoft.Office.Interop.Word.
// The original left wordApp unassigned; it is created here so the fragment can compile.
Word._Application wordApp = new Word.Application();
Word._Document wordDoc = null;
object outputFile = outputFileName;
object missing = System.Type.Missing;
object vk_false = false;
object defaultTemplate = defaultWordDocumentTemplate;
object pageBreak = Word.WdBreakType.wdPageBreak;

string[] filesToMerge = new string[pageCounter];
filestoDelete = new string[pageCounter];
for (int i = 0; i < pageCounter; i++)
{
    filesToMerge[i] = @"C:\temp\temp" + i.ToString() + ".rtf";
    filestoDelete[i] = @"C:\temp\temp" + i.ToString() + ".rtf";
}

try
{
    wordDoc = wordApp.Documents.Add(ref missing, ref missing, ref missing, ref missing);
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}

// Insert each file at the selection, followed by a page break.
Word.Selection selection = wordApp.Selection;
foreach (string file in filesToMerge)
{
    selection.InsertFile(file, ref missing, ref missing, ref missing, ref missing);
    selection.InsertBreak(ref pageBreak);
}

wordDoc.SaveAs(ref outputFile, ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing, ref missing);
Hope this helps!
-
Hi,
In spite of all the good suggestions and solutions above, I developed a different one. In my opinion, avoiding Word in server applications is a must, so I worked with OpenXML. However, I did not use AltChunk; I added the content to the original body instead. My method receives a list of byte[] instead of a list of file names, but you can easily change the code.
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace OfficeMergeControl
{
    // Note: OfficeMergeControlException is a custom exception type that is not shown here.
    public class CombineDocs
    {
        public byte[] OpenAndCombine(IList<byte[]> documents)
        {
            // Copy the first document into an expandable stream; it becomes the merge target.
            MemoryStream mainStream = new MemoryStream();
            mainStream.Write(documents[0], 0, documents[0].Length);
            mainStream.Position = 0;

            int pointer = 1;
            byte[] ret;
            try
            {
                using (WordprocessingDocument mainDocument = WordprocessingDocument.Open(mainStream, true))
                {
                    XElement newBody = XElement.Parse(mainDocument.MainDocumentPart.Document.Body.OuterXml);

                    for (pointer = 1; pointer < documents.Count; pointer++)
                    {
                        // Open each additional document read-only and dispose it when done.
                        using (WordprocessingDocument tempDocument =
                            WordprocessingDocument.Open(new MemoryStream(documents[pointer]), false))
                        {
                            XElement tempBody = XElement.Parse(tempDocument.MainDocumentPart.Document.Body.OuterXml);

                            // This adds the appended document's whole body element to the main body.
                            newBody.Add(tempBody);
                            mainDocument.MainDocumentPart.Document.Body = new Body(newBody.ToString());
                            mainDocument.MainDocumentPart.Document.Save();
                            mainDocument.Package.Flush();
                        }
                    }
                }
            }
            catch (OpenXmlPackageException oxmle)
            {
                throw new OfficeMergeControlException(
                    string.Format(CultureInfo.CurrentCulture, "Error while merging files. Document index {0}", pointer), oxmle);
            }
            catch (Exception e)
            {
                throw new OfficeMergeControlException(
                    string.Format(CultureInfo.CurrentCulture, "Error while merging files. Document index {0}", pointer), e);
            }
            finally
            {
                ret = mainStream.ToArray();
                mainStream.Close();
                mainStream.Dispose();
            }
            return (ret);
        }
    }
}
I hope it helps.
MadBoy : Nice, thanks for this.
MadBoy : Does it also add page breaks?
GRGodoi : Hi MadBoy, I checked, and it preserves the original page breaks and adds new page breaks when needed.
MadBoy : Would this work with images, headers and footers (different headers, footers and images in each document)?