  • Category: ASP.NET

    Convert pdf to excel using C#

    Hi

    I need to convert a PDF file into an Excel file in C#. Please let me know how to achieve this.

    I don't want to use any paid third-party DLL for this. Is it feasible?


    Thanks!
    Best Regards,
    Anjali Bansal
  • #767855
    Hi,

    I have tried the below solution once and it worked for me.

    http://www.codeproject.com/Questions/660626/convert-pdf-file-into-excel-sheet

    Kindly check this out.

    Note:
    Most of the answers there use a third-party tool, which is simple and also free.

    Thanks,
    Mani

  • #767865
    Hi,

    First, extract the text from the PDF with iTextSharp's PdfTextExtractor.GetTextFromPage method into an object such as a StringBuilder, as shown below:


    using System.Text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    StringBuilder text = new StringBuilder();
    PdfReader pdfReader = new PdfReader(fileName);
    try
    {
        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            // Use a fresh strategy for each page so text from earlier pages is not carried over.
            ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
            // GetTextFromPage already returns a .NET string, so no encoding conversion is needed.
            string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
            text.AppendLine(currentText);
        }
    }
    finally
    {
        // Close the reader once, after all pages have been read (not inside the loop).
        pdfReader.Close();
    }


    Once the text is in the StringBuilder, you can export it to Excel as usual. See the link below for more details.

    aspforums.net/Threads/180443/Convert-a-PDF-File-to-Excel-File-using-iTextSharp-using-C-Net/
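
    As a rough illustration of the export step, here is a minimal sketch that turns extracted text into a CSV file, which Excel can open, using only the .NET base class library. It is an assumption on my part, not the method from the linked thread: the class name TextToCsv is hypothetical, and it assumes that columns in the extracted text are separated by a tab or by two or more spaces, which will not hold for every PDF.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;

    // Hypothetical helper: converts plain extracted text into CSV.
    public static class TextToCsv
    {
        // Assumption: a column boundary is one or more tabs, or two or more spaces.
        private static readonly Regex ColumnGap = new Regex(@"\t+| {2,}");

        // Quote a field when it contains a comma, a double quote, or a line break.
        public static string EscapeField(string field)
        {
            if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
                return field;
            return "\"" + field.Replace("\"", "\"\"") + "\"";
        }

        // Convert each non-empty line of text into one CSV row.
        public static string Convert(string extractedText)
        {
            var csv = new StringBuilder();
            string[] lines = extractedText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
            foreach (string line in lines)
            {
                if (string.IsNullOrWhiteSpace(line))
                    continue; // skip blank lines between paragraphs or pages

                IEnumerable<string> cells = ColumnGap.Split(line.Trim()).Select(EscapeField);
                csv.Append(string.Join(",", cells)).Append("\r\n");
            }
            return csv.ToString();
        }
    }
    ```

    With the StringBuilder from the snippet above, a call such as File.WriteAllText("output.csv", TextToCsv.Convert(text.ToString()), Encoding.UTF8) would write the file. CSV is used here only because it needs no extra library; a native .xls or .xlsx file would need a spreadsheet library or Office interop.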

    Hope this helps you...

    --------------------------------------------------------------------------------
    Give respect to your work, Instead of trying to impress your boss.

    N@veen
    Blog : http://naveens-dotnet.blogspot.in/

  • #768417
    Try this.

    Add a reference to the "SautinSoft.PdfFocus.dll".

    using System;
    using System.IO;
    using SautinSoft;

    namespace Sample
    {
        class Sample
        {
            static void Main(string[] args)
            {
                string pathToPdf = @"e:\test.pdf";
                string pathToExcel = Path.ChangeExtension(pathToPdf, ".xls");

                SautinSoft.PdfFocus pdffocus = new SautinSoft.PdfFocus();

                pdffocus.ExcelOptions.ConvertNonTabularDataToSpreadsheet = true;

                // 'true' = Preserve original page layout.
                // 'false' = Place tables before text.
                pdffocus.ExcelOptions.PreservePageLayout = true;

                pdffocus.OpenPdf(pathToPdf);

                if (pdffocus.PageCount > 0)
                {
                    int result = pdffocus.ToExcel(pathToExcel);

                    // Open the produced Excel workbook
                    if (result == 0)
                    {
                        System.Diagnostics.Process.Start(pathToExcel);
                    }
                }
            }
        }
    }

    Software Developer
    iFour Technolab Pvt. Ltd.

  • #768418
    Try this.



    protected void HtmlToPdf_Click(object sender, EventArgs e)
    {
        // create the HTML to PDF converter
        HtmlToPdf htmlToPdfConverter = new HtmlToPdf();

        // set browser width
        htmlToPdfConverter.BrowserWidth = int.Parse(textBoxBrowserWidth.Text);

        // set browser height if specified, otherwise use the default
        if (textBoxBrowserHeight.Text.Length > 0)
            htmlToPdfConverter.BrowserHeight = int.Parse(textBoxBrowserHeight.Text);

        // set HTML load timeout
        htmlToPdfConverter.HtmlLoadedTimeout = int.Parse(textBoxLoadHtmlTimeout.Text);

        // set PDF page size and orientation
        htmlToPdfConverter.Document.PageSize = GetSelectedPageSize();
        htmlToPdfConverter.Document.PageOrientation = GetSelectedPageOrientation();

        // set PDF page margins
        htmlToPdfConverter.Document.Margins = new PdfMargins(0);

        // set a wait time before starting the conversion
        htmlToPdfConverter.WaitBeforeConvert = int.Parse(textBoxWaitTime.Text);

        // convert HTML to PDF
        byte[] pdfBuffer = null;

        if (radioButtonConvertUrl.Checked)
        {
            // convert URL to a PDF memory buffer
            string url = textBoxUrl.Text;

            pdfBuffer = htmlToPdfConverter.ConvertUrlToMemory(url);
        }
        else
        {
            // convert HTML code
            string htmlCode = textBoxHtmlCode.Text;
            string baseUrl = textBoxBaseUrl.Text;

            // convert HTML code to a PDF memory buffer
            pdfBuffer = htmlToPdfConverter.ConvertHtmlToMemory(htmlCode, baseUrl);
        }

        // inform the browser about the binary data format
        HttpContext.Current.Response.AddHeader("Content-Type", "application/pdf");

        // tell the browser whether to open the PDF inline or as an attachment, and set the file name
        HttpContext.Current.Response.AddHeader("Content-Disposition",
            String.Format("{0}; filename=HtmlToPdf.pdf; size={1}",
                checkBoxOpenInline.Checked ? "inline" : "attachment",
                pdfBuffer.Length.ToString()));

        // write the PDF buffer to HTTP response
        HttpContext.Current.Response.BinaryWrite(pdfBuffer);

        // call End() method of HTTP response to stop ASP.NET page processing
        HttpContext.Current.Response.End();
    }

    Software Developer
    iFour Technolab Pvt. Ltd.

