UDF to return figures from MS SQL Server 2k5

May 24, 2011 at 3:32 PM

Hello

I am new to Excel-DNA and I think this is amazing. Thanks to the team who created Excel-DNA. I am just a newbie to excel development (I'm an accountant with IT background). I created the below function in VS2010 then compiled to dll and combined with execl-dna.xll.

This function works well but it seems to have performance issue when I use it in my worksheet which would include about 100 cells using this function concurrently. It took 15 minutes to finish the calculation. Could you please advise what can be improved with the below function.

Many thanks, Duan.

Public Shared Function GL_ACTUAL(ByVal AccMask As String, ByVal CCMask As String, ByVal FrmDte As Date, ByVal ToDte As Date) As Double
        Dim sConnect As String = ""
        Dim strSQL As String
        Dim Actual As Double = 0
        'Dim svrPar As String
        '
        On Error GoTo SQLERROR
        Dim tmpAccMask As String = Replace(AccMask, "*", "%")
        Dim tmpCCMask As String = Replace(CCMask, "*", "%")
        'Connection string
       
        '
        sConnect = "Data Source= x.x.x.x;Initial Catalog =xxxxx;Uid=xxxx;Pwd=xxxx;"
        'create strSQL

        strSQL = "SELECT SUM(GL06004) "
        strSQL = strSQL & "FROM GL0601" & Format(FrmDte, "yy") & " "
        strSQL = strSQL & "WHERE GL06003>='" & FrmDte & "' AND GL06003<='" & ToDte
        strSQL = strSQL & "' And GL06012 Not In ('/','U','V','W','X','Y','Z','\','a','c','d','e','f','g','h')"
        strSQL = strSQL & "AND GL06001 LIKE '" & String.Concat(tmpAccMask, tmpCCMask) & "'"

        ' Initialize database connection
        Dim cnScala As New SqlConnection(connectionString:=sConnect)
        Dim cmdScala As New SqlCommand(cmdText:=strSQL, connection:=cnScala)

        'Open the connection
        cnScala.Open()

        Actual = cmdScala.ExecuteScalar

        'Reset the connection
        cmdScala.Dispose()
        cmdScala = Nothing
        cnScala.Dispose()
        cnScala.Close()
        cnScala = Nothing
        Return Actual
SQLERROR:
        Return Actual
    End Function

Coordinator
May 24, 2011 at 6:45 PM

Hi Duan,

I have something similar, where I call SQL Server stored procedures from UDF functions. The SPs are optimised to individually execute very quickly, and the performance is usually OK, certainly usable for an interactive user making reporting with some hundreds of .function calls without using any fancy tricks.

 

You need to distinguish between the time each SQL Server query takes, and the round-trip time for making and processing the data. Out of the 15 minutes your 100 calls take, I guess 14:30 is purely due to the query? You could test this by changing the query to be something like "SELECT 1+2" and testing 100 calls. I think you'll have to optimize your database indices or your queries a bit.

I don't think there's a magic fix for making the Excel side faster, though you could try to do less in each function call by:

* Keeping a connection, and prepared command as Shared fields in the Class, and initializing these only once.

* Use a stored procedure rather than the SQL, so that the query string need not be processed every time, 

* or at least use a parameterised SQL query, set up all the parameters only once, and in the UDF call only bind the parameter values and call the function. This will also prevent SQL server from processing the query from scratch every time.

* You could make an function that returns a whole block of data to Excel, rather than have single functions.

You might also want to consider caching the results of these calls, so that only the first time the function is called it actually goes to the database.

Then there was a more advanced discussion about asynchronous approached and multi-threading for helping with this scenario - see this thread on the Google group: http://groups.google.com/group/exceldna/browse_frm/thread/4c1a16f6cc7e813e.

(If you're using an in-process Access database (i just noticed the tmpAccMask stuff) there are some tricks to get high-performance index lookups so that you can do about 20000 lookup functions per second.)

If you find anything interesting, let us know.

Regards,

Govert

May 25, 2011 at 1:34 PM

Hi Govert

Thank you very much for the good tips. Functions that return whole block of data is a good idea. I will try it out with the store procedure as well. Appreciate your reply.

Best regards

Duan

May 25, 2011 at 2:35 PM

Hi Govert

I was too exciting about the idea of function returns a whole block of data...but on the second thought I could not figure out what kind of function could do that. Normally when you input a function into a cell, it should return a value into that cell only. I need your guide again on the idea of creating this type of function.

Thanks and regards

Duan

Coordinator
May 25, 2011 at 5:01 PM
Edited May 25, 2011 at 5:02 PM

Hi Duan,

I think the best approach is to make a macro that dumps the data instead of a function. So you put the parameters on a sheet (perhaps the connection string too) and then press a button which does the query and creates a result sheet with the data, or adds the data to the query sheet.

 

If you want to do it with a worksheet function, you might need to read about array formulae (http://www.cpearson.com/excel/ArrayFormulas.aspx and http://www.decisionmodels.com/optspeedj.htm).

For these you declare the function as returning a two-dimensional array, e.g.

   Public Shared Function GetStuff() As Object(,)

      ...

   End Function

There's a C# example that shows how you could automatically resize the formula range in Distribution\Samples\ArrayResizer.dna. But that can be a bit tricky.

Regards,

Govert