Friday, February 4, 2011

SQL Server, nvarchar(MAX) or ntext, image or varbinary?

When should I choose one or the other? What are the implications regarding space and (full-text) indexing?

BTW: I'm currently using SQL Server 2005 planing to upgrade to 2008 in the following months.

Thanks

  • Once you put it in the blob, it's going to be difficult to be used for normal SQL comparison. See Using Large-Value Data Types.

    From eed3si9n
  • The new (max) fields make it a lot easier to deal with the data from .NET code. With varbinary(max), you simply set the value of a SqlParameter to a byte array and you are done. WIth the image field, you need to write a few hundred lines of code to stream the data into and out of the field.

    Also, the image/text fields are deprecated in favor of varbinary(max) and varchar(max), and future versions of Sql Server will discontinue support for them.

cfqueryparam with like operator in coldfusion

I have been tasked with going through a number of ColdFusion sites that have recently been the subject of a rather nasty SQL Injection attack. Basically my work involves adding <cfqueryparam> tags to all of the inline sql. For the most part I've got it down, but can anybody tell me how to use cfqueryparam with the LIKE operator?

If my query looks like this:

select * from Foo where name like '%Bob%'

what should my <cfqueryparam> tag look like?

  • select * from Foo where name like '%' + <cfqueryPARAM... > + '%'
    
  • @Joel, I have to disagree.

    select a,b,c
    from Foo
    where name like <cfqueryparam cfsqltype="columnType" value="%#variables.someName#%" />
    

    1) Never suggest to someone that they should "select star." Bad form! Even for an example! (Even copied from the question!)

    2) The query is pre-compiled and you should include the wild card character(s) as part of the parameter being passed to the query. This format is more readable and will run more efficiently.

    3) When doing string concatenation, use the ampersand operator (&), not the plus sign. Technically, in most cases, plus will work just fine... until you throw a NumberFormat() in the middle of the string and start wondering why you're being told that you're not passing a valid number when you've checked and you are.

    Joel Coehoorn : Agree on not selecting *, but I was just matching the sample query in the original question.
  • I can't vote yet but would lean towards Adam's answer.

    From Sam Farmer

Is there an event that triggers if the number of ListViewItems in a ListView changes? (Windows Forms)

I'd like to enable/disable some other controls based on how many items are in my ListView control. I can't find any event that would do this, either on the ListView itself or on the ListViewItemCollection. Maybe there's a way to generically watch any collection in C# for changes?

I'd be happy with other events too, even ones that sometimes fire when the items don't change, but for example the ControlAdded and Layout events didn't work :(.

  • I can't find any events that you could use. Perhaps you could subclass ListViewItemCollection, and raise your own event when something is added, with code similar to this.

    Public Class MyListViewItemCollection
        Inherits ListView.ListViewItemCollection
    
        Public Event ItemAdded(ByVal Item As ListViewItem)
    
        Sub New(ByVal owner As ListView)
            MyBase.New(owner)
        End Sub
    
        Public Overrides Function Add(ByVal value As System.Windows.Forms.ListViewItem) As System.Windows.Forms.ListViewItem
            Dim Item As ListViewItem
    
            Item = MyBase.Add(value)
    
            RaiseEvent ItemAdded(Item)
    
            Return Item
        End Function
    End Class
    
    From Kibbee
  • @ Kibbee: OK, but then how would I get my ListView to use that collection internally?

    From Domenic
  • I think the best thing that you can do here is to subclass ListView and provide the events that you want.

    From John
  • @Domenic

    Not too sure, Never quite got that far in the thought process.

    Another solution might be to extend ListView, and when adding and removing stuff, instead of calling .items.add, and items.remove, you call your other functions. It would still be possible to add and remove without events being raised, but with a little code review to make sure .items.add and .items.remove weren't called directly, it could work out quite well. Here's a little example. I only showed 1 Add function, but there are 6 you would have to implement, if you wanted to have use of all the available add functions. There's also .AddRange, and .Clear that you might want to take a look at.

    Public Class MonitoredListView
        Inherits ListView
    
        Public Event ItemAdded()
        Public Event ItemRemoved()
    
        Public Sub New()
            MyBase.New()
        End Sub
    
        Public Function AddItem(ByVal Text As String) As ListViewItem
            RaiseEvent ItemAdded()
    
            MyBase.Items.Add(Text)
        End Function
    
        Public Sub RemoveItem(ByVal Item As ListViewItem)
            RaiseEvent ItemRemoved()
    
            MyBase.Items.Remove(Item)
        End Sub
    
    End Class
    
    From Kibbee

Do you know of any "best practice" or "what works" vi tutorial for programmers?

There are thousands of vi tutorials on the web, most of them generically listing all the commands. There are even videos on youtube which show basic functionality. But does anyone know of a vi tutorial which focuses on the needs of programmers? For example when I program in Perl with vi, moving to the "next paragraph" is meaningless. I want to know which commands seasoned vi users combine to e.g. copy everything inside of parentheses, or copy a functio, or copy and paste a variable (e.g. 2yw), etc. I am sure there are lots of functions using multiple-file capability, and the maps, macros, reading in of files for template code, regular expression search, jumping to functions, perhaps minimal code completion, or other features that emulate what programmers have gotten used to in Visual Studio and Eclipse, etc.

  • A nice collection of vimtips.

    And the best Vim cheatsheet around.

    From cschol
  • Try a search on 'vim'

  • I just ended up reading the vim manual a few times, over the years, picking up useful features on each iteration.

    One thing that really made vim work for me as a perl IDE was starting to use tags, as explained here: http://www.vim.org/tips/tip.php?tip_id=94. Using the pltags script that ships with vim, you can jump around between modules to find your functions, methods, etc.

    From zigdon
  • If you are a beginner, vimtutor would be a good way to start with. (Type vimtutor on your shell and get going).

    And once you get hold of the basics of vim, you can look around web and figure out things for yourself.

    This and this may be an interesting read.

    From Jagmal

LINQ to SQL Mapping From Money to Double

I'm working with LINQ for the first time and wanted to get the Mapping to work when I have a money type in SQL, but my domain object property is of type double. How can I express this in the XML file, or in code so that the mapping does not throw the usual "invalid cast" exception?

  • pretty sure Money maps to System.Decimal

    Check here

  • Correct! This now works, but what about something like smallint to integer?

  • In the DBML XML file, you can set the Expression attribute of a Column element to something like this:

     <Column Name="Table1.Amount" DbType="smallint" Type="System.Int32" 
             Expression="CAST(Table1.Amount as int)" />
    
  • Excellent! Thanks again for the help!

  • Slightly off topic, but this is something everyone should know when working with the Money type in SQLServer.

    You don't want to use a double, you want to use a decimal.

    A Double is a long float, and floating point arithmetic should never be used for financial calculations.

    Think of this, the fractions 1/3, 1/3 and 1/3 equal 1. However, when expressed as a double ie:

    .3333 + .3333 + .3333 = .9999 not 1.

    You may think that losing a thousandth of a cent is trivial, but its not when your working with someone elses money.

    Use System.Decimal.

    From FlySwat

Javascript and CSS parsing performance

I am trying to improve the performance of a web application. I have metrics that I can use to optimise the time taken to return the main HTML page, but I'm concerned about the external CSS and Javascript files that are included from these HTML pages. These are served statically, with HTTP Expires headers, but are shared between all the pages of the application.

I'm concerned that the browser has to parse these CSS and Javascript files for each page that is displayed and so having all the CSS and Javascript for the site shared into common files will negatively affect performance. Should I be trying to split out these files so I link from each page to only the CSS and Javascript needed for that page, or would I get little return for my efforts?

Are there any tools that could help me generate metrics for this?

  • I believe YSlow does, but be aware that unless all requests are over a loopback connection you shouldn't worry. The HTTP overhead of split-up files will impact performance far more than parsing, unless your CSS/JS files exceed several megabytes.

  • Context: While it's true that HTTP overhead is more significant than parsing JS and CSS, ignoring the impact of parsing on browser performance (even if you have less than a meg of JS) is a good way to get yourself in trouble.

    YSlow, Fiddler, and Firebug are not the best tools to monitor parsing speed. Unless they've been updated very recently, they don't separate the amount of time required to fetch JS over HTTP or load from cache versus the amount of time spent parsing the actual JS payload.

    Parse speed is slightly difficult to measure, but we've chased this metric a number of times on projects I've worked on and the impact on pageloads were significant even with ~500k of JS. Obviously the older browsers suffer the most...hopefully Chrome, TraceMonkey and the like help resolve this situation.

    Suggestion: Depending on the type of traffic you have at your site, it may be well worth your while to split up your JS payload so some large chunks of JS that will never be used on a the most popular pages are never sent down to the client. Of course, this means that when a new client hits a page where this JS is needed, you'll have to send it over the wire.

    However, it may well be the case that, say, 50% of your JS is never needed by 80% of your users due to your traffic patterns. If this is so, you should definitely user smaller, packaged JS payloads only on pages where the JS is necessary. Otherwise 80% of your users will suffer unnecessary JS parsing penalties on every single pageload.

    Bottom Line: It's difficult to find the proper balance of JS caching and smaller, packaged payloads, but depending on your traffic pattern it's definitely well worth considering a technique other than smashing all of your JS into every single pageload.

    From kamens
  • To add to kamen's great answer, I would say that on some browsers, the parse time for larger js resources grows non-linearly. That is, a 1 meg JS file will take longer to parse than two 500k files. So if a lot of your traffic is people who are likely to have your JS cached (return visitors), and all your JS files are cache-stable, it may make sense to break them up even if you end up loading all of them on every pageview.

    From levik

File format for generating dynamic reports in applications

We generate dynamic reports in all of our business web applications written for .Net and J2EE. On the server side we use ActiveReports.Net and JasperReports to generate the reports. We then export them to PDF to send down to the browser.

Our clients all use Adobe Reader. We have endless problems with the different versions of Adobe Reader and how they are setup on the client.

What file format/readers are others using for their dynamic reports? We need something that allows for precise layout as many of the reports are forms that are printed with data from out systems. HTML is not expressive enough.

  • I've used SQL Reporting Services for this purpose. You can design a report template in Visual Studio or generate the XML for the report on the fly in code. You can then have SSRS export the report to about 10 different formats and send to the client including pdf, excel, html, etc. You can also write your own plugin to export to your own format.

    Crystal Reports has a similar product thats more expensive but has a better report designer.

  • I've always had the most success using PDFs to accomplish this. I can't think of a more universally acceptable format that does what you are trying to do. Rather than looking for another format, perhaps it would be better to try to understand how to overcome the problems that you are experiencing with Acrobat on the client side. Can you provide some more information on the types of problems that you are experiencing with Acrobat?

    From Joe Barone
  • Please keep in mind this is a business intranet application. We require that PDFs be opened in the browser so we know when they navigate away to ask them if the report printed correctly or if they would like to save the report on the server for later viewing/printing. This is a client setting that we have no control over. Also several different version combinations of IE and Reader do not play well together. Sometimes the pdf won't even open depending on client settings. But if you copy the pdf out of the temporary internet files and open directly, it works fine. Maybe this is more of an IE thing, but we are stuck with that problem.

  • I does know only 3(4) possible viewer(formats) for reporting in browser.

    • PDF
    • Flash
    • Java
    • (Silverlihgt)

    For all 3 there are reporting solutions. Silverlight are to new and I does not know a solution. You can test how flash and Java in your intranet work and then search a reporting solution. I think PDF should be made the few problems if you use the newest readers. The old readers has many bad bugs.

    From Horcrux7

Is there a way to ensure entire code block execution in a .NET thread?

In my C# program, I have a thread that represents a running test, which can be aborted by a click on a button labeled "Interrupt execution". In order for the thread (and therefore the test) to terminate in an elegant manner (and do some vital work beforehand), this button is enabled only in some well-defined moments, in which I catch ThreadAbortedException, do Thread.ResetAbort() and die beautifully (that is, the thread).

The problem is that, in the time window in which aborting is possible, there are some tasks that need to be done from start to finish once initiated, and, so, I fear TAE. Locks don't provide a solution for this, and, although finally blocks do, I don't find it elegant to wrap important code in the following manner:

try {
} finally {
  // vital code
}

However, I didn't find any other solution.

Is this another way to delay the interference of ThreadAbortException until the end of the block?

  • The situations under which a ThreadAbortException will be generated can be complex. A better option might be to ask why you are using ThreadAbortExceptions at all.

    A better pattern would be to simply use a flag to indicate to the running thread that it should abort, and then test the flag regularly while it is working. If it is set you can throw an exception at that point if you want.

    With this pattern you are totally in control of when the thread will act on the abort, and don't have to worry about the critical operations being interrupted.

    From Rob Walker
  • Use Thread.BeginCriticalRegion()

    Notifies a host that execution is about to enter a region of code in which the effects of a thread abort or unhandled exception might jeopardize other tasks in the application domain.

    Thread.BeginCriticalRegion()
    
    //do important stuff here
    
    Thread.EndCriticalRegion()
    
    André Neves : Although BeginCriticalRegion() might, in a way, ensure code execution, because your application could halt otherwise, I wanted a device which disallowed ThreadAbortExceptions from interrupting my code.
    From Corey

Why do ruby setters need "self." qualification within the class?

Ruby setters -- whether created by (c)attr_accessor or manually -- seem to be the only methods that need "self." qualification when accessed within the class itself. This seems to put Ruby alone the world of languages:

  • all methods need self/this (like perl, and I think Javascript)
  • no methods require self/this is (C#, Java)
  • only setters need self/this (ruby??)

The best comparison is C# vs ruby, because both languages support accessor methods which work syntactically just like class instance variables: foo.x = y, y = foo.x . C# calls them properties.

Here's a simple example; the same program in ruby then C#

#!/usr/bin/ruby

class A
  def qwerty; @q; end                   # manual getter
  def qwerty=(value); @q = value; end   # manual setter, but attr_accessor is same 
  def asdf; self.qwerty = 4; end        # "self." is necessary in ruby?
  def xxx; asdf; end                    # we can invoke nonsetters w/o "self."
  def dump; puts "qwerty = #{qwerty}"; end
end

a = A.new
a.xxx
a.dump

take away the self.qwerty =() and it fails (ruby 1.8.6 on linux & osx). Now C#,

using System;

public class A {
  public A() {}
  int q;
  public int qwerty {
    get { return q; }
    set { q = value; }
  }
  public void asdf() { qwerty = 4; } // C# setters work w/o "this."
  public void xxx()  { asdf(); }     // are just like other methods
  public void dump() { Console.WriteLine("qwerty = {0}", qwerty); }
}

public class Test {
  public static void Main() {
    A a = new A();
    a.xxx();
    a.dump();
  }
}

Question: Is this true? Are there other occasions besides setters where self is necessary?

UPDATE:

Thanks all for the feedback. First let me be more precise about the concluding question.

Question at Bottom: Are there other occasions where a ruby method cannot be invoked without self?

I agree, there are lots of cases where self becomes necessary. This is not unique to ruby, just to be clear:

using System;

public class A {
  public A() {}
  public int test { get { return 4; }}
  public int useVariable() {
    int test = 5;
    return test;
  }
  public int useMethod() {
    int test = 5;
    return this.test;
  }
}

public class Test {
  public static void Main() {
    A a = new A();
    Console.WriteLine("{0}", a.useVariable()); // prints 5
    Console.WriteLine("{0}", a.useMethod());   // prints 4
  }
}

Same ambiguity is resolved in same way. But while subtle I'm asking about the case where

  • A method has been defined, and
  • No local variable has been defined, and

we encounter

qwerty = 4

Ambiguity: Is this a method invocation or an new local variable assignment?

The title question as to why ruby always treats this as an assignment, is perhaps best answered by ben. Let me paraphrase

Summary: The parser could treat "symbol =" as an lvalue and dynamically decide between assignment and invocation. The dynamic nature of ruby means every assignment potentially faces this ambiguity, so in the interest of performance, ruby treats this as assignment always. C# benefits from knowing what all the methods are, and treats this case the opposite way (as a method invocation).

  • Well, I think the reason this is the case is because "qwerty = 4" is ambiguous... are you defining a new variable called "qwerty" or calling the setter? Ruby resolves this ambiguity by saying it will create a new variable, thus the "self." is required.

    Here is another case where you need "self.":

    class A
      def test
        4
      end
      def use_variable
        test = 5
        test
      end
      def use_method
        test = 5
        self.test
      end
    end
    a = A.new
    a.use_variable # returns 5
    a.use_method   # returns 4
    

    As you can see, the access to "test" is ambiguous, so the "self." is required.

    EDIT: Also, this is why the C# example is actually not a good comparison, because you define variables in a way that is unambiguous from using the setter... if you had defined a variable in C# that was the same name as the accessor, you would need to qualify calls to the accessor with "this." just like the ruby case.

    From Mike Stone
  • The important thing to remember here is that Ruby methods can be (un)defined at any point, so to intelligently resolve the ambiguity, every assignment would need to run code to check whether there is a method with the assigned-to name at the time of assignment.

    From ben
  • @Purfideas

    My point before about C# not being a good comparison was that it is a completely different case because "qwerty = 4" is UNAMBIGUOUS in C# when there is no variable defined... it has nothing to do with knowing what all the methods are. That simply is not how you define a variable, whereas in ruby that expression alone IS ambiguous (as both variable definition and method invocation).

    I seriously doubt it has anything to do with performance... because consider this for a moment: let's say you WANT to define a variable that has the same name as a setter... how would you do this syntactically if "variable=" always invoked a method, if there is one? The answer is you couldn't unless you introduced a new language construct. However, with how the language ACTUALLY works, there already is a way to both define a variable and invoke the setter. With this consideration in mind, it seems a no-brainer to me to have the ambiguity be resolved by creating the variable... and this isn't even considering the fact that Ruby just was not designed for performance.

    "Why do ruby setters need “self.” qualification within the class?" Because of the ambiguity in the language of variable creation vs method invocation.... C# doesn't have this issue because this ambiguity just does NOT exist... the self/this is required in both languages when the ambiguity exists in both languages. (this is what I was trying to point out before)

    From Mike Stone
  • In your particular test case, there does not appear to be any reason not to use the @ syntax from within the class. Your setter isn't performing any data validation, so there's nothing lost by referring to it with the instance variable syntax.

    You do have to use self for those cases where you'd like to use a data-checking setter, but I have run into very few circumstances where I needed to use a data-checking version of a method from within my object. My checks are typically put in place to prevent invalid values supplied when external objects are using my object's interface.

  • @Mike Stone

    Hi! I understand and appreciate the points you've made and your example was great. Believe me when I say, if I had enough reputation, I'd vote up your response. Yet we still disagree:

    • on a matter of semantics, and
    • on a central point of fact

    First I claim, not without irony, we're having a semantic debate about the meaning of 'ambiguity'.

    When it comes to parsing and programming language semantics (the subject of this question), surely you would admit a broad spectrum of the notion 'ambiguity'. Let's just adopt some random notation:

    1. ambiguous: lexical ambiguity (lex must 'look ahead')
    2. Ambiguous: grammatical ambiguity (yacc must defer to parse-tree analysis)
    3. AMBIGUOUS: ambiguity knowing everything at the moment of execution

    (and there's junk between 2-3 too). All these categories are resolved by gathering more contextual info, looking more and more globally. So when you say,

    "qwerty = 4" is UNAMBIGUOUS in C# when there is no variable defined...

    I couldn't agree more. But by the same token, I'm saying

    "qwerty = 4" is un-Ambiguous in ruby (as it now exists)

    "qwerty = 4" is Ambiguous in C#

    And we're not yet contradicting each other. Finally, here's where we really disagree: Either ruby could or could not be implemented without any further language constructs such that,

    For "qwerty = 4," ruby UNAMBIGUOUSLY invokes an existing setter if there
    is no local variable defined

    You say no. I say yes; another ruby could exist which behaves exactly like the current in every respect, except "qwerty = 4" defines a new variable when no setter and no local exists, it invokes the setter if one exists, and it assigns to the local if one exists. I fully accept that I could be wrong. In fact, a reason why I might be wrong would be interesting.

    Let me explain.

    Imagine you are writing a new OO language with accessor methods looking like instances vars (like ruby & C#). You'd probably start with conceptual grammars something like:

      var = expr    // assignment
      method = expr // setter method invocation
    

    But the parser-compiler (not even the runtime) will puke, because even after all the input is grokked there's no way to know which grammar is pertinent. You're faced which a classic choice. I can't be sure of the details, but basically ruby does this:

      var = expr    // assignment (new or existing)
      // method = expr, disallow setter method invocation without .
    

    that is why it's un-Ambiguous, while and C# does this:

      symbol = expr // push 'symbol=' onto parse tree and decide later
                    // if local variable is def'd somewhere in scope: assignment
                    // else if a setter is def'd in scope: invocation
    

    For C#, 'later' is still at compile time.

    I'm sure ruby could do the same, but 'later' would have to be at runtime, because as ben points out you don't know until the statement is executed which case applies.

    My question was never intended to mean "do I really need the 'self.'?" or "what potential ambiguity is being avoided?" Rather I wanted to know why was this particular choice made? Maybe it's not performance. Maybe it just got the job done, or it was considered best to always allow a 1-liner local to override a method (a pretty rare case requirement) ...

    But I'm sort of suggesting that the most dynamical language might be the one which postpones this decision the longest, and chooses semantics based on the most contextual info: so if you have no local and you defined a setter, it would use the setter. Isn't this why we like ruby, smalltalk, objc, because method invocation is decided at runtime, offering maximum expressiveness?

    From Purfideas

How do you prevent SQL injection in LAMP applications?

Here are a few possibilities to get the conversation started:

  1. Escape all input upon initialization.
  2. Escape each value, preferably when generating the SQL.

The first solution is suboptimal, because you then need to unescape each value if you want to use it in anything other than SQL, like outputting it on a web page.

The second solution makes much more sense, but manually escaping each value is a pain.

I'm aware of prepared statements, however I find MySQLi cumbersome. Also, separating the query from the inputs concerns me, because although it's crucial to get the order correct it's easy to make a mistake, and thus write the wrong data to the wrong fields.

  • Prepared statements are the best answer. You have testing because you can make mistakes!

    See this question.

    From Rob Walker
  • @Jonathan Holland: This question is specifically about MySQL. I appreciate your sentiment, but it's not always possible to switch databases.

    However, if I were to switch databases, which would you suggest, and why? What interface does your preferred database use, and what's your opinion of it?

  • I've always used the first solution because 99% of the time, variables in $_GET, $_POST, and $_COOKIE are never outputted to the browser. You also won't ever mistakenly write code with an SQL injection (unless you don't use quotes in the query), whereas with the second solution you could easily forget to escape one of your strings eventually.

    Actually, the reason I've always done it that way was because all my sites had the magic_quotes setting on by default, and once you've written a lot of code using one of those two solutions, it takes a lot of work to change to the other one.

    From yjerem
  • as @Rob Walker states, parameterized queries are your best bet. If you're using the latest and greatest PHP, I'd highly recommend taking a look at PDO (PHP Data Objects). This is a native database abstraction library that has support for a wide range of databases (including MySQL of course) as well as prepared statements with named parameters.

    From pix0r
  • I would go with using prepared statements. If you want to use prepared statements, you probably want to check out the PDO functions for PHP. Not only does this let you easily run prepared statements, it also lets you be a little more database agnostic by not calling functions that begin with mysql_, mysqli_, or pgsql_.

    From Kibbee
  • @Jeremy Ruten: I really hope you don't still rely on magic_quotes. It's deprecated in PHP 5, gone in PHP 6, and notorious in the programming community as an example of a worst practice.

    I often find myself outputting values I get as input, for example when I want to let a user confirm what they entered. If I were to use your method, I would have to use stripslashes() followed by htmlspecialchars(), and it would be all too easy to accidentally end up escaping values multiple times.

  • @pix0r: Thanks for the tip. Named parameters make a lot of sense, and although PDO's interface isn't ideal IMNSHO, at least it's standard.

  • PDO may be worth it some day, but it's not just there yet. It's a DBAL and it's strengh is (supposedly) to make switching between vendors more easier. It's not really build to catch SQL injections.

    Anyhow, you want to escape and sanatize your inputs, using prepared statements could be a good measure (I second that). Although I believe it's much easier, e.g. by utilizing filter.

    From Till

Slow SQL Query due to inner and left join?

Can anyone explain this behavior or how to get around it?

If you execute this query:

select * 
from TblA
left join freetexttable ( TblB, *, 'query' ) on TblA.ID = [Key]
inner join DifferentDbCatalog.dbo.TblC on TblA.ID = TblC.TblAID

It will be very very very slow.

If you change that query to use two inner joins instead of a left join, it will be very fast. If you change it to use two left joins instead of an inner join, it will be very fast.

You can observe this same behavior if you use a sql table variable instead of the freetexttable as well.

The performance problem arises any time you have a table variable (or freetexttable) and a table in a different database catalog where one is in an inner join and the other is in a left join.

Does anyone know why this is slow, or how to speed it up?

  • Index the field you use to perform the join.

    A good rule of thumb is to assign an index to any commonly referenced foreign or candidate keys.

  • What you should usually do is turn on the "Show Actual Execution Plan" option and then take a close look at what is causing the slowdown. (hover your mouse over each join to see the details) You'll want to make sure that you are getting an index seek and not a table scan.

    I would assume what is happening is that SQL is being forced to pull everything from one table into memory in order to do one of the joins. Sometimes reversing the order that you join the tables will also help things.

  • A general rule of thumb is that OUTER JOINs cause the number of rows in a result set to increase, while INNER JOINs cause the number of rows in a result set to decrease. Of course, there are plenty of scenarios where the opposite is true as well, but it's more likely to work this way than not. What you want to do is keep the size of the result set as small as possible for as long as possible.

    Since both joins match on the first table, changing up the order won't effect the accuracy of the results. Therefore, you probably want to do the INNER JOIN before the LEFT JOIN:

    SELECT * 
    FROM TblA
    INNER JOIN DifferentDbCatalog.dbo.TblC on TblA.ID = TblC.TblAID
    LEFT JOIN freetexttable ( TblB, *, 'query' ) on TblA.ID = [Key]
    

    This probably isn't good enough to fully explain the behavior you're seeing, so you'll still want to examine the execution plan used for each query, and probably add an index as suggested earlier. This is still a good principle to learn, though.

  • Putting freetexttable(TblB, *, 'query') into a temp table may help if it's getting called repeatedly in the execution plan.

    From eed3si9n

Library or algorithm to explode an alphanumeric range

I was wondering if there is an open source library or algorithm that can expand a non-numeric range. For example, if you have 1A to 9A you should get 1A, 2A, 3A, 4A, 5A, 6A, 7A, 8A, 9A.

I've tried Googling for this and the best I could come up with were Regex that would expand numerics with dashes (1-3 becoming 1,2,3).

  • The question is not clear.

    For example, if I give E6T to 5G9 as input, what should be the output?

    From Niyaz
  • And what about E6 to 5G9 ?

    From Mendelt
  • I was trying to leave it somewhat open because the number of possibilities is staggering. I believe this one of those questions that could not be answered 100% here without going through a lot of technical detail about is considered a "good" or "bad" range. I'm just trying to find a jumping point for ideas on how other people have tackled this problem. I was hoping that someone wrote a blog post explaining how they went about it solving this problem or created a whole library to handle this.

    From Gariig
  • I would say the first step in the solution will be to define how characters and numbers interact and form a sequence. The given example isn't clear, as you would at least assume it to run 1A, 1B .... 8Y, 8Z, 9A - that's assuming your input is restricted to decimal followed by a single character.

    If you can define a continuous sequence for characters and decimals, then you it will simply be a matter of some recursion / looping to generate part of that sequence.

    For example, you could assume that each character in the input is one of (1-9A-Z), therefore you could easily make that continuous by grabbing the decimal ascii value of the alpha characters and subtracting 55, in effect giving you the range (1-35)

    From iAn
  • If we assume that the start and end ranges will follow the same alternating pattern, and limit the range of digits to 0-9 and A-Z, we can think of each group of digits as a component in a multi-dimensonal coordinate. For example, 1A would correspond to the two-dimensional coordinate (1,A) (which is what Excel uses to label its two-dimensional grid of rows and columns); whereas AA1BB2 would be a four-dimensional coordinate (AA,1,BB,2).

    Because each component is independent, to expand the range between two coordinates we just return all combinations of the expansion of each component. Below is a quick implementation I cooked up this afternoon. It works for an arbitrary number of alternations of normal and alphabetic numbers, and handles large alphabetic ranges (i.e. from AB to CDE, not just AB to CD).

    Note: This is intended as a rough draft of an actual implementation (I'm taking off tomorrow, so it is even less polished than usual ;). All the usual caveats regarding error handling, robustness, (readability ;), etc, apply.

    IEnumerable<string> ExpandRange( string start, string end ) {
      // Split coordinates into component parts.
      string[] startParts = GetRangeParts( start );
      string[] endParts = GetRangeParts( end );
    
      // Expand range between parts 
      //  (i.e. 1->3 becomes 1,2,3; A->C becomes A,B,C).
      int length = startParts.Length;
      int[] lengths = new int[length];
      string[][] expandedParts = new string[length][];
      for( int i = 0; i < length; ++i ) {
        expandedParts[i] = ExpandRangeParts( startParts[i], endParts[i] );
        lengths[i] = expandedParts[i].Length;
      }
    
      // Return all combinations of expanded parts.
      int[] indexes = new int[length];
      do {
          var sb = new StringBuilder( );
          for( int i = 0; i < length; ++i ) {
            int partIndex = indexes[i];
            sb.Append( expandedParts[i][partIndex] );
          }
          yield return sb.ToString( );
      } while( IncrementIndexes( indexes, lengths ) );
    }
    
    readonly Regex RangeRegex = new Regex( "([0-9]*)([A-Z]*)" );
    string[] GetRangeParts( string range ) {
      // Match all alternating digit-letter components of coordinate.
      var matches = RangeRegex.Matches( range );
      var parts =
        from match in matches.Cast<Match>( )
        from matchGroup in match.Groups.Cast<Group>( ).Skip( 1 )
        let value = matchGroup.Value
        where value.Length > 0
        select value;
      return parts.ToArray( );
    }
    
    string[] ExpandRangeParts( string startPart, string endPart ) {
      int start, end;
      Func<int, string> toString;
    
      bool isNumeric = char.IsDigit( startPart, 0 );
      if( isNumeric ) {
        // Parse regular integers directly.
        start = int.Parse( startPart );
        end = int.Parse( endPart );
        toString = ( i ) => i.ToString( );
      }
      else {
        // Convert alphabetic numbers to integers for expansion,
        //  then convert back for display.
        start = AlphaNumberToInt( startPart );
        end = AlphaNumberToInt( endPart );
        toString = IntToAlphaNumber;
      }
    
      int count = end - start + 1;
      return Enumerable.Range( start, count )
        .Select( toString )
        .Where( s => s.Length > 0 )
        .ToArray( );
    }
    
    bool IncrementIndexes( int[] indexes, int[] lengths ) {
      // Increment indexes from right to left (i.e. Arabic numeral order).
      bool carry = true;
      for( int i = lengths.Length; carry && i > 0; --i ) {
        int index = i - 1;
        int incrementedValue = (indexes[index] + 1) % lengths[index];
        indexes[index] = incrementedValue;
        carry = (incrementedValue == 0);
      }
      return !carry;
    }
    
    // Alphabetic numbers are 1-based (i.e. A = 1, AA = 11, etc, mod base-26).
    const char AlphaDigitZero = (char)('A' - 1);
    const int AlphaNumberBase = 'Z' - AlphaDigitZero + 1;
    int AlphaNumberToInt( string number ) {
      int sum = 0;
      int place = 1;
      foreach( char c in number.Cast<char>( ).Reverse( ) ) {
        int digit = c - AlphaDigitZero;
        sum += digit * place;
        place *= AlphaNumberBase;
      }
      return sum;
    }
    
    string IntToAlphaNumber( int number ) {
      List<char> digits = new List<char>( );
      while( number > 0 ) {
        int digit = number % AlphaNumberBase;
        if( digit == 0 )  // Compensate for 1-based alphabetic numbers.
          return "";
    
        char c = (char)(AlphaDigitZero + digit);
        digits.Add( c );
        number /= AlphaNumberBase;
      }
    
      digits.Reverse( );
      return new string( digits.ToArray( ) );
    }
    
  • As noted by others, it would be useful to be more specific. I don't think you can expect there to be a library that will generate ranges according to any arbitrary order on string you can come up with.

    If you can simply define what the successor of any given string is, then the solutions is quite easy. That is, if you have a successor function S on strings (e.g. with S('3A') = '4A'), then something like the following can be used:

    s = initial_string
    while s != final_string do
      output s
      s = S(s)
    output s
    

    Something I have used in the past to generate all strings of a given length l and with given range b to e of characters, is the following piece of (pseudo-)code. It can be easily adapted to a wide range of variations.

    // initialise s with b at every position
    for i in [0..l) do
      s[i] = b
    done = false
    while not done do
      output s
      j = 0
      // if s[j] is e, reset it to b and "add carry"
      while j < l and s[j] == e do
        s[j] = b
        j = j + 1
        if j == l then
          done = true
      if not done then
        s[j] = s[j] + 1
    

    For example, to start at a specific string you need only the change the initialisation. To set the end you only need to change the behaviour for the inner while to separately handle position l (limiting to the character in the end string on that position and if reached decrementing l).

    Emperor XLII : Just wanted to mention that this is an excellent solution if you consider the problem to be linear (i.e. treating `A1B2` as a single number). I still think the problem is multi-dimensional, but I guess we'll just have to wait for the author to clarify :)
    From mweerden

Load Balancing of Process in 1 Server

I have 1 process that receives incoming connection from port 1000 in 1 linux server. However, 1 process is not fast enough to handle all the incoming request.

I want to run multiple processes in the server but with 1 end-point. In this way, the client will only see 1 end-point/process not multiple.

I have checked LVS and other Load Balancing Solution. Those solutions seem geared towards multiple servers load-balancing.

Any other solution to help on my case?

  • The question is a little unclear to me, but I suspect the answer you are looking for is to have a single process accepting tasks from the network, and then forking off 'worker processes' to actually perform the work (before returning the result to the user).

    In that way, the work which is being done does not block the acceptance of more requests.

    As you point out, the term load balancing carries the implication of multiple servers - what you want to look for is information about how to write a linux network daemon.

    The two kes system calls you'll want to look at are called fork and exec.

  • Yes, i'm looking something like linux network daemon. I wonder whether there's a readily available program/solution to do this.

  • It sounds like you just need to integrate your server with xinetd.

    This is a server that listens on predefined ports (that you control through config) and forks off processes to handle the actual communication on that port.

  • You also may want to go with a web server like nginx. It can load balance your app against multiple ports of the same app, and is commonly used to load balance Ruby on Rails apps (which are single threaded). The downside is that you need to run multiple copies of your app (one on each port) for this load balancing to work.

  • You need multi-processing or multi-threading. You aren't specific on the details of the server, so I can't give you advice on what to do exactly. fork and exec as Matt suggested can be a solution, but really: what kind of protocol/server are we talking about?

  • i am looking something more like nginx where i will need to run multiple copies of my app.

    Let me try it out.

    Thanks for the help.

  • i am thinking to run multiple application similar to ypops.

  • nginx is great but if you don't fancy a whole new web server, apache 2.2 with mod proxy balancer will do the same job

  • Perhaps you can modify your client to round-robin ports (say) 1000-1009 and run 10 copies of the process?

    Alternatively there must be some way of internally refactoring it.

    It's possible for several processes to listen to the same socket at once by having it opened before calling fork(), but (if it's a TCP socket) once accept() is called the resulting socket then belongs to whichever process successfully accepted the connection.

    So essentially you could use:

    • Prefork, where you open the socket, fork a specified number of children which then share the load
    • Post-fork, where you have one master process which accepts all the connections and forks children to handle individual sockets
    • Threads - you can share the sockets in whatever way you like with those, as the file descriptors are not cloned, they're just available to any thread.
    From MarkR

Recursive function for an xml file (hierarchial data).

Hi,

I want to recursively traverse an XML file, and need a little help with the C# code.

My XML file looks like:

<categories>
  <category id="1">
  </category>
  <category id="2">
       <category id="3"> 
       </category>
       <category id="4"> 
              <category id="5"> 
              </category> 
       </category>
  </category>
</categories>

(sorry the angle brackets had to be replaced) : Fixed

Can someone help with the C# function that will traverse the XML file?

  • First off, System.XML provides some excellent ways to work with XML.

    I'm assuming you loaded your XML into an XMLDocument, doing so allows you to use XPath Selectors, or just walk through the DOM.

    Something like this would walk from whatever element back up to the top using recursion:

    public XmlNode WalkToTopNode (XmlNode CurrentNode)
    {
        if (CurrentNode.ParentNode == null)
         return CurrentNode;
        else
         return WalkToTopNode(CurrentNode.ParentNode);
    }
    

    Using recursion to find a node by ID could be done somewhat like this (Note, I typed this in the textbox, it may be wrong):

    public XmlNode GetElementById (string id, XmlNode node)
    {
        if (node.Attributes["id"] != null && node.Attributes["id"].InnerText == id)
        { 
         return node;
        }
        else
        {
         foreach (XmlNode childNode in node.Children)
         {
          return GetElementById(id, childNode);
         }
        }
    
        return null;    
    }
    

    However, if you are using recursion when there are so many better node traversal ways built in to System.XML, then perhaps its time to rethink your strategy.

    From FlySwat

"getting" path in linux

I am writing a c program in linux. Commands like execv() require a path in the form of a c string. Is there a command that will return the current path in the form of a c style string?

  • getcwd()

  • I am not a professional programmer so this isn't an exact answer.

    What you need to do is grab the environment variable PWD (present working directory)

    I'm not sure what the library it is in but it is a standard linux header.

    I'll look around and see if I can find it.

    edit:

    I was thinking of ,getenv() which would help if you also need to run system commands and need the various bin paths located in PATH

    From Tanj
  • This is not ANSI C:

    #include <unistd.h>
    
    char path[MAXPATHLEN];
    getcwd(path, MAXPATHLEN);
    printf("pwd -> %s\n", path);
    
    From akr
  • The path argument to execv() is the path to the application you wish to execute, not the current working directory (which will be returned by getcwd()) or the shell search path (which will be returned by getenv("PATH")).

    Depending on what you're doing, you may get more mileage out of the system() function in the C library rather than the lower-level exec() family.

  • If the path can be a relative path, you should be able to use '.' or './' as the path. I'm not sure if it will work, but you could try it.

    From yjerem
  • gah this place is so fast that if it isn't on the tip of your tongue you'll get down voted while editting your post with a better answer

    Artelius : This isn't a forum. You should only post answers to questions. This doesn't qualify as an answer and you should delete it.
    Tanj : this was posted pre-comments era :)
    From Tanj

What code would I use to convert a SQL like expression to a regex on the fly?

I'm looking to convert a SQL like statement on the fly to the equivalent regex i.e.

LIKE '%this%' LIKE 'Sm_th' LIKE '[C-P]arsen'

What's the best approach to doing this?

P.S. I'm looking to do this on the .Net Framework (C#).

  • I found a Perl module called Regexp::Wildcards. You can try to port it or try Perl.NET. I have a feeling you can write something up yourself too.

    From eed3si9n
  • From your example above, I would attack it like this (I speak in general terms because I do not know C#):

    Break it apart by LIKE '...', put the ... pieces into an array. Replace unescaped % signs by .*, underscores by ., and in this case the [C-P]arsen translates directly into regex.

    Join the array pieces back together with a pipe, and wrap the result in parentheses, and standard regex bits.

    The result would be:

    /^(.*this.*|Sm.th|[C-P]arsen)$/
    

    The most important thing here is to be wary of all the ways you can escape data, and which wildcards translate to which regular expressions.

    % becomes .*
    _ becomes .
    
    From Martin