Code Answer: 04/06/11

Wednesday, April 6, 2011

Removing a page break in a PDF file with low-level code

How can you remove the pagebreak between two pieces of text at the bottom of the pdf file without having the source of the original document?

I use Ubuntu.

Just out of curiosity: I put the document with the pagebreak into my LaTeX document and converted it to PDF with pdflatex. Pdflatex ignores the second page completely. If somebody knows how to insert the second page with the \includegraphics command or another command, please let me know.

From stackoverflow

There are two pages in your document. That gray space between the two pieces of text at the bottom of the file is not part of the document. It's actually part of Adobe Reader, or whichever PDF viewer you use, and it's meant to indicate a page break -- in this case a separation between page one and page two.

So if you want to combine these two pages into one page and remove that gray space, then you'll need to find a PDF library (or another non-library tool) that works in a Linux environment, that will allow you to stitch/merge two pages together to create one page.

Before you go down that path though, I'd recommend that you try to get a hold of the original document and try to re-create PDF, this time using a larger page size so that you fit all of the content onto one page.

Masi : This is clearly the best practical solution. However, I am interested in file standards and in how the PDF system handles page breaks at the low level. This means that you would need to know the exact code for a page break in PDF documents to find it in the document. I did not find it with Google.

Rowan : At a low level there is no code for a page break in PDFs. Each page in a PDF is represented by a page object. The page object is a dictionary which includes references to the page's content and other attributes. The individual page objects are tied together in a structure called the page tree. However, the structure of the page tree is not necessarily related to the logical structure or flow of the document. You can read more about this in the PDF reference - but don't bother modifying the internals of a PDF until you've read the reference. http://www.adobe.com/devnet/pdf/pdf_reference.html
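
As a rough illustration of Rowan's point, here is a simplified JavaScript model of a page tree. It is not a PDF parser: the object layout and the `collectPages` helper are assumptions made for this sketch, and only the key names (`Type`, `Kids`, `Contents`) echo entries found in real PDF dictionaries. The point is that pages are separate objects in a tree, so there is no "page break" marker to search for.

```javascript
// Simplified, hypothetical model of a PDF page tree (not a real PDF parser).
// Intermediate nodes have Type "Pages" and a Kids array; leaves have Type "Page".
const pageTree = {
  Type: "Pages",
  Kids: [
    { Type: "Page", Contents: "content stream for page 1" },
    {
      Type: "Pages",
      Kids: [{ Type: "Page", Contents: "content stream for page 2" }],
    },
  ],
};

// Walks the tree depth-first and collects the leaf page objects in order.
function collectPages(node) {
  if (node.Type === "Page") return [node];
  return node.Kids.flatMap((kid) => collectPages(kid));
}

const pages = collectPages(pageTree);
console.log(pages.length); // 2
```

Merging two pages in this model would mean building one page object whose content combines both streams, which is what a PDF library does for you.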

Code in a RSS feed

I am using a feed creator (specifically, Kohana's feed::create()), except some of my text might be like this in the description element

See code below

<?php echo 'example'; ?>

The feed creator is using the SimpleXML Library. Whenever the data is returned (using $xml->asXml()) the html angle brackets inside the description element are converted to HTML entities.

This makes the tags be parsed correctly, useful for p tags and the like. However, in this case - the PHP code won't show up (being surrounded by angle brackets).

My question is - how can I show stuff like this in a RSS feed? How can I display > when it itself is parsed back as <? Does that make sense?

Here is an example of what is being outputted:

<description>&lt;p&gt;some content&lt;/p&gt;&#13;

&lt;p&gt;WITH some code&lt;/p&gt;&lt;p&gt;&lt;?php&#13;
    //test me out!&#13;
?&gt;&lt;/p&gt;&#13;
</description>

(note that is not an error above - the entities are all converted)

What I'd like it to display (in a RSS reader) is

some content

WITH some code

<?php
     //test me out! ?>

From stackoverflow

Have you tried CDATA tags?

alex : I tried doing that .... but when I called asXml() the CDATA tags themselves were encoded.

All RSS tags contain strings, so can't you just do your PHP manipulation prior to setting the tag?

So instead of saying:
```
$xml->description = 'Description <?php echo $var; ?>';
```
you should be doing:
```
$xml->description = 'Description ' . $var;
```
What is the reason that you want to pass PHP code into your RSS feed? I'm guessing that a lot of feed readers would not execute it anyways.

alex : I want to **show** it... as in ... **look** at the following code sample. I don't want the code being parsed. I'm guessing ALL feed readers would not execute PHP code... it would be quite a security problem if they did execute it on an online reader or on your local machine.

You want the code to actually display in the feed as code, not execute, right? If so, you need to escape it the same way you would if you wanted it to display in HTML, i.e.:
```
htmlspecialchars( "<?php echo 'example'; ?>" )
```
That will result in your feed looking even more garbled than it already does, because the PHP will be double-encoded, once for the RSS XML and again for the HTML contained in the RSS XML.

alex : I tried that - but I couldn't get it to work. Let me try it again. Thanks for your answer.

alex : I had to use it twice - I leveraged Kohana's html::specialchars(). It's ugly as hell but it works - the line looks like this: `$markup = '<pre><code>' . html::specialchars(html::specialchars($code)) . '</code></pre>';`
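
The layering can be sketched outside PHP. The JavaScript below is only an illustration: `escapeHtml` and `unescapeHtml` are hypothetical helpers written for this example, not Kohana or SimpleXML functions. It shows that each decoding pass (one by the XML parser, one by the HTML renderer) strips one layer, so text that must survive two passes needs two rounds of escaping.

```javascript
// Hypothetical helper: escapes the characters that are special in HTML/XML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical inverse: undoes exactly one layer of escaping.
function unescapeHtml(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&"); // ampersand last, so only one layer is removed
}

const code = "<?php echo 'example'; ?>";
const once = escapeHtml(code);  // enough to display the code in a plain HTML page
const twice = escapeHtml(once); // survives one extra decoding pass

console.log(twice);
console.log(unescapeHtml(unescapeHtml(twice)) === code);
```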

Array Allocation Subscript Number

Quick question regarding how memory is allocated.

If someone were to allocate 20 chars like this:

char store[20];

Does this mean that it allocated 20 char-type blocks of memory, or that it allocated char-type blocks of memory starting with 0 and ending with 20? The difference is that the first example's range would be from store[0] to store[19], whereas the second example's range would be from store[0] to store[20].

From stackoverflow

[0] to [19] (20 elements, that is)

The first - char store[20] allocates 20 chars, from 0 to 19.

It allocates memory for 20 chars, i.e. valid indices go from 0 to 19.

It means it allocated one block of memory large enough to hold 20 chars (from index 0 to 19)

trikker : Thanks. Setting this as answer because you were first.

ChristopheD : Hm, I think he was last ;-) But hey, no problem ;-)

trikker : Yeah you're right. Was thinking backwards.
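
The same counting rule can be seen in a loose JavaScript analogy, assuming a typed array stands in for the C array. Unlike C, where reading `store[20]` is undefined behaviour, JavaScript simply reports that no such element exists.

```javascript
// A typed array of 20 one-byte elements, loosely analogous to `char store[20]` in C.
const store = new Uint8Array(20);

store[0] = 65;  // first valid index
store[19] = 90; // last valid index

console.log(store.length); // 20 elements in total
console.log(store[19]);    // 90
console.log(store[20]);    // undefined: there is no 21st element
```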

xml file in soap

My last question about soap series :)

Now, I have an XML file which must be sent to the SOAP web service.

Do I have to attach it as a file to the request, or send it as a method parameter in the request?

What is the best approach? By the way, the size of the file is variable; it can sometimes reach 10 MB.

Thanks

From stackoverflow

I'd look into soap with attachments, or alternatively you could serialise it as byte array and send it as a parameter.

John Saunders : SwA is obsolete.

Is MySQL appropriate for a read-heavy database with 3.5m+ rows? If so, which engine?

My experience with databases is with fairly small web applications, but now I'm working with a dataset of voter information for an entire state. There are approximately 3.5m voters and I will need to do quite a bit of reporting on them based on their address, voting history, age, etc. The web application itself will be written with Django, so I have a few choices of database including MySQL and PostgreSQL.

In the past I've almost exclusively used MySQL since it was so easily available. I realize that 3.5m rows in a table isn't really all that much, but it's the largest dataset I've personally worked with, so I'm out of my personal comfort zone. Also, this project isn't a quickie throw-away application, so I want to make sure I choose the best database for the job and not just the one I'm most comfortable with.

If MySQL is an appropriate tool for the job I would also like to know if it makes sense to use InnoDB or MyISAM. I understand the basic differences between the two, but some sources say to use MyISAM for speed but InnoDB if you want a "real" database, while others say all modern uses of MySQL should use InnoDB.

Thanks!

From stackoverflow

I've run DB's far bigger than this on mysql- you should be fine. Just tune your indexes carefully.

InnoDB supports better locking semantics, so if there will be occasional or frequent writes (or if you want better data integrity), I'd suggest starting there, and then benchmarking myisam later if you can't hit your performance targets.

zombat : +1 - my sentiments exactly. I don't use MyISAM anymore unless I have a specific reason to. It's very capable, I used to work with tables with 20 million+ records with no problems, but the transactional capabilities of InnoDB usually outweigh any performance differences you are likely to need until you encounter specific performance issues.

Branden Hall : Thanks Tim! Can I ask you to expand on "just tune your indexes carefully"? I get the purpose of indexes, but I'm not sure what the tradeoff is - i.e. why not index everything. I'm also a bit confused by what it means to tune an index - I was under the impression you decided to index a column or not and that was it.

Tim Howland : When you create an index, you add extra overhead when doing inserts (not much, just a little, but it adds up). If you index everything, then it can really slow things down. Indexes can be based on a single field, the first N bytes (or characters) of a field, or two or more fields. You need to work with the "explain query" tool and benchmark your system to figure out the best mix for your particular data set and the searches you typically run. Check the mysql docs here: http://dev.mysql.com/doc/refman/5.0/en/create-index.html for more info.

Branden Hall : Thanks so much Tim - that explains a lot!

Greg Smith : Indexes also take up additional disk space. If you create too many of them, the overhead when doing inserts can be huge--it's not safe to assume it will be minor. The problem is that all those little writes to update many of them involve a lot more seeking over the disk surface, and seeks are very slow on most drives.

MyISAM only makes sense if you need speed so badly that you're willing to accept many data integrity downsides to achieve it. You can end up with database corruption on any unclean shutdown, there's no foreign keys, no transactions, it's really limited. And since 3.5 million rows on modern hardware is a trivial data set (unless your rows are huge), you're certainly not at the point where you're forced to optimize for performance instead of reliability because there's no other way to hit your performance goals--that's the only situation where you should have to put up with MyISAM.

As for whether to choose PostgreSQL instead, you won't really see a big performance difference between the two on an app this small. If you're familiar with MySQL already, you could certainly justify just using it again to keep your learning curve down.

I don't like MySQL because there are so many ways you can get bad data into the database where PostgreSQL is intolerant of that behavior (see Comparing Speed and Reliability); the bad MyISAM behavior is just a subset of the concerns there. Given how fractured the MySQL community is now and the uncertainties about what Oracle is going to do with it, you might want to consider taking a look at PostgreSQL just so you have some more options here in the future. There's a lot less drama around the always free BSD licensed PostgreSQL lately, and while smaller, at least the whole development community for it is pushing in the same direction.

Branden Hall : Thanks Greg, the politics of MySQL definitely do scare me a bit. Looks like I should do some reading on PostgreSQL and see if I can fit some ramp up time into my development schedule.

Since it's a read-heavy table, I would recommend using the MyISAM table type. If you do not use foreign keys, you can avoid the bugs like this and that.

Backing up or copying the table to another server is as simple as copying the frm, MYI and MYD files.

If you need to compute reports and complex aggregates, be aware that postgres' query optimizer is rather smart and ingenious, whereas the mysql "optimizer" is quite simple and dumb.

On a big join the difference can be huge.

The only advantage MySQL has is that it can hit the indexes without hitting the tables.

You should load your dataset in both databases and experiment with the bigger queries you intend to run. It is better to spend a few days experimenting than to be stuck with the wrong choice.

Is coding style mostly for readability or are there other factors?

I remember reading in Douglas Crockford's "Javascript the Good Parts" book that there is potential for error with blockless statements because of automatic semicolon insertion.

if (condition)
   foo = true;

if (condition) 
{
   foo = true;
}

In the second example it will work consistently; in the first example a semicolon will be automatically inserted by the interpreter, which can lead to ambiguity in the code. As Douglas points out, this is potentially bad and hard to debug, and I agree. But it got me thinking: are there examples where coding "style" actually has syntax implications? In other words, examples where failing to follow a certain indentation or apparent style actually results in a bug or error. I suppose Python with its significant whitespace is an example; YAML with its requirement for no tabs is another.

Feel free to respond in a wide variety of languages and idioms. I am curious to hear about the paradigm cases. In your answer I would like to know the WHAT and the WHY of the coding style or syntax behavior. I don't want to start any coding style flame wars, just matter of fact scenarios where the uninitiated would get tripped up.

From stackoverflow

In Python, the whitespace indentation, rather than curly braces or keywords, delimits the statement blocks. An increase in indentation comes after certain statements; a decrease in indentation signifies the end of the current block.

JavaScript treats these two cases separately. You have to use the first:
```
return {
   // code
}

return 
{
   // code
}
```
If you do not, the interpreter adds semicolons in the wrong places. I think it puts one after the return. So the second would be read wrongly as:
```
return;
{
   // code
}
```
Which is not invalid syntax, so no error is reported and the function simply returns undefined.
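
A small runnable sketch of the difference (the function names `sameLine` and `nextLine` are made up for this example):

```javascript
// Opening brace on the same line as `return`: an object literal is returned.
function sameLine() {
  return {
    ok: true
  };
}

// Opening brace on the next line: a semicolon is inserted right after `return`,
// so the braces below become an unreachable block statement.
function nextLine() {
  return
  {
    ok: true
  };
}

console.log(sameLine()); // { ok: true }
console.log(nextLine()); // undefined
```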

Gordon Potter : Ah that's it! Now I remember reading this in Douglas Crockford's Javascript the Good Parts book. The issue with the automatic semi colons. One of the "evil" features of Javascript. Thanks!

belugabob : I was surprised to read this answer, as I'd never come across the problem myself. I was even more surprised by the comment about the Crockford book, as I've just read it and couldn't remember this nugget of info. Couldn't find it by looking quickly through the book, so I modified some code to see for myself, and the behaviour you describe didn't take place. Am I reading this correctly - are you saying that the second way of doing things will cause the code inside the braces to always be executed? You're not confusing this with the 'Block-less statements' on pg 113, are you?

belugabob : Ah! - now that you've edited it to use 'return' instead of 'if...', it makes more sense. Having said that, why on earth would you use that syntax in the first place - it's horrible (Both versions)

Andreas Grech : @belugabob, refer to Crockford's book pg.102 "Semicolon Insertion"

David Raznick : To be honest, I agree, its ugly both ways. I just thought it was the example that Gordon was after.

belugabob : David, thanks for the page reference, now I know why I didn't remember it - I must have decided that I'd never use that construct anyway, and that remembering that rule was a waste of memory space. ;-)

Gordon Potter : Yeah, what is the point of the brackets after the return? It is an odd construction. Does this mean you are returning some anonymous object instead of say a single value, or variable? Does seem like a goofy construction. Perhaps this has a use when returning JSON key/value stores. But it seems more clear to name your objects in a variable and return the variable.

belugabob : Totally. By the way '{}' are braces, '()' are parentheses and '[]' are brackets. Some people refer to them as different types of brackets, but the naming convention that I gave is more common and less confusing.

Whitespace is not significant in any of the C family of languages, except to separate certain language tokens. The layout of the source code has no effect on the executable produced by the compiler.

It depends on the language. In the curly family C/C++/Java/C#, whitespace is not a restriction as long as your braces are opened and closed properly.

In languages like VB where paired keywords are used, just because keywords delimit functions you can't have
```
Private Sub someFunction() End Sub
```
But in C, C++ and other curly languages, you can have
```
void someFunction() { }
```
In Python, I guess it's a completely different matter.

It all depends on the particular language. As per your specific example, I don't think there is a syntactical or semantic difference between the two.

In C++, there's a difference between
```
vector<pair<int,int>>
```
and
```
vector<pair<int,int> >
```
because >> is treated as a single token by the parser.

anon : The difference being the first won't compile with some older compilers. This is fixed in the future standard. The meaning of the construct and the code the compiler generates for it will be the same in both cases.

A coding style is not only for readability. There are some other factors like
- Avoid common language pitfalls (silent fails, exceptions while casting etc.)
- Ease of maintenance
- Increase the clarity of code (e.g. private functions always starting lowerCase and public being PascalCase)
- Enforcing conventions (e.g. not using multiple inheritance or always inherit public in c++)

An example for ease of maintenance is below:
```
if(x) return true;
```
vs.
```
if(x) 
{
   return true;
}
```
It is clear the second is easier to maintain, because I can simply go ahead and add a new line with a call to bla() without having to add braces first.

It is language dependent. For instance in Java
```
void someFunction() { }
```
and
```
void someFunction() { 
}
```
and
```
void someFunction() 
{ 
}
```
will have no implication whatsoever. However, Sun has published a list of coding conventions; if you are interested you can read them here.

These are mainly for maintainability and readability of code but need not be strictly followed.

Kevin Boyd : Oops, the braces have not shown up correctly in my post. I will try to repost later as I'm a bit caught up now.

No one has mentioned it before, but one point of style is to write

if (NULL==p) // ...

instead of

if (p==NULL) // ...

The two are functionally equivalent so it's a matter of style. Many prefer the style at the top, because if you type "=" by mistake instead of "==", the first won't compile whereas the second will, and creates a bug that is hard to find (though some compilers now give you a warning for if (p=NULL)).
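
The answer is about C-like compilers, but the same effect can be observed in JavaScript, where assigning to a literal is rejected when the code is parsed. The sketch below is an illustration under that assumption; `parses` is a hypothetical helper that only reports whether a function body is syntactically accepted.

```javascript
// Returns true when `source` parses as a function body, false on a syntax error.
// Hypothetical helper, only used for this demonstration.
function parses(source) {
  try {
    new Function("p", source);
    return true;
  } catch (err) {
    return false;
  }
}

// The same "=" typo, once with the constant on the left and once on the right.
const constantFirst = parses("if (null = p) { return 1; }");
const variableFirst = parses("if (p = null) { return 1; }");

console.log(constantFirst); // rejected: cannot assign to a literal
console.log(variableFirst); // accepted: the bug goes unnoticed
```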

Gordon Potter : Ah, that would be truly evil if you were having a bad day before a deadline. Good point, and nice explanation. Thanks!

Colin Mackay : It depends on the compiler. The C# compiler demands that the condition evaluate to a Boolean. Accidentally putting if (p=null) won't evaluate to a Boolean so the compiler will reject it. Also, many other compilers that allow non-Boolean expressions to be interpreted in conditionals will warn when an assignment is present.

Every time I open a parenthesis, brace, single or double quote I always close it and then go back to write the statement, condition, etc... This saves me from quite some possibly big mistakes!

belugabob : I've noticed other people doing this, as I watched them type - now I know why they do it. Personally, I prefer an editor which does this for me - maybe I've just been spoiled by years of using IntelliJ?

Alix Axel : Old habits die hard! =)

The title wasn't specific to conditional blocks, so this came to mind:

I don't see it mentioned yet, but one thing to consider is how do you find things with simple tools (like grep and its inferior implementations from windowsland).

For example consider this (admittedly a bit contrived) example
```
class Foo
// vs.
class
Foo
```
You can find the former with the regex "class\s+Foo", but for the latter you'll need a specialized tool that can parse C++/C#/Java.
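
To make the point concrete, here is a small JavaScript sketch that mimics a line-oriented search. The `matchingLines` helper is an assumption for this example and only approximates what grep does: it tests each line on its own, which is why the declaration split across two lines is never found.

```javascript
// Mimics a line-oriented search such as grep: test each line on its own.
function matchingLines(text, pattern) {
  return text.split("\n").filter((line) => pattern.test(line));
}

const pattern = /class\s+Foo/;
const oneLine = "class Foo\n{\n};";
const splitLines = "class\nFoo\n{\n};";

const foundOneLine = matchingLines(oneLine, pattern);
const foundSplit = matchingLines(splitLines, pattern);

console.log(foundOneLine); // [ 'class Foo' ]
console.log(foundSplit);   // []
```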

This applies also in C for function prototypes, some weird people prefer
```
void
bar (void)
// to 
void bar(void)
```
While you can generally assume that if there are some A-Z characters in front of a function name, then it's very likely a definition/declaration.

Actually, when it comes to if blocks, the placement of the brace has a very big impact on debugging code in Visual Studio 200x. When it steps into a function/block, it will place the text/execution line cursor on the opening brace. So if the brace happens to be waaaaay on the right side, the code window has to scroll there and buggeritall stays there. Extremely annoying.

xcramps : It's not weird. It's easier to find the function definition in example #1, using grep, for example. These choices were not made unintentionally.

How to get a JavaScript Object's Class?

I created a JavaScript object, but how can I get back the class of that object?

I want something similar to Java's .getClass() method.

From stackoverflow

There's no exact counterpart to Java's #getClass() in JavaScript. Mostly that's due to JavaScript being a prototype-based language, as opposed to Java being a class-based one.

Depending on what you need #getClass() for, there are several options in JavaScript:
- typeof()
- instanceof
- func.prototype, proto.isPrototypeOf
- obj.constructor
A few examples:
```
function Foo() {}
var foo = new Foo()

typeof(foo) // == "object"

foo instanceof Foo // == true
foo.constructor // == Foo

Foo.prototype.isPrototypeOf(foo) // == true
Foo.prototype.bar = function (x) {return x+x}
foo.bar(21) // == 42
```
Miles : That should probably be `func.prototype` (yes, functions are objects, but the `prototype` property is only relevant on function objects).

earl : Yes, good point.

Christoph : you might also want to mention `instanceof`/`isPrototypeOf()` and the non-standard `__proto__`

Christoph : ES5 additionally has `Object.getPrototypeOf()`

clarkf : For me, foo.constructor yields something different (Chrome 8.0.552.0 dev on Mac OS X): `Function Foo() {}`

earl : Yes, clarkf, that's `Foo` pretty-printed. The comments don't indicate the return values, but equalities that hold for the return values. So the comment means that `foo.constructor == Foo` holds, which will also be the case for you.

JavaScript is a class-less language: there are no classes that define the behaviour of an object statically as in Java. JavaScript uses prototypes instead of classes for defining object properties, including methods, and inheritance. It is possible to simulate many class-based features with prototypes in JavaScript.

shambleh : I have often said that Javascript lacks class :)

You can get a reference to the constructor function which created the object by using the constructor property:
```
function MyObject(){
}

var obj = new MyObject();
obj.constructor; // MyObject
```
If you need to confirm the type of an object at runtime you can use the instanceof operator:
```
obj instanceof MyObject // true
```
In JavaScript there are no classes, but I think you want the constructor name, so obj.constructor.toString() will tell you what you need.

This function returns either "undefined", "null", or the "class" in [object class] from Object.prototype.toString.call(someObject).

function getClass(obj) {
  if (typeof obj === "undefined")
    return "undefined";
  if (obj === null)
    return "null";
  return Object.prototype.toString.call(obj)
    .match(/^\[object\s(.*)\]$/)[1];
}

getClass("")   === "String";
getClass(true) === "Boolean";
getClass(0)    === "Number";
getClass([])   === "Array";
getClass({})   === "Object";
getClass(null) === "null";
// etc...
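
Two of the options raised in the comments, `Object.getPrototypeOf()` and the constructor property, can be combined in a short sketch. Reading the constructor's `name` property is an addition not mentioned in the answers above; it assumes an engine that implements the standard function `name` property.

```javascript
function Foo() {}
const foo = new Foo();

// The prototype object that `foo` inherits from.
const proto = Object.getPrototypeOf(foo);
console.log(proto === Foo.prototype); // true

// The constructor's `name` is often the closest thing to a class name.
const fooName = foo.constructor.name;
const arrayName = [].constructor.name;
console.log(fooName);   // "Foo"
console.log(arrayName); // "Array"
```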

Does a WCF self hosted service handle more or less load than the IIS hosted option?

Does the hosting option affect the number of requests a WCF service can handle?

From stackoverflow

Hard to say - the main reason for self-hosting is probably having more control, e.g. being able to tweak the system as you need it.

IIS hosting is convenient and easy to set up, and it offers "on-demand" loading of the service, e.g. the service host is only loaded if a request actually comes in.

This constant loading (and unloading) of the service host will probably hurt performance a little bit - on the other hand, self-hosting a service host, you probably use more memory (since the ServiceHost is active and in memory at all times).

So again - it's a memory-vs-speed trade-off - selfhosting uses more RAM but is probably a tiny bit faster.

Marc

Once the service is running I would expect no significant difference.

But, as with any performance question, you can only get a useful answer by testing your service in both cases with realistic loads and looking at the big picture. E.g. one might serve a few more requests but at a slightly higher memory cost.

There are of course going to be other differences e.g. IIS hosting, with on demand instantiation, would be expected to be a little slower to serve the first request from idle, whether this is significant only you can tell.

marc_s : Of course, if you host in IIS and you have the recommended "per-call" activation model, IIS will (in the worst case) keep creating ServiceHost instances over and over again. Once the ServiceHost is up, I agree - no difference to be expected.

Once "Dublin" (a purpose built WCF hosting environment) is released, it will be natural to use that.

EDIT: This answer was originally about the potential difference between IIS and self hosted due to differences in threading. However, I stand corrected, see the comment below.

marc_s : self-hosted WCF is just as multi-threaded as hosting it in IIS - I don't see any benefit pro IIS here.

How to use a defined brush resource in XAML, from C#

So far I have this

<UserControl.Resources>
 <LinearGradientBrush x:Key="KeyDownBrush" .....>

Now I would like to access this defined resource when a key is pressed and replace the current objects fill with the KeyDownBrush, in C#.

I've tried this.Resources.Contains("KeyDownBrush") and have been able to get True returned, so I presume I am almost there, but I'm not sure how to access the object and cast it correctly to a Brush instance.

Any guidance would be appreciated.

From stackoverflow

From within your UserControl:
```
var brush = this.Resources["KeyDownBrush"] as LinearGradientBrush;
```
Should do the trick.

Sebastian Gray : Thanks Matt. I thought I must be close, I just couldn't seem to find an example for this in the documentation where I was looking. For anyone else, the C# I used looks like: System.Windows.Media.LinearGradientBrush aBrush = (System.Windows.Media.LinearGradientBrush)this.Resources["KeyDownBrush"]; aRectangle.Fill = aBrush;

Can I write ISAPI filters in Perl?

I need to write an ISAPI filter for IIS 6.0 to rewrite ugly URLs into SEO-friendly URLs. Because of the need for string parsing and regular expressions, I'd prefer to use Perl to do this. There is a module for IIS called (ingeniously) Perl for IIS, but I'd rather not use that because it's an ISAPI extension itself (running in a DLL), so the Perl scripts run in the same process space as the calling process--meaning if the script crashes, so does everything in the app pool from which the script was executed.

So in short, can I write an ISAPI filter using Perl?

From stackoverflow

I think I just answered my own question:

http://docs.activestate.com/activeperl/5.10/Components/Windows/PerlISAPI.html

It looks like running it as an ISAPI extension is really my only choice.

Chris : Just as an addendum, my fears about running Perl as a DLL are confirmed in this statement from the above link: Extra care should be taken when you write PerlIS applications. It is difficult to crash the web server using CGI, but because the PerlIS DLL runs in the process space of the server, your web server is more susceptible to crashes and hangs caused by programming errors.

Chris : After further research, it appears PerlIS is not a way to write ISAPI filters in Perl. Rather, it's just another way to execute Perl scripts other than CGI. :( Oh well.

Linux command to replace string in LARGE file with another string

I have a huge SQL file that gets executed on the server. The dump is from my machine and in it there are a few settings relating to my machine. So basically, I want every occurrence of "c://temp" to be replaced by "//home//some//blah".

How can this be done from the command line?

From stackoverflow

sed is a good choice for large files.
```
sed -i.bak -e 's%C://temp%//home//some//blah%' large_file.sql
```
It is a good choice because it doesn't read the whole file at once to change it. Quoting the manual:

A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed's ability to filter text in a pipeline which particularly distinguishes it from other types of editors.

The relevant manual section is here. A small explanation follows

-i.bak enables in place editing leaving a backup copy with .bak extension

s%foo%bar% uses s, the substitution command, which substitutes matches of first string in between the % sign, 'foo', for the second string, 'bar'. It's usually written as s// but because your strings have plenty of slashes, it's more convenient to change them for something else so you avoid having to escape them.

Example
```
vinko@mithril:~$ sed -i.bak -e 's%C://temp%//home//some//blah%' a.txt
vinko@mithril:~$ more a.txt
//home//some//blah
D://temp
//home//some//blah
D://temp
vinko@mithril:~$ more a.txt.bak
C://temp
D://temp
C://temp
D://temp
```
dalloliogm : You can use a different character to avoid having to quote the slashes, for example sed -e "s%C://temp%/home//some//blah%". Also, the -i option allows you to save the file inplace, when you are sure of the options.

RD : This is the command I'm typing: sed -i.bak -e 's%C:\\temp\%/home/liveon/public_html/tmp' liveon.sql and this is the error I'm getting: sed: -e expression #1, char 41: unterminated `s' command Anyone?

Vinko Vrsalovic : You are missing the final %, the command is s%foo%bar%

Dave Jarvis : Also, RD, make sure to escape backslashes properly.

The sed command can do that. Rather than escaping the slashes, you can choose a different delimiter (_ in this case):
```
sed -e 's_c://temp/_/home//some//blah/_' file1.txt > file2.txt
```
dalloliogm : you missed the last underscore: "s_c://temp/_/home//some//blah_"

stefanw : thanks! It's now fixed.

Try sed? Something like:
```
sed 's/c:\/\/temp/\/\/home\/\/some\/\/blah/' mydump.sql > fixeddump.sql
```
Escaping all those slashes makes this look horrible though, here's a simpler example which changes foo to bar.
```
sed 's/foo/bar/' mydump.sql > fixeddump.sql
```
As others have noted, you can choose your own delimiter, which would prevent the leaning toothpick syndrome in this case:
```
sed 's|c://temp|//home//some//blah|' mydump.sql > fixeddump.sql
```
The clever thing about sed is that it operates on a stream rather than on a file all at once, so you can process huge files using only a modest amount of memory.
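
The streaming idea can be sketched in JavaScript with a generator. This is an illustration only, not how sed is implemented: `replaceInLines` is a hypothetical helper, and the small array stands in for a line-by-line file reader. Note that it replaces every occurrence on a line, which corresponds to adding the g flag to the sed commands above.

```javascript
// Lazily rewrites lines one at a time, so only the current line is held in memory.
function* replaceInLines(lines, search, replacement) {
  for (const line of lines) {
    // split/join replaces every literal occurrence without regex escaping.
    yield line.split(search).join(replacement);
  }
}

// Stand-in for a large file: any iterable of lines works here.
const input = ["C://temp", "D://temp", "C://temp and C://temp"];

const output = [];
for (const line of replaceInLines(input, "C://temp", "//home//some//blah")) {
  output.push(line);
  console.log(line);
}
```
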
Just for completeness. In place replacement using perl.
```
perl -i -p -e 's{c://temp}{//home//some//blah}g' mysql.dmp
```
No backslash escapes required either. ;)

Telemachus : Please note that if you use the `-i` flag without an extension, you get *no backup*. If you want a backup, try `-i.bak` which will do the in-place edit *and* give you a backup of the original as `original.bak`, pretty much for free.

jrockway : I let my version control system handle making the backups.

Telemachus : @Jrockway: that's lovely for you I'm sure, but it assumes that the files in question are under version control and that you know what -i.bak does and have chosen not to use it. I just wish people who recommend the -i switch would take two seconds to explain the difference between -i and -i.bak. It will really hurt if the files you play with are not under version control and you make a simple typo (e.g, forget the -p flag).

There's also a non-standard UNIX utility, rpl, which does the exact same thing that the sed examples do; however, I'm not sure whether rpl operates streamwise, so sed may be the better option here.

Vinko Vrsalovic : Heh, per chance, are you a friend of the developer of rpl? :-)

Meredith L. Patterson : Nope, never heard of the guy outside of the util; it came in handy for doing a batch-replace job on a few thousand text files once and I've kept it in my toolbox.

Telemachus : It would be worth saying *why* you recommend it in this case (or why you might, since you half take back the recommendation). That is, rather than just throw up the name of a utility, tell us what you liked about it, please.

Tyler McHenry : rpl is nice for simple replacements because it has a much more user-friendly syntax than the combination of sed and find that it replaces. It also has a neat dry-run feature where it will tell you what it would replace without actually doing the replacement. Its main limitation is that it only does straight replacements and no regular expressions.

Meredith L. Patterson : @Telemachus - Tyler nailed it.
```
perl -pi -e 's#c://temp#//home//some//blah#g' yourfilename
```
-p wraps the script in a loop: Perl reads the specified file line by line, runs the search and replace on each line, and prints the result.

-i tells Perl to edit the file in place. It should be used in conjunction with the -p flag.

-e just means "execute this Perl code".

Good luck

gawk

awk '{gsub("c://temp","//home//some//blah")}1' file

Opening popup links in UIWebView, possible?

Hey guys,

I have a UIWebView which I'm using as an embedded browser within my app.

I've noticed that links in webpages that open new windows are ignored without any call into my code.

I've tried breakpointing on

- (BOOL)webView:(UIWebView *)webView shouldStartLoadWithRequest:(NSURLRequest *)request navigationType:(UIWebViewNavigationType)navigationType

and then selecting a link that would open a popup window, and the breakpoint is never hit. Is there anything I can do to intercept that selection of the popup link and get the URL and just load it normally?

I'm not interested in displaying a popup window in the app itself, I just want the URL of whatever is going to be loaded in the popup window to load in the main webview itself.

Is this possible?

Thanks!

From stackoverflow

Yes this is totally possible, and is the entire point of the shouldStartLoadWithRequest: delegate method.

The core issue here is to figure out why you're not hitting your breakpoint in that method. My gut instinct would be that you haven't properly set the UIWebView's delegate property to the object implementing this method.

Jasarien : Thanks for the suggestion, but the delegate is set. I hit the breakpoint if normal links are clicked. It would seem that internally, the UIWebView class purposefully ignores links that will open in a new window.
So after a small amount of research, it's clear that the UIWebView class purposefully ignores links that will open in a new window (either by using the 'target' attribute on the a tag or by using javascript in the onClick event).

The only solutions I have found are to manipulate the html of a page using javascript. While this works for some cases, it's not bulletproof. Here are some examples:
```
var links = document.getElementsByTagName('a');
for (var i = 0; i < links.length; i++)
{
    links[i].target='_self';
}
```
This will change all links that use the 'target' attribute to point at _self instead of _blank or _new. This will probably work across the board and not present any problems.

The other snippet I found followed the same idea, but with the onClick event:
```
var links = document.getElementsByTagName('a');
for (var i = 0; i < links.length; i++)
{
    links[i].onclick='';
}
```
This one is just plain nasty. It'll only work if the link tag has its href attribute correctly set, and only if the onclick event is used to open the new window (using window.open() or something similar). The reasons why it is nasty shouldn't need explaining, but one example would be if the onClick is used for anything other than opening a window, which is a very common case.

I guess one could go further with this and start doing some string matching with the onClick method, and check for window.open(), but again, this is really far from ideal.
I ran into this as well, and HTML rewriting was the best solution I could come up with. The biggest issue that I ran into with that approach is that the web browser is interactive for up to a couple of seconds until the webViewDidFinishLoad: method is called, so the links seem to be broken for a few seconds until they're rewritten.

There are three areas that I rewrote: links, form posts, and calls to window.open().

I used a similar approach to the first code snippet in Jasarien's answer to overwrite the target for links and forms by iterating over tags and forms. To override window.open, I used code similar to the following:
```
var oldWindowOpen = window.open;
window.open = function(url, sName, sFeatures, bReplace) {
  return oldWindowOpen(url, '_self');
};
```
Here's how I get twitter links to work (i.e. link to pages that try to open with new windows):

```
-(BOOL)webView:(UIWebView *)mainWebView shouldStartLoadWithRequest:(NSURLRequest *)request navigationType:(UIWebViewNavigationType)navigationType {
    if (navigationType == UIWebViewNavigationTypeLinkClicked) {
        //Allows for twitter links
        [self.mainWebView loadRequest:request];
        return NO;
    }

    return YES;
}
```

Avoiding casts when translating public APIs to internal glue code

So, I have this public API that my application exposes, allowing customers to write plug-ins. For the sake of this example, let's say it's a fairly simple key-value pair system, something like:

public interface Key {
  // marker interface, guaranteed unique in some scope
}

public interface KVPService {
  Set<Key> getKeys();
  Object getValue(Key k); // must be a key provided by getKeys() or it blows up
}

Now let's say that internally, the Key has some attributes that shouldn't be exposed -- say a database ID. What I'm currently doing is something like this:

/** Internal impl of external Key */    
class InternalKey implements Key {
  int getDatabaseId() {
    // do something...
  }
}

/** Internal impl of external KVPService */    
class InternalKVPService implements KVPService {
  // ...

  public Object getValue(Key k) {
    InternalKey ik = (InternalKey) k;
    return getValueFromDBOrWherever(ik.getDatabaseId());
  }

  // ...
}

This seems less than ideal. Is there some way I can rearrange the responsibilities to avoid the cast and still maintain the inside/outside encapsulation?

Note that in the real world it's a fair bit more complicated than this; the Key equivalents have a fair bit of metadata attached and there's a number of things someone might want to do with one other than get a simple value, so just exposing, say, Map.Entry-like objects instead won't necessarily solve the problem.

The only thing I've been able to come up with is to separate the internal and external keys entirely, and keep around a Map<Key, InternalKey>. But in that case I'd have to either copy the metadata, which violates DRY, or have the external Key delegate to InternalKey, in which case the Map violates DRY.

Can anyone think of something more clever?

From stackoverflow

I don't think so. But why does the typecast worry you? The potential ClassCastException won't happen in practice (assuming that I understand your use-case), the cast will only take 5 or so machine instructions, and it is hidden from callers of your KVPService API.

David Moles : I think mainly I don't like that I can't prove at compile-time that a `Key` is an `InternalKey`. But given that I'm getting `Key`s from the external caller, maybe it's inherently unreasonable to want that.

Robin : If you want the compile time checking, you are going to have to either use classes or generics on your interfaces. Assuming you can change your public API.

Stephen C : @Robin: even with generics, the JVM often has to do a type-cast underneath the covers.
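A minimal sketch of what Robin's generics suggestion could look like, assuming the public API may be changed so that `KVPService` takes the key type as a parameter. The `PluginExample` class, the constructor arguments, and the `"value-"` strings are hypothetical placeholders, not part of the original question. As Stephen C points out, the cast does not disappear at runtime (the compiler emits a bridge method that still casts), but the hand-written cast is gone and the compiler checks the key type.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Public API: the key type becomes a type parameter of the service.
interface Key {
}

interface KVPService<K extends Key> {
    Set<K> getKeys();
    Object getValue(K k);
}

// Internal implementation (the constructor argument is hypothetical).
final class InternalKey implements Key {
    private final int databaseId;

    InternalKey(int databaseId) {
        this.databaseId = databaseId;
    }

    int getDatabaseId() {
        return databaseId;
    }
}

final class InternalKVPService implements KVPService<InternalKey> {
    private final Set<InternalKey> keys = new LinkedHashSet<>();

    InternalKVPService() {
        keys.add(new InternalKey(1));
        keys.add(new InternalKey(2));
    }

    @Override
    public Set<InternalKey> getKeys() {
        return keys;
    }

    @Override
    public Object getValue(InternalKey k) {
        // No hand-written cast: the compiler knows k is an InternalKey.
        return "value-" + k.getDatabaseId();
    }
}

// Plug-in code stays generic and never names InternalKey.
final class PluginExample {
    static <K extends Key> List<Object> readAll(KVPService<K> service) {
        List<Object> values = new ArrayList<>();
        for (K key : service.getKeys()) {
            values.add(service.getValue(key));
        }
        return values;
    }
}
```

The trade-off is that every plug-in method touching keys has to carry the type parameter (or a wildcard), which may be more intrusive than the single cast it replaces.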
One approach I've seen is to expose a common interface for all objects (in this case keys) and provide a base implementation that simply throws an UnsupportedOperationException (or does nothing) for each method. Subclass implementations then override the method(s) they actually support. Granted, it's not very OO-like, but you'll find some examples in the JDK API (e.g. Iterator's remove() method).
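As a rough sketch of that base-implementation idea, assuming `getDatabaseId()` is moved up into the public `Key` interface (which the question wanted to avoid), it might look like this. `AbstractKey` and `PluginKey` are hypothetical names introduced only for illustration.

```java
// Common interface: every operation is declared here, supported or not.
interface Key {
    int getDatabaseId();
}

// Base implementation: each operation is unsupported by default.
abstract class AbstractKey implements Key {
    @Override
    public int getDatabaseId() {
        throw new UnsupportedOperationException(
                getClass().getSimpleName() + " has no database id");
    }
}

// Internal key overrides the operation it actually supports.
final class InternalKey extends AbstractKey {
    private final int databaseId;

    InternalKey(int databaseId) {
        this.databaseId = databaseId;
    }

    @Override
    public int getDatabaseId() {
        return databaseId;
    }
}

// Hypothetical key from elsewhere: inherits the throwing default.
final class PluginKey extends AbstractKey {
}
```

The service can then call `k.getDatabaseId()` without a cast, at the price of a runtime `UnsupportedOperationException` instead of a `ClassCastException` when the wrong kind of key is passed in.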

Another option: You could use the visitor pattern to have each object perform functionality without downcasting; e.g.
```
public interface KeyVisitor {
  void visitInternalKey(InternalKey k);
  void visitFooBarKey(FooBarKey k);
}

public interface Key {
  void visit(KeyVisitor kv);
}

public class InternalKey implements Key {
  public void visit(KeyVisitor kv) {
    kv.visitInternalKey(this);
  }
}
```
The disadvantage here is that you would have to expose the methods on InternalKey (at least via an interface) to allow your visitor implementations to call them. However, you could still keep the implementation detail at the package level.
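To make the visitor idea concrete for the `getValue` case, here is a small self-contained sketch. It keeps the `visit`/`visitInternalKey` names from the snippet above and drops `FooBarKey` for brevity; the constructor argument and the `"value-"` string are placeholders standing in for the real database lookup.

```java
interface KeyVisitor {
    void visitInternalKey(InternalKey k);
}

interface Key {
    void visit(KeyVisitor kv);
}

final class InternalKey implements Key {
    private final int databaseId;

    InternalKey(int databaseId) {
        this.databaseId = databaseId;
    }

    int getDatabaseId() {
        return databaseId;
    }

    @Override
    public void visit(KeyVisitor kv) {
        // Double dispatch: the key itself tells the visitor its concrete type.
        kv.visitInternalKey(this);
    }
}

final class InternalKVPService {
    public Object getValue(Key k) {
        // One-element array so the anonymous class can hand back a result.
        final Object[] result = new Object[1];
        k.visit(new KeyVisitor() {
            @Override
            public void visitInternalKey(InternalKey ik) {
                // Placeholder for the real lookup by database id.
                result[0] = "value-" + ik.getDatabaseId();
            }
        });
        return result[0];
    }
}
```

Note that `KeyVisitor` mentions `InternalKey` in its signature, so at least the type name leaks into whatever package `Key` lives in, which is the disadvantage described above.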

David Moles : There's enough different things I need to do to `Key` that I'm not sure a straight Visitor pattern will work, but that at least puts the responsibilities in the right place. Good thinking.

Adamski : Thanks. Only thing to bear in mind: If you over-use Visitor it can make code fairly unreadable - Sometimes a switch / if-then statement is actually easier to maintain (and probably faster).

How to find the IWebBrowser2 pointer for an IE8 window given a PID?

Hi,

So far, I've successfully used the following function to retrieve the IWebBrowser2 pointer to a running Internet Explorer instance, given its PID.

static SHDocVw::IWebBrowser2Ptr findBrowserByPID( DWORD pid )
{
    SHDocVw::IShellWindowsPtr ptr;
    ptr.CreateInstance(__uuidof(SHDocVw::ShellWindows));
    if ( ptr == NULL ) {
        return 0;
    }

    // number of shell windows
    const long nCount = ptr->GetCount();

    // iterate over all shell windows
    for (long i = 0; i < nCount; ++i) {
        // get interface to item no i
        _variant_t va(i, VT_I4);
        IDispatchPtr spDisp = ptr->Item(va);

        SHDocVw::IWebBrowser2Ptr spBrowser(spDisp);
        if (spBrowser != NULL) {
            // if there's a document we know this is an IE object
            // rather than a Windows Explorer instance
            HWND browserWindow;
            try {
                browserWindow = (HWND)spBrowser->GetHWND();
            } catch ( const _com_error &e ) {
                // in case ->GetHWND() fails
                continue;
            }

            DWORD browserPID;
            GetWindowThreadProcessId( browserWindow, &browserPID );
            if ( browserPID == pid ) {
                return spBrowser;
            }
        }
    }
    return 0;
}

What I do is to launch an explorer.exe process via CreateProcess and then use the above function to retrieve the IWebBrowser2Ptr to it (so that I can fiddle with the browser).

Unfortunately, this doesn't seem to work with Internet Explorer 8 anymore, since IE8 seems to reuse processes - at least to some degree. For two code sequences like:

PROCESS_INFORMATION pi;
// ...

if ( CreateProcess( ..., &pi ) ) {
    // Wait a bit to give the browser a chance to show its window
    // ...

    IWebBrowser2 *pWebBrowser = findBrowserByPID( pi.dwProcessId );
}

The first run of this code works fine, the second one never manages to retrieve the pWebBrowser window.

After a bit of debugging, it was revealed that the findBrowserByPID function does find lots of browser windows (and it finds more after starting a second browser instance), but none of them belong to the newly started process. It seems that all windows belong to the first IE process which was started.

Does anybody know an alternative way to get the IWebBrowser2 pointer to some IE8 instance? Or is there maybe a way to disable this apparent 'reuse' of processes with IE8?

From stackoverflow

If you're launching the IE process yourself, don't use CreateProcess; instead, use CoCreateInstance. That will return an object which you can query for IWebBrowser2 and use at will. The one complexity is that if the navigation crosses integrity levels (Vista+), the pointer becomes invalid. To address that problem, sink the NewProcess event, which will allow you to detect this condition.

See some more info here: http://msdn.microsoft.com/en-us/library/aa752084%28VS.85%29.aspx

Frerich Raabe : This used to be what I did until I noticed that CoCreateInstance didn't necessarily create a new process. I'm automating multiple IE windows to simulate multiple user logins to a web page. In order to avoid sharing login data and whatnot, it turned out to be necessary that I explicitly create a new process each time using CreateProcess.

Frerich Raabe : Accepting this answer since I cannot accept the comment you wrote to my original question. :-) The term 'LCIE' was the key, I googled for it and found that you can disable this feature using a registry key. I acknowledge that this might not work in future versions, but at least it makes the above code work again, so now I have a workaround. Thanks!

EricLaw -MSFT- : CreateProcess alone won't help unless LCIE is disabled, or you pass the -nomerge command line parameter. Which might be enough, even if you don't disable LCIE. :-)
A couple of alternative approaches might be:
- Get a reference via a HWND
- Use a library like WatiN which might help you do whatever your real end goal is if you're trying to automate IE.

WPF, Image MouseDown Event

I have a control with a mouse down event where I'd like to change the Image when the image is clicked. But I can't seem to alter ANY of the image's properties in the event.

Event

    private void Image_MouseDown(object sender, MouseButtonEventArgs e)
    {
        BitmapImage bitImg = new BitmapImage();
        bitImg.BeginInit();
        bitImg.UriSource = new Uri("./Resource/Images/Bar1.png", UriKind.Relative);
        bitImg.EndInit();

        ((Image)sender).Source = null;
        ((Image)sender).Width = 100;
        ((Image)sender).Visibility = Visibility.Hidden;
    }

The event does fire, but even setting the .Visibility property does not alter the image or hide it.

What am I doing wrong?

From stackoverflow

Assuming the file is in your application, you need to use the Pack URI scheme:
```
        var img = sender as Image;
        BitmapImage bmp = new BitmapImage(new Uri("pack://application:,,,/Resources/Images/Bar1.png"));
        img.Source = bmp;
```
In the above example, this would indicate a subfolder in your project of Resources/Images.

PrimeTSS : Hmm, it still doesn't change. I have Bar1.png in a folder, /Images/Bar1.png, and have its property set as a Resource. If I deliberately misspell the image name as barx.png, an exception is thrown saying it can't locate it, so I know it's finding it in the resources. It just does not actually update the image to the new one.

Joel Cochran : Looking again at your code, you are setting the source to null. I don't see where you are applying the BitmapImage to the Source?

PrimeTSS : APOLOGIES! I've found out why. I have two templates on this control; one is a "Selected" template, and I didn't set the mouse down event on that template. Even though the event fired, I think the selected template overwrote the non-selected template, which possibly changed the bitmap but didn't live long enough to display it. Thank you!

PrimeTSS : The null was just for debugging, to see if it changed! Thank you for your help! You have helped me find the issue and wake up to the fact that the problem was with code elsewhere which I didn't post. Thank you!

PrimeTSS : I still don't get the different forms of uri("........ I wish there was a good white paper on it so I could understand it better.

Joel Cochran : The link in my post above is pretty decent, it outlines many different options.

Which HTML tags are supported in Swing components?

Many Swing components support embedded HTML, but I cannot find any official documentation on that subject. (Everything on Sun's pages about HTML seems to be targeted at JEditorPane)

So: Which HTML tags are supported in Swing components?

EDIT: Although I said that I'm missing "official documentation", I'd also like any "unofficial" documentation.

From stackoverflow

I don't know exactly what tags are supported, but I would suggest that you restrict yourself to bold/italics (or even better strong/em assuming it supports them) and img tags. Anything else is likely to cause headaches, and probably means you're stuffing too much into that component.

Software Monkey : This is really a non-answer.

Draemon : I don't think so - I'm saying that although it probably "supports" more - it's unlikely to support it well, and I've made a suggestion as to what the real issue might be.

instanceofTom : I think this is a valid answer, I found this question through searching because I had the same question, and I think Draemon brings up a good point- "Anything else... probably means you're stuffing too much into that component."
I believe it's a narrow subset of HTML 3.x, although off the top of my head, I don't remember where I read that.

Software Monkey : The Swing text component supports HTML 3.2 (Wilbur) and a reasonable subset of CSS 1.0.
As with most things with Swing, the best course of action is to look at the source.
This guy feels your pain and is at least starting to collect his experiences:

http://retrovirus.com/brunch/2003/04/html-support-in-jeditorpane/
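In the spirit of "look at the source", one way to get a concrete list is to ask Swing's own HTML package which tags its default reader recognizes. The sketch below uses `javax.swing.text.html.HTML.getAllTags()` and `HTML.getTag(String)`; the `SwingHtmlTags` class and its method names are made up for this example. Keep in mind that "recognized by the parser" is weaker than "rendered well by every component", so treat the output as a starting point rather than a guarantee.

```java
import java.util.Set;
import java.util.TreeSet;
import javax.swing.text.html.HTML;

final class SwingHtmlTags {

    // Names of all tags known to Swing's default HTML reader, sorted.
    static Set<String> recognizedTags() {
        Set<String> names = new TreeSet<>();
        for (HTML.Tag tag : HTML.getAllTags()) {
            names.add(tag.toString());
        }
        return names;
    }

    // True if Swing has a tag constant for the given lower-case name.
    static boolean isRecognized(String lowerCaseName) {
        return HTML.getTag(lowerCaseName) != null;
    }

    public static void main(String[] args) {
        System.out.println(recognizedTags());
    }
}
```

Running `main` prints the full set on your JDK; `isRecognized` is handy for checking a single tag before relying on it in a label or tooltip.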
