choose2005
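How to extract the text displayed on a web page from its HTML source
I want to extract everything that is visible in the IE browser, but without any markup such as "<></>....".
For example: <table border="0" cellspacing="3" cellpadding="1">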
<tr><td>[color=Red]Mapped EST Accession[/color][color=Red]:[/color]</td><td><b>BE399426</b>     [<a href=http://www.graingenes.org/cgi-bin/WebAce/webace?db=graingenes&class=Probe&object=BE399426>GrainGenes</a>   |  <a href=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&db=nucest&term=BE399426>NCBI</a>   |  <a href=http://wheat.pw.usda.gov/cgi-bin/westsql/est_blast.cgi?q=BE399426&t=a>[color=Red]wEST-SQL[/color]</a>]     <font color="red">[color=Red]Sequence Tagged Site in Relevant Diploid[/color]</font></td></tr>
<tr><td>Orthologous Loci by Contig:</td><td><a href="/snpworld/Search?contigName=NSFT03P2_Contig14337&chromosome=2&genome=D">[color=Red]NSFT03P2_Contig14337[/color]</a></td></tr>
<tr><td>Bin:</td><td><a href="/snpworld/Search?bin=2DL3-0.49-0.76">[color=Red]2DL3-0.49-0.76[/color]</a></td></tr>
<tr><td>Forward Primer Name:</td><td><a href="/snpworld/Search?primer=BE399426_cpF1&chromosome=2&genome=D">[color=Red]BE399426_cpF1[/color]</a></td></tr>
<tr><td>Reverse Primer Name:</td><td><a href="/snpworld/Search?primer=BE399426_cpR1&chromosome=2&genome=D">[color=Red]BE399426_cpR1[/color]</a></td></tr>
<tr><td>[color=Red]Chromosome/Genome:[/color]</td><td>2D</td></tr>
<tr><td>[color=Red]Ref Plant:[/color]</td><td>Ae. tauschii, Armenia (At01, D) </td></tr>
<tr><td valign="middle">[color=Red]Ref Sequence[/color][color=Red]:[/color]</td><td><pre> [color=Red] 10 20 30 40 50
TTTGGAAATATCCTGTTACTGCTGCTGATGCATTCTTATTTTTTTTTCAT
GTATGATCTCCAGGCTGTTCGAGTTGGGGACTTAGAAGTGTTTAGAGCTG
TTGCAGAGAAATTTGGGAGCACTTTCAGTGCCGACAGGACATCCAATTTG
ATCGTGAGGCTGCGCCACAACGTCATCCGGACCGGACTACGCAACATTAG
CATTTCCTACTCACGTATCTCCCTTGCTGACATTGCCAAGAAACTGAGGC
TAGATACTAAGACCGCTGTTGCTGATGCTGAGAGCATTGTAGCCAAGGCC
ATCAGAGATGGGGCAATTGATGCCACCATTGATCATGCCAATGGCTGGGT
GGTGTCGAAAGAGACTGGCGACGTTTACTCAACAAACGAGCCACAGGCTG
CGTTTAACTCCAGGATTGCGTTCTGCCTGAACATGCACAACGAGGCAGTC[/color]AAGGCTCTGAGGTTCCCCCCGAATTCTCACAAGGAAAA [488 bases] </pre></td></tr>
<tr><td>[color=Red]Exon Ranges:[/color]</td><td>[color=Red]64-488[/color]</td></tr>
<tr><td>[color=Red]Intron Ranges[/color][color=Red]:[/color]</td><td>[color=Red]1-63[/color]</td></tr>
<tr><td>[color=Red]Lab[/color][color=Red]:[/color]</td><td>[color=Red]UCD[/color]</td></tr>
</table>
That is, extract the parts shown in red.
junonly
I think this script can basically do the job; you can give it a try.
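[code]
#!/usr/bin/perl
use strict;
use warnings;

# Read the saved page source from c.txt and write the visible text to d.txt.
open(my $in,  '<', 'd:/c.txt') or die "Cannot open d:/c.txt: $!";
open(my $out, '>', 'd:/d.txt') or die "Cannot open d:/d.txt: $!";
my $text = '';
while (my $line = <$in>) {
    # Drop a newline that directly follows a word character, joining wrapped lines.
    $line =~ s/(\w+?)(\n)/$1/g;
    $text .= $line;
}
$text =~ s/<.+?>//g;     # strip tags such as <td> and </a>
# Assumed: the pattern here was "&nbsp;", which the forum rendered as a blank.
$text =~ s/&nbsp;//g;    # strip non-breaking-space entities
$text =~ s/\t+/\t/g;     # collapse each run of tabs into a single tab
print $out $text;
close($in);
close($out);[/code]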
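To try the idea without creating files first, the same kind of substitutions can be run on a string in memory. This is only a rough sketch: the helper name strip_tags and the sample row are made up for illustration, and putting a tab between table cells is my own assumption about the output you want. Regex stripping like this is fine for simple, well-formed tables, but it will not cope with comments, scripts, or tags that span several lines.

[code]
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): strips markup from a string.
sub strip_tags {
    my ($html) = @_;
    $html =~ s{</td>\s*<td[^>]*>}{\t}gi;  # put a tab between adjacent table cells
    $html =~ s/<.+?>//g;                  # remove the remaining tags
    $html =~ s/&nbsp;/ /g;                # turn non-breaking-space entities into spaces
    $html =~ s/ {2,}/ /g;                 # collapse runs of spaces into one
    return $html;
}

# Made-up sample row modelled on the table in the question.
my $row = '<tr><td>Bin:</td><td><a href="/snpworld/Search?bin=2DL3-0.49-0.76">2DL3-0.49-0.76</a></td></tr>';
print strip_tags($row), "\n";   # prints "Bin:", a tab, then "2DL3-0.49-0.76"
[/code]

If the real pages are messier than this sample, a proper HTML parser is probably the safer route.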