Topic: Separate Out STOCK Symbols From Large Text File

nkormanik

  • Participant
  • Joined in 2010
  • *
  • Posts: 552
Separate Out STOCK Symbols From Large Text File
« on: August 27, 2017, 02:01 AM »
Assume:  Stock symbols are composed of CAPITAL LETTERS.

What might be an efficient way of extracting out all stock symbols from a large text file, with stock symbols appearing here and there throughout?

Thanks much!

Nicholas Kormanik


skwire

  • Global Moderator
  • Joined in 2005
  • *****
  • Posts: 5,286
Re: Separate Out STOCK Symbols From Large Text File
« Reply #1 on: August 27, 2017, 02:04 AM »
Can you provide us with the text file?  Or a sample thereof?

nkormanik

Re: Separate Out STOCK Symbols From Large Text File
« Reply #2 on: August 27, 2017, 02:10 AM »
Here is an example of part of such a text file.  I am not expecting perfection, mind you.  Some further editing will likely be required.  Again, let's assume that NOTHING other than caps will be allowed, so excluding parentheses, colons, commas, numbers, etc.:

(FWONA), Facebook (FB), Universal Display (OLED), Apple (AAPL), SBA Communications (SBAC), and Abiomed (ABMD

The daily volume for an average stock on the S&P 500 is 4 million shares, while the median is slightly more than 2 million. Bank of America Corp’s BAC, -0.29% average daily volume is about 87 million shares, followed by Apple Inc’s AAPL, +0.37% at roughly 54 million. By comparison less than 300,000 shares of Dun & Bradstreet Corp. DNB, +0.28%  changed hands in the average day.
Other standouts include energy, airline and biotechnology stocks - small and large. Since the start of the year, on average, about 20 million shares of Chesapeake Energy CHK, -1.81% changed hands daily, (The stock lost more than half its value since the start of the year.) Halliburton Co. HAL, +0.57% Exxon Mobil XOM, +0.51% Transocean Ltd. RIG, +3.51% Delta Air Lines DAL, +3.25% American Airlines Group AAL, +5.43% General Motors GM, +0.23% and Ford Motors F, +1.03% saw people churn their stocks.

Symbol Company Last Sale* Change Net / % Share Volume
AAPL
Apple Inc. Apple Inc. $ 159.86 0.59 ? 0.37% 24,783,937
QQQ
PowerShares QQQ Trust, Series 1 $ 141.97 0.30 ? 0.21% 24,236,348
TVIX
VelocityShares Daily 2x VIX Short-Term ETN $ 16.90 1 ? 5.59% 22,703,370
CSCO
Cisco Systems, Inc. Cisco Systems, Inc. $ 31.44 0.20 ? 0.64% 18,206,334
JD
JD.com, Inc. JD.com, Inc. $ 40.87 1.20 ? 2.85% 15,610,100
MOMO
Momo Inc. $ 35.59 2.39 ? 6.29% 15,112,039
BRCD


« Last Edit: August 27, 2017, 02:17 AM by skwire »

nkormanik

Re: Separate Out STOCK Symbols From Large Text File
« Reply #3 on: August 27, 2017, 02:18 AM »
If someone here wants to actually create a program to do the above, perhaps it might be called:  Stock Symbols Only.

(Skwire's Stock Symbols Only??)

I think what I am asking for can be accomplished with RegEx, but not sure.

A little Windows program, even a command-line program, asking for the text file input, and spitting out results to a separate file would be perfect.  Prefer symbols to be one per line, but not necessary, and no need to sort.


skwire

Re: Separate Out STOCK Symbols From Large Text File
« Reply #4 on: August 27, 2017, 02:20 AM »
So, from that example, you just want a list like this?:

FWONA
FB
OLED
AAPL
SBAC
ABMD
BAC
DNB
CHK
HAL
XOM
RIG
DAL
AAL
GM
F
QQQ
TVIX
ETN
CSCO
JD
MOMO
BRCD

nkormanik

Re: Separate Out STOCK Symbols From Large Text File
« Reply #5 on: August 27, 2017, 02:24 AM »
Wow.  Ohhh my goodness.  How'd you do that??!!

Exactly!

skwire

Re: Separate Out STOCK Symbols From Large Text File
« Reply #6 on: August 27, 2017, 02:30 AM »
I just created that manually as an example output.

skwire

Re: Separate Out STOCK Symbols From Large Text File
« Reply #7 on: August 27, 2017, 02:54 AM »
As you mentioned in your first post, this sort of thing is a bit inexact and might require a bit of manual cleanup.  However, please try the attached EXE as it should be pretty close to what you want.
« Last Edit: August 27, 2017, 09:25 AM by skwire »

nkormanik

Re: Separate Out STOCK Symbols From Large Text File
« Reply #8 on: August 27, 2017, 04:13 AM »
I placed your sso.exe and a text file, test.txt, into C:\2, ran the program, and dragged test.txt onto the box that appeared.  I named the output file symbols.txt:

Error!
There was a problem saving the output file.  Perhaps it's open in another application.

I made a number of other attempts, all with the same error message.

Shades

  • Member
  • Joined in 2006
  • **
  • Posts: 2,922
Re: Separate Out STOCK Symbols From Large Text File
« Reply #9 on: August 27, 2017, 05:34 AM »
Are you sure that Windows allows you to write to that folder?  Try running the exe with the 'Run as administrator' option.

If you don't want to do that, you'd better place the executable in your user folder and try again.

nkormanik

Re: Separate Out STOCK Symbols From Large Text File
« Reply #10 on: August 27, 2017, 09:09 AM »
Same error.  Something is apparently wrong with my system.  Bummer.

skwire

Re: Separate Out STOCK Symbols From Large Text File
« Reply #11 on: August 27, 2017, 09:25 AM »
Quote from nkormanik: "Same error.  Something is apparently wrong with my system.  Bummer."

Nah, it was my fault.  Apologies.   :-[  Please redownload and try again.

nkormanik

Re: Separate Out STOCK Symbols From Large Text File
« Reply #12 on: August 27, 2017, 05:38 PM »
Beautiful!  Works exactly as I was hoping!

Hope others out there might find and use this as well.

I'm renaming my personal copy: Skwire's Stock Symbols Only.

Thanks a million, Skwire.

4wd

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 5,641
Re: Separate Out STOCK Symbols From Large Text File
« Reply #13 on: August 28, 2017, 12:19 AM »
@skwire: What regex did you use, (if you did)?

I had a quick look at making a single-line PoSh command, but my brain wasn't up to the nuances of stopping it from grabbing every uppercase letter, regardless of what followed :-\

Ath

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 3,612
Re: Separate Out STOCK Symbols From Large Text File
« Reply #14 on: August 28, 2017, 01:32 AM »
Well, I tried a regex too, but I couldn't fix one issue.
My regex:
\b[A-Z]\b
But it keeps grabbing both the S and the P from S&P, and I didn't want to replace the \b with a (rather complex) group construction of all possible punctuation to solve this, so I stopped searching.
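
The S&P problem can be reproduced in a few lines. The sketch below is in Python rather than PowerShell and assumes the pattern is meant to match whole runs of capitals, i.e. `\b[A-Z]+\b` (the `+` is an assumption, as is the sample string). It also shows one possible alternative to listing all punctuation: lookarounds that reject a run of capitals touching a letter or an ampersand. This is an illustration only, not something anyone in the thread confirmed using.

```python
import re

# Hypothetical sample line, modelled on the text posted earlier in the thread.
sample = "on the S&P 500 is 4 million. Apple Inc. AAPL, +0.37% and Ford Motors F, +1.03%"

# '&' is a non-word character, so \b sees a word boundary on both sides of it.
# That is why S and P are each reported as a separate "symbol".
naive = re.findall(r"\b[A-Z]+\b", sample)

# Assumed refinement: a run of capitals must not be preceded or followed
# by a letter or by '&'.
strict = re.findall(r"(?<![A-Za-z&])[A-Z]+(?![A-Za-z&])", sample)

print(naive)   # ['S', 'P', 'AAPL', 'F']
print(strict)  # ['AAPL', 'F']
```

The lookaround version only handles the ampersand case; other joiners (hyphens, slashes) would need to be added to the character classes by hand.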

skwire

Re: Separate Out STOCK Symbols From Large Text File
« Reply #15 on: August 28, 2017, 11:14 AM »
Quote from 4wd: "@skwire: What regex did you use, (if you did)?"

What ended up working the best was:

1. Loop through each line.
2. Crack each line on its spaces.
3. Evaluate each part with: [A-Z]+\b

It's not quite perfect, and I'm not sure it ever could be, but it seems to be close enough for the OP's work.
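
The three steps above can be sketched roughly as follows. This is a Python approximation, not the code inside the EXE: the function name `extract_candidates`, the sample text, and the choice of an unanchored search that keeps only the first hit per part are all assumptions.

```python
import re

# Pattern quoted in the post above.
SYMBOL = re.compile(r"[A-Z]+\b")

def extract_candidates(text):
    """Loop over lines, split each line on spaces, test each part."""
    found = []
    for line in text.splitlines():        # step 1: each line
        for part in line.split(" "):      # step 2: crack on spaces
            match = SYMBOL.search(part)   # step 3 (assumed: first hit only)
            if match:
                found.append(match.group())
    return found

# Hypothetical sample built from fragments of the posted text.
sample = "(FWONA), Facebook (FB), Apple (AAPL)\nJD\nJD.com, Inc. $ 40.87"
print(extract_candidates(sample))
# ['FWONA', 'FB', 'AAPL', 'JD', 'JD']
```

Words such as "Facebook" and "Apple" are skipped because the capital is followed directly by a lowercase letter, so `\b` never matches after it. Duplicates (the two JD hits here) still need a separate pass.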

4wd

Re: Separate Out STOCK Symbols From Large Text File
« Reply #16 on: August 28, 2017, 08:23 PM »
I was trying to end up with something along the lines of:

Code: PowerShell
(Get-Content -Path .\test.txt) -creplace "\b[^A-Z]+\b" " " | Set-Content -Path out.txt

Would have left spaces between the remaining terms to make output formatting easier ... wasn't getting the RegEx though.

That was the theory anyway  ;D

Now I think about it I could have split the input easily:
Code: PowerShell
# Split on whitespace, commas, periods and parentheses; print the parts made up only of capitals.
[regex]::Split((Get-Content -Path .\test.txt),'[\s,\.\(\)]') | %{ if($_ -cmatch('^[A-Z]+$')){Write-Host $_}}

Output is OK except for outputting SBA and SBAC and duplicated entries.

@skwire: How did you stop SBA being passed through since SBAC is the stock code, (considering codes occur both inside and outside of surrounding brackets)?

You must be checking for duplicated entries also since there are multiple occurrences of JD and QQQ.

Was something to blow a few cobwebs out of my head anyway  :P

Addendum:
Duplicates removed courtesy of http://www.secretgeek.net/ps_duplicates
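Code: PowerShell
# Same split and filter, but each symbol is emitted only the first time it is seen ($hash tracks what has been output).
$hash = @{};[regex]::Split((Get-Content -Path .\test.txt),'[\s,\.\(\)]') | %{if($_ -cmatch('^[A-Z]+$')){if($hash.$_ -eq $null) { $_ }; $hash.$_ = 1}} > .\output.txt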
Output
FWONA
FB
OLED
AAPL
SBA
SBAC
ABMD
BAC
DNB
CHK
HAL
XOM
RIG
DAL
AAL
GM
F
QQQ
TVIX
VIX
ETN
CSCO
JD
MOMO
BRCD
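
For readers who don't use PowerShell, the same split, filter, and first-seen deduplication can be approximated in Python. This is a sketch under assumptions: the function name `unique_symbols` and the sample text are made up for illustration, and a dict stands in for the PowerShell hashtable.

```python
import re

def unique_symbols(text):
    """Split on whitespace and , . ( ) then keep all-capital parts, first occurrence only."""
    seen = {}  # dicts preserve insertion order, so first-seen order is kept
    for part in re.split(r"[\s,.()]", text):
        if re.fullmatch(r"[A-Z]+", part):
            seen.setdefault(part, True)
    return list(seen)

# Hypothetical sample built from fragments of the posted text.
sample = "SBA Communications (SBAC), JD\nJD.com, Inc. QQQ\nPowerShares QQQ Trust"
print(unique_symbols(sample))
# ['SBA', 'SBAC', 'JD', 'QQQ']
```

As in the output above, SBA still passes through: it is an all-capital word in its own right, so no pattern based on letter case alone can tell it apart from a symbol.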

« Last Edit: August 29, 2017, 05:36 AM by 4wd »

skwire

Re: Separate Out STOCK Symbols From Large Text File
« Reply #17 on: August 29, 2017, 02:24 PM »
Quote from 4wd: "You must be checking for duplicated entries also since there are multiple occurrences of JD and QQQ."

Yes, at the end of the list processing, I do an alpha sort and filter out duplicates.
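
A minimal Python sketch of that final step, assuming the extracted symbols are already in a list (the function name and the input list are hypothetical; this is not the code used in the EXE):

```python
def sort_and_dedupe(symbols):
    """Drop duplicates, then sort alphabetically."""
    return sorted(set(symbols))

# Hypothetical extraction result containing repeats.
symbols = ["QQQ", "JD", "AAPL", "JD", "QQQ"]
print(sort_and_dedupe(symbols))
# ['AAPL', 'JD', 'QQQ']
```

Unlike a first-seen filter, this discards the order in which symbols appeared in the source text, which is fine here since the original request said sorting was optional.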
« Last Edit: August 29, 2017, 02:31 PM by skwire »